I would like to find a way to escape the delimiter character in my
data so that it doesn't get interpreted as extra columns. For example,
if I'm using a comma as the delimiter and I have a column with the value
"foo,bar", I want that string interpreted as a single column, without
the loader splitting on the comma in the middle. I noticed that old
versions of PigLoader actually used a regex match as the delimiter
(which would have been perfect), but that was removed in favor of a
simple string match.
Before I go to the trouble of writing a custom loader, which I'd
rather not do if I can avoid it:
1. Is there a way to pre-process the data to escape that character,
perhaps one I missed in the wiki docs? (e.g. could I escape it as
"foo\,bar" or "foo,,bar" or something similar with the existing loader?)
2. Is there a different approach altogether I could take? My data is
being generated in a controllable format, and can certainly be
massaged or filtered in some way to make life easier.
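
To make the second option concrete, here is a minimal sketch of the kind of pre-processing I have in mind. It assumes the data can be emitted as quoted CSV and that some character (here Ctrl-A, "\x01") never occurs inside a value; the function name redelimit and the constant SAFE_DELIM are made up for illustration. The idea is to rewrite each row with the safe character as the delimiter, so a plain string match on the delimiter would no longer see the embedded comma. I have not verified how the existing loader handles such a delimiter.

```python
import csv
import io

# Assumption: this control character never appears inside a field value.
SAFE_DELIM = "\x01"


def redelimit(src, dst, in_delim=",", out_delim=SAFE_DELIM):
    """Read quoted CSV rows from src and write them to dst,
    one row per line, with fields joined by out_delim."""
    for row in csv.reader(src, delimiter=in_delim):
        for field in row:
            # Refuse rows that would be ambiguous after rewriting.
            if out_delim in field or "\n" in field:
                raise ValueError("field contains the output delimiter "
                                 "or a newline: %r" % field)
        dst.write(out_delim.join(row) + "\n")


if __name__ == "__main__":
    sample = io.StringIO('1,"foo,bar",baz\n')
    out = io.StringIO()
    redelimit(sample, out)
    # The embedded comma survives inside the second field.
    print(repr(out.getvalue()))
```

The rewritten file would then be loaded with that same character given as the delimiter, if the loader accepts it.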