I'm having an issue with regex in Pig.
Specifically, I'm loading an Apache access log and trying to break out the
individual parameters from the query string:
logs = LOAD '$input' using logloader as (remoteHost:CHARARRAY,
hyphen:CHARARRAY, hyphen2:CHARARRAY, time:CHARARRAY, method:CHARARRAY,
uri:CHARARRAY, protocol:CHARARRAY, statusCode:CHARARRAY,
responseSize:CHARARRAY, treferer:CHARARRAY, agent:CHARARRAY);
full_logs = FOREACH logs GENERATE time, uri, FLATTEN(REGEX_EXTRACT(uri,
The uri looks like:
However, when I run this simple Pig script, I get the uri but not the 'id'.
I then tried using "\d" instead of [0-9], but that doesn't work either.
I tried both [0-9] and \d in PHP and I get 'id=1' and '1', so I'm not sure
what I'm doing wrong.
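
For reference, here is a minimal Java sketch of the behaviour I expect. As far as I understand, Pig's regex functions sit on top of java.util.regex, so I tried the same idea there. The URI and both patterns below are made-up placeholders for illustration, not the exact ones from my script, and the class and helper names are my own. The sketch uses Matcher.find(), which searches anywhere in the string; I have not verified whether Pig does the same or requires the whole string to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdExtract {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern is not found anywhere in the input.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Placeholder URI, assumed for illustration only.
        String uri = "/index.php?id=1&page=2";

        // Character-class form.
        System.out.println(extract(uri, "id=([0-9]+)")); // prints 1

        // \d form: the backslash must be doubled inside a Java string literal.
        System.out.println(extract(uri, "id=(\\d+)"));   // prints 1
    }
}
```

Both forms return '1' here, which matches what I see in PHP. Note that the \d version only works in Java because the backslash is doubled in the string literal; I suspect quoting or escaping in the Pig script may matter in a similar way, but that is a guess on my part.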
Thanks in advance.