
I'm trying to write a Pig script to examine a CSV file, and I'm having problems with the flatten and extract functions. When I run the script below, I get:

ERROR 1017: Schema mismatch. A basic type on flattening cannot have more than one column. User defined schema: {startip: chararray,endip: chararray,country: chararray,region: chararray,city: chararray,postal: chararray,lat: chararray,lon: chararray,dma: chararray,areacode: chararray}

If I take flatten out, I get:
[main] ERROR - ERROR 1000: Error during parsing. Encountered "" at line 27, column 6.
Was expecting one of:

Here is an example of my data:

Here is my program:

--declare udf
REGISTER file:/usr/lib/pig/contrib/piggybank/java/piggybank.jar

--define aliases for any classes you want to use
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.RegexExtract();

--load in data
rawlogs = load 'geoshort.csv' using TextLoader as (line:chararray);

--print out a couple lines of data
illustrate rawlogs;

logbase = foreach rawlogs generate
EXTRACT(line, '^(\\S+) (\\S+) "(.+?)" "(.+?)" "(.+?)" "(.+?)" (\\S+) (\\S+) (\\S+) (\\S+)')
as (
startip: chararray,
endip: chararray,
country: chararray,
region: chararray,
city: chararray,
postal: chararray,
lat: chararray,
lon: chararray,
dma: chararray,
areacode: chararray

illustrate logbase;

Thanks in advance.
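
A possible direction, offered as a sketch rather than a confirmed fix: as far as I know, piggybank's RegexExtract returns a single chararray (one capture group, selected by an index argument), which would explain why ERROR 1017 complains about a basic type with more than one column. RegexExtractAll, from the same package, returns a tuple with one field per capture group, and a tuple can be flattened into the ten named columns. The alias EXTRACT_ALL below is made up for this example, and because the sample data is missing from the post, whether the pattern actually matches lines of geoshort.csv is untested. The foreach statement here is also closed with `);`, which the listing above does not show.

```pig
-- register piggybank (same path as in the original script)
REGISTER file:/usr/lib/pig/contrib/piggybank/java/piggybank.jar;

-- EXTRACT_ALL is a hypothetical alias chosen for this sketch
DEFINE EXTRACT_ALL org.apache.pig.piggybank.evaluation.string.RegexExtractAll();

-- load each line of the file as a single chararray
rawlogs = LOAD 'geoshort.csv' USING TextLoader AS (line:chararray);

-- RegexExtractAll yields one tuple per line (one field per capture group);
-- FLATTEN turns that tuple into ten top-level columns
logbase = FOREACH rawlogs GENERATE
    FLATTEN(
        EXTRACT_ALL(line, '^(\\S+) (\\S+) "(.+?)" "(.+?)" "(.+?)" "(.+?)" (\\S+) (\\S+) (\\S+) (\\S+)')
    ) AS (
        startip: chararray,
        endip: chararray,
        country: chararray,
        region: chararray,
        city: chararray,
        postal: chararray,
        lat: chararray,
        lon: chararray,
        dma: chararray,
        areacode: chararray
    );

-- print out a couple of sample rows
ILLUSTRATE logbase;
```

Note that the pattern expects space-separated fields with some of them in double quotes. If the file is really comma-separated, the pattern would need to change accordingly.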

Posted by Ross Nordeen, Jun 21, '11 at 9:03p


