Hello there,

Based on
http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/
I want to add geolocation to my HAProxy raw logs stored in an HBase table.

Here is my Pig script (wrapper.sh is a self-extracting bash archive that
deploys the Perl script and its dependencies, very close to the one in the
referenced post, and launches it):

DEFINE iplookup `wrapper.sh GeoIP`
    SHIP('wrapper.sh')
    CACHE('/GeoIP/GeoIPcity.dat#GeoIP');

A = LOAD 'log' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'default:body', '-gt=_f:squid_t:201109151630 -loadKey')
    AS (rowkey, data);
B = LIMIT A 10;
C = FOREACH B {
    t = REGEX_EXTRACT(data,
        '([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ', 1);
    GENERATE rowkey, t;
}
D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country,
    state, city);
STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'location:ip location:country_code location:country location:state location:city');

I can DUMP D; without problem, and I get the promised geolocation:
(_f:squid_t:20110916103000_b:squid_s:200-+PH/I6eJ9h8Sy8/1+yz2kw==,77.192.16.143,FR,France,B9,Lyon)
(_f:squid_t:20110916103000_b:squid_s:200-+XSr1ZpMyLGmi8iDvZ4lLQ==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-+gl66vwlvPL9Di1zzut9Bg==,178.250.1.40,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-+qAtjeGfssc2vkwWR4fmJQ==,86.73.78.25,FR,France,A8,La Courneuve)
(_f:squid_t:20110916103000_b:squid_s:200-+wgQq1q8H/vp52//EIevzA==,80.13.204.64,FR,France,A8,Paris)
(_f:squid_t:20110916103000_b:squid_s:200-/3J9EosV46v521VBlb6zxQ==,82.127.103.161,FR,France,B6,Erquery)
(_f:squid_t:20110916103000_b:squid_s:200-/3okAiWeWMmpm54Qlk7JyQ==,86.75.127.253,FR,France,B5,La Daguenière)
(_f:squid_t:20110916103000_b:squid_s:200-/yZ09fLNWflcBlWX1BjEkA==,83.200.13.146,FR,France,A8,Villiers-le-bel)
(_f:squid_t:20110916103000_b:squid_s:200-0/HiVaFE6b1zrUTtHkV05Q==,193.228.156.10,FR,France,,)
(_f:squid_t:20110916103000_b:squid_s:200-0CTc6LQ9jGpgQQLwmJZxQQ==,195.93.102.10,FR,France,,)



But when I try to store the result (the last line of the script) into an
existing HTable, I get the following error in the reduce task, shown in the
JobTracker UI:

java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:439)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.cleanup(PigMapReduce.java:492)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: No columns to insert
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:845)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:677)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:431)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:514)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:437)
... 9 more


Another question: what is the syntax for comments in a Pig script (besides
/* ... */) to quickly comment out a single line?

I am using the CDH3u1 packages.

Thank you for helping.

Regards,

--
Damien


  • Dmitriy Ryaboy at Sep 16, 2011 at 9:27 am
    1) Please try trunk.

    2) Like in SQL, a single-line comment is introduced by two dashes: "--"
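
    For example, to knock out one line quickly:

    -- B = LIMIT A 10;   -- the leading dashes comment out the whole line
    B = LIMIT A 100;     -- anything after the dashes is ignored too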


    D
  • Damien Hardy at Sep 16, 2011 at 10:13 am
    Thank you, Dmitriy.

    The 0.9.1-SNAPSHOT version of Pig runs the same script without error ...
    Is there a bug open at Cloudera?

    Thank you.

    Regards.

    --
    Damien


  • Damien Hardy at Sep 16, 2011 at 12:37 pm
    Hmm, looking at my result more closely, it is not exactly what I expected:

    hbase(main):022:0> get 'geoip_pig',
    "_f:squid_t:20110916140500_b:squid_s:200-1VPVjbVwywTpNtLA4mHl+A=="
    COLUMN                 CELL
    location:city          timestamp=1316176308180, value=
    location:country       timestamp=1316176308180, value=
    location:country_code  timestamp=1316176308180, value=
    location:ip            timestamp=1316176308180, value=90.9.213.170,FR,France,A9,Llupia
    location:state         timestamp=1316176308180, value=


    It seems that the field separator ("," here) is applied to split off the
    row key, but not to split the remaining fields across the columns:
    everything lands in location:ip.

    Bug?
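
    One possible explanation (a guess, not verified here): STREAM ... AS only
    declares a schema, while the default streaming deserializer splits the
    command's output on tabs, so the whole comma-separated line from the Perl
    script comes back as a single field. If so, declaring the delimiter on the
    DEFINE with the stock PigStreaming serializer should make Pig split on
    commas; a minimal sketch:

    -- assumes wrapper.sh writes comma-separated records to stdout
    DEFINE iplookup `wrapper.sh GeoIP`
        OUTPUT(stdout USING PigStreaming(','))
        SHIP('wrapper.sh')
        CACHE('/GeoIP/GeoIPcity.dat#GeoIP');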

    Regards,

    --
    Damien

  • Damien Hardy at Sep 16, 2011 at 1:51 pm
    I get the same result with 0.10.0 (trunk), even with '-delim=","' or
    '-delim=,' in the optString.
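
    It may be that -delim only controls how HBaseStorage parses its
    column-list argument, not how the data tuple is split. An untested
    workaround sketch: take the streamed output as a single chararray and
    split it explicitly with the STRSPLIT builtin before storing:

    -- assumes iplookup emits: rowkey,ip,country_code,country,state,city
    D = STREAM C THROUGH iplookup AS (line:chararray);
    E = FOREACH D GENERATE FLATTEN(STRSPLIT(line, ',', 6))
        AS (rowkey:chararray, ip:chararray, country_code:chararray,
            country:chararray, state:chararray, city:chararray);
    STORE E INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'location:ip location:country_code location:country location:state location:city');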


