Difference in Semantics between Load statement in Pig and HDFS client on Command line
-------------------------------------------------------------------------------------

Key: PIG-1576
URL: https://issues.apache.org/jira/browse/PIG-1576
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0, 0.6.0
Reporter: Viraj Bhat


Here is my directory structure on HDFS which I want to access using Pig.
This is a sample; in the real use case I have more than 100 of these directories.
{code}
$ hadoop fs -ls /user/viraj/recursive/
Found 3 items
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080615
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080616
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080617
{code}
Using the command line, I can access them using a variety of options:
{code}
$ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt

$ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
{code}
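
Note that the {15..17} range syntax most likely works on the command line only because the local shell (bash, assuming that is what is being used) performs brace expansion before hadoop fs ever sees the arguments. A quick way to check this assumption is to echo the same patterns:
{code}
$ echo /user/viraj/recursive/{200806}{15..17}/
/user/viraj/recursive/{200806}15/ /user/viraj/recursive/{200806}16/ /user/viraj/recursive/{200806}17/

$ echo /user/viraj/recursive/{20080615..20080617}/
/user/viraj/recursive/20080615/ /user/viraj/recursive/20080616/ /user/viraj/recursive/20080617/
{code}
In the first case bash leaves the single-item {200806} group alone (no comma, no valid range) and Hadoop's own {a,b} alternation resolves it; in the second case bash expands the whole range, so HDFS never sees a glob pattern at all.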

I have written a Pig script; none of the following combinations of load statements work:
{code}
--A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}

I get the following error in Pig 0.8:
{noformat}
2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2010-08-27 16:34:27,711 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.0-SNAPSHOT viraj 2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A A,AL Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
... 7 more
hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
{noformat}
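
The failure is consistent with Pig handing the path string unmodified to Hadoop's FileInputFormat: Hadoop's glob syntax (FileSystem.globStatus) understands *, ?, [a-b] character ranges and {a,b} alternation, but not the shell-style {a..b} numeric range, hence "matches 0 files". The same behaviour can presumably be reproduced with the HDFS client by quoting the pattern so the shell cannot expand it first:
{code}
# Quoting the glob keeps the shell from expanding the {..} range, so the
# unexpanded pattern reaches HDFS and should match nothing, just as in Pig
# (assumption; this check is not part of the original report).
$ hadoop fs -ls '/user/viraj/recursive/{20080615..20080617}/'
{code}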

The following works:
{code}
A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}
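
Besides the explicit {15,16,17} alternation, a character-class range should also work for this sample, since Hadoop's glob syntax supports [a-b]. The following is only a sketch for the three-directory case above, not something taken from the original report:
{code}
-- hypothetical workaround: let Hadoop's glob match the last digit of the date
A = load '/user/viraj/recursive/2008061[5-7]/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}
For the real case with 100+ directories the dates would still have to be enumerated or matched with a wider glob, which is exactly why native {a..b} range support would be convenient.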

Why is there an inconsistency between the HDFS client and Pig?

Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Alan Gates (JIRA) at Sep 21, 2010 at 6:01 pm
    [ https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-1576:
    ----------------------------

    Fix Version/s: 0.9.0
