Grokbase Groups Pig dev November 2010
[ https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-1576.
---------------------------------

Resolution: Invalid

We followed up with the HDFS team, and they confirmed that the HDFS API does not support this pattern either. It most likely worked on the command line because the Unix shell expanded the braces before the HDFS client ever saw the path.
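The shell interaction can be checked without a cluster. The sketch below is an illustration, not part of the original report: `echo` stands in for `hadoop fs -ls`, and the variable names are hypothetical. It prints the arguments that bash would hand to the client for the two patterns from the issue.

```shell
# Brace ranges ({x..y}) are a bash feature, not POSIX sh, so bash is invoked explicitly.
# echo stands in for `hadoop fs -ls` and shows the arguments the client would receive.
range_args=$(bash -c 'echo /user/viraj/recursive/{20080615..20080617}/')
echo "$range_args"

# {200806} contains no comma or range, so bash leaves it untouched
# and expands only the {15..17} part.
split_args=$(bash -c 'echo /user/viraj/recursive/{200806}{15..17}/')
echo "$split_args"
```

In the first case the client receives three plain directory paths with no pattern left in them. In the second, each argument still carries a literal `{200806}`, which the HDFS glob syntax presumably resolves as a one-element alternation. A Pig load statement involves no shell, so the range reaches Hadoop unexpanded, which is consistent with the "matches 0 files" error in the report.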
Difference in Semantics between Load statement in Pig and HDFS client on Command line
-------------------------------------------------------------------------------------

Key: PIG-1576
URL: https://issues.apache.org/jira/browse/PIG-1576
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0, 0.7.0
Reporter: Viraj Bhat
Fix For: 0.9.0


Here is the directory structure on HDFS that I want to access using Pig.
This is a sample; in the real use case I have more than 100 of these directories.
{code}
$ hadoop fs -ls /user/viraj/recursive/
Found 3 items
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080615
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080616
drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080617
{code}
From the command line I can access them using a variety of patterns:
{code}
$ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
$ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
{code}
I have written a Pig script, but none of the following load statements work:
{code}
--A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}
I get the following error in Pig 0.8:
{noformat}
2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2010-08-27 16:34:27,711 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.0-SNAPSHOT viraj 2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A A,AL Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
... 7 more
hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
{noformat}
The following works:
{code}
A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}
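The working form lists every directory inside a single comma-separated brace group. With more than 100 directories, that list could be generated rather than typed. The following is a minimal sketch under stated assumptions: the variable names are hypothetical, and `seq` and `paste` are the standard Unix tools, not anything provided by Pig or Hadoop.

```shell
# Hypothetical helper: build a comma-separated brace group from a numeric range.
base=/user/viraj/recursive
# seq prints one number per line; paste -sd, joins the lines with commas.
dirs=$(seq 20080615 20080617 | paste -sd, -)
# Inside double quotes the braces stay literal, so the pattern is passed on unexpanded.
input="${base}/{${dirs}}/"
echo "$input"
```

The resulting string could then be handed to a script through Pig's parameter substitution, for example `pig -param input="$input" script.pig` with `load '$input'` in the script (the script name is a placeholder). Note that `seq` counts plain integers and knows nothing about calendar dates, so a range that crosses a month boundary would need a different generator.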
Why is there an inconsistency between HDFS client and Pig?
Viraj

Posted Nov 17, '10 at 1:39a