Grokbase Groups Pig user June 2008
FAQ
If I have multiple files in a directory, how do I load this into Pig? I want to run Pig over an input directory, not an individual file.

%ls Data
myfile1.txt
myfile2.txt
myfile3.txt
myfile4.txt
myfile5.txt

thanks.

Also, if I run the sample Pig Latin commands, I keep getting errors saying "Unable to open iterator"

For example,

A = LOAD 'myfile.txt' USING PigStorage('\t') AS (f1,f2,f3);
dump A

gives me the correct output:
<1, 2, 3>
<4, 2, 1>
<8, 3, 4>
<4, 3, 3>
<7, 2, 5>
<8, 4, 3>

but, then when I do the next sample,
Y = FILTER A BY f1 == '8';
dump Y

I get a bunch of parser errors, then "Unable to open iterator" for Y.

This happens for most of the rest of the samples.

What's going on?


  • Prashanth Pappu at Jun 6, 2008 at 4:33 pm

    Y = FILTER A BY f1 == '8';
    dump Y

    You are using the '==' operator with a string '8'. Just try
    Y = FILTER A BY f1==8;

    This is related to the concerns I've been raising. In the above example
    (with f1 == '8'), the result is an empty table, and we need to ensure that,
    both semantically and implementation-wise, Pig handles empty tables/bags in
    a manner consistent with non-empty tables.
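
    As a minimal sketch of the suggestion above, the whole sequence with the
    numeric comparison might look like the following. The file name and field
    names are taken from the original question; whether the statements parse
    unchanged depends on the Pig version in use, which is an assumption here.

    ```pig
    -- Load the tab-separated sample file (names as in the original question).
    A = LOAD 'myfile.txt' USING PigStorage('\t') AS (f1, f2, f3);

    -- Compare f1 against the number 8 rather than the quoted string '8'.
    Y = FILTER A BY f1 == 8;
    DUMP Y;
    ```

    With the sample data shown in the question, this filter would be expected
    to keep the two rows whose first field is 8.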
  • Pi song at Jun 9, 2008 at 1:57 am
    Regarding multiple-file input, please have a look at Hadoop's globbing
    support:

    http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
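
    A rough sketch of what a glob-based load could look like, assuming the
    LOAD path is handed to Hadoop's glob matching and the files sit in the
    Data directory listed in the question:

    ```pig
    -- The '*' pattern is meant to match myfile1.txt through myfile5.txt.
    -- Assumption: every file shares the same tab-separated, three-field layout.
    A = LOAD 'Data/myfile*.txt' USING PigStorage('\t') AS (f1, f2, f3);
    DUMP A;
    ```

    A pattern is useful when only some of the files in a directory should be
    read.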
  • Alan Gates at Jun 9, 2008 at 2:43 pm
    If you want to read every file in the directory, you can give the
    directory name instead of a file name. Every file should be read, at
    least in map reduce mode. I'm not sure whether this works in local mode.

    Alan.
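
    For illustration, loading by directory name might look like this sketch.
    The directory name Data comes from the listing in the question; the
    behavior in local mode is left open, as noted above.

    ```pig
    -- Pass the directory itself as the input path; each file inside it
    -- should then be read (at least in map reduce mode).
    A = LOAD 'Data' USING PigStorage('\t') AS (f1, f2, f3);
    DUMP A;
    ```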

    Kayla Jay wrote:
    If I have multiple files in a directory, how do I load this into Pig? I want to run Pig over an input directory, not an individual file.



Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 6, '08 at 1:30p
active: Jun 9, '08 at 2:43p
posts: 4
users: 4
website: pig.apache.org
