Grokbase Groups Pig user October 2010
My Pig script looks roughly like this:

A = LOAD 'input1' USING JsonLoader AS (x:map[]);
B = LOAD 'input2' USING JsonLoader AS (x:map[]);

A = FOREACH A GENERATE x, (chararray) x#'item' AS item:chararray;
B = FOREACH B GENERATE x, (chararray) x#'item' AS item:chararray;

U = UNION A, B;

DUMP U;


This leads to the following exception:

java.lang.RuntimeException: Unexpected data type -1 found in stream.
at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:306)
at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:220)
at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:269)
at org.apache.pig.impl.io.BinStorageRecordWriter.write(BinStorageRecordWriter.java:69)
at org.apache.pig.builtin.BinStorage.putNext(BinStorage.java:102)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Any ideas?

I am able to dump A and B.

-Rakesh


  • Rekha Joshi at Oct 21, 2010 at 8:50 am
    Hi Rakesh,

    There was a known issue with explicit casts not working when the data is a complex type (e.g., bags). Check PIG-616; it is marked resolved now.
    As a confirming step, could you try removing the explicit chararray cast and check?

    Thanks & Regards,
    /Rekha.

    On 10/21/10 11:58 AM, "rakesh kothari" wrote:
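For reference, a minimal version of the script with the explicit cast removed might look like this (a sketch, assuming the same JsonLoader inputs; the map value is left untyped so Pig treats it as a bytearray):

```pig
-- Same loads as before; 'input1' and 'input2' are placeholder paths
A = LOAD 'input1' USING JsonLoader AS (x:map[]);
B = LOAD 'input2' USING JsonLoader AS (x:map[]);

-- No explicit (chararray) cast on the map lookup
A = FOREACH A GENERATE x, x#'item' AS item;
B = FOREACH B GENERATE x, x#'item' AS item;

U = UNION A, B;
DUMP U;
```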
  • Guy Bayes at Oct 21, 2010 at 4:58 pm
    We have a job that processes several hundred files in a directory.

    We generally glob the directory in a single LOAD statement.

    Sometimes the job chokes on a bad row in a single file.

    I could have sworn that Pig printed the file name of the chunks it is processing in the task log, but I cannot see it.

    Does anyone know under what conditions file names are printed, or how to find the file that is causing the issue?

    Thanks
    Guy
  • Romain Rigaux at Oct 25, 2010 at 4:03 pm
    Hi,

    I don't think the filenames are directly available, but I do something like
    this to get them (I have not tried it with Pig 0.7+ yet):

    Create a new loader inheriting from PigStorage and get the "location" path
    of the data. Then either:

    - print it, if everything happens in the same task
    - append it to each record

    Hope this helps,

    Romain
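Assuming such a loader exists, its usage from Pig Latin would be along these lines. Note that the class name FileTaggingLoader is hypothetical (not a real Pig class), standing in for the PigStorage subclass Romain describes:

```pig
-- Hypothetical loader that prepends the source file path to each record
A = LOAD '/data/input/*' USING FileTaggingLoader() AS (srcfile:chararray, line:chararray);

-- Bad rows can then be traced back to the file they came from
BAD = FILTER A BY line IS NULL;
DUMP BAD;
```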
  • Guy Bayes at Oct 25, 2010 at 4:10 pm
    I'm pretty sure they are supposed to be in the input-split section of the
    tasktracker logs, aren't they?

    For some reason all the input splits are null:

    Input-split file: null
    Input-split start-offset: -1
    Input-split length: -1

    thanks
    Guy


    --
    you may be acquainted with the night
    but i have seen the darkness in the day
    and you must know it is a terrifying sight...
  • Rakesh kothari at Oct 21, 2010 at 6:27 pm
    I am using Pig 0.7. No luck even after removing the explicit cast.

    Pig is not able to determine the type of the map's elements and is failing. I am able to DUMP A and B in isolation; it's the UNION that's not working.

    DESCRIBE U results in:

    {x: map[ ],item: chararray}

    -Rakesh
  • Rakesh kothari at Oct 21, 2010 at 7:24 pm
    Actually, I figured out the issue. There were fields with null values in my JSON; those fields were being deserialized to org.json.JSONObject.NULL objects, and hence Pig was not able to map them to any valid type.

    -Rakesh
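One way to guard against this on the Pig side (a sketch, assuming the loader surfaces those fields as Pig nulls rather than failing during deserialization) is to substitute a default value with the bincond operator before the UNION, so every row carries a concrete chararray:

```pig
A = LOAD 'input1' USING JsonLoader AS (x:map[]);

-- Replace a null map value with an empty string so the
-- field always has a well-defined chararray type
A = FOREACH A GENERATE x,
    ((x#'item' IS NULL) ? '' : (chararray) x#'item') AS item:chararray;
```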


Discussion Overview
group: user@pig.apache.org
categories: pig, hadoop
posted: Oct 21, 2010 at 6:30 AM
active: Oct 25, 2010 at 4:10 PM
posts: 7
users: 4
website: pig.apache.org
