I tried using the fs -cat and sh -cat commands to combine the header and the
output file into a new file, but it is not working. Does Hadoop provide an
option to combine two files into a new file in a Pig script?
These are the commands I used at the end of the Pig script:
STORE out3 INTO '$OUTPUT' USING
sh -cat $OUTPUT/.pig_header $OUTPUT/part* > $OUTPUT/top10adv.csv
hadoop fs -ls pigdbck/output/top10advperimpfileh5
Found 4 items
-rw-r--r-- 1 root supergroup 30 2011-05-26 17:52
-rw-r--r-- 1 root supergroup 361 2011-05-26 17:52
drwxr-xr-x - root supergroup 0 2011-05-26 17:51
-rw-r--r-- 1 root supergroup 117 2011-05-26 17:52
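One workaround, if the goal is just a single CSV, is to do the concatenation outside Pig. This Python sketch assumes the output directory has already been copied to the local filesystem (for example with `hadoop fs -get pigdbck/output/top10advperimpfileh5 .`). It reproduces `cat $OUTPUT/.pig_header $OUTPUT/part* > top10adv.csv`. The function name `merge_pig_output` is hypothetical. The extra newline written after the header is also an assumption about the header file's format, not something Pig is known to guarantee.

```python
import shutil
from pathlib import Path


def merge_pig_output(output_dir, target, header_name=".pig_header"):
    """Write the header file, then every part* file, into `target`.

    Hypothetical helper; assumes `output_dir` is a local copy of the Pig
    output directory (e.g. fetched with `hadoop fs -get`).
    """
    out_dir = Path(output_dir)
    # Part files are zero-padded (part-r-00000, part-r-00001, ...), so a
    # plain name sort keeps them in order. Hidden files such as
    # .pig_header or .pig_schema do not match "part*".
    parts = sorted(p for p in out_dir.glob("part*") if p.is_file())
    with open(target, "wb") as dst:
        header = (out_dir / header_name).read_bytes()
        dst.write(header)
        # Unlike plain `cat`, make sure the header ends with a newline so it
        # does not run into the first data row (assumption, see lead-in).
        if header and not header.endswith(b"\n"):
            dst.write(b"\n")
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
    return [p.name for p in parts]
```

Usage: `merge_pig_output("top10advperimpfileh5", "top10adv.csv")`. Follow it with `hadoop fs -put top10adv.csv pigdbck/output/` if the merged file has to live back on HDFS.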
On 26 May 2011 12:02, Subhramanian, Deepak wrote:
I thought any Java class extension was a UDF. Thanks, Dmitriy, for
clarifying. Yes, I meant extending StoreFunc. I guess I will use
PigStorageSchema for the time being, as I am tight on my deadlines, and use
cat to concatenate the header. I didn't realize that we can use cat
directly in the Pig script, and that is why I thought of extending
StoreFunc. Thanks, Alan, for your inputs.
I will have to read more on how the output part files are created on HDFS
so that I can combine all the part files at the end of the Pig script into a
final output if the file size is very big.
On 25 May 2011 21:22, Dmitriy Ryaboy wrote:
Still not clear on how you expect a UDF to help. Normally when we say
UDFs, we mean functions that work on individual tuples; they don't have
anything to do with how you store data.
You probably mean StoreFunc. Since in this case you want a StoreFunc
that messes with the file format, as opposed to writing a side file
like PigStorageSchema does, you'll need to go pretty deep: write a
whole StoreFunc + OutputFormat + RecordWriter stack.
On Wed, May 25, 2011 at 12:51 PM, Subhramanian, Deepak wrote:
Thanks for the inputs. I am looking for a UDF which I can use to store the
headers in the Pig output file.
On 25 May 2011 18:30, Dmitriy Ryaboy wrote:
Can you explain what UDF you are looking for?
The intended usage for the .pig_header file is to cat it:
hadoop fs -cat myresults/.pig_header myresults/part*
(which drops the header right on top of your data).
We don't want to put the header inside the data files because that can
break subsequent processing.
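To make the "break subsequent processing" point concrete: a downstream reader of the part files usually expects every line to be data. In this Python sketch, `total_impressions` and the column layout are hypothetical. The only detail taken from Pig is that PigStorage writes tab-delimited rows by default.

```python
def total_impressions(lines, column=1):
    # Hypothetical downstream consumer: sums one integer column of
    # tab-delimited rows, the way a follow-up job might.
    return sum(int(line.rstrip("\n").split("\t")[column]) for line in lines)


data_rows = ["adv1\t10\n", "adv2\t32\n"]        # what the part* files hold
header_row = "advertiserId\timpressions\n"      # hypothetical header line

print(total_impressions(data_rows))             # 42
try:
    # A header stored inside the data files is read as just another row.
    total_impressions([header_row] + data_rows)
except ValueError as err:
    print("header broke the job:", err)
```

Keeping the header in a side file (.pig_header) means the part files stay pure data, and the header is only added when a human-readable file is wanted.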
As for the names of the fields, that's a Pig feature; it's there for
disambiguation. If you don't like it, you can rename the fields:
FLATTEN(aggregated) as (advertiserId, Advertiser, OrderId, ....)
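If renaming every field with FLATTEN(...) AS (...) is too tedious, the header line can also be cleaned after the fact. This Python sketch assumes that `.pig_header` is a single line of field names separated by the store delimiter (a tab here). That is an assumption about the file, not something stated in this thread. `strip_disambiguation` and `clean_header_line` are hypothetical helpers. Because the `alias::` prefixes exist for disambiguation, the sketch refuses to produce duplicate names rather than silently merging them.

```python
def strip_disambiguation(fields):
    # Keep only the part after the last "::",
    # e.g. "aggregated::advertiserId" -> "advertiserId".
    short = [name.rsplit("::", 1)[-1] for name in fields]
    # Pig adds the prefixes to keep names unique; fail if stripping them
    # would make two columns share a name.
    dupes = sorted({n for n in short if short.count(n) > 1})
    if dupes:
        raise ValueError(f"ambiguous field names after stripping: {dupes}")
    return short


def clean_header_line(line, delimiter="\t"):
    # Assumes the header is one delimiter-separated line of field names.
    fields = line.rstrip("\n").split(delimiter)
    return delimiter.join(strip_disambiguation(fields)) + "\n"


print(clean_header_line("group\taggregated::advertiserId\taggregated::OrderId\n"))
```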
On Wed, May 25, 2011 at 9:00 AM, Subhramanian, Deepak wrote:
Hi, I just realized that it is creating a .pig_header file in the same output
directory. I guess I need to create a new UDF. Also, if I am grouping, it
adds the tag aggregated::group: to the header column names. Is FLATTEN
supposed to remove the group?
On 25 May 2011 16:48, Subhramanian, Deepak wrote:
I tried PigStorageSchema. For some reason it doesn't create the
Is it because I am loading the data using another UDF?
This is the command I used in the Pig script:
STORE out INTO '$OUTPUT' USING
On 25 May 2011 16:13, Dmitriy Ryaboy wrote:
You can try PigStorageSchema from the piggybank.
From: "Subhramanian, Deepak" <email@example.com>
Sent: 5/25/2011 5:28 AM
Subject: Storing Headers in Pig Output File
Is there a way to store the headers (the title of each column) using a
command in a Pig script (STORE out3 INTO '$OUTPUT' USING
Right now it stores only the data. Somewhere I read that in Pig 0.8 it stores
with the map reduce option. Do we have to supply extra parameters?