I do that kind of streaming on HDFS files using Hadoop Streaming, outside of Pig. I assume you could do it from inside Pig too, but I haven't tested that.
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
From: Moore, Michael A.
Sent: Tuesday, June 07, 2011 3:14 PM
To: user@pig.apache.org
Subject: Re: Loading Files with Comment Lines
Possibly. Can I do that if the file is already in HDFS?
______________________________________
Michael Moore :: Michael.Moore@jhuapl.edu
The Johns Hopkins University Applied Physics Laboratory
0B7B17EE1AE2A80B pgp
BC31 A861 9726 8211 F79F 7E21 0B7B 17EE 1AE2 A80B pgp fingerprint
On Jun 7, 2011, at 3:12 PM, William F Dowling wrote:
Can you stream it through
grep -v '^#'
?
William F Dowling
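For reference, the grep suggestion above can be tried locally on the sample data from the original question. This is only a minimal sketch: the temporary file and the HDFS path in the comment are placeholders, and the sample is written with single spaces between fields, which may not match the real delimiter.

```shell
#!/bin/sh
# Recreate the sample from the original message in a temporary file.
# (Hypothetical file; the real data may use tabs rather than spaces.)
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Data Source: Project A
# Contact MMoore with Questions
# SenderId RecipientId
1 2
3 5
6 7
#2 1
3 6
11 7
EOF

# -v inverts the match, so every line that starts with '#' is dropped.
filtered=$(grep -v '^#' "$sample")
printf '%s\n' "$filtered"

# For a file that already lives in HDFS, the same filter can be applied to
# the output of `hadoop fs -cat` (the path below is a placeholder):
#   hadoop fs -cat /path/to/input | grep -v '^#'

rm -f "$sample"
```

Only the five data rows remain; the commented-out row `#2 1` is dropped along with the three header comments. Note that grep exits with status 1 when it selects no lines, which may matter if the command is run as a streaming task over a split that contains only comment lines.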
From: Moore, Michael A.
Sent: Tuesday, June 07, 2011 3:04 PM
To: user@pig.apache.org
Subject: Loading Files with Comment Lines
Hello all-
I've got a quick question and Google isn't proving to be much help.
I've got a big file that has a few lines in it prefixed with a pound sign (#) to indicate that they should be ignored. I would like to LOAD this file using PigStorage. Is there a way to skip those lines, or is that handled automatically?
The data might look something like this:
# Data Source: Project A
# Contact MMoore with Questions
# SenderId RecipientId
1 2
3 5
6 7
#2 1
3 6
11 7
Thanks!
-Michael