So, I've tried the exact same request but loading the data from HDFS
files (using the regular Pig loader) : it works !
Here is the request loading from HDFS:
start_sessions = LOAD 'start_sessions' AS (sid:chararray,
infoid:chararray, imei:chararray, start:long);
end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long,
locid:chararray);
infos = LOAD 'infos' AS (infoid:chararray, network_type:chararray,
network_subtype:chararray, locale:chararray, version_name:chararray,
carrier_country:chararray, carrier_name:chararray,
phone_manufacturer:chararray, phone_model:chararray,
firmware_version:chararray, firmware_name:chararray);
sessions = JOIN start_sessions BY sid, end_sessions BY sid;
sessions = FILTER sessions BY end > start AND end - start < 86400000L;
sessions = JOIN sessions BY infoid, infos BY infoid;
sessions = LIMIT sessions 100;
dump sessions;
The same request loading from HBase don't work:
start_sessions = LOAD 'startSession' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:infoid meta:imei meta:timestamp') AS (sid:chararray,
infoid:chararray, imei:chararray, start:long);
end_sessions = LOAD 'endSession' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:timestamp meta:locid') AS (sid:chararray, end:long,
locid:chararray);
infos = LOAD 'info' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid
data:networkType data:networkSubtype data:locale
data:applicationVersionName data:carrierCountry data:carrierName
data:phoneManufacturer data:phoneModel data:firmwareVersion
data:firmwareName') AS (infoid:chararray, network_type:chararray,
network_subtype:chararray, locale:chararray, version_name:chararray,
carrier_country:chararray, carrier_name:chararray,
phone_manufacturer:chararray, phone_model:chararray,
firmware_version:chararray, firmware_name:chararray);
sessions = JOIN start_sessions BY sid, end_sessions BY sid;
sessions = FILTER sessions BY end > start AND end - start < 86400000L;
sessions = JOIN sessions BY infoid, infos BY infoid;
sessions = LIMIT sessions 100;
dump sessions;
I guess it definitively means there is a nasty bug in the HBase loader.
Here is the PIG dump for the non working request:
aws09:~# pig
2011-07-28 08:17:36,329 [main] INFO org.apache.pig.Main - Logging
error messages to: /root/pig_1311841056328.log
2011-07-28 08:17:36,641 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at:
hdfs://aws09.preprod.ubithere.com:9000
2011-07-28 08:17:36,923 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: aws09.preprod.ubithere.com:9001
grunt> start_sessions = LOAD
'startSession.mde253811.preprod.ubithere.com' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:infoid meta:imei meta:timestamp') AS (sid:chararray,
infoid:chararray, imei:chararray, start:long);
grunt> end_sessions = LOAD
'endSession.mde253811.preprod.ubithere.com' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:timestamp meta:locid') AS (sid:chararray, end:long,
locid:chararray);
grunt> infos = LOAD 'info.mde253811.preprod.ubithere.com' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid
data:networkType data:networkSubtype data:locale
data:applicationVersionName data:carrierCountry data:carrierName
data:phoneManufacturer data:phoneModel data:firmwareVersion
data:firmwareName') AS (infoid:chararray, network_type:chararray,
network_subtype:chararray, locale:chararray, version_name:chararray,
carrier_country:chararray, carrier_name:chararray,
phone_manufacturer:chararray, phone_model:chararray,
firmware_version:chararray, firmware_name:chararray);
grunt> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
grunt> sessions = FILTER sessions BY end > start AND end - start <
86400000L;
grunt> sessions = JOIN sessions BY infoid, infos BY infoid;
grunt> sessions = LIMIT sessions 100;
grunt> dump sessions;
2011-07-28 08:17:50,275 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: HASH_JOIN,FILTER,LIMIT
2011-07-28 08:17:50,275 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-07-28 08:17:51,213 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: sessions:
Store(hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740:org.apache.pig.impl.io.InterStorage)
- scope-93 Operator Key: scope-93)
2011-07-28 08:17:51,225 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2011-07-28 08:17:51,281 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
- Rewrite: POPackage->POForEach to POJoinPackage
2011-07-28 08:17:51,281 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
- Rewrite: POPackage->POForEach to POJoinPackage
2011-07-28 08:17:51,350 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 3
2011-07-28 08:17:51,350 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 3
2011-07-28 08:17:51,402 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-07-28 08:17:51,411 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:zookeeper.version=3.3.2-1031432, built on
11/05/2010 05:32 GMT
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:host.name=aws09.machine.com
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:java.version=1.6.0_22
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:java.vendor=Sun Microsystems Inc.
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client
environment:java.class.path=/opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/pig/bin/../pig-0.8.1-core.jar:/opt/pig/bin/../build/pig-*-SNAPSHOT.jar:/opt/pig/bin/../lib/commons-el-1.0.jar:/opt/pig/bin/../lib/commons-lang-2.4.jar:/opt/pig/bin/../lib/commons-logging-1.1.1.jar:/opt/pig/bin/../lib/guava-r06.jar:/opt/pig/bin/../lib/hbase-0.90.3.jar:/opt/pig/bin/../lib/hsqldb-1.8.0.10.jar:/opt/pig/bin/../lib/jackson-core-asl-1.0.1.jar:/opt/pig/bin/../lib/jackson-mapper-asl-1.0.1.jar:/opt/pig/bin/../lib/javacc-4.2.jar:/opt/pig/bin/../lib/javacc.jar:/opt/pig/bin/../lib/jetty-util-6.1.14.jar:/opt/pig/bin/../lib/jline-0.9.94.jar:/opt/pig/bin/../lib/joda-time-1.6.jar:/opt/pig/bin/../lib/jsch-0.1.38.jar:/opt/pig/bin/../lib/junit-4.5.jar:/opt/pig/bin/../lib/jython-2.5.0.jar:/opt/pig/bin/../lib/log4j-1.2.14.jar:/opt/pig/bin/../lib/pigudfs.jar:/opt/pig/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/pig/bin/../lib/zookeeper-3.3.2.jar:/opt/hadoop/conf_computation:/opt/hbase/conf:/opt/pig/lib/hadoop-0.20-append-core.jar
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client
environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:java.io.tmpdir=/tmp
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:java.compiler=<NA>
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:os.name=Linux
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:os.arch=amd64
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:os.version=2.6.21.7-2.fc8xen-ec2-v1.0
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:user.name=root
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:user.home=/root
2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
- Client environment:user.dir=/root
2011-07-28 08:17:51,471 [main] INFO org.apache.zookeeper.ZooKeeper
- Initiating client connection, connectString=aws09.machine.com:2222
sessionTimeout=60000 watcher=hconnection
2011-07-28 08:17:51,493 [main-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:17:51,499 [main-SendThread(aws09.machine.com:2222)]
INFO org.apache.zookeeper.ClientCnxn - Socket connection
established to aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:17:51,508 [main-SendThread(aws09.machine.com:2222)]
INFO org.apache.zookeeper.ClientCnxn - Session establishment
complete on server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada6054b, negotiated timeout = 60000
2011-07-28 08:17:51,575 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:17:51,687 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:17:51,696 [main] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:51,700 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
is aws03.machine.com:60020
2011-07-28 08:17:51,726 [main] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:51,729 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
aws03.machine.com:60020
2011-07-28 08:17:53,328 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-07-28 08:17:53,335 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-07-28 08:17:53,335 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Neither PARALLEL nor default parallelism is set for this job.
Setting number of reducers to 1
2011-07-28 08:17:53,442 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-07-28 08:17:53,944 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-07-28 08:17:53,989 [Thread-13] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:17:53,990 [Thread-13-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:17:53,991
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:17:53,996
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada6054c, negotiated timeout = 60000
2011-07-28 08:17:54,000 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:17:54,005 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:17:54,006 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:54,011 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
is aws03.machine.com:60020
2011-07-28 08:17:54,017 [Thread-13] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:17:54,017 [Thread-13-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:17:54,018
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:17:54,025
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada6054d, negotiated timeout = 60000
2011-07-28 08:17:54,029 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:17:54,032 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:17:54,033 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:54,037 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
is aws03.machine.com:60020
2011-07-28 08:17:54,039 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=2147483647 rows
2011-07-28 08:17:54,067 [Thread-13] DEBUG
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
split -> 0 -> aws03.machine.com:,
2011-07-28 08:17:54,068 [Thread-13] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
splits.
2011-07-28 08:17:54,068 [Thread-13] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
Returning 1 splits.
2011-07-28 08:17:54,109 [Thread-13] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:17:54,110 [Thread-13-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:17:54,111
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:17:54,119
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada6054e, negotiated timeout = 60000
2011-07-28 08:17:54,123 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:17:54,140 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:17:54,142 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:54,148 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
aws03.machine.com:60020
2011-07-28 08:17:54,154 [Thread-13] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:17:54,158 [Thread-13-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:17:54,159
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:17:54,161
[Thread-13-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada6054f, negotiated timeout = 60000
2011-07-28 08:17:54,164 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:17:54,167 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:17:54,169 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=10 rows
2011-07-28 08:17:54,172 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
aws03.machine.com:60020
2011-07-28 08:17:54,173 [Thread-13] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at
row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
max=2147483647 rows
2011-07-28 08:17:54,180 [Thread-13] DEBUG
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
split -> 0 -> aws03.machine.com:,
2011-07-28 08:17:54,180 [Thread-13] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
splits.
2011-07-28 08:17:54,180 [Thread-13] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
Returning 1 splits.
2011-07-28 08:17:55,037 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201107251336_0314
2011-07-28 08:17:55,037 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_03142011-07-28 08:19:06,924 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 8% complete
2011-07-28 08:19:15,971 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 16% complete
2011-07-28 08:19:18,985 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 19% complete
2011-07-28 08:19:25,035 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 27% complete
2011-07-28 08:20:14,810 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-07-28 08:20:14,812 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-07-28 08:20:14,830 [main] INFO org.apache.zookeeper.ZooKeeper
- Initiating client connection, connectString=aws09.machine.com:2222
sessionTimeout=60000 watcher=hconnection
2011-07-28 08:20:14,831 [main-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:20:14,832 [main-SendThread(aws09.machine.com:2222)]
INFO org.apache.zookeeper.ClientCnxn - Socket connection
established to aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:20:14,838 [main-SendThread(aws09.machine.com:2222)]
INFO org.apache.zookeeper.ClientCnxn - Session establishment
complete on server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada60556, negotiated timeout = 60000
2011-07-28 08:20:14,842 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:20:14,847 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:20:14,849 [main] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
for max=10 rows
2011-07-28 08:20:14,852 [main] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
aws03.machine.com:60020
2011-07-28 08:20:16,311 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-07-28 08:20:16,324 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- BytesPerReducer=1000000000 maxReducers=999
totalInputFileSize=198330658
2011-07-28 08:20:16,324 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Neither PARALLEL nor default parallelism is set for this job.
Setting number of reducers to 1
2011-07-28 08:20:16,341 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-07-28 08:20:16,656 [Thread-32] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1
2011-07-28 08:20:16,656 [Thread-32] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths to process : 1
2011-07-28 08:20:16,693 [Thread-32] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:20:16,694 [Thread-32-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:20:16,695
[Thread-32-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:20:16,702
[Thread-32-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada60557, negotiated timeout = 60000
2011-07-28 08:20:16,705 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:20:16,709 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:20:16,710 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
for max=10 rows
2011-07-28 08:20:16,714 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
aws03.machine.com:60020
2011-07-28 08:20:16,716 [Thread-32] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
connectString=aws09.machine.com:2222 sessionTimeout=60000
watcher=hconnection
2011-07-28 08:20:16,717 [Thread-32-SendThread()] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to
server aws09.machine.com/10.83.1.244:2222
2011-07-28 08:20:16,718
[Thread-32-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
aws09.machine.com/10.83.1.244:2222, initiating session
2011-07-28 08:20:16,720
[Thread-32-SendThread(aws09.machine.com:2222)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on
server aws09.machine.com/10.83.1.244:2222, sessionid =
0x131617dada60558, negotiated timeout = 60000
2011-07-28 08:20:16,723 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$
[email protected];
hsa=aws03.machine.com:60020
2011-07-28 08:20:16,726 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
2011-07-28 08:20:16,727 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
for max=10 rows
2011-07-28 08:20:16,730 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
- Cached location for
info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
aws03.machine.com:60020
2011-07-28 08:20:16,732 [Thread-32] DEBUG
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
for max=2147483647 rows
2011-07-28 08:20:16,772 [Thread-32] DEBUG
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
split -> 0 -> aws03.machine.com:,
2011-07-28 08:20:16,772 [Thread-32] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
splits.
2011-07-28 08:20:16,772 [Thread-32] INFO
org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
Returning 1 splits.
2011-07-28 08:20:17,500 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201107251336_0315
2011-07-28 08:20:17,500 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_03152011-07-28 08:20:28,075 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 37% complete
2011-07-28 08:20:34,106 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 41% complete
2011-07-28 08:20:37,124 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2011-07-28 08:20:46,168 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 51% complete
2011-07-28 08:20:49,183 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 61% complete
2011-07-28 08:20:52,198 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 62% complete
2011-07-28 08:20:55,214 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 64% complete
2011-07-28 08:21:01,244 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 66% complete
2011-07-28 08:21:07,311 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-07-28 08:21:07,312 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-07-28 08:21:08,770 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-07-28 08:21:08,778 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-07-28 08:21:08,910 [Thread-47] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1
2011-07-28 08:21:08,910 [Thread-47] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths to process : 1
2011-07-28 08:21:08,911 [Thread-47] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths (combined) to process : 1
2011-07-28 08:21:09,280 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201107251336_0316
2011-07-28 08:21:09,280 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_03162011-07-28 08:21:16,321 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 83% complete
2011-07-28 08:21:34,439 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-07-28 08:21:34,441 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt
Features
0.20-append 0.8.1-SNAPSHOT root 2011-07-28 08:17:51
2011-07-28 08:21:34 HASH_JOIN,FILTER,LIMIT
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme
AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime
Alias Feature Outputs
job_201107251336_0314 2 1 75 66 70 63 63
63 end_sessions,sessions,start_sessions HASH_JOIN
job_201107251336_0315 4 1 15 6 12 24 24
24 infos,sessions HASH_JOIN
job_201107251336_0316 1 1 3 3 3 12 12 12
hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740,
Input(s):
Successfully read 2069446 records from:
"endSession.mde253811.preprod.ubithere.com"
Successfully read 2072419 records from:
"startSession.mde253811.preprod.ubithere.com"
Successfully read 19441 records from:
"info.mde253811.preprod.ubithere.com"
Output(s):
Successfully stored 0 records in:
"hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 1
Total records proactively spilled: 1944943
Job DAG:
job_201107251336_0314 -> job_201107251336_0315,
job_201107251336_0315 -> job_201107251336_0316,
job_201107251336_0316
2011-07-28 08:21:34,472 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
2011-07-28 08:21:34,500 [main] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1
2011-07-28 08:21:34,501 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths to process : 1
grunt>
Le 27/07/11 16:59, Raghu Angadi a écrit :
Vincent,
is the behavior random or the same each time?
Couple of things to narrow it down..
- attach the entire console output from PIG run when this happened.
- only load start_sessions and end_sessions and store them..
- load the data from tables from previous step and run the same pig
command
Consider filing a JIRA. it might be a better place to go into more details.
-Raghu.
On Wed, Jul 27, 2011 at 5:38 AM, Vincent Baratwrote:
More info on this issue:
1- I use PIG 0.8.1 and HBase 0.90.3 and Hadoop 0.20-append
2- The issue can be reproduced with PIG trunk too
The script:
start_sessions = LOAD 'startSession.mde253811.**preprod.ubithere.com<
http://startSession.mde253811.preprod.ubithere.com>'
USING org.apache.pig.backend.hadoop.**hbase.HBaseStorage('meta:sid
meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray,
imei:chararray, start:long);
end_sessions = LOAD 'endSession.mde253811.preprod.**ubithere.com<
http://endSession.mde253811.preprod.ubithere.com>'
USING org.apache.pig.backend.hadoop.**hbase.HBaseStorage('meta:sid
meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);
sessions = JOIN start_sessions BY sid, end_sessions BY sid;
sessions = FILTER sessions BY end> start AND end - start< 86400000L;
sessions = FOREACH sessions GENERATE start_sessions::sid, imei, start, end;
sessions = LIMIT sessions 100;
dump sessions;
<output 1>
dump sessions;
<output 2>
The issue:
<output 1> is empty
<output 2> is 100 lines
I can reproduce the issue systematically.
Please advice: this issue prevent me from moving to HBase 0.90.3 in
production, as I need to upgrade to PIG 0.8.1 at the same time !