The tarball version of CDH does not support direct reads because libhadoop.so is
not included in the tarball. You must install from an .rpm, .deb, or parcel
in order to use short-circuit local reads.
See
https://ccp.cloudera.com/display/CDH4DOC/Tips+and+Guidelines#TipsandGuidelines-ImprovePerformanceforLocalReads
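One quick way to confirm this (just a sketch, assuming the CDH client jars and
configuration are on the classpath and java.library.path points at the native
library directory, if any) is to ask Hadoop's NativeCodeLoader whether
libhadoop.so was actually loaded:

    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeCheck {
        public static void main(String[] args) {
            // Prints "false" if libhadoop.so could not be found and loaded,
            // in which case direct/short-circuit reads cannot work.
            System.out.println("libhadoop loaded: "
                + NativeCodeLoader.isNativeCodeLoaded());
        }
    }

A package or parcel install ships libhadoop.so (typically under
/usr/lib/hadoop/lib/native, though the exact path can vary), which is what the
tarball is missing.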
Can you re-install CDH from packages?
Thanks,
Alan
On Tue, Mar 19, 2013 at 12:37 AM, Matti Niemenmaa wrote:
Hi,
I have recently begun using Impala 0.6 with CDH 4.2.0 and have managed
to get a tarball installation working, with the exception of
block location metadata. I've followed the latest instructions at
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance
to configure Hadoop appropriately, and as far as I can tell
short-circuit reads and native checksumming are both functioning properly.
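For reference, a minimal check along these lines shows what the client-side
Configuration actually resolves for the settings mentioned in that doc on each
node (illustrative only; the property names are the ones I took from the linked
doc, so treat them as assumptions for other versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class DumpScanConf {
        public static void main(String[] args) {
            // Picks up core-site.xml and hdfs-site.xml from the classpath.
            Configuration conf = new HdfsConfiguration();
            String[] keys = {
                "dfs.client.read.shortcircuit",
                "dfs.datanode.hdfs-blocks-metadata.enabled",
                "dfs.client.file-block-storage-locations.timeout"
            };
            for (String key : keys) {
                System.out.println(key + " = " + conf.get(key));
            }
        }
    }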
I've replicated the problem on a simple two-machine test cluster, with
the namenode, jobtracker, and statestored on one machine and the
datanode, tasktracker, and impalad on the other.
When I run a simple "select * from rc limit 10", Impala's relevant
output is:
13/03/18 18:09:11 INFO planner.HdfsScanNode: collecting partitions for
table rc
13/03/18 18:09:11 INFO service.Frontend: get scan range locations
13/03/18 18:09:12 INFO catalog.HdfsTable: loaded partiton
PartitionBlockMetadata{#blocks=405, #filenames=203, totalStringLen=9966}
13/03/18 18:09:12 INFO hdfs.BlockStorageLocationUtil: Failed to connect
to datanode 10.10.253.222:49697
13/03/18 18:09:12 INFO catalog.HdfsTable: loaded disk ids for
PartitionBlockMetadata{#blocks=405, #filenames=203, totalStringLen=9966}
13/03/18 18:09:12 INFO catalog.HdfsTable: block metadata cache:
CacheStats{hitCount=0, missCount=1, loadSuccessCount=1,
loadExceptionCount=0, totalLoadTime=878930596, evictionCount=0}
Impala's logs additionally have this warning:
W0318 18:09:12.878093 1555 hdfs-scan-node.cc:184] Unknown disk id.
This will negatively affect performance. Check your hdfs settings to
enable block location metadata.
And the Hadoop logs on the datanode from the same time show this:
2013-03-18 18:09:12,711 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol.getHdfsBlockLocations
from 10.10.253.222:52125: outp
2013-03-18 18:09:12,713 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 49697 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:144)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:342)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
It seems clear that the ClosedChannelException is the cause of Impala's
troubles, but I can't figure out what could be causing the IPC issues.
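In case it helps to isolate whether this is Impala-specific: as far as I
understand, the same disk-id lookup can be driven directly through
DistributedFileSystem.getFileBlockStorageLocations, which is the client side of
ClientDatanodeProtocol.getHdfsBlockLocations. A rough, untested sketch against
one of the table's files (the path argument is a placeholder, and this API is
CDH4-era, so the exact signature may differ elsewhere):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.BlockStorageLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class DiskIdCheck {
        public static void main(String[] args) throws Exception {
            // args[0]: HDFS path to one of the table's files (placeholder).
            Path p = new Path(args[0]);
            Configuration conf = new HdfsConfiguration();
            DistributedFileSystem dfs =
                (DistributedFileSystem) p.getFileSystem(conf);
            FileStatus stat = dfs.getFileStatus(p);
            BlockLocation[] blocks =
                dfs.getFileBlockLocations(stat, 0, stat.getLen());
            // Triggers the same per-datanode getHdfsBlockLocations RPCs.
            BlockStorageLocation[] locs =
                dfs.getFileBlockStorageLocations(Arrays.asList(blocks));
            for (BlockStorageLocation loc : locs) {
                System.out.println(loc + " volumeIds="
                    + Arrays.toString(loc.getVolumeIds()));
            }
        }
    }

If that reproduces the "Failed to connect to datanode" behaviour outside of
Impala, it would at least point the finger at the HDFS side rather than Impala.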
Any help would be appreciated.