FAQ
We have our application running on the same host as the Flume agent.
The application connects to HBase fine, and Flume uses the same hbase-site.xml
as the application, but Flume seems to time out when using the HBase sink:

0:2181. Will not attempt to authenticate using SASL (unknown error)
29 Nov 2012 18:11:36,186 INFO
[lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
(org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
connection established to cloudera-dev.localdomain/10.211.55.10:2181,
initiating session
29 Nov 2012 18:11:36,208 INFO
[lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
(org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
establishment complete on server
cloudera-dev.localdomain/10.211.55.10:2181, sessionid = 0x13b35e5ded7009c,
negotiated timeout = 60000
29 Nov 2012 18:12:16,208 INFO
[lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
(org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
timed out, have not heard from server in 40001ms for sessionid
0x13b35e5ded7009c, closing socket connection and attempting reconnect
29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
(org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/master

Has anyone seen this before?

--


  • Mike Percy at Nov 29, 2012 at 6:17 pm
    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized cluster?
    Does this ever work or has it worked before?

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 10:13 AM, Benc wrote:

    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/
    10.211.55.10:2181, sessionid = 0x13b35e5ded7009c, negotiated timeout =
    60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --
  • Benc at Nov 29, 2012 at 6:19 pm
    Hi Mike

    Using CDH4, flume-ng 1.3.0, and the HBase that comes with CDH (hbase-0.92.1-cdh4.0.1).
    I don't think we are using a Kerberized cluster. And no, this has not worked
    before; we are just trying to get it up and running.

    We have the following configuration:

    agent1.sources = source1
    agent1.sinks = hbaseSink
    agent1.channels = memoryChannel

    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F application_server.log
    agent1.sources.source1.channels = memoryChannel

    agent1.sinks.hbaseSink.channel = memoryChannel
    agent1.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
    agent1.sinks.hbaseSink.table = flume-ng-test
    agent1.sinks.hbaseSink.columnFamily = testing
    agent1.sinks.hbaseSink.column = foo
    agent1.sinks.hbaseSink.serializer =
    org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent1.sinks.hbaseSink.serializer.payloadColumn = pcol
    agent1.sinks.hbaseSink.serializer.incrementColumn = icol
    agent1.channels.memoryChannel.type = memory
    agent1.channels.memoryChannel.capacity = 100

    On Thursday, November 29, 2012 6:16:18 PM UTC, Mike Percy wrote:

    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized cluster?
    Does this ever work or has it worked before?

    Regards,
    Mike

    On Thu, Nov 29, 2012 at 10:13 AM, Benc <ben.cu...@celer-tech.com> wrote:
    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/
    10.211.55.10:2181, sessionid = 0x13b35e5ded7009c, negotiated timeout =
    60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --
  • Benc at Nov 29, 2012 at 8:32 pm
    Dropping down to Flume 1.2.0, this seems to work. Not sure what the
    difference is.

    On Thursday, November 29, 2012 6:19:19 PM UTC, Benc wrote:

    Hi Mike

    Using CDH4, flume-ng 1.3.0, and the HBase that comes with CDH (hbase-0.92.1-cdh4.0.1).
    I don't think we are using a Kerberized cluster. And no, this has not worked
    before; we are just trying to get it up and running.

    We have the following configuration:

    agent1.sources = source1
    agent1.sinks = hbaseSink
    agent1.channels = memoryChannel

    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F application_server.log
    agent1.sources.source1.channels = memoryChannel

    agent1.sinks.hbaseSink.channel = memoryChannel
    agent1.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
    agent1.sinks.hbaseSink.table = flume-ng-test
    agent1.sinks.hbaseSink.columnFamily = testing
    agent1.sinks.hbaseSink.column = foo
    agent1.sinks.hbaseSink.serializer =
    org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent1.sinks.hbaseSink.serializer.payloadColumn = pcol
    agent1.sinks.hbaseSink.serializer.incrementColumn = icol
    agent1.channels.memoryChannel.type = memory
    agent1.channels.memoryChannel.capacity = 100

    On Thursday, November 29, 2012 6:16:18 PM UTC, Mike Percy wrote:

    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized
    cluster? Does this ever work or has it worked before?

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 10:13 AM, Benc wrote:

    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/
    10.211.55.10:2181, sessionid = 0x13b35e5ded7009c, negotiated timeout =
    60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --
  • Mike Percy at Nov 29, 2012 at 9:04 pm
    Ben,
    I'd recommend using flume-ng-1.2.0+122 from CDH 4.1.2:
    https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information#CDHVersionandPackagingInformation-CDHVersion4.1.2Packaging

    That version is formally QA tested by Cloudera and certified to work with
    CDH4.1.2.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 12:32 PM, Benc wrote:

    Dropping down to Flume 1.2.0, this seems to work. Not sure what the
    difference is.

    On Thursday, November 29, 2012 6:19:19 PM UTC, Benc wrote:

    Hi Mike

    Using CDH4, flume-ng 1.3.0, and the HBase that comes with CDH (hbase-0.92.1-cdh4.0.1).
    I don't think we are using a Kerberized cluster. And no, this has not worked
    before; we are just trying to get it up and running.

    We have the following configuration:

    agent1.sources = source1
    agent1.sinks = hbaseSink
    agent1.channels = memoryChannel

    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F application_server.log
    agent1.sources.source1.channels = memoryChannel

    agent1.sinks.hbaseSink.channel = memoryChannel
    agent1.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
    agent1.sinks.hbaseSink.table = flume-ng-test
    agent1.sinks.hbaseSink.columnFamily = testing
    agent1.sinks.hbaseSink.column = foo
    agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent1.sinks.hbaseSink.serializer.payloadColumn = pcol
    agent1.sinks.hbaseSink.serializer.incrementColumn = icol
    agent1.channels.memoryChannel.type = memory
    agent1.channels.memoryChannel.capacity = 100

    On Thursday, November 29, 2012 6:16:18 PM UTC, Mike Percy wrote:

    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized
    cluster? Does this ever work or has it worked before?

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 10:13 AM, Benc wrote:

    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/10.211.55.10:2181,
    sessionid = 0x13b35e5ded7009c, negotiated timeout = 60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --

    --
  • Benc at Nov 29, 2012 at 9:53 pm
    I found this tarball.

    http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/pool/contrib/f/flume-ng/flume-ng_1.2.0+122.orig.tar.gz

    I had to copy my Hadoop and HBase jars into the lib dir. They are from
    version 4.0.1; is that still okay?

    I still get

    29 Nov 2012 21:51:27,917 INFO [lifecycleSupervisor-1-0]
    (org.apache.flume.source.ExecSource.start:155) - Exec source starting with
    command:tail -f /tmp/demo.txt
    29 Nov 2012 21:51:27,918 INFO [lifecycleSupervisor-1-1]
    (org.apache.flume.instrumentation.MonitoredCounterGroup.start:82) -
    Component type: SINK, name: hdfs-sink started
    29 Nov 2012 21:51:42,999 WARN
    [SinkRunner-PollingRunner-DefaultSinkProcessor]
    (org.apache.flume.sink.hdfs.HDFSEventSink.process:444) - HDFS IO error
    java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2138)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2145)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2184)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2166)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:188)
    at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:157)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:154)
    at
    org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:154)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:316)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:718)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:715)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

    On Thursday, November 29, 2012 9:03:16 PM UTC, Mike Percy wrote:

    Ben,
    I'd recommend using flume-ng-1.2.0+122 from CDH 4.1.2:
    https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information#CDHVersionandPackagingInformation-CDHVersion4.1.2Packaging

    That version is formally QA tested by Cloudera and certified to work with
    CDH4.1.2.

    Regards,
    Mike

    On Thu, Nov 29, 2012 at 12:32 PM, Benc <ben.cu...@celer-tech.com> wrote:
    Dropping down to Flume 1.2.0, this seems to work. Not sure what the
    difference is.

    On Thursday, November 29, 2012 6:19:19 PM UTC, Benc wrote:

    Hi Mike

    Using CDH4, flume-ng 1.3.0, and the HBase that comes with CDH (hbase-0.92.1-cdh4.0.1).
    I don't think we are using a Kerberized cluster. And no, this has not worked
    before; we are just trying to get it up and running.

    We have the following configuration:

    agent1.sources = source1
    agent1.sinks = hbaseSink
    agent1.channels = memoryChannel

    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F application_server.log
    agent1.sources.source1.channels = memoryChannel

    agent1.sinks.hbaseSink.channel = memoryChannel
    agent1.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
    agent1.sinks.hbaseSink.table = flume-ng-test
    agent1.sinks.hbaseSink.columnFamily = testing
    agent1.sinks.hbaseSink.column = foo
    agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent1.sinks.hbaseSink.serializer.payloadColumn = pcol
    agent1.sinks.hbaseSink.serializer.incrementColumn = icol
    agent1.channels.memoryChannel.type = memory
    agent1.channels.memoryChannel.capacity = 100

    On Thursday, November 29, 2012 6:16:18 PM UTC, Mike Percy wrote:

    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized
    cluster? Does this ever work or has it worked before?

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 10:13 AM, Benc wrote:

    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/10.211.55.10:2181,
    sessionid = 0x13b35e5ded7009c, negotiated timeout = 60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --

    --
  • Mike Percy at Nov 29, 2012 at 10:07 pm
    That should work, as long as you are set up correctly... but it would be
    better to set HADOOP_HOME and HBASE_HOME instead of copying jars directly
    into lib. The flume start script uses those paths to find the things it
    needs, including conf files and native libs (not just jars).
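    For example, a minimal sketch of what that could look like in the agent's
    conf/flume-env.sh (the paths below are assumptions for a typical CDH package
    install; point them at wherever Hadoop and HBase actually live on your host):

    # conf/flume-env.sh -- picked up by the flume-ng start script
    # Assumed CDH package locations; adjust to your install.
    export HADOOP_HOME=/usr/lib/hadoop
    export HBASE_HOME=/usr/lib/hbase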

    Why not just use the deb packages? Why use these tarballs? If you use the
    deb packages and use all the same version (i.e. CDH 4.1.2 for everything)
    then everything should Just Work(TM) out of the box!

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 1:52 PM, Benc wrote:

    I found this tarball.


    http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/pool/contrib/f/flume-ng/flume-ng_1.2.0+122.orig.tar.gz

    I had to copy my Hadoop and HBase jars into the lib dir. They are from
    version 4.0.1; is that still okay?

    I still get

    29 Nov 2012 21:51:27,917 INFO [lifecycleSupervisor-1-0]
    (org.apache.flume.source.ExecSource.start:155) - Exec source starting with
    command:tail -f /tmp/demo.txt
    29 Nov 2012 21:51:27,918 INFO [lifecycleSupervisor-1-1]
    (org.apache.flume.instrumentation.MonitoredCounterGroup.start:82) -
    Component type: SINK, name: hdfs-sink started
    29 Nov 2012 21:51:42,999 WARN
    [SinkRunner-PollingRunner-DefaultSinkProcessor]
    (org.apache.flume.sink.hdfs.HDFSEventSink.process:444) - HDFS IO error
    java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2138)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2145)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2184)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2166)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:188)
    at
    org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:157)
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:154)
    at
    org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:154)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:316)
    at
    org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:718)
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:715)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

    On Thursday, November 29, 2012 9:03:16 PM UTC, Mike Percy wrote:

    Ben,
    I'd recommend using flume-ng-1.2.0+122 from CDH 4.1.2:
    https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information#CDHVersionandPackagingInformation-CDHVersion4.1.2Packaging

    That version is formally QA tested by Cloudera and certified to work with
    CDH4.1.2.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 12:32 PM, Benc wrote:

    Dropping down to Flume 1.2.0, this seems to work. Not sure what the
    difference is.

    On Thursday, November 29, 2012 6:19:19 PM UTC, Benc wrote:

    Hi Mike

    Using CDH4, flume-ng 1.3.0, and the HBase that comes with CDH (hbase-0.92.1-cdh4.0.1).
    I don't think we are using a Kerberized cluster. And no, this has not worked
    before; we are just trying to get it up and running.

    We have the following configuration:

    agent1.sources = source1
    agent1.sinks = hbaseSink
    agent1.channels = memoryChannel

    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F application_server.log
    agent1.sources.source1.channels = memoryChannel

    agent1.sinks.hbaseSink.channel = memoryChannel
    agent1.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
    agent1.sinks.hbaseSink.table = flume-ng-test
    agent1.sinks.hbaseSink.columnFamily = testing
    agent1.sinks.hbaseSink.column = foo
    agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
    agent1.sinks.hbaseSink.serializer.payloadColumn = pcol
    agent1.sinks.hbaseSink.serializer.incrementColumn = icol
    agent1.channels.memoryChannel.type = memory
    agent1.channels.memoryChannel.capacity = 100

    On Thursday, November 29, 2012 6:16:18 PM UTC, Mike Percy wrote:

    Hi Ben,
    What version of CDH, Flume, and HBase? Are you using a Kerberized
    cluster? Does this ever work or has it worked before?

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 10:13 AM, Benc wrote:

    We have our application running on the same host as the Flume agent.
    The application connects to HBase fine, and Flume uses the same hbase-site.xml
    as the application, but Flume seems to time out when using the HBase sink:

    0:2181. Will not attempt to authenticate using SASL (unknown error)
    29 Nov 2012 18:11:36,186 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:849) - Socket
    connection established to cloudera-dev.localdomain/10.211.55.10:2181,
    initiating session
    29 Nov 2012 18:11:36,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.onConnected:1207) - Session
    establishment complete on server cloudera-dev.localdomain/10.211.55.10:2181,
    sessionid = 0x13b35e5ded7009c, negotiated timeout = 60000
    29 Nov 2012 18:12:16,208 INFO
    [lifecycleSupervisor-1-1-SendThread(cloudera-dev.localdomain:2181)]
    (org.apache.zookeeper.ClientCnxn$SendThread.run:1083) - Client session
    timed out, have not heard from server in 40001ms for sessionid
    0x13b35e5ded7009c, closing socket connection and attempting reconnect
    29 Nov 2012 18:12:16,317 WARN [lifecycleSupervisor-1-1]
    (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists:159) -
    Possibly transient ZooKeeper exception:
    org.apache.zookeeper.KeeperException$ConnectionLossException:
    KeeperErrorCode = ConnectionLoss for /hbase/master

    Has anyone seen this before?

    --


    --

    --

    --
  • Benc at Nov 29, 2012 at 10:09 pm

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --
  • Mike Percy at Nov 29, 2012 at 10:42 pm
    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you want
    to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.
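    As a rough sketch of what a tiered setup like that could look like (the
    agent, channel, and host names below are illustrative, not taken from your
    config): one agent on the app server tails the log and forwards events over
    Avro, and a collector agent on the CDH host receives them and writes to HDFS.

    # App-server agent: tail the log and ship events over Avro
    app.sources = tailSrc
    app.channels = memCh
    app.sinks = avroSink
    app.sources.tailSrc.type = exec
    app.sources.tailSrc.command = tail -F application_server.log
    app.sources.tailSrc.channels = memCh
    app.channels.memCh.type = memory
    app.channels.memCh.capacity = 1000
    app.sinks.avroSink.type = avro
    app.sinks.avroSink.channel = memCh
    app.sinks.avroSink.hostname = cdh-collector.example.com
    app.sinks.avroSink.port = 4141

    # Collector agent on the CDH host: receive Avro events, write to HDFS
    coll.sources = avroSrc
    coll.channels = memCh
    coll.sinks = hdfsSink
    coll.sources.avroSrc.type = avro
    coll.sources.avroSrc.bind = 0.0.0.0
    coll.sources.avroSrc.port = 4141
    coll.sources.avroSrc.channels = memCh
    coll.channels.memCh.type = memory
    coll.channels.memCh.capacity = 1000
    coll.sinks.hdfsSink.type = hdfs
    coll.sinks.hdfsSink.channel = memCh
    coll.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com:8020/flume/logs
    coll.sinks.hdfsSink.hdfs.fileType = DataStream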

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:09 PM, Benc wrote:

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --
  • Benc at Nov 29, 2012 at 10:17 pm
    Hi Mike

    Good plan. I will set up Flume on our CDH instance so there is a single way
    into HDFS. Then I guess I can use the Avro client to send the tailed logs to
    the Flume agent on the CDH server. Let me give it a go. Thanks for the info.


    On Thursday, November 29, 2012 10:15:03 PM UTC, Mike Percy wrote:

    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you want
    to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.

    Regards,
    Mike

    On Thu, Nov 29, 2012 at 2:09 PM, Benc <ben.cu...@celer-tech.com> wrote:
    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --
  • Benc at Nov 29, 2012 at 10:43 pm
    Just a quick one: using the simple yum install works fine and I can write
    files into HDFS.
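    For reference, the install being described typically amounts to something
    like the following (package names assumed from the CDH4 packaging; the agent
    host only needs the client-side Hadoop libraries, not a namenode or datanode):

    # Assumed CDH4 package names; use apt-get on Debian/Ubuntu systems
    sudo yum install flume-ng flume-ng-agent hadoop-client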

    On Thursday, November 29, 2012 10:17:51 PM UTC, Benc wrote:

    Hi Mike

    Good plan. I will set up Flume on our CDH instance so there is a single way
    into HDFS. Then I guess I can use the Avro client to send the tailed logs to
    the Flume agent on the CDH server. Let me give it a go. Thanks for the info.


    On Thursday, November 29, 2012 10:15:03 PM UTC, Mike Percy wrote:

    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you want
    to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:09 PM, Benc wrote:

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --
  • Mike Percy at Nov 29, 2012 at 10:47 pm
    Great! Glad you got it working Ben.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:43 PM, Benc wrote:

    Just a quick one: using the simple yum install works fine and I can write
    files into HDFS.

    On Thursday, November 29, 2012 10:17:51 PM UTC, Benc wrote:

    Hi Mike

    Good plan. I will set up Flume on our CDH instance so there is a single way
    into HDFS. Then I guess I can use the Avro client to send the tailed logs to
    the Flume agent on the CDH server. Let me give it a go. Thanks for the info.


    On Thursday, November 29, 2012 10:15:03 PM UTC, Mike Percy wrote:

    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you
    want to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:09 PM, Benc wrote:

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --

    --
  • Benc at Nov 29, 2012 at 11:27 pm
    Sorry Mike, one more question. What is the best approach for getting log
    files into HDFS?

    A small count of files with small sizes?


    On Thursday, November 29, 2012 10:47:10 PM UTC, Mike Percy wrote:

    Great! Glad you got it working Ben.

    Regards,
    Mike

    On Thu, Nov 29, 2012 at 2:43 PM, Benc <ben.cu...@celer-tech.com> wrote:
    Just a quick one: using the simple yum install works fine and I can write
    files into HDFS.

    On Thursday, November 29, 2012 10:17:51 PM UTC, Benc wrote:

    Hi Mike

    Good plan. I will set up Flume on our CDH instance so there is a single way
    into HDFS. Then I guess I can use the Avro client to send the tailed logs to
    the Flume agent on the CDH server. Let me give it a go. Thanks for the info.


    On Thursday, November 29, 2012 10:15:03 PM UTC, Mike Percy wrote:

    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you
    want to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:09 PM, Benc wrote:

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --

    --
  • Mike Percy at Nov 30, 2012 at 8:04 am
    Ben can you clarify your question? Not sure I got what you are asking.
    Small files for ingest or small files for output onto HDFS? Maybe provide
    some more detail.
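    For what it's worth, the size and count of the files the HDFS sink writes
    are mostly governed by its roll settings; a minimal sketch, assuming the
    standard hdfs.* sink properties and purely illustrative values:

    # Roll a new file every 10 minutes or ~128 MB, whichever comes first
    agent1.sinks.hdfsSink.hdfs.path = /flume/logs
    agent1.sinks.hdfsSink.hdfs.rollInterval = 600
    agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
    agent1.sinks.hdfsSink.hdfs.rollCount = 0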

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 3:27 PM, Benc wrote:

    Sorry Mike, one more question. What is the best approach for getting log
    files into HDFS?

    A small count of files with small sizes?


    On Thursday, November 29, 2012 10:47:10 PM UTC, Mike Percy wrote:

    Great! Glad you got it working Ben.

    Regards,
    Mike

    On Thu, Nov 29, 2012 at 2:43 PM, Benc wrote:

    Just a quick one: using the simple yum install works fine and I can write
    files into HDFS.

    On Thursday, November 29, 2012 10:17:51 PM UTC, Benc wrote:

    Hi Mike

    Good plan. I will set up Flume on our CDH instance so there is a single way
    into HDFS. Then I guess I can use the Avro client to send the tailed logs to
    the Flume agent on the CDH server. Let me give it a go. Thanks for the info.


    On Thursday, November 29, 2012 10:15:03 PM UTC, Mike Percy wrote:

    Ben,
    If Flume is writing directly to HDFS via your app servers (maybe you
    want to add a Flume collection tier to reduce load on your HDFS cluster's
    namenode instead though) then you still need to install the hdfs-client
    library packages there. You would not actually run a namenode or a datanode
    on your app tier but you need the libs for Flume to write to a cluster. In
    any case, I'd strongly recommend using the binary deb packages if you can,
    as it will pull in the needed dependencies.

    Regards,
    Mike
    On Thu, Nov 29, 2012 at 2:09 PM, Benc wrote:

    Hi Mike, I should have been clearer. The reason for using the tarballs is
    that we are running the Flume agent on our application server to tail the
    application logs and then write them into HDFS. I'm not sure if there is
    another way of doing it, but I must be missing a jar or a config; my
    config is the same.
    --


    --

    --

    --

Discussion Overview
group: cdh-user
categories: hadoop
posted: Nov 29, '12 at 6:13p
active: Nov 30, '12 at 8:04a
posts: 14
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Benc: 8 posts, Mike Percy: 6 posts
