Hi!

I'm bulkloading from Hadoop to Cassandra. I'm currently in the process of
moving both Hadoop and Cassandra to new hardware, and while test-running a
bulkload I see the following error (a rough sketch of the job setup follows
the first trace):

Exception in thread "Streaming to /2001:4c28:1:413:0:1:1:12:1" java.lang.RuntimeException: java.io.EOFException
  at com.google.common.base.Throwables.propagate(Throwables.java:155)
  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:375)
  at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
  at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
  at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
  ... 3 more

I see no exceptions related to this on the destination node
(2001:4c28:1:413:0:1:1:12:1).
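
For context, the map tasks write through org.apache.cassandra.hadoop.BulkOutputFormat, whose BulkRecordWriter builds SSTables locally and streams them to the ring, which is where the exception above is thrown. The driver is set up roughly like this; a simplified sketch from memory rather than my exact job, with the initial address and partitioner values purely illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;

public class BulkloadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "bulkload iceland_test.Data_hourly");

        // BulkRecordWriter writes SSTables into the task's work directory
        // and streams them to Cassandra when the task closes.
        job.setOutputFormatClass(BulkOutputFormat.class);

        // Keyspace and column family being loaded.
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
                "iceland_test", "Data_hourly");
        // A live node used to discover the ring before streaming (address illustrative).
        ConfigHelper.setOutputInitialAddress(job.getConfiguration(),
                "2001:4c28:1:413:0:1:1:12");
        ConfigHelper.setOutputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.Murmur3Partitioner");

        // Mapper class, input format, input paths etc. omitted here.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}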

This makes the whole map task fail with:

2014-01-27 10:46:50,878 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:forsberg (auth:SIMPLE) cause:java.io.IOException: Too many hosts failed: [/2001:4c28:1:413:0:1:1:12]
2014-01-27 10:46:50,878 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Too many hosts failed: [/2001:4c28:1:413:0:1:1:12]
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:244)
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:209)
  at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:540)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:650)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
  at org.apache.hadoop.mapred.Child.main(Child.java:260)
2014-01-27 10:46:50,880 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

The failed task was on Hadoop worker node hdp01-12-4.

However, Hadoop later retries this map task on a different worker node (hdp01-10-2), and that retry succeeds.

So that's weird, but I could live with it. The real trouble, however, is that the Hadoop job does not finish, because one task running on hdp01-12-4 is stuck with this:

Exception in thread "Streaming to /2001:4c28:1:413:0:1:1:12:1" java.lang.IllegalStateException: target reports current file is /opera/log2/hadoop/mapred/local/taskTracker/forsberg/jobcache/job_201401161243_0288/attempt_201401161243_0288_m_000473_0/work/tmp/iceland_test/Data_hourly/iceland_test-Data_hourly-ib-1-Data.db but is /opera/log6/hadoop/mapred/local/taskTracker/forsberg/jobcache/job_201401161243_0288/attempt_201401161243_0288_m_000000_0/work/tmp/iceland_test/Data_hourly/iceland_test-Data_hourly-ib-1-Data.db
  at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:154)
  at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:45)
  at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:199)
  at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
  at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)

This just sits there forever, or at least until the Hadoop task timeout kicks in.

So two questions here:

1) Any clues on what might cause the first EOFException? It appears for *some* of my bulkloads, not all of them, but frequently enough to be a problem; roughly every tenth bulkload I run hits it.
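
For what it's worth, my reading of the trace is that the sending side is blocked in receiveReply(), waiting for the stream reply from the target, and the connection is closed before any bytes arrive; that is exactly the situation in which DataInputStream.readInt() throws EOFException, since it needs four bytes and gets end-of-stream instead. A toy snippet showing the same failure mode (illustrative only, not Cassandra code):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

public class EofDemo {
    public static void main(String[] args) throws Exception {
        // An empty stream stands in for a socket the peer closed before
        // sending its reply: readInt() wants 4 bytes and finds none.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[0]));
        try {
            in.readInt();
        } catch (EOFException e) {
            System.out.println("EOF while waiting for the reply: " + e);
        }
    }
}

So the question is really: what makes the target close (or drop) that connection without sending its reply, given that nothing shows up in its logs?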

2) The second problem I have a feeling could be related to https://issues.apache.org/jira/browse/CASSANDRA-4223, with the extra quirk that in the bulkload case we have *multiple Java processes* creating streaming sessions on the same host, so the streaming session IDs are not unique.

I'm thinking 2) happens because the EOFException made the streaming session in 1) sit around on the target node without being closed.
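
If I read the 1.2 streaming code right (an assumption on my part), the session ID comes from a per-JVM counter and the receiving node keys incoming sessions by sender address plus that ID, so two map-task JVMs streaming from the same Hadoop worker can both show up as, say, session 1 from the same IP. A toy illustration of that keying collision (not Cassandra code, just the idea):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class SessionIdCollision {
    public static void main(String[] args) {
        // Each sending JVM has its own counter, all starting from the same value.
        AtomicLong jvmA = new AtomicLong(0);   // e.g. attempt ..._m_000000_0
        AtomicLong jvmB = new AtomicLong(0);   // e.g. attempt ..._m_000473_0

        // The receiver keys sessions by (sender host, session id) only.
        Map<String, String> sessionsOnTarget = new HashMap<String, String>();
        String sender = "hdp01-12-4";

        String keyA = sender + "#" + jvmA.incrementAndGet();
        String keyB = sender + "#" + jvmB.incrementAndGet();

        sessionsOnTarget.put(keyA, "sstables from attempt _m_000000_0");
        // Same (host, id) key: the second session replaces or gets mixed up
        // with the first, which would look a lot like the "target reports
        // current file is X but is Y" error above.
        sessionsOnTarget.put(keyB, "sstables from attempt _m_000473_0");

        System.out.println("collision: " + keyA.equals(keyB)); // true
        System.out.println(sessionsOnTarget);                  // only one entry left
    }
}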

This is on Cassandra 1.2.1. I know that's pretty old, but I would like to avoid upgrading until this migration from old to new hardware is done. Upgrading to 1.2.13 might be an option.

Any hints welcome.

Thanks,
\EF
