Hey Nathan,

We're still seeing supervisors go down because they can't find stormconf.ser, which leads to perpetual reloads.

If the bug has been fixed, where should I be looking to fix this issue? It's a brand-new Storm 0.8.2 cluster with 8 supervisors and a brand-new ZooKeeper.

It happens every few days, and it's generally two of the supervisors. The topology uses two workers. It always seems to follow a long string of workers not starting up in time.

2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@5e725967
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb5, negotiated timeout = 20000
2013-02-05 20:56:08 zookeeper [INFO] Zookeeper state update: :connected:none
2013-02-05 20:56:08 ClientCnxn [INFO] EventThread shut down
2013-02-05 20:56:08 ZooKeeper [INFO] Session: 0x13c6e2bce917bb5 closed
2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@74c12978
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb6, negotiated timeout = 20000
2013-02-05 20:56:08 supervisor [INFO] Starting supervisor with id 05fe7be7-2971-4a7a-9cfc-275146ff48de at host ip-10-82-50-86.ec2.internal
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id f3dad62e-8819-4571-8f26-a6497772325a. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id d48cc968-8be8-48f1-81df-33e9e41fa8c0. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "cherry-pitter-import-staging-528-1360097475", :executors ([3 3] [35 35] [67 67] [99 99] [131 131] [5 5] [37 37] [69 69] [101 101] [133 133] [7 7] [39 39] [71 71] [103 103] [135 135] [9 9] [41 41] [73 73] [105 105] [137 137] [11 11] [43 43] [75 75] [107 107] [139 139] [13 13] [45 45] [77 77] [109 109] [141 141] [15 15] [47 47] [79 79] [111 111] [17 17] [49 49] [81 81] [113 113] [19 19] [51 51] [83 83] [115 115] [21 21] [53 53] [85 85] [117 117] [23 23] [55 55] [87 87] [119 119] [25 25] [57 57] [89 89] [121 121] [27 27] [59 59] [91 91] [123 123] [29 29] [61 61] [93 93] [125 125] [31 31] [63 63] [95 95] [127 127] [1 1] [33 33] [65 65] [97 97] [129 129])} for this supervisor 05fe7be7-2971-4a7a-9cfc-275146ff48de on port 6703 with id 0f3342c4-f2ca-46db-80f6-70b5c5bd95e4
2013-02-05 20:56:09 event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/mnt/storm/supervisor/stormdist/cherry-pitter-import-staging-528-1360097475/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
at backtype.storm.daemon.supervisor$fn__4793.invoke(supervisor.clj:414)
at clojure.lang.MultiFn.invoke(MultiFn.java:177)
at backtype.storm.daemon.supervisor$sync_processes$iter__4684__4688$fn__4689.invoke(supervisor.clj:249)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:473)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$dorun.invoke(core.clj:2725)
at clojure.core$doall.invoke(core.clj:2741)
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:237)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at backtype.storm.event$event_manager$fn__2507.invoke(event.clj:24)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:662)
2013-02-05 20:56:09 util [INFO] Halting process: ("Error when processing an event")
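
One way to see which topology directories on a supervisor are in this state is to scan the local stormdist directory for entries that lack stormconf.ser. The following is only a sketch: it assumes the layout shown in the exception above (/mnt/storm/supervisor/stormdist/<storm-id>/stormconf.ser), and the function name find_incomplete_topologies and its required-file default are illustrative, not part of Storm.

```python
from pathlib import Path


def find_incomplete_topologies(stormdist_dir, required=("stormconf.ser",)):
    """Return {storm_id: [missing file names]} for topology directories
    that lack any of the required files.

    Assumption: each topology lives in its own subdirectory of
    stormdist_dir, as in the path reported by the FileNotFoundException.
    """
    root = Path(stormdist_dir)
    incomplete = {}
    if not root.is_dir():
        # Nothing to scan (wrong path or supervisor has no local state).
        return incomplete
    for topo_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [name for name in required if not (topo_dir / name).is_file()]
        if missing:
            incomplete[topo_dir.name] = missing
    return incomplete


if __name__ == "__main__":
    # Path taken from the log above; adjust to the local storm directory.
    found = find_incomplete_topologies("/mnt/storm/supervisor/stormdist")
    for storm_id, missing in found.items():
        print(storm_id, "is missing", ", ".join(missing))
```

Running this on each affected supervisor before restarting it would show whether the directory exists without its configuration file or is absent altogether.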


Example of workers not starting up in time:

supervisor/stormdist/contact-pull-1059-491-1360084684/stormjar.jar backtype.storm.daemon.worker contact-pull-1059-491-1360084684 b489ee56-12cf-423a-8eb1-794d04c329ef 6702 58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 17:18:16 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 18:00:55 supervisor [INFO] Removing code for storm id contact-pull-1059-491-1360084684
2013-02-05 18:00:55 supervisor [INFO] Shutting down and clearing state for id 58c2eba0-a51d-4baf-95db-6c18538ed5a9. Current supervisor time: 1360087255. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1360087255, :storm-id "contact-pull-1059-491-1360084684", :executors #{[3 3] [35 35] [67 67] [99 99] [131 131] [163 163] [195 195] [227 227] [259 259] [291 291] [323 323] [355 355] [7 7] [39 39] [71 71] [103 103] [135 135] [167 167] [199 199] [231 231] [263 263] [295 295] [327 327] [359 359] [11 11] [43 43] [75 75] [107 107] [139 139] [171 171] [203 203] [235 235] [267 267] [299 299] [331 331] [363 363] [15 15] [47 47] [79 79] [111 111] [143 143] [175 175] [207 207] [239 239] [271 271] [303 303] [335 335] [19 19] [51 51] [83 83] [115 115] [147 147] [179 179] [211 211] [243 243] [275 275] [307 307] [339 339] [23 23] [55 55] [87 87] [119 119] [151 151] [183 183] [215 215] [247 247] [279 279] [311 311] [343 343] [27 27] [59 59] [91 91] [123 123] [155 155] [187 187] [219 219] [251 251] [283 283] [315 315] [347 347] [31 31] [63 63] [95 95] [127 127] [159 159] [191 191] [223 223] [255 255] [287 287] [319 319] [351 351]}, :port 6702}
2013-02-05 18:00:55 supervisor [INFO] Shutting down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:00:55 supervisor [INFO] Shut down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:21:12 supervisor [INFO] Downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
2013-02-05 18:21:20 supervisor [INFO] Finished downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
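
To check whether the crashes really follow long runs of slow worker starts, the "still hasn't started" messages can be tallied per worker id. The sketch below assumes the supervisor log line format shown above; the function name count_slow_starts and the log file name mentioned afterwards are hypothetical.

```python
import re
from collections import Counter

# Matches supervisor log lines of the form seen above:
#   2013-02-05 17:18:16 supervisor [INFO] <worker-id> still hasn't started
STILL_STARTING = re.compile(
    r"^\S+ \S+ supervisor \[INFO\] "
    r"(?P<worker>[0-9a-f-]{36}) still hasn't started$"
)


def count_slow_starts(lines):
    """Count 'still hasn't started' messages per worker id."""
    counts = Counter()
    for line in lines:
        match = STILL_STARTING.match(line.strip())
        if match:
            counts[match.group("worker")] += 1
    return counts
```

Passing an open log file (for example a hypothetical supervisor.log) to count_slow_starts and printing counts.most_common() lists the workers that took longest to come up, which can then be compared against the times the supervisor halted.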


--
Michael Rose (@Xorlev (https://twitter.com/xorlev))
Senior Platform Engineer, FullContact (http://fullcontact.com/)
michael@fullcontact.com (mailto:michael@fullcontact.com)

--
You received this message because you are subscribed to the Google Groups "storm-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to storm-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Ttyunix ttyunix at Feb 7, 2013 at 11:04 pm
    Nimbus and the supervisor should not be installed together on one machine.

  • Michael Rose at Feb 7, 2013 at 11:48 pm
    They are not installed together.

    Not sent from my iPhone
  • Enno Shioji at Feb 9, 2013 at 6:11 am
    I've seen the same message on my cluster.
    I stopped all services, wiped all state from the cluster (the Storm temp
    directory, etc.), and restarted everything. That made it go away.

    On Thursday, 7 February 2013 23:48:09 UTC, Michael Rose wrote:

    They are not installed together.

    Not sent from my iPhone
    On Feb 7, 2013 4:04 PM, "ttyunix ttyunix" <tty...@gmail.com <javascript:>>
    wrote:
    the nimbus and the supervisor don't install one machine together.

    On Wednesday, February 6, 2013 5:04:12 AM UTC+8, Michael Rose wrote:

    Hey Nathan,

    We've been continuing to get supervisors which go down and have issues
    with not finding stormconf.ser (leading to perpetual reloads).

    If the bug has been fixed, where should I be looking to fix this issue?
    It's a brand new Storm 0.8.2 cluster with 8 supervisors and a brand new
    ZooKeeper.

    It's every few days and always generally two of them. The topology uses
    two workers. It always seems to be after a long string of workers not
    starting up in time.

    2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
    2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection,
    connectString=10.38.9.44:2181 sessionTimeout=20000
    watcher=com.netflix.curator.**ConnectionState@5e725967
    2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to
    server /10.38.9.44:2181
    2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to
    ip-10-38-9-44.ec2.internal/10.**38.9.44:2181 <http://10.38.9.44:2181>,
    initiating session
    2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on
    server ip-10-38-9-44.ec2.internal/10.**38.9.44:2181<http://10.38.9.44:2181>,
    sessionid = 0x13c6e2bce917bb5, negotiated timeout = 20000
    2013-02-05 20:56:08 zookeeper [INFO] Zookeeper state update:
    :connected:none
    2013-02-05 20:56:08 ClientCnxn [INFO] EventThread shut down
    2013-02-05 20:56:08 ZooKeeper [INFO] Session: 0x13c6e2bce917bb5 closed
    2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
    2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection,
    connectString=10.38.9.44:2181/**storm <http://10.38.9.44:2181/storm>sessionTimeout=20000 watcher=com.netflix.curator.
    **ConnectionState@74c12978
    2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to
    server /10.38.9.44:2181
    2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to
    ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
    2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on
    server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid =
    0x13c6e2bce917bb6, negotiated timeout = 20000
    2013-02-05 20:56:08 supervisor [INFO] Starting supervisor with id
    05fe7be7-2971-4a7a-9cfc-275146ff48de at host ip-10-82-50-86.ec2.internal
    2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state
    for id f3dad62e-8819-4571-8f26-a6497772325a. Current supervisor time:
    1360097769. State: :disallowed, Heartbeat: nil
    2013-02-05 20:56:09 supervisor [INFO] Shutting down
    05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
    2013-02-05 20:56:09 supervisor [INFO] Shut down
    05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
    2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state
    for id d48cc968-8be8-48f1-81df-33e9e41fa8c0. Current supervisor time:
    1360097769. State: :disallowed, Heartbeat: nil
    2013-02-05 20:56:09 supervisor [INFO] Shutting down
    05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
    2013-02-05 20:56:09 supervisor [INFO] Shut down
    05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
    2013-02-05 20:56:09 supervisor [INFO] Launching worker with assignment
    #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
    "cherry-pitter-import-staging-528-1360097475", :executors ([3 3] [35 35] [67
    67] [99 99] [131 131] [5 5] [37 37] [69 69] [101 101] [133 133] [7 7] [39
    39] [71 71] [103 103] [135 135] [9 9] [41 41] [73 73] [105 105] [137 137]
    [11 11] [43 43] [75 75] [107 107] [139 139] [13 13] [45 45] [77 77] [109
    109] [141 141] [15 15] [47 47] [79 79] [111 111] [17 17] [49 49] [81 81]
    [113 113] [19 19] [51 51] [83 83] [115 115] [21 21] [53 53] [85 85] [117
    117] [23 23] [55 55] [87 87] [119 119] [25 25] [57 57] [89 89] [121 121] [27
    27] [59 59] [91 91] [123 123] [29 29] [61 61] [93 93] [125 125] [31 31] [63
    63] [95 95] [127 127] [1 1] [33 33] [65 65] [97 97] [129 129])} for this
    supervisor 05fe7be7-2971-4a7a-9cfc-275146ff48de on port 6703 with id
    0f3342c4-f2ca-46db-80f6-70b5c5bd95e4
    2013-02-05 20:56:09 event [ERROR] Error when processing event
    java.io.FileNotFoundException: File
    '/mnt/storm/supervisor/stormdist/cherry-pitter-import-staging-528-1360097475/stormconf.ser'
    does not exist
    at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
    at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
    at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
    at backtype.storm.daemon.supervisor$fn__4793.invoke(supervisor.clj:414)
    at clojure.lang.MultiFn.invoke(MultiFn.java:177)
    at backtype.storm.daemon.supervisor$sync_processes$iter__4684__4688$fn__4689.invoke(supervisor.clj:249)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:473)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$dorun.invoke(core.clj:2725)
    at clojure.core$doall.invoke(core.clj:2741)
    at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:237)
    at clojure.lang.AFn.applyToHelper(AFn.java:161)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:603)
    at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at backtype.storm.event$event_manager$fn__2507.invoke(event.clj:24)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:662)
    2013-02-05 20:56:09 util [INFO] Halting process: ("Error when processing
    an event")

    Example of workers not starting up in time:

    supervisor/stormdist/contact-pull-1059-491-1360084684/stormjar.jar
    backtype.storm.daemon.worker contact-pull-1059-491-1360084684
    b489ee56-12cf-423a-8eb1-794d04c329ef 6702
    58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 17:18:16 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:17 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:17 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:18 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:18 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:19 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:19 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:20 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:20 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 18:00:55 supervisor [INFO] Removing code for storm id
    contact-pull-1059-491-1360084684
    2013-02-05 18:00:55 supervisor [INFO] Shutting down and clearing state
    for id 58c2eba0-a51d-4baf-95db-6c18538ed5a9. Current supervisor time:
    1360087255. State: :disallowed, Heartbeat:
    #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1360087255,
    :storm-id "contact-pull-1059-491-1360084684", :executors #{[3 3] [35 35] [67
    67] [99 99] [131 131] [163 163] [195 195] [227 227] [259 259] [291 291] [323
    323] [355 355] [7 7] [39 39] [71 71] [103 103] [135 135] [167 167] [199 199]
    [231 231] [263 263] [295 295] [327 327] [359 359] [11 11] [43 43] [75 75]
    [107 107] [139 139] [171 171] [203 203] [235 235] [267 267] [299 299] [331
    331] [363 363] [15 15] [47 47] [79 79] [111 111] [143 143] [175 175] [207
    207] [239 239] [271 271] [303 303] [335 335] [19 19] [51 51] [83 83] [115
    115] [147 147] [179 179] [211 211] [243 243] [275 275] [307 307] [339 339]
    [23 23] [55 55] [87 87] [119 119] [151 151] [183 183] [215 215] [247 247]
    [279 279] [311 311] [343 343] [27 27] [59 59] [91 91] [123 123] [155 155]
    [187 187] [219 219] [251 251] [283 283] [315 315] [347 347] [31 31] [63 63]
    [95 95] [127 127] [159 159] [191 191] [223 223] [255 255] [287 287] [319
    319] [351 351]}, :port 6702}
    2013-02-05 18:00:55 supervisor [INFO] Shutting down
    b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 18:00:55 supervisor [INFO] Shut down
    b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 18:21:12 supervisor [INFO] Downloading code for storm id
    contact-pull-1067-504-1360088463 from
    /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
    2013-02-05 18:21:20 supervisor [INFO] Finished downloading code for
    storm id contact-pull-1067-504-1360088463 from
    /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463

    --
    Michael Rose (@Xorlev <https://twitter.com/xorlev>)
    Senior Platform Engineer, FullContact <http://fullcontact.com/>
    mic...@fullcontact.com

    --
    You received this message because you are subscribed to the Google Groups
    "storm-user" group.
    To unsubscribe from this group and stop receiving emails from it, send an
    email to storm-user+...@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.

  • Matthew Gordon at Feb 11, 2013 at 11:33 pm
    We have also been seeing this on 0.7.3 so it doesn't look like a new
    problem. Would be great to get a fix for this.

    On Fri, Feb 8, 2013 at 5:57 AM, Enno Shioji wrote:
    I've seen the same message on my cluster.
    I stopped all services and nuked all states from my cluster (storm temp
    directory etc.) and restarted them. That made it go away.

    On Thursday, 7 February 2013 23:48:09 UTC, Michael Rose wrote:

    They are not installed together.

    Not sent from my iPhone
    On Feb 7, 2013 4:04 PM, "ttyunix ttyunix" wrote:

    the nimbus and the supervisor don't install one machine together.

  • Michael Rose at Feb 11, 2013 at 11:39 pm
    Supposedly this was fixed in 0.8.2, but we're still experiencing it on a regular basis. Brand new cluster, no state shared with old 0.7.2 cluster.

    --
    Michael Rose (@Xorlev (https://twitter.com/xorlev))
    Senior Platform Engineer, FullContact (http://fullcontact.com/)
    michael@fullcontact.com

    On Monday, February 11, 2013 at 4:33 PM, Matthew Gordon wrote:

    We have also been seeing this on 0.7.3 so it doesn't look like a new
    problem. Would be great to get a fix for this.

  • Nathan Marz at Feb 19, 2013 at 3:01 am
    Hi Michael,

    I believe I've now fixed this bug. Try out 0.9.0-wip16 and let me know how
    it goes. Alternatively you can apply this patch to 0.8.2 and build your own
    release with only this change:

    https://github.com/nathanmarz/storm/commit/414af600ab08c4cdd7cefc0205ac95036af64c1e

    Let me know how it goes.

    -Nathan
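
    For anyone who wants to check whether a supervisor is currently in the state described in this thread, the following is a rough diagnostic sketch only. It is not part of the patch linked above and does not fix anything. It assumes that each subdirectory of the supervisor's stormdist directory (the /mnt/storm/supervisor/stormdist path seen in the log) holds one topology's code, and that a directory without stormconf.ser is the condition that triggers the FileNotFoundException. The function name and the REQUIRED_FILES constant are hypothetical, chosen for illustration.

```python
from pathlib import Path

# Files a topology directory is expected to contain. Only stormconf.ser is
# taken from the error in this thread; extend the tuple if you want to check more.
REQUIRED_FILES = ("stormconf.ser",)


def find_incomplete_topology_dirs(stormdist_dir, required=REQUIRED_FILES):
    """Return {topology_dir_name: [missing file names]} for incomplete dirs.

    A missing or non-directory stormdist_dir yields an empty dict.
    """
    root = Path(stormdist_dir)
    incomplete = {}
    if not root.is_dir():
        return incomplete
    for topo_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Collect every required file that is absent from this topology dir.
        missing = [name for name in required if not (topo_dir / name).is_file()]
        if missing:
            incomplete[topo_dir.name] = missing
    return incomplete
```

    Calling find_incomplete_topology_dirs("/mnt/storm/supervisor/stormdist") on a supervisor host returns the topology directories that lack stormconf.ser, which may help decide whether clearing local state (as Enno describes) is needed on that host.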

    On Mon, Feb 11, 2013 at 3:39 PM, Michael Rose wrote:

    Supposedly this was fixed in 0.8.2, but we're still experiencing it on a
    regular basis. Brand new cluster, no state shared with old 0.7.2 cluster.

    --
    Michael Rose (@Xorlev <https://twitter.com/xorlev>)
    Senior Platform Engineer, FullContact <http://fullcontact.com/>
    michael@fullcontact.com

    --
    Twitter: @nathanmarz
    http://nathanmarz.com

    --
    You received this message because you are subscribed to the Google Groups "storm-user" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to storm-user+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Michael Rose at Feb 19, 2013 at 3:22 am
    Thanks Nathan!

    I'll roll out a patched release of 0.8.2 tomorrow and see how things go.

    --
    Michael Rose (@Xorlev (https://twitter.com/xorlev))
    Senior Platform Engineer, FullContact (http://fullcontact.com/)
    michael@fullcontact.com (mailto:michael@fullcontact.com)

    On Monday, February 18, 2013 at 8:01 PM, Nathan Marz wrote:

    Hi Michael,

    I believe I've now fixed this bug. Try out 0.9.0-wip16 and let me know how it goes. Alternatively you can apply this patch to 0.8.2 and build your own release with only this change:

    https://github.com/nathanmarz/storm/commit/414af600ab08c4cdd7cefc0205ac95036af64c1e

    Let me know how it goes.

    -Nathan

    On Mon, Feb 11, 2013 at 3:39 PM, Michael Rose wrote:
    Supposedly this was fixed in 0.8.2, but we're still experiencing it on a regular basis. Brand new cluster, no state shared with old 0.7.2 cluster.

  • vonPuh fonPuhendorf at Feb 25, 2013 at 8:58 pm

    I can confirm the issue is resolved. I tried 0.9.0-wip15 and then
    0.9.0-wip16; tested and it worked. Thanks.
  • Viral Bajaria at Feb 25, 2013 at 9:30 pm
    I faced this issue when running 0.9.0-wip15. This is what I did to
    reproduce it:

    - start supervisor on a node
    - submit a topology
    - wait for workers to start running
    - kill supervisor ---> for some reason some workers also died, but the
    topology kept on running
    - kill topology (this step is optional)
    - restart supervisor ---> this fails with the "stormconf.ser does not
    exist" error

    Even after creating the directory, the supervisor still had issues
    starting up. I had to kill all workers on that node and then restart the
    supervisor for it to start working.

    Thanks,
    Viral
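
Before restarting a supervisor after a sequence like the one above, it can help to see which topology code directories are incomplete. The following is only a minimal sketch, assuming the on-disk layout visible in the logs earlier in this thread (`.../supervisor/stormdist/<topology id>/stormconf.ser` and `stormjar.jar`). The function names and the demo directory names are hypothetical, and the script does not talk to Storm or ZooKeeper; it only inspects a directory.

```python
import os
import tempfile

# Assumption: every topology directory under .../supervisor/stormdist should
# contain these two files. Both names appear in the logs quoted in this thread.
REQUIRED_FILES = ("stormconf.ser", "stormjar.jar")


def find_incomplete_topologies(stormdist_dir):
    """Return {topology_id: [missing file names]} for incomplete code dirs."""
    incomplete = {}
    for topology_id in sorted(os.listdir(stormdist_dir)):
        topology_dir = os.path.join(stormdist_dir, topology_id)
        if not os.path.isdir(topology_dir):
            continue  # ignore stray files next to the topology directories
        missing = [name for name in REQUIRED_FILES
                   if not os.path.isfile(os.path.join(topology_dir, name))]
        if missing:
            incomplete[topology_id] = missing
    return incomplete


def demo():
    """Build a throwaway stormdist layout and report what is missing."""
    with tempfile.TemporaryDirectory() as root:
        complete = os.path.join(root, "topo-ok-1-1")        # hypothetical id
        broken = os.path.join(root, "topo-broken-2-2")      # hypothetical id
        os.makedirs(complete)
        os.makedirs(broken)
        for name in REQUIRED_FILES:
            open(os.path.join(complete, name), "wb").close()
        # The broken directory has the jar but no stormconf.ser.
        open(os.path.join(broken, "stormjar.jar"), "wb").close()
        return find_incomplete_topologies(root)


if __name__ == "__main__":
    print(demo())
```

On a real node the argument would be the supervisor's stormdist directory, for example the `/mnt/storm/supervisor/stormdist` path that appears in the stack trace earlier in the thread. What to do with an incomplete directory is a separate decision; the posters here stopped the workers and cleared state before restarting.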
  • Nathan Marz at Feb 25, 2013 at 9:42 pm
    It's fixed in 0.9.0-wip16, not in 0.9.0-wip15.

    --
    Twitter: @nathanmarz
    http://nathanmarz.com

  • Viral Bajaria at Feb 25, 2013 at 10:15 pm
    Ahh, sorry. I read the previous email as saying that the tests worked
    for both wip15 and wip16.

    Thanks Nathan.
  • Richards Peter at Apr 25, 2013 at 12:28 pm
    Hi,

    I would like to know the release dates of Storm 0.8.3 and Storm 0.9.0.
    We hit this issue on Storm 0.8.2 today, and we have seen it on some
    earlier Storm releases as well. The changelog on the Storm web page says
    the issue is fixed in Storm 0.8.3 too, so I am keen to know when these
    builds will be released.

    Thanks,
    Richards Peter.

  • Michael Rose at Apr 25, 2013 at 1:54 pm
    You can download 0.8.3-wip3, which contains this fix, from storm-project.net. 0.8.3-wip3 only has a few bugfixes so far.

    --
    Michael Rose (@Xorlev (https://twitter.com/xorlev))
    Senior Platform Engineer, FullContact (http://fullcontact.com/)
    michael@fullcontact.com (mailto:michael@fullcontact.com)

  • Richards Peter at May 13, 2013 at 11:39 am
    Hi Nathan,

    I would like to verify whether the fix mentioned in
    https://github.com/nathanmarz/storm/blob/master/CHANGELOG.md for Storm
    0.8.3 is the one related to this issue. I could also see a similar log
    for Storm 0.8.2, so I am a little confused about the status of this
    issue. Are the fixes in Storm 0.8.2 and Storm 0.8.3-wip partial fixes?

    Is this issue fixed in both storm-0.8.3-wip and storm-0.9.0-wip16, or
    only in storm-0.9.0-wip16?

    Richards Peter.

  • Michael Rose at May 13, 2013 at 2:01 pm
    We upgraded to 0.8.3-wip3, and it is fixed as far as we can tell. We
    encountered it once, on the day we upgraded, but haven't been able to
    replicate it. After clearing ZK and the worker directories it's been
    smooth sailing.

    -- Sent from mobile
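
The version reports scattered through this thread can be collected into one small check. This is a sketch under stated assumptions, not an official compatibility table: it encodes only what posters said here (fixed in 0.9.0-wip16 and not in 0.9.0-wip15; 0.8.3-wip3 contains the fix; 0.7.3 and 0.8.2 are affected). Treating every release older than 0.8.3 as affected, and every later wip build of the same base version as fixed, are assumptions. The function names are hypothetical.

```python
import re

# Builds reported in this thread as containing the fix: base version -> wip number.
# Assumption: later wip builds of the same base version keep the fix.
FIXED_IN = {(0, 9, 0): 16, (0, 8, 3): 3}

# Accepts the spellings used in the thread: "0.9.0-wip16", "0.9.0wip16", "0.9.0 wip15".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:[-\s]?wip(\d+))?$")


def parse_version(text):
    """Split '0.9.0-wip16' into ((0, 9, 0), 16); a final release gets wip=None."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    major, minor, patch, wip = match.groups()
    base = (int(major), int(minor), int(patch))
    return base, (int(wip) if wip is not None else None)


def has_reported_fix(text):
    """Return True or False when the thread settles it, None when it does not."""
    base, wip = parse_version(text)
    if base < (0, 8, 3):
        # 0.7.3 and 0.8.2 were reported as affected; older releases are assumed so.
        return False
    if base not in FIXED_IN or wip is None:
        return None  # not discussed in the thread
    if wip >= FIXED_IN[base]:
        return True
    # Only 0.9.0-wip15 was explicitly called unfixed; earlier 0.8.3 wips are unknown.
    return False if base == (0, 9, 0) else None


if __name__ == "__main__":
    for version in ("0.8.2", "0.8.3-wip3", "0.9.0wip15", "0.9.0-wip16"):
        print(version, has_reported_fix(version))
```

A stock 0.8.2 build with the patch applied by hand would still report `False` here, since the version string alone cannot show that the commit was applied.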
  • Patricio Echagüe at May 13, 2013 at 3:30 pm
    We are on 0.8.3-wip3 as well and ran into the same issue, but it has only
    happened once in about a month.
  • Allan C at May 13, 2013 at 4:46 pm
    We haven't seen it since we upgraded to 0.8.3-wip a few months ago.
  • Billy Watson at Sep 24, 2013 at 12:42 pm
    I am on 0.8.3 and have just seen this error. Is it supposed to be fixed in
    this version?

    java.io.FileNotFoundException: File '/mnt/storm/supervisor/stormdist/topology_master-1-1379992108/stormconf.ser' does not exist
    at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
    at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
    at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
    at backtype.storm.daemon.worker$worker_data.invoke(worker.clj:146)
    at backtype.storm.daemon.worker$fn__4322$exec_fn__1202__auto____4323.invoke(worker.clj:332)
    at clojure.lang.AFn.applyToHelper(AFn.java:185)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:601)
    at backtype.storm.daemon.worker$fn__4322$mk_worker__4378.doInvoke(worker.clj:323)
    at clojure.lang.RestFn.invoke(RestFn.java:512)
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:433)
    at clojure.lang.AFn.applyToHelper(AFn.java:172)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at backtype.storm.daemon.worker.main(Unknown Source)
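
    One way to see which topology directories on a supervisor are affected is
    to scan the stormdist directory for entries that lack stormconf.ser. The
    following is only a minimal sketch: it assumes the layout visible in the
    trace above (one directory per topology under
    /mnt/storm/supervisor/stormdist), and the function name and its parameters
    are made up for illustration and are not part of Storm.

```python
import os


def find_incomplete_topology_dirs(stormdist_dir, required=("stormconf.ser",)):
    """List topology directories under stormdist_dir that lack a required file.

    Hypothetical helper, not a Storm API. Returns a list of
    (directory_name, [missing_file, ...]) tuples sorted by directory name.
    """
    incomplete = []
    for name in sorted(os.listdir(stormdist_dir)):
        path = os.path.join(stormdist_dir, name)
        if not os.path.isdir(path):
            # Ignore stray files; only per-topology directories are checked.
            continue
        missing = [f for f in required
                   if not os.path.isfile(os.path.join(path, f))]
        if missing:
            incomplete.append((name, missing))
    return incomplete
```

    Calling find_incomplete_topology_dirs("/mnt/storm/supervisor/stormdist")
    on a supervisor returns an empty list when every topology directory has
    its stormconf.ser. The check is read-only, so it should be safe to run
    while the daemons are up.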
  • Thomas Söhngen at Sep 25, 2013 at 11:41 am
    I would really like to know if this is fixed too. This error is the
    biggest issue we have with Storm atm. We have a lot of Topologies and
    this error occurs quite often. The only workaround we found is to wipe
    the Storm and zookeeper data dirs on the whole cluster and resubmit
    every Topology, which is veeery time-consuming when you have over 20
    Topologies running on your cluster.

    We are looking forward to 0.8.3 mostly to see this fixed! It's the cause
    of continuing annoyance and downtime!


  • Quinton Anderson at Sep 25, 2013 at 6:31 pm
    It was fixed properly in 0.9.0-wip16
  • Jon at Oct 18, 2013 at 7:18 pm
    I just had this happen in 0.9.0-rc2
  • P. Taylor Goetz at Oct 18, 2013 at 10:30 pm
    Did you clear all ZooKeeper and local Storm state?

    We used to see this on a regular basis under 0.8.x, but since upgrading to 0.9.0-rc2 we haven't seen it at all.

    When upgrading, clearing state is very important. You also want to make sure all processes from the previous version are killed.

    -Taylor
  • Jon at Oct 18, 2013 at 10:35 pm
    Restarting everything seemed to fix it -- we had weird issues where it
    chewed through all of the Zookeeper connections though, with Nimbus then
    not starting, and had to restart Zookeeper as well.

    I didn't clear state before updating to 0.9, but this specific topology did
    not exist before in 0.8.


    Sort of an aside, but it might be worthwhile to have an easy way to clear
    state (unless there is one, and I'm unaware of it). Maybe I'll throw something
    together and make a pull request for it.
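
    As a rough idea of what such a helper could look like for the local side
    only, here is a sketch under stated assumptions: the Storm daemons on the
    machine are already stopped, local_dir is whatever storm.local.dir points
    to on that machine, and the function name and dry_run flag are invented
    for illustration. It does not touch ZooKeeper, whose state would have to
    be cleared separately.

```python
import os
import shutil


def clear_local_state(local_dir, dry_run=True):
    """Remove every entry directly under local_dir, keeping local_dir itself.

    Hypothetical helper, not a Storm command. With dry_run=True (the default)
    nothing is deleted; the sorted list of paths that would be removed is
    returned either way.
    """
    targets = sorted(os.path.join(local_dir, name)
                     for name in os.listdir(local_dir))
    if not dry_run:
        for path in targets:
            if os.path.isdir(path) and not os.path.islink(path):
                # Real directories are removed recursively.
                shutil.rmtree(path)
            else:
                # Plain files and symlinks are unlinked without following them.
                os.remove(path)
    return targets
```

    Running it once with the default dry_run=True and reading the returned
    list before calling it again with dry_run=False gives a chance to catch a
    wrong path before anything is deleted.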

  • Caimc1020 at Aug 6, 2013 at 12:57 am
    I am using 0.9.0-wip16 and I have the same problem!


    On Tuesday, February 19, 2013 11:01:10 AM UTC+8, Nathan Marz wrote:
    Hi Michael,

    I believe I've now fixed this bug. Try out 0.9.0-wip16 and let me know how
    it goes. Alternatively you can apply this patch to 0.8.2 and build your own
    release with only this change:


    https://github.com/nathanmarz/storm/commit/414af600ab08c4cdd7cefc0205ac95036af64c1e

    Let me know how it goes.

    -Nathan


    On Mon, Feb 11, 2013 at 3:39 PM, Michael Rose <mic...@fullcontact.com> wrote:
    Supposedly this was fixed in 0.8.2, but we're still experiencing it on a
    regular basis. Brand new cluster, no state shared with old 0.7.2 cluster.

    --
    Michael Rose (@Xorlev <https://twitter.com/xorlev>)
    Senior Platform Engineer, FullContact <http://fullcontact.com/>
    mic...@fullcontact.com

    On Monday, February 11, 2013 at 4:33 PM, Matthew Gordon wrote:

    We have also been seeing this on 0.7.3 so it doesn't look like a new
    problem. Would be great to get a fix for this.


    On Fri, Feb 8, 2013 at 5:57 AM, Enno Shioji <esh...@gmail.com> wrote:

    I've seen the same message on my cluster.
    I stopped all services and nuked all states from my cluster (storm temp
    directory etc.) and restarted them. That made it go away.


    On Thursday, 7 February 2013 23:48:09 UTC, Michael Rose wrote:


    They are not installed together.

    Not sent from my iPhone

    On Feb 7, 2013 4:04 PM, "ttyunix ttyunix" wrote:


    Don't install Nimbus and the supervisor together on one machine.


    On Wednesday, February 6, 2013 5:04:12 AM UTC+8, Michael Rose wrote:


    Hey Nathan,

    We've been continuing to get supervisors which go down and have issues
    with not finding stormconf.ser (leading to perpetual reloads).

    If the bug has been fixed, where should I be looking to fix this issue?
    It's a brand new Storm 0.8.2 cluster with 8 supervisors and a brand new
    ZooKeeper.

    It's every few days and always generally two of them. The topology uses
    two workers. It always seems to be after a long string of workers not
    starting up in time.

    2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
    2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@5e725967
    2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
    2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
    2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb5, negotiated timeout = 20000
    2013-02-05 20:56:08 zookeeper [INFO] Zookeeper state update: :connected:none
    2013-02-05 20:56:08 ClientCnxn [INFO] EventThread shut down
    2013-02-05 20:56:08 ZooKeeper [INFO] Session: 0x13c6e2bce917bb5 closed
    2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
    2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@74c12978
    2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
    2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
    2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb6, negotiated timeout = 20000
    2013-02-05 20:56:08 supervisor [INFO] Starting supervisor with id 05fe7be7-2971-4a7a-9cfc-275146ff48de at host ip-10-82-50-86.ec2.internal
    2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id f3dad62e-8819-4571-8f26-a6497772325a. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
    2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
    2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
    2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id d48cc968-8be8-48f1-81df-33e9e41fa8c0. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
    2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
    2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
    2013-02-05 20:56:09 supervisor [INFO] Launching worker with assignment
    #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
    "cherry-pitter-import-staging-528-1360097475", :executors ([3 3] [35 35]
    [67
    67] [99 99] [131 131] [5 5] [37 37] [69 69] [101 101] [133 133] [7 7] [39
    39] [71 71] [103 103] [135 135] [9 9] [41 41] [73 73] [105 105] [137 137]
    [11 11] [43 43] [75 75] [107 107] [139 139] [13 13] [45 45] [77 77] [109
    109] [141 141] [15 15] [47 47] [79 79] [111 111] [17 17] [49 49] [81 81]
    [113 113] [19 19] [51 51] [83 83] [115 115] [21 21] [53 53] [85 85] [117
    117] [23 23] [55 55] [87 87] [119 119] [25 25] [57 57] [89 89] [121 121]
    [27
    27] [59 59] [91 91] [123 123] [29 29] [61 61] [93 93] [125 125] [31 31]
    [63
    63] [95 95] [127 127] [1 1] [33 33] [65 65] [97 97] [129 129])} for this
    supervisor 05fe7be7-2971-4a7a-9cfc-275146ff48de on port 6703 with id
    0f3342c4-f2ca-46db-80f6-70b5c5bd95e4
    2013-02-05 20:56:09 event [ERROR] Error when processing event
    java.io.FileNotFoundException: File

    '/mnt/storm/supervisor/stormdist/cherry-pitter-import-staging-528-1360097475/stormconf.ser'
    does not exist
    at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
    at
    org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
    at
    backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
    at backtype.storm.daemon.supervisor$fn__4793.invoke(supervisor.clj:414)
    at clojure.lang.MultiFn.invoke(MultiFn.java:177)
    at

    backtype.storm.daemon.supervisor$sync_processes$iter__4684__4688$fn__4689.invoke(supervisor.clj:249)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:473)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$dorun.invoke(core.clj:2725)
    at clojure.core$doall.invoke(core.clj:2741)
    at
    backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:237)
    at clojure.lang.AFn.applyToHelper(AFn.java:161)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:603)
    at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at backtype.storm.event$event_manager$fn__2507.invoke(event.clj:24)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:662)
    2013-02-05 20:56:09 util [INFO] Halting process: ("Error when processing
    an event")

    e.x. of not starting up in time:

    supervisor/stormdist/contact-pull-1059-491-1360084684/stormjar.jar
    backtype.storm.daemon.worker contact-pull-1059-491-1360084684
    b489ee56-12cf-423a-8eb1-794d04c329ef 6702
    58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 17:18:16 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:17 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:17 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:18 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:18 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:19 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:19 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:20 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 17:18:20 supervisor [INFO]
    58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
    2013-02-05 18:00:55 supervisor [INFO] Removing code for storm id
    contact-pull-1059-491-1360084684
    2013-02-05 18:00:55 supervisor [INFO] Shutting down and clearing state
    for id 58c2eba0-a51d-4baf-95db-6c18538ed5a9. Current supervisor time:
    1360087255. State: :disallowed, Heartbeat:
    #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1360087255,
    :storm-id "contact-pull-1059-491-1360084684", :executors #{[3 3] [35 35]
    [67
    67] [99 99] [131 131] [163 163] [195 195] [227 227] [259 259] [291 291]
    [323
    323] [355 355] [7 7] [39 39] [71 71] [103 103] [135 135] [167 167] [199
    199]
    [231 231] [263 263] [295 295] [327 327] [359 359] [11 11] [43 43] [75 75]
    [107 107] [139 139] [171 171] [203 203] [235 235] [267 267] [299 299] [331
    331] [363 363] [15 15] [47 47] [79 79] [111 111] [143 143] [175 175] [207
    207] [239 239] [271 271] [303 303] [335 335] [19 19] [51 51] [83 83] [115
    115] [147 147] [179 179] [211 211] [243 243] [275 275] [307 307] [339 339]
    [23 23] [55 55] [87 87] [119 119] [151 151] [183 183] [215 215] [247 247]
    [279 279] [311 311] [343 343] [27 27] [59 59] [91 91] [123 123] [155 155]
    [187 187] [219 219] [251 251] [283 283] [315 315] [347 347] [31 31] [63
    63]
    [95 95] [127 127] [159 159] [191 191] [223 223] [255 255] [287 287] [319
    319] [351 351]}, :port 6702}
    2013-02-05 18:00:55 supervisor [INFO] Shutting down
    b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 18:00:55 supervisor [INFO] Shut down
    b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
    2013-02-05 18:21:12 supervisor [INFO] Downloading code for storm id
    contact-pull-1067-504-1360088463 from
    /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
    2013-02-05 18:21:20 supervisor [INFO] Finished downloading code for
    storm id contact-pull-1067-504-1360088463 from
    /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463

    --
    Michael Rose (@Xorlev)
    Senior Platform Engineer, FullContact
    mic...@fullcontact.com


    --
    Twitter: @nathanmarz
    http://nathanmarz.com
  • 姚仁捷 at Aug 12, 2013 at 8:21 am
    In my environment, this problem occurred regularly on Storm 0.8.2; after
    upgrading to 0.9.0, it works well.

  • Quinton Anderson at Aug 13, 2013 at 9:25 pm
    On 0.9.0-wip16 I still saw the problem until I cleared all state from the
    supervisor nodes and ZooKeeper and restarted; after that it went away. I
    validated this by going back to 0.8.2, where the defect came back.

    So: upgrade, clear your state, and then try again. Note that I did still see
    the errors in the log, but they were transient, more a warning than an
    error.
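
Before clearing state, it can help to confirm which topology directories on a supervisor are actually incomplete. The following is a minimal sketch, not part of Storm: the helper name `find_incomplete_topologies` is hypothetical, and the default path is only the one that appears in the stack trace earlier in this thread, so adjust it to your own `storm.local.dir`. It checks each directory under `stormdist` for the two files named in the logs, `stormconf.ser` and `stormjar.jar`.

```python
from pathlib import Path

# Files each topology directory is expected to contain. Both names come from
# the logs quoted in this thread; the list is an assumption, not a Storm API.
EXPECTED_FILES = ("stormconf.ser", "stormjar.jar")


def find_incomplete_topologies(stormdist_dir):
    """Return {topology_id: [missing file names]} for incomplete directories."""
    root = Path(stormdist_dir)
    if not root.is_dir():
        return {}
    incomplete = {}
    for topo_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES
                   if not (topo_dir / name).is_file()]
        if missing:
            incomplete[topo_dir.name] = missing
    return incomplete


if __name__ == "__main__":
    # Path taken from the stack trace in this thread; yours may differ.
    result = find_incomplete_topologies("/mnt/storm/supervisor/stormdist")
    for topo, missing in result.items():
        print(f"{topo}: missing {', '.join(missing)}")
```

A directory reported here would presumably match the `FileNotFoundException` seen in the supervisor log. The script only reads the filesystem; it does not delete anything or touch ZooKeeper.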
