We keep having supervisors go down because they can't find stormconf.ser, which leads to perpetual reloads.
If the underlying bug has already been fixed, where should I be looking to resolve this? It's a brand new Storm 0.8.2 cluster with 8 supervisors and a brand new ZooKeeper.
It happens every few days, and generally to two supervisors at a time. The topology uses two workers. It always seems to follow a long string of workers failing to start up in time.
2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181 sessionTimeout=20000 [email protected]
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb5, negotiated timeout = 20000
2013-02-05 20:56:08 zookeeper [INFO] Zookeeper state update: :connected:none
2013-02-05 20:56:08 ClientCnxn [INFO] EventThread shut down
2013-02-05 20:56:08 ZooKeeper [INFO] Session: 0x13c6e2bce917bb5 closed
2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181/storm sessionTimeout=20000 [email protected]
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb6, negotiated timeout = 20000
2013-02-05 20:56:08 supervisor [INFO] Starting supervisor with id 05fe7be7-2971-4a7a-9cfc-275146ff48de at host ip-10-82-50-86.ec2.internal
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id f3dad62e-8819-4571-8f26-a6497772325a. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id d48cc968-8be8-48f1-81df-33e9e41fa8c0. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "cherry-pitter-import-staging-528-1360097475", :executors ([3 3] [35 35] [67 67] [99 99] [131 131] [5 5] [37 37] [69 69] [101 101] [133 133] [7 7] [39 39] [71 71] [103 103] [135 135] [9 9] [41 41] [73 73] [105 105] [137 137] [11 11] [43 43] [75 75] [107 107] [139 139] [13 13] [45 45] [77 77] [109 109] [141 141] [15 15] [47 47] [79 79] [111 111] [17 17] [49 49] [81 81] [113 113] [19 19] [51 51] [83 83] [115 115] [21 21] [53 53] [85 85] [117 117] [23 23] [55 55] [87 87] [119 119] [25 25] [57 57] [89 89] [121 121] [27 27] [59 59] [91 91] [123 123] [29 29] [61 61] [93 93] [125 125] [31 31] [63 63] [95 95] [127 127] [1 1] [33 33] [65 65] [97 97] [129 129])} for this supervisor 05fe7be7-2971-4a7a-9cfc-275146ff48de on port 6703 with id 0f3342c4-f2ca-46db-80f6-70b5c5bd95e4
2013-02-05 20:56:09 event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/mnt/storm/supervisor/stormdist/cherry-pitter-import-staging-528-1360097475/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
at backtype.storm.daemon.supervisor$fn__4793.invoke(supervisor.clj:414)
at clojure.lang.MultiFn.invoke(MultiFn.java:177)
at backtype.storm.daemon.supervisor$sync_processes$iter__4684__4688$fn__4689.invoke(supervisor.clj:249)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:473)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$dorun.invoke(core.clj:2725)
at clojure.core$doall.invoke(core.clj:2741)
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:237)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at backtype.storm.event$event_manager$fn__2507.invoke(event.clj:24)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:662)
2013-02-05 20:56:09 util [INFO] Halting process: ("Error when processing an event")
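To see whether other supervisors are failing the same way, the topology id can be pulled out of the FileNotFoundException message above and its path checked on the local disk. This is a minimal sketch, assuming the exception text keeps exactly the format shown in the trace; the function names are hypothetical and it does not touch any Storm API.

```python
import re
from pathlib import Path

# Matches the FileNotFoundException message logged when a topology's
# serialized config is missing from the supervisor's local stormdist directory.
MISSING_CONF = re.compile(
    r"File '(?P<path>.+?/stormdist/(?P<storm_id>[^/']+)/stormconf\.ser)' does not exist"
)


def missing_stormconf(log_text):
    """Return {storm_id: path} for every missing stormconf.ser reported in the log text."""
    found = {}
    for m in MISSING_CONF.finditer(log_text):
        found[m.group("storm_id")] = m.group("path")
    return found


def still_missing(found):
    """Keep only the entries whose stormconf.ser is still absent on this machine."""
    return {sid: p for sid, p in found.items() if not Path(p).exists()}
```

Running `missing_stormconf` over each supervisor's log and then `still_missing` on that supervisor shows whether the code directory was removed for good or only missing at the moment the worker was launched.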
Example of a worker not starting up in time:
supervisor/stormdist/contact-pull-1059-491-1360084684/stormjar.jar backtype.storm.daemon.worker contact-pull-1059-491-1360084684 b489ee56-12cf-423a-8eb1-794d04c329ef 6702 58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 17:18:16 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 18:00:55 supervisor [INFO] Removing code for storm id contact-pull-1059-491-1360084684
2013-02-05 18:00:55 supervisor [INFO] Shutting down and clearing state for id 58c2eba0-a51d-4baf-95db-6c18538ed5a9. Current supervisor time: 1360087255. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1360087255, :storm-id "contact-pull-1059-491-1360084684", :executors #{[3 3] [35 35] [67 67] [99 99] [131 131] [163 163] [195 195] [227 227] [259 259] [291 291] [323 323] [355 355] [7 7] [39 39] [71 71] [103 103] [135 135] [167 167] [199 199] [231 231] [263 263] [295 295] [327 327] [359 359] [11 11] [43 43] [75 75] [107 107] [139 139] [171 171] [203 203] [235 235] [267 267] [299 299] [331 331] [363 363] [15 15] [47 47] [79 79] [111 111] [143 143] [175 175] [207 207] [239 239] [271 271] [303 303] [335 335] [19 19] [51 51] [83 83] [115 115] [147 147] [179 179] [211 211] [243 243] [275 275] [307 307] [339 339] [23 23] [55 55] [87 87] [119 119] [151 151] [183 183] [215 215] [247 247] [279 279] [311 311] [343 343] [27 27] [59 59] [91 91] [123 123] [155 155] [187 187] [219 219] [251 251] [283 283] [315 315] [347 347] [31 31] [63 63] [95 95] [127 127] [159 159] [191 191] [223 223] [255 255] [287 287] [319 319] [351 351]}, :port 6702}
2013-02-05 18:00:55 supervisor [INFO] Shutting down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:00:55 supervisor [INFO] Shut down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:21:12 supervisor [INFO] Downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
2013-02-05 18:21:20 supervisor [INFO] Finished downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
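To line up the crashes with the runs of "still hasn't started" messages, those lines can be tallied per worker id along with when each run began and ended. This is a rough sketch, assuming the supervisor log lines look exactly like the excerpt above (timestamp, `supervisor [INFO]`, a 36-character worker UUID); the function name is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# e.g. "2013-02-05 17:18:16 supervisor [INFO] <worker-uuid> still hasn't started"
NOT_STARTED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) supervisor \[INFO\] "
    r"(?P<worker>[0-9a-f-]{36}) still hasn't started$"
)


def unstarted_workers(lines):
    """Map worker id -> (count, first_seen, last_seen) for 'still hasn't started' lines."""
    seen = defaultdict(list)
    for line in lines:
        m = NOT_STARTED.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            seen[m.group("worker")].append(ts)
    # Summarize each worker's run of startup-timeout messages.
    return {w: (len(ts), min(ts), max(ts)) for w, ts in seen.items()}
```

Comparing `last_seen` for each worker against the time of the next "Halting process" line would show whether every crash really follows such a run.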
--
Michael Rose (@Xorlev (https://twitter.com/xorlev))
Senior Platform Engineer, FullContact (http://fullcontact.com/)
--
You received this message because you are subscribed to the Google Groups "storm-user" group.