I am using JCascalog, and it is spending a lot of time in the flow planning phase.

The planning delay grows non-linearly, and it becomes severe once a flow has more than about 30 steps.

My code performs joins iteratively, and that is what creates a growing number of steps depending on the input. These are the timings I measured:
Number of steps 13 --> less than 1 min
Number of steps 18 --> less than 1 min
Number of steps 23 --> less than 1 min
Number of steps 28 --> 7 mins
Number of steps 33 --> 43 mins
Number of steps 48 --> more than 2 hrs (the job still had not started)
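To put a number on "non-linear", here is a small worked calculation on the two rows that have exact timings (28 steps: 7 min, 33 steps: 43 min). It is only arithmetic on the figures above and says nothing about what Cascading does internally; the class name `PlanningGrowth` is just for illustration.

```java
// Worked arithmetic on the two exactly-timed rows above
// (28 steps -> 7 min, 33 steps -> 43 min). Illustrative only.
final class PlanningGrowth {

    /** Observed growth factor between two timings. */
    static double observedRatio(double t1, double t2) {
        return t2 / t1;
    }

    /**
     * Exponent d of a power law t = c * n^d passing through both points:
     * d = ln(t2 / t1) / ln(n2 / n1).
     */
    static double impliedDegree(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        double ratio = observedRatio(7, 43);          // about 6.14x slower
        double linear = 33.0 / 28.0;                   // about 1.18x if time grew linearly with steps
        double quadratic = linear * linear;            // about 1.39x if it grew quadratically
        double degree = impliedDegree(28, 7, 33, 43);  // power-law exponent fitting both points
        System.out.printf("observed %.2fx, linear %.2fx, quadratic %.2fx, implied degree %.1f%n",
                ratio, linear, quadratic, degree);
    }
}
```

Going from 28 to 33 steps (about 18% more) made planning roughly 6x slower; a power law would need an exponent of about 11 to fit both points, far beyond the linear or quadratic growth one might expect.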

After some analysis I found that the problem lies in Cascading, while it prepares the flows. I raised this issue with the Cascading community a while ago but have not received a solution yet. Since JCascalog runs on Cascading, I would also like to know whether anyone has faced a similar problem while using JCascalog, and how you solved it. Any help is appreciated.


I have created a small test to simulate this scenario, using JCascalog on top of Cascading. The code is in my GitHub repository:

The test repeatedly self-joins the same input up to a given depth. For every level of depth, Cascading creates 2 jobs, so by varying the depth I could measure the preparation time for different numbers of Cascading jobs. The job preparation time grows non-linearly as the number of jobs increases. The execution steps for this test are here: <https://github.com/sourabhchaki/cascalog-cascading-test/blob/master/README.md>

*depth=5, steps: 10, time taken: 1 sec*
[17/06/2013:14:13:19 IST] [INFO] [cascading.property.AppProps main]: using app.id: FC106638099703F5450E89B08BB7442F
[17/06/2013:14:13:20 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting jobs: 10
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] allocating threads: 1
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.FlowStep pool-1-thread-1]: [] starting step: (6/10)

The DOT file for the flow is attached: 10jobs.doc

*depth=10, steps: 20, time taken: 15 mins*
[17/06/2013:14:14:50 IST] [INFO] [cascading.property.AppProps main]: using app.id: 264A79523E9A9AF21EB04D2814FBCF9F
[17/06/2013:14:29:54 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting jobs: 20

The DOT file for the flow is attached: 20jobs.doc

I also tried depth=15 (30 jobs) and waited for 1 hour, but the application never started.
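As a rough check on the two test runs, the gap between the first two log lines of each run (the AppProps `using app.id` line and the Cascading version banner) can be computed from the timestamps. This sketch assumes that gap is spent in flow planning; `LogGap` is a hypothetical helper name, and the timestamps are copied from the logs above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Computes the gap between the first two log lines of each run above.
// Assumption: that gap is dominated by Cascading flow planning.
final class LogGap {
    // Matches timestamps such as "17/06/2013:14:13:19" (the "IST" suffix is dropped).
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy:HH:mm:ss");

    static long secondsBetween(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT),
                                LocalDateTime.parse(to, FMT)).getSeconds();
    }

    public static void main(String[] args) {
        long tenJobs = secondsBetween("17/06/2013:14:13:19", "17/06/2013:14:13:20");
        long twentyJobs = secondsBetween("17/06/2013:14:14:50", "17/06/2013:14:29:54");
        // The logs have 1-second resolution, so the 10-job figure is only approximate.
        System.out.println("10 jobs: ~" + tenJobs + " s, 20 jobs: " + twentyJobs + " s");
        System.out.println("doubling the job count multiplied the gap by ~"
                + (twentyJobs / Math.max(1, tenJobs)));
    }
}
```

With 1-second log resolution the 10-job gap could be anywhere up to about 2 seconds, but either way doubling the number of jobs from 10 to 20 made the pre-start phase several hundred times longer (904 s versus roughly 1 s).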

I hope this helps in investigating the problem.

Let me know if you need any more details.



Group: cascalog-user (categories: clojure, hadoop)
Posted: Jul 9, '13 at 12:54p; last active: Jul 9, '13 at 2:06p
Participants: Sourabh Chaki (1 post), Andre Kelpe (1 post)


