Is there a simple example of a project.clj and .clj file that runs on
Amazon Elastic Map Reduce? I keep getting the same error:

2012-07-08 05:31:31,837 WARN org.apache.hadoop.hdfs.DFSClient
(Thread-18): DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be
replicated to 0 nodes, instead of 1

I'm using `lein uberjar` and `defmain` to create the jar. Everything
runs fine on my local Hadoop install on my laptop.

The latest attempt is in https://gist.github.com/3069628

Cheers,
Chris Dean

  • Marc Limotte at Jul 8, 2012 at 5:49 pm
    Chris,

    Did you try starting a new EMR cluster? Judging from the exception I
    wouldn't think this is related to your code. Probably a cluster issue
    (network? disk?).

    marc
  • Chris Dean at Jul 8, 2012 at 9:46 pm

    Marc Limotte writes:
    Did you try starting a new EMR cluster? Judging from the exception I
    wouldn't think this is related to your code. Probably a cluster issue
    (network? disk?).

    I've tried it 8 times now with slightly different setups and I
    consistently get the same error.

    Maybe it has something to do with the version of Cascalog or Cascading
    being incompatible with the Hadoop version on EMR.

    Are other folks running Cascalog 1.9 on EMR?

    Cheers,
    Chris Dean
  • Chris Dean at Jul 9, 2012 at 1:21 am
    Looks like EMR is working. Changing the code to write out the
    results worked:

    (defmain WritePoemFreq [in-path out-path]
      (let [tap (hfs-textline in-path)
            out (hfs-textline out-path)]
        (?- out (freq-count-query tap))))

    I had assumed that stdout would show up in some log, but I don't see
    it. I'm still seeing the error, but the written results are correct.

    Cheers,
    Chris Dean
  • Chris Dean at Jul 9, 2012 at 4:14 am

    Chris Dean writes:
    Is there a simple example of a project.clj and .clj file that runs on
    Amazon Elastic Map Reduce? I keep getting the same error:

    I put up a simple example in

    https://github.com/ctdean/cascalog-hello

    and a short description in

    http://www.ctdean.com/2012/07/06/cascalog-on-emr.html

    Cheers,
    Chris Dean
  • A at Jul 17, 2012 at 6:16 pm
    I'm getting back into Cascalog after a long pause and am finding that
    the project.clj format, the ecosystem, version compatibility, and
    dependencies/dev-dependencies have mostly changed with lein2. Thanks for
    putting "cascalog-hello" together (https://github.com/ctdean/cascalog-hello).
    It helps a lot.

    That said, I'm seeing a few errors when I run locally at a lein2 repl,
    along with an INFO warning that class not found exceptions may happen. How
    can I specify that it finds the classes it requires when it runs locally
    and assure it does the right thing when on EC2?

    Example:

    (??- (freq-count-query [["a line of text line many"]
                            ["words line count line text"]]))

    12/07/17 11:02:04 INFO *util.HadoopUtil: using default application jar, may
    cause class not found exceptions on the cluster*
    12/07/17 11:02:04 INFO planner.HadoopPlanner: using application jar:
    /nfs/home/aaelony/.m2/repository/cascading/cascading-hadoop/2.0.0/cascading-hadoop-2.0.0.jar
    12/07/17 11:02:04 INFO property.AppProps: using app.id:
    569954D88846226B2D84D7F1AD64FEC3
    12/07/17 11:02:04 INFO hadoop.TupleSerialization: using default comparator:
    cascalog.hadoop.DefaultComparator
    12/07/17 11:02:04 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
    ClassNotFoundException org.codehaus.jackson.map.JsonMappingException
    java.net.URLClassLoader$1.run (URLClassLoader.java:200)
    12/07/17 11:02:04 INFO flow.Flow: [] starting
    12/07/17 11:02:04 INFO flow.Flow: [] source:
    MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/ecde3a83-eb07-4d96-b779-7a88ef84d223"]"]
    12/07/17 11:02:04 INFO flow.Flow: [] sink:
    Hfs["SequenceFile[[UNKNOWN]->['?word',
    '?count']]"]["/tmp/cascalog_reserved/f0c4e82d-4e91-4108-bb45-330826edc40e/9d748dbb-3e3b-4c8f-a3a8-bd0e1a3ff855"]"]
    12/07/17 11:02:04 INFO flow.Flow: [] parallel execution is enabled: false
    12/07/17 11:02:04 INFO flow.Flow: [] starting jobs: 1
    12/07/17 11:02:04 INFO flow.Flow: [] allocating threads: 1
    12/07/17 11:02:04 INFO flow.FlowStep: [] starting step: (1/1)
    ...3b-4c8f-a3a8-bd0e1a3ff855
    12/07/17 11:02:04 INFO flow.Flow: [] stopping all jobs
    12/07/17 11:02:04 INFO flow.FlowStep: [] stopping: (1/1)
    ...3b-4c8f-a3a8-bd0e1a3ff855
    12/07/17 11:02:04 INFO flow.Flow: [] stopped all jobs
    12/07/17 11:02:04 INFO flow.Flow: [] shutting down job executor
    12/07/17 11:02:04 INFO flow.Flow: [] shutdown complete

    Many thanks,
    A



  • Chris Dean at Jul 17, 2012 at 6:28 pm

    A writes:
    That said, I'm seeing a few errors when I run locally at a lein2 repl,
    along with an INFO warning that class not found exceptions may happen. How
    can I specify that it finds the classes it requires when it runs locally
    and assure it does the right thing when on EC2?

    I'm not sure. You're right that I only tested that config on EMR and
    not locally.

    This project.clj is very short if someone more knowledgeable can take a
    look:

    (defproject cascalog-hello "0.1.0"
      :dependencies [[cascalog/cascalog "1.9.0"]
                     [org.apache.hadoop/hadoop-core "0.20.205.0"]]
      :aot [ctdean.cascalog.hello])

    https://github.com/ctdean/cascalog-hello/blob/master/project.clj

    Cheers,
    Chris Dean
  • Jeroen van Dijk at Jul 19, 2012 at 9:45 am

    I think the hadoop dependency should only be added in development. Here is
    a project.clj of mine https://gist.github.com/3142700 that works on EMR
    with the use of Lemur (the Postgres part is new so I'm not yet sure about
    that though).

    HTH,
    Jeroen
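
For readers following along, the suggestion to add the Hadoop dependency only in development could look roughly like the sketch below. It is not the content of Jeroen's gist. It is a minimal lein2-style variant of the project.clj quoted earlier in the thread, assuming that EMR supplies its own Hadoop jars at runtime and that the Leiningen `:dev` profile is left out of the jar built by `lein uberjar`.

```clojure
;; Sketch only, not taken from either poster's repository.
;; Assumption: the EMR cluster provides Hadoop on the classpath, so
;; hadoop-core is needed only for local runs and the REPL.
(defproject cascalog-hello "0.1.0"
  :dependencies [[cascalog/cascalog "1.9.0"]]
  ;; Dependencies in the :dev profile are available to `lein repl`
  ;; and `lein run`, but are not bundled by `lein uberjar`.
  :profiles {:dev {:dependencies
                   [[org.apache.hadoop/hadoop-core "0.20.205.0"]]}}
  :aot [ctdean.cascalog.hello])
```

Keeping Hadoop out of the uberjar avoids shipping a second copy of the Hadoop classes to a cluster that already has its own, which is one plausible source of version conflicts. Whether it also resolves the local ClassNotFoundException reported above was not confirmed in the thread.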

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Jul 8, '12 at 6:43a
active: Jul 19, '12 at 9:45a
posts: 8
users: 4
website: clojure.org
irc: #clojure
