Hi all -

I'm experiencing a weird problem, which looks to me like it could be a
bug. When running under hadoop, I get an error running a query that
uses a defmapcatop after having opened a file. For instance, if I have
this code file:

(ns my.namespace
  (:use [cascalog.api :only (defmapcatop)]
        [clojure.java.io :only [input-stream]]))

(defn open-a-file
  [file]
  (with-open [file-stream (input-stream file)]
    "some return value"))

(def info-from-file (open-a-file "test.txt"))

(defmapcatop split-op
  [line]
  (seq (.split line "\\s+")))

and I run the following from a hadoop repl:

hadoop jar <jar.from.above.file>
user=> (require '[my.namespace :as m])
nil
user=> (use 'cascalog.api)
nil
user=> (?<- (hfs-textline <something>) [?word]
            ((hfs-textline "something-else") ?line)
            (m/split-op ?line :> ?word))

The job fails, and the cluster log includes this message:

Caused by: java.lang.RuntimeException:
java.lang.IllegalStateException: Attempting to call unbound fn:
#'leafgrabber.free-text.test/split-op__

Everything works fine if I run on the local file system, if the
cascalog query does not include the defmapcatop operation, or if
info-from-file is defined after split-op is defined.

Thanks for any help you can provide!
- David


  • Nathanmarz at Dec 28, 2011 at 1:36 am
    Strange. It's weird that it doesn't have the right namespace for the
    op. Do you know where the "leafgrabber.free-text.test" namespace is
    coming from?



  • David Goss-Grubbs at Dec 28, 2011 at 8:26 pm
    Yeah, sorry. That's the namespace of the actual file I'm using. In my
    example, the error message should refer to
    #'my.namespace/split-op__

  • Nathan Marz at Dec 28, 2011 at 8:32 pm
    What version of Cascalog are you using?

    --
    Twitter: @nathanmarz
    http://nathanmarz.com
  • David Goss-Grubbs at Dec 28, 2011 at 9:03 pm
    1.8.4
  • Harish Tella at Jan 18, 2012 at 2:10 am
    I've run into the same bug in Cascalog 1.8.5 except I am using a
    defmapop.

    Switching the defmapop out for a defn didn't change anything.

    I am working around it by making my equivalent of David's
    'info-from-file' into a memoized function. That way the file read
    happens after my mapop gets defined, and I can still define
    'info-from-file' before the mapop or in another file that I 'require'.
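
A minimal sketch of the memoized-function workaround described above, assuming the file holds plain text lines. The name read-lines and the memoized info-from-file are illustrative, not Harish's actual code. The point is only that wrapping the read in a function defers it until the first call, instead of running it while the namespace loads.

```clojure
(require '[clojure.java.io :as io])

(defn read-lines
  "Hypothetical stand-in for David's open-a-file: reads every line eagerly
  so the reader can be closed before returning."
  [file]
  (with-open [rdr (io/reader file)]
    (vec (line-seq rdr))))

;; Unlike (def info-from-file (open-a-file "test.txt")), nothing is read
;; when this form is evaluated. The first call with a given path reads the
;; file; later calls with the same path return the cached vector.
(def info-from-file (memoize read-lines))
```

Callers would then write (info-from-file "test.txt") where they previously referred to the info-from-file var directly.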

  • Sam Ritchie at Jan 21, 2012 at 2:53 am
    Hey guys, I've opened an issue to track progress on this one:

    https://github.com/nathanmarz/cascalog/issues/46

    I'll run this on the cluster and see if I can track down the root cause.
    Thanks for picking up on this one!


    --
    Sam Ritchie, Twitter Inc
    703.662.1337
    @sritchie09

    (Too brief? Here's why! http://emailcharter.org)
  • J. McConnell at Jan 25, 2012 at 4:32 am
    Anyone have any other suggestions for working around this? I'm seeing the
    same error on 1.8.5, however I'm not reading any files anywhere. After
    reading this response, I did try converting all plain def forms to memoized
    functions, but that didn't help.

    Thanks,

    - J.
  • Sam Ritchie at Jan 25, 2012 at 5:52 am
    This is very strange -- do you have some example code that shows the error?
    That'll be really helpful in debugging this. In the meantime, try adding

    :aot :all

    to your project.clj, and running lein compile before you uberjar. If that
    works, we'll have a good clue as to what the fix should be.
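
For reference, a sketch of where that option would sit in a Leiningen project file. The project name, version strings, and dependency list are placeholders, not taken from anyone's actual project.clj.

```clojure
;; project.clj (hypothetical project; names and versions are placeholders)
(defproject my-project "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [cascalog "1.8.5"]]
  ;; Ahead-of-time compile every namespace in the project
  :aot :all)
```

With that in place, the sequence would be lein compile followed by lein uberjar.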

  • J. McConnell at Jan 25, 2012 at 4:32 pm

    On Wed, Jan 25, 2012 at 5:52 AM, Sam Ritchie wrote:

    > This is very strange -- do you have some example code that shows the
    > error? That'll be really helpful in debugging this.

    Sure, I'll see if I can't distill the code down a bit to a smaller example.

    > In the meantime, try adding
    >
    > :aot :all
    >
    > to your project.clj, and running lein compile before you uberjar. If that
    > works, we'll have a good clue as to what the fix should be.

    Yeah, I had tried this and, unfortunately, it made no difference.

    - J.
  • J. McConnell at Jan 25, 2012 at 5:37 pm

    On Wed, Jan 25, 2012 at 5:52 AM, Sam Ritchie wrote:

    > This is very strange -- do you have some example code that shows the
    > error? That'll be really helpful in debugging this.

    Ok, hopefully this helps. It seems that simply adding a reference to
    cheshire.core in my ns declaration triggers the bug. I've tried requiring
    cheshire.core instead of using it, but it didn't make a difference.

    ;; This version works
    (ns sessionizer.example
      (:use [cascalog.api]))

    (defmapop parse-request [line]
      ["parsed-request" line])

    (defn run-query []
      (?<- (hfs-textline "/user/j/cascalog-results") [?line]
           ((hfs-textline
             "/flumedata/DeDupe/RequestsLog/daily/2012-01/week-2/10/*") ?line)
           (parse-request ?line :> ?request ?line)))

    ;; This version fails with:
    ;; java.lang.IllegalStateException: Attempting to call unbound fn:
    ;; #'sessionizer.example/parse-request__
    (ns sessionizer.example
      (:use [cascalog.api]
            [cheshire.core :only [parse-string]]))

    (defmapop parse-request [line]
      ["parsed-request" line])

    (defn run-query []
      (?<- (hfs-textline "/output/path") [?line]
           ((hfs-textline "/input/path/*") ?line)
           (parse-request ?line :> ?request ?line)))

    I'll add this as a comment on the ticket. Let me know if there's anything
    else I can do.

    Regards,

    - J.
  • Sam Ritchie at Jan 27, 2012 at 1:10 am
    Hey all,

    I've pushed a fix for this issue to Cascalog 1.8.6-SNAPSHOT and 1.9.0-wip.
    The issue here was that in some cases, requiring a project namespace would
    fail and Cascalog would squash the exception. (We do this to allow for
    interactive development at the REPL.) The "unbound fn" problem is a result
    of Cascalog catching some other quite real exception.

    The fix involved only catching an exception if it resulted from a missing
    namespace file. This means that you guys will still get some sort of error;
    it'll just be a lot more informative than the "unbound fn" error.
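
A rough sketch of the idea behind the fix, not Cascalog's actual source: swallow the exception only when the namespace file is missing, and let anything else propagate. The helper name try-require is hypothetical, and the sketch assumes that requiring a namespace with no file on the classpath throws java.io.FileNotFoundException, which is how clojure.core/require behaves.

```clojure
(defn try-require
  "Hypothetical helper. Returns true if ns-sym loaded, false if no file for
  it could be found on the classpath. Other failures raised while loading
  are not caught here and propagate to the caller."
  [ns-sym]
  (try
    (require ns-sym)
    true
    (catch java.io.FileNotFoundException _
      false)))
```

One caveat with this simplified version: code inside a namespace can itself throw FileNotFoundException (opening a missing data file, for instance), so a real implementation would probably need a narrower check than the exception class alone.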

    David, your error is probably that "file.txt" doesn't exist on other
    machines in the cluster. Try including the file in a "resources"
    subdirectory, adding :resources-path "resources" to your project.clj, and
    using (input-stream (clojure.java.io/resource "file.txt")) instead of
    (input-stream "file.txt").
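
A sketch of the classpath-based read suggested above, under the assumption that the file is packaged with the jar's resources. The function name slurp-resource is illustrative. clojure.java.io/resource returns nil when nothing by that name is on the classpath, so the sketch checks for that explicitly rather than passing nil on to input-stream.

```clojure
(require '[clojure.java.io :as io])

(defn slurp-resource
  "Hypothetical helper: reads a classpath resource into a string, or throws
  if no resource with that name exists."
  [resource-name]
  (if-let [url (io/resource resource-name)]
    (with-open [file-stream (io/input-stream url)]
      (slurp file-stream))
    (throw (java.io.FileNotFoundException.
            (str "Not found on classpath: " resource-name)))))
```

Because the lookup goes through the classpath, the same call should behave the same way on every machine that receives the job jar.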

    Thanks guys! Please let me know if the new versions fix your issues.

    Sam

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Dec 28, '11 at 1:27a
active: Jan 27, '12 at 1:10a
posts: 12
users: 5
website: clojure.org
irc: #clojure
