Hi,

I'm trying to write a Hadoop streaming job in Perl, but I'm completely
confused by the key/value separators.

I found lots of separators I can set:

# -jobconf stream.map.output.field.separator=A \
# -jobconf stream.reducer.output.field.separator=B \
# -jobconf mapred.textoutputformat.separator=C \
# -jobconf key.value.separator.in.input.line=D \
# -jobconf stream.map.output.field.separator=A \
# -jobconf stream.reduce.input.field.separator=AA \
# -jobconf stream.reduce.output.field.separator=B \
# -jobconf map.output.key.field.separator=C \

But what do these separators mean?

I tried to use ^A in my job and found this bug
<http://issues.apache.org/jira/browse/HADOOP-3341>. It seems Hadoop
fixed it in 0.19.0, but I still get the following error when I
set the separator to ^A.

[Fatal Error] :49:68: Character reference "&#1" is an invalid XML character.
09/11/10 11:10:16 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1167)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1039)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:979)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:381)
    at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1630)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:214)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:372)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:873)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1091)
    ... 19 more

So I can't use ^A as the separator?


  • Amogh Vasekar at Nov 10, 2009 at 6:40 am
    Hi,
    I'm pretty sure you need to specify the Unicode equivalent, or at least that is what I used in my Java map-red program.

    Amogh


  • Wd at Nov 10, 2009 at 6:54 am
    You mean for the ^A?
    I tried \u0001 and \x01, but the streaming job treats each of them as a
    literal string, not as ^A.

    :(
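
For readers hitting the same two symptoms, a minimal Python sketch (not Hadoop code; the `<value>` element is only a hypothetical stand-in for one job configuration entry) may help show what is going on. The four-character text `\x01` typed as an option value is not the single control character ^A, and XML 1.0 has no way to represent U+0001, so a parser rejects the `&#1;` character reference, which appears to be what the SAXParseException above reports.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# What arrives when \x01 is typed without any special quoting:
# four ordinary characters (backslash, x, 0, 1).
literal_text = r"\x01"

# The actual control character ^A (U+0001): a single character.
ctrl_a = "\x01"


def parses_as_xml(document: str) -> bool:
    """Return True if the document is well-formed XML 1.0."""
    try:
        xml.dom.minidom.parseString(document)
        return True
    except ExpatError:
        return False


# Hypothetical stand-ins for a serialized configuration value.
with_tab = "<value>&#9;</value>"     # tab is a legal XML 1.0 character
with_ctrl_a = "<value>&#1;</value>"  # U+0001 is not legal in XML 1.0

print(len(literal_text), len(ctrl_a))
print(parses_as_xml(with_tab), parses_as_xml(with_ctrl_a))
```

The sketch only demonstrates the XML rule; whether a given Hadoop release can carry ^A through its job configuration is something to check against that release.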

  • Jason Venner at Nov 17, 2009 at 6:33 am
    There is a very clear picture in chapter 8 of Pro Hadoop showing all of
    the separators for streaming jobs.
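
As a rough illustration of what a key/value separator does (this is neither the book's figure nor Hadoop's own code), the Python sketch below splits one line of mapper output the way Hadoop Streaming is documented to: the text before the separator is the key, the rest is the value, and a line without enough separators becomes a key with an empty value. The function name `split_key_value` and its parameters are made up for this example; `sep` plays the role of `stream.map.output.field.separator`, and `num_key_fields` the role of `stream.num.map.output.key.fields`, a streaming option not mentioned in this thread.

```python
def split_key_value(line: str, sep: str = "\t", num_key_fields: int = 1):
    """Split one output line into (key, value).

    The key is everything before the num_key_fields-th occurrence of sep,
    and the value is everything after it. If the line has fewer separators
    than that, the whole line is the key and the value is empty.
    """
    if not sep or num_key_fields < 1:
        raise ValueError("sep must be non-empty and num_key_fields >= 1")

    start = 0
    pos = -1
    for _ in range(num_key_fields):
        pos = line.find(sep, start)
        if pos == -1:
            # Not enough separators: the whole line is the key.
            return line, ""
        start = pos + len(sep)
    return line[:pos], line[start:]


# Default separator is a tab.
print(split_key_value("apple\t3"))
# A custom separator, with the first two fields forming the key.
print(split_key_value("a.b.c.d", sep=".", num_key_fields=2))
# ^A (U+0001) as the separator, as attempted in this thread.
print(split_key_value("k\x01v", sep="\x01"))
```

The other properties listed in the question apply the same idea at different stages of the job, so they are not modelled here.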


    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Wd at Nov 17, 2009 at 9:02 am
    Oh, thanks very much, I found the picture.


Group: mapreduce-user
Category: hadoop
Posted: Nov 10, '09 at 3:55a
Active: Nov 17, '09 at 9:02a