Exception with file globbing closures
-------------------------------------

Key: HADOOP-3064
URL: https://issues.apache.org/jira/browse/HADOOP-3064
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.16.1
Reporter: Tom White


Using file globbing to select various input paths, like so:

conf.setInputPath(new Path("mr/input/glob/2008/02/{02,08}"));

gives an exception:

Exception in thread "main" java.io.IOException: Illegal file pattern:
Expecting set closure character or end of range, or } for glob {02 at
3
at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1023)
at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1008)
at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:926)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:826)
at org.apache.hadoop.fs.FileSystem.globPaths(FileSystem.java:873)
at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:131)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:541)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:809)

The code for JobConf.getInputPaths tokenizes using
a comma as the delimiter, producing two paths
"mr/input/glob/2008/02/{02" and "08}".
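
The splitting described above can be reproduced outside Hadoop. The following is a minimal sketch, assuming the stored value is tokenized with java.util.StringTokenizer; the class and method names are hypothetical and this is not the actual JobConf code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustration only: a hypothetical stand-in for comma-based tokenizing,
// not the actual JobConf implementation.
final class CommaTokenizeDemo {

    // Split a comma-separated property value into tokens.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(value, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The comma inside {02,08} is treated as a path separator,
        // so the glob is cut in the middle of the brace group.
        for (String token : tokenize("mr/input/glob/2008/02/{02,08}")) {
            System.out.println(token);
        }
    }
}
```

Each token is then handed to the glob parser separately, which is why the first one fails with an unclosed "{".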

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Hairong Kuang (JIRA) at Mar 21, 2008 at 4:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581118#action_12581118 ]

    Hairong Kuang commented on HADOOP-3064:
    ---------------------------------------

    This is a problem whenever a file name contains a comma. I am thinking of escaping commas before saving the input paths to a jobconf and unescaping them after reading them from the jobconf.
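
A rough sketch of that escape-then-unescape round trip, assuming a backslash as the escape character. All names here are hypothetical; this is an illustration of the idea, not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper names; a sketch of the idea, not the attached patch.
final class CommaEscapeSketch {
    private static final char ESCAPE = '\\';
    private static final char SEPARATOR = ',';

    // Prefix every escape character and separator with the escape character.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ESCAPE || c == SEPARATOR) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Drop each escape character and keep the character it protects.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESCAPE && i + 1 < s.length()) {
                c = s.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Split on separators that are not protected by an escape character.
    // The returned tokens are still escaped.
    static List<String> splitEscaped(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESCAPE && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i));
            } else if (c == SEPARATOR) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Two paths stored in one comma-separated value.
        String stored = escape("mr/input/glob/2008/02/{02,08}") + "," + escape("other/dir");
        for (String part : splitEscaped(stored)) {
            System.out.println(unescape(part));
        }
    }
}
```

The escape character itself has to be escaped too, otherwise a path that ends in a backslash would swallow the separator that follows it.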
  • Hairong Kuang (JIRA) at Mar 22, 2008 at 12:13 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Attachment: inputPath.patch

    A patch for review.
  • Hairong Kuang (JIRA) at Mar 22, 2008 at 12:13 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang reassigned HADOOP-3064:
    -------------------------------------

    Assignee: Hairong Kuang
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 22, 2008 at 12:23 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581228#action_12581228 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3064:
    ------------------------------------------------

    - Should we use String.replaceAll(...) instead of String.replace(...)?

    - Could you add some more complicated tests (e.g. with two or more commas, an escape char, etc.)?
  • Hairong Kuang (JIRA) at Mar 22, 2008 at 1:25 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Attachment: inputPath1.patch

    This patch adds a testcase as suggested by Nicholas. I think using replace is more efficient than replaceAll.
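
For readers comparing the two methods, here is a small sketch of how they differ when escaping a comma. The class name is hypothetical and this says nothing about which call the patch actually uses. String.replace takes both arguments literally, while String.replaceAll takes a regex and gives backslashes in the replacement a special meaning.

```java
// Compares the two String methods on a literal comma escape.
final class ReplaceVsReplaceAll {

    // String.replace treats both arguments as literal text.
    static String withReplace(String s) {
        return s.replace(",", "\\,");
    }

    // String.replaceAll treats the first argument as a regex. In the
    // replacement, a backslash escapes the next character, so a literal
    // backslash has to be written as two backslashes.
    static String withReplaceAll(String s) {
        return s.replaceAll(",", "\\\\,");
    }

    public static void main(String[] args) {
        String path = "{02,08}";
        System.out.println(withReplace(path));
        System.out.println(withReplaceAll(path));
    }
}
```

Both calls replace every occurrence, so the choice is mostly about whether the regex handling is wanted.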
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 22, 2008 at 1:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE updated HADOOP-3064:
    -------------------------------------------

    Status: Patch Available (was: Open)

    +1
  • Hadoop QA (JIRA) at Mar 23, 2008 at 11:07 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581370#action_12581370 ]

    Hadoop QA commented on HADOOP-3064:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378426/inputPath1.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc -1. The javadoc tool appears to have generated 1 warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2029/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2029/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2029/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2029/console

    This message is automatically generated.
  • Tom White (JIRA) at Mar 24, 2008 at 3:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581572#action_12581572 ]

    Tom White commented on HADOOP-3064:
    -----------------------------------

    Thanks Hairong. This patch fixes the problem I was seeing.

    +1

    It would be good if you could add some unit tests for the new public methods in StringUtils. I know there is a test for JobConf, but it would be nice if there were some direct tests of split, escapeString and unEscapeString (including corner cases).
  • Hairong Kuang (JIRA) at Mar 26, 2008 at 5:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Attachment: inputPath2.patch

    Tom, thanks for testing my previous patch and for the suggestions. I added some test cases and ended up rewriting all my code. :-) Here is the brand-new patch.
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 26, 2008 at 6:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582392#action_12582392 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3064:
    ------------------------------------------------

    In java.lang.String, "".split(",") returns an empty array but StringUtils.split("") returns {""}.

    What is the expected behavior for StringUtils.split(",,,")?

  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 26, 2008 at 8:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582424#action_12582424 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3064:
    ------------------------------------------------

    FYI, all of the following return an empty array:
    {code}
    System.out.println(Arrays.asList("".split(",")));
    System.out.println(Arrays.asList(",".split(",")));
    System.out.println(Arrays.asList(",,".split(",")));
    {code}
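
When reproducing this, it may help to print the array length as well. Arrays.asList shows an array holding a single empty string as [], which looks the same as an empty array. The sketch below (hypothetical class name) prints both; per the String.split documentation, a string in which the delimiter never matches comes back as the only element, so "" is worth checking separately from "," and ",,".

```java
import java.util.Arrays;

final class SplitLengthCheck {

    // Number of tokens java.lang.String.split produces for a comma delimiter.
    static int count(String input) {
        return input.split(",").length;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"", ",", ",,", "a,,"}) {
            // Print the length next to the list view so that [""] and []
            // can be told apart.
            System.out.println("\"" + input + "\" -> length " + count(input)
                + " " + Arrays.asList(input.split(",")));
        }
    }
}
```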
  • Hairong Kuang (JIRA) at Mar 26, 2008 at 10:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Attachment: inputPath3.patch

    OK, this patch removes all trailing empty splits, as Java does.
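
One way to picture the trailing-empty rule is the sketch below. It assumes the tokens have already been collected into a list; the helper name is hypothetical and the patch may do this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrailingEmptySketch {

    // Return a copy of tokens without the empty strings at the end.
    // Empty strings in the middle are kept.
    static List<String> dropTrailingEmpty(List<String> tokens) {
        int end = tokens.size();
        while (end > 0 && tokens.get(end - 1).isEmpty()) {
            end--;
        }
        return new ArrayList<>(tokens.subList(0, end));
    }

    public static void main(String[] args) {
        System.out.println(dropTrailingEmpty(Arrays.asList("a", "", "b", "", "")));
    }
}
```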
  • Tsz Wo (Nicholas), SZE (JIRA) at Mar 26, 2008 at 10:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582459#action_12582459 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-3064:
    ------------------------------------------------

    +1
  • Hairong Kuang (JIRA) at Mar 26, 2008 at 11:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Fix Version/s: 0.17.0
    Status: Patch Available (was: Open)
  • Hairong Kuang (JIRA) at Mar 26, 2008 at 11:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Hairong Kuang updated HADOOP-3064:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Hadoop QA (JIRA) at Mar 27, 2008 at 3:09 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582507#action_12582507 ]

    Hadoop QA commented on HADOOP-3064:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378666/inputPath3.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 6 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2072/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2072/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2072/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2072/console

    This message is automatically generated.
    Exception with file globbing closures
    -------------------------------------

    Key: HADOOP-3064
    URL: https://issues.apache.org/jira/browse/HADOOP-3064
    Project: Hadoop Core
    Issue Type: Bug
    Components: mapred
    Affects Versions: 0.16.1
    Reporter: Tom White
    Assignee: Hairong Kuang
    Fix For: 0.17.0

    Attachments: inputPath.patch, inputPath1.patch, inputPath2.patch, inputPath3.patch


  • Tom White (JIRA) at Mar 27, 2008 at 11:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-3064:
    ------------------------------

    Attachment: inputPath4.patch
    I added some testcases and ended up with rewriting all my code.
    Reminds me of Fred Brooks: "Plan to throw one away; you will anyhow." The changes look good, and it still fixes my original problem.

    The new unit tests had the arguments to assertEquals the wrong way round (it's expected value then actual value), so I've created another patch to fix that.
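The argument order matters because the failure message is built from it. The sketch below uses a hypothetical stand-in for `assertEquals(expected, actual)` rather than the real JUnit class, so that it runs without a test library; the class name `AssertOrderDemo` and the sample values are illustrative only.

```java
public class AssertOrderDemo {

    // Minimal stand-in following the JUnit convention:
    // first argument is the expected value, second is the actual value.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // Returns the failure message, or null if the assertion passes.
    static String failureMessage(Object expected, Object actual) {
        try {
            assertEquals(expected, actual);
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String expected = "mr/input/glob/2008/02/{02,08}";
        String actual = "08}"; // hypothetical wrong result from code under test

        // Correct order: the message names the right value as expected.
        System.out.println(failureMessage(expected, actual));
        // Swapped order: the test still fails, but the message is misleading.
        System.out.println(failureMessage(actual, expected));
    }
}
```

With the arguments swapped the test passes and fails in exactly the same cases, so the mistake shows up only in the failure report.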
    Exception with file globbing closures
    -------------------------------------

    Key: HADOOP-3064
    URL: https://issues.apache.org/jira/browse/HADOOP-3064
    Project: Hadoop Core
    Issue Type: Bug
    Components: mapred
    Affects Versions: 0.16.1
    Reporter: Tom White
    Assignee: Hairong Kuang
    Fix For: 0.17.0

    Attachments: inputPath.patch, inputPath1.patch, inputPath2.patch, inputPath3.patch, inputPath4.patch


  • Hairong Kuang (JIRA) at Mar 27, 2008 at 5:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582737#action_12582737 ]

    Hairong Kuang commented on HADOOP-3064:
    ---------------------------------------

    Thank you, Tom.
  • Konstantin Shvachko (JIRA) at Mar 28, 2008 at 12:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Konstantin Shvachko updated HADOOP-3064:
    ----------------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thank you Hairong.
  • Hudson (JIRA) at Mar 28, 2008 at 12:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582994#action_12582994 ]

    Hudson commented on HADOOP-3064:
    --------------------------------

    Integrated in Hadoop-trunk #444 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/444/])

Discussion Overview
group: common-dev
categories: hadoop
posted: Mar 21, '08 at 4:43p
active: Mar 28, '08 at 12:20p
posts: 21
users: 1
website: hadoop.apache.org...
irc: #hadoop