Grokbase Groups Pig dev August 2010
[piggybank] add CSV Loader
--------------------------

Key: PIG-1555
URL: https://issues.apache.org/jira/browse/PIG-1555
Project: Pig
Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
Fix For: 0.8.0


Users often ask for a CSV loader that can handle quoted commas. Let's get 'er done.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Dmitriy V. Ryaboy (JIRA) at Aug 22, 2010 at 8:38 am
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dmitriy V. Ryaboy updated PIG-1555:
    -----------------------------------

    Attachment: PIG_1555.patch

    This is loosely based on the loader by James Kebinger that he open-sourced at http://github.com/jkebinger/pig-user-defined-functions

    I ported to the new API and fixed a few bugs.

    It still doesn't support multi-line records, but the basic functionality works, including escaping quotes by doubling them, Excel-style.
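The parsing rules described above (commas inside double-quoted fields are data, and a doubled quote inside a quoted field stands for one literal quote) can be sketched as a single-line state machine. This is a minimal illustration, not the piggybank CSVLoader source; the class name `CsvLineParser` is hypothetical, and it assumes a comma delimiter, double-quote quoting, and one record per line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the piggybank CSVLoader code: splits one line of
 * CSV into fields, honoring double-quoted fields that contain commas and
 * Excel-style escaped quotes ("" inside a quoted field becomes ").
 * Multi-line records are not supported, matching the limitation above.
 */
public class CsvLineParser {

    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // Doubled quote inside a quoted field: emit one literal quote.
                        current.append('"');
                        i += 2;
                        continue;
                    }
                    // Single quote inside quotes closes the quoted section.
                    inQuotes = false;
                } else {
                    // Commas and everything else are literal while quoted.
                    current.append(c);
                }
            } else if (c == '"') {
                // Simplification: any quote outside quotes opens a quoted section.
                inQuotes = true;
            } else if (c == ',') {
                // Unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
            i++;
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"He said \"\"hi\"\"\"";
        for (String field : parse(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Running `main` prints three fields, `[1]`, `[Smith, John]` and `[He said "hi"]`: the quoted comma stays inside the second field and the doubled quotes collapse to single ones.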
  • Dmitriy V. Ryaboy (JIRA) at Aug 22, 2010 at 8:38 am
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dmitriy V. Ryaboy updated PIG-1555:
    -----------------------------------

    Status: Patch Available (was: Open)
  • Alan Gates (JIRA) at Aug 23, 2010 at 7:39 pm
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901556#action_12901556 ]

    Alan Gates commented on PIG-1555:
    ---------------------------------

    +1

    If you have a chance at some point, I'd be curious to see how this performs compared to PigStorage, in particular whether dealing with escaping carries a substantial cost.
  • Dmitriy V. Ryaboy (JIRA) at Aug 24, 2010 at 1:50 am
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901697#action_12901697 ]

    Dmitriy V. Ryaboy commented on PIG-1555:
    ----------------------------------------

    Alan,
    The differences I observe when running on actual CSV files are within the margin of error; sometimes CSVLoader comes out on top. Then again, I am reading actual CSVs with quoted commas, so the similar runtimes may be partly due to PigStorage splitting on those commas and allocating extra tuple fields.

    -D
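The explanation above turns on PigStorage treating every comma as a field separator, even inside quotes. The sketch below is hypothetical illustration code, not taken from Pig: it counts fields on one line both ways, a delimiter-only split versus a quote-aware count, to show how many extra fields a naive split produces when values contain quoted commas.

```java
/**
 * Hypothetical sketch: compares field counts from a delimiter-only split
 * (what a comma-delimited loader that ignores quotes would see) with a
 * quote-aware count that skips commas inside double quotes.
 */
public class FieldCountComparison {

    // Naive count: every comma starts a new field.
    public static int naiveFieldCount(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return line.split(",", -1).length;
    }

    // Quote-aware count: commas inside double quotes are not separators.
    // An Excel-style doubled quote ("") toggles the state twice, netting out.
    public static int quoteAwareFieldCount(String line) {
        int count = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "42,\"Portland, OR\",\"Smith, Jane\"";
        System.out.println("naive: " + naiveFieldCount(line));
        System.out.println("quote-aware: " + quoteAwareFieldCount(line));
    }
}
```

For the sample line the naive split reports 5 fields and the quote-aware count reports 3, so a quote-unaware loader would build tuples with two extra fields per such record.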
  • Dmitriy V. Ryaboy (JIRA) at Aug 26, 2010 at 8:20 pm
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dmitriy V. Ryaboy updated PIG-1555:
    -----------------------------------

    Status: Resolved (was: Patch Available)
    Release Note:
    CSVLoader can be used to load comma-separated value files.
    It properly handles commas included inside quoted fields, and quotes escaped by preceding them with another quote character (Excel-style).
    CSVLoader only handles single-line entries; quoting a multi-line value will *not* work.
    Resolution: Fixed
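To make the single-line limitation in the release note concrete, the hypothetical sketch below (not part of CSVLoader) checks whether one physical line leaves a quoted field open. It assumes quotes appear only around fields or doubled inside them, so an odd quote count means a quote was opened but not closed on that line.

```java
/**
 * Hypothetical sketch, not CSVLoader code: reports whether a single line
 * ends inside an unterminated double-quoted field. Assumes quotes appear
 * only as field delimiters or as Excel-style doubled escapes ("").
 */
public class OpenQuoteCheck {

    public static boolean endsInsideQuotes(String line) {
        int quotes = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                quotes++;
            }
        }
        // Escaped quotes add two each, so an odd total means an open quote.
        return quotes % 2 != 0;
    }

    public static void main(String[] args) {
        // One logical record whose second field spans two physical lines.
        String[] physicalLines = {
            "1,\"first line of a note",
            "second line of the note\",done"
        };
        for (String l : physicalLines) {
            System.out.println(endsInsideQuotes(l) + "  <- " + l);
        }
    }
}
```

Both physical lines print `true`: examined in isolation, each half of the record looks like an unterminated quote, which is why a loader that reads one line at a time cannot reassemble a quoted multi-line value.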
