I am having issues loading a delimited CSV file. Here is a sample of the data:
"1", "123456", " ", "Charlotte, NC"

The delimiter is a comma (,)
and fields are enclosed in double quotes (").

Does anyone have syntax for loading this in Hive?

Thanks, Ram


  • Mark Grover at Nov 28, 2012 at 4:17 pm
    Moving to user@hive.apache.org since this is an Apache Hive-related question.

    Hi Ram,
    Hive doesn't provide a way to do this out of the box. I have seen other
    people have the same request before (and more, e.g. use of escape
    characters to embed the enclosing characters, etc.) and have just created a
    Hive JIRA to discuss and implement this.
    https://issues.apache.org/jira/browse/HIVE-3751

    However, there are several ways I can think of to work around this for now.
    They are:
    1. Pre-process the files to get rid of the enclosing characters ("). This
    may or may not be scalable depending on your implementation, but it would
    be fairly simple to implement.
    2. Create a staging Hive table with the enclosing characters as part of
    the data. Then, you could do one of three things with this staging table:
    2a. Create another Hive table that gets populated by a Hive query on the
    staging table. The query basically strips out the enclosing characters.
    2b. Create a view on the existing Hive staging table that presents the
    data after removing the enclosing characters.
    2c. Use the existing RegexSerDe when creating the Hive staging table to
    get rid of the enclosing characters when serializing/deserializing the data
    from the staging table.
    3. Create a custom InputFormat or SerDe that supports enclosing characters
    in delimited fields.
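
    As an illustration of option 1, here is a minimal pre-processing sketch
    in Python. It is not part of the original reply. It assumes the input
    looks like the sample row (comma-delimited, double-quoted, with a space
    after each comma) and that the target Hive table uses Hive's default
    text field delimiter, Ctrl-A (\x01). The function name strip_enclosures
    is hypothetical. It also assumes no field contains a newline or the
    output delimiter itself.

```python
import csv
import io

# Ctrl-A is Hive's default field delimiter for text tables.
HIVE_DEFAULT_DELIMITER = "\x01"


def strip_enclosures(src, dst, out_delimiter=HIVE_DEFAULT_DELIMITER):
    """Read quoted CSV rows from src and write unquoted rows to dst.

    Commas inside quoted fields (e.g. "Charlotte, NC") are preserved as
    data because the output uses a different delimiter.
    """
    # skipinitialspace=True makes the parser ignore the space that follows
    # each comma, so the opening quote of the next field is still recognized.
    reader = csv.reader(src, delimiter=",", quotechar='"', skipinitialspace=True)
    for row in reader:
        dst.write(out_delimiter.join(row) + "\n")


if __name__ == "__main__":
    sample = io.StringIO('"1", "123456", " ", "Charlotte, NC"\n')
    out = io.StringIO()
    strip_enclosures(sample, out)
    # Show the fields recovered from the single output line.
    print(out.getvalue().rstrip("\n").split(HIVE_DEFAULT_DELIMITER))
```

    The cleaned file could then be loaded into a plain text table that uses
    the default delimiters. For large inputs the same logic would have to
    run in a distributed job rather than on a single machine, which is the
    scalability caveat mentioned in option 1.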

    #3 above will be an answer to the JIRA I just created. So, the community
    would be grateful to you if you go that route and contribute it back. Of
    course, feel free to choose another option if that works better for you.

    For your reference, there is a similar thread that I had replied to at:
    http://mail-archives.apache.org/mod_mbox/hive-user/201204.mbox/%3CCAENxBwwrZrqBSJXtJHpqc_FfcZvwRMoaT9W7dR=JGtYJoXPqLw@mail.gmail.com%3E

    Good luck!
    Mark
    On Wed, Nov 28, 2012 at 7:25 AM, Ram Krishnamurthy wrote:

    I am having issues loading a delimited csv. here is a sample data
    "1", "123456", " ", "Charlotte, NC"

    The delimiter is ,
    and fields are enclosed by " "

    Does anyone have syntax for loading this in Hive?

    Thanks, Ram
