FAQ
I'm trying to pass a FALSE value through a custom transform script to another
table, like so:

FROM (
  FROM downloads
  SELECT project, file, os, FALSE AS folder, country, dt
  WHERE dt='2010-05-14'
  DISTRIBUTE BY project
  SORT BY project ASC, file ASC
) b
INSERT OVERWRITE TABLE dl_day PARTITION (dt='2010-05-14', project)
SELECT TRANSFORM(file, os, country, folder, dt, project)
  USING 'transformwrap reduce.py --verbose'
  AS (file, downloads, os, folder, country, project)

describe dl_day
['file', 'string', '']
['downloads', 'int', '']
['os', 'string', '']
['country', 'string', '']
['folder', 'boolean', '']
['dt', 'string', '']
['project', 'string', '']

When I log the 'folder' value from inside reduce.py, it shows:

2010-10-12 15:32:10,914 - dstat - INFO - reduce to stdout, h[folder]:

i.e., an empty string. But when the INSERT executes, it seems to treat the
value as TRUE (or the string 'true'):

select folder from dl_day
['true']
['true']
['true']
['true']
...

How can I preserve the FALSE value through the transform script?

Thanks,
-L

  • Dave Brondsema at Oct 13, 2010 at 5:59 pm
    Transform scripts only output text, so Hive has to convert from string to
    the column's data type (boolean in this case). So if you send an empty
    string "", that will be converted to boolean FALSE.

    FYI, on the way in to a transform script, booleans come through as strings
    "true" and "false".

    On Tue, Oct 12, 2010 at 12:17 PM, Luke Crouch wrote:
    > I'm trying to pass a FALSE value [...]


    --
    Dave Brondsema
    Software Engineer
    Geeknet

    www.geek.net
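
For anyone landing on this thread later, here is a minimal Python sketch of the round trip described in the reply. It is not the actual reduce.py from the question. It assumes the behaviour stated above (booleans arrive as the strings "true" and "false", and only an empty string is converted back to FALSE) and the default tab-separated rows that Hive streams to a transform script. The helper names and the `bool_index` parameter are made up for illustration.

```python
import sys


def parse_hive_bool(text):
    """Read a boolean field as it arrives from Hive: the string "true" or "false"."""
    return text.strip().lower() == "true"


def format_hive_bool(value):
    """Encode a boolean for output back to Hive.

    Assumption taken from the reply: an empty string is converted to FALSE,
    so FALSE is written as "" and TRUE as the non-empty string "true".
    """
    return "true" if value else ""


def transform_line(line, bool_index):
    """Re-encode the boolean column of one tab-separated input row."""
    fields = line.rstrip("\n").split("\t")
    fields[bool_index] = format_hive_bool(parse_hive_bool(fields[bool_index]))
    return "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout, bool_index=3):
    # bool_index=3 is the position of 'folder' in
    # TRANSFORM(file, os, country, folder, dt, project); adjust for other queries.
    for line in stdin:
        stdout.write(transform_line(line, bool_index) + "\n")
```

To use it as a transform script, call `main()` at the bottom of the file. This sketch passes every input column straight through, so a real script would still need to emit its fields in the order listed in the AS (...) clause.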

Discussion Overview
group: user @
categories: hive, hadoop
posted: Oct 12, '10 at 4:17p
active: Oct 13, '10 at 5:59p
posts: 2
users: 2
website: hive.apache.org