Generate and accept JSON as the input-output format from mappers and reducers

Key: HADOOP-4592
URL: https://issues.apache.org/jira/browse/HADOOP-4592
Project: Hadoop Core
Issue Type: Wish
Components: contrib/hive
Reporter: Venky Iyer

set mapred.data.format=JSON;
MAP USING 'python filter.py'

would mean that filter.py receives a JSON-formatted dictionary of the columns for each record, instead of a tab-delimited string:

{ "column1": "value1", "column2": [1, 2, 3] }, etc.

It would in turn produce JSON.
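As a rough illustration of what filter.py could look like under this proposal, here is a minimal sketch. It assumes one JSON object per line on stdin and one per line on stdout; the issue does not specify the framing, and the keep() predicate is purely hypothetical.

```python
import json
import sys


def keep(record):
    # Hypothetical predicate: keep records whose "column2" list is non-empty.
    return bool(record.get("column2"))


def transform(lines):
    """Yield one JSON line for every input record that passes keep()."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # columns arrive as a dict, types preserved
        if keep(record):
            yield json.dumps(record)


if __name__ == "__main__":
    for out_line in transform(sys.stdin):
        sys.stdout.write(out_line + "\n")
```

The script never splits on tabs or looks up column names; it only works with the decoded dictionary.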

The conversion should happen locally, so that JSON is never transmitted back and forth over the network: it would be generated on the fly on the mapper node, and the script's output would be converted back to the standard format used (tab-delimited, I assume).
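The node-local conversion could be sketched roughly as follows. This is only an illustration in Python of the round trip, not how Hive would implement it; SCHEMA, row_to_json, and json_to_row are hypothetical names, and the sketch assumes flat columns with simple scalar types.

```python
import json

# Hypothetical schema: (column name, Python type) pairs known to the framework.
SCHEMA = [("user", str), ("clicks", int)]


def row_to_json(row, schema=SCHEMA):
    """Convert one tab-delimited row into a JSON object with typed values."""
    fields = row.rstrip("\n").split("\t")
    record = {name: cast(value) for (name, cast), value in zip(schema, fields)}
    return json.dumps(record)


def json_to_row(text, schema=SCHEMA):
    """Convert the script's JSON output back into a tab-delimited row."""
    record = json.loads(text)
    return "\t".join(str(record[name]) for name, _ in schema)
```

Both functions would run on the mapper node, on either side of the user script, so only the tab-delimited form ever leaves the node.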

This seems like the simplest way to encode type information in the input to mappers. It would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits each line, and builds a dictionary of {column name: value} for every record.
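For comparison, the boilerplate in question typically looks something like the sketch below. The COLUMNS list is a hypothetical example; in practice each script has to hard-code the column names it expects.

```python
import sys

# Hypothetical list of expected input columns, repeated in every script.
COLUMNS = ["column1", "column2"]


def parse_records(lines, columns=COLUMNS):
    """Split tab-delimited lines and pair each value with its column name."""
    for line in lines:
        values = line.rstrip("\n").split("\t")
        yield dict(zip(columns, values))


if __name__ == "__main__":
    for record in parse_records(sys.stdin):
        # Every value is a plain string here; type information is lost.
        sys.stdout.write("\t".join(record[name] for name in COLUMNS) + "\n")
```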

Output schemas (USING '' AS ...) might also become redundant with this in place, but I'm not sure whether that is doable.


Posted to common-dev by Venky Iyer (JIRA) on Nov 5, '08 at 9:42a.


