Class <? extends T> Deserializer.getRealClass() method to return the actual class of the objects from a deserializer
--------------------------------------------------------------------------------------------------------------------

Key: HADOOP-4192
URL: https://issues.apache.org/jira/browse/HADOOP-4192
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: Pete Wyckoff


A deserializer can return a subtype of the type it is declared to return. In that case, the caller may need to know the actual type of the objects the deserializer will return.

For example:
{code}
public class RecordIODeserializer implements Deserializer<Record> {
  // Gets the specific record io class from a configuration variable set by the mapper.
  private Class<? extends Record> getMyRecordClass(Configuration conf) { ... }

  // t cannot be a plain Record; it must be an instance of a subclass of Record.
  public Record deserialize(Record t) throws IOException { return ... }
}
{code}

The caller needs to instantiate the right Record subclass in order to call deserialize(), or to implement createKey()/createValue() in a RecordReader. In this case, only the Deserializer knows the actual type of the records being returned.

One could instead parameterize TRecordIODeserializer, but the actual type of the object being deserialized is not known until runtime. One would still end up with <? extends Record>, and the caller would still need to get the class.

I propose adding a getRealClass() method that returns the actual subclass being used.
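
A minimal sketch of what the proposal could look like is given here. It is not the actual Hadoop API: the `Deserializer` interface is a simplified stand-in (the real one also has open/close methods), `Record` and `MyUserIDRecord` are placeholder classes, and the constructor argument and the `newInstanceFor` helper are assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical, simplified stand-ins; none of these are the real Hadoop types.
final class GetRealClassSketch {

    /** Stand-in for the record base type. */
    static class Record {}

    /** A concrete record type that a job might be configured to read. */
    static class MyUserIDRecord extends Record {
        long userId;
    }

    /** Simplified deserializer interface with the proposed method added. */
    interface Deserializer<T> {
        T deserialize(T reuse) throws IOException;

        /** Proposed in this issue: the concrete class this deserializer produces. */
        Class<? extends T> getRealClass();
    }

    /** Declared over Record, but configured with one concrete subclass. */
    static final class RecordIODeserializer implements Deserializer<Record> {
        private final Class<? extends Record> recordClass;

        // Assumption: the concrete class is handed in directly instead of
        // being looked up in a Configuration.
        RecordIODeserializer(Class<? extends Record> recordClass) {
            this.recordClass = recordClass;
        }

        @Override
        public Record deserialize(Record reuse) throws IOException {
            if (!recordClass.isInstance(reuse)) {
                throw new IOException("expected an instance of " + recordClass.getName());
            }
            // A real implementation would fill 'reuse' from the input stream here.
            return reuse;
        }

        @Override
        public Class<? extends Record> getRealClass() {
            return recordClass;
        }
    }

    /** Caller side: create the right subclass without knowing it statically. */
    static <T> T newInstanceFor(Deserializer<T> d) throws ReflectiveOperationException {
        return d.getRealClass().getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Deserializer<Record> d = new RecordIODeserializer(MyUserIDRecord.class);
        Record r = newInstanceFor(d);
        System.out.println(d.deserialize(r).getClass().getSimpleName());
    }
}
```

The caller only ever sees `Deserializer<Record>`, yet it can still obtain an object of the configured subclass to pass to `deserialize`.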



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Pete Wyckoff (JIRA) at Sep 16, 2008 at 7:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pete Wyckoff updated HADOOP-4192:
    ---------------------------------

    Description:
    Note: this use case applies only to non-self-describing files containing Serialization framework records, where the Serialization class and the actual type of the records to be deserialized are configured higher up through the JobConf.

    It is motivated by the need to create a generic FlatFileDeserializerRecordReader that can be configured to use any Serialization implementation through the JobConf.

    Since a deserializer can return a subtype of the type it is instantiated to return, we can create generic Deserializers for a base type (e.g., Writable, Record, Thrift TBase), so that the RecordReader need not be specific to any of them.

    In that case, to implement RecordReader.getValueClass(), the generic RecordReader needs to query the value class from the Deserializer.

    And since this RecordReader is generic, even the Serialization implementation it is going to use should come from the JobConf, as should the specific class being deserialized, e.g., Record/MyUserIDRecord or Writable/LongWritable.

    Otherwise, the RecordReader would need to know how the Serialization and the Deserializer get their configuration info in order to implement getValueClass().

    I think a much cleaner way is to implement getRealClass().
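
As a rough illustration of the generic reader described in this update, the sketch below shows a reader that answers getValueClass() and createValue() purely by asking its deserializer. Everything here is an assumption for illustration: `FlatFileReader`, `WritableDeserializer`, and the `Writable`/`LongWritable` classes are simplified stand-ins, not the Hadoop classes of the same names.

```java
import java.io.IOException;

// Hypothetical stand-ins only; these are not the real Hadoop classes.
final class GenericReaderSketch {

    /** Simplified deserializer interface including the proposed getRealClass(). */
    interface Deserializer<T> {
        T deserialize(T reuse) throws IOException;
        Class<? extends T> getRealClass();
    }

    /** A generic reader that never names a concrete value type itself. */
    static final class FlatFileReader<V> {
        private final Deserializer<V> deserializer;

        FlatFileReader(Deserializer<V> deserializer) {
            this.deserializer = deserializer;
        }

        /** The value class is delegated to the deserializer. */
        Class<? extends V> getValueClass() {
            return deserializer.getRealClass();
        }

        /** Creates an empty value of the concrete class via its no-arg constructor. */
        V createValue() {
            try {
                return getValueClass().getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + getValueClass(), e);
            }
        }

        /** Fills 'value' with the next record; this sketch has no real input. */
        boolean next(V value) throws IOException {
            deserializer.deserialize(value);
            return true;
        }
    }

    /** Placeholder base type and one concrete subtype. */
    static class Writable {}
    static class LongWritable extends Writable {
        long value;
    }

    /** Deserializer declared over the base type, configured with a subclass. */
    static final class WritableDeserializer implements Deserializer<Writable> {
        private final Class<? extends Writable> valueClass;

        WritableDeserializer(Class<? extends Writable> valueClass) {
            this.valueClass = valueClass;
        }

        @Override
        public Writable deserialize(Writable reuse) throws IOException {
            if (!valueClass.isInstance(reuse)) {
                throw new IOException("expected an instance of " + valueClass.getName());
            }
            return reuse; // a real implementation would read fields from a stream
        }

        @Override
        public Class<? extends Writable> getRealClass() {
            return valueClass;
        }
    }

    public static void main(String[] args) throws IOException {
        FlatFileReader<Writable> reader =
            new FlatFileReader<>(new WritableDeserializer(LongWritable.class));
        Writable value = reader.createValue();
        reader.next(value);
        System.out.println(reader.getValueClass().getSimpleName());
    }
}
```

With this shape, the reader does not need to know which configuration variables the Serialization or the Deserializer read.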


  • Pete Wyckoff (JIRA) at Sep 18, 2008 at 3:14 am
    [ https://issues.apache.org/jira/browse/HADOOP-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pete Wyckoff resolved HADOOP-4192.
    ----------------------------------

    Resolution: Invalid

    This should be attacked the other way around: provide a way of getting the serialization context info (the serialization class and the class to be deserialized) and pass these down.
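
One possible reading of this alternative is sketched below, under stated assumptions. A plain `Map<String, String>` stands in for the JobConf, the two configuration keys are invented for the example, and `SerializationContext` and `GenericReader` are hypothetical names; the issue does not say how the context would actually be represented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the keys and class names below are invented for illustration.
final class SerializationContextSketch {

    static final String SERIALIZATION_KEY = "example.serialization.class"; // assumed key
    static final String VALUE_CLASS_KEY = "example.value.class";           // assumed key

    /** The two pieces of context named in the resolution comment. */
    static final class SerializationContext {
        final String serializationClassName;
        final Class<?> valueClass;

        SerializationContext(String serializationClassName, Class<?> valueClass) {
            this.serializationClassName = serializationClassName;
            this.valueClass = valueClass;
        }

        /** Reads both entries from the configuration in one place. */
        static SerializationContext fromConf(Map<String, String> conf)
                throws ClassNotFoundException {
            String serialization = conf.get(SERIALIZATION_KEY);
            String valueClassName = conf.get(VALUE_CLASS_KEY);
            if (serialization == null || valueClassName == null) {
                throw new IllegalArgumentException("serialization context is incomplete");
            }
            return new SerializationContext(serialization, Class.forName(valueClassName));
        }
    }

    /** The reader receives the context and never queries the deserializer for it. */
    static final class GenericReader {
        private final SerializationContext context;

        GenericReader(SerializationContext context) {
            this.context = context;
        }

        Class<?> getValueClass() {
            return context.valueClass;
        }

        Object createValue() throws ReflectiveOperationException {
            return context.valueClass.getDeclaredConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put(SERIALIZATION_KEY, "example.MySerialization"); // placeholder name
        conf.put(VALUE_CLASS_KEY, "java.lang.StringBuilder");   // any class with a no-arg constructor
        GenericReader reader = new GenericReader(SerializationContext.fromConf(conf));
        System.out.println(reader.getValueClass().getName());
    }
}
```

In this arrangement the same context object could be handed to both the reader and whatever creates the deserializer, so neither has to ask the other for the concrete class.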


Discussion Overview
Group: common-dev @
Category: hadoop
Posted: Sep 16, '08 at 5:57p
Active: Sep 18, '08 at 3:14a
Posts: 3
Users: 1
Website: hadoop.apache.org...
IRC: #hadoop
