Make BinStorage declare its schema
----------------------------------

Key: PIG-468
URL: https://issues.apache.org/jira/browse/PIG-468
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Olga Natkovich
Fix For: types_branch


Currently, BinStorage breaks the rule that a loader should produce bytearrays unless it tells Pig what types it is producing. This causes runtime problems, as the frontend assumes the data to be of one type while in fact it is of a different type.

The Loader interface has a way to specify a schema via the determineSchema API. BinStorage needs to implement this. Also, since this interface has not been used before, the plumbing might also need to be adjusted.
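
To make the rule concrete, here is a minimal sketch of the frontend's assumption. It does not use Pig's real classes: FieldTypeRule, assumedTypes and the plain type-name strings are hypothetical stand-ins chosen only for illustration. A loader that declares nothing is treated as producing bytearrays; a loader that declares a schema has that schema taken at face value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Pig.
class FieldTypeRule {

    /**
     * Types the frontend may assume for a loader's output columns.
     * A null declaredSchema means the loader did not declare anything.
     */
    static List<String> assumedTypes(List<String> declaredSchema, int columnCount) {
        if (declaredSchema != null) {
            return declaredSchema; // the loader told the frontend its types
        }
        List<String> types = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            types.add("bytearray"); // nothing declared: assume bytearray
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(assumedTypes(null, 2));                         // [bytearray, bytearray]
        System.out.println(assumedTypes(List.of("chararray", "int"), 2)); // [chararray, int]
    }
}
```

The problem described above corresponds to the first case when the data is not actually bytearrays: the frontend plans for bytearray while the loader hands back typed values.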

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Pradeep Kamath (JIRA) at Oct 14, 2008 at 9:24 pm
    [ https://issues.apache.org/jira/browse/PIG-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639593#action_12639593 ]

    Pradeep Kamath commented on PIG-468:
    ------------------------------------

    Attached patch, some notes on the patch:
    - determineSchema() was never being called from LOLoad, since the schemaFile was always passed as null from the parser. I have changed the signature of this method so that implementations can open the input file, if they need to, in order to determine the schema. Here is the new API:
    {code}
    /**
    * Find the schema from the loader. This function will be called at parse time
    * (not run time) to see if the loader can provide a schema for the data. The
    * loader may be able to do this if the data is self describing (e.g. JSON). If
    * the loader cannot determine the schema, it can return a null.
    * LoadFunc implementations which need to open the input "fileName", can use
    * FileLocalizer.open(String fileName, ExecType execType, DataStorage storage) to get
    * an InputStream which they can use to initialize their loader implementation. They
    * can then use this to read the input data to discover the schema. Note: this will
    * work only when the fileName represents a file on Local File System or Hadoop file
    * system
    * @param fileName Name of the file to be read (this will be the same as the filename
    * in the "load" statement of the script)
    * @param execType - execution mode of the pig script - one of ExecType.LOCAL or ExecType.MAPREDUCE
    * @param storage - the DataStorage object corresponding to the execType
    * @return a Schema describing the data if possible, or null otherwise.
    * @throws IOException
    */
    public Schema determineSchema(String fileName, ExecType execType, DataStorage storage) throws IOException;
    {code}

    As noted in the comments above, I have also added a static helper method in FileLocalizer. LoadFunc implementations which need to open the input "fileName" can use FileLocalizer.open(String fileName, ExecType execType, DataStorage storage) to get an InputStream which they can use to initialize their loader implementation. There are some related changes in TypeCastInserter and Schema to handle a schema specified in the load statement (which would be supplied in addition to the one determined by determineSchema()).

    - Reviewers, please also look at https://issues.apache.org/jira/browse/PIG-492 to make sure that there are no blockers for that issue - it seems like for that issue we would need to serialize each of the loaders in the query and send it to the backend, which may not be trivial.
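
As a rough illustration of the determineSchema() contract from the first note (open the input, inspect it, return a schema or null), here is a self-contained sketch. It deliberately avoids Pig's real LoadFunc, Schema, ExecType, DataStorage and FileLocalizer types: SelfDescribingLoaderSketch, its open() helper and the "name:type" header line are all hypothetical, invented for this example, and say nothing about how BinStorage itself discovers its schema.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader; Pig's real classes are NOT used here.
class SelfDescribingLoaderSketch {

    /** Stand-in for FileLocalizer.open(...): serves in-memory data instead of a real file. */
    static InputStream open(String fileName) {
        // Assumption for the demo: only ".typed" files carry a header line.
        String data = fileName.endsWith(".typed")
                ? "name:chararray,age:int\nalice,30\n"
                : "";
        return new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Reads the first line at parse time and returns the field descriptions,
     * or null when the schema cannot be determined.
     */
    static List<String> determineSchema(String fileName) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(fileName), StandardCharsets.UTF_8))) {
            String header = reader.readLine();
            if (header == null || header.isEmpty()) {
                return null; // not self-describing: the caller falls back to bytearray
            }
            List<String> fields = new ArrayList<>();
            for (String field : header.split(",")) {
                fields.add(field.trim());
            }
            return fields;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(determineSchema("people.typed")); // [name:chararray, age:int]
        System.out.println(determineSchema("empty.dat"));    // null
    }
}
```

Returning null rather than throwing keeps the "cannot determine" case distinct from a genuine I/O failure, which matches the @return and @throws wording in the Javadoc above.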
  • Pradeep Kamath (JIRA) at Oct 14, 2008 at 9:26 pm
    [ https://issues.apache.org/jira/browse/PIG-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pradeep Kamath updated PIG-468:
    -------------------------------

    Attachment: PIG-468.patch
  • Pradeep Kamath (JIRA) at Oct 14, 2008 at 9:26 pm
    [ https://issues.apache.org/jira/browse/PIG-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Pradeep Kamath updated PIG-468:
    -------------------------------

    Assignee: Pradeep Kamath
    Status: Patch Available (was: Open)
  • Olga Natkovich (JIRA) at Oct 14, 2008 at 10:28 pm
    [ https://issues.apache.org/jira/browse/PIG-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-468:
    -------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    patch committed; thanks, pradeep

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Sep 30, '08 at 11:20p
active: Oct 14, '08 at 10:28p
posts: 5
users: 1
website: pig.apache.org
