Grokbase Groups Pig dev March 2010
Add DateTime Support to Pig
---------------------------

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.8.0
Reporter: Russell Jurney
Fix For: 0.7.0


Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive.

Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted?


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Olga Natkovich (JIRA) at Mar 22, 2010 at 6:51 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-1314:
    --------------------------------

    Affects Version/s: (was: 0.8.0)
    0.7.0
    Fix Version/s: (was: 0.7.0)
    0.8.0
    Add DateTime Support to Pig
    ---------------------------

    Key: PIG-1314
    URL: https://issues.apache.org/jira/browse/PIG-1314
    Project: Pig
    Issue Type: Bug
    Components: data
    Affects Versions: 0.7.0
    Reporter: Russell Jurney
    Fix For: 0.8.0

    Original Estimate: 672h
    Remaining Estimate: 672h

  • Alan Gates (JIRA) at Mar 22, 2010 at 7:03 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848285#action_12848285 ]

    Alan Gates commented on PIG-1314:
    ---------------------------------

    Major +1. Adding DateTime as a Pig primitive is definitely a good idea. It's on our list of things to do (http://wiki.apache.org/pig/PigJournal). A brief overview of the work to be done:

    # Add support in the parser, both for declaring an input to be of type datetime and for datetime constants.
    # Add support in TypeChecker for datetime types, including any allowed type promotions (i.e., implicit casts).
    # Change the LoadCaster interface to include a bytesToDateTime method, and add the method to the default implementation.
    # Determine which builtin UDFs we want for datetime and get agreement from the community. Implement these UDFs.
    # Implement any allowed cast operators for datetime (probably just string <-> datetime).
    # Implement a datetime class that represents datetime in memory. This needs to implement WritableComparable so that it can be serialized and compared in Hadoop.
    # Implement a raw comparator for the type so it can be used as a key in group bys and joins.
    # Change physical operators and builtin UDFs to handle processing of datetime types.
    # Change data conversion and type discovery routines in DataType.
    # And, of course, add prolific tests.
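As an illustration of the in-memory class and raw comparator items in the list above, here is a minimal sketch. It is not code from this thread or from Pig. To stay self-contained it does not depend on Hadoop: `DateTimeSketch`, `compareRaw`, `toBytes`, and `fromBytes` are hypothetical names, and `write`/`readFields` only mirror the shape of Hadoop's `Writable` methods. Storing the value as epoch milliseconds in a `long` is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of a serializable, comparable datetime value.
// Assumption: the value is stored as milliseconds since the Unix epoch.
final class DateTimeSketch implements Comparable<DateTimeSketch> {
    private long epochMillis;

    DateTimeSketch(long epochMillis) { this.epochMillis = epochMillis; }

    long getEpochMillis() { return epochMillis; }

    // Same shape as Writable.write(DataOutput): 8 bytes, big-endian.
    void write(DataOutput out) throws IOException { out.writeLong(epochMillis); }

    // Same shape as Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException { epochMillis = in.readLong(); }

    @Override
    public int compareTo(DateTimeSketch other) {
        return Long.compare(epochMillis, other.epochMillis);
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(8);
            write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DateTimeSketch fromBytes(byte[] bytes) {
        try {
            DateTimeSketch d = new DateTimeSketch(0L);
            d.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Raw comparison: orders two serialized values by decoding only the
    // 8-byte long, without allocating DateTimeSketch objects.
    static int compareRaw(byte[] a, int aOff, byte[] b, int bOff) {
        return Long.compare(readLong(a, aOff), readLong(b, bOff));
    }

    private static long readLong(byte[] bytes, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[off + i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        DateTimeSketch earlier = new DateTimeSketch(1269283860000L);
        DateTimeSketch later = new DateTimeSketch(1269333420000L);
        System.out.println(earlier.compareTo(later));
        System.out.println(compareRaw(earlier.toBytes(), 0, later.toBytes(), 0));
        System.out.println(fromBytes(later.toBytes()).getEpochMillis());
    }
}
```

A real implementation would implement Hadoop's interfaces directly; the point here is only that a fixed-width encoding lets keys be compared in their serialized form.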

    The other question is backward compatibility. I can think of only two backward incompatible changes:
    # Addition of bytesToDateTime in the LoadCaster interface. Given that this will only require a change if people recompile their implementation, and AFAIK there are no implementations of LoadCaster before our default implementation, I think this is ok.
    # Changes to Pig Latin to specify a field as of type date, plus however we denote datetime strings. We need to make these as unobtrusive as possible, but again I think it will be ok, though we'll need to get community buy-in on it.

    Would such a patch be accepted? If it's of good quality and deals with backward compatibility concerns, certainly. In time for 0.8, I don't know. We try to do a release every three months, with a feature cutoff about a month before release (give or take). Branching and feature cutoff for 0.7 is today, so branching and feature cutoff for 0.8 will probably be in June.

    If you want to pursue this, the first step should be a brief design that says how you'll go about doing it. It should cover things like: Which date format will you use (SQL, something else)? Which date functions do you think should be built in? How do you plan to store this type in memory? Are there existing datetime libraries you can leverage or incorporate to avoid reinventing the wheel? It's easiest to write up the design on Pig's wiki and then link to it on this bug. This will give users and developers a chance to review your thoughts and give feedback.

  • Russell Jurney (JIRA) at Mar 22, 2010 at 9:31 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848356#action_12848356 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    Thanks, Alan. That is quite helpful. Let me look into it and see about feasibility.

    What about durations as well? http://en.wikipedia.org/wiki/ISO_8601#Durations ISO8601 durations would be very handy in enabling use of pig operators on datetimes via +/-, etc. This might be something to do later, though.
  • Alan Gates (JIRA) at Mar 22, 2010 at 10:59 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848422#action_12848422 ]

    Alan Gates commented on PIG-1314:
    ---------------------------------

    I think durations would be useful, and others have mentioned to me that they'd like to have them. As you note, this might be a good phase 2 addition, as getting datetime in alone will be a fair chunk of work.
  • Russell Jurney (JIRA) at Mar 23, 2010 at 8:37 am
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848618#action_12848618 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    I would not say this blocks PIG-1310 at all - the UDFs there simply treat ISO dates as strings, which works reasonably well. They should also handle Long Unix times, and will in the next patch. In any case, this isn't a blocker to that ticket, for which a patch was just submitted.
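One reason treating ISO dates as strings works reasonably well is that ISO 8601 timestamps in a single fixed format and time zone sort the same way as strings and as instants. The sketch below illustrates that, along with conversion from a Unix time held in a long. It uses the JDK's `java.time` rather than the PIG-1310 UDFs, and `IsoStringOrder`, `sameOrder`, and `unixToIso` are hypothetical names for the example.

```java
import java.time.Instant;

// Hypothetical illustration; not the PIG-1310 UDF code.
final class IsoStringOrder {
    // True when plain string order and chronological order agree.
    // Assumes both inputs use the same ISO 8601 layout and the "Z" (UTC) suffix.
    static boolean sameOrder(String a, String b) {
        int byString = Integer.signum(a.compareTo(b));
        int byTime = Integer.signum(Instant.parse(a).compareTo(Instant.parse(b)));
        return byString == byTime;
    }

    // Converts a Unix time in seconds to ISO 8601 text in UTC.
    static String unixToIso(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    public static void main(String[] args) {
        System.out.println(sameOrder("2010-03-22T18:51:00Z", "2010-03-23T08:37:00Z"));
        System.out.println(unixToIso(1269283860L));
    }
}
```

The equivalence breaks down once strings mix time zone offsets or precisions, which is one of the arguments for a real datetime type.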
  • Russell Jurney (JIRA) at May 29, 2010 at 4:21 am
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873234#action_12873234 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    As a first pass, I am going to add Boolean, which should be easier than DateTime, but will inform this implementation. See PIG-1429.
  • Russell Jurney (JIRA) at May 30, 2010 at 5:58 am
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873382#action_12873382 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    Ok, thinking about really doing this soon, after Boolean. I'd like to add two new primitives to Pig - DateTime and Duration.

    I'd do this on the wiki, but I don't have edit access. Can someone please grant the ability to make a new page to user RussellJurney on the Pig wiki?

    Design Notes:

    1) I'd like to use Jodatime for this, as I did in the DateTime UDFs. It is possible to use the Java date libs, but it would be painful to do so. Jodatime also performs better than Java's native date classes. It is Apache 2.0 licensed and is already pulled in via ivy in the DateTime UDFs - see PIG-1310

    2) Date Format for text/dumps: ISO8601. Looks like: [YYYY][MM][DD]T[hh][mm]Z. It is a human-readable, sortable/comparable international standard. See http://en.wikipedia.org/wiki/ISO_8601#Dates

    2.5) In memory type: org.joda.time.DateTime. See http://joda-time.sourceforge.net/apidocs/org/joda/time/DateTime.html

    The internal format of jodatime is a Long epoch/Unix/POSIX time. See http://joda-time.sourceforge.net/faq.html#internalstorage

    3) Duration Format for text/dumps: ISO8601. Looks like: P[n]Y[n]M[n]DT[n]H[n]M[n]S. It is a human-readable, sortable/comparable international standard. See http://en.wikipedia.org/wiki/ISO_8601#Durations

    3.5) In-memory format: org.joda.time.Duration. See http://joda-time.sourceforge.net/apidocs/org/joda/time/Duration.html

    4) All date functions in PIG-1310 should be included, except those replaced by the use of operators on datetimes and durations. Adding/subtracting datetimes should result in a duration. Durations can be added/subtracted/divided/multiplied/negated.

    Date/Duration truncation, date differences, date parsing/conversion should be included. Conversion from int/long POSIX, SQL and datemonth should be included. Conversion from any string with a DateFormat string should be included.

    5) Casting to and from Integer and Long should be supported, as a Unix/POSIX time. Casting to/from chararray in ISO8601 format should be supported.

    Comments? Suggestions?
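The operator and cast behaviour proposed in points 4 and 5 can be sketched roughly as follows. This is an illustration only: the proposal names Joda-Time, but the example uses the JDK's `java.time` as a stand-in so that it runs without extra dependencies, and `DateTimeOps` with its method names is hypothetical. Treating an integer cast as Unix seconds is also an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the proposed semantics, using java.time
// in place of Joda-Time.
final class DateTimeOps {
    // chararray -> datetime cast from ISO 8601 text.
    static Instant fromIso(String text) { return Instant.parse(text); }

    // long -> datetime cast, assuming the long holds Unix seconds.
    static Instant fromUnix(long epochSeconds) { return Instant.ofEpochSecond(epochSeconds); }

    // datetime -> long cast, again as Unix seconds.
    static long toUnix(Instant t) { return t.getEpochSecond(); }

    // datetime - datetime yields a duration.
    static Duration minus(Instant a, Instant b) { return Duration.between(b, a); }

    // datetime + duration yields a datetime.
    static Instant plus(Instant t, Duration d) { return t.plus(d); }

    public static void main(String[] args) {
        Instant first = fromIso("2010-03-22T18:51:00Z");
        Instant second = fromIso("2010-03-23T08:37:00Z");
        Duration gap = minus(second, first);
        System.out.println(gap);                 // ISO 8601 duration text
        System.out.println(gap.multipliedBy(2)); // durations can be scaled
        System.out.println(gap.negated());       // and negated
        System.out.println(plus(first, gap).equals(second));
        System.out.println(toUnix(first));
    }
}
```

`Duration.toString()` already emits the ISO 8601 duration form, so the text format from point 3 would not need a custom formatter in this stand-in.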
  • Russell Jurney (JIRA) at May 31, 2010 at 2:07 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873664#action_12873664 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    Hmmm not sure if I should use durations or periods, or both. See http://joda-time.sourceforge.net/apidocs/org/joda/time/Period.html
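The distinction matters because a duration is a fixed length of time, while a period is expressed in calendar fields whose length depends on the date it is applied to. A small sketch of the difference, using the JDK's `java.time` as a stand-in (Joda-Time's `Period` also carries time fields, so the two libraries are not identical); `DurationVsPeriod` and its methods are hypothetical names:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;

// Hypothetical illustration of fixed-length versus calendar-based amounts.
final class DurationVsPeriod {
    // Calendar arithmetic: lands on the same day-of-month where possible,
    // clamped to the last valid day of the target month.
    static LocalDateTime plusOneMonth(LocalDateTime t) {
        return t.plus(Period.ofMonths(1));
    }

    // Fixed-length arithmetic: exactly 30 * 24 hours later.
    static LocalDateTime plusThirtyDays(LocalDateTime t) {
        return t.plus(Duration.ofDays(30));
    }

    public static void main(String[] args) {
        LocalDateTime endOfJanuary = LocalDateTime.of(2010, 1, 31, 0, 0);
        System.out.println(plusOneMonth(endOfJanuary));
        System.out.println(plusThirtyDays(endOfJanuary));
    }
}
```

Starting from 31 January 2010, the two calls give different dates (the end of February versus early March), which is why "one month" cannot be stored as a fixed number of milliseconds.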

  • Russell Jurney (JIRA) at Jul 2, 2010 at 6:20 am
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884558#action_12884558 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    Been thinking about this... I don't think we should add a full datetime type at this time. See comments in PIG-1314 on alternative approach using builtins.
  • Russell Jurney (JIRA) at Jul 2, 2010 at 6:29 am
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884562#action_12884562 ]

    Russell Jurney commented on PIG-1314:
    -------------------------------------

    I suck at JIRA. See proposal in PIG-1430.


  • Olga Natkovich (JIRA) at Jul 14, 2010 at 5:46 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-1314:
    --------------------------------

    Assignee: Russell Jurney
  • Olga Natkovich (JIRA) at Jul 26, 2010 at 9:07 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892464#action_12892464 ]

    Olga Natkovich commented on PIG-1314:
    -------------------------------------

    Russell, are you still planning to finish this for Pig 0.8.0 release?
  • Olga Natkovich (JIRA) at Aug 31, 2010 at 5:45 pm
    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-1314:
    --------------------------------

    Fix Version/s: (was: 0.8.0)

    Unlinking from 0.8 since we are branching today
