Alan Gates commented on PIG-1314:
Major +1. Adding DateTime as a Pig primitive is definitely a good idea. It's on our list of things to do (http://wiki.apache.org/pig/PigJournal). A brief overview of the work to be done:
# Add support in the parser, both for declaring an input to be of type datetime and for datetime constants
# Add support in TypeChecker for datetime types, including any allowed type promotions (i.e. implicit casts)
# Change the LoadCaster interface to include a bytesToDateTime method, and add the method to the default implementation
# Determine which builtin UDFs we want for datetime and get agreement from the community. Implement these UDFs.
# Implement any allowed cast operators for datetime (probably just string <-> datetime).
# Implement a datetime class that represents datetime in memory. This needs to implement WritableComparable so that it can be serialized and compared in Hadoop
# Implement a raw comparator for the type so it can be used as a key in group bys and joins.
# Change physical operators and builtin UDFs to handle processing of datetime types.
# Change data conversion and type discovery routines in DataType
# And, of course, add prolific tests
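To make the in-memory class and raw comparator items above more concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not a proposed implementation: the class name DateTimeSketch, the helper methods, and the choice to store milliseconds since the Unix epoch are all hypothetical. To stay self-contained it uses only java.io rather than depending on Hadoop; in Pig the class would implement WritableComparable, whose write, readFields and compareTo methods have the same shapes as the ones shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical datetime value class; not an existing Pig class.
final class DateTimeSketch implements Comparable<DateTimeSketch> {
    // Assumption: the value is stored as milliseconds since the Unix epoch (UTC).
    private long millis;

    DateTimeSketch() { }
    DateTimeSketch(long millis) { this.millis = millis; }

    long getMillis() { return millis; }

    // Same shape as Writable.write: one big-endian long (8 bytes).
    public void write(DataOutput out) throws IOException {
        out.writeLong(millis);
    }

    // Same shape as Writable.readFields.
    public void readFields(DataInput in) throws IOException {
        millis = in.readLong();
    }

    @Override
    public int compareTo(DateTimeSketch other) {
        return Long.compare(millis, other.millis);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DateTimeSketch && ((DateTimeSketch) o).millis == millis;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(millis);
    }

    // Raw comparison: order two serialized values straight from their bytes,
    // without deserializing into objects. Decoding the long first keeps the
    // ordering correct for negative (pre-1970) values.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    private static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    // Helper for the sketch: serialize a value to a byte array.
    static byte[] toBytes(DateTimeSketch d) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            d.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the sketch: rebuild a value from its serialized bytes.
    static DateTimeSketch fromBytes(byte[] bytes) {
        try {
            DateTimeSketch d = new DateTimeSketch();
            d.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A fixed-width encoding like this keeps the raw comparator trivial. A design that also carries a time zone or a variable-length format would need a correspondingly more careful byte-level comparison.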
The other question is backward compatibility. I can think of only two backward incompatible changes:
# Addition of bytesToDateTime in the LoadCaster interface. Given that this will only require a change if people recompile their implementation, and AFAIK there are no implementations of LoadCaster before our default implementation, I think this is ok.
# Changes to Pig Latin to specify a field as of type datetime, plus however we denote datetime constants. We need to make these as unobtrusive as possible, but again I think it will be ok, though we'll need to get community buy-in on it.
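As a rough illustration of the first item, the sketch below shows what the body of a bytesToDateTime conversion might look like. Everything here is an assumption: the class name DateTimeCasterSketch is hypothetical, the input is assumed to be a UTF-8 string in ISO-8601 form (the actual date format is still an open design question), and java.time.Instant stands in for whatever datetime class is eventually chosen. Returning null for unparseable input is also just one possible policy.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper; not part of Pig's LoadCaster interface.
final class DateTimeCasterSketch {
    // Assumption: bytes hold a UTF-8 ISO-8601 instant such as "2010-03-22T10:15:30Z".
    // Returns null when the input is missing or cannot be parsed.
    static Instant bytesToDateTime(byte[] b) {
        if (b == null) {
            return null;
        }
        String text = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Instant.parse(text);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```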
Would such a patch be accepted? If it's of good quality and deals with the backward compatibility concerns, certainly. In time for 0.8? I don't know. We try to do a release every three months, with a feature cutoff about a month before release (give or take). Branching and feature cutoff for 0.7 is today, so branching and feature cutoff for 0.8 will probably be in June.
If you want to pursue this, the first step should be a brief design that says how you'll go about doing it. It should cover things like: Which date format will you use (SQL, something else)? Which date functions do you think should be built in? How do you plan to store this type in memory? Are there existing datetime libraries you can leverage or incorporate to avoid reinventing the wheel? It's easiest to write up the design on Pig's wiki and then link to it on this bug. This will give users and developers a chance to review your thoughts and give feedback.
Add DateTime Support to Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Russell Jurney
Fix For: 0.8.0
Original Estimate: 672h
Remaining Estimate: 672h
Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive.
Can someone familiar with adding types to Pig comment on how hard this is? We're looking at doing this rather than using UDFs. Is this a patch that would be accepted?