Grokbase Groups Pig dev August 2010
additional piggybank datetime and string UDFs
---------------------------------------------

Key: PIG-1565
URL: https://issues.apache.org/jira/browse/PIG-1565
Project: Pig
Issue Type: Improvement
Reporter: Andrew Hitchcock


Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Andrew Hitchcock (JIRA) at Aug 25, 2010 at 1:56 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1565:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Andrew Hitchcock (JIRA) at Aug 25, 2010 at 1:56 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1565:
    ----------------------------------

    Attachment: PIG-1565-1.patch

    This patch provides a number of UDFs written by the Amazon Elastic MapReduce team that we feel are useful.

    A few of these UDFs are duplicates of existing functionality. I am including them because they are consistent with the rest of the UDFs in this patch and because I'd like to start a discussion about the best way to include these UDFs. Here is a list of what I believe to be duplicate UDFs:

    INDEX_OF
    LAST_INDEX_OF
    SPLIT_ON_REGEX

    Here are descriptions of the provided UDFs.

    datetime/
    These are based on Joda-Time and follow its model for date handling.

    DATE_TIME
    A function that returns a DateTime String of the form yyyy-MM-dd'T'HH:mm:ss.SSSZZ.
    DURATION
    A function that returns a Duration as a long. A duration is a length of time specified in milliseconds.
    EXTRACT_DT
    Extracts the integer value of a field of a LocalDate, LocalTime, DateTime, Period or Duration.
    FORMAT_DT
    Formats a LocalDate, LocalTime or DateTime into a String using the given format string.
    LOCAL_DATE
    A function that returns a LocalDate String of the form yyyy-MM-dd.
    LOCAL_TIME
    A function that returns a LocalTime String of the form HH:mm:ss.SSS.
    OFFSET_DT
    Offsets a LocalDate, LocalTime or DateTime by a Period or Duration, returning an object of the same type.
    PERIOD
    A function that returns a Period String. A Period is specified in terms of individual duration fields such as years and days.
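
    As a rough illustration of the string forms listed above, here is a minimal sketch that uses the JDK's java.time package as a stand-in for Joda-Time. It is not the patch's code: the class DateTimeUdfSketch and its methods are hypothetical, only a few of the UDFs are mimicked, and java.time's XXX offset token is assumed to approximate Joda's ZZ.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers illustrating the formats described above.
// java.time is used as a stand-in for Joda-Time; this is NOT the patch's code.
public class DateTimeUdfSketch {

    // Joda pattern yyyy-MM-dd'T'HH:mm:ss.SSSZZ; java.time's XXX approximates ZZ
    // ("Z" for UTC, "+hh:mm" / "-hh:mm" otherwise).
    static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
    static final DateTimeFormatter LOCAL_DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // DATE_TIME-like: render an offset date-time as a String.
    static String dateTime(OffsetDateTime dt) {
        return dt.format(DATE_TIME);
    }

    // DURATION-like: a length of time as a long number of milliseconds.
    static long durationMillis(Duration d) {
        return d.toMillis();
    }

    // OFFSET_DT-like: shift a yyyy-MM-dd date by an ISO-8601 period string
    // (e.g. "P1M10D"), returning a String of the same form.
    static String offsetDate(String localDate, String isoPeriod) {
        LocalDate d = LocalDate.parse(localDate); // ISO yyyy-MM-dd
        return d.plus(Period.parse(isoPeriod)).format(LOCAL_DATE);
    }

    // EXTRACT_DT-like: pull one integer field (the hour) out of an HH:mm:ss.SSS time.
    static int extractHour(String localTime) {
        return LocalTime.parse(localTime).getHour(); // ISO local time accepts .SSS
    }

    public static void main(String[] args) {
        OffsetDateTime dt = OffsetDateTime.of(2010, 8, 25, 1, 56, 0, 0, ZoneOffset.UTC);
        System.out.println(dateTime(dt));                          // 2010-08-25T01:56:00.000Z
        System.out.println(durationMillis(Duration.ofMinutes(2))); // 120000
        System.out.println(offsetDate("2010-08-25", "P1M10D"));    // 2010-10-05
        System.out.println(extractHour("22:49:00.000"));           // 22
    }
}
```

    Note that java.time applies a Period such as P1M10D calendar field by calendar field (months first, then days), while a Duration is an exact length in milliseconds.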

    string/
    String handling functions modeled after Apache Commons StringUtils.

    CAPITALIZE
    Capitalizes a String, changing the first letter to upper case.
    CENTER
    Centers a String within a larger String of a given size.
    CONCAT_WITH
    Joins the arguments using the given joiner String.
    EXTRACT
    Parses the input String with a regular expression and returns all matched groups.
    FORMAT
    Formats a list of arguments into a single String.
    INDEX_OF
    Finds the first index within a String, from an optional start position, handling null.
    LAST_INDEX_OF
    Finds the last index within a String, from an optional start position, handling null.
    LEFT_PAD
    Left pads a String to the given size.
    REPEAT
    Repeats a String the given number of times to form a new String.
    REPLACE_ONCE
    Replaces a String with another String inside a larger String, once.
    RIGHT_PAD
    Right pads a String to the given size.
    SPLIT_ON_REGEX
    Splits a String around matches of the given regular expression.
    STRIP
    Strips any of a set of characters from the start and end of a String.
    STRIP_END
    Strips any of a set of characters from the end of a String.
    STRIP_START
    Strips any of a set of characters from the start of a String.
    SWAP_CASE
    Swaps the case of a String, changing upper and title case to lower case, and lower case to upper case.
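
    To make a few of the descriptions above concrete, the following is a simplified, hypothetical re-implementation in plain Java of StringUtils-style behavior for STRIP_START, STRIP_END, LEFT_PAD, CENTER, SWAP_CASE and REPLACE_ONCE. It is not the patch's code and does not depend on Apache Commons Lang; the class StringUdfSketch is illustrative, and unlike StringUtils it requires a non-null set of strip characters.

```java
// Simplified, hypothetical re-implementations of StringUtils-style behaviors.
// NOT the patch's code. Null input Strings yield null, as in StringUtils;
// stripChars must be non-null here (StringUtils treats null as whitespace).
public class StringUdfSketch {

    // STRIP_START-like: remove any of stripChars from the start.
    static String stripStart(String s, String stripChars) {
        if (s == null) return null;
        int start = 0;
        while (start < s.length() && stripChars.indexOf(s.charAt(start)) >= 0) start++;
        return s.substring(start);
    }

    // STRIP_END-like: remove any of stripChars from the end.
    static String stripEnd(String s, String stripChars) {
        if (s == null) return null;
        int end = s.length();
        while (end > 0 && stripChars.indexOf(s.charAt(end - 1)) >= 0) end--;
        return s.substring(0, end);
    }

    // LEFT_PAD-like: pad with spaces on the left up to `size` characters.
    static String leftPad(String s, int size) {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < size; i++) sb.append(' ');
        return sb.append(s).toString();
    }

    // CENTER-like: the left side gets half the padding (rounded down), the right side the rest.
    static String center(String s, int size) {
        if (s == null) return null;
        int pads = size - s.length();
        if (pads <= 0) return s;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pads / 2; i++) sb.append(' ');
        sb.append(s);
        while (sb.length() < size) sb.append(' ');
        return sb.toString();
    }

    // SWAP_CASE-like: upper and title case become lower; lower becomes upper.
    static String swapCase(String s) {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c) || Character.isTitleCase(c)) sb.append(Character.toLowerCase(c));
            else if (Character.isLowerCase(c)) sb.append(Character.toUpperCase(c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // REPLACE_ONCE-like: replace only the first occurrence of `search`.
    static String replaceOnce(String text, String search, String replacement) {
        if (text == null || search == null || search.isEmpty() || replacement == null) return text;
        int i = text.indexOf(search);
        if (i < 0) return text;
        return text.substring(0, i) + replacement + text.substring(i + search.length());
    }

    public static void main(String[] args) {
        System.out.println("[" + stripStart("xxpigxx", "x") + "]"); // [pigxx]
        System.out.println("[" + stripEnd("xxpigxx", "x") + "]");   // [xxpig]
        System.out.println("[" + leftPad("pig", 5) + "]");          // [  pig]
        System.out.println("[" + center("ab", 5) + "]");            // [ ab  ]
        System.out.println(swapCase("Pig Latin"));                  // pIG lATIN
        System.out.println(replaceOnce("aaa", "a", "b"));           // baa
    }
}
```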
  • Alan Gates (JIRA) at Aug 26, 2010 at 10:49 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903123#action_12903123 ]

    Alan Gates commented on PIG-1565:
    ---------------------------------

    Comments
    # ErrorCatchingBase swallows any non-ExecExceptions. It should print their messages out as warnings. Warnings are collated and the count reported at the end of the job. Details are only printed if the user asks for them. That way the user will still be informed that something unexpected happened and can investigate further if he wants to.
    # On the duplication, it looks to me like INDEX_OF and LAST_INDEX_OF are supersets of the functions already in Pig. You could submit a patch for those two functions (which are now builtins) to extend them to take the optional third argument. SPLIT_ON_REGEX looks like a subset of the existing SPLIT function that is built into Pig, so other than keeping it as an alias for Amazon users who are used to calling SPLIT_ON_REGEX, I'm not clear what the value is.
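
    The first point above, reporting unexpected exceptions as collated warnings instead of swallowing them, can be sketched in plain Java. This is a minimal, hypothetical sketch: WarningCollator, exec and summary are illustrative names, not Pig's or the patch's API, and a real UDF would report through Pig's own warning mechanism rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: instead of silently swallowing unexpected exceptions,
// record them as warnings, keep a count for an end-of-job summary, and keep
// the details available for users who ask for them.
public class WarningCollator<I, O> {
    private final Function<I, O> body;
    private final List<String> details = new ArrayList<>();
    private long warningCount = 0;

    public WarningCollator(Function<I, O> body) {
        this.body = body;
    }

    // Apply the wrapped function; on failure, record a warning and return null
    // rather than failing the whole job.
    public O exec(I input) {
        try {
            return body.apply(input);
        } catch (RuntimeException e) {
            warningCount++;
            details.add(e.getClass().getSimpleName() + ": " + e.getMessage());
            return null;
        }
    }

    public long getWarningCount() {
        return warningCount;
    }

    // Summary reported at the end of the job; per-record details only on request.
    public String summary(boolean verbose) {
        StringBuilder sb = new StringBuilder("Encountered " + warningCount + " warning(s)");
        if (verbose) {
            for (String d : details) sb.append('\n').append("  ").append(d);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WarningCollator<String, Integer> parse = new WarningCollator<>(Integer::parseInt);
        System.out.println(parse.exec("42"));     // 42
        System.out.println(parse.exec("pig"));    // null (one warning recorded)
        System.out.println(parse.summary(false)); // Encountered 1 warning(s)
    }
}
```

    Keeping only a count by default, with the full messages available on request, matches the behavior described above: the user learns that something unexpected happened without the job output being flooded.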

    Thanks for contributing all these, this is great.

    I'll run test-patch and the unit tests and post the results.

  • Alan Gates (JIRA) at Aug 26, 2010 at 10:52 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates reassigned PIG-1565:
    -------------------------------

    Assignee: Andrew Hitchcock
  • Alan Gates (JIRA) at Aug 26, 2010 at 10:52 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-1565:
    ----------------------------

    Fix Version/s: 0.8.0
  • Dmitriy V. Ryaboy (JIRA) at Aug 26, 2010 at 11:31 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903140#action_12903140 ]

    Dmitriy V. Ryaboy commented on PIG-1565:
    ----------------------------------------

    Please note that there's an outstanding patch for INDEX_OF and LAST_INDEX_OF in PIG-1563.
  • Alan Gates (JIRA) at Aug 27, 2010 at 1:06 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903174#action_12903174 ]

    Alan Gates commented on PIG-1565:
    ---------------------------------

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 5 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
    [exec]
    [exec]
  • Alan Gates (JIRA) at Aug 27, 2010 at 6:33 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903515#action_12903515 ]

    Alan Gates commented on PIG-1565:
    ---------------------------------

    Unit tests run fine.

    When I run contrib tests, one of the tests in this patch fails:

    {code}
    Testsuite: org.apache.pig.piggybank.test.evaluation.string.TestString
    Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.751 sec
    ------------- Standard Error -----------------
    10/08/27 11:16:31 WARN string.SUBSTRING: invalid number of arguments to SUBSTRING
    ------------- ---------------- ---------------

    Testcase: testSimple took 0.683 sec
    FAILED
    expected:<lo> but was:<null>
    junit.framework.AssertionFailedError: expected:<lo> but was:(Unknown Source)
    at org.apache.pig.piggybank.test.evaluation.string.TestString.testSimple(Unknown Source)

    Testcase: testFormatTypes took 0.048 sec
    {code}
  • Alan Gates (JIRA) at Aug 27, 2010 at 6:33 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-1565:
    ----------------------------

    Status: Open (was: Patch Available)
  • Andrew Hitchcock (JIRA) at Sep 17, 2010 at 1:08 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1565:
    ----------------------------------

    Attachment: PIG-1565-2.patch

    Made changes to LAST_INDEX_OF, INDEXOF, and SPLIT_ON_REGEX as requested. Also fixed the test case bug, which was caused by a missing change (this patch now extends SUBSTRING with more functionality).
  • Andrew Hitchcock (JIRA) at Sep 17, 2010 at 1:09 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Hitchcock updated PIG-1565:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Alan Gates (JIRA) at Sep 23, 2010 at 12:55 am
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913869#action_12913869 ]

    Alan Gates commented on PIG-1565:
    ---------------------------------

    I'll review this patch.
  • Alan Gates (JIRA) at Oct 1, 2010 at 10:40 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917081#action_12917081 ]

    Alan Gates commented on PIG-1565:
    ---------------------------------

    [exec] -1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 8 new or modified tests.
    [exec]
    [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
    [exec]
    [exec]

    The javadoc warning is:

    [javadoc] /home/gates/src/pig/PIG-1565/trunk/src/org/apache/pig/builtin/INDEXOF.java:78: warning - Tag @link: can't find INDEX_OF(int, int) in java.lang.String

    Building Piggybank now fails as well, since part of the ErrorCatchingBase class was moved into main Pig.

    Also, the patch fails a couple of unit tests in TestStringUDFs: testIndexOf() and testLastIndexOf() fail because the patch doesn't properly handle the null case.
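
    The null case can be illustrated with a minimal sketch of INDEX_OF and LAST_INDEX_OF that take an optional start position. It assumes, following the usual Pig convention for missing data, that a null input or search String should produce null rather than an exception; the class IndexOfSketch is hypothetical and this is not the code in the patch or in Pig's builtins.

```java
// Hypothetical sketch of INDEX_OF / LAST_INDEX_OF with an optional start
// position. Assumption: a null input or null search String yields null.
public class IndexOfSketch {

    // indexOf(str, search[, start]); start == null means "search from the beginning".
    static Integer indexOf(String str, String search, Integer start) {
        if (str == null || search == null) return null;
        return start == null ? str.indexOf(search) : str.indexOf(search, start);
    }

    // lastIndexOf(str, search[, start]); searches backward from start when given.
    static Integer lastIndexOf(String str, String search, Integer start) {
        if (str == null || search == null) return null;
        return start == null ? str.lastIndexOf(search) : str.lastIndexOf(search, start);
    }

    public static void main(String[] args) {
        System.out.println(indexOf("pig pig", "pig", null));     // 0
        System.out.println(indexOf("pig pig", "pig", 1));        // 4
        System.out.println(lastIndexOf("pig pig", "pig", null)); // 4
        System.out.println(lastIndexOf("pig pig", "pig", 3));    // 0
        System.out.println(indexOf(null, "pig", null));          // null
    }
}
```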

    I'll attach the output from running the tests.
  • Alan Gates (JIRA) at Oct 1, 2010 at 10:40 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-1565:
    ----------------------------

    Status: Open (was: Patch Available)
  • Olga Natkovich (JIRA) at Oct 4, 2010 at 6:42 pm
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-1565:
    --------------------------------

    Fix Version/s: 0.9.0 (was: 0.8.0)

    Pushing to the next release since the patch is not quite ready to be committed.

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 25, '10 at 1:54a
active: Oct 4, '10 at 6:42p
posts: 16
users: 1
website: pig.apache.org
