Grokbase Groups Pig user January 2010
Currently, Pig's SUBSTRING (in piggybank) takes the parameters (string,
startIndex, endIndex).

If endIndex is past the end of the string, an error is logged and the
string is dropped (a null is returned). This is consistent with Java's
String.substring(), which throws an exception in that case. While this
makes sense in Java, it is not desirable in Pig, where you can't catch an
exception, do runtime length checking, etc. I would prefer to have
SUBSTRING avoid the Java exception by calling str.substring(beginIndex,
min(str.length-1, endIndex)).

Thoughts?

-D
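
The proposed behavior could be sketched roughly as follows. This is a minimal illustration, not the piggybank source: the class name ClampedSubstring and the choice to return null for an out-of-range start are assumptions. Because the end index of Java's String.substring() is exclusive, the sketch clamps to str.length() rather than str.length() - 1.

```java
// Hypothetical helper for illustration only; not the actual piggybank UDF.
final class ClampedSubstring {

    // Returns the characters in [beginIndex, endIndex), with endIndex
    // clamped to the string length so that an over-long end index no
    // longer triggers StringIndexOutOfBoundsException.
    // Assumption: null input or an invalid start still yields null.
    static String substring(String str, int beginIndex, int endIndex) {
        if (str == null) {
            return null;
        }
        int end = Math.min(str.length(), endIndex);
        if (beginIndex < 0 || beginIndex > end) {
            return null;
        }
        return str.substring(beginIndex, end);
    }

    public static void main(String[] args) {
        System.out.println(substring("hello", 1, 3));   // el
        System.out.println(substring("hello", 1, 100)); // ello
        System.out.println(substring("hello", 7, 100)); // null
    }
}
```

With this clamp, only a bad start index drops the string; an end index past the end simply returns the rest of the string.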


  • Dmitriy Ryaboy at Jan 22, 2010 at 7:15 pm
    I mean min(str.length, endIndex)

    :-)

    -D
  • Ankur C. Goel at Jan 25, 2010 at 6:18 am
    +1 for the change. I agree with Dmitriy on the proposed behavior of SUBSTRING in Pig.

    -@nkur


  • Alan Gates at Jan 23, 2010 at 5:58 pm
    In such situations, I generally ask myself: what would SQL do? The
    SQL standard specifies substring as:

    SUBSTRING left_paren _charactervalue_ FROM _startposition_ [ FOR
    _length_ ] right_paren (page 273)

    It also specifies that if the length goes beyond the end of the string,
    the function returns everything up to the end of the string (page 303),
    i.e., it does as you suggest.

    I wouldn't suggest all the verbose FROM and FOR, but I think we should
    align with SQL. So I would say the final parameter should be changed
    from end position to length and the behavior should change as you
    suggest.

    Alan.

    References are from ISO 9075-2, Information technology - Database
    languages - SQL - Part 2: Foundation, Third edition, 2008.
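
The length-based variant suggested above might look like the sketch below. It is only an illustration under stated assumptions: the class name LengthSubstring is hypothetical, the start position is kept 0-based as in Java (SQL's start position is 1-based), and invalid arguments return null.

```java
// Hypothetical helper for illustration only; not the actual piggybank UDF.
final class LengthSubstring {

    // Returns up to `length` characters starting at `start` (0-based).
    // If start + length runs past the end of the string, the result is
    // truncated at the end of the string instead of failing.
    // Assumption: null input, negative arguments, or a start beyond the
    // end of the string yield null.
    static String substring(String str, int start, int length) {
        if (str == null || start < 0 || length < 0 || start > str.length()) {
            return null;
        }
        // Use long arithmetic so start + length cannot overflow an int.
        int end = (int) Math.min((long) str.length(), (long) start + length);
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(substring("hello", 1, 2));   // el
        System.out.println(substring("hello", 3, 100)); // lo
        System.out.println(substring("hello", 9, 1));   // null
    }
}
```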

  • Dmitriy Ryaboy at Jan 23, 2010 at 6:05 pm
    I'll make the change.
