casting parameters of a UDF
---------------------------

Key: PIG-427
URL: https://issues.apache.org/jira/browse/PIG-427
Project: Pig
Issue Type: Improvement
Affects Versions: types_branch
Reporter: Olga Natkovich
Fix For: types_branch


Currently, if we have a UDF that declares via getArgToFuncMapping that it can only handle a subset of types, passing any other type to the function results in an error. However, some types can be promoted to others, and it would be useful for the typechecker to perform a best-fit cast. For instance, if the input parameter has type Long and the UDF supports Int and Double, the code should cast the parameter to Double.

This would be very useful for conversion of the UDFs from the piggybank to the new code.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Olga Natkovich (JIRA) at Sep 12, 2008 at 12:42 am
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-427:
    ----------------------------------

    Assignee: Shravan Matthur Narayanamurthy
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 15, 2008 at 11:16 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-427:
    -----------------------------------------------

    Status: Patch Available (was: Open)

    The patch implements the following logic:

    It checks whether a fit is possible and, if so, returns a score. The lower the score, the better the fit.
    A table of possible casts is maintained, and the table is ordered so as to produce a sensible heuristic for the fit score. The principle behind the heuristic is to choose the fit with the fewest casts and, if the number of casts is the same, to prefer conversions to a smaller type, where the ordering of types is:
    INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, TUPLE, BAG, MAP (from small to big)

    Once the best fit is determined, casts are introduced to suit that fit. However, if the schema contains a schema embedded as a Tuple or a Bag, the bestFit function requires these schemas to match exactly. For example, if SUM provides a mapping to BAG(integers) and BAG(floats), and we have BAG(longs) as input, the best fit doesn't try to insert a cast, because the nesting can be arbitrary and finding the right project where the cast should be inserted is a bit complicated.

    The patch also includes a test case that tests three scenarios for casting.
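
The scoring heuristic described in this post can be illustrated with a small sketch. This is not the code from 427.patch: the names `ALLOWED_CASTS`, `fit_score` and `best_fit` are hypothetical, the cast table is assumed to contain only numeric widening, and the score is modelled as a pair (number of casts, total rank of the cast targets) so that fewer casts win first and smaller target types break ties.

```python
from typing import Optional, Sequence, Tuple

# Type ordering quoted in the post, from small to big.
TYPE_ORDER = ["INTEGER", "LONG", "FLOAT", "DOUBLE",
              "CHARARRAY", "TUPLE", "BAG", "MAP"]
RANK = {name: index for index, name in enumerate(TYPE_ORDER)}

# Assumption: only numeric widening casts are allowed in this sketch.
ALLOWED_CASTS = {
    "INTEGER": {"LONG", "FLOAT", "DOUBLE"},
    "LONG": {"FLOAT", "DOUBLE"},
    "FLOAT": {"DOUBLE"},
}


def fit_score(inputs: Sequence[str],
              params: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (casts, target_size) if inputs can fit params, else None."""
    if len(inputs) != len(params):
        return None
    casts = 0
    target_size = 0
    for src, dst in zip(inputs, params):
        if src == dst:
            continue  # exact match for this argument, no cast needed
        if dst not in ALLOWED_CASTS.get(src, set()):
            return None  # no cast available, so no fit is possible
        casts += 1
        target_size += RANK[dst]
    return (casts, target_size)


def best_fit(inputs, signatures):
    """Pick the signature with the lowest score, or None if nothing fits."""
    scored = [(fit_score(inputs, sig), sig) for sig in signatures]
    scored = [(score, sig) for score, sig in scored if score is not None]
    if not scored:
        return None
    return min(scored, key=lambda pair: pair[0])[1]
```

With the example from the issue description, an input of type LONG and a UDF that supports INTEGER and DOUBLE, `best_fit(("LONG",), [("INTEGER",), ("DOUBLE",)])` selects the DOUBLE signature, because LONG to INTEGER is not in the assumed cast table.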
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 15, 2008 at 11:18 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-427:
    -----------------------------------------------

    Attachment: 427.patch
  • Olga Natkovich (JIRA) at Sep 18, 2008 at 10:29 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632422#action_12632422 ]

    Olga Natkovich commented on PIG-427:
    ------------------------------------

    Hi Shravan,

    Thanks for the patch. I have a couple of comments:

    (1) We don't support BOOLEAN as a type in the language, so we don't need it in the mapping.
    (2) I don't think we should cast numeric types to chararrays because we don't know what encoding to use.
    (3) I don't think we should cast bytearrays to anything implicitly since we don't know what a safe cast is in this case.
    (4) I think that if multiple functions get the same score, we should say that it is ambiguous and ask for an explicit cast. For instance, if we have 2 functions (int, float) and (float, int) and the input (int, int), we should say that we can't choose.
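
The ambiguity rule in comment (4) can be sketched as follows. This is an illustrative sketch only, not the patch: `WIDENING`, `cast_count`, `choose` and `AmbiguousFitError` are hypothetical names, and the score is simplified to the number of casts. In line with comments (2) and (3), the assumed cast table contains no casts to chararray and no casts from bytearray.

```python
# Assumption: only numeric widening is allowed; nothing casts to
# chararray and nothing casts from bytearray.
WIDENING = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


class AmbiguousFitError(Exception):
    """Raised when two or more signatures tie for the best score."""


def cast_count(inputs, params):
    """Number of casts needed to fit inputs to params, or None if no fit."""
    if len(inputs) != len(params):
        return None
    casts = 0
    for src, dst in zip(inputs, params):
        if src == dst:
            continue
        if dst not in WIDENING.get(src, set()):
            return None
        casts += 1
    return casts


def choose(inputs, signatures):
    """Return the unique best signature, None if none fits, or raise on a tie."""
    viable = {}
    for sig in signatures:
        score = cast_count(inputs, sig)
        if score is not None:
            viable[sig] = score
    if not viable:
        return None
    best = min(viable.values())
    winners = [sig for sig, score in viable.items() if score == best]
    if len(winners) > 1:
        # The caller would be asked to add an explicit cast.
        raise AmbiguousFitError(f"cannot choose between {winners}")
    return winners[0]
```

For the input (int, int) and the functions (int, float) and (float, int), both candidates need one cast, so `choose` raises `AmbiguousFitError` rather than picking one arbitrarily.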
  • Olga Natkovich (JIRA) at Sep 19, 2008 at 6:47 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632763#action_12632763 ]

    Olga Natkovich commented on PIG-427:
    ------------------------------------

    I want to update #4 of my original comments. We should cast a bytearray if the UDF map consists of only a single function.

    This would help with backward compatibility.
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 22, 2008 at 4:19 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-427:
    -----------------------------------------------

    Attachment: 427-1.patch

    Patch with Olga's comments included
  • Olga Natkovich (JIRA) at Sep 22, 2008 at 5:51 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633395#action_12633395 ]

    Olga Natkovich commented on PIG-427:
    ------------------------------------

    Looks good. A couple of comments:

    (1) The exact schema match comparison should be made before the bytearray comparison. You can have a UDF that takes a bytearray as a parameter. In that case it does not matter how many functions are present in the table.

    (2) It would be good to have a comment explaining what the rules are about bytearrays and also about multiple matches.

    (3) It looks like the code is making an assumption that scores will be returned in score order. I was not quite sure why.


  • Olga Natkovich (JIRA) at Sep 22, 2008 at 8:26 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633477#action_12633477 ]

    Olga Natkovich commented on PIG-427:
    ------------------------------------

    Please ignore (3) in my last comment.
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 23, 2008 at 1:11 am
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-427:
    -----------------------------------------------

    Attachment: 427-2.patch

    I have addressed Olga's comments. There was a bug in 427-1.patch, which I think I have resolved now. It made the implicit assumption that if a bytearray is found and a single func is defined, then the matching func is the defined one. However, this need not be the case, so we need to check whether a fit is possible. For example, the input might be (bytearray, int) while the defined func takes (long, tuple). In this case, we need to fail.

    Also, hopefully the code is more readable now.
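
The bytearray handling discussed in this thread (exact match first, implicit bytearray casts only when a single function is defined, and a fit check even then) might be sketched like this. The names `castable` and `resolve_bytearray_call` are hypothetical, the widening table is an assumption, and the sketch assumes a bytearray may be cast to any declared parameter type when the single-function rule applies. Only the bytearray path is shown; best-fit selection among several functions is left out.

```python
# Assumption: numeric widening table used for the non-bytearray arguments.
WIDENING = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def castable(src, dst):
    """True if src can be passed as dst, treating bytearray as castable.

    Callers must only rely on the bytearray branch when exactly one
    function is defined (the single-function rule from the thread).
    """
    if src == dst:
        return True
    if src == "bytearray":
        return True
    return dst in WIDENING.get(src, set())


def resolve_bytearray_call(inputs, signatures):
    """Resolve a call whose inputs may include bytearray arguments."""
    inputs = tuple(inputs)
    # Exact schema match comes first: a UDF may really take a bytearray,
    # and then the number of defined functions does not matter.
    for sig in signatures:
        if tuple(sig) == inputs:
            return sig
    # Implicit bytearray casts are only considered for a single function.
    if len(signatures) != 1:
        return None
    sig = signatures[0]
    if len(sig) != len(inputs):
        return None
    # Even with one function, every argument must still fit.
    if all(castable(src, dst) for src, dst in zip(inputs, sig)):
        return sig
    return None
```

With the example from the post, `resolve_bytearray_call(("bytearray", "int"), [("long", "tuple")])` returns `None`, because int cannot be cast to tuple even though only one function is defined.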
  • Olga Natkovich (JIRA) at Sep 26, 2008 at 9:00 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635018#action_12635018 ]

    Olga Natkovich commented on PIG-427:
    ------------------------------------

    I am seeing 2 problems with the patch while running some tests:

    Script 1:

    a = load 'studentnulltab10k' as (name:chararray, age:int, gpa:double);
    b = group a ALL;
    c = foreach b generate SUM(a.age);

    the output looks like:

    (449650.0)

    Notice that it was cast to double even though we have a version of SUM that accepts an int and produces a long.

    Script 2:

    a = load 'studentnulltab10k' as (name:chararray, age:int, gpa:double);
    b = group a ALL;
    c = foreach b generate MIN(a.name);

    This query fails with the following error stack:

    2008-09-26 13:54:28,374 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200809241441_1550_m_000000java.io.IOException: Received Error while processing the reduce plan.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:166)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:56)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

  • Shravan Matthur Narayanamurthy (JIRA) at Sep 29, 2008 at 6:59 am
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-427:
    -----------------------------------------------

    Attachment: 427-3.patch

    With all the changes, I had left out the statement that sets the matching func spec that was found as the new func spec in LOUserFunc.

    One extraneous change is in src/org/apache/pig/pen/EquivalenceClasses.java, which had an unused import that was causing an error in Eclipse. I removed it in this patch.
  • Olga Natkovich (JIRA) at Sep 29, 2008 at 9:47 pm
    [ https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-427:
    -------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    Patch committed; thanks, Shravan!

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Sep 11, '08 at 7:56p
active: Sep 29, '08 at 9:47p
posts: 13
users: 1
website: pig.apache.org