BufferedPositionedInputStream drastically reduces read performance because it doesn't override read([], o, l) in InputStream
----------------------------------------------------------------------------------------------------------------------------

Key: PIG-39
URL: https://issues.apache.org/jira/browse/PIG-39
Project: Pig
Issue Type: Bug
Components: impl
Environment: Java 1.6, Mac OS X 10.5
Reporter: Sam Pullara


A simple fix can have a huge effect on the performance of certain kinds of Pig programs:

Index: src/org/apache/pig/impl/io/BufferedPositionedInputStream.java
===================================================================
--- src/org/apache/pig/impl/io/BufferedPositionedInputStream.java (revision 597597)
+++ src/org/apache/pig/impl/io/BufferedPositionedInputStream.java (working copy)
@@ -49,7 +49,14 @@
         pos += rc;
         return rc;
     }
-
+
+    @Override
+    public int read(byte b[], int off, int len) throws IOException {
+        int read = in.read(b, off, len);
+        pos += read;
+        return read;
+    }
+
     /**
      * Returns the current position in the tracked InputStream.
      */


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Utkarsh Srivastava (JIRA) at Nov 30, 2007 at 2:46 am
    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546974 ]

    Utkarsh Srivastava commented on PIG-39:
    ---------------------------------------

    +1 to this patch.

    (Pending, of course, all tests passing.)
  • Benjamin Reed (JIRA) at Nov 30, 2007 at 3:54 pm
    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547189 ]

    Benjamin Reed commented on PIG-39:
    ----------------------------------

    +1 Great catch Sam!
  • Olga Natkovich (JIRA) at Nov 30, 2007 at 6:31 pm
    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547237 ]

    Olga Natkovich commented on PIG-39:
    -----------------------------------

    Great catch! I am going to make the change and run our functional as well as performance tests.
  • Olga Natkovich (JIRA) at Dec 10, 2007 at 9:58 pm
    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550206 ]

    Olga Natkovich commented on PIG-39:
    -----------------------------------

    I incorporated the change and ran performance tests. Unfortunately, I did not see any change in performance. Looking at the Hadoop code, I think it already buffers the data, so our code is just reading data that is cached in memory.

    I am still going to commit the patch since this is a bug.
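
To make the difference between the two code paths concrete, here is a small sketch that counts how many calls reach the wrapped stream with and without a bulk `read` override. It relies only on the documented behaviour of `java.io.InputStream`, whose default `read(byte[], int, int)` calls `read()` repeatedly. All class names (`CountingStream`, `UnpatchedWrapper`, `PatchedWrapper`) are hypothetical and chosen for illustration; the sketch does not measure time and says nothing about Hadoop's actual buffering.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCountDemo {

    /** Counts how often each read variant is invoked on the wrapped source. */
    static class CountingStream extends InputStream {
        private final InputStream in;
        int singleByteReads = 0;
        int bulkReads = 0;

        CountingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            singleByteReads++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return in.read(b, off, len);
        }
    }

    /** Wrapper without a bulk override: inherits InputStream's per-byte loop. */
    static class UnpatchedWrapper extends InputStream {
        protected final InputStream in;

        UnpatchedWrapper(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }
    }

    /** Wrapper with a bulk override that forwards the request in one call. */
    static class PatchedWrapper extends UnpatchedWrapper {
        PatchedWrapper(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }
    }

    /** Reads size bytes through the chosen wrapper; returns {singleByteReads, bulkReads}. */
    static int[] countCalls(boolean patched, int size) throws IOException {
        CountingStream source = new CountingStream(new ByteArrayInputStream(new byte[size]));
        InputStream wrapper = patched ? new PatchedWrapper(source) : new UnpatchedWrapper(source);
        wrapper.read(new byte[size], 0, size);
        return new int[] { source.singleByteReads, source.bulkReads };
    }

    public static void main(String[] args) throws IOException {
        int[] before = countCalls(false, 4096);
        int[] after = countCalls(true, 4096);
        System.out.println("unpatched: single=" + before[0] + " bulk=" + before[1]);
        System.out.println("patched:   single=" + after[0] + " bulk=" + after[1]);
    }
}
```

The unpatched wrapper issues one single-byte call per requested byte, while the patched one issues a single bulk call. When every one of those calls is served from an in-memory buffer, as suggested above, the saving is per-call overhead rather than I/O, which may be why it did not show up in the tests.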
  • Olga Natkovich (JIRA) at Dec 21, 2007 at 12:44 am
    [ https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich closed PIG-39.
    -----------------------------

    Resolution: Fixed
