FAQ
LineRecordReader needs more synchronization
-------------------------------------------

Key: HADOOP-3554
URL: https://issues.apache.org/jira/browse/HADOOP-3554
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.17.0
Environment: All java platforms
Reporter: Aaron Greenhouse


LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.

I propose the following changes to the class:
- Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
- Make the method getProgress() synchronized.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Aaron Greenhouse (JIRA) at Jun 13, 2008 at 2:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Aaron Greenhouse updated HADOOP-3554:
    -------------------------------------

    Attachment: LineRecordReader.patch

    Patch file for the class that applies the above changes.

    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Chris Douglas (JIRA) at Jun 14, 2008 at 2:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605027#action_12605027 ]

    Chris Douglas commented on HADOOP-3554:
    ---------------------------------------

    Hi Aaron-

    * Would you mind generating the patch with subversion? Other contribution guidelines are [here|http://wiki.apache.org/hadoop/HowToContribute].
    * It looks like there are some annotations/comments that made it into the patch. Could you remove them?
    * Though writing a test case demonstrating a race condition or synchronization issue might be difficult, do you have a way to reproduce this? Where is a LineRecordReader instance shared between threads?
    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Chris Douglas (JIRA) at Jun 14, 2008 at 2:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605027#action_12605027 ]

    chris.douglas edited comment on HADOOP-3554 at 6/13/08 7:53 PM:
    ----------------------------------------------------------------

    Hi Aaron-

    * Would you mind generating the patch with subversion? Other contribution guidelines are [here|http://wiki.apache.org/hadoop/HowToContribute].
    * It looks like there are some annotations/comments that made it into the patch. Could you remove them?
    * Though writing a test case demonstrating a race condition or synchronization issue might be difficult, do you have a way to reproduce a problem or would you describe how one might arise? Where is a LineRecordReader instance shared between threads?

    was (Author: chris.douglas):
    Hi Aaron-

    * Would you mind generating the patch with subversion? Other contribution guidelines are [here|http://wiki.apache.org/hadoop/HowToContribute].
    * It looks like there are some annotations/comments that made it into the patch. Could you remove them?
    * Though writing a test case demonstrating a race condition or synchronization issue might be difficult, do you have a way to reproduce this? Where is a LineRecordReader instance shared between threads?
    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Aaron Greenhouse (JIRA) at Jun 16, 2008 at 1:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605286#action_12605286 ]

    Aaron Greenhouse commented on HADOOP-3554:
    ------------------------------------------

    Per Nigel Daley's request, I should point out that this bug was discovered using Sierra from SureLogic to view FindBug results. The bug was correct with the assistance of JSure from SureLogic. The annotations mentioned above are a result of that process.

    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Aaron Greenhouse (JIRA) at Jun 16, 2008 at 1:56 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605289#action_12605289 ]

    Aaron Greenhouse commented on HADOOP-3554:
    ------------------------------------------

    Chris,

    I will regenerate the patch with SVN with the annotations removed.

    Honestly, I don't know if the this class is used by multiple threads or not. I haven't even run HADOOP. As stated above, my attention was drawn to this class through the use of FindBugs through SureLogic Sierra. This class was flagged by FindBugs because it had partial synchronization. I describe above how to fix the synchronization. But if, it is the case that instances of the class are not meant to be shared across threads, then no synchronization is necessary, the class documentation should indicate that instances of the class should not be used by more than 1 thread, and all the existing sychronization in the class should be removed.

    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Aaron Greenhouse (JIRA) at Jun 16, 2008 at 3:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605315#action_12605315 ]

    Aaron Greenhouse commented on HADOOP-3554:
    ------------------------------------------

    Okay. After reevaluating in the SVN Trunk (rather than based on the 0.17.0 source code), the field maxLineLength also needs to be made final.

    So, in summary
    * If objects of the class LineRecordReader really are used by multiple threads, then the class needs to be updated:
    - Make the fields start, end, and maxLineLength final.
    - Make the method getProgress() synchronized.
    * Otherwise, remove the existing synchronization from methods next(), getPos(), and close() and update the class documentation to indicate that instances are not thread safe.

    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: LineRecordReader.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Aaron Greenhouse (JIRA) at Jun 16, 2008 at 3:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Aaron Greenhouse updated HADOOP-3554:
    -------------------------------------

    Attachment: HADOOP-3445.patch

    Patch that applies the described changes. Replaces the original patch. This one is correctly generated by SVN.

    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: HADOOP-3445.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Aaron Greenhouse (JIRA) at Jun 16, 2008 at 3:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Aaron Greenhouse updated HADOOP-3554:
    -------------------------------------

    Attachment: (was: LineRecordReader.patch)
    LineRecordReader needs more synchronization
    -------------------------------------------

    Key: HADOOP-3554
    URL: https://issues.apache.org/jira/browse/HADOOP-3554
    Project: Hadoop Core
    Issue Type: Bug
    Affects Versions: 0.17.0
    Environment: All java platforms
    Reporter: Aaron Greenhouse
    Attachments: HADOOP-3445.patch

    Original Estimate: 1h
    Remaining Estimate: 1h

    LineRecordReader has three index fields start, end, and pos. All of these fields are long, which means that, in general, access to them is not atomic. This can cause problems if the fields are accessed without appropriate synchronization.
    I propose the following changes to the class:
    - Make the fields start and end final. This requires some minor changes to the constructor LineRecordReader(Configuration, FileSplit).
    - Make the method getProgress() synchronized.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedJun 13, '08 at 2:43p
activeJun 16, '08 at 3:28p
posts9
users1
websitehadoop.apache.org...
irc#hadoop

1 user in discussion

Aaron Greenhouse (JIRA): 9 posts

People

Translate

site design / logo © 2022 Grokbase