Grokbase Groups HBase dev July 2009
Filter#filterRow is called too often, filters rows it shouldn't have
--------------------------------------------------------------------

Key: HBASE-1647
URL: https://issues.apache.org/jira/browse/HBASE-1647
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Doğacan Güney


Filter#filterRow is called from ScanQueryMatcher#filterEntireRow, which is called from StoreScanner.next. However, if I understood the code correctly, StoreScanner processes KeyValue-s in a column-oriented order (i.e. after row1-col1 comes row2-col1, not row1-col2). Thus, when filterEntireRow is called, the filter has in reality processed (via filterKeyValue) only one column of a row.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Doğacan Güney (JIRA) at Jul 11, 2009 at 12:09 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: scanfilter.patch
    ScanBug.java

    1) A simple class to demonstrate the problem. Note that col2:-s are all filtered even though the ValueFilter instance should only work on col1:-s. When running, give -create as an argument to create the table.

    2) Patch for the issue. The patch deletes the ScanQueryMatcher#filterEntireRow method and instead calls Filter#filterRow in HRegion.RegionScanner#next. I am not sure if this patch is complete/correct, but it fixes the problem for me. Note that you also need HBASE-1646 to test this patch (or the class above).
  • Doğacan Güney (JIRA) at Jul 12, 2009 at 10:47 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: HBASE-1647-v2.patch

    v2 of patch.

    I think all reset()s should be moved into RegionScanner#next as well. Since we merge different columns of the same row in RegionScanner, calling reset anywhere else (especially in ScanQueryMatcher, which is called from StoreScanner#next) seems like a bug.
  • stack (JIRA) at Jul 13, 2009 at 7:23 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730471#action_12730471 ]

    stack commented on HBASE-1647:
    ------------------------------

    Patch looks good. +1. It's a radical change in Filter processing, though it looks right and all tests pass. Can someone else look at this? Ryan? I'd like others' input before committing.

    On StoreScanner running through in a column order rather than row-at-a-time, that's not how I understand it works, but maybe that's how it appears in this context.
  • stack (JIRA) at Jul 13, 2009 at 7:23 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1647:
    -------------------------

    Fix Version/s: 0.20.0

    Bringing into 0.20.0.
  • ryan rawson (JIRA) at Jul 13, 2009 at 8:21 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730491#action_12730491 ]

    ryan rawson commented on HBASE-1647:
    ------------------------------------

    I'm afraid this patch is not good. The reporter's assumptions are not correct; we in fact see things in this order:

    row1 / col1
    row1 / col2
    row2 / col1
    row2 / col2

    This patch removes a key piece of functionality I built into the new API that allows us to post-facto filter a row.

    The API needs to be clearer perhaps. The intent is:

    - use filterRowKey() to have the earliest chance to filter an entire row based on the _KEY_ only.
    - for more complex processing, implement filterKeyValue() and set internal state.
    -- At the "End" of the KeyValues for a row, filterRow() will be called, and you can choose, based on that state, to filter the row. Eg: if a column was missing (which you can't know until you hit the end of the KeyValues for a row).

    If you don't use this feature, implement filterRow() { return false; }. The JIT hotspot will take care of optimizing things.
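The contract described above can be sketched in a few lines. This is a minimal, self-contained model, not the HBase API: the `SimpleFilter` interface, the `RequireColumn` filter, and the `scan` driver are all hypothetical stand-ins, and cells are plain `{row, column, value}` string triples sorted by row and then column. It only illustrates the "missing column" case, where the decision cannot be made until the last KeyValue of a row has been seen.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterContractSketch {
    /** Hypothetical, simplified stand-in for the filter interface discussed above. */
    interface SimpleFilter {
        void reset();                                   // clear per-row state
        boolean filterRowKey(String row);               // true = skip the row based on its key alone
        boolean filterKeyValue(String row, String col, String value); // true = drop this cell
        boolean filterRow();                            // true = drop the whole row; asked once, at end of row
    }

    /** Drops every row that does not contain the required column. */
    static class RequireColumn implements SimpleFilter {
        private final String required;
        private boolean seen;

        RequireColumn(String required) { this.required = required; }

        public void reset() { seen = false; }
        public boolean filterRowKey(String row) { return false; }
        public boolean filterKeyValue(String row, String col, String value) {
            if (col.equals(required)) {
                seen = true;                            // remember state for filterRow()
            }
            return false;
        }
        public boolean filterRow() { return !seen; }
    }

    /** Walks cells in row-major order and returns the keys of the rows that survive. */
    static List<String> scan(List<String[]> cells, SimpleFilter f) {
        List<String> kept = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            String row = cells.get(i)[0];
            f.reset();
            boolean skip = f.filterRowKey(row);
            boolean anyCellKept = false;
            while (i < cells.size() && cells.get(i)[0].equals(row)) {
                String[] c = cells.get(i++);
                if (!skip && !f.filterKeyValue(c[0], c[1], c[2])) {
                    anyCellKept = true;
                }
            }
            // Only now, at the end of the row, is the row-level decision taken.
            if (!skip && anyCellKept && !f.filterRow()) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> cells = List.of(
            new String[]{"row1", "col1", "a"},
            new String[]{"row1", "col2", "b"},
            new String[]{"row2", "col1", "c"});
        System.out.println(scan(cells, new RequireColumn("col2"))); // prints [row1]
    }
}
```

In this model row2 is dropped only because `filterRow()` runs after all of its cells have been offered to `filterKeyValue()`.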

  • Doğacan Güney (JIRA) at Jul 13, 2009 at 8:33 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730499#action_12730499 ]

    Doğacan Güney commented on HBASE-1647:
    --------------------------------------

    Hey Ryan,

    I did not mean that with my description but I was not clear at all. From IRC:

    dogacan: St^Ack: "On StoreScanner running through in an column order rather than row-at-a-time, thats not how I understand it works but maybe thats how it appears in this context."
    [10:32pm] dogacan: you are right here. I meant (again if I understood code correctly) scanners go to next row to figure out they went too far [i mean we peek to the next row in StoreScanner then get DONE then call filterRow]
    [10:32pm] dogacan: and when they do, they used to call filterRow

    ([.....] part was not on IRC)

    Anyway, did you try the ScanBug class I attached? When you set a ValueFilter, it filters out all other columns (because currently there is almost one filterRow call for every filterKeyValue call).
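The symptom reported here can be reproduced in a toy model. The sketch below is not the attached ScanBug class and not HBase code: the `Col1Equals` filter and the `scan` driver are hypothetical, and the `decidePerCell` flag simply switches between asking `filterRow()` once at the end of a row and asking it (followed by `reset()`) after every cell.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterRowTimingSketch {
    /** Toy stateful filter: keep a row only if its col1 holds the wanted value. */
    static class Col1Equals {
        private final String wanted;
        private boolean matched;

        Col1Equals(String wanted) { this.wanted = wanted; }

        void reset() { matched = false; }
        void filterKeyValue(String col, String value) {
            if (col.equals("col1") && value.equals(wanted)) {
                matched = true;
            }
        }
        boolean filterRow() { return !matched; }        // true = drop
    }

    /** cells are {row, col, value}, sorted by row then column; returns the surviving "row/col" names. */
    static List<String> scan(List<String[]> cells, Col1Equals f, boolean decidePerCell) {
        List<String> out = new ArrayList<>();
        List<String> rowBuffer = new ArrayList<>();
        f.reset();
        for (int i = 0; i < cells.size(); i++) {
            String[] c = cells.get(i);
            f.filterKeyValue(c[1], c[2]);
            rowBuffer.add(c[0] + "/" + c[1]);
            boolean endOfRow = i + 1 == cells.size() || !cells.get(i + 1)[0].equals(c[0]);
            if (decidePerCell || endOfRow) {
                if (!f.filterRow()) {
                    out.addAll(rowBuffer);
                }
                rowBuffer.clear();
                f.reset();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> cells = List.of(
            new String[]{"row1", "col1", "x"},
            new String[]{"row1", "col2", "y"},
            new String[]{"row2", "col1", "z"},
            new String[]{"row2", "col2", "y"});
        System.out.println(scan(cells, new Col1Equals("x"), false)); // prints [row1/col1, row1/col2]
        System.out.println(scan(cells, new Col1Equals("x"), true));  // prints [row1/col1]
    }
}
```

With the per-cell decision, row1/col2 is lost even though the filter only inspects col1, which mirrors the behaviour described in this comment.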
  • Doğacan Güney (JIRA) at Jul 14, 2009 at 12:15 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: HBASE-1647-v3.patch

    v3 of patch

    1) It seems the check for passing stopRow is only done if results.isEmpty. This patch performs the stopRow check at every row change.

    2) I moved filterRowKey into RegionScanner as well. Again, please review :), but I think calling filterRowKey in RegionScanner once should be safe and (very) slightly faster.

  • Doğacan Güney (JIRA) at Jul 15, 2009 at 1:09 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: HBASE-1647-v4.patch

    v4 of patch.

    * I have removed all test methods in TestStoreScanner as most of the filter methods are now called in RegionScanner. Should I also refactor the test methods to TestScanner?

    * I have made a small change in TestScanner. RegionScanner#next's javadoc:

    {code}

    /**
     * Get the next row of results from this region.
     * @param results list to append results to
     * @return true if there are more rows, false if scanner is done
     */

    {code}

    And in TestScanner#testStopRow:

    {code}

    InternalScanner s = r.getScanner(scan);
    int count = 0;
    while (s.next(results)) {
      count++;
    }

    {code}

    In trunk count is 1. However, there is only one row to scan ("abc"). Since there are no more rows once we call next (and put KeyValue-s in results), I think we must return false (thus count is 0). Please correct me if I am wrong here.

    * There was a possibly serious bug in v3 in RegionScanner. It implicitly assumes that the caller clears the results list between calls to RegionScanner#next. If the caller doesn't do that, we may delete results from older rows or even get stuck in an infinite loop. So I added a new field to RegionScanner. KeyValue-s are initially accumulated (or filtered) in this new field. Upon completion of next, they are added to the outResults. I am not sure if this is necessary (no code in hbase reuses results).
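The last two points can be illustrated together with a small sketch. It is a simplified model under stated assumptions, not the patched RegionScanner: the `RowScanner` class, its `buffer` field, and the `dropRow` predicate are hypothetical names, and rows are plain lists of strings. `next` accumulates into a private buffer and appends to the caller's list only when a row survives, and it returns false as soon as no further rows remain.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class BufferedNextSketch {
    /** Hypothetical row scanner; each row is a list of cell names. */
    static class RowScanner {
        private final Iterator<List<String>> rows;
        private final Predicate<List<String>> dropRow;
        private final List<String> buffer = new ArrayList<>(); // internal accumulation field

        RowScanner(List<List<String>> rows, Predicate<List<String>> dropRow) {
            this.rows = rows.iterator();
            this.dropRow = dropRow;
        }

        /** Appends the next surviving row to outResults; returns true if more rows remain. */
        boolean next(List<String> outResults) {
            while (rows.hasNext()) {
                buffer.clear();
                buffer.addAll(rows.next());
                if (dropRow.test(buffer)) {
                    continue;                   // only the private buffer is discarded
                }
                outResults.addAll(buffer);      // the caller's existing entries are never touched
                return rows.hasNext();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        RowScanner s = new RowScanner(List.of(List.of("abc/col1")), row -> false);
        List<String> results = new ArrayList<>(List.of("stale")); // caller did not clear the list
        int count = 0;
        while (s.next(results)) {
            count++;
        }
        System.out.println(count + " " + results); // prints 0 [stale, abc/col1]
    }
}
```

With a single row, the loop body never runs because the first call already reports that the scanner is done, which is the "count is 0" reading of the javadoc quoted above.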
  • stack (JIRA) at Jul 15, 2009 at 5:48 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731588#action_12731588 ]

    stack commented on HBASE-1647:
    ------------------------------

    bq. I have removed all test methods in TestStoreScanner as most of the filter methods are now called in RegionScanner. Should I also refactor the test methods to TestScanner?

    Tests are kinda critical for this infrequently used but critical feature.

    But this issue is more about how the new filter Interface works, fixing the context in which each of the filter methods is called. Let's get that worked out first before we work on tests.

    bq. On TestScanner#testStopRow.... 1 vs 0

    That looks right.

    On the javadoc change, I don't see it in the patch.

    Otherwise, patch looks good to me. Let me kick Ryan and get him to review it.





  • ryan rawson (JIRA) at Jul 17, 2009 at 12:26 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732262#action_12732262 ]

    ryan rawson commented on HBASE-1647:
    ------------------------------------

    There are some issues that need to be addressed before this can go in:

    - results is now a field for no reason. This reduces GC efficiency and performance.
    - RegionScanner#next is a mess now. Too many boolean flags, I don't detect a sense of clear minded purpose. Unbalanced and uncertain flags and filter.reset calls make me concerned about bugs.
    - The last one is that tests were deleted, instead of migrated. We lose test coverage with this patch.

    I'm poking at it more, but the RegionScanner#next and test issues are show stoppers.
  • Doğacan Güney (JIRA) at Jul 17, 2009 at 11:18 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: HBASE-1647-v5.patch
  • Doğacan Güney (JIRA) at Jul 17, 2009 at 11:18 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732469#action_12732469 ]

    Doğacan Güney commented on HBASE-1647:
    --------------------------------------
    bq. results is now a field for no reason. This reduces GC efficiency and performance.

    I explained why in my previous comment. Not sure if mine is a valid reason for worrying though. It seems results is always cleared in internal hbase usage so my extra safeguard there may be pointless.

    bq. RegionScanner#next is a mess now. Too many boolean flags, I don't detect a sense of clear minded purpose. Unbalanced and uncertain flags and filter.reset calls make me concerned about bugs.

    I see your point, yet in other ways, it is also clearer now. All the extra logic outside the while loop is moved into the loop, and stop row comparison code is now in one place.

    I reduced boolean flags to one (filterCurrentRow). It is an optimization flag like stickyNextRow in underlying scanners.

    I also refactored code a bit. Let me know if it is clearer now.

    bq. The last bug one is tests were deleted, instead of migrated. We lose test coverage with this patch.

    I added tests to TestScanner.
  • Doğacan Güney (JIRA) at Jul 17, 2009 at 11:54 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doğacan Güney updated HBASE-1647:
    ---------------------------------

    Attachment: HBASE-1647-v6.patch

    Oops. Sorry, minor bug in the last patch; updated.
  • Clint Morgan (JIRA) at Jul 17, 2009 at 10:04 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732753#action_12732753 ]

    Clint Morgan commented on HBASE-1647:
    -------------------------------------

    This patch fixes a few of my filter backed tests that were failing.
  • ryan rawson (JIRA) at Jul 20, 2009 at 8:10 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733155#action_12733155 ]

    ryan rawson commented on HBASE-1647:
    ------------------------------------

    Thanks for the updated patch. The tests need to be migrated, not just deleted, but I'll poke at the implementation and its consequences compared to the previous implementation tomorrow. I've posted it to github:

    http://github.com/ryanobjc/hbase/commit/3b31d0f4b0c0df2ad519f421d35cbb216da054e1
  • Doğacan Güney (JIRA) at Jul 20, 2009 at 7:05 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733321#action_12733321 ]

    Doğacan Güney commented on HBASE-1647:
    --------------------------------------

    No problem.

    Btw, I updated the tests in latest patch (moved them into TestScanner).
  • stack (JIRA) at Jul 25, 2009 at 4:11 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1647:
    -------------------------

    Assignee: ryan rawson
    Status: Patch Available (was: Open)

    Patch available, assigning Ryan for review. Assign it to me Ryan if you want me to review it instead.
  • ryan rawson (JIRA) at Jul 28, 2009 at 8:52 am
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735995#action_12735995 ]

    ryan rawson commented on HBASE-1647:
    ------------------------------------

    stack, put it in as is, and I'll make adjustments if necessary. I want to think about the code a bit more, but I've been busy. I'll do it tomorrow.
  • stack (JIRA) at Jul 28, 2009 at 12:12 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1647:
    -------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    Ok Ryan.

    Committed. Thanks for the patch, Doğacan.
  • stack (JIRA) at Jul 28, 2009 at 12:52 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack reopened HBASE-1647:
    --------------------------


    Reopening. It may have broken scanning. Not sure yet. Looking.
  • stack (JIRA) at Jul 28, 2009 at 1:28 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736058#action_12736058 ]

    stack commented on HBASE-1647:
    ------------------------------

    It's not this patch that was the problem. Will reapply in a while.
  • stack (JIRA) at Jul 28, 2009 at 9:30 pm
    [ https://issues.apache.org/jira/browse/HBASE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack resolved HBASE-1647.
    --------------------------

    Resolution: Fixed
    Hadoop Flags: [Reviewed]

    Committed (again).

Discussion Overview
group: dev
categories: hbase, hadoop
posted: Jul 11, '09 at 12:05p
active: Jul 28, '09 at 9:30p
posts: 23
users: 1
website: hbase.apache.org

1 user in discussion

stack (JIRA): 23 posts
