Scanning API must be reworked to allow for fully functional Filters client-side
-------------------------------------------------------------------------------

Key: HBASE-1831
URL: https://issues.apache.org/jira/browse/HBASE-1831
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Jonathan Gray
Priority: Critical
Fix For: 0.20.1, 0.21.0


Right now, a client replays part of the Filter locally by calling filterRowKey() and filterAllRemaining() to determine whether it should continue to the next region.

A number of new filters rely on filterKeyValue() and other calls to alter state. It's also a false assumption that all rows/keys affecting a filter returning true for filterAllRemaining() (FAR) will be seen client-side (what about those that failed the filter?).

This issue is about dealing with Filters properly from the client-side.
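
To make the problem concrete, here is a minimal sketch of how a client-side replay can disagree with the server. Everything in it is a simplified, hypothetical stand-in: `CountingPrefixFilter` and `scanRegion` are invented for illustration and are not the real HBase Filter interface or scanner code. The filter changes state on every row it examines, but the client only ever sees the rows that passed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClientReplaySketch {
    /** Hypothetical stateful filter: keeps rows with a prefix, gives up after examining N rows. */
    static class CountingPrefixFilter {
        private final String prefix;
        private final int maxRowsExamined;
        private int rowsExamined = 0;

        CountingPrefixFilter(String prefix, int maxRowsExamined) {
            this.prefix = prefix;
            this.maxRowsExamined = maxRowsExamined;
        }

        /** Returns true when the row should be excluded; also advances internal state. */
        boolean filterRowKey(String rowKey) {
            rowsExamined++;
            return !rowKey.startsWith(prefix);
        }

        /** Returns true once the filter wants the whole scan to end. */
        boolean filterAllRemaining() {
            return rowsExamined >= maxRowsExamined;
        }
    }

    /** Server-side pass: every row in the region goes through the filter. */
    static List<String> scanRegion(List<String> rows, CountingPrefixFilter filter) {
        List<String> passed = new ArrayList<>();
        for (String row : rows) {
            if (filter.filterAllRemaining()) {
                break;
            }
            if (!filter.filterRowKey(row)) {
                passed.add(row);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        List<String> region = Arrays.asList("a1", "b1", "b2", "a2");

        CountingPrefixFilter serverCopy = new CountingPrefixFilter("a", 3);
        List<String> returned = scanRegion(region, serverCopy);

        // The client replays the filter, but only over the rows it was sent.
        CountingPrefixFilter clientCopy = new CountingPrefixFilter("a", 3);
        for (String row : returned) {
            clientCopy.filterRowKey(row);
        }

        System.out.println("rows returned:    " + returned);
        System.out.println("server says done: " + serverCopy.filterAllRemaining());
        System.out.println("client says done: " + clientCopy.filterAllRemaining());
    }
}
```

In this sketch the server examines three rows and decides the scan is over, while the client copy has only seen one row and would carry on to the next region.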

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Jonathan Gray (JIRA) at Sep 23, 2009 at 4:36 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758581#action_12758581 ]

    Jonathan Gray commented on HBASE-1831:
    --------------------------------------

    Filters should not be run on the client-side at all.

    Server needs to be able to tell whether Scan should continue to the next region or not.
  • stack (JIRA) at Sep 23, 2009 at 5:05 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758584#action_12758584 ]

    stack commented on HBASE-1831:
    ------------------------------

    Chatted with Jon.

    1. Need to flag client to STOP. HTable#next internally uses the batch version of next (so can do prefetching of rows). An empty list of Results is always sent -- never null. We'll add passing null as the flag that filter is done; do not move to next region (Client has code to handle null list so if an old hbase version connects, it won't break; it'll just not do the STOP properly).
    2. There is a non-batch next in the ipc interface. I was thinking of deprecating it and moving internals to use the batch interface only, but these internal uses of scanners do not carry filters so will just leave them for 0.20.1.
    3. Filters carry state. How do we get the state across region transitions? Again chatting with Jon, will do the following. If a Scanner has a filter, and we got back a non-empty list, it's time to move to the next region. Just before we move to the next region, we'll make another call to the old server -- Scanner.getFilter -- whose result is the deserialized filter. The deserialized filter will then be passed to the next region. In this manner filters will be able to carry their state forward. Downside is an extra RPC call IF scanning with a filter.
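
A rough sketch of the signalling in point 1, under stated assumptions: `RegionScanner`, `CannedScanner` and `scanAll` are hypothetical names made up for this illustration and are not the HBase ipc interface or the attached patch. The only idea taken from the comment is the convention that an empty batch means "this region is exhausted" while a null batch means "the filter is done, do not move to the next region".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullStopSketch {
    /** Hypothetical stand-in for one region's scanner. */
    interface RegionScanner {
        /** Non-empty list: rows. Empty list: region exhausted. null: filter is done, STOP. */
        List<String> next(int batchSize);
    }

    /** Test double that replays a fixed sequence of responses, then reports exhaustion. */
    static class CannedScanner implements RegionScanner {
        private final List<List<String>> responses;
        private int index = 0;
        int calls = 0;

        CannedScanner(List<List<String>> responses) {
            this.responses = responses;
        }

        @Override
        public List<String> next(int batchSize) {
            calls++;
            return index < responses.size() ? responses.get(index++) : new ArrayList<String>();
        }
    }

    /** Client loop: walks regions in order and honours the null STOP flag. */
    static List<String> scanAll(List<? extends RegionScanner> regions, int batchSize) {
        List<String> results = new ArrayList<>();
        for (RegionScanner region : regions) {
            while (true) {
                List<String> batch = region.next(batchSize);
                if (batch == null) {
                    return results;      // filter said STOP: skip all remaining regions
                }
                if (batch.isEmpty()) {
                    break;               // region exhausted: move on to the next region
                }
                results.addAll(batch);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        CannedScanner first = new CannedScanner(
                Arrays.asList(Arrays.asList("r1", "r2"), null));
        CannedScanner second = new CannedScanner(
                Arrays.asList(Arrays.asList("r3")));

        List<String> rows = scanAll(Arrays.asList(first, second), 10);
        System.out.println("rows: " + rows);
        System.out.println("calls to second region: " + second.calls);
    }
}
```

Here the first region returns one batch and then null, so the second region is never contacted. A client that treats null like an empty list would still work, it would just keep scanning, which matches the compatibility note in point 1.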
  • stack (JIRA) at Sep 23, 2009 at 7:33 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758847#action_12758847 ]

    stack commented on HBASE-1831:
    ------------------------------

    On 3., above, carrying stateful filters across regions, it can't be done easily in 0.20.x because we can get a NotServingRegionException at any time. Also, if nothing goes wrong and we just exhaust a scanner on a particular region, the last thing done over in RegionServer is cleanup of scanner so a subsequent getFilter call would have nothing to pick up on once it'd arrived at the RegionServer. Statefulness has to be done client-side; filters need to allow specifying a client-component.
  • stack (JIRA) at Sep 23, 2009 at 7:43 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758849#action_12758849 ]

    stack commented on HBASE-1831:
    ------------------------------

    @jgray You say, "Filters should not be run on the client-side at all."

    I don't know how else we can do stateful scanners that ride over regions.
  • stack (JIRA) at Sep 23, 2009 at 7:59 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758854#action_12758854 ]

    stack commented on HBASE-1831:
    ------------------------------

    So, chatting w/ Ryan on this topic, he suggests that support for stateful filters is going to be a bear -- there's splitting and then what if regionserver crashes, etc. At the moment we figure a filter wants to stop the scan by testing the last region's end row against the filter run client-side. Would it be enough to remove this and just add the flag above, which allows filters server-side to stop the scan (by passing back a null result out of batch next)? There'd be no getFilter to pick up a filter's state and pass it from one region to the next.

    There is going to be a problem though when fellas want to do filters that return 20 row results only or that want to have a filter skip 1000 rows; to do this, we'd need something to run client-side.
  • stack (JIRA) at Sep 23, 2009 at 8:45 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Attachment: 1831.patch

    Here is a bit of a start. It adds javadoc to the ipc interfaces about new meaning of null. It then adds a new flag to the client-side nextScanner method. It renames the method that checks for the end row in Scan.... Now to work on server-side.
  • Jonathan Gray (JIRA) at Sep 24, 2009 at 12:06 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758939#action_12758939 ]

    Jonathan Gray commented on HBASE-1831:
    --------------------------------------

    I agree that this is not going to be easy. Still think that filters should not be run client-side (even if they are, it will require additional/new information to be sent from client to server).

    Running an offset of 1000 rows client-side completely negates the value in using filters in the first place... to not have to send back all that data. That's the reason client-side filters don't really work, they require you to send back otherwise filtered-out data.

    What I would like to see as a long-term solution to this:

    1st: Add ability for server to say STOP and not go to next region
    2nd: Correctness for stateful scanners under non-split, non-failure scenarios (less correctness / fail-fast if encountering issues)
    3rd: Correctness/robustness for stateful scanners under splits and failures
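
The cost argument about a 1000-row offset can be put in numbers with a small sketch. It assumes a single region holding consecutively numbered rows and simply counts how many rows would cross the wire; `offsetOnClient` and `offsetOnServer` are hypothetical helpers, not HBase calls.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetCostSketch {
    /** Rows the caller ends up with, plus how many rows had to be shipped to get them. */
    static class Outcome {
        final List<Integer> rows;
        final int shipped;

        Outcome(List<Integer> rows, int shipped) {
            this.rows = rows;
            this.shipped = shipped;
        }
    }

    /** Offset applied on the client: every skipped row is still sent over the wire. */
    static Outcome offsetOnClient(int totalRows, int offset, int limit) {
        List<Integer> kept = new ArrayList<>();
        int shipped = 0;
        for (int row = 0; row < totalRows && kept.size() < limit; row++) {
            shipped++;
            if (row >= offset) {
                kept.add(row);
            }
        }
        return new Outcome(kept, shipped);
    }

    /** Offset applied on the server: only the rows that are kept are sent. */
    static Outcome offsetOnServer(int totalRows, int offset, int limit) {
        List<Integer> kept = new ArrayList<>();
        int shipped = 0;
        for (int row = offset; row < totalRows && kept.size() < limit; row++) {
            shipped++;
            kept.add(row);
        }
        return new Outcome(kept, shipped);
    }

    public static void main(String[] args) {
        Outcome client = offsetOnClient(5000, 1000, 20);
        Outcome server = offsetOnServer(5000, 1000, 20);
        System.out.println("client-side offset ships " + client.shipped + " rows");
        System.out.println("server-side offset ships " + server.shipped + " rows");
        System.out.println("same result rows: " + client.rows.equals(server.rows));
    }
}
```

Both variants hand the caller the same 20 rows; the client-side one ships 1020 rows to get there, the server-side one ships 20.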
  • stack (JIRA) at Sep 25, 2009 at 12:32 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Attachment: 1831-v2.patch

    v2... complete but for test. Working on that now.
  • stack (JIRA) at Sep 27, 2009 at 10:55 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Attachment: 1831-v3.patch

    Still not done
  • stack (JIRA) at Oct 2, 2009 at 4:22 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Attachment: 1831-v4.patch

    Adds two ugly tests. One under filter that puts up three regions and then checks at the Region level that filters are doing the right thing (Why can't I instantiate an HRegionServer and test from its interface -- it's currently way too hard to put one of these up... requires there be a master... it shouldn't). The other test is ugly from the client side. It splits a table, then makes sure RowFilter is returning the right results around the row boundary. I can assert counts but I can't assert that only a subset of regions is being accessed. To check the latter, I added logging; it requires eyeballing, but you can see in the logs that yes, we do not go to the next region if the filter says we're done.

    A few tests seem to be failing.... Looking into it.
  • stack (JIRA) at Oct 2, 2009 at 5:08 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Attachment: 1831-v5.patch

    This version passes all tests.
  • stack (JIRA) at Oct 3, 2009 at 5:19 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Status: Patch Available (was: Open)

    Needs review.
  • Andrew Purtell (JIRA) at Oct 5, 2009 at 11:37 pm
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrew Purtell updated HBASE-1831:
    ----------------------------------

    Attachment: 1831-v6.patch

    Testing this now. There were two rejects against latest SVN 0.20, one in HTable and one in HConnectionManager. I fixed them up and attached the result as -v6.
  • Andrew Purtell (JIRA) at Oct 6, 2009 at 12:31 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762465#action_12762465 ]

    Andrew Purtell commented on HBASE-1831:
    ---------------------------------------

    +1 all tests pass here
  • stack (JIRA) at Oct 6, 2009 at 3:27 am
    [ https://issues.apache.org/jira/browse/HBASE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HBASE-1831:
    -------------------------

    Resolution: Fixed
    Assignee: stack
    Status: Resolved (was: Patch Available)

    Thanks for review Andrew. Committed branch and trunk.
