Hey guys,
I've been digging through JIRA and all related issues for secondary
indexing for use within my Datanucleus Plugin.

https://issues.apache.org/jira/browse/CASSANDRA-749


Currently there is only indexing for LT and LTE expressions when an EQ
operator is present. Will it be possible to use the LT and LTE ops
without an EQ by the 0.7.0 release? If not, which of the following
would be more efficient?



1. Creating a dummy column of 1 byte that is indexed. Then when I
analyze the AST from the query, if no existing == op is present, add an
EQ op to my dummy column. This is functionally a no-op, but required
for the client to execute an index operation.

2. Use my previous indexing scheme of 2 Super CFs for longs and strings
to get my < and <= operations, where I use the following scheme:


Long: <index name> {
    long value 1: { serialized keys (row keys for dn) }
    long value 2: { serialized keys (row keys for dn) }
}

Thanks,
Todd
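
For readers following along, the layout in option 2 can be modelled in memory as a sorted set of indexed values, each holding a set of row keys. This is only a sketch of the idea: the class and method names (`SortedValueIndex`, `add`, `less_than`) are made up for illustration and are not part of Cassandra or the DataNucleus plugin.

```python
import bisect
from collections import defaultdict


class SortedValueIndex:
    """In-memory stand-in for one index row: indexed value -> set of data row keys."""

    def __init__(self):
        self._values = []              # distinct indexed values, kept sorted
        self._keys = defaultdict(set)  # indexed value -> row keys holding that value

    def add(self, value, row_key):
        # Register the value once in the sorted list, then record the row key.
        if value not in self._keys:
            bisect.insort(self._values, value)
        self._keys[value].add(row_key)

    def less_than(self, bound, inclusive=False):
        # Find the cut point in the sorted values, then union the key sets below it.
        if inclusive:
            cut = bisect.bisect_right(self._values, bound)
        else:
            cut = bisect.bisect_left(self._values, bound)
        result = set()
        for value in self._values[:cut]:
            result |= self._keys[value]
        return result


index = SortedValueIndex()
index.add(10, "row-a")
index.add(20, "row-b")
index.add(20, "row-c")  # two row keys share the same indexed value
index.add(30, "row-d")
```

Here `index.less_than(20)` covers the `<` case and `index.less_than(20, inclusive=True)` the `<=` case, without touching any value above the bound.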


  • Jonathan Ellis at Oct 12, 2010 at 11:48 pm

    On Tue, Oct 12, 2010 at 6:34 PM, Todd Nine wrote:
    > Currently there is only indexing for LT and LTE expression when an EQ
    > operator is present.  Will it be possible to use the LT and LTE ops
    > without an EQ by the 0.7.0 release?

    No.

    > If not, which of the following
    > would be more efficient?
    >
    > 1. Creating a dummy column of 1 byte that is indexed.

    This is basically the same as doing a full range scan, only less efficient.

    > 2. Use my previous indexing scheme of 2 Super CF for longs and strings
    > to get my < <= operations.  Where I use the following scheme.

    I'm not sure I follow but if it's better than doing a full range scan
    then it is better than 1. :)

    --
    Jonathan Ellis
    Project Chair, Apache Cassandra
    co-founder of Riptano, the source for professional Cassandra support
    http://riptano.com
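
A toy model of why the dummy-column trick amounts to a full scan: if every row carries the same dummy value, the EQ lookup returns every row key, and each row still has to be checked against the real predicate. This is an illustration only, not Cassandra's actual execution path; `DUMMY_VALUE`, `lt_query_with_dummy_eq` and the sample data are invented for the sketch.

```python
DUMMY_VALUE = b"\x00"  # hypothetical 1-byte value written to every row


def lt_query_with_dummy_eq(rows, dummy_index, column, bound):
    """Answer `column < bound` by pairing it with a no-op EQ on the dummy column."""
    candidates = dummy_index[DUMMY_VALUE]  # the EQ match: every row key
    examined = 0
    matches = []
    for key in candidates:
        examined += 1  # each candidate row is read and tested
        if rows[key][column] < bound:
            matches.append(key)
    return matches, examined


rows = {"k1": {"age": 10}, "k2": {"age": 25}, "k3": {"age": 40}}
dummy_index = {DUMMY_VALUE: list(rows)}  # one index entry pointing at all rows
matches, examined = lt_query_with_dummy_eq(rows, dummy_index, "age", 20)
```

In this model `examined` always equals the total number of rows, however selective the bound is, and the index lookup is extra work on top of that.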
  • Todd Nine at Oct 13, 2010 at 12:59 am
    Fair enough!


    Thanks Jonathan.


    todd
    SENIOR SOFTWARE ENGINEER

    todd nine| spidertracks ltd | 117a the square
    po box 5203 | palmerston north 4441 | new zealand
    P: +64 6 353 3395 | M: +64 210 255 8576
    E: todd@spidertracks.co.nz W: www.spidertracks.com




  • Todd Nine at Oct 13, 2010 at 2:33 am
    Thanks Jonathan,

    A follow-up question. Will it be possible to migrate existing indexes
    in a future release as part of the upgrade path to support LT and LTE
    ops without equal? In the meantime in my Datanucleus Plugin I was
    thinking I could do something like the following. It's not efficient
    for space, but it will work and should hopefully be relatively efficient
    for querying.


    LT and LTE ops can be thought of as the distance from the MAX value of
    any given data type. For instance, if I had a data type "ubershort",
    which goes from -200 to 200, I could say that an expression of <= 0 is
    really >= (distance) 200 from the maximum. I could use this equation to
    calculate the "distance" and persist it in a column named
    "<colName>_reverse", which would effectively give me a reverse index.


    Then the value would simply be

    storedValue = MAXVALUE - userVal.

    From there, whenever the user issues a < or <= query, I would simply
    translate the value via the above equation, and < becomes > and <=
    becomes >=. Aside from the space issue of storage, do you see any other
    problems with this approach for a 0.7 compatible version of my plugin?

    Thanks,
    Todd
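
The translation described above can be sketched as follows, using the hypothetical "ubershort" range from the post. The function names (`to_reverse`, `translate`) are invented for illustration.

```python
MIN_VALUE, MAX_VALUE = -200, 200  # the hypothetical "ubershort" range


def to_reverse(user_val):
    """Value to persist in the "<colName>_reverse" column: MAXVALUE - userVal."""
    if not MIN_VALUE <= user_val <= MAX_VALUE:
        raise ValueError(f"{user_val} is outside [{MIN_VALUE}, {MAX_VALUE}]")
    return MAX_VALUE - user_val


def translate(op, user_val):
    """Rewrite a < or <= predicate on the user value as > or >= on the stored value."""
    flipped = {"<": ">", "<=": ">="}
    return flipped[op], to_reverse(user_val)


# "<= 0" on the original column becomes ">= 200" on the reverse column.
op, stored_bound = translate("<=", 0)
```

One thing worth checking before relying on this: the stored values span 0 to MAXVALUE - MINVALUE (0 to 400 here), which is wider than the positive half of the original type. With a fixed-width signed type such as a Java long, the subtraction can therefore overflow for negative user values.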




  • Tyler Hobbs at Oct 13, 2010 at 4:02 am
    I'm not completely sure I follow your scheme, but it's fairly easy to
    support GT, LT, etc. with your own index.

    Use a row for your index where the column names are the data values
    you want to index. If you set the comparator type (in your example, this
    would be LongType), you can perform a LT or GT query just by getting a
    slice of the index columns. Store the original data row keys as the column
    values, and you're there.

    - Tyler
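
A rough in-memory model of the index row Tyler describes: column names are the indexed values, kept in comparator order, and each column value is a data row key. `IndexRow`, `insert` and `get_slice` are illustrative names, not the Cassandra client API, and the sketch assumes slice bounds are inclusive at both ends.

```python
import bisect


class IndexRow:
    """One index row: sorted column names (indexed values) -> data row key."""

    def __init__(self):
        self._names = []   # column names, kept in sorted (comparator) order
        self._values = {}  # column name -> data row key

    def insert(self, name, row_key):
        if name not in self._values:
            bisect.insort(self._names, name)
        # Writing to an existing column name replaces its value.
        self._values[name] = row_key

    def get_slice(self, start=None, finish=None):
        # Contiguous run of columns between start and finish, both inclusive.
        lo = 0 if start is None else bisect.bisect_left(self._names, start)
        hi = len(self._names) if finish is None else bisect.bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


row = IndexRow()
row.insert(5, "r1")
row.insert(10, "r2")
row.insert(15, "r3")
lte_10 = row.get_slice(finish=10)  # columns with name <= 10
gte_10 = row.get_slice(start=10)   # columns with name >= 10
```

Because each column name holds a single value, inserting a second row key under the same indexed value replaces the first. A layout like this needs some way to hold several row keys per indexed value.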
  • Todd Nine at Oct 13, 2010 at 6:45 pm
    Hi Tyler,
    That was the scheme I was describing in the original email.
    Unfortunately, I can have more than one value per column, so I actually
    have to use super columns. This way I can write more than one row key
    for any given indexed value. I'm concerned that this may not scale well
    (at least on version 0.6). However, after looking at the limitations
    page:

    http://wiki.apache.org/cassandra/CassandraLimitations

    it appears that the "row must fit in memory" limitation has been
    removed. I'll move back to this scheme for my querying.

    todd





Discussion Overview
group: dev @
categories: cassandra
posted: Oct 12, '10 at 11:34p
active: Oct 13, '10 at 6:45p
posts: 6
users: 3
website: cassandra.apache.org
irc: #cassandra
