FAQ
Hi,
Someone suggested that I should use MemoryIndex to
match content to a large # of queries. e.g. "nike red
shoes" --match--> "nike shoes -blue" and --match-->
"nike shoes -black"... What if I have 100000 of these
queries for each content? and there maybe 1000000 of
these contents.

But how fast is MemoryIndex? Is it cpu and memory
intensive? I read somewhere and it said that it is
about three order faster than normal operation. If
so, why not use it for the normal operation as well?

Many thanks.





__________________________________
Start your day with Yahoo! - Make it your home page!
http://www.yahoo.com/r/hs

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Olena Medelyan at Oct 24, 2005 at 9:25 am
    Hi Sam,

    to do such matching you first of all need something that keeps semantic
    information about words: e.g. a thesaurus, where "red", "blue" and "black"
    are all grouped under the same term "colour". Otherwise, how will
    your system know that "nike red shoes" should match to "nike shoes -black" and not to
    "nike shoes -"anything else"?
    You would also need rules that define that only certain terms are to
    be replaced with alternatives. Otherwise, your query can be mapped to X
    alternatives like:
    "-adidas red shoes", "nike red -pants" ...

    Cheers,
    Olena
    On Sun, 23 Oct 2005, Sam Lee wrote:

    Hi,
    Someone suggested that I should use MemoryIndex to
    match content to a large # of queries. e.g. "nike red
    shoes" --match--> "nike shoes -blue" and --match-->
    "nike shoes -black"... What if I have 100000 of these
    queries for each content? and there maybe 1000000 of
    these contents.

    But how fast is MemoryIndex? Is it cpu and memory
    intensive? I read somewhere and it said that it is
    about three order faster than normal operation. If
    so, why not use it for the normal operation as well?

    Many thanks.





    __________________________________
    Start your day with Yahoo! - Make it your home page!
    http://www.yahoo.com/r/hs

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Markharw00d at Oct 24, 2005 at 4:07 pm
    If so, why not use it for the normal operation as well?
    Because MemoryIndex only allows you to store/query one document.
    It is fast, but I would not suggest running 10000 queries against it.

    Why not try store the queries as documents in a special index and query
    them using the subject document.
    The results will be a rough short-list of the queries you now need to
    run (ie less than 10,000!). Put the subject document eg "i sell red
    nike shoes" into a memory index then run the selected queries against it.

    These queries may have mandatory clauses ( eg +/- operators) which may
    cause them to fail when run as queries against the MemoryIndexed subject
    doc which is why the first "query the queries" search is insufficient to
    find the matches.

    Cheers,
    Mark


    Sam Lee wrote:
    Hi,
    Someone suggested that I should use MemoryIndex to
    match content to a large # of queries. e.g. "nike red
    shoes" --match--> "nike shoes -blue" and --match-->
    "nike shoes -black"... What if I have 100000 of these
    queries for each content? and there maybe 1000000 of
    these contents.

    But how fast is MemoryIndex? Is it cpu and memory
    intensive? I read somewhere and it said that it is
    about three order faster than normal operation. If
    so, why not use it for the normal operation as well?

    Many thanks.





    __________________________________
    Start your day with Yahoo! - Make it your home page!
    http://www.yahoo.com/r/hs

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org








    ___________________________________________________________
    Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Sam Lee at Oct 24, 2005 at 4:55 pm
    How much of a performance impact if I store queries as
    documents first?

    Actually, I just thought of a way to first select
    queries with certain quality before doing memoryindex,
    so it will trim it to much less than 100000.

    But has anyone done MemoryIndex? I need some
    real-world examples that can tell me how fast
    MemoryIndex is before I decide to use it, like # of
    queries /sec and cpu and memory they are using, etc.
    I searched all over google but can't find any.

    --- markharw00d wrote:
    If so, why not use it for the normal operation as
    well?

    Because MemoryIndex only allows you to store/query
    one document.
    It is fast, but I would not suggest running 10000
    queries against it.

    Why not try store the queries as documents in a
    special index and query
    them using the subject document.
    The results will be a rough short-list of the
    queries you now need to
    run (ie less than 10,000!). Put the subject
    document eg "i sell red
    nike shoes" into a memory index then run the
    selected queries against it.

    These queries may have mandatory clauses ( eg +/-
    operators) which may
    cause them to fail when run as queries against the
    MemoryIndexed subject
    doc which is why the first "query the queries"
    search is insufficient to
    find the matches.

    Cheers,
    Mark


    Sam Lee wrote:
    Hi,
    Someone suggested that I should use MemoryIndex to
    match content to a large # of queries. e.g. "nike red
    shoes" --match--> "nike shoes -blue" and
    --match-->
    "nike shoes -black"... What if I have 100000 of these
    queries for each content? and there maybe 1000000 of
    these contents.

    But how fast is MemoryIndex? Is it cpu and memory
    intensive? I read somewhere and it said that it is
    about three order faster than normal operation. If
    so, why not use it for the normal operation as well?
    Many thanks.





    __________________________________
    Start your day with Yahoo! - Make it your home page!
    http://www.yahoo.com/r/hs
    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail:
    java-user-help@lucene.apache.org







    ___________________________________________________________
    Yahoo! Messenger - NEW crystal clear PC to PC
    calling worldwide with voicemail
    http://uk.messenger.yahoo.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail:
    java-user-help@lucene.apache.org



    __________________________________
    Yahoo! FareChase: Search multiple travel sites in one click.
    http://farechase.yahoo.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Christophe at Oct 24, 2005 at 4:57 pm
    Hi, Sam,

    Is there a reason you couldn't build a test case and try it, in your
    environment and on your hardware? That seems to be the only way to
    really answer the question.
    On 24 Oct 2005, at 09:54, Sam Lee wrote:

    How much of a performance impact if I store queries as
    documents first?

    Actually, I just thought of a way to first select
    queries with certain quality before doing memoryindex,
    so it will trim it to much less than 100000.

    But has anyone done MemoryIndex? I need some
    real-world examples that can tell me how fast
    MemoryIndex is before I decide to use it, like # of
    queries /sec and cpu and memory they are using, etc.
    I searched all over google but can't find any.

    --- markharw00d wrote:

    If so, why not use it for the normal operation as
    well?

    Because MemoryIndex only allows you to store/query
    one document.
    It is fast, but I would not suggest running 10000
    queries against it.

    Why not try store the queries as documents in a
    special index and query
    them using the subject document.
    The results will be a rough short-list of the
    queries you now need to
    run (ie less than 10,000!). Put the subject
    document eg "i sell red
    nike shoes" into a memory index then run the
    selected queries against it.

    These queries may have mandatory clauses ( eg +/-
    operators) which may
    cause them to fail when run as queries against the
    MemoryIndexed subject
    doc which is why the first "query the queries"
    search is insufficient to
    find the matches.

    Cheers,
    Mark


    Sam Lee wrote:

    Hi,
    Someone suggested that I should use MemoryIndex to
    match content to a large # of queries. e.g. "nike red
    shoes" --match--> "nike shoes -blue" and
    --match-->
    "nike shoes -black"... What if I have 100000 of these
    queries for each content? and there maybe 1000000 of
    these contents.

    But how fast is MemoryIndex? Is it cpu and memory
    intensive? I read somewhere and it said that it is
    about three order faster than normal operation. If
    so, why not use it for the normal operation as well?
    Many thanks.





    __________________________________
    Start your day with Yahoo! - Make it your home page!
    http://www.yahoo.com/r/hs

    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail:
    java-user-help@lucene.apache.org









    ___________________________________________________________
    Yahoo! Messenger - NEW crystal clear PC to PC
    calling worldwide with voicemail
    http://uk.messenger.yahoo.com


    ---------------------------------------------------------------------
    To unsubscribe, e-mail:
    java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail:
    java-user-help@lucene.apache.org




    __________________________________
    Yahoo! FareChase: Search multiple travel sites in one click.
    http://farechase.yahoo.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark harwood at Oct 24, 2005 at 5:19 pm
    It is fast.
    so, why not use it for the normal operation as
    well?

    Because it only stores one document.

    Given the number of queries you have I'm not sure I'd
    run them all. How about putting them as docs into a
    categorisation index then using the subject document
    as a query to selct a subset of the queries you need
    to run?
    This should give you a rough shortlist of queries then
    you can run them all against the one memory-indexed
    subject document to see if they *really* match i.e if
    the mandatory/AND statements are all satisfied.

    Cheers,
    Mark



    ___________________________________________________________
    To help you stay safe and secure online, we've developed the all new Yahoo! Security Centre. http://uk.security.yahoo.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedOct 24, '05 at 5:59a
activeOct 24, '05 at 5:19p
posts6
users4
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase