FAQ
Hi,

We are using a Lucene 3.x index to search for photo albums based on
textual properties such as photo album title/author/URL and photo
captions/URLs. Goal is to find the most relevant photo albums for a user
query and display the best matching photos for these albums.

In our current solution we append, for each photo album, all photo
captions in a catchall field and do a search. For each search result we
get the URL and do a look up in another Lucene index where each document
contains individual photo properties to find the best matching photos in
a photo album. These indexes are very small, but there is one for each
photo album. (see example below)

I've tried to use multivalued fields and exploit the fact that the
values are ordered so that for example the 2nd value of field:caption is
related to the 2nd value of field:URL.

The problem with this solution is that it is not possible to find the
best matching photos in the best photo albums (scoring of multivalued
fields is performed on all values together). As a workaround I've tried
storing some fields and creating the photo indexes on the fly using a
RAMDirectory. Besides the fact that this is not an elegant solution, the
increase in index size is unacceptable in our current setup.

What would be a better way to solve this problem?

Thanks for your time,
Dennis

---

Example:

index: photo albums
doc1( field:title, field:author, field:url_album, field:id=unique_id )
doc2( field:title, field:author, field:url_album, field:id=unique_id )
...

index: <unique_id>
doc1(field:caption, field:url_photo)
doc1(field:caption, field:url_photo)
doc1(field:caption, field:url_photo)



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Ryan Aylward at Dec 15, 2010 at 11:07 pm
    Would you be able to create a single index with all photos? Your searches would go against the photo index. At that point, you would have the most relevant photos regardless of album. You could then introduce a sort to your Lucene search to ensure all photos from a given album are grouped together.

    So your result would be something like:

    Album 1 Photo 1a
    Album 1 Photo 1b
    Album 2 Photo 2a
    Album 3 Photo 3a
    Album 3 Photo 3b


    -----Original Message-----
    From: Dennis Hendriksen
    Sent: Wednesday, December 15, 2010 8:04 AM
    To: java-user@lucene.apache.org
    Subject: Multivalued scoring

    Hi,

    We are using a Lucene 3.x index to search for photo albums based on
    textual properties such as photo album title/author/URL and photo
    captions/URLs. Goal is to find the most relevant photo albums for a user
    query and display the best matching photos for these albums.

    In our current solution we append, for each photo album, all photo
    captions in a catchall field and do a search. For each search result we
    get the URL and do a look up in another Lucene index where each document
    contains individual photo properties to find the best matching photos in
    a photo album. These indexes are very small, but there is one for each
    photo album. (see example below)

    I've tried to use multivalued fields and exploit the fact that the
    values are ordered so that for example the 2nd value of field:caption is
    related to the 2nd value of field:URL.

    The problem with this solution is that it is not possible to find the
    best matching photos in the best photo albums (scoring of multivalued
    fields is performed on all values together). As a workaround I've tried
    storing some fields and creating the photo indexes on the fly using a
    RAMDirectory. Besides the fact that this is not an elegant solution, the
    increase in index size is unacceptable in our current setup.

    What would be a better way to solve this problem?

    Thanks for your time,
    Dennis

    ---

    Example:

    index: photo albums
    doc1( field:title, field:author, field:url_album, field:id=unique_id )
    doc2( field:title, field:author, field:url_album, field:id=unique_id )
    ...

    index: <unique_id>
    doc1(field:caption, field:url_photo)
    doc1(field:caption, field:url_photo)
    doc1(field:caption, field:url_photo)



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at Dec 16, 2010 at 12:04 am
    As Ryan mentions, you really should consider piling them all into
    a single index. Yes, it seems really wasteful to re-index author,
    URL with every last photo, but try it and see if the size is acceptable.
    Or, more accurately, whether performance is acceptable.

    Best
    Erick
    On Wed, Dec 15, 2010 at 11:04 AM, Dennis Hendriksen wrote:

    Hi,

    We are using a Lucene 3.x index to search for photo albums based on
    textual properties such as photo album title/author/URL and photo
    captions/URLs. Goal is to find the most relevant photo albums for a user
    query and display the best matching photos for these albums.

    In our current solution we append, for each photo album, all photo
    captions in a catchall field and do a search. For each search result we
    get the URL and do a look up in another Lucene index where each document
    contains individual photo properties to find the best matching photos in
    a photo album. These indexes are very small, but there is one for each
    photo album. (see example below)

    I've tried to use multivalued fields and exploit the fact that the
    values are ordered so that for example the 2nd value of field:caption is
    related to the 2nd value of field:URL.

    The problem with this solution is that it is not possible to find the
    best matching photos in the best photo albums (scoring of multivalued
    fields is performed on all values together). As a workaround I've tried
    storing some fields and creating the photo indexes on the fly using a
    RAMDirectory. Besides the fact that this is not an elegant solution, the
    increase in index size is unacceptable in our current setup.

    What would be a better way to solve this problem?

    Thanks for your time,
    Dennis

    ---

    Example:

    index: photo albums
    doc1( field:title, field:author, field:url_album, field:id=unique_id )
    doc2( field:title, field:author, field:url_album, field:id=unique_id )
    ...

    index: <unique_id>
    doc1(field:caption, field:url_photo)
    doc1(field:caption, field:url_photo)
    doc1(field:caption, field:url_photo)



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedDec 15, '10 at 4:04p
activeDec 16, '10 at 12:04a
posts3
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase