Hi,

In my application I have documents that may contain terms and term translations in multiple languages. The language tag of each term is given explicitly and should be available in the index, so that I can query for documents that contain a certain term (optionally in a given language).

I could split each document into a set of sub-documents, each containing terms in one specific language plus a dedicated field indicating the language. But then I need multiple queries to retrieve the stored term translations from the sub-documents.

The better alternative, IMO, is not to split the document and instead to attach the language tags to the terms as payloads. But then I need

(i) a search filter that eliminates docs based on a given language tag, and

(ii) a way to access the term payloads from the documents returned by the searcher.

I haven't found a solution for either yet. Can I write a custom PayloadFilter, or is there already an implementation available? And is it possible to access the term payloads from the search results?
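
To make requirements (i) and (ii) concrete, here is a minimal, library-free sketch in plain Java. It is only a model of the idea, not Lucene code: the class TaggedTermIndex and its methods add, search and termsOf are hypothetical names made up for illustration, and a real index would keep the language tag in the term payload rather than in in-memory maps.

```java
import java.util.*;

// Hypothetical in-memory model of "terms with a language tag as payload".
// None of these names are Lucene API; they only illustrate the two requirements.
class TaggedTermIndex {
    // One occurrence of a term in a document, carrying its language tag.
    static final class Posting {
        final int docId;
        final String lang;
        Posting(int docId, String lang) { this.docId = docId; this.lang = lang; }
    }

    // term -> occurrences of that term
    private final Map<String, List<Posting>> postings = new HashMap<>();
    // docId -> (language tag -> terms), kept so tags can be read back per document
    private final Map<Integer, Map<String, List<String>>> docTerms = new HashMap<>();

    void add(int docId, String term, String lang) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new Posting(docId, lang));
        docTerms.computeIfAbsent(docId, d -> new LinkedHashMap<>())
                .computeIfAbsent(lang, l -> new ArrayList<>())
                .add(term);
    }

    // (i) Documents containing the term; lang == null means "any language".
    Set<Integer> search(String term, String lang) {
        Set<Integer> hits = new TreeSet<>();
        for (Posting p : postings.getOrDefault(term, Collections.emptyList())) {
            if (lang == null || lang.equals(p.lang)) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    // (ii) The terms of a matching document, grouped by language tag.
    Map<String, List<String>> termsOf(int docId) {
        return docTerms.getOrDefault(docId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        TaggedTermIndex index = new TaggedTermIndex();
        index.add(1, "house", "en");
        index.add(1, "Haus", "de");
        index.add(2, "Haus", "de");
        for (int docId : index.search("Haus", "de")) {
            System.out.println(docId + " -> " + index.termsOf(docId));
        }
    }
}
```

The point of the sketch is that the document stays whole: one lookup finds the documents, and the translations in the other languages are read from the same document without a second query.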

Thanks.
Bernhard
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erick Erickson at Jul 8, 2010 at 2:30 pm
    If you know the language at index time, could you index language-specific
    fields, e.g. text_en, text_de, title_en, title_de, etc.? Perhaps you could
    also have a catch-all field that contains everything.

    Then your searching would be on a per-field, per-language basis.
    PerFieldAnalyzerWrapper would automatically use the proper
    language-specific Analyzers.
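
The field-per-language layout can be sketched without any library as follows. This is a rough model under stated assumptions, not Lucene or Solr code: the class LanguageFieldIndex, the "_all" suffix for the catch-all field, and the lower-casing that stands in for a per-field Analyzer are all hypothetical choices made for illustration.

```java
import java.util.*;

// Hypothetical in-memory model of "one field per language plus a catch-all".
// Not Lucene API; field names follow the text_en / text_de pattern from the post.
class LanguageFieldIndex {
    // field name (e.g. "text_en") -> term -> ids of documents containing it
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Builds the language-specific field name, e.g. ("text", "en") -> "text_en".
    static String fieldFor(String base, String lang) {
        return base + "_" + lang;
    }

    // Indexes the term in its language-specific field and in the catch-all field.
    void add(int docId, String base, String lang, String term) {
        index(fieldFor(base, lang), term, docId);
        index(base + "_all", term, docId); // assumed name for the catch-all field
    }

    private void index(String field, String term, int docId) {
        // Lower-casing is a stand-in for whatever analysis the field would get.
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term.toLowerCase(Locale.ROOT), t -> new TreeSet<>())
                .add(docId);
    }

    // lang == null searches the catch-all field, i.e. any language.
    Set<Integer> search(String base, String lang, String term) {
        String field = (lang == null) ? base + "_all" : fieldFor(base, lang);
        return postings.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        LanguageFieldIndex index = new LanguageFieldIndex();
        index.add(1, "text", "en", "house");
        index.add(1, "text", "de", "Haus");
        index.add(2, "text", "de", "house");
        System.out.println(index.search("text", "en", "house"));
        System.out.println(index.search("text", null, "house"));
    }
}
```

With this layout the language restriction is just a choice of field name at query time, at the cost of one field per base field and language.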

    This may turn out to be too clumsy if you have many fields times many
    languages.

    Actually, this looks a lot like what Solr could provide, perhaps with
    dynamic fields and the dismax query parser.

    Best
    Erick
    On Thu, Jul 8, 2010 at 4:47 AM, Bernhard Haslhofer wrote:

    Hi,

    in my application I have documents that may contain terms and term
    translations in multiple languages. The language tag of each term is
    explicitly given and should be available in the index in order to enable
    queries for documents that contain a certain term (optionally in a given
    language).

    I could split each document into a set of sub-documents, each containing
    terms in one specific language plus a dedicated field indicating the
    language. But then I need multiple queries to retrieve the stored term
    translations from the sub-documents.

    The better alternative, IMO, is not to split the document and instead to
    attach the language tags to the terms as payloads. But then I need

    (i) a search filter that eliminates docs based on a given language tag and

    (ii) a way to access the term payloads from the documents returned by the
    searcher

    For both I haven't found a solution yet. Can I write a custom PayloadFilter
    or is there already some implementation available? Is it possible to access
    the term payloads from the search results?

    Thanks.
    Bernhard

Discussion Overview
group: java-user
categories: lucene
posted: Jul 8, '10 at 8:47a
active: Jul 8, '10 at 2:30p
posts: 2
users: 2
website: lucene.apache.org
