FAQ
Hello there! We are indexing metadata for our media. One idea is that each
user adds their own metadata, so each document may have a different
number/name/type of fields. Is this OK in Lucene? I mean, is Lucene OK with
this relaxed approach?

Also, considering that each user may define their own metadata, we may have
many different types of fields. Is there a limit on this?

Regards

--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

  • Erick Erickson at Mar 12, 2010 at 1:44 pm
    There's no requirement that all documents have the same
    fields; Lucene is fine with different docs having different
    fields.

    There's no limit on the number of different fields allowed
    that I know of, but I'm sure someone will chime in if there
    is.

    HTH
    Erick
    On Fri, Mar 12, 2010 at 7:51 AM, Vinicius Carvalho wrote:

  • Uwe Schindler at Mar 12, 2010 at 1:51 pm
    You can run into memory problems if you turn on norms for all those fields, since norms are held as a large byte[] array per field. This is not a hard limit, but you should take care.
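
As a rough back-of-the-envelope illustration of the norms cost mentioned above, here is a small Python sketch. It assumes one byte per document for every field that has norms enabled, allocated for every document in the index whether or not that document has the field; treat that as an assumption to check against your Lucene version. The function name and the sample numbers are made up for illustration.

```python
def norms_memory_bytes(num_docs: int, fields_with_norms: int,
                       bytes_per_norm: int = 1) -> int:
    """Estimate heap used by norms.

    Assumption (not taken from the thread): each field with norms needs an
    array with one entry per document in the index.
    """
    return num_docs * fields_with_norms * bytes_per_norm


if __name__ == "__main__":
    # Hypothetical index of 5 million documents with a growing number of
    # user-defined fields that all keep norms.
    for n_fields in (10, 1_000, 10_000):
        mib = norms_memory_bytes(5_000_000, n_fields) / (1024 ** 2)
        print(f"{n_fields:>6} fields with norms -> {mib:,.0f} MiB")
```

Norms only carry length normalization and index-time boosts, so omitting them on metadata fields that need neither avoids this cost.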

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: Erick Erickson
    Sent: Friday, March 12, 2010 2:43 PM
    To: java-user@lucene.apache.org
    Subject: Re: Question on number of fields in a document

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Renaud Delbru at Mar 12, 2010 at 2:46 pm
    There is a bottleneck when you have a large number of fields and of
    terms. Each field has its own list of terms, which means that the
    dictionary, in the worst case, could be of size n*m (with n the number
    of fields and m the number of terms).
    This can add overhead when looking up a term if n and m are large
    (a term lookup occurs for each keyword in a query).
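
The n*m growth described above can be seen in a toy model. The sketch below is not Lucene's actual dictionary format, only a plain Python mapping keyed by (field, term) pairs; the function name and the sample documents are invented for illustration.

```python
from collections import defaultdict


def build_term_dictionary(docs):
    """Map (field, term) -> set of doc ids; one entry per field/term pair."""
    dictionary = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, text in doc.items():
            for term in text.lower().split():
                dictionary[(field, term)].add(doc_id)
    return dictionary


# Two documents with completely different user-defined fields.
docs = [
    {"title": "red car", "owner": "red"},
    {"color": "red", "model": "car"},
]
term_dict = build_term_dictionary(docs)
fields = {field for field, _ in term_dict}
terms = {term for _, term in term_dict}

print(len(term_dict), "entries;", len(fields), "fields x", len(terms),
      "terms =", len(fields) * len(terms), "entries in the worst case")
```

The same word appearing under several field names produces one dictionary entry per field, which is why many user-defined fields inflate the dictionary even when the vocabulary itself is small.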

    Another problem (for the end user) with using an arbitrary number of
    fields is that the user has to know exactly which field names to
    query. By default, Lucene cannot search efficiently across an arbitrary
    number of fields unless you create a "content" field in which you
    index the values from all the fields. This duplicates the data
    inside the index (indexing the same data twice is cheap, but
    it can be problematic for a very large index).
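
The catch-all "content" field idea can be sketched as follows. This is a toy Python model rather than Lucene code: documents are plain dicts, the helper names are hypothetical, and it assumes no user-defined field is itself called "content".

```python
def with_catch_all(doc, catch_all="content"):
    """Return a copy of doc with one extra field holding every original value."""
    combined = " ".join(str(value) for value in doc.values())
    return {**doc, catch_all: combined}


def search(docs, term, field="content"):
    """Return ids of documents whose `field` contains `term` as a whole token."""
    term = term.lower()
    return [
        doc_id
        for doc_id, doc in enumerate(docs)
        if term in str(doc.get(field, "")).lower().split()
    ]


# Two documents whose users chose entirely different field names.
raw_docs = [
    {"camera": "Nikon D90", "location": "Lisbon"},
    {"artist": "Lisbon Trio", "genre": "jazz"},
]
indexed = [with_catch_all(doc) for doc in raw_docs]

# A query against the catch-all field needs no knowledge of field names.
hits = search(indexed, "lisbon")
print(hits)
```

The trade-off is the one noted above: every value is stored twice, once under its own field and once under the shared field.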

    We recently released a plugin for Lucene (SIREn [1]) that tackles
    this particular problem. It was initially developed to create a
    search engine for RDF data (a standard model for data interchange on the
    web). It lets you index an arbitrary number of fields without facing
    the two previous problems, while keeping web-scale performance. In
    addition, it supports keyword search on the field names and offers better
    support for multi-valued fields.

    I think the best approach is to give it a try: run a benchmark using
    Lucene and SIREn and see which one better meets your needs (in terms of
    response time and search capabilities). If your index stays relatively
    small (a few thousand or maybe millions of documents), then Lucene may be
    a good choice, but if you expect to have a large index (millions of
    documents) with an arbitrary number of fields (thousands or even tens of
    thousands), then SIREn may be more suitable.

    [1] http://siren.sindice.com/
    --
    Renaud Delbru

Discussion Overview
group: java-user
categories: lucene
posted: Mar 12, '10 at 12:53p
active: Mar 12, '10 at 2:46p
posts: 4
users: 4
website: lucene.apache.org
