FAQ
I am new to lucene. Here is my question. The document has fields. When I add
a field to the document I can specify that field is Indexed, Tokenized,
etc.. So the same field can be Tokenized in one document and be
not-tokenized in another document. However the is a method
IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that returns all
index fields in the index. It seems like it assumes that FIELD should have
the same attributes across all documents.
Can anyoen explain it?

--
View this message in context: http://www.nabble.com/Field-Question-tp19108787p19108787.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • John Griffin at Aug 22, 2008 at 10:17 pm
    Dimitri,

    Field.TOKENIZED and Field.NO_NORMs send their field's contents
    through a tokenizer and make their contents indexed and therefore
    searchable. FIELD.UN_TOKENIZED does not send its field's contents through a
    tokenizer but it still indexes its contents. Only Field.NO does not index
    its field's contents.

    So, regardless of whether or not a particular document has a field
    tokenized while another document does not has nothing to do with whether or
    not the contents are indexed. As long as it's not Field.NO, they are.

    John G.

    -----Original Message-----
    From: DimitriD
    Sent: Friday, August 22, 2008 9:14 AM
    To: java-user@lucene.apache.org
    Subject: Field Question


    I am new to lucene. Here is my question. The document has fields. When I add
    a field to the document I can specify that field is Indexed, Tokenized,
    etc.. So the same field can be Tokenized in one document and be
    not-tokenized in another document. However the is a method
    IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that returns all
    index fields in the index. It seems like it assumes that FIELD should have
    the same attributes across all documents.
    Can anyoen explain it?

    --
    View this message in context:
    http://www.nabble.com/Field-Question-tp19108787p19108787.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at Aug 22, 2008 at 11:41 pm
    Actually, Field.NO_NORMS means Field.UN_TOKENIZED plus
    Field.setOmitNorms(true).

    Mike

    John Griffin wrote:
    Dimitri,

    Field.TOKENIZED and Field.NO_NORMs send their field's contents
    through a tokenizer and make their contents indexed and therefore
    searchable. FIELD.UN_TOKENIZED does not send its field's contents
    through a
    tokenizer but it still indexes its contents. Only Field.NO does not
    index
    its field's contents.

    So, regardless of whether or not a particular document has a field
    tokenized while another document does not has nothing to do with
    whether or
    not the contents are indexed. As long as it's not Field.NO, they are.

    John G.

    -----Original Message-----
    From: DimitriD
    Sent: Friday, August 22, 2008 9:14 AM
    To: java-user@lucene.apache.org
    Subject: Field Question


    I am new to lucene. Here is my question. The document has fields.
    When I add
    a field to the document I can specify that field is Indexed,
    Tokenized,
    etc.. So the same field can be Tokenized in one document and be
    not-tokenized in another document. However the is a method
    IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that
    returns all
    index fields in the index. It seems like it assumes that FIELD
    should have
    the same attributes across all documents.
    Can anyoen explain it?

    --
    View this message in context:
    http://www.nabble.com/Field-Question-tp19108787p19108787.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • John Griffin at Aug 23, 2008 at 7:51 pm
    Thanks Mike. I stand corrected.

    John G.

    -----Original Message-----
    From: Michael McCandless
    Sent: Friday, August 22, 2008 5:40 PM
    To: java-user@lucene.apache.org
    Subject: Re: Field Question


    Actually, Field.NO_NORMS means Field.UN_TOKENIZED plus
    Field.setOmitNorms(true).

    Mike

    John Griffin wrote:
    Dimitri,

    Field.TOKENIZED and Field.NO_NORMs send their field's contents
    through a tokenizer and make their contents indexed and therefore
    searchable. FIELD.UN_TOKENIZED does not send its field's contents
    through a
    tokenizer but it still indexes its contents. Only Field.NO does not
    index
    its field's contents.

    So, regardless of whether or not a particular document has a field
    tokenized while another document does not has nothing to do with
    whether or
    not the contents are indexed. As long as it's not Field.NO, they are.

    John G.

    -----Original Message-----
    From: DimitriD
    Sent: Friday, August 22, 2008 9:14 AM
    To: java-user@lucene.apache.org
    Subject: Field Question


    I am new to lucene. Here is my question. The document has fields.
    When I add
    a field to the document I can specify that field is Indexed,
    Tokenized,
    etc.. So the same field can be Tokenized in one document and be
    not-tokenized in another document. However the is a method
    IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that
    returns all
    index fields in the index. It seems like it assumes that FIELD
    should have
    the same attributes across all documents.
    Can anyoen explain it?

    --
    View this message in context:
    http://www.nabble.com/Field-Question-tp19108787p19108787.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • DimitriD at Aug 25, 2008 at 1:03 am
    John,

    Thanks for answering.
    So if I have a document "A "with field "foo" and the field has attribute
    Field.NO and
    I have document "B" with field "foo" and the field is Field.TOKENIZED.
    Will IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) return field
    "foo"?

    Thanks,

    Dimitri


    John Griffin-3 wrote:
    Dimitri,

    Field.TOKENIZED and Field.NO_NORMs send their field's contents
    through a tokenizer and make their contents indexed and therefore
    searchable. FIELD.UN_TOKENIZED does not send its field's contents through
    a
    tokenizer but it still indexes its contents. Only Field.NO does not index
    its field's contents.

    So, regardless of whether or not a particular document has a field
    tokenized while another document does not has nothing to do with whether
    or
    not the contents are indexed. As long as it's not Field.NO, they are.

    John G.

    -----Original Message-----
    From: DimitriD
    Sent: Friday, August 22, 2008 9:14 AM
    To: java-user@lucene.apache.org
    Subject: Field Question


    I am new to lucene. Here is my question. The document has fields. When I
    add
    a field to the document I can specify that field is Indexed, Tokenized,
    etc.. So the same field can be Tokenized in one document and be
    not-tokenized in another document. However the is a method
    IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that returns
    all
    index fields in the index. It seems like it assumes that FIELD should have
    the same attributes across all documents.
    Can anyoen explain it?

    --
    View this message in context:
    http://www.nabble.com/Field-Question-tp19108787p19108787.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    --
    View this message in context: http://www.nabble.com/RE%3A-Field-Question-tp19116394p19136508.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at Aug 25, 2008 at 9:23 am
    I think you meant Field.Index.NO and Field.Index.TOKENIZED, for those
    two docs.

    The answer is yes -- Lucene considers the field "indexed" if ever any
    doc, even a single doc, had set Index.TOKENIZED or Index.UN_TOKENIZED
    for that field.

    However, your document A still will not have been indexed.

    Lucene has no global schema; it allows individual docs to have
    whatever fields, with whatever options, they want while indexing.
    However, when documents are indexed together into a segment, some
    properties, eg isIndexed, omitNorms, storeTermVectors, etc, are stored
    globally in the FieldInfos (*.fnm files) and thus must be "merged".
    EG, with omitNorms, only if every doc in the segment called omitNorms
    for field "foo" will norms actually not be stored.

    Mike

    DimitriD wrote:
    John,

    Thanks for answering.
    So if I have a document "A "with field "foo" and the field has
    attribute
    Field.NO and
    I have document "B" with field "foo" and the field is Field.TOKENIZED.
    Will IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED)
    return field
    "foo"?

    Thanks,

    Dimitri


    John Griffin-3 wrote:
    Dimitri,

    Field.TOKENIZED and Field.NO_NORMs send their field's contents
    through a tokenizer and make their contents indexed and therefore
    searchable. FIELD.UN_TOKENIZED does not send its field's contents
    through
    a
    tokenizer but it still indexes its contents. Only Field.NO does not
    index
    its field's contents.

    So, regardless of whether or not a particular document has a field
    tokenized while another document does not has nothing to do with
    whether
    or
    not the contents are indexed. As long as it's not Field.NO, they are.

    John G.

    -----Original Message-----
    From: DimitriD
    Sent: Friday, August 22, 2008 9:14 AM
    To: java-user@lucene.apache.org
    Subject: Field Question


    I am new to lucene. Here is my question. The document has fields.
    When I
    add
    a field to the document I can specify that field is Indexed,
    Tokenized,
    etc.. So the same field can be Tokenized in one document and be
    not-tokenized in another document. However the is a method
    IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED) that
    returns
    all
    index fields in the index. It seems like it assumes that FIELD
    should have
    the same attributes across all documents.
    Can anyoen explain it?

    --
    View this message in context:
    http://www.nabble.com/Field-Question-tp19108787p19108787.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    --
    View this message in context: http://www.nabble.com/RE%3A-Field-Question-tp19116394p19136508.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedAug 22, '08 at 3:17p
activeAug 25, '08 at 9:23a
posts6
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase