FAQ
Is there an easy way to retrieve a collection of fields (or field names) that are analyzed/tokenized from any given index?

Jordon
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erick Erickson at Dec 23, 2010 at 9:31 pm
    Have you looked at IndexReader.getFieldNames()?

    Best
    Erick
  • Jordon Saardchit at Dec 23, 2010 at 9:44 pm
    Yes I have, and after testing each of the options in IndexReader.FieldOption, I cannot retrieve the names of fields that are indexed (analyzed) and unstored. I figured this would be relatively easy to do and that I was simply overlooking something. Is it perhaps not possible?

    Jordon
  • Erick Erickson at Dec 23, 2010 at 10:51 pm
    Ah, you didn't mention indexed but unstored in your original message,
    just indexed/analyzed....

    I don't think you can (someone jump in here if I'm wrong, please). The
    problem is that Lucene doesn't require any sort of schema, so you are
    perfectly free to store a field in one document and NOT store it in
    another. All the variants specified in IndexReader.FieldOption can quickly
    be determined by just looking at the various index files. But you'd have
    to spin through all the #documents# in order to answer the question "is
    this field ever stored?". Sounds like a table scan in the DB world.

    I don't think Lucene keeps metadata for this, and spinning through all
    the documents would be expensive...

    Why do you want to know? Perhaps there's another way to satisfy the
    use case.

    I could be way off base here, I'm speaking from general principles, not
    knowledge of the code...

    Best
    Erick
  • Jordon Saardchit at Dec 23, 2010 at 11:33 pm
    The basic use case is determining the rules for building a query. I've got an application that programmatically builds queries (without any pre-existing knowledge of the contents of the index it is searching). We have a custom-designed analyzer and filter chain. However, it is applied only to certain fields at index time, and the fields it is applied to are unstored.

    On the search side, I want to be able to determine at runtime which field the analyzer should be applied to, and which field not to. I could be approaching the solution incorrectly, but I figured this would be a pretty common or natural use case.

    Jordon
  • Erick Erickson at Dec 24, 2010 at 2:00 am
    I guess I'm missing the point. The fact that it is stored is irrelevant
    for searching. Stored fields really only govern whether
    Document.getField("fieldname") returns anything #after# the search. You
    can find out if a field is stored-only by asking
    IndexReader.getFieldNames for UNINDEXED, and you can search on anything
    that is INDEXED.

    So if, say, you're creating a drop-down with a selection of fields to
    choose from, you should be able to get the list by looking for INDEXED.

    But somewhere you've got to ensure that the analyzers used at index time
    are identical to or compatible with those used at query time. If all
    you're concerned with is building up a string like
    "+text:stuff +title:nonsense" and handing that off to the app that knows
    how the index was built (so it can use the right analyzers for the text
    and title fields when parsing the input), looking for INDEXED should be
    fine.

    If you're #only# using your custom analyzer for searchable fields, it's
    fine because any INDEXED field can use your custom analyzer.

    But if you use different analyzers for different searchable fields,
    there's no way I know of to analyze an index and answer the question
    "what analyzer was this field created with?". That knowledge is built
    a priori into the app as far as I know.


    Best
    Erick
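The point above about index-time and query-time analysis having to agree can be illustrated with a small model. This is only a sketch in plain Java, not Lucene code: `analyze` and `keyword` are hypothetical stand-ins for an analyzed and a non-analyzed field, and the toy analyzer (lowercase, split on whitespace) is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchDemo {

    /** Toy "analyzer" (assumption): lowercases the text and splits it on whitespace. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Toy "not analyzed" path: the whole value becomes a single unchanged term. */
    public static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    /** True if every query term is present among the indexed terms. */
    public static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        Set<String> index = new HashSet<>(indexedTerms);
        return index.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String value = "Apache Lucene";

        // Field analyzed at index time.
        List<String> analyzedIndex = analyze(value);
        System.out.println(matches(analyzedIndex, analyze("LUCENE"))); // true
        System.out.println(matches(analyzedIndex, keyword("LUCENE"))); // false

        // Field indexed as a single untouched term.
        List<String> keywordIndex = keyword(value);
        System.out.println(matches(keywordIndex, keyword("Apache Lucene"))); // true
        System.out.println(matches(keywordIndex, analyze("Apache Lucene"))); // false
    }
}
```

In this model a query only matches when its terms were produced the same way as the indexed terms, which is why the search side needs to know how each field was treated at index time.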

  • Jordon Saardchit at Dec 24, 2010 at 6:16 pm
    Heh, yes, all stuff I know. My question was whether an index contains any metadata that reveals whether or not a given indexed field has been analyzed, which I think you are saying it does not.

    Our searching and indexing are isolated into two completely separate packages which can be deployed independently of each other. The only common dependency (obviously) is the index itself. That being said, I was trying to determine from the search runtime whether a given fieldname/input pair should be analyzed when building the query, without having any knowledge of how the index was created.

    Jordon
  • Erick Erickson at Dec 24, 2010 at 8:18 pm
    Well, not to my knowledge. In fact there's no guarantee that the #same#
    index has the #same# analyzer used on the #same# field in different
    documents, so I don't see how there could be a robust implementation of
    what you want.

    You could populate a field with a particular analyzer (or none at all),
    close your writer and open another with any other random analyzer (or
    none at all) for the same field and Lucene wouldn't complain.

    Solr handles this with the schema file. I guess you could abstract the
    field definitions into a library and use the library in both apps, but
    otherwise the apps have to "just know".

    Best
    Erick
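The suggestion of abstracting the field definitions into a library shared by both apps might look roughly like the following. This is a minimal sketch in plain Java under stated assumptions: the class name `FieldDefinitions`, the `Treatment` enum, and the `id` field are hypothetical (only `text` and `title` appear in the thread), and nothing here is Lucene or Solr API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Shared by the indexing package and the search package (hypothetical design). */
public final class FieldDefinitions {

    /** How a field's text is treated at both index time and query time. */
    public enum Treatment { ANALYZED, NOT_ANALYZED }

    private static final Map<String, Treatment> FIELDS;

    static {
        Map<String, Treatment> m = new LinkedHashMap<>();
        m.put("title", Treatment.ANALYZED);
        m.put("text", Treatment.ANALYZED);
        m.put("id", Treatment.NOT_ANALYZED); // hypothetical field, for illustration only
        FIELDS = Collections.unmodifiableMap(m);
    }

    private FieldDefinitions() {
        // Static registry; not meant to be instantiated.
    }

    /** Returns the declared treatment, or throws if the field was never declared. */
    public static Treatment treatmentOf(String fieldName) {
        Treatment t = FIELDS.get(fieldName);
        if (t == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        return t;
    }

    /** Convenience check for the query builder. */
    public static boolean isAnalyzed(String fieldName) {
        return treatmentOf(fieldName) == Treatment.ANALYZED;
    }

    /** Read-only view of all declared fields, in declaration order. */
    public static Map<String, Treatment> all() {
        return FIELDS;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Treatment> e : all().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With something like this, the search side would call `FieldDefinitions.isAnalyzed(fieldName)` before deciding whether to run the custom analyzer on the input, and the indexing side would consult the same table, so the two packages stay independently deployable while sharing one small dependency. Failing loudly on an undeclared field is a design choice here; a default treatment would also be possible.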

Discussion Overview
group: java-user
categories: lucene
posted: Dec 23, '10 at 8:24p
active: Dec 24, '10 at 8:18p
posts: 8
users: 2
website: lucene.apache.org
