FAQ
Is it thread-safe to parse queries with a QueryParser when every call shares
the same single Analyzer instance?
Or should I create a new Analyzer each time?



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Malcolm at Nov 1, 2005 at 4:02 pm
    Hi,
    I've been reading my new project bible 'Lucene in Action' about Analysis in
    Chapter 4 and wondered what others are doing for indexing XML (if anyone
    else is, that is!).
    Are you folks just writing your own or utilising the current Lucene analysis
    libraries?
    thanks,
    Malcolm Clark


  • Erik Hatcher at Nov 1, 2005 at 8:54 pm

    On 1 Nov 2005, at 11:02, Malcolm wrote:

    > Hi,
    > I've been reading my new project bible 'Lucene in Action'

    Amen! ;)

    > about Analysis in Chapter 4 and wondered what others are doing for
    > indexing XML(if anyone else is, that is!).
    > Are you folks just writing your own or utilising the current Lucene
    > analysis libraries?

    Analyzers are at a per-field granularity, and more than likely your
    XML data contains what you would want treated as multiple fields. So
    while an analyzer _could_ directly deal with XML, it really is
    unlikely to be the appropriate layer to do so. The majority of
    scenarios would have XML parsed separately and then the individual
    separated text fed to Lucene fields for analysis.

    Erik
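
    As a rough illustration of the approach Erik describes (parse the XML
    first, then hand each piece of text to its own field), here is a minimal
    sketch that uses only the JDK's DOM parser. The class name
    XmlFieldExtractor, the method extractFields, and the idea of treating each
    child of the root element as one field are assumptions made for this
    example. The Lucene step itself is left out, since it depends on the
    Lucene version in use.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns the direct children of the XML root element
// into (field name -> text) pairs that could later be indexed per field.
final class XmlFieldExtractor {

    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // LinkedHashMap keeps the fields in document order.
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments between elements.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                String text = child.getTextContent().trim();
                // Repeated elements are joined into one field value.
                fields.merge(child.getNodeName(), text, (a, b) -> a + " " + b);
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

    Each entry of the returned map would then be added to a Lucene document
    as a separate field, so that the analyzer only ever sees plain text and
    never the markup.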


  • Peter Kim at Nov 1, 2005 at 4:10 pm
    Not exactly sure what you're asking with regard to Analyzers and
    parsing XML...

    But for parsing and indexing XML documents with Lucene, you can find a
    lot of material out there by searching the list archives and using
    Google. However, the document I found most helpful was this piece
    written by Otis:

    http://www-128.ibm.com/developerworks/java/library/j-lucene/

    I primarily used that to get me started with using Digester and Lucene
    together to index XML documents. You can download Apache Commons
    Digester from here:

    http://jakarta.apache.org/commons/digester/

    Peter
  • Malcolm at Nov 1, 2005 at 4:19 pm
    Hi,
    I'm just asking for opinions on Analyzers for the indexing. For example,
    Otis in his article uses the WhitespaceAnalyzer and the Sandbox program uses
    the StandardAnalyzer. I am just gauging opinions on the subject with regard
    to XML.
    I'm using a mix of the Sandbox XMLDocumentHandlerSAX and a bit extra. I
    originally started using Digester but found that I preferred the Sandbox
    implementation.
    Thanks,
    Malcolm Clark


  • Grant Ingersoll at Nov 1, 2005 at 4:39 pm
    Not sure I am understanding your question correctly, but I think you
    want to pick your Analyzer based on what is in your content (i.e.
    language, usage of special symbols, etc.), not based on what the format
    of your content is (i.e. XML).

    --
    -------------------------------------------------------------------
    Grant Ingersoll
    Sr. Software Engineer
    Center for Natural Language Processing
    Syracuse University
    School of Information Studies
    337 Hinds Hall
    Syracuse, NY 13244

    http://www.cnlp.org
    Voice: 315-443-5484
    Fax: 315-443-6886


  • Peter Kim at Nov 1, 2005 at 4:33 pm
    Ok... just got confused because you mentioned XML. Unless you're
    actually indexing the raw XML in some of your fields, the fact that
    you're indexing XML documents as your source content is irrelevant to
    your choice of Analyzer.

    Choice of analyzer really depends on your specific project requirements
    and what level of querying functionality your client needs. For example,
    I started off using the StandardAnalyzer because it incorporates some
    very useful and sophisticated functionality. But I found that it was
    removing many stop words that my client requested the ability to query
    with, so I will end up using my own custom analyzer class primarily
    based on the StandardAnalyzer but modifying the stop word list.
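
    To make the stop-word point concrete, here is a small plain-Java sketch.
    It is not Lucene's API and does not reproduce what StandardAnalyzer does;
    it is a stand-in that lowercases text, splits on non-alphanumeric
    characters, and drops whatever is in the supplied stop set. The class name
    StopWordSketch, the method tokenize, and both stop lists in the usage note
    are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer with a configurable stop-word list.
final class StopWordSketch {

    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Lowercase first, then split on any run of non-letter, non-digit characters.
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String part : parts) {
            // split() can yield an empty first element when the text starts
            // with a delimiter, so empty strings are skipped as well.
            if (!part.isEmpty() && !stopWords.contains(part)) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

    With a stop set containing "to", "be", "or" and "not", the phrase
    "To be or not to be" yields no tokens at all, so a query for it could
    never match. With a trimmed stop set containing only "or", the same
    phrase keeps "to", "be", "not", "to", "be", which is the kind of change
    a modified stop list is meant to achieve.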


  • Malcolm at Nov 1, 2005 at 4:44 pm
    I'm currently indexing the INEX collection and then performing queries on
    the Format features within the text. I am using a wide range of the XML
    features. The reason I asked about XML analysis is that I am interested in
    opinions and the reasons behind them, so that I can add a wide range of
    discussion to my dissertation regarding Lucene.
    Thanks,
    Malcolm



Discussion Overview
group: java-user
category: lucene
posted: Nov 1, '05 at 3:46p
active: Nov 1, '05 at 8:54p
posts: 8
users: 5
website: lucene.apache.org
