FAQ
Hi,

I have three questions about indexing:

1) I am indexing HTML documents. How can I remove stop
words before indexing? I don't want to index them.

2) I can access the terms in a document, but how can I
get the name of the document in which those terms
appear?

3) I want to find phrases at the index level, e.g. the
frequency of a phrase in the whole collection and its
frequency in each document. How can I do this in Lucene?
Is there any sample code?

Thanks





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Daniel Noll at Mar 23, 2007 at 5:14 am

    Maryam wrote:
    > Hi,
    >
    > I have three questions about indexing:
    >
    > 1) I am indexing HTML documents. How can I remove stop
    > words before indexing? I don't want to index them.

    The same way you would do it for indexing text documents: StopFilter.
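
To make the idea concrete, here is a minimal, dependency-free sketch of what stop-word filtering does to HTML input before terms reach the index. It is not Lucene code: the class name `StopWordSketch`, the tiny stop list, and the regex-based tag stripping are all assumptions made for illustration. In Lucene itself the equivalent step is a `StopFilter` wrapped around the tokenizer inside an `Analyzer`, and its constructor arguments differ between versions, so check the Javadoc for the release you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: mimics the effect of a stop filter, not Lucene's StopFilter class.
final class StopWordSketch {
    // Hypothetical, deliberately tiny stop list; real lists are much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    // Crude tag stripping for the example; a real HTML parser is safer in practice.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Returns the lower-cased tokens that would be indexed, with stop words dropped.
    static List<String> tokensToIndex(String html) {
        List<String> kept = new ArrayList<>();
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For example, `StopWordSketch.tokensToIndex("<p>The history of <b>Lucene</b> is short</p>")` keeps only `history`, `lucene` and `short`.
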
    > 2) I can access the terms in a document, but how can I
    > get the name of the document in which those terms
    > appear?

    The usual way to do this is to store the document name as another field.
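
The "store the name as another field" idea can be sketched without any library: the postings for a term only give document ids, and the name is read back from the fields stored alongside each id. The class `DocNameSketch`, its method names, and the field key `"name"` are hypothetical and chosen for this example; in Lucene the corresponding step is adding a stored field to each `Document` and reading that field from the document retrieved by id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: a toy index that keeps a stored "name" field per document.
final class DocNameSketch {
    // Position in this list is the document id; the map holds its stored fields.
    private final List<Map<String, String>> storedFields = new ArrayList<>();
    // term -> ids of the documents containing it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Adds a document and stores its name as a retrievable field.
    int addDocument(String name, List<String> terms) {
        int docId = storedFields.size();
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        storedFields.add(fields);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Resolves a term to document ids, then each id to its stored name.
    List<String> documentNamesFor(String term) {
        List<String> names = new ArrayList<>();
        SortedSet<Integer> ids = postings.getOrDefault(term, Collections.emptySortedSet());
        for (int docId : ids) {
            names.add(storedFields.get(docId).get("name"));
        }
        return names;
    }
}
```

After adding `a.html` with terms `lucene, index` and `b.html` with terms `lucene, search`, a lookup of `lucene` returns both names in insertion order, and a term that was never indexed returns an empty list.
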

    Daniel





Discussion Overview
group: java-user
categories: lucene
posted: Mar 23, '07 at 3:14 am
active: Mar 23, '07 at 5:14 am
posts: 2
users: 2
website: lucene.apache.org
