Hey everyone, I'm trying to figure out the best way to get Lucene to detect
concatenated words in a body of copy or a URL.

I've got a few scenarios I'm trying to handle. In source code and URLs,
several words are often concatenated to create a meaningful string, e.g.
UserRegistrationService.cs and www.worldbestwebsites.com. I would like to
index these as "User Registration Service cs" and "www world best websites
com", etc. I'm not expecting an easy answer, but would like to know how the
community at large is dealing with these types of scenarios.

Thanks,

Thomas
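
To make the two scenarios in the question concrete, here is a minimal, language-neutral sketch of the pre-processing being asked for, written in Python rather than against the Lucene API. It is not how Lucene or Lucene.Net does this. The function names (`split_camel_case`, `segment`, `index_terms`) and the tiny `VOCAB` word list are hypothetical, chosen only for illustration. CamelCase boundaries can be found with a regular expression, but an all-lowercase run such as "worldbestwebsites" carries no boundary markers, so the sketch assumes a word list is available and picks the split that uses the fewest words.

```python
import re

# Hypothetical vocabulary for illustration only; real text needs a far larger word list.
VOCAB = {"www", "world", "best", "web", "websites", "site", "sites", "com"}


def split_camel_case(identifier):
    """Split on non-alphanumeric characters, then on case and digit boundaries."""
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Acronym run | optionally capitalised lowercase word | digit run
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return parts


def segment(text, vocab=VOCAB):
    """Return the split of text into vocabulary words that uses the fewest words,
    or None if no complete split exists."""
    text = text.lower()
    best = [None] * (len(text) + 1)  # best[i] holds the best split of text[:i]
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def index_terms(raw):
    """Combine both steps; parts that cannot be segmented are kept whole."""
    terms = []
    for part in split_camel_case(raw):
        words = segment(part) if part.islower() else None
        terms.extend(words if words else [part])
    return terms


print(index_terms("UserRegistrationService.cs"))
print(index_terms("www.worldbestwebsites.com"))
```

The dictionary step is the hard part in practice: the result depends entirely on the word list, and ambiguous strings can have several valid splits, which is one reason a substring-style search may be the simpler route.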


  • Digy digy at Apr 4, 2011 at 6:48 am
    Instead of splitting the token into meaningful words, you may want to try
    the SingleCharTokenAnalyzer in contrib, which allows %text% (substring)
    searches:
    http://svn.apache.org/viewvc/incubator/lucene.net/trunk/C%23/contrib/Contrib.Net/Contrib.Net/Analysis/Ext/Analysis.Ext.cs

    Given "www.worldbestwebsites.com", you can search for "world", "best",
    "web", "website", "bestweb", or even "stwebsi" :)

    DIGY
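
The substring behaviour described in this reply can be illustrated with a small Python sketch. This is not the SingleCharTokenAnalyzer source. It assumes the analyzer emits one token per letter or digit, so that a %text% search becomes a match on a contiguous run of single-character tokens. The helper names `char_tokens` and `contains_run` are hypothetical, and treating punctuation as a gap that a match cannot span is also an assumption.

```python
def char_tokens(text):
    """One lowercase token per letter or digit; None marks a separator gap."""
    return [c.lower() if c.isalnum() else None for c in text]


def contains_run(tokens, query):
    """True if the query's characters occur as a contiguous run of tokens."""
    wanted = [c for c in char_tokens(query) if c is not None]
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )


indexed = char_tokens("www.worldbestwebsites.com")
for query in ["world", "best", "web", "website", "bestweb", "stwebsi", "worldcom"]:
    print(query, contains_run(indexed, query))
```

The trade-off in this sketch is that no word list is needed, but every character becomes a token, so the approach is better suited to short fields such as file names and URLs than to long bodies of text.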
    (In reply to Thomas Rankin's message of Mon, Apr 4, 2011 at 7:15 AM, shown in full above.)

Discussion Overview
group: lucene-net-user @
category: lucene
posted: Apr 4, '11 at 4:16a
active: Apr 4, '11 at 6:48a
posts: 2
users: 2
website: lucene.apache.org
