Thanks for the detailed response, Sujit. UIMA especially looks like an
interesting option.

On 3/24/11 3:57 PM, "Sujit Pal" wrote:

I don't know if there is already an analyzer available for this, but you
could use GATE or UIMA for Named Entity Extraction against names and
expand the query to include the extra names that are used synonymously.
You could do this outside Lucene or inline using a custom Lucene
tokenizer that embeds either a GATE or UIMA NER.
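
As a rough illustration of the "outside Lucene" option, the sketch below expands each query term into an OR group before the query string is handed to a query parser. It is plain Java with no Lucene dependency. The class name, method names, and the tiny variant table are all hypothetical, and it assumes the index lowercases tokens and that the parser accepts the classic `(a OR b)` syntax.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: expands short first names into OR groups.
class NicknameQueryExpander {
    // Sample data only; a real table would be loaded from a name-synonym list.
    private static final Map<String, List<String>> VARIANTS = Map.of(
        "dan", List.of("daniel"),
        "will", List.of("william"),
        "bill", List.of("william"),
        "bob", List.of("robert"),
        "tom", List.of("thomas"));

    // Returns the lowercased term, or "(term OR variant ...)" if variants are known.
    static String expandTerm(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        List<String> extra = VARIANTS.get(key);
        if (extra == null) {
            return key;
        }
        Set<String> all = new LinkedHashSet<>();
        all.add(key);
        all.addAll(extra);
        return "(" + String.join(" OR ", all) + ")";
    }

    // Splits the query on whitespace and expands each term independently.
    static String expandQuery(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expandTerm(term));
        }
        return out.toString();
    }
}
```

With this table, `NicknameQueryExpander.expandQuery("Dan Smith")` yields `(dan OR daniel) smith`. Doing the expansion at query time keeps the index unchanged, at the cost of slightly larger queries.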

If you go the custom route (and you are not familiar with GATE or UIMA),
you may want to take a look at Dr Manu Konchady's book on LingPipe,
Lucene and GATE. It includes code to embed a GATE NER into a Lucene
tokenizer (although it is not a streaming tokenizer, due to the nature
of the NER process). The process would be similar for embedding a
UIMA NER.

GATE (ANNIE) contains data files that list the common synonyms (e.g. Bill
== William, Bob == Robert, Tom == Thomas, etc.) which you can leverage
with GATE's JAPE rule language. Alternatively, you could use the same
data from UIMA using a custom analysis engine (I prefer this route
because it is all Java, with an easier learning curve and better
maintainability).
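
To make the synonym-data idea concrete, here is a minimal sketch of a symmetric name lookup built from pairs written in the "Bill == William" notation used above. That notation is only the shorthand from this message, not the actual format of the GATE data files, and the class and method names are hypothetical. A custom analysis engine or tokenizer could consult a table like this.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lookup table: every name in a group maps to the whole group.
class NameSynonymTable {
    private final Map<String, Set<String>> groups = new HashMap<>();

    // Accepts one "A == B" pair; lines in any other shape are ignored.
    void addLine(String line) {
        String[] parts = line.split("==");
        if (parts.length != 2) {
            return;
        }
        String a = parts[0].trim().toLowerCase(Locale.ROOT);
        String b = parts[1].trim().toLowerCase(Locale.ROOT);
        if (a.isEmpty() || b.isEmpty()) {
            return;
        }
        // Merge the groups both names already belong to, so that
        // "Bill == William" and "Will == William" end up in one group.
        Set<String> merged = new TreeSet<>();
        merged.addAll(groups.getOrDefault(a, Set.of(a)));
        merged.addAll(groups.getOrDefault(b, Set.of(b)));
        for (String name : merged) {
            groups.put(name, merged);
        }
    }

    // Returns all known variants of a name, including the name itself.
    Set<String> variantsOf(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        return Collections.unmodifiableSet(groups.getOrDefault(key, Set.of(key)));
    }
}
```

After adding "Bill == William" and "Will == William", a lookup for any of the three names returns the same group, so the mapping works in both directions.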

-sujit
On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
Hi,

I would like to build a search system where a search for "Dan" would
also search for "Daniel", and a search for "Will" would also search for
"William". Any ideas on how to go about implementing that? I can think
of writing a custom Analyzer that would map these partial tokens to
their full first or last names. But is there an Analyzer in the Lucene
contrib modules or elsewhere that does a similar job for me?

Thanks,
Deepak Konidena.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Discussion Overview
group: java-user
category: lucene
posted: Mar 24, '11 at 6:32p
active: Mar 25, '11 at 2:16p
posts: 5
users: 3
website: lucene.apache.org
