Ok, here is my problem. We have created a .NET implementation of Lucene using
the .NET Library for Lucene. We were previously only bringing in content
from the database and indexing it. We now want to keep that in place and use
Nutch or some web crawler to go out and get content from the list of urls in
our database.

Keep in mind here, I do not want different indexes for the web crawler
content, I want this to be in the same document as the database content that
it matches the url with. This would mean that I would probably want to take
the data that the web crawler gets back and store it in the relational
database. Then when I index the contents of the database, I would bring out
the content of the web crawler as well. Has anyone done this, if so, please
tell me in as much description how you did it. My goal here is to not have
to move to Nutch as the search engine and indexer, the goal is to use the
web crawler, gather its contents, then index it normally with Lucene.

I realize that Nutch is using Lucene's API, we want to keep in .NET, because
we have written additional things that we do not want to have to rewrite in

Thanks in advance,
View this message in context: http://www.nabble.com/Using-Nutch-database-with-Lucene-tf3185238.html#a8840535
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
postedFeb 7, '07 at 6:28a
activeFeb 7, '07 at 6:28a

1 user in discussion

Mmoser: 1 post



site design / logo © 2022 Grokbase