If it has an API that let's you get the content that needs to be
indexed, then, sure, you can index from the spider. If it doesn't
have an API, presumably, you would need to somehow extract the docs
from the files it builds. This is, of course, assuming it stores the
crawled files in some proprietary format. If it just downloads them
as the original files on your disk, then it is just like any indexing
problem of extracting the content from the files.
-Grant
On Jun 24, 2008, at 11:35 PM, yugana wrote:Hi Otis,
Thanks for the reply. So you mean it is not possible to use Lucene
to index
the fetched (Verity Spider Content) content.
Yug
Otis Gospodnetic wrote:
It sounds like you want to check out Nutch - fetched, indexer,
searcher,
etc. in one lovely package.
Otis
--
Sematext --
http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----From: yugana <yugandhar.adem@wipro.com>
To: java-user@lucene.apache.org
Sent: Tuesday, June 24, 2008 3:25:03 AM
Subject: Indexing the spider content
Hi All,
I am new to Lucene Search. Can you let me know if it is possible
to index
the "Verity Spider" content. If possible please let me know how to
create
a
index form it and search on it. Also share some code snippets on
how to
proceed on indexing and searching. I donot have much time to go
throught
the
Lucene API. Your help is appreciated.
Thanks,
Yug
--
View this message in context:
http://www.nabble.com/Indexing-the-spider-content-tp18085388p18085388.htmlSent from the Lucene - Java Users mailing list archive at
Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--
View this message in context:
http://www.nabble.com/Indexing-the-spider-content-tp18085388p18104348.htmlSent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--------------------------
Grant Ingersoll
http://www.lucidimagination.comLucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformancehttp://wiki.apache.org/lucene-java/LuceneFAQ---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org