Hello,
I'm using Lucene and Nutch, but I don't know which fields Nutch creates in
its documents. I wrote this Java program to list them:
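import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Open the index written by Nutch.
Directory dir = FSDirectory.open(new File("C:/Users/MyWebPage/index"));
IndexSearcher search = new IndexSearcher(dir);
int numberDoc = search.maxDoc();
System.out.println("number of doc " + numberDoc);
for (int i = 0; i < numberDoc; i++)
{
    System.out.println("Document numero " + i);
    Document doc = search.doc(i);
    // getFields() already returns every stored field, including repeated
    // names, so one loop is enough and nothing is printed twice.
    for (Fieldable f : doc.getFields())
    {
        System.out.println("\t" + f.name() + " " + f.stringValue());
    }
    System.out.println("*******************");
}
search.close();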
I get this result:
....
number of doc 1907
Document numero 0
title Convention and Visitors Office
segment 20110502142927
boost 0.13529637
digest d07c6f19b2efaa8739754e9e9ff75fcc
tstamp 20110502122931566
url http://ar.info.com/
.....
...
Document numero 90
title Who are we? - Presentation of the Paris Convention Bureau
segment 20110502144050
boost 0.0016601664
digest 62ee8c0ff6c2ab7c91599f3c3ff18735
tstamp 20110502125316832
url http://convention.info.com/en/about-us/
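The tstamp values look to me like a date and time written as
yyyyMMddHHmmssSSS. That is only my guess from the output above; I have not
confirmed it anywhere, and I don't know which time zone it would be in. Here
is a minimal sketch that decodes a value under that assumption (the class
name TstampSketch and the method parse are made up for illustration, they
are not part of Lucene or Nutch):

```java
import java.time.LocalDateTime;

public class TstampSketch {
    // Assumption: tstamp is exactly 17 digits laid out as yyyyMMddHHmmssSSS.
    static LocalDateTime parse(String tstamp) {
        if (tstamp.length() != 17) {
            throw new IllegalArgumentException("expected 17 digits: " + tstamp);
        }
        int year   = Integer.parseInt(tstamp.substring(0, 4));
        int month  = Integer.parseInt(tstamp.substring(4, 6));
        int day    = Integer.parseInt(tstamp.substring(6, 8));
        int hour   = Integer.parseInt(tstamp.substring(8, 10));
        int minute = Integer.parseInt(tstamp.substring(10, 12));
        int second = Integer.parseInt(tstamp.substring(12, 14));
        int millis = Integer.parseInt(tstamp.substring(14, 17));
        // LocalDateTime takes nanoseconds, so convert the milliseconds.
        return LocalDateTime.of(year, month, day, hour, minute, second,
                millis * 1_000_000);
    }

    public static void main(String[] args) {
        // Value copied from document 0 in the output above.
        System.out.println(parse("20110502122931566"));
    }
}
```

If the guess is right, this reads the tstamp of document 0 as 2 May 2011,
12:29:31. It says nothing about segment, boost or digest.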
My question is:
what are segment, boost, digest and tstamp, and how can I read them?
Thanks for your help.