I got lost.

Short version:
Is it possible to index a large number of files, run a query
for the word 'foo', look at *each* hit in the 10 best-ranked
files, and receive some meta information for each hit?

Extended version:
I have HTML-like files that I want to index, for example:

FileA:
<tag2 attr1=a attr2=b> foo </tag2>
<tag2 attr1=c attr2=d> bar </tag2>
<tag2 attr1=e attr2=e> foo </tag2>

FileB:
<tag2 attr1=a attr2=d> foo </tag2>
<tag2 attr1=c attr2=d> bar </tag2>

How can I build the index so that, if I search for 'foo',
Lucene returns all corresponding attributes for each hit,
but the ranking is calculated per file? For example:
FileA a b foo [site ranking value for FileA]
FileA e e foo [site ranking value for FileA]
FileB a d foo [site ranking value for FileB]
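
To make the desired result concrete, here is a small plain-Java sketch of the output I am after. It does not use Lucene at all: the class FileRankedHits, the Hit type, and the score (simply the number of matches per file) are names and assumptions made up for illustration, standing in for whatever ranking value Lucene would compute per file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the desired result; no Lucene calls are used here.
class FileRankedHits {

    // One occurrence of a word inside a <tag2 ...> element of a file.
    static final class Hit {
        final String file, attr1, attr2, word;

        Hit(String file, String attr1, String attr2, String word) {
            this.file = file;
            this.attr1 = attr1;
            this.attr2 = attr2;
            this.word = word;
        }
    }

    // The two example files from above, flattened into one list of hits.
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("FileA", "a", "b", "foo"));
        hits.add(new Hit("FileA", "c", "d", "bar"));
        hits.add(new Hit("FileA", "e", "e", "foo"));
        hits.add(new Hit("FileB", "a", "d", "foo"));
        hits.add(new Hit("FileB", "c", "d", "bar"));
        return hits;
    }

    // Returns one row per matching hit, ordered by a per-file score.
    // ASSUMPTION: the score is just the number of matches in the file,
    // a placeholder for a real per-file ranking value.
    static List<String> rankedRows(List<Hit> hits, String term, int maxFiles) {
        // Group the matching hits by file, keeping the original hit order.
        Map<String, List<Hit>> byFile = new LinkedHashMap<>();
        for (Hit h : hits) {
            if (h.word.equals(term)) {
                byFile.computeIfAbsent(h.file, k -> new ArrayList<>()).add(h);
            }
        }

        // Sort files by descending score; the sort is stable, so ties keep input order.
        List<Map.Entry<String, List<Hit>>> files = new ArrayList<>(byFile.entrySet());
        files.sort(Comparator
                .comparingInt((Map.Entry<String, List<Hit>> e) -> e.getValue().size())
                .reversed());

        // Emit every hit of the best maxFiles files, tagged with the file's score.
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<Hit>> e : files.subList(0, Math.min(maxFiles, files.size()))) {
            int score = e.getValue().size();
            for (Hit h : e.getValue()) {
                rows.add(h.file + " " + h.attr1 + " " + h.attr2 + " " + h.word + " [" + score + "]");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : rankedRows(sampleHits(), "foo", 10)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is the two levels: the order comes from one value per file, while the rows are one per hit and carry that hit's attributes.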

First, I tried creating one Document per file with two
fields: one field for the filename and one for the
tokenized file content:

doc.add(new Field("filename", "FileA"));
doc.add(new Field("content", new FileReader(FileA)));

With that, the ranking is fine (each file has its own
value), but how can I now find the specific hits of 'foo'
in the file, together with the corresponding attributes?

Second, I tried adding one Document for each word in the file:

doc.add(new Field("filename", "FileA"));
doc.add(new Field("attr1", "a"));
doc.add(new Field("attr2", "b"));
doc.add(new Field("content", "foo"));

Now a query returns each hit of 'foo' with the
corresponding attributes, but the ranking is
calculated per hit and not over the whole file.

What is the Lucene way to solve my problem?


Best regards,


Discussion Overview
group: java-user @
posted: Jan 18, '07 at 2:12p
active: Jan 18, '07 at 2:12p

1 user in discussion

Tomas Fischer: 1 post


