On 16 Apr 05, at 08:31, Pierrick Brihaye wrote:
How do I search for all the tokens of type "chem", such as H2O,
O2, etc.? Is there a sample that does this? If this approach doesn't
work, what's the best approach?
Nifty question... I'm working on indexing text with math formulae, so
there may be similarities!
You may assign a type to the tokens, and then you may filter them
according to their type *but* the index forgets this info since it
stores *terms* (field/value pairs). [...]
1) use a dedicated field "chem" where only chemical content is allowed
(filter out every token whose type is different from "chem")
2) manipulate your termText: "chem_H2"; do the same for your queries
3) play with the query rather than with the index content: filter out
what is not chemical
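
For illustration, options 1 and 2 can be sketched in plain Java without touching the Lucene API. Everything here is an assumption made for the example: the class ChemTermSketch, the nested TypedToken, the toy classify() rule (capital letters plus at least one digit) and the "chem_" prefix stand in for whatever your analyzer really does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: tokens carry a type, and the type
// is either used to fill a dedicated field or folded into the term text.
final class ChemTermSketch {
    static final String CHEM = "chem";

    // Stand-in for an analyzer token: text plus a type label.
    static final class TypedToken {
        final String text;
        final String type;
        TypedToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Toy rule (assumption): element-like symbols with at least one digit,
    // e.g. H2O or O2. A real chemical tokenizer needs much more than this.
    static TypedToken classify(String word) {
        boolean chem = word.matches("([A-Z][a-z]?\\d*)+") && word.matches(".*\\d.*");
        return new TypedToken(word, chem ? CHEM : "word");
    }

    // Split on anything that is not a letter or digit and type each token.
    static List<TypedToken> tokenize(String text) {
        List<TypedToken> out = new ArrayList<>();
        for (String w : text.split("[^A-Za-z0-9]+")) {
            if (!w.isEmpty()) {
                out.add(classify(w));
            }
        }
        return out;
    }

    // Option 1: keep only "chem" tokens, as values for a dedicated field.
    static List<String> chemFieldValues(List<TypedToken> tokens) {
        List<String> out = new ArrayList<>();
        for (TypedToken t : tokens) {
            if (CHEM.equals(t.type)) {
                out.add(t.text);
            }
        }
        return out;
    }

    // Option 2: encode the type in the term text itself.
    // The same method must be applied to query terms.
    static String toTerm(TypedToken t) {
        return CHEM.equals(t.type) ? "chem_" + t.text : t.text;
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = tokenize("Burning H2 in O2 gives H2O");
        System.out.println("chem field: " + chemFieldValues(tokens));
        for (TypedToken t : tokens) {
            System.out.print(toTerm(t) + " ");
        }
        System.out.println();
    }
}
```

With option 2 the query side has to go through the same toTerm() step, otherwise a query for H2 will never meet the indexed chem_H2.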
So it really seems chem_H2 is the only choice, doesn't it?
What are your requirements or expectations?
- match a formula in the middle of a sentence?
- or simply match documents that contain both the sentence's words and
the formula? (In the latter case, I think solution 1 is valid.)
- how would you do wildcards with formulae?
A related question, at least for me, is how to match a+(b+1) when the
query is X+Y, i.e. a subtree cut.
Does this occur in chemical formulae as well?
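
To make the question concrete, here is a minimal sketch of one possible reading of "subtree cut": a variable in the query stands for a whole subtree of the target. The class SubtreeMatchSketch, the Expr tree and the convention that a single uppercase letter is a variable are all assumptions for the example, and nothing here is tied to an index.

```java
// Hypothetical sketch: match a query tree with variables against a formula tree.
final class SubtreeMatchSketch {

    // Minimal binary expression tree: a leaf has op == null and a name;
    // an inner node has an operator and two children.
    static final class Expr {
        final String op;
        final String name;
        final Expr left;
        final Expr right;
        Expr(String op, String name, Expr left, Expr right) {
            this.op = op;
            this.name = name;
            this.left = left;
            this.right = right;
        }
    }

    static Expr leaf(String name) {
        return new Expr(null, name, null, null);
    }

    static Expr node(String op, Expr left, Expr right) {
        return new Expr(op, null, left, right);
    }

    // Assumption: a leaf named with one uppercase letter is a query variable.
    static boolean isVariable(Expr e) {
        return e.op == null && e.name.matches("[A-Z]");
    }

    // True if the query matches the target at its root. A variable cuts the
    // comparison short and accepts whatever subtree sits at that position.
    static boolean matches(Expr query, Expr target) {
        if (isVariable(query)) {
            return true;
        }
        if (query.op == null || target.op == null) {
            return query.op == null && target.op == null
                    && query.name.equals(target.name);
        }
        return query.op.equals(target.op)
                && matches(query.left, target.left)
                && matches(query.right, target.right);
    }

    // True if the query matches at the root or at any subtree of the target.
    static boolean matchesAnywhere(Expr query, Expr target) {
        if (matches(query, target)) {
            return true;
        }
        return target.op != null
                && (matchesAnywhere(query, target.left)
                    || matchesAnywhere(query, target.right));
    }

    public static void main(String[] args) {
        Expr target = node("+", leaf("a"), node("+", leaf("b"), leaf("1")));
        Expr query = node("+", leaf("X"), leaf("Y"));
        System.out.println(matches(query, target));
    }
}
```

The sketch does not check that a repeated variable is bound to the same subtree each time, so X+X would also match a+b, and it ignores commutativity. The harder part, doing this against an index rather than an in-memory tree, stays open.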
paul