On Friday 05 December 2003 10:45, Doug Cutting wrote:
Tatu Saloranta wrote:
Also, shouldn't there be at least 3 methods that take Readers; one for
Text-like handling, another for UnStored, and last for UnIndexed.
How do you store the contents of a Reader? You'd have to double-buffer
it, first reading it into a String to store, and then tokenizing the
StringReader. A key feature of Reader values is that they're streamed:
Not really, you can pass Reader to tokenizer, which then reads and tokenizes
directly (I think that's the way code also works). This because internally
String is read using StringReader, so passing a String looks more like a
the entire value is never in RAM. Storing a Reader value would remove
that advantage. The current API makes this explicit: when you want
something streamed, you pass in a Reader, when you're willing to have
the entire value in memory, pass in a String.
I guess for things that are both tokenized and stored, passing a Reader can't
really help a lot; if one wants to reduce mem usage, text needs to be read
twice, or analyzer needs to help in writing output; or, text needs to be read
in-memory much like what happens now. It'd simplify application code a bit,
but wouldn't do much more.
So.... I guess I need to downgrade my suggestion to require just 2
Reader-taking factory methods? :-)
I still think that index-only and store-only version would both make sense. In
latter case, storing could be done in fully streaming fashion; in former
tokenization can be done?
Yes, it is a bit confusing that Text(String, String) stores its value,
while Text(String, Reader) does not, but it is at least well documented.
And we cannot change it: that would break too many applications. But
we can put this on the list for Lucene 2.0 cleanups.
Yes, I understand that. It'd not be reasonable to do such a change. But how
about adding more intuitive factory method (UnStored(String, Reader))?
When I first wrote these static methods I meant for them to be
constructor-like. I wanted to have multiple Field(String, String)
constructors, but that's not possible, so I used capitalized static
methods instead. I've never seen anyone else do this (capitalize any
method but a real constructor) so I guess I didn't start a fad! This :-)
should someday too be cleaned up. Lucene was the first Java program
that I ever wrote, and thus its style is in places non-standard. Sorry.
Best standards are created by people doing things others use, follow or
imitate... so it was worth a try! :-)
-+ Tatu +-
To unsubscribe, e-mail: firstname.lastname@example.org
For additional commands, e-mail: email@example.com