I have a base implementation of lazy field loading that I am starting to
test and wanted to run my approach by everyone to hear their thoughts.

I have, as per Doug's suggestion from a while ago, created an interface
named Fieldable that is implemented by Field and by a new private class
owned by FieldsReader. I have introduced an "enumerated" type to the
Field class named LazyLoad (which can be YES or NO, in the same spirit
as Field.TermVector). Any place that used to take a Field now takes a
Fieldable. This should be completely transparent and
backward-compatible: the existing Field constructors all assume lazy
loading is off.

When creating a Field, a user can pass LazyLoad.YES or NO to a
constructor that takes either a String value or a byte array (it does
not apply to the Reader constructors, since they do not store their
content). Indexing and writing of fields take place as normal; the only
difference is an extra bit, written along with the field, that marks
the field as lazy.
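A minimal sketch of the declarative option, written as plain Java rather than against Lucene's classes. `StoredFieldSpec`, the flag constants, and the `0x8` value of the hypothetical `FIELD_IS_LAZY` bit are all assumptions for illustration; the proposal describes `LazyLoad` as an "enumerated" type in the style of `Field.TermVector`, shown here as a Java `enum`. The old-style constructor defaults to `LazyLoad.NO`, which is what keeps existing callers backward-compatible.

```java
import java.util.Objects;

/** Illustrative stand-in for the proposed Field.LazyLoad "enumerated" type. */
enum LazyLoad { YES, NO }

/** Hypothetical stored-field description and the flag byte a writer would emit for it. */
final class StoredFieldSpec {
    // Hypothetical per-field flag bits; FIELD_IS_LAZY is the proposed extra bit.
    static final int FIELD_IS_TOKENIZED = 0x1;
    static final int FIELD_IS_BINARY = 0x2;
    static final int FIELD_IS_LAZY = 0x8;

    final String name;
    final boolean tokenized;
    final boolean binary;
    final LazyLoad lazy;

    StoredFieldSpec(String name, boolean tokenized, boolean binary, LazyLoad lazy) {
        this.name = Objects.requireNonNull(name);
        this.tokenized = tokenized;
        this.binary = binary;
        this.lazy = Objects.requireNonNull(lazy);
    }

    /** Pre-existing style of constructor: lazy loading is off by default. */
    StoredFieldSpec(String name, boolean tokenized, boolean binary) {
        this(name, tokenized, binary, LazyLoad.NO);
    }

    /** The flag byte written alongside the field's value. */
    int flagBits() {
        int bits = 0;
        if (tokenized) bits |= FIELD_IS_TOKENIZED;
        if (binary) bits |= FIELD_IS_BINARY;
        if (lazy == LazyLoad.YES) bits |= FIELD_IS_LAZY;
        return bits;
    }

    /** What a reader would test to decide whether to build a lazy field. */
    static boolean isLazy(int bits) {
        return (bits & FIELD_IS_LAZY) != 0;
    }
}
```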

When a field is read back, if it is lazy, then instead of reading in
its value and constructing a Field, we construct a LazyField instance
that takes the pointer into the fieldsStream and the amount of data to
read. Since this is a private class of FieldsReader, it retains access
to the fieldsStream. Thus, when an application accesses the value of
the field, we check whether it has been loaded yet. If it has not, we
load it using the fieldsStream, the pointer, and the length to read.
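The read path described above can be sketched as follows. `ByteStream` is a hypothetical stand-in for the `fieldsStream` (a byte source with a movable file pointer), and this `LazyField` is illustrative, not the patch's actual class: it keeps only the pointer and length, and reads the bytes the first time a value is requested.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for FieldsReader's fieldsStream: bytes plus a movable file pointer. */
final class ByteStream {
    private final byte[] data;
    private int filePointer;

    ByteStream(byte[] data) {
        this.data = data;
    }

    int getFilePointer() {
        return filePointer;
    }

    void seek(int pos) {
        filePointer = pos;
    }

    /** Copies len bytes from the current position and advances the pointer. */
    void readBytes(byte[] dst, int offset, int len) {
        System.arraycopy(data, filePointer, dst, offset, len);
        filePointer += len;
    }
}

/** Illustrative lazy field: remembers where its value lives and reads it on first access. */
final class LazyField {
    private final String name;
    private final ByteStream fieldsStream; // shared with the reader that created this field
    private final int pointer;             // start of the stored value
    private final int length;              // number of bytes to read
    private byte[] value;                  // null until first loaded
    private int loads;                     // how many times the stream was actually read

    LazyField(String name, ByteStream fieldsStream, int pointer, int length) {
        this.name = name;
        this.fieldsStream = fieldsStream;
        this.pointer = pointer;
        this.length = length;
    }

    String name() { return name; }

    boolean isLoaded() { return value != null; }

    int loadCount() { return loads; }

    byte[] binaryValue() {
        if (value == null) { // not loaded yet: go to the stream
            byte[] buf = new byte[length];
            fieldsStream.seek(pointer);
            fieldsStream.readBytes(buf, 0, length);
            value = buf;
            loads++;
        }
        return value;
    }

    String stringValue() {
        return new String(binaryValue(), StandardCharsets.UTF_8);
    }
}
```

The sketch is deliberately single-threaded: if several threads shared one `ByteStream`, one could move the file pointer between another's `seek` and `readBytes`.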

Does anyone see any issues with this? I think it will only really pay
off for large stored fields, but I have not quantified that yet. My main
concern is the semantics of the fieldsStream, and whether it could be
closed behind the back of the LazyField implementation. My
understanding is that as long as the IndexReader is open, this stream
should also be open. Is that correct? What am I forgetting?

If testing goes well, I should be able to button this up this week or
next and submit the patch.

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244

http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Erik Hatcher at Mar 29, 2006 at 1:43 pm
    Lazy loaded fields will be a nice addition to Lucene. I'm curious
    why the flag is set at indexing time rather than being something
    that is controlled during retrieval somehow. I'm not sure what that
    API would look like, but it seems like a decision to be made during
    searching and reading of an index rather than during indexing
    itself.

    Erik

  • Grant Ingersoll at Mar 29, 2006 at 2:05 pm
    Hmmm, I guess I always thought of it as a property of the field that
    users would want to explicitly control. I assumed that most fields
    would not be lazy and a few would be.
    Now that you have pushed back on it a bit (in a good way), I think it
    could just as easily be a parameter: any field over a specified size
    would be lazily loaded. With this approach, I could see:

    IndexReader.document(int docNumber, long maxFieldSizeToLoad);

    and IndexReader.document(int docNum) would just call this new method
    passing in some default value, say 2K or something.
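A sketch of the size-threshold alternative, assuming the reader can see each stored field's length before deciding whether to read it. `SizeThresholdPolicy`, `LoadDecision`, and the 2048-byte default are hypothetical (the message only says "say 2K"); the one-argument overload plays the role of `IndexReader.document(int docNum)` delegating with a default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-field outcome when a document is read. */
enum LoadDecision { EAGER, LAZY }

final class SizeThresholdPolicy {
    static final long DEFAULT_MAX_FIELD_SIZE_TO_LOAD = 2048; // "say 2K"

    private SizeThresholdPolicy() {}

    /** Fields larger than maxFieldSizeToLoad stay on disk until accessed. */
    static Map<String, LoadDecision> decide(Map<String, Long> storedLengths, long maxFieldSizeToLoad) {
        Map<String, LoadDecision> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : storedLengths.entrySet()) {
            out.put(e.getKey(),
                    e.getValue() > maxFieldSizeToLoad ? LoadDecision.LAZY : LoadDecision.EAGER);
        }
        return out;
    }

    /** Mirrors document(int docNum) calling the new method with a default threshold. */
    static Map<String, LoadDecision> decide(Map<String, Long> storedLengths) {
        return decide(storedLengths, DEFAULT_MAX_FIELD_SIZE_TO_LOAD);
    }
}
```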

    Or, we could pass in an array of the names of fields to be loaded
    lazily, something like

    IndexReader.document(int docNumber, String [] fieldNamesToLoadLazy);
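The name-based alternative reduces to a set lookup per field. `LazyByNamePolicy` is a hypothetical helper, not a Lucene class; treating a `null` array as "nothing is lazy" is an assumption chosen so that the old, fully eager behaviour remains the default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical policy: fields named by the caller load lazily, all others eagerly. */
final class LazyByNamePolicy {
    private final Set<String> lazyNames;

    LazyByNamePolicy(String[] fieldNamesToLoadLazy) {
        // A null array means "nothing is lazy", i.e. the current behaviour.
        this.lazyNames = fieldNamesToLoadLazy == null
                ? Collections.<String>emptySet()
                : new HashSet<>(Arrays.asList(fieldNamesToLoadLazy));
    }

    /** Constant-time check made once per stored field while reading a document. */
    boolean isLazy(String fieldName) {
        return lazyNames.contains(fieldName);
    }
}
```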

    The current way I have it looks something like this (with a few other
    variations) for the field constructors:

        public Field(String name, String value, Store store, Index index, LazyLoad lazy)
        public Field(String name, byte[] value, Store store, LazyLoad lazy)

    I am happy to do either way since the underlying mechanics are pretty
    similar. What do others think?

    -Grant

  • Donovan Aaron at Mar 29, 2006 at 2:14 pm
    I've done a lot of work with Verity's search engine, and I like the way
    they handle fields. At query time you specify the fields you want
    returned from matching documents.

    Aaron

    -----Original Message-----
    From: Grant Ingersoll
    Sent: Wednesday, March 29, 2006 9:05 AM
    To: java-dev@lucene.apache.org
    Subject: Re: Lazy Field Loading

  • Grant Ingersoll at Mar 29, 2006 at 6:53 pm
    Of course, another option is to make all fields lazy all the time, so
    the user never even needs to think about it. We would need some
    strategy for when the IndexReader gets closed, but we need that in all
    cases.
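If every field is lazy, the open question is what an unloaded value should do once its reader is closed. One possible strategy, sketched here with a hypothetical `AllLazyReader` (the byte array stands in for the real fields stream), is to let already-loaded values remain usable and make a first access after `close()` fail fast with an `IllegalStateException`.

```java
import java.util.Arrays;

/** Hypothetical reader in which every stored field is lazy. */
final class AllLazyReader implements AutoCloseable {
    private final byte[] fieldsData; // stands in for the fields stream
    private boolean closed;

    AllLazyReader(byte[] fieldsData) {
        this.fieldsData = fieldsData;
    }

    /** Hands out a handle; no bytes are copied until value() is called. */
    Handle field(int pointer, int length) {
        return new Handle(pointer, length);
    }

    @Override
    public void close() {
        closed = true;
    }

    final class Handle {
        private final int pointer;
        private final int length;
        private byte[] value; // null until loaded

        private Handle(int pointer, int length) {
            this.pointer = pointer;
            this.length = length;
        }

        /** Loaded values survive close(); an unloaded one fails fast instead of reading a closed stream. */
        byte[] value() {
            if (value == null) {
                if (closed) {
                    throw new IllegalStateException("reader closed before field was loaded");
                }
                value = Arrays.copyOfRange(fieldsData, pointer, pointer + length);
            }
            return value;
        }
    }
}
```

Another option would be to load all pending values inside `close()`; which behaviour is preferable is a design choice the discussion leaves open. The sketch is not thread-safe.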


  • Doug Cutting at Mar 29, 2006 at 8:16 pm

    Grant Ingersoll wrote:
    > My main concern is the semantics of the fieldsStream and whether that
    > would be closed behind the back of the LazyField implementation. My
    > understanding is that as long as the IndexReader is open, this stream
    > should also be open. Is that correct? What am I forgetting about?

    You need to make sure that access to the stream is synchronized, so that
    one thread doesn't move the file pointer while someone else is reading.
    You could use a cloned stream in a ThreadLocal to avoid contention.

    Doug
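Doug's two suggestions can be sketched side by side. `SharedStream` and `LazyFieldLoader` are hypothetical; the sketch assumes the stream can be cloned cheaply, with each clone sharing the underlying bytes but keeping its own file pointer. `loadSynchronized` makes seek-then-read atomic under a lock on the shared stream, while `loadThreadLocal` gives each thread its own clone so no lock is needed.

```java
/** Hypothetical cloneable stream: clones share the bytes but each has its own file pointer. */
final class SharedStream implements Cloneable {
    private final byte[] data; // shared by all clones, never modified
    private int filePointer;   // per-instance position

    SharedStream(byte[] data) {
        this.data = data;
    }

    void seek(int pos) {
        filePointer = pos;
    }

    void readBytes(byte[] dst, int offset, int len) {
        System.arraycopy(data, filePointer, dst, offset, len);
        filePointer += len;
    }

    @Override
    public SharedStream clone() {
        try {
            return (SharedStream) super.clone(); // shallow copy: same data, independent pointer
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

/** Two ways a lazy field could read from a stream shared across threads. */
final class LazyFieldLoader {
    private final SharedStream fieldsStream;
    private final ThreadLocal<SharedStream> perThreadStream;

    LazyFieldLoader(SharedStream fieldsStream) {
        this.fieldsStream = fieldsStream;
        // Each thread gets its own clone the first time it calls get().
        this.perThreadStream = ThreadLocal.withInitial(fieldsStream::clone);
    }

    /** Option 1: hold a lock so no other thread can move the pointer between seek and read. */
    byte[] loadSynchronized(int pointer, int length) {
        byte[] buf = new byte[length];
        synchronized (fieldsStream) {
            fieldsStream.seek(pointer);
            fieldsStream.readBytes(buf, 0, length);
        }
        return buf;
    }

    /** Option 2: read through this thread's private clone; no lock is needed. */
    byte[] loadThreadLocal(int pointer, int length) {
        SharedStream stream = perThreadStream.get();
        byte[] buf = new byte[length];
        stream.seek(pointer);
        stream.readBytes(buf, 0, length);
        return buf;
    }
}
```

The lock version is simpler but funnels every lazy load through one stream; the `ThreadLocal` version trades one clone per thread for no contention.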

  • Grant Ingersoll at Mar 31, 2006 at 12:21 pm
    OK, how about a vote on this.

    I see several ways of implementing the front end to this:

    1. Declarative: On construction of a Document, you declare the Field to
    be Lazy.

    2. Implicit: All fields are Lazy

    3. Size of Field. Pass into IndexReader.document() the size of the
    field above which it will be lazily loaded. A default size can also be
    used.

    4. By Field name. Pass in the names of the Fields that you want loaded
    lazily.

    Thanks,
    Grant

  • Mark harwood at Mar 31, 2006 at 1:10 pm
    I'd prefer option 4.
    Users should expect to provide some form of guidance
    to the engine about how they are going to access the
    data if it is expected to be retrieved efficiently.

    Preferably this choice of field loading policy should
    NOT be "baked in" at index time because index access
    patterns can vary (ruling out options 1 and 3)

    I think option 4, the reader.document(int docid,
    String[] fields) approach, is a reasonable option and is
    analogous to the "select a,b" part of a SQL statement.

    It seems to be the most flexible and is not likely to
    be seen as an unnecessary burden by end users familiar
    with SQL. We should also have a "select *" equivalent
    for those uninterested in being selective.

    I suspect your option "2" (all fields are implicitly
    lazy) could have a hard time second-guessing how
    people are accessing the docs?


    Cheers
    Mark


  • Grant Ingersoll at Mar 31, 2006 at 1:30 pm

    mark harwood wrote:
    > Preferably this choice of field loading policy should
    > NOT be "baked in" at index time because index access
    > patterns can vary (ruling out options 1 and 3)

    I don't think option 3 is baked in at indexing time. I think it would
    look like:

        IndexReader.document(int docNumber, long maxFieldSizeToLoad);

    and IndexReader.document(int docNum) would just call this new method,
    passing in some default value, say 2K or something.

  • Mark harwood at Mar 31, 2006 at 2:32 pm
    > I don't think option 3 is baked in at indexing time.

    Sorry, I misread it. Yes, that is another option.

    So if options 3 and 4 are about search-time selection
    (based on size and fieldname respectively) can they be
    generalized into a more wide-reaching retrieval API?

    You can imagine a high-level retrieval language like
    this:

    Select url, length(contents), substring(descr,0,50)

    ...where three items are returned. The first item
    (url) is a straight copy of the original field data,
    the second is the size in bytes of the "contents"
    field, and the third is a summary of the "descr" field
    (in this case a simple substring, but it could conceivably
    be a more sophisticated summarizer, e.g. the highlighter).

    If you think of each of these as retrieval functions,
    we have an API that looks something like this:

        IndexReader.document(int doc,
                             RetrieveFunction[] retrievers);

        interface RetrieveFunction {
            Object getValue(FieldMetaData f);
        }

        interface FieldMetaData {
            String getFieldName();
            int getSize();
            InputStream getInputStream();
        }

    The reader calls the retrievers with a FieldMetaData
    object for each field, and the data is only loaded from
    disk if a RetrieveFunction "bites" and asks for the
    InputStream to get the content of a field.
    You can imagine that different RetrieveFunction
    implementations could then choose not only which
    fields are returned but also how much content, and in
    what format.

    I'm not sure there is a sufficiently long list of
    different retriever functions to make this a
    useful approach.


    Cheers
    Mark
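Mark's retrieval-function idea can be made concrete with a small in-memory sketch. `StoredField`, `Retrievers`, and the static `document(...)` helper are hypothetical stand-ins for the index and for `IndexReader.document(int doc, RetrieveFunction[] retrievers)`; `InputStream` is taken to be `java.io.InputStream`, which the message does not specify. The point it illustrates: `length(contents)` is answered from `getSize()` alone, so that field's content is never opened.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Metadata handed to each retriever; content is read only via getInputStream(). */
interface FieldMetaData {
    String getFieldName();
    int getSize();
    InputStream getInputStream();
}

/** A retrieval function: returns a value for the field, or null to pass on it. */
interface RetrieveFunction {
    Object getValue(FieldMetaData f);
}

/** Hypothetical in-memory stored field that counts how often its content is opened. */
final class StoredField implements FieldMetaData {
    private final String name;
    private final byte[] bytes;
    int opens; // number of getInputStream() calls

    StoredField(String name, String value) {
        this.name = name;
        this.bytes = value.getBytes(StandardCharsets.UTF_8);
    }

    public String getFieldName() { return name; }
    public int getSize() { return bytes.length; }
    public InputStream getInputStream() { opens++; return new ByteArrayInputStream(bytes); }
}

final class Retrievers {
    private Retrievers() {}

    static String readAll(FieldMetaData f) {
        try (InputStream in = f.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** select url: a straight copy of the stored value. */
    static RetrieveFunction copy(String field) {
        return f -> f.getFieldName().equals(field) ? readAll(f) : null;
    }

    /** select length(contents): answered from metadata, content never opened. */
    static RetrieveFunction length(String field) {
        return f -> f.getFieldName().equals(field) ? Integer.valueOf(f.getSize()) : null;
    }

    /** select substring(descr,begin,end): a trivial stand-in for a summarizer. */
    static RetrieveFunction substring(String field, int begin, int end) {
        return f -> {
            if (!f.getFieldName().equals(field)) return null;
            String s = readAll(f);
            return s.substring(Math.min(begin, s.length()), Math.min(end, s.length()));
        };
    }

    /** Hypothetical stand-in for IndexReader.document(int doc, RetrieveFunction[] retrievers). */
    static List<Object> document(List<? extends FieldMetaData> fields, RetrieveFunction[] retrievers) {
        List<Object> results = new ArrayList<>();
        for (RetrieveFunction r : retrievers) {
            for (FieldMetaData f : fields) {
                Object v = r.getValue(f);
                if (v != null) { results.add(v); break; } // first matching field wins
            }
        }
        return results;
    }
}
```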

  • Erik Hatcher at Mar 31, 2006 at 1:30 pm
    I prefer option #4 myself. Also note that a similar issue with
    patches exists within JIRA:

    <https://issues.apache.org:443/jira/browse/LUCENE-509>

    Erik

  • Yonik Seeley at Mar 31, 2006 at 2:25 pm

    On 3/31/06, Erik Hatcher wrote:
    > I prefer option #4 myself. Also note that a similar issue with
    > patches exists within JIRA:
    >
    > <https://issues.apache.org:443/jira/browse/LUCENE-509>

    Yes, I'd personally find a way to retrieve just fields x, y, and z more
    useful than lazy loading.
    It seems like lazy loading could be useful if you do something with
    field values that is conditional on the value of other fields... a
    case I haven't run into.

    -Yonik
    http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server

  • Andrzej Bialecki at Mar 31, 2006 at 2:49 pm

    Yonik Seeley wrote:
    > Yes, I'd personally find a way to retrieve just fields x, y, and z more
    > useful than lazy loading.
    > It seems like lazy loading could be useful if you do something with
    > field values that is conditional on the value of other fields... a
    > case I haven't run into.

    Use cases in Nutch would also indicate that #4 is the most convenient
    option, and rule out options #1 and #3 (and perhaps #2, due to
    efficiency). Various fields from Lucene indexes are used for, e.g.,
    sorting, where the sort field is selected by users at run time. Some
    field values help with Hits presentation, while other values should only
    be retrieved when requesting all of a hit's "metadata", again using the
    same index. So option #4 would be really useful.

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web
    Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com



  • Grant Ingersoll at Mar 31, 2006 at 8:43 pm
    OK, #4 it is. I will most likely submit a patch this weekend.


Discussion Overview
group: java-dev
categories: lucene
posted: Mar 29, '06 at 1:32p
active: Mar 31, '06 at 8:43p
posts: 14
users: 7
website: lucene.apache.org
