[Solr-user] Updating documents

Vinicius Carvalho
Jul 11, 2012 at 2:58 pm
Hi there.

I was checking the FAQ and found that Solr does not support field updates,
right? So I assume that in order to update a document, one should first
retrieve it by its id, change the required field, and then update the doc
again. But then I wonder about fields that are indexed and not stored:
since the new document that is sent to the index does not have their values,
would this mean we will lose them?

BTW, any chance we will see field-level updates in 4.0, like Elasticsearch has?

Regards

--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

12 responses

  • Jonatan Fournier at Jul 11, 2012 at 3:30 pm

    I'm actually also looking at this new feature in 4.0-ALPHA:

    http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

    I was wondering where the new XML tags that support these "set",
    "add to multi-value", etc. operations are documented.

    --
    jonatan
  • Erick Erickson at Jul 12, 2012 at 3:05 pm
    Vinicius:

    No, fetching the document from the index, changing selected values, and
    re-indexing probably won't work at all. The problem is that you only get
    _stored_ values back from Solr. So unless you've specified stored="true"
    for all your fields, you can't use the doc fetched from Solr to update a
    field.

    The partial documents update that Jonatan references also requires
    that all the fields be stored.

    Your best bet is to go back to your system-of-record for the data
    and re-index the whole document.

    Best
    Erick
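
The point about stored values can be illustrated with a small toy model. This is a sketch only: it does not talk to Solr, and the schema, field names, and the `fetch_stored` helper are assumptions made up for the example.

```python
# Toy model (not Solr code): the index keeps every field searchable,
# but only hands back the fields marked stored=True.
SCHEMA = {
    "id": {"stored": True},
    "title": {"stored": True},
    "body": {"stored": False},  # indexed for search, but not retrievable
}

def fetch_stored(doc):
    """Return only what a search result would contain: the stored fields."""
    return {k: v for k, v in doc.items() if SCHEMA[k]["stored"]}

original = {"id": "1", "title": "Old title", "body": "full text of the document"}

# Naive update: fetch, change one field, send the result back as the new document.
fetched = fetch_stored(original)
fetched["title"] = "New title"
reindexed = fetched  # this would replace the whole document in the index

# Fields that existed before but are missing from the re-indexed document.
lost = sorted(set(original) - set(reindexed))
print(lost)
```

In this model the unstored `body` field is the one that goes missing, which is the loss the original question worried about.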

  • Jonatan Fournier at Jul 12, 2012 at 4:38 pm
    Erick,

    On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson wrote:
    > The partial documents update that Jonatan references also requires
    > that all the fields be stored.

    If my only fields with stored="false" are copyField (e.g. I don't need
    their content to rebuild the document), are they gonna be re-copied
    with the partial document update?

    --
    jonatan
  • Yonik Seeley at Jul 12, 2012 at 4:52 pm

    On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier wrote:
    > If my only fields with stored="false" are copyField (e.g. I don't need
    > their content to rebuild the document), are they gonna be re-copied
    > with the partial document update?

    Correct - your setup should be fine. Only original source fields (non
    copyField targets) should have stored=true.

    -Yonik
    http://lucidimagination.com
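
Why unstored copyField targets survive a partial update can be sketched with another toy model. It is not Solr code; the field names, the `COPY_FIELDS` mapping, and both helper functions are assumptions for illustration. The idea shown is only that the document is rebuilt from its stored source fields and the copy targets are derived again at index time.

```python
# Toy model (not Solr code). Assumed schema: "title" and "author" are stored
# source fields; "text" is an unstored copyField target built from both.
STORED_SOURCES = ("id", "title", "author")
COPY_FIELDS = {"text": ("title", "author")}  # target -> sources

def index(doc):
    """Build the indexed form: source fields plus freshly derived copy targets."""
    indexed = dict(doc)
    for target, sources in COPY_FIELDS.items():
        indexed[target] = [doc[s] for s in sources if s in doc]
    return indexed

def partial_update(indexed_doc, changes):
    """Rebuild the document from stored source fields, apply changes, re-index."""
    rebuilt = {k: indexed_doc[k] for k in STORED_SOURCES if k in indexed_doc}
    rebuilt.update(changes)
    return index(rebuilt)

before = index({"id": "1", "title": "Solr", "author": "jonatan"})
after = partial_update(before, {"title": "Solr 4.0"})
print(after["text"])
```

Here the `text` target is never read back; it is recomputed from the updated sources, so it does not need to be stored.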
  • Jonatan Fournier at Jul 12, 2012 at 7:20 pm
    Yonik,

    On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley wrote:
    > Correct - your setup should be fine. Only original source fields (non
    > copyField targets) should have stored=true

    Another question I had related to partial update...

    $ ./post.sh foo.json
    {"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document not found for update. id=foo","code":409}}

    Is there a flag for "if the document does not exist, create it for me"? The
    thing is that I don't know in advance whether the document already exists
    (of course I could query first, but I have millions of entries to
    process; each one might be new or might be an update, I don't know).

    My naive approach was to have two documents in the same request: one
    with only "set", using the unique ID, and a second one with all
    the "add" operations (for the multivalue fields).

    So it would do the following:

    1. Whether the document (with this id) exists or not, I don't care; use
    the following "set" command to update or create it.
    2. Second pass: I know you exist (with the above id), so please add all
    of these to the multivalue fields (none of those fields are in the
    initial updates).

    My rationale is: if the document exists, reset some fields, and
    then append to the multivalue fields (those multivalue fields express
    historical updates).

    The reason I created 2 documents is that Solr doesn't seem happy if I
    mix set and add in the same document :)
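
The two-document approach described above could be built roughly as follows. This is a minimal sketch that only constructs the JSON body; the field names `status` and `history` and both helper functions are hypothetical, and nothing here confirms how Solr treats two documents with the same id in one request.

```python
import json

def set_doc(doc_id, fields):
    """Document whose fields are overwritten with the "set" modifier."""
    doc = {"id": doc_id}
    doc.update({name: {"set": value} for name, value in fields.items()})
    return doc

def add_doc(doc_id, fields):
    """Document whose multivalue fields are appended to with the "add" modifier."""
    doc = {"id": doc_id}
    doc.update({name: {"add": value} for name, value in fields.items()})
    return doc

# Hypothetical field names; "foo" is the id from the error message above.
payload = [
    set_doc("foo", {"status": "active"}),               # first pass: reset fields
    add_doc("foo", {"history": "updated 2012-07-12"}),  # second pass: append
]
body = json.dumps(payload)
print(body)
```

Keeping "set" and "add" in separate helpers mirrors the split into two documents; the body would then be posted to the update handler by whatever script is in use (such as the `post.sh` above).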
  • Yonik Seeley at Jul 13, 2012 at 4:57 am

    On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier wrote:
    > Is there a flag for: if document does not exist, create it for me?

    Not currently, but it certainly makes sense.
    The implementation should be easy. The most difficult part is figuring
    out the best syntax to specify this.

    Another idea: we could possibly switch to create-if-not-exist by
    default, and use the existing optimistic concurrency mechanism to
    specify that the document should exist.

    So specify _version_=1 if the document should exist and _version_=0
    (the default) if you don't care.

    -Yonik
    http://lucidimagination.com
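
The proposed rule can be written down as a small model. This sketch covers only the two `_version_` values mentioned in the message above; the function names are made up for the example and this is not Solr's implementation.

```python
def update_allowed(version, doc_exists):
    """Model of the proposed rule, limited to the two values named above.

    version == 1: the document must already exist.
    version == 0: the default; existence does not matter (create if missing).
    """
    if version == 1:
        return doc_exists
    if version == 0:
        return True
    raise ValueError("only _version_ 0 and 1 are covered by this sketch")

def with_version(doc, version):
    """Return a copy of an update document carrying the _version_ field."""
    return {**doc, "_version_": version}

print(update_allowed(0, doc_exists=False))  # create-if-not-exist case
print(update_allowed(1, doc_exists=False))  # must-exist case, document missing
print(with_version({"id": "foo"}, 1))
```

Under this rule, the earlier "Document not found for update" conflict would only be expected when the request explicitly asks for `_version_=1`.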
  • Jonatan Fournier at Jul 13, 2012 at 5:41 pm

    On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley wrote:
    > Another idea: we could possibly switch to create-if-not-exist by
    > default, and use the existing optimistic concurrency mechanism to
    > specify that the document should exist.
    >
    > So specify _version_=1 if the document should exist and _version_=0
    > (the default) if you don't care.

    Yes that would be neat!

    One more question related to partial document update. So far I'm able
    to append to multivalue fields and to set new values on regular and
    multivalue fields. One thing I didn't find is the "remove" command; what
    is its JSON syntax?

    Thanks,
  • Yonik Seeley at Jul 13, 2012 at 5:43 pm

    On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier wrote:
    > > So specify _version_=1 if the document should exist and _version_=0
    > > (the default) if you don't care.
    > Yes that would be neat!

    I've just committed this change.

    > One more question related to partial document update. So far I'm able
    > to append to multivalue fields, set new value to regular/multivalue
    > fields. One thing I didn't find is the "remove" command, what is its
    > JSON syntax?

    Set it to the JSON value of null.

    -Yonik
    http://lucidimagination.com
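
Reading "set it to the JSON value of null" as a "set" modifier whose value is null, a removal request could be built like this. It is a sketch of that reading only; the field name `old_field` is hypothetical.

```python
import json

# Hypothetical field name. Python's None is serialized as JSON null.
update = [{"id": "foo", "old_field": {"set": None}}]
body = json.dumps(update)
print(body)  # [{"id": "foo", "old_field": {"set": null}}]
```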
  • Jonatan Fournier at Jul 13, 2012 at 5:52 pm

    On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley wrote:
    > I've just committed this change.

    Super thanks! I assume it will end up in the 4.0 release?
  • Yonik Seeley at Jul 13, 2012 at 6:11 pm

    > > I've just committed this change.
    > Super thanks! I assume it will end up in the 4.0 release?

    Yep!

    -Yonik
    http://lucidimagination.com
  • Jonatan Fournier at Jul 13, 2012 at 7:50 pm

    On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier wrote:
    > My rationale is that if the document exists, reset some fields, and
    > then append the multivalue fields (those multivalue fields express
    > historical updates)

    This is probably a silly mistake on my side, but I don't seem to get the
    "append/add" JSON syntax right for multiValue fields...

    On initial creation of my document it works great with:

    ...
    "mv_f":"cat1",
    "mv_f":"cat2",
    ...

    But later on when I want to "append" cat3 to the field by doing this:

    "mv_f":{"add":"cat3"},
    ...

    I end up with something like this in the index:

    "mv_f":["{add=cat3}"],

    Obviously something is wrong with my syntax ;)

    --
    jonatan
  • Yonik Seeley at Jul 13, 2012 at 8:36 pm

    On Fri, Jul 13, 2012 at 3:50 PM, Jonatan Fournier wrote:
    > But later on when I want to "append" cat3 to the field by doing this:
    >
    > "mv_f":{"add":"cat3"},
    > ...
    >
    > I end up with something like this in the index:
    >
    > "mv_f":["{add=cat3}"],
    >
    > Obviously something is wrong with my syntax ;)

    Are you using a custom update processor chain? The
    DistributedUpdateProcessor currently contains the logic for optimistic
    concurrency and updates.
    If you're not already, try some test commands with the stock server.

    If you are already using the stock server, then perhaps you're not
    sending what you think you are to Solr?

    -Yonik
    http://lucidimagination.com
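
One way to check what is actually being sent is to parse the request body before posting it. The sketch below is an assumption-laden illustration, not a diagnosis: the thread does not establish the cause of the `{add=cat3}` value, and the double-encoding case shown here is only one hypothetical way a payload can differ from what was intended. `check_update_body` and `KNOWN_MODIFIERS` are names invented for the example.

```python
import json

KNOWN_MODIFIERS = {"set", "add"}  # only the modifiers mentioned in this thread

def check_update_body(body):
    """Parse a JSON update body and report fields whose modifier looks wrong."""
    problems = []
    for doc in json.loads(body):
        for name, value in doc.items():
            if isinstance(value, str) and value.lstrip().startswith("{"):
                problems.append(f"{name}: modifier sent as a string, not an object")
            elif isinstance(value, dict) and not set(value) <= KNOWN_MODIFIERS:
                problems.append(f"{name}: unknown modifier {sorted(value)}")
    return problems

good = json.dumps([{"id": "foo", "mv_f": {"add": "cat3"}}])
# A double-encoded modifier: the inner object was serialized to a string first.
bad = json.dumps([{"id": "foo", "mv_f": json.dumps({"add": "cat3"})}])

print(check_update_body(good))
print(check_update_body(bad))
```

If the body passes a check like this, the other suggestion above (testing against the stock server and its default update processor chain) is the next thing to rule out.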
