
[Solr-user] Export Index and Re-Index XML

Kalyan Kuram
Apr 22, 2013 at 11:08 pm
Hi All, I am new to Solr and wanted to know whether I can export the index as XML and then re-index it back into Solr. I need to do this because I misconfigured a fieldtype, and to make it work I need to re-index the content.
Kalyan

6 responses

  • Shawn Heisey at Apr 22, 2013 at 11:48 pm

    The best option is to index again from whatever source you used to
    build the index the first time. Because your requirements may change at
    any time, this is something that you should be prepared to do quite often.

    If you did not set all fields to stored="true" in your schema, then you
    will not be able to export all your documents from your current index to
    a new one. There is no way around this: you will have to wipe your
    index, go back to your original data source, and do the indexing again.

    If you DID store all your fields, then you have two choices.

    1) Use the dataimport handler with SolrEntityProcessor. You can use
    this to import from one core onto another core on the same server with a
    different config/schema, or from one server to another.

    http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

    2) I don't recommend this option, but it might work. You can query Solr
    for your docs, one page at a time (use the rows and start parameters),
    with wt=xml or wt=json, and save that output. With a little bit of
    modification, you can then use what you save as input for indexing.
    Here's a website describing the process, with a PHP script to make it
    easier. I have not checked to see whether the script actually works,
    and I won't be able to help you with it:

    http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/

    Thanks,
    Shawn
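
    Option 2 above (paging through the index with the rows and start
    parameters and saving the output) can be sketched in Python roughly as
    follows. This is only an illustration under assumptions: the core URL in
    the usage note is hypothetical, every field is assumed to be stored, and
    the internal _version_ field is dropped from each document before it is
    re-posted.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(select_url, start, rows):
    """Fetch one page of stored documents from a Solr /select handler as JSON."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "wt": "json", "start": start, "rows": rows}
    )
    with urllib.request.urlopen(f"{select_url}?{params}") as resp:
        return json.load(resp)["response"]["docs"]


def export_docs(select_url, rows=500, fetch=fetch_page):
    """Yield every document, one page at a time, without the _version_ field."""
    start = 0
    while True:
        docs = fetch(select_url, start, rows)
        if not docs:
            break  # an empty page means we have walked past the last document
        for doc in docs:
            doc.pop("_version_", None)  # Solr assigns a new version on re-index
            yield doc
        start += rows
```

    A call such as list(export_docs("http://localhost:8983/solr/oldcore/select"))
    would collect the documents, which could then be serialized with json.dumps
    and sent to the new core's update handler. Paging with large start values
    gets slower as the offset grows, so this is best suited to small indexes.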
  • Jack Krupansky at Apr 22, 2013 at 11:55 pm
    Any fields which have stored values can be read and output, but
    indexed-only, non-stored fields cannot be read or exported. Even if they
    could be, their values are post-analysis, which means that there is a good
    chance that they cannot be run through term analysis again.

    It is always best to keep a copy of your raw source data separate from the
    data you add to Solr. Or, at least make sure any important data is "stored".

    In short, you need to model your data for "reindexing", which is a fact of
    life in Solr land.

    -- Jack Krupansky

  • Kalyan Kuram at Apr 23, 2013 at 12:11 am
    Thank you all very much for your help. I do have the fields configured as stored and indexed, and I did read the FAQ on the wiki. I think SolrEntityProcessor is what I need. I am indexing data from Adobe CQ, which uses push-based indexing, and it is a pain to index data from a very large repository. I think I can manage with SolrEntityProcessor for now, and I will think about modelling the data for re-indexing purposes.
    Kalyan
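
    For the SolrEntityProcessor route, the DataImportHandler on the new core
    needs a data-config that points at the old core. The sketch below only
    generates such a config; it assumes the old core lives at a hypothetical
    URL (http://localhost:8983/solr/oldcore) and uses an arbitrary entity name
    ("sep"). The url, query, rows and fl attributes are the ones documented
    for SolrEntityProcessor; everything else here is illustrative.

```python
import xml.etree.ElementTree as ET


def solr_entity_config(source_url, query="*:*", rows=500, fields=None):
    """Build a DataImportHandler data-config that reads from another Solr core."""
    root = ET.Element("dataConfig")
    document = ET.SubElement(root, "document")
    attrs = {
        "name": "sep",  # arbitrary entity name, chosen for this example
        "processor": "SolrEntityProcessor",
        "url": source_url,
        "query": query,
        "rows": str(rows),
    }
    if fields:
        # Restrict the import to an explicit field list (maps to the fl attribute)
        attrs["fl"] = ",".join(fields)
    ET.SubElement(document, "entity", attrs)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(solr_entity_config("http://localhost:8983/solr/oldcore"))
```

    The printed XML would be saved as the data-config file referenced by the
    dataimport request handler in the new core's solrconfig.xml, after which a
    full-import pulls the stored fields across.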
  • Jan Høydahl at Apr 23, 2013 at 1:47 pm
    Hi,

    I have done this many times. First, use a curl job or something similar to download the complete index as CSV:

    q=*:*&rows=9999999&wt=csv

    Then use post.jar to push that CSV into the new node.

    Alternatively, you can query with XML and use the XSLT update request handler with the parameter tr=updateXml.xsl, which is a stylesheet for indexing response XML directly.

    --
    Jan Høydahl, search solution architect
    Cominvent AS - www.cominvent.com
    Solr Training - www.solrtraining.com
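
    The CSV round trip described above could look roughly like this in
    Python instead of curl and post.jar. It is a sketch under assumptions:
    both core URLs are hypothetical, the target exposes the /update/csv
    handler used later in this thread, and the whole export fits in memory.

```python
import urllib.parse
import urllib.request


def csv_export_url(core_url, rows=9999999):
    """Build the /select URL that returns every document as CSV."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": rows, "wt": "csv"})
    return f"{core_url}/select?{params}"


def copy_index_as_csv(source_core_url, target_core_url):
    """Download the source index as CSV and post it to the target core."""
    with urllib.request.urlopen(csv_export_url(source_core_url)) as resp:
        csv_bytes = resp.read()
    request = urllib.request.Request(
        f"{target_core_url}/update/csv?commit=true",
        data=csv_bytes,  # supplying a body makes this a POST
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

    For example, copy_index_as_csv("http://localhost:8983/solr/oldcore",
    "http://localhost:8983/solr/newcore") would return the HTTP status of the
    update request. Only stored fields appear in the CSV, so the earlier
    caveat about stored="true" still applies.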

  • Kalyan Kuram at Apr 24, 2013 at 12:40 am
    Thanks for the help. I successfully exported the index as CSV and imported it into my local box. Now I have a different problem: I tried to re-index the content using post.sh, changing URL=http://dev-core-solr1:8983/solr/ZinioArticles/update/csv, and now I see the error below. Before this I deleted all documents and then tried to re-index.

    $ sh post.sh output1.csv
    Posting file output1.csv to http://dev-core-solr1.zinio.com:8983/solr/ZinioArticles/update/csv
    <?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int name="status">409</int><int name="QTime">19</int></lst><lst name="error"><str name="msg">version conflict for 100845239 expected=1432420345067864064 actual=-1</str><int name="code">409</int></lst></response>
    <?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst></response>

    Can somebody help me with this?
  • Jack Krupansky at Apr 24, 2013 at 12:54 am
    When you export, explicitly list only the fields that you normally specify
    when adding a document. So, exclude _version_, which Solr will add.

    -- Jack Krupansky
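
    The 409 arises because a positive _version_ value in an incoming document
    tells Solr to apply the update only if the existing document has exactly
    that version; after deleting everything, no such document exists
    (actual=-1). If the CSV has already been exported, one way to follow the
    advice above is to strip the column before posting. This is a sketch
    using Python's csv module, assuming the file's quoting is compatible with
    that module's defaults; the field names in the example are hypothetical.

```python
import csv
import io


def drop_columns(csv_text, unwanted=("_version_",)):
    """Return the CSV text with the named columns removed from every row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Remember the positions of the columns we want to keep
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()
```

    Alternatively, the column can be left out at export time by adding an
    explicit field list to the query, for example fl=id,title (substituting
    the real field names from the schema).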

