First, using Solr as a repository is pretty risky. I would keep the official copy of the data in a database, not in Solr.

Second, you can’t “migrate tables” because Solr doesn’t have tables. You need to turn the tables into documents, then index the documents. It can take a lot of joins to flatten a relational schema into Solr documents.
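To make the flattening step concrete, here is a minimal sketch in Python. The tables (customers, orders, order_items), the field names, and the in-memory SQLite database standing in for Oracle are all hypothetical; the point is only that several joined rows collapse into one flat document per entity, with child rows becoming a multi-valued field.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database (a hypothetical stand-in for the Oracle schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, placed_on TEXT);
        CREATE TABLE order_items (order_id INTEGER, sku TEXT);
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, '2016-06-01'), (11, 2, '2016-06-02');
        INSERT INTO order_items VALUES (10, 'A1'), (10, 'B2'), (11, 'C3');
        """
    )
    return conn


def flatten_orders(conn):
    """Join orders to customers and items, returning one flat document per order."""
    rows = conn.execute(
        """
        SELECT o.id, o.placed_on, c.name, i.sku
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        LEFT JOIN order_items i ON i.order_id = o.id
        ORDER BY o.id, i.sku
        """
    )
    docs = {}
    for order_id, placed_on, customer_name, sku in rows:
        # The first row for an order creates the document; later rows only add items.
        doc = docs.setdefault(order_id, {
            "id": f"order-{order_id}",
            "placed_on": placed_on,
            "customer_name": customer_name,  # denormalized from customers
            "item_skus": [],                 # would be a multi-valued field in Solr
        })
        if sku is not None:
            doc["item_skus"].append(sku)
    return list(docs.values())
```

Each returned dict is shaped like a document you could send to Solr for indexing; a real schema will likely need more joins than this.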

Solr does not support schema migration, so yes, you will need to save off all the documents, then reload them. I would save them to files. It makes no sense to put them in another copy of Solr.
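As a rough sketch of the "save them to files, then reload" idea, the Python below writes documents to a JSON Lines file (one document per line) and reads them back in batches. The function names and the batch size are my own choices for illustration. Fetching the documents from Solr and posting each batch back after the schema change are left out.

```python
import json
from itertools import islice


def dump_docs(docs, path):
    """Write each document as one JSON line; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_batches(path, batch_size=1000):
    """Yield lists of at most batch_size documents read back from the file."""
    with open(path, encoding="utf-8") as f:
        docs = (json.loads(line) for line in f if line.strip())
        while True:
            batch = list(islice(docs, batch_size))
            if not batch:
                return
            yield batch
```

Reading in batches keeps memory use bounded, which matters when a collection holds billions of documents; each batch would be sent to the collection with the new schema.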

Changing the schema will be difficult and time-consuming, but you’ll probably run into much worse problems trying to use Solr as a repository.

Walter Underwood
http://observer.wunderwood.org/ (my blog)

On Jun 9, 2016, at 8:50 AM, Hui Liu wrote:


We are porting an application currently hosted in Oracle 11g to Solr Cloud 6.x, i.e., we plan to migrate all tables in Oracle as collections in Solr, index them, and build search tools on top of this. The goal is that we won't be using Oracle at all after this has been implemented. Every field in Solr will have 'stored=true', and a selected subset of searchable fields will have 'indexed=true'.

The question is what steps we should follow if we need to re-index a collection after making some schema changes. Mostly we only add new fields to store, or change a non-indexed field to indexed; we normally do not delete or rename any existing fields.

According to this URL: https://wiki.apache.org/solr/HowToReindex it seems we need to set up an 'intermediate' Solr1 that only stores the data without any indexing, then set up another Solr2 to store the indexed data. In case of a re-index, we would just delete all the documents in Solr2 for the collection and re-import the data from Solr1 into Solr2 using SolrEntityProcessor (from the dataimport handler). Is this still the recommended approach?

The downside I can see is that if we have a tremendous amount of data for a collection (some of our collections could have several billion documents), re-importing it from Solr1 to Solr2 may take a few hours or even days, and during this time users cannot query the data. Is there any better way to do this and avoid this type of downtime? Any feedback is appreciated!

Hui Liu
Opentext, Inc.

Discussion Overview
group: solr-user @
posted: Jun 9, '16 at 3:51p
active: Jun 10, '16 at 8:15p
