On 3/24/2013 10:02 AM, Niran Fajemisin wrote:
> We import about 1.5 million documents on a nightly basis using DIH. During this time, we need to ensure that all documents make it into index otherwise rollback on any errors; which DIH takes care of for us. We also disable autoCommit in DIH but instruct it to commit at the very end of the import. This is all done through configuration of the DIH config XML file and the command issued to the request handler.
>
> We have noticed that the tlog file appears to linger around even after DIH has issued the hard commit. My expectation would be that after the hard commit has occurred, the tlog file will be removed. I'm obviously misunderstanding how this all works.

You've already gotten the reason for the giant tlog hanging around.
The way to actually fix this problem is to turn on autoCommit with
either maxDocs or maxTime set relatively low. The key to enabling
autoCommit without changing anything about how your import process
works is to set openSearcher to false in the autoCommit section.
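As a rough sketch, the autoCommit section of solrconfig.xml (inside the updateHandler element) could look something like the following. The two threshold numbers are illustrative assumptions, not recommendations; only openSearcher=false is the part that matters here.

```xml
<!-- Goes inside <updateHandler> in solrconfig.xml. -->
<autoCommit>
  <!-- Assumed example value: hard commit after this many added documents. -->
  <maxDocs>25000</maxDocs>
  <!-- Assumed example value: hard commit after this many milliseconds. -->
  <maxTime>300000</maxTime>
  <!-- Do not open a new searcher, so document visibility does not change. -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

Whichever threshold is reached first triggers the hard commit, so the lower of the two effectively controls how large each tlog gets.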
I make maxDocs low rather than maxTime, but that's up to you. Each hard
commit done by autoCommit will create a new tlog, and each tlog will be
fairly small. Only a few of them will be kept around, so the disk space
requirement will be small, and restarting Solr will be fast because
there won't be a lot of data to replay.
With openSearcher set to false, there will be NO changes in document
visibility. Searches will continue using the old searcher, so the old
documents will still be there and the new documents will NOT be
searchable until DIH does its explicit commit at the end.
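The one thing that I'm not sure about is what happens if Solr or the
machine crashes in the middle of the import. Each autoCommit is a real
hard commit, so a rollback can presumably only undo changes made after
the most recent one, and a complete rollback might not be possible.
Someone with better knowledge may have to comment there.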