I think a better approach might be a specialized class,
ConcurrentBitSet, designed for many reads and few writes (or simply
have a query stop checking deletes once it has started).

The ConcurrentHashMap in the JDK could serve as a basis for the implementation.

Then, a SegmentDeletes that extends it (with IO functions) would
complete the story.
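A minimal sketch of the ConcurrentBitSet idea, assuming a fixed-size bit set backed by java.util.concurrent.atomic.AtomicLongArray (that choice, and the class shape, are my own illustration, not an existing Lucene class). Reads are single volatile loads with no lock, and writes use a compare-and-set retry loop so two concurrent deletes landing in the same 64-bit word are not lost. It borrows the non-blocking-read spirit of ConcurrentHashMap rather than its actual structure.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical read-mostly bit set: get() takes no lock,
 * set() retries a compare-and-set until its bit is published.
 */
final class ConcurrentBitSet {
    private final AtomicLongArray words;
    private final int size;

    ConcurrentBitSet(int size) {
        this.size = size;
        this.words = new AtomicLongArray((size + 63) >>> 6); // 64 bits per word
    }

    /** Lock-free read: a single volatile load of the containing word. */
    public boolean get(int index) {
        checkIndex(index);
        return (words.get(index >>> 6) & (1L << index)) != 0; // shift uses index & 63
    }

    /** Sets a bit; retries if another writer changed the same word meanwhile. */
    public void set(int index) {
        checkIndex(index);
        int w = index >>> 6;
        long mask = 1L << index;
        while (true) {
            long old = words.get(w);
            if ((old & mask) != 0) {
                return; // already set, nothing to publish
            }
            if (words.compareAndSet(w, old, old | mask)) {
                return;
            }
        }
    }

    /** Counts set bits; may be stale while writers are active. */
    public int cardinality() {
        int n = 0;
        for (int i = 0; i < words.length(); i++) {
            n += Long.bitCount(words.get(i));
        }
        return n;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Note that this only makes individual bit reads cheap and thread-safe; a query that must ignore deletes made after it starts would still need its own snapshot.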
On Jun 26, 2008, at 10:25 AM, Todd Feak (JIRA) wrote:

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:

I wanted to share my micro load test results with you, to make sure
you all understand the scale of the bottleneck as we are experiencing it.

For an optimized index with 4700+ documents (i.e. small), NOT query
throughput varies by a factor of 35 under heavy load. Using the 2.3.0
release I got 20 tps. With the suggested volatile/synchronized fix, I
got 700 tps. The limiting factor at 700 tps was the CPU on the machine
generating the load, so the real gap may be even larger. Furthermore,
the more documents that exist in the index, the worse this may get, as
it synchronizes on every single iteration through the loop.

An argument can be made that this is just a necessary evil, and
that we *must* synchronize on this for the possibility of updates
during reads. I have 2 questions regarding that.

1. What is the cost of a dirty read in that case? Is it stale data,
incorrect data, or a corrupted system?
2. What is more prevalent in a production system: indexes with no
deletes, indexes with *some* deletes, or indexes with frequent
deletes?
Do we need to have 1 class that does it all, or should we consider
2 different implementations for 2 different uses? What about a
read-only SegmentReader for read-only slaves in production environments?
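One way to picture the "2 implementations for 2 uses" question, as a sketch with hypothetical names (DeletedDocs, ReadOnlyDeletedDocs and SynchronizedDeletedDocs are illustrations, not Lucene classes). The read-only variant copies the deleted-docs bits once into a final field and never mutates them, so searchers can call isDeleted() with no locking; the writable variant keeps synchronization for readers that still accept deletes.

```java
import java.util.BitSet;

/** Hypothetical abstraction over a segment's deleted-document bits. */
interface DeletedDocs {
    boolean isDeleted(int doc);
    boolean hasDeletions();
}

/** Read-only variant: an immutable snapshot, so reads need no lock. */
final class ReadOnlyDeletedDocs implements DeletedDocs {
    // Final field + no mutation after construction = safe to share across threads.
    private final BitSet bits;

    ReadOnlyDeletedDocs(BitSet source) {
        this.bits = (BitSet) source.clone(); // copy taken once; later source changes are ignored
    }

    public boolean isDeleted(int doc) { return bits.get(doc); }

    public boolean hasDeletions() { return !bits.isEmpty(); }
}

/** Writable variant: keeps the locking that concurrent deletes require. */
final class SynchronizedDeletedDocs implements DeletedDocs {
    private final BitSet bits = new BitSet();

    public synchronized void delete(int doc) { bits.set(doc); }

    public synchronized boolean isDeleted(int doc) { return bits.get(doc); }

    public synchronized boolean hasDeletions() { return !bits.isEmpty(); }
}
```

The trade-off is staleness: the read-only snapshot never sees deletes made after it was built (the "stale data" case from question 1), so it would have to be rebuilt to pick up new deletes.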

Avoidable synchronization bottleneck in MatchAlldocsQuery

Key: LUCENE-1316
URL: https://issues.apache.org/jira/browse/
Project: Lucene - Java
Issue Type: Bug
Components: Query/Scoring
Affects Versions: 2.3
Environment: All
Reporter: Todd Feak
Priority: Minor
Attachments: MatchAllDocsQuery.java

Original Estimate: 1h
Remaining Estimate: 1h

The isDeleted() method on IndexReader has been mentioned a number
of times as a potential synchronization bottleneck. However, the
reason this bottleneck occurs is actually at a higher level that
wasn't focused on (at least in the threads I read).
In every case I saw where a stack trace was provided to show the
lock/block, higher in the stack you see the MatchAllScorer.next()
method. In Solr particularly, this scorer is used for "NOT"
queries. We saw incredibly poor performance (order of magnitude)
on our load tests for NOT queries, due to this bottleneck. The
problem is that every single document is run through this isDeleted()
method, which is synchronized. Having an optimized index
exacerbates this issue, as there is only a single SegmentReader
to synchronize on, causing a major thread pileup waiting for the
lock.
By simply having the MatchAllScorer check whether there have been any
deletions in the reader, much of this can be avoided. This is
especially true in a read-only production environment where slaves
do all the high-load searching.
I modified line 67 in MatchAllDocsQuery from
    if (!reader.isDeleted(id)) {
to
    if (!reader.hasDeletions() || !reader.isDeleted(id)) {
In our micro load test for NOT queries only, this was a major
performance improvement. We also got the same query results. I
don't believe this will improve the situation for indexes that
have deletions.
Please consider making this adjustment for a future bug fix release.
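The effect of the one-line change can be modelled with a small, self-contained stand-in for the reader (SimulatedReader and MatchAllLoop are hypothetical names, not Lucene's IndexReader or MatchAllScorer). As described in the issue, isDeleted() is synchronized; hasDeletions() here reads a volatile flag without locking, which is an assumption made for the sketch. The loop uses the same short-circuit condition as the patched line, and a counter records how often the synchronized method is entered.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a segment reader; not Lucene's IndexReader. */
final class SimulatedReader {
    private final BitSet deleted = new BitSet();
    private final int maxDoc;
    private volatile boolean hasDeletions = false; // lock-free flag (assumption)
    int lockedCalls = 0;                            // entries into synchronized isDeleted()

    SimulatedReader(int maxDoc) { this.maxDoc = maxDoc; }

    int maxDoc() { return maxDoc; }

    synchronized void deleteDocument(int doc) {
        deleted.set(doc);
        hasDeletions = true;
    }

    /** Synchronized per-document check: the source of the contention. */
    synchronized boolean isDeleted(int doc) {
        lockedCalls++;
        return deleted.get(doc);
    }

    /** Lock-free check, called on every iteration by the patched condition. */
    boolean hasDeletions() { return hasDeletions; }
}

final class MatchAllLoop {
    /** Counts live documents using the proposed short-circuit condition. */
    static int countLive(SimulatedReader reader) {
        int live = 0;
        for (int id = 0; id < reader.maxDoc(); id++) {
            // isDeleted(), and therefore the lock, is only reached when deletions exist.
            if (!reader.hasDeletions() || !reader.isDeleted(id)) {
                live++;
            }
        }
        return live;
    }
}
```

On a reader with no deletions the lock is never taken; once a single document is deleted, every document goes through isDeleted() again, which matches the expectation that the change does not help indexes with deletions.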
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Discussion Overview
group: java-dev @
posted: Jun 25, '08 at 3:59p
active: Jun 28, '08 at 12:58a

2 users in discussion

Hoss Man (JIRA): 32 posts, Robert Engels: 1 post


