Hi all,

I'm experiencing a performance degradation after migrating to 2.9 and running
some tests. I'm running out of ideas, and any help identifying why 2.9 is slower
than 2.4 for us is highly appreciated.

We had some issues with custom sorting in Lucene 2.4.1. We worked around them
by sorting the result sets manually and caching the results after sorting
(memory-consuming, but fast).

I have now migrated to Lucene 2.9.0 RC4, built new FieldComparatorSource
implementations for sorting, and refactored all deprecated API calls to the new
Lucene 2.9 API.

Everything works fine from a functional perspective, but performance is
severely degraded with Lucene 2.9.

I profiled the application for a couple of hours, built a JMeter load test and
compared the following scenarios:

1. Lucene 2.9 - new API
2. Lucene 2.9 - old API and custom sorting after Lucene
3. Lucene 2.4.1 - old API and custom sorting after Lucene (what we had up to now)

Please find attached an RRD graph showing the results. The lighter the color,
the faster the request was served; y = # requests, x = time.

Most interestingly, simply switching the Lucene jars between 2.4 and 2.9
degraded response times and therefore throughput (see the results of test cases
2 and 3). Adapting to the new API decreased performance further. The difference
between test cases 1 and 2 is most probably due to the precached custom-sorted
results.

The application under test is a dedicated Lucene search engine that does
nothing but serve search requests. We're running a cluster of them in
production and it's incredibly fast: with the old implementation and production
traffic, more than 98% of requests are served within 200 ms.
The index under test contains about 3 million documents (with lots of fields),
consumes about 2.5 GB of disk space and is stored on a tmpfs RAM disk provided
by the Linux kernel.

The most interesting methods used for searching are:

getHitsCount (is there a way to speed this up?):

public int getHitsCount(String query, Filter filter) throws LuceneServiceException {
    log.debug("getHitsCount('{}, {}')", query, filter);
    if (StringUtils.isBlank(query)) {
        log.warn("getHitsCount: empty lucene query");
        return 0;
    }
    long startTimeMillis = System.currentTimeMillis();
    int count = 0;

    if (indexSearcher == null) {
        return 0;
    }

    BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
    Query q = null;
    try {
        q = createQuery(query);
        TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
        indexSearcher.search(q, filter, tsdc);
        count = tsdc.getTotalHits();
        log.info("getHitsCount: count = {}", count);
    } catch (ParseException ex) {
        throw new LuceneServiceException("invalid lucene query:" + query, ex);
    } catch (IOException e) {
        throw new LuceneServiceException(" indexSearcher could be corrupted", e);
    } finally {
        long durationMillis = System.currentTimeMillis() - startTimeMillis;
        if (durationMillis > slowQueryLimit) {
            log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
        }
        log.debug("getHitsCount: query took {} ms", durationMillis);
    }
    return count;
}
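
Regarding the "is there a way to speed this up?" question: since only the total
hit count is needed, a counting-only Collector avoids the scoring and
priority-queue work that TopScoreDocCollector does. A minimal sketch against the
2.9 Collector API (the HitCountCollector name is made up for this example;
whether it actually helps here would need measuring):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public final class HitCountCollector extends Collector {

    private int count;

    @Override
    public void setScorer(Scorer scorer) {
        // scores are never read when we only count hits
    }

    @Override
    public void collect(int doc) {
        count++;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        // no per-segment state is needed for plain counting
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // collection order is irrelevant for a count
    }

    public int getCount() {
        return count;
    }
}

getHitsCount would then pass an instance to indexSearcher.search(q, filter,
collector) and read getCount() instead of TopScoreDocCollector.getTotalHits().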

search:

public List<Document> search(String query, Filter filter, Sort sort, int from, int size)
        throws LuceneServiceException {
    log.debug("{} search('{}', {}, {}, {}, {})",
            new Object[] { indexAlias, query, filter, sort, from, size });
    long startTimeMillis = System.currentTimeMillis();

    List<Document> docs = new ArrayList<Document>();
    if (indexSearcher == null) {
        return docs;
    }
    Query q = null;
    try {
        if (query == null) {
            log.warn("search: lucene query is null...");
            return docs;
        }
        q = createQuery(query);
        BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
        if (size < 0 || size > maxNumHits) {
            // set hard limit for numHits
            size = maxNumHits;
            if (log.isDebugEnabled())
                log.debug("search: Size set to hardlimit: {} for query: {} with filter: {}",
                        new Object[] { size, query, filter });
        }
        TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
                true, false, false, true);
        indexSearcher.search(q, filter, collector);
        if (size > collector.getTotalHits())
            size = collector.getTotalHits();
        if (size > 100000)
            log.info("search: size: {} bigger than 100.000 for query: {} with filter: {}",
                    new Object[] { size, query, filter });
        TopDocs td = collector.topDocs(from, size);
        ScoreDoc[] scoreDocs = td.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            docs.add(indexSearcher.doc(scoreDoc.doc));
        }
    } catch (ParseException e) {
        log.warn("search: ParseException: {}", e.getMessage());
        if (log.isDebugEnabled())
            log.warn("search: ParseException: ", e);
        return Collections.emptyList();
    } catch (IOException e) {
        log.warn("search: IOException: ", e);
        return Collections.emptyList();
    } finally {
        long durationMillis = System.currentTimeMillis() - startTimeMillis;
        if (durationMillis > slowQueryLimit) {
            log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
                    new Object[] { durationMillis, query,
                            indexSearcher.getIndexReader().directory() });
        }
        log.debug("search: query took {} ms", durationMillis);
    }
    return docs;
}
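
For reference, a hypothetical call of the search method above (the field name,
query string and service variable are made up), fetching the second page of 20
results sorted by an integer field:

// from = 20, size = 20: skip the first page and return hits 21-40
Sort sort = new Sort(new SortField("releaseDate", SortField.INT));
List<Document> secondPage = luceneService.search("title:foo", null, sort, 20, 20);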

I'm wondering why others are seeing better performance with 2.9 while our
implementation's performance is getting worse. Maybe our way of using the 2.9
API is not the best, and sorting is definitely expensive.

Any ideas are appreciated. I've been tearing my hair out for hours and days
trying to find the root cause. Hints on how I could track down the bottlenecks
myself are also appreciated.

Cheers,
Thomas

--
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone: +49 211 97020-195
Fax: +49 211 97020-949
Mobile: +49 173 5146567 (private)
E-Mail: thomas.becker@net-m.de
Internet: http://www.net-m.de

Register court: Amtsgericht Düsseldorf, HRB 48022
Executive Board: Theodor Niehues (Chairman), Frank Hartmann,
Kai Markus Kulas, Dieter Plassmann
Chairman of the Supervisory Board: Dr. Michael Briem




  • Thomas Becker at Sep 15, 2009 at 1:31 pm
    Missed the attachment, sorry.

  • Thomas Becker at Sep 15, 2009 at 1:39 pm
    Urm and uploaded here:
    http://ankeschwarzer.de/tmp/graph.jpg

    Sorry.

  • Mark Miller at Sep 15, 2009 at 1:49 pm
    Hey Thomas - any chance you can do some quick profiling and grab the
    hotspots from the 3 configurations?

    Are your custom sorts doing anything tricky?

    --
    - Mark

    http://www.lucidimagination.com


  • Thomas Becker at Sep 15, 2009 at 2:05 pm
    Hey Mark,

    thanks for your reply. Will do; results will follow in a couple of minutes.

    Yes, the custom sorts are doing something tricky. :) I'll try to explain them in
    a few words and paste the code.

    But even without them, 2.9 is slower. Test cases 2 and 3 differ only in the
    Lucene jars.

    CustomFieldComparatorPrefix.java:

    There is a field containing, for example, release dates used for sorting. But a
    single document has different release dates for different countries, for
    example. They're prefixed and comma-separated in a single field.

    Here's the code:

    public final class CustomFieldComparatorPrefix extends FieldComparatorSource {

        private static final long serialVersionUID = 200907240001L;

        private final String prefix;

        public CustomFieldComparatorPrefix(String prefix) {
            this.prefix = prefix;
        }

        public FieldComparator newComparator(final String fieldname, final int numHits,
                int sortPos, boolean reversed) throws IOException {
            return new FieldComparator() {

                private int[] currentReaderValues;

                private int[] values = new int[numHits];

                private int bottom;

                public int compare(int slot1, int slot2) {
                    // TODO: there are sneaky non-branch ways to compute the -1/+1/0 sign.
                    // Cannot return values[slot1] - values[slot2] because that may overflow.
                    final int v1 = values[slot1];
                    final int v2 = values[slot2];
                    if (v1 > v2) {
                        return 1;
                    } else if (v1 < v2) {
                        return -1;
                    } else {
                        return 0;
                    }
                }

                public int compareBottom(int doc) throws IOException {
                    // Cannot return bottom - currentReaderValues[doc] because that may overflow.
                    final int v2 = currentReaderValues[doc];
                    if (bottom > v2) {
                        return 1;
                    } else if (bottom < v2) {
                        return -1;
                    } else {
                        return 0;
                    }
                }

                public void copy(int slot, int doc) throws IOException {
                    values[slot] = currentReaderValues[doc];
                }

                public void setBottom(int slot) {
                    this.bottom = values[slot];
                }

                public int sortType() {
                    return SortField.CUSTOM;
                }

                public Comparable<Integer> value(int slot) {
                    return new Integer(values[slot]);
                }

                @Override
                public void setNextReader(IndexReader reader, int docBase) throws IOException {
                    currentReaderValues = FieldCache.DEFAULT.getInts(reader, fieldname,
                            new PrefixedIntParser(prefix));
                }
            };
        }
    }
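
    The PrefixedIntParser used in setNextReader above is not shown in this thread.
    A purely hypothetical sketch of what it might look like, assuming the indexed
    terms are the individual prefixed entries (e.g. "10_1") and that terms without
    the wanted prefix map to the 999999 default used elsewhere in this code:

    import org.apache.lucene.search.FieldCache;

    // Hypothetical sketch only; the real PrefixedIntParser is not part of this post.
    public final class PrefixedIntParser implements FieldCache.IntParser {

        private static final int DEFAULT_VALUE = 999999; // "no value for this prefix"

        private final String prefix;

        public PrefixedIntParser(String prefix) {
            this.prefix = prefix;
        }

        public int parseInt(String term) {
            // terms are assumed to look like "<prefix><number>", e.g. "10_1"
            if (term.startsWith(prefix)) {
                return Integer.parseInt(term.substring(prefix.length()));
            }
            return DEFAULT_VALUE;
        }
    }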

    CustomFieldComparatorPosition.java:

    This works similarly to the one above, but the field holds different positions
    for different content groups, e.g. "10_1,11_5,14_1", where the prefix is the
    content group id and the digit after the underscore is the actual position that
    defines the document's place in the sort order.
    In theory this one could use the same comparator as above, but the app would
    run out of memory due to too many different content groups. Since only a few
    documents (~280,000) have positions set, I wrote my own Lucene cache
    implementation that stores only non-default values.

    Comparator source:

    public final class CustomFieldComparatorPosition extends FieldComparatorSource {

        private final Logger log = LoggerFactory.getLogger(this.getClass());

        private static final long serialVersionUID = 200907240001L;

        private final String prefix;

        private static PositionCache cache;

        private StopWatch sw = new StopWatch();

        public CustomFieldComparatorPosition(String prefix) {
            if (cache == null) {
                log.debug("CustomFieldComparatorPosition:initializing PositionCache");
                cache = new PositionCache();
            }
            this.prefix = prefix;
        }

        public FieldComparator newComparator(final String fieldname, final int numHits,
                int sortPos, boolean reversed) throws IOException {
            return new FieldComparator() {

                private Map<Integer, Integer> currentReaderValues;

                private int[] values = new int[numHits];

                private int bottom;

                public int compare(int slot1, int slot2) {
                    // TODO: there are sneaky non-branch ways to compute the -1/+1/0 sign.
                    // Cannot return values[slot1] - values[slot2] because that may overflow.
                    final int v1 = values[slot1];
                    final int v2 = values[slot2];
                    if (v1 > v2) {
                        return 1;
                    } else if (v1 < v2) {
                        return -1;
                    } else {
                        return 0;
                    }
                }

                public int compareBottom(int doc) throws IOException {
                    int i;
                    try {
                        i = currentReaderValues.get(doc);
                    } catch (NullPointerException e) {
                        i = 999999;
                    }
                    // Cannot return bottom - v2 because that may overflow.
                    final int v2 = i;
                    if (bottom > v2) {
                        return 1;
                    } else if (bottom < v2) {
                        return -1;
                    } else {
                        return 0;
                    }
                }

                public void copy(int slot, int doc) throws IOException {
                    // This will be executed n times where n is the amount of
                    // documents in the index.
                    int value;
                    try {
                        value = currentReaderValues.get(doc);
                    } catch (NullPointerException e) {
                        value = 999999;
                    }
                    values[slot] = value;
                }

                public void setBottom(int slot) {
                    this.bottom = values[slot];
                }

                public int sortType() {
                    return SortField.CUSTOM;
                }

                public Comparable<Integer> value(int slot) {
                    return new Integer(values[slot]);
                }

                @Override
                public void setNextReader(IndexReader reader, int docBase) throws IOException {
                    sw.start();
                    try {
                        currentReaderValues = cache.getInts(reader, fieldname,
                                new PrefixedIntParser(prefix));
                    } catch (InterruptedException e) {
                        throw new IllegalStateException(e.getCause());
                    }
                    sw.stop();
                    if (sw.getTime() > 3000) {
                        log.info("setNextReader: Slow: Time to get currentReaderValues from cache: {}ms. Items in cache: {}",
                                sw.getTime(), cache.getItemCount());
                    }
                }
            };
        }
    }
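
    For reference, a hypothetical way to combine these comparator sources into a
    Sort (the field names and prefixes below are made up for illustration):

    // country-specific release date first, then the content-group position
    Sort sort = new Sort(new SortField[] {
            new SortField("releaseDates", new CustomFieldComparatorPrefix("DE_")),
            new SortField("positions", new CustomFieldComparatorPosition("10_"))
    });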

    PositionCache.java:

    /**
     * This cache implementation caches the position values of items stored as
     * documents in Lucene. It holds a WeakHashMap keyed by an IndexReader
     * reference, so once the IndexReader reference goes away the cached entries
     * become eligible for GC. The inner cache is a Map whose key is field + parser
     * (contract-to-contentgroup prefix) and whose value is yet another map; that
     * last map contains docIds as keys and the position value for this prefix as
     * values.
     *
     * @author Thomas Becker (thomas.becker@net-m.de)
     */
    public class PositionCache {

        final Map<Object, Map<String, Future<Map<Integer, Integer>>>> readerCache =
                new WeakHashMap<Object, Map<String, Future<Map<Integer, Integer>>>>();

        AtomicInteger itemCount = new AtomicInteger(0); // only for debugging... only increasing

        public Map<Integer, Integer> getInts(final IndexReader reader, final String field,
                final IntParser parser) throws InterruptedException {

            String key = parser.toString();
            // Future<Map<Integer, Integer>> contains docId as key and position as value
            HashMap<String, Future<Map<Integer, Integer>>> innerCache;
            final Object readerKey = reader.getFieldCacheKey();
            synchronized (readerCache) {
                innerCache = (HashMap<String, Future<Map<Integer, Integer>>>) readerCache.get(readerKey);
                if (innerCache == null) {
                    innerCache = new HashMap<String, Future<Map<Integer, Integer>>>();
                    readerCache.put(readerKey, innerCache);
                }
            }
            Future<Map<Integer, Integer>> f = innerCache.get(key);
            if (f == null) {
                Callable<Map<Integer, Integer>> eval = new Callable<Map<Integer, Integer>>() {
                    public Map<Integer, Integer> call() throws InterruptedException, IOException {
                        HashMap<Integer, Integer> docPositions = new HashMap<Integer, Integer>();
                        TermDocs termDocs = reader.termDocs();
                        TermEnum termEnum = reader.terms(new Term(field));
                        try {
                            do {
                                Term term = termEnum.term();
                                if (term == null || term.field() != field)
                                    break;
                                int termval = parser.parseInt(term.text());
                                termDocs.seek(termEnum);
                                while (termDocs.next()) {
                                    // do not store defaults to save memory
                                    if (termval < 999999) {
                                        itemCount.incrementAndGet();
                                        docPositions.put(termDocs.doc(), termval);
                                    }
                                }
                            } while (termEnum.next());
                        } finally {
                            termDocs.close();
                            termEnum.close();
                        }
                        return docPositions;
                    }
                };
                FutureTask<Map<Integer, Integer>> ft = new FutureTask<Map<Integer, Integer>>(eval);
                f = innerCache.put(key, ft);
                if (f == null) {
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                innerCache.remove(key);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
            return null;
        }
    }
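
    Not part of the original post: the innerCache.get/put pair above runs outside
    the synchronized block, and HashMap.put replaces any Future another thread may
    have installed in the meantime (the replacement is then never run), so two
    threads warming the same key can misbehave. A minimal sketch of the usual
    ConcurrentHashMap/FutureTask memoizer idiom that makes the insertion atomic
    (the Memoizer class name is made up):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    import java.util.concurrent.FutureTask;

    public final class Memoizer<K, V> {

        private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<K, Future<V>>();

        public V get(K key, Callable<V> loader) throws InterruptedException, ExecutionException {
            Future<V> f = cache.get(key);
            if (f == null) {
                FutureTask<V> ft = new FutureTask<V>(loader);
                f = cache.putIfAbsent(key, ft); // atomic: returns an existing Future, if any
                if (f == null) {
                    f = ft;
                    ft.run(); // compute once; concurrent callers block in get()
                }
            }
            return f.get();
        }
    }

    The per-reader WeakHashMap handling would stay as it is; only the inner map
    type and the put call would change.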


    Cheers,
    Thomas


  • Mark Miller at Sep 15, 2009 at 2:13 pm

    Thanks, awesome.

    Also, how many segments (approx) are in your index? If there are a lot,
    have you/can you try the same tests on an optimized index? Don't want to
    get ahead of the profiling results, but just to continue the info loop.

  • Thomas Becker at Sep 15, 2009 at 2:44 pm
    Here are the results of profiling 10 different search requests:

    http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
    http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
    http://ankeschwarzer.de/tmp/lucene_29_newapi.png

    But you already gave me a good hint. The index being used is an old one built
    with Lucene 2.4. I will now try a freshly built 2.9 index and see if performance
    improves. Maybe that already solves the issue... stupid me...
    We're updating the index every 30 minutes at the moment and it gets optimized
    after each update.


  • Mark Miller at Sep 15, 2009 at 2:53 pm

    Thomas Becker wrote:
    Maybe that already solves the issue... stupid me...

    That shouldn't be an issue unless there is some odd bug.

    Thomas Becker wrote:
    We're updating the index every 30 minutes at the moment and it gets optimized
    after each update.

    So this profiling is on an optimized index (i.e. a single segment)?
    That would be odd indeed, and would possibly point to some of the scoring
    changes.
  • Mark Miller at Sep 15, 2009 at 3:04 pm
    A few quick notes -

    Lucene 2.9 with the old API doesn't appear much worse than Lucene 2.4?

    You save a lot with the new intern implementation, because that's not a hotspot
    anymore. But then RandomAccessFile seeks end up being a lot more of the pie.
    They look fairly similar in speed overall?

    It looks like the major bottleneck with the 2.9 new API is that it's using
    NIOFSDirectory (you're on Unix I guess, and it now defaults to that on
    non-Windows OSes), and that appears to be a real killer for you. It's taking
    half of the time just for its reads. ???

    No conclusions yet, but I'm looking it over. Some other guys will come in with
    some ideas as well.

    Do confirm that those profiling results are on a single segment though.

    - Mark


  • Thomas Becker at Sep 15, 2009 at 3:24 pm
    Hey Mark,

    Yes, I'm running the app on Unix. You can see the difference between 2.9 and 2.4
    here:

    http://ankeschwarzer.de/tmp/graph.jpg

    2.4 responds much quicker, thus increasing throughput severely. I have only a
    single segment:

    -rw-r--r-- 1 asuser asgroup 20 Sep 9 16:40 segments.gen
    -rw-r--r-- 1 asuser asgroup 294 Sep 9 16:44 segments_1o
    -rw-r--r-- 1 asuser asgroup 2810603184 Sep 9 16:44 _7c.cfs

    The FileChannel.read hotspot is indeed strange, especially taking into account
    that the index lives on a tmpfs (in memory). And 2.4 should be using
    NIOFSDirectory as well?! Will check that.

    Thanks a lot for your support!

    Cheers,
    Thomas

  • Mark Miller at Sep 15, 2009 at 3:30 pm

    Thomas Becker wrote:
    You can see the difference between 2.9 and 2.4 here:
    http://ankeschwarzer.de/tmp/graph.jpg

    Right - I know your measurements showed a difference (and will keep that in
    mind), but the profiling results then seem oddly similar.

    Thomas Becker wrote:
    And 2.4 should be using NIOFSDirectory as well?! Will check that.

    2.4 did not use NIOFSDirectory by default - you would have had to specify it
    manually. Now it's used by default when a non-Windows OS is detected.

    Any particular reason your profiling output does not show invocations? It's not
    essential most likely, but I've found it to be helpful in comparisons.

    We are about to release 2.9, and it's been such a long haul that I'd hate to see
    a release go out with an unknown performance trap.

  • Mark Miller at Sep 15, 2009 at 3:44 pm
    Can you run the following test on your RAMDISK?

    http://people.apache.org/~markrmiller/FileReadTest.java

    I've taken it from the following issue (in which NIOFSDirectory was
    developed):
    https://issues.apache.org/jira/browse/LUCENE-753

  • Thomas Becker at Sep 15, 2009 at 3:59 pm
    Will do, tomorrow.

  • Thomas Becker at Sep 16, 2009 at 3:06 pm
    Tests run on tmpfs:

    config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=18920301
    answer=850258539, ms=8090, MB/sec=935.4907787391842
    config: impl=ChannelFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=18920301
    answer=850258903, ms=39444, MB/sec=191.8700030422878
    config: impl=ChannelPread serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=18920301
    answer=850258903, ms=8504, MB/sec=889.9483066792098
    config: impl=PooledPread serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=18920301
    answer=850258903, ms=9585, MB/sec=789.5795931142409

    I ran some tests now with SimpleFSDirectory and MMapDirectory. Both are faster
    than NIOFSDirectory and the response times improved, but it's still slower than
    2.4.
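
    For anyone following along, this is roughly how a Directory implementation is
    chosen explicitly in 2.9 instead of relying on the FSDirectory.open default
    (the index path and class name below are made up):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.SimpleFSDirectory;

    public class OpenSearcherExample {
        public static IndexSearcher open(File indexDir) throws Exception {
            // pick the implementation explicitly; FSDirectory.open(indexDir) would
            // choose NIOFSDirectory on non-Windows platforms in 2.9
            Directory dir = new SimpleFSDirectory(indexDir); // or: new MMapDirectory(indexDir)
            IndexReader reader = IndexReader.open(dir, true); // read-only reader
            return new IndexSearcher(reader);
        }
    }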

    I'll do some profiling now again and let you know the results.

    Thanks again for all the great support to all who've answered.


  • Mark Miller at Sep 16, 2009 at 4:23 pm
    bq. I'll do some profiling now again and let you know the results.

    Great - it will be interesting to see the results. My guess, based on the 2.9
    new API profiling, is that your queries may not be agreeing with some of the
    changes somehow. Along with the profiling, can you fill us in on the query types
    you are using as well (e.g. qualities)?

    And grab invocations if it's possible.

  • Thomas Becker at Sep 16, 2009 at 4:32 pm
    New Profiling sessions with invocation counts. A single lucene search request
    with huge resultset (169k items).

    Quite interesting results though and there's definetly something wrong with luc
    2.9 and the way I'm using it. But see yourself:

    http://ankeschwarzer.de/tmp/lucene_24_oldapi_singlereq.png
    http://ankeschwarzer.de/tmp/lucene_29_oldapi_singlereq.png
    and even worse:
    http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png

    I have to verify that the last one is not accidentally more than one request.
    I'll do the run again and then post the required info.


  • Uwe Schindler at Sep 16, 2009 at 4:49 pm

    The last figure shows that IndexSearcher.searchWithFilter was called twice,
    in contrast to the first figure, where IndexSearcher.search was called only
    once.

    Uwe


  • Uwe Schindler at Sep 16, 2009 at 4:55 pm

    I forgot: searchWithFilter is called per segment in 2.9. If it was only
    one search, you must have two segments, and therefore an unoptimized index,
    for this to be correct?
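
    A quick way to double-check the segment count from the application side is a
    snippet like this (just a sketch against the 2.9 IndexReader API, using the
    indexSearcher field from the code later in this thread):

    import org.apache.lucene.index.IndexReader;

    // getSequentialSubReaders() exposes the per-segment readers in 2.9;
    // it returns null when the reader is already atomic (a single segment).
    IndexReader top = indexSearcher.getIndexReader();
    IndexReader[] subs = top.getSequentialSubReaders();
    int segments = (subs == null) ? 1 : subs.length;
    System.out.println("optimized=" + top.isOptimized() + ", segments=" + segments);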

    Uwe


  • Thomas Becker at Sep 16, 2009 at 5:01 pm
    No, it's only a single segment. But there are two calls: one doing a
    getHitsCount first and the other doing the actual search. I'll paste both
    methods below in case someone's interested.

    I will dig into Lucene's sources tomorrow and compare the 2.4 search behaviour
    for my case with 2.9. It was about time to get more into the lucene-core
    sources anyhow. :)

    See you tomorrow, guys, and thanks a lot again! It's a pleasure.

    public int getHitsCount(String query, Filter filter) throws LuceneServiceException {
        log.debug("getHitsCount('{}, {}')", query, filter);
        if (StringUtils.isBlank(query)) {
            log.warn("getHitsCount: empty lucene query");
            return 0;
        }
        long startTimeMillis = System.currentTimeMillis();
        int count = 0;

        if (indexSearcher == null) {
            return 0;
        }

        BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
        Query q = null;
        try {
            q = createQuery(query);
            TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
            indexSearcher.search(q, filter, tsdc);
            count = tsdc.getTotalHits();
            log.info("getHitsCount: count = {}", count);
        } catch (ParseException ex) {
            throw new LuceneServiceException("invalid lucene query:" + query, ex);
        } catch (IOException e) {
            throw new LuceneServiceException(" indexSearcher could be corrupted", e);
        } finally {
            long durationMillis = System.currentTimeMillis() - startTimeMillis;
            if (durationMillis > slowQueryLimit) {
                log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
            }
            log.debug("getHitsCount: query took {} ms", durationMillis);
        }
        return count;
    }

    public List<Document> search(String query, Filter filter, Sort sort, int from,
            int size) throws LuceneServiceException {
        log.debug("{} search('{}', {}, {}, {}, {})",
                new Object[] { indexAlias, query, filter, sort, from, size });
        long startTimeMillis = System.currentTimeMillis();

        List<Document> docs = new ArrayList<Document>();
        if (indexSearcher == null) {
            return docs;
        }
        Query q = null;
        try {
            if (query == null) {
                log.warn("search: lucene query is null...");
                return docs;
            }
            q = createQuery(query);
            BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
            if (size < 0 || size > maxNumHits) {
                // set hard limit for numHits
                size = maxNumHits;
                if (log.isDebugEnabled())
                    log.debug("search: Size set to hardlimit: {} for query: {} with filter: {}",
                            new Object[] { size, query, filter });
            }
            TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
                    true, false, false, true);
            indexSearcher.search(q, filter, collector);
            if (size > collector.getTotalHits())
                size = collector.getTotalHits();
            if (size > 100000)
                log.info("search: size: {} bigger than 100.000 for query: {} with filter: {}",
                        new Object[] { size, query, filter });
            TopDocs td = collector.topDocs(from, size);
            ScoreDoc[] scoreDocs = td.scoreDocs;
            for (ScoreDoc scoreDoc : scoreDocs) {
                docs.add(indexSearcher.doc(scoreDoc.doc));
            }
        } catch (ParseException e) {
            log.warn("search: ParseException: {}", e.getMessage());
            if (log.isDebugEnabled())
                log.warn("search: ParseException: ", e);
            return Collections.emptyList();
        } catch (IOException e) {
            log.warn("search: IOException: ", e);
            return Collections.emptyList();
        } finally {
            long durationMillis = System.currentTimeMillis() - startTimeMillis;
            if (durationMillis > slowQueryLimit) {
                log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
                        new Object[] { durationMillis, query,
                                indexSearcher.getIndexReader().directory() });
            }
            log.debug("search: query took {} ms", durationMillis);
        }
        return docs;
    }
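
    For the count-only call in getHitsCount, a collector that never touches scores
    or a priority queue can be cheaper than TopScoreDocCollector.create(1, true).
    A minimal sketch against the 2.9 Collector API (HitCountCollector is a made-up
    name, not something shipped with Lucene):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Counts matching documents without scoring or sorting anything.
    public class HitCountCollector extends Collector {
        private int count;

        @Override
        public void setScorer(Scorer scorer) {
            // scores are not needed for a pure count
        }

        @Override
        public void collect(int doc) {
            count++;
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) {
            // nothing to track per segment
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true; // order is irrelevant when only counting
        }

        public int getCount() {
            return count;
        }
    }

    It would be used as: HitCountCollector hc = new HitCountCollector();
    indexSearcher.search(q, filter, hc); count = hc.getCount();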



  • Mark Miller at Sep 16, 2009 at 5:04 pm
    Something is very odd about this if they both cover the same search and
    the environment for both is identical. Even if one search was done twice
    and we divide the numbers for the new API by 2 - it's still *very* odd.

    With 2.4, ScorerDocQueue.topDoc is called half a million times.
    With 2.9, it's called over 4 million times.

    Huh?






  • Mark Miller at Sep 16, 2009 at 5:14 pm
    Notice that while DisjunctionScorer.advance and
    DisjunctionScorer.advanceAfterCurrent appear to be called in 2.9,
    in 2.4 I am only seeing DisjunctionScorer.advanceAfterCurrent called.

    Can someone explain that?







  • Mark Miller at Sep 16, 2009 at 5:23 pm
    Never mind. I see advance wasn't around in 2.4. This is part of the
    DocIdSetIterator changes.

    Anyway - either these are just not comparable runs, or there is a major
    bug (which seems unlikely).

    Just to keep pointing out the obvious:

    2.4 calls doc 195,000 times
    2.9 calls docId 1.4 million times

    That just doesn't jibe.








  • Uwe Schindler at Sep 16, 2009 at 5:29 pm
    See my mail about CachingWrapperFilter and QueryWrapperFilter; I think
    it explains this behaviour (and Thomas ran some warming queries before).




  • Mark Miller at Sep 16, 2009 at 5:34 pm
    Yeah, thanks Uwe - I had read it quickly, but was just rereading and it
    was sinking in. I hadn't cross-correlated the issues yet.

    Makes perfect sense. Very nice catch.

    bq. Maybe we need some change to CachingWrapperFilter that caches the DocIdSets
    as before, but optionally would wrap it into an OpenBitSet, if it is not an
    instance of OBS.

    So - do we address this for 2.9? This is a nasty performance trap, is it
    not?

    We recommend that you use both of these classes together - if it can
    work like this, though...

    Seems like a bug to me.

    - mark






  • Uwe Schindler at Sep 16, 2009 at 5:38 pm
    I am currently preparing a patch for CachingWrapperFilter that has a boolean
    ctor parameter (useOpenBitSetCache) and the method getDocIdSet will then do
    what QueryWrapperFilter did before LUCENE-1427.

    I would not do this in QueryWrapperFilter like before, because it would
    slow down MultiTermQuery if no caching is used and the constantScore/Boolean
    rewrite mode is active.
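
    Roughly the idea, as a sketch only (this is not the actual patch;
    BitSetCachingFilter is a made-up name that simply caches a materialized
    OpenBitSet per reader, using classes that exist in 2.9):

    import java.io.IOException;
    import java.util.Map;
    import java.util.WeakHashMap;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;
    import org.apache.lucene.util.OpenBitSetDISI;

    // Caches the wrapped filter's result per reader as an OpenBitSet, so that
    // repeated searches iterate a plain bit set instead of re-driving the
    // wrapped filter (which is what QueryWrapperFilter effectively provided
    // before LUCENE-1427).
    public class BitSetCachingFilter extends Filter {
        private final Filter wrapped;
        private final Map<IndexReader, DocIdSet> cache =
            new WeakHashMap<IndexReader, DocIdSet>();

        public BitSetCachingFilter(Filter wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public synchronized DocIdSet getDocIdSet(IndexReader reader) throws IOException {
            DocIdSet cached = cache.get(reader);
            if (cached != null) {
                return cached;
            }
            DocIdSet docIdSet = wrapped.getDocIdSet(reader);
            OpenBitSet bits;
            if (docIdSet == null) {
                // defensive: treat a missing DocIdSet as "matches nothing"
                bits = new OpenBitSet(reader.maxDoc());
            } else {
                // materialize the iterator into a bit set, one bit per document
                bits = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
            }
            cache.put(reader, bits);
            return bits;
        }
    }

    The trade-off is memory: roughly one bit per document per cached reader, in
    exchange for cheap iteration on every subsequent search.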




  • Mark Miller at Sep 16, 2009 at 4:55 pm
    Ah - that explains a bit. Though if you divide by 2, the new one still
    appears to overcall each method
    in comparison to 2.4.

    - Mark




  • Thomas Becker at Sep 16, 2009 at 4:58 pm
    This might be due to a change in my getHitsCount method. I will dig into that by tomorrow.

    I'm really sorry, but I have to leave now; otherwise I'll have other issues.


  • Mark Miller at Sep 16, 2009 at 4:54 pm
    Very interesting. Something can't be going right here. You are searching
    against a single segment, yet, just for example, while before
    DisjunctionSumScorer.advanceAfterCurrent was being called 154,000 times,
    now it's being called 1.3 million times. Other scoring methods have
    similar crazy jumps.

    Guys that worked on that stuff - any ideas on what's up here?

    They both appear to be using BooleanScorer2.






  • Uwe Schindler at Sep 16, 2009 at 4:34 pm
    How should we proceed? Stop the final artifact build and voting, or proceed
    with the release of 2.9? We have waited so long, and for most people it is
    faster, not slower!




  • Thomas Becker at Sep 16, 2009 at 4:39 pm
    I suggest we find the root cause and then decide about the release. Tomorrow I
    will spend the whole working day on the issue if no prio-1 pops up.

    Sadly, I have to leave early today, since I'm moving to a new flat... :(

    --
    Thomas Becker
    Senior JEE Developer

    net mobile AG
    Zollhof 17
    40221 Düsseldorf
    GERMANY

    Phone: +49 211 97020-195
    Fax: +49 211 97020-949
    Mobile: +49 173 5146567 (private)
    E-Mail: mailto:thomas.becker@net-m.de
    Internet: http://www.net-m.de

    Registergericht: Amtsgericht Düsseldorf, HRB 48022
    Vorstand: Theodor Niehues (Vorsitzender), Frank Hartmann,
    Kai Markus Kulas, Dieter Plassmann
    Vorsitzender des
    Aufsichtsrates: Dr. Michael Briem

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark Miller at Sep 16, 2009 at 4:45 pm
    I agree - best to find the cause before making a decision. There are
    enough smart people in the wings, I can't imagine this should take us
    that long. We have solved a good chunk of it already, and have only just
    begun chunk two.

    --
    - Mark

    http://www.lucidimagination.com



    Thomas Becker wrote:
    I suggest to find the root cause and then decide about the release. Tomorrow I
    will spent the whole working day on the issue if no prio1 pops up.

    Sadly I've to leave early today, since I'm moving to a new flat... :(

  • Yonik Seeley at Sep 16, 2009 at 4:44 pm

    On Wed, Sep 16, 2009 at 12:33 PM, Uwe Schindler wrote:
    How should we proceed? Stop the final artifact build and voting or proceed
    with the release of 2.9? We waited so long and for most people it is faster
    than slower!
I think we know that 2.9 will not be faster for everyone:
- Per-segment searching and the new comparators are a general win,
but will be slower for some people.
- Query parsing and small document indexing will be somewhat slower
due to the new token APIs (the workarounds for back compatibility) if
token streams aren't reused.

    I don't see any indication of any bugs in Lucene in this thread either.

    -Yonik
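For reference, a minimal sketch of the token-stream reuse Yonik mentions above, using the Lucene 2.9 attribute-based token API together with Analyzer.reusableTokenStream(); the field name, sample text, and class name are only illustrative:

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamReuseSketch {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        // reusableTokenStream() hands back a per-thread reusable stream instead of
        // creating a new TokenStream (plus back-compat wrappers) on every call.
        TokenStream stream = analyzer.reusableTokenStream("body",
                new StringReader("lucene 2.9 token stream reuse"));
        // New-style consumption: ask for the attribute once, then incrementToken().
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        while (stream.incrementToken()) {
            System.out.println(term.term());
        }
        stream.end();
    }
}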

  • Thomas Becker at Sep 16, 2009 at 4:56 pm
    So here's a debug message showing the query:

2009-09-16 18:53:59,642 [DEBUG] [http-8440-2] [] [2144122] []
service.impl.LuceneBaseService: items search('viewable:(FINDALL 0 1 2 )',
BooleanFilter( +CachingWrapperFilter(QueryWrapperFilter(+issalesallowed:true))
+CachingWrapperFilter(QueryWrapperFilter(licenseGroupKeys:lablxrox))
+CachingWrapperFilter(QueryWrapperFilter(+(viewable:1 viewable:2)))
+CachingWrapperFilter(QueryWrapperFilter(+contentRepositoryIds:146))
+CachingWrapperFilter(QueryWrapperFilter(+contentGroupIds:14554))
-CachingWrapperFilter(QueryWrapperFilter(digitalReleaseDate:{lablxrox20090916 TO lablxrox99999999}
digitalDeleteDate:[lablxrox19000101 TO lablxrox20090916]))
-CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:all))
-CachingWrapperFilter(QueryWrapperFilter(+excludedContractIds:2144122))),
<custom:"position": null>, 0, 16)

By the way, would it make sense to move this into a JIRA issue? It is already getting quite
tough to find the right information since this thread is growing so fast.

    Mark Miller wrote:
    bq. I'll do some profiling now again and let you know the results.

    Great - it will be interesting to see the results. My guess, based on
    the 2.9 new api profiling, is that your queries may not be agreeing with
    some of the changes somehow. Along with the profiling, can you fill us
    in on the query types you are using as well? (eg qualities)

    And grab invocations if its possible.
  • Uwe Schindler at Sep 16, 2009 at 5:10 pm
I found one thing in your debug output:

You are using a lot of CachingWrapperFilters around QueryWrapperFilter.
According to http://issues.apache.org/jira/browse/LUCENE-1427,
QueryWrapperFilter does not copy the scorer's doc ids into an OpenBitSet; it
instead returns the scorer itself as the DocIdSet (which is perfectly legal and
very good, if no caching is involved). The CachingWrapperFilter just caches
this DocIdSet, but the result is that the scorer's next/advance methods are
executed even when the filter has already been cached.

Maybe we need a change to CachingWrapperFilter that caches the DocIdSet
as before, but optionally wraps it in an OpenBitSet if it is not
already an OpenBitSet instance.

    Uwe
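For illustration, a minimal sketch of the kind of change Uwe describes: a caching wrapper that materializes the wrapped filter's DocIdSet into an OpenBitSetDISI so that cache hits never re-run the underlying scorer. The class name is made up here, and this is not the LUCENE-1911 patch itself:

import java.io.IOException;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSetDISI;

public class BitSetCachingWrapperFilter extends Filter {
    private final Filter wrapped;
    private final Map<IndexReader, DocIdSet> cache =
            new WeakHashMap<IndexReader, DocIdSet>();

    public BitSetCachingWrapperFilter(Filter wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        synchronized (cache) {
            DocIdSet cached = cache.get(reader);
            if (cached != null) {
                return cached;   // cache hit: a plain bit set, no scorer involved
            }
        }
        // Materialize the (possibly lazy, scorer-backed) DocIdSet into a bit set once.
        DocIdSet docIdSet = wrapped.getDocIdSet(reader);
        OpenBitSetDISI bits = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
        synchronized (cache) {
            cache.put(reader, bits);
        }
        return bits;
    }
}

The trade-off is a one-time copy of the doc ids per reader in exchange for cheap repeated reads, which is essentially what the proposal above amounts to.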

  • Uwe Schindler at Sep 16, 2009 at 6:09 pm
    Hi Thomas,

    I think we found the root of the problem. We opened
    https://issues.apache.org/jira/browse/LUCENE-1911 .

Could you please try the attached patch and check whether it solves your problem? It has
to do with how CachingWrapperFilter and QueryWrapperFilter work together,
which changed in 2.9.

  • Uwe Schindler at Sep 15, 2009 at 3:51 pm
In 2.9, NIOFS is only used if you use FSDirectory.open() instead of the
deprecated FSDirectory.getDirectory(). Instead of FSDirectory.open(), can you
try the direct constructors of SimpleFSDirectory vs. NIOFSDirectory vs.
MMapDirectory and compare?
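A minimal sketch of what that comparison could look like, assuming Lucene 2.9; the index path, the readOnly flag, and the class name are illustrative, and each run of the load test would swap in one of the three Directory implementations:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.SimpleFSDirectory;

public class DirectoryComparison {
    public static void main(String[] args) throws IOException {
        File indexDir = new File("/ramdisk/index");   // illustrative path

        // Instead of FSDirectory.open(indexDir), pick the implementation explicitly:
        Directory simple = new SimpleFSDirectory(indexDir);
        Directory nio = new NIOFSDirectory(indexDir);
        Directory mmap = new MMapDirectory(indexDir);

        // Run the same load test against each one in turn, e.g.:
        IndexSearcher searcher = new IndexSearcher(simple, true);  // readOnly
        try {
            // ... execute the warm-up and JMeter queries here and record timings ...
        } finally {
            searcher.close();
        }
    }
}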

  • Thomas Becker at Sep 15, 2009 at 4:00 pm
    Hi Uwe,

    already done. See my last message.

    Cheers,
    Thomas

  • Thomas Becker at Sep 15, 2009 at 3:59 pm

    Mark Miller wrote:
    Thomas Becker wrote:
    Hey Mark,

    yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 here:

    http://ankeschwarzer.de/tmp/graph.jpg
    Right - I know your measurements showed a difference (and will keep that
    in mind) - but the profiling results then seem
    oddly similar.
    Ok, got your point and agree.
    2.4 responds much quicker thus increasing throughput severly. I'm having a
    single segment only:

    -rw-r--r-- 1 asuser asgroup 20 Sep 9 16:40 segments.gen
    -rw-r--r-- 1 asuser asgroup 294 Sep 9 16:44 segments_1o
    -rw-r--r-- 1 asuser asgroup 2810603184 Sep 9 16:44 _7c.cfs

    The FileChannel.read hotspot is indeed strange. Especially if taking into
    account that the index is lying on a tmpfs (in memory). And 2.4 should be using
    NIOFSDirectory as well?! Will check that.
    2.4 did not use NIOFSDirectory by default - you would have had to
    manually specified it. Now its used by default if its detects a non
    Windows OS.
Yes, I've checked that in the sources. I'm building the app right now and will
try it with SimpleFSDirectory; FSDirectory.open is replaced with new
SimpleFSDirectory. I will collect and post the results tomorrow. It seems to be a tad
faster. Details and new profiling data tomorrow. Have to leave now, sorry.
    Any particular reason your profiling output does not show invocations?
    Its not essential most likely, but I've found it to be helpful in
    comparisons.
    Yes, I reduced the profiler's impact on performance. It got horribly slow with
    my usual profiling settings as soon as I included the org.apache.lucene files.
    We are about to release 2.9, and its been such a long haul, I'd hate to
    see a release with an unknown performance trap.
    And me too. Doing my best to find the issue. :)
  • Uwe Schindler at Sep 15, 2009 at 4:37 pm
Maybe Linux has some problems with NIO on tmpfs/other ramdisks. Which Linux
do you use, a 64-bit or 32-bit JVM and kernel, and which RAM fs type?

If you are on 64 bit and your index is stored in Linux tmpfs (not the old RAM
fs), the fastest would be MMapDirectory, as the tmpfs RAM can be used directly
when mapped into the JVM address space.
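A small sketch of that suggestion: explicitly choosing MMapDirectory when running on a 64-bit JVM and falling back otherwise. The os.arch check, the fallback choice, the path, and the class name are illustrative assumptions, not part of the Lucene API:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class DirectoryChooser {
    public static FSDirectory openIndexDir(File indexDir) throws IOException {
        // On a 64-bit JVM the index can be mapped entirely into the address
        // space, so tmpfs pages are read without an extra copy.
        String arch = System.getProperty("os.arch", "");
        if (arch.contains("64")) {
            return new MMapDirectory(indexDir);
        }
        // 32-bit fallback; NIOFSDirectory may behave poorly on ramdisks,
        // so SimpleFSDirectory could also be tried here.
        return new NIOFSDirectory(indexDir);
    }

    public static void main(String[] args) throws IOException {
        FSDirectory dir = openIndexDir(new File("/ramdisk/index"));  // illustrative path
        System.out.println("Using " + dir.getClass().getSimpleName());
    }
}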

  • Mark Miller at Sep 15, 2009 at 4:47 pm
Indeed - I just ran the FileReadTest on a Linux tmpfs ramdisk - with
SeparateFile all 4 of my cores are immediately pinned and remain so.
With ChannelFile, all 4 cores hover around 20-30%.

It would appear it may not be a good idea to use NIOFSDirectory on ramdisks.

Still, though, it looks like you have a further issue: your Lucene
2.9 old-api results don't use NIOFSDirectory, and they are still not good.

    --
    - Mark

    http://www.lucidimagination.com


  • Mark Miller at Sep 15, 2009 at 5:17 pm

    The quick results:

    ramdisk: sudo mount -t tmpfs tmpfs /tmp/space -o
    size=1G,nr_inodes=200k,mode=01777

    config: impl=SeparateFile serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295611, ms=173550, MB/sec=1683.7899579371938

    config: impl=ChannelFile serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295361, ms=1377768, MB/sec=212.09793463050383


    --
    - Mark

    http://www.lucidimagination.com




  • Uwe Schindler at Sep 15, 2009 at 5:27 pm
    How does a conventional file system compare?

  • Mark Miller at Sep 15, 2009 at 6:20 pm
    The results:

    config: impl=SeparateFile serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295611, ms=173550, MB/sec=1683.7899579371938

    config: impl=ChannelFile serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295361, ms=1377768, MB/sec=212.09793463050383

    config: impl=ChannelPread serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295361, ms=632253, MB/sec=462.19115955163517

    config: impl=PooledPread serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295361, ms=774664, MB/sec=377.2238637654518

    ClassicFile was heading for the same fate as ChannelFile.


I'll have to check what it's like on the file system - but it appears
just ridiculously slower. Even with SeparateFile, all 4 cores are bouncing
from 0-12% independently, really favoring the low end of that.
ChannelPread appears no better.

There are results from other OSes/setups in the JIRA issue.

    I'm using ext4.

  • Yonik Seeley at Sep 15, 2009 at 7:00 pm
    It's been a while since I wrote that benchmarker... is it OK that the
    answer is different? Did you use the same test file?

    -Yonik
    http://www.lucidimagination.com


  • Michael McCandless at Sep 15, 2009 at 7:11 pm
    I don't like that the answer is different... but it's really really
    odd that it's different-yet-almost-the-same.

    Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
    (Because the top 2 entries of your 4 results match the first set of 2
    entries you sent... so I'm thinking these 4 were actually tmpfs not
    ext4).

What JRE, OS, Linux kernel version, and hardware are you running on?

    The gains of SeparateFile over all else are stunning. And, quite
    different from the linux tests I had run under LUCENE-753. Maybe we
    need to revert FSDir.open to return SimpleFSDir again, on non-Windows
    hosts. But then we don't have good concurrency...

    Mike

  • Mark Miller at Sep 15, 2009 at 7:18 pm

    Michael McCandless wrote:
    I don't like that the answer is different... but it's really really
    odd that it's different-yet-almost-the-same.

    Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
    (Because the top 2 entries of your 4 results match the first set of 2
    entries you sent... so I'm thinking these 4 were actually tmpfs not
    ext4).
Those 4 were tmpfs - I mentioned ext4 at the end because I had just given
a feel for the hard drive tests and wanted to note that those were from ext4 - the
results above are definitely ramdisk though.
    What JRE/OS, linux, kernel versions, and hardware, are you running on?
    These are on:
    Ubuntu Karmic Koala 9.10, currently updated
    java-1.5.0-sun-1.5.0.20
    2.6.31-10-generic

    RAM: 3.9 Gig
    4 core Intel Core2 duo 2.0GHz

    Slow 5200 rpm laptop drives.
    The gains of SeparateFile over all else are stunning. And, quite
    different from the linux tests I had run under LUCENE-753. Maybe we
    need to revert FSDir.open to return SimpleFSDir again, on non-Windows
    hosts. But then we don't have good concurrency...

    Mike


    --
    - Mark

    http://www.lucidimagination.com




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark Miller at Sep 15, 2009 at 7:25 pm
    I just realized I hadn't sent this one. Here are results from the hard drive:

    It looks like it's closer to the same speed on the hard drive once
    everything is loaded in the system cache (as you'd expect). SeparateFile
    was about 1200 MB/sec vs almost 1700 MB/sec on the RAMDISK. ChannelPread
    looked a lot closer, though.


    - Mark

    Tests from hard disk (filesystem cache warmed):

    config: impl=SeparateFile serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282293977, ms=238230, MB/sec=1226.6370616630988

    config: impl=ChannelPread serial=false nThreads=4 iterations=100
    bufsize=1024 poolsize=2 filelen=730554368
    answer=-282295361, ms=766340, MB/sec=381.3212767179059
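
    For anyone reading these config lines: the MB/sec figure appears to be the
    total bytes read by all threads divided by wall-clock time, using decimal
    megabytes. A quick check in Java with the numbers from the SeparateFile
    hard-disk run above:

    long filelen = 730554368L;   // bytes per pass over the test file
    int iterations = 100;
    int nThreads = 4;
    long ms = 238230;            // SeparateFile, warmed hard disk
    double mbPerSec = (double) filelen * iterations * nThreads / (ms * 1000.0);
    System.out.println(mbPerSec); // ~1226.6, matching the reported figure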


    Mark Miller wrote:
    Michael McCandless wrote:
    I don't like that the answer is different... but it's really really
    odd that it's different-yet-almost-the-same.

    Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
    (Because the top 2 entries of your 4 results match the first set of 2
    entries you sent... so I'm thinking these 4 were actually tmpfs not
    ext4).
    Those 4 were tmpfs - I mention ext4 at the end because I had just given
    a feel for the hard drive tests and wanted to note those were from ext4 -
    the results above are definitely ramdisk though.
    What JRE/OS, Linux kernel version, and hardware are you running on?
    These are on:
    Ubuntu Karmic Koala 9.10, currently updated
    Java: java-1.5.0-sun-1.5.0.20
    Kernel: 2.6.31-10-generic
    RAM: 3.9 GB
    CPU: 4-core Intel Core 2 Duo, 2.0GHz
    Slow 5200 rpm laptop drives.
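
    For reporting runs like this, the JVM can print most of the environment
    itself; a tiny sketch using only standard system properties:

    System.out.println("java: " + System.getProperty("java.vendor")
        + " " + System.getProperty("java.version"));
    System.out.println("os: " + System.getProperty("os.name")
        + " " + System.getProperty("os.version")
        + " (" + System.getProperty("os.arch") + ")");
    System.out.println("cores: " + Runtime.getRuntime().availableProcessors());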

  • Yonik Seeley at Sep 15, 2009 at 7:33 pm
    Remember to disable CPU frequency scaling when benchmarking... some
    things with I/O cause the frequency to drop, and once the load is CPU-bound
    again it takes a while for Linux to scale the frequency back up.

    For example, on my Ubuntu box, ChannelFile went from 100 MB/sec to
    388 MB/sec after disabling scaling. This effect probably won't be uniform
    across implementations either.

    -Yonik
    http://www.lucidimagination.com
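
    A small, Linux-specific sketch of the kind of check this advice suggests
    running before a benchmark; the sysfs path is the standard cpufreq location
    and the method itself is only illustrative:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    void warnIfFrequencyScalingEnabled() {
        String path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
        try {
            BufferedReader reader = new BufferedReader(new FileReader(path));
            String governor = reader.readLine();
            reader.close();
            if (governor != null && !"performance".equals(governor.trim())) {
                System.err.println("CPU governor is '" + governor.trim()
                    + "'; switch it to 'performance' before benchmarking.");
            }
        } catch (IOException e) {
            // no cpufreq interface exposed (non-Linux or no scaling) - nothing to check
        }
    }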
  • Uwe Schindler at Sep 15, 2009 at 7:45 pm
    Now I am completely disturbed. Which numbers come from which filesystem?
    Ext4 on HDD, tmpfs (which is a filesystem of its own), ext3 on HDD,...

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Mark Miller at Sep 15, 2009 at 8:10 pm
    "Disturbed" reminds me of the owl from The Sword in the Stone ;)

    That's a great one-liner - "now I am completely disturbed."

    Sorry - I've been known to do that -

    The two results that I explicitly said are from the hard disk are indeed
    from the hard disk and are ext4. They are a tad slower than the
    ramdisk results.

    Everything else is from the ramdisk - and I wasn't sure whether that's ext4
    or ext3 (my mind hasn't wandered towards finding the answer yet); I didn't
    specify when creating it.

    - Mark

  • Yonik Seeley at Sep 15, 2009 at 7:50 pm
    Here are my results on my quad-core Phenom, with ondemand CPU frequency
    scaling disabled (clocks locked at 3GHz).

    Ubuntu 9.04, filesystem=ext4 on a 7200RPM IDE drive, testfile=95MB, fully cached.

    Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
    2009 x86_64 GNU/Linux
    Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
    Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)


    config: impl=ClassicFile serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427971, ms=15610, MB/sec=489.99482383087764

    config: impl=SeparateFile serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427672, ms=4115, MB/sec=1858.7652976913728

    config: impl=PooledPread serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427971, ms=6352, MB/sec=1204.15919395466

    config: impl=ChannelFile serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427971, ms=20347, MB/sec=375.91876935174713

    config: impl=ChannelPread serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427971, ms=5189, MB/sec=1474.0449412218154

    config: impl=ChannelTransfer serial=false nThreads=4 iterations=20
    bufsize=1024 poolsize=2 filelen=95610240
    answer=1165427971, ms=14794, MB/sec=517.021711504664

    -Yonik
    http://www.lucidimagination.com
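
    The impl= variants above correspond, as far as I can tell, to different ways
    of reading the same file concurrently (this mapping is my reading of the
    benchmark output, not taken from its source). Roughly, the two extremes look
    like this in Java:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Classic style: a single shared file pointer, so concurrent readers must
    // serialize around seek+read.
    int readWithSeek(RandomAccessFile raf, byte[] buf, long pos) throws IOException {
        synchronized (raf) {
            raf.seek(pos);
            return raf.read(buf, 0, buf.length);
        }
    }

    // pread style: a positional read with no shared file pointer, so threads
    // need no lock (this is what NIOFSDirectory relies on).
    int readWithPread(FileChannel channel, ByteBuffer buf, long pos) throws IOException {
        return channel.read(buf, pos);
    }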

