Grokbase Groups Lucene dev June 2011
FAQ
crank up faceting module tests
------------------------------

Key: LUCENE-3264
URL: https://issues.apache.org/jira/browse/LUCENE-3264
Project: Lucene - Java
Issue Type: Test
Components: modules/facet
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.4, 4.0


The faceting module has a large set of good tests.

lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.

for now, lets just get the coverage in, it will be good to do before any refactoring.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Search Discussions

  • Robert Muir (JIRA) at Jun 30, 2011 at 4:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3264:
    --------------------------------

    Attachment: LUCENE-3264.patch
    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Shai Erera (JIRA) at Jun 30, 2011 at 8:13 am
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057668#comment-13057668 ]

    Shai Erera commented on LUCENE-3264:
    ------------------------------------

    Patch looks very good. All tests pass for me (I've applied on trunk only).

    Few things I've noticed:
    * Previously the tests took 1m20s to run, now they take 2m55s. I guess it's because previously we only created RAMDirs, while now newDirectory picks FSDir from time to time (10%?).

    * FacetTestUtils.close*() can be removed and calls replaced by IOUtils.closeSafely. This is not critical, just remove redundant code.

    * You added a TODO to CategoryListIteratorTest about the test failing if TieredMP is used. In general TieredMP is not good for the taxonomy index, which relies on Lucene doc IDs, and therefore segments must be merged in-order. LTW uses LMP specifically because of that. I will look into the test to understand why would it care about doc IDs, since it doesn't using the taxonomy index at all.

    * There are few places with code like: assertTrue("Would like to test this with deletions!",indexReader.hasDeletions()), and assertTrue("Would like to test this with deletions!",indexReader.numDeletedDocs() > 0) which you removed. Any reason?

    * You added a TODO to TestScoredDocIDsUtils (about reader is read-only) -- you're right, the comment can be deleted.

    While I reviewed, I was thinking that RandomIndexWriter is used to replace the IndexWriter for content indexing. While this is good, this does not cover the 'taxonomy' indexing. So I wonder if we should have under facet/test/o.a.l.utils a RandomTaxonomyWriter which opens RIW internally?

    This is very impressive progress Robert, thanks for doing it !

    I am +1 to commit, after we resolve the tiny issues I raised above. We can add RandomTaxonomyWriter as a follow-on commit.
    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir (JIRA) at Jun 30, 2011 at 12:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057792#comment-13057792 ]

    Robert Muir commented on LUCENE-3264:
    -------------------------------------

    {quote}
    Previously the tests took 1m20s to run, now they take 2m55s. I guess it's because previously we only created RAMDirs, while now newDirectory picks FSDir from time to time (10%?).
    {quote}

    I don't think its from FSDir, this is now very very rarely picked. Anyway, as said in the issue summary, for a number of reasons, I don't want to address this on this issue, I want to address the coverage first.

    {quote}
    FacetTestUtils.close*() can be removed and calls replaced by IOUtils.closeSafely. This is not critical, just remove redundant code.
    {quote}

    ah, you are right. let's change this.

    {quote}
    You added a TODO to CategoryListIteratorTest about the test failing if TieredMP is used. In general TieredMP is not good for the taxonomy index, which relies on Lucene doc IDs, and therefore segments must be merged in-order. LTW uses LMP specifically because of that. I will look into the test to understand why would it care about doc IDs, since it doesn't using the taxonomy index at all.
    {quote}

    Right, as you said this is for the main index, not the taxonomy index. So I think the test just relies upon lucene doc ids, but I didnt want to just change the test without saying why.

    {quote}
    There are few places with code like: assertTrue("Would like to test this with deletions!",indexReader.hasDeletions()), and assertTrue("Would like to test this with deletions!",indexReader.numDeletedDocs() > 0) which you removed. Any reason?
    {quote}

    Mostly to prevent the tests from failing. RandomIndexWriter randomly optimizes some times, so occasionally there are no deletions. I think this is fine (actually better) as far as coverage... then the deleted docs is occasionally null, etc.

    {quote}
    You added a TODO to TestScoredDocIDsUtils (about reader is read-only) – you're right, the comment can be deleted.
    {quote}

    OK, I'll nuke this.

    {quote}
    We can add RandomTaxonomyWriter as a follow-on commit.
    {quote}

    Yes, lets do this separate.

    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Shai Erera (JIRA) at Jun 30, 2011 at 3:51 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057893#comment-13057893 ]

    Shai Erera commented on LUCENE-3264:
    ------------------------------------

    bq. Anyway, as said in the issue summary, for a number of reasons, I don't want to address this on this issue, I want to address the coverage first.

    I don't understand. I thought that you said so regarding introducing atLeast and iterations, and I'm ok with that. I was just asking, since all you've done is move to use newDir, newIWC and RandomIW, how come the tests running time got that much longer? If it's not FSDir, do you have any idea what can cause that? Will RandomIW stall indexing randomly, or maybe it's newIWC which chooses to flush more often?

    Only trying to understand the cause here.

    bq. RandomIndexWriter randomly optimizes some times, so occasionally there are no deletions

    Ok, makes sense.

    bq. So I think the test just relies upon lucene doc ids, but I didnt want to just change the test without saying why.

    I'll dig why, but it shouldn't hold up the issue. If it needs to be changed, we can change it later.
    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir (JIRA) at Jun 30, 2011 at 4:01 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057901#comment-13057901 ]

    Robert Muir commented on LUCENE-3264:
    -------------------------------------

    {quote}
    I don't understand. I thought that you said so regarding introducing atLeast and iterations, and I'm ok with that. I was just asking, since all you've done is move to use newDir, newIWC and RandomIW, how come the tests running time got that much longer? If it's not FSDir, do you have any idea what can cause that? Will RandomIW stall indexing randomly, or maybe it's newIWC which chooses to flush more often?
    {quote}

    I think the slowdown is basically linear (the tests run 2x or 3x as slow). Let me explain some of the reasons why you have this slowdown over just normal indexing without using randomiw/mockdirectorywrapper/etc:
    # we call checkIndex on every directory we create after its closed. I think this is the right thing to do always... it does slow down the tests a bit.
    # we do sometimes get crappy indexing params, crazy merge params, ridiculous IndexReader/Writer params (e.g. termIndexInterval=1). I think sometimes these non-optimal params slow things down.
    # occasionally we do things like randomly fully or partially optimize, yield(), etc.

    So while Lucene's defaults are pretty good, we are testing a bunch of non-default parameters and doing a bunch of other crazy things... so these slow down the tests!

    That being said, I'm working on the speed issue at least a little here, because I really want to get this test improvements in, although I really didn't want to work on this here (I think 1 minute extra *temporarily* to the build is no big deal for the additional coverage).

    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir (JIRA) at Jun 30, 2011 at 4:05 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3264:
    --------------------------------

    Attachment: LUCENE-3264.patch

    attached is a patch with the improvements discussed, also just a couple test speedups (seems to shave off about a minute for me, setting me at 2minutes).

    Note I explicitly didnt change any of the ones that extend BaseTestTopK here... these tests are more complicated but I think we need to adjust this one to do more work on the nightly, and less work on regular builds (currently each test method in every class that extends this looks like it indexes up to 20000 * 4 = 80000 documents).

    I know these are the sluggish ones, but I'd really prefer to handle this separately because they are complicated.

    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch, LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Shai Erera (JIRA) at Jun 30, 2011 at 4:05 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057906#comment-13057906 ]

    Shai Erera commented on LUCENE-3264:
    ------------------------------------

    Thanks Robert. This makes sense to me.

    bq. although I really didn't want to work on this here (I think 1 minute extra temporarily to the build is no big deal for the additional coverage)

    I apologize if that caused you to do that work here. I really only wanted to understand. By all means, commit the changes. The explanation makes sense and I'm ok with it. We can speed up things later.
    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch, LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir (JIRA) at Jun 30, 2011 at 6:37 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir resolved LUCENE-3264.
    ---------------------------------

    Resolution: Fixed

    ok, committed and backported.

    I think we should open followup issue(s):
    * speed up the top-k sampling tests (but make sure they are thorough on nightly etc still)
    * make a RandomTaxonomyWriter
    * look at any hardcoded constants like #docs etc and see if we can in general add randomization.

    crank up faceting module tests
    ------------------------------

    Key: LUCENE-3264
    URL: https://issues.apache.org/jira/browse/LUCENE-3264
    Project: Lucene - Java
    Issue Type: Test
    Components: modules/facet
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.4, 4.0

    Attachments: LUCENE-3264.patch, LUCENE-3264.patch


    The faceting module has a large set of good tests.
    lets switch them over to use all of our test infra (randomindexwriter, random iwconfig, mockanalyzer, newDirectory, ...)
    I don't want to address multipliers and atLeast() etc on this issue, I think we should follow up with that on a separate issue, that also looks at speed and making sure the nightly build is exhaustive.
    for now, lets just get the coverage in, it will be good to do before any refactoring.
    --
    This message is automatically generated by JIRA.
    For more information on JIRA, see: http://www.atlassian.com/software/jira



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupdev @
categorieslucene
postedJun 30, '11 at 4:48a
activeJun 30, '11 at 6:37p
posts9
users1
websitelucene.apache.org

1 user in discussion

Robert Muir (JIRA): 9 posts

People

Translate

site design / logo © 2021 Grokbase