Grokbase Groups Lucene dev May 2011
[ https://issues.apache.org/jira/browse/LUCENE-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034530#comment-13034530 ]

Esmond Pitt commented on LUCENE-2100:
-------------------------------------

Did somebody implement this for 3.1.0? StandardAnalyzer became final between 3.0.3 and 3.1.0. This is *not acceptable.* Binary compatibility must be preserved, and, to be frank, I do not give a good goddam how ugly the code inside looks compared to this requirement.

Make contrib analyzers final
----------------------------

Key: LUCENE-2100
URL: https://issues.apache.org/jira/browse/LUCENE-2100
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1, 3.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
Fix For: 4.0

Attachments: LUCENE-2100.patch, LUCENE-2100.patch


The analyzers in contrib/analyzers should all be marked final. None of the analyzers should ever be subclassed; users should build their own analyzers if they want a different combination of filters and tokenizers.
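The recommendation above is to compose a new analyzer from a tokenizer and a chain of filters instead of subclassing an existing one. The sketch below shows that idea in plain Java. It is a hypothetical toy model, not Lucene's API: `ToyAnalyzer`, `ToyFilters`, the regex tokenizer, and the crude plural-stripping "stemmer" are all invented stand-ins, and the stemmer is not the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical toy model: an analyzer is a final class that composes a tokenizer
// with an ordered chain of filters. Nothing here is Lucene's real API.
final class ToyAnalyzer {
    private final List<UnaryOperator<List<String>>> filters;

    ToyAnalyzer(List<UnaryOperator<List<String>>> filters) {
        this.filters = new ArrayList<>(filters);
    }

    // Toy "tokenizer": split on runs of characters that are neither letters nor digits.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    List<String> analyze(String text) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> f : filters) {
            tokens = f.apply(tokens); // each filter consumes the previous stage's output
        }
        return tokens;
    }
}

// Hypothetical filters standing in for a lowercase filter, a stop filter and a stemmer.
final class ToyFilters {
    private ToyFilters() {}

    static UnaryOperator<List<String>> lowerCase() {
        return ts -> ts.stream().map(t -> t.toLowerCase(Locale.ROOT)).collect(Collectors.toList());
    }

    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return ts -> ts.stream().filter(t -> !stopWords.contains(t)).collect(Collectors.toList());
    }

    // Crude plural stripper, NOT the Porter algorithm: "cats" -> "cat", "class" unchanged.
    static UnaryOperator<List<String>> naiveStem() {
        return ts -> ts.stream()
                .map(t -> t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")
                        ? t.substring(0, t.length() - 1)
                        : t)
                .collect(Collectors.toList());
    }
}
```

A different combination is simply a new `ToyAnalyzer` with a different filter list. For example, lowercase, then stop words {"the", "and"}, then the toy stemmer turns "The Cats and Dogs" into [cat, dog]. No existing analyzer has to be subclassed to get there.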
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Steven Rowe (JIRA) at May 17, 2011 at 4:14 am
    [ https://issues.apache.org/jira/browse/LUCENE-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034540#comment-13034540 ]

    Steven Rowe commented on LUCENE-2100:
    -------------------------------------

    Hi Esmond,

    Take a look at [the source code for StandardAnalyzer|http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/lucene/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java?view=markup]. Fewer than 50 lines of code there, if you take out the comments. Copy/paste suddenly seems doable. Lucene's Analyzers are best thought of as examples.

    Steve
  • Esmond Pitt (JIRA) at May 17, 2011 at 4:30 am
    [ https://issues.apache.org/jira/browse/LUCENE-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034544#comment-13034544 ]

    Esmond Pitt commented on LUCENE-2100:
    -------------------------------------

    Steve

    Thanks. Maybe you could have a look at this. How do you suggest I recode it?
    I wrote this 7 years ago and cannot now remember anything about it. Quite
    possibly the entire thing is now obsolete, but I've been carting it around
    since before Lucene was even at Apache. All I've ever done is adjust the
    version number.

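    ==========================================================
    import java.io.Reader;

    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    // Compiles against Lucene 3.0.x only: StandardAnalyzer is final from 3.1.0 on.
    public class PorterStemAnalyzer extends StandardAnalyzer
    {
        /**
         * Construct a new instance of PorterStemAnalyzer.
         */
        public PorterStemAnalyzer()
        {
            super(Version.LUCENE_30);
        }

        // Wraps StandardAnalyzer's token chain in a Porter stemmer on every call.
        @Override
        public final TokenStream tokenStream(String fieldName, Reader reader)
        {
            return new PorterStemFilter(super.tokenStream(fieldName, reader));
        }
    }
    ============================================================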

    EJP


  • Robert Muir (JIRA) at May 17, 2011 at 4:38 am
    [ https://issues.apache.org/jira/browse/LUCENE-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034549#comment-13034549 ]

    Robert Muir commented on LUCENE-2100:
    -------------------------------------

    Esmond: hi, what you are doing here is exactly the reason why we made it final.

    By subclassing StandardAnalyzer in this way, the indexer is no longer able to reuse tokenstreams, making analysis very slow and inefficient.

    The easiest way to get your PorterStemAnalyzer is to just use EnglishAnalyzer, which does just this.

    Otherwise if you really want to do it yourself, do it like this:
    {noformat}
    // "..." marks arguments elided in the original comment
    Analyzer analyzer = new ReusableAnalyzerBase() {
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The tokenizer is the source; each filter wraps the stream before it
        Tokenizer tokenizer = new StandardTokenizer(...);
        TokenStream filteredStream = new StandardFilter(tokenizer, ...);
        filteredStream = new LowerCaseFilter(filteredStream, ...);
        filteredStream = new StopFilter(filteredStream, ...);
        filteredStream = new PorterStemFilter(filteredStream);
        return new TokenStreamComponents(tokenizer, filteredStream);
      }
    };
    {noformat}

    Please see LUCENE-3055 for more examples and a more thorough explanation.

    The good news is if you implement your analyzer like this, you will see performance improvements!
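Robert's point about reuse can be modeled without Lucene. The sketch below is a hypothetical, stdlib-only toy. The names `ToyTokenizer`, `ToyComponents` and `ReusingToyAnalyzer` are invented and are not Lucene classes. It mirrors the idea behind `ReusableAnalyzerBase.createComponents`: the tokenizer and filter chain are built once per thread and cached, and later calls only reset the tokenizer's input. By contrast, Esmond's `tokenStream` override wraps a new `PorterStemFilter` around `super.tokenStream(...)` on every call.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy tokenizer: reads whitespace-separated tokens from a Reader.
// reset(Reader) lets one instance be reused for new input instead of reallocating it.
final class ToyTokenizer {
    private Reader input;

    void reset(Reader input) {
        this.input = input;
    }

    List<String> tokens() throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    out.add(sb.toString());
                    sb.setLength(0);
                }
            } else {
                sb.append((char) c);
            }
        }
        if (sb.length() > 0) {
            out.add(sb.toString());
        }
        return out;
    }
}

// Pairs the tokenizer (the "source") with the rest of the chain; here the chain is just lowercasing.
final class ToyComponents {
    private final ToyTokenizer source = new ToyTokenizer();

    List<String> run(Reader reader) throws IOException {
        source.reset(reader); // reuse: only the input changes, no new objects in the chain
        List<String> out = new ArrayList<>();
        for (String t : source.tokens()) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Builds the components once per thread and reuses them on every later call.
final class ReusingToyAnalyzer {
    private final AtomicInteger created = new AtomicInteger(); // counts allocations, for illustration
    private final ThreadLocal<ToyComponents> cache = ThreadLocal.withInitial(this::createComponents);

    private ToyComponents createComponents() {
        created.incrementAndGet();
        return new ToyComponents();
    }

    int componentsCreated() {
        return created.get();
    }

    List<String> analyze(String text) {
        try {
            return cache.get().run(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Take two calls on the same thread, `analyze("Hello World")` followed by `analyze("Foo  BAR")`. They return [hello, world] and [foo, bar], and `componentsCreated()` stays at 1. A variant that built a new `ToyComponents` inside every `analyze` call would allocate once per call instead.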

  • Esmond Pitt (JIRA) at May 17, 2011 at 4:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034555#comment-13034555 ]

    Esmond Pitt commented on LUCENE-2100:
    -------------------------------------

    Many thanks.



Discussion Overview
group: dev @
category: lucene
posted: May 17, '11 at 3:47a
active: May 17, '11 at 4:50a
posts: 5
users: 1
website: lucene.apache.org
