Arabic Analyzer: Stopwords list needs enhancement
-------------------------------------------------

Key: LUCENE-1966
URL: https://issues.apache.org/jira/browse/LUCENE-1966
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Affects Versions: 2.9.1
Reporter: Basem Narmok
Priority: Trivial
Fix For: 2.9


The provided Arabic stopwords list needs some enhancements (e.g. it contains a lot of words that are not stopwords, and it needs some cleanup). A patch will be provided with this issue.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Basem Narmok (JIRA) at Oct 8, 2009 at 11:08 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Basem Narmok updated LUCENE-1966:
    ---------------------------------

    Attachment: LUCENE-1966.patch
    arabic-stopwords-comments.txt

    Please see arabic-stopwords-comments.txt for my comments on the list, and for what I changed and why.

    The patch provides an updated Arabic stopwords file, and modifies ArabicAnalyzer to filter stopwords after normalization, as the provided list contains normalized Arabic stopwords.

    Best,
  • Robert Muir (JIRA) at Oct 9, 2009 at 12:22 am
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir reassigned LUCENE-1966:
    -----------------------------------

    Assignee: Robert Muir
  • Robert Muir (JIRA) at Oct 9, 2009 at 1:04 am
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763774#action_12763774 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem, thanks for the patch, and the comments.

    One thing I noticed: if I apply the patch, على (the stopword) will not be filtered as a stopword. This is because it will be normalized to علي (the name).

    So, if we are going to normalize before stopfilter, I think we need to make sure the stopwords do not contain yeh without dots, or else these will not work. This is one example of why I was scared to apply normalization before stopwords, because by doing so, we cause على and علي to conflate.

    Let me know what you think about this.
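
A minimal sketch of the conflation described in this comment. The `normalize` method is a hypothetical, simplified stand-in that only maps alef maksura (U+0649, "yeh without dots") to yeh (U+064A); it is not the ArabicAnalyzer code, and the one-word stop list is an assumption made for illustration.

```java
import java.util.Set;

class NormalizeThenStopDemo {
    // Hypothetical simplified normalizer: alef maksura (U+0649) -> yeh (U+064A).
    // A real Arabic normalizer handles more characters than this.
    static String normalize(String token) {
        return token.replace('\u0649', '\u064A');
    }

    // Assumed stop list that still holds the unnormalized spelling of the preposition.
    static final Set<String> STOPWORDS = Set.of("\u0639\u0644\u0649"); // على

    // Stop filter applied to the raw token.
    static boolean stoppedBeforeNormalization(String token) {
        return STOPWORDS.contains(token);
    }

    // Stop filter applied after the token has been normalized.
    static boolean stoppedAfterNormalization(String token) {
        return STOPWORDS.contains(normalize(token));
    }

    public static void main(String[] args) {
        String preposition = "\u0639\u0644\u0649"; // على (the stopword)
        String name = "\u0639\u0644\u064A";        // علي (the name)

        System.out.println(stoppedBeforeNormalization(preposition)); // true
        System.out.println(stoppedAfterNormalization(preposition));  // false: list entry no longer matches
        System.out.println(normalize(preposition).equals(name));     // true: the two words conflate
    }
}
```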

  • Robert Muir (JIRA) at Oct 9, 2009 at 12:40 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1966:
    --------------------------------

    Affects Version/s: (was: 2.9.1)
    2.9
    Fix Version/s: (was: 2.9)
    3.0
  • Basem Narmok (JIRA) at Oct 11, 2009 at 9:38 am
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Basem Narmok updated LUCENE-1966:
    ---------------------------------

    Attachment: LUCENE-1966.patch

    Robert, you are correct, to solve the problem we have two options:
    1- to remove words like علي and وفي
    2- to use an unnormalized stopwords list, before the normalization filter.

    I think the best is the second option, so this patch only modifies the list (unnormalized), please try it.
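
The second option can be sketched as follows: the stop filter runs on raw tokens against an unnormalized list, and normalization happens only afterwards. The class, the three-word list, and the simplified `normalize` method are assumptions for illustration, not the contents of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StopThenNormalizeDemo {
    // Assumed unnormalized stop list; common spelling variants are listed explicitly.
    static final Set<String> STOPWORDS = Set.of(
        "\u0639\u0644\u0649",        // على
        "\u0623\u064A\u0636\u0627",  // أيضا
        "\u0627\u064A\u0636\u0627"); // ايضا

    // Hypothetical simplified normalizer:
    // alef with hamza above (U+0623) -> alef (U+0627), alef maksura (U+0649) -> yeh (U+064A).
    static String normalize(String token) {
        return token.replace('\u0623', '\u0627').replace('\u0649', '\u064A');
    }

    // Drop stopwords first, then normalize whatever is left.
    static List<String> analyze(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (STOPWORDS.contains(token)) {
                continue; // removed before normalization can change its spelling
            }
            out.add(normalize(token));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
            "\u0639\u0644\u0649",        // على, stopword
            "\u0639\u0644\u064A",        // علي, the name
            "\u0623\u064A\u0636\u0627"); // أيضا, stopword
        System.out.println(analyze(tokens)); // only the name survives
    }
}
```

With this ordering the name علي is kept while على is removed, at the cost of having to list each spelling variant in the stop list.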
  • Robert Muir (JIRA) at Oct 11, 2009 at 1:35 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764449#action_12764449 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem, thanks. I like the new list.

    I have one very minor question: in the list we have أيضا / ايضا twice.

    I wanted to check with you, is this by accident or did you have some other spellings in mind?

    If it is by accident, let me know, I can just remove the duplicates before committing.
  • Basem Narmok (JIRA) at Oct 11, 2009 at 2:54 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764456#action_12764456 ]

    Basem Narmok commented on LUCENE-1966:
    --------------------------------------

    Hi Robert,

    Regarding ايضا / أيضا ...

    No, not by accident, I included both formats (normalized,unnormalized). Arabic users tend to use both on the internet (different spellings), another example is words like أي / اي
  • Robert Muir (JIRA) at Oct 11, 2009 at 3:02 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764462#action_12764462 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem, I meant: there are two entries for أيضا , and two entries for ايضا (total of four)

  • Robert Muir (JIRA) at Oct 11, 2009 at 3:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764462#action_12764462 ]

    Robert Muir edited comment on LUCENE-1966 at 10/11/09 8:10 AM:
    ---------------------------------------------------------------

    Basem, I meant: there are two entries for أيضا , and two entries for ايضا (total of four)

    edit: here are the relevant line numbers from the new stopwords.txt:

    Lines 72 and 73:
    {noformat}
    ايضا
    أيضا
    {noformat}

    Lines 123 and 124:
    {noformat}
    ايضا
    أيضا
    {noformat}

    was (Author: rcmuir):
    Basem, I meant: there are two entries for أيضا , and two entries for ايضا (total of four)
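
A check like the one behind these line numbers can be sketched in a few lines. This is a hypothetical helper, not part of the patch; it assumes one entry per line and treats lines starting with `#` as comments.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateStopwords {
    // For every entry that occurs more than once, return the 1-based line numbers where it appears.
    static Map<String, List<Integer>> findDuplicates(List<String> lines) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String word = lines.get(i).trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue; // assumption: blank lines and # comments are not entries
            }
            seen.computeIfAbsent(word, k -> new ArrayList<>()).add(i + 1);
        }
        // Keep only the entries that were seen on two or more lines.
        seen.values().removeIf(lineNumbers -> lineNumbers.size() < 2);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DuplicateStopwords path/to/stopwords.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        findDuplicates(lines).forEach((word, lineNumbers) ->
            System.out.println(word + " appears on lines " + lineNumbers));
    }
}
```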

  • Robert Muir (JIRA) at Oct 11, 2009 at 3:32 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764465#action_12764465 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem I can simply remove 123 & 124 if this is the case, but I did not want to do this without checking first.

    The reason is, I wonder if perhaps you intended for these two to be أيضاً and ايضاً (with fathatan)
  • Basem Narmok (JIRA) at Oct 11, 2009 at 5:23 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764493#action_12764493 ]

    Basem Narmok commented on LUCENE-1966:
    --------------------------------------

    Oh, my mistake, sorry. Yes, please remove the last two on lines 123 & 124.

    No, they are just duplicates of the ones on lines 72 & 73.


  • Robert Muir (JIRA) at Oct 11, 2009 at 5:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764495#action_12764495 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem, ok! Thanks a lot for your help here. I will commit soon.
  • Robert Muir (JIRA) at Oct 11, 2009 at 6:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764501#action_12764501 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Before I commit this, I want to solicit any comments/concerns about backwards compatibility, assuming the following notice:

    {noformat}
    Changes in runtime behavior

    * LUCENE-1966: Modified and cleaned the default Arabic stopwords list used
    by ArabicAnalyzer. You'll need to fully re-index any previously created
    indexes. (Basem Narmok via Robert Muir)
    {noformat}

    I know contrib has no backwards-compatibility guarantee, but I just want to double-check.
    Perhaps in the future someone might help fix the Persian stopwords file as well, so this may happen again :)

  • Basem Narmok (JIRA) at Oct 11, 2009 at 10:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764515#action_12764515 ]

    Basem Narmok commented on LUCENE-1966:
    --------------------------------------

    Seems good.

    BTW with FAST ESP we never used stopwords, as hits from stopwords get low relevancy (keywords with high number of hits = low value, low importance, so less relevant), so such hits will never get into the top results. Also, using stopwords will affect phrase search, most of the search engines avoid removing them. But, at the end it depends on the client's application, and what she really wants, as enterprise search could have very specific and different needs than Internet search.

    Anyways, still I am testing the Arabic Analyzer, and I will provide you with more comments soon. but for the stopwords they are good for now :)
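
The relevancy argument above ("high number of hits = low value") can be illustrated with an inverse document frequency weight. The formula below is one common IDF formulation chosen as an assumption for illustration; the comment does not say what FAST ESP actually computes, and the document counts are made up.

```java
class IdfDemo {
    // One common IDF formulation: 1 + ln(numDocs / (docFreq + 1)).
    // Terms that occur in many documents get a smaller weight.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000;                 // hypothetical collection size
        double stopwordLike = idf(900_000, numDocs); // term found in most documents
        double rareTerm = idf(50, numDocs);          // term found in few documents

        System.out.println("stopword-like term idf: " + stopwordLike);
        System.out.println("rare term idf:          " + rareTerm);
        System.out.println(stopwordLike < rareTerm); // true: the frequent term contributes less
    }
}
```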
  • Robert Muir (JIRA) at Oct 11, 2009 at 10:54 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764519#action_12764519 ]

    Robert Muir commented on LUCENE-1966:
    -------------------------------------

    Basem, yes I think the improvements are good.

    My question is really: is it OK to commit this for 3.0 or should we wait for 3.1?

  • Robert Muir (JIRA) at Oct 14, 2009 at 12:26 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir resolved LUCENE-1966.
    ---------------------------------

    Resolution: Fixed

    Committed revision 825110.

    Thanks Basem!
