Add option to ReverseStringFilter to mark reversed tokens
---------------------------------------------------------

Key: LUCENE-1813
URL: https://issues.apache.org/jira/browse/LUCENE-1813
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Affects Versions: 2.9
Reporter: Andrzej Bialecki
Attachments: reverseMark.patch

This patch adds an option to the filter to "mark" reversed tokens with a special marker character (U+0001). This is useful when indexing both straight and reversed tokens (e.g. to implement efficient leading-wildcard search).
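
As a rough illustration of why the marker helps, here is a minimal sketch that does not use Lucene at all. The class `ReversedMarkerIndex` and its methods are hypothetical names invented for this example; a sorted set stands in for the term dictionary. Each token is stored both straight and reversed behind U+0001, so a leading-wildcard query such as `*cene` can be answered as a prefix scan over the marked terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy term dictionary that stores each token both as-is and reversed with a marker. */
final class ReversedMarkerIndex {
    /** Marker from the issue description; every reversed term starts with it. */
    static final char MARKER = '\u0001';

    private final SortedSet<String> terms = new TreeSet<>();

    /** Index the token twice: straight, and reversed behind the marker. */
    void add(String token) {
        terms.add(token);
        terms.add(MARKER + new StringBuilder(token).reverse().toString());
    }

    /** Answer "*suffix" by scanning the reversed terms as a prefix range. */
    List<String> endingWith(String suffix) {
        String prefix = MARKER + new StringBuilder(suffix).reverse().toString();
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // left the prefix range; sorted order means no more hits
            }
            // Drop the marker and undo the reversal to recover the original token.
            hits.add(new StringBuilder(term.substring(1)).reverse().toString());
        }
        return hits;
    }
}
```

Because the marker sorts before ordinary letters, the reversed terms occupy their own range of the dictionary and cannot be confused with straight tokens that happen to spell the same characters.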

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Andrzej Bialecki (JIRA) at Aug 15, 2009 at 4:17 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrzej Bialecki updated LUCENE-1813:
    --------------------------------------

    Attachment: reverseMark.patch

    Patch and unit tests.
  • Robert Muir (JIRA) at Aug 15, 2009 at 4:53 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir reassigned LUCENE-1813:
    -----------------------------------

    Assignee: Robert Muir
  • Robert Muir (JIRA) at Aug 15, 2009 at 4:53 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743737#action_12743737 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    The corresponding Solr task (SOLR-1321) is marked for version 1.4.

    Does anyone oppose putting this one in 2.9?

  • Robert Muir (JIRA) at Aug 15, 2009 at 5:09 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743740#action_12743740 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Andrzej, the reverse() methods are public; can you supply default implementations (withMark=false) in case someone is using them?

    Alternatively, maybe the reverse() methods could stay the same, and the marking could happen in incrementToken()?

  • Andrzej Bialecki (JIRA) at Aug 15, 2009 at 6:09 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743743#action_12743743 ]

    Andrzej Bialecki commented on LUCENE-1813:
    -------------------------------------------

    Either way is fine with me. To preserve the public API I think it's better to move this marking logic to incrementToken(). I'll prepare an updated patch.
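
The choice made here (leave the public reverse() helper untouched and add the marker only in the per-token step) might look roughly like the following. This is a sketch under assumptions, not the patch itself: `MarkingReverser` and `process` are hypothetical names, and `process` merely plays the role that incrementToken() has in the real filter.

```java
/** Hypothetical stand-in for the filter; it does not use Lucene's TokenStream API. */
final class MarkingReverser {
    private final boolean withMark;

    MarkingReverser(boolean withMark) {
        this.withMark = withMark;
    }

    /** Public helper with its original contract: reverse only, never add a marker. */
    public static String reverse(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    /** Plays the role of incrementToken(): the only place the marker is applied. */
    String process(String token) {
        String reversed = reverse(token);
        return withMark ? '\u0001' + reversed : reversed;
    }
}
```

Existing callers of `reverse` see no change in behaviour, which is the compatibility concern raised in the review.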
  • Andrzej Bialecki (JIRA) at Aug 15, 2009 at 8:25 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Andrzej Bialecki updated LUCENE-1813:
    --------------------------------------

    Attachment: reverseMark-2.patch

    Updated patch that moves the marking logic to incrementToken().
  • Robert Muir (JIRA) at Aug 16, 2009 at 11:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1813:
    --------------------------------

    Lucene Fields: [New, Patch Available] (was: [New])
    Fix Version/s: 2.9

    thanks Andrzej, i think this patch is ready.
  • Robert Muir (JIRA) at Aug 16, 2009 at 10:12 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743916#action_12743916 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    if no one objects to this one I will commit it tomorrow.
  • Mark Miller (JIRA) at Aug 16, 2009 at 10:28 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743918#action_12743918 ]

    Mark Miller commented on LUCENE-1813:
    -------------------------------------

    +1
  • Paul Cowan (JIRA) at Aug 16, 2009 at 11:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743926#action_12743926 ]

    Paul Cowan commented on LUCENE-1813:
    ------------------------------------

    Very minor thing, but would it make sense to choose a more suitable character? U+0001 is an assigned character with its own semantic meaning ("Start of Heading", same as ASCII character 0x01), which isn't really relevant to this use. It mightn't be a bad idea to (a) choose a control character which makes sense in context, if there is one (I can't see one, myself), (b) use a character from the private-use area (U+E000 to U+F8FF), or (c) my preferred option, use the Unicode tag characters. The tag characters are designed for just such a purpose: embedding contextual metadata in text fields. The general syntax for a tag is <TAG TYPE> followed by one or more <TAG CHARACTER>s. Unfortunately, only one tag type is defined in Unicode at present (language tag), which isn't suitable.

    That said, I think it makes sense (and is probably 'nicer') to pick one of the Unicode tag characters -- say, U+E0052 TAG LATIN CAPITAL LETTER R (for 'reverse') and use that. This could lead to a de facto standard for Lucene fields, where different variations of the same token could use different leading tag characters. Rather than just everyone picking a character at random, this could lead to some sort of structure around similar situations (i.e. I could envisage a filter which uses U+E004E TAG LATIN CAPITAL LETTER N for a normalised version of the token, etc).

    Sorry, I'm really anal about Unicode. Can't help it.
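
To make the three candidates in this comment concrete, the sketch below asks the JDK how it classifies each one. `MarkerCandidates` and `describe` are hypothetical names for this example; the only library calls are `Character.getType(int)` and `Character.charCount(int)`. The number of UTF-16 units matters later in the thread, since the filter works on Java `char` values.

```java
/** Classifies the three marker candidates raised in the comment above. */
final class MarkerCandidates {
    /** Returns a short label for the code point's Unicode general category. */
    static String describe(int codePoint) {
        String category;
        switch (Character.getType(codePoint)) {
            case Character.CONTROL:     category = "control (Cc)"; break;
            case Character.PRIVATE_USE: category = "private use (Co)"; break;
            case Character.FORMAT:      category = "format (Cf)"; break;
            default:                    category = "other"; break;
        }
        // charCount is 1 inside the BMP and 2 for supplementary code points.
        return String.format("U+%04X: %s, %d UTF-16 unit(s)",
                codePoint, category, Character.charCount(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0001));  // the marker used by the patch
        System.out.println(describe(0xE000));  // first BMP private-use character
        System.out.println(describe(0xE0052)); // TAG LATIN CAPITAL LETTER R
    }
}
```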
  • Mark Miller (JIRA) at Aug 16, 2009 at 11:44 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743928#action_12743928 ]

    Mark Miller commented on LUCENE-1813:
    -------------------------------------

    bq. Sorry, I'm really anal about Unicode. Can't help it.

    This is a full text search engine - we love it!

  • Robert Muir (JIRA) at Aug 16, 2009 at 11:44 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743929#action_12743929 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    {quote}
    Sorry, I'm really anal about Unicode. Can't help it.
    {quote}

    Me too :) My problem with tag characters is that they are deprecated.

    I will take your advice and look and see if there is something more suitable.
  • Robert Muir (JIRA) at Aug 16, 2009 at 11:52 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743931#action_12743931 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Another issue, besides the fact that they are deprecated, is that tag characters are outside of the BMP.

    Currently, the reverse filter does not properly reverse characters outside of the BMP (it does not recognize them as one character).
    This means characters such as tag characters will be 'reversed' into a trail surrogate followed by a lead surrogate (two unpaired surrogates).
    But we cannot fix the above, as Lucene wildcard support does not recognize code points > U+FFFF as one 'character' either.

    If we are going to pick a character other than U+0001, it needs to be inside the BMP.
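
The surrogate problem described above can be reproduced in a few lines. This is an illustrative sketch, not the filter's code: `SurrogateReversal`, `reverseByChar` and `hasUnpairedSurrogate` are hypothetical names. `reverseByChar` swaps UTF-16 code units the way a BMP-only implementation would, and the checker reports whether the result contains a surrogate outside a well-formed pair.

```java
final class SurrogateReversal {
    /** Reverses UTF-16 code units one by one, as a BMP-only filter would. */
    static String reverseByChar(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    /** True if the string holds a surrogate that is not part of a valid pair. */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the trail half of a well-formed pair
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // trail surrogate with no lead before it
            }
        }
        return false;
    }
}
```

For contrast, `StringBuilder.reverse()` is documented to treat surrogate pairs as single characters, so it keeps such a pair intact. As the comment notes, that alone would not help while wildcard matching also works on individual code units.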
  • Paul Cowan (JIRA) at Aug 17, 2009 at 12:04 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743933#action_12743933 ]

    Paul Cowan commented on LUCENE-1813:
    ------------------------------------

    Yeah, ok, makes sense.

    I'd suggest choosing a range of Private Use characters from the BMP block then, that's what they're for. Doesn't really matter which... we can pick a block of (say) 256 and use the first one for this, then others can be used for other purposes later if required. U+ECxx, maybe, because that's got 3 letters out of 'lucene' in it. So EC00 means 'reversed', and then people who need other similar filters can organise amongst themselves.
  • Paul Cowan (JIRA) at Aug 17, 2009 at 12:06 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743933#action_12743933 ]

    Paul Cowan edited comment on LUCENE-1813 at 8/16/09 5:04 PM:
    -------------------------------------------------------------

    Yeah, ok, makes sense.

    I'd suggest choosing a range of Private Use characters from the BMP block then, that's what they're for. Doesn't really matter which... we can pick a block of (say) 256 and use the first one for this, then others can be used for other purposes later if required. U+ECxx, maybe, because that's got 3 letters out of 'lucene' in it :) . So EC00 means 'reversed', and then people who need other similar filters can organise amongst themselves.

  • Robert Muir (JIRA) at Aug 17, 2009 at 12:12 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743935#action_12743935 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    I looked into this and I think using the private use area (U+E000 to U+F8FF) would also not be the best.
    I do not think Lucene should use PUA characters system-internally, besides I have at least a few docs with PUA characters, and I think others will as well.
    We should leave PUA characters available to the end user.

    So personally I have nothing against this U+0001, but I'll take any recommendations...

  • Robert Muir (JIRA) at Aug 17, 2009 at 12:20 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743936#action_12743936 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    What if we simply drop the boolean option for a marker character and instead offer ReverseFilter() and ReverseFilter(char marker)?
    This way, Lucene does not define the character used for this operation, and anyone can select whichever they want (such as U+0001).

    When we are on Java 5 and can support supplementary characters properly (reversing, wildcards, etc.), then we can change this to ReverseFilter(int marker) and someone can use anything they want, including characters outside of the BMP.

    If this is ok, I will supply a patch.
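
A Lucene-free sketch of the constructor pair proposed above follows. Everything in it is an assumption made for illustration: the class name `ReverseWithMarker`, the `apply` method, and the choice to reject lone surrogates. The real filter would also take a token stream as input, which is omitted here.

```java
/** Hypothetical, Lucene-free sketch of the two proposed constructors. */
final class ReverseWithMarker {
    private final boolean marked;
    private final char marker;

    /** Plain reversal, no marker. */
    ReverseWithMarker() {
        this.marked = false;
        this.marker = 0;
    }

    /** Reversal with a caller-chosen marker, which must fit in one UTF-16 unit. */
    ReverseWithMarker(char marker) {
        if (Character.isSurrogate(marker)) {
            throw new IllegalArgumentException("marker must be a complete BMP character");
        }
        this.marked = true;
        this.marker = marker;
    }

    /** Reverses the token and, if configured, puts the marker in front. */
    String apply(String token) {
        String reversed = new StringBuilder(token).reverse().toString();
        return marked ? marker + reversed : reversed;
    }
}
```

Taking the marker as a `char` keeps it inside the BMP by construction, which matches the constraint discussed earlier in the thread; an `int` code point overload would only make sense once supplementary characters are handled throughout.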
  • Paul Cowan (JIRA) at Aug 17, 2009 at 12:28 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743937#action_12743937 ]

    Paul Cowan commented on LUCENE-1813:
    ------------------------------------

    OK, cool. I'm taking an interest in this purely because I have some ideas for other token filters which would do something similar, and really like the idea of tagging them in the same way just with different 'headers'. It would be really beneficial, I think, to come up with something that can be reused and, more importantly, combined (so different filters don't 'clash' with their output). What about making it 2 characters, at least?

    U+0001 START OF HEADING
    U+xxxx whatever you like to indicate 'reversing' (i.e. an 'R', or just a 0-byte as this is the first purpose allocated, or whatever)

    This adds 2 bytes to each term, not 1, but terms generally don't take up that much room in the scale of a whole index and I think it's worth the flexibility. Hell, if you're willing to use 3 (that IS starting to seem wasteful, I admit) then maybe

    U+0001 START OF HEADING
    U+xxxx whatever
    U+0002 START OF TEXT

    That's at least semantically meaningful. Other ideas, just looking at the ASCII control characters:

    U+xxxx whatever
    U+001F UNIT SEPARATOR

    or

    U+000E SHIFT OUT
    U+xxxx whatever
    U+000F SHIFT IN

    I don't really mind, but it's always nice to plan ahead.
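
One way to read the three-character variant above (U+0001, a type character, U+0002) is as a tiny header that any filter can write and any consumer can parse. The sketch below is purely hypothetical: `TaggedTerm`, `encode`, `typeOf` and `bodyOf` are invented names, and using 'R' for "reversed" is just the example from the comment.

```java
/** Hypothetical encoding of the three-character header; not part of Lucene. */
final class TaggedTerm {
    static final char SOH = '\u0001'; // START OF HEADING
    static final char STX = '\u0002'; // START OF TEXT

    /** Builds SOH + type + STX + body. */
    static String encode(char type, String body) {
        return new StringBuilder().append(SOH).append(type).append(STX).append(body).toString();
    }

    /** Returns the type character, or -1 if the term carries no header. */
    static int typeOf(String term) {
        boolean tagged = term.length() >= 3 && term.charAt(0) == SOH && term.charAt(2) == STX;
        return tagged ? term.charAt(1) : -1;
    }

    /** Returns the text after the header, or the term itself if it is untagged. */
    static String bodyOf(String term) {
        return typeOf(term) < 0 ? term : term.substring(3);
    }
}
```

The fixed framing is what would let filters with different type characters coexist without clashing, at the cost of three extra characters per term.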
  • Robert Muir (JIRA) at Aug 17, 2009 at 12:30 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743939#action_12743939 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Paul, OK: how about a sequence instead of a single char? :)

    Then you can use however many characters you want...
  • Paul Cowan (JIRA) at Aug 17, 2009 at 12:30 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743938#action_12743938 ]

    Paul Cowan commented on LUCENE-1813:
    ------------------------------------

    Yes, or +1 on passing in your own marker; that lets everyone choose a character that won't clash with whatever Unicode subset their tokens use.
  • Robert Muir (JIRA) at Aug 17, 2009 at 12:42 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1813:
    --------------------------------

    Attachment: LUCENE-1813.patch

    Updated the patch so you can choose your own character for marking.

    If one character is not enough, let me know (I suppose we could make it a sequence), but I'd rather keep this simple.
  • Paul Cowan (JIRA) at Aug 17, 2009 at 1:12 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743943#action_12743943 ]

    Paul Cowan commented on LUCENE-1813:
    ------------------------------------

    Simple's good. I'm a happy man, thanks Robert.
  • Robert Muir (JIRA) at Aug 17, 2009 at 1:18 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743944#action_12743944 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Paul, thanks very much for your feedback... I think it's a cleaner change now.
  • Andrzej Bialecki (JIRA) at Aug 17, 2009 at 7:22 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743983#action_12743983 ]

    Andrzej Bialecki commented on LUCENE-1813:
    -------------------------------------------

    +1. One comment, perhaps stating the obvious: I picked char 0001 for two reasons. It's not likely to be used in regular text, and its UTF-8 encoding uses one byte. The use case for this filter means that it will create more or less as many tokens as there were in the original token stream, thus doubling the size of the term dictionary. One byte here, one byte there, and suddenly it matters whether we use 0001 or FFFF ...
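The byte counts behind this point are easy to check. The following is a small sketch that measures the UTF-8 length of a few candidate markers using only the JDK; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public final class MarkerSizeSketch {
    /** Number of bytes a single marker character occupies when encoded as UTF-8. */
    static int utf8Length(char marker) {
        return String.valueOf(marker).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+0001 is in the one-byte range; U+200F and U+FFFF both need three bytes.
        char[] candidates = {'\u0001', '\u001F', '\u200F', '\uFFFF'};
        for (char c : candidates) {
            System.out.printf("U+%04X -> %d byte(s)%n", (int) c, utf8Length(c));
        }
    }
}
```

Any marker below U+0080 costs one byte per marked term; anything from U+0800 upward in the Basic Multilingual Plane costs three.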
  • Ted Dunning (JIRA) at Aug 17, 2009 at 8:06 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743994#action_12743994 ]

    Ted Dunning commented on LUCENE-1813:
    -------------------------------------


    I understand the desire to use a mark that requires fewer bytes, but the Unicode bidi marks might be better for the purpose of marking writing direction: U+200E (LEFT-TO-RIGHT MARK) or U+200F (RIGHT-TO-LEFT MARK).



  • Robert Muir (JIRA) at Aug 17, 2009 at 11:28 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744023#action_12744023 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Ted, with the current patch you can do this: new ReverseStringFilter('\u200E'), or new ReverseStringFilter('\u200F'), or new ReverseStringFilter('\u0001'), or whatever.

    Also, for anyone using this filter: it's my understanding that each term in Lucene's term dictionary is stored as a "delta" versus the previous term, so the character you choose should not affect its size?
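The delta argument can be illustrated with a deliberately simplified model: sort the terms, and for each one store only the bytes that differ from its predecessor. This is only a sketch of shared-prefix coding in general, not Lucene's actual on-disk format, and all names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class PrefixDeltaSketch {
    /** Number of leading bytes the two arrays have in common. */
    static int sharedPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /**
     * Total suffix bytes stored for the given (already sorted) terms when each
     * term is prefixed with the marker and only its difference from the
     * previous term is kept.
     */
    static int suffixBytes(String[] sortedTerms, char marker) {
        int total = 0;
        byte[] previous = new byte[0];
        for (String term : sortedTerms) {
            byte[] current = (marker + term).getBytes(StandardCharsets.UTF_8);
            total += current.length - sharedPrefix(previous, current);
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] terms = {"abc", "abd", "xyz"};
        // The marker bytes are paid for by the first term only; later terms share them.
        System.out.println(suffixBytes(terms, '\u0001'));
        System.out.println(suffixBytes(terms, '\u200F'));
    }
}
```

In this model the three-byte marker costs two extra bytes in total rather than two extra bytes per term, because all marked terms sort together and share the marker as a common prefix.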

  • Grant Ingersoll (JIRA) at Aug 17, 2009 at 1:47 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744062#action_12744062 ]

    Grant Ingersoll commented on LUCENE-1813:
    -----------------------------------------

    Perhaps it is useful to define a few constants for each of these suggested characters to make it super easy for people to use them? Just a thought. Otherwise, I like the idea of passing in your own marker.
  • DM Smith (JIRA) at Aug 17, 2009 at 2:59 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744092#action_12744092 ]

    DM Smith commented on LUCENE-1813:
    ----------------------------------

    I like the idea of a constant, presented as the default. I suggest that the others be listed in the Javadoc.

    I have some texts which are using PUAs until Unicode includes the code points (e.g. Myanmar text), so I'm glad that allowing a choice doesn't create a potential conflict there. I think PUA should be left to the text author.

    As my texts are all derived from XML, I like the use of a character that is not allowed in XML. I think 0001 is just fine, even if not from a purity perspective.

    Some of my texts have BIDI markers and while these will be stripped by filters, I don't think this use is analogous.


  • Robert Muir (JIRA) at Aug 17, 2009 at 8:28 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744184#action_12744184 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Thanks for your comments, guys. I like the idea of constants for some of these suggested characters.

    I will update the patch later tonight if no one wants to tackle it and beat me to it first :)
  • Robert Muir (JIRA) at Aug 18, 2009 at 2:51 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1813:
    --------------------------------

    Attachment: LUCENE-1813.patch

    Same as before (you pick your own character), but with constants for some example marker characters, namely the ones mentioned in this issue.
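As a rough picture of what "pick your own character, with constants for the suggested ones" could look like, here is a standalone sketch that does not depend on Lucene. The constant names, the helper methods, and the decision to place the marker at the front of the reversed term are assumptions for illustration, not a copy of the patch.

```java
public final class ReverseMarkSketch {
    // Hypothetical constants for markers suggested in this discussion.
    static final char START_OF_HEADING_MARKER = '\u0001';
    static final char INFORMATION_SEPARATOR_MARKER = '\u001F';
    static final char RTL_DIRECTION_MARKER = '\u200F';

    /** Reverses a term and puts the marker in front so it cannot collide with a straight term. */
    static String reverseAndMark(String term, char marker) {
        return marker + new StringBuilder(term).reverse().toString();
    }

    /**
     * Turns a leading-wildcard pattern such as "*cene" into the prefix to look up
     * among the marked, reversed terms. Assumes the pattern starts with a single '*'.
     */
    static String toMarkedPrefix(String leadingWildcard, char marker) {
        String tail = leadingWildcard.substring(1);
        return reverseAndMark(tail, marker);
    }

    public static void main(String[] args) {
        String indexed = reverseAndMark("lucene", START_OF_HEADING_MARKER);
        String prefix = toMarkedPrefix("*cene", START_OF_HEADING_MARKER);
        // The leading-wildcard search becomes a cheap prefix match on the marked term.
        System.out.println(indexed.startsWith(prefix));
    }
}
```

Indexing both the straight term and the marked reversed term is what lets a leading wildcard be answered with a prefix scan, while the marker keeps the two sets of terms from being confused with each other.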
  • Robert Muir (JIRA) at Aug 18, 2009 at 4:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744586#action_12744586 ]

    Robert Muir commented on LUCENE-1813:
    -------------------------------------

    Please let me know if you have any feedback on the latest patch.
    If there are no comments, I would like to resolve this issue tomorrow or Thursday. Thanks!
  • Robert Muir (JIRA) at Aug 19, 2009 at 1:53 am
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1813:
    --------------------------------

    Attachment: LUCENE-1813.patch

    just some additional javadocs.
  • Robert Muir (JIRA) at Aug 19, 2009 at 12:10 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir resolved LUCENE-1813.
    ---------------------------------

    Resolution: Fixed

    Committed revision 805769.

    Thanks Andrzej and also everyone who provided feedback

