Grokbase Groups Lucene dev July 2010
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1799:
--------------------------------

Attachment: LUCENE-1799.patch

Attached is a simple prototype for encoding terms as BOCU-1.

So while I don't think things like wildcard etc. will work due to the nature of BOCU-1, term and phrase queries should work fine. It keeps the same binary sort order as UTF-8, so sorting is fine, and range queries should work once we fix TermRangeQuery to use bytes.

The impl is probably a bit slow (it uses the Charset API), as it's just for playing around.

Note: I didn't check the box because of the patent thing (not sure it even applies, since I use the ICU impl here), but either way I don't want to involve myself with that.
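
The ordering point above can be sketched in a few lines. This is a minimal illustration, not the attached patch: the class and method names are made up, and the charset name "BOCU-1" is assumed to resolve only when a charset provider such as the ICU4J charsets jar is on the classpath, so the demo falls back to UTF-8. It encodes terms through the standard java.nio.charset API and compares them as unsigned bytes, which is the comparison a byte-based range query would depend on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or of the patch.
final class TermBytesSketch {

  // Encode a term with the given charset via the standard Charset API.
  static byte[] encode(String term, Charset charset) {
    ByteBuffer buf = charset.encode(term);
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return out;
  }

  // Compare two encoded terms byte by byte, treating bytes as unsigned.
  // If one term is a prefix of the other, the shorter one sorts first.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // Assumption: "BOCU-1" is only available if a provider registers it.
    Charset cs = Charset.isSupported("BOCU-1")
        ? Charset.forName("BOCU-1")
        : StandardCharsets.UTF_8;

    String bmp = "\uFFFD";                 // U+FFFD, a single UTF-16 code unit
    String supplementary = "\uD801\uDC00"; // U+10400, a surrogate pair

    // UTF-16 code unit order and encoded byte order can disagree.
    System.out.println("charset:         " + cs.name());
    System.out.println("String order:    " + Integer.signum(bmp.compareTo(supplementary)));
    System.out.println("byte[] order:    "
        + Integer.signum(compareUnsigned(encode(bmp, cs), encode(supplementary, cs))));
  }
}
```

With UTF-8 the two comparisons disagree for this pair, which is why sorting and range queries have to work on the encoded bytes rather than on Java strings.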

Unicode compression
-------------------

Key: LUCENE-1799
URL: https://issues.apache.org/jira/browse/LUCENE-1799
Project: Lucene - Java
Issue Type: New Feature
Components: Store
Affects Versions: 2.4.1
Reporter: DM Smith
Priority: Minor
Attachments: LUCENE-1799.patch


In LUCENE-1793, there is the off-topic suggestion to provide compression of Unicode data. The motivation was a custom encoding in a Russian analyzer. The original supposition was that it provided a more compact index.
This led to the comment that a different or compressed encoding would be a generally useful feature.
BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM with an implementation in ICU. If Lucene provides its own implementation, a freely available, royalty-free license would need to be obtained.
SCSU is another Unicode compression algorithm that could be used.
An advantage of these methods is that they work on the whole of Unicode. If that is not needed, an encoding such as ISO-8859-1 (or whatever covers the input) could be used.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Uwe Schindler (JIRA) at Jul 20, 2010 at 8:27 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    Here is the policed one :-)

    In my opinion something is better than nothing. The patents are not violated here, as we only use an abstract API and the string "BOCU-1". You can use the same code to encode in "ISO-8859-1".
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 8:29 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: (was: LUCENE-1799.patch)
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 8:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    One more violation. Now it's correct!
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 9:00 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    Here is a heavily reusing variant.
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 9:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 9:17 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    The last one that could be used with any charset
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 9:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: (was: LUCENE-1799.patch)
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 10:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch
  • Uwe Schindler (JIRA) at Jul 20, 2010 at 10:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: (was: LUCENE-1799.patch)
  • Uwe Schindler (JIRA) at Jul 21, 2010 at 7:08 am
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    Here is a 100% legally valid implementation:

    - Linking to icu4j-charsets is done dynamically by reflection. If you don't have the ICU4J charsets in your classpath, the attribute throws an explanatory exception.
    - We don't need to ship the rather large JAR file with Lucene just for this class.
    - We don't have legal patent problems, as we neither ship the API nor use it directly.
    - The downside is that the test simply prints a warning but passes, so the class is not tested until you install icu4j-charsets.jar. We can put the JAR file on Hudson so it can be used during nightly builds, or we can download it dynamically on build.

    I added further improvements to the encoder itself:
    - fewer variables
    - correct error handling for encoding errors
    - remove floating point from main loop
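
The reflection-based linking described in the first bullet above could look roughly like the sketch below. It is not the code from the patch: the class name `ReflectiveCharsetLoader` and its `load` method are hypothetical, and the provider class name `com.ibm.icu.charset.CharsetProviderICU` with its `charsetForName(String)` method is stated here as an assumption about ICU4J's charset jar. Nothing from ICU is referenced at compile time, so the code builds and runs without the jar and fails with an explanatory exception when it is missing.

```java
import java.lang.reflect.Method;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not the class from the patch.
final class ReflectiveCharsetLoader {

  // Assumed name of ICU4J's charset provider class (only used as a string).
  static final String ICU_PROVIDER = "com.ibm.icu.charset.CharsetProviderICU";

  // Look up a charset through a provider class that is resolved at runtime.
  static Charset load(String providerClassName, String charsetName) {
    try {
      Class<?> providerClass = Class.forName(providerClassName);
      Object provider = providerClass.getDeclaredConstructor().newInstance();
      // charsetForName(String) is declared by java.nio.charset.spi.CharsetProvider.
      Method lookup = providerClass.getMethod("charsetForName", String.class);
      Charset charset = (Charset) lookup.invoke(provider, charsetName);
      if (charset == null) {
        throw new UnsupportedOperationException(
            "Provider " + providerClassName + " does not know charset " + charsetName);
      }
      return charset;
    } catch (ReflectiveOperationException e) {
      // Covers a missing class, a missing method, and failed instantiation.
      throw new UnsupportedOperationException(
          "Charset " + charsetName + " requires " + providerClassName
              + " on the classpath", e);
    }
  }

  public static void main(String[] args) {
    try {
      Charset bocu = load(ICU_PROVIDER, "BOCU-1");
      System.out.println("Loaded " + bocu.name());
    } catch (UnsupportedOperationException e) {
      // Mirrors the test behaviour described above: warn instead of failing.
      System.out.println("WARNING: " + e.getMessage());
    }
  }
}
```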
  • Uwe Schindler (JIRA) at Jul 21, 2010 at 9:04 am
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-1799:
    ----------------------------------

    Attachment: LUCENE-1799.patch

    A new patch that completely separates the BOCU factory from the implementation (which moves to common/miscellaneous). This has the following advantages:

    - You can use any Charset to encode your terms. The javadocs should only note that the byte[] order must be correct for range queries to work.
    - Theoretically you could remove the BOCU classes altogether; anyone who wants to use BOCU-1 can simply get the Charset from ICU's factory and pass it to the AttributeFactory. The convenience class is still useful, especially if we can later implement the encoding natively without NIO (when the patent issues are solved...).
    - The test for the CustomCharsetTermAttributeFactory uses UTF-8 as the charset and verifies that the created BytesRefs have the same format as a BytesRef created using UnicodeUtil.
    - The test also checks that encoding errors are bubbled up as RuntimeExceptions.

    TODO:

    - docs
    - make the handling of encoding errors configurable (replace with replacement char?)
    - If you want your complete index in e.g. ISO-8859-1, there should also be convenience methods in the factory/attribute that take CharSequences/char[] and quickly convert strings to BytesRefs, like UnicodeUtil does. This makes it possible to create TermQueries directly using e.g. ISO-8859-1 encoding.
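
As a rough illustration of the error handling discussed above, the sketch below encodes a term with an arbitrary Charset and either bubbles encoding errors up as RuntimeExceptions or substitutes the encoder's replacement bytes. `StrictTermEncoder` is a made-up name and this is not the CustomCharsetTermAttributeFactory from the patch; it only uses the standard CharsetEncoder and CodingErrorAction classes and returns a plain byte[] instead of a BytesRef.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration. Not thread-safe: it reuses one encoder.
final class StrictTermEncoder {
  private final CharsetEncoder encoder;

  // onError is CodingErrorAction.REPORT to fail, or REPLACE to substitute.
  StrictTermEncoder(Charset charset, CodingErrorAction onError) {
    this.encoder = charset.newEncoder()
        .onMalformedInput(onError)
        .onUnmappableCharacter(onError);
  }

  // Encode a whole term; checked coding errors surface as RuntimeExceptions.
  byte[] encode(CharSequence term) {
    try {
      // This convenience method resets the encoder before encoding.
      ByteBuffer buf = encoder.encode(CharBuffer.wrap(term));
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    } catch (CharacterCodingException e) {
      throw new RuntimeException(
          "Cannot encode term in " + encoder.charset().name(), e);
    }
  }

  public static void main(String[] args) {
    StrictTermEncoder latin1 =
        new StrictTermEncoder(StandardCharsets.ISO_8859_1, CodingErrorAction.REPORT);
    System.out.println(latin1.encode("caf\u00E9").length + " byte(s) for a Latin-1 term");
    try {
      latin1.encode("\u20AC"); // the euro sign is not in ISO-8859-1
    } catch (RuntimeException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Passing CodingErrorAction.REPLACE instead of REPORT is one way the "configurable" TODO item could be approached; whether replacement is appropriate for index terms is a separate question.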
  • Robert Muir (JIRA) at Jul 22, 2010 at 11:38 am
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799_big.patch

    Attached is a really, really rough patch that sets BOCU-1 as the default encoding.

    Beware: it's a work in progress, and a lot of the patch is auto-generated (Eclipse), so some things need to be reverted.

    Most tests pass. The idea is to find bugs in tests etc. that abuse BytesRef or assume UTF-8 encoding, things like that.

  • Robert Muir (JIRA) at Jul 27, 2010 at 9:39 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799.patch

    Attached is a patch for the start of a "BOCUUtil" with UnicodeUtil-like methods.

    For now I have only implemented encode (and encodeWithHash).

    I generated random strings with _TestUtil.randomRealisticUnicodeString and benchmarked, and the numbers are stable.

    ||encoding||time to encode 20 million strings (ms)||number of encoded bytes||
    |UTF-8|1,757|596,516,000|
    |BOCU-1|1,968|250,202,000|

    So I think we get good compression (the BOCU-1 output is about 42% of the UTF-8 size here), and good encode performance close to UTF-8 (about 12% slower).
    I'll work on decode now.
  • Michael McCandless (JIRA) at Jul 27, 2010 at 11:01 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1799:
    ---------------------------------------

    Attachment: LUCENE-1779.patch

    Slightly more optimized version of BOCU1 encode (but it's missing the hash variant).
  • Michael McCandless (JIRA) at Jul 27, 2010 at 11:05 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1799:
    ---------------------------------------

    Attachment: LUCENE-1799.patch

    Duh -- that was some ancient wrong patch. This one should be right!
  • Michael McCandless (JIRA) at Jul 27, 2010 at 11:21 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1799:
    ---------------------------------------

    Attachment: LUCENE-1799.patch

    Just inlines the 2-byte diff case.
  • Michael McCandless (JIRA) at Jul 27, 2010 at 11:47 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-1799:
    ---------------------------------------

    Attachment: LUCENE-1799.patch

    Inlines/unwinds the 3-byte cases. I think we can leave the 4-byte case as a for loop...
  • Robert Muir (JIRA) at Jul 28, 2010 at 1:03 am
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799.patch

    removed some ifs for the positive unrolled cases.
  • Robert Muir (JIRA) at Jul 28, 2010 at 12:12 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799.patch

    I optimized the surrogate case here, moving it into the 'prev' calculation.
    Now we are faster than UTF-8 on average for encoding.
    encoding | time to encode 20 million strings (ms) | number of encoded bytes
    UTF-8    | 1,756                                  | 596,516,000
    BOCU-1   | 1,724                                  | 250,202,000
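
To put the table in proportion, the arithmetic can be worked out directly from the posted figures. The sketch below is not part of the patch; the class and method names (`EncodingStats`, `sizeRatio`, `bytesPerString`) are made up for illustration, and only the three numbers from the table are taken from the thread.

```java
public class EncodingStats {
    /** Fraction of the baseline size that the compressed output occupies. */
    public static double sizeRatio(long baselineBytes, long compressedBytes) {
        return (double) compressedBytes / baselineBytes;
    }

    /** Average number of encoded bytes per string. */
    public static double bytesPerString(long totalBytes, long strings) {
        return (double) totalBytes / strings;
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark table in the thread.
        long strings = 20_000_000L;
        long utf8Bytes = 596_516_000L;
        long bocu1Bytes = 250_202_000L;

        System.out.println("BOCU-1 / UTF-8 size ratio: " + sizeRatio(utf8Bytes, bocu1Bytes));
        System.out.println("UTF-8 bytes per string:  " + bytesPerString(utf8Bytes, strings));
        System.out.println("BOCU-1 bytes per string: " + bytesPerString(bocu1Bytes, strings));
    }
}
```

On these numbers the BOCU-1 output is roughly 42% of the UTF-8 size (about 12.5 versus 29.8 bytes per string), while the encode times differ by under 2%.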
  • Robert Muir (JIRA) at Jul 28, 2010 at 12:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799.patch

    oops, forgot a check in the surrogate case.
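
The thread does not say which check was missing. A common one when walking UTF-16 input is confirming that a high surrogate is actually followed by a low surrogate before combining the two. The sketch below illustrates that check using standard `java.lang.Character` methods; the class name `SurrogateScan` is hypothetical, and throwing on an unpaired surrogate is an assumption, not necessarily what the patch does.

```java
import java.util.Arrays;

public class SurrogateScan {
    /** Converts UTF-16 text to code points, combining valid surrogate pairs. */
    public static int[] toCodePoints(CharSequence s) {
        int[] out = new int[s.length()];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int cp = c;
            if (Character.isHighSurrogate(c)) {
                // The easy-to-forget check: a low surrogate must actually follow.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    cp = Character.toCodePoint(c, s.charAt(++i));
                } else {
                    // Assumption: reject malformed input rather than encode it.
                    throw new IllegalArgumentException("unpaired high surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unpaired low surrogate at index " + i);
            }
            out[n++] = cp;
        }
        // Trim to the number of code points actually produced.
        return Arrays.copyOf(out, n);
    }
}
```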
  • Robert Muir (JIRA) at Jul 28, 2010 at 2:14 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: LUCENE-1799.patch

    Here it is with a first stab at the decoder (it's correct against random ICU strings, but I didn't benchmark it yet).
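
Checking a codec against random strings usually means a round-trip test: generate random valid Unicode text, encode it, decode it, and compare with the original. The actual test in the patch is not shown in the thread, so the following is only a sketch of that idea. The JDK's UTF-8 codec stands in for the BOCU-1 encoder and decoder, and the names `RoundTripCheck`, `randomUnicodeString` and `countFailures` are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class RoundTripCheck {
    /** Builds a random string of valid Unicode scalar values (no lone surrogates). */
    public static String randomUnicodeString(Random random, int maxCodePoints) {
        int length = random.nextInt(maxCodePoints + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int cp;
            do {
                cp = random.nextInt(Character.MAX_CODE_POINT + 1);
            } while (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    /** Returns how many random strings failed to survive encode followed by decode. */
    public static int countFailures(long seed, int iterations) {
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            String original = randomUnicodeString(random, 20);
            // Stand-in codec: swap in the encoder/decoder under test here.
            byte[] encoded = original.getBytes(StandardCharsets.UTF_8);
            String decoded = new String(encoded, StandardCharsets.UTF_8);
            if (!original.equals(decoded)) {
                failures++;
            }
        }
        return failures;
    }
}
```

A fixed seed is a deliberate choice here: when a random case fails, rerunning with the same seed reproduces it.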
  • Robert Muir (JIRA) at Jul 28, 2010 at 6:54 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-1799:
    --------------------------------

    Attachment: Benchmark.java

    Attached is my benchmark for English text.

    UTF-8: 15530ms
    BOCU-1: 15687ms

    Note: I use a Sun JVM 1.6.0_19 (64-bit).

    Yonik, if you run this benchmark and find a problem with it, or it's slower on your machine, let me know your configuration, because I don't see the results you do.
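
The attached Benchmark.java is not reproduced in this archive. For readers who want to see the general shape of such a measurement, here is a minimal timing sketch. It is an assumption about how one might structure it, not the attached file: it times only the JDK's UTF-8 encoder, uses a tiny hypothetical word list, and includes a warm-up pass because JIT compilation can otherwise dominate short runs and produce diverging results between machines.

```java
import java.nio.charset.StandardCharsets;

public class EncodeTiming {
    /** Encodes every input `rounds` times and returns the total bytes produced. */
    public static long encodeAll(String[] inputs, int rounds) {
        long totalBytes = 0;
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                // Accumulating the length keeps the work from being optimized away.
                totalBytes += s.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        String[] inputs = {"the", "quick", "brown", "fox"}; // placeholder English terms
        encodeAll(inputs, 100_000); // warm-up pass, result ignored

        long start = System.nanoTime();
        long bytes = encodeAll(inputs, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("UTF-8: " + elapsedMs + "ms, " + bytes + " bytes");
    }
}
```

When comparing numbers across machines, it helps to report the JVM version and bitness alongside the timings, as done above.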
  • Yonik Seeley (JIRA) at Jul 28, 2010 at 7:32 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Yonik Seeley updated LUCENE-1799:
    ---------------------------------

    Attachment: Benchmark.java
  • Yonik Seeley (JIRA) at Jul 28, 2010 at 7:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Yonik Seeley updated LUCENE-1799:
    ---------------------------------

    Attachment: Benchmark.java

    OK, hopefully the right Benchmark.java this time ;-)

Discussion Overview
group: dev @
categories: lucene
posted: Jul 20, '10 at 7:43p
active: Jul 28, '10 at 7:43p
posts: 25
users: 1
website: lucene.apache.org

1 user in discussion

Yonik Seeley (JIRA): 25 posts
