Grokbase Groups Lucene dev June 2011
FAQ
hmm are you concerned about the extra Math.min that happens in the
copyOf method?
I don't how that relates to "intrinsic" and java 1.7

I don't have strong feelings here just checking if you mix something
up in the comment you put there... I am happy to keep the old and now
current code

simon
On Thu, Jun 30, 2011 at 2:42 PM, wrote:
Author: rmuir
Date: Thu Jun 30 12:42:17 2011
New Revision: 1141510

URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
Log:
LUCENE-3239: remove use of slow Arrays.copyOf

Modified:
lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff
==============================================================================
--- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original)
+++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
@@ -2,7 +2,6 @@ package org.apache.lucene.util;

import java.io.IOException;
import java.io.OutputStream;
-import java.util.Arrays;

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
@@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
}

private void grow(int newLength) {
-    buffer = Arrays.copyOf(buffer, newLength);
+    // It actually should be: (Java 1.7, when its intrinsic on all machines)
+    // buffer = Arrays.copyOf(buffer, newLength);
+    byte[] newBuffer = new byte[newLength];
+    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
+    buffer = newBuffer;
}

/**

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Search Discussions

  • Uwe Schindler at Jun 30, 2011 at 12:56 pm
    Hi Robert,

    you reverted a use of Arrays.copyOf() on native types which is *exactly* implemented like this in Arrays.java code!

    The slow ones are <T> T[] copyOf(T[] array, int newlen)

    (because they use reflection).

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: rmuir@apache.org
    Sent: Thursday, June 30, 2011 2:42 PM
    To: commits@lucene.apache.org
    Subject: svn commit: r1141510 -
    /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:

    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB
    yteArrayOutputStream.java

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB
    yteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/or
    g/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1
    =1141509&r2=1141510&view=diff
    ==========================================================
    ====================
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeB
    yteArrayOutputStream.java (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    +++ eByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    - buffer = Arrays.copyOf(buffer, newLength);
    + // It actually should be: (Java 1.7, when its intrinsic on all machines)
    + // buffer = Arrays.copyOf(buffer, newLength);
    + byte[] newBuffer = new byte[newLength];
    + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    + buffer = newBuffer;
    }

    /**


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir at Jun 30, 2011 at 1:06 pm
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM,  wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

    Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff
    ==============================================================================
    --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original)
    +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    -    buffer = Arrays.copyOf(buffer, newLength);
    +    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    +    // buffer = Arrays.copyOf(buffer, newLength);
    +    byte[] newBuffer = new byte[newLength];
    +    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    +    buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Simon Willnauer at Jun 30, 2011 at 1:09 pm
    Robert I agree but doesn't that apply to Arrays.copyOf(Object[],int)
    only? here we use a specialized primitive version?

    simon
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM,  wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

    Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff
    ==============================================================================
    --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original)
    +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    -    buffer = Arrays.copyOf(buffer, newLength);
    +    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    +    // buffer = Arrays.copyOf(buffer, newLength);
    +    byte[] newBuffer = new byte[newLength];
    +    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    +    buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Dawid Weiss at Jun 30, 2011 at 1:12 pm
    Arrays.copyOf(primitive) is actually System.arraycopy by default. If
    intrinsics are used it can only get faster. For object types it will
    probably be a bit slower for -client because of a runtime check for
    the component type.

    Dawid
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM,  wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

    Modified: lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff
    ==============================================================================
    --- lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java (original)
    +++ lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    -    buffer = Arrays.copyOf(buffer, newLength);
    +    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    +    // buffer = Arrays.copyOf(buffer, newLength);
    +    byte[] newBuffer = new byte[newLength];
    +    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    +    buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Uwe Schindler at Jun 30, 2011 at 1:26 pm
    We had an issue about this with FST's array growing in Mike's code, in facts ist *much* slower for generic Arrays' T[] copyOf(T[]...), with T extends Object (uses slow reflection).

    For primitives it can only get faster in later JVMs, this is why we want to change all ArrayUtils.grow() to use this (and we don’t have a generic one there for above reason).

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: dawid.weiss@gmail.com On Behalf
    Of Dawid Weiss
    Sent: Thursday, June 30, 2011 3:11 PM
    To: dev@lucene.apache.org
    Subject: Re: svn commit: r1141510 -
    /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    Arrays.copyOf(primitive) is actually System.arraycopy by default. If intrinsics
    are used it can only get faster. For object types it will probably be a bit slower
    for -client because of a runtime check for the component type.

    Dawid
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM, wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java
    /org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510
    &r1=1141509&r2=1141510&view=diff
    ==========================================================
    ==========
    ==========
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/U
    +++ nsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or
    more @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    - buffer = Arrays.copyOf(buffer, newLength);
    + // It actually should be: (Java 1.7, when its intrinsic on all
    + machines)
    + // buffer = Arrays.copyOf(buffer, newLength);
    + byte[] newBuffer = new byte[newLength];
    + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    + buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
    additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
    additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
    commands, e-mail: dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Simon Willnauer at Jun 30, 2011 at 1:40 pm

    On Thu, Jun 30, 2011 at 3:26 PM, Uwe Schindler wrote:
    We had an issue about this with FST's array growing in Mike's code, in facts ist *much* slower for generic Arrays' T[] copyOf(T[]...), with T extends Object (uses slow reflection).

    For primitives it can only get faster in later JVMs, this is why we want to change all ArrayUtils.grow() to use this (and we don’t have a generic one there for above reason).
    +1 - I don't see why this would be any slower... if we can get
    improvements we should go for it. The issues and bugreports are all
    for non-primitive copyOf methods so I don't see how this should affect
    us

    simon
    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: dawid.weiss@gmail.com On Behalf
    Of Dawid Weiss
    Sent: Thursday, June 30, 2011 3:11 PM
    To: dev@lucene.apache.org
    Subject: Re: svn commit: r1141510 -
    /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    Arrays.copyOf(primitive) is actually System.arraycopy by default. If intrinsics
    are used it can only get faster. For object types it will probably be a bit slower
    for -client because of a runtime check for the component type.

    Dawid
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM,  wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java
    /org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510
    &r1=1141509&r2=1141510&view=diff
    ==========================================================
    ==========
    ==========
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsaf
    eByteArrayOutputStream.java (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/U
    +++ nsafeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or
    more @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    -    buffer = Arrays.copyOf(buffer, newLength);
    +    // It actually should be: (Java 1.7, when its intrinsic on all
    + machines)
    +    // buffer = Arrays.copyOf(buffer, newLength);
    +    byte[] newBuffer = new byte[newLength];
    +    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    +    buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
    additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
    additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
    commands, e-mail: dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Robert Muir at Jun 30, 2011 at 2:44 pm
    I think Dawid is correct here... so we should change this back? still
    personally when I see array clone() or copyOf() it makes me concerned, I
    know these are as fast as arraycopy in recent versions, but depending on
    which variant is used, and whether you use -server, can be slower... in
    general I just don't want us to have performance regressions on say windows
    32bit over this stuff, personally I think arraycopy is a sure fire bet every
    time, but Ill concede the point that copyOf might not be slower for the
    primitive versions... I think in jdk7 we will not have this issue as -client
    sorta goes away in favor of the tiered thing? anyway, I think we should
    proceed with caution here as far as moving things over to copyOf, I don't
    see any evidence that its ever faster, but its definitely sometimes slower.
    On Jun 30, 2011 9:12 AM, "Dawid Weiss" wrote:
    Arrays.copyOf(primitive) is actually System.arraycopy by default. If
    intrinsics are used it can only get faster. For object types it will
    probably be a bit slower for -client because of a runtime check for
    the component type.

    Dawid
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM, wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff
    >>>>
    ==============================================================================
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    - buffer = Arrays.copyOf(buffer, newLength);
    + // It actually should be: (Java 1.7, when its intrinsic on all
    machines)
    + // buffer = Arrays.copyOf(buffer, newLength);
    + byte[] newBuffer = new byte[newLength];
    + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    + buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Simon Willnauer at Jun 30, 2011 at 6:32 pm

    On Thu, Jun 30, 2011 at 4:44 PM, Robert Muir wrote:
    I think Dawid is correct here... so we should change this back? still
    personally when I see array clone() or copyOf() it makes me concerned, I
    know these are as fast as arraycopy in recent versions, but depending on
    which variant is used, and whether you use -server, can be slower... in
    general I just don't want us to have performance regressions on say windows
    32bit over this stuff, personally I think arraycopy is a sure fire bet every
    time, but Ill concede the point that copyOf might not be slower for the
    primitive versions... I think in jdk7 we will not have this issue as -client
    sorta goes away in favor of the tiered thing? anyway, I think we should
    proceed with caution here as far as moving things over to copyOf, I don't
    see any evidence that its ever faster, but its definitely sometimes slower.
    I don't seen any evidence that this is any slower though.

    simon
    On Jun 30, 2011 9:12 AM, "Dawid Weiss" wrote:
    Arrays.copyOf(primitive) is actually System.arraycopy by default. If
    intrinsics are used it can only get faster. For object types it will
    probably be a bit slower for -client because of a runtime check for
    the component type.

    Dawid
    On Thu, Jun 30, 2011 at 3:05 PM, Robert Muir wrote:
    because on windows 32bit at least, -client is still the default on
    most jres out there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports
    (oracle's site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I
    think we should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM,  wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:

    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r1=1141509&r2=1141510&view=diff

    ==============================================================================
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
    Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    -    buffer = Arrays.copyOf(buffer, newLength);
    +    // It actually should be: (Java 1.7, when its intrinsic on all
    machines)
    +    // buffer = Arrays.copyOf(buffer, newLength);
    +    byte[] newBuffer = new byte[newLength];
    +    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    +    buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Dawid Weiss at Jun 30, 2011 at 6:51 pm
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is tricky
    because x64 will pick -server regardless of the command-line setting)
    and you need to be copying generic arrays. I think this can be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.

    Dawid

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Michael McCandless at Jun 30, 2011 at 6:58 pm
    I think it's important Lucene keeps good performance on "ordinary"
    machines/envs.

    It's really quite dangerous that the active Lucene devs all use beasts
    for development/testing. We draw false conclusions.

    So we really should be testing with -client and if indeed generified
    Arrays.copyOf (and anything else) is risky in such envs we should not
    use it when System.arraycopy works more consistently.

    Mike McCandless

    http://blog.mikemccandless.com

    On Thu, Jun 30, 2011 at 2:50 PM, Dawid Weiss
    wrote:
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is tricky
    because x64 will pick -server regardless of the command-line setting)
    and you need to be copying generic arrays. I think this can be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.

    Dawid

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Dawid Weiss at Jun 30, 2011 at 7:10 pm

    I think it's important Lucene keeps good performance on "ordinary"
    machines/envs.
    Not that this voice will help in anything, but I think the above is
    virtually impossible to achieve unless you have a bunch of machines,
    OSs and VMs to continually test on and a consistent set of benchmarks
    plotted over time... and of course check every single commit for
    regression over all these combinations. And even then you'd always
    find a case of something being faster or slower on some combination of
    hardware/ software; optimizing for these differences makes little
    sense to me (people struggling with performance on some weird
    software/hardware combination can always change the VM vendor or a VM
    switch).

    Sorry for being so pessimistically unconstructive... :(

    Dawid

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Michael McCandless at Jun 30, 2011 at 7:25 pm
    Fair enough, and I agree.

    Though the least we could do is rotate in a Windows env, where Java
    runs with -client, to our Jenkins.

    But simple-to-follow rules like "Don't use Arrays.copyOf; use
    System.arraycopy instead" (if indeed System.arraycopy seems to
    generally not be slower) seem like a no-brainer.

    Why risk Arrays.copyOf, anytime? Shouldn't we never use it...?

    Mike McCandless

    http://blog.mikemccandless.com

    On Thu, Jun 30, 2011 at 3:09 PM, Dawid Weiss
    wrote:
    I think it's important Lucene keeps good performance on "ordinary"
    machines/envs.
    Not that this voice will help in anything, but I think the above is
    virtually impossible to achieve unless you have a bunch of machines,
    OSs and VMs to continually test on and a consistent set of benchmarks
    plotted over time... and of course check every single commit for
    regression over all these combinations. And even then you'd always
    find a case of something being faster or slower on some combination of
    hardware/ software; optimizing for these differences makes little
    sense to me (people struggling with performance on some weird
    software/hardware combination can always change the VM vendor or a VM
    switch).

    Sorry for being so pessimistically unconstructive... :(

    Dawid

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Simon Willnauer at Jun 30, 2011 at 8:45 pm

    On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss wrote:
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is tricky
    because x64 will pick -server regardless of the command-line setting)
    and you need to be copying generic arrays. I think this can be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.
    this is what I want to see. I don't want to discuss based on some bug
    reported for a non-primitive version of copyOf thats all.
    its pointless to discuss if there is no evidence which I don't see. I
    am happy with arraycopy I would just have appreciated a discussion
    before backing the change out.

    simon
    Dawid
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Michael McCandless at Jun 30, 2011 at 10:20 pm

    On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer wrote:
    On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss
    wrote:
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is tricky
    because x64 will pick -server regardless of the command-line setting)
    and you need to be copying generic arrays. I think this can be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.
    this is what I want to see. I don't want to discuss based on some bug
    reported for a non-primitive version of copyOf thats all.
    its pointless to discuss if there is no evidence which I don't see. I
    am happy with arraycopy I would just have appreciated a discussion
    before backing the change out.
    I think the burden of proof here is on Arrays.copyOf.

    Ie, until we can prove (through benchmarking in different envs) that
    it can be trusted, we should just stick with System.arraycopy to
    reduce the risk.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Simon Willnauer at Jul 1, 2011 at 5:47 am

    On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless wrote:
    On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer
    wrote:
    On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss
    wrote:
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is tricky
    because x64 will pick -server regardless of the command-line setting)
    and you need to be copying generic arrays. I think this can be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.
    this is what I want to see. I don't want to discuss based on some bug
    reported for a non-primitive version of copyOf thats all.
    its pointless to discuss if there is no evidence which I don't see. I
    am happy with arraycopy I would just have appreciated a discussion
    before backing the change out.
    I think the burden of proof here is on Arrays.copyOf.

    Ie, until we can prove (through benchmarking in different envs) that
    it can be trusted, we should just stick with System.arraycopy to
    reduce the risk.
    Mike I think we should do that but the real issue here is what if
    somebody comes up with any arbitrary method in the future claiming its
    slow we back out and use the "we think safe way" what if it is
    actually the other way around and copyOf is optimized by new VMs and
    the copyarray is slightly slower. If robert would not have reverted
    this commit we would have not discussed that her at all. I just want
    to prevent the "we should not do this" it might be a problem in the
    big picture while the microbenchmark doesn't show a difference. At
    some point we have to rely on the JVM.
    Even if we benchmark on all OS with various JVMs we can't prevent jvm
    bug based perf hits. While there was no bug reported for primitives
    here we don't have to be afraid of it either. I don't think its saver
    to use arraycopy at all its just a level of indirection safer but
    makes development more of a pain IMO.

    Just my $0.05
    Mike
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Uwe Schindler at Jul 1, 2011 at 6:52 am
    Hi,

    I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL):

    Arrays.java:
    /**
    * Copies the specified array, truncating or padding with zeros (if necessary)
    * so the copy has the specified length. For all indices that are
    * valid in both the original array and the copy, the two arrays will
    * contain identical values. For any indices that are valid in the
    * copy but not the original, the copy will contain <tt>(byte)0</tt>.
    * Such indices will exist if and only if the specified length
    * is greater than that of the original array.
    *
    * @param original the array to be copied
    * @param newLength the length of the copy to be returned
    * @return a copy of the original array, truncated or padded with zeros
    * to obtain the specified length
    * @throws NegativeArraySizeException if <tt>newLength</tt> is negative
    * @throws NullPointerException if <tt>original</tt> is null
    * @since 1.6
    */
    public static byte[] copyOf(byte[] original, int newLength) {
    byte[] copy = new byte[newLength];
    System.arraycopy(original, 0, copy, 0,
    Math.min(original.length, newLength));
    return copy;
    }


    This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream):

    private void grow(int newLength) {
    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    // buffer = Arrays.copyOf(buffer, newLength);
    byte[] newBuffer = new byte[newLength];
    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    buffer = newBuffer;
    }

    So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth?

    The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection.


    To come back to UnsafeByteArrayOutputStream:

    I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward].

    The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.

    Uwe
    (getting crazy)

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Simon Willnauer
    Sent: Friday, July 01, 2011 7:47 AM
    To: Michael McCandless
    Cc: dev@lucene.apache.org; Dawid Weiss
    Subject: Re: svn commit: r1141510 -
    /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless
    wrote:
    On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer
    wrote:
    On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss
    wrote:
    I don't seen any evidence that this is any slower though.
    You need to run with -client (if the machine is a beast this is
    tricky because x64 will pick -server regardless of the command-line
    setting) and you need to be copying generic arrays. I think this can
    be shown
    -- a caliper benchmark would be perfect to demonstrate this in
    isolation; I may write one if I find a spare moment.
    this is what I want to see. I don't want to discuss based on some bug
    reported for a non-primitive version of copyOf thats all.
    its pointless to discuss if there is no evidence which I don't see. I
    am happy with arraycopy I would just have appreciated a discussion
    before backing the change out.
    I think the burden of proof here is on Arrays.copyOf.

    Ie, until we can prove (through benchmarking in different envs) that
    it can be trusted, we should just stick with System.arraycopy to
    reduce the risk.
    Mike I think we should do that but the real issue here is what if somebody
    comes up with any arbitrary method in the future claiming its slow we back
    out and use the "we think safe way" what if it is actually the other way
    around and copyOf is optimized by new VMs and the copyarray is slightly
    slower. If robert would not have reverted this commit we would have not
    discussed that her at all. I just want to prevent the "we should not do this" it
    might be a problem in the big picture while the microbenchmark doesn't
    show a difference. At some point we have to rely on the JVM.
    Even if we benchmark on all OS with various JVMs we can't prevent jvm bug
    based perf hits. While there was no bug reported for primitives here we
    don't have to be afraid of it either. I don't think its saver to use arraycopy at
    all its just a level of indirection safer but makes development more of a pain
    IMO.

    Just my $0.05
    Mike
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
    commands, e-mail: dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Michael McCandless at Jul 1, 2011 at 1:13 pm

    On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler wrote:
    Hi,

    I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL):
    This is the source code for a specific version of one specific Java
    impl. If we knew all Java impls simply implemented the primitive case
    using System.arraycopy (admittedly it's hard to imagine that they
    wouldn't!) then we are fine.
    This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream):

    private void grow(int newLength) {
    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    // buffer = Arrays.copyOf(buffer, newLength);
    byte[] newBuffer = new byte[newLength];
    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    buffer = newBuffer;
    }

    So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth?
    Right, in this case (if you used OpenJDK 6) we are obviously OK. Not
    sure about other cases...
    The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection.
    OK I'm convinced (I think!) that for primitive types only, let's use
    Arrays.copyOf!
    To come back to UnsafeByteArrayOutputStream:

    I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward].
    Well, it sounds like for primitive types, we can cutover
    ArrayUtils.grow methods. Then we can look @ the nightly bench the
    next day ;)

    But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
    (almost) a dup of ByteArrayDataOutput?
    The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.
    That sounds good!

    Uwe can you commit TODOs to the code w/ these ideas?

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Shai Erera at Jul 1, 2011 at 4:07 pm
    About the encoders package - there are several encoders there besides
    VInt, so I wouldn't dispose of it so quickly. That said, I think we
    should definitely explore consolidating VInt with the core classes,
    and maybe write an encoder which delegate to them.

    Or, come up w/ a different approach for allowing to plug in different
    Encoders. I don't rule out anything, as long as we preserve
    functionality and capabilities.

    Shai
    On Friday, July 1, 2011, Michael McCandless wrote:
    On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler wrote:
    Hi,

    I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed under GPL):
    This is the source code for a specific version of one specific Java
    impl.  If we knew all Java impls simply implemented the primitive case
    using System.arraycopy (admittedly it's hard to imagine that they
    wouldn't!) then we are fine.
    This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream):

    private void grow(int newLength) {
    // It actually should be: (Java 1.7, when its intrinsic on all machines)
    // buffer = Arrays.copyOf(buffer, newLength);
    byte[] newBuffer = new byte[newLength];
    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    buffer = newBuffer;
    }

    So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth?
    Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
    sure about other cases...
    The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The idea was only to replace primitive type code to make it easier readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general, if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic or has too much safety checks or uses reflection.
    OK I'm convinced (I think!) that for primitive types only, let's use
    Arrays.copyOf!
    To come back to UnsafeByteArrayOutputStream:

    I would change the whole code, as I don’t like the allocation strategy in it (it's exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward].
    Well, it sounds like for primitive types, we can cutover
    ArrayUtils.grow methods.  Then we can look @ the nightly bench the
    next day ;)

    But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
    (almost) a dup of ByteArrayDataOutput?
    The other thing I don’t like in the new faceting module is duplication of vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere. This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.
    That sounds good!

    Uwe can you commit TODOs to the code w/ these ideas?

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Michael McCandless at Jul 1, 2011 at 1:05 pm

    On Fri, Jul 1, 2011 at 1:47 AM, Simon Willnauer wrote:

    Mike I think we should do that but the real issue here is what if
    somebody comes up with any arbitrary method in the future claiming its
    slow we back out and use the "we think safe way" what if it is
    actually the other way around and copyOf is optimized by new VMs and
    the copyarray is slightly slower.
    I think we take it case by case? We do need to be careful when using
    low level ops in Java.

    In this specific case, we have been almost burned already by
    Arrays.copyOf, and never by System.arraycopy (that I'm aware of), so
    we should not cutover until we have some confidence that burning will
    not occur. Other cases will be different, but this one has enough
    history that the bias seems clear.
    I just want
    to prevent the "we should not do this" it might be a problem in the
    big picture while the microbenchmark doesn't show a difference. At
    some point we have to rely on the JVM.
    I agree, but generally the burden of proof is on the "new one". Just
    because we can use Java 1.6 code now doesn't mean we should blindly
    cutover to new stuff.

    System.arraycopy is tried & true and we've never hit a perf issue with
    it, in my memory.
    Even if we benchmark on all OS with various JVMs we can't prevent jvm
    bug based perf hits.
    Right, nothing is perfect here; this is just about mitigating risk.
    We were lucky to have tracked down this slowdown down last time, I
    think only because Robert was using a -client JVM at the time?
    While there was no bug reported for primitives
    here we don't have to be afraid of it either. I don't think its saver
    to use arraycopy at all its just a level of indirection safer but
    makes development more of a pain IMO.
    Yes it's a slight hassle (2 extra lines), but if this mitigates risk,
    it's worth it.

    That said, if we can convince ourselves that in the primitives only
    case, Arrays.copyOf has very little risk, then I'm OK w/ using it for
    those cases.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org
  • Uwe Schindler at Jun 30, 2011 at 1:13 pm
    Robert,

    as noted in my other eMail, ist only slow for the generic Object[] method (as it uses j.l.reflect.Array.newInstance(Class componentType)). We are talking here about byte[], and the Arrays method is implemented with the same 3 lines of code, Simon replaced. The only difference is a Math.min() which is intrinsic (it is used, as Arrays.copyOf supports shrinking size, so the System.arrayCopy() needs upper limit to not AIOOBE).

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Robert Muir
    Sent: Thursday, June 30, 2011 3:05 PM
    To: dev@lucene.apache.org; simon.willnauer@gmail.com
    Cc: commits@lucene.apache.org
    Subject: Re: svn commit: r1141510 -
    /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    because on windows 32bit at least, -client is still the default on most jres out
    there.

    i realize people don't care about -client, but i will police
    foo[].clone() / arrays.copyOf etc to prevent problems.

    There are comments about this stuff on the relevant bug reports (oracle's
    site is down, sorry) linked to this issue.
    https://issues.apache.org/jira/browse/LUCENE-2674

    Sorry, I don't think we should use foo[].clone() / arrays.copyOf, I think we
    should always use arraycopy.

    On Thu, Jun 30, 2011 at 8:55 AM, Simon Willnauer
    wrote:
    hmm are you concerned about the extra Math.min that happens in the
    copyOf method?
    I don't how that relates to "intrinsic" and java 1.7

    I don't have strong feelings here just checking if you mix something
    up in the comment you put there... I am happy to keep the old and now
    current code

    simon
    On Thu, Jun 30, 2011 at 2:42 PM, wrote:
    Author: rmuir
    Date: Thu Jun 30 12:42:17 2011
    New Revision: 1141510

    URL: http://svn.apache.org/viewvc?rev=1141510&view=rev
    Log:
    LUCENE-3239: remove use of slow Arrays.copyOf

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java

    Modified:
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java
    URL:
    http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/src/java/
    org/apache/lucene/util/UnsafeByteArrayOutputStream.java?rev=1141510&r
    1=1141509&r2=1141510&view=diff
    ==========================================================
    ===========
    =========
    ---
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe
    ByteArrayOutputStream.java (original)
    +++
    lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Un
    +++ safeByteArrayOutputStream.java Thu Jun 30 12:42:17 2011
    @@ -2,7 +2,6 @@ package org.apache.lucene.util;

    import java.io.IOException;
    import java.io.OutputStream;
    -import java.util.Arrays;

    /**
    * Licensed to the Apache Software Foundation (ASF) under one or more
    @@ -72,7 +71,11 @@ public class UnsafeByteArrayOutputStream
    }

    private void grow(int newLength) {
    - buffer = Arrays.copyOf(buffer, newLength);
    + // It actually should be: (Java 1.7, when its intrinsic on all
    + machines)
    + // buffer = Arrays.copyOf(buffer, newLength);
    + byte[] newBuffer = new byte[newLength];
    + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    + buffer = newBuffer;
    }

    /**

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
    additional commands, e-mail: dev-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
    commands, e-mail: dev-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: dev-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupdev @
categorieslucene
postedJun 30, '11 at 12:56p
activeJul 1, '11 at 4:07p
posts21
users6
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase