Grokbase Groups R r-devel June 2008
I came to report this same bug and found it already in the trash, but
I slightly disagree with that assessment. If it's not a bug, then
perhaps it's a feature request. Comments at the end.
On Mon, May 14, 2007, Duncan Murdoch wrote:
On 13/05/2007 8:46 PM, scott.wilkinson at csiro.au wrote:

In the example below round() does not report to the specified number of
digits when the last digit to be reported is zero: Compare behaviour for
0.897575 and 0.946251. Ditto for signif(). The number of sigfigs is
ambiguous unless the reader knows this behaviour. Is this a bug or
intended behaviour? Is there a work-around?
It's not a bug. It has nothing to do with round(), it is the way R
prints numbers by default. If you ask to print 0.90, you'll get

[1] 0.9

because 0.9 and 0.90 are the same number. If you want trailing zeros to
print, you need to specify a format to do that, e.g.
noquote(format(0.9, nsmall=2))
[1] 0.90

The noquote stops the "" from printing. You could also use sprintf() or
formatC() for more C-like format specifications.
All of those options require you to specify the number of digits after
the decimal, don't they? Unless sprintf would default to the number of
decimal places passed to it, but it doesn't:
sprintf("%f",signif(0.90, digits=2))
[1] "0.900000";

it defaults to 6. Although %g behaves differently,
sprintf("%g",signif(0.90, digits=2))
[1] "0.9",

this clearly still isn't the desired behavior.
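For reference, the %f default and a possible workaround can both be shown directly: formatC()'s "fg" format counts *significant* digits, and its "#" flag (passed through to C sprintf, see ?formatC) keeps trailing zeros, which is close to what is being asked for in this thread. A sketch:

```r
# sprintf's %f always defaults to 6 decimal places, whatever was passed in:
sprintf("%f", signif(0.90, digits = 2))
# [1] "0.900000"

# formatC's "fg" format counts significant digits, and flag = "#" keeps
# trailing zeros (see ?formatC):
formatC(0.90, digits = 2, format = "fg", flag = "#")
# [1] "0.90"
formatC(18.423, digits = 4, format = "fg", flag = "#")
# [1] "18.42"
```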

To continue that vein, the same issue with rounding versus printing
occurs with vectors:
sapply(c(1:6),function(a){signif(c(18.423,0.90),digits=a)})
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 20.0 18.0 18.4 18.42 18.423 18.423
[2,] 0.9 0.9 0.9 0.90 0.900 0.900

Trying to get that and more complicated tables to print the correct
number of significant digits gets pretty hairy with sprintf(). I could
be wrong, but I would view the primary purpose of rounding to
significant digits the printed output of the number. That there
doesn't seem to be a way to do this without later having to specify
the number of decimal places would seem to render signif() as it's
written not particularly useful.

There are two solutions I can think of off the top of my head. The
first is to create a new data type, a fixed-length real number; the
easier way would be to have a function that returns a string,
something like this:

signif.string <- function(signum,sigdigs){
  left <- nchar(trunc(signum))
  right <- nchar(signum-trunc(signum))-2
  if (abs(signum)<1 | signum<0) {left<-left-1}
  if (right<0) {right<-0}
  if (sigdigs<left) {return(as.character(signif(signum,digits=sigdigs)))}
  else if (sigdigs==left) {return(paste(round(signum),".",sep=""))}
  else if (sigdigs<=left+right) {return(format(signum,digits=sigdigs))}
  else {return(sprintf(paste("%.",sigdigs-left,"f",sep=""),signum))}
}

This is just a skeleton that I think suits my needs for the moment and
might also cover the original poster's. One for production would need
to handle scientific notation and would probably want to fix the
rounding-of-fives behavior on Windows.

Pat Carr

version.string R version 2.7.0 (2008-04-22)


  • Pmc1 at Jun 3, 2008 at 1:30 am
    To reply to my own message, that function wasn't quite right. I think
    this one works better:

    signif.string <- function(signum,sigdigs){
      test <- abs(signum)
      left <- nchar(trunc(test))
      right <- nchar(test)-left-1
      if (test<1) {left<-left-1}
      if (right<0) {right<-0}
      if (sigdigs<left) {out<-as.character(signif(signum,digits=sigdigs))}
      else if (sigdigs==left & trunc(signum) %% 10 == 0)
        {out<-paste(round(signum),".",sep="")}
      else if (sigdigs<=left+right) {out<-format(signum,digits=sigdigs)}
      else {out<-sprintf(paste("%.",sigdigs-left,"f",sep=""),signum)}
      return(noquote(out))
    }

    But it should still have error checking and vector capability, yadda
    yadda. Also, I forgot what year it was, so sorry, Scott, for spamming
    you with something you're hopefully not still stuck on.
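    A quick check of the revised function (a hypothetical session; the expected values follow from tracing the branches above, and the definition is repeated so the block is self-contained):

    ```r
    signif.string <- function(signum,sigdigs){
      test <- abs(signum)
      left <- nchar(trunc(test))
      right <- nchar(test)-left-1
      if (test<1) {left<-left-1}
      if (right<0) {right<-0}
      if (sigdigs<left) {out<-as.character(signif(signum,digits=sigdigs))}
      else if (sigdigs==left & trunc(signum) %% 10 == 0)
        {out<-paste(round(signum),".",sep="")}
      else if (sigdigs<=left+right) {out<-format(signum,digits=sigdigs)}
      else {out<-sprintf(paste("%.",sigdigs-left,"f",sep=""),signum)}
      return(noquote(out))
    }

    signif.string(0.90, 2)     # 0.90  (trailing zero kept)
    signif.string(18.423, 4)   # 18.42
    signif.string(120, 3)      # 120.  (trailing "." marks the 0 as significant)
    ```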

    Pat Carr
  • Duncan Murdoch at Jun 3, 2008 at 10:36 am

    pmc1 at cornell.edu wrote:
    I came to report this same bug and found it already in the trash, but
    I slightly disagree with that assessment. If it's not a bug, then
    perhaps it's a feature request. Comments at the end.
    On Mon, May 14, 2007, Duncan Murdoch wrote:

    On 13/05/2007 8:46 PM, scott.wilkinson at csiro.au wrote:

    In the example below round() does not report to the specified number of
    digits when the last digit to be reported is zero: Compare behaviour for
    0.897575 and 0.946251. Ditto for signif(). The number of sigfigs is
    ambiguous unless the reader knows this behaviour. Is this a bug or
    intended behaviour? Is there a work-around?
    It's not a bug. It has nothing to do with round(), it is the way R
    prints numbers by default. If you ask to print 0.90, you'll get

    [1] 0.9

    because 0.9 and 0.90 are the same number. If you want trailing zeros to
    print, you need to specify a format to do that, e.g.

    noquote(format(0.9, nsmall=2))
    [1] 0.90

    The noquote stops the "" from printing. You could also use sprintf() or
    formatC() for more C-like format specifications.
    All of those options require you to specify the number of digits after
    the decimal, don't they? Unless sprintf would default to the number of
    decimal places passed to it, but it doesn't:
    That specification doesn't make sense. There is no "number of decimal
    places passed to it". What sprintf() sees below is identical to what it
    would see if you called

    sprintf("%f", 0.9)

    because signif(0.90, digits=2) == 0.9. Those two objects are identical.
    sprintf("%f",signif(0.90, digits=2))
    [1] "0.900000";

    it defaults to 6. Although %g behaves differently,
    sprintf("%g",signif(0.90, digits=2))
    [1] "0.9",

    this clearly still isn't the desired behavior.
    Maybe not what you desired, but certainly reasonable behaviour.
    To continue that vein, the same issue with rounding versus printing
    occurs with vectors:
    sapply(c(1:6),function(a){signif(c(18.423,0.90),digits=a)})
    [,1] [,2] [,3] [,4] [,5] [,6]
    [1,] 20.0 18.0 18.4 18.42 18.423 18.423
    [2,] 0.9 0.9 0.9 0.90 0.900 0.900

    Trying to get that and more complicated tables to print the correct
    number of significant digits gets pretty hairy with sprintf(). I could
    be wrong, but I would view the primary purpose of rounding to
    significant digits the printed output of the number. That there
    doesn't seem to be a way to do this without later having to specify
    the number of decimal places would seem to render signif() as it's
    written not particularly useful.

    There are two solutions I can think of off the top of my head. The
    first is to create a new data type of a fixed length real number but
    the easier way would be to have a function that returns a string
    something like this:

    signif.string <- function(signum,sigdigs){
    left <- nchar(trunc(signum))
    right <- nchar(signum-trunc(signum))-2
    if (abs(signum)<1 | signum<0) {left<-left-1}
    if (right<0) {right<-0}
    if (sigdigs<left) {return(as.character(signif(signum,digits=sigdigs)))}
    else if (sigdigs==left) {return(paste(round(signum),".",sep=""))}
    else if (sigdigs<=left+right) {return(format(signum,digits=sigdigs))}
    else {return(sprintf(paste("%.",sigdigs-left,"f",sep=""),signum))}
    }


    This is just a skeleton that I think suits my needs for the moment and
    might also cover the original poster's. One for production would need
    to handle scientific notation and would probably want to fix the 5
    round behavior on windows.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10

    looks okay to me. If you want biased rounding instead (getting 2:11 as
    output), simply use trunc(x + 0.5) instead of round(x).
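    In other words (a sketch; round() follows the round-half-to-even rule on exact halves):

    ```r
    round(0.5)          # 0 -- an exact half goes to the nearest even integer
    round(1.5)          # 2
    round(2.5)          # 2 -- not 3
    trunc(2.5 + 0.5)    # 3 -- classic round-half-up, for positive x
    ```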

    Duncan Murdoch
    Pat Carr

    version.string R version 2.7.0 (2008-04-22)

    ______________________________________________
    R-devel at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-devel
  • Patrick Carr at Jun 3, 2008 at 3:43 pm

    On 6/3/08, Duncan Murdoch wrote:
    because signif(0.90, digits=2) == 0.9. Those two objects are identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.

    The original complainant (and I) expected this behavior from signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235

    OS X (10.5.2/intel) does not have that problem. But (on both windows and OS X):
    signif(12345.12345,digits)
    [1] 12345.12

    Pat Carr
  • Duncan Murdoch at Jun 3, 2008 at 6:48 pm

    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.

    The original complainant (and I) expected this behavior from signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number 12345
    is exactly representable, so it is exactly half-way between 12340 and
    12350, so 12340 is the right answer by the unbiased round-to-even rule.
    The number 0.12345 is not exactly representable, but (I think) it is
    represented by something slightly closer to 0.1235 than to 0.1234. So
    it looks as though Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong. Both are supposed to be
    using the 64 bit floating point standard, so they should both give the
    same answer: but the actual arithmetic is being done by run-time
    libraries that are outside our control to a large extent, and it looks
    as though the one on the Mac is less accurate than the one on Windows.


    But (on both windows and OS X):
    signif(12345.12345,digits)
    [1] 12345.12
    This is a different problem. The number is correctly computed as
    12345.12345 (or at least a representable number quite close to that),
    and then the default display rounds it some more. Set
    options(digits) to see it in its full glory.
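    For example (a sketch; format() with an explicit digits= shows the same effect without changing the global option):

    ```r
    x <- 12345.12345
    format(x, digits = 7)    # "12345.12"    -- the default display width
    format(x, digits = 12)   # "12345.12345" -- the stored value is intact
    ```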

    Duncan Murdoch
  • Patrick Carr at Jun 3, 2008 at 8:07 pm

    On 6/3/08, Duncan Murdoch wrote:
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number 12345 is
    exactly representable, so it is exactly half-way between 12340 and 12350, so
    12340 is the right answer by the unbiased round-to-even rule. The number
    0.12345 is not exactly representable, but (I think) it is represented by
    something slightly closer to 0.1235 than to 0.1234. So it looks as though
    Windows gets it right.
    Well, right within the limitations of binary floating-point
    arithmetic. Not right right.

    In the grander scheme, this is a nicety which is largely
    inconsequential: if I need a real measure of precision (precise
    precision?) I'll use a +/- notation with a propagated error, and/or
    edit the typography of the numbers by hand immediately before the
    final output. But again, final printed output of the number is
    basically the main use I see for a function that returns significant
    digits. And for that purpose I think it should be right right, and
    actually output the number of significant digits requested.
    signif(12345.12345,digits)
    [1] 12345.12
    This is a different problem. The number is correctly computed as
    12345.12345 (or at least a representable number quite close to that), and
    then the default display rounds it some more. Set options(digits) to see
    it in its full glory.
    Aha, my mistake; I missed that setting.

    Pat Carr
  • Simon Urbanek at Jun 3, 2008 at 8:36 pm

    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased round-
    to-even rule. The number 0.12345 is not exactly representable, but
    (I think) it is represented by something slightly closer to 0.1235
    than to 0.1234. So it looks as though Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on pretty
    much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
    Windows is the only one delivering the incorrect result here.

    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that all
    except Windows get. I don't have a Windows box handy, so I can't tell
    why - but if you go through fprec this is what you get on the
    platforms I tested (log10 may vary slightly but that's irrelevant here):

    x = 0.123450000000000004174439
    l10 = -0.908508905732048899217546, e10 = 4
    pow10 = 10000.000000000000000000000000
    x*pow10 = 1234.500000000000000000000000

    Cheers,
    Simon

    but the actual arithmetic is being done by run-time libraries that
    are outside our control to a large extent, and it looks as though
    the one on the Mac is less accurate than the one on Windows.


    But (on both windows and OS X):
    signif(12345.12345,digits)
    [1] 12345.12
    This is a different problem. The number is correctly computed as
    12345.12345 (or at least a representable number quite close to
    that), and then the default display rounds it some more. Set
    options(digits) to see it in its full glory.

    Duncan Murdoch

  • Duncan Murdoch at Jun 3, 2008 at 9:12 pm

    On 6/3/2008 4:36 PM, Simon Urbanek wrote:
    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased round-
    to-even rule. The number 0.12345 is not exactly representable, but
    (I think) it is represented by something slightly closer to 0.1235
    than to 0.1234. So it looks as though Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on pretty
    much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
    Windows is the only one delivering the incorrect result here.

    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that all
    except Windows get.
    I think you skipped a step. The correct answer is either 0.1234 or
    0.1235, not something 10000 times bigger. The first important question
    is whether 0.12345 is exactly representable, and the answer is no. The
    second question is whether it is represented by a number bigger or
    smaller than the real number 0.12345. If it is bigger, the answer
    should be 0.1235, and if it is smaller, the answer is 0.1234. My
    experiments suggest it is bigger. Yours don't look relevant. It
    certainly isn't exactly equal to 1234.5/10000, because that number is
    not representable. It's equal to x/2^y, for some x and y, and it's a
    pain to figure out exactly what they are.

    However, I am pretty sure R is representing it (at least on Windows) as
    the binary expansion

    0.000111111001101001101011010100001011000011110010011111

    while the true binary expansion (using exact rational arithmetic) starts
    out

    0.00011111100110100110101101010000101100001111001001111011101...

    If you line those up, you'll see that the first number is bigger than
    the second. (Ugly code to derive these is down below.)

    Clearly the top representation is the correct one to that number of
    binary digits, so I think Windows got it right, and all those other
    systems didn't. This is probably because R on Windows is using extended
    precision (64 bit mantissas) for intermediate results, and those other
    systems stick with 53 bit mantissas.

    Duncan Murdoch

    # Convert number to binary expansion; add the decimal point manually

    x <- 0.12345
    while (x != 0) {
      cat(trunc(x))
      x <- x - trunc(x)
      x <- x * 2
    }

    # Do the same thing in exact rational arithmetic

    num <- 12345
    denom <- 100000
    for (i in 1:60) {
      cat(ifelse(num > 100000, "1", "0"))
      num <- num %% 100000
      num <- 2*num
    }

    # Manually cut and paste the results to get these:

    "0.000111111001101001101011010100001011000011110010011111"
    "0.00011111100110100110101101010000101100001111001001111011101"
  • Simon Urbanek at Jun 3, 2008 at 9:33 pm

    On Jun 3, 2008, at 5:12 PM, Duncan Murdoch wrote:
    On 6/3/2008 4:36 PM, Simon Urbanek wrote:
    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say
    as an
    annotation on a graph. Passing it through sprintf() or format()
    later
    requires you to specify the number of digits after the decimal,
    which
    is different than the number of significant digits, and requires
    case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 0.1235
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased
    round- to-even rule. The number 0.12345 is not exactly
    representable, but (I think) it is represented by something
    slightly closer to 0.1235 than to 0.1234. So it looks as though
    Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on
    pretty much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/
    Sun, ...). Windows is the only one delivering the incorrect result
    here.
    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that
    all except Windows get.
    I think you skipped a step.
    I didn't - I was just pointing out that what you are trying to show is
    irrelevant. We are dealing with FP arithmetic here, so although your
    reasoning is valid algebraically, it's not in the FP world. You missed
    the fact that FP operations are used to actually get the result
    (multiply by 10000.0, round, and divide again) and thus those
    operations will influence it as well.

    The correct answer is either 0.1234 or 0.1235, not something 10000
    times bigger. The first important question is whether 0.12345 is
    exactly representable, and the answer is no. The second question is
    whether it is represented by a number bigger or smaller than the
    real number 0.12345. If it is bigger, the answer should be 0.1235,
    and if it is smaller, the answer is 0.1234.
    No. That was what I was trying to point out. You can see clearly from
    my post that 0.12345 is not exactly representable and that the
    representation is slightly bigger. This is, however, irrelevant,
    because the next step is to multiply that number by 10000 (see fprec
    source) and this is where your reasoning breaks down - the result is
    exact representation of 1234.5, because the imprecision gets lost in
    the operation on all platforms but Windows. The result is that Windows
    is inconsistent with others, whether that is a bug or feature I don't
    care. All I really wanted to say is that this has nothing to do with
    OS X - if anything then it's a Windows issue.

    My experiments suggest it is bigger.
    I was not claiming otherwise.

    Yours don't look relevant.
    Vice versa as it turns out.

    Cheers,
    Simon

    It certainly isn't exactly equal to 1234.5/10000, because that
    number is not representable. It's equal to x/2^y, for some x and y,
    and it's a pain to figure out exactly what they are.

    However, I am pretty sure R is representing it (at least on Windows)
    as the binary expansion

    0.000111111001101001101011010100001011000011110010011111

    while the true binary expansion (using exact rational arithmetic)
    starts out

    0.00011111100110100110101101010000101100001111001001111011101...

    If you line those up, you'll see that the first number is bigger
    than the second. (Ugly code to derive these is down below.)

    Clearly the top representation is the correct one to that number of
    binary digits, so I think Windows got it right, and all those other
    systems didn't. This is probably because R on Windows is using
    extended precision (64 bit mantissas) for intermediate results, and
    those other systems stick with 53 bit mantissas.
    However, this means that Windows doesn't conform to strict 64-bit
    evaluation.
    Duncan Murdoch

    # Convert number to binary expansion; add the decimal point manually

    x <- 0.12345
    while (x != 0) {
      cat(trunc(x))
      x <- x - trunc(x)
      x <- x * 2
    }

    # Do the same thing in exact rational arithmetic

    num <- 12345
    denom <- 100000
    for (i in 1:60) {
      cat(ifelse(num > 100000, "1", "0"))
      num <- num %% 100000
      num <- 2*num
    }

    # Manually cut and paste the results to get these:

    "0.000111111001101001101011010100001011000011110010011111"
    "0.00011111100110100110101101010000101100001111001001111011101"
  • Karl Ove Hufthammer at Jun 4, 2008 at 11:17 am

    Duncan Murdoch:

    The number 0.12345 is not exactly representable, but (I think) it is
    represented by something slightly closer to 0.1235 than to 0.1234.
    I like using formatC for checking such things. On my (Linux) system, I get:

    formatC(.12345, digits=50)
    [1] "0.12345000000000000417443857259058859199285507202148"
    So it looks as though Windows gets it right.
    --
    Karl Ove Hufthammer
  • Prof Brian Ripley at Jun 4, 2008 at 12:18 pm
    That (and, more simply, sprintf()) merely tells you about your OS's
    sprintf function. That is not required to give an accurate decimal
    representation of more than DECIMAL_DIG digits, and certainly not 50.
    The help for formatC does warn you about this.

    On my F8 system DECIMAL_DIG is mentioned in math.h but not defined there.
    (It is defined in the compiler, AFAICS -- SunStudio has it as 21, and gcc
    computes __DECIMAL_DIG__ also as 21.) The C99 standard says that it
    should be at least 10 and that 17 is appropriate for IEC60559 arithmetic.
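    The 17-digit figure can also be checked from within R (a sketch; .Machine describes the double format in use):

    ```r
    # 17 significant decimal digits are enough to round-trip an IEEE-754 double:
    x <- 0.12345
    y <- as.numeric(sprintf("%.17g", x))
    identical(x, y)            # TRUE

    # R's doubles carry 53 mantissa bits, as expected for IEC 60559 arithmetic:
    .Machine$double.digits     # 53
    ```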

    On Wed, 4 Jun 2008, Karl Ove Hufthammer wrote:

    Duncan Murdoch:
    The number 0.12345 is not exactly representable, but (I think) it is
    represented by something slightly closer to 0.1235 than to 0.1234.
    I like using formatC for checking such things. On my (Linux) system, I get:

    formatC(.12345, digits=50)
    [1] "0.12345000000000000417443857259058859199285507202148"
    So it looks as though Windows gets it right.
    --
    Karl Ove Hufthammer

    --
    Brian D. Ripley, ripley at stats.ox.ac.uk
    Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
    University of Oxford, Tel: +44 1865 272861 (self)
    1 South Parks Road, +44 1865 272866 (PA)
    Oxford OX1 3TG, UK Fax: +44 1865 272595
  • Simon Urbanek at Jun 3, 2008 at 8:40 pm

    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
    signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased round-
    to-even rule. The number 0.12345 is not exactly representable, but
    (I think) it is represented by something slightly closer to 0.1235
    than to 0.1234. So it looks as though Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on pretty
    much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
    Windows is the only one delivering the incorrect result here.

    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that all
    except Windows get. I don't have a Windows box handy, so I can't tell
    why - but if you go through fprec this is what you get on the
    platforms I tested (log10 may vary slightly but that's irrelevant here):

    x = 0.123450000000000004174439
    l10 = -0.908508905732048899217546, e10 = 4
    pow10 = 10000.000000000000000000000000
    x*pow10 = 1234.500000000000000000000000

    Cheers,
    Simon

    but the actual arithmetic is being done by run-time libraries that
    are outside our control to a large extent, and it looks as though
    the one on the Mac is less accurate than the one on Windows.


    But (on both windows and OS X):
    signif(12345.12345,digits)
    [1] 12345.12
    This is a different problem. The number is correctly computed as
    12345.12345 (or at least a representable number quite close to
    that), and then the default display rounds it some more. Set
    options(digits) to see it in its full glory.

    Duncan Murdoch

    ______________________________________________
    R-devel at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-devel
  • Murdoch at Jun 3, 2008 at 9:15 pm

    On 6/3/2008 4:36 PM, Simon Urbanek wrote:
    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say as an
    annotation on a graph. Passing it through sprintf() or format() later
    requires you to specify the number of digits after the decimal, which
    is different than the number of significant digits, and requires case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
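(Editorial note: the significant-digit printing the posters are asking for can in fact be approximated with formatC() alone: format = "fg" prints in fixed notation but counts digits as significant digits, and flag = "#" keeps trailing zeros. A sketch - sigprint is a made-up helper name, not part of R:)

```r
# Print to a number of *significant* digits, keeping trailing zeros,
# without computing the number of decimal places by hand.
sigprint <- function(x, digits) {
  formatC(x, digits = digits, format = "fg", flag = "#")
}

sigprint(0.90, 2)    # "0.90"
sigprint(18.423, 4)  # "18.42"
```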
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased round-
    to-even rule. The number 0.12345 is not exactly representable, but
    (I think) it is represented by something slightly closer to 0.1235
    than to 0.1234. So it looks as though Windows gets it right.
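(Both claims are easy to check from R itself: round() follows the round-half-even rule on exactly representable halves, and printing 0.12345 to twenty decimals shows that the stored double sits slightly above the decimal value - a quick sketch:)

```r
# IEC 60559 round-half-even: exact halves go to the nearest even integer
round(c(0.5, 1.5, 2.5, 3.5))   # 0 2 2 4

# The double nearest 0.12345 is a little larger than 0.12345 itself,
# which is why rounding it to 4 significant digits should give 0.1235
sprintf("%.20f", 0.12345)      # "0.12345000000000000417"
```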

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on pretty
    much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
    Windows is the only one delivering the incorrect result here.

    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that all
    except Windows get.
    I think you skipped a step. The correct answer is either 0.1234 or
    0.1235, not something 10000 times bigger. The first important question
    is whether 0.12345 is exactly representable, and the answer is no. The
    second question is whether it is represented by a number bigger or
    smaller than the real number 0.12345. If it is bigger, the answer
    should be 0.1235, and if it is smaller, the answer is 0.1234. My
    experiments suggest it is bigger. Yours don't look relevant. It
    certainly isn't exactly equal to 1234.5/10000, because that number is
    not representable. It's equal to x/2^y, for some x and y, and it's a
    pain to figure out exactly what they are.

    However, I am pretty sure R is representing it (at least on Windows) as
    the binary expansion

    0.000111111001101001101011010100001011000011110010011111

    while the true binary expansion (using exact rational arithmetic) starts
    out

    0.00011111100110100110101101010000101100001111001001111011101...

    If you line those up, you'll see that the first number is bigger than
    the second. (Ugly code to derive these is down below.)

    Clearly the top representation is the correct one to that number of
    binary digits, so I think Windows got it right, and all those other
    systems didn't. This is probably because R on Windows is using extended
    precision (64 bit mantissas) for intermediate results, and those other
    systems stick with 53 bit mantissas.

    Duncan Murdoch

    # Convert number to binary expansion; add the decimal point manually

    x <- 0.12345
while (x != 0) {
  cat(trunc(x))
  x <- x - trunc(x)
  x <- x * 2
}

    # Do the same thing in exact rational arithmetic

num <- 12345
denom <- 100000
for (i in 1:60) {
  cat(ifelse(num >= denom, "1", "0"))
  num <- num %% denom
  num <- 2*num
}

    # Manually cut and paste the results to get these:

    "0.000111111001101001101011010100001011000011110010011111"
    "0.00011111100110100110101101010000101100001111001001111011101"
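(A shorter way to inspect the stored bits than hand-rolling the expansion is sprintf()'s "%a" hexadecimal floating-point format: each hex digit after the point is four mantissa bits, and the result agrees with the 53-bit expansion pasted above.)

```r
# Hex-float view of the double closest to 0.12345:
# 1.f9a6b50b0f27c (hex) times 2^-4, matching the binary expansion
# 0.000111111001101001101011010100001011000011110010011111
sprintf("%a", 0.12345)   # "0x1.f9a6b50b0f27cp-4"
```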
  • Simon Urbanek at Jun 3, 2008 at 9:35 pm

    On Jun 3, 2008, at 5:12 PM, Duncan Murdoch wrote:
    On 6/3/2008 4:36 PM, Simon Urbanek wrote:
    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say
    as an
    annotation on a graph. Passing it through sprintf() or format()
    later
    requires you to specify the number of digits after the decimal,
    which
    is different than the number of significant digits, and requires
    case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:
    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10
    It might not be the rounding, then. (windows xp sp3)
signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased
round-to-even rule. The number 0.12345 is not exactly
    representable, but (I think) it is represented by something
    slightly closer to 0.1235 than to 0.1234. So it looks as though
    Windows gets it right.

    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on
pretty much all other platforms (Intel/Linux, MIPS/IRIX,
Sparc/Sun, ...). Windows is the only one delivering the incorrect result
    here.
    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that
    all except Windows get.
    I think you skipped a step.
I didn't - I was just pointing out that what you are trying to show is
irrelevant. We are dealing with FP arithmetic here, so although your
reasoning is valid algebraically, it is not in the FP world. You missed
the fact that FP operations are used to actually get the result
(*10000.0, round, and divide again), and thus those operations will
influence it as well.

    The correct answer is either 0.1234 or 0.1235, not something 10000
    times bigger. The first important question is whether 0.12345 is
    exactly representable, and the answer is no. The second question is
    whether it is represented by a number bigger or smaller than the
    real number 0.12345. If it is bigger, the answer should be 0.1235,
    and if it is smaller, the answer is 0.1234.
No. That was what I was trying to point out. You can see clearly from
my post that 0.12345 is not exactly representable and that the
representation is slightly bigger. This is, however, irrelevant,
because the next step is to multiply that number by 10000 (see the
fprec source), and this is where your reasoning breaks down - the
result is the exact representation of 1234.5, because the imprecision
gets lost in the operation on all platforms but Windows. The result is
that Windows is inconsistent with the others; whether that is a bug or
a feature, I don't care. All I really wanted to say is that this has
nothing to do with OS X - if anything, it's a Windows issue.
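(Simon's multiply-round-divide point can be reproduced directly in R - a sketch of the fprec-style pipeline, not the actual C code:)

```r
x <- 0.12345       # the stored double is slightly above 0.12345 ...
y <- x * 10000     # ... but the excess is lost in the product
y == 1234.5        # TRUE with 53-bit arithmetic: y is exactly 1234.5
round(y) / 10000   # 0.1234: round-half-even sends 1234.5 down to 1234
```

On a platform that keeps 53-bit intermediates, the product is exactly 1234.5 and half-even rounding sends it down, reproducing the 0.1234 that everything but Windows reports.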

    My experiments suggest it is bigger.
    I was not claiming otherwise.

    Yours don't look relevant.
    Vice versa as it turns out.

    Cheers,
    Simon

    It certainly isn't exactly equal to 1234.5/10000, because that
    number is not representable. It's equal to x/2^y, for some x and y,
    and it's a pain to figure out exactly what they are.

    However, I am pretty sure R is representing it (at least on Windows)
    as the binary expansion

    0.000111111001101001101011010100001011000011110010011111

    while the true binary expansion (using exact rational arithmetic)
    starts out

    0.00011111100110100110101101010000101100001111001001111011101...

    If you line those up, you'll see that the first number is bigger
    than the second. (Ugly code to derive these is down below.)

    Clearly the top representation is the correct one to that number of
    binary digits, so I think Windows got it right, and all those other
    systems didn't. This is probably because R on Windows is using
    extended precision (64 bit mantissas) for intermediate results, and
    those other systems stick with 53 bit mantissas.
    However, this means that Windows doesn't conform
    Duncan Murdoch

    # Convert number to binary expansion; add the decimal point manually

    x <- 0.12345
while (x != 0) {
  cat(trunc(x))
  x <- x - trunc(x)
  x <- x * 2
}

    # Do the same thing in exact rational arithmetic

num <- 12345
denom <- 100000
for (i in 1:60) {
  cat(ifelse(num >= denom, "1", "0"))
  num <- num %% denom
  num <- 2*num
}

    # Manually cut and paste the results to get these:

    "0.000111111001101001101011010100001011000011110010011111"
    "0.00011111100110100110101101010000101100001111001001111011101"
  • Duncan Murdoch at Jun 3, 2008 at 11:06 pm

    simon.urbanek at r-project.org wrote:
    On Jun 3, 2008, at 5:12 PM, Duncan Murdoch wrote:

    On 6/3/2008 4:36 PM, Simon Urbanek wrote:
    On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
    On 6/3/2008 11:43 AM, Patrick Carr wrote:
    On 6/3/08, Duncan Murdoch wrote:

    because signif(0.90, digits=2) == 0.9. Those two objects are
    identical.
    My text above that is poorly worded. They're identical internally,
    yes. But in terms of the number of significant digits, 0.9 and 0.90
    are different. And that matters when the number is printed, say
    as an
    annotation on a graph. Passing it through sprintf() or format()
    later
    requires you to specify the number of digits after the decimal,
    which
    is different than the number of significant digits, and requires
    case
    testing for numbers of different orders of magnitude.
    The original complainant (and I) expected this behavior from
    signif(),
    not merely rounding. As I said before, I wrote my own workaround so
    this is somewhat academic, but I don't think we're alone.
    As far as I know, rounding is fine in Windows:

    round(1:10 + 0.5)
    [1] 2 2 4 4 6 6 8 8 10 10

    It might not be the rounding, then. (windows xp sp3)
signif(12345,digits=4)
    [1] 12340
    signif(0.12345,digits=4)
    [1] 0.1235
    It's easy to make mistakes in this, but a little outside-of-R
    experimentation suggests those are the right answers. The number
    12345 is exactly representable, so it is exactly half-way between
    12340 and 12350, so 12340 is the right answer by the unbiased
round-to-even rule. The number 0.12345 is not exactly
    representable, but (I think) it is represented by something
    slightly closer to 0.1235 than to 0.1234. So it looks as though
    Windows gets it right.


    OS X (10.5.2/intel) does not have that problem.
    Which would seem to imply OS X gets it wrong.
    This has nothing to do with OS X, you get that same answer on
pretty much all other platforms (Intel/Linux, MIPS/IRIX,
Sparc/Sun, ...). Windows is the only one delivering the incorrect result
    here.
    Both are supposed to be using the 64 bit floating point standard,
    so they should both give the same answer:
    Should, yes, but Windows doesn't. In fact 10000.0 is exactly
    representable and so is 1234.5 which is the correct result that
    all except Windows get.
    I think you skipped a step.
I didn't - I was just pointing out that what you are trying to show is
irrelevant. We are dealing with FP arithmetic here, so although your
reasoning is valid algebraically, it is not in the FP world. You missed
the fact that FP operations are used to actually get the result
(*10000.0, round, and divide again), and thus those operations will
influence it as well.
    But your multiplication isn't accurate: 10000 is 16 times 625.
    Multiplying by 16 will shift the binary representation by 4 places, but
    multiplying by an odd number should leave the last 1 bit in place. If
    you go through the following series of operations, you get a different
    answer than multiplying by 10000:
    x <- 0.12345
    x <- 16*x
    y <- x
    x <- 624*x
    x + y - 1234.5
    [1] 2.273737e-13

    versus
    x <- 0.12345
    x <- 10000*x
    x - 1234.5
    [1] 0

    I get identical results for both of these calculations on a Mac and on
    Windows. Looks strange, but I think the explanation is that multiplying
    by 625 (or 10000) needs one more bit than multiplying by 624 needs, so
    we lose a bit at that point. Windows does the multiplication by 10000
    accurately in its 64 bit registers, and then loses the last bit on
    storing the result into x in 53 bits. But in the signif() calculation,
    everything stays in 64 bit precision, so Windows gets the right result
    (0.1235), while those systems that round intermediate results get it
    wrong (0.1234).

    It's all FP calculations, but in Windows we've programmed the run-time
    to keep extra bits around for intermediate results and get more accurate
    results in cases where things stay in registers, whereas the other
    systems throw away the bits and get more consistent results regardless
    of the execution path.

    Duncan Murdoch

    The correct answer is either 0.1234 or 0.1235, not something 10000
    times bigger. The first important question is whether 0.12345 is
    exactly representable, and the answer is no. The second question is
    whether it is represented by a number bigger or smaller than the
    real number 0.12345. If it is bigger, the answer should be 0.1235,
    and if it is smaller, the answer is 0.1234.
No. That was what I was trying to point out. You can see clearly from
my post that 0.12345 is not exactly representable and that the
representation is slightly bigger. This is, however, irrelevant,
because the next step is to multiply that number by 10000 (see the
fprec source), and this is where your reasoning breaks down - the
result is the exact representation of 1234.5, because the imprecision
gets lost in the operation on all platforms but Windows. The result is
that Windows is inconsistent with the others; whether that is a bug or
a feature, I don't care. All I really wanted to say is that this has
nothing to do with OS X - if anything, it's a Windows issue.


    My experiments suggest it is bigger.
    I was not claiming otherwise.


    Yours don't look relevant.
    Vice versa as it turns out.

    Cheers,
    Simon


    It certainly isn't exactly equal to 1234.5/10000, because that
    number is not representable. It's equal to x/2^y, for some x and y,
    and it's a pain to figure out exactly what they are.

    However, I am pretty sure R is representing it (at least on Windows)
    as the binary expansion

    0.000111111001101001101011010100001011000011110010011111

    while the true binary expansion (using exact rational arithmetic)
    starts out

    0.00011111100110100110101101010000101100001111001001111011101...

    If you line those up, you'll see that the first number is bigger
    than the second. (Ugly code to derive these is down below.)

    Clearly the top representation is the correct one to that number of
    binary digits, so I think Windows got it right, and all those other
    systems didn't. This is probably because R on Windows is using
    extended precision (64 bit mantissas) for intermediate results, and
    those other systems stick with 53 bit mantissas.
    However, this means that Windows doesn't conform

    Duncan Murdoch

    # Convert number to binary expansion; add the decimal point manually

    x <- 0.12345
while (x != 0) {
  cat(trunc(x))
  x <- x - trunc(x)
  x <- x * 2
}

    # Do the same thing in exact rational arithmetic

num <- 12345
denom <- 100000
for (i in 1:60) {
  cat(ifelse(num >= denom, "1", "0"))
  num <- num %% denom
  num <- 2*num
}

    # Manually cut and paste the results to get these:

    "0.000111111001101001101011010100001011000011110010011111"
    "0.00011111100110100110101101010000101100001111001001111011101"

  • Scott Wilkinson at Jun 13, 2008 at 4:10 am
    Thanks Patrick, your function is a neat work-around to include trailing
    zeroes when specifying significant digits.
    It worked better with this modification to the "format" arguments:
    else if (sigdigs<=left+right)
    {out<-format(signum,nsmall=sigdigs-left)}

Agreed, it would be nice to include this in signif().
    Regards, Scott.
    -----Original Message-----
    From: patcarr at gmail.com [mailto:patcarr at gmail.com] On Behalf
    Of Patrick Carr
    Sent: Tuesday, 3 June 2008 11:28 AM
    To: r-bugs at r-project.org
    Cc: Wilkinson, Scott (CLW, Townsville)
    Subject: Re: significant digits (PR#9682)

    To reply to my own message, that function wasn't quite right. I think
    this one works better:

signif.string <- function(signum, sigdigs) {
  test <- abs(signum)
  left <- nchar(trunc(test))
  right <- nchar(test) - left - 1
  if (test < 1) { left <- left - 1 }
  if (right < 0) { right <- 0 }
  if (sigdigs < left) {
    out <- as.character(signif(signum, digits = sigdigs))
  } else if (sigdigs == left & trunc(signum) %% 10 == 0) {
    out <- paste(round(signum), ".", sep = "")
  } else if (sigdigs <= left + right) {
    out <- format(signum, digits = sigdigs)
  } else {
    out <- sprintf(paste("%.", sigdigs - left, "f", sep = ""), signum)
  }
  return(noquote(out))
}

    But it should still have error checking and vector capability, yadda
    yadda. Also, I forgot what year it was, so sorry, Scott, for spamming
    you with something you're hopefully not still stuck on.

    Pat Carr
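(Editorial note: a self-contained check of Patrick's final version, with the function body repeated - lightly reindented - from the message above so the example runs standalone:)

```r
signif.string <- function(signum, sigdigs) {
  test <- abs(signum)
  left <- nchar(trunc(test))
  right <- nchar(test) - left - 1
  if (test < 1) { left <- left - 1 }
  if (right < 0) { right <- 0 }
  if (sigdigs < left) {
    out <- as.character(signif(signum, digits = sigdigs))
  } else if (sigdigs == left & trunc(signum) %% 10 == 0) {
    out <- paste(round(signum), ".", sep = "")
  } else if (sigdigs <= left + right) {
    out <- format(signum, digits = sigdigs)
  } else {
    out <- sprintf(paste("%.", sigdigs - left, "f", sep = ""), signum)
  }
  noquote(out)
}

signif.string(0.90, 2)    # 0.90   (trailing zero kept)
signif.string(18.423, 4)  # 18.42
signif.string(12345, 2)   # 12000
```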

Discussion Overview
group: r-devel
categories: r
posted: Jun 2, '08 at 11:40p
active: Jun 13, '08 at 4:10a
posts: 16
users: 6
website: r-project.org
irc: #r
