
[Python] PEP 327: Decimal Data Type

Batista, Facundo
Jan 30, 2004 at 12:49 pm
I'm proud to announce that the PEP for Decimal Data Type is now published
under the python.org structure:

http://www.python.org/peps/pep-0327.html

This wouldn't have been possible without the help of Alex Martelli, Aahz,
Tim Peters, David Goodger and c.l.p itself.

After the pre-PEP rounds of discussion the features are almost settled. There
is no agreement yet on how to create a Decimal from a float, in both explicit
and implicit constructions.

I need that settled before I can finish the test cases and actually start to
work on the code.

I'll appreciate any feedback. Thank you all in advance.

. Facundo

  • Michael Chermside at Jan 30, 2004 at 3:06 pm

    Facundo Batista writes:
    I'm proud to announce that the PEP for Decimal Data Type is now published
    http://www.python.org/peps/pep-0327.html
    VERY nice work here.

    Here's my 2 cents:

    (1) You propose conversion from floats via:
    Decimal(1.1, 2) == Decimal('1.1')
    Decimal(1.1, 16) == Decimal('1.1000000000000001')
    Decimal(1.1) == Decimal('110000000000000008881784197001252...e-51')

    I think that we'd do even better to omit the second use. People who
    really want to convert floats exactly can easily write "Decimal(1.1, 60)". But
    hardly anyone wants to convert floats exactly, while lots of newbies would
    forget to include the second parameter. I'd say just make Decimal(someFloat)
    raise a TypeError with a helpful message about how you need that second
    parameter when using floats.
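
    A minimal sketch of the guard I have in mind (hypothetical, just to
    make the suggestion concrete):

        class Decimal:
            def __init__(self, value, positions=None):
                # Refuse bare floats so nobody converts binary noise by
                # accident; ints and strings would pass straight through.
                if isinstance(value, float) and positions is None:
                    raise TypeError("pass the number of decimal positions "
                                    "when converting a float, e.g. Decimal(1.1, 2)")
                ...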

    (2) For adding a Decimal and a float, you write:
    I propose to allow the interaction with float, making an exact conversion and
    raising ValueError if exceeds the precision in the current context (this is
    maybe too tricky, because for example with a precision of 9, Decimal(35) + 1.2
    is OK but Decimal(35) + 1.1 raises an error).

    I suppose that would be all right, but I think I'm with Aahz on this
    one... require explicit conversion. It prevents newbie errors, and non-newbies
    can provide the functionality extremely easily. Also, we can always change our
    minds to allow addition with floats if we initially release with that raising
    an exception. But if we ever release a version of Python where Decimal and
    float can be added, we'll be stuck supporting it forever.


    Really, that's all I came up with. This is great, and I'm looking forward to
    using it. I would, though, be interested in a couple more syntax-related
    details:
    (a) What's the syntax for changing the context? I'd think we'd want
    a "pushDecimalContext()" and "popDecimalContext()" sort of approach, since most
    well-behaved routines will want to restore their caller's context.
    (b) How about querying to determine a thread's current context? I don't
    have any use cases, but it would seem peculiar not to provide it.
    (c) Given a Decimal object, is there a straightforward way to determine its
    coefficient and exponent? Methods named .precision() and .exponent() might do
    the trick.
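
    (For reference, the decimal module that eventually shipped answers all
    three of these; a sketch in today's form:)

        from decimal import Decimal, getcontext, localcontext

        print(getcontext().prec)       # (b) query this thread's context; 28 by default
        with localcontext() as ctx:    # (a) push a copy of the context...
            ctx.prec = 9
            d = Decimal('1.80')
        # ...and it is popped automatically on exit.
        print(d.as_tuple())            # (c) DecimalTuple(sign=0, digits=(1, 8, 0), exponent=-2)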

    -- Michael Chermside
  • Stephen Horne at Jan 30, 2004 at 4:03 pm

    On Fri, 30 Jan 2004 09:49:05 -0300, "Batista, Facundo" wrote:

    I'll appreciate any feedback. Thank you all in advance.
    My concern is that many people will use a decimal type just because it
    is there, without any consideration of whether they actually need it.

    95% of the time or more, all you need to do to represent money is to
    use an integer and select appropriate units (pence rather than pounds,
    cents rather than dollars, etc) so that the decimal point is just a
    presentation issue when the value is printed/displayed but is never
    needed in the internal representation.
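
    For instance, a minimal sketch (the amounts are invented for
    illustration):

        price_in_cents = 1995                 # $19.95 stored as a plain int
        quantity = 3
        total = price_in_cents * quantity     # exact integer arithmetic
        print("Total: $%d.%02d" % divmod(total, 100))   # Total: $59.85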


    That said, there are cases where a decimal type would be genuinely
    useful. Given that, my only comment on the PEP is that a decimal
    literal might be a good idea - identical to float literals but with a
    'D' appended, for instance.

    I wouldn't mention it now, seeing it as an issue for after the library
    itself has matured and been proven, except for the issue of implicit
    conversions. Having a decimal literal would add another class of
    errors with implicit conversions - it would be very easy to forget the
    'D' on the end of a literal, and to get an imprecise float implicitly
    converted to decimal rather than the precise decimal literal that was
    intended.

    I don't know what the solution should be, but I do think it needs to
    be considered.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Christopher Koppler at Jan 30, 2004 at 6:12 pm

    On Fri, 30 Jan 2004 16:03:55 +0000, Stephen Horne wrote: [snip]
    That said, there are cases where a decimal type would be genuinely
    useful. Given that, my only comment on the PEP is that a decimal
    literal might be a good idea - identical to float literals but with a
    'D' appended, for instance.
    Or, maybe if money is being represented, appending a '$'?

    *ducks*

    Just-couldn't-resist-ly yours,

    --
    Christopher
  • Achrist at Jan 31, 2004 at 3:01 am

    Stephen Horne wrote:
    I don't know what the solution should be, but I do think it needs to
    be considered.
    The C and C++ people have agreed. The next standards for those
    languages, whenever they come out, are supposed to include decimal
    floating point as a standard data type. The number of decimal
    places required is also large, somewhere around 25-30 places,
    more than current hardware, e.g. IBM mainframes, supports.

    If python adds decimal data, it probably ought to be consistent with C
    and C++. Otherwise, the C and C++ guys will have a dreadful time
    writing emulation code to run on computers built to support python.


    Al
  • Josiah Carlson at Jan 31, 2004 at 3:23 am

    If python adds decimal data, it probably ought to be consistent with C
    and C++. Otherwise, the C and C++ guys will have a dreadful time
    writing emulation code to run on computers built to support python.
    Now that's a "Python will take over the world" statement if I ever heard
    one. But seriously, processor manufacturers build processors and
    compilers for Fortran, C, and C++. If a manufacturer starts paying
    attention to where Python is going (for things other than scripting
    their build-process), I'm sure Guido would like to know.

    - Josiah
  • Aahz at Feb 5, 2004 at 2:09 pm

    In article <401B1A87.639CD971 at easystreet.com>, achrist at easystreet.com wrote:
    If python adds decimal data, it probably ought to be consistent with C
    and C++. Otherwise, the C and C++ guys will have a dreadful time
    writing emulation code to run on computers built to support python.
    Read the PEP; Python's proposed decimal type is based on the existing
    decimal standard. If C/C++ *don't* follow the standard, that's their
    problem. BTW, Java uses the standard.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
  • Aahz at Feb 5, 2004 at 2:16 pm
    In article <6ltk10h30riel0lghd18t5unjco2g26spi at 4ax.com>,
    Stephen Horne wrote:
    On Fri, 30 Jan 2004 09:49:05 -0300, "Batista, Facundo"
    wrote:
    I'll appreciate any feedback. Thank you all in advance.
    My concern is that many people will use a decimal type just because it
    is there, without any consideration of whether they actually need it.

    95% of the time or more, all you need to do to represent money is to
    use an integer and select appropriate units (pence rather than pounds,
    cents rather than dollars, etc) so that the decimal point is just a
    presentation issue when the value is printed/displayed but is never
    needed in the internal representation.
    The problem lies precisely in that representation. For starters, a
    binary integer is O(n^2) for conversion to decimal printing. Then
    there's the question about multi-currency conversions, or interest
    rates, or ....
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
  • Stephen Horne at Feb 6, 2004 at 2:49 am

    On 5 Feb 2004 09:16:51 -0500, aahz at pythoncraft.com (Aahz) wrote:
    In article <6ltk10h30riel0lghd18t5unjco2g26spi at 4ax.com>,
    Stephen Horne wrote:
    On Fri, 30 Jan 2004 09:49:05 -0300, "Batista, Facundo"
    wrote:
    I'll appreciate any feedback. Thank you all in advance.
    My concern is that many people will use a decimal type just because it
    is there, without any consideration of whether they actually need it.

    95% of the time or more, all you need to do to represent money is to
    use an integer and select appropriate units (pence rather than pounds,
    cents rather than dollars, etc) so that the decimal point is just a
    presentation issue when the value is printed/displayed but is never
    needed in the internal representation.
    The problem lies precisely in that representation. For starters, a
    binary integer is O(n^2) for conversion to decimal printing.
    In practice, there is an upper limit to the size of number that occurs
    in any financial use, and of course we are not talking about tens of
    digits let alone hundreds, meaning that the conversion is most
    sensibly treated as O(1) for each number converted.

    Anyway, speeding up the presentation of results makes little sense if
    you slow down all the arithmetic operations to do it.
    Then
    there's the question about multi-currency conversions, or interest
    rates, or ....
    Admittedly needing better than penny precision, but still fixed
    precision (ie suiting an integer representation with an implicit scale
    factor) and the results are rounded.

    I work with a company that writes accounting software. We don't need
    to worry about currency conversions, but we do need to worry about
    interest and other cases where fractional pennies seem to be implied
    (rates for taxes, allowances etc) and basically the fractional pennies
    are never really an issue - you do have to be careful with the
    rounding rules, but that applies whatever representation you use.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Bengt Richter at Feb 6, 2004 at 6:56 pm

    On 5 Feb 2004 09:16:51 -0500, aahz at pythoncraft.com (Aahz) wrote:
    In article <6ltk10h30riel0lghd18t5unjco2g26spi at 4ax.com>,
    Stephen Horne wrote:
    On Fri, 30 Jan 2004 09:49:05 -0300, "Batista, Facundo"
    wrote:
    I'll appreciate any feedback. Thank you all in advance.
    My concern is that many people will use a decimal type just because it
    is there, without any consideration of whether they actually need it.

    95% of the time or more, all you need to do to represent money is to
    use an integer and select appropriate units (pence rather than pounds,
    cents rather than dollars, etc) so that the decimal point is just a
    presentation issue when the value is printed/displayed but is never
    needed in the internal representation.
    The problem lies precisely in that representation. For starters, a
    binary integer is O(n^2) for conversion to decimal printing. Then
    Please clarify. What is your "n" in that?

    Regards,
    Bengt Richter
  • Jeff Epler at Feb 6, 2004 at 7:50 pm

    On 5 Feb 2004 09:16:51 -0500, aahz at pythoncraft.com (Aahz) wrote:
    The problem lies precisely in that representation. For starters, a
    binary integer is O(n^2) for conversion to decimal printing. Then
    On Fri, Feb 06, 2004 at 06:56:03PM +0000, Bengt Richter wrote:
    Please clarify. What is your "n" in that?
    "n" is the number of digits in the number, in this case.

    A standard way to convert to base 10 looks like this:

        def base10(i):
            digits = []
            while i:
                i, b = divmod(i, 10)
                digits.append(b)
            digits.reverse()
            return digits
    Each divmod() takes from O(n) down to O(1) (O(log i) for each successive
    value of i), and the loop runs n times (i is shortened by one digit each
    time). This is a typical n^2 algorithm, much like bubble sort where the
    outer loop runs n times and an inner loop runs 1-to-n times.
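
    (For example, base10(104) returns [1, 0, 4].)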

    Jeff
  • Batista, Facundo at Jan 30, 2004 at 6:21 pm
    Stephen Horne wrote:

    #- My concern is that many people will use a decimal type just
    #- because it
    #- is there, without any consideration of whether they actually need it.

    Speed considerations are raised. You'll *never* get the performance of using
    floats or ints (unless you have a coprocessor that handles this).


    #- I don't know what the solution should be, but I do think it needs to
    #- be considered.

    (In my dreams) I want to "float" to be decimal. Always. No more binary.
    Maybe in ten years the machines will be as fast as is needed to make this
    possible. Or it'll be implemented in hardware.

    Anyway, until then I'm happy having decimal floating point as a module.

    . Facundo
  • Josiah Carlson at Jan 30, 2004 at 8:30 pm

    (In my dreams) I want to "float" to be decimal. Always. No more binary.
    Maybe in ten years the machines will be as fast as is needed to make this
    possible. Or it'll be implemented in hardware.

    Anyway, until then I'm happy having decimal floating point as a module.

    In my dreams, data is optimally represented in base e, and every number
    is represented with a roughly equivalent amount of fudge-factor (except
    for linear combinations of the powers of e).

    Heh, thankfully my dreams haven't come to fruition.


    While decimal storage is useful for people and money, it is arbitrarily
    limiting. Perhaps a generalized BaseN module is called for. People
    could then generate floating point numbers in any base (up to perhaps
    base 36, [0-9a-z]). At that point, having a Money version is just a
    specific subclass of BaseN floating point.

    Of course then you have the same problem with doing math on two
    different bases as with doing math on rational numbers. Personally, I
    would more favor a generalized BaseN class than just a single Base10 class.

    - Josiah
  • Dan Bishop at Jan 31, 2004 at 9:01 am
    Josiah Carlson <jcarlson at nospam.uci.edu> wrote in message news:<bvef14$919$1 at news.service.uci.edu>...
    (In my dreams) I want to "float" to be decimal. Always. No more binary.
    I disagree.

    My reasons for this have to do with the real-life meaning of figures
    with decimal points. I can say that I have $1.80 in change on my
    desk, and I can say that I am 1.80 meters tall. But the two 1.80's
    have fundamentally different meanings.

    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.

    On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    that human height comes in discrete packets of 0.01 m. It means that
    I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    posture and the time of day, and "1.80" is just a convenient
    approximation. And it wouldn't be inaccurate to express my height as
    0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    these are within the tolerance of the measurement. So number base
    doesn't matter here.

    But even if the number base of a measurement doesn't matter, precision
    and speed of calculations often does. And on digital computers,
    non-binary arithmetic is inherently imprecise and slow. Imprecise
    because register bits are limited and decimal storage wastes them.
    (For example, representing the integer 999 999 999 requires 36 bits in
    BCD but only 30 bits in binary. Also, for floating point, only binary
    allows the precision-gaining "hidden bit" trick.) Slow because
    decimal requires more complex hardware. (For example, a BCD adder has
    more than twice as many gates as a binary adder.)
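
    A quick check of those bit counts (a sketch using Python's
    int.bit_length):

        n = 999999999
        print(n.bit_length())     # 30 bits in binary
        print(len(str(n)) * 4)    # 36 bits in BCD, at 4 bits per digit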
    In my dreams, data is optimally represented in base e, and every number
    is represented with a roughly equivalent amount of fudge-factor (except
    for linear combinations of the powers of e).

    Heh, thankfully my dreams haven't come to fruition.
    Perhaps we'll have an efficient implementation within the next
    102.1120... years or so ;-)
    While decimal storage is useful for...money
    Out of curiosity: Is there much demand for decimal floating point in
    places that have a fractionless currency like the Japanese yen?
    Perhaps a generalized BaseN module is called for. People
    could then generate floating point numbers in any base (up to perhaps
    base 36, [0-9a-z]).
    If you're going to allow exact representation of multiples of 1/2,
    1/3, 1/4, ..., 1/36, 1/49, 1/64, 1/81, 1/100, 1/121, 1/125, 1/128,
    1/144, etc., I see no reason not to have exact representations of
    *all* rational numbers. Especially considering that rationals are
    much easier to implement. (See below.)
    ... Of course then you have the same problem with doing math on two
    different bases as with doing math on rational numbers.
    Actually, the problem is even worse.

    Like rationals, BaseN numbers have the problem that there are multiple
    representations for the same number (e.g., 1/2=6/12, and 0.1 (2) = 0.6
    (12)). But rationals at least have a standardized normalization. We
    can agree that 1/2 should be represented as 1/2 and not
    -131/-262, but should BaseN('0.1', base=2) + BaseN('0.1', base=4) be
    BaseN('0.11', 2) or BaseN('0.3', 4)?

    The same potential problem exists with ints, but Python (and afaik,
    everything else) avoids it by internally storing everything in binary
    and not keeping track of its representation. This is why "print 0x68"
    produces the same output as "print 104". BaseN would violate this
    separation between numbers and their notation, and imho that would
    create a lot more problems than it solves.

    Including the problem that mixed-based arithmetic will require:
    * approximating at least one of the numbers, in which case there's no
    advantage over binary, or
    * finding a "least common base", but what if that base is greater than
    36 (or 62 if lowercase digits are distinguished from uppercase ones)?
  • Stephen Horne at Jan 31, 2004 at 10:45 am

    On 31 Jan 2004 01:01:41 -0800, danb_83 at yahoo.com (Dan Bishop) wrote:
    I disagree. <snip>
    But even if the number base of a measurement doesn't matter, precision
    and speed of calculations often does. And on digital computers,
    non-binary arithmetic is inherently imprecise and slow. Imprecise
    because register bits are limited and decimal storage wastes them.
    (For example, representing the integer 999 999 999 requires 36 bits in
    BCD but only 30 bits in binary. Also, for floating point, only binary
    allows the precision-gaining "hidden bit" trick.) Slow because
    decimal requires more complex hardware. (For example, a BCD adder has
    more than twice as many gates as a binary adder.)
    I think BCD is a slightly unfair comparison. The efficiency of packing
    decimal digits into binary integers increases as the size of each
    packed group of digits increases. For example, while 8 BCD digits
    require 32 bits, those 32 bits can encode 9 decimal digits, and while
    16 BCD digits require 64 bits, those 64 bits can encode 19 decimal
    digits.

    The principle is correct, though - binary is 'natural' for computers
    where decimal is more natural for people, so decimal representations
    will be relatively inefficient even with hardware support. Low
    precision because a mantissa with the same number of bits can only
    represent a smaller range of values. Slow (or expensive) because of
    the relative complexity of handling decimal using binary logic.
    Perhaps a generalized BaseN module is called for. People
    could then generate floating point numbers in any base (up to perhaps
    base 36, [0-9a-z]).
    <snip>
    ... Of course then you have the same problem with doing math on two
    different bases as with doing math on rational numbers.
    Actually, the problem is even worse.

    Like rationals, BaseN numbers have the problem that there are multiple
    representations for the same number (e.g., 1/2=6/12, and 0.1 (2) = 0.6
    (12)). But rationals at least have a standardized normalization. We
    can agree that 1/2 should be represented as 1/2 and not
    -131/-262, but should BaseN('0.1', base=2) + BaseN('0.1', base=4) be
    BaseN('0.11', 2) or BaseN('0.3', 4)?
    I don't see the point of supporting all bases. The main ones are of
    course base 2, 8, 10 and 16. And of course base 8 and 16
    representations map directly to base 2 representations anyway - that
    is why they get used in the first place.

    If I were supporting loads of bases (and that is a big 'if') I would
    take an approach where each base type directly supported arithmetic
    with itself only. Each base would be imported separately and be
    implemented using code optimised for that base, so that the base
    wouldn't need to be maintained by - for instance - a member of the
    class. There would be a way to convert between bases, but that would
    be the limit of the interaction.

    If I needed more than that, I'd use a rational type - I speak from
    experience as I set out to write a base N float library for C++ once
    upon a time and ended up writing a rational instead. A rational, BTW,
    isn't too bad to get working but that's as far as I got - doing it
    well would probably take a lot of work. And if getting Base N floats
    working was harder than for rationals, getting them to work well would
    probably be an order of magnitude harder - for no real benefit to 99%
    or more of users.

    Just because a thing can be done, that doesn't make it worth doing.
    but what if that base is greater than
    36 (or 62 if lowercase digits are distinguished from uppercase ones)?
    For theoretical use, converting to a list of integers - one integer
    representing each 'digit' - would probably work. If there is a real
    application, that is.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Josiah Carlson at Jan 31, 2004 at 5:35 pm

    If I needed more than that, I'd use a rational type - I speak from
    experience as I set out to write a base N float library for C++ once
    upon a time and ended up writing a rational instead. A rational, BTW,
    isn't too bad to get working but that's as far as I got - doing it
    well would probably take a lot of work. And if getting Base N floats
    working was harder than for rationals, getting them to work well would
    probably be an order of magnitude harder - for no real benefit to 99%
    or more of users.
    I also wrote a rational type (last summer). It took around 45 minutes.
    Floating point takes a bit longer to get right.
    Just because a thing can be done, that doesn't make it worth doing.
    Indeed :)

    - Josiah
  • Stephen Horne at Jan 31, 2004 at 7:33 pm

    On Sat, 31 Jan 2004 09:35:09 -0800, Josiah Carlson wrote:
    If I needed more than that, I'd use a rational type - I speak from
    experience as I set out to write a base N float library for C++ once
    upon a time and ended up writing a rational instead. A rational, BTW,
    isn't too bad to get working but that's as far as I got - doing it
    well would probably take a lot of work. And if getting Base N floats
    working was harder than for rationals, getting them to work well would
    probably be an order of magnitude harder - for no real benefit to 99%
    or more of users.
    I also wrote a rational type (last summer). It took around 45 minutes.
    Floating point takes a bit longer to get right.
    Was your implementation the 'not too bad to get working' or the 'doing
    it well'?

    For instance, there is the greatest common divisor that you need for
    normalising the rationals.

    I used the Euclidean algorithm for the GCD. Not too bad, certainly
    better than using prime factorisation, but as I understand it doing
    the job well means using a better algorithm for this - though I never
    did bother looking up the details.
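
    For reference, the Euclidean version is tiny in Python (a sketch):

        def gcd(a, b):
            # Euclid: replace (a, b) with (b, a mod b) until b is zero.
            while b:
                a, b = b, a % b
            return a

        print(gcd(1071, 462))   # 21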

    Actually, as far as I remember, just doing the arbitrary length
    integer division functions took me more than your 45 minutes. The long
    division algorithm is simple in principle, but I seem to remember
    messing up the decision of how many bits to shift the divisor after a
    subtraction. Of course in Python, that's already done.

    Maybe I was just having a bad day. Maybe I remember it worse than it
    really was. Still, 45 minutes doesn't seem too realistic in my memory,
    even for the 'not too bad to get working' case.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Josiah Carlson at Feb 1, 2004 at 7:10 pm

    Was your implementation the 'not too bad to get working' or the 'doing
    it well'?
    I thought it did pretty well. But then again, I didn't really much
    worry about it or use it much. I merely tested to make sure it did the
    right thing and forgot about it.
    For instance, there is the greatest common divisor that you need for
    normalising the rationals.

    I used the Euclidean algorithm for the GCD. Not too bad, certainly
    better than using prime factorisation, but as I understand it doing
    the job well means using a better algorithm for this - though I never
    did bother looking up the details.
    I also used Euclid's GCD, but last time I checked, it is a pretty
    reasonable algorithm. It runs in O(log n) time, where n is the larger
    of the two values - that is, linear in the number of digits, which is
    about as well as you can do.
    Actually, as far as I remember, just doing the arbitrary length
    integer division functions took me more than your 45 minutes. The long
    division algorithm is simple in principle, but I seem to remember
    messing up the decision of how many bits to shift the divisor after a
    subtraction. Of course in Python, that's already done.
    Ahh, integer division. I solved a related problem with long integers
    for Python in a programming competition my senior year of college
    (everyone else was using Java, the suckers) in about 15 minutes. We
    were to calculate 1/n, for some arbitrarily large n (where 1/n was a
    fraction that could be represented by base-10 integer division). Aside
    from I/O, it was 9 lines.

    Honestly, I never implemented integer division in my rational type. For
    casts to floats,
    float(self.numerator)/float(self.denominator)+self.whole seemed just
    fine (I was using rationals with denominators in the range of 2-100 and
    total value < 1000).

    Thinking about it now, it wouldn't be very difficult to pull out my 1/n
    code and adapt it to the general integer division problem. Perhaps
    something to do later.
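
    (Not the competition code itself, but the idea is roughly this sketch:)

        def reciprocal_digits(n, places):
            # Long division of 1 by n, one decimal digit at a time.
            digits = []
            remainder = 1
            for _ in range(places):
                remainder *= 10
                digits.append(remainder // n)
                remainder %= n
            return digits

        print(reciprocal_digits(7, 6))   # [1, 4, 2, 8, 5, 7] -> 0.142857...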
    Maybe I was just having a bad day. Maybe I remember it worse than it
    really was. Still, 45 minutes doesn't seem too realistic in my memory,
    even for the 'not too bad to get working' case.
    For all the standard operations on a rational type, all you need is to
    make sure all you have is two pairs of numerators and denominators, then
    all the numeric manipulation is trivial:
    a.n = a.numerator + a.whole*a.denominator
    a.d = a.denominator
    b.n = b.numerator + b.whole*b.denominator
    b.d = b.denominator

    a + b = rational(a.n*b.d + b.n*a.d, a.d*b.d)
    a - b = rational(a.n*b.d - b.n*a.d, a.d*b.d)
    a * b = rational(a.n*b.n, a.d*b.d)
    a / b = rational(a.n*b.d, a.d*b.n)
    a ** b, b is an integer >= 1 (binary exponentiation)


    One must remember to normalize on initialization, but that's not
    difficult. Functionally that's how my rational turned out. It wasn't
    terribly full featured, but it worked well for what I was doing.
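
    A minimal sketch of a type with that shape (normalizing only in
    __init__; the class itself is invented for illustration):

        from math import gcd

        class Rational:
            def __init__(self, n, d):
                if d == 0:
                    raise ZeroDivisionError("zero denominator")
                if d < 0:                  # keep any sign on the numerator
                    n, d = -n, -d
                g = gcd(n, d)
                self.n, self.d = n // g, d // g

            def __add__(self, other):
                return Rational(self.n*other.d + other.n*self.d, self.d*other.d)

            def __mul__(self, other):
                return Rational(self.n*other.n, self.d*other.d)

            def __repr__(self):
                return "%d/%d" % (self.n, self.d)

        print(Rational(1, 2) + Rational(1, 3))   # 5/6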

    - Josiah
  • Paul Moore at Feb 2, 2004 at 5:52 pm

    Josiah Carlson <jcarlson at nospam.uci.edu> writes:

    One must remember to normalize on initialization, but that's not
    difficult. Functionally that's how my rational turned out. It wasn't
    terribly full featured, but it worked well for what I was doing.
    Straightforward rational implementations *are* easy. But when you
    start to look at some of the more subtle numerical issues, life
    rapidly gets hard.

    The key point (easy enough with Python, but bear with me) is that the
    numerator and denominator *must* be infinite-precision integers.
    Otherwise, rationals have as many rounding and representational issues
    as floating point numbers, and the characteristics of the problems
    differ in ways that make them *less* usable without specialist
    knowledge, not more.

    With Python, this isn't an onerous requirement, as Python Longs fit
    the bill nicely. But the next decision you have to make is how often
    to normalise. You imply (in your comment above) that you should only
    normalise on initialisation, but if you do that, your representation
    rapidly blows up, in terms of space used. Sure,
    8761348763287654786543876543/17522697526575309573087753086 is the same
    as 1/2, but the former uses a lot more space, and is going to be
    slower to compute with.

    But if you normalise every time, some theoretically simple operations
    can become relatively very expensive in terms of time. (Basically,
    things like addition, which suddenly require a GCD calculation).

    So you have to work out a good tradeoff, which isn't easy.
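
    A toy illustration of the blow-up (adding 1/2 five times without
    reducing):

        from math import gcd

        n, d = 1, 2
        for _ in range(5):
            n, d = n*2 + d, d*2       # n/d + 1/2, left unnormalized
        print(n, d)                   # 192 64 -- an unreduced 3/1
        g = gcd(n, d)
        print(n//g, d//g)             # 3 1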

    There are other issues to consider, but that should be enough to
    demonstrate the sort of issues an "industrial strength" rational
    implementation must address.

    Of course, this isn't to say that every implementation *needs* to be
    industrial-strength. Only the user can say what's good enough for his
    needs.

    Paul.
    --
    This signature intentionally left blank
  • Josiah Carlson at Feb 2, 2004 at 9:55 pm

    But if you normalise every time, some theoretically simple operations
    can become relatively very expensive in terms of time. (Basically,
    things like addition, which suddenly require a GCD calculation).
    If we are to take cues from standard Python numeric types, any
    mathematical calculation results in a new immutable object. Thusly,
    only normalizing on initialization is sufficient. Since that is the
    only time you ever get anything new, doing GCD on initialization is the
    minimum and maximum requirement.

    - Josiah
  • Mel Wilson at Feb 3, 2004 at 4:02 pm
    In article <bvmh58$4hc$1 at news.service.uci.edu>,
    Josiah Carlson wrote:
    But if you normalise every time, some theoretically simple operations
    can become relatively very expensive in terms of time. (Basically,
    things like addition, which suddenly require a GCD calculation).
    If we are to take cues from standard Python numeric types, any
    mathematical calculation results in a new immutable object. Thusly,
    only normalizing on initialization is sufficient. Since that is the
    only time you ever get anything new, doing GCD on initialization is the
    minimum and maximum requirement.
    I agree, but that means we do a lot of initializations,
    so the performance in doing a computation would be about the
    same.

    I tried a decimal floating-point package just lately, for
    fun, based on long mantissas and int exponents. I used this
    approach to normalization, because I think it's natural, but
    I've been scared to benchmark the package. I should, I
    guess.

    Regards. Mel.
  • Dan Bishop at Feb 7, 2004 at 1:53 am
    Josiah Carlson <jcarlson at nospam.uci.edu> wrote in message news:<bvjj49$c94$1 at news.service.uci.edu>...
    Was your implementation [of rationals] the 'not too bad to get working' or
    the 'doing it well'?
    ...
    For all the standard operations on a rational type, all you need is to
    make sure all you have is two pairs of numerators and denominators, then
    all the numeric manipulation is trivial: ...
    a + b = rational(a.n*b.d + b.n*a.d, a.d*b.d)
    a - b = rational(a.n*b.d - b.n*a.d, a.d*b.d)
    a * b = rational(a.n*b.n, a.d*b.d)
    a / b = rational(a.n*b.d, a.d*b.n)
    Also,

    floor(a) = a.n // a.d
    a // b = floor(a / b)
    a ** b, b is an integer >= 1 (binary exponentiation)
    It's even more trivial when b=0: The result is 1.

    And when b < 0, a ** b can be calculated as (1 / a) ** (-b)
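
    A sketch of the binary-exponentiation case, assuming a Rational(n, d)
    type like the one sketched earlier in the thread:

        def rational_pow(a, b):
            # a is a Rational, b an int; negative exponents invert first.
            if b < 0:
                a, b = Rational(a.d, a.n), -b
            result = Rational(1, 1)
            while b:
                if b & 1:             # this bit of the exponent is set
                    result = result * a
                a = a * a
                b >>= 1
            return result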
  • Aahz at Feb 5, 2004 at 2:18 pm
    In article <ad052e5c.0401310101.1c5bd5aa at posting.google.com>,
    Dan Bishop wrote:
    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.
    Really. What kind of financial institution was this? They didn't need
    to deal with any form of fractional pennies?
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
  • Stephen Horne at Feb 6, 2004 at 1:51 am

    On 5 Feb 2004 09:18:12 -0500, aahz at pythoncraft.com (Aahz) wrote:
    In article <ad052e5c.0401310101.1c5bd5aa at posting.google.com>,
    Dan Bishop wrote:
    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.
    Really. What kind of financial institution was this? They didn't need
    to deal with any form of fractional pennies?
    Does it really matter if they did? They may not deal in whole pennies,
    but I seriously doubt that they need infinite precision - integers
    with a predefined scaling factor (ie fixed point arithmetic) will, I
    suspect, handle those few jobs that counting in pennies can't.

    For instance, while certainly exchange rates involve fractional
    amounts (specified to a fixed number of places), the converted amounts
    will be rounded as account balances are recorded to the nearest penny,
    unless I'm very badly mistaken. The same applies to interest - the
    results get rounded before the balance is affected.

    So if the exchange rate is 1.83779 dollars to the UK pound, who can't
    cope with the following code?

    exchange_rate = 183779

    result = pounds * exchange_rate / 100000

    Assuming that rounding matches the programming language's default
    behaviour, of course, and that the width of the integers is
    sufficient.
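
    Or, to round to the nearest unit rather than truncate (a sketch, using
    floor division plus half the scale factor):

    result = (pounds * exchange_rate + 50000) // 100000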


    That said, as I understand it, a lot of financial institutions have a
    lot of COBOL code. And from what I remember of programming in COBOL,
    the typical representation of numbers in both files and working
    storage uses decimal digits stored in a character string - at least
    that's what the picture strings specify in the source code. Given that
    the compiler knows the precision of every number, and assuming that
    there is no conversion to a more convenient representation internally,
    it shouldn't make much difference whether the number has a point or
    not.


    Personally, I wouldn't want to contradict Dan Bishop's claims - he has
    the experience in a financial institution, not me - but I suspect
    there is a fair amount of code used in many financial institutions
    that does in fact use a decimal representation, if only because of old
    COBOL code.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Aahz at Feb 6, 2004 at 3:39 am
    In article <qeq520pv7kbd1s3ojmn3idetjuljhtk5md at 4ax.com>,
    Stephen Horne wrote:
    On 5 Feb 2004 09:18:12 -0500, aahz at pythoncraft.com (Aahz) wrote:
    In article <ad052e5c.0401310101.1c5bd5aa at posting.google.com>,
    Dan Bishop wrote:
    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.
    Really. What kind of financial institution was this? They didn't need
    to deal with any form of fractional pennies?
    Does it really matter if they did? They may not deal in whole pennies,
    but I seriously doubt that they need infinite precision - integers
    with a predefined scaling factor (ie fixed point arithmetic) will, I
    suspect, handle those few jobs that counting in pennies can't.
    That's mostly true (witness Tim Peters's FixedPoint.py). If you really
    want to debate this issue, read Cowlishaw first:
    http://www2.hursley.ibm.com/decimal/decarith.html
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
  • Dan Bishop at Feb 7, 2004 at 1:38 am
    Stephen Horne <steve at ninereeds.fsnet.co.uk> wrote in message news:<qeq520pv7kbd1s3ojmn3idetjuljhtk5md at 4ax.com>...
    On 5 Feb 2004 09:18:12 -0500, aahz at pythoncraft.com (Aahz) wrote:

    In article <ad052e5c.0401310101.1c5bd5aa at posting.google.com>,
    Dan Bishop wrote:
    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.
    Really. What kind of financial institution was this? They didn't need
    to deal with any form of fractional pennies?
    Does it really matter if they did? They may not deal in whole pennies,
    but I seriously doubt that they need infinite precision - integers
    with a predefined scaling factor (ie fixed point arithmetic) will, I
    suspect, handle those few jobs that counting in pennies can't.
    And you would be right. For example, interest rates were always
    stored in thousandths of a percent.

    The only problem was that some of the third-party software we used
    made this scaling completely visible to the user. Our employees would
    occasionally forget the scaling factor, and this resulted in mistakes
    like having one of our CD's pay 445% interest instead of 4.45%.
    That said, as I understand it, a lot of financial institutions have a
    lot of COBOL code. And from what I remember of programming in COBOL,
    the typical representation of numbers in both files and working
    storage uses decimal digits stored in a character string - at least
    that's what the picture strings specify in the source code.
    We had a lot of numbers in EBCDIC signed decimal. Even though our
    mainframe used ASCII.
  • Aahz at Feb 11, 2004 at 3:09 am
    In article <ad052e5c.0402061738.bdddcaa at posting.google.com>,
    Dan Bishop wrote:
    Stephen Horne <steve at ninereeds.fsnet.co.uk> wrote in message news:<qeq520pv7kbd1s3ojmn3idetjuljhtk5md at 4ax.com>...
    On 5 Feb 2004 09:18:12 -0500, aahz at pythoncraft.com (Aahz) wrote:
    In article <ad052e5c.0401310101.1c5bd5aa at posting.google.com>,
    Dan Bishop wrote:
    For money, it means that I have *exactly* $1.80. This is because
    "dollars" are just a notational convention for large numbers of cents.
    I can just as accurately say that I have an (integer) 180 cents, and
    indeed, that's exactly the way it would be stored in my financial
    institution's database. (I know because I used to work there.) So
    all you really need here is "int". But I do agree with the idea of
    having a class to hide the decimal/integer conversion from the user.
    Really. What kind of financial institution was this? They didn't need
    to deal with any form of fractional pennies?
    Does it really matter if they did? They may not deal in whole pennies,
    but I seriously doubt that they need infinite precision - integers
    with a predefined scaling factor (ie fixed point arithmetic) will, I
    suspect, handle those few jobs that counting in pennies can't.
    And you would be right. For example, interest rates were always
    stored in thousandths of a percent.

    The only problem was that some of the third-party software we used
    made this scaling completely visible to the user. Our employees would
    occasionally forget the scaling factor, and this resulted in mistakes
    like having one of our CD's pay 445% interest instead of 4.45%.
    ...and that's a good argument for having a built-in type that handles
    the conversions automatically. Another issue is the different kinds of
    rounding. All in all, there are many kinds of already-solved problems
    that are taken care of by using the decimal float standard.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "The joy of coding Python should be in seeing short, concise, readable
    classes that express a lot of action in a small amount of clear code --
    not in reams of trivial code that bores the reader to death." --GvR
  • Stephen Horne at Jan 30, 2004 at 10:32 pm

    On Fri, 30 Jan 2004 07:06:21 -0800, Michael Chermside wrote:

    Facundo Batista writes:
    I'm proud to announce that the PEP for Decimal Data Type is now published
    http://www.python.org/peps/pep-0327.html
    VERY nice work here.

    Here's my 2 cents:

    (1) You propose conversion from floats via:
    Decimal(1.1, 2) == Decimal('1.1')
    Decimal(1.1, 16) == Decimal('1.1000000000000001')
    Decimal(1.1) == Decimal('110000000000000008881784197001252...e-51')

    I think that we'd do even better to omit the second use. People who
    really want to convert floats exactly can easily write "Decimal(1.1, 60)". But
    hardly anyone wants to convert floats exactly, while lots of newbies would
    forget to include the second parameter. I'd say just make Decimal(someFloat)
    raise a TypeError with a helpful message about how you need that second
    parameter when using floats.
    Good point. A 'DecimalExact' or similar function could perhaps be
    provided to replace the simple conversion when people have really
    thought about it and do really want it.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Batista, Facundo at Feb 2, 2004 at 2:45 pm
    danb_83 wrote:

    #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    #- that human height comes in discrete packets of 0.01 m. It
    #- means that
    #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    #- posture and the time of day, and "1.80" is just a convenient
    #- approximation. And it wouldn't be inaccurate to express my height as
    #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    #- these are within the tolerance of the measurement. So number base
    #- doesn't matter here.

    Are you saying that it's ok to store your number imprecisely because you
    don't take good measurements?


    #- But even if the number base of a measurement doesn't matter,
    #- precision
    #- and speed of calculations often does. And on digital computers,
    #- non-binary arithmetic is inherently imprecise and slow. Imprecise
    #- because register bits are limited and decimal storage wastes them.
    #- (For example, representing the integer 999 999 999 requires
    #- 36 bits in
    #- BCD but only 30 bits in binary. Also, for floating point,
    #- only binary
    #- allows the precision-gaining "hidden bit" trick.) Slow because
    #- decimal requires more complex hardware. (For example, a BCD
    #- adder has
    #- more than twice as many gates as a binary adder.)

    In my dreams, speed and storage are both infinite, :p

    . Facundo
  • David M. Cooke at Feb 2, 2004 at 10:07 pm

    At some point, "Batista, Facundo" wrote:

    danb_83 wrote:

    #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    #- that human height comes in discrete packets of 0.01 m. It
    #- means that
    #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    #- posture and the time of day, and "1.80" is just a convenient
    #- approximation. And it wouldn't be inaccurate to express my height as
    #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    #- these are within the tolerance of the measurement. So number base
    #- doesn't matter here.

    Are you saying that it's ok to store your number imprecisely because you
    don't take good measurements?
    What we need for this is an interval type. 1.80 m shouldn't be stored
    as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    and multiplication should propagate the intervals.

    How to do that is another question: for addition, do you add the
    magnitudes of the intervals, or use the square root of the sums of the
    squares, or something else? It greatly depends on what _type_ of error
    0.005 measures (is it the width of a Gaussian distribution? a uniform
    distribution? something skewed that's not representable by one
    number?).
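
    For the quadrature option, a minimal sketch (assuming independent
    Gaussian errors; the class is invented for illustration):

        from math import hypot

        class Uncertain:
            # value +/- err; independent Gaussian errors add in quadrature.
            def __init__(self, value, err):
                self.value, self.err = value, err
            def __add__(self, other):
                return Uncertain(self.value + other.value,
                                 hypot(self.err, other.err))
            def __repr__(self):
                return "%g +/- %g" % (self.value, self.err)

        print(Uncertain(1.80, 0.005) + Uncertain(0.30, 0.005))
        # 2.1 +/- 0.00707107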

    My 0.0438126 Argentina pesos [1]

    [1] $0.02 Canadian, which highlights the other problem with any
    representation of a number without units -- decimal or otherwise.

    --
    \/|<
    /--------------------------------------------------------------------------\
    David M. Cooke
    cookedm(at)physics(dot)mcmaster(dot)ca
  • Stephen Horne at Feb 4, 2004 at 1:59 am

    On Mon, 02 Feb 2004 17:07:52 -0500, cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote:
    At some point, "Batista, Facundo" wrote:

    danb_83 wrote:

    #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    #- that human height comes in discrete packets of 0.01 m. It
    #- means that
    #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    #- posture and the time of day, and "1.80" is just a convenient
    #- approximation. And it wouldn't be inaccurate to express my height as
    #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    #- these are within the tolerance of the measurement. So number base
    #- doesn't matter here.

    Are you saying that it's ok to store your number imprecisely because you
    don't take good measurements?
    What we need for this is an interval type. 1.80 m shouldn't be stored
    as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    and multiplication should propagate the intervals.
    I disagree with this, not because it is a bad idea to keep track of
    precision, but because this should not be a part of the float type or
    of basic arithmetic operations.

    When you write a value with its precision specified in the form of an
    interval, that interval is a second number. The value with the
    precision is a compound representation, built up using simpler
    components. It doesn't mean that the components no longer have uses
    outside of the compound. In Python, the same should apply - a numeric
    type that can track precision sounds useful, but it shouldn't replace
    the existing float.

    One good reason is simply that knowledge of the precision is only
    sometimes useful. As an obvious example, what would the point be of
    keeping track of the precision of the calculations in a 3D game -
    there is no point as the information about precision has no bearing on
    the rendering of the image.

    Besides this, there is a much more fundamental problem.

    The whole point of using an imprecise representation is because
    manipulating a perfect representation is impractical - mainly slow.

    It is true that in general the source is inherently approximate too,
    meaning that floats are a quite a good match for the physical
    measurements they are often used to represent, but still if it were
    practical to do perfect arithmetic on those approximate values it
    would give slightly more precise answers as the arithmetic would not
    introduce additional sources of error.

    Having an approximate representation with an interval sounds good, but
    remember that one error source is the arithmetic itself - e.g. 1.0 /
    3.0 cannot be finitely represented in either binary or decimal without
    error (except as a rational, of course).

    So therefore, in answer to your question...
    How to do that is another question: for addition, do you add the
    magnitudes of the intervals, or use the square root of the sums of the
    squares, or something else? It greatly depends on what _type_ of error
    0.005 measures (is it the width of a Gaussian distribution? a uniform
    distribution? something skewed that's not representable by one
    number?).
    None of these is sufficient - they may track the errors resulting from
    measurement issues (if you choose the appropriate method for your
    application) but neither takes into account errors resulting from the
    imprecision of the arithmetic. Furthermore, to keep track of such
    imprecision precisely means you need an infinitely precise numeric
    representation for your interval - and if it was practical to do that,
    it would be far better to just use that representation for the value
    itself.

    This doesn't mean that tracking precision is a bad idea. It just means
    that when it is done, the error interval itself should be imprecise.
    You should have the guarantee that the real value is never going to be
    outside of the given bounds, but not the guarantee that the bounds are
    as close together as possible - the bounds should be allowed to get a
    little further apart to allow for imprecision in the calculation of
    the interval.

    And if the error interval is itself an approximation, why track it on
    every single arithmetic operation? Unless you have a specific good
    reason to do so, it makes much more sense to handle the precision
    tracking at a higher level. And as those higher level operations are
    often going to be application specific, having a single library for it
    (ie not tailored to some particular type of task) is IMO unlikely to
    work.

    For instance, consider calculating and applying a 3D rotation matrix
    to a vector. If you track errors on every float value, that is 9
    values in the matrix with error values (due to limited precision trig
    functions etc) and 3 values in the vector, a dozen for the
    intermediate results in the matrix multiplication, and 3 error
    intervals for the 3 dimensions of the output vector. But the odds are
    that all you want is a single float value - the maximum distance
    between the real point and the point represented by the output vector,
    and you can probably get a good value for that by multiplying the
    length of the input vector by some 'potential error from rotation'
    constant.

    Incidentally, it would not always be appropriate to include arithmetic
    errors in error intervals. For instance, some statistical interval
    types do not guarantee that all values are within the interval range.
    They may guarantee that 95% of values are within the interval, for
    instance - _and_ that 5% of values are outside the interval. The 5%
    outside is as important as the 95% inside, so there is no acceptable
    direction to move the bounds a little 'just to be safe'.

    In some cases, you might even want to track the error interval (from
    arithmetic error) for your error interval value. I can certainly
    imagine a result with the form...

    The average widginess of a blodgit is 9.5 +/- 0.2
    95% differ from the average by less than 2.7 +/- 0.03

    Thus I can say that this randomly chosen blodgit has a
    widginess of (9.5 +/- 0.2) +/- (2.7 +/- 0.03) with 95% confidence.

    You might even get results like that if you had estimated the average
    and distribution of widginess from a sample of the blodgits - in which
    case, you may still need to account for the arithmetic error, which
    requires potentially another four values ;-)


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • David M. Cooke at Feb 4, 2004 at 7:52 pm

    At some point, Stephen Horne wrote:

    On Mon, 02 Feb 2004 17:07:52 -0500, cookedm+news at physics.mcmaster.ca
    (David M. Cooke) wrote:
    At some point, "Batista, Facundo" wrote:

    danb_83 wrote:

    #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
    #- that human height comes in discrete packets of 0.01 m. It
    #- means that
    #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
    #- posture and the time of day, and "1.80" is just a convenient
    #- approximation. And it wouldn't be inaccurate to express my height as
    #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
    #- these are within the tolerance of the measurement. So number base
    #- doesn't matter here.

    Are you saying that it's ok to store your number imprecisely because you
    don't take good measurements?
    What we need for this is an interval type. 1.80 m shouldn't be stored
    as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    and multiplication should propagate the intervals.
    I disagree with this, not because it is a bad idea to keep track of
    precision, but because this should not be a part of the float type or
    of basic arithmetic operations.
    I was being a bit facetious :-) This is certainly something that can
    be done without being builtin, like this:
    http://pedro.dnp.fmph.uniba.sk/~stanys/Uncertainities.py
    Having an approximate representation with an interval sounds good, but
    remember that one error source is the arithmetic itself - e.g. 1.0 /
    3.0 cannot be finitely represented in either binary or decimal without
    error (except as a rational, of course).
    Hey, if my measurement error is so small that arithmetic error becomes
    significant, I'm happy.

    --
    \/|<
    /--------------------------------------------------------------------------\
    David M. Cooke
    cookedm(at)physics(dot)mcmaster(dot)ca
  • Stephen Horne at Feb 4, 2004 at 9:01 pm

    On Wed, 04 Feb 2004 14:52:42 -0500, cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote:

    I was being a bit facetious :-)
    Ah - sorry for taking it the wrong way.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
  • Bengt Richter at Feb 6, 2004 at 3:58 pm
    On Wed, 04 Feb 2004 01:59:41 +0000, Stephen Horne wrote:
    [...]
    A bunch of stuff including stuff about intervals which probably could
    benefit from revision in the light of, e.g.,

    http://www.americanscientist.org/template/AssetDetail/assetid/28331;jsessionid=aaa41kNy_Uu1-c

    or the whole in "printer-friendly" format

    http://www.americanscientist.org/template/AssetDetail/assetid/28331/page/3?&print=yes

    or the .pdf (nicer) at

    http://www.americanscientist.org/template/PDFDetail/assetid/28315;jsessionid=aaa41kNy_Uu1-c

    see also

    http://www.cs.utep.edu/interval-comp/

    Google is your friend ;-)

    Regards,
    Bengt Richter
  • Batista, Facundo at Feb 3, 2004 at 12:33 pm
    cookedm wrote:

    #- What we need for this is an interval type. 1.80 m shouldn't be stored
    #- as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    #- and multiplication should propagate the intervals.

    I think this kind of math is beyond a pure numeric data type. 1.80 should
    be represented as a numeric data type, and so should 0.005.

    But '1.80 +/- 0.005' should be handled by another object. Hey! These are the
    benefits of OOP!
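
    As a hedged sketch of that separate object - assuming the proposed type
    ends up importable as Decimal from a 'decimal' module, and with the
    'Measure' name and worst-case propagation rule purely illustrative:

    from decimal import Decimal

    class Measure:
        def __init__(self, value, tolerance):
            # Two plain Decimals; the numeric type itself stays simple.
            self.value = Decimal(value)
            self.tolerance = Decimal(tolerance)

        def __add__(self, other):
            # Worst-case propagation: tolerances add under addition.
            return Measure(self.value + other.value,
                           self.tolerance + other.tolerance)

        def __repr__(self):
            return "%s +/- %s" % (self.value, self.tolerance)

    print(Measure('1.80', '0.005') + Measure('0.95', '0.005'))
    # 2.75 +/- 0.010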

    . Facundo
  • Bengt Richter at Feb 6, 2004 at 5:03 pm

    On Tue, 3 Feb 2004 09:33:26 -0300, "Batista, Facundo" wrote:
    cookedm wrote:

    #- What we need for this is an interval type. 1.80 m shouldn't be stored
    #- as '1.80', but as '1.80 +/- 0.005', and operations such as addition
    #- and multiplication should propagate the intervals.

    I think this kind of math is beyond a pure numeric data type. 1.80 should
    be represented as a numeric data type, and so should 0.005.

    But '1.80 +/- 0.005' should be handled by another object. Hey! These are the
    benefits of OOP!
    The key concern is _exactly_ representing the limits of an interval that is
    _guaranteed to contain_ the exact value of interest. One hopes to represent
    very narrow intervals, but the principle is the same irrespective of the
    computer states available to represent the end points.

    E.g., integer intervals can reliably enclose 1.8 and 0.005
    (with [1,2] and [0,1] respectively). Of course, [1,2] +- [0,1]
    => [0,3] gets you something less than useful for 1.8+-0.005

    But choosing from available IEEE-754 floating point double states
    gets you some really narrow intervals, where e.g. 1.8 can be guaranteed to
    be in the closed interval whose end points are the two nearest available
    exactly-representable floating point numbers, namely

    [1.79999999999999982236431605997495353221893310546875,
    1.8000000000000000444089209850062616169452667236328125]

    I'll leave it as an exercise to work out the exactly representable value
    interval limits for 0.005 and 1.8+-0.005 ;-)

    The _meaning_ of numbers that are guaranteed to fall into known exact intervals
    in terms of representing measurements, measurement errors, statistics of the
    errors, etc. is a separate matter from keeping track of exact intervals during
    computation. These concerns should not be confused, IMO, though they inevitably
    arise together in thinking about computing with real-life measurement values.
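
    A hedged sketch of how one might compute such bracketing neighbors,
    assuming IEEE-754 doubles and using struct to step the bit pattern
    (valid only for positive finite values, where adjacent bit patterns
    are adjacent representable doubles; the function name is invented):

    import struct

    def next_below(x):
        # Decrement the 64-bit pattern of a positive finite double to
        # get the next representable value below it.
        (n,) = struct.unpack('<q', struct.pack('<d', x))
        return struct.unpack('<d', struct.pack('<q', n - 1))[0]

    stored = 1.8                 # the literal rounds up to the nearest double
    print('%.55f' % next_below(stored))
    print('%.55f' % stored)
    # prints the exact expansions of the two bracketing values quoted
    # above (padded with trailing zeros)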

    Regards,
    Bengt Richter
  • Anton Vredegoor at Feb 6, 2004 at 7:25 pm

    On 6 Feb 2004 17:03:57 GMT, bokr at oz.net (Bengt Richter) wrote:
    The _meaning_ of numbers that are guaranteed to fall into known exact intervals
    in terms of representing measurements, measurement errors, statistics of the
    errors, etc. is a separate matter from keeping track of exact intervals during
    computation. These concerns should not be confused, IMO, though they inevitably
    arise together in thinking about computing with real-life measurement values.
    (Warning, naive hobbyist input, practicality: undefined)

    One possible option would be to provide for some kind of random
    rounding routine for some of the least significant bits of a floating
    point value. The advantage would be that this would also be usable for
    DSP-like computations that are used in music programming (volume
    adjustments) or in digital video (image rotation).

    I agree with the idea that exact interval tracking is important, but
    perhaps this exact interval tracking should be used only during
    testing and development of the code.

    It might be possible to produce code in which a fixed number of least
    significant bits is randomly rounded whenever some specific operation
    makes this necessary (not after *all* computations!), and the floating
    point data would then stay accurate for long enough to be usable in
    99.9 percent of the use cases.

    Maybe we need a DSP-float instead of a decimal data type? Decimals
    could be used for testing DSP-float implementations.
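
    Off the top of my head, a hedged sketch of what such random rounding
    might look like for a positive IEEE-754 double (the function name and
    the choice of dropping 16 low-order bits are invented for illustration):

    import random
    import struct

    def dither_round(x, drop_bits=16):
        # Clear the low `drop_bits` of a positive double's bit pattern,
        # rounding up with probability equal to the fraction discarded,
        # so the rounding is (approximately) unbiased on average.
        (n,) = struct.unpack('<q', struct.pack('<d', x))
        step = 1 << drop_bits
        lost = n % step
        n -= lost
        if random.random() * step < lost:
            n += step
        return struct.unpack('<d', struct.pack('<q', n))[0]

    samples = [dither_round(1.1) for i in range(10000)]
    print(sum(samples) / len(samples))    # close to 1.1 on average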

    Anton
  • Tim Roberts at Feb 8, 2004 at 6:55 am

    anton at vredegoor.doge.nl (Anton Vredegoor) wrote:
    One possible option would be to provide for some kind of random
    rounding routine for some of the least significant bits of a floating
    point value.
    I'm not so sure about this. It still gives you what seems to be an exact
    answer with 15 decimal places, but now you have non-determinism. The real
    answer, I think, is getting people to understand how much of their
    real-world measurements are garbage.
    The advantage would be that this would also be usable for
    DSP-like computations that are used in music programming (volume
    adjustments) or in digital video (image rotation).
    Interesting. I know you were kind of talking off the top of your head, but
    can you tell me what leads you to think that some low-order randomness
    would be helpful in those particular applications?
    Maybe we need a DSP-float instead of a decimal data type? Decimals
    could be used for testing DSP-float implementations.
    Can you describe what you mean by DSP-float? I'm not sure why a DSP should
    treat floats any differently than an ordinary processor.
    --
    - Tim Roberts, timr at probo.com
    Providenza & Boekelheide, Inc.
  • Anton Vredegoor at Feb 9, 2004 at 12:43 pm

    Tim Roberts wrote:
    anton at vredegoor.doge.nl (Anton Vredegoor) wrote:
    One possible option would be to provide for some kind of random
    rounding routine for some of the least significant bits of a floating
    point value.
    I'm not so sure about this. It still gives you what seems to be an exact
    answer with 15 decimal places, but now you have non-determinism. The real
    answer, I think, is getting people to understand how much of their
    real-world measurements are garbage.
    Yes, but this is not a simple matter. There is still some kind of order
    present long after strict methods have become unwieldy. An intelligent
    rounding scheme could harness some of this partial order to keep the
    computations more accurate over a wider range of manipulations on
    real-world data.

    I'm providing some code below to show that there is order beyond
    determinism. It's not helpful in any explicit way, but it should serve
    to prove the point to anyone willing to watch it for long enough,
    check the code for some exact deterministic explanation, and find
    themselves unable to formalize one :-)

    Also, it's not bad to look at even for those not wanting to
    investigate, so it might help defuse possible tension in this
    discussion a bit.
    The advantage would be that this would also be usable for
    DSP-like computations that are used in music programming (volume
    adjustments) or in digital video (image rotation).
    Interesting. I know you were kind of talking off the top of your head, but
    can you tell me what leads you to thinking that some low-order randomness
    would be helpful in those particular applications?
    There are high end digital mixers that use some kind of random
    rounding to the least significant bits of their sample data in order
    to make the sounds "survive" more manipulations before the effect of
    the manipulations becomes audible.

    In digital video with image rotation there is the problem of
    determining where an object exactly is after it is rotated, because
    all of its coordinate points have been rounded. A statistical approach
    seems to work well here.

    On a more cosmic scale the universe seems to use the same trick of
    indeterminism, at least according to quantum theory and the Heisenberg
    uncertainty principle. Some think that because of that the universe
    itself must be a computer simulation :-) I guess I'd better stop here
    before someone mentions Douglas Adams ...
    Maybe we need a DSP-float instead of a decimal data type? Decimals
    could be used for testing DSP-float implementations.
    Can you describe what you mean by DSP-float? I'm not sure why a DSP should
    treat floats any differently than an ordinary processor.
    You are right, a DSP is just like an ordinary processor, except that
    it is specialized for digital signal processing operations. I guess I
    got a bit carried away by thinking about a datatype that has builtin
    random rounding for the least significant bits. For example, by using
    the Mersenne Twister random generator, it could compute a lot of
    rounding bytes at once and just use them up as needed. This way it
    would not slow down the computations too much.

    Anton

    from __future__ import division
    from Tkinter import *
    from random import random, choice

    class Scaler:

        def __init__(self, world, viewport):
            (a,b,c,d), (e,f,g,h) = world, viewport
            xf,yf = self.xf,self.yf = (g-e)/(c-a),(h-f)/(d-b)
            wxc,wyc = (a+c)/2, (b+d)/2
            vxc,vyc = (e+g)/2, (f+h)/2
            self.xc,self.yc = vxc-xf*wxc,vyc-yf*wyc

        def scalepoint(self, a, b):
            xf,yf,xc,yc = self.xf,self.yf,self.xc,self.yc
            return xf*a+xc,yf*b+yc

        def scalerect(self, a, b, c, d):
            xf,yf,xc,yc = self.xf,self.yf,self.xc,self.yc
            return xf*a+xc,yf*b+yc,xf*c+xc,yf*d+yc

    class RandomDot:

        def __init__(self, master, n):
            self.master = master
            self.n = n
            self.world = (0,0,1,1)
            c = self.canvas = Canvas(master, bg='black',
                                     width=380, height=380)
            c.pack(fill=BOTH, expand=YES)
            master.bind("<Configure>", self.configure)
            master.bind("<Escape>", lambda
                        event='ignored', m=master: m.destroy())
            self.canvas.bind("<Button-1>", self.click)
            self.colorfuncs = {'red':(min,min), 'green':(min,max),
                               'blue':(max,min), 'white':(max,max)}
            self.polling = False

        def poll(self):
            self.wriggle()
            self.master.after(10, self.poll)

        def click(self, event):
            self.draw()

        def configure(self, event):
            self.scale = Scaler(self.world, self.getviewport())
            self.draw()
            if not self.polling:
                self.polling = True
                self.poll()

        def draw(self):
            c,sp = self.canvas,self.scale.scalepoint
            c.delete('all')
            funcs = self.colorfuncs
            colors = funcs.keys()
            for i in xrange(1000):
                color = choice(colors)
                a,b = sp(random(), random())
                c.create_oval(a,b,a+5,b+5,fill=color,
                              outline='')

        def wriggle(self):
            c,sp = self.canvas,self.scale.scalepoint
            funcs = self.colorfuncs
            x = choice(c.find_all())
            color = c.itemcget(x,"fill")
            f1,f2 = funcs[color]
            a = f1([random() for i in xrange(self.n)])
            b = f2([random() for i in xrange(self.n)])
            a,b = sp(a,b)
            c.coords(x,a,b,a+5,b+5)

        def getviewport(self):
            c = self.canvas
            return (0, 0, c.winfo_width(), c.winfo_height())

    if __name__ == '__main__':
        root = Tk()
        root.title('randomdot')
        app = RandomDot(root, 3)
        root.mainloop()
  • Bengt Richter at Feb 9, 2004 at 5:47 pm

    On Fri, 06 Feb 2004 20:25:21 +0100, anton at vredegoor.doge.nl (Anton Vredegoor) wrote:
    On 6 Feb 2004 17:03:57 GMT, bokr at oz.net (Bengt Richter) wrote:

    The _meaning_ of numbers that are guaranteed to fall into known exact intervals
    in terms of representing measurements, measurement errors, statistics of the
    errors, etc. is a separate matter from keeping track of exact intervals during
    computation. These concerns should not be confused, IMO, though they inevitably
    arise together in thinking about computing with real-life measurement values.
    (Warning, naive hobbyist input, practicality: undefined)

    One possible option would be to provide for some kind of random
    rounding routine for some of the least significant bits of a floating
    point value. The advantage would be that this would also be usable for
    DSP-like computations that are used in music programming (volume
    adjustments) or in digital video (image rotation).
    I can't spend a lot of time on this right now, but this reminds me of
    a time when I tried (successfully, IMO) to explain why feeding a simulation
    system with very low noise data got more accurate results than feeding it
    exact data.

    The reason has to do with quantization (which was part of the system being
    simulated, and which could be fed with highly accurate world-sim values plus
    noise). I.e., measurements are always represented digitally with some least
    significant bit representing some defined amount of a measured quantity.
    This means measurement information below that is lost (or at least one bit
    below that, depending on the device).

    The result is that a statistical mean (or other integrating process) of samples
    will not be affected by the bits lost in quantizing. In the case of feeding a
    simulator with accurate values multiple times, this results in identical,
    biased quantized values, whereas if you add a small amount of noise, you will
    get a few neighboring quantized values in some proportion, and the mean will
    be a better estimate of the true (unquantized) value than a mean of quantized
    values with no noise -- where all the quantized values are exactly equal and
    all biased. The effect can be amplified if the input is feeding a sensitive
    calculation such as the inversion of a near-singular matrix, and can make the
    difference between usable and useless results.

    An example using int as the quantization function:

    >>> import random
    >>> def simval(val, noise=1.0):
    ...     return val + noise*random.random()
    ...
    >>> def simulator(val, noise, trials=1000):
    ...     return sum([int(simval(val, noise)) for i in xrange(trials)])/float(trials)
    ...
    >>> for i in xrange(10): print simulator(1.3, 0.0),
    ...
    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    >>> for i in xrange(10): print simulator(1.3, 1.0),
    ...
    1.295 1.293 1.284 1.307 1.3 1.292 1.322 1.291 1.322 1.315

    I suspect that the ear integrates/averages some when presented with 44.1k samples/sec,
    so if uniform noise is added in below the quantization lsb of a CD, that may enhance
    the perceived output sound, but some audiophile can provide the straight scoop on that.
    I agree with the idea that exact interval tracking is important, but
    perhaps this exact interval tracking should be used only during
    testing and development of the code.

    It might be possible to produce code in which a fixed number of least
    significant bits is randomly rounded whenever some specific operation
    makes this necessary (not after *all* computations!), and the floating
    point data would then stay accurate for long enough to be usable in
    99.9 percent of the use cases.
    I think you have to be careful when you do your rounding, and note
    the effect on values vs populations of values and how that feeds the
    next stage of processing or use.
    Maybe we need a DSP-float instead of a decimal data type? Decimals
    could be used for testing DSP-float implementations.
    I'm not sure what DSP-float really means yet ;-)
    HTH, gotta go.

    Regards,
    Bengt Richter
