
jonas.thornvall at gmail.com wrote:
Well then I have news for you.

Well, then, why don't you share it?


Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
     f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.


Here's another way to look at it. If f(x) is smaller than x for every x,
that means there MUST be multiple values of x that produce the same f(x).
Do you see? If x is three bits and f(x) is two bits, that means there are
8 possible values for x but only 4 values for f(x). So, given an f(x), you
cannot tell which value of x it came from. You have lost information.
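
Here's a quick Python sketch of that counting argument. The truncating
"compressor" below is a made-up toy, purely to show the collision:

    from itertools import product

    inputs = [''.join(bits) for bits in product('01', repeat=3)]  # 8 strings
    f = {x: x[:2] for x in inputs}   # toy "compressor": drop the last bit

    # Group inputs by their 2-bit code: every code covers two inputs,
    # so f cannot be inverted -- information is lost.
    codes = {}
    for x, y in f.items():
        codes.setdefault(y, []).append(x)
    print(codes)   # {'00': ['000', '001'], '01': ['010', '011'], ...}
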
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.

  • Mark Janssen at Nov 2, 2013 at 9:37 pm

    Let me try to get you to understand WHY what you say is impossible. Let's
    say you do have a function f(x) that can produce a compressed output y for
    any given x, such that y is always smaller than x. If that were true, then
    I could call f() recursively:
    f(f(...f(f(f(f(f(x)))))...))
    and eventually the result gets down to a single bit. I hope it is clear
    that there's no way to restore a single bit back into different source
    texts.

    Hey, that's a nice proof!


    Cheers,


    Mark Janssen
  • Steven D'Aprano at Nov 3, 2013 at 3:17 am

    On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:


    jonas.thornvall at gmail.com wrote:
    Well then I have news for you.
    Well, then, why don't you share it?

    Let me try to get you to understand WHY what you say is impossible.
    [snip reasons]


    Expanding on Tim's post... the first scenario Tim gives, applying the
    compression repeatedly to its own output until you eventually get a
    single bit, can be overcome if there are data sets that are unchanged by
    compression. That is, if f() is the compression function:


    f(f(f(data))) == f(f(data)) == f(data) == data


    for some values of data. So if you start with some other data, and
    compress it, then compress it again, and again, eventually you end up
    with one of these attractors, after which repeated compression doesn't
    give you any more benefit.


    [Aside: the attractors aren't necessarily a single point. The attractor
    could be a cycle of two or more points, f(x) == y and f(y) == x. Or you
    could even have a "strange attractor", which brings us to chaos theory.]
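
    As a toy illustration of such a fixed point -- the run-collapsing
    "compressor" here is invented just for this example, not a real algorithm:

        def f(s: str) -> str:
            """Toy 'compressor': collapse runs of repeated characters."""
            out = []
            for ch in s:
                if not out or out[-1] != ch:
                    out.append(ch)
            return ''.join(out)

        s = 'aaabbbcccaaa'
        while f(s) != s:   # iterate until we hit an attractor, f(s) == s
            s = f(s)
        print(s)           # 'abca' -- further "compression" changes nothing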


    But your second reason, better known as the pigeonhole principle,
    demonstrates that for any lossless compression method, there must be data
    sets that actually expand the data. It doesn't matter how cleverly you
    compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
    speak. Suppose your compression algorithm compresses a single byte into a
    nybble (four bits). There are 256 different input data sets (0x00,
    0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
    is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
    more pigeons in at least one hole. Ergo, if the compression algorithm is
    lossless, *some* data must be expanded rather than compressed.


    Alternatively, the compression may be lossy. Or both!


    Obviously data isn't compressed a byte at a time; that would be silly.
    But the principle still applies.
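
    For instance, a quick check with Python's zlib shows the expansion
    happening on incompressible (random) input:

        import os, zlib

        data = os.urandom(1024)          # incompressible, with high probability
        packed = zlib.compress(data, 9)
        print(len(data), len(packed))    # packed is typically a little *larger*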


    There is a way to apparently get around these limits: store data
    externally, perhaps inside the compression application itself. Then, if
    you just look at the compressed file (the "data.zip" equivalent, although
    I stress that zip compression is *not* like this), you might think it has
    shrunk quite a lot. But when you include the hidden data, the compression
    is not quite so impressive...




    --
    Steven
  • Chris Angelico at Nov 3, 2013 at 4:10 am

    On Sun, Nov 3, 2013 at 2:17 PM, Steven D'Aprano wrote:
    There is a way to apparently get around these limits: store data
    externally, perhaps inside the compression application itself. Then, if
    you just look at the compressed file (the "data.zip" equivalent, although
    I stress that zip compression is *not* like this), you might think it has
    shrunk quite a lot. But when you include the hidden data, the compression
    is not quite so impressive...

    Storing externally is, of course, a very useful thing - it's just not
    compression. For instance, a git repository (and quite likely a
    Mercurial one too) keeps track of files as blobs of data referenced by
    their cryptographic hashes. The current state of a directory can be
    described by listing file names with their object hashes, which is a
    very compact notation; but it doesn't have _all_ the information, and
    the "decompression" process involves fetching file contents from the
    library. It's a tool in the arsenal, but it's not compression in the
    same way that PK-ZIP is.
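
    A minimal sketch of that idea, as a toy in-memory content-addressed
    store (not git's actual object database):

        import hashlib

        store = {}   # hash -> blob; this table is the "external" data

        def put(blob: bytes) -> str:
            key = hashlib.sha1(blob).hexdigest()   # git-style content address
            store[key] = blob
            return key

        def get(key: str) -> bytes:
            # "decompression" only works if you also have the store
            return store[key]

        k = put(b'print("hello")\n')
        print(k, get(k) == b'print("hello")\n')   # a 40-char name for any blob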


    With real compression algorithms, there's usually an "out" clause that
    detects that the algorithm's doing a bad job. The file format for
    PK-ZIP and, I expect, every other archiving+compression file format,
    has allowances for some of the files to be stored uncompressed. That
    way, there's no need for any file to be enlarged by more than some
    signature - say, a one-byte "compression type" marker, or even a
    one-bit mark somewhere. But enlarged they must be, for the reasons
    Steven explained.
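
    A sketch of that escape mechanism, using zlib for the compressed case;
    the one-byte type marker is this example's own convention, not PK-ZIP's
    actual file format:

        import zlib

        def pack(data: bytes) -> bytes:
            z = zlib.compress(data)
            if len(z) < len(data):
                return b'\x01' + z      # type 1: deflated
            return b'\x00' + data       # type 0: stored raw (the "out" clause)

        def unpack(blob: bytes) -> bytes:
            kind, payload = blob[0], blob[1:]
            return zlib.decompress(payload) if kind == 1 else payload

        # Worst case, output is exactly one marker byte longer than the input.
        assert unpack(pack(b'x' * 1000)) == b'x' * 1000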


    ChrisA
  • Joshua Landau at Nov 3, 2013 at 3:34 pm

    On 3 November 2013 03:17, Steven D'Aprano wrote:
    On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:

    jonas.thornvall at gmail.com wrote:
    Well then I have news for you.
    Well, then, why don't you share it?

    Let me try to get you to understand WHY what you say is impossible.
    [snip reasons]

    But your second reason, better known as the pigeonhole principle,
    demonstrates that for any lossless compression method, there must be data
    sets that actually expand the data. It doesn't matter how cleverly you
    compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
    speak. Suppose your compression algorithm compresses a single byte into a
    nybble (four bits). There are 256 different input data sets (0x00,
    0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
    is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
    more pigeons in at least one hole. Ergo, if the compression algorithm is
    lossless, *some* data must be expanded rather than compressed.

    You have to be careful to make this totally rigorous, too.


    Note that I *can* make a "compression" algorithm that takes any
    length-n sequence and compresses all but one sequence by at least one
    bit, and does not ever expand the data.


    "00" -> ""
    "01" -> "0"
    "10" -> "1"
    "11" -> "00"


    This, obviously, is just 'cause the length is an extra piece of data,
    but sometimes you have to store that anyway ;). So if I have a list of
    N length-Y lists containing only 1s or 0s, I can genuinely compress
    the whole structure by N log2 Y items.
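
    One way to generalise that table to any length is to rank bit strings
    by length and then by value; the function below is a sketch of that
    ranking, not a library routine:

        def shorten(s: str) -> str:
            """Send the v-th length-n bit string to the v-th string in the
            enumeration '', '0', '1', '00', '01', ... so all but one
            length-n input gets strictly shorter. Relies on the length
            being stored elsewhere, as noted above."""
            v = int(s, 2)
            m = 0
            while v >= (1 << (m + 1)) - 1:   # find the target length m
                m += 1
            w = v - ((1 << m) - 1)           # value within the length-m block
            return format(w, 'b').zfill(m) if m else ''

        for s in ['00', '01', '10', '11']:
            print(repr(s), '->', repr(shorten(s)))   # matches the table above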
  • Joshua Landau at Nov 3, 2013 at 3:51 pm

    On 3 November 2013 15:34, Joshua Landau wrote:
    I can genuinely compress
    the whole structure by N log2 Y items.

    By which I mean 2N items.
  • Mark Janssen at Nov 4, 2013 at 3:40 am

    Note that I *can* make a "compression" algorithm that takes any
    length-n sequence and compresses all but one sequence by at least one
    bit, and does not ever expand the data.

    "00" -> ""
    "01" -> "0"
    "10" -> "1"
    "11" -> "00"

    This, obviously, is just 'cause the length is an extra piece of data,
    but sometimes you have to store that anyway ;).

    And how many bits will you use to store the length?

    So if I have a list of
    N length-Y lists containing only 1s or 0s, I can genuinely compress
    the whole structure by N log2 Y items.

    But you cheated by using a piece of information from "outside the
    system": length. A generic compression algorithm doesn't have this
    information beforehand.
    --
    MarkJ
    Tacoma, Washington
  • Tim Chase at Nov 4, 2013 at 1:08 pm

    On 2013-11-03 19:40, Mark Janssen wrote:
    But you cheated by using a piece of information from "outside the
    system": length. A generic compression algorithm doesn't have this
    information beforehand.

    By cheating with outside information, you can perfectly compress any
    one data-set down to 1 bit. Just store the data in the program, then
    store 1 bit of "is this file the data we have stored in the
    program?". Granted, in modern OSes, you waste 7 extra bits since
    they require you to write an entire byte at a time. :-)


    And with that, you could even have an empty file and test for a file
    extension. ;-)


    -tkc
  • Ethan Furman at Nov 3, 2013 at 4:26 am

    On 10/30/2013 01:32 PM, Gene Heskett wrote:
    Congratulations Jonas. My kill file for this list used to have only one
    name, but now has 2.

    You have more patience than I! Jonas just made mine seven. :)


    --
    ~Ethan~
  • Mark Janssen at Nov 3, 2013 at 6:09 am

    Congratulations Jonas. My kill file for this list used to have only one
    name, but now has 2.
    You have more patience than I! Jonas just made mine seven. :)

    Gosh, don't kill the guy. It's not an obvious thing to anyone
    but computer scientists. It's an easy mistake to make.


    --
    MarkJ
    Tacoma, Washington
  • Michael Torrie at Nov 3, 2013 at 3:14 pm

    On 11/03/2013 12:09 AM, Mark Janssen wrote:
    Congratulations Jonas. My kill file for this list used to have only one
    name, but now has 2.
    You have more patience than I! Jonas just made mine seven. :)
    Gosh, don't kill the guy. It's not an obvious thing to anyone
    but computer scientists. It's an easy mistake to make.

    I don't think he's being plonked for not understanding computational
    theory. He's being plonked for resorting to name calling on his second
    post! If he was a smart computer scientist type, then engaging in a
    discussion about the theoretical aspects of his algorithm would have
    been welcomed by him, because that's what science is all about. But he
    failed that early on.


    Thanks to everyone in this part of the thread for turning this
    ridiculous farce into a really educational discussion on the theory of
    information compression. Too bad the OP has tuned out a long time ago.
  • Gene Heskett at Nov 3, 2013 at 9:50 am

    On Sunday 03 November 2013 04:40:45 Ethan Furman did opine:

    On 10/30/2013 01:32 PM, Gene Heskett wrote:
    Congratulations Jonas. My kill file for this list used to have only
    one name, but now has 2.
    You have more patience than I! Jonas just made mine seven. :)

    --
    ~Ethan~

    Yeah, well there are a couple others in the mugwump category here yet. I
    lurk here to try and learn, and baseless arguments are just noise. To be
    filtered. And it's working!


    But it may be that this old dog has learned his last "new" trick in the
    language arena too, too many "irons in the fire", and fine tuning machinery
    to run the GCode I write to carve metal or wood is the primary interest
    ATM. At 79yo, the short term memory needs help. I'm smart enough to
    understand that, but it doesn't mean I like it. It's a right PIMA TBE.


    Cheers, Gene
    --
    "There are four boxes to be used in defense of liberty:
      soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author)


    All the evidence concerning the universe has not yet been collected,
    so there's still hope.
    A pen in the hand of this president is far more
    dangerous than 200 million guns in the hands of
              law-abiding citizens.
  • Ethan Furman at Nov 3, 2013 at 4:26 am

    On 10/30/2013 12:23 PM, jonas.thornvall at gmail.com wrote:
    What I am actually saying is that you are indeed... [insult snipped]

    *plonk*
  • Jonas Thornvall at Nov 4, 2013 at 1:53 pm

    On Saturday, 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
    jonas.thornvall at gmail.com wrote:

    Well then I have news for you.

    Well, then, why don't you share it?

    Let me try to get you to understand WHY what you say is impossible.
    [snip reasons]

    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16


    Yes, I am aware that 256 is a single byte, 8 bits, but the approach is valid anyway.
  • Jonas Thornvall at Nov 4, 2013 at 2:00 pm

    On Monday, 4 November 2013 at 14:53:28 UTC+1, jonas.thornvall at gmail.com wrote:
    On Saturday, 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
    [snip reasons]

    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16

    Yes, I am aware that 256 is a single byte, 8 bits, but the approach is valid anyway.

    You see, if the program behaves intelligently, some small operations can perform wonders on a number.
  • Dave Angel at Nov 4, 2013 at 2:27 pm

    On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com wrote:
    On Saturday, 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
    Here's another way to look at it. If f(x) is smaller than x for every x,
    that means there MUST be multiple values of x that produce the same f(x).
    Do you see? If x is three bits and f(x) is two bits, that means there are
    8 possible values for x but only 4 values for f(x). So, given an f(x), you
    cannot tell which value of x it came from. You have lost information.

    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16
    Yes, I am aware that 256 is a single byte, 8 bits, but the approach is valid anyway.


    And e ^ (I * pi) == -1
    So what?


    Better file that patent, before the patent office realizes the
    analogy to the perpetual motion machine.


    --
    DaveA
  • Rusi at Nov 4, 2013 at 2:46 pm

    On Monday, November 4, 2013 7:57:19 PM UTC+5:30, Dave Angel wrote:
    On Mon, 4 Nov 2013 05:53:28 -0800 (PST), Jonas wrote:
    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16
    Yes, I am aware that 256 is a single byte, 8 bits, but the approach
    is valid anyway.
    And e ^ (I * pi) == -1
    So what?
    Better file that patent, before the patent office realizes the
    analogy to the perpetual motion machine.

    Now I am too much of a dim-wit to comprehend all this compressified
    profundification but I think I have a (dim) clue how such a patent
    will be obtained:


    Just make a submission in such an unreadable double (then triple,
    quadruple...) spaced format that the patent office will grant it in
    exasperation.
  • Jonas Thornvall at Nov 4, 2013 at 10:34 pm

    On Monday, 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
    On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com
    wrote:
    [snip quoted text]
    And e ^ (I * pi) == -1
    So what?

    e is an approximation... and your idea is not general for any n.

    Better file that patent, before the patent office realizes the
    analogy to the perpetual motion machine.

    --
    DaveA
  • Dave Angel at Nov 5, 2013 at 1:29 am

    On Mon, 4 Nov 2013 14:34:23 -0800 (PST), jonas.thornvall at gmail.com wrote:
    e is an approximation... and your idea is not general for any n.

    e is certainly not an approximation, and I never mentioned n.


    --
    DaveA
  • Steven D'Aprano at Nov 5, 2013 at 4:33 am

    On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:


    On Monday, 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
    On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com
    wrote:
    [...]
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16

    "this" being Jonas' alleged lossless compression method capable of
    compressing random data.



    Yes, I am aware that 256 is a single byte, 8 bits, but the approach
    is valid anyway.

    I must say, I cannot see the connection between the fact that 256**2 ==
    2**16 and compression of random data. I might as well state that I have
    squared the circle, and offer as proof that 3+4 == 5+2.





    And e ^ (I * pi) == -1

    So what. ?
    e is an approximation... and your idea is not general for any n.

    e is not an approximation, it is a symbolic name for an exact
    transcendental number which cannot be specified exactly by any finite
    number of decimal places.


    Your comment about "n" is irrelevant, since Euler's Identity
    e**(i*pi) == -1 has nothing to do with "n". But in case you actually
    meant "i", again you are mistaken. i is a symbolic name for an exact
    number.






    --
    Steven
  • Steven D'Aprano at Nov 5, 2013 at 4:36 am

    On Tue, 05 Nov 2013 04:33:46 +0000, Steven D'Aprano wrote:

    On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:

    On Monday, 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
    On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com
    wrote:
    [...]
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16
    "this" being Jonas' alleged lossless compression method capable of
    compressing random data.

    /facepalm


    I meant "it", not "this", as in "why it (the compression method) is
    working". Sorry for any confusion.






    --
    Steven
  • Tim Roberts at Nov 7, 2013 at 8:05 am

    jonas.thornvall at gmail.com wrote:
    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16

    All of those values are indeed the same, and yet that is completely
    unrelated to compression. Did you honestly believe this was actually
    explaining anything?

    Yes, I am aware that 256 is a single byte, 8 bits, but the approach is valid anyway.

    This whole post is a most amazing collection of non sequiturs. If you
    would like to describe your compression scheme, there really are people
    here who would be interested in reading it (although that number gets
    smaller and smaller as this thread goes on).
    --
    Tim Roberts, timr at probo.com
    Providenza & Boekelheide, Inc.
  • Mark Janssen at Nov 7, 2013 at 6:59 pm

    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16
    All of those values are indeed the same, and yet that is completely
    unrelated to compression. Did you honestly believe this was actually
    explaining anything?

    I think the idea is that you could take any arbitrary input sequence,
    view it as a large number, and then find what exponential equation can
    produce that result. The equation becomes the "compression".


    MarkJ
    Tacoma, Washington
  • Tim Roberts at Nov 7, 2013 at 7:22 pm

    Mark Janssen wrote:
    Well, let me try to explain why it is working, and I have implemented one.
    I only need to refresh my memory; it was almost 15 years ago.
    This is not the solution but this is why it is working.
    65536 = 256^2 = ***4^8*** = 2^16
    I think the idea is that you could take any arbitrary input sequence,
    view it as a large number, and then find what exponential equation can
    produce that result. The equation becomes the "compression".

    Interesting -- I hadn't noticed that. Of course, now you have the
    problem of expressing 4^8 in a compact way.


    --
    Tim Roberts, timr at probo.com
    Providenza & Boekelheide, Inc.
  • Chris Angelico at Nov 7, 2013 at 10:26 pm

    On Fri, Nov 8, 2013 at 5:59 AM, Mark Janssen wrote:
    I think the idea is that you could take any arbitrary input sequence,
    view it as a large number, and then find what exponential equation can
    produce that result. The equation becomes the "compression".

    Interesting idea, but I don't see how this has anything to do with
    compressing arbitrary data. We already have a compact notation for
    representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
    how this will be any better, other than eliding NULs.
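
    As a quick check in Python, int.from_bytes is exactly that notation:

        data = b'\x01\x02\x03\x04'
        n = int.from_bytes(data, 'big')
        print(n == 2**24*1 + 2**16*2 + 2**8*3 + 4)   # True
        print(n.to_bytes(4, 'big') == data)          # and back again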


    ChrisA
  • Jonas Thornvall at Nov 8, 2013 at 2:05 am

    On Thursday, 7 November 2013 at 23:26:45 UTC+1, Chris Angelico wrote:
    On Fri, Nov 8, 2013 at 5:59 AM, Mark Janssen wrote:
    I think the idea is that you could take any arbitrary input sequence,
    view it as a large number, and then find what exponential equation can
    produce that result. The equation becomes the "compression".
    Interesting idea, but I don't see how this has anything to do with
    compressing arbitrary data. We already have a compact notation for
    representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
    how this will be any better, other than eliding NULs.

    ChrisA

    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
  • Chris Angelico at Nov 8, 2013 at 2:17 am

    On Fri, Nov 8, 2013 at 1:05 PM, wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?

    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.


    Also, please lose Google Groups, or check out what other people have
    said about making it less obnoxious.


    ChrisA
  • Mark Janssen at Nov 8, 2013 at 2:24 am

    On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico wrote:
    On Fri, Nov 8, 2013 at 1:05 PM, wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.

    Technically, the universe could expand temporarily or reconfigure to
    allow it; the question is who or what will have to shift out to allow
    it?


    --
    MarkJ
    Tacoma, Washington
  • Chris Angelico at Nov 8, 2013 at 2:27 am

    On Fri, Nov 8, 2013 at 1:24 PM, Mark Janssen wrote:
    On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico wrote:
    On Fri, Nov 8, 2013 at 1:05 PM, wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.
    Technically, the universe could expand temporarily or reconfigure to
    allow it; the question is who or what will have to shift out to allow
    it?

    ... okay, I bow to your superior power. If you can make the very
    *UNIVERSE* bow to your compression algorithm, I have to admit defeat.
    :D


    ChrisA
  • Gregory Ewing at Nov 8, 2013 at 7:16 am

    Mark Janssen wrote:
    Technically, the universe could expand temporarily or reconfigure to
    allow it;

    Oh, no! If Jonas ever succeeds in getting his algorithm to
    work, the Void will expand and swallow us all!


    http://en.wikipedia.org/wiki/The_Dreaming_Void


    --
    Greg
  • Jonas Thornvall at Nov 8, 2013 at 2:25 am

    On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
    On Fri, Nov 8, 2013 at 1:05 PM, wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.
    Also, please lose Google Groups, or check out what other people have
    said about making it less obnoxious.

    ChrisA

    Please, you are the obnoxious one, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number does have a shortest arithmetical encoding. And that is not the hard part to figure out; the hard part is to find a generic arithmetic encoding.

    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.
  • Chris Angelico at Nov 8, 2013 at 2:36 am

    On Fri, Nov 8, 2013 at 1:25 PM, wrote:
    Please, you are the obnoxious one, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number does have a shortest arithmetical encoding. And that is not the hard part to figure out; the hard part is to find a generic arithmetic encoding.

    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.

    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.


    Maybe what you can do is render a large number of small subsets of
    numbers efficiently. (Let's say all instances of integer^integer, and
    all cases of Fibonacci numbers.) You'll still have some holes in
    between which you can't render as tidily, and these are the ones where
    the best representation is the raw one, plus a small tag saying that
    it's raw. That's how most compression algorithms work.


    Also, please don't swear. When I described Google Groups posts as
    "obnoxious", I was using the word in a strictly correct way, and that
    is not an invitation for profanity. Of course, if you want to be seen
    as *yourself* obnoxious, that's one of the easier ways to accomplish
    that.


    ChrisA
  • Mark Janssen at Nov 8, 2013 at 2:43 am

    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.
    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.

    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique. Prime
    numbers, of course, would not be compressed.


    --
    MarkJ
    Tacoma, Washington
  • Chris Angelico at Nov 8, 2013 at 3:05 am

    On Fri, Nov 8, 2013 at 1:43 PM, Mark Janssen wrote:
    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.
    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.
    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique. Prime
    numbers, of course, would not be compressed.

    Also an interesting theory.


    1) How do you represent the prime factors?
    2) How do you factor large numbers efficiently? Trust me, if you've
    solved this one, there are a *LOT* of people who want to know. People
    with money, like Visa.
    3) Still doesn't deal with all possible numbers.


    ChrisA
  • Roy Smith at Nov 8, 2013 at 3:08 am
    In article <mailman.2179.1383879958.18130.python-list@python.org>,
      Chris Angelico wrote:

    2) How do you factor large numbers efficiently? Trust me, if you've
    solved this one, there are a *LOT* of people who want to know. People
    with money, like Visa.

    Not to mention the NSA.
  • Dave Angel at Nov 8, 2013 at 4:12 am

    On Thu, 7 Nov 2013 18:43:17 -0800, Mark Janssen wrote:
    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique. Prime
    numbers, of course, would not be compressed.

    No, very few numbers can be compressed using this technique. And
    primes of course expand greatly.


    --
    DaveA
  • Steven D'Aprano at Nov 8, 2013 at 4:47 am

    On Thu, 07 Nov 2013 18:43:17 -0800, Mark Janssen wrote:


    I am not sure if it is just stupidness or laziness that prevents you
    from seeing that 4^8=65536.
    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.
    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique.

    Sadly, they would not.


    Let's suppose you wish to compress the number 143. So you find the prime
    factorization, 11*13. Have you compressed anything? No you have not
    -- the original number, 143, fits in a single byte, while it requires
    *two* bytes to deal with 11 and 13 (one byte for 11, and a second byte
    for 13).


    We can try a different strategy: while 143 requires a full eight bits
    (one byte), both 11 and 13 require only four bits (one nybble) each. So
    by hard coding our expectation that the result will be two nybbles, we
    can avoid expanding the data and end up with exactly zero compression:


    143 = binary 10001111 or eight bits in total
    11, 13 = binary 1011 1101 or eight bits in total


    But while that trick works for 143, it doesn't work for 154 (one byte,
    binary 10011010) which has prime factorization 2*7*11 (three nybbles 0010
    0111 1011). Even if you can somehow drop the leading zeroes, it requires
    nine bits versus eight.


    Suppose instead of using bytes, you use decimal digits. 11 and 13 use
    four digits, while 143 uses only three. Likewise, 154 has three decimal
    digits while 2*7*11 has four digits. In both cases, there is no
    compression.
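
    In fact, counting raw bits, the factors can never need fewer bits than
    the number itself, because bitlen(a*b) <= bitlen(a) + bitlen(b). A
    brute-force check, using naive trial division as a sketch:

        def factors(n: int) -> list:
            out, d = [], 2
            while d * d <= n:
                while n % d == 0:
                    out.append(d)
                    n //= d
                d += 1
            if n > 1:
                out.append(n)
            return out

        assert all(
            sum(f.bit_length() for f in factors(n)) >= n.bit_length()
            for n in range(2, 1 << 16)
        )   # no n below 2**16 gets smaller -- and that ignores delimiters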


    How about recording which prime number it is, instead of the actual value
    of the prime? So 154=2*7*11 expands to the 1st, 4th and 5th prime, so
    maybe we can record 1 4 5 instead of 2 7 11 and save some bits. (We'll
    save more bits the larger the prime.) Of course, to reverse the
    compression you need to keep a table of the prime numbers somewhere, and
    that's going to be a lot bigger than the space you save...


    Now, I accept that, possibly, with sufficient work we might be able to
    find some cunning mechanism by which we can compress *any* integer value.
    But it won't be the same mechanism for every value! For example, we might
    compress (2**1000+1) using a run-length encoding of the binary bits,
    while compressing 14629839399572435720381495231 as its prime
    factorization (the 319th prime number, the 479th prime, the 499th prime
    six times), and 10**1000 as an encoding of the decimal digits. But we
    still need some sort of tag to distinguish between these different
    compression schemes, and once you take into account the extra information
    in the tag, there will be cases where some numbers aren't compressed at
    all.


    The ability to losslessly compress *everything* is sheer pseudo-
    mathematics, up there with finding an exact rational value for pi, or
    squaring the circle using only a compass and straight-edge. But the
    ability to losslessly compress *useful data* is real, and such algorithms
    operate by finding non-random patterns and redundant information in the
    data.






    --
    Steven
  • Gregory Ewing at Nov 8, 2013 at 7:09 am

    Steven D'Aprano wrote:
    Of course, to reverse the
    compression you need to keep a table of the prime numbers somewhere,

    That's not strictly necessary -- you could calculate them
    as needed. It wouldn't be *fast*, but it would work...


    You've got me thinking now about how viable a compression
    scheme this would be, efficiency issues aside. I suppose
    it would depend on things like the average density of primes
    and the average number of prime factors a number has.
    Any number theorists here?


    --
    Greg
  • Chris Angelico at Nov 8, 2013 at 7:21 am

    On Fri, Nov 8, 2013 at 6:09 PM, Gregory Ewing wrote:
    You've got me thinking now about how viable a compression
    scheme this would be, efficiency issues aside. I suppose
    it would depend on things like the average density of primes
    and the average number of prime factors a number has.
    Any number theorists here?

    Start by coming up with an efficient storage representation for the
    primes. Don't forget that you can't use a bitfield, because eg 98 =
    7*7*2, so you somehow need to represent that the 7 comes up twice. And
    you can't limit the size of your primes (eg by representing 98 as
    "\x07\x07\x02").


    And that's without dealing with the major issue of _finding_ prime
    factors for a large number.


    ChrisA
  • Jonas Thornvall at Nov 8, 2013 at 3:48 pm

    On Friday, 8 November 2013 at 03:43:17 UTC+1, zipher wrote:
    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.
    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.

    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique. Prime
    numbers, of course, would not be compressed.

    --
    MarkJ
    Tacoma, Washington

    3^2-2^2=5
  • Rusi at Nov 8, 2013 at 3:57 pm

    On Friday, November 8, 2013 9:18:05 PM UTC+5:30, jonas.t... at gmail.com wrote:
    Den fredagen den 8:e november 2013 kl. 03:43:17 UTC+1 skrev zipher:
    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.
    I can see that 4^8 = 65536. Now how are you going to render 65537? You
    claimed that you could render *any* number efficiently. What you've
    proven is that a small subset of numbers can be rendered efficiently.
    I think the idea would be to find the prime factorization for a given
    number, which has been proven to be available (and unique) for any and
    every number. Most numbers can compress given this technique. Prime
    numbers, of course, would not be compressed.


    3^2-2^2=5

    Oh my! What a profound insight!!
  • Ian Kelly at Nov 8, 2013 at 6:48 pm

    On Fri, Nov 8, 2013 at 8:48 AM, wrote:
    3^2-2^2=5

    How do you intend to encode 3**2 - 2**2 in such a way that it is more
    compact than simply encoding 5? If you actually have an algorithm,
    you should share it instead of dropping these cryptic one-line
    non-explanations and leaving us guessing about the details. But I'm
    starting to think that you don't actually have an algorithm at all,
    whereas my initial impression was that you did and were simply
    mistaken about its effectiveness.
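
    (Even as plain text, the expression is longer than the value it encodes:

        expr, value = "3**2 - 2**2", "5"
        print(len(expr), len(value))   # 11 characters versus 1

    and any binary encoding of two bases and two exponents runs into the
    same arithmetic.)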
  • Rusi at Nov 8, 2013 at 2:36 am

    On Friday, November 8, 2013 7:55:18 AM UTC+5:30, jonas wrote:
    Den fredagen den 8:e november 2013 kl. 03:17:36 UTC+1 skrev Chris Angelico:
    On Fri, Nov 8, 2013 at 1:05 PM, jonas.thornvall wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.
    Also, please lose Google Groups, or check out what other people have
    said about making it less obnoxious.
    ChrisA
    Please, you are the obnoxious one, so fuck off or go learn about
    reformulation of problems. Every number has an infinite number of
    arithmetical solutions. So every number does have a shortest
    arithmetical encoding. And that is not the hard part to figure out;
    the hard part is to find a generic arithmetic encoding.

    Since we are discussing profound matters such as cosmic laws,
    here's my 'all-universal' law:


    When you engage with a troll you get trolled.
  • R. Michael Weylandt at Nov 8, 2013 at 2:43 am

    On Nov 7, 2013, at 21:25, jonas.thornvall at gmail.com wrote:


    On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
    On Fri, Nov 8, 2013 at 1:05 PM, wrote:
    I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
    I don't care how fast. I care about the laws of physics :) You can't
    stuff more data into less space without losing some of it.
    Also, please lose Google Groups, or check out what other people have
    said about making it less obnoxious.

    ChrisA

    Please, you are the obnoxious one, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number does have a shortest arithmetical encoding. And that is not the hard part to figure out; the hard part is to find a generic arithmetic encoding.

    I am not sure if it is just stupidness or laziness that prevents you from seeing that 4^8=65536.

    Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually lose efficiency in your scheme. Don't think in decimal, but in terms of information needing transfer.


    You might reply that you don't need a whole byte for 4 or 8 -- that's true. You could, e.g., just encode the fourth and eight bits of a single byte and send that: certainly gives some compression but at the cost of generality-- what would you do with 65^65? It's sort of like the mean-variance tradeoff in statistics; it's much easier to encode certain data sets (e.g. powers of two as you noted) but only if you concede your algorithm will only work for those values.


    More generally, check out the work of Claude Shannon; a very accessible and worthwhile author.
  • Chris Angelico at Nov 8, 2013 at 3:24 am

    On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:
    Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually lose efficiency in your scheme. Don't think in decimal, but in terms of information needing transfer.

    Well, 65536 won't fit in a single byte, nor even in two (only just). A
    typical binary representation of 65536 would take 3 bytes, or 4 for a
    common integer type: 0x00 0x01 0x00 0x00 (big-endian). So it would be
    possible to represent that more compactly, if you deem one byte each
    for the base and the exponent: 0x04 0x08. However, that system allows
    you to represent just 62,041 possible numbers:

    decomp = {}
    for base in range(256):
        for exp in range(256):
            decomp[base**exp] = base, exp

    len(decomp)
    62041


    The trouble is, these numbers range all the way up from 0**1 == 0 to
    255**255 == uhh...
    255**255
    46531388344983681457769984555620005635274427815488751368772861643065273360461098097690597702647394229975161523887729348709679192202790820272357752329882392140552515610822058736740145045150003072264722464746837070302159356661765043244993104360887623976285955058200326531849137668562738184397385361179287309286327712528995820702180594566008294593820621769951491324907014215176509758404760451335847252744697820515292329680698271481385779516652518207263143889034764775414387732372812840456880885163361037485452406176311868267428358492408075197688911053603714883403374930891951109790394269793978310190141201019287109375


    which would decompress to (obviously) 255 bytes. So you can represent
    just over sixty thousand possible 255-byte strings in two bytes with
    this system.


    To generalize it, you'd need to devote a bit somewhere to saying
    "There are more to add in". Let's say you do this on the exponent byte
    (high bit for convenience), so you have 0x04 0x08 means 65536, and
    0x04 0x88 0x01 0x01 means 65536+1 = 65537. Now we have something that
    generalizes; but the efficiency is gone - and there are too many ways
    to encode the same value. (Bear in mind, for instance, that 0x01 0xNN
    for any NN will still just represent 1, and 0x00 0xNN will represent
    0. That's wasting a lot of bits.)
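
    Here's a sketch of a decoder for that scheme, under exactly the
    conventions just described (a sum of base**exp terms, with the high
    bit of each exponent byte meaning "more terms follow"):

        def decode(blob: bytes) -> int:
            total, i = 0, 0
            while i < len(blob):
                base, exp = blob[i], blob[i + 1]
                total += base ** (exp & 0x7F)   # low 7 bits are the exponent
                if not exp & 0x80:              # high bit clear: last term
                    break
                i += 2
            return total

        print(decode(bytes([0x04, 0x08])))               # 65536
        print(decode(bytes([0x04, 0x88, 0x01, 0x01])))   # 65536 + 1 = 65537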


    The application I can see for this sort of thing is not data
    compression, but puzzles. There are lots of puzzles that humans find
    enjoyable that represent an entire grid with less data than it
    contains - for one popular example, look at Sudoku. You have a 9x9
    grid where each cell could contain any of nine digits, which means a
    theoretical information density of 9**81; but the entire grid can be
    described by a handful of digits and heaps of blank space. This could
    be a similarly-fun mathematical puzzle: 3359232 can be represented as
    B1**E1 + B2**E2, where all numbers are single-digit. Find B1, B2, E1,
    and E2. In this case, you're guaranteed that the end result is shorter
    (four digits), but it's hardly useful for general work.


    ChrisA
  • R. Michael Weylandt at Nov 8, 2013 at 4:05 am

    On Nov 7, 2013, at 22:24, Chris Angelico wrote:


    On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
    <michael.weylandt@gmail.com> wrote:
    Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte.
    Well, 65536 won't fit in a single byte, nor even in two (only just). A
    typical binary representation of 65536 would take 3 bytes, or 4 for a
    common integer type: 0x00 0x01 0x00 0x00 (big-endian).

    Quite right -- meant C's int type (machine word), not char. My mistake!


    Michael
  • Chris Angelico at Nov 8, 2013 at 4:06 am

    On Fri, Nov 8, 2013 at 3:05 PM, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:

    On Nov 7, 2013, at 22:24, Chris Angelico wrote:

    On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
    <michael.weylandt@gmail.com> wrote:
    Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte.
    Well, 65536 won't fit in a single byte, nor even in two (only just). A
    typical binary representation of 65536 would take 3 bytes, or 4 for a
    common integer type: 0x00 0x01 0x00 0x00 (big-endian).
    Quite right -- meant C's int type, (machine word) not char. My mistake!

    Sure. Assuming at least 32-bit words, yes, that's correct. Of course,
    this is still just proving that it's {possible, not possible} to
    compress specific values, while the OP claimed to compress anything.


    ChrisA
  • Steven D'Aprano at Nov 8, 2013 at 5:32 am

    On Thu, 07 Nov 2013 18:25:18 -0800, jonas.thornvall wrote:


    Please, you are the obnoxious one, so fuck off

    Pot, meet kettle.

    I am not sure if it is just stupidness or laziness that prevents you from
    seeing that 4^8=65536.

    And what is it that prevents you from seeing that 4**8 == 65536 is
    irrelevant to the compression of random data?


    All your talk about "reformulation of problems", "infinite number of
    arithmetical solutions" and "generic arithmetic encoding" is just a smoke
    screen to distract from the fact that you don't, in fact, have a
    compression algorithm that can losslessly compress arbitrary random data
    of any length.


    I am as sure as this as I am sure you don't have a perpetual motion
    machine, a method for squaring the circle using only a compass and
    straight-edge, or a fool-proof system for winning the lottery every week
    without any chance of failure.




    --
    Steven
