jonas.thornvall at gmail.com wrote:
Well then i have news for you.

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.
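Tim's recursion argument can be sketched in Python by tracking lengths only: no real function can shrink every input, so the code below models a purely hypothetical always-shrinking f() and just counts how fast it would bottom out.

```python
# Model only the *lengths*, since an f() that always removes at least
# one bit cannot actually exist -- this is the reductio, not a compressor.
def passes_to_one_bit(n_bits: int) -> int:
    """How many applications of a hypothetical always-shrinking f()
    it would take to drive an n_bits input down to a single bit."""
    count = 0
    while n_bits > 1:
        n_bits -= 1  # f() is claimed to always save at least one bit
        count += 1
    return count

# A 1 MB file would collapse after ~8.4 million passes, and that final
# single bit would somehow have to distinguish all 2**(8*2**20)
# possible original files -- which is the absurdity Tim points out.
print(passes_to_one_bit(8 * 1024 * 1024))
```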

Here's another way to look at it. If f(x) is smaller than x for every x,
that means there MUST be multiple values of x that produce the same f(x).
Do you see? If x is three bits and f(x) is two bits, that means there are
8 possible values for x but only 4 values for f(x). So, given an f(x), you
cannot tell which value of x it came from. You have lost information.
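The 3-bits-to-2-bits claim is small enough to check exhaustively: there are only 4**8 = 65536 possible functions from 3-bit values to 2-bit values, and brute force confirms that every single one of them maps at least two inputs to the same output.

```python
from itertools import product

inputs = list(range(8))   # all 3-bit values: 0..7
outputs = list(range(4))  # all 2-bit values: 0..3

# Try every candidate "compression" f: 3 bits -> 2 bits, represented
# as a tuple of the outputs for inputs 0..7, and look for one that
# never collides (i.e. is injective, hence reversible).
all_injective = False
for f in product(outputs, repeat=len(inputs)):
    if len(set(f)) == len(inputs):  # would need 8 distinct outputs
        all_injective = True
        break

print(all_injective)  # False: every candidate loses information
```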
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.

•  at Nov 2, 2013 at 9:37 pm ⇧

Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.

Hey, that's a nice proof!

Cheers,

Mark Janssen
•  at Nov 3, 2013 at 3:17 am ⇧

On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:

jonas.thornvall at gmail.com wrote:
Well then i have news for you.
Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

Expanding on Tim's post... the first scenario Tim gives, applying the
compression repeatedly to its own output until you eventually get a
single byte, can be overcome if there are data sets that are unchanged by
compression. That is, if f() is the compression function:

f(f(f(data))) == f(f(data)) == f(data) == data

for some values of data. So if you start with some other data, and
compress it, then compress it again, and again, eventually you end up
with one of these attractors, after which repeated compression doesn't
give you any more benefit.

[Aside: the attractors aren't necessarily a single point. The attractor
could be a cycle of two or more points, f(x) == y and f(y) == x. Or you
could even have a "strange attractor", which brings us to chaos theory.]
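Steven's attractor idea can be illustrated with zlib. Real-world compressors have no exact fixed points (zlib always adds a header and checksum), but the same plateau behaviour shows up: the first pass shrinks repetitive data dramatically, and every later pass sees high-entropy input and just accretes container overhead.

```python
import zlib

data = b"hello world, hello world! " * 300
sizes = [len(data)]
for _ in range(5):
    data = zlib.compress(data)  # feed each output back in as input
    sizes.append(len(data))

# Pass 1 collapses the repetition; after that the bytes look random,
# so repeated compression stops helping and the sizes creep upward.
print(sizes)
```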

But your second reason, better known as the pigeonhole principle,
demonstrates that for any lossless compression method, there must be data
sets that actually expand the data. It doesn't matter how cleverly you
compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
speak. Suppose your compression algorithm compresses a single byte into a
nybble (four bits). There are 256 different input data sets (0x00,
0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
more pigeons in at least one hole. Ergo, if the compression algorithm is
lossless, *some* data must be expanded rather than compressed.

Alternatively, the compression may be lossy. Or both!

Obviously data isn't compressed a byte at a time, that would be silly.
But the principle still applies.

There is a way to apparently get around these limits: store data
externally, perhaps inside the compression application itself. Then, if
you just look at the compressed file (the "data.zip" equivalent, although
I stress that zip compression is *not* like this), you might think it has
shrunk quite a lot. But when you include the hidden data, the compression
is not quite so impressive...

--
Steven
•  at Nov 3, 2013 at 4:10 am ⇧

On Sun, Nov 3, 2013 at 2:17 PM, Steven D'Aprano wrote:
There is a way to apparently get around these limits: store data
externally, perhaps inside the compression application itself. Then, if
you just look at the compressed file (the "data.zip" equivalent, although
I stress that zip compression is *not* like this), you might think it has
shrunk quite a lot. But when you include the hidden data, the compression
is not quite so impressive...

Storing externally is, of course, a very useful thing - it's just not
compression. For instance, a git repository (and quite likely a
Mercurial one too) keeps track of files as blobs of data referenced by
their cryptographic hashes. The current state of a directory can be
described by listing file names with their object hashes, which is a
very compact notation; but it doesn't have _all_ the information, and
the "decompression" process involves fetching file contents from the
library. It's a tool in the arsenal, but it's not compression in the
same way that PK-ZIP is.
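Chris's blob-by-hash point fits in a few lines of Python. This is a toy sketch, not git's actual object format (git also deflates and prefixes its blobs); the helper names are made up. The "manifest" is tiny regardless of file sizes, but only because the real bytes live externally in the object store.

```python
import hashlib

object_store = {}  # stands in for something like .git/objects

def put_blob(content: bytes) -> str:
    """Store content under its hash and return the hash (its 'name')."""
    key = hashlib.sha1(content).hexdigest()
    object_store[key] = content
    return key

def snapshot(files: dict) -> dict:
    """A 'manifest': file name -> content hash. Compact, but it is a
    reference, not compression -- restoring needs the object store."""
    return {name: put_blob(data) for name, data in files.items()}

manifest = snapshot({"a.txt": b"x" * 10_000, "b.txt": b"y" * 10_000})
# Two 40-character hashes describe 20 kB of content, yet nothing was
# compressed away: all the information still sits in object_store.
print(manifest)
```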

With real compression algorithms, there's usually an "out" clause that
detects that the algorithm's doing a bad job. The file format for
PK-ZIP and, I expect, every other archiving+compression file format,
has allowances for some of the files to be stored uncompressed. That
way, there's no need for any file to be enlarged by more than some
signature - say, a one-byte "compression type" marker, or even a
one-bit mark somewhere. But enlarged they must be, for the reasons
Steven explained.
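The "out clause" is easy to demonstrate with zlib: random bytes expand under deflate, so an archiver can fall back to storing them raw behind a one-byte tag. The tag scheme below is invented for illustration and is not PK-ZIP's actual format, but it shows the worst case being exactly one byte of growth.

```python
import os
import zlib

RAW, DEFLATE = b"\x00", b"\x01"  # made-up one-byte method markers

def pack(data: bytes) -> bytes:
    comp = zlib.compress(data, 9)
    # Keep whichever form is smaller; worst case costs one tag byte.
    if len(comp) < len(data):
        return DEFLATE + comp
    return RAW + data

def unpack(blob: bytes) -> bytes:
    tag, body = blob[:1], blob[1:]
    return zlib.decompress(body) if tag == DEFLATE else body

text = b"spam and eggs " * 1000   # highly compressible
noise = os.urandom(1000)          # incompressible
print(len(pack(text)), len(pack(noise)))  # small, and 1001 (raw + tag)
```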

ChrisA
•  at Nov 3, 2013 at 3:34 pm ⇧

On 3 November 2013 03:17, Steven D'Aprano wrote:
On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:

jonas.thornvall at gmail.com wrote:
Well then i have news for you.
Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

But your second reason, better known as the pigeonhole principle,
demonstrates that for any lossless compression method, there must be data
sets that actually expand the data. It doesn't matter how cleverly you
compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
speak. Suppose your compression algorithm compresses a single byte into a
nybble (four bits). There are 256 different input data sets (0x00,
0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
more pigeons in at least one hole. Ergo, if the compression algorithm is
lossless, *some* data must be expanded rather than compressed.

You have to be careful to make this totally rigorous, too.

Note that I *can* make a "compression" algorithm that takes any
length-n sequence and compresses all but one sequence by at least one
bit, and does not ever expand the data.

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data,
but sometimes you have to store that anyway ;). So if I have a list of
N length-Y lists containing only 1s or 0s, I can genuinely compress
the whole structure by N log2 Y items.
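Joshua's table is the start of the shortlex bijection: the 2**n bit strings of a known length n map one-to-one onto the first 2**n binary strings ordered by length ("", "0", "1", "00", "01", ...). A sketch, decodable only because the length n is carried as side information:

```python
def encode(bits: str) -> str:
    """Map a fixed-length bit string to its shortlex counterpart:
    '00'->'', '01'->'0', '10'->'1', '11'->'00', ..."""
    i = int(bits, 2)        # index of this string among the 2**n inputs
    return bin(i + 1)[3:]   # write i+1 in binary, drop the leading '1'

def decode(code: str, n: int) -> str:
    """Invert encode(); the original length n must be stored elsewhere."""
    i = int("1" + code, 2) - 1
    return format(i, "0{}b".format(n))

print([encode(s) for s in ("00", "01", "10", "11")])  # ['', '0', '1', '00']
```

This never expands anything and shortens all but the all-zeros string, exactly because the length is paid for "outside the system" -- which is the objection raised downthread.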
•  at Nov 3, 2013 at 3:51 pm ⇧

On 3 November 2013 15:34, Joshua Landau wrote:
I can genuinely compress
the whole structure by N log2 Y items.

By which I mean 2N items.
•  at Nov 4, 2013 at 3:40 am ⇧

Note that I *can* make a "compression" algorithm that takes any
length-n sequence and compresses all but one sequence by at least one
bit, and does not ever expand the data.

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data,
but sometimes you have to store that anyway ;).

And how many bits will you use to store the length?

So if I have a list of
N length-Y lists containing only 1s or 0s, I can genuinely compress
the whole structure by N log2 Y items.

But you cheated by using a piece of information from "outside the
system": length. A generic compression algorithm doesn't have this
information beforehand.
--
MarkJ
Tacoma, Washington
•  at Nov 4, 2013 at 1:08 pm ⇧

On 2013-11-03 19:40, Mark Janssen wrote:
But you cheated by using a piece of information from "outside the
system": length. A generic compression algorithm doesn't have this
information beforehand.

By cheating with outside information, you can perfectly compress any
one data-set down to 1 bit. Just store the data in the program, then
store 1 bit of "is this file the data we have stored in the
program?". Granted, in modern OSes, you waste 7 extra bits since
they require you to write an entire byte at a time. :-)

And with that, you could even have an empty file and test for a file
extension. ;-)

-tkc
•  at Nov 3, 2013 at 4:26 am ⇧

On 10/30/2013 01:32 PM, Gene Heskett wrote:
Congratulations Jonas. My kill file for this list used to have only one
name, but now has 2.

You have more patience than I! Jonas just made mine seven. :)

--
~Ethan~
•  at Nov 3, 2013 at 6:09 am ⇧

Congratulations Jonas. My kill file for this list used to have only one
name, but now has 2.
You have more patience than I! Jonas just made mine seven. :)

Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
but computer scientists. It's an easy mistake to make.

--
MarkJ
Tacoma, Washington
•  at Nov 3, 2013 at 3:14 pm ⇧

On 11/03/2013 12:09 AM, Mark Janssen wrote:
Congratulations Jonas. My kill file for this list used to have only one
name, but now has 2.
You have more patience than I! Jonas just made mine seven. :)
Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
but computer scientists. It's an easy mistake to make.

I don't think he's being plonked for not understanding computational
theory. He's being plonked for resorting to name calling on his second
post! If he was a smart computer scientist type, then engaging in a
discussion about the theoretical aspects of his algorithm would have
been welcomed by him, because that's what science is all about. But he
failed that early on.

Thanks to everyone in this part of the thread for turning this
ridiculous farce into a really educational discussion on the theory of
information compression. Too bad the OP has tuned out a long time ago.
•  at Nov 3, 2013 at 9:50 am ⇧

On Sunday 03 November 2013 04:40:45 Ethan Furman did opine:

On 10/30/2013 01:32 PM, Gene Heskett wrote:
Congratulations Jonas. My kill file for this list used to have only
one name, but now has 2.
You have more patience than I! Jonas just made mine seven. :)

--
~Ethan~

Yeah, well there are a couple others in the mugwump category here yet. I
lurk here to try and learn, and baseless arguments are just noise. To be
filtered. And its working!

But it may be that this old dog has learned his last "new" trick in the
language arena too, too many "irons in the fire", and fine tuning machinery
to run the GCode I write to carve metal or wood is the primary interest
ATM. At 79yo, the short term memory needs help. I'm smart enough to
understand that, but it doesn't mean I like it. Its a right PIMA TBE.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

All the evidence concerning the universe has not yet been collected,
so there's still hope.
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
law-abiding citizens.
•  at Nov 3, 2013 at 4:26 am ⇧

On 10/30/2013 12:23 PM, jonas.thornvall at gmail.com wrote:
What i actually saying is that you are indeed... [insult snipped]

*plonk*
•  at Nov 4, 2013 at 1:53 pm ⇧

On Saturday 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
jonas.thornvall at gmail.com wrote:
Well then i have news for you.

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.
•  at Nov 4, 2013 at 2:00 pm ⇧

On Monday 4 November 2013 at 14:53:28 UTC+1, jonas.t... at gmail.com wrote:
On Saturday 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
[snip reasons]

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

You see if the program behave intelligent some small operations can perform wonder on a number.
•  at Nov 4, 2013 at 2:27 pm ⇧

On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com wrote:
On Saturday 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
[snip pigeonhole argument]

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach
is valid anyway.

And e ^ (I * pi) == -1
So what. ?

Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.

--
DaveA
•  at Nov 4, 2013 at 2:46 pm ⇧

On Monday, November 4, 2013 7:57:19 PM UTC+5:30, Dave Angel wrote:
On Mon, 4 Nov 2013 05:53:28 -0800 (PST), Jonas wrote:
Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16
Yes i am aware that 256 is a single byte 8 bits, but the approach
is valid anyway.
And e ^ (I * pi) == -1
So what. ?
Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.

Now I am too much of a dim-wit to comprehend all this compressified
profundification but I think I have a (dim) clue how such a patent
will be obtained:

Just make a submission in such an unreadable double (then triple,
quadruple...) spaced format that the patent office will grant it in
exasperation.
•  at Nov 4, 2013 at 10:34 pm ⇧

On Monday 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com wrote:
[snip]

And e ^ (I * pi) == -1

So what. ?

e is an approximation... and your idea is not general for any n.

Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.

--
DaveA
•  at Nov 5, 2013 at 1:29 am ⇧

On Mon, 4 Nov 2013 14:34:23 -0800 (PST), jonas.thornvall at gmail.com wrote:
e is an approximation... and your idea is not general for any n.

e is certainly not an approximation, and I never mentioned n.

--
DaveA
•  at Nov 5, 2013 at 4:33 am ⇧

On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:

On Monday 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com
wrote:
[...]
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16

"this" being Jonas' alleged lossless compression method capable of
compressing random data.

Yes i am aware that 256 is a single byte 8 bits, but the approach
is valid anyway.

I must say, I cannot see the connection between the fact that 256**2 == 2**16 and compression of random data. I might as well state that I have
squared the circle, and offer as proof that 3+4 == 5+2.

And e ^ (I * pi) == -1

So what. ?
e is an approximation... and your idea is not general for any n.

e is not an approximation, it is a symbolic name for an exact
transcendental number which cannot be specified exactly by any finite
number of decimal places.

The identity e**(i*pi) == -1 has nothing to do with "n". But in case you
actually meant "i", again you are mistaken. i is a symbolic name for an
exact number.

--
Steven
•  at Nov 5, 2013 at 4:36 am ⇧

On Tue, 05 Nov 2013 04:33:46 +0000, Steven D'Aprano wrote:

On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:

On Monday 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.thornvall at gmail.com
wrote:
[...]
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16
"this" being Jonas' alleged lossless compression method capable of
compressing random data.

/facepalm

I meant "it", not "this", as in "why it (the compression method) is
working". Sorry for any confusion.

--
Steven
•  at Nov 7, 2013 at 8:05 am ⇧

jonas.thornvall at gmail.com wrote:
Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16

All of those values are indeed the same, and yet that is completely
unrelated to compression. Did you honestly believe this was actually
explaining anything?

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

This whole post is a most amazing collection of non sequiturs. If you
would like to describe your compression scheme, there really are people
here who would be interested in reading it (although that number gets less
and less as this thread goes on).
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.
•  at Nov 7, 2013 at 6:59 pm ⇧

Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16
All of those values are indeed the same, and yet that is completely
unrelated to compression. Did you honestly believe this was actually
explaining anything?

I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

MarkJ
Tacoma, Washington
•  at Nov 7, 2013 at 7:22 pm ⇧

Mark Janssen wrote:
Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536%6^2^4=***4^8***=2^16
I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

Interesting -- I hadn't noticed that. Of course, now you have the
problem of expressing 4^8 in a compact way.

--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.
•  at Nov 7, 2013 at 10:26 pm ⇧

On Fri, Nov 8, 2013 at 5:59 AM, Mark Janssen wrote:
I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

Interesting idea, but I don't see how this has anything to do with
compressing arbitrary data. We already have a compact notation for
representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
how this will be any better, other than eliding NULs.

ChrisA
•  at Nov 8, 2013 at 2:05 am ⇧

On Thursday 7 November 2013 at 23:26:45 UTC+1, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 5:59 AM, Mark Janssen wrote:
I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

Interesting idea, but I don't see how this has anything to do with
compressing arbitrary data. We already have a compact notation for
representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
how this will be any better, other than eliding NULs.

ChrisA

I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?
•  at Nov 8, 2013 at 2:17 am ⇧

On Fri, Nov 8, 2013 at 1:05 PM, wrote:
I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?

I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.

ChrisA
•  at Nov 8, 2013 at 2:24 am ⇧

On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 1:05 PM, wrote:
I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?
I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Technically, the universe could expand temporarily or reconfigure to
allow it; the question is who or what will have to shift out to allow
it?

--
MarkJ
Tacoma, Washington
•  at Nov 8, 2013 at 2:27 am ⇧

On Fri, Nov 8, 2013 at 1:24 PM, Mark Janssen wrote:
On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 1:05 PM, wrote:
I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?
I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.
Technically, the universe could expand temporarily or reconfigure to
allow it; the question is who or what will have to shift out to allow
it?

... okay, I bow to your superior power. If you can make the very
:D

ChrisA
•  at Nov 8, 2013 at 7:16 am ⇧

Mark Janssen wrote:
Technically, the universe could expand temporarily or reconfigure to
allow it;

Oh, no! If Jonas ever succeeds in getting his algorithm to
work, the Void will expand and swallow us all!

http://en.wikipedia.org/wiki/The_Dreaming_Void

--
Greg
•  at Nov 8, 2013 at 2:25 am ⇧

On Friday 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 1:05 PM, wrote:
I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?

I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.

ChrisA

Please, you are he obnoxious, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number do has a shortest arithmetical encoding. And that is not the hard part to figure out, the hard part is to find a generic arithmetic encoding.

I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.
•  at Nov 8, 2013 at 2:36 am ⇧

On Fri, Nov 8, 2013 at 1:25 PM, wrote:
Please, you are he obnoxious, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number do has a shortest arithmetical encoding. And that is not the hard part to figure out, the hard part is to find a generic arithmetic encoding.

I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.

I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.

Maybe what you can do is render a large number of small subsets of
numbers efficiently. (Let's say all instances of integer^integer, and
all cases of Fibonacci numbers.) You'll still have some holes in
between which you can't render as tidily, and these are the ones where
the best representation is the raw one, plus a small tag saying that
it's raw. That's how most compression algorithms work.
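Chris's scheme -- a tidy form for a favoured subset, a raw tag for everything else -- can be sketched directly. Assume a toy encoding that writes n as ('pow', base, exponent) whenever n is a perfect power, and falls back to ('raw', n) otherwise; the tuple format is invented for illustration.

```python
def encode(n: int):
    """Return ('pow', a, b) if n == a**b for some a, b >= 2,
    else ('raw', n). Brute-force search; fine for small n."""
    for b in range(2, n.bit_length() + 1):
        a = round(n ** (1.0 / b))
        for cand in (a - 1, a, a + 1):  # guard against float rounding
            if cand >= 2 and cand ** b == n:
                return ("pow", cand, b)
    return ("raw", n)

def decode(t):
    return t[1] ** t[2] if t[0] == "pow" else t[1]

print(encode(65536))  # ('pow', 256, 2): a compact perfect-power form
print(encode(65537))  # ('raw', 65537): no power form, stored raw
```

65536 gets a short description, but its neighbour 65537 does not -- the tag overhead is exactly the price the pigeonhole principle demands.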

"obnoxious", I was using the word in a strictly correct way, and that
is not an invitation for profanity. Of course, if you want to be seen
as *yourself* obnoxious, that's one of the easier ways to accomplish
that.

ChrisA
•  at Nov 8, 2013 at 2:43 am ⇧

I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.
I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

--
MarkJ
Tacoma, Washington
•  at Nov 8, 2013 at 3:05 am ⇧

On Fri, Nov 8, 2013 at 1:43 PM, Mark Janssen wrote:
I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.
I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.
I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

Also an interesting theory.

1) How do you represent the prime factors?
2) How do you factor large numbers efficiently? Trust me, if you've
solved this one, there are a *LOT* of people who want to know. People
with money, like Visa.
3) Still doesn't deal with all possible numbers.

ChrisA
•  at Nov 8, 2013 at 3:08 am ⇧
In article <mailman.2179.1383879958.18130.python-list@python.org>,
Chris Angelico wrote:

2) How do you factor large numbers efficiently? Trust me, if you've
solved this one, there are a *LOT* of people who want to know. People
with money, like Visa.

Not to mention the NSA.
•  at Nov 8, 2013 at 4:12 am ⇧

On Thu, 7 Nov 2013 18:43:17 -0800, Mark Janssen wrote:
I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

No, very few numbers can be compressed using this technique. And
primes of course expand greatly.

--
DaveA
•  at Nov 8, 2013 at 4:47 am ⇧

On Thu, 07 Nov 2013 18:43:17 -0800, Mark Janssen wrote:

I am not sure if it is just stupidness or laziness that prevent you
from seeing that 4^8=65536.
I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.
I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique.

Let's suppose you wish to compress the number 143. So you find the prime
factorization, 11*133. Have you compressed anything? No you have not
-- the original number, 143, fits in a single byte, while it requires
*two* bytes to deal with 11 and 13 (one byte for 11, and a second byte
for 13).

We can try a different strategy: while 143 requires a full eight bits
(one byte), both 11 and 13 require only four bits (one nybble) each. So
by hard coding our expectation that the result will be two nybbles, we
can avoid expanding the data and end up with exactly zero compression:

143 = binary 10001110 or eight bits in total
11, 13 = binary 1011 1101 or eight bits in total

But while that trick works for 143, it doesn't work for 154 (one byte,
binary 10011010) which has prime factorization 2*7*11 (three nybbles 0010
0111 1011). Even if you can somehow drop the leading zeroes, it requires
nine bits versus eight.

Suppose instead of using bytes, you use decimal digits. 11 and 13 use
four digits, while 143 uses only three. Likewise, 154 has three decimal
digits while 2*7*11 has four digits. In both cases, there is no
compression.

How about recording which prime number it is, instead of the actual value
of the prime? So 154=2*7*11 expands to the 1st, 4th and 5th prime, so
maybe we can record 1 4 5 instead of 2 7 11 and save some bits. (We'll
save more bits the larger the prime.) Of course, to reverse the
compression you need to keep a table of the prime numbers somewhere, and
that's going to be a lot bigger than the space you save...
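Steven's byte-counting for 143 and 154 can be checked mechanically: sum the bit lengths of the prime factors and compare against the bit length of the number itself, before even paying for separators between factors.

```python
def factorize(n: int) -> list:
    """Prime factorization by trial division (fine for small n)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

for n in (143, 154):
    fs = factorize(n)
    # 143 = 11*13: 8 bits either way (zero compression).
    # 154 = 2*7*11: 8 bits raw vs 9 bits of factors (expansion).
    print(n, fs, n.bit_length(), sum(f.bit_length() for f in fs))
```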

Now, I accept that, possibly, with sufficient work we might be able to
find some cunning mechanism by which we can compress *any* integer value.
But it won't be the same mechanism for every value! For example, we might
compress (2**1000+1) using a run-length encoding of the binary bits,
while compressing 14629839399572435720381495231 as its prime
factorization (the 319th prime number, the 479th prime, the 499th prime
six times), and 10**1000 as an encoding of the decimal digits. But we
still need some sort of tag to distinguish between these different
compression schemes, and once you take into account the extra information
in the tag, there will be cases where some numbers aren't compressed at
all.

The ability to losslessly compress *everything* is sheer pseudo-
mathematics, up there with finding an exact rational value for pi, or
squaring the circle using only a compass and straight-edge. But the
ability to losslessly compress *useful data* is real, and such algorithms
operate by finding non-random patterns and redundant information in the
data.
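That last point is easy to demonstrate with the standard library's zlib module: patterned data shrinks enormously, while random data actually *grows* by a few bytes of overhead:

```python
import os
import zlib

patterned = b"spam" * 1000      # 4000 bytes of pure repetition
random_data = os.urandom(4000)  # 4000 bytes of incompressible noise

print(len(zlib.compress(patterned)))    # a few dozen bytes
print(len(zlib.compress(random_data)))  # a little MORE than 4000
```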

--
Steven
•  at Nov 8, 2013 at 7:09 am ⇧

Steven D'Aprano wrote:
Of course, to reverse the
compression you need to keep a table of the prime numbers somewhere,

That's not strictly necessary -- you could calculate them
as needed. It wouldn't be *fast*, but it would work...

You've got me thinking now about how viable a compression
scheme this would be, efficiency issues aside. I suppose
it would depend on things like the average density of primes
and the average number of prime factors a number has.
Any number theorists here?

--
Greg
•  at Nov 8, 2013 at 7:21 am ⇧

On Fri, Nov 8, 2013 at 6:09 PM, Gregory Ewing wrote:
You've got me thinking now about how viable a compression
scheme this would be, efficiency issues aside. I suppose
it would depend on things like the average density of primes
and the average number of prime factors a number has.
Any number theorists here?

Start by coming up with an efficient storage representation for the
primes. Don't forget that you can't use a bitfield, because eg 98 =
7*7*2, so you somehow need to represent that the 7 comes up twice. And
you can't limit the size of your primes (eg by representing 98 as
"\x07\x07\x02").
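To make the repeated-factor problem concrete, here is one hypothetical representation in Python (my sketch, not a proposal): (prime, exponent) pairs, which handle the doubled 7 in 98 but still cost two integers per pair:

```python
def factor_pairs(n):
    """Factor n into (prime, exponent) pairs by trial division."""
    pairs = []
    d = 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            e += 1
            n //= d
        if e:
            pairs.append((d, e))
        d += 1
    if n > 1:
        pairs.append((n, 1))
    return pairs

print(factor_pairs(98))     # [(2, 1), (7, 2)]: the exponent records the repeat
print(factor_pairs(65537))  # [(65537, 1)]: for a prime, the "compressed" form is n itself
```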

And that's without dealing with the major issue of _finding_ prime
factors for a large number.

ChrisA
•  at Nov 8, 2013 at 3:48 pm ⇧

On Friday, 8 November 2013 at 03:43:17 UTC+1, zipher wrote:
I am not sure if it is just stupidity or laziness that prevents you from seeing that 4^8=65536.

I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

--
MarkJ
Tacoma, Washington
3^2-2^2=5
•  at Nov 8, 2013 at 3:57 pm ⇧

On Friday, November 8, 2013 9:18:05 PM UTC+5:30, jonas.t... at gmail.com wrote:
On Friday, 8 November 2013 at 03:43:17 UTC+1, zipher wrote:
I am not sure if it is just stupidity or laziness that prevents you from seeing that 4^8=65536.
I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.
I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

3^2-2^2=5

Oh my! What a profound insight!!
•  at Nov 8, 2013 at 6:48 pm ⇧

On Fri, Nov 8, 2013 at 8:48 AM, wrote:
3^2-2^2=5

How do you intend to encode 3**2 - 2**2 in such a way that it is more
compact than simply encoding 5? If you actually have an algorithm,
you should share it instead of dropping these cryptic one-line
non-explanations and leaving us guessing about the details. But I'm
starting to think that you don't actually have an algorithm at all,
whereas my initial impression was that you did and were simply
•  at Nov 8, 2013 at 2:36 am ⇧

On Friday, November 8, 2013 7:55:18 AM UTC+5:30, jonas wrote:
On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 1:05 PM, jonas.thornvall wrote:
I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.
Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.
ChrisA
Please, you are the obnoxious one, so fuck off or go learn about
reformulation of problems. Every number has an infinite number of
arithmetical solutions. So every number does have a shortest
arithmetical encoding. And that is not the hard part to figure out;
the hard part is to find a generic arithmetic encoding.

Since we are discussing profound matters such as cosmic laws,
here's my 'all-universal' law:

When you engage with a troll you get trolled.
•  at Nov 8, 2013 at 2:43 am ⇧

On Nov 7, 2013, at 21:25, jonas.thornvall at gmail.com wrote:

On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
On Fri, Nov 8, 2013 at 1:05 PM, wrote:
I guess what matters is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or lossless video compression?
I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.
Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.
ChrisA
Please, you are the obnoxious one, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number does have a shortest arithmetical encoding. And that is not the hard part to figure out; the hard part is to find a generic arithmetic encoding.

I am not sure if it is just stupidity or laziness that prevents you from seeing that 4^8=65536.

Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually lose efficiency in your scheme. Don't think in decimal, but in terms of the information needing transfer.

You might reply that you don't need a whole byte for 4 or 8 -- that's true. You could, e.g., just encode the fourth and eighth bits of a single byte and send that: certainly gives some compression, but at the cost of generality -- what would you do with 65^65? It's sort of like the mean-variance tradeoff in statistics; it's much easier to encode certain data sets (e.g. powers of two, as you noted) but only if you concede your algorithm will only work for those values.

More generally, check out the work of Claude Shannon; a very accessible and worthwhile author.
•  at Nov 8, 2013 at 3:24 am ⇧

On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:
Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually lose efficiency in your scheme. Don't think in decimal, but in terms of the information needing transfer.

Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian). So it would be
possible to represent that more compactly, if you deem one byte each
for the base and the exponent: 0x04 0x08. However, that system allows
you to represent just 62,041 possible numbers:

decomp = {}
for base in range(256):
    for exp in range(256):
        decomp[base**exp] = base, exp

len(decomp)
62041

The trouble is, these numbers range all the way up from 0**1 == 0 to
255**255 == uhh...
255**255
46531388344983681457769984555620005635274427815488751368772861643065273360461098097690597702647394229975161523887729348709679192202790820272357752329882392140552515610822058736740145045150003072264722464746837070302159356661765043244993104360887623976285955058200326531849137668562738184397385361179287309286327712528995820702180594566008294593820621769951491324907014215176509758404760451335847252744697820515292329680698271481385779516652518207263143889034764775414387732372812840456880885163361037485452406176311868267428358492408075197688911053603714883403374930891951109790394269793978310190141201019287109375

which would decompress to (obviously) 255 bytes. So you can represent
just over sixty thousand possible 255-byte strings in two bytes with
this system.

To generalize it, you'd need to devote a bit somewhere to saying
"There are more to add in". Let's say you do this on the exponent byte
(high bit for convenience), so you have 0x04 0x08 means 65536, and
0x04 0x88 0x01 0x01 means 65536+1 = 65537. Now we have something that
generalizes; but the efficiency is gone - and there are too many ways
to encode the same value. (Bear in mind, for instance, that 0x01 0xNN
for any NN will still just represent 1, and 0x00 0xNN will represent
0. That's wasting a lot of bits.)
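The continuation-bit scheme just described can be sketched directly (this is my illustration of the idea as stated, not a serious codec): the high bit of each exponent byte means "another (base, exponent) term follows", and the decoded value is the sum of the terms.

```python
def encode(terms):
    """terms: list of (base, exp) pairs, 0 <= base <= 255, 0 <= exp <= 127.
    Represents the value sum(base**exp) over the terms."""
    out = bytearray()
    for i, (base, exp) in enumerate(terms):
        more = 0x80 if i < len(terms) - 1 else 0  # high bit: more terms follow
        out += bytes([base, exp | more])
    return bytes(out)

def decode(data):
    value, i = 0, 0
    while True:
        base, exp = data[i], data[i + 1]
        value += base ** (exp & 0x7F)
        i += 2
        if not exp & 0x80:  # high bit clear: this was the last term
            return value

print(encode([(4, 8)]).hex())            # 0408 -> 65536 in two bytes
print(encode([(4, 8), (1, 1)]).hex())    # 04880101 -> 65537 in four bytes
print(decode(encode([(4, 8), (1, 1)])))  # 65537
```

Note how the generality costs us: 65537 now takes four bytes where a plain big-endian integer takes three.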

The application I can see for this sort of thing is not data
compression, but puzzles. There are lots of puzzles that humans find
enjoyable that represent an entire grid with less data than it
contains - for one popular example, look at Sudoku. You have a 9x9
grid where each cell could contain any of nine digits, which means a
theoretical information density of 9**81; but the entire grid can be
described by a handful of digits and heaps of blank space. This could
be a similarly-fun mathematical puzzle: 3359232 can be represented as
B1**E1 + B2**E2, where all numbers are single-digit. Find B1, B2, E1,
and E2. In this case, you're guaranteed that the end result is shorter
(four digits), but it's hardly useful for general work.
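The puzzle is a short brute force in Python (finding the answer by hand is the fun part, so look away if you want to solve it yourself):

```python
# Find single-digit B1, E1, B2, E2 with B1**E1 + B2**E2 == 3359232.
target = 3359232
hits = sorted({tuple(sorted(((b1, e1), (b2, e2))))
               for b1 in range(1, 10) for e1 in range(1, 10)
               for b2 in range(1, 10) for e2 in range(1, 10)
               if b1 ** e1 + b2 ** e2 == target})
print(hits)  # [((6, 8), (6, 8))]: 6**8 + 6**8 == 3359232
```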

ChrisA
•  at Nov 8, 2013 at 4:05 am ⇧

On Nov 7, 2013, at 22:24, Chris Angelico wrote:

On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
<michael.weylandt@gmail.com> wrote:
Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte.
Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian).

Quite right -- meant C's int type, (machine word) not char. My mistake!

Michael
•  at Nov 8, 2013 at 4:06 am ⇧

On Fri, Nov 8, 2013 at 3:05 PM, R. Michael Weylandt <michael.weylandt@gmail.com> wrote:

On Nov 7, 2013, at 22:24, Chris Angelico wrote:

On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
<michael.weylandt@gmail.com> wrote:
Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte.
Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian).
Quite right -- meant C's int type, (machine word) not char. My mistake!

Sure. Assuming at least 32-bit words, yes, that's correct. Of course,
this is still just proving that it's {possible, not possible} to
compress specific values, while the OP claimed to compress anything.

ChrisA
•  at Nov 8, 2013 at 5:32 am ⇧

On Thu, 07 Nov 2013 18:25:18 -0800, jonas.thornvall wrote:

Please, you are he obnoxious, so fuck off

Pot, meet kettle.

I am not sure if it is just stupidity or laziness that prevents you from
seeing that 4^8=65536.

And what is it that prevents you from seeing that 4**8 == 65536 is
irrelevant to the compression of random data?

Your talk of "reformulation of problems", "an infinite number of
arithmetical solutions" and a "generic arithmetic encoding" is just a smoke
screen to distract from the fact that you don't, in fact, have a
compression algorithm that can losslessly compress arbitrary random data
of any length.

I am as sure of this as I am sure you don't have a perpetual motion
machine, a method for squaring the circle using only a compass and
straight-edge, or a fool-proof system for winning the lottery every week
without any chance of failure.

--
Steven

## Related Discussions

Discussion Overview
 group python-list categories python posted Nov 2, '13 at 9:31p active Nov 8, '13 at 6:48p posts 48 users 16 website python.org
