FAQ
The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?

Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.

So the problem is to come up with a reasonable hashkey for each of
these objects. The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are "the same" in both. The first attribute is a simple dictionary
whose keys are integers and values are strings. The second attribute
is more complicated. It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings. The keys at every
level of this structure are strings. E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
'B': set(['6', '7', '8'])}

I'm looking for a good algorithm for computing a hash key for
something like this? (Basically I'm looking for a good way to
combine hashkeys.)

Thanks!

kj

## Search Discussions

• at Oct 6, 2010 at 7:13 pm ⇧ kj wrote:

The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
http://svn.python.org/view/python/trunk/Objects/tupleobject.c?revision�029&view=markup

static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 1000003L;
x = 0x345678L;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}

I'm looking for a good algorithm for computing a hash key for
something like this? (Basically I'm looking for a good way to
combine hashkeys.)
If you want to combine the hashes from several objects then the
easiest way is just to create a tuple of the objects and hash it.
• at Oct 6, 2010 at 7:26 pm ⇧ The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
Surprisingly, in the source:

Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.

So the problem is to come up with a reasonable hashkey for each of
these objects. The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are "the same" in both. The first attribute is a simple dictionary
whose keys are integers and values are strings. The second attribute
is more complicated. It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings. The keys at every
level of this structure are strings. E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
'B': set(['6', '7', '8'])}

I'm looking for a good algorithm for computing a hash key for
something like this? (Basically I'm looking for a good way to
combine hashkeys.)
Creating tuples from dicts, recursively, and stabilized by using sorted
on items.

Diez
• at Oct 7, 2010 at 10:53 am ⇧ In <m2fwwjazs2.fsf at web.de> deets at web.de (Diez B. Roggisch) writes:
The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
Surprisingly, in the source:
Thanks to you, and to all who pointed me to the place in the source
where this is.

How exactly did you search for this? Taking a hint from the url
above, I went to Google Code Search and searched for "python tuple
hash" (minus the quotes, of course), which produced a ton of
irrelevant stuff (almost 80K hits). Searching for "python tuple
hash lang:c" cut down the number of hits to ~8K, but still too much
to wade through. Clearly I need a smarter search strategy (one
that does not include foreknowledge of the name of the actual
function in the C source, of course).

~kj
• at Oct 7, 2010 at 11:59 am ⇧ In <m2fwwjazs2.fsf at web.de> deets at web.de (Diez B. Roggisch) writes:
The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
Surprisingly, in the source:
Thanks to you, and to all who pointed me to the place in the source
where this is.

How exactly did you search for this? Taking a hint from the url
above, I went to Google Code Search and searched for "python tuple
hash" (minus the quotes, of course), which produced a ton of
irrelevant stuff (almost 80K hits). Searching for "python tuple
hash lang:c" cut down the number of hits to ~8K, but still too much
to wade through. Clearly I need a smarter search strategy (one
that does not include foreknowledge of the name of the actual
function in the C source, of course).
I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look.
From that I learned the tuplehash function name.
Diez
• at Oct 7, 2010 at 1:01 pm ⇧ In <87pqvmp611.fsf at web.de> deets at web.de (Diez B. Roggisch) writes:
I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look.
From that I learned the tuplehash function name.
You must be at least somewhat familiar with the Python source.
Everytime I've peeked into it I just feel lost, but it's clearly
something I need to master sooner or later... I can't wait for
the next one of those occasional introductions to the Python
internals at our local Python club.

Thanks,

~kj
• at Oct 7, 2010 at 1:25 pm ⇧ In <87pqvmp611.fsf at web.de> deets at web.de (Diez B. Roggisch) writes:
I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look.
From that I learned the tuplehash function name.
You must be at least somewhat familiar with the Python source.
Everytime I've peeked into it I just feel lost, but it's clearly
something I need to master sooner or later... I can't wait for
the next one of those occasional introductions to the Python
internals at our local Python club.
No, I'm not the tiniest bit. I just followed my instincts in looking
into the "Objects" folder, because that's where I suspected the
definition of objects to be....

And grep has been proven useful in these cases as well.

Diez
• at Oct 6, 2010 at 7:28 pm ⇧ On Wed, Oct 6, 2010 at 11:58 AM, kj wrote:

The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
From Objects/tuple.c, line 315 in Python3.2:
static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 1000003L;
x = 0x345678L;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}
Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. ?The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.

So the problem is to come up with a reasonable hashkey for each of
these objects. ?The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are "the same" in both. ?The first attribute is a simple dictionary
whose keys are integers and values are strings. ?The second attribute
is more complicated. ?It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings. ?The keys at every
level of this structure are strings. ?E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
? ? ? 'b': set(['4', '5'])},
?'B': set(['6', '7', '8'])}

I'm looking for a good algorithm for computing a hash key for
something like this? ?(Basically I'm looking for a good way to
combine hashkeys.)
Sounds like you're trying to hash mutable data structures, which is a
no-no, but assuming you've got that part all figured out then the easy
way out would be to print the whole thing and take the hash of the
resulting string. A (possibly) better way would be to do a Schwartzian
transform and freeze everything. Other approaches may be better,
but those are the first two out of my head.

Geremy Condra

 http://en.wikipedia.org/wiki/Schwartzian_transform
• at Oct 6, 2010 at 7:31 pm ⇧ On 10/6/10 1:58 PM, kj wrote:
The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?
The function tuplehash() in Objects/tupleobject.c, predictably enough.
Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.
The most straightforward way to implement __hash__ for a complicated object is
to normalize its relevant data to a tuple of hashable objects and then call
hash() on that tuple. A straightforward way to compare such objects is to
calculate that very same normalized tuple for each one and compare those tuples.
Cache it if necessary. Don't bother hashing to implement __eq__ unless if you
are really optimizing for space and time.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco
• at Oct 6, 2010 at 9:43 pm ⇧ On 10/6/2010 2:58 PM, kj wrote:

These objects are non-mutable once they are created,
See below.
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.
I believe Python strings do this (cache the hash). Equality comparison
can check length, hashes, and only then chars.
So the problem is to come up with a reasonable hashkey for each of
these objects. The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are "the same" in both. The first attribute is a simple dictionary
whose keys are integers and values are strings. The second attribute
is more complicated. It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings. The keys at every
level of this structure are strings. E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
'B': set(['6', '7', '8'])}
If these two attributes, and hence the dicts, are public, then your
instances are mutable.

--
Terry Jan Reedy
• at Oct 7, 2010 at 10:42 am ⇧ In <mailman.1415.1286438617.29448.python-list at python.org> Terry Reedy <tjreedy at udel.edu> writes:
If these two attributes, and hence the dicts, are public, then your
instances are mutable.
I guess I should have written "immutable among consenting adults."

As far as I know, Python does not support private attributes, so
I guess the dicts are public no matter what I do. I suppose that
I can implement "frozendict", but I can't think of any way to
enforce the immutability of these "frozendicts" that would not be
trivial to circumvent (it better be, in fact, otherwise I wouldn't
be able to initialize the damn things).

~kj
• at Oct 7, 2010 at 3:10 pm ⇧ On 07/10/2010 11:42, kj wrote:
In<mailman.1415.1286438617.29448.python-list at python.org> Terry Reedy<tjreedy at udel.edu> writes:
If these two attributes, and hence the dicts, are public, then your
instances are mutable.
I guess I should have written "immutable among consenting adults."

As far as I know, Python does not support private attributes, so
I guess the dicts are public no matter what I do. I suppose that
I can implement "frozendict", but I can't think of any way to
enforce the immutability of these "frozendicts" that would not be
trivial to circumvent (it better be, in fact, otherwise I wouldn't
be able to initialize the damn things).
You would initialise them by creating them from a list of tuples (or an
iterable which yields tuples), like with a dict:
dict([(1, "foo"), (2, "bar")])
{1: 'foo', 2: 'bar'}
• at Oct 7, 2010 at 8:00 pm ⇧ The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?

Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object. The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.

So the problem is to come up with a reasonable hashkey for each of
these objects. The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are "the same" in both. The first attribute is a simple dictionary
whose keys are integers and values are strings. The second attribute
is more complicated. It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings. The keys at every
level of this structure are strings. E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
'B': set(['6', '7', '8'])}

I'm looking for a good algorithm for computing a hash key for
something like this? (Basically I'm looking for a good way to
combine hashkeys.)
You could do something like this:

deep_methods = {
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
}

def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)

def deepfreeze(obj):
"""Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)

def deephash(obj):
"""Return hash(deepfreeze(obj)) without deepfreezing"""
return hash(apply_method(deephash, obj))

# Example of deepfreezable object:
obj = [1, "foo", {(2, 4): {7, 5, 4}, "bar": "baz"}]

deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
deephash(obj)
1341422540
hash(deepfreeze(obj))
1341422540

--
Arnaud
• at Oct 9, 2010 at 7:54 pm ⇧ In <87y6a9lqnj.fsf at gmail.com> Arnaud Delobelle <arnodel at gmail.com> writes:
You could do something like this:
deep_methods = {
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
}
def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)
def deepfreeze(obj):
"""Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)
def deephash(obj):
"""Return hash(deepfreeze(obj)) without deepfreezing"""
return hash(apply_method(deephash, obj))
# Example of deepfreezable object:
obj = [1, "foo", {(2, 4): {7, 5, 4}, "bar": "baz"}]
^ ^
`-------`------- what's this?

deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
deephash(obj)
1341422540
hash(deepfreeze(obj))
1341422540

After fixing the missing """ in deepfreeze this code works as
advertised, but I'm mystified by the identity between hash(deepfreeze(...))
and deephash(...). Without some knowledge of the Python internals,
I don't see how this follows.

More specifically, it is not obvious to me that, for example,

hash(frozenset((<whatever>,)))

would be identical to

hash(frozenset((hash(<whatever>),)))

but this identity has held every time I've checked it. Similarly
for other more complicated variations on this theme.

Anyway, thanks for the code. It's very useful.

~kj
• at Oct 9, 2010 at 8:39 pm ⇧ In <87y6a9lqnj.fsf at gmail.com> Arnaud Delobelle <arnodel at gmail.com> writes:
You could do something like this:
deep_methods = {
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
}
def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)
def deepfreeze(obj):
"""Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)
def deephash(obj):
"""Return hash(deepfreeze(obj)) without deepfreezing"""
return hash(apply_method(deephash, obj))
# Example of deepfreezable object:
obj = [1, "foo", {(2, 4): {7, 5, 4}, "bar": "baz"}]
^ ^
`-------`------- what's this?
This is set literal notation, introduced in Python 3!
In 2.X, you would write set([7, 5, 4])
deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
deephash(obj)
1341422540
hash(deepfreeze(obj))
1341422540

After fixing the missing """ in deepfreeze this code works as
Oops, I must admit I added the docstrings after pasting the code :)
but I'm mystified by the identity between hash(deepfreeze(...)) and
deephash(...). Without some knowledge of the Python internals, I
don't see how this follows.
OK. I haven't actually proved it, but it follows from the following
facts:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).

2. Container hashable objects compute their hash from the hash of their
elements.

I don't think either of these two facts is documented, but it would be quite
easy to verify them in the Python source (I must admit I have not done
it). And it is difficult to conceive how it could be otherwise.

--
Arnaud
• at Oct 10, 2010 at 12:31 am ⇧ On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).
It's a beautiful theory, but, alas, it is not the case.
hash(-1) == -1
False
hash(2**64) == 2**64
False

to give only two of an infinite number of counter-examples.

Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions
before, or after, ints and longs were partially integrated?

[steve at sylar ~]\$ python2.1
Python 2.1.3 (#1, Aug 12 2010, 01:53:57)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
2**64
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: integer exponentiation
>>>

People keep forgetting that 2.2 introduced nearly as many far-reaching
changes as 3.0.

--
Steven
• at Oct 10, 2010 at 8:51 am ⇧ Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).
It's a beautiful theory, but, alas, it is not the case.
hash(-1) == -1
False
This is a counter-example for the (invalid) premise that hash(x) == x,
but not for the invariant of hash(hash(x)) == hash(x).
hash(hash(-1)) == hash(-1)
True
hash(hash(2**64)) == hash(2**64)
True
Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions
before, or after, ints and longs were partially integrated?
I would take it to mean the type 2.x calls 'int', i.e. fixed-width
integer type.
• at Oct 10, 2010 at 10:06 am ⇧ Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).
It's a beautiful theory, but, alas, it is not the case.
hash(-1) == -1
False
hash(2**64) == 2**64
False

to give only two of an infinite number of counter-examples.
I can only see one counterexample, (-1). 2**64 is of type 'long' in 2.X
on your machine (or, to be pedantic, 2.x where x >= 2). And, in fact,
(-1) is the only int such that hash(x) != x.

In can only guess that (-1) is a value that has a special meaning when
hashing. Try this (Python 2.6):
class A(object):
... def __hash__(self): return -1
...
a = A()
hash(a)
-2

Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions
before, or after, ints and longs were partially integrated? Either.
[steve at sylar ~]\$ python2.1
Python 2.1.3 (#1, Aug 12 2010, 01:53:57)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
2**64
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: integer exponentiation

People keep forgetting that 2.2 introduced nearly as many far-reaching
changes as 3.0.
I didn't forget, but what bearing does it have on this particular issue?
• at Oct 10, 2010 at 1:49 pm ⇧ In <87hbgu8irb.fsf at gmail.com> Arnaud Delobelle <arnodel at gmail.com> writes:
Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).
It's a beautiful theory, but, alas, it is not the case.
hash(-1) == -1
False
hash(2**64) == 2**64
False

to give only two of an infinite number of counter-examples.
(Infinite???)
And, in fact,
(-1) is the only int such that hash(x) != x.
Arnaud, how did you determine that -1 is the only such int? I
can't imagine any other approach other than a brute-force check of
all ints... When I tried this I ran into unforeseen limitations
in xrange, etc. It's all very doable, but still, it would take at
least about 3 hours on my laptop.
In can only guess that (-1) is a value that has a special meaning when
hashing. Try this (Python 2.6):
class A(object):
... def __hash__(self): return -1
...
a = A()
hash(a)
-2
Very cool.

BTW, thank you for the explanation in your previous post. It makes
a lot of sense. I find it interesting (as Hrvoje pointed out) that
the hash function is (or appears to be) idempotent on integers
(long or not), even though it is not the identity on the integers.
Thanks to Steven for the counterexamples to show the latter. I've
learned tons from this exchange.

~kj
• at Oct 10, 2010 at 3:32 pm ⇧ On Sun, 10 Oct 2010 13:49:09 +0000, kj wrote:

to give only two of an infinite number of counter-examples.
(Infinite???)
I was mistaken. Given Arnaud's specification that we look only at the
Python 2.x ints, I believe that there is only one counter-example, namely
-1.

However, in Python 3.x I would be correct. The reasoning is a simple
application of the pigeon-hole principle. Since hash(n) is limited to
returning a finite number of distinct results, but there are an infinite
number of Python 3 ints, it follows that there must be an infinite number
of collisions.

And, in fact,
(-1) is the only int such that hash(x) != x.
Arnaud, how did you determine that -1 is the only such int? I can't
imagine any other approach other than a brute-force check of all ints...
Reading the source code is also a good approach.

I don't read C very well, but even I can work out what this is doing:

static long
int_hash(PyIntObject *v)
{
/* XXX If this is changed, you also need to change the way
Python's long, float and complex types are hashed. */
long x = v -> ob_ival;
if (x == -1)
x = -2;
return x;
}

(from intobject.c in Python 2.6.1). It takes a Python int object,
extracts the value of it as a C long, returns -2 if the value is -1
otherwise returns the value.

When I tried this I ran into unforeseen limitations in xrange, etc.
It's all very doable, but still, it would take at least about 3 hours on
my laptop.

What are you doing? This takes less than 20 seconds to test up to 2**24:

import time
def test(n):
t = time.time()
assert hash(0) == 0
assert hash(1) == 1
assert hash(-1) == -2
for i in xrange(2, n):
assert hash(i) == i, "failed %d" % i
assert hash(-i) == -i, "failed %d" % -i
t = time.time() - t
return t

test(2**24)
18.076412916183472

Since 2**31 is 2**7 = 128 times bigger, I estimate that testing
everything up to 2**31 should take less than 45 minutes. And my PC is not
exactly the fastest machine on the block.

--
Steven
• at Oct 10, 2010 at 8:57 pm ⇧ In <4cb1dc9a\$0\$29970\$c3e8da3\$5496439d at news.astraweb.com> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
Reading the source code is also a good approach.
Every time I have attempted to answer a question by looking at the
Python C source, all I've achieved was wasting time, sometimes a
*lot* of time, so by now I've developed what can only be described
as a phobia to it. I probably need professional help at this point.

~kj
• at Oct 10, 2010 at 3:49 pm ⇧ In <87hbgu8irb.fsf at gmail.com> Arnaud Delobelle <arnodel at gmail.com> writes:
[...]
And, in fact,
(-1) is the only int such that hash(x) != x.
Arnaud, how did you determine that -1 is the only such int? I
can't imagine any other approach other than a brute-force check of
all ints... When I tried this I ran into unforeseen limitations
in xrange, etc. It's all very doable, but still, it would take at
least about 3 hours on my laptop.
I looked at the source, namely the get_hash() function in the
intobject.c file in the 2.x source, and the get_hash() function in the
longobject.c file in the 3.x source.
In can only guess that (-1) is a value that has a special meaning when
hashing. Try this (Python 2.6):
class A(object):
... def __hash__(self): return -1
...
a = A()
hash(a)
-2
Very cool.

BTW, thank you for the explanation in your previous post. It makes
a lot of sense. I find it interesting (as Hrvoje pointed out) that
the hash function is (or appears to be) idempotent on integers
(long or not), even though it is not the identity on the integers.
Thanks to Steven for the counterexamples to show the latter. I've
learned tons from this exchange.
It still seems to hold that hash() is idempotent on all objects.

I have learnt too that hash(-1) is not (-1), and that it seems that a
hash value is not allowed to be (-1). There is one thing left to find
out. Why can't it be (-1)? Maybe looking at the source for the hash()
builtin would yield a clue, or maybe someone who knows would be able to
tell us?

--
Arnaud
• at Oct 10, 2010 at 4:24 pm ⇧ Arnaud Delobelle <arnodel at gmail.com> writes:

I have learnt too that hash(-1) is not (-1), and that it seems that a
hash value is not allowed to be (-1). There is one thing left to find
out. Why can't it be (-1)?
Because -1 has a special meaning in the C function equivalent to
Python's hash(). PyObject_Hash returning -1 means that an exception was
raised somewhere inside the object's __hash__ method. For that reason
hash functions that really return -1 must change that to another value,
and -2 is as good a replacement as any.

This is documented in
http://docs.python.org/c-api/object.html?highlight=pyobject_hash#PyObject_Hash
• at Oct 10, 2010 at 2:28 pm ⇧ On Sun, 10 Oct 2010 11:06:00 +0100, Arnaud Delobelle wrote:

Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x)
hold for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).
It's a beautiful theory, but, alas, it is not the case.
hash(-1) == -1
False
hash(2**64) == 2**64
False

to give only two of an infinite number of counter-examples.
I can only see one counterexample, (-1). 2**64 is of type 'long' in 2.X
on your machine (or, to be pedantic, 2.x where x >= 2). And, in fact,
(-1) is the only int such that hash(x) != x.
Fair point. I was mistaken.

I had the impression that the integration between ints and longs since
Python 2.2 was more extensive than it actually is. I made the above
comments based on the idea that since Python 2.2, longs are a subclass of
ints, e.g. that isinstance(2**64, int) would return True. Turns out that
I'm wrong, in which case I agree with you that -1 is the only int counter-
example.

As an aside, this makes me glad that I have continued writing
isinstance(x, (int, long)) in my code, even though I "knew" it was
unnecessary. Not the first time, and probably not the last, that a "this
can never happen" test saved the day.

--
Steven