In article <km4fkv8qbbge3mvckv5gsim9amq59qvohb at 4ax.com>,
Geoff Howland <ghowland at lupineNO.SPAMgames.com> writes:
On Sat, 23 Aug 2003 13:56:13 +0000 (UTC), kern at taliesen.caltech.edu
(Robert Kern) wrote:
[snip]
In the end, it's just not worth the trouble. 99.9% of the time, I think
you will find that subclassing the builtin types and giving the new
classes short names will suffice. I mean, how much of a hardship is it
to do the following:
    class d(dict):
        def __add__(self, other):
            # stuff
    d({1:2}) + d({3:4})
It's not only the hardship of doing this; there is also the
conversion of everything that doesn't create MyDict by default. I
would have to keep converting them to perform the same manipulations,
or other functions (add is only an important example, others could be
handy as well).
Now I will need additional lines to do this, or make the lines that
exist more complicated. It's not undue hardship in a lot of ways, but
it would be nice not to have to and be able to keep the characters
used to a minimum. Especially when trying to write sub-50 line scripts
and such.
I understand this feeling. In the NumPy world, we have the same problem
with subclassing arrays. The builtin functions still return regular
arrays.
One particular type of solution is often possible. For example, making
an __add__ method for dicts probably doesn't require any other new
methods, just the normal dict ones (copy, update). That is, it doesn't
require the other object to be MyDict, just a normal dict. So you can
write:
d({1:2}) + {3:4}
Especially for the situations you are considering, where you are only
adding functionality, not changing the way other methods work, this will
often be a viable way of doing what you want.
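
A minimal sketch of such an __add__, assuming key-clobbering semantics
(values from the right-hand operand win). The class name d comes from the
example above; the method body is only one possible way to fill it in,
using nothing but the normal dict copy/update behavior:

```python
class d(dict):
    """Illustrative dict subclass with a key-clobbering + operator."""

    def __add__(self, other):
        # Copy self so neither operand is modified, then let update()
        # overwrite any shared keys with the values from `other`.
        # `other` only needs to be something dict.update() accepts.
        result = d(self)
        result.update(other)
        return result


merged = d({1: 2}) + {3: 4}
```

The subclass has to be the left-hand operand here; {3:4} + d({1:2})
would need an __radd__ method as well.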
If you're still not convinced, ask yourself these questions:
I actually feel confident I would be happy with my provided answers as
real functionality. I don't believe Python will be changed to allow
that, and that is fine with me. I just wanted to know if it could be
done the way Python is now.
The point of this exercise is to clarify the details of what you actually
want, try to square them with the current architecture and each other,
and finally to show why they aren't there now.
* How would you apply the new subclass?
- Only on new literals after the subclass definition (and registry of
the subclass with some special hook)?
- On every new object that would normally have been the base type?
Yes and yes.
If this were possible at all, I expect it could be made to work only for
code that imports it, if so desired. Explicitly requested only, etc.
Not in any clean way. See below.
- On every previously existing object with the base type?
No, they're already defined. Unless my understanding of the way types
are done is lacking, and they are referencing the same base object
code, and then maybe yes to this as well, but it would be passive.
The question is "do you really want this?" It would interfere with other
modules in a very invasive way and according to what you say below is
not what you want.
- In just the one module? or others which import it? or also in
modules imported after the one with the subclass?
In the module, and any that import it. It fails to be useful in
making things clean-and-neat if in order to make a clean-and-neat
script you have to first have a bunch of stuff that modifies the
language norms.
I think this is impossible without taking a performance hit in the
construction of all objects with a builtin type. Every internal call to
PyDict_New, PyList_New, etc. would have to check the registry and
somehow know what module it is being called from. There would have to be
a reworking of the internals to give the Py<Object>_New functions that
kind of information. And you get a performance hit even if you don't
subclass builtin types.
And if you do subclass in your module, just for internal use, how can
you import this module into another without affecting it as well? That
is, you only want what the module does, not the redefinition of dicts?
* How does your choice above work with code compiled on-the-fly with
eval, exec or execfile?
It would depend on how Python allowed this change to work. If it did
it the way "I imagine" it would, then the change would affect all of
these circumstances.
I don't think it works the way you imagine. I think there would have to
be even more reworking of the way these work to pass in the information
that the exec'd code should use the redefinitions in the current module.
* How do you deal with multiple subclasses being defined and registered?
If they re-wrote the same thing and didn't just augment it (or
overwrote each other), then I would expect them to clobber each other.
Problems could ensue.
This is where I say "too much rope."
However, why isn't having {} + {} or [].len() part of the language
anyway? If it was, people wouldn't have to keep writing their own.
{} + {} doesn't have to solve ALL cases, which is why I'm assuming it
isn't already implemented as [] + [] is. It just has to solve the
basic case and provide the mechanism for determining the difference.
Ex, key clobbering add, and an intersect() function to see what will
be clobbered and do something about it if you care.
Isn't it better to have some default solution than to have none and
everyone keeps implementing it themselves anew every time?
Unless the default solution is the one clear way to do it, no. Since it
only takes a few lines to implement in any one way that you might want
to do it, there is very little incentive to make the default way one of
them. You want clobbering; I'll probably want the values added to each
other as in a sparse array.
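
To make the two competing defaults concrete, here is a rough sketch of
both. The function names add_clobbering and add_sparse are made up for
this illustration, and the sparse version assumes missing keys count as 0:

```python
def add_clobbering(a, b):
    """Return a new dict; values from b replace values from a."""
    result = dict(a)
    result.update(b)
    return result


def add_sparse(a, b):
    """Return a new dict; values under shared keys are added together."""
    result = dict(a)
    for key, value in b.items():
        # Assumption: a key missing from `a` is treated as 0.
        result[key] = result.get(key, 0) + value
    return result


left = {"x": 1, "y": 2}
right = {"y": 10, "z": 20}
clobbered = add_clobbering(left, right)  # "y" comes from right
summed = add_sparse(left, right)         # "y" is 2 + 10
```

Both are a handful of lines, and they disagree on every shared key, so
neither is an obvious choice for what {} + {} should mean.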
For the same reason, the Python standard library doesn't have a priority
queue implementation. There are so many different ways to write a
priority queue with different behaviors, tradeoffs, and applications.
Since they are all pretty easy to implement in a few lines,
there is no incentive to include a default one in the standard library.
For the [].len() type things, this is obviously a matter of taste, and
I have always been ok with len([]), but in trying to get team mates to
work with Python I have to concede that [].len() is more consistent
and a good thing. Same for other builtins that apply.
Terry explains the historical reasons very well. I would also add that
the "len(obj) calls obj.__len__()" rule enforces a consistency among
user-defined classes and types. The canonical way to report a length is
to define a __len__ method, and the canonical way to get a length is to
call len(obj). If this were not the case, objects would be defining
.len(), .length(), .getLength(), .number(), .cardinality(), etc.
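
As a small illustration of that rule, a user-defined class only has to
define __len__ to work with len(). The Deck class below is hypothetical
and exists only for this example:

```python
class Deck:
    """Hypothetical container used only for illustration."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        # len(deck) is answered by this method.
        return len(self._cards)


deck = Deck(["ace", "king", "queen"])
size = len(deck)
```

Callers never need to know what the class calls its size internally;
len(obj) is the one spelling that works for lists, dicts, and Deck alike.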
* How would you pass in initialization information?
E.g. say I want to limit the length of lists
    class LimitList(list):
        def __init__(self, maxlength, data=[]):
            self.maxlength = maxlength
            list.__init__(self, data)
        def append(self, value):
            # Refuse to grow once the list is at its maximum length.
            if len(self) == self.maxlength:
                raise ValueError("list at maximum length")
            else:
                list.append(self, value)
I am actually more interested in new features than modifying current ones,
but I suppose you would have this ability as well since you have
access.
I would say if you break the initialization or any other methods of
accessing the type in a default way, then you have just given yourself
broken code. No other function that uses it without expecting the
change will work properly.
Again, "too much rope."
You will see it doesn't work, and stop. I didn't say I wanted to
change ALL the behavior of the language, I just wanted to add features
that don't currently exist and I think should.
I, for one, rarely ever subclass the builtin types to add something
orthogonal to the original methods. I'm usually modifying the behavior
of current methods. There's no clean way for an implementation to know
this upon registry and raise an error.
* Can I get a real dict again if I wanted to?
Save it, don't clobber the original out of existence. The coder could
mess this up (more rope).
* Given your choices above, how can you implement it in such a way that
you don't interfere with other people's code by accident?
There could be module limitations, but when I imagined this it was
globally reaching. I believe that as long as you add features, or any
changes you make don't break things, then you will be safe. If you
mess with features that exist then you will break things and your
programs will not work.
This seems natural enough to me. You ARE changing a default behavior,
there are risks, but if you don't change things that other functions
use (pre-existing methods), then you are safe.
This restriction really limits the usefulness of the feature. Compare
the changes that would have to be made at the architecture level to the
drawbacks of using MyDict({}).
Okay, so the last one isn't really fair, but the answers you give on the
other questions should help define in your mind the kind of behavior you
want. Reading up on Python internals with the Language Reference, the
dis module documentation, and some source code should give you an idea
of how one might go about an implementation and (more importantly) the
compromises one would have to make.
It seems the changes would have to be to the Python source, and I don't
want to make a new Python. I don't even want to "change" Python, I
want to augment it. I think these are reasonable additions, and I'm
sure I missed the discussions on why they aren't there now.
There may be very good reasons, and I could end up retracting my
desired feature set because of them, but I still wanted to see if I
could make it work for my own code.
At the Python prompt, "import this".
Then compare these specific features with the current way. Which is
safer? Which is more readable by someone unfamiliar with the code? Which
is more flexible? Which saves the most typing?
I think my answers above are sound. If it worked the way I hoped it
might, you could change things on a global scale and you still
wouldn't break anything.
Well, Python isn't architected the same way that Ruby is. There are
fundamental problems in doing what you want.
My condolences on having to deal with such a team. It seems a little
silly to me to equate OO with Everything-Must-Be-a-Method-Call. But if
they insist, point them at [].__len__(). len([]) just calls [].__len__()
anyways. After writing code like that for a while, they'll probably get
over their method fixation.
I never had a problem with this myself, but people are different. I
have a great team, everyone has their own opinions. No need to
belittle them for differing.
Some people never get over the whitespacing. It's just the way the
people work. No need to see them as flawed. :)
No belittling was intended. I just wanted to express that I just don't
see the sense in it, and that if I had to work with such a team, I would
be tearing my hair out. I lose enough hair interfacing with FORTRAN,
thank you very much. :-)
Preferring [].len() to len([]) is one thing, claiming it's intrinsically
more OO is another.
--
Robert Kern
kern at caltech.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter