FAQ
I am new to python, so please bear with me if I am making some
conceptual error.

Basically I want to create a graph with an adjacency list
representation, but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable. I have a function createEdge
that adds an edge to the graph. The arguments will be distinct since
they are read from text files. But basically I want to use the
dictionary as a string pool, and if the argument string equals
something in the pool already, don't use the argument string, just a
use a reference to something in the string pool already.

Is this a waste of time? Because I know in python I cannot be certain
that the argument strings that are read from files are even garbage
collected anyway. I could certainly do the job with duplicate
strings, but it would seem wasteful. I am a C programmer, and old
habits die hard.

The following code works -- I tested it and entries in the adjacency
list that are equivalent are in fact identical. But it seems rather
stupid to have a dictionary of the form {'alice': 'alice', 'bob':
'bob'}, etc. i.e. the keys and values are the same. It would be nice
if I can get a reference to the key in the same time as a hash lookup.
I know I can do dictionary.keys().index( 'string' ) or something but
that would be pretty inefficient, I believe.

class GraphAccumulator:
def __init__( self, fileFunction ):
self.fileFunction = fileFunction
self.adjList = {} # adjacency list
self.nodes = {} # list of nodes for
preventing string dupes

def createEdge( self, node1, node2 ):

# another way of doing by using a dictionary

nodes = self.nodes
if nodes.has_key( node2 ):
node2 = nodes[ node2 ]
else:
nodes[ node2 ] = node2

adjList = self.adjList
if adjList.has_key( node1 ):
adjList[ node1 ].append( node2 );
else:
adjList[ node1 ] = [ node2 ]; # singleton edge list

Search Discussions

  • Ben Finney at Aug 1, 2003 at 12:26 am

    On 31 Jul 2003 17:36:41 -0700, Andy C wrote:
    Is this a waste of time? Because I know in python I cannot be certain
    that the argument strings that are read from files are even garbage
    collected anyway. I could certainly do the job with duplicate
    strings, but it would seem wasteful. I am a C programmer, and old
    habits die hard.
    Python dynamically binds names ("variables") to objects; many types,
    including immutable strings, will not be duplicated if an identical one
    is already stored.

    The upshot is: store an identical string as many times as you like,
    because PYthon will simply keep references to the one string object for
    each duplicate.

    --
    \ "Friendship is born at that moment when one person says to |
    `\ another, 'What! You too? I thought I was the only one!'" -- |
    _o__) C.S. Lewis |
    Ben Finney <http://bignose.squidly.org/>
  • Terry Reedy at Aug 1, 2003 at 12:46 am
    "Andy C" <andychup at yahoo.com> wrote in message
    news:645db655.0307311636.71923378 at posting.google.com...
    I am new to python, so please bear with me if I am making some
    conceptual error.

    Basically I want to create a graph with an adjacency list
    representation, but I don't want any of the adjacency lists to have
    duplicate strings when it is avoidable. I have a function
    createEdge
    that adds an edge to the graph. The arguments will be distinct since
    they are read from text files. But basically I want to use the
    dictionary as a string pool, and if the argument string equals
    something in the pool already, don't use the argument string, just a
    use a reference to something in the string pool already.
    Thinking in terms of 'references' can both help and hinder. (There
    have been some long recent discussion.)

    Are you familiar with this?
    help('intern')
    Help on built-in function intern:

    intern(...)
    intern(string) -> string

    ``Intern'' the given string. This enters the string in the
    (global)
    table of interned strings whose purpose is to speed up dictionary
    lookups.
    Return the string itself or the previously interned string object
    with the
    same value.

    TJR
  • Andy C at Aug 1, 2003 at 6:30 am
    Thanks, I think that is exactly what I'm looking for. I guess if you do a intern(a), then if a is equivalent but not identical to something in the
    global string table, it will assign a to reference the object in the global
    string table, and thus the old "a" value can be garbage collected.

    Well, for a C/C++ programmer, they ARE references, and I can't imagine
    thinking otherwise. They are implemented as references/pointers in the
    interpreter. Thinking in value semantics can simplify things, but there
    will always be some cases where it doesn't work.

    I think my example is a good one. If you have a graph in an adjacency list
    representation, you don't want the nodes to be copies of each other. They
    should just point to a global list of nodes. This can definitely matter...
    the worst case is when you have a complete graph, where you would have
    n(n-1)/2 (or n(n-1) for a complete directed graph) node objects where you
    only need n. Instead of 100,000 nodes (not an unreasonable size), you could
    have 10 billion (totally unreasonable).

    But it is a good question whether the garbage collection will make the
    difference worth it, in most cases. I could reassign my references, but I
    don't know when the old objects get deleted. That is, I can do a intern(a), but the old value of a could hang around in memory for who knows
    how long. If anyone could shed some light on this, I would appreciate it.


    "Terry Reedy" <tjreedy at udel.edu> wrote in message
    news:Qf-cnYqDQejEJbSiXTWJjw at comcast.com...
    "Andy C" <andychup at yahoo.com> wrote in message
    news:645db655.0307311636.71923378 at posting.google.com...
    I am new to python, so please bear with me if I am making some
    conceptual error.

    Basically I want to create a graph with an adjacency list
    representation, but I don't want any of the adjacency lists to have
    duplicate strings when it is avoidable. I have a function
    createEdge
    that adds an edge to the graph. The arguments will be distinct since
    they are read from text files. But basically I want to use the
    dictionary as a string pool, and if the argument string equals
    something in the pool already, don't use the argument string, just a
    use a reference to something in the string pool already.
    Thinking in terms of 'references' can both help and hinder. (There
    have been some long recent discussion.)

    Are you familiar with this?
    help('intern')
    Help on built-in function intern:

    intern(...)
    intern(string) -> string

    ``Intern'' the given string. This enters the string in the
    (global)
    table of interned strings whose purpose is to speed up dictionary
    lookups.
    Return the string itself or the previously interned string object
    with the
    same value.

    TJR
  • Andrew Dalke at Aug 1, 2003 at 8:40 am

    Andy C:
    Basically I want to create a graph with an adjacency list
    representation, but I don't want any of the adjacency lists to have
    duplicate strings when it is avoidable.
    Took me a while but I finally figured out what you're asking.

    Look in the docs for 'intern', which is a builtin. It's rarely used
    because most people don't worry about this sort of duplication.
    Is this a waste of time? Because I know in python I cannot be certain
    that the argument strings that are read from files are even garbage
    collected anyway. I could certainly do the job with duplicate
    strings, but it would seem wasteful. I am a C programmer, and old
    habits die hard.
    You can never be certain that anything is every garbage collected.
    Python-the-language makes no guarantees on that. Python-in-C
    does make a stronger statement, but it's an implementation issue. As
    a C programmer you'll be happy that it's a reference counted scheme
    and so long as there are no cycles (which can't happen with a string),
    the string will be deallocated when your code lets go of it.

    This is true even of interned strings, at least in newer Pythons. If
    no one references an interned string, it too goes away. (Older
    Pythons make the strings immortal.)

    Here's how you might change your code.

    class GraphAccumulator:
    def __init__( self, fileFunction ):
    self.fileFunction = fileFunction
    self.adjList = {} # adjacency list

    def createEdge( self, node1, node2 ):

    # another way of doing by using a dictionary
    node1 = intern(node1)
    node2 = intern(node2)
    adjList = self.adjList
    if adjList.has_key( node1 ):
    adjList[ node1 ].append( node2 );
    else:
    adjList[ node1 ] = [ node2 ]; # singleton edge list

    But personally I would put the intern outside of the graph
    code - either in the reader or the code between the reader
    and this one. Otherwise, consider:

    # fully connect the graph
    nodes = graph.adjList.keys()
    for n1 in nodes:
    for n2 in nodes:
    if n1 != n2:
    graph.createEdge(n1, n1)

    This does extra interning even though it isn't needed.

    But then again, normally I wouldn't worry about the interning
    at all. ... Perhaps I should... Hmmm...

    Andrew
    dalke at dalkescientific.com
  • Aahz at Aug 2, 2003 at 5:10 am
    In article <645db655.0307311636.71923378 at posting.google.com>,
    Andy C wrote:
    Basically I want to create a graph with an adjacency list
    representation, but I don't want any of the adjacency lists to have
    duplicate strings when it is avoidable. I have a function createEdge
    that adds an edge to the graph. The arguments will be distinct since
    they are read from text files. But basically I want to use the
    dictionary as a string pool, and if the argument string equals
    something in the pool already, don't use the argument string, just a
    use a reference to something in the string pool already.
    That makes some sense, but why use the string object for both key and
    value? Just do

    d[key] = True

    You could optimize your coding style (if not your speed) in Python 2.3
    by using sets. I'd recommend against using intern(), because in Python
    2.2 and earlier, interned strings *never* get garbage collected. Even
    in Python 2.3, you may end up with worse memory behavior.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    This is Python. We don't care much about theory, except where it intersects
    with useful practice. --Aahz
  • Andy C at Aug 2, 2003 at 10:05 am
    I don't see how I can do this and let me eliminate duplicates. I need to
    assign the old duplicate string to the unique string that already exists.
    Hence the question, how do I get a reference to the KEY value? I know I can
    use keys() and do a linear search, but that is much more inefficient. I
    would like to get a reference to the key value in the same time that it
    takes to do a hash lookup (constant time).

    Basically I would want to rewrite this section of the code I posted:

    nodes = self.nodes
    if nodes.has_key( node2 ):
    node2 = nodes[ node2 ]
    else:
    nodes[ node2 ] = node2

    This dictionary seems stupid, I agree. The keys and values are the same.
    But in the first part of the if, I want to reassign node2 to an equivalent
    string that already exists. How can I do that?

    The intern solution seems reasonable, and it appears that it was designed
    specifically for this problem. I wasn't aware of the implementation
    problems. But I'm still curious about different ways to do it.
    That makes some sense, but why use the string object for both key and
    value? Just do

    d[key] = True

    You could optimize your coding style (if not your speed) in Python 2.3
    by using sets. I'd recommend against using intern(), because in Python
    2.2 and earlier, interned strings *never* get garbage collected. Even
    in Python 2.3, you may end up with worse memory behavior.
    --
    Aahz (aahz at pythoncraft.com) <*>
    http://www.pythoncraft.com/
    This is Python. We don't care much about theory, except where it
    intersects
    with useful practice. --Aahz
  • Aahz at Aug 2, 2003 at 2:16 pm
    In article <w1MWa.133$Tz4.17019035 at newssvr21.news.prodigy.com>,
    Andy C wrote:
    I don't see how I can do this and let me eliminate duplicates. I need
    to assign the old duplicate string to the unique string that already
    exists. Hence the question, how do I get a reference to the KEY value?
    I know I can use keys() and do a linear search, but that is much more
    inefficient. I would like to get a reference to the key value in the
    same time that it takes to do a hash lookup (constant time).
    Ahhhh.... Right. Hmmmmm.... <thinks hard> You're correct, you do
    need to set the value to the key. I think using a dict is better than
    using intern(). Here's an optimization:

    node2 = node_cache.setdefault(node2, node2)
    The intern solution seems reasonable, and it appears that it was
    designed specifically for this problem. I wasn't aware of the
    implementation problems. But I'm still curious about different ways to
    do it.
    intern() is intended more for a small-scale optimization of
    frequently-used strings rather than as a mechanism for memory
    management.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    This is Python. We don't care much about theory, except where it intersects
    with useful practice. --Aahz
  • Bengt Richter at Aug 2, 2003 at 11:32 pm

    On 2 Aug 2003 10:16:13 -0400, aahz at pythoncraft.com (Aahz) wrote:
    In article <w1MWa.133$Tz4.17019035 at newssvr21.news.prodigy.com>,
    Andy C wrote:
    I don't see how I can do this and let me eliminate duplicates. I need
    to assign the old duplicate string to the unique string that already
    exists. Hence the question, how do I get a reference to the KEY value?
    I know I can use keys() and do a linear search, but that is much more
    inefficient. I would like to get a reference to the key value in the
    same time that it takes to do a hash lookup (constant time).
    Ahhhh.... Right. Hmmmmm.... <thinks hard> You're correct, you do
    need to set the value to the key. I think using a dict is better than
    using intern(). Here's an optimization:

    node2 = node_cache.setdefault(node2, node2)
    I would sure be more comfortable with a change in nomenclature, e.g.,

    node2name = node_name_cache.setdefault(node2name, node2name)

    so that it is clear that you are dealing with name strings, not typical graph
    vertex/node objects.

    BTW, reading from a file, you can probably not in general avoid forward references,
    (i.e., names of adjacent nodes yet to be defined), but they should be resolvable at the end,
    so you could make nodes that refer to adjacent nodes directly instead of by name.
    Nodes would carry a ref to their own (string) name, and go by that same name string in a graph dict,
    so there should be no duplication of strings, just references to them. E.g.,
    (I started to just sketch something, and ... well, it's hard to stop programming Python
    before something is running at least preliminarily ;-) I subclassed dict to keep nodes in by name.

    ====< graph.py >=========================================================
    def boxit(s):
    lines = s.splitlines()
    wid = max(map(len,lines))
    ret = ['+-%s-+' % ('-'*wid)]
    fmt = '| %%-%ds |'%wid
    for line in lines: ret.append(fmt%line)
    ret.append(ret[0])
    return '\n'.join(ret)

    def errep(Ex, *rest):
    if __debug__: print boxit('%s: %s'%(Ex.__name__, rest and rest[0])) # and blunder on
    else: raise Ex(*rest)

    class Node(object):
    __slots__ = 'name', 'adj_list'
    def __init__(self, name):
    self.name = name
    self.adj_list = None # signify undefined, vs [] for no adjacent nodes

    class Graph(dict):
    def load(self, infile, print_lines=False):
    for line in infile:
    if print_lines: print line.rstrip()
    sline = line.strip()
    if not sline or sline.startswith('#'): continue
    node_def_names = line.split() # assume line like 'nodename adj_node1_name adj_node2_name ...'
    node_name = node_def_names[0]
    if node_name in self:
    # don't allow re-definition
    errep(ValueError, 'Node %r being re-defined:\n new adj: %r\n old adj: %r' %(
    node_name, node_def_names[1:], self[node_name].adj_list))
    else:
    self[node_name] = Node(node_name)
    # set adj list, making making this a defined node
    self[node_name].adj_list = node_def_names[1:]
    # change adj lists to direct references to adjacent nodes
    for node in self.values():
    adj_list = node.adj_list
    assert adj_list is not None
    for i in xrange(len(adj_list)):
    adj_name = adj_list[i]
    try:
    adj_list[i] = self[adj_name]
    except KeyError:
    errep( ValueError, 'node %r refers to undefined adj node %r' % (node.name, adj_name))
    # continuing if debugging
    adj_list[i] = Node(adj_name) # adj_list None (vs []) signifies undefined
    def show(self):
    node_names = self.keys()
    node_names.sort()
    visited = {}
    ret = []
    def prtree(node, indent=0):
    if node.name in visited:
    if indent: ret.append('%s%s (see previous)'%(indent*' ',node.name))
    else:
    visited[node.name]=True
    if node.adj_list is None:
    ret.append('%s%s (undefined) '%(indent*' ',node.name))
    return
    ret.append('%s%s'%(indent*' ',node.name))
    for adj in node.adj_list:
    prtree(adj, indent+1)
    for node_name in node_names:
    prtree(self[node_name])
    ret.append('')
    return '\n'.join(ret)

    def test(filepath=''):
    graph = Graph() # name-string -> Node instance
    if not filepath:
    import StringIO
    f_graph = StringIO.StringIO("""
    A B C D
    B E F
    C G H
    G A
    D B
    E
    F
    H
    D E F
    X E Y
    """)
    else:
    f_graph = file(filepath)

    print 'Loading graph from %s ...\n--' % (filepath or '(test)')
    graph.load(f_graph, True) # print the lines read
    print '--'
    f_graph.close()

    print graph.show()

    if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    test(args and args[0] or '')
    =========================================================================

    Result:

    [16:33] C:\pywk\clp\pp>graph.py
    Loading graph from (test) ...
    --

    A B C D
    B E F
    C G H
    G A
    D B
    E
    F
    H
    D E F
    +----------------------------------------+
    ValueError: Node 'D' being re-defined: |
    new adj: ['E', 'F'] |
    old adj: ['B'] |
    +----------------------------------------+
    X E Y

    +-------------------------------------------------------+
    ValueError: node 'X' refers to undefined adj node 'Y' |
    +-------------------------------------------------------+
    --
    A
    B
    E
    F
    C
    G
    A (see previous)
    H
    D
    B (see previous)
    X
    E (see previous)
    Y (undefined)

    (Not tested beyond what you see. NTBWYS? ;-)

    Regards,
    Bengt Richter
  • John Machin at Aug 2, 2003 at 11:16 pm
    "Andy C" <ayc8NOSPAM at cornell.edu> wrote in message news:<w1MWa.133$Tz4.17019035 at newssvr21.news.prodigy.com>...
    I don't see how I can do this and let me eliminate duplicates. I need to
    assign the old duplicate string to the unique string that already exists.
    Hence the question, how do I get a reference to the KEY value? I know I can
    use keys() and do a linear search, but that is much more inefficient. I
    would like to get a reference to the key value in the same time that it
    takes to do a hash lookup (constant time).

    Basically I would want to rewrite this section of the code I posted:

    nodes = self.nodes
    if nodes.has_key( node2 ):
    node2 = nodes[ node2 ]
    else:
    nodes[ node2 ] = node2

    This dictionary seems stupid, I agree. The keys and values are the same.
    But in the first part of the if, I want to reassign node2 to an equivalent
    string that already exists. How can I do that?

    The intern solution seems reasonable, and it appears that it was designed
    specifically for this problem. I wasn't aware of the implementation
    problems. But I'm still curious about different ways to do it.
    Google("intern-like memory saver").
  • Andy C at Aug 2, 2003 at 11:34 pm
    Google("intern-like memory saver").
    Well, that seems very sensical, so how come it hasn't made it into the
    language? And what's wrong with intern? (Though intern only works on
    strings, not for immutable objects in general. I believe someone was asking
    a pretty much identical question here, and someone replied with the
    'memoize' pattern).

    Can this be done without C, now that you can subclass the built-in
    dictionary?

    Andy
  • Bengt Richter at Aug 3, 2003 at 1:24 am

    On Sat, 02 Aug 2003 23:34:20 GMT, "Andy C" wrote:

    Google("intern-like memory saver").
    Well, that seems very sensical, so how come it hasn't made it into the
    language? And what's wrong with intern? (Though intern only works on
    strings, not for immutable objects in general. I believe someone was asking
    a pretty much identical question here, and someone replied with the
    'memoize' pattern).

    Can this be done without C, now that you can subclass the built-in
    dictionary?
    For a subclass of dict that may be of interest, see my post in this thread
    timestamped less than 2 minutes before this post of yours ;-)

    Regards,
    Bengt Richter
  • Andy C at Aug 3, 2003 at 8:51 am
    Thanks, looks interesting... but it actually doesn't really address the
    question I was asking. It doesn't matter since I think the intern solution
    is fine for me. But if I'm not mistaken, your solution still stores the
    node names in the values of the dictionary (via the Node objects). So in
    effect you do still have a dictionary of the form { 'node1name':
    ('node1name', other node1 data), 'node2name': ('node2name', other node2
    data)). It doesn't seem as silly when you have a real node object, though.
    In my application the nodes are really just strings; I don't need anything
    else.

    Andy

    "Bengt Richter" <bokr at oz.net> wrote in message
    news:bgho7i$790$0 at 216.39.172.122...
    On Sat, 02 Aug 2003 23:34:20 GMT, "Andy C" wrote:

    Google("intern-like memory saver").
    Well, that seems very sensical, so how come it hasn't made it into the
    language? And what's wrong with intern? (Though intern only works on
    strings, not for immutable objects in general. I believe someone was
    asking
    a pretty much identical question here, and someone replied with the
    'memoize' pattern).

    Can this be done without C, now that you can subclass the built-in
    dictionary?
    For a subclass of dict that may be of interest, see my post in this thread
    timestamped less than 2 minutes before this post of yours ;-)

    Regards,
    Bengt Richter
  • Bengt Richter at Aug 4, 2003 at 8:17 am

    On Sun, 03 Aug 2003 08:51:40 GMT, "Andy C" wrote:
    Thanks, looks interesting... but it actually doesn't really address the
    question I was asking. It doesn't matter since I think the intern solution
    I thought it did ;-)
    is fine for me. But if I'm not mistaken, your solution still stores the
    node names in the values of the dictionary (via the Node objects). So in
    No, just references, not duplicate strings.
    effect you do still have a dictionary of the form { 'node1name':
    ('node1name', other node1 data), 'node2name': ('node2name', other node2
    data)). It doesn't seem as silly when you have a real node object, though.
    In my application the nodes are really just strings; I don't need anything
    else.
    Well, don't you need the adjacency lists you were talking about? I assumed they were
    all lists of out-edges associated with graph vertices. So I defined a Node class
    with the two things: a name and an adj_list. You can eliminate the name reference
    and just let the graph dict keys be the only reference, but then you will not
    so conveniently have the name available to algorithms that process the nodes.
    E.g., for some node myNode you might have to search the dict something like

    for key,value in graph.items(): # untested!
    if value is myNode: myName=key; break
    else:
    myName = "?? can't find name of %r ??'%myNode

    One caveat though, if your graph has cycles, changing the adjacency lists to direct node
    references instead of names will create reference cycles that may have to be broken for
    the gc. I have to look into that. You can comment that fixup loop out if you want to
    keep adjacency lists as name string references instead of node references.

    BTW, have you looked at the sets module of 2.3? That would let you have a set of strings
    with no associated values, and no duplicates (but how do you represent adjacency lists then?)

    Anyway, unless I goofed, the { 'node1name':('node1name', other node 1 data) } analogy is
    not using two separate strings. It's more like the result of (untested!):

    s = 'node1name'
    graph.update({ s:(s, other node 1 data)}) # pseudo-tuple-element at graph[s][1] ;-)
    del s

    To check, I added the assert below, where it scans the node names in the
    graph.show() method, and it didn't complain:

    def show(self):
    node_names = self.keys()
    8< snip >8
    for node_name in node_names:
    assert node_name is self[node_name].name
    prtree(self[node_name])
    ret.append('')
    return '\n'.join(ret)

    Regards,
    Bengt Richter
  • Andy C at Aug 5, 2003 at 3:31 am

    is fine for me. But if I'm not mistaken, your solution still stores the
    node names in the values of the dictionary (via the Node objects). So in
    No, just references, not duplicate strings.
    OK, you're right -- I assumed that the key and value were duplicated, but
    that's not the case. I don't know why I thought that, maybe since the key
    must be immutable and the value is mutable. So I guess having a dictionary
    of the form { 'A': 'A', 'B': 'B' } is not as stupid as it first seems, since
    only a reference is stored. But still I wonder why the language doesn't
    have a facility for getting a reference to the key value in constant time.
    Apparently someone did it by modifying the C source for the dictionary to
    add a ref_key() accessor. It seems like it would be useful quite often.
    One caveat though, if your graph has cycles, changing the adjacency lists
    to direct node
    references instead of names will create reference cycles that may have to
    be broken for
    the gc. I have to look into that. You can comment that fixup loop out if
    you want to
    keep adjacency lists as name string references instead of node references.
    Good point... I thought though that the new python (2.3?) will GC data with
    cycles.
    BTW, have you looked at the sets module of 2.3? That would let you have a
    set of strings
    with no associated values, and no duplicates (but how do you represent
    adjacency lists then?)

    Actually, I did run across the sets type recently. It is actually built *on
    top of* dict, so it's no better. Ex:
    a = Set([1,2,3])
    a
    Set([1, 2, 3])
    a._data
    {1: True, 2: True, 3: True}

    It just stores a dummy value as the value of the dict.

    BTW I am using a regular dictionary for the adjacency list, with the list of
    destination edges. I was using another dictionary for the 'cache' of names.

    Thanks for the help.

    Andy
  • Aahz at Aug 5, 2003 at 3:05 pm
    In article <jyFXa.342$TC.22669471 at newssvr13.news.prodigy.com>,
    Andy C wrote:
    OK, you're right -- I assumed that the key and value were duplicated,
    but that's not the case. I don't know why I thought that, maybe since
    the key must be immutable and the value is mutable. So I guess having
    a dictionary of the form { 'A': 'A', 'B': 'B' } is not as stupid as it
    first seems, since only a reference is stored. But still I wonder why
    the language doesn't have a facility for getting a reference to the key
    value in constant time. Apparently someone did it by modifying the C
    source for the dictionary to add a ref_key() accessor. It seems like
    it would be useful quite often.
    Well, you're the first person I recall caring about this specific issue.
    Of course, general caching issues come up quite frequently. All the
    people I've seen wanting to use intern() come at it from a performance
    rather than memory perspective, for which a dict would be no use. ;-)
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    This is Python. We don't care much about theory, except where it intersects
    with useful practice. --Aahz
  • John Machin at Aug 6, 2003 at 12:33 am
    aahz at pythoncraft.com (Aahz) wrote in message news:<bgoh3r$knm$1 at panix1.panix.com>...
    In article <jyFXa.342$TC.22669471 at newssvr13.news.prodigy.com>,
    Andy C wrote:
    OK, you're right -- I assumed that the key and value were duplicated,
    but that's not the case. I don't know why I thought that, maybe since
    the key must be immutable and the value is mutable. So I guess having
    a dictionary of the form { 'A': 'A', 'B': 'B' } is not as stupid as it
    first seems, since only a reference is stored. But still I wonder why
    the language doesn't have a facility for getting a reference to the key
    value in constant time. Apparently someone did it by modifying the C
    source for the dictionary to add a ref_key() accessor. It seems like
    it would be useful quite often.
    Well, you're the first person I recall caring about this specific issue.
    Of course, general caching issues come up quite frequently. All the
    people I've seen wanting to use intern() come at it from a performance
    rather than memory perspective, for which a dict would be no use. ;-)
    Human recall can be defective, or input-deficient. I say again:

    Google("intern-like memory saver") (in this news group).

    As I said in that thread, when you run out of physical memory and you
    start to access your swapfile, you have a performance problem.

    As far as a dict being of no use, in my case I *already* have a dict,
    which is used for frequency counting, and the algorithm being used
    wants to maintain all of its data in memory as well. Thus being able
    to get a reference to the frequency dict key and store that in the
    data structure is a major win.

    These days I have an extension which merely subclasses dict to add on
    a key reference method and an increment method which does something
    like:

    def increment(self, key):
    if key in self:
    self[key] += 1
    else:
    self[key] = 1

    Regards,
    "someone"
  • Andy C at Aug 6, 2003 at 5:53 am

    These days I have an extension which merely subclasses dict to add on
    a key reference method and an increment method which does something
    like:
    You did this in C or python? If in python, how did you do it? (without
    keys()?) If in C, why don't you submit it to the source tree?

    thanks,
    Andy
  • John Machin at Aug 6, 2003 at 12:12 pm
    "Andy C" <ayc8NOSPAM at cornell.edu> wrote in message news:<IJ0Ya.969$Z5.84190448 at newssvr21.news.prodigy.com>...
    These days I have an extension which merely subclasses dict to add on
    a key reference method and an increment method which does something
    like:
    You did this in C or python? If in python, how did you do it? (without
    keys()?) If in C, why don't you submit it to the source tree?
    C. There were two people interested in having it in the core, me and Pat Malone.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedAug 1, '03 at 12:26a
activeAug 6, '03 at 12:12p
posts19
users8
websitepython.org

People

Translate

site design / logo © 2022 Grokbase