FAQ
Folks:

Every couple of years I run into a problem where some Python code that
worked well at small scales starts burning up my CPU at larger scales,
and the underlying issue turns out to be the idiom of accumulating
data by string concatenation. It just happened again
(http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
to make the data accumulator efficient without introducing a bunch of
bugs into the surrounding code. So this time around I decided to try
encapsulating my preferred more efficient idiom into a reusable class.

So I present to you StringChain, which is an efficient way to
accumulate and process data in many chunks:

http://tahoe-lafs.org/trac/stringchain

Here are some benchmarks generated by running python -OOu -c 'from
stringchain.bench import bench; bench.quick_bench()' as instructed by
the README.txt file.

The N: in the left-hand column is how many bytes were in the test
dataset. The ave rate: number in the right-hand column is how many
bytes per second were processed. "naive" means the string-based idiom
sketched above and "strch" means using the StringChain class.

_buildup init_naive
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 890, ave
rate: 58350579
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 265, ave
rate: 34800398
N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01,
2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 79, ave
rate: 20745346
N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05,
2th-worst: 0.05, worst: 0.05 (of 5), reps/s: 20, ave
rate: 10823850
_buildup init_strch
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 25543, ave
rate: 1674043282
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 14179, ave
rate: 1858538925
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 8016, ave
rate: 2101513050
N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 4108, ave
rate: 2154215572
_consume init_naive_loaded
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 931, ave
rate: 61037862
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 270, ave
rate: 35454393
N: 262144, time: best: 0.01, 2th-best: 0.01, ave: 0.01,
2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 74, ave
rate: 19471963
N: 524288, time: best: 0.05, 2th-best: 0.05, ave: 0.05,
2th-worst: 0.05, worst: 0.06 (of 5), reps/s: 19, ave
rate: 10146747
_consume init_strch_loaded
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 4309, ave
rate: 282447500
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 2313, ave
rate: 303263357
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 1186, ave
rate: 311159052
N: 524288, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 606, ave
rate: 317814669
_randomy init_naive
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 479, ave
rate: 31450561
N: 131072, time: best: 0.01, 2th-best: 0.01, ave: 0.01,
2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 140, ave
rate: 18461191
N: 262144, time: best: 0.02, 2th-best: 0.02, ave: 0.02,
2th-worst: 0.03, worst: 0.03 (of 5), reps/s: 42, ave
rate: 11127714
N: 524288, time: best: 0.06, 2th-best: 0.07, ave: 0.08,
2th-worst: 0.08, worst: 0.09 (of 5), reps/s: 13, ave
rate: 6906341
_randomy init_strch
N: 65536, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 973, ave
rate: 63827127
N: 131072, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 495, ave
rate: 64970669
N: 262144, time: best: 0.00, 2th-best: 0.00, ave: 0.00,
2th-worst: 0.00, worst: 0.00 (of 5), reps/s: 239, ave
rate: 62913360
N: 524288, time: best: 0.01, 2th-best: 0.01, ave: 0.01,
2th-worst: 0.01, worst: 0.01 (of 5), reps/s: 121, ave
rate: 63811569

The naive approach is slower than the StringChain class, and the
bigger the dataset the slower it goes. The StringChain class is fast
and also it is scalable (with regard to these benchmarks at least...).

Thanks!

Regards,

Zooko

Search Discussions

  • Paul Rubin at Mar 12, 2010 at 7:20 am

    "Zooko O'Whielacronx" <zookog at gmail.com> writes:
    Every couple of years I run into a problem where some Python code that
    worked well at small scales starts burning up my CPU at larger scales,
    and the underlying issue turns out to be the idiom of accumulating
    data by string concatenation.
    I usually use StringIO or cStringIO for that (python 2.x syntax):

    buf = cStringIO()
    buf.write("first thing")
    buf.write("second thing")
    result = buf.getvalue()

    Sometimes I like to use a generator instead:

    def stuff():
    yield "first thing"
    yield "second thing"

    result = ''.join(stuff())
  • Steven D'Aprano at Mar 12, 2010 at 8:52 am

    On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:

    Folks:

    Every couple of years I run into a problem where some Python code that
    worked well at small scales starts burning up my CPU at larger scales,
    and the underlying issue turns out to be the idiom of accumulating data
    by string concatenation.
    I don't mean to discourage you, but the simple way to avoid that is not
    to accumulate data by string concatenation.

    The usual Python idiom is to append substrings to a list, then once, at
    the very end, combine into a single string:


    accumulator = []
    for item in sequence:
    accumulator.append(process(item))
    string = ''.join(accumulator)

    It just happened again
    (http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
    to make the data accumulator efficient without introducing a bunch of
    bugs into the surrounding code.
    I'm sorry, I don't agree about that at all. I've never come across a
    situation where I wanted to use string concatenation and couldn't easily
    modify it to use the list idiom above.

    [...]
    Here are some benchmarks generated by running python -OOu -c 'from
    stringchain.bench import bench; bench.quick_bench()' as instructed by
    the README.txt file.
    To be taken seriously, I think you need to compare stringchain to the
    list idiom. If your benchmarks favourably compare to that, then it might
    be worthwhile.



    --
    Steven
  • MRAB at Mar 12, 2010 at 1:40 pm

    Steven D'Aprano wrote:
    On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:

    Folks:

    Every couple of years I run into a problem where some Python code that
    worked well at small scales starts burning up my CPU at larger scales,
    and the underlying issue turns out to be the idiom of accumulating data
    by string concatenation.
    I don't mean to discourage you, but the simple way to avoid that is not
    to accumulate data by string concatenation.

    The usual Python idiom is to append substrings to a list, then once, at
    the very end, combine into a single string:


    accumulator = []
    for item in sequence:
    accumulator.append(process(item))
    string = ''.join(accumulator)

    It just happened again
    (http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
    to make the data accumulator efficient without introducing a bunch of
    bugs into the surrounding code.
    I'm sorry, I don't agree about that at all. I've never come across a
    situation where I wanted to use string concatenation and couldn't easily
    modify it to use the list idiom above.

    [...]
    Here are some benchmarks generated by running python -OOu -c 'from
    stringchain.bench import bench; bench.quick_bench()' as instructed by
    the README.txt file.
    To be taken seriously, I think you need to compare stringchain to the
    list idiom. If your benchmarks favourably compare to that, then it might
    be worthwhile.
    IIRC, someone did some work on making concatenation faster by delaying
    it until a certain threshold had been reached (in the string class
    implementation).
  • Steven D'Aprano at Mar 13, 2010 at 3:06 am

    On Fri, 12 Mar 2010 13:40:23 +0000, MRAB wrote:

    To be taken seriously, I think you need to compare stringchain to the
    list idiom. If your benchmarks favourably compare to that, then it
    might be worthwhile.
    IIRC, someone did some work on making concatenation faster by delaying
    it until a certain threshold had been reached (in the string class
    implementation).
    I believe you're talking about this patch:

    http://bugs.python.org/issue1569040

    It's never been formally rejected, but the chances of it being accepted
    are pretty low.


    However, in Python 2.4 another optimization was added that makes string
    concatenation of the form:

    a = a + b
    a += b

    much faster. This is implementation specific (Jython and IronPython don't
    have it, and almost certainly won't) and it doesn't work for (e.g.):

    a = b + a

    See here:

    http://mail.python.org/pipermail/python-dev/2004-August/046686.html
    http://bugs.python.org/issue980695



    --
    Steven
  • Zooko O'Whielacronx at Mar 22, 2010 at 5:09 am
    Folks:

    I failed to make something sufficiently clear in my original message
    about StringChain. The use case that I am talking about is not simply
    that you need to accumulate a sequence of incoming chunks of data,
    concatenate them together, and then process the entire result. If that
    is all you need to do then indeed you can accumulate the incoming
    strings in a list and then run ''.join(thelist) when you have received
    all of them and are ready to process them.

    But the use case that I am talking about is where you need to
    accumulate new incoming strings into your buffer while alternately
    processing leading prefixes of the buffer. The typical example is that
    sequences of bytes are arriving on your TCP socket and you are
    processing the stream incrementally. You need to process the leading
    prefixes of the stream as soon as you can (e.g. as soon as a line
    followed by a newline has accumulated, or as soon as another complete
    element in your data format has accumulated, etc.); you can't simply
    wait until the bytes stop coming and then concatenate the entire
    collection together at once.

    This is exactly the current situation in foolscap [1] which is causing
    a performance problem in Tahoe-LAFS [2].

    To illustrate what I mean I cooked up some benchmarks showing the task
    of "accumulate a bunch of things then consume them in a single gulp"
    versus the task of "alternate between accumulating and consuming"
    (with accumulation going faster than consumption until the input
    stream runs dry).

    I implemented a few data structures for benchmarking: StringChain, the
    naive "accumulatorstr += newstr" approach (named "Stringy" in the
    benchmarks), one based on cStringIO (named "StringIOy"), one based on
    accumulating the new strings into a list and then doing
    ''.join(accumulatorlist) whenever you need to consume a leading prefix
    (called "SimplerStringChain").

    Below are the abbreviated results of the benchmark. You can easily run
    this benchmark yourself by following the README.txt in the StringChain
    source package [3]. These results show that for the load imposed by
    this benchmark StringChain is the only one of the four that scales and
    that it is also nice and fast.

    The left-hand column is the total number of bytes that were run
    through it. The results are shown in nanoseconds per byte in
    exponential notation. ("e+01" means times 10, "e+02" means times 100,
    and "e+03" means times 1000, etc.) Each result is the best of 10
    tries, or of 5 tries for the tasks which were taking too long to run
    it 10 times.

    Regards,

    Zooko

    [1] http://foolscap.lothar.com/trac/ticket/149
    [2] http://allmydata.org/pipermail/tahoe-dev/2010-March/004181.html
    [3] http://tahoe-lafs.org/trac/stringchain

    impl: StringChain
    task: _accumulate_then_one_gulp
    10000 best: 2.694e+00
    50000 best: 2.742e+00
    100000 best: 2.310e+00
    500000 best: 2.040e+00
    1000000 best: 1.988e+00
    5000000 best: 2.193e+00

    task: _alternate_str
    10000 best: 6.509e+00
    50000 best: 4.559e+00
    100000 best: 4.308e+00
    500000 best: 4.070e+00
    1000000 best: 3.991e+00
    5000000 best: 4.000e+00

    impl: SimplerStringChain
    task: _accumulate_then_one_gulp
    10000 best: 1.407e+00
    50000 best: 2.317e+00
    100000 best: 2.012e+00
    500000 best: 1.902e+00
    1000000 best: 1.897e+00
    5000000 best: 2.104e+00

    task: _alternate_str
    10000 best: 4.888e+00
    50000 best: 5.198e+00
    100000 best: 1.750e+01
    500000 best: 6.233e+01
    1000000 best: 1.134e+02
    5000000 best: 7.599e+02

    impl: StringIOy
    task: _accumulate_then_one_gulp
    10000 best: 4.196e+00
    50000 best: 5.522e+00
    100000 best: 4.499e+00
    500000 best: 3.756e+00
    1000000 best: 4.176e+00
    5000000 best: 5.414e+00

    task: _alternate_str
    10000 best: 5.484e+00
    50000 best: 7.863e+00
    100000 best: 2.126e+01
    500000 best: 6.972e+01
    1000000 best: 1.219e+02
    5000000 best: 9.463e+02

    task: _accumulate_then_one_gulp
    10000 best: 1.502e+00
    50000 best: 1.420e+01
    100000 best: 2.245e+01
    500000 best: 8.577e+01
    1000000 best: 2.295e+02
    5000000 best: 1.326e+03

    task: _alternate_str
    10000 best: 3.290e+00
    50000 best: 4.220e+00
    100000 best: 1.665e+01
    500000 best: 6.281e+01
    1000000 best: 1.127e+02
    5000000 best: 7.626e+02
  • Zooko O'Whielacronx at Mar 22, 2010 at 9:19 pm
    My apologies; I left out the heading on the last of the four
    structures in the benchmark results. Here are those results again with
    the missing heading (Stringy) inserted:

    Regards,

    Zooko
    - Hide quoted text -
    On Sun, Mar 21, 2010 at 11:09 PM, Zooko O'Whielacronx wrote:

    impl: StringChain
    task: _accumulate_then_one_gulp
    10000 best: 2.694e+00
    50000 best: 2.742e+00
    100000 best: 2.310e+00
    500000 best: 2.040e+00
    1000000 best: 1.988e+00
    5000000 best: 2.193e+00

    task: _alternate_str
    10000 best: 6.509e+00
    50000 best: 4.559e+00
    100000 best: 4.308e+00
    500000 best: 4.070e+00
    1000000 best: 3.991e+00
    5000000 best: 4.000e+00

    impl: SimplerStringChain
    task: _accumulate_then_one_gulp
    10000 best: 1.407e+00
    50000 best: 2.317e+00
    100000 best: 2.012e+00
    500000 best: 1.902e+00
    1000000 best: 1.897e+00
    5000000 best: 2.104e+00

    task: _alternate_str
    10000 best: 4.888e+00
    50000 best: 5.198e+00
    100000 best: 1.750e+01
    500000 best: 6.233e+01
    1000000 best: 1.134e+02
    5000000 best: 7.599e+02

    impl: StringIOy
    task: _accumulate_then_one_gulp
    10000 best: 4.196e+00
    50000 best: 5.522e+00
    100000 best: 4.499e+00
    500000 best: 3.756e+00
    1000000 best: 4.176e+00
    5000000 best: 5.414e+00

    task: _alternate_str
    10000 best: 5.484e+00
    50000 best: 7.863e+00
    100000 best: 2.126e+01
    500000 best: 6.972e+01
    1000000 best: 1.219e+02
    5000000 best: 9.463e+02
    impl: Stringy
    - Hide quoted text -
    task: _accumulate_then_one_gulp
    10000 best: 1.502e+00
    50000 best: 1.420e+01
    100000 best: 2.245e+01
    500000 best: 8.577e+01
    1000000 best: 2.295e+02
    5000000 best: 1.326e+03

    task: _alternate_str
    10000 best: 3.290e+00
    50000 best: 4.220e+00
    100000 best: 1.665e+01
    500000 best: 6.281e+01
    1000000 best: 1.127e+02
    5000000 best: 7.626e+02
  • Steven D'Aprano at Mar 22, 2010 at 8:07 am

    On Sun, 21 Mar 2010 23:09:46 -0600, Zooko O'Whielacronx wrote:

    But the use case that I am talking about is where you need to accumulate
    new incoming strings into your buffer while alternately processing
    leading prefixes of the buffer. [...]
    Below are the abbreviated results of the benchmark. You can easily run
    this benchmark yourself by following the README.txt in the StringChain
    source package [3]. These results show that for the load imposed by this
    benchmark StringChain is the only one of the four that scales and that
    it is also nice and fast.
    I was reading this email, and thinking "Why do you need this StringChain
    data structure, from the description it sounds like a job for
    collections.deque?"

    And funnily enough, following the links you supplied I came across this:

    "You can get the package from http://pypi.python.org/pypi/stringchain or
    with darcs get http://tahoe-lafs.org/source/stringchain/trunk. It has
    unit tests. It is in pure Python (it uses collections.deque and string)."

    http://tahoe-lafs.org/trac/stringchain


    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based on
    a reliable, component like deque.

    In any case, it looks to me that the StringChain data structure itself is
    a little too application specific for me to be personally interested in
    it. Maybe you should consider linking to it on PyPI and seeing if there
    is any interest from others?



    Regards,





    Steven
  • Zooko O'Whielacronx at Mar 22, 2010 at 9:21 pm

    On Mon, Mar 22, 2010 at 2:07 AM, Steven D'Aprano wrote:

    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based on
    a reliable, component like deque.
    The fact that StringChain uses deque to hold the queue of strings
    isn't that important. I just benchmarked it by swapping in the deque
    for a list and using the list costs about one third of a nanosecond
    per byte at the scales that the benchmark exercises (namely 10,000,000
    bytes in about 10,000 strings). A third of a nanosecond per byte is
    about 4% of the runtime.

    I also implemented and benchmarked a simpler deque-based scheme which
    puts all the actual bytes from the strings into a deque with
    self.d.extend(newstr). As you would expect, this shows good asymptotic
    performance but the constants are relatively bad so that at all of the
    actual loads measured it was three orders of magnitude worse than
    StringChain and than String-Chain-with-a-list-instead-of-a-deque.
    Moral: the constants matter!

    Those benchmarks are appended. You can run the benchmarks yourself per
    the README.txt.

    But anyway, I take your point and I updated the StringChain README to
    explain that it is a pretty simple data structure that holds a list
    (actually a deque) of strings and isn't anything too clever or novel.

    By the way, we could further micro-optimize this kind of thing if
    ''.join() would accept either strings or buffers instead of requiring
    strings:
    ''.join([buffer("abc"), "def"])
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: sequence item 0: expected string, buffer found

    Then whenever StringChain needs to slice a string into leading and
    trailing parts, it could construct a buffer() viewing each part
    instead of making a copy of each part.

    it. Maybe you should consider linking to it on PyPI and seeing if there
    is any interest from others?
    http://pypi.python.org/pypi/stringchain

    Regards,

    Zooko

    impl: StringChain
    task: _accumulate_then_one_gulp
    10000 best: 5.698e+00, 3th-best: 7.486e+00, mean: 7.758e+00,
    100000 best: 4.640e+00, 3th-best: 4.690e+00, mean: 7.674e+00,
    1000000 best: 3.789e+00, 3th-best: 3.806e+00, mean: 3.995e+00,
    10000000 best: 4.095e+00, 1th-best: 4.095e+00, mean: 4.095e+00,
    task: _alternate_str
    10000 best: 1.378e+01, 3th-best: 1.390e+01, mean: 1.500e+01,
    100000 best: 9.198e+00, 3th-best: 9.248e+00, mean: 9.385e+00,
    1000000 best: 8.715e+00, 3th-best: 8.755e+00, mean: 8.808e+00,
    10000000 best: 8.738e+00, 1th-best: 8.738e+00, mean: 8.738e+00,
    impl: StringChainWithList
    task: _accumulate_then_one_gulp
    10000 best: 3.600e+00, 3th-best: 3.695e+00, mean: 4.129e+00,
    100000 best: 4.070e+00, 3th-best: 4.079e+00, mean: 4.162e+00,
    1000000 best: 3.662e+00, 3th-best: 3.688e+00, mean: 3.721e+00,
    10000000 best: 3.902e+00, 1th-best: 3.902e+00, mean: 3.902e+00,
    1th-worst: 3.902e+00, worst: 3.902e+00 (of 1)
    task: _alternate_str
    10000 best: 1.369e+01, 3th-best: 1.380e+01, mean: 1.442e+01,
    100000 best: 9.251e+00, 3th-best: 9.289e+00, mean: 9.416e+00,
    1000000 best: 8.809e+00, 3th-best: 8.876e+00, mean: 8.943e+00,
    10000000 best: 9.095e+00, 1th-best: 9.095e+00, mean: 9.095e+00,
    impl: Dequey
    task: _accumulate_then_one_gulp
    10000 best: 2.772e+02, 3th-best: 2.785e+02, mean: 2.911e+02,
    100000 best: 2.314e+02, 3th-best: 2.334e+02, mean: 2.422e+02,
    1000000 best: 2.282e+02, 3th-best: 2.288e+02, mean: 2.370e+02,
    10000000 best: 2.587e+02, 1th-best: 2.587e+02, mean: 2.587e+02,
    task: _alternate_str
    10000 best: 1.576e+03, 3th-best: 1.580e+03, mean: 1.608e+03,
    100000 best: 1.301e+03, 3th-best: 1.303e+03, mean: 1.306e+03,
    1000000 best: 1.275e+03, 3th-best: 1.276e+03, mean: 1.278e+03,
    10000000 best: 1.280e+03, 1th-best: 1.280e+03, mean: 1.280e+03,
  • Lie Ryan at Mar 27, 2010 at 7:48 pm

    On 03/22/2010 07:07 PM, Steven D'Aprano wrote:
    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based on
    a reliable, component like deque.
    Well, technically StringChain is not a data structure in the first
    place. StringChain is a string; a string that is implemented using deque
    data structure to make appending algorithmically efficient. It is not a
    data structure, in the sense that I can't put arbitrary "thing" into the
    data structure. In the same sense, "string" is also not a data structure
    as "string" is an implementation of stream using "array" data structure;
    "StringChain" is an implementation of stream using "deque" data structure.
  • Steven D'Aprano at Mar 28, 2010 at 2:59 pm

    On Sun, 28 Mar 2010 06:48:21 +1100, Lie Ryan wrote:
    On 03/22/2010 07:07 PM, Steven D'Aprano wrote:
    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based
    on a reliable, component like deque.
    Well, technically StringChain is not a data structure in the first
    place. StringChain is a string;
    And strings are data structures, as are arrays and structs. Just because
    they're simple data structures made directly from primitives rather than
    rich, complex structures, doesn't mean they're not data structures.

    a string that is implemented using deque
    data structure to make appending algorithmically efficient. It is not a
    data structure, in the sense that I can't put arbitrary "thing" into the
    data structure.
    Any "thing" that can be pickled or serialised can be put into a string.



    --
    Steven
  • Lie Ryan at Mar 30, 2010 at 11:23 am

    On 03/29/2010 01:59 AM, Steven D'Aprano wrote:
    On Sun, 28 Mar 2010 06:48:21 +1100, Lie Ryan wrote:
    On 03/22/2010 07:07 PM, Steven D'Aprano wrote:
    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based
    on a reliable, component like deque.
    Well, technically StringChain is not a data structure in the first
    place. StringChain is a string;
    And strings are data structures, as are arrays and structs. Just because
    they're simple data structures made directly from primitives rather than
    rich, complex structures, doesn't mean they're not data structures.
    Array is a way to structure data and thus a data structure; array is the
    concept of structuring data using contiguous memory with elements
    addressed by an index. string is just a contiguous memory reserved to
    store data, or in other words: string is an array. This is what I meant
    when I said string is not itself a data structure.
    a string that is implemented using deque
    data structure to make appending algorithmically efficient. It is not a
    data structure, in the sense that I can't put arbitrary "thing" into the
    data structure.
    Any "thing" that can be pickled or serialised can be put into a string.
    Fair enough, you're right to think so but IMHO I disagree. Streams (or
    perhaps 'blob' to avoid the connotation of FIFO) are the data structure
    which you can put anything pickleable/serialisable into, but string (the
    implementation of stream/blob using array) are just a special case of
    array data structure and not a different, separate data structure than
    array.

    Perhaps I should not make such a bold claim as saying string is not data
    structure; what I have in mind is that string is just a special case of
    array and not distinctly separate data structure than array.
  • Steve Holden at Mar 30, 2010 at 1:46 pm

    Lie Ryan wrote:
    On 03/29/2010 01:59 AM, Steven D'Aprano wrote:
    On Sun, 28 Mar 2010 06:48:21 +1100, Lie Ryan wrote:
    On 03/22/2010 07:07 PM, Steven D'Aprano wrote:
    Perhaps you should have said that it was a wrapper around deque giving
    richer functionality, rather than giving the impression that it was a
    brand new data structure invented by you. People are naturally going to
    be more skeptical about a newly invented data structure than one based
    on a reliable, component like deque.
    Well, technically StringChain is not a data structure in the first
    place. StringChain is a string;
    And strings are data structures, as are arrays and structs. Just because
    they're simple data structures made directly from primitives rather than
    rich, complex structures, doesn't mean they're not data structures.
    Array is a way to structure data and thus a data structure; array is the
    concept of structuring data using contiguous memory with elements
    addressed by an index. string is just a contiguous memory reserved to
    store data, or in other words: string is an array. This is what I meant
    when I said string is not itself a data structure.
    a string that is implemented using deque
    data structure to make appending algorithmically efficient. It is not a
    data structure, in the sense that I can't put arbitrary "thing" into the
    data structure.
    Any "thing" that can be pickled or serialised can be put into a string.
    Fair enough, you're right to think so but IMHO I disagree. Streams (or
    perhaps 'blob' to avoid the connotation of FIFO) are the data structure
    which you can put anything pickleable/serialisable into, but string (the
    implementation of stream/blob using array) are just a special case of
    array data structure and not a different, separate data structure than
    array.

    Perhaps I should not make such a bold claim as saying string is not data
    structure; what I have in mind is that string is just a special case of
    array and not distinctly separate data structure than array.
    While this may be true conceptually it's certainly not true in Python
    (and in Python you need to be careful to distinguish between lists and
    arrays, since it does have both types).
    From the point of view of the language, strings are primitives. But
    every Python implementation uses a data structure to store them, and the
    structure is definitely not the same as is used to store lists, or
    arrays (which in Python are container types, whereas the string isn't
    because the individual indexable elements are immutable).

    As always, it's better to seek common ground than pick the nits: I don't
    think that there's really that much difference between your approach and
    Steven's, but he's a well-known nit-picker (and we are *sometimes*
    grateful for it).

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedMar 12, '10 at 7:11a
activeMar 30, '10 at 1:46p
posts13
users7
websitepython.org

People

Translate

site design / logo © 2022 Grokbase