Hi, all.

I'm a member of the Japanese translation project for the Python documentation.
We finished translating the Python 2.5 documentation last year and are now
working on the Python 2.6 documentation.

Building the documentation feels a little slow to me, so I tried tuning
docutils and Sphinx.

The attached patches make the documentation build about 30% faster.
(In my environment, roughly 330 sec -> 220 sec.)

I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
Could anyone review these patches?

The patches make the following changes:

1. Use PyStemmer instead of PorterStemmer.
PorterStemmer is implemented in Python and takes about 50 seconds
during the build.
PyStemmer <http://pypi.python.org/pypi/PyStemmer/1.0.1> is implemented in C
and takes only 7 seconds.

However, the searchindex.js generated with PyStemmer differs from the one
generated with PorterStemmer.

2. Avoid building OptionParser many times.
Sphinx calls docutils.core.publish_parts() without the `settings` argument
many times.
This builds docutils.frontend.OptionParser many times, which takes
29 seconds.

3. Avoid building NestedStateMachine many times.
NestedStateMachine is built and destroyed many times.
Recycling that state machine gives a significant performance gain.
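
The idea behind item 2 can be sketched without docutils itself. This is only an illustration of the pattern, not the actual patch: `build_settings`, `render_part`, and `render_all` are hypothetical stand-ins for the expensive option-parser construction and for a `publish_parts()`-like call that accepts a prebuilt `settings` object.

```python
import copy

# Counts how often the expensive setup runs (illustration only).
calls = {"build": 0}


def build_settings():
    """Hypothetical stand-in for building an option parser and its defaults."""
    calls["build"] += 1
    return {"tab_width": 8}


def render_part(source, settings=None):
    """Hypothetical stand-in for a publish_parts()-like call.

    Without a settings object, a fresh one is built on every call,
    which is the slow path described above.
    """
    if settings is None:
        settings = build_settings()
    return source.expandtabs(settings["tab_width"])


def render_all(sources):
    """Build the settings once and pass a copy to each call."""
    shared = build_settings()
    return [render_part(s, settings=copy.copy(shared)) for s in sources]
```

Each call gets a copy here on the assumption that a call may modify its settings; whether that is needed depends on the real library.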

== before ==
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 25720/459    0.997    0.000  134.085    0.292  tools/docutils/statemachine.py:178(run)
92281/1513    1.420    0.000  133.935    0.089  tools/docutils/statemachine.py:384(check_line)
     25720    0.184    0.000   89.628    0.003  tools/docutils/statemachine.py:129(__init__)
     25720    0.632    0.000   89.444    0.003  tools/docutils/statemachine.py:448(add_states)
    385800    1.665    0.000   88.813    0.000  tools/docutils/statemachine.py:436(add_state)
    385800    2.356    0.000   85.287    0.000  tools/docutils/statemachine.py:928(__init__)
    385800    1.793    0.000   82.931    0.000  tools/docutils/statemachine.py:566(__init__)

== after ==
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 25720/459    1.051    0.000   68.175    0.149  tools/docutils/statemachine.py:178(run)
92281/1513    1.405    0.000   68.024    0.045  tools/docutils/statemachine.py:384(check_line)
      6862    0.031    0.000   24.241    0.004  tools/docutils/statemachine.py:129(__init__)
      6862    0.174    0.000   24.210    0.004  tools/docutils/statemachine.py:448(add_states)
    102930    0.430    0.000   24.036    0.000  tools/docutils/statemachine.py:436(add_state)
    102930    0.633    0.000   23.162    0.000  tools/docutils/statemachine.py:928(__init__)
    102930    0.549    0.000   22.529    0.000  tools/docutils/statemachine.py:566(__init__)
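
In the profile, calls to `statemachine.py:129(__init__)` drop from 25720 to 6862, and each construction is followed by 15 `add_state` calls (385800 / 25720). A minimal sketch of the recycling idea follows. It is not the contents of docutils.patch: `State`, `NestedMachine`, `MachinePool`, and `parse_blocks` are hypothetical stand-ins, and a pool is only one possible way to reuse instances.

```python
class State:
    """Hypothetical stand-in for a parser state."""

    def __init__(self, name):
        self.name = name


class NestedMachine:
    """Hypothetical stand-in for a nested state machine with 15 states."""

    created = 0  # number of instances constructed so far

    def __init__(self):
        NestedMachine.created += 1
        self.states = [State("state%d" % i) for i in range(15)]
        self.input_lines = None

    def run(self, lines):
        self.input_lines = lines
        return len(lines)

    def reset(self):
        # Drop per-run data so the instance can be reused safely.
        self.input_lines = None


class MachinePool:
    """Hand out idle machines instead of constructing one per use."""

    def __init__(self):
        self._idle = []

    def acquire(self):
        return self._idle.pop() if self._idle else NestedMachine()

    def release(self, machine):
        machine.reset()
        self._idle.append(machine)


def parse_blocks(blocks, pool):
    """Run one machine per block and return it to the pool afterwards."""
    results = []
    for lines in blocks:
        machine = pool.acquire()
        try:
            results.append(machine.run(lines))
        finally:
            pool.release(machine)
    return results
```

With this sketch, parsing several blocks in sequence constructs a single machine; new instances are created only when all pooled ones are in use at the same time.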
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sphinx.patch
Type: application/octet-stream
Size: 3930 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/doc-sig/attachments/20090404/f90eab5d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: docutils.patch
Type: application/octet-stream
Size: 1923 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/doc-sig/attachments/20090404/f90eab5d/attachment-0001.obj>

  • Georg Brandl at Apr 4, 2009 at 2:42 pm

    Naoki INADA wrote:
    > Hi, all.
    >
    > I'm a member of the Japanese translation project for the Python documentation.
    > We finished translating the Python 2.5 documentation last year and are now
    > working on the Python 2.6 documentation.
    >
    > Building the documentation feels a little slow to me, so I tried tuning
    > docutils and Sphinx.

    Great! I've already started tuning a bit with the docutils Node.traverse()
    patch, but did not do much more than that.

    > The attached patches make the documentation build about 30% faster.
    > (In my environment, roughly 330 sec -> 220 sec.)
    >
    > I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
    > Could anyone review these patches?

    I will, when I have a bit more time.

    > The patches make the following changes:
    >
    > 1. Use PyStemmer instead of PorterStemmer.
    > PorterStemmer is implemented in Python and takes about 50 seconds
    > during the build.
    > PyStemmer <http://pypi.python.org/pypi/PyStemmer/1.0.1> is implemented in C
    > and takes only 7 seconds.
    >
    > However, the searchindex.js generated with PyStemmer differs from the one
    > generated with PorterStemmer.

    This could be a problem. The client-side search implemented in JavaScript
    uses exactly the same stemmer (which is necessary to be able to find all
    words). In short, if you can find a C implementation of the Porter stemmer
    we could include it in Sphinx as an optional extension.
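
A toy illustration of why the indexing side and the searching side need the same stemmer. Both stemmers below are hypothetical and far simpler than Porter's algorithm, and both sides are written in Python here, whereas the real search side is JavaScript.

```python
def stem_a(word):
    """Toy stemmer (hypothetical): strips a trailing 'ing' or 's'."""
    word = word.lower()
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_b(word):
    """Second toy stemmer (hypothetical): strips only a trailing 's'."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def build_index(docs, stem):
    """Map each stemmed word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(name)
    return index


def search(index, query, stem):
    """Look up a query word after stemming it with the given stemmer."""
    return index.get(stem(query), set())


docs = {"intro": "building the docs", "api": "parser builds nodes"}
index = build_index(docs, stem_a)

# Same stemmer on both sides: "building" is reduced to "build" and found.
same = search(index, "building", stem_a)
# Different stemmer on the query side: "building" stays as is and is missed.
different = search(index, "building", stem_b)
```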

    > 2. Avoid building OptionParser many times.
    > Sphinx calls docutils.core.publish_parts() without the `settings` argument
    > many times.
    > This builds docutils.frontend.OptionParser many times, which takes
    > 29 seconds.
    >
    > 3. Avoid building NestedStateMachine many times.
    > NestedStateMachine is built and destroyed many times.
    > Recycling that state machine gives a significant performance gain.

    I assume that both of these are in the second commit I see on bitbucket? Both
    look like worthy optimizations.

    Thanks,
    Georg

    --
    Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
    Four shall be the number of spaces thou shalt indent, and the number of thy
    indenting shall be four. Eight shalt thou not indent, nor either indent thou
    two, excepting that thou then proceed to four. Tabs are right out.
  • Naoki INADA at Apr 4, 2009 at 4:03 pm
    Hi Georg.

    >> The attached patches make the documentation build about 30% faster.
    >> (In my environment, roughly 330 sec -> 220 sec.)
    >>
    >> I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
    >> Could anyone review these patches?
    >
    > I will, when I have a bit more time.

    Thank you.

    >> However, the searchindex.js generated with PyStemmer differs from the one
    >> generated with PorterStemmer.
    >
    > This could be a problem. The client-side search implemented in JavaScript
    > uses exactly the same stemmer (which is necessary to be able to find all
    > words). In short, if you can find a C implementation of the Porter stemmer
    > we could include it in Sphinx as an optional extension.

    I see.
    The original Porter stemmer is here:
    http://tartarus.org/~martin/PorterStemmer/

    It is implemented in C. I'll try to make a Python wrapper with SWIG and
    compare the resulting searchindex.js. Please wait a while.

    >> 2. Avoid building OptionParser many times.
    >> Sphinx calls docutils.core.publish_parts() without the `settings` argument
    >> many times.
    >> This builds docutils.frontend.OptionParser many times, which takes
    >> 29 seconds.
    >>
    >> 3. Avoid building NestedStateMachine many times.
    >> NestedStateMachine is built and destroyed many times.
    >> Recycling that state machine gives a significant performance gain.
    >
    > I assume that both of these are in the second commit I see on bitbucket? Both
    > look like worthy optimizations.

    The former is on bitbucket:
    http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

    The latter is not on bitbucket, because NestedStateMachine is not part of
    Sphinx but of docutils.

    --
    Naoki INADA <inada-n at klab.jp>
    KLab Inc. <http://www.klab.jp>
  • Naoki INADA at Apr 4, 2009 at 9:01 pm

    >>> However, the searchindex.js generated with PyStemmer differs from the one
    >>> generated with PorterStemmer.
    >>
    >> This could be a problem. The client-side search implemented in JavaScript
    >> uses exactly the same stemmer (which is necessary to be able to find all
    >> words). In short, if you can find a C implementation of the Porter stemmer
    >> we could include it in Sphinx as an optional extension.
    >
    > I see.
    > The original Porter stemmer is here:
    > http://tartarus.org/~martin/PorterStemmer/
    >
    > It is implemented in C. I'll try to make a Python wrapper with SWIG and
    > compare the resulting searchindex.js. Please wait a while.

    I made a Python wrapper!
    http://bitbucket.org/methane/porterstemmer/

    This is my first extension module, and it is still an alpha version.
    But I can build the Python documentation with porterstemmer, and the
    resulting searchindex.js is identical to the original.
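
Besides comparing whole searchindex.js files, two stemmer implementations can be checked word by word. The following is a minimal sketch of such a check; `reference_stem` and `candidate_stem` are hypothetical stand-ins for the pure-Python stemmer and the wrapped C stemmer, not their real behaviour.

```python
def find_mismatches(reference, candidate, words):
    """Return (word, reference, candidate) for each word stemmed differently."""
    mismatches = []
    for word in words:
        expected = reference(word)
        actual = candidate(word)
        if expected != actual:
            mismatches.append((word, expected, actual))
    return mismatches


def reference_stem(word):
    """Hypothetical stand-in: strips every trailing 's'."""
    return word.lower().rstrip("s")


def candidate_stem(word):
    """Hypothetical stand-in: strips at most one trailing 's'."""
    lowered = word.lower()
    return lowered[:-1] if lowered.endswith("s") else lowered


# An empty result would mean the two stemmers agree on every word given.
report = find_mismatches(reference_stem, candidate_stem,
                         ["docs", "class", "Sphinx"])
```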

    --
    Naoki INADA <inada-n at klab.jp>
    KLab Inc. <http://www.klab.jp>
  • Georg Brandl at Apr 9, 2009 at 8:13 pm

    Naoki INADA wrote:

    >>> 2. Avoid building OptionParser many times.
    >>> Sphinx calls docutils.core.publish_parts() without the `settings` argument
    >>> many times.
    >>> This builds docutils.frontend.OptionParser many times, which takes
    >>> 29 seconds.
    >>>
    >>> 3. Avoid building NestedStateMachine many times.
    >>> NestedStateMachine is built and destroyed many times.
    >>> Recycling that state machine gives a significant performance gain.
    >>
    >> I assume that both of these are in the second commit I see on bitbucket? Both
    >> look like worthy optimizations.
    >
    > The former is on bitbucket:
    > http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

    Thanks, merged! When porterstemmer is mature I'd also like to include it in
    the Sphinx distribution as an optional extension.

    > The latter is not on bitbucket, because NestedStateMachine is not part of
    > Sphinx but of docutils.

    OK, let's see. I'd first try to get the patch into docutils, after passing the
    tests. However, since most people will be using docutils 0.4 or 0.5 it might
    also make sense to make a monkey-patch version for Sphinx, like the traverse one.
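
A generic sketch of what a monkey-patch version means: a method on an already installed class is replaced at import time by a faster implementation with the same result. `Node`, `walk`, `fast_walk`, and `apply_patch` are hypothetical names for illustration; this is not the Sphinx traverse patch or the docutils API.

```python
class Node:
    """Hypothetical stand-in for a class from an installed library."""

    def __init__(self, children=()):
        self.children = list(children)

    def walk(self):
        # Original implementation: recursive, builds many temporary lists.
        result = [self]
        for child in self.children:
            result.extend(child.walk())
        return result


def fast_walk(self):
    """Replacement using an explicit stack; returns the same pre-order list."""
    result = []
    stack = [self]
    while stack:
        node = stack.pop()
        result.append(node)
        # Reverse so that the first child is visited first.
        stack.extend(reversed(node.children))
    return result


def apply_patch(cls):
    """Install fast_walk on cls once, keeping the original for reference."""
    if getattr(cls, "_walk_patched", False):
        return
    cls._original_walk = cls.walk
    cls.walk = fast_walk
    cls._walk_patched = True
```

The guard flag makes `apply_patch` safe to call more than once. A real patch would also need to check the installed library version before replacing anything.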

    Georg
  • Michael Foord at Apr 4, 2009 at 2:56 pm
    Hello,

    There is a docutils specific mailing list:

    docutils users <docutils-users at lists.sourceforge.net>

    You will need to subscribe from sourceforge, or you can just post your
    patch on sourceforge:

    http://docutils.sf.net

    Another patch was recently submitted by Georg Brandl offering a similar
    speedup. No idea if it is in the same area or not.

    All the best,


    Michael Foord

    --
    http://www.ironpythoninaction.com/
    http://www.voidspace.org.uk/blog
  • Aahz at Apr 4, 2009 at 3:43 pm

    On Sat, Apr 04, 2009, Michael Foord wrote:
    > There is a docutils specific mailing list:
    >
    > docutils users <docutils-users at lists.sourceforge.net>

    Actually, there are two docutils mailing lists, and I think that
    docutils-develop is probably more appropriate for this.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are, by
    definition, not smart enough to debug it." --Brian W. Kernighan
  • Naoki INADA at Apr 4, 2009 at 4:06 pm

    > On Sat, Apr 04, 2009, Michael Foord wrote:
    >> There is a docutils specific mailing list:
    >>
    >> docutils users <docutils-users at lists.sourceforge.net>
    >
    > Actually, there are two docutils mailing lists, and I think that
    > docutils-develop is probably more appropriate for this.

    OK. I'll subscribe to both.

    --
    Naoki INADA <inada-n at klab.jp>
    KLab Inc. <http://www.klab.jp>
