I was recently bitten by the fact that the command:


   python -m foo


pulls in the module and attaches it as sys.modules['__main__'], but not to
sys.modules['foo']. Should the program also:


   import foo


it pulls in the same module code, but binds a completely independent
instance of it to sys.modules['foo']. This is counterintuitive; it is a
natural expectation that "python -m foo" imports "foo" in a normal fashion.


If the program modifies items in "foo", those modifications are not reflected in
"__main__", since these are two distinct modules.


I propose that "python -m foo" imports foo as normal, binding it to
sys.modules["__main__"] as at present, but that it also binds the module to
sys.modules["foo"]. This will remove the disconnect between "python -m foo" and
a program's internal "import foo".
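
The disconnect can be observed directly. What follows is a minimal sketch, assuming a throwaway module named foo written to a temporary directory; the module body and the helper run_as_main are hypothetical and exist only for illustration. It runs "python -m foo" in a child process and reports whether the two sys.modules entries are the same object, and what value the running program sees after modifying the imported copy.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module body: when run as a program it imports itself by name.
FOO_SOURCE = textwrap.dedent("""
    import sys

    counter = 0

    if __name__ == '__main__':
        import foo              # second import, under the official name
        foo.counter = 42        # modify the imported module
        same = sys.modules['__main__'] is sys.modules['foo']
        print(same, counter)    # identity check, then the value __main__ sees
""")


def run_as_main():
    """Run ``python -m foo`` in a scratch directory; return its output fields."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "foo.py"), "w") as f:
            f.write(FOO_SOURCE)
        result = subprocess.run(
            [sys.executable, "-m", "foo"],
            cwd=tmpdir,             # -m puts the current directory on sys.path
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.split()


if __name__ == "__main__":
    same, counter = run_as_main()
    print("same module object:", same)
    print("counter seen by __main__:", counter)
```

On an interpreter without the change proposed here, the expectation is that the identity check fails and the running program still sees its original counter value.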


For people who are concerned that the module's .__name__ is "__main__", note
that the module's resolved "official" name is present in .__spec__.name as
described in PEP 451.


There are two recent discussion threads on this in python-list at:


   https://mail.python.org/pipermail/python-list/2015-August/694905.html


and in python-ideas at:


   https://mail.python.org/pipermail/python-ideas/2015-August/034947.html


Please give them a read and give this PEP your thoughts.


The raw text of the PEP is below. It feels uncontroversial to me, but then it
would:-)


It is visible on the web here:


   https://www.python.org/dev/peps/pep-0499/


and I've made a public repository to track the text as it evolves here:


   https://bitbucket.org/cameron_simpson/pep-0499/


Cheers,
Cameron Simpson <cs@zip.com.au>


PEP: 499
Title: ``python -m foo`` should bind ``sys.modules['foo']`` in addition to ``sys.modules['__main__']``
Version: $Revision$
Last-Modified: $Date$
Author: Cameron Simpson <cs@zip.com.au>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 07-Aug-2015
Python-Version: 3.6


Abstract
========


When a module is used as a main program on the Python command line,
such as by:


     python -m module.name ...


it is easy to accidentally end up with two independent instances
of the module if that module is again imported within the program.
This PEP proposes a way to fix this problem.


When a module is invoked via Python's -m option the module is bound
to ``sys.modules['__main__']`` and its ``.__name__`` attribute is set to
``'__main__'``.
This enables the standard "main program" boilerplate code at the
bottom of many modules, such as::


     if __name__ == '__main__':
         sys.exit(main(sys.argv))


However, when the above command line invocation is used it is a
natural inference to presume that the module is actually imported
under its official name ``module.name``,
and therefore that if the program again imports that name
then it will obtain the same module instance.


The actuality is that the module was imported only as ``'__main__'``.
Another import will obtain a distinct module instance, which can
lead to confusing bugs.




Proposal
========


It is suggested that to fix this situation all that is needed is a
simple change to the way the ``-m`` option is implemented: in addition
to binding the module object to ``sys.modules['__main__']``, it is also
bound to ``sys.modules['module.name']``.


Nick Coghlan has suggested that this is as simple as modifying the
``runpy`` module's ``_run_module_as_main`` function as follows::


     main_globals = sys.modules["__main__"].__dict__


to instead be::


     main_module = sys.modules["__main__"]
     sys.modules[mod_spec.name] = main_module
     main_globals = main_module.__dict__
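
To get a feel for the effect of the extra binding, the sketch below applies the same aliasing by hand from inside a throwaway module instead of patching ``runpy``. It is an approximation under stated assumptions, not the proposed implementation: the module name ``bar``, its body, and the helper ``run_with_alias`` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module body that aliases itself before importing its own name.
BAR_SOURCE = textwrap.dedent("""
    import sys

    if __name__ == '__main__' and __spec__ is not None:
        # Hand-written stand-in for the proposed runpy change.
        sys.modules.setdefault(__spec__.name, sys.modules['__main__'])

    counter = 0

    if __name__ == '__main__':
        import bar              # now resolves to the running module
        bar.counter = 42
        print(sys.modules['__main__'] is bar, counter)
""")


def run_with_alias():
    """Run ``python -m bar`` in a scratch directory; return its output fields."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "bar.py"), "w") as f:
            f.write(BAR_SOURCE)
        result = subprocess.run(
            [sys.executable, "-m", "bar"],
            cwd=tmpdir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.split()


if __name__ == "__main__":
    print(run_with_alias())
```

With the alias in place, the later ``import bar`` finds the existing ``sys.modules`` entry, so the modification is made to the running module itself.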




Considerations and Prerequisites
================================


Pickling Modules
----------------


Nick has mentioned `issue 19702`_ which proposes (quoted from the issue):


- runpy will ensure that when __main__ is executed via the import
   system, it will also be aliased in sys.modules as __spec__.name
- if __main__.__spec__ is set, pickle will use __spec__.name rather
   than __name__ to pickle classes, functions and methods defined in
   __main__
- multiprocessing is updated appropriately to skip creating __mp_main__
   in child processes when __main__.__spec__ is set in the parent
   process


The first point above covers this PEP's specific proposal.




Background
==========


`I tripped over this issue`_ while debugging a main program via a
module which tried to monkey patch a named module, that being the
main program module. Naturally, the monkey patching was ineffective
as it imported the main module by name and thus patched the second
module instance, not the running module instance.


However, the problem has been around as long as the ``-m`` command
line option and is encountered regularly, if infrequently, by others.


In addition to `issue 19702`_, the discrepancy around ``__main__``
is alluded to in PEP 451 and a similar proposal (predating PEP 451)
is described in PEP 395 under `Fixing dual imports of the main module`_.




References
==========


.. _issue 19702: http://bugs.python.org/issue19702


.. _I tripped over this issue: https://mail.python.org/pipermail/python-list/2015-August/694905.html


.. _Fixing dual imports of the main module: https://www.python.org/dev/peps/pep-0395/#fixing-dual-imports-of-the-main-module




Copyright
=========


This document has been placed in the public domain.




..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


  • Chris Angelico at Aug 8, 2015 at 10:30 am

    On Sat, Aug 8, 2015 at 7:49 PM, Cameron Simpson wrote:
    > The raw text of the PEP is below. It feels uncontroversial to me, but then
    > it would:-)

    I'm not sure that it'll be uncontroversial, but I agree with it :)


    The risk that I see (as I mentioned in the previous thread, but
    reiterating for those who just came in) is that it becomes possible to
    import something whose __name__ is not what you imported. Currently,
    you can "import math" and see that math.__name__ is "math", or "import
    urllib.parse" and, as you'd expect, urllib.parse.__name__ is
    "urllib.parse". In the few cases where it isn't exactly what you
    imported, it's the canonical name for it - for instance,
    os.path.__name__ is posixpath on my system. The change proposed here
    means that the canonical name for the module you're running as the
    main file is now "__main__", and not whatever else it would have been.
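
    The name relationships mentioned above are easy to check. A small sketch using only standard library modules (the helper reported_names is a hypothetical name used for illustration):

```python
import math
import os.path
import urllib.parse


def reported_names():
    """Map the name used in the import statement to the module's __name__."""
    modules = {
        "math": math,
        "urllib.parse": urllib.parse,
        "os.path": os.path,
    }
    return {imported_as: mod.__name__ for imported_as, mod in modules.items()}


if __name__ == "__main__":
    for imported_as, name in reported_names().items():
        print(f"import {imported_as} -> __name__ is {name!r}")
```

    The os.path entry depends on the platform, which is why it is the one case here where the two names differ.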


    Consequences for pickle/multiprocessing/Windows are mentioned in the
    PEP. Are there any other places where a module's name is checked?


    ChrisA
  • Cameron Simpson at Aug 8, 2015 at 11:18 pm

    On 08Aug2015 20:30, Chris Angelico wrote:
    > On Sat, Aug 8, 2015 at 7:49 PM, Cameron Simpson wrote:
    >> The raw text of the PEP is below. It feels uncontroversial to me, but then
    >> it would:-)
    > I'm not sure that it'll be uncontroversial, but I agree with it :)
    >
    > The risk that I see (as I mentioned in the previous thread, but
    > reiterating for those who just came in) is that it becomes possible to
    > import something whose __name__ is not what you imported. Currently,
    > you can "import math" and see that math.__name__ is "math", or "import
    > urllib.parse" and, as you'd expect, urllib.parse.__name__ is
    > "urllib.parse". In the few cases where it isn't exactly what you
    > imported, it's the canonical name for it - for instance,
    > os.path.__name__ is posixpath on my system. The change proposed here
    > means that the canonical name for the module you're running as the
    > main file is now "__main__", and not whatever else it would have been.

    I think I take the line that as of PEP 451 the canonical name for a module is
    .__spec__.name. The module's .__name__ normally matches that, but obviously in
    the case of "python -m" it does not.


    As you point out, suddenly a module can appear somewhere other than
    sys.modules['__main__'] where that difference shows.


    Let's ask the associated question: who introspects module.__name__ and expects
    it to be the canonical name? For what purpose?


    I'm of the opinion that those cases are few, and that they should in any case
    be updated to consult .__spec__.name these days (with, I suppose, fallback for
    older Python versions). I think that is the case even without the change
    suggested by PEP 499.
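
    A minimal sketch of the lookup with fallback described above. The helper name canonical_name and the stand-in module objects are hypothetical; only the __spec__.name and __name__ attributes come from the discussion.

```python
import importlib.machinery
import json
import types


def canonical_name(module):
    """Prefer module.__spec__.name; fall back to module.__name__."""
    spec = getattr(module, "__spec__", None)
    name = getattr(spec, "name", None)
    return name if name else module.__name__


# Stand-in for a module run via "python -m foo": __name__ is "__main__",
# but the spec still records the name the module was found under.
fake_main = types.ModuleType("__main__")
fake_main.__spec__ = importlib.machinery.ModuleSpec("foo", loader=None)

# Stand-in for a module object that carries no spec at all.
legacy = types.ModuleType("legacy")

if __name__ == "__main__":
    for mod in (json, fake_main, legacy):
        print(mod.__name__, "->", canonical_name(mod))
```

    The getattr calls keep the helper usable on interpreters or module objects where __spec__ is missing or None.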


    Cheers,
    Cameron Simpson <cs@zip.com.au>
  • Andrew Barnert at Aug 9, 2015 at 5:12 am

    On Aug 8, 2015, at 16:18, Cameron Simpson wrote:
    > I think I take the line that as of PEP 451 the canonical name for a module is .__spec__.name. The module's .__name__ normally matches that, but obviously in the case of "python -m" it does not.
    >
    > As you point out, suddenly a module can appear somewhere other than sys.modules['__main__'] where that difference shows.
    >
    > Let's ask the associated question: who introspects module.__name__ and expects it to be the canonical name? For what purpose?

    I'd think the first place to look is code that deals directly with module objects and/or sys.modules--graphical debuggers, plugin frameworks, bridges (a la AppScript or PyObjC), etc. Especially since many of them want to retain compatibility with 3.3, if not 3.2, and to share as much code as possible with a 2.x version.


    Of course you're probably right that there aren't too many such things, and they're also presumably written by people who know what they're doing and wouldn't have too much trouble adapting them for 3.6+ if needed.
  • Joseph Jevnik at Aug 9, 2015 at 7:05 am
    If I have a package that defines both a __main__ and a __init__, then your
    change would bind the __main__ to the name instead of the __init__. That
    seems incorrect.


  • Cameron Simpson at Aug 9, 2015 at 10:34 am

    On 09Aug2015 03:05, Joseph Jevnik wrote:
    > If I have a package that defines both a __main__ and a __init__, then your
    > change would bind the __main__ to the name instead of the __init__. That
    > seems incorrect.

    Yes. Yes it does.


    I just did a quick test package named "testmod" via "python -m testmod" and:


    - __init__.py has the __name__ "testmod"
    - __main__.py has the __name__ "__main__"


    in both python 2.7 and python 3.4.


    Since my test script reports:


       % python3.4 -m testmod
       __init__.py: /Users/cameron/rc/python/testmod/__init__.py testmod
       __main__.py: /Users/cameron/rc/python/testmod/__main__.py __main__
       % python2.7 -m testmod
       ('__init__.py:', '/Users/cameron/rc/python/testmod/__init__.pyc', 'testmod')
       ('__main__.py:', '/Users/cameron/rc/python/testmod/__main__.py', '__main__')


    would it be enough to say that this change should only apply if the module is
    not a package?


    I'll do some more fiddling to see exactly what happens in packages when I
    import pieces of them, too.


    Cheers,
    Cameron Simpson <cs@zip.com.au>
  • Cameron Simpson at Aug 9, 2015 at 10:48 pm

    On 09Aug2015 20:34, Cameron Simpson wrote:
    > On 09Aug2015 03:05, Joseph Jevnik wrote:
    >> If I have a package that defines both a __main__ and a __init__, then your
    >> change would bind the __main__ to the name instead of the __init__. That
    >> seems incorrect.
    > Yes. Yes it does. [...]
    > would it be enough to say that this change should only apply if the module is
    > not a package?

    I append the code for my testmod below, being an __init__.py and a __main__.py.
    A run shows:


       % python3.4 -m testmod
       __init__.py: /Users/cameron/rc/python/testmod/__init__.py testmod testmod
       __main__.py: /Users/cameron/rc/python/testmod/__main__.py __main__ testmod.__main__
       __main__ <module 'testmod.__main__' from '/Users/cameron/rc/python/testmod/__main__.py'>
       testmod <module 'testmod' from '/Users/cameron/rc/python/testmod/__init__.py'>


    (4 lines, should your mailer fold the output.)


    It seems to me that Python already does the "right thing" for packages, and it
    is only non-package modules which need the change proposed by the PEP.


    Comments please?


    Code below.


    Cheers,
    Cameron Simpson <cs@zip.com.au>


    testmod/__init__.py:
         #!/usr/bin/python
         # Report this file's path, its __name__ and its import-system name.
         print('__init__.py:', __file__, __name__, __spec__.name)


    testmod/__main__.py:
         #!/usr/bin/python
         import sys
         print('__main__.py:', __file__, __name__, __spec__.name)
         # List every sys.modules entry that refers to testmod.
         for modname, mod in sorted(sys.modules.items()):
           rmod = repr(mod)
           if 'testmod' in modname or 'testmod' in rmod:
             print(modname, rmod)
  • Joseph Jevnik at Aug 9, 2015 at 11:33 pm
    I would be okay if this change did not affect execution of a package with
    the python -m flag. I was only concerned because a __main__ in a package is
    common and wanted to make sure you had addressed it.


  • Cameron Simpson at Aug 10, 2015 at 10:49 am

    On 09Aug2015 19:33, Joseph Jevnik wrote:
    >> On 09Aug2015 20:34, Cameron Simpson wrote:
    >>> On 09Aug2015 03:05, Joseph Jevnik wrote:
    >>>> If I have a package that defines both a __main__ and a __init__, then
    >>>> your change would bind the __main__ to the name instead of the __init__.
    >>>> That seems incorrect.
    >>> Yes. Yes it does. [...]
    >>> would it be enough to say that this change should only apply if the
    >>> module is not a package?
    > I would be okay if this change did not affect execution of a package with
    > the python -m flag. I was only concerned because a __main__ in a package is
    > common and wanted to make sure you had addressed it.

    Good point. Please see if this update states your issue fairly and addresses
    it:


       https://bitbucket.org/cameron_simpson/pep-0499/commits/3efcd9b54e238a1ff7f5c5df805df139d6cb5a30


    Cheers,
    Cameron Simpson <cs@zip.com.au>
  • Cameron Simpson at Aug 10, 2015 at 10:13 am

    On 08Aug2015 22:12, Andrew Barnert wrote:
    > On Aug 8, 2015, at 16:18, Cameron Simpson wrote:
    >> I think I take the line that as of PEP 451 the canonical name for a module is .__spec__.name. The module's .__name__ normally matches that, but obviously in the case of "python -m" it does not.
    >>
    >> As you point out, suddenly a module can appear somewhere other than sys.modules['__main__'] where that difference shows.
    >>
    >> Let's ask the associated question: who introspects module.__name__ and expects it to be the canonical name? For what purpose?
    >
    > I'd think the first place to look is code that deals directly with module objects and/or sys.modules--graphical debuggers, plugin frameworks, bridges (a la AppScript or PyObjC), etc. Especially since many of them want to retain compatibility with 3.3, if not 3.2, and to share as much code as possible with a 2.x version.
    >
    > Of course you're probably right that there aren't too many such things, and they're also presumably written by people who know what they're doing and wouldn't have too much trouble adapting them for 3.6+ if needed.

    One might hope. So I've started with the stdlib in two passes: looking for
    .__name__ associated with "mod", and looking for __main__ not in the standard
    boilerplate (__name__ == '__main__').


    Obviously all this code is unfamiliar to me so anyone with deeper understanding
    who wants to look is most welcome.


    Pass 1 with this command:


       find . -type f -name \*.py | xargs fgrep .__name__ /dev/null | grep mod


    to look for module related code using .__name__. Of course a lot of it is
    reporting, but there are some interesting relevant bits.


    doctest:


    This refers to module.__name__ quite a lot. The _normalize_module() function
    uses __name__ instead of __spec__.name. _from_module() tests if an object is
    defined in a particular module based on __name__; I'm (naively) surprised that
    this can't use "is", but it looks like an object's __module__ attribute is a
    string, which I imagine avoids circular references. _get_test() uses __name__
    instead of __spec__.name, though only as a fallback if there is no __file__.
    SkipDocTestCase.shortDescription() uses __name__.
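
    The observation that an object's __module__ attribute is a string, not a module reference, can be shown in a few lines. This is an illustrative sketch and not doctest's own logic; the helper defining_module is a hypothetical name.

```python
import json
import sys


def defining_module(obj):
    """Resolve obj.__module__ (a plain string) through sys.modules."""
    return sys.modules.get(obj.__module__)


if __name__ == "__main__":
    # The attribute holds a name, so an "is" test needs a lookup first.
    print(type(json.dumps.__module__).__name__)
    print(defining_module(json.dumps) is json)
```

    For an object defined in a module run with -m, that string is "__main__", so the lookup lands on sys.modules['__main__'] and not on the entry under the module's official name.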


    importlib: mostly seems fine according to my shallow understanding?


    inspect: getmodule() seems correct (uses __name__ but seems correctish) - this
    does seem to be a grope around in the available places looking for a match
    function, and feels unreliable anyway.


    modulefinder: this does look like it could use __spec__.name more widely, or as
    an adjunct to __name__. scan_code() looks like another "grope around" function
    trying to infer structure from the pieces sitting about:-)


    pdb: Pdb.do_whatis definitely reports using .__name__. Not necessarily
    incorrect.


    pkgutil: get_loader() uses .__name__, probably ought to be __spec__.name


    pydoc: also probably should upgrade to .__spec__.name


    unittest: TestLoader.discover seems to rely on __name__ instead of
    __spec__.name while constructing a pathname; definitely seems like it needs
    updating for PEP 451. It also looks up __name__ in sys.builtin_module_names to
    reject constructing a pathname.


    Pass 2 with this command:


       find . -type f -name \*.py | xargs fgrep __main__ | grep -v 'if *__name__ *== *["'\'']__main__'


    looking for __main__ but discarding the boilerplate.
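
    For anyone who would rather repeat this pass without the shell pipeline, here is a rough Python approximation. The function name find_main_references is hypothetical, and the regular expression is slightly looser than the grep pattern because it accepts any whitespace, not just spaces.

```python
import re
from pathlib import Path

# Matches the standard boilerplate test so that those lines can be discarded.
BOILERPLATE = re.compile(r"""if\s*__name__\s*==\s*["']__main__""")


def find_main_references(root):
    """Yield (path, line number, text) for non-boilerplate uses of __main__."""
    for path in sorted(Path(root).rglob("*.py")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it, as a best-effort scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "__main__" in line and not BOILERPLATE.search(line):
                yield path, lineno, line.strip()


if __name__ == "__main__":
    for path, lineno, line in find_main_references("."):
        print(f"{path}:{lineno}: {line}")
```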


    I'm actually striking out here. Since this PEP doesn't change __name__ ==
    '__main__' I've not found anything here that looks like it would stop working.
    Even runpy, cursory though my look at it is, is going forward: setting __name__
    to '__main__' instead of working backwards.


    Further thoughts?


    Cheers,
    Cameron Simpson <cs@zip.com.au>
