FAQ
One of the fixes PEP 395 (module aliasing) proposes is to make running
modules inside packages by filename work correctly (i.e. without
breaking relative imports and without getting the directory where the
module lives directly on sys.path, which can lead to unexpected name
clashes). The PEP currently states [1] that this can be made to work
with both PEP 382 and PEP 402.

In current Python, fixing this just involves checking for a colocated
__init__.py file. If we find one, then we work our way up the
directory hierarchy until we find a directory without an __init__.py
file, put *that* on sys.path, then (effectively) rewrite the command
line as if the -m switch had been used.

The extension to the current version of PEP 382 is clear - we just
accept both an __init__.py file and a .pyp extension as indicating
"this is part of a Python package", but otherwise the walk back up the
filesystem hierarchy to decide which directory to add to sys.path
remains unchanged.

However, I'm no longer convinced that this concept can actually be
made to work in the context of PEP 402:

1. We can't use sys.path, since we're trying to figure out which
directory we want to *add* to sys.path
2. We can't use "contains a Python module", since PEP 402 allows
directories inside packages that only contain subpackages (only the
leaf directories are required to contain valid Python modules), so we
don't know the significance of an empty directory without already
knowing what is on sys.path!

So, without a clear answer to the question of "from module X, inside
package (or package portion) Y, find the nearest parent directory that
should be placed on sys.path" in a PEP 402 based world, I'm switching
to supporting PEP 382 as my preferred approach to namespace packages.
In this case, I think "explicit is better than implicit" means, "given
only a filesystem hierarchy, you should be able to figure out the
Python package hierarchy it contains". Only explicit markers (either
files or extensions) let you do that - with PEP 402, the filesystem
doesn't contain enough information to figure it out, you need to also
know the contents of sys.path.

Regards,
Nick.

[1] http://www.python.org/dev/peps/pep-0395/#fixing-direct-execution-inside-packages
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

  • Eric Snow at Nov 16, 2011 at 8:15 am

    On Tue, Nov 15, 2011 at 11:29 PM, Nick Coghlan wrote:
    One of the fixes PEP 395 (module aliasing) proposes is to make running
    modules inside packages by filename work correctly (i.e. without
    breaking relative imports and without getting the directory where the
    module lives directly on sys.path which can lead to unexpected name
    clashes). The PEP currently states [1] that this can be made to work
    with both PEP 382 and PEP 402

    In current Python, fixing this just involves checking for a colocated
    __init__.py file. If we find one, then we work our way up the
    directory hierarchy until we find a directory without an __init__.py
    file, put *that* on sys.path, then (effectively) rewrite the command
    line as if the -m switch had been used.

    The extension to the current version of PEP 382 is clear - we just
    accept both an __init__.py file and a .pyp extension as indicating
    "this is part of a Python package", but otherwise the walk back up the
    filesystem hierarchy to decide which directory to add to sys.path
    remains unchanged.

    However, I'm no longer convinced that this concept can actually be
    made to work in the context of PEP 402:

    1. We can't use sys.path, since we're trying to figure out which
    directory we want to *add* to sys.path
    2. We can't use "contains a Python module", since PEP 402 allows
    directories inside packages that only contain subpackages (only the
    leaf directories are required to contain valid Python modules), so we
    don't know the significance of an empty directory without already
    knowing what is on sys.path!

    So, without a clear answer to the question of "from module X, inside
    package (or package portion) Y, find the nearest parent directory that
    should be placed on sys.path" in a PEP 402 based world, I'm switching
    to supporting PEP 382 as my preferred approach to namespace packages.
    In this case, I think "explicit is better than implicit" means, "given
    only a filesystem hierarchy, you should be able to figure out the
    Python package hierarchy it contains". Only explicit markers (either
    files or extensions) let you do that - with PEP 402, the filesystem
    doesn't contain enough information to figure it out, you need to also
    know the contents of sys.path.
    Ouch. What about the following options?

    Indicator for the top-level package? No
    Leverage __pycache__? No

    Merge in the idea from PEP 382 of special directory names? To borrow
    an example from PEP 3147:

    alpha.pyp/
        one.py
        two.py
        beta.py
        beta.pyp/
            three.py
            four.py

    So package directories are explicitly marked but PEP 402 otherwise
    continues as-is. I'll have to double-check, but I don't think we
    tried this angle already.

    -eric

    Regards,
    Nick.

    [1] http://www.python.org/dev/peps/pep-0395/#fixing-direct-execution-inside-packages
    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
    _______________________________________________
    Import-SIG mailing list
    Import-SIG at python.org
    http://mail.python.org/mailman/listinfo/import-sig
  • PJ Eby at Nov 16, 2011 at 3:08 pm

    On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote:

    So, without a clear answer to the question of "from module X, inside
    package (or package portion) Y, find the nearest parent directory that
    should be placed on sys.path" in a PEP 402 based world, I'm switching
    to supporting PEP 382 as my preferred approach to namespace packages.
    In this case, I think "explicit is better than implicit" means, "given
    only a filesystem hierarchy, you should be able to figure out the
    Python package hierarchy it contains". Only explicit markers (either
    files or extensions) let you do that - with PEP 402, the filesystem
    doesn't contain enough information to figure it out, you need to also
    know the contents of sys.path.
    After spending an hour or so reading through PEP 395 and trying to grok
    what it's doing, I actually come to the opposite conclusion: that PEP 395
    is violating the ZofP by both guessing, and not encouraging One Obvious Way
    of invoking scripts-as-modules.

    For example, if somebody adds an __init__.py to their project directory,
    suddenly scripts that worked before will behave differently under PEP 395,
    creating a strange bit of "spooky action at a distance". (And yes, people
    add __init__.py files to their projects in odd places -- being setuptools
    maintainer, you get to see a LOT of weird looking project layouts.)

    While I think the __qname__ idea is fine, and it'd be good to have a way to
    avoid aliasing main (suggestion for how included below), I think that
    relative imports failing from inside a main module should offer an error
    message suggesting you use "-m" if you're running a script that's within a
    package, since that's the One Obvious Way of running a script that's also a
    module. (Albeit not obvious unless you're Dutch. ;-) )

    For the import aliasing case, AFAICT it's only about cases where __name__
    == '__main__', no? Why not just save the file/importer used for __main__,
    and then have the import machinery check whether a module being imported is
    about to alias __main__? For that, you don't need to know in *advance*
    what the qualified name of __main__ is - you just spot it the first time
    somebody re-imports it.

    I think removing qname-guessing from PEP 395 (and replacing it with
    instructive/google-able error messages) would be an unqualified
    improvement, independent of what happens to PEPs 382 and 402.
  • Eric Snow at Nov 16, 2011 at 6:21 pm

    On Wed, Nov 16, 2011 at 8:08 AM, PJ Eby wrote:
    On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote:

    So, without a clear answer to the question of "from module X, inside
    package (or package portion) Y, find the nearest parent directory that
    should be placed on sys.path" in a PEP 402 based world, I'm switching
    to supporting PEP 382 as my preferred approach to namespace packages.
    In this case, I think "explicit is better than implicit" means, "given
    only a filesystem hierarchy, you should be able to figure out the
    Python package hierarchy it contains". Only explicit markers (either
    files or extensions) let you do that - with PEP 402, the filesystem
    doesn't contain enough information to figure it out, you need to also
    know the contents of sys.path.
    After spending an hour or so reading through PEP 395 and trying to grok what
    it's doing, I actually come to the opposite conclusion: that PEP 395 is
    violating the ZofP by both guessing, and not encouraging One Obvious Way of
    invoking scripts-as-modules.
    For example, if somebody adds an __init__.py to their project directory,
    suddenly scripts that worked before will behave differently under PEP 395,
    creating a strange bit of "spooky action at a distance". (And yes, people
    add __init__.py files to their projects in odd places -- being setuptools
    maintainer, you get to see a LOT of weird looking project layouts.)
    While I think the __qname__ idea is fine, and it'd be good to have a way to
    avoid aliasing main (suggestion for how included below), I think that
    relative imports failing from inside a main module should offer an error
    message suggesting you use "-m" if you're running a script that's within a
    package, since that's the One Obvious Way of running a script that's also a
    module. (Albeit not obvious unless you're Dutch. ;-) )
    For the import aliasing case, AFAICT it's only about cases where __name__ ==
    '__main__', no? Why not just save the file/importer used for __main__, and
    then have the import machinery check whether a module being imported is
    about to alias __main__? For that, you don't need to know in *advance* what
    the qualified name of __main__ is - you just spot it the first time somebody
    re-imports it.
    I think removing qname-guessing from PEP 395 (and replacing it with
    instructive/google-able error messages) would be an unqualified improvement,
    independent of what happens to PEPs 382 and 402.
    But which is more astonishing (POLA and all that): running your module
    in Python, it behaves differently than when you import it (especially
    __name__); or you add an __init__.py to a directory and your *scripts*
    there start to behave differently?

    When I was learning Python, it took quite a while before I realized
    that modules are imported and scripts are passed at the commandline;
    and to understand the whole __main__ thing. It has always been a
    pain, particularly when I wanted to
    just check a module really quickly for errors.

    However, lately I've actually taken to the idea that it's better to
    write a test script that imports the module and running that, rather
    than running the module itself. But that came with the understanding
    that the module doesn't go through the import machinery when you *run*
    it, which I don't think is obvious, particularly to beginners. So
    Nick's solution, to me, is an appropriate concession to the reality
    that most folks will expect Python to treat their modules like modules
    and their scripts like scripts.

    Still, this actually got me wishing there were a way to customize
    script-running the same way you can customize import with __import__
    and import hooks.

    -eric

  • PJ Eby at Nov 16, 2011 at 8:06 pm

    On Wed, Nov 16, 2011 at 1:21 PM, Eric Snow wrote:

    But which is more astonishing (POLA and all that): running your module
    in Python, it behaves differently than when you import it (especially
    __name__); or you add an __init__.py to a directory and your *scripts*
    there start to behave differently?
    To me it seems that the latter is more astonishing because there's less
    connection between your action and the result. If you're running something
    differently, it makes more sense that it acts differently, because you've
    changed what you're *doing*. In the scripts case, you haven't changed how
    you run the scripts, and you haven't changed the scripts, so the change in
    behavior seems to appear out of nowhere.


    When I was learning Python, it took quite a while before I realized
    that modules are imported and scripts are passed at the commandline;
    and to understand the whole __main__ thing.

    It doesn't seem to me that PEP 395 fixes this problem. In order to
    *actually* fix it, we'd need to have some sort of "package" statement like
    in other languages - then you'd declare right there in the code what
    package it's supposed to be part of.


    It has always been a pain, particularly when I wanted to
    just check a module really quickly for errors.
    What, specifically, was a pain? That information might be of more use in
    determining a solution.

    If you mean that you had other modules importing the module that was also
    __main__, then I agree that having a solution for __main__-aliasing is a
    good idea. I just think it might be more cleanly fixed by checking whether
    the __file__ of a to-be-imported module is going to end up matching
    __main__.__file__, and if so, alias __main__ instead.


    However, lately I've actually taken to the idea that it's better to
    write a test script that imports the module and running that, rather
    than running the module itself. But that came with the understanding
    that the module doesn't go through the import machinery when you *run*
    it, which I don't think is obvious, particularly to beginners. So
    Nick's solution, to me, is an appropriate concession to the reality
    that most folks will expect Python to treat their modules like modules
    and their scripts like scripts.
    You lost me there: if most people don't understand the difference, then why
    are they expecting a difference?
  • Eric Snow at Nov 16, 2011 at 10:41 pm

    On Wed, Nov 16, 2011 at 1:06 PM, PJ Eby wrote:
    On Wed, Nov 16, 2011 at 1:21 PM, Eric Snow wrote:

    But which is more astonishing (POLA and all that): running your module
    in Python, it behaves differently than when you import it (especially
    __name__); or you add an __init__.py to a directory and your *scripts*
    there start to behave differently?
    To me it seems that the latter is more astonishing because there's less
    connection between your action and the result. If you're running something
    differently, it makes more sense that it acts differently, because you've
    changed what you're *doing*. In the scripts case, you haven't changed how
    you run the scripts, and you haven't changed the scripts, so the change in
    behavior seems to appear out of nowhere.
    Well, then I suppose both are astonishing and, for me at least, the
    module-as-script side of it has bit me more. Regardless, both are a
    consequence of the script vs. module situation.
    When I was learning Python, it took quite a while before I realized
    that modules are imported and scripts are passed at the commandline;
    and to understand the whole __main__ thing.
    It doesn't seem to me that PEP 395 fixes this problem. In order to
    *actually* fix it, we'd need to have some sort of "package" statement like
    in other languages - then you'd declare right there in the code what package
    it's supposed to be part of.
    Certainly an effective indicator that a file's a module and not a
    script. Still, I'd rather we find a way to maintain the
    filesystem-based package approach we have now. It's nice not having
    to look in each file to figure out the package it belongs to or if
    it's a script or not.

    The consequence is that a package that's spread across multiple
    directories is likewise addressed through the filesystem, hence PEPs
    382 and 402. However, the namespace package issue is a separate one
    from script-vs-module.
    It has always been a pain, particularly when I wanted to
    just check a module really quickly for errors.
    What, specifically, was a pain? That information might be of more use in
    determining a solution.

    If you mean that you had other modules importing the module that was also
    __main__, then I agree that having a solution for __main__-aliasing is a
    good idea.
    PEP 395 spells out several pretty well. Additionally, running a
    module as a script can cause trouble if your module otherwise relies
    on the value of __name__. Finally, sometimes I rely on a module
    triggering an import hook, though that is likely a problem just for
    me.
    I just think it might be more cleanly fixed by checking whether
    the __file__ of a to-be-imported module is going to end up matching
    __main__.__file__, and if so, alias __main__ instead.
    Currently the only promise regarding __file__ is that it will be set
    on the module object once the module has been loaded, but before the
    implicit binding for the import statement. So, unless I'm mistaken,
    that would have to change to allow for import hooks. Otherwise, sure.
    However, lately I've actually taken to the idea that it's better to
    write a test script that imports the module and running that, rather
    than running the module itself. But that came with the understanding
    that the module doesn't go through the import machinery when you *run*
    it, which I don't think is obvious, particularly to beginners. So
    Nick's solution, to me, is an appropriate concession to the reality
    that most folks will expect Python to treat their modules like modules
    and their scripts like scripts.
    You lost me there: if most people don't understand the difference, then why
    are they expecting a difference?
    Yeah, that wasn't clear. :)

    When someone learns Python, they probably are not going to recognize
    the difference between running their module and importing it. They'll
    expect their module to work identically if run as a script or
    imported. They won't even think about the distinction. Or maybe I'm
    really out of touch (quite possible :).

    It'll finally bite them when they implicitly or explicitly rely on the
    module state set by the import machinery (__name__, __file__, etc.),
    or on customization of that machinery (a la import hooks).

    Educating developers on the distinction between scripts and modules is
    good, but it seems like PEP 395 is trying to bring the behavior more
    in line with the intuitive behavior, which sounds good to me.

    Regarding the PEP 402 conflict, if using .pyp on directory names
    addresses Nick's concern, would you be opposed to that solution?

    -eric

    p.s. where should I bring up general discussion on PEP 395?
  • Nick Coghlan at Nov 16, 2011 at 10:44 pm

    On Thu, Nov 17, 2011 at 8:41 AM, Eric Snow wrote:
    p.s. where should I bring up general discussion on PEP 395?
    import-sig for now - it needs more thought before I take it back to python-dev.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Nick Coghlan at Nov 16, 2011 at 10:41 pm

    On Thu, Nov 17, 2011 at 1:08 AM, PJ Eby wrote:
    On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote:

    So, without a clear answer to the question of "from module X, inside
    package (or package portion) Y, find the nearest parent directory that
    should be placed on sys.path" in a PEP 402 based world, I'm switching
    to supporting PEP 382 as my preferred approach to namespace packages.
    In this case, I think "explicit is better than implicit" means, "given
    only a filesystem hierarchy, you should be able to figure out the
    Python package hierarchy it contains". Only explicit markers (either
    files or extensions) let you do that - with PEP 402, the filesystem
    doesn't contain enough information to figure it out, you need to also
    know the contents of sys.path.
    After spending an hour or so reading through PEP 395 and trying to grok what
    it's doing, I actually come to the opposite conclusion: that PEP 395 is
    violating the ZofP by both guessing, and not encouraging One Obvious Way of
    invoking scripts-as-modules.
    For example, if somebody adds an __init__.py to their project directory,
    suddenly scripts that worked before will behave differently under PEP 395,
    creating a strange bit of "spooky action at a distance". (And yes, people
    add __init__.py files to their projects in odd places -- being setuptools
    maintainer, you get to see a LOT of weird looking project layouts.)

    While I think the __qname__ idea is fine, and it'd be good to have a way to
    avoid aliasing main (suggestion for how included below), I think that
    relative imports failing from inside a main module should offer an error
    message suggesting you use "-m" if you're running a script that's within a
    package, since that's the One Obvious Way of running a script that's also a
    module. (Albeit not obvious unless you're Dutch. ;-) )
    The -m switch is not always an adequate replacement for direct
    execution, because it relies on the current working directory being
    set correctly (or else the module to be executed being accessible via
    sys.path, and there being nothing in the current directory that will
    shadow modules that you want to import). Direct execution will always
    have the advantage of allowing you more explicit control over all of
    sys.path[0], sys.argv[0] and __main__.__file__. The -m switch, on the
    other hand, will always set sys.path[0] to the empty string, which may
    not be what you really want.

    If the package directory markers are explicit (as they are now and as
    they are in PEP 382), then PEP 395 isn't guessing - the mapping from
    the filesystem layout to the Python module namespace is completely
    unambiguous, since the directory added as sys.path[0] will always be
    the first parent directory that isn't marked as a package directory:

    # Current rule
    sys.path[0] = os.path.abspath(os.path.dirname(__main__.__file__))

    # PEP 395 rule
    path0 = os.path.abspath(os.path.dirname(__main__.__file__))
    while is_package_dir(path0):
        path0 = os.path.dirname(path0)
    sys.path[0] = path0

    In fact, both today and under PEP 382, we could fairly easily provide
    a "runpy.split_path_module()" function that converts an arbitrary
    filesystem path to the corresponding python module name and sys.path
    entry:

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    # Given appropriate definitions for "is_module_or_package" and
    # "has_init_file"...
    def split_path_module(fspath):
        if not is_module_or_package(fspath):
            raise ValueError("{!r} is not recognized as a Python "
                             "module".format(fspath))
        path_entry, modname = _splitmodname(fspath)
        while path_entry.endswith(".pyp") or has_init_file(path_entry):
            path_entry, pkg_name = _splitmodname(path_entry)
            modname = pkg_name + '.' + modname
        return modname, path_entry

    As far as the "one obvious way" criticism goes, I think the obvious
    way (given PEP 395) is clear:

    1. Do you have a filename? Just run it and Python will figure out
    where it lives in the module namespace
    2. Do you have a module name? Run it with the -m switch and Python
    will figure out where it lives on the filesystem

    runpy.run_path() corresponds directly to 1, runpy.run_module()
    corresponds directly to 2.
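Both runpy entry points are easy to demonstrate; the script name and the ANSWER variable below are made up for illustration:

```python
import os
import runpy
import tempfile

# Case 1: you have a filename -> run_path executes it and returns the
# resulting module globals.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")
    with open(script, "w") as f:
        f.write("ANSWER = 42\n")
    ns = runpy.run_path(script)
    print(ns["ANSWER"])  # -> 42

# Case 2: you have a module name -> run_module locates it via the
# import system (here a stdlib module, to keep the example portable).
ns = runpy.run_module("string")
print("ascii_lowercase" in ns)  # -> True
```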

    Currently, if you have a filename, just running it is sometimes the
    *wrong thing to do*, because it may point inside a package directory.
    But you have no easy way to tell if that is the case. Under PEP 402,
    you simply *can't* tell, as the filesystem no longer contains enough
    information to provide an unambiguous mapping to the Python module
    namespace - instead, the intended mapping depends not only on the
    filesystem contents, but also on the runtime configuration of
    sys.path.
    For the import aliasing case, AFAICT it's only about cases where __name__ ==
    '__main__', no? Why not just save the file/importer used for __main__, and
    then have the import machinery check whether a module being imported is
    about to alias __main__? For that, you don't need to know in *advance* what
    the qualified name of __main__ is - you just spot it the first time somebody
    re-imports it.
    Oh, I like that idea - once __main__.__qname__ is available, you could
    just have a metapath hook along the lines of the following:

    class MainImporter:
        def __init__(self):
            main = sys.modules.get("__main__", None)
            self.main_qname = getattr(main, "__qname__", None)

        def find_module(self, fullname, path=None):
            if fullname == self.main_qname:
                return self
            return None

        def load_module(self, fullname):
            return sys.modules["__main__"]
    I think removing qname-guessing from PEP 395 (and replacing it with
    instructive/google-able error messages) would be an unqualified improvement,
    independent of what happens to PEPs 382 and 402.
    Even if the "just do what I mean" part of the proposal in PEP 395 is
    replaced by a "Did you mean?" error message, PEP 382 still offers the
    superior user experience, since we could use runpy.split_path_module()
    to state the *exact* argument to -m that should have been used. Of
    course, that still wouldn't get sys.path[0] set correctly, so it isn't
    a given that it would really help.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • PJ Eby at Nov 17, 2011 at 12:10 am

    On Wed, Nov 16, 2011 at 5:41 PM, Nick Coghlan wrote:

    If the package directory markers are explicit (as they are now and as
    they are in PEP 382), then PEP 395 isn't guessing - the mapping from
    the filesystem layout to the Python module namespace is completely
    unambiguous, since the directory added as sys.path[0] will always be
    the first parent directory that isn't marked as a package directory:
    Sorry, but that's *still guessing*. Random extraneous __init__.py and
    subdirectories on sys.path can screw you over. For example, if I have a
    stray __init__.py in site-packages, does that mean that every module there
    is a submodule of a package called 'site-packages'?

    Sure, you could fix that problem by ignoring names with a '-', but that's
    just an illustration. The __init__.py idea was a very good attempt at
    solving the problem, but even in today's Python, it's still ambiguous and
    we should refuse to guess. (Because it will result in weird behavior
    that's *much* harder to debug.)

    Import aliasing detection and relative import errors, on the other hand,
    don't rely on guessing.


    Even if the "just do what I mean" part of the proposal in PEP 395 is
    replaced by a "Did you mean?" error message, PEP 382 still offers the
    superior user experience, since we could use runpy.split_path_module()
    to state the *exact* argument to -m that should have been used.

    No, what you get is just a *guess* as to the correct directory. (And you
    can make similar guesses under PEP 402, if a parent directory of the script
    is already on sys.path.)


    Of
    course, that still wouldn't get sys.path[0] set correctly, so it isn't
    a given that it would really help.
    Right; and if you already *have* a correct sys.path, then you can make just
    as good a guess under PEP 402.

    Don't get me wrong - I'm all in favor of further confusion-reduction (which
    is what PEP 402's about, after all). I'm just concerned that PEP 395 isn't
    really clear about the tradeoffs, in the same way that PEP 382 was unclear
    back when I started doing all those proposed revisions leading up to PEP
    402.

    That is, like early PEP 382, ISTM that it's an initial implementation
    attempt to solve a problem by patching over it, rather than an attempt to
    think through "how things are" and "how they ought to be". I think some of
    that sort of thinking ought to be done, to see if perhaps there's a better
    tradeoff to be had.

    For one thing, I wonder about the whole scripts-as-modules thing. In other
    scripting languages AFAICT it's not very common to have a script as a
    module; there's a pretty clear delineation between the two, because
    Python's about the only language with the name==main paradigm. In
    languages that have some sort of "main" paradigm, it's usually a specially
    named function or class method (Java) or whatever.

    So, I'm wondering a bit about the detailed use cases people have about
    using modules as scripts and vice versa. Are they writing scripts, then
    turning them into modules? Trying to run somebody else's modules? Copying
    example code from somewhere?

    (The part that confuses me is, if you *know* there's a difference between a
    script and a module, then presumably you either know about __name__, OR you
    wouldn't have any reason to run your module as a script. Conversely, if
    you don't know about __name__, then how would you conceive of making your
    script into a module? ISTM that in order to even have this problem you
    have to at least be knowledgeable enough to realize there's *some*
    difference between moduleness and scriptness.)

    Anyway, understanding the *details* of this process (of how people end up
    making the sort of errors PEP 395 aims to address) seems important to me
    for pinning down precisely what problem to solve and how.
  • Nick Coghlan at Nov 17, 2011 at 1:47 am

    On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote:
    So, I'm wondering a bit about the detailed use cases people have about using
    modules as scripts and vice versa. Are they writing scripts, then turning
    them into modules? Trying to run somebody else's modules? Copying example
    code from somewhere?
    (The part that confuses me is, if you *know* there's a difference between a
    script and a module, then presumably you either know about __name__, OR you
    wouldn't have any reason to run your module as a script. Conversely, if you
    don't know about __name__, then how would you conceive of making your script
    into a module? ISTM that in order to even have this problem you have to at
    least be knowledgeable enough to realize there's *some* difference between
    moduleness and scriptness.)
    Anyway, understanding the *details* of this process (of how people end up
    making the sort of errors PEP 395 aims to address) seems important to me for
    pinning down precisely what problem to solve and how.
    The module->script process comes from wanting to expose useful command
    line functionality from a Python module in a cross-platform way
    without any additional packaging effort (as exposing system-native
    scripts is a decidedly *non* trivial task, and also doesn't work from
    a source checkout).

    The genesis was actually the timeit module - "python -m timeit" is now
    the easiest way to run short benchmarking snippets.

    A variety of other standard library modules also offer useful "-m"
    functionality - "-m site" will dump diagnostic info regarding your
    path setup, "-m smtpd" will run up a local SMTP server, "-m unittest"
    and "-m doctest" can be used to run tests, "-m pdb" can be used to
    invoke the debugger, "-m pydoc" will run pydoc as usual. (A more
    comprehensive list is below, but it's also worth caveating this list
    with Raymond's comments on http://bugs.python.org/issue11260)
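    The "-m" invocations above are shell commands; as a rough sketch, the
    same thing can be driven from within Python via the subprocess module
    (the timeit arguments here are just an illustrative benchmark):

```python
import subprocess
import sys

# Equivalent of running "python -m timeit ..." at a shell prompt;
# sys.executable ensures we invoke the same interpreter version.
result = subprocess.run(
    [sys.executable, "-m", "timeit", "-n", "100", "-r", "1", "'x' * 100"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # e.g. "100 loops, best of 1: ... per loop"
```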

    Third party wise, I've mostly seen "-m" support used for "scripts that
    run scripts" - tools like pychecker, coverage and so forth are
    naturally Python version specific, and running them via -m rather than
    directly automatically deals with those scoping issues.

    It's also fairly common for test definition modules to support
    execution via "-m" (by invoking unittest.main() from an "if __name__"
    guarded suite).
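    A minimal sketch of such a test definition module (the module and test
    names are hypothetical):

```python
# test_example.py - supports "python -m unittest test_example" as well
# as direct execution via "python -m test_example".
import unittest

class TestArithmetic(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    # exit=False keeps the interpreter alive after the run; plain
    # unittest.main() would call sys.exit() with the result.
    unittest.main(exit=False)
```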

    Cheers,
    Nick.

    ====================
    Top level stdlib modules with meaningful "if __name__ == '__main__':" blocks:

    base64.py - CLI for base64 encoding/decoding
    calendar.py - CLI to display text calendars
    cgi.py - displays some example CGI output
    code.py - code-based interactive interpreter
    compileall.py - CLI for bytecode file generation
    cProfile.py - profile a script with cProfile
    dis.py - CLI for file disassembly
    doctest.py - CLI for doctest execution
    filecmp.py - CLI to compare directory contents
    fileinput.py - line numbered file display
    formatter.py - reformats text and prints to stdout
    ftplib.py - very basic CLI for FTP
    gzip.py - basic CLI for creation of gzip files
    imaplib.py - basic IMAP client (localhost only)
    imghdr.py - scan a directory looking for valid image headers
    mailcap.py - display system mailcap config info
    mimetypes.py - CLI for querying mimetypes (but appears broken)
    modulefinder.py - dump list of all modules referenced (directly or
    indirectly) from a Python file
    netrc.py - dump netrc config (I think)
    nntplib.py - basic CLI for nntp
    pdb.py - debug a script
    pickle.py - dumps the content of a pickle file
    pickletools.py - prettier dump of pickle file contents
    platform.py - display platform info (e.g.
    Linux-3.1.1-1.fc16.x86_64-x86_64-with-fedora-16-Verne)
    profile.py - profile a script with profile
    pstats.py - CLI to browse profile stats
    pydoc.py - same as the installed pydoc script
    quopri.py - CLI for quoted printable encoding/decoding
    runpy.py - Essentially an indirect way to do what -m itself already does
    shlex.py - runs the lexer over the specified file
    site.py - dumps path config information
    smtpd.py - local SMTP server
    sndhdr.py - scan a directory looking for valid audio headers
    sysconfig.py - dumps system configuration details
    tabnanny.py - CLI to scan files
    telnetlib.py - very basic telnet CLI
    timeit.py - CLI to time snippets of code
    tokenize.py - CLI to tokenize files
    turtle.py - runs turtle demo (appears to be broken in trunk, though)
    uu.py - CLI for UUencode encoding/decoding
    webbrowser.py - CLI to launch a web browser
    zipfile.py - basic CLI for zipfile creation and inspection

    Not sure (no help text, no clear purpose without looking at the code):
    aifc.py - dump info about AIFF files?
    codecs.py
    decimal.py
    difflib.py
    getopt.py - manual sanity check?
    heapq.py
    inspect.py
    keyword.py - only valid in source checkout
    macurl2path.py - manual sanity check?
    poplib.py - simple POP3 client?
    pprint.py
    pyclbr.py - dump classes defined in files?
    py_compile.py
    random.py - manual sanity check?
    smtplib.py
    sre_constants.py - broken on Py3k!
    symbol.py - only valid in source checkout, broken on Py3k
    symtable.py - manual sanity check?
    textwrap.py - manual sanity check?
    token.py - only valid in source checkout

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • PJ Eby at Nov 17, 2011 at 2:00 am

    On Wed, Nov 16, 2011 at 8:47 PM, Nick Coghlan wrote:
    On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote:
    So, I'm wondering a bit about the detailed use cases people have about using
    modules as scripts and vice versa. Are they writing scripts, then turning
    them into modules? Trying to run somebody else's modules? Copying example
    code from somewhere?
    (The part that confuses me is, if you *know* there's a difference between a
    script and a module, then presumably you either know about __name__, OR you
    wouldn't have any reason to run your module as a script. Conversely, if you
    don't know about __name__, then how would you conceive of making your script
    into a module? ISTM that in order to even have this problem you have to at
    least be knowledgeable enough to realize there's *some* difference between
    moduleness and scriptness.)
    Anyway, understanding the *details* of this process (of how people end up
    making the sort of errors PEP 395 aims to address) seems important to me for
    pinning down precisely what problem to solve and how.
    The module->script process comes from wanting to expose useful command
    line functionality from a Python module in a cross-platform way
    without any additional packaging effort (as exposing system-native
    scripts is a decidedly *non* trivial task, and also doesn't work from
    a source checkout).
    No, I mean how do the people who PEP 395 is supposed to be helping, find
    out that they even want to run a script as a module?

    Or are you saying that the central use case the PEP is aimed at is running
    stdlib modules? ;-)


    It's also fairly common for test definition modules to support
    execution via "-m" (by invoking unittest.main() from an "if __name__"
    guarded suite).
    Right... so are these modules not *documented* as being run by -m? Are
    people running them as scripts by mistake?

    I'm still not seeing how people end up making their own scripts into
    modules or vice versa, *without* some explicit documentation about the
    process. I mean, how do you even know that a file can be both, without
    realizing that there's a difference between the two?

    The most common confusion I've seen among newbies is the ones who don't
    grok that module != file. That is, they don't understand why you replace
    directory separators with '.' (which is how they think of it) or they want
    to use exec/runfile instead of import, or they expect import to run the
    code, or similar confusions of "file" and "module".

    However, I don't grok how people with *that* confusion would end up writing
    code that has a problem when run as a combination script/module, because
    they already think scripts and modules are the same thing and are rather
    unlikely to create a package in the first place.

    So who *is* PEP 395's target audience, and what is their mental model?
    That's the question I'd like to come to grips with before proposing a full
    solution.
  • Nick Coghlan at Nov 17, 2011 at 3:50 am

    On Thu, Nov 17, 2011 at 12:00 PM, PJ Eby wrote:
    So who *is* PEP 395's target audience, and what is their mental model?
    That's the question I'd like to come to grips with before proposing a full
    solution.
    OK, I realised that the problem I want to solve with this part of the
    PEP isn't limited to direct execution of scripts - it's a general
    problem with figuring out an appropriate value for sys.path[0] that
    also affects the interactive interpreter and the -m switch.

    The "mission statement" for this part of PEP 395 is then clearly
    stated as: the Python interpreter should *never* automatically place a
    Python package directory on sys.path.

    Adding package directories to sys.path creates undesirable aliasing
    that may lead to multiple imports of the same module under different
    names, unexpected shadowing of standard library (and other) modules
    and packages, and frequently confusing errors where a module works
    when imported but not when executed directly and vice-versa. Letting
    the import system get into that state without even a warning is
    letting an error pass silently and we shouldn't do it.
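    The dual-import aliasing described above can be reproduced in a few
    lines (a scratch-directory sketch; the package and module names are
    arbitrary):

```python
import os
import sys
import tempfile

# Build a tiny package in a scratch directory.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "package")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "mod.py"), "w") as f:
    f.write("value = object()\n")

# Putting both the project root *and* the package directory on sys.path
# is effectively what running a script from inside the package does.
sys.path[:0] = [root, pkg]

import mod          # found via the package directory
import package.mod  # found via the project root

# Same file on disk, two distinct module objects:
print(mod is package.mod)  # False
```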

    However, it's also true that, in many cases, this slight error in the
    import state is actually harmless, so *always* failing in this
    situation would be an unacceptable breach of backwards compatibility.
    While we could issue a warning and demand that the user fix it
    themselves (by invoking Python differently), there's no succinct way
    to explain what has gone wrong - it depends on a fairly detailed
    understanding of how the import system gets initialised. And, as noted,
    there isn't actually an easy mechanism for users to currently fix it
    themselves in the general case - using the -m switch means you also
    have to get the current working directory right, losing out on one of
    the main benefits of direct execution. And such a warning is assuredly
    useless if you actually ran the script by double-clicking it in a file
    browser...

    Accordingly, PEP 395 proposes that, when such a situation is
    encountered, Python should just use the nearest containing
    *non*-package directory as sys.path[0] rather than naively blundering
    ahead and corrupting the import system state, regardless of how the
    proposed value for sys.path[0] was determined (i.e. the current
    working directory or the location of a specific Python file). Any
    module that currently worked correctly in this situation should
    continue to work, and many others that previously failed (because they
    were inside packages) will start to work. The only new failures will
    be early detection of invalid filesystem layouts, such as
    "__init__.py" files in directories that are not valid Python package
    names, and scripts stored inside package directories that *only* work
    as scripts (effectively relying on the implicit relative imports that
    occur due to __name__ being set to "__main__").
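    With today's explicit __init__.py markers, the proposed sys.path[0]
    selection amounts to a simple walk up the directory tree; a sketch
    (the function name is mine, not part of the PEP):

```python
import os

def proposed_sys_path_entry(script_path):
    """Return the nearest non-package parent directory of a script.

    A sketch of the sys.path[0] selection PEP 395 proposes, assuming
    explicit __init__.py package markers.
    """
    directory = os.path.dirname(os.path.abspath(script_path))
    # Keep climbing while the directory is marked as a package.
    while os.path.exists(os.path.join(directory, "__init__.py")):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    return directory
```

    For the project/package/tests/test_foo.py layout discussed below, this
    returns the project directory, so direct execution could behave like
    "python -m package.tests.test_foo".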

    This problem most often arises during development (*not* after
    deployment), when developers either start python to perform some
    experiments, or place quick tests or sanity checks in "if __name__ ==
    '__main__':" blocks at the end of their modules (a common practice
    described in a number of Python tutorials; our own docs also
    recommend it for test modules:
    http://docs.python.org/library/unittest#basic-example).

    The classic example from Stack Overflow looked like this:

    project/
        package/
            __init__.py
            foo.py
            tests/
                __init__.py
                test_foo.py

    Currently, the *only* correct way to invoke test_foo is with "project"
    as the current working directory and the command "python -m
    package.tests.test_foo". Anything else (such as "python
    package/tests/test_foo.py", "./package/tests/test_foo.py", clicking the
    file in a file browser or, while in the tests directory, invoking
    "python test_foo.py", "./test_foo.py" or "python -m test_foo") will
    still *try* to run test_foo, but fail in a completely confusing
    manner.

    If test_foo uses absolute imports, then the error will generally be
    "ImportError: No module named package", if it uses explicit relative
    imports, then the error will be "ValueError: Attempted relative import
    in non-package". Neither of these is going to make any sense to a
    novice Python developer, but there isn't any obvious way to make those
    messages self-explanatory (they're completely accurate, they just
    involve a lot of assumed knowledge regarding how the import system
    works and sys.path gets initialised).
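    The failure is easy to reproduce by building that layout in a scratch
    directory and running test_foo.py directly, the way a novice would
    (note that modern Python 3 spells the error ModuleNotFoundError, a
    subclass of ImportError):

```python
import os
import subprocess
import sys
import tempfile

# Recreate the project/package/tests layout from the example.
root = tempfile.mkdtemp()
tests_dir = os.path.join(root, "project", "package", "tests")
os.makedirs(tests_dir)
for d in (os.path.join(root, "project", "package"), tests_dir):
    open(os.path.join(d, "__init__.py"), "w").close()
test_foo = os.path.join(tests_dir, "test_foo.py")
with open(test_foo, "w") as f:
    f.write("import package.foo\n")  # absolute import, as recommended

# Direct execution puts the tests directory itself at sys.path[0],
# so the absolute import of "package" cannot be resolved.
result = subprocess.run([sys.executable, test_foo],
                        capture_output=True, text=True)
print(result.stderr.strip().splitlines()[-1])
```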

    If foo.py is set up to invoke its own test suite:

    if __name__ == "__main__":
        import unittest
        from .tests import test_foo
        unittest.main(test_foo.__name__)

    Then you can get similarly confusing errors when attempting to run foo itself.

    However, those errors are relatively obvious compared to the
    AttributeErrors (and ImportErrors) that can arise if you get
    unexpected name shadowing. For example, suppose you have a helper
    module called "package.json" for dealing with JSON serialisation in
    your library, and you start an interactive session while in the
    package directory, or attempt to invoke 'foo.py' directly in order
    to run its test suite (as described above). Now "import json" is
    giving you the version from your package, even though that version is
    *supposed* to be safely hidden away inside your package namespace. By
    silently allowing a package directory onto sys.path, we're doing our
    users a grave disservice.
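    The json shadowing scenario can be demonstrated directly (another
    scratch-directory sketch; clearing the sys.modules cache stands in
    for a fresh interpreter started inside the package directory):

```python
import os
import sys
import tempfile

# A helper module named json.py, as it would sit inside the package.
pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "json.py"), "w") as f:
    f.write("is_local_helper = True\n")

# This is what silently happens when the package directory itself
# becomes sys.path[0].
sys.path.insert(0, pkg_dir)
sys.modules.pop("json", None)  # simulate a fresh interpreter

import json  # picks up the local helper, not the standard library
print(getattr(json, "is_local_helper", False))  # True
```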

    So my perspective is this: we're currently doing something by default
    that's almost guaranteed to be the wrong thing to do. There's a
    reasonably simple alternative that's almost always the *right* thing
    to do. So let's switch the default behaviour to get the common case
    right, and leave the confusing errors for the situations where
    something is actually *broken* (i.e. misplaced __init__.py files and
    scripts in package directories that are relying on implicit relative
    imports).

    And if that means requiring that package directories always be marked
    explicitly (either by an __init__.py file or by a ".pyp" extension)
    and forever abandoning the concepts in PEP 402, so be it.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Nick Coghlan at Nov 17, 2011 at 1:52 am

    On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote:
    On Wed, Nov 16, 2011 at 5:41 PM, Nick Coghlan wrote:

    If the package directory markers are explicit (as they are now and as
    they are in PEP 382), then PEP 395 isn't guessing - the mapping from
    the filesystem layout to the Python module namespace is completely
    unambiguous, since the directory added as sys.path[0] will always be
    the first parent directory that isn't marked as a package directory:
    Sorry, but that's *still guessing*. Random extraneous __init__.py and
    subdirectories on sys.path can screw you over. For example, if I have a
    stray __init__.py in site-packages, does that mean that every module there
    is a submodule of a package called 'site-packages'?
    Yes (although in that case, you'd error out, since the package name
    isn't valid).

    Errors should never pass silently - ignoring such a screw-up in their
    filesystem layout is letting an error pass silently and will most
    likely cause obscure problems further down the road.
    Sure, you could fix that problem by ignoring names with a '-', but that's
    just an illustration. The __init__.py idea was a very good attempt at
    solving the problem, but even in today's Python, it's still ambiguous and we
    should refuse to guess. (Because it will result in weird behavior that's
    *much* harder to debug.)
    Import aliasing detection and relative import errors, on the other hand,
    don't rely on guessing.
    Umm, if people screw up their filesystem layouts and *lie* to the
    interpreter about whether or not something is a package, how is that
    our fault? "Oh, they told me something, but they might not mean it, so
    I'll choose to ignore the information they've given me" is the part
    that sounds like guessing to me.

    If we error *immediately*, telling them what's wrong with their
    filesystem, that's the *opposite* of guessing.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • PJ Eby at Nov 17, 2011 at 5:48 am

    On Wed, Nov 16, 2011 at 8:52 PM, Nick Coghlan wrote:

    Umm, if people screw up their filesystem layouts and *lie* to the
    interpreter about whether or not something is a package, how is that
    our fault? "Oh, they told me something, but they might not mean it, so
    I'll choose to ignore the information they've given me" is the part
    that sounds like guessing to me.
    Er, what?

    They're not lying, they just made a mistake -- a mistake that could've
    occurred at any point during a project's development, which would then only
    surface later.

    As I said, I've seen projects where people had unnecessary __init__.py
    files floating around -- mainly because at some point they were trying
    anything and everything to get package imports to work correctly, and somewhere
    along the line decided to just put __init__.py files everywhere just to be
    "sure" that things would work. (i.e. the sort of behavior PEP 402 is
    supposed to make unnecessary.)


    If we error *immediately*, telling them what's wrong with their
    filesystem, that's the *opposite* of guessing.
    I'm all in favor of warning or erroring out on aliasing __main__ or
    relative imports from __main__. It's silently *succeeding* in doing
    something that might not have been intended on the basis of coincidental
    __init__.py placement that I have an issue with.

    There exist projects that *intentionally* alias their modules as both a
    package and non-package (*cough* PIL *cough*), to name just *one* kind of
    *intentionally* weird sys.path setups, not counting unintentional ones like
    I mentioned. The simple fact is that you cannot unambiguously determine
    the intended meaning of a given script, and you certainly can't do it
    *before* the script executes (because it might already be doing some
    sys.path munging of its own).

    Saying that people who made one kind of mistake or intentional change are
    lying, while a different set of people making mistakes deserve to have
    their mistake silently corrected doesn't seem to make much sense to me.
    But even if I granted that people with extra __init__.py's floating around
    should be punished for this (and I don't), this *still* wouldn't magically
    remove the existing ambiguity-of-intention in today's Python projects.
    Without some way for people to explicitly declare their intention (e.g.
    explicitly setting __qname__), you really have no way to definitely
    establish what the user's *intention* is. (Especially since the user who
    wrote the code and the user trying to use it might be different people....
    and sys.path might've been set up by yet another party.)

    IOW, it's ambiguous already, today, with or without 382, 402, or any other
    new PEP. (Heck, it was ambiguous before PEP 302 came around!)
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://mail.python.org/pipermail/import-sig/attachments/20111117/9145b51b/attachment-0001.html>
  • Nick Coghlan at Nov 17, 2011 at 7:00 am

    On Thu, Nov 17, 2011 at 3:48 PM, PJ Eby wrote:
    I'm all in favor of warning or erroring out on aliasing __main__ or relative
    imports from __main__. It's silently *succeeding* in doing something that
    might not have been intended on the basis of coincidental __init__.py
    placement that I have an issue with.
    This is the part I don't get - you say potentially unintentional
    success is bad, but you're ok with silently succeeding by *ignoring*
    the presence of an __init__.py file and hence performing implicit
    relative imports, exactly the behaviour that PEP 328 set out to
    eliminate.

    Currently, by default, a *correct* package layout breaks under direct
    execution. I am proposing that we make it work by preventing implicit
    relative imports from __main__, just as we do from any other module.

    As a consequence, scripts that already support direct execution from
    inside a package would need to be updated to use explicit relative
    imports in Python 3.3+, since their implicit relative imports will
    break, just as they already do when you attempt to import such a
    module. I'm happy to fix things for novices and put the burden of a
    workaround on the people that know what they're doing.

    The workaround:

    import sys

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer  # Implicit relative import
    else:
        from . import peer  # explicit relative import

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Discussion Overview
group: import-sig @ python.org
categories: python
posted: Nov 16, '11 at 6:29a
active: Nov 17, '11 at 7:00a
posts: 15
users: 3

3 users in discussion:
Nick Coghlan: 7 posts | PJ Eby: 5 posts | Eric Snow: 3 posts
