This reflects (I hope!) the discussions at PyCon. My plan is to produce
an implementation based on the importlib code, and then flesh out pieces
of the PEP.

In particular, I want to make sure the PEP addresses the various
objections that were raised, especially by Nick.

Eric.


  • Brett Cannon at Apr 19, 2012 at 9:08 pm

    On Thu, Apr 19, 2012 at 16:18, Eric V. Smith wrote:

    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    Obviously thanks for writing this up, Eric! I have the following comments
    (some of which I would fix myself but I lack hg repo access ATM) ...

    In Terminology, can you put the terms when you define them in quotes, e.g. 'The
    term "distribution" refers to ...'?

    "setuptools provides a similar function pkg_resources.declare_namespace"
    should either have a "named" added in there or a comma.

    "As vendors might chose(sic) to".

    You should mention that this will do away with the ImportWarning of
    discovering a directory lacking an __init__.py file.

    As for the effects on path hooks, there are none. =) It's actually the
    finders that they return which need to change. Either finders need to be
    updated to return something other than None to signal they have a directory
    which works for the name (maybe the string for what should go into
    __path__?) or another method on finders which is called if
    finder.find_module() returns None (like finder.find_namespace() which
    returns the directory name or None). Then you need to update
    importlib._bootstrap.PathFinder to handle one of the two approaches to
    create the module and set it with some __loader__ (which really doesn't
    need to do much more than construct a module with the proper attributes
    since there is nothing to execute) like
    importlib.machinery.NamespaceLoader(name, *paths). Using a specific class
    in import already has precedent thanks to NullImporter.

    If you want performance then you go with the returning of a string by
    finder.find_module() since the finder can keep track of finding a directory
    w/o an __init__.py when it tries looking for a module. Import can do a
    hasattr check on non-None return values to decide if it got back a loader
    or a path for a namespace. If you don't like the meaning of the return value
    depending on whether it is None or has a specific attribute, then you would
    want the new method at the (potential) cost of another stat call. Or maybe
    someone can think of some other approach.
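
    The string-return approach can be sketched as follows. This is only a
    minimal model of the idea discussed here, not importlib's actual code:
    DemoFinder, DemoLoader and find_in_path are hypothetical names, and the
    hasattr check looks for load_module() as the mark of a loader. A loader
    wins immediately; a string is recorded as a candidate namespace portion
    and the scan continues.

```python
import types


class DemoLoader:
    """Hypothetical loader: anything exposing load_module() counts as one."""

    def load_module(self, fullname):
        return types.ModuleType(fullname)


class DemoFinder:
    """Hypothetical finder whose find_module() returns a loader, a str, or None."""

    def __init__(self, result):
        self._result = result

    def find_module(self, fullname):
        return self._result


def find_in_path(fullname, finders):
    """Return a module from the first loader found, else a namespace module."""
    portions = []
    for finder in finders:
        result = finder.find_module(fullname)
        if result is None:
            continue  # this path entry knows nothing about the name
        if hasattr(result, "load_module"):
            return result.load_module(fullname)  # a real loader wins at once
        portions.append(result)  # a string: record the directory, keep scanning
    if not portions:
        raise ImportError("No module named " + repr(fullname))
    # Nothing to execute: just build a module with the proper attributes.
    module = types.ModuleType(fullname)
    module.__path__ = portions
    return module
```

    An unmodified finder only ever returns a loader or None, so it keeps
    working in this model; it simply never contributes a portion.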
  • Eric V. Smith at Apr 19, 2012 at 10:10 pm

    On 4/19/2012 5:08 PM, Brett Cannon wrote:
    In Terminology, can you put the terms when you define them in quotes,
    e.g. 'The term "distribution" refers to ...'?

    "setuptools provides a similar function pkg_resources.declare_namespace"
    should either have a "named" added in there or a comma.

    "As vendors might chose(sic) to".
    I've made these grammar changes. I'll update based on the rest of your
    comments tomorrow.

    Thanks!

    Eric.
  • Eric V. Smith at Apr 19, 2012 at 10:59 pm

    On 4/19/2012 5:08 PM, Brett Cannon wrote:

    You should mention that this will do away with the ImportWarning of
    discovering a directory lacking an __init__.py file.
    Done.
    As for the effects on path hooks, there are none. =) It's actually the
    finders that they return which need to change. Either finders need to be
    updated to return something other than None to signal they have a
    directory which works for the name (maybe the string for what should go
    into __path__?) or another method on finders which is called if
    finder.find_module() returns None (like finder.find_namespace() which
    returns the directory name or None). Then you need to update
    importlib._bootstrap.PathFinder to handle one of the two approaches to
    create the module and set it with some __loader__ (which really doesn't
    need to do much more than construct a module with the proper attributes
    since there is nothing to execute) like
    importlib.machinery.NamespaceLoader(name, *paths). Using a specific
    class in import already has precedent thanks to NullImporter.

    If you want performance then you go with the returning of a string by
    finder.find_module() since the finder can keep track of finding a
    directory w/o an __init__.py when it tries looking for a module. Import
    can do a hasattr check on non-None return values to decide if it got
    back a loader or a path for a namespace. If you don't like the meaning
    of the return value depending on whether it is None or has a specific
    attribute, then you would want the new method at the (potential) cost of
    another stat call. Or maybe someone can think of some other approach.
    Changing finder.find_module() to return a string seems the best thing to do.

    Barry and I (and hopefully Jason Coombs) are going to try and get
    together and sprint on this in the near future. I might wait to update
    the PEP on the effect on finders until we're done.

    Thanks again.

    Eric.
  • Eric Snow at Apr 19, 2012 at 9:21 pm

    On Thu, Apr 19, 2012 at 2:18 PM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    Nice work, Eric. PEP 420 is quite clear. I appreciate that not many
    words are spent on contrasting it with PEP 402. I agree that the PEP
    needs to be clear on Nick's concerns, one way or the other (especially
    as they relate to PEP 395). I don't recall any satisfactory
    resolution on that. Looking forward to hearing more on this.

    -eric

    p.s. how often do the PEPs get rebuilt? I saw the PEP as it came
    across the commits list, but it's not showing up on the site.
  • Nick Coghlan at Apr 20, 2012 at 3:56 am

    On Fri, Apr 20, 2012 at 6:18 AM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.
    This paragraph in the "Rationale" section is confusing:

    "Namespace packages need to be installed in one of two ways: either
    all portions of a namespace will be combined into a single directory
    (and therefore a single entry in sys.path), or each portion will be
    installed in its own directory (and each portion will have a distinct
    sys.path entry)."

    I would combine this with the following paragraph to make a single
    cohesive explanation of the problem that needs to be solved:

    "Namespace packages are designed to support being split across
    multiple directories (and hence found via multiple sys.path entries).
    In this configuration, it doesn't matter if multiple portions all
    provide an __init__.py file, so long as each portion correctly
    initialises the namespace package. However, Linux distribution vendors
    (amongst others) prefer to combine the separate portions and install
    them all into the *same* filesystem directory. This creates a
    potential for conflict, as the portions are now attempting to provide
    the *same* file on the target system - something that is not allowed
    by many package managers. Allowing implicit namespace packages means
    that the requirement to provide an __init__.py file can be dropped
    completely, and affected portions can be installed into a common
    directory or split across multiple directories as distributions see
    fit."
    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    Yep. I'm happy with the conclusions we reached in the previous
    discussion, but PEP 420 does need to describe them. Here's the gist of
    it for the four points listed:

    - for the first point, "practicality beats purity" pretty much carries
    the day as far as the Zen goes

    - for the second point, the minor backwards compatibility risks are
    acknowledged and accepted. My initial objection was based on a
    misunderstanding of the consensus proposal. Once it was clarified that
    the only "incompatibility" is that an import may now succeed where it
    previously would have failed, I was no longer concerned. In contrast
    to PEP 402, PEP 420 deliberately chooses to preserve consistent
    behaviour of "import foo; import foo.bar" and "import foo.bar; import
    foo", seeing that as being more important than preventing the
    successful import of an empty (or otherwise non-package) subdirectory
    of a sys.path location. This does mean some try/except import blocks
    may need to be updated to check the imported module or package for an
    expected attribute or subpackage rather than just checking that the
    import works, but has the major advantage of making the revised import
    model much cleaner and easier to understand.

    - the final two points will be addressed by having PEP 395 propose the
    production of better *error messages* rather than introducing any
    additional magic to the initialisation of sys.path[0] (see
    http://mail.python.org/pipermail/import-sig/2012-March/000442.html).
    The "are we in a package subdirectory?" heuristic mentioned in that
    message will be based on this suggestion from Eric Snow:
    http://mail.python.org/pipermail/import-sig/2012-March/000438.html
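
    The try/except adjustment mentioned in the second point might look like
    the sketch below. It assumes a Python with implicit namespace packages
    (3.3 or later); load_optional, "emptydir_demo" and the attribute name
    "main" are hypothetical and chosen only for illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical setup: a sys.path entry that contains an empty directory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "emptydir_demo"))
sys.path.insert(0, root)
importlib.invalidate_caches()


def load_optional(name, required_attr):
    """Import name, treating it as missing unless it has required_attr."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    if not hasattr(module, required_attr):
        # The import worked, but perhaps only because a bare directory
        # was picked up as a namespace package.
        return None
    return module


print(load_optional("emptydir_demo", "main"))
print(load_optional("json", "loads"))
```

    The import of the bare directory succeeds, so only the attribute check
    tells the caller that the optional dependency is not really there.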

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric V. Smith at Apr 20, 2012 at 10:21 am

    On 4/19/2012 11:56 PM, Nick Coghlan wrote:
    On Fri, Apr 20, 2012 at 6:18 AM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.
    This paragraph in the "Rationale" section is confusing:

    "Namespace packages need to be installed in one of two ways: either
    all portions of a namespace will be combined into a single directory
    (and therefore a single entry in sys.path), or each portion will be
    installed in its own directory (and each portion will have a distinct
    sys.path entry)."

    I would combine this with the following paragraph to make a single
    cohesive explanation of the problem that needs to be solved:

    "Namespace packages are designed to support being split across
    multiple directories (and hence found via multiple sys.path entries).
    In this configuration, it doesn't matter if multiple portions all
    provide an __init__.py file, so long as each portion correctly
    initialises the namespace package. However, Linux distribution vendors
    (amongst others) prefer to combine the separate portions and install
    them all into the *same* filesystem directory. This creates a
    potential for conflict, as the portions are now attempting to provide
    the *same* file on the target system - something that is not allowed
    by many package managers. Allowing implicit namespace packages means
    that the requirement to provide an __init__.py file can be dropped
    completely, and affected portions can be installed into a common
    directory or split across multiple directories as distributions see
    fit."
    That does read much better. Thanks.
    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    Yep. I'm happy with the conclusions we reached in the previous
    discussion, but PEP 420 does need to describe them. Here's the gist of
    it for the four points listed:
    <discussion deleted>

    I'll add these after I go back and re-read the original thread.

    Eric.
  • Nick Coghlan at Apr 20, 2012 at 4:04 am

    On Fri, Apr 20, 2012 at 6:18 AM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    One other thing I noticed: "There is no mechanism to recompute the
    __path__ once a namespace package has been created."

    This isn't really true - pkgutil.extend_path() can still be used to
    update a namespace package path. Perhaps change it to:

    "There is no mechanism to automatically recompute the __path__ if
    sys.path is altered after a namespace package has already been
    created. However, existing namespace utilities (like
    pkgutil.extend_path()) can be used to update them explicitly if
    desired."
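
    As a rough illustration of the explicit update, the sketch below calls
    pkgutil.extend_path() again after sys.path has grown. It relies on how
    extend_path() behaves on a current CPython 3, starts from a plain list
    rather than a real package's __path__, and uses a hypothetical package
    name and throwaway temporary directories.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

NAME = "nsdemo_portions"  # hypothetical namespace package name

# Two separate sys.path roots, each holding one portion (no __init__.py).
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
os.mkdir(os.path.join(root_a, NAME))
os.mkdir(os.path.join(root_b, NAME))

sys.path.append(root_a)
importlib.invalidate_caches()
path = pkgutil.extend_path([], NAME)  # picks up the portion under root_a

# sys.path changes later; nothing updates the list until we ask again.
sys.path.append(root_b)
importlib.invalidate_caches()
path = pkgutil.extend_path(path, NAME)  # now also has the portion under root_b

print(path)
```

    extend_path() returns a new list and leaves non-list arguments
    untouched, so the result has to be assigned back explicitly.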

    Also, as a general matter of readability, adding double backticks
    around attributes, functions and filenames to get them displayed in
    monospace can be quite helpful.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric V. Smith at Apr 20, 2012 at 10:15 am

    On 4/20/2012 12:04 AM, Nick Coghlan wrote:
    On Fri, Apr 20, 2012 at 6:18 AM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    One other thing I noticed: "There is no mechanism to recompute the
    __path__ once a namespace package has been created."

    This isn't really true - pkgutil.extend_path() can still be used to
    update a namespace package path. Perhaps change it to:

    "There is no mechanism to automatically recompute the __path__ if
    sys.path is altered after a namespace package has already been
    created. However, existing namespace utilities (like
    pkgutil.extend_path()) can be used to update them explicitly if
    desired."
    Done. Thanks!
    Also, as a general matter of readability, adding double backticks
    around attributes, functions and filenames to get them displayed in
    monospace can be quite helpful.
    Agreed. That's a work in progress.

    Eric.
  • PJ Eby at Apr 21, 2012 at 5:06 pm

    On Fri, Apr 20, 2012 at 12:04 AM, Nick Coghlan wrote:

    "There is no mechanism to automatically recompute the __path__ if
    sys.path is altered after a namespace package has already been
    created. However, existing namespace utilities (like
    pkgutil.extend_path()) can be used to update them explicitly if
    desired."
    Btw, was there ever an explicit rejection of the "namespace package
    __path__ is an auto-updating iterable instead of a list" approach, or did
    it even come up in the consensus discussion?
  • Eric Snow at Apr 21, 2012 at 6:49 pm

    On Sat, Apr 21, 2012 at 11:06 AM, PJ Eby wrote:
    Btw, was there ever an explicit rejection of the "namespace package __path__
    is an auto-updating iterable instead of a list" approach, or did it even
    come up in the consensus discussion?
    Pretty sure it didn't come up, but it sounds like Eric Smith has
    considered it. PEP 420 currently has this to say:

    "There is no mechanism to automatically recompute the __path__ if
    sys.path is altered after a namespace package has already been
    created. However, existing namespace utilities (like
    pkgutil.extend_path) can be used to update them explicitly if
    desired." [1]

    -eric

    [1] http://www.python.org/dev/peps/pep-0420/#id9
  • Martin v. Löwis at Apr 21, 2012 at 8:50 pm

    On 21.04.2012 at 20:49, Eric Snow wrote:
    On Sat, Apr 21, 2012 at 11:06 AM, PJ Eby wrote:
    Btw, was there ever an explicit rejection of the "namespace package __path__
    is an auto-updating iterable instead of a list" approach, or did it even
    come up in the consensus discussion?
    Pretty sure it didn't come up, but it sounds like Eric Smith has
    considered it.
    There was a sort of bulk-rejection of "fancy features", IIRC. It wasn't
    clear to us which of the many additional features of PEP 402 was really
    important to you, so the consensus was to start with the minimum, and
    extend as actual use cases become apparent.

    For some of the PEP 402 features, we identified "concurrent versions"
    as the use case (i.e. pkg_resources.require). The consensus was that
    this use case can be ignored.

    Eric is right that the specific question of a dynamic __path__ was not
    discussed.

    Regards,
    Martin
  • Eric V. Smith at Apr 22, 2012 at 1:06 am

    On 4/21/2012 4:50 PM, "Martin v. Löwis" wrote:
    On 21.04.2012 at 20:49, Eric Snow wrote:
    On Sat, Apr 21, 2012 at 11:06 AM, PJ Eby wrote:
    Btw, was there ever an explicit rejection of the "namespace package __path__
    is an auto-updating iterable instead of a list" approach, or did it even
    come up in the consensus discussion?
    What's the use case for this?
    Pretty sure it didn't come up, but it sounds like Eric Smith has
    considered it.
    There was a sort of bulk-rejection of "fancy features", IIRC. It wasn't
    clear to us which of the many additional features of PEP 402 was really
    important to you, so the consensus was to start with the minimum, and
    extend as actual use cases become apparent.
    I don't recall this issue specifically, but I agree with Martin that
    we're trying to start with a minimal feature set.
    Eric is right that the specific question of a dynamic __path__ was not
    discussed.
    Furthermore, given how __path__ is built, by one-at-a-time remembering
    the path entries that have a foo directory but no foo/__init__.py, I'm
    not sure how you'd turn that into some auto-updating iterable.

    Eric.
  • Nick Coghlan at Apr 22, 2012 at 5:26 am

    On Sun, Apr 22, 2012 at 11:06 AM, Eric V. Smith wrote:
    Furthermore, given how __path__ is built, by one-at-a-time remembering
    the path entries that have a foo directory but no foo/__init__.py, I'm
    not sure how you'd turn that into some auto-updating iterable.
    You just have to remember all your namespace packages somewhere and
    then use a list subclass that triggers a rescan whenever the contents
    change.

    Personally, I'm happier with the basic behaviour being that
    dynamically updating sys.path while the program is running can be a
    bit hit-or-miss in terms of what recognises the change.

    Longer term, rather than introducing magical side effects for sys.path
    manipulation, I think the better solution is to expose a more
    object-oriented API for manipulating the import system state that
    takes care of maintaining the state invariants, invalidating caches
    when appropriate and triggering updates to package __path__ entries.
    Hence, PEP 406 (currently deferred) and its import engine API.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • PJ Eby at Apr 22, 2012 at 9:10 pm

    On Sun, Apr 22, 2012 at 1:26 AM, Nick Coghlan wrote:
    On Sun, Apr 22, 2012 at 11:06 AM, Eric V. Smith wrote:
    Furthermore, given how __path__ is built, by one-at-a-time remembering
    the path entries that have a foo directory but no foo/__init__.py, I'm
    not sure how you'd turn that into some auto-updating iterable.
    You just have to remember all your namespace packages somewhere and
    then use a list subclass that triggers a rescan whenever the contents
    change.
    Not necessary if you set __path__ to an iterable that caches a tuple of its
    parent package __path__ (or sys.path), and compares that against the
    current value before iterating. If it's changed, you walk the parent and
    rescan, otherwise iterate over your cached value. I posted a sketch to
    Python-Dev the first time 402 discussion happened there.

    The consequences of making namespace package __path__ iterable are less
    problematic, I believe, than changing the type of sys.path: almost no code
    manipulates __path__ as anything but an iterable, and code that does is
    broken for namespace packages anyway, because accessing specific offsets
    won't give you what you think you're looking for. So you get noisy
    breakage instead of quiet breakage in such cases (as would happen with
    using lists for __path__).

    If for some reason you want to explicitly change a namespace package's
    __path__, you could just reset __path__ to list(__path__), and proceed from
    there -- which is the recommended idiom for using extend_path, anyway.


    Personally, I'm happier with the basic behaviour being that
    dynamically updating sys.path while the program is running can be a
    bit hit-or-miss in terms of what recognises the change.
    pkg_resources supports dynamic updating today, so the idea here was to make
    it possible to do away with that. (It only supports updating if it's the
    one doing the sys.path manipulation, however.)

    I think there should be *some* blessed API(s) to force the updating,
    though, even if it's not automatic or dynamic. extend_path() really isn't
    the right tool for the job.

    The main argument in favor of automatic updating is that it more closely
    matches naive expectations of users coming from other languages. (Although
    to be honest I'm not 100% certain that those other languages actually do
    change their lookups that dynamically.)

    Anyway, the sketch (using PEP 402's importer protocol; not updated for 420)
    was something like:

    from pkgutil import get_importer

    # _ImportLockContext is assumed to be the private import-lock context
    # manager from importlib._bootstrap; get_subpath() is the importer
    # method proposed in PEP 402.

    class VirtualPath:
        __slots__ = ('__name__', '_parent', '_last_seen', '_path')

        def __init__(self, name, parent_path):
            self.__name__ = name
            self._parent = parent_path
            self._path = self._last_seen = ()

        def _fail(self, *args, **kw):
            raise TypeError(self.__name__+" is a virtual package")

        # Anything that treats __path__ as a mutable list fails loudly.
        __getitem__ = __setitem__ = __delitem__ = _fail
        append = extend = insert = _fail

        def _calculate(self):
            with _ImportLockContext():
                parent = tuple(self._parent)
                # Rescan only if the parent path changed since the last look.
                if parent != self._last_seen:
                    items = []
                    name = self.__name__
                    for entry in parent:
                        importer = get_importer(entry)
                        if hasattr(importer, 'get_subpath'):
                            item = importer.get_subpath(name)
                            if item is not None:
                                items.append(item)
                    self._last_seen = parent
                    self._path = tuple(items)
                return self._path

        def __iter__(self):
            return iter(self._calculate())

        def __len__(self):
            return len(self._calculate())

        def __repr__(self):
            return "VirtualPath" + repr((self.__name__, self._parent))

        def __contains__(self, item):
            return item in self._calculate()


    Using these objects in place of lists for __path__ objects would then do
    the trick.

    (And of course, you'd want to change "Virtual" to "Namespace" throughout, I
    suppose. ;-) )
  • Michael Foord at Apr 22, 2012 at 11:51 pm

    On 19 April 2012 21:18, Eric V. Smith wrote:

    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    So a namespace package is a directory (tree) on sys.path. For a standard
    Python install how will these be installed?

    If you need to install "foo.bar" and "foo.baz" will distutils and packaging
    do the right thing? (And what specifically is the right thing for Python's
    own package management tools - merging the namespace packages or keeping
    them separate somehow?)

    setuptools creates a new directory for each installed package and adds this
    directory to sys.path using pth files. It's a bit of a hack, but it allows
    namespace packages to co-exist.

    Michael


  • PJ Eby at Apr 23, 2012 at 12:29 am
    On Sun, Apr 22, 2012 at 7:51 PM, Michael Foord wrote:
    On 19 April 2012 21:18, Eric V. Smith wrote:

    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    So a namespace package is a directory (tree) on sys.path. For a standard
    Python install how will these be installed?

    If you need to install "foo.bar" and "foo.baz" will distutils and
    packaging do the right thing? (And what specifically is the right thing for
    Python's own package management tools - merging the namespace packages or
    keeping them separate somehow?)
    I don't know about 3.x distutils or packaging specifically, but I do know
    that 2.x distutils will install packages compatibly with this approach if
    you list the child packages but NOT the namespace package in your setup.py.
    So if one distribution lists 'foo.bar' and the other lists 'foo.baz', but
    *neither* lists 'foo', then the subpackages will be installed without a
    foo/__init__.py, and that will make it work.

    If packaging and 3.x distutils inherit this behavior from the 2.x
    distutils, then that would be the simplest way to do it. (And if you
    install to different directories, the parts will get merged.)
  • Nick Coghlan at Apr 23, 2012 at 1:08 am

    On Mon, Apr 23, 2012 at 9:51 AM, Michael Foord wrote:
    On 19 April 2012 21:18, Eric V. Smith wrote:

    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    So a namespace package is a directory (tree) on sys.path. For a standard
    Python install how will these be installed?

    If you need to install "foo.bar" and "foo.baz" will distutils and packaging
    do the right thing? (And what specifically is the right thing for Python's
    own package management tools - merging the namespace packages or keeping
    them separate somehow?)
    <lib_dir>/site-packages/foo/bar
    <lib_dir>/site-packages/foo/baz

    The whole point of dropping the __init__.py file requirement is that
    merging the namespace portions becomes trivial, so you don't need to
    worry about sys.path hackery in the normal case - you can just install
    them into a common directory (adding it on install if it doesn't exist
    yet, removing it on uninstall if the only remaining contents are the
    __pycache__ subdirectory).

    However, for zipfile distribution, or running from a source checkout,
    you could instead provide them as <app_dir>/foo/bar and
    <app_dir>/foo/baz and they would still be accessible as "foo.bar" and
    "foo.baz". Basically, PEP 420 should mean that managing subpackages
    and submodules becomes a *lot* more like managing top level packages
    and modules.
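
    The split layout can be tried out with a small sketch like the one
    below, assuming a Python with implicit namespace packages (3.3 or
    later). "nsdemo_foo" is a hypothetical stand-in for "foo", the portions
    are plain modules rather than subpackages to keep it short, and
    make_portion is a helper invented for this example.

```python
import importlib
import os
import sys
import tempfile


def make_portion(package, module, source):
    """Create <tmp>/<package>/<module>.py with no __init__.py; return <tmp>."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, package))
    with open(os.path.join(root, package, module + ".py"), "w") as f:
        f.write(source)
    return root


# Two portions of the same namespace, each under its own sys.path entry.
sys.path.append(make_portion("nsdemo_foo", "bar", "WHO = 'bar'\n"))
sys.path.append(make_portion("nsdemo_foo", "baz", "WHO = 'baz'\n"))
importlib.invalidate_caches()

bar = importlib.import_module("nsdemo_foo.bar")
baz = importlib.import_module("nsdemo_foo.baz")
package = sys.modules["nsdemo_foo"]

print(bar.WHO, baz.WHO)
print(list(package.__path__))
```

    Copying both portion directories into one shared "nsdemo_foo" directory
    would give the merged, site-packages style layout with the same imports.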

    Agreed the packaging implications should be specified clearly in the
    PEP, though (especially the install/uninstall behaviour when namespace
    portions get merged into a single directory).

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Carl Meyer at Apr 28, 2012 at 12:52 am

    On 04/19/2012 02:18 PM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    One clarity issue in the PEP:

    "If the scan along the parent path completes without finding a module or
    package, then a namespace package is created."

    This seems incomplete, and should say something like:

    "If the scan along the parent path completes without finding a module or
    package, *but at least one directory was recorded,* then a namespace
    package is created."

    The current wording seems to imply that any failed import would always
    cause the creation of a namespace package with an empty __path__, which
    I presume is not the intent.
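
    A simplified model of the scan makes the distinction visible. This is a
    sketch of the rule as worded above, not the importlib implementation:
    scan() is a hypothetical function that only looks for __init__.py, a
    .py file, or a bare directory, and ignores other file types and finders.

```python
import os
import tempfile


def scan(name, parent_path):
    """Return ('package' | 'module', location) or ('namespace', portions)."""
    recorded = []
    for entry in parent_path:
        candidate = os.path.join(entry, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("package", candidate)  # regular package: stop scanning
        if os.path.isfile(candidate + ".py"):
            return ("module", candidate + ".py")  # plain module: stop scanning
        if os.path.isdir(candidate):
            recorded.append(candidate)  # possible portion: keep scanning
    if recorded:
        return ("namespace", recorded)
    # Nothing found and nothing recorded: a normal failed import.
    raise ImportError("No module named " + repr(name))


root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "foo"))  # a bare directory, no __init__.py
print(scan("foo", [root]))
```

    With nothing recorded the function raises ImportError instead of
    producing a namespace package with an empty __path__.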

    Carl
  • Eric V. Smith at Apr 28, 2012 at 10:27 am

    On 4/27/2012 8:52 PM, Carl Meyer wrote:
    On 04/19/2012 02:18 PM, Eric V. Smith wrote:
    This reflects (I hope!) the discussions at PyCon. My plan is to produce
    an implementation based on the importlib code, and then flesh out pieces
    of the PEP.

    In particular, I want to make sure the PEP addresses the various
    objections that were raised, especially by Nick.
    One clarity issue in the PEP:

    "If the scan along the parent path completes without finding a module or
    package, then a namespace package is created."

    This seems incomplete, and should say something like:

    "If the scan along the parent path completes without finding a module or
    package, *but at least one directory was recorded,* then a namespace
    package is created."

    The current wording seems to imply that any failed import would always
    cause the creation of a namespace package with an empty __path__, which
    I presume is not the intent.
    Completely agree. I changed "but" to "and", but otherwise used it
    as-is. It's checked in.

    Thanks!

    Eric.
  • Eric V. Smith at May 1, 2012 at 10:00 pm
    I'm working on finishing up the PEP 420 work. I think the PEP itself is
    complete. If you have any comments, please send them to me or this list.

    The implementation at features/pep-420 has been merged with the recent
    importlib changes to the 3.3 branch. I've implemented support in the
    import machinery itself, as well as modified the filesystem finder
    (FileFinder) and the zipimport finder.

    About the only question I have is: Is everyone okay with the changes to
    the finders, described in the PEP? Basically they now return a string in
    addition to a loader or None. If they return a string, then the string
    represents the path of a possible namespace package portion. The change
    is backward compatible: unmodified finders will just be unable to
    participate in a namespace package.
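
    To make the three possible return values concrete, here is a rough
    sketch of a directory-based finder following the protocol described
    above. SketchFinder, SketchLoader and demo() are hypothetical names
    used for illustration; this is not the FileFinder code from the
    features/pep-420 branch:

```python
import os
import tempfile


class SketchLoader:
    """Placeholder loader; a real one would also implement load_module()."""

    def __init__(self, path):
        self.path = path


class SketchFinder:
    """Finder for one directory: returns a loader, a str, or None."""

    def __init__(self, directory):
        self.directory = directory

    def find_module(self, fullname):
        tail = fullname.rpartition(".")[2]
        base = os.path.join(self.directory, tail)
        init = os.path.join(base, "__init__.py")
        if os.path.isfile(init):
            return SketchLoader(init)          # regular package
        if os.path.isfile(base + ".py"):
            return SketchLoader(base + ".py")  # plain module
        if os.path.isdir(base):
            return base                        # possible namespace portion
        return None                            # nothing found here


def demo():
    """Build a throwaway layout and report the type returned for each name."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "regular"))
        open(os.path.join(root, "regular", "__init__.py"), "w").close()
        os.makedirs(os.path.join(root, "portion"))
        finder = SketchFinder(root)
        return {name: type(finder.find_module(name)).__name__
                for name in ("regular", "portion", "absent")}
```

    A finder that never returns a string keeps working as before; it
    simply cannot contribute a portion to a namespace package.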

    Barry Warsaw, Jason Coombs, and I are sprinting this Thursday. We'll
    focus on adding tests, and maybe documentation if we have time. If
    anyone has any concerns I'd like to hear them before then so that we can
    work on addressing them.

    The changes themselves are very small. I think the diff is a total of
    maybe 40 lines of code. Yury Selivanov had mentioned backporting to 3.2
    (which I assume would be an unsupported-by-python-dev effort). I
    actually don't think it would be all that complicated.

    Eric.
  • Brett Cannon at May 2, 2012 at 2:22 am

    On Tue, May 1, 2012 at 6:00 PM, Eric V. Smith wrote:

    I'm working on finishing up the PEP 420 work. I think the PEP itself is
    complete. If you have any comments, please send them to me or this list.

    The implementation at features/pep-420 has been merged with the recent
    importlib changes to the 3.3 branch. I've implemented support in the
    import machinery itself, as well as modified the filesystem finder
    (FileFinder) and the zipimport finder.

    About the only question I have is: Is everyone okay with the changes to
    the finders, described in the PEP? Basically they now return a string in
    addition to a loader or None. If they return a string, then the string
    represents the path of a possible namespace package portion. The change
    is backward compatible: unmodified finders will just be unable to
    participate in a namespace package.
    I'm obviously okay with the change. =) So this email is just a +1 in support
    of this work and a thanks for coding it up and seeing this through!

    -Brett

    Barry Warsaw, Jason Coombs, and I are sprinting this Thursday. We'll
    focus on adding tests, and maybe documentation if we have time. If
    anyone has any concerns I'd like to hear them before then so that we can
    work on addressing them.

    The changes themselves are very small. I think the diff is a total of
    maybe 40 lines of code. Yury Selivanov had mentioned backporting to 3.2
    (which I assume would be an unsupported-by-python-dev effort). I
    actually don't think it would be all that complicated.
    Ignoring that the classes he would need to access are technically private,
    backporting should be no more than a subclass and an extra stat call by
    FileFinder if None is returned.

    -Brett

    Eric.


    _______________________________________________
    Import-SIG mailing list
    Import-SIG at python.org
    http://mail.python.org/mailman/listinfo/import-sig
  • Martin v. Löwis at May 2, 2012 at 7:17 am

    About the only question I have is: Is everyone okay with the changes to
    the finders, described in the PEP?
    It looks good to me. It's a somewhat surprising change, but I can see no
    flaw in it.

    Regards,
    Martin
  • Eric V. Smith at May 2, 2012 at 10:23 am

    On 5/2/2012 3:17 AM, "Martin v. Löwis" wrote:
    About the only question I have is: Is everyone okay with the changes to
    the finders, described in the PEP?
    It looks good to me. It's a somewhat surprising change, but I can see no
    flaw in it.
    Surprising in that any change to find_module is needed, or surprising
    that it now returns one of {None, loader, str}?

    If it's the latter: yeah, it's a little strange. But find_module knows
    something that the caller needs to be told. It seemed easiest to add
    another possible return type. Any other suggestions?

    Eric.
  • PJ Eby at May 2, 2012 at 5:06 pm

    On Wed, May 2, 2012 at 6:23 AM, Eric V. Smith wrote:

    If it's the latter: yeah, it's a little strange. But find_module knows
    something that the caller needs to be told. It seemed easiest to add
    another possible return type. Any other suggestions?
    It seems quite elegant to me.

    I do see one point of concern with the spec, though. At one point it says
    that finders must return a path without a trailing separator, but at
    another it says the package __file__ will contain a separator.

    This strikes me as inconsistent, and also incompatible with
    non-filesystem-based finder implementations. The import machinery *must
    not* assume that import path strings are filenames, so it is wrong for the
    import machinery to add a path separator that the finder did not include.

    IOW, I don't think the spec can assume or guarantee anything about the
    strings returned by finders: it MUST treat them as opaque strings. If this
    means that there can't be any meaningful __file__ for a namespace package,
    I think we will have to live with that.

    The only alternative I see is to delegate the string manipulation back to
    the finders, or to change the return value from a string to a (file, path)
    tuple, wherein 'file' is the value to be used as __file__, and 'path' is
    the value to be used in __path__.
  • Eric V. Smith at May 2, 2012 at 5:24 pm

    On 05/02/2012 01:06 PM, PJ Eby wrote:

    I do see one point of concern with the spec, though. At one point it
    says that finders must return a path without a trailing separator, but
    at another it says the package __file__ will contain a separator.

    This strikes me as inconsistent, and also incompatible with
    non-filesystem-based finder implementations. The import machinery *must
    not* assume that import path strings are filenames, so it is wrong for
    the import machinery to add a path separator that the finder did not
    include.

    IOW, I don't think the spec can assume or guarantee anything about the
    strings returned by finders: it MUST treat them as opaque strings. If
    this means that there can't be any meaningful __file__ for a namespace
    package, I think we will have to live with that.
    I've come to the same conclusion myself. I actually had a draft of the
    PEP that removed the word "directory", at which point it becomes obvious
    that you're adding a path separator to something that might not be a
    path name.
    The only alternative I see is to delegate the string manipulation back
    to the finders, or to change the return value from a string to a (file,
    path) tuple, wherein 'file' is the value to be used as __file__, and
    'path' is the value to be used in __path__.
    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I think it
    would be better to set __file__ to None. That would noisily break some
    code that isn't likely to work anyway.

    Eric.
  • Brett Cannon at May 2, 2012 at 5:53 pm

    On Wed, May 2, 2012 at 1:24 PM, Eric V. Smith wrote:
    On 05/02/2012 01:06 PM, PJ Eby wrote:

    I do see one point of concern with the spec, though. At one point it
    says that finders must return a path without a trailing separator, but
    at another it says the package __file__ will contain a separator.

    This strikes me as inconsistent, and also incompatible with
    non-filesystem-based finder implementations. The import machinery *must
    not* assume that import path strings are filenames, so it is wrong for
    the import machinery to add a path separator that the finder did not
    include.

    IOW, I don't think the spec can assume or guarantee anything about the
    strings returned by finders: it MUST treat them as opaque strings. If
    this means that there can't be any meaningful __file__ for a namespace
    package, I think we will have to live with that.
    I've come to the same conclusion myself. I actually had a draft of the
    PEP that removed the word "directory", at which point it becomes obvious
    that you're adding a path separator to something that might not be a
    path name.
    The only alternative I see is to delegate the string manipulation back
    to the finders, or to change the return value from a string to a (file,
    path) tuple, wherein 'file' is the value to be used as __file__, and
    'path' is the value to be used in __path__.
    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I think it
    would be better to set __file__ to None. That would noisily break some
    code that isn't likely to work anyway.

    Problem is that None for __file__ would be a unique use here. Frozen
    modules, for instance, typically say "<frozen>" for __file__. Now part of
    the reason (I suspect) this is done is that this was the only way to tell
    how the module was created, but with __loader__ now on all modules this is
    redundant. So perhaps this fake value for __file__ is just outdated and not
    worth perpetuating?

    I vote for using __file__ as None as suggested and having people infer how
    the module was created from __loader__.
  • PJ Eby at May 2, 2012 at 9:05 pm

    On Wed, May 2, 2012 at 1:24 PM, Eric V. Smith wrote:
    On 05/02/2012 01:06 PM, PJ Eby wrote:

    I do see one point of concern with the spec, though. At one point it
    says that finders must return a path without a trailing separator, but
    at another it says the package __file__ will contain a separator.

    This strikes me as inconsistent, and also incompatible with
    non-filesystem-based finder implementations. The import machinery *must
    not* assume that import path strings are filenames, so it is wrong for
    the import machinery to add a path separator that the finder did not
    include.

    IOW, I don't think the spec can assume or guarantee anything about the
    strings returned by finders: it MUST treat them as opaque strings. If
    this means that there can't be any meaningful __file__ for a namespace
    package, I think we will have to live with that.
    I've come to the same conclusion myself. I actually had a draft of the
    PEP that removed the word "directory", at which point it becomes obvious
    that you're adding a path separator to something that might not be a
    path name.
    The only alternative I see is to delegate the string manipulation back
    to the finders, or to change the return value from a string to a (file,
    path) tuple, wherein 'file' is the value to be used as __file__, and
    'path' is the value to be used in __path__.
    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I think it
    would be better to set __file__ to None. That would noisily break some
    code that isn't likely to work anyway.
    Either None or a missing attribute is fine with me. (One advantage to the
    missing attribute is that it fails at the exact point where the inspecting
    code needs fixing, whereas the None will get passed on to some other code
    before the error manifests itself.)
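
    A small sketch of the difference in where the failure shows up. The
    two module objects and the helper names are hypothetical; they only
    stand in for a namespace package under each of the two options:

```python
import os
import types


def read_location(module):
    # Step 1: the inspecting code reads the attribute.
    return module.__file__


def containing_dir(location):
    # Step 2: some other code later treats the value as a path.
    return os.path.dirname(location)


# A freshly created module object has no __file__ attribute at all.
missing = types.ModuleType("ns_missing")

# The alternative: the attribute exists but is None.
with_none = types.ModuleType("ns_none")
with_none.__file__ = None


def failure_point(module):
    """Report which step raises for the given module."""
    try:
        location = read_location(module)
    except AttributeError:
        return "step 1"
    try:
        containing_dir(location)
    except TypeError:
        return "step 2"
    return "no failure"
```

    With the missing attribute the error is raised in step 1, at the
    attribute access; with None it is raised only in step 2, once the
    value has been handed to path-handling code.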

    By the way, I finished reading the rest of the PEP, and with regard to
    auto-updating paths, I want to mention that it wasn't me who originally
    brought up issues about auto-update, it was someone on Python-Dev, and the
    use cases were discussed there. Also, I would challenge the argument about
    it being a major block to implementation, since the implementation is
    straightforward (and TONS simpler than setuptools' approach to the problem).

    More to the point, though, supporting auto-updates *later* is not really an
    option, since we'd be changing the rules on people, and invalidating
    whatever workarounds people come up with for manually updating the path.
    If namespace package __path__ objects start out as some other type than
    lists, then there's no change to trip anyone up later.

    I guess my point is that if we're not going to do auto-updates from the
    start, it's kind of going to rule it out in the long term as well, so if
    that's the intention it should be explicitly addressed. I don't want to
    see it just get ruled out by default due to not being done now, and then
    not being able to be done later.

    That's why my earlier question was about whether it had been discussed or
    not -- there was previous discussion on it in the 402 context, and it was
    left as an open issue pending BDFL comment on the basic idea of 402. Since
    then, the basic idea of treating init-less directories as namespace
    packages has been blessed, so now it's time to get the auto-updates
    yea-or-nay question ruled on as well.

    The implementation is pretty trivial; see PEP 402 version of it here:

    http://mail.python.org/pipermail/import-sig/2012-April/000473.html

    ...and the PEP 420 version is even simpler, since instead of looking for a
    'get_subpath()' method on the finders, it should just call find_module()
    and check for a string return.
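
    For illustration only, a rough sketch of what a non-list __path__
    object along these lines might look like. DynamicNamespacePath and
    fake_find_module are hypothetical names, and the lookup function
    stands in for calling find_module() on the finder for each parent
    path entry; this is not the code from the linked message:

```python
class DynamicNamespacePath:
    """Iterable __path__ stand-in that recomputes when the parent path changes."""

    def __init__(self, name, get_parent_path, find_module):
        self._name = name
        self._get_parent_path = get_parent_path   # e.g. lambda: sys.path
        self._find_module = find_module           # (entry, name) -> loader/str/None
        self._seen_parent = None
        self._portions = []

    def _current(self):
        parent = tuple(self._get_parent_path())
        if parent != self._seen_parent:
            # Parent path changed: rescan and keep only string returns,
            # which mark possible namespace portions.
            found = (self._find_module(entry, self._name) for entry in parent)
            self._portions = [p for p in found if isinstance(p, str)]
            self._seen_parent = parent
        return self._portions

    def __iter__(self):
        return iter(self._current())

    def __len__(self):
        return len(self._current())


def fake_find_module(entry, name):
    # Hypothetical stand-in for finder.find_module() on each path entry.
    known = {("a", "foo"): "a/foo", ("b", "foo"): "b/foo"}
    return known.get((entry, name))


parent_path = ["a"]
foo_path = DynamicNamespacePath("foo", lambda: parent_path, fake_find_module)
```

    Appending "b" to parent_path after the object is created makes the
    next iteration of foo_path include "b/foo" as well, without anyone
    updating the namespace package by hand.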
  • Eric V. Smith at May 3, 2012 at 12:58 am

    On 5/2/2012 5:05 PM, PJ Eby wrote:

    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I think it
    would be better to set __file__ to None. That would noisily break some
    code that isn't likely to work anyway.


    Either None or a missing attribute is fine with me. (One advantage to
    the missing attribute is that it fails at the exact point where the
    inspecting code needs fixing, whereas the None will get passed on to
    some other code before the error manifests itself.)
    I can go either way on this, but would lean toward __file__ not being
    set. Brett: what's your opinion?
    By the way, I finished reading the rest of the PEP, and with regard to
    auto-updating paths, I want to mention that it wasn't me who originally
    brought up issues about auto-update, it was someone on Python-Dev, and
    the use cases were discussed there. Also, I would challenge the
    argument about it being a major block to implementation, since the
    implementation is straightforward (and TONS simpler than setuptools'
    approach to the problem).

    I guess my point is that if we're not going to do auto-updates from the
    start, it's kind of going to rule it out in the long term as well, so if
    that's the intention it should be explicitly addressed. I don't want to
    see it just get ruled out by default due to not being done now, and then
    not being able to be done later.
    Okay. I'll take a look at it tomorrow to see what's involved and if
    we're backing ourselves into a corner or not.

    Thanks.

    Eric.
  • Barry Warsaw at May 3, 2012 at 1:23 am

    On May 02, 2012, at 08:58 PM, Eric V. Smith wrote:
    On 5/2/2012 5:05 PM, PJ Eby wrote:

    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I think it
    would be better to set __file__ to None. That would noisily break some
    code that isn't likely to work anyway.


    Either None or a missing attribute is fine with me. (One advantage to
    the missing attribute is that it fails at the exact point where the
    inspecting code needs fixing, whereas the None will get passed on to
    some other code before the error manifests itself.)
    I can go either way on this, but would lean toward __file__ not being
    set. Brett: what's your opinion?
    I rather like __file__ not existing, although I haven't really thought about
    the practical effects. PJE makes a good argument though.

    -Barry
  • PJ Eby at May 3, 2012 at 4:37 am

    On Wed, May 2, 2012 at 9:23 PM, Barry Warsaw wrote:
    On May 02, 2012, at 08:58 PM, Eric V. Smith wrote:
    On 5/2/2012 5:05 PM, PJ Eby wrote:

    I don't see the value of __file__ at all in the case of namespace
    packages. If it's just a hint that it's a namespace package, I
    think it
    would be better to set __file__ to None. That would noisily break
    some
    code that isn't likely to work anyway.


    Either None or a missing attribute is fine with me. (One advantage to
    the missing attribute is that it fails at the exact point where the
    inspecting code needs fixing, whereas the None will get passed on to
    some other code before the error manifests itself.)
    I can go either way on this, but would lean toward __file__ not being
    set. Brett: what's your opinion?
    I rather like __file__ not existing, although I haven't really thought
    about
    the practical effects. PJE makes a good argument though.
    There's a counterargument that I realized later: PEP 302 currently requires
    that __file__ be set, AND that it be a string. "The privilege of not
    having a __file__ attribute at all is reserved for built-in modules."

    (Of course, that argues equally against __file__ being None, so I'm not
    sure it helps any to point that out!)

    Still, code that expects to do something with a package's __file__ is
    *going* to break somehow with a namespace package, so it's probably better
    for it to break sooner rather than later.
  • Nick Coghlan at May 3, 2012 at 6:23 am

    On Thu, May 3, 2012 at 2:37 PM, PJ Eby wrote:
    Still, code that expects to do something with a package's __file__ is
    *going* to break somehow with a namespace package, so it's probably better
    for it to break sooner rather than later.
    My own preference is for markers like "<frozen>", "<namespace>" and "<builtin>".

    They're significantly nicer to deal with when dumping module state for
    diagnostic purposes. If I get a KeyError on __file__, or an
    AttributeError on NoneType when all I'm trying to do is display data,
    it's annoying.

    Standardising on a pattern also opens up the possibility of doing
    something meaningful with it in get_data() later. One of the
    guarantees of PEP 302 is that you should be able to do this:

    data_ref = os.path.join(__file__, relative_ref)
    data = __loader__.get_data(data_ref)

    That should really only blow up in get_data(), *not* on the
    os.path.join step. Ideally, you should also be able to do this:

    data_ref = os.path.join(mod.__file__, relative_ref)
    data = mod.__loader__.get_data(data_ref)

    I see it as being similar to the mandatory file attribute on code
    objects - placeholders like "<stdin>" and "<string>" are a lot more
    informative when errors occur than just using None, even though
    neither of them is a valid filesystem path.
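
    A quick sketch of the behaviour described above, assuming a
    hypothetical in-memory loader whose get_data() raises OSError for
    unknown references. The names are illustrative and the "<namespace>"
    marker is only the proposal under discussion:

```python
import os


class SketchLoader:
    """Hypothetical loader that serves data from an in-memory dict."""

    def __init__(self, files):
        self._files = files

    def get_data(self, path):
        try:
            return self._files[path]
        except KeyError:
            raise OSError("no data for {!r}".format(path))


def where_it_fails(file_attr, loader, relative_ref):
    """Return which step raises when __file__ has the given value."""
    try:
        data_ref = os.path.join(file_attr, relative_ref)
    except TypeError:
        return "os.path.join"
    try:
        loader.get_data(data_ref)
    except OSError:
        return "get_data"
    return "nowhere"


loader = SketchLoader({})
```

    With the placeholder string the join succeeds and the failure comes
    from get_data(); with None the join itself raises TypeError.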

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Brett Cannon at May 3, 2012 at 2:48 pm

    On Thu, May 3, 2012 at 2:23 AM, Nick Coghlan wrote:
    On Thu, May 3, 2012 at 2:37 PM, PJ Eby wrote:
    Still, code that expects to do something with a package's __file__ is
    *going* to break somehow with a namespace package, so it's probably better
    for it to break sooner rather than later.
    I'm going to roll my replies all into this email to keep things simple.

    So, to the people not wanting to set __file__, that (probably) won't fly
    because it has been documented for years that built-in modules are the only
    things that don't define __file__. Or we at least need to explain to people
    how to tell the difference in a backwards-compatible fashion (e.g.
    ``module.__name__ in sys.builtin_module_names``).
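
    As a sketch of that backwards-compatible check, assuming __file__
    were left unset on namespace packages (still an open question at
    this point in the thread). The helper names and the classification
    strings are hypothetical:

```python
import os
import sys


def is_builtin(module):
    # Built-in modules are listed by name in sys.builtin_module_names.
    return module.__name__ in sys.builtin_module_names


def classify(module):
    """Tell built-ins apart from other modules that lack __file__."""
    if is_builtin(module):
        return "built-in"
    if getattr(module, "__file__", None) is None:
        # Under the proposal being discussed, this could be a namespace package.
        return "no __file__, not built-in"
    return "has __file__"
```

    Code that used "no __file__" as a test for built-in modules would
    switch to is_builtin() and treat the remaining case separately.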

    My own preference is for markers like "<frozen>", "<namespace>" and
    "<builtin>".
    So I would have said that had experience with the stdlib not big me on
    this. In my situation, the trace module was checking file, and if __file__
    didn't contain "<frozen>" or "<doctest" it would try to read it as a path,
    and then error out if it couldn't open the file. Now I updated it to
    startswith('<') and endswith('>'), but I wonder how many people made a
    similar whitelist approach. And while having __file__ to None or
    non-existent will take about the same amount of time to fix, it is less
    prone to silly whitelisting like what the trace module had.
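
    The two styles of check can be sketched as follows. These functions
    are a paraphrase of the description above, not the actual trace
    module source, and "<namespace>" is just the marker proposed earlier
    in the thread:

```python
def is_real_path_whitelist(filename):
    # Older style: only specific known markers are recognised as non-paths.
    return not ("<frozen>" in filename or "<doctest" in filename)


def is_real_path_generic(filename):
    # Updated style: anything of the form <...> is treated as a placeholder.
    return not (filename.startswith("<") and filename.endswith(">"))
```

    A whitelist written before a new marker existed reports
    "<namespace>" as a real path and then fails when it tries to open
    it; the generic check does not have that problem.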

    They're significantly nicer to deal with when dumping module state for
    diagnostic purposes. If I get a KeyError on __file__, or an
    AttributeError on NoneType when all I'm trying to do is display data,
    it's annoying.

    Standardising on a pattern also opens up the possibility of doing
    something meaningful with it in get_data() later. One of the
    guarantees of PEP 302 is that you should be able to do this:

    data_ref = os.path.join(__file__, relative_ref)
    data = __loader__.get_data(data_ref)

    That should really only blow up in get_data(), *not* on the
    os.path.join step. Ideally, you should also be able to do this:

    data_ref = os.path.join(mod.__file__, relative_ref)
    data = mod.__loader__.get_data(data_ref)

    I see it as being similar to the mandatory file attribute on code
    objects - placeholders like "<stdin>" and "<string>" are a lot more
    informative when errors occur than just using None, even though
    neither of them is a valid filesystem path.
    But that's because there are no other introspection options to tell where
    the module originated, unlike modules which have __loader__.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Brett Cannon at May 3, 2012 at 3:09 pm
    On Thu, May 3, 2012 at 10:48 AM, Brett Cannon wrote:
    On Thu, May 3, 2012 at 2:23 AM, Nick Coghlan wrote:
    On Thu, May 3, 2012 at 2:37 PM, PJ Eby wrote:
    Still, code that expects to do something with a package's __file__ is
    *going* to break somehow with a namespace package, so it's probably better
    for it to break sooner rather than later.
    I'm going to roll my replies all into this email to keep things simple.

    So, to the people not wanting to set __file__, that (probably) won't fly
    because it has been documented for years that built-in modules are the only
    things that don't define __file__. Or we at least need to explain to people
    how to tell the difference in a backwards-compatible fashion (e.g.
    ``module.__name__ in sys.builtin_module_names``).

    My own preference is for markers like "<frozen>", "<namespace>" and
    "<builtin>".
    So I would have said that had experience with the stdlib not big me on
    this.
    That should say "So I would have agreed with that had my experience with
    the stdlib in bootstrapping importlib not caused me to disagree."

    Don't try to multi-task at work while in the middle of writing an email is
    the lesson there. =)

    -Brett

  • Barry Warsaw at May 3, 2012 at 4:15 pm

    On May 03, 2012, at 10:48 AM, Brett Cannon wrote:
    So, to the people not wanting to set __file__, that (probably) won't fly
    because it has been documented for years that built-in modules are the only
    things that don't define __file__.
    Okay, but *why* is this the rule, other than that PEP 302 says it? IOW, PEP
    302 doesn't give much of a rationale for the rule, and I suspect it just
    reflected the reality back in 2002.
    Or we at least need to explain to people how to tell the difference in a
    backwards-compatible fashion.
    Definitely, and I think that would be fine to include in PEP 420.
    So I would have said that had experience with the stdlib not big me on
    this. In my situation, the trace module was checking file, and if __file__
    didn't contain "<frozen>" or "<doctest" it would try to read it as a path,
    and then error out if it couldn't open the file. Now I updated it to
    startswith('<') and endswith('>'), but I wonder how many people made a
    similar whitelist approach. And while having __file__ to None or
    non-existent will take about the same amount of time to fix, it is less
    prone to silly whitelisting like what the trace module had.
    See what I mean about arbitrary and underdocumented? :)

    Cheers,
    -Barry
  • Brett Cannon at May 3, 2012 at 4:49 pm

    On Thu, May 3, 2012 at 12:15 PM, Barry Warsaw wrote:
    On May 03, 2012, at 10:48 AM, Brett Cannon wrote:

    So, to the people not wanting to set __file__, that (probably) won't fly
    because it has been documented for years that built-in modules are the only
    things that don't define __file__.
    Okay, but *why* is this the rule, other than that PEP 302 says it? IOW,
    PEP
    302 doesn't give much of a rationale for the rule, and I suspect it just
    reflected the reality back in 2002.
    Exactly. I am willing to bet that historically it's just because that was
    the only way you could tell what was or was not a built-in module.

    Or we at least need to explain to people how to tell the difference in a
    backwards-compatible fashion.
    Definitely, and I think that would be fine to include in PEP 420.
    So I would have said that had experience with the stdlib not big me on
    this. In my situation, the trace module was checking file, and if __file__
    didn't contain "<frozen>" or "<doctest" it would try to read it as a path,
    and then error out if it couldn't open the file. Now I updated it to
    startswith('<') and endswith('>'), but I wonder how many people made a
    similar whitelist approach. And while having __file__ to None or
    non-existent will take about the same amount of time to fix, it is less
    prone to silly whitelisting like what the trace module had.
    See what I mean about arbitrary and underdocumented? :)
    Don't remind me about "arbitrary and underdocumented" when it comes to
    the import system. =P

    -Brett

    Cheers,
    -Barry

  • Martin at May 4, 2012 at 12:11 am

    Zitat von Barry Warsaw <barry at python.org>:
    On May 03, 2012, at 10:48 AM, Brett Cannon wrote:

    So, to the people not wanting to set __file__, that (probably) won't fly
    because it has been documented for years that built-in modules are the only
    things that don't define __file__.
    Okay, but *why* is this the rule, other than that PEP 302 says it?
    I think it predates PEP 302 by a decade or so. You might also ask why
    the keyword is "def", and not "define" (other than that the Grammar says
    so). It's a natural thing, also: If the module comes from the file system,
    it has an __file__ attribute, else it's built-in.

    Regards,
    Martin
  • Barry Warsaw at May 4, 2012 at 2:51 pm

    On May 04, 2012, at 02:11 AM, martin at v.loewis.de wrote:
    I think it predates PEP 302 by a decade or so. You might also ask why
    the keyword is "def", and not "define" (other than that the Grammar says
    so). It's a natural thing, also: If the module comes from the file system,
    it has an __file__ attribute, else it's built-in.
    Sure, that makes sense in a 2002 world where we didn't have importlib and all
    the modernization of the import system. Today, it's not only antiquated, it's
    also not necessarily true. We're already significantly overhauling the import
    machinery, so I think it's entirely reasonable to relax this constraint.

    See my previous post for a proposal.

    -Barry
  • Paul Moore at May 4, 2012 at 3:16 pm

    On 4 May 2012 15:51, Barry Warsaw wrote:
    On May 04, 2012, at 02:11 AM, martin at v.loewis.de wrote:

    I think it predates PEP 302 by a decade or so. You might also ask why
    the keyword is "def", and not "define" (other than that the Grammar says
    so). It's a natural thing, also: If the module comes from the file system,
    it has an __file__ attribute, else it's built-in.
    Sure, that makes sense in a 2002 world where we didn't have importlib and all
    the modernization of the import system. Today, it's not only antiquated, it's
    also not necessarily true. We're already significantly overhauling the import
    machinery, so I think it's entirely reasonable to relax this constraint.
    When we wrote PEP 302, so much code assumed that modules lived in the
    filesystem that we had very little room for manoeuvre. One of the
    goals of PEP 302 (in my mind, at least) was to disrupt the mindset
    that assumed this. Now, Brett's implementation of importlib has made
    that a reality - code that assumes modules live in a filesystem should
    have a really good justification for doing so (and document the
    limitation, ideally). I suspect you'll still break a reasonable amount
    of code like this, but that's probably OK, as it's less of a breakage,
    and more of a case of the existing code not anticipating cases that
    never existed before.
    See my previous post for a proposal.
    +1 and I'd also explicitly allow for loaders to assign other "private"
    metadata as well as __file__, if only to avoid the spectre of __file__
    being a base64-encoded pickled object :-)

    I wonder whether treating repr specially is the best way, though -
    maybe have a loader method "code_location" which is defined as being a
    human-readable, but otherwise unspecified string. The key use case is
    for repr, but it might be useful elsewhere (IDE tooltips or some such
    usage spring to mind).

    Paul.
  • Barry Warsaw at May 4, 2012 at 7:52 pm

    On May 04, 2012, at 04:16 PM, Paul Moore wrote:
    +1 and I'd also explicitly allow for loaders to assign other "private"
    metadata as well as __file__, if only to avoid the spectre of __file__
    being a base64-encoded pickled object :-)
    That's in PEP 420 now too.
    I wonder whether treating repr specially is the best way, though -
    maybe have a loader method "code_location" which is defined as being a
    human-readable, but otherwise unspecified string. The key use case is
    for repr, but it might be useful elsewhere (IDE tooltips or some such
    usage spring to mind).
    Maybe, but I think this is the simplest thing possible, which solves an
    existing use case. :)

    -Barry

  • Nick Coghlan at May 3, 2012 at 10:20 pm
    I'd still prefer to just officially bless the existing "<whatever>"
    convention for non-filesystem imports over encouraging type checks on
    __loader__ or defining a new introspection interface for loaders.

    If we say "this is the stdlib convention" people are going to start using
    the same check as is now used in traceback.py

    The precedent is there with code objects, and I think it's a good example
    to follow.

    Cheers,
    Nick.

    --
    Sent from my phone, thus the relative brevity :)
  • Guido van Rossum at May 3, 2012 at 10:43 pm
    +1
    On Thu, May 3, 2012 at 3:20 PM, Nick Coghlan wrote:
    I'd still prefer to just officially bless the existing "<whatever>"
    convention for non-filesystem imports over encouraging type checks on
    __loader__ or defining a new introspection interface for loaders.

    If we say "this is the stdlib convention" people are going to start using
    the same check as is now used in traceback.py

    The precedent is there with code objects, and I think it's a good example to
    follow.

    Cheers,
    Nick.

    --
    Sent from my phone, thus the relative brevity :)




    --
    --Guido van Rossum (python.org/~guido)
  • PJ Eby at May 4, 2012 at 12:05 am

    On Thu, May 3, 2012 at 6:20 PM, Nick Coghlan wrote:

    I'd still prefer to just officially bless the existing "<whatever>"
    convention for non-filesystem imports over encouraging type checks on
    __loader__ or defining a new introspection interface for loaders.

    If we say "this is the stdlib convention" people are going to start using
    the same check as is now used in traceback.py

    The precedent is there with code objects, and I think it's a good example
    to follow.
    Note that this messes with the idea of using the first directory as
    filename -- anybody who joins with os.path.dirname(__file__) is going to
    get a mess (on regular filesystem paths), which is (I'm guessing) why the
    trailing separator idea was proposed in the first place.
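
To make the os.path.dirname(__file__) concern concrete, here is a small sketch. The paths are invented for illustration, and posixpath is used directly so the results do not depend on the platform. It compares a regular module, a bare directory used as __file__, the same directory with a trailing separator, and an angle-bracket placeholder:

```python
import posixpath  # POSIX rules on every platform, so the results are stable


def sibling_path(file_value, name):
    """Mimic the common os.path.join(os.path.dirname(__file__), name) idiom."""
    return posixpath.join(posixpath.dirname(file_value), name)


# An ordinary package: the idiom finds a file next to __init__.py.
regular = sibling_path('/site/pkg/__init__.py', 'data.txt')

# A directory used as __file__ without a trailing separator:
# dirname() strips the package directory itself.
bare_directory = sibling_path('/site/pkg', 'data.txt')

# The same directory with a trailing separator keeps the package directory.
trailing_separator = sibling_path('/site/pkg/', 'data.txt')

# An angle-bracket placeholder: dirname() is empty, so the result is a
# relative path that depends on the current working directory.
placeholder = sibling_path('<namespace>', 'data.txt')
```

Only the first and third cases point inside the package directory. The other two yield a plausible-looking but wrong path instead of an immediate error.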

    Which kind of brings us full circle on that point. I suppose we could just
    say screw it, anybody implementing VFS importers had darn well better
    understand os.path.join and friends, since PEP 302 requires it for get_data
    anyway.

    Still seems like a wart, but oh well. OTOH, maybe it's better for people
    munging __file__ to get a weird error all the time with namespace packages,
    instead of something that works some of the time, and fails later?
  • Nick Coghlan at May 4, 2012 at 1:05 am

    On Fri, May 4, 2012 at 10:05 AM, PJ Eby wrote:
    On Thu, May 3, 2012 at 6:20 PM, Nick Coghlan wrote:

    I'd still prefer to just officially bless the existing "<whatever>"
    convention for non-filesystem imports over encouraging type checks on
    __loader__ or defining a new introspection interface for loaders.

    If we say "this is the stdlib convention" people are going to start using
    the same check as is now used in traceback.py

    The precedent is there with code objects, and I think it's a good example
    to follow.
    Note that this messes with the idea of using the first directory as filename
    -- anybody who joins with os.path.dirname(__file__) is going to get a mess
    (on regular filesystem paths), which is (I'm guessing) why the trailing
    separator idea was proposed in the first place.

    Which kind of brings us full circle on that point. I suppose we could just
    say screw it, anybody implementing VFS importers had darn well better
    understand os.path.join and friends, since PEP 302 requires it for get_data
    anyway.
    Yep. It also means VFS importers are officially free to put all the
    metadata they want inside the angle brackets, secure in the knowledge
    that everyone else should be treating it as an opaque blob. It then
    becomes a way for them to pass necessary info to get_data() *without*
    having to create distinct loader instances for every module.

    Arguably, we should also be adding the angle brackets in zipimporter
    (since those aren't real filesystem paths).
    Still seems like a wart, but oh well. OTOH, maybe it's better for people
    munging __file__ to get a weird error all the time with namespace packages,
    instead of something that works some of the time, and fails later?
    Right. Otherwise we'd get layout dependent behaviour where dubious
    cross-portion references worked if all portions were installed to the
    same path segment, but then failed if they were split across multiple
    segments.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric V. Smith at May 4, 2012 at 1:21 am

    On 05/03/2012 09:05 PM, Nick Coghlan wrote:
    On Fri, May 4, 2012 at 10:05 AM, PJ Eby wrote:

    Still seems like a wart, but oh well. OTOH, maybe it's better for people
    munging __file__ to get a weird error all the time with namespace packages,
    instead of something that works some of the time, and fails later?
    Right. Otherwise we'd get layout dependent behaviour where dubious
    cross-portion references worked if all portions were installed to the
    same path segment, but then failed if they were split across multiple
    segments.
    Under no circumstances should anyone be looking at __file__ for a
    namespace package in order to find a related file. We should do
    something that causes this to always break.

    Eric.
  • Barry Warsaw at May 4, 2012 at 2:56 pm

    On May 04, 2012, at 11:05 AM, Nick Coghlan wrote:
    Yep. It also means VFS importers are officially free to put all the
    metadata they want inside the angle brackets, secure in the knowledge
    that everyone else should be treating it as an opaque blob. It then
    becomes a way for them to pass necessary info to get_data() *without*
    having to create distinct loader instances for every module.
    Ooh! I can't wait for the __file__ set to a pickle to steganographically
    communicate secret messages to get_data(). :)

    -Barry
  • Barry Warsaw at May 4, 2012 at 2:34 pm

    On May 04, 2012, at 08:20 AM, Nick Coghlan wrote:
    I'd still prefer to just officially bless the existing "<whatever>"
    convention for non-filesystem imports over encouraging type checks on
    __loader__ or defining a new introspection interface for loaders.
    The thing is, that convention is at best meaningless and at worst misleading.
    I also don't think it gives you all the diagnosis support you really want.

    The PEP 302 rule (reservation of no __file__ only for built-ins) is a
    historical relic for which no good rationale exists. Forgetting that for a
    moment, it simply makes no sense for a module that wasn't loaded from a file
    system path to have an __file__ attribute.

    It's also not true even today. At our PEP 420 sprint we noticed importlib
    does something like this to create new modules:
        type(sys)('foo')
    That module isn't a built-in and doesn't have an __file__. It also
    doesn't have an __loader__, but oh well.

    (BTW, Brett, that's pretty clever. :)

    It seemed to us that the only reasonable semantics for such modules is that
    __file__ is None or __file__ is missing. Not setting __file__ is better
    though because you get appropriate exceptions at the place where you make the
    initial mistake (i.e. assuming every module has an __file__). If you set
    __file__ to None, you may instead get cryptic messages in os.path.join() for
    example.
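
A minimal sketch of the difference being described, assuming a current Python 3 interpreter. types.ModuleType is the same type that type(sys) returns, and 'data.txt' is just a placeholder name. A module with no __file__ fails at the attribute access, while __file__ = None fails later, inside os.path:

```python
import os.path
import types

# Same effect as type(sys)('foo'): a bare module object.
mod = types.ModuleType('foo')

missing_error = None
try:
    mod.__file__  # the mistaken assumption fails right here
except AttributeError as exc:
    missing_error = exc

none_error = None
mod.__file__ = None
try:
    # The attribute access now succeeds; the failure surfaces inside os.path.
    os.path.join(mod.__file__, 'data.txt')
except TypeError as exc:
    none_error = exc
```

The AttributeError names the missing attribute, whereas the TypeError only complains about the argument type passed to os.path.join().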

    So, what about the "diagnostics" use case? Certainly a very important use
    case is the repr of module objects. In the case of modules loaded from the
    file system, I definitely want to know where the file lives, and the repr is a
    great way to see that. For other modules, you do want to know something about
    how that module was created, and having a repr that gives a good indication of
    that is very useful. But you can easily do that without a contrived __file__
    (more on that below).

    What about other introspection use cases? Relying on __file__
    programmatically might be a convenient shorthand, but knowing the loader (via
    __loader__ if available) is more helpful, because that tells you more about
    how that module actually came into existence.

    The value of __file__ is really under the purview of the loader anyway.
    Consider a hypothetical database loader (or even many different third party
    database loaders). Of what use is an __file__ that says '<database>'? That
    way leads to uncertainty, and namespace collisions, for example if both a
    SQLite loader and a PostgreSQL loader wanted to use the '<database>' value.
    In either case, maybe you'd prefer to know what the database url is, or maybe
    the query that produced the module, or some combination thereof.
    Overloading all that into a contrived __file__ seems wrong.

    I would prefer if the requirement were relaxed, and we simply allowed the
    loaders to set __file__ to whatever they think is appropriate, which would
    include allowing them to not set __file__ at all.

    It's actually easy to give modules a reasonable repr even without __file__. I
    have a branch in the PEP 420 feature repo which implements the following rules
    for module object reprs:

    * Use mod.__file__ if it exists
    * Otherwise, get the module's __loader__
    * If the module has no loader, then just return the module's name. E.g.
        >>> type(sys)('foo')
        <module 'foo'>
    * Define a new optional method on loaders, called module_repr(), that
      takes the module as an argument. Use whatever this returns as the
      module's repr.
    * As a last fallback, just use the repr of the loader as part of the module's
      repr.
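
The rules above could be sketched in pure Python roughly as follows. This is an illustration only, not the code from the feature branch: sketch_module_repr, ExampleLoader and the exact repr strings are assumptions, and the order is the one listed in this message (__file__ first).

```python
import types


def sketch_module_repr(mod):
    """Build a module repr following the rules listed above (illustrative)."""
    name = mod.__name__
    filename = getattr(mod, '__file__', None)
    if filename is not None:
        # Rule 1: use __file__ if it exists.
        return '<module {!r} from {!r}>'.format(name, filename)
    loader = getattr(mod, '__loader__', None)
    if loader is None:
        # No loader at all: just the module's name.
        return '<module {!r}>'.format(name)
    hook = getattr(loader, 'module_repr', None)
    if hook is not None:
        # Optional loader hook: whatever it returns is the repr.
        return hook(mod)
    # Last fallback: include the loader's own repr.
    return '<module {!r} ({!r})>'.format(name, loader)


class ExampleLoader:
    """Hypothetical loader that implements the optional hook."""

    def module_repr(self, module):
        return '<module {} (example loader)>'.format(module.__name__)


bare = types.ModuleType('foo')
bare_repr = sketch_module_repr(bare)

hooked = types.ModuleType('bar')
hooked.__loader__ = ExampleLoader()
hooked_repr = sketch_module_repr(hooked)

on_disk = types.ModuleType('baz')
on_disk.__file__ = '/tmp/baz.py'
on_disk_repr = sketch_module_repr(on_disk)
```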

    I'm not particularly married to this implementation, but it seems reasonably
    backward compatible, and flexible enough to support useful alternatives. For
    example, the BuiltinImporter could define its module_repr() like so:

        @classmethod
        def module_repr(cls, module):
            return '<module {} (built-in)>'.format(module.__name__)

    Specifically, my proposed elaboration on PEP 420 is this:

    * Explicitly leave the assignment of __file__ to the loader.
    * Allow loaders to not set __file__
    * Add an optional API to loaders, module_repr() as defined above.

    Cheers,
    -Barry
  • PJ Eby at May 4, 2012 at 2:56 pm

    On May 4, 2012 10:34 AM, "Barry Warsaw" wrote:
    Specifically, my proposed elaboration on PEP 420 is this:

    * Explicitly leave the assignment of __file__ to the loader.
    * Allow loaders to not set __file__
    * Add an optional API to loaders, module_repr() as defined above.
    +1 on all the above, plus getting rid of __file__ for namespace packages.
    Seems like an elegant solution to the problems involved, and allows DB or
    other importers to make their own attributes like __dsn__ or __url__, but
    still have a decent repr.
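
For illustration, a database importer along these lines might look like the sketch below. Everything in it is hypothetical: the class, the load_from_source() helper and the DSN are invented, and the hook is called by hand rather than by the interpreter. The module carries a loader-specific __dsn__ attribute, has no __file__, and still gets a readable repr:

```python
import types


class SketchDatabaseLoader:
    """Hypothetical loader for module source stored in a database."""

    def __init__(self, dsn):
        self.dsn = dsn

    def load_from_source(self, name, source):
        """Create a module from source text; deliberately sets no __file__."""
        module = types.ModuleType(name)
        module.__loader__ = self
        module.__dsn__ = self.dsn  # loader-specific metadata
        exec(source, module.__dict__)
        return module

    def module_repr(self, module):
        """Optional hook as proposed in this thread."""
        return '<module {!r} (database {})>'.format(
            module.__name__, module.__dsn__)


loader = SketchDatabaseLoader('sqlite:///modules.db')
mod = loader.load_from_source('answers', 'VALUE = 42')
shown = mod.__loader__.module_repr(mod)
```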
  • Barry Warsaw at May 4, 2012 at 7:11 pm

    On May 04, 2012, at 10:56 AM, PJ Eby wrote:
    On May 4, 2012 10:34 AM, "Barry Warsaw" wrote:
    Specifically, my proposed elaboration on PEP 420 is this:

    * Explicitly leave the assignment of __file__ to the loader.
    * Allow loaders to not set __file__
    * Add an optional API to loaders, module_repr() as defined above.
    +1 on all the above, plus getting rid of __file__ for namespace packages.
    Seems like an elegant solution to the problems involved, and allows DB or
    other importers to make their own attributes like __dsn__ or __url__, but
    still have a decent repr.
    Yes, exactly.

    It seems like there's general consensus about the basic proposal; I'll update
    the PEP so Guido has specific language to pronounce on.

    I want to make one change to what I posted. If m.__loader__.module_repr()
    exists, I want to give it a first crack at producing the repr. This means
    that __file__ is used as a fallback, not as the first step.

    Cheers,
    -Barry
  • Nick Coghlan at May 4, 2012 at 3:14 pm

    On Sat, May 5, 2012 at 12:34 AM, Barry Warsaw wrote:
    * Explicitly leave the assignment of __file__ to the loader.
    * Allow loaders to not set __file__
    * Add an optional API to loaders, module_repr() as defined above.
    I can accept that approach on one condition: the PEP 420
    implementation comes with the long-overdue migration of the definition
    of the import system semantics into the language reference.

    The main sticking point preventing that in the past has been that
    nobody wanted to document all the caveats and special cases needed to
    accurately describe CPython's behaviour. For 3.3+, no such caveats are
    necessary, since Brett's importlib efforts mean that even the default
    import system follows the rules.

    The proposed update will require changes to the description of the
    import semantics, anyway, so rather than making those changes directly
    in PEP 302, it would be better to document them in the language
    reference and update PEP 302 with a note to say that, for 3.3+, it is
    no longer the authoritative source.

    Cheers,
    Nick.

    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric V. Smith at May 4, 2012 at 3:17 pm

    On 05/04/2012 11:14 AM, Nick Coghlan wrote:
    On Sat, May 5, 2012 at 12:34 AM, Barry Warsaw wrote:
    * Explicitly leave the assignment of __file__ to the loader.
    * Allow loaders to not set __file__
    * Add an optional API to loaders, module_repr() as defined above.
    I can accept that approach on one condition: the PEP 420
    implementation comes with the long-overdue migration of the definition
    of the import system semantics into the language reference.

    The main sticking point preventing that in the past has been that
    nobody wanted to document all the caveats and special cases needed to
    accurately describe CPython's behaviour. For 3.3+, no such caveats are
    necessary, since Brett's importlib efforts mean that even the default
    import system follows the rules.

    The proposed update will require changes to the description of the
    import semantics, anyway, so rather than making those changes directly
    in PEP 302, it would be better to document them in the language
    reference and update PEP 302 with a note to say that, for 3.3+, it is
    no longer the authoritative source.
    We did discuss this yesterday at the sprint. I'm all for it, and I think
    the others were, too.

    I'm not keen on tying all of this to PEP 420 acceptance or rejection,
    but it's not the end of the world.

    Eric.

Discussion Overview
group: import-sig @
categories: python
posted: Apr 19, '12 at 8:18p
active: May 5, '12 at 7:32p
posts: 90
users: 14
website: python.org
