Hi all,


I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load,
reload) "private".


Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)


Anyway, I still need to take some time to clean up the PEP formatting and
run a spell checker. I probably also missed some artifact of an older
version of the API. Otherwise I think it's in a good spot. Comments
welcome.


-eric


p.s. I also plan on getting the implementation up one of these days. :P


===============================================================


PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently@gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
Resolution:




Abstract
========


This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will be authoritative for all the import-related
information about a module, and will be available without needing to
load the module first. Finders will directly provide a module's spec
instead of a loader (which they will continue to provide indirectly).
The import machinery will be adjusted to take advantage of module specs,
including using them to load modules.




Motivation
==========


The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.


As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.


Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered.


Currently finders are strictly responsible for providing the loader,
through their find_module() method, which the import system will use to
load the module. The loader is then responsible for doing some checks,
creating the module object, setting import-related attributes,
"installing" the module to ``sys.modules``, and loading the module,
along with some cleanup. This all takes place during the import
system's call to ``Loader.load_module()``. Loaders also provide some
APIs for accessing data associated with a module.
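To make that boilerplate concrete, here is a minimal sketch of a
PEP 302-style loader performing all of those steps inside
``load_module()``. The ``LegacyStringLoader`` class and its dict of
source strings are hypothetical, chosen only for illustration; real
loaders differ in the details (for example in how they treat reloads).

```python
import sys
import types


class LegacyStringLoader:
    """Hypothetical PEP 302-style loader serving source code from a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source string

    def load_module(self, fullname):
        # Boilerplate: reuse an existing module (the reload case).
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if module is None:
            # Boilerplate: create the module object.
            module = types.ModuleType(fullname)
        # Boilerplate: set import-related attributes.
        module.__loader__ = self
        module.__package__ = fullname.rpartition('.')[0]
        # Boilerplate: install in sys.modules before executing.
        sys.modules[fullname] = module
        try:
            # The only module-specific part: execute the code.
            exec(self.sources[fullname], module.__dict__)
        except BaseException:
            # Boilerplate: clean up on failure, but not on reload.
            if not is_reload:
                sys.modules.pop(fullname, None)
            raise
        # Boilerplate: return whatever ended up in sys.modules.
        return sys.modules[fullname]


loader = LegacyStringLoader({'demo_legacy': 'x = 40 + 2'})
mod = loader.load_module('demo_legacy')
```

Only the ``exec()`` line is specific to this loader; everything else is
the shared boilerplate the proposal moves into the import system.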


Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.


Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.


Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or to store it on the loader.


Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.


As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace search locations.


The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.


(The idea gained momentum during discussions related to another PEP.[1])




Specification
=============


The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.


This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.


importlib.machinery.ModuleSpec (new)
------------------------------------


A specification for a module's import-system-related state.


* ModuleSpec(name, loader, \*, origin=None, loading_info=None,
   is_package=None)


Attributes:


* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
* origin - a string for the location from which the module is loaded,
   e.g. "built-in" for built-in modules and the filename for modules
   loaded from source.
* submodule_search_locations - a list of strings for where to find
   submodules, if a package.
* loading_info - a container of extra data for use during loading.
* cached (property) - a string for where the compiled module will be
   stored (see PEP 3147).
* package (RO-property) - the name of the module's parent (or None).
* has_location (RO-property) - the module's origin refers to a location.


Instance Methods:


* module_repr() - provide a repr string for the spec'ed module.
* init_module_attrs(module) - set any of a module's import-related
   attributes that aren't already set.


importlib.util Additions
------------------------


* spec_from_file_location(name, location, \*, loader=None,
   submodule_search_locations=None) - factory for file-based module
   specs.
* from_loader(name, loader, \*, origin=None, is_package=None) - factory
   based on information provided by loaders.
* spec_from_module(module, loader=None) - factory based on existing
   import-related module attributes. This function is expected to be
   used only in some backward-compatibility situations.


Other API Additions
-------------------


* importlib.abc.Loader.exec_module(module) will execute a module in its
   own namespace. It replaces ``importlib.abc.Loader.load_module()``.
* importlib.abc.Loader.create_module(spec) (optional) will return a new
   module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* importlib.find_spec(name, path=None) will return the spec for a
   module.


exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.


API Changes
-----------


* ``InspectLoader.is_package()`` will become optional.


Deprecations
------------


* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* The parameters and attributes of the various loaders in
   importlib.machinery
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()


Removals
--------


These were introduced prior to Python 3.4's release.


* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()


Other Changes
-------------


* The import system implementation in importlib will be changed to make
   use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
   longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
   directly.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
   Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
   name and origin.


Backward-Compatibility
----------------------


* If a finder does not define find_spec(), a spec is derived from
   the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
   find_module().
* Loader.load_module() is used if exec_module() is not defined.
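A rough sketch of the first fallback rule, for a meta path finder only
(the ``find_loader()`` priority rule for path entry finders is not
shown). The helper ``find_spec_with_fallback()`` is hypothetical; it
uses ``importlib.util.spec_from_loader()``, the Python 3.4+ counterpart
of the ``from_loader()`` factory described in this PEP.

```python
import importlib.util


def find_spec_with_fallback(finder, name, path=None):
    """Hypothetical helper: prefer find_spec(), else derive a spec."""
    find_spec = getattr(finder, 'find_spec', None)
    if find_spec is not None:
        return find_spec(name, path)
    # Legacy finder: ask for a loader the old way.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    # Derive the spec from the loader the legacy finder returned.
    return importlib.util.spec_from_loader(name, loader)
```

Existing finders therefore keep working unchanged; the spec is simply
built around the loader they already return.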


What Will not Change?
---------------------


* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
   the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a loader defines it, will have all the
   same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.




What Will Existing Finders and Loaders Have to Do Differently?
==============================================================


Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:


* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.


The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. ``from_loader()`` and
``spec_from_file_location()`` are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, ``spec_from_module()`` may be useful to
the corresponding finder.
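As an illustration of converting a finder, the following sketch keeps
the existing lookup logic and wraps its result in a spec instead of
returning the loader directly. ``StaticFinder``, ``StaticLoader`` and
their ``sources`` dict are hypothetical;
``importlib.util.spec_from_loader()`` is the Python 3.4+ counterpart of
the ``from_loader()`` factory described here.

```python
import importlib.abc
import importlib.util


class StaticLoader(importlib.abc.Loader):
    """Hypothetical loader; exec_module() only populates the namespace."""

    def __init__(self, sources):
        self.sources = sources

    def exec_module(self, module):
        exec(self.sources[module.__spec__.name], module.__dict__)


class StaticFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder converted from find_module() to find_spec()."""

    def __init__(self, sources):
        self.loader = StaticLoader(sources)

    def _find_loader(self, fullname):
        # The pre-existing lookup logic, unchanged by the conversion.
        return self.loader if fullname in self.loader.sources else None

    def find_spec(self, fullname, path, target=None):
        loader = self._find_loader(fullname)
        if loader is None:
            return None  # let the next finder on sys.meta_path try
        # Wrap the loader in a spec rather than returning it directly.
        return importlib.util.spec_from_loader(fullname, loader,
                                               origin='static')
```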


For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().




ModuleSpec Users
================


``ModuleSpec`` objects have 3 distinct target audiences: Python itself,
import hooks, and normal Python users.


Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.


Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loading_info) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.


Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.
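For example, inspecting specs might look like the following. This uses
the names that eventually shipped in Python 3.4, which differ slightly
from this draft: ``importlib.util.find_spec()`` rather than
``importlib.find_spec()``, and ``spec.parent`` rather than
``spec.package``.

```python
import importlib.util
import json
import sys

# A built-in module: it has an origin, but no location on disk.
print(sys.__spec__.name, sys.__spec__.origin, sys.__spec__.has_location)

# A package loaded from source: origin is a file path, and
# submodule_search_locations is a list of directories.
print(json.__spec__.origin, json.__spec__.has_location)
print(json.__spec__.submodule_search_locations)

# Look up a spec by name (returns the existing spec if already imported).
spec = importlib.util.find_spec('json.decoder')
print(spec.name, spec.parent)
```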




How Loading Will Work
=====================


This is an outline of what happens in ModuleSpec's loading
functionality::


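    import sys
    from types import ModuleType


    def load(spec):
        # Backward-compatibility path: the loader does everything itself.
        if not hasattr(spec.loader, 'exec_module'):
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        # Create the module, preferring the loader's create_module().
        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        # Install the module before executing it; undo that on failure.
        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            sys.modules.pop(spec.name, None)
            raise  # propagate the original error to the importer
        finally:
            spec._initializing = False
        # The module may have replaced itself in sys.modules.
        return sys.modules[spec.name]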


These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().
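A minimal sketch of a loader that relies on the import system for
everything except execution. ``GreetingFinder``, ``GreetingLoader`` and
the ``greeting_`` name prefix are hypothetical; the code targets the
``importlib.abc`` and ``importlib.machinery`` APIs as they exist in
current Python, where ``importlib.abc.Loader`` supplies a default
``create_module()`` that returns ``None``.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys


class GreetingLoader(importlib.abc.Loader):
    """Hypothetical loader that implements only exec_module()."""

    def exec_module(self, module):
        # No module creation, no sys.modules handling, no attribute
        # setup: the import system takes care of all of that.
        module.greeting = 'hello from ' + module.__spec__.name


class GreetingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for names starting with 'greeting_'."""

    def find_spec(self, fullname, path, target=None):
        if not fullname.startswith('greeting_'):
            return None
        return importlib.machinery.ModuleSpec(fullname, GreetingLoader())


finder = GreetingFinder()
sys.meta_path.insert(0, finder)
try:
    mod = importlib.import_module('greeting_demo')
finally:
    sys.meta_path.remove(finder)
```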


Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
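A small sketch of the replacement case. ``ReplacingLoader``,
``ReplacingFinder`` and the ``self_replacing`` module name are
hypothetical; the point is that the importer receives whatever object
is in ``sys.modules`` after execution, not the module that was created.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class ReplacingLoader(importlib.abc.Loader):
    """Hypothetical loader whose module swaps itself out of sys.modules."""

    def exec_module(self, module):
        # Simulate module code that installs a replacement object.
        replacement = types.SimpleNamespace(replaced=True)
        sys.modules[module.__spec__.name] = replacement


class ReplacingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname != 'self_replacing':
            return None
        return importlib.machinery.ModuleSpec(fullname, ReplacingLoader())


finder = ReplacingFinder()
sys.meta_path.insert(0, finder)
try:
    result = importlib.import_module('self_replacing')
finally:
    sys.meta_path.remove(finder)
```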




ModuleSpec
==========


Attributes
----------


Each of the following names is an attribute on ModuleSpec objects. A
value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
ModuleSpec.init_module_attrs().


========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loading_info               \-
has_location               \-
========================== ==============


\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.
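The reverse mapping, including the two footnotes, could be implemented
along these lines. ``SketchSpec`` is a hypothetical stand-in, not the
real ``ModuleSpec``; treating an attribute whose value is ``None`` as
"not set", storing ``has_location`` as a plain attribute, and deriving
``package`` with today's ``__package__`` rules are all assumptions of
this sketch.

```python
import types


class SketchSpec:
    """Hypothetical stand-in for the draft ModuleSpec (not the real class)."""

    def __init__(self, name, loader, *, origin=None,
                 submodule_search_locations=None, cached=None,
                 has_location=False):
        self.name = name
        self.loader = loader
        self.origin = origin
        self.submodule_search_locations = submodule_search_locations
        self.cached = cached
        # Assumption: a plain attribute instead of a read-only property.
        self.has_location = has_location

    @property
    def package(self):
        # Assumption: mirrors the existing __package__ semantics.
        if self.submodule_search_locations is not None:
            return self.name
        return self.name.rpartition('.')[0]

    def init_module_attrs(self, module):
        """Set each mapped module attribute only if it isn't already set."""
        def set_if_missing(attr, value):
            if getattr(module, attr, None) is None:
                setattr(module, attr, value)

        set_if_missing('__name__', self.name)
        set_if_missing('__loader__', self.loader)
        set_if_missing('__package__', self.package)
        set_if_missing('__spec__', self)
        if self.has_location:
            # Footnote *: only located modules get __file__/__cached__.
            set_if_missing('__file__', self.origin)
            if self.cached is not None:
                # Footnote **: only when the spec attribute is set.
                set_if_missing('__cached__', self.cached)
        if self.submodule_search_locations is not None:
            # Footnote **: only packages get __path__.
            set_if_missing('__path__', self.submodule_search_locations)
```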


While package and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.


**origin**


origin is a string for the place from which the module originates.
Aside from the informational value, it is also used in module_repr().


The module attribute ``__file__`` has a similar but more restricted
meaning. Not all modules have it set (e.g. built-in modules). However,
``origin`` is applicable to all modules. For built-in modules it would
be set to "built-in".


**has_location**


Some modules can be loaded by reference to a location, e.g. a filesystem
path or a URL or something of the sort. Having the location lets you
load the module, but in theory you could load that module under various
names.


In contrast, non-located modules can't be loaded in this fashion, e.g.
built-in modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".


This attribute reflects whether or not the module is locatable. If it
is, origin must be set to the module's location and ``__file__`` will be
set on the module. Not all locatable modules will be cacheable, but most
will.


The corresponding module attribute name, ``__file__``, is somewhat
inaccurate and potentially confusing, so we will use a more explicit
combination of origin and has_location to represent the same
information. Having a separate filename is unnecessary since we have
origin.


**submodule_search_locations**


The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is ``None``.


The corresponding module attribute's name, ``__path__``, is relatively
ambiguous. Instead of mirroring it, we use a more explicit name that
makes the purpose clear.


**loading_info**


A finder may set loading_info to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.


For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.


loading_info is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.


Omitted Attributes and Methods
------------------------------


The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).


* create() - provide a new module to use for loading.
* exec(module) - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way.


Here are other omissions:


There is no PathModuleSpec subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.


While is_package would be a simple additional attribute (aliasing
``self.submodule_search_locations is not None``), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
packages.


Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).


Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
proposal.


Others:


* Add ModuleSpec.submodules (RO-property) - returns possible submodules
   relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
   any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
   spec's loader.
* Also see [3].




Backward Compatibility
----------------------


ModuleSpec doesn't have any backward-compatibility concerns of its own.
This would be a different story if Finder.find_module() were to return
a module spec instead of a loader. In that case, specs would have to
act like the loader that would have been returned instead. Doing so
would be relatively simple, but is an unnecessary complication. It was
part of earlier versions of this PEP.


Subclassing
-----------


Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loading_info or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().




Existing Types
==============


Module Objects
--------------


Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.


A module's spec will not be kept in sync with the corresponding import-
related attributes. Though they may differ, in practice they will
typically be the same.


One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.


Notably, the spec for each module instance will be unique to that
instance even if the information is identical to that of another spec.
This won't happen in general.


Finders
-------


Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_spec()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.


**MetaPathFinder.find_spec(name, path=None)**


**PathEntryFinder.find_spec(name)**


Finders will return ModuleSpec objects when ``find_spec()`` is
called. This new method replaces ``find_module()`` and
``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does
not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are
used instead, for backward-compatibility.


Adding yet another similar method to finders is a case of practicality.
``find_module()`` could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. However, the extra complexity and a less-than-
explicit method name aren't worth it.


Loaders
-------


**Loader.exec_module(module)**


Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.


exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.
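A minimal sketch of a loader that refuses in-place re-execution.
``OneShotLoader`` and the namespace marker it checks are hypothetical;
``importlib.util.module_from_spec()`` (added later, in Python 3.5) is
used only to build a module for the demonstration.

```python
import importlib.abc
import importlib.machinery
import importlib.util


class OneShotLoader(importlib.abc.Loader):
    """Hypothetical loader for modules that cannot be re-executed in place."""

    def exec_module(self, module):
        # Assumption: a marker in the namespace records the first run.
        if module.__dict__.get('_oneshot_initialized'):
            raise ImportError('in-place reload not supported',
                              name=module.__spec__.name)
        module.counter = 1
        module._oneshot_initialized = True


spec = importlib.machinery.ModuleSpec('oneshot_demo', OneShotLoader())
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)  # first execution succeeds
```

A second ``exec_module(mod)`` call on the same module raises
``ImportError``, which a reload would then report to the caller.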


**Loader.create_module(spec)**


Loaders may also implement create_module() that will return a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case for create_module()
is to provide a module that is a subclass of the built-in module type.
Most loaders will not need to implement create_module().

create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
raising ImportError.


Other changes:


PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.


The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.


However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to determine whether the module is a package (and therefore
whether to set ``submodule_search_locations``) if that information is
not otherwise available. Still, it will be made optional.


One consequence of ModuleSpec is that loader ``__init__`` methods will
no longer need to accommodate per-module state. The path-based loaders
in ``importlib`` take arguments in their ``__init__()`` and have
corresponding attributes. However, the need for those values is
eliminated by module specs.


In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.




Other Changes
=============


* The various finders and loaders provided by importlib will be
   updated to comply with this proposal.
* The spec for the ``__main__`` module will reflect how the interpreter
   was started. For instance, with ``-m`` the spec's name will be that
   of the run module, while ``__main__.__name__`` will still be
   "__main__".
* We add ``importlib.find_spec()`` to mirror
   ``importlib.find_loader()`` (which becomes deprecated).
* ``importlib.reload()`` is changed to use ``ModuleSpec``'s (non-public)
   reload functionality.
* ``importlib.reload()`` will now make use of the per-module import
   lock.




Reference Implementation
========================


A reference implementation will be available at
http://bugs.python.org/issue18864.




Open Issues
===========


\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
knowledge.


\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.


For instance, pickle should be updated in the __main__ case to look at
``module.__spec__.name``.


\* Impact on some kinds of lazy loading modules. See [3].


\* Find a better name than loading_info? Perhaps loading_data,
loader_state, or loader_info.


\* Change loader.create_module() to prepare_module()?


\* Add more explicit reloading support to exec_module() (and
prepare_module())?




References
==========


[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html


[2] https://mail.python.org/pipermail/import-sig/2013-September/000735.html


[3] https://mail.python.org/pipermail/python-dev/2013-August/128129.html




Copyright
=========


This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


  • Brett Cannon at Sep 18, 2013 at 2:57 pm
    Looking good! Comments inline.




    On Wed, Sep 18, 2013 at 5:51 AM, Eric Snow wrote:

    Hi all,

    I finally got some time to update the PEP. I've simplified a few things,
    most notably by making the 4 ModuleSpec methods (create, exec, load,
    reload) "private".

    Also notable is that the new loader method is still create_module() and
    there is still no flag for is_reload on either of the loader methods. I'm
    still not clear on what the flag buys us and on why anything we'd do in a
    prepare_module() we couldn't do in exec_module(). I'm trying to keep this
    simple. :)

    Anyway, I still need to take some time to clean up the PEP formatting and
    run a spell checker. I probably also missed some artifact of an older
    version of the API. Otherwise I think it's in a good spot. Comments
    welcome.

    -eric

    p.s. I also plan on getting the implementation up one of these days. :P

    ===============================================================

    PEP: 451
    Title: A ModuleSpec Type for the Import System
    Version: $Revision$
    Last-Modified: $Date$
    Author: Eric Snow <ericsnowcurrently@gmail.com>
    Discussions-To: import-sig at python.org
    Status: Draft
    Type: Standards Track
    Content-Type: text/x-rst
    Created: 8-Aug-2013
    Python-Version: 3.4
    Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
    Resolution:
      [SNIP]



    Specification
    =============

    The goal is to address the gap between finders and loaders while
    changing as little of their semantics as possible. Though some
    functionality and information is moved to the new ``ModuleSpec`` type,
    their behavior should remain the same. However, for the sake of clarity
    the finder and loader semantics will be explicitly identified.

    This is a high-level summary of the changes described by this PEP. More
    detail is available in later sections.

    importlib.machinery.ModuleSpec (new)
    ------------------------------------

    A specification for a module's import-system-related state.

    * ModuleSpec(name, loader, \*, origin=None, loading_info=None,
    is_package=None)

    Attributes:

    * name - a string for the name of the module.
    * loader - the loader to use for loading and for module data.

    Just drop the "and for module data"; sentence is awkward with it and is a
    margin use-case.



    * origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    * submodule_search_locations - strings for where to find submodules,
    if a package.

    Very subtle hint that it's a sequence of of strings; might want to make it
    more explicit that it's a list.



    * loading_info - a container of extra data for use during loading.
    * cached (property) - a string for where the compiled module will be
    stored (see PEP 3147).
    * package (RO-property) - the name of the module's parent (or None).
    * has_location (RO-property) - the module's origin refers to a location.

    Instance Methods:

    * module_repr() - provide a repr string for the spec'ed module.
    * init_module_attrs(module) - set any of a module's import-related
    attributes that aren't already set.

    importlib.util Additions
    ------------------------

    * spec_from_file_location(name, location, \*, loader=None,
    submodule_search_locations=None)
    - factory for file-based module specs.
    * from_loader(name, loader, \*, origin=None, is_package=None) - factory
    based on information provided by loaders.
    * spec_from_module(module, loader=None) - factory based on existing
    import-related module attributes. This function is expected to be
    used only in some backward-compatibility situations.

    Other API Additions
    -------------------

    * importlib.abc.Loader.exec_module(module) will execute a module in its
    own namespace. It replaces ``importlib.abc.Loader.load_module()``.
    * importlib.abc.Loader.create_module(spec) (optional) will return a new
    module to use for loading.
    * Module objects will have a new attribute: ``__spec__``.
    * importlib.find_spec(name, path=None) will return the spec for a
    module.

    exec_module() and create_module() should not set any import-related
    module attributes. The fact that load_module() does is a design flaw
    that this proposal aims to correct.

    This is a rather jarring place to make this statement since you're just
    outlining API additions, not design decisions.



    API Changes
    -----------

    * ``InspectLoader.is_package()`` will become optional.

    Deprecations
    ------------

    * importlib.abc.MetaPathFinder.find_module()
    * importlib.abc.PathEntryFinder.find_module()
    * importlib.abc.PathEntryFinder.find_loader()
    * importlib.abc.Loader.load_module()
    * importlib.abc.Loader.module_repr()
    * The parameters and attributes of the various loaders in
    importlib.machinery
    * importlib.util.set_package()
    * importlib.util.set_loader()
    * importlib.find_loader()

    Yay to all of this! =)



    Removals
    --------

    These were introduced prior to Python 3.4's release.

    * importlib.abc.Loader.init_module_attrs()
    * importlib.util.module_to_load()

    Other Changes
    -------------

    * The import system implementation in importlib will be changed to make
      use of ModuleSpec.
    * Import-related module attributes (other than ``__spec__``) will no
      longer be used directly by the import system.
    * Import-related attributes should no longer be added to modules
      directly.
    * The module type's ``__repr__()`` will be thin wrapper around a pure
      Python implementation which will leverage ModuleSpec.

    "be a thin"



    * The spec for the ``__main__`` module will reflect the appropriate
      name and origin.

    Backward-Compatibility
    ----------------------

    * If a finder does not define find_spec(), a spec is derived from
      the loader returned by find_module().
    * PathEntryFinder.find_loader() still takes priority over
      find_module().
    * Loader.load_module() is used if exec_module() is not defined.
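    The first fallback can be sketched as follows. ``spec_for()``,
    ``LegacyFinder`` and ``LegacyLoader`` are hypothetical names;
    ``importlib.util.spec_from_loader()`` is the helper that shipped in
    Python 3.4 for deriving a spec from a loader.

```python
import importlib.machinery
import importlib.util


def spec_for(finder, name, path=None):
    """Hypothetical helper: prefer find_spec(), else wrap find_module()."""
    find_spec = getattr(finder, "find_spec", None)
    if find_spec is not None:
        return find_spec(name, path)
    # Legacy finder: derive a spec from the loader it returns.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    return importlib.util.spec_from_loader(name, loader)


class LegacyLoader:
    """Hypothetical PEP 302-style loader (only load_module())."""

    def load_module(self, fullname):
        raise NotImplementedError  # not needed for this demonstration


class LegacyFinder:
    """Hypothetical finder that only implements find_module()."""

    def find_module(self, fullname, path=None):
        return LegacyLoader() if fullname == "legacy_mod" else None


legacy_spec = spec_for(LegacyFinder(), "legacy_mod")
builtin_spec = spec_for(importlib.machinery.BuiltinImporter, "sys")
print(type(legacy_spec.loader).__name__, builtin_spec.origin)  # LegacyLoader built-in
```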

    What Will not Change?
    ---------------------

    * The syntax and semantics of the import statement.
    * Existing finders and loaders will continue to work normally.
    * The import-related module attributes will still be initialized with
      the same information.
    * Finders will still create loaders (now storing them in specs).
    * Loader.load_module(), if a loader defines it, will have all the
      same requirements and may still be called directly.
    * Loaders will still be responsible for module data APIs.
    * importlib.reload() will still overwrite the import-related attributes.


    What Will Existing Finders and Loaders Have to Do Differently?
    ==============================================================

    Immediately? Nothing. The status quo will be deprecated, but will
    continue working. However, here are the things that the authors of
    finders and loaders should change relative to this PEP:

    * Implement ``find_spec()`` on finders.
    * Implement ``exec_module()`` on loaders, if possible.

    The ModuleSpec factory functions in importlib.util are intended to be
    helpful for converting existing finders. ``from_loader()`` and
    ``from_file_location()`` are both straightforward utilities in this
    regard. In the case where loaders already expose methods for creating
    and preparing modules, ``ModuleSpec.from_module()`` may be useful to
    the corresponding finder.

    For existing loaders, exec_module() should be a relatively direct
    conversion from the non-boilerplate portion of load_module(). In some
    uncommon cases the loader should also implement create_module().
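    Putting both recommendations together, a converted import hook can be
    quite small. This sketch targets the hooks as they shipped (Python 3.4+)
    and combines finder and loader in one object, as several importlib
    classes do; ``DictImporter`` and the module name ``dict_demo`` are
    hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader serving modules from an in-memory dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    # Finder side: return a spec instead of a loader.
    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self, origin="<dict>")

    # Loader side: only the non-boilerplate part of the old load_module().
    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


importer = DictImporter({"dict_demo": "ANSWER = 6 * 7\n"})
sys.meta_path.insert(0, importer)
try:
    dict_demo = importlib.import_module("dict_demo")
finally:
    sys.meta_path.remove(importer)
    sys.modules.pop("dict_demo", None)
print(dict_demo.ANSWER)  # 42
```

    ``create_module()`` is inherited from ``importlib.abc.Loader`` (returning
    ``None``), so the common case needs only ``find_spec()`` and
    ``exec_module()``.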


    ModuleSpec Users
    ================

    ``ModuleSpec`` objects has 3 distinct target audiences: Python itself,
    import hooks, and normal Python users.

    "has" -> "have"



    Python will use specs in the import machinery, in interpreter startup,
    and in various standard library modules. Some modules are
    import-oriented, like pkgutil, and others are not, like pickle and
    pydoc. In all cases, the full ``ModuleSpec`` API will get used.

    Import hooks (finders and loaders) will make use of the spec in specific
    ways. First of all, finders may use the spec factory functions in
    importlib.util to create spec objects. They may also directly adjust
    the spec attributes after the spec is created. Secondly, the finder may
    bind additional information to the spec (in loading_info) for the
    loader to consume during module creation/execution. Finally, loaders
    will make use of the attributes on a spec when creating and/or executing
    a module.

    Python users will be able to inspect a module's ``__spec__`` to get
    import-related information about the object. Generally, Python
    applications and interactive users will not be using the ``ModuleSpec``
    factory functions nor any of the instance methods.


    How Loading Will Work
    =====================

    This is an outline of what happens in ModuleSpec's loading
    functionality::

        def load(spec):
            if not hasattr(spec.loader, 'exec_module'):
                module = spec.loader.load_module(spec.name)
                spec.init_module_attrs(module)
                return sys.modules[spec.name]

            module = None
            if hasattr(spec.loader, 'create_module'):
                module = spec.loader.create_module(spec)
            if module is None:
                module = ModuleType(spec.name)
            spec.init_module_attrs(module)

            spec._initializing = True
            sys.modules[spec.name] = module
            try:
                spec.loader.exec_module(module)
            except Exception:
                del sys.modules[spec.name]
            finally:
                spec._initializing = False
            return sys.modules[spec.name]

    These steps are exactly what ``Loader.load_module()`` is already
    expected to do. Loaders will thus be simplified since they will only
    need to implement exec_module().

    Two things. One, it's not exactly what loaders do as that _initializing is
    done by import itself. Any specific reason you added it here?


    Two, you forgot to re-raise the exception in the except clause.



    Note that we must return the module from sys.modules. During loading
    the module may have replaced itself in sys.modules. Since we don't have
    a post-import hook API to accommodate the use case, we have to deal with
    it. However, in the replacement case we do not worry about setting the
    import-related module attributes on the object. The module writer is on
    their own if they are doing this.
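    For comparison, the same steps can be sketched against the API that
    eventually shipped (``importlib.util.module_from_spec()``, Python 3.5+).
    Unlike the outline, this version re-raises after removing the
    half-initialized module and leaves out the ``_initializing`` flag. It also
    exercises the self-replacement case. ``load()`` and ``SourceTextLoader``
    are illustrative names, not importlib APIs.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys


def load(spec):
    """Sketch of the loading outline against the shipped API (3.5+)."""
    # create_module() (or a default module), then import-related attributes.
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Don't leave a half-initialized module behind, and re-raise.
        sys.modules.pop(spec.name, None)
        raise
    # Executing the module may have replaced it in sys.modules.
    return sys.modules[spec.name]


class SourceTextLoader(importlib.abc.Loader):
    """Hypothetical loader that executes a fixed source string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


replacing = importlib.machinery.ModuleSpec(
    "replacing_demo",
    SourceTextLoader(
        "import sys, types\n"
        "sys.modules[__name__] = types.SimpleNamespace(kind='replacement')\n"
    ),
)
result = load(replacing)
print(result.kind)  # replacement
sys.modules.pop("replacing_demo", None)

failing = importlib.machinery.ModuleSpec(
    "failing_demo", SourceTextLoader("raise ValueError('boom')\n"))
try:
    load(failing)
except ValueError as exc:
    print("re-raised:", exc)  # re-raised: boom
print("failing_demo" in sys.modules)  # False
```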


    ModuleSpec
    ==========

    Attributes
    ----------

    Each of the following names is an attribute on ModuleSpec objects. A
    value of ``None`` indicates "not set". This contrasts with module
    objects where the attribute simply doesn't exist. Most of the
    attributes correspond to the import-related attributes of modules. Here
    is the mapping. The reverse of this mapping is used by
    ModuleSpec.init_module_attrs().

    ========================== ==============
    On ModuleSpec              On Modules
    ========================== ==============
    name                       __name__
    loader                     __loader__
    package                    __package__
    origin                     __file__*
    cached                     __cached__*,**
    submodule_search_locations __path__**
    loading_info               \-
    has_location               \-
    ========================== ==============

    \* Set only if has_location is true.
    \*\* Set only if the spec attribute is not None.

    "Set on the module if the spec"



    While package and has_location are read-only properties, the remaining
    attributes can be replaced after the module spec is created and even
    after import is complete. This allows for unusual cases where directly
    modifying the spec is the best option. However, typical use should not
    involve changing the state of a module's spec.
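    The mapping can be checked against the API as it shipped. There, the
    draft's ``package`` attribute is called ``parent`` and ``loading_info`` is
    called ``loader_state``. The package name and path below are made up, and
    the file does not need to exist just to build a spec and an unexecuted
    module.

```python
import importlib.util
import os
import tempfile

# Hypothetical package location (nothing is read from disk here).
pkg_dir = os.path.join(tempfile.gettempdir(), "specdemo")
init_py = os.path.join(pkg_dir, "__init__.py")

spec = importlib.util.spec_from_file_location(
    "specdemo", init_py, submodule_search_locations=[pkg_dir])
module = importlib.util.module_from_spec(spec)  # attributes set, not executed

# Spec attribute -> module attribute, following the table above.
pairs = [("name", "__name__"), ("loader", "__loader__"),
         ("parent", "__package__"), ("origin", "__file__"),
         ("cached", "__cached__"), ("submodule_search_locations", "__path__")]
for spec_attr, module_attr in pairs:
    print(f"{spec_attr:28} {module_attr:12} {getattr(module, module_attr)!r}")
```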

    **origin**

    origin is a string for the place from which the module originates.
    Aside from the informational value, it is also used in module_repr().

    The module attribute ``__file__`` has a similar but more restricted
    meaning. Not all modules have it set (e.g. built-in modules). However,
    ``origin`` is applicable to all modules. For built-in modules it would
    be set to "built-in".

    **has_location**

    Some modules can be loaded by reference to a location, e.g. a filesystem
    path or a URL or something of the sort. Having the location lets you
    load the module, but in theory you could load that module under various
    names.

    In contrast, non-located modules can't be loaded in this fashion, e.g.
    built-in modules and modules dynamically created in code. For these, the
    name is the only way to access them, so they have an "origin" but not a
    "location".

    This attribute reflects whether or not the module is locatable. If it
    is, origin must be set to the module's location and ``__file__`` will be
    set on the module. Not all locatable modules will be cacheable, but most
    will.
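    The distinction shows up directly on specs in the shipped API. This sketch
    compares a built-in module, a (hypothetical, nonexistent) source file, and
    a spec built by hand for a module generated in code.

```python
import importlib.machinery
import importlib.util
import os
import tempfile

# A built-in module: it has an origin but no location.
builtin = importlib.util.find_spec("sys")

# A hypothetical source file: origin is its path and has_location is true.
path = os.path.join(tempfile.gettempdir(), "located_demo.py")
located = importlib.util.spec_from_file_location("located_demo", path)

# A module created in code: it can only be reached by name.
generated = importlib.machinery.ModuleSpec(
    "generated_demo", None, origin="generated in code")

for spec in (builtin, located, generated):
    print(spec.name, spec.has_location, spec.origin)
```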

    The corresponding module attribute name, ``__file__``, is somewhat
    inaccurate and potentially confusion,

    "confusion" -> "confusing"



    so we will use a more explicit
    combination of origin and has_location to represent the same
    information. Having a separate filename is unnecessary since we have
    origin.

    Quote 'origin' so you don't read it like it should have been written "we
    have an origin".



    **submodule_search_locations**

    The list of location strings, typically directory paths, in which to
    search for submodules. If the module is a package this will be set to
    a list (even an empty one). Otherwise it is ``None``.

    The corresponding module attribute's name, ``__path__``, is relatively
    ambiguous. Instead of mirroring it, we use a more explicit name that
    makes the purpose clear.
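    A small sketch of the "list (even an empty one)" versus ``None`` rule,
    using the shipped ``importlib.machinery.ModuleSpec``. The helper
    ``looks_like_package()`` is hypothetical; it is what an ``is_package``
    alias would compute.

```python
from importlib.machinery import ModuleSpec


def looks_like_package(spec):
    """Hypothetical helper: what an is_package attribute would alias."""
    return spec.submodule_search_locations is not None


# Loaders are omitted (None); only the spec bookkeeping matters here.
pkg = ModuleSpec("demo_pkg", None, is_package=True)
mod = ModuleSpec("demo_pkg.mod", None)

print(pkg.submodule_search_locations, mod.submodule_search_locations)  # [] None
print(looks_like_package(pkg), looks_like_package(mod))  # True False
print(pkg.parent, mod.parent)  # demo_pkg demo_pkg
```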

    **loading_info**

    A finder may set loading_info to any value to provide additional
    data for the loader to use during loading. A value of None is the
    default and indicates that there is no additional data. Otherwise it
    can be set to any object, such as a dict, list, or
    types.SimpleNamespace, containing the relevant extra information.

    For example, zipimporter could use it to pass the zip archive name
    to the loader directly, rather than needing to derive it from origin
    or create a custom loader for each find operation.

    loading_info is meant for use by the finder and corresponding loader.
    It is not guaranteed to be a stable resource for any other use.
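    As settled later in this thread, the attribute shipped as
    ``loader_state``, and ``ModuleSpec`` accepts it as a keyword argument.
    Below is a sketch of the zipimporter-style use. A plain dict stands in
    for archive contents, and the real zipimporter is not involved.
    ``ARCHIVES``, ``ArchiveLoader`` and the module names are hypothetical.

```python
import importlib.abc
import importlib.machinery
import importlib.util

# Hypothetical stand-in for archive contents: archive -> member -> source.
ARCHIVES = {"bundle.zip": {"tool.py": "VERSION = '1.0'\n"}}


class ArchiveLoader(importlib.abc.Loader):
    """Hypothetical loader shared by every module found in any archive."""

    def exec_module(self, module):
        # Use the data the finder left on the spec instead of
        # re-deriving it from origin.
        state = module.__spec__.loader_state
        source = ARCHIVES[state["archive"]][state["member"]]
        exec(source, module.__dict__)


LOADER = ArchiveLoader()

# What a finder might build after locating "tool" in "bundle.zip".
spec = importlib.machinery.ModuleSpec(
    "tool", LOADER, origin="bundle.zip/tool.py",
    loader_state={"archive": "bundle.zip", "member": "tool.py"})
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.VERSION)  # 1.0
```

    One loader instance serves every module; the per-find details travel on
    the spec.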

    Omitted Attributes and Methods
    ------------------------------

    The following ModuleSpec methods are not part of the public API since
    it is easy to use them incorrectly and only the import system really
    needs them (i.e. they would be an attractive nuisance).

    * create() - provide a new module to use for loading.
    * exec(module) - execute the spec into a module namespace.
    * load() - prepare a module and execute it in a protected way.
    * reload(module) - re-execute a module in a protected way.

    If they are not part of the public API they should have a leading
    underscore.



    Here are other omissions:

    There is no PathModuleSpec subclass of ModuleSpec that separates out
    has_location, cached, and submodule_search_locations. While that might
    make the separation cleaner, module objects don't have that distinction.
    ModuleSpec will support both cases equally well.

    While is_package would be a simple additional attribute (aliasing
    ``self.submodule_search_locations is not None``), it perpetuates the
    artificial (and mostly erroneous) distinction between modules and
    packages.

    Conceivably, a ModuleSpec.load() method could optionally take a list of
    modules with which to interact instead of sys.modules. That
    capability is left out of this PEP, but may be pursued separately at
    some other time, including relative to PEP 406 (import engine).

    Likewise load() could be leveraged to implement multi-version
    imports. While interesting, doing so is outside the scope of this
    proposal.

    Others:

    * Add ModuleSpec.submodules (RO-property) - returns possible submodules
      relative to the spec.
    * Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
      any.
    * Add ModuleSpec.data - a descriptor that wraps the data API of the
      spec's loader.
    * Also see [3].


    Backward Compatibility
    ----------------------

    ModuleSpec doesn't have any. This would be a different story if
    Finder.find_module() were to return a module spec instead of a loader.
    In that case, specs would have to act like the loader that would have
    been returned instead. Doing so would be relatively simple, but is an
    unnecessary complication. It was part of earlier versions of this PEP.

    Subclassing
    -----------

    Subclasses of ModuleSpec are allowed, but should not be necessary.
    Simply setting loading_info or adding functionality to a custom
    finder or loader will likely be a better fit and should be tried first.
    However, as long as a subclass still fulfills the requirements of the
    import system, objects of that type are completely fine as the return
    value of Finder.find_spec().


    [SNIP]




    Open Issues
    ==============

    \* The impact of this change on pkgutil (and setuptools) needs looking
    into. It has some generic function-based extensions to PEP 302. These
    may break if importlib starts wrapping loaders without the tools'
    knowledge.

    \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
    inspect.

    For instance, pickle should be updated in the __main__ case to look at
    ``module.__spec__.name``.

    \* Impact on some kinds of lazy loading modules. See [3].

    \* Find a better name than loading_info? Perhaps loading_data,
    loader_state, or loader_info.

    loader_state or loader_data get my vote.



    \* Change loader.create_module() to prepare_module()?

    -0 from me.


    -Brett
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://mail.python.org/pipermail/import-sig/attachments/20130918/24dbb4e5/attachment-0001.html>
  • Eric Snow at Sep 19, 2013 at 5:06 am

    On Wed, Sep 18, 2013 at 8:57 AM, Brett Cannon wrote:


    Looking good! Comments inline.

    Thanks for the feedback, Brett. I fixed everything you pointed out. Also,
    I'm going with loader_state. :)


    -eric
  • Nick Coghlan at Sep 18, 2013 at 4:08 pm

    On 18 September 2013 19:51, Eric Snow wrote:
    Hi all,

    I finally got some time to update the PEP. I've simplified a few things,
    most notably by making the 4 ModuleSpec methods (create, exec, load, reload)
    "private".

    Also notable is that the new loader method is still create_module() and
    there is still no flag for is_reload on either of the loader methods. I'm
    still not clear on what the flag buys us and on why anything we'd do in a
    prepare_module() we couldn't do in exec_module(). I'm trying to keep this
    simple. :)

    The point is to give the invoker of the loader a chance to muck about
    with the module state before actually executing the module. For
    example, runpy and the updated extension loader API could use this to
    support execution of compiled Cython modules with -m.


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric Snow at Sep 18, 2013 at 10:14 pm

    On Wed, Sep 18, 2013 at 10:08 AM, Nick Coghlan wrote:

    On 18 September 2013 19:51, Eric Snow wrote:
    Hi all,

    I finally got some time to update the PEP. I've simplified a few things,
    most notably by making the 4 ModuleSpec methods (create, exec, load, reload)
    "private".

    Also notable is that the new loader method is still create_module() and
    there is still no flag for is_reload on either of the loader methods. I'm
    still not clear on what the flag buys us and on why anything we'd do in a
    prepare_module() we couldn't do in exec_module(). I'm trying to keep this
    simple. :)
    The point is to give the invoker of the loader a chance to muck about
    with the module state before actually executing the module. For
    example, runpy and the updated extension loader API could use this to
    support execution of compiled Cython modules with -m.

    That makes sense. A loader.create_module() method (not called during
    reload) gives you that. I'm all for that. I'm just not clear on why it
    needs to be more than that.


    My understanding of the proposed prepare_module() is it would always be
    called right before exec_module(), whether it be load or reload (there
    would be no create_module()). Then in that case, can't loaders just roll
    their prepare_module() implementation into the beginning of exec_module()
    (even call spec.init_module_attrs() directly)? What's the advantage to
    splitting that out in the Loader API? I know I'm missing something here.
    (Maybe I shouldn't try to work on the PEP so late at night!)


    ...after further consideration...


    I expect it's so that during reload the loader can indicate "don't reload
    in-place, load into this module instead!" So the module passed in to
    exec_module() would end up being different from the existing module in
    sys.modules. However, can't exec_module() simply exec into the module that
    it would have returned from prepare_module() and then directly stick it
    into sys.modules?


    ...after further consideration...


    Okay, maybe I'm seeing it. Would it be something like the following?


    #-- start prepare_module() example --


    class ModuleSpec:
         ...
         def _load(self):
             # This is basically the same as the PEP currently defines it.
             module = self.loader.prepare_module(self)  # I prefer create_module for this.
             if module is None:
                 module = ModuleType(self.name)
             self.init_module_attrs(module)
             # skipping some boilerplate
             sys.modules[self.name] = module
             self.loader.exec_module(module)
             return sys.modules[self.name]


         def _reload(self, module):
             # This is where it gets different.
             prepared = self.loader.prepare_module(self, module)
             if prepared is not None:
                 self.init_module_attrs(prepared)
                 module = prepared
                 sys.modules[self.name] = module
             self.loader.exec_module(module)
             return sys.modules[self.name]


    class SomeLoader:


         def prepare_module(self, spec, module=None):
             if self.never_ever_been_loaded_before_not_even_in_subinterpreters(spec.name):
                 self.initialize_stuff(spec)
             return MyCustomModule(spec.name)


         def exec_module(self, module):
             # Do exec stuff here.


    #-- end prepare_module() example --


    (Note that _load() and _reload() could share more code than they do, but
    regardless...)


    Contrast that with what the PEP specifies currently.


    #-- start current PEP example --


    class ModuleSpec:
         ...
         def _create(self):
             module = self.loader.create_module(self)
             if module is None:
                 module = ModuleType(self.name)
             self.init_module_attrs(module)
             return module


         def _load(self):
             module = self._create()
             # skipping boilerplate
             self.loader.exec_module(module)
             return sys.modules[self.name]


         def _reload(self, module):
             self.loader.exec_module(module)
             return sys.modules[self.name]


    class SomeLoader:


         def create_module(self, spec):
             if self.never_ever_been_loaded_before_not_even_in_subinterpreters(spec.name):
                 self.initialize_stuff(spec)
             return MyCustomModule(spec.name)


         def exec_module(self, module):
             spec = module.__spec__
             if not self.never_ever_been_loaded_before_not_even_in_subinterpreters(spec.name):
                 module = spec._create()
                 # or: module = self.create_module(spec); spec.init_module_attrs(module)
                 sys.modules[module.__name__] = module
             # Do exec stuff here.


    #-- end current PEP example --


    The way I see it, in the latter example the ModuleSpec is easier to follow,
    without making exec_module() that much more complicated.
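    For reference, ``importlib.reload()`` as it shipped (Python 3.4+) follows
    the second pattern. It finds a fresh spec, calls ``exec_module()`` on the
    existing module object without calling ``create_module()``, and returns
    whatever is in ``sys.modules``. Below is a small sketch with a throwaway
    module file; ``reload_demo`` and ``reload_in_place()`` are made-up names.

```python
import importlib
import os
import sys
import tempfile


def reload_in_place():
    """Import a throwaway module, rewrite its source, then reload it."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid stale .pyc files in this demo
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reload_demo.py")
        with open(path, "w") as f:
            f.write("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("reload_demo")
            before = module.VALUE
            with open(path, "w") as f:
                f.write("VALUE = 22\n")
            reloaded = importlib.reload(module)  # re-executes in place
            return module, reloaded, before, reloaded.VALUE
        finally:
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            sys.modules.pop("reload_demo", None)
            sys.dont_write_bytecode = saved_flag


module, reloaded, before, after = reload_in_place()
print(reloaded is module, before, after)  # True 1 22
```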


    Regardless, at this point I'm seeing prepare_module() as a formal API for
    "use *this* module instead of what you would use by default." While
    create_module() provides that for the loading case, prepare_module() also
    provides it explicitly for the reloading case. Consequently, in the reload
    case prepare_module() does eliminate the boilerplate that exec_module()
    otherwise must accommodate. That's probably the biggest reason to go there.


    I wonder if we could instead wrap that bit in a ModuleSpec helper method
    that loaders can call in exec_module():


       def _new_module_for_reload(self):
           module = self._create()
           sys.modules[self.name] = module
           return module


    FWIW, I think create_module() is still an appropriate (and better) name
    regardless of where it's used.


    At this point I still would rather stick with what the PEP currently
    specifies, but I'm going to ruminate on the reload case (e.g. re-read your
    message about reload strategies as well as your response to my message
    about module lifecycles). I think I have more context to fit them into
    the big picture here.


    Not to leave anything out, is there any reason we shouldn't punt right now
    on the whole reload mechanics issue and bundle it with the PEP on improving
    extension modules? I'd like to wrap up ModuleSpec and see about the .ref
    PEP that started all this. Plus I think this PEP is hitting the limit of a
    mentally bite-size proposal. I've been lamentably busy of late so I'm
    worried about expanding the PEP. However, I'm open to more discussion on
    supporting other reload strategies, particularly if you think this PEP
    should not move forward without having settled the issue.


    BTW, thanks for diving into the extension module questions (you and
    Stefan). Those discussions have helped improve this PEP. :)


    -eric
  • Nick Coghlan at Sep 19, 2013 at 1:01 am
    Yeah, I preferred the "prepare_module" name when I thought the extension
    loader returned the cached module object directly. It doesn't, it returns a
    copy, so "create_module" is fine.


    Also agreed on deferring reload behavioural improvements to a separate PEP.
    As noted in my other email, I think an advisory "this isn't going to work"
    API is a better idea now, since even pure Python modules don't always
    support reloading.


    And +1 to "loader_state" as the helper attribute name.


    Cheers,
    Nick.
  • Eric Snow at Sep 19, 2013 at 5:13 am

    On Wed, Sep 18, 2013 at 7:01 PM, Nick Coghlan wrote:


    Yeah, I preferred the "prepare_module" name when I thought the extension
    loader returned the cached module object directly. It doesn't, it returns a
    copy, so "create_module" is fine.
    Cool.

    Also agreed on deferring reload behavioural improvements to a separate PEP.
    Sounds good.

    As noted in my other email, I think an advisory "this isn't going to work"
    API is a better idea now, since even pure Python modules don't always
    support reloading.
    What do you mean by "advisory" API?

    And +1 to "loader_state" as the helper attribute name.
    That's settled then! Thanks for the feedback.


    -eric
  • Eric Snow at Sep 19, 2013 at 5:38 am
    I'm thinking that it may be useful to have ModuleSpec inherit from str and
    set it to the module name. Then the spec could be passed directly to those
    loader APIs that take the module name. Thoughts?


    -eric
  • Nick Coghlan at Sep 19, 2013 at 8:17 am

    On 19 September 2013 15:38, Eric Snow wrote:
    I'm thinking that it may be useful to have ModuleSpec inherit from str and
    set it to the module name. Then the spec could be passed directly to those
    loader APIs that take the module name. Thoughts?

    I think I'd need to see the code you think it would simplify before
    saying yes (since my default answer is "No, inheriting from str is an
    unnecessary hack").


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Eric Snow at Sep 19, 2013 at 4:42 pm

    On Thu, Sep 19, 2013 at 2:17 AM, Nick Coghlan wrote:

    On 19 September 2013 15:38, Eric Snow wrote:
    I'm thinking that it may be useful to have ModuleSpec inherit from str and
    set it to the module name. Then the spec could be passed directly to those
    loader APIs that take the module name. Thoughts?
    I think I'd need to see the code you think it would simplify before
    saying yes (since my default answer is "No, inheriting from str is an
    unnecessary hack").

    On Thu, Sep 19, 2013 at 2:21 AM, Antoine Pitrou wrote:

    I would generally be -1 on such hacks.
    Especially, str subclasses can leak to unsuspected places and create
    weird issues (I remember an issue with BeautifulSoup, IIRC, which
    returned str subclasses which kept whole HTML trees alive: by passing
    those str objects around you would create yourself a huge memory leak).



    Agreed. I've done it in other projects for backward-compatibility reasons,
    but that doesn't really apply here. That's interesting about memory leaks.
    I would not have expected that.


    -eric
  • Antoine Pitrou at Sep 19, 2013 at 8:21 am

    On Wed, 18 Sep 2013 23:38:23 -0600, Eric Snow <ericsnowcurrently@gmail.com> wrote:
    I'm thinking that it may be useful to have ModuleSpec inherit from
    str and set it to the module name. Then the spec could be passed
    directly to those loader APIs that take the module name. Thoughts?

    I would generally be -1 on such hacks.
    In particular, str subclasses can leak to unexpected places and create
    weird issues (I remember an issue with BeautifulSoup, IIRC, which
    returned str subclasses that kept whole HTML trees alive: by passing
    those str objects around you would create a huge memory leak for yourself).


    Regards


    Antoine.
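A minimal sketch of the retention problem Antoine describes. The Tree and TreeText classes are hypothetical stand-ins (not BeautifulSoup's actual types): a str subclass carrying an attribute that references a large object keeps that object alive for as long as the string survives, even though callers only ever see something that behaves like a plain str.

```python
import gc
import weakref


class Tree:
    """Hypothetical stand-in for a large parsed document tree."""

    def __init__(self):
        self.nodes = list(range(100_000))  # bulky payload


class TreeText(str):
    """A str subclass that quietly keeps a reference to its source tree."""

    def __new__(cls, value, tree):
        obj = super().__new__(cls, value)
        obj.tree = tree  # hidden back-reference to the whole tree
        return obj


def extract_title():
    tree = Tree()
    title = TreeText("Hello", tree)
    # Only the string escapes this function; the caller never sees the tree.
    return title, weakref.ref(tree)


title, tree_ref = extract_title()
print(title == "Hello", isinstance(title, str))  # looks like a plain str
print(tree_ref() is not None)  # ...yet the whole tree is still alive

del title     # drop the last reference to the string
gc.collect()  # CPython frees it immediately anyway; this just makes it explicit
print(tree_ref() is None)  # the tree is released only now
```

The same concern would apply to a ModuleSpec that is also a str: every place the "name" travels, the whole spec (loader, loader_state and all) would travel with it.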
  • Antoine Pitrou at Sep 19, 2013 at 10:22 am
    Hi,


    I have some questions and comments:

    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.

    Filename or filepath? What if the module is stored in e.g. a ZIP file?

    submodule_search_locations - list of strings for where to find
    submodules, if a package (None otherwise).

    Why isn't is_package exposed as an attribute too?

    cached (property) - a string for where the compiled module will be
    stored

    "where" is a filesystem location?
    (absolute? relative to the origin?)

    has_location (RO-property) - the module's origin refers to a location.

    filesystem location? What about ZIP files?

    spec_from_file_location(name, location, *, loader=None,
    submodule_search_locations=None) - factory for file-based module specs

    What does it mean? Is it able to make "intelligent" decisions depending
    on e.g. whether the module is an extension module or a pure Python
    module?

    from_loader(name, loader, *, origin=None, is_package=None) - factory
    based on information provided by loaders.

    That description is rather unhelpful.

    importlib.find_spec(name, path=None) will return the spec for a module.

    Is the module supposed to be already loaded or not? How is the spec
    "found"?


    Regards


    Antoine.
  • Paul Moore at Sep 19, 2013 at 11:28 am

    On 19 September 2013 11:22, Antoine Pitrou wrote:
    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    Filename or filepath? What if the module is stored in e.g. a ZIP file?

    I haven't been following this thread closely, but this is a good
    point. There is a general issue that for modules loaded off sys.path,
    the module "location" needs to be somehow jammed into a string form
    (the absolute path for files, zip/file/path.zip/location/in/zipfile
    for zipfiles, but potentially anything at all for custom loaders) and
    for things loaded off sys.meta_path there's no need for any concept of
    path at all (that's how builtins, frozen modules et al work).


    It's worth being clear on both how this origin should be constructed
    in the general case (for the guidance of people implementing
    non-standard importers) and what users of the data can assume when
    using the data (can they split the value on os.sep or '/', for
    example, or is it in effect an opaque token).


    Some of the blame for all this being vague at the moment is down to me:
    when we were writing PEP 302, I wasn't brave enough to claim that
    path entries could be opaque token values, but I didn't want to insist
    that all importers had to follow a specific structure. So I ignored
    the issue and we just ended up with normal paths, and zipfiles which
    treat the zipfile as a pseudo-directory. And no examples of corner
    cases to keep people honest. My apologies for that...


    Paul
  • Eric Snow at Sep 19, 2013 at 7:30 pm

    On Thu, Sep 19, 2013 at 5:28 AM, Paul Moore wrote:

    On 19 September 2013 11:22, Antoine Pitrou wrote:
    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    Filename or filepath? What if the module is stored in e.g. a ZIP file?
    I haven't been following this thread closely, but this is a good
    point. There is a general issue that for modules loaded off sys.path,
    the module "location" needs to be somehow jammed into a string form
    (the absolute path for files, zip/file/path.zip/location/in/zipfile
    for zipfiles, but potentially anything at all for custom loaders) and
    for things loaded off sys.meta_path there's no need for any concept of
    path at all (that's how builtins, frozen modules et al work).

    It's worth being clear on both how this origin should be constructed
    in the general case (for the guidance of people implementing
    non-standard importers) and what users of the data can assume when
    using the data (can they split the value on os.sep or '/', for
    example, or is it in effect an opaque token).

    Actually, "origin" is meant to be a pretty unconstrained string. It has only
    two explicit purposes: use in spec.module_repr() and as the value of __file__
    when spec.has_location is true. The loader may use "origin" however it
    likes. Presumably the finder would populate origin in whatever format the
    loader needs (if the loader even needs "origin"), but that's between the
    finder and loader. If the loader needs even more info, the finder can just
    stick it into the spec's loader_state attribute.



    Some of the blame for all this being vague at the moment is down to me:
    when we were writing PEP 302, I wasn't brave enough to claim that
    path entries could be opaque token values, but I didn't want to insist
    that all importers had to follow a specific structure. So I ignored
    the issue and we just ended up with normal paths, and zipfiles which
    treat the zipfile as a pseudo-directory. And no examples of corner
    cases to keep people honest. My apologies for that...

    As Nick pointed out, the "loader_state" attribute of ModuleSpec objects is
    meant to be the container for any extra data the loader needs.


    -eric
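To make this division of labour concrete, here is a sketch of a hypothetical in-memory finder and loader written against the ModuleSpec API as it eventually shipped (Python 3.5+ for importlib.util.module_from_spec), which may differ in detail from the draft under discussion. DictFinder, DictLoader and the "<dict>" origin are invented for illustration: the finder records a human-readable origin and puts the data the loader actually needs into loader_state.

```python
import importlib.abc
import importlib.machinery
import importlib.util


class DictLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source text held in a dict."""

    def __init__(self, sources):
        self._sources = sources

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        spec = module.__spec__
        # loader_state is the private channel from finder to loader.
        source = self._sources[spec.loader_state]
        # origin is only a human-readable label here (e.g. for tracebacks).
        code = compile(source, spec.origin, "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that builds specs for modules found in a dict."""

    def __init__(self, sources):
        self._sources = sources
        self._loader = DictLoader(sources)

    def find_spec(self, fullname, path, target=None):
        if fullname not in self._sources:
            return None
        return importlib.machinery.ModuleSpec(
            fullname,
            self._loader,
            origin="<dict>",        # not a location, so has_location stays False
            loader_state=fullname,  # opaque to everyone except DictLoader
        )


finder = DictFinder({"demo_mod": "ANSWER = 42\n"})
spec = finder.find_spec("demo_mod", None)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.ANSWER, spec.has_location, hasattr(module, "__file__"))
```

Because the spec is built without has_location, module_from_spec() leaves __file__ unset, which matches the limited role of "origin" described above.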
  • Brett Cannon at Sep 19, 2013 at 2:11 pm
    On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou wrote:

    Hi,

    I have some questions and comments:
    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    Filename or filepath? What if the module is stored in e.g. a ZIP file?

    I think this would be what __file__ is set to, so for zip files it would
    be e.g. /some/file.zip/path/to/module.py



    submodule_search_locations - list of strings for where to find
    submodules, if a package (None otherwise).
    Why isn't is_package exposed as an attribute too?

    It's redundant. The test for whether something is a package is literally
    ``submodule_search_locations is not None``. It just isn't
    complicated enough to warrant another attribute. Plus, being a package isn't
    as important a concept per se as having a search path.
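Brett's point that package-ness is derivable can be checked against the API as it shipped. The helper is_package() below is hypothetical; it simply applies the submodule_search_locations test to specs returned by importlib.util.find_spec() (the Python 3.4+ spelling of the find_spec discussed in this thread).

```python
import importlib.util


def is_package(spec):
    """Hypothetical helper: package-ness derived from the spec itself."""
    return spec.submodule_search_locations is not None


for name in ("json", "colorsys"):
    spec = importlib.util.find_spec(name)
    # json is a package (has a search path); colorsys is a plain module.
    print(name, is_package(spec), spec.submodule_search_locations)
```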



    cached (property) - a string for where the compiled module will be
    stored
    "where" is a filesystem location?
    (absolute? relative to the origin?)

    It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would
    return.
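For the "cached" question, a sketch using importlib.util.cache_from_source(), the importlib counterpart of the imp function linked above, assuming Python 3.4+. The path is hypothetical; spec_from_file_location() and cache_from_source() only manipulate the string here, so no file needs to exist. The exact file name depends on the interpreter's cache tag and optimization level, so it is printed rather than hard-coded.

```python
import importlib.util

# Hypothetical path; nothing is read from or written to disk here.
spec = importlib.util.spec_from_file_location("module", "/some/pkg/module.py")

print(spec.origin)  # the source path (also what __file__ would be)
print(spec.cached)  # derived from origin, normally a __pycache__ entry beside it
print(spec.cached == importlib.util.cache_from_source(spec.origin))
```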



    has_location (RO-property) - the module's origin refers to a location.
    filesystem location? What about ZIP files?

    It's a flag to basically say that origin contains what __file__ should be.
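A sketch of how has_location decides whether __file__ is set, assuming CPython 3.5+, where ModuleSpec.has_location is a settable property and importlib.util.module_from_spec() initializes module attributes from a spec. Both specs and their origins are hypothetical, and neither has a loader.

```python
import importlib.machinery
import importlib.util

# Two hypothetical specs without loaders that differ only in has_location.
located = importlib.machinery.ModuleSpec("located", None, origin="/srv/app/located.py")
located.has_location = True  # origin names a real location -> becomes __file__
unlocated = importlib.machinery.ModuleSpec("unlocated", None, origin="<generated>")

for spec in (located, unlocated):
    module = importlib.util.module_from_spec(spec)
    print(spec.name, spec.has_location, getattr(module, "__file__", None))
```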


    -Brett





    _______________________________________________
    Import-SIG mailing list
    Import-SIG at python.org
    https://mail.python.org/mailman/listinfo/import-sig
  • Nick Coghlan at Sep 19, 2013 at 2:30 pm

    On 20 Sep 2013 00:12, "Brett Cannon" wrote:


    On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou wrote:


    Hi,

    I have some questions and comments:
    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    Filename or filepath? What if the module is stored in e.g. a ZIP file?

    I think this would be what __file__ is set to, so for zip files it would
    be e.g. /some/file.zip/path/to/module.py
    submodule_search_locations - list of strings for where to find
    submodules, if a package (None otherwise).
    Why isn't is_package exposed as an attribute too?

    It's redundant. The test for whether something is a package is literally
    ``submodule_search_locations is not None``. It just isn't
    complicated enough to warrant another attribute. Plus, being a package isn't
    as important a concept per se as having a search path.
    cached (property) - a string for where the compiled module will be
    stored
    "where" is a filesystem location?
    (absolute? relative to the origin?)

    It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would
    return.
    has_location (RO-property) - the module's origin refers to a location.
    filesystem location? What about ZIP files?

    It's a flag to basically say that origin contains what __file__ should be.

    Thus indicating that get_data() on the loader can be used sensibly. Perhaps
    we could just make setting __file__ conditional on the loader defining
    get_data, rather than having it be a spec attribute?


    I also suggest that we adopt the convention of using angle brackets in
    non-location origins. So names like "<builtin>" and "<frozen>".


    To respond to something Paul said: our completely opaque token is
    "loader_state"; origin is still intended to be a human-readable string.


    Cheers,
    Nick.

  • Antoine Pitrou at Sep 19, 2013 at 2:48 pm

    On Fri, 20 Sep 2013 00:30:29 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:

    I also suggest that we adopt the convention of using angle brackets in
    non-location origins. So names like "<builtin>" and "<frozen>".

    +1. They stand out much better.


    Regards


    Antoine.
  • Eric Snow at Sep 19, 2013 at 7:42 pm

    On Thu, Sep 19, 2013 at 8:30 AM, Nick Coghlan wrote:

    On 20 Sep 2013 00:12, "Brett Cannon" wrote:
    On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou wrote:
    has_location (RO-property) - the module's origin refers to a location.
    filesystem location? What about ZIP files?

    It's a flag to basically say that origin contains what __file__ should
    be.

    Thus indicating that get_data() on the loader can be used sensibly.
    Perhaps we could just make setting __file__ conditional on the loader
    defining get_data, rather than having it be a spec attribute?

    I'd still like to keep an explicit "has_location" as a clear, informational
    declaration. How about we always set it to True if loader.get_data exists?
    I think you proposed this before and it got lost in the shuffle.

    I also suggest that we adopt the convention of using angle brackets in
    non-location origins. So names like "<builtin>" and "<frozen>".

    Well, I'm already having module_repr() do that. I've thought of this
    before, but decided it was better to have the separate "has_location"
    attribute. Then there is no ambiguity between the origin of a
    non-locatable module and a locatable one that happens to have bookend angle
    brackets. I will make sure the spec is explicit about the angle brackets
    in module_repr().


    -eric
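Nick's alternative and Eric's counter-proposal both hinge on whether the loader offers get_data(). The sketch below compares that heuristic (the hypothetical helper location_by_get_data()) with the explicit has_location attribute for one source module and one built-in module, using importlib.util.find_spec() from Python 3.4+.

```python
import importlib.util


def location_by_get_data(spec):
    """Hypothetical heuristic: a loader offering get_data() implies a location."""
    return hasattr(spec.loader, "get_data")


for name in ("colorsys", "sys"):
    spec = importlib.util.find_spec(name)
    print(name, repr(spec.origin), spec.has_location, location_by_get_data(spec))
```

For these two stdlib modules the heuristic and the explicit attribute agree; the explicit attribute simply records the decision on the spec instead of inferring it from the loader's interface.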
  • Eric Snow at Sep 19, 2013 at 7:52 pm

    On Thu, Sep 19, 2013 at 1:42 PM, Eric Snow wrote:

    On Thu, Sep 19, 2013 at 8:30 AM, Nick Coghlan wrote:

    I also suggest that we adopt the convention of using angle brackets in
    non-location origins. So names like "<builtin>" and "<frozen>".
    Well, I'm already having module_repr() do that.

    Actually, no, I wasn't. The current repr for the sys module is "<module
    'sys' (built-in)>". Adding the angle brackets would change that. It's not
    a big deal to me either way. I actually kind of like the idea of using
    angle brackets (by convention) on a non-locatable origin. It just changes
    existing reprs and can be ambiguous in the (unlikely) situation I
    described. I'm leaning toward not doing the angle brackets, but I can be
    swayed. :)


    -eric
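What the angle-bracket convention would do to reprs can be sketched with CPython's own module repr, which, for a spec without a location, prints the origin in parentheses after the name. The "<frozen>" origin and the "bracketed" module name are hypothetical; module_from_spec() requires Python 3.5+.

```python
import importlib.machinery
import importlib.util
import sys

# A built-in module: its origin ("built-in") is shown in parentheses.
print(repr(sys))

# Hypothetical spec whose non-location origin uses the angle-bracket convention.
spec = importlib.machinery.ModuleSpec("bracketed", None, origin="<frozen>")
print(repr(importlib.util.module_from_spec(spec)))
```

The second repr ends up with nested brackets, which is the change to existing reprs that Eric is weighing.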
  • Eric Snow at Sep 19, 2013 at 7:12 pm
    Hi Antoine,


    Thanks for the feedback. Comments inline.


    On Thu, Sep 19, 2013 at 4:22 AM, Antoine Pitrou wrote:

    origin - a string for the location from which the module is loaded,
    e.g. "builtin" for built-in modules and the filename for modules
    loaded from source.
    Filename or filepath? What if the module is stored in e.g. a ZIP file?

    As Brett mentioned, it would be whatever is currently bound to __file__.
    Keep in mind that the two things I listed are just examples of the sorts
    of things that would go into "origin". The point of "origin" is actually
    explained in more detail further on in the PEP.



    submodule_search_locations - list of strings for where to find
    submodules, if a package (None otherwise).
    Why isn't is_package exposed as an attribute too?

    We had some discussion on this on a previous revision of the PEP.
    Initially I had is_package as a property of ModuleSpec. However, we came
    to the agreement that whether or not the spec represents a package is not
    very important once you have the spec. This contrasts with the is_package
    parameter to ModuleSpec which is useful since it represents a set of things
    that should be applied to the new spec object. Ultimately Nick put it
    best when he said that we need to de-emphasize the superficial
    package/module distinction, not enshrine it as an attribute. The PEP
    actually addresses the question of is_package in the "Omitted Attributes
    and Methods" section.



    cached (property) - a string for where the compiled module will be
    stored
    "where" is a filesystem location?
    (absolute? relative to the origin?)

    As Brett noted (and the module attribute table further on indicates), this
    is the same as the __cached__ attribute of modules.



    has_location (RO-property) - the module's origin refers to a location.
    filesystem location? What about ZIP files?

    Also as Brett indicated, this is a flag that indicates that "origin" should
    be copied into __file__ on corresponding module objects. However, the
    summary is pretty unclear. I'll fix that.



    spec_from_file_location(name, location, *, loader=None,
    submodule_search_locations=None) - factory for file-based module specs
    What does it mean? Is it able to make "intelligent" decisions depending
    on e.g. whether the module is an extension module or a pure Python
    module?

    It does make some intelligent decisions. Otherwise a finder would just
    call ModuleSpec directly. (All three factory functions are there for the
    convenience of finders.) I'll add some explanation on what those decisions
    entail and also clarify the summary.
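One way to see the kind of decisions meant here is to call importlib.util.spec_from_file_location() as it shipped in Python 3.4+: it picks a loader class from the file suffix and, when that loader reports a package, fills in submodule_search_locations with the containing directory. The paths are hypothetical; as far as CPython's implementation goes, none of these files needs to exist for these inputs.

```python
import importlib.machinery
import importlib.util

# Hypothetical paths; only the strings are inspected in this demo.
source = importlib.util.spec_from_file_location("mod", "/some/pkg/mod.py")
ext_path = "/some/pkg/_speedups" + importlib.machinery.EXTENSION_SUFFIXES[0]
extension = importlib.util.spec_from_file_location("_speedups", ext_path)
package = importlib.util.spec_from_file_location("pkg", "/some/pkg/__init__.py")

for spec in (source, extension, package):
    # The loader class is chosen from the suffix; packages get a search path.
    print(spec.name, type(spec.loader).__name__, spec.submodule_search_locations)
```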



    from_loader(name, loader, *, origin=None, is_package=None) - factory
    based on information provided by loaders.
    That description is rather unhelpful.

    Likewise I'll add more explanation for this as well as improve the summary.
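The draft's from_loader() shipped as importlib.util.spec_from_loader(name, loader, *, origin=None, is_package=None) in Python 3.4+. A sketch with a hypothetical StubLoader: when is_package is not passed, the factory asks the loader's is_package() method and, for a package, starts submodule_search_locations as an empty list.

```python
import importlib.util


class StubLoader:
    """Hypothetical in-memory loader that only knows which names are packages."""

    def is_package(self, fullname):
        return fullname == "stubpkg"

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        pass  # nothing to execute in this sketch


loader = StubLoader()
pkg_spec = importlib.util.spec_from_loader("stubpkg", loader, origin="<stub>")
mod_spec = importlib.util.spec_from_loader("stubpkg.leaf", loader, origin="<stub>")

print(pkg_spec.submodule_search_locations)  # [] -> a package with no locations yet
print(mod_spec.submodule_search_locations)  # None -> a plain module
print(mod_spec.parent, pkg_spec.has_location)
```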



    importlib.find_spec(name, path=None) will return the spec for a module.
    Is the module supposed to be already loaded or not? How is the spec
    "found"?

    This function is the replacement for importlib.find_loader(). Instead of
    returning a loader, it returns a spec. Otherwise it's the same. I'll make
    the summary clearer.


    -eric
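Roughly how the replacement for importlib.find_loader() is used, assuming the shipped spelling importlib.util.find_spec() (Python 3.4+) together with importlib.util.module_from_spec() (3.5+): finding returns a spec without itself executing the module, and loading is a separate step. colorsys is just a convenient pure-Python stdlib module; the missing module name is made up.

```python
import importlib.util

# Finding: returns a spec (or None) without executing the module's code.
spec = importlib.util.find_spec("colorsys")
print(spec.name, spec.origin)

# Loading is a separate step driven by the spec.
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.rgb_to_hsv(1.0, 0.0, 0.0))

print(importlib.util.find_spec("no_such_module_for_demo"))  # None: not found
```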

Discussion Overview
group: import-sig
categories: python
posted: Sep 18, '13 at 9:51a
active: Sep 19, '13 at 7:52p
posts: 20
users: 5
website: python.org
