This is an outgrowth of discussions on the .ref PEP, but it's also
something I've been thinking about for over a year and started toying with
at the last PyCon. I have a patch that passes all but a couple of unit tests,
and those should pass once I get a minute to take another pass at it.
I'll probably end up adding a bunch more unit tests before I'm done as
well. However, the functionality is mostly there.


BTW, I gotta say, Brett, I have a renewed appreciation for the long and
hard effort you put into importlib. There are just so many odd corner
cases that I never would have looked for if not for that library. And
those unit tests do a great job of covering all of that. Thanks!


-eric


-------------------------------------------------------------------------------


PEP: 4XX
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: ???
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
Resolution:




Abstract
========


This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will contain all the import-related information
about a module without needing to load the module first. Finders will
now return a module's spec rather than a loader. The import system will
use the spec to load the module.




Motivation
==========


The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.


As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.


Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occasionally to some
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.


Finders are strictly responsible for providing the loader which the
import system will use to load the module. The loader is then
responsible for doing some checks, creating the module object, setting
import-related attributes, "installing" the module to ``sys.modules``,
and loading the module, along with some cleanup. This all takes place
during the import system's call to ``Loader.load_module()``. Loaders
also provide some APIs for accessing data associated with a module.


Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.


Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.


Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options for
persisting that per-module information past the method call are limited,
since the method only returns the loader: either store it in a
module-to-info mapping somewhere, such as on the finder itself, or store
it on the loader.
Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
that. This is the same gap as before between finders and loaders.


As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace path.


The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
of loading the module.


(The idea grew feet during discussions related to another PEP.[1])




Specification
=============


ModuleSpec
----------


A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
information about the module. This greatly reduces the need to add any
new import-related attributes to module objects.


Attributes:


* ``name`` - the module's name (compare to ``__name__``).
* ``loader`` - the loader to use during loading and for module data
   (compare to ``__loader__``).
* ``package`` - the name of the module's parent (compare to
   ``__package__``).
* ``is_package`` - whether or not the module is a package.
* ``origin`` - the location from which the module originates.
* ``filename`` - like origin, but limited to a path-based location
   (compare to ``__file__``).
* ``cached`` - the location where the compiled module should be stored
   (compare to ``__cached__``).
* ``path`` - the list of path entries in which to search for submodules
   or ``None`` (compare to ``__path__``). It should be in sync with
   ``is_package``.


Those are also the parameters to ``ModuleSpec.__init__()``, in that
order. The last three are optional. When passed the values are taken
as-is. The ``from_loader()`` method offers calculated values.


Methods:


* ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the
   arguments. The parameters are the same as with ``__init__``, except
   ``package`` is excluded and only ``name`` and ``loader`` are required.
* ``module_repr()`` - returns a repr for the module.
* ``init_module_attrs(module)`` - sets the module's import-related
   attributes.
* ``load(module=None, *, is_reload=False)`` - calls the loader's
   ``exec_module()``, falling back to ``load_module()`` if necessary.
   This method performs the former responsibilities of loaders for
   managing modules before actually loading and for cleaning up. The
   reload case is facilitated by the ``module`` and ``is_reload``
   parameters.
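
The attribute and method lists above can be pictured with a small
stand-in class. This is only a sketch under stated assumptions:
``DraftModuleSpec`` is a hypothetical name, the repr format is invented
for illustration, and ``from_loader()`` and ``load()`` are left out. It
shows the constructor storing values as-is and ``init_module_attrs()``
copying them onto a module.

```python
import types


class DraftModuleSpec:
    """Hypothetical stand-in for the proposed ModuleSpec (state only)."""

    def __init__(self, name, loader, package, is_package, origin,
                 filename=None, cached=None, path=None):
        # Values are taken as-is; nothing is calculated here.
        self.name = name
        self.loader = loader
        self.package = package
        self.is_package = is_package
        self.origin = origin
        self.filename = filename
        self.cached = cached
        self.path = path

    def module_repr(self):
        # Assumed format; the draft does not pin down the exact repr.
        return "<module {!r} from {!r}>".format(self.name, self.origin)

    def init_module_attrs(self, module):
        # Copy the import-related values onto the module object.
        module.__name__ = self.name
        module.__loader__ = self.loader
        module.__package__ = self.package
        module.__spec__ = self
        # Path-based attributes are only set when they are known.
        if self.filename is not None:
            module.__file__ = self.filename
        if self.cached is not None:
            module.__cached__ = self.cached
        if self.path is not None:
            module.__path__ = self.path


spec = DraftModuleSpec("pkg.mod", None, "pkg", False,
                       "/src/pkg/mod.py", filename="/src/pkg/mod.py")
module = types.ModuleType("placeholder")
spec.init_module_attrs(module)
```

After the call, ``module.__name__`` is ``"pkg.mod"`` and ``module.__spec__``
is the spec itself, while ``__path__`` stays unset because the spec does
not describe a package.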


Values Derived by from_loader()
-------------------------------


As implied above, ``from_loader()`` makes a best effort at calculating
any of the values that are not passed in. It duplicates the behavior
that was formerly provided by several ``importlib.util`` functions as
well as the ``init_module_attrs()`` method of several of ``importlib``'s
loaders. Just to be clear, here is a more detailed description of those
calculations:


``is_package`` is derived from ``path``, if passed. Otherwise the
loader's ``is_package()`` is tried. Finally, it defaults to False.


``filename`` is pulled from the loader's ``get_filename()``, if
possible.


``path`` is set to an empty list if ``is_package`` is true, and the
directory from ``filename`` is appended to it, if available.


``cached`` is derived from ``filename`` if it's available.


``origin`` is set to ``filename``.


``package`` is set to ``name`` if the module is a package and
to ``name.rpartition('.')[0]`` otherwise. Consequently, a
top-level module will have ``package`` set to the empty string.
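
The derivation rules above might look roughly like the following
helper. It is a sketch, not the reference implementation:
``derive_values`` is a hypothetical name, it returns a plain dict
instead of a spec, and treating a passed ``path`` as "this is a
package" is one reading of the first rule. The only library calls used
are ``os.path.dirname()`` and ``importlib.util.cache_from_source()``.

```python
import importlib.util
import os.path


def derive_values(name, loader, *, is_package=None, filename=None, path=None):
    """Hypothetical helper mirroring the rules described for from_loader()."""
    # is_package: implied by ``path`` if passed, else ask the loader,
    # else default to False.
    if is_package is None:
        if path is not None:
            is_package = True
        elif hasattr(loader, "is_package"):
            try:
                is_package = bool(loader.is_package(name))
            except ImportError:
                is_package = False
        else:
            is_package = False

    # filename: pulled from the loader's get_filename(), if possible.
    if filename is None and hasattr(loader, "get_filename"):
        try:
            filename = loader.get_filename(name)
        except ImportError:
            filename = None

    # path: an empty list for packages, plus the file's directory if known.
    if path is None and is_package:
        path = []
        if filename is not None:
            path.append(os.path.dirname(filename))

    # cached: derived from filename when one is available.
    cached = None
    if filename is not None:
        try:
            cached = importlib.util.cache_from_source(filename)
        except NotImplementedError:
            cached = None  # this interpreter has no bytecode cache tag

    return {
        "name": name,
        "loader": loader,
        "package": name if is_package else name.rpartition(".")[0],
        "is_package": is_package,
        "origin": filename,
        "filename": filename,
        "cached": cached,
        "path": path,
    }


class FakeLoader:
    """Illustrative loader that only answers the two queries used above."""

    def is_package(self, name):
        return name == "pkg"

    def get_filename(self, name):
        if name == "pkg":
            return "/src/pkg/__init__.py"
        return "/src/pkg/mod.py"


pkg_values = derive_values("pkg", FakeLoader())
mod_values = derive_values("pkg.mod", FakeLoader())
top_values = derive_values("top", object())
```

With these inputs, the package gets ``package == "pkg"`` and a one-entry
``path``, the submodule gets ``package == "pkg"`` and ``path is None``,
and the top-level module with an uninformative loader ends up with
``package == ""``.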


Backward Compatibility
----------------------


Since finder ``find_module()`` methods would now return a module spec
instead of a loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec.


However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle.


Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
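
One way the proxying described above could work is attribute delegation
through ``__getattr__``. The sketch below is an assumption about the
mechanism, not the patch itself; ``ProxyingSpec`` and ``LegacyLoader``
are hypothetical names, and the ``is_package`` name clash is not
handled.

```python
class ProxyingSpec:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup on the spec fails.
        # Read ``loader`` from __dict__ to avoid recursing into __getattr__.
        loader = self.__dict__.get("loader")
        if loader is None:
            raise AttributeError(attr)
        return getattr(loader, attr)


class LegacyLoader:
    """Illustrative PEP 302 style loader."""

    def load_module(self, fullname):
        return "loaded " + fullname


loader = LegacyLoader()
spec = ProxyingSpec("spam", loader)

# Old callers that treat the return value as a loader keep working...
result = spec.load_module("spam")

# ...but identity and type checks see the spec, not the loader.
same_object = spec is loader
looks_like_loader = isinstance(spec, LegacyLoader)
```

Here ``result`` is ``"loaded spam"`` while both ``same_object`` and
``looks_like_loader`` are false, which is the compatibility break the
text accepts.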


Subclassing
-----------


.. XXX Allowed but discouraged?


Module Objects
--------------


Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be.
Any such deprecation can wait until Python 4.


``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. They may differ, though in
practice they will be the same.
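
For reference, Python versions released after this draft do expose
``__spec__`` on modules, so the "not kept in sync" point can be tried
directly. The snippet uses ``importlib.machinery.ModuleSpec`` and
``importlib.util.module_from_spec()`` as they exist in released Python
(3.5 and later), whose details differ from this draft; ``NullLoader`` is
a hypothetical do-nothing loader.

```python
import importlib.machinery
import importlib.util
import json


class NullLoader:
    """Hypothetical loader that creates a default module and runs nothing."""

    def create_module(self, spec):
        return None  # ask the import system for a plain module object

    def exec_module(self, module):
        pass


# An imported module carries its spec on ``__spec__``.
json_spec_name = json.__spec__.name

# Build a throwaway module from a spec, then change the module attribute.
spec = importlib.machinery.ModuleSpec("demo_mod", NullLoader())
module = importlib.util.module_from_spec(spec)
module.__name__ = "renamed"

# The spec is not updated to follow the module attribute.
spec_name_after_rename = spec.name
```

``spec_name_after_rename`` is still ``"demo_mod"`` even though
``module.__name__`` now reads ``"renamed"``.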


Finders
-------


Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.


Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough.
The approach taken by this PEP should be sufficient.


The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.


Loaders
-------


Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.


The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.


A loader must have ``exec_module()`` or ``load_module()`` defined. If
both exist on the loader, ``exec_module()`` is used and
``load_module()`` is ignored.
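
The division of labor described in this section can be sketched as
follows. This is a simplified illustration, not the proposed
``ModuleSpec.load()``: ``load_via_spec`` and ``SourceStringLoader`` are
hypothetical names, locking and most attribute setup are omitted, and
the cleanup policy on failure is an assumption.

```python
import sys
import types


def load_via_spec(name, loader):
    """Hypothetical sketch of the boilerplate the import system would own."""
    if hasattr(loader, "exec_module"):
        # Create and install the module before executing it.
        module = types.ModuleType(name)
        module.__loader__ = loader
        sys.modules[name] = module
        try:
            # The loader only populates the namespace; no return value.
            loader.exec_module(module)
        except BaseException:
            # Clean up the half-initialized module on failure.
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]
    # Fall back to the old API, where the loader does everything itself.
    return loader.load_module(name)


class SourceStringLoader:
    """Illustrative loader that executes a string of source code."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)

    def load_module(self, name):
        # Never reached here: exec_module() takes precedence when both exist.
        raise AssertionError("load_module() should have been ignored")


demo = load_via_spec("draft_demo", SourceStringLoader("answer = 6 * 7"))
```

``demo.answer`` is ``42`` and the module is registered in
``sys.modules`` under ``"draft_demo"``; a loader that defines only
``load_module()`` would go through the fallback branch instead.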


PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.


The loader ``init_module_attrs()`` method, added for Python 3.4, will be
eliminated in favor of the same method on ``ModuleSpec``.


However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.


In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.


Other Changes
-------------


* The various finders and loaders provided by ``importlib`` will be
   updated to comply with this proposal.


* The spec for the ``__main__`` module will reflect how the interpreter
   was started. For instance, with ``-m`` the spec's name will be that of
   the run module, while ``__main__.__name__`` will still be "__main__".


* We add ``importlib.find_module()`` to mirror
   ``importlib.find_loader()`` (which becomes deprecated).


* Deprecations in ``importlib.util``: ``set_package()``,
   ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
   (introduced in 3.4) can be removed.


* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.


* ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
   the per-module import lock, whereas ``Loader.load_module()`` did not.


Reference Implementation
------------------------


A reference implementation is available at <TBD>.




References
==========


[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html




Copyright
=========


This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


  • Antoine Pitrou at Aug 9, 2013 at 8:28 am
    Hi,


    On Fri, 9 Aug 2013 00:34:34 -0600,
    Eric Snow <ericsnowcurrently@gmail.com> wrote:
    Abstract
    ========

    This PEP proposes to add a new class to ``importlib.machinery`` called
    ``ModuleSpec``. It will contain all the import-related information
    about a module without needing to load the module first. Finders will
    now return a module's spec rather than a loader. The import system
    will use the spec to load the module.

    Looks good in principle.

    Attributes:

    * ``name`` - the module's name (compare to ``__name__``).
    * ``loader`` - the loader to use during loading and for module data
    (compare to ``__loader__``).

    Should it be the loader or just a factory to build it?
    I'm wondering if in some cases creating a loader is costly.

    * ``package`` - the name of the module's parent (compare to
    ``__package__``).

    Is it None if there is no parent?

    * ``is_package`` - whether or not the module is a package.
    * ``origin`` - the location from which the module originates.
    * ``filename`` - like origin, but limited to a path-based location
    (compare to ``__file__``).

    Can you explain the difference between origin and filename (or, better,
    give an example)?

    * ``load(module=None, *, is_reload=False)`` - calls the loader's
    ``exec_module()``, falling back to ``load_module()`` if necessary.
    This method performs the former responsibilities of loaders for
    managing modules before actually loading and for cleaning up. The
    reload case is facilitated by the ``module`` and ``is_reload``
    parameters.

    So how about separate load() and reload() methods?

    However, ``ModuleSpec.is_package`` (an attribute) conflicts with
    ``InspectLoader.is_package()`` (a method). Working around this
    requires a more complicated solution but is not a large obstacle.

    Or how about keeping the method API?

    Module Objects
    --------------

    Module objects will now have a ``__spec__`` attribute to which the
    module's spec will be bound.

    Nice!

    Loaders will have a new method, ``exec_module(module)``. Its only job
    is to "exec" the module and consequently populate the module's
    namespace. It is not responsible for creating or preparing the module
    object, nor for any cleanup afterward. It has no return value.

    Does it work with extension modules as well? Generally, extension
    modules are populated when created (i.e. the two steps aren't separate
    at the C API level, IIRC).


    Regards


    Antoine.
  • Brett Cannon at Aug 9, 2013 at 2:43 pm
    On Fri, Aug 9, 2013 at 4:28 AM, Antoine Pitrou wrote:

    Hi,

    On Fri, 9 Aug 2013 00:34:34 -0600,
    Eric Snow <ericsnowcurrently@gmail.com> wrote:
    Abstract
    ========

    This PEP proposes to add a new class to ``importlib.machinery`` called
    ``ModuleSpec``. It will contain all the import-related information
    about a module without needing to load the module first. Finders will
    now return a module's spec rather than a loader. The import system
    will use the spec to load the module.

    Looks good in principle.

    Attributes:

    * ``name`` - the module's name (compare to ``__name__``).
    * ``loader`` - the loader to use during loading and for module data
    (compare to ``__loader__``).
    Should it be the loader or just a factory to build it?
    I'm wondering if in some cases creating a loader is costly.

    Theoretically it could be costly, but up to this point I have not seen a
    single loader that cost a lot to create. Every loader I have ever written
    just stores details that the finder had to calculate for its work and
    potentially stores something, e.g. an open zipfile that the finder used to
    see if a module was there.



    * ``package`` - the name of the module's parent (compare to
    ``__package__``).
    Is it None if there is no parent?

    Top-level modules have the value of '' for __package__. None is used to
    represent an unknown value.


    -Brett







    _______________________________________________
    Import-SIG mailing list
    Import-SIG at python.org
    http://mail.python.org/mailman/listinfo/import-sig
  • Eric Snow at Aug 9, 2013 at 4:45 pm

    On Fri, Aug 9, 2013 at 2:28 AM, Antoine Pitrou wrote:


    On Fri, 9 Aug 2013 00:34:34 -0600,
    Eric Snow <ericsnowcurrently@gmail.com> wrote:
    Attributes:

    * ``name`` - the module's name (compare to ``__name__``).
    * ``loader`` - the loader to use during loading and for module data
    (compare to ``__loader__``).
    Should it be the loader or just a factory to build it?
    I'm wondering if in some cases creating a loader is costly.

    The finder is currently responsible for creating the loader and this PEP
    does not propose changing that. So any such loader already has to deal
    with this. I suppose some loader could be expensive to create, but none of
    the existing loaders in the stdlib are that costly. If some future loader
    runs into this problem they can pretty easily write the loader in such a
    way that it defers the costly operations. I'll make a note in the PEP
    about this.



    * ``package`` - the name of the module's parent (compare to
    ``__package__``).
    Is it None if there is no parent?

    As Brett noted, it is ''. This is the same as the __package__ attribute of
    modules. The goal is to keep the same behavior, as much as possible, for
    all the features that are moved into ModuleSpec. I'll make this objective
    more clear in the PEP.



    * ``is_package`` - whether or not the module is a package.
    * ``origin`` - the location from which the module originates.
    * ``filename`` - like origin, but limited to a path-based location
    (compare to ``__file__``).
    Can you explain the difference between origin and filename (or, better,
    give an example)?

    Yeah, that wasn't too clear, was it? filename maps directly to the
    module's __file__ attribute, which is not set for all modules. For
    instance, built-in modules do not set it nor do namespace packages. In
    those cases it is still nice to be able to indicate where the module came
    from. For built-in modules origin will be set to 'built-in' and for
    namespace packages 'namespace'. For any module with a filename, origin is
    set to the filename.


    Having both origin and filename is meant to provide for different usage.
    filename is used to populate a module's __file__ attribute. If set, it
    indicates a path-based module (along with cached and path). In contrast,
    origin has a broader meaning and is used by the module_repr() method.


    I suppose there could be a flag to indicate the module is path-based, but I
    went with a separate spec attribute. Likewise, I toyed with the idea of a
    path-based subclass, perhaps PathModuleSpec, but wanted to stick with a
    one-size-fits-all spec class since it is meant to be used almost
    exclusively for state rather than functionality. In some ways it's like
    types.SimpleNamespace, but with a couple of import-related methods and some
    dedicated state.


    I'll make sure the PEP reflects this.



    * ``load(module=None, *, is_reload=False)`` - calls the loader's
    ``exec_module()``, falling back to ``load_module()`` if necessary.
    This method performs the former responsibilities of loaders for
    managing modules before actually loading and for cleaning up. The
    reload case is facilitated by the ``module`` and ``is_reload``
    parameters.
    So how about separate load() and reload() methods?

    I thought about that too, but found it simpler to keep them together.
    Also, reload is a pretty specialized activity and I plan on leaving some
    of the boilerplate of it to importlib.reload(). However, I'm not convinced
    either way actually. I'll think about that some more and update the PEP
    regardless. Do you have a case to make for making them separate?



    However, ``ModuleSpec.is_package`` (an attribute) conflicts with
    ``InspectLoader.is_package()`` (a method). Working around this
    requires a more complicated solution but is not a large obstacle.
    Or how about keeping the method API?

    Because it is a static piece of data. At the point that we can remove the
    backward compatibility support, we would be stuck with a method when it
    should be just a normal attribute.



    Module Objects
    --------------

    Module objects will now have a ``__spec__`` attribute to which the
    module's spec will be bound.
    Nice!

    Ironic that this PEP adds yet another import-related attribute to modules.
    :) Hopefully it's the last one.



    Loaders will have a new method, ``exec_module(module)``. Its only job
    is to "exec" the module and consequently populate the module's
    namespace. It is not responsible for creating or preparing the module
    object, nor for any cleanup afterward. It has no return value.
    Does it work with extension modules as well? Generally, extension
    modules are populated when created (i.e. the two steps aren't separate
    at the C API level, IIRC).

    Yeah, it works great. We simply don't implement exec_module() on
    ExtensionFileLoader and things just stay the same. There is room to add an
    exec_module() and update the C-API for extension modules to support it, but
    I'm leaving that out of the PEP. However, I will mention that in the PEP
    because your question is quite relevant and not well answered there.


    -eric



    Regards

    Antoine.


  • Antoine Pitrou at Aug 9, 2013 at 7:22 pm

    On Fri, 9 Aug 2013 10:45:22 -0600 Eric Snow wrote:
    So how about separate load() and reload() methods?
    I thought about that too, but found it simpler to keep them together.
    Also, reload is a pretty specialized activity and I plan on leaving some
    of the boilerplate of it to importlib.reload(). However, I'm not convinced
    either way actually. I'll think about that some more and update the PEP
    regardless. Do you have a case to make for making them separate?

    Well, is there another way to use load() than:
    - load(): load a new module
    - load(existing_module, is_reload=True): reload an existing module


    I mean, does it make sense to call e.g.
    - load(some_existing_module, is_reload=False)
    - load(is_reload=True)


    ?


    Regards


    Antoine.
  • Eric Snow at Aug 9, 2013 at 10:44 pm

    On Fri, Aug 9, 2013 at 1:22 PM, Antoine Pitrou wrote:


    Well, is there another way to use load() than:
    - load(): load a new module
    - load(existing_module, is_reload=True): reload an existing module

    I mean, does it make sense to call e.g.
    - load(some_existing_module, is_reload=False)

    This would be a ValueError. The module argument is meant just for reload.
    I'm not sure it makes sense otherwise. Perhaps so you could prepare your
    own new module prior to calling load()? I'd like to leave that off the
    table for this PEP.



    - load(is_reload=True)

    This was always okay in my mind, but I realized it did not make it to the
    PEP until Brett had some similar questions. :) The updated PEP covers
    this. Like I told Brett, I'm going to see how a separate reload() looks
    and go from there.


    -eric
  • Nick Coghlan at Aug 10, 2013 at 10:50 am

    On 10 August 2013 08:44, Eric Snow wrote:
    On Fri, Aug 9, 2013 at 1:22 PM, Antoine Pitrou wrote:

    Well, is there another way to use load() than:
    - load(): load a new module
    - load(existing_module, is_reload=True): reload an existing module

    I mean, does it make sense to call e.g.
    - load(some_existing_module, is_reload=False)

    This would be a ValueError. The module argument is meant just for reload.
    I'm not sure it makes sense otherwise. Perhaps so you could prepare your
    own new module prior to calling load()? I'd like to leave that off the
    table for this PEP.

    The advantage of offering that API over telling people to call
    spec.loader.exec_module(m) directly is that it gives us more control
    over the loading process (by updating ModuleSpec.load), avoiding the
    current problem we have where providing new load time behaviour is
    difficult because we don't control the loader implementations.

    - load(is_reload=True)

    This was always okay in my mind, but I realized it did not make it to the
    PEP until Brett had some similar questions. :) The updated PEP covers this.
    Like I told Brett, I'm going to see how a separate reload() looks and go
    from there.

    A separate reload that works something like this sounds good to me:


         def reload(self, module=None):
             if module is None:
                 module = sys.modules[self.name]
             self.load(module)


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Brett Cannon at Aug 9, 2013 at 2:40 pm
    I like the idea and I think it can be more-or-less safe. Just need more
    specification/clarification on things.




    On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote:

    This is an outgrowth of discussions on the .ref PEP, but it's also
    something I've been thinking about for over a year and started toying with
    at the last PyCon. I have a patch that passes all but a couple of unit tests,
    and those should pass once I get a minute to take another pass at it.
    I'll probably end up adding a bunch more unit tests before I'm done as
    well. However, the functionality is mostly there.

    BTW, I gotta say, Brett, I have a renewed appreciation for the long and
    hard effort you put into importlib. There are just so many odd corner
    cases that I never would have looked for if not for that library. And
    those unit tests do a great job of covering all of that. Thanks!

    Welcome! And yes, importlib didn't take multiple years out of laziness, but
    just how much work had to go in to cover corner cases along with pauses
    from frustration with the semantics. :P



    -eric


    -------------------------------------------------------------------------------

    PEP: 4XX
    Title: A ModuleSpec Type for the Import System
    Version: $Revision$
    Last-Modified: $Date$
    Author: Eric Snow <ericsnowcurrently@gmail.com>
    BDFL-Delegate: ???
    Discussions-To: import-sig at python.org
    Status: Draft
    Type: Standards Track
    Content-Type: text/x-rst
    Created: 8-Aug-2013
    Python-Version: 3.4
    Post-History: 8-Aug-2013
    Resolution:


    Abstract
    ========

    This PEP proposes to add a new class to ``importlib.machinery`` called
    ``ModuleSpec``. It will contain all the import-related information
    about a module without needing to load the module first. Finders will
    now return a module's spec rather than a loader. The import system will
    use the spec to load the module.


    Motivation
    ==========

    The import system has evolved over the lifetime of Python. In late 2002
    PEP 302 introduced standardized import hooks via ``finders`` and
    ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
    with Python 3.1, now exposes a pure Python implementation of the APIs
    described by PEP 302, as well as of the full import system. It is now
    much easier to understand and extend the import system. While a benefit
    to the Python community, this greater accessibility also presents a
    challenge.

    As more developers come to understand and customize the import system,
    any weaknesses in the finder and loader APIs will be more impactful. So
    the sooner we can address any such weaknesses in the import system, the
    better...and there are a couple we can take care of with this proposal.

    Firstly, any time the import system needs to save information about a
    module we end up with more attributes on module objects that are
    generally only meaningful to the import system and occasionally to some
    people. It would be nice to have a per-module namespace to put future
    import-related information. Secondly, there's an API void between
    finders and loaders that causes undue complexity when encountered.

    Finders are strictly responsible for providing the loader which the
    import system will use to load the module. The loader is then
    responsible for doing some checks, creating the module object, setting
    import-related attributes, "installing" the module to ``sys.modules``,
    and loading the module, along with some cleanup. This all takes place
    during the import system's call to ``Loader.load_module()``. Loaders
    also provide some APIs for accessing data associated with a module.

    Loaders are not required to provide any of the functionality of
    ``load_module()`` through other methods. Thus, though the import-
    related information about a module is likely available without loading
    the module, it is not otherwise exposed.

    Furthermore, the requirements associated with ``load_module()`` are
    common to all loaders and mostly are implemented in exactly the same
    way. This means every loader has to duplicate the same boilerplate
    code. ``importlib.util`` provides some tools that help with this, but
    it would be more helpful if the import system simply took charge of
    these responsibilities. The trouble is that this would limit the degree
    of customization that ``load_module()`` facilitates. This is a gap
    between finders and loaders which this proposal aims to fill.

    Finally, when the import system calls a finder's ``find_module()``, the
    finder makes use of a variety of information about the module that is
    useful outside the context of the method. Currently the options are
    limited for persisting that per-module information past the method call,
    since it only returns the loader. Either store it in a module-to-info
    mapping somewhere like on the finder itself, or store it on the loader.

    The two previous sentences are hard to read; I think you were after
    something like,
    "Popular options for this limitation are to store the information in a
    module-to-info mapping somewhere on the finder itself, or store it on
    the loader."



    Unfortunately, loaders are not required to be module-specific. On top
    of that, some of the useful information finders could provide is
    common to all finders, so ideally the import system could take care of
    that. This is the same gap as before between finders and loaders.

    As an example of complexity attributable to this flaw, the
    implementation of namespace packages in Python 3.3 (see PEP 420) added
    ``FileFinder.find_loader()`` because there was no good way for
    ``find_module()`` to provide the namespace path.

    The answer to this gap is a ``ModuleSpec`` object that contains the
    per-module information and takes care of the boilerplate functionality
    of loading the module.

    (The idea grew feet during discussions related to another PEP.[1])

    "(This PEP grew out of discussions related to another PEP [1])"




    Specification
    =============

    ModuleSpec
    ----------

    A new class which defines the import-related values to use when loading
    the module. It closely corresponds to the import-related attributes of
    module objects. ``ModuleSpec`` objects may also be used by finders and
    loaders and other import-related APIs to hold extra import-related
    information about the module. This greatly reduces the need to add any
    new import-related attributes to module objects.

    Attributes:

    * ``name`` - the module's name (compare to ``__name__``).
    * ``loader`` - the loader to use during loading and for module data
    (compare to ``__loader__``).
    * ``package`` - the name of the module's parent (compare to
    ``__package__``).
    * ``is_package`` - whether or not the module is a package.

    I think is_package() is redundant in the face of 'name'/'package' or 'path'
    as you can introspect the same information. I honestly have always found it
    a weakness of InspectLoader.is_package() that it didn't return the value
    for __path__.



    * ``origin`` - the location from which the module originates.

    Don't quite follow what this is meant to represent? Like the path to the
    zipfile if loaded that way, otherwise it's the file path?



    * ``filename`` - like origin, but limited to a path-based location
    (compare to ``__file__``).
    * ``cached`` - the location where the compiled module should be stored
    (compare to ``__cached__``).
    * ``path`` - the list of path entries in which to search for submodules
    or ``None``. (compare to ``__path__``). It should be in sync with
    ``is_package``.

    Why is 'path' the only attribute with a default value? Should probably say
    everything has a default value of None if not set/known.



    Those are also the parameters to ``ModuleSpec.__init__()``, in that
    order.

    I would consider arguing all arguments should be keyword-only past 'name'
    since there is no way most people will remember that order correctly.



    The last three are optional.

    (filename, cached, and path).


    And that definitely makes is_package redundant if that's true.



    When passed the values are taken
    as-is. The ``from_loader()`` method offers calculated values.

    "(see below)."



    Methods:

    * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the
    arguments. The parameters are the same as with ``__init__``, except
    ``package`` is excluded and only ``name`` and ``loader`` are required.

    Why the switch in requirements compared to __init__()?



    * ``module_repr()`` - returns a repr for the module.
    * ``init_module_attrs(module)`` - sets the module's import-related
    attributes.

    Specify what those attributes are and how they are set.



    * ``load(module=None, *, is_reload=False)`` - calls the loader's
    ``exec_module()``, falling back to ``load_module()`` if necessary.
    This method performs the former responsibilities of loaders for
    managing modules before actually loading and for cleaning up. The
    reload case is facilitated by the ``module`` and ``is_reload``
    parameters.

    If a module is provided and there is already a matching key in sys.modules,
    what happens? What if is_reload is True but there is no module provided or
    in sys.modules; KeyError, ValueError, ImportError? Do you follow having
    None in sys.modules and raise ImportError, or do you overwrite (same
    question if a module is explicitly provided)?



    Values Derived by from_loader()
    -------------------------------

    As implied above, ``from_loader()`` makes a best effort at calculating
    any of the values that are not passed in. It duplicates the behavior
    that was formerly provided by several ``importlib.util`` functions as
    well as the ``init_module_attrs()`` method of several of ``importlib``'s
    loaders. Just to be clear, here is a more detailed description of those
    calculations:

    ``is_package`` is derived from ``path``, if passed. Otherwise the
    loader's ``is_package()`` is tried. Finally, it defaults to False.

    It can also be calculated based on whether ``name`` == ``package``: ``True
    if path is not None else name == package``.
    Always need to watch out for [] for path as that is valid and signals the
    module is a package.
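
    The derivation order described above (explicit ``path`` first, then the
    loader's ``is_package()``, then ``False``) might be sketched as follows.
    This is only an illustration: ``derive_is_package`` and ``DemoLoader`` are
    hypothetical names, and swallowing ``ImportError`` from the loader is an
    assumption about what "is tried" means. Note that an empty list for
    ``path`` still counts as a package.

```python
def derive_is_package(name, loader, path=None):
    """Return whether the module looks like a package (illustrative only)."""
    # An explicit path wins, and an empty list still signals a package.
    if path is not None:
        return True
    # Otherwise ask the loader, if it offers is_package().
    is_package = getattr(loader, "is_package", None)
    if is_package is not None:
        try:
            return bool(is_package(name))
        except ImportError:
            # Assumption: a loader that cannot answer falls through.
            pass
    # Final fallback from the draft: treat it as a plain module.
    return False


class DemoLoader:
    """Hypothetical loader that treats names ending in 'pkg' as packages."""

    def is_package(self, fullname):
        return fullname.endswith("pkg")
```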


    This is where defining exactly what details need to be passed in and which
    ones are optional are going to be critical in determining what represents
    ambiguity/unknown details vs. what is flat-out known to be true/false.



    ``filename`` is pulled from the loader's ``get_filename()``, if
    possible.

    ``path`` is set to an empty list if ``is_package`` is true, and the
    directory from ``filename`` is appended to it, if available.

    ``cached`` is derived from ``filename`` if it's available.

    Derived how?



    ``origin`` is set to ``filename``.

    ``package`` is set to ``name`` if the module is a package and

    "... is a package, else to ..."



    to ``name.rpartition('.')[0]`` otherwise. Consequently, a
    top-level module will have ``package`` set to the empty string.
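
    As a quick illustration of that rule, here is a minimal sketch. The
    function name ``derive_package`` is hypothetical and not part of the
    proposal; only the ``rpartition`` expression comes from the draft.

```python
def derive_package(name, is_package):
    """Compute the value the draft assigns to ``package`` (sketch only)."""
    if is_package:
        # A package is its own parent for relative-import purposes.
        return name
    # Everything before the last dot; empty string for a top-level module.
    return name.rpartition(".")[0]
```

    For example, ``derive_package("a.b.c", False)`` gives ``"a.b"``, while a
    top-level module such as ``"spam"`` gives the empty string.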

    Backward Compatibility
    ----------------------

    Since finder ``find_module()``

    ``Finder.find_module()``



    methods would now return a module spec
    instead of loader, specs must act like the loader that would have been
    returned instead. This is relatively simple to solve since the loader
    is available as an attribute of the spec.

    Are you going to define a __getattr__ to delegate to the loader? Or are you
    going to specifically define equivalent methods, e.g. get_filename() is
    obviously solvable by getting the attribute from the spec (as long as
    filename is a required value)?



    However, ``ModuleSpec.is_package`` (an attribute) conflicts with
    ``InspectLoader.is_package()`` (a method). Working around this requires
    a more complicated solution but is not a large obstacle.

    Unfortunately, the ability to proxy does not extend to ``id()``
    comparisons and ``isinstance()`` tests. In the case of the return value
    of ``find_module()``, we accept that break in backward compatibility.

    Mention that ModuleSpec can be added to the proper ABCs in importlib.abc to
    help alleviate this issue.



    Subclassing
    -----------

    .. XXX Allowed but discouraged?

    Why should it matter if they are subclassed?



    Module Objects
    --------------

    Module objects will now have a ``__spec__`` attribute to which the
    module's spec will be bound. None of the other import-related module
    attributes will be changed or deprecated, though some of them could be.
    Any such deprecation can wait until Python 4.

    "... could be; any such ..."



    ``ModuleSpec`` objects will not be kept in sync with the corresponding
    module object's import-related attributes. They may differ, though in
    practice they will be the same.

    "Though they may differ, in practice they will typically be the same."



    Finders
    -------

    Finders will now return ModuleSpec objects when ``find_module()`` is
    called rather than loaders. For backward compatibility, ``ModuleSpec``
    objects proxy the attributes of their ``loader`` attribute.

    Adding another similar method to avoid backward-compatibility issues
    is undesirable if avoidable. The import APIs have suffered enough.

    in light of the fact that find_loader() was just introduced in Python 3.3.



    The approach taken by this PEP should be sufficient.

    The change to ``find_module()`` applies to both ``MetaPathFinder`` and
    ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
    deprecated and, for backward compatibility, implicitly special-cased if
    the method exists on a finder.

    Loaders
    -------

    Loaders will have a new method, ``exec_module(module)``. Its only job
    is to "exec" the module and consequently populate the module's
    namespace. It is not responsible for creating or preparing the module
    object, nor for any cleanup afterward. It has no return value.
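
    To make the division of labor concrete, here is a minimal sketch of a
    loader that only implements ``exec_module()``. The class
    ``SourceStringLoader`` is hypothetical, and the module object is created
    by hand here purely for demonstration; under the proposal the import
    system would create and prepare it.

```python
import types


class SourceStringLoader:
    """Hypothetical loader whose module source is held in a string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Only populate the namespace; no creation, no sys.modules handling.
        code = compile(self.source, "<source-string>", "exec")
        exec(code, module.__dict__)


# Stand-in for the module object the import system would normally supply.
module = types.ModuleType("demo")
result = SourceStringLoader("answer = 6 * 7").exec_module(module)
```

    After the call, ``module.answer`` holds the value computed by the source
    and ``result`` is ``None``, matching "It has no return value."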

    The ``load_module()`` of loaders will still work and be an active part
    of the loader API. It is still useful for cases where the default
    module creation/preparation/cleanup is not appropriate for the loader.

    But will it still be required? Obviously importlib.abc.Loader can grow a
    default load_module() defined around exec_module(), but it should be clear
    if we expect the method to always be manually defined or if it will
    eventually go away.



    A loader must have ``exec_module()`` or ``load_module()`` defined. If
    both exist on the loader, ``exec_module()`` is used and
    ``load_module()`` is ignored.

    Ignored by whom? Should specify that the import system is the one doing the
    ignoring.



    PEP 420 introduced the optional ``module_repr()`` loader method to limit
    the amount of special-casing in the module type's ``__repr__()``. Since
    this method is part of ``ModuleSpec``, it will be deprecated on loaders.
    However, if it exists on a loader it will be used exclusively.

    The loader ``init_module_attrs()`` method, added for Python 3.4, will be
    eliminated in favor of the same method on ``ModuleSpec``.

    "method, added prior to Python 3.4's release, will be removed ..."



    However, ``InspectLoader.is_package()`` will not be deprecated even
    though the same information is found on ``ModuleSpec``. ``ModuleSpec``
    can use it to populate its own ``is_package`` if that information is
    not otherwise available. Still, it will be made optional.

    In addition to executing a module during loading, loaders will still be
    directly responsible for providing APIs concerning module-related data.

    Other Changes
    -------------

    * The various finders and loaders provided by ``importlib`` will be
    updated to comply with this proposal.

    * The spec for the ``__main__`` module will reflect how the interpreter
    was started. For instance, with ``-m`` the spec's name will be that of
    the run module, while ``__main__.__name__`` will still be "__main__".

    * We add ``importlib.find_module()`` to mirror
    ``importlib.find_loader()`` (which becomes deprecated).

    * Deprecations in ``importlib.util``: ``set_package()``,
    ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
    (introduced in 3.4) can be removed.

    "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
    and will outlive 3.4 so specifying it never went public is important.



    * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.

    * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
    the per-module import lock, whereas ``Loader.load_module()`` did not.

    Reference Implementation
    ------------------------

    A reference implementation is available at <TBD>.


    References
    ==========

    [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html


    Copyright
    =========

    This document has been placed in the public domain.

    ..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


    _______________________________________________
    Import-SIG mailing list
    Import-SIG at python.org
    http://mail.python.org/mailman/listinfo/import-sig
  • Eric Snow at Aug 9, 2013 at 6:03 pm

    On Fri, Aug 9, 2013 at 8:40 AM, Brett Cannon wrote:

    On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote:

    Finally, when the import system calls a finder's ``find_module()``, the
    finder makes use of a variety of information about the module that is
    useful outside the context of the method. Currently the options are
    limited for persisting that per-module information past the method call,
    since it only returns the loader. Either store it in a module-to-info
    mapping somewhere like on the finder itself, or store it on the loader.
    The two previous sentences are hard to read; I think you were after
    something like,
    "Popular options for this limitation are to store the information in a
    module-to-info mapping somewhere on the finder itself, or store it on
    the loader."

    Sounds good.



    (The idea grew feet during discussions related to another PEP.[1])
    "(This PEP grew out of discussions related to another PEP [1])"

    Yeah, this was one of the last things I added to the PEP and my brain was
    starting to get a little fuzzy. :)



    * ``is_package`` - whether or not the module is a package.
    I think is_package() is redundant in the face of 'name'/'package' or
    'path' as you can introspect the same information. I honestly have always
    found it a weakness of InspectLoader.is_package() that it didn't return the
    value for __path__.

    I see what you mean, but I also think it's nice to be able to explicitly
    see if a spec is for a package without having to know about underlying
    rules. However, I'll just make it a property instead of something set on
    the spec (and remove it from __init__).



    * ``origin`` - the location from which the module originates.
    Don't quite follow what this is meant to represent? Like the path to the
    zipfile if loaded that way, otherwise it's the file path?

    Yeah, Antoine had the same question. I'll make sure the PEP is clearer.
      Basically filename maps to the module's __file__ and origin is used for
    the module's repr if filename isn't set.



    * ``filename`` - like origin, but limited to a path-based location
    (compare to ``__file__``).
    * ``cached`` - the location where the compiled module should be stored
    (compare to ``__cached__``).
    * ``path`` - the list of path entries in which to search for submodules
    or ``None``. (compare to ``__path__``). It should be in sync with
    ``is_package``.
    Why is 'path' the only attribute with a default value? Should probably say
    everything has a default value of None if not set/known.

    Good point.



    Those are also the parameters to ``ModuleSpec.__init__()``, in that
    order.
    I would consider arguing all arguments should be keyword-only past 'name'
    since there is no way most people will remember that order correctly.

    Makes sense, though I'll make everything but name and loader keyword-only.



    * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the
    arguments. The parameters are the same as with ``__init__``, except
    ``package`` is excluded and only ``name`` and ``loader`` are required.
    Why the switch in requirements compared to __init__()?

    Because package is always calculated and only name and loader are necessary
    to calculate the remaining attributes. Perhaps from_loader() is the wrong
    name (I'm open to alternatives). Perhaps __init__() should take over some
    of the calculating. My intention is to provide one API for
    what-you-pass-in-is-what-you-get (__init__) and another for calculating
    attributes. Of course, one could simply modify the spec after creating it,
    but I like the idea of explicitly opting in to calculated values. I'll add
    this point to the PEP. I'll also probably drop package as a parameter
    of __init__ and make the attribute a property.
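
    Taken together, the adjustments discussed so far (positional ``name`` and
    ``loader``, keyword-only remaining parameters, ``is_package`` and
    ``package`` as properties) could be sketched like this. The class name
    ``SketchSpec`` is hypothetical, and deriving ``is_package`` from ``path``
    is an assumption; the thread only says it will become a property.

```python
class SketchSpec:
    """Illustrative container only; not the proposed ModuleSpec itself."""

    def __init__(self, name, loader, *, origin=None, filename=None,
                 cached=None, path=None):
        # Values are stored as-is; nothing is calculated here.
        self.name = name
        self.loader = loader
        self.origin = origin
        self.filename = filename
        self.cached = cached
        self.path = path

    @property
    def is_package(self):
        # Assumption: any non-None path (even []) marks a package.
        return self.path is not None

    @property
    def package(self):
        # Always calculated, never passed in.
        if self.is_package:
            return self.name
        return self.name.rpartition(".")[0]
```

    With this shape, ``SketchSpec("pkg.mod", loader, filename=...)`` is valid,
    while passing a third positional argument raises ``TypeError``.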


    I've also toyed with the idea of making all the attributes properties (aka
    read-only) since changing a module's spec later on could lead to headache,
    but I'm not convinced that is an easy problem to cause. It's better to not
    get in the way of those who have needs I haven't anticipated (consenting
    adults, etc.). What do you think?



    * ``module_repr()`` - returns a repr for the module.
    * ``init_module_attrs(module)`` - sets the module's import-related
    attributes.
    Specify what those attributes are and how they are set.

    Will do.



    * ``load(module=None, *, is_reload=False)`` - calls the loader's
    ``exec_module()``, falling back to ``load_module()`` if necessary.
    This method performs the former responsibilities of loaders for
    managing modules before actually loading and for cleaning up. The
    reload case is facilitated by the ``module`` and ``is_reload``
    parameters.
    If a module is provided and there is already a matching key in
    sys.modules, what happens?
      What if is_reload is True but there is no module provided or in
    sys.modules; KeyError, ValueError, ImportError? Do you follow having None
    in sys.modules and raise ImportError, or do you overwrite (same question if
    a module is explicitly provided)?

    That's a good point. I thought I had addressed this in the PEP, but
    apparently not. For Loader.load_module(), as you know, the existence of
    the key in sys.modules indicates a reload should happen. The is_reload
    parameter is meant to provide an explicit indicator. The module you pass
    in is simply the one to use. If a module is not passed in and is_reload is
    true, the module in sys.modules will be used. If that module is None or
    not there, ImportError would be raised. If a module is passed in and
    is_reload is false, I was planning on just ignoring that module. However
    raising ValueError in that case would be more useful, indicating that the
    method was called incorrectly.
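
    The rules in the paragraph above can be written down as a small decision
    function. This is a sketch under stated assumptions: ``pick_module`` and
    its ``modules`` parameter (a stand-in for ``sys.modules`` so the example
    stays side-effect free) are hypothetical, and creating a fresh module in
    the non-reload case is an assumption about what ``load()`` would do.

```python
import sys
import types


def pick_module(name, module=None, *, is_reload=False, modules=None):
    """Choose the module object load() would operate on (sketch only)."""
    if modules is None:
        modules = sys.modules
    if not is_reload:
        if module is not None:
            # Passing a module without asking for a reload is a misuse.
            raise ValueError("module passed without is_reload=True")
        # Assumption: a normal load starts from a fresh module object.
        return types.ModuleType(name)
    if module is not None:
        # An explicitly supplied module is simply the one to use.
        return module
    existing = modules.get(name)
    if existing is None:
        # Covers both a missing key and an explicit None entry.
        raise ImportError("no module named {!r} to reload".format(name))
    return existing
```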


    Having just the module parameter and letting it indicate a reload is
    doable, but that would mean losing the option of having load() look up the
    module (and it's less explicit). Another option is to have a separate
    reload() method. Antoine mentioned it and I'd considered it early on. I'm
    considering it again since it makes the API less complicated. Do you have
    a preference between the current proposal (load() does it all) and a
    separate reload() method?


      ``is_package`` is derived from ``path``, if passed. Otherwise the
    loader's ``is_package()`` is tried. Finally, it defaults to False.
    It can also be calculated based on whether ``name`` == ``package``: ``True
    if path is not None else name == package``.

    Good point, though at this point I don't think package will be something
    you set.


    Always need to watch out for [] for path as that is valid and signals the
    module is a package.

    Yeah, I've got that covered in from_loader().


    This is where defining exactly what details need to be passed in and which
    ones are optional are going to be critical in determining what represents
    ambiguity/unknown details vs. what is flat-out known to be true/false.

    Agreed. I'll be sure to spell it out.



    ``cached`` is derived from ``filename`` if it's available.
    Derived how?

    cache_from_source()
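
    For readers unfamiliar with that helper, here is a short sketch using
    ``importlib.util.cache_from_source()`` as found in Python 3.4 and later.
    Which spelling of the function the reply has in mind is an assumption,
    and the source path below is made up for illustration.

```python
import importlib.util
import os.path

# Hypothetical source location; the file does not need to exist.
source_path = os.path.join("pkg", "spam.py")

# Maps a source path to the path of its byte-compiled counterpart.
cached_path = importlib.util.cache_from_source(source_path)
print(cached_path)
```

    The exact result depends on the running interpreter's cache tag, so no
    fixed output is shown here.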



    methods would now return a module spec
    instead of loader, specs must act like the loader that would have been
    returned instead. This is relatively simple to solve since the loader
    is available as an attribute of the spec.
    Are you going to define a __getattr__ to delegate to the loader? Or are
    you going to specifically define equivalent methods, e.g. get_filename() is
    obviously solvable by getting the attribute from the spec (as long as
    filename is a required value)?

    __getattr__(). I don't want to guess what methods a loader might have.
      And if someone wants to call get_filename() on what they think is the
    loader, I think it's better to just call the loader's get_filename(). I'd
    left this stuff out as an implementation detail. Do you think it should be
    in the PEP? I could simply elaborate on "specs must act like the loader".
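
    A minimal sketch of the ``__getattr__()`` delegation described here might
    look as follows. ``ProxyingSpec`` and ``DemoLoader`` are hypothetical
    names, and the guard on ``loader`` is an added precaution against
    recursion rather than something stated in the thread.

```python
class ProxyingSpec:
    """Illustrative spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup on the spec fails.
        if attr == "loader":
            # Avoid infinite recursion if loader was never set.
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class DemoLoader:
    """Hypothetical loader exposing one data API."""

    def get_filename(self, fullname):
        return "/demo/{}.py".format(fullname)


spec = ProxyingSpec("spam", DemoLoader())
```

    Calling ``spec.get_filename("spam")`` reaches the loader's method, but
    ``isinstance(spec, DemoLoader)`` is still false, which is the
    compatibility break the draft accepts.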



    However, ``ModuleSpec.is_package`` (an attribute) conflicts with
    ``InspectLoader.is_package()`` (a method). Working around this requires
    a more complicated solution but is not a large obstacle.

    Unfortunately, the ability to proxy does not extend to ``id()``
    comparisons and ``isinstance()`` tests. In the case of the return value
    of ``find_module()``, we accept that break in backward compatibility.
    Mention that ModuleSpec can be added to the proper ABCs in importlib.abc
    to help alleviate this issue.

    Good point.



    Subclassing
    -----------

    .. XXX Allowed but discouraged?
    Why should it matter if they are subclassed?

    My goal was for ModuleSpec to be the container for module definition state
    with some common attributes as a baseline and a minimal number of methods
    for the import system to use. Loaders would be where you would do extra
    stuff or customize functionality, which is basically what happens now.


    It seemed correct before but now it's feeling like a very artificial and
    unnecessary objective.


    Finders
    -------

    Finders will now return ModuleSpec objects when ``find_module()`` is
    called rather than loaders. For backward compatibility, ``ModuleSpec``
    objects proxy the attributes of their ``loader`` attribute.

    Adding another similar method to avoid backward-compatibility issues
    is undesirable if avoidable. The import APIs have suffered enough.
    in light of the fact that find_loader() was just introduced in Python 3.3.

    Are you suggesting additional wording or making a comment?



    Loaders
    -------

    Loaders will have a new method, ``exec_module(module)``. Its only job
    is to "exec" the module and consequently populate the module's
    namespace. It is not responsible for creating or preparing the module
    object, nor for any cleanup afterward. It has no return value.

    The ``load_module()`` of loaders will still work and be an active part
    of the loader API. It is still useful for cases where the default
    module creation/preparation/cleanup is not appropriate for the loader.
    But will it still be required? Obviously importlib.abc.Loader can grow a
    default load_module() defined around exec_module(), but it should be clear
    if we expect the method to always be manually defined or if it will
    eventually go away.

    load_module() will no longer be required. However, it still serves a real
    purpose: the loader may still need to control more of the loading process.
    By implementing load_module() but not exec_module(), a loader gets that.
    I'll make sure that's clear.



    A loader must have ``exec_module()`` or ``load_module()`` defined. If
    both exist on the loader, ``exec_module()`` is used and
    ``load_module()`` is ignored.
    Ignored by whom? Should specify that the import system is the one doing
    the ignoring.

    Got it.



    * Deprecations in ``importlib.util``: ``set_package()``,

    ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
    (introduced in 3.4) can be removed.
    "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
    and will outlive 3.4 so specifying it never went public is important.

    Good catch. You should be a PEP editor. <wink>


    -eric
  • Brett Cannon at Aug 9, 2013 at 6:20 pm

    On Fri, Aug 9, 2013 at 2:03 PM, Eric Snow wrote:

    On Fri, Aug 9, 2013 at 8:40 AM, Brett Cannon wrote:
    On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote:

    Finally, when the import system calls a finder's ``find_module()``, the
    finder makes use of a variety of information about the module that is
    useful outside the context of the method. Currently the options are
    limited for persisting that per-module information past the method call,
    since it only returns the loader. Either store it in a module-to-info
    mapping somewhere like on the finder itself, or store it on the loader.
    The two previous sentences are hard to read; I think you were after
    something like,
    "Popular options for this limitation are to store the information in a
    module-to-info mapping somewhere on the finder itself, or store it on
    the loader."
    Sounds good.

    (The idea grew feet during discussions related to another PEP.[1])
    "(This PEP grew out of discussions related to another PEP [1])"
    Yeah, this was one of the last things I added to the PEP and my brain was
    starting to get a little fuzzy. :)

    * ``is_package`` - whether or not the module is a package.
    I think is_package() is redundant in the face of 'name'/'package' or
    'path' as you can introspect the same information. I honestly have always
    found it a weakness of InspectLoader.is_package() that it didn't return the
    value for __path__.
    I see what you mean, but I also think it's nice to be able to explicitly
    see if a spec is for a package without having to know about underlying
    rules. However, I'll just make it a property instead of something set on
    the spec (and remove it from __init__).

    * ``origin`` - the location from which the module originates.
    Don't quite follow what this is meant to represent? Like the path to the
    zipfile if loaded that way, otherwise it's the file path?
    Yeah, Antoine had the same question. I'll make sure the PEP is clearer.
    Basically filename maps to the module's __file__ and origin is used for
    the module's repr if filename isn't set.

    * ``filename`` - like origin, but limited to a path-based location
    (compare to ``__file__``).
    * ``cached`` - the location where the compiled module should be stored
    (compare to ``__cached__``).
    * ``path`` - the list of path entries in which to search for submodules
    or ``None``. (compare to ``__path__``). It should be in sync with
    ``is_package``.
    Why is 'path' the only attribute with a default value? Should probably
    say everything has a default value of None if not set/known.
    Good point.

    Those are also the parameters to ``ModuleSpec.__init__()``, in that
    order.
    I would consider arguing all arguments should be keyword-only past 'name'
    since there is no way most people will remember that order correctly.
    Makes sense, though I'll make everything but name and loader keyword-only.

    * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from
    the
    arguments. The parameters are the same as with ``__init__``, except
    ``package`` is excluded and only ``name`` and ``loader`` are required.
    Why the switch in requirements compared to __init__()?
    Because package is always calculated and only name and loader are
    necessary to calculate the remaining attributes. Perhaps from_loader() is
    the wrong name (I'm open to alternatives). Perhaps __init__() should take
    over some of the calculating. My intention is to provide one API for
    what-you-pass-in-is-what-you-get (__init__) and another for calculating
    attributes. Of course, one could simply modify the spec after creating it,
    but I like the idea of explicitly opting in to calculated values. I'll add
    this point to the PEP. I'll also probably drop package as a parameter
    of __init__ and make the attribute a property.

    I've also toyed with the idea of making all the attributes properties (aka
    read-only) since changing a module's spec later on could lead to headache,
    but I'm not convinced that is an easy problem to cause. It's better to not
    get in the way of those who have needs I haven't anticipated (consenting
    adults, etc.). What do you think?

    I agree with your thinking that you shouldn't necessarily block usage just
    because it might be a bad idea; consenting adults and all is right.



    * ``module_repr()`` - returns a repr for the module.
    * ``init_module_attrs(module)`` - sets the module's import-related
    attributes.
    Specify what those attributes are and how they are set.
    Will do.

    * ``load(module=None, *, is_reload=False)`` - calls the loader's
    ``exec_module()``, falling back to ``load_module()`` if necessary.
    This method performs the former responsibilities of loaders for
    managing modules before actually loading and for cleaning up. The
    reload case is facilitated by the ``module`` and ``is_reload``
    parameters.
    If a module is provided and there is already a matching key in
    sys.modules, what happens?
    What if is_reload is True but there is no module provided or in
    sys.modules; KeyError, ValueError, ImportError? Do you follow having None
    in sys.modules and raise ImportError, or do you overwrite (same question if
    a module is explicitly provided)?
    That's a good point. I thought I had addressed this in the PEP, but
    apparently not. For Loader.load_module(), as you know, the existence of
    the key in sys.modules indicates a reload should happen. The is_reload
    parameter is meant to provide an explicit indicator. The module you pass
    in is simply the one to use. If a module is not passed in and is_reload is
    true, the module in sys.modules will be used. If that module is None or
    not there, ImportError would be raised. If a module is passed in and
    is_reload is false, I was planning on just ignoring that module. However
    raising ValueError in that case would be more useful, indicating that the
    method was called incorrectly.
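
The rules in the reply above can be read as a small decision table. The sketch below is only an illustration of that table, not the PEP's implementation: the helper name `pick_module_for_load` and its `modules` parameter (a stand-in for `sys.modules`) are hypothetical, and the `ValueError` branch follows the option Eric says he is leaning toward.

```python
import sys
import types


def pick_module_for_load(name, module=None, *, is_reload=False, modules=None):
    """Return the module object a load() call would operate on.

    Hypothetical helper; the name and signature are not from the PEP.
    """
    if modules is None:
        modules = sys.modules
    if is_reload:
        if module is not None:
            # An explicitly passed module is simply the one to use.
            return module
        # Otherwise fall back to the entry in the module cache.
        found = modules.get(name)
        if found is None:
            # Covers both a missing key and an explicit None entry.
            raise ImportError(
                "cannot reload {!r}: not in module cache".format(name))
        return found
    if module is not None:
        # Assumption: raise instead of silently ignoring the module.
        raise ValueError("module passed without is_reload=True")
    # Fresh load: start from a new, empty module object.
    return types.ModuleType(name)
```

A separate `reload()` method, as discussed just below, would split the `is_reload` branch out of this function instead of selecting it with a flag.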

    Having just the module parameter and letting it indicate a reload is
    doable, but that would mean losing the option of having load() look up the
    module (and it's less explicit). Another option is to have a separate
    reload() method. Antoine mentioned it and I'd considered it early on. I'm
    considering it again since it makes the API less complicated. Do you have
    a preference between the current proposal (load() does it all) and a
    separate reload() method?

    Nope, no preference.



    ``is_package`` is derived from ``path``, if passed. Otherwise the
    loader's ``is_package()`` is tried. Finally, it defaults to False.
    It can also be calculated based on whether ``name`` == ``package``:
    ``True if path is not None else name == package``.
    Good point, though at this point I don't think package will be something
    you set.

    So you would set 'name' and 'path' to decide if something is a package and
    use that to calculate 'package'?



    Always need to watch out for [] for path as that is valid and signals the
    module is a package.
    Yeah, I've got that covered in from_loader().
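
A minimal sketch of the fallback order quoted above (path, then the loader's `is_package()`, then `False`), including the empty-list case Brett mentions. The function name `derive_is_package` is hypothetical, and swallowing `ImportError` from the loader is an assumption rather than something stated in the thread.

```python
def derive_is_package(name, path=None, loader=None):
    """Hypothetical helper mirroring the fallback order quoted above."""
    if path is not None:
        # Compare against None, not truthiness: [] is a valid package path.
        return True
    is_package = getattr(loader, "is_package", None)
    if is_package is not None:
        try:
            return bool(is_package(name))
        except ImportError:
            # Assumption: a loader that cannot answer falls through
            # to the default.
            pass
    return False
```

Testing `if path:` instead of `if path is not None:` would misclassify a package whose path is an empty list as a plain module.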

    This is where defining exactly what details need to be passed in and which
    ones are optional are going to be critical in determining what represents
    ambiguity/unknown details vs. what is flat-out known to be true/false.
    Agreed. I'll be sure to spell it out.

    ``cached`` is derived from ``filename`` if it's available.
    Derived how?
    cache_from_source()

    I figured, but I know too much about this stuff. =) I would spell it out in
    the PEP.
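
For readers less familiar with the helper named above, here is a short illustration of what `cache_from_source()` computes. It uses `importlib.util.cache_from_source()`, which is where the function lives from Python 3.4 onward (in 3.3 it was exposed through the `imp` module); the source path is made up and does not need to exist.

```python
import importlib.util

# Hypothetical source file; the function only manipulates the path string.
source_path = "/tmp/example/spam.py"

# Maps a source path to the path of its bytecode cache file. The exact
# result depends on the interpreter's cache tag, so none is shown here.
cached = importlib.util.cache_from_source(source_path)
print(cached)
```

On implementations where `sys.implementation.cache_tag` is `None`, the call raises `NotImplementedError`, which is one reason a spec's `cached` value may have to stay unset.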



    methods would now return a module spec
    instead of loader, specs must act like the loader that would have been
    returned instead. This is relatively simple to solve since the loader
    is available as an attribute of the spec.
    Are you going to define a __getattr__ to delegate to the loader? Or are
    you going to specifically define equivalent methods, e.g. get_filename() is
    obviously solvable by getting the attribute from the spec (as long as
    filename is a required value)?
    __getattr__(). I don't want to guess what methods a loader might have.
    And if someone wants to call get_filename() on what they think is the
    loader, I think it's better to just call the loader's get_filename(). I'd
    left this stuff out as an implementation detail. Do you think it should be
    in the PEP? I could simply elaborate on "specs must act like the loader".

    I would elaborate that it's going to be __getattr__() since it influences
    the level of backwards-compatibility.
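
The `__getattr__()` delegation discussed above could look roughly like the following. This is a sketch under stated assumptions, not the PEP's `ModuleSpec`: the class names `SpecSketch` and `DemoLoader` and the returned path are invented for the example.

```python
class SpecSketch:
    """Hypothetical stand-in for a spec; not the PEP's ModuleSpec."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so the spec's own
        # attributes (name, loader) always win over the loader's.
        if attr == "loader":
            # Guard against infinite recursion if loader is not set yet.
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class DemoLoader:
    def get_filename(self, fullname):
        return "/hypothetical/path/{}.py".format(fullname)


spec = SpecSketch("spam", DemoLoader())
# The call is forwarded to the loader without the spec knowing the method.
filename = spec.get_filename("spam")
```

Because `__getattr__()` is consulted only after regular attribute lookup fails, any name defined on the spec itself shadows the loader's attribute of the same name, which is the source of the `is_package` conflict mentioned below.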



    However, ``ModuleSpec.is_package`` (an attribute) conflicts with
    ``InspectLoader.is_package()`` (a method). Working around this requires
    a more complicated solution but is not a large obstacle.

    Unfortunately, the ability to proxy does not extend to ``id()``
    comparisons and ``isinstance()`` tests. In the case of the return value
    of ``find_module()``, we accept that break in backward compatibility.
    Mention that ModuleSpec can be added to the proper ABCs in importlib.abc
    to help alleviate this issue.
    Good point.
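
Brett's suggestion relies on the standard `register()` method of abstract base classes. A minimal sketch, assuming a made-up `SpecSketch` class in place of the real spec type:

```python
import importlib.abc


class SpecSketch:
    """Hypothetical spec-like class used only for this illustration."""

    def __init__(self, loader):
        self.loader = loader


# register() declares a "virtual subclass": isinstance() and issubclass()
# checks against the ABC pass without any real inheritance.
importlib.abc.Loader.register(SpecSketch)

spec = SpecSketch(loader=object())
print(isinstance(spec, importlib.abc.Loader))
```

Registration only helps code that tests against the ABCs. Checks against a concrete loader class, and identity comparisons with the loader object, still fail.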

    Subclassing
    -----------

    .. XXX Allowed but discouraged?
    Why should it matter if they are subclassed?
    My goal was for ModuleSpec to be the container for module definition state
    with some common attributes as a baseline and a minimal number of methods
    for the import system to use. Loaders would be where you would do extra
    stuff or customize functionality, which is basically what happens now.

    It seemed correct before but now it's feeling like a very artificial and
    unnecessary objective.

    I totally get where you are coming from and if we were working in a
    language that pushed for read-only attributes I would agree, but we aren't
    so I wouldn't. =) It just becomes more hassle than it's worth to enforce.



    Finders
    -------

    Finders will now return ModuleSpec objects when ``find_module()`` is
    called rather than loaders. For backward compatibility, ``ModuleSpec``
    objects proxy the attributes of their ``loader`` attribute.

    Adding another similar method to avoid backward-compatibility issues
    is undesirable if avoidable. The import APIs have suffered enough.
    in light of the fact that find_loader() was just introduced in Python 3.3.
    Are you suggesting additional wording or making a comment?

    Both? =)



    Loaders
    -------

    Loaders will have a new method, ``exec_module(module)``. Its only job
    is to "exec" the module and consequently populate the module's
    namespace. It is not responsible for creating or preparing the module
    object, nor for any cleanup afterward. It has no return value.

    The ``load_module()`` of loaders will still work and be an active part
    of the loader API. It is still useful for cases where the default
    module creation/preparation/cleanup is not appropriate for the loader.
    But will it still be required? Obviously importlib.abc.Loader can grow a
    default load_module() defined around exec_module(), but it should be clear
    if we expect the method to always be manually defined or if it will
    eventually go away.
    load_module() will no longer be required. However, it still serves a real
    purpose: the loader may still need to control more of the loading process.
    By implementing load_module() but not exec_module(), a loader gets that.
    I'll make sure that's clear.

    A loader must have ``exec_module()`` or ``load_module()`` defined. If
    both exist on the loader, ``exec_module()`` is used and
    ``load_module()`` is ignored.
    Ignored by whom? Should specify that the import system is the one doing
    the ignoring.
    Got it.
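
The division of labor described above, where the import system handles module creation and cleanup and prefers `exec_module()` over `load_module()`, can be sketched as follows. The helper `load_with` and the loader class are hypothetical and far simpler than what importlib would actually do.

```python
import sys
import types


def load_with(loader, name):
    """Hypothetical import-system-side helper; not importlib's code."""
    if hasattr(loader, "exec_module"):
        # Module creation, sys.modules bookkeeping, and cleanup live
        # here; load_module() is ignored when both methods exist.
        module = types.ModuleType(name)
        module.__loader__ = loader
        sys.modules[name] = module
        try:
            loader.exec_module(module)
        except BaseException:
            # Do not leave a half-initialized module behind.
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]
    # Legacy path: the loader manages the whole process itself.
    return loader.load_module(name)


class ExecOnlyLoader:
    def exec_module(self, module):
        # Only job: populate the module's namespace. No return value.
        exec("answer = 42", module.__dict__)
```

A loader that needs control over creation or cleanup would define only `load_module()` and take the legacy branch.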

    * Deprecations in ``importlib.util``: ``set_package()``,
    ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
    (introduced in 3.4) can be removed.
    "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
    and will outlive 3.4 so specifying it never went public is important.
    Good catch. You should be a PEP editor. <wink>

    Ha! Being a PEP editor means I know how to use hg, run a make command, and
    can count.
  • Eric Snow at Aug 9, 2013 at 10:28 pm

    On Fri, Aug 9, 2013 at 12:20 PM, Brett Cannon wrote:


    On Fri, Aug 9, 2013 at 2:03 PM, Eric Snow wrote:
    Having just the module parameter and letting it indicate a reload is
    doable, but that would mean losing the option of having load() look up the
    module (and it's less explicit). Another option is to have a separate
    reload() method. Antoine mentioned it and I'd considered it early on. I'm
    considering it again since it makes the API less complicated. Do you have
    a preference between the current proposal (load() does it all) and a
    separate reload() method?
    Nope, no preference.

    Okay. I'll probably try out a separate reload() and see how things look.



    ``is_package`` is derived from ``path``, if passed. Otherwise the
    loader's ``is_package()`` is tried. Finally, it defaults to False.
    It can also be calculated based on whether ``name`` == ``package``:
    ``True if path is not None else name == package``.
    Good point, though at this point I don't think package will be something
    you set.
    So you would set 'name' and 'path' to decide if something is a package and
    use that to calculate 'package'?

    That and the loader's is_package(), if available.



    cache_from_source()
    I figured, but I know too much about this stuff. =) I would spell it out
    in the PEP.

    Done.



    __getattr__(). I don't want to guess what methods a loader might have.
    And if someone wants to call get_filename() on what they think is the
    loader, I think it's better to just call the loader's get_filename(). I'd
    left this stuff out as an implementation detail. Do you think it should be
    in the PEP? I could simply elaborate on "specs must act like the loader".
    I would elaborate that it's going to be __getattr__() since it influences
    the level of backwards-compatibility.

    Done.



    My goal was for ModuleSpec to be the container for module definition state
    with some common attributes as a baseline and a minimal number of methods
    for the import system to use. Loaders would be where you would do extra
    stuff or customize functionality, which is basically what happens now.

    It seemed correct before but now it's feeling like a very artificial and
    unnecessary objective.
    I totally get where you are coming from and if we were working in a
    language that pushed for read-only attributes I would agree, but we aren't
    so I wouldn't. =) It just becomes more hassle than it's worth to enforce.

    Agreed.



    in light of the fact that find_loader() was just introduced in Python 3.3.
    Are you suggesting additional wording or making a comment?
    Both? =)

    Okay. I clarified that.


    I'll probably be posting an updated PEP shortly.


    -eric
  • Eric Snow at Aug 9, 2013 at 6:15 pm
    Would it be worth deprecating the current signature and attributes of
    FileLoader, NamespaceLoader, etc.? FileLoader.get_filename() uses
    self.path, but otherwise the only use for the attributes is already covered
    by the info in the spec.


    Also, should we have timelines for the deprecations in the PEP? I'm
    inclined to not worry about it, but it *would* be nice to remove at least
    some of the backward compatibility hackery that this PEP will introduce.


    -eric
  • Brett Cannon at Aug 9, 2013 at 6:23 pm

    On Fri, Aug 9, 2013 at 2:15 PM, Eric Snow wrote:


    Would it be worth deprecating the current signature and attributes of
    FileLoader, NamespaceLoader, etc.? FileLoader.get_filename() uses
    self.path, but otherwise the only use for the attributes is already covered
    by the info in the spec.

    Probably, or at least provide a Spec-only signature of the __init__().



    Also, should we have timelines for the deprecations in the PEP? I'm
    inclined to not worry about it, but it *would* be nice to remove at least
    some of the backward compatibility hackery that this PEP will introduce.

    Since the backwards-compatibility hacks don't sound like they will be
    ridiculously complex or getting in the way I say just put in proper
    PendingDeprecationWarnings and assume they will be there until Python 4 (no
    later than 8 years away! =).
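
As a rough illustration of the warning style Brett suggests, a helper kept only for backward compatibility might announce itself like this. The function `set_package_legacy` is hypothetical and merely stands in for whichever helpers end up on the deprecation list.

```python
import warnings


def set_package_legacy(module):
    """Hypothetical stand-in for a helper slated for eventual removal."""
    warnings.warn(
        "set_package_legacy() is pending deprecation; "
        "rely on the module's spec instead",
        PendingDeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return module
```

`PendingDeprecationWarning` is ignored by the default warning filters, so existing users see nothing unless they opt in, for example with `-W always`.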
  • Eric Snow at Aug 9, 2013 at 10:36 pm
    On Fri, Aug 9, 2013 at 12:23 PM, Brett Cannon wrote:


    On Fri, Aug 9, 2013 at 2:15 PM, Eric Snow wrote:

    Would it be worth deprecating the current signature and attributes of
    FileLoader, NamespaceLoader, etc.? FileLoader.get_filename() uses
    self.path, but otherwise the only use for the attributes is already covered
    by the info in the spec.
    Probably, or at least provide a Spec-only signature of the __init__().

    Also, should we have timelines for the deprecations in the PEP? I'm
    inclined to not worry about it, but it *would* be nice to remove at least
    some of the backward compatibility hackery that this PEP will introduce.
    Since the backwards-compatibility hacks don't sound like they will be
    ridiculously complex or getting in the way I say just put in proper
    PendingDeprecationWarnings and assume they will be there until Python 4 (no
    later than 8 years away! =).

    Sounds good.


    -eric
  • Nick Coghlan at Aug 10, 2013 at 11:02 am
    This generally looks good to me. Something I'm wondering:


    Q1. Can we experiment with this as a custom metapath importer?


    A1. Not really, because we want to use it to avoid some of the other
    importlib additions made in 3.4. However, a backport to 3.3 as a
    custom metapath hook may still be interesting.


    Q2. Given this idea as a foundation, could we experiment with ref file
    support as a custom importer?


    A2. Quite possibly, which may make that a good thing to defer to 3.5
    (for stdlib inclusion, anyway).


    I'll wait until the updated version gets through before commenting further :)


    Cheers,
    Nick.

Discussion Overview
group: import-sig
categories: python
posted: Aug 9, '13 at 6:34a
active: Aug 10, '13 at 11:02a
posts: 15
users: 4
website: python.org
