Hello!


Based on previous discussions, particularly the lack of objections to
repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489
to the editors. I'm including a copy below.


The implementation is nearly finished, with several things missing:
- Support for non-Linuxy platforms
- PyImport_Inittab, see below
- Documentation
- porting "xx" and "xxsubtype" modules (but "xxlimited" is done)




The changes from the last update are:
- PyModuleExport -> PyModuleDef (which brings us down to two slot types,
create & exec)
- Removed "singleton modules"
- Stated that PyModule_Create, PyState_FindModule, PyState_AddModule,
PyState_RemoveModule will not work on slots-based modules.
- Added a section on C-level callbacks
- Clarified that if PyModuleExport_* returns NULL, it's as if it wasn't
defined (i.e. falls back to PyInit)
- Added API functions: PyModule_FromDefAndSpec, PyModule_ExecDef
- Added PyModule_AddMethods and PyModule_AddDocstring helpers
- Added PyMODEXPORT_FUNC macro for x-platform declarations of the export
function
- Added summary of API changes
- Added example code for a backwards-compatible module
- Changed modules ported in the initial implementation to "array" and "xx*"
- Changed ImportErrors to SystemErrors in cases where the module is
badly written (and to mirror what PyInit does now)
- Several typo fixes and clarifications




Some further thoughts:


The docstring and methods are initialized in the creation step, rather
than exec. I don't think it's important enough to do this in exec, and
this way the implementation is easier (with respect to NULL slots, and
backwards compatibility with PyInit-based modules where Exec is a no-op).


As I was implementing this, I ran into PyImport_Inittab. I'll need to
add a similar list of PyModuleDefs.




And now for the PEP:


--


PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou@gmail.com>,
         Stefan Behnel <stefan_ml@behnel.de>,
         Nick Coghlan <ncoghlan@gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:




Abstract
========


This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve them
by bringing extension modules closer to the way Python modules behave;
specifically to hook into the ModuleSpec-based loading mechanism
introduced in PEP 451.


This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
authors to only define features they need, and to allow future additions
to extension module declarations.


Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of
classes.


Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.


The proposal also allows extension modules with non-ASCII names.




Motivation
==========


Python modules and extension modules are not being set up in the same way.
For Python modules, the module is created and set up first, then the module
code is being executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.


For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.


In Py3, modules are also not being added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to re-import
it and thus run into an infinite loop when it executes the module init function
again. Without the fully qualified module name (FQMN), it is not trivial to
correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.


Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and, while
it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and give
extension authors adequate time to consider these issues when porting.




The current process
===================


Currently, extension modules export an initialization function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialized module object. The
function receives no arguments, so it has no way of knowing about its
import context.


During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.


Behind the scenes, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe as it
relies on the module init function creating its own module object first,
but this assumption usually holds in practice.




The proposal
============


The current extension module initialization will be deprecated in favor of
a new initialization scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.


Extension modules that support the new initialization scheme must export
the public symbol "PyModuleExport_<modulename>", where "modulename"
is the name of the module. (For modules with non-ASCII names the symbol name
is slightly different, see "Export Hook Name" below.)


If defined, this symbol must resolve to a C function with the following
signature::


     PyModuleDef* (*PyModuleExportFunction)(void)


For cross-platform compatibility, the function should be declared as::


     PyMODEXPORT_FUNC PyModuleExport_<modulename>(void)


The function must return a pointer to a PyModuleDef structure.
This structure must be available for the lifetime of the module created from
it -- usually, it will be declared statically.


Alternatively, this function can return NULL, in which case it is as if the
symbol was not defined -- see the "Legacy Init" section.


The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module storage),
the currently unused m_reload pointer of PyModuleDef will be changed to
hold the slots. The structures are defined as::


     typedef struct {
         int slot;
         void *value;
     } PyModuleDef_Slot;


     typedef struct PyModuleDef {
         PyModuleDef_Base m_base;
         const char* m_name;
         const char* m_doc;
         Py_ssize_t m_size;
         PyMethodDef *m_methods;
         PyModuleDef_Slot *m_slots; /* changed from `inquiry m_reload;` */
         traverseproc m_traverse;
         inquiry m_clear;
         freefunc m_free;
     } PyModuleDef;


The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).


To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.


A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.


The following slots are currently available, and described later:


* Py_mod_create
* Py_mod_exec


Unknown slot IDs will cause the import to fail with SystemError.


When using the new import mechanism, m_size must not be negative.
Also, the *m_name* field of PyModuleDef will not be used during importing;
the module name will be taken from the ModuleSpec.




Module Creation
---------------


Module creation -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.


The Py_mod_create slot
......................


The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::


     PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)


The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.


This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.


There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state.


Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular
to not call user code from it.


Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.


If Py_mod_create is not specified, the import machinery will create a normal
module object by PyModule_New. The name is taken from *spec*.
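
The slot rules stated so far (a ``{0, NULL}`` terminator, SystemError for
unknown IDs, for NULL values, and for a repeated Py_mod_create) can be modelled
in a few lines of Python. This is only a sketch of the scanning logic, not the
C implementation; the numeric slot IDs and the ``scan_slots`` name are
assumptions made for illustration.

```python
# Hypothetical numeric IDs; the draft does not fix the actual values.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def scan_slots(slots):
    """Return (create_func, exec_funcs) from a 0-terminated list of slots.

    Each slot is a (slot_id, value) pair, mirroring PyModuleDef_Slot.
    """
    create_func = None
    exec_funcs = []
    for slot_id, value in slots:
        if slot_id == 0:
            # Terminator, the equivalent of {0, NULL}; later entries are ignored.
            break
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:
            raise SystemError(f"slot {slot_id} has a NULL value")
        if slot_id == PY_MOD_CREATE:
            if create_func is not None:
                raise SystemError("multiple Py_mod_create slots")
            create_func = value
        else:
            # Execution slots may repeat and keep their order.
            exec_funcs.append(value)
    return create_func, exec_funcs
```

When ``create_func`` comes back as ``None``, the caller would fall back to
creating a plain module object, as described above.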




Post-creation steps
...................


If the Py_mod_create function returns an instance of types.ModuleType
(or subclass), or if a Py_mod_create slot is not present, the import machinery
will do the following steps after the module is created:


* If *m_size* is specified, per-module state is allocated and made accessible
  through PyModule_GetState
* The PyModuleDef is associated with the module, making it accessible to
  PyModule_GetDef, and enabling the m_traverse, m_clear and m_free hooks.
* The docstring is set from m_doc.
* The module's functions are initialized from m_methods.


If the Py_mod_create function does not return a module subclass, then m_size
must be 0 or negative, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.




Module Execution
----------------


Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.


Execution slots may be specified multiple times, and are processed in the order
they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.
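
The same ordering can be observed from pure Python, since the create/exec split
mirrors the ``create_module`` and ``exec_module`` methods of PEP 451 loaders.
The sketch below is an analogy, not extension code: the loader, the finder and
the module name ``pep489_demo`` are all hypothetical, and it only records what
the default import machinery has done by the time each step runs.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys

observed = {}


class RecordingLoader(importlib.abc.Loader):
    """Pure-Python stand-in for an extension with create and exec steps."""

    def create_module(self, spec):
        # Creation step: sys.modules has no entry for the module yet.
        observed["in_sys_modules_at_create"] = spec.name in sys.modules
        return None  # ask the import machinery for a default module object

    def exec_module(self, module):
        # Execution step: import attributes are set and the module is registered.
        observed["name_at_exec"] = module.__name__
        observed["in_sys_modules_at_exec"] = sys.modules.get(module.__name__) is module
        module.food = "spam"


class RecordingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "pep489_demo":  # hypothetical module name
            return importlib.machinery.ModuleSpec(fullname, RecordingLoader())
        return None


sys.meta_path.insert(0, RecordingFinder())
demo = importlib.import_module("pep489_demo")
```

After the import, ``observed`` shows that the module was absent from
sys.modules during creation and present, with ``__name__`` set, during
execution.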




The Py_mod_exec slot
....................


The entry in this slot must point to a function with the following
signature::


     int (*PyModuleExecFunction)(PyObject* module)


It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.


If PyModuleExec replaces the module's entry in sys.modules,
the new object will be used and returned by importlib machinery.
(This mirrors the behavior of Python modules. Note that for extensions,
implementing Py_mod_create is usually a better solution for the use cases
this serves.)
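
The Python-module behavior that this mirrors can be demonstrated directly. In
the sketch below (hypothetical loader, finder and module name), the execution
step swaps the entry in sys.modules for a different object, and the import
returns that replacement rather than the module that was originally created.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class ReplacingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        # Replace the freshly created module with an arbitrary object.
        replacement = types.SimpleNamespace(food="spam", original=module)
        sys.modules[module.__name__] = replacement


class ReplacingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "pep489_replaced":  # hypothetical module name
            return importlib.machinery.ModuleSpec(fullname, ReplacingLoader())
        return None


sys.meta_path.insert(0, ReplacingFinder())
result = importlib.import_module("pep489_replaced")
```

Here ``result`` is the ``SimpleNamespace`` instance, while the module object
that was created first is only reachable through ``result.original``.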


The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.




Legacy Init
-----------


If the PyModuleExport function is not defined, or if it returns NULL, the
import machinery will try to initialize the module using the
"PyInit_<modulename>" hook, as described in PEP 3121.


If the PyModuleExport function is defined, the PyInit function will be ignored.
Modules requiring compatibility with previous versions of CPython may implement
the PyInit function in addition to the new hook.


Modules using the legacy init API will be initialized entirely in the
Loader.create_module step; Loader.exec_module will be a no-op.


A module that supports older CPython versions can be coded as::


     #define Py_LIMITED_API
     #include <Python.h>


     static int spam_exec(PyObject *module) {
         PyModule_AddStringConstant(module, "food", "spam");
         return 0;
     }


     static PyModuleDef_Slot spam_slots[] = {
         {Py_mod_exec, spam_exec},
         {0, NULL}
     };


     static PyModuleDef spam_def = {
         PyModuleDef_HEAD_INIT, /* m_base */
         "spam", /* m_name */
         PyDoc_STR("Utilities for cooking spam"), /* m_doc */
         0, /* m_size */
         NULL, /* m_methods */
         spam_slots, /* m_slots */
         NULL, /* m_traverse */
         NULL, /* m_clear */
         NULL, /* m_free */
     };


     PyModuleDef* PyModuleExport_spam(void) {
         return &spam_def;
     }


     PyMODINIT_FUNC
     PyInit_spam(void) {
         PyObject *module;
         module = PyModule_Create(&spam_def);
         if (module == NULL) return NULL;
         if (spam_exec(module) != 0) {
             Py_DECREF(module);
             return NULL;
         }
         return module;
     }


Note that this must be *compiled* on a new CPython version, but the resulting
shared library will be backwards compatible.
(Source-level compatibility is possible with preprocessor directives.)


If a Py_mod_create slot is used, PyInit should call its function instead of
PyModule_Create. Keep in mind that the ModuleSpec object is not available in
the legacy init scheme.




Subinterpreters and Interpreter Reloading
-----------------------------------------


Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.


Behavior of existing module creation functions
----------------------------------------------


The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL m_slots pointer.
The function doesn't have access to the ModuleSpec object necessary for
"new style" module creation.


The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will fail with SystemError.
PyState registration is disabled because multiple module objects may be
created from the same PyModuleDef.
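
A pure-Python analogy may help show why a single registry entry per definition
no longer works. In this sketch, one spec (standing in for one PyModuleDef)
produces two independent module objects, each with its own state; the loader
class, the ``make_module`` helper and the module name are hypothetical.

```python
import importlib.abc
import importlib.machinery
import importlib.util


class CounterLoader(importlib.abc.Loader):
    """Stand-in for one module definition shared by many module objects."""

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        # State lives on the module object, not in a process-wide global.
        module.counter = 0


spec = importlib.machinery.ModuleSpec("pep489_counter", CounterLoader())


def make_module():
    """Create and execute a fresh module object from the shared spec."""
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


first = make_module()
second = make_module()
first.counter += 1  # only affects the first module object
```

Because ``first`` and ``second`` are distinct objects built from the same
definition, a lookup keyed on the definition alone could not say which one
the caller means.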




Module state and C-level callbacks
----------------------------------


Due to the unavailability of PyState_FindModule, any function that needs access
to module-level state (including functions, classes or exceptions defined at
the module level) must receive a reference to the module object (or the
particular object it needs), either directly or indirectly.
This is currently difficult in two situations:


* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration


Fixing these cases is outside of the scope of this PEP, but will be needed for
the new mechanism to be useful to all modules. Proper fixes have been discussed
on the import-sig mailing list [#findmodule-discussion]_.


As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
not good candidates for porting to the new mechanism.




New Functions
-------------


A new function and macro will be added to implement module creation.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::


     PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
     PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                         int module_api_version)


A new function will be added to run "execution slots" on a module::


     PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)


Additionally, two helpers will be added for setting the docstring and
methods on a module::


     int PyModule_SetDocString(PyObject *, const char *)
     int PyModule_AddFunctions(PyObject *, PyMethodDef *)




Export Hook Name
----------------


As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyModuleExport hook name.


For ASCII module names, the import hook is named
PyModuleExport_<modulename>, where <modulename> is the name of the module.


For module names containing non-ASCII characters, the import hook is named
PyModuleExportU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").




In Python::


     def export_hook_name(name):
         try:
             suffix = b'_' + name.encode('ascii')
         except UnicodeEncodeError:
             suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
         return b'PyModuleExport' + suffix


Examples:


=============  ===========================
Module name    Export hook name
=============  ===========================
spam           PyModuleExport_spam
lančmít        PyModuleExportU_lanmt_2sa6t
スパム         PyModuleExportU_zck5b2b
=============  ===========================




Module Reloading
----------------


Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.


Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.


Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.




Multiple modules in one library
-------------------------------


To support multiple Python modules in one shared library, the library can
export additional PyModuleExport* symbols besides the one that corresponds
to the library's filename.


Note that this mechanism can currently only be used to *load* extra modules,
not to *find* them.


Given the filesystem location of a shared library and a module name,
a module may be loaded with::


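     import importlib.machinery
     import importlib.util


     # load_extension is an illustrative name, not part of the proposal.
     def load_extension(name, path):
         # Build a spec for the named module in the shared library at `path`.
         loader = importlib.machinery.ExtensionFileLoader(name, path)
         spec = importlib.util.spec_from_loader(name, loader)
         # Create the module, then run its execution step.
         module = importlib.util.module_from_spec(spec)
         loader.exec_module(module)
         return module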


On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.




Testing and initial implementations
-----------------------------------


For testing, a new built-in module ``_testmoduleexport`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".


The ``_testcapi`` module will be unchanged, and will use the old API
indefinitely (or until the old API is removed).


The ``array`` and ``xx*`` modules will be converted to the new API as
part of the initial implementation.




API Changes and Additions
-------------------------


New functions:


* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions


New macros:


* PyMODEXPORT_FUNC
* Py_mod_create
* Py_mod_exec


New structures:


* PyModuleDef_Slot


PyModuleDef.m_reload changes to PyModuleDef.m_slots.




Possible Future Extensions
==========================


The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.


Some extension modules export many constants; for example _ssl has
a long list of calls in the form::


     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                             PY_SSL_ERROR_ZERO_RETURN);


Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.


String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)


Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.




Implementation
==============


A work-in-progress implementation is available in a GitHub repository
[#gh-repo]_; a patchset is at [#gh-patch]_.




Previous Approaches
===================


Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization is broken into distinct steps.
It also did not support loading an extension into pre-existing module
objects.


Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At this time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.


The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.




References
==========


.. [#lazy_import_concerns]
    https://mail.python.org/pipermail/python-dev/2013-August/128129.html


.. [#pep-0451-attributes]
    https://www.python.org/dev/peps/pep-0451/#attributes


.. [#stefans_protopep]
    https://mail.python.org/pipermail/python-dev/2013-August/128087.html


.. [#nicks-prototype]
    https://mail.python.org/pipermail/python-dev/2013-August/128101.html


.. [#rfc-3492]
    http://tools.ietf.org/html/rfc3492


.. [#gh-repo]
    https://github.com/encukou/cpython/commits/pep489


.. [#gh-patch]
    https://github.com/encukou/cpython/compare/master...encukou:pep489.patch


.. [#findmodule-discussion]
    https://mail.python.org/pipermail/import-sig/2015-April/000959.html




Copyright
=========


This document has been placed in the public domain.


  • Petr Viktorin at May 13, 2015 at 2:31 pm

    On Thu, May 7, 2015 at 5:35 PM, Petr Viktorin wrote:
    > Hello!
    >
    > Based on previous discussions, particularly the lack of objections to
    > repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489
    > to the editors. I'm including a copy below.
    >
    > The implementation is nearly finished, with several things missing:
    > - Support for non-Linuxy platforms
    > - PyImport_Inittab, see below
    > - Documentation
    > - porting "xx" and "xxsubtype" modules (but "xxlimited" is done) [...]
    >
    > Some further thoughts:
    >
    > The docstring and methods are initialized in the creation step, rather
    > than exec. I don't think it's important enough to do this in exec, and
    > this way the implementation is easier (with respect to NULL slots, and
    > backwards compatibility with PyInit-based modules where Exec is a no-op).
    >
    > As I was implementing this, I ran into PyImport_Inittab. I'll need to
    > add a similar list of PyModuleDefs.

    And here I'm somewhat stumped, can someone help me find the right direction?


    There's a tool called freeze, which (among other things) generates the
    PyImport_Inittab, in the file config.c which looks a bit like this:


    extern PyObject* PyInit__thread(void);
    extern PyObject* PyInit__signal(void);
    [... and so on for the other modules ...]


    struct _inittab _PyImport_Inittab[] = {
         {"_thread", PyInit__thread},
         {"_signal", PyInit__signal},
         [... and so on for the other modules ...]
    };


    This file is generated just from a list of module names, without
    loading them. So, it can't easily determine whether a module uses
    PyInit_*, or PyModuleExport_*. But it needs to choose the hook name
    correctly, otherwise the program will fail to link.


    I can see three solutions for this problem.
    I could modify freeze to inspect the modules somehow. I'm wary of
    writing platform-specific code for such an edge case, though, and I'm
    not sure if freeze always has access to the modules it processes,
    rather than just their names.


    I could introduce some way to specify which hook is used out-of-band.
    But that's just passing the problem on to users, not solving it.
    Also, freeze is pretty minimal and I'm vaguely aware of third-party
    tools that do something similar (cx_freeze, py2exe, py2app); I might
    need to coordinate with them.


    Or, I could keep the "PyInit_*" hook name, and allow it to return
    PyModuleDef instead of a module. This is obviously a hack, and would
    force me to get back down to the drawing board, but considering the
    options it seems best to explore this option.
    (PyInit_* and PyModuleExport_* signatures are technically compatible,
    since a PyModuleDef is a PyObject)


    I'd welcome your thoughts.
  • Nick Coghlan at May 13, 2015 at 4:04 pm

    On 14 May 2015 at 00:31, Petr Viktorin wrote:
    > Or, I could keep the "PyInit_*" hook name, and allow it to return
    > PyModuleDef instead of a module. This is obviously a hack, and would
    > force me to get back down to the drawing board, but considering the
    > options it seems best to explore this option.
    > (PyInit_* and PyModuleExport_* signatures are technically compatible,
    > since a PyModuleDef is a PyObject)
    >
    > I'd welcome your thoughts.

    Would it be feasible to go with a model where _PyImport_Inittab
    continues to be based on the legacy extension module initialisation
    system for the time being? That would mean implementing PyInit_* would
    remain required rather than optional for 3.5, but lots of folks are
    going to want to provide it anyway for compatibility with 3.4 and
    earlier.


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Petr Viktorin at May 14, 2015 at 8:10 am

    On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote:
    > On 14 May 2015 at 00:31, Petr Viktorin wrote:
    >> Or, I could keep the "PyInit_*" hook name, and allow it to return
    >> PyModuleDef instead of a module. This is obviously a hack, and would
    >> force me to get back down to the drawing board, but considering the
    >> options it seems best to explore this option.
    >> (PyInit_* and PyModuleExport_* signatures are technically compatible,
    >> since a PyModuleDef is a PyObject)
    >>
    >> I'd welcome your thoughts.
    > Would it be feasible to go with a model where _PyImport_Inittab
    > continues to be based on the legacy extension module initialisation
    > system for the time being? That would mean implementing PyInit_* would
    > remain required rather than optional for 3.5, but lots of folks are
    > going to want to provide it anyway for compatibility with 3.4 and
    > earlier.

    That doesn't really solve the problem, just delays it until we decide
    that PyInit_* is really optional.
    It would mean you couldn't take advantage of the improvements in PEP
    489 (create/exec split and ModuleSpec). You'd just write more
    boilerplate for no benefit (except small stuff like non-ASCII module
    names).


    What might be worse, it would mean that modules would have different
    behavior depending on whether they're frozen or not, which would
    probably result in subtle bugs you'd only find when creating frozen
    binaries.
  • Nick Coghlan at May 14, 2015 at 8:48 am

    On 14 May 2015 at 18:10, Petr Viktorin wrote:
    > On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote:
    >> On 14 May 2015 at 00:31, Petr Viktorin wrote:
    >>> Or, I could keep the "PyInit_*" hook name, and allow it to return
    >>> PyModuleDef instead of a module. This is obviously a hack, and would
    >>> force me to get back down to the drawing board, but considering the
    >>> options it seems best to explore this option.
    >>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
    >>> since a PyModuleDef is a PyObject)
    >>>
    >>> I'd welcome your thoughts.
    >> Would it be feasible to go with a model where _PyImport_Inittab
    >> continues to be based on the legacy extension module initialisation
    >> system for the time being? That would mean implementing PyInit_* would
    >> remain required rather than optional for 3.5, but lots of folks are
    >> going to want to provide it anyway for compatibility with 3.4 and
    >> earlier.
    > That doesn't really solve the problem, just delays it until we decide
    > that PyInit_* is really optional.

    Yeah, I was seeing if you thought a "buy more time to think about it
    further" approach might be viable here. I think you're right that we
    need a better answer up front, though.

    > It would mean you couldn't take advantage of the improvements in PEP
    > 489 (create/exec split and ModuleSpec). You'd just write more
    > boilerplate for no benefit (except small stuff like non-ASCII module
    > names).
    >
    > What might be worse, it would mean that modules would have different
    > behavior depending on whether they're frozen or not, which would
    > probably result in subtle bugs you'd only find when creating frozen
    > binaries.

    Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py,
    I'm thinking your "out-of-band" option may be a reasonable way to go,
    with a corresponding tweak to the semantics of
    https://docs.python.org/3/c-api/import.html#c._inittab to permit
    (initfunc) to be a pointer to a PyInit_* function OR to a
    PyModuleExport_* function.


    We'd then have to determine which was which at runtime when processing
    the inittab internally, by checking whether the result of the call
    was a PyModuleDef.
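The runtime check described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the C implementation: `ModuleDef`, `INITTAB`, `legacy_init`, `export_def` and `load_builtin` are hypothetical stand-ins invented for illustration, and the real inittab is a C array of name/function pairs.

```python
from dataclasses import dataclass, field
from types import ModuleType
from typing import Callable, Dict, Union


@dataclass
class ModuleDef:
    """Hypothetical stand-in for a C-level PyModuleDef."""
    name: str
    attrs: Dict[str, object] = field(default_factory=dict)


InitFunc = Callable[[], Union[ModuleType, ModuleDef]]


def legacy_init() -> ModuleType:
    # PyInit_*-style hook: returns a fully initialised module object.
    mod = ModuleType("legacy")
    mod.answer = 42
    return mod


def export_def() -> ModuleDef:
    # PyModuleExport_*-style hook: returns only a definition.
    return ModuleDef("modern", {"answer": 42})


# Hypothetical inittab: both kinds of hook live in the same table.
INITTAB: Dict[str, InitFunc] = {"legacy": legacy_init, "modern": export_def}


def load_builtin(name: str) -> ModuleType:
    result = INITTAB[name]()
    if isinstance(result, ModuleDef):
        # A definition came back, so the import machinery builds the module.
        mod = ModuleType(result.name)
        vars(mod).update(result.attrs)
        return mod
    # Anything else is assumed to be an already-initialised module.
    return result
```

Calling `load_builtin("legacy")` and `load_builtin("modern")` both yield a module with an `answer` attribute; only the path taken inside the loader differs.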


    For the inittab generation side, freeze would need to be updated to:


    * allow builtin modules to be specifically nominated as "initialised
    modules" or "defined modules"
    * allow the default handling of builtin modules not nominated as one
    or the other to be configured
    * for backwards compatibility, builtin modules would be treated as
    initialised modules by default


    If you had a new module that was export only, you'd get a link time
    error looking for the init function that didn't exist if you didn't
    explicitly flag it as a "defined module". Similarly, if you switched
    the default to be defined modules, you'd get a link time error for a
    legacy module that didn't support the new API.
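As a rough illustration of the freeze-side proposal above, the sketch below picks a hook symbol for each builtin module. It is an assumption-laden model, not the actual makeconfig.py logic: the function name `inittab_entries` and its parameters are hypothetical, and `PyModuleExport_` is the hook prefix from the PEP draft under discussion.

```python
def inittab_entries(modules, defined=(), initialised=(), default="initialised"):
    """Return (module name, hook symbol) pairs for a generated inittab.

    Modules listed in `defined` get a PyModuleExport_* symbol, modules
    listed in `initialised` get a PyInit_* symbol, and everything else
    follows `default`.
    """
    if default not in ("initialised", "defined"):
        raise ValueError("default must be 'initialised' or 'defined'")
    entries = []
    for name in modules:
        if name in defined:
            kind = "defined"
        elif name in initialised:
            kind = "initialised"
        else:
            kind = default
        prefix = "PyModuleExport_" if kind == "defined" else "PyInit_"
        entries.append((name, prefix + name))
    return entries
```

For example, `inittab_entries(["array", "xx"], defined={"array"})` selects `PyModuleExport_array` and `PyInit_xx`. If a module does not actually provide the selected symbol, the generated table would reference an undefined function, which is where the link-time error mentioned above would come from.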


    Does that approach sound plausible to you?


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Petr Viktorin at May 14, 2015 at 12:38 pm

    On Thu, May 14, 2015 at 10:48 AM, Nick Coghlan wrote:
    On 14 May 2015 at 18:10, Petr Viktorin wrote:
    On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote:
    On 14 May 2015 at 00:31, Petr Viktorin wrote:
    Or, I could keep the "PyInit_*" hook name, and allow it to return
    PyModuleDef instead of a module. This is obviously a hack, and would
    force me to go back to the drawing board, but considering the
    options it seems best to explore this option.
    (PyInit_* and PyModuleExport_* signatures are technically compatible,
    since a PyModuleDef is a PyObject)

    I'd welcome your thoughts.
    Would it be feasible to go with a model where _PyImport_inittab
    continues to be based on the legacy extension module initialisation
    system for the time being? That would mean implementing PyInit_* would
    remain required rather than optional for 3.5, but lots of folks are
    going to want to provide it anyway for compatibility with 3.4 and
    earlier.
    That doesn't really solve the problem, just delays it until we decide
    that PyInit_* is really optional.
    Yeah, I was seeing if you thought a "buy more time to think about it
    further" approach might be viable here. I think you're right that we
    need a better answer up front, though.
    It would mean you couldn't take advantage of the improvements in PEP
    489 (create/exec split and ModuleSpec). You'd just write more
    boilerplate for no benefit (except small stuff like non-ASCII module
    names).

    What might be worse, it would mean that modules would have different
    behavior depending on whether they're frozen or not, which would
    probably result in subtle bugs you'd only find when creating frozen
    binaries.
    Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py,
    I'm thinking your "out-of-band" option may be a reasonable way to go,
    with a corresponding tweak to the semantics of
    https://docs.python.org/3/c-api/import.html#c._inittab to permit
    (initfunc) to be a pointer to a PyInit_* function OR to a
    PyModuleExport_* function.

    We'd then have to determine which was which at runtime when processing
    the inittab internally, by checking whether the result of the call
    was a PyModuleDef.

    That would work, but I don't see much of an advantage over allowing
    PyInit_* itself to return either module or PyModuleDef.

    For the inittab generation side, freeze would need to be updated to:

    * allow builtin modules to be specifically nominated as "initialised
    modules" or "defined modules"
    * allow the default handling of builtin modules not nominated as one
    or the other to be configured
    * for backwards compatibility, builtin modules would be treated as
    initialised modules by default

    If you had a new module that was export only, you'd get a link time
    error looking for the init function that didn't exist if you didn't
    explicitly flag it as a "defined module". Similarly, if you switched
    the default to be defined modules, you'd get a link time error for a
    legacy module that didn't support the new API.

    Does that approach sound plausible to you?

    I think the "initialized" vs. "exported" distinction is an
    implementation detail of the module, and this would expose it too
    much.
    According to its README, freeze "[parses] the program (and all its
    modules) and scans the generated byte code for IMPORT instructions". I
    think py2exe does something similar. The end users of such tools would
    need to designate which modules use init vs. export.


    Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
    but it keeps the details isolated between the module and the import
    machinery.
    PyModuleDef is a PyObject, so the PyInit signature matches. Just the
    PyInit name is a bit misleading :(
    I think I have a favorite direction now. (Sorry for asking for
    directions and then wanting to ignore them! The discussion is
    helpful.)




    Somewhat related: any thoughts on the legacy init example code [0]?
    You asked for an example like this; is it what you had in mind? If you
    compile this with a PEP-489 Python with the stable API, the .so can be
    used with older Pythons as well.
    I now think it's a bit silly: it would be enough to use #ifdef: define
    either PyModuleExport or PyInit, depending on the Python version.
    This won't do if you're targeting the stable API, but in that case
    you can't use any of the new PEP 489 features anyway, so it's enough
    to only define PyInit.
    Or is there something I missed?




    [0] https://www.python.org/dev/peps/pep-0489/#legacy-init
  • Nick Coghlan at May 14, 2015 at 4:45 pm

    On 14 May 2015 at 22:38, Petr Viktorin wrote:
    I think the "initialized" vs. "exported" distinction is an
    implementation detail of the module, and this would expose it too
    much.
    According to its README, freeze "[parses] the program (and all its
    modules) and scans the generated byte code for IMPORT instructions". I
    think py2exe does something similar. The end users of such tools would
    need to designate which modules use init vs. export.

    Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
    but it keeps the details isolated between the module and the import
    machinery.
    PyModuleDef is a PyObject, so the PyInit signature matches. Just the
    PyInit name is a bit misleading :(

    Agreed it makes the name of PyInit_* a bit misleading, but also agreed
    that it sounds like a good trick for making this work in a way that
    can handle _PyImport_inittab appropriately.


    In terms of documenting it in a way that lets the hook name still make
    sense, perhaps we can refer to returning PyModuleDef as "multi-phase
    initialisation"? That is:


    - initialise the module definition
    - create the module object
    - execute the module body


    If you *don't* return a module definition, then the import system will
    assume single phase initialisation.
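The "create the module object" and "execute the module body" phases have a direct analogue in the Python-level loader protocol, which may help picture the split. The sketch below uses only `importlib` and is not the C-level API being designed in this thread; `TwoStepLoader`, `events` and the module name `demo_multiphase` are made up for the example.

```python
import importlib.abc
import importlib.util

events = []  # records the order in which the phases run


class TwoStepLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Phase: create the module object.
        events.append("create")
        return None  # None asks importlib to create a default module object

    def exec_module(self, module):
        # Phase: execute the module body in the already-created module.
        events.append("exec")
        module.greeting = "hello from " + module.__name__


spec = importlib.util.spec_from_loader("demo_multiphase", TwoStepLoader())
module = importlib.util.module_from_spec(spec)  # runs create_module
spec.loader.exec_module(module)                 # runs exec_module
```

After this runs, `events` holds the two phases in order and `module.greeting` is set. A single-phase hook corresponds to doing both steps inside one opaque call, so the import system never sees the module between creation and execution.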

    I think I have a favorite direction now. (Sorry for asking for
    directions and then wanting to ignore them! The discussion is
    helpful.)

    I find that seeing a suggestion I don't like often sparks new ideas as
    I attempt to figure out why I don't like it :)

    Somewhat related: any thoughts on the legacy init example code [0]?
    You asked for an example like this; is it what you had in mind? If you
    compile this with a PEP-489 Python with the stable API, the .so can be
    used with older Pythons as well.
    I now think it's a bit silly: it would be enough to use #ifdef: define
    either PyModuleExport or PyInit, depending on the Python version.
    This won't do if you're targeting the stable API, but in that case
    you can't use any of the new PEP 489 features anyway, so it's enough
    to only define PyInit.
    Or is there something I missed?

    I think the idea above makes it mandatory to use "#ifdef" to request
    multi-phase initialisation on 3.5+ and single-phase initialisation on
    earlier versions. An example of the relevant incantations might still
    be useful though.


    Cheers,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
  • Petr Viktorin at May 14, 2015 at 7:04 pm

    On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan wrote:
    On 14 May 2015 at 22:38, Petr Viktorin wrote:
    Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
    but it keeps the details isolated between the module and the import
    machinery.
    PyModuleDef is a PyObject, so the PyInit signature matches. Just the
    PyInit name is a bit misleading :(
    Agreed it makes the name of PyInit_* a bit misleading, but also agreed
    that it sounds like a good trick for making this work in a way that
    can handle _PyImport_inittab appropriately.

    In terms of documenting it in a way that lets the hook name still make
    sense, perhaps we can refer to returning PyModuleDef as "multi-phase
    initialisation"? That is:

    - initialise the module definition
    - create the module object
    - execute the module body

    Yes! That'll even make a much better name for the PEP; currently it
    reads like "yet another change".
    (I hope I can rename a PEP once submitted?)

    Somewhat related: any thoughts on the legacy init example code [0]?
    You asked for an example like this; is it what you had in mind? If you
    compile this with a PEP-489 Python with the stable API, the .so can be
    used with older Pythons as well.
    I now think it's a bit silly: it would be enough to use #ifdef: define
    either PyModuleExport or PyInit, depending on the Python version.
    This won't do if you're targeting the stable API, but in that case
    you can't use any of the new PEP 489 features anyway, so it's enough
    to only define PyInit.
    Or is there something I missed?
    I think the idea above makes it mandatory to use "#ifdef" to request
    multi-phase initialisation on 3.5+ and single-phase initialisation on
    earlier versions. An example of the relevant incantations might still
    be useful though.

    Definitely.
  • Nick Coghlan at May 15, 2015 at 6:10 am

    On 15 May 2015 05:04, "Petr Viktorin" wrote:
    On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan wrote:
    On 14 May 2015 at 22:38, Petr Viktorin wrote:
    Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
    but it keeps the details isolated between the module and the import
    machinery.
    PyModuleDef is a PyObject, so the PyInit signature matches. Just the
    PyInit name is a bit misleading :(
    Agreed it makes the name of PyInit_* a bit misleading, but also agreed
    that it sounds like a good trick for making this work in a way that
    can handle _PyImport_inittab appropriately.

    In terms of documenting it in a way that lets the hook name still make
    sense, perhaps we can refer to returning PyModuleDef as "multi-phase
    initialisation"? That is:

    - initialise the module definition
    - create the module object
    - execute the module body
    Yes! That'll even make a much better name for the PEP; currently it
    reads like "yet another change".
    (I hope I can rename a PEP once submitted?)

    Yes, renaming is fine. That's one of the advantages of using PEP numbers in
    their permanent URLs, rather than their names.


    Cheers,
    Nick.


    P.S. I think this change makes this PEP another fine example of why
    reference implementations are such an important part of the process - they
    usually uncover issues and implications that *nobody* had thought of yet :)

Discussion Overview
group: import-sig
category: python
posted: May 7, '15 at 3:35p
active: May 15, '15 at 6:10a
posts: 9
users: 2
website: python.org

2 users in discussion

Petr Viktorin: 5 posts
Nick Coghlan: 4 posts
