[Python] PEP 324: popen5 - New POSIX process module

Peter Åstrand
Jan 3, 2004 at 1:47 pm
There's a new PEP available:

PEP 324: popen5 - New POSIX process module

A copy is included below. Comments are appreciated.

----

PEP: 324
Title: popen5 - New POSIX process module
Version: $Revision: 1.4 $
Last-Modified: $Date: 2004/01/03 10:32:53 $
Author: Peter Astrand <astrand at lysator.liu.se>
Status: Draft
Type: Standards Track (library)
Created: 19-Nov-2003
Content-Type: text/plain
Python-Version: 2.4


Abstract

This PEP describes a new module for starting and communicating
with processes on POSIX systems.


Motivation

Starting new processes is a common task in any programming
language, and very common in a high-level language like Python.
Good support for this task is needed, because:

- Inappropriate functions for starting processes could mean a
security risk: If the program is started through the shell, and
the arguments contain shell meta characters, the result can be
disastrous. [1]

- It makes Python an even better replacement language for
over-complicated shell scripts.

Currently, Python has a large number of different functions for
process creation. This makes it hard for developers to choose.

The popen5 module provides the following enhancements over
previous functions:

- One "unified" module provides all functionality from previous
functions.

- Cross-process exceptions: Exceptions happening in the child
before the new process has started to execute are re-raised in
the parent. This means that it's easy to handle exec()
failures, for example. With popen2, for example, it's
impossible to detect if the execution failed.

- A hook for executing custom code between fork and exec. This
can be used for, for example, changing uid.

- No implicit call of /bin/sh. This means that there is no need
for escaping dangerous shell meta characters.

- All combinations of file descriptor redirection are possible.
For example, the "python-dialog" [2] needs to spawn a process
and redirect stderr, but not stdout. This is not possible with
current functions, without using temporary files.

- With popen5, it's possible to control if all open file
descriptors should be closed before the new program is
executed.

- Support for connecting several subprocesses (shell "pipe").

- Universal newline support.

- A communicate() method, which makes it easy to send stdin data
and read stdout and stderr data, without risking deadlocks.
Most people are aware of the flow control issues involved with
child process communication, but not all have the patience or
skills to write a fully correct and deadlock-free select loop.
This means that many Python applications contain race
conditions. A communicate() method in the standard library
solves this problem.
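
The "shell pipe" item above can be sketched without any shell. This is
only an illustration: popen5 is not assumed to be installed, so the
sketch uses the standard library's subprocess module, whose Popen class
accepts similar stdin/stdout arguments. The two child programs are
made-up one-liners run through the current Python interpreter.

```python
import subprocess
import sys

# First process: writes three unsorted lines to its stdout.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('b'); print('a'); print('c')"],
    stdout=subprocess.PIPE)

# Second process: reads the first one's stdout as its stdin and sorts it.
consumer = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.writelines(sorted(sys.stdin))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    universal_newlines=True)

# Close the parent's copy of the pipe so the consumer sees end-of-file
# as soon as the producer exits.
producer.stdout.close()

out, _ = consumer.communicate()
producer.wait()
```

No command string is ever parsed, so there is nothing to quote and no
shell return code to interpret.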


Rationale

The following points summarize the design:

- popen5 was based on popen2, which is tried-and-tested.

- The factory functions in popen2 have been removed, because I
consider the class constructor equally easy to work with.

- popen2 contains several factory functions and classes for
different combinations of redirection. popen5, however,
contains one single class. Since popen5 supports 12 different
combinations of redirection, providing a class or function for
each of them would be cumbersome and not very intuitive. Even
with popen2, this is a readability problem. For example, many
people cannot tell the difference between popen2.popen2 and
popen2.popen4 without using the documentation.

- One small utility function is provided: popen5.run(). It aims
to be an enhancement over os.system(), while still very easy to
use:

- It does not use the Standard C function system(), which has
limitations.

- It does not call the shell implicitly.

- No need for quoting; using a variable argument list.

- The return value is easier to work with.

- The "preexec" functionality makes it possible to run arbitrary
code between fork and exec. One might ask why there are special
arguments for setting the environment and current directory, but
not for, for example, setting the uid. The answer is:

- Changing environment and working directory is considered
fairly common.

- Old functions like spawn() have support for an
"env"-argument.

- env and cwd are considered quite cross-platform: They make
sense even on Windows.

- No MS Windows support is available, currently. To be able to
provide more functionality than what is already available from
the popen2 module, help from C modules is required.
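
As a rough illustration of the cwd, env and "preexec" points above,
here is a sketch using the standard library's subprocess module (an
assumption: popen5 itself is not imported, and subprocess takes a
preexec_fn without a separate preexec_args). The helper name
restrict_umask and the variable DEMO_VAR are invented for the example;
changing the umask stands in for changing uid, which would need root.
POSIX only.

```python
import os
import subprocess
import sys
import tempfile

def restrict_umask():
    # Runs in the child, after fork and before exec.
    os.umask(0o077)

# The child reports its working directory, one environment variable,
# and the umask it inherited from the pre-exec hook.
child_code = (
    "import os\n"
    "print(os.getcwd())\n"
    "print(os.environ.get('DEMO_VAR'))\n"
    "print(oct(os.umask(0)))\n"
)

workdir = os.path.realpath(tempfile.mkdtemp())
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    cwd=workdir,
    env=dict(os.environ, DEMO_VAR="hello"),
    preexec_fn=restrict_umask,
    close_fds=True,
    universal_newlines=True)
out, _ = proc.communicate()
cwd_seen, var_seen, umask_seen = out.splitlines()
os.rmdir(workdir)
```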


Specification

This module defines one class called Popen:

class Popen(args, bufsize=0, argv0=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, preexec_args=(), close_fds=0,
cwd=None, env=None, universal_newlines=0)

Arguments are:

- args should be a sequence of program arguments. The program to
execute is normally the first item in the args sequence, but can
be explicitly set by using the argv0 argument. The Popen class
uses os.execvp() to execute the child program.

- bufsize, if given, has the same meaning as the corresponding
argument to the built-in open() function: 0 means unbuffered, 1
means line buffered, any other positive value means use a buffer
of (approximately) that size. A negative bufsize means to use
the system default, which usually means fully buffered. The
default value for bufsize is 0 (unbuffered).

- stdin, stdout and stderr specify the executed programs' standard
input, standard output and standard error file handles,
respectively. Valid values are PIPE, an existing file
descriptor (a positive integer), an existing file object, and
None. PIPE indicates that a new pipe to the child should be
created. With None, no redirection will occur; the child's file
handles will be inherited from the parent. Additionally, stderr
can be STDOUT, which indicates that the stderr data from the
applications should be captured into the same file handle as for
stdout.

- If preexec_fn is set to a callable object, this object will be
called in the child process just before the child is executed,
with arguments preexec_args.

- If close_fds is true, all file descriptors except 0, 1 and 2
will be closed before the child process is executed.

- If cwd is not None, the current directory will be changed to cwd
before the child is executed.

- If env is not None, it defines the environment variables for the
new process.

- If universal_newlines is true, the file objects fromchild and
childerr are opened as text files, but lines may be terminated
by any of '\n', the Unix end-of-line convention, '\r', the
Macintosh convention or '\r\n', the Windows convention. All of
these external representations are seen as '\n' by the Python
program. Note: This feature is only available if Python is
built with universal newline support (the default). Also, the
newlines attributes of the file objects fromchild, tochild and
childerr are not updated by the communicate() method.
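
The redirection arguments can be sketched as follows. This uses the
standard library's subprocess module as a stand-in (an assumption;
popen5 is not imported), which offers PIPE and STDOUT constants with
the meaning described above. The function names are invented for the
example.

```python
import subprocess
import sys

# Child that writes one line to stdout, then one line to stderr.
CHILD = (
    "import sys\n"
    "sys.stdout.write('out\\n')\n"
    "sys.stdout.flush()\n"
    "sys.stderr.write('err\\n')\n"
)

def capture_stderr_only():
    """Redirect stderr to a pipe; stdout is left alone (inherited)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stderr.write('warning\\n')"],
        stderr=subprocess.PIPE,
        universal_newlines=True)
    _, err = proc.communicate()
    return err

def capture_merged():
    """Send stderr into the same pipe as stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True)
    out, _ = proc.communicate()
    return out
```

The first function is the python-dialog case from the Motivation
section: stderr is captured while stdout still goes wherever the
parent's stdout goes.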

The module also defines one shortcut function:

run(*args):
Run command with arguments. Wait for command to complete,
then return the returncode attribute. Example:

retcode = popen5.run("stty", "sane")
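
A shortcut with this shape is easy to sketch. The version below is not
the reference implementation; it simply wraps subprocess.call from the
standard library (an assumption made so the example runs without
popen5) and runs the current Python interpreter instead of stty.

```python
import subprocess
import sys

def run(*args):
    """Run a command given as separate arguments, wait for it to
    finish, and return its return code. No shell is involved."""
    return subprocess.call(list(args))

# The child exits with status 2; run() hands that back directly.
retcode = run(sys.executable, "-c", "import sys; sys.exit(2)")
```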


Exceptions
----------
Exceptions raised in the child process, before the new program has
started to execute, will be re-raised in the parent. Additionally,
the exception object will have one extra attribute called
'child_traceback', which is a string containing traceback
information from the child's point of view.

The most common exception raised is OSError. This occurs, for
example, when trying to execute a non-existent file. Applications
should prepare for OSErrors.

A PopenException will also be raised if Popen is called with
invalid arguments.
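
A sketch of handling a failed exec in the parent, using the standard
library's subprocess module as a stand-in for popen5 (an assumption).
The path below is made up and is expected not to exist; the sketch does
not rely on the child_traceback attribute. POSIX behaviour is assumed.

```python
import errno
import subprocess

def try_launch(argv):
    """Return None if the program started, or the errno if exec failed."""
    try:
        proc = subprocess.Popen(argv)
    except OSError as exc:
        # The failure happened in the child, but is reported here.
        return exc.errno
    proc.wait()
    return None

missing = try_launch(["/nonexistent/program-for-pep324-demo"])
```

A caller that wants fallback commands can test the result against
errno.ENOENT and try the next candidate.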


Security
--------
popen5 will never call /bin/sh implicitly. This means that all
characters, including shell metacharacters, can safely be passed
to child processes.
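
To illustrate, the sketch below passes strings full of shell
metacharacters to a child and checks that they arrive unchanged. It
uses the standard library's subprocess module rather than popen5 (an
assumption), and the helper echo_args is invented for the example.

```python
import subprocess
import sys

def echo_args(*args):
    """Run a child Python that prints each argument on its own line.
    The arguments go straight to exec; no shell ever sees them."""
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    completed = subprocess.run(
        [sys.executable, "-c", child_code] + list(args),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True)
    return completed.stdout.splitlines()

dangerous = "; rm -rf $HOME `id` | cat"
received = echo_args(dangerous, "*.txt")
```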


Popen objects
-------------
Instances of the Popen class have the following methods:

poll()
Returns -1 if child process hasn't completed yet, or its exit
status otherwise. See below for a description of how the exit
status is encoded.

wait()
Waits for and returns the exit status of the child process.
The exit status encodes both the return code of the process
and information about whether it exited using the exit()
system call or died due to a signal. Functions to help
interpret the status code are defined in the os module (the
W*() family of functions).

communicate(input=None)
Interact with process: Send data to stdin. Read data from
stdout and stderr, until end-of-file is reached. Wait for
process to terminate. The optional input argument should be a
string to be sent to the child process, or None, if no data
should be sent to the child.

communicate() returns a tuple (stdout, stderr).

Note: The data read is buffered in memory, so do not use this
method if the data size is large or unlimited.
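
A minimal sketch of communicate(), using the Popen class from the
standard library's subprocess module as a stand-in (an assumption;
popen5 is not imported). The child is a made-up script that upper-cases
its input and reports a count on stderr.

```python
import subprocess
import sys

child_code = (
    "import sys\n"
    "data = sys.stdin.read()\n"
    "sys.stdout.write(data.upper())\n"
    "sys.stderr.write('read %d chars' % len(data))\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True)

# Sends the input, closes stdin, drains both pipes, then waits.
out, err = proc.communicate("hello\n")
```

Writing to stdin and then reading stdout by hand can block once a pipe
buffer fills up; communicate() is meant to avoid that case.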

The following attributes are also available:

fromchild
A file object that provides output from the child process.

tochild
A file object that provides input to the child process.

childerr
A file object that provides error output from the child
process.

pid
The process ID of the child process.

returncode
The child return code. A None value indicates that the
process hasn't terminated yet. A negative value means that
the process was terminated by a signal with number
-returncode.
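
The relationship between the encoded exit status returned by wait()
and the returncode attribute can be sketched with the W*() functions
from the os module. The helper names are invented for the example, and
the mapping is my reading of the description above, not code from the
reference implementation. POSIX only.

```python
import os
import signal
import sys

def returncode_from_status(status):
    """Map an encoded wait() status to the returncode convention."""
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)      # killed by a signal
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)    # normal exit
    raise ValueError("process has not terminated")

def run_and_wait(code):
    """Start a child Python running `code` and return its returncode."""
    pid = os.spawnv(os.P_NOWAIT, sys.executable,
                    [sys.executable, "-c", code])
    _, status = os.waitpid(pid, 0)
    return returncode_from_status(status)

exit_rc = run_and_wait("import sys; sys.exit(3)")
signal_rc = run_and_wait(
    "import os, signal; os.kill(os.getpid(), signal.SIGTERM)")
```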


Open Issues

Perhaps the module should be called something like "process",
instead of "popen5".


Reference Implementation

A reference implementation is available from
http://www.lysator.liu.se/~astrand/popen5/.


References

[1] Secure Programming for Linux and Unix HOWTO, section 8.3.
http://www.dwheeler.com/secure-programs/

[2] Python Dialog
http://pythondialog.sourceforge.net/


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:


--
/Peter Åstrand <astrand at lysator.liu.se>
18 responses

  • Martin v. Loewis at Jan 3, 2004 at 9:50 pm

    Peter Åstrand wrote:
    This PEP describes a new module for starting and communicating
    with processes on POSIX systems.
    I see many aspects in this PEP that improve the existing implementation
    without changing the interface. I would suggest that you try to enhance
    the existing API (making changes to its semantics where reasonable),
    instead of coming up with a completely new module.

    With that approach, existing applications could use these features
    with no or little change.
    - One "unified" module provides all functionality from previous
    functions.
    I doubt this is a good thing. Different applications have different
    needs - having different API for them is reasonable.
    - Cross-process exceptions: Exceptions happening in the child
    before the new process has started to execute are re-raised in
    the parent. This means that it's easy to handle exec()
    failures, for example. With popen2, for example, it's
    impossible to detect if the execution failed.
    This is a bug in popen2, IMO. Fixing it is a good thing, but does not
    require a new module.
    - A hook for executing custom code between fork and exec. This
    can be used for, for example, changing uid.
    Such a hook could be merged as a keyword argument into the existing
    API.
    - No implicit call of /bin/sh. This means that there is no need
    for escaping dangerous shell meta characters.
    This could be an option to the existing API. Make sure it works on
    all systems, though.
    - All combinations of file descriptor redirection are possible.
    For example, the "python-dialog" [2] needs to spawn a process
    and redirect stderr, but not stdout. This is not possible with
    current functions, without using temporary files.
    Sounds like a new function on the popen2 module.
    - With popen5, it's possible to control if all open file
    descriptors should be closed before the new program is
    executed.
    This should be an option on the existing API.
    - Support for connecting several subprocesses (shell "pipe").
    Isn't this available already, as the shell supports pipe creation,
    anyway?
    - Universal newline support.
    This should be merged into the existing code.
    - A communicate() method, which makes it easy to send stdin data
    and read stdout and stderr data, without risking deadlocks.
    Most people are aware of the flow control issues involved with
    child process communication, but not all have the patience or
    skills to write a fully correct and deadlock-free select loop.
    Isn't asyncore supposed to simplify that?

    So in short, I'm -1 on creating a new module, but +1 on merging
    most of these features into the existing code base - they are good
    features.

    Regards,
    Martin
  • Peter Astrand at Jan 3, 2004 at 10:27 pm

    On Sat, 3 Jan 2004, Martin v. Loewis wrote:

    - One "unified" module provides all functionality from previous
    functions.
    I doubt this is a good thing. Different applications have different
    needs - having different API for them is reasonable.
    I don't agree. I have used all of the existing mechanism in lots of apps,
    and it's just a pain. There are lots of functions to choose between, but
    none does what you really want.

    - Cross-process exceptions: Exceptions happening in the child
    before the new process has started to execute are re-raised in
    the parent. This means that it's easy to handle exec()
    failures, for example. With popen2, for example, it's
    impossible to detect if the execution failed.
    This is a bug in popen2, IMO. Fixing it is a good thing, but does not
    require a new module.
    "Fixing popen2" would mean breaking old applications; exceptions will
    happen, which apps are not prepared for.

    - A hook for executing custom code between fork and exec. This
    can be used for, for example, changing uid.
    Such a hook could be merged as a keyword argument into the existing
    API.
    Into which module/method/function? There is no one flexible enough. The
    case for redirecting only stderr is just one example; this is simply not
    possible with the current API.

    - All combinations of file descriptor redirection are possible.
    For example, the "python-dialog" [2] needs to spawn a process
    and redirect stderr, but not stdout. This is not possible with
    current functions, without using temporary files.
    Sounds like a new function on the popen2 module.
    To support all combinations, 12 different functions are necessary. Who
    will remember what popen2.popen11() means?

    - Support for connecting several subprocesses (shell "pipe").
    Isn't this available already, as the shell supports pipe creation,
    anyway?
    With popen5, you can do it *without* using the shell.

    - Universal newline support.
    This should be merged into the existing code.
    There's already a bug about this; bug 788035. This is what one of the
    comments says:

    "But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly
    complicated with all the platform special cases that I'm loath to touch
    it..."

    I haven't checked if this is really true, though.

    - A communicate() method, which makes it easy to send stdin data
    and read stdout and stderr data, without risking deadlocks.
    Most people are aware of the flow control issues involved with
    child process communication, but not all have the patience or
    skills to write a fully correct and deadlock-free select loop.
    Isn't asyncore supposed to simplify that?
    Probably not. The description says:

    "This module provides the basic infrastructure for writing asynchronous
    socket service clients and servers."

    It's not obvious to me how this module could be used as a "shell backquote"
    replacement (which is what communicate() is about). It's probably possible
    though; I haven't tried. Even if this is possible I guess we need some
    kind of "entry" or "wrapper" method in the popen module to simplify things
    for the user. My guess is that a communicate() method that uses asyncore
    would be as long/complicated as the current implementation. The current
    implementation is only 68 lines, including comments.

    So in short, I'm -1 on creating a new module, but +1 on merging
    most of these features into the existing code base - they are good
    features.
    Well, I don't see how this could be done easily: The current API is not
    flexible enough, and some things (like cross-process exceptions) break
    compatibility.

    Writing a good popen module is hard. Providing cross-platform support (for
    Windows, for example) is even harder. Trying to retrofit a good popen
    implementation into an old API without breaking compatibility seems
    impossible to me. I'm not prepared to try.


    --
    /Peter Åstrand <astrand at lysator.liu.se>
  • Martin v. Loewis at Jan 4, 2004 at 12:14 am

    Peter Astrand wrote:
    I don't agree. I have used all of the existing mechanism in lots of apps,
    and it's just a pain. There are lots of functions to choose between, but
    none does what you really want.
    So enhance them, instead of replacing them.
    - Cross-process exceptions: Exceptions happening in the child
    before the new process has started to execute are re-raised in
    the parent.
    This is a bug in popen2, IMO. Fixing it is a good thing, but does not
    require a new module.
    "Fixing popen2" would mean breaking old applications; exceptions will
    happen, which apps are not prepared for.
    I find that an acceptable incompatibility, and it will likely break
    no existing application. Applications usually expect that the program
    they start actually exists; it is a good thing that they now can
    detect the error of a missing/non-executable application.

    Errors should never pass silently.
    - A hook for executing custom code between fork and exec. This
    can be used for, for example, changing uid.
    Such a hook could be merged as a keyword argument into the existing
    API.

    Into which module/method/function?
    For example, popen2.popen2, as argument preexec_fn.

    There is no one flexible enough. The
    case for redirecting only stderr is just one example; this is simply not
    possible with the current API.
    Can you elaborate? What is the specific problem, what does your preexec
    function look like, and how is it used with popen5? I can then show you
    how it could be used with popen2, if that was enhanced appropriately.
    - All combinations of file descriptor redirection are possible.
    For example, the "python-dialog" [2] needs to spawn a process
    and redirect stderr, but not stdout. This is not possible with
    current functions, without using temporary files.
    Sounds like a new function on the popen2 module.

    To support all combinations, 12 different functions are necessary. Who
    will remember what popen2.popen11() means?
    Why is that? Just add a single function, with arguments
    stdin/stdout/stderr. No need for 12 functions. Then explain the existing
    functions in terms of your new function (if possible).
    - Support for connecting several subprocesses (shell "pipe").
    Isn't this available already, as the shell supports pipe creation,
    anyway?

    With popen5, you can do it *without* using the shell.
    Why is that a good thing?
    There's already a bug about this; bug 788035. This is what one of the
    comments says:

    "But this whole popen{,2,3,4} section of posixmodule.c is so fiendishly
    complicated with all the platform special cases that I'm loath to touch
    it..."

    I haven't checked if this is really true, though.
    You really should work with the existing code base. Ignoring it is a
    guarantee that your PEP will be rejected. (Studying it, and then
    providing educated comments about it, might get you through)

    I think this is the core problem of your approach: You throw away all
    past history, and imply that you can do better than all prior
    contributors could. Honestly, this is doubtful. The current code
    is so complicated because implementing pipes is complicated.
    Well, I don't see how this could be done easily: The current API is not
    flexible enough, and some things (like cross-process exceptions) break
    compatibility.
    I never said it would be easy. However, introducing a new popen module
    is a major change, and there must be strong indications that the current
    API cannot be enhanced before throwing it away.

    There should be one-- and preferably only one --obvious way to do it.

    As for breaking compatibility: This is what the PEP should study in
    detail. It is sometimes acceptable to break compatibility, if
    applications are likely to be improved by the change. *Any* change
    can, in principle, break compatibility. Suppose I had an application
    that did

    from popen5 import open

    This application might break if your proposed change is implemented,
    as a new module is added. So you can't claim "I will break no programs".
    Writing a good popen module is hard. Providing cross-platform support (for
    Windows, for example) is even harder. Trying to retrofit a good popen
    implementation into an old API without breaking compatibility seems
    impossible to me. I'm not prepared to try.
    So I continue to be -1 with your PEP.

    Regards,
    Martin
  • Peter Astrand at Jan 4, 2004 at 11:02 am

    On Sun, 4 Jan 2004, Martin v. Loewis wrote:

    "Fixing popen2" would mean breaking old applications; exceptions will
    happen, which apps are not prepared for.
    I find that an acceptable incompatibility, and it will likely break
    no existing application.
    Not true. There are lots of apps out there that use fallback commands:
    they try to execute one, and if it doesn't exist, try another one. (One
    example is jakarta-gump, see
    http://cvs.apache.org/viewcvs.cgi/jakarta-gump/python/gump/utils/launcher.py?rev=1.6&view=auto)
    With the current API, you do this by checking if the return code is 127.
    No-one is prepared for an exception.

    The return code stuff is also very problematic, and is another reason to
    make a new module and not "enhance" the old ones. With the current API
    (which calls /bin/sh most of the time), some returncodes are overloaded by
    the shell. The shell uses these return codes:

    126: the command was found but is not executable
    127: the command was not found
    128+n: the command was terminated by signal n

    This means that it is currently impossible to use these return codes for
    programs launched via the current API, since you cannot tell the
    difference between a 127 generated by a successful call to your command,
    and a 127 generated by the shell.

    I don't see how this can be solved by "enhancing" the current functions,
    without breaking old applications.
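
    The ambiguity can be shown in a few lines. The sketch below uses the
    standard library's subprocess module with shell=True to stand in for
    the shell-based functions (an assumption), and the command name
    no-such-command-pep324-demo is made up and expected not to exist. A
    POSIX /bin/sh is assumed.

```python
import shlex
import subprocess
import sys

# Case 1: the shell cannot find the command and reports 127 itself.
missing = subprocess.call(
    "no-such-command-pep324-demo",
    shell=True,
    stderr=subprocess.DEVNULL)

# Case 2: the command is found, runs, and deliberately exits with 127.
exits_127 = "%s -c %s" % (
    shlex.quote(sys.executable),
    shlex.quote("import sys; sys.exit(127)"))
genuine = subprocess.call(exits_127, shell=True)
```

    Both calls hand back the same number, so the caller cannot tell a
    missing program from one that chose to exit with 127.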

    Applications usually expect that the program
    they start actually exists; it is a good thing that they now can
    detect the error of a missing/non-executable application.
    There are lots of other errors as well, not just missing/non-executable
    programs.

    There is no one flexible enough. The
    case for redirecting only stderr is just one example; this is simply not
    possible with the current API.
    Can you elaborate? What is the specific problem, what does your preexec
    function look like, and how is it used with popen5? I can then show you
    how it could be used with popen2, if that was enhanced appropriately.
    Yes, the preexec function feature could possibly be added to popen2. This is
    not the problem.

    Sounds like a new function on the popen2 module.

    To support all combinations, 12 different functions are necessary. Who
    will remember what popen2.popen11() means?
    Why is that? Just add a single function, with arguments
    stdin/stdout/stderr. No need for 12 functions. Then explain the existing
    functions in terms of your new function (if possible).
    Just like popen5.Popen? Yes, that could be done. We would still have the
    problem with returncode incompatibilities, exceptions and such.

    With popen5, you can do it *without* using the shell.
    Why is that a good thing?
    1) Performance. No need for parsing .bashrc on every call...

    2) Security. You can do pipes without having to deal with all the quoting
    issues.

    3) Getting rid of the shell's overloading of return codes

    It's also much more elegant, IMHO.

    I think this is the core problem of your approach: You throw away all
    past history, and imply that you can do better than all prior
    contributors could. Honestly, this is doubtful.
    In a discussion like this, I think it's important to separate the new API
    from the new implementation:

    1) The new API. If you look at the popen5 implementation and PEP, it's
    obvious that I haven't thrown away the history. I have tried to take all
    the good parts from the various existing functions. The documentation
    contains 140 lines describing how to migrate from the earlier functions.

    Much of the current API has really never been designed. The API for the
    functions os.popen, os.system, os.popen2 comes from the old POSIX
    functions. These were never intended to be flexible, cross-platform or
    anything like that. So, it's not hard to do better than these.


    2) The new implementation.

    When I wrote popen5, I took some good ideas out of popen2. The rest of the
    code is written from scratch.

    The current code
    is so complicated because implementing pipes is complicated.
    Let's keep the POSIX stuff separated from the Windows stuff. popen2.py
    does not depend on posixmodule.c on POSIX systems, and popen2.py is not
    complicated at all.

    The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated
    because:

    1) It's written in low-level C

    2) It contains lots of old DOS stuff

    3) It tries to launch the program through the shell (which is always a
    pain).

    Well, I don't see how this could be done easily: The current API is not
    flexible enough, and some things (like cross-process exceptions) break
    compatibility.
    I never said it would be easy. However, introducing a new popen module
    is a major change, and there must be strong indications that the current
    API cannot be enhanced before throwing it away.
    I wouldn't say that introducing a new module is a "major change".

    Of course, we don't want to end up writing "popen6" in two years, because
    we've realized that "popen5" is too limited. That's why we should try to
    get it exactly right this time. I think it would be more useful if we put
    our energy into trying to accomplish that.

    As for breaking compatibility: This is what the PEP should study in
    detail. It is sometimes acceptable to break compatibility, if
    applications are likely to be improved by the change. *Any* change
    can, in principle, break compatibility. Suppose I had an application
    that did

    from popen5 import open

    This application might break if your proposed change is implemented,
    as a new module is added. So you can't claim "I will break no programs".
    Isn't this quite a silly example?


    --
    /Peter Åstrand <astrand at lysator.liu.se>
  • Martin v. Loewis at Jan 4, 2004 at 11:48 am

    Peter Astrand wrote:
    The popen* stuff for Windows (and OS2 etc) in posixmodule.c is complicated
    because:

    1) It's written in low-level C

    2) It contains lots of old DOS stuff
    Such code could be eliminated now; DOS is not supported anymore.
    Would you like to contribute a patch in that direction?

    Regards,
    Martin
  • Barry Scott at Jan 4, 2004 at 3:16 pm

    At 04-01-2004 00:14, Martin v. Loewis wrote:
    With popen5, you can do it *without* using the shell.
    Why is that a good thing?
    Because using the shell on windows is causing a DOS box window to appear for
    every popen2/3/4 use in a windowed python program on Windows.

    Barry
  • Donn Cave at Jan 4, 2004 at 4:39 am
    Quoth "Martin v. Loewis" <martin at v.loewis.de>:
    ...
    Errors should never pass silently.
    Which reminds me, this is a much stickier problem if you take
    into account errors that happen _after_ the fork.

    It can be highly useful to account for them, though. In my
    own process handling functions, I give myself the option, and
    the function I nearly always use is the one that raises an
    exception if the command exits with an error; the value of
    the error comes from unit 2 (a.k.a. stderr, though of course
    not actually stderr since it would be foolish to use a file
    object here instead of a file descriptor.)

    I haven't looked at the proposed module, would suggest that it
    be placed on a web page so people can do so conveniently. In
    my experience, attempts like this (including my own) tend to
    founder on the inherent complexity of the problem, failing to
    provide an interface that is simple and obvious enough to be
    generally useful.

    Donn Cave, donn at drizzle.com
  • Jess Austin at Jan 6, 2004 at 3:42 pm
    Martin, you seem to be quite fond of popen2.

    "Martin v. Loewis" <martin at v.loewis.de> wrote in message news:<mailman.60.1073175311.12720.python-list at python.org>...
    So enhance them, instead of replacing them.
    and...
    I find that an acceptable incompatibility, and it will likely break
    no existing application. Applications usually expect that the program
    and...
    Why is that? Just add a single function, with arguments
    stdin/stdout/stderr. No need for 12 functions. Then explain the existing
    functions in terms of your new function (if possible).
    and...
    You really should work with the existing code base. Ignoring it is a
    guarantee that your PEP will be rejected. (Studying it, and then
    providing educated comments about it, might get you through)
    But you're not quite so fond of the people who are using popen2.
    Dumping a bunch of interface changes on them at this late date will
    not make users happy. They worked to get their code functioning with
    what you admit is not a perfect module. Now you suggest breaking all
    that installed code the next time a new version comes out?

    As someone who has used popen2 and who looks forward to using this new
    "process" module, I value user happiness. This seems to be one of the
    many situations in which happiness is highly correlated with choice.
    If I have some old code that works as well as I need it to work with
    popen2, I choose to leave it the hell alone, and I appreciate it not
    getting broken behind my back. If I'm writing something new, and I
    find the interface offered by "process" more logical, I will choose to
    use that. Further, I will appreciate not having to reacquaint myself
    with the numerological arcana of popen2, if only to know what to
    ignore while using the new function. If I have code I could never get
    to work properly with popen2 and suddenly "process" becomes available,
    I will be only too happy to modify it for use with the new module. If
    I'm an utter newbie who is thinking about processes for the first time
    in my life, it doesn't matter where the gurus point me so long as they
    point me to something that makes sense.

    I admit that I haven't looked at this proposed package at all yet;
    I've only read the PEP. It's quite possible that it doesn't fulfill
    all my hopes and dreams, that it won't contribute to my choice and my
    happiness. I note with some trepidation that plans for the Windows
    platform aren't fully fleshed out. But you haven't offered criticism
    of the implementation, your objection is that "it's not called
    popen2". I suspect that few people will give that objection much
    weight, whether they have used popen2 or not.

    "Martin v. Loewis" <martin at v.loewis.de> continues...
    I never said it would be easy. However, introducing a new popen module
    is a major change, and there must be strong indications that the current
    API cannot be enhanced before throwing it away.

    There should be one-- and preferably only one --obvious way to do it.

    As for breaking compatibility: This is what the PEP should study in
    detail. It is sometimes acceptable to break compatibility, if
    I think that the PEP contains the "strong indications" you require.
    While it's nice to have "one way" that we can tell newbies, it isn't
    mandatory that there be only one way we've ever done it. That would
    inhibit progress. I support functioning code, and progress in the
    directions that users choose. +1.

    later,
    Jess
  • Michael Chermside at Jan 5, 2004 at 2:51 pm

    Peter writes:
    There's a new PEP available:

    PEP 324: popen5 - New POSIX process module

    A copy is included below. Comments are appreciated.
    Just one really minor suggestion: in the class Popen, the default values
    for the arguments close_fds and universal_newlines should be False not 0.

    And I vote +1 on changing the name from "popen5" to something else.
    "process" (as suggested in the PEP) isn't too bad, but I'd certainly be
    open to another name. If this is to replace the entire rest of the popenX
    family for most normal purposes, a separate, more comprehensible name
    will help newbies find it.

    -- Michael Chermside
  • Skip Montanaro at Jan 5, 2004 at 2:59 pm
    Michael> And I vote +1 on changing the name from "popen5" to something
    Michael> else. "process" (as suggested in the PEP) isn't too bad, but
    Michael> I'd certainly be open to another name. If this is to replace
    Michael> the entire rest of the popenX family for most normal purposes,
    Michael> a separate, more comprehensible name will help newbies find it.

    +2. It would be one thing to have a module named "popen". A casual user
    can easily pop open the docs and see that it stands for "process open".
    Determining the difference between popen and popenN (for N in range(1,6)) is
    a bit more challenging. I think it's time to abstract the names (at least)
    from their C heritage.

    Skip
  • Paul Moore at Jan 5, 2004 at 10:53 pm
    I've read the PEP, and I'd like to help with an implementation for
    Windows, if no-one else is doing it. Initially, I'll probably use
    win32all (or maybe ctypes) but the intention is that longer term the
    necessary functions be migrated into a supporting C extension.

    I do have some comments, which may be of interest.

    The whole interface feels very Unix-specific. In particular,

    1. The preexec_* arguments have no meaning on Windows, where the
    low-level functionality is CreateProcess (spawn) rather than
    fork/exec. This isn't too crucial, as use of the argument can
    either be ignored, or probably better, treated as an error on
    Windows.

    2. The method name poll() doesn't seem appropriate in a Windows
    context (I don't know how it fits with Unix). Better might be
    is_complete() returning True/False, and use returncode to get the
    status separately.

    3. The whole thing about the return code encoding both the exit status
    and the signal number is very Unix-specific. Why not split them -
    status and signal attributes, and maybe have is_complete() return 3
    possible values - +1 = completed OK, 0 = still running, -1 = died
    due to a signal.

    4. wait() should be defined as "wait for the subprocess to finish, and
    then return is_complete()". That's basically what you have now,
    it's just a bit clearer if you're explicit about the intent.

    5. I don't like the names fromchild, tochild, and childerr. What's
    wrong with stdin, stdout, and stderr? OK, stdin is open for write,
    and the other two for read, but that's fairly obvious as soon as
    you think about it a bit.
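
    Points 2 to 4 above could be sketched roughly as follows. This is only
    an illustration of the idea, not part of popen5: the helper names
    split_wait_status() and is_complete() and the "None means still
    running" convention are assumptions made for the example. It relies on
    the os.WIF*/os.W* status macros, which exist only on Unix-like systems.

```python
import os


def split_wait_status(raw_status):
    """Split a raw POSIX wait status into (exit_status, signal_number).

    At most one of the two is an int; the other is None. Both are None if
    the child neither exited nor was killed (for example, it was stopped).
    """
    if os.WIFSIGNALED(raw_status):
        return None, os.WTERMSIG(raw_status)
    if os.WIFEXITED(raw_status):
        return os.WEXITSTATUS(raw_status), None
    return None, None


def is_complete(raw_status):
    """Return +1 if the child exited, -1 if a signal killed it, 0 otherwise.

    raw_status is None while the child is still running (a convention
    assumed for this sketch only).
    """
    if raw_status is None:
        return 0
    exit_status, signal_number = split_wait_status(raw_status)
    if signal_number is not None:
        return -1
    if exit_status is not None:
        return 1
    return 0
```

    With this split, callers would read the exit status and the signal
    number as two separate values instead of decoding one combined integer.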

    The biggest issue, though, is that args as a sequence of program
    arguments is very Unix-specific. This is a big incompatibility between
    Unix and Windows - in Windows, the low-level operation (CreateProcess)
    takes a *command line* which is passed unchanged to the child process.
    The child then parses that command line for itself (often, but not
    always, in the C runtime). In Unix, exec() takes an *argument list*.
    If you have a command line, you have to split it yourself, a task
    which is usually delegated to /bin/sh in the absence of anything else.
    This is hard to handle portably.

    I'd suggest that the Windows implementation allow two possibilities
    for the "args" argument. If args is a string, it is passed intact to
    CreateProcess. That is the recommended, and safe, approach. The shell
    is still not invoked, so there's no security issue. If a (non-string)
    sequence is passed, the Windows implementation should (for Unix
    compatibility) build a command line, using its best attempt to quote
    things appropriately (this can never be safe, as different programs
    can implement different command line parsing). This is not safe (wrong
    answers are the main problem, not security holes), but will improve
    portability.

    I agree with Guido's comments on python-dev - this module (popen5 is a
    *horrible* name - I'd prefer "process", but am happy if someone comes
    up with a better suggestion) should aim to be the clear "best of
    breed" process control module for all platforms.

    Some other points may well come up as I try to implement a Windows
    version.

    I hope this is of some use,
    Paul.
    --
    This signature intentionally left blank
  • Peter Astrand at Jan 6, 2004 at 4:47 pm

    I've read the PEP, and I'd like to help with an implementation for
    Windows, if no-one else is doing it.
    Help is always appreciated. I haven't got very far yet. It might be useful
    to look at http://starship.python.net/~tmick/#process as well (but we
    should really try to get a smaller code base than this).

    Initially, I'll probably use
    win32all (or maybe ctypes) but the intention is that longer term the
    necessary functions be migrated into a supporting C extension.
    I like this approach. Ideally, popen5 should work on any Python 2.3+
    system with win32all, or using the built-in support, when we have added
    that. So, when we are writing supporting C code, we should probably keep
    the interface from win32all.

    1. The preexec_* arguments have no meaning on Windows, where the
    low-level functionality is CreateProcess (spawn) rather than
    fork/exec. This isn't too crucial, as use of the argument can
    either be ignored, or probably better, treated as an error on
    Windows.
    Yes, I've thought of this. I also think that these arguments (preexec_fn
    and preexec_arg) should be treated as errors.

    2. The method name poll() doesn't seem appropriate in a Windows
    context (I don't know how it fits with Unix). Better might be
    is_complete() returning True/False, and use returncode to get the
    status separately.
    Yes, perhaps. This needs some thought.

    3. The whole thing about the return code encoding both the exit status
    and the signal number is very Unix-specific. Why not split them -
    status and signal attributes, and maybe have is_complete() return 3
    possible values - +1 = completed OK, 0 = still running, -1 = died
    due to a signal.
    Perhaps. poll() and wait() are mainly a heritage from popen2. I have no
    objections to changing this, if we can come up with a good and clean
    solution. Your idea looks interesting, although I'm not convinced about
    the name "is_complete()".

    Currently, popen5 provides both the unaltered "exit status" (via
    poll/wait) and the "returncode" (via the .returncode attribute). This is
    good, because it makes it easy to migrate from the earlier API.

    5. I don't like the names fromchild, tochild, and childerr. What's
    wrong with stdin, stdout, and stderr? OK, stdin is open for write,
    and the other two for read, but that's fairly obvious as soon as
    you think about it a bit.
    Here's how I see it:

    (fromchild, tochild, childerr):
    + Same as with popen2. Easy to migrate etc.

    + No risk of confusion when connecting the parent's stdout to the child's
    stdin, for example.

    (stdin, stdout, stderr):
    + Nice symmetry with the arguments to the Popen class.

    + Not as ugly as (fromchild, tochild, childerr)


    I need input on this one. I'll change this to whatever people like best.

    The biggest issue, though, is that args as a sequence of program
    arguments is very Unix-specific. This is a big incompatibility between
    Unix and Windows - in Windows, the low-level operation (CreateProcess)
    takes a *command line* which is passed unchanged to the child process.
    The child then parses that command line for itself (often, but not
    always, in the C runtime). In Unix, exec() takes an *argument list*.
    If you have a command line, you have to split it yourself, a task
    which is usually delegated to /bin/sh in the absence of anything else.
    This is hard to handle portably.
    Oh, I've never thought of this.

    I'd suggest that the Windows implementation allow two possibilities
    for the "args" argument. If args is a string, it is passed intact to
    CreateProcess. That is the recommended, and safe, approach. The shell
    is still not invoked, so there's no security issue. If a (non-string)
    sequence is passed, the Windows implementation should (for Unix
    compatibility) build a command line, using its best attempt to quote
    things appropriately (this can never be safe, as different programs
    can implement different command line parsing). This is not safe (wrong
    answers are the main problem, not security holes), but will improve
    portability.
    I've found some documentation on this.
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/progs_12.asp
    is interesting. It describes how the MS C runtime translates the
    commandline to an argv array. Let's call this "Algorithm A".

    When passed a sequence, popen5 should translate this sequence into a
    string by using algorithm A backwards. This should definitely be
    implemented in popen5.
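
    A minimal sketch of "Algorithm A backwards", assuming the documented
    MS C runtime rules (whitespace separates arguments, double quotes group
    them, and backslashes are special only directly before a double quote).
    The function names quote_windows_arg() and build_command_line() are
    hypothetical and not taken from popen5. As noted elsewhere in this
    thread, programs that do not parse their command line with the MS C
    runtime may still interpret the result differently.

```python
def quote_windows_arg(arg):
    """Quote one argument so that MS C runtime parsing yields it back."""
    # Quote only when needed: empty strings and embedded whitespace.
    needs_quotes = arg == "" or " " in arg or "\t" in arg
    out = []
    backslashes = 0  # length of the current run of pending backslashes
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1))
            out.append('"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote are taken literally.
            out.append("\\" * backslashes)
            out.append(ch)
            backslashes = 0
    if needs_quotes:
        # Trailing backslashes would otherwise escape the closing quote.
        out.append("\\" * (2 * backslashes))
        return '"' + "".join(out) + '"'
    out.append("\\" * backslashes)
    return "".join(out)


def build_command_line(args):
    """Join a sequence of arguments into a single command-line string."""
    return " ".join(quote_windows_arg(arg) for arg in args)
```

    For example, build_command_line(["ls", "/Applications/Internet Explorer"])
    wraps the second argument in double quotes and leaves the first as it is.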

    There are two more cases:

    1) Should popen5 support a string argument on Windows?

    and

    2) Should popen5 support a string argument on UNIX?


    You seem to have made up your mind about case 1, and even think that
    this should be "recommended". I'm not that sure.

    What about case 2? This could be supported by converting the string to a
    sequence using Algorithm A. One large problem though is that Algorithm A
    is *not* the same as a typical shell uses. For example, an OS X user might
    want to do:

    Popen("ls '/Applications/Internet Explorer'")

    This won't work if we use Algorithm A.

    If we extend Algorithm A to support single quotes as well, this will not
    work as expected on Windows:

    Popen("echo 'hello'")

    Sigh.
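
    The mismatch can be seen by comparing a simplified Algorithm A splitter
    with POSIX shell splitting from the standard shlex module. The function
    split_algorithm_a() is a hypothetical sketch written for this
    comparison; it follows the documented MS C runtime rules and ignores
    undocumented corner cases such as doubled quotes inside a quoted string.

```python
import shlex


def split_algorithm_a(cmdline):
    """Split a command line roughly the way the MS C runtime builds argv."""
    args = []
    current = []
    in_quotes = False
    have_arg = False  # True once the current argument has started
    i = 0
    n = len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Count the whole run of backslashes.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # Before a quote, each pair of backslashes yields one.
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')  # odd count: literal quote
                else:
                    in_quotes = not in_quotes  # even count: delimiter
                j += 1
            else:
                current.append("\\" * count)  # plain literal backslashes
            have_arg = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            # Single quotes land here: they are ordinary characters.
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


command = "ls '/Applications/Internet Explorer'"
print(split_algorithm_a(command))  # single quotes kept, path split in two
print(shlex.split(command))        # POSIX shell rules: path stays whole
```

    Under the Algorithm A rules the single quotes are ordinary characters,
    so the path is split at the space, while shlex.split() treats them as
    quoting and returns the path as one argument.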

    I agree with Guido's comments on python-dev - this module (popen5 is a
    *horrible* name - I'd prefer "process", but am happy if someone comes
    up with a better suggestion) should aim to be the clear "best of
    breed" process control module for all platforms.
    The only drawback with "process" is that it is already in use by
    http://starship.python.net/~tmick/#process.

    I hope this is of some use,
    Indeed.

    --
    /Peter Åstrand <astrand at lysator.liu.se>
  • Paul Boddie at Jan 6, 2004 at 10:31 am
    Paul Moore <paul.moore at atosorigin.com> wrote in message news:<mailman.121.1073343223.12720.python-list at python.org>...
    The whole interface feels very Unix-specific.
    Well, the subject of the original message did mention POSIX
    explicitly. ;-)

    Seriously, it would be very nice to have something equivalent on
    Windows that works as well as the POSIX-like implementations without
    the user of the module needing to use various arcane tricks to prevent
    deadlock between parent and child processes. Of course, as people
    often mention, Windows operating systems do have some POSIX API
    support, although their compliance to the POSIX standard seems to be
    best described as being at the "version 0.1a1" level.

    Paul
  • Moore, Paul at Jan 6, 2004 at 5:09 pm
    From: Peter Astrand [mailto:astrand at lysator.liu.se]
    The biggest issue, though, is that args as a sequence of program
    arguments is very Unix-specific. This is a big incompatibility between
    Unix and Windows - in Windows, the low-level operation (CreateProcess)
    takes a *command line* which is passed unchanged to the child process.
    The child then parses that command line for itself (often, but not
    always, in the C runtime). In Unix, exec() takes an *argument list*.
    If you have a command line, you have to split it yourself, a task
    which is usually delegated to /bin/sh in the absence of anything else.
    This is hard to handle portably.
    Oh, I've never thought of this.
    It's a major issue. I've been porting process-creation code from Unix to
    Windows for a long while now, and there simply isn't a good, compatible,
    answer.
    I've found some documentation on this.
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/progs_12.asp
    is interesting. It describes how the MS C runtime translates the
    commandline to an argv array. Let's call this "Algorithm A".
    The problem here is that not all programs use the MSVC runtime.
    Borland C uses a subtly different algorithm, and programs written in
    other languages often don't use an argv concept at all.

    Even adding quotes when not necessary can cause certain programs to
    fail.
    When passed a sequence, popen5 should translate this sequence into a
    string by using algorithm A backwards. This should definitely be
    implemented in popen5.
    There are two more cases:

    1) Should popen5 support a string argument on Windows?

    and

    2) Should popen5 support a string argument on UNIX?
    You seem to have made up your mind about case 1, and even think that
    this should be "recommended". I'm not that sure.
    Using a list works most of the time, but can break in subtle, and surprising
    ways. I have coded "Algorithm A in reverse" a number of times, and *always*
    managed to construct a program that "broke". With Windows GUI programs
    (not the key use case for this module, I admit) it's not remotely hard to
    construct a broken example :-(
    What about case 2? This could be supported by converting the string to a
    sequence using Algorithm A. One large problem though is that Algorithm A
    is *not* the same as a typical shell uses. For example, an OS X user might
    want to do:

    Popen("ls '/Applications/Internet Explorer'")

    This won't work if we use Algorithm A.

    If we extend Algorithm A to support single quotes as well, this will not
    work as expected on Windows:

    Popen("echo 'hello'")

    Sigh.
    Sigh indeed. It's just not cross-platform no matter how you try. BTW, your
    echo example won't work in any case on Windows, as "echo" is a shell
    builtin, and not available as a standalone executable. So on Windows,
    you'd need

    Popen("cmd /c echo hello")

    And don't get me started on how cmd/c's quoting peculiarities can make this
    even worse :-(
    The only drawback with "process" is that it is already in use by
    http://starship.python.net/~tmick/#process.
    Ah. I wasn't aware of that module. I wonder whether Trent would be willing
    to combine his code and yours somehow, and donate the name?

    Paul
  • Peter Astrand at Jan 6, 2004 at 9:44 pm

    On Tue, 6 Jan 2004, Moore, Paul wrote:

    1) Should popen5 support a string argument on Windows?

    and

    2) Should popen5 support a string argument on UNIX?
    You seem to have made up your mind about case 1, and even think that
    this should be "recommended". I'm not that sure.
    Using a list works most of the time, but can break in subtle, and surprising
    ways. I have coded "Algorithm A in reverse" a number of times, and *always*
    managed to construct a program that "broke". With Windows GUI programs
    (not the key use case for this module, I admit) it's not remotely hard to
    construct a broken example :-(
    OK, let's support case 1 then. Currently, I think that we should
    support case 2 as well, using Algorithm A. We need to document that single
    quote escaping is not supported, though.

    The only drawback with "process" is that it is already in use by
    http://starship.python.net/~tmick/#process.
    Ah. I wasn't aware of that module.
    I wasn't either, until I was about to announce the PEP. That's why there's
    no reference to Trent's module in the PEP (yet).

    I wonder whether Trent would be willing
    to combine his code and yours somehow, and donate the name?
    I have contacted Trent; let's give him some time to respond.


    --
    /Peter Åstrand <astrand at lysator.liu.se>
  • Andrew MacIntyre at Jan 6, 2004 at 10:22 pm

    On Tue, 6 Jan 2004, Moore, Paul wrote:

    From: Peter Astrand [mailto:astrand at lysator.liu.se]
    The only drawback with "process" is that it is already in use by
    http://starship.python.net/~tmick/#process.
    Ah. I wasn't aware of that module. I wonder whether Trent would be willing
    to combine his code and yours somehow, and donate the name?
    There might be some logic to making this a subpackage of os (os.process ?),
    which already has a doc subsection for process management functions,
    rather than a separate top level module (like popen2).

    --
    Andrew I MacIntyre "These thoughts are mine alone..."
    E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370
    andymac at pcug.org.au (alt) | Belconnen ACT 2616
    Web: http://www.andymac.org/ | Australia
  • Paul Moore at Jan 6, 2004 at 11:05 pm

    Peter Astrand <astrand at lysator.liu.se> writes:

    I've read the PEP, and I'd like to help with an implementation for
    Windows, if no-one else is doing it.
    Help is always appreciated. I haven't got very far yet. It might be useful
    to look at http://starship.python.net/~tmick/#process as well (but we
    should really try to get a smaller code base than this).
    I've got a (so far *very* basic) implementation which works. I'll have
    a think about some of the issues which came up when I was writing it,
    and write up some notes.

    Paul
    --
    This signature intentionally left blank
  • Donn Cave at Jan 7, 2004 at 6:08 am
    Quoth Michael Chermside <mcherm at mcherm.com>:
    Peter writes:
    There's a new PEP available:

    PEP 324: popen5 - New POSIX process module

    A copy is included below. Comments are appreciated.
    ...
    And I vote +1 on changing the name from "popen5" to something else.
    "process" (as suggested in the PEP) isn't too bad, but I'd certainly be
    open to another name. If this is to replace the entire rest of the popenX
    family for most normal purposes, a separate, more comprehensible name
    will help newbies find it.
    "process" is not only ambiguous, it's a little off-target if the
    software in question is more about executing foreign commands, as
    opposed to process management in general. So I'd be open to another
    name, too.

    Donn Cave, donn at drizzle.com
