I'm working on an ODBMS written in Python, for Python, and was
wondering if anyone was interested. In particular, I'd like to know
what features would be useful, and what types of use cases people
would have for a simple, but feature-rich object database.

The system that I'm developing is PyPerSyst, which began as a simple
persistence mechanism, but is now becoming a complete ODBMS. Some
details are available here:

http://www.orbtech.com/wiki/PyPerSyst

The code is available in CVS on SF:

http://sourceforge.net/projects/pypersyst/

As you'll see when you look at it, my goal is to provide many of the
features you would find in relational databases (declarative alternate
keys, referential integrity, etc.) without any of the impedance
mismatch associated with mapping between objects and relational
tables. And since this is Python, I've got several features I've
never seen in any database of any kind (like built-in, automatic,
self-maintained, bi-directional associations for all references).

So, what else would you like to have in a pure-Python ODBMS?

--
Patrick K. O'Brien
Orbtech http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------


  • Pettersen, Bjorn S at Aug 28, 2003 at 11:07 pm

    I'd be interested, but can't seem to find docs, demos or tests through
    sf's web interface.. any pointers?

    - bjorn
  • Patrick K. O'Brien at Aug 29, 2003 at 12:38 am

    "Pettersen, Bjorn S" <BjornPettersen at fairisaac.com> writes:

    I'd be interested, but can't seem to find docs, demos or tests through
    sf's web interface.. any pointers?
    First of all, let me just make a caveat that this is still in the
    early stages of development. By that I mean that many features are
    coded, and there are a good many unit tests, but I haven't got much in
    the way of docs and demos. PyPerSyst is being used in a commercial
    application, so it does work quite well. But in no way am I
    advertising it as a finished product. I'm just looking for feedback
    from early adopters and developers with an interest.

    The main pypersyst package is here:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/pypersyst/pypersyst/pypersyst/

    The unit tests are here:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/pypersyst/pypersyst/pypersyst/test/

    I'm working on a simple demo (twistedcred), but haven't committed it
    to CVS yet. In the meantime, here is what the database portion of an
    application looks like:

    import os

    from pypersyst.database import Database
    from pypersyst.engine.engine import Engine
    from pypersyst.storage.storage import Storage

    from twistedcred import data
    from twistedcred.schema import cred

    def database():
        """Return a PyPerSyst database."""
        dir = os.path.dirname(data.__file__)
        app = '.twistedcred'
        storage = Storage(dir, app, binary=False, python=True)
        engine = Engine(storage, cred.Root)
        database = Database(engine)
        return database

    ---

    And here is the schema for the twistedcred database:

    from pypersyst import root
    from pypersyst.entity.entity import Entity


    class Avatar(Entity):
        """Avatar class."""

        _attrSpec = [
            'realm',
            'user',
            'name',
            ]

        _altkeySpec = [
            ('user', 'realm', 'name',),
            ]

        def __init__(self, user, realm, name='Avatar'):
            """Create Avatar instance."""
            self._prep(locals())
            Entity.__init__(self)


    class Realm(Entity):
        """Realm class."""

        _attrSpec = [
            'name',
            ]

        _altkeySpec = [
            ('name',),
            ]

        def __init__(self, name):
            """Create Realm instance."""
            self._prep(locals())
            Entity.__init__(self)


    class User(Entity):
        """User class."""

        _attrSpec = [
            'name',
            'hashedPassword',
            ]

        _altkeySpec = [
            ('name',),
            ]

        def __init__(self, name, hashedPassword=None):
            """Create User instance."""
            self._prep(locals())
            Entity.__init__(self)


    class Root(root.Root):
        """Root class."""

        _EntityClasses = [
            Avatar,
            Realm,
            User,
            ]

    You can create the database using PyCrust, for example, and interact
    with it like this:
    >>> from twistedcred.database import database
    >>> db = database.database()
    >>> from pypersyst.entity import transaction as tx
    >>> t = tx.Create('User', name='Bob')
    >>> u1 = db.execute(t)
    >>> u1.name
    'Bob'
    >>> t = tx.Create('Realm', name='Whatever')
    >>> r1 = db.execute(t)
    >>> t = tx.Create('Avatar', name='MyAvatar', user=u1, realm=r1)
    >>> a1 = db.execute(t)
    >>> a1.user.name
    'Bob'
    >>> t = tx.Create('User', name='Bob')
    >>> u = db.execute(t)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/database.py", line 27, in execute
        return self._engine.execute(transaction)
      File "/home/pobrien/Code/pypersyst/engine/engine.py", line 75, in execute
        return transaction.execute(self._root)
      File "/home/pobrien/Code/pypersyst/entity/transaction.py", line 31, in execute
        return self.EntityClass(**self.attrs)
      File "/home/pobrien/Code/twistedcred/schema/cred.py", line 65, in __init__
        Entity.__init__(self)
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 81, in __init__
        self.extent._insert(self)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 213, in _insert
        self._validate(instance, instance._attrs())
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 325, in _validate
        self._validatekeys(instance, attrs)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 335, in _validatekeys
        raise KeyError, msg
    KeyError: duplicate value ('Bob',) for altkey ('name',)
    >>> u1.links
    {('Avatar', 'user'): [<twistedcred.schema.cred.Avatar object at 0x88a6294>]}
    >>> r1.links
    {('Avatar', 'realm'): [<twistedcred.schema.cred.Avatar object at 0x88a6294>]}
    >>> db.root['Avatar'].match(name='Ava')
    []
    >>> db.root['Avatar'].search(name='Ava')
    [<twistedcred.schema.cred.Avatar object at 0x88a6294>]
    >>>

    I hope that helps demonstrate some of what it can do.
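
As a rough illustration of the alternate-key check and the match/search difference shown in the session above, here is a minimal sketch. It is not PyPerSyst's code: the `Extent` class, its method names, and the guess that `search` does substring matching (inferred only from the `'Ava'` example) are all assumptions made for this example.

```python
class Extent:
    """Hypothetical per-class collection that enforces alternate keys."""

    def __init__(self, altkey_spec):
        self._altkey_spec = [tuple(key) for key in altkey_spec]
        # One index per alternate key: {tuple of values: instance}
        self._indexes = {key: {} for key in self._altkey_spec}
        self._instances = []

    def insert(self, instance):
        # Check every key before touching any index, so a rejected
        # insert leaves the extent unchanged.
        values = {}
        for key in self._altkey_spec:
            value = tuple(getattr(instance, name) for name in key)
            if value in self._indexes[key]:
                raise KeyError('duplicate value %r for altkey %r' % (value, key))
            values[key] = value
        for key, value in values.items():
            self._indexes[key][value] = instance
        self._instances.append(instance)
        return instance

    def match(self, **criteria):
        """Return instances whose attributes equal the given values."""
        return [i for i in self._instances
                if all(getattr(i, n) == v for n, v in criteria.items())]

    def search(self, **criteria):
        """Return instances whose attributes contain the given substrings."""
        return [i for i in self._instances
                if all(v in getattr(i, n) for n, v in criteria.items())]


class User:
    def __init__(self, name):
        self.name = name


users = Extent([('name',)])
bob = users.insert(User('Bob'))
try:
    users.insert(User('Bob'))
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```

Validating all keys before updating any index is a deliberate choice in this sketch: a failed insert should never leave a half-registered instance behind.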

  • Patrick K. O'Brien at Aug 29, 2003 at 1:16 am

    pobrien at orbtech.com (Patrick K. O'Brien) writes:

    I hope that helps demonstrate some of what it can do.
    I forgot to show a cool feature:
    >>> t = tx.Delete(u1)
    >>> db.execute(t)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/database.py", line 27, in execute
        return self._engine.execute(transaction)
      File "/home/pobrien/Code/pypersyst/engine/engine.py", line 75, in execute
        return transaction.execute(self._root)
      File "/home/pobrien/Code/pypersyst/entity/transaction.py", line 49, in execute
        return root[self.classname]._delete(self.oid)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 204, in _delete
        raise error.DeleteRestricted, 'instance is referenced elsewhere'
    DeleteRestricted: instance is referenced elsewhere
    >>>

    What's that? You don't recall seeing referential integrity rules
    defined in the schema, or in the application code? That's right, you
    didn't. Curious?
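
One way a rule like this could fall out of automatically maintained back-references, with nothing declared in the schema, is sketched below. This is a guess at the idea, not PyPerSyst's implementation: the `Entity` constructor, the `delete` helper, and the plain list standing in for the database are hypothetical. Only the shape of the `links` dictionary follows the earlier session output.

```python
class DeleteRestricted(Exception):
    """Raised when deleting an instance that other instances reference."""


class Entity:
    def __init__(self, **attrs):
        self.attrs = attrs
        self.links = {}  # {(referring class name, attribute name): [referrers]}
        # Register a back-reference on every entity this one points to.
        for name, value in attrs.items():
            if isinstance(value, Entity):
                key = (type(self).__name__, name)
                value.links.setdefault(key, []).append(self)


class User(Entity):
    pass


class Avatar(Entity):
    pass


def delete(extent, instance):
    """Remove instance from extent unless something still refers to it."""
    if any(instance.links.values()):
        raise DeleteRestricted('instance is referenced elsewhere')
    # Withdraw the back-references this instance registered on others.
    for name, value in instance.attrs.items():
        if isinstance(value, Entity):
            value.links[(type(instance).__name__, name)].remove(instance)
    extent.remove(instance)


extent = []
u1 = User(name='Bob')
extent.append(u1)
a1 = Avatar(name='MyAvatar', user=u1)
extent.append(a1)

try:
    delete(extent, u1)      # a1 still refers to u1
    blocked = False
except DeleteRestricted:
    blocked = True

delete(extent, a1)          # removing the referrer first...
delete(extent, u1)          # ...makes this delete legal
```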

  • Patrick K. O'Brien at Aug 29, 2003 at 1:26 am

    pobrien at orbtech.com (Patrick K. O'Brien) writes:

    pobrien at orbtech.com (Patrick K. O'Brien) writes:
    I hope that helps demonstrate some of what it can do.
    I forgot to show a cool feature:
    Here's another:
    >>> u1.name
    'Bob'
    >>> u1.name = 'Joe'
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 94, in __setattr__
        raise AttributeError, 'Modifications can only be made by transactions'
    AttributeError: Modifications can only be made by transactions

    So, let's use a transaction:
    >>> t = tx.Update(u1, name='Joe')
    >>> db.execute(t)
    <twistedcred.schema.cred.User object at 0x8884634>
    >>> u1.name
    'Joe'
    >>>

    Of course, nobody is perfect. So what happens when we send a bad
    transaction:
    >>> t = tx.Update(u1, foo='Joe')
    >>> db.execute(t)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/database.py", line 27, in execute
        return self._engine.execute(transaction)
      File "/home/pobrien/Code/pypersyst/engine/engine.py", line 75, in execute
        return transaction.execute(self._root)
      File "/home/pobrien/Code/pypersyst/entity/transaction.py", line 76, in execute
        return root[self.classname]._update(self.instance, **self.attrs)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 312, in _update
        self._validate(instance, combined)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 324, in _validate
        instance._validate(attrs)
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 157, in _validate
        raise error.InvalidAttribute, '%r is not an attribute' % name
    InvalidAttribute: 'foo' is not an attribute
    >>>

    Can you tell I've been having some fun with this? ;-)
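
For readers wondering how direct assignment can be refused while an Update transaction still gets through, here is a minimal sketch of the general technique: override `__setattr__` and route legitimate writes around it. The class layout, `_apply`, and `_attr_spec` are invented for this example and are not taken from the PyPerSyst source.

```python
class InvalidAttribute(Exception):
    """Raised when a transaction names an attribute the class lacks."""


class Entity:
    _attr_spec = ()

    def __init__(self, **attrs):
        self._apply(attrs)

    def _apply(self, attrs):
        # Validate everything first so a bad update changes nothing.
        for name in attrs:
            if name not in self._attr_spec:
                raise InvalidAttribute('%r is not an attribute' % name)
        # object.__setattr__ bypasses the guard defined below.
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        raise AttributeError('Modifications can only be made by transactions')


class User(Entity):
    _attr_spec = ('name', 'hashedPassword')


class Update:
    """Hypothetical transaction that changes attributes of one instance."""

    def __init__(self, instance, **attrs):
        self.instance = instance
        self.attrs = attrs

    def execute(self):
        self.instance._apply(self.attrs)
        return self.instance


u1 = User(name='Bob')
try:
    u1.name = 'Joe'
    direct_write_allowed = True
except AttributeError:
    direct_write_allowed = False

Update(u1, name='Joe').execute()

try:
    Update(u1, foo='Joe').execute()
    bad_update_rejected = False
except InvalidAttribute:
    bad_update_rejected = True
```

Reads stay ordinary attribute access in this sketch; only writes are funnelled through one place.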

  • Jeremy Bowers at Aug 29, 2003 at 2:07 am

    On Thu, 28 Aug 2003 20:26:41 -0500, Patrick K. O'Brien wrote:
    >>> u1.name
    'Bob'
    >>> u1.name = 'Joe'
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 94, in __setattr__
        raise AttributeError, 'Modifications can only be made by transactions'
    AttributeError: Modifications can only be made by transactions

    So, let's use a transaction:
    So why *isn't* it a transaction? Unless you have a good reason not to, I'd
    suggest automatically "coercing" that into a transaction instead of
    throwing an error.

    Give an indication in the docs about the performance issues if you like,
    but make the trivially easy case easy.

    (I'm only really entering my maturity (IMHO) as a software engineer, but
    one of my rules of thumb for developing software for other people to use
    is that the API can ***never*** be too easy. Doing something hard may be a
    little tricky but if you can make the easy case still work, you're way
    ahead. And Python is one ass-kicking language in that regard; it's one of
    the reasons I love it so much, the APIs can be made so easy to use they
    sometimes fade into complete transparency, like "u1.name = 'joe'". (I've
    been focusing on how to write APIs for others to use, esp. in Open Source
    though it applies equally to any team effort, that will be successful,
    rather than ignored.))
  • Patrick K. O'Brien at Aug 29, 2003 at 2:46 am

    Jeremy Bowers <jerf at jerf.org> writes:
    On Thu, 28 Aug 2003 20:26:41 -0500, Patrick K. O'Brien wrote:
    >>> u1.name
    'Bob'
    >>> u1.name = 'Joe'
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 94, in __setattr__
        raise AttributeError, 'Modifications can only be made by transactions'
    AttributeError: Modifications can only be made by transactions

    So, let's use a transaction:
    So why *isn't* it a transaction? Unless you have a good reason not
    to, I'd suggest automatically "coercing" that into a transaction
    instead of throwing an error.
    These are some of my reasons: 1) every transaction gets pickled and
    logged before it is executed, so that the database can recover from a crash,
    2) most of the other cool features depend on mutations passing through
    the extent manager for each class, 3) transparent transactions only
    seem like a good idea, 4) security is hard to enforce without explicit
    boundaries (read the Twisted docs regarding Perspective Broker), 5)
    explicit is better than implicit, especially when valuable persistent
    data is involved.
    Give an indication in the docs about the performance issues if you
    like, but make the trivially easy case easy.
    Performance has nothing to do with it, actually. Integrity and
    security are to blame.
    (I'm only really entering my maturity (IMHO) as a software engineer,
    but one of my rules of thumb for developing software for other
    people to use is that the API can ***never*** be too easy. Doing
    something hard may be a little tricky but if you can make the easy
    case still work, you're way ahead. And Python is one ass-kicking
    language in that regard; it's one of the reasons I love it so much,
    the APIs can be made so easy to use they sometimes fade into
    complete transparency, like "u1.name = 'joe'". (I've been focusing
    on how to write APIs for others to use, esp. in Open Source though
    it applies equally to any team effort, that will be successful,
    rather than ignored.))
    I agree that APIs are very important, and I've worked very hard on the
    API for PyPerSyst. At the same time, everything has a tradeoff. When
    I gave up on the dream of transparent transactions, there were a lot
    of benefits, and lots of other features fell into place. Even though
    I think PyPerSyst is one of the most elegant things I've ever coded,
    especially in terms of the API, I'm sure it can be better. So I
    welcome suggestions for improving it.

    If you can figure out a way to have transparent transactions, without
    giving up on any ACID properties considered mandatory for a DBMS, I
    would love to hear about it. Have you worked with any other object
    databases?
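
The first reason above (pickle and log every transaction before executing it, so the database can recover from a crash) can be sketched as follows. This is a generic write-ahead-log illustration under simplifying assumptions, not PyPerSyst's engine: transactions are reduced to an operation name plus keyword arguments, the root is a plain dict, and an in-memory `io.BytesIO` stands in for the log file.

```python
import io
import pickle


def set_item(root, key, value):
    root[key] = value


def del_item(root, key):
    del root[key]


# Hypothetical registry mapping logged operation names to functions.
OPERATIONS = {'set': set_item, 'del': del_item}


class Engine:
    def __init__(self, log):
        self.log = log      # any binary file-like object
        self.root = {}

    def execute(self, op, **kwargs):
        # Log first, then apply: a crash after the write can be replayed.
        pickle.dump((op, kwargs), self.log)
        self.log.flush()
        return OPERATIONS[op](self.root, **kwargs)


def recover(log):
    """Rebuild the root by replaying every logged transaction in order."""
    root = {}
    log.seek(0)
    while True:
        try:
            op, kwargs = pickle.load(log)
        except EOFError:
            break
        OPERATIONS[op](root, **kwargs)
    return root


log = io.BytesIO()
engine = Engine(log)
engine.execute('set', key='name', value='Bob')
engine.execute('set', key='realm', value='Whatever')
engine.execute('set', key='name', value='Joe')
engine.execute('del', key='realm')

recovered = recover(log)
```

Replay only reproduces the state if every change went through `execute`, which is one reason direct attribute assignment and a log like this do not mix well.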

  • Paul D. Fernhout at Aug 29, 2003 at 4:11 pm

    Patrick K. O'Brien wrote:
    Jeremy Bowers <jerf at jerf.org> writes:
    So why *isn't* it a transaction? Unless you have a good reason not
    to, I'd suggest automatically "coercing" that into a transaction
    instead of throwing an error.
    These are some of my reasons: 1) every transaction gets pickled and
    logged before it is executed, so that the database can recover from a crash,
    2) most of the other cool features depend on mutations passing through
    the extent manager for each class, 3) transparent transactions only
    seem like a good idea, 4) security is hard to enforce without explicit
    boundaries (read the Twisted docs regarding Perspective Broker), 5)
    explicit is better than implicit, especially when valuable persistent
    data is involved.

    [snip]
    (I'm only really entering my maturity (IMHO) as a software engineer,
    but one of my rules of thumb for developing software for other
    people to use is that the API can ***never*** be too easy.
    If you can figure out a way to have transparent transactions, without
    giving up on any ACID properties considered mandatory for a DBMS, I
    would love to hear about it. Have you worked with any other object
    databases?
    To cite:
    http://databases.about.com/library/weekly/aa120102a.htm
    "The ACID model is one of the oldest and most important concepts of
    database theory. It sets forward four goals that every database
    management system must strive to achieve: atomicity, consistency,
    isolation and durability. No database that fails to meet any of these
    four goals can be considered reliable."

    Well, to chime in here, in a "friendly" competition / cooperation sort
    of way, the Pointrel Data Repository System,
    http://sourceforge.net/projects/pointrel/
    while not quite an object database (and admittedly its case being
    easier) has a simple API in the bare minimum use case (it has more
    complex variants). Here is an example of its use (with fragments
    inspired in response to an earlier c.l.p poster's use case a few days ago):

    from pointrel20030812 import *

    # add a first attendant -- uses built in unique ID function
    # each change will be implicitly a separate transaction
    attendantID = Pointrel_generateUniqueID()
    Pointrel_add("congress", attendantID, 'object type', 'user')
    Pointrel_add("congress", attendantID, 'name', 'Sir Galahad')

    # add a second attendant, this time as an atomic transaction
    attendantID = Pointrel_generateUniqueID()
    Pointrel_startTransaction()
    Pointrel_add("congress", attendantID, 'object type', 'user')
    Pointrel_add("congress", attendantID, 'name', 'Brian')
    Pointrel_finishTransaction()

    In the first case, the changes are automatically made into transactions,
    in the second, they are lumped under the current transaction.

    Note that Python objects could be added to the database, as in:

    Pointrel_add("test", 10, ["hello", "goodbye"], MyClass)

    This simple API is made possible by two decisions:
    * to have a version of the API function set which are named as module
    level globals and use a hidden repository (stored in _repository) which
    is defaulted in various ways when needed.
    * to keep a flag in a repository of whether it is in a transaction or
    not, and if it isn't, to create a transaction on the fly (if an
    "implicit transactions allowed" option is set, which it is by default).

    A more general use of the API allowing multiple repositories to be used
    by one application simultaneously is:

    repository = PointrelDataRepositorySystem(archiveName)
    repository.startTransaction()
    repository.add(context, a, b, c)
    repository.add(context, d, e, f)
    repository.finishTransaction()

    The module level "Pointrel_xyz()" functions use these sorts of more
    general API calls behind the scenes.

    Granted, the Pointrel System is essentially a single user single
    transaction system at the core. It (in theory, subject to bugs) supports
    atomicity (transactions), isolation (locking) and durability
    (logging&recovery). It only supports consistency by how applications use
    transactions as opposed to explicit constraints or rules maintained by
    the database, so one could argue it fails the ACID test there. (Although
    would any typical ODBMS pass consistency without extra code support?
    Does PyPerSyst have this at the database level?) And the Pointrel System
    doesn't attempt to hook into the Python language syntax, so perhaps its
    task is much easier than PyPerSyst's?

    To be clear, I'm not holding this out as "Pointrel System great" and
    "PyPerSyst not so great", since obviously the two systems do different
    things, each has its own focus, your task is perhaps harder, I don't
    fully understand everything that is going on here in your design and
    requirements, etc. What I am trying to get at is more to challenge you
    (in a friendly way) to have a very simple API in a default case by
    throwing down a pseudo-gauntlet of a simpler system API. The Pointrel
    System has gone through years of permutation on the API (mainly just by
    me) to get to the conceptual simplicity it has. And of course, now I'm
    in the process of adding more complexity on top of it (but not in it)
    where I am running into more object persistence and interface issues
    (such as the ones PyPerSyst may already solve easily). So feel free to
    say I don't understand all the issues yet. Maybe I'll learn something. ;-)

    In Smalltalk, typically persistent objects may get stored and retrieved
    as proxies, which is made possible by overriding the basic storage and
    retrieval methods which are all exposed etc. Maybe Python the language
    could do with more hooks for persistence as a PEP? I know there are
    some lower level hooks for access, I'm just wondering if they are enough
    for what you may want to do with PyPerSyst to make an elegant API for
    persistent objects (perhaps better unique ID support?), where you could
    then just go:

    from persistanceSystem import *
    foo = MyClass()
    PersistanceSystem_Wrap(foo)
    # the following defaults to a transaction
    foo.x = 10
    # this makes a two change transaction
    PersistanceSystem_StartTransaction()
    foo.y = 20
    foo.z = 20
    foo.info = "I am a 3D Point"
    PersistanceSystem_EndTransaction()
    # what happens to foo on garbage collection? It persists!
    ...
    # Other code in another program
    from persistanceSystem import *
    foo = PersistanceSystem_Query(x, y , z0)
    print foo.info # prints --> "I am a 3D Point"

    That MyClass instance called foo and the related variable changes get
    stored in an ODBMS in transactions somewhere... Then I could do the same
    for the Pointrel System somehow using the same simple hooks.

    In any case, if you can point out why such usage would be impossible
    using Python and some future version of PyPerSyst, we might be on to
    something interesting. I know in a typical Smalltalk I could easily do
    such a thing. But then again, in most Smalltalks, 3/4 yields a fraction
    (not an int, and not a float), and when 3/4 in Smalltalk is multiplied
    by 4/3 you get 1 back again (as an int, not a rounded float), and Python
    still struggles with some basic things like this (although Python has
    many other good qualities that more than make up for such weaknesses).
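
For comparison with the Smalltalk arithmetic described above: Python versions from 2.6 onward include a `fractions` module in the standard library, which gives exact rational arithmetic, although the `/` operator on two ints still does not produce a fraction by itself. A small worked example:

```python
from fractions import Fraction

three_quarters = Fraction(3, 4)
four_thirds = Fraction(4, 3)

# Exact rational multiplication, with no floating-point rounding involved.
product = three_quarters * four_thirds
```

The result is still a `Fraction` object rather than an `int`, but it compares equal to the integer 1 and has a denominator of 1.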
    So, what else would you like to have in a pure-Python ODBMS?
    I do think PyPerSyst is a really cool concept (in memory use and disk
    checkpoints and a log). It reminds me a little of Gemstone (an ODBMS)
    for Smalltalk.

    By the way, if you add support for the sorts of associative tuples which
    the Pointrel System is based on, efficiently managed, maybe I'll
    consider switching to using your system, if the API is simple enough.
    :-) Or, perhaps there is a way the Pointrel System can be extended to
    support what you might want to do (in the sense of transparent
    interaction with Python). In its use of the pickler, the Pointrel System
    does not keep a list of previously pickled objects, so it can't
    transparently pickle objects that refer to previously pickled objects in
    the repository, so that is one way that the Pointrel System can't do
    what your system does at all. (I'm not sure how to do that without, like
    PyPerSyst, keeping lots of previously pickled objects in memory at once
    for the Pickler to work with.) Also, in the Pointrel System repositories
    are sort of made up on the fly of an arbitrary collection of archives,
    where archives may be added and removed dynamically, so I don't quite
    begin to see how to handle object persistence across a repository if
    subobjects are stored in different archives which are dropped out of the
    repository.

    My biggest issue with OO databases (including "a Smalltalk image" for
    that matter) in general is that the definition of objects changes over
    time, and on a practical basis, it might be necessary to support multiple
    definitions of a class with the same name simultaneously if supporting a
    broad range of applications, and somehow resolve version issues. The
    Pointrel System in itself doesn't solve that problem either, but it also
    doesn't have that problem built in at the core, since its main storage
    type is just an arbitrary binary string. I mainly added the Python
    object support just because "pickle" made it easy and fun to do the
    basics, and I thought that a limited level of transparent support might
    make it more appealing to Pythonistas and provide some extra easy
    expandability if people really wanted to easily store typed information
    as opposed to strings. (Although I think it could also bring headaches if
    people have PyPerSyst level expectations for object storage and
    retrieval when I support something more like a Newton soup entry.)

    By the way, I like your overview of various related ODBMS projects here:
    http://www.orbtech.com/wiki/PythonPersistence
    (maybe http://munkware.sourceforge.net/ might go there now?)
    and your article at:
    http://www-106.ibm.com/developerworks/library/l-pypers.html

    And I'm just starting to poke around with your PyCrust to see if it
    can't be used to support more Smalltalk like development of Python apps.
    As a hint as to what I'd like to do :-) I'm hoping to get a lot of
    mileage out of code like:
    newMethodSource = self.editText.GetValue()
    print newMethodSource
    self.expr = compile(newMethodSource, '<string>', 'exec')
    exec self.expr in self.__class__.__dict__
    as opposed to reloading a whole file at once -- to support incremental
    development on live GUIs. I just need a good way to iterate through the
    fields of a GUI instance to rebind action methods to newer versions. (I
    discovered typical Python GUI toolkits have sort of the same problem as
    when using Smalltalk blocks for GUI action code: when you override a
    bound method, the old version is hung onto by the GUI event system, since
    a method is referenced by pointer, not by name, and I don't think Python
    has an "instance become: otherInstance" equivalent.) If it worked out
    well, such a system could then leverage the Pointrel System or PyPerSyst
    to provide version control at a fine grained level for method function
    definitions. If PyPerSyst was as transparent to use as outlined above,
    maybe it could then be used to store and retrieve hand built GUI
    instances with their hand built methods (sort of like in a Squeakish
    Smalltalk image with Morphic, but maybe better).
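
The stale-callback problem described above can be reproduced in a few lines. The sketch below uses Python 3 syntax and `setattr` rather than the `exec ... in self.__class__.__dict__` form, because in Python 3 a class `__dict__` is a read-only proxy that `exec` will not accept. `Panel` and `redefine_method` are made-up names for illustration, not part of PyCrust or any GUI toolkit.

```python
class Panel:
    def on_click(self):
        return 'old'


def redefine_method(cls, source):
    """Compile source defining functions and attach them to cls."""
    namespace = {}
    exec(compile(source, '<string>', 'exec'), namespace)
    for name, value in namespace.items():
        # Skip '__builtins__', which exec adds to the namespace.
        if callable(value) and not name.startswith('__'):
            setattr(cls, name, value)


panel = Panel()
# A GUI event system would typically store a bound method like this one.
saved_callback = panel.on_click

redefine_method(Panel, "def on_click(self):\n    return 'new'\n")

looked_up_by_name = panel.on_click()   # uses the new definition
held_by_pointer = saved_callback()     # still runs the old function
```

Existing instances see the new method on the next attribute lookup, but any bound method captured earlier keeps its reference to the old function, which is why callbacks would need to be rebound after each edit.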

    So anyway, yours in friendly coopetition. :-)

    --Paul Fernhout
    http://www.pointrel.org



  • Patrick K. O'Brien at Aug 29, 2003 at 5:26 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    Well, to chime in here, in a "friendly" competition / cooperation
    Let me start by saying I'd love to cooperate, even if I am competitive
    by nature. ;-)

    Please keep that in mind as I agree/disagree with some of your
    points. ;-)
    sort of way, the Pointrel Data Repository System,
    http://sourceforge.net/projects/pointrel/
    while not quite an object database (and admittedly its case being
    easier) has a simple API in the bare minimum use case (it has more
    complex variants). Here is an example of its use (with fragments
    inspired in response to an earlier c.l.p poster's use case a few days
    ago):

    from pointrel20030812 import *

    # add a first attendant -- uses built in unique ID function
    # each change will be implicitly a separate transaction
    attendantID = Pointrel_generateUniqueID()
    Pointrel_add("congress", attendantID, 'object type', 'user')
    Pointrel_add("congress", attendantID, 'name', 'Sir Galahad')

    # add a second attendant, this time as an atomic transaction
    attendantID = Pointrel_generateUniqueID()
    Pointrel_startTransaction()
    Pointrel_add("congress", attendantID, 'object type', 'user')
    Pointrel_add("congress", attendantID, 'name', 'Brian')
    Pointrel_finishTransaction()

    In the first case, the changes are automatically made into
    transactions, in the second, they are lumped under the current
    transaction.

    Note that Python objects could be added to the database, as in:

    Pointrel_add("test", 10, ["hello", "goodbye"], MyClass)

    This simple API is made possible by two decisions:
    This API looks rather verbose to me. I think mine would look like:
    >>> t = tx.Create('User', name='Sir Galahad')
    >>> user = db.execute(t)
    And unique ids (immutable, btw) are assigned by PyPerSyst:
    >>> user.oid
    42

    And you can still access attributes directly, you just can't change
    them outside of a transaction:
    >>> user.name
    'Sir Galahad'

    And the generic Update transaction is equally simple:
    >>> t = tx.Update(user, name='Brian')
    >>> db.execute(t)
    >>> user.name
    'Brian'
    Granted, the Pointrel System is essentially a single user single
    transaction system at the core. It (in theory, subject to bugs)
    supports atomicity (transactions), isolation (locking) and
    durability (logging&recovery). It only supports consistency by how
    applications use transactions as opposed to explicit constraints or
    rules maintained by the database, so one could argue it fails the
    ACID test there. (Although would any typical ODBMS pass consistency
    without extra code support? Does PyPerSyst have this at the database
    level?)
    PyPerSyst can persist *any* picklable object graph. But it also comes
    with an Entity class and a Root class (that understands Entity
    classes) that provides additional functionality, such as alternate
    indexes, referential integrity, instance validation, etc. So if your
    schema describes classes that subclass Entity, you get lots of
    functionality built into the database itself, without having to write
    any additional code, other than the additional validity checking that
    only your subclass knows. But I'd like to make more of that
    declarative as well. I'm also working on Fields, which provide
    validation and other features at the individual Entity attribute
    level. Fields have lots of metadata, like fields in an RDBMS.

    My goal is to have as much behavior as possible in the database, and
    have that behavior controlled declaratively within the schema.
    To be clear, I'm not holding this out as "Pointrel System great" and
    "PyPerSyst not so great", since obviously the two systems do
    different things, each has its own focus, your task is perhaps
    harder, I don't fully understand everything that is going on here in
    your design and requirements, etc. What I am trying to get at is
    more to challenge you (in a friendly way) to have a very simple API
    in a default case by throwing down a pseudo-gauntlet of a simpler
    system API.
    I don't mind a friendly challenge. I'm just surprised that the bulk
    of this thread is debating an API that has barely seen the light of
    day, and that I consider to be drop-dead simple. I guess I need to
    get a demo app created soon, just to put this to rest. Or at least
    make sure we're all debating about the same thing. ;-)

    Right now we're debating an API that nobody on this thread has really
    seen or used, other than me. The other thing I can say is that, imo,
    the way you interact with persistent class instances is not the same
    way you interact with regular class instances. Not if you value the
    integrity and reliability of your data. And trying to make it appear
    so is a disservice. I know everyone seems to think transparent
    persistence is the holy grail, but I've come to think otherwise.

    Unfortunately, I don't have time to fully elaborate my position. But
    you don't have to agree with me on this point. PyPerSyst is very
    modular, and there are implementations of transparent proxies in the
    PyPerSyst CVS sandbox that some other developers on the team have
    written. So it can be done.

    I'll reply to other stuff separately to keep the message size down.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Paul D. Fernhout at Aug 29, 2003 at 6:43 pm

    Patrick K. O'Brien wrote:
    Let me start by saying I'd love to cooperate, even if I am
    competitive by nature. ;-)
    Nothing like a good controversy to get people paying attention. :-)
    This API looks rather verbose to me. I think mine would look like:
    t = tx.Create('User', name='Sir Galahad')
    user = db.execute(t)
    I think your notion of transactions is growing on me. :-) I can see how
    you can generalize this to construct a transaction in a view of a
    database, querying on DB + T1 + T2 etc. while they are uncommitted and
    then commit them all (perhaps resolving multiuser multitransaction
    issues on commits). Kind of neat concept, I'll have to consider for some
    version of the Pointrel System.

    I think it is the special syntax of:
    tx.Update(u1, name='Joe')
    or:
    tx.Create('User', name='Sir Galahad')
    which I am recoiling some from.

    I think part of this comes from thinking of a transaction as something
    that encloses other changes, as opposed to something which is changed.
    Thus my discomfort at requesting services from a transaction other than
    commit or abandon. I'm not saying maybe I couldn't grow to love
    tx.Update(), just that it seems awkward at first compared to what I am
    used to, as well as compared to making operations on a database itself
    after having told the database to begin a transaction. I'm also left
    wondering what the read value of the "name" field is when accessed
    directly as "u1.name" after doing the "tx.Update()" and before doing the
    "db.execute()". [By the way, picky, picky, and I fall down on it too,
    but you use different capitalizations for those two functions.]

    So is it that in PyPerSyst there appears to be one way to access
    information (directly through the object using Python object attribute
    access dot syntax) [not sure about database queries?] and another way to
    change objects -- using tx.XYZ()? This mixing of mindsets could be
    confusing (especially within an object that changes its own values
    internally).

    Using tx.Update also becomes an issue of how to convert existing code to
    persistent code. Mind you, the Pointrel System can't do this
    transparently either, but it doesn't try to do it at all. The Pointrel
    System requires both looking up a value and storing it to use a
    different syntax. Is it just a matter of aesthetics about whether it is
    better to have the whole approach be unfamiliar or whether it is better
    to have only half of it be unfamiliar? Or is there something more here,
    some violation of programmer expectations? [See below.]
    And unique ids (immutable, btw) are assigned by PyPerSyst:
    user.oid
    42
    Being competitive here :-) I would love to know if you have a good
    approach for making them globally unique across all possible users of
    all PyPerSyst repositories for all time. The Pointrel has an approach to
    handle this (I don't say it will always work, or is efficient, but it
    tries). :-) Feel free to raid that code (BSDish license, see
    license.txt), but that issue may have other deeper implications for your
    system.
    And you can still access attributes directly, you just can't change
    them outside of a transaction:

    user.name
    'Sir Galahad'

    And the generic Update transaction is equally simple:

    t = tx.Update(user, name='Brian')
    db.execute(t)
    user.name
    'Brian'
    I know one rule of user interface design (not necessarily API of
    course) is that familiar elements should act familiar (i.e. a drop down
    list should not launch a dialog window on drop down) and that if you are
    going to experiment it should look very different so expectations are
    not violated.

    The issue here is in part that when you can reference "u1.name" and then
    "u1.name = 'Joe'" generates an exception (instead of automatically
    making an implicit transaction), some user expectation of API symmetry
    may be violated...

    Also, on another issue, it seems like the persistent classes need to
    derive from a special class and define their persistent features in a
    special way, i.e. class Realm(Entity): _attrSpec = [ 'name', ] etc.
    Again, this is going somewhat towards Python language integration yet
    not all the way.

    While I'd certainly agree your version is more concise than what I
    posted first (just an example of a system that does not attempt to use
    Python language features), later in the email (perhaps you'll get to it
    in your next reply) was the simpler:

    from persistanceSystem import *
    foo = MyClass()
    PersistanceSystem_Wrap(foo)
    # the following defaults to a transaction
    foo.x = 10
    # this makes a two change transaction
    PersistanceSystem_StartTransaction()
    foo.y = 20
    foo.z = 20
    foo.info = "I am a 3D Point"
    PersistanceSystem_EndTransaction()

    That approach does not violate any symmetry expectations by users -- you
    can assign and retrieve values just like always.
    Granted, the Pointrel System is essentially a single user single
    transaction system at the core. It (in theory, subject to bugs)
    supports atomicity (transactions), isolation (locking) and
    durability (logging&recovery). It only supports consistency by how
    applications use transactions as opposed to explicit constraints or
    rules maintained by the database, so one could argue it fails the
    ACID test there. (Although would any typical ODBMS pass consistency
    without extra code support? Does PyPerSyst have this at the
    database level?)

    PyPerSyst can persist *any* picklable object graph.
    Are the graphs standalone, or can they reference other previously persisted
    Python objects (not derived from "Root" or "Entity")?
    But it also comes with an Entity class and a Root class (that
    understands Entity classes) that provides additional functionality,
    such as alternate indexes, referential integrity, instance
    validation, etc.
    I guess I need to learn more about when these are better handled by the
    persistence system as opposed to the applications that use it.
    I don't mind a friendly challenge. I'm just surprised that the bulk
    of this thread is debating an API that has barely seen the light of
    day, and that I consider to be drop-dead simple. I guess I need to
    get a demo app created soon, just to put this to rest. Or at least
    make sure we're all debating about the same thing. ;-)
    Good point.

    I think the issue is that with the other systems out there
    (MySQL, ZODB, etc.) it seems like a new system has to offer something
    really new (speed, footprint, simplicity, robustness, documentation :-)
    etc.).

    Presumably a very transparent API for persistence is still needed for
    an ODBMS which is Python friendly? (Does ZODB do any of this?) If I need
    to write any extra code at all for an object to be persistent, or derive
    from a specialized class, I could just derive from a class that knows
    how to use SQL to store pickled fields. Obviously, PyPerSyst may have
    many wonderful features (not having used it yet) which make it worth it
    to do a special derivation or write special code, but it just seems like
    it would have language transparency too. But, I haven't tried to do that
    in Python, so maybe it's not possible.
    Right now we're debating an API that nobody on this thread has really
    seen or used, other than me. The other thing I can say is that,
    imo, the way you interact with persistent class instances is not the
    same way you interact with regular class instances. Not if you value
    the integrity and reliability of your data. And trying to make it
    appear so is a disservice. I know everyone seems to think
    transparent persistence is the holy grail, but I've come to think
    otherwise.
    I think this is the core of the question of this part of the thread.
    You wrote "I've come to think otherwise". I'd be curious to hear more on
    any use cases or examples on why transparency is not so compatible with
    reliability etc. I frankly don't know. I just don't see them being
    mutually exclusive, especially based on what I have read of Smalltalk
    systems that do persistence using proxies. But again, Smalltalk has
    "become:" which can essentially swap any arbitrary instance and a proxy,
    thus making it easy to suddenly start using a proxy for a previously
    used instance and have all previous references point to the proxy. Maybe
    Python needs become? I could use it elsewhere. Maybe it has it and I
    never noticed?
    Unfortunately, I don't have time to fully elaborate my position. But
    you don't have to agree with me on this point. PyPerSyst is very
    modular, and there are implementations of transparent proxies in the
    PyPerSyst CVS sandbox that some other developers on the team have
    written. So it can be done.
    OK.

    Thanks for the reply.

    --Paul Fernhout
    http://www.pointrel.org



  • Patrick K. O'Brien at Aug 29, 2003 at 8:24 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    Patrick K. O'Brien wrote:
    Let me start by saying I'd love to cooperate, even if I am
    competitive by nature. ;-)
    Nothing like a good controversy to get people paying attention. :-)
    And never let the facts get in the way of a good story. ;-)
    This API looks rather verbose to me. I think mine would look like:
    t = tx.Create('User', name='Sir Galahad')
    user = db.execute(t)
    I think your notion of transactions is growing on me. :-) I can see how
    you can generalize this to construct a transaction in a view of a
    database, querying on DB + T1 + T2 etc. while they are uncommitted and
    then commit them all (perhaps resolving multiuser multitransaction
    issues on commits). Kind of neat concept, I'll have to consider for some
    version of the Pointrel System.

    I think it is the special syntax of:
    tx.Update(u1, name='Joe')
    or:
    tx.Create('User', name='Sir Galahad')
    which I am recoiling some from.

    I think part of this comes from thinking of a transaction as something
    that encloses other changes, as opposed to something which is changed.
    Thus my discomfort at requesting services from a transaction other than
    commit or abandon. I'm not saying maybe I couldn't grow to love
    tx.Update(), just that it seems awkward at first compared to what I am
    used to, as well as compared to making operations on a database itself
    after having told the database to begin a transaction.
    My use of the term "transaction" has certain subtleties that deserve
    clarification. First, a transaction is an instance of a Transaction
    class (or subclass). This instance must have an execute method that
    will get called by the database (after the transaction instance gets
    tested for picklability, and gets logged as a pickle). That execute
    method will be passed the root of the database. It is then free to do
    whatever it wants, as long as the sum total of what it does leaves the
    database in a consistent state. All transactions are executed
    sequentially. All changes made by a transaction must be
    deterministic, in case the transaction gets reapplied from the
    transaction log during a recovery, or restarting a database that
    wasn't dumped just prior to stopping.
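
    The contract described above (pickle and log the transaction, then
    call its execute method with the root) can be sketched roughly as
    follows. This is only an illustration under stated assumptions, not
    PyPerSyst's actual code: the Database and Rename classes, the
    in-memory log list, and the dict-based root are all hypothetical.

```python
import pickle


class Transaction:
    """Base class: subclasses implement execute(root)."""

    def execute(self, root):
        raise NotImplementedError


class Rename(Transaction):
    """Hypothetical transaction: rename one user in a dict-based root."""

    def __init__(self, key, name):
        self.key = key
        self.name = name

    def execute(self, root):
        # Deterministic: the same inputs always produce the same change.
        root[self.key]["name"] = self.name
        return root[self.key]


class Database:
    """Toy engine: log the pickled transaction, then run it against root."""

    def __init__(self, root):
        self.root = root
        self.log = []  # assumption: stands in for an on-disk transaction log

    def execute(self, transaction):
        # Pickling first both tests picklability and produces the log entry.
        entry = pickle.dumps(transaction)
        self.log.append(entry)
        return transaction.execute(self.root)


db = Database({42: {"name": "Sir Galahad"}})
user = db.execute(Rename(42, "Brian"))
print(user["name"], len(db.log))
```

    Because execute() is called one transaction at a time, the sketch
    needs no locking; a real engine would also have to serialize callers.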

    At this point, PyPerSyst does not have commit/rollback capability. So
    it is up to the transaction class instance to not leave the database
    in an inconsistent state. I'm looking into supporting
    commit/rollback, but the simple solution there would double RAM
    requirements, and other solutions are tricky, to say the least. So
    I'm still looking for something simple and elegant to fit in with the
    rest of the framework.

    The transactions I've shown, tx.Create, tx.Update, tx.Delete, are
    simply generic classes that come with PyPerSyst to make it easy to
    create, update and delete single instances of entities. Most real
    applications would define their own Transaction classes in addition to
    these.
    I'm also left wondering what the read value of the "name" field is
    when accessed directly as "u1.name" after doing the "tx.Update()"
    and before doing the "db.execute()".
    t = tx.Update() merely creates a transaction instance, providing it
    with values that will be needed by its execute() method. (See the GOF
    Command pattern.) So nothing changes until the transaction is
    executed by the database, which happens when the transaction instance
    is passed to the database's execute method:

    db.execute(t)
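
    As a rough illustration of that point, here is a minimal Command
    pattern sketch. The User and Update classes below are hypothetical
    stand-ins rather than the PyPerSyst classes; the only claim is that
    constructing the command changes nothing and executing it does.

```python
class User:
    def __init__(self, name):
        self.name = name


class Update:
    """Hypothetical stand-in for a generic update command."""

    def __init__(self, instance, **attrs):
        # Only remember what to do; touch nothing yet.
        self.instance = instance
        self.attrs = attrs

    def execute(self, root):
        for key, value in self.attrs.items():
            setattr(self.instance, key, value)
        return self.instance


u1 = User("Sir Galahad")
t = Update(u1, name="Joe")
before = u1.name       # still the old value: the command has not run
t.execute(root=None)   # assumption: in a real system the database calls this
after = u1.name
print(before, "->", after)
```
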
    [By the way, picky, picky, and I fall down on it too, but you use
    different capitalizations for those two functions.]
    There aren't two functions: tx.Update is a class, db.execute is a
    method. The capitalization is correct. ;-)
    So is it that in PyPerSyst there appears to be one way to access
    information (directly through the object using Python object
    attribute access dot syntax) [not sure about database queries?] and
    another way to change objects -- using tx.XYZ()? This mixing of
    mindsets could be confusing (especially within an object that
    changes its own values internally).
    You could define transactions that do queries as well. And some
    people prefer to do that. But I think for most reads it is easier to
    traverse the db.root object.

    If you use entities, and an instance of the Root class for your
    db.root, then your db.root is a dictionary-like object that gets you
    to the extent for each Entity subclass in your schema. Each extent is
    an instance of the Extent class, which manages the set of all instances
    of a single Entity subclass. The Extent class is how I'm
    able to provide Relational-like features.
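
    A toy version of that layout might look like the sketch below. It
    is an assumption-laden illustration, not the PyPerSyst
    implementation: the insert and find methods, the unique argument,
    and the use of plain dicts for both the root and the records are
    invented for the example.

```python
class Extent:
    """Manages all instances of one entity class, keyed by oid."""

    def __init__(self, unique=()):
        self._instances = {}
        self._next_oid = 1
        self._unique = tuple(unique)  # attribute names acting as alternate keys
        self._indexes = {name: {} for name in self._unique}

    def insert(self, **attrs):
        # Check every alternate key before changing anything.
        for name in self._unique:
            if attrs[name] in self._indexes[name]:
                raise ValueError(f"duplicate {name}: {attrs[name]!r}")
        oid = self._next_oid
        self._next_oid += 1
        record = dict(attrs, oid=oid)
        self._instances[oid] = record
        for name in self._unique:
            self._indexes[name][attrs[name]] = record
        return record

    def find(self, **criteria):
        # Look up a single record through one alternate-key index.
        (name, value), = criteria.items()
        return self._indexes[name].get(value)

    def __len__(self):
        return len(self._instances)


# The root maps each entity class name to its extent.
root = {"User": Extent(unique=["name"])}
root["User"].insert(name="Sir Galahad")
root["User"].insert(name="Brian")
print(len(root["User"]), root["User"].find(name="Brian")["oid"])
```

    Keeping the uniqueness check inside the extent is what lets the
    database, rather than each application, enforce alternate keys.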

    Inside of Entity instances, your code looks just like regular Python
    code. It's just application code that must go through transactions.
    Sure this mixing of mindsets is different than what people are used
    to, but we're talking about managing valuable data. If you simplify
    things too much, you lose the integrity of your data.
    Using tx.Update also becomes an issue of how to convert existing
    code to persistent code. Mind you, the Pointrel System can't do
    this transparently either, but it doesn't try to do it at all. The
    Pointrel System requires both looking up a value and storing it to
    use a different syntax. Is it just a matter of aesthetics about
    whether it is better to have the whole approach be unfamiliar or
    whether it is better to have only half of it be unfamiliar? Or is
    there something more here, some violation of programmer
    expectations? [See below.]
    Existing code won't become magically persistent by adding PyPerSyst.
    And unique ids (immutable, btw) are assigned by PyPerSyst:
    user.oid
    42
    Being competitive here :-) I would love to know if you have a good
    approach for making them globally unique across all possible users
    of all PyPerSyst repositories for all time. The Pointrel has an
    approach to handle this (I don't say it will always work, or is
    efficient, but it tries). :-) Feel free to raid that code (BSDish
    license, see license.txt), but that issue may have other deeper
    implications for your system.
    Sorry, nothing special here. They are just incrementing ints unique
    within each extent. It would be easy to switch to a globally unique
    id if you have a good one, and as long as it was deterministic, and
    not random in any way.
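
    The reason for the "deterministic, not random" requirement is that
    transactions may be re-run from the log during recovery. A small
    sketch of the difference, using a hypothetical replay helper and the
    standard library's itertools.count and uuid.uuid4 as the two id
    sources:

```python
import itertools
import uuid


def replay(log, make_id):
    """Rebuild a name -> id mapping by re-running the same creations in order."""
    return {name: make_id() for name in log}


def counter_ids():
    """Return a fresh id source that hands out 1, 2, 3, ..."""
    counter = itertools.count(1)
    return lambda: next(counter)


log = ["Sir Galahad", "Brian", "Joe"]

# Sequential ids come out the same on every replay of the log.
first = replay(log, counter_ids())
second = replay(log, counter_ids())
print(first == second)

# Random ids are different each time, so a recovered database would
# disagree with the ids handed out before the restart.
random_first = replay(log, lambda: uuid.uuid4().hex)
random_second = replay(log, lambda: uuid.uuid4().hex)
print(random_first == random_second)
```
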
    And you can still access attributes directly, you just can't
    change them outside of a transaction:
    user.name
    'Sir Galahad'
    And the generic Update transaction is equally simple:
    t = tx.Update(user, name='Brian')
    db.execute(t)
    user.name
    'Brian'
    I know one rule of user interface design (not necessarily API of
    course) is that familiar elements should act familiar (i.e. a drop
    down list should not launch a dialog window on drop down) and that
    if you are going to experiment it should look very different so
    expectations are not violated.

    The issue here is in part that when you can reference "u1.name" and
    then "u1.name = 'Joe'" generates an exception (instead of
    automatically making an implicit transaction), some user expectation
    of API symmetry may be violated...
    While this is feasible, the problem I have with this is that I think
    implicit transactions on this minute level of granularity are evil.
    That's the main reason I haven't implemented this, even though others
    have done this for PyPerSyst. I think too many people would abuse the
    implicit transaction feature, resulting in inconsistent and unreliable
    objects. I'm targeting serious, multi-user applications. But
    PyPerSyst is completely modular, so you can use it to implement all
    kinds of persistence systems. Most of the capabilities I've been
    discussing are new, and completely optional.
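
    For readers who have not seen the behaviour being debated, one
    possible way to get "readable everywhere, writable only inside a
    transaction" is sketched below. This is purely an assumption about
    how such a guard could be built; the class-level _writable flag and
    both classes are hypothetical and say nothing about how PyPerSyst
    does it.

```python
class Entity:
    """Toy entity whose attributes can only be changed by a transaction."""

    _writable = False  # flipped on only while a transaction is executing

    def __setattr__(self, name, value):
        if not Entity._writable:
            raise AttributeError("attributes can only be changed by a transaction")
        object.__setattr__(self, name, value)


class Update:
    """Hypothetical transaction that is allowed to write to entities."""

    def __init__(self, instance, **attrs):
        self.instance, self.attrs = instance, attrs

    def execute(self):
        Entity._writable = True
        try:
            for key, value in self.attrs.items():
                setattr(self.instance, key, value)
        finally:
            # Always close the window again, even if an assignment fails.
            Entity._writable = False


u1 = Entity()
Update(u1, name="Sir Galahad").execute()
try:
    u1.name = "Joe"      # direct assignment outside a transaction
    blocked = False
except AttributeError:
    blocked = True
print(u1.name, blocked)
```
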
    Also, on another issue, it seems like the persistent classes need to
    derive from a special class and define their persistent features in
    a special way, i.e. class Realm(Entity): _attrSpec = [ 'name', ] etc.
    Again, this is going somewhat towards Python language integration
    yet not all the way.
    You don't *have* to use the Entity class that comes with PyPerSyst,
    but if you do, it lets you define the attributes, alternate keys, and
    fields for your subclass in as simple a form as I could think of.

    If you don't use the Entity class, then you have to figure out how to
    support instance integrity, alternate keys, referential integrity,
    bi-directional references, etc. So I think they provide some benefit.
    While I'd certainly agree your version is more concise than what I
    posted first (just an example of a system that does not attempt to use
    Python language features), later in the email (perhaps you'll get to it
    in your next reply) was the simpler:

    from persistanceSystem import *
    foo = MyClass()
    PersistanceSystem_Wrap(foo)
    # the following defaults to a transaction
    foo.x = 10
    # this makes a two change transaction
    PersistanceSystem_StartTransaction()
    foo.y = 20
    foo.z = 20
    foo.info = "I am a 3D Point"
    PersistanceSystem_EndTransaction()

    That approach does not violate any symmetry expectations by users --
    you can assign and retrieve values just like always.
    If users expect symmetry it is because they are used to writing single
    process programs that do not share objects. Does anyone expect this
    kind of symmetry and transparency when writing a multi-threaded
    application? Why not? Granted, having start/end transaction
    semantics might change some of the rules. But even if we had those in
    PyPerSyst, I would probably only use them inside of Transaction
    classes, not embedded in application code where they are harder to
    find and test. Explicit transaction objects have many benefits.

    It's sort of similar to the notion of separating your application
    logic from your gui code. Sure it's easier to just put a bunch of code
    in the event handler for a button. But is that the best way to code?
    In my mind, implicit transactions, or commit/rollback in application
    code, is like putting all your business logic in the event handlers
    for your gui widgets. I'm trying to keep people from writing crappy
    persistent applications.
    PyPerSyst can persist *any* picklable object graph.
    Are the graphs standalone, or can they reference other previously
    persisted Python objects (not derived from "Root" or "Entity")?
    A PyPerSyst database has a single entry point, named root, that can be
    any picklable Python object, and any objects reachable from that
    object. When the root gets pickled (for example when you do
    db.dump()), the whole thing gets pickled and all references are
    maintained. When the database starts, the entire thing gets
    unpickled. The entire thing is always in memory (real, or virtual).
    The snapshot and log are on disk. Each transaction is appended to the
    log. Did that answer your question?
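
    The snapshot-plus-log arrangement can be sketched in a few lines.
    The Store class, the file names, and the (key, attribute, value)
    tuples used as log entries are assumptions made for the example; per
    the description above, the real system logs pickled transaction
    instances instead of tuples.

```python
import os
import pickle
import tempfile


def apply(root, entry):
    """Apply one logged change; entries are (key, attribute, value) tuples."""
    key, attr, value = entry
    root.setdefault(key, {})[attr] = value


class Store:
    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.pickle")
        self.log_path = os.path.join(directory, "transactions.log")
        self.root = {}
        self._recover()

    def _recover(self):
        # Load the last snapshot, then re-apply everything logged after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, "rb") as f:
                self.root = pickle.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, "rb") as f:
                while True:
                    try:
                        apply(self.root, pickle.load(f))
                    except EOFError:
                        break

    def execute(self, entry):
        # Append to the log before touching the in-memory root.
        with open(self.log_path, "ab") as f:
            pickle.dump(entry, f)
        apply(self.root, entry)

    def dump(self):
        # Write a full snapshot, then start a fresh log.
        with open(self.snapshot_path, "wb") as f:
            pickle.dump(self.root, f)
        open(self.log_path, "wb").close()


with tempfile.TemporaryDirectory() as directory:
    db = Store(directory)
    db.execute((42, "name", "Sir Galahad"))
    db.dump()
    db.execute((42, "name", "Brian"))  # logged, but not in the snapshot
    restarted = Store(directory)       # simulates a restart without dump()
    recovered = restarted.root[42]["name"]
print(recovered)
```

    The restart picks up the change made after the last dump only
    because replaying the log is deterministic.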
    But it also comes with an Entity class and a Root class (that
    understands Entity classes) that provides additional functionality,
    such as alternate indexes, referential integrity, instance
    validation, etc.
    I guess I need to learn more about when these are better handled by
    the persistence system as opposed to the applications that use it.
    In my mind, anything that is generic shouldn't have to be reinvented
    in application code. I feel like I've spent most of my career
    reinventing one database application after another. ;-)
    Presumably a very transparent API for persistence is still needed
    for an ODBMS which is Python friendly? (Does ZODB do any of this?)
    I started writing a wrapper for ZODB and gave up about a year ago.
    If I need to write any extra code at all for an object to be
    persistent, or derive from a specialized class, I could just derive
    from a class that knows how to use SQL to store pickled fields.
    You don't think there is a benefit to not having to use a database,
    not having to map anything to relational tables, not being limited to
    the relational model, and not having to do joins, etc? I don't care
    how good an O-R mapper is, not having to use one at all is better.
    I think this is the core of the question of this part of the thread.
    You wrote "I've come to think otherwise". I'd be curious to hear
    more on any use cases or examples on why transparency is not so
    compatible with reliability etc.
    I just think implicit transparent transactions would lull users into a
    false sense of integrity and make them write sloppy applications that
    didn't actually maintain the integrity of their objects when used in a
    multi-user environment. I think the kind of applications I want to
    use PyPerSyst for demand that it be difficult for application
    programmers to do the wrong thing with regards to the integrity of the
    persisted data. I think having transactions as explicit objects
    provides more control over the integrity of the database. If users
    want transparency, it can be done, using PyPerSyst, it just isn't the
    focus of my current efforts. And I don't think explicit transactions
    are that much of a burden. Transaction code is a small percentage of
    application code, compared to all the interface code you have to
    write. And you could easily write wrappers for transactions that make
    them less burdensome.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Paul D. Fernhout at Aug 31, 2003 at 6:18 pm
    Patrick-

    I think based on this and your other posts I now understand better where
    you are coming from. Thanks for the explanations and comments.

    To try to restate (and better justify) what I now think I see as your
    point of view on this transactional API issue, let me present this analogy.

    When one builds a modern GUI application that supports complete
    multicommand "Undo" and "Redo" such as built on the Macintosh MacApp
    framework
    http://developer.apple.com/documentation/mac/MacAppProgGuide/MacAppProgGuide-44.html
    or any other similar approach, the strategy generally is to have a stack
    of Command (subclassed) objects, where each such object supports "do",
    "undo" and "redo". We use a general purpose system like this for example
    in our Garden Simulator software (and other Delphi applications --
    hopefully someday to be ported to Python).
    http://www.gardenwithinsight.com/progmanlong.htm
    Rather than mess with the application's data domain directly, every user
    action in such an undoable application, from selecting an object in a
    drawing program, to making a change with a slider, to dragging an
    object, to deleting an item, to even setting multiple options in a
    dialog (if each change isn't itself a command), creates a command (i.e.
    a transaction), which changes the domain and then continues to modify
    the domain while it is active (say at the top of the command stack while
    a mouse is dragged) and then completely finishes modifying the domain
    and is left on the stack when all the related GUI activity is done.
    While the command (transaction) itself may fiddle with the domain, no
    button press, or mouse click, or drop down selection ever messes
    directly with the data domain (or what might in another context be sort
    of like the business logic and business data). By constraining changes
    to this approach, one can readily do, undo, and redo a stack of commands
    to one's heart's content -- and subject to available memory :-) or other
    limits.
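
    A bare-bones version of such a command stack might look like the
    following. It is a generic sketch, not MacApp's design or the Garden
    Simulator code: the SetValue command, the CommandStack class, and
    the dict used as the data domain are all hypothetical.

```python
class SetValue:
    """Hypothetical command: set one key in a dict-based data domain."""

    def __init__(self, domain, key, value):
        self.domain, self.key, self.value = domain, key, value
        self.had_key = False
        self.old = None

    def do(self):
        # Remember the previous state so the change can be reversed.
        self.had_key = self.key in self.domain
        self.old = self.domain.get(self.key)
        self.domain[self.key] = self.value

    def undo(self):
        if self.had_key:
            self.domain[self.key] = self.old
        else:
            del self.domain[self.key]

    def redo(self):
        self.do()


class CommandStack:
    def __init__(self):
        self.done, self.undone = [], []

    def run(self, command):
        command.do()
        self.done.append(command)
        self.undone.clear()  # a new command invalidates the redo history

    def undo(self):
        command = self.done.pop()
        command.undo()
        self.undone.append(command)

    def redo(self):
        command = self.undone.pop()
        command.redo()
        self.done.append(command)


domain = {}
stack = CommandStack()
stack.run(SetValue(domain, "x", 10))
stack.run(SetValue(domain, "x", 20))
stack.undo()
after_undo = dict(domain)
stack.redo()
after_redo = dict(domain)
print(after_undo, after_redo)
```

    Nothing outside the command objects writes to the domain, which is
    what makes the history reversible.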

    Your transaction notion in PyPerSyst, now that I understand it better,
    seems to have something of this GUI command system flavor. And that
    emphasis is perhaps why you do not feel it is inconsistent to have one
    way to read values and another way to change values, since changing
    values is something in this model requiring significant forethought as
    an application level transaction. Implicitly, what you are getting at
    here is a development methodology where all data domain changes go
    through transactions (commands), and the transactions have been
    consciously considered and designed (rather than just resulting from
    randomly poking around in the data domain). And that is perhaps why you
    are so against the implicit transactions -- they violate this
    development methodology of being explicit about what chunks of changes
    are a transaction (as an atomic unit). The same sort of issues come up
    when people try to avoid Command type frameworks, thinking it is easier
    to just fire off changes directly to the data domain from GUI events
    (and it is easier -- just not undoable or consistent). Adhering to a
    transactional (command-al?) development methodology makes it very
    straightforward to understand how the application is structured and what
    it can or cannot do (i.e. just look in the transaction (or command) class
    hierarchy). And so, from your perspective, it is quite reasonable to
    have a lot of work go into crafting transaction objects (or subclassing
    them from related ones etc.) in the same way that it is expected that
    GUI applications with undo/redo capabilities will have a lot of effort
    put into their analogous "Command" class hierarchy.

    To step back a minute, in general, a transactional development
    methodology is in a way a step up from the random flounderings of how
    many programs work, with code that changes the data domain potentially
    sprinkled throughout the application code base, rather than cleanly
    specified in a set of Command or Transaction subclasses. So you are sort
    of proposing generally a step up in people's understanding and practice
    of how to deal with applications and persistent data.

    Does this sort of capture an essential part of what you are getting at
    here with your PyPerSyst application architecture development strategy?
    If so, I like it. ;-)

    All the best.

    --Paul Fernhout
    http://www.pointrel.org

    P.S. The Pointrel System supports abandoning in-process transactions by
    storing all the data it changes along the way, and being able to roll
    back to this state. But, with an object database as you have outlined
    it, I think this would naturally be a lot more complicated -- although
    perhaps you could adopt the "undo" and "redo" aspect of Commands
    (including stashing the old objects somewhere in case of a redo...)

    Patrick K. O'Brien wrote:
    [Lots of good stuff snipped, and thanks for the interesting dialogue. :-)]
    If users expect symmetry it is because they are used to writing single
    process programs that do not share objects. Does anyone expect this
    kind of symmetry and transparency when writing a multi-threaded
    application? Why not? Granted, having start/end transaction
    semantics might change some of the rules. But even if we had those in
    PyPerSyst, I would probably only use them inside of Transaction
    classes, not embedded in application code where they are harder to
    find and test. Explicit transaction objects have many benefits.

    It's sort of similar to the notion of separating your application
    logic from your gui code. Sure it's easier to just put a bunch of code
    in the event handler for a button. But is that the best way to code?
    In my mind, implicit transactions, or commit/rollback in application
    code, is like putting all your business logic in the event handlers
    for your gui widgets. I'm trying to keep people from writing crappy
    persistent applications.
    I think this is the core of the question of this part of the thread.
    You wrote "I've come to think otherwise". I'd be curious to hear
    more on any use cases or examples on why transparency is not so
    compatible with reliability etc.
    I just think implicit transparent transactions would lull users into a
    false sense of integrity and make them write sloppy applications that
    didn't actually maintain the integrity of their objects when used in a
    multi-user environment. I think the kind of applications I want to
    use PyPerSyst for demand that it be difficult for application
    programmers to do the wrong thing with regards to the integrity of the
    persisted data. I think having transactions as explicit objects
    provides more control over the integrity of the database. If users
    want transparency, it can be done using PyPerSyst; it just isn't the
    focus of my current efforts. And I don't think explicit transactions
    are that much of a burden. Transaction code is a small percentage of
    application code, compared to all the interface code you have to
    write. And you could easily write wrappers for transactions that make
    them less burdensome.
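
    As a rough illustration of that last point, here is a minimal sketch
    of such a wrapper, written for current Python. The Engine,
    Transaction, Rename, and transactional names are hypothetical
    stand-ins, not PyPerSyst's actual API; the only thing borrowed from
    the thread is the idea that a transaction is an object with an
    execute(root) method.

```python
"""Sketch: a thin wrapper that hides transaction construction from callers."""


class Transaction:
    """Hypothetical base class; subclasses implement execute(root)."""

    def execute(self, root):
        raise NotImplementedError


class Engine:
    """Hypothetical stand-in for a database engine that owns the root."""

    def __init__(self, root):
        self.root = root
        self.log = []  # executed transactions, in order

    def execute(self, transaction):
        result = transaction.execute(self.root)
        self.log.append(transaction)
        return result


class Rename(Transaction):
    """Explicit transaction: the state change lives in one testable place."""

    def __init__(self, key, name):
        self.key = key
        self.name = name

    def execute(self, root):
        root[self.key]["name"] = self.name
        return root[self.key]


def transactional(engine, transaction_class):
    """Return a plain function that builds and executes the transaction."""

    def call(*args, **kwargs):
        return engine.execute(transaction_class(*args, **kwargs))

    return call


engine = Engine({1: {"name": "old"}})
rename = transactional(engine, Rename)
record = rename(1, name="new")
print(record["name"], len(engine.log))
```

    Application code then calls rename(...) like any other function,
    while the change itself still goes through a single transaction
    class that can be found and tested on its own.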


  • Patrick K. O'Brien at Sep 1, 2003 at 4:08 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    Patrick-

    I think based on this and your other posts I now understand better
    where you are coming from. Thanks for the explanations and
    comments.

    To try to restate (and better justify) what I now think I see as
    your point of view on this transactional API issue, let me present
    this analogy.
    [Analogy snipped]

    Your analogy is absolutely correct. A PyPerSyst transaction
    Class/instance follows the Command pattern. In fact, they were
    initially called commands, and later renamed to transactions to
    emphasize the fact that the actions taking place inside the command
    needed to be atomic and leave the database in a consistent state (half
    of the ACID properties needed to be a reliable database). PyPerSyst
    itself provides the isolation (the engine provides this) and
    durability (storage provides this).
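
    To make that division of labour concrete, here is a small sketch in
    current Python. It is not PyPerSyst code: Deposit, COMMANDS, Engine,
    and replay are hypothetical names, the lock is only the simplest
    possible form of isolation, and an in-memory buffer stands in for
    real storage.

```python
"""Sketch: command-style transactions, a lock, and a replayable log."""

import io
import pickle
import threading


class Deposit:
    """A command that checks its preconditions before changing anything."""

    def __init__(self, account, amount):
        self.account = account
        self.amount = amount

    def execute(self, root):
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        root[self.account] = root.get(self.account, 0) + self.amount
        return root[self.account]


COMMANDS = {"Deposit": Deposit}  # name -> class, used when replaying the log


class Engine:
    def __init__(self, root, log_file):
        self.root = root
        self.log_file = log_file
        self.lock = threading.Lock()  # isolation: one transaction at a time

    def execute(self, transaction):
        with self.lock:
            result = transaction.execute(self.root)
            # Durability: log the command only after it has succeeded.
            entry = (type(transaction).__name__, vars(transaction))
            pickle.dump(entry, self.log_file)
            return result


def replay(log_file, root):
    """Rebuild state by re-executing every logged command in order."""
    log_file.seek(0)
    while True:
        try:
            name, attrs = pickle.load(log_file)
        except EOFError:
            return root
        COMMANDS[name](**attrs).execute(root)


log = io.BytesIO()  # stands in for a log file on disk
engine = Engine({}, log)
engine.execute(Deposit("alice", 10))
engine.execute(Deposit("alice", 5))
try:
    engine.execute(Deposit("alice", -1))  # rejected, so never logged
except ValueError:
    pass
recovered = replay(log, {})
print(recovered)
```

    Atomicity and consistency are the job of Deposit.execute itself;
    the engine and the log only add the other two properties around it.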
    To step back a minute, in general, a transactional development
    methodology is in a way a step up from the random flounderings of
    how many programs work, with code that changes the data domain
    potentially sprinkled throughout the application code base, rather
    than cleanly specified in a set of Command or Transaction
    subclasses. So you are sort of proposing generally a step up in
    people's understanding and practice of how to deal with applications
    and persistent data.
    Yes! And I love how you describe this - random flounderings.
    Does this sort of capture an essential part of what you are getting
    at here with your PyPerSyst application architecture development
    strategy? If so, I like it. ;-)
    Absolutely. And I'm glad you like it. Now if only I had a perfect
    model for guaranteeing the integrity and consistency of the state of
    Python class instances, we'd be set. Well, actually, I've got some
    ways to do that, I just don't completely like them.
    P.S. The Pointrel System supports abandoning in-process transactions
    by storing all the data it changes along the way, and being able to
    roll back to this state. But, with an object database as you have
    outlined it, I think this would naturally be a lot more complicated
    -- although perhaps you could adopt the "undo" and "redo" aspect of
    Commands (including stashing the old objects somewhere in case of a
    redo...)
    I'm not sure we'd be able to easily support undo and redo in a
    multi-user environment. For a single-user application, this should be
    easy. And I'll probably eventually build in mechanisms to support
    this. But multi-user adds a few complexities.

    Commit and rollback would be nice for complex transactions that change
    a lot of state where your ability to guarantee the success of those
    changes, or test the success as a pre-condition, is difficult. The
    easiest approach, which Prevayler is implementing, is to always have
    two copies of the database in memory - one to try out a transaction
    and see if it completes, the other to receive only transactions that
    successfully completed on the tester. But that approach doubles the
    memory requirements, and we already have high memory requirements
    since we keep the entire object graph in memory. But the beauty of
    this approach is that it is simple and foolproof.
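
    A minimal sketch of that two-copy idea, in current Python, might
    look like the following. It is not Prevayler's or PyPerSyst's
    code; TwoCopyEngine and Transfer are hypothetical names, and
    copy.deepcopy is simply the easiest way to get a second copy here.

```python
"""Sketch: try each transaction on a scratch copy before the real database."""

import copy


class Transfer:
    def __init__(self, source, target, amount):
        self.source = source
        self.target = target
        self.amount = amount

    def execute(self, root):
        # The credit happens before the check on purpose, so that a failure
        # leaves the database it ran against half-modified.
        root[self.target] += self.amount
        if root[self.source] < self.amount:
            raise ValueError("insufficient funds")
        root[self.source] -= self.amount


class TwoCopyEngine:
    def __init__(self, root):
        self.root = root                   # the authoritative copy
        self.tester = copy.deepcopy(root)  # the scratch copy

    def execute(self, transaction):
        try:
            transaction.execute(self.tester)
        except Exception:
            # The tester may be half-modified; rebuild it from the good copy.
            self.tester = copy.deepcopy(self.root)
            raise
        # Transactions are deterministic, so this succeeds as it did above.
        return transaction.execute(self.root)


engine = TwoCopyEngine({"a": 10, "b": 0})
engine.execute(Transfer("a", "b", 4))
try:
    engine.execute(Transfer("a", "b", 100))
except ValueError:
    pass
print(engine.root)
```

    The failed transfer only ever touches the scratch copy, which is
    then thrown away, at the price of holding everything twice.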

    Another approach would be for each transaction to keep mementos of
    objects that get changed, so they can be restored if an exception is
    raised at some point during the transaction. But I think that will be
    too complex and too much to expect from transaction writers, unless I
    can come up with support for that in PyPerSyst that makes it easier.
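
    For comparison, here is a sketch of what such support could look
    like, in current Python. MementoTransaction, remember, and run are
    hypothetical names, not anything that exists in PyPerSyst, and the
    deep copy of __dict__ only suits objects whose attributes are
    plain values rather than references to other entities.

```python
"""Sketch: restore changed objects from mementos if a transaction fails."""

import copy


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.history = []


class MementoTransaction:
    """Hypothetical base class; subclasses implement run(root)."""

    def __init__(self):
        self._mementos = []

    def remember(self, obj):
        """Save a copy of obj's state; call this before modifying obj."""
        self._mementos.append((obj, copy.deepcopy(obj.__dict__)))

    def execute(self, root):
        try:
            return self.run(root)
        except Exception:
            # Restore in reverse order so the earliest snapshot wins.
            for obj, state in reversed(self._mementos):
                obj.__dict__.clear()
                obj.__dict__.update(state)
            raise
        finally:
            self._mementos = []


class Withdraw(MementoTransaction):
    def __init__(self, name, amount):
        MementoTransaction.__init__(self)
        self.name = name
        self.amount = amount

    def run(self, root):
        account = root[self.name]
        self.remember(account)
        account.history.append(-self.amount)
        account.balance -= self.amount
        if account.balance < 0:
            raise ValueError("overdrawn")
        return account.balance


root = {"alice": Account(10)}
Withdraw("alice", 3).execute(root)
try:
    Withdraw("alice", 50).execute(root)
except ValueError:
    pass
print(root["alice"].balance, root["alice"].history)
```

    The transaction writer still has to remember to call remember()
    before every change, which is exactly the burden in question.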

    Anyway, just some more thoughts. Good talking to you.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Patrick K. O'Brien at Aug 29, 2003 at 5:38 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    In Smalltalk, persistent objects typically get stored and
    retrieved as proxies, which is made possible by overriding the basic
    storage and retrieval methods, which are all exposed. Maybe Python
    the language could do with more hooks for persistence, as a PEP? I
    know there are some lower-level hooks for access; I'm just wondering
    if they are enough for what you may want to do with PyPerSyst to make
    an elegant API for persistent objects (perhaps better unique ID
    support?), where you could then just go:

    from persistanceSystem import *
    foo = MyClass()
    PersistanceSystem_Wrap(foo)
    # the following defaults to a transaction
    foo.x = 10
    # this makes a two change transaction
    PersistanceSystem_StartTransaction()
    foo.y = 20
    foo.z = 20
    foo.info = "I am a 3D Point"
    PersistanceSystem_EndTransaction()
    # what happens to foo on garbage collection? It persists!
    ...
    # Other code in another program
    from persistanceSystem import *
    foo = PersistanceSystem_Query(x=10, y=20, z=20)
    print foo.info # prints --> "I am a 3D Point"

    That MyClass instance called foo and the related variable changes get
    stored in an ODBMS in transactions somewhere... Then I could do the
    same for the Pointrel System somehow using the same simple hooks.
    Adding hooks to Python itself has been discussed (look for the
    persistence SIG), and not gone anywhere, as far as I know. And I'm
    not sure it would be so good to add to the language. One reason is
    that it would either only be able to capture very simple transactions,
    or would require quite a framework to handle all the requirements for
    real use cases. This is one area where it would be hard to please
    everyone, and I think the Python language has to appeal to a broad set
    of uses.
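
    To illustrate the "very simple transactions" limitation, here is a
    sketch of what an attribute-assignment hook can and cannot see,
    using only the standard __setattr__ mechanism in current Python.
    Recorded and Point are hypothetical names for illustration; this
    is not a proposal for how PyPerSyst or a language-level hook would
    actually work.

```python
"""Sketch: what an attribute-assignment hook can and cannot capture."""


class Recorded:
    """Hypothetical wrapper: logs every attribute assignment as one change."""

    changes = []  # shared log of (attribute name, repr of assigned value)

    def __setattr__(self, name, value):
        Recorded.changes.append((name, repr(value)))
        object.__setattr__(self, name, value)


class Point(Recorded):
    pass


foo = Point()
foo.x = 10
foo.y = 20
foo.tags = []
foo.tags.append("3D")  # mutates in place; __setattr__ is never called
print(Recorded.changes)
```

    The hook sees three unrelated writes with nothing to say they
    belong together, and it misses the append entirely. Handling
    either case is where the larger framework would come in.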

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Patrick K. O'Brien at Aug 29, 2003 at 5:50 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    By the way, if you add support for the sorts of associative tuples
    that the Pointrel System is based on, efficiently managed, maybe
    I'll consider switching to using your system, if the API is simple
    enough. :-) Or, perhaps there is a way the Pointrel System can be
    extended to support what you might want to do (in the sense of
    transparent interaction with Python). In its use of the pickler, the
    Pointrel System does not keep a list of previously pickled objects,
    so it can't transparently pickle objects that refer to previously
    pickled objects in the repository, so that is one way that the
    Pointrel System can't do what your system does at all. (I'm not sure
    how to do that without, like PyPerSyst, keeping lots of previously
    pickled objects in memory at once for the Pickler to work
    with). Also, in the Pointrel System repositories are sort of on the
    fly made up of an arbitrary collection of archives where archives
    may be added and removed dynamically, so I don't quite begin to see
    how to handle object persistence across a repository if subobjects are
    stored in different archives which are dropped out of the
    repository.
    Oy! There we go with the API thing again. ;-)

    PyPerSyst can manage anything that can be pickled. So it should be
    able to support your associative tuples. But to get the most bang for
    your buck, you'd want to subclass the Entity class that I recently
    added to PyPerSyst. I can't think of a reason it wouldn't work, but
    we'd have to give it a try and see.

    The root of a PyPerSyst database can be any Python object graph, with
    any kind of object referencing that Python supports. But transactions
    must be deterministic and independent, so they cannot contain
    references. If you saw my examples of the generic transactions you'll
    see that I passed in references. How can that be? The secret is the
    dereferencing that takes place in those transaction classes:

    """Generic transactions."""

    __author__ = "Patrick K. O'Brien <pobrien at orbtech.com>"
    __cvsid__ = "$Id: transaction.py,v 1.8 2003/08/27 00:53:01 pobrien Exp $"
    __revision__ = "$Revision: 1.8 $"[11:-2]


    from pypersyst.entity.entity import Entity
    from pypersyst.transaction import Transaction


    class Create(Transaction):

        def __init__(self, classname, **attrs):
            Transaction.__init__(self)
            self.classname = classname
            self.attrs = attrs

        def __getstate__(self):
            self.refs = {}
            for name, value in self.attrs.items():
                if isinstance(value, Entity):
                    self.refs[name] = (value.__class__.__name__, value.oid)
                    self.attrs[name] = None
            return self.__dict__.copy()

        def execute(self, root):
            self.EntityClass = root._classes[self.classname]
            for name, (classname, oid) in self.refs.items():
                self.attrs[name] = root[classname][oid]
            return self.EntityClass(**self.attrs)


    class Delete(Transaction):

        def __init__(self, instance):
            Transaction.__init__(self)
            self.instance = instance

        def __getstate__(self):
            self.classname = self.instance.__class__.__name__
            self.oid = self.instance.oid
            d = self.__dict__.copy()
            del d['instance']
            return d

        def execute(self, root):
            return root[self.classname]._delete(self.oid)


    class Update(Transaction):

        def __init__(self, instance, **attrs):
            Transaction.__init__(self)
            self.instance = instance
            self.attrs = attrs

        def __getstate__(self):
            self.classname = self.instance.__class__.__name__
            self.oid = self.instance.oid
            self.refs = {}
            for name, value in self.attrs.items():
                if isinstance(value, Entity):
                    self.refs[name] = (value.__class__.__name__, value.oid)
                    self.attrs[name] = None
            d = self.__dict__.copy()
            del d['instance']
            return d

        def execute(self, root):
            self.instance = root[self.classname][self.oid]
            for name, (classname, oid) in self.refs.items():
                self.attrs[name] = root[classname][oid]
            return root[self.classname]._update(self.instance, **self.attrs)


    Try telling me that isn't one sweet API! ;-)
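
    For anyone who wants to watch the dereferencing trick in isolation,
    here is a stripped-down sketch that runs on its own in current
    Python. The Entity here is a simplified stand-in, not the real
    pypersyst class, and Person, Pet, and SetOwner are hypothetical.
    It uses copy.deepcopy in place of a pickle round trip, since the
    copy module goes through the same __getstate__ and __setstate__
    hooks.

```python
"""Sketch: swapping entity references for (classname, oid) pairs."""

import copy


class Entity:
    """Simplified stand-in for an entity: an oid plus arbitrary attributes."""

    def __init__(self, oid, **attrs):
        self.oid = oid
        self.__dict__.update(attrs)


class Person(Entity):
    pass


class Pet(Entity):
    pass


class SetOwner:
    """A transaction that is handed a live reference by the caller."""

    def __init__(self, pet_oid, owner):
        self.pet_oid = pet_oid
        self.owner = owner

    def __getstate__(self):
        # Store only the information needed to find the owner again.
        state = self.__dict__.copy()
        owner = state.pop("owner")
        state["owner_ref"] = (owner.__class__.__name__, owner.oid)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def execute(self, root):
        classname, oid = self.owner_ref
        owner = root[classname][oid]  # dereference inside this database
        root["Pet"][self.pet_oid].owner = owner
        return owner


alice = Person(1, name="Alice")
rex = Pet(7, owner=None)
root = {"Person": {1: alice}, "Pet": {7: rex}}

stored = copy.deepcopy(SetOwner(7, alice))  # what a log entry would hold
stored.execute(root)
print(stored.owner_ref, rex.owner is alice)
```

    The stored transaction carries no object reference at all, only
    the class name and oid, so it can be replayed against whatever
    root it is given.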

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Patrick K. O'Brien at Aug 29, 2003 at 6:02 pm

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

    My biggest issue with OO databases (including "a Smalltalk image"
    for that matter) in general is that the definition of objects
    changes over time, and on a practical basis, it might be needed to
    support multiple definitions of a class with the same name
    simultaneously if supporting a broad range of applications and
    somehow resolve version issues. The Pointrel System in itself
    doesn't solve that problem either, but it also doesn't have that
    problem built in at the core, since its main storage type is just an
    arbitrary binary string. I mainly added the Python object support
    just because "pickle" made it easy and fun to do the basics, and I
    thought that a limited level of transparent support might make it
    more appealing to Pythonistas and provide some extra easy
    expandability if people really wanted to easily store typed
    information as opposed to strings. (Although I think it could also
    bring headaches if people have PyPerSyst-level expectations for
    object storage and retrieval when I support something more like a
    Newton soup entry.)
    Schema evolution and schema migration are tough issues. In some ways
    things are simpler with PyPerSyst, since all objects reside in memory
    at all times. What that means is that there is no need to write
    utilities that "touch" all of your instances. When you start a
    PyPerSyst database, the entire thing is unpickled, which calls
    __setstate__ on all your objects. So migrating to a new schema is
    simply a matter of dumping the database, stopping the engine,
    replacing the schema, and restarting the engine.
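
    Here is a sketch of the kind of __setstate__ that makes this work,
    in current Python. Customer and its version field are hypothetical;
    the example only relies on standard pickle behaviour, namely that
    unpickling creates the instance without calling __init__ and then
    passes the stored state to __setstate__.

```python
"""Sketch: upgrading old instance state inside __setstate__."""


class Customer:
    schema_version = 2

    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.version = self.schema_version

    def __setstate__(self, state):
        # Version 1 stored a single "name" string; split it on load.
        if state.get("version", 1) < 2:
            first, _, last = state.pop("name").partition(" ")
            state["first"] = first
            state["last"] = last
            state["version"] = 2
        self.__dict__.update(state)


# pickle creates the instance without calling __init__ and then hands the
# stored state to __setstate__; the lines below do the same steps by hand.
old_state = {"name": "Ada Lovelace"}
customer = Customer.__new__(Customer)
customer.__setstate__(old_state)
print(customer.first, customer.last, customer.version)
```

    Because every object passes through __setstate__ when the database
    starts, the upgrade happens for all instances in one go.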

    Making sure that your schema does the right thing is another matter.
    I'm still working on that. Ideally you would want your class
    definitions to themselves be a persisted schema that could only be
    modified by transactions that would generate the appropriate
    __getstate__ and __setstate__ methods to properly handle the changes.
    So a distant goal is to have a PyPerSyst application that handles
    schema evolution and migration for other PyPerSyst databases. Doesn't
    that sound like fun?

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Patrick K. O'Brien at Aug 29, 2003 at 6:34 pm

    pobrien at orbtech.com (Patrick K. O'Brien) writes:

    "Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:
    By the way, I like your overview of various related ODBMS projects here:
    http://www.orbtech.com/wiki/PythonPersistence
    (maybe http://munkware.sourceforge.net/ might go there now?)
    Boy that's old material. I forgot about that page. Look at that!
    I also forgot to mention that it is a wiki page, and you have my
    blessing to add whatever material you like (not that you needed my
    blessing, if you know what I mean).

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Jeremy Jones at Aug 29, 2003 at 11:55 pm

    On Fri, 29 Aug 2003 12:11:51 -0400 "Paul D. Fernhout" wrote:


    By the way, I like your overview of various related ODBMS projects here:
    http://www.orbtech.com/wiki/PythonPersistence
    (maybe http://munkware.sourceforge.net/ might go there now?)

    I wouldn't be offended if Munkware found its way to the PythonPersistence page ;-)



    Jeremy Jones
  • Patrick K. O'Brien at Aug 30, 2003 at 12:32 am

    Jeremy Jones <zanesdad at bellsouth.net> writes:

    On Fri, 29 Aug 2003 12:11:51 -0400
    "Paul D. Fernhout" wrote:
    By the way, I like your overview of various related ODBMS projects
    here: http://www.orbtech.com/wiki/PythonPersistence (maybe
    http://munkware.sourceforge.net/ might go there now?)
    I wouldn't be offended if Munkware found its way to the
    PythonPersistence page ;-)
    Me either. It's a wiki, so please feel free to add anything you like.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
  • Niki Spahiev at Sep 1, 2003 at 6:40 pm
    8/29/2003, 19:11:51, Paul D. Fernhout wrote:

    [...]
    PDF> Well, to chime in here, in a "friendly" competition / cooperation sort
    PDF> of way, the Pointrel Data Repository System,
    PDF> http://sourceforge.net/projects/pointrel/
    PDF> while not quite an object database (and admittedly its case being
    PDF> easier) has a simple API in the bare minimum use case (it has more
    PDF> complex variants). Here is an example of its use (with fragments
    PDF> inspired in response to an earlier c.l.p poster's use case a few days ago):

    How does it compare with e4graph?

    --
    Best regards,
    Niki Spahiev
  • Paul D. Fernhout at Sep 1, 2003 at 8:36 pm
    Niki-

    Thanks for the pointer to e4graph. I found more info about it here:
    http://www.marshallbrain.com/robotic-freedom.htm

    There are several similarities -- and conceptually the ideas are very
    related (to the same extent the ER model is related to graphs).

    A few comments on part of their blurb on that page:

    "The e4Graph library allows you to model any kind of relationship
    between data that can be represented by a directed graph, including
    circular graphs of connections between data items. e4Graph is unique in
    that it allows these circular relationships to be represented directly
    rather than implicitly or through meta-data, as is necessitated by other
    approaches such as relational databases. A bi-directional link between
    two data items can be represented by two directed connections between
    those data items."

    The Pointrel Data Repository System allows circular relationships. All
    links in the Pointrel System are bi-directional (well, sort of quad
    directional in a way).
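
    As a generic illustration of links that can be followed from
    either end, here is a tiny sketch in current Python. It is not
    the Pointrel or e4Graph API; LinkStore and its methods are
    hypothetical names, and it keeps a reverse index next to the
    forward one so that a single link call serves both directions.

```python
"""Sketch: a link store that can be followed in both directions."""

from collections import defaultdict


class LinkStore:
    def __init__(self):
        self.forward = defaultdict(set)   # source -> {(label, target)}
        self.backward = defaultdict(set)  # target -> {(label, source)}

    def link(self, source, label, target):
        self.forward[source].add((label, target))
        self.backward[target].add((label, source))

    def targets(self, source, label):
        return sorted(t for lbl, t in self.forward[source] if lbl == label)

    def sources(self, target, label):
        return sorted(s for lbl, s in self.backward[target] if lbl == label)


store = LinkStore()
store.link("a", "next", "b")
store.link("b", "next", "c")
store.link("c", "next", "a")  # closes the cycle; no special handling needed
print(store.targets("a", "next"), store.sources("a", "next"))
```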

    Differences include:
    * The Pointrel System is in pure Python (e4graph is in C++ and uses a
    C++ database).
    * The Pointrel System supports the notion of "spaces" which could be seen
    as somewhat like "graphs" -- except query operations can span spaces (or
    even multiple archives for that matter).
    * The Pointrel System focuses more on complete history (e.g. tracking
    many versions of a linked node as it were in the e4graph metaphor).

    These comments are all off the top of my head after perusing the e4graph
    site for a few minutes; perhaps after looking more at e4graph I may
    realize some of these comments are incorrect or there may be other
    important differences or similarities.

    Thanks again for pointing e4graph out.

    In general, the Pointrel System bears a resemblance to any system that
    implements something related to Peter Chen's "Entity-Relationship" (ER)
    model.
    http://bit.csc.lsu.edu/~chen/display.html
    http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter2/node1.html
    The Pointrel System is not quite the same as pure ER in some
    implementation details (e.g. it embeds the notion of a space, it focuses
    on triads not arbitrary relationships, the newer version doesn't have
    relations as first-class objects, it doesn't separate attributes from
    relations, etc.), but otherwise has many of the same features.

    --Paul Fernhout
    http://www.pointrel.org


  • Paul D. Fernhout at Sep 1, 2003 at 8:44 pm

    Paul D. Fernhout wrote:
    Thanks for the pointer to e4graph. I found more info about it here:
    http://www.marshallbrain.com/robotic-freedom.htm
    Oops, sorry, I had the wrong URL in my cut&paste buffer and didn't do
    enough proofreading. The (incorrect, and unrelated) URL I posted
    came from this recent Slashdot article:
    http://slashdot.org/article.pl?sid=03/08/31/182228&mode=thread
    on the future of jobs given robotics.

    The correct URL for the e4graph introduction I quoted is:
    http://www.e4graph.com/e4graph/e4graphintro.html

    --Paul Fernhout
    http://www.pointrel.org




Discussion Overview
group: python-list
categories: python
posted: Aug 28, '03 at 9:37p
active: Sep 1, '03 at 8:44p
posts: 22
users: 6
website: python.org
