I am getting in the habit of storing much of my day-to-day
information in postgres, rather than "flat" files.
I have not had any problems of data corruption or loss,
but others have warned me against abandoning files.
I like the benefits of enforced data types, powerful searching,
data integrity, etc.
But I worry a bit about the "safety" of my data, residing
in a big scary database, instead of a simple friendly
folder-based file system.

I ran across this quote on Wikipedia at
http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29

"Text files are also much safer than databases, in that should disk
corruption occur, most of the mail is likely to be unaffected, and any
that is damaged can usually be recovered."

How naive (optimistic?) is it to think that "the database" can
replace "the filesystem"?

TJ O'Donnell
http://www.gnova.com/

  • Ron Johnson at Sep 6, 2007 at 5:53 pm

    On 09/06/07 10:43, TJ O'Donnell wrote:
    > I am getting in the habit of storing much of my day-to-day
    > information in postgres, rather than "flat" files.
    > I have not had any problems of data corruption or loss,
    > but others have warned me against abandoning files.
    > I like the benefits of enforced data types, powerful searching,
    > data integrity, etc.
    > But I worry a bit about the "safety" of my data, residing
    > in a big scary database, instead of a simple friendly
    > folder-based file system.
    >
    > I ran across this quote on Wikipedia at
    > http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
    >
    > "Text files are also much safer than databases, in that should disk
    > corruption occur, most of the mail is likely to be unaffected, and any
    > that is damaged can usually be recovered."
    >
    > How naive (optimistic?) is it to think that "the database" can
    > replace "the filesystem"?

    Text files are *simple*. When fsck repairs the disk and creates a
    bunch of recovery files, just fire up $EDITOR (or cat, for that
    matter) and piece your text files back together. You may lose a
    block of data, but the rest is there, easy to read.

    Database files are *complex*. Pointers and half-vacuumed freespace
    and binary fields and indexes and WALs, yadda yadda yadda. And, by
    design, it's all got to be internally consistent. Any little
    corruption and *poof*, you've lost a table. A strategically placed
    corruption and you've lost your database.

    But... that's why database vendors create backup/restore commands.

    You *do* back up your database(s), right??????
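
As a minimal sketch of the routine backup being urged here, the following Python wraps `pg_dump`. The database name, the output directory and both function names are assumptions made for illustration; only the `pg_dump` options `--format=custom` and `--file` are taken from PostgreSQL itself.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_pg_dump_command(dbname: str, out_dir: str) -> list[str]:
    """Return a pg_dump command line that writes a dated custom-format archive."""
    out_file = Path(out_dir) / f"{dbname}-{date.today().isoformat()}.dump"
    return ["pg_dump", "--format=custom", f"--file={out_file}", dbname]


def run_backup(dbname: str, out_dir: str) -> None:
    """Run pg_dump; raises CalledProcessError if the dump exits with an error."""
    subprocess.run(build_pg_dump_command(dbname, out_dir), check=True)
```

A custom-format archive is restored with `pg_restore`. Scheduling the call and copying the result off the machine are left out of the sketch.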

    --
    Ron Johnson, Jr.
    Jefferson LA USA

    Give a man a fish, and he eats for a day.
    Hit him with a fish, and he goes away for good!
  • Tom Lane at Sep 6, 2007 at 7:09 pm

    "TJ O'Donnell" <tjo@acm.org> writes:
    > I ran across this quote on Wikipedia at
    > http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
    > "Text files are also much safer than databases, in that should disk
    > corruption occur, most of the mail is likely to be unaffected, and any
    > that is damaged can usually be recovered."

    This is mostly FUD. You can get data out of a damaged database, too.
    (I'd also point out that modern filesystems are nearly as complicated
    as databases --- try getting your "simple" text files back if the
    filesystem metadata is fried.)

    In the end there is no substitute for a good backup policy...

    regards, tom lane
  • Kenneth Downs at Sep 6, 2007 at 7:17 pm

    Tom Lane wrote:
    > "TJ O'Donnell" <tjo@acm.org> writes:
    >> I ran across this quote on Wikipedia at
    >> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
    >> "Text files are also much safer than databases, in that should disk
    >> corruption occur, most of the mail is likely to be unaffected, and any
    >> that is damaged can usually be recovered."

    Should probably insert as well the standard disclaimer about Wikipedia.
    Great source of info, but that particular sentence has not been
    corrected yet by the
    forces-that-dictate-everything-ends-up-correct-sooner-or-later to point
    out the design trade-offs between simple systems like files (or paper
    for that matter) vs more complex but safer systems such as databases.

    And no, I won't write it.... :)
    > This is mostly FUD. You can get data out of a damaged database, too.
    > (I'd also point out that modern filesystems are nearly as complicated
    > as databases --- try getting your "simple" text files back if the
    > filesystem metadata is fried.)
    >
    > In the end there is no substitute for a good backup policy...
    >
    > regards, tom lane


    --
    Kenneth Downs
    Secure Data Software, Inc.
    www.secdat.com www.andromeda-project.org
    631-689-7200 Fax: 631-689-0527
    cell: 631-379-0010
  • Chris Browne at Sep 6, 2007 at 8:04 pm

    "TJ O'Donnell" writes:
    > I am getting in the habit of storing much of my day-to-day
    > information in postgres, rather than "flat" files.
    > I have not had any problems of data corruption or loss,
    > but others have warned me against abandoning files.
    > I like the benefits of enforced data types, powerful searching,
    > data integrity, etc.
    > But I worry a bit about the "safety" of my data, residing
    > in a big scary database, instead of a simple friendly
    > folder-based file system.
    >
    > I ran across this quote on Wikipedia at
    > http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
    >
    > "Text files are also much safer than databases, in that should disk
    > corruption occur, most of the mail is likely to be unaffected, and any
    > that is damaged can usually be recovered."
    >
    > How naive (optimistic?) is it to think that "the database" can
    > replace "the filesystem"?

    There is certainly some legitimacy to the claim; the demerits of
    things like the Windows Registry as compared to "plain text
    configuration" have been pretty clear.

    If the "monstrous fragile binary data structure" gets stomped on, by
    any means, then you can lose data in pretty massive and invisible
    ways. It's most pointedly true if the data representation conflates
    data and indexes in some attempt to "simplify" things by having Just
    One File. In such a case, if *any* block gets corrupted, that has the
    potential to irretrievably destroy the database.

    However, the argument may also be taken too far.

    -> A PostgreSQL database does NOT assemble data into "one monstrous
    fragile binary data structure."

    Each table consists of data files that are separate from index
    files. Blowing up an index file *doesn't* blow up the data.

    -> You are taking regular backups, right???

    If you are, that's a considerable mitigation of risks. I don't
    believe it's typical to set up off-site backups of one's Windows
    Registry, in contrast...

    -> In the case of PostgreSQL, mail stored in tuples is likely to get
    TOASTed, which changes the shape of things further; the files get
    smaller (due to compression), which changes the "target profile"
    for this data.

    -> In the contrary direction, storing the data as a set of files, each
    of which requires storing metadata in binary filesystem data
    structures provides an (invisible-to-the-user) interface to
    what is, no more or less, than a "monstrous fragile binary data
    structure."

    That is, after all, what a filesystem is, if you strip out the
    visible APIs that turn it into open()/close()/mkdir() calls.

    If the wrong directory block gets "crunched," then /etc could get
    munched just like the Windows Registry could.

    Much of the work going into filesystem efforts, the last dozen years,
    is *exceedingly* similar to the work going into managing storage in
    DBMSes. People working in both areas borrow from each other.

    The natural result is that they live in fairly transparent homes in
    relation to one another. Someone who "casts stones" of the sort in
    your quote is making the fallacious assumption that since the fact
    that a filesystem is a database of file information is kept fairly
    much invisible, that a filesystem is somehow fundamentally less
    vulnerable to the same kinds of corruptions.

    Reality is that they are vulnerable in similar ways.

    The one thing I could point to, in Eudora, as a *further* visible
    merit that DOES retain validity is that there is not terribly much
    metadata entrusted to the filesystem. Much the same is true for the
    Rand MH "Mail Handler", where each message is a file with very little
    filesystem-based metadata.

    If you should have a filesystem failure, and discover you have a
    zillion no-longer-named files in lost+found, and decline to recover from a
    backup, it should nonetheless be possible to re-process them through
    any mail filters, and rebuild a mail filesystem that will appear
    roughly similar to what it was like before.
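
The recovery described above can be sketched in Python: read the headers of each recovered file, apply a filter rule, and file the message into a folder. Everything specific here is an assumption for illustration: the `choose_folder` rule, the folder names, the MH-style numbering, and an initially empty destination tree.

```python
import shutil
from email.parser import BytesParser
from pathlib import Path


def choose_folder(msg) -> str:
    """Hypothetical filter rule: mailing-list traffic to 'lists', the rest to 'inbox'."""
    return "lists" if msg.get("List-Id") is not None else "inbox"


def rebuild_mail_tree(recovered_dir: str, mail_root: str) -> dict[str, int]:
    """Copy recovered message files into folders; return messages filed per folder."""
    counts: dict[str, int] = {}
    for path in sorted(Path(recovered_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # Only the headers are needed to decide where a message belongs.
            msg = BytesParser().parse(fh, headersonly=True)
        # Skip recovered fragments that carry no recognisable mail headers.
        if msg.get("From") is None and msg.get("Message-ID") is None:
            continue
        folder = choose_folder(msg)
        dest_dir = Path(mail_root) / folder
        dest_dir.mkdir(parents=True, exist_ok=True)
        counts[folder] = counts.get(folder, 0) + 1
        # MH-style layout: each message is a numbered file in its folder.
        shutil.copyfile(path, dest_dir / str(counts[folder]))
    return counts
```

This only works because each message file still carries its own headers, which is the redundancy the post goes on to discuss.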

    That actually implies that there is *more* "conservatism of format"
    than first meets the eye; in effect, the data is left in raw form,
    replete with redundancies that can, in order to retain the ability to
    perform this recovery process, *never* be taken out.

    There is, in effect, more than meets the eye here...
    --
    (format nil "~S@~S" "cbbrowne" "acm.org")
    http://linuxfinances.info/info/advocacy.html
    "Lumping configuration data, security data, kernel tuning parameters,
    etc. into one monstrous fragile binary data structure is really dumb."
    - David F. Skoll
  • Trevor Talbot at Sep 6, 2007 at 9:45 pm
    There's also a point in regard to how modifications are made to your
    data store. In general, things working with text files don't go to
    much effort to maintain durability like a real database would. The
    most direct way of editing a text file is to make all the changes in
    memory, then write the whole thing out. Some editors make backup
    files, or use a create-delete-rename cycle, but they won't necessarily
    force the data to disk -- if it's entirely in cache you could end up
    losing the contents of the file anyway.
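
For comparison, here is a minimal Python sketch of what "forcing the data to disk" adds to a write-then-rename cycle. The function name is an assumption, and the directory fsync is attempted only where the platform exposes `os.O_DIRECTORY`.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Replace `path` with `text` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the rename below
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents out of the OS cache
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Where supported, fsync the directory so the rename itself is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the two `fsync` calls, the rename can reach the disk before the new contents do, which is the failure mode described above.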

    In the general case on the systems I work with, corruption is a
    relatively low concern due to the automatic error detection and
    correction my disks perform, and the consistency guarantees of modern
    filesystems. Interruptions (e.g. crashes or power failures) are much
    more likely, and in that regard the typical modification process of
    text files is more of a risk than working with a database.

    I've also had times where faulty RAM corrupted gigabytes of data on
    disk due to cache churn alone.

    It will always depend on your situation. In both cases, you
    definitely want backups just for the guarantees neither approach can
    make.


    [way off topic]
    In regard to the Windows Registry in particular...
    > There is certainly some legitimacy to the claim; the demerits of
    > things like the Windows Registry as compared to "plain text
    > configuration" have been pretty clear.

    > -> You are taking regular backups, right???
    >
    > If you are, that's a considerable mitigation of risks. I don't
    > believe it's typical to set up off-site backups of one's Windows
    > Registry, in contrast...

    Sometimes I think most people get their defining impressions of the
    Windows Registry from experience with the Windows 9x line. I'll
    definitely agree that it was simply awful there, and there's much to
    complain about still, but...

    The Windows Registry in NT is an actual database, with a WAL,
    structured and split into several files, replication of some portions
    in certain network arrangements, redundant backup of key parts in a
    local system, and any external storage or off-site backup system for
    Windows worth its salt does, indeed, back it up.

    It's been that way for about a decade.
  • Chris Browne at Sep 7, 2007 at 2:04 am

    "Trevor Talbot" writes:
    > There's also a point in regard to how modifications are made to your
    > data store. In general, things working with text files don't go to
    > much effort to maintain durability like a real database would. The
    > most direct way of editing a text file is to make all the changes in
    > memory, then write the whole thing out. Some editors make backup
    > files, or use a create-delete-rename cycle, but they won't
    > necessarily force the data to disk -- if it's entirely in cache you
    > could end up losing the contents of the file anyway.

    In the case of Eudora, if its filesystem access protocol involves
    writing a new text file, and completing that before unlinking the old
    version, then the risk of "utter destruction" remains fairly low
    specifically because of the nature of access protocol.
    > In the general case on the systems I work with, corruption is a
    > relatively low concern due to the automatic error detection and
    > correction my disks perform, and the consistency guarantees of
    > modern filesystems. Interruptions (e.g. crashes or power failures)
    > are much more likely, and in that regard the typical modification
    > process of text files is more of a risk than working with a
    > database.

    Error rates are not so low that it's safe to be cavalier about this.
    > I've also had times where faulty RAM corrupted gigabytes of data on
    > disk due to cache churn alone.

    Yeah, and there is the factor that as disk capacities grow, the
    chances of there being errors grow (more bytes, more opportunities)
    and along with that, the number of opportunities for broken checksums
    to match by accident also grow. (Ergo "don't be cavalier" unless you
    can be pretty sure that your checksums are getting more careful...)
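
The scaling argument can be made concrete with a back-of-the-envelope model in Python. It assumes that a corrupted block matches its stored checksum with probability 2^-bits, that is, the checksum of a damaged block behaves as if uniformly random. The per-block corruption probability and the disk sizes below are placeholders, not measurements.

```python
def expected_corrupted_blocks(disk_bytes: float, block_bytes: int,
                              p_block: float) -> float:
    """Expected number of corrupted blocks, given a per-block corruption probability."""
    return disk_bytes / block_bytes * p_block


def expected_undetected(corrupted_blocks: float, checksum_bits: int) -> float:
    """Expected number of corrupted blocks whose checksum still matches by accident."""
    return corrupted_blocks * 2.0 ** -checksum_bits


if __name__ == "__main__":
    P_BLOCK = 1e-9  # placeholder per-block corruption probability
    for disk_bytes in (1e11, 1e12, 1e13):  # placeholder disk sizes in bytes
        bad = expected_corrupted_blocks(disk_bytes, 4096, P_BLOCK)
        for bits in (16, 32):
            print(disk_bytes, bits, expected_undetected(bad, bits))
```

Under this model the expected number of accidental matches grows linearly with capacity and halves with every extra checksum bit, so wider checksums are what keeps it flat as disks grow.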
    > It will always depend on your situation. In both cases, you
    > definitely want backups just for the guarantees neither approach can
    > make.

    Certainly.
    > [way off topic]
    > In regard to the Windows Registry in particular...
    >> There is certainly some legitimacy to the claim; the demerits of
    >> things like the Windows Registry as compared to "plain text
    >> configuration" have been pretty clear.
    >> -> You are taking regular backups, right???
    >>
    >> If you are, that's a considerable mitigation of risks. I don't
    >> believe it's typical to set up off-site backups of one's Windows
    >> Registry, in contrast...
    > Sometimes I think most people get their defining impressions of the
    > Windows Registry from experience with the Windows 9x line. I'll
    > definitely agree that it was simply awful there, and there's much to
    > complain about still, but...
    >
    > The Windows Registry in NT is an actual database, with a WAL,
    > structured and split into several files, replication of some portions
    > in certain network arrangements, redundant backup of key parts in a
    > local system, and any external storage or off-site backup system for
    > Windows worth its salt does, indeed, back it up.
    >
    > It's been that way for about a decade.

    I guess I deserve that :-).

    There is a further risk, that is not directly mitigated by backups,
    namely that if you don't have some lowest common denominator that's
    easy to recover from, you may not have a place to recover that data.

    In the old days, Unix filesystems were sufficiently buggy and corruptible
    that it was worthwhile to have an /sbin partition, all statically
    linked, generally read-only, and therefore seldom corrupted, to have
    as a base for recovering the rest of the system.

    Using files in /etc, for config, and /sbin for enough tools to recover
    with, provided a basis for recovery.

    In contrast, there is definitely risk to stowing all config in a DBMS
    such that you may have the recursive problem that you can't get the
    parts of the system up to help you recover it without having the DBMS
    running, but since it's corrupted, you don't have the config needed to
    get the system started, and so we recurse...
    --
    let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
    http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
    As of next Monday, TRIX will be flushed in favor of VISI-CALC.
    Please update your programs.
  • Ron Johnson at Sep 7, 2007 at 5:10 am

    On 09/06/07 20:45, Chris Browne wrote:
    > quension@gmail.com ("Trevor Talbot") writes:
    >> There's also a point in regard to how modifications are made to your
    >> data store. In general, things working with text files don't go to
    >> much effort to maintain durability like a real database would. The
    >> most direct way of editing a text file is to make all the changes in
    >> memory, then write the whole thing out. Some editors make backup
    >> files, or use a create-delete-rename cycle, but they won't
    >> necessarily force the data to disk -- if it's entirely in cache you
    >> could end up losing the contents of the file anyway.
    > In the case of Eudora, if its filesystem access protocol involves
    > writing a new text file, and completing that before unlinking the old
    > version, then the risk of "utter destruction" remains fairly low
    > specifically because of the nature of access protocol.

    mbox is a monolithic file also, and you need to copy/delete,
    copy/delete, yadda yadda yadda. Just to do anything, you need 2x as
    much free disk space as your biggest mbox file. What a PITA.

    mh and Maildir are, as has been partially mentioned, much more
    efficient in that regard.
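
The difference can be seen with Python's standard `mailbox` module. This is a small sketch: the file names, the sender address and the `demo` helper are illustrative assumptions. In CPython, flushing a deletion from an mbox rewrites the mailbox into a temporary file and renames it, while a Maildir deletion removes a single file.

```python
import mailbox
import os
from email.message import EmailMessage


def make_message(n: int) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "sender@example.org"  # placeholder address
    msg["Subject"] = f"message {n}"
    msg.set_content(f"body of message {n}\n")
    return msg


def demo(workdir: str) -> tuple[int, int]:
    """Delete one of three messages from each store; return (mbox, Maildir) counts."""
    box = mailbox.mbox(os.path.join(workdir, "inbox.mbox"))
    mdir = mailbox.Maildir(os.path.join(workdir, "Maildir"), create=True)
    box_keys = [box.add(make_message(n)) for n in range(3)]
    mdir_keys = [mdir.add(make_message(n)) for n in range(3)]
    box.flush()

    # mbox: the deletion takes effect when the whole file is rewritten on flush.
    box.remove(box_keys[1])
    box.flush()

    # Maildir: the deletion unlinks exactly one message file.
    mdir.remove(mdir_keys[1])

    counts = (len(box), len(mdir))
    box.close()
    mdir.close()
    return counts
```

The rewrite-and-rename step is where the need for free space roughly equal to the mbox size comes from.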

    (Yes... mbox is an excellent transport format.)

    --
    Ron Johnson, Jr.
    Jefferson LA USA

    Give a man a fish, and he eats for a day.
    Hit him with a fish, and he goes away for good!

Discussion Overview
group: pgsql-general @
categories: postgresql
posted: Sep 6, '07 at 5:08p
active: Sep 7, '07 at 5:10a
posts: 8
users: 6
website: postgresql.org
irc: #postgresql
