FAQ
I've been on a crusade against my developers lately after having had
enough of fields being added to already-insane primary keys.

Today I was given a script to add a field to make what is now a
15-field primary key, all natural data. The first 9 fields of the key
also provide the foreign key into the parent table (although a foreign
key constraint is, of course, not used), and it carries on down the
line, growing worse and worse. They've all heard me calling for
surrogate keys, but they say they need uniqueness among this set of
fields. Then when they discover duplicates, they just add another
field.

I even suggested having NO primary key, just a non-unique key on the
first 4 or 5 fields. But again they say they need to guarantee
uniqueness. These tables are bulk-loaded and can contain over 150
million records. There is no query that even comes close to utilizing
these fields, it is purely a unique constraint. However, since a
unique constraint also creates an index, I didn't see any advantage
there.

I'm looking for the words to basically doom this practice once and for
all. I've already told them about the degradation of normalization,
the storage needed for these unused indexes, etc. They claim there's
nothing they can do for now. Migrating to surrogate keys would be
non-trivial and isn't a priority, it seems.

Search Discussions

  • JApplewhite_at_austinisd.org at Jan 27, 2007 at 3:01 pm
    I actually prefer natural keys to surrogates. Our developers tend to slap
    surrogate, sequence-generated, keys on tables that still allow duplicate
    natural data. I'm surprised that your developers want to code the joins
    based on sevaral columns, instead of a single column. Usually they're
    lazier than that.;-)

    To enforce uniqueness, we're all stuck with having space taken up by
    indexes, but that's OK with me. The important thing is the quality of the
    data.

    You should consider partitioning these tables. That and all local indexes
    should help all round.

    IMHO you're off base in arguing against normalization. It's a GOOD thing
    - a VERY GOOD thing, actually. Again, quality of the data and flexibility
    of the data model are the most important things.

    They really should have FK constraints where called for. Plus indexes on
    the FK columns.

    Jack C. Applewhite - Database Administrator
    Austin (Texas) Independent School District
    512.414.9715 (wk) / 512.935.5929 (pager)

    Same-Day Stump Grinding! Senior Discounts!

    Mike's Tree Service

    "Don Seiler"
    Sent by: oracle-l-bounce_at_freelists.org
    01/26/2007 10:50 PM
    Please respond to
    don_at_seiler.us

    To
    oracle-l
    cc

    Subject
    The Case Against Compound/Natural Keys

    I've been on a crusade against my developers lately after having had
    enough of fields being added to already-insane primary keys.

    Today I was given a script to add a field to make what is now a
    15-field primary key, all natural data. The first 9 fields of the key
    also provide the foreign key into the parent table (although a foreign
    key constraint is, of course, not used), and it carries on down the
    line, growing worse and worse. They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.

    I even suggested having NO primary key, just a non-unique key on the
    first 4 or 5 fields. But again they say they need to guarantee
    uniqueness. These tables are bulk-loaded and can contain over 150
    million records. There is no query that even comes close to utilizing
    these fields, it is purely a unique constraint. However, since a
    unique constraint also creates an index, I didn't see any advantage
    there.

    I'm looking for the words to basically doom this practice once and for
    all. I've already told them about the degradation of normalization,
    the storage needed for these unused indexes, etc. They claim there's
    nothing they can do for now. Migrating to surrogate keys would be
    non-trivial and isn't a priority, it seems.
  • Niall Litchfield at Jan 27, 2007 at 3:23 pm
    Well first up natural vs synthetic is a rather religious argument. My
    observations are the, perhaps apparently contradictory.

    Really well designed applications tend to have 'natural' keys.
    Badly designed applications also tend to have natural keys.

    It sounds rather like you might be falling into the latter category.

    In a well designed application careful thought is given to the entities
    involved, what properties they have that you are interested in and how they
    relate to each other, either logically or in terms of the rules of the model
    (aka business rules). When you do this the 'natural' key will tend to fall
    out of the analysis.

    In other applications tables tend to be thrown in one after another to meet
    changing application requirements, to have columns added and deleted almost
    at whim and to have a changing definition of what it is that the developer
    is interested in from the table. Sometimes the table ends up fulfilling an
    entirely different purpose altogether than it's original one.

    I'd not be arguing for or against the choice of natural vs surrogate but for
    a rational process of design. Adding a field to a primary key because the
    data arriving as the primary key is non-unique seems daft to me, surely you
    throw out the duplicates at the load stage. I can however almost guarantee
    that if you get a surrogate key, they'll still want a second unique index
    and will want to add fields to it in the event of non-uniqueness.
    On 1/27/07, Don Seiler wrote:

    I've been on a crusade against my developers lately after having had
    enough of fields being added to already-insane primary keys.

    Today I was given a script to add a field to make what is now a
    15-field primary key, all natural data. The first 9 fields of the key
    also provide the foreign key into the parent table (although a foreign
    key constraint is, of course, not used), and it carries on down the
    line, growing worse and worse. They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.

    I even suggested having NO primary key, just a non-unique key on the
    first 4 or 5 fields. But again they say they need to guarantee
    uniqueness. These tables are bulk-loaded and can contain over 150
    million records. There is no query that even comes close to utilizing
    these fields, it is purely a unique constraint. However, since a
    unique constraint also creates an index, I didn't see any advantage
    there.

    I'm looking for the words to basically doom this practice once and for
    all. I've already told them about the degradation of normalization,
    the storage needed for these unused indexes, etc. They claim there's
    nothing they can do for now. Migrating to surrogate keys would be
    non-trivial and isn't a priority, it seems.
    --
    http://www.freelists.org/webpage/oracle-l

    --
    Niall Litchfield
    Oracle DBA
    http://www.orawin.info

    --
    http://www.freelists.org/webpage/oracle-l
  • Johan Muller at Jan 27, 2007 at 8:38 pm
    Niall nailed it right off the bat - you've a really bad case of a
    de-normalized app by the sound of it.

    1st (by now famous rule) of normalization - "...the (table) data has to be
    dependant on the key, the whole key and nothing but the whole key - so help
    me Codd".

    It's probably too late, but a competent data-modeler to drive out a
    normalized data model instead of what the duh-velopers assume to be the
    requirements could be the voice of sanity you need in the environment.

    When the app gets re-designed (inevitable when it collapses under it's own
    weight) this could the out you need at that time.
    On 1/27/07, Niall Litchfield wrote:

    Well first up natural vs synthetic is a rather religious argument. My
    observations are the, perhaps apparently contradictory.

    1) Really well designed applications tend to have 'natural' keys.
    2) Badly designed applications also tend to have natural keys.

    It sounds rather like you might be falling into the latter category.

    In a well designed application careful thought is given to the entities
    involved, what properties they have that you are interested in and how they
    relate to each other, either logically or in terms of the rules of the model
    (aka business rules). When you do this the 'natural' key will tend to fall
    out of the analysis.

    In other applications tables tend to be thrown in one after another to
    meet changing application requirements, to have columns added and deleted
    almost at whim and to have a changing definition of what it is that the
    developer is interested in from the table. Sometimes the table ends up
    fulfilling an entirely different purpose altogether than it's original one.

    I'd not be arguing for or against the choice of natural vs surrogate but
    for a rational process of design. Adding a field to a primary key because
    the data arriving as the primary key is non-unique seems daft to me, surely
    you throw out the duplicates at the load stage. I can however almost
    guarantee that if you get a surrogate key, they'll still want a second
    unique index and will want to add fields to it in the event of
    non-uniqueness.
    On 1/27/07, Don Seiler wrote:

    I've been on a crusade against my developers lately after having had
    enough of fields being added to already-insane primary keys.

    Today I was given a script to add a field to make what is now a
    15-field primary key, all natural data. The first 9 fields of the key
    also provide the foreign key into the parent table (although a foreign
    key constraint is, of course, not used), and it carries on down the
    line, growing worse and worse. They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.

    I even suggested having NO primary key, just a non-unique key on the
    first 4 or 5 fields. But again they say they need to guarantee
    uniqueness. These tables are bulk-loaded and can contain over 150
    million records. There is no query that even comes close to utilizing
    these fields, it is purely a unique constraint. However, since a
    unique constraint also creates an index, I didn't see any advantage
    there.

    I'm looking for the words to basically doom this practice once and for
    all. I've already told them about the degradation of normalization,
    the storage needed for these unused indexes, etc. They claim there's
    nothing they can do for now. Migrating to surrogate keys would be
    non-trivial and isn't a priority, it seems.
    --
    http://www.freelists.org/webpage/oracle-l


    --
    Niall Litchfield
    Oracle DBA
    http://www.orawin.info
    --
    Johan Muller
    Oracle DBA
    (214) 676 2147 anytime.

    "I love deadlines. I like the whooshing sound they make as they fly by."
    Douglas Adams

    --
    http://www.freelists.org/webpage/oracle-l
  • JApplewhite_at_austinisd.org at Jan 28, 2007 at 3:03 am
    I actually prefer natural keys to surrogates. Our developers tend to slap
    surrogate, sequence-generated, keys on tables that still allow duplicate
    natural data. I'm surprised that your developers want to code the joins
    based on several columns, instead of a single column. Usually they're
    lazier than that.;-)

    To enforce uniqueness, we're all stuck with having space taken up by
    indexes, but that's OK with me. The important thing is the quality of the
    data.

    You should consider partitioning these tables. That and all local indexes
    should help all round.

    IMHO you're off base in arguing against normalization. It's a GOOD thing
    - a VERY GOOD thing, actually. Again, quality of the data and flexibility
    of the data model are the most important things.

    They really should have FK constraints where called for. Plus indexes on
    the FK columns.

    Jack C. Applewhite - Database Administrator
    Austin (Texas) Independent School District
    512.414.9715 (wk) / 512.935.5929 (pager)

    Same-Day Stump Grinding! Senior Discounts!

    Mike's Tree Service

    "Don Seiler"
    Sent by: oracle-l-bounce_at_freelists.org
    01/26/2007 10:50 PM
    Please respond to
    don_at_seiler.us

    To
    oracle-l
    cc

    Subject
    The Case Against Compound/Natural Keys

    I've been on a crusade against my developers lately after having had
    enough of fields being added to already-insane primary keys.

    Today I was given a script to add a field to make what is now a
    15-field primary key, all natural data. The first 9 fields of the key
    also provide the foreign key into the parent table (although a foreign
    key constraint is, of course, not used), and it carries on down the
    line, growing worse and worse. They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.

    I even suggested having NO primary key, just a non-unique key on the
    first 4 or 5 fields. But again they say they need to guarantee
    uniqueness. These tables are bulk-loaded and can contain over 150
    million records. There is no query that even comes close to utilizing
    these fields, it is purely a unique constraint. However, since a
    unique constraint also creates an index, I didn't see any advantage
    there.

    I'm looking for the words to basically doom this practice once and for
    all. I've already told them about the degradation of normalization,
    the storage needed for these unused indexes, etc. They claim there's
    nothing they can do for now. Migrating to surrogate keys would be
    non-trivial and isn't a priority, it seems.
  • Jaromir nemec at Jan 28, 2007 at 10:47 pm
    Don,
    These tables are bulk-loaded and .
    I assume your database is a kind of DW system.
    They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.
    I thing you address two different problems here:
    a) how to enforce the uniqueness of a fact table
    b) how to define the primary key (natural / surrogate) on the dimension
    table

    Uniqueness of a fact table can be enforced using index, alternatively you
    may define a cleaning step in the loading process (eliminating the dups
    before the load) and not to rely on an index. A similar pre-processing step
    can enforce the consistency of the FK relation to the parent table.

    For a dimensional table (your "parent table") there are two options in my
    opinion
    a) use natural key as a primary key of the dimension and a foreign key of
    the fact table - it is your implementation
    b) use surrogate key for PK of the dimension and FK of the fact table and
    additionally denormalize the dimension natural key into the fact table.
    There is a nice example on Jonathan Lewis blog demonstrating the
    consequences of using "pure" surrogates.
    When to use surrogate keys? It depends on the "nature" of the natural keys.
    A little example: I wouldn't for sure set up a DW with natural key (only)
    for Oracle product names. Querying webDB, htmlDB, RAC,. over years of
    history would be a nightmare.
    A real value added surrogate key processing must implement some logic
    deciding when to assign a new key (for a new dimension instance) or to reuse
    existing one (for a new version of changed dimension instance).

    HTH

    Jaromir D.B. Nemec
    ----- Original Message -----
    From: "Don Seiler"
    To: "oracle-l"
    Sent: Saturday, January 27, 2007 5:48 AM
    Subject: The Case Against Compound/Natural Keys

    --
    http://www.freelists.org/webpage/oracle-l
  • Don Seiler at Jan 29, 2007 at 3:23 pm
    My database is actually more of an all-purpose hybrid. We have OLTP
    data where data is entered by sales and updated by customer service,
    etc. We then bulk-load call records and processed billing information
    (we are a telecom) that the customer service app uses when customers
    call about their bills or question a call. So we don't have fact vs
    dimension tables as you might find in an ideal DW instance.

    To be precise, I don't hate natural keys for the sake of hating
    natural keys. It's the composite keys that I hate, and especially
    when there are no queries that such a large index would address.

    These tables are already partitioned with local indexes. We are
    running the "rolling window" scenario, keeping the most recent 4
    months.

    Jack: are you suggesting that I put a foreign key constraint/index on
    the leading X number of fields already in my primary key
    constraint/index? Because that is what it would be, and is yet
    another exhibit of my frustration with this design (or lack thereof).

    Don.
    On 1/28/07, jaromir nemec wrote:
    Don,
    These tables are bulk-loaded and .
    I assume your database is a kind of DW system.
    They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.
    I thing you address two different problems here:
    a) how to enforce the uniqueness of a fact table
    b) how to define the primary key (natural / surrogate) on the dimension
    table

    Uniqueness of a fact table can be enforced using index, alternatively you
    may define a cleaning step in the loading process (eliminating the dups
    before the load) and not to rely on an index. A similar pre-processing step
    can enforce the consistency of the FK relation to the parent table.

    For a dimensional table (your "parent table") there are two options in my
    opinion
    a) use natural key as a primary key of the dimension and a foreign key of
    the fact table - it is your implementation
    b) use surrogate key for PK of the dimension and FK of the fact table and
    additionally denormalize the dimension natural key into the fact table.
    There is a nice example on Jonathan Lewis blog demonstrating the
    consequences of using "pure" surrogates.
    When to use surrogate keys? It depends on the "nature" of the natural keys.
    A little example: I wouldn't for sure set up a DW with natural key (only)
    for Oracle product names. Querying webDB, htmlDB, RAC,. over years of
    history would be a nightmare.
    A real value added surrogate key processing must implement some logic
    deciding when to assign a new key (for a new dimension instance) or to reuse
    existing one (for a new version of changed dimension instance).

    HTH
    Jaromir D.B. Nemec
    ----- Original Message -----
    From: "Don Seiler"
    To: "oracle-l"
    Sent: Saturday, January 27, 2007 5:48 AM
    Subject: The Case Against Compound/Natural Keys


    --
    http://www.freelists.org/webpage/oracle-l
  • Bobak, Mark at Jan 29, 2007 at 3:34 pm
    I guess I'm a little late to this party.

    FYI, Steve Adams has a nice write up on his website of synthetic vs.
    natural keys:
    http://www.ixora.com.au/tips/design/synthetic_keys.htm

    -Mark

    --
    Mark J. Bobak
    Senior Oracle Architect
    ProQuest Information & Learning

    There is nothing so useless as doing efficiently that which shouldn't be
    done at all. -Peter F. Drucker, 1909-2005

    -----Original Message-----
    From: oracle-l-bounce_at_freelists.org
    On Behalf Of Don Seiler
    Sent: Monday, January 29, 2007 10:23 AM
    To: jaromir nemec
    Cc: oracle-l
    Subject: Re: The Case Against Compound/Natural Keys

    My database is actually more of an all-purpose hybrid. We have OLTP
    data where data is entered by sales and updated by customer service,
    etc. We then bulk-load call records and processed billing information
    (we are a telecom) that the customer service app uses when customers
    call about their bills or question a call. So we don't have fact vs
    dimension tables as you might find in an ideal DW instance.

    To be precise, I don't hate natural keys for the sake of hating natural
    keys. It's the composite keys that I hate, and especially when there
    are no queries that such a large index would address.

    These tables are already partitioned with local indexes. We are running
    the "rolling window" scenario, keeping the most recent 4 months.

    Jack: are you suggesting that I put a foreign key constraint/index on
    the leading X number of fields already in my primary key
    constraint/index? Because that is what it would be, and is yet another
    exhibit of my frustration with this design (or lack thereof).

    Don.
    On 1/28/07, jaromir nemec wrote:
    Don,
    These tables are bulk-loaded and .
    I assume your database is a kind of DW system.
    They've all heard me calling for
    surrogate keys, but they say they need uniqueness among this set of
    fields. Then when they discover duplicates, they just add another
    field.
    I thing you address two different problems here:
    a) how to enforce the uniqueness of a fact table
    b) how to define the primary key (natural / surrogate) on the
    dimension table

    Uniqueness of a fact table can be enforced using index, alternatively
    you may define a cleaning step in the loading process (eliminating the
    dups before the load) and not to rely on an index. A similar
    pre-processing step can enforce the consistency of the FK relation to
    the parent table.
    For a dimensional table (your "parent table") there are two options in
    my opinion
    a) use natural key as a primary key of the dimension and a foreign key
    of the fact table - it is your implementation
    b) use surrogate key for PK of the dimension and FK of the fact table
    and additionally denormalize the dimension natural key into the fact table.
    There is a nice example on Jonathan Lewis blog demonstrating the
    consequences of using "pure" surrogates.
    When to use surrogate keys? It depends on the "nature" of the natural keys.
    A little example: I wouldn't for sure set up a DW with natural key
    (only) for Oracle product names. Querying webDB, htmlDB, RAC,. over
    years of history would be a nightmare.
    A real value added surrogate key processing must implement some logic
    deciding when to assign a new key (for a new dimension instance) or to
    reuse existing one (for a new version of changed dimension instance).

    HTH

    Jaromir D.B. Nemec
    ----- Original Message -----
    From: "Don Seiler"
    To: "oracle-l"
    Sent: Saturday, January 27, 2007 5:48 AM
    Subject: The Case Against Compound/Natural Keys


    --
    http://www.freelists.org/webpage/oracle-l

    --
    http://www.freelists.org/webpage/oracle-l
  • Jared Still at Jan 29, 2007 at 10:34 pm

    On 1/29/07, Bobak, Mark wrote:
    I guess I'm a little late to this party.

    FYI, Steve Adams has a nice write up on his website of synthetic vs.
    natural keys:
    http://www.ixora.com.au/tips/design/synthetic_keys.htm
    Steve is effectively saying "Consider your data, your application
    and your use of the data to determine which kind of keys to use"

    As has already been said: Normalization is a good thing. This
    has been proven over, and over, and over.

    Personally and in general, in an OLTP environment, I like natural keys in
    lookup tables, and surrogate keys in volatile data. This use of surrogate
    keys can save you much effort in the future when the business rules
    change. (always a matter of 'when', not 'if')

    --
    Jared Still
    Certifiable Oracle DBA and Part Time Perl Evangelist

    --
    http://www.freelists.org/webpage/oracle-l
  • JApplewhite_at_austinisd.org at Jan 29, 2007 at 4:31 pm
    Don,

    If an FK constraint would help preserve data integrity, then it is called
    for. Sounds like one of the FK column sets in your large "child" tables
    serves as the leading columns for those tables' PK column sets. In those
    cases, you already have the FK column sets indexed, so there's no extra
    space cost. If there are other FK columns or column sets, then adding FK
    constraints to ensure data integrity would necessitate indexes for them to
    facilitate joins and prevent the table-locking that I seem to recall
    happens when inserts and deletes are done on parent tables of child tables
    with unindexed FKs. I may be out-of-date on that, though - no time to
    check right now.

    Surprised you got my email. I never saw it (them, actually, I resent it)
    come through Oracle-L.

    Jack C. Applewhite - Database Administrator
    Austin (Texas) Independent School District
    512.414.9715 (wk) / 512.935.5929 (pager)

    Same-Day Stump Grinding! Senior Discounts!

    Mike's Tree Service

    "Don Seiler"
    Sent by: oracle-l-bounce_at_freelists.org
    01/29/2007 09:25 AM
    Please respond to
    don_at_seiler.us

    To
    "jaromir nemec"
    cc
    oracle-l
    Subject
    Re: The Case Against Compound/Natural Keys

    My database is actually more of an all-purpose hybrid. We have OLTP
    data where data is entered by sales and updated by customer service,
    etc. We then bulk-load call records and processed billing information
    (we are a telecom) that the customer service app uses when customers
    call about their bills or question a call. So we don't have fact vs
    dimension tables as you might find in an ideal DW instance.

    To be precise, I don't hate natural keys for the sake of hating
    natural keys. It's the composite keys that I hate, and especially
    when there are no queries that such a large index would address.

    These tables are already partitioned with local indexes. We are
    running the "rolling window" scenario, keeping the most recent 4
    months.

    Jack: are you suggesting that I put a foreign key constraint/index on
    the leading X number of fields already in my primary key
    constraint/index? Because that is what it would be, and is yet
    another exhibit of my frustration with this design (or lack thereof).

    Don.
  • Nuno Souto at Jan 30, 2007 at 11:16 am

    Don Seiler wrote,on my timestamp of 30/01/2007 2:23 AM:

    My database is actually more of an all-purpose hybrid.
    Just like the majority of dbs out there. Like it or not,
    OLTP and DW are two extremes. The middle of the road and
    most common db type is by far the DSS. Which has bits of
    both plus its own distinctive set of a zillion tables.
    Then again, I've recently heard someone ask "DSS, what's
    that?" (must be the "powerpoint OCP" factor again...)
    To be precise, I don't hate natural keys for the sake of hating
    natural keys. It's the composite keys that I hate, and especially
    when there are no queries that such a large index would address.
    and when you have those natural composite keys present on
    a large number of tables as FKs. And indexed, of course,
    because otherwise joins would be painfully slow.

    This is of course the detail that is omitted from the
    "natural key" camp arguments: it's not just the index
    on the composite natural PK, it's also the one that will be
    needed wherever that composite, very long natural key will
    be used as a FK!

    As opposed to a single column index on the synthetic key
    wherever it is used as a FK, plus a unique index on the
    composite key in the prime parent table.

    I know which overhead I'd rather take.

    Yes, I'm painfully aware of the index hot spot problem
    with growing synthetic key values: been aware of that for
    many years, even before database engines had an "r" before
    the "dbms"! It's not a new problem and it has been overcome
    in other dbms engines: there is no reason it can't in Oracle
    other than inactivity on the part of their R&D.

    There is nothing in indexing b-tree theory that says the top
    level index node has to be stored in one single block, for
    example. In fact, there is nothing in b-tree indexing theory
    that requires the concept of "blocks": that's a physical storage
    requirement, not a logical one necessary for the operation
    of b-tree indexes.

    But even with that problem present, I'll gladly take the
    trade-off and use synthetic keys anywhere and everywhere
    rather than having to incur the horrible maintenance and
    space overhead that is natural composite keys used as PKs
    *AND* as FKs.

    --
    Cheers
    Nuno Souto
    dbvision_at_iinet.net.au
    --
    http://www.freelists.org/webpage/oracle-l

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedJan 27, '07 at 4:48a
activeJan 30, '07 at 11:16a
posts11
users8
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase