I'm not sure why you're calling that a denormalization.
Count1 through Count15 are likely on the same domain, but they are each in
Now if you use the role name intelligently, say 20090315_count,
20090401_count, etc. then you actually need only the ID as the primary key
to completely convey to the user the meaning of each column in the 16-tuple.
Now I find it surpassing strange that a single abstract ID gives meaning
about your data to your users, but perhaps there is another table to which
id is a foreign key that has further description of the rows. Not strictly
needed, since it is likely to be 1 to 1, but possibly useful especially if
there is a lot in it and it is somewhat static compared to the
(id,count(1..15)) relation, forgiving the shorthand which I hope is
What about rotating months? That is, making a new relation where the oldest
count column is removed and a newest count column is added? It seems likely
to me that the drop column, add column on this relatively short row is
likely to remain within a single block, so that is probably not a big deal.
Now, as for maintenance of programs, your first impression might be "OUCH!"
now if I want to set up some query that compares the two most recent "count"
columns you're going to have to constantly update the programs.
However, THAT is a tremendously good use of views, where it is quite easy to
define views that you only have to trivally update in concert (just after)
the drop column add column exercise. Then the reports on the relative
columns remain static and are valid in context of when they are run, while
ad hoc reports specifically naming columns make sense and no one has to look
up what count1 means, because the column name (or role.domain column as Dr.
Codd put it) is clear unto itself.
Further, since presumably there is period in time where the newest
yyyymmdd_count column must be populated, you could well populate that column
before redefining the views and then drop the oldest column at your leisure.
Users directly using the yyyymmdd_count name would need to have some
mechanism to know the column is valid, but users of views would never be
exposed to columns partially filled or partially dropped.
Now I suspect some folks are just itching to write back that this is
denormalized. Before you pull that trigger, I suggest you actually read at
least pages 377-380 of Communications of the ACM, Volume 13, Number 6, June
On Behalf Of Michael Moore
Sent: Friday, March 27, 2009 1:44 PM
To: [email protected]
Subject: Re: Looking for table design input
If performance is going to be a problem, and you can improve performance be
denormalization, then do it. There are many factors which you have not
mentioned such as Service Level Agreement, frequency of update, average
transaction size etc etc. that will determine your final decision.
On Fri, Mar 27, 2009 at 8:13 AM, wrote:
I am looking for some input on a table design (oracle 10.2.0.3). Here is
what I know:
the table will probably have several hundreds of millions of rows (I am
looking for a better estimate from my customers)
it will store the counts per ID pre date per code for each of the ID.
ID - Year - Month - Day - Code1 - Code2 - Code3 - Count
We will have a primary key on all the columns except for the count
The data will be stored for the 15 half-month periods (the Day will be
either 1 or 15)
The data will be selected based on the ID and the users will need to get
all the data
I will partition the table by Year/Month/Day to make purging easier.
What I wonder is whether the following design (which I do not like very
much) may give a better performance:
ID - Code1 - Code2 - Code3 - Count1 - Count2 - ... Count15
Because it may potentially return less rows and therefore require less IO.
Please consider the environment before printing this email.