=?utf-8?Q?=C5=81ukasz_Walkowski?= <lukasz.walkowski@homplex.pl> writes:
3. And this part is most interesting for me. Columns browser, eventsource, eventtype, devicetype, operatingsystem contain a small pool of strings - for example for devicetype this is set to Computer, Mobile, Tablet or Unknown. Browser is set to normalized browser name. In every case I can store those data using one of 3 different methods:
- store as varchar as it is now - nice and easy, but index on those columns is quite big and I think storing many of similar strings is waste of space.
If you're starting to be concerned about space, it's definitely time to
get away from this choice. Depending on what locale you're using,
comparing varchar values can be quite an expensive operation, too.
- store only id's and join external tables as needed, for example for browsers I only need smallint key, as there is a limited number of browsers.
I think the main "pro" of this approach is that it doesn't use any
nonstandard SQL features, so you preserve your options to move to some
other database in the future. The main "con" is that you'd be buying into
fairly significant rewriting of your application code, since just about
every query involving these columns would have to become a join.

FWIW, I'd be inclined to just use integer not smallint. The space savings
from smallint is frequently illusory because of alignment considerations
--- for instance, an index on a single smallint column will *not* be any
smaller than one on a single int column. And smallint has some minor
usage annoyances because it's a second-class citizen in the type promotion
hierarchy --- you may find yourself needing explicit casts to smallint
here and there.
- introduce enumerator type for each of the column and store those values as enumerator. This one should be the most space efficient, but it will be problematic in case of changing column values like browser or operatingsystem as altering enumerator isn't that simple.
Space-wise this is going to be equivalent to the integer-foreign-key
solution. It's much nicer from a notational standpoint, though, because
you don't need joins --- it's likely that you'd need few if any
application code changes to go this route. (But I'd advise doing some
testing to verify that before you take it as a given.)

You're right though that enums are not a good option if you expect
frequent changes in the pool of allowed values. I guess the question
is how often does that happen, in your application? Adding a new value
from time to time isn't much of a problem unless you want to get picky
about how it sorts relative to existing values. But you can't ever delete
an individual enum value, and we don't support renaming them either.
(Though if you're desperate, I believe a manual UPDATE on the pg_enum
catalog would work for that.)

Another thing to think about is whether you have auxiliary data about each
value that might usefully be stored as additional columns in the small
tables. The enum approach doesn't directly handle that, though I suppose
you could still create small separate tables that use an enum column as
primary key.

    regards, tom lane

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 3 | next ›
Discussion Overview
grouppgsql-performance @
postedAug 31, '13 at 1:36p
activeAug 31, '13 at 5:06p

2 users in discussion

Łukasz Walkowski: 2 posts Tom Lane: 1 post



site design / logo © 2017 Grokbase