lets say I hve the following in the 'fruits' table:

Round orange
Sunkist orange
navel orange
strawberry
blueberry
sunkist orange
apple

how would I get something like the following:

count as c | Fruit type
---------------------------------
4 | orange
2 | berry
1 | apple

Search Discussions

  • Bill Moran at Sep 8, 2010 at 6:51 pm

    In response to jackassplus <jackassplus@gmail.com>:

    lets say I have the following in the 'fruits' table:

    Round orange
    Sunkist orange
    navel orange
    strawberry
    blueberry
    sunkist orange
    apple

    how would I get something like the following:

    count as c | Fruit type
    ---------------------------------
    4 | orange
    2 | berry
    1 | apple
    Organize your data better.

    If you want to analyze data in the way you describe, then you need to store
    the data in a way that makes in analyzable. I'm 100% sure that someone can
    provide you with some query magic that will make the example you provided
    produce the results you're wanting. I'm also 100% sure that a few weeks
    or months down the line you find exceptions that will break that solution,
    and that said solution will never really be reliable.

    If you're going to analyze fruits by type, then you need a type column.
    You can then group by type to get count()s. To ensure data integrity,
    you should probably create a fruit_type table with a unique column that
    lists the possible types, and then foreign key the fruit_type column in
    the fruits table to that to ensure nothing funky is entered. An enum
    for type is another possibility.
  • Jackassplus at Sep 8, 2010 at 7:23 pm
    <snip>
    To ensure data integrity,
    you should probably create a fruit_type table with a unique column that
    lists the possible types, and then foreign key the fruit_type column in
    the fruits table to that to ensure nothing funky is entered.  An enum
    for type is another possibility.
    In the real world, this column actiually holds Operating Systems.
    I have 7 variants of Windows XP, even more of server, a dozen *nixes,
    etc, etc and it is fed from an external app.
    So I am looking for a magic query, or even a perl function to wrap up
    insde a procedure, whatever.
  • Bricklen at Sep 8, 2010 at 8:23 pm

    On Wed, Sep 8, 2010 at 12:22 PM, jackassplus wrote:
    <snip>
    To ensure data integrity,
    you should probably create a fruit_type table with a unique column that
    lists the possible types, and then foreign key the fruit_type column in
    the fruits table to that to ensure nothing funky is entered.  An enum
    for type is another possibility.
    In the real world, this column actiually holds Operating Systems.
    I have 7 variants of Windows XP, even more of server, a dozen *nixes,
    etc, etc and it is fed from an external app.
    So I am looking for a magic query, or even a perl function to wrap up
    insde a procedure, whatever.
    If your column values can be grouped by regexp, you might be able to
    get away with using a CASE statement.
    eg.
    select
    sum(case when val ~* 'windows xp' then 1 else 0 end) as winxp,
    sum(case when val ~* 'nix' then 1 else 0 end) as nix,
    sum(case when val ~* 'redhat|rhel' then 1 else 0 end) as rh
    ....

    (I don't have your original mail handy to reference the example values).
  • Susan Cassidy at Sep 8, 2010 at 8:32 pm
    Use a CASE statement?

    something like:
    select case WHEN os ~* E'^windows' then 'windows'
    WHEN os ~* E'server' then 'server'
    WHEN os ~* E'nix$' then '*nix'
    else 'other' end
    as osval, count(*) from os_tbl group by osval order by osval;

    The hard part is making sure your regexes cover all the bases, without duplication.

    It still sounds like the value should be a reference to a unique value in a small table of operating system entries, then store the value, rather than the string, in the main table.

    Susan

    -----Original Message-----
    From: pgsql-general-owner@postgresql.org On Behalf Of jackassplus
    Sent: Wednesday, September 08, 2010 12:22 PM
    To: pgsql-general@postgresql.org
    Subject: Re: [GENERAL] how do i count() similar items


    <snip>
    To ensure data integrity,
    you should probably create a fruit_type table with a unique column that
    lists the possible types, and then foreign key the fruit_type column in
    the fruits table to that to ensure nothing funky is entered.  An enum
    for type is another possibility.
    In the real world, this column actiually holds Operating Systems.
    I have 7 variants of Windows XP, even more of server, a dozen *nixes,
    etc, etc and it is fed from an external app.
    So I am looking for a magic query, or even a perl function to wrap up
    insde a procedure, whatever.

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general
  • Jackassplus at Sep 9, 2010 at 12:56 am

    Use a CASE statement?

    something like:
    select case WHEN os ~* E'^windows' then 'windows'
    WHEN os ~* E'server' then 'server'
    WHEN os ~* E'nix$' then '*nix'
    else 'other' end
    as osval, count(*) from os_tbl group by osval order by osval;

    The hard part is making sure your regexes cover all the bases, without
    duplication.

    It still sounds like the value should be a reference to a unique value in a
    small table of operating system entries, then store the value, rather than
    the string, in the main table.

    Susan
    I'll give this a shot. looks like it will work well. Regexes, I don't mind,
    SQL, I'm new at. Much less the pg functions.
    thanks to both you and bricklin.
  • Darren Duncan at Sep 8, 2010 at 7:23 pm

    jackassplus wrote:
    lets say I hve the following in the 'fruits' table:

    Round orange
    Sunkist orange
    navel orange
    strawberry
    blueberry
    sunkist orange
    apple

    how would I get something like the following:

    count as c | Fruit type
    ---------------------------------
    4 | orange
    2 | berry
    1 | apple
    Your best solution is to have separate database fields for your 2 levels of
    detail, say call them "fruit type" and "variety", or maybe other fields
    depending on purpose.

    This is because the organization of fruit is somewhat arbitrary and isn't easily
    encoded into variety names without kludges. Also, many fruit with similar names
    are actually very different. So simple textual analysis of the field you have
    often won't be very useful.

    For example, a kiwifruit is a berry, and a strawberry is very different from a
    blueberry, the first not actually being a berry at all. There are also various
    citrus which are crosses between oranges and other citrus. There are also stone
    fruits which are crosses between plums and apricots. Also, "Sunkist" isn't a
    variety of orange but rather is a brand name used for multiple varieties.

    (Yes, members of my family cultivate fruit trees, so I have a lot of second-hand
    experience with this.)

    If you still want to go by textual analysis as you suggest, it will be
    nontrivial and involve pattern matching for common suffixes where some are
    separate words and some aren't, and you can do this pattern matching in an extra
    select-list item which you then group by.

    -- Darren Duncan

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedSep 8, '10 at 3:23p
activeSep 9, '10 at 12:56a
posts7
users5
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase