Hello

I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB RAM
(Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64
GNU/Linux).
I have a table "tickets" with 1,000,000 inserts per month (~2,600 every
2 hours); for the moment that is 13,000,000 rows taking 5GB,
and I have to extract statistics (number of calls, number of calls
shorter than X seconds, number of new calls, number of calls from new
callers, ...).



1°) The server will handle at most 15 queries at a time.
So this is my postgresql.conf

max_connections = 15
shared_buffers = 995600 # ~1GB
temp_buffers = 1000
work_mem = 512000 # ~512KB
maintenance_work_mem = 1048576 # 1MB

max_fsm_pages = 41522880 # ~40MB
max_fsm_relations = 8000
checkpoint_segments = 10
checkpoint_timeout = 3600
effective_cache_size = 13958643712 # 13GB

stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
autovacuum = off

How can I optimize the configuration?




2°) My queries look like
SELECT tday AS n,
COUNT(DISTINCT(a.appelant)) AS new_callers,
COUNT(a.appelant) AS new_calls
FROM cirpacks.tickets AS a
WHERE LENGTH(a.appelant) > 4
AND a.service_id IN ( 95, 224, 35, 18 )
AND a.exploitant_id = 66
AND a.tyear = 2008
AND a.tmonth = 08
AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b
             WHERE b.appelant = a.appelant
               AND b.service_id IN ( 95, 224, 35, 18 )
               AND b.heberge_id = 66
             HAVING to_char(MIN(b.premier_appel), 'YYYYMMDD')
                  = to_char(a.date, 'YYYYMMDD') )
GROUP BY n
ORDER BY n;

or select ... SUM(CASE WHEN condition THEN value ELSE 0 END) ... FROM
cirpacks.tickets WHERE tyear = ... AND tmonth = ... AND tday = ... AND
audiotel IN ( '...', '...', ... );
or select ... SUM(CASE WHEN condition THEN value ELSE 0 END) ... FROM
cirpacks.tickets WHERE '2007-01-01' <= date AND date <= '2008-08-31' AND
audiotel IN ( '...', '...', ... );


Which indexes are best?
case 0:
index_0_0 (service_id, exploitant_id, palier_id, habillage_id, tweek, tday,
thour, tmonth, tyear, length(appelant::text))
index_0_1 (audiotel, cat, tweek, tday, thour, tmonth, tyear,
length(appelant::text))

or case 1:
index_1_0 (audiotel, cat, service_id, exploitant_id, palier_id,
habillage_id, tweek, tday, thour, tmonth, tyear, length(appelant::text))

or case 2:
index_2_0 (tweek, tday, thour, tmonth, tyear, length(appelant::text))
index_2_1 (service_id, exploitant_id, palier_id, habillage_id)
index_2_2 (audiotel, cat)

or even (case 3):
index_3_0 (service_id, exploitant_id, palier_id, habillage_id, tyear,
length(appelant::text))
index_3_1 (service_id, exploitant_id, palier_id, habillage_id, tmonth,
tyear, length(appelant::text))
index_3_2 (service_id, exploitant_id, palier_id, habillage_id, tday,
tmonth, tyear, length(appelant::text))
[...]
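
Written out as DDL, case 2 for instance would be:

CREATE INDEX index_2_0 ON cirpacks.tickets (tweek, tday, thour, tmonth, tyear, length(appelant::text));
CREATE INDEX index_2_1 ON cirpacks.tickets (service_id, exploitant_id, palier_id, habillage_id);
CREATE INDEX index_2_2 ON cirpacks.tickets (audiotel, cat);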


  • Richard Huxton at Oct 1, 2008 at 11:36 am

    paul@wayr.org wrote:
    Hello

    I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB RAM
    (Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64
    GNU/Linux).
    Unless you're committed to this version, I'd seriously look into 8.3
    from backports (or compiled yourself). I'd expect some serious
    performance improvements for the workload you describe.
I have a table "tickets" with 1,000,000 inserts per month (~2,600 every
2 hours); for the moment that is 13,000,000 rows taking 5GB,
and I have to extract statistics (number of calls, number of calls
shorter than X seconds, number of new calls, number of calls from new
callers, ...).
    OK, so not a lot of updates, but big aggregation queries. You might want
    to pre-summarise older data as the system gets larger.
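Something like this, say refreshed nightly (just a sketch; the daily
summary table and its name are made up, reusing your column names):

CREATE TABLE cirpacks.tickets_daily AS
SELECT tyear, tmonth, tday, service_id, exploitant_id,
       COUNT(*) AS calls,
       COUNT(DISTINCT appelant) AS callers
FROM cirpacks.tickets
GROUP BY tyear, tmonth, tday, service_id, exploitant_id;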
1°) The server will handle at most 15 queries at a time.
    So this is my postgresql.conf

    max_connections = 15
    Well, I'd allow 20 - just in case.
shared_buffers = 995600 # ~1GB
    temp_buffers = 1000
work_mem = 512000 # ~512KB
    I'd be tempted to increase work_mem by a lot, possibly even at the
    expense of shared_buffers. You're going to be summarising large amounts
    of data so the larger the better, particularly as your database is
    currently smaller than RAM. Start with 5MB then try 10MB, 20MB and see
    what difference it makes.
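You can experiment per session before touching postgresql.conf; 8.1
takes the value in kilobytes:

SET work_mem = 20480;  -- 20MB, for this session only
-- then re-run the query under EXPLAIN ANALYZE and compare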
maintenance_work_mem = 1048576 # 1MB

max_fsm_pages = 41522880 # ~40MB
    max_fsm_relations = 8000
    See what a vacuum full verbose says for how much free space you need to
    track.
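A database-wide run prints the totals at the end, something like:

VACUUM VERBOSE;
-- INFO: free space map contains ... pages in ... relations
-- compare those numbers against max_fsm_pages / max_fsm_relations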
    checkpoint_segments = 10
    checkpoint_timeout = 3600
With your low rate of updates these shouldn't matter.
effective_cache_size = 13958643712 # 13GB
    Assuming that's based on what "top" or "free" say, that's fine. Don't
    forget it will need to be reduced if you increase work_mem or
    shared_buffers.
    stats_start_collector = on
    stats_command_string = on
    stats_block_level = on
    stats_row_level = on
    autovacuum = off
    Make sure you're vacuuming if autovacuum is off.
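Even a simple nightly cron'd psql call would do, e.g.:

-- run nightly via cron/psql
VACUUM ANALYZE cirpacks.tickets;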
How can I optimize the configuration?
    Looks reasonable, so far as you can tell from an email. Try playing with
    work_mem though.
    2°) My queries look like
    SELECT tday AS n,
    COUNT(DISTINCT(a.appelant)) AS new_callers,
    COUNT(a.appelant) AS new_calls
    FROM cirpacks.tickets AS a
    WHERE LENGTH(a.appelant) > 4
    AND a.service_id IN ( 95, 224, 35, 18 )
    AND a.exploitant_id = 66
    AND a.tyear = 2008
    AND a.tmonth = 08
    Index on (tyear,tmonth) might pay off, or one on exploitant_id perhaps.
    AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b WHERE b.appelant =
    a.appelant AND b.service_id IN ( 95, 224, 35, 18 ) AND b.heberge_id = 66
    HAVING to_char(MIN(b.premier_appel), 'YYYYMMDD') = to_char(a.date,
    'YYYYMMDD') )
    It looks like you're comparing two dates by converting them to text.
    That's probably not the most efficient way of doing it. Might not be an
    issue here.
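Something along these lines compares the values directly (assuming both
columns are dates or timestamps):

HAVING date_trunc('day', MIN(b.premier_appel)) = date_trunc('day', a.date)
-- or equivalently: HAVING MIN(b.premier_appel)::date = a.date::date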
    GROUP BY n
    ORDER BY n;

or select ... SUM(CASE WHEN condition THEN value ELSE 0 END) ... FROM
cirpacks.tickets WHERE tyear = ... AND tmonth = ... AND tday = ... AND
audiotel IN ( '...', '...', ... );
or select ... SUM(CASE WHEN condition THEN value ELSE 0 END) ... FROM
cirpacks.tickets WHERE '2007-01-01' <= date AND date <= '2008-08-31' AND
audiotel IN ( '...', '...', ... );


Which indexes are best?
    The only way to find out is to test. You'll want to run EXPLAIN after
    adding each index to see what difference it makes. Then you'll want to
    see what impact this has on overall workload.
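For instance (the index name and column order are just one guess to
try):

CREATE INDEX tickets_test_idx ON cirpacks.tickets (exploitant_id, tyear, tmonth);
EXPLAIN ANALYZE
SELECT tday, COUNT(*)
FROM cirpacks.tickets
WHERE exploitant_id = 66 AND tyear = 2008 AND tmonth = 8
GROUP BY tday;
-- and DROP INDEX cirpacks.tickets_test_idx; if it doesn't help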

    Mostly though, I'd try out 8.3 and see if that buys you a free
    performance boost.

    --
    Richard Huxton
    Archonet Ltd
  • Paul at Oct 2, 2008 at 8:01 am
Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.

So I'm going to play with work_mem & shared_buffers.

With a big shared_buffers pgsql tells me
shmget(key=5432001, size=11183431680, 03600).
so I do "echo 13183431680 > /proc/sys/kernel/shmmax" (10GB + 2GB just
in case)

but pgsql tells me again that there's not enough shared memory.
How can I compute the right shmmax for my server?
  • Richard Huxton at Oct 2, 2008 at 8:29 am

    paul@wayr.org wrote:
    Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.
    That's why backports.org was invented :-)
Or does "can't" mean "not allowed to"?
So I'm going to play with work_mem & shared_buffers.

With a big shared_buffers pgsql tells me
shmget(key=5432001, size=11183431680, 03600).
so I do "echo 13183431680 > /proc/sys/kernel/shmmax" (10GB + 2GB just
in case)

but pgsql tells me again that there's not enough shared memory.
How can I compute the right shmmax for my server?
    I'm not seeing anything terribly wrong there. Are you hitting a limit
    with shmall?
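On Linux shmall is counted in pages (getconf PAGE_SIZE, normally 4096
bytes) while shmmax is in bytes, so both may need raising. A rough
sketch for a ~12GB segment:

echo 12884901888 > /proc/sys/kernel/shmmax   # 12GB, in bytes
echo 3145728 > /proc/sys/kernel/shmall       # 12GB / 4096-byte pages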

Oh - and I'm not sure there's much point in having more shared_buffers
than you have data.

    Try much larger work_mem first, I think that's the biggest gain for you.

    --
    Richard Huxton
    Archonet Ltd
  • Tommy Gildseth at Oct 2, 2008 at 8:37 am

    Richard Huxton wrote:
    paul@wayr.org wrote:
    Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.
    That's why backports.org was invented :-)
    Or does can't mean "not allowed to"?

    Well, running production servers from backports can be a risky
    proposition too, and can land you in situations like the one discussed
    in "Debian packages for Postgres 8.2" from the General list.


    --
    Tommy Gildseth
  • Richard Huxton at Oct 2, 2008 at 9:08 am

    Tommy Gildseth wrote:
    Richard Huxton wrote:
    paul@wayr.org wrote:
    Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.
    That's why backports.org was invented :-)
    Or does can't mean "not allowed to"?
    Well, running production servers from backports can be a risky
    proposition too, and can land you in situations like the one discussed
    in "Debian packages for Postgres 8.2" from the General list.
    Well, there's a reason why "stable" is a popular choice for production
    servers. I must admit that I build from source for my PostgreSQL
    packages (because I care which version I run). I was reading one of the
    Perl fellows recommending the same.

    --
    Richard Huxton
    Archonet Ltd
  • Paul at Oct 2, 2008 at 3:15 pm
I played with work_mem, and setting it higher than 256000 does not change
the performance.

I tried to upgrade to 8.3 using etch-backports, but it's a new install, not
an upgrade.
So I have to create the users and permissions and import the data again;
that scared me, so I wanted to find another solution first.
But now I'll try 8.3.


  • Thomas Spreng at Oct 3, 2008 at 10:11 am

    On 2. Oct, 2008, at 10:00, <paul@wayr.org> wrote:
Unfortunately, I can't update pgsql to 8.3 since it's not in debian
stable.
Did you consider using the backport packages (http://www.backports.org)
for Debian Etch? They provide postgresql 8.3.3 packages for Debian
Etch.
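
Roughly (the sources.list line and package name are from memory, so
double-check):

# /etc/apt/sources.list: deb http://www.backports.org/debian etch-backports main contrib non-free
apt-get update
apt-get -t etch-backports install postgresql-8.3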

    Cheers.

    PS: We are also running backported postgresql packages using Debian Etch
    on our production servers without any problems.
