Hi,

Recently I have been trying TSearch2 (on both Linux and Windows, on an
Athlon64 3000+ machine).
I have a table into which I loaded some text files, so it has a line
number column, a text column, and a ts_vec column of type tsvector.
I created a GiST index on ts_vec. The table has ~1 million
records. It seems that a regular search query like:

SELECT line_number, headline(line, to_tsquery('keyword'))
FROM tstexts
WHERE ts_vec @@ to_tsquery('keyword')
ORDER BY rank(ts_vec, to_tsquery('keyword')) DESC

is slow the first time (7-15 sec), but using the same keyword again
it is fast (10-100 ms). The reason, as I have read, is that nothing is
cached the first time, but afterwards the index pages are cached, so it
is fast.

I think that in a real-world application, in this form, it is useless,
because a 15 sec search time is not acceptable for any user. Is there a
way to cache the index up front, or some postgres parameter to tune? (I
tried increasing "shared_buffers", "effective_cache_size", and
"work_mem", but it had no effect.) Or should I look for another search
technology? (I just want to guarantee a 2 sec maximum search time on 1
million records; I don't think that is a big expectation nowadays.)


Abbath
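
For reference, the setup described above corresponds roughly to the
following sketch (column names are taken from the query; the tsearch2
contrib module and the index name are assumptions):

```sql
-- Sketch of the schema described above; tsearch2 contrib assumed installed.
CREATE TABLE tstexts (
    line_number integer,
    line        text,
    ts_vec      tsvector
);

-- Fill the tsvector column from the raw text.
UPDATE tstexts SET ts_vec = to_tsvector(line);

-- GiST index so that "ts_vec @@ to_tsquery(...)" can avoid a full scan.
CREATE INDEX tstexts_ts_vec_idx ON tstexts USING gist (ts_vec);
```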


  • Richard Huxton at Mar 15, 2006 at 2:52 pm

    Abbath wrote:
    > is slow the first time (7-15 sec), but using the same keyword again
    > it is fast (10-100 ms). The reason, as I have read, is that nothing
    > is cached the first time, but afterwards the index pages are cached,
    > so it is fast.
    >
    > I think that in a real-world application, in this form, it is
    > useless, because a 15 sec search time is not acceptable for any user.

    What, never? Even if this facility is only used once a year by one user
    and you have 1000 other users who need their queries to complete in 0.2
    secs at most? What you mean is that it's not useful for *your*
    application - don't assume the same applies to all applications.

    > Is there a way to cache the index up front, or some postgres
    > parameter to tune? (I tried increasing "shared_buffers",
    > "effective_cache_size", and "work_mem", but it had no effect.) Or
    > should I look for another search technology? (I just want to
    > guarantee a 2 sec maximum search time on 1 million records; I don't
    > think that is a big expectation nowadays.)

    If you want to force the data to be cached, just put a cron job in to
    run a query for "abc" or whatever once a minute.

    Of course, if it turns out that your users aren't running the query very
    often then you're wasting resources, and if they are running it often
    then again you're wasting resources. But - if you really need to, that's
    the solution.

    --
    Richard Huxton
    Archonet Ltd
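
Richard's cache-warming idea amounts to scheduling something like the
following (the keyword and the count(*) wrapper are illustrative; in
practice it would be run from cron via psql):

```sql
-- Touch the GiST index periodically so its pages stay in the cache.
SELECT count(*) FROM tstexts WHERE ts_vec @@ to_tsquery('abc');
```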
  • Abbath at Mar 15, 2006 at 7:36 pm
    Hello Richard,

    Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
    > Abbath wrote:
    >> is slow the first time (7-15 sec), but using the same keyword again
    >> it is fast (10-100 ms). The reason, as I have read, is that nothing
    >> is cached the first time, but afterwards the index pages are
    >> cached, so it is fast.
    >>
    >> I think that in a real-world application, in this form, it is
    >> useless, because a 15 sec search time is not acceptable for any
    >> user.
    > What, never? Even if this facility is only used once a year by one
    > user and you have 1000 other users who need their queries to complete
    > in 0.2 secs at most? What you mean is that it's not useful for *your*
    > application - don't assume the same applies to all applications.

    The search function will be a frequently used one, so it shouldn't be
    slow. OK, maybe it is not suitable for me if I want a user-friendly
    search function.

    >> Is there a way to cache the index up front, or some postgres
    >> parameter to tune? (I tried increasing "shared_buffers",
    >> "effective_cache_size", and "work_mem", but it had no effect.) Or
    >> should I look for another search technology? (I just want to
    >> guarantee a 2 sec maximum search time on 1 million records; I don't
    >> think that is a big expectation nowadays.)
    > If you want to force the data to be cached, just put a cron job in to
    > run a query for "abc" or whatever once a minute.

    I can't guess what the user wants to search for.

    > Of course, if it turns out that your users aren't running the query
    > very often then you're wasting resources, and if they are running it
    > often then again you're wasting resources. But - if you really need
    > to, that's the solution.

    It seems that periodically running VACUUM ANALYZE helps reduce the
    search time... I will see how I can tune it further.

    Thanks for the reply.

    Abbath
  • Scott Marlowe at Mar 15, 2006 at 7:49 pm

    On Wed, 2006-03-15 at 13:36, Abbath wrote:
    > Hello Richard,
    >
    > Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
    >> If you want to force the data to be cached, just put a cron job in
    >> to run a query for "abc" or whatever once a minute.
    > I can't guess what the user wants to search for.

    But that query will likely load up all the index info into memory.

    >> Of course, if it turns out that your users aren't running the query
    >> very often then you're wasting resources, and if they are running
    >> it often then again you're wasting resources. But - if you really
    >> need to, that's the solution.
    > It seems that periodically running VACUUM ANALYZE helps reduce the
    > search time... I will see how I can tune it further.

    That statement is pretty telling. You're new to PostgreSQL, I'll
    assume. You'll need to read up on the periodic maintenance section of
    the docs.

    Here ya go:
    http://www.postgresql.org/docs/8.1/interactive/maintenance.html
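
The periodic maintenance Scott points at boils down to statements like
this one (table name taken from the original post):

```sql
-- Reclaim dead row versions and refresh the planner's statistics.
VACUUM ANALYZE tstexts;
```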
  • Abbath at Mar 16, 2006 at 12:09 am
    Hello Scott,

    Wednesday, March 15, 2006, 8:49:00 PM, you wrote:
    > On Wed, 2006-03-15 at 13:36, Abbath wrote:
    >> Hello Richard,
    >>
    >> Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
    >>> If you want to force the data to be cached, just put a cron job in
    >>> to run a query for "abc" or whatever once a minute.
    >> I can't guess what the user wants to search for.
    > But that query will likely load up all the index info into memory.

    Misunderstanding: I experienced that if I run a search for a keyword
    the first time it is slow, then the next time it is fast BUT for that
    keyword, not for any keyword.

    >>> Of course, if it turns out that your users aren't running the
    >>> query very often then you're wasting resources, and if they are
    >>> running it often then again you're wasting resources. But - if you
    >>> really need to, that's the solution.
    >> It seems that periodically running VACUUM ANALYZE helps reduce the
    >> search time... I will see how I can tune it further.
    > That statement is pretty telling. You're new to PostgreSQL, I'll
    > assume. You'll need to read up on the periodic maintenance section
    > of the docs.
    > Here ya go:
    > http://www.postgresql.org/docs/8.1/interactive/maintenance.html

    Yes, I have just started to use postgres, so I need further experience.
    Thanks for the link.
  • Scott Marlowe at Mar 16, 2006 at 12:33 am

    On Wed, 2006-03-15 at 18:09, Abbath wrote:
    > Hello Scott,
    >
    > Wednesday, March 15, 2006, 8:49:00 PM, you wrote:
    >>> I can't guess what the user wants to search for.
    >> But that query will likely load up all the index info into memory.
    > Misunderstanding: I experienced that if I run a search for a keyword
    > the first time it is slow, then the next time it is fast BUT for
    > that keyword, not for any keyword.

    I think you mean "ONLY for that keyword" there? If everything else
    becomes fast but the keyword becomes slow, then we've got a very
    interesting (and possibly difficult) problem.

    Full text search is the kind of problem you throw ONE database at on a
    machine with LOTS of RAM. It doesn't need lots of CPU horsepower, or
    even disk performance, as long as everything can fit into RAM. Then,
    set shared_buffers to 10-15% of the memory size, and let the OS do the
    caching.

    One of the best performance tuning docs is here:

    http://www.varlena.com/GeneralBits/Tidbits/perf.html

    >> That statement is pretty telling. You're new to PostgreSQL, I'll
    >> assume. You'll need to read up on the periodic maintenance section
    >> of the docs.
    >> Here ya go:
    >> http://www.postgresql.org/docs/8.1/interactive/maintenance.html
    > Yes, I have just started to use postgres, so I need further
    > experience. Thanks for the link.

    We all started somewhere. PostgreSQL is a pretty good place to start
    learning databases.
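
On 8.1, the shared_buffers advice above goes in postgresql.conf; for a
machine with, say, 2 GB of RAM it would translate to something like the
following (values are illustrative; on this version both settings are
expressed in 8 kB pages):

```
shared_buffers = 25600          # ~200 MB, roughly 10% of 2 GB
effective_cache_size = 196608   # ~1.5 GB, hint about the OS disk cache
```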

Discussion Overview
group: pgsql-general
categories: postgresql
posted: Mar 15, '06 at 1:50p
active: Mar 16, '06 at 12:33a
posts: 6
users: 3
website: postgresql.org
irc: #postgresql
