Just out of curiosity, does anyone here know how well query caching
works in general for an extremely high-volume search engine?

It seems like as your search volume goes up, and the number of unique
queries goes up with it, the cache hit rate would go down, and caching
would help less and less. Urs Hoelzle (Google) mentioned this in a
talk he gave at UW in 2002:
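A rough way to reason about this is a toy simulation. The sketch below is an illustrative model, not anything taken from the talk: it assumes query popularity follows a Zipf-like distribution (weight 1/rank^skew) and a fixed-size LRU result cache, then compares hit rates as the number of distinct queries grows. All names and parameters are made up for illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """A tiny LRU cache mapping queries to (placeholder) results."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def simulate_hit_rate(num_distinct, num_requests, capacity, skew=1.0, seed=0):
    """Replay Zipf-distributed queries against an LRU cache and return the hit rate.

    Assumption: query i (1-based rank) is requested with weight 1 / i**skew.
    """
    rng = random.Random(seed)
    ranks = range(1, num_distinct + 1)
    weights = [1.0 / r ** skew for r in ranks]
    queries = rng.choices(ranks, weights=weights, k=num_requests)
    cache = LRUCache(capacity)
    hits = 0
    for q in queries:
        if cache.get(q) is not None:
            hits += 1
        else:
            cache.put(q, f"results for query {q}")
    return hits / num_requests


if __name__ == "__main__":
    # Same cache size, same traffic, increasingly large space of distinct queries.
    for n in (1_000, 100_000):
        print(n, round(simulate_hit_rate(n, 20_000, capacity=500), 3))
```

Under these assumptions a fixed-size cache covers a shrinking share of the traffic as the query space widens, which is the effect described above. How much repetition a real engine sees depends on its actual query logs, which this model does not capture.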

(link to video on this page)

On 2/7/06, Byron Miller wrote:
I use OSCache with great success.

I would say an amazing number (more than I assumed) of
the queries we get are duplicates of one fashion or
another, so on top of warming things up as much as
possible in the OS buffer cache, we use OSCache as well.
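The general shape of a query-result cache like the one described here can be sketched as follows. This is not OSCache's API (OSCache is a Java library); it is a minimal Python sketch assuming a hypothetical `search_fn` backend, a time-to-live per entry, and a hypothetical `normalize_query` rule so that trivially different spellings of the same query share one cache entry.

```python
import time


def normalize_query(query):
    """Hypothetical normalization: lowercase, collapse whitespace, sort terms.

    Sorting terms is only safe when term order does not change the results
    (e.g. a plain AND of terms with no phrase queries).
    """
    return " ".join(sorted(query.lower().split()))


class CachedSearcher:
    """Wrap a search function with a TTL result cache keyed on normalized queries."""

    def __init__(self, search_fn, ttl_seconds=300.0, clock=time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # normalized query -> (expiry time, results)
        self.hits = 0
        self.misses = 0

    def search(self, query):
        key = normalize_query(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # fresh cached results
        self.misses += 1
        results = self._search_fn(query)  # fall through to the real backend
        self._entries[key] = (now + self._ttl, results)
        return results
```

The dict here grows without bound; a production cache would also cap its size (for example with LRU eviction) and drop entries when the index is updated.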

You could also use Squid to cache pages for a set
amount of time, offloading your hotspots and freeing
up CPU time for the ad-hoc/random queries (as long as
you aren't forcing content to expire in your headers).
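Whether Squid can serve a page from its cache depends largely on the HTTP headers the search front end sends. A minimal sketch, assuming a hypothetical WSGI front end with a placeholder `run_search` and a made-up 10-minute lifetime: it marks each results page as publicly cacheable with `Cache-Control: public, max-age=...`. The Squid side (accelerator/reverse-proxy configuration) is not shown.

```python
import html
from urllib.parse import parse_qs

CACHE_SECONDS = 600  # hypothetical "x amount of time" a proxy may reuse a page


def run_search(query):
    """Placeholder for the real (slow) search call."""
    return f"<html><body>Results for {html.escape(query)}</body></html>"


def search_app(environ, start_response):
    """WSGI app whose responses a shared cache such as Squid is allowed to reuse."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0]
    body = run_search(query).encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Allows shared caches to reuse this response for CACHE_SECONDS.
        # Sending "no-cache", "no-store", or an Expires date in the past
        # instead would push every request through to the search backend.
        ("Cache-Control", f"public, max-age={CACHE_SECONDS}"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Depending on the Squid version, the stock squid.conf may also treat URLs containing `?` specially or as uncacheable, so it is worth checking that configuration before relying on headers alone.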


--- "Insurance Squared Inc."

Running Nutch 0.71 on Mandrake Linux 2006 (a P4 with
2 SATA drives in RAID 0, 2 GB of RAM, about 4 million
pages but expecting to hit 10+ million), and finding
that our initial queries take up to 15-20 seconds to
return results. I'd like to get that sped up and am
seeking thoughts on how to do so.
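One common way to soften slow first queries, along the lines of the OS buffer cache warming mentioned above, is to read the index files once at startup so the kernel pulls them into its page cache. A minimal sketch, assuming the index lives under a single directory and that enough RAM is free to hold it; with 2 GB of RAM and a growing index, only part of it may stay cached.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once so the OS keeps it in its page cache.

    Returns the number of bytes read. The data itself is discarded; the useful
    effect is the side effect on the kernel's buffer cache.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

For example, call `warm_page_cache("/path/to/crawl")` (a hypothetical path) once after the searcher starts; replaying a list of common queries afterwards would also warm any result cache.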

Discussion Overview
group: nutch-user @
posted: Feb 6, '06 at 8:33p
active: Feb 8, '06 at 3:56a