Arjen van der Meijden (acmma...@tweakers.net)

User Information

Display Name: Arjen van der Meijden
Partial Email Address: acmma...@tweakers.net
Posts:
92 total
8 in PostgreSQL - Bugs
75 in PostgreSQL - Performance
9 in Xapian

5 Most Recent
1) Arjen van der Meijden Re: [Xapian-discuss] Xapian performance testing
Xapian
Well, the slowness is to be expected due to the way xapian (or rather
omega and omindex/scriptindex) process those kinds of terms: they split a
term and index the elements as separate terms. And when such a term is
encountered in a query, the term is rewritten to be a phrase query (i.e.
d-link, "d link" and d.link all yield the same results).

Phrase queries are (relatively) slow because they use positional
information, and these kinds of examples are extra heavy because terms
like 'd', 'link', 's' and 'video' are very common words. For subterms
that aren't so frequent or (since 1.0.4) when ANDed with some other
terms or filters, it's less of a problem.
It probably does pay off to index any "term + single character term c"
as all of "term", "c" and "term + c" rather than only "term" and "c".
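To make the suggestion above concrete, here is a minimal sketch in plain Python of what such term generation could look like. It is not how omindex or scriptindex actually work: the function name, the ASCII-only splitting rule and the choice to write "term + c" as a simple concatenation are all assumptions made for illustration.

```python
import re


def index_terms(token: str) -> list:
    """Return the terms to index for one raw token such as 'd-link'.

    Hypothetical helper: splits on anything that is not an ASCII letter
    or digit, and adds a joined term only when one of the parts is a
    single character (the expensive phrase case discussed above).
    """
    parts = [p for p in re.split(r"[^0-9a-z]+", token.lower()) if p]
    terms = list(parts)
    if len(parts) > 1 and any(len(p) == 1 for p in parts):
        # Assumption: the combined term is the parts glued together.
        terms.append("".join(parts))
    return terms


print(index_terms("d-link"))      # ['d', 'link', 'dlink']
print(index_terms("well-known"))  # ['well', 'known']
```

With the extra joined term in the index, a query for d-link could in principle be answered by a single term lookup instead of a positional phrase match, at the cost of somewhat more disk space.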

Afaik Olly already had some ideas to make this much more efficient while
consuming a bit more (disk) space, but I have no idea whether he has
decided upon anything and/or started implementing them.
He already did optimise the "version number" special cases, which used
to be very slow too (1, 0 etc. are very common terms).

Best regards,

Arjen


On 19-3-2008 20:05 Kevin Duraj wrote:
> I have encountered the same issue of slow queries when terms
> contain dash characters.
>
> I do not have a solution except that I replaced the '-' with 'dash'
> inside of the index.
> Example: 'd-link', 's-video'; as a temporary solution: 'ddashlink', 'sdashvideo'
>
> Perhaps we could address this slowness issue of dash characters in the
> near future.
>
> Kevin Duraj
> http://myhealthcare.com
>
>
> On Tue, Mar 18, 2008 at 1:26 AM, Arjen van der Meijden
> <acmmailing@tweakers.net> wrote:
>> On 5-11-2007 14:40, Olly Betts wrote:
>> > BTW, I have implemented the hoisting of the positional information
>> > checking part of NEAR and PHRASE, so that the "AND" inside can be
>> > merged with other AND and FILTER operations. This gave a big
>> > performance boost to the slow queries (~50% saving in time just
>> > from this one change) and a good boost to the other queries (~25%
>> > saving from just this change).
>> >
>> > This optimisation and all the earlier ones are in 1.0.4, so once you
>> > upgrade to that, it would be interesting to see what the slow query log
>> > looks like with these new optimisations in place.
>>
>> We finally found time to upgrade our 0.9.8 to 1.0.5 and reindex the
>> whole database. The results so far are quite good.
>> When taking the daily average of our forum search result page it went
>> down from about 0.55 seconds to about 0.38 seconds.
>> Looking at the log-files, the slow query log file (queries taking more
>> than 2 seconds) dramatically reduced in size. Prior to the update we had
>> 3190 and 3112 lines in a week and now in the latest week it had only 863
>> lines.
>>
>> The slowest queries seem to be the single-term phrases with a single
>> character attached to a common word like 'd-link', 's-video' and
>> variants on that with only a few additional terms. As expected, I don't
>> see any version numbers anymore in the slow query log.
>>
>> Best regards,
>>
>> Arjen van der Meijden
>> Tweakers.net
>>
>> _______________________________________________
>> Xapian-discuss mailing list
>> [email protected: Xapian-di...@lists.xapian.org]
>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>
>
2) Arjen van der Meijden Re: [Xapian-discuss] Xapian performance testing
Xapian
On 5-11-2007 14:40, Olly Betts wrote:
> BTW, I have implemented the hoisting of the positional information
> checking part of NEAR and PHRASE, so that the "AND" inside can be
> merged with other AND and FILTER operations. This gave a big
> performance boost to the slow queries (~50% saving in time just
> from this one change) and a good boost to the other queries (~25%
> saving from just this change).
>
> This optimisation and all the earlier ones are in 1.0.4, so once you
> upgrade to that, it would be interesting to see what the slow query log
> looks like with these new optimisations in place.

We finally found time to upgrade our 0.9.8 to 1.0.5 and reindex the
whole database. The results so far are quite good.
The daily average time for our forum search result page went
down from about 0.55 seconds to about 0.38 seconds.
Looking at the log files, the slow query log (queries taking more
than 2 seconds) shrank dramatically. Prior to the update we had
3190 and 3112 lines in a week, and in the latest week it had only 863
lines.
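For a sense of scale, the figures above can be turned into relative savings. This is only arithmetic on the numbers quoted in this message; averaging the two earlier weeks into one baseline is an assumption of this sketch.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Daily average time of the search result page, in seconds.
avg_page_drop = percent_drop(0.55, 0.38)

# Slow query log lines per week; the two pre-upgrade weeks are averaged
# (an assumption) to get a single baseline.
weekly_before = (3190 + 3112) / 2
slow_log_drop = percent_drop(weekly_before, 863)

print(f"average page time: {avg_page_drop:.1f}% lower")  # 30.9% lower
print(f"slow query log:    {slow_log_drop:.1f}% smaller")  # 72.6% smaller
```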

The slowest queries seem to be the single-term phrases with a single
character attached to a common word like 'd-link', 's-video' and
variants on that with only a few additional terms. As expected, I don't
see any version numbers anymore in the slow query log.

Best regards,

Arjen van der Meijden
Tweakers.net
3) Arjen van der Meijden Re: Hardware for PostgreSQL
PostgreSQL - Performance
On 31-10-2007 17:45 Ketema wrote:
> I understand query tuning and table design play a large role in
> performance, but taking that factor away
> and focusing on just hardware, what is the best hardware to get for Pg
> to work at the highest level
> (meaning speed at returning results)?

It really depends on your budget and workload. Will it be read-heavy or
write-heavy? How large will the database be? Are those concurrent users
actively executing queries or is the actual concurrent query load lower
(it normally is)?
You should probably also try to estimate the number of concurrently
executed queries and how heavy those queries are, as that is normally
more important as a performance measure. And normally it's much less than
the number of concurrently connected users.
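One rough way to make that estimate is Little's law: the average number of queries in flight equals the arrival rate times the average query duration. The sketch below is an illustration only; the function name and the example numbers are hypothetical and not taken from any real workload.

```python
def concurrent_queries(queries_per_second: float, avg_query_seconds: float) -> float:
    """Average number of queries executing at the same time (Little's law)."""
    return queries_per_second * avg_query_seconds


# Hypothetical load: 1000 connected users who together issue 200 queries
# per second, each taking 50 ms on average.
in_flight = concurrent_queries(200, 0.05)
print(in_flight)
```

In this made-up case only about ten queries run at any moment, even though a thousand users are connected.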

> How does pg utilize multiple processors? The more the better?
> Are queries spread across multiple processors?

It forks a process for each new connection and leaves the multi-CPU
scheduling to the OS. It does not spread a single query across multiple
CPUs. But with many concurrent users you normally don't want or need
that anyway; it would mainly add extra stress to the scheduling of your
operating system.

> Is Pg 64 bit?
It can be compiled as 64-bit and is available pre-compiled as 64-bit as well.

> If so what processors are recommended?

I think the x86-class CPUs deliver the most bang for the buck and are the
best tested with postgres. Both AMD and Intel CPUs are pretty good, but
I think a system with two Intel quad-core CPUs is currently at a very
good price/performance point. Obviously you'll need to match the CPUs to
your load; you may need more CPU cores.

> Its pretty old (2003) but is it still accurate? if this statement is
> accurate how would it affect connection pooling software like pg_pool?

It just keeps the process alive as long as the connection isn't closed,
nothing fancy or worrisome going on there. That's just the behavior I'd
expect at the connection pool-level.

> RAM? The more the merrier right? Understanding shmmax and the pg
> config file parameters for shared mem has to be adjusted to use it.

More is better, but don't waste your money on it if you don't need it:
if your database (or the active part of it) is smaller than the RAM,
adding more doesn't do that much. I would be especially careful with
configurations that require those very expensive 4GB modules.

> Disks? standard Raid rules right? 1 for safety 5 for best mix of
> performance and safety?

Make sure you have a battery-backed controller (or multiple), and
consider raid 10 if you have many writes and raid 5 or 50 if you
have a read-heavy environment. There are also people reporting that it's
faster to actually build several raid 1's and use the OS to combine them
into a raid 10.
Be careful with the number of disks: in performance terms you're likely
better off with 16x 73GB than with 8x 146GB.
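The reasoning behind the disk count can be sketched with a very simple model: both layouts give the same usable space in raid 10, but random I/O scales roughly with the number of spindles. The 150 IOPS per disk figure and the class below are assumptions for illustration; real arrays also depend on the controller, cache and access pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid10Array:
    disks: int
    disk_gb: int
    iops_per_disk: int = 150  # assumed figure, not a measurement

    @property
    def usable_gb(self) -> int:
        # Mirroring halves the raw capacity.
        return self.disks * self.disk_gb // 2

    @property
    def random_read_iops(self) -> int:
        # Reads can be served by every disk in the array.
        return self.disks * self.iops_per_disk

    @property
    def random_write_iops(self) -> int:
        # Each write has to land on both halves of a mirror pair.
        return self.disks * self.iops_per_disk // 2


many_small = Raid10Array(disks=16, disk_gb=73)
few_large = Raid10Array(disks=8, disk_gb=146)

for array in (many_small, few_large):
    print(array.disks, array.usable_gb, array.random_read_iops, array.random_write_iops)
```

Under these assumptions both arrays offer 584GB, while the 16-disk array has twice the random read and write capacity.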

> Any preference of SCSI over SATA? What about using a High speed (fibre
> channel) mass storage device?

I'd consider only SAS (Serial Attached SCSI, the successor of SCSI) for
a relatively small high-performance storage array. Fibre channel is so
much more expensive that you'll likely get much less performance for
the same amount of money. And I'd only use SATA in such an environment
if the amount of storage, not its performance, is the main metric, i.e.
for file storage and backups.

Best regards,

Arjen
4) Arjen van der Meijden Re: Problems with + 1 million record table
PostgreSQL - Performance
On 5-10-2007 16:34 Cláudia Macedo Amorim wrote:
> [13236.470] statement_type=0, statement='select
> a_teste_nestle."CODCLI",
> a_teste_nestle."CODFAB",
> a_teste_nestle."CODFAMILIANESTLE",
> a_teste_nestle."CODFILIAL",
> a_teste_nestle."CODGRUPONESTLE",
> a_teste_nestle."CODSUBGRUPONESTLE",
> a_teste_nestle."CONDVENDA",
> a_teste_nestle."DATA",
> a_teste_nestle."DESCRICAO",
> a_teste_nestle."PESO",
> a_teste_nestle."PRACA",
> a_teste_nestle."PUNIT",
> a_teste_nestle."PVENDA",
> a_teste_nestle."QT",
> a_teste_nestle."QTITVENDIDOS",
> a_teste_nestle."QTPESOPREV",
> a_teste_nestle."QTVENDAPREV",
> a_teste_nestle."SUPERVISOR",
> a_teste_nestle."VENDEDOR",
> a_teste_nestle."VLVENDAPREV"
> from a_teste_nestle
>  
> '

Is that the entire query? Are you sure you really want to select the
entire table without having a where-clause? That's normally not a very
scalable approach...
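As an illustration of the alternative, the sketch below restricts the result with a where-clause and fetches one page at a time. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it can run anywhere. The column types, the choice to filter on "DATA", the sample rows and the helper name are all assumptions; only the table and column names come from the quoted query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE a_teste_nestle ("CODCLI" INTEGER, "DATA" TEXT, "PVENDA" REAL)'
)
# Made-up sample data: 1000 rows spread over 28 days.
rows = [(i, f"2007-10-{(i % 28) + 1:02d}", i * 1.5) for i in range(1000)]
conn.executemany("INSERT INTO a_teste_nestle VALUES (?, ?, ?)", rows)


def fetch_page(conn, day, limit=50, offset=0):
    """Fetch one page of rows for a single day instead of the whole table."""
    sql = (
        'SELECT "CODCLI", "DATA", "PVENDA" FROM a_teste_nestle '
        'WHERE "DATA" = ? ORDER BY "CODCLI" LIMIT ? OFFSET ?'
    )
    return conn.execute(sql, (day, limit, offset)).fetchall()


page = fetch_page(conn, "2007-10-05", limit=10)
print(len(page), page[0])
```

On a table with over a million rows, an index on the filtered column would normally be needed as well for this to pay off.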

Best regards,

Arjen
5) Arjen van der Meijden Re: [Xapian-discuss] Processing the Xapian results
Xapian
Afaik you pre-process your form-input data using php?

In that case you can construct a list of all forums a user is allowed to
see. That list can be easily combined with the forums the user wants to
include in the search.

On the indexing side you can add a boolean term indicating the topic's
forumid.

And in the search code you can then add a boolean OR construction for
that list of forums a user may see. (In SQL terms you'd add something
like: AND topic.forumId IN (f1, f2, f3), which you'll have to expand to:
AND (topic.forumId = f1 OR topic.forumId = f2 OR topic.forumId = f3).)
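A minimal sketch of that bookkeeping, in plain Python rather than PHP: it combines the forums a user may see with the forums requested, then builds one boolean term per forum. The "XFORUM" prefix and all function names are hypothetical. With the Xapian bindings, such terms would typically be ORed together and applied to the text query as a filter; that part is left out here so the example runs without Xapian installed.

```python
from typing import Iterable, List, Optional, Set


def visible_forums(allowed: Set[int], requested: Optional[Set[int]] = None) -> List[int]:
    """Forums to search: everything allowed, narrowed to the request if given."""
    chosen = allowed & requested if requested else allowed
    return sorted(chosen)


def forum_filter_terms(forum_ids: Iterable[int], prefix: str = "XFORUM") -> List[str]:
    """One boolean term per forum; the prefix is an assumed convention."""
    return [f"{prefix}{fid}" for fid in forum_ids]


def sql_equivalent(forum_ids: List[int]) -> str:
    """The expanded SQL form from the explanation above, for comparison."""
    if not forum_ids:
        return "AND FALSE"
    clauses = " OR ".join(f"topic.forumId = {fid}" for fid in forum_ids)
    return f"AND ({clauses})"


forums = visible_forums(allowed={1, 2, 3, 5}, requested={2, 5, 9})
print(forums)                      # [2, 5]
print(forum_filter_terms(forums))  # ['XFORUM2', 'XFORUM5']
print(sql_equivalent(forums))      # AND (topic.forumId = 2 OR topic.forumId = 5)
```

The same term format has to be used at indexing time, so that each topic document carries the term for its own forum.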

Best regards,

Arjen


Simon de la Court wrote:
> Dear Xapian list members,
>
> My first post ever on this list, so please don't go hard on me ;)
>
> I have recently started to work with Xapian. I run a forum for an
> opensource game project, and we have trouble with searching in our
> topics. MySQL's fulltext search is very easy to set up but does not give
> the right results, and Google did not index older topics. So I read the
> xapian docs and it looked pretty cool to me, a lot more precise than
> fulltext and can index everything I want.
>
> But I've got a problem. Indexing is easy (well, not entirely: instead of
> indexing single messages I have to index whole topics), but not everybody is
> allowed to see every topic, and topics are in different subfora. I
> have not yet found a way to get Xapian to show me only the topics I
> want, so only the topics for one subforum (if asked) and only the topics
> my visitor is allowed to see. Xapian just gives me all
> the results found.
>
> The other option for me is to process these results in PHP, but that
> would take quite some processing power, and would require caching the
> results, because I want these to be spread over several pages.
>
> Has anyone here done a similar implementation of Xapian? I hope you can
> help me out.
>
> Thanks in advance and greetings,
>
> Simon de la Court
>