Grokbase
Topics Posts Groups | in
x
[ help ]

Richard Huxton (d...@archonet.com)

Profile | Posts (3733)

User Information

Display Name:Richard Huxton
Partial Email Address:d...@archonet.com
Posts:
3733 total
21 in PostgreSQL - Admin
65 in PostgreSQL - Advocacy
98 in PostgreSQL - Bugs
24 in PostgreSQL - Docs
2312 in PostgreSQL - General
10 in PostgreSQL - Novice
54 in PostgreSQL - ODBC
408 in PostgreSQL - Performance
2 in PostgreSQL - PHP
822 in PostgreSQL - SQL

5 Most Recent

All Posts
1) Richard Huxton Re: [GENERAL] ts_headline
| +1 vote
What does this give you: show default_text_search_config; I get pg_catalog.english and the same...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Stephen Davies wrote:
> I just spotted the difference between your test and mine.
>
> My query says:
>
> select ts_headline(abstract,to_tsquery('english','database'),'minWords = 99,
> maxWords = 999') from document where id=21;
>
> where your equivalent does not include the 'english' arg.
>
> If I take out the 'english' from this query, I get the same result as you.

What does this give you:
   show default_text_search_config;
I get pg_catalog.english and the same result for the query whether I use:
    to_tsquery('english','database')
or to_tsquery('pg_catalog.english','database')

Could you be picking up a bad "english" configuration (see \dF)?

> However, the following returns zero rows:
>
> select title,author,ts_headline(abstract,to_tsquery('database') from document
> where clob @@ to_tsquery('database')

I take it "clob" matches "abstract"?

> It gets more interesting:
>
> select title,author,ts_headline(abstract,to_tsquery('database') from document
> where clob @@ to_tsquery('english','database')
>
> returns the "correct" result - one row with the expected headline.

Now that *is* strange. ts_headline() works without specifying 'english'
but the actual search works the other way.

> select title,author,ts_headline(abstract,to_tsquery('english','thesaurus')
> from document where clob @@ to_tsquery('english','thesaurus')
>
> also returns the "correct" result.
>
> I suggest that the above indicates a bug somewhere.

Could be - it'd be good to rule out a bad config. You might have an
unexpected list of stopwords or similar.

Let's try:
  SELECT ts_debug('the database and thesaurus');
  SELECT ts_debug('english', 'the database and thesaurus');
  SELECT ts_debug('pg_catalog.english', 'the database and thesaurus');
I'd expect "the", "and" to be stripped out as stopwords and the other
two to get through (database stemmed to "databas").


--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [email protected: majo...@postgresql.org] so that your
message can get through to the mailing list cleanly
2) Richard Huxton Re: [GENERAL] ts_headline
| +1 vote
Well, no-one has been using 8.3 for more than a few weeks, so I don't think anyone has a lot of...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Stephen Davies wrote:
> Interesting. I hadn't seen that section before.
>
> As I said in my original post: "Is this a bug or am I missing some
> configuration option".
>
> I shall investigate the stuff in 12.8.
> Any suggestions as to where to start?

Well, no-one has been using 8.3 for more than a few weeks, so I don't
think anyone has a lot of experience with it.

I'm getting to_tsquery('database') stemming to "databas" and
to_tsvector(<your text>) gives me the following for that token:

'databas':305,630,642,663,698,719,746,870,872,951,961,974,993,1034,1042,1159,1217,1223,1238,1244,1265,1343,1357,1399,1434,1758,1818,1821,1834,2212,2240,2258,2278,2389,2529


--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
3) Richard Huxton Re: [GENERAL] Querying the schema for column widths - what syntax do I use?
| +1 vote
Unless you're after PG-specific stuff, it's probably best to use the information-schema....
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Howard Wilkinson wrote:
> I am working on some upgrades to the MyDNS open source product. I have
> some expertise in MySQL but am not overly familiar with PostgreSQL and
> need some guidance on how to query the schema for the maximum size of
> data a column can hold.

Unless you're after PG-specific stuff, it's probably best to use the
information-schema.

http://www.postgresql.org/docs/8.3/static/information-schema.html

This has a standard layout cross-database, so will help you to extend
you app across other systems. I think it's supported in the latest
version of MySQL too.

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
match
4) Richard Huxton Re: [GENERAL] ts_headline
| +1 vote
Seems OK here - might need to look at your configuration settings....
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Stephen Davies wrote:
> Attached is the "document" in question.
>
> Searches for "norwegian", "thesaurus" and "statement" give good results. A
> search for "database" gives the plain text from the beginning.

Seems OK here - might need to look at your configuration settings.
http://www.postgresql.org/docs/8.3/static/textsearch-debugging.html

I'll make sure I've got a clean setup here and re-run the test.


SELECT ts_headline(t, to_tsquery('database')) FROM tsearch_test;
                                               ts_headline
-------------------------------------------------------------------------------------------------------
<b>database</b> (using a 2 KB page) to a Large File Support (LFS)
<b>database</b> (using an 8 KB page
(1 row)

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
5) Richard Huxton Re: [GENERAL] ts_headline
| +1 vote
Hmm - a simple test seems to work OK. SELECT ts_headline( repeat('apple banana carrot ', 100),...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Stephen Davies wrote:
> G'day Richard.
>
> I don't think so. A sample command is:
>
> ts_headline(abstract,to_tsquery('english','database'),'minWords = 99, maxWords
> = 999')
>
> I have also tried with smaller maxwords without any visible effect.

Hmm - a simple test seems to work OK.

SELECT ts_headline( repeat('apple banana carrot ', 100),
to_tsquery('apple'));
                                                               ts_headline
----------------------------------------------------------------------------------------------------------------------------------------
<b>apple</b> banana carrot <b>apple</b> banana carrot <b>apple</b>
banana carrot <b>apple</b> banana carrot <b>apple</b> banana carrot
(1 row)

It's not just the start of the text either:

SELECT ts_headline( repeat('elephant ', 100) || repeat('apple banana
carrot ', 100), to_tsquery('apple'));
                                                               ts_headline
----------------------------------------------------------------------------------------------------------------------------------------
<b>apple</b> banana carrot <b>apple</b> banana carrot <b>apple</b>
banana carrot <b>apple</b> banana carrot <b>apple</b> banana carrot
(1 row)

Can you provide a piece of text that shows the problem?

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

spacer
Profile | Posts (3733)
Home > People > Richard Huxton