Grokbase

Olly Betts (o...@survex.com)


User Information

Display Name: Olly Betts
Partial Email Address: o...@survex.com
Posts: 593 total
1 in PostgreSQL - General
592 in Xapian

5 Most Recent

1) Olly Betts Re: [Xapian-discuss] simple frequency list
On Thu, Mar 13, 2008 at 09:24:47AM +0530, s|s wrote:
> I wrote a python script for finding the frequency of terms in a database.
> Thought of sharing since it could come in handy for introspecting a
> database. Possibly it could be included as an example in the python
> bindings.

There's a wiki page for contributing examples:

http://wiki.xapian.org/SampleCode

FWIW, I think it would be more natural to write the loop as:

    for t in database.allterms():
        print t.termfreq, t.term

Cheers,
    Olly

_______________________________________________
Xapian-discuss mailing list
Xapian-di...@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss
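A minimal sketch of turning the loop above into a sorted frequency list. Like the loop, it assumes that each item yielded by `database.allterms()` has `term` and `termfreq` attributes. The `frequency_list` helper and the stand-in `sample` items are hypothetical, so the sketch runs without a Xapian database.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def frequency_list(terms: Iterable) -> List[Tuple[int, object]]:
    """Return (termfreq, term) pairs, most frequent first.

    Assumption: `terms` yields items with `.term` and `.termfreq`,
    as in the allterms() loop above.
    """
    pairs = [(t.termfreq, t.term) for t in terms]
    # Highest frequency first; ties broken by ascending term.
    pairs.sort(key=lambda pair: (-pair[0], pair[1]))
    return pairs


# Hypothetical stand-in items so the sketch runs without a database.
sample = [
    SimpleNamespace(term="potter", termfreq=3),
    SimpleNamespace(term="goblet", termfreq=1),
    SimpleNamespace(term="harry", termfreq=3),
]

for freq, term in frequency_list(sample):
    print(freq, term)
```

With a real database you would pass `database.allterms()` instead of `sample`. Sorting in Python is fine for introspection, but it holds every term in memory at once.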
2) Olly Betts Re: [Xapian-discuss] Query time boosting
On Fri, Mar 14, 2008 at 03:12:48PM +0000, Robert Young wrote:
> Does Xapian support query time boosting?

Is this what you mean?

http://wiki.xapian.org/FAQ#head-3c17843de2310bf942166d976b212acfce9ddc89

Cheers,
    Olly

3) Olly Betts Re: [Xapian-discuss] Spell Checking
On Fri, Mar 14, 2008 at 04:38:59PM +0000, Martin Hearn wrote:
> Continuing my investigation with spelling, I have a record with this
> data: "Harry Potter and the Goblet of Fire"
>
> delve -r 1237832 full
> Term List for record #1237832: Zand Zfire Zgoblet Zharri Zjk Zof
> Zpotter Zrowl Zthe and fire goblet harry id1237832 jk of potter
> rowling the
>
> Should I expect the spell checker to suggest this record

The spell checker suggests words, not records/documents.

> if I search "Harry Porter Goblet" rather than "Harry Potter Goblet"
> (replacing a t  in potter with r)

I can't tell you from the information given.  I'd need to know what
spellings you've added.

If you can post a short example program which creates the index from
scratch and then shows the spell correction not working as you'd
expect, it would be a lot easier to understand what's going on.

Cheers,
    Olly

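The answer depends on which spellings have been added. A toy word-level corrector shows why. It is not Xapian's algorithm: it picks the closest added spelling by Levenshtein edit distance, under a hypothetical limit of 2 edits, and breaks ties by frequency. Like the spell checker described above, it suggests words, not documents. `edit_distance`, `suggest` and the `added` table are illustrative names only.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def suggest(word: str, spellings: Dict[str, int], max_edits: int = 2) -> Optional[str]:
    """Return the closest added spelling for one word, or None.

    `spellings` maps each added word to a frequency. Candidates must be
    within `max_edits` edits; ties go to the higher frequency.
    """
    if word in spellings:
        return None  # already a known spelling, nothing to correct
    best_key, best = None, None
    for candidate, freq in spellings.items():
        distance = edit_distance(word, candidate)
        if distance > max_edits:
            continue
        key = (distance, -freq, candidate)
        if best_key is None or key < best_key:
            best_key, best = key, candidate
    return best


# Hypothetical spellings; with an empty table nothing can be suggested.
added = {"harry": 1, "potter": 1, "goblet": 1}
query = "harry porter goblet"
print(" ".join(suggest(w, added) or w for w in query.split()))
```

If "potter" had never been added as a spelling, "porter" would come back unchanged. That is why a self-contained example that builds the index from scratch makes such reports much easier to diagnose.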
4) Olly Betts Re: [Xapian-discuss] UTF-8 Corruption
On Fri, Mar 14, 2008 at 11:14:56PM +0000, Colin Bell wrote:
> I was wondering if anyone ever came across a problem I seem to be
> having. I'm indexing text files using some basic code written in
> C++. The text files may or may not be in UTF-8, ISO 8859-1 or possibly
> (but very rarely) even some other format; I have no way of knowing.

There are ways to detect the character set of a file, though not always
100% reliably.

> Question is, does Xapian convert non-UTF-8 characters when it stores
> the document? I think I read that UTF-8 is the default encoding for
> Xapian, which is exactly what I am after.

Most of Xapian treats things as opaque data.  The classes which need
to know are Xapian::Stem, Xapian::QueryParser, and
Xapian::TermGenerator.  The UTF-8 parsing used by the latter two will
treat invalid sequences as if they were ISO-8859-1, which for
real-world examples will almost always do the right thing when fed
ISO-8859-1.  Xapian::Stem currently uses Snowball's UTF-8 parsing code;
I'm not sure how that handles invalid sequences.

> The reason I'm asking is that I am getting some seriously corrupted
> characters in the index. When they are displayed on Tomcat I get a
> "sun.io.MalformedInputException" when trying to display the search
> results. I have set the page's charset to UTF-8 and apparently this
> error is thrown when the streamreader detects characters that are
> not proper UTF-8 characters.

If you set document data, document values, or directly add terms (using
Document::add_posting() or Document::add_term()) then you'll get back
what you put in verbatim.  So if you pass in something which is invalid
UTF-8, it will still be invalid.

If you pass data through Xapian::Utf8Iterator before doing anything with
it, then this will fix bad UTF-8.  This is essentially what omindex
does to deal with this problem.

Cheers,
    Olly

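Here is a minimal Python sketch of the repair described above. Sequences that are already valid UTF-8 are kept. Each byte of an invalid sequence is reinterpreted as ISO-8859-1 and re-encoded as UTF-8. This mimics the behaviour described for Xapian::QueryParser and Xapian::TermGenerator, but it is not Xapian::Utf8Iterator itself. The byte-at-a-time fallback, the `latin1-fallback` handler name and `to_valid_utf8` are assumptions for illustration.

```python
import codecs


def _latin1_fallback(exc):
    """Codec error handler: read one offending byte as ISO-8859-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_byte = exc.object[exc.start:exc.start + 1]
    # Return the replacement text and the position to resume decoding from.
    return bad_byte.decode("latin-1"), exc.start + 1


codecs.register_error("latin1-fallback", _latin1_fallback)


def to_valid_utf8(data: bytes) -> bytes:
    """Return data as valid UTF-8, keeping sequences that were already valid."""
    text = data.decode("utf-8", errors="latin1-fallback")
    return text.encode("utf-8")


# A UTF-8 word followed by an ISO-8859-1 word, as from mixed input files.
mixed = "Café ".encode("utf-8") + "naïve".encode("latin-1")
print(to_valid_utf8(mixed).decode("utf-8"))
```

Cleaning the text this way before calling Document::add_term(), or before setting document data or values, means a strict UTF-8 reader such as the Tomcat one above should not hit malformed input later. Input in other 8-bit character sets would still be misread as ISO-8859-1.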
5) Olly Betts Re: [Xapian-discuss] Enquire set_cutoff problem
On Mon, Mar 17, 2008 at 12:42:21PM +0100, Donato Di Leo wrote:
> I found out the problem ...
> I wrote the cutoff value as decimal value like 70.0 and in this way the
> procedure doesn't work

Yes, this isn't totally ideal but is a consequence of how SWIG
implements overloading by type in PHP, which doesn't support such
overloading directly.

Perhaps an integer parameter should accept either an integer or floating
point argument if there are no ambiguous overloads (as here).

> If I write 70 (integer value) the procedure works fine

I'm glad you've resolved your problem, but as a general tip, please
report code that doesn't work *exactly as you tried it*.  All we have
to go on when you report a problem is the information you give, so if
that isn't correct, it is so much harder to help.

You said you had tried this:

$enquire->set_cutoff(70);

If instead you'd said you'd tried it with 70.0, we'd have been able to
resolve your problem right away.

Cheers,
    Olly

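The relaxed check suggested above can be sketched in Python. This is not PHP and not SWIG-generated code. The sketch tries exact type matches first, then lets a whole-number float fill an `int` parameter, but only if exactly one overload accepts the converted arguments. `call_overloaded`, `_coerce` and the overload table are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Overload = Tuple[Tuple[type, ...], Callable]


def _coerce(param_type: type, arg):
    """Convert arg for param_type under the relaxed rule, or raise TypeError."""
    if type(arg) is param_type:
        return arg
    # Relaxed rule: a float with no fractional part may fill an int parameter.
    if param_type is int and type(arg) is float and arg.is_integer():
        return int(arg)
    raise TypeError("%r does not fit %s" % (arg, param_type.__name__))


def call_overloaded(overloads: Sequence[Overload], *args):
    # Pass 1: strict matching on exact argument types.
    for sig, func in overloads:
        if len(sig) == len(args) and all(type(a) is t for t, a in zip(sig, args)):
            return func(*args)
    # Pass 2: relaxed matching, accepted only if it is unambiguous.
    candidates = []
    for sig, func in overloads:
        if len(sig) != len(args):
            continue
        try:
            converted = [_coerce(t, a) for t, a in zip(sig, args)]
        except TypeError:
            continue
        candidates.append((func, converted))
    if len(candidates) == 1:
        func, converted = candidates[0]
        return func(*converted)
    raise TypeError("no unambiguous overload for arguments %r" % (args,))


# Hypothetical overload table loosely modelled on set_cutoff().
set_cutoff_overloads: List[Overload] = [
    ((int,), lambda percent: ("percent", percent)),
    ((int, float), lambda percent, weight: ("percent+weight", percent, weight)),
]

print(call_overloaded(set_cutoff_overloads, 70))
print(call_overloaded(set_cutoff_overloads, 70.0))
```

Until the bindings behave like this, casting explicitly in PHP, e.g. `$enquire->set_cutoff((int)$cutoff);`, should avoid the problem, since an integer argument is reported to work.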

spacer
Profile | Posts (593)
Home > People > Olly Betts