Bradley wrote:
Hi,
I've decided to use Xapian because the files table in my MySQL database is
going to grow very large, and it seems MySQL isn't good at full-text searching.
I'm doing this with the PHP wrapper, by the way.

The way my system is set out, each user has their own set of files, and a
search will be over a specific user's files (based on file name, title and
description). At some point, though, we may decide we want to search the
files of a list of users, or of all users.

I was planning on having a Xapian database for each user's files. Would it be
better this way (multiple databases), or to have one large database for all
users' files, as I'm doing with MySQL? I'm thinking mainly with regard to
performance, but feel free to add other thoughts.

Thanks
Bradley

If I were doing it, I'd do it your way, with one DB per user. Searching a
single user's DB will most likely be faster than searching one large DB.
Once you allow your users to search multiple DBs you can evaluate
performance and see if merging them makes sense.

Consider:
1. Are the searches of multiple DBs fast enough?
2. How often are multiple DBs searched?

If you need to merge them, there is a utility, xapian-compact,
(http://xapian.org/docs/admin_notes.html#merging-databases) that will do
it for you with a minimum of effort.
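
As a rough illustration of the merge step, here is a minimal Python sketch that builds and runs the xapian-compact command line (source databases first, destination last). The directory names are made up for the example, and it assumes xapian-compact is on your PATH.

```python
import subprocess


def build_compact_command(source_dirs, dest_dir):
    """Return the argv list for merging several Xapian databases into one."""
    if not source_dirs:
        raise ValueError("at least one source database is required")
    # xapian-compact takes the source databases first, the destination last.
    return ["xapian-compact", *[str(d) for d in source_dirs], str(dest_dir)]


def merge_databases(source_dirs, dest_dir):
    """Run xapian-compact; raises CalledProcessError if the tool fails."""
    subprocess.run(build_compact_command(source_dirs, dest_dir), check=True)


if __name__ == "__main__":
    # Hypothetical per-user database directories.
    print(" ".join(build_compact_command(["dbs/user_1", "dbs/user_2"],
                                         "dbs/all_users")))
```

Keeping the command construction separate from the call that runs it makes it easy to check the arguments before touching any real database.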

You didn't ask, but here are a few things to consider.

1. Xapian searches will not be looking at realtime data. It takes a
finite amount of time to add new entries. The larger the database, the
longer it will take to index new entries.
1.1. Be sure to have something in the MySQL table that either says "This
row has been added to Xapian" or records a last-changed timestamp.
Periodically add new entries to the Xapian DB by comparing timestamps or
by selecting on the "is_added" field.
2. Consider ping-ponging between two Xapian DBs when updating. I use the
following logic.
I have two directories with Xapian DBs, A and B.

    if A is older than B
        copy contents of B into A
    else
        copy contents of A into B
    add new entries to the copy
    if the copy is A
        rm C
        ln -s A C
    if the copy is B
        rm C
        ln -s B C

where C is the database that I am using to search.
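
The ping-pong logic above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: "older" is judged by directory modification time (the post does not say how it is measured), `add_new_entries` is a hypothetical callback that does the actual Xapian indexing, and the symlink is swapped with a rename instead of rm followed by ln -s, so that C never disappears between the two steps.

```python
import os
import shutil


def pick_stale(dir_a, dir_b):
    """Return (stale, fresh); the older directory is the one to overwrite."""
    if os.path.getmtime(dir_a) < os.path.getmtime(dir_b):
        return dir_a, dir_b
    return dir_b, dir_a


def ping_pong_update(dir_a, dir_b, link, add_new_entries):
    """Refresh the older DB from the newer one, update it, repoint the link."""
    stale, fresh = pick_stale(dir_a, dir_b)
    # Overwrite the stale copy with the contents of the fresh one.
    shutil.rmtree(stale)
    shutil.copytree(fresh, stale)
    # Hypothetical callback: index the new rows into the copy.
    add_new_entries(stale)
    # Stamp the directory so it counts as the newer one next time round.
    os.utime(stale, None)
    # Create the new symlink under a temporary name, then rename it over
    # the old one (assumption: POSIX rename semantics).
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.abspath(stale), tmp_link)
    os.replace(tmp_link, link)
    return stale
```

Searches keep reading through C the whole time; they only see the new entries once the link has been repointed at the updated copy.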

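Going back to point 1.1, the "is_added" bookkeeping might look something like the sketch below. It uses Python's built-in sqlite3 purely as a stand-in for MySQL so the example is self-contained. The table layout (files, id, name, title, description, is_added) and the `index_row` callback are assumptions made for illustration, not anything prescribed by Xapian.

```python
import sqlite3


def index_pending_rows(conn, index_row):
    """Index every row not yet flagged, then flag it. Returns the row count."""
    rows = conn.execute(
        "SELECT id, name, title, description FROM files "
        "WHERE is_added = 0 ORDER BY id"
    ).fetchall()
    for row in rows:
        # Hypothetical callback that adds the row to the Xapian DB.
        index_row(row)
        conn.execute("UPDATE files SET is_added = 1 WHERE id = ?", (row[0],))
    # The flags are committed together once every row has been indexed.
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, title TEXT, "
        "description TEXT, is_added INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "INSERT INTO files (name, title, description) VALUES (?, ?, ?)",
        ("report.pdf", "Report", "Quarterly report"),
    )
    print(index_pending_rows(conn, print), "row(s) indexed")
```

Run periodically, a job like this only ever touches rows added since the last pass, so its cost depends on the number of new rows rather than the size of the table.
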
Jim.

Discussion Overview
group: xapian-discuss
categories: xapian
posted: Nov 21, '08 at 2:29a
active: Nov 30, '08 at 8:50p
posts: 6
users: 4
website: xapian.org
irc: #xapian