User Information
| Display Name: | Oleg Bartunov |
|---|
| Partial Email Address: | o...@sai.msu.su |
| Posts: |
|
| 1) Oleg Bartunov Re: [GENERAL] longest prefix match |
|
|
| you can maintain an additional index for terms backwards. Regards, Oleg... |
|
|
|
|
|
|
|
On Wed, 20 Feb 2008, Jorge Godoy wrote:
> Em Wednesday 20 February 2008 05:55:07 Dragan Zubac escreveu: >> >> Anybody got any ideas/experiences/links for 'longest prefix match' >> solution in PostgreSQL ? >> Basically,put some telephone prefices in some kind of trie,and be able >> to perform fast lookups ? > > Prefix or suffix? > > For prefix you can use "SELECT number FROM table WHERE number LIKE '123%'". > > For suffix you change the "%" to the beginning of the string, but then loose > the ability to use indices. (Unfortunately, using suffixes is really > interesting for caller IDs since you don't always receive area code, country > code, etc.)
you can maintain an additional index for terms backwards. >> Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/phone: +007(495)939-16-83, +007(495)939-23-83 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
|
|
|
| 2) Oleg Bartunov Re: [GENERAL] DB design: How to store object properties? |
|
|
| Max, we use contrib/hstore specially designed for such kind of problem. It's a sort of perl's hash,... |
|
|
|
|
|
|
|
Max, we use contrib/hstore specially designed for such kind of problem. It's a sort of perl's hash, where you can store all specific properties. In that way, your table will looks like create table objects ( id integer, x real, y real, ...., props hstore) Here '...' designates other mandatory fields you want to be searched. Oleg On Sun, 17 Feb 2008, Maxim Khitrov wrote: > Greetings everyone,>> I'm fairly new to PostgreSQL, but I'm currently doing some research on> tools to be used for an upcoming project and could really use your> help with a possible database design.>> So the scenario is this. We have two projects starting that will deal> heavily with mapping spatial regions. One of the reasons I'm looking> at PostgreSQL is the PostGIS extension that may help us in dealing> with all the geometry. The first project will be mapping the outside> environment. Think along the lines of Google Maps, but intended for> use by pedestrians rather than cars. The other project, one that I'll> be managing, will be doing the same thing but with interiors of> buildings. For now, the two projects will be separate, but eventually> the plan is to merge them and actually allow the system to plan paths> that incorporate outside and interior maps.>> That's the project description in a nutshell. What I'm doing now is> trying to figure out the best way to store all the spatial> information. I want to have a foundational layer that is as simple as> possible, and that can later be extended for any special needs.>> The current plan is to do all the mapping via a graph structure using> nodes/vertices and edges. A node will be defined simply in terms of a> GUID (UUID in PostgreSQL). Each node will have a unique id, but no> other "mandatory" information. An edge is defined by the ids of its> two end points. This setup will be common to both projects, and the> use of UUIDs will ensure that any database merges in the future will> not result in conflicts.>> Now here's the tricky bit. We have to be able to associate an> arbitrary number of properties with each node and edge. So an example> of a node property would be that node's position defined by a> latitude/longitude pair. We will define all the possible properties> between our two projects, but just because a property is valid,> doesn't mean that it will be present in or be relevant to every node> in the system. For instance, nodes located inside of a building will> likely have their position defined by something other than latitude> and longitude, since capturing that data with a GPS will not be> possible.>> The simplest design would be to create two tables, one for nodes> another for edges, and create a column for every possible property.> This, however, is huge waste of space, since there will not be a> single node or edge that will make use of all the defined properties.> There may be hundreds of properties, but each node may use ten on> average. That's the question - how do you represent this information> in the database in a space-efficient manner. We still have to be able> to search for specific nodes given their properties. For example ÿÿ> find all the nodes located within radius z of coordinate x,y. PostGIS> will handle the special bit, but there has to be an efficient way of> accessing this information/property for each node that has it.>> One other possibility I thought of was defining a master node table> that would contain just the node ids. Then for each property I would> create a separate table with the id column referencing the master.> Here we're still wasting space for the extra storage of UUIDs, and if> I want to run a query that selects nodes with several different> properties, it would require a long list of JOINs.>> I looked at table inheritance that PostgreSQL offers, but it turned> out to be something different from what I was hoping for. What I need> is some sort of data inheritance based on a common unique key, but I> don't think that's possible. Same thing with partitioning. If there> was a way to partition the columns (with data) of a single large> table, then that would be an ideal solution. But in the absence of> that feature, what design would you recommend?>> Thanks,> Max>> ---------------------------(end of broadcast)---------------------------> TIP 1: if posting/reading through Usenet, please send an appropriate> subscribe-nomail command to [email protected: majo...@postgresql.org] so that your> message can get through to the mailing list cleanly> Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/phone: +007(495)939-16-83, +007(495)939-23-83 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
|
|
|
| 3) Oleg Bartunov Re: [GENERAL] a newbie question on table design |
|
|
| If you can select stable structure (common columns) and additional columns than you can : 1. Use... |
|
|
|
|
|
|
|
If you can select stable structure (common columns) and additional columns than you can : 1. Use base class (table) with common columns + inherited tables with their own additional columns 2. We use contrib/hstore as a storage for semistructured data - we store additional columns as a key->value pairs in hstore data type. This is very flexible design. On Fri, 15 Feb 2008, Leif B. Kristensen wrote: > On Friday 15. February 2008, [email protected: fdu.x...@gmail.com] wrote:>> Hi all,>>>> I have a large sdf file with many records of molecules and associated>> data items and I want to save them in a PostgreSQL database. There are>> about less than 40 data items for every molecule(the names of the data>> items fore each molecule are the same, but values differ). The number>> of molecules may exceed 20 million.>>>> Now I have come up with two ways to construct the table:>>>> 1) a table with about 40 columns, every data item has its>> corresponding column, and one molecule corresponds to one row in the>> table.>>>> This is direct and simple. The drawbacks is if I want to add more data>> types to the database, I have to modify the structure of the table.>>>> 2) a table with just 3 columns:>>>> CREATE TABLE mytable(>> id serial,>> data_name text,>> data_value text>> );>>>> Then a single molecule will corresonds to about 40 rows in the>> database.>> This is a sound concept, but I'd rather store the data_name in a> separate table with an integer key, and replace data_name in mytable> with a data_name_fk REFERENCES data_names (data_name_id). That's just> Occam's Razor applied to database design, aka first normal form.>> You'd probably store the name of the molecule in a third table. Then you> have a model very similar to the classic 'book database' where a book> can have multiple authors, and an author can have multiple books. There> are examples for this design all over the place.>>> If I need to add more data types to the table, I just need to add new>> rows with new "data_name" column values. The drawback of this table is>> it has too many rows(40 times of the former one) and waste a lot>> space.>>>> Which one is better, or there are some other smart ways ?>>>> I have another question. Since the data I need to save is huge, is it>> appropriate that I save the data value in compressed format ?>> That sounds a lot like premature optimization. Postgres is actually> quite good at compacting data natively.> Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/phone: +007(495)939-16-83, +007(495)939-23-83
|
|
|
| 4) Oleg Bartunov Re: [GENERAL] Working with huge amount of data. RESULTS! |
|
|
| suffix tree (array) would speedup '%keyword%' query, but currently it doesn't supported by GiST... |
|
|
|
|
|
|
|
On Tue, 12 Feb 2008, Mario Lopez wrote:
> Hi!, > > I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following > results: > > # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from > table1 where varchar_reverse(data) like varchar_reverse('%keyword');" > real 0m0.055s > user 0m0.011s > sys 0m0.006s > > > # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from > table1 where data like 'keyword%';" > real 0m0.026s > user 0m0.012s > sys 0m0.006s > > It works flawlesly as you can see by the timings, take in consideration that > "table1" has 100 million records. The only problem is generating the reversed > index which takes like 20 minutes, I guess it has to do with the plperl > function, perhaps a C function for inverting would make it up in less time. > > The problem is still with the LIKE '%keyword%', my problem is that I am not > searching for Words in a dictionary fashion, suppose my "data" is random > garbage, that it has common consecutive bytes. How could I generate a > dictionary from this random garbage to make it easier for indexing?
suffix tree (array) would speedup '%keyword%' query, but currently it doesn't supported by GiST extension architecture (we have it in our TODO), so I see several ways (not tested): 1. try contrib/pg_trgm to reduce a number of candidate words 2. generate all possible substrings and use your tested approach >>> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote:>> >>> SELECT * FROM names WHERE name LIKE keyword%>>> Or>>> SELECT * FROM names WHERE name LIKE %keyword%>>> >> >> check this:>> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/>> and this:>> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/>> >> depesz>>>> >>> ---------------------------(end of broadcast)---------------------------> TIP 9: In versions below 8.0, the planner will ignore your desire to> choose an index scan if your joining column's datatypes do not> match> Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/phone: +007(495)939-16-83, +007(495)939-23-83 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster
|
|
|
| 5) Oleg Bartunov Re: [GENERAL] [pgsql-advocacy] PostgreSQL Certification |
|
|
| Can you show us the goals of the PostgreSQL Certification ? I always voted for the united... |
|
|
|
|
|
|
|
Can you show us the goals of the PostgreSQL Certification ? I always voted for the united PostgreSQL Certification program (amin, developer) we could promote with the help of commercial companies. In my opinion, common certificate, valid in all countries will be much more useful than buttons. We have several good authors who can be sponsored to write certification courses with the help of developers. Oleg On Wed, 30 Jan 2008, Joshua D. Drake wrote: > -----BEGIN PGP SIGNED MESSAGE-----> Hash: SHA1>> Hey guys,>> Myself and a small team of PostgreSQL contributors have started a new> community project for PostgreSQL Certification. It is just launching> but we wanted to get it out there so that people can join in on the> discussion now :).>> For more information please visit:>> http://www.postgresqlcertification.org/>> Joshua D. Drake>> - --> The PostgreSQL Company: Since 1997, http://www.commandprompt.com/> PostgreSQL Community Conference: http://www.postgresqlconference.org/> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate> PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit>> -----BEGIN PGP SIGNATURE-----> Version: GnuPG v1.4.6 (GNU/Linux)>> iD8DBQFHoPdMATb/zqfZUUQRAqhlAJ92rMzYpn+k4rGDXpd4WiZwJQcBNACfWNeg> 0zPBFRb4yc6Idpj99PCcFbY=> =Spdr> -----END PGP SIGNATURE----->> ---------------------------(end of broadcast)---------------------------> TIP 5: don't forget to increase your free space map settings> Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/phone: +007(495)939-16-83, +007(495)939-23-83
|
|
|
|
 | |