Grokbase
Topics Posts Groups | in
x
[ help ]

Oleg Bartunov (o...@sai.msu.su)

Profile | Posts (716)

User Information

Display Name:Oleg Bartunov
Partial Email Address:o...@sai.msu.su
Posts:
716 total
1 in Perl 5 Porters
56 in PostgreSQL - Admin
74 in PostgreSQL - Advocacy
1 in PostgreSQL - Announce
6 in PostgreSQL - Bugs
11 in PostgreSQL - Docs
386 in PostgreSQL - General
21 in PostgreSQL - Interfaces
1 in PostgreSQL - JDBC
1 in PostgreSQL - Novice
60 in PostgreSQL - Performance
1 in PostgreSQL - PHP
113 in PostgreSQL - SQL

5 Most Recent

All Posts
1) Oleg Bartunov Re: [GENERAL] longest prefix match
| +1 vote
you can maintain an additional index for terms backwards. Regards, Oleg...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
On Wed, 20 Feb 2008, Jorge Godoy wrote:

> Em Wednesday 20 February 2008 05:55:07 Dragan Zubac escreveu:
>>
>> Anybody got any ideas/experiences/links for 'longest prefix match'
>> solution in PostgreSQL ?
>> Basically,put some telephone prefices in some kind of trie,and be able
>> to perform fast lookups ?
>
> Prefix or suffix?
>
> For prefix you can use "SELECT number FROM table WHERE number LIKE '123%'".
>
> For suffix you change the "%" to the beginning of the string, but then loose
> the ability to use indices. (Unfortunately, using suffixes is really
> interesting for caller IDs since you don't always receive area code, country
> code, etc.)

you can maintain an additional index for terms backwards.


>
>

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
2) Oleg Bartunov Re: [GENERAL] DB design: How to store object properties?
| +1 vote
Max, we use contrib/hstore specially designed for such kind of problem. It's a sort of perl's hash,...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Max,

we use contrib/hstore specially designed for such kind of problem. It's a
sort of perl's hash, where you can store all specific properties.
In that way, your table will looks like

create table objects ( id integer, x real, y real, ....,  props hstore)

Here '...' designates other mandatory fields you want to be searched.

Oleg
On Sun, 17 Feb 2008, Maxim Khitrov wrote:

> Greetings everyone,
>
> I'm fairly new to PostgreSQL, but I'm currently doing some research on
> tools to be used for an upcoming project and could really use your
> help with a possible database design.
>
> So the scenario is this. We have two projects starting that will deal
> heavily with mapping spatial regions. One of the reasons I'm looking
> at PostgreSQL is the PostGIS extension that may help us in dealing
> with all the geometry. The first project will be mapping the outside
> environment. Think along the lines of Google Maps, but intended for
> use by pedestrians rather than cars. The other project, one that I'll
> be managing, will be doing the same thing but with interiors of
> buildings. For now, the two projects will be separate, but eventually
> the plan is to merge them and actually allow the system to plan paths
> that incorporate outside and interior maps.
>
> That's the project description in a nutshell. What I'm doing now is
> trying to figure out the best way to store all the spatial
> information. I want to have a foundational layer that is as simple as
> possible, and that can later be extended for any special needs.
>
> The current plan is to do all the mapping via a graph structure using
> nodes/vertices and edges. A node will be defined simply in terms of a
> GUID (UUID in PostgreSQL). Each node will have a unique id, but no
> other "mandatory" information. An edge is defined by the ids of its
> two end points. This setup will be common to both projects, and the
> use of UUIDs will ensure that any database merges in the future will
> not result in conflicts.
>
> Now here's the tricky bit. We have to be able to associate an
> arbitrary number of properties with each node and edge. So an example
> of a node property would be that node's position defined by a
> latitude/longitude pair. We will define all the possible properties
> between our two projects, but just because a property is valid,
> doesn't mean that it will be present in or be relevant to every node
> in the system. For instance, nodes located inside of a building will
> likely have their position defined by something other than latitude
> and longitude, since capturing that data with a GPS will not be
> possible.
>
> The simplest design would be to create two tables, one for nodes
> another for edges, and create a column for every possible property.
> This, however, is huge waste of space, since there will not be a
> single node or edge that will make use of all the defined properties.
> There may be hundreds of properties, but each node may use ten on
> average. That's the question - how do you represent this information
> in the database in a space-efficient manner. We still have to be able
> to search for specific nodes given their properties. For example ÿÿ
> find all the nodes located within radius z of coordinate x,y. PostGIS
> will handle the special bit, but there has to be an efficient way of
> accessing this information/property for each node that has it.
>
> One other possibility I thought of was defining a master node table
> that would contain just the node ids. Then for each property I would
> create a separate table with the id column referencing the master.
> Here we're still wasting space for the extra storage of UUIDs, and if
> I want to run a query that selects nodes with several different
> properties, it would require a long list of JOINs.
>
> I looked at table inheritance that PostgreSQL offers, but it turned
> out to be something different from what I was hoping for. What I need
> is some sort of data inheritance based on a common unique key, but I
> don't think that's possible. Same thing with partitioning. If there
> was a way to partition the columns (with data) of a single large
> table, then that would be an ideal solution. But in the absence of
> that feature, what design would you recommend?
>
> Thanks,
> Max
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to [email protected: majo...@postgresql.org] so that your
> message can get through to the mailing list cleanly
>

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
3) Oleg Bartunov Re: [GENERAL] a newbie question on table design
| +1 vote
If you can select stable structure (common columns) and additional columns than you can : 1. Use...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
If you can select stable structure (common columns) and additional columns
than you can :

1. Use base class (table) with common columns + inherited tables with their
    own additional columns
2. We use contrib/hstore as a storage for semistructured data - we store
    additional columns as a key->value pairs in hstore data type. This is
    very flexible design.

On Fri, 15 Feb 2008, Leif B. Kristensen wrote:

> On Friday 15. February 2008, [email protected: fdu.x...@gmail.com] wrote:
>> Hi all,
>>
>> I have a large sdf file with many records of molecules and associated
>> data items and I want to save them in a PostgreSQL database. There are
>> about less than 40 data items for every molecule(the names of the data
>> items fore each molecule are the same, but values differ). The number
>> of molecules may exceed 20 million.
>>
>> Now I have come up with two ways to construct the table:
>>
>> 1) a table with about 40 columns, every data item has its
>> corresponding column, and one molecule corresponds to one row in the
>> table.
>>
>> This is direct and simple. The drawbacks is if I want to add more data
>> types to the database, I have to modify the structure of the table.
>>
>> 2) a table with just 3 columns:
>>
>>    CREATE TABLE mytable(
>>    id              serial,
>>    data_name       text,
>>    data_value      text
>>    );
>>
>> Then a single molecule will corresonds to about 40 rows in the
>> database.
>
> This is a sound concept, but I'd rather store the data_name in a
> separate table with an integer key, and replace data_name in mytable
> with a data_name_fk REFERENCES data_names (data_name_id). That's just
> Occam's Razor applied to database design, aka first normal form.
>
> You'd probably store the name of the molecule in a third table. Then you
> have a model very similar to the classic 'book database' where a book
> can have multiple authors, and an author can have multiple books. There
> are examples for this design all over the place.
>
>> If I need to add more data types to the table, I just need to add new
>> rows with new "data_name" column values. The drawback of this table is
>> it has too many rows(40 times of the former one) and waste a lot
>> space.
>>
>> Which one is better, or there are some other smart ways ?
>>
>> I have another question. Since the data I need to save is huge, is it
>> appropriate that I save the data value in compressed format ?
>
> That sounds a lot like premature optimization. Postgres is actually
> quite good at compacting data natively.
>

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
4) Oleg Bartunov Re: [GENERAL] Working with huge amount of data. RESULTS!
| +1 vote
suffix tree (array) would speedup '%keyword%' query, but currently it doesn't supported by GiST...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
On Tue, 12 Feb 2008, Mario Lopez wrote:

> Hi!,
>
> I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following
> results:
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where varchar_reverse(data) like varchar_reverse('%keyword');"
> real    0m0.055s
> user    0m0.011s
> sys     0m0.006s
>
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where data like 'keyword%';"
> real    0m0.026s
> user    0m0.012s
> sys     0m0.006s
>
> It works flawlesly as you can see by the timings, take in consideration that
> "table1" has 100 million records. The only problem is generating the reversed
> index which takes like 20 minutes, I guess it has to do with the plperl
> function, perhaps a C function for inverting would make it up in less time.
>
> The problem is still with the LIKE '%keyword%', my problem is that I am not
> searching for Words in a dictionary fashion, suppose my "data" is random
> garbage, that it has common consecutive bytes. How could I generate a
> dictionary from this random garbage to make it easier for indexing?

suffix tree (array) would speedup '%keyword%' query, but currently it doesn't
supported by GiST extension architecture (we have it in our TODO), so I see
several ways (not tested):

1. try contrib/pg_trgm to reduce a number of candidate words
2. generate all possible substrings and use your tested approach


>
>> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote:
>>
>>> SELECT * FROM names WHERE name LIKE keyword%
>>> Or
>>> SELECT * FROM names WHERE name LIKE %keyword%
>>>
>>
>> check this:
>> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
>> and this:
>> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/
>>
>> depesz
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
>      match
>

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
5) Oleg Bartunov Re: [GENERAL] [pgsql-advocacy] PostgreSQL Certification
| +1 vote
Can you show us the goals of the PostgreSQL Certification ? I always voted for the united...
PostgreSQL - General
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Can you show us the goals of the PostgreSQL Certification ?
I always voted for the united PostgreSQL Certification program
(amin, developer) we could promote with the help of commercial companies.
In my opinion, common certificate, valid in all countries will be much more
useful than buttons. We have several good authors who can be sponsored to
write certification courses with the help of developers.

Oleg
On Wed, 30 Jan 2008, Joshua D. Drake wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hey guys,
>
> Myself and a small team of PostgreSQL contributors have started a new
> community project for PostgreSQL Certification. It is just launching
> but we wanted to get it out there so that people can join in on the
> discussion now :).
>
> For more information please visit:
>
> http://www.postgresqlcertification.org/
>
> Joshua D. Drake
>
> - --
> The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
> PostgreSQL Community Conference: http://www.postgresqlconference.org/
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFHoPdMATb/zqfZUUQRAqhlAJ92rMzYpn+k4rGDXpd4WiZwJQcBNACfWNeg
> 0zPBFRb4yc6Idpj99PCcFbY=
> =Spdr
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected: o...@sai.msu.su], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

spacer
Profile | Posts (716)
Home > People > Oleg Bartunov