Re: TB-sized databases

Oleg Bartunov Re: TB-sized databases
We have a several-TB database in production, and it works well on an
HP rx1620 (dual Itanium2) with an MSA 20, running Linux. It's read-only storage
for astronomical catalogs with about 4 billion objects. We have a custom
index for spherical coordinates, which provides great performance.

Oleg
On Mon, 26 Nov 2007, Peter Koczan wrote:

> Hi all,
>
> I have a user who is looking to store 500+ GB of data in a database
> (and when all the indexes and metadata are factored in, it's going to
> be more like 3-4 TB). He is wondering how well PostgreSQL scales with
> TB-sized databases and what can be done to help optimize them (mostly
> hardware and config parameters, maybe a little advocacy). I can't
> speak on that since I don't have any DBs approaching that size.
>
> The other part of this puzzle is that he's torn between MS SQL Server
> (running on Windows and unsupported by us) and PostgreSQL (running on
> Linux...which we would fully support). If any of you have ideas of how
> well PostgreSQL compares to SQL Server, especially in TB-sized
> databases, that would be much appreciated.
>
> We're running PG 8.2.5, by the way.
>
> Peter

  Regards,
   Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
