This page describes the relative benefits and trade-offs of column stores
versus row stores:
http://en.wikipedia.org/wiki/Column-oriented_DBMS#Benefits
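
To make the trade-off concrete, here is a minimal sketch of why each extra selected column can add roughly one more column's worth of work. It is a toy in-memory model, not how any particular engine reads Parquet: the class ToyColumnStore and the values_read counter are hypothetical names, and the sketch assumes every referenced column is read in full, with no predicate pushdown or late materialization.

```python
class ToyColumnStore:
    """Hypothetical in-memory model: one Python list per column."""

    def __init__(self, columns):
        self.columns = columns
        self.values_read = 0  # crude stand-in for I/O and decode cost

    def _scan(self, name):
        # Assumption: a referenced column is always read in full.
        col = self.columns[name]
        self.values_read += len(col)
        return col

    def select(self, names, where_col, where_val):
        # Each distinct column is scanned once, whether it appears
        # in the select list, the filter, or both.
        needed = list(dict.fromkeys([where_col] + list(names)))
        data = {n: self._scan(n) for n in needed}
        return [
            tuple(data[n][i] for n in names)
            for i, v in enumerate(data[where_col])
            if v == where_val
        ]


def demo():
    n = 1000
    cols = {
        "a": [i % 100 for i in range(n)],
        "b": list(range(n)),
        "c": [0] * n,  # never referenced, so never read
    }
    one = ToyColumnStore(cols)
    one.select(["a"], "a", 42)
    two = ToyColumnStore(cols)
    two.select(["a", "b"], "a", 42)
    return one.values_read, two.values_read
```

Under these assumptions demo() returns (1000, 2000): selecting a and b touches twice as many values as selecting a alone, while column c costs nothing because it is never referenced. A row store would instead pay for all columns on every row regardless of the select list.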
On Fri, May 24, 2013 at 12:18 PM, wrote:
Hi,
When I run a select query on Parquet data with ~50 million rows and 10
columns, performance gets much worse as I select more columns.
Suppose the following query returns 3 rows:
select a from table where a = 12345;
This query returns in 2 seconds. Then if I query:
select a, b from table where a = 12345;
the query returns in 4 seconds, and so on. Is this expected behaviour, given
that Parquet is a columnar store? Is there a way to optimise this?