Why should we adopt ColumnStore as new Datawarehouse ?

Finally Datawarehouse is set free from the bondage of slow colloquial relational databases which do not scale quite well with growing number of multiple customers and multiple dimensions and with rising demand of trending historical data.

RDBMS not suitable for Analytics :

  • Huge Overhead of – Record Level Locking
  • Lacks notion of implicit ordering
  • Overhead of buffer pool is too big
  • Overhead of latching threads
  • Write-ahead Logging
  • No support for ROLAP
  • No notion of Shared Nothing Grid Based Architecture

select count(source), stage, dayOfWeek, owner group by source, stage, dayOfWeek, owner.  The SQL statement only access 4 columns.

In a Row Store, the database has to read the ENTIRE table from disk (say 5 GB).

In a Column Store, the database only reads the columns it needs from disk (100 MB).

There is no dynamic rules for sharding and partitioning. Its all done once through schema creation process. http://www.nicholasgoodman.com/bt/blog/2011/01/31/column-store-101/

CREATE TABLE  Sale (
tenantId int NOT NULL,
orderId int NOT NULL,
… omitting for …
PARTITION BY LIST(tenantId) (
PARTITION t1 VALUES IN (1) ENGINE=InnoDB,
PARTITION t2 VALUES IN (2) ENGINE=InnoDB,
PARTITION t3 VALUES IN (3) ENGINE=InnoDB,

We can completely get rid of Fact-2-Fact Joins , Snow Flake Schema, Monolithic Views (which are major roadblocks of performance in RDBMS based Datawarehouse).  ColumnStores is all about partitioning data sets into logical time-series in pre-aggregated tables. For real-time query and historical trending ColumnStore prove to be fast and scalable through add-on layers.

Actian is leading the edge through its award-winning ‘Real-time analytics modules/add-ons top of its fasted Columnar DB’

Lets see the innovations Columnar DB thrive on :

Very fast query on highly-selective multi-dimensional data through >>

  • Powerful Vectorisation
  • Hash join/Aggregation
  • Compression
  • Star Join Optimization
  • Fast Intersect/ Union/Inner/ Merge – joins.
  • Dimension history tracking,
  • More cpu-bound & less io-bound,
  • Point-in-time Query.
  • Fast bulk loading in parallel,
  • Data-streaming, page multi-versioning,
  • Hot/incremental backup, upsert/merge,
  • User-defined transformations

Of course, only few Analytics Datastores (Vectorwise, ParAccel, InfoBright) can boast of innovative feature-set. There ares still few open areas like correlated and paralleled queries, implicit support for timezones and encryption.

This blogpost summarizes the analytics offerings based on market segments : http://blogs.actian.com/ketankaria/2012/06/23/vectorwise-trends/   

This entry was posted in Columnar Datastore, NoSQL and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s