Saturday, June 18, 2011

Columnar RDBMS, Gourmet Fast Food and Santa Claus

Boris Evelson of Forrester recently published a blog post titled It's The Dawning Of The Age Of BI DBMS (Database Management System). I took note that in this post he classified Vertica, ParAccel and Sybase IQ in a category he named ‘Columnar RDBMS (Relational DBMS)’, and that started off a friendly email exchange as to what the heck that really means.

I said: “RDBMS is tabular, by definition.”

...and Boris said: “To me if I can access something via SQL, it’s relational.”

Who’s right is a matter of perspective, I suppose. But technically, defining RDBMS by the existence of SQL access is incorrect. According to Wikipedia, the short definition of an RDBMS is a DBMS (Database Management System) in which data is stored in tables and the relationships among the data are also stored in tables. The data can be accessed or reassembled in many different ways without having to change the table forms.

Note that the word 'table' appears three times in the short definition of the term. SQL does not appear anywhere in the definition.

What's worse, columnar databases do not store data in tables - again, by definition. So, come to think of it, how can such a thing as a Columnar RDBMS even exist? (get the title yet?)

I suppose that the only thing Columnar RDBMS could technically describe is a database system that stores data in a column-oriented manner, yet still relies on the fundamental mechanisms of an RDBMS (tables, SQL, indexing, etc). In practicality, this means that each field is stored in its own table, with an additional field for correlation. But that is a side-ways implementation technique that is mainly practical for somewhat extending the lifetime of existing software assets that are reaching their scalability limits, and hardly deserves its own DBMS category.

I know we wouldn't describe ElastiCube as being an RDBMS (even though it supports SQL access) and I'm pretty sure Vertica wouldn't describe their technology as an RDBMS either.

The similarities between ElastiCube, Vertica and RDBMS are sufficiently described within the 4 letters D-B-M-S. The letter R is what differentiates between them.

SiSense refers to ElastiCube as a Columnar DBMS or Column-oriented DBMS and I think this describes Vertica equally well. These two databases are not similar in the way they work internally, but neither are SQL Server and Oracle - which are still both RDBMS.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

2 comments:

  1. Interesting. I've always thought of column-based data storage as just "a table turned on its side" (obviously in very simple terms) - and hence still a table - done for the purpose of allowing faster queries on tables that would in a row-based RDBMS have a LOT of rows, with the entirety of a row having to be read off disk to answer a query that only needed data from one column (eg select count (male) from gender where state = 'CA') - very fast in a columnar database like Sybase IQ, especially if bitmap indexed. Or Vertica for that matter.

    But in my mind, still a table... but of course, perhaps that begs the question of defining a table - and then relating that definition to an accepted definition of an RDBMS.

    And to throw some chilli sauce into the gourmet, in Greenplum a DBA can actually partition a table and then specify one partition to use row-based storage and another to use column-based - depending on what kind of queries are expected over (say) recent data vs historical data. To the user issuing SQL, it's just one table.

    ReplyDelete
  2. correlation DBMS (CRDBMS)

    ReplyDelete

Total Blog Directory Technology Blog Directory Business Intelligence Directory