Friday, November 12, 2010
"Our new Tableau Data Engine achieves instant query response on hundreds of millions of data rows, even on hardware as basic as a corporate laptop... No other platform allows companies to choose in-memory analytics on gigabytes of data …" Christian Chabot, CEO of Tableau Software, said in a statement.
These are bombastic claims indeed, and the promises of instant response on hundreds of millions of rows, on a corporate laptop, using in-memory analytics are particularly interesting. So, with the help of my friend and colleague, the brilliant database technologist Eldad Farkash, I decided to put these claims to a real-life test.
Since the data engine was claimed to use in-memory technology, we set up a 64-bit computer with ample RAM (hardly a corporate laptop) and used a real customer’s data set of 560 million rows of raw internet traffic data. To keep things simple, we imported only a single text field from this data set. Here is what we found:
1. Surprisingly, and contrary to what Tableau’s CEO claims, Tableau’s new data engine is not really in-memory technology. In fact, the entire data set is stored on disk after it is imported, and RAM is hardly used.
2. It took Tableau 6.0 approximately 5 hours to import this single text field: 1.5 hours of pure import, with the rest spent on a process Tableau calls ‘Column Optimization’, which we believe creates an index much like that of a regular relational database. For comparison, QlikView took 50 minutes and ElastiCube 30 minutes to import the same field, making Tableau roughly 6 to 10 times slower. All products were run with their default settings.
3. Once the import completed, we asked Tableau to count the distinct values in that field, a common business intelligence query. The query took 30 minutes to return. For comparison, QlikView and ElastiCube each returned in approximately 10 seconds, a 180x difference. Again, both products were run with their default settings.
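For a quick check of the ratios, the reported timings can be plugged into a few lines of Python. This is only a sketch: the figures are the approximate times from our test, and the dictionary names and the `slowdown` helper are illustrative, not part of any of the products.

```python
# Approximate timings from our test (default settings for all products).
# Import times are in minutes; distinct-count query times are in seconds.
IMPORT_MINUTES = {"Tableau 6.0": 5 * 60, "QlikView": 50, "ElastiCube": 30}
QUERY_SECONDS = {"Tableau 6.0": 30 * 60, "QlikView": 10, "ElastiCube": 10}


def slowdown(times, baseline="Tableau 6.0"):
    """Return how many times slower `baseline` was than each other product."""
    base = times[baseline]
    return {name: base / t for name, t in times.items() if name != baseline}


print(slowdown(IMPORT_MINUTES))  # {'QlikView': 6.0, 'ElastiCube': 10.0}
print(slowdown(QUERY_SECONDS))   # {'QlikView': 180.0, 'ElastiCube': 180.0}
```

Because the 10-second query times are approximate, the 180x figure should be read as an order-of-magnitude comparison rather than a precise measurement.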
Tableau’s new data engine is a step up from its previous engine, which was quite similar to the one Microsoft Access used in Office 2007. That is good news for individual analysts who worked with non-trivial amounts of data in earlier versions of Tableau, which handled such volumes poorly. I imagine this release also helps Tableau compete against Spotfire (TIBCO), which until now was the only pure visualization player that could claim technology aimed at handling larger data sets.
From a practical perspective, however, the talk of handling hundreds of millions of rows and the reference to in-memory analytics are more marketing fluff geared towards riding the in-memory hype than a true depiction of what this technology is or what it can do. Tableau’s data engine is not in the same league as in-memory technology, or pure columnar technologies like ElastiCube, when it comes to import or query response times. In our test, it was 6 to 10 times slower at import and about 180 times slower on a simple distinct-count query.