Sunday, July 11, 2010

Comparing BI Vendors Based on Technology

I recently came across an interesting online discussion in which several posters talk about working with large amounts of data and its implications for business intelligence implementations. I wouldn't have noticed it if one of the posters had not mentioned SiSense in one of the comments.

The original post was purely technological, putting the internals of QlikView's in-memory database technology on display. This lasted for about 5 posts, after which it turned into a bashing match between QlikView supporters and what you could call QlikView non-sympathizers over whether it even makes sense to use in-memory database technology for large (1TB and greater) BI implementations.

As part of this discussion, several vendors were mentioned, including SiSense, QlikView, Lyza, Vertica and Microsoft. Some of these vendors do not even compete directly with each other. Several types of technology were mentioned as well, from in-memory databases (IMDB) to columnar databases (CDBMS), and even compression.

Apart from this discussion being interesting and even entertaining (for some), it is indicative of a common mistake that people sometimes make when they compare business intelligence vendors and products based on the technology they use.

Technology is important as it is the foundation on which everything is based, but every vendor takes its technology down different paths, and in many cases comparing two BI vendors is like comparing a Boeing airplane to a Toyota family car. I could easily say that a plane's engine is more powerful than a car's, right? Does that mean you, the consumer, would want an airplane engine stuffed under your car's hood? Your car would theoretically drive faster, that's for sure. But in practice, most civilized areas impose speed limits that would prevent you from gaining any benefit from your automobile's super-fast engine. Not to mention the ridiculous amounts of money you'd be spending on maintaining and refueling your car.

Wanna take the kids out to McDonald's? Better notify the FAA. ;-)

There are significant differences between the above-mentioned vendors, and they are important to understand. These differences may stem from the particular strengths and weaknesses of the underlying data technology, but they usually go way beyond that.

QlikView targets departments with reasonable amounts of centralized data accessed by multiple users. QlikView is a developer tool for creating canned BI solutions based on a design made in advance, and is less suited to ad-hoc analysis. It uses an in-memory database to address performance. It is a good solution for small to medium implementations, but not as good for larger ones (tons of data and/or too many users). QlikView competes with giants such as Oracle, Microsoft, SAP and IBM for end-to-end BI implementations.

Microsoft PowerPivot is a pivoting add-in for Excel 2010. Because it comes with an in-memory database, it removes the 1M row limit imposed by Excel 2007, assuming you have a 64-bit machine with adequate RAM. It targets power analysts, as Excel's advanced features always have. PowerPivot is really single-user BI and is not applicable to multiple users, unless you include SQL Server and SharePoint in the package.

Lyza targets individual power analysts as well, but it assumes an abundance of disk space rather than an abundance of RAM. It is a tool that lets you perform ETL-based filters and analysis over large amounts of data, even on a 32-bit computer (similar in concept to SSIS). It does this by using a columnar database. Lyza is also BI without a centralized data repository, which makes it less effective for multi-user scenarios. It will be interesting to see how Lyza is impacted by Microsoft PowerPivot.

Vertica is a data warehouse software vendor. Its technology is based on an open source project called C-Store, which is also a columnar database. Vertica competes with other data warehouse vendors such as Greenplum and Infobright. It does not currently provide a BI front end for reporting or analysis.

SiSense targets departments and businesses looking for centralized business intelligence accessed by multiple users. SiSense uses a columnar database for storage together with in-memory query processing, so that it can scale without needing infinite amounts of RAM and still deliver viable query performance without having to go down the OLAP path. SiSense also provides its own reporting/analysis front end and competes with the BI giants, as well as QlikView.
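
For readers wondering why a columnar layout helps with RAM at all, here is a minimal Python sketch of the general idea. It is not SiSense's (or any other vendor's) implementation: the one-file-per-column layout, the function names and the sample data are all made up for illustration. The point is only that a query touching two columns needs to pull just those two columns into memory, while the rest stay on disk.

```python
import json
import os
import tempfile


def write_columns(rows, directory):
    """Store a list of row dicts column by column, one file per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    for name, values in columns.items():
        with open(os.path.join(directory, name + ".json"), "w") as f:
            json.dump(values, f)


def read_column(directory, name):
    """Load a single column into memory; the other columns stay on disk."""
    with open(os.path.join(directory, name + ".json")) as f:
        return json.load(f)


def total_by(directory, key_column, value_column):
    """Sum value_column grouped by key_column, reading only those two files."""
    keys = read_column(directory, key_column)
    values = read_column(directory, value_column)
    totals = {}
    for key, value in zip(keys, values):
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical sample data, invented for this sketch.
SALES = [
    {"region": "East", "product": "A", "amount": 100, "notes": "rush order"},
    {"region": "West", "product": "B", "amount": 70, "notes": "repeat customer"},
    {"region": "East", "product": "B", "amount": 50, "notes": "discounted"},
]


def demo():
    """Write the sample table to a temp directory and total sales per region."""
    with tempfile.TemporaryDirectory() as directory:
        write_columns(SALES, directory)
        return total_by(directory, "region", "amount")
```

Calling `demo()` totals `amount` per `region` without ever opening the `product` or `notes` files. A real columnar engine adds compression, indexing and much more on top, which is exactly why comparing vendors on this one buzzword tells you so little.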

As a BI consumer, you are buying a BI solution, not BI technology. Don't get confused by marketing people throwing technological buzzwords at you, because most likely you won't be able to identify which parts of the marketing blather are actually relevant to you. Make sure you get what you need, functionality-wise, and that the solution will still hold water a year from now as your data grows and more users use it.


By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

5 comments:

  1. Very useful information. I am a business development manager in a software development company, and sometimes you need to cut through the B*lls**t.
    Thanks

  2. Interesting post and interesting technology you guys have there.

    Curious to know what backend database SiSense uses. Is it an open source columnar database or one of the commercial ones?

  3. Thanks, Sharon. I hope your transition into EMC goes smoothly.

    We use portions of the MonetDB open source code in our ElastiCube technology, though we are progressively phasing it out.
