The ElastiCube Chronicles

If you follow trends in the business intelligence (BI) space, you’ll notice that many analysts, independent bloggers and BI vendors talk about in-memory technology.

There are technical differences that separate one in-memory technology from another, some of which are listed on Boris Evelson’s blog.

Some of the items on Boris’ list are just as applicable to BI technologies that are not in-memory (‘Incremental updates’, for example), but there is one item that merits much deeper discussion. Boris calls this characteristic ‘Memory Swapping’ and describes it as, What the (BI) vendor’s approach is for handling models that are larger than what fits into a single memory space.

Understanding Memory Swapping

The fundamental idea of in-memory BI technology is the ability to perform real-time calculations without having to perform slow disk operations during the execution of a query. For more details on this, visit my article describing how in-memory technology works.

Obviously, in order to perform calculations on data completely in memory, all the relevant data must reside in memory, i.e., in the computer’s RAM. So the questions are: 1) how does the data get there? and 2) how long does it stay there?

These are probably the most important aspects of in-memory technology, as they have great implications on the BI solution as a whole.

Pure In-Memory Technology

Pure in-memory technologies are the class of in-memory technologies that load the entire data model into RAM before a single query can be executed by users. An example of a BI product which utilizes such a technology is QlikView.

QlikView’s technology is described as “associative technology.” That is a fancy way of saying that QlikView uses a simple tabular data model which is stored entirely in memory. For QlikView, much like any other pure in-memory technology, compression is very important. Compressing the data well makes it possible to hold more data inside a fixed amount of RAM.

Pure in-memory technologies which do not compress the data they store in memory are usually quite useless for BI. They either handle amounts of data too small to extract interesting information from, or they break too often.

With or without compression, the fact remains that pure in-memory BI solutions become useless when RAM runs out for the entire data model, even if you're only looking to work with limited portions of it at any one time.

Just-In-Time In-Memory Technology

Just-In-Time In-Memory (or JIT In-Memory) technology only loads the portion of the data into RAM required for a particular query, on demand. An example of a BI product which utilizes this type of technology is SiSense.

Note: The term JIT is borrowed from Just-In-Time compilation, which is a method to improve the runtime performance of computer programs.

JIT in-memory technology involves a smart caching engine that loads selected data into RAM and releases it according to usage patterns.

This approach has obvious advantages:

1. You have access to far more data than can fit in RAM at any one time
2. It is easier to have a shared cache for multiple users
3. It is easier to build solutions that are distributed across several machines

However, since JIT In-Memory loads data on demand, an obvious question arises: Won't the disk reads introduce unbearable performance issues?

The answer would be yes, if the data model used is tabular (as they are in RDBMSs such as SQL Server and Oracle, or pure in-memory technologies such as QlikView), but scalable JIT In-Memory solutions rely on a columnar database instead of a tabular database.

This fundamental ability of columnar databases to access only particular fields, or parts of fields, is what makes JIT In-Memory so powerful. In fact, the impact of columnar database technology on in-memory technology is so great, that many confuse the two.

The combination of JIT In-Memory technology and a columnar database structure delivers the performance of pure in-memory BI technology with the scalability of disk-based models, and is thus an ideal technological basis for large-scale and/or rapidly-growing BI data stores.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Successful business intelligence (BI) solutions serve as many business users as possible. As more users use it, the more value the solution brings.

However, if you’ve had any experience with BI, you must have noticed that as the number of users grow – so does the complexity (and consequent cost) of the solution. This is a fundamental reality in the traditional business intelligence space, although many startups in the space are attempting to change it – each according to their own vision and understanding of the space.

But why is buying a BI solution for dozens or hundreds of users so much more complicated than buying a solution for a select group of power users?

Perspective #1: The Cost of Software Licenses

People often think that the answer to this question lies in software costs, but in fact software costs are usually the red herring in the process of business intelligence costing.

It is obvious that the more users your solution has the more software licenses are going to cost. Therefore, you might be tempted to choose a vendor that sells software for 30% less than another vendor – but basing a decision solely on this is a big mistake as license costs have little bearing on the total cost of a BI solution, and hardly any impact on ROI.

Some proof to this can be found in open source. Open source BI provides (by definition) free software, and there is no shortage of open source BI tools/platforms. However, none of them are doing as well as the established non-open source vendors, even though they have been around since the beginning of the century. They’re having trouble acquiring customers, at least compared to commercial vendors. It is very easy to assume that if software costs were significant inhibitors in the BI space, open source solutions would be much more prominent than they actually are.

Another hint at this can be found in the ‘commercial’ (non-open source) world, where BI vendors do charge for licenses but will usually provide significant discounts on purchasing of large volumes of licenses. BI vendors do it for reasons that go beyond the obvious attempt to motivate potential buyers to expand their purchase orders. They do it because they realize the total cost of the solution – to the customer – grows significantly as the number of users grows, regardless of license costs (preparation projects, IT personnel assignment, etc). They need to take this into account when they price their software.

Tip: Pay attention to software costs, but there are way more important things to consider. You should really leave the license cost comparison to last.

Perspective #2: The Cost of Hardware

Two things that have great impact on the hardware requirements of a BI solution are the amounts of data being queried directly by business users, and the number of business users doing the querying concurrently. Depending on which technology you use, each user can add between 10%-50% to the configuration of hardware resources required (disk, RAM and CPU).

(For you technology geeks out there, there is an interesting discussion about this topic on Curt Monash’s blog. Check out the comments section, as it will also give you a good idea on what hardware configurations can be used, when different technologies are utilized)

The tipping point, however, is when your requirements grow beyond what can be fitted inside a single commodity hardware box (read: cheap off-the-shelf computer). If this limit is hit, you basically have three options, none of which are practical for most companies:

1. Buy a high-end proprietary server
2. Clustering / sharding
3. Build a data warehouse / pre-processed OLAP cubes

Unfortunately, BI technologies that were designed prior to the 21st century (RDBMS, OLAP, In-Memory Databases) don’t leave much room for innovation on this particular aspect. They were designed for hardware that was different than what exists today. So while there will always be a limit on what can be achieved with a single hardware box, with traditional BI technologies the threshold is too low to be feasible for most modern companies that both have large volumes of data and seek extensive usage at reasonable and consistent response times.

The good news is that this is not the case with new technologies that are designed specifically to utilize the modern chipsets that are available on any commodity 64-bit machine, and therefore get more (orders of magnitude more) juice out of a single 64-bit commodity box. Running dozens or hundreds of users on a single box is more than possible these days, even when data is in the 100s of GBs size range.

Tip: If you do not wish to spend loads of money on high-end or proprietary servers, and your internal IT department has better things to than to manage a cluster for BI, you should really give preference to technologies that would allow you to set up your BI solution on a single commodity box.

Perspective #3: The Cost of Starting Too Big… or Too Small

After talking to business managers, executives and other stakeholders, you’ve determined that this BI solution you’re considering has the potential of serving 100 users. How would you then go about calculating your project costs? This is where things get tricky, and where most BI buyers fail to protect their wallets. Making the wrong decision here is far more significant than any decision you make on software licenses or even hardware.

Even if the development stage of your BI project goes without a hitch, getting a hundred users to use any kind of software, in any company, is a challenge that is not at all easier than any technical challenge you will encounter during the various stages of the project. You could easily find yourself spending tons of money on the development and deployment of a complicated 100 user solution, only to find that only 15 of them are actually using it.

So instead of your total cost per user being reduced due to the ‘volume-pricing’ model, you actually paid much more – because each one of these 15 users absorbs the cost of the 85 others who find it utterly useless, too difficult to use or completely misaligned with their business objectives. You'd be surprised how often this happens.

The obvious way of dealing with this common problem is to start off small (10-20 users), and expand as usage of the system grows (assuming it will). But when it comes to traditional business intelligence solutions, there’s a catch - deploying a solution for 10-20 users and deploying a solution for 100 users are utterly different tasks and require significant changes in solution architecture.

Following this path will save you some cost on the software licenses you did not purchase straight off. However, if demand for the solution grows inside the business, you will have to re-design your solution – which would probably end up costing more than it would have initially.

Tip: The correct way of dealing with this challenge is to seek a solution that scales without having to re-architect the solution as usage grows. Buying more software and upgrading hardware when the time comes is relatively easy and inexpensive, while rebuilding the entire solution from scratch every year or two costs way more.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

The ElastiCube Chronicles

Thursday, August 25, 2011

The In-Memory Technologies Behind Business Intelligence Software