The ElastiCube Chronicles


Thursday, June 16, 2011

BI-as-a-Service - Some Questions Worth Asking

For a couple of years now, there has been a substantial amount of hype in the business intelligence (BI) space regarding “cloud BI,” or business intelligence systems hosted by Internet “cloud computing” service providers. This “cloud BI”, which is actually SaaS (software-as-a-service) BI, has been riding the wave of cloud computing in general, with the lower startup costs, faster deployment and easier scalability that cloud-based software implementations promise business customers. Several new companies have emerged and are promoting a new golden age of BI which they say will be faster, easier and cheaper than conventional business intelligence.

While this sounds fantastic at first glance, it is worth looking beyond the hype to determine whether deploying a BI solution in the cloud offers the same types of advantages as other SaaS solutions, such as CRM, accounting and email.

To this end, consider the following questions:

Does SaaS BI reduce dependence on IT staff?
Does SaaS BI compromise your data security?
Is all your business data already in the cloud?
How much will hardware cost you in the cloud?
Is BI backbone technology becoming more 'cloud-aware'?

1. Does SaaS BI reduce dependence on IT staff?

One of the main claims of SaaS software solutions in general is the dramatic reduction in dependence on expensive (and overworked) information technology (IT) professionals. In many organizations, the limited availability of IT staff is a major bottleneck when considering implementing a new software system that will be used across the organization. The cloud offers an appealing solution: since there is no hardware to install, no software to upgrade, no storage to back up and no new security mechanisms to implement, adopting a cloud solution (mostly) circumvents the need for extensive IT services (whether from in-house personnel or consultants).

In the realm of conventional BI systems, dependence on IT is a common and frustrating bottleneck for business users. The IT department (or consulting firm) is required for every piece of the BI puzzle, from the data warehousing to creating OLAP cubes to creating and customizing individual reports. In most companies, business users quickly discover that getting what they want from their company’s BI system, including incorporating new data sources, adding reports, customizing dashboards and extending the system to more users/departments, requires IT resources which are often unavailable when needed. The result is a frustrating and compromised system which fails to deliver on its full strategic potential.

So, does moving to the cloud solve this central problem of conventional BI?

The truth is that hosted BI solutions are really just outsourced IT departments which happen to come along with a bag full of their own home-grown or third-party software systems. All the stages of the familiar BI system deployment (from requirements specification consulting through data warehouse creation through report customization) still require the involvement of IT staff. Since these IT professionals are experts in their software and environments, using them will likely be more efficient than hiring or retraining your company’s own staff (although they are unlikely to be more effective than any other dedicated outsourced technical BI team). However, think carefully about how much you want to be dependent on outsourced IT services for the lifeblood of your company’s strategic decision-making platform.

If your BI needs are modest and don’t change often, then, in theory, you will probably end up saving money as compared with hiring your own IT staff for BI. However, as the system grows and extends (as it always does), you will be at the mercy of the schedules and price rates of a third-party IT team which you are locked in to. If you thought internal IT can be a bottleneck, imagine how difficult it will be to get good and timely service from an external IT department located far away and busy with numerous other customers as well.

2. Does SaaS BI compromise your data security?

There are two unrelated issues to think about here. Read the rest of the article...

Friday, December 10, 2010

Thoughts about Business Intelligence and the Cloud

Business intelligence in the cloud has been a hot topic recently, as part of the hype surrounding the cloud in general. I am not a big fan of cloud BI and I have mentioned that several times. However, the topic does merit discussion.

The advantages of the cloud over on-premises deployment are pretty straightforward. However, as far as business intelligence implementations are concerned, the question to me has always been whether the benefits outweigh the unique challenges the cloud introduces. If all business data were in the cloud, there would be a definite case to make for implementing business intelligence software in the cloud. But since most business data isn’t, the benefits of cloud BI are not as obvious.

The blogosphere and analyst community in the business intelligence space are not sparing any words on the subject. There are several startups in this space as well, such as GoodData, PivotLink and others. But is the business intelligence space really heading in the direction of the cloud? I believe the answer is no.

The main reason I do not believe that the BI space is headed towards the cloud (at least for now) is because business intelligence backbone technology doesn’t seem to be headed there. In fact, it seems to be going in the opposite direction.

If you take a careful look at the new technology promoted by the established business intelligence vendors like SAP, IBM and Microsoft, and even those promoted by slightly less established vendors (yet successful) such as QlikTech and Tableau – it is all technology that is either ‘desktop enabling’ technology or in-memory technology.

These technologies, in-memory in particular, aren’t very cloud friendly and weren’t designed with the cloud in mind at all. They are designed to extract more juice out of a single computer, but are very hard to distribute across multiple machines, which is what most cloud implementations require. Also, to benefit significantly from these types of technologies, you need very powerful computers, a premise which goes against proper cloud architecture that dictates that computing operations should be parallelized across multiple cheaper machines.

On the other hand, the current cloud BI platform vendors are using the same traditional backbone technology the on-premises vendors do, and so they suffer from the same drawbacks most BI vendors do, such as complexity and long development cycles. And when these drawbacks come into play, whether the data is in the cloud or on-premises isn’t even the main issue.

Even if the ‘pure cloud’ BI platform vendors did develop better technology more suited for running BI in the cloud, it is still years away. So while you can use the cloud for some types of solutions (mainly around other cloud data sources) the fact of the matter is that the cloud BI hype is at least a few years too early.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Friday, September 24, 2010

In-memory BI is not the future. It’s the past.

In recent times, one of the most popular subjects related to the field of Business Intelligence (BI) has been In-memory BI technology. The subject gained popularity largely due to the success of QlikTech, provider of the in-memory-based QlikView BI product. Following QlikTech’s lead, many other BI vendors have jumped on the in-memory “hype wagon,” including the software giant, Microsoft, which has been aggressively marketing PowerPivot, their own in-memory database engine.

The increasing hype surrounding in-memory BI has caused BI consultants, analysts and even vendors to spew out endless articles, blog posts and white papers on the subject, many of which have also gone the extra mile to describe in-memory technology as the future of business intelligence, the death blow to the data warehouse and the swan song of OLAP technology.

I find one of these in my inbox every couple of weeks.

Just so it is clear - the concept of in-memory business intelligence is not new. It has been around for many years. The only reason it became widely known recently is that it wasn’t feasible before 64-bit computing became commonly available. Before 64-bit processors, the maximum amount of RAM a computer could utilize was barely 4GB, which is hardly enough to accommodate even the simplest of multi-user BI solutions. Only when 64-bit systems became cheap enough did it become possible to consider in-memory technology as a practical option for BI.
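The 4GB figure follows from pointer width: a 32-bit address can name at most 2^32 distinct bytes. A minimal sketch of that arithmetic follows; the function and constant names are illustrative only, and in practice the operating system typically reserved part of that space, which is why "barely" is the right word.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** address_bits

GIB = 2 ** 30  # bytes in one gibibyte

# A 32-bit address space tops out at 4 GiB.
limit_32_gib = addressable_bytes(32) // GIB

# A 64-bit address space has a theoretical ceiling of 2**34 GiB,
# far beyond what any real machine physically holds.
limit_64_gib = addressable_bytes(64) // GIB
```

The 64-bit ceiling is theoretical; how much RAM a given machine can actually hold is a hardware question, not an addressing one.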

The success of QlikTech and the relentless activities of Microsoft’s marketing machine have managed to confuse many in terms of what role in-memory technology plays in BI implementations. And that is why many of the articles out there, which are written by marketers or market analysts who are not proficient in the internal workings of database technology (and assume their readers aren’t either), are usually filled with inaccuracies and, in many cases, pure nonsense.

The purpose of this article is to put both in-memory and disk-based BI technologies in perspective, explain the differences between them and finally lay out, in simple terms, why disk-based BI technology isn’t on its way to extinction. Rather, disk-based BI technology is evolving into something that will significantly limit the use of in-memory technology in typical BI implementations.

But before we get to that, for the sake of those who are not very familiar with in-memory BI technology, here’s a brief introduction to the topic.

Disk and RAM

Generally speaking, your computer has two types of data storage mechanisms: disk (often called a hard disk) and RAM (random access memory). The important differences between them (for this discussion) are outlined below.

Most modern computers have 15-100 times more available disk storage than they do RAM. My laptop, for example, has 8GB of RAM and 300GB of available disk space. However, reading data from disk is much slower than reading the same data from RAM. This is one of the reasons why 1GB of RAM costs approximately 320 times that of 1GB of disk space.

Another important distinction is what happens to the data when the computer is powered down: data stored on disk is unaffected (which is why your saved documents are still there the next time you turn on your computer), but data residing in RAM is instantly lost. So, while you don’t have to re-create your disk-stored Microsoft Word documents after a reboot, you do have to re-load the operating system, re-launch the word processor and reload your document. This is because applications and their internal data are partly, if not entirely, stored in RAM while they are running.

Disk-based Databases and In-memory Databases

Now that we have a general idea of what the basic differences between disk and RAM are, what are the differences between disk-based and in-memory databases? Well, all data is always kept on hard disks (so that it is saved even when the power goes down). When we talk about whether a database is disk-based or in-memory, we are talking about where the data resides while it is actively being queried by an application: with disk-based databases, the data is queried while stored on disk and with in-memory databases, the data being queried is first loaded into RAM.

Disk-based databases are engineered to efficiently query data residing on the hard drive. At a very basic level, these databases assume that the entire dataset cannot fit inside the relatively small amount of RAM available and therefore must have very efficient disk reads in order for queries to be returned within a reasonable time frame. The engineers of such databases have the benefit of practically unlimited storage, but must face the challenges of relying on relatively slow disk operations.

On the other hand, in-memory databases work under the opposite assumption that the data can, in fact, fit entirely inside the RAM. The engineers of in-memory databases benefit from utilizing the fastest storage system a computer has (RAM), but have much less of it at their disposal.

That is the fundamental trade-off in disk-based and in-memory technologies: faster reads and limited amounts of data versus slower reads and practically unlimited amounts of data. These are two critical considerations for business intelligence applications, as it is important both to have fast query response times and to have access to as much data as possible.
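The distinction can be made concrete with a toy example. This is a minimal sketch, not how any real database engine works: a small CSV file stands in for the data store, `total_from_disk` re-reads the file on every query, and the hypothetical `InMemoryTable` class pays the load cost once and then answers queries from RAM. All names and sample values are invented for illustration.

```python
import csv
import os
import tempfile

def write_sample(path, rows):
    """Persist the sample rows to disk as CSV."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def total_from_disk(path):
    """Disk-based style: stream rows from the file on every query."""
    total = 0.0
    with open(path, newline="") as f:
        for _region, amount in csv.reader(f):
            total += float(amount)
    return total

class InMemoryTable:
    """In-memory style: load everything into RAM once, then query RAM."""

    def __init__(self, path):
        with open(path, newline="") as f:
            self.rows = [(region, float(amount)) for region, amount in csv.reader(f)]

    def total(self):
        return sum(amount for _region, amount in self.rows)

sample_rows = [("east", 10.0), ("west", 5.5), ("east", 4.5)]
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
write_sample(path, sample_rows)

disk_total = total_from_disk(path)   # touches the disk each time it is called
table = InMemoryTable(path)          # touches the disk once, at load time
memory_total = table.total()         # served entirely from RAM

os.remove(path)                      # the in-memory copy remains queryable
memory_total_after_delete = table.total()
```

Both paths return the same answer; what differs is where the data lives while it is being queried, and therefore how much data each approach can hold and how fast it can answer.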

The Data Challenge

A business intelligence solution (almost) always has a single data store at its center. This data store is usually called a database, data warehouse, data mart or OLAP cube. This is where the data that can be queried by the BI application is stored.

The challenges in creating this data store using traditional disk-based technologies are what gave in-memory technology its 15 minutes (ok, maybe 30 minutes) of fame. Having the entire data model stored inside RAM allowed in-memory products to bypass some of the challenges encountered by their disk-based counterparts, namely the issue of query response times or ‘slow queries.’

Disk-based BI

When saying ‘traditional disk-based’ technologies, we typically mean relational database management systems (RDBMS) such as SQL Server, Oracle, MySQL and many others. It’s true that having a BI solution perform well using these types of databases as their backbone is far more challenging than simply shoving the entire data model into RAM, where performance gains would be immediate due to the fact RAM is so much faster than disk.

It’s commonly thought that relational databases are too slow for BI queries over data in (or close to) its raw form due to the fact they are disk-based. The truth is, however, that it’s because of how they use the disk and how often they use it.

Relational databases were designed with transactional processing in mind. But having a database be able to support high-performance insertions and updates of transactions (i.e., rows in a table) as well as properly accommodating the types of queries typically executed in BI solutions (e.g., aggregating, grouping, joining) is impossible. These are two mutually-exclusive engineering goals, that is to say they require completely different architectures at the very core. You simply can’t use the same approach to ideally achieve both.

In addition, the standard query language used to extract transactions from relational databases (SQL) is syntactically designed for the efficient fetching of rows, while rare are the cases in BI where you would need to scan or retrieve an entire row of data. It is nearly impossible to formulate an efficient BI query using SQL syntax.

So while relational databases are great as the backbone of operational applications such as CRM, ERP or Web sites, where transactions are frequently and simultaneously inserted, they are a poor choice for supporting analytic applications which usually involve simultaneous retrieval of partial rows along with heavy calculations.
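The "partial rows" point can be illustrated with two layouts of the same small table. This is a simplified sketch under stated assumptions: the field names and values are made up, and the column layout shown here is a generic illustration, not a description of any particular product. A typical BI aggregation needs only two of the four fields, yet the row layout forces every full record to be visited, while the column layout touches only the fields involved.

```python
from collections import defaultdict

# Row-oriented layout: each record is stored (and read) as a whole.
rows = [
    {"order_id": 1, "customer": "acme", "region": "east", "amount": 120.0},
    {"order_id": 2, "customer": "bolt", "region": "west", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "region": "east", "amount": 30.0},
]

def revenue_by_region_rows(rows):
    """Visits every full row, although only two fields are needed."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def to_columns(rows):
    """Column-oriented layout: one list per field."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def revenue_by_region_columns(columns):
    """Touches only the two columns the query actually uses."""
    totals = defaultdict(float)
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] += amount
    return dict(totals)

columns = to_columns(rows)
by_rows = revenue_by_region_rows(rows)
by_columns = revenue_by_region_columns(columns)
```

With three rows the difference is invisible; with wide tables and many millions of rows, skipping the unused fields is where the savings would come from.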

In-memory BI

In-memory databases approach the querying problem by loading the entire dataset into RAM. In so doing, they remove the need to access the disk to run queries, thus gaining an immediate and substantial performance advantage (simply because scanning data in RAM is orders of magnitude faster than reading it from disk). Some of these databases introduce additional optimizations which further improve performance. Most of them also employ compression techniques to represent even more data in the same amount of RAM.

Regardless of what fancy footwork is used with an in-memory database, storing the entire dataset in RAM has a serious implication: the amount of data you can query with in-memory technology is limited by the amount of free RAM available, and there will always be much less available RAM than available disk space.

The bottom line is that this limited memory space means that the quality and effectiveness of your BI application will be hindered: the more historical data to which you have access and/or the more fields you can query, the better analysis, insight and, well, intelligence you can get.

You could add more and more RAM, but then the hardware you require becomes exponentially more expensive. The fact that 64-bit computers are cheap and can theoretically support unlimited amounts of RAM does not mean they actually do in practice. A standard desktop-class (read: cheap) computer with standard hardware physically supports up to 12GB of RAM today. If you need more, you can move on to a different class of computer which costs about twice as much and will allow you up to 64GB. Beyond 64GB, you can no longer use what is categorized as a personal computer but will require a full-blown server which brings you into very expensive computing territory.

It is also important to understand that the amount of RAM you need is not only affected by the amount of data you have, but also by the number of people simultaneously querying it. Having 5-10 people using the same in-memory BI application could easily double the amount of RAM required for intermediate calculations that need to be performed to generate the query results. A key success factor in most BI solutions is having a large number of users, so you need to tread carefully when considering in-memory technology for real-world BI. Otherwise, your hardware costs may spiral beyond what you are willing or able to spend (today, or in the future as your needs increase).
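A back-of-the-envelope estimate shows how quickly concurrency moves a deployment between hardware classes. This is a rough sketch only: the additive model and the per-user working-set parameter are assumptions made for illustration, the function names and class labels are hypothetical, and the 12GB and 64GB thresholds are simply the figures quoted above.

```python
def estimate_ram_gb(dataset_gb, concurrent_users, working_set_gb_per_user):
    """Assumed model: resident data plus per-user space for intermediate results."""
    return dataset_gb + concurrent_users * working_set_gb_per_user

def hardware_class(ram_gb):
    """Map a RAM requirement onto the hardware tiers described in the text."""
    if ram_gb <= 12:
        return "desktop-class"
    if ram_gb <= 64:
        return "higher-end PC"
    return "server"

# Illustrative inputs, not measurements: an 8GB data model, first with a
# single-user footprint, then with 8 users each needing 1GB of scratch space.
single_user_need = estimate_ram_gb(8, 0, 1.0)
multi_user_need = estimate_ram_gb(8, 8, 1.0)

single_user_class = hardware_class(single_user_need)
multi_user_class = hardware_class(multi_user_need)
```

Under these assumed inputs, eight concurrent users double the requirement and push the same data model out of the cheapest hardware tier.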

There are other implications to having your data model stored in memory, such as having to re-load it from disk to RAM every time the computer reboots and not being able to use the computer for anything other than the particular data model you’re using because its RAM is all used up.

A Note about QlikView and PowerPivot In-memory Technologies

QlikTech is the most active in-memory BI player out there so their QlikView in-memory technology is worth addressing in its own right. It has been repeatedly described as “unique, patented associative technology” but, in fact, there is nothing “associative” about QlikView’s in-memory technology. QlikView uses a simple tabular data model, stored entirely in-memory, with basic token-based compression applied to it. In QlikView’s case, the word associative relates to the functionality of its user interface, not how the data model is physically stored. Associative databases are a completely different beast and have nothing in common with QlikView’s technology.

PowerPivot uses a similar concept, but is engineered somewhat differently due to the fact it’s meant to be used largely within Excel. In this respect, PowerPivot relies on a columnar approach to storage that is better suited for the types of calculations conducted in Excel 2010, as well as for compression. Quality of compression is a significant differentiator between in-memory technologies as better compression means that you can store more data in the same amount of RAM (i.e., more data is available for users to query). In its current version, however, PowerPivot is still very limited in the amounts of data it supports and requires a ridiculous amount of RAM.
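To give a feel for why compression matters so much here, the following is a minimal sketch of dictionary (token) encoding, a generic technique in which each distinct value is stored once and the column itself holds only small integer codes. It is not a description of QlikView's or PowerPivot's actual implementation; the function names and sample data are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer token; store each distinct value once."""
    tokens = {}
    codes = []
    for value in values:
        # A new value is assigned the next unused code (the current dictionary size).
        code = tokens.setdefault(value, len(tokens))
        codes.append(code)
    dictionary = list(tokens)  # insertion order means list index == code
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]

region_column = ["east", "west", "east", "east", "north"]
dictionary, codes = dictionary_encode(region_column)
restored = dictionary_decode(dictionary, codes)
```

The fewer distinct values a column has relative to its length, the more this kind of encoding saves, which is one reason compression quality varies so much between data sets and products.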

The Present and Future Technologies

The destiny of BI lies in technologies that leverage the respective benefits of both disk-based and in-memory technologies to deliver fast query responses and extensive multi-user access without monstrous hardware requirements. Obviously, these technologies cannot be based on relational databases, but they must also not be designed to assume a massive amount of RAM, which is a very scarce resource.

These types of technologies are not theoretical anymore and are already utilized by businesses worldwide. Some are designed to distribute different portions of complex queries across multiple cheaper computers (this is a good option for cloud-based BI systems) and some are designed to take advantage of 21st-century hardware (multi-core architectures, upgraded CPU cache sizes, etc.) to extract more juice from off-the-shelf computers.
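The first approach, splitting a query across several cheaper machines, can be sketched as partial aggregation followed by a merge. This is a simplified single-process illustration: each list stands in for the data held by one machine, and the function names and sample values are invented for the example.

```python
from collections import Counter

def partial_sum_by_key(partition):
    """Work done independently on one machine: aggregate only its own slice."""
    totals = Counter()
    for key, amount in partition:
        totals[key] += amount
    return totals

def merge_partials(partials):
    """Work done by the coordinator: combine the small partial results."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds amounts key by key
    return dict(merged)

# Each inner list represents the rows stored on a separate machine.
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7)],
    [("west", 3), ("north", 1)],
]

partials = [partial_sum_by_key(p) for p in partitions]
revenue_by_region = merge_partials(partials)
```

Only the small per-machine summaries travel to the coordinator, not the raw rows, which is what makes this pattern a natural fit for clusters of inexpensive machines.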

A Final Note: ElastiCube Technology

The technology developed by the company I co-founded, SiSense, belongs to the latter category. That is, SiSense utilizes technology which combines the best of disk-based and in-memory solutions, essentially eliminating the downsides of each. SiSense’s BI product, Prism, enables a standard PC to deliver a much wider variety of BI solutions, even when very large amounts of data, large numbers of users and/or large numbers of data sources are involved, as is the case in typical BI projects.

When we began our research at SiSense, our technological assumption was that it is possible to achieve in-memory-class query response times, even for hundreds of users simultaneously accessing massive data sets, while keeping the data (mostly) stored on disk. The result of our hybrid disk-based/in-memory technology is a BI solution based on what we now call ElastiCube, after which this blog is named. You can read more about this technological approach, which we call Just-in-Time In-memory Processing, at our BI Software Evolved technology page.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Wednesday, September 1, 2010

Business Intelligence Vendors and their Partners – Rough Seas Ahead

The traditional business intelligence ecosystem is built on the numerous strategic partnerships that exist between BI software vendors, which provide the technology, and value added resellers (VARs), which provide customized solutions based on that technology.

The Relationship between BI Software Vendors and their VARs

As in all partnerships, both sides need to have something significant to gain for their partnership to be successful. In the business intelligence industry, this has indeed been the case for a long time. The software vendors use their channel partners to distribute their software to a larger audience and these, in turn, have made a pretty penny from commissions, consulting and implementation fees.

There has always been a distinct difference, however, between the business goals software vendors set for themselves and those sought after by their VARs.

The software vendors, for their part, want to sell as many software licenses as they can to new customers, as well as to charge software maintenance fees from their existing clientele. This provides them a steady income stream from existing customers while new customers grow the business. Their partners, on the other hand, prefer long and complex implementation projects from which they generate significantly more revenue than they do from commissions on software license sales.

This symbiosis used to be great. Since most traditional BI companies are focused on high-end corporations with huge budgets, there was enough to go around. These customers have large numbers of employees who can benefit from BI (read: big money selling software licenses for the software vendors) and who have no problem spending hundreds of thousands (or millions) of dollars on implementation projects (read: significant income from project fees for the implementer).

Mutually Beneficial Relationships?

It so happens, however, that changing conditions over the past couple of years (and particularly during 2010) have brought the traditional business intelligence industry to a point where the mutual vendor-VAR benefits are not as obvious anymore. While these conditions have contributed to a deterioration in relationships between BI software vendors and their partners, the good news is that companies exploring business intelligence options stand to benefit substantially from the situation.

Let’s take a look at some of the conditions affecting the BI industry in recent years:

1. Tough Economic Times

Obviously, the economic crisis which began in 2008 affected everyone, vendors and customers alike. Business intelligence as a concept was actually positively affected by this crisis as it became painfully obvious how important it is to track a business’s operational and financial performance. On the other hand, available budgets shrank significantly and there was a smaller pie to share between BI software vendors and their partners. This fact has been causing friction between the two sides as each attempts to vigorously protect its own piece of the pie.

2. Too Many Partners

In an attempt to gain more market share, software vendors invested extra effort in recruiting more and more VARs for their partner networks. While this had a positive effect on software vendors’ revenues, it wasn’t as good for those in the partner network. Having more partners leads to more competition which, in turn, means more investment in marketing and sales (and lower profits). To make matters worse, in a further attempt to increase revenues, some software vendors actually began competing with their own partners on implementation deals.

3. QlikTech and their IPO

Ever since QlikTech began gaining popularity, their main sales pitch has been shorter implementation times and reduced ongoing costs (due to the supposedly fewer IT personnel required to maintain their BI solution). While this holds mighty appeal to BI customers, it flies in the face of the entire premise of BI resellers, which rely on project implementation and BI maintenance revenues. QlikTech addressed this issue by providing their VARs higher commissions on software license sales (as compared to those offered by Microsoft, Cognos or Business Objects, for example). Coupled with the implementation and maintenance work a QlikTech solution still requires, the higher commissions provide reasonable revenues for their partners.

Along with their impressive sales and growth numbers, QlikTech’s recent IPO revealed that they generated $157M in revenues during 2009 with total expenses of $150M. The resulting profit of $7M is not great.

Whether QlikTech’s intentions are to be acquired soon or to keep growing their business remains a mystery, but either way their partners should pay close attention. If QlikTech does seek a quick exit, their partners face an uncertain future. If they intend to grow their business and improve profitability, they will have to raise their prices and/or expand their partner network significantly and/or increase their direct involvement in both software sales and implementation. Existing partners will not be pleased with any of these alternatives.

As the successful pioneer of a newer, faster, easier approach to BI, the QlikTech example should be considered carefully by VARs as an indication of what the future may hold for the BI industry as a whole.

4. The Self-Service BI Hype

The hottest thing in the BI industry today is the self-service BI concept. Regardless of whether it’s promoted by vendors providing personal analysis tools or cloud BI platforms, the basic idea behind it is the same: traditional BI is too expensive, takes too long to implement and is a big pain to maintain. Instead, the customer wants tools to enable self-reliance (as opposed to relying on external consultants/implementers who live off service fees). Whether these solutions actually deliver what they promise is beside the point (you can read my opinion about cloud BI here), but the buzz is out there and the market hears it, so it’s getting harder these days to justify long and expensive BI projects.

5. Microsoft PowerPivot

PowerPivot is Microsoft’s attempt to promote the self-service BI concept. By introducing PowerPivot, Microsoft is basically giving up on penetrating the mid-market with SQL Server Analysis Services and is trying instead to do it by introducing stronger BI capabilities in their Office product. While some believe that PowerPivot is just a lot of hot air, the fact remains that Microsoft is investing a lot of effort and money on marketing it. This places their existing partners – who rely on SQL Server sales – in a very problematic situation. These partners prefer SQL Server-based solutions, which provide more license commissions and more project hours, yet they need to fight Microsoft’s own marketing machine which is now essentially promoting self-service BI. Not an enviable situation to be in, to say the least.

What Does the Future Hold?

It’s great that so much emphasis is being placed on simplifying business intelligence and making it accessible to companies that do not have multimillion dollar budgets. Since established players and new startups alike are now beginning to focus on this type of approach, it is actually realistic to expect that self-service BI is on its way to gradually becoming a commodity. Customers will benefit greatly from this trend.

On the other hand, business intelligence VARs must understand that this is where the market is headed, and adjust their business models accordingly. A company selling BI solutions based on existing BI platforms will need to provide real added value to the customer in order to stay in business. In the not-too-distant future, this value will almost certainly come from industry-specific professional knowledge and experience (as opposed to purely technical expertise). More and more customers will no longer accept lengthy R&D projects to achieve BI and, with the new software and technologies now emerging, such projects are no longer justifiable.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Sunday, July 18, 2010

Would I Use Cloud Business Intelligence?

This post was inspired by the latest announcement made by Tibco, that they are providing SpotFire Silver - their cloud-BI offering - free for one year. Every few months, another BI player announces such a hosted BI service/product. This comes in addition to a number of smaller companies that focus on these types of hosted solutions, such as GoodData, Birst and PivotLink.

To date, no one has proved that hosted (cloud) BI is a sustainable business. None of the startups doing this have skyrocketed (yet?), and more of the larger players (Tibco included, in my opinion) are joining the effort on marketing hype alone. I doubt they really know how they're going to make money out of it.

All that being said, would I use a cloud/hosted BI service? In spite of its promise in terms of cost of ownership and easy deployment, the answer to that is an emphatic no. There are several reasons for this and they all revolve around the following points:

Independence

One of the common problems typical business intelligence solutions suffer from is their heavy reliance on IT involvement - data warehousing, OLAP cubes, and even report creation and/or customization. The IT department quickly becomes a bottleneck, and just as quickly the effectiveness of the BI solution you paid so much for comes to depend on adding more (expensive) IT people to tend to requests. Otherwise you end up frustrating the users and preventing the solution from expanding throughout the department or company.

Hosted BI solutions are simply hosted IT departments with an arsenal of home-made or 3rd party software. In theory, as long as your solution is not too complicated you could save some money on recruiting for your IT department, but if you thought internal IT can be a bottleneck, you can only imagine how an IT department that is located in a different city or country will respond to your requests (and other customers' as well!).

As long as I have an option, IT-centric BI (hosted or not) is not a good idea, as it contradicts what BI is supposed to be: a fast and flexible tool for the business user. But if I need IT to support my BI efforts, I would rather they be close.

Privacy and Security

This one's a big issue for me. I'm not so much worried they'll get hacked, as I am worried about the vendor itself (I have trust issues, I know). I am a heavy BI user and wherever I work a lot of the secret sauce relies on how BI is used and what KPIs are tracked. Taking all this information and putting it on the server of a BI vendor, just to find it in the next version of their product, could turn out to be disastrous. BI gives a tangible form to a business strategy, and that is something I would want to protect without compromise.

Working with Data

You could call anything BI. But basic reporting aside, getting the real gems in BI always involves a lot of data, and it's usually not all in one place. These are the main hurdles you must face, before you can use some sort of a reporting/visualization tool (or even Excel) and extract the answers or insights you're looking for. The mere thought of doing all these ETL tasks over the WWW gives me the shivers. This process is gruesome enough without having to wait for data to be transferred over the internet.

Data Size vs. Cost

Hosted BI vendors charge you for the hardware they use. They have to in order to remain in business. It's commonly known that BI solutions typically require sturdy hardware, particularly with strong multiple CPUs and dozens of GBs of RAM.

The CPU and RAM requirements for a solution are pretty closely bound to the amount of data being stored and queried. Because of this, with a hosted BI solution there is a very clear choice you need to make - pay a lot of money to perform direct queries of medium-to-large data sets hosted on the expensive cloud machine or limit the amount of data you store thus damaging the scope of business intelligence you will be doing.

This is a choice I prefer not to make.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog