Monday, September 17, 2012

BI Professional, Are You Earning Enough?

SiSense, a leader in Big Data Analytics and Business Intelligence (BI), recently released findings of a new salary survey providing insights into compensation trends and drivers for data professionals worldwide.

With a shortage of business intelligence (BI) professionals and data scientists looming, companies around the world are wondering how best to equip their teams to win with data. The Data Professional Survey results are based on responses from over 400 professionals spanning North America, South America, Europe, Africa and Asia, collected online in July 2012.

The full survey results are available for download here.

SiSense’s survey finds that salaries for data professionals are on the rise across all geographies. The annual earnings of a data professional can range from an average of $55,000 USD for a data analyst to an average of $132,000 for VP Analytics. As many as 61% of the survey respondents reported higher earnings in 2012 compared to 2011, and only 12% reported lower earnings.

While a vast majority of the respondents (78%) expect even higher earnings in 2013, close to half of the respondents (47%) expressed concern about their job security.

The survey indicates the data profession is male-dominated: 85% of the respondents are men. At the same time, the results demonstrate that gender equality is more common in the data profession than in others. While OECD data shows that, on average across all professions, men earn 15% more than women, the SiSense survey shows that women in the data profession earn virtually the same as, or more than, their male counterparts.

Other highlights of the survey findings include:

1. Data professionals are highly educated: 85% of the respondents have some college degree, 39% have a Master’s degree, and 5% hold a Ph.D.
2. Those with doctoral degrees earn on average 65% more than those with Master’s degrees, who in turn earn 16% more than those with Bachelor’s degrees.
3. On-the-job experience is even more important than education in determining salary levels: on average, professionals with ten or more years of experience earn 80% more than those with three years or less.

At the same time, the survey shows that those with 6 years or less of experience make up as much as 59% of the data profession workforce. Bruno Aziza, SiSense VP of Marketing, commented on these findings, saying: “The data profession needs a more formal career track. It seems that many professionals exit the field at or around the 6-year mark, moving on to other opportunities. If we don’t formalize data careers more actively, deployment success could remain low and the data industry might evolve more slowly than others.”

Most data professionals work in teams of up to five people. “Companies are starting to realize that data is key to their success. The majority of them, though, are not growing their data science teams fast enough to win. This may be because they don’t want to, or because they can’t. This is an alarming trend, though, and only software can come to the rescue,” noted Aziza.


About SiSense
SiSense Prism is a Big Data Reporting and Analytics Solution that provides the benefits of In-Memory Analytics without its disadvantages. The SiSense In-Memory Columnar Datastore analyzes 100 times more data at 10 times the speed of comparable solutions. There is no need to set up complex data warehouse systems or OLAP cubes, and no need for programming either, regardless of where data comes from or how big it is.

Friday, March 2, 2012

SiSense Takes Silicon Valley

As part of its global expansion, and in particular of its extensive activity in the US, SiSense has opened its US office in Redwood City, California.

The US office will spearhead all marketing and sales activities for the US and Latin America.

SiSense R&D will remain in Tel-Aviv, along with sales and marketing for the EMEA region.

Stay tuned for further news (and job postings!)
By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Sunday, September 11, 2011

Google Says: Americans More Patriotic Since 9/11

America is a patriotic nation. Notwithstanding the definite upsurge in the period following 9/11/2001 ten years ago, overall the level of American patriotism tends to remain remarkably consistent. Polls support this statement, as reported in a Forbes article last year:

Patriotic attitudes are generally very stable. In a question Gallup asked in January 2001, 87% said they were "extremely" or "very" proud to be American. When Pew repeated the identical question last year, 86% gave that response. In 2001 and 2009, only 1% said they were "not at all proud." The 9/11 tragedy produced more overt displays of patriotism and heightened sentiment, but responses soon returned to the norm.

So how did 9/11 impact American patriotism? Back in 2005, an MSNBC article reported the results of a poll conducted by the Roper Reports unit of NOP World. NOP’s vice president of consumer trends was quoted as saying:

“We tracked patriotism, spirituality and religion, and giving to charities and volunteerism right after 9/11,” Silvers said. “All three popped up. Within about nine months, volunteering was down and so was religion, but what has stayed with us is patriotism, and it’s obviously fueled by a couple of things. The shift point was 9/11.”

Like most of the folks working at my BI software development company (SiSense), I could be called obsessed when it comes to applying business intelligence methodologies to anything. In this spirit, as the ten-year anniversary of the 9/11 tragedies approached, I automatically started thinking about how I might uncover trends showing if and how actual data supports the articles and surveys I saw about American patriotism and 9/11.

I decided to take a look at Google search data and see if and how online searches reflect the “patriotism effects” of 9/11 in America. After looking around and reading some more, I came up with two searches to check out: flag sales and volunteering.

What led me to check out flag sales? I saw a recent article that says, “One response to the catastrophic events of Sept. 11, 2001, came quickly in a traditionally American way -- flag sales soared.” The article quotes Mike Cronin, president of Gettysburg Flag Works in New York State: “On 9/11, 12, 13, 14, we sold out everything that was red, white and blue. We had lines out the door. I was in flag shock for a couple of weeks.”

Not surprisingly, a sharp increase in searches related to flag sales can be seen in Google search data in the month of September 2001. For the years following 2001, however, searches for flag sales were relatively flat.

Flag Sales 2001

Flag Sales 2007

Interestingly, though, this year’s tenth anniversary of the 9/11 attacks seems to be resonating more with Americans than on previous anniversaries: flag sale searches increased significantly over the past few weeks.

Flag Sales 2011


When I went looking for articles which might discuss the increase in community service and volunteering among Americans, I didn’t find much. However, a look at Google search data for “volunteer USA” shows an interesting consistency with what I saw for flag sales searches: significant spikes in searches after the original 9/11 and then again this year.

US Volunteers 2001


US Volunteers 2009 


US Volunteers 2011

From the graphs above, it is quite clear that the 10th anniversary of 9/11 has had a greater impact on patriotism-related searches than previous anniversaries did.
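For readers who want to reproduce this kind of check, the spike-spotting step is easy to sketch in code. The numbers below are hypothetical stand-ins (Google Trends reports a 0-100 relative index), not the real series:

```python
# Hypothetical monthly search-volume index for "flag sales" (Google
# Trends-style 0-100 scale); the real 2001 series is not reproduced here.
monthly_index = {
    "2001-06": 8, "2001-07": 10, "2001-08": 9,
    "2001-09": 100, "2001-10": 34, "2001-11": 15, "2001-12": 11,
}

def spike_months(series, factor=3.0):
    """Flag months whose value exceeds `factor` times the series median."""
    values = sorted(series.values())
    median = values[len(values) // 2]
    return [month for month, v in series.items() if v > factor * median]

print(spike_months(monthly_index))   # ['2001-09', '2001-10']
```

With these made-up numbers, September 2001 and its immediate aftermath stand out against the flat baseline, which is exactly the shape visible in the Google charts.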

In any case, it’s hard to believe that ten years have passed since thousands of Americans died on 9/11. Hopefully, America is stronger as a country for it and its people are stronger as a nation. In case you feel like reading more about how patriotism has increased in the US since 9/11/01, I leave you with the following article, published this week in a Florida newspaper, which talks about American patriotism (and the difficulty in measuring it with surveys): Since 9/11: Patriotism - Ten years after the deadliest terrorist attack inside the United States, love of country remains as strong as ever, if not a little stronger.

By: Roi Hildesheimer | The ElastiCube Chronicles - Business Intelligence Blog

Thursday, August 25, 2011

The In-Memory Technologies Behind Business Intelligence Software

If you follow trends in the business intelligence (BI) space, you’ll notice that many analysts, independent bloggers and BI vendors talk about in-memory technology.

There are technical differences that separate one in-memory technology from another, some of which are listed on Boris Evelson’s blog.

Some of the items on Boris’ list are just as applicable to BI technologies that are not in-memory (‘Incremental updates’, for example), but there is one item that merits much deeper discussion. Boris calls this characteristic ‘Memory Swapping’ and describes it as “what the (BI) vendor’s approach is for handling models that are larger than what fits into a single memory space.”

Understanding Memory Swapping

The fundamental idea of in-memory BI technology is the ability to perform real-time calculations without having to perform slow disk operations during the execution of a query. For more details on this, visit my article describing how in-memory technology works.

Obviously, in order to perform calculations on data completely in memory, all the relevant data must reside in memory, i.e., in the computer’s RAM. So the questions are: 1) how does the data get there? and 2) how long does it stay there?

These are probably the most important aspects of in-memory technology, as they have great implications on the BI solution as a whole.

Pure In-Memory Technology

Pure in-memory technologies are the class of in-memory technologies that load the entire data model into RAM before a single query can be executed by users. An example of a BI product which utilizes such a technology is QlikView.

QlikView’s technology is described as “associative technology.” That is a fancy way of saying that QlikView uses a simple tabular data model which is stored entirely in memory. For QlikView, much like any other pure in-memory technology, compression is very important. Compressing the data well makes it possible to hold more data inside a fixed amount of RAM.
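As a toy illustration of why compression stretches a fixed RAM budget (this is not QlikView’s actual scheme, which isn’t documented here), consider run-length encoding a low-cardinality column:

```python
from collections import namedtuple

RLERun = namedtuple("RLERun", ["value", "count"])

def rle_encode(column):
    """Run-length encode a column: consecutive repeats collapse to (value, count)."""
    runs = []
    for v in column:
        if runs and runs[-1].value == v:
            runs[-1] = RLERun(v, runs[-1].count + 1)
        else:
            runs.append(RLERun(v, 1))
    return runs

def rle_decode(runs):
    return [run.value for run in runs for _ in range(run.count)]

# A low-cardinality column (e.g. a 'country' field) compresses extremely well.
column = ["US"] * 500 + ["UK"] * 300 + ["DE"] * 200
runs = rle_encode(column)

print(len(column), len(runs))        # 1000 raw values collapse to 3 runs
assert rle_decode(runs) == column    # and decompress losslessly
```

The same fixed amount of RAM can therefore hold far more logical rows, which is why well-compressed pure in-memory engines remain usable at data sizes where uncompressed ones would already have run out of memory.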

Pure in-memory technologies which do not compress the data they store in memory are usually quite useless for BI. They either handle amounts of data too small to extract interesting information from, or they break too often.

With or without compression, the fact remains that pure in-memory BI solutions become useless once the entire data model no longer fits in RAM, even if you only need to work with limited portions of it at any one time.

Just-In-Time In-Memory Technology

Just-In-Time In-Memory (or JIT In-Memory) technology loads into RAM only the portion of the data required for a particular query, on demand. An example of a BI product which utilizes this type of technology is SiSense.

Note: The term JIT is borrowed from Just-In-Time compilation, which is a method to improve the runtime performance of computer programs.

JIT in-memory technology involves a smart caching engine that loads selected data into RAM and releases it according to usage patterns.
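A minimal sketch of such a caching engine, assuming a simple least-recently-used eviction policy (SiSense’s actual policy is not documented here):

```python
from collections import OrderedDict

class ColumnCache:
    """Toy just-in-time column cache: loads a column from disk only when a
    query first touches it, and evicts the least-recently-used column once
    the RAM budget is exceeded."""

    def __init__(self, loader, capacity=2):
        self.loader = loader          # function: column name -> list of values
        self.capacity = capacity      # max columns held in RAM at once
        self.cache = OrderedDict()

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)      # mark as recently used
            return self.cache[name]
        values = self.loader(name)            # simulated slow disk read
        self.cache[name] = values
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used
        return values

loads = []
def fake_disk_loader(name):
    loads.append(name)                # record which reads actually hit disk
    return [name] * 3

cache = ColumnCache(fake_disk_loader, capacity=2)
cache.get("price"); cache.get("qty"); cache.get("price")   # 'price' stays hot
cache.get("date")                                          # evicts cold 'qty'
print(loads)   # ['price', 'qty', 'date'] -- only three disk reads
```

Frequently queried columns stay resident and answer at in-memory speed, while rarely used ones pay the disk cost only on first touch.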

This approach has obvious advantages:

1. You have access to far more data than can fit in RAM at any one time
2. It is easier to have a shared cache for multiple users
3. It is easier to build solutions that are distributed across several machines

However, since JIT In-Memory loads data on demand, an obvious question arises: Won't the disk reads introduce unbearable performance issues?

The answer would be yes if the data model used is tabular (as it is in RDBMSs such as SQL Server and Oracle, or in pure in-memory technologies such as QlikView), but scalable JIT In-Memory solutions rely on a columnar database instead of a tabular database.

This fundamental ability of columnar databases to access only particular fields, or parts of fields, is what makes JIT In-Memory so powerful. In fact, the impact of columnar database technology on in-memory technology is so great that many confuse the two.
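A small sketch of the difference, with hypothetical data: for a single-field aggregate, a row store must touch every field of every record, while a column store touches only the one array it needs:

```python
# Row-oriented layout: every record is read in full, even for a one-field query.
rows = [
    {"id": i, "region": "EU" if i % 2 else "US", "amount": float(i)}
    for i in range(6)
]

# Column-oriented layout: each field is a separate, contiguous array.
columns = {
    "id":     [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Query: SUM(amount). The row store touches every field of every record;
# the column store touches only the 'amount' array.
row_store_values_touched = sum(len(r) for r in rows)    # 6 rows * 3 fields = 18
col_store_values_touched = len(columns["amount"])       # 6
total = sum(columns["amount"])

print(row_store_values_touched, col_store_values_touched, total)   # 18 6 15.0
```

With three fields the row store reads three times as much data; with the hundreds of fields typical of real BI models, the gap, and hence the amount that must be pulled into RAM per query, grows accordingly.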

The combination of JIT In-Memory technology and a columnar database structure delivers the performance of pure in-memory BI technology with the scalability of disk-based models, and is thus an ideal technological basis for large-scale and/or rapidly-growing BI data stores.
By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Tuesday, August 9, 2011

Putting The Costs of BI In Perspective

Successful business intelligence (BI) solutions serve as many business users as possible. The more users it has, the more value the solution brings.

However, if you’ve had any experience with BI, you must have noticed that as the number of users grows, so does the complexity (and consequent cost) of the solution. This is a fundamental reality in the traditional business intelligence space, although many startups are attempting to change it – each according to its own vision and understanding of the space.

But why is buying a BI solution for dozens or hundreds of users so much more complicated than buying a solution for a select group of power users?

Perspective #1: The Cost of Software Licenses

People often think that the answer to this question lies in software costs, but in fact software costs are usually the red herring in the process of business intelligence costing.

It is obvious that the more users your solution has, the more its software licenses are going to cost. You might therefore be tempted to choose a vendor that sells software for 30% less than another – but basing a decision solely on this is a big mistake, as license costs have little bearing on the total cost of a BI solution and hardly any impact on ROI.

Some proof of this can be found in open source. Open source BI provides (by definition) free software, and there is no shortage of open source BI tools and platforms. However, none of them is doing as well as the established non-open-source vendors, even though they have been around since the beginning of the century, and they have trouble acquiring customers, at least compared to commercial vendors. If software costs were a significant inhibitor in the BI space, open source solutions would be much more prominent than they actually are.

Another hint can be found in the ‘commercial’ (non-open-source) world, where BI vendors do charge for licenses but will usually provide significant discounts on purchases of large volumes of licenses. BI vendors do this for reasons that go beyond the obvious attempt to motivate potential buyers to expand their purchase orders. They do it because they realize the total cost of the solution – to the customer – grows significantly as the number of users grows, regardless of license costs (preparation projects, IT personnel assignment, etc.), and they need to take this into account when they price their software.

Tip: Pay attention to software costs, but there are far more important things to consider; leave the license cost comparison for last.

Perspective #2: The Cost of Hardware

Two things that have a great impact on the hardware requirements of a BI solution are the amount of data being queried directly by business users and the number of business users doing the querying concurrently. Depending on which technology you use, each user can add between 10% and 50% to the hardware resources required (disk, RAM and CPU).

(For you technology geeks out there, there is an interesting discussion of this topic on Curt Monash’s blog. Check out the comments section, as it will also give you a good idea of what hardware configurations can be used when different technologies are utilized.)

The tipping point, however, is when your requirements grow beyond what can be fitted inside a single commodity hardware box (read: cheap off-the-shelf computer). If this limit is hit, you basically have three options, none of which are practical for most companies:

1. Buy a high-end proprietary server
2. Clustering / sharding
3. Build a data warehouse / pre-processed OLAP cubes

Unfortunately, BI technologies that were designed prior to the 21st century (RDBMS, OLAP, in-memory databases) don’t leave much room for innovation on this particular aspect. They were designed for hardware that was very different from what exists today. So while there will always be a limit on what can be achieved with a single hardware box, with traditional BI technologies the threshold is too low for most modern companies that both have large volumes of data and seek extensive usage at reasonable and consistent response times.

The good news is that this is not the case with new technologies that are designed specifically to utilize the modern chipsets that are available on any commodity 64-bit machine, and therefore get more (orders of magnitude more) juice out of a single 64-bit commodity box. Running dozens or hundreds of users on a single box is more than possible these days, even when data is in the 100s of GBs size range.

Tip: If you do not wish to spend loads of money on high-end or proprietary servers, and your internal IT department has better things to do than manage a cluster for BI, you should give preference to technologies that allow you to set up your BI solution on a single commodity box.

Perspective #3: The Cost of Starting Too Big… or Too Small

After talking to business managers, executives and other stakeholders, you’ve determined that this BI solution you’re considering has the potential of serving 100 users. How would you then go about calculating your project costs? This is where things get tricky, and where most BI buyers fail to protect their wallets. Making the wrong decision here is far more significant than any decision you make on software licenses or even hardware.

Even if the development stage of your BI project goes without a hitch, getting a hundred users to use any kind of software, in any company, is a challenge that is not at all easier than any technical challenge you will encounter during the various stages of the project. You could easily find yourself spending tons of money on the development and deployment of a complicated 100 user solution, only to find that only 15 of them are actually using it.

So instead of your total cost per user being reduced due to the ‘volume-pricing’ model, you actually paid much more – because each one of these 15 users absorbs the cost of the 85 others who find it utterly useless, too difficult to use or completely misaligned with their business objectives. You'd be surprised how often this happens.
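The arithmetic is worth making explicit. With hypothetical numbers, the effective cost per active user balloons when adoption falls short of the licensed headcount:

```python
def cost_per_active_user(license_price, users_licensed, fixed_costs, active_users):
    """Total solution cost spread over the users who actually use it."""
    total = license_price * users_licensed + fixed_costs
    return total / active_users

# Hypothetical numbers: 100 licenses at $500 each, plus $200,000 of
# deployment, hardware and IT costs -- but only 15 users ever adopt it.
planned = cost_per_active_user(500, 100, 200_000, 100)   # cost/user as planned
actual  = cost_per_active_user(500, 100, 200_000, 15)    # cost/user in practice

print(round(planned), round(actual))   # 2500 16667
```

A solution budgeted at $2,500 per user ends up costing over $16,000 per user who actually logs in, which is why adoption risk dwarfs any license discount.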

The obvious way of dealing with this common problem is to start off small (10-20 users) and expand as usage of the system grows (assuming it will). But when it comes to traditional business intelligence solutions, there’s a catch: deploying a solution for 10-20 users and deploying a solution for 100 users are utterly different tasks which require significant changes in solution architecture.

Following this path will save you some cost on the software licenses you did not purchase straight off. However, if demand for the solution grows inside the business, you will have to re-design your solution – which would probably end up costing more than it would have initially.

Tip: The correct way of dealing with this challenge is to seek a solution that scales without having to re-architect the solution as usage grows. Buying more software and upgrading hardware when the time comes is relatively easy and inexpensive, while rebuilding the entire solution from scratch every year or two costs way more.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Saturday, June 18, 2011

Columnar RDBMS, Gourmet Fast Food and Santa Claus

Boris Evelson of Forrester recently published a blog post titled It's The Dawning Of The Age Of BI DBMS (Database Management System). I took note that in this post he classified Vertica, ParAccel and Sybase IQ in a category he named ‘Columnar RDBMS (Relational DBMS)’, and that started off a friendly email exchange as to what the heck that really means.

I said: “RDBMS is tabular, by definition.”

...and Boris said: “To me if I can access something via SQL, it’s relational.”

Who’s right is a matter of perspective, I suppose. But technically, defining an RDBMS by the existence of SQL access is incorrect. According to Wikipedia, the short definition of an RDBMS is “a DBMS (Database Management System) in which data is stored in tables and the relationships among the data are also stored in tables. The data can be accessed or reassembled in many different ways without having to change the table forms.”

Note that the word 'table' appears three times in the short definition of the term. SQL does not appear anywhere in the definition.

What's worse, columnar databases do not store data in tables - again, by definition. So, come to think of it, how can such a thing as a Columnar RDBMS even exist? (get the title yet?)

I suppose that the only thing ‘Columnar RDBMS’ could technically describe is a database system that stores data in a column-oriented manner, yet still relies on the fundamental mechanisms of an RDBMS (tables, SQL, indexing, etc.). In practice, this means that each field is stored in its own table, with an additional field for correlation. But that is a sideways implementation technique, mainly practical for somewhat extending the lifetime of existing software assets that are reaching their scalability limits, and it hardly deserves its own DBMS category.
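A quick sketch of this emulation, using SQLite purely as a stand-in RDBMS: each field becomes a two-column table keyed by a row id, so single-field aggregates are cheap but reassembling a record costs one join per extra field:

```python
import sqlite3

# Emulating column-oriented storage inside a conventional RDBMS:
# each field lives in its own two-column table, joined back via a row id.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE col_name   (row_id INTEGER PRIMARY KEY, value TEXT)")
cur.execute("CREATE TABLE col_amount (row_id INTEGER PRIMARY KEY, value REAL)")

cur.executemany("INSERT INTO col_name VALUES (?, ?)",
                [(1, "widget"), (2, "gadget")])
cur.executemany("INSERT INTO col_amount VALUES (?, ?)",
                [(1, 9.5), (2, 12.0)])

# A single-field aggregate touches only one 'column' table...
total = cur.execute("SELECT SUM(value) FROM col_amount").fetchone()[0]

# ...while reassembling a full record requires a join per extra field.
record = cur.execute("""
    SELECT n.value, a.value
    FROM col_name n JOIN col_amount a ON n.row_id = a.row_id
    WHERE n.row_id = 2
""").fetchone()

print(total, record)   # 21.5 ('gadget', 12.0)
```

With hundreds of fields and millions of rows, the join overhead of this workaround is exactly why it extends the life of an existing RDBMS rather than defining a new class of database.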

I know we wouldn't describe ElastiCube as being an RDBMS (even though it supports SQL access) and I'm pretty sure Vertica wouldn't describe their technology as an RDBMS either.

The similarities between ElastiCube, Vertica and RDBMS are sufficiently described within the 4 letters D-B-M-S. The letter R is what differentiates between them.

SiSense refers to ElastiCube as a Columnar DBMS or Column-oriented DBMS and I think this describes Vertica equally well. These two databases are not similar in the way they work internally, but neither are SQL Server and Oracle - which are still both RDBMS.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Thursday, June 16, 2011

BI-as-a-Service - Some Questions Worth Asking

For a couple of years now, there has been a substantial amount of hype in the business intelligence (BI) space regarding “cloud BI,” or business intelligence systems hosted by Internet “cloud computing” service providers. This “cloud BI”, which is actually SaaS (software-as-a-service) BI, has been riding the wave of cloud computing in general, with the lower startup costs, faster deployment and easier scalability that cloud-based software implementations promise business customers. Several new companies have emerged and are promoting a new golden age of BI which they say will be faster, easier and cheaper than conventional business intelligence.

While this sounds fantastic at first glance, it might be a good idea to look beyond the hype to determine whether deploying a BI solution in the cloud offers the same types of advantages as other SaaS solutions, such as CRM, accounting and email.

To this end, consider the following questions:

  • Does SaaS BI reduce dependence on IT staff?
  • Does SaaS BI compromise your data security?
  • Is all your business data already in the cloud?
  • How much will hardware cost you in the cloud?
  • Is BI backbone technology becoming more 'cloud-aware'?


1. Does SaaS BI reduce dependence on IT staff?

One of the main claims of SaaS software solutions in general is the dramatic reduction in dependence on expensive (and overworked) information technology (IT) professionals. In many organizations, the limited availability of IT staff is a major bottleneck when considering implementing a new software system that will be used across the organization. The cloud offers an appealing solution: since there is no hardware to install, no software to upgrade, no storage to backup and no new security mechanisms to implement, adopting a cloud solution (mostly) circumvents the need for extensive IT services (whether from in-house personnel or consultants).

In the realm of conventional BI systems, dependence on IT is a common and frustrating bottleneck for business users. The IT department (or consulting firm) is required for every piece of the BI puzzle, from the data warehousing to creating OLAP cubes to creating and customizing individual reports. In most companies, business users quickly discover that getting what they want from their company’s BI system, including incorporating new data sources, adding reports, customizing dashboards and extending the system to more users/departments, requires IT resources which are often unavailable when needed. The result is a frustrating and compromised system which fails to deliver on its full strategic potential.

So, does moving to the cloud solve this central problem of conventional BI?

The truth is that hosted BI solutions are really just outsourced IT departments which happen to come along with a bag full of their own home-grown or third-party software systems. All the stages of the familiar BI system deployment – from requirements specification consulting through data warehouse creation through report customization – still require the involvement of IT staff. Since these IT professionals are experts in their software and environments, they will likely be more efficient than hiring or retraining your company’s own staff (although they will not likely be more effective than any other dedicated outsourced technical BI team). However, think carefully about how much you want to depend on outsourced IT services for the lifeblood of your company’s strategic decision-making platform.

If your BI needs are modest and don’t change often, then, in theory, you will probably end up saving money as compared with hiring your own IT staff for BI. However, as the system grows and extends (as it always does), you will be at the mercy of the schedules and price rates of a third-party IT team that you are locked into. If you thought internal IT could be a bottleneck, imagine how difficult it will be to get good and timely service from an external IT department located far away and busy with numerous other customers as well.

2. Does SaaS BI compromise your data security?

There are two unrelated issues to think about here. Read the rest of the article...

Tuesday, May 31, 2011

The Collapse of Traditional Business Intelligence

The traditional approach to business intelligence has gone bankrupt. In its place, a new wave of companies provides alternative solutions based on innovative technologies and new business models.

Changes are Happening in BI
During the past few years, dramatic changes have been occurring in the world of business intelligence (BI). These changes all go towards one goal: removing the barriers - firmly set by the traditional BI vendors - which prevent wider usage of these decision-making systems within organizations.

These barriers include great complexity, high cost, excessive dependence on external system integrators and general dissatisfaction among business users with the tools foisted upon them by the traditional BI software vendors.

Naturally, these changes are leading relatively young and innovative companies to the field. Whether by utilizing newer technologies such as columnar databases or via a software-as-a-service (SaaS) business model, their goal is to change the rules of the game in favor of the BI customer.

The Importance of Data Analysis for Organizations
Business intelligence is a concept which has already been around for more than 20 years, and most organizations understand well the advantages of making decisions based on real data as opposed to relying on intuition and guesses. This has led to tremendous demand for these types of solutions, and the phenomenal consequent growth of the BI industry.

In today’s dynamic world, BI solutions are more necessary than ever to organizations interested in making their operations more efficient, primarily in controlling expenses and maximizing revenues. Furthermore, the Internet’s growth as a means of marketing and distribution is increasing competition in almost every field. Without BI solutions, more and more organizations will be finding themselves left far behind.

The New Self-Service BI Concept
Most of the traditional BI solutions (SAP, IBM, Microsoft, Oracle, etc.) are designed for implementation by subcontractors – experienced professionals generally known as “system integrators” – who specialize in customizing and deploying these solutions (in fact, most BI companies are actually integration companies).

As a result, these solutions require technical expertise which does not exist in the average organization. Additionally, the cost of the actual software in this situation is small relative to the service contracts required with the system integrator. And if that’s not enough, the organization’s complete dependence on the contractor can continue indefinitely, a situation that severely limits the adaptability and use of the solution for the frequently-changing needs of almost every organization.

Organizations considering implementing a traditional BI solution face these serious obstacles. This is the main driver for the most prominent industry trend of “self-service” BI, based on newer technologies and more business-friendly pricing models. These newer BI solutions provide at least the same degree of functionality and power as the traditional products, but are designed to be implemented, customized and managed by the people already found in most organizations.

Self-Service BI Products and Tools
The various types of “self-service” BI solutions can be categorized as follows:

1. Software-as-a-Service (SaaS) and/or cloud-based BI – This approach enables the implementation of certain, usually simple, business intelligence solutions without heavy involvement of the organization’s IT department. The basic idea is to fully outsource all the IT services involved in implementing and maintaining a BI solution. These solutions eliminate, at least on paper, the need for expert technical staff, the need to buy and maintain dedicated computer hardware and the need to manage software updates. On the other hand, this approach does not remove the dependency on an outside service provider – it actually increases this type of dependence.

2. BI tools for analysts – This type of BI focuses on tools which make the organization’s data analysts more efficient in their work. Analysts spend a tremendous amount of time gathering and organizing data, usually in Excel, and then preparing graphs and reports. BI tools in this category generally provide facilities to more easily gather, organize and present data, including special analytical and graphical features. However, since these tools do not contain centralized data repositories and reporting facilities, they are not ideal for the multi-user environments which characterize most organizations interested in a BI solution.

3. Data warehouse/OLAP-replacement BI solutions – Solutions designed to serve many users and/or to process complex (or very large amounts of) business data represent the biggest technical challenge. Meeting this challenge in the traditional way – by using a data warehouse and OLAP cubes – is what made BI the exclusive domain of the wealthy and the brave. The tremendous complexity, lack of flexibility and very high cost of this approach gave rise to alternative technologies which can deliver the same results (many users and much data) faster, easier and more cheaply. SiSense, for example, developed its own BI technology (called ElastiCube) which exploits columnar database and advanced memory-management technologies to deliver enterprise-scale BI without the complexity, rigidity and high cost of traditional BI solutions. Solutions of this kind represent the basis of an entirely new approach to the BI challenge.

Conclusion
In recent years, great strides have been made to enable the widespread deployment of enterprise business intelligence solutions. Whereas in the past, BI was the exclusive province of large and wealthy organizations, today it is also readily accessible to small and medium-sized companies. Now, even startups can (and should!) take advantage of the substantial business benefits provided by such solutions.

You, too, are invited to join the BI revolution!

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Thursday, April 28, 2011

Are BI Appliances Simply 30 Year Old Databases?


In a thought-provoking blog post published by WIT, a business intelligence consulting company in the U.S., the author writes about the latest acquisitions relating to Business Intelligence appliances.

BI Appliances

It got me thinking. I’ve been seeing and hearing the term ‘BI appliance’ a lot recently, and whenever I do - I find myself struggling to understand what it means.

One characteristic that seems to be commonly identified with BI appliances is that they are a combination of software and hardware that performs specific functions related to analytics (i.e., business intelligence). WIT’s article lists a few examples, including HANA (SAP), HP Business Decision Appliance (Microsoft), Netezza (acquired by IBM) and Greenplum (acquired by EMC).

But is proprietary hardware really required for a so-called BI appliance? No, it’s not. And indeed, I have noticed numerous references to Vertica (acquired by HP) and ElastiCube (by SiSense) as BI appliances. Interestingly enough, both are software-only solutions (i.e. software appliances).

It makes sense: it shouldn’t matter whether your ‘appliance’ runs on proprietary or commodity hardware if it essentially does the same thing.

The BI Appliance Wars

In a recent interview and in response to quips made by Netezza’s CEO regarding HP’s latest acquisition, Vertica CEO Chris Lynch had this to say about Netezza:

Their tag line is ‘The power to question everything’. So the first question is: why do they need proprietary hardware? The second question is: why are they using a database engine that’s based on technology from 1982?

He is obviously angry, but I agree with the premise of his argument. If you’re in the analytics business and you require proprietary hardware, there’s something seriously wrong with your database software technology. Commodity hardware is so powerful today, with 64-bit computing and multi-core CPUs, that it’s hard to imagine what type of BI solution would require proprietary hardware – that is, if your technology was engineered in the 21st century.

The established vendors are not oblivious to this, but rewriting their entire codebase is not something they are willing to do, so some are partnering and/or merging with hardware companies as an alternative. At some point, however, scrapping this codebase will become unavoidable, or customers will flee to much better and cheaper alternatives.

BI Appliance or BI Tool?

As if to toss a little more confusion into the mix, the WIT author asks:

"Though I wonder - with memory becoming cheaper and cheaper and with 64 bit platform, why do you have to have a special appliance? Why not use an in-memory tool with tons of RAM ?"

The question itself indicates a misunderstanding of why appliances exist in the first place, and there are several answers to it. Here are a few:

  1. RAM is cheaper than it used to be, but it's not cheap. Disk was and always will be cheaper than RAM.
  2. The price of a computer jumps significantly beyond 64GB of RAM. A PC with 64GB of RAM costs significantly less than a server machine with 65GB, even though the difference is supposedly a single gigabyte of memory.
  3. In-memory databases assume that the main bottleneck is disk I/O. However, when dealing with large amounts of data, this is no longer true: at such volumes, the bottleneck shifts to the bandwidth between RAM and CPU.
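The third point can be made concrete with back-of-envelope arithmetic: even with every byte resident in RAM, a full scan is bounded by memory bandwidth, not disk. The figures below are illustrative assumptions, not benchmarks of any particular product:

```python
# Back-of-envelope: an in-memory full scan is still limited by
# RAM-to-CPU bandwidth. Both figures are assumed, for illustration.

data_size_gb = 100        # working set held entirely in RAM
mem_bandwidth_gb_s = 10   # assumed sustained RAM->CPU bandwidth

scan_seconds = data_size_gb / mem_bandwidth_gb_s
print(f"Full scan takes at least {scan_seconds:.0f} seconds")
```

So "just add tons of RAM" removes the disk from the picture but not the scan itself, which is why column layouts (scanning less data per query) matter even for in-memory systems.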

For more information about this, please read In-Memory BI is Not the Future, It's the Past.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog

Thursday, April 14, 2011

BI vs. Big Data - Watch a Columnar Database in Action! (Video)

I was recently fortunate to speak at one of the database technology conferences held in Israel. Large parts of this conference revolved around ‘Big Data’, and I was asked to give the business intelligence perspective on this fascinating subject.

As part of my presentation, I attempted to show the impact of columnar database technology on the basic premise of business intelligence - the ability to have business users perform ad-hoc analytics and reporting tasks over as much data as possible.

In order to do that, I played the role of a business user building a report over a very large operational database containing 13 tables, the two largest of which hold 100 million and 40 million rows. While databases of this size were once rare, today any company with a properly tracked website quickly accumulates even more data than that.

To demonstrate, I used a front-end analytics tool (SiSense Prism) to create reports that query the database directly – a feat not advisable with a relational database. So instead of querying the source database, the data was replicated (unmodified) into a columnar database designed specifically for ad-hoc analytics – ElastiCube.

The computer holding the ElastiCube was a $1,200 off-the-shelf PC with 6GB of RAM, 100GB of disk space and a single quad-core 64-bit CPU. The Prism front end could be installed on any computer, as it does not process queries or hold data – it only requests query results.

For your convenience, here is the video of this demonstration.



Interesting Points
One thing you should take away from this video is how the simple drag-and-drop operations of a business user (or multiple users) inside a desktop tool turn into complex database operations (joining, grouping, aggregating) that would choke any relational database, yet are handled by a columnar database without any difficulty.
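To make that concrete, here is the kind of query a single drag-and-drop gesture can generate behind the scenes: a join, a grouping and an aggregation over the full fact table, all in one statement. The schema is a toy stand-in, using SQLite purely for illustration:

```python
import sqlite3

# Toy schema standing in for a large operational database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'US'), (2, 'UK');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 80.0);
""")

# Dragging 'country' and 'amount' into a report pane expands into
# something like this: join + group + aggregate in one shot.
rows = con.execute("""
    SELECT c.country, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.country
    ORDER BY revenue DESC
""").fetchall()

print(rows)  # [('US', 150.0), ('UK', 80.0)]
```

With two tables and three rows this is trivial; against a 100-million-row fact table, every such gesture re-runs a query of this shape, which is exactly the workload that separates columnar engines from row-oriented ones.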

Which brings me to my final, and perhaps most important point -

Whichever business intelligence front-end tool you pick, dealing with issues like this (and their side effects) is in fact 90% of the lifetime cost of a BI solution, and often the reason the solution stops being used altogether. This is because BI solutions that rely on relational back-end technology must assume the data has been significantly trimmed, de-normalized and pre-aggregated before being delivered to business users. This process never ends, and only becomes more and more difficult to maintain over time.

Columnar databases change this reality entirely, and combining them with 64-bit and multi-core computing makes for a dramatic evolution in BI development.

By: Elad Israeli | The ElastiCube Chronicles - Business Intelligence Blog