Originally published 3 December 2008
Following on from my article, “Getting Started with Operational BI,” I would like to continue the discussion by delving into the emerging technologies in this field. In my last article, I defined two broad categories of operational business intelligence (BI): operational use of on-demand BI (typically BI services integrated into operational applications and processes) and event-driven operational use of BI. The focus of this article is event-driven operational BI.
In my research into operational BI, I have developed an Operational BI Adoption Survey, which I would encourage you to complete on the Intelligent Business Strategies website. In that survey, there is a question about what you think operational BI includes, and you are asked to select from a list of options. Two of these options, business activity monitoring (BAM) and complex event processing (CEP), are about event processing. Having written about BAM on a number of occasions, my focus this time is on CEP, which is a newly emerging technology.
There is no doubt that the speed at which processes execute is on the increase. Take ordering a “build-to-order” PC, for example. In the past it would take about a month to arrive; now it takes a few days. Checking in at an airport is now a simple bag drop, or heading straight into the security line if you have already checked in at home. Clearly, the speed at which events happen in business is increasing. In addition, there are increasing amounts of information available to consume. Companies are investing in instrumentation to capture more and more data so that they can track business operations. Consider RFID tags in retail supply chains, or sensor networks in manufacturing and in oil and gas. These technologies are producing an explosion of event data that businesses can now capture to measure and analyse business performance. However, it is not just about being able to analyse this information; it is about being able to interpret it and act on it ever more rapidly.
One thing is very clear. If you have a classic BI system whereby operational data is captured and integrated overnight, then the earliest any business analyst can respond to an issue is 24 hours later. That is not good enough when the business is generating tens of thousands, hundreds of thousands or even millions of events a day. With classic BI systems, businesses are unable to see problems, assess the impact of events (e.g., a changed order) and respond in time. In short, if your business only takes data from operational systems into a BI system for analysis once a night, then for much of the day you are flying blind.

Fraud is a clear example. It must be detected as quickly as possible so that action can be taken to stop it; taking action the following day is too late. In fact, it would be preferable to detect a pattern in the data that predicts fraudulent activity before it happens, so that the business impact is minimised and the business continues to run smoothly. Risk management is another example. If a banking customer defaults on a loan payment, how long does it take before the bank closes off the credit limits on the credit card(s) of the same customer? If a truck breaks down on a valuable customer delivery, how long does it take to respond and get the order to the customer on time? There are all sorts of good reasons why businesses want to monitor events: to remain in compliance, to manage and minimise risk, to prevent business disruption, to reduce operational cost, and so on.

But it is more than just simple events that need to be monitored. The correlation between many events also needs to be understood. For example, a password change on an account, a large withdrawal and a change of mailing address might together indicate fraudulent activity. This correlated event pattern is known as a complex event, and it may be an event that the business has to respond to rapidly.
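As a minimal sketch of the fraud pattern just described, correlating those three account events might look like this in Python. The event types, account IDs and one-hour correlation window are illustrative assumptions, not any product's API:

```python
from collections import defaultdict

# Hypothetical event stream: (timestamp_seconds, account_id, event_type)
SUSPICIOUS = {"password_change", "large_withdrawal", "address_change"}
WINDOW = 3600  # correlate events occurring within one hour of each other

def detect_complex_events(events):
    """Flag an account when all three suspicious event types have been
    seen within WINDOW seconds of each other."""
    seen = defaultdict(dict)  # account -> {event_type: latest timestamp}
    alerts = []
    for ts, account, etype in events:
        if etype not in SUSPICIOUS:
            continue  # routine event, ignore
        seen[account][etype] = ts
        times = seen[account]
        if (len(times) == len(SUSPICIOUS)
                and max(times.values()) - min(times.values()) <= WINDOW):
            alerts.append((account, ts))
            seen[account].clear()  # reset after raising the alert
    return alerts

events = [
    (100, "A1", "password_change"),
    (200, "A2", "deposit"),
    (900, "A1", "large_withdrawal"),
    (1500, "A1", "address_change"),  # all three within the hour -> alert
]
print(detect_complex_events(events))  # [('A1', 1500)]
```

The point of the sketch is that no single event is alarming on its own; the complex event only exists once the stream is correlated per account and per time window.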
Now take this to another level. Consider millions of products with RFID tags running under RFID scanners at different points in a supply chain, or a billion mobile phones on the Internet and the huge volume of events that this may cause, or the financial markets and the millions of events happening there every minute or even every second. Identifying event correlations in massive “event clouds” (potentially millions of events) is non-trivial. It is complicated by the fact that the events that form a correlated pattern can arrive out of sequence and come from different sources (e.g., a mix of external web feeds and internal OLTP systems). To further complicate things, there could be several instances of the same correlated pattern in an event cloud, with the events of one instance interleaved with the events of another. Figure 1 shows the concept of CEP. It is derived from a similar diagram I saw recently at an event processing analyst briefing.
Figure 1: Complex Event Processing
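To make the interleaving problem concrete, here is a hedged Python sketch. It assumes each event carries an instance key (e.g., an order number) and a timestamp; the key is what lets a CEP engine keep instances of the same pattern separate, and the timestamp is what lets it correct out-of-sequence arrival. The pattern and event names are invented for illustration:

```python
from collections import defaultdict

# Illustrative pattern: the ordered sequence of events one instance produces.
PATTERN = ["ordered", "shipped", "delivered"]

def match_instances(events):
    """Group an interleaved event cloud by instance key, reorder each
    group by timestamp, and report the keys whose events match PATTERN."""
    by_key = defaultdict(list)
    for ts, key, etype in events:  # events: (timestamp, instance_key, type)
        by_key[key].append((ts, etype))
    matched = []
    for key, evs in by_key.items():
        ordered = [etype for _, etype in sorted(evs)]  # fix out-of-order arrival
        if ordered == PATTERN:
            matched.append(key)
    return sorted(matched)

cloud = [
    (3, "order-7", "shipped"),
    (1, "order-7", "ordered"),   # arrived out of sequence
    (2, "order-9", "ordered"),
    (8, "order-7", "delivered"),
    (5, "order-9", "shipped"),   # order-9 never completes the pattern
]
print(match_instances(cloud))  # ['order-7']
```

A real engine does this incrementally over an unbounded stream rather than over a finished list, but the two mechanisms shown, keying and time-ordering, are the same.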
These complexities give rise to a number of demanding requirements for complex event processing: the ability to handle very high event volumes, to correlate events that arrive out of sequence and from different sources, and to keep multiple instances of the same pattern separate.
Another name for event data moving over an internal web, an internal enterprise service bus or the Internet is a data stream. What we have seen from the previous discussion is the desire to query and analyse multiple data streams to identify correlated event patterns. The question is how to do that. The answer lies in complex event processing technology, and there are now several products on the market in this space from vendors such as Coral8, IBM, Oracle, StreamBase and Truviso.
Given the potential for huge volumes of events, CEP requires data to be processed in memory before it ever reaches disk. In fact, the requirement is to be able to query continuous streams of event data while the events “fly by,” so to speak. This means that time is a first-class concept. For example, you may need to calculate a moving average over the last 5 minutes. The 5-minute time period is known as a time window. Streaming query language operators can query and manipulate continuous streams of event data within these windows. Today, CEP vendors often use their own languages to read streaming data. For example, IBM InfoSphere Streams uses a language called SPADE; programs need to be developed in this language and deployed to execute over multiple blade servers. Standards are emerging, however, to query time-series data as it moves in real time. One candidate standard is called StreamSQL, but it is not yet widely adopted or ratified. Nevertheless, vendors like Coral8, IBM, Oracle, StreamBase and Truviso are all working on the StreamSQL standard.
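As a rough illustration of the windowed aggregation a streaming engine performs (a sketch, not any vendor's implementation), the 5-minute moving average mentioned above can be written in plain Python: keep only the ticks inside the sliding time window, evicting older ones as new events fly by.

```python
from collections import deque

class MovingAverage:
    """Moving average over a sliding time window -- a minimal, in-memory
    sketch of streaming window aggregation."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.buffer = deque()  # (timestamp, value) pairs inside the window
        self.total = 0.0

    def update(self, ts, value):
        """Ingest one event and return the current windowed average."""
        self.buffer.append((ts, value))
        self.total += value
        # Evict ticks that have fallen out of the time window.
        while self.buffer and ts - self.buffer[0][0] > self.window:
            _, old_value = self.buffer.popleft()
            self.total -= old_value
        return self.total / len(self.buffer)

ma = MovingAverage(300)          # 5-minute (300-second) window
print(ma.update(0, 10.0))        # 10.0
print(ma.update(60, 20.0))       # 15.0
print(ma.update(350, 30.0))      # 25.0 -- the tick at t=0 has expired
```

Note that the window slides with event time, so the state held in memory stays bounded no matter how long the stream runs, which is exactly why CEP engines can avoid touching disk.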
StreamSQL is a variant of standard SQL specifically designed to express processing over continuous streams of time-series data. StreamSQL can be used to perform SQL-style processing on incoming messages as they fly by, without necessarily storing them as a conventional SQL database does. It extends SQL with rich windowing constructs and stream-oriented operations, and new tools are emerging to develop applications with this new language.
Following is an example:
SELECT T.symbol, AVG(T.price)
FROM StockTick (policy: maintain last 20 seconds where symbol = "HSBC") T
GROUP BY T.symbol
With this kind of technology, it potentially becomes possible to use CEP to filter high-volume events so that only events of interest are passed on to simpler event processing technologies for analysis, alerting and/or action taking. Again, as an example, IBM InfoSphere Streams can filter events and feed them into IBM Cognos Now! or WebSphere Business Events for subsequent analysis and action taking (e.g., alerts). Equally, TIBCO BusinessEvents could feed TIBCO BusinessFactor in a similar fashion.
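The filter-then-forward pipeline described above can be sketched as follows. The threshold, event shape and handler are all illustrative assumptions; the handler simply stands in for whatever downstream alerting or dashboard product consumes the events of interest:

```python
# A minimal sketch of a CEP-style filter stage: drop routine events and
# forward only "events of interest" to a downstream handler.
THRESHOLD = 10_000  # e.g., only surface unusually large transactions

def filter_stream(events, handler):
    """Pass each event of interest to handler; return how many were forwarded."""
    forwarded = 0
    for event in events:
        if event["amount"] >= THRESHOLD:
            handler(event)   # downstream analysis, alerting, dashboards
            forwarded += 1
    return forwarded

alerts = []
count = filter_stream(
    [{"amount": 500}, {"amount": 25_000}, {"amount": 9_999}],
    alerts.append,
)
print(count, alerts)  # 1 [{'amount': 25000}]
```

The design point is the division of labour: the high-throughput filter keeps the event volume manageable, so the simpler downstream tools only ever see the small fraction of events worth acting on.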
CEP holds a lot of promise. The challenge for most of us is to figure out how to leverage this technology to optimise business operations.