This article discusses how the shift from traditional store-and-analyze approaches to modern real-time analytics opens new possibilities for organizations to become more agile.
Organizations today generate huge amounts of data at high velocity and in a variety of formats. The Internet of Things (IoT) promises to connect hundreds of billions of devices to the Internet, adding further to this data explosion.
The ability to translate this data into meaning in real time is of great value to organizations. It's like having a “Live Translator” that continuously listens to underlying business events, recognizes patterns, performs correlations and translates your data into meaning as things happen.
Streaming Big Data Analytics enables organizations to continuously analyze data to reveal real-time insights and proactively act on them while these insights are still fresh and relevant.
Traditional BI (Business Intelligence) tools and techniques fail to provide the scale and capacity required to handle the growing volume, velocity and variety of big data. In addition, traditional BI techniques are based on after-the-fact analysis, or Batch Analytics, and provide limited support for Live or Streaming Analytics.
Contrasting Persistent Data Analytics with Streaming Analytics
Traditional approaches to analytics focus on capturing data from multiple sources at scheduled intervals, persisting the captured data in a data warehouse and then running queries against the stored data to detect patterns, correlations and insights for delivery to end users.
Streaming Analytics, on the other hand, focuses on capturing data as micro-level events from multiple sources in real time and running them through a predefined set of queries to detect patterns, correlations and insights for immediate delivery to end users.
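The difference can be illustrated with a minimal Python sketch that answers the same question (a running total of transaction amounts) both ways. The event shape and function names are hypothetical and use only the standard library; neither function reflects any particular product's API.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp_in_seconds, amount).
Event = Tuple[float, float]


def batch_total(stored_events: List[Event]) -> float:
    """Persistent-data analytics: the data is already stored; the query runs once and exits."""
    return sum(amount for _, amount in stored_events)


def streaming_totals(events: Iterable[Event]) -> Iterator[float]:
    """Streaming analytics: the query is defined up front and emits an
    updated result every time a new event arrives."""
    total = 0.0
    for _, amount in events:
        total += amount
        yield total
```

Given the events `(0.0, 10.0)`, `(1.0, 5.0)` and `(2.0, 20.0)`, `batch_total` returns a single `35.0` once all the data has been collected, while `streaming_totals` yields `10.0`, `15.0` and `35.0`, one result per arriving event.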
Reference Architecture for Streaming Analytics
The following is a high-level reference architecture for a Streaming Big Data Analytics solution built on Google Cloud Platform.
The solution makes use of several Big Data services available on Google Cloud Platform. Its major components are:
- Cloud Pub/Sub to ingest and aggregate events published by multiple systems (such as sensors) within an organization.
- Cloud Dataflow to process the event stream through a pipeline with pre-defined processing modules that perform event filtering, correlation and pattern detection to derive useful insights.
- Distribution tier for real-time distribution of insights via a web-based dashboard and mobile notifications.
- BigQuery to store and analyze massive amounts of historical events.
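The flow between these tiers can be sketched as a single-process Python simulation. This is only an illustration under stated assumptions: a `queue.Queue` stands in for a Pub/Sub subscription, a plain threshold filter stands in for a Dataflow pipeline step, a callback stands in for the distribution tier, and a list stands in for a BigQuery table. The event fields (`sensor_id`, `temperature`) and the `run_pipeline` function are hypothetical; none of this uses the Google Cloud client libraries.

```python
import queue
from typing import Callable, Dict, List

# Hypothetical event shape, e.g. {"sensor_id": 2, "temperature": 98.0}.
Event = Dict[str, float]


def run_pipeline(
    ingest_queue: "queue.Queue[Event]",  # stand-in for a Pub/Sub subscription
    notify: Callable[[str], None],       # stand-in for dashboard / mobile push
    history: List[Event],                # stand-in for a BigQuery table
    threshold: float,
) -> None:
    """Drain the ingest queue, archive every event and notify on insights."""
    while True:
        try:
            event = ingest_queue.get_nowait()
        except queue.Empty:
            break  # a real streaming job would keep waiting for new events
        # Storage tier: keep every event for later historical analysis.
        history.append(event)
        # Processing tier: a trivial filter standing in for
        # correlation and pattern detection in a Dataflow pipeline.
        if event["temperature"] > threshold:
            notify(f"sensor {int(event['sensor_id'])} over threshold: {event['temperature']}")
```

In a real deployment each tier scales independently and the processing step runs continuously; this simulation drains the queue and returns so that it can be tested in isolation. Every event is archived, not only those that trigger a notification, which is what lets the storage tier support later historical analysis.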
Event Stream Processing (ESP)
Streaming Analytics is based on a technique known as Event Stream Processing. Think of ESP as a relational database turned upside down. In a relational database, you store data and run queries against it; each query exits after it returns its results. In Event Stream Processing, you store queries and run data through them. Streaming queries do not exit; they keep returning results as new data arrives in the stream.
To process the continuous stream of events and detect patterns, Event Stream Processing platforms provide mechanisms to bundle events by time, length, session and several other criteria. These bundles are referred to as Data Windows.
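The upside-down model can be shown with a toy engine in Python: queries are registered once and kept, and each event is pushed through all of them as it arrives. `StreamEngine`, `register` and `push` are hypothetical names for this sketch, not the API of any ESP product.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]
Callback = Callable[[Event], None]


class StreamEngine:
    """Toy ESP engine: queries are stored, and data flows through them."""

    def __init__(self) -> None:
        self._queries: List[Tuple[Predicate, Callback]] = []

    def register(self, predicate: Predicate, on_match: Callback) -> None:
        # Unlike a database query, a registered query never exits;
        # it stays active for every future event.
        self._queries.append((predicate, on_match))

    def push(self, event: Event) -> None:
        # Each arriving event is evaluated against every stored query,
        # and matches are delivered immediately via the callback.
        for predicate, on_match in self._queries:
            if predicate(event):
                on_match(event)
```

For example, registering `lambda e: e["level"] == "ERROR"` with `errors.append` as the callback, then pushing an INFO event followed by an ERROR event, leaves only the ERROR event in `errors`. The query keeps matching for as long as events are pushed.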
- A Sliding Time Window might capture events that arrived in the past n seconds. Events enter the window as they arrive and exit it n seconds later.
- A Sliding Length Window might capture the past n events. Events enter the window as they arrive and exit once n newer events have entered the window.
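Both window types can be sketched with `collections.deque` from the Python standard library. This sketch assumes events arrive in timestamp order and that eviction happens when a new event arrives rather than on a timer; many production ESP engines also handle late or out-of-order events, which is not attempted here. The class names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class SlidingTimeWindow:
    """Holds events whose timestamps fall within the last `size_seconds`
    seconds of the newest event, i.e. the interval (newest - size, newest]."""

    def __init__(self, size_seconds: float) -> None:
        self.size = size_seconds
        self.events: Deque[Tuple[float, object]] = deque()

    def add(self, timestamp: float, value: object) -> None:
        self.events.append((timestamp, value))
        # Evict events that are size_seconds or more older than the newest one.
        while self.events and self.events[0][0] <= timestamp - self.size:
            self.events.popleft()

    def values(self) -> List[object]:
        return [value for _, value in self.events]


class SlidingLengthWindow:
    """Holds the last `n` events; the oldest is evicted when event n+1 arrives."""

    def __init__(self, n: int) -> None:
        # deque(maxlen=n) discards the oldest item automatically on overflow.
        self.events: Deque[object] = deque(maxlen=n)

    def add(self, value: object) -> None:
        self.events.append(value)

    def values(self) -> List[object]:
        return list(self.events)
```

With a 10-second time window, adding events at t = 0, 5 and 12 leaves the events from t = 5 and t = 12, because the event at t = 0 is more than 10 seconds older than the newest one. With a length window of 3, adding 1, 2, 3 and 4 leaves 2, 3 and 4.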
The shift from traditional store-and-analyze to modern real-time analytics opens many new possibilities for organizations to notice trends and insights as they happen and act on them immediately, within a narrow window of opportunity. Organizations are starting to find considerable value in applying these patterns to use cases such as the following:
- Identifying fraudulent transactions and performing preventive actions as they happen.
- Notifying application owners of possible issues by continuously scanning log entries.
- Creating a lead in your CRM as soon as the engagement of a user on your website crosses a certain threshold.
- Continuously analyzing social media streams to perform sentiment analysis.
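The CRM lead use case is a good example of a stateful streaming query. A minimal Python sketch, assuming each website interaction arrives as a (user ID, engagement points) event, keeps a running score per user and creates a lead only at the moment the score first crosses the threshold. The point values, the threshold and the `create_lead` callback (a stand-in for a real CRM API call) are all hypothetical.

```python
from typing import Callable, Dict, Set


class LeadDetector:
    """Fires a callback the first time a user's engagement score reaches a threshold."""

    def __init__(self, threshold: int, create_lead: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.create_lead = create_lead  # hypothetical hook standing in for a CRM call
        self.scores: Dict[str, int] = {}
        self.converted: Set[str] = set()

    def on_event(self, user_id: str, points: int) -> None:
        # Update the running engagement score for this user.
        score = self.scores.get(user_id, 0) + points
        self.scores[user_id] = score
        # Emit exactly once per user, at the moment the threshold is crossed.
        if score >= self.threshold and user_id not in self.converted:
            self.converted.add(user_id)
            self.create_lead(user_id)
```

Tracking already-converted users ensures that a user whose score keeps climbing after crossing the threshold does not generate duplicate leads.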
The advent of services like Google Cloud Dataflow and Cloud Pub/Sub now makes it possible to perform “Event Stream Processing” at high scale and with high reliability in the cloud. This is a significant technology enabler that will drive a broader trend toward real-time analytics.