This BeyeNETWORK spotlight features Ron Powell's interview with John Santaferraro, Vice President of Solutions & Product Marketing at ParAccel. Ron and John discuss how ParAccel differs from Hadoop and NoSQL vendors, and how analytics are proliferating within companies as they realize the value of big data analytics.

John, what is ParAccel doing with big data, and how is your approach different from some of the NoSQL vendors?

John Santaferraro:
While a lot of the NoSQL vendors are focusing on big data, at ParAccel we focus on big data analytics. We look at the whole world of big data – the volume of data, the variety of data, and the velocity of data. We look at the potential for what can be done with big data, and we have come to the realization that no single platform does everything that needs to be done. A lot of big data resides in data warehouses on traditional data warehouse platforms. Those platforms are good at static reporting, supporting dashboards throughout an organization, and limited analytics. A lot of big data also exists on big data platforms like Hadoop. Hadoop is very good at archiving data, filtering data, text analytics, and batch analytics. Where Hadoop and data warehouses tend to fall down is in handling complex or dynamic analytics on massive amounts of data. That happens to be exactly what ParAccel was built to do. If you try to do complex big data analytics on a data warehouse, it requires a lot of extra work: tuning, modeling, partitioning, and index creation. If you do big data analytics in a Hadoop environment, it requires a lot of programming, and it slows down considerably if the program has to make several passes through MapReduce. What's great about a platform like ParAccel is that it is built from the ground up to handle complex analytics on massive amounts of data.

What has ParAccel done with technology to align with big data analytics?

John Santaferraro:
From the very start, we architected the product with the idea that we would be doing complex analytics. It was built end-to-end as a columnar database. That means that, compared to row-based technology, we're able to handle analytic queries at much faster speeds. Row-based technology goes row by row through all the rows of a table to get the data it needs. We just pick a column, grab it, and run the analytics.
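A minimal Python sketch makes the row-versus-column distinction concrete. It is a toy under stated assumptions, not ParAccel's storage engine. The table, its column names, and the "values read" counter are all hypothetical, and real column stores work on disk blocks rather than Python lists. It only shows why a query that needs one column reads less data when that column is stored on its own.

```python
from typing import Dict, List, Tuple

# Hypothetical three-column table, stored as a list of rows (row store).
ROWS: List[Dict[str, object]] = [
    {"customer_id": 1, "region": "east", "revenue": 120.0},
    {"customer_id": 2, "region": "west", "revenue": 80.0},
    {"customer_id": 3, "region": "east", "revenue": 200.0},
]

# The same table stored column by column (column store).
COLUMNS: Dict[str, List[object]] = {
    name: [row[name] for row in ROWS] for name in ROWS[0]
}


def row_store_sum(rows: List[Dict[str, object]], column: str) -> Tuple[float, int]:
    """Sum one column from a row store, counting every value that is read."""
    total, values_read = 0.0, 0
    for row in rows:
        # A row store reads each row whole, so every field is touched.
        for name, value in row.items():
            values_read += 1
            if name == column:
                total += value
    return total, values_read


def column_store_sum(columns: Dict[str, List[object]], column: str) -> Tuple[float, int]:
    """Sum one column from a column store; only that column's values are read."""
    values = columns[column]
    return float(sum(values)), len(values)


if __name__ == "__main__":
    print(row_store_sum(ROWS, "revenue"))
    print(column_store_sum(COLUMNS, "revenue"))
```

Both functions return the same total of 400.0. The row store touches all nine stored values, while the column store touches only the three it needs. In this simplified model, the gap scales with the number of columns in the table.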
Because ParAccel is columnar, our data also compresses extremely well, which further contributes to the speed. On average, we see data compressed by a factor of roughly ten. For example, one of our customers recently had 5.7 terabytes of raw data. When they loaded it into a traditional database, all of the indexing made it grow to 10 terabytes. When they loaded that same 5.7 terabytes of raw data into ParAccel, we compressed it down to 800 gigabytes. Having less data to read drives additional speed for analytics.
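The arithmetic behind the customer example is easy to check. Run-length encoding, one common columnar compression technique, also shows why storing similar values together compresses well. This is a hedged sketch. The interview does not say which encodings ParAccel uses, decimal units (1 TB = 1,000 GB) are assumed, and the sample column is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# Figures quoted in the interview, assuming decimal units (1 TB = 1,000 GB).
RAW_TB = 5.7           # raw customer data
TRADITIONAL_TB = 10.0  # same data in a traditional, heavily indexed database
PARACCEL_TB = 0.8      # same data after loading into ParAccel (800 GB)


def ratio(before_tb: float, after_tb: float) -> float:
    """How many times smaller the 'after' footprint is than the 'before' one."""
    return before_tb / after_tb


def run_length_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


if __name__ == "__main__":
    print(f"raw -> ParAccel: {ratio(RAW_TB, PARACCEL_TB):.1f}x smaller")
    print(f"traditional -> ParAccel: {ratio(TRADITIONAL_TB, PARACCEL_TB):.1f}x smaller")
    region = ["east"] * 4 + ["west"] * 3  # hypothetical sorted column
    print(run_length_encode(region))
```

By these figures, the customer's data ended up about 7.1 times smaller than the raw files and 12.5 times smaller than the indexed copy, bracketing the roughly tenfold average cited above. Run-length encoding pays off most when a column holds long runs of repeated values. Columnar storage makes such runs more likely, because values of one type and domain sit next to each other.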
In addition, we compile all of our queries so we're not actually running a query against the database. We're running the query within the database.
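The interview does not describe how ParAccel compiles queries. The following is therefore only a toy illustration of the general idea of query compilation, under loudly stated assumptions: the tiny expression-tree format, the column names, and both functions are hypothetical. An interpreter re-walks a query's expression tree for every row. A compiler translates the tree once into executable code and then runs that code per row. Here Python closures stand in for the machine code or bytecode a production engine might generate.

```python
import operator
from typing import Callable, Dict, List

Row = Dict[str, float]

# Hypothetical expression tree for: revenue > 100 AND discount < 10
# Node shapes: ("col", name), ("const", value), or (op, left, right).
PREDICATE = ("and",
             (">", ("col", "revenue"), ("const", 100.0)),
             ("<", ("col", "discount"), ("const", 10.0)))

# Binary operators; "and" here evaluates both sides (no short-circuiting).
OPS = {">": operator.gt, "<": operator.lt, "and": lambda a, b: a and b}

ROWS: List[Row] = [
    {"revenue": 120.0, "discount": 5.0},
    {"revenue": 80.0, "discount": 2.0},
    {"revenue": 200.0, "discount": 15.0},
]


def interpret(expr: tuple, row: Row) -> object:
    """Interpreted execution: re-walk the whole tree for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def compile_expr(expr: tuple) -> Callable[[Row], object]:
    """'Compiled' execution: walk the tree once, producing a reusable function."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        return lambda row: row[name]
    if kind == "const":
        value = expr[1]
        return lambda row: value
    op = OPS[kind]
    left, right = compile_expr(expr[1]), compile_expr(expr[2])
    return lambda row: op(left(row), right(row))


if __name__ == "__main__":
    matches_interpreted = [r for r in ROWS if interpret(PREDICATE, r)]
    compiled = compile_expr(PREDICATE)  # tree dispatch is paid once, up front
    matches_compiled = [r for r in ROWS if compiled(r)]
    print(matches_interpreted == matches_compiled)
```

Both paths select only the first row. The difference is that the compiled version inspects the tree's node types a single time, however many rows it scans.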
Finally, we have a library approach to advanced functions. We embed 502 advanced analytic functions in our database, and any analyst can use them easily. We're supporting not only the speed of analytic processing but also the speed of analytic discovery: looking at the needs of analysts and helping them move quickly through the iterative process they use to reach a quality end result in their analytic discovery work.
Most enterprises that we have surveyed are in the initial startup and prototyping stage with Hadoop and NoSQL databases. Does this match what you see in your customer base and their use of big data analytics?

John Santaferraro:
I want to approach this from a couple of angles, because we recently made an announcement about our Hadoop On-Demand Integration module and our overall solution for big data analytics. In that announcement, we identified two new customers. Both happen to come from the digital media or Web 2.0 world, but they're approaching big data analytics in very different ways.
The first company is Evernote. They support online storage of personal data from individuals in a way that lets it be shared and accessed anywhere, anytime. They came to us because they had recently stood up their Hadoop cluster and were beginning to use it. It did all of the archiving they wanted, a lot of the filtering, and some basic analytics. But when it came to optimizing their user experience and doing complex analytics, they realized that Hadoop wasn't built for that. So they stood up a ParAccel analytic platform cluster right next to Hadoop to cooperate with it and handle all of the complex analytics they needed to better understand how individuals and companies were using their products.
The other company that we mentioned in our announcement was Alliance Health Network. They are a social media company working to connect people who have similar conditions or similar health interests. It's 100% social media. They evaluated us against Hadoop and ended up choosing ParAccel because, again, we have all of the analytic capabilities they wanted built into the platform. We were able to do many of the things they had intended to do with Hadoop, so they chose us in its place.
Those are a couple of examples of companies in the digital media space that are using big data analytics. We also see a lot of uptake in financial services, especially in enterprise risk management. In the collapse of 2007-2008, one of the contributing factors was that companies held large portfolios of mortgage loans but didn't have detailed enough data to drill down and see which individual assets were actually creating the risk. Because of that, they had only a summarized view of the risk and never fully understood it. With the power of an analytic platform, we're seeing financial services companies do enterprise risk management differently. Because of our compression capabilities, they can put more data onto the analytic platform and then see risk not only at a high level, but also at the level of individual assets.
We're also seeing a lot of use of big data in financial services for things like dynamic stress testing, which looks at worst-case scenarios of what might happen in the future and tries to predict the outcomes. We also see context-aware customer engagement, where firms look at full risk and credit portfolios to shape how they market to each individual person.
In the retail sector, we see companies using big data for things like market basket analysis, merchandising, and inventory optimization. It turns out, however, that the older technology these retail companies were using simply cannot handle the complexity of that work. We're now seeing companies move to the analytic platform to do things in retail that they have been trying to do for the last 10 years.

John, what kind of impact will new technology have on the future of big data analytics?

John Santaferraro:
That's a great question, Ron. If you look three to five years down the road, I think we're going to see three things. One, we're going to see analytics embedded in business processes. We've already gone through a phase of moving toward business intelligence (BI) integrated into business processes. However, while BI looked back at what had happened, we are moving toward analytics that look forward to predict the future. The kinds of analytics that will be embedded in business processes will be things like next best offer or next best action. Two, we're going to see more automated decisioning. Companies will move beyond recommending a next best offer or next best action to an individual. They will reach a point where they simply let the machine and the analytics make that decision. Three, I think we're going to begin to see competing algorithms. For example, suppose you have, on one side, an algorithm that is calculating credit risk, inventory levels, and supply chain optimization. On the other side, you have a recommendation engine that has identified a customer who's likely to buy a product. There may actually be a thousand of those customers. What do you do when the recommendation engine is trying to sell a product to a thousand customers and there are only 500 of those products left in inventory? There's going to be a whole world of competing algorithms. When companies embed analytics in business processes and begin to automate decisions, they're going to have to figure out how to reconcile these competing algorithms and create the best possible scenario.

What are the trends that you are seeing today that will have the greatest impact on analytics within enterprises?

John Santaferraro:
We see a few trends. One, an analytic culture is becoming much more common among business decision makers. If you look at MBA programs today, they all include statistical analysis. Most people come out of an MBA program with a basic knowledge of how analytics work, and everybody's seen Moneyball. So there's a growing analytic culture among these business decision makers that will eventually create the analytic-driven enterprise.
The second trend that we see is toward analytic proliferation. One part of a business begins to see the benefits of using analytics to improve supply chain management, acquire new customers, improve retention, or help salespeople close more deals. Other parts of the business hear about that success and want the same kind of analytics. There's a definite move toward analytics spreading across the enterprise.
Third, we're going to see cases of analytic dominance. Companies that have become analytic-driven enterprises will be the dominant force in their industries. These are companies that have figured out how to use analytics, changed the culture of their leadership and business decision makers, and proliferated analytics. We're already beginning to see that. You could look at some of our customers, note when they started their analytic proliferation programs, and compare their stock prices over the last couple of years. I think it would be pretty obvious that analytics have affected the overall outcome of those companies and how effectively they compete against others in their marketplace.

Where do you feel we are with these trends in analytics and big data?

John Santaferraro:
I think we're at a very early stage. Analytics as a weapon has been around for at least the last ten years. What's happened more recently is the push into big data. Everybody jumped on the big data bandwagon. Then they put the two pieces together and realized that the real value in big data actually lies in the analytics. It's big data analytics that drives value out of big data. I think people are just beginning to see that potential and to figure out how to use technology to extract the value from big data.

Recently it seems that growth in and awareness of ParAccel have really accelerated in the marketplace. Do you agree?

John Santaferraro:
Absolutely. We started off the year with a historic quarter for our company. We're continuing to see a lot of momentum in terms of people seeking out our product to do big data analytics. For the last several years, we were out looking for customers. Now those customers are coming to us. It's an exciting time.

John, thank you so much for taking the time to talk with me about big data analytics.
SOURCE: Big Data Analytics: A Q&A Spotlight with John Santaferraro of ParAccel