27 September 2012
The conference will include more than 25 high-profile international speakers, including New York Times bestselling author Christopher Steiner, who will give the opening keynote. The event is broadening its focus from specialist topics to a general audience. Here, Matt explains why big data means big business…
First things first: what is “big data”?
In a matter of a few months, “big data” seems to have gone from a technical topic of interest only to internet geeks to a global phenomenon with wide-reaching implications, not only for business but also for society.
On the one hand, just about everything in our work and personal lives is digitized and instrumented, resulting in massive data sets. On the other hand, a new breed of technologies is emerging that is increasingly capable of processing those data sets at scale, offering the promise of extraordinary insights into just about everything we do as humans.
This is an opportunity so big and exciting that commentators have started running out of metaphors – data is “the new oil”, “the new gold”, “the new frontier”, the “new plastic”, the “new black”, etc.
But for entrepreneurs and technologists around the world, what exactly is the opportunity? I see it unfolding in several “waves.”
Wave #1: Big data infrastructure
Right now, the big data discussion is very much about core technology. Look at the agendas of big data conferences around the world and you’ll see that it’s all about software and data science: fascinating stuff, but very technical and generally hard to understand for anyone who is not deeply versed in those topics.
Core big data technologies may have originated at consumer internet companies, but at this stage there’s not much that feels “consumerish” about big data. The reason is that we’re still early in the cycle, and a lot of key infrastructure challenges need to be addressed before much else can happen.
For example: how do you process big data in real time? How do you clean up large data sets at scale? How do you transfer large volumes of data to the cloud and process it there? How do you simplify big data tools to make them accessible to a larger number of software engineers and business users? As a result, much of the innovation has been happening at the infrastructure level.
Opportunities for big data pioneers
This is a time of tremendous opportunity for new entrants. Many large technology vendors have been struggling with big data, in part because the underlying technologies are very different, and in part because they have so far been making a lot of money selling expensive solutions to process comparatively smaller data sets; some of the new entrants claim to be up to an order of magnitude cheaper than the Oracles of the world.
Large companies have made some interesting moves (Oracle partnering with Cloudera, Microsoft announcing support for Hadoop), but for the most part they will presumably only delay the inevitable, and this will lead to plenty of attractive acquisition opportunities for startups and their investors over the next few years.
It is also a time of confusion for anyone trying to figure out who the real success stories will be:
• There’s a lot of noise, and this is only going to accelerate as VC money continues to pour into the industry. Also, the fact that older, larger companies seem to be racing to rebrand as big data companies doesn’t help.
• There’s a fair number of “science projects” out there: companies that, at least for now, seem to be focused on solving an engineering problem but haven’t quite thought through its commercial applicability.
• It is going to take a while for winners to emerge. Unlike consumer internet startups, which can experience hockey-stick growth from inception, software startups generally go through slower adoption cycles (the consumerization of IT notwithstanding). Also, the abundantly documented (but presumably temporary) shortage of Hadoop engineers and data scientists may somewhat slow down the widespread adoption of those technologies.
• The surge in interest in all things big data will inevitably lead to some level of disillusionment, as projects turn out to be harder and more time-consuming than expected and sometimes underwhelm their sponsors. Startups will have to struggle through that phase, which may slow things down as well.
Sooner or later, of course, winners will emerge, and what now seem like daunting technical challenges will become something any qualified software engineer can handle, equipped with reasonably simple and cheap tools. There’s always a slight irony to underlying technologies: their ultimate sign of success is that at some point they become a given, a starting point, a simple enabler; in other words, boring.
Wave #2: “Big-data enabled” applications and features
As core infrastructure issues are gradually being resolved, the action will move to the application level, expanding the benefits of big data to a broader, non-technical audience within the enterprise, and to more consumers online.
Within the enterprise, we should see a lot of innovation around business applications. Enterprise software has always been, to a large extent, about enabling business end users to access and manipulate large amounts of data. “Big-data enabled” enterprise applications will take this to the next level, offering business users unprecedented data mining and analysis opportunities on larger volumes of internal data, in real time or close to it, sometimes augmented with external data sets available through data marketplaces.
This will happen across many different enterprise functions (finance, sales, marketing, HR, etc.) and across industries, from retail to healthcare to financial services.
Mine your customer base in real time; a more personalized web
The possibilities are intriguing. For example, what will a CRM application look like when you can mine your entire customer base in real time, along with your sales force’s interactions with those customers, and combine the results with external data sets on industry and company news and on geographic and demographic patterns, to determine which prospects are most likely to buy in the next quarter?
On the consumer internet front, data-driven features should become commonplace on many websites. Internet companies led the way, in particular with their recommendation engines (Amazon, LinkedIn, Netflix, Facebook and iTunes, for example). But so far those features have required first-rate data scientists on board and ad hoc infrastructure.
I would expect all of this to democratize considerably in the near future, as the infrastructure evolution described above takes place. Retailers, financial services companies and healthcare providers will all use data-driven features to customize and personalize their users’ online experience and to accelerate their core business.
Because over time any company with a web presence will want to offer data-driven features, there is an interesting market opportunity for startups that provide easy-to-use, out-of-the-box tools for doing so (“big data out of the box”).
Wave #3: The emergence of “big data enabled” startups
The democratization of big data infrastructure tools will also open up opportunities for entrepreneurs, including those without a deep technical background, to dream up entirely new businesses (and business models) based on data.
Just as we were talking about “web enabled” businesses a few years ago, we’re likely to see more and more “big data enabled” businesses appear. By that I mean companies that have the ability to process large amounts of data as their core DNA, and use it to deliver a product or service that could not exist otherwise.
Of course, there are already a number of startups that live and breathe data. Companies like WeatherBill (which compiles large amounts of weather data from a variety of sources, then sells insurance based on statistical analysis), Klout (a controversial startup that processes large amounts of data to create a social influence score for every user) or Wonga (which crunches data to decide whether to grant loans) are some early examples of startups with big data as their core DNA.
This is just the beginning…