Mike Abbott of Kleiner Perkins Caufield and Byers delivered a great presentation on this topic. He began by describing what big data is. The funny thing is that there is no agreed definition for this term. In a recent study, more companies agreed on big data's benefits than on its definition.
Mike led the team that designed big data for Twitter (no small feat) before he joined the VC community. Often he was the person "trying to get the money". He raised $100m in venture capital. During this time he gained a strong appreciation for big data and its role at companies.
Interesting big data stat: 2.8 zettabytes (ZB) of data will be created and replicated this year. A zettabyte (symbol ZB, derived from the SI prefix zetta-) is equal to 10^21 bytes, or 1,000 exabytes (one sextillion, or one long-scale trilliard, bytes). Just 0.5% of the world's massive trove of data is being analyzed. 3% is tagged, and 23% would be useful if tagged and analyzed.
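To make the units concrete, here is a minimal sketch of the arithmetic behind these figures. It assumes (this is not stated in the talk) that the three percentages apply to the 2.8 ZB created this year; the variable names are purely illustrative.

```python
from fractions import Fraction

# SI decimal units, expressed in bytes.
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figure quoted in the talk: 2.8 ZB created and replicated this year.
created_zb = Fraction("2.8")
created_eb = created_zb * ZETTABYTE / EXABYTE  # 1 ZB = 1,000 EB

# Assumption: the quoted shares are taken of the 2.8 ZB total.
shares = {
    "analyzed": Fraction("0.005"),          # 0.5%
    "tagged": Fraction("0.03"),             # 3%
    "useful_if_tagged": Fraction("0.23"),   # 23%
}
volumes_eb = {name: share * created_eb for name, share in shares.items()}

print(f"total: {float(created_eb):,.0f} EB")
for name, eb in volumes_eb.items():
    print(f"{name}: {float(eb):,.0f} EB")
```

Exact fractions are used instead of floats so that the percentages multiply out to whole exabytes without rounding noise.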
By 2020, IDC predicts that the digital universe will hold an amazing 40 ZB of data. That is 57 times the number of grains of sand on all the beaches on earth. If we saved 40 ZB onto today's Blu-ray discs, the discs would weigh as much as 424 aircraft carriers.
Between 2012 and 2020 the size of the digital universe will double every 2 years. Machine-generated data will be a major driver of that growth.
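As a rough sanity check on the growth claim, the sketch below compounds a starting volume with a two-year doubling time and compares it with the 40 ZB forecast. It assumes the 2.8 ZB figure refers to 2012, which the notes do not state explicitly; the function names are illustrative only.

```python
import math

def project(start_zb: float, start_year: int, end_year: int,
            doubling_years: float = 2.0) -> float:
    """Volume after exponential growth with a fixed doubling time."""
    return start_zb * 2 ** ((end_year - start_year) / doubling_years)

def implied_doubling_time(start_zb: float, end_zb: float, years: float) -> float:
    """Doubling time (in years) needed to grow from start_zb to end_zb."""
    return years * math.log(2) / math.log(end_zb / start_zb)

# Assumption: 2.8 ZB is the 2012 baseline.
projected_2020 = project(2.8, 2012, 2020)          # four doublings, i.e. 16x
annual_growth = 2 ** (1 / 2.0) - 1                  # yearly rate for a 2-year doubling
doubling = implied_doubling_time(2.8, 40.0, 8)      # what the 40 ZB forecast implies

print(f"projected 2020 volume: {projected_2020:.1f} ZB")
print(f"equivalent annual growth: {annual_growth:.1%}")
print(f"doubling time implied by 40 ZB: {doubling:.2f} years")
```

Under these assumptions, doubling every two years lands in the same range as the 40 ZB forecast, so the two figures are broadly consistent with each other.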
What is big data?
Big data is defined by its dimensions:
Velocity: data moves at a very high rate and is valuable in its temporal, high-velocity state
Volume: fast-moving data creates massive historical archives, valuable for mining patterns and trends
Variety: disparate sources create new insights
Relevance: Can you derive relevance from the data?
The applications for big data are at a very early stage. Companies that are piloting it in customer-focused areas are not looking at Twitter-like data yet. But there is a great deal of experimentation.
Four in 10 businesses view big data analytics as an enigma. The ROI is not very clear yet.
How does this apply to marketers and advertisers?
64% of B2B companies said they have unreliable marketing data, such as incomplete records and data integrity issues. Often, after we clean up our data and try to use it, we run into privacy issues. These issues are becoming increasingly problematic. Consumers are conflicted. Three in 10 say companies should never track them. Meanwhile, trackers deployed by websites are clearly on the rise. US consumers withhold personal data such as religion and political leaning even from trusted companies (US personal information sharing with trusted companies). Americans' online privacy concerns also extend to popular online activities such as shopping. At the same time, 9 in 10 mobile device users will swap personal info for offers (Mobile users' openness to data sharing for offers).
How do we combine multiple data sources?
Years ago the traditional model with Cognos was to move a large amount of data to the data warehouse in order to get insights. Now we can put the data in one place very efficiently and run queries in sync, join information and generate patterns.
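The idea of putting data from several sources in one place and joining it can be illustrated with a small sketch. This uses Python's built-in sqlite3 module as a stand-in for whatever store a company would actually use; the table names, columns, and rows are hypothetical and not from the talk.

```python
import sqlite3

# In-memory database standing in for the "one place" where sources are combined.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE web_events (customer_id INTEGER, event TEXT);
""")

# Hypothetical source 1: CRM records.
conn.executemany(
    "INSERT INTO crm_customers VALUES (?, ?)",
    [(1, "enterprise"), (2, "smb"), (3, "smb")],
)
# Hypothetical source 2: website activity.
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [(1, "page_view"), (1, "purchase"), (2, "purchase"),
     (3, "purchase"), (3, "purchase")],
)

def purchases_by_segment(connection: sqlite3.Connection) -> dict:
    """Join the two sources and count purchases per customer segment."""
    rows = connection.execute("""
        SELECT c.segment, COUNT(*)
        FROM web_events AS e
        JOIN crm_customers AS c ON c.customer_id = e.customer_id
        WHERE e.event = 'purchase'
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()
    return dict(rows)

result = purchases_by_segment(conn)
print(result)
```

The join key (customer_id here) is the part that usually takes the most work in practice, since the incomplete records and data integrity issues mentioned earlier tend to surface exactly when sources are matched against each other.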
Ad agencies have a role in data. They can bridge gaps and provide a "data hub" to customers. Advertising is dependent on trends and brands have a lot of this data at their disposal. Ad agencies provide a valuable communication conduit between the customer and the brand.
The market for big data is big. More investments were made this year than any previous year (112 investments). The term big data as a label will change and evolve. It is overloaded and means different things to different people. Some potential new definitions he offers:
Smart data: the production of persistent data through predictive analytics. Companies such as Twitter are moving beyond BI.
Predictive analytics: used in fraud systems and recommendation engines. Techniques for this borrow from statistics, machine learning, modeling and other fields to identify and exploit patterns.
Another area is NewSQL (vs. NoSQL). It describes highly scalable, horizontally distributed SQL systems.
Last but not least is data science and data scientists. There is a shortage in the field and a need to develop and hire data scientists in this new and growing field.
Other areas that VC investors are looking into:
Stream processing and streaming analytics
Image and video mining
The VC community is bullish on fast followers - those companies that look at an innovation and make it a breakthrough of their own.
First movers have historically had a 47% failure rate.
Fast followers, on the other hand, have an 8% failure rate and understand customers better.