
The media industry: After digitisation comes machine learning

I’ve been working most of my life in and around media analysis, and I believe machine learning and its younger sister, data science, are on an exponential trajectory towards restructuring the whole business. I see a ketchup effect in the near future: first nothing, then nothing, then a drip and then, all of a sudden, splash!

For those of you who are not familiar with that business: it is, first of all, a very small market, and its core is about measuring and evaluating PR efforts. The global organisation that gathers most of the companies in this business is AMEC. Its members are fundamentally different from the much larger media monitoring industry in that they (we) provide deeper and more tailored analysis of media content than the monitoring firms, which focus on collecting and distributing media content. Since the rise of social media, an offshoot industry has also grown up around producing statistics on social media engagement, with firms such as Socialbakers.

“Media Monitoring” is being eaten by data collection giants

The digitisation of media has gone on for quite some time now. One aspect is the traditional media companies themselves going online, another is new media forms such as the social eco-system, and a third is the inevitable change in the media “after-market” of monitoring and so-called listening tools. In the first wave, digitisation meant scanning offline print to PDF before distribution to the after-market. The second wave meant actual raw data being delivered via APIs in machine-readable formats such as JSON. This shift has meant the creation of new markets for data delivery, most notably in the form of the data vendors Datasift and GNIP, but also specialised data vendors such as Spinn3r for blog data on a global scale.

Adding coders to business-as-usual or coding the new businesses?

From another side of the market, a new industry has emerged that is a very different animal from the traditional media monitoring and analysis companies. It has therefore been hard to factor into the equation, especially for the analysis companies assessing the competitive landscape. The main reason is that it is technological at its core and has human analysts more as icing on the cake, or even in sales positions, while in traditional media monitoring in general, and in the media analysis business in particular, it is by tradition the other way around. Here in Sweden the monitoring-turned-analysis company Meltwater is a good example. Based on a relatively simple technological product for monitoring and a strong focus on sales people, it grew so much that it could make a strategic investment in an “AI company”, buying a small team of around 10-15 machine learning specialists to add to its more than a hundred employees at the time. By applying machine learning on top of its own monitoring product, it is able to eat into the market for media analysis, answering deeper questions about the content than just volume and simple keyword searches.

The growth potential in a truly data-centric business is unbeatable

For a business born out of the machine learning market, however, media content is just another input source to its core text mining algorithms. Text mining is currently experiencing tremendous momentum due to the recent progress made in deep learning. Just like data science, it is an interdisciplinary approach to content analysis that draws on advances in computing power, statistics and machine learning over many years, and it happens to have proved amazingly effective on current industry problems, such as classifying what’s in pictures. There is currently quite a frenzy around applying the technique to all kinds of areas such as astronomy, biology, finance and of course marketing. Even a quiet little pond like media analysis, measured by market size, is currently hosting meetups where technologists show and tell about their experiments in the field.

What makes me think that the media analysis market will be more or less swept away in the coming years?

Well, first of all, there is the built-in productivity of any machine learning endeavour: exponential growth is built into the DNA of such a business (see slides below about why companies like Uber, Google, Apple, Valve etc. exponentially outperform the competition), as compared to the business-as-usual way of growing by adding more headcount, especially in sales.

Secondly, I’ve been watching, and in recent years experimenting myself with, using this technology to answer the more complex questions of media analysis, such as: does the media coverage align with our brand strategy? What client-defined topics are used when describing our company? Those types of questions were probably possible to answer 10 years ago if you had a few million dollars to spare (or access to the few academics who were then developing the techniques that are now booming into the market). Today all you really need is some fundamental coding skills and the ability to ask the right questions and design the research projects well. This leads me to comment on the very prevalent over-confidence in algorithmic approaches to the analysis of communication and to the management of businesses based on data analysis. You really have to have avoided meeting the realities of business operations, or manually reading through and tagging texts, to come to the conclusion that algorithms will produce useful insights by sheer magical power.
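To make the “fundamental coding skills” point a bit more concrete, here is a minimal sketch of the simplest possible way to approach the client-defined topics question: counting keyword hits per topic. It is a plain keyword baseline, not machine learning, and the topic names, keywords, the company “Acme” and the function `tag_topics` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical client-defined topics and trigger keywords (illustrative only).
TOPICS = {
    "sustainability": {"climate", "emissions", "recycling"},
    "innovation": {"patent", "research", "prototype"},
}


def tag_topics(text, topics=TOPICS):
    """Return a Counter mapping each matched topic to its number of keyword hits."""
    # Lowercase the text and split it into plain alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for topic, keywords in topics.items():
        hits = sum(1 for word in words if word in keywords)
        if hits:
            counts[topic] = hits
    return counts


# Two invented headlines about an invented company.
articles = [
    "Acme files a patent for a recycling prototype.",
    "Acme's quarterly results disappoint investors.",
]
results = [tag_topics(article) for article in articles]

for article, counts in zip(articles, results):
    print(article, dict(counts))
```

The second headline matches no topic at all, which is exactly where the hard part begins: deciding which topics and keywords actually matter is a research design question, not a coding one.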

A word of caution: don’t count out wisdom in the age of quick facts. 

Just as the mass hysteria to “prove the ROI” of PR in traditional media by measurement during the last 15 or so years has been barking up the wrong tree (the right tree is going online for end-to-end measurability), the current tendency to fall in love with numbers in the age of cheap and abundant data and analysis capacity will be just a phase. The 10,000-dollar question is really how to balance the tremendous power that has fallen into almost everyone’s hands, due to the advances in the data analysis industry in general and the machine learning/text mining industry in particular, with the ability to identify business-relevant questions to apply data-logical thinking and techniques to. And judging by history, the answer will most likely NOT come from the current players, fettered by current successes to defend and by mindsets based on how success was produced in an old business paradigm (The Innovator’s Dilemma is a good read on this). My guess is that it will come from a small team with a large mission and a whole new set of skills and capabilities in machine learning and insight into why people share stuff online. Hint: it’s got a lot to do with the fundamental difference between people and machines.