Overcoming Data Blindness

May 31, 2018

This piece originally appeared in the Journal of Business Strategy, Vol. 39, Issue 2.

Every day brings a new and astonishing application of “alternative data,” the increasingly visible wave breaking on the shores of the Big Data lake. In an age of radically increasing data transparency, companies are in an ever-accelerating hunt for data-driven insights that can help them understand customers better and increase performance and profitability. This scramble for data is perhaps most pronounced among big asset managers seeking under the radar trading hints but can be found across the full spectrum of industry sectors. The internet of things and aspects of artificial intelligence, such as natural language processing and facial recognition, are all being mined for data leading to insights to give companies and investors a meaningful edge.

Business executives can’t leave data analytics to the specialists; they need real mastery in order not to be blindsided by the increasingly diverse and intrusive ways the field is being used.

For most of us, however, the explosion of data sets has led to a condition not unlike snow blindness – the harder we look the less we can see. As the ability to identify, manage and extract powerful insights from larger data sets becomes an increasingly important source of competitive advantage, we need to develop more robust skills for leaders outside the lab. In the struggle for this advantage, those companies that develop the data analytics “muscle” of their non-specialist leadership will rapidly widen the gap separating them from their less farsighted peers. Those executives closest to the customer are those who will most need to understand how to query this exponentially expanding data universe.

A polar vortex or bomb cyclone of data sets, to use the meteorological terminology currently in fashion, is throwing off increasingly strange data sets and data correlations, some of them creating significant issues. In early 2018, much to the surprise of the US military, a graduate student in Australia used GPS data from a wearable fitness app to identify previously undisclosed US military bases in a range of different countries. In late 2017, the fitness app company Strava released a searchable heat map based on a billion activities logged by people who use the app either on their smartphone or with a Fitbit. Looking at those areas of the heat map covering less populated areas of Africa, the Middle East and Central Asia, Nathan Ruser, an Australian cyber security research student, deduced that the app users in these areas were more likely to be US soldiers than fitness-crazed villagers. It was a small step to deduce from these data that Strava was unwittingly revealing secret troop locations.

In other industry sectors, the search for a competitive advantage is leading to some very surprising alternative data experiments. Cargill, a $120bn company and the world’s largest food business, has long benefited from the insights into the world’s agricultural commodities markets derived from its market dominance. However, the increasing availability of information about weather patterns and shipping movements has eroded this advantage. As a result, the company is looking to artificial intelligence and alternative data sources to preserve its market leadership. This process has involved looking at all of the data scraps, often called “data exhaust,” that it can pick up as products flow through its factories, ports and silos.

In addition to developing machine learning tools to enhance its future trading and reading satellite images to assess crop health, it is also using sensors in its farmed shrimp ponds to pick up the sounds of shrimp eating. Interpreting these sounds will, Cargill claims, help farmers to more accurately dispense feed. According to Cargill’s press release describing its new iQuatic™ automatic feed dispenser, acoustic sensors can detect when the shrimp are actually eating. As a result, the dispenser can release feed pellets within the shrimp’s natural feeding cycle so that the pellets are consumed before important nutrients dissolve.
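The control logic described in the press release can be imagined as a simple feedback loop: release pellets only while feeding sounds are detected. The sketch below is purely illustrative; the function name, the 0.6 threshold and the three-reading window are invented, and a real dispenser would rely on calibrated acoustic signal processing rather than raw level averages.

```python
# Illustrative control loop for acoustic feed dispensing.
# Sensor readings, threshold and window size are invented for this sketch.
def should_dispense(sound_levels, threshold=0.6):
    """Dispense only while sustained feeding sounds are detected,
    so pellets are eaten before their nutrients dissolve."""
    if not sound_levels:
        return False
    recent = sound_levels[-3:]  # average the last few sensor readings
    return sum(recent) / len(recent) > threshold
```

The point of the design is timing: rather than dispensing on a fixed schedule, feed release is gated on evidence that the shrimp are actually eating.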

Looking for a similar edge, financial analysts have begun to use natural language processing and pattern recognition to detect hidden and subconscious messages being delivered by Federal Reserve chairmen and the chief executive officers (CEOs) of public companies. A company called Prattle, founded by Evan Schnidman, a former Brown University economics professor, is an example of this form of analytics. He hatched the idea while studying Fed communications following the financial crisis of 2008 to evaluate the shift from the communications style of Alan Greenspan to the more open approach of Ben Bernanke. Today, Prattle provides communication analysis for more than 20 central banks around the world. In September 2017, his firm began offering analyses of more than 3,000 publicly traded US companies.

Prattle’s machine learning algorithm is based on the idea that people’s speech patterns are not random but relate directly to the conscious and subconscious thoughts of the speaker. By comparing the language used by a CEO during an earnings call to his or her historic speech patterns on past calls, the company scores the lexicon of words and phrases used to produce a so-called Prattle score that is also corrected for common fundamentals, such as peer company and market performance.
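The core idea of scoring a speaker against their own baseline can be sketched in a few lines. This is not Prattle’s actual model: the toy lexicon, the word weights and the function names below are all invented, and the z-score stands in for whatever proprietary normalization the firm uses. It only shows the shape of the technique: measure tone, then measure deviation from the speaker’s history.

```python
# Hypothetical sketch of lexicon-based communication scoring:
# compare a speaker's latest remarks against their own historical
# baseline, so the score reflects deviation rather than absolute tone.
from statistics import mean, stdev

LEXICON = {  # toy tone weights; a real model would learn these from data
    "growth": 1.0, "strong": 0.8, "record": 0.9,
    "headwinds": -0.7, "uncertainty": -0.9, "challenging": -0.6,
}

def tone(text):
    """Average lexicon weight of the words in a transcript."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return mean(hits) if hits else 0.0

def deviation_score(current, history):
    """Z-score of the current call's tone against the speaker's past calls."""
    past = [tone(t) for t in history]
    mu, sigma = mean(past), stdev(past)
    return (tone(current) - mu) / sigma if sigma else 0.0
```

A positive score means the speaker sounds more upbeat than their own norm, which is the signal investors care about: the same sentence can be bullish from a habitually cautious CEO and neutral from an exuberant one.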

With a client base of large asset managers, hedge funds and international investment banks, Prattle’s approach is by no means an outlier. According to a study by Alternativedata.org, more than 160 major money managers employ at least 340 full-time data analysts, engineers and scientists, a number four times higher than that in 2012. In addition to using services such as Prattle, these analysts are scouring satellite data to read foot traffic in shopping centers and scraping email receipts to identify consumer spending trends. We are already a long way from sending an intern to check on the number of cars in the corporate parking lot at 7 a.m. to determine a company’s financial health.

As disturbing as some of these new analytical tools are, there are numerous examples of how alternative data analytics is being put to good use. One area of particular interest from this perspective is biometrics. An Australian startup, Brain Gauge, has shown that analyzing people’s speech patterns can be used for the real-time detection of stress levels. Insights from these data could be used in call centers, for example, to improve employee health and reduce absenteeism. Other unexpected insights have been derived from studies showing that a decreasing length of stride (gait) is a predictor of dementia and, conversely, that walking at higher speeds in older adults correlates with longevity.

However, this rapidly evolving data analytics environment poses a number of significant challenges for corporate decision-makers that will need to be addressed, both in corporate usage of alternative data sets and in defending themselves against its use by others. The most pressing need is to educate corporate executives on the use and potential abuse of alternative data sets. Outside of the ranks of specialists, understanding of how alternative data can and should be used is rudimentary. As these tools become ubiquitous, it is a matter of increasing urgency for the average business executive to develop a reasonable familiarity with data analytics. Indeed, it will arguably become a prerequisite for advancement and success in many fields in the not too distant future to have more than a rudimentary understanding of this complex science.

The first compelling reason for this requirement is that not all alternative data sets are created equal, and it is important for us to recognize when the alternative data emperor is wearing no clothes. The critical skill is the ability to ask intelligent questions about the quality of the data being analyzed. We need to be able to ask the specialists questions such as:

Q1. What is the source of the data and are there any reasons to question its reliability?

Q2. Are there any anomalies in the methodology that may have affected the data?

Q3. Are there any obvious missing values in the raw data that could distort the analysis?

Q4. What kinds of outliers exist in the data set and do they make sense in the overall context? For example, a house price of $2m in a neighborhood of $200,000 homes should raise a red flag.
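The checks behind Q3 and Q4 need not be sophisticated to be useful. A minimal sketch of the idea, with an invented function name and an illustrative (not standard) 5x-median threshold chosen to match the house-price example above:

```python
# Minimal data-quality sketch for questions Q3 and Q4:
# count missing values and flag extreme outliers before trusting an analysis.
from statistics import median

def flag_issues(prices):
    """Report missing values and values far above the neighborhood median."""
    present = [p for p in prices if p is not None]
    med = median(present)
    # A house at more than 5x the local median (e.g. $2m among $200k homes)
    # deserves a second look before it is allowed to drive any model.
    outliers = [p for p in present if p > 5 * med]
    return {"missing": prices.count(None), "median": med, "outliers": outliers}
```

Even this crude screen forces the right conversation with the specialists: is the $2m entry a data-entry error, a genuine mansion, or a sign the whole feed is unreliable?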

A healthy skepticism provides the best defense against being led astray by alternative data, and this vigilance should be ongoing, as the use of alternative data increases exponentially.

In addition to understanding the quality of alternative data sets, business executives also need to develop a clear understanding of the regulatory risk in their companies’ use of alternative data. While data providers insist that they are scrupulous in scrubbing the data sets they are offering clients of all personally identifiable information, the increasing use of real-time location data provided by mobile phone providers (with all the risks that implies) suggests that the compliance challenges are not trivial. In recognition of this problem, a new consortium of alternative data providers was formed in early 2018 with the goal of providing international standards to protect providers and users from risks. The consortium, called the Investment Data Standards Association, will establish industry best practices “designed to mitigate legal and compliance risks and improve alternative data product.”

More widespread sophistication about the benefits and risks of alternative data usage will not only help companies manage their regulatory and compliance risk but also give them better tools to understand how alternative data about their operations and their customers is being exploited by investors and other third parties. In particular, those executives responsible for the reputations of their companies will not only need to understand how to respond to investor analysis of their CEOs’ body language and speech patterns but also develop a comprehensive portrait of how alternative data shapes perception of the organization’s strengths and weaknesses. Just as, today, top-tier communication leaders capture and track social and mainstream media sentiment, the leaders of tomorrow will need to understand and attempt to manage how alternative data analytics shapes perceptions of everything the company does.

It will be a fascinating journey keeping track of the burgeoning alternative data universe and, fortunately, not all of it poses an existential threat. Students of science today can already avail themselves of a bizarre range of data sets, including the database of races provided by the American Racing Pigeon Union, the data on pinching efficacy from the officially named “Investigation for Determining the Optimum Length of Chopstick” and a study of how LSD affects mathematical ability. The ways that we will have of listening to customers will soon approach the infinite, including those customers who are shrimp.
