Here are the 5 questions we will be unpacking:
- 1: How much data did the financial industry generate?
- 2: Self-service data tools – in or out?
- 3: What is the real cost of poor quality data to the world’s largest economy?
- 4: How much data was created this year and what are the implications for data pools?
- 5: How often did people tweet over the course of 2020, and what does that say about user interest and the data sets collected?
Image source: Bright Data
#1: How much data did the financial industry generate?
The banking sector generates unparalleled quantities of data. The amount of data generated each second in the financial industry grew 700% in 2020.
Analysis + insights
The financial sector has had an epiphany over the course of 2020. Financial institutions have always made decisions based on analyses, consumer trends, and cold, hard facts. But never has real-time data been so highly valued as over the course of 2020. Alternative data, otherwise known as ‘external data’, includes the likes of:
- social media posts
- credit card transactions
- geolocation data
which have all gained prominence among hedge funds, institutional investors, investment houses, and the like. These data sets enable financial professionals to identify consumer, market, and competitor trends in real time, in many cases before quarterly reports or earnings are announced. Acting on alt data insights enables these investors to capitalize on trends before public markets and individual securities fluctuate on corporate news, greatly increasing their alpha.
Going into 2021, we can expect to see alt data usage trickle down to smaller financial actors (boutique investment houses, for example), as well as a wider variety of industries, such as real estate, eCommerce, and brand management, benefiting from these data sets.
#2: Self-service data tools – in or out?
In 2020, spending on self-service data tools grew an estimated 2.5x faster than spending on traditional data tools did the previous year.
(Source: Finance Online)
Analysis + insights
I think that companies are beginning to realize how resource-heavy in-house data collection solutions truly are (think servers, manpower, DevOps, etc.). Then there is the issue of having to maintain and sustain your in-house data collection operation even when it is not a monthly or quarterly priority. Over the course of 2020, large and mid-size corporations started investing in self-service data tools as they realized the financial and operational benefits of on-demand, autonomous data collection solutions: they can turn data collection jobs on and off at will, team members are freed up to work on core products, and off-site infrastructure, such as cloud solutions, can be used.
Going into 2021, I think we will see this trend increase as companies try to cut costs and distractions while increasing profitability and market share.
#3: What is the real cost of poor quality data to the world’s largest economy?
Poor data quality cost the US economy approximately $3.1 trillion in 2020.
(Source: Tech Jury)
Analysis + insights
Speak to any CEO (especially at technology companies) and they will likely acknowledge the importance of Artificial Intelligence (AI) and its significance in developing market-leading, next-gen products. Surprisingly, fewer are talking about the quality of the data that actually serves as the basis for the AI’s training and operational performance. As the above statistic shows, using poor-quality data has a high cost, whether it comes in the form of cleaning or rendering data sets, in wasted data scientist hours, or simply in less valuable output. For example, if the data sets used by real estate investment houses had a significant time lag or contained corrupted geolocation fields, algorithmic insights and output would be severely damaged.
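To make the geolocation example concrete, here is a minimal sketch of the kind of quality gate that can catch such problems before records reach a model. It is a hypothetical illustration, not any particular vendor’s pipeline; the field names and the 24-hour freshness window are assumptions:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness window: records older than this are treated as stale.
MAX_AGE = timedelta(hours=24)

def is_usable(record: dict, now: datetime) -> bool:
    """Basic quality gate: the record must be fresh and have valid coordinates."""
    fresh = (now - record["timestamp"]) <= MAX_AGE
    valid_coords = -90 <= record["lat"] <= 90 and -180 <= record["lon"] <= 180
    return fresh and valid_coords

now = datetime.now(timezone.utc)
records = [
    {"lat": 40.7, "lon": -74.0, "timestamp": now - timedelta(hours=2)},   # fresh, valid
    {"lat": 140.7, "lon": -74.0, "timestamp": now - timedelta(hours=2)},  # corrupted latitude
    {"lat": 40.7, "lon": -74.0, "timestamp": now - timedelta(days=3)},    # significant time lag
]

clean = [r for r in records if is_usable(r, now)]
print(f"{len(clean)} of {len(records)} records passed the quality gate")
```

Even a trivial gate like this shows how much of a feed is stale or corrupted before those records can skew a model’s output.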
In any event, despite the effect poor-quality data has had on the world’s economy in general, and the American economy in particular, I think awareness is on the rise. But as with carbon emissions and the global goal of decreasing fossil fuel usage, adoption and implementation will take time.
Going into 2021, we will see more corporations recognizing that they need to invest in solid foundations (i.e. high-quality data collection) in order to improve algorithmic output and employee efficiency.
#4: How much data was created this year and what are the implications for data pools?
Approximately 90% of all data has been created over the course of the last two years.
(Source: Tech Jury)
Analysis + insights
As our economy in general, and consumers in particular, shift their activities online, including eCommerce and social media, the quantity of data we collect is growing exponentially. This means that the data companies collect is losing relevance at a higher rate than ever before, and the need for up-to-date and/or real-time data sets will only keep increasing.
Going into 2021, companies may realize that consumer behavior, sentiment, and preferences are changing, and, as such, that previous models are obsolete. Companies will work to sync their analytics and predictive software with freshly sourced consumer data sets.
#5: How often did people tweet over the course of 2020, and what does that say about user interest and the data sets collected?
Twitter users sent 528,780 tweets every minute in 2020, i.e. over half a million tweets every minute of the year.
(Source: Tech Jury)
Analysis + insights
What this says is that people are spending more and more of their time on social media. This increased activity has created, and is creating, huge amounts of ‘alternative data’, which, when collected with specific targets in mind, can be immensely profitable. We are talking about which news story is trending (i.e. where public interest currently lies), as well as how many people are showing interest in a company’s posts and products.
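As a simple illustration of turning collected posts into a trend signal, here is a minimal sketch using a frequency count over hashtags. The sample posts and the ‘AcmeCorp’ name are made up for illustration; a real pipeline would pull from a live collection feed:

```python
from collections import Counter

# Hypothetical sample of collected posts; a real pipeline would source these
# from a live social media data feed.
posts = [
    "Excited about the new #earnings report from AcmeCorp",
    "AcmeCorp #earnings beat expectations",
    "Is anyone watching the #election coverage?",
    "#earnings season is wild this year",
]

# Count hashtag mentions as a crude proxy for where public interest lies.
hashtags = Counter(
    word.lower()
    for post in posts
    for word in post.split()
    if word.startswith("#")
)

for tag, count in hashtags.most_common(3):
    print(f"{tag}: {count} mentions")
```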
Going into 2021, I think that we will see a wider variety of industries leveraging social media data in particular, and alt data in general, as a means of predicting market activity before official corporate and government reports are made public. Quarterly and annual earnings reports may become superfluous as institutional investors use predictive models driven by alt data to preempt the markets.
Summing it up
2020 was a big year for data – the sheer quantity being produced grew, and is still growing, exponentially. Corporations are identifying the benefits of using ‘alt data’, gaining increased awareness of the value of feeding ‘clean data sets’ into their algorithms, and putting investment in self-service, automated data collection front and center. We can’t predict what is to come in 2021, but what we can do is analyze what happened over the course of 2020 and make educated, data-driven decisions.