2020 In Data: Stats + Insights + Trends

In this article, we will discuss 5 mind-blowing data-related stats generated over the course of 2020 and what they mean as we ease into 2021.
Tamir Roter | VP EMEA & APAC

Here are the 5 questions we will be unpacking:

  1. How much data did the financial industry generate?
  2. Self-service data tools – in or out?
  3. What is the real cost of poor quality data to the world's largest economy?
  4. How much data was created this year and what are the implications on data pools?
  5. How often did people tweet over the course of 2020, and what does that say about user interest and the data sets collected?

(Image source: Bright Data)

#1: How much data did the financial industry generate?

The banking sector generates unparalleled quantities of data. The amount of data generated each second in the financial industry grew 700% in 2020.

Analysis + insights

The financial sector has had an epiphany over the course of 2020. Financial institutions have always made decisions based on analyses, consumer trends, and cold, hard facts. But never has real-time data been so highly valued as over the course of 2020. Alternative data, otherwise known as ‘external data’, includes the likes of:

  • social media posts
  • credit card transactions
  • geolocation data

which have all gained prominence among hedge funds, institutional investors, investment houses, and the like. These data sets enable financial professionals to identify consumer, market, and competitor trends in real time, in many cases before quarterly reports or earnings are announced. Acting on alt data insights enables these investors to capitalize on trends before public markets and individual securities fluctuate in response to corporate news, greatly increasing their alpha.

Going into 2021, we can expect to see alt data usage trickling down to smaller financial actors (boutique investment houses, for example), as well as a wider variety of industries benefiting from these data sets, such as real estate, eCommerce, and brand management.
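To make the alt-data idea concrete, here is a deliberately toy sketch (not any firm's actual model, and the numbers are invented) of how an investor might turn one of the data sets above, credit card transactions, into an early demand signal for a retailer before earnings are published:

```python
# Toy alt-data signal: compare this week's credit card transaction volume
# for a retailer against its trailing average to flag a demand shift
# before quarterly earnings are announced.

def demand_signal(weekly_txn_counts, current_week):
    """Return the % change of the current week vs the trailing average.

    weekly_txn_counts: historical weekly transaction counts
    current_week: transaction count observed this week
    """
    baseline = sum(weekly_txn_counts) / len(weekly_txn_counts)
    return (current_week - baseline) / baseline * 100

# Invented example: history averages 1,000 txns/week; this week shows 1,250.
history = [950, 1000, 1050, 1000]
print(round(demand_signal(history, 1250), 1))  # 25.0 -> demand is up ~25%
```

A real pipeline would of course layer seasonality adjustments and panel-bias corrections on top, but the core idea, acting on a consumer trend before it appears in official reports, is the same.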

#2: Self-service data tools – in or out?

In 2020, spending on self-service data tools grew to an estimated 2.5x the previous year's spending on traditional data tools.

(Source: Finance Online)

Analysis + insights

I think that companies are beginning to realize how resource-heavy in-house data collection solutions truly are (think servers, manpower, DevOps, etc.). Then there is the issue of having to maintain and sustain an in-house data collection operation even when it is not a monthly or quarterly priority. Over the course of 2020, large and mid-size corporations started investing in self-service data tools as they realized the financial and operational benefits of on-demand, autonomous data collection solutions: they can turn data collection jobs on and off at will, team members are free to work on core products, and off-site infrastructure, such as cloud solutions, can be used.
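The "turn collection jobs on and off at will" point is the heart of the on-demand model. The class below is a hypothetical illustration of that idea, it is not a real Bright Data (or any vendor's) API, just a sketch of why a toggleable job consumes no resources while switched off:

```python
# Hypothetical sketch of an on-demand collection job that can be
# toggled on and off at will. When off, running it is a no-op.

class CollectionJob:
    def __init__(self, name, fetch):
        self.name = name
        self.fetch = fetch        # callable that performs the actual collection
        self.enabled = False      # jobs start switched off

    def toggle(self, on):
        self.enabled = on

    def run(self):
        # Collects only while the job is switched on; otherwise it does
        # nothing and consumes no collection resources.
        return self.fetch() if self.enabled else None

job = CollectionJob("price-monitor", fetch=lambda: ["$19.99", "$24.50"])
job.run()          # None -> job is off by default, nothing is collected
job.toggle(True)
job.run()          # ['$19.99', '$24.50'] -> collection happens on demand
```

Contrast this with an always-on in-house scraping cluster, where the servers and DevOps overhead persist even in the months when the data is not a priority.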

Going into 2021, I think we will see this trend increase as companies try to cut costs and distractions while increasing profitability and market share.

#3: What is the real cost of poor quality data to the world’s largest economy?

Poor data quality cost the US economy approximately $3.1 trillion per year as of 2020.

(Source: Tech Jury)

Analysis + insights

Speak to any CEO (especially at a technology company) and they will likely acknowledge the importance of Artificial Intelligence (AI) and its significance in developing market-leading, next-gen products. Surprisingly, fewer are talking about the quality of the data that actually serves as the basis for the AI’s training and operational performance. As the above statistic shows, using poor-quality data has a high cost, whether it comes in the form of cleaning or rendering data sets, in precious wasted data scientist hours, or simply in less valuable output. For example, if data sets used by real estate investment houses had a significant time lag or were corrupted in any way from a geolocation perspective, this would severely damage algorithmic insights and output.
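A basic defense against both problems in the real estate example, corrupted coordinates and significant time lag, is a validation pass before records ever reach a model. The field names and thresholds below are illustrative assumptions, not a standard schema:

```python
# Hedged sketch: filter out poor-quality geolocation records (invalid
# coordinates or stale timestamps) before they reach an algorithm.
from datetime import datetime, timedelta, timezone

def is_clean(record, max_age=timedelta(hours=24)):
    """Keep a record only if its coordinates are valid and it is fresh."""
    lat, lon = record.get("lat"), record.get("lon")
    ts = record.get("timestamp")
    if lat is None or lon is None or ts is None:
        return False                                   # incomplete record
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False                                   # corrupted coordinates
    return datetime.now(timezone.utc) - ts <= max_age  # reject significant time lag

now = datetime.now(timezone.utc)
records = [
    {"lat": 40.7, "lon": -74.0, "timestamp": now},                      # good
    {"lat": 140.7, "lon": -74.0, "timestamp": now},                     # bad latitude
    {"lat": 40.7, "lon": -74.0, "timestamp": now - timedelta(days=3)},  # stale
]
print([is_clean(r) for r in records])  # [True, False, False]
```

Checks like these are cheap compared to the data scientist hours (and the fraction of that $3.1 trillion) lost downstream when bad records slip through.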

In any event, despite the effect poor-quality data has had on the world’s economy in general, and the American economy in particular, I think awareness is on the rise. But as with carbon emissions and the global goal of decreasing fossil fuel usage, adoption and implementation will take time.

Going into 2021, we will see more corporations choosing to invest in solid foundations (i.e. high-quality data collection) in order to streamline algorithmic output and employee efficiency.

#4: How much data was created this year and what are the implications on data pools?

Approximately 90% of all data has been created over the course of the last two years.

(Source: Tech Jury)

Analysis + insights

As our economy in general, and consumers in particular, shift their activities online, including eCommerce and social media, the quantity of data we collect is growing exponentially. This means that the data companies collect is losing relevance at a higher rate than ever before, and the need for up-to-date and/or real-time data sets will only increase.

Going into 2021, companies may realize that consumer behavior, sentiment, and preferences have changed, and as such previous models are obsolete. Companies will work to sync their analytics and predictive software to work in tandem with freshly sourced consumer data sets.
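One simple way to picture "syncing" predictions with freshly sourced data is a rolling window: old observations age out automatically, so a model stops leaning on pre-shift consumer behavior. This is a toy sketch of the principle, not a production forecasting method:

```python
# Toy sketch of keeping a model in sync with fresh data: a rolling
# window discards old observations, so obsolete consumer behavior
# stops influencing predictions.
from collections import deque

class RollingAverageModel:
    """Predicts the mean of only the most recent `window` observations."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)  # old data falls out automatically

    def update(self, value):
        self.recent.append(value)

    def predict(self):
        return sum(self.recent) / len(self.recent)

model = RollingAverageModel(window=3)
for spend in [100, 100, 100, 400, 400, 400]:  # consumer behavior shifts mid-stream
    model.update(spend)
print(model.predict())  # 400.0 -> the obsolete pre-shift data no longer counts
```

An all-history average over the same stream would predict 250.0, half-anchored to behavior that no longer exists; the rolling version tracks the new reality.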

#5: How often did people tweet over the course of 2020, what does that say about user interest, and the data sets collected?

Twitter users sent 528,780 tweets every minute in 2020, i.e. over half a million tweets every minute of the year.

(Source: Tech Jury)

Analysis + insights

What this says is that people are spending more and more of their time on social media. This increased activity has created, and is creating, huge amounts of ‘alternative data’ which, when collected with specific targets in mind, can be immensely profitable. We are talking about which news stories are trending (i.e. where public interest currently is), as well as how many people are showing interest in a company’s posts and products.
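To get a feel for the scale behind that per-minute figure, extrapolating it out (a back-of-the-envelope calculation, assuming a constant rate) gives:

```python
# Back-of-the-envelope extrapolation of the per-minute tweet stat.
TWEETS_PER_MINUTE = 528_780

per_day = TWEETS_PER_MINUTE * 60 * 24   # minutes in a day
per_year = per_day * 365                # days in a year

print(f"{per_day:,} tweets/day")    # 761,443,200 tweets/day
print(f"{per_year:,} tweets/year")  # 277,926,768,000 tweets/year
```

Roughly three-quarters of a billion tweets a day, or on the order of 278 billion a year: an enormous, constantly refreshing alternative data pool.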

Going into 2021, I think we will see a wider variety of industries leveraging social media data in particular, and alt data in general, as a means of predicting market activity before official corporate and government reports are made public. Quarterly and annual earnings may become superfluous as institutional investors use predictive models driven by alt data to preempt the markets.

Summing it up

2020 was a big year for data – the sheer quantity being produced grew, and is growing, exponentially. Corporations are identifying the benefits of using ‘alt data’, gaining increased awareness of the value of feeding ‘clean data sets’ into their algorithms, and investment in self-service, automated data collection is taking center stage. We can’t predict what is to come in 2021, but what we can do is analyze what happened over the course of 2020 and make educated, data-driven decisions.

Tamir Roter | VP EMEA & APAC

Tamir is a software business executive with a track record of successfully driving international corporate growth and profitability with large enterprises as well as start-ups. Over the course of 20+ years, he has been building and managing sales, marketing and regional teams. Tamir currently works as a VP at Bright Data, the leading global web data collection network.
