2020 In Data: Stats + Insights + Trends

In this article, we will discuss 5 mind-blowing data-related stats generated over the course of 2020 and what they mean as we ease into 2021.
Tamir Roter | VP EMEA & APAC

Here are the 5 questions we will be unpacking:

  1. How much data did the financial industry generate?
  2. Self-service data tools – in or out?
  3. What is the real cost of poor quality data to the world's largest economy?
  4. How much data was created this year and what are the implications on data pools?
  5. How often did people tweet over the course of 2020, and what does that say about user interest and the data sets collected?

(Image source: Bright Data)

#1: How much data did the financial industry generate?

The banking sector generates unparalleled quantities of data. The amount of data generated each second in the financial industry grew 700% in 2020.

Analysis + insights

The financial sector has had an epiphany over the course of 2020. Financial institutions have always made decisions based on analyses, consumer trends, and cold, hard facts. But never has real-time data been so highly valued as over the course of 2020. Alternative data, otherwise known as ‘external data’, includes the likes of:

  • social media posts
  • credit card transactions
  • geolocation data

which have all gained prominence among hedge funds, institutional investors, investment houses, and the like. These data sets enable financial professionals to identify consumer, market, and competitor trends in real time, in many cases before quarterly reports or earnings are announced. Acting on alt data insights enables these investors to capitalize on trends before public markets and individual securities fluctuate in response to corporate news, greatly increasing their alpha.

Going into 2021, we can expect to see alt data usage trickling down to smaller financial actors (boutique investment houses, for example), as well as a wider variety of industries benefiting from these data sets, such as real estate, eCommerce, and brand management.
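To make the alt-data idea concrete, here is a deliberately toy sketch (not any firm's actual model, and the numbers are invented) of how an investor might turn one of the data sets above, credit card transactions, into an early demand signal for a retailer before earnings are published:

```python
# Toy alt-data signal: compare this week's credit card transaction volume
# for a retailer against its trailing average to flag a demand shift
# before quarterly earnings are announced.

def demand_signal(weekly_txn_counts, current_week):
    """Return the % change of the current week vs the trailing average.

    weekly_txn_counts: historical weekly transaction counts
    current_week: transaction count observed this week
    """
    baseline = sum(weekly_txn_counts) / len(weekly_txn_counts)
    return (current_week - baseline) / baseline * 100

# Invented example: history averages 1,000 txns/week; this week shows 1,250.
history = [950, 1000, 1050, 1000]
print(round(demand_signal(history, 1250), 1))  # 25.0 -> demand is up ~25%
```

A real pipeline would of course layer seasonality adjustments and panel-bias corrections on top, but the core idea, acting on a consumer trend before it appears in official reports, is the same.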

#2: Self-service data tools – in or out?

In 2020, spending on self-service data tools grew to an estimated 2.5x the previous year's spending on traditional data tools.

(Source: Finance Online)

Analysis + insights

I think that companies are beginning to realize how resource-heavy in-house data collection solutions truly are (think servers, manpower, DevOps, etc.). Then there is the issue of having to maintain and sustain an in-house data collection operation even when it is not a monthly or quarterly priority. Over the course of 2020, large and mid-size corporations started investing in self-service data tools as they realized the financial and operational benefits of on-demand, autonomous data collection solutions: they can turn data collection jobs on and off at will, team members are free to work on core products, and off-site infrastructure, such as cloud solutions, can be used.
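The "turn collection jobs on and off at will" point is the heart of the on-demand model. The class below is a hypothetical illustration of that idea, it is not a real Bright Data (or any vendor's) API, just a sketch of why a toggleable job consumes no resources while switched off:

```python
# Hypothetical sketch of an on-demand collection job that can be
# toggled on and off at will. When off, running it is a no-op.

class CollectionJob:
    def __init__(self, name, fetch):
        self.name = name
        self.fetch = fetch        # callable that performs the actual collection
        self.enabled = False      # jobs start switched off

    def toggle(self, on):
        self.enabled = on

    def run(self):
        # Collects only while the job is switched on; otherwise it does
        # nothing and consumes no collection resources.
        return self.fetch() if self.enabled else None

job = CollectionJob("price-monitor", fetch=lambda: ["$19.99", "$24.50"])
job.run()          # None -> job is off by default, nothing is collected
job.toggle(True)
job.run()          # ['$19.99', '$24.50'] -> collection happens on demand
```

Contrast this with an always-on in-house scraping cluster, where the servers and DevOps overhead persist even in the months when the data is not a priority.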

Going into 2021, I think we will see this trend increase as companies try to cut costs and distractions while increasing profitability and market share.

#3: What is the real cost of poor quality data to the world’s largest economy?

Poor data quality cost the US economy approximately $3.1 trillion per year as of 2020.

(Source: Tech Jury)

Analysis + insights

Speak to any CEO (especially at a technology company) and they will likely acknowledge the importance of Artificial Intelligence (AI) and its significance in developing market-leading, next-gen products. Surprisingly, fewer are talking about the quality of the data that actually serves as the basis for the AI’s training and operational performance. As the above statistic shows, using poor-quality data has a high cost, whether it comes in the form of cleaning or rendering data sets, in precious wasted data scientist hours, or simply in less valuable output. For example, if data sets used by real estate investment houses had a significant time lag or were corrupted in any way from a geolocation perspective, this would severely damage algorithmic insights and output.
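A basic defense against both problems in the real estate example, corrupted coordinates and significant time lag, is a validation pass before records ever reach a model. The field names and thresholds below are illustrative assumptions, not a standard schema:

```python
# Hedged sketch: filter out poor-quality geolocation records (invalid
# coordinates or stale timestamps) before they reach an algorithm.
from datetime import datetime, timedelta, timezone

def is_clean(record, max_age=timedelta(hours=24)):
    """Keep a record only if its coordinates are valid and it is fresh."""
    lat, lon = record.get("lat"), record.get("lon")
    ts = record.get("timestamp")
    if lat is None or lon is None or ts is None:
        return False                                   # incomplete record
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False                                   # corrupted coordinates
    return datetime.now(timezone.utc) - ts <= max_age  # reject significant time lag

now = datetime.now(timezone.utc)
records = [
    {"lat": 40.7, "lon": -74.0, "timestamp": now},                      # good
    {"lat": 140.7, "lon": -74.0, "timestamp": now},                     # bad latitude
    {"lat": 40.7, "lon": -74.0, "timestamp": now - timedelta(days=3)},  # stale
]
print([is_clean(r) for r in records])  # [True, False, False]
```

Checks like these are cheap compared to the data scientist hours (and the fraction of that $3.1 trillion) lost downstream when bad records slip through.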

In any event, despite the effect poor-quality data has had on the world’s economy in general, and the American economy in particular, I think awareness is on the rise. But as with carbon emissions and the global goal of decreasing fossil fuel usage, adoption and implementation will take time.

Going into 2021, we will see more corporations choosing to invest in solid foundations (i.e. high-quality data collection) in order to streamline algorithmic output and employee efficiency.

#4: How much data was created this year and what are the implications on data pools?

Approximately 90% of all data has been created over the course of the last two years.

(Source: Tech Jury)

Analysis + insights

As our economy in general, and consumers in particular, shift their activities online, including eCommerce and social media, the quantity of data we collect is growing exponentially. This means that the data companies collect is losing relevance at a higher rate than ever before, and the need for up-to-date and/or real-time data sets will only increase.

Going into 2021, companies may realize that consumer behavior, sentiment, and preferences have changed, and as such previous models are obsolete. Companies will work to sync their analytics and predictive software to work in tandem with freshly sourced consumer data sets.
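One simple way to picture "syncing" predictions with freshly sourced data is a rolling window: old observations age out automatically, so a model stops leaning on pre-shift consumer behavior. This is a toy sketch of the principle, not a production forecasting method:

```python
# Toy sketch of keeping a model in sync with fresh data: a rolling
# window discards old observations, so obsolete consumer behavior
# stops influencing predictions.
from collections import deque

class RollingAverageModel:
    """Predicts the mean of only the most recent `window` observations."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)  # old data falls out automatically

    def update(self, value):
        self.recent.append(value)

    def predict(self):
        return sum(self.recent) / len(self.recent)

model = RollingAverageModel(window=3)
for spend in [100, 100, 100, 400, 400, 400]:  # consumer behavior shifts mid-stream
    model.update(spend)
print(model.predict())  # 400.0 -> the obsolete pre-shift data no longer counts
```

An all-history average over the same stream would predict 250.0, half-anchored to behavior that no longer exists; the rolling version tracks the new reality.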

#5: How often did people tweet over the course of 2020, what does that say about user interest, and the data sets collected?

Twitter users sent 528,780 tweets every minute in 2020, i.e. over half a million tweets every minute of the year.

(Source: Tech Jury)

Analysis + insights

What this says is that people are spending more and more of their time on social media. This increased activity has created, and is creating, huge amounts of ‘alternative data’ which, when collected with specific targets in mind, can be immensely profitable. We are talking about which news stories are trending (i.e. where public interest currently is), as well as how many people are showing interest in a company’s posts and products.
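To get a feel for the scale behind that per-minute figure, extrapolating it out (a back-of-the-envelope calculation, assuming a constant rate) gives:

```python
# Back-of-the-envelope extrapolation of the per-minute tweet stat.
TWEETS_PER_MINUTE = 528_780

per_day = TWEETS_PER_MINUTE * 60 * 24   # minutes in a day
per_year = per_day * 365                # days in a year

print(f"{per_day:,} tweets/day")    # 761,443,200 tweets/day
print(f"{per_year:,} tweets/year")  # 277,926,768,000 tweets/year
```

Roughly three-quarters of a billion tweets a day, or on the order of 278 billion a year: an enormous, constantly refreshing alternative data pool.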

Going into 2021, I think we will see a wider variety of industries leveraging social media data in particular, and alt data in general, as a means of predicting market activity before official corporate and government reports are made public. Quarterly and annual earnings may become superfluous as institutional investors use predictive models driven by alt data to preempt the markets.

Summing it up

2020 was a big year for data – the sheer quantity being produced grew, and is growing, exponentially. Corporations are identifying the benefits of using ‘alt data’, gaining increased awareness of the value of feeding ‘clean data sets’ into their algorithms, and investment in self-service, automated data collection is taking center stage. We can’t predict what is to come in 2021, but what we can do is analyze what happened over the course of 2020 and make educated, data-driven decisions.

Tamir Roter | VP EMEA & APAC

Tamir is a software business executive with a track record of successfully driving international corporate growth and profitability with large enterprises as well as start-ups. Over the course of 20+ years, he has been building and managing sales, marketing and regional teams. Tamir currently works as a VP at Bright Data, the leading global web data collection network.
