Ethical Compliance In Data Science

This article walks you through the business imperatives of using pristine datasets in your company’s operations, what constitutes an ethical data collection network, and how real-time compliance and code-based response mechanisms are leading the way.
Gal Shechter | Compliance Manager


Why is ethical data collection a business imperative?

Or Lenchner once wrote, “Your Data Won’t Serve You For Long If It Was Collected Unethically.”

‘Why is that?’ you may be wondering.

The simple answer is that many businesses build their revenue models on data. For example, an investor’s tool that provides customers with real-time market insights needs to constantly mine relevant datasets. This information is essential to the company’s ability to survive and thrive. If the company uses a system that does not comply with the General Data Protection Regulation (GDPR), and so does not respect EU user privacy rules, it can be fined tens of thousands of euros by a regulator such as Ireland’s Data Protection Commission (DPC). The fines are only the first cost. Investors may get spooked when the company’s practices are called into question. The company will most likely need to halt operations until it can prove which datasets are compliant and which are not. And finally, it will need to find a new way of sourcing legally compliant datasets that follow the ethical guidelines protecting people’s fundamental right to personal privacy.

Bright Data’s practical commitment to helping you source pristine datasets 

Bright Data is one of the industry’s main pioneers of ethically sourced data. But as a company, our culture is one of action, not merely of talk, which is why we ‘put our money where our mouth is’, so to speak. Here are the practical ways that we help our customers source ‘clean’ datasets:

1. The ‘Bright Data Security Reward Program’ asks the public to keep an eye out for security and privacy shortcomings and to alert our team when they find them. These vulnerabilities include (among others):
    • Cross-Site Request Forgery (CSRF/XSRF)
    • Authentication or Authorization flaws
    • Access of internal company web pages via installed SDK

2. Our peers have the right to opt in to, and out of, our networks at any time. This is paramount to any ethical data collection, as it puts user consent at the forefront.

3. Bright Data works with leading independent firms to carry out third-party audits. A good example of this is ‘Herzog Strategic’, which carried out a full review of our network policies and activities. These external checks and balances ensure that our data collection networks are up to regulatory and legal standards.

A Know Your Customer (KYC)-first approach to compliance

Bright Data’s KYC-first approach is led by its dedicated compliance officer and team. Here are some of the key ways in which this is carried out in our day-to-day operations:

  • Real-time compliance – Log checks are performed on an ongoing basis to ensure that traffic is aligned with the customer’s declared use case.
  • IP user validation – Each corporation needs to submit an ‘approved list of users’, which we vet using third parties. This helps us guarantee that all people performing open-source, web-based data collection are indeed employees of the company in question.
  • Code-based response mechanisms – Our developers work in real time to block network abuse attempts. This is spearheaded by our Build-and-Test (BAT) system, which enables us to release an average of 60 upgrades to our systems every day, an unprecedented number in the industry.
  • Due diligence – Our in-house compliance department serves as an entirely separate entity that does not operate under the CEO, so that unethical network activities can be thwarted without interference. This separation divides business and economic interests from our commitment to data collection ethics, enabling our compliance team to properly carry out due diligence on each new customer.
  • Access denied – We are not afraid of turning customers away. In fact, over the course of 2020, almost 1,500 companies failed our stringent KYC/onboarding process. That number is on track to double over the course of 2021.
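To make the first two checks above more concrete, here is a minimal sketch of how a log check against a declared use case and an approved user list might look. It is an illustration only, not a description of Bright Data’s actual systems: the `CustomerProfile` and `LogEntry` records, their fields, and the violation labels are all hypothetical, and the declared use case is simplified to a set of allowed domains.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple
from urllib.parse import urlsplit


@dataclass
class CustomerProfile:
    """Hypothetical KYC record: what the customer declared at onboarding."""
    customer_id: str
    declared_domains: Set[str]  # assumption: the use case is a set of domains
    approved_users: Set[str]    # assumption: vetted users, identified by name


@dataclass
class LogEntry:
    """Hypothetical traffic log line."""
    customer_id: str
    user: str
    url: str


def check_entry(entry: LogEntry, profile: CustomerProfile) -> List[str]:
    """Return the policy violations for one log entry (empty list if none)."""
    violations = []
    if entry.user not in profile.approved_users:
        violations.append("unapproved_user")
    host = (urlsplit(entry.url).hostname or "").lower()
    # Accept a declared domain itself or any of its subdomains.
    in_scope = any(
        host == domain or host.endswith("." + domain)
        for domain in profile.declared_domains
    )
    if not in_scope:
        violations.append("outside_declared_use_case")
    return violations


def flag_traffic(
    entries: List[LogEntry], profile: CustomerProfile
) -> List[Tuple[LogEntry, List[str]]]:
    """Return (entry, violations) pairs for entries that need human review."""
    flagged = []
    for entry in entries:
        violations = check_entry(entry, profile)
        if violations:
            flagged.append((entry, violations))
    return flagged


if __name__ == "__main__":
    profile = CustomerProfile("acme", {"example.com"}, {"alice"})
    entries = [
        LogEntry("acme", "alice", "https://shop.example.com/item/1"),
        LogEntry("acme", "bob", "https://example.com/"),
        LogEntry("acme", "alice", "https://other.org/page"),
    ]
    for entry, violations in flag_traffic(entries, profile):
        print(entry.user, entry.url, violations)
```

In this sketch, flagged entries are only collected for review. Whether to block, throttle, or escalate them is a policy decision that sits outside the code.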

The bottom line 

Data collection is part art and part science. But it also has many legal and ethical aspects that some companies are unaware of. These concerns can have major financial, operational, and customer-facing implications, and should be addressed meticulously.

Gal Shechter | Compliance Manager

Gal is the Compliance Team Leader at Bright Data, responsible for ensuring that the company complies with its extra-regulatory requirements and internal policies, translating regulatory requirements for internal business partners, driving alignment on compliance requirements, and assessing industry best practices for security compliance. Gal is also responsible for designing and delivering automated processes to support the company's compliance requirements.
