Ethical Compliance In Data Science

This article walks you through the business imperatives of using pristine datasets in your company’s operations, what constitutes an ethical data collection network, and how real-time compliance and code-based response mechanisms are leading the way.

Why is ethical data collection a business imperative?

Or Lenchner once wrote, “Your Data Won’t Serve You For Long If It Was Collected Unethically.”

“Why is that?” you may be wondering.

The simple answer is that many businesses build their revenue models on data. For example, an investment tool that provides customers with real-time market insights needs to constantly mine relevant datasets. This information is imperative to the company’s ability to survive and thrive. If, for example, it uses a system that does not comply with the EU’s General Data Protection Regulation (GDPR), and so does not respect EU user privacy rules, the company can be fined tens of thousands of euros or more by a regulator such as Ireland’s Data Protection Commission (DPC). Beyond the financial losses from these fines, its investors may get spooked once its practices are called into question. It will most likely also need to halt operations until it can prove which datasets are compliant and which aren’t. Finally, it will need to find a new way of sourcing legally compliant datasets that follow ethical guidelines protecting people’s fundamental right to personal privacy.

Bright Data’s practical commitment to helping you source pristine datasets 

Bright Data is one of the industry’s main pioneers of ethically sourced data. But as a company, our culture is one of action, not merely of talk, which is why we ‘put our money where our mouth is’, so to speak. Here are the practical ways that we help our customers source ‘clean’ datasets:

  1. The ‘Bright Data Security Reward Program’ – Asks the public to keep an eye out for security and privacy shortcomings and to alert our team when they find one. These vulnerabilities include (among others):
    • Cross-Site Request Forgery (CSRF/XSRF)
    • Authentication or authorization flaws
    • Access to internal company web pages via an installed SDK

  2. Our peers have the right to opt in to, and out of, our networks at any point in time. This is paramount to any ethical data collection, as it puts user consent at the forefront.

  3. Bright Data works with leading independent firms to carry out third-party audits. A good example is Herzog Strategic, which carried out a full review of our network policies and activities. These external checks and balances ensure that our data collection networks meet regulatory and legal standards.

A Know Your Customer (KYC)-first approach to compliance

Bright Data’s KYC-first approach is led by its dedicated compliance officer and team. Here are some of the key ways in which this is carried out in our day-to-day operations:

  • Real-time compliance – Log checks are performed on an ongoing basis to ensure that traffic is aligned with the customer’s declared use case.
  • IP user validation – Each corporation needs to submit an ‘approved list of users’, which we vet using third parties. This helps us guarantee that everyone performing open source web-based data collection is indeed an employee of the company in question.
  • Code-based response mechanisms – Our developers work in real time to block network abuse attempts. This is spearheaded by our Build-and-Test (BAT) system, which enables us to release an average of 60 upgrades to our systems every day, an unprecedented number in the industry.
  • Due diligence – We have an in-house compliance department that serves as an entirely separate entity and does not operate under the CEO. This separation divides business and economic interests from our commitment to data collection ethics, enabling our compliance team to properly carry out due diligence on each new customer and to thwart unethical network activity.
  • Access denied – We are not afraid of turning customers away. In fact, over the course of 2020 our compliance team turned away almost 1,500 companies that failed our stringent KYC/onboarding process. That number is on track to double through 2021.
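
To make the first two items more concrete, here is a minimal sketch of how a log check against a customer’s declared use case and approved user list might look. It is an illustration only, not Bright Data’s actual implementation: the names `CustomerProfile`, `LogEntry`, `check_entry`, and `scan_logs`, the violation labels, and the idea of expressing a declared use case as a set of target domains are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CustomerProfile:
    """Hypothetical record of what a customer declared during KYC onboarding."""
    customer_id: str
    approved_users: frozenset[str]    # the vetted 'approved list of users'
    declared_domains: frozenset[str]  # targets covered by the declared use case


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical traffic log line: which user requested which target domain."""
    customer_id: str
    user: str
    target_domain: str


def check_entry(entry: LogEntry, profile: CustomerProfile) -> list[str]:
    """Return the violations found in one log entry (empty list if it is clean)."""
    violations = []
    if entry.user not in profile.approved_users:
        violations.append("user_not_approved")
    if entry.target_domain not in profile.declared_domains:
        violations.append("outside_declared_use_case")
    return violations


def scan_logs(
    entries: Iterable[LogEntry],
    profiles: dict[str, CustomerProfile],
) -> Iterator[tuple[LogEntry, list[str]]]:
    """Yield (entry, violations) for every entry that fails at least one check."""
    for entry in entries:
        profile = profiles.get(entry.customer_id)
        if profile is None:
            # Traffic from a customer with no onboarding record is always flagged.
            yield entry, ["unknown_customer"]
            continue
        violations = check_entry(entry, profile)
        if violations:
            yield entry, violations
```

In a setup like this, anything yielded by `scan_logs` would be handed to a compliance team for review or used to trigger an automated block. A real system would also need to handle streaming logs, wildcard domains, and changes to the approved list over time.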

The bottom line 

Data collection is part art and part science. But it also has many legal and ethical aspects that some companies are unaware of. These concerns can have major financial, operational, and customer-facing implications, and should be addressed meticulously.
