Ethical Compliance In Data Science

This article walks you through the business case for using pristine datasets in your company’s operations, what constitutes an ethical data collection network, and how real-time compliance and code-based response mechanisms are leading the way.
Gal Shechter | Compliance Manager

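In this article, we will discuss:

  • Why ethical data collection is a business imperative
  • Bright Data’s practical commitment to helping you source pristine datasets
  • A Know Your Customer (KYC)-first approach to compliance
  • The bottom line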

Why ethical data collection is a business imperative

Bright Data CEO Or Lenchner once wrote, “Your Data Won’t Serve You For Long If It Was Collected Unethically.”

‘Why is that?’ you may be wondering.

The simple answer is that many businesses build their revenue models on data. For example, a tool for investors that provides customers with real-time market insights needs to mine relevant datasets constantly; this information is essential to its ability to survive and thrive. If such a company relies on a system that does not comply with the EU’s General Data Protection Regulation (GDPR), it can be fined by a data protection authority such as Ireland’s Data Protection Commission (DPC), and GDPR fines can reach €20 million or 4% of worldwide annual turnover, whichever is higher. Beyond the fines themselves, its investors may be spooked once its practices are called into question. The company will most likely also need to halt operations until it can prove which datasets are compliant and which are not. Finally, it will need to find a new way of sourcing legally compliant datasets that respect people’s fundamental right to personal privacy.

Bright Data’s practical commitment to helping you source pristine datasets 

Bright Data is one of the industry’s leading pioneers of ethically sourced data. But as a company, our culture is one of action, not just talk, which is why we ‘put our money where our mouth is’, so to speak. Here are the practical ways we help our customers source ‘clean’ datasets:

  1. The ‘Bright Data Security Reward Program’ – Invites the public to watch for security and privacy shortcomings and to alert our team when they find one. Reportable vulnerabilities include, among others:
    • Cross-Site Request Forgery (CSRF/XSRF)
    • Authentication or authorization flaws
    • Access to internal company web pages via an installed SDK

  2. Our peers have the right to opt in to, and out of, our networks at any time. This puts user consent at the forefront, which is paramount to any ethical data collection.

  3. Bright Data works with leading independent firms to carry out third-party audits. For example, Herzog Strategic carried out a full review of our network policies and activities. These external checks and balances ensure that our data collection networks meet regulatory and legal standards.
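As an illustration only, the opt-in and opt-out right described above can be modeled as a consent registry that is consulted before any traffic is routed through a peer. This is a minimal Python sketch under our own assumptions: `ConsentRegistry`, `opt_in`, `opt_out`, `has_consent` and `eligible_peers` are hypothetical names, not part of any Bright Data SDK or API, and recording a timestamp with each decision is an assumed aid to auditing.

```python
from __future__ import annotations

from datetime import datetime, timezone


class ConsentRegistry:
    """Hypothetical record of each peer's latest consent decision (sketch only)."""

    def __init__(self) -> None:
        # peer_id -> (consented?, UTC time of the most recent decision)
        self._state: dict[str, tuple[bool, datetime]] = {}

    def opt_in(self, peer_id: str) -> None:
        self._state[peer_id] = (True, datetime.now(timezone.utc))

    def opt_out(self, peer_id: str) -> None:
        # Withdrawal overwrites any earlier opt-in and takes effect at once.
        self._state[peer_id] = (False, datetime.now(timezone.utc))

    def has_consent(self, peer_id: str) -> bool:
        # No recorded decision means no consent: peers must opt in explicitly.
        entry = self._state.get(peer_id)
        return entry is not None and entry[0]

    def eligible_peers(self, candidates: list[str]) -> list[str]:
        """Keep only candidates that currently consent, preserving order."""
        return [p for p in candidates if self.has_consent(p)]


if __name__ == "__main__":
    registry = ConsentRegistry()
    registry.opt_in("peer-a")
    registry.opt_in("peer-b")
    registry.opt_out("peer-b")  # peer-b withdraws consent
    print(registry.eligible_peers(["peer-a", "peer-b", "peer-c"]))  # ['peer-a']
```

Treating peers with no recorded decision as non-consenting reflects the opt-in principle: a peer is only used after an explicit opt-in, and an opt-out immediately removes it from the eligible pool.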

A Know Your Customer (KYC)-first approach to compliance

Bright Data’s KYC-first approach is led by its dedicated compliance officer and team. Here are some of the key ways it is carried out in our day-to-day operations:

  • Real-time compliance – Log checks are performed on an ongoing basis to ensure that traffic is aligned with the customer’s declared use case.
  • IP user validation – Each corporate customer must submit an ‘approved list of users’, which we vet through third parties. This helps us ensure that everyone performing open-source web data collection is indeed an employee of the company in question.
  • Code-based response mechanisms – Our developers work in real time to block network abuse attempts. This effort is spearheaded by our Build-and-Test (BAT) system, which enables us to release an average of 60 upgrades to our systems every day, an unprecedented number in the industry.
  • Due diligence – Our in-house compliance department is an entirely separate entity that does not report to the CEO, so that unethical network activity can be thwarted regardless of commercial pressure. This separation divides business and economic interests from our commitment to data collection ethics, enabling the compliance team to carry out proper due diligence on every new customer.
  • Access denied – We are not afraid to turn customers away. Over the course of 2020, almost 1,500 companies failed our stringent KYC/onboarding process, and that number is on track to double in 2021.
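To make the real-time compliance and IP user validation checks more concrete, here is a minimal Python sketch of how a single log entry might be compared against a customer’s declared use case and approved user list. It assumes, purely for illustration, that a declared use case can be expressed as a set of permitted target domains; `CustomerProfile`, `LogEntry` and `check_entry` are hypothetical names, not Bright Data’s actual system, and a production check would weigh many more signals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CustomerProfile:
    """Hypothetical KYC record: declared use case plus vetted users."""
    customer_id: str
    declared_domains: set[str]  # assumed: targets permitted by the declared use case
    approved_users: set[str] = field(default_factory=set)


@dataclass
class LogEntry:
    """Hypothetical traffic log line."""
    customer_id: str
    user_id: str
    url: str


def _matches_declared(host: str, declared: set[str]) -> bool:
    # Accept a declared domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in declared)


def check_entry(entry: LogEntry, profile: CustomerProfile) -> list[str]:
    """Return the violations found in one log entry (empty list = compliant)."""
    if entry.customer_id != profile.customer_id:
        raise ValueError("log entry does not belong to this customer")
    violations: list[str] = []
    if entry.user_id not in profile.approved_users:
        violations.append("user not on approved list")
    host = (urlparse(entry.url).hostname or "").lower()
    if not _matches_declared(host, profile.declared_domains):
        violations.append(f"target {host!r} outside declared use case")
    return violations


if __name__ == "__main__":
    profile = CustomerProfile(
        customer_id="acme",
        declared_domains={"example.com"},
        approved_users={"alice"},
    )
    log = [
        LogEntry("acme", "alice", "https://shop.example.com/prices"),
        LogEntry("acme", "mallory", "https://bank.example.org/login"),
    ]
    for e in log:
        # Flagged entries would feed a blocking or manual-review step.
        print(e.user_id, check_entry(e, profile) or "ok")
```

Matching subdomains with a leading dot (`"." + d`) is a deliberate choice: it accepts `shop.example.com` but rejects look-alike hosts such as `notexample.com`.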

The bottom line 

Data collection is part art and part science. But it also has many legal and ethical aspects that some companies are unaware of. These concerns can have major financial, operational, and customer-facing implications, and should be addressed meticulously.


Gal is the Compliance Team Leader at Bright Data, responsible for ensuring that the company complies with its extra-regulatory requirements and internal policies, translating regulatory requirements for internal business partners, driving alignment on compliance requirements, and assessing industry best practices for security compliance. Gal also designs and delivers automated processes to support the company’s compliance requirements.

