How To Scrape A Website Without Getting Blocked Or Misled (Cloaked)?

Many websites block legitimate scraping activity. Here is how a business can avoid being blocked.
Oren Knaan | Product Manager
12-Dec-2018

Why should I care?

When a target website detects crawlers from a proxy (data-center) IP, it typically:

  • Blocks the IP
  • Presents the IP with purposely misleading information
  • Throttles the response rate

How does the target website identify my crawling activity?

Target websites log the IPs of whoever visits them and analyze the activity of these IPs. Assuming you are using a traditional data center proxy, the target website can:

  • Identify that the activity from a single IP (the rate of requests) is much greater than what a human can accomplish in a given time frame
  • Identify that the IP address originated from a proxy server list, which these target websites have access to
  • Identify that the IPs you are using share the same subnet block range
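
To make the first and third checks above more concrete, here is a minimal Python sketch of how a site could flag them from its access log. The function names, the thresholds, and the sample addresses (taken from reserved documentation ranges) are assumptions for illustration only. Real websites may use very different logic.

```python
from collections import Counter, defaultdict
import ipaddress


def busiest_ips(log, window_seconds=60, max_requests=30):
    """Return the IPs that send more than max_requests in any fixed time window.

    log is an iterable of (timestamp_in_seconds, ip_string) pairs.
    The threshold values are illustrative assumptions.
    """
    counts = Counter()
    for timestamp, ip in log:
        window = int(timestamp // window_seconds)
        counts[(ip, window)] += 1
    return {ip for (ip, _), n in counts.items() if n > max_requests}


def shared_subnets(ips, prefix=24, min_ips=3):
    """Group IPv4 addresses by subnet and keep subnets seen with min_ips or more addresses."""
    groups = defaultdict(set)
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(network)].add(ip)
    return {net: sorted(members) for net, members in groups.items() if len(members) >= min_ips}
```

A crawler that sends two requests per second from one address would be caught by the first function, and a pool of data center IPs rented from one provider would likely show up in the second.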

How can I prevent being detected?

  • To prevent being detected by the number of requests per IP, you can reduce the number of requests per second. However, this will reduce your crawling speed. Learn about super fast crawling capacities here.
  • To prevent the target website from identifying your IP as coming from a proxy server, you must rotate your requests through residential IPs. You should be able to circulate through enough IPs that the target website cannot detect your activity.
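  • To prevent being detected by subnet, use residential IPs: they do not share a single subnet block range, so the target website cannot group your requests that way.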
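
The first two points can be combined in a few lines of Python. The sketch below is only an outline under stated assumptions: `crawl` is a hypothetical helper, the proxy addresses are placeholders, and `fetch` is a callable you supply yourself (for example, a thin wrapper around your HTTP client that sends the request through the given proxy).

```python
import itertools
import time


def crawl(urls, proxies, fetch, requests_per_second=1.0, sleep=time.sleep):
    """Fetch each URL through the next proxy in rotation, pausing between requests.

    fetch(url, proxy) is supplied by the caller (an assumption of this sketch).
    Returns a list of (url, proxy, result) tuples.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    delay = 1.0 / requests_per_second
    rotation = itertools.cycle(proxies)  # round-robin over the proxy pool
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # throttle the overall request rate
        proxy = next(rotation)
        results.append((url, proxy, fetch(url, proxy)))
    return results


# Example with a stand-in fetch function; replace it with a real HTTP call.
if __name__ == "__main__":
    pool = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]  # placeholders
    pages = crawl(
        ["https://example.com/1", "https://example.com/2"],
        pool,
        fetch=lambda url, proxy: f"fetched {url} via {proxy}",
        requests_per_second=2.0,
    )
    for url, proxy, result in pages:
        print(result)
```

With round-robin rotation, each IP sees roughly the overall request rate divided by the number of proxies, so a larger pool lets you keep the same crawling speed while lowering the per-IP rate.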

You can learn more from this guide on how to prevent getting blacklisted or blocked when crawling a website.
With a traditional proxy solution, it is only a matter of time before the target website identifies your crawling activity and either blocks you or provides you with the wrong information.
