Buying proxies for web scraping. Pro tips to save on costs.

Learn the difference between the cost of proxies and the cost of data acquisition, how to optimize proxy integration and maintenance costs, and how to build a solution that will stay relevant for years to come.
Three things to consider before choosing your proxy provider: a complete checklist

In this article, we will cover:

  • The cost of proxies vs. the cost of data acquisition
  • How to reduce the related costs of data acquisition
  • How to build a solution that stays relevant for the future

#1: Understand the cost of proxies vs. the cost of data acquisition

When calculating your future expenses, you need to look not so much at the cost per IP address or per GB of traffic as at the cost of the data you eventually receive. The final cost of data acquisition is affected by:

  • The pricing model and the network’s success rate
  • How costs are applied

If you are a freelancer or an independent researcher, the cost of the proxy itself may be the deciding factor. But if your project requires large-scale data collection, these small nuances can greatly increase your proxy infrastructure costs.

Pricing model

If the pricing model is per IP, check that the provider has an effective fallback mechanism. This means the provider guarantees 100% uptime for your IP: if there is a connectivity issue, the provider automatically reroutes your requests through other IPs with exactly the same properties, free of charge and without any changes to your code.
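
Good providers handle this rerouting entirely on their side. For illustration only, the same idea can be sketched client-side; this is a generic retry helper, not any provider's actual mechanism:

```python
import random
import time

def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0):
    """Call `fetch(url)`, retrying with exponential backoff on connection
    errors -- a client-side analogue of a provider's automatic fallback."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Exponential backoff with jitter; scales to zero if base_delay is 0.
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

The advantage of a provider-side fallback is precisely that none of this logic has to live in your codebase.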

If the pricing model is per GB, it would be wise to first check the provider’s success rate, which you can find on independent review sites such as Proxyways. Then use the following formula to calculate the effective price of your data acquisition: (1 GB / success rate) * price per GB = effective price of data acquisition. The lower the success rate, the higher the cost of the data to you.
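
As a quick sketch, the formula translates directly into code (the prices and success rates below are made-up illustrative numbers, not any provider's actual figures):

```python
def effective_price_per_gb(price_per_gb: float, success_rate: float) -> float:
    """Effective cost of 1 GB of useful data: (1 GB / success_rate) * price."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_gb / success_rate

# Illustrative comparison: a cheaper network with a low success rate can cost
# more per usable GB than a pricier, more reliable one.
print(effective_price_per_gb(10.0, 0.60))  # $10/GB at 60% success -> ~$16.67
print(effective_price_per_gb(12.0, 0.95))  # $12/GB at 95% success -> ~$12.63
```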

How costs are applied

Some networks will charge you for all traffic routed through their peers, while others count only successful requests toward their traffic calculations.

The ideal business proxy provides a reliable fallback mechanism and, at the same time, counts only successfully completed requests toward your traffic, that is, requests that actually retrieved the data you asked for.

#2: Reduce related costs of data acquisition 

This includes the cost of:

  • Downtime
  • Cleaning and preparing data
  • Implementation and maintenance 

Cost of downtime

If your business is affected by seasonal peaks, make sure your provider has 100% network uptime. You don’t want to have your data collection funnel disrupted in the middle of a hot sales season. 

Cost of cleaning and preparation of data 

Data scraping is only the initial stage. After collection comes the process of cleaning and structuring the data to make it suitable for further analysis; many companies spend up to 80% of their time on this stage.

The amount of bad data (i.e., broken, invalid, and inconsistent data points) can be significantly reduced if you choose the right proxies for your business.

Here are three things to look for in a potential provider:

  • Their networks are made up of devices that belong to real users or residential Internet Service Providers (ISPs). Target sites have a much higher level of trust when such proxies attempt to collect data from them which also contributes to above-average success rates. (Networks in this category include: Residential Proxies, Mobile Proxies, ISP Proxies).
  • Proxies that can automatically select digital fingerprints and emulate headers (Web Unlocker is a good example of a tool that helps accomplish this).
  • Proxies that can identify inconsistencies in page responses that indicate a potentially hidden block. For example, when using Bright Data, such a response is not considered successful, and the system automatically skips it (saving the user time, money, and resources).
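
To make the third point concrete, here is a purely hypothetical heuristic (not Bright Data's actual detection logic): a "hidden block" often returns HTTP 200 with a block page instead of the real content.

```python
# Hypothetical marker list -- real detection is far more sophisticated.
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")

def looks_blocked(status: int, body: str) -> bool:
    """Guess whether a response is actually a disguised block page."""
    if status in (403, 429):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

print(looks_blocked(200, "<html>Please complete the CAPTCHA</html>"))  # True
print(looks_blocked(200, "<html><h1>Product catalog</h1></html>"))     # False
```

A provider that performs checks like this for you means you never pay for, store, or parse a block page as if it were data.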

In addition, a site sometimes contains information you simply do not need. Choose proxies that let you split traffic for bandwidth and cost optimization. For example, if you don’t need media files, you can choose to skip those data points, saving up to 90% of your bandwidth (and budget).
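
The same idea can also be applied on the client side. This is a hypothetical illustration (the extension list and any savings depend entirely on your target sites):

```python
from urllib.parse import urlparse

# Illustrative set of bandwidth-heavy media extensions to skip.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".mp4", ".webm"}

def should_fetch(url: str) -> bool:
    """Return False for URLs that point at media files we don't need."""
    path = urlparse(url).path.lower()
    return not any(path.endswith(ext) for ext in MEDIA_EXTENSIONS)

urls = [
    "https://example.com/products/page-1.html",
    "https://example.com/static/hero-banner.jpg",
    "https://example.com/api/prices.json",
]
print([u for u in urls if should_fetch(u)])  # drops the .jpg URL
```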

Cost of implementation and maintenance

Regardless of the size of your team, you want your developers to spend as little time as possible on proxy support and as much time as possible on your main product. Therefore, it is important to choose a proxy that is created with developers in mind. 

Look for a potential proxy provider that offers:

  • An easy integration procedure.
  • Ready-made integrations with popular third-party automation tools (such as Selenium, Puppeteer, and the like).
  • Tools that facilitate development and automate routine operations.
  • 24/7 technical support provided by qualified specialists who speak the same language as your team.
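
As a minimal, stdlib-only sketch of what "easy integration" can look like in Python (the host, port, and credentials below are placeholders, not a real gateway):

```python
import urllib.request

def proxy_url(host: str, port: int, user: str, password: str) -> str:
    """Build an authenticated proxy URL from its parts."""
    return f"http://{user}:{password}@{host}:{port}"

def make_proxied_opener(host: str, port: int, user: str, password: str):
    """Return a urllib opener that routes HTTP(S) traffic through the gateway."""
    url = proxy_url(host, port, user, password)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)

# Placeholder gateway details -- substitute your provider's actual values.
opener = make_proxied_opener("proxy.example.com", 22225, "user", "pass")
# body = opener.open("https://example.com", timeout=30).read()  # network call
```

If switching providers means changing only a host, port, and credentials like this, your developers stay focused on the product rather than the plumbing.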

#3: Stay relevant for the future

If you want to build a solution that will serve you for many years, then you should pay attention to:

  • The size and diversity of the proxy provider’s network
  • How the provider approaches data regulation gray areas 

The size and diversity of your proxy provider’s network

It is important to choose a proxy provider with a large international network of different IP types across a variety of geolocations. One project may require datacenter IPs, another only mobile proxies. Besides, the larger the network, the lower the probability that you will run out of ‘fresh’ IPs for new data collection tasks.

Whether you plan to enter new markets or want to understand how your competitors perform in different geographies, make sure the proxy network you choose has plenty of peers in every country of interest. This will enable you to lift any geo-restrictions on the information you need.
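
Geo-targeting syntax varies by provider, but many encode it in the proxy credentials. The helper below is purely hypothetical and only illustrates the idea; consult your provider's documentation for the real format:

```python
# Hypothetical convention only -- each provider defines its own syntax for
# geo-targeting (often via the proxy username); check your provider's docs.
def geo_targeted_username(base_user: str, country_code: str) -> str:
    """Append an illustrative country-targeting suffix to a proxy username."""
    return f"{base_user}-country-{country_code.lower()}"

print(geo_targeted_username("my_user", "DE"))  # my_user-country-de
```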

How the provider approaches gray areas of data regulations 

Web data is a new and booming industry, and legislators cannot keep up with its development. At the moment, two very important laws have been adopted and are in force: the GDPR and the CCPA, which protect individual users’ data rights. Other issues related to the ethical principles of data collection, describing not only which data can and cannot be collected but also how it should be done, are already being discussed everywhere. Therefore, if you want to prepare in advance, pay attention to the following:

  • How does a proxy provider acquire its residential peers? The right answer: clear and explicit consent. If the consent is hidden in the Terms of Use, it is a sign that the person might not even be aware of what their device is being used for.
  • Are the members of the residential proxy network fairly compensated? Do they have control over how and when their device’s resources are used? The right answer is that the proxy provider uses the device’s resources only when it is charging, idle, and connected to Wi-Fi. In some cases, for example with EarnApp, the device owner can decide which websites and what kind of data they want to allow access to through their IP address.
  • Does your proxy provider take active measures to prevent harm to web ecosystems? Companies invest a lot of money and effort in creating seamless user experiences on their websites. Web scraping, when out of control, can create extra load on a target website. Ask your provider whether they have a mechanism to monitor peak loads in order to protect the user experience on the websites being targeted.

The bottom line

The more transparent, the better. Check what the provider offers beyond standard privacy policies. It could be an ethical code of conduct or a detailed explanation of how the network is built. This can make the solution you are building future-proof.

