Tutorial
How to Increase Request Speed
6 Min
intermediate
May 3, 2024
Unlock the full potential of web scraping and data gathering with this comprehensive guide to using a proxy with Python Requests, featuring Bright Data. Whether you're a beginner looking to understand the basics or an experienced developer aiming to optimize your projects, this tutorial covers all you need to know about setting up and managing proxies effectively.
In this video you'll learn
  • Why using a proxy is essential for web scraping
  • How to set up Bright Data proxies with Python Requests
  • Best practices for managing proxy rotation and avoiding IP blocks
  • Tips and tricks to enhance your data collection strategies

Learn how to increase request speed with simple proxy manipulations

Agenda

  • Simple steps to solve common speed issues
  • Using the fastest IPs and super proxy for your target domains
  • Decreasing response bandwidth
  • Optimizing proxy configurations

Don’t want to watch the webinar? Read it instead.

The time spent sending and receiving requests is important to the success of your data collection operations, so we will begin by showing you how to speed up your request time.
I’ll begin with the Proxy Manager, which lets me amend the request before it reaches the Super Proxy.

The Proxy Manager acts as the middleman between the crawler and the Super Proxies, helping to control and shape the traffic to the Super Proxies and to the proxy exit node (also called the peer).
It also controls what should occur after getting the response.

The Proxy Manager is open-source software that is installed locally on your computer.
It can be downloaded from our website at brightdata.com/products/proxy-manager, and is also available at github.com/luminati-io/luminati-proxy, npmjs.com/package/@luminati-io/luminati-proxy, and hub.docker.com/r/luminati/luminati-proxy/

We will start by looking at the request logs, which are available in the Proxy Manager dashboard and under the HAR Viewer tab of each proxy port.
Click on a request to reveal the request and response details, along with its timing.
The timing shows how long the request took to be sent to and received by the target site, along with how long the response took to arrive back.
When a request takes longer than desired, we can change this in the ‘Request speed’ tab of the proxy port.
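
The same timings can also be reviewed outside the dashboard. The sketch below assumes the request log is available as a standard HAR 1.2 document (whether and how the Proxy Manager exports one is an assumption here). The function name `slowest_requests` and the sample entries are illustrative only.

```python
def slowest_requests(har, limit=5):
    """Return (url, total_ms, phases) tuples for the slowest entries in a HAR log."""
    rows = []
    for entry in har["log"]["entries"]:
        timings = entry.get("timings", {})
        # HAR uses -1 for phases that do not apply, so keep only non-negative numbers.
        phases = {name: value for name, value in timings.items()
                  if isinstance(value, (int, float)) and value >= 0}
        total_ms = entry.get("time", sum(phases.values()))
        rows.append((entry["request"]["url"], total_ms, phases))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical sample data in HAR 1.2 shape.
sample = {"log": {"entries": [
    {"request": {"url": "https://example.com/a"}, "time": 820,
     "timings": {"send": 20, "wait": 700, "receive": 100, "ssl": -1}},
    {"request": {"url": "https://example.com/b"}, "time": 310,
     "timings": {"send": 10, "wait": 250, "receive": 50}},
]}}

for url, total_ms, phases in slowest_requests(sample):
    print(f"{total_ms:>6} ms  {url}  {phases}")
```

Sorting by total time makes it easy to see whether the send, wait, or receive phase dominates the slow requests.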

I can choose the super proxy nearest to my location in order to have a shorter round trip.
In the Proxy Manager, open the proxy port and, under the Request speed tab, select the desired country in the Super proxy’s location drop-down list.
Note that the exit-node IP geolocation is not affected by the location of the super proxy itself.

A super proxy in a specific geolocation, for lower latency, can also be selected with the hostname servercountry-COUNTRY_CODE.zproxy.lum-superproxy.io. Currently, the supported country codes for obtaining a super proxy are AU, CN, GB, IN, NL, US.
For example, to obtain a super proxy in Australia, use: servercountry-au.zproxy.lum-superproxy.io
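
As a minimal sketch, the hostname pattern above can be built and validated in Python. The helper name `super_proxy_host` is hypothetical. The country list is the one given above and may change over time.

```python
# Country codes listed in this tutorial as supported for super proxy selection.
SUPPORTED_SUPER_PROXY_COUNTRIES = {"au", "cn", "gb", "in", "nl", "us"}


def super_proxy_host(country_code):
    """Build the servercountry-<code> super proxy hostname for a supported country."""
    code = country_code.strip().lower()
    if code not in SUPPORTED_SUPER_PROXY_COUNTRIES:
        raise ValueError(f"unsupported super proxy country: {country_code!r}")
    return f"servercountry-{code}.zproxy.lum-superproxy.io"


print(super_proxy_host("AU"))  # servercountry-au.zproxy.lum-superproxy.io
```

Rejecting unsupported codes early avoids sending traffic to a hostname that may not resolve.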

Another option to improve speed is to send each request multiple times in parallel via different super proxies and use the fastest response.
This can be done by setting Parallel race requests to 3 and the minimum number of super proxies to 5.
Enabling ‘resolve DNS at the super proxy’ is also faster than resolving on the peer side.
Once you have set this up, run a short test by sending one request to the target website to verify that it succeeds.
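
To illustrate the idea behind racing parallel requests, here is a client-side sketch in Python. It is not how the Proxy Manager implements the feature. The `race` function and the `make_fetcher` stand-ins are hypothetical, and in real use each fetcher would send the same request through a different super proxy.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def race(fetchers):
    """Run all fetchers in parallel and return the first successful result."""
    if not fetchers:
        raise ValueError("at least one fetcher is required")
    pool = ThreadPoolExecutor(max_workers=len(fetchers))
    futures = [pool.submit(fetch) for fetch in fetchers]
    last_error = None
    try:
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # a failed attempt simply loses the race
                last_error = exc
        raise RuntimeError("all attempts failed") from last_error
    finally:
        # Do not block on the slower attempts; they finish in the background.
        pool.shutdown(wait=False)


def make_fetcher(name, delay_seconds):
    """Hypothetical stand-in for a request sent via one particular super proxy."""
    def fetch():
        time.sleep(delay_seconds)
        return name
    return fetch


winner = race([make_fetcher("slow", 0.3),
               make_fetcher("fast", 0.01),
               make_fetcher("medium", 0.1)])
print(winner)
```

Racing multiplies the number of requests sent, so it trades extra traffic for lower latency.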

Another way to improve request speed is to set a speed threshold and save a pool of IPs that meet it.
This can be done in the Proxy Manager, under the Rules tab.
I’ll create a rule that is triggered when a request’s time falls below my threshold, by selecting Request time less than and entering 500 milliseconds.

Next, select Save IP to fast pool as the action. This creates a pool of the IPs that are fastest for my specific target website. I’ll set the pool size to 20 IPs.
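
The logic of this rule can be sketched in a few lines of Python. This is an illustration of the concept only, not the Proxy Manager's implementation. The `FastIpPool` class is hypothetical, and its defaults mirror the 500 millisecond threshold and pool size of 20 used above.

```python
class FastIpPool:
    """Keep up to `size` IPs whose request time was below `max_ms`."""

    def __init__(self, max_ms=500, size=20):
        self.max_ms = max_ms
        self.size = size
        self.ips = {}  # ip -> most recent qualifying request time in ms

    def record(self, ip, elapsed_ms):
        """Save the IP if it met the threshold; return True if it is in the pool."""
        if elapsed_ms >= self.max_ms:
            return ip in self.ips  # too slow to be added by this request
        if ip not in self.ips and len(self.ips) >= self.size:
            return False  # pool is already full
        self.ips[ip] = elapsed_ms
        return True


pool = FastIpPool()
pool.record("192.0.2.10", 320)  # fast enough, saved
pool.record("192.0.2.11", 910)  # too slow, not saved
print(sorted(pool.ips))
```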
Remember to enable the HTTPS analyzer so that the Proxy Manager rules work with the HTTPS protocol.
See the instructions at brightdata.com/faq#proxy-certificate, where you’ll find specific steps for your operating system and browser.

A slow response time can be improved by removing unnecessary files from the response.
This can be done in the Rules tab by entering the file types to be removed in the regex field and enabling a trigger for specific URLs.
On the right side, you can find the rule as a JavaScript function and copy it into your code.
Note: always test your rule by clicking Test below the rule section.
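
Before entering a pattern in the regex field, it can help to try it against a few URLs locally. The sketch below uses Python's `re` module. The list of file extensions is an assumption chosen for illustration, and the Proxy Manager's own regex handling may differ in detail.

```python
import re

# Assumed example: static assets that are often not needed for data collection.
UNNEEDED_FILES = re.compile(r"\.(png|jpe?g|gif|svg|css|woff2?)(\?.*)?$", re.IGNORECASE)


def is_unneeded(url):
    """Return True if the URL points at a file type we want to drop."""
    return UNNEEDED_FILES.search(url) is not None


for url in ["https://example.com/logo.PNG?v=2",
            "https://example.com/style.css",
            "https://example.com/api/items"]:
    print(url, is_unneeded(url))
```

Anchoring the pattern at the end of the URL, with an optional query string, keeps it from matching paths that merely contain one of the extensions.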

The data-center and static residential networks have a limit of 500 requests per second per IP.
When you reach this threshold, you’ll receive this error: CODE 429: TUN_ERR: Too many requests per IP

To solve this, lower the rate of requests or buy more IPs to distribute the load across more data-center IPs.
Data-center IPs are machine IPs, and static residential IPs are ISP IPs that are extra fast and can be used for as long as needed.
There is no limit on the number of requests on the residential network, which consists of tens of millions of real users’ IPs.
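
One way to stay under the per-IP limit on the data-center and static residential networks is to throttle on the client side. The following is a minimal sketch of a one-second sliding-window limiter. The `PerIpRateLimiter` class is hypothetical and only models the 500 requests per second figure quoted above. It does not talk to any Bright Data API.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `max_per_second` requests per IP in any one-second window."""

    def __init__(self, max_per_second=500, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = defaultdict(deque)  # ip -> timestamps of recent requests

    def try_acquire(self, ip):
        """Return True and record the request if the IP is under its limit."""
        now = self.clock()
        window = self._sent[ip]
        # Drop timestamps that are one second old or older.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.max_per_second:
            return False
        window.append(now)
        return True


limiter = PerIpRateLimiter()
if limiter.try_acquire("192.0.2.10"):
    print("ok to send via 192.0.2.10")
```

When `try_acquire` returns False, the caller can either wait or pick a different IP, which matches the two remedies described above.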

Bright Data has many customers that send more than 20 million requests per day.
For customers running tens of millions of requests per day, the ideal configuration is to connect your crawler or bot, directly or via API, to the Proxy Manager, which in turn connects you to the Super Proxy.

The Proxy Manager is installed on your premises, and under high load it is necessary to divide the traffic between several Proxy Managers.
Otherwise, you might get a 502 error code due to the high traffic load on a single Proxy Manager machine.

To configure multiple Proxy Managers, install the Proxy Manager on one PC and set up the required proxy ports, including their targeting, network rules and port configuration.
The proxy port can be configured in the General tab, where you should also select Yes for Enable SSL logs.

Now, whitelist the IPs that you want to connect to this proxy port, meaning the IP of your crawler or bot.
This will ensure that only the allowed sources will use each of the proxy ports of the Proxy Manager.
Simply obtain the IPs of your crawler machines, type them into the Whitelist IPs access field, and click V.

Now go to the Manual Configuration tab and copy the JSON configuration.
To set up the Proxy Manager on other machines, install it there, go to the Manual Configuration tab, click edit, paste the JSON, and save.
This will copy all the proxy ports you defined with their configurations including the whitelisted IPs.

Now that we have a few machines with Proxy Manager, splitting the traffic between the instances is done by directing the requests to the relevant machine IP and Proxy Manager port.

The requests will look like this:
request 1 -> [first server IP]:24000
request 2 -> [second server IP]:24000
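
A simple round-robin over the Proxy Manager machines can be sketched as follows. The server IPs are placeholders, the `proxy_rotation` helper is hypothetical, and port 24000 is taken from the example above. The resulting dictionaries have the shape expected by the `proxies` argument of the Python Requests library.

```python
from itertools import cycle


def proxy_rotation(server_ips, port=24000):
    """Yield proxy settings that alternate between Proxy Manager machines."""
    endpoints = [f"http://{ip}:{port}" for ip in server_ips]
    if not endpoints:
        raise ValueError("at least one Proxy Manager IP is required")
    for endpoint in cycle(endpoints):
        yield {"http": endpoint, "https": endpoint}


# Placeholder addresses standing in for [first server IP] and [second server IP].
rotation = proxy_rotation(["192.0.2.10", "192.0.2.11"])
for _ in range(4):
    print(next(rotation)["http"])

# With the Requests library installed, each request would take the next entry:
#   requests.get(url, proxies=next(rotation))
```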

Note: remember to install the Proxy Manager certificate as described earlier.

Concluding our webinar, the steps for increasing request speed are:

  • Route the traffic to the fastest Super Proxy based on your location
  • Send parallel requests over a few super proxies to use the fastest one
  • Resolve DNS on the super proxy side
  • Remove unnecessary files from the response
  • Save the fastest IP in a fast IP pool for future use
  • Split the traffic load onto several Proxy Manager instances and machines

We hope this webinar was useful. You are welcome to visit our Frequently Asked Questions or watch our past webinars, which can be found at brightdata.com/webinar
