Learn how to increase your success rate and reduce proxy costs
- IP rotation, reserve IPs and ban IPs
- Auto retry, limit requests
- Route requests and bypass proxy
- Improve request speed and reduce bandwidth
- Detailed request logs and success ratio metrics
Don’t want to watch the webinar? Read it instead.
Bright Data’s Proxy Manager is free, open-source software that works with any proxy provider.
It speeds up development with built-in data-gathering features for debugging and gives you complete control of your proxy operations.
To begin, download and install the Proxy Manager onto your computer or server.
The Proxy Manager is supported on Windows, Linux and macOS machines, and is also available as a Docker image.
When the installation completes, a black CMD window appears and the Proxy Manager dashboard opens in your default browser.
Don’t close the CMD window: closing it turns off the Proxy Manager and cuts off all communication with the Super Proxy.
The Proxy Manager is the middleman between your crawler or browser and the Super Proxy, so you should point your connection at the Proxy Manager, for example:
curl --proxy 127.0.0.1:<port> <target-url>, where <port> is the port you define in the Proxy Manager.
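In code, the same setup means pointing your HTTP client at the local Proxy Manager port. A minimal Python sketch using only the standard library's `urllib.request`, assuming the Proxy Manager runs on the same machine, that its port behaves as a standard HTTP proxy, and that 24000 is the port you created (it is only an example value):

```python
import urllib.request

# Example port only; use the port number you created in the Proxy Manager.
PROXY_MANAGER_PORT = 24000


def build_proxy_opener(port: int, host: str = "127.0.0.1") -> urllib.request.OpenerDirector:
    """Return an opener whose HTTP and HTTPS requests go through the local Proxy Manager."""
    proxy_url = f"http://{host}:{port}"
    # Assumes the port is a standard HTTP proxy: HTTPS targets are tunnelled
    # through it with CONNECT, so both schemes use the same proxy URL.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling `build_proxy_opener(PROXY_MANAGER_PORT).open("http://example.com")` then plays the same role as the curl command; it only succeeds while the Proxy Manager (and its CMD window) is running.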
The Proxy Manager connects and authenticates you with the Super Proxy, which sends the request to the peer, also called the proxy exit node (when using the residential network, this is a real user’s device).
The dashboard shows a list of ports, the overall success rate, the number of requests and bandwidth (BW) usage.
At the bottom is a summary of all logs, including the target site URL, the port used, the status code received, BW, how long the request took, the peer proxy IP and the date.
Click a request to see its details, including the request headers, the response headers and the request itself.
Later in this discussion, we’ll look into the logs.
To create a new proxy port, click the ‘Add New Proxy’ button.
First I’ll choose the Proxy Zone I created earlier in my Bright Data dashboard.
Here you can see the Zone details, such as the network type. Bright Data has three networks: data center, residential and mobile.
Next are the network permissions, which refer to targeting: whether this port has country, state, city, exclusive residential IP or ASN targeting.
An ASN (autonomous system number) identifies an autonomous system: a network, such as an ISP, that controls a set of IP prefixes under a single routing policy.
Exclusive residential IPs are a group of 3-30 residential IPs reserved for your exclusive use.
You can always update the Zone settings later by clicking on Edit Zone.
Next, we choose a preset configuration, which automatically applies the settings required for your particular need.
The default option is the Long Single Session IP, which is the most common preset.
This preset is used when connecting a browser to the proxy or when conducting any activity where you don’t want the IP to change during the session.
The Long Single Session preset keeps the same IP for as long as possible and automatically applies the following settings:
- Pool type as sequential
- Pool size as 1, with KeepAlive set to Yes, which pings the IP continuously to keep the session alive
- All other session-changing options are greyed out, as you can see here
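To make the sequential pool setting concrete, here is a conceptual Python model, assuming "sequential" means the session stays on its current IP and only moves to the next IP in order when that IP fails. The class name and the documentation-range IPs (203.0.113.x) are hypothetical; this is not the Proxy Manager's internal code. With the preset's pool size of 1, the pool holds a single IP that every request reuses.

```python
class SequentialPool:
    """Hypothetical sequential pool: reuse the current IP until it fails, then take the next one."""

    def __init__(self, ips):
        self._ips = list(ips)
        self._index = 0

    def current_ip(self):
        if self._index >= len(self._ips):
            raise LookupError("no IPs left in the pool")
        return self._ips[self._index]

    def mark_failed(self):
        # Only a failure advances the pool; successful requests keep the same IP.
        self._index += 1


# Pool size 1, as in the Long Single Session preset: every request reuses one IP.
session_pool = SequentialPool(["203.0.113.10"])
```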
The next preset option is Session IP per machine. This preset is used when you connect several computers to the same Proxy Manager, where each of them acts independently and has its own session.
Selecting Session IP per machine preset will apply the following settings:
- Pool type as sequential
- Multiple proxy ports as 1, meaning there are no additional ports
The next preset in our list is the Round-robin pool that distributes the requests across a large number of IPs.
A common use case for the round-robin preset is large web scraping operations that require requests from many different IPs.
Selecting the Round-robin preset will apply the following settings:
- Pool type as round-robin
- Pool size as 10; you can change it to a larger or smaller IP pool
- Max requests as 1
- Multiple ports as 1
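The rotation behind the Round-robin preset can be sketched as follows, assuming "Max requests" caps how many consecutive requests one IP serves before the pool moves on. The class is a hypothetical illustration, the three documentation-range IPs stand in for the preset's pool of 10, and it does not reflect how the Proxy Manager implements rotation.

```python
import itertools


class RoundRobinPool:
    """Hypothetical round-robin pool: each IP serves at most `max_requests` requests in a row."""

    def __init__(self, ips, max_requests=1):
        if not ips:
            raise ValueError("pool must contain at least one IP")
        self._cycle = itertools.cycle(ips)
        self._max_requests = max_requests
        self._current = next(self._cycle)
        self._used = 0

    def next_ip(self):
        # Rotate to the next IP once the current one has served its quota.
        if self._used >= self._max_requests:
            self._current = next(self._cycle)
            self._used = 0
        self._used += 1
        return self._current
```

With `max_requests=1`, as in the preset, consecutive calls to `next_ip()` walk through the pool one IP per request and wrap around at the end.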
The next preset, High performance, is for when fast IPs are required. Common use cases are crawling slow websites and displaying crawl results in real time.
The High-performance preset will apply the following settings:
- Pool type as round-robin
- Pool size to 50
- Parallel race requests as 2
- For every request, the Proxy Manager will send two parallel requests via different Super Proxies and will use the fastest one.
- Maximum number of super proxies to 20
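The parallel race idea can be illustrated with Python's `concurrent.futures`: submit the same request over two routes and keep whichever answers first. `simulated_fetch`, its delays and the route names are hypothetical stand-ins for real requests through different Super Proxies; the sketch needs Python 3.9+ for `cancel_futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-route latencies standing in for two different Super Proxies.
SIMULATED_DELAYS = {"super-proxy-a": 0.3, "super-proxy-b": 0.05}


def simulated_fetch(route):
    """Stand-in for a real request: sleep for the route's latency, then return the route name."""
    time.sleep(SIMULATED_DELAYS[route])
    return route


def race(fetch, routes):
    """Send the same request over every route in parallel and return the first successful result."""
    pool = ThreadPoolExecutor(max_workers=len(routes))
    try:
        futures = [pool.submit(fetch, route) for route in routes]
        errors = []
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # a failed route just drops out of the race
                errors.append(exc)
        raise RuntimeError(f"all {len(routes)} routes failed: {errors}")
    finally:
        # Don't wait for slower routes; their results are discarded.
        pool.shutdown(wait=False, cancel_futures=True)
```

`race(simulated_fetch, ["super-proxy-a", "super-proxy-b"])` returns `"super-proxy-b"`, the faster route. The trade-off mirrors the preset's: extra parallel requests in exchange for lower latency.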
The next preset is Random User Agent, which rotates the User-Agent on every request.
When used with a browser, it overrides the browser’s User-Agent header and applies a different user agent on every request.
This preset is needed when you open multiple sessions that require different user agents.
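Conceptually, the preset replaces whatever User-Agent the client sent with one drawn at random for each request. A small Python sketch of that header override follows; the three User-Agent strings are examples chosen for illustration, and in the Proxy Manager itself the preset does this for you.

```python
import random

# Illustrative User-Agent strings; keep your own list current in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def with_random_user_agent(headers, rng=random):
    """Return a copy of `headers` with any existing User-Agent replaced by a random one."""
    # Header names are case-insensitive, so drop every spelling of User-Agent first.
    new_headers = {name: value for name, value in headers.items() if name.lower() != "user-agent"}
    new_headers["User-Agent"] = rng.choice(USER_AGENTS)
    return new_headers
```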
Now we have the online-shopping preset for scraping e-commerce websites.
The common use case is collecting data from product pages, but the preset can be modified for other use cases.
The online-shopping preset will apply the following settings:
- Request speed
- Resolve DNS remotely at the peer, which is the proxy exit node, rather than on the Super Proxy side.
- Set a random user agent per request and override the headers
- Enable SSL, since most e-commerce websites use HTTPS; this also enables the SSL logs
- Add a post-processing rule that fetches the required data; the default example fetches product page data such as the page title, price and bullet points.
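The preset's processing rule is configured inside the Proxy Manager, and that configuration isn't reproduced here. As a rough stand-in for what such a rule extracts, this sketch uses Python's `html.parser` to pull the page title and, assuming bullets are `<li>` elements (real product pages vary), the bullet text. Price extraction is omitted because it depends on each site's markup.

```python
from html.parser import HTMLParser


class ProductPageParser(HTMLParser):
    """Collects the <title> text and the text of every <li> element (assumed to hold bullets)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.bullets = []
        self._in_title = False
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "li":
            self._in_li = True
            self.bullets.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_li:
            self.bullets[-1] += data


def extract_product_data(html):
    """Return the page title and bullet texts, stripped of surrounding whitespace."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "bullets": [b.strip() for b in parser.bullets]}
```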
How the Proxy Manager rules work:
- A rule is triggered, in this case by a specific URL; here you can see luminati.io
- The action is processing the data from the page, and it can be changed for your specific use case.
The last preset is Custom and can be defined for any specific use-case.
For any preset, you can go to the Targeting tab and select the required geo-targeting: country, state, city, ISP/ASN and carrier.
Keep your IP geolocation targeting, web profile and the time zone in your headers consistent, especially when doing account management.
Under IP control you can find another useful custom setting: session termination.
By selecting Yes, you tell the Proxy Manager to terminate the session and stop sending requests when the IP is no longer available.
This setup is very useful for social account management where changing the IP during the session may affect the social account.
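A hypothetical Python model of the session termination setting: with termination enabled, losing the IP ends the session with an error instead of silently moving to another IP, which is what protects a logged-in account. The class, the exception and the IPs are illustrative only.

```python
class SessionTerminatedError(RuntimeError):
    """Raised when the session's IP is gone and termination is enabled."""


class StickySession:
    """Hypothetical session that either terminates or switches IP when its IP becomes unavailable."""

    def __init__(self, ip, spare_ips=(), terminate_on_unavailable=True):
        self.ip = ip
        self._spare_ips = list(spare_ips)
        self._terminate = terminate_on_unavailable
        self.terminated = False

    def ip_for_request(self, ip_available):
        if self.terminated:
            raise SessionTerminatedError("session was already terminated")
        if not ip_available:
            if self._terminate or not self._spare_ips:
                # Stop sending rather than continue the session from a different IP.
                self.terminated = True
                raise SessionTerminatedError(f"IP {self.ip} is unavailable; session terminated")
            self.ip = self._spare_ips.pop(0)
        return self.ip
```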
Proxy Manager rules are a great feature, and they require enabling the Proxy Manager SSL certificate, which is also needed when working with HTTPS websites.
Download the Proxy Manager SSL certificate and enable it by following the certificate installation instructions.
Next, open the port you are working with and, under the General tab, enable SSL logs by selecting Yes.
The Proxy Manager rules are under the RULES tab. This feature is very useful and flexible for many use cases. I showed you the online shopping preset with its preconfigured rule, but you can also make your own rules based on your needs. I will start by defining what triggers the rule and then the action to take once the rule is triggered.
The triggering options are:
- URL which triggers on specific URLs
- Status code of the response: select one from the drop-down list (200, 403, 500, etc.) or type the required status code
- You can find out which status code is blocking your requests by looking at the port logs
- HTML body element, which triggers the rule on a specific element in the HTML body, for example, a reCAPTCHA element when a reCAPTCHA is presented
- And the last two options are Request time more than the specified milliseconds and Request time less than the specified milliseconds.
Once we have chosen a trigger, we need to choose an action.
The Actions that can be performed are:
- Retry with new IP to send the same request with a new IP address
- Retry with new port, which routes the same request through a different port; for example, if your port uses data center IPs, once the rule is triggered the request is routed to a residential port
- Ban IP for a period of time, also referred to as a cooling period; entering zero bans the IP permanently
- Ban IP per domain for a period of time, which lets you keep using the same IP with other domains
- Refresh IP
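The two ban actions can be modelled as a ban list keyed by IP and, optionally, by domain, where each entry expires after its cooling period and a period of zero never expires. This is a hypothetical in-memory sketch, not the Proxy Manager's storage; the injectable `clock` only makes it easy to test.

```python
import time


class BanList:
    """Hypothetical ban list with global and per-domain bans, each with a cooling period."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # (ip, domain or None) -> expiry time, or None for a permanent ban
        self._bans = {}

    def ban(self, ip, seconds, domain=None):
        # A period of zero means a permanent ban, as in the Ban IP action.
        self._bans[(ip, domain)] = None if seconds == 0 else self._clock() + seconds

    def is_banned(self, ip, domain):
        """Check the global ban first, then the ban for this specific domain."""
        now = self._clock()
        for key in ((ip, None), (ip, domain)):
            if key in self._bans:
                expiry = self._bans[key]
                if expiry is None or now < expiry:
                    return True
                del self._bans[key]  # the cooling period is over
        return False
```

A per-domain ban leaves the IP usable for other domains, while a ban without a domain blocks it everywhere.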
In addition, you can choose where to schedule the rule in the request funnel:
- Before sending the request
- After the headers
- After the body
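Putting triggers, actions and stages together, a rule is essentially a predicate over the request or response plus an action to run at a given point in the request funnel. The Python model below is hypothetical: the class names, trigger helpers and action strings are illustrative, not the Proxy Manager's rule format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Stage(Enum):
    BEFORE_SEND = "before sending the request"
    AFTER_HEADERS = "after the headers"
    AFTER_BODY = "after the body"


@dataclass
class Exchange:
    """What a rule can inspect; fields stay None until that stage of the funnel is reached."""
    url: str
    status: Optional[int] = None
    body: Optional[str] = None
    elapsed_ms: Optional[float] = None


@dataclass
class Rule:
    trigger: Callable[[Exchange], bool]
    action: str
    stage: Stage


# Hypothetical trigger helpers mirroring the trigger options in the Proxy Manager.
def url_contains(fragment):
    return lambda ex: fragment in ex.url


def status_is(code):
    return lambda ex: ex.status == code


def body_contains(text):
    return lambda ex: ex.body is not None and text in ex.body


def slower_than(ms):
    return lambda ex: ex.elapsed_ms is not None and ex.elapsed_ms > ms


def actions_for(rules, exchange, stage):
    """Return the actions of every rule scheduled at `stage` whose trigger matches."""
    return [rule.action for rule in rules if rule.stage is stage and rule.trigger(exchange)]
```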
Let’s look at a few rule examples and their use-cases:
- Sometimes you want to start with data center IPs and switch to residential or mobile IPs when requesting a specific page URL. You can achieve this by selecting URL as the trigger, typing the required URL, choosing Retry with new port as the action and selecting the predefined port.
- Another useful option for reducing bandwidth is the REGEX: selecting file formats from the list removes requests for those formats, so the responses are lighter.
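The exact pattern the Proxy Manager generates isn't shown here, so the following Python sketch is an assumption: a single case-insensitive regex over the URL decides whether a request for a heavy file format should be skipped. The extension list is hypothetical.

```python
import re

# Hypothetical list of heavy file types to skip; pick the formats your scraper doesn't need.
BLOCKED_EXTENSIONS = ("png", "jpg", "jpeg", "gif", "svg", "woff", "woff2", "mp4")

# Match ".ext" at the end of the path, optionally followed by a query string or fragment.
BLOCKED_URL_RE = re.compile(
    r"\.(?:" + "|".join(map(re.escape, BLOCKED_EXTENSIONS)) + r")(?=[?#]|$)",
    re.IGNORECASE,
)


def should_block(url):
    """Return True if the URL points at one of the blocked file formats."""
    return BLOCKED_URL_RE.search(url) is not None
```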
Once you have set up your rules, you can test them at the bottom of the page.
Another useful rule retries automatically with a new IP when you receive a specific error or status code. For example, suppose I am getting a 403 status code, which means access is forbidden.
To set this rule, select Status code as the trigger, choose 403 from the drop-down list and select Retry with new IP; this resends the same request from a new IP address.
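The logic of this rule, trigger on 403 and resend from a fresh IP, looks roughly like the loop below. `send`, the attempt cap of 3 and the IPs are assumptions for illustration; in practice the Proxy Manager applies the rule for you without any code.

```python
from itertools import islice


def fetch_with_ip_retry(send, ips, retry_statuses=frozenset({403}), max_attempts=3):
    """Resend the same request from a new IP while the status code is in `retry_statuses`.

    `send(ip)` is a hypothetical callable returning (status, body) for a request made from `ip`.
    Returns (ip, status, body) for the last attempt.
    """
    last = None
    for ip in islice(ips, max_attempts):
        status, body = send(ip)
        last = (ip, status, body)
        if status not in retry_statuses:
            break  # not a status we retry on, so keep this response
    if last is None:
        raise ValueError("no IPs available")
    return last
```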
Timing issues can be solved by banning slow IPs or saving fast IPs for future use.
To save fast IPs for future use, I’ll select ‘Request time less than’ as the rule trigger and set the required time.
As the action, select Save IP to fast pool; this creates your own pool of fast IPs.
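A sketch of what Save IP to fast pool accumulates: every IP whose request finished under the threshold is remembered for reuse. The class, the 20-IP cap and the eviction policy are hypothetical choices, not documented Proxy Manager behaviour.

```python
from collections import OrderedDict


class FastIpPool:
    """Hypothetical fast pool: remember IPs whose requests beat a time threshold."""

    def __init__(self, threshold_ms, max_size=20):
        self.threshold_ms = threshold_ms
        self.max_size = max_size  # hypothetical cap on the pool size
        self._ips = OrderedDict()  # ip -> best observed request time in ms

    def record(self, ip, elapsed_ms):
        """Trigger 'request time less than threshold' -> action 'save IP to fast pool'."""
        if elapsed_ms >= self.threshold_ms:
            return False
        self._ips[ip] = min(elapsed_ms, self._ips.get(ip, elapsed_ms))
        self._ips.move_to_end(ip)
        if len(self._ips) > self.max_size:
            self._ips.popitem(last=False)  # evict the least recently confirmed IP
        return True

    def ips(self):
        return list(self._ips)
```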
Finally, for every port you can open the LOGS tab and view the log history to troubleshoot and track your requests.
In the HAR viewer, you can find logs of all your requests, including the URL, status code, bandwidth used, time, peer IP and timestamp.
Clicking a specific request reveals its details, including the headers, IP, ASN, geolocation, response and request time.
In the sessions tab, you can find the session ID, host IP and last used IP.
And in the Banned IPs tab you can find the list of banned IPs, their domains and when each ban expires.
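For analysis outside the dashboard, a success ratio and status breakdown can be computed from logs in HAR format (the standard JSON layout with `log.entries[].response.status`). How you obtain a HAR file from your setup is an assumption here, and counting 2xx and 3xx responses as successes is a hypothetical definition that may differ from the dashboard's.

```python
import json
from collections import Counter


def load_har(path):
    """Read a HAR file (JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def status_counts(har):
    """Count responses per HTTP status code."""
    return Counter(entry["response"]["status"] for entry in har["log"]["entries"])


def success_ratio(har):
    """Share of entries with a 2xx or 3xx status (an assumed definition of success)."""
    entries = har["log"]["entries"]
    if not entries:
        return 0.0
    ok = sum(1 for entry in entries if 200 <= entry["response"]["status"] < 400)
    return ok / len(entries)
```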