In this tutorial, you will learn:
- What a Python proxy server is and how it works.
- The steps required to build an HTTP proxy server in Python.
- The pros and cons of this approach.
Let’s dive in!
What Is a Python Proxy Server?
A Python proxy server is a Python application that acts as an intermediary between clients and the Internet. It intercepts requests from clients, forwards them to the target servers, and sends the responses back to the clients. By doing so, it masks the client’s identity from the destination servers.
Read our article to dig into what a proxy server is and how it works.
Python’s socket programming capabilities make it easy to implement a basic proxy server, allowing users to inspect, modify, or redirect network traffic. Proxy servers are great for caching, improving performance, and enhancing security when it comes to web scraping.
How to Implement an HTTP Proxy Server in Python
Follow the steps below and learn how to build a Python proxy server script.
Step 1: Initialize Your Python Project
Before getting started, make sure to have Python 3+ installed on your machine. Otherwise, download the installer, execute it, and follow the installation wizard.
Next, create a python-http-proxy-server folder and initialize a Python project with a virtual environment inside it, for example by running mkdir python-http-proxy-server, then cd python-http-proxy-server, and finally python -m venv env.
Open the python-http-proxy-server folder in your Python IDE and create an empty proxy_server.py file.
Great! You have everything you need to build an HTTP proxy server in Python.
Step 2: Initialize an Incoming Socket
First, you need to create a TCP server socket for accepting incoming requests (not to be confused with the WebSocket protocol). If you are not familiar with that concept, a socket is a low-level programming abstraction that allows for bidirectional data flow between a client and a server. In the context of a web server, a server socket is used to listen for incoming connections from clients.
In Python, you can build this socket-based server with the socket module: create an incoming socket, bind it to the local address 127.0.0.1 on port 8888, and then call the listen() method to enable the server to accept connections.
Note: Feel free to change the port the proxy listens on. You can also modify the script to read that information from the command line for maximum flexibility.
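As a sketch of that command-line option, the snippet below uses argparse from the Python Standard Library. The parse_proxy_args() helper and the --host and --port flag names are hypothetical; the defaults mirror the 127.0.0.1:8888 address used in this tutorial.

```python
import argparse


def parse_proxy_args(argv=None):
    """Parse the address the proxy should listen on from the command line."""
    parser = argparse.ArgumentParser(description="Minimal HTTP proxy server")
    # Hypothetical flag names; defaults match the address used in this tutorial
    parser.add_argument("--host", default="127.0.0.1", help="address to bind to")
    parser.add_argument("--port", type=int, default=8888, help="port to listen on")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

You could then pass args.host and args.port to bind() instead of hard-coding the address, and start the proxy with, for example, python proxy_server.py --port 9000.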
socket comes from the Python Standard Library, so all you need is an import socket statement at the top of your script.
To confirm that the Python proxy server has started as required, log a message once the server is listening.
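Putting the pieces of this step together, a minimal sketch of the server setup could look like the following. The create_server_socket() helper, the SO_REUSEADDR option, the backlog of 5, and the exact log text are assumptions for illustration, not necessarily what the original script uses.

```python
import socket


def create_server_socket(host="127.0.0.1", port=8888, backlog=5):
    """Create a TCP server socket bound to host:port and start listening."""
    # AF_INET means IPv4 addresses, SOCK_STREAM means TCP
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow restarting the script right away without "Address already in use"
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    # backlog caps how many not-yet-accepted connections can queue up
    server.listen(backlog)
    # Illustrative log message confirming the proxy is up
    print(f"Proxy server listening on {host}:{port}...")
    return server
```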
Step 3: Accept Client Requests
When a client connects to the proxy server, the proxy needs to create a new socket to handle communication with that specific client. In Python, the server socket’s accept() method blocks until a client connects and then returns that new socket together with the client’s address.
To handle multiple client requests simultaneously, start a new thread for each accepted connection. Do not forget to import threading, which is also part of the Python Standard Library.
Each thread handles its client through the custom handle_client_request() function. See how it is defined in the next steps.
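A minimal sketch of this accept loop follows. It assumes a server socket that is already listening, as set up in Step 2, and receives the per-client function as a handler parameter, which in the finished proxy would be handle_client_request(). The accept_clients() name and the max_clients argument (a testing aid where None means loop forever) are hypothetical.

```python
import socket
import threading


def accept_clients(server_socket: socket.socket, handler, max_clients=None):
    """Accept connections and serve each client on its own thread."""
    accepted = 0
    while max_clients is None or accepted < max_clients:
        # accept() blocks until a client connects, then returns a new socket
        # dedicated to that client plus the client's (host, port) address
        client_socket, addr = server_socket.accept()
        print(f"Accepted connection from {addr[0]}:{addr[1]}")
        # One thread per client, so a slow client does not block the others
        client_thread = threading.Thread(
            target=handler, args=(client_socket,), daemon=True
        )
        client_thread.start()
        accepted += 1
```

Marking the threads as daemon threads is a design choice: it lets the script exit without waiting for client threads that are still running.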
Step 4: Process the Incoming Requests
Once the client socket has been created, you need to use it to:
- Read the data from the incoming requests.
- Extract the target server’s host and port from that data.
- Use it to forward the client request to the destination server.
- Get the response and forward it to the original client.
In this section, let’s focus on the first two tasks. Define the handle_client_request() function and start by using it to read the data from the incoming request.
setblocking(False) sets the client socket to non-blocking mode. Then, recv() reads the incoming data and appends it to request in byte format. Since you do not know the size of the incoming request in advance, you have to read it one chunk at a time, in this case 1024 bytes per chunk. In non-blocking mode, recv() raises a BlockingIOError when no data is available, so the except clause marks the end of the operation. Keep in mind that this is a simplification: if part of the request has not arrived yet, recv() can raise before the whole request has been read.
Note the logged messages to keep track of what the Python proxy server is doing.
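A minimal sketch of this reading step follows. Note that it swaps the non-blocking loop described above for blocking reads with a timeout that stop at the blank line ending the HTTP headers, which avoids returning early when data has not arrived yet. The read_request() name and the 5-second timeout are hypothetical, and request bodies (for example POST data) are not read.

```python
import socket


def read_request(client_socket, chunk_size=1024, timeout=5.0):
    """Read the header block of an HTTP request from client_socket."""
    # Each recv() waits up to `timeout` seconds instead of failing immediately
    client_socket.settimeout(timeout)
    request = b""
    # A blank line (CRLF CRLF) marks the end of the HTTP headers
    while b"\r\n\r\n" not in request:
        try:
            chunk = client_socket.recv(chunk_size)
        except socket.timeout:
            break  # nothing arrived in time: return what we have so far
        if not chunk:
            break  # the client closed the connection
        request += chunk
    # Log the raw request to keep track of what the proxy is doing
    print(f"Received request:\n{request.decode(errors='replace')}")
    return request
```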
After retrieving the incoming request, you need to extract the destination server’s host and port from it.
The custom extract_host_port_from_request() function extracts the web server’s host and port from the “Host:” header. For example, for a request with a “Host: example.com” header, host is example.com and port is 80, the default HTTP port, since no specific port has been specified.
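One way to write extract_host_port_from_request() is sketched below. It is an illustrative implementation, not necessarily the original one: it looks up the Host header case-insensitively (HTTP header names are case-insensitive), splits off an explicit port such as :8080, and otherwise falls back to port 80. The default_port parameter is an assumption.

```python
def extract_host_port_from_request(request, default_port=80):
    """Return (host, port) taken from the Host header of a raw HTTP request."""
    # Headers end at the first blank line; HTTP uses CRLF line endings
    header_block = request.split(b"\r\n\r\n", 1)[0]
    # Skip the request line (e.g. "GET http://example.com/ HTTP/1.1")
    for line in header_block.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host_value = value.strip().decode("latin-1")
            # An explicit port looks like "example.com:8080"
            host, sep, port = host_value.rpartition(":")
            if sep and port.isdigit():
                return host, int(port)
            return host_value, default_port
    raise ValueError("request has no Host header")
```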
Step 5: Forward the Client Request and Handle the Response
Given the target host and port, you now have to forward the client request to the destination server. In handle_client_request(), create a new socket connected to the destination and use it to send the original request.
Again, you need to work one chunk at a time, as you do not know the size of the response. When recv() returns an empty bytes object, the destination server has closed the connection, so there is no more data to receive and you can terminate the operation.
Do not forget to close both sockets used in the function, the one connected to the destination and the client socket, once the response has been relayed.
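The forwarding step could be sketched as follows. The forward_request() name, the 10-second timeout, and treating a timeout as the end of the response are assumptions: a destination that keeps the connection alive may never close it, so without a timeout recv() could block indefinitely. The with statement closes the destination socket automatically, while the client socket is still closed by the caller, handle_client_request().

```python
import socket


def forward_request(request, host, port, client_socket, chunk_size=1024, timeout=10.0):
    """Send the raw request to host:port and relay the response to client_socket."""
    # A second socket, this time connecting out to the destination server;
    # the with block closes it when forwarding ends
    with socket.create_connection((host, port), timeout=timeout) as destination_socket:
        destination_socket.sendall(request)
        while True:
            try:
                data = destination_socket.recv(chunk_size)
            except socket.timeout:
                break  # no data within the timeout: assume the response is over
            if not data:
                break  # empty bytes: the destination closed the connection
            # Pass each chunk straight back to the original client
            client_socket.sendall(data)
```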
Awesome! You just created an HTTP proxy server in Python. Time to see the entire code, launch it, and verify that it works as expected!
Step 6: Put It All Together
The final proxy_server.py script combines the code from the previous steps. Launch it with python proxy_server.py.
You should see the server’s startup log message in the terminal.
To make sure that the server works, execute a proxy request with cURL. Read our guide to learn more on how to use cURL with a proxy.
Open a new terminal and run curl -x http://127.0.0.1:8888 http://httpbin.org/ip.
This makes a GET request to the http://httpbin.org/ip destination through the http://127.0.0.1:8888 proxy server.
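If you prefer to run the same check from Python instead of cURL, the standard library’s urllib.request can route requests through the proxy via ProxyHandler. The build_proxy_opener() helper below is hypothetical; it maps only the http scheme because this proxy does not handle HTTPS.

```python
import urllib.request


def build_proxy_opener(proxy_url="http://127.0.0.1:8888"):
    """Return a urllib opener that sends http:// requests through proxy_url."""
    # Map only plain HTTP to the proxy; HTTPS is not supported by this proxy
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)
```

With the proxy running, build_proxy_opener().open("http://httpbin.org/ip").read() should return the same JSON body that cURL prints.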
You should get a JSON response with an “origin” field containing an IP address.
That is the IP of the proxy server. Why? Because the /ip endpoint of the HTTPBin project returns the IP the request comes from. If you are running the server locally, “origin” will correspond to your IP.
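Note: The Python proxy server built here works only with HTTP destinations. Extending it to handle HTTPS connections is quite tricky, because HTTPS clients first send a CONNECT request and then expect the proxy to tunnel an encrypted TLS stream between them and the destination, which this request-parsing approach does not support.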
Now, explore the log written by your Python proxy server application. It should contain the raw GET request received from cURL, followed by the response data returned by httpbin.org.
This tells you that the proxy server received the request in the format specified by the HTTP protocol, forwarded it to the destination server, logged the response data, and sent the response back to the client. How can you be sure? Because the IP in the “origin” field of the logged response is the same as the one cURL printed!
Congrats! You just learned how to build an HTTP proxy server in Python!
Pros and Cons of Using a Custom Python Proxy Server
Now that you know how to implement a proxy server in Python, you are ready to see the benefits and limitations of this approach.
Pros:
- Total control: With a custom Python script like this, you have total control over what your proxy server does. No shady activity or data leakage there!
- Customization: The proxy server can be extended to include useful features such as logging and caching of requests to improve performance.
Cons:
- Infrastructure costs: Setting up a proxy server architecture is not easy and costs a lot of money in terms of hardware or VPS services.
- Hard to maintain: You are responsible for maintaining the architecture of the proxy, especially its scalability and availability. This is a task that only experienced system administrators can tackle.
- Unreliable: The main issue with this solution is that the exit IP of the proxy server never changes. As a result, anti-bot technologies will be able to block that IP and prevent the proxy from accessing the target sites. In other words, the proxy will eventually stop working.
These limitations make a custom Python proxy server a poor fit for production scenarios. The solution? A reliable proxy provider like Bright Data! Create an account, verify your identity, get a free proxy, and use it in your favorite programming language. For example, integrate a proxy into your Python script with requests.
Our huge proxy network includes millions of fast, reliable, and secure proxy servers all over the world. Find out why we are the best proxy server provider.
Conclusion
In this guide, you learned what a proxy server is and how it works in Python. In detail, you learned how to build one from scratch using sockets. You have now become a master of proxies in Python. The main issue with this approach is that the static exit IP of your proxy server will eventually get you blocked. Avoid that with Bright Data’s rotating proxies!
Bright Data controls the best proxy servers in the world, serving Fortune 500 companies and more than 20,000 customers. Its offer includes a wide range of proxy types:
- Datacenter proxies – Over 770,000 datacenter IPs.
- Residential proxies – Over 72M residential IPs in more than 195 countries.
- ISP proxies – Over 700,000 ISP IPs.
- Mobile proxies – Over 7M mobile IPs.
That reliable, fast, and global proxy network is also the basis of a number of web scraping services to effortlessly retrieve data from any site.