In this post, we will cover:
- What is curl
- How to install curl
- What you will need to start using proxies
- Which proxies and protocols work best
- How to define a proxy in curl
- Important tricks and tips
What is curl
curl, or ‘Client URL’, is a command-line tool for transferring data to and from servers over the internet using URL syntax. Its features include proxy support, HTTP POST requests, and user authentication, which make it popular for data collection. Companies typically use curl to download specific web pages or files.
Does curl support proxies?
Yes. To use curl with a proxy, pass the proxy address with either of these equivalent options:
-x
--proxy
If the proxy requires authentication, pass its credentials with either of these options:
-U
--proxy-user
If you leave out the protocol or the port in the proxy address, curl falls back to these defaults:
Protocol: http://
Port number: 1080
The general syntax looks like this:
$ curl --proxy proxy_FQDN_OR_IPAddress:PortNo --proxy-user Username:Password "Website_URL"
For example:
$ curl -x proxy.example.com:3128 -U testuser:test123 https://www.reddit.com
Or
$ curl --proxy proxy.example.com:3128 --proxy-user testuser:test123 https://www.reddit.com
Or, with the credentials embedded in the proxy URL:
$ curl --proxy http://testuser:test123@proxy.example.com:3128 https://www.reddit.com
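The proxy string that -x accepts follows the pattern [protocol://][user:password@]host[:port]. As a minimal sketch, the hypothetical shell function build_proxy_url below assembles that string from its parts and fills in the same defaults curl uses when the scheme or port is missing (http and 1080). The function name and its argument order are our own illustration, not part of curl.

```shell
# build_proxy_url HOST [PORT] [SCHEME] [USER] [PASSWORD]
# Hypothetical helper: prints a proxy URL for curl's -x/--proxy option,
# defaulting to scheme http and port 1080 like curl does.
build_proxy_url() {
  if [ -n "${4:-}" ]; then
    # With credentials: scheme://user:password@host:port
    printf '%s://%s:%s@%s:%s\n' "${3:-http}" "$4" "${5:-}" "$1" "${2:-1080}"
  else
    # Without credentials: scheme://host:port
    printf '%s://%s:%s\n' "${3:-http}" "$1" "${2:-1080}"
  fi
}

build_proxy_url proxy.example.com
# → http://proxy.example.com:1080
build_proxy_url proxy.example.com 3128 http testuser test123
# → http://testuser:test123@proxy.example.com:3128
```

You could then pass the result straight to curl, for example curl -x "$(build_proxy_url proxy.example.com 3128 http testuser test123)" https://www.reddit.com. If a username or password contains characters such as @ or :, percent-encode them (for example, @ becomes %40) so the URL parses unambiguously.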
Installing curl
Let’s now look at how to install curl on your machine.
macOS
You do not need to install curl on macOS. It is already included in the operating system, and you can use it natively in the Terminal application.
Windows
Starting from Windows 10, Windows comes with a copy of curl. However, in Windows PowerShell the curl command is an alias for the Invoke-WebRequest cmdlet, so running curl there invokes Invoke-WebRequest behind the scenes. To avoid this and run the actual curl, type curl.exe instead of curl. This way, PowerShell runs curl and not Invoke-WebRequest.
For example, to check the version of curl installed on your Windows machine, run this in the terminal:
curl.exe --version
This should print something similar to:
curl 7.83.1 (Windows) libcurl/7.83.1 Schannel
Release-Date: 2022-05-13
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets
Linux
If you are a Linux user, the procedure for installing curl depends on the distribution you are using. Popular Linux distributions, such as Ubuntu and Fedora, include curl by default, so you can use it directly in the terminal.
In other distributions, curl may not be included by default. In this case, you can install it with your distribution's package manager. For example, on Debian-based systems you can install curl with the following command:
sudo apt-get install curl
What you will need to start using proxies
To get started using curl with a proxy service, you need the following information handy: the proxy's IP address (or hostname), port number, and protocol, and in some cases a username and password. The most common protocols in use today are HTTP and HTTPS. The next section explains which proxies and protocols work best for which use cases.
Which proxies and protocols work best
To answer this question, it helps to understand that curl sends requests, such as API calls, that are made up of several components. The most important ones for our purposes are:
- Endpoint: the URL of the website or API from which we want to extract or download data.
- Headers: the request metadata, such as the User-Agent.
Which proxies to use for web scraping with curl depends largely on your use case and the nature of your requests. Bright Data’s ‘Ultimate Guide for Proxies’ goes into the details, but here are some highlights:
- Datacenter proxies: Require more in-house R&D resources to build capabilities such as user emulation, and can benefit use cases that need static IPs, e.g., collecting retail data for a single location.
- Residential proxies: Require fewer in-house resources. They route traffic through a real network of devices belonging to individuals and work best for collecting geo-specific, customer-tailored data, for example localized competitor marketing campaigns.
- ISP proxies: A hybrid of datacenter and residential proxies. They are routed through datacenters, but target sites treat their requests as residential. This network works best for web data extraction that targets specific cities or countries, such as product pricing and consumer sentiment.
- Mobile proxies: Consist of real 3G/4G devices. This type of proxy works best for cellular-based activities such as ad verification and monitoring of application user experience (UX) and user interface (UI).
HTTP vs. HTTPS
When it comes to HTTP vs. HTTPS, the latter is the preferable option, both with curl and for secure data collection in general. The ‘S’ stands for Secure: HTTPS encrypts traffic with the Transport Layer Security (TLS) protocol, while plain HTTP sends data unencrypted.
HTTPS also authenticates the target website and protects the privacy and integrity of the data being transferred, which makes it better suited for discreet collection or transfer of sensitive data. HTTP, on the other hand, is often used for collecting large volumes of non-sensitive data, such as in-depth market research data.
Keep in mind that with curl, the proxy's scheme and the target URL's scheme are independent: when you request an https:// URL through an http:// proxy, curl asks the proxy to open a tunnel with the CONNECT method, so the traffic to the target site is still encrypted end to end.
Each business can use the above to decide which protocol best suits its needs.
How to define a proxy in curl
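Once you have decided on a proxy type and protocol, you define the proxy with curl's -x (--proxy) option. To see its syntax, list curl's proxy-related options:
curl --help proxy
On older curl versions that do not support help categories, run curl --help instead. Then, look for the following entry in the output:
-x, --proxy [protocol://]host[:port]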
Using environment variables
If you prefer environment variables, export the ones for the protocols you use:
export http_proxy="http://user:password@Proxy_IP_Address_or_FQDN:port"
export https_proxy="http://user:password@Proxy_IP_Address_or_FQDN:port"
For example:
$ export http_proxy="http://testuser:test123@proxy.example.com:3128"
$ export https_proxy="http://testuser:test123@proxy.example.com:3128"
Note that curl only reads the lowercase http_proxy variable. For the other protocols it accepts both lowercase and uppercase names, such as https_proxy or HTTPS_PROXY.
You can now run curl as usual, and it picks up the proxy from the environment:
$ curl -v https://www.reddit.com
The -v option shows which proxy and port curl uses to connect to the target URL.
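Exported variables stay active for every later command in the shell. When you want the proxy for one command only, the shell lets you prefix that command with the variable assignments. The sketch below wraps this in a hypothetical helper, with_proxy; because running curl itself would need a live proxy, the demonstration runs sh as a stand-in that prints what curl would see.

```shell
# with_proxy PROXY_URL COMMAND [ARGS...]
# Hypothetical helper: runs one external command (such as curl) with
# http_proxy and https_proxy set, leaving the current shell unchanged.
with_proxy() {
  _proxy_url=$1
  shift
  # Prefix assignments go only into the environment of this one command.
  # Lowercase names are used because curl ignores an uppercase HTTP_PROXY.
  http_proxy=$_proxy_url https_proxy=$_proxy_url "$@"
}

# Stand-in for curl: print the variable the command receives
with_proxy "http://testuser:test123@proxy.example.com:3128" sh -c 'echo "$https_proxy"'
# → http://testuser:test123@proxy.example.com:3128
```

With a real proxy, the call would be with_proxy "http://testuser:test123@proxy.example.com:3128" curl -v https://www.reddit.com.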
Important tricks and tips
In this section, we share some useful tricks and tips for using proxies with curl in the way that best fits your use case.
How to always use proxies for curl
If you want the proxy to be used only by curl, rather than by every tool that reads the proxy environment variables, add it to curl's configuration file, .curlrc, in your home directory:
One: Open the file in a text editor:
$ cd ~
$ nano .curlrc
Two: Add this line to the file:
proxy=http://user:password@Proxy_IP_address_or_FQDN:port
For example:
proxy=http://testuser:test123@proxy.example.com:3128
Three: Save the file and run curl as usual:
$ curl "https://www.reddit.com"
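Because .curlrc stores the proxy credentials in plain text, it is worth making the file readable only by you. The following sketch assumes a POSIX shell; the helper name write_curlrc and the throwaway directory are our own choices for illustration. It writes the same proxy line into a temporary directory instead of your real home directory, so an existing ~/.curlrc is left alone, and relies on curl checking the CURL_HOME variable before HOME when it looks for .curlrc.

```shell
# write_curlrc DIR PROXY_URL
# Hypothetical helper: writes DIR/.curlrc (overwriting any existing file there)
# so that curl routes every request through PROXY_URL.
write_curlrc() {
  (
    umask 077   # applies only inside this subshell: a new file gets mode 600
    printf 'proxy = "%s"\n' "$2" > "$1/.curlrc"
  )
}

demo_dir=$(mktemp -d)   # throwaway directory instead of your real home directory
write_curlrc "$demo_dir" "http://testuser:test123@proxy.example.com:3128"
cat "$demo_dir/.curlrc"
# → proxy = "http://testuser:test123@proxy.example.com:3128"

# curl looks for .curlrc in $CURL_HOME before $HOME, so this would use the proxy:
#   CURL_HOME="$demo_dir" curl https://www.reddit.com
# Passing -q (--disable) as the very first option skips .curlrc for one run:
#   curl -q https://www.reddit.com
```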
Turning proxies on and off
To switch proxies on and off quickly, add aliases like these to your ~/.bashrc file:
alias proxyon="export http_proxy='http://user:password@Proxy_IP_Or_FQDN:Port';export https_proxy='http://user:password@Proxy_IP_Or_FQDN:Port'"
alias proxyoff="unset http_proxy;unset https_proxy"
For example:
alias proxyon="export http_proxy='http://testuser:test123@proxy.example.com:3128';export https_proxy='http://testuser:test123@proxy.example.com:3128'"
Now, save the .bashrc file and reload it in the current shell:
$ source ~/.bashrc
Then run the alias command in the terminal to check that proxyon and proxyoff are defined.
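Aliases work, but they hard-code the proxy address. As an alternative sketch, shell functions can take the proxy URL as an argument; proxystatus is a hypothetical extra helper for checking the current state. You would add these functions to ~/.bashrc in place of the aliases and reload the file with source ~/.bashrc.

```shell
# proxyon PROXY_URL: route curl (and other tools that read these variables) through a proxy.
# Without an argument it prints a usage error (and exits a non-interactive shell).
proxyon() {
  export http_proxy="${1:?usage: proxyon PROXY_URL}"
  export https_proxy="$http_proxy"
}

# proxyoff: remove the proxy variables from the current shell
proxyoff() {
  unset http_proxy https_proxy
}

# proxystatus: hypothetical helper that shows what is currently set
proxystatus() {
  echo "http_proxy=${http_proxy:-<unset>}"
  echo "https_proxy=${https_proxy:-<unset>}"
}

proxyon "http://testuser:test123@proxy.example.com:3128"
proxystatus
proxyoff
proxystatus
```

Unlike the aliases, the same proxyon works for any proxy, for example proxyon "socks5h://proxy.example.com:1080".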
Bypass SSL certificate errors
When curl encounters an SSL certificate error, it aborts the request. When debugging, especially in a one-off case, you can skip certificate verification by adding -k or --insecure to the curl command line, as follows. If the certificate error comes from an HTTPS proxy rather than the target site, the equivalent option is --proxy-insecure.
curl -x "[protocol://][host][:port]" -k [URL]
Getting more information about the request
In some cases, your requests won’t work as expected, and you will want to diagnose the request path, headers, and errors.
To investigate the request, add -v (--verbose) to the curl command. curl then prints the connection steps it takes, including the proxy it connects to, along with the request and response headers.
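The verbose output mixes connection details with headers. When you only care about the proxy hop, you can filter it. The sketch below is a hypothetical filter, proxy_lines, that keeps lines about connection attempts and CONNECT tunnels. The exact wording of curl's -v messages varies between curl versions, so treat the patterns as assumptions to adjust against your own output.

```shell
# proxy_lines: hypothetical filter for `curl -v` output that keeps proxy-related lines.
# The message patterns are assumptions based on typical curl -v output.
proxy_lines() {
  grep -E '^\* +(Trying|Connected to|Uses proxy|Establish HTTP proxy tunnel)|^> CONNECT '
}

# Typical use (needs a reachable proxy):
#   curl -sv -o /dev/null https://www.reddit.com 2>&1 | proxy_lines
# -s hides the progress meter, -o /dev/null discards the body,
# and 2>&1 sends the verbose log (written to stderr) into the pipe.

# Offline demonstration with sample lines; only the first three are kept
printf '%s\n' \
  '*   Trying 203.0.113.10:3128...' \
  '* Connected to proxy.example.com (203.0.113.10) port 3128' \
  '> CONNECT www.reddit.com:443 HTTP/1.1' \
  '< HTTP/1.1 200 Connection established' \
  '> GET / HTTP/2' | proxy_lines
```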
Override or ignore proxies for a single request
To use a different proxy for a specific request, pass it explicitly with --proxy. The command-line option takes precedence over the proxy environment variables:
curl --proxy "http://user:password@Proxy_FQDN_Or_IPAddress" "https://reddit.com"
To bypass proxies altogether, use --noproxy "*":
$ curl --noproxy "*" https://www.reddit.com
Adding -v to this command confirms that the connection goes directly to Reddit without using any proxy.
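Besides the --noproxy option, curl also honors the no_proxy (or NO_PROXY) environment variable: a comma-separated list of hosts that should be reached directly. Each entry matches the host itself and its subdomains, and a single * disables the proxy for all hosts. The hypothetical helper set_no_proxy below builds that list from its arguments, as a minimal sketch for a POSIX shell.

```shell
# set_no_proxy HOST...
# Hypothetical helper: exports no_proxy as a comma-separated host list.
set_no_proxy() {
  _list=
  for _host in "$@"; do
    _list=${_list:+$_list,}$_host   # add a comma before every entry except the first
  done
  export no_proxy="$_list"
}

set_no_proxy localhost 127.0.0.1 example.com
echo "$no_proxy"
# → localhost,127.0.0.1,example.com

# The same list also works for a single command:
#   curl --noproxy "localhost,127.0.0.1,example.com" https://www.example.com
```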
Using SOCKS proxies
To use any kind of SOCKS proxy (SOCKS4, SOCKS4a, SOCKS5, or SOCKS5h), the command structure stays the same as before; you only swap the scheme in the proxy URL for the relevant SOCKS type:
curl -x "socks5://user:password@Proxy_IP_or_FQDN:Port" https://www.reddit.com
For example:
$ curl -x "socks5://testuser:test123@proxy.example.com:3128" https://www.reddit.com
Pro tip: If you do not specify a scheme, curl treats the proxy as an HTTP proxy, so always include socks4://, socks4a://, socks5://, or socks5h:// when you use a SOCKS proxy.
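Because the scheme decides how curl talks to the proxy, it can help to validate the SOCKS type before building the URL. In the hypothetical helper socks_proxy_url below, socks5h and socks4a ask the proxy to resolve the target hostname, while socks5 and socks4 resolve it locally first, which matters when you want DNS lookups to happen from the proxy's location. This is a sketch for a POSIX shell, not a curl feature.

```shell
# socks_proxy_url TYPE HOST PORT [USER PASSWORD]
# Hypothetical helper: prints a SOCKS proxy URL for curl's -x option,
# rejecting anything that is not one of curl's SOCKS schemes.
socks_proxy_url() {
  case $1 in
    socks4|socks4a|socks5|socks5h) ;;   # socks4a/socks5h: the proxy resolves hostnames
    *)
      echo "unsupported SOCKS type: $1" >&2
      return 1
      ;;
  esac
  if [ -n "${4:-}" ]; then
    printf '%s://%s:%s@%s:%s\n' "$1" "$4" "${5:-}" "$2" "$3"
  else
    printf '%s://%s:%s\n' "$1" "$2" "$3"
  fi
}

socks_proxy_url socks5h proxy.example.com 1080 testuser test123
# → socks5h://testuser:test123@proxy.example.com:1080
```

A typical call would then be curl -x "$(socks_proxy_url socks5h proxy.example.com 1080 testuser test123)" https://www.reddit.com.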
The bottom line
When using curl with proxies, there are many technical decisions to make, but the most important one along the way is choosing a reputable proxy provider. Bright Data offers all of the above-mentioned proxy types, performs real-time network monitoring, and follows a zero IP address reselling policy.
Additionally, Bright Data has one of the largest residential peer networks, enabling data collection from a local user’s perspective. This is especially valuable for companies looking for US-based IPs, making Bright Data a popular choice among business professionals and developers alike.