Web Scraping With Laravel: A Step-By-Step Guide

Learn how to perform web scraping using Laravel, from setting up a scraping API to utilizing powerful libraries, while adhering to best practices for ethical scraping.

In this tutorial, you will explore web scraping in Laravel and learn:

  • Why Laravel is a great technology for web scraping
  • What the best Laravel scraping libraries are
  • How to build a Laravel web scraping API from scratch

Let’s dive in!

Is It Possible to Perform Web Scraping in Laravel?

TL;DR: Yes, Laravel is a viable technology for web scraping.

Laravel is a powerful PHP framework known for its elegant and expressive syntax. In particular, it enables you to create APIs for scraping data from the Web on the fly. This is possible thanks to the support of many scraping libraries, which simplify the process of getting data from pages. For more guidance, check out our article on web scraping in PHP.

Laravel is an excellent choice for web scraping due to its scalability, easy integration with other tools, and extensive community support. Its strong MVC architecture helps keep your scraping logic well-organized and maintainable. That comes in handy when building complex or large-scale scraping projects.

Best Laravel Web Scraping Libraries

These are the best libraries to do web scraping with Laravel:

  • BrowserKit: Part of the Symfony framework, it simulates the API of a web browser for interacting with HTML documents. It relies on DomCrawler to navigate and scrape HTML documents. This library is ideal for extracting data from static pages in PHP.
  • HttpClient: A Symfony component to send HTTP requests. It integrates seamlessly with BrowserKit.
  • Guzzle: A robust HTTP client to send web requests to servers and handle responses efficiently. It is helpful for retrieving the HTML documents associated with web pages. Learn how to set up a proxy in Guzzle.
  • Panther: A Symfony component that provides a headless browser for web scraping. It allows you to interact with dynamic sites that require JavaScript for rendering or interaction.
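To get a feel for how these components fit together, the snippet below uses DomCrawler (the parsing engine behind BrowserKit) to extract text from an inline HTML string. This is a minimal sketch: the sample HTML and the `.text` selector are invented for illustration, and CSS-selector support requires the `symfony/css-selector` package alongside `symfony/dom-crawler`:

```php
<?php
// composer require symfony/dom-crawler symfony/css-selector
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// a hypothetical HTML fragment, just for illustration
$html = '<div class="quote"><span class="text">Carpe diem</span></div>';

// parse the HTML and select the .text node with a CSS selector
$crawler = new Crawler($html);
$text = $crawler->filter('.text')->text();

echo $text; // Carpe diem
```

The same `Crawler` API is what BrowserKit hands back after downloading a page, which is why the rest of this tutorial revolves around it.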

Prerequisites

To follow this tutorial for web scraping in Laravel, you need to meet the following prerequisites:

  • PHP 8+ installed on your machine
  • Composer installed

An IDE to code in PHP is also recommended. Visual Studio Code with the PHP extension or WebStorm are both great solutions.
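You can quickly verify both tools from the terminal. The exact version numbers in the output will differ on your machine:

```shell
# check the installed PHP version (recent Laravel releases require PHP 8.1+)
php -v

# check that Composer is available
composer --version
```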

How to Build a Web Scraping API in Laravel

In this step-by-step section, you will see how to build a Laravel web scraping API. The target site will be the Quotes scraping sandbox site, and the scraping endpoint will:

  1. Select the quote HTML elements from the page
  2. Extract data from them
  3. Return the scraped data in JSON

This is what the target site looks like:

Quotes to scrape page

Follow the instructions below and learn how to perform web scraping in Laravel!

Step 1: Set up a Laravel project

Open the terminal. Then, launch the Composer create-project command below to initialize your Laravel web scraping application:

composer create-project laravel/laravel laravel-scraper

The laravel-scraper folder will now contain a blank Laravel project. Load it in your favorite PHP IDE.

This is the file structure of your current backend:

file structure in the backend

Wonderful! You now have a Laravel project in place.

Step 2: Initialize Your Scraping API

Launch the Artisan command below in the project directory to add a new Laravel controller:

php artisan make:controller ScrapingController

This will create the following ScrapingController.php file in the /app/Http/Controllers directory:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

class ScrapingController extends Controller
{
    //
}

In the ScrapingController class, add the following scrapeQuotes() method:

public function scrapeQuotes(): JsonResponse
{
    // scraping logic...

    return response()->json('Hello, World!');
}

Currently, the method returns a placeholder 'Hello, World!' JSON message. Soon, it will contain some scraping logic in Laravel.

Do not forget to add the following import:

use Illuminate\Http\JsonResponse;

Associate the scrapeQuotes() method to a dedicated endpoint by adding the following lines to routes/api.php:

use App\Http\Controllers\ScrapingController;

Route::get('/v1/scraping/scrape-quotes', [ScrapingController::class, 'scrapeQuotes']);

Great! Time to verify that the Laravel scraping API works as desired. Keep in mind that Laravel APIs are available under the /api path. So, the complete API endpoint is /api/v1/scraping/scrape-quotes.

Launch your Laravel application with the following command:

php artisan serve

Your server should now be listening locally on port 8000.

Use cURL to make a GET request to the /api/v1/scraping/scrape-quotes endpoint:

curl -X GET 'http://localhost:8000/api/v1/scraping/scrape-quotes'

Note: On Windows, replace curl with curl.exe. Learn more in our cURL for web scraping guide.

You should get the following response:

"Hello, World!"

Fantastic! The sample scraping API works like a charm. It is time to define some scraping logic with Laravel.

Step 3: Install the scraping libraries

Before installing any packages, you need to determine which Laravel web scraping libraries best fit your needs. To do so, open the target site in your browser. Right-click on the page and select “Inspect” to open the Developer Tools. Then, go to the “Network” tab, reload the page, and access the “Fetch/XHR” section:

accessing the 'Fetch XHR' section

As you can tell, the webpage does not perform any AJAX requests. This means it does not dynamically load data on the client side. Thus, it is a static page with all the data embedded in the initial HTML document.

Since the page is static, you do not need a headless browser library to scrape it. Although you could still use a browser automation tool, that would only introduce unnecessary overhead. The recommended approach is to use the BrowserKit and HttpClient components from Symfony.
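For reference, if the target page did render its data with JavaScript, Panther would be the tool to reach for. The sketch below is a hypothetical example, assuming Chrome and a compatible driver are available locally; it targets the JavaScript-rendered variant of the same sandbox site:

```php
<?php
// composer require symfony/panther
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Panther\Client;

// launch a headless Chrome instance controlled by Panther
$client = Client::createChromeClient();

// navigate to the JS-rendered page and wait for the quotes to appear
$crawler = $client->request('GET', 'https://quotes.toscrape.com/js/');
$client->waitFor('.quote');

// from here on, extraction works just like with BrowserKit
echo $crawler->filter('.text')->first()->text();
```

Since the page in this tutorial is static, none of this machinery is needed, and the lighter BrowserKit approach below is preferable.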

Add the symfony/browser-kit and symfony/http-client components to your project’s dependencies with:

composer require symfony/browser-kit symfony/http-client

Well done! You now have everything required to perform data scraping in Laravel.

Step 4: Download the target page

Import BrowserKit and HttpClient in ScrapingController:

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

In scrapeQuotes(), initialize a new HttpBrowser object:

$browser = new HttpBrowser(HttpClient::create());

This enables you to make HTTP requests by simulating browser behavior. At the same time, remember that it does not execute requests in a real browser. HttpBrowser just provides features similar to those of a browser, such as cookie and session handling.

Use the request() method to perform an HTTP GET request to the URL of the target page:

$crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

The result will be a Crawler object, which automatically parses the HTML document returned by the server. This class also provides node selection and data extraction capabilities.

You can verify that the above logic works by extracting the HTML of the page from the crawler:

$html = $crawler->outerHtml();

For testing, make your API return this data.

Your scrapeQuotes() function will now look as follows:

public function scrapeQuotes(): JsonResponse
{
    // initialize a browser-like HTTP client
    $browser = new HttpBrowser(HttpClient::create());

    // download and parse the HTML of the target page
    $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

    // get the page outer HTML and return it
    $html = $crawler->outerHtml();
    return response()->json($html);
}

Amazing! Your API will now return:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Quotes to Scrape</title>
    <link rel="stylesheet" href="/static/bootstrap.min.css">
    <link rel="stylesheet" href="/static/main.css">
</head>
<!-- omitted for brevity ... -->

Step 5: Inspect the page content

To define the data extraction logic, it is essential to examine the HTML structure of the target page.

So, open Quotes To Scrape in your browser. Then, right-click on a quote HTML element and select the “Inspect” option. In the DevTools of your browser, expand the HTML and start studying it:

Inspecting the quote elements

Here, notice that each quote card is a .quote HTML node that contains:

  1. A .text element with the quote text
  2. A .author node with the name of the author
  3. Many .tag elements, each displaying a single tag

With the above CSS selectors, you have everything you need to perform web scraping in Laravel. Use these selectors to target the DOM elements of interest and extract data from them in the next steps!

Step 6: Get ready to perform web scraping

Since the target page contains several quotes, create a data structure where to store the scraped data. An array will be ideal:

$quotes = [];

Then, use the filter() method from the Crawler class to select all quote elements:

$quote_html_elements = $crawler->filter('.quote');

This returns all DOM nodes on the page that match the specified .quote CSS selector.

Next, iterate over them and get ready to apply the data extraction logic on each of them:

foreach ($quote_html_elements as $quote_html_element) {
    // create a new quote crawler
    $quote_crawler = new Crawler($quote_html_element);

    // scraping logic...
}

Note that the DOMNode objects returned by filter() do not provide methods for node selection. So, you need to create a local Crawler instance limited to your specific HTML quote element.

For the above code to work, add the following import:

use Symfony\Component\DomCrawler\Crawler;

You do not need to manually install the DomCrawler package. That is because it is a direct dependency of the BrowserKit component.

Huge! You are one step closer to your Laravel web scraping goal.

Step 7: Implement data scraping

Inside the foreach loop:

  1. Extract the data of interest from the .text, .author, and .tag elements
  2. Populate a new $quote object with them
  3. Add the new $quote object to $quotes

First, select the .text element inside the HTML quote element. Then, use the text() method to extract the inner text from it:

$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();

Note that each quote is enclosed by the \u201c and \u201d special characters. You can remove them using the str_replace() PHP function as follows:

$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

Similarly, scrape the author info with:

$author_html_element = $quote_crawler->filter('.author');
$author = $author_html_element->text();

Scraping the tags can be a bit more challenging. Since a single quote can have multiple tags, you need to define an array and scrape each tag individually:

$tag_html_elements = $quote_crawler->filter('.tag');

$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}

Note that the DOMNode elements returned by filter() do not expose the text() method. Instead, they provide the equivalent textContent attribute.

This is what the entire Laravel data scraping logic will look like:

// create a new quote crawler
$quote_crawler = new Crawler($quote_html_element);

// perform the data extraction logic
$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();
// remove special characters from the raw text information
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

$author_html_element = $quote_crawler->filter('.author');
$author = $author_html_element->text();

$tag_html_elements = $quote_crawler->filter('.tag');

$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}

Here we go! You are close to the final goal.

Step 8: Return the scraped data

Create a $quote object with the scraped data and add it to $quotes:

$quote = [
    'text' => $text,
    'author' => $author,
    'tags' => $tags
];

$quotes[] = $quote;

Next, update the API response data with the $quotes list:

return response()->json(['quotes' => $quotes]);

At the end of the scraping loop, $quotes will contain:

array(10) {
  [0]=>
  array(3) {
    ["text"]=>
    string(113) "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
    ["author"]=>
    string(15) "Albert Einstein"
    ["tags"]=>
    array(4) {
      [0]=>
      string(6) "change"
      [1]=>
      string(13) "deep-thoughts"
      [2]=>
      string(8) "thinking"
      [3]=>
      string(5) "world"
    }
  }
  // omitted for brevity...
  [9]=>
  array(3) {
    ["text"]=>
    string(48) "A day without sunshine is like, you know, night."
    ["author"]=>
    string(12) "Steve Martin"
    ["tags"]=>
    array(3) {
      [0]=>
      string(5) "humor"
      [1]=>
      string(7) "obvious"
      [2]=>
      string(6) "simile"
    }
  }
}

Super! This data will then be serialized into JSON and returned by the Laravel scraping API.

Step 9: Put it all together

Here is the final code of the ScrapingController file in Laravel:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpClient\HttpClient;

class ScrapingController extends Controller
{
    public function scrapeQuotes(): JsonResponse
    {
        // initialize a browser-like HTTP client
        $browser = new HttpBrowser(HttpClient::create());

        // download and parse the HTML of the target page
        $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

        // where to store the scraped data
        $quotes = [];

        // select all quote HTML elements on the page
        $quote_html_elements = $crawler->filter('.quote');

        // iterate over each quote HTML element and apply
        // the scraping logic
        foreach ($quote_html_elements as $quote_html_element) {
            // create a new quote crawler
            $quote_crawler = new Crawler($quote_html_element);

            // perform the data extraction logic
            $text_html_element = $quote_crawler->filter('.text');
            $raw_text = $text_html_element->text();
            // remove special characters from the raw text information
            $text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

            $author_html_element = $quote_crawler->filter('.author');
            $author = $author_html_element->text();

            $tag_html_elements = $quote_crawler->filter('.tag');
            $tags = [];
            foreach ($tag_html_elements as $tag_html_element) {
                $tag = $tag_html_element->textContent;
                $tags[] = $tag;
            }

            // create a new quote object
            // with the scraped data
            $quote = [
                'text' => $text,
                'author' => $author,
                'tags' => $tags
            ];

            // add the quote object to the quotes array
            $quotes[] = $quote;
        }

        return response()->json(['quotes' => $quotes]);
    }
}

Time to test it!

Start your Laravel server:

php artisan serve

Then, make a GET request to the /api/v1/scraping/scrape-quotes endpoint:

curl -X GET 'http://localhost:8000/api/v1/scraping/scrape-quotes'

You will get the following result:

{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein",
      "tags": [
        "change",
        "deep-thoughts",
        "thinking",
        "world"
      ]
    },
    // omitted for brevity...
    {
      "text": "A day without sunshine is like, you know, night.",
      "author": "Steve Martin",
      "tags": [
        "humor",
        "obvious",
        "simile"
      ]
    }
  ]
}

Et voilà! In less than 100 lines of code, you just performed web scraping in Laravel.

Next steps

The API you built here is just a basic example of what you can achieve with Laravel when it comes to web scraping. To take your project to the next level, consider the following improvements:

  • Implement web crawling: The target site contains several quotes spread across multiple pages. This is a common scenario requiring web crawling for complete data retrieval. Read our article on the definition of a web crawler.
  • Schedule your scraping task: Add a scheduler to call your API at regular intervals, store the data in a database, and ensure you always have fresh data.
  • Integrate a proxy: Making multiple requests from the same IP can lead to being blocked by anti-scraping measures. To avoid that, consider integrating residential proxies into your PHP scraper.

Keep Your Laravel Web Scraping Operation Ethical and Respectful

Web scraping is an effective way to gather valuable data for various purposes. However, the goal is to retrieve data responsibly, not to harm the target site. Thus, it is important to approach scraping with the right precautions.

Follow these tips to ensure responsible Laravel web scraping:

  • Check and comply with the site’s Terms of Service: Before scraping a site, review its Terms of Service. These often include information on copyright, intellectual property rights, and guidelines for using their data.
  • Respect the robots.txt file: The robots.txt file of a site defines the rules for how automated crawlers should access its pages. To maintain ethical practices, adhere to these guidelines. Discover more in our robots.txt for web scraping guide.
  • Target only publicly available information: Focus on data that is publicly accessible. Avoid scraping pages protected by login credentials or other forms of authorization. Targeting private or sensitive data without proper permission is unethical and may lead to legal consequences.
  • Limit the frequency of your requests: Making too many requests in a short period can overload the server, affecting the site’s performance for all users. That might also trigger rate limiting measures and get you blocked. Avoid flooding the target server by adding random delays between your requests.
  • Rely on trustworthy and up-to-date scraping tools: Prefer reputable providers and opt for tools that are well-maintained and regularly updated. This ensures they align with the latest ethical Laravel scraping practices. If you are unsure, check out our article about how to choose the best web scraping service.
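As a concrete example of rate limiting, a scraping loop can pause for a random interval between consecutive requests. The bounds below are arbitrary values chosen for illustration:

```php
<?php

// return a random delay in milliseconds within the given bounds
function randomDelayMs(int $min = 1000, int $max = 3000): int
{
    return random_int($min, $max);
}

// between two consecutive requests:
$delay = randomDelayMs();
usleep($delay * 1000); // usleep() takes microseconds
```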

Conclusion

In this guide, you saw why Laravel is a good framework for building web scraping APIs. You also had the opportunity to explore some of its best scraping libraries. Then, you learned how to create a Laravel web scraping API that extracts data from a target page on the fly. As you saw, web scraping with Laravel is simple and takes only a few lines of code.

The problem is that most sites protect their data with anti-bot and anti-scraping solutions. These technologies can detect and block your automated requests. Fortunately, Bright Data has a set of solutions for making scraping easy:

  • Scraping Browser: A cloud-based controllable browser that offers JavaScript rendering capabilities while handling CAPTCHAs, browser fingerprinting, automated retries, and more for you. It integrates with the most popular automation browser libraries, such as Playwright and Puppeteer.
  • Web Unlocker: An unlocking API that can seamlessly return the clean HTML of any page, circumventing any anti-scraping measures.
  • Web Scraping APIs: Endpoints for programmatic access to structured web data from dozens of popular domains.

Don’t want to deal with web scraping but are still interested in online data? Explore Bright Data’s ready-to-use datasets!
