Web Scraping With Scala and jsoup in 2025

Learn how to scrape web pages with Scala and jsoup: set up a project, select page elements with CSS selectors, and extract their text and attributes.

Python and JavaScript dominate web scraping. If you need performance or portability, though, Scala offers a strong alternative: it gives us a compiled, portable, and strongly typed foundation to work with.

Today, we’re going over how to scrape the web using Scala and jsoup, a Java library for parsing HTML. Scala isn’t written about as often as Python for scraping, but it provides a strong foundation and capable scraping tools.

Why Scala?

There are quite a few reasons you might choose Scala over Python or JavaScript.

  • Performance: Scala compiles to JVM (Java Virtual Machine) bytecode, which the JVM’s just-in-time compiler turns into native machine code at runtime. For CPU-bound work such as parsing large amounts of HTML, this is usually considerably faster than Python, although network latency often dominates a scraper’s total run time.
  • Static Typing: Type checking offers an additional layer of safety. Many common bugs are caught before the program even runs.
  • Portability: The same compiled bytecode runs on any platform with a compatible JVM installed.
  • Fully Compatible With Java: You can use Java dependencies in your Scala code. This greatly broadens the ecosystem available to you.
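As a small illustration of Java interop (a sketch, not part of the original tutorial), the code below builds a java.util.ArrayList, the same kind of collection jsoup returns, and converts it with scala.jdk.CollectionConverters, the import our scraper uses later. The helper names toScalaList, javaAuthors, and upperAuthors are our own, chosen for illustration.

```scala
import java.util.{ArrayList => JArrayList}
import scala.jdk.CollectionConverters._

// Hypothetical helper: view a Java list as a Scala collection, then copy it into an immutable List
def toScalaList(items: java.util.List[String]): List[String] =
  items.asScala.toList

// A plain Java collection, built with Java's own API
def javaAuthors(): JArrayList[String] =
  val list = new JArrayList[String]()
  list.add("Albert Einstein")
  list.add("Jane Austen")
  list

// After conversion, Scala collection methods apply and the compiler knows each element is a String
def upperAuthors(): List[String] =
  toScalaList(javaAuthors()).map(_.toUpperCase)
```

No wrapper library is needed: the Java class is imported and used directly, and the converter bridges it to Scala’s collections.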

Getting Started

Before you get started, you need to make sure you’ve got Scala installed. We’ve got instructions below for Ubuntu, macOS, and Windows.

You can view the full installation documentation on the official Scala website.

Ubuntu

curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup

macOS

brew install coursier && coursier setup

Windows

Download the Scala installer for Windows.

Creating a Scraper

From the folder where you want the project to live, generate a new Scala 3 project from the official template. When sbt prompts for a name, enter quote-scraper. sbt creates a quote-scraper folder containing a build.sbt file to hold our dependencies and a starter src/main/scala/Main.scala file. Then cd into the new folder.

sbt new scala/scala3.g8
cd quote-scraper

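Now, open build.sbt and add jsoup as a dependency. Because jsoup is a Java library, it uses a single % rather than %%, which appends a Scala version suffix to the artifact name. Your complete build file should look like this.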

val scala3Version = "3.6.3"

lazy val root = project
  .in(file("."))
  .settings(
    name := "quote-scraper",
    version := "0.1.0-SNAPSHOT",

    scalaVersion := scala3Version,

    libraryDependencies += "org.scalameta" %% "munit" % "1.0.0" % Test,

    libraryDependencies += "org.jsoup" % "jsoup" % "1.18.3"
  )

Next, replace the contents of src/main/scala/Main.scala with the code below.

import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

@main def QuotesScraper(): Unit =
  val url = "http://quotes.toscrape.com"

  try
    val document = Jsoup.connect(url).get()
    //find all objects on the page with the quote class
    val quotes = document.select(".quote")

    for quote <- quotes.asScala do
      //find objects with the class "text" and return their text
      val text = quote.select(".text").text()
      //find objects with the class "author" and return their text
      val author = quote.select(".author").text()
      println(s"Quote: $text")
      println(s"Author: $author")
      println("-" * 50)

  catch case e: Exception => println(s"Error: ${e.getMessage}")

Running the Scraper

To run the scraper, execute the following command from the root of the project.

sbt run

You should see output similar to the following.

Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
--------------------------------------------------
Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
--------------------------------------------------
Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
--------------------------------------------------
Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
--------------------------------------------------
Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
--------------------------------------------------
Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
--------------------------------------------------
Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
--------------------------------------------------
Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
--------------------------------------------------
Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
--------------------------------------------------
Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
--------------------------------------------------
[success] Total time: 6 s, completed Feb 18, 2025, 8:58:04 PM
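If you would rather collect typed results than print lines, one option is to parse each quote into a case class and let the compiler check how the data is used. This is a sketch under our own assumptions: Quote and parseQuotes are hypothetical names, and it calls Jsoup.parse on a hand-written HTML string (a trimmed stand-in for the real page) so it runs without network access.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.jdk.CollectionConverters._

// Hypothetical typed record for one scraped quote
final case class Quote(text: String, author: String)

// Convert every ".quote" element in a parsed document into a Quote
def parseQuotes(document: Document): List[Quote] =
  document.select(".quote").asScala.toList.map { quote =>
    Quote(
      text = quote.select(".text").text(),
      author = quote.select(".author").text()
    )
  }

// Hand-written sample markup, loosely shaped like the quote listing on quotes.toscrape.com
val sampleHtml =
  """<div class="quote">
    |  <span class="text">“A day without sunshine is like, you know, night.”</span>
    |  <small class="author">Steve Martin</small>
    |</div>
    |<div class="quote">
    |  <span class="text">“I have not failed. I've just found 10,000 ways that won't work.”</span>
    |  <small class="author">Thomas A. Edison</small>
    |</div>""".stripMargin
```

In the live scraper you could call parseQuotes(Jsoup.connect(url).get()) and then print, filter, or store the resulting List[Quote].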

Selection With jsoup

To find page elements with jsoup, we use the select() method. select() takes a CSS selector and returns an Elements object, a list of every element that matches. Let’s look at how this works in our Quote Scraper project.

In this line, we use document.select(".quote") to return all page elements with a class of quote.

val quotes = document.select(".quote")

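We can also write selectors in the form element[attribute='some value'], which filters on both a tag name and an exact attribute value.

The line below returns the same page objects on this site, but it is stricter than .quote: it only matches div elements whose class attribute is exactly quote, so an element with class="quote featured" would be skipped. To restrict the tag while still matching on class, use div.quote instead.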

val quotes = document.select("div[class='quote']")
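To see that the selectors are not interchangeable in general, the sketch below runs three of them against a small, made-up fragment in which one quote carries an extra class (featured, an assumption for illustration) and one non-div element also has the quote class. The countMatches helper is hypothetical.

```scala
import org.jsoup.Jsoup

// Made-up markup: a plain quote div, a div with an extra class, and a non-div element
val fragment =
  """<div class="quote">A</div>
    |<div class="quote featured">B</div>
    |<p class="quote">C</p>""".stripMargin

// Count how many elements a CSS selector matches in the fragment
def countMatches(selector: String): Int =
  Jsoup.parse(fragment).select(selector).size()
```

Here .quote matches all three elements, div[class='quote'] matches only the first, and div.quote matches the two divs. On quotes.toscrape.com every quote div uses the class quote by itself, which is why both selectors in our scraper return the same elements.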

Let’s look at a couple of other calls to select() in our code. Each quote contains only one text element and one author, so each call returns a single element. If a quote element contained several text or author elements, select() would return all of them.

//find objects with the class "text" and return their text
val text = quote.select(".text").text()
//find objects with the class "author" and return their text
val author = quote.select(".author").text()
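When you expect exactly one match, jsoup’s selectFirst() returns the first matching Element, or null if nothing matches. A common Scala pattern, sketched below with a hypothetical firstText helper and made-up markup, wraps that result in Option so a missing author becomes None rather than a NullPointerException.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Element

// Hypothetical helper: text of the first match, or None when nothing matches
def firstText(parent: Element, selector: String): Option[String] =
  Option(parent.selectFirst(selector)).map(_.text())

// Made-up quote that has a text span but no author element
val quoteWithoutAuthor: Element =
  Jsoup.parse("""<div class="quote"><span class="text">Hello</span></div>""")
    .selectFirst(".quote")
```

By contrast, quoteWithoutAuthor.select(".author").text() simply returns an empty string, which can be harder to tell apart from a genuinely empty field.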

Extraction With jsoup

To extract data with jsoup, we can use the following methods:

  • text(): Extracts the text from a list of page elements. Most of what you scrape, such as the prices on a product page, appears on the page as text.
  • attr(): Extracts the value of a specific attribute. Attributes are key-value pairs inside an HTML tag, such as href in <a href="/tag/change/page/1/">. This method is commonly used to extract links from a website.

text()

We saw examples of this in our initial scraper. text() returns the text of the elements we call it on. If the selector below matched two authors, text() would join their text into a single string, separated by a space.

//find objects with the class "text" and return their text
val text = quote.select(".text").text()
//find objects with the class "author" and return their text
val author = quote.select(".author").text()
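To see the difference between combined and per-element text, the sketch below selects every .tag link inside a single made-up quote. Calling text() on the whole Elements collection joins the matches with spaces, while mapping over the collection keeps each value separate. The names quoteHtml, tagLinks, tagsCombined, and tagsSeparate are our own.

```scala
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// Made-up quote markup with three tag links
val quoteHtml =
  """<div class="quote">
    |  <a class="tag" href="/tag/change/page/1/">change</a>
    |  <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
    |  <a class="tag" href="/tag/thinking/page/1/">thinking</a>
    |</div>""".stripMargin

val tagLinks = Jsoup.parse(quoteHtml).select(".quote .tag")

// One String: Elements.text() concatenates each match's text with a space
val tagsCombined: String = tagLinks.text()

// One String per element, in document order
val tagsSeparate: List[String] = tagLinks.asScala.map(_.text()).toList
```

If you need to tell values apart later, the per-element list is usually the safer choice.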

attr()

The attr() method behaves differently from text(). Instead of combining results, it returns a single attribute value: when called on a list of elements, it reads the attribute from the first element that has it.

//find link elements with the class "tag" and extract the "href" from the first one
val firstTagLink = quote.select("a[class='tag']").attr("href")
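Because attr() on an Elements collection only reads the first matching element, you need a different call when you want every link. The sketch below uses jsoup’s eachAttr() to collect all href values, and the abs: prefix to resolve relative links against a base URI. The base URI passed to Jsoup.parse is an assumption standing in for the page URL that Jsoup.connect(url).get() records for you; the markup is made up.

```scala
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// Made-up quote markup with two relative tag links
val linkHtml =
  """<div class="quote">
    |  <a class="tag" href="/tag/change/page/1/">change</a>
    |  <a class="tag" href="/tag/thinking/page/1/">thinking</a>
    |</div>""".stripMargin

// The second argument is the base URI used to resolve relative links
val linkDoc = Jsoup.parse(linkHtml, "http://quotes.toscrape.com/")
val tagAnchors = linkDoc.select("a[class='tag']")

// attr() on Elements: the value from the first matching element only
val firstHref: String = tagAnchors.attr("href")

// eachAttr(): the value from every matching element that has the attribute
val allHrefs: List[String] = tagAnchors.eachAttr("href").asScala.toList

// "abs:" prefix: each href resolved against the base URI
val absoluteHrefs: List[String] = tagAnchors.eachAttr("abs:href").asScala.toList
```

Absolute URLs are handy if you plan to pass the links back into Jsoup.connect() to follow them.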

After adding this line to the loop, along with println(s"First Tag Link: $firstTagLink") after the author is printed, our output looks like this.

Quote: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Author: Albert Einstein
First Tag Link: /tag/change/page/1/
--------------------------------------------------
Quote: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
Author: J.K. Rowling
First Tag Link: /tag/abilities/page/1/
--------------------------------------------------
Quote: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
Author: Albert Einstein
First Tag Link: /tag/inspirational/page/1/
--------------------------------------------------
Quote: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
Author: Jane Austen
First Tag Link: /tag/aliteracy/page/1/
--------------------------------------------------
Quote: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
Author: Marilyn Monroe
First Tag Link: /tag/be-yourself/page/1/
--------------------------------------------------
Quote: “Try not to become a man of success. Rather become a man of value.”
Author: Albert Einstein
First Tag Link: /tag/adulthood/page/1/
--------------------------------------------------
Quote: “It is better to be hated for what you are than to be loved for what you are not.”
Author: André Gide
First Tag Link: /tag/life/page/1/
--------------------------------------------------
Quote: “I have not failed. I've just found 10,000 ways that won't work.”
Author: Thomas A. Edison
First Tag Link: /tag/edison/page/1/
--------------------------------------------------
Quote: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
Author: Eleanor Roosevelt
First Tag Link: /tag/misattributed-eleanor-roosevelt/page/1/
--------------------------------------------------
Quote: “A day without sunshine is like, you know, night.”
Author: Steve Martin
First Tag Link: /tag/humor/page/1/
--------------------------------------------------
[success] Total time: 3 s, completed Feb 18, 2025, 10:29:30 PM

Alternative Web Scraping Tools

  • Scraping Browser: A remote browser fully integrated with proxies that you can use from Playwright and Selenium.
  • Web Scraper APIs: Automate your scraping process by calling one of our APIs. When you call a scraper API, we scrape a site and send the data back to you.
  • No Code Scraper: Tell us what site you want to scrape and which data you want. We’ll handle the rest.
  • Datasets: Our datasets are perhaps the easiest of any extraction method. We scrape hundreds of sites and update our databases all the time. Datasets give you a clean set of data that’s ready for analysis.

Conclusion

Web scraping with Scala is fairly intuitive. You learned how to select page elements and extract their data using jsoup. If scraping isn’t your thing, you can use one of our automated tools to handle the process for you, or skip scraping entirely with our ready-to-use datasets.
