find() and find_all() are essential methods for web scraping with BeautifulSoup, helping you extract data from HTML. The find() method retrieves the first element matching your criteria, such as find("div") to get the first div on a page, returning None if no match is found. Meanwhile, find_all() finds all matching elements and returns them as a list, making it perfect for extracting multiple elements like all div tags. Before starting your web scraping journey with BeautifulSoup, ensure you have both Requests and BeautifulSoup installed.
Install dependencies
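If you haven't installed them yet, both packages are available from pip:

```
pip install requests beautifulsoup4
```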
find()
Let's get acquainted with find(). In the examples below, we'll use Quotes to Scrape and the Fake Store API for finding elements on the page. Both of these sites were built for scraping. They don't change much, so they're perfect for learning.
Find by Class
To find an element using its class, we use the class_ keyword. You might wonder why class_ and not class? In Python, class is a keyword used for creating custom datatypes, and the underscore in class_ prevents that keyword from causing conflicts with our code.
The example below finds the first div with the class quote.
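Here's a minimal sketch of that example; it assumes the markup on Quotes to Scrape (https://quotes.toscrape.com) still looks the way it does today.

```python
import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# find() returns the first div with the class "quote", or None if there isn't one
first_quote = soup.find("div", class_="quote")
print(first_quote)
```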
When you run this, the full HTML of the first quote card prints to the terminal, including its text, author, and tags.
Find by ID
When scraping, you'll also commonly need to look for elements using their id. In the example below, we use the id argument to find the menu on the page.
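The lookup only depends on an element having an id of menu, so here's a self-contained sketch against a small stand-in snippet of HTML; the markup below is illustrative, not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup: a simple menu with an id, like the one on the example page
html = """
<ul id="menu">
    <li><a href="/">Home</a></li>
    <li><a href="/products">Products</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() with the id argument returns the first element carrying that id
menu = soup.find(id="menu")
print(menu)
```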
Run it, and the extracted menu prints to the terminal.
Find by Text
We can also search for items using their text. To do this, we use the string argument. The example below finds the Login button on the page.
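Here's one way that might look, assuming the Login link is still in the header of Quotes to Scrape.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# The string argument matches the element's text exactly
login_link = soup.find("a", string="Login")
print(login_link.text)
```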
As you can see, Login is printed to the console.
Find by Attribute
We can also use different attributes for more precise searching. This time, we once again find the first quote from the page. However, we look for a span with an itemprop of text. This once again finds our first quote, but without all the extra stuff, like the author and tags.
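A sketch of that lookup, again assuming the quote spans on Quotes to Scrape still carry an itemprop of text.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# attrs lets us match on any attribute, not just class or id
quote_text = soup.find("span", attrs={"itemprop": "text"})
print(quote_text.text)
```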
This gives us the clean version of our first quote.
Find Using Multiple Criteria
As you may have noticed earlier, the attrs argument takes a dict instead of a single value. This allows us to pass in multiple criteria for even better filtering. Here, we find the first author on the page using the class and itemprop attributes.
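Here's a minimal version, assuming the author names are still small elements carrying both attributes.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# Pass multiple attributes in a single attrs dictionary
author = soup.find("small", attrs={"class": "author", "itemprop": "author"})
print(author.text)
```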
When you run this, you should get Albert Einstein as output.
find_all()
Now, let's go through these same examples using find_all(). Once again, we'll use Quotes to Scrape and the Fake Store API. These examples are almost identical, but with one major difference: find() returns a single element, while find_all() returns a list of page elements.
Find by Class
To find elements using their class, we use the class_ keyword argument. The code below uses find_all() to extract each quote using its CSS class.
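The find_all() version might look like this, again run against Quotes to Scrape.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# find_all() returns a list of every matching element
quotes = soup.find_all("div", class_="quote")
for quote in quotes:
    print(quote)
```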
When we extract and print the first page of quotes, every quote card prints to the terminal in full.
Find by ID
As we talked about when using find(), id is another one of the more common attributes you might use to extract data from the page. To extract data using its id, we use the id argument… just like we did earlier. We then find all of the ul items with an id of menu. There's only one menu, so we'll actually only find one.
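As before, the sketch below keeps itself self-contained with a small stand-in snippet containing a ul whose id is menu.

```python
from bs4 import BeautifulSoup

# Stand-in markup: the same menu, searched with find_all() this time
html = """
<ul id="menu">
    <li><a href="/">Home</a></li>
    <li><a href="/products">Products</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, even though only one ul carries this id
menus = soup.find_all("ul", id="menu")
for menu in menus:
    print(menu)
```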
Since there is only one menu on the page, our output is exactly the same as it was when using find().
Find by Text
Now, we'll extract items from a page using their text. We'll use the string argument. In the example below, we find all a elements containing the string Login. Once again, there's only one.
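A minimal sketch of that search, assuming the Login link hasn't moved.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# Every a element whose text is exactly "Login"; there's only one on the page
login_links = soup.find_all("a", string="Login")
print(login_links)
```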
Your output should be a list containing just the single Login link.
Find by Attribute
When you move on to scraping in the wild, you'll often need to use other attributes to extract items from the page. Remember how messy the output from the first example was? In this next snippet, we'll use the itemprop attribute and extract only the quotes this time.
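Here's a sketch that pulls only the quote spans and prints their text.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# Only the spans that hold the quote text itself
quotes = soup.find_all("span", attrs={"itemprop": "text"})
for quote in quotes:
    print(quote.text)
```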
Look how clean our output is!
Find Using Multiple Criteria
This time, we'll use the attrs argument in a more complex way. Here, we find all small elements that have a class of author and an itemprop of author. We do this by passing both attributes into our attrs dictionary.
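A minimal version of that filter.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# Both attributes go into one attrs dictionary
authors = soup.find_all("small", attrs={"class": "author", "itemprop": "author"})
for author in authors:
    print(author.text)
```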
Run it, and our list of authors prints to the console.
Advanced Techniques
Here are some more advanced techniques. In the examples below, we use find_all(), but these techniques work just as well with find(). Just remember: do you want a single element, or a list of them?
Regex
Regex is a very powerful tool for string matching. In this code example, we combine it with the string argument to find all elements containing einstein, regardless of their capitalization.
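Here's a sketch that pairs find_all() with a compiled pattern; note that passing only string returns the matching text nodes rather than full tags.

```python
import re
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# A compiled pattern matches any text containing "einstein", case-insensitive
matches = soup.find_all(string=re.compile("einstein", re.IGNORECASE))
print(matches)
```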
There are 3 matches found on the page, one for each Einstein quote.
Custom Functions
Now, let's write a custom function to return all actual quotes from Einstein. In the example below, we expand on the regex. We traverse up through each match's parents to find the card containing the quote. Next, we find all the spans on that card. The first span contains the actual quote, so we print its contents to the console.
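Here's one way to put that together. It reuses the regex and climbs from each match with find_parent(), which walks up the parent chain to the surrounding quote card for us.

```python
import re
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

def print_einstein_quotes(soup):
    # Reuse the regex to find every text node mentioning Einstein
    for match in soup.find_all(string=re.compile("einstein", re.IGNORECASE)):
        # Climb from the matched author text up to the quote card that contains it
        card = match.find_parent("div", class_="quote")
        # The first span on the card holds the actual quote text
        spans = card.find_all("span")
        print(spans[0].text)

print_einstein_quotes(soup)
```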
All three of Einstein's quotes print to the console.
Bonus: Find Using CSS Selectors
BeautifulSoup's select method works almost exactly like find_all(), but it's a bit more flexible. This method takes in a CSS selector. If you can write a selector, you can find it. In this code, we find all of our authors using multiple attributes again. However, we can pass these in as a single selector.
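Here's a sketch using a single attribute selector in place of the attrs dictionary.

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")

# One selector covers the tag name, class, and itemprop at once
authors = soup.select('small.author[itemprop="author"]')
for author in authors:
    print(author.text)
```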
Once again, the list of authors prints to the console.
Conclusion
Now you know just about every aspect of find() and find_all() in BeautifulSoup. You don't need to master all of these methods; the large variety of find methods lets you choose what you're comfortable with. Most importantly, you can use them to extract data from any web page. In production, especially for fast and reliable results with a high success rate, you might want to consider our Residential Proxies or even Scraping Browser, which has a built-in proxy management system and CAPTCHA-solving capabilities.
Sign up and start your free trial today to find the perfect product for your needs.
No credit card required