How to Scrape Indeed With Python

Learn how to scrape job listings from Indeed using Python, handle anti-bot challenges, and streamline data collection efficiently.

In this article, you will discover:

  • What an Indeed scraper is and how it works
  • The types of data you can extract automatically from Indeed
  • How to build an Indeed scraping script using Python
  • When and why you might need a more advanced solution

Let’s get started!

What Is an Indeed Scraper?

An Indeed scraper automatically extracts job listings and related data from the Indeed website. It works by mimicking human interactions to navigate job search pages. After that, it identifies specific elements like job titles, companies, locations, and descriptions. Finally, the scraping bot extracts data from them and exports it for analysis.

Data You Can Find on Indeed

Indeed is a treasure trove of job-related data, which can be invaluable for market analysis, recruitment, or research purposes. Below is a list of the key data points you can scrape from it:

  • Job titles: The role or position advertised in the listing.
  • Company names: Details of the employer, including company profiles.
  • Locations: The city, state, or country where the job is based.
  • Job descriptions: Detailed information about the role, responsibilities, and requirements.
  • Salary ranges: Advertised pay scales (if available).
  • Job types: Full-time, part-time, contract, internship, etc.
  • Posting dates: When the job listing was published.
  • Tags and attributes: Keywords like “Urgently Hiring” or “Remote.”
  • Ratings and reviews: Employer ratings and employee feedback.
  • Application options: Indicators like “Easy Apply” availability.

If your focus is on job positions, follow our guide on how to scrape job postings.

How to Scrape Indeed: Step-By-Step Guide

In this tutorial section, you will see how to create an Indeed scraper. You will be guided through the process of building a Python script to scrape the Indeed “data scientist” job posting page:

The data scientist job posting page on Indeed

Follow the instructions and learn how to scrape Indeed!

Step #1: Project Setup

Before getting started, make sure you have Python 3 installed on your machine. If not, download and install it.

Now, run the command below in the terminal to create a directory for your project:

mkdir indeed_scraper

indeed_scraper will contain your Python Indeed scraper.

Enter it in the terminal, and initialize a virtual environment inside it:

cd indeed_scraper
python -m venv env

Next, load the project folder in your favorite Python IDE. Visual Studio Code with the Python extension and PyCharm Community Edition are both good options.

Create a scraper.py file in the project’s directory, which should now contain this file structure:

the new scraper.py file in the project directory

scraper.py will soon contain the desired scraping logic.

Time to activate the virtual environment in the IDE’s terminal. On Linux or macOS, do it with this command:

source env/bin/activate

Equivalently, on Windows, run:

env\Scripts\activate
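To double-check from Python itself that the virtual environment is active, you can compare sys.prefix with sys.base_prefix: inside an environment created with the venv module, the two values differ. A minimal sketch (in_virtualenv() is a hypothetical helper name):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv-created environment."""
    # venv points sys.prefix at the environment folder, while
    # sys.base_prefix keeps pointing at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```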

Wonderful! You have a Python environment for Indeed web scraping.

Step #2: Choose the Right Scraping Library

The next step is to determine whether Indeed relies on dynamic or static pages. To do so, open the Indeed target page in incognito mode with your browser and start playing with it. As you can easily tell, most data on the page is loaded dynamically:

An example showing that most of the data is loading dynamically

This means you need a browser automation tool like Selenium to scrape Indeed effectively. For more guidance on this process, read our guide on Selenium web scraping.

Selenium enables you to programmatically control a web browser to simulate user interactions and scrape content rendered by JavaScript. Time to install it and get started with it!

Step #3: Install and Configure Selenium

In an activated virtual environment, run the following command to install Selenium:

pip install -U selenium

Import Selenium in scraper.py and set up a WebDriver object:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Set up a controllable Chrome instance
driver = webdriver.Chrome(service=Service())

The code above initializes what you need to control a Chrome instance.

Note: Indeed has implemented anti-scraping measures to stop headless browsers from accessing its pages. Thus, setting the --headless flag would make your script fail. As an alternative approach, take a look at Playwright Stealth.

As the last line of your script, do not forget to close the web driver:

driver.quit()
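If an exception is raised halfway through the script, that final driver.quit() line is never reached and the Chrome window stays open. One way to guard against that is a try/finally block. The sketch below wraps the pattern in a hypothetical run_and_quit() helper that works with any object exposing a quit() method, such as a Selenium WebDriver:

```python
from typing import Callable, Protocol, TypeVar


class Quittable(Protocol):
    """Anything exposing a quit() method, such as a Selenium WebDriver."""

    def quit(self) -> None: ...


D = TypeVar("D", bound=Quittable)
T = TypeVar("T")


def run_and_quit(driver: D, scrape: Callable[[D], T]) -> T:
    """Run the scraping logic and always close the browser, even if it fails."""
    try:
        return scrape(driver)
    finally:
        # Executed both on success and when scrape() raises
        driver.quit()
```

For example, jobs = run_and_quit(webdriver.Chrome(service=Service()), scrape_jobs), where scrape_jobs is a hypothetical function containing your scraping logic. A plain try/finally around the script body achieves the same result.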

Amazing! You are fully configured to scrape Indeed.

Step #4: Visit the Target Page

With the get() method from Selenium, instruct the controlled browser to visit the target page:

driver.get("https://www.indeed.com/jobs?q=data+scientist&l=New+York%2C+NY&from=searchOnHP%2Cwhatautocomplete&vjk=45d1ba700870fbef")
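The query string of that URL encodes the search keyword in the q parameter and the location in the l parameter. If you want to target a different search, you could build the URL programmatically. A minimal sketch, assuming the extra from and vjk parameters are not required to load the results (an assumption, not documented behavior):

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(query: str, location: str) -> str:
    """Build an Indeed search URL from a keyword (q) and a location (l)."""
    # urlencode() relies on quote_plus(): spaces become "+" and commas become "%2C"
    params = {"q": query, "l": location}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("data scientist", "New York, NY"))
# https://www.indeed.com/jobs?q=data+scientist&l=New+York%2C+NY
```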

scraper.py will now contain the following lines of code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Set up a controllable Chrome instance
driver = webdriver.Chrome(service=Service())

# Open the target page in the browser
driver.get("https://www.indeed.com/jobs?q=data+scientist&l=New+York%2C+NY&from=searchOnHP%2Cwhatautocomplete&vjk=45d1ba700870fbef")

# Scraping logic...

# Close the web driver
driver.quit()

Add a debugging breakpoint on the final line and run the script with the debugger. This is what you should see:

What you should see after running the script with the debugger

Note: The “Chrome is being controlled by automated test software.” notification tells you that Selenium is controlling Chrome as expected.

Well done!


Step #5: Select the Job Posting Elements

The Indeed job search page displays numerous job openings. Since the goal is to scrape all of them, start by initializing a list to store the scraped data:

jobs = []

Next, inspect the HTML elements of the job openings on the page to understand how to select them:

The HTML elements of the job openings

Here, each job card is a [data-testid="slider_item"] node inside the #mosaic-provider-jobcards container.

Normally, you would use CSS classes to select elements on the page. However, these classes appear to be randomly generated, likely at build time. For more stable selectors, target the id and data-testid attributes instead, as they are less likely to change frequently.

Rely on Selenium to select the job elements:

jobs_container_element = driver.find_element(By.CSS_SELECTOR, "#mosaic-provider-jobcards")
job_elements = jobs_container_element.find_elements(By.CSS_SELECTOR, "[data-testid=\"slider_item\"]")

The find_elements() method applies the specified selector strategy and returns all matching elements. When called on an element, as in the second line, it only searches within that element’s subtree. In this case, the selector strategy is a CSS selector.

Make sure to import By for this to work:

from selenium.webdriver.common.by import By
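Because the job cards are loaded dynamically, the selection code may run before they are rendered, in which case find_element() raises NoSuchElementException. In a real script, Selenium’s WebDriverWait combined with expected_conditions (for example, presence_of_all_elements_located) is the idiomatic fix. The stdlib-only sketch below shows the same polling idea with a hypothetical wait_for() helper:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(condition: Callable[[], T], timeout: float = 10.0, poll_interval: float = 0.5) -> T:
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Truthy result (e.g., a non-empty list of elements): stop waiting
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

For example, wait_for(lambda: driver.find_elements(By.CSS_SELECTOR, "#mosaic-provider-jobcards [data-testid=\"slider_item\"]")) keeps polling while find_elements() returns an empty list and returns the job cards as soon as at least one is rendered.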

Now, iterate over the selected elements and prepare to scrape data from each one:

for job_element in job_elements:
    # Scrape data from each job opening here
    pass

Fantastic! You are ready to start scraping job positions from Indeed.

Step #6: Scrape the Job Main Info

Inspect a card element, focusing on the information in the upper section of the card:

The HTML of the job title element

Here you see that you can scrape:

  • The job title from the <h2>
  • The job page URL from the <a> inside the title <h2>
  • The company name from the [data-testid="company-name"] node
  • The company location from the [data-testid="text-location"] element

Turn the information above into scraping logic as follows:

title_element = job_element.find_element(By.CSS_SELECTOR, "h2.jobTitle")
title = title_element.text

url_element = title_element.find_element(By.CSS_SELECTOR, "a")
url = url_element.get_attribute("href")

company_element = job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"company-name\"]")
company = company_element.text

location_element = job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"text-location\"]")
location = location_element.text

find_element() selects the first element matching the given selector. Given a node, you can then access its text content with the text attribute. To get the value of an HTML attribute of the node, you must use the get_attribute() method.

Cool! You have laid the groundwork for your Indeed scraping logic, but there is still useful data left to scrape.

Step #7: Scrape the Job Details

Focus on the details section of the job position card:

Inspecting the HTML of the job details

This time, the information to scrape is:

  • The tags of the job position in one or more [data-testid="attribute_snippet_testid"] elements inside a .jobMetaDataGroup <div>
  • Whether there is an option to apply easily through Indeed
  • The description items in one or more ul li elements inside a [role="presentation"] <div>

Let’s start by targeting the tags. You can scrape them all with:

tags = []
tags_container_element = job_element.find_element(By.CSS_SELECTOR, ".jobMetaDataGroup")
tag_elements = tags_container_element.find_elements(By.CSS_SELECTOR, "[data-testid=\"attribute_snippet_testid\"]")
for tag_element in tag_elements:
    tag = tag_element.text
    tags.append(tag)

First, initialize a list to store all retrieved tags. That is required because a single job card can contain multiple tags. After selecting the tag elements, iterate over them, extract their text, and append each tag to the list.

Scraping the “Easily apply” information is trickier. The HTML element indicating that option is not present on every job card, as it only appears when the “Easily apply” option is supported.

When you try to select an element that does not exist, Selenium raises a NoSuchElementException. You can take advantage of that behavior to detect the “Easily apply” option:

# Add this import at the top of scraper.py
from selenium.common import NoSuchElementException

try:
    job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"indeedApply\"]")
    easily_apply = True
except NoSuchElementException:
    easily_apply = False

If the [data-testid="indeedApply"] node is not inside the job card, Selenium will raise a NoSuchElementException. That exception will be caught, and easily_apply will be set to False.
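An alternative that avoids exception handling is to rely on find_elements(), which returns an empty list instead of raising when nothing matches. A minimal sketch with a hypothetical has_element() helper; in scraper.py you would pass By.CSS_SELECTOR, while the equivalent string "css selector" is used here only so the sketch runs without Selenium installed:

```python
# "css selector" is the string value behind Selenium's By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def has_element(parent, selector: str) -> bool:
    """Return True if parent contains at least one element matching selector."""
    # find_elements() returns an empty list when nothing matches,
    # so no try/except is needed
    return len(parent.find_elements(CSS_SELECTOR, selector)) > 0
```

With it, the check becomes easily_apply = has_element(job_element, "[data-testid=\"indeedApply\"]").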

As for the description items, you can scrape them all as you did for the tags:

description = []
description_container_element = job_element.find_element(By.CSS_SELECTOR, "[role=\"presentation\"]")
description_elements = description_container_element.find_elements(By.CSS_SELECTOR, "ul li")
for description_element in description_elements:
    description_item_text = description_element.text
    # Ignore empty description strings
    if description_item_text:
        description.append(description_item_text)
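Tags and description items follow the same select, iterate, and extract pattern, so you could factor it into a small helper. The sketch below uses a hypothetical collect_texts() function that works on any object exposing a .text attribute, such as a Selenium WebElement; the SimpleNamespace objects are stand-ins used only for the demo:

```python
from types import SimpleNamespace
from typing import Iterable, List, Protocol


class HasText(Protocol):
    """Anything exposing a .text attribute, such as a Selenium WebElement."""

    text: str


def collect_texts(elements: Iterable[HasText], skip_empty: bool = True) -> List[str]:
    """Extract the text of each element, optionally dropping empty strings."""
    texts = [element.text for element in elements]
    if skip_empty:
        texts = [text for text in texts if text]
    return texts


# Stand-in objects for WebElements, used only for this demo
items = [SimpleNamespace(text="Remote"), SimpleNamespace(text=""), SimpleNamespace(text="Full-time")]
print(collect_texts(items))  # ['Remote', 'Full-time']
```

In the loop, tags = collect_texts(tag_elements, skip_empty=False) would mirror the tag logic, while description = collect_texts(description_elements) mirrors the description logic.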

Wow! The Indeed scraper is almost complete.

Step #8: Collect the Scraped Data

With the scraped data from each job position, populate a job dictionary:

job = {
    "title": title,
    "url": url,
    "company": company,
    "location": location,
    "tags": tags,
    "easily_apply": easily_apply,
    "description": description
}

Then, add it to the jobs list:

jobs.append(job)
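As an alternative to a plain dictionary, you could describe each job with a dataclass, which documents the expected fields and their types. A minimal sketch with a hypothetical Job class; asdict() returns a dictionary with the same keys as the job dictionary above, so jobs.append(asdict(job)) would keep the CSV export step unchanged:

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Job:
    """Typed container for one scraped Indeed job card."""

    title: str
    url: str
    company: str
    location: str
    tags: List[str] = field(default_factory=list)
    easily_apply: bool = False
    description: List[str] = field(default_factory=list)


# Placeholder values for illustration only
job = Job(title="Data Scientist", url="https://example.com/job-placeholder", company="GQR", location="New York, NY")
print(asdict(job)["company"])  # GQR
```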

At the end of the for loop, jobs should contain something like:

[{'title': 'Data Scientist', 'url': 'https://www.indeed.com/rc/clk?jk=efc7b7f4a8be2882&bb=NM368jsOPyYGAfEtQk2NNae8tSeBHdJ8Y9tImVa1Q9GAipGe0zzddcUozFEL0Na_pYCR4W6ljgljsBxWTUrluVuL8Gom7x7UZlgMzs0spo3NRgisrZ7meuaPfaEcjWoe&xkcb=SoD767M34WNyEaSTwx0FbzkdCdPP&fccid=8678bc4e64c24580&vjs=3', 'company': 'GQR', 'location': 'New York, NY', 'tags': [], 'easily_apply': False, 'description': ['Stay current with industry trends and emerging technologies to ensure competitive edge.', 'Apply statistical and machine learning techniques to improve investment…']},
# omitted for brevity...
{'title': 'Data Scientist, Financial Crimes - USDS', 'url': 'https://www.indeed.com/rc/clk?jk=aaa16dfd1cc6ef01&bb=NM368jsOPyYGAfEtQk2NNdxizAZQnHpzRrlr6WgbV1RtxmXz4vto1qiiqGiIj9CJFQQCV6cW59nE4hGw1yeNdokPfu8Fgl3EALBx5zdWjPm4COEu78DCFh4KTUMIFWkh&xkcb=SoAT67M34WNyEaSTwx0pbzkdCdPP&fccid=caed318a9335aac0&vjs=3', 'company': 'TikTok', 'location': 'Hybrid work in New York, NY', 'tags': [], 'easily_apply': False, 'description': ['As a Financial Crime Data Scientist, you will play a crucial role in leveraging machine learning, analytics and visualization techniques to enhance our…']}]
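The scraped URLs carry a jk query parameter that appears to identify the job posting. Assuming that holds (it is an observation from the sample output above, not documented behavior), you could use it to drop duplicate listings, for example when the same job shows up across several searches. A minimal sketch with hypothetical job_key() and deduplicate() helpers:

```python
from typing import Dict, List, Optional, Set
from urllib.parse import parse_qs, urlparse


def job_key(url: str) -> Optional[str]:
    """Return the value of the jk query parameter, or None if it is missing."""
    values = parse_qs(urlparse(url).query).get("jk")
    return values[0] if values else None


def deduplicate(jobs: List[Dict]) -> List[Dict]:
    """Keep the first job seen for each jk value (jobs without jk are always kept)."""
    seen: Set[str] = set()
    unique = []
    for job in jobs:
        key = job_key(job["url"])
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(job)
    return unique
```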

Marvelous! All that is left is to export this data to a more convenient format.

Step #9: Export the Scraped Data to CSV

To make the scraped data accessible and shareable, it is a good idea to export it to a human-readable format, such as CSV. To do so, use these lines of code:

csv_file = "jobs.csv"
csv_headers = ["title", "url", "company", "location", "tags", "easily_apply", "description"]

with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=csv_headers)
    writer.writeheader()
    for job in jobs:
        writer.writerow({
            "title": job["title"],
            "url": job["url"],
            "company": job["company"],
            "location": job["location"],
            "tags": ";".join(job["tags"]),
            "easily_apply": "Yes" if job["easily_apply"] else "No",
            "description": ";".join(job["description"])
        })

The open() function creates the output CSV file, which is then populated with csv.DictWriter. Since the tags and description fields are lists, join() is used to flatten each of them into a single string with elements separated by ;.
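Joining lists with ; keeps each job on a single CSV row, but it also means the lists must be split again when you load the file. A minimal sketch of the reverse operation, assuming the column names and the Yes/No convention used above (read_jobs() is a hypothetical helper):

```python
import csv
import io
from typing import Dict, List


def read_jobs(csv_text: str) -> List[Dict]:
    """Parse CSV text produced by the export step back into job dictionaries."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        jobs.append({
            "title": row["title"],
            "url": row["url"],
            "company": row["company"],
            "location": row["location"],
            # Undo the ";".join() from the export (an empty cell means an empty list)
            "tags": row["tags"].split(";") if row["tags"] else [],
            "easily_apply": row["easily_apply"] == "Yes",
            "description": row["description"].split(";") if row["description"] else [],
        })
    return jobs
```

Keep in mind that if a description item itself contains a ;, splitting produces extra items. If exact round-trips matter, storing those columns with json.dumps() and reading them back with json.loads() avoids the ambiguity.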

Do not forget to import csv from the Python Standard Library:

import csv

Here we go! The Indeed scraper is complete.

Step #10: Put It All Together

Your final scraper.py file will now contain:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common import NoSuchElementException
import csv

# Set up a controllable Chrome instance
driver = webdriver.Chrome(service=Service())

# Open the target page in the browser
driver.get("https://www.indeed.com/jobs?q=data+scientist&l=New+York%2C+NY&from=searchOnDesktopSerp")

# A data structure where to store the scraped job openings
jobs = []

# Select the job opening elements on the page
jobs_container_element = driver.find_element(By.CSS_SELECTOR, "#mosaic-provider-jobcards")
job_elements = jobs_container_element.find_elements(By.CSS_SELECTOR, "[data-testid=\"slider_item\"]")

# Scrape each job opening on the page
for job_element in job_elements:
    title_element = job_element.find_element(By.CSS_SELECTOR, "h2.jobTitle")
    title = title_element.text

    url_element = title_element.find_element(By.CSS_SELECTOR, "a")
    url = url_element.get_attribute("href")

    company_element = job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"company-name\"]")
    company = company_element.text

    location_element = job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"text-location\"]")
    location = location_element.text

    tags = []
    tags_container_element = job_element.find_element(By.CSS_SELECTOR, ".jobMetaDataGroup")
    tag_elements = tags_container_element.find_elements(By.CSS_SELECTOR, "[data-testid=\"attribute_snippet_testid\"]")
    for tag_element in tag_elements:
        tag = tag_element.text
        tags.append(tag)

    # Check whether the "Easy Apply" element is on the page
    try:
        job_element.find_element(By.CSS_SELECTOR, "[data-testid=\"indeedApply\"]")
        easily_apply = True
    except NoSuchElementException:
        easily_apply = False

    description = []
    description_container_element = job_element.find_element(By.CSS_SELECTOR, "[role=\"presentation\"]")
    description_elements = description_container_element.find_elements(By.CSS_SELECTOR, "ul li")
    for description_element in description_elements:
        description_item_text = description_element.text
        # Ignore empty description strings
        if description_item_text:
            description.append(description_item_text)

    # Store the scraped data
    job = {
        "title": title,
        "url": url,
        "company": company,
        "location": location,
        "tags": tags,
        "easily_apply": easily_apply,
        "description": description
    }
    jobs.append(job)

# Export the scraped data to an output CSV file
csv_file = "jobs.csv"
csv_headers = ["title", "url", "company", "location", "tags", "easily_apply", "description"]

with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=csv_headers)
    writer.writeheader()
    for job in jobs:
        writer.writerow({
            "title": job["title"],
            "url": job["url"],
            "company": job["company"],
            "location": job["location"],
            "tags": ";".join(job["tags"]),
            "easily_apply": "Yes" if job["easily_apply"] else "No",
            "description": ";".join(job["description"])
        })

# Close the web driver
driver.quit()

In fewer than 100 lines of code, you just built an Indeed scraper in Python!

Launch the scraper with the following command:

python3 scraper.py

Or, on Windows:

python scraper.py

A jobs.csv file will appear in your project’s folder. Open it, and you will see:

The final jobs.csv file with all the scraped results

Et voilà! Mission complete.

Unlock Indeed Data With Ease

Indeed is well aware of the value of its data and employs robust measures to protect it. This is why, when interacting with its pages using a browser automation tool like Selenium, you are likely to encounter a CAPTCHA:

Cloudflare CAPTCHA on Indeed

As a first step, consider following our guide on how to bypass CAPTCHAs in Python. Nevertheless, be aware that the site might still block your attempts with additional anti-bot measures. Discover them all in our webinar on anti-bot techniques.

These challenges highlight how scraping Indeed without the proper tools can quickly become frustrating and inefficient. Moreover, the inability to use headless browsers makes your scraping script slower and more resource-intensive.

The solution? Bright Data’s Indeed Scraper API, a tool that lets you retrieve data from Indeed through simple API calls, with no CAPTCHAs, no blocks, and no hassle!

Conclusion

In this step-by-step guide, you learned what an Indeed scraper is, the types of data it can retrieve, and how to build one in Python. In fewer than 100 lines of code, you created a script that automatically collects data from Indeed.

Still, scraping Indeed comes with its challenges. The platform enforces strict anti-bot measures, including CAPTCHAs. These are difficult to bypass and can slow down your scraping process, making it less efficient. Forget about all those challenges with our Indeed Scraper API.

If web scraping is not your thing but you are still interested in job openings data, explore our ready-to-use Indeed datasets!

Create a free Bright Data account today to try our scraper APIs or explore our datasets.
