How To Scrape Yahoo Finance in Python

In this step-by-step guide, you will learn how to scrape Yahoo Finance using Python.

TL;DR: Let’s learn how to build a Yahoo Finance scraper for extracting stock data to perform financial analysis for trading and investing.

This tutorial will cover:

  • Why scrape financial data from the Web?
  • Finance scraping libraries and tools
  • Scraping stock data from Yahoo Finance with Selenium

Why Scrape Financial Data From the Web?

Scraping finance data from the Web offers valuable insights that come in handy in various scenarios, including:

  • Automated Trading: By gathering real-time or historical market data, such as stock prices and volume, developers can build automated trading strategies.
  • Technical Analysis: Historical market data and indicators are extremely important for technical analysts. These allow them to identify patterns and trends, assisting their investment decision-making.
  • Financial Modeling: Researchers and analysts can gather relevant data like financial statements and economic indicators to build complex models for evaluating company performance, forecasting earnings, and assessing investment opportunities.
  • Market Research: Financial data provide a great deal of information about stocks, market indices, and commodities. Analyzing this data helps researchers understand market trends, sentiment, and industry health to make informed investment decisions.

When it comes to monitoring the market, Yahoo Finance is one of the most popular finance websites. It provides a wide range of information and tools to investors and traders, such as real-time and historical data on stocks, bonds, mutual funds, commodities, currencies, and market indices. Plus, it offers news articles, financial statements, analyst estimates, charts, and other valuable resources.

By scraping Yahoo Finance, you can access a wealth of information to support your financial analysis, research, and decision-making processes.

Finance Scraping Libraries and Tools

Python is considered one of the best languages for scraping thanks to its syntax, ease of use, and rich ecosystem of libraries. Check out our guide on web scraping with Python.

To choose the right scraping libraries out of the many available, explore Yahoo Finance in your browser. You will notice that most of the data on the site gets updated in real time or changes after an interaction. This means that the site relies heavily on AJAX to load and update data dynamically without requiring page reloads. In other words, you need a tool that is able to run JavaScript.

Selenium makes it possible to scrape dynamic websites in Python. It renders sites in web browsers and programmatically performs operations on them, even if they use JavaScript for rendering or retrieving data.

Thanks to Selenium, you will be able to scrape the target site with Python. Let’s learn how!

Scraping Stock Data From Yahoo Finance With Selenium

Follow this step-by-step tutorial and see how to build a Yahoo Finance web scraping Python script.

Step 1: Setup

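Before diving into finance scraping, make sure you have Python 3 and Google Chrome installed, since the script in this tutorial is written in Python and controls a Chrome window.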

Next, use the commands below to set up a Python project with a virtual environment:


mkdir yahoo-finance-scraper
cd yahoo-finance-scraper
python -m venv env

These will initialize the yahoo-finance-scraper project folder. Inside it, add a scraper.py file as below:

print('Hello, World!')

You will add the logic to scrape Yahoo Finance here. Right now, it is a sample script that only prints “Hello, World!”

Launch it to verify that it works with:

python scraper.py

In the terminal, you should see:

Hello, World!

Great, you now have a Python project for your finance scraper. All that remains is to add the project’s dependencies. Install Selenium and the Webdriver Manager with the following terminal command:

pip install selenium webdriver-manager

This might take a while, so be patient.

webdriver-manager is not strictly required. However, it is highly recommended as it makes managing web drivers in Selenium way easier. Thanks to it, you do not have to manually download, configure, and import the web driver.

Update scraper.py as follows:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# scraping logic...

# close the browser and free up the resources
driver.quit()

This script simply creates a Chrome WebDriver instance. You will use it soon to implement the data extraction logic.

Step 2: Connect to the target web page

This is what the URL of a Yahoo Finance stock page looks like:

https://finance.yahoo.com/quote/AMZN

As you can see, it is a dynamic URL that changes based on the ticker symbol. If you are not familiar with the concept, that is a string abbreviation used to uniquely identify shares traded in the stock market. For example, “AMZN” is the ticker symbol of the Amazon stock.

Let’s modify the script to make it read the ticker from a command-line argument:


import sys

# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)

# read the ticker from the CLI argument
ticker_symbol = sys.argv[1]

# build the URL of the target page
url = f'https://finance.yahoo.com/quote/{ticker_symbol}'

sys is a module from the Python Standard Library that provides access to the command-line arguments. Do not forget that the argument with index 0 is the name of your script. Thus, you have to target the argument with index 1.

The script reads the ticker from the CLI and then uses it in an f-string to produce the URL of the target page.

For example, assume you launch the scraper with the Tesla ticker “TSLA”:

python scraper.py TSLA

url will contain:

https://finance.yahoo.com/quote/TSLA

If you forget the ticker symbol in the CLI, the program will fail with the error below:

Ticker symbol CLI argument missing!
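
The argument handling and URL construction described above can be sketched as two small functions. This is a minimal illustration and not part of the original script: the names tickers_from_argv and build_quote_url are hypothetical, and upper-casing the ticker and percent-encoding it (useful for symbols that contain characters such as ^) are assumptions about what makes a robust URL, not documented Yahoo Finance requirements.

```python
from urllib.parse import quote

BASE_URL = 'https://finance.yahoo.com/quote/'


def tickers_from_argv(argv):
    """Return the ticker symbols passed on the command line."""
    # argv[0] is the script name, so the tickers start at index 1
    return argv[1:]


def build_quote_url(ticker_symbol):
    """Build the Yahoo Finance quote page URL for a ticker symbol."""
    # assumption: normalize whitespace and casing before building the URL
    cleaned = ticker_symbol.strip().upper()
    # percent-encode reserved characters so the symbol stays a single path segment
    return BASE_URL + quote(cleaned, safe='')


print(build_quote_url('tsla'))  # https://finance.yahoo.com/quote/TSLA
```

Keeping the URL logic in a function makes it easy to check in isolation, without launching a browser.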

Before opening any page in Selenium, it is recommended to set the window size to ensure that every element is visible:

driver.set_window_size(1920, 1080)

You can now use Selenium to connect to the target page with:

driver.get(url)

The get() method instructs the browser to visit the desired page.

This is what your Yahoo Finance scraping script looks like so far:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import sys

# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)

# read the ticker from the CLI argument
ticker_symbol = sys.argv[1]

# build the URL of the target page
url = f'https://finance.yahoo.com/quote/{ticker_symbol}'

# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
# set up the window size of the controlled browser
driver.set_window_size(1920, 1080)
# visit the target page
driver.get(url)

# scraping logic...

# close the browser and free up the resources
driver.quit()

If you launch it, it will open this window for a fraction of a second before terminating:

Yahoo finance image

Starting the browser with its UI is useful for debugging, as it lets you monitor what the scraper is doing on the web page. At the same time, it takes a lot of resources. To avoid that, configure Chrome to run in headless mode with:


from selenium.webdriver.chrome.options import Options
# ...

options = Options()
options.add_argument('--headless=new')

driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)

The controlled browser will now be launched behind the scenes, with no UI.

Step 3: Inspect the target page

If you want to structure an effective data mining strategy, you must first analyze the target web page. Open your browser and visit the Yahoo stock page.

If you are based in Europe, you will first see a modal asking you to accept the cookies:

Yahoo cookies

To close it and keep visiting the desired page, you must click “Accept all” or “Reject all.” Right-click on the first button and select the “Inspect” option to open the DevTools of your browser:

Yahoo inspect

Here, you will notice that you can select that button with the following CSS selector:

.consent-overlay .accept-all

Use these lines of code to deal with the consent modal in Selenium:


try:
    # wait up to 3 seconds for the consent modal to show up
    consent_overlay = WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.consent-overlay')))

    # click the "Accept all" button
    accept_all_button = consent_overlay.find_element(By.CSS_SELECTOR, '.accept-all')
    accept_all_button.click()
except TimeoutException:
    print('Cookie consent overlay missing')

WebDriverWait allows you to wait for an expected condition to occur on the page. If the condition is not met within the specified timeout, it raises a TimeoutException. Since the cookie overlay shows up only when your exit IP is European, you can handle the exception with a try-except block. This way, the script will keep running when the consent modal is not present.

To make the script work, you will need to add the following imports:


from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common import TimeoutException
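
To make the waiting behavior used for the consent modal more concrete, here is a simplified, Selenium-free sketch of the polling idea behind WebDriverWait: call a condition repeatedly until it returns a truthy value or the timeout expires. The function name wait_until and its parameters are hypothetical, and this is not Selenium's actual implementation. It only mirrors the general pattern (Selenium polls every 0.5 seconds by default and raises its own TimeoutException).

```python
import time


def wait_until(condition, timeout=3.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # the condition is met: hand its value back to the caller
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        # wait a little before checking again
        time.sleep(poll_interval)


# usage: a condition that becomes true on the third check
attempts = []


def ready_on_third_try():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(ready_on_third_try, timeout=1.0, poll_interval=0.01)
print(len(attempts))  # 3
```

Returning the truthy result is what lets the consent snippet assign the located element directly to consent_overlay.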

Now, keep inspecting the target site in the DevTools and familiarize yourself with its DOM structure.

Step 4: Extract the stock data

As you should have noticed in the previous step, some of the most interesting information is in this section:

Extract stock data

Inspect the HTML price indicator element:

Note that CSS classes are not useful for defining proper selectors in Yahoo Finance. They seem to follow a special syntax for a styling framework. Instead, focus on the other HTML attributes. For example, you can get the stock price with the CSS selector below:

[data-symbol="TSLA"][data-field="regularMarketPrice"]

Following a similar approach, extract all stock data from the price indicators with:


regular_market_price = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketPrice"]')\
    .text
regular_market_change = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChange"]')\
    .text
regular_market_change_percent = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChangePercent"]')\
    .text\
    .replace('(', '').replace(')', '')
 
post_market_price = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketPrice"]')\
    .text
post_market_change = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChange"]')\
    .text
post_market_change_percent = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChangePercent"]')\
    .text\
    .replace('(', '').replace(')', '')

After selecting an HTML element with a CSS selector, you can extract its content through the text attribute. Since the percent values are wrapped in parentheses, these are removed with replace().
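
Keep in mind that find_element() raises a NoSuchElementException when nothing matches the selector, while find_elements() returns an empty list. If some indicators are not always on the page (for instance, the post-market values outside of extended trading hours, which is an assumption about the site and not something verified here), a small helper can return a default instead of crashing. The sketch below is hypothetical: first_text and the find_all callable are illustrative names, and the lookup function is passed in so that the helper can be tried without a browser.

```python
from types import SimpleNamespace


def first_text(find_all, selector, default=None):
    """Return the text of the first element matching selector, or default."""
    matches = find_all(selector)
    return matches[0].text if matches else default


# stand-in for a page where only the regular market price is present
fake_page = {
    '[data-field="regularMarketPrice"]': [SimpleNamespace(text='193.17')],
}


def fake_find_all(selector):
    # behaves like find_elements(): an empty list when nothing matches
    return fake_page.get(selector, [])


print(first_text(fake_find_all, '[data-field="regularMarketPrice"]'))  # 193.17
print(first_text(fake_find_all, '[data-field="postMarketPrice"]'))     # None
```

With Selenium, the lookup could be passed as lambda css: driver.find_elements(By.CSS_SELECTOR, css).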

Add the scraped values to a stock dictionary and print it to verify that the process of scraping financial data works as expected:


# initialize the dictionary
stock = {}

# stock price scraping logic omitted for brevity...

# add the scraped data to the dictionary
stock['regular_market_price'] = regular_market_price
stock['regular_market_change'] = regular_market_change
stock['regular_market_change_percent'] = regular_market_change_percent
stock['post_market_price'] = post_market_price
stock['post_market_change'] = post_market_change
stock['post_market_change_percent'] = post_market_change_percent

print(stock)

Run the script on the security you want to scrape and you should see something like:

{'regular_market_price': '193.17', 'regular_market_change': '+8.70', 'regular_market_change_percent': '+4.72%', 'post_market_price': '194.00', 'post_market_change': '+0.83', 'post_market_change_percent': '+0.43%'}
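
Note that every scraped value is a string. If you plan to run calculations on the data, you will likely want numbers instead. The following sketch shows one possible conversion. The to_number name is hypothetical, and the handling of thousands separators and of non-numeric placeholders such as 'N/A' is an assumption about the values the page may contain.

```python
def to_number(raw_text):
    """Convert a scraped string such as '+8.70' or '(+4.72%)' to a float.

    Returns None when the text is not a plain number.
    """
    # drop surrounding parentheses, a trailing percent sign, and thousands separators
    cleaned = raw_text.strip().strip('()').rstrip('%').replace(',', '')
    try:
        return float(cleaned)
    except ValueError:
        return None


print(to_number('193.17'))    # 193.17
print(to_number('+8.70'))     # 8.7
print(to_number('(+4.72%)'))  # 4.72
print(to_number('N/A'))       # None
```

Values with a suffix, such as an abbreviated market cap, are returned as None by this sketch and would need additional handling.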

You can find other useful info in the #quote-summary table:

Quote summary table

In this case, you can extract each data field thanks to the data-test attribute as in the CSS selector below:

#quote-summary [data-test="PREV_CLOSE-value"]

Scrape them all with:


previous_close = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PREV_CLOSE-value"]').text
open_value = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="OPEN-value"]').text
bid = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BID-value"]').text
ask = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ASK-value"]').text
days_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DAYS_RANGE-value"]').text
week_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="FIFTY_TWO_WK_RANGE-value"]').text
volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="TD_VOLUME-value"]').text
avg_volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="AVERAGE_VOLUME_3MONTH-value"]').text
market_cap = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="MARKET_CAP-value"]').text
beta = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BETA_5Y-value"]').text
pe_ratio = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PE_RATIO-value"]').text
eps = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EPS_RATIO-value"]').text
earnings_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EARNINGS_DATE-value"]').text
dividend_yield = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DIVIDEND_AND_YIELD-value"]').text
ex_dividend_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EX_DIVIDEND_DATE-value"]').text
year_target_est = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ONE_YEAR_TARGET_PRICE-value"]').text

Then, add them to stock:


stock['previous_close'] = previous_close
stock['open_value'] = open_value
stock['bid'] = bid
stock['ask'] = ask
stock['days_range'] = days_range
stock['week_range'] = week_range
stock['volume'] = volume
stock['avg_volume'] = avg_volume
stock['market_cap'] = market_cap
stock['beta'] = beta
stock['pe_ratio'] = pe_ratio
stock['eps'] = eps
stock['earnings_date'] = earnings_date
stock['dividend_yield'] = dividend_yield
stock['ex_dividend_date'] = ex_dividend_date
stock['year_target_est'] = year_target_est
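
The sixteen nearly identical lookups above can also be driven by a mapping from dictionary keys to data-test prefixes. This is an optional refactoring sketch, not the tutorial's own code: SUMMARY_FIELDS, scrape_summary, and the find_text callable are hypothetical names, while the data-test values are the ones used in the selectors above.

```python
# dictionary key -> prefix of the data-test attribute in the #quote-summary table
SUMMARY_FIELDS = {
    'previous_close': 'PREV_CLOSE',
    'open_value': 'OPEN',
    'bid': 'BID',
    'ask': 'ASK',
    'days_range': 'DAYS_RANGE',
    'week_range': 'FIFTY_TWO_WK_RANGE',
    'volume': 'TD_VOLUME',
    'avg_volume': 'AVERAGE_VOLUME_3MONTH',
    'market_cap': 'MARKET_CAP',
    'beta': 'BETA_5Y',
    'pe_ratio': 'PE_RATIO',
    'eps': 'EPS_RATIO',
    'earnings_date': 'EARNINGS_DATE',
    'dividend_yield': 'DIVIDEND_AND_YIELD',
    'ex_dividend_date': 'EX_DIVIDEND_DATE',
    'year_target_est': 'ONE_YEAR_TARGET_PRICE',
}


def scrape_summary(find_text):
    """Collect every summary field using find_text(css_selector) -> str."""
    summary = {}
    for key, data_test in SUMMARY_FIELDS.items():
        selector = f'#quote-summary [data-test="{data_test}-value"]'
        summary[key] = find_text(selector)
    return summary
```

With Selenium, this could be called as stock.update(scrape_summary(lambda css: driver.find_element(By.CSS_SELECTOR, css).text)). Adding or renaming a field then means touching a single line of the mapping.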

Fantastic! You just performed financial web scraping with Python!

Step 5: Scrape several stocks

A diversified investment portfolio consists of more than one security. To retrieve data for all of them, you need to extend your script to scrape multiple tickers. 

First, encapsulate the scraping logic in a function:


def scrape_stock(driver, ticker_symbol):
    url = f'https://finance.yahoo.com/quote/{ticker_symbol}'
    driver.get(url)

    # deal with the consent modal...

    # initialize the stock dictionary with the
    # ticker symbol
    stock = { 'ticker': ticker_symbol }

    # scrape the desired data and populate
    # the stock dictionary...

    return stock

Then, iterate over the CLI ticker arguments and apply the scraping function:


if len(sys.argv) <= 1:
    print('Ticker symbol CLI arguments missing!')
    sys.exit(2)

# initialize a Chrome instance with the right
# configs
options = Options()
options.add_argument('--headless=new')
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)
driver.set_window_size(1150, 1000)

# the array containing all scraped data
stocks = []

# scraping all market securities
for ticker_symbol in sys.argv[1:]:
    stocks.append(scrape_stock(driver, ticker_symbol))

At the end of the for loop, the stocks list of Python dictionaries will contain all stock market data.
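
As written, a single failing ticker (for example, a mistyped symbol) stops the whole run. One possible way to isolate failures is sketched below, assuming you prefer to collect errors and keep going. The scrape_many name and the scrape_one callable are hypothetical, and catching a broad Exception is a deliberate simplification for this sketch.

```python
def scrape_many(ticker_symbols, scrape_one):
    """Apply scrape_one to each ticker, collecting results and failures separately."""
    stocks = []
    failed = []
    for ticker_symbol in ticker_symbols:
        try:
            stocks.append(scrape_one(ticker_symbol))
        except Exception as error:  # broad on purpose: this is only a sketch
            failed.append((ticker_symbol, str(error)))
    return stocks, failed


# usage with a stand-in scraping function that fails on one symbol
def fake_scrape(ticker_symbol):
    if ticker_symbol == 'BAD':
        raise ValueError('no data')
    return {'ticker': ticker_symbol}


stocks, failed = scrape_many(['TSLA', 'BAD', 'AMZN'], fake_scrape)
print(stocks)  # [{'ticker': 'TSLA'}, {'ticker': 'AMZN'}]
print(failed)  # [('BAD', 'no data')]
```

In the real script, scrape_one could be lambda ticker: scrape_stock(driver, ticker).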

Step 6: Export scraped data to CSV

You can export the collected data to CSV with just a few lines of code:


import csv

# ...

# extract the name of the dictionary fields
# to use it as the header of the output CSV file
csv_header = stocks[0].keys()

# export the scraped data to CSV
with open('stocks.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, csv_header)
    dict_writer.writeheader()
    dict_writer.writerows(stocks)

This snippet creates a stocks.csv file with open(), writes the header row, and populates the file with the scraped data. Specifically, DictWriter.writerows() converts each dictionary into a CSV record and appends it to the output file.

Since csv comes from the Python Standard Library, you do not even need to install an extra dependency to achieve the desired goal.
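
The export takes the header from the first dictionary, which works as long as every dictionary has the same keys. By default, DictWriter raises a ValueError when a row contains a key that is missing from the header, and fills keys missing from a row with an empty string. The sketch below shows one way to build the header from all rows; stocks_to_csv is a hypothetical helper that writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import csv
import io


def stocks_to_csv(stocks):
    """Serialize a list of dictionaries to CSV text, tolerating missing keys."""
    # build the header as the union of all keys, in first-seen order
    header = []
    for stock in stocks:
        for key in stock:
            if key not in header:
                header.append(key)

    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=header, restval='')
    writer.writeheader()
    writer.writerows(stocks)
    return buffer.getvalue()


rows = [
    {'ticker': 'TSLA', 'regular_market_price': '193.17', 'post_market_price': '194.00'},
    {'ticker': 'AMZN', 'regular_market_price': '130.00'},
]
for line in stocks_to_csv(rows).splitlines():
    print(line)
# ticker,regular_market_price,post_market_price
# TSLA,193.17,194.00
# AMZN,130.00,
```

The sample prices for AMZN are made up for the example.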

You started from raw data contained in a web page and now have semi-structured data stored in a CSV file. It is time to take a look at the entire Yahoo Finance scraper.

Step 7: Put it all together

Here is the complete scraper.py file:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common import TimeoutException
import sys
import csv

def scrape_stock(driver, ticker_symbol):
    # build the URL of the target page
    url = f'https://finance.yahoo.com/quote/{ticker_symbol}'

    # visit the target page
    driver.get(url)

    try:
        # wait up to 3 seconds for the consent modal to show up
        consent_overlay = WebDriverWait(driver, 3).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.consent-overlay')))

        # click the 'Accept all' button
        accept_all_button = consent_overlay.find_element(By.CSS_SELECTOR, '.accept-all')
        accept_all_button.click()
    except TimeoutException:
        print('Cookie consent overlay missing')

    # initialize the dictionary that will contain
    # the data collected from the target page
    stock = { 'ticker': ticker_symbol }

    # scraping the stock data from the price indicators
    regular_market_price = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketPrice"]') \
        .text
    regular_market_change = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChange"]') \
        .text
    regular_market_change_percent = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChangePercent"]') \
        .text \
        .replace('(', '').replace(')', '')

    post_market_price = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketPrice"]') \
        .text
    post_market_change = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChange"]') \
        .text
    post_market_change_percent = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChangePercent"]') \
        .text \
        .replace('(', '').replace(')', '')

    stock['regular_market_price'] = regular_market_price
    stock['regular_market_change'] = regular_market_change
    stock['regular_market_change_percent'] = regular_market_change_percent
    stock['post_market_price'] = post_market_price
    stock['post_market_change'] = post_market_change
    stock['post_market_change_percent'] = post_market_change_percent

    # scraping the stock data from the "Summary" table
    previous_close = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PREV_CLOSE-value"]').text
    open_value = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="OPEN-value"]').text
    bid = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BID-value"]').text
    ask = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ASK-value"]').text
    days_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DAYS_RANGE-value"]').text
    week_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="FIFTY_TWO_WK_RANGE-value"]').text
    volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="TD_VOLUME-value"]').text
    avg_volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="AVERAGE_VOLUME_3MONTH-value"]').text
    market_cap = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="MARKET_CAP-value"]').text
    beta = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BETA_5Y-value"]').text
    pe_ratio = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PE_RATIO-value"]').text
    eps = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EPS_RATIO-value"]').text
    earnings_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EARNINGS_DATE-value"]').text
    dividend_yield = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DIVIDEND_AND_YIELD-value"]').text
    ex_dividend_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EX_DIVIDEND_DATE-value"]').text
    year_target_est = driver.find_element(By.CSS_SELECTOR,
                                          '#quote-summary [data-test="ONE_YEAR_TARGET_PRICE-value"]').text

    stock['previous_close'] = previous_close
    stock['open_value'] = open_value
    stock['bid'] = bid
    stock['ask'] = ask
    stock['days_range'] = days_range
    stock['week_range'] = week_range
    stock['volume'] = volume
    stock['avg_volume'] = avg_volume
    stock['market_cap'] = market_cap
    stock['beta'] = beta
    stock['pe_ratio'] = pe_ratio
    stock['eps'] = eps
    stock['earnings_date'] = earnings_date
    stock['dividend_yield'] = dividend_yield
    stock['ex_dividend_date'] = ex_dividend_date
    stock['year_target_est'] = year_target_est

    return stock

# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)

options = Options()
options.add_argument('--headless=new')

# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)

# set up the window size of the controlled browser
driver.set_window_size(1150, 1000)

# the array containing all scraped data
stocks = []

# scraping all market securities
for ticker_symbol in sys.argv[1:]:
    stocks.append(scrape_stock(driver, ticker_symbol))

# close the browser and free up the resources
driver.quit()

# extract the name of the dictionary fields
# to use it as the header of the output CSV file
csv_header = stocks[0].keys()

# export the scraped data to CSV
with open('stocks.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, csv_header)
    dict_writer.writeheader()
    dict_writer.writerows(stocks)

In fewer than 150 lines of code, you built a full-featured web scraper to retrieve data from Yahoo Finance.

Launch it against your target stocks as in the example below:

python scraper.py TSLA AMZN AAPL META NFLX GOOG

At the end of the scraping process, this stocks.csv file will appear in the root folder of your project:

Stocks csv

Conclusion

In this tutorial, you learned why Yahoo Finance is one of the best financial portals on the web and how to extract data from it. In particular, you saw how to build a Python scraper that can retrieve stock data from it. As shown here, it is not complex and takes only a few lines of code.

At the same time, Yahoo Finance is a dynamic site that relies heavily on JavaScript. When dealing with such sites, a traditional approach based on an HTTP library and an HTML parser is not enough. On top of that, such popular sites tend to implement advanced data protection technologies. To scrape them, you need a controllable browser that can automatically handle CAPTCHAs, fingerprinting, automated retries, and more for you. This is exactly what our new Scraping Browser solution is all about!

Don’t want to deal with web scraping at all but are interested in financial data? Get a Yahoo Finance dataset.