TL;DR: Let’s learn how to build a Yahoo Finance scraper for extracting stock data to perform financial analysis for trading and investing.
This tutorial will cover:
- Why scrape financial data from the Web?
- Finance scraping libraries and tools
- Scraping stock data from Yahoo Finance with Selenium
Why Scrape Financial Data From the Web?
Scraping finance data from the Web offers valuable insights that come in handy in various scenarios, including:
- Automated Trading: By gathering real-time or historical market data, such as stock prices and volume, developers can build automated trading strategies.
- Technical Analysis: Historical market data and indicators are extremely important for technical analysts. These allow them to identify patterns and trends, assisting their investment decision-making.
- Financial Modeling: Researchers and analysts can gather relevant data like financial statements and economic indicators to build complex models for evaluating company performance, forecasting earnings, and assessing investment opportunities.
- Market Research: Financial data provide a great deal of information about stocks, market indices, and commodities. Analyzing this data helps researchers understand market trends, sentiment, and industry health to make informed investment decisions.
When it comes to monitoring the market, Yahoo Finance is one of the most popular finance websites. It provides a wide range of information and tools to investors and traders, such as real-time and historical data on stocks, bonds, mutual funds, commodities, currencies, and market indices. Plus, it offers news articles, financial statements, analyst estimates, charts, and other valuable resources.
By scraping Yahoo Finance, you can access a wealth of information to support your financial analysis, research, and decision-making processes.
Finance Scraping Libraries and Tools
Python is considered one of the best languages for scraping thanks to its syntax, ease of use, and rich ecosystem of libraries. Check out our guide on web scraping with Python.
To choose the right scraping libraries out of the many available, explore Yahoo Finance in your browser. You will notice that most of the data on the site gets updated in real time or changes after an interaction. This means that the site relies heavily on AJAX to load and update data dynamically without requiring page reloads. In other words, you need a tool that is able to run JavaScript.
Selenium makes it possible to scrape dynamic websites in Python. It renders sites in web browsers, programmatically performing operations on them even if they use JavaScript for rendering or retrieving data.
Thanks to Selenium, you will be able to scrape the target site with Python. Let’s learn how!
Scraping Stock Data From Yahoo Finance With Selenium
Follow this step-by-step tutorial and see how to build a Yahoo Finance web scraping Python script.
Step 1: Setup
Before diving into finance scraping, make sure to meet these prerequisites:
- Python 3+ installed on your machine: Download the installer, double-click on it, and follow the installation wizard.
- A Python IDE of your choice: PyCharm Community Edition or Visual Studio Code with the Python extension will do.
Next, use the commands below to set up a Python project with a virtual environment:
mkdir yahoo-finance-scraper
cd yahoo-finance-scraper
python -m venv env
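These will initialize the yahoo-finance-scraper project folder and create a virtual environment inside it. Activate the environment (env\Scripts\activate on Windows, source env/bin/activate on macOS and Linux) so that the dependencies installed later stay local to the project. Then, add a scraper.py file to the project folder as below: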
print('Hello, World!')
You will add the logic to scrape Yahoo Finance here. Right now, it is a sample script that only prints “Hello, World!”
Launch it to verify that it works with:
python scraper.py
In the terminal, you should see:
Hello, World!
Great, you now have a Python project for your finance scraper. It only remains to add the project’s dependencies. Install Selenium and the Webdriver Manager with the following terminal command:
pip install selenium webdriver-manager
This might take a while, so be patient.
webdriver-manager is not strictly required. However, it is highly recommended as it makes managing web drivers in Selenium way easier. Thanks to it, you do not have to manually download, configure, and import the web driver.
Update scraper.py as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
# scraping logic...
# close the browser and free up the resources
driver.quit()
This script simply creates an instance of the Chrome WebDriver. You will use it soon to implement the data extraction logic.
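Because driver.quit() sits at the very end of the script, an exception raised by the scraping logic would leave the browser process running. The following is a minimal sketch of one way to guard against that. The managed_driver() helper and the FakeDriver class are hypothetical names introduced for illustration and are not part of Selenium; the helper accepts any callable that returns an object with a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Yield a driver and always call quit() on it afterwards.

    create_driver is any zero-argument callable returning an object with a
    quit() method, for example a function wrapping webdriver.Chrome(...).
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # runs even if the scraping logic raised an exception
        driver.quit()


class FakeDriver:
    """Stand-in used only to demonstrate the helper without a browser."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        raise RuntimeError('scraping failed')
except RuntimeError:
    pass

print(fake.closed)
```

In the real script, the lambda would be replaced by a function that builds the Chrome driver shown above.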
Step 2: Connect to the target web page
This is what the URL of a Yahoo Finance stock page looks like:
https://finance.yahoo.com/quote/AMZN
As you can see, it is a dynamic URL that changes based on the ticker symbol. If you are not familiar with the concept, that is a string abbreviation used to uniquely identify shares traded in the stock market. For example, “AMZN” is the ticker symbol of the Amazon stock.
Let’s modify the script to make it read the ticker from a command line argument.
import sys
# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)
# read the ticker from the CLI argument
ticker_symbol = sys.argv[1]
# build the URL of the target page
url = f'https://finance.yahoo.com/quote/{ticker_symbol}'
sys is a Python standard library module that provides access to the command-line arguments. Do not forget that the argument with index 0 is the name of your script. Thus, you have to target the argument with index 1.
After the ticker is read from the CLI, it is used in an f-string to produce the target URL to scrape.
For example, assume you launch the scraper with the Tesla ticker “TSLA”:
python scraper.py TSLA
url will contain:
https://finance.yahoo.com/quote/TSLA
If you forget the ticker symbol in the CLI, the program will fail with the error below:
Ticker symbol CLI argument missing!
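The check above only verifies that an argument exists. As a sketch of a slightly stricter approach, the hypothetical build_quote_url() helper below normalizes the ticker and percent-encodes it before building the URL. The accepted character set is an assumption about what Yahoo Finance symbols look like (letters, digits, and the characters . - ^ =), so adjust it if your tickers differ.

```python
import re
from urllib.parse import quote

# assumption: symbols are short and use only these characters;
# this pattern is illustrative, not an official Yahoo Finance rule
TICKER_PATTERN = re.compile(r'^[A-Z0-9.\-^=]{1,15}$')


def build_quote_url(raw_ticker):
    """Normalize a ticker symbol and return the matching quote page URL."""
    ticker = raw_ticker.strip().upper()
    if not TICKER_PATTERN.match(ticker):
        raise ValueError(f'Invalid ticker symbol: {raw_ticker!r}')
    # percent-encode characters such as ^ so the URL stays valid
    return f'https://finance.yahoo.com/quote/{quote(ticker)}'


print(build_quote_url(' tsla '))
print(build_quote_url('^GSPC'))
```

In the script, the helper would be called with sys.argv[1] in place of the plain f-string.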
Before opening any page in Selenium, it is recommended to set the window size to ensure that every element is visible:
driver.set_window_size(1920, 1080)
You can now use Selenium to connect to the target page with:
driver.get(url)
The get() method instructs the browser to visit the desired page.
This is what your Yahoo Finance scraping script looks like so far:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import sys
# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)
# read the ticker from the CLI argument
ticker_symbol = sys.argv[1]
# build the URL of the target page
url = f'https://finance.yahoo.com/quote/{ticker_symbol}'
# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
# set up the window size of the controlled browser
driver.set_window_size(1920, 1080)
# visit the target page
driver.get(url)
# scraping logic...
# close the browser and free up the resources
driver.quit()
If you launch it, it will open a Chrome window on the target page for a fraction of a second before terminating.
Starting the browser with the UI is useful for debugging by monitoring what the scraper is doing on the web page. At the same time, it takes a lot of resources. To avoid that, configure Chrome to run in headless mode with:
from selenium.webdriver.chrome.options import Options
# ...
options = Options()
options.add_argument('--headless=new')
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)
The controlled browser will now be launched behind the scenes, with no UI.
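Since the visible browser remains handy for debugging, it can be convenient to switch between the two modes without editing the code. Here is a minimal sketch, assuming a hypothetical SCRAPER_HEADLESS environment variable (not something Selenium or Chrome reads on its own) that disables headless mode when set to 0:

```python
import os


def chrome_arguments(env=None):
    """Return the Chrome command-line arguments for the current run.

    SCRAPER_HEADLESS is a hypothetical environment variable: set it to 0
    to watch the browser while debugging, leave it unset to run headless.
    """
    env = os.environ if env is None else env
    arguments = []
    if env.get('SCRAPER_HEADLESS', '1') != '0':
        arguments.append('--headless=new')
    return arguments


print(chrome_arguments({}))
print(chrome_arguments({'SCRAPER_HEADLESS': '0'}))
```

Each returned string can then be passed to options.add_argument() before creating the driver.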
Step 3: Inspect the target page
If you want to structure an effective data mining strategy, you must first analyze the target web page. Open your browser and visit the Yahoo stock page.
If you are based in Europe, you will first see a modal asking you to accept the cookies:
To close it and keep visiting the desired page, you must click “Accept all” or “Reject all.” Right-click on the first button and select the “Inspect” option to open the DevTools of your browser:
Here, you will notice that you can select that button with the following CSS selector:
.consent-overlay .accept-all
Use these lines of code to deal with the consent modal in Selenium:
try:
    # wait up to 3 seconds for the consent modal to show up
    consent_overlay = WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.consent-overlay')))
    # click the "Accept all" button
    accept_all_button = consent_overlay.find_element(By.CSS_SELECTOR, '.accept-all')
    accept_all_button.click()
except TimeoutException:
    print('Cookie consent overlay missing')
WebDriverWait allows you to wait for an expected condition to occur on the page. If nothing happens in the specified timeout, it raises a TimeoutException. Since the cookie overlay shows up only when your exit IP is European, you can handle the exception with a try/except block. This way, the script will keep running when the consent modal is not present.
To make the script work, you will need to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common import TimeoutException
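To make the waiting behavior described above more concrete, here is a simplified, browser-free illustration of the polling idea: call a condition repeatedly until it succeeds or a deadline passes. This is not Selenium's actual implementation. The wait_until() function, its poll_interval parameter, and the simulated modal check are all assumptions made for the sake of the example.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met after {timeout} seconds')
        time.sleep(poll_interval)


# simulated page where the element only "appears" on the third check
attempts = {'count': 0}


def modal_is_present():
    attempts['count'] += 1
    return 'modal' if attempts['count'] >= 3 else None


print(wait_until(modal_is_present, timeout=1, poll_interval=0.01))

try:
    # a condition that never succeeds ends in a timeout, as with the
    # consent modal outside Europe
    wait_until(lambda: None, timeout=0.05, poll_interval=0.01)
except TimeoutError:
    print('Cookie consent overlay missing')
```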
Now, keep inspecting the target site in the DevTools and familiarize yourself with its DOM structure.
Step 4: Extract the stock data
As you should have noticed in the previous step, some of the most interesting information is in this section:
Inspect the HTML price indicator element:
Note that CSS classes are not useful for defining proper selectors in Yahoo Finance. They seem to follow a special syntax for a styling framework. Instead, focus on the other HTML attributes. For example, you can get the stock price with the CSS selector below:
[data-symbol="TSLA"][data-field="regularMarketPrice"]
Following a similar approach, extract all stock data from the price indicators with:
regular_market_price = driver.find_element(
    By.CSS_SELECTOR,
    f'[data-symbol="{ticker_symbol}"][data-field="regularMarketPrice"]'
).text
regular_market_change = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChange"]')\
    .text
regular_market_change_percent = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChangePercent"]')\
    .text\
    .replace('(', '').replace(')', '')
post_market_price = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketPrice"]')\
    .text
post_market_change = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChange"]')\
    .text
post_market_change_percent = driver\
    .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChangePercent"]')\
    .text\
    .replace('(', '').replace(')', '')
After selecting an HTML element through the specific CSS selector strategy, you can extract its content with the text attribute. Since the percent fields are wrapped in round parentheses, these are removed with replace().
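Keep in mind that find_element() raises a NoSuchElementException when nothing matches, which would stop the script if a field (for instance, one of the post-market values) is not on the page. The sketch below shows one possible way to tolerate missing elements by relying on find_elements(), which returns an empty list instead. The text_or_none() and scrape_price_fields() helpers, as well as the fake driver used to run the example without a browser, are hypothetical and not part of the original script.

```python
def text_or_none(driver, css_selector):
    """Return the text of the first matching element, or None if absent.

    Relies on find_elements(), which returns an empty list instead of
    raising when nothing matches. 'css selector' is the string value of
    Selenium's By.CSS_SELECTOR constant.
    """
    matches = driver.find_elements('css selector', css_selector)
    return matches[0].text if matches else None


def scrape_price_fields(driver, ticker_symbol, fields):
    """Collect the given data-field values, stripping round parentheses."""
    values = {}
    for field in fields:
        selector = f'[data-symbol="{ticker_symbol}"][data-field="{field}"]'
        text = text_or_none(driver, selector)
        if text is not None:
            text = text.replace('(', '').replace(')', '')
        values[field] = text
    return values


# --- stand-ins so the sketch runs without a browser ---
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def __init__(self, page):
        self.page = page  # maps CSS selector -> element text

    def find_elements(self, by, value):
        return [FakeElement(self.page[value])] if value in self.page else []


page = {
    '[data-symbol="TSLA"][data-field="regularMarketPrice"]': '193.17',
    '[data-symbol="TSLA"][data-field="regularMarketChangePercent"]': '(+4.72%)',
}
result = scrape_price_fields(
    FakeDriver(page),
    'TSLA',
    ['regularMarketPrice', 'regularMarketChangePercent', 'postMarketPrice'],
)
print(result)
```

With a real Selenium driver, the same helpers would receive the driver instance created earlier in place of FakeDriver.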
Add them to a stock dictionary and print it to verify that the process of scraping financial data works as expected:
# initialize the dictionary
stock = {}
# stock price scraping logic omitted for brevity...
# add the scraped data to the dictionary
stock['regular_market_price'] = regular_market_price
stock['regular_market_change'] = regular_market_change
stock['regular_market_change_percent'] = regular_market_change_percent
stock['post_market_price'] = post_market_price
stock['post_market_change'] = post_market_change
stock['post_market_change_percent'] = post_market_change_percent
print(stock)
Run the script on the security you want to scrape and you should see something like:
{'regular_market_price': '193.17', 'regular_market_change': '+8.70', 'regular_market_change_percent': '+4.72%', 'post_market_price': '194.00', 'post_market_change': '+0.83', 'post_market_change_percent': '+0.43%'}
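Note that every scraped value is a string. If you plan to run calculations on the data, you may want to convert the values to numbers first. A minimal sketch, assuming US-style formatting (comma as thousands separator, dot as decimal separator) and using two hypothetical helpers, parse_number() and parse_percent():

```python
def parse_number(text):
    """Convert a scraped string such as '+8.70' or '1,234.5' to a float."""
    # float() already understands a leading + or - sign
    return float(text.strip().replace(',', ''))


def parse_percent(text):
    """Convert a scraped percentage such as '+4.72%' to a float (4.72)."""
    return parse_number(text.strip().rstrip('%'))


stock = {
    'regular_market_price': '193.17',
    'regular_market_change': '+8.70',
    'regular_market_change_percent': '+4.72%',
}
numeric = {
    key: parse_percent(value) if value.endswith('%') else parse_number(value)
    for key, value in stock.items()
}
print(numeric)
```

Fields such as ranges or dates do not fit this conversion and would need their own handling.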
You can find other useful info in the #quote-summary table:
In this case, you can extract each data field thanks to the data-test attribute as in the CSS selector below:
#quote-summary [data-test="PREV_CLOSE-value"]
Scrape them all with:
previous_close = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PREV_CLOSE-value"]').text
open_value = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="OPEN-value"]').text
bid = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BID-value"]').text
ask = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ASK-value"]').text
days_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DAYS_RANGE-value"]').text
week_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="FIFTY_TWO_WK_RANGE-value"]').text
volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="TD_VOLUME-value"]').text
avg_volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="AVERAGE_VOLUME_3MONTH-value"]').text
market_cap = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="MARKET_CAP-value"]').text
beta = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BETA_5Y-value"]').text
pe_ratio = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PE_RATIO-value"]').text
eps = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EPS_RATIO-value"]').text
earnings_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EARNINGS_DATE-value"]').text
dividend_yield = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DIVIDEND_AND_YIELD-value"]').text
ex_dividend_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EX_DIVIDEND_DATE-value"]').text
year_target_est = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ONE_YEAR_TARGET_PRICE-value"]').text
Then, add them to stock:
stock['previous_close'] = previous_close
stock['open_value'] = open_value
stock['bid'] = bid
stock['ask'] = ask
stock['days_range'] = days_range
stock['week_range'] = week_range
stock['volume'] = volume
stock['avg_volume'] = avg_volume
stock['market_cap'] = market_cap
stock['beta'] = beta
stock['pe_ratio'] = pe_ratio
stock['eps'] = eps
stock['earnings_date'] = earnings_date
stock['dividend_yield'] = dividend_yield
stock['ex_dividend_date'] = ex_dividend_date
stock['year_target_est'] = year_target_est
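Since all the summary selectors follow the same pattern, the repeated lines above could also be driven by a single mapping. The sketch below is one possible arrangement, not the structure used in the rest of this tutorial: SUMMARY_FIELDS, summary_selector(), and scrape_summary() are hypothetical names, and the fake driver only exists so the example runs without a browser.

```python
# dictionary key -> prefix of the data-test attribute, copied from the
# selectors used above
SUMMARY_FIELDS = {
    'previous_close': 'PREV_CLOSE',
    'open_value': 'OPEN',
    'bid': 'BID',
    'ask': 'ASK',
    'days_range': 'DAYS_RANGE',
    'week_range': 'FIFTY_TWO_WK_RANGE',
    'volume': 'TD_VOLUME',
    'avg_volume': 'AVERAGE_VOLUME_3MONTH',
    'market_cap': 'MARKET_CAP',
    'beta': 'BETA_5Y',
    'pe_ratio': 'PE_RATIO',
    'eps': 'EPS_RATIO',
    'earnings_date': 'EARNINGS_DATE',
    'dividend_yield': 'DIVIDEND_AND_YIELD',
    'ex_dividend_date': 'EX_DIVIDEND_DATE',
    'year_target_est': 'ONE_YEAR_TARGET_PRICE',
}


def summary_selector(data_test_prefix):
    """Build the CSS selector for one cell of the #quote-summary table."""
    return f'#quote-summary [data-test="{data_test_prefix}-value"]'


def scrape_summary(driver, by='css selector'):
    """Read every summary field; 'css selector' equals By.CSS_SELECTOR."""
    return {
        key: driver.find_element(by, summary_selector(prefix)).text
        for key, prefix in SUMMARY_FIELDS.items()
    }


# --- stand-ins so the sketch runs without a browser ---
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def find_element(self, by, value):
        return FakeElement(f'text for {value}')


summary = scrape_summary(FakeDriver())
print(summary['previous_close'])
```

With a real driver, stock.update(scrape_summary(driver)) would add all sixteen fields to the dictionary at once.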
Fantastic! You just performed financial web scraping with Python!
Step 5: Scrape several stocks
A diversified investment portfolio consists of more than one security. To retrieve data for all of them, you need to extend your script to scrape multiple tickers.
First, encapsulate the scraping logic in a function:
def scrape_stock(driver, ticker_symbol):
    url = f'https://finance.yahoo.com/quote/{ticker_symbol}'
    driver.get(url)
    # deal with the consent modal...
    # initialize the stock dictionary with the
    # ticker symbol
    stock = { 'ticker': ticker_symbol }
    # scraping the desired data and populate
    # the stock dictionary...
    return stock
Then, iterate over the CLI ticker arguments and apply the scraping function:
if len(sys.argv) <= 1:
    print('Ticker symbol CLI arguments missing!')
    sys.exit(2)
# initialize a Chrome instance with the right
# configs
options = Options()
options.add_argument('--headless=new')
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)
driver.set_window_size(1150, 1000)
# the array containing all scraped data
stocks = []
# scraping all market securities
for ticker_symbol in sys.argv[1:]:
    stocks.append(scrape_stock(driver, ticker_symbol))
At the end of the for loop, the stocks list of Python dictionaries will contain all stock market data.
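With the loop above, a single failing ticker (for example, a mistyped symbol) interrupts the whole run. As a sketch of one way to keep going, the hypothetical scrape_all() helper below records failures per ticker instead of stopping. The fake_scrape() function is a stand-in for scrape_stock() so that the example runs without a browser.

```python
def scrape_all(ticker_symbols, scrape_one):
    """Scrape every ticker, recording failures instead of stopping."""
    stocks, failures = [], {}
    for ticker_symbol in ticker_symbols:
        try:
            stocks.append(scrape_one(ticker_symbol))
        except Exception as error:  # broad on purpose: any failure is recorded
            failures[ticker_symbol] = str(error)
    return stocks, failures


# stand-in for scrape_stock(driver, ticker_symbol), used only for the demo
def fake_scrape(ticker_symbol):
    if ticker_symbol == 'BAD':
        raise ValueError('element not found')
    return {'ticker': ticker_symbol}


stocks, failures = scrape_all(['TSLA', 'BAD', 'AMZN'], fake_scrape)
print(stocks)
print(failures)
```

In the real script, the second argument could be a lambda that forwards the ticker to scrape_stock() together with the driver.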
Step 6: Export scraped data to CSV
You can export the collected data to CSV with just a few lines of code:
import csv
# ...
# extract the name of the dictionary fields
# to use it as the header of the output CSV file
csv_header = stocks[0].keys()
# export the scraped data to CSV
with open('stocks.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, csv_header)
    dict_writer.writeheader()
    dict_writer.writerows(stocks)
This snippet creates a stocks.csv file with open(), initializes it with a header row, and populates it. Specifically, DictWriter.writerows() converts each dictionary into a CSV record and appends it to the output file.
Since csv comes from the Python Standard Library, you do not even need to install an extra dependency to achieve the desired goal.
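Taking the header from stocks[0] works as long as the list is not empty and every dictionary has the same keys. If that cannot be guaranteed, a slightly more defensive variant is sketched below. The write_stocks_csv() helper is hypothetical, the second row contains made-up sample values, and an in-memory buffer replaces the real file so the example is easy to run.

```python
import csv
import io


def write_stocks_csv(stocks, output_file):
    """Write a list of dictionaries to an open text file as CSV."""
    if not stocks:
        return  # nothing to export, avoid indexing an empty list
    # collect every key in first-seen order so no row has unknown fields
    header = list(dict.fromkeys(key for stock in stocks for key in stock))
    writer = csv.DictWriter(output_file, fieldnames=header, restval='')
    writer.writeheader()
    writer.writerows(stocks)


stocks = [
    {'ticker': 'TSLA', 'regular_market_price': '193.17'},
    # made-up sample values with one extra field
    {'ticker': 'AMZN', 'regular_market_price': '125.30', 'beta': '1.26'},
]
buffer = io.StringIO(newline='')
write_stocks_csv(stocks, buffer)
print(buffer.getvalue())
```

Rows that lack a field get an empty cell thanks to restval.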
You started from raw data contained in a webpage and now have semi-structured data stored in a CSV file. It is time to take a look at the entire Yahoo Finance scraper.
Step 7: Put it all together
Here is the complete scraper.py file:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common import TimeoutException
import sys
import csv
def scrape_stock(driver, ticker_symbol):
    # build the URL of the target page
    url = f'https://finance.yahoo.com/quote/{ticker_symbol}'
    # visit the target page
    driver.get(url)
    try:
        # wait up to 3 seconds for the consent modal to show up
        consent_overlay = WebDriverWait(driver, 3).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.consent-overlay')))
        # click the 'Accept all' button
        accept_all_button = consent_overlay.find_element(By.CSS_SELECTOR, '.accept-all')
        accept_all_button.click()
    except TimeoutException:
        print('Cookie consent overlay missing')
    # initialize the dictionary that will contain
    # the data collected from the target page
    stock = { 'ticker': ticker_symbol }
    # scraping the stock data from the price indicators
    regular_market_price = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketPrice"]') \
        .text
    regular_market_change = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChange"]') \
        .text
    regular_market_change_percent = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="regularMarketChangePercent"]') \
        .text \
        .replace('(', '').replace(')', '')
    post_market_price = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketPrice"]') \
        .text
    post_market_change = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChange"]') \
        .text
    post_market_change_percent = driver \
        .find_element(By.CSS_SELECTOR, f'[data-symbol="{ticker_symbol}"][data-field="postMarketChangePercent"]') \
        .text \
        .replace('(', '').replace(')', '')
    stock['regular_market_price'] = regular_market_price
    stock['regular_market_change'] = regular_market_change
    stock['regular_market_change_percent'] = regular_market_change_percent
    stock['post_market_price'] = post_market_price
    stock['post_market_change'] = post_market_change
    stock['post_market_change_percent'] = post_market_change_percent
    # scraping the stock data from the "Summary" table
    previous_close = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PREV_CLOSE-value"]').text
    open_value = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="OPEN-value"]').text
    bid = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BID-value"]').text
    ask = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="ASK-value"]').text
    days_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DAYS_RANGE-value"]').text
    week_range = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="FIFTY_TWO_WK_RANGE-value"]').text
    volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="TD_VOLUME-value"]').text
    avg_volume = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="AVERAGE_VOLUME_3MONTH-value"]').text
    market_cap = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="MARKET_CAP-value"]').text
    beta = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="BETA_5Y-value"]').text
    pe_ratio = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="PE_RATIO-value"]').text
    eps = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EPS_RATIO-value"]').text
    earnings_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EARNINGS_DATE-value"]').text
    dividend_yield = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="DIVIDEND_AND_YIELD-value"]').text
    ex_dividend_date = driver.find_element(By.CSS_SELECTOR, '#quote-summary [data-test="EX_DIVIDEND_DATE-value"]').text
    year_target_est = driver.find_element(By.CSS_SELECTOR,
                                          '#quote-summary [data-test="ONE_YEAR_TARGET_PRICE-value"]').text
    stock['previous_close'] = previous_close
    stock['open_value'] = open_value
    stock['bid'] = bid
    stock['ask'] = ask
    stock['days_range'] = days_range
    stock['week_range'] = week_range
    stock['volume'] = volume
    stock['avg_volume'] = avg_volume
    stock['market_cap'] = market_cap
    stock['beta'] = beta
    stock['pe_ratio'] = pe_ratio
    stock['eps'] = eps
    stock['earnings_date'] = earnings_date
    stock['dividend_yield'] = dividend_yield
    stock['ex_dividend_date'] = ex_dividend_date
    stock['year_target_est'] = year_target_est
    return stock
# if there are no CLI parameters
if len(sys.argv) <= 1:
    print('Ticker symbol CLI argument missing!')
    sys.exit(2)
options = Options()
options.add_argument('--headless=new')
# initialize a web driver instance to control a Chrome window
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)
# set up the window size of the controlled browser
driver.set_window_size(1150, 1000)
# the array containing all scraped data
stocks = []
# scraping all market securities
for ticker_symbol in sys.argv[1:]:
    stocks.append(scrape_stock(driver, ticker_symbol))
# close the browser and free up the resources
driver.quit()
# extract the name of the dictionary fields
# to use it as the header of the output CSV file
csv_header = stocks[0].keys()
# export the scraped data to CSV
with open('stocks.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, csv_header)
    dict_writer.writeheader()
    dict_writer.writerows(stocks)
In fewer than 150 lines of code, you built a full-featured web scraper to retrieve data from Yahoo Finance.
Launch it against your target stocks as in the example below:
python scraper.py TSLA AMZN AAPL META NFLX GOOG
At the end of the scraping process, a stocks.csv file will appear in the root folder of your project.
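Once the file exists, it can be read back for further analysis with the same csv module. The sketch below assumes the column names produced by the scraper and uses an in-memory sample with made-up values instead of the real stocks.csv file. The load_stocks() helper is a hypothetical name.

```python
import csv
import io


def load_stocks(csv_file):
    """Read the exported rows back as a list of dictionaries."""
    return list(csv.DictReader(csv_file))


# in-memory stand-in for open('stocks.csv', newline=''); values are made up
sample = io.StringIO(
    'ticker,regular_market_price,regular_market_change_percent\n'
    'TSLA,193.17,+4.72%\n'
    'AMZN,125.30,-0.85%\n',
    newline='',
)
rows = load_stocks(sample)
# keep only the tickers whose daily change is positive
gainers = [
    row['ticker']
    for row in rows
    if row['regular_market_change_percent'].startswith('+')
]
print(gainers)
```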
Conclusion
In this tutorial, you learned why Yahoo Finance is one of the best financial portals on the web and how to extract data from it. In particular, you saw how to build a Python scraper that can retrieve stock data from it. As shown here, it is not complex and takes only a few lines of code.
However, Yahoo Finance is a dynamic site that relies heavily on JavaScript and implements advanced data protection technologies. For seamless data extraction from such sites, consider using our Yahoo Finance Scraper API. This API handles the complexities of scraping, including managing CAPTCHAs, handling fingerprinting, and performing automated retries, allowing you to get structured financial data with ease. Get started with our Yahoo Finance Scraper API today to streamline your data collection process.
Don’t want to deal with web scraping at all but are interested in financial data? Get a Yahoo Finance dataset.
Note: This guide was thoroughly tested by our team at the time of writing, but as websites frequently update their code and structure, some steps may no longer work as expected.