TL;DR: Let’s learn how to build a Yahoo Finance scraper for extracting stock data to perform financial analysis for trading and investing.
This tutorial will cover:
- Why scrape financial data from the Web?
- Finance scraping libraries and tools
- Scraping stock data from Yahoo Finance with Selenium
Why Scrape Financial Data From the Web?
Scraping finance data from the Web offers valuable insights that come in handy in various scenarios, including:
- Automated Trading: By gathering real-time or historical market data, such as stock prices and volume, developers can build automated trading strategies.
- Technical Analysis: Historical market data and indicators are extremely important for technical analysts. These allow them to identify patterns and trends, assisting their investment decision-making.
- Financial Modeling: Researchers and analysts can gather relevant data like financial statements and economic indicators to build complex models for evaluating company performance, forecasting earnings, and assessing investment opportunities.
- Market Research: Financial data provide a great deal of information about stocks, market indices, and commodities. Analyzing this data helps researchers understand market trends, sentiment, and industry health to make informed investment decisions.
When it comes to monitoring the market, Yahoo Finance is one of the most popular finance websites. It provides a wide range of information and tools to investors and traders, such as real-time and historical data on stocks, bonds, mutual funds, commodities, currencies, and market indices. Plus, it offers news articles, financial statements, analyst estimates, charts, and other valuable resources.
By scraping Yahoo Finance, you can access a wealth of information to support your financial analysis, research, and decision-making processes.
Finance Scraping Libraries and Tools
Python is considered one of the best languages for scraping thanks to its syntax, ease of use, and rich ecosystem of libraries. Check out our guide on web scraping with Python.
To choose the right scraping libraries out of the many available, explore Yahoo Finance in your browser. You will notice that most of the data on the site gets updated in real time or changes after an interaction. This means that the site relies heavily on AJAX to load and update data dynamically without requiring page reloads. In other words, you need a tool that is able to run JavaScript.
Selenium makes it possible to scrape dynamic websites in Python. It renders sites in a web browser and programmatically performs operations on them, even if they use JavaScript for rendering or retrieving data.
Thanks to Selenium, you will be able to scrape the target site with Python. Let’s learn how!
Scraping Stock Data From Yahoo Finance With Selenium
Follow this step-by-step tutorial and see how to build a Yahoo Finance web scraping Python script.
Step 1: Setup
Before diving into finance scraping, make sure to meet these prerequisites:
- Python 3+ installed on your machine: Download the installer, double-click on it, and follow the installation wizard.
- A Python IDE of your choice: PyCharm Community Edition or Visual Studio Code with the Python extension will do.
Next, use the commands below to set up a Python project with a virtual environment:
These will initialize the yahoo-finance-scraper project folder. Inside it, add a scraper.py file as below:
You will add the logic to scrape Yahoo Finance here. Right now, it is a sample script that only prints “Hello, World!”
Launch it to verify that it works with:
In the terminal, you should see:
Great, you now have a Python project for your finance scraper. All that remains is to add the project’s dependencies. Install Selenium and the Webdriver Manager with the following terminal command:
This might take a while, so be patient.
webdriver-manager is not strictly required. However, it is highly recommended as it makes managing web drivers in Selenium much easier. Thanks to it, you do not have to manually download, configure, and import the web driver.
Update scraper.py to import Selenium and initialize the web driver. At this point, the script simply creates an instance of the Chrome WebDriver. You will use it soon to implement the data extraction logic.
Step 2: Connect to the target web page
This is what the URL of a Yahoo Finance stock page looks like:
As you can see, it is a dynamic URL that changes based on the ticker symbol. If you are not familiar with the concept, that is a string abbreviation used to uniquely identify shares traded in the stock market. For example, “AMZN” is the ticker symbol of the Amazon stock.
Let’s modify the script to make it read the ticker from a command-line argument.
sys is a Python standard library module that provides access to the command-line arguments. Do not forget that the argument with index 0 is the name of your script. Thus, you have to target the argument with index 1.
After reading the ticker from the CLI, it is used in an f-string to produce the target URL to scrape.
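As a minimal sketch of this step, the snippet below reads the ticker from a list of command-line arguments and builds the page URL with an f-string. The https://finance.yahoo.com/quote/<TICKER> pattern and the helper names read_ticker() and build_quote_url() are assumptions made for illustration, so adapt them to what you see in your browser.

```python
# Assumed URL pattern of a Yahoo Finance stock page
BASE_URL = "https://finance.yahoo.com/quote"


def read_ticker(argv):
    """Return the ticker passed as the first command-line argument."""
    # argv[0] is the script name, so the ticker is at index 1.
    # An IndexError is raised if no ticker was passed.
    return argv[1]


def build_quote_url(ticker):
    """Build the stock page URL for the given ticker with an f-string."""
    return f"{BASE_URL}/{ticker}"


# In the real script you would pass sys.argv; this list stands in for it
example_argv = ["scraper.py", "TSLA"]
url = build_quote_url(read_ticker(example_argv))
print(url)
```

In scraper.py, you would import sys and pass sys.argv to read_ticker() instead of the hard-coded example list.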
For example, assume you launch the scraper with the Tesla ticker “TSLA”:
url will contain:
If you forget the ticker symbol in the CLI, the program will fail with the error below:
Before opening any page in Selenium, it is recommended to set the window size to ensure that every element is visible:
You can now use Selenium to connect to the target page with:
The get() method instructs the browser to visit the desired page.
This is what your Yahoo Finance scraping script looks like so far:
If you launch it, it will open this window for a fraction of a second before terminating:
Starting the browser with the UI is useful for debugging, as you can monitor what the scraper is doing on the web page. At the same time, it takes a lot of resources. To avoid that, configure Chrome to run in headless mode with:
The controlled browser will now be launched behind the scenes, with no UI.
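The pieces of this step can be put together as in the sketch below. It assumes Selenium 4 and webdriver-manager are installed. The helper names chrome_arguments(), create_driver(), and open_page(), the 1920x1080 window size, and the --headless=new flag are illustrative choices, not necessarily the exact values of the original script.

```python
def chrome_arguments(headless=True):
    """Return the Chrome command-line flags used by the scraper."""
    arguments = []
    if headless:
        # Run Chrome without a visible window
        arguments.append("--headless=new")
    return arguments


def create_driver(headless=True, width=1920, height=1080):
    """Start a Chrome instance controlled by Selenium."""
    # Imported here so that chrome_arguments() works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service as ChromeService
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for argument in chrome_arguments(headless):
        options.add_argument(argument)

    driver = webdriver.Chrome(
        service=ChromeService(ChromeDriverManager().install()),
        options=options,
    )
    # A large window helps ensure that every element is visible
    driver.set_window_size(width, height)
    return driver


def open_page(url, headless=True):
    """Visit the given URL and return the title of the loaded page."""
    driver = create_driver(headless)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser and release its resources
        driver.quit()
```

While debugging, you could call create_driver(headless=False) to watch what the browser is doing.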
Step 3: Inspect the target page
If you want to structure an effective data mining strategy, you must first analyze the target web page. Open your browser and visit the Yahoo stock page.
If you are based in Europe, you will first see a modal asking you to accept the cookies:
To close it and keep visiting the desired page, you must click “Accept all” or “Reject all.” Right-click on the first button and select the “Inspect” option to open the DevTools of your browser:
Here, you will notice that you can select that button with the following CSS selector:
Use these lines of code to deal with the consent modal in Selenium:
WebDriverWait allows you to wait for an expected condition to occur on the page. If nothing happens in the specified timeout, it raises a TimeoutException. Since the cookie overlay shows up only when your exit IP is European, you can handle the exception with a try-except block. This way, the script will keep running when the consent modal is not present.
To make the script work, you will need to add the following imports:
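The wait-and-click logic described above could look roughly like this. The selector stored in CONSENT_BUTTON_SELECTOR is a hypothetical placeholder, as are the accept_cookies() name and the 3-second timeout, so replace them with the values you find in DevTools.

```python
# Hypothetical selector: replace it with the one you find in DevTools
CONSENT_BUTTON_SELECTOR = ".consent-overlay .accept-all"


def accept_cookies(driver, selector=CONSENT_BUTTON_SELECTOR, timeout=3):
    """Click the consent button if it shows up; return True when clicked."""
    # Imported here so the function can be defined without Selenium installed
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # Wait up to `timeout` seconds for the button to become clickable
        button = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
        )
        button.click()
        return True
    except TimeoutException:
        # No consent modal appeared, so there is nothing to dismiss
        return False
```

Returning a boolean instead of re-raising the exception lets the rest of the script continue in both cases.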
Now, keep inspecting the target site in the DevTools and familiarize yourself with its DOM structure.
Step 4: Extract the stock data
As you should have noticed in the previous step, some of the most interesting information is in this section:
Inspect the HTML price indicator element:
Note that CSS classes are not useful for defining proper selectors in Yahoo Finance. They seem to follow a special syntax for a styling framework. Instead, focus on the other HTML attributes. For example, you can get the stock price with the CSS selector below:
Following a similar approach, extract all stock data from the price indicators with:
After selecting an HTML element through the specific CSS selector strategy, you can extract its content with the text field. Since the percent fields involve round parentheses, these are removed with replace().
Add them to a stock dictionary and print it to verify that the process of scraping financial data works as expected:
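Here is a hedged sketch of this extraction approach. The data-symbol and data-field attribute names, the field values, and the helper names are assumptions about the page markup at the time of writing, so verify them in DevTools before relying on them.

```python
def price_selector(ticker, field):
    """Build an attribute-based CSS selector for a price indicator."""
    # Assumed markup: elements tagged with data-symbol and data-field
    return f'[data-symbol="{ticker}"][data-field="{field}"]'


def strip_parentheses(text):
    """Remove the round parentheses wrapping a percent value."""
    return text.replace("(", "").replace(")", "")


def scrape_price_indicators(driver, ticker):
    """Read price, change, and percent change from the open stock page."""
    # Imported here so the helpers above work without Selenium installed
    from selenium.webdriver.common.by import By

    def text_of(field):
        selector = price_selector(ticker, field)
        return driver.find_element(By.CSS_SELECTOR, selector).text

    return {
        "regular_market_price": text_of("regularMarketPrice"),
        "regular_market_change": text_of("regularMarketChange"),
        "regular_market_change_percent": strip_parentheses(
            text_of("regularMarketChangePercent")
        ),
    }
```

For instance, strip_parentheses() turns an illustrative value such as "(+1.35%)" into "+1.35%".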
Run the script on the security you want to scrape and you should see something like:
You can find other useful info in the #quote-summary table:
In this case, you can extract each data field thanks to the data-test attribute as in the CSS selector below:
Scrape them all with:
Then, add them to stock:
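One way to sketch the summary table extraction is to loop over a mapping of dictionary keys to data-test values. The values in SUMMARY_FIELDS are assumptions and only cover a few fields, and the helper names are made up for this example.

```python
# Assumed data-test values: verify them in DevTools before relying on them
SUMMARY_FIELDS = {
    "previous_close": "PREV_CLOSE-value",
    "open": "OPEN-value",
    "volume": "TD_VOLUME-value",
    "market_cap": "MARKET_CAP-value",
}


def summary_selector(data_test):
    """Build the CSS selector for a cell of the #quote-summary table."""
    return f'#quote-summary [data-test="{data_test}"]'


def scrape_summary(driver, fields=SUMMARY_FIELDS):
    """Return a dictionary with the text of each summary field."""
    # Imported here so the helpers above work without Selenium installed
    from selenium.webdriver.common.by import By

    return {
        key: driver.find_element(By.CSS_SELECTOR, summary_selector(value)).text
        for key, value in fields.items()
    }
```

You could then merge the result into the existing dictionary with stock.update(scrape_summary(driver)).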
Fantastic! You just performed financial web scraping with Python!
Step 5: Scrape several stocks
A diversified investment portfolio consists of more than one security. To retrieve data for all of them, you need to extend your script to scrape multiple tickers.
First, encapsulate the scraping logic in a function:
Then, iterate over the CLI ticker arguments and apply the scraping function:
At the end of the for loop, the stocks list of Python dictionaries will contain all stock market data.
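The loop over the CLI tickers can be sketched as follows. Here, fake_scrape_stock() is a hypothetical stand-in for the real scraping function and returns placeholder data only, while example_argv stands in for sys.argv.

```python
def scrape_all(tickers, scrape_stock):
    """Apply the scraping function to each ticker and collect the results."""
    stocks = []
    for ticker in tickers:
        stocks.append(scrape_stock(ticker))
    return stocks


def fake_scrape_stock(ticker):
    """Stand-in for the real scraping function (placeholder data only)."""
    return {"ticker": ticker, "regular_market_price": None}


# Stands in for sys.argv: index 0 is the script name, the rest are tickers
example_argv = ["scraper.py", "TSLA", "AMZN"]
stocks = scrape_all(example_argv[1:], fake_scrape_stock)
print(stocks)
```

Passing the scraping function as an argument keeps the loop independent of Selenium, which makes it easy to try out with fake data.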
Step 6: Export scraped data to CSV
You can export the collected data to CSV with just a few lines of code:
This snippet creates a stocks.csv file with open(), initializes it with a header row, and populates it. Specifically, DictWriter.writerows() converts each dictionary into a CSV record and appends it to the output file.
Since csv comes from the Python Standard Library, you do not even need to install an extra dependency to achieve the desired goal.
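A minimal sketch of such an export is shown below. The export_to_csv() name is an assumption, the sample rows are placeholder values rather than real market data, and the demo writes to a temporary folder instead of the project root.

```python
import csv
import os
import tempfile


def export_to_csv(stocks, path="stocks.csv"):
    """Write a list of stock dictionaries to a CSV file."""
    if not stocks:
        return  # nothing to export
    # Use the keys of the first dictionary as the header row
    fieldnames = list(stocks[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(stocks)


# Placeholder rows, not real market data
sample = [
    {"ticker": "TSLA", "regular_market_price": "100.00"},
    {"ticker": "AMZN", "regular_market_price": "200.00"},
]

output_path = os.path.join(tempfile.mkdtemp(), "stocks.csv")
export_to_csv(sample, output_path)

# Read the file back to check what was written
with open(output_path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.DictReader(csv_file))
print(rows)
```

Note that this sketch assumes every dictionary has the same keys as the first one; otherwise, DictWriter raises a ValueError for unexpected fields.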
You started from raw data contained in a webpage and have semi-structured data stored in a CSV file. It is time to take a look at the entire Yahoo Finance scraper.
Step 7: Put it all together
Here is the complete scraper.py file:
In less than 150 lines of code, you built a full-featured web scraper to retrieve data from Yahoo Finance.
Launch it against your target stocks as in the example below:
At the end of the scraping process, this stocks.csv file will appear in the root folder of your project:
Conclusion
In this tutorial, you learned why Yahoo Finance is one of the best financial portals on the web and how to extract data from it. In particular, you saw how to build a Python scraper that can retrieve stock data from it. As shown here, it is not complex and takes only a few lines of code.
However, Yahoo Finance is a dynamic site that relies heavily on JavaScript and implements advanced data protection technologies. For seamless data extraction from such sites, consider using our Yahoo Finance Scraper API. This API handles the complexities of scraping, including managing CAPTCHAs, handling fingerprinting, and performing automated retries, allowing you to get structured financial data with ease. Get started with our Yahoo Finance Scraper API today to streamline your data collection process.
Don’t want to deal with web scraping at all but are interested in financial data? Get a Yahoo Finance dataset.
Note: This guide was thoroughly tested by our team at the time of writing, but as websites frequently update their code and structure, some steps may no longer work as expected.