In this Excel scraping guide, you will see:
- What VBA is.
- Whether a custom VBA module allows you to retrieve data from the Web.
- How to deal with web scraping in Excel after Internet Explorer's end of support.
- How to write VBA code for web scraping using Selenium.
- The legacy approach to online data retrieval in Excel.
Let’s dive in!
What Is VBA?
VBA, short for Visual Basic for Applications, is a programming language developed by Microsoft. Its goal is to enable users to automate tasks and write custom functions in various Microsoft Office applications, including Excel, Word, and Access.
In Excel, VBA can be used to define complex macros that interact with data in spreadsheet cells. With VBA, you can improve productivity, simplify workflows, and extend Excel’s capabilities beyond its built-in functions.
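As a concrete illustration, here is a minimal macro (a hypothetical example, not part of the scraper built later) that shows how VBA can write values into spreadsheet cells programmatically:

```vba
' illustrative macro: write the first ten square numbers
' into column A of the active sheet
Sub fill_squares()
    Dim i As Integer
    For i = 1 To 10
        Cells(i, 1).Value = i * i
    Next i
End Sub
```

The same Cells() mechanism is what the web scraping script below uses to store the extracted data.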
Is It Possible to Perform Excel Web Scraping With VBA?
Yes, VBA opens the door to web scraping directly in Excel. Thanks to VBA, you can write custom modules that:
- Connect to a web page in a browser.
- Parse its HTML content.
- Extract data from it.
- Write the scraped data directly within Excel cells.
This is exactly what web scraping is all about. Thus, scraping in VBA is not only possible but also efficient, as it imports data directly into Excel. Learn more about the pros and cons of scraping in Excel in the next section!
Pros and Cons of Excel Web Scraping
Before digging into how to write VBA code for web scraping, let’s see some advantages and disadvantages of web scraping in Excel with VBA.
👍 Pros
- Access to Excel features: Scraped data is directly imported into Excel, enabling you to use Excel's powerful features for data manipulation and analysis.
- Ready to go: Microsoft Office comes with VBA support out of the box. With just Microsoft Office 365 installed on your PC, you have everything you need to do web scraping.
- All-in-one automation: With a VBA web scraping script, you can automate a complete data collection task, from data retrieval to data representation in Excel.
👎 Cons
- Some features are only available on Windows: VBA packages for ActiveX controls and COM automation are only available on Windows and do not work on Office for Mac.
- VBA feels old: Visual Basic is not the most modern programming language around. If you have never worked with it, you may find it difficult to use due to its old syntax and approach to coding.
- Internet Explorer is now deprecated: VBA COM automation to control a web browser is based on Internet Explorer, which is now deprecated.
The last pitfall is significant, so it deserves a more in-depth analysis.
How to Deal With Internet Explorer Deprecation in VBA Web Scraping?
The legacy approach to writing VBA code for web scraping relied on the COM automation interface exposed by the InternetExplorer object. This provided everything you needed to visit a site in Internet Explorer, parse its HTML content, and extract data from it. The problem is that Internet Explorer has not been supported since June 15, 2022.
In other words, most recent versions of Windows do not even come with Internet Explorer. As a result, using the InternetExplorer object in a VBA script will result in an error. Since Edge is the replacement for Internet Explorer, you might be thinking about using the equivalent object in VBA. However, Edge does not come with the COM automation interface. Thus, you cannot control it programmatically as you did with Internet Explorer.
Instead, Edge supports automation via a web driver that can be controlled through browser automation technologies such as Selenium. So, the currently supported way to perform Excel VBA web scraping is using the Selenium binding for VBA. This allows you to control a browser like Chrome, Edge, or Firefox.
In the section below, you will build an Excel web scraping script using Selenium and Edge. Later, you will also see the code snippet of the legacy approach, which does not require any third-party dependencies.
How To Write VBA Code for Web Scraping Using Selenium
In this tutorial section, you will learn how to perform Excel web scraping in VBA using SeleniumBasic, the Selenium API binding for VBA.
The target site will be the Scrape This Site Country sandbox, which contains a list of all the countries in the world:
The goal of the VBA scraper will be to automatically retrieve this data and write it into an Excel spreadsheet.
Time to write some VBA code!
Prerequisites
Make sure you have the latest version of Microsoft Office 365 installed on your machine. This section refers to Windows 11 and the Office 2024 update, but the steps below will be the same or similar for other versions of Windows and Office.
Note that the desktop version of Office is required to follow the tutorial. The free Microsoft 365 web platform does not support VBA scripting.
Step #1: Install and Set Up SeleniumBasic
Download the SeleniumBasic installer from the release page of the GitHub repository:
Double-click on the .exe installer and wait for the installation process to complete.
As with most VBA packages, SeleniumBasic has not received an update in years. As a result, it ships with web drivers that no longer work with recent browser versions. To fix that, you need to manually replace the driver executables in the SeleniumBasic installation folder.
Here, you will see how to replace the Edge web driver, but you can follow the same procedure for Chrome and Firefox.
First, download the latest Stable version of the Microsoft Edge WebDriver:
The “x64” version should be what you are looking for.
You should now have an edgedriver_win64.zip file. Unzip it, enter the unzipped folder, and make sure it contains a msedgedriver.exe executable. That is the Edge WebDriver executable.
Rename it to “edgedriver.exe” and get ready to place it in the right folder.
Open the SeleniumBasic installation folder you should find at:
C:\Users\<YOUR_USERNAME>\AppData\Local\SeleniumBasic
Place edgedriver.exe inside this folder, replacing the existing Edge WebDriver executable.
Wonderful! SeleniumBasic will now be able to control the latest version of Edge in Excel.
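If you want to verify that the driver replacement worked, you can later run a quick smoke test like the sketch below. Note that it assumes the "Selenium Type Library" reference is already enabled, as covered in Step #5:

```vba
' smoke test: if the driver override worked, this opens
' and closes an Edge window without raising an error
Sub test_edge_driver()
    Dim driver As New WebDriver
    driver.Start "Edge"
    driver.Get "https://www.example.com"
    driver.Quit
End Sub
```

If this raises a driver-related error, double-check that the renamed edgedriver.exe matches your installed Edge version.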
Step #2: Launch Excel
Open the Windows Start Menu, type “Excel,” and click on the “Excel” app. Select the “Blank workbook” option to create a new spreadsheet:
By the end of this tutorial, this spreadsheet will contain the scraped data.
Step #3: Enable the Developer Tab
If you look at the tab bar at the top, you will not see any option to create a VBA script. That is because you must first enable it in Excel's settings.
To do so, click on “File” in the top left corner:
Then, select “Options” as below:
In the “Options” popup window, reach the “Customize Ribbon” tab, and check the “Developer” option in the “Main Tabs” section:
Press “OK” and a new “Developer” tab will appear:
Step #4: Initialize a VBA Web Scraping Module
Click on the “Developer” tab and press the “Visual Basic” button:
This will open the following window:
Here, click “Insert” on the top menu and then “Module” to initialize your VBA scraping module:
This is what you should be seeing right now:
The “Book1 – Module1 (Code)” inner window is where you should write your VBA code for web scraping.
Step #5: Import SeleniumBasic
In the top menu, click “Tools” and then “References…”
In the popup window, find the “Selenium Type Library” and check it:
Click the “OK” button, and you will now be able to use Selenium in Excel to perform web scraping.
Step #6: Automate Edge to Open the Target Site
Paste the code below in the VBA module window:
Sub scrape_countries()
' initialize a Selenium WebDriver instance
Dim driver As New WebDriver
' open a new Edge window
driver.Start "Edge"
' navigate to the target page
driver.Get "https://www.scrapethissite.com/pages/simple/"
' wait 10 seconds before shutting down the application
Application.Wait Now + TimeValue("00:00:10")
' close the current driver window
driver.Quit
End Sub
This initializes a Selenium instance and uses it to instruct Edge to visit the target page. Test the code by clicking on the run button:
That will open the following Edge window:
Note the “Microsoft Edge is being controlled by automated test software.” message, which confirms that Selenium is controlling Edge as desired.
If you do not want the Edge window to show up, you can enable headless mode by adding this line before the driver.Start instruction:
driver.SetCapability "ms:edgeOptions", "{""args"":[""--headless""]}"
Step #7: Inspect the HTML Code of the Page
Web scraping involves selecting HTML elements on a page and collecting data from them. CSS selectors are among the most popular methods for selecting HTML nodes. If you are a web developer, you should already be familiar with them. Otherwise, explore the official documentation.
To define effective CSS selectors, you must first get familiar with the HTML of the target page. So, open the Scrape This Site Country sandbox in the browser, right-click on a country element, and select the “Inspect” option:
Here, you can see that each country HTML element is a <div> that you can select with the following CSS selector:
.country
Given a .country HTML node, you should target:
- The country name in the .country-name element.
- The name of the capital in the .country-capital element.
- The population information in the .country-population element.
- The area in km² occupied by the country in the .country-area element.
Those are all the CSS selectors you need to select the desired HTML nodes and extract data from them. See how to use them in the next step!
Step #8: Write the VBA Web Scraping Logic
Use the FindElementsByCss() method exposed by driver to apply the .country CSS selector and select all country HTML nodes on the page:
Dim countryHTMLElements As WebElements
Set countryHTMLElements = driver.FindElementsByCss(".country")
Then, define an integer counter to keep track of the current Excel row to write data in:
Dim currentRow As Integer
currentRow = 1
Next, iterate over the country HTML nodes, extract the desired data from them, and write it to Excel cells using the Cells() function:
For Each countryHTMLElement In countryHTMLElements
' where to store the scraped data
Dim name As String, capital As String, population As String, area As String
' data retrieval logic
name = countryHTMLElement.FindElementByCss(".country-name").Text
capital = countryHTMLElement.FindElementByCss(".country-capital").Text
population = countryHTMLElement.FindElementByCss(".country-population").Text
area = countryHTMLElement.FindElementByCss(".country-area").Text
' write the scraped data in Excel cells
Cells(currentRow, 1).Value = name
Cells(currentRow, 2).Value = capital
Cells(currentRow, 3).Value = population
Cells(currentRow, 4).Value = area
' increment the row counter
currentRow = currentRow + 1
Next countryHTMLElement
Amazing! You are ready to take a look at the final Excel web scraping code.
Step #9: Put It All Together
Your VBA web scraping module should now contain:
Sub scrape_countries()
' initialize a Selenium WebDriver instance
Dim driver As New WebDriver
' enable the "headless" mode
driver.SetCapability "ms:edgeOptions", "{""args"":[""--headless""]}"
' open a new Edge window
driver.Start "Edge"
' navigate to the target page
driver.Get "https://www.scrapethissite.com/pages/simple/"
' select all country HTML nodes on the page
Dim countryHTMLElements As WebElements
Set countryHTMLElements = driver.FindElementsByCss(".country")
' counter to keep track of the current row
Dim currentRow As Integer
currentRow = 1
' iterate over each country HTML node and
' apply the Excel scraping logic
For Each countryHTMLElement In countryHTMLElements
' where to store the scraped data
Dim name As String, capital As String, population As String, area As String
' data retrieval logic
name = countryHTMLElement.FindElementByCss(".country-name").Text
capital = countryHTMLElement.FindElementByCss(".country-capital").Text
population = countryHTMLElement.FindElementByCss(".country-population").Text
area = countryHTMLElement.FindElementByCss(".country-area").Text
' write the scraped data in Excel cells
Cells(currentRow, 1).Value = name
Cells(currentRow, 2).Value = capital
Cells(currentRow, 3).Value = population
Cells(currentRow, 4).Value = area
' increment the row counter
currentRow = currentRow + 1
Next countryHTMLElement
' close the current driver window
driver.Quit
End Sub
Launch it and wait for the module execution to complete. At the end of the VBA script, the Excel spreadsheet will contain:
This file contains the same data as the target site but in a structured format. Analyzing and filtering this data will now be much easier thanks to the capabilities offered by Excel.
Et voilà! In less than 100 lines of VBA code, you just performed Excel web scraping!
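As an optional follow-up, you could label the scraped columns with a small formatting macro like the sketch below. The header names are arbitrary choices, not part of the scraped data:

```vba
' optional post-processing: add a bold header row
' above the scraped data and autofit the columns
Sub format_scraped_data()
    Rows(1).Insert
    Cells(1, 1).Value = "Country"
    Cells(1, 2).Value = "Capital"
    Cells(1, 3).Value = "Population"
    Cells(1, 4).Value = "Area (km2)"
    Rows(1).Font.Bold = True
    Columns("A:D").AutoFit
End Sub
```

Run it after the scraper completes to make the spreadsheet easier to read.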
Find out more in our guide on how web scraping works in Excel.
Legacy Approach to Web Scraping in VBA with Internet Explorer
If you are using an older version of Windows, you can perform web scraping in VBA with Internet Explorer.
All you have to do is enable the “Microsoft HTML Object Library” and “Microsoft Internet Controls” references:
Note that these two packages are built into Excel, so you do not need to install extra libraries this time.
You can then achieve the same result as before with the following VBA code for web scraping:
Sub scrape_countries()
' start Internet Explorer
Dim browser As InternetExplorer
Set browser = New InternetExplorer
' enable the "headless" mode
browser.Visible = False
'visit the target page
browser.navigate "https://www.scrapethissite.com/pages/simple/"
' wait for the browser to load the page
Do: DoEvents: Loop Until browser.readyState = 4
' get the current page
Dim page As HTMLDocument
Set page = browser.document
' retrieve all country HTML nodes on the page
Dim countryHTMLElements As Object
Set countryHTMLElements = page.getElementsByClassName("country")
' counter to keep track of the current row
Dim currentRow As Integer
currentRow = 1
' iterate over each country HTML node and
' apply the Excel scraping logic
For Each countryHTMLElement In countryHTMLElements
' where to store the scraped data
Dim name As String, capital As String, population As String, area As String
' data retrieval logic
name = countryHTMLElement.getElementsByClassName("country-name")(0).innerText
capital = countryHTMLElement.getElementsByClassName("country-capital")(0).innerText
population = countryHTMLElement.getElementsByClassName("country-population")(0).innerText
area = countryHTMLElement.getElementsByClassName("country-area")(0).innerText
' write the scraped data in Excel cells
Cells(currentRow, 1).Value = name
Cells(currentRow, 2).Value = capital
Cells(currentRow, 3).Value = population
Cells(currentRow, 4).Value = area
' increment the row counter
currentRow = currentRow + 1
Next countryHTMLElement
' close the current Internet Explorer window
browser.Quit
End Sub
Run this VBA module, and you will get the same result as before. Fantastic! You just did Excel web scraping with Internet Explorer.
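One caveat: the ready-state loop above will spin forever if the page never finishes loading. As a more defensive variant (a sketch, not part of the original script), you could bound the wait with a deadline:

```vba
' visit a page in Internet Explorer, but give up
' after roughly 15 seconds instead of looping forever
Sub visit_with_timeout()
    Dim browser As InternetExplorer
    Set browser = New InternetExplorer
    browser.Visible = False
    browser.navigate "https://www.scrapethissite.com/pages/simple/"
    Dim deadline As Date
    deadline = Now + TimeValue("00:00:15")
    Do While browser.readyState <> 4
        DoEvents
        If Now > deadline Then Exit Do
    Loop
    browser.Quit
End Sub
```

This way, a hung page results in a clean exit rather than a frozen Excel session.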
Conclusion
In this guide, you learned what VBA is and why it allows you to do web scraping in Excel. The problem is that the VBA browser automation library relies on Internet Explorer, which is no longer supported. Here, you explored an equivalent approach to automate data retrieval in Excel with VBA using Selenium. Also, you saw the legacy approach for older versions of Windows.
At the same time, keep in mind that there are many challenges to take into account when extracting data from the Internet. In particular, most sites adopt anti-scraping and anti-bot solutions that can detect and block your VBA web scraping script. Avoid them all with our Scraping Browser solution. This next-generation browser integrates with Selenium and can automatically handle CAPTCHA resolution, browser fingerprinting, automated retries, and more for you!
Don’t want to deal with web scraping at all but are interested in Excel data? Explore our ready-to-use datasets. Not sure what data solution to choose? Contact us today!