In this XPath vs CSS Selector guide, you will learn:
- What XPath expressions are, how they work, and their advantages and drawbacks.
- What CSS selectors are, how they work, and their pros and cons.
- How XPath expressions and CSS selectors compare when it comes to performance, simplicity, and use cases.
Time to dive in!
XPath: Complete Analysis
Let’s start this XPath vs CSS selector guide by diving into the first element of the comparison, XPath.
Definition
XPath, short for XML Path Language, is a query language to navigate and query the DOM. In particular, it provides a powerful way to locate and extract information from XML/HTML documents.
XPath has a syntax that resembles that of a file system, relying on expressions to locate nodes in the XML/HTML tree. An XPath expression defines the path to specific elements and attributes within the document’s hierarchical structure.
Syntax
Below is a breakdown of the key components of the XPath syntax:
- `/`: To start selecting nodes from the root node.
- `//`: To select nodes in the document from the current node that match the selection, regardless of their location.
- `.`: To select the current node.
- `..`: To select the parent of the current node.
- `@`: To select node attributes.
- `element`: To select nodes based on a specific tag (e.g., `div`).
- `[condition]`: To select nodes based on a specified condition (e.g., `[@type="submit"]`).
- `function()`: To apply a specific XPath function on the expression (e.g., `text()` returns the text content of the selected node).
Some examples to better understand the syntax of XPath are:
- `//a`: Selects all `<a>` elements in the document.
- `//ul/li`: Selects all `<li>` elements that are children of `<ul>` elements.
- `//ul/..`: Selects all parent nodes of `<ul>` elements.
- `//ul/li[@category='fiction']`: Selects all `<li>` elements under `<ul>` tags with a `category` attribute equal to `'fiction'`.
- `//title[@lang='en']`: Selects all `<title>` elements with a `lang` attribute equal to `'en'` anywhere in the document.
- `//title/text()`: Retrieves the text content of all `<title>` elements in the document.
- `//div[contains(@class, 'post')]/following-sibling::div[1]`: Selects the first `<div>` element that follows, as a sibling, each `<div>` element whose class contains `'post'`.
Note that XPath expressions also support boolean and arithmetic operators to combine multiple functions and conditions.
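To see a few of these expressions at work outside the browser, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module. The sample document and variable names are made up for illustration. Keep in mind that ElementTree implements only a subset of XPath: paths applied to an element must start with `.`, and features such as `text()`, `contains()`, and the `following-sibling::` axis are not available, so the text is read through the `.text` attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = """
<html>
  <body>
    <ul>
      <li category="fiction"><a href="/dune">Dune</a></li>
      <li category="science"><a href="/cosmos">Cosmos</a></li>
    </ul>
    <ul>
      <li category="fiction"><a href="/emma">Emma</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# //a -> every <a> element (ElementTree needs the leading ".").
links = root.findall(".//a")

# //ul/li -> every <li> that is a direct child of a <ul>.
items = root.findall(".//ul/li")

# //ul/li[@category='fiction'] -> only the <li> elements in the "fiction" category.
fiction = root.findall(".//ul/li[@category='fiction']")

# //ul/.. -> the parent nodes of the <ul> elements.
parents = root.findall(".//ul/..")

# ElementTree has no text() step, so read the .text attribute instead.
fiction_titles = [li.find("a").text for li in fiction]

print(len(links), len(items), [p.tag for p in parents], fiction_titles)
```

A full XPath engine, such as the one built into browsers, would accept the original expressions unchanged. The subset used here is only meant to make the selection logic tangible.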
Pros
- High versatility: It allows you to navigate through both XML and HTML structures, enabling precise targeting of elements, attributes, and text nodes. It also supports both forward and backward traversal of the DOM as well as parent and sibling node selection.
- Many functions and operators: It comes with a rich set of built-in functions (e.g., `contains()`, `concat()`, `count()`) and operators (e.g., `+`, `or`, `and`) for manipulating and comparing data within XML/HTML documents.
- Support for both absolute and relative paths: XPath expressions describe the path to the desired nodes from the document’s root (absolute paths) or from a specific element (relative paths).
- Support for text node selection: It enables the direct selection of text nodes, opening the door to the extraction of textual content from XML/HTML documents without the need for additional processing or parsing.
- Platform independence: It is not tied to a specific programming language or platform, supporting a wide range of environments, libraries, browsers, and operating systems.
Cons
- Intricate and long syntax: The syntax of XPath can be challenging, especially for beginners. Writing the path to a specific node deeply nested in the DOM can result in a long expression that may involve some functions and operators. This can make XPath expressions error-prone and difficult to debug.
- Limited support and popularity: Not all HTML parsing libraries support XPath. This is because CSS selectors are much more popular among web developers, and libraries tend to focus on them. Plus, most XPath-based libraries like HtmlAgilityPack still rely on XPath 1.0, released in 1999. The current version is XPath 3.1, released in 2017. Read our guide on HtmlAgilityPack to become an expert in web scraping with C#.
Tips and Tricks
Chrome allows you to test and retrieve XPath expressions directly in the browser.
Suppose you are interested in selecting a specific element on a webpage. Visit it in Chrome, right-click on the node of interest, and select “Inspect:”
Right-click on the specific DOM element and choose “Copy > Copy XPath” to get an XPath expression for it. In the example above, you will get:
//*[@id="site-content"]/section[1]/div/div/div[1]/div[4]/a[1]
Note: This is useful to get an idea of how to construct an effective XPath selection strategy. At the same time, automatically generated XPath expressions tend to be too long and implementation-oriented. So, you cannot rely on them in production.
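The fragility of position-based expressions can be sketched with a small experiment. The two page versions, the `signup` ID, and the `first_href()` helper below are hypothetical, and the paths are written for the XPath subset of Python's standard `xml.etree.ElementTree` module. The positional path mimics the style of a generated expression, while the second path relies only on an attribute.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page: the second adds a banner <div>.
PAGE_V1 = """
<html><body>
  <div id="site-content">
    <div><a id="signup" href="/signup">Start free trial</a></div>
  </div>
</body></html>
"""

PAGE_V2 = """
<html><body>
  <div id="site-content">
    <div class="banner">New!</div>
    <div><a id="signup" href="/signup">Start free trial</a></div>
  </div>
</body></html>
"""

# Positional path, in the style of what "Copy XPath" produces.
POSITIONAL = ".//*[@id='site-content']/div[1]/a"
# Attribute-based path that does not depend on sibling order.
BY_ATTRIBUTE = ".//a[@id='signup']"


def first_href(page, path):
    """Return the href of the first match, or None when nothing matches."""
    match = ET.fromstring(page).find(path)
    return None if match is None else match.get("href")


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, first_href(page, POSITIONAL), first_href(page, BY_ATTRIBUTE))
```

In this sketch, the positional path stops matching as soon as the banner shifts the link's container to the second position, while the attribute-based path keeps working on both versions.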
Now, you want to test an XPath expression on the page. In Chrome, there are two ways to do that.
First, open the search bar in the “Elements” panel of DevTools with CTRL/Command + F and paste the XPath expression into it:
Second, call it in the console with the special function $x():
CSS Selectors: In-Depth Review
Continue this XPath vs CSS selector article by exploring the second element of the comparison, CSS selectors.
Definition
CSS selectors are patterns for selecting HTML elements within a webpage. They are part of CSS, where they determine which elements a style rule applies to. Similarly, headless browser tools and HTML parsing libraries support them as a way to select nodes in the DOM.
A CSS selector can target single elements or groups of elements based on their ID, class, attributes, and position in the document tree. While CSS selectors play a crucial role in applying styles and formatting to web pages, they are also a great tool when it comes to web scraping.
Syntax
The best way to explain the syntax of CSS selectors is to showcase them through some examples:
- Element selector: To target elements based on their tag name. For example, `p` selects all `<p>` elements in the DOM.
- Class selector: To target elements with a specific class attribute. For example, `.highlight` selects all elements with the `class="highlight <other_classes>"` HTML attribute.
- ID selector: To target a specific element given its ID attribute. For example, `#navbar` selects the element with `id="navbar"`.
- Attribute selector: To target elements based on their attributes. For example, `input[type="text"]` selects all `<input>` elements with the `type="text"` attribute.
- Descendant selector: To target elements that are descendants of another element. For example, `div a` selects all `<a>` elements that are descendants of `<div>` elements.
- Child selector: To target elements that are direct children of another element. For example, `ul > li` selects all `<li>` elements that are direct children of `<ul>` elements.
- Adjacent sibling selector: To target an element that is immediately preceded by a specified sibling element. For example, `h2 + p` selects the `<p>` element immediately following an `<h2>` element.
Keep in mind that different browsers provide different implementations of the CSS standard. Check sites such as caniuse.com for information on the compatibility of a specific CSS operator or syntax.
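To make the meaning of the selectors listed above concrete, the sketch below reproduces what each one targets on a small made-up page. Python's standard library does not ship a CSS selector engine, so this is not a CSS implementation. Each selector is paired with an `xml.etree.ElementTree` path that happens to target the same nodes in this sample, and the two helper functions (`select_by_class()` and `select_adjacent()`, both hypothetical) are rough stand-ins for the class and adjacent sibling selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, made up for this illustration.
PAGE = """
<html><body>
  <div id="navbar"><a href="/">Home</a><a href="/blog">Blog</a></div>
  <h2>Intro</h2>
  <p class="lead highlight">First paragraph</p>
  <p>Second paragraph</p>
  <ul><li>One</li><li>Two</li></ul>
  <form><input type="text" name="q"/><input type="submit"/></form>
</body></html>
"""

root = ET.fromstring(PAGE)

# Each key is a CSS selector from the list above; each value is an
# ElementTree path that targets the same nodes in this sample page.
CSS_TO_PATH = {
    "p": ".//p",
    "#navbar": ".//*[@id='navbar']",
    'input[type="text"]': ".//input[@type='text']",
    "div a": ".//div//a",
    "ul > li": ".//ul/li",
}
matches = {css: root.findall(path) for css, path in CSS_TO_PATH.items()}


def select_by_class(tree, name):
    """Rough stand-in for `.name`: match one token of the class attribute."""
    return [el for el in tree.iter() if name in el.get("class", "").split()]


def select_adjacent(tree, first, second):
    """Rough stand-in for `first + second`: a <second> right after a <first>."""
    found = []
    for parent in tree.iter():
        children = list(parent)
        for prev, cur in zip(children, children[1:]):
            if prev.tag == first and cur.tag == second:
                found.append(cur)
    return found


for css, nodes in matches.items():
    print(css, "->", len(nodes), "match(es)")
print(".highlight ->", [el.text for el in select_by_class(root, "highlight")])
print("h2 + p ->", [el.text for el in select_adjacent(root, "h2", "p")])
```

In a real scraping script, a library or browser with native CSS selector support would take the selector strings directly. The mapping above only illustrates which nodes each selector type is meant to reach.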
Pros
- Excellent performance: Most browsers have a dedicated CSS selector engine that ensures high performance. This engine is used primarily for styling, but can also come in handy when using CSS selectors on a page via a browser automation tool.
- Quick to learn: The learning curve for CSS selectors is gentle, even for beginners, thanks to their intuitive syntax.
- Easy and well-known syntax: They have a concise syntax that does not involve complex operators or functions. Plus, most web developers already know them from styling, so applying them to other tasks takes little extra effort.
- Great maintainability: CSS selectors are designed to be easy to read and update, simplifying code maintenance.
- Overall compatibility: Modern web browsers and the best web scraping tools support them. This ensures consistent node selection across different platforms, devices, and use cases without the need for environment-specific workarounds.
Cons
- Do not support advanced functions and operators: As opposed to XPath, CSS selectors are pretty straightforward and do not have many functions or operators. For example, you cannot use them to select text nodes or extract data from the DOM.
- Do not support upward DOM tree traversal: They can look for elements in the DOM only starting from the root node and moving downward.
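For contrast, this is what the upward step looks like in XPath. The snippet is a minimal sketch based on Python's standard `xml.etree.ElementTree` module, which supports the `..` step, and the sample catalog is made up for illustration. The expression first matches a child node and then climbs to its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = """
<catalog>
  <ul id="novels">
    <li category="fiction">Dune</li>
  </ul>
  <ul id="essays">
    <li category="science">Cosmos</li>
  </ul>
</catalog>
"""

root = ET.fromstring(DOC)

# Match the <li> first, then step up to its parent with "..".
parent_lists = root.findall(".//li[@category='fiction']/..")
parent_ids = [ul.get("id") for ul in parent_lists]

print(parent_ids)
```

Here the result is the `<ul>` that contains a fiction item, reached by starting from the `<li>` rather than from the list itself.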
Tips and Tricks
Just as in the case of XPath, Chrome can test and generate CSS selectors directly on a page.
Assume you are interested in writing a CSS selector to target a specific node. Visit the destination page in Chrome, right-click on the element of interest, and select “Inspect”:
Right-click on the specific DOM element and opt for “Copy > Copy selector” to get a full CSS selector for it. In the example above, you will receive:
#site-content > section.cta.bg-dark.pt-7.pt-md-8.pt-lg-9.pt-xl-10.pb-6.pb-xl-7.text-center > div > div > div.cta_btns.d-flex.flex-wrap.g-2.justify-content-center.justify-content-md-center > a
As you can see, it is too long and implementation-specific. Although it is useful to get an idea, do not use the CSS selectors generated with this function in production.
Let’s say you need to test a CSS selector on a webpage. In Chrome, there are a few ways to do that.
The first approach is to paste the CSS selector into the search bar as below, which can be activated with the CTRL/Command + F shortcut:
The second one is to test them in the console by using these special functions:
- `$()`: To select a single element with the specified CSS selector.
- `$$()`: To select all matching elements.
Use them as in the following example:
Equivalently, you can use the `querySelector()` and `querySelectorAll()` JavaScript functions:
XPath vs CSS Selector: Direct Comparison
Now that you know what XPath and CSS selectors are, you are ready to dig into the XPath vs CSS selector analysis.
For a head-to-head comparison at a glance, check out the summary table below:
| Aspect | XPath | CSS Selectors |
| --- | --- | --- |
| W3C standard | Yes | Yes |
| Latest specification | XPath 3.1 (2017) | CSS Level 4 (constantly being updated) |
| Compatibility | Most browsers and scraping tools still support XPath 1.0 | Most browsers and scraping tools support it in its latest specification |
| Syntax | Complex and verbose | Simple and concise |
| Functions and operators | Many | A few |
| Text node selection | Supported | Not supported |
| Performance in the browser | Medium/Slow | Fast |
| Library support | Usually supported by XML parsing libraries | Usually supported by most HTML parsing libraries |
Simplicity
XPath syntax generally seems much more complex compared to CSS selectors. Its syntax resembles a path-based querying language, which involves a steep learning curve for developers not familiar with it. However, XPath offers precise control over element selection and traversal.
CSS selectors are generally simpler and more intuitive when it comes to selecting DOM elements. They use familiar patterns such as tag names, classes and IDs, making them easy to understand and use even by beginners. CSS selectors are widely adopted in web development, which makes their syntax quite familiar.
Speed
As shown by a benchmark, XPath expressions applied to DOM trees in a browser tend to be slower than CSS selectors. The reason is that XPath engines usually have to perform more complex traversal operations than CSS selector engines. Moreover, most modern browsers have highly optimized CSS selector engines, which enable efficient selection of HTML elements. As for HTML parsing libraries, the performance differences depend on the underlying implementation.
Use Cases
XPath is great for querying and navigating XML documents using XSLT or for simple data extraction. Its advanced capabilities can prove useful in particular scraping scenarios, such as when targeting parent nodes. CSS selectors are predominantly used for styling HTML documents and selecting nodes in modern web scraping scripts.
Conclusion
XPath or CSS selectors? In this guide to XPath and CSS selectors, you learned that they are both effective methods for selecting DOM elements. XPath focuses more on XML documents and provides advanced features, while CSS selectors work great on HTML pages and are simpler.
When using XPath expressions and CSS selectors in web scraping, the real problem is getting blocked by anti-bot technologies. Regardless of the node selection strategy you adopt, these systems can detect and block your automated scraping script. Fortunately, Bright Data offers several top-notch solutions for you:
- Web Scraper API: Easy-to-use APIs for programmatic access to structured web data from dozens of popular domains.
- Scraping Browser: A cloud-based controllable browser that offers JavaScript rendering capabilities while handling CAPTCHAs, browser fingerprinting, automated retries, and more for you. It integrates with the most popular automation browser libraries, such as Playwright and Puppeteer.
- Web Unlocker: An unlocking API that can seamlessly return the raw HTML of any page, circumventing any anti-scraping measures.
Don’t want to deal with web scraping at all but are still interested in online data? Explore our ready-to-use datasets!