Wikipedia dataset
The Wikipedia dataset provides structured encyclopedia article content at scale. It includes data points such as article titles, URLs, tables of contents, raw text, text cataloged by section, images, references, external links, categories, and more.
Trusted by 20,000+ customers worldwide
Available datasets
- Demo data in JSON/CSV
- Fresh records
- Customize, enrich, and format the data
Need real-time Wikipedia data? Use our Wikipedia Scraper API
Filter the Wikipedia dataset with a single prompt
Describe exactly what you need, and let AI apply the perfect filters in seconds.
- Describe data needs in plain English
- AI applies accurate filters automatically
- Narrow huge datasets to only what matters to you
- Cut costs by skipping irrelevant data
- Export filtered data in your preferred format
Maximize value with strategic cost savings
Smart Data Updates
Access only "New Records" or "Updated Records," ensuring you pay only for what you need.
Dataset Bundles
Gain greater value by purchasing two or more datasets together, with exclusive discounts.
Volume Discounts
Get more for less with significant savings when purchasing large datasets or update subscriptions.
Enriched Datasets
Save time and resources with pre-built datasets that combine multiple sources into one clean dataset
Wikipedia dataset sample
Use our Wikipedia dataset to access encyclopedia content, research information, and knowledge resources across various topics and categories. Popular use cases include natural language processing, AI training, content analysis, research and education, knowledge graph development, semantic analysis, information extraction, and building question-answering systems.
Datasets Pricing
- Clean and validated
- Refreshed monthly
- JSON/CSV/Parquet
Power AI Agents Instantly
Our Wikipedia datasets are AI/LLM-optimized: clearly structured, well-documented, with code and recipes for easy LLM/chatbot integration.
Structured & Clean
Pre-processed data with consistent schemas, perfect for AI model training and inference.
Code Examples
Ready-to-use Python, Node.js, cURL, PHP, Go, Java, and Ruby snippets for easy integration with AI workflows.
Documentation
cURL
curl --request GET \
  --url https://api.brightdata.com/datasets/snapshots/{id}/download \
  --header 'Authorization: Bearer <token>'
Python
import requests
# Replace {id} with your snapshot ID and <token> with your API token.
url = "https://api.brightdata.com/datasets/snapshots/{id}/download"
headers = {"Authorization": "Bearer <token>"}
response = requests.get(url, headers=headers)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())
Node.js
// Top-level await requires an ES module context.
const url = 'https://api.brightdata.com/datasets/snapshots/{id}/download';
const options = {method: 'GET', headers: {Authorization: 'Bearer <token>'}};
try {
  const response = await fetch(url, options);
  const data = await response.json();
  console.log(data);
} catch (error) {
  console.error(error);
}
Java
// Uses the Unirest HTTP client library.
HttpResponse<String> response = Unirest.get("https://api.brightdata.com/datasets/snapshots/{id}/download")
  .header("Authorization", "Bearer <token>")
  .asString();
Ruby
require 'uri'
require 'net/http'
url = URI("https://api.brightdata.com/datasets/snapshots/{id}/download")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true  # the endpoint is served over HTTPS
request = Net::HTTP::Get.new(url)
request["Authorization"] = 'Bearer <token>'
response = http.request(request)
puts response.read_body
Wikipedia datasets tailored to your needs
Data subscription
Subscribe to access datasets at a significantly reduced cost.
File output formats
JSON, NDJSON, JSON Lines, CSV, Parquet. Optional .gz compression.
Flexible delivery
Snowflake, Amazon S3 bucket, Google Cloud, Azure, and SFTP.
Scalable data
Scale without worrying about infra, proxy servers, or blocks.
Cost savings
Customize any dataset using filters and formatting options.
Code maintenance
Datasets are maintained based on website structure changes.
Simplified integrations
Benefit from integrations with Snowflake and AWS.
24/7 support
A dedicated team of data professionals is here to help.
Leaders in compliance
Data is ethically obtained and compliant with all privacy laws.
Get structured and reliable Wikipedia data
We’ll provide the data while you focus on the rest

High-volume web data
With our unblocking capabilities and round-the-clock IP rotation, we ensure access to all data points on a website.

Data for immediate use
Every aspect of the data collection process is thoroughly validated, so the data you receive is ready for immediate use.

Automated data flow
Create custom schedules to automate data delivery and watch the data flow seamlessly into your storage.
Bright Data's products are used by the world’s top brands
Wikipedia Dataset FAQs
How often is the Wikipedia dataset updated?
The Wikipedia dataset is available with flexible refresh schedules: one-time, bi-annual, quarterly, monthly, weekly, or daily. Higher-frequency subscriptions carry deeper discounts (up to 80% off on monthly plans). You can also choose between pre-collected data (available instantly, collected within the last few days to months) and freshly collected data gathered on demand at the time of your order. The freshness window can be defined before checkout.
Can I purchase a subset of the Wikipedia dataset?
Yes. You can filter the Wikipedia dataset to include only the records and data fields you need - by geography, timeframe, category, or any supported field - using Bright Data's AI-powered filter tool or the Filter Dataset API. You only pay for the records in your filtered snapshot, which can substantially reduce cost. Filters support operators like equals, includes, greater than, is null, and more, with up to 3 levels of nested logic.
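As a rough illustration of what nested filter logic with these operators might look like, the sketch below builds a two-level filter as a plain Python dictionary and checks it against the 3-level limit. The payload shape (`operator`, `filters`, `name`, `value`), the operator spellings, and the field names `categories`, `images`, and `title` are assumptions made for illustration, not the confirmed Filter Dataset API schema.

```python
from typing import Any

MAX_NESTING = 3  # the FAQ states up to 3 levels of nested logic


def condition(field: str, op: str, value: Any = None) -> dict[str, Any]:
    """A single field-level condition; value is omitted for unary operators."""
    cond: dict[str, Any] = {"name": field, "operator": op}
    if value is not None:
        cond["value"] = value
    return cond


def group(logic: str, *children: dict[str, Any]) -> dict[str, Any]:
    """Combine conditions or sub-groups under one logical operator."""
    return {"operator": logic, "filters": list(children)}


def nesting_depth(node: dict[str, Any]) -> int:
    """Count levels of grouping; a bare condition has depth 0."""
    if "filters" not in node:
        return 0
    return 1 + max((nesting_depth(child) for child in node["filters"]), default=0)


# Hypothetical filter: physics articles that either lack images
# or carry one specific title.
wiki_filter = group(
    "and",
    condition("categories", "includes", "Physics"),
    group(
        "or",
        condition("images", "is_null"),
        condition("title", "=", "Quantum mechanics"),
    ),
)

# Reject filters that exceed the documented nesting limit before sending them.
assert nesting_depth(wiki_filter) <= MAX_NESTING
```

Validating the depth locally is a design convenience: it catches an over-nested filter before any request is made.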
What formats and delivery options are available for the Wikipedia dataset?
The Wikipedia dataset is delivered in your choice of JSON, NDJSON, JSON Lines, CSV, XLSX, or Parquet, with optional .gz compression. Delivery destinations include: Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Snowflake, Google PubSub, SFTP, Webhook, or Email. You can also download directly via the Control Panel (up to 5 GB) or retrieve programmatically via the Snapshot Download API. For snapshots larger than 5 GB, download links are sent by email.
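For the NDJSON with .gz compression option, a minimal reading sketch could look like the following. It streams records one line at a time rather than loading the whole file, which matters for multi-gigabyte snapshots. The sample records and their `title` and `url` fields are invented stand-ins so the code runs without a real snapshot.

```python
import gzip
import io
import json
from typing import BinaryIO, Iterator


def iter_ndjson_gz(stream: BinaryIO) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed NDJSON stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny in-memory sample (hypothetical fields) in place of a real file.
sample_records = [
    {"title": "Alan Turing", "url": "https://en.wikipedia.org/wiki/Alan_Turing"},
    {"title": "Ada Lovelace", "url": "https://en.wikipedia.org/wiki/Ada_Lovelace"},
]
raw = "\n".join(json.dumps(record) for record in sample_records).encode("utf-8")
compressed = io.BytesIO(gzip.compress(raw))

titles = [record["title"] for record in iter_ndjson_gz(compressed)]
print(titles)  # ['Alan Turing', 'Ada Lovelace']
```

With a downloaded file, the same function can be given `open("snapshot.ndjson.gz", "rb")`, where the filename is whatever you saved the snapshot as.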
Can I get a free sample of the Wikipedia dataset before purchasing?
Yes. Every dataset in the Bright Data Marketplace - including the Wikipedia dataset - offers a downloadable data sample at no cost. You can preview the data schema, field definitions, and a representative set of records to evaluate quality, coverage, and relevance before committing to a purchase. Samples are available directly from the dataset page in the Bright Data Marketplace.
Can I request specific data fields or a custom version of the Wikipedia dataset?
Yes. You can customize the Wikipedia dataset to include only the specific fields you need, hiding irrelevant columns to reduce cost and simplify integration. For more advanced needs - such as a proprietary data schema, enriched fields, or a source not currently in the marketplace - Bright Data's team can build a custom dataset tailored to your requirements. Contact the sales team to discuss your use case.
How does the Wikipedia dataset integrate with my existing systems and workflows?
The Wikipedia dataset is built for seamless integration. You can pull data programmatically using the Marketplace Dataset API (with SDKs available for Python and JavaScript), push results directly to your data warehouse or cloud storage, or connect via native integrations with tools like Snowflake, AWS, Google Cloud, Databricks, and automation platforms like Zapier, Make.com, and n8n. The async workflow (trigger - poll - download) makes it easy to embed into any pipeline.
Is the Wikipedia dataset ethically sourced and legally compliant?
Yes. All Bright Data datasets - including Wikipedia - are collected exclusively from publicly available online sources in compliance with applicable laws and regulations, including GDPR, CCPA, and Bright Data's own Code of Ethics. Bright Data holds ISO 27001 certification and is SOC 2 compliant. Data undergoes rigorous quality assurance before delivery. You can review Bright Data's full compliance posture at the Trust Center.
What does the Wikipedia dataset cost, and are there volume discounts?
Pricing for the Wikipedia dataset starts at $250 for 100K records (approximately $0.0025 per record) for a one-time purchase. Subscription plans unlock significant savings: up to 25% off bi-annual, 50% off quarterly, and 80% off monthly refresh plans. Volume tiers are available for 100K, 500K, 1M, 5M, and 20M+ records. Dataset bundles (purchasing two or more datasets together) and smart update options (paying only for new or changed records) provide additional savings. For enterprise-scale pricing, contact the Bright Data sales team.
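The starting price and the maximum discounts quoted above work out as follows. This is only arithmetic on the published figures; it assumes each discount is applied to the $250 one-time price per delivery, which is an interpretation rather than a stated billing rule, and since the discounts are "up to" values the results are lower bounds on cost.

```python
BASE_PRICE_USD = 250.0   # one-time price quoted for the smallest tier
BASE_RECORDS = 100_000   # records in that tier

# Maximum discounts quoted per plan ("up to" values).
MAX_DISCOUNT = {
    "one-time": 0.00,
    "bi-annual": 0.25,
    "quarterly": 0.50,
    "monthly": 0.80,
}

# Per-record cost of the one-time purchase.
per_record = BASE_PRICE_USD / BASE_RECORDS


def lowest_price_per_delivery(plan: str) -> float:
    """Best-case price of one 100K-record delivery under the given plan."""
    return round(BASE_PRICE_USD * (1 - MAX_DISCOUNT[plan]), 2)


print(f"per record: ${per_record:.4f}")
for plan in MAX_DISCOUNT:
    print(f"{plan}: ${lowest_price_per_delivery(plan):.2f}")
```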
What is a snapshot, and how does the Wikipedia dataset delivery process work?
When you order the Wikipedia dataset - or filter it via API - Bright Data generates a snapshot: a point-in-time export of your selected records. The async workflow runs as follows: (1) your order or API call triggers the collection job; (2) you can poll the job status using the snapshot ID; (3) once complete, you download the snapshot via the API or your configured delivery destination. Snapshot metadata (including error codes and initiation type) is accessible via the Snapshot Metadata API.
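The poll step of this workflow can be sketched as a small loop. The status strings `"ready"` and `"failed"`, and the intermediate values used in the simulation, are assumptions; the function takes a `get_status` callable so the actual status request, which is not shown on this page, can be plugged in.

```python
import time
from typing import Callable


def wait_for_snapshot(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until the snapshot is ready; raise on failure or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":      # assumed terminal success value
            return
        if status == "failed":     # assumed terminal failure value
            raise RuntimeError("snapshot job failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"snapshot still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated job: two non-terminal statuses, then ready.
# The recorded sleeps stand in for real waiting so the example runs instantly.
statuses = iter(["scheduled", "building", "ready"])
polls: list[float] = []
wait_for_snapshot(lambda: next(statuses), poll_seconds=1.0, sleep=polls.append)
print(len(polls))  # 2
```

Once the function returns, the snapshot can be fetched from the `/datasets/snapshots/{id}/download` endpoint or picked up at the configured delivery destination.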
Can I use the Wikipedia dataset for AI and machine learning projects?
Yes. The Wikipedia dataset is structured and validated for immediate use in AI and ML workflows, including LLM fine-tuning, model training, RAG pipelines, and agent knowledge bases. Data is delivered in standard ML-ready formats (JSON, Parquet, NDJSON) with consistent schemas and documented field definitions accessible via the Dataset Metadata API. Bright Data also offers specialized AI data packages and a web archive with 50+ PB of historical data for large-scale pre-training.