Open-Source AI Visibility Tracker, And How Bright Data’s LLM Scrapers Made It Possible

Learn how the free, open-source GEO/AEO Tracker uses Bright Data’s LLM Scrapers to monitor brand visibility across 6 AI platforms.

AI models are now answering the questions your customers used to Google. If your brand isn’t in those answers, you’re effectively invisible, and you probably don’t even know it. I built a free, open-source tool to track exactly that. Here’s what I learned, and why Bright Data’s Scraper APIs were the only infrastructure that could make it work.

Quick summary:

  • The GEO/AEO Tracker is a free, open-source AI visibility dashboard tracking 6 AI models simultaneously.
  • It uses Bright Data’s LLM Scrapers to query ChatGPT, Gemini, Perplexity, Grok, Copilot, and Google AI Mode.
  • Bright Data delivers structured output (citations, sources, answer text) per model, via a single API pattern.
  • Paid enterprise tools charge $200–$600/month and lock your data into their platform; this stack costs fractions of a cent per query, with all data staying local.
  • The SRO Pipeline uses Bright Data’s SERP API, Web Unlocker, and LLM Scrapers in one end-to-end workflow.
  • All data stays in your own environment. No vendor lock-in, no external database.

The GEO Problem No One Has Fully Solved Yet

ChatGPT has crossed 900 million weekly active users as of early 2026. Google AI Overviews now appear in roughly 16% of all searches. And traffic that arrives from AI search engines converts 23x better than traditional organic visitors. Ahrefs confirmed this from their own data, finding that 0.5% of their traffic from AI sources drove 12.1% of all signups.

McKinsey projects that $750 billion in US revenue will flow through AI-powered search by 2028. That’s not a forecast about some distant future state; the shift is already happening, query by query, every time someone asks ChatGPT “which CRM should I use?” or Perplexity “who makes the best project management software?”

You can’t optimize what you can’t measure. And measuring AI visibility has been either too expensive, too limited, or both.

What I Built: The GEO/AEO Tracker in 60 Seconds

The GEO/AEO Tracker is an open-source, local-first AI visibility intelligence dashboard. You can try the live demo right now without an API key.

It tracks your brand across ChatGPT, Perplexity, Gemini, Grok, Google AI Mode, and Microsoft Copilot simultaneously, in parallel, with all data stored locally in your browser via IndexedDB. No external database. No vendor lock-in.

13 features, 6 AI models, zero vendor lock-in

I built this because I kept running into the same problem: every tool I evaluated either cost too much, locked me into their ecosystem, or didn’t cover enough models. So I built the thing I wanted to use.

The features that matter most for real-world brand tracking:

Prompt Hub runs any prompt across all 6 models at once. For a product marketing team tracking competitive queries, that’s the difference between running 6 separate experiments and running one. You can manage a full prompt library, use {brand} injection for dynamic substitution, and trigger batch runs, all in parallel.

Visibility Analytics gives you a 0–100 score based on brand mention rate, position in responses, citation frequency, and sentiment over time. This is the KPI CMOs can report upward without a 20-slide explanation. It’s also exportable as CSV.
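As an illustration of how signals like these can combine into a single 0–100 number, here is a minimal sketch; the weights and the exact blend are hypothetical, not the tracker’s actual formula:

```typescript
interface VisibilitySignals {
  mentionRate: number;  // fraction of responses mentioning the brand, 0-1
  avgPosition: number;  // 1 = brand appears first in the answer; larger = later
  citationRate: number; // fraction of responses citing a brand URL, 0-1
  sentiment: number;    // -1 (negative) to 1 (positive)
}

// Hypothetical weighted blend producing a 0-100 score.
function visibilityScore(s: VisibilitySignals): number {
  const positionScore = 1 / Math.max(1, s.avgPosition); // earlier mentions score higher
  const sentimentScore = (s.sentiment + 1) / 2;         // map -1..1 onto 0..1
  const raw =
    0.4 * s.mentionRate +
    0.2 * positionScore +
    0.25 * s.citationRate +
    0.15 * sentimentScore;
  return Math.round(raw * 100);
}
```

A brand mentioned in half of all responses, usually second, cited 20% of the time with mildly positive sentiment, would land in the mid-40s under these weights.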

Citation Opportunities is the feature I’m most proud of. It shows which URLs competitors get cited for where you don’t appear. That’s a direct content gap and link-building intelligence feed, delivered automatically.

SRO Analysis (more on this below) is a 6-stage pipeline that scores how well a specific page is optimized for AI search results, from 0 to 100, with prioritized, actionable recommendations. It uses multiple Bright Data products in a single workflow.

Drift Alerts fire automatically when your visibility score changes significantly. A brand reputation shift in AI answers can compound fast. Knowing within days is very different from knowing at your monthly review.
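A drift check of this kind can be sketched in a few lines; the trailing-average comparison and the 10-point threshold are illustrative choices, not the tracker’s actual defaults:

```typescript
// Fires when the latest visibility score deviates from the trailing
// average of all prior scores by more than a fixed threshold.
function detectDrift(history: number[], threshold = 10): boolean {
  if (history.length < 2) return false;
  const latest = history[history.length - 1];
  const prior = history.slice(0, -1);
  const baseline = prior.reduce((sum, score) => sum + score, 0) / prior.length;
  return Math.abs(latest - baseline) > threshold;
}
```

With weekly scores of [70, 72, 71, 55], the baseline is 71 and the 16-point drop trips the alert.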

Why Bright Data Was the Only Viable Foundation

This is the part of the build story that most people skip, but it’s the whole reason the tool works at production quality instead of breaking every week.

The scraping challenge no one talks about

ChatGPT, Perplexity, Gemini, Grok, Google AI Mode, and Copilot are all:

  • Fully JavaScript-rendered. A plain HTTP request returns nothing useful.
  • Aggressively bot-blocked. They detect automated traffic patterns and reject them. The most common anti-scraping techniques, including browser fingerprinting, CAPTCHA challenges, and behavioral analysis, are all in play simultaneously across these platforms.
  • Structurally different from each other. Each platform returns data in a different format. Perplexity uses markdown with inline sources. Gemini returns citations as a separate structured array. Grok has a response_raw field alongside answer_text_markdown.
  • Geolocation-dependent. The same prompt can return different answers and different citations depending on the country the request appears to come from.

Building and maintaining scrapers for all six from scratch would require residential proxy infrastructure, CAPTCHA solving, session management, response normalization across models, polling for async responses, and ongoing maintenance every time a platform updates its structure. That’s months of engineering work before you write a single line of tracking logic.

Bright Data collapses all of that to a single API call per model.

Six scrapers, one API key: how it works in code

The core integration in brightdata-scraper.ts follows a simple, repeatable pattern across all six providers:

// Step 1: POST to the Bright Data dataset endpoint
const scrapeResponse = await fetch(
  `https://api.brightdata.com/datasets/v3/scrape?dataset_id=${datasetId}&format=json`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${BRIGHT_DATA_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      input: [{ url: providerBaseUrl[provider], prompt: request.prompt, index: 1 }]
    }),
  }
);

// Step 2: Handle the async response by polling for snapshot readiness
let payload: Record<string, unknown>[] = [];
if (scrapeResponse.status === 202) {
  const { snapshot_id } = await scrapeResponse.json();
  await monitorUntilReady(snapshot_id); // polls /progress/{id} every 2 seconds
  payload = await downloadSnapshot(snapshot_id); // GET /snapshot/{id}?format=json
}

// Step 3: Normalize the result
const record = payload[0];
const answer = normalizeAnswer(record); // handles all 6 model formats
const sources = extractSourcesFromAnswer(answer); // merges text + structured citations
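The monitorUntilReady and downloadSnapshot helpers are referenced but not shown above. A minimal sketch of how they could work, assuming the endpoint paths from the code comments and standard Bearer auth (real error handling and timeouts omitted):

```typescript
const BASE = "https://api.brightdata.com/datasets/v3";
const authHeaders = { Authorization: `Bearer ${process.env.BRIGHT_DATA_KEY ?? ""}` };

async function monitorUntilReady(snapshotId: string): Promise<void> {
  // Poll /progress/{id} every 2 seconds until the snapshot is ready
  while (true) {
    const res = await fetch(`${BASE}/progress/${snapshotId}`, { headers: authHeaders });
    const { status } = (await res.json()) as { status: string };
    if (status === "ready") return;
    if (status === "failed") throw new Error(`Snapshot ${snapshotId} failed`);
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}

async function downloadSnapshot(snapshotId: string): Promise<Record<string, unknown>[]> {
  // GET /snapshot/{id}?format=json returns the structured records
  const res = await fetch(`${BASE}/snapshot/${snapshotId}?format=json`, { headers: authHeaders });
  return (await res.json()) as Record<string, unknown>[];
}
```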

Every model uses this same pattern. The only thing that changes is the dataset_id, configured as one environment variable per provider: BRIGHT_DATA_DATASET_CHATGPT, BRIGHT_DATA_DATASET_PERPLEXITY, and so on.
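A sketch of that per-provider wiring; the first two environment variable names are from this post, while the remaining four names and the lookup helper are illustrative assumptions that follow the same pattern:

```typescript
type Provider = "chatgpt" | "perplexity" | "gemini" | "grok" | "google_ai_mode" | "copilot";

// One env var per provider; values are the marketplace dataset IDs,
// so none are hardcoded in application code.
const datasetEnvVar: Record<Provider, string> = {
  chatgpt: "BRIGHT_DATA_DATASET_CHATGPT",
  perplexity: "BRIGHT_DATA_DATASET_PERPLEXITY",
  gemini: "BRIGHT_DATA_DATASET_GEMINI",           // assumed name
  grok: "BRIGHT_DATA_DATASET_GROK",               // assumed name
  google_ai_mode: "BRIGHT_DATA_DATASET_GOOGLE_AI_MODE", // assumed name
  copilot: "BRIGHT_DATA_DATASET_COPILOT",         // assumed name
};

function datasetIdFor(provider: Provider): string {
  const id = process.env[datasetEnvVar[provider]];
  if (!id) throw new Error(`Missing env var ${datasetEnvVar[provider]}`);
  return id;
}
```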

That’s the architecture: one integration pattern, six models, consistent structured output every time.

What the structured output actually looks like

Each Bright Data scraper returns model-specific fields. The normalizeAnswer() function handles cross-model format differences so the rest of the application sees a consistent interface:

Model          | Key Fields Returned
---------------|------------------------------------------------------------------
ChatGPT        | answer_text, links_attached, citations, recommendations, country
Perplexity     | answer_text_markdown, sources, source_html, is_shopping_data
Gemini         | answer_text, citations, links_attached, index, country
Grok           | answer_text, answer_text_markdown, citations, response_raw
Google AI Mode | answer_text, citations, links_attached, index, country
Copilot        | answer_text_markdown, sources, answer_section_html, index

The normalization layer checks answer_text first, falls back to answer_text_markdown, then response_raw, then does a deep recursive extraction on the raw record. Bright Data handles the platform-specific complexity; the application handles the cross-platform normalization. Clean separation of concerns.
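A simplified sketch of that fallback chain; the field names come from the table above, while the deep extraction here is a crude stand-in for the real implementation:

```typescript
// Fallback chain: answer_text -> answer_text_markdown -> response_raw,
// then a recursive search through the raw record.
function normalizeAnswer(record: Record<string, unknown>): string {
  for (const key of ["answer_text", "answer_text_markdown", "response_raw"]) {
    const value = record[key];
    if (typeof value === "string" && value.trim().length > 0) return value;
  }
  return deepFindText(record) ?? "";
}

// Walks nested objects/arrays and returns the first substantial string,
// a rough approximation of the tracker's deep recursive extraction.
function deepFindText(node: unknown): string | null {
  if (typeof node === "string") return node.length > 40 ? node : null;
  if (Array.isArray(node)) {
    for (const item of node) {
      const found = deepFindText(item);
      if (found) return found;
    }
  } else if (node && typeof node === "object") {
    for (const value of Object.values(node)) {
      const found = deepFindText(value);
      if (found) return found;
    }
  }
  return null;
}
```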

The SRO Pipeline: Bright Data’s Full Stack in One Feature

SRO Analysis is the most technically involved feature in the tracker, and it’s also the clearest demonstration of what Bright Data’s infrastructure enables at scale.

The idea: score how well a specific page is optimized for AI search results, from 0 to 100, with concrete recommendations. The six-stage pipeline behind that score:

Stage 1: Gemini Grounding. Uses the Google Gemini API to understand how AI systems perceive the page, including its topic, authority signals, and content structure.

Stage 2: Cross-Platform Citations. Calls all 6 Bright Data LLM Scrapers in parallel via scrapeAllPlatforms() to check whether the target URL or domain is cited when the relevant keyword is queried across ChatGPT, Perplexity, Gemini, Grok, Google AI Mode, and Copilot.

Stage 3: SERP Analysis. Uses Bright Data’s SERP API to pull organic ranking data for the keyword. If the page ranks #1 organically but isn’t cited in any AI answers, that’s a GEO gap worth surfacing.

Stage 4: Page Scraping. Uses Bright Data’s Web Unlocker to fetch the actual page content and analyze its structure, depth, BLUF (bottom line up front) density, heading hierarchy, and schema markup. No paywall, no bot block.

Stage 5: Site Context. Uses Bright Data’s Web Unlocker again to pull the homepage and extract brand authority signals that AI systems use when deciding whether to cite a source.

Stage 6: LLM Analysis. Synthesizes all of the above into a final SRO Score plus a prioritized recommendation list: which things to fix first, what content gaps exist, where competitors are outperforming you on AI citation.

One feature. Six Bright Data product integrations. The result is an auditing workflow that would take an enterprise team months to build from scratch, and that’s the point.

Enterprise Use Cases: What Companies Are Actually Doing With This

The tracker is open source, but the infrastructure it’s built on (Bright Data’s LLM Scraper APIs) is what scales to real enterprise workloads. Here’s how that looks in practice.

Brand Reputation Monitoring at Scale

A CMO at a mid-market SaaS company needs to know: when a user asks ChatGPT “which [product category] should I trust?”, what does it say? Is the answer accurate? Is the sentiment positive? Does it even mention the brand?

Without a tracking tool, you find out three months later when a prospect tells you they asked an AI and it recommended a competitor. With the tracker, you run a batch of reputation-sensitive prompts weekly, drift alerts fire when sentiment shifts, and the Citation Opportunities tab shows exactly what content to produce or what backlinks to earn to change the AI’s answer. For teams that want to go deeper, there’s a detailed walkthrough of building an automated brand reputation monitoring workflow using Bright Data’s SDK.

Competitive Intelligence for Sales Teams

Sales enablement and product marketing teams face a specific problem: competitors are showing up in AI answers for queries that should belong to them. They don’t know which queries, which models, why, or what to do about it.

The Competitor Battlecards tab generates AI-powered side-by-side comparisons between your brand and any competitor. The citation gap analysis shows exactly which URLs the competitor is cited for where you aren’t. That used to be the kind of intelligence agencies charged $50k/year to produce.

GEO Strategy for Multi-Brand or Agency Teams

An agency managing 12 brands can’t afford $500/month per brand for AI visibility tracking. The math breaks fast.

The tracker’s multi-workspace support and BYOK (Bring Your Own Key) model means you pay only for Bright Data API usage. At $1.50/1K records pay-as-you-go, running a full weekly tracking batch across 10 prompts and 6 models costs fractions of a dollar per brand. Ten brands tracked for less than the cost of one SaaS seat.
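The arithmetic behind that claim, as a quick sanity check (assuming the weekly batch runs four times a month):

```typescript
// 10 prompts x 6 models = 60 records per weekly batch, per brand.
const recordsPerBatch = 10 * 6;
const pricePerRecord = 1.5 / 1000; // $1.50 per 1K records, pay-as-you-go

const weeklyCostPerBrand = recordsPerBatch * pricePerRecord; // $0.09 per brand per week
const monthlyCostTenBrands = weeklyCostPerBrand * 4 * 10;    // $3.60 for ten brands
```

Nine cents per brand per week, about $3.60 a month for all ten brands, versus $5,000/month at $500 per brand on a typical SaaS plan.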

Technical GEO Audits for SEO Clients

When SEO clients ask “are we GEO optimized?” the honest answer, without tooling, is vague. The SRO Analysis changes that. It delivers a 0–100 score per page with a concrete priority stack: fix schema markup, improve BLUF density in the opening paragraph, earn citations from these three domains. It’s the difference between an audit that says “AI optimization matters” and one that says “here are the five things to do this week.” If you want to see how this kind of multi-agent GEO optimization workflow can be built end-to-end, the GEO and SEO content optimization guide with CrewAI walks through exactly that.

Data Sovereignty Requirements

Enterprise procurement and legal teams have a legitimate concern: they can’t send brand tracking data to a third-party SaaS vendor’s servers. This blocks adoption of almost every commercial GEO tool at the enterprise level.

The tracker’s local-first architecture (IndexedDB + localStorage) means Bright Data delivers structured data via API, and the enterprise decides where it goes. Bright Data itself is SOC 2 Type II, ISO 27001, GDPR, and CCPA compliant, so it passes enterprise security reviews. The data flow is clean: structured response in, local storage, no intermediary.

What This Means If You Want to Build Something Similar

The tracker is one application of Bright Data’s LLM Scraper APIs. The infrastructure it runs on is general-purpose.

If you’re building an AI monitoring dashboard, a brand intelligence tool, a competitive research product, or any application that needs to query AI models at scale and get back structured data, the building blocks are the same. For context on what’s available for these use cases, the top SERP and web search APIs comparison covers the landscape well. Bright Data’s network of 150M+ residential IPs across 195 countries means AI platforms see real user traffic. The 99.99% uptime means your automated pipelines don’t fail silently on a Tuesday morning. Bulk request handling up to 5,000 URLs means you can run enterprise-scale batch tracking in a single operation. Output delivery to S3, GCS, Snowflake, Azure, and SFTP means the data drops directly into whatever stack you already have.

If you’re also considering the best AI agent frameworks to orchestrate these scrapers into a full autonomous pipeline, that’s a natural next step. All of the top frameworks integrate directly with Bright Data.

The question isn’t whether to track AI visibility. It’s how fast you can get the infrastructure in place to act on what you find.

Get Started with Bright Data’s LLM Scrapers

If you want to run your own instance of the GEO/AEO Tracker, clone the repo and add your Bright Data API key. You’ll be live in under 10 minutes:

git clone https://github.com/danishashko/geo-aeo-tracker.git
cd geo-aeo-tracker && npm install
# Add BRIGHT_DATA_KEY + 6 dataset IDs to .env
npm run dev

The six Bright Data scraper dataset IDs (for the ChatGPT Scraper API, Perplexity Scraper, Gemini Scraper, Grok Scraper, Google AI Mode Scraper, and Copilot Scraper) are available directly from the Bright Data Scrapers Marketplace once you have an account.
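A sketch of the resulting .env file; the first two variable names appear earlier in this post, the remaining four are assumed to follow the same naming pattern, and every value shown is a placeholder:

```shell
# Bright Data API key from your account dashboard
BRIGHT_DATA_KEY=your_api_key_here

# One dataset ID per scraper, copied from the Scrapers Marketplace
BRIGHT_DATA_DATASET_CHATGPT=gd_xxxxxxxxxxxx
BRIGHT_DATA_DATASET_PERPLEXITY=gd_xxxxxxxxxxxx
BRIGHT_DATA_DATASET_GEMINI=gd_xxxxxxxxxxxx
BRIGHT_DATA_DATASET_GROK=gd_xxxxxxxxxxxx
BRIGHT_DATA_DATASET_GOOGLE_AI_MODE=gd_xxxxxxxxxxxx
BRIGHT_DATA_DATASET_COPILOT=gd_xxxxxxxxxxxx
```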

If you want to build something custom at enterprise scale, the LLM Scrapers are the infrastructure layer. Both paths start in the same place: a free Bright Data trial.

View the open-source repo on GitHub

Daniel Shashko

SEO & AI Automations

6 years experience

Daniel Shashko is a Senior SEO/GEO at Bright Data, specializing in B2B marketing, international SEO, and building AI-powered agents, apps, and web tools.