signals

The signals an agent reads.

crwl grades how legible your site is to AI agents and answer engines. This is the whole method: what a machine actually sees, the signals it relies on, and how those roll up into one score.

the premise

A machine sees a different page than you do.

When ChatGPT, Claude, or Perplexity decide what to cite, their crawlers fetch your raw HTML and read the structured signals in it. They do not run your JavaScript, wait for a framework to hydrate, or see your design. A site can look finished to a person and be nearly blank to a machine. crwl reads the page the way the crawler does, then measures the gap between the two.
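To make that gap concrete, here is a minimal sketch of what "reading the raw HTML" can mean: collect only the text already present in the markup and ignore anything a script would add later. It uses Python's standard `html.parser`. The class name, the helper `raw_text`, and the two sample pages are hypothetical illustrations, not crwl's actual implementation.

```python
from html.parser import HTMLParser


class RawTextCollector(HTMLParser):
    """Collects text present in raw HTML, as a non-rendering crawler would see it."""

    # Content inside these tags is code or inert markup, not readable page text.
    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def raw_text(html: str) -> str:
    """Return the text a crawler gets without executing any JavaScript."""
    collector = RawTextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# Hypothetical pages: one rendered on the server, one filled in client-side.
server_rendered = "<html><body><h1>Acme API</h1><p>Send invoices in one call.</p></body></html>"
client_rendered = '<html><body><div id="root"></div><script>render()</script></body></html>'

print(repr(raw_text(server_rendered)))
print(repr(raw_text(client_rendered)))
```

The second page may look complete in a browser, yet the sketch returns an empty string for it, which is the "nearly blank to a machine" case described above.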

three layers

Found, read, called.

Every check belongs to one of three layers, weighted by how much it affects whether an agent can actually use you. The weights below are the defaults; crwl adjusts them for the kind of site it detects.

Discovery

45%

Can an agent find out you exist and what you offer, without guessing?

field average 47/100

llms.txt
AI-crawler policy in robots.txt
sitemap
MCP card + tools
AGENTS.md

Structure

30%

Once it has the page, can it actually read the content and its meaning?

field average 56/100

server-rendered HTML
schema.org JSON-LD
canonical + description
one clear H1

Interface

25%

For products and APIs: can an agent call you, not just read you?

field average 15/100

discoverable OpenAPI spec
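As a rough worked example of the default weights, the sketch below combines three layer scores into one number with a plain weighted sum. The weights and field averages are the ones listed above; the function name `overall` is hypothetical, and feeding the field averages in is only an illustration of the arithmetic, not a figure crwl reports.

```python
# Default layer weights from the three cards above (they sum to 1.0).
DEFAULT_WEIGHTS = {"discovery": 0.45, "structure": 0.30, "interface": 0.25}

# Field averages from the same cards, used here purely as sample input.
FIELD_AVERAGES = {"discovery": 47, "structure": 56, "interface": 15}


def overall(layer_scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of 0-100 layer scores."""
    return sum(weights[layer] * layer_scores[layer] for layer in weights)


# 0.45 * 47 + 0.30 * 56 + 0.25 * 15
print(round(overall(FIELD_AVERAGES), 1))  # prints 41.7
```

A site sitting exactly at the field average in every layer would land in the low forties under the default weights, mostly because the Interface layer drags the total down.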

every signal

What each one checks, and why it matters.

llms.txt (Discovery): A /llms.txt at the site root that maps what you are and your key pages (and optionally llms-full.txt). Why it matters: It is the one file written for language models: a clean index they can ground on instead of crawling blind.
AI-crawler policy (Discovery): A robots.txt that names the AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and does not silently block them. Why it matters: If you block the crawlers, the answer engines never see you, no matter how good the page is.
Sitemap (Discovery): A reachable sitemap.xml (or sitemap index) listing your real URLs. Why it matters: It tells a crawler the full shape of the site in one request instead of hoping links lead everywhere.
Agent / MCP card (Discovery): A /.well-known card describing what your product does and how to call it. Why it matters: Agents that look for a machine-readable capability card can use you as a tool, not just read about you.
MCP tools (Discovery, bonus): Connecting to the MCP server your card advertises and reading its tools: every tool needs a clear description, and an inline UI (MCP Apps) is a plus. Why it matters: A vague or missing tool description is the top reason an agent calls the wrong tool. A server that merely exists is not the same as one an agent can use well.
AGENTS.md (Discovery, bonus): An /AGENTS.md brief that tells an agent what your site is and how to work with it. Why it matters: A short, plain-text orientation written for agents, the same idea as llms.txt aimed at coding and task agents.
Server-rendered HTML (Structure): The main content is present in the raw HTML, not injected by JavaScript after load. Why it matters: Crawlers read raw HTML. A page that only fills in client-side is effectively blank to them.
schema.org JSON-LD (Structure): Structured data that labels the page's type (Organization, Article, Product, and so on). Why it matters: It turns prose into facts a machine can quote with confidence instead of inferring.
Canonical + description (Structure): A self-referential canonical link and a real meta description in the page head. Why it matters: The canonical tells an agent which URL is the real one to cite; the description gives it a ready-made summary.
Identity links (Structure, bonus): schema.org sameAs links from your Organization or Person to authority profiles (Wikidata, LinkedIn, GitHub, and the like). Why it matters: They resolve your brand to a known entity instead of an isolated name a machine has to disambiguate.
One clear H1 (Structure): A single, sensible top-level heading with clean nesting below it. Why it matters: It is the cheapest signal of what a page is actually about.
Discoverable OpenAPI (Interface): A linked, valid OpenAPI spec an agent can fetch. Why it matters: It is the difference between an agent reading about your API and being able to call it correctly.
Markdown negotiation (Bonus): Serving text/markdown when an agent sends Accept: text/markdown. Why it matters: Hands agents clean text instead of making them strip HTML.
Content-Signal policy (Bonus): A declared AI content policy in robots.txt. Why it matters: States your terms for AI use explicitly rather than leaving them to assumption.
Answer-optimized schema (Bonus): FAQPage and HowTo structured data where it fits. Why it matters: Maps directly onto the question-and-answer shape answer engines reach for.
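For the schema.org JSON-LD and identity-link rows, a check along these lines could pull the declared types and sameAs links straight out of the raw HTML. This is a minimal sketch using only the standard library; the class and function names are hypothetical, the sample markup is invented, and nested structures such as @graph are deliberately left out.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Gathers the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, not repaired


def jsonld_summary(html: str):
    """Return (types, same_as_links) found in top-level JSON-LD items."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types, same_as = [], []
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type", [])
            types.extend([declared] if isinstance(declared, str) else declared)
            links = item.get("sameAs", [])
            same_as.extend([links] if isinstance(links, str) else links)
    return types, same_as


# Hypothetical page head with one Organization block.
sample = '''<head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme", "sameAs": ["https://github.com/acme"]}
</script></head>'''

print(jsonld_summary(sample))
```

An empty types list would correspond to a page with no labeled type at all; an empty sameAs list to a brand with no identity links.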

Bonus signals add credit when they are present but never count against a site that leaves them out.
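The AI-crawler policy check can be approximated with Python's standard `urllib.robotparser`. The sketch below answers two separate questions: which AI crawlers the file names explicitly, and which of them may fetch the home page. The crawler names come from the list above; the function names, the sample robots.txt, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def named_crawlers(robots_txt: str):
    """AI crawlers that have their own User-agent line in robots.txt."""
    agents = set()
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return [bot for bot in AI_CRAWLERS if bot.lower() in agents]


def crawler_access(robots_txt: str, url: str = "https://example.com/"):
    """Map each AI crawler to whether robots.txt lets it fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical robots.txt: one crawler allowed, one blocked, one unnamed.
sample = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(named_crawlers(sample))
print(crawler_access(sample))
```

In this sample PerplexityBot is never named, so it falls through to the wildcard rule and can still fetch the home page. That is allowed, but it is the kind of implicit policy the check is meant to surface.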

off-site

Whether agents find you where they already look.

An agent does not only read your site. When someone asks it for a tool or a vendor, it reaches for the places it already trusts: the package registries (npm, PyPI), the MCP registry, and the knowledge graph behind search (Wikipedia and Wikidata). crwl checks whether you show up there under your own name, and confirms each listing is really yours by its link back to your domain, so a stranger's same-named project never counts as you.

These are reported alongside the score rather than folded into it: they reach beyond your own site and are read with less certainty than the on-site checks, but they are often how an agent finds you in the first place.
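The link-back rule can be sketched as a host comparison: a listing counts only if one of the URLs it declares points at your domain or a subdomain of it. How crwl reads each registry is not described here, so the function names and the idea of passing in a plain list of URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lowercased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def links_back(listing_urls, own_domain: str) -> bool:
    """True if any URL on the listing points at own_domain or one of its subdomains."""
    own = own_domain.lower()
    for url in listing_urls:
        host = host_of(url)
        if host == own or host.endswith("." + own):
            return True
    return False


# Hypothetical listings: the first links home, the second only shares the name.
print(links_back(["https://github.com/acme/sdk", "https://www.acme.dev/"], "acme.dev"))
print(links_back(["https://github.com/someone/acme"], "acme.dev"))
```

Matching on the full host, rather than on a substring, keeps a look-alike such as notacme.dev from passing as acme.dev.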

scoring

One number, weighted for your kind of site.

crwl first detects a profile from the site itself: personal, SaaS or API, docs, ecommerce, or open-source project. It then weights the layers for that profile and, importantly, excludes the checks that do not apply rather than failing them. A personal site is never marked down for lacking an API spec. Each layer is the weighted average of its checks; the layers roll up to a 0–100 score and a grade.

Grade bands, from highest to lowest: 92–100, 82–91, 70–81, 55–69, below 55.
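The exclusion rule is easiest to see in code. In the sketch below a check that does not apply is dropped before averaging, so it neither helps nor hurts, and a layer left with no applicable checks is dropped from the roll-up with the remaining layer weights renormalized. That last step is an assumption, as are the per-check weights, the sample scores, and every name in the block; the source only states that non-applicable checks are excluded and that each layer is a weighted average.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    score: float          # 0-100
    weight: float = 1.0   # hypothetical per-check weight
    applies: bool = True  # False means excluded, not failed


def layer_score(checks) -> Optional[float]:
    """Weighted average of the applicable checks, or None if none apply."""
    live = [c for c in checks if c.applies]
    total = sum(c.weight for c in live)
    if total == 0:
        return None
    return sum(c.score * c.weight for c in live) / total


def site_score(layers, weights) -> float:
    """Roll layers up to 0-100, renormalizing over layers that have a score."""
    scored = {name: layer_score(checks) for name, checks in layers.items()}
    live = {name: s for name, s in scored.items() if s is not None}
    total = sum(weights[name] for name in live)
    if total == 0:
        raise ValueError("no applicable checks in any layer")
    return sum(weights[name] * s for name, s in live.items()) / total


# Hypothetical personal site: the OpenAPI check does not apply.
personal = {
    "discovery": [Check("llms.txt", 100), Check("sitemap", 50)],
    "structure": [Check("server-rendered HTML", 100), Check("one clear H1", 80)],
    "interface": [Check("discoverable OpenAPI", 0, applies=False)],
}
weights = {"discovery": 0.45, "structure": 0.30, "interface": 0.25}

print(site_score(personal, weights))
```

Had the missing OpenAPI spec been scored as a failure instead, the same site would have lost a quarter of its possible points for something it never needed.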

from score to fixes

The report hands you the files, not homework.

For each failing check, crwl writes the actual file that fixes it (an llms.txt, a robots.txt policy, a sitemap, JSON-LD), generated from what the crawl really saw on your site rather than from a template. Drop them in, rescan, and watch the score move. crwl scores the same way on the web and in the open-source CLI, so the result is identical wherever you read it.
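To give a sense of what a generated file looks like, here is a minimal sketch that builds an llms.txt from a list of crawled pages, following the public llms.txt layout of a title, a short summary in a blockquote, and a section of annotated links. It is not crwl's generator: the function name, the "Pages" section heading, and the sample site are all hypothetical.

```python
def build_llms_txt(site_name: str, summary: str, pages) -> str:
    """Render an llms.txt body from crawl results (each page: title, url, description)."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for page in pages:
        lines.append(f"- [{page['title']}]({page['url']}): {page['description']}")
    return "\n".join(lines) + "\n"


# Hypothetical crawl results for an invented site.
crawled = [
    {"title": "Docs", "url": "https://acme.dev/docs", "description": "API reference"},
    {"title": "Pricing", "url": "https://acme.dev/pricing", "description": "Plans and limits"},
]

print(build_llms_txt("Acme", "Invoicing API for small teams.", crawled))
```

Because the input is the page list from the crawl itself, the output names real URLs and descriptions instead of placeholder entries.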
