The signals an agent reads.
crwl grades how legible your site is to AI agents and answer engines. This is the whole method: what a machine actually sees, the signals it relies on, and how those roll up into one score.
A machine sees a different page than you do.
When ChatGPT, Claude, or Perplexity decide what to cite, their crawlers fetch your raw HTML and read the structured signals in it. They do not run your JavaScript, wait for a framework to hydrate, or see your design. A site can look finished to a person and be nearly blank to a machine. crwl reads the page the way the crawler does, then measures the gap between the two.
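To make the gap concrete, here is a minimal sketch of reading a page the way a non-rendering crawler would: parse the raw HTML, ignore scripts, and record what is left. It is an illustration rather than crwl's implementation; the `RawPageReader` class and `read_raw_html` helper are hypothetical names, and the two sample pages are invented.

```python
import json
from html.parser import HTMLParser


class RawPageReader(HTMLParser):
    """Collects what raw HTML contains without executing any scripts."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.h1_count = 0
        self.jsonld_types = []
        self._skip_depth = 0
        self._in_jsonld = False
        self._jsonld_buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_buffer = []
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._record_jsonld("".join(self._jsonld_buffer))
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_buffer.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())

    def _record_jsonld(self, raw):
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            return  # malformed JSON-LD is simply not counted
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and "@type" in item:
                types = item["@type"]
                self.jsonld_types.extend(types if isinstance(types, list) else [types])


def read_raw_html(html):
    """Return the text, H1 count, and JSON-LD types present before any JavaScript runs."""
    reader = RawPageReader()
    reader.feed(html)
    reader.close()
    return {
        "text": " ".join(reader.text_parts),
        "h1_count": reader.h1_count,
        "jsonld_types": reader.jsonld_types,
    }


# A client-rendered shell versus a server-rendered page (both invented samples).
spa = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
ssr = (
    '<html><head><script type="application/ld+json">{"@type": "Article"}</script></head>'
    "<body><h1>Pricing</h1><p>Plans start at ten dollars.</p></body></html>"
)
print(read_raw_html(spa))
print(read_raw_html(ssr))
```

The client-rendered shell yields no text at all, even though a browser would eventually show a full page, while the server-rendered sample exposes its heading, body text, and schema type directly.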
Found, read, called.
Every check belongs to one of three layers, each weighted by how much it affects whether an agent can actually use you. The weights below are the defaults; crwl adjusts them for the kind of site it detects.
Discovery
45%: Can an agent find out you exist and what you offer, without guessing?
- llms.txt
- AI-crawler policy in robots.txt
- sitemap
- agent / MCP card
Structure
30%: Once it has the page, can it actually read the content and its meaning?
- server-rendered HTML
- schema.org JSON-LD
- one clear H1
Interface
25%: For products and APIs: can an agent call you, not just read you?
- discoverable OpenAPI spec
What each one checks, and why it matters.
- llms.txt (Discovery)
- A /llms.txt at the site root that says what you are and maps your key pages (optionally with an llms-full.txt alongside it).
- It is the one file written for language models: a clean index they can ground on instead of crawling blind.
- AI-crawler policy (Discovery)
- A robots.txt that names the AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and does not silently block them.
- If you block the crawlers, the answer engines never see you, no matter how good the page is.
- Sitemap (Discovery)
- A reachable sitemap.xml (or sitemap index) listing your real URLs.
- It tells a crawler the full shape of the site in one request instead of hoping links lead everywhere.
- Agent / MCP card (Discovery)
- A /.well-known card describing what your product does and how to call it.
- Agents that look for a machine-readable capability card can use you as a tool, not just read about you.
- Server-rendered HTML (Structure)
- The main content is present in the raw HTML, not injected by JavaScript after load.
- Crawlers read raw HTML. A page that only fills in client-side is effectively blank to them.
- schema.org JSON-LD (Structure)
- Structured data that labels the page's type (Organization, Article, Product, and so on).
- It turns prose into facts a machine can quote with confidence instead of inferring.
- One clear H1 (Structure)
- A single, sensible top-level heading with clean nesting below it.
- It is the cheapest signal of what a page is actually about.
- Discoverable OpenAPI (Interface)
- A linked, valid OpenAPI spec an agent can fetch.
- It is the difference between an agent reading about your API and being able to call it correctly.
- Markdown negotiation (Bonus)
- Serving text/markdown when an agent sends Accept: text/markdown.
- Hands agents clean text instead of making them strip HTML.
- Content-Signal policy (Bonus)
- A declared AI content policy in robots.txt.
- States your terms for AI use explicitly rather than leaving them to assumption.
- Answer-optimized schema (Bonus)
- FAQPage and HowTo structured data where it fits.
- Maps directly onto the question-and-answer shape answer engines reach for.
Bonus signals add credit when they are present but never count against a site that leaves them out.
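As an illustration of one check from the list above, the AI-crawler policy can be approximated with Python's standard `urllib.robotparser`. This is a rough sketch, not crwl's own logic: the `ai_crawler_policy` helper, the sample robots.txt, and the `https://example.com/` URL are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The three crawlers named in the check above.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def named_agents(robots_txt):
    """Return the lowercased user-agent names that robots.txt mentions explicitly."""
    names = set()
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            names.add(value.split("#")[0].strip().lower())
    return names


def ai_crawler_policy(robots_txt, page_url="https://example.com/"):
    """For each AI crawler, report whether it is named and whether it may fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    names = named_agents(robots_txt)
    return {
        bot: {
            "named": bot.lower() in names,
            "allowed": parser.can_fetch(bot, page_url),
        }
        for bot in AI_CRAWLERS
    }


# Invented sample: one crawler blocked, one allowed by name, one left to the wildcard.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
"""

for bot, status in ai_crawler_policy(sample).items():
    print(bot, status)
```

Separating "named" from "allowed" mirrors the wording of the check: a crawler that falls through to the wildcard rule may be allowed without ever being named, and a crawler that is named may still be blocked.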
One number, weighted for your kind of site.
crwl first detects a profile from the site itself: personal, SaaS or API, docs, ecommerce, or open-source project. It then weights the layers for that profile and, importantly, excludes the checks that do not apply rather than failing them. A personal site is never marked down for lacking an API spec. Each layer is the weighted average of its checks; the layers roll up to a 0–100 score and a grade.
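The roll-up can be sketched in a few lines. The layer weights are the defaults stated earlier; everything else is an assumption made for illustration: equal weights per check, `None` as the marker for a check that does not apply, and renormalizing the remaining layer weights when a whole layer drops out. crwl's actual per-profile weights and grade thresholds are not given here, so none are shown.

```python
# Default layer weights from the method description.
DEFAULT_LAYER_WEIGHTS = {"discovery": 0.45, "structure": 0.30, "interface": 0.25}


def layer_score(checks):
    """Weighted average of a layer's checks.

    checks is a list of (weight, result) pairs, where result is a score in
    [0, 1] or None when the check does not apply to this kind of site.
    Returns None when no check in the layer applies.
    """
    applicable = [(w, r) for w, r in checks if r is not None]
    total_weight = sum(w for w, _ in applicable)
    if total_weight == 0:
        return None
    return sum(w * r for w, r in applicable) / total_weight


def site_score(layers, layer_weights=DEFAULT_LAYER_WEIGHTS):
    """Roll layer scores up to 0-100, skipping layers with nothing applicable.

    Assumption: weights of the remaining layers are renormalized, so an
    excluded layer neither helps nor hurts the total.
    """
    weighted_sum = 0.0
    weight_sum = 0.0
    for name, checks in layers.items():
        score = layer_score(checks)
        if score is None:
            continue
        weighted_sum += layer_weights[name] * score
        weight_sum += layer_weights[name]
    if weight_sum == 0:
        return 0
    return round(100 * weighted_sum / weight_sum)


# Hypothetical personal site: no sitemap, and the API-related checks do not apply.
personal_site = {
    "discovery": [(1, 1.0), (1, 1.0), (1, 0.0), (1, None)],  # llms.txt, robots, sitemap, agent card
    "structure": [(1, 1.0), (1, 1.0), (1, 1.0)],             # SSR, JSON-LD, H1
    "interface": [(1, None)],                                # OpenAPI spec not applicable
}
print(site_score(personal_site))
```

Under these assumptions the missing sitemap lowers the score, while the absent OpenAPI spec has no effect at all. Failing that check instead (a result of `0.0` rather than `None`) would pull the same site down considerably.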
The report hands you the files, not homework.
For each failing check, crwl writes the actual file that fixes it (an llms.txt, a robots.txt policy, a sitemap, JSON-LD), generated from what the crawl really saw on your site, not from a template. Drop the files in, rescan, and watch the score move. crwl scores the same way on the web and in the open-source CLI, so the result is identical wherever you run it.