
The llms.txt File Explained: What It Is, Why It Matters, and How to Write One

llms.txt is a plain text markdown file hosted at the root of your domain that gives AI models a curated map of your site. It includes a brand summary, prose context, and a list of your most important pages with one line descriptions. Proposed by Jeremy Howard of Answer.AI in September 2024, llms.txt sits next to robots.txt and sitemap.xml and speaks to a different audience: large language models that need to understand your site without crawling every page.

By Brendan Hunt · May 20, 2026 · 14 min read

The llms.txt file is a plain text file you put at the root of your domain to give AI models a structured map of your site, your organization, and your most important content. The format was proposed by Jeremy Howard of Answer.AI in September 2024 and has since been adopted by Anthropic, Cloudflare, Stripe, Hugging Face, OpenAI, Mintlify, and a growing list of documentation sites and product companies. It takes an hour to write, lives in a single file, and gives Claude, ChatGPT, Perplexity, and any future AI model a clean reading list for understanding your business.

If you are building for AI visibility in 2026, llms.txt is the cheapest infrastructure step in your AEO program. The file sits next to robots.txt and sitemap.xml at the root of your site. It does a job neither of those files was designed to do. And almost none of your competitors have published one yet.

Why AI models need a different file

Robots.txt tells crawlers where they can and cannot go. Sitemap.xml lists every URL on your site for search engines. Both files were designed for a world where the crawler fed an index. The logic was simple: pull every accessible page, store it, and rank it later when somebody types a query.

AI answer engines work differently. When ChatGPT or Perplexity needs to understand your business, it does not crawl every page on your site. It pulls a few high signal documents into a context window with a hard token limit, reads them, and synthesizes an answer. The crawler does the same job a junior researcher would do: skim the most useful pages, get the gist, and move on.

That changes what a useful file looks like. AI models do not need a list of every URL. They need a curated reading list. They need to know which page on your site answers “what does this company do” without scanning the whole footer. They need clean markdown, not HTML soup. They need links with descriptions, not URLs in isolation.

That is the gap llms.txt fills.

Robots.txt is access control. Sitemap.xml is an index of every URL. llms.txt is the briefing document you would hand to a new hire on day one. Three files, three different jobs. You should publish all three.

What llms.txt is, formally

llms.txt is a markdown file hosted at the root of your domain, at the path /llms.txt. The format was proposed by Jeremy Howard, co-founder of Answer.AI and fast.ai, in September 2024. Adoption since has been quiet but steady. Anthropic, Cloudflare, Stripe, Hugging Face, OpenAI, Replicate, Mintlify, and a growing roster of documentation sites and product companies now publish llms.txt files as standard practice.

The file follows a strict markdown structure. An H1 with the site or project name. A blockquote with a short summary. Free form prose sections describing the project. Then H2 sections that group links by category, with each link followed by a colon and a one line description.

The format is rigid enough that an AI model can parse it reliably, but flexible enough that a human can write it in a text editor in under an hour. There is no XML schema, no JSON validation, no proprietary CLI. It is markdown with a few conventions.

Two variants ship together in practice. llms.txt is the short, curated index file. It is designed to fit comfortably inside a context window and lists the pages that matter most. llms-full.txt is the full content dump. It concatenates the actual content of every important page into a single markdown file, designed for AI tools that want the full text in a single fetch. Most sites publish both. The short file is the table of contents. The long file is the full book.

How llms.txt is different from robots.txt and sitemap.xml

The three files coexist. They do different jobs, and none of them replaces the others.

Robots.txt is access control. It tells crawlers which paths they are allowed to visit. It has been the de facto standard since 1994. Modern AI crawlers like GPTBot, ClaudeBot, and PerplexityBot honor it, and Google-Extended is a robots.txt token that controls whether Google may use your content for its generative AI products. If you block these crawlers in robots.txt, no amount of llms.txt or sitemap.xml work will help. We covered crawler access in detail inside our complete guide to AEO, and it is the prerequisite for everything that follows in this article.

Sitemap.xml is an index of every URL on your site, with last modified dates and change frequencies. It is built for breadth. The receiving system, usually a search engine, decides what to do with each URL. Sitemap.xml does not editorialize. It does not tell you what is important. It lists what exists.

llms.txt is a curated reading list. It only includes the pages that matter for understanding your business. It includes prose context. It tells the reader which page answers which question. It is built for depth, not breadth.

| File | Audience | Purpose | Format |
| --- | --- | --- | --- |
| robots.txt | All crawlers, including AI crawlers | Access control. Defines which paths a crawler may visit. | Plain text, line based directives |
| sitemap.xml | Search engine indexers | URL index. Lists every page with last modified dates. | XML, machine readable |
| llms.txt | Large language models and AI agents | Curated reading list with prose context and descriptions. | Markdown, human and machine readable |
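The robots.txt prerequisite described above can be checked before any llms.txt work begins. The following is a minimal sketch using Python's standard `urllib.robotparser`; the crawler list, the sample robots.txt, and the `blocked_ai_crawlers` helper are illustrative assumptions, not an official tool.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article; extend the list as needed.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/llms.txt") -> list[str]:
    """Return the AI crawlers that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_crawlers(SAMPLE_ROBOTS))  # prints ['GPTBot']
```

If the list comes back non-empty for your own robots.txt, fix access first. A crawler that is blocked from the site cannot read llms.txt either.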

The structure of llms.txt

The format has six elements, in this order. The llms.txt specification itself requires only the H1, but treat the first three as required in practice. The last three are optional but strongly recommended.

An H1 with your site or project name. A single H1, no more. This is the title of the document. It should be unambiguous. “AEO Hunt” not “Welcome.”

A blockquote summary. One or two sentences explaining what your site is and who it is for. This is the first thing an AI model reads after the title. Write the answer you would want an AI to give if a user asked, “What is this brand?”

Prose context. One to three short paragraphs describing your project, business, or site in plain language. Avoid marketing speak. Tell the model what you do, who you do it for, and what makes your approach different. This is the section AI models are most likely to quote when summarizing your business.

H2 link sections. Each H2 is a category. Common categories include Documentation, Pages, Blog, Services, Reference, and Examples. Under each H2, list links as a markdown unordered list. Each link is a markdown link followed by a colon and one descriptive sentence.

The link format is strict. The line should look like this:

- [Page Title](https://example.com/page): One line description of the page.

An Optional section. The Optional section is the last H2 on the page. Anything inside this section is “skip if you must” content. AI models reading the file treat anything outside Optional as required reading and anything inside as bonus material.
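The structural rules in this section are simple enough to check mechanically. Here is a rough linter sketch, assuming the conventions as this article describes them (one H1, a blockquote right after it, Optional as the last H2, every link carrying a description). The `lint_llms_txt` function and its messages are hypothetical, and the checks are stricter than the specification.

```python
import re

# One link item: "- [Title](URL): description". The description is captured
# separately so a missing one can be reported on its own.
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.+))?$")


def lint_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems found in an llms.txt document."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line.strip()]

    h1_lines = [line for line in lines if line.startswith("# ")]
    if len(h1_lines) != 1:
        problems.append(f"expected exactly one H1, found {len(h1_lines)}")
    if non_blank and not non_blank[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    if len(non_blank) < 2 or not non_blank[1].startswith("> "):
        problems.append("no blockquote summary directly after the H1")

    h2_titles = [line[3:].strip() for line in lines if line.startswith("## ")]
    if "Optional" in h2_titles and h2_titles[-1] != "Optional":
        problems.append("Optional is not the last H2 section")

    in_links = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            in_links = True
        elif in_links and line.startswith("- "):
            match = LINK_RE.match(line)
            if match is None:
                problems.append(f"line {number}: malformed link item")
            elif match.group(3) is None:
                problems.append(f"line {number}: link has no description")
    return problems


# Hypothetical file with several of the problems described above.
BAD = "Welcome\n\n## Optional\n\n- [Blog](https://acme.com/blog)\n\n## Docs\n\n- Quickstart https://acme.com/docs\n"
for problem in lint_llms_txt(BAD):
    print(problem)
```

A clean file returns an empty list, which makes the function easy to wire into a pre-publish check.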

A worked example

Suppose a developer tools company called Acme Routing publishes llms.txt. The minimum useful file looks like this:

# Acme Routing

> Acme Routing is a developer platform for building real time, geo aware routing into mobile and web apps.

Acme Routing provides APIs, SDKs, and developer tooling for routing, geocoding, and place search across 195 countries. The platform is used by 14,000 developers and ships in production at companies including Acme Logistics and Globex.

## Documentation

- [Quickstart Guide](https://acme.com/docs/quickstart): 10 minute guide to your first routing request.
- [API Reference](https://acme.com/docs/api): Reference for every endpoint, including request and response examples.
- [SDK Reference](https://acme.com/docs/sdk): Language specific SDK documentation for Swift, Kotlin, JavaScript, Python, and Go.

## Pricing

- [Pricing](https://acme.com/pricing): Per request pricing with a free tier of 50,000 requests per month.

## Optional

- [Engineering Blog](https://acme.com/blog): Updates, technical deep dives, and engineering posts.
- [About](https://acme.com/about): Company background and team.

That file is under 250 words. An AI model can ingest it in a single tool call. The structure is clear enough that the parser does not have to guess which links matter or which categories belong together.
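To make "the parser does not have to guess" concrete, here is one way a consumer might read a file like the Acme example. This is a sketch, not a reference parser: `LlmsTxt`, `parse_llms_txt`, and `required_urls` are names invented for illustration, and the sample is a trimmed copy of the file above.

```python
import re
from dataclasses import dataclass, field

LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Maps each H2 heading to a list of (title, url, description) tuples.
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Extract the title, summary, and link sections; prose is ignored."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                title, url, description = match.groups()
                doc.sections[current].append((title, url, description or ""))
    return doc


def required_urls(doc: LlmsTxt) -> list[str]:
    """URLs from every section except Optional, in file order."""
    return [url for name, links in doc.sections.items() if name != "Optional" for _, url, _ in links]


ACME = """\
# Acme Routing

> Acme Routing is a developer platform for building real time, geo aware routing into mobile and web apps.

## Documentation

- [Quickstart Guide](https://acme.com/docs/quickstart): 10 minute guide to your first routing request.
- [API Reference](https://acme.com/docs/api): Reference for every endpoint, including request and response examples.

## Optional

- [About](https://acme.com/about): Company background and team.
"""

doc = parse_llms_txt(ACME)
print(doc.title)
print(required_urls(doc))
```

Separating the Optional links from the rest is the point of the design: an agent short on context can fetch only the required URLs and still cover what matters.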

llms.txt vs llms-full.txt: when to use each

llms.txt is the index. llms-full.txt is the content. For a large documentation site, llms.txt typically runs 100 to 500 lines; a file that follows the 15 to 25 link guideline in this article is far shorter. It lists pages with descriptions. The file itself does not contain the full text of those pages, just pointers and one line summaries. The job of llms.txt is to help an AI model decide what to fetch.

llms-full.txt is the actual content of every important page, concatenated into a single markdown file. For a small site with 20 pages of documentation, llms-full.txt might be 50,000 words. For a large product like a developer platform, it might run into the hundreds of thousands.

The split matters because of how AI tools work. An AI agent with limited tool calls and a 128k or 200k context window will read llms.txt first. From the descriptions and structure, it decides which pages to fetch. If your llms.txt is well written, the agent fetches the right three pages and ignores the other seventeen. If your llms.txt is missing, the agent guesses, often poorly.

llms-full.txt is for tools that want all of the content in a single request. IDE assistants like Cursor and Windsurf, AI coding tools, and documentation indexers all support pulling a full documentation set into context for a session. Publishing llms-full.txt makes that workflow trivial. The tool fetches one file and the AI has full context.

Most companies should publish llms.txt first. Add llms-full.txt later, once the short file is in good shape. If you only have time for one, publish llms.txt.
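Building llms-full.txt is mostly concatenation. The sketch below assumes you already have each page as markdown; `build_llms_full` is a hypothetical helper, and the "Source:" line is a convention chosen for this example rather than anything the format prescribes. It does not adjust heading levels inside page bodies.

```python
def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate pages into one markdown document.

    Each page is a (title, url, markdown_body) tuple, emitted in the
    order given so the most important pages can come first.
    """
    parts = [f"# {site_name}"]
    for title, url, body in pages:
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


# Hypothetical pages for the Acme Routing example.
pages = [
    ("Quickstart", "https://acme.com/docs/quickstart", "Install the SDK.\n"),
    ("Pricing", "https://acme.com/pricing", "Free tier included."),
]
full = build_llms_full("Acme Routing", pages)
print(full)
print(len(full.split()), "words")
```

The word count at the end is a quick way to see whether the result will fit the context window of the tools you care about.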

How to write llms.txt: step by step

Here is the process I use for every client AEO Hunt onboards. From blank file to live llms.txt in under an hour.

1. List your high signal pages. Open a notes file. Write down every page on your site that helps somebody understand your business. The home page. The about page. The pricing page. Each main service or product page. The contact page. Two to five top blog posts that establish authority. Stop at 15 to 25 entries. If you find yourself listing every blog post, prune.
2. Group your pages by category. Cluster the entries into 3 to 6 categories. Common groupings include Home and About, Services, Documentation, Pricing, Blog, and Resources. The categories become your H2 headings. Categories with one entry are usually a sign you should merge.
3. Write the H1 and summary. The H1 is your brand or product name. The blockquote summary is one to two sentences that explain what you do and who you do it for. Write the answer you want an AI to give when a user asks, “What is this brand?”
4. Write the prose context. One to three short paragraphs describing your business in plain language. Cover what you do, who you serve, and what makes your approach different. Skip the listicle preface. Skip the jargon. Write the section like a journalist would. Include specific stats or facts if you have them. AI models quote this section verbatim.
5. Format the link sections. Under each H2, list your pages as a markdown unordered list. Each item is a markdown link followed by a colon and a single descriptive sentence. The description matters more than the URL, because it is the only signal the AI uses to decide whether to fetch the page.
6. Add the Optional section. Move the lowest priority pages into a final H2 called Optional. This is where you put pages that are nice to have but not required for understanding your business. The blog index. Legal pages. The press kit.
7. Save as plain text, host at /llms.txt. Save the file as llms.txt. Upload it to your site root so it lives at https://yourdomain.com/llms.txt. The file must be served with a content type of text/plain or text/markdown. Most static hosting handles this automatically.
8. Test the URL. Open https://yourdomain.com/llms.txt in a browser. Confirm the file renders as plain text. Confirm there are no redirects, no authentication walls, no 404 errors. Curl the URL and verify the response matches what you uploaded.
9. Publish llms-full.txt for content heavy sites. If you run a documentation site or have a large library of content, concatenate the actual page text into a single markdown file called llms-full.txt and host it next to llms.txt. AI coding tools and IDE assistants will pull it directly.

That is the entire process. The hardest part is the writing. The technical hosting is trivial.
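If you would rather keep the page list in structured form and generate the file, the writing steps above reduce to a small template. This is a sketch under that assumption; `render_llms_txt` and its argument shapes are made up for illustration.

```python
def render_llms_txt(
    name: str,
    summary: str,
    prose: list[str],
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt document.

    `sections` maps an H2 heading to (title, url, description) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for paragraph in prose:
        lines += [paragraph, ""]
    # Emit Optional last, whatever order the caller supplied.
    ordered = [h for h in sections if h != "Optional"] + [h for h in sections if h == "Optional"]
    for heading in ordered:
        lines += [f"## {heading}", ""]
        for title, url, description in sections[heading]:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = render_llms_txt(
    "Acme Routing",
    "Acme Routing is a developer platform for routing.",
    ["Acme Routing provides APIs and SDKs."],
    {
        "Optional": [("About", "https://acme.com/about", "Company background and team.")],
        "Pricing": [("Pricing", "https://acme.com/pricing", "Per request pricing.")],
    },
)
print(text)
```

Generating the file from data also makes the quarterly refresh easier, because the page list can live next to the rest of your site configuration.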

Where AI tools actually read llms.txt

This is the question every client asks. Does anybody actually read this file?

The answer in May 2026 is yes, with adoption increasing across the major AI platforms and tools.

Cursor and Windsurf pull llms-full.txt into IDE context. When a developer adds a documentation source to either tool, the IDE fetches llms-full.txt if it exists and uses it as the canonical reference for that library or product. The result is faster, more accurate code completion against the documented APIs.

Claude and ChatGPT can read llms.txt when given a URL directly. If you paste the file's URL, or your domain, into Claude with web access enabled and ask it to summarize the site or extract structured info, Claude can fetch /llms.txt and use it as the entry point. The same is true for ChatGPT with browsing enabled and for Perplexity during retrieval.

Mintlify, the developer documentation platform, generates llms.txt automatically for every site it hosts. Hundreds of developer tool companies publish llms.txt by default because they use Mintlify.

Cloudflare, Anthropic, Stripe, Hugging Face, Replicate, OpenAI, and fast.ai all publish llms.txt files for their own properties. When the platforms building the AI infrastructure adopt a file format, expect the file format to matter.

The thing to understand is that llms.txt is still early. Adoption is uneven. Some AI tools read it religiously, others ignore it. The cost of publishing is so low that there is no real argument against doing it now, and the value will compound as more tools standardize on the file.

Does llms.txt help with AEO?

This is where I have to be careful, because llms.txt is not a silver bullet for getting cited by AI answer engines. The biggest signals for AI citation come from your content quality, your entity authority, your structured data (we walk through this in our schema markup for AEO guide), and your third party mention density.

But llms.txt does help, in three specific ways.

First, it raises the floor on how AI models describe your business. When an AI model summarizes your company without a direct citation, the summary is built from whatever the model managed to extract from your site. If your site is poorly structured, the summary will be vague or wrong. If your llms.txt is clean, the summary will be tighter and closer to what you would write yourself.

Second, it helps inside agentic workflows. AI agents that browse the web on behalf of a user, like Claude with computer use, OpenAI's Operator, and Perplexity's Comet browser, can treat llms.txt as a starting point. An agent that supports the format can fetch llms.txt before it fetches anything else on a first visit. A clean file gets the agent to your important pages faster, which should translate into more accurate task completion and a better chance of citation.

Third, it gives you control. Without llms.txt, AI models guess which of your pages matter. With llms.txt, you decide. You promote your most important pages, push lower priority pages into the Optional section, and shape the prose that frames your business.

None of this replaces the rest of the AEO playbook. It is a low cost, high impact step that almost no competitors have taken yet.

Common mistakes to avoid

I have audited dozens of llms.txt files at this point, mostly on developer tool sites that adopted the format early. Five mistakes show up over and over.

Mistake 1: Listing every page on your site

llms.txt is not sitemap.xml. If your file has 200 links, you are doing it wrong. Aim for 15 to 25 links across 3 to 6 categories. Anything more dilutes the signal and tells the AI nothing about which pages matter most.

Mistake 2: Generic, useless descriptions

“Our blog” is not a description. “Pricing page” is not a description. The line after the colon is the only signal the AI gets for whether to fetch the page. Make every description tell the model what is on the page and who it is for. “Per request pricing with a free tier of 50,000 requests per month” beats “Pricing” every time.

Mistake 3: Stale content

If your llms.txt was written 14 months ago and references an old product name, an old pricing model, or a since deleted page, AI models will surface outdated information about your business. Treat llms.txt like a living document. Update it quarterly at minimum, and immediately whenever your pricing, positioning, or service mix changes.

Mistake 4: Marketing fluff in the prose section

The prose context is where AI models pull verbatim text when summarizing your business. If your prose section says “We are a world class team passionate about helping our customers succeed,” that is the sentence AI will quote back at users. Write the prose section like a journalist would, not like a brand copywriter. Specific facts. Plain language. Real numbers where you have them.

Mistake 5: Forgetting llms-full.txt

The short file is the table of contents. The long file is the full book. If you only publish llms.txt and never ship llms-full.txt, you have given the AI a menu but no food. For documentation sites in particular, llms-full.txt is the more useful of the two files because IDE assistants and AI coding tools fetch it directly.

The five most common llms.txt mistakes all share one root cause: treating the file like a marketing asset instead of a developer document. Keep it short, keep it current, write the prose like a journalist, and ship both files.
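Mistakes 1 and 2 can be caught with a quick automated pass. The sketch below uses the 25 link and 6 category ceilings from this article; the five word minimum for descriptions is an arbitrary threshold picked for illustration, and `audit_llms_txt` is a hypothetical helper.

```python
import re

LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def audit_llms_txt(
    text: str,
    max_links: int = 25,
    max_sections: int = 6,
    min_description_words: int = 5,  # assumption, not from any specification
) -> list[str]:
    """Warn about oversized files and thin link descriptions."""
    warnings = []
    links = []
    sections = 0
    for line in text.splitlines():
        if line.startswith("## "):
            sections += 1
        match = LINK_RE.match(line.strip())
        if match:
            links.append(match.groups())
    if len(links) > max_links:
        warnings.append(f"{len(links)} links, more than the recommended {max_links}")
    if sections > max_sections:
        warnings.append(f"{sections} categories, more than the recommended {max_sections}")
    for title, _url, description in links:
        words = len((description or "").split())
        if words < min_description_words:
            warnings.append(f"{title}: description is only {words} words")
    return warnings


# Hypothetical excerpt with one useful and one generic description.
SAMPLE = """\
## Pricing

- [Pricing](https://acme.com/pricing): Per request pricing with a free tier of 50,000 requests per month.

## Optional

- [Blog](https://acme.com/blog): Our blog
"""
print(audit_llms_txt(SAMPLE))
```

A word count is a crude proxy for usefulness, so treat the warnings as prompts to reread the line and not as a pass or fail grade.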

A real example: how to use llms.txt for your own brand

AEO Hunt has an llms.txt file at https://aeohunt.com/llms.txt. The file lists the home page, the eight service pages, the about page, the pricing page, and a curated set of blog posts that establish authority on AEO concepts. The prose section is two paragraphs that explain what we do, who we serve, and what makes the approach different.

The file is roughly 4 kilobytes. The first version took me 35 minutes to write. I update it every time I publish a new blog post that I want AI models to discover early. Anyone reading this guide can pull the file, study the format, and use it as a template for their own site.

The point is not that the file is perfect. The point is that it exists, it is current, and it gives AI models a clean starting point when they encounter the brand for the first time. That single fact puts the site ahead of the large majority of competitors that have never published one.

How to know if AI tools are fetching your file

You should not publish llms.txt and then wonder whether it is working. Two checks tell you almost everything you need to know.

Start with server logs. Look for user agents associated with AI crawlers visiting /llms.txt. The common ones are GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, PerplexityBot, and Perplexity-User. Google-Extended will not show up here: it is a robots.txt control token, not a user agent string, and Google fetches pages with its regular crawler user agents. If you see requests to your llms.txt from the agents above, the file is being read. If your site sits behind Cloudflare, AI crawler traffic is broken out inside the Cloudflare analytics dashboard. Vercel and Netlify expose user agent data through their analytics layer. If you use none of these, a short nginx or Apache log query gives you the same answer.
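For sites without a hosted analytics layer, the log query can be a few lines of Python. This sketch assumes the combined log format that nginx and Apache commonly write; the token list is a subset of the agents named above and should be checked against each vendor's current documentation, and the sample log lines are invented.

```python
import re
from collections import Counter

# Substrings to look for in the user agent header (assumed list, see above).
AI_AGENT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user agent"
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def count_llms_fetches(log_lines) -> Counter:
    """Count requests for /llms.txt and /llms-full.txt per AI agent token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        path, user_agent = match.groups()
        if path.split("?")[0] not in ("/llms.txt", "/llms-full.txt"):
            continue
        for token in AI_AGENT_TOKENS:
            if token.lower() in user_agent.lower():
                counts[token] += 1
                break
    return counts


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [20/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.8 - - [20/May/2026:10:01:00 +0000] "GET /llms-full.txt HTTP/1.1" 200 90210 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"',
    '203.0.113.9 - - [20/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
    '203.0.113.7 - - [20/May/2026:10:03:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
]
print(dict(count_llms_fetches(SAMPLE_LOG)))
```

User agent strings can be spoofed, so read the counts as a signal of interest and not as proof of which vendor made the request.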

For a faster qualitative signal, paste your domain into Claude or ChatGPT and ask the model to summarize your business. Then ask the same model who you serve and what makes your approach different. If the answers closely match the prose section of your llms.txt, the file is doing its job. If the summary is vague, wrong, or pulled from random pages on the site, either the file is not being fetched yet or the prose section needs sharpening.

This monitoring layer matters because AI crawler behavior shifts. A file format that nine tools read in May 2026 might be read by fifteen tools six months later. Watching the actual fetches keeps you grounded in what is happening rather than what should be happening, and it gives you the data to decide whether the file needs another round of editing.

Who should write llms.txt today

If you run a documentation site, you should have llms.txt already. If you have not added it, do it this week. The format was designed for documentation first, and AI coding tools are the most active consumers of the file.

If you run a SaaS or B2B company, you should write llms.txt this month. The format is settled enough to commit to. The cost is one to two hours of focused writing.

If you run an e-commerce site, llms.txt is lower priority but still worth doing. Your category pages, your brand story, and your most important policy pages should be in the file. Skip the individual product pages. Those belong in your sitemap.xml.

If you run a personal site or a blog, llms.txt is optional. But it takes 20 minutes, so do it anyway.

The pattern is clear. The cost of publishing is low. The cost of not publishing is invisible right now, but it will become visible the moment a buyer asks an AI agent to research your business and the agent comes back with a vague or incorrect summary because it had nothing structured to read.

llms.txt is a quiet, low effort win. Write it once, set it up correctly, keep it current, and benefit for years.


Frequently Asked Questions

What is llms.txt?

llms.txt is a plain text markdown file you host at the root of your domain to give AI models a structured map of your site. It includes a brand summary, prose context, and a curated list of important pages with descriptions. The format was proposed by Jeremy Howard of Answer.AI in September 2024 and has been adopted by Anthropic, Cloudflare, Stripe, Hugging Face, and a growing list of documentation sites and product companies.

How do I write an llms.txt file?

Start with an H1 of your brand name, a blockquote summary of what you do, and one to three short paragraphs of plain language context. Then add H2 sections grouping your most important pages by category, with each link followed by a one line description. Save the file as plain text and host it at https://yourdomain.com/llms.txt. The whole process takes about an hour for a typical small to mid size site.

What is the difference between llms.txt and llms-full.txt?

llms.txt is a curated index, from a few dozen lines for a small site up to 100 to 500 lines for a large documentation site, listing your most important pages with one line descriptions. llms-full.txt is the actual concatenated content of those pages combined into a single markdown file, often tens of thousands of words. The short file helps AI models decide what to read. The long file gives them every piece of content in a single fetch. Publish llms.txt first. Add llms-full.txt for documentation heavy sites.

Do AI engines actually read llms.txt?

Adoption is growing. Cursor and Windsurf pull llms-full.txt into IDE context. Claude and ChatGPT fetch llms.txt when given a domain URL directly. Perplexity uses it as a retrieval signal. Mintlify generates it automatically for every documentation site it hosts. Cloudflare, Anthropic, Stripe, OpenAI, Hugging Face, and Replicate all publish their own. The standard is still early but is rapidly moving toward default.

Does llms.txt replace robots.txt or sitemap.xml?

No. The three files do different jobs. Robots.txt controls crawler access at the path level. Sitemap.xml lists every URL on your site for search engines. llms.txt gives AI models a curated, descriptive reading list of the pages that matter most. You should publish all three. None of them substitutes for the others, and llms.txt depends on robots.txt allowing AI crawlers in the first place.

Where do I host the llms.txt file?

Host the file at the root of your domain, at the exact path /llms.txt. The file must be served as plain text with no authentication wall and no redirects. Most static hosting platforms including Cloudflare Pages, Netlify, Vercel, and GitHub Pages handle this automatically. After publishing, open https://yourdomain.com/llms.txt in a browser and confirm it renders correctly. If you see a 404 or a redirect, your hosting needs configuration.
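The hosting checks in this answer (status, redirects, content type) can be scripted. Below is a minimal sketch using only the standard library; `hosting_problems` and `check_live` are hypothetical names, and `check_live` needs network access, so it is defined here but not called.

```python
import urllib.request

ACCEPTED_TYPES = ("text/plain", "text/markdown")


def hosting_problems(requested_url: str, final_url: str, status: int, content_type: str) -> list[str]:
    """Compare one HTTP response against the hosting rules described above."""
    problems = []
    if status != 200:
        problems.append(f"status {status}, expected 200")
    if final_url != requested_url:
        problems.append(f"redirected to {final_url}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"content type {media_type!r}, expected text/plain or text/markdown")
    return problems


def check_live(url: str) -> list[str]:
    """Fetch `url` and report hosting problems. Requires network access."""
    # urlopen follows redirects, and response.url shows where the request ended up.
    # An error status such as 404 raises urllib.error.HTTPError instead of returning.
    with urllib.request.urlopen(url, timeout=10) as response:
        return hosting_problems(
            url, response.url, response.status, response.headers.get("Content-Type", "")
        )


# Offline examples with made-up response values.
url = "https://yourdomain.com/llms.txt"
print(hosting_problems(url, url, 200, "text/plain; charset=utf-8"))
print(hosting_problems(url, "https://www.yourdomain.com/llms.txt", 200, "text/html"))
```

Keeping the rule checks separate from the network call means the same function can be reused against responses captured by curl or a monitoring tool.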

Brendan Hunt

Founder & CEO of AEO Hunt. 15+ years in digital marketing, previously at Google. Specializes in custom AI integration, AEO strategy, and AI-powered marketing systems.
