WHAT IS A MARKDOWN FILE
A plain-language explainer for creators, agents, lawyers, underwriters, and platforms encountering the .md file format for the first time in a rights context.
The short answer
A markdown file is a plain text file that both humans and machines can read. It has no formatting buttons, no hidden structure, no proprietary encoding. It is just text — words and symbols arranged in a way that a person can understand by reading it and a computer program can understand by parsing it. The .md extension signals that the file uses a lightweight formatting convention called Markdown, but the key property is simpler than that: it is readable, parseable, and deployable by anything.
In the age of AI agents — automated systems that browse, retrieve, and act on web content at scale — that property is not incidental. It is the whole point.
What makes a file “machine-readable”?
Not all file formats are equal in the eyes of an automated system. A Word document (.docx) stores your text inside a compressed archive of XML files whose structure takes dedicated software to navigate. A PDF wraps content in a format designed for printing, not for parsing. An image of a contract is visible to a human but invisible to a machine without optical character recognition.
A plain text file — which is what a .md file is — contains only characters. No compression. No proprietary wrapper. Any program, in any language, on any operating system, can open it and read exactly what is written. When you put a cip.md file at the root of your domain, every AI system that visits your website can find it, open it, and read your rights declarations without any special software, any conversion step, or any ambiguity about whether the parsing worked correctly.
This is why robots.txt, the earliest widely adopted machine-readable convention for web crawlers, is a plain text file. It has worked for thirty years across crawlers, search engines, and AI systems because it is trivial to parse and hard to misread. cip.md follows the same logic, extended from controlling access to declaring rights.
What Markdown formatting adds
Markdown is a set of simple conventions that add structure to plain text without making it unreadable by humans:
- A line starting with # is a heading
- A line starting with - is a bullet point
- Text wrapped in **double asterisks** is bold
- A line in the format Key: Value is a structured data pair (a convention layered on top of Markdown rather than part of its core syntax)
In the CIP declaration files, the key-value pair format is the most important:
CIP-Training-Ingestion: Prohibited
CIP-NILP-Voice-Clone: Prohibited
CIP-TDM-Opt-Out: true
These lines are simultaneously human-readable (a person can understand them immediately) and machine-parseable (an AI system can extract the field name and its value in a single operation). No ambiguity. No interpretation. No hidden metadata. Just a field and a value, separated by a colon.
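As a minimal sketch of what that single operation might look like, the Python below pulls every CIP- field out of a file's text. The helper name parse_cip_fields, the rule of ignoring any line that does not start with CIP-, and the sample contact URL are illustrative assumptions, not part of the CIP specification.

```python
def parse_cip_fields(text: str) -> dict[str, str]:
    """Collect CIP-* key-value pairs from the text of a declaration file.

    Hypothetical helper: the real specification may define stricter rules
    (duplicate keys, case handling, comments) than this sketch applies.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("CIP-"):
            fields[key] = value.strip()
    return fields


# Sample text; the contact URL is a made-up placeholder value.
sample = """\
# Rights declaration
CIP-Training-Ingestion: Prohibited
CIP-NILP-Voice-Clone: Prohibited
CIP-TDM-Opt-Out: true
CIP-Rights-Contact: https://yourdomain.com/contact
"""

fields = parse_cip_fields(sample)
print(fields["CIP-Training-Ingestion"])  # Prohibited
print(fields["CIP-Rights-Contact"])      # https://yourdomain.com/contact
```

Headings, bullets, and ordinary prose in the same file are simply skipped, which is why the human-readable and machine-readable parts can share one document.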
Why this matters in the age of AI agents
AI agents are programs that operate autonomously — browsing websites, retrieving content, making decisions, and taking actions without a human in the loop at each step. They are already widespread: search indexers, training data scrapers, content retrieval pipelines, and agentic AI assistants that browse the web on behalf of users.
An AI agent visiting your website faces a question at every piece of content it encounters: what am I allowed to do with this? For a long time, the only answer available was robots.txt, a file that could say “do not crawl this” but could not say anything about rights, consent, or compensation.
The CIP declaration files extend this vocabulary to the full range of questions that matter for creative content in the AI era:
| Question the agent is asking | Field that answers it |
|---|---|
| May I use this content to train an AI model? | CIP-Training-Ingestion |
| May I use this content for fine-tuning? | CIP-Fine-Tuning |
| May I replicate this person's voice? | CIP-NILP-Voice-Clone |
| May I generate images of this person? | CIP-NILP-Likeness-AI |
| Is there a TDM opt-out I must respect? | CIP-TDM-Opt-Out |
| Who do I contact to negotiate a licence? | CIP-Rights-Contact |
| Is this creator CIP-certified? | CIP-Cert-Badge |
| Can I verify their certification? | CIP-Registry-URL |
A well-designed AI agent should look for and respect these declarations before deciding what to do with content it encounters, and a CIP Platform Certified platform is required to. The declaration file is not a technical nicety. It is the machine-readable equivalent of a notice on a gate.
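One way an agent might act on the table above is to map each intended action to the field that governs it and stop whenever that field says Prohibited. The sketch below assumes the declarations have already been parsed into a dictionary. The action names, the check_action helper, and the choice to report a missing field as "undeclared" rather than as permission are illustrative assumptions, not CIP requirements.

```python
# Hypothetical mapping from an agent's intended action to the CIP field
# that governs it (field names are taken from the table above).
ACTION_FIELD = {
    "train": "CIP-Training-Ingestion",
    "fine_tune": "CIP-Fine-Tuning",
    "clone_voice": "CIP-NILP-Voice-Clone",
    "generate_likeness": "CIP-NILP-Likeness-AI",
}


def check_action(action: str, fields: dict[str, str]) -> str:
    """Return 'prohibited', 'declared', or 'undeclared' for an action.

    'declared' means the field is present with some value other than
    Prohibited; interpreting that value is left to the caller.
    """
    value = fields.get(ACTION_FIELD[action])
    if value is None:
        return "undeclared"
    if value.strip().lower() == "prohibited":
        return "prohibited"
    return "declared"


declared = {
    "CIP-Training-Ingestion": "Prohibited",
    "CIP-NILP-Voice-Clone": "Prohibited",
}
print(check_action("train", declared))      # prohibited
print(check_action("fine_tune", declared))  # undeclared
```

Returning three states instead of a yes/no keeps the absence of a declaration distinct from consent, so the calling code has to decide explicitly how to treat silence.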
The CIP declaration file
CIP uses a single declaration file: cip.md. It is a plain-text Markdown file that declares rights, consent, and certification status in machine-readable form.
cip.md — the CIP declaration format
The authoritative CIP declaration. Uses the full CIP- namespace throughout. Covers all eleven sections: operator identity, certification, rights bundle, input licence, transformation matrix, output licence, NILP and identity rights, pipeline licences, provenance and CDR, content scope, and equitable remuneration preferences.
This is the file produced by the CIP Generator after certification. Deploy it at yourdomain.com/cip.md.
When to use it: always, for any creator or platform participating in the CIP framework.
Historical formats (retired)
Earlier versions of the framework offered two additional formats — llms.md (a flat-key format for LLM crawlers) and ai.md (a ten-section structured format). As of v3.66, these are retired. cip.md is the single canonical declaration file and covers all use cases previously served by the other formats.
Technical specifications for these retired formats remain available in the reference library for historical purposes.
How the files are found by AI systems
An AI system looking for rights declarations on your domain follows a predictable lookup path:
- Check robots.txt for a CIP: reference pointing to cip.md
- Check your HTML <head> for a <link rel="cip-rights"> tag
- Attempt direct access at yourdomain.com/cip.md
The CIP standard requires all three signals for full discoverability.
In robots.txt
CIP: https://yourdomain.com/cip.md
In your HTML <head>
<link rel="cip-rights" href="/cip.md" type="text/plain" />
The file itself, at
https://yourdomain.com/cip.md
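The three signals above could be resolved roughly as follows. This is a sketch under stated assumptions: it works on text that has already been downloaded (no network calls are made), the function and class names are hypothetical, and it reports all three candidates rather than stopping at the first hit, since the order of precedence is not spelled out here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CipLinkFinder(HTMLParser):
    """Records the href of the first <link rel="cip-rights"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "cip-rights" and self.href is None:
            self.href = attr.get("href")


def cip_url_from_robots(robots_txt: str) -> Optional[str]:
    """Return the value of the first 'CIP:' line in robots.txt, if any."""
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cip":
            return value.strip()
    return None


def cip_url_from_html(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL named by a cip-rights link tag, if any."""
    finder = CipLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


def discover_cip_urls(base_url: str, robots_txt: str, html: str) -> dict:
    """Collect the candidate cip.md location from each of the three signals."""
    return {
        "robots.txt": cip_url_from_robots(robots_txt),
        "html-link": cip_url_from_html(html, base_url),
        "direct": urljoin(base_url, "/cip.md"),
    }


robots = "User-agent: *\nAllow: /\nCIP: https://yourdomain.com/cip.md\n"
page = '<html><head><link rel="cip-rights" href="/cip.md" type="text/plain" /></head></html>'

urls = discover_cip_urls("https://yourdomain.com/", robots, page)
for signal, url in urls.items():
    print(signal, "->", url)
```

When a domain is configured as described, all three entries resolve to the same address. A None value for either of the first two signals would indicate that the corresponding piece is missing.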
The legal weight of a machine-readable declaration
A cip.md file is not just a technical convention. It is a machine-readable expression of rights that already subsist in your content under law. The file does not create those rights — copyright, moral rights, neighbouring rights, NILP rights — they arose automatically at the moment of creation. What the file does is make those rights legible in the medium where AI systems operate.
Under UK law — specifically the Data (Use and Access) Act 2025 — a rights holder who has applied a machine-readable TDM opt-out signal has a legally recognised basis to object to commercial AI training use of their content.
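Under EU law, Article 53 of the EU AI Act requires providers of general-purpose AI models to put in place a policy to comply with EU copyright law, including identifying and respecting TDM opt-outs that rights holders have expressed under Article 4(3) of the Copyright in the Digital Single Market Directive.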
In the United States, an explicit machine-readable opt-out significantly strengthens the rights holder's position under fair use analysis by establishing that the use was not authorised and that the holder had taken active steps to prevent it.
The file is small. The legal significance is not.
What AI agents are obligated to do
A CIP Platform Certified platform is required to:
- Query the Rights Registry at ingestion time for any content it encounters
- Read the CDR linked to that content
- Respect CIP-Training-Ingestion: Prohibited by excluding the content from training corpora
- Respect CIP-NILP-Voice-Clone: Prohibited by excluding voice data from synthesis training
- Attach a Provenance Certificate to any outputs derived from content it did ingest under licence
An AI agent operating on a certified platform that ignores a cip.md declaration is not just committing a technical error. It is creating a subsisting rights failure — a legal liability — at the point of ingestion.
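The two exclusion duties in the list above amount to a filter applied before content reaches a training set. The sketch below illustrates that step only. The Document record and exclude_prohibited function are hypothetical, the Rights Registry query, CDR lookup, and Provenance Certificate steps are out of scope, and keeping documents that carry no declaration is an assumption that a stricter pipeline might reverse.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical ingestion record: a URL plus its parsed declarations."""
    url: str
    declarations: dict[str, str] = field(default_factory=dict)


def exclude_prohibited(docs: list[Document], field_name: str) -> list[Document]:
    """Keep only documents whose declaration does not prohibit the use.

    Assumption: a document with no value for field_name is kept.
    """
    return [
        d for d in docs
        if d.declarations.get(field_name, "").strip().lower() != "prohibited"
    ]


corpus = [
    Document("https://a.example/essay", {"CIP-Training-Ingestion": "Prohibited"}),
    Document("https://b.example/voice", {"CIP-NILP-Voice-Clone": "Prohibited"}),
    Document("https://c.example/page"),
]

training_set = exclude_prohibited(corpus, "CIP-Training-Ingestion")
voice_set = exclude_prohibited(corpus, "CIP-NILP-Voice-Clone")

print([d.url for d in training_set])  # ['https://b.example/voice', 'https://c.example/page']
print([d.url for d in voice_set])     # ['https://a.example/essay', 'https://c.example/page']
```

Running the filter once per use keeps a document that prohibits voice cloning available for other purposes it has not prohibited, and the reverse.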
For each CIP audience
Creators
You do not need to understand the technical details of markdown syntax to deploy a cip.md. The CIP Generator produces the file for you after you complete certification. What you do need to understand is this: without a machine-readable declaration on your domain, AI systems have no way to know that your rights exist. With one, they do — and a certified platform is legally and contractually required to respect it.
Agents
A client without a cip.md is a client whose rights are invisible to AI systems. Deploying cip.md on your clients' domains, alongside CDR registration, is the first practical step in rights protection. It is faster to deploy than a contract amendment and more visible to AI systems than a clause in a PDF.
Lawyers
The cip.md declaration is evidentiary infrastructure. It establishes, in a timestamped and machine-readable form, that the rights holder made their position known before any alleged ingestion. Combined with a Rights Registry CDR, it creates the documentary foundation for Training Data Dividend claims, NILP Downstream Obligation assertions, and — in cases of deliberate non-compliance — aggravated damages arguments.
Underwriters
A domain with an active, correctly formatted cip.md linked to a registered CDR presents a materially lower Training Data Dividend claim probability than one without. The file is not just a rights declaration; it is evidence of a rights holder who understood their rights, took active steps to assert them, and created an auditable record of doing so.
Platforms
A cip.md at the root of a content domain is the signal your rights-aware ingestion system should be checking at ingestion time. Platform Certification requires you to query the Rights Registry for CDRs linked to ingested content. The cip.md is the complementary domain-level declaration — the file that tells you, before you even query the Registry, what the rights holder's position is.