pure.md API (1.0.0)

Support: puremd@crawlspace.dev

Introduction

pure.md is a REST API that lets AI agents and developers reliably access web content. With pure.md, you can:

Avoid bot detection by mimicking real user behavior
Render JavaScript-heavy websites, PDFs, images, and files
Scrape web pages into markdown optimized for LLMs
Crawl search engines for up-to-date knowledge
Extract JSON from web pages using natural language

Authentication

Generate a unique API token in your dashboard. Then include that token in the x-puremd-api-token request header for all requests.
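A minimal sketch of attaching the token with Python's standard library, assuming you build requests with `urllib.request`. The helper name `with_puremd_token` is hypothetical. In practice you would read the token from an environment variable (for example, a hypothetical `PUREMD_API_TOKEN`) rather than hard-coding it.

```python
import urllib.request


def with_puremd_token(req: urllib.request.Request, token: str) -> urllib.request.Request:
    """Attach the pure.md API token to an outgoing request (hypothetical helper)."""
    if not token:
        raise ValueError("an API token is required")
    # pure.md reads the token from this header; headers starting with
    # x-puremd- are not forwarded to the target site.
    req.add_header("x-puremd-api-token", token)
    return req
```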

Rate limits

Rate limits vary by subscription plan; see the pricing page for details.

Subscription type	Requests per minute
Logged out / anonymous	6
Logged in, no subscription	10
Starter plan	60
Growth plan	600
Business plan	3000
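A client-side throttle can keep a single process under its plan's per-minute limit. The sketch below spaces request starts evenly, 60 / RPM seconds apart (for example, 1 second on the Starter plan). The plan keys in `PLAN_LIMITS_RPM` and the class name are hypothetical. A 429 response is still possible (for instance, if several processes share one token), so callers should handle that status as well.

```python
import time
from typing import Callable

# Requests per minute from the table above (keys are hypothetical labels).
PLAN_LIMITS_RPM = {
    "anonymous": 6,
    "logged_in": 10,
    "starter": 60,
    "growth": 600,
    "business": 3000,
}


class MinuteThrottle:
    """Spaces request starts evenly so a client stays near `rpm` per minute."""

    def __init__(
        self,
        rpm: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm  # seconds between consecutive request starts
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Re-read the clock in case sleep overshot.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each request. The `clock` and `sleep` parameters are injectable so the spacing can be tested without real delays.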

MCP server

The Model Context Protocol (MCP), developed by Anthropic, is an open standard that lets AI systems interact with external tools and data sources. MCP clients such as Cursor, Windsurf, and Claude Desktop use it to discover and call APIs and other functionality.

You can instruct your MCP clients to route traffic through pure.md by following the instructions at https://github.com/puremd/puremd-mcp.

Headers pass through

All request headers are passed through to the target URL, except those whose names begin with x-puremd-.

The origin server's response headers are returned in the response.
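The pass-through rule can be modeled client-side, for example to check which of your headers will reach the target site. This is a sketch of the documented rule, not pure.md's implementation. `split_headers` is a hypothetical helper, and matching the prefix case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
from typing import Dict, Tuple

PUREMD_PREFIX = "x-puremd-"


def split_headers(headers: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Model the pass-through rule: return (forwarded_to_target, kept_by_puremd)."""
    forwarded: Dict[str, str] = {}
    kept: Dict[str, str] = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase
        # (assumption: pure.md matches the prefix the same way).
        if name.lower().startswith(PUREMD_PREFIX):
            kept[name] = value
        else:
            forwarded[name] = value
    return forwarded, kept
```

For example, an Accept-Language header would reach the target site, while x-puremd-api-token would not.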

Fetch web content

Retrieves the content of a given URL in markdown format. Use this endpoint to scrape text content from a web page without getting blocked.

Authorizations: APIToken (x-puremd-api-token header)

Path parameters

url (string, required): The URL to fetch.
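The following sketch calls this endpoint with `urllib`. It assumes the API is served at `https://pure.md/` and that the target URL is appended to the path unchanged. This reference states neither the base URL nor the escaping rules, so check them against the OpenAPI specification. The helper names are hypothetical.

```python
import urllib.request

# Assumption: pure.md is served at https://pure.md/ and the target URL is
# appended to the path as-is (the `url` path parameter).
PUREMD_BASE = "https://pure.md/"


def build_fetch_request(target_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the markdown version of `target_url`."""
    return urllib.request.Request(
        PUREMD_BASE + target_url,
        headers={"x-puremd-api-token": token},
        method="GET",
    )


def fetch_markdown(target_url: str, token: str, timeout: float = 60.0) -> str:
    """Send the request and return the text/plain body (performs network I/O)."""
    with urllib.request.urlopen(build_fetch_request(target_url, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the default opener, `urlopen` raises `urllib.error.HTTPError` for non-2xx statuses such as 400, 415, and 429, so catch that exception to inspect the status code.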

Response samples

Status codes: 200, 400, 415, 429 (200 shown)

Content type: text/plain

<WebPage url="https://example.com">

title: Example Domain
access_date: Wed, 05 Mar 2025 22:27:19 GMT

---

# Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

More information...

</WebPage>
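To get the metadata separately from the page body, you can split the wrapper in the sample above with a small parser. This assumes every response follows the sample's shape: a `<WebPage url="...">` element, `key: value` metadata lines, a `---` separator, then markdown. `parse_webpage` and `WebPage` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple


class WebPage(NamedTuple):
    url: str
    metadata: Dict[str, str]
    markdown: str


# Matches the <WebPage url="..."> ... </WebPage> wrapper from the sample.
_WEBPAGE_RE = re.compile(r'<WebPage url="([^"]*)">\s*(.*?)\s*</WebPage>', re.DOTALL)


def parse_webpage(text: str) -> WebPage:
    """Split a fetch response into its URL, metadata lines, and markdown body."""
    match = _WEBPAGE_RE.search(text)
    if match is None:
        raise ValueError("response has no <WebPage> wrapper")
    url, inner = match.group(1), match.group(2)
    header, sep, body = inner.partition("\n---\n")
    if not sep:
        # No metadata separator: treat the whole payload as markdown.
        return WebPage(url, {}, inner)
    metadata: Dict[str, str] = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:  # split on the first colon only, so times like 22:27:19 survive
            metadata[key.strip()] = value.strip()
    return WebPage(url, metadata, body.strip())
```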

Fetch and extract data

This endpoint is only available on paid plans.

Runs inference on the content of a given URL. Use this endpoint to extract structured JSON from a web page.

Authorizations: APIToken (x-puremd-api-token header)

Path parameters

url (string, required): The URL to extract data from.

Request body schema: application/json

prompt (string, required): The user message.
model (string): One of "meta/llama-3.1-8b" … 3 more. The generative AI model to use; smaller models are faster, while larger models are more accurate. Default: `meta/llama-3.1-8b`.
schema (object): JSON schema of the desired response. Omit this property to get a plaintext response.
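Because `model` and `schema` are optional, a request body should include them only when they are set, so the documented defaults apply (`meta/llama-3.1-8b` and plaintext output). The following is a minimal sketch; `build_inference_body` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_inference_body(
    prompt: str,
    model: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body, leaving optional fields out so defaults apply."""
    if not prompt:
        raise ValueError("prompt is required")
    body: Dict[str, Any] = {"prompt": prompt}
    if model is not None:
        body["model"] = model  # omitted -> default model, meta/llama-3.1-8b
    if schema is not None:
        body["schema"] = schema  # omitted -> plaintext response
    return body
```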

Request sample

Content type: application/json

{
  "prompt": "What are the top 5 headlines from today?",
  "model": "meta/llama-3.1-8b",
  "schema": {
    "type": "object",
    "properties": {
      "headlines": {
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    },
    "required": ["headlines"]
  }
}

Response sample

Content type: application/json

When a schema is supplied, a successful response is a JSON object that conforms to it. For the request sample above:

{
  "headlines": [
    "string"
  ]
}
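The following sketch sends the request sample above. It assumes the endpoint accepts POST at the same `https://pure.md/<url>` path used for fetching. The HTTP method, the base URL, the target URL `https://example.com/news`, and the helper name are all assumptions, not details stated in this reference.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumptions: base URL https://pure.md/ and POST as the HTTP method.
PUREMD_BASE = "https://pure.md/"


def build_extract_request(target_url: str, token: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request asking pure.md to extract data from target_url."""
    return urllib.request.Request(
        PUREMD_BASE + target_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-puremd-api-token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "prompt": "What are the top 5 headlines from today?",
    "model": "meta/llama-3.1-8b",
    "schema": {
        "type": "object",
        "properties": {"headlines": {"type": "array", "items": {"type": "string"}}},
        "required": ["headlines"],
    },
}
# Hypothetical target URL; urllib.request.urlopen(request) would send it.
request = build_extract_request("https://example.com/news", "YOUR_TOKEN", body)
```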

Search the web

This endpoint is only available on paid plans.

Crawls the top results from a search engine query and concatenates the web content of all pages into markdown. Use this endpoint to gather knowledge about news, current events, or specific topics.

Authorizations: APIToken

Query parameters

q (string, required): The URL-encoded search query.

Response samples

Status codes: 200, 400, 401, 402, 429 (200 shown)

Content type: text/plain

# Title of the Page

## Introduction
This is the introduction text from the webpage, purified and optimized for LLM processing.

## Main Content
The main content of the page converted to clean markdown format, with unnecessary elements removed.

### Subsection
Content organized in logical subsections with proper hierarchy.

## Conclusion
The concluding information from the webpage.
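The following sketch builds a search request. The `q` parameter must be URL-encoded, and `urllib.parse.urlencode` with `quote_via=quote` does this (spaces become `%20`). The `https://pure.md/search` path is an assumption, since this reference does not list the endpoint path. `build_search_request` is hypothetical.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: the search endpoint path; this reference only names the `q` parameter.
SEARCH_ENDPOINT = "https://pure.md/search"


def build_search_request(query: str, token: str) -> urllib.request.Request:
    """Build a GET request whose `q` parameter is the URL-encoded query."""
    # quote_via=quote encodes spaces as %20 and reserved characters such as & as %26.
    query_string = urlencode({"q": query}, quote_via=quote)
    return urllib.request.Request(
        f"{SEARCH_ENDPOINT}?{query_string}",
        headers={"x-puremd-api-token": token},
        method="GET",
    )
```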

Search and extract data

This endpoint is only available on paid plans.

Crawls the top results from a search engine query and runs inference on their web content. Use this endpoint to answer questions about news, current events, or general user queries that require searching.

Authorizations: APIToken (x-puremd-api-token header)

Path parameters

url (string, required): The URL

Request body schema: application/json

prompt (string, required): The user message.
model (string): One of "meta/llama-3.1-8b" … 3 more. The generative AI model to use; smaller models are faster, while larger models are more accurate. Default: `meta/llama-3.1-8b`.
schema (object): JSON schema of the desired response. Omit this property to get a plaintext response.
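The response is JSON when a schema is supplied and may be plaintext otherwise; the 200 sample for this endpoint shows a JSON string. A tolerant decoder can handle both cases. This is a sketch: `decode_inference_response` is hypothetical, and falling back to raw text on a JSON parse error is a design choice, not documented behavior.

```python
import json
from typing import Any


def decode_inference_response(raw: bytes) -> Any:
    """Return parsed JSON when possible, otherwise the raw text."""
    text = raw.decode("utf-8")
    try:
        # With a schema: a JSON object; without one, possibly a JSON string.
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback for plaintext answers that are not valid JSON.
        return text
```

One caveat: a plaintext answer that happens to be valid JSON, such as `42`, decodes to a number rather than a string.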

Request sample

Content type: application/json

{
  "prompt": "Who won the baseball game last night?",
  "model": "meta/llama-3.1-8b",
  "schema": {
    "type": "object",
    "properties": {
      "headlines": {
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    },
    "required": ["headlines"]
  }
}

Response samples

Status codes: 200, 400, 401, 402, 429 (200 shown)

Content type: application/json

"string"