pure.md
Connect your agents to the open web
pure.md is a REST API that lets AI agents and developers reliably access web content. With pure.md, you can:
- Avoid bot detection by mimicking real user behavior
- Render JavaScript-heavy websites, PDFs, images, and files
- Scrape web pages into markdown optimized for an LLM
- Crawl search engines for up-to-date knowledge
- Extract JSON from web pages using natural language
▶ ▶ ▶ Prefix any URL with `pure.md/` ◀ ◀ ◀
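As a rough illustration of the prefix convention above, here is a minimal Python sketch using only the standard library. The helper names `pure_md_url` and `fetch_markdown` are hypothetical, and the sketch assumes the response body is UTF-8 text; authentication is not covered here.

```python
from urllib.request import Request, urlopen

PURE_MD = "https://pure.md/"  # prefix taken from the text above


def pure_md_url(target: str) -> str:
    """Prefix a target URL (for example "example.com/docs") with pure.md/."""
    return PURE_MD + target.strip().lstrip("/")


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """GET the page through pure.md and return the body as text.

    Assumes the body is UTF-8; this performs a real network request.
    """
    request = Request(pure_md_url(target), method="GET")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("example.com")` would issue a GET to `https://pure.md/example.com`.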
Send HTTP requests like a human
Avoid getting flagged as a bot. Our proxy mimics real browser fingerprints and rotates egress IP addresses on every request. If a site can't be reached, we seamlessly fall back to fetching responses from Common Crawl and Internet Archive datasets.
Request → Regional cache → Datacenter proxies → Residential proxies → Common Crawl → Wayback Machine → Response
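The fallback order above follows a common "try each source in turn" pattern. The sketch below illustrates that pattern only; it is not pure.md's implementation, and `fetch_with_fallback` and the `Fetcher` type are hypothetical names introduced for the example.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A fetcher takes a URL and returns the page body, or None if it has nothing.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str, sources: Iterable[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each source in order; return (source_name, body) from the first hit."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            body = fetch(url)
        except Exception as exc:  # a failing source should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if body is not None:
            return name, body
        errors.append(f"{name}: no content")
    raise LookupError(f"all sources failed for {url}: {errors}")
```

In use, `sources` would be an ordered list such as cache, datacenter proxy, residential proxy, and archive lookups, each wrapped as a function that returns the body or `None`.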
Headless content rendering
Single-page applications (SPAs, such as those built with React) require JavaScript to render the content of each page — a process known as DOM hydration. A direct curl or fetch of these websites will just leave you with an empty shell of HTML.
Fetching through pure.md, on the other hand, hydrates the DOM of SPAs in the background so that pages render completely.
Similarly, PDFs are parsed as pure markdown automatically. Images run through AI models for object detection and summarization. Excel and Numbers spreadsheet documents can also be converted into markdown.
Markdown written for LLMs
Cut your inference costs and speed up your agents' workflows. Powered by HTMLRewriter, our URL-to-markdown service is optimized for low latency and low token output. We remove superfluous fluff from web pages — while also adding page metadata as frontmatter — so that LLMs have the most context in the fewest characters possible.
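If you want to handle the page metadata separately from the body, a small splitter is enough. The sketch below assumes the frontmatter is a leading block delimited by `---` lines and made of simple `key: value` pairs; that layout is an assumption, and `split_frontmatter` is a hypothetical helper.

```python
from typing import Dict, Tuple


def split_frontmatter(document: str) -> Tuple[Dict[str, str], str]:
    """Split a markdown document into (metadata, body).

    Assumes a leading block delimited by lines containing only '---' and
    holding simple 'key: value' pairs. Documents without it are returned
    unchanged with empty metadata.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, document  # no closing delimiter: treat everything as body
    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body
```

Keeping the metadata apart lets you decide per prompt whether the title or URL is worth the extra tokens.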
Service | Result |
---|---|
r.jina.ai | 143K tokens |
tavily.com | 55K tokens |
pure.md | 28K tokens ✅ |

Service | Result |
---|---|
r.jina.ai | ❌ Verifying you are human... |
tavily.com | ❌ Failed to fetch content |
pure.md | 22K tokens ✅ |
Knowledge in real-time
Make your AI apps aware of recent events. With our built-in search engine result page (SERP) crawling, you can turn user queries into a concatenated markdown string of answers that you can feed directly into your prompts. Your model will think it was trained yesterday.
Inference when you need it
Extract data from any page or search simply by changing from GET to POST. pure.md offers a selection of generative AI models for extracting structured or unstructured data from web pages. Stream back responses in markdown for tasks like summarization, or generate JSON that conforms to a custom schema.
POST https://pure.md/reuters.com
{
  "prompt": "What are the top 5 headlines from today?",
  "model": "meta/llama-3.1-8b",
  "schema": {
    "type": "object",
    "properties": {
      "headlines": {
        "type": "array",
        "items": {"type": "string"}
      }
    },
    "required": ["headlines"]
  }
}
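The same request can be assembled in Python with the standard library. This is a sketch under a few assumptions: the `Content-Type: application/json` header and the JSON-decoded response are guesses, and `build_extraction_request` and `send_extraction` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen


def build_extraction_request(
    target: str, prompt: str, model: str, schema: dict
) -> Request:
    """Build (but do not send) a POST request carrying the extraction payload."""
    payload = {"prompt": prompt, "model": model, "schema": schema}
    return Request(
        "https://pure.md/" + target,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def send_extraction(request: Request, timeout: float = 60.0) -> dict:
    """Send the request; assumes the reply is JSON when a schema was supplied."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# JSON Schema copied from the request body above.
HEADLINES_SCHEMA = {
    "type": "object",
    "properties": {"headlines": {"type": "array", "items": {"type": "string"}}},
    "required": ["headlines"],
}

request = build_extraction_request(
    "reuters.com",
    "What are the top 5 headlines from today?",
    "meta/llama-3.1-8b",
    HEADLINES_SCHEMA,
)
```

Passing `request` to `send_extraction` would perform the call. Note that, per the plan list, GenAI extraction is not included in the Starter plan.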
Works on select social media sites
(Coming soon) Prefix social media URLs with `pure.md/` like you normally would, and it will just work. Behind the scenes, pure.md transparently pulls data from data enrichment providers.
Supported:
- LinkedIn user profiles
- LinkedIn organizations
- Twitter/X tweets
- Reddit posts
Not supported:
- Facebook profiles
- Instagram profiles
Pricing as a feature
Simple, easy-to-understand pricing for projects of any size. All plans are available for commercial use. Pay for what you need, and cancel at any time.
Starter
Pay as you go
- 60 requests/minute
- $0.001/fetch
- $0.005/search
- No GenAI extraction
- Email support
- $1 free credit
Growth
$19/mo + metered usage
- 600 requests/minute
- $0.000/fetch
- $0.003/search
- GenAI extraction
- Slack/email support
- $20/mo free credit
Business
$99/mo + metered usage
- 3000 requests/minute
- $0.000/fetch
- $0.002/search
- GenAI extraction
- Slack/email support
- $100/mo free credit
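To stay under a plan's requests-per-minute limit on the client side, one option is to space requests evenly. The sketch below is an assumption about how you might do that, not something pure.md requires; even spacing is stricter than a per-minute quota, and `Throttle` is a hypothetical class.

```python
import time
from typing import Callable


class Throttle:
    """Space calls evenly so at most `requests_per_minute` start per minute."""

    def __init__(
        self,
        requests_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

For example, `Throttle(60)` matches the Starter limit and `Throttle(600)` the Growth limit; call `wait()` before each request.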
Do I need a credit card to sign up?
No, you can sign up without inputting a credit card; however, you will have a strict rate limit imposed until you have an active subscription.
How do the free credits work?
Each month, you pay a flat fee up front (except on the Starter plan, which is $0/mo). Over the course of the month, your usage is deducted from your credit allotment. Once your credits are used up, any additional usage is billed in your next payment period. Unused credits do not roll over to the next month.
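A worked example may make the credit arithmetic concrete. The sketch below assumes the total for a month is the flat fee plus any usage beyond the included credit; that is one reading of the answer above rather than an official formula, and `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal


def monthly_cost(flat_fee: str, credit: str, usage: str) -> Decimal:
    """Flat fee plus whatever metered usage exceeds the included credit."""
    overage = max(Decimal("0"), Decimal(usage) - Decimal(credit))
    return Decimal(flat_fee) + overage


# Growth plan from the pricing list: $19/mo, $20/mo credit, $0.003 per search.
searches = 10_000
usage = Decimal("0.003") * searches  # $30 of metered usage
total = monthly_cost("19", "20", str(usage))  # $19 flat + $10 overage = $29
```

Usage that stays within the credit adds nothing: `monthly_cost("19", "20", "5")` is just the flat fee.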
How does my payment get processed?
We use Stripe as a payment processor. All credit card transactions take place on a subdomain of stripe.com.
How much does data extraction cost?
Data extraction uses generative AI to format answers in JSON or raw streams of text. Pricing varies by model — see the table below. These costs are only incurred on the POST endpoints, not the GET endpoints.
Model | Cost per million tokens |
---|---|
meta/llama-3.1-8b | $0.09/M input, $0.19/M output |
mistral/hermes-2-pro-7b | $0.19/M input, $0.19/M output |
meta/llama-3.3-70b | $0.49/M input, $0.99/M output |
deepseek/r1-distill-qwen-32b | $0.89/M input, $2.49/M output |
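To estimate what a single extraction might cost, multiply token counts by the rates in the table. The sketch below copies those rates; the token counts in the example are illustrative, and `extraction_cost` is a hypothetical helper.

```python
from decimal import Decimal

# (input, output) prices in USD per million tokens, copied from the table above.
PRICES = {
    "meta/llama-3.1-8b": (Decimal("0.09"), Decimal("0.19")),
    "mistral/hermes-2-pro-7b": (Decimal("0.19"), Decimal("0.19")),
    "meta/llama-3.3-70b": (Decimal("0.49"), Decimal("0.99")),
    "deepseek/r1-distill-qwen-32b": (Decimal("0.89"), Decimal("2.49")),
}

MILLION = Decimal(1_000_000)


def extraction_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one extraction call."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


# Illustrative sizes: a 28K-token page in, a 500-token answer out.
cost = extraction_cost("meta/llama-3.1-8b", 28_000, 500)  # 0.002615 USD
```

Because input tokens usually dominate, smaller markdown output from the fetch step directly lowers the extraction bill.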
Can pure.md access content behind a login?
Yes, just include your authorization cookies in the request. pure.md passes request headers along to the target URL.
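In practice that means attaching a `Cookie` header to the request you send to pure.md. A minimal sketch follows; the cookie value is a placeholder and `build_authenticated_request` is a hypothetical helper name.

```python
from urllib.request import Request


def build_authenticated_request(target: str, cookie: str) -> Request:
    """Build a GET request to pure.md that carries the target site's cookies."""
    return Request(
        "https://pure.md/" + target,
        headers={"Cookie": cookie},  # forwarded to the target, per the answer above
        method="GET",
    )


# "session=abc123" is a placeholder, not a real cookie name or value.
request = build_authenticated_request("example.com/account", "session=abc123")
```

Send it with `urllib.request.urlopen(request)`. Treat session cookies like passwords when routing them through any third-party service.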
Is pure.md safe to use in production?
Yes, pure.md is safe to use in production. Our infrastructure runs on a combination of Cloudflare, AWS, and Railway servers, and is designed to autoscale with demand. Visit our status page for uptime history.
What file types are supported for markdown conversion?
HTML, PDF, images, and spreadsheet file types are supported, specifically:
- .csv
- .et
- .htm
- .html
- .jpeg
- .jpg
- .numbers
- .pdf
- .png
- .svg
- .webp
- .xls
- .xlsb
- .xlsm
- .xlsx
- .xml
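If you want to check a URL against this list before sending it, a small helper is enough. This sketch only inspects the file extension in the URL path; URLs with no extension are typically HTML pages, so a `False` result here does not mean the page cannot be converted. `has_supported_extension` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".et", ".htm", ".html", ".jpeg", ".jpg", ".numbers", ".pdf",
    ".png", ".svg", ".webp", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xml",
})


def has_supported_extension(url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlparse(url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```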