Connect your agents to the open web

pure.md is a REST API that lets AI agents and developers reliably access web content.


▶ ▶ ▶ Prefix any URL with `pure.md/` ◀ ◀ ◀
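
As a minimal sketch of what prefixing looks like from code, the helper below builds a pure.md URL and fetches it with Python's standard library. The names `pure_md_url` and `fetch_markdown` are hypothetical, and dropping the target's scheme is an assumption based on the `pure.md/reuters.com` form used on this page.

```python
import urllib.request

PURE_MD_BASE = "https://pure.md/"


def pure_md_url(target: str) -> str:
    """Return the target URL prefixed with pure.md/."""
    # Assumption: the scheme is dropped, mirroring the pure.md/reuters.com form.
    without_scheme = target.split("://", 1)[-1]
    return PURE_MD_BASE + without_scheme


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """GET the page through pure.md and return the response body as text."""
    with urllib.request.urlopen(pure_md_url(target), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("https://en.wikipedia.org/wiki/Artificial_intelligence")` would request `https://pure.md/en.wikipedia.org/wiki/Artificial_intelligence`.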


Send HTTP requests like a human

Avoid getting flagged as a bot. Our proxy mimics real browser fingerprints and rotates egress IP addresses on every request. If a site can't be reached, we seamlessly fall back to fetching responses from Common Crawl and Internet Archive datasets.



Request

   │
   ╰─────▶ Regional cache ─────╮
   │                           │
   │                           ▼
   ╰───▶ Datacenter proxies ───╮
   │                           │
   │                           ▼
   ╰──▶ Residential proxies ───╮
   │                           │
   │                           ▼
   ╰──────▶ Common Crawl ──────╮
   │                           │
   │                           ▼
   ╰─────▶ Wayback Machine ────╮
                               │
                               ▼

                           Response
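
The chain in the diagram follows a common ordered-fallback pattern: try each source in turn and return the first result. The sketch below illustrates that pattern only. It is not pure.md's implementation, and the stub sources and the `fetch_with_fallback` name are hypothetical.

```python
from typing import Callable, Optional

# A source takes a URL and returns content, or None if it cannot serve the URL.
Source = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, sources: list[tuple[str, Source]]) -> tuple[str, str]:
    """Try each source in order; return (source_name, content) from the first hit."""
    for name, source in sources:
        try:
            content = source(url)
        except Exception:
            content = None  # treat an error like a miss and move on
        if content is not None:
            return name, content
    raise LookupError(f"no source could serve {url}")


# Stub sources standing in for the real tiers in the diagram.
def cache_miss(url: str) -> Optional[str]:
    return None


def blocked(url: str) -> Optional[str]:
    raise ConnectionError("flagged as a bot")


def residential(url: str) -> Optional[str]:
    return "<html>ok</html>"


chain = [
    ("regional cache", cache_miss),
    ("datacenter proxies", blocked),
    ("residential proxies", residential),
]
name, content = fetch_with_fallback("https://example.com", chain)
print(name)  # residential proxies
```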
            

Headless content rendering

Single-page applications (SPAs), such as those built with React, need JavaScript to render the content of each page, a process known as DOM hydration. A direct curl or fetch of such a site returns only an empty HTML shell.


Fetching through pure.md, on the other hand, hydrates the DOM of SPAs in the background so that pages render completely.


Similarly, PDFs are parsed into pure markdown automatically. Images are run through AI models for object detection and summarization. Excel and Numbers spreadsheet documents can also be converted into markdown.


Markdown written for LLMs

Cut your inference costs and speed up your agents' workflows. Powered by HTMLRewriter, our URL-to-markdown service is optimized for low latency and low token output. We remove superfluous fluff from web pages — while also adding page metadata as frontmatter — so that LLMs have the most context in the fewest characters possible.
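
If you want to handle the metadata separately from the page body, a small splitter is enough. The sketch below assumes the frontmatter is a block of `key: value` lines between two `---` delimiters, which is the usual convention but is not specified on this page. The field names in the sample are hypothetical.

```python
def split_frontmatter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (metadata, body).

    Assumes `---`-delimited frontmatter with flat `key: value` lines.
    Returns an empty dict and the unchanged text if no frontmatter is found.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    # Find the closing delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, markdown
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


# Hypothetical sample; real field names may differ.
sample = "---\ntitle: Example Domain\nurl: https://example.com\n---\n\n# Example Domain\n\nBody text."
metadata, body = split_frontmatter(sample)
```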



           |
 r.jina.ai |░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  143K tokens
           |
           |
tavily.com |░░░░░░░░░░░░░░░ 55K tokens
           |
           |
   pure.md |████████ 28K tokens ✅
           |_______̩_______̩_______̩_______̩_______̩_______̩_______̩___

                 25K           75K           125K          175K
        
Input tokens from the Wikipedia article on Artificial Intelligence


           |
 r.jina.ai | ❌ Verifying you are human...
           |
           |
tavily.com | ❌ Failed to fetch content
           |
           |
   pure.md |█████ 22K tokens ✅
           |_______̩_______̩_______̩_______̩_______̩_______̩_______̩___

                 25K           75K           125K          175K
        
Input tokens from a science.org article

Knowledge in real time

Make your AI apps aware of recent events. With our built-in search engine result page (SERP) crawling, you can turn user queries into a concatenated markdown string of answers that you can feed directly into your prompts. Your model will think it was trained yesterday.


Inference when you need it

Extract data from any page or search simply by changing from GET to POST. pure.md offers a selection of generative AI models for extracting structured or unstructured data from web pages. Stream back responses in markdown for tasks like summarization, or generate JSON that conforms to a custom schema.

POST https://pure.md/reuters.com
{
  "prompt": "What are the top 5 headlines from today?",
  "model": "meta/llama-3.1-8b",
  "schema": {
    "type": "object",
    "properties": {
      "headlines": {
        "type": "array",
        "items": {"type": "string"}
      }
    },
    "required": ["headlines"]
  }
}
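
The same request can be sketched in Python with the standard library. The body fields (`prompt`, `model`, `schema`) come from the example above. The `Content-Type` header, the optional `schema`, and the `extra_headers` parameter (for any credentials your account needs, which this page does not describe) are assumptions, and `build_extraction_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional


def build_extraction_request(
    target: str,
    prompt: str,
    model: str,
    schema: Optional[dict] = None,
    extra_headers: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a POST request to pure.md/<target> with a JSON body."""
    body = {"prompt": prompt, "model": model}
    if schema is not None:
        # Assumption: omit the schema to stream back free-form text.
        body["schema"] = schema
    headers = {"Content-Type": "application/json", **(extra_headers or {})}
    return urllib.request.Request(
        "https://pure.md/" + target,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


headlines_schema = {
    "type": "object",
    "properties": {"headlines": {"type": "array", "items": {"type": "string"}}},
    "required": ["headlines"],
}
request = build_extraction_request(
    "reuters.com",
    "What are the top 5 headlines from today?",
    "meta/llama-3.1-8b",
    schema=headlines_schema,
)
```

Passing `request` to `urllib.request.urlopen` sends it. Nothing is sent until then, so the request can be inspected or logged first.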

Works on select social media sites

(Coming soon)

Prefix social media URLs with `pure.md/` like you normally would, and it will just work. Behind the scenes, pure.md transparently pulls data from data enrichment providers.


Supported:

  • LinkedIn user profiles
  • LinkedIn organizations
  • Twitter/X tweets
  • Reddit posts

Not supported:

  • Facebook profiles
  • Instagram profiles

Pricing as a feature

Simple, easy-to-understand pricing for projects of any size. All plans are available for commercial use. Pay for what you need, and cancel at any time.

Starter

Pay as you go

  • 60 requests/minute
  • $0.001/fetch
  • $0.005/search
  • No GenAI extraction
  • Email support
  • $1 free credit

Growth

$19/mo + metered usage

  • 600 requests/minute
  • $0.000/fetch
  • $0.003/search
  • GenAI extraction
  • Slack/email support
  • $20/mo free credit

Business

$99/mo + metered usage

  • 3000 requests/minute
  • $0.000/fetch
  • $0.002/search
  • GenAI extraction
  • Slack/email support
  • $100/mo free credit

Do I need a credit card to sign up?

No. You can sign up without a credit card, but a strict rate limit applies until you have an active subscription.

How do the free credits work?

Each month, you pay a flat fee up front (except on the Starter plan, which is $0/mo). Over the course of the month, your usage is deducted from your credit allotment. Once your credits are used up, any further usage is billed in your next payment period. Unused credits do not roll over to the next month.
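
As a rough worked example of that rule, the sketch below totals one month on the Growth plan using the prices listed above. It folds the overage into the same month's total for simplicity, ignores GenAI extraction costs, and uses made-up usage numbers. `monthly_bill` is a hypothetical helper.

```python
from decimal import Decimal


def monthly_bill(flat_fee: Decimal, credit: Decimal, usage: Decimal) -> Decimal:
    """Flat fee plus whatever usage exceeds the month's free credit."""
    overage = max(Decimal("0"), usage - credit)
    return flat_fee + overage


# Growth plan figures from the pricing list: $19/mo, $20/mo credit,
# $0.000/fetch and $0.003/search as listed.
fetch_price = Decimal("0.000")
search_price = Decimal("0.003")

# Hypothetical usage for one month.
fetches = 50_000
searches = 10_000

usage = fetch_price * fetches + search_price * searches  # $30 of usage
growth_total = monthly_bill(Decimal("19"), Decimal("20"), usage)
print(growth_total)  # 19 flat + 10 overage
```

Here $30 of usage exceeds the $20 credit by $10, so the month comes to $29 in total.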

How does my payment get processed?

We use Stripe as a payment processor. All credit card transactions take place on a subdomain of stripe.com.

How much does data extraction cost?

Data extraction uses generative AI to format answers in JSON or raw streams of text. Pricing varies by model — see the table below. These costs are only incurred on the POST endpoints, not the GET endpoints.


Model                          Input (per M tokens)   Output (per M tokens)
meta/llama-3.1-8b              $0.09                  $0.19
mistral/hermes-2-pro-7b        $0.19                  $0.19
meta/llama-3.3-70b             $0.49                  $0.99
deepseek/r1-distill-qwen-32b   $0.89                  $2.49

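
To estimate what a single extraction might cost, multiply token counts by the per-million rates. The sketch below uses the listed prices. The token counts in the example are illustrative, and how pure.md counts input tokens (for instance, whether the prompt is included alongside the page) is an assumption left to the caller.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the pricing table.
PRICES = {
    "meta/llama-3.1-8b": (Decimal("0.09"), Decimal("0.19")),
    "mistral/hermes-2-pro-7b": (Decimal("0.19"), Decimal("0.19")),
    "meta/llama-3.3-70b": (Decimal("0.49"), Decimal("0.99")),
    "deepseek/r1-distill-qwen-32b": (Decimal("0.89"), Decimal("2.49")),
}


def extraction_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one extraction call."""
    price_in, price_out = PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / Decimal(1_000_000)


# Illustrative: a 28K-token page with a hypothetical 500-token answer.
cost = extraction_cost("meta/llama-3.1-8b", 28_000, 500)
```

With these numbers the estimate is $0.002615, about a quarter of a cent.
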
Can pure.md access content behind a login?

Yes, just include your authorization cookies in the request. pure.md passes request headers along to the target URL.
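
A minimal sketch of such a request, using Python's standard library: the cookie value and target URL are placeholders, and `build_authenticated_request` is a hypothetical helper name.

```python
import urllib.request


def build_authenticated_request(target: str, cookie: str) -> urllib.request.Request:
    """Build a GET request to pure.md/<target> that carries a Cookie header."""
    return urllib.request.Request(
        "https://pure.md/" + target,
        headers={"Cookie": cookie},  # forwarded to the target site per the answer above
    )


# Placeholder values; use the cookies from your own logged-in session.
auth_request = build_authenticated_request("example.com/account", "session=abc123")
```

Passing `auth_request` to `urllib.request.urlopen` sends it. Treat session cookies like passwords and keep them out of source control.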

Is pure.md safe to use in production?

Yes, pure.md is safe to use in production. Our infrastructure runs on a combination of Cloudflare, AWS, and Railway servers, and is designed to autoscale with demand. Visit our status page for uptime history.

What file types are supported for markdown conversion?

HTML, PDF, image, and spreadsheet file types are supported, specifically:

  • .csv
  • .et
  • .htm
  • .html
  • .jpeg
  • .jpg
  • .numbers
  • .pdf
  • .png
  • .svg
  • .webp
  • .xls
  • .xlsb
  • .xlsm
  • .xlsx
  • .xml
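
If you want to screen URLs before sending them, a simple extension check against the list above is one option. This is only a sketch: it looks at the URL path alone, and how pure.md treats URLs without a file extension (most web pages) is not covered here. `extension_is_supported` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions copied from the supported list above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".et", ".htm", ".html", ".jpeg", ".jpg", ".numbers", ".pdf",
    ".png", ".svg", ".webp", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xml",
}


def extension_is_supported(url: str) -> bool:
    """True if the URL path ends in one of the listed extensions."""
    path = urlsplit(url).path  # ignores the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in SUPPORTED_EXTENSIONS
```

For example, `extension_is_supported("https://example.com/report.PDF?download=1")` is true, while a `.zip` URL is rejected.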