API v1.0

API Documentation

Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description — get back clean JSON.

Base URL: https://api.papalily.com

Get your API key at RapidAPI.

Introduction

The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.

Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically — your prompts keep working even after site redesigns.

Authentication

All API requests require an API key passed in the x-api-key request header. Get your key by subscribing on RapidAPI — the free plan includes 100 requests/month, no credit card needed.

curl https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  ...

Warning: Never expose your API key in client-side code. Always make requests from your server.

Quick Start

Make your first request in under 60 seconds:

curl -X POST https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://news.ycombinator.com","prompt":"Top 5 post titles"}'
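
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper names `build_scrape_request` and `scrape` are hypothetical and not part of any official client, and the 60-second timeout is an assumption chosen to leave headroom over the 8–15 second response time this page cites.

```python
import json
import urllib.request

BASE_URL = "https://api.papalily.com"  # base URL given in this documentation


def build_scrape_request(api_key: str, url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/scrape",
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def scrape(api_key: str, url: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body.

    The timeout value is an assumption, not a documented requirement.
    """
    request = build_scrape_request(api_key, url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scrape("YOUR_API_KEY", "https://news.ycombinator.com", "Top 5 post titles")` from server-side code would send the same request as the curl command above.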

POST /scrape

The main endpoint. It renders the target URL in a real browser and extracts the requested data using AI. Responses typically take 8–15 seconds, so set client timeouts accordingly.

Request Body

Parameter	Type	Description
url	string	Required. The URL to scrape.
prompt	string	Required. Plain-English description of what data to extract.
wait_ms	number	Extra ms to wait after page load. Default: 2000. Max: 10000.
screenshot	boolean	Include screenshot in AI analysis. Default: true.
no_cache	boolean	Set `true` to bypass cache and force a fresh scrape. Default: false.
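
Checking the request body locally can catch a 400 before it uses up a rate-limited request. The following is a minimal sketch of such a check based on the parameter table above. The function name is hypothetical, and two of its rules are assumptions rather than documented behaviour: that `url` must be an absolute http(s) URL, and that `wait_ms` may not be negative.

```python
from urllib.parse import urlparse

MAX_WAIT_MS = 10000  # documented maximum for wait_ms


def build_scrape_payload(url, prompt, wait_ms=None, screenshot=None, no_cache=None):
    """Validate arguments and return a dict ready to be JSON-encoded."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted by the API.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    payload = {"url": url, "prompt": prompt.strip()}
    # Optional fields are omitted when unset so the server defaults apply.
    if wait_ms is not None:
        if not 0 <= wait_ms <= MAX_WAIT_MS:
            raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
        payload["wait_ms"] = wait_ms
    if screenshot is not None:
        payload["screenshot"] = bool(screenshot)
    if no_cache is not None:
        payload["no_cache"] = bool(no_cache)
    return payload
```

Leaving the optional fields out of the payload, rather than sending the defaults explicitly, keeps the request in step with the server if those defaults ever change.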

POST /batch

Scrape up to 5 URLs in parallel in a single API call (the limit on the Free and Pro plans; Enterprise batch sizes are custom). Each URL in the batch counts as one request against your quota.

curl -X POST https://api.papalily.com/batch \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      { "url": "https://news.ycombinator.com", "prompt": "Top 3 post titles" },
      { "url": "https://github.com/trending", "prompt": "Top 3 trending repos and stars" }
    ]
  }'
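
When there are more URLs than fit in one batch, they have to be split across several /batch calls. A minimal sketch of that split follows; `chunk_batches` is a hypothetical helper name and the example.com URLs are placeholders.

```python
MAX_BATCH_SIZE = 5  # documented batch limit on the Free and Pro plans


def chunk_batches(jobs, size=MAX_BATCH_SIZE):
    """Split a list of {"url", "prompt"} dicts into /batch request bodies."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [{"requests": jobs[i:i + size]} for i in range(0, len(jobs), size)]


# Seven placeholder jobs become one full batch and one partial batch.
jobs = [
    {"url": f"https://example.com/page/{n}", "prompt": "Get the page title"}
    for n in range(1, 8)
]
bodies = chunk_batches(jobs)
print([len(body["requests"]) for body in bodies])  # prints [5, 2]
```

Each returned dict can be JSON-encoded and sent as the body of one POST /batch call.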

GET /usage

Returns your current API key usage statistics.

# Request
curl https://api.papalily.com/usage \
  -H "x-api-key: YOUR_API_KEY"

# Response
{
  "success": true,
  "plan": "pro",
  "requests_used": 47,
  "requests_limit": -1,
  "requests_remaining": "unlimited",
  "reset_date": "2026-04-01"
}
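
In the sample response, `requests_limit` is -1 and `requests_remaining` is the string "unlimited", so client code should not assume both fields are always numbers. The sketch below normalises them. It assumes that metered plans return a number in `requests_remaining`, which this page does not show, and `remaining_requests` is a hypothetical helper name.

```python
from typing import Optional


def remaining_requests(usage: dict) -> Optional[int]:
    """Return the remaining request count, or None when the plan is unlimited."""
    unlimited = (
        usage.get("requests_limit") == -1
        or usage.get("requests_remaining") == "unlimited"
    )
    if unlimited:
        return None
    # Assumption: metered plans report a numeric requests_remaining.
    return int(usage["requests_remaining"])
```

Returning `None` for unlimited plans forces callers to handle that case explicitly instead of comparing a string against a number.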

GET /status/:requestId

Look up a past scrape request by its ID. The request_id is returned in every /scrape and /batch response.

curl https://api.papalily.com/status/f47ac10b-58cc-4372-a567-0e02b2c3d479 \
  -H "x-api-key: YOUR_API_KEY"

GET /health

Health check. No authentication required.

{ "status": "ok", "ts": "2026-03-05T11:00:00.000Z" }

Writing Good Prompts

Be specific: "Get all product names and their USD prices" beats "Get products".
Mention structure: "Return as an array of objects with name and price fields".
Specify limits: "Get the top 10 results" or "Get all items on the page".
Use domain language: "Get the article headline, author, and publication date".

Caching

Papalily caches successful results in memory for 10 minutes. If you send the same URL + prompt within the cache window, you'll receive the result instantly — and it won't count against your quota.

Cached responses include "meta": { "cached": true } in the response body.

Behaviour	Detail
Cache TTL	10 minutes per URL + prompt pair
Max entries	500 (oldest evicted when full)
Failed responses	Never cached — errors always retry fresh
Force refresh	Pass `"no_cache": true` to bypass cache
Quota impact	Cache hits do not count against your monthly quota
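
To make the rules in the table concrete, here is a small client-side model of a cache with the same TTL, capacity, and bypass behaviour. It illustrates the documented rules and is not Papalily's implementation. Two details are assumptions: "oldest" is read as oldest by insertion time, and a lookup made with `no_cache` set simply skips the cache.

```python
import time
from collections import OrderedDict

CACHE_TTL_SECONDS = 10 * 60  # documented TTL per URL + prompt pair
MAX_ENTRIES = 500            # documented capacity


class TtlCache:
    """Insertion-ordered cache keyed by (url, prompt) with a fixed TTL."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, max_entries=MAX_ENTRIES, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # (url, prompt) -> (stored_at, result)

    def get(self, url, prompt, no_cache=False):
        if no_cache:
            return None  # mirrors "no_cache": true, which forces a fresh scrape
        entry = self._entries.get((url, prompt))
        if entry is None:
            return None
        stored_at, result = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[(url, prompt)]  # entry has expired
            return None
        return result

    def put(self, url, prompt, result, success=True):
        if not success:
            return  # failed responses are never cached
        self._entries.pop((url, prompt), None)
        self._entries[(url, prompt)] = (self.clock(), result)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
```

Because the key is the exact URL + prompt pair, even a small change to the prompt wording is a different entry and leads to a fresh scrape.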

Rate Limits

Plan	Requests/month	Requests/minute	Batch size
Free	100	5	5 URLs
Pro	Unlimited	30	5 URLs
Enterprise	Unlimited	Custom	Custom
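
A client can stay under the per-minute limits above by throttling itself before it sends. The sketch below uses a sliding 60-second window. Whether the server counts requests in sliding or fixed windows is not stated on this page, so the sliding window is an assumption, and `MinuteRateLimiter` is a hypothetical name.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of recently recorded requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Call right after sending a request."""
        self._sent.append(self.clock())
```

A typical loop would call `time.sleep(limiter.wait_time())`, send the request, then call `limiter.record()`, using `MinuteRateLimiter(5)` on the Free plan or `MinuteRateLimiter(30)` on Pro.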

Error Codes

HTTP Status	Description
400	Missing or invalid `url` or `prompt`
401	Missing `x-api-key` header
403	Invalid API key
429	Monthly quota exceeded or rate limit hit
500	Browser or AI extraction failed
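
One way to act on these codes is to retry only the failures that might succeed on a second attempt. The sketch below treats 429 and 500 as retryable and everything else outside the 2xx range as final. The retry policy and the backoff values are assumptions, not recommendations from this page, and the function names are hypothetical.

```python
RETRYABLE = {429, 500}  # rate limit or quota, and browser or AI extraction failures


def classify(status: int) -> str:
    """Map an HTTP status code to "ok", "retry", or "fail"."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    return "fail"  # 400, 401, 403: fix the request or the key first


def backoff_delays(attempts: int, base: float = 2.0, cap: float = 60.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


print(backoff_delays(4))  # prints [2.0, 4.0, 8.0, 16.0]
```

Since 429 covers both the per-minute rate limit and an exhausted monthly quota, a client that keeps receiving 429 after backing off may want to check GET /usage before retrying again.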

Code Examples

E-commerce: Product Listings

{ "url": "https://shop.example.com/laptops",
  "prompt": "Get all laptop listings with name, price, rating, and review count" }

News: Article Data

{ "url": "https://techcrunch.com",
  "prompt": "Get the 10 most recent article titles, authors, dates, and URLs" }

Jobs: Listings

{ "url": "https://jobs.example.com/engineering",
  "prompt": "Get all job postings with title, company, location, salary, and apply URL",
  "wait_ms": 3000 }