API v1.0 — All systems operational

API Documentation

Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description — get back clean JSON.

Base URL: https://api.papalily.com
Get your API key at RapidAPI.

Introduction

The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.

Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically — your prompts keep working even after site redesigns.

Authentication

All API requests require an API key passed in the x-api-key request header. Get your key by subscribing on RapidAPI — the free plan includes 100 requests/month, no credit card needed.

Authentication header
curl https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  ...
Warning: Never expose your API key in client-side code. Always make requests from your server.
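As a rough server-side sketch of the rule above, the key can be read from the environment and attached to each request. The variable name PAPALILY_API_KEY and both helper functions are hypothetical names chosen for this example; only the x-api-key header comes from this documentation.

```python
import os


def load_api_key(env_var: str = "PAPALILY_API_KEY") -> str:
    """Read the key from the server environment (variable name is an assumption)."""
    return os.environ.get(env_var, "")


def auth_headers(api_key: str) -> dict:
    """Build the request headers; x-api-key is the header named in the docs."""
    if not api_key:
        raise ValueError("API key is empty; configure it on the server")
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

Keeping the key in the environment means it never ships in browser bundles or version control.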

Quick Start

Make your first request in under 60 seconds:

curl -X POST https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://news.ycombinator.com","prompt":"Top 5 post titles"}'
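The same request can be sketched in Python using only the standard library. The function names and the 60-second timeout are assumptions made for this example (the timeout is chosen to sit comfortably above the average response time quoted for /scrape); the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

BASE_URL = "https://api.papalily.com"


def build_scrape_request(api_key: str, url: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /scrape request."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/scrape",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key: str, url: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_scrape_request(api_key, url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling scrape("YOUR_API_KEY", "https://news.ycombinator.com", "Top 5 post titles") mirrors the curl command above.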

POST /scrape

The main endpoint. Renders the target URL in a real browser and extracts the requested data using AI. Average response time: 8–15 seconds.

Request Body

Parameter    Type      Description
url          string    Required. The URL to scrape.
prompt       string    Required. Plain-English description of what data to extract.
wait_ms      number    Extra ms to wait after page load. Default: 2000. Max: 10000.
screenshot   boolean   Include screenshot in AI analysis. Default: true.
no_cache     boolean   Set true to bypass cache and force a fresh scrape. Default: false.
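A small client-side helper can catch invalid bodies before they cost a round trip. This is a sketch under assumptions: the function name is hypothetical, rejecting negative wait_ms values is a guess, and optional fields are omitted when unset so that the documented server defaults apply.

```python
from typing import Optional

MAX_WAIT_MS = 10000  # maximum wait_ms listed in the parameter table


def build_scrape_payload(
    url: str,
    prompt: str,
    wait_ms: Optional[int] = None,
    screenshot: Optional[bool] = None,
    no_cache: Optional[bool] = None,
) -> dict:
    """Build a /scrape request body, including only the options that were set."""
    if not url or not prompt:
        raise ValueError("url and prompt are both required")
    payload = {"url": url, "prompt": prompt}
    if wait_ms is not None:
        if not 0 <= wait_ms <= MAX_WAIT_MS:
            raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
        payload["wait_ms"] = wait_ms
    if screenshot is not None:
        payload["screenshot"] = screenshot
    if no_cache is not None:
        payload["no_cache"] = no_cache
    return payload
```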

POST /batch

Scrape up to 5 URLs in parallel in a single API call. Each URL in the batch counts as one request against your quota.

curl -X POST https://api.papalily.com/batch \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      { "url": "https://news.ycombinator.com", "prompt": "Top 3 post titles" },
      { "url": "https://github.com/trending", "prompt": "Top 3 trending repos and stars" }
    ]
  }'
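If you have more than 5 URLs, one option is to split the work into several /batch bodies. The helper below is a minimal sketch; its name is hypothetical, and it only prepares the JSON bodies in the shape shown above.

```python
from typing import Iterator

MAX_BATCH_SIZE = 5  # per-call limit stated for /batch


def batch_payloads(jobs: list, size: int = MAX_BATCH_SIZE) -> Iterator[dict]:
    """Yield /batch request bodies holding at most `size` jobs each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(jobs), size):
        yield {"requests": jobs[start:start + size]}
```

Each job is a dict with url and prompt keys, and every job still counts as one request against your quota.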

GET /usage

Returns your current API key usage statistics.

# Request
curl https://api.papalily.com/usage \
  -H "x-api-key: YOUR_API_KEY"

# Response
{
  "success": true,
  "plan": "pro",
  "requests_used": 47,
  "requests_limit": -1,
  "requests_remaining": "unlimited",
  "reset_date": "2026-04-01"
}
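The sample response suggests that a requests_limit of -1 marks an unlimited plan. Under that assumption (inferred from the example only, not confirmed elsewhere in this documentation), remaining quota could be computed as follows; the function name is hypothetical.

```python
import math


def remaining_requests(usage: dict) -> float:
    """Return the remaining quota, or math.inf when the limit is -1."""
    limit = usage["requests_limit"]
    if limit == -1:  # assumed sentinel for "unlimited", based on the sample
        return math.inf
    # Computed from limit and used so the result is always numeric.
    return max(limit - usage["requests_used"], 0)
```

Computing the value avoids depending on the type of requests_remaining, which is a string in the unlimited sample and is not shown for limited plans.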

GET /status/:requestId

Look up a past scrape request by its ID. The request_id is returned in every /scrape and /batch response.

curl https://api.papalily.com/status/f47ac10b-58cc-4372-a567-0e02b2c3d479 \
  -H "x-api-key: YOUR_API_KEY"
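When building the lookup URL in code, it is safer to percent-encode the ID rather than concatenate it raw. A minimal sketch, with a hypothetical function name:

```python
from urllib.parse import quote

BASE_URL = "https://api.papalily.com"


def status_url(request_id: str) -> str:
    """Build the GET /status/:requestId URL for a stored request_id."""
    if not request_id:
        raise ValueError("request_id is required")
    # safe="" also encodes "/" so the ID cannot add extra path segments.
    return f"{BASE_URL}/status/{quote(request_id, safe='')}"
```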

GET /health

Health check. No authentication required.

{ "status": "ok", "ts": "2026-03-05T11:00:00.000Z" }

Writing Good Prompts

  • Be specific: "Get all product names and their USD prices" beats "Get products"
  • Mention structure: "Return as an array of objects with name and price fields"
  • Specify limits: "Get the top 10 results" or "Get all items on the page"
  • Use domain language: "Get the article headline, author, and publication date"
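If you generate prompts programmatically, the tips above can be folded into a small template. This is only one possible phrasing; the function and its wording are illustrative, not a format the API requires.

```python
from typing import Optional


def build_prompt(subject: str, fields: list, limit: Optional[int] = None) -> str:
    """Compose a specific prompt with explicit fields and an explicit limit."""
    scope = f"the top {limit}" if limit else "all"
    field_list = ", ".join(fields)
    return (
        f"Get {scope} {subject} on the page. "
        f"Return as an array of objects with these fields: {field_list}."
    )
```

For example, build_prompt("products", ["name", "price_usd"], 10) names the items, the structure, and the limit in one sentence pair.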

Caching

Papalily caches successful results in memory for 10 minutes. If you send the same URL + prompt within the cache window, you'll receive the result instantly — and it won't count against your quota.

Cached responses include "meta": { "cached": true } in the response body.

Behaviour          Detail
Cache TTL          10 minutes per URL + prompt pair
Max entries        500 (oldest evicted when full)
Failed responses   Never cached; errors always retry fresh
Force refresh      Pass "no_cache": true to bypass cache
Quota impact       Cache hits do not count against your monthly quota
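To make the rules in the table concrete, here is a small model of a cache with the same TTL and size limit. It is an illustration of the documented behaviour, not Papalily's server code; the class name, the exact-string (url, prompt) key, and the injectable clock are assumptions for this sketch.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Keep results for ttl_seconds, evicting the oldest entry when full."""

    def __init__(self, ttl_seconds=600.0, max_entries=500, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._items = OrderedDict()  # (url, prompt) -> (stored_at, result)

    def get(self, url, prompt):
        key = (url, prompt)
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._items[key]  # expired: treat as a miss
            return None
        return result

    def put(self, url, prompt, result):
        key = (url, prompt)
        self._items.pop(key, None)  # re-inserting moves the key to the newest slot
        self._items[key] = (self._clock(), result)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry
```

Only successful results would be passed to put, which matches the rule that failed responses are never cached.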

Rate Limits

Plan         Requests/month   Requests/minute   Batch size
Free         100              5                 5 URLs
Pro          Unlimited        30                5 URLs
Enterprise   Unlimited        Custom            Custom
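A client can stay under the per-minute limit by pacing its own calls. The sketch below uses a sliding 60-second window, which is a conservative assumption: this documentation does not say how the server measures the minute. The class name and injectable clock are hypothetical.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Track send times and report how long to wait before the next call."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self._max = max_per_minute
        self._clock = clock
        self._sent = deque()  # timestamps of calls in the last 60 seconds

    def wait_time(self) -> float:
        """Seconds to wait before another request fits in the window."""
        now = self._clock()
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()  # forget calls older than one minute
        if len(self._sent) < self._max:
            return 0.0
        return 60.0 - (now - self._sent[0])

    def record(self) -> None:
        """Call this right after sending a request."""
        self._sent.append(self._clock())
```

On the Free plan you would create MinuteRateLimiter(5), sleep for wait_time() before each call, then call record().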

Error Codes

HTTP Status   Description
400           Missing or invalid url or prompt
401           Missing x-api-key header
403           Invalid API key
429           Monthly quota exceeded or rate limit hit
500           Browser or AI extraction failed
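In client code, the table can be turned into a lookup that also decides whether a retry is worth attempting. The descriptions come from the table; the retry flags are this example's own judgment (a 429 caused by an exhausted monthly quota will not clear until the quota resets), and the function name is hypothetical.

```python
# status -> (description from the table, retry suggestion assumed by this sketch)
ERRORS = {
    400: ("Missing or invalid url or prompt", False),
    401: ("Missing x-api-key header", False),
    403: ("Invalid API key", False),
    429: ("Monthly quota exceeded or rate limit hit", True),
    500: ("Browser or AI extraction failed", True),
}


def describe_error(status: int) -> tuple:
    """Return (description, should_retry) for an HTTP status code."""
    if status in ERRORS:
        return ERRORS[status]
    return (f"Unexpected status {status}", False)
```

Retries for 429 and 500 are best spaced out with a delay rather than sent immediately.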

Code Examples

E-commerce: Product Listings

{ "url": "https://shop.example.com/laptops", "prompt": "Get all laptop listings with name, price, rating, and review count" }

News: Article Data

{ "url": "https://techcrunch.com", "prompt": "Get the 10 most recent article titles, authors, dates, and URLs" }

Jobs: Listings

{ "url": "https://jobs.example.com/engineering", "prompt": "Get all job postings with title, company, location, salary, and apply URL", "wait_ms": 3000 }