API Documentation
Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description — get back clean JSON.
https://api.papalily.com • Get your API key at RapidAPI
Introduction
The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.
Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically — your prompts keep working even after site redesigns.
Authentication
All API requests except GET /health require an API key passed in the x-api-key request header. Get your key by subscribing on RapidAPI. The free plan includes 100 requests/month, no credit card needed.
Quick Start
Make your first request in under 60 seconds:
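The sketch below shows one way to build a first request using only the Python standard library. It assumes the base URL https://api.papalily.com shown at the top of this page; the helper names `build_scrape_request` and `send`, the 60-second timeout, and the placeholder key are illustrative and not part of the API.

```python
import json
import urllib.request

# Assumption: the base URL listed at the top of this page is the API host.
API_BASE = "https://api.papalily.com"


def build_scrape_request(api_key: str, url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request with a JSON body."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Header names are case-insensitive; urllib normalises the casing.
            "x-api-key": api_key,
        },
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response.

    The timeout is a guess chosen to sit well above the documented
    8-15 second average response time.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To try it, call `send(build_scrape_request("YOUR_API_KEY", "https://example.com", "Get the page title and all headings"))` with your own key in place of the placeholder.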
POST /scrape
The main endpoint. Renders the target URL in a real browser and extracts the requested data using AI. Average response time: 8–15 seconds.
Request Body
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. The URL to scrape. |
| prompt | string | Required. Plain-English description of what data to extract. |
| wait_ms | number | Extra time to wait after page load, in milliseconds. Default: 2000. Max: 10000. |
| screenshot | boolean | Include a screenshot of the page in the AI analysis. Default: true. |
| no_cache | boolean | Set true to bypass cache and force a fresh scrape. Default: false. |
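As a rough illustration of the parameters above, the following helper assembles a request body and checks it client-side before sending. The function name `build_scrape_body` is hypothetical, and rejecting negative `wait_ms` values is an assumption; the table only documents the maximum.

```python
from typing import Any, Dict

MAX_WAIT_MS = 10_000  # documented maximum for wait_ms


def build_scrape_body(
    url: str,
    prompt: str,
    wait_ms: int = 2000,
    screenshot: bool = True,
    no_cache: bool = False,
) -> Dict[str, Any]:
    """Return a JSON-serialisable body for POST /scrape.

    Defaults mirror the documented ones. Validation here is a client-side
    convenience, not a description of what the server enforces.
    """
    if not url or not prompt:
        raise ValueError("url and prompt are both required")
    # Assumption: negative waits are invalid; only the upper bound is documented.
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
    return {
        "url": url,
        "prompt": prompt,
        "wait_ms": wait_ms,
        "screenshot": screenshot,
        "no_cache": no_cache,
    }
```

Checking locally avoids spending a request from your quota on a call that would likely come back as a 400.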
POST /batch
Scrape up to 5 URLs in parallel in a single API call. Each URL in the batch counts as one request against your quota.
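If you have more than 5 URLs, you will need to split them across several /batch calls. The helper below is a minimal sketch of that splitting step only; the name `chunk_urls` is hypothetical, and the /batch request body is not shown because its shape is not described in this section.

```python
from typing import Iterator, List, Sequence

BATCH_LIMIT = 5  # documented maximum number of URLs per /batch call


def chunk_urls(urls: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each URL still counts as one request against your quota, so chunking changes the number of API calls, not the quota consumed.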
GET /usage
Returns your current API key usage statistics.
GET /status/:requestId
Look up a past scrape request by its ID. The request_id is returned in every /scrape and /batch response.
GET /health
Health check. No authentication required.
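The three GET endpoints above could be wrapped along these lines. This is a sketch under assumptions: the base URL is taken from the top of this page, the function names are hypothetical, and the response bodies are not modelled because their fields are not listed here.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

# Assumption: the base URL listed at the top of this page is the API host.
API_BASE = "https://api.papalily.com"


def build_get_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, attaching x-api-key only when a key is given."""
    headers = {"x-api-key": api_key} if api_key else {}
    return urllib.request.Request(f"{API_BASE}{path}", method="GET", headers=headers)


def usage_request(api_key: str) -> urllib.request.Request:
    """GET /usage: usage statistics for the given key."""
    return build_get_request("/usage", api_key)


def status_request(api_key: str, request_id: str) -> urllib.request.Request:
    """GET /status/:requestId, with the ID percent-encoded for safety."""
    return build_get_request(f"/status/{quote(request_id, safe='')}", api_key)


def health_request() -> urllib.request.Request:
    """GET /health: documented as requiring no authentication."""
    return build_get_request("/health")
```

Pass any of these to `urllib.request.urlopen` to send them; the `request_id` value comes from an earlier /scrape or /batch response.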
Writing Good Prompts
- Be specific: "Get all product names and their USD prices" beats "Get products"
- Mention structure: "Return as an array of objects with name and price fields"
- Specify limits: "Get the top 10 results" or "Get all items on the page"
- Use domain language: "Get the article headline, author, and publication date"
Caching
Papalily caches successful results in memory for 10 minutes. If you send the same URL + prompt within the cache window, you'll receive the result instantly — and it won't count against your quota.
Cached responses include "meta": { "cached": true } in the response body.
| Behaviour | Detail |
|---|---|
| Cache TTL | 10 minutes per URL + prompt pair |
| Max entries | 500 (oldest evicted when full) |
| Failed responses | Never cached — errors always retry fresh |
| Force refresh | Pass "no_cache": true to bypass cache |
| Quota impact | Cache hits do not count against your monthly quota |
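To make the rules in the table concrete, here is a toy model of a cache with the same properties: a 10-minute TTL keyed on the URL + prompt pair, a 500-entry cap, and oldest-first eviction. It is an illustration only and makes no claim about how Papalily's cache is implemented; the class name `TTLCache` and the injectable `clock` argument are assumptions made for the example.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """Toy in-memory cache: fixed TTL, bounded size, oldest entry evicted first."""

    def __init__(
        self,
        ttl_seconds: float = 600.0,   # 10 minutes, as documented
        max_entries: int = 500,       # as documented
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries: "OrderedDict[Tuple[str, str], Tuple[float, Any]]" = OrderedDict()

    def get(self, url: str, prompt: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        key = (url, prompt)
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, url: str, prompt: str, value: Any) -> None:
        """Store a successful result; callers should skip this for failures."""
        key = (url, prompt)
        self._entries.pop(key, None)  # re-storing a key makes it the newest
        if len(self._entries) >= self._max:
            self._entries.popitem(last=False)  # evict the oldest entry
        self._entries[key] = (self._clock(), value)
```

Note that changing either the URL or the prompt produces a different key, so a reworded prompt for the same page is treated as a fresh scrape in this model.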
Rate Limits
| Plan | Requests/month | Requests/minute | Batch size |
|---|---|---|---|
| Free | 100 | 5 | 5 URLs |
| Pro | Unlimited | 30 | 5 URLs |
| Enterprise | Unlimited | Custom | Custom |
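One way to stay under the per-minute limits is to throttle on the client. The sketch below uses a sliding 60-second window; whether the server counts requests in a sliding or a fixed window is not stated here, so treat this as a conservative assumption. The class name `MinuteRateLimiter` is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Client-side limiter allowing at most `max_per_minute` sends per 60 s."""

    WINDOW_SECONDS = 60.0

    def __init__(self, max_per_minute: int, clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_per_minute
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent sends

    def wait_time(self) -> float:
        """Seconds to wait before the next send is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget sends that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self.WINDOW_SECONDS:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self.WINDOW_SECONDS - (now - self._sent[0])

    def record(self) -> None:
        """Call immediately after sending a request."""
        self._sent.append(self._clock())
```

A typical loop would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after it, with `MinuteRateLimiter(5)` on the Free plan or `MinuteRateLimiter(30)` on Pro.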
Error Codes
| HTTP Status | Description |
|---|---|
| 400 | Missing or invalid url or prompt |
| 401 | Missing x-api-key header |
| 403 | Invalid API key |
| 429 | Monthly quota exceeded or rate limit hit |
| 500 | Browser or AI extraction failed |
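A client might map these status codes to retry decisions roughly as follows. The retry policy is an assumption, not documented behaviour: 429 and 500 are treated as retryable, although a 429 caused by an exhausted monthly quota will not succeed on retry and cannot be told apart from a rate-limit hit by status code alone. The function names and backoff parameters are hypothetical.

```python
ERROR_MEANINGS = {
    400: "Missing or invalid url or prompt",
    401: "Missing x-api-key header",
    403: "Invalid API key",
    429: "Monthly quota exceeded or rate limit hit",
    500: "Browser or AI extraction failed",
}

# Assumption: only these statuses are worth retrying; the rest need a fixed request.
RETRYABLE_STATUSES = {429, 500}


def describe_error(status: int) -> str:
    """Return the documented meaning of a status code, if there is one."""
    return ERROR_MEANINGS.get(status, f"Undocumented status {status}")


def should_retry(status: int) -> bool:
    """True when retrying the same request later might succeed."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

With `urllib`, the status code is available as `err.code` on the `urllib.error.HTTPError` raised for non-2xx responses.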