Turn any website into agent-ready data
Scrape one page, search the web, map a domain, or crawl a whole site. Clean markdown, structured JSON, screenshots — straight into Assistant pipelines.
Scrape
One URL → clean markdown
JS rendering, anti-bot bypass, main-content extraction. Output as markdown, HTML, links, screenshot, branding tokens, or schema-validated JSON.
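A minimal sketch of a single-page scrape, assuming a Firecrawl-style REST endpoint (`POST /v1/scrape`) that takes a `url` plus a `formats` list; the path, field names, and response shape are illustrative and may differ from the v2 API or a self-hosted deployment.

```ts
// Sketch: scrape one URL and request several output formats.
// Endpoint path, field names, and response shape are assumptions.
const API_URL = process.env.SCRAPER_API_URL ?? "https://api.firecrawl.dev/v1/scrape";

async function scrapePage(url: string) {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({
      url,
      formats: ["markdown", "links", "screenshot"], // pick the outputs you need
      onlyMainContent: true,                        // main-content extraction (name illustrative)
    }),
  });
  if (!res.ok) throw new Error(`Scrape failed: ${res.status}`);
  const { data } = await res.json();
  return data; // assumed shape: { markdown, links, screenshot, ... }
}

scrapePage("https://example.com/docs/getting-started")
  .then((d) => console.log(d?.markdown?.slice(0, 200)));
```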
Search
Query the web, get pages back
Run a search, optionally scrape every result in the same call. Filter by language, country, time window (hour / day / week / month / year).
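A sketch of a search that scrapes every hit in the same call; the `country`, `lang`, `tbs` (time window), and `scrapeOptions` field names are assumptions, not a confirmed API surface.

```ts
// Sketch: search the web and scrape each result in the same request.
// Field names (country, lang, tbs, scrapeOptions) are assumptions.
async function searchAndScrape(query: string) {
  const res = await fetch("https://api.firecrawl.dev/v1/search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({
      query,
      limit: 5,
      country: "us",                              // country filter
      lang: "en",                                 // language filter
      tbs: "qdr:w",                               // time window: past week (hour/day/week/month/year)
      scrapeOptions: { formats: ["markdown"] },   // scrape every hit in the same call
    }),
  });
  const { data } = await res.json();
  for (const hit of data ?? []) {
    console.log(hit.url, "→", (hit.markdown ?? "").length, "chars of markdown");
  }
}

searchAndScrape("vector database pricing");
```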
Map
Discover every URL
Fast sitemap generation of up to 5,000 URLs per site. Filter by keyword, include subdomains, and get results ranked by relevance; no full crawl required.
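Mapping returns URLs rather than page content, so it is a single fast call; the sketch below assumes a `POST /v1/map` endpoint with `search`, `includeSubdomains`, and `limit` fields (names illustrative).

```ts
// Sketch: discover up to 5,000 URLs on a site without scraping page content.
async function mapSite(url: string) {
  const res = await fetch("https://api.firecrawl.dev/v1/map", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({
      url,
      search: "pricing",        // keyword filter; results come back ranked by relevance
      includeSubdomains: true,
      limit: 5000,              // hard cap per the limits table
    }),
  });
  const { links } = await res.json(); // assumed response shape: { links: string[] }
  return (links ?? []) as string[];
}

mapSite("https://example.com").then((links) => console.log(links.slice(0, 10)));
```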
Crawl
Whole-site ingestion
Recursive crawl with depth limits, include/exclude path patterns, sitemap-only mode, async jobs with live progress and pagination.
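A sketch of kicking off a crawl as an async job, assuming a `POST /v1/crawl` endpoint that accepts depth limits and include/exclude path patterns and returns a job id; polling that id is sketched under "Crawler request flow" below.

```ts
// Sketch: start a recursive crawl with depth and path filters.
// Returns a job id to poll for progress; field names are assumptions.
async function startCrawl(url: string): Promise<string> {
  const res = await fetch("https://api.firecrawl.dev/v1/crawl", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({
      url,
      maxDepth: 3,                         // depth limit
      includePaths: ["^/docs/.*"],         // include/exclude path patterns
      excludePaths: ["^/docs/archive/.*"],
      limit: 500,
      scrapeOptions: { formats: ["markdown"] },
    }),
  });
  const { id } = await res.json();         // async job id
  return id;
}

startCrawl("https://example.com").then((id) => console.log("crawl job:", id));
```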
- Step 1: URL or query
- Step 2: Render (JS / anti-bot)
- Step 3: Extract & clean
- Step 4: Format (md / json / image)
- Step 5: Hand off to agent or store
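End to end, the first four steps collapse into one server-side call and the fifth is your hand-off; a minimal sketch using the same illustrative `/v1/scrape` endpoint, with the store step stubbed out.

```ts
// Sketch of the five-step flow: URL in → rendered, extracted, formatted → handed off.
// saveToStore stands in for whatever store or agent queue you use.
async function ingest(url: string) {
  // Steps 1–4 happen server-side in a single call (render, extract, format).
  const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({ url, formats: ["markdown", "screenshot"] }),
  });
  const { data } = await res.json();

  // Step 5: hand off to an agent or store (stubbed here).
  await saveToStore({ url, markdown: data?.markdown, screenshot: data?.screenshot });
}

async function saveToStore(doc: { url: string; markdown?: string; screenshot?: string }) {
  console.log(`would store ${doc.url}: ${doc.markdown?.length ?? 0} chars of markdown`);
}

ingest("https://example.com/changelog");
```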
Where teams use it
Knowledge ingestion
Crawl docs, FAQs, policy pages — pipe markdown into the Support Agent's RAG store. Auto-refresh on a schedule.
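A sketch of the ingestion side once the crawl's markdown is in hand: chunk each page and upsert it into a RAG store. `ragStore` is a placeholder for your vector store client, and the chunk sizes are arbitrary.

```ts
// Sketch: split crawled markdown into overlapping chunks for a RAG store.
// ragStore is a placeholder; swap in your vector store client.
type Page = { url: string; markdown: string };

function chunk(text: string, size = 1500, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

async function ingestPages(
  pages: Page[],
  ragStore: { upsert(id: string, text: string): Promise<void> },
) {
  for (const page of pages) {
    const pieces = chunk(page.markdown);
    for (let i = 0; i < pieces.length; i++) {
      // Keying by URL + chunk index keeps a scheduled re-crawl idempotent:
      // refreshed pages overwrite their old chunks.
      await ragStore.upsert(`${page.url}#${i}`, pieces[i]);
    }
  }
}
```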
Lead enrichment
Map a prospect's site, scrape /pricing and /about, hand structured JSON to the Sales Agent before the first outreach.
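A sketch of that enrichment pass using the same illustrative map and scrape endpoints; the JSON shape handed to the Sales Agent is only an example.

```ts
// Sketch: map a prospect's site, scrape /pricing and /about, emit structured JSON.
async function enrichProspect(domain: string) {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
  };

  // Find the pricing and about pages without a full crawl.
  const mapRes = await fetch("https://api.firecrawl.dev/v1/map", {
    method: "POST",
    headers,
    body: JSON.stringify({ url: domain, search: "pricing about", limit: 100 }),
  });
  const { links = [] } = await mapRes.json();
  const targets = (links as string[]).filter((l) => /\/(pricing|about)\b/.test(l));

  // Scrape each target page to markdown.
  const pages: Record<string, string> = {};
  for (const url of targets) {
    const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
      method: "POST",
      headers,
      body: JSON.stringify({ url, formats: ["markdown"] }),
    });
    const { data } = await res.json();
    pages[url] = data?.markdown ?? "";
  }

  return { domain, pages, scrapedAt: new Date().toISOString() }; // hand to the Sales Agent
}
```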
Competitive monitoring
Scheduled scrape of competitor pages with diff detection. Trigger Slack alerts on copy, pricing, or feature changes.
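A sketch of the diff-detection step: hash each scrape, compare against the previous run, and post to a Slack incoming webhook (standard `{ text }` payload) when the hash changes. The hash storage is stubbed.

```ts
import { createHash } from "node:crypto";

// Sketch: detect content changes by hashing scraped markdown between runs.
// loadPreviousHash / savePreviousHash are stubs for whatever storage you use.
async function checkForChanges(url: string, markdown: string) {
  const hash = createHash("sha256").update(markdown).digest("hex");
  const previous = await loadPreviousHash(url);

  if (previous && previous !== hash) {
    // Slack incoming webhook: a POST with a JSON { text } body.
    await fetch(process.env.SLACK_WEBHOOK_URL!, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: `Change detected on ${url}` }),
    });
  }
  await savePreviousHash(url, hash);
}

// Stubs: replace with a real key-value store.
const hashes = new Map<string, string>();
async function loadPreviousHash(url: string) { return hashes.get(url); }
async function savePreviousHash(url: string, hash: string) { hashes.set(url, hash); }
```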
Content extraction
Schema-driven JSON extraction for product catalogs, job boards, news sites. Drop straight into Postgres or BigQuery.
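A sketch of schema-driven extraction, assuming the scrape endpoint accepts a `json` format with an attached schema (the `jsonOptions` field name is an assumption); the schema itself is plain JSON Schema.

```ts
// Sketch: extract a product catalog as schema-validated JSON.
// The jsonOptions field name is an assumption; the schema is standard JSON Schema.
const productSchema = {
  type: "object",
  properties: {
    products: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "string" },
          inStock: { type: "boolean" },
        },
        required: ["name", "price"],
      },
    },
  },
};

async function extractCatalog(url: string) {
  const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
    },
    body: JSON.stringify({ url, formats: ["json"], jsonOptions: { schema: productSchema } }),
  });
  const { data } = await res.json();
  return data?.json?.products ?? []; // rows ready to insert into Postgres or BigQuery
}
```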
Under the hood
| Engine | Firecrawl v2 (managed) or self-hosted Playwright workers |
| Auth | Server-side proxy — API key never touches the browser |
| Output | Markdown, raw HTML, links, screenshot (base64), branding tokens, schema JSON, AI summary |
| Reliability | Retries, polling, pagination, cancellable async jobs |
| Scale | Batch scrape, parallel workers, rate-limit aware |
| Compliance | robots.txt aware, configurable user agent, geo-targeting |
All scraping runs server-side. Credentials, target URLs, and extracted content stay inside your tenant.
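In practice that means the browser talks to your backend and only the backend holds the upstream key; a minimal Express sketch, with the route path and upstream URL as illustrative stand-ins.

```ts
import express from "express";

// Sketch: server-side proxy so the scraper API key never reaches the browser.
// The /api/scrape route and upstream URL are illustrative.
const app = express();
app.use(express.json());

app.post("/api/scrape", async (req, res) => {
  const upstream = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`, // key lives only on the server
    },
    body: JSON.stringify({ url: req.body.url, formats: ["markdown"] }),
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(3000);
```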
Crawler request flow
| Modes | scrape (1 URL) · search (query) · map (sitemap) · crawl (recursive) |
| Limits | Map ≤ 5,000 URLs · crawl depth + path filters · async jobs |
| Cost model | Per-page billing · cached for 24h · batch discount |
| Output | markdown · raw HTML · links · screenshot (b64) · schema JSON |
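Asynchronous crawls follow a start, poll, paginate pattern; the sketch below assumes a `GET /v1/crawl/{id}` status endpoint returning `{ status, data, next }` (names illustrative) and loops until the job completes.

```ts
// Sketch: poll an async crawl job and page through results via a `next` cursor.
// Endpoint path and response fields (status, data, next) are assumptions.
async function collectCrawlResults(jobId: string) {
  const pages: unknown[] = [];
  let url = `https://api.firecrawl.dev/v1/crawl/${jobId}`;

  while (true) {
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${process.env.SCRAPER_API_KEY}` },
    });
    const body = await res.json();
    pages.push(...(body.data ?? []));

    if (body.status === "failed") throw new Error("crawl failed");
    if (body.next) { url = body.next; continue; }      // follow the pagination cursor
    if (body.status === "completed") break;            // done, no more pages
    await new Promise((r) => setTimeout(r, 2000));     // still running: wait and re-poll
  }
  return pages;
}

collectCrawlResults("job-id-from-startCrawl").then((p) => console.log(p.length, "pages"));
```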