Assistant Crawler

Turn any website into agent-ready data

Scrape one page, search the web, map a domain, or crawl a whole site. Clean markdown, structured JSON, screenshots — straight into Assistant pipelines.

Scrape

One URL → clean markdown

JS rendering, anti-bot bypass, main-content extraction. Output as markdown, HTML, links, screenshot, branding tokens, or schema-validated JSON.
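A single-URL scrape boils down to one request with the output formats you want back. A minimal sketch of assembling such a request — the endpoint shape, field names (`formats`, `onlyMainContent`), and format keys are illustrative assumptions, not the actual Crawler API schema:

```python
# Hypothetical scrape-request builder. Field names ("formats",
# "onlyMainContent") are assumptions, not the real API contract.
def build_scrape_request(url: str, formats: list[str]) -> dict:
    """Assemble a single-URL scrape request with the desired outputs."""
    allowed = {"markdown", "html", "links", "screenshot", "branding", "json"}
    unknown = set(formats) - allowed
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    return {
        "url": url,
        "formats": formats,
        "onlyMainContent": True,  # strip nav/footer boilerplate
    }

payload = build_scrape_request("https://example.com/docs", ["markdown", "links"])
```

Validating the format list client-side gives a fast failure before the request ever hits the server.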

Search

Query the web, get pages back

Run a search, optionally scrape every result in the same call. Filter by language, country, time window (hour / day / week / month / year).
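The same pattern applies to search: a query plus optional filters, with a flag to scrape every hit in the same call. A hedged sketch — parameter names (`timeWindow`, `scrapeResults`) are assumptions for illustration:

```python
# Hypothetical search-request builder; parameter names are assumptions.
VALID_WINDOWS = {"hour", "day", "week", "month", "year"}

def build_search_request(query: str, *, lang: str = "", country: str = "",
                         time_window: str = "", scrape_results: bool = False) -> dict:
    """Build a search request with optional language/country/time filters."""
    if time_window and time_window not in VALID_WINDOWS:
        raise ValueError(f"time_window must be one of {sorted(VALID_WINDOWS)}")
    req: dict = {"query": query, "scrapeResults": scrape_results}
    if lang:
        req["lang"] = lang
    if country:
        req["country"] = country
    if time_window:
        req["timeWindow"] = time_window
    return req

req = build_search_request("incident postmortem", country="us",
                           time_window="week", scrape_results=True)
```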

Map

Discover every URL

Fast sitemap generation up to 5,000 URLs per site. Filter by keyword, optionally include subdomains, results ranked by relevance — no full crawl required.
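The map options above (keyword filter, subdomain inclusion, the 5,000-URL cap) can be mirrored client-side. A sketch of that filtering logic — the function and its parameters are illustrative, not part of the Crawler API:

```python
from urllib.parse import urlparse

def filter_mapped_urls(urls, *, keyword="", include_subdomains=False,
                       root="example.com", limit=5000):
    """Filter mapped URLs: keyword substring match, optional subdomains,
    capped at the 5,000-URL map limit. Illustrative only."""
    out = []
    for u in urls:
        host = urlparse(u).hostname or ""
        if include_subdomains:
            if host != root and not host.endswith("." + root):
                continue
        elif host != root:
            continue
        if keyword and keyword not in u:
            continue
        out.append(u)
    return out[:limit]

urls = ["https://example.com/docs/a",
        "https://blog.example.com/post",
        "https://other.com/x"]
docs_only = filter_mapped_urls(urls, keyword="docs")
with_subs = filter_mapped_urls(urls, include_subdomains=True)
```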

Crawl

Whole-site ingestion

Recursive crawl with depth limits, include/exclude path patterns, sitemap-only mode, async jobs with live progress and pagination.
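Include/exclude path patterns are the main lever for scoping a recursive crawl. A minimal sketch of how such a scope check could work, using shell-style globs via the standard-library `fnmatch` — the function itself is a stand-in, not the Crawler's actual matcher:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include=None, exclude=None):
    """Crawl-scope check: the URL's path must match no exclude pattern,
    and at least one include pattern if any are given."""
    path = urlparse(url).path or "/"
    if exclude and any(fnmatch(path, p) for p in exclude):
        return False
    if include:
        return any(fnmatch(path, p) for p in include)
    return True
```

Exclude patterns win over include patterns, so `/blog/*` plus an exclude of `/blog/drafts/*` crawls published posts only.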

Request lifecycle
  1. URL or query
  2. Render (JS / anti-bot)
  3. Extract & clean
  4. Format (md / json / image)
  5. Hand off to agent or store
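The lifecycle above can be sketched as a small pipeline. Every function here is a toy stand-in for the corresponding stage, not the real Crawler internals:

```python
# Toy pipeline mirroring the request lifecycle; each stage is a stand-in.
def render(target):
    # Stage 2 stand-in: pretend we rendered the page with JS.
    return f"<html><main>{target} body</main></html>"

def extract(html):
    # Stage 3 stand-in: naive main-content extraction.
    return html.split("<main>")[1].split("</main>")[0]

def to_format(text, fmt):
    # Stage 4 stand-in: wrap content in the requested format.
    return {"format": fmt, "content": text}

def hand_off(doc):
    # Stage 5 stand-in: hand to an agent or persist.
    return {"status": "stored", **doc}

def run_pipeline(target: str) -> dict:
    return hand_off(to_format(extract(render(target)), "md"))

result = run_pipeline("https://example.com")
```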

Where teams use it

Knowledge ingestion

Crawl docs, FAQs, policy pages — pipe markdown into the Support Agent's RAG store. Auto-refresh on a schedule.

Lead enrichment

Map a prospect's site, scrape /pricing and /about, hand structured JSON to the Sales Agent before the first outreach.

Competitive monitoring

Scheduled scrape of competitor pages with diff detection. Trigger Slack alerts on copy, pricing, or feature changes.
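One cheap way to implement diff detection between scheduled scrapes is to store a hash of each page's extracted text and compare on the next run. A sketch under that assumption — the alert wiring is left out:

```python
import hashlib

def content_changed(previous_hash, page_text):
    """Compare a stored SHA-256 of the last scrape with the current one.
    Returns (changed, current_hash); changed is False on the first run."""
    current = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    changed = previous_hash is not None and current != previous_hash
    return changed, current

# First run seeds the hash; second run detects the pricing change.
changed1, h1 = content_changed(None, "Pro plan: $49/mo")
changed2, h2 = content_changed(h1, "Pro plan: $59/mo")
```

Hashing the cleaned markdown rather than the raw HTML avoids false alerts from rotating ad markup or cache-busting asset URLs.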

Content extraction

Schema-driven JSON extraction for product catalogs, job boards, news sites. Drop straight into Postgres or BigQuery.
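Before extracted records land in Postgres or BigQuery, a schema gate catches malformed rows. A minimal sketch of such a check — the schema format (field name to Python type) is an illustrative simplification, not the Crawler's JSON-schema support:

```python
def validate_record(record: dict, schema: dict) -> dict:
    """Require every schema field to be present with the declared type.
    Illustrative stand-in for proper JSON-schema validation."""
    for field, expected_type in schema.items():
        if field not in record:
            raise KeyError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field}: expected {expected_type.__name__}")
    return record

product_schema = {"name": str, "price": float, "in_stock": bool}
row = validate_record({"name": "Widget", "price": 9.99, "in_stock": True},
                      product_schema)
```

In production you would reach for a real JSON-schema validator instead; the point is to reject bad rows before the warehouse load, not after.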

Under the hood

Engine: Firecrawl v2 (managed) or self-hosted Playwright workers
Auth: Server-side proxy — API key never touches the browser
Output: Markdown, raw HTML, links, screenshot (base64), branding tokens, schema JSON, AI summary
Reliability: Retries, polling, pagination, cancellable async jobs
Scale: Batch scrape, parallel workers, rate-limit aware
Compliance: robots.txt aware, configurable user agent, geo-targeting

All scraping runs server-side. Credentials, target URLs, and extracted content stay inside your tenant.

Crawler request flow

Crawler contract
Modes: scrape (1 URL) · search (query) · map (sitemap) · crawl (recursive)
Limits: map ≤ 5,000 URLs · crawl depth + path filters · async jobs
Cost model: per-page billing · 24-hour cache · batch discount
Output: markdown · raw HTML · links · screenshot (b64) · schema JSON