Introduction
Deepcrawl is an agent-oriented web page context extraction platform. Open Source. Open Code.
ACTIVE DEVELOPMENT WARNING!
Deepcrawl is early-stage and under active development. All APIs and SDKs are subject to change, and many planned features are still on the way. Use at your own risk.
tl;dr one-liner
Deepcrawl is a free, open-source Firecrawl alternative with better performance, flexibility, and transparency.
It extracts cleaned markdown of page content, an agent-friendly link tree, and page metadata that LLMs can digest at minimal token cost, reducing context switching and hallucination.
Core endpoints at a glance
- getMarkdown: Fast GET endpoint that returns prompt-ready markdown, ideal for caching snippets or feeding LLM prompts.
- readUrl: Full POST operation with metadata, cleaned HTML, markdown, robots and sitemap data, and metrics in one payload.
- extractLinks: Configurable crawl endpoint that builds an agent-navigable link tree and filters external and media links.
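The three endpoints above can be sketched as plain HTTP calls. This is a hedged illustration: the base URL, paths, and parameter names below are assumptions for demonstration, not the documented contract; see the API reference for the real shapes.

```typescript
// Placeholder host -- replace with your deployment's API base URL.
const BASE = "https://api.deepcrawl.example";

// getMarkdown: fast GET returning prompt-ready markdown (assumed `url` query param).
function getMarkdownUrl(target: string): string {
  return `${BASE}/getMarkdown?url=${encodeURIComponent(target)}`;
}

// readUrl: full POST returning metadata, cleaned HTML, markdown, and metrics
// in one payload (assumed JSON body shape).
function readUrlRequest(target: string) {
  return {
    url: `${BASE}/readUrl`,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ url: target }),
  };
}

console.log(getMarkdownUrl("https://example.com/docs"));
// → https://api.deepcrawl.example/getMarkdown?url=https%3A%2F%2Fexample.com%2Fdocs
```

In practice you would pass these to `fetch`, or skip raw HTTP entirely and use the typed SDK client.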
Why Deepcrawl exists
- Compress messy web pages into structured markdown that agents and humans can read.
- Map every link in a domain so workflows understand site topology before they browse.
- Offer type-safe, production-ready APIs and SDKs that feel native inside agent frameworks like ai-sdk.
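The "map every link" idea can be sketched with a minimal link-tree walk. The node shape below is hypothetical, not the actual extractLinks payload:

```typescript
// Hypothetical link-tree node -- the real extractLinks response may differ.
interface LinkNode {
  url: string;
  children: LinkNode[];
}

// Flatten a tree into the depth-first list of URLs an agent could visit.
function flattenLinks(node: LinkNode): string[] {
  return [node.url, ...node.children.flatMap(flattenLinks)];
}

const tree: LinkNode = {
  url: "https://example.com",
  children: [
    { url: "https://example.com/docs", children: [] },
    { url: "https://example.com/blog", children: [] },
  ],
};

console.log(flattenLinks(tree));
// → ["https://example.com", "https://example.com/docs", "https://example.com/blog"]
```

A tree like this lets a workflow reason about site topology (depth, sections, dead ends) before spending tokens on page content.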
How the system is put together
- Service pipeline: Cloudflare Workers handle fetching, normalization, and link graph generation for each domain or URL.
- Rendering outputs: Purpose-tuned engines convert HTML into markdown optimized for LLM prompts, keeping context rich while trimming noise.
- Developer toolchain: A Turborepo monorepo hosts shared packages, database modules, and scripts that keep the full stack consistent.
What technologies make Deepcrawl possible
- Hono: Lightweight edge-first framework powering our worker routing and middleware.
- oRPC: Contract-first RPC layer that generates both OpenAPI REST endpoints and the RPC methods exposed to the SDK client from a single source.
- better-auth: Enterprise-grade session and API key management so teams can ship without rolling their own auth stack.
- Next.js 16 App Router: Aggressively optimized dashboard with streaming routes and granular data fetching, built for industry-standard UI/UX and stability.
- Cloudflare Workers: Global V8 runtime for web scraping, caching, and rate limiting with minimal latency and cost control, plus service bindings for communication between workers and the upcoming services behind planned features.
- Zod v4: Shared schema library that enforces type safety across workers, SDKs, and docs. The JS/TS SDK also exposes it at deepcrawl/zod/v4 as an app-level helper, so apps can share the same runtime instance.
- React Query: Advanced server-side prefetching and cache synchronization for fast dashboard interactions.
- nuqs: URL state serialization that powers custom state management; operation options become shareable simply by copying the URL.
- shadcn/ui + Tailwind CSS v4: Accessible component system and modern design tokens for a consistent UI.
- Drizzle ORM: Type-safe data layer across both Neon PostgreSQL and Cloudflare D1 SQLite deployments for data persistence.
- Upstash Redis + Cloudflare KV: Rate limiting and a layered caching strategy that balances hot-path performance with cost control.
- tsdown: Elegant library bundler built on Rolldown, a cutting-edge bundler written in Rust; already a drop-in replacement for tsup.
- Biome: Unified formatter and linter keeping code style consistent across apps, packages, and workers.
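The rationale for the SDK re-exporting Zod (sharing one runtime instance) can be illustrated with a minimal simulation. The classes below are stand-ins, not real Zod: two separately bundled copies of a library define distinct classes, so `instanceof` checks across copies fail.

```typescript
// Simulated "two copies of one library": each copy defines its own class.
class ZodTypeFromAppCopy {}  // stand-in for a Zod copy bundled by the app
class ZodTypeFromSdkCopy {}  // stand-in for a Zod copy bundled by the SDK

const schemaFromSdk = new ZodTypeFromSdkCopy();

// Cross-copy instanceof fails -- the pitfall a shared export
// (e.g. importing from deepcrawl/zod/v4) avoids.
console.log(schemaFromSdk instanceof ZodTypeFromAppCopy); // false
console.log(schemaFromSdk instanceof ZodTypeFromSdkCopy); // true
```

With a single shared instance, schemas built in app code and schemas built inside the SDK are the same runtime classes, so validation helpers and `instanceof`-style checks compose safely.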
Ways to use Deepcrawl today
- Developers: Consume the APIs directly or integrate with the fully typed JavaScript/TypeScript SDKs for fast agent workflows.
- Analysts & teams: Use the Next.js 16 dashboard to run crawls, explore the API playground, review logs, and manage API keys without writing code.
- Online Playground: Try Deepcrawl instantly in the browser, tweak options, and copy shareable URLs—see the playground guide for details.
Deploy on your own terms
Deepcrawl ships with everything you need to self-host: deploy the dashboard to Vercel, the workers to Cloudflare, and you have a complete system without paid dependencies. Fork it, extend it, or plug it into your existing stack—the licenses and tooling encourage customization.
Where to go next
- Follow the Quick Start at /docs/quick-start
- Dive into API usage at
- Explore the SDK reference