Deepcrawl
Installation and Setup

Node.js SDK

Install and use the official Deepcrawl JavaScript/TypeScript SDK client in Node runtimes.

Deepcrawl’s SDK wraps every API endpoint with typed methods, retry helpers, and structured errors. Use it anywhere you can run server-side JavaScript—Next.js route handlers, serverless functions, edge workers (Node-compatible), CLI tools, or background jobs.

Requirements

  • Node.js 18.17+ (Node 20+ recommended for built-in fetch and AbortController).
  • Access to a Deepcrawl API key (see Quick Start).
  • Environment variables stored securely (local .env, platform secrets, etc.).

DeepcrawlApp is server-only. Do not import it in client components or ship it in a browser bundle.

Install the package

pnpm add deepcrawl
npm install deepcrawl
yarn add deepcrawl
bun add deepcrawl

Step 1. Configure environment variables

Choose where to persist your API key (shell export, .env file, secret manager). For Next.js, use .env.local and access via process.env in server code.

Optional: declare a custom DEEPCRAWL_API_URL if you self-host the workers. Leave it unset to use https://api.deepcrawl.dev.

Restart your dev server so the new environment variables load.

# .env.local (server only)
DEEPCRAWL_API_KEY=dc_your_API_key
# Optional for self-hosted API deployment
DEEPCRAWL_API_URL=https://your-worker.example.com

Step 2. Initialize the client

// src/lib/deepcrawl.ts
import { DeepcrawlApp } from 'deepcrawl';

export const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
  baseUrl: process.env.DEEPCRAWL_API_URL, // Optional for self-hosted API deployment
});

The constructor automatically sets headers (Authorization, x-api-key, User-Agent), configures retries for read/link endpoints, and uses keep‑alive HTTPS agents in Node runtimes. You can pass a custom fetch (for polyfills) or fetchOptions to tweak timeouts.
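
For example, a minimal sketch passing a custom fetch implementation (undici is used here purely as an illustrative polyfill source; the exact fetch/fetchOptions types are in the SDK reference):

import { DeepcrawlApp } from 'deepcrawl';
import { fetch as undiciFetch } from 'undici'; // illustrative polyfill source

export const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
  fetch: undiciFetch, // overrides the fetch implementation the SDK uses
});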

Basic usage

import { deepcrawl } from '@/lib/deepcrawl';

export async function getMarkdown(url: string) {
  const markdown = await deepcrawl.getMarkdown(url);
  // const markdown = await deepcrawl.getMarkdown(url, { ...options }); // with more options

  return markdown; // string (clean markdown)
}

export async function readUrl(url: string) {
  const result = await deepcrawl.readUrl(url);
  // const result = await deepcrawl.readUrl(url, { ...options }); // with more options

  return result;
}

Sharing one client instance

  • Instantiate the client once per process (singleton module export) to reuse HTTPS agents.
  • In serverless/edge functions, create the client outside the handler to benefit from warm invocations, as in the sketch below.
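
A minimal sketch for a Node-based serverless function (the handler shape assumes an AWS Lambda-style runtime; adjust for your platform):

// Constructed once at module scope, reused across warm invocations.
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
});

export async function handler(event: { url: string }) {
  // Warm invocations reuse the client and its keep-alive HTTPS agent.
  const markdown = await deepcrawl.getMarkdown(event.url);
  return { statusCode: 200, body: JSON.stringify({ markdown }) };
}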

Handling errors and retries

import {
  DeepcrawlAuthError,
  DeepcrawlRateLimitError,
  DeepcrawlReadError,
} from 'deepcrawl';
import { deepcrawl } from '@/lib/deepcrawl';

export async function safeGetMarkdown(url: string) {
  try {
    return await deepcrawl.getMarkdown(url);
  } catch (error) {
    if (error instanceof DeepcrawlAuthError) {
      // Missing/invalid API key
      throw new Error('Check Deepcrawl credentials');
    }

    if (error instanceof DeepcrawlRateLimitError) {
      const retryAfter = error.data?.retryAfter ?? 60;
      throw new Error(`Rate limited, retry in ${retryAfter}s`);
    }

    if (error instanceof DeepcrawlReadError) {
      // Access error.data for the original request payload
      console.error('Read error:', error.data);
    }

    throw error;
  }
}

  • The SDK retries transient network errors and certain read/link operations up to two times. For custom logic, wrap calls with your own retry policy, as sketched after this list.
  • All error classes inherit from DeepcrawlError, so you can add a fallback handler.
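
A sketch of such a custom policy layered on the documented error classes (the attempt count and backoff are illustrative choices, not SDK defaults):

import { DeepcrawlRateLimitError } from 'deepcrawl';
import { deepcrawl } from '@/lib/deepcrawl';

export async function getMarkdownWithRetry(url: string, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await deepcrawl.getMarkdown(url);
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
      // Honor the server-provided retryAfter on 429s; otherwise back off exponentially.
      const delaySeconds =
        error instanceof DeepcrawlRateLimitError
          ? error.data?.retryAfter ?? 60
          : 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000));
    }
  }
}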

Framework examples

If you want to expose Deepcrawl as a service in your own system, adapt one of the following examples.

Next.js (App Router)

// app/api/markdown/route.ts
import { NextResponse } from 'next/server';
import { deepcrawl } from '@/lib/deepcrawl';

export async function POST(req: Request) {
  const { url } = await req.json();
  const markdown = await deepcrawl.getMarkdown(url);
  return NextResponse.json({ markdown });
}

Hono (Cloudflare Workers)

// src/index.ts
import { Hono } from 'hono';
import { DeepcrawlApp } from 'deepcrawl';

type Bindings = {
  DEEPCRAWL_API_KEY: string;
};

const app = new Hono<{ Bindings: Bindings }>();

app.post('/read', async (c) => {
  const deepcrawl = new DeepcrawlApp({
    apiKey: c.env.DEEPCRAWL_API_KEY,
  });

  const { url } = await c.req.json();

  try {
    const result = await deepcrawl.readUrl(url);
    return c.json(result);
  } catch (error) {
    return c.json({ error: (error as Error).message }, 500);
  }
});

export default app;

Cloudflare Workers

// src/index.ts
import { DeepcrawlApp } from 'deepcrawl';

interface Env {
  DEEPCRAWL_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }

    const deepcrawl = new DeepcrawlApp({
      apiKey: env.DEEPCRAWL_API_KEY,
    });

    try {
      const { url } = await request.json();
      const markdown = await deepcrawl.getMarkdown(url);

      return new Response(JSON.stringify({ markdown }), {
        headers: { 'Content-Type': 'application/json' },
      });
    } catch (error) {
      return new Response(
        JSON.stringify({ error: (error as Error).message }),
        { status: 500, headers: { 'Content-Type': 'application/json' } }
      );
    }
  },
};

Express

import express from 'express';
import { deepcrawl } from './lib/deepcrawl';

const app = express();
app.use(express.json());

app.post('/api/read', async (req, res) => {
  try {
    const result = await deepcrawl.readUrl({
      url: req.body.url,
      metadata: true,
    });
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: (error as Error).message });
  }
});

app.listen(3000);

Client Options Overview

import { DeepcrawlApp } from 'deepcrawl';

export const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
  baseUrl: process.env.DEEPCRAWL_API_URL, // optional self-hosted origin
  ...deepcrawlConfig, // any additional DeepcrawlConfig options (see below)
});

You can override fetch, fetchOptions, or headers during construction if you need custom agents, proxies, or tracing metadata.
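
For example, a sketch attaching tracing metadata via custom headers (the header name is illustrative, not an SDK requirement):

import { DeepcrawlApp } from 'deepcrawl';

export const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
  headers: { 'x-request-source': 'batch-importer' }, // hypothetical tracing header
});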

DeepcrawlConfig

The full prop/type table is in the SDK reference. The options used throughout this guide are apiKey, baseUrl, fetch, fetchOptions, and headers.

DeepcrawlFetchOptions

Per-request fetch settings (for example, timeouts). See the SDK reference for the full prop/type table.

Base Hosts & Authentication

  • Production host: https://api.deepcrawl.dev.
  • Self-hosted Worker: pass baseUrl to new DeepcrawlApp({ baseUrl }) or send requests directly to your Worker origin.
  • Authentication: send Authorization: Bearer <DEEPCRAWL_API_KEY> or x-api-key: <DEEPCRAWL_API_KEY> (see the sketch after this list). Dashboard sessions may also forward signed cookies; prefer API keys for automation.
  • Content types: POST endpoints expect JSON (UTF-8); GET endpoints accept query parameters that map to the option types below.
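
A sketch of a direct HTTP call without the SDK (the /read path and body shape are illustrative; the real routes are in the API reference):

// Hypothetical endpoint path, shown only to illustrate the auth headers.
const response = await fetch('https://api.deepcrawl.dev/read', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.DEEPCRAWL_API_KEY}`,
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const result = await response.json();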

Rate limits & retries

  • Workers return 429 with code RATE_LIMITED and a retryAfter duration (seconds).
  • The SDK automatically retries idempotent operations (getMarkdown, readUrl, extractLinks) with exponential backoff unless the error is an explicitly typed Deepcrawl error, which surfaces immediately.
  • Enable caching (cacheOptions) to reduce repeated calls and token costs, as in the sketch below.
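
A hedged sketch of enabling caching (cacheOptions is the documented knob, but its exact fields live in the SDK reference; the one shown here is an assumption):

import { deepcrawl } from '@/lib/deepcrawl';

const result = await deepcrawl.readUrl('https://example.com', {
  cacheOptions: { expirationTtl: 3600 }, // hypothetical TTL field, in seconds
});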

What’s next

  • Explore the SDK reference for every method, type, and option.
  • Check the Playground guide to prototype requests before porting them to code.
  • Combine with the Logs API to inspect past runs, collect analytics, or export markdown snapshots.