Deepcrawl

Extract Links

Crawl a page and return a structured site map with metadata via POST /links.

extractLinks is the deep, configurable crawl endpoint. It builds a hierarchical link tree of the target site, optionally enriches each node with metadata, and exports performance metrics, making it ideal for agents that need to understand a domain before planning actions.

No prerequisites required

Link extraction works by parsing the actual HTML content of your target page; no sitemap.xml, robots.txt, or other configuration files are needed. Deepcrawl intelligently discovers links by analyzing the page structure, so it works on any website regardless of its SEO setup.

This abbreviated snapshot comes from a real crawl of hono.dev. If you are already logged into the dashboard, open this URL in your browser to see the raw response, or try it out in the Playground.

https://hono.dev/docs/concepts/motivation
https://hono.dev/docs/concepts/routers
https://hono.dev/docs/getting-started/basic
https://hono.dev/docs/api/hono
https://hono.dev/examples/web-api
https://hono.dev/examples/proxy
https://hono.dev/llms.txt
https://hono.dev/llms-full.txt
https://hono.dev/llms-small.txt

When to use this endpoint

  • You need a tree of internal pages in one response, including optional metadata per node.
  • You want to configure link extraction (external links, media, query stripping, exclusion patterns).
  • You plan to cache results or analyze crawl performance metrics.

For lighter GET-only usage (no request body), see getLinks. For page content rather than graph data, use the read endpoints.

Request formats

REST (POST /links)

curl \
  -H "Authorization: Bearer $DEEPCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://api.deepcrawl.dev/links" \
  -d '{
    "url": "https://example.com",
    "tree": true
  }'

Any other ExtractLinksOptions fields (see below) can be included alongside "url" in the JSON body.
SDK (TypeScript)

import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
});

// Pass any ExtractLinksOptions (documented below) as the second argument.
const result = await deepcrawl.extractLinks('https://example.com', {
  tree: true,
});

console.log(result.tree?.children?.length);

Request body - ExtractLinksOptions

Key controls:

  • tree: enable to receive the hierarchical tree; otherwise you get flat extracted links.
  • linkExtractionOptions: include/exclude external links, media assets, strip query params, or provide regex exclusions.
  • metadata, cleanedHtml, robots, sitemapXML, metaFiles: mirror the read endpoint options to enrich tree nodes.
  • cacheOptions & metricsOptions: same behavior as other endpoints.
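Putting these controls together, a request options object might look like the sketch below. The sub-option names includeExternal and stripQueryParams are illustrative assumptions (excludePatterns and includeMedia appear in the tips further down); consult the ExtractLinksOptions schema for the authoritative shape.

```typescript
// Illustrative ExtractLinksOptions sketch. Sub-option names under
// linkExtractionOptions (includeExternal, stripQueryParams) are
// assumptions based on the controls described above.
const options = {
  tree: true,               // receive the hierarchical tree
  metadata: true,           // enrich each node with page metadata
  linkExtractionOptions: {
    includeExternal: false, // internal pages only
    includeMedia: false,    // skip image/video assets
    stripQueryParams: true, // normalize ?utm_*-style URLs
    excludePatterns: ['/login', '/logout'], // drop auth links
  },
};
```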

Response structure - ExtractLinksResponse

  • This is a union of two shapes:

    1. ExtractLinksResponseWithTree (when tree is enabled in options) – includes a tree hierarchy you can traverse, and metadata is nested in the tree node.
    2. ExtractLinksResponseWithoutTree (when tree is false in options) – omits tree, returning only extracted links and metadata.


Narrow the type safely by checking if ('tree' in response && response.tree) before reading the tree.
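In TypeScript, that check doubles as a type guard that narrows the union. A minimal sketch with abbreviated interfaces (the real response types carry more fields, as the example responses below show):

```typescript
// Abbreviated response shapes for illustration only.
interface ExtractLinksResponseWithTree {
  success: boolean;
  targetUrl: string;
  tree: { url: string; children?: unknown[] };
}

interface ExtractLinksResponseWithoutTree {
  success: boolean;
  targetUrl: string;
  extractedLinks: { internal: string[]; external: string[] };
}

type ExtractLinksResponse =
  | ExtractLinksResponseWithTree
  | ExtractLinksResponseWithoutTree;

// 'tree' in response narrows the union to the WithTree shape.
function rootUrl(response: ExtractLinksResponse): string | undefined {
  if ('tree' in response && response.tree) {
    return response.tree.url;
  }
  return undefined;
}
```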

Errors follow the standard error schema.

Example responses:

  • With tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  ancestors: ["https://example.com"],
  tree: {
    url: "https://example.com",
    name: "Home",
    lastUpdated: "2024-01-15T10:30:00.000Z",
    metadata: { title: "Example", description: "..." },
    extractedLinks: { internal: [...], external: [...] },
    children: [...]
  }
}
  • Without tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  title: "Example Website",
  description: "Welcome to our site",
  metadata: { title: "Example", description: "..." },
  extractedLinks: { internal: [...], external: [...] }
}
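Once the tree is in hand, traversing it is a short recursion. A sketch with a deliberately minimal node type (real nodes also carry name, lastUpdated, metadata, and extractedLinks, as shown above):

```typescript
// Minimal node shape for illustration; real tree nodes carry more fields.
interface TreeNode {
  url: string;
  children?: TreeNode[];
}

// Collect every URL in the tree, depth-first.
function collectUrls(node: TreeNode, out: string[] = []): string[] {
  out.push(node.url);
  for (const child of node.children ?? []) {
    collectUrls(child, out);
  }
  return out;
}
```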


Logs & observability

  • Logged under links-extractLinks with full request/response data.
  • Export responses later via the Logs API to replay site maps or analyze crawl history.
  • Rate limiting returns RATE_LIMITED; consider caching large crawls.
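When a large crawl does hit the limiter, a retry with exponential backoff is usually enough. A sketch, assuming the thrown error exposes the RATE_LIMITED code on a code property (adapt to however your client surfaces the shared error schema):

```typescript
// Retry on RATE_LIMITED with exponential backoff (1s, 2s, 4s, ...).
// The `code` property on the error is an assumption; adjust to match
// how your client exposes the error schema.
async function withBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const code = (err as { code?: string }).code;
      if (code !== 'RATE_LIMITED' || attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```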

Tips

  • Prototype in the Playground to tune extraction patterns quickly.
  • Use excludePatterns to remove auth or tracking links, and includeMedia to capture assets.
  • Pair with readUrl to fetch content for the highest-value pages discovered in the tree.

Need a quick GET request? See getLinks.