Deepcrawl

Extract Links

Crawl a page and return a structured site map with metadata via POST /links.

extractLinks is the deep, configurable crawl endpoint. It builds a hierarchical link tree of the target site, optionally enriches each node with metadata, and exports performance metrics, making it ideal for agents that need to understand a domain before planning actions.

No prerequisites required

Link extraction works by parsing the actual HTML content of your target page; no sitemap.xml, robots.txt, or other configuration files are needed. Deepcrawl intelligently discovers links by analyzing the page structure, so it works on any website regardless of its SEO setup.

This abbreviated snapshot comes from a real crawl of hono.dev. If you are already logged into the dashboard, open this URL in your browser to see the raw response, or try it out in the Playground.

https://hono.dev/docs/concepts/motivation
https://hono.dev/docs/concepts/routers
https://hono.dev/docs/getting-started/basic
https://hono.dev/docs/api/hono
https://hono.dev/examples/web-api
https://hono.dev/examples/proxy
https://hono.dev/llms.txt
https://hono.dev/llms-full.txt
https://hono.dev/llms-small.txt

When to use this endpoint

  • You need a tree of internal pages in one response, including optional metadata per node.
  • You want to configure link extraction (external links, media, query stripping, exclusion patterns).
  • You plan to cache results or analyze crawl performance metrics.

For lighter GET-only usage (no request body), see getLinks. For page content rather than graph data, use the read endpoints.

Request formats

REST (POST /links)

curl \
  -H "Authorization: Bearer $DEEPCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://api.deepcrawl.dev/links" \
  -d '{
    "url": "https://example.com",
    "tree": true
  }'

Any other ExtractLinksOptions fields (see below) can be included alongside "url" in the JSON body.
SDK (TypeScript)

import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
});

// Pass any ExtractLinksOptions (documented below) as the second argument.
const result = await deepcrawl.extractLinks('https://example.com', {
  tree: true,
});

console.log(result.tree?.children?.length);

Request body - ExtractLinksOptions

Key controls:

  • tree: enable to receive the hierarchical tree; otherwise you get flat extracted links.
  • linkExtractionOptions: include/exclude external links, media assets, strip query params, or provide regex exclusions.
  • metadata, cleanedHtml, robots, sitemapXML, metaFiles: mirror the read endpoint options to enrich tree nodes.
  • cacheOptions & metricsOptions: same behavior as other endpoints.
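Putting these controls together, a request options object might look like the sketch below. The sub-option names includeExternal and stripQueryParams are illustrative assumptions (excludePatterns and includeMedia appear in the tips further down); consult the ExtractLinksOptions schema for the authoritative shape.

```typescript
// Illustrative ExtractLinksOptions sketch. Sub-option names under
// linkExtractionOptions (includeExternal, stripQueryParams) are
// assumptions based on the controls described above.
const options = {
  tree: true,               // receive the hierarchical tree
  metadata: true,           // enrich each node with page metadata
  linkExtractionOptions: {
    includeExternal: false, // internal pages only
    includeMedia: false,    // skip image/video assets
    stripQueryParams: true, // normalize ?utm_*-style URLs
    excludePatterns: ['/login', '/logout'], // drop auth links
  },
};
```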

Response structure - ExtractLinksResponse

  • This is a union of two shapes:

    1. ExtractLinksResponseWithTree (when tree is enabled in options) – includes a tree hierarchy you can traverse, and metadata is nested in the tree node.
    2. ExtractLinksResponseWithoutTree (when tree is false in options) – omits tree, returning only extracted links and metadata.


Narrow the type safely by checking if ('tree' in response && response.tree) before reading the tree.
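In TypeScript, that check doubles as a type guard that narrows the union. A minimal sketch with abbreviated interfaces (the real response types carry more fields, as the example responses below show):

```typescript
// Abbreviated response shapes for illustration only.
interface ExtractLinksResponseWithTree {
  success: boolean;
  targetUrl: string;
  tree: { url: string; children?: unknown[] };
}

interface ExtractLinksResponseWithoutTree {
  success: boolean;
  targetUrl: string;
  extractedLinks: { internal: string[]; external: string[] };
}

type ExtractLinksResponse =
  | ExtractLinksResponseWithTree
  | ExtractLinksResponseWithoutTree;

// 'tree' in response narrows the union to the WithTree shape.
function rootUrl(response: ExtractLinksResponse): string | undefined {
  if ('tree' in response && response.tree) {
    return response.tree.url;
  }
  return undefined;
}
```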

Errors follow the standard error schema.

Example responses:

  • With tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  ancestors: ["https://example.com"],
  tree: {
    url: "https://example.com",
    name: "Home",
    lastUpdated: "2024-01-15T10:30:00.000Z",
    metadata: { title: "Example", description: "..." },
    extractedLinks: { internal: [...], external: [...] },
    children: [...]
  }
}
  • Without tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  title: "Example Website",
  description: "Welcome to our site",
  metadata: { title: "Example", description: "..." },
  extractedLinks: { internal: [...], external: [...] }
}
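Once the tree is in hand, traversing it is a short recursion. A sketch with a deliberately minimal node type (real nodes also carry name, lastUpdated, metadata, and extractedLinks, as shown above):

```typescript
// Minimal node shape for illustration; real tree nodes carry more fields.
interface TreeNode {
  url: string;
  children?: TreeNode[];
}

// Collect every URL in the tree, depth-first.
function collectUrls(node: TreeNode, out: string[] = []): string[] {
  out.push(node.url);
  for (const child of node.children ?? []) {
    collectUrls(child, out);
  }
  return out;
}
```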


Logs & observability

  • Logged under links-extractLinks with full request/response data.
  • Export responses later via the Logs API to replay site maps or analyze crawl history.
  • Rate limiting returns RATE_LIMITED; consider caching large crawls.
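When a large crawl does hit the limiter, a retry with exponential backoff is usually enough. A sketch, assuming the thrown error exposes the RATE_LIMITED code on a code property (adapt to however your client surfaces the shared error schema):

```typescript
// Retry on RATE_LIMITED with exponential backoff (1s, 2s, 4s, ...).
// The `code` property on the error is an assumption; adjust to match
// how your client exposes the error schema.
async function withBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const code = (err as { code?: string }).code;
      if (code !== 'RATE_LIMITED' || attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```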

Tips

  • Prototype in the Playground to tune extraction patterns quickly.
  • Use excludePatterns to remove auth or tracking links, and includeMedia to capture assets.
  • Pair with readUrl to fetch content for the highest-value pages discovered in the tree.

Need a quick GET request? See getLinks.