Extract Links
Crawl a page and return a structured site map with metadata via POST /links.
extractLinks is the deep, configurable crawl endpoint. It builds a hierarchical links tree of the site, optionally enriches each node with metadata, and exports performance metrics, making it ideal for agents that need to understand a domain before planning actions.
No prerequisites required
Link extraction works by parsing the actual HTML content of your target page—no sitemap.xml, robots.txt, or other configuration files needed. Deepcrawl discovers links by analyzing the page structure, so it works on any website regardless of its SEO setup.
What does the links tree look like in real usage?
This abbreviated snapshot comes from a real crawl of hono.dev. If you are already logged into the dashboard, open this URL in your browser to see the raw response, or try it out in the Playground.
When to use this endpoint
- You need a tree of internal pages in one response, including optional metadata per node.
- You want to configure link extraction (external links, media, query stripping, exclusion patterns).
- You plan to cache results or analyze crawl performance metrics.
For lighter GET-only usage (no request body), see getLinks. For page content rather than graph data, use the read endpoints.
Request formats
REST (POST /links)
curl \
  -H "Authorization: Bearer $DEEPCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://api.deepcrawl.dev/links" \
  -d '{
    "url": "https://example.com",
    ...extractLinksOptions // see below
  }'

Node SDK - extractLinks()
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY as string,
});

const tree = await deepcrawl.extractLinks('https://example.com', {
  ...extractLinksOptions,
});

console.log(tree.tree?.children?.length);

Request body - ExtractLinksOptions
Key controls:
- tree: enable to receive the hierarchical tree; otherwise you get flat extracted links.
- linkExtractionOptions: include or exclude external links and media assets, strip query params, or provide regex exclusions.
- metadata, cleanedHtml, robots, sitemapXML, metaFiles: mirror the read endpoint options to enrich tree nodes.
- cacheOptions & metricsOptions: same behavior as on other endpoints.
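Putting these together, a typical call might look like the sketch below, reusing the DeepcrawlApp client from the SDK example above. Nesting excludePatterns and includeMedia under linkExtractionOptions, and the includeExternal flag, are assumptions for illustration; check the generated SDK types for the exact shape.

const response = await deepcrawl.extractLinks('https://example.com', {
  tree: true,               // receive the hierarchical tree
  metadata: true,           // enrich each node with page metadata
  linkExtractionOptions: {
    includeExternal: false, // hypothetical flag: skip external links
    includeMedia: false,    // skip media assets
    excludePatterns: ['/login', 'utm_'], // regex exclusions for auth/tracking links
  },
});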
Response structure - ExtractLinksResponse
This is a union of two shapes:
- ExtractLinksResponseWithTree (when tree is enabled in options) – includes a tree hierarchy you can traverse; metadata is nested in each tree node.
- ExtractLinksResponseWithoutTree (when tree is false in options) – omits tree, returning only extracted links and metadata.
Narrow the union type-safely by checking if ('tree' in response && response.tree) before reading the tree, as in the sketch below.
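A minimal traversal sketch, assuming the node shape (url, children) shown in the example response below and a DeepcrawlApp client configured as in the SDK example above; collectUrls is a local helper, not part of the SDK:

// Minimal structural node type mirroring the "With tree" example below;
// the SDK's generated types are the source of truth.
interface TreeNode {
  url: string;
  children?: TreeNode[];
}

// Collect every URL in the tree, depth-first.
function collectUrls(node: TreeNode, acc: string[] = []): string[] {
  acc.push(node.url);
  for (const child of node.children ?? []) collectUrls(child, acc);
  return acc;
}

const response = await deepcrawl.extractLinks('https://example.com', { tree: true });

if ('tree' in response && response.tree) {
  console.log(collectUrls(response.tree)); // tree-shaped response
} else {
  console.log(response.extractedLinks);    // flat response, no tree
}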
Example response:
- With tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  ancestors: ["https://example.com"],
  tree: {
    url: "https://example.com",
    name: "Home",
    lastUpdated: "2024-01-15T10:30:00.000Z",
    metadata: { title: "Example", description: "..." },
    extractedLinks: { internal: [...], external: [...] },
    children: [...]
  }
}

- Without tree:
{
  requestId: "123e4567-e89b-12d3-a456-426614174000",
  success: true,
  cached: false,
  targetUrl: "https://example.com",
  timestamp: "2024-01-15T10:30:00.000Z",
  title: "Example Website",
  description: "Welcome to our site",
  metadata: { title: "Example", description: "..." },
  extractedLinks: { internal: [...], external: [...] }
}

Errors follow the shared error schema.
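On the REST side, a minimal error-handling sketch; the exact field names of the error body are not shown on this page, so treat the code check below as an assumption (RATE_LIMITED is the code mentioned under Logs & observability):

const res = await fetch('https://api.deepcrawl.dev/links', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.DEEPCRAWL_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://example.com', tree: true }),
});

if (!res.ok) {
  // Assumed error-body shape; verify field names against the shared schema.
  const err = await res.json();
  if (err.code === 'RATE_LIMITED') {
    // Back off and retry later, or serve a cached crawl instead.
  }
  throw new Error(`extractLinks failed with HTTP ${res.status}`);
}

const data = await res.json();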
Logs & observability
- Logged under links-extractLinks with full request/response data.
- Export responses later via the Logs API to replay site maps or analyze crawl history.
- Rate limiting returns RATE_LIMITED; consider caching large crawls.
Tips
- Prototype in the Playground to tune extraction patterns quickly.
- Use excludePatterns to remove auth or tracking links; includeMedia to capture assets.
- Pair with readUrl to fetch content for the highest-value pages discovered in the tree (see the sketch below).
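For instance, a sketch of that pairing, reusing the client from the SDK example above and assuming readUrl(url) is the SDK's read method (verify the signature against the read endpoint docs):

const links = await deepcrawl.extractLinks('https://example.com', { tree: true });

if ('tree' in links && links.tree) {
  // Fetch content for the first few pages discovered under the root.
  for (const child of (links.tree.children ?? []).slice(0, 5)) {
    const page = await deepcrawl.readUrl(child.url); // assumed signature
    console.log(child.url, page);
  }
}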
Need a quick GET request? See getLinks.