
Extract Structured Data

Extract structured data from a document using a JSON schema. Provide a file (uploaded directly or via file_url) for end-to-end processing, or a checkpoint_id from a previous /convert call to skip re-parsing.

POST /api/v1/extract

Authorizations

X-API-Key (string, header, required)

Your API key for authentication

Body Parameters

page_schema (string, body, required)

The JSON schema for structured extraction, sent as a string. Generate it with Pydantic (for example json.dumps(Model.model_json_schema())) or write it manually. Must contain a 'properties' key.
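Because page_schema travels as a string form field, a small helper can serialize the schema and check the documented 'properties' requirement before a request is sent. The sketch below uses only the standard library; `build_page_schema` and the invoice fields are hypothetical examples, and the check does not validate the schema beyond that one key.

```python
import json


def build_page_schema(schema: dict) -> str:
    """Serialize a JSON schema dict for the page_schema form field.

    Hypothetical helper: it only enforces the documented requirement
    that the schema contain a 'properties' key.
    """
    if "properties" not in schema:
        raise ValueError("page_schema must contain a 'properties' key")
    return json.dumps(schema)


# Illustrative schema; the field names are made up for this example.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice ID"},
        "total": {"type": "number", "description": "Grand total"},
    },
    "required": ["invoice_number"],
}

page_schema = build_page_schema(invoice_schema)
```

With Pydantic v2, `Model.model_json_schema()` returns a dict of this shape for a model with fields, so its result can be passed straight to the helper.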

file_url (string, body)

URL of the input file, as an alternative to uploading it with file. Provide either file/file_url or checkpoint_id.

checkpoint_id (string, body)

Checkpoint ID from a previous /convert request (with save_checkpoint=true). Skips re-parsing.
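Since an extract call takes its input from a single source, it can help to build the form fields in one place and reject ambiguous combinations early. A minimal sketch; `build_extract_form` is a hypothetical helper, and treating file and file_url as mutually exclusive is an assumption based on the wording "either file/file_url or checkpoint_id".

```python
from typing import Optional


def build_extract_form(
    page_schema: str,
    *,
    file_url: Optional[str] = None,
    checkpoint_id: Optional[str] = None,
    uploading_file: bool = False,
) -> dict:
    """Build form fields for POST /api/v1/extract.

    Hypothetical helper: requires exactly one input source, i.e. an
    uploaded file, a file_url, or a checkpoint_id (assumed exclusive).
    """
    sources = [uploading_file, file_url is not None, checkpoint_id is not None]
    if sum(sources) != 1:
        raise ValueError("provide exactly one of file, file_url, or checkpoint_id")

    form = {"page_schema": page_schema}
    if file_url is not None:
        form["file_url"] = file_url
    if checkpoint_id is not None:
        form["checkpoint_id"] = checkpoint_id
    # An uploaded file goes in the multipart `files` argument, not in the form.
    return form
```

In a checkpoint workflow, the document is parsed once by /convert with save_checkpoint=true, and each later /extract call sends only checkpoint_id plus a page_schema, so trying a different schema does not re-parse the document.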

mode (string, body)

Parsing mode: 'fast', 'balanced', or 'accurate'. Only used when a file is provided.

max_pages (integer, body)

Maximum number of pages to process.

page_range (string, body)

Pages to process, as a comma-separated list of page numbers and ranges, e.g. 0,5-10,20.
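A page_range value mixes single pages and ranges. To preview which pages a value selects, the sketch below expands it, assuming ranges are inclusive at both ends; whether page numbers are zero-indexed is not stated here, although the 0 in the example suggests they are. `expand_page_range` is a hypothetical local helper, not the server's parser.

```python
from typing import List


def expand_page_range(spec: str) -> List[int]:
    """Expand a page_range string such as "0,5-10,20" into sorted page numbers.

    Assumes ranges like 5-10 are inclusive (5, 6, ..., 10).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

For example, expand_page_range("0,5-10,20") gives [0, 5, 6, 7, 8, 9, 10, 20].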

output_format (string, body)

Output format for parsed text alongside extraction results. Defaults to 'markdown'.

save_checkpoint (boolean, body)

Save a checkpoint after processing so that later calls can reuse it instead of re-parsing the document.

skip_cache (boolean, body)

Skip the cache and re-run.

webhook_url (string, body)

Optional webhook URL to call when complete.
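The request the service sends to webhook_url is not described on this page. Assuming it is an HTTP POST with a JSON body, a minimal standard-library receiver might look like this; the `WebhookHandler` class, the in-memory `received` list, and port 8000 are illustrative choices, not part of the API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store of received payloads; a real service would persist or enqueue them.
received = []


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            # Assumption: the callback body is JSON.
            received.append(json.loads(body or b"{}"))
        except json.JSONDecodeError:
            received.append({"raw": body.decode("utf-8", "replace")})
        # Acknowledge quickly; do heavier work (e.g. fetching results) elsewhere.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr logging


def run(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Serve the webhook endpoint until interrupted."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling run() serves on port 8000; the publicly reachable URL of that server is what would be passed as webhook_url.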

file (file, body)

Input PDF, Word, PowerPoint, or image file. Images must be png, jpg, or webp.
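A client-side check can catch unsupported uploads before a request is made. In this sketch the image extensions come from the list above, while the .jpeg alias and the .docx/.pptx extensions for Word and PowerPoint are assumptions; legacy .doc and .ppt files are left out because this page does not say whether they are accepted. `is_supported_upload` is a hypothetical helper.

```python
from pathlib import Path

# Image formats listed in the docs; ".jpeg" is an assumed alias of ".jpg".
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
# Assumed extensions for the other listed formats (PDF, Word, PowerPoint).
DOCUMENT_EXTS = {".pdf", ".docx", ".pptx"}


def is_supported_upload(filename: str) -> bool:
    """Return True if the file extension matches one of the formats above."""
    ext = Path(filename).suffix.lower()
    return ext in IMAGE_EXTS or ext in DOCUMENT_EXTS
```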

Cookies

wos-session (string, cookie)

Session cookie

access_token (string, cookie)

Access token cookie

datalab_active_team (string, cookie)

Active team cookie

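Example request

import json
import requests

url = "https://www.datalab.to/api/v1/extract"
headers = {"X-API-Key": "<api-key>"}

# JSON schema for the fields to extract; it must contain a "properties" key.
page_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
}

payload = {
    "page_schema": json.dumps(page_schema),
    "mode": "fast",
    "skip_cache": "false",
}

# Upload a local file. To process a remote document instead, omit `files` and
# add "file_url" to the payload; to reuse a parsed document, send "checkpoint_id".
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

response.raise_for_status()
print(response.json())

Response

Successful Response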

{
  "request_id": "<string>",
  "request_check_url": "<string>",
  "success": true,
  "error": "<string>",
  "versions": {}
}
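The response carries a request_check_url, which suggests that results are retrieved asynchronously by polling. The fields returned by that URL are not documented on this page, so the sketch below keeps the completion test pluggable; `poll_until_done` is a hypothetical helper, and the `status == "complete"` check in the wiring comment is a placeholder to adjust against the actual check endpoint.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until is_done(result) is true, sleeping between attempts.

    Raises TimeoutError if the result is still not done after max_attempts calls.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"request not complete after {max_attempts} attempts")


# Example wiring with requests (the "status" field is a hypothetical placeholder):
# data = response.json()
# result = poll_until_done(
#     fetch=lambda: requests.get(data["request_check_url"], headers=headers).json(),
#     is_done=lambda r: r.get("status") == "complete",
# )
```

Injecting `sleep` keeps the loop testable without real delays; the defaults wait up to roughly two minutes before giving up.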
