Extract Structured Data
Extract structured data from a document using a JSON schema. Provide a file for end-to-end processing, or a checkpoint_id from a previous /convert call to skip re-parsing.
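The choice between the two input paths can be sketched as a small form-building helper. This is a minimal illustration, not part of the API: `build_extract_form` and its `uploading_file` flag are hypothetical names, and the rule that exactly one source is supplied per request is an assumption read from the description above.

```python
from typing import Optional


def build_extract_form(
    page_schema: str,
    *,
    uploading_file: bool = False,
    file_url: Optional[str] = None,
    checkpoint_id: Optional[str] = None,
    mode: Optional[str] = None,
) -> dict:
    """Assemble form fields for POST /extract (hypothetical helper)."""
    has_document = uploading_file or file_url is not None
    has_checkpoint = checkpoint_id is not None

    # Assumption: exactly one source (document or checkpoint) per request.
    if has_document == has_checkpoint:
        raise ValueError("provide either file/file_url or checkpoint_id")

    form = {"page_schema": page_schema}

    if has_checkpoint:
        # Reuse the parse saved by an earlier /convert call (save_checkpoint=true).
        form["checkpoint_id"] = checkpoint_id
        return form

    if file_url is not None:
        form["file_url"] = file_url
    if mode is not None:
        # mode is only used when a document is parsed in this request.
        form["mode"] = mode
    return form
```

The returned dict can be passed as the form data of the request; an uploaded file, if any, still goes in the multipart `file` field.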
Authorizations
X-API-Key (string, header, required): Your API key for authentication.
Body Parameters
page_schema (string, body, required): The JSON schema for structured extraction, passed as a JSON string. Generate it from a Pydantic model (for example json.dumps(Model.model_json_schema()) in Pydantic v2) or write it manually. Must contain a 'properties' key.
file_url (string, body): Optional URL of the file to process. Provide either a document (file or file_url) or a checkpoint_id.
checkpoint_id (string, body): Checkpoint ID from a previous /convert request made with save_checkpoint=true. Skips re-parsing.
mode (string, body): Output mode for parsing, used only when a file is provided. One of 'fast', 'balanced', or 'accurate'.
max_pages (integer, body): Maximum number of pages to process.
page_range (string, body): Page range to process, comma separated, for example 0,5-10,20.
output_format (string, body): Output format for the parsed text returned alongside the extraction results. Defaults to 'markdown'.
save_checkpoint (boolean, body): Save a checkpoint after processing for use in future calls.
skip_cache (boolean, body): Skip the cache and re-run processing.
webhook_url (string, body): Optional webhook URL to call when processing completes.
file (file, body): Input PDF, Word, PowerPoint, or image file. Images must be png, jpg, or webp.
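As an illustration of two of the parameters above, the sketch below serializes a hand-written schema into a page_schema string and compresses a list of page indices into the page_range syntax. The helper names and the invoice fields are hypothetical; only the 'properties' requirement and the 0,5-10,20 format come from the parameter descriptions.

```python
import json

# Hypothetical schema for illustration; the field names are not defined by the API.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
    },
    "required": ["invoice_number"],
}


def to_page_schema(schema: dict) -> str:
    """Serialize a schema dict to the string form expected by page_schema."""
    if "properties" not in schema:
        raise ValueError("schema must contain a 'properties' key")
    return json.dumps(schema)


def format_page_range(pages) -> str:
    """Compress page indices into a comma-separated range string."""
    ordered = sorted(set(pages))
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next index is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


page_schema = to_page_schema(invoice_schema)
page_range = format_page_range([0, 5, 6, 7, 8, 9, 10, 20])
```

Both values are plain strings, so they can be sent directly as form fields.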
Cookies
wos-session (string, cookie): Session cookie
access_token (string, cookie): Access token cookie
datalab_active_team (string, cookie): Active team cookie
Response
Successful Response
Extract Structured Data
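import requests

url = "https://www.datalab.to/api/v1/extract"
headers = {"X-API-Key": "<api-key>"}

# Form fields. A file is uploaded below, so file_url and checkpoint_id are omitted.
payload = {
    "page_schema": "<string>",  # JSON schema serialized as a string
    "mode": "fast",
    "skip_cache": "false",
}

# Open the document in binary mode; the multipart field name matches the "file" parameter.
with open("example-file", "rb") as f:
    files = {"file": ("example-file", f)}
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)
200 Success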
{
  "request_id": "<string>",
  "request_check_url": "<string>",
  "success": true,
  "error": "<string>",
  "versions": {}
}
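Because the response carries a request_check_url, results are presumably retrieved by polling that URL. The sketch below shows one way to do so. The `wait_for_result` helper is hypothetical, and the "status" field with the value "complete" is an assumption about the check response, which this page does not document.

```python
import time


def wait_for_result(fetch_json, check_url, *, interval=2.0, max_polls=60, sleep=time.sleep):
    """Poll check_url until the job reports completion (hypothetical helper).

    fetch_json is any callable that takes a URL and returns the decoded JSON body.
    """
    for _ in range(max_polls):
        result = fetch_json(check_url)
        # Assumption: the check response exposes a "status" field that
        # becomes "complete" once extraction has finished.
        if result.get("status") == "complete":
            return result
        sleep(interval)
    raise TimeoutError(f"no result after {max_polls} polls of {check_url}")
```

With requests, fetch_json could be lambda u: requests.get(u, headers=headers).json(), called with the request_check_url value from the response above.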