Word/PDF Documents

Rekognita does more than recognize documents: it can also generate restructured DOCX and PDF files that preserve the original structure, tables, and images.

How it works

Rekognita parses the input document, builds a graph structure, and then generates a clean DOCX or PDF with correct formatting:

  • Headings preserve their hierarchy (H1, H2, H3...)
  • Tables preserve columns, rows, and headers
  • Images and charts are inserted with their captions
  • Lists preserve numbering and nesting
  • Footnotes and references are restored
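
To make the "graph structure" step more concrete, here is a minimal sketch of how a flat sequence of recognized blocks could be nested under their headings. It is not Rekognita's internal representation, which is not documented here; the `Node` class, the `build_tree` function, and the `(kind, level, text)` block format are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of the document tree (hypothetical structure)."""
    kind: str                  # e.g. "document", "heading", "paragraph", "table"
    text: str = ""
    level: int = 0             # heading depth (1 for H1); 0 for non-headings
    children: List["Node"] = field(default_factory=list)


def build_tree(blocks):
    """Nest a flat list of (kind, level, text) blocks under their headings."""
    root = Node(kind="document")
    stack = [root]  # stack[-1] is the innermost open heading, or the root
    for kind, level, text in blocks:
        node = Node(kind=kind, text=text, level=level)
        if kind == "heading":
            # Close any open heading at the same or a deeper level.
            while len(stack) > 1 and stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body content belongs to the innermost open heading.
            stack[-1].children.append(node)
    return root
```

A writer for DOCX or PDF could then walk this tree depth-first and emit each heading, table, or paragraph at the right nesting level, which is one way the heading hierarchy listed above could be kept intact.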

API Request

POST /v1/documents/convert HTTP/1.1
Host: api.rekognita.com
Authorization: Bearer rk_sk_your_key
Content-Type: multipart/form-data

file=@scanned_document.pdf
output_format=docx           # or "pdf"
model=rekognita-accurate
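
The request above can also be assembled with the Python standard library alone. The sketch below builds the multipart body by hand and returns a `urllib.request.Request` without sending it. The host, path, header, and field names come from the request above; the `https` scheme, the `build_convert_request` helper name, and the `application/octet-stream` part type are assumptions.

```python
import urllib.request
import uuid

# Host and path are taken from the request above; the https scheme is assumed.
API_URL = "https://api.rekognita.com/v1/documents/convert"


def build_convert_request(file_bytes, filename, api_key,
                          output_format="docx", model="rekognita-accurate"):
    """Build (but do not send) a multipart/form-data conversion request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("output_format", output_format), ("model", model)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The uploaded document itself.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the reply. Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.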

Response

{
  "id": "doc_abc123",
  "status": "completed",
  "output_format": "docx",
  "download_url": "/v1/documents/doc_abc123/download",
  "expires_at": "2025-01-15T12:00:00Z",
  "pages": 5,
  "metadata": {
    "processing_time_ms": 3200,
    "model": "rekognita-accurate",
    "tables_found": 3,
    "images_found": 2
  }
}
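
A client will typically want to turn this response into something it can act on. The following sketch parses the JSON, resolves the relative `download_url` against the API host, and converts `expires_at` to a timezone-aware datetime. The field names come from the response above; the helper names `parse_convert_response` and `is_expired`, the base URL scheme, and the assumption that `"completed"` is the only ready state are illustrative and not confirmed by the API.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urljoin

# Assumed base URL: the host shown in the request, with an https scheme.
BASE_URL = "https://api.rekognita.com"


def parse_convert_response(raw, base_url=BASE_URL):
    """Extract the fields a client needs to fetch the converted file."""
    data = json.loads(raw)
    # "Z" means UTC; rewrite it so datetime.fromisoformat accepts it
    # on older Python versions as well.
    expires_at = datetime.fromisoformat(data["expires_at"].replace("Z", "+00:00"))
    return {
        "id": data["id"],
        "ready": data["status"] == "completed",
        "download_url": urljoin(base_url, data["download_url"]),
        "expires_at": expires_at,
        "pages": data["pages"],
    }


def is_expired(info, now=None):
    """Return True once the download link's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= info["expires_at"]
```

Checking `is_expired` before downloading avoids a wasted request against a link that has already lapsed; how the API itself responds to an expired link is not covered here.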

Using the SDK

from rekognita import RekognitaClient

client = RekognitaClient()

# Convert a scan to DOCX
result = client.documents.convert(
    file="scanned_invoice.pdf",
    output_format="docx",
    model="rekognita-accurate"
)

# Download the generated DOCX
result.download("output/restructured_invoice.docx")
print(f"Saved: {result.pages} pages, {result.metadata.tables_found} tables")

# Or get bytes
docx_bytes = result.content_bytes
with open("output.docx", "wb") as f:
    f.write(docx_bytes)

Additional Parameters

Parameter        Type      Description
output_format    string    "docx" or "pdf"
include_images   boolean   Include images (default: true)
page_range       string    Page range, e.g. "1-5"
template         string    Template ID for output styling
language         string    Document language for OCR (default: auto-detect)
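
These parameters can be collected and checked on the client side before a request is made. The sketch below is a hypothetical helper, `build_convert_params`, that is not part of the SDK. It assumes that booleans are sent as the strings "true" and "false" in form fields, and that `page_range` takes the "start-end" form shown in the example; other range syntaxes may exist but are not documented here.

```python
import re

ALLOWED_FORMATS = {"docx", "pdf"}


def build_convert_params(output_format="docx", include_images=True,
                         page_range=None, template=None, language=None):
    """Validate conversion options and return them as form fields."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")

    params = {
        "output_format": output_format,
        # Assumption: booleans are encoded as lowercase strings.
        "include_images": "true" if include_images else "false",
    }

    if page_range is not None:
        # Only the "start-end" form from the table's example is checked.
        match = re.fullmatch(r"(\d+)-(\d+)", page_range)
        if not match:
            raise ValueError('page_range must look like "1-5"')
        start, end = int(match.group(1)), int(match.group(2))
        if start < 1 or start > end:
            raise ValueError("page_range must satisfy 1 <= start <= end")
        params["page_range"] = page_range

    # Optional fields are omitted so the service can apply its defaults
    # (for example, language auto-detection).
    if template is not None:
        params["template"] = template
    if language is not None:
        params["language"] = language
    return params
```

Leaving out unset optional fields, rather than sending empty values, lets the documented defaults apply.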

Comparison with Competitors

Many OCR tools output only flat text. Rekognita generates complete documents with restored structure, ready for use without manual editing.