tito-PDF Documentation

Output contract

tito-pdf intentionally has a small, stable output contract. The same rules apply to both PDF and DOCX inputs, except where explicitly noted.

Output styles (two modes)

tito-pdf has two output styles.

1) Explicit output mode (explicit output paths)

If any of these flags are provided:

…then tito-pdf enters explicit output mode.

In explicit output mode:

2) Convenience mode (no explicit output paths)

If you do not provide any explicit output paths, deliverables follow the TITO folder convention:
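The two-mode rule can be illustrated with a short Python sketch. It is hypothetical and not tito-pdf's implementation. The function `resolve_outputs` and its keyword names are invented for illustration: the keywords mirror the output flags named on this page (`--md-out`, `--raw-text-out`, `--tables-out`, `--tables-audit-out`, `--assets-json`). The sketch assumes that any of them triggers explicit mode and that explicit mode writes only the requested files. The convenience-mode paths use only the `md/<id>.retrieve.md` and `md/<id>.retrieve.tables.md` names given in the output descriptions.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical keyword names mirroring the output flags named on this page.
OUTPUT_FLAGS = ("md_out", "raw_text_out", "tables_out", "tables_audit_out", "assets_json")


def resolve_outputs(doc_id: str, root: Path, **flags: str | None) -> dict[str, Path]:
    """Illustrative sketch of the two output styles; not tito-pdf's real code."""
    unknown = set(flags) - set(OUTPUT_FLAGS)
    if unknown:
        raise TypeError(f"unknown output flags: {sorted(unknown)}")

    # Any non-empty output path switches to explicit output mode.
    explicit = {name: Path(value) for name, value in flags.items() if value}
    if explicit:
        # Explicit output mode (assumed): write only the paths the caller asked for.
        return explicit

    # Convenience mode: TITO folder convention, deliverables under md/.
    md_dir = root / "md"
    return {
        "md_out": md_dir / f"{doc_id}.retrieve.md",
        "tables_out": md_dir / f"{doc_id}.retrieve.tables.md",
    }


print(resolve_outputs("doc42", Path(".")))
print(resolve_outputs("doc42", Path("."), md_out="out/doc42.md"))
```

The first call falls back to the `md/` convention. The second passes one explicit path, so only that path is returned.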

Outputs (what each file contains)

Primary Markdown (--md-out or md/<id>.retrieve.md)

Best-effort Markdown reconstruction from the document content.

Notes:

Raw text (--raw-text-out)

Plaintext (UTF-8). This is intended for downstream slicing/cleanup tools that prefer raw text.
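A downstream consumer only needs to decode the file as UTF-8. The following is a hypothetical sketch. Neither `load_raw_text` nor its rule of slicing on blank lines is part of tito-pdf. Both are illustrative stand-ins for whatever a slicing or cleanup tool actually does.

```python
from __future__ import annotations

from pathlib import Path


def load_raw_text(path: Path) -> list[str]:
    """Read a --raw-text-out file and split it into blank-line-separated chunks."""
    # Text mode with an explicit UTF-8 codec; universal newlines turn \r\n into \n.
    text = path.read_text(encoding="utf-8")
    # Hypothetical slicing rule: a blank line separates chunks.
    chunks = [chunk.strip() for chunk in text.split("\n\n")]
    return [chunk for chunk in chunks if chunk]
```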

Tables Markdown (--tables-out or md/<id>.retrieve.tables.md)

A Markdown file containing one or more tables.

Format notes:
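Suppose the tables are GitHub-Flavored Markdown pipe tables. That is an assumption this page does not confirm. Under it, a consumer could count the tables in this file by their delimiter rows, since each pipe table has exactly one. The helper `count_pipe_tables` is hypothetical.

```python
def count_pipe_tables(markdown: str) -> int:
    """Count pipe tables by their delimiter rows, e.g. |---|:--:| (GFM syntax assumed)."""
    count = 0
    for line in markdown.splitlines():
        row = line.strip()
        # A delimiter row holds only pipes, hyphens, colons and spaces; requiring a
        # pipe keeps a plain "---" thematic break from being counted as a table.
        if "|" in row and "-" in row and set(row) <= set("|:- "):
            count += 1
    return count
```

In practice the input would be the file contents, e.g. `Path("md/<id>.retrieve.tables.md").read_text(encoding="utf-8")` with the real document id.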

Tables audit JSON (--tables-audit-out)

A JSON file describing the accepted tables.

Rules:

The audit is intentionally “machine-friendly”:

Assets JSON (--assets-json)

A compact JSON payload with runtime metadata, timings, and metrics.

Contract notes:

See: Assets JSON.

Intermediates

Intermediates (prepared PDFs / OCR outputs) are stored in a temporary working directory and deleted by default.

To preserve intermediates for debugging/audit:

Deliverables always go to md/; intermediates go to sessions/ only when requested.
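The default behaviour (delete intermediates) and the opt-in behaviour (keep them) can be sketched with Python's `tempfile.TemporaryDirectory`. Everything here is hypothetical rather than tito-pdf's documented interface. The `working_dir` helper and its `keep` parameter stand in for whatever option actually requests preservation. The `sessions/<id>` layout is also an assumption beyond the `sessions/` folder named above.

```python
from __future__ import annotations

import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_dir(root: Path, doc_id: str, keep: bool = False) -> Iterator[Path]:
    """Hypothetical sketch: throwaway workdir by default, sessions/ when kept."""
    if keep:
        # Preserved for debugging/audit; the per-document subfolder is assumed.
        path = root / "sessions" / doc_id
        path.mkdir(parents=True, exist_ok=True)
        yield path
    else:
        # Default: a temporary directory removed on exit, even if processing fails.
        with tempfile.TemporaryDirectory(prefix="tito-pdf-") as tmp:
            yield Path(tmp)
```

Prepared PDFs and OCR outputs would be written inside the `with working_dir(...)` block. Deliverables are written elsewhere, so they survive either way.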