CLI reference
The installed tito-pdf --help output is the source of truth for flags.
tito-pdf --help
This page explains every parameter, including interactions that are hard to express in --help.
Metavars
In --help, placeholders mean:
- PATH: a filesystem path (file path)
- DIR: a filesystem path (directory path)
- N: a number
Positional argument
input_path
Path to a .pdf or .docx.
Rules:
- Must exist on disk.
- Type is inferred from the extension (.pdf or .docx).
Version
-V, --version
Prints a single-line version header and exits.
Mode (high level)
--mode {fast,robust,best}
A convenience knob that maps to lower-level OCR/tables behavior.
- robust (default)
  - OCR: conservative (attempt OCR with ocrmypdf --skip-text when available)
  - Tables: strict
- fast
  - OCR: disabled (unless you explicitly force OCR)
  - Tables: strict
- best
  - OCR: forced (unless you explicitly disable OCR)
  - Tables: strict first; if no tables are accepted, retry with lenient table detection
Important: explicit flags win over --mode:
- --no-ocr and --force-ocr override OCR behavior.
- --tables-lenient overrides the table strict/lenient choice.
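The precedence between --mode and the explicit flags can be sketched in a few lines of Python. This is an illustration of the rules on this page, not tito-pdf's actual code: the function name resolve_plan and the labels "off", "conservative", "forced", "strict", "strict_then_lenient", and "lenient" are hypothetical. The --no-ocr check comes first because the OCR section of this page states that it wins when both OCR flags are set.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    ocr: str     # hypothetical labels: "off", "conservative", "forced"
    tables: str  # hypothetical labels: "strict", "strict_then_lenient", "lenient"


# Per-mode defaults, transcribed from the list above.
MODE_DEFAULTS = {
    "fast": Plan(ocr="off", tables="strict"),
    "robust": Plan(ocr="conservative", tables="strict"),
    "best": Plan(ocr="forced", tables="strict_then_lenient"),
}


def resolve_plan(mode="robust", no_ocr=False, force_ocr=False, tables_lenient=False):
    """Start from the mode defaults, then let explicit flags override them."""
    ocr, tables = MODE_DEFAULTS[mode]
    if no_ocr:            # --no-ocr wins, even if --force-ocr is also set
        ocr = "off"
    elif force_ocr:
        ocr = "forced"
    if tables_lenient:    # --tables-lenient overrides the strict/lenient choice
        tables = "lenient"
    return Plan(ocr, tables)
```

For example, resolve_plan("fast", force_ocr=True) yields forced OCR with strict tables, which mirrors running --mode fast --force-ocr.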
Output selection: explicit vs convenience
tito-pdf has two output styles:
1) Explicit output mode
Triggered when any explicit output path is set:
- --md-out PATH
- --raw-text-out PATH
- --tables-out PATH
- --tables-audit-out PATH
- --assets-json PATH
In explicit output mode:
- tito-pdf writes only to the provided paths.
- --out-dir is ignored.
- Convenience toggles (--text, --tables, --all) are ignored.
2) Convenience mode
Used when no explicit output paths are set.
In convenience mode (TITO-aligned folder structure):
- Deliverables go to <out-dir>/md/ (default out-dir is CWD).
- Naming: md/<id>.retrieve.md and md/<id>.retrieve.tables.md.
- --id ID sets the output prefix (defaults to input filename stem if omitted).
- --tables or --all adds <id>.retrieve.tables.md.
- --keep-sessions preserves intermediate files in sessions/run-YYYYMMDD_HHMMSS/.
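The choice between the two output styles, and the resulting file names, can be modeled roughly as follows. This is a sketch under stated assumptions rather than the tool's implementation: planned_outputs and its parameter names are hypothetical, and it assumes the primary Markdown file is always written in convenience mode (the page says --tables "adds" the tables file).

```python
from pathlib import Path

# Hypothetical keys standing in for the five explicit output flags.
EXPLICIT_KEYS = ("md_out", "raw_text_out", "tables_out", "tables_audit_out", "assets_json")


def planned_outputs(input_path, explicit=None, out_dir=".", doc_id=None,
                    tables=False, all_=False):
    """Return a mapping of output kind -> path for one invocation."""
    explicit = {k: v for k, v in (explicit or {}).items() if k in EXPLICIT_KEYS and v}
    if explicit:
        # Explicit mode: only the provided paths; out_dir and toggles are ignored.
        return {kind: Path(path) for kind, path in explicit.items()}

    # Convenience mode: <out-dir>/md/<id>.retrieve*.md, id defaults to the input stem.
    stem = doc_id or Path(input_path).stem
    md_dir = Path(out_dir) / "md"
    outputs = {"md_out": md_dir / f"{stem}.retrieve.md"}
    if tables or all_:
        outputs["tables_out"] = md_dir / f"{stem}.retrieve.tables.md"
    return outputs
```

Calling planned_outputs("report.pdf", explicit={"md_out": "x.md"}, out_dir="out", all_=True) returns only the x.md entry, which illustrates that a single explicit path switches off the convenience behavior entirely.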
Output paths (explicit)
--md-out PATH
Write primary Markdown output to PATH.
Notes:
- Parent directories are created.
- The write is atomic (write a temp file then rename).
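The "temp file then rename" pattern mentioned above is commonly written like this in Python. It is a generic sketch of the technique, not a copy of tito-pdf's code; the helper name atomic_write_text is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text):
    """Write text to path so readers never observe a half-written file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create parent directories
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, target)  # replaces any existing file in one step
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Keeping the temporary file next to the destination matters because a rename across filesystems is not atomic.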
--raw-text-out PATH
Write extracted plaintext (UTF-8) to PATH.
Why it exists:
- downstream slicing/cleanup often prefers plaintext over Markdown.
--tables-out PATH
Write extracted tables as Markdown to PATH.
Notes:
- When no tables are detected, the output is the literal string (No tables detected.) followed by a newline.
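A downstream script may want to tell an empty result apart from real tables. A minimal check, assuming the placeholder text is exactly as documented above, could look like this. The helper name has_tables is hypothetical, and stripping surrounding whitespace is a leniency added here, not documented behavior.

```python
NO_TABLES_TEXT = "(No tables detected.)"  # placeholder documented for --tables-out


def has_tables(tables_markdown: str) -> bool:
    """Return False when the tables file holds only the documented placeholder."""
    return tables_markdown.strip() != NO_TABLES_TEXT
```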
--tables-audit-out PATH
Write a JSON audit describing accepted tables.
Rules:
- Requires --tables-out PATH.
--assets-json PATH
Write a JSON payload with runtime metadata and metrics.
Important:
- --assets-json is a companion output; you must also request at least one content output (--md-out and/or --raw-text-out and/or --tables-out).
See: Assets JSON.
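The two companion-output rules (--tables-audit-out needs --tables-out, and --assets-json needs at least one content output) can be checked before invoking the tool. The sketch below is a hypothetical pre-flight helper; the function name and the error wording are made up for illustration and are not tito-pdf's own messages.

```python
def output_flag_errors(md_out=None, raw_text_out=None, tables_out=None,
                       tables_audit_out=None, assets_json=None):
    """Return a list of rule violations for a set of explicit output paths."""
    errors = []
    if tables_audit_out and not tables_out:
        errors.append("--tables-audit-out requires --tables-out")
    if assets_json and not (md_out or raw_text_out or tables_out):
        errors.append("--assets-json requires at least one of "
                      "--md-out, --raw-text-out, --tables-out")
    return errors
```

An empty list means the combination satisfies both rules. A non-empty list is the kind of invalid option combination that the exit-code section files under CLI usage errors, though whether tito-pdf reports these exact cases that way is an assumption.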
Convenience directory
--out-dir DIR
Base directory for deliverables.
- Deliverables go to <out-dir>/md/.
- Default: current working directory.
- Used only when no explicit output paths are provided.
- If you provide any explicit output path, --out-dir is ignored.
--id ID
Identifier for output filenames.
- Outputs: md/<id>.retrieve.md, md/<id>.retrieve.tables.md.
- Default: input filename stem (with a warning).
--keep-sessions
Preserve intermediate files in sessions/run-YYYYMMDD_HHMMSS/.
- Useful for debugging and audit.
- Intermediates include: prepared.pdf, ocr.pdf.
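The run-YYYYMMDD_HHMMSS directory name corresponds to a standard timestamp format. The following sketch shows how such a name can be built or predicted when scripting around the tool. The helper session_dir is hypothetical, and the assumption that sessions/ sits under a given base directory is not confirmed by this page.

```python
from datetime import datetime
from pathlib import Path


def session_dir(base=".", now=None):
    """Build a sessions/run-YYYYMMDD_HHMMSS path under base (assumed location)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(base) / "sessions" / f"run-{stamp}"
```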
Convenience toggles
These toggles only matter in convenience mode.
--text
Write Markdown output.
- In convenience mode, Markdown is already the default.
- This flag exists for symmetry with --tables/--all.
--tables
Write tables Markdown output (md/<id>.retrieve.tables.md).
--all
Write both Markdown and tables Markdown.
Tables behavior
--tables-lenient
Enable text-based table detection (higher recall, more false positives).
Notes:
- In --mode best, lenient tables can be enabled automatically as a fallback when strict finds no tables.
See: Tables.
OCR behavior
--no-ocr
Disable the OCR stage.
--force-ocr
Force OCR even if the PDF already has a text layer.
Notes:
- If both --no-ocr and --force-ocr are set, --no-ocr wins.
See: OCR.
Debug
--max-pages N
Limit pages processed. Used for debugging performance and false positives.
- 0 means “all pages”.
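The "0 means all pages" convention can be expressed as a small helper. This is only an illustration: pages_to_process is a hypothetical name, and the assumption that a positive N selects the first N pages is not stated on this page.

```python
def pages_to_process(total_pages: int, max_pages: int) -> range:
    """Return zero-based page indexes to process; max_pages == 0 means all pages."""
    if max_pages < 0:
        raise ValueError("max_pages must be >= 0")
    # Assumption: a positive limit keeps the first max_pages pages.
    limit = total_pages if max_pages == 0 else min(max_pages, total_pages)
    return range(limit)
```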
Exit codes
- 0: success
- 2: CLI usage error (unsupported file type, missing file, invalid option combination)
- 1: runtime failure (e.g. failed to produce a requested output)
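When calling tito-pdf from a script, the exit codes above can be mapped to readable outcomes. The wrapper below is a sketch: run_tito and describe_exit are hypothetical helper names, and it assumes the tito-pdf executable is on PATH unless another executable is passed in.

```python
import subprocess

# Meanings transcribed from the exit code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "runtime failure",
    2: "CLI usage error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning documented on this page."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_tito(args, executable="tito-pdf"):
    """Run the CLI with the given argument list and return (code, meaning)."""
    completed = subprocess.run([executable, *args], check=False)
    return completed.returncode, describe_exit(completed.returncode)
```

A caller might retry or alert on "runtime failure" but treat "CLI usage error" as a bug in the calling script, since retrying with the same arguments will not help.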