Usage
tito-pdf writes its outputs to files and prints short status lines to stderr (for example, "Wrote: ..." and "OK").
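Because the status lines go to stderr, a script can capture them separately from any other output. The following is a minimal sketch that assumes each written file is reported on its own line in the form "Wrote: <path>". That format is inferred from the example above and is not a documented contract. wrote_paths is a hypothetical helper, not part of tito-pdf.

```shell
# Hypothetical helper: read captured tito-pdf stderr on stdin and print the
# path from every line that starts with "Wrote: " (line format is assumed).
wrote_paths() {
  sed -n 's/^Wrote: //p'
}

# Example wiring (not executed here): keep stderr in a log, then parse it.
# tito-pdf input.pdf --md-out out/input.md 2>run.log && wrote_paths <run.log
```

For scripts, the exit status and the explicit paths you passed in are usually more reliable signals than parsed log text. This helper is mainly useful for logging.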
Recommended: explicit primary Markdown output
Use --md-out so the output location is unambiguous and scriptable:
tito-pdf input.pdf --md-out out/input.md
DOCX:
tito-pdf input.docx --md-out out/input.md
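Because --md-out takes an explicit path, batch conversion comes down to computing one output path per input. The following is a minimal POSIX sh sketch. md_out_for and convert_all are hypothetical helper names. Only the --md-out flag shown above is taken from tito-pdf.

```shell
# Hypothetical helper: map an input path to OUT_DIR/<stem>.md by dropping
# the directory and the last extension (.pdf, .docx, ...).
md_out_for() {
  _path=$1 _dir=$2
  _base=$(basename "$_path")
  printf '%s/%s.md\n' "$_dir" "${_base%.*}"
}

# Convert every PDF/DOCX in SRC_DIR into OUT_DIR with explicit paths.
convert_all() {
  src_dir=$1 out_dir=$2
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.pdf "$src_dir"/*.docx; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    tito-pdf "$f" --md-out "$(md_out_for "$f" "$out_dir")" || return 1
  done
}
```

For example, convert_all in out converts everything in in/ and stops at the first failure. Note that input.pdf and input.docx in the same folder would map to the same out/input.md, so keep stems unique.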
Optional: raw text output (integration)
If you need a plaintext stream for downstream slicing/cleanup:
tito-pdf input.pdf \
  --mode fast \
  --md-out out/input.md \
  --raw-text-out out/input.raw.txt
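The raw text file is an ordinary text stream, so standard tools can slice and clean it. The following is a minimal sketch that assumes you want to trim trailing whitespace and collapse runs of blank lines. clean_raw_text is a hypothetical helper, and this cleanup is illustrative rather than something tito-pdf does itself.

```shell
# Hypothetical cleanup for a --raw-text-out file (stdin -> stdout):
# trims trailing spaces/tabs and collapses runs of blank lines into one.
clean_raw_text() {
  awk '
    { sub(/[ \t]+$/, "") }                          # strip trailing whitespace
    /^$/ { if (blank) next; blank = 1; print; next } # keep one blank line
    { blank = 0; print }
  '
}

# Slicing example (not executed here): keep lines 100-200 of the cleaned text.
# clean_raw_text <out/input.raw.txt | sed -n '100,200p'
```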
Convenience mode (human workflow)
If you do not provide any explicit output paths, tito-pdf writes next to the input file by default:
tito-pdf /path/to/input.pdf
# => /path/to/input.md
Write into a directory:
tito-pdf /path/to/input.pdf --out-dir out
# => out/input.md
Tables in convenience mode:
tito-pdf input.pdf --tables --out-dir out
# => out/input.tables.md
Tables only (explicit paths):
tito-pdf input.pdf \
  --mode fast \
  --tables-out out/input.tables.md \
  --tables-audit-out out/input.tables.audit.json
Text + tables in convenience mode:
tito-pdf input.pdf --all --out-dir out
# => out/input.md + out/input.tables.md
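The convenience-mode examples above follow a simple naming pattern: the input stem plus .md or .tables.md, placed next to the input or in --out-dir. The following is a minimal sketch that predicts those paths, for example to check for them after a run. predict_outputs is hypothetical. It assumes the pattern holds for every input, and that --tables without --out-dir also writes next to the input, which the examples do not show directly.

```shell
# Hypothetical helper mirroring the convenience-mode naming shown above:
#   predict_outputs INPUT [OUT_DIR] [text|tables|all]
# An empty or missing OUT_DIR means "next to the input"; default toggle is text.
predict_outputs() {
  _in=$1
  _dir=${2:-$(dirname "$_in")}
  _what=${3:-text}
  _base=$(basename "$_in")
  _stem=${_base%.*}
  case $_what in
    text)   printf '%s/%s.md\n' "$_dir" "$_stem" ;;
    tables) printf '%s/%s.tables.md\n' "$_dir" "$_stem" ;;
    all)    printf '%s/%s.md\n%s/%s.tables.md\n' "$_dir" "$_stem" "$_dir" "$_stem" ;;
    *)      echo "unknown toggle: $_what" >&2; return 2 ;;
  esac
}
```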
Explicit output paths (integration mode)
Any explicit output path flag enables explicit output mode.
In explicit output mode:
- tito-pdf writes only to the paths you provide.
- --out-dir and the convenience toggles (--text, --tables, --all) are ignored.
Example:
tito-pdf input.pdf \
  --mode fast \
  --md-out out/input.md \
  --raw-text-out out/input.raw.txt \
  --tables-out out/input.tables.md \
  --tables-audit-out out/input.tables.audit.json \
  --assets-json out/input.assets.json
Notes:
- --tables-audit-out requires --tables-out.
- --assets-json is a companion output; you must also request at least one content output (--md-out and/or --raw-text-out and/or --tables-out).
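The two rules above can be checked before invoking tito-pdf, which helps wrapper scripts fail early with a clear message. The following is a minimal sketch. check_explicit_flags is hypothetical. It assumes flags are passed as separate words ("--tables-out path", as in the examples), not in "--flag=path" form.

```shell
# Hypothetical pre-flight check for explicit output mode, encoding the two
# rules above. Pass the same arguments you would pass to tito-pdf.
check_explicit_flags() {
  _content=0 _tables=0 _audit=0 _assets=0
  for _arg in "$@"; do
    case $_arg in
      --md-out|--raw-text-out) _content=1 ;;
      --tables-out)            _content=1; _tables=1 ;;
      --tables-audit-out)      _audit=1 ;;
      --assets-json)           _assets=1 ;;
    esac
  done
  if [ "$_audit" -eq 1 ] && [ "$_tables" -eq 0 ]; then
    echo "error: --tables-audit-out requires --tables-out" >&2; return 1
  fi
  if [ "$_assets" -eq 1 ] && [ "$_content" -eq 0 ]; then
    echo "error: --assets-json needs --md-out, --raw-text-out or --tables-out" >&2; return 1
  fi
  return 0
}
```

Typical use is check_explicit_flags "$@" && tito-pdf "$@" inside a wrapper script. tito-pdf remains the authority on which combinations it accepts.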
Choosing a mode
tito-pdf exposes a single high-level knob: --mode.
--mode robust (default)
- Conservative OCR behavior (ocrmypdf --skip-text when available).
- Strict table detection.
- Good default when you don’t know the PDF quality.
--mode fast
- Disables OCR.
- Best for PDFs with a good text layer and for quick iteration.
--mode best
- Forces OCR.
- If strict table detection finds no tables, it automatically retries with lenient table detection.
- Best for scanned PDFs or PDFs with a bad text layer.
Overrides (explicit flags win over --mode):
--no-ocr, --force-ocr, --tables-lenient
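The OCR part of the precedence rule ("explicit flags win over --mode") can be modelled as a small lookup followed by overrides. The following is a minimal sketch of the documented behavior, not tito-pdf's actual code. effective_ocr is hypothetical, and the labels skip-text, off and force are illustrative names. When both --no-ocr and --force-ocr are given, the sketch lets the last flag win. That is an assumption: the real CLI may reject the combination or resolve it differently.

```shell
# Hypothetical model of the documented OCR precedence (not tito-pdf's code):
#   effective_ocr MODE [--no-ocr|--force-ocr]...
# Prints skip-text (robust; ocrmypdf --skip-text when available),
# off (fast) or force (best). Explicit flags override the mode.
effective_ocr() {
  case $1 in
    robust) _ocr=skip-text ;;
    fast)   _ocr=off ;;
    best)   _ocr=force ;;
    *)      echo "unknown mode: $1" >&2; return 2 ;;
  esac
  shift
  for _flag in "$@"; do
    case $_flag in
      --no-ocr)    _ocr=off ;;      # assumption: last flag wins
      --force-ocr) _ocr=force ;;
    esac
  done
  printf '%s\n' "$_ocr"
}
```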
Debug / iteration: limit pages
If you are debugging performance or table false positives, limit pages:
tito-pdf input.pdf --mode fast --md-out out/input.md --max-pages 10
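When you are chasing table false positives, it can help to run each mode on the same limited page range and compare the results. The following is a minimal sketch using only the flags shown above (--mode, --md-out, --max-pages). compare_modes and the TITO_PDF override variable are hypothetical. Setting TITO_PDF=echo prints the commands instead of running them.

```shell
# Hypothetical helper: run each mode on the first N pages and write one
# Markdown file per mode so the outputs can be compared side by side.
#   compare_modes INPUT [PAGES] [OUT_DIR]
compare_modes() {
  _in=$1 _pages=${2:-10} _out=${3:-out/debug}
  _base=$(basename "$_in")
  _stem=${_base%.*}
  mkdir -p "$_out" || return 1
  for _mode in fast robust best; do
    "${TITO_PDF:-tito-pdf}" "$_in" --mode "$_mode" \
      --md-out "$_out/$_stem.$_mode.md" --max-pages "$_pages" || return 1
  done
}
```

Comparing out/debug/input.fast.md, input.robust.md and input.best.md shows quickly whether a spurious table comes from the text layer or from OCR. Expect the best run to be the slowest, because it forces OCR.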