step-2: add convert_pdf.py (pymupdf4llm, low-memory)

Converts PDFs in sources/ to Markdown using pymupdf4llm (built on
PyMuPDF's MuPDF C engine, ~30-50 MB RAM, no ML models).
Output: step-2/<stem>/raw.md + clean.md.
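A minimal sketch of what such a conversion loop could look like. It assumes `pymupdf4llm.to_markdown(path)` is the conversion call and that every `*.pdf` directly under sources/ becomes step-2/<stem>/. The helper names `convert_all` and `clean_markdown` are hypothetical, and so are the cleanup rules (trailing-whitespace strip, blank-line collapse), because the commit does not say what distinguishes clean.md from raw.md.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Callable, Optional


def clean_markdown(raw: str) -> str:
    """Hypothetical cleanup: strip trailing spaces, collapse runs of blank lines."""
    lines = [line.rstrip() for line in raw.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line between blocks
    return text.strip() + "\n"


def convert_all(
    src_dir: Path = Path("sources"),
    out_root: Path = Path("step-2"),
    to_markdown: Optional[Callable[[str], str]] = None,
) -> list[Path]:
    """Convert each sources/*.pdf into step-2/<stem>/raw.md and clean.md."""
    if to_markdown is None:
        # Imported lazily so the helpers above work without the dependency installed.
        import pymupdf4llm

        to_markdown = pymupdf4llm.to_markdown
    written: list[Path] = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        raw = to_markdown(str(pdf))
        out_dir = out_root / pdf.stem  # e.g. sources/paper.pdf -> step-2/paper/
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / "raw.md").write_text(raw, encoding="utf-8")
        (out_dir / "clean.md").write_text(clean_markdown(raw), encoding="utf-8")
        written.append(out_dir)
    return written
```

Calling `convert_all()` from the repository root would use pymupdf4llm. The `to_markdown` parameter exists only so the loop can be exercised with a stub converter and no real PDFs.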
2026-04-13 10:00:42 +02:00
parent 3d9ed0141c
commit 346e336f1a
3 changed files with 86 additions and 0 deletions
@@ -27,3 +27,6 @@ Thumbs.db
step-0/*_step0_report.txt
step-1/*_step1_report.txt
# Output step-2: raw MD generated by marker
step-2/*/