step-2: add convert_pdf.py (pymupdf4llm, low-memory)
Converts PDFs in sources/ to Markdown using pymupdf4llm (pure C, ~30-50 MB RAM, no ML models). Output: step-2/<stem>/raw.md + clean.md.
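The conversion described above can be sketched as below. This is a minimal illustration under stated assumptions, not the committed `convert_pdf.py`. It relies only on `pymupdf4llm.to_markdown()`, the library's Markdown entry point. The helper names (`output_paths`, `clean_markdown`, `convert`) and the cleanup rules that turn `raw.md` into `clean.md` are hypothetical, because the commit does not say what "clean" means.

```python
"""Hypothetical sketch of step-2/convert_pdf.py (not the committed file)."""
import re
from pathlib import Path

SOURCES = Path("sources")  # input folder named in the commit message
OUT_ROOT = Path("step-2")  # output root named in the commit message


def output_paths(pdf: Path, out_root: Path = OUT_ROOT) -> tuple[Path, Path]:
    """Return the (raw.md, clean.md) paths under step-2/<stem>/."""
    folder = out_root / pdf.stem
    return folder / "raw.md", folder / "clean.md"


def clean_markdown(md: str) -> str:
    """Hypothetical cleanup: strip trailing whitespace, collapse 3+ newlines."""
    lines = [line.rstrip() for line in md.splitlines()]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip() + "\n"


def convert(pdf: Path, out_root: Path = OUT_ROOT) -> tuple[Path, Path]:
    """Convert one PDF and write both Markdown files."""
    # Imported lazily so the pure helpers above work without pymupdf4llm.
    import pymupdf4llm

    raw_md = pymupdf4llm.to_markdown(str(pdf))  # plain string of Markdown
    raw_path, clean_path = output_paths(pdf, out_root)
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    raw_path.write_text(raw_md, encoding="utf-8")
    clean_path.write_text(clean_markdown(raw_md), encoding="utf-8")
    return raw_path, clean_path


def main() -> None:
    # Process every PDF directly inside sources/, in a stable order.
    for pdf in sorted(SOURCES.glob("*.pdf")):
        raw_path, clean_path = convert(pdf)
        print(f"{pdf.name} -> {raw_path}, {clean_path}")


if __name__ == "__main__":
    main()
```

Keeping the path layout and the cleanup as pure functions makes them testable without any PDFs. Because each `step-2/<stem>/` folder holds only generated files, it fits the `step-2/*/` ignore rule added to `.gitignore` in this commit.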
@@ -27,3 +27,6 @@ Thumbs.db
step-0/*_step0_report.txt
step-1/*_step1_report.txt
# step-2 output: raw MD generated by marker
step-2/*/