Commit Graph

2 Commits

Author SHA1 Message Date
davide 346e336f1a step-2: add convert_pdf.py (pymupdf4llm, low-memory)
Converts PDFs in sources/ to Markdown using pymupdf4llm (pure C,
~30-50 MB RAM, no ML models). Output: step-2/<stem>/raw.md + clean.md.
2026-04-13 10:01:03 +02:00
davide 42c38c30f7 project setup: gitignore, CLAUDE.md, requirements
Aggiunge configurazione base del progetto:
- .gitignore: esclude venv, sources, processed, chroma_db e report generati
- CLAUDE.md: documenta l'uso obbligatorio del venv
- requirements.txt: dipendenze dirette (pdfplumber per step 0-1)
2026-04-13 08:02:54 +02:00