feat: integrate 9-stage PDF→Markdown pipeline and test suite

Ports the complete rewrite of conversione/_pipeline/ (9 PyMuPDF stages)
and the tests/ suite from main, without modifying chunks/, step-8/,
rag.py, ollama/, retrieve.py, or config.py.

requirements.txt: adds PyMuPDF>=1.24.0 and pytest>=8.0, keeps chromadb,
and removes opendataloader-pdf and pymupdf4llm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 14:46:16 +02:00
parent a7b71fa508
commit ebd2a43f84
39 changed files with 3688 additions and 153 deletions
+4 -1
@@ -27,8 +27,11 @@ __pycache__/
 Thumbs.db
-# Output conversione/ — generati da conversione/pipeline.py
+# Output conversione/ — generati dagli script
 conversione/*/
+!conversione/_pipeline/
+!conversione/_pipeline/**
+conversione/_pipeline/__pycache__/
 # Output chunks/ — generati da chunks/chunker.py e chunks/verify_chunks.py
 chunks/*/