davide
|
e78c404211
|
feat(chunks): unified Stage 1+2 pipeline with md_optimizer
chunker.py now runs, in sequence:
- Stage 1 (md_optimizer.py): _content_list_v2.json + _model.json → _clean.md
with cleanup of the TOC, title page, and internal summaries, plus chapter-title merging
- Stage 2: _clean.md → chunks.json (paragraph overlap; tables/lists kept atomic)
config.py extended with CHAPTER_PREFIX_PATTERNS, SOMMARIO_PATTERNS,
MODEL_SKIP_LABELS, MODEL_ABSTRACT_LABELS, MIN_CONTENT_CHARS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-20 16:07:40 +02:00 |
|