Commit Graph

1 Commits

Author SHA1 Message Date
davide e78c404211 feat(chunks): pipeline unificata Stage 1+2 con md_optimizer
chunker.py ora esegue in sequenza:
  - Stage 1 (md_optimizer.py): _content_list_v2.json + _model.json → _clean.md
    con pulizia TOC, frontespizio, sommari interni, merge titoli capitolo
  - Stage 2: _clean.md → chunks.json (paragraph-overlap, atomici tabelle/liste)

config.py esteso con CHAPTER_PREFIX_PATTERNS, SOMMARIO_PATTERNS,
MODEL_SKIP_LABELS, MODEL_ABSTRACT_LABELS, MIN_CONTENT_CHARS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 16:07:40 +02:00