From a1586343788e285efbf5e1622755c6054c9bfcec Mon Sep 17 00:00:00 2001
From: Davide Grilli <davide.grilli@outlook.com>
Date: Thu, 30 Apr 2026 12:20:00 +0200
Subject: [PATCH] =?UTF-8?q?refactor:=20reduce=20repo=20to=20the=20PDF=20=E2=86=92?=
 =?UTF-8?q?=20Markdown=20conversion=20phase=20only?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Removed chunks/, step-8/, ollama/, chroma_db/, rag.py, retrieve.py,
config.py, and chromadb from requirements. Updated README and CLAUDE.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
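Note: the `--stem` convention and the per-stem output layout described in the
updated CLAUDE.md and README.md can be sketched as below. This is only an
illustration under stated assumptions: `stem_for`, `output_paths`, and
`OUTPUT_FILES` are hypothetical names and are not taken from
`conversione/pipeline.py`; only the file names and the `conversione/<stem>/`
directory come from the docs in this patch.

```python
from pathlib import Path

# Hypothetical helper names; they are not taken from conversione/pipeline.py.
# The four file names are the ones listed in the updated README.
OUTPUT_FILES = ("raw.md", "clean.md", "structure_profile.json", "report.json")


def stem_for(pdf_path: str) -> str:
    """Return the PDF file name without its extension (the --stem value)."""
    return Path(pdf_path).stem


def output_paths(stem: str, root: Path = Path("conversione")) -> dict[str, Path]:
    """Map each documented output file to its expected location for a stem."""
    return {name: root / stem / name for name in OUTPUT_FILES}


if __name__ == "__main__":
    stem = stem_for("sources/analisi1.pdf")
    print(stem)
    for name, path in output_paths(stem).items():
        print(f"{name}: {path}")
```

With `sources/analisi1.pdf` as input, the stem is `analisi1`, matching the
example given in the README.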
 CLAUDE.md               |  56 +++--
 README.md               | 153 +++-----------
 chunks/chunker.py       | 443 ----------------------------------------
 chunks/fix_chunks.py    | 283 -------------------------
 chunks/verify_chunks.py | 334 ------------------------------
 config.py               |  54 -----
 ollama/README.md        | 113 ----------
 ollama/check_env.py     | 250 -----------------------
 ollama/test_ollama.py   |  66 ------
 rag.py                  | 252 -----------------------
 requirements.txt        |   1 -
 retrieve.py             | 217 --------------------
 step-8/README.md        | 114 -----------
 step-8/ingest.py        | 232 ---------------------
 14 files changed, 71 insertions(+), 2497 deletions(-)
 delete mode 100644 chunks/chunker.py
 delete mode 100644 chunks/fix_chunks.py
 delete mode 100644 chunks/verify_chunks.py
 delete mode 100644 config.py
 delete mode 100644 ollama/README.md
 delete mode 100644 ollama/check_env.py
 delete mode 100644 ollama/test_ollama.py
 delete mode 100644 rag.py
 delete mode 100644 retrieve.py
 delete mode 100644 step-8/README.md
 delete mode 100644 step-8/ingest.py
diff --git a/CLAUDE.md b/CLAUDE.md
index fc0e27b..2fddce7 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,4 +1,6 @@
-# CLAUDE.md — RAG from Scratch
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
 ## Regole invarianti
 
@@ -11,36 +13,58 @@
 ## Pipeline
 
 ```
-PDF → conversione → chunking → verifica → vettorizzazione → retrieval
+PDF → conversion → clean.md
 ```
 
-`--stem` = nome PDF senza estensione = nome collection ChromaDB.
-
-Per i path degli script e degli output usa `git ls-files` o esplora la root: la struttura è in evoluzione verso un programma unico.
+`--stem` = PDF file name without extension.
 
 ---
 
-## Configurazione
+## Commands
 
-`config.py` è la fonte di verità: `EMBED_MODEL`, `OLLAMA_MODEL`, `TOP_K`, `TEMPERATURE`, `SYSTEM_PROMPT`.
+```bash
+# Convert a single PDF
+python conversione/pipeline.py --stem <nome>
 
-**Se cambi `EMBED_MODEL`:** riesegui ingest con `--force` — embedding incoerenti non producono errori ma risposte insensate.
+# All PDFs in sources/
+python conversione/pipeline.py
 
-**Se cambi `MIN_CHARS` / `MAX_CHARS`:** cerca tutte le occorrenze nel repo e sincronizza.
+# Force re-run
+python conversione/pipeline.py --stem <nome> --force
+
+# Batch validation of all converted stems
+python conversione/validate.py
+```
 
 ---
 
-## Workflow consigliato
+## Architecture
 
-1. Converti il PDF con lo script di conversione
-2. `/prepare-md conversione/<stem>/clean.md`
-3. Chunking
-4. Vettorizza con `--stem <stem>`
-6. `python rag.py --stem <stem>`
+### `conversione/pipeline.py`
+
+Four phases, run in sequence:
+
+1. **Validation**: checks that the PDF is digital, unprotected, and non-empty.
+2. **Extraction**: uses `opendataloader-pdf` (XY-Cut++) with Java 11+ to reconstruct the correct reading order, even in multi-column documents.
+3. **Structural cleanup**: applies a series of transformations to `raw.md`, among them fixing LaTeX backtick accents, removing the TOC and dot leaders, normalizing headers, joining paragraphs split by page breaks, and removing watermark URLs.
+4. **Structure analysis**: detects the heading hierarchy (`#`/`##`/`###`), language, and average section length, then writes `structure_profile.json`.
+
+Output for each stem, in `conversione/<stem>/`:
+- `raw.md`: raw output, immutable
+- `clean.md`: working copy to be reviewed with `/prepare-md`
+- `structure_profile.json`: detected structure plus `strategia_chunking` (`h3_aware`, `h2_paragraph_split`, `paragraph`, `sliding_window`)
+- `report.json`: full metrics (transformations, anomalies, length distribution)
+
+### `conversione/validate.py`
+
+Reads the `report.json` of every stem and prints a status table. It flags bare headers, short/long sections, leftover backticks, and dot leaders.
+
+### `conversione/clear.sh`
+
+Removes the conversion outputs for a stem (`conversione/<stem>/`).
 
 ---
 
 ## Skills custom
 
 - `/prepare-md <path|stem>` — corregge `clean.md`: sillabazione, artefatti, header, paragrafi spezzati, gerarchia.
-- `/step6-fix <stem>` — verifica chunk, dry-run e applicazione fix via `fix_chunks.py`.
diff --git a/README.md b/README.md
index 262debf..a3ca353 100644
--- a/README.md
+++ b/README.md
@@ -1,62 +1,9 @@
-# RAG from Scratch
+# PDF → Markdown
 
-Sistema RAG (Retrieval-Augmented Generation) costruito da zero, senza framework di alto livello.
-Funziona su qualsiasi PDF digitale. Gira interamente in locale, senza GPU, senza cloud.
+Converts digital PDFs into clean, structured Markdown.
 
-**Stack:** Python · Ollama · ChromaDB  
-**Compatibile con:** Linux · macOS · Windows (WSL2) · CPU only · ~8 GB RAM
-
----
-
-## Pipeline
-
-```
-PDF → conversione → chunking → verifica → vettorizzazione → retrieval
-```
-
-| Fase | Rischio | Motivo |
-|---|---|---|
-| Conversione | 🟡 Medio | Automatica, ma il PDF deve essere digitale e non protetto |
-| Revisione Markdown | 🔴 Alto | La qualità del MD determina la qualità del RAG |
-| Chunking | 🟡 Medio | Adattivo, dipende dalla qualità del MD |
-| Vettorizzazione | 🟢 Basso | Meccanica, lenta ma affidabile |
-| Retrieval | 🟡 Medio | Dipende dai parametri in `config.py` |
-
----
-
-## Struttura del progetto
-
-```
-rag/
-├── sources/                  # PDF originali — non modificare
-├── conversione/              # PDF → Markdown strutturato
-│   ├── pipeline.py
-│   ├── validate.py
-│   └── <stem>/
-│       ├── raw.md            # grezzo — non modificare
-│       ├── clean.md          # copia di lavoro
-│       └── report.json
-├── step-5/                   # Chunking
-│   ├── chunker.py
-│   └── <stem>/chunks.json
-├── step-6/                   # Verifica e fix chunk
-│   ├── verify_chunks.py
-│   ├── fix_chunks.py
-│   └── <stem>/
-│       ├── chunks.json
-│       └── report.json
-├── step-8/                   # Vettorizzazione
-│   └── ingest.py
-├── ollama/                   # Setup ambiente
-│   ├── check_env.py
-│   └── test_ollama.py
-├── chroma_db/                # Vector store (generato)
-├── config.py                 # Configurazione pipeline ← modifica qui
-├── rag.py                    # Interrogazione RAG interattiva
-└── retrieve.py               # Retrieval puro (senza LLM)
-```
-
-`--stem` = nome del PDF senza estensione = nome della collection ChromaDB.
+**Stack:** Python · opendataloader-pdf (XY-Cut++) · Java 11+  
+**Compatible with:** Linux · macOS · Windows (WSL2)
 
 ---
 
@@ -68,92 +15,54 @@ source .venv/bin/activate
 pip install -r requirements.txt
 ```
 
-**Java 11+** richiesto per la conversione (`opendataloader-pdf`):
+**Java 11+** is required:
 
 ```bash
 sudo apt install default-jdk   # Ubuntu/Debian/WSL
-java -version                  # verifica
+java -version
 ```
 
-Vedi [`ollama/README.md`](ollama/README.md) per l'installazione di Ollama e il download dei modelli.
-
 ---
 
-## Workflow
-
-### 1. Converti il PDF
+## Usage
 
 ```bash
+# Single PDF
 python conversione/pipeline.py --stem <nome>
+
+# All PDFs in sources/
+python conversione/pipeline.py
+
+# Force re-run
+python conversione/pipeline.py --stem <nome> --force
 ```
 
-Produce `conversione/<stem>/clean.md`. Vedi [`conversione/README.md`](conversione/README.md).
-
-### 2. Rivedi il Markdown
-
-```
-/prepare-md conversione/<stem>/clean.md
-```
-
-Passaggio più importante: la qualità del RAG dipende da questo.
-
-### 3. Chunking
-
-```bash
-python step-5/chunker.py --stem <nome>
-```
-
-### 4. Verifica e fix chunk
-
-```bash
-python step-6/verify_chunks.py --stem <nome>
-python step-6/fix_chunks.py --stem <nome>     # se ci sono 🔴
-python step-6/verify_chunks.py --stem <nome>  # ri-verifica
-```
-
-Non procedere alla vettorizzazione se ci sono 🔴.
-
-### 5. Vettorizza
-
-```bash
-python step-8/ingest.py --stem <nome>
-```
-
-Vedi [`step-8/README.md`](step-8/README.md). Usa `--force` se hai cambiato `EMBED_MODEL` o i chunk.
-
-### 6. Interroga
-
-```bash
-python rag.py --stem <nome>       # risposta LLM
-python retrieve.py --stem <nome>  # retrieval puro (debug)
-```
+`--stem` = PDF file name without extension.  
+Example: `sources/analisi1.pdf` → `--stem analisi1`
 
 ---
 
-## Configurazione (`config.py`)
+## Output
 
-| Parametro | Default | Descrizione |
-|---|---|---|
-| `EMBED_MODEL` | `"nomic-embed-text"` | Modello embedding — deve corrispondere tra ingest e retrieval |
-| `OLLAMA_MODEL` | `"qwen3.5:0.8b"` | Modello LLM |
-| `OLLAMA_URL` | `"http://localhost:11434"` | Endpoint Ollama |
-| `TOP_K` | `6` | Chunk recuperati per query |
-| `TEMPERATURE` | `0.0` | Deterministico a `0.0` |
-| `NO_THINK` | `True` | Disabilita chain-of-thought (Qwen3/Qwen3.5) |
-| `SYSTEM_PROMPT` | *(vedi file)* | Istruzioni di comportamento per il LLM |
+For each stem, in `conversione/<stem>/`:
 
-> Se cambi `EMBED_MODEL`, riesegui `step-8/ingest.py --stem <nome> --force`.
+| File | Description |
+|------|-------------|
+| `raw.md` | Raw Markdown. **Do not edit** |
+| `clean.md` | Cleaned Markdown, the working copy |
+| `structure_profile.json` | Detected structure and metrics |
+| `report.json` | Full conversion statistics |
 
 ---
 
-## Principi
+## Batch validation
 
-**Atomico** — ogni fase fa una cosa sola; se si rompe qualcosa sai esattamente dove.
+```bash
+python conversione/validate.py
+```
 
-**Verificabile** — ogni fase ha un criterio di completamento oggettivo prima di procedere.
+Prints a status table covering all converted stems.
 
-**Reversibile** — puoi tornare indietro senza perdere il lavoro delle altre fasi.
+---
 
-**Adattivo** — nessuna assunzione sulla struttura del documento; si adatta automaticamente.
-
-**Locale** — nessuna API esterna, nessun dato trasmesso fuori dalla macchina.
+See [`conversione/README.md`](conversione/README.md) for details on the pipeline and the supported document types.
diff --git a/chunks/chunker.py b/chunks/chunker.py
deleted file mode 100644
index 188d95a..0000000
--- a/chunks/chunker.py
+++ /dev/null
@@ -1,443 +0,0 @@
-#!/usr/bin/env python3
-"""
-Chunking adattivo
-
-Divide il Markdown revisionato in chunk semantici pronti per la
-vettorizzazione. La strategia dipende dal profilo strutturale del documento.
-
-Input:  conversione/<stem>/clean.md + conversione/<stem>/structure_profile.json
-Output: chunks/<stem>/chunks.json
-
-Uso:
-    python chunks/chunker.py                    # tutti i documenti in conversione/
-    python chunks/chunker.py --stem documento   # un solo documento
-    python chunks/chunker.py --stem documento --force
-"""
-
-import argparse
-import json
-import re
-import sys
-from pathlib import Path
-
-
-# ─── Parametri ────────────────────────────────────────────────────────────────
-
-MIN_CHARS = 200   # sotto questa soglia → accorpa al chunk successivo
-MAX_CHARS = 800   # sopra questa soglia → spezza su frasi
-OVERLAP_S = 2     # frasi di overlap tra sotto-chunk dello stesso boundary
-
-
-# ─── Utilità ──────────────────────────────────────────────────────────────────
-
-def split_sentences(text: str) -> list[str]:
-    parts = re.split(r'(?<=[.!?»])\s+(?=[A-ZÀÈÉÌÒÙA-Z\"])', text.strip())
-    if len(parts) <= 1:
-        parts = re.split(r'(?<=[.!?»])\s+', text.strip())
-    return [p.strip() for p in parts if p.strip()]
-
-
-def slugify(s: str, max_len: int = 60) -> str:
-    s = s.lower()
-    s = re.sub(r'[^\w\s-]', '', s)
-    s = re.sub(r'[\s_-]+', '_', s).strip('_')
-    return s[:max_len] if s else "section"
-
-
-_SENT_BOUNDARY = re.compile(r"[.!?»)\]'\u2019\"\u201c\u201d/:|\u2026]$")
-
-
-def _flush_chunk(
-    current: list[str],
-    sentences: list[str],
-    i: int,
-    prefix: str,
-    sezione: str,
-    titolo: str,
-    sub_index: int,
-    max_chars: int,
-) -> tuple[dict, list[str], int, int]:
-    """Emette un chunk, estendendo fino a un confine di frase (max +20%)."""
-    hard_limit = int(max_chars * 1.2)
-    current_len = sum(len(s) + 1 for s in current)
-    while i < len(sentences) and not _SENT_BOUNDARY.search(" ".join(current)):
-        nxt = sentences[i]
-        if current_len + len(nxt) + 1 > hard_limit:
-            break
-        current.append(nxt)
-        current_len += len(nxt) + 1
-        i += 1
-    chunk_text = prefix + " ".join(current)
-    chunk = {
-        "chunk_id": f"{slugify(sezione)}__{slugify(titolo)}__s{sub_index}",
-        "text": chunk_text,
-        "sezione": sezione,
-        "titolo": titolo,
-        "sub_index": sub_index,
-        "n_chars": len(chunk_text),
-    }
-    return chunk, current, i, sub_index + 1
-
-
-def make_sub_chunks(
-    body: str,
-    prefix: str,
-    sezione: str,
-    titolo: str,
-    max_chars: int,
-    overlap_s: int,
-) -> list[dict]:
-    sentences = split_sentences(body)
-    if not sentences:
-        return []
-
-    chunks = []
-    current: list[str] = []
-    current_len = 0
-    sub_index = 0
-
-    i = 0
-    while i < len(sentences):
-        sent = sentences[i]
-        if not current or current_len + len(sent) + 1 <= max_chars:
-            current.append(sent)
-            current_len += len(sent) + (1 if len(current) > 1 else 0)
-            i += 1
-        else:
-            chunk, current, i, sub_index = _flush_chunk(
-                current, sentences, i, prefix, sezione, titolo, sub_index, max_chars
-            )
-            chunks.append(chunk)
-            overlap = current[-overlap_s:] if overlap_s and len(current) > overlap_s else []
-            current = overlap[:]
-            current_len = sum(len(s) + 1 for s in current)
-
-    if current:
-        chunk_text = prefix + " ".join(current)
-        chunks.append({
-            "chunk_id": f"{slugify(sezione)}__{slugify(titolo)}__s{sub_index}",
-            "text": chunk_text,
-            "sezione": sezione,
-            "titolo": titolo,
-            "sub_index": sub_index,
-            "n_chars": len(chunk_text),
-        })
-
-    return chunks
-
-
-# ─── Parser Markdown ──────────────────────────────────────────────────────────
-
-def parse_h3_sections(text: str) -> list[dict]:
-    sections = []
-    current_h2 = ""
-    current_h3 = ""
-    current_body_lines: list[str] = []
-
-    def flush():
-        body = "\n".join(current_body_lines).strip()
-        if body:
-            sections.append({
-                "sezione": current_h2,
-                "titolo": current_h3,
-                "body": body,
-            })
-
-    for line in text.splitlines():
-        if re.match(r"^# ", line):
-            flush()
-            current_h2 = line[2:].strip()
-            current_h3 = ""
-            current_body_lines = []
-        elif re.match(r"^## ", line):
-            flush()
-            current_h2 = line[3:].strip()
-            current_h3 = ""
-            current_body_lines = []
-        elif re.match(r"^### ", line):
-            flush()
-            current_h3 = line[4:].strip()
-            current_body_lines = []
-        else:
-            current_body_lines.append(line)
-
-    flush()
-    return sections
-
-
-def parse_h2_sections(text: str) -> list[dict]:
-    sections = []
-    current_h2 = ""
-    current_body_lines: list[str] = []
-
-    def flush():
-        body = "\n".join(current_body_lines).strip()
-        if body:
-            sections.append({"sezione": current_h2, "body": body})
-
-    for line in text.splitlines():
-        if re.match(r"^## ", line):
-            flush()
-            current_h2 = line[3:].strip()
-            current_body_lines = []
-        elif re.match(r"^# ", line):
-            flush()
-            current_h2 = line[2:].strip()
-            current_body_lines = []
-        else:
-            current_body_lines.append(line)
-
-    flush()
-    return sections
-
-
-# ─── Strategie di chunking ────────────────────────────────────────────────────
-
-def chunk_h3_aware(text: str, stem: str) -> list[dict]:
-    sections = parse_h3_sections(text)
-
-    merged: list[dict] = []
-    pending: dict | None = None
-
-    for sec in sections:
-        if pending is None:
-            pending = dict(sec)
-            continue
-
-        if (pending["sezione"] == sec["sezione"]
-                and len(pending["body"]) < MIN_CHARS):
-            sep_title = " / ".join(filter(None, [pending["titolo"], sec["titolo"]]))
-            pending = {
-                "sezione": pending["sezione"],
-                "titolo": sep_title or pending["titolo"],
-                "body": pending["body"] + "\n\n" + sec["body"],
-            }
-        else:
-            merged.append(pending)
-            pending = dict(sec)
-
-    if pending:
-        merged.append(pending)
-
-    chunks = []
-    for sec in merged:
-        sezione = sec["sezione"] or stem
-        titolo = sec["titolo"] or ""
-        body = sec["body"]
-
-        prefix = f"[{sezione} > {titolo}]\n" if titolo else f"[{sezione}]\n"
-        sub = make_sub_chunks(body, prefix, sezione, titolo, MAX_CHARS, OVERLAP_S)
-        chunks.extend(sub)
-
-    return chunks
-
-
-def chunk_h2_paragraph_split(text: str, stem: str) -> list[dict]:
-    sections = parse_h2_sections(text)
-    chunks = []
-
-    for sec in sections:
-        sezione = sec["sezione"] or stem
-        body = sec["body"]
-        prefix = f"[{sezione}]\n"
-
-        paragraphs = [
-            p.strip()
-            for p in re.split(r"\n{2,}", body)
-            if p.strip() and not re.match(r"^#+\s", p.strip())
-        ]
-
-        merged_pars: list[str] = []
-        pending = ""
-        for par in paragraphs:
-            if pending and len(pending) < MIN_CHARS:
-                pending = pending + "\n\n" + par
-            else:
-                if pending:
-                    merged_pars.append(pending)
-                pending = par
-        if pending:
-            merged_pars.append(pending)
-
-        for idx, par in enumerate(merged_pars):
-            sub = make_sub_chunks(par, prefix, sezione, f"par{idx}", MAX_CHARS, OVERLAP_S)
-            for c in sub:
-                c["chunk_id"] = f"{slugify(sezione)}__p{idx}__s{c['sub_index']}"
-            chunks.extend(sub)
-
-    return chunks
-
-
-def chunk_paragraph(text: str, stem: str) -> list[dict]:
-    paragraphs = [
-        p.strip()
-        for p in re.split(r"\n{2,}", text)
-        if p.strip() and not re.match(r"^#+\s", p.strip())
-    ]
-    prefix = f"[Documento: {stem}]\n"
-
-    merged: list[str] = []
-    pending = ""
-    for par in paragraphs:
-        if pending and len(pending) < MIN_CHARS:
-            pending = pending + "\n\n" + par
-        else:
-            if pending:
-                merged.append(pending)
-            pending = par
-    if pending:
-        merged.append(pending)
-
-    chunks = []
-    for idx, par in enumerate(merged):
-        sub = make_sub_chunks(par, prefix, stem, f"par{idx}", MAX_CHARS, OVERLAP_S)
-        for c in sub:
-            c["chunk_id"] = f"para__{idx}__s{c['sub_index']}"
-        chunks.extend(sub)
-
-    return chunks
-
-
-def chunk_sliding_window(text: str, stem: str) -> list[dict]:
-    sentences = split_sentences(text)
-    prefix = f"[Documento: {stem}]\n"
-
-    chunks = []
-    i = 0
-    win_idx = 0
-
-    while i < len(sentences):
-        window: list[str] = []
-        cur_len = 0
-
-        j = i
-        while j < len(sentences):
-            s = sentences[j]
-            if window and cur_len + len(s) + 1 > MAX_CHARS:
-                break
-            window.append(s)
-            cur_len += len(s) + (1 if len(window) > 1 else 0)
-            j += 1
-
-        if not window:
-            window = [sentences[i]]
-            j = i + 1
-
-        chunk_text = prefix + " ".join(window)
-        chunks.append({
-            "chunk_id": f"win__{win_idx}",
-            "text": chunk_text,
-            "sezione": stem,
-            "titolo": f"finestra {win_idx}",
-            "sub_index": win_idx,
-            "n_chars": len(chunk_text),
-        })
-        win_idx += 1
-        i += max(1, len(window) - OVERLAP_S)
-
-    return chunks
-
-
-# ─── Dispatcher ───────────────────────────────────────────────────────────────
-
-_STRATEGIES: dict[str, callable] = {
-    "h3_aware": chunk_h3_aware,
-    "h2_paragraph_split": chunk_h2_paragraph_split,
-    "paragraph": chunk_paragraph,
-    "sliding_window": chunk_sliding_window,
-}
-
-
-def chunk_document(clean_md: Path, profile: dict, stem: str) -> list[dict]:
-    text = clean_md.read_text(encoding="utf-8")
-    strategia = profile.get("strategia_chunking", "paragraph")
-    fn = _STRATEGIES.get(strategia, chunk_paragraph)
-    return fn(text, stem)
-
-
-# ─── Per-document processing ──────────────────────────────────────────────────
-
-def process_stem(stem: str, project_root: Path, force: bool) -> bool:
-    conv_dir  = project_root / "conversione" / stem
-    out_dir   = project_root / "chunks" / stem
-    clean_md  = conv_dir / "clean.md"
-    profile_path = conv_dir / "structure_profile.json"
-    out_file  = out_dir / "chunks.json"
-
-    print(f"\nDocumento: {stem}")
-
-    if not clean_md.exists():
-        print(f"  ✗ clean.md non trovato in conversione/{stem}/ — skip")
-        return False
-    if not profile_path.exists():
-        print(f"  ✗ structure_profile.json non trovato in conversione/{stem}/ — skip")
-        return False
-
-    if out_file.exists() and not force:
-        print(f"  ⚠️  chunks.json già presente — skip")
-        print(f"       (usa --force per rieseguire)")
-        return True
-
-    profile   = json.loads(profile_path.read_text(encoding="utf-8"))
-    strategia = profile.get("strategia_chunking", "paragraph")
-    print(f"  Strategia: {strategia}")
-
-    chunks = chunk_document(clean_md, profile, stem)
-
-    if not chunks:
-        print(f"  ✗ Nessun chunk generato — controlla clean.md")
-        return False
-
-    out_dir.mkdir(parents=True, exist_ok=True)
-    out_file.write_text(
-        json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8"
-    )
-
-    lengths = [c["n_chars"] for c in chunks]
-    min_c = min(lengths)
-    max_c = max(lengths)
-    avg_c = int(sum(lengths) / len(lengths))
-    short = sum(1 for l in lengths if l < MIN_CHARS)
-    long_ = sum(1 for l in lengths if l > MAX_CHARS * 1.5)
-
-    print(f"  Chunk totali: {len(chunks)}")
-    print(f"  Min: {min_c} char  Max: {max_c} char  Media: {avg_c} char")
-    if short:
-        print(f"  ⚠️  {short} chunk sotto MIN_CHARS ({MIN_CHARS})")
-    if long_:
-        print(f"  ⚠️  {long_} chunk sopra MAX_CHARS×1.5 ({int(MAX_CHARS * 1.5)})")
-    print(f"  ✅ chunks.json salvato in chunks/{stem}/")
-    return True
-
-
-# ─── Entry point ─────────────────────────────────────────────────────────────
-
-if __name__ == "__main__":
-    project_root = Path(__file__).parent.parent
-
-    parser = argparse.ArgumentParser(description="Chunking adattivo")
-    parser.add_argument("--stem", help="Nome del documento (sottocartella di conversione/)")
-    parser.add_argument("--force", action="store_true", help="Riesegui anche se già presente")
-    args = parser.parse_args()
-
-    if args.stem:
-        stems = [args.stem]
-    else:
-        conv_dir = project_root / "conversione"
-        if not conv_dir.exists():
-            print(f"Errore: cartella conversione/ non trovata in {project_root}")
-            sys.exit(1)
-        stems = sorted(
-            p.name for p in conv_dir.iterdir()
-            if p.is_dir() and (p / "clean.md").exists()
-        )
-        if not stems:
-            print(f"Errore: nessun documento trovato in conversione/")
-            sys.exit(1)
-
-    results = [process_stem(s, project_root, args.force) for s in stems]
-
-    ok    = sum(results)
-    total = len(results)
-    print(f"\n{'✅' if all(results) else '⚠️ '} {ok}/{total} documenti processati")
-    sys.exit(0 if all(results) else 1)
diff --git a/chunks/fix_chunks.py b/chunks/fix_chunks.py
deleted file mode 100644
index e817e51..0000000
--- a/chunks/fix_chunks.py
+++ /dev/null
@@ -1,283 +0,0 @@
-#!/usr/bin/env python3
-"""
-Fix chunk
-
-Applica correzioni dirette su chunks/<stem>/chunks.json basandosi sul
-report.json prodotto da verify_chunks.py. Non tocca clean.md.
-
-Fixes applicati:
-  empty      → rimuove il chunk
-  incomplete → fonde con il chunk successivo (la frase continua)
-  no_prefix  → aggiunge prefisso [sezione > titolo] se mancante
-  too_short  → fonde con il chunk adiacente nello stesso sezione
-  too_long   → spezza all'ultimo confine di paragrafo/frase entro MAX_CHARS
-
-Input:  chunks/<stem>/chunks.json  +  chunks/<stem>/report.json
-Output: chunks/<stem>/chunks.json  (sovrascrive)
-
-Uso:
-    python chunks/fix_chunks.py --stem documento
-    python chunks/fix_chunks.py --stem documento --dry-run
-"""
-
-import argparse
-import json
-import re
-import sys
-from pathlib import Path
-
-MAX_CHARS = 800
-PUNCT_END = re.compile(r"[.!?»)\]'\u2019\"\u201c\u201d\u2018\u2014\u2013-]$")
-
-
-# ─── Helpers ──────────────────────────────────────────────────────────────────
-
-def _prefix(chunk: dict) -> str:
-    sezione = chunk.get("sezione", "")
-    titolo  = chunk.get("titolo", "")
-    if titolo:
-        return f"[{sezione} > {titolo}]"
-    return f"[{sezione}]"
-
-
-def _strip_prefix(text: str) -> str:
-    text = text.lstrip()
-    if text.startswith("["):
-        end = text.find("]")
-        if end != -1:
-            return text[end + 1:].lstrip("\n")
-    return text
-
-
-def _rebuild_text(chunk: dict, body: str) -> str:
-    return f"{_prefix(chunk)}\n{body}"
-
-
-def _split_at_boundary(text: str, max_chars: int) -> list[str]:
-    if len(text) <= max_chars:
-        return [text]
-
-    parts = []
-    remaining = text
-
-    while len(remaining) > max_chars:
-        candidate = remaining[:max_chars]
-        split_pos = candidate.rfind("\n\n")
-
-        if split_pos == -1:
-            m = None
-            for m in re.finditer(r"[.!?»]\s+", candidate):
-                pass
-            split_pos = m.end() if m else None
-
-        if split_pos is None or split_pos == 0:
-            sp = remaining.find(" ", max_chars)
-            split_pos = sp if sp != -1 else len(remaining)
-
-        parts.append(remaining[:split_pos].rstrip())
-        remaining = remaining[split_pos:].lstrip()
-
-    if remaining:
-        parts.append(remaining)
-
-    return [p for p in parts if p.strip()]
-
-
-# ─── Operazioni sui chunk ─────────────────────────────────────────────────────
-
-def fix_empty(chunks: list[dict], empty_ids: set[str]) -> tuple[list[dict], int]:
-    before = len(chunks)
-    chunks = [c for c in chunks if c["chunk_id"] not in empty_ids]
-    return chunks, before - len(chunks)
-
-
-def fix_no_prefix(chunks: list[dict], no_prefix_ids: set[str]) -> tuple[list[dict], int]:
-    count = 0
-    for c in chunks:
-        if c["chunk_id"] in no_prefix_ids:
-            body = _strip_prefix(c["text"])
-            c["text"] = _rebuild_text(c, body)
-            c["n_chars"] = len(c["text"])
-            count += 1
-    return chunks, count
-
-
-def fix_incomplete_and_short(chunks: list[dict],
-                              problem_ids: set[str]) -> tuple[list[dict], int]:
-    merged = 0
-    i = 0
-    result: list[dict] = []
-
-    while i < len(chunks):
-        c = chunks[i]
-        if c["chunk_id"] in problem_ids and i + 1 < len(chunks):
-            nxt = chunks[i + 1]
-            body_c   = _strip_prefix(c["text"])
-            body_nxt = _strip_prefix(nxt["text"])
-            merged_body = body_c.rstrip() + "\n" + body_nxt.lstrip()
-            nxt["text"]    = _rebuild_text(nxt, merged_body)
-            nxt["n_chars"] = len(nxt["text"])
-            merged += 1
-            i += 1
-            continue
-        result.append(c)
-        i += 1
-
-    return result, merged
-
-
-def fix_too_long(chunks: list[dict],
-                 too_long_ids: set[str],
-                 max_chars: int) -> tuple[list[dict], int]:
-    result: list[dict] = []
-    split_count = 0
-
-    for c in chunks:
-        if c["chunk_id"] not in too_long_ids:
-            result.append(c)
-            continue
-
-        body  = _strip_prefix(c["text"])
-        parts = _split_at_boundary(body, max_chars)
-
-        if len(parts) == 1:
-            result.append(c)
-            continue
-
-        base_id  = re.sub(r"__s\d+$", "", c["chunk_id"])
-        base_sub = c.get("sub_index", 0)
-
-        for j, part in enumerate(parts):
-            new_chunk = dict(c)
-            new_chunk["sub_index"] = base_sub + j
-            new_chunk["chunk_id"]  = f"{base_id}__s{base_sub + j}"
-            new_chunk["text"]      = _rebuild_text(new_chunk, part)
-            new_chunk["n_chars"]   = len(new_chunk["text"])
-            result.append(new_chunk)
-
-        split_count += 1
-
-    return result, split_count
-
-
-def renumber_ids(chunks: list[dict]) -> list[dict]:
-    seen: dict[str, int] = {}
-    for c in chunks:
-        base = re.sub(r"__s\d+$", "", c["chunk_id"])
-        idx  = seen.get(base, 0)
-        c["chunk_id"]  = f"{base}__s{idx}"
-        c["sub_index"] = idx
-        seen[base] = idx + 1
-    return chunks
-
-
-# ─── Core ─────────────────────────────────────────────────────────────────────
-
-def fix_stem(stem: str, project_root: Path, max_chars: int, dry_run: bool) -> bool:
-    stem_dir    = project_root / "chunks" / stem
-    chunks_path = stem_dir / "chunks.json"
-    report_path = stem_dir / "report.json"
-
-    if not chunks_path.exists():
-        print(f"✗ chunks/{stem}/chunks.json non trovato.")
-        print(f"  Esegui prima: python chunks/chunker.py --stem {stem}")
-        return False
-
-    if not report_path.exists():
-        print(f"✗ chunks/{stem}/report.json non trovato.")
-        print(f"  Esegui prima: python chunks/verify_chunks.py --stem {stem}")
-        return False
-
-    chunks: list[dict] = json.loads(chunks_path.read_text(encoding="utf-8"))
-    report: dict       = json.loads(report_path.read_text(encoding="utf-8"))
-
-    verdict = report.get("verdict", "ok")
-    print(f"\nDocumento: {stem}  (verdict: {verdict})")
-
-    if verdict == "ok":
-        print("  ✅ Nessun problema — nulla da correggere.")
-        return True
-
-    empty_ids      = {e["chunk_id"] for e in report.get("blockers", {}).get("empty", [])}
-    no_prefix_ids  = {e["chunk_id"] for e in report.get("blockers", {}).get("no_prefix", [])}
-    incomplete_ids = {e["chunk_id"] for e in report.get("blockers", {}).get("incomplete", [])}
-    too_short_ids  = {e["chunk_id"] for e in report.get("warnings", {}).get("too_short", [])}
-    too_long_ids   = {e["chunk_id"] for e in report.get("warnings", {}).get("too_long", [])}
-
-    ops: list[str] = []
-    if empty_ids:
-        ops.append(f"  🗑  rimuovi {len(empty_ids)} chunk vuoti")
-    if no_prefix_ids:
-        ops.append(f"  🔧 aggiungi prefisso a {len(no_prefix_ids)} chunk")
-    if incomplete_ids:
-        ops.append(f"  🔗 fondi {len(incomplete_ids)} chunk incompleti col successivo")
-    if too_short_ids:
-        ops.append(f"  🔗 fondi {len(too_short_ids)} chunk troppo corti col successivo")
-    if too_long_ids:
-        ops.append(f"  ✂️  spezza {len(too_long_ids)} chunk troppo lunghi")
-
-    if not ops:
-        print("  ✅ Nessuna correzione necessaria.")
-        return True
-
-    print("\n  Operazioni pianificate:")
-    for op in ops:
-        print(op)
-
-    if dry_run:
-        print("\n  [dry-run] Nessuna modifica applicata.")
-        return True
-
-    n_before = len(chunks)
-
-    if empty_ids:
-        chunks, n = fix_empty(chunks, empty_ids)
-        print(f"\n  🗑  Rimossi {n} chunk vuoti.")
-
-    if no_prefix_ids:
-        chunks, n = fix_no_prefix(chunks, no_prefix_ids)
-        print(f"  🔧 Aggiunto prefisso a {n} chunk.")
-
-    merge_ids = incomplete_ids | too_short_ids
-    if merge_ids:
-        chunks, n = fix_incomplete_and_short(chunks, merge_ids)
-        print(f"  🔗 Fusi {n} chunk (incompleti + corti).")
-
-    if too_long_ids:
-        chunks, n = fix_too_long(chunks, too_long_ids, max_chars)
-        print(f"  ✂️  Spezzati {n} chunk lunghi.")
-
-    chunks = renumber_ids(chunks)
-
-    n_after = len(chunks)
-    print(f"\n  Totale chunk: {n_before} → {n_after}")
-
-    chunks_path.write_text(
-        json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8"
-    )
-    print(f"  ✅ Salvato: chunks/{stem}/chunks.json")
-    print(f"\n  Riesegui la verifica:")
-    print(f"     python chunks/verify_chunks.py --stem {stem}")
-
-    return True
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-if __name__ == "__main__":
-    project_root = Path(__file__).parent.parent
-
-    parser = argparse.ArgumentParser(description="Fix chunk")
-    parser.add_argument("--stem", required=True, help="Nome del documento (sottocartella di chunks/)")
-    parser.add_argument(
-        "--max", type=int, default=MAX_CHARS,
-        help=f"Soglia massima caratteri per lo split (default: {MAX_CHARS})"
-    )
-    parser.add_argument(
-        "--dry-run", action="store_true",
-        help="Mostra le operazioni pianificate senza applicarle"
-    )
-    args = parser.parse_args()
-
-    ok = fix_stem(args.stem, project_root, args.max, args.dry_run)
-    sys.exit(0 if ok else 1)
diff --git a/chunks/verify_chunks.py b/chunks/verify_chunks.py
deleted file mode 100644
index d682748..0000000
--- a/chunks/verify_chunks.py
+++ /dev/null
@@ -1,334 +0,0 @@
-#!/usr/bin/env python3
-"""
-Chunk verification
-
-Analyzes chunks/<stem>/chunks.json and reports every anomaly that could
-degrade retrieval quality. It does not modify chunks.json.
-
-Input:  chunks/<stem>/chunks.json
-Output: on-screen report + chunks/<stem>/report.json + exit code (0 = OK, 1 = problems)
-
-Usage:
-    python chunks/verify_chunks.py --stem documento
-    python chunks/verify_chunks.py                    # all documents in chunks/
-    python chunks/verify_chunks.py --min 200 --max 800
-"""
-
-import argparse
-import json
-import re
-import sys
-from pathlib import Path
-
-
-# ─── Soglie ───────────────────────────────────────────────────────────────────
-
-MIN_CHARS = 200
-MAX_CHARS = 800
-PUNCT_END = re.compile(
-    r"[.!?»)\]'\u2019\"\u201c\u201d\u2018\u2014\u2013\u2026]$"
-    r"|/$"    # URL ending with /
-    r"|\|$"   # Markdown table row
-    r"|:$"    # introduces a list or a formula
-)
-_HEX_END     = re.compile(r"[0-9a-fA-F]{8,}$")
-_URL_TAIL    = re.compile(r"https?://\S+(\s+\S+){0,3}$")  # URL followed by up to 3 extra tokens
-_MATH_SYMS   = re.compile(r"[∈∑≤≥≠∀∃∫√∞∂±×÷→←↔⊂⊃⊆⊇∩∪·°]")
-
-
-# ─── Checks ───────────────────────────────────────────────────────────────────
-
-def has_prefix(chunk: dict) -> bool:
-    return chunk.get("text", "").lstrip().startswith("[")
-
-
-def is_empty(chunk: dict) -> bool:
-    return not chunk.get("text", "").strip()
-
-
-def is_too_short(chunk: dict, min_chars: int) -> bool:
-    return chunk.get("n_chars", 0) < min_chars
-
-
-def is_too_long(chunk: dict, max_chars: int) -> bool:
-    return chunk.get("n_chars", 0) > max_chars * 1.5
-
-
-def ends_incomplete(chunk: dict) -> bool:
-    text = chunk.get("text", "").rstrip()
-    if not text:
-        return False
-    text_check = re.sub(r"[_*]+$", "", text).rstrip()
-    if not text_check:
-        return False
-    if PUNCT_END.search(text_check):
-        return False
-    if _HEX_END.search(text_check):   # SHA hash / hex code
-        return False
-    if _URL_TAIL.search(text_check[-200:]):  # URL (optionally followed by a path after a space)
-        return False
-    return True
-
-
-def is_math_incomplete(chunk: dict) -> bool:
-    """Incomplete but in a mathematical context: downgraded from blocker to warning."""
-    return ends_incomplete(chunk) and len(_MATH_SYMS.findall(chunk.get("text", ""))) >= 3
-
-
-# ─── Report ───────────────────────────────────────────────────────────────────
-
-def _fmt_chunk(c: dict) -> str:
-    cid     = c.get("chunk_id", "?")
-    n       = c.get("n_chars", 0)
-    preview = c.get("text", "")[:60].replace("\n", " ")
-    return f"  [{cid}] ({n} char) «{preview}»"
-
-
-def verify_stem(stem: str, project_root: Path, min_chars: int, max_chars: int) -> bool:
-    chunks_path = project_root / "chunks" / stem / "chunks.json"
-
-    print(f"\nDocumento: {stem}")
-
-    if not chunks_path.exists():
-        print(f"  ✗ chunks/{stem}/chunks.json non trovato")
-        print(f"    Esegui prima: python chunks/chunker.py --stem {stem}")
-        return False
-
-    chunks: list[dict] = json.loads(chunks_path.read_text(encoding="utf-8"))
-
-    if not chunks:
-        print(f"  ✗ chunks.json è vuoto")
-        return False
-
-    # ── Raccogli problemi ──────────────────────────────────────────────────────
-
-    empty_chunks      = [c for c in chunks if is_empty(c)]
-    no_prefix         = [c for c in chunks if not is_empty(c) and not has_prefix(c)]
-    too_short         = [c for c in chunks if is_too_short(c, min_chars)]
-    too_long          = [c for c in chunks if is_too_long(c, max_chars)]
-    _incomplete_all   = [c for c in chunks if not is_empty(c) and ends_incomplete(c)]
-    incomplete_math   = [c for c in _incomplete_all if is_math_incomplete(c)]
-    incomplete        = [c for c in _incomplete_all if not is_math_incomplete(c)]
-
-    # ── Statistiche ───────────────────────────────────────────────────────────
-
-    lengths = [c.get("n_chars", 0) for c in chunks]
-    n_total = len(chunks)
-    n_ok    = n_total - len(set(
-        c["chunk_id"]
-        for lst in [empty_chunks, no_prefix, too_short, too_long, incomplete]
-        for c in lst
-    ))
-    min_l = min(lengths)
-    max_l = max(lengths)
-    avg_l = int(sum(lengths) / n_total)
-
-    n_under  = sum(1 for l in lengths if l < min_chars)
-    n_normal = sum(1 for l in lengths if min_chars <= l <= max_chars)
-    n_over   = sum(1 for l in lengths if l > max_chars)
-
-    # ── Output ────────────────────────────────────────────────────────────────
-
-    print(f"  Totale chunk:  {n_total}")
-    print(f"  ✅ OK:         {n_ok}")
-    print()
-    print(f"  Distribuzione lunghezze:")
-    print(f"    Min:   {min_l} char")
-    print(f"    Max:   {max_l} char")
-    print(f"    Media: {avg_l} char")
-    print(f"    < {min_chars} char (sotto MIN): {n_under}")
-    print(f"    {min_chars}–{max_chars} char (ideale):  {n_normal}")
-    print(f"    > {max_chars} char (sopra MAX): {n_over}")
-
-    has_errors = False
-
-    if empty_chunks:
-        has_errors = True
-        print(f"\n  🔴 {len(empty_chunks)} chunk VUOTI:")
-        for c in empty_chunks[:5]:
-            print(f"  [{c.get('chunk_id', '?')}]")
-        if len(empty_chunks) > 5:
-            print(f"  ... e altri {len(empty_chunks) - 5}")
-
-    if no_prefix:
-        has_errors = True
-        print(f"\n  🔴 {len(no_prefix)} chunk SENZA PREFISSO DI CONTESTO:")
-        for c in no_prefix[:5]:
-            print(_fmt_chunk(c))
-        if len(no_prefix) > 5:
-            print(f"  ... e altri {len(no_prefix) - 5}")
-        print(f"  → Causa probabile: header ### mancanti o malformati nel MD")
-
-    if too_short:
-        has_errors = True
-        print(f"\n  🟡 {len(too_short)} chunk SOTTO MIN_CHARS ({min_chars}):")
-        for c in too_short[:5]:
-            print(_fmt_chunk(c))
-        if len(too_short) > 5:
-            print(f"  ... e altri {len(too_short) - 5}")
-        print(f"  → Soluzione: abbassa MIN_CHARS o revisiona il MD")
-
-    if too_long:
-        has_errors = True
-        print(f"\n  🟡 {len(too_long)} chunk SOPRA MAX_CHARS×1.5 ({int(max_chars * 1.5)}):")
-        for c in too_long[:5]:
-            print(_fmt_chunk(c))
-        if len(too_long) > 5:
-            print(f"  ... e altri {len(too_long) - 5}")
-        print(f"  → Soluzione: alza MAX_CHARS o verifica il testo nel MD")
-
-    if incomplete:
-        has_errors = True
-        print(f"\n  🔴 {len(incomplete)} chunk CHE FINISCONO SENZA PUNTEGGIATURA (frase spezzata):")
-        for c in incomplete[:5]:
-            last_line = c.get("text", "").rstrip().split("\n")[-1][-80:]
-            print(f"  [{c.get('chunk_id', '?')}] ...{last_line!r}")
-        if len(incomplete) > 5:
-            print(f"  ... e altri {len(incomplete) - 5}")
-        print(f"  → Soluzione: correggi le righe spezzate in conversione/{stem}/clean.md")
-
-    if incomplete_math:
-        has_errors = True
-        print(f"\n  🟡 {len(incomplete_math)} chunk MATEMATICI SENZA PUNTEGGIATURA (formula/espressione):")
-        for c in incomplete_math[:3]:
-            last_line = c.get("text", "").rstrip().split("\n")[-1][-80:]
-            print(f"  [{c.get('chunk_id', '?')}] ...{last_line!r}")
-        if len(incomplete_math) > 3:
-            print(f"  ... e altri {len(incomplete_math) - 3}")
-        print(f"  → Le formule non finiscono con punteggiatura — avviso non bloccante")
-
-    # ── Costruisci e salva report.json ────────────────────────────────────────
-
-    blockers = empty_chunks + no_prefix + incomplete
-    warnings = too_short + too_long + incomplete_math
-
-    def _chunk_entry(c: dict) -> dict:
-        return {
-            "chunk_id":  c.get("chunk_id", ""),
-            "sezione":   c.get("sezione", ""),
-            "titolo":    c.get("titolo", ""),
-            "n_chars":   c.get("n_chars", 0),
-            "last_text": c.get("text", "").rstrip().split("\n")[-1][-120:],
-        }
-
-    verdict = "ok" if not blockers else "blocked"
-    if not blockers and warnings:
-        verdict = "warnings_only"
-
-    report = {
-        "stem":    stem,
-        "verdict": verdict,
-        "stats": {
-            "total":     n_total,
-            "ok":        n_ok,
-            "min_chars": min_l,
-            "max_chars": max_l,
-            "avg_chars": avg_l,
-        },
-        "thresholds": {"min_chars": min_chars, "max_chars": max_chars},
-        "blockers": {
-            "empty":      [_chunk_entry(c) for c in empty_chunks],
-            "no_prefix":  [_chunk_entry(c) for c in no_prefix],
-            "incomplete": [_chunk_entry(c) for c in incomplete],
-        },
-        "warnings": {
-            "too_short":       [_chunk_entry(c) for c in too_short],
-            "too_long":        [_chunk_entry(c) for c in too_long],
-            "incomplete_math": [_chunk_entry(c) for c in incomplete_math],
-        },
-    }
-
-    out_dir = project_root / "chunks" / stem
-    out_dir.mkdir(parents=True, exist_ok=True)
-    (out_dir / "report.json").write_text(
-        json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8"
-    )
-    print(f"\n  report.json salvato in chunks/{stem}/")
-
-    # ── Prossimi passi ────────────────────────────────────────────────────────
-
-    print(f"\n  {'─' * 50}")
-    print(f"  PROSSIMI PASSI")
-    print(f"  {'─' * 50}")
-
-    if not blockers and not warnings:
-        print(f"  ✅ Tutto OK — procedi alla vettorizzazione:")
-        print(f"       python step-8/ingest.py --stem {stem}")
-
-    elif not blockers:
-        print(f"  🟡 Solo avvisi minori — puoi procedere alla vettorizzazione:")
-        print(f"       python step-8/ingest.py --stem {stem}")
-        print()
-        print(f"  Oppure, per ottimizzare prima:")
-        if too_short:
-            pct = int(len(too_short) / n_total * 100)
-            print(f"    • {len(too_short)} chunk corti ({pct}% del totale)")
-        if too_long:
-            pct = int(len(too_long) / n_total * 100)
-            print(f"    • {len(too_long)} chunk lunghi ({pct}% del totale)")
-        if too_short or too_long:
-            print(f"      → Esegui: python chunks/fix_chunks.py --stem {stem} --dry-run")
-            print(f"        poi:     python chunks/fix_chunks.py --stem {stem}")
-            print(f"        poi:     python chunks/verify_chunks.py --stem {stem}")
-
-    else:
-        print(f"  🔴 Problemi bloccanti — correggi prima di procedere:")
-        print()
-        if empty_chunks:
-            print(f"    • {len(empty_chunks)} chunk vuoti")
-            print(f"      → Controlla conversione/{stem}/clean.md per sezioni prive di testo")
-        if no_prefix:
-            print(f"    • {len(no_prefix)} chunk senza prefisso di contesto")
-            print(f"      → Controlla che gli header ### siano corretti in conversione/{stem}/clean.md")
-        if incomplete:
-            print(f"    • {len(incomplete)} chunk con frase spezzata")
-            print(f"      → Esegui: python chunks/fix_chunks.py --stem {stem}")
-        print()
-        print(f"  Dopo le correzioni, riesegui nell'ordine:")
-        print(f"       python chunks/chunker.py --stem {stem} --force")
-        print(f"       python chunks/verify_chunks.py --stem {stem}")
-        print()
-        if warnings:
-            print(f"  🟡 Hai anche {len(warnings)} avvisi minori — affrontali dopo aver risolto i 🔴.")
-
-    return not blockers
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-if __name__ == "__main__":
-    project_root = Path(__file__).parent.parent
-
-    parser = argparse.ArgumentParser(description="Verifica chunk")
-    parser.add_argument("--stem", help="Nome del documento (sottocartella di chunks/)")
-    parser.add_argument(
-        "--min", type=int, default=MIN_CHARS,
-        help=f"Soglia minima caratteri (default: {MIN_CHARS})"
-    )
-    parser.add_argument(
-        "--max", type=int, default=MAX_CHARS,
-        help=f"Soglia massima caratteri (default: {MAX_CHARS})"
-    )
-    args = parser.parse_args()
-
-    if args.stem:
-        stems = [args.stem]
-    else:
-        chunks_dir = project_root / "chunks"
-        if not chunks_dir.exists():
-            print(f"Errore: cartella chunks/ non trovata in {project_root}")
-            sys.exit(1)
-        stems = sorted(
-            p.name for p in chunks_dir.iterdir()
-            if p.is_dir() and (p / "chunks.json").exists()
-        )
-        if not stems:
-            print("Errore: nessun chunks.json trovato in chunks/")
-            sys.exit(1)
-
-    results = [verify_stem(s, project_root, args.min, args.max) for s in stems]
-
-    ok    = sum(results)
-    total = len(results)
-    print(f"\n{'✅' if all(results) else '⚠️ '} {ok}/{total} documenti senza problemi")
-    sys.exit(0 if all(results) else 1)
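Note on the removed `verify_chunks.py`: its blocker detection rests on the end-of-chunk heuristic in `ends_incomplete`. The following is a minimal standalone sketch of that idea, not the removed code itself. It takes a plain string instead of a chunk dict, and the character class is a shortened assumption of the original `PUNCT_END`.

```python
import re

# Assumed, shortened terminator set: sentence punctuation, closing
# brackets/quotes, a trailing slash (URL), a pipe (Markdown table row)
# or a colon (list or formula introduction).
PUNCT_END = re.compile(r"[.!?»)\]'\"\u2019\u201d\u2026/|:]$")
HEX_END = re.compile(r"[0-9a-fA-F]{8,}$")              # SHA-like hash
URL_TAIL = re.compile(r"https?://\S+(\s+\S+){0,3}$")   # URL plus up to 3 tokens


def ends_incomplete(text: str) -> bool:
    """Return True when the text appears to stop mid-sentence."""
    text = text.rstrip()
    # Ignore trailing Markdown emphasis markers such as ** or _.
    text = re.sub(r"[_*]+$", "", text).rstrip()
    if not text:
        return False
    if PUNCT_END.search(text) or HEX_END.search(text):
        return False
    # Only the last 200 characters are scanned for a trailing URL.
    return URL_TAIL.search(text[-200:]) is None
```

A chunk that ends in a hash or a URL is treated as complete, which avoids flagging reference lists as broken sentences.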
diff --git a/config.py b/config.py
deleted file mode 100644
index efd9d55..0000000
--- a/config.py
+++ /dev/null
@@ -1,54 +0,0 @@
-# ─── RAG configuration ────────────────────────────────────────────────────────
-#
-# Edit this file to change the pipeline parameters.
-#
-# Usage:
-#   python rag.py --stem nietzsche
-# ──────────────────────────────────────────────────────────────────────────────
-
-# ── Retrieval ─────────────────────────────────────────────────────────────────
-
-# Number of chunks to retrieve for each question.
-# Higher values = more context and potentially more complete answers,
-# but longer prompts and slower generation.
-TOP_K = 6
-
-# ── Generation ────────────────────────────────────────────────────────────────
-
-# Temperature of the LLM.
-# 0.0 = fully deterministic (same answer on every run)
-# 0.7 = more creative and varied
-TEMPERATURE = 0.0
-
-# Disables "thinking" (internal reasoning) in Qwen3/Qwen3.5 models.
-# True  = direct answer, faster
-# False = internal reasoning enabled (slower but potentially more accurate)
-NO_THINK = True
-
-# ── Embedding ─────────────────────────────────────────────────────────────────
-
-# Embedding model served by Ollama.
-# Must match the model used during vectorization (ingest.py).
-# If you change this, you must rerun ingest.py with --force.
-EMBED_MODEL = "nomic-embed-text"
-
-# ── Ollama ────────────────────────────────────────────────────────────────────
-
-# URL of the Ollama server (default: local, port 11434).
-OLLAMA_URL = "http://localhost:11434"
-
-# LLM model. Choose according to the available RAM (see README).
-OLLAMA_MODEL = "qwen3.5:0.8b"
-
-# ── System prompt ─────────────────────────────────────────────────────────────
-
-# Behavioral instructions sent to the LLM before the context and the question.
-# Edit to change the tone, the language, the degree of interpretive freedom
-# or the fallback conditions ("I cannot answer").
-SYSTEM_PROMPT = (
-    "Sei un assistente che risponde usando il contesto fornito. "
-    "Sintetizza e interpreta liberamente i passaggi del contesto per rispondere alla domanda. "
-    "Se il contesto contiene informazioni pertinenti, anche indirette, usale per costruire una risposta. "
-    "Solo se il contesto è completamente irrilevante, rispondi: "
-    "\"Non trovo questa informazione nel documento.\""
-)
diff --git a/ollama/README.md b/ollama/README.md
deleted file mode 100644
index 6faea9d..0000000
--- a/ollama/README.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# Ollama: Environment Check
-
-Before moving on to vectorization (step 8) you need to have installed:
-
-- **Ollama**: local server for LLMs and embeddings
-- an **embedding model** (e.g. `qwen3-embedding:0.6b`, `bge-m3`)
-- an **LLM** (e.g. `qwen3.5:4b`)
-- **chromadb**: Python library for the vector store
-
----
-
-## 1. Install Ollama
-
-```bash
-curl -fsSL https://ollama.com/install.sh | sh
-```
-
-Check that the service is running:
-
-```bash
-ollama list
-```
-
-### Uninstall Ollama
-
-```bash
-# Stop and remove the systemd service
-sudo systemctl stop ollama
-sudo systemctl disable ollama
-sudo rm /etc/systemd/system/ollama.service
-sudo systemctl daemon-reload
-
-# Remove the binary
-sudo rm /usr/local/bin/ollama
-
-# Remove models and data (optional)
-sudo rm -rf /usr/share/ollama
-
-# Remove the system user and group (optional)
-sudo userdel ollama
-sudo groupdel ollama
-```
-
----
-
-## 2. Download the models
-
-### Embedding model (recommended)
-
-```bash
-ollama pull qwen3-embedding:0.6b
-```
-
-Supported alternatives:
-
-- `nomic-embed-text-v2-moe`
-- `bge-m3`
-- `nomic-embed-text`
-
-If you switch to an embedding model other than the one used in step-8, rerun ingest with `--force` and update `EMBED_MODEL` in `config.py`.
-
-### LLM (recommended for 8 GB RAM)
-
-```bash
-ollama pull qwen3.5:4b
-```
-
-If you use a different model, update `OLLAMA_MODEL` in `config.py`.
-
-### Remove a model
-
-```bash
-ollama rm qwen3.5:4b
-ollama rm qwen3-embedding:0.6b
-```
-
----
-
-## 3. Install the Python dependencies
-
-```bash
-source .venv/bin/activate
-pip install -r requirements.txt
-```
-
----
-
-## 4. Check the environment
-
-```bash
-source .venv/bin/activate
-python ollama/check_env.py
-```
-
-Expected output (example):
-
-```text
-✅ ollama trovato nel PATH
-✅ ollama risponde correttamente
-✅ embedding disponibile: qwen3-embedding:0.6b
-✅ LLM disponibile: qwen3.5:4b
-✅ chromadb importabile
-✅ Ambiente pronto — procedi con la vettorizzazione:
-   python step-8/ingest.py --stem <nome>
-```
-
----
-
-## Next step
-
-```bash
-python step-8/ingest.py --stem <nome>
-```
diff --git a/ollama/check_env.py b/ollama/check_env.py
deleted file mode 100644
index 359f0e9..0000000
--- a/ollama/check_env.py
+++ /dev/null
@@ -1,250 +0,0 @@
-#!/usr/bin/env python3
-"""
-Ollama environment check
-
-Checks that all prerequisites for vectorization are met:
-  1. ollama is on the PATH and responds
-  2. At least one embedding model has been pulled
-  3. At least one LLM has been pulled
-  4. chromadb is importable
-
-Output: on-screen report with ✅ / ❌ for each component.
-No files are written. Exit 0 if everything is OK, 1 otherwise.
-
-Usage:
-    python ollama/check_env.py
-"""
-
-import shutil
-import subprocess
-import sys
-from pathlib import Path
-
-
-# ─── Canonical list of supported embedding models ─────────────────────────────
-# Order: first choice → last choice (as in ollama/README.md)
-EMBED_MODELS = [
-    "qwen3-embedding",
-    "nomic-embed-text-v2-moe",
-    "bge-m3",
-    "nomic-embed-text",
-    "mxbai-embed-large",
-    "paraphrase-multilingual",
-    "all-minilm",
-]
-EMBED_MODEL_PREFIXES = tuple(EMBED_MODELS)
-
-OLLAMA_SERVE_HINT = "   → Avvia il servizio con: ollama serve"
-RECOMMENDED_EMBED_MODEL = "qwen3-embedding:0.6b"
-RECOMMENDED_LLM_MODEL = "qwen3.5:4b"
-
-
-def _is_embed(model_name: str) -> bool:
-    """True if the model is recognized as an embedding model (canonical list or keyword)."""
-    base = model_name.split(":")[0].lower()
-    return base.startswith(EMBED_MODEL_PREFIXES) or "embed" in base
-
-
-def _parse_ollama_models(raw_output: str) -> list[str]:
-    """Extract the model names from the output of `ollama list`."""
-    models: list[str] = []
-    for idx, line in enumerate(raw_output.splitlines()):
-        line = line.strip()
-        if not line:
-            continue
-        # First line: table header ("NAME ...")
-        if idx == 0 and line.lower().startswith("name"):
-            continue
-        model_name = line.split(maxsplit=1)[0]
-        models.append(model_name)
-    return models
-
-
-sys.path.insert(0, str(Path(__file__).parent.parent))
-try:
-    from config import EMBED_MODEL as CONFIGURED_EMBED, OLLAMA_MODEL as CONFIGURED_LLM
-except Exception:
-    CONFIGURED_EMBED = None
-    CONFIGURED_LLM = None
-
-REQUIRED_LIBS = ["chromadb"]
-
-
-# ─── Checks ───────────────────────────────────────────────────────────────────
-
-def _print_model_list(title: str, models: list[str]) -> None:
-    """Print a list of models in a uniform format."""
-    if not models:
-        print(f"   {title}: nessuno")
-        return
-    print(f"   {title} ({len(models)}):")
-    for model in models:
-        print(f"   - {model}")
-
-def check_ollama_in_path() -> bool:
-    """Check that ollama is on the PATH."""
-    found = shutil.which("ollama") is not None
-    if found:
-        print("✅ ollama trovato nel PATH")
-    else:
-        print("❌ ollama non trovato nel PATH")
-        print("   → Installa con: curl -fsSL https://ollama.com/install.sh | sh")
-    return found
-
-
-def check_ollama_running() -> list[str] | None:
-    """
-    Run 'ollama list' and return the list of available models.
-    Return None if ollama does not respond.
-    """
-    try:
-        result = subprocess.run(
-            ["ollama", "list"],
-            capture_output=True, text=True, timeout=10
-        )
-        if result.returncode != 0:
-            print("❌ ollama non risponde (errore all'avvio)")
-            print(OLLAMA_SERVE_HINT)
-            return None
-        models = _parse_ollama_models(result.stdout)
-        print("✅ ollama risponde correttamente")
-        return models
-    except FileNotFoundError:
-        print("❌ ollama non trovato (FileNotFoundError)")
-        return None
-    except subprocess.TimeoutExpired:
-        print("❌ ollama non risponde (timeout)")
-        print(OLLAMA_SERVE_HINT)
-        return None
-
-
-def _match(model_name: str, available: list[str]) -> str | None:
-    """
-    Return the full name of the first model in 'available' that matches
-    'model_name' (prefix comparison), or None.
-    """
-    for m in available:
-        if m == model_name or m.startswith(model_name + ":") or m.startswith(model_name + "-"):
-            return m
-    return None
-
-
-def _check_configured_model(
-    configured_name: str | None,
-    available: list[str],
-    label: str,
-) -> bool | None:
-    """
-    If a model is configured, check it and return True/False.
-    If none is configured, return None (the caller will use the fallback).
-    """
-    if not configured_name:
-        return None
-
-    print(f"   modello configurato (config.py): {configured_name}")
-    found = _match(configured_name, available)
-    if found:
-        print(f"✅ {label} disponibile: {found}")
-        return True
-
-    print(f"❌ {configured_name} non trovato in Ollama")
-    print(f"   → ollama pull {configured_name}")
-    return False
-
-
-def check_embed_model(available: list[str]) -> bool:
-    """Check that the configured embedding model is available."""
-    configured_check = _check_configured_model(CONFIGURED_EMBED, available, "embedding")
-    if configured_check is not None:
-        return configured_check
-
-    # fallback: config.py could not be read
-    found = next((m for m in available if _is_embed(m)), None)
-    if found:
-        print(f"✅ modello embedding trovato: {found}")
-        return True
-    print("❌ nessun modello di embedding trovato")
-    print(f"   → Prima scelta: ollama pull {RECOMMENDED_EMBED_MODEL}")
-    return False
-
-
-def check_llm_model(available: list[str]) -> bool:
-    """Check that the configured LLM is available."""
-    configured_check = _check_configured_model(CONFIGURED_LLM, available, "LLM")
-    if configured_check is not None:
-        return configured_check
-
-    # fallback: config.py could not be read
-    first_llm = next((m for m in available if not _is_embed(m)), None)
-    if first_llm:
-        print(f"✅ modello LLM trovato: {first_llm}")
-        return True
-    print("❌ nessun modello LLM trovato")
-    print(f"   → Consigliato per 8 GB RAM: ollama pull {RECOMMENDED_LLM_MODEL}")
-    return False
-
-
-def check_library(lib: str) -> bool:
-    """Check that a Python library is importable."""
-    try:
-        __import__(lib)
-        print(f"✅ {lib} importabile")
-        return True
-    except ImportError:
-        print(f"❌ {lib} non importabile")
-        print(f"   → Installa con: pip install {lib}")
-        return False
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-def main() -> int:
-    print("─── Verifica ambiente Ollama ─────────────────────────────────────────\n")
-
-    results: list[bool] = []
-
-    # 1. ollama on the PATH
-    in_path = check_ollama_in_path()
-    results.append(in_path)
-
-    # 2. ollama responds + models
-    if in_path:
-        available = check_ollama_running()
-        if available is None:
-            results.extend([False, False, False])
-        else:
-            results.append(True)
-            _print_model_list(
-                "modelli embedding rilevati",
-                [m for m in available if _is_embed(m)],
-            )
-            _print_model_list(
-                "modelli LLM rilevati",
-                [m for m in available if not _is_embed(m)],
-            )
-            results.append(check_embed_model(available))
-            results.append(check_llm_model(available))
-    else:
-        results.extend([False, False, False])
-        print("⚠️  modelli non verificabili (ollama non trovato)")
-
-    # 3. Python libraries
-    print()
-    for lib in REQUIRED_LIBS:
-        results.append(check_library(lib))
-
-    # ── Riepilogo ─────────────────────────────────────────────────────────────
-    print()
-    print("──────────────────────────────────────────────────────────────────────")
-    all_ok = all(results)
-    if all_ok:
-        print("✅ Ambiente pronto")
-    else:
-        n_fail = sum(1 for r in results if not r)
-        print(f"⚠️  {n_fail} problema/i rilevato/i — risolvi prima di procedere.")
-
-    return 0 if all_ok else 1
-
-
-if __name__ == "__main__":
-    sys.exit(main())
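Note on the removed `check_env.py`: the prefix comparison in `_match` is looser than it may look. The standalone sketch below reproduces the same three conditions under a hypothetical name, `match_model`, to show that a configured `nomic-embed-text` would also be satisfied by an installed `nomic-embed-text-v2-moe`. Whether that behavior was intended is not stated in the source.

```python
from typing import Optional


def match_model(name: str, available: list[str]) -> Optional[str]:
    """Return the first installed model equal to `name` or extending it with ':' or '-'."""
    for candidate in available:
        if (
            candidate == name
            or candidate.startswith(name + ":")   # same model, explicit tag
            or candidate.startswith(name + "-")   # also accepts longer model names
        ):
            return candidate
    return None


installed = ["nomic-embed-text-v2-moe:latest", "qwen3.5:4b"]
print(match_model("qwen3.5", installed))
print(match_model("nomic-embed-text", installed))
print(match_model("bge-m3", installed))
```

A stricter variant, if one were wanted, could drop the `"-"` branch and compare only the part before the tag separator.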
diff --git a/ollama/test_ollama.py b/ollama/test_ollama.py
deleted file mode 100644
index 3054d59..0000000
--- a/ollama/test_ollama.py
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/usr/bin/env python3
-"""
-Local Ollama chat test, without RAG and without ChromaDB.
-Usage: python ollama/test_ollama.py
-"""
-
-import json
-import sys
-import urllib.error
-import urllib.request
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).parent.parent))
-import config as _cfg
-
-OLLAMA_URL  = _cfg.OLLAMA_URL
-MODEL       = _cfg.OLLAMA_MODEL
-TEMPERATURE = _cfg.TEMPERATURE
-NO_THINK    = _cfg.NO_THINK
-
-
-def chat(prompt: str) -> str:
-    payload = json.dumps({
-        "model": MODEL,
-        "prompt": prompt,
-        "stream": False,
-        "think": not NO_THINK,
-        "options": {"temperature": TEMPERATURE},
-    }).encode()
-    req = urllib.request.Request(
-        f"{OLLAMA_URL}/api/generate",
-        data=payload,
-        headers={"Content-Type": "application/json"},
-        method="POST",
-    )
-    with urllib.request.urlopen(req, timeout=300) as resp:
-        return json.loads(resp.read())["response"].strip()
-
-
-def main() -> int:
-    print(f"─── Chat Ollama ──────────────────────────────── (exit per uscire)")
-    print(f"  Modello   : {MODEL}")
-    print(f"  Thinking  : {'off' if NO_THINK else 'on'}")
-    print()
-
-    while True:
-        try:
-            user = input("Tu: ").strip()
-        except (EOFError, KeyboardInterrupt):
-            print("\nUscita.")
-            break
-        if not user:
-            continue
-        if user.lower() == "exit":
-            break
-        try:
-            reply = chat(user)
-            print(f"\nAssistente: {reply}\n")
-        except (urllib.error.URLError, OSError) as e:
-            print(f"❌ Errore: {e}")
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
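Note on the removed `test_ollama.py`: the request body sent to `/api/generate` can be inspected without a running server. The sketch below builds the same fields the removed `chat` function used. The helper name `build_generate_payload` and its defaults are assumptions for illustration and were not part of the repository.

```python
import json


def build_generate_payload(
    model: str,
    prompt: str,
    temperature: float = 0.0,
    no_think: bool = True,
) -> bytes:
    """Serialize a non-streaming generate request as UTF-8 JSON bytes."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,            # ask for a single JSON response
        "think": not no_think,      # NO_THINK=True disables reasoning
        "options": {"temperature": temperature},
    }
    return json.dumps(body).encode()


payload = build_generate_payload("qwen3.5:0.8b", "Ciao")
print(json.loads(payload))
```

Keeping payload construction separate from the HTTP call makes the `NO_THINK` inversion easy to check in isolation.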
diff --git a/rag.py b/rag.py
deleted file mode 100644
index f8f406e..0000000
--- a/rag.py
+++ /dev/null
@@ -1,252 +0,0 @@
-#!/usr/bin/env python3
-"""
-Interactive RAG pipeline
-
-Takes a question, retrieves the most relevant chunks from ChromaDB (retrieval)
-and generates an answer through Ollama (generation).
-
-Input:  chroma_db/<stem> (ChromaDB collection)
-Output: answer printed to the screen
-
-Usage:
-    python rag.py --stem <name>
-
-In the interactive loop:
-    Domanda: <text>        → answer
-    Domanda: <text> -v     → answer + retrieved chunks
-    Domanda: exit          → exit
-"""
-
-import argparse
-import json
-import sys
-import urllib.error
-import urllib.request
-from pathlib import Path
-
-import chromadb
-
-# ─── Configurazione ───────────────────────────────────────────────────────────
-
-sys.path.insert(0, str(Path(__file__).parent))
-import config as _cfg
-
-project_root = Path(__file__).parent
-CHROMA_DIR   = project_root / "chroma_db"
-
-OLLAMA_URL    = _cfg.OLLAMA_URL
-EMBED_MODEL   = _cfg.EMBED_MODEL
-LLM_MODEL     = _cfg.OLLAMA_MODEL
-TOP_K         = _cfg.TOP_K
-TEMPERATURE   = _cfg.TEMPERATURE
-NO_THINK      = _cfg.NO_THINK
-SYSTEM_PROMPT = _cfg.SYSTEM_PROMPT
-
-
-# ─── Embedding ────────────────────────────────────────────────────────────────
-
-def embed(text: str) -> list[float]:
-    """Generate the embedding vector of the question through Ollama."""
-    payload = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode()
-    req = urllib.request.Request(
-        f"{OLLAMA_URL}/api/embeddings",
-        data=payload,
-        headers={"Content-Type": "application/json"},
-        method="POST",
-    )
-    with urllib.request.urlopen(req, timeout=30) as resp:
-        return json.loads(resp.read())["embedding"]
-
-
-# ─── Generazione ──────────────────────────────────────────────────────────────
-
-def call_ollama(prompt: str, system: str = "") -> str:
-    """Call Ollama /api/generate and return the response."""
-    payload = json.dumps({
-        "model": LLM_MODEL,
-        "system": system,
-        "prompt": prompt,
-        "stream": False,
-        "think": not NO_THINK,
-        "options": {"temperature": TEMPERATURE},
-    }).encode()
-    req = urllib.request.Request(
-        f"{OLLAMA_URL}/api/generate",
-        data=payload,
-        headers={"Content-Type": "application/json"},
-        method="POST",
-    )
-    with urllib.request.urlopen(req, timeout=300) as resp:
-        return json.loads(resp.read())["response"].strip()
-
-
-# ─── Retrieval ────────────────────────────────────────────────────────────────
-
-def retrieve(collection: chromadb.Collection, question: str) -> list[dict]:
-    """
-    Embed the question and retrieve the TOP_K most similar chunks.
-    Returns a list of dicts with keys: text, sezione, titolo, distance.
-    """
-    vector = embed(question)
-    results = collection.query(
-        query_embeddings=[vector],
-        n_results=TOP_K,
-        include=["documents", "metadatas", "distances"],
-    )
-    chunks = []
-    for text, meta, dist in zip(
-        results["documents"][0],
-        results["metadatas"][0],
-        results["distances"][0],
-    ):
-        chunks.append({
-            "text":     text,
-            "sezione":  meta.get("sezione", ""),
-            "titolo":   meta.get("titolo", ""),
-            "distance": dist,
-        })
-    return chunks
-
-
-# ─── Prompt ───────────────────────────────────────────────────────────────────
-
-def build_prompt(question: str, chunks: list[dict]) -> tuple[str, str]:
-    """Return (system, user_prompt) as separate strings for the Ollama API."""
-    context_parts = []
-    for i, c in enumerate(chunks, start=1):
-        header = f"[Contesto {i}"
-        if c["sezione"]:
-            header += f" — {c['sezione']}"
-            if c["titolo"]:
-                header += f" > {c['titolo']}"
-        header += "]"
-        context_parts.append(f"{header}\n{c['text']}")
-
-    context = "\n\n".join(context_parts)
-    user_prompt = f"{context}\n\nDomanda: {question}"
-    return SYSTEM_PROMPT, user_prompt
-
-
-# ─── Loop interattivo ─────────────────────────────────────────────────────────
-
-def answer(question: str, collection: chromadb.Collection, verbose: bool) -> None:
-    try:
-        chunks = retrieve(collection, question)
-    except (urllib.error.URLError, OSError) as e:
-        print(f"❌ Errore embedding: {e}")
-        return
-
-    if verbose:
-        print("\n── Chunk recuperati ──────────────────────────────────────────")
-        for i, c in enumerate(chunks, start=1):
-            loc = c["sezione"]
-            if c["titolo"]:
-                loc += f" > {c['titolo']}"
-            sim = 1 - c["distance"]  # cosine distance → similarity
-            print(f"  [{i}] {loc}  (similarità: {sim:.3f})")
-            print(f"      {c['text'][:120].replace(chr(10), ' ')}...")
-        print("──────────────────────────────────────────────────────────────\n")
-
-    system, prompt = build_prompt(question, chunks)
-
-    try:
-        response = call_ollama(prompt, system=system)
-    except (urllib.error.URLError, OSError) as e:
-        print(f"❌ Errore generazione: {e}")
-        return
-
-    print(f"\n{response}\n")
-
-
-def run_loop(collection: chromadb.Collection) -> None:
-    print("── Loop RAG ─────────────────────────────────────── (exit per uscire)\n")
-    while True:
-        try:
-            raw = input("Domanda: ").strip()
-        except (EOFError, KeyboardInterrupt):
-            print("\nUscita.")
-            break
-
-        if not raw:
-            continue
-        if raw.lower() == "exit":
-            break
-
-        verbose = raw.endswith(" -v")
-        question = raw[:-3].strip() if verbose else raw
-
-        answer(question, collection, verbose)
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-def _build_epilog() -> str:
-    lines = [
-        "Uso:",
-        "  python rag.py --stem <nome>",
-        "",
-        "Loop interattivo:",
-        "  <domanda>       risposta basata sul documento",
-        "  <domanda> -v    risposta + chunk recuperati con score di similarità",
-        "  exit            termina",
-    ]
-    if CHROMA_DIR.exists():
-        try:
-            client = chromadb.PersistentClient(path=str(CHROMA_DIR))
-            names = [c.name for c in client.list_collections()]
-            if names:
-                lines += ["", f"Collection disponibili: {', '.join(names)}"]
-            else:
-                lines += ["", "Nessuna collection trovata — eseguire prima: python step-8/ingest.py"]
-        except Exception:
-            pass
-    return "\n".join(lines)
-
-
-def main() -> int:
-    parser = argparse.ArgumentParser(
-        description=(
-            "Pipeline RAG interattiva\n\n"
-            "Risponde a domande in linguaggio naturale su un documento\n"
-            "indicizzato in ChromaDB da step-8/ingest.py."
-        ),
-        epilog=_build_epilog(),
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-    parser.add_argument(
-        "--stem",
-        required=True,
-        help=(
-            "Nome della collection ChromaDB da interrogare. "
-            "Le collection vengono create da: python step-8/ingest.py --stem <nome>"
-        ),
-    )
-    args = parser.parse_args()
-
-    print("─── Pipeline RAG ────────────────────────────────────────────\n")
-    print(f"  Documento : {args.stem}")
-    print(f"  Modello   : {LLM_MODEL}")
-    print(f"  Top-K     : {TOP_K}")
-    print(f"  Thinking  : {'off' if NO_THINK else 'on'}")
-    print()
-
-    if not CHROMA_DIR.exists():
-        print("❌ chroma_db/ non trovata — esegui prima step-8")
-        return 1
-
-    client = chromadb.PersistentClient(path=str(CHROMA_DIR))
-    collections = [c.name for c in client.list_collections()]
-    if args.stem not in collections:
-        print(f"❌ Collection '{args.stem}' non trovata in chroma_db/")
-        print(f"   → python step-8/ingest.py --stem {args.stem}")
-        return 1
-
-    collection = client.get_collection(args.stem)
-    print(f"✅ Collection '{args.stem}' caricata ({collection.count()} chunk)\n")
-
-    run_loop(collection)
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
diff --git a/requirements.txt b/requirements.txt
index dc5da54..d68674f 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,3 @@
 pdfplumber==0.11.9
 pymupdf4llm
 opendataloader-pdf
-chromadb
diff --git a/retrieve.py b/retrieve.py
deleted file mode 100644
index 03b26a1..0000000
--- a/retrieve.py
+++ /dev/null
@@ -1,217 +0,0 @@
-#!/usr/bin/env python3
-"""
-Pure retrieval (no LLM generation)
-
-Interactive loop: enter a query and get the most similar chunks from the
-ChromaDB collection via semantic embedding. Ollama is called only to embed
-the query, never for generation.
-
-Useful for:
-  - checking retrieval quality before diagnosing wrong answers
-  - confirming that the right chunks are retrieved for a query
-  - using the pipeline as a semantic search engine
-
-Input:  chroma_db/<stem> (ChromaDB collection)
-Output: list of chunks with similarity scores
-
-Usage:
-    python retrieve.py --stem <name>
-
-In the interactive loop:
-    Query: <text>      → most similar chunks with scores
-    Query: <text> -f   → full text of the chunks
-    Query: exit        → quit
-"""
-
-import argparse
-import json
-import sys
-import urllib.error
-import urllib.request
-from pathlib import Path
-
-import chromadb
-
-# ─── Configurazione ───────────────────────────────────────────────────────────
-
-sys.path.insert(0, str(Path(__file__).parent))
-import config as _cfg
-
-project_root = Path(__file__).parent
-CHROMA_DIR   = project_root / "chroma_db"
-
-OLLAMA_URL  = _cfg.OLLAMA_URL
-EMBED_MODEL = _cfg.EMBED_MODEL
-TOP_K       = _cfg.TOP_K
-
-
-# ─── Embedding ────────────────────────────────────────────────────────────────
-
-def embed(text: str) -> list[float]:
-    """Generate the embedding vector for the query via Ollama."""
-    payload = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode()
-    req = urllib.request.Request(
-        f"{OLLAMA_URL}/api/embeddings",
-        data=payload,
-        headers={"Content-Type": "application/json"},
-        method="POST",
-    )
-    with urllib.request.urlopen(req, timeout=30) as resp:
-        return json.loads(resp.read())["embedding"]
-
-
-# ─── Retrieval ────────────────────────────────────────────────────────────────
-
-def retrieve(collection: chromadb.Collection, query: str, top_k: int) -> list[dict]:
-    """
-    Embed the query and retrieve the top_k most similar chunks.
-    Returns a list of dicts with keys: rank, similarity, sezione, titolo, text.
-    """
-    vector = embed(query)
-    results = collection.query(
-        query_embeddings=[vector],
-        n_results=top_k,
-        include=["documents", "metadatas", "distances"],
-    )
-    chunks = []
-    for rank, (text, meta, dist) in enumerate(
-        zip(
-            results["documents"][0],
-            results["metadatas"][0],
-            results["distances"][0],
-        ),
-        start=1,
-    ):
-        chunks.append({
-            "rank":       rank,
-            "similarity": round(1 - dist, 4),
-            "sezione":    meta.get("sezione", ""),
-            "titolo":     meta.get("titolo", ""),
-            "text":       text,
-        })
-    return chunks
-
-
-# ─── Output ───────────────────────────────────────────────────────────────────
-
-def print_results(chunks: list[dict], full: bool = False) -> None:
-    print(f"── {len(chunks)} chunk recuperati ─────────────────────────────────\n")
-    for c in chunks:
-        loc = c["sezione"]
-        if c["titolo"]:
-            loc += f" > {c['titolo']}"
-        print(f"  [{c['rank']}] similarità: {c['similarity']:.4f}  |  {loc}")
-        if full:
-            print()
-            print(c["text"])
-        else:
-            print(f"      {c['text'][:200].replace(chr(10), ' ')}")
-            if len(c["text"]) > 200:
-                print(f"      … ({len(c['text'])} caratteri totali)")
-        print()
-
-
-# ─── Loop interattivo ─────────────────────────────────────────────────────────
-
-def run_loop(collection: chromadb.Collection, top_k: int) -> None:
-    print("── Loop retrieval ──────────────────────── (exit per uscire, -f per testo completo)\n")
-    while True:
-        try:
-            raw = input("Query: ").strip()
-        except (EOFError, KeyboardInterrupt):
-            print("\nUscita.")
-            break
-
-        if not raw:
-            continue
-        if raw.lower() == "exit":
-            break
-
-        full = raw.endswith(" -f")
-        query = raw[:-3].strip() if full else raw
-
-        try:
-            chunks = retrieve(collection, query, top_k)
-        except (urllib.error.URLError, OSError) as e:
-            print(f"❌ Errore embedding (Ollama raggiungibile?): {e}\n")
-            continue
-
-        print()
-        print_results(chunks, full=full)
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-def _build_epilog() -> str:
-    lines = [
-        "Uso:",
-        "  python retrieve.py --stem <nome>",
-        "",
-        "Nel loop interattivo:",
-        "  <query>       chunk più simili con score (testo troncato)",
-        "  <query> -f    testo completo dei chunk",
-        "  exit          termina",
-    ]
-    if CHROMA_DIR.exists():
-        try:
-            client = chromadb.PersistentClient(path=str(CHROMA_DIR))
-            names = [c.name for c in client.list_collections()]
-            if names:
-                lines += ["", f"Collection disponibili: {', '.join(names)}"]
-            else:
-                lines += ["", "Nessuna collection trovata — eseguire prima: python step-8/ingest.py"]
-        except Exception:
-            pass
-    return "\n".join(lines)
-
-
-def main() -> int:
-    parser = argparse.ArgumentParser(
-        description=(
-            "Retrieval puro (senza LLM)\n\n"
-            "Loop interattivo: inserisci una query e ottieni i chunk più simili\n"
-            "tramite embedding semantico, senza generazione LLM."
-        ),
-        epilog=_build_epilog(),
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-    parser.add_argument(
-        "--stem",
-        required=True,
-        help="Nome della collection ChromaDB da interrogare.",
-    )
-    parser.add_argument(
-        "--top-k",
-        type=int,
-        default=TOP_K,
-        metavar="N",
-        help=f"Numero di chunk da restituire per query (default: {TOP_K} da config.py).",
-    )
-    args = parser.parse_args()
-
-    print("─── Retrieval puro ──────────────────────────────────────────\n")
-    print(f"  Documento    : {args.stem}")
-    print(f"  Embed model  : {EMBED_MODEL}")
-    print(f"  Top-K        : {args.top_k}")
-    print()
-
-    if not CHROMA_DIR.exists():
-        print("❌ chroma_db/ non trovata — esegui prima step-8", file=sys.stderr)
-        return 1
-
-    client = chromadb.PersistentClient(path=str(CHROMA_DIR))
-    collections = [c.name for c in client.list_collections()]
-    if args.stem not in collections:
-        print(f"❌ Collection '{args.stem}' non trovata in chroma_db/", file=sys.stderr)
-        print(f"   → python step-8/ingest.py --stem {args.stem}", file=sys.stderr)
-        return 1
-
-    collection = client.get_collection(args.stem)
-    print(f"✅ Collection '{args.stem}' caricata ({collection.count()} chunk)\n")
-
-    run_loop(collection, args.top_k)
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
diff --git a/step-8/README.md b/step-8/README.md
deleted file mode 100644
index afcef49..0000000
--- a/step-8/README.md
+++ /dev/null
@@ -1,114 +0,0 @@
-# Step 8: Vectorization
-
-Reads the chunks produced by step-6, generates their embeddings with Ollama,
-and stores them in ChromaDB (a persistent on-disk vector store).
-
----
-
-## Prerequisites
-
-- Step-6 completed (`step-6/<stem>/chunks.json` exists)
-- Ollama running, with the embedding model already pulled
-- `chromadb` installed (`pip install -r requirements.txt`)
-
----
-
-## Model configuration
-
-The embedding model is read from **`config.py`**:
-
-```python
-# config.py
-EMBED_MODEL = "nomic-embed-text"   # ← change it here
-```
-
-> The model chosen here must match the one used in rag.py.
-> If you change it after vectorizing, you must rerun step-8 with `--force`.
-
----
-
-## Usage
-
-```bash
-# Vectorize a single document
-python step-8/ingest.py --stem <name>
-
-# Vectorize every document found in step-6/
-python step-8/ingest.py
-
-# Overwrite an existing collection
-python step-8/ingest.py --stem <name> --force
-
-# Override the model (without editing config.py)
-python step-8/ingest.py --stem <name> --model bge-m3
-```
-
----
-
-## Output
-
-Vectors are saved under `chroma_db/` as a ChromaDB collection named `<stem>`,
-using cosine distance. The directory is auto-generated and ignored by git.
-
----
-
-## Supported models
-
-The same models recommended in the [ollama README](../ollama/README.md).
-The model must be pulled into Ollama before running this script
-(`ollama pull <model>`).
-
----
-
-## Golden rules for optimal parameters
-
-### Embedding model
-
-**Use a multilingual model for Italian text.**
-English-first models (`nomic-embed-text`, `mxbai-embed-large`, `all-minilm`)
-produce lower-quality vectors on Italian, which makes retrieval less precise.
-First choice: `qwen3-embedding:0.6b`.
-
-**More dimensions = more precise retrieval, but more disk space.**
-
-| Dimensions | Models | When to use |
-|---|---|---|
-| 1024 | `qwen3-embedding:0.6b`, `bge-m3` | technical documents, long texts |
-| 768 | `nomic-embed-text-v2-moe` | good trade-off |
-| 384 | `all-minilm` | quick tests only |
-
-**Use the same model family for LLM and embedding when possible.**
-`qwen3-embedding` + `qwen3.5` share a tokenizer and semantic space, so
-retrieval is more consistent than with models from different families.
-
-### Consistency between ingest and retrieval
-
-**`EMBED_MODEL` must be identical in `ingest.py` and `rag.py`.**
-ChromaDB stores vectors generated with one specific model. If `rag.py` embeds
-the search query with a different model, the vector spaces do not match and
-retrieval returns essentially random results, typically with no visible error.
-
-**After changing `EMBED_MODEL`, always rerun with `--force`.**
-Without `--force` the script skips a collection that already exists: the old
-vectors (from the previous model) stay in place and `rag.py` keeps using them.
-
-```bash
-# Model changed → always recreate the collection
-python step-8/ingest.py --stem <name> --force
-```
-
-### When to use `--force`
-
-| Situation | `--force` needed? |
-|---|---|
-| First run | No |
-| You changed `EMBED_MODEL` | **Yes** |
-| You improved the chunks in step-6 | **Yes** |
-| You added new documents (different stem) | No |
-| You only want to check that it works | No |
-
-### Vector distance
-
-The script uses **cosine distance** (hardcoded), which is the right choice for
-text embeddings: it measures the angle between vectors regardless of their
-magnitude. Do not change this parameter.
diff --git a/step-8/ingest.py b/step-8/ingest.py
deleted file mode 100644
index 7dda557..0000000
--- a/step-8/ingest.py
+++ /dev/null
@@ -1,232 +0,0 @@
-#!/usr/bin/env python3
-"""
-Step 8: Vectorization
-
-Reads the chunks produced by step-6, generates their embeddings with Ollama,
-and indexes them in ChromaDB (persistent).
-
-The embedding model is read from config.py (EMBED_MODEL).
-You can override it with --model, but it must match the model you will use
-in rag.py; if you change it later, rerun with --force.
-
-Input:  step-6/<stem>/chunks.json
-Output: chroma_db/<stem> (ChromaDB collection)
-
-Usage:
-    python step-8/ingest.py --stem <name>            # single document
-    python step-8/ingest.py                          # every stem found
-    python step-8/ingest.py --stem <name> --force    # overwrite the collection
-    python step-8/ingest.py --model bge-m3           # override the model
-"""
-
-import argparse
-import json
-import sys
-import time
-import urllib.error
-import urllib.request
-from pathlib import Path
-
-import chromadb
-
-# ─── Configurazione ────────────────────────────────────────────────────────────
-
-project_root = Path(__file__).parent.parent
-
-CHUNKS_DIR = project_root / "step-6"
-CHROMA_DIR = project_root / "chroma_db"
-
-sys.path.insert(0, str(project_root))
-from config import EMBED_MODEL, OLLAMA_URL  # noqa: E402
-
-EMBED_ENDPOINT = f"{OLLAMA_URL}/api/embeddings"
-
-
-# ─── Ollama ────────────────────────────────────────────────────────────────────
-
-def embed(text: str, model: str) -> list[float]:
-    """Call Ollama's /api/embeddings endpoint and return the vector."""
-    payload = json.dumps({"model": model, "prompt": text}).encode()
-    req = urllib.request.Request(
-        EMBED_ENDPOINT,
-        data=payload,
-        headers={"Content-Type": "application/json"},
-        method="POST",
-    )
-    with urllib.request.urlopen(req, timeout=60) as resp:
-        data = json.loads(resp.read())
-    return data["embedding"]
-
-
-def check_ollama(model: str) -> bool:
-    """Check that Ollama is running and that the embedding model is available."""
-    try:
-        req = urllib.request.Request(f"{OLLAMA_URL}/api/tags", method="GET")
-        with urllib.request.urlopen(req, timeout=10) as resp:
-            data = json.loads(resp.read())
-        models = [m["name"] for m in data.get("models", [])]
-        found = any(
-            m == model or m.startswith(model + ":")
-            for m in models
-        )
-        if found:
-            print(f"✅ Ollama OK — {model} disponibile")
-            return True
-        print(f"❌ Modello {model} non trovato in Ollama")
-        print(f"   → ollama pull {model}")
-        return False
-    except (urllib.error.URLError, OSError):
-        print("❌ Ollama non raggiungibile — assicurati che sia in esecuzione")
-        print("   → ollama serve")
-        return False
-
-
-# ─── ChromaDB ─────────────────────────────────────────────────────────────────
-
-def get_client() -> chromadb.PersistentClient:
-    CHROMA_DIR.mkdir(parents=True, exist_ok=True)
-    return chromadb.PersistentClient(path=str(CHROMA_DIR))
-
-
-def collection_exists(client: chromadb.PersistentClient, stem: str) -> bool:
-    return any(c.name == stem for c in client.list_collections())
-
-
-# ─── Ingestione ───────────────────────────────────────────────────────────────
-
-def ingest(stem: str, force: bool, model: str = EMBED_MODEL) -> bool:
-    """
-    Read step-6/<stem>/chunks.json, generate embeddings, and populate ChromaDB.
-    Returns True on success (or on a deliberate skip), False otherwise.
-    """
-    chunks_path = CHUNKS_DIR / stem / "chunks.json"
-    if not chunks_path.exists():
-        print(f"❌ File non trovato: {chunks_path}")
-        return False
-
-    with open(chunks_path, encoding="utf-8") as f:
-        chunks = json.load(f)
-
-    if not chunks:
-        print(f"⚠️  {stem}: chunks.json è vuoto — skip")
-        return False
-
-    client = get_client()
-
-    if collection_exists(client, stem):
-        if not force:
-            print(f"⚠️  Collection '{stem}' già presente in ChromaDB — skip")
-            print("   → usa --force per sovrascrivere")
-            return True  # not an error, just a skip
-        client.delete_collection(stem)
-        print(f"🗑️  Collection '{stem}' rimossa (--force)")
-
-    collection = client.create_collection(
-        name=stem,
-        metadata={"hnsw:space": "cosine"},
-    )
-
-    total = len(chunks)
-    print(f"📦 {total} chunk da ingestire\n")
-
-    ids        = []
-    embeddings = []
-    documents  = []
-    metadatas  = []
-
-    start     = time.monotonic()
-    durations: list[float] = []
-
-    for i, chunk in enumerate(chunks, start=1):
-        t0 = time.monotonic()
-        vector = embed(chunk["text"], model)
-        t1 = time.monotonic()
-        durations.append(t1 - t0)
-
-        ids.append(chunk["chunk_id"])
-        embeddings.append(vector)
-        documents.append(chunk["text"])
-        metadatas.append({
-            "sezione":   chunk.get("sezione", ""),
-            "titolo":    chunk.get("titolo", ""),
-            "sub_index": chunk.get("sub_index", 0),
-        })
-
-        avg  = sum(durations) / len(durations)
-        eta  = int(avg * (total - i))
-        done = f"[{i:>{len(str(total))}}/{total}]"
-        cid  = chunk["chunk_id"][:50]
-        line = f"  {done} ✓ {cid:<50}  ETA: {eta}s"
-        print(f"{line:<80}", end="\r", flush=True)
-
-        # Add to the collection in batches of 100 to limit memory use
-        if len(ids) == 100:
-            collection.add(
-                ids=ids,
-                embeddings=embeddings,
-                documents=documents,
-                metadatas=metadatas,
-            )
-            ids, embeddings, documents, metadatas = [], [], [], []
-
-    # Add the remaining chunks (final partial batch)
-    if ids:
-        collection.add(
-            ids=ids,
-            embeddings=embeddings,
-            documents=documents,
-            metadatas=metadatas,
-        )
-
-    elapsed = int(time.monotonic() - start)
-    print()  # newline after the \r progress line
-    print(f"\n✅ Ingestione completata in {elapsed}s — {total}/{total} chunk salvati")
-    print(f"   Collection '{stem}' in {CHROMA_DIR}/")
-    return True
-
-
-# ─── Entry point ──────────────────────────────────────────────────────────────
-
-def find_stems() -> list[str]:
-    """Return every stem that has a chunks.json in step-6/."""
-    return sorted(
-        p.parent.name
-        for p in CHUNKS_DIR.glob("*/chunks.json")
-    )
-
-
-def main() -> int:
-    parser = argparse.ArgumentParser(
-        description="Step 8 — Vettorizzazione chunk in ChromaDB"
-    )
-    parser.add_argument("--stem", help="Nome del documento (senza --stem = tutti)")
-    parser.add_argument("--force", action="store_true",
-                        help="Sovrascrive la collection se già esistente")
-    parser.add_argument("--model", default=EMBED_MODEL,
-                        help=f"Modello embedding Ollama (default da config.py: {EMBED_MODEL})")
-    args = parser.parse_args()
-
-    print("─── Step 8 — Vettorizzazione ─────────────────────────────────────────\n")
-
-    if not check_ollama(args.model):
-        return 1
-
-    stems = [args.stem] if args.stem else find_stems()
-    if not stems:
-        print("❌ Nessun chunks.json trovato in step-6/")
-        return 1
-
-    print()
-    results = []
-    for stem in stems:
-        if len(stems) > 1:
-            print(f"── {stem} ──")
-        results.append(ingest(stem, force=args.force, model=args.model))
-        if len(stems) > 1:
-            print()
-
-    return 0 if all(results) else 1
-
-
-if __name__ == "__main__":
-    sys.exit(main())