-
48567fa5e7
fix(verify): recognize www. URLs as valid terminators + multi-document docs
main
davide
2026-05-12 11:21:24 +02:00
-
8d972fa7c6
feat(ingestion): multi-document support in a single ChromaDB collection
davide
2026-05-12 11:21:17 +02:00
-
5b63c423cc
feat(chunks): optimize chunking and post-processing
davide
2026-05-12 11:09:28 +02:00
-
587238f9f5
docs(conversione): update README: commands, output, and execution logs
davide
2026-05-12 10:43:17 +02:00
-
c381d7da3c
docs(readme): add sections on model configuration, ollama test, retrieval, and RAG
davide
2026-05-12 10:39:27 +02:00
-
b5fb363104
chore(config): RAG tuning: 4b model, temperature 0.2, chunk target 300
davide
2026-05-12 10:37:39 +02:00
-
602dc87045
fix(ingestion): fix chunks path from step-6/ to chunks/
davide
2026-05-12 10:37:35 +02:00
-
b49ef8edf0
docs: update README with the complete ingestion flow
davide
2026-05-11 16:05:23 +02:00
-
9e1a72a9e6
refactor: rename step-8 → ingestion
davide
2026-05-11 15:58:54 +02:00
-
70b304e1d4
docs(readme): full conversion → chunking flow
davide
2026-05-11 15:46:52 +02:00
-
02c785678d
feat(chunks): target-based chunking with centralized config
davide
2026-05-11 15:45:24 +02:00
-
508587c5bf
Merge branch 'chunks' into main
davide
2026-05-11 14:51:53 +02:00
-
-
ebd2a43f84
feat: integrate 9-stage PDF→Markdown pipeline and test suite
chunks
davide
2026-05-11 14:46:16 +02:00
-
e1b5298b20
feat: integrate 9-stage PDF→Markdown pipeline and test suite
davide
2026-05-11 14:44:16 +02:00
-
444942dc8f
feat: demote #→## when the document uses h1 for main sections
marker
davide
2026-05-07 16:21:02 +02:00
-
3f4689e8fd
feat: detect bibliographic notes and multi-article collections in the pipeline
davide
2026-05-07 16:12:50 +02:00
-
2c0b7a462e
feat: improve PDF→MD pipeline for RAG: frontmatter and page markers
davide
2026-05-07 14:58:09 +02:00
-
6e755c0b6c
fix(clear.sh): exclude _pipeline/ from the batch and support a single stem
davide
2026-05-07 14:53:17 +02:00
-
9598209f12
chore: update .gitignore: exclude __pycache__ and remove the reference to transforms/
davide
2026-05-07 14:44:40 +02:00
-
64dc403e80
refactor: optimize PDF→Markdown pipeline: flat structure and verbosity
davide
2026-05-07 14:30:41 +02:00
-
afbf29514d
Update CLAUDE.md
davide
2026-05-07 13:51:55 +02:00
-
ab4036591f
temp
davide
2026-04-30 15:26:52 +02:00
-
e41fcae248
refactor: modularize pipeline into conversione/_pipeline/
davide
2026-04-30 14:59:55 +02:00
-
faa8acae84
feat(pipeline): full PDF→Markdown optimization without manual review
davide
2026-04-30 14:58:15 +02:00
-
a158634378
refactor: reduce repo to the PDF → Markdown conversion phase only
davide
2026-04-30 12:20:00 +02:00
-
a7b71fa508
refactor(skills): rename step6-fix → post-chunk
davide
2026-04-20 14:25:18 +02:00
-
-
fe0ecc24ad
feat(chunks): sentence-boundary flush, math incomplete detection, structure profile export
davide
2026-04-20 12:27:58 +02:00
-
995a8be735
chore: clean up .gitignore: remove step-2..6, add chunks/
davide
2026-04-20 12:25:00 +02:00
-
c87a7cb3eb
refactor: remove step-5/ and step-6/, replaced by chunks/
davide
2026-04-20 12:21:30 +02:00
-
4c0e0db2a5
feat(chunks): add consolidated chunking pipeline
davide
2026-04-20 11:36:18 +02:00
-
-
5215f53ad0
docs: condense README: remove verbose sections, keep the essentials
davide
2026-04-20 11:20:54 +02:00
-
4f28358ec1
feat: consolidated RAG pipeline: unified conversion, structure refactor, minimal CLAUDE.md
davide
2026-04-20 11:06:18 +02:00
-
-
6f8785d90a
docs(CLAUDE.md): simplify instructions, remove hardcoded step-X paths
ollama
davide
2026-04-20 11:05:20 +02:00
-
c8167d4f01
fix: update path step-4/ → conversione/ and step-X references
davide
2026-04-19 00:03:43 +02:00
-
e4dc0856bb
refactor: file cleanup
davide
2026-04-17 18:52:13 +02:00
-
af9ffc0559
docs(README): rewrite to match the actual project structure
davide
2026-04-17 18:51:09 +02:00
-
e02e3496a3
chore(requirements): remove obsolete step-X comments
davide
2026-04-17 18:50:50 +02:00
-
12effa1a51
refactor: remove step-7 and step-9, consolidate scripts at the root
davide
2026-04-17 18:50:31 +02:00
-
fc457e8525
feat(ollama): add step 7: Ollama environment check
davide
2026-04-17 18:16:32 +02:00
-
610d4db348
feat(conversione): unified PDF → Markdown pipeline, replaces step-0..4
davide
2026-04-17 16:05:11 +02:00
-
-
82f205faa2
chore: remove now-obsolete step-0..step-4 folders
davide
2026-04-17 16:04:59 +02:00
-
368530bc25
refactor(docs): prepare-md skill replaces step4-review, CLAUDE.md without step-X
davide
2026-04-17 13:44:41 +02:00
-
cdb2d4cab9
fix(conversione): PUA Symbol, garbage headers, merge+bib guard, math EN
davide
2026-04-17 13:44:30 +02:00
-
ef8f56fdba
fix(conversione): 5 fixes for transform robustness and precision
davide
2026-04-17 12:06:19 +02:00
-
0a8d98279c
feat(conversione): robustness and 7 new transforms
davide
2026-04-17 11:53:38 +02:00
-
757df26bc2
refactor(pipeline): modularize apply_transforms into 26 _t_xxx functions
pdf-to-md
davide
2026-04-17 09:46:50 +02:00
-
875a342efa
feat(validate): scoring oriented toward chunking/vectorization, --detail flag
davide
2026-04-17 09:20:15 +02:00
-
ea721774da
feat(pipeline): 10 new transforms and extended residue metrics
davide
2026-04-17 09:19:44 +02:00
-
9910a70823
feat(conversione): add clear.sh for batch cleanup of stem folders
davide
2026-04-17 09:19:17 +02:00
-
265ac92b6c
feat(conversione): 7 new pipeline transforms, validate refactor: average 92→99/100
davide
2026-04-17 07:47:56 +02:00
-
bcf2e688aa
feat(validate): support single-file flags and explicit markdown score output
davide
2026-04-16 16:05:03 +02:00
-
5b6940e479
feat(pdf-to-md): replace report.md with report.json + validate.py
davide
2026-04-16 15:53:46 +02:00
-
6ec54c8616
docs(pdf-to-md): add README for conversione/
davide
2026-04-16 15:35:42 +02:00
-
2545d834a9
refactor(pdf-to-md): remove references to internal steps from conversione/
davide
2026-04-16 15:30:59 +02:00
-
b7994100e7
feat(pdf-to-md): add automatic PDF → clean Markdown pipeline
davide
2026-04-16 15:27:45 +02:00
-
-
70924a575a
feat(step-9): add retrieve.py for pure retrieval without an LLM
davide
2026-04-15 14:25:34 +02:00
-
0b46c73006
docs(README): add manual instructions without Claude for steps 4 and 6
davide
2026-04-15 13:33:56 +02:00
-
87e7ba67ec
fix(step-6): recognize _word._ as a valid terminator in verify_chunks
davide
2026-04-15 13:33:39 +02:00
-
dabad93131
feat(skills): strengthen step4-review and step6-fix with concrete checks
davide
2026-04-15 11:39:02 +02:00
-
94766d67cc
docs(CLAUDE.md): rewrite with operating rules and a critical-files table
davide
2026-04-15 11:38:45 +02:00
-
80bd563000
step-9: add dynamic epilog and improve argparse help text
davide
2026-04-14 16:25:23 +02:00
-
1a0ebafda5
docs(step-8): add rules for optimal parameters
davide
2026-04-14 19:10:34 +02:00
-
6594033673
feat(step-7,8): read model from config.py, align EMBED_MODELS with the README
davide
2026-04-14 18:22:05 +02:00
-
f62b5bc871
chore: remove .env.example and track sources/.gitkeep
davide
2026-04-14 18:01:02 +02:00
-
8fa07784ae
docs: align README with the actual project structure
davide
2026-04-14 15:57:49 +02:00
-
f27ebfa101
docs(step-7): update guide to embedding and LLM models
davide
2026-04-14 15:57:40 +02:00
-
d50f7f64a9
step-9: add interactive RAG pipeline
davide
2026-04-14 15:57:29 +02:00
-
7d95872a8e
step-8: add ingest.py, align README
davide
2026-04-14 10:59:40 +02:00
-
a5f8b8d119
step-7: add check_env.py, README, update requirements
davide
2026-04-13 23:57:20 +02:00
-
e70a9a41f0
step-6: add fix_chunks.py, make step-6 self-contained
davide
2026-04-13 14:03:41 +02:00
-
5126e0d971
step-5: add adaptive chunker
davide
2026-04-13 13:36:53 +02:00
-
1631dff80d
step-4: add revise.py, step4-review skill, README update
davide
2026-04-13 12:21:26 +02:00
-
ee25adc0a6
step-3: add detect_structure.py (structure profile, no ML deps)
davide
2026-04-13 10:16:42 +02:00
-
346e336f1a
step-2: add convert_pdf.py (pymupdf4llm, low-memory)
davide
2026-04-13 10:00:42 +02:00
-
3d9ed0141c
step-1: add inspect_pdf.py
davide
2026-04-13 08:51:03 +02:00
-
eda04dc464
step-0: add check_pdf.py
davide
2026-04-13 08:03:08 +02:00
-
42c38c30f7
project setup: gitignore, CLAUDE.md, requirements
davide
2026-04-13 08:02:48 +02:00
-
638ba17629
Initial commit
davide
2026-04-12 23:53:13 +02:00