Commit Graph

  • 48567fa5e7 fix(verify): recognize www. URLs as valid terminators + multi-document docs main davide 2026-05-12 11:21:24 +02:00
  • 8d972fa7c6 feat(ingestion): multi-document support in a single ChromaDB collection davide 2026-05-12 11:21:17 +02:00
  • 5b63c423cc feat(chunks): chunking and post-processing optimization davide 2026-05-12 11:09:28 +02:00
  • 587238f9f5 docs(conversione): update README: commands, output, and execution logs davide 2026-05-12 10:43:17 +02:00
  • c381d7da3c docs(readme): add sections on model configuration, ollama test, retrieval, and RAG davide 2026-05-12 10:39:27 +02:00
  • b5fb363104 chore(config): RAG tuning: 4b model, temperature 0.2, chunk target 300 davide 2026-05-12 10:37:39 +02:00
  • 602dc87045 fix(ingestion): fix chunks path from step-6/ to chunks/ davide 2026-05-12 10:37:35 +02:00
  • b49ef8edf0 docs: update README with the complete ingestion flow davide 2026-05-11 16:05:23 +02:00
  • 9e1a72a9e6 refactor: rename step-8 → ingestion davide 2026-05-11 15:58:54 +02:00
  • 70b304e1d4 docs(readme): complete conversion → chunking flow davide 2026-05-11 15:46:52 +02:00
  • 02c785678d feat(chunks): target-based chunking with centralized config davide 2026-05-11 15:45:24 +02:00
  • 508587c5bf Merge branch 'chunks' into main davide 2026-05-11 14:51:53 +02:00
  • ebd2a43f84 feat: integrate 9-stage PDF→Markdown pipeline and test suite chunks davide 2026-05-11 14:46:16 +02:00
  • e1b5298b20 feat: integrate 9-stage PDF→Markdown pipeline and test suite davide 2026-05-11 14:44:16 +02:00
  • 444942dc8f feat: demote #→## when the document uses h1 for main sections marker davide 2026-05-07 16:21:02 +02:00
  • 3f4689e8fd feat: detect bibliographic notes and multi-article collections in pipeline davide 2026-05-07 16:12:50 +02:00
  • 2c0b7a462e feat: improve PDF→MD pipeline for RAG: frontmatter and page markers davide 2026-05-07 14:58:09 +02:00
  • 6e755c0b6c fix(clear.sh): exclude _pipeline/ from the batch and support a single stem davide 2026-05-07 14:53:17 +02:00
  • 9598209f12 chore: update .gitignore: exclude __pycache__ and remove reference to transforms/ davide 2026-05-07 14:44:40 +02:00
  • 64dc403e80 refactor: optimize PDF→Markdown pipeline: flat structure and verbosity davide 2026-05-07 14:30:41 +02:00
  • afbf29514d Update CLAUDE.md davide 2026-05-07 13:51:55 +02:00
  • ab4036591f temp davide 2026-04-30 15:26:52 +02:00
  • e41fcae248 refactor: modularize pipeline into conversione/_pipeline/ davide 2026-04-30 14:59:55 +02:00
  • faa8acae84 feat(pipeline): complete PDF→Markdown optimization without manual review davide 2026-04-30 14:58:15 +02:00
  • a158634378 refactor: reduce repo to the PDF → Markdown conversion phase only davide 2026-04-30 12:20:00 +02:00
  • a7b71fa508 refactor(skills): rename step6-fix → post-chunk davide 2026-04-20 14:25:18 +02:00
  • fe0ecc24ad feat(chunks): sentence-boundary flush, math incomplete detection, structure profile export davide 2026-04-20 12:27:58 +02:00
  • 995a8be735 chore: clean up .gitignore: remove step-2..6, add chunks/ davide 2026-04-20 12:25:00 +02:00
  • c87a7cb3eb refactor: remove step-5/ and step-6/, replaced by chunks/ davide 2026-04-20 12:21:30 +02:00
  • 4c0e0db2a5 feat(chunks): add consolidated chunking pipeline davide 2026-04-20 11:36:18 +02:00
  • 5215f53ad0 docs: compact README: remove verbose sections, keep the essentials davide 2026-04-20 11:20:54 +02:00
  • 4f28358ec1 feat: consolidated RAG pipeline: unified conversion, structure refactor, minimal CLAUDE.md davide 2026-04-20 11:06:18 +02:00
  • 6f8785d90a docs(CLAUDE.md): simplify instructions, remove hardcoded step-X paths ollama davide 2026-04-20 11:05:20 +02:00
  • c8167d4f01 fix: update path step-4/ → conversione/ and step-X references davide 2026-04-19 00:03:43 +02:00
  • e4dc0856bb refactor: file cleanup davide 2026-04-17 18:52:13 +02:00
  • af9ffc0559 docs(README): rewrite to match the actual project structure davide 2026-04-17 18:51:09 +02:00
  • e02e3496a3 chore(requirements): remove obsolete step-X comments davide 2026-04-17 18:50:50 +02:00
  • 12effa1a51 refactor: remove step-7 and step-9, consolidate scripts at the root davide 2026-04-17 18:50:31 +02:00
  • fc457e8525 feat(ollama): add step 7: Ollama environment check davide 2026-04-17 18:16:32 +02:00
  • 610d4db348 feat(conversione): unified PDF → Markdown pipeline, replaces step-0..4 davide 2026-04-17 16:05:11 +02:00
  • 82f205faa2 chore: remove now-obsolete step-0..step-4 folders davide 2026-04-17 16:04:59 +02:00
  • 368530bc25 refactor(docs): prepare-md skill replaces step4-review, CLAUDE.md without step-X davide 2026-04-17 13:44:41 +02:00
  • cdb2d4cab9 fix(conversione): PUA Symbol, garbage headers, merge+bib guard, math EN davide 2026-04-17 13:44:30 +02:00
  • ef8f56fdba fix(conversione): 5 transform robustness and precision fixes davide 2026-04-17 12:06:19 +02:00
  • 0a8d98279c feat(conversione): robustness and 7 new transforms davide 2026-04-17 11:53:38 +02:00
  • 757df26bc2 refactor(pipeline): modularize apply_transforms into 26 _t_xxx functions pdf-to-md davide 2026-04-17 09:46:50 +02:00
  • 875a342efa feat(validate): scoring oriented to chunking/vectorization, --detail flag davide 2026-04-17 09:20:15 +02:00
  • ea721774da feat(pipeline): 10 new transforms and extended residue metrics davide 2026-04-17 09:19:44 +02:00
  • 9910a70823 feat(conversione): add clear.sh for batch cleanup of stem folders davide 2026-04-17 09:19:17 +02:00
  • 265ac92b6c feat(conversione): 7 new pipeline transforms, validate refactor: average 92→99/100 davide 2026-04-17 07:47:56 +02:00
  • bcf2e688aa feat(validate): support single-file flags and explicit markdown score output davide 2026-04-16 16:05:03 +02:00
  • 5b6940e479 feat(pdf-to-md): replace report.md with report.json + validate.py davide 2026-04-16 15:53:46 +02:00
  • 6ec54c8616 docs(pdf-to-md): add README for conversione/ davide 2026-04-16 15:35:42 +02:00
  • 2545d834a9 refactor(pdf-to-md): remove references to internal steps from conversione/ davide 2026-04-16 15:30:59 +02:00
  • b7994100e7 feat(pdf-to-md): add automatic PDF → clean Markdown pipeline davide 2026-04-16 15:27:45 +02:00
  • 70924a575a feat(step-9): add retrieve.py for pure retrieval without LLM davide 2026-04-15 14:25:34 +02:00
  • 0b46c73006 docs(README): add manual instructions without Claude for steps 4 and 6 davide 2026-04-15 13:33:56 +02:00
  • 87e7ba67ec fix(step-6): recognize _word._ as a valid terminator in verify_chunks davide 2026-04-15 13:33:39 +02:00
  • dabad93131 feat(skills): strengthen step4-review and step6-fix with concrete checks davide 2026-04-15 11:39:02 +02:00
  • 94766d67cc docs(CLAUDE.md): rewrite with operating rules and critical files table davide 2026-04-15 11:38:45 +02:00
  • 80bd563000 step-9: add dynamic epilog and improve argparse help text davide 2026-04-14 16:25:23 +02:00
  • 1a0ebafda5 docs(step-8): add rules for optimal parameters davide 2026-04-14 19:10:34 +02:00
  • 6594033673 feat(step-7,8): read model from config.py, align EMBED_MODELS with the README davide 2026-04-14 18:22:05 +02:00
  • f62b5bc871 chore: remove .env.example and track sources/.gitkeep davide 2026-04-14 18:01:02 +02:00
  • 8fa07784ae docs: align README with the actual project structure davide 2026-04-14 15:57:49 +02:00
  • f27ebfa101 docs(step-7): update embedding and LLM models guide davide 2026-04-14 15:57:40 +02:00
  • d50f7f64a9 step-9: add interactive RAG pipeline davide 2026-04-14 15:57:29 +02:00
  • 7d95872a8e step-8: add ingest.py, align README davide 2026-04-14 10:59:40 +02:00
  • a5f8b8d119 step-7: add check_env.py, README, update requirements davide 2026-04-13 23:57:20 +02:00
  • e70a9a41f0 step-6: add fix_chunks.py, make step-6 self-contained davide 2026-04-13 14:03:41 +02:00
  • 5126e0d971 step-5: add adaptive chunker davide 2026-04-13 13:36:53 +02:00
  • 1631dff80d step-4: add revise.py, step4-review skill, README update davide 2026-04-13 12:21:26 +02:00
  • ee25adc0a6 step-3: add detect_structure.py (structure profile, no ML deps) davide 2026-04-13 10:16:42 +02:00
  • 346e336f1a step-2: add convert_pdf.py (pymupdf4llm, low-memory) davide 2026-04-13 10:00:42 +02:00
  • 3d9ed0141c step-1: add inspect_pdf.py davide 2026-04-13 08:51:03 +02:00
  • eda04dc464 step-0: add check_pdf.py davide 2026-04-13 08:03:08 +02:00
  • 42c38c30f7 project setup: gitignore, CLAUDE.md, requirements davide 2026-04-13 08:02:48 +02:00
  • 638ba17629 Initial commit davide 2026-04-12 23:53:13 +02:00