rag-from-scratch/.gitignore
davide e70a9a41f0 step-6: add fix_chunks.py, make step-6 self-contained
- verify_chunks.py now reads from step-6/<stem>/chunks.json and
  auto-copies from step-5 on first run (input and output both in step-6)
- fix_chunks.py: new script that applies fixes directly on chunks.json
  (merge too-short/incomplete, split too-long, remove empty, add prefix)
  supports --dry-run to preview changes before applying
- step6-fix.md skill updated to use fix_chunks.py workflow:
  dry-run → user approval → apply → re-verify
2026-04-13 23:56:50 +02:00

# Virtual environment
.venv/
# Original PDFs: large files, do not commit
sources/
# Pipeline output: generated by the scripts, do not commit
processed/
chroma_db/
# Python
__pycache__/
*.py[cod]
*.pyo
# Editor
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Reports generated by the scripts
step-0/*_step0_report.txt
step-1/*_step1_report.txt
# step-2 output: raw Markdown generated by marker
step-2/*/
# step-3 output: structure profile generated by detect_structure.py
step-3/*/
# step-4 output: revised Markdown and logs generated by revise.py
step-4/*/
step-4/revision_log.md
# step-5 output: chunks generated by chunker.py
step-5/*/
# step-6 output: reports generated by verify_chunks.py
step-6/*/