feat(skills): add local python-to-c-efficiency skill with modular C scaffold

- add local Codex skill for Python->C performance-focused translation
- define modular C architecture and benchmark/correctness gates
- add references for patterns, profiling, and module design
- add scaffold_c_module.py to generate include/src/tests/bench skeleton
- update agent default prompt for benchmark-backed optimizations
2026-03-30 00:08:14 +02:00
parent 3ea99d84f4
commit 9a0a170799
6 changed files with 506 additions and 0 deletions
--- a/.codex/skills/python-to-c-efficiency/references/compiler-and-profiling.md
+++ b/.codex/skills/python-to-c-efficiency/references/compiler-and-profiling.md
@@ -0,0 +1,68 @@
+# Compilation, Profiling, and Benchmarking
+
+## 1) Recommended build profiles
+
+## Debug (correctness and safety)
+
+```bash
+gcc -O0 -g3 -fsanitize=address,undefined -fno-omit-frame-pointer -Wall -Wextra -Wpedantic src/*.c tests/*.c -o app_debug
+```
+
+Use this profile to catch undefined behavior, out-of-bounds accesses, and lifetime bugs.
+
+## Release (throughput)
+
+```bash
+gcc -O3 -march=native -flto -fno-semantic-interposition -DNDEBUG -Wall -Wextra src/*.c -o app_release
+```
+
+Also compare against `clang` on the same workload.
+
+## Release + PGO (optional, for stable workloads)
+
+```bash
+gcc -O3 -fprofile-generate src/*.c -o app_pgo_gen
+./app_pgo_gen <realistic-input>
+gcc -O3 -fprofile-use -fprofile-correction src/*.c -o app_pgo
+```
+
+Apply PGO only when the training dataset is representative of the real workload.
+
+## 2) Minimal measurement
+
+## Time and memory
+
+```bash
+/usr/bin/time -v ./app_release
+```
+
+## Repeatable benchmark
+
+```bash
+hyperfine --warmup 3 --runs 20 'python3 script.py' './app_release'
+```
+
+Keep the input, CPU governor, and machine load fixed across runs.
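+
+As a rough illustration of checking one of these conditions before a run, the sketch below reads the CPU frequency governor and warns when it is not `performance`. The helper name `check_governor` is hypothetical, and the default sysfs path is the usual Linux cpufreq location, which may be missing in VMs or containers.
+
+```bash
+# Hypothetical helper: report the CPU frequency governor before benchmarking.
+# Assumption: the governor is exposed at the usual Linux cpufreq sysfs path.
+check_governor() {
+  local path="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"
+  # The file may not exist (VMs, containers, non-Linux hosts).
+  if [ ! -r "$path" ]; then
+    echo "governor: unknown (cannot read $path)"
+    return 2
+  fi
+  local gov
+  gov="$(cat "$path")"
+  if [ "$gov" = "performance" ]; then
+    echo "governor: performance"
+    return 0
+  fi
+  # Any other governor may scale frequency during the run.
+  echo "governor: $gov (results may be noisy)"
+  return 1
+}
+```
+
+Calling `check_governor` with no argument before `hyperfine` gives a quick signal; it only inspects `cpu0`, so treat it as a sanity check rather than a guarantee.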
+
+## 3) CPU profiling
+
+```bash
+perf stat ./app_release
+perf record -g ./app_release
+perf report
+```
+
+Use `perf report` to confirm the real hotspots before optimizing.
+
+## 4) Practical interpretation
+
+- Reduce algorithmic complexity first.
+- Then optimize memory and cache behavior.
+- Apply micro-optimizations (`inline`, branch hints, unrolling) last, and only when measurements justify them.
+
+## 5) Acceptance gates
+
+1. No sanitizer errors in the debug build.
+2. Output equivalence on the regression dataset.
+3. Repeatable speedup on the real target.
+4. Metrics and compiler flags documented in the result.
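+
+The output-equivalence gate could be automated along the following lines. This is a minimal sketch, not part of the skill itself: `check_equivalence` is a hypothetical helper, and it assumes both programs read their input on stdin and write their result to stdout.
+
+```bash
+# Hypothetical helper: run a reference command and a candidate command on
+# every file in a directory and count the inputs whose outputs differ.
+# Assumption: both commands read stdin and write stdout.
+check_equivalence() {
+  local ref_cmd="$1" new_cmd="$2" data_dir="$3"
+  local input failures=0
+  for input in "$data_dir"/*; do
+    [ -f "$input" ] || continue
+    # Compare the two outputs byte for byte.
+    if ! diff <(sh -c "$ref_cmd" < "$input") <(sh -c "$new_cmd" < "$input") > /dev/null; then
+      echo "MISMATCH: $input"
+      failures=$((failures + 1))
+    fi
+  done
+  echo "mismatches: $failures"
+  # Succeed only when every input produced identical output.
+  [ "$failures" -eq 0 ]
+}
+```
+
+A call such as `check_equivalence 'python3 script.py' './app_release' tests/data` would then gate the release build, where `tests/data` is a placeholder for wherever the regression inputs live.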