.codex/skills/python-to-c-efficiency/references/compiler-and-profiling.md

# Compilation, Profiling, and Benchmarking

## 1) Recommended build profiles

## Debug (correctness and safety)

```bash
gcc -O0 -g3 -fsanitize=address,undefined -fno-omit-frame-pointer -Wall -Wextra -Wpedantic src/*.c tests/*.c -o app_debug
```

Use this profile to find undefined behavior (UB), out-of-bounds accesses, and lifetime bugs.

## Release (throughput)

```bash
gcc -O3 -march=native -flto -fno-semantic-interposition -DNDEBUG -Wall -Wextra src/*.c -o app_release
```

Also compare against `clang` on the same workload.

## Release + PGO (optional, for stable workloads)

```bash
gcc -O3 -fprofile-generate src/*.c -o app_pgo_gen
./app_pgo_gen <realistic-input>
gcc -O3 -fprofile-use -fprofile-correction src/*.c -o app_pgo
```

Apply PGO only when the training dataset is representative of the real workload.

## 2) Minimal measurement

## Time and memory

```bash
/usr/bin/time -v ./app_release
```
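To track memory across runs, the peak resident set size can be pulled out of the GNU `time -v` report. The sketch below assumes GNU time (which labels that line `Maximum resident set size (kbytes)`) and a report saved to a file; the helper name `max_rss_kb` and the file name `time.log` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the peak RSS (in kB) from a GNU `time -v` report.
# Assumes the report was saved to a file, for example with:
#   /usr/bin/time -v -o time.log ./app_release
max_rss_kb() {
    # Split each line on ": " and print the value of the peak-RSS line only.
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$1"
}

# Example call (commented out because it needs an existing report):
#   max_rss_kb time.log
```

Saving the report with `-o` keeps the program's own stderr separate from the measurements.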

## Repeatable benchmark

```bash
hyperfine --warmup 3 --runs 20 'python3 script.py' './app_release'
```

Keep the input, the CPU governor, and the machine load fixed across runs.
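One way to check the governor before a benchmark session is to read the Linux cpufreq sysfs entry. This is a minimal sketch that assumes a Linux host exposing `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`; the helper name `cpu_governor` and its optional root-override argument are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the active frequency governor for one CPU,
# or "unknown" when the cpufreq entry is missing (VMs, containers, non-Linux).
# $1: CPU index (default 0)
# $2: sysfs root override, an assumption of this sketch to ease testing
cpu_governor() {
    local cpu="${1:-0}"
    local root="${2:-/sys/devices/system/cpu}"
    local file="$root/cpu$cpu/cpufreq/scaling_governor"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}

echo "cpu0 governor: $(cpu_governor 0)"
```

If the reported governor is not `performance`, it can usually be switched with `sudo cpupower frequency-set -g performance`. Pinning the benchmark to one core with `taskset -c <core>` is another common way to reduce noise.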

## 3) CPU profiling

```bash
perf stat ./app_release
perf record -g ./app_release
perf report
```

Use `perf report` to confirm the real hotspots before optimizing.

## 4) Practical interpretation

- Reduce algorithmic complexity first.
- Then optimize memory and cache behavior.
- Apply micro-optimizations (`inline`, branch hints, unrolling) last, and only when measurements justify them.

## 5) Acceptance gates

1. No sanitizer errors in the debug build.
2. Output equivalence on the regression dataset.
3. Repeatable speedup on the real target.
4. Metrics and compiler flags documented in the results.
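
Gate 2 can be automated with a byte-for-byte comparison of the outputs produced by the Python original and the C port. The sketch below assumes both programs write deterministic output to files; the helper name `check_equivalence` and the file names in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the output-equivalence gate.
# Returns 0 when the two files are byte-identical, 1 otherwise.
check_equivalence() {
    local expected="$1" actual="$2"
    if cmp -s -- "$expected" "$actual"; then
        echo "PASS: $actual matches $expected"
    else
        echo "FAIL: $actual differs from $expected" >&2
        return 1
    fi
}

# Example flow (commented out because it needs the real programs and inputs):
#   python3 script.py < case.in > expected.out
#   ./app_release     < case.in > actual.out
#   check_equivalence expected.out actual.out
```

A byte-for-byte check is the strictest option. For floating-point output, a tolerance-based comparison may be more appropriate, since Python and C can format or round values differently.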

Commit (2026-03-30 00:08:14 +02:00): feat(skills): add local python-to-c-efficiency skill with modular C scaffold

- add local Codex skill for Python->C performance-focused translation
- define modular C architecture and benchmark/correctness gates
- add references for patterns, profiling, and module design
- add scaffold_c_module.py to generate include/src/tests/bench skeleton
- update agent default prompt for benchmark-backed optimizations
