Commit Graph

2 Commits

Author SHA1 Message Date
7d4096749a perf(sha256): add ARMv8 2-way interleaved transform and scan_4way_direct
Process two independent SHA256 chains simultaneously to hide the 2-cycle
latency of vsha256hq_u32 on Cortex-A76, approaching full throughput.
Also reduces memcpy traffic from 512 to ~192 bytes per 4-nonce group by
reusing block buffers, and adds scan_4way_direct, which bypasses the
pthread_once check (an LDAR barrier) on every inner-loop call.
2026-03-30 10:42:17 +02:00
5b4c11f6f0 feat(sha256): add sha256d80 backend API and ARM64 kernel entry
2026-03-30 09:04:57 +02:00
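
The 2-way interleaving described in commit 7d4096749a can be illustrated with a minimal portable sketch. This is not the code from the commit: per its message, the commit uses the ARMv8 vsha256hq_u32 intrinsic, whereas the version below is plain scalar C so that it builds on any platform. The name demo_sha256_transform_2way and its signature are hypothetical. The idea it shows is only the ordering: each round of chain A is followed immediately by the same round of chain B, so the two dependency chains never wait on each other and a pipelined core can overlap them.

```c
#include <assert.h>
#include <stdint.h>

/* SHA-256 round constants and initial state (FIPS 180-4). */
static const uint32_t DEMO_K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
static const uint32_t DEMO_SHA256_IV[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

#define DEMO_ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* One SHA-256 round on working variables v[0..7] = a..h; wk = W[i] + K[i]. */
static inline void demo_round(uint32_t v[8], uint32_t wk)
{
    uint32_t a = v[0], e = v[4];
    uint32_t t1 = v[7] + (DEMO_ROTR(e, 6) ^ DEMO_ROTR(e, 11) ^ DEMO_ROTR(e, 25))
                + ((e & v[5]) ^ (~e & v[6])) + wk;
    uint32_t t2 = (DEMO_ROTR(a, 2) ^ DEMO_ROTR(a, 13) ^ DEMO_ROTR(a, 22))
                + ((a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]));
    v[7] = v[6]; v[6] = v[5]; v[5] = v[4]; v[4] = v[3] + t1;
    v[3] = v[2]; v[2] = v[1]; v[1] = v[0]; v[0] = t1 + t2;
}

/* Message-schedule word W[i] for i >= 16. */
static inline uint32_t demo_sched(const uint32_t *w, int i)
{
    uint32_t x = w[i - 15], y = w[i - 2];
    return (DEMO_ROTR(y, 17) ^ DEMO_ROTR(y, 19) ^ (y >> 10)) + w[i - 7]
         + (DEMO_ROTR(x, 7) ^ DEMO_ROTR(x, 18) ^ (x >> 3)) + w[i - 16];
}

/* Hypothetical API: compress one 64-byte block (given as 16 big-endian-decoded
 * words) into each of two independent states, alternating A and B per round. */
void demo_sha256_transform_2way(uint32_t sa[8], const uint32_t ba[16],
                                uint32_t sb[8], const uint32_t bb[16])
{
    uint32_t wa[64], wb[64], va[8], vb[8];
    assert(sa && ba && sb && bb);

    for (int i = 0; i < 16; i++) { wa[i] = ba[i]; wb[i] = bb[i]; }
    for (int i = 0; i < 8; i++)  { va[i] = sa[i]; vb[i] = sb[i]; }

    for (int i = 0; i < 64; i++) {
        if (i >= 16) { wa[i] = demo_sched(wa, i); wb[i] = demo_sched(wb, i); }
        demo_round(va, wa[i] + DEMO_K[i]);   /* chain A, round i */
        demo_round(vb, wb[i] + DEMO_K[i]);   /* chain B, round i (independent of A) */
    }

    for (int i = 0; i < 8; i++) { sa[i] += va[i]; sb[i] += vb[i]; }
}
```

In scalar C the compiler is free to reorder these rounds, so the sketch conveys the structure rather than a guaranteed schedule; with hardware SHA instructions the same alternation is what lets one chain's instruction issue while the other's result is still in flight.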