davide/cpu-miner

Commit Graph

Author	SHA1	Message	Date
davide	7d4096749a	perf(sha256): add ARMv8 2-way interleaved transform and scan_4way_direct Process two independent SHA256 chains simultaneously to hide the 2-cycle latency of vsha256hq_u32 on Cortex-A76, approaching full throughput. Also reduces memcpy from 512 to ~192 bytes per 4-nonce group by reusing block buffers, and adds scan_4way_direct to bypass pthread_once (LDAR barrier) on every inner-loop call.	2026-03-30 10:42:17 +02:00
davide	6be9e3cafd	test(sha256): add 100k nonce equivalence and hitmask checks	2026-03-30 09:05:42 +02:00
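
The pthread_once bypass described in commit 7d4096749a can be sketched in portable C. This is a minimal illustration, not the repository's code: scan_generic, scan_4way_resolve, count_hits, and the toy hit test are hypothetical stand-ins, and the real scan_4way_direct may have a different signature. The idea shown is only that implementation selection runs once, outside the hot loop, so the loop calls through a plain function pointer instead of re-entering pthread_once for every 4-nonce group.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical scanner signature: tests 4 consecutive nonces, returns a 4-bit hit mask. */
typedef uint32_t (*scan_fn)(const uint32_t *header, uint32_t first_nonce);

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static scan_fn g_impl;

/* Stand-in for the real SHA256 scan: lane i "hits" when the low byte of
 * header[0] ^ (first_nonce + i) is zero. Not a real proof-of-work check. */
static uint32_t scan_generic(const uint32_t *header, uint32_t first_nonce)
{
    uint32_t mask = 0;
    for (uint32_t lane = 0; lane < 4; lane++) {
        if (((header[0] ^ (first_nonce + lane)) & 0xffu) == 0)
            mask |= 1u << lane;
    }
    return mask;
}

/* One-time selection; a real build might pick an ARMv8 variant here. */
static void select_impl(void)
{
    g_impl = scan_generic;
}

/* Checked entry point: every call goes through pthread_once. */
uint32_t scan_4way(const uint32_t *header, uint32_t first_nonce)
{
    pthread_once(&g_once, select_impl);
    return g_impl(header, first_nonce);
}

/* Resolve the implementation once and hand back the raw pointer. */
scan_fn scan_4way_resolve(void)
{
    pthread_once(&g_once, select_impl);
    return g_impl;
}

/* Hot loop: pthread_once is paid once, then each group is a direct call. */
uint32_t count_hits(const uint32_t *header, uint32_t start, uint32_t count)
{
    scan_fn direct = scan_4way_resolve();
    uint32_t hits = 0;
    for (uint32_t n = start; n - start < count; n += 4) {
        uint32_t mask = direct(header, n);
        while (mask) {              /* count set bits in the hit mask */
            hits += mask & 1u;
            mask >>= 1;
        }
    }
    return hits;
}
```

A caller would invoke count_hits(header, first_nonce, range) once per work unit. The checked scan_4way entry point stays available for callers outside the inner loop, and both paths return the same mask for the same inputs.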

2 Commits