This reference describes the timer implementations used by tacet on different platforms.

tacet automatically selects the best available timer:

| Platform | Standard Timer | Resolution | Cycle-Accurate Timer | Resolution |
|---|---|---|---|---|
| x86_64 Linux | rdtsc | ~0.3ns | perf_event | ~0.3ns |
| x86_64 macOS | rdtsc | ~0.3ns | N/A | N/A |
| ARM64 Linux | cntvct_el0 | 1-40ns | perf_event | ~0.3ns |
| ARM64 macOS | cntvct_el0 | ~42ns | kperf | ~1ns |

Use TimerSpec::CyclePrecision to request cycle-accurate measurements. This uses kperf on macOS ARM64 and perf_event on Linux. With kperf, run the tests as root with a single test thread:

# Requires BOTH sudo AND single-threaded
sudo -E cargo test -- --test-threads=1

kperf uses Apple’s private performance framework. It requires root privileges and can only be accessed by one thread at a time; parallel tests silently fall back to the standard timer.

The rdtsc (Read Time-Stamp Counter) instruction reads the CPU’s cycle counter:

  • Invariant TSC: Modern CPUs increment the counter at a constant rate regardless of frequency scaling
  • Resolution: ~0.3ns at 3 GHz
  • No privileges required

The library uses serialization barriers (mfence/lfence) to prevent out-of-order execution from affecting measurements.
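The fenced read described above can be sketched as follows. This is a minimal illustration, not tacet's actual implementation: `read_cycles` and `measure` are hypothetical names, only `lfence` is shown, and the non-x86_64 fallback exists only so the sketch compiles everywhere.

```rust
/// Hypothetical helper: read the TSC with an lfence on each side.
#[cfg(target_arch = "x86_64")]
pub fn read_cycles() -> u64 {
    use core::arch::x86_64::{_mm_lfence, _rdtsc};
    // SAFETY: lfence (SSE2) and rdtsc are part of the x86_64 baseline.
    unsafe {
        _mm_lfence(); // let earlier instructions finish before reading
        let tsc = _rdtsc();
        _mm_lfence(); // keep later instructions from starting before the read
        tsc
    }
}

/// Fallback for other architectures (assumption: nanoseconds since first call).
#[cfg(not(target_arch = "x86_64"))]
pub fn read_cycles() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Hypothetical helper: elapsed counter ticks around one call to `op`.
pub fn measure<F: FnOnce()>(op: F) -> u64 {
    let start = read_cycles();
    op();
    let end = read_cycles();
    end.wrapping_sub(start)
}
```

On a 3 GHz part, dividing the returned tick count by 3 gives an approximate duration in nanoseconds, which is where the ~0.3ns resolution figure comes from.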

Linux perf_event provides access to hardware performance counters:

  • Same resolution as rdtsc
  • Provides additional isolation from software overhead
  • Requires CAP_PERFMON or perf_event_paranoid ≤ 2
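A quick way to check the second condition is to read the sysctl from procfs. The sketch below assumes the standard Linux path `/proc/sys/kernel/perf_event_paranoid`; the function names are hypothetical, and it does not check for CAP_PERFMON, which would also grant access.

```rust
use std::fs;

/// Parse the text of the perf_event_paranoid sysctl (for example "2\n").
pub fn parse_paranoid(contents: &str) -> Option<i32> {
    contents.trim().parse().ok()
}

/// True when the level satisfies the "≤ 2" condition stated above.
pub fn paranoid_allows_counters(level: i32) -> bool {
    level <= 2
}

/// Read the live value; returns None off Linux or if the file is unreadable.
pub fn current_paranoid_level() -> Option<i32> {
    let raw = fs::read_to_string("/proc/sys/kernel/perf_event_paranoid").ok()?;
    parse_paranoid(&raw)
}
```

Some distributions ship with a stricter level than 2, in which case either the sysctl must be lowered or the process needs CAP_PERFMON.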

On Apple Silicon, the cntvct_el0 virtual timer counter runs at a fixed 24 MHz on M1/M2/M3/M4:

  • Resolution: ~41.67ns (1/24 MHz)
  • No privileges required
  • Consistent across P/E cores

This relatively coarse resolution is compensated by adaptive batching.

kperf, Apple’s private performance framework, provides cycle-accurate timing:

  • Resolution: ~1ns
  • Requires root
  • Single-threaded only (global resource)

To use kperf:

sudo -E cargo test -- --test-threads=1

On ARM64 Linux, the cntvct_el0 counter frequency varies by SoC:

| SoC | Frequency | Resolution |
|---|---|---|
| AWS Graviton4 (ARMv8.6+) | 1 GHz | ~1ns |
| Ampere Altra | 25 MHz | ~40ns |
| Raspberry Pi 4 | 54 MHz | ~18ns |
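The resolution figures follow directly from the counter frequency: one tick lasts 1/frequency seconds. The sketch below shows that arithmetic, plus an ARM64-only read of the `cntfrq_el0` frequency register. Both function names are hypothetical, and the register read assumes the OS permits user-space access, as Linux and macOS normally do.

```rust
/// Nanoseconds per counter tick for a given frequency in Hz.
pub fn resolution_ns(freq_hz: u64) -> f64 {
    1e9 / freq_hz as f64
}

/// Read the generic timer frequency register (compiled on ARM64 only).
#[cfg(target_arch = "aarch64")]
pub fn counter_frequency_hz() -> u64 {
    let freq: u64;
    // SAFETY: assumes cntfrq_el0 is readable from user space on this OS.
    unsafe {
        core::arch::asm!(
            "mrs {f}, cntfrq_el0",
            f = out(reg) freq,
            options(nomem, nostack)
        );
    }
    freq
}
```

For example, `resolution_ns(24_000_000)` gives roughly 41.67, and `resolution_ns(54_000_000)` gives roughly 18.5, in line with the ~18ns listed for the Raspberry Pi 4.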

perf_event on ARM64 Linux works the same as on x86_64 Linux; it requires CAP_PERFMON or perf_event_paranoid ≤ 2.

On platforms with coarse timer resolution, fast operations complete in fewer timer ticks than needed for reliable measurement. The library automatically compensates:

  1. Pilot measurement: Run ~100 iterations to measure ticks per operation
  2. Enable batching: If < 5 ticks per call, batch multiple operations together
  3. Select K: Choose batch size K = min(ceil(50/ticks_per_call), 20)
  4. Measure batches: Record total time for K operations as one sample
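The selection rule in steps 2 and 3 can be sketched as below. The constants come from the steps above; the function names and the handling of a zero-tick pilot result are assumptions made for illustration, not tacet's API.

```rust
pub const MIN_TICKS_PER_CALL: f64 = 5.0; // below this, batching is enabled
pub const TARGET_TICKS_PER_BATCH: f64 = 50.0; // numerator in the K formula
pub const MAX_BATCH_SIZE: u32 = 20; // upper bound on K

/// Average ticks per operation from a pilot run (e.g. ~100 iterations).
/// `iterations` is expected to be non-zero.
pub fn pilot_ticks_per_call(total_ticks: u64, iterations: u32) -> f64 {
    total_ticks as f64 / iterations as f64
}

/// Returns Some(K) when batching is needed, None when it is not.
pub fn select_batch_size(ticks_per_call: f64) -> Option<u32> {
    if ticks_per_call >= MIN_TICKS_PER_CALL {
        return None; // timer resolves a single call well enough
    }
    if ticks_per_call <= 0.0 {
        // Assumption: a pilot run that saw no ticks gets the largest batch.
        return Some(MAX_BATCH_SIZE);
    }
    // K = min(ceil(50 / ticks_per_call), 20)
    let k = (TARGET_TICKS_PER_BATCH / ticks_per_call).ceil();
    Some(k.min(MAX_BATCH_SIZE as f64) as u32)
}
```

With 4 ticks per call this yields K = 13 (ceil of 12.5). Anything at or below 2.5 ticks per call hits the cap of 20.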

Batching is disabled when:

  • Timer has sufficient resolution (≥ 5 ticks per operation)
  • Using cycle-accurate timers (kperf, perf_event)
  • Operation is slow enough

Before measurement begins, the library runs several checks:

  • Verify timer is monotonic (second read ≥ first read)
  • Check timer advances at a reasonable rate
  • Detect if timer resolution is too coarse for any measurement
  • Compare two halves of baseline samples
  • Detect problems such as generator timing or side effects
  • If a “leak” is found between identical inputs, the harness has a bug
  • Detect drift over time that would violate statistical assumptions
  • Divide samples into windows, compare medians
  • Flag a quality warning if significant drift is detected
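The checks above could look roughly like the following. This is a sketch under stated assumptions: the source does not say how the two halves are compared or how drift is scored, so the use of medians for the split-half check and the relative-spread score are illustrative choices, and all function names are hypothetical.

```rust
/// Median of the samples; None for an empty slice.
pub fn median(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let mid = sorted.len() / 2;
    let value = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] as f64 + sorted[mid] as f64) / 2.0
    } else {
        sorted[mid] as f64
    };
    Some(value)
}

/// Timer sanity: the second read must not be earlier than the first.
pub fn is_monotonic(first: u64, second: u64) -> bool {
    second >= first
}

/// Relative gap between the medians of the two halves of the baseline.
/// A large gap between identical inputs points at a harness bug.
pub fn split_half_gap(baseline: &[u64]) -> Option<f64> {
    let (a, b) = baseline.split_at(baseline.len() / 2);
    let (ma, mb) = (median(a)?, median(b)?);
    Some((ma - mb).abs() / ma.max(mb).max(1.0))
}

/// Relative spread between the lowest and highest window medians.
pub fn window_drift(samples: &[u64], windows: usize) -> Option<f64> {
    if windows == 0 || samples.len() < windows {
        return None;
    }
    let size = samples.len() / windows;
    let medians: Vec<f64> = samples
        .chunks(size)
        .take(windows) // ignore any trailing partial window
        .filter_map(|w| median(w))
        .collect();
    let lo = medians.iter().copied().fold(f64::INFINITY, f64::min);
    let hi = medians.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some((hi - lo) / hi.max(1.0))
}
```

A caller would compare each score against a threshold of its own choosing. The source does not state what thresholds the library uses.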

Operations completing faster than ~10ns on Apple Silicon (or proportionally fast on other platforms) cannot be reliably measured:

Operation too fast to measure
Operation: ~5ns
Timer resolution: ~42ns
Recommendation: Use TimerSpec::CyclePrecision with sudo (~1ns resolution)

Options:

  1. Use TimerSpec::CyclePrecision for higher resolution
  2. Test a larger workload (more iterations, larger input)
  3. Accept that ultra-fast operations may not be measurable
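One way to read the ~10ns figure, assuming it follows from the batching limits (this derivation is an inference, not something the source states): if even the largest batch of 20 operations must span at least 5 timer ticks, the shortest measurable operation is 5 × resolution / 20. The names below are hypothetical.

```rust
pub const MIN_TICKS: f64 = 5.0; // assumed minimum ticks per sample
pub const MAX_BATCH: f64 = 20.0; // largest batch size K

/// Shortest operation (ns) that still spans MIN_TICKS at the largest batch.
pub fn min_measurable_ns(resolution_ns: f64) -> f64 {
    MIN_TICKS * resolution_ns / MAX_BATCH
}

/// Whether an operation of `op_ns` is measurable at this timer resolution.
pub fn is_measurable(op_ns: f64, resolution_ns: f64) -> bool {
    op_ns >= min_measurable_ns(resolution_ns)
}
```

At 41.67ns per tick this gives a floor of about 10.4ns, so a ~5ns operation falls below it with the standard timer but clears it easily at ~1ns resolution.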