
This page describes the statistical methodology used by tacet to detect timing side channels.

In plain English: We run your code thousands of times with two types of inputs (baseline and sample). If the timing distributions differ by more than your threshold, we report a leak. The key innovation is how we make that comparison:

  1. Compare distributions, not averages: A timing leak might only affect some runs (cache misses, branch mispredictions), so we compare the full timing distribution using multiple percentiles.

  2. Bayesian probability: Instead of a yes/no answer, we compute the probability that there’s a security-relevant leak. This lets us say “72% confident there’s a leak” rather than just “the t-statistic is 4.7.”

  3. Adaptive stopping: We keep collecting samples until we’re confident enough to decide. Clear cases stop early; borderline cases get more evidence.

tacet uses an adaptive two-phase approach:

          Timing Samples
  ┌──────────────┴──────────────┐
  ▼                             ▼
┌───────────────┐      ┌──────────────────┐
│ Phase 1:      │      │ Phase 2:         │
│ Calibration   │      │ Adaptive Loop    │
├───────────────┤      ├──────────────────┤
│ 5,000 samples │      │ Batch collection │
│ Covariance    │      │ Posterior update │
│ estimation    │      │ Decision check   │
│ Prior setup   │ ───> │ Quality gates    │
│               │      │                  │
│ Fixed cost    │      │ Stops when       │
│               │      │ P < 0.05 or      │
│               │      │ P > 0.95         │
└───────────────┘      └──────────────────┘

The calibration phase collects an initial batch of samples (default: 5,000 per class) to:

  1. Estimate noise covariance: Timing measurements are noisy. Block bootstrap estimates the covariance structure of this noise.

  2. Compute minimum detectable effect (MDE): Based on noise levels, what’s the smallest effect we could reliably detect?

  3. Set Bayesian priors: The prior distribution is centered on zero effect with spread determined by the MDE.
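As a sketch of the calibration outputs, the function below uses the standard power-analysis form for the MDE with α = 0.05 (two-sided) and 80% power; those constants and the function name are assumptions, since the text only says the MDE is derived from noise levels.

```python
def calibrate(sigma_hat, prior_no_leak=0.75):
    """Sketch of calibration, assuming a textbook power-analysis MDE.
    sigma_hat: bootstrap noise scale estimated from the calibration batch."""
    z_alpha, z_beta = 1.96, 0.8416        # normal quantiles: 95% conf., 80% power
    mde = (z_alpha + z_beta) * sigma_hat  # smallest reliably detectable effect
    # Prior over the true W1 distance: centered on zero effect, spread tied
    # to the MDE, with 75% prior mass on "no leak" by default (per the text).
    prior = {"center": 0.0, "scale": mde, "p_no_leak": prior_no_leak}
    return mde, prior
```

Noisier measurements give a larger MDE, which widens the prior and raises the bar for declaring a leak.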

After calibration, the adaptive loop collects samples in batches until a decision is reached:

  1. Collect batch: Gather N samples per class (default: 1,000)

  2. Update posterior: Combine new data with existing posterior using Bayesian updating

  3. Check decision thresholds: If P(leak) < 0.05, pass. If P(leak) > 0.95, fail.

  4. Apply quality gates: Check for conditions that would make continued sampling futile

  5. Repeat or stop: Either collect more samples or return a result

This adaptive approach:

  • Stops early on clear pass/fail cases (saves time)
  • Gathers more evidence when the signal is near the threshold
  • Returns Inconclusive when hitting resource limits or quality gates
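The five steps above can be sketched as a loop; the three callables are hypothetical placeholders for tacet's internals, not its real API.

```python
def run_adaptive(collect_batch, update_posterior, check_gates,
                 p_lo=0.05, p_hi=0.95, batch=1000):
    """Skeleton of the adaptive loop. collect_batch gathers samples,
    update_posterior folds them in and returns P(leak), check_gates
    returns a tripped gate name or None (all names are illustrative)."""
    while True:
        data = collect_batch(batch)       # 1. gather N samples per class
        p_leak = update_posterior(data)   # 2. Bayesian update -> P(leak)
        if p_leak < p_lo:                 # 3. confident there is no leak
            return "Pass"
        if p_leak > p_hi:                 # 3. confident there is a leak
            return "Fail"
        if check_gates() is not None:     # 4. futility / budget gates
            return "Inconclusive"
        # 5. otherwise loop and collect another batch
```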

Rather than comparing means (which can miss distributional differences), tacet uses the Wasserstein-1 distance (also known as the earth mover’s distance) to measure how different the timing distributions are:

$$W_1(\text{baseline}, \text{sample}) = \frac{1}{n} \sum_{i=1}^{n} \left| x_{\text{baseline}}^{(i)} - x_{\text{sample}}^{(i)} \right|$$

where $x_{\text{baseline}}^{(i)}$ and $x_{\text{sample}}^{(i)}$ are the $i$-th smallest values from each class.

Why W₁? The Wasserstein-1 distance captures the “cost” of transforming one distribution into another via optimal transport. Unlike a simple mean comparison, it’s sensitive to both location shifts and shape changes in the distribution.
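Concretely, the rank-matched W₁ is just the mean absolute difference of the sorted samples. A minimal sketch (the function name is ours, and equal class sizes are assumed):

```python
import numpy as np

def w1_distance(baseline, sample):
    """Wasserstein-1 distance between two equal-sized timing samples:
    mean absolute difference of the rank-matched (sorted) values."""
    b = np.sort(np.asarray(baseline, dtype=float))
    s = np.sort(np.asarray(sample, dtype=float))
    if b.shape != s.shape:
        raise ValueError("classes must have equal sample counts")
    return float(np.mean(np.abs(b - s)))
```

A uniform 1 ns shift between classes yields W₁ = 1 ns regardless of the distribution's shape, while a shift confined to the upper quantiles still contributes its share of the average.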

The W₁ distance can be decomposed into two interpretable components:

$$W_1 = |\text{shift}| + \text{tail}$$

where:

  • shift = median of the rank-matched differences $\{x_{\text{baseline}}^{(i)} - x_{\text{sample}}^{(i)}\}$
  • tail = mean of the centered residuals $\left| (x_{\text{baseline}}^{(i)} - x_{\text{sample}}^{(i)}) - \text{shift} \right|$

The tail share ratio distinguishes timing leak patterns:

$$\text{tail\_share} = \frac{\text{tail}}{|\text{shift}| + \text{tail}}$$

Tail Share   Pattern        Likely cause
< 0.20       UniformShift   Branch on secret data (all timings shift uniformly)
0.20–0.50    Mixed          Multiple leak sources (branches + cache effects)
> 0.50       TailEffect     Data-dependent cache behavior (upper quantiles affected more)

This decomposition is useful because:

  • A pure UniformShift suggests a conditional branch on secret data
  • A pure TailEffect suggests data-dependent cache misses or memory access patterns
  • Mixed patterns suggest multiple leak sources or complex microarchitectural effects
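The decomposition and classification above can be sketched as follows; the function name is illustrative, and the thresholds are the ones from the table.

```python
import numpy as np

def decompose_w1(baseline, sample):
    """Split the rank-matched differences into a median shift and a
    tail term, then classify the leak pattern by tail share."""
    d = np.sort(np.asarray(baseline, float)) - np.sort(np.asarray(sample, float))
    shift = float(np.median(d))                 # uniform component
    tail = float(np.mean(np.abs(d - shift)))    # spread around the shift
    total = abs(shift) + tail
    tail_share = tail / total if total > 0 else 0.0
    if tail_share < 0.20:
        pattern = "UniformShift"
    elif tail_share <= 0.50:
        pattern = "Mixed"
    else:
        pattern = "TailEffect"
    return shift, tail, tail_share, pattern
```

For example, a sample class that is uniformly 2 ns slower produces a pure shift with zero tail, which classifies as UniformShift.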

The core question: given observed timing data, what’s the probability that the timing difference exceeds your security threshold θ?

In plain English: We start by assuming there’s probably no leak (the prior). As we collect data, we update our belief. If the data strongly suggests a difference, our confidence in a leak increases. If not, it decreases. The final number (72%, 3%, etc.) represents how confident we are that there’s a security-relevant leak.

We start by assuming “probably no leak”:

  • The prior is centered at zero effect
  • Its spread is determined by how noisy our measurements are (the MDE)
  • Default: 75% prior probability of no leak

Each batch of samples updates our belief. The likelihood tells us: “How probable is this data if the true W₁ distance is δ?” We model the W₁ distance using a Student’s t distribution with robust variance estimation to handle measurement uncertainty.

After each batch, we combine our prior belief with the new evidence using Bayes’ theorem:

$$p(\delta \mid \text{data}) \propto p(\text{data} \mid \delta) \cdot p(\delta)$$

This gives us an updated probability distribution over the true W₁ distance $\delta$.

We integrate the posterior to answer: “What’s the probability that the W₁ distance exceeds the security threshold θ?”

$$P(\text{leak}) = P(\delta > \theta \mid \text{all data})$$

Probability              Decision
P(leak) < 0.05           Pass: confident there’s no exploitable leak
P(leak) > 0.95           Fail: confident there’s a leak
0.05 ≤ P(leak) ≤ 0.95    Continue collecting or Inconclusive
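The threshold probability can be sketched with a simple grid approximation. Two simplifications relative to the text, both deliberate: the prior below is a plain half-normal with MDE scale (the actual prior also puts 75% mass on "no leak"), and the likelihood is Gaussian rather than the Student's t described above. The function name is ours.

```python
import numpy as np

def p_leak(w1_obs, sigma_hat, theta, mde, grid_n=2001):
    """Grid approximation of P(delta > theta | data) for delta >= 0.
    w1_obs: observed W1; sigma_hat: bootstrap noise scale;
    theta: security threshold; mde: prior scale from calibration."""
    delta = np.linspace(0.0, max(theta, w1_obs) + 6 * sigma_hat, grid_n)
    prior = np.exp(-0.5 * (delta / mde) ** 2)               # centered on zero
    likelihood = np.exp(-0.5 * ((w1_obs - delta) / sigma_hat) ** 2)
    posterior = prior * likelihood
    posterior /= posterior.sum()                            # normalize on grid
    return float(posterior[delta > theta].sum())
```

An observed W₁ far above θ relative to the noise drives the probability toward 1 (Fail), while an observation near zero drives it toward 0 (Pass).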

This is different from DudeCT’s t-test approach. DudeCT asks “is there any statistical difference?” while tacet asks “is there a security-relevant difference above my threshold $\theta$?”

The adaptive loop includes five quality gates that can stop sampling early:

Gate                   Triggers when                             Outcome
DataTooNoisy           Posterior ≈ prior after calibration       Inconclusive
NotLearning            KL divergence collapsed for 5+ batches    Inconclusive
WouldTakeTooLong       Projected time exceeds budget             Inconclusive
TimeBudgetExceeded     Wall-clock time exceeded                  Inconclusive
SampleBudgetExceeded   Sample limit reached                      Inconclusive

These gates prevent wasting time on tests that won’t converge.
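The gate logic can be sketched as a single check run after each batch. Every threshold value below (time budget, sample budget, KL floor, stall count) is an illustrative default, not tacet's, and the function name is ours.

```python
def check_gates(posterior_near_prior, kl_history, elapsed_s, projected_s,
                samples, time_budget_s=300.0, sample_budget=1_000_000,
                kl_floor=1e-4, stall_batches=5):
    """Return the name of the tripped quality gate, or None to keep sampling.
    kl_history: per-batch KL divergence between successive posteriors."""
    if posterior_near_prior:                      # data carries no information
        return "DataTooNoisy"
    recent = kl_history[-stall_batches:]
    if len(recent) == stall_batches and all(kl < kl_floor for kl in recent):
        return "NotLearning"                      # posterior stopped moving
    if projected_s > time_budget_s:
        return "WouldTakeTooLong"                 # convergence beyond budget
    if elapsed_s > time_budget_s:
        return "TimeBudgetExceeded"
    if samples >= sample_budget:
        return "SampleBudgetExceeded"
    return None
```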

On platforms with coarse timer resolution (e.g., Apple Silicon’s ~42ns counter), single-operation measurements are quantized. Adaptive batching compensates:

  1. Pilot measurement: Measure ~100 warmup iterations to estimate operation time $t_{\text{op}}$

  2. $K$ selection: Choose batch size $K$ to achieve ≥50 timer ticks per measurement:

$$K = \max\!\left(1, \left\lceil \frac{50 \cdot r_{\text{timer}}}{t_{\text{op}}} \right\rceil\right)$$

where $r_{\text{timer}}$ is the timer resolution in nanoseconds.

  3. Bound $K$: Cap at $K = 20$ to avoid microarchitectural artifacts

  4. Batch measurement: Measure $K$ iterations together, analyze batch totals

Batching is disabled when:

  • Timer resolution is already fine (>5 ticks per call)
  • Using cycle-accurate timers (kperf, perf_event)
  • The operation is slow enough that timer quantization is negligible
  • Running on x86_64 (rdtsc provides cycle-accurate timing)
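Putting the selection rule and the skip condition together (the function name is ours; the 50-tick target, the >5-tick skip, and the K = 20 cap are the values from the text):

```python
import math

def batch_size(t_op_ns, timer_res_ns, target_ticks=50, k_max=20):
    """Choose batch size K so one measurement spans >= target_ticks
    timer ticks; skip batching when one call already spans >5 ticks."""
    if t_op_ns > 5 * timer_res_ns:        # resolution already fine
        return 1
    k = max(1, math.ceil(target_ticks * timer_res_ns / t_op_ns))
    return min(k, k_max)                  # cap against uarch artifacts
```

On Apple Silicon's ~42 ns counter, a 100 ns operation would need K = 21 to reach 50 ticks, so the cap kicks in and K = 20 is used.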

Variance estimation uses circular block bootstrap to account for potential autocorrelation in timing data:

  1. Divide samples into overlapping blocks of length $\ell \approx n^{1/3}$
  2. Resample blocks with replacement
  3. Compute W₁ distance on resampled data
  4. Repeat $B = 2{,}000$ times to estimate the variance $\hat{\sigma}^2$

Block bootstrap handles short-range dependencies that might arise from CPU state, cache effects, or measurement overhead.
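The four steps above can be sketched as follows; the function name is ours, and resampling each class with its own block starts is a simplifying assumption.

```python
import numpy as np

def block_bootstrap_var(baseline, sample, B=2000, seed=0):
    """Circular block bootstrap estimate of Var(W1): resample overlapping
    blocks of length ~ n^(1/3), wrapping at the array end, recompute the
    rank-matched W1, and take the variance of the replicates."""
    rng = np.random.default_rng(seed)
    b = np.asarray(baseline, float)
    s = np.asarray(sample, float)
    n = len(b)
    ell = max(1, round(n ** (1 / 3)))     # block length ~ n^(1/3)
    n_blocks = -(-n // ell)               # enough blocks to cover n samples
    offsets = np.arange(ell)

    def resample(x):
        starts = rng.integers(0, n, size=n_blocks)
        idx = ((starts[:, None] + offsets[None, :]) % n).ravel()[:n]
        return x[idx]                     # modulo makes the blocks circular

    reps = np.empty(B)
    for r in range(B):
        reps[r] = np.mean(np.abs(np.sort(resample(b)) - np.sort(resample(s))))
    return float(reps.var(ddof=1))
```

Because whole blocks are kept contiguous, short-range autocorrelation within a block survives resampling and is reflected in the variance estimate.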

For the complete mathematical specification, see the Specification.

Academic references:

  • Reparaz, O., Balasch, J., & Verbauwhede, I. (2016). “Dude, is my code constant time?” DATE. (Original DudeCT methodology)
  • Van Goethem, T., et al. (2020). “Timeless Timing Attacks.” USENIX Security. (HTTP/2 timing attacks)
  • Crosby, S. A., Wallach, D. S., & Riedi, R. H. (2009). “Opportunities and limits of remote timing attacks.” ACM TISSEC. (Effect size to exploitability mapping)