How It Works
This page describes the statistical methodology used by tacet to detect timing side channels.
The core idea
In plain English: We run your code thousands of times with two types of inputs (baseline and sample). If the timing distributions differ by more than your threshold, we report a leak. The key innovation is how we make that comparison:
- Compare distributions, not averages: A timing leak might only affect some runs (cache misses, branch mispredictions), so we compare the full timing distribution using multiple percentiles.
- Bayesian probability: Instead of a yes/no answer, we compute the probability that there’s a security-relevant leak. This lets us say “72% confident there’s a leak” rather than just “the t-statistic is 4.7.”
- Adaptive stopping: We keep collecting samples until we’re confident enough to decide. Clear cases stop early; borderline cases get more evidence.
Two-phase pipeline
tacet uses an adaptive two-phase approach:
```
              Timing Samples
                     │
      ┌──────────────┴──────────────┐
      ▼                             ▼
┌───────────────┐          ┌──────────────────┐
│ Phase 1:      │          │ Phase 2:         │
│ Calibration   │          │ Adaptive Loop    │
├───────────────┤          ├──────────────────┤
│ 5,000 samples │          │ Batch collection │
│ Covariance    │          │ Posterior update │
│ estimation    │          │ Decision check   │
│ Prior setup   │  ────>   │ Quality gates    │
│               │          │                  │
│ Fixed cost    │          │ Stops when       │
│               │          │ P < 0.05 or      │
│               │          │ P > 0.95         │
└───────────────┘          └──────────────────┘
```

Phase 1: Calibration
The calibration phase collects an initial batch of samples (default: 5,000 per class) to:
- Estimate noise covariance: Timing measurements are noisy. A block bootstrap estimates the covariance structure of this noise.
- Compute the minimum detectable effect (MDE): Based on the noise level, what’s the smallest effect we could reliably detect?
- Set Bayesian priors: The prior distribution is centered on zero effect, with spread determined by the MDE.
Phase 2: Adaptive loop
After calibration, the adaptive loop collects samples in batches until a decision is reached:
- Collect batch: Gather N samples per class (default: 1,000)
- Update posterior: Combine the new data with the existing posterior using Bayesian updating
- Check decision thresholds: If P(leak) < 0.05, pass. If P(leak) > 0.95, fail.
- Apply quality gates: Check for conditions that would make continued sampling futile
- Repeat or stop: Either collect more samples or return a result
This adaptive approach:
- Stops early on clear pass/fail cases (saves time)
- Gathers more evidence when the signal is near the threshold
- Returns Inconclusive when hitting resource limits or quality gates
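The control flow of the two phases can be sketched as follows. This is a toy illustration, not tacet's implementation: `collect_batch` and `update_posterior` are hypothetical stand-ins, and the real posterior update uses the W₁ likelihood described under Bayesian inference below.

```python
import random

# Hypothetical stand-ins for tacet's internals. A real implementation
# measures the code under test and performs a full Bayesian update over
# the W1 distance; here we only illustrate the adaptive control flow.
def collect_batch(n):
    """Pretend to measure n operations; returns timing samples in ns."""
    return [random.gauss(100.0, 5.0) for _ in range(n)]

def update_posterior(p_leak, baseline, sample):
    """Toy update: nudge P(leak) toward 0 or 1 based on the observed
    mean difference (a crude likelihood ratio, for illustration only)."""
    diff = abs(sum(sample) / len(sample) - sum(baseline) / len(baseline))
    evidence = 0.9 if diff > 2.0 else 0.1
    odds = (p_leak / (1 - p_leak)) * (evidence / (1 - evidence))
    return odds / (1 + odds)

def adaptive_loop(p_leak=0.25, batch_size=1000, max_batches=50):
    for _ in range(max_batches):
        baseline = collect_batch(batch_size)
        sample = collect_batch(batch_size)
        p_leak = update_posterior(p_leak, baseline, sample)
        if p_leak < 0.05:            # confident there is no leak
            return "Pass", p_leak
        if p_leak > 0.95:            # confident there is a leak
            return "Fail", p_leak
    return "Inconclusive", p_leak    # budget exhausted

print(adaptive_loop())
```

Because the two toy classes here are drawn from the same distribution, the posterior typically collapses toward zero within a batch or two, showing the early-stop behavior on a clear pass.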
Wasserstein distance
Rather than comparing means (which can miss distributional differences), tacet uses the Wasserstein-1 distance (also known as the earth mover’s distance) to measure how different the timing distributions are:
$$ W_1 = \frac{1}{n} \sum_{i=1}^{n} \left| x_{(i)} - y_{(i)} \right| $$

where $x_{(i)}$ and $y_{(i)}$ are the $i$-th smallest values from each class.
Why W₁? The Wasserstein-1 distance captures the “cost” of transforming one distribution into another via optimal transport. Unlike a simple mean comparison, it’s sensitive to both location shifts and shape changes in the distribution.
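For equal-sized samples, the empirical estimator reduces to a mean absolute difference between rank-matched order statistics. A minimal sketch (tacet's internals may differ):

```python
def wasserstein_1(xs, ys):
    """Empirical Wasserstein-1 distance for equal-sized samples:
    mean absolute difference between rank-matched order statistics."""
    assert len(xs) == len(ys), "equal sample sizes assumed"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# A pure location shift: every timing is 3 ns slower in the second class.
a = [10.0, 12.0, 14.0, 16.0]
b = [13.0, 15.0, 17.0, 19.0]
print(wasserstein_1(a, b))  # 3.0
```

Note that a mean comparison would also catch this uniform shift, but W₁ additionally reacts when only the slow tail of one class moves, which a difference of means can average away.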
Effect decomposition
The W₁ distance can be decomposed into two interpretable components:
$$ W_1 \approx |\text{shift}| + \text{tail} $$

where, for the rank-matched differences $d_i = x_{(i)} - y_{(i)}$:

- shift = median of the rank-matched differences $d_i$
- tail = mean of the centered residuals $|d_i - \text{shift}|$
The tail share (the fraction of the W₁ distance contributed by the tail component) distinguishes timing leak patterns:
| Tail Share | Pattern | Likely cause |
|---|---|---|
| < 0.20 | UniformShift | Branch on secret data (all timings shift uniformly) |
| 0.20–0.50 | Mixed | Multiple leak sources (branches + cache effects) |
| > 0.50 | TailEffect | Data-dependent cache behavior (upper quantiles affected more) |
This decomposition is useful because:
- A pure UniformShift suggests a conditional branch on secret data
- A pure TailEffect suggests data-dependent cache misses or memory access patterns
- Mixed patterns suggest multiple leak sources or complex microarchitectural effects
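Under the definitions above, the decomposition can be sketched in a few lines. The exact residual formula (mean absolute deviation from the median shift) is an assumption based on the description, not taken from tacet's source:

```python
from statistics import median

def decompose_w1(xs, ys):
    """Split rank-matched differences into a uniform shift component
    (median) and a tail component (mean absolute centered residual)."""
    assert len(xs) == len(ys)
    d = [x - y for x, y in zip(sorted(xs), sorted(ys))]
    shift = median(d)
    tail = sum(abs(di - shift) for di in d) / len(d)
    total = abs(shift) + tail
    tail_share = tail / total if total > 0 else 0.0
    return shift, tail, tail_share

# Uniform shift: tail share near 0 -> suggests a secret-dependent branch.
uniform = decompose_w1([10, 12, 14, 16], [12, 14, 16, 18])
# Tail-only effect: only the slowest run differs -> tail share near 1,
# the cache-miss-like pattern.
tail_heavy = decompose_w1([10, 12, 14, 16], [10, 12, 14, 36])
print(uniform, tail_heavy)
```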
Bayesian inference
The core question: given observed timing data, what’s the probability that the timing difference exceeds your security threshold θ?
In plain English: We start by assuming there’s probably no leak (the prior). As we collect data, we update our belief. If the data strongly suggests a difference, our confidence in a leak increases. If not, it decreases. The final number (72%, 3%, etc.) represents how confident we are that there’s a security-relevant leak.
Prior
We start by assuming “probably no leak”:
- The prior is centered at zero effect
- Its spread is determined by how noisy our measurements are (the MDE)
- Default: 75% prior probability of no leak
Likelihood
Each batch of samples updates our belief. The likelihood tells us: “How probable is this data if the true W₁ distance is δ?” We model the W₁ distance using a Student’s t distribution with robust variance estimation to handle measurement uncertainty.
Posterior
After each batch, we combine our prior belief with the new evidence using Bayes’ theorem:
$$ p(\delta \mid \text{data}) \propto p(\text{data} \mid \delta)\, p(\delta) $$

This gives us an updated probability distribution over the true W₁ distance $\delta$.
Decision
We integrate the posterior to answer: “What’s the probability that the W₁ distance exceeds the security threshold θ?”
| Probability | Decision |
|---|---|
| P(δ > θ) < 0.05 | Pass: confident there’s no exploitable leak |
| P(δ > θ) > 0.95 | Fail: confident there’s a leak |
| otherwise | Continue collecting or Inconclusive |
This is different from DudeCT’s t-test approach. DudeCT asks “is there any statistical difference?” while tacet asks “is there a security-relevant difference above my threshold θ?”
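A grid-based sketch of this decision rule, assuming a spike-and-slab prior (75% of the mass on zero effect, as in the default prior above) and a normal likelihood in place of tacet's Student's t model; `decision` and all parameter values are illustrative:

```python
import math

def decision(w1_obs, se, theta, prior_zero=0.75, grid_max=50.0, n=2001):
    """Posterior over the true W1 distance delta on a grid, then the
    tail mass above the security threshold theta drives the verdict."""
    deltas = [grid_max * i / (n - 1) for i in range(n)]
    # Prior: a spike of mass at delta = 0, remainder spread over the grid.
    prior = [(1 - prior_zero) / n] * n
    prior[0] += prior_zero
    # Normal likelihood of the observed W1 under each candidate delta
    # (a simplification of the Student's t likelihood).
    lik = [math.exp(-0.5 * ((w1_obs - d) / se) ** 2) for d in deltas]
    post = [p * l for p, l in zip(prior, lik)]
    z = sum(post)
    p_leak = sum(p for p, d in zip(post, deltas) if d > theta) / z
    if p_leak < 0.05:
        return "Pass", p_leak
    if p_leak > 0.95:
        return "Fail", p_leak
    return "Continue", p_leak

print(decision(w1_obs=10.0, se=1.0, theta=5.0))  # clear leak: Fail
print(decision(w1_obs=0.2, se=1.0, theta=5.0))   # well below theta: Pass
```

The key point the sketch shows: an observed W₁ well above θ drives P(δ > θ) toward 1, while a small observed W₁ drives it toward 0 even though it is statistically nonzero, which is exactly where this differs from a plain t-test.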
Quality gates
The adaptive loop includes five quality gates that can stop sampling early:
| Gate | Triggers when | Outcome |
|---|---|---|
| DataTooNoisy | Posterior ≈ prior after calibration | Inconclusive |
| NotLearning | KL divergence collapsed for 5+ batches | Inconclusive |
| WouldTakeTooLong | Projected time exceeds budget | Inconclusive |
| TimeBudgetExceeded | Wall-clock time exceeded | Inconclusive |
| SampleBudgetExceeded | Sample limit reached | Inconclusive |
These gates prevent wasting time on tests that won’t converge.
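A sketch of one of these gates, NotLearning, using the KL divergence between successive leak posteriors; the window and tolerance values are illustrative, not tacet's defaults:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between two Bernoulli 'leak' posteriors."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def not_learning(p_history, window=5, tol=1e-4):
    """Fire the gate when the posterior has barely moved (tiny KL step)
    for `window` consecutive batches."""
    if len(p_history) < window + 1:
        return False
    recent = p_history[-(window + 1):]
    steps = [kl_bernoulli(recent[i + 1], recent[i]) for i in range(window)]
    return all(s < tol for s in steps)

# Posterior stuck near 0.5 for many batches: gate fires.
print(not_learning([0.5, 0.50001, 0.50002, 0.50001, 0.5, 0.50001, 0.5]))
# Posterior still moving toward a verdict: keep sampling.
print(not_learning([0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]))
```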
Adaptive batching
On platforms with coarse timer resolution (e.g., Apple Silicon’s ~42ns counter), single-operation measurements are quantized. Adaptive batching compensates:
- Pilot measurement: Measure ~100 warmup iterations to estimate the per-operation time
- Batch size $K$ selection: Choose a batch size $K$ to achieve ≥50 timer ticks per measurement:

  $$ K = \left\lceil \frac{50\, r}{\hat{t}_{\text{op}}} \right\rceil $$

  where $r$ is the timer resolution in nanoseconds and $\hat{t}_{\text{op}}$ is the per-operation time estimated by the pilot.

- Bound $K$: Cap $K$ to avoid microarchitectural artifacts
- Batch measurement: Measure $K$ iterations together, then analyze the batch totals
Batching is disabled when:
- Timer resolution is already fine (>5 ticks per call)
- Using cycle-accurate timers (kperf, perf_event)
- Operation is slow enough
- Running on x86_64 (rdtsc provides cycle-accurate timing)
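The selection rule above can be sketched as follows, folding in the "timer already fine" disable condition; `k_max` is an illustrative cap, not tacet's actual bound:

```python
import math

def choose_batch_size(t_op_ns, timer_resolution_ns, min_ticks=50, k_max=1024):
    """Pick a batch size K so each measurement spans at least min_ticks
    timer ticks. t_op_ns comes from the pilot measurement; k_max is an
    illustrative cap on K to limit microarchitectural artifacts."""
    # Batching disabled when a single call already spans >= 5 ticks.
    if t_op_ns >= 5 * timer_resolution_ns:
        return 1
    k = math.ceil(min_ticks * timer_resolution_ns / t_op_ns)
    return min(k, k_max)

# Apple Silicon-style ~42 ns counter, fast 10 ns operation:
print(choose_batch_size(t_op_ns=10, timer_resolution_ns=42))   # 210
# Slow 1 us operation: timer is already fine, no batching.
print(choose_batch_size(t_op_ns=1000, timer_resolution_ns=42)) # 1
```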
Block bootstrap
Variance estimation uses a circular block bootstrap to account for potential autocorrelation in timing data:
- Divide the samples into overlapping blocks of length $\ell$
- Resample blocks with replacement (wrapping circularly at the end of the series)
- Compute the W₁ distance on the resampled data
- Repeat $B$ times to estimate the variance
Block bootstrap handles short-range dependencies that might arise from CPU state, cache effects, or measurement overhead.
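A sketch of the circular variant, shown here estimating the variance of a simple statistic (the mean) rather than the W₁ distance:

```python
import random

def circular_block_bootstrap(samples, block_len, n_resamples, stat):
    """Estimate the sampling variance of `stat` under short-range
    dependence: resample overlapping blocks (wrapping past the end of
    the series) with replacement, recomputing the statistic each time."""
    n = len(samples)
    n_blocks = -(-n // block_len)  # ceil division
    stats = []
    for _ in range(n_resamples):
        resampled = []
        for _ in range(n_blocks):
            start = random.randrange(n)
            # circular indexing lets a block wrap around the end
            resampled.extend(samples[(start + j) % n]
                             for j in range(block_len))
        stats.append(stat(resampled[:n]))
    mean = sum(stats) / len(stats)
    return sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)

random.seed(42)
data = [random.gauss(100, 5) for _ in range(500)]
var = circular_block_bootstrap(data, block_len=25, n_resamples=200,
                               stat=lambda xs: sum(xs) / len(xs))
print(var)  # for this i.i.d. data, roughly sigma^2/n = 25/500
```

Keeping whole blocks intact (rather than resampling individual timings) is what preserves the short-range correlation structure within each block.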
References
For the complete mathematical specification, see the Specification.
Academic references:
- Reparaz, O., Balasch, J., & Verbauwhede, I. (2016). “Dude, is my code constant time?” DATE. (Original DudeCT methodology)
- Van Goethem, T., et al. (2020). “Timeless Timing Attacks.” USENIX Security. (HTTP/2 timing attacks)
- Crosby, S. A., Wallach, D. S., & Riedi, R. H. (2009). “Opportunities and limits of remote timing attacks.” ACM TISSEC. (Effect size to exploitability mapping)