# How It Works
This page describes the statistical methodology used by tacet to detect timing side channels.
## The core idea

In plain English: We run your code thousands of times with two types of inputs (baseline and sample). If the timing distributions differ by more than your threshold, we report a leak. The key innovation is how we make that comparison:
1. Compare distributions, not averages: A timing leak might only affect some runs (cache misses, branch mispredictions), so we compare the full timing distribution using multiple percentiles.

2. Bayesian probability: Instead of a yes/no answer, we compute the probability that there's a security-relevant leak. This lets us say "72% confident there's a leak" rather than just "the t-statistic is 4.7."

3. Adaptive stopping: We keep collecting samples until we're confident enough to decide. Clear cases stop early; borderline cases get more evidence.
## Two-phase pipeline

tacet uses an adaptive two-phase approach:

```
              Timing Samples
                     │
        ┌────────────┴────────────┐
        ▼                         ▼
┌───────────────┐        ┌──────────────────┐
│   Phase 1:    │        │     Phase 2:     │
│  Calibration  │        │  Adaptive Loop   │
├───────────────┤        ├──────────────────┤
│ 5,000 samples │        │ Batch collection │
│ Covariance    │        │ Posterior update │
│  estimation   │        │ Decision check   │
│ Prior setup   │ ────>  │ Quality gates    │
│               │        │                  │
│ Fixed cost    │        │ Stops when       │
│               │        │ P < 0.05 or      │
│               │        │ P > 0.95         │
└───────────────┘        └──────────────────┘
```

### Phase 1: Calibration

The calibration phase collects an initial batch of samples (default: 5,000 per class) to:
1. Estimate noise covariance: Timing measurements are noisy. A block bootstrap estimates the covariance structure of this noise.

2. Compute the minimum detectable effect (MDE): Based on noise levels, what's the smallest effect we could reliably detect?

3. Set Bayesian priors: The prior distribution is centered on zero effect, with spread determined by the MDE.
### Phase 2: Adaptive loop

After calibration, the adaptive loop collects samples in batches until a decision is reached:

1. Collect batch: Gather N samples per class (default: 1,000)

2. Update posterior: Combine the new data with the existing posterior using Bayesian updating

3. Check decision thresholds: If P(leak) < 0.05, pass. If P(leak) > 0.95, fail.

4. Apply quality gates: Check for conditions that would make continued sampling futile

5. Repeat or stop: Either collect more samples or return a result
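The steps above can be sketched as a simple loop. This is an illustrative skeleton only, not tacet's actual API: `collect_batch`, `update_posterior`, and `p_leak` are hypothetical stand-ins for the real machinery.

```python
def adaptive_loop(collect_batch, update_posterior, p_leak,
                  batch_size=1_000, max_samples=100_000):
    """Illustrative adaptive sampling loop (not tacet's real API)."""
    posterior = None
    total = 0
    p = 0.5                                   # undecided until evidence arrives
    while total < max_samples:                # sample-budget quality gate
        baseline, sample = collect_batch(batch_size)               # 1. collect
        posterior = update_posterior(posterior, baseline, sample)  # 2. update
        total += batch_size
        p = p_leak(posterior)                 # 3. check decision thresholds
        if p < 0.05:
            return "Pass", p                  # confident: no leak
        if p > 0.95:
            return "Fail", p                  # confident: leak
    return "Inconclusive", p                  # 5. budget exhausted
```

A clear case exits on the first batch; a borderline one keeps looping until a budget or quality gate fires.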
This adaptive approach:
- Stops early on clear pass/fail cases (saves time)
- Gathers more evidence when the signal is near the threshold
- Returns Inconclusive when hitting resource limits or quality gates
## Quantile-based statistics

Rather than comparing means (which can miss distributional differences), tacet compares nine deciles between the baseline and sample classes:

Δᵢ = qᵢ(sample) − qᵢ(baseline)  for each decile i = 0.1, 0.2, …, 0.9
This captures two types of effects:
- Uniform shift: All quantiles move by the same amount (e.g., different code path)
- Tail effect: Upper quantiles move more than lower (e.g., cache misses)
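As a purely illustrative sketch (tacet's own implementation is not shown here), the decile comparison can be expressed with NumPy's `quantile`:

```python
import numpy as np

def decile_differences(baseline, sample):
    """Difference of the nine deciles (10%..90%) between two timing classes."""
    probs = np.arange(1, 10) / 10.0           # 0.1, 0.2, ..., 0.9
    return np.quantile(sample, probs) - np.quantile(baseline, probs)

# A uniform shift moves every decile by the same amount:
rng = np.random.default_rng(0)
base = rng.normal(100.0, 5.0, 10_000)         # synthetic baseline timings (ns)
print(decile_differences(base, base + 3.0))   # ≈ [3. 3. 3. 3. 3. 3. 3. 3. 3.]
```

A tail effect would instead show small differences at the low deciles and larger ones at the high deciles.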
## Effect decomposition

The difference vector Δ is projected onto an orthogonalized basis:

Δ = α·u + β·v + ε

where u is the (normalized) uniform vector (1, 1, …, 1) and v is a linear contrast vector, increasing across the deciles and orthogonalized against u.

- α (shift): Uniform component affecting all quantiles equally
- β (tail): Linear component affecting upper quantiles more
- ε: Residual noise
This decomposition is useful because:
- A pure `UniformShift` suggests a branch on secret data
- A pure `TailEffect` suggests data-dependent cache behavior
- Mixed patterns suggest multiple leak sources
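A numeric sketch of the projection. The exact basis vectors are assumptions here: an all-ones shift direction and a −4…+4 linear contrast, both normalized.

```python
import numpy as np

# Assumed orthonormal basis over the nine deciles (illustrative):
u = np.ones(9) / 3.0                          # uniform (shift) direction, |u| = 1
v = np.arange(-4.0, 5.0)                      # linear contrast -4, -3, ..., 4
v /= np.linalg.norm(v)                        # unit length; sums to 0, so v ⊥ u

def decompose(delta):
    """Split a decile-difference vector into shift, tail, and residual parts."""
    alpha = delta @ u                         # uniform component (shift)
    beta = delta @ v                          # linear component (tail)
    resid = delta - alpha * u - beta * v      # residual noise
    return alpha, beta, resid

# A pure uniform shift of 3 ns projects entirely onto u:
alpha, beta, _ = decompose(np.full(9, 3.0))
print(round(alpha, 6))                        # 9.0 -- and beta ≈ 0 for a pure shift
```

A tail effect would instead load mostly onto β, and a mixed leak onto both.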
## Bayesian inference

The core question: given the observed timing data, what's the probability that the timing difference exceeds your security threshold θ?
In plain English: We start by assuming there’s probably no leak (the prior). As we collect data, we update our belief. If the data strongly suggests a difference, our confidence in a leak increases. If not, it decreases. The final number (72%, 3%, etc.) represents how confident we are that there’s a security-relevant leak.
### Prior

We start by assuming "probably no leak":

- The prior is centered at zero effect
- Its spread is determined by how noisy our measurements are (the MDE)
- Default: 75% prior probability of no leak
### Likelihood

Each batch of samples updates our belief. The likelihood tells us: "How probable is this data if the true effect is X?" We model timing differences using multivariate normal distributions.
### Posterior

After each batch, we combine our prior belief with the new evidence using Bayes' theorem:

P(effect | data) ∝ P(data | effect) × P(effect)

This gives us an updated probability distribution over possible effect sizes.
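In one dimension, with a normal prior and normal likelihood, this update has a closed form. A sketch under that assumption (tacet's actual posterior is multivariate, over the full decile-difference vector):

```python
def normal_update(prior_mu, prior_var, batch_mean, batch_var, n):
    """Conjugate normal update: fold n new observations into a N(mu, var) prior."""
    like_var = batch_var / n                      # variance of the batch mean
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mu = post_var * (prior_mu / prior_var + batch_mean / like_var)
    return post_mu, post_var

# Prior centered on zero effect; one batch of 1,000 samples with mean diff 2.0 ns:
mu, var = normal_update(0.0, 4.0, 2.0, 100.0, 1_000)
print(round(mu, 3), round(var, 4))                # 1.951 0.0976
```

The posterior mean is pulled toward the data, and its variance shrinks with every batch.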
### Decision

We integrate the posterior to answer: "What's the probability P(leak) that the effect exceeds the security threshold θ?"

| Probability | Decision |
|---|---|
| P(leak) < 0.05 | Pass: confident there's no exploitable leak |
| P(leak) > 0.95 | Fail: confident there's a leak |
| 0.05 ≤ P(leak) ≤ 0.95 | Continue collecting, or Inconclusive |
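Continuing the one-dimensional sketch, the decision rule integrates the posterior tails beyond ±θ. The normal posterior here is a hypothetical stand-in for tacet's multivariate one:

```python
from statistics import NormalDist

def p_exceeds(mu, sigma, theta):
    """P(|effect| > theta) under an illustrative normal posterior N(mu, sigma)."""
    post = NormalDist(mu, sigma)
    return post.cdf(-theta) + (1.0 - post.cdf(theta))

def decide(p):
    if p < 0.05:
        return "Pass"          # confident there's no exploitable leak
    if p > 0.95:
        return "Fail"          # confident there's a leak
    return "Continue"          # keep sampling (or Inconclusive at limits)

print(decide(p_exceeds(mu=0.1, sigma=1.0, theta=10.0)))   # Pass
print(decide(p_exceeds(mu=50.0, sigma=2.0, theta=10.0)))  # Fail
```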
This differs from DudeCT's t-test approach: DudeCT asks "is there any statistical difference?" while tacet asks "is there a security-relevant difference above my threshold θ?"
## Quality gates

The adaptive loop includes five quality gates that can stop sampling early:

| Gate | Triggers when | Outcome |
|---|---|---|
| `DataTooNoisy` | Posterior ≈ prior after calibration | Inconclusive |
| `NotLearning` | KL divergence collapsed for 5+ batches | Inconclusive |
| `WouldTakeTooLong` | Projected time exceeds budget | Inconclusive |
| `TimeBudgetExceeded` | Wall-clock time exceeded | Inconclusive |
| `SampleBudgetExceeded` | Sample limit reached | Inconclusive |
These gates prevent wasting time on tests that won’t converge.
## Adaptive batching

On platforms with coarse timer resolution (e.g., Apple Silicon's ~42 ns counter), single-operation measurements are quantized. Adaptive batching compensates:

1. Pilot measurement: Measure ~100 warmup iterations to estimate the operation time

2. K selection: Choose batch size K to achieve ≥50 timer ticks per measurement:

   K = ⌈50 · r / t_op⌉

   where r is the timer resolution in nanoseconds and t_op is the estimated per-operation time

3. Bound K: Cap K at a fixed maximum to avoid microarchitectural artifacts

4. Batch measurement: Measure K iterations together and analyze batch totals
Batching is disabled when:
- Timer resolution is already fine (>5 ticks per call)
- Using cycle-accurate timers (kperf, perf_event)
- The operation is slow enough that timer quantization is negligible
- Running on x86_64 (rdtsc provides cycle-accurate timing)
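The K selection can be sketched directly. The ≥50-tick target comes from the text above; the cap `k_max` is a hypothetical placeholder, since the actual bound isn't stated here.

```python
import math

def batch_size(op_time_ns, timer_resolution_ns, min_ticks=50, k_max=10_000):
    """Pick batch size K so one measurement spans >= min_ticks timer ticks.
    k_max is a hypothetical cap, not tacet's actual bound."""
    k = math.ceil(min_ticks * timer_resolution_ns / op_time_ns)
    return max(1, min(k, k_max))              # bound K; K = 1 means no batching

# Apple Silicon's ~42 ns counter timing a ~20 ns operation:
print(batch_size(op_time_ns=20, timer_resolution_ns=42))     # 105

# A slow operation already spans plenty of ticks, so batching is unnecessary:
print(batch_size(op_time_ns=5_000, timer_resolution_ns=42))  # 1
```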
## Block bootstrap

Covariance estimation uses a circular block bootstrap to account for potential autocorrelation in timing data:

- Divide the samples into overlapping blocks of length ℓ
- Resample blocks with replacement, wrapping around the end of the series
- Compute quantile differences on the resampled data
- Repeat B times to estimate the covariance matrix
Block bootstrap handles short-range dependencies that might arise from CPU state, cache effects, or measurement overhead.
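A sketch of the procedure; the block length and replicate count below are illustrative choices, not tacet's defaults.

```python
import numpy as np

def circular_resample(x, block_len, rng):
    """One circular-block resample: blocks wrap past the end of the series."""
    n = len(x)
    wrapped = np.concatenate([x, x[:block_len]])       # circular wrap-around
    n_blocks = -(-n // block_len)                      # ceil(n / block_len)
    starts = rng.integers(0, n, size=n_blocks)         # random block starts
    return np.concatenate([wrapped[s:s + block_len] for s in starts])[:n]

def quantile_diff_cov(baseline, sample, block_len=50, n_boot=200, seed=0):
    """Estimate the 9x9 covariance of the decile differences by block bootstrap."""
    rng = np.random.default_rng(seed)
    probs = np.arange(1, 10) / 10.0
    deltas = [
        np.quantile(circular_resample(sample, block_len, rng), probs)
        - np.quantile(circular_resample(baseline, block_len, rng), probs)
        for _ in range(n_boot)
    ]
    return np.cov(np.array(deltas), rowvar=False)      # variables = 9 deciles

rng = np.random.default_rng(1)
cov = quantile_diff_cov(rng.normal(100, 5, 2_000), rng.normal(100, 5, 2_000))
print(cov.shape)                                       # (9, 9)
```

Resampling whole blocks, rather than individual samples, preserves the short-range dependence the covariance estimate needs to capture.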
## References

For the complete mathematical specification, see the Specification.
Academic references:
- Reparaz, O., Balasch, J., & Verbauwhede, I. (2016). “Dude, is my code constant time?” DATE. (Original DudeCT methodology)
- Van Goethem, T., et al. (2020). “Timeless Timing Attacks.” USENIX Security. (HTTP/2 timing attacks)
- Crosby, S. A., Wallach, D. S., & Riedi, R. H. (2009). “Opportunities and limits of remote timing attacks.” ACM TISSEC. (Effect size to exploitability mapping)