We look for reasons to reject your strategy

Most validation confirms what traders already believe. Ours does the opposite. Our proprietary framework applies a sequence of statistical tests designed to identify the failure modes that standard backtesting misses.


What we test for
Beta confound detection
We test whether your signal captures genuine timing alpha or simply benefits from directional exposure to a trending asset. This single class of test has eliminated strategies with Sharpe ratios above 1.5 in our internal research.
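The core of a beta-confound check can be sketched as a plain exposure regression: strip out the component of strategy returns explained by the underlying asset and test whether anything is left. The function below is an illustrative stand-in, not the production test; the name and thresholds are ours.

```python
import numpy as np

def beta_confound_check(strategy_rets, asset_rets):
    """Regress strategy returns on the underlying asset's returns.

    If beta is large while the intercept (alpha) is statistically
    indistinguishable from zero, the apparent edge is directional
    exposure, not timing skill. Illustrative sketch only.
    """
    x = np.asarray(asset_rets)
    y = np.asarray(strategy_rets)
    X = np.column_stack([np.ones_like(x), x])     # [intercept, beta]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta = coef
    resid = y - X @ coef
    dof = len(y) - 2
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
    t_alpha = alpha / np.sqrt(cov[0, 0])          # t-stat of the intercept
    return alpha, beta, t_alpha
```

A strategy that is secretly "long a trending asset" shows a large `beta` and a `t_alpha` near zero.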
Proper statistical inference
Financial returns are autocorrelated. Standard significance tests produce misleading results. We use advanced bootstrap methods that preserve the dependence structure of your data, producing reliable p-values where naive methods fail.
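One standard way to preserve dependence is a moving-block bootstrap: resample contiguous blocks of returns rather than individual observations, so autocorrelation survives the resampling. The sketch below is a minimal illustration of that idea, not our proprietary procedure; the block length and shuffle count are placeholder choices.

```python
import numpy as np

def block_bootstrap_pvalue(returns, block_len=20, n_boot=2000, seed=0):
    """One-sided p-value for mean return > 0 via a moving-block bootstrap.

    Resampling contiguous blocks preserves the autocorrelation that
    makes naive i.i.d. significance tests overconfident.
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(returns)
    n = len(r)
    centred = r - r.mean()                  # impose the null: zero mean
    n_blocks = int(np.ceil(n / block_len))
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, n_blocks)
        sample = np.concatenate([centred[s:s + block_len] for s in starts])[:n]
        means[b] = sample.mean()
    # Fraction of null-resampled means at least as large as the observed mean
    return (means >= r.mean()).mean()
```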
Realistic cost modeling
Strategies are stress-tested at multiples of estimated transaction costs — including spread, commission, and market impact. Edges that vanish under conservative cost assumptions are identified and flagged before they consume capital.
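The stress itself is simple arithmetic. As a hypothetical sketch (the helper name and multiples are ours), using figures consistent with case study 01 below, where a +23.6 bps gross edge nets +18.4 bps, implying roughly 5.2 bps of round-trip cost:

```python
def cost_stress(gross_bps_per_trade, est_cost_bps_rt, multiples=(1, 2, 3)):
    """Net edge per trade after stressing round-trip costs at several multiples.

    An edge is flagged when a conservative multiple drives it negative.
    Figures and multiples here are illustrative.
    """
    return {m: gross_bps_per_trade - m * est_cost_bps_rt for m in multiples}

# A 23.6 bps gross edge with ~5.2 bps estimated round-trip cost:
# nets +18.4 bps at 1x, +13.2 at 2x, +8.0 at 3x -- this one survives.
```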
Regime robustness
We require strategies to demonstrate persistence across distinct market environments (including crisis periods and low-volatility regimes). A signal that works only in favorable conditions is not a signal.
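Mechanically, the persistence check amounts to computing performance within each environment separately. A minimal sketch, assuming daily bars and caller-supplied regime labels (both assumptions for illustration, not part of the framework spec):

```python
import numpy as np

def sharpe_by_regime(returns, regime_labels):
    """Annualized Sharpe computed separately within each labelled regime.

    A signal that only earns in one environment fails the persistence
    requirement. Annualization factor assumes daily bars.
    """
    r = np.asarray(returns)
    labels = np.asarray(regime_labels)
    out = {}
    for regime in np.unique(labels):
        sub = r[labels == regime]
        if len(sub) > 1 and sub.std(ddof=1) > 0:
            out[regime] = np.sqrt(252) * sub.mean() / sub.std(ddof=1)
    return out
```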

Gate 1

Statistical Screen

A fast, rigorous screen designed to kill quickly. Multiple layers of statistical testing assess whether the core signal has any validity before deeper resources are committed.

8
Test layers
~99%
Kill rate at this gate
Gate 2

Deep Validation

For survivors only. Real market microstructure data, bid-ask analysis, regime robustness across multiple environments, decay diagnostics, and portfolio-level correlation testing.

41
Additional layers
~50%
Kill rate of Gate 1 survivors

10
Synthetic test cases
covering all 8 layers
100%
Detection rate on
planted confounds
0.17
Expected false positives
per 1,576 null strategies
10⁻¹¹
P(≥ 8 survivors)
under the null hypothesis

The deliverable

A verdict, not an opinion

Every engagement produces a structured validation report: a binary pass/fail verdict on each test layer, supported by quantitative evidence.

What you receive

A comprehensive document designed to serve as an independent due-diligence artifact, whether for your own capital-allocation decisions or for presentation to investors and allocators.

01
Validation summary
A single-page executive summary with an overall verdict (advance, reject, or conditional) supported by the key metrics that determined the outcome.
02
Layer-by-layer results
Every validation layer reports a binary pass/fail with the underlying test statistic, confidence level, and a plain-language interpretation.
03
Risk diagnostics
Drawdown profile, regime sensitivity, cost sensitivity, and correlation to common risk factors. A complete picture of where and how the strategy could fail.
04
Debrief
For full validation and multi-strategy engagements, a live session with the research team to walk through findings and discuss deployment implications.
View engagement options

From the kill log

Why strategies fail

Anonymized case studies from our internal research. Each illustrates a different failure mode that the framework is designed to catch — and that standard backtesting misses.

Case study 01 · Domain A · Equities
Month-End Rebalancing Flow
US equities & bonds · Daily resolution · 2005–2026
KILLED — L8
Layer | Test                               | Statistic      | Verdict
L1    | Direction shuffle                  | p = 0.048      | PASS
L2    | Bootstrap CI                       | CI excludes 0  | PASS
L3    | Sharpe floor                       | 0.52           | PASS
L4    | Cost absorption                    | +18.4 bps net  | PASS
L5    | Walk-forward                       | 3/5 positive   | PASS
L6    | Bonferroni                         | p = 0.048      | PASS
L7    | N floor                            | N = 174        | PASS
L8    | Timing alpha (regime-conditional)  | p = 0.991      | KILL

Hypothesis

Month-end portfolio rebalancing by index funds and pension managers creates predictable flow in equity/bond pairs. The signal captures the rebalancing window and fades the expected flow direction.

What the standard backtest showed

Positive mean return of +23.6 bps per trade, Sharpe of 0.52, and a statistically significant direction shuffle (p = 0.048). Six of the first seven layers passed. By any conventional backtest standard, this looked like a viable strategy.

What L8 revealed

The L8 timing shuffle compares the strategy's entries against regime-conditional random entries on the same asset. The result, p = 0.991, means that random entries at the same frequency matched or beat the signal in 99% of trials. The entire apparent edge was explained by directional exposure to the underlying asset, not by the rebalancing mechanism. The signal was an expensive way to be long in an uptrend.
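A stripped-down version of the timing shuffle can be written as a permutation test: shuffle the entry mask and ask how often random timing does at least as well. This sketch omits the regime conditioning the production L8 test applies, and assumes one-bar holding periods:

```python
import numpy as np

def timing_alpha_pvalue(asset_rets, entry_mask, n_shuffles=5000, seed=0):
    """Permutation test: does entry timing beat random entries at the
    same frequency on the same asset?

    The mean return over entered bars is compared against the
    distribution obtained by shuffling the entry mask. A p-value near 1
    means random timing did as well or better: the edge is exposure,
    not timing. Simplified sketch (one-bar holds, no regime buckets).
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(asset_rets)
    mask = np.asarray(entry_mask, dtype=bool)
    observed = r[mask].mean()
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        null[i] = r[rng.permutation(mask)].mean()
    return (null >= observed).mean()
```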

Research finding: Rebalancing flow at month-end is continuation, not reversion. The signal was anti-signal — fading flow that was informationally correct. This closed the daily-frequency rebalancing frontier across all tested asset pairs.

Case study 02 · Domain B · FX
Macro Announcement Overshoot Reversion
Major FX pair · 1-minute resolution · 2014–2026
KILLED — L7
Layer | Test                        | Statistic      | Verdict
L1    | Direction shuffle           | p = 0.001      | PASS
L2    | Bootstrap CI                | CI excludes 0  | PASS
L3    | Sharpe floor                | 0.71           | PASS
L4    | Cost absorption             | +9.2 bps net   | PASS
L5    | Walk-forward                | 4/5 positive   | PASS
L6    | Bonferroni                  | p = 0.009      | PASS
L7    | N floor (Domain B = 300)    | N = 170        | KILL
L8    | Timing alpha                | p = 0.0005     | PASS

Hypothesis

Algorithmic reaction to major macroeconomic releases creates a temporary overshoot in the first 30 seconds. Real-money flow normalizes the price over the following 5–35 minutes. The signal captures this reversion.

Why this is the hardest kill in the program

The mechanism is real. L8 returned p = 0.0005 — the strongest timing alpha confirmation in the entire research program. The forced actor is identifiable, the constraint is binding, and the statistical evidence is overwhelming. But the event occurs only ~17 times per year, producing N = 170 over 10 years of data. Domain B requires N ≥ 300. The strategy was killed by sample sufficiency, not by lack of edge.

Why we didn't bend the rule

We attempted cross-pair pooling (testing the same mechanism on correlated pairs to increase N). The correlation between pairs was ρ = 0.82, meaning the effective independent sample size barely increased. Pooling correlated observations inflates apparent statistical power without adding real information. The N floor exists precisely to prevent this.
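The effect of pooling correlated series can be quantified with the standard variance-of-the-mean formula for k equally correlated samples. A sketch (the formula is textbook; the framework's internal adjustment may differ):

```python
def effective_n(n_per_series, k, rho):
    """Effective independent sample size when pooling k equally-correlated
    series (pairwise correlation rho), each contributing n observations.

    Derived from the variance of the pooled mean; at rho = 1 extra
    series add no information at all.
    """
    return k * n_per_series / (1 + (k - 1) * rho)

# Pooling a second FX pair at rho = 0.82 lifts N = 170 only to ~187,
# still far below the Domain B floor of 300.
```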

Research finding: A confirmed mechanism is not the same as a tradeable edge. The N floor is the binding constraint for event-driven FX at minute resolution — a structural limitation of the data, not a framework calibration error. Cross-pair pooling does not solve the problem when pairs are correlated.

Case study 03 · Domain B · Crypto
Cascade Spillover at Hourly Resolution
Altcoin perpetual · 1-hour bars · 2020–2026
KILLED — Cost
Layer | Test                           | Statistic      | Verdict
L1    | Direction shuffle              | p = 0.042      | PASS
L4    | Cost absorption (16 bps RT)    | −2.4 bps net   | KILL
L8    | Timing alpha                   | p = 0.030      | PASS

Hypothesis

When a large-cap cryptocurrency drops sharply, exchange liquidation engines force-sell correlated altcoin positions. This creates a temporary overshoot that reverts as organic liquidity returns. The signal captures the reversion on a mid-cap altcoin at hourly resolution.

What happened

The mechanism is confirmed — L8 timing alpha is significant at p = 0.030. But the gross edge of +13.6 bps per trade cannot survive 16 bps in round-trip transaction costs (spread + commission + impact). Net return is negative. The edge exists but is not tradeable at this resolution and cost structure.

Research finding: The same mechanism tested at higher resolution (15-minute bars) with tighter spreads survived cost absorption and passed full validation. Resolution and cost structure determine whether a confirmed mechanism translates into a tradeable edge. The framework tests both independently.


Submit your strategy for independent validation. We return a proposal within 48 hours.

or contact us