ChiNext50 Regime Review (2026-04-09)

Executive summary

The current system has a real defensive effect, but the present end-to-end result is not primarily a threshold-tuning problem. It is first a system integrity problem:

breadth_score and crowding_score are effectively broken on the supplied PIT dataset because one z-scored component is constant, so the weighted sum becomes all-NaN.
down_hazard, repair_hazard, and rebound_hazard collapse to ~0.5 everywhere because the raw hazard inputs are NaN and get filled to zero inside the sigmoid.
This silently degenerates the state machine into a 3-state controller (chop, trend, risk_off), so repair and euphoric_late logic is mostly dead.
Policy candidate differentiation is partly fake: baseline and pro_risk produce identical exposure paths on the supplied run because coarse quantization collapses their differences.
Frozen walk-forward is too weak to support strong model-selection claims because only 2 windows are actually processed.

Only after fixing these should you trust threshold tuning and objective redesign.

Confirmed current behavior from the supplied bundle

Full-sample strategy metrics are materially below benchmark on annual return and upside capture, while max drawdown is much better.
Full-sample state counts in the saved run are effectively only chop, trend, and risk_off.
In my local replay of the supplied code + PIT:
- breadth_score non-null ratio = 0.0
- crowding_score non-null ratio = 0.0
- down_hazard, repair_hazard, rebound_hazard = 0.5 nearly everywhere
- baseline and pro_risk exposure paths are identical

Root causes

1. NaN propagation in `model/scores.py`

The weighted score sums do not protect against NaN sub-components. If any sub-score is all-NaN, the whole composite score becomes all-NaN.

On the supplied PIT, concentration_spread_5 = weighted_ret_5 - eq_weight_ret_5 is constant at 0.002, so its rolling standard deviation is zero and the rolling z-score evaluates to 0/0, which is NaN on every row. This breaks both:

breadth_score
crowding_score
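
The failure mode and one possible repair can be sketched in plain Python. This is a minimal illustration, not the code in `model/scores.py`: the component names and weights are hypothetical, and renormalizing the surviving weights is only one of several reasonable NaN-safe policies.

```python
import math

def naive_weighted_sum(components, weights):
    # Any NaN component poisons the whole sum, as described above.
    return sum(weights[name] * value for name, value in components.items())

def nan_safe_weighted_sum(components, weights):
    """Weighted average over the non-NaN components only.

    Weights of the surviving components are renormalized; the result is
    NaN only when every component is missing.
    """
    total, weight_used = 0.0, 0.0
    for name, value in components.items():
        if math.isnan(value):
            continue  # skip the dead component instead of propagating NaN
        total += weights[name] * value
        weight_used += weights[name]
    return total / weight_used if weight_used > 0 else math.nan

# Hypothetical sub-scores for a single day; the second one is the dead z-score.
components = {"participation_z": 0.5, "concentration_spread_z": math.nan}
weights = {"participation_z": 0.6, "concentration_spread_z": 0.4}
print(naive_weighted_sum(components, weights))     # nan
print(nan_safe_weighted_sum(components, weights))  # 0.5
```

Renormalizing keeps the composite on the same scale as before, but it also hides the fact that a component is missing, so it is best paired with an explicit data-quality warning.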

2. Hazard collapse

Hazards are built from raw formulas that reference the broken scores, so the raw hazard inputs are NaN. They are then passed through:

rolling_zscore(...)
_sigmoid(series.fillna(0.0))

This turns missing hazard information into the neutral constant 0.5, which prevents the system from noticing it is effectively blind.
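
The collapse follows directly from sigmoid(0) = 0.5. The sketch below shows that identity and a hypothetical fail-loud wrapper; `hazard_from_z` and its `max_nan_ratio` parameter are illustrative names, not functions from the supplied bundle.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hazard_from_z(z_values, max_nan_ratio=0.5):
    """Map z-scored hazard inputs into (0, 1) without masking missing data.

    Raises when too much of the input is NaN, and keeps remaining NaNs as
    NaN instead of filling them with the neutral value.
    """
    if not z_values:
        raise ValueError("hazard input is empty")
    nan_ratio = sum(math.isnan(z) for z in z_values) / len(z_values)
    if nan_ratio > max_nan_ratio:
        raise ValueError(f"hazard input is {nan_ratio:.0%} NaN; refusing to fill")
    return [math.nan if math.isnan(z) else sigmoid(z) for z in z_values]

# Filling NaN with 0.0 before the sigmoid always yields the neutral 0.5.
print(sigmoid(0.0))  # 0.5
```

With this shape, an all-NaN hazard input stops the run instead of producing a flat 0.5 series that looks like a valid, uninformative signal.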

3. Candidate selection collapse

The policy layer uses coarse quantization:

allowed levels: {0.0, 0.25, 0.50, 0.75, 1.0}

As a result:

trend = 0.95 and trend = 1.00 both quantize to 1.0
many repair and chop parameter tweaks collapse to the same discrete levels

That is why baseline and pro_risk can become identical even though their YAML values differ.
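
A small sketch makes the collapse concrete. It assumes the policy layer snaps to the nearest ladder level, which is consistent with 0.95 mapping to 1.0 but is not confirmed by the bundle; the 0.28 and 0.36 inputs are made-up values chosen only to show two distinct settings landing on the same level.

```python
def quantize(exposure, step):
    """Snap exposure to the nearest multiple of `step`, clamped to [0, 1].

    Ties follow Python's round-half-to-even; the real policy layer's
    tie-breaking rule is unknown.
    """
    levels = round(1.0 / step)
    snapped = round(exposure * levels) / levels
    return min(1.0, max(0.0, snapped))

# On the 0.25 ladder, distinct raw settings become indistinguishable.
print(quantize(0.95, 0.25), quantize(1.00, 0.25))  # 1.0 1.0
print(quantize(0.28, 0.25), quantize(0.36, 0.25))  # 0.25 0.25

# On a 0.10 ladder the second pair separates again.
print(quantize(0.28, 0.10), quantize(0.36, 0.10))  # 0.3 0.4
```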

4. Walk-forward sample weakness

The frozen WF windows start in 2016 while the supplied PIT starts in 2020. So the first window is skipped, leaving only 2 processed windows. This is too thin for robust selection.

5. Execution calibration objective is mis-scaled

Current calibration score:

utility_total_score - 3*tracking_diff_abs_mean - 20*tracking_error_20_p95 - max_drawdown

On the supplied run:

max_drawdown is ~0.32 and dominates the score
tracking penalties are tiny in absolute magnitude
utility spread between candidates is small

So the calibration is effectively “pick the lowest cost / smallest MDD” rather than meaningfully trading off return, utility, and tracking.

Direction judgment

The overall direction is still valid:

regime-aware exposure control
preserve drawdown advantage
recover upside via repair/trend participation

But the current implementation is not yet a true regime system. In practice, it behaves like:

a coarse 3-state exposure smoother
using mostly price/stress information
with breadth/crowding/repair logic mostly disabled

So the global direction is not wrong, but the current bundle is not measuring what it thinks it is measuring.

Immediate fixes before any serious threshold tuning

In `model/scores.py`, fill NaN at the component level or aggregate with NaN-safe sums.
Add a low-information / constant-series gate in data quality checks.
Make hazards fail loudly when one of their prerequisite scores is entirely missing.
Replace 5-level exposure quantization with either:
- continuous exposure, or
- finer ladder, e.g. every 0.10.
Rebuild the walk-forward schedule so every reported window is valid.
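
The low-information gate in the second item could look roughly like the following. It is a sketch under stated assumptions: features arrive as plain lists keyed by name, the function name and `min_std` threshold are hypothetical, and a production check would more likely run per rolling window than over the whole sample.

```python
import math
from statistics import pstdev

def low_information_features(columns, min_std=1e-12):
    """Return the names of features that are constant or have too little data.

    `columns` maps feature name -> list of floats (NaN allowed).
    """
    flagged = []
    for name, values in columns.items():
        clean = [v for v in values if not math.isnan(v)]
        # Fewer than two observations or zero dispersion: a z-score of this
        # series would be undefined.
        if len(clean) < 2 or pstdev(clean) <= min_std:
            flagged.append(name)
    return flagged

columns = {
    "concentration_spread_5": [0.002] * 30,           # constant, as on the supplied PIT
    "trend_raw": [0.01 * i for i in range(30)],       # hypothetical healthy feature
}
print(low_information_features(columns))  # ['concentration_spread_5']
```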

Practical parameter recommendations after bug fix

State machine

Risk-off

Current risk-off is likely too eager once hazards start working.

Recommended first pass:

down_hazard: 0.62 -> 0.70
stress_score: 0.85 -> 0.95
keep the crash override, but use a stronger trigger: 0.72 -> 0.78

Expected impact:

fewer premature risk-off entries
better upside capture
slightly higher drawdown, but still clearly below benchmark if crash override remains

Repair

The current repair condition will be too easy to satisfy once the hazards are fixed and carry real signal.

Recommended first pass:

repair_hazard: 0.58 -> 0.62
repair stress max: 0.85 -> 0.70
keep d_stress <= 0, and add d_trend >= 0
add minimum breadth confirmation: breadth_score >= 0.00

Expected impact:

fewer fake repair states
lower churn in weak rebounds
repair exposure will become cleaner and more useful

Trend

Current trend gate is too strict on signal but too weak on persistence.

Recommended first pass:

trend_score: 0.45 -> 0.30~0.35
breadth_score: -0.05 -> 0.00 after bug fix
stress_score: 0.45 -> 0.55

Expected impact:

more days classified as trend
stronger upside capture
higher exposure persistence in genuine rallies

Euphoric late

The current euphoric_late gate is likely to fire too early; it should trigger later in the move.

Recommended first pass:

crowding_score: 0.70 -> 0.82
rebound_hazard: 0.68 -> 0.78

Expected impact:

fewer early caps on strong trends
better trend participation
still protects last-stage blowoff risk

Duration / persistence

The current symmetric min_state_duration = 3 is too blunt because it applies the same delay to every transition.

Recommended:

default persistence: 4
crash override: immediate
non-crash risk_off: 2-day confirm
trend exit: 4-day confirm
repair entry: 2-day confirm

Expected impact:

fewer whipsaws
better hold-through in trend
less premature de-risking
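
One way to express the asymmetric confirmation rules is a small filter over the raw daily state sequence. This is only a sketch: the function names are hypothetical, and the precedence used when two rules apply at once (for example trend to risk_off, where both the 2-day entry rule and the 4-day trend-exit rule match) is an assumption, since the recommendations above do not specify it.

```python
def required_days(current, candidate):
    """Consecutive days the candidate state must persist before switching."""
    if candidate == "risk_off":
        return 2  # non-crash risk_off: 2-day confirm (assumed to take precedence)
    if candidate == "repair":
        return 2  # repair entry: 2-day confirm
    if current == "trend":
        return 4  # trend exit: 4-day confirm
    return 4      # default persistence

def smooth_states(raw_states, crash_flags):
    """Apply confirmation delays; a crash flag forces risk_off immediately."""
    current = raw_states[0]
    pending, streak = None, 0
    out = []
    for raw, crash in zip(raw_states, crash_flags):
        if crash:
            current, pending, streak = "risk_off", None, 0
        elif raw == current:
            pending, streak = None, 0  # candidate did not persist; reset
        else:
            streak = streak + 1 if raw == pending else 1
            pending = raw
            if streak >= required_days(current, raw):
                current, pending, streak = raw, None, 0
        out.append(current)
    return out

raw = ["trend", "trend", "chop", "chop", "chop", "trend", "risk_off", "risk_off"]
print(smooth_states(raw, [False] * len(raw)))
# three chop days do not end the trend; two risk_off days do
```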

Exposure mapping recommendations

Replace coarse quantization

This is one of the biggest practical blockers.

Recommended:

remove quantization entirely, or
replace {0,0.25,0.5,0.75,1.0} with {0,0.1,0.2,...,1.0}

Without this change, many policy experiments are fake because different raw exposures map to the same discrete level.

Repair mapping

Current repair exposure is too timid if the goal is upside capture >= 0.60.

Recommended piecewise mapping:

weak repair: 0.30
confirmed repair: 0.45
broad repair: 0.60
strong repair + improving breadth: 0.75

Example:

if repair_hazard in [0.62, 0.70) and breadth_score >= 0.0: 0.45
if repair_hazard in [0.70, 0.80) and d_trend > 0: 0.60
if repair_hazard >= 0.80 and breadth_score > 0.25: 0.75
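
The three example rules translate into a small function. The sketch assumes the function is only called while the state machine is already in repair, and that anything not matched by a rule falls back to the weak-repair level of 0.30; that fallback is an assumption, not something the rules state.

```python
def repair_exposure(repair_hazard, breadth_score, d_trend):
    """Piecewise repair exposure, checked from the strongest tier down."""
    if repair_hazard >= 0.80 and breadth_score > 0.25:
        return 0.75  # strong repair + improving breadth
    if 0.70 <= repair_hazard < 0.80 and d_trend > 0:
        return 0.60  # broad repair
    if 0.62 <= repair_hazard < 0.70 and breadth_score >= 0.0:
        return 0.45  # confirmed repair
    return 0.30      # weak repair (assumed fallback)

print(repair_exposure(0.65, 0.10, 0.0))  # 0.45
print(repair_exposure(0.75, 0.00, 0.01)) # 0.6
print(repair_exposure(0.85, 0.30, 0.0))  # 0.75
```

Read literally, the rules leave a gap: a repair_hazard of 0.80 or more with breadth_score at or below 0.25 matches none of them and lands on the 0.30 fallback. Whether that is intended should be decided before the mapping is implemented.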

Trend mapping

Trend should be close to full risk unless stress or crowding says otherwise.

Recommended:

base trend: 0.90
strong trend + breadth > 0.25: 1.00
late trend / early crowding: 0.75

Practical formula:

trend_base = 0.90
trend_boost = +0.10 if breadth_score > 0.25
trend_cut = -0.15 if crowding_score > 0.75
clamp to [0.75, 1.00]
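
Written out as a function, the formula looks like this. The function name is hypothetical; the constants are the ones listed above.

```python
def trend_exposure(breadth_score, crowding_score):
    """Trend-state exposure: base 0.90, breadth boost, crowding cut, clamped."""
    exposure = 0.90
    if breadth_score > 0.25:
        exposure += 0.10  # strong trend with broad participation
    if crowding_score > 0.75:
        exposure -= 0.15  # late trend / early crowding
    return min(1.00, max(0.75, exposure))
```

The three reference points from the list above fall out directly: base 0.90 with neither condition, 1.00 with the breadth boost alone, and 0.75 with the crowding cut alone. When both conditions hold, the result is 0.85.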

Chop mapping

This is the lever most directly tied to upside capture in the current broken topology.

Observed on the supplied run:

chop around 0.25 produces upside capture around 0.37
effective chop around 0.50 lifts upside capture toward 0.54
effective chop around 0.75 pushes upside capture above 0.70, but drawdown rises sharply

Recommended target for next round:

chop = 0.40~0.45 if using continuous exposure
if quantized, force effective chop to 0.50 only after fixing state logic

Expected impact:

biggest single uplift in upside capture
drawdown will rise, so do not do this before fixing false repair / false trend logic

Turnover guardrails

max_daily_exposure_change: 0.25 -> 0.35 after quantization removal
annual turnover soft ceiling: <= 12
if turnover > 12 without upside capture > 0.55, rollback
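
A sketch of the two guardrails, assuming the daily cap is applied to the difference between the target exposure and the previous day's realized exposure. Function names are hypothetical, and how annual turnover is measured is left to the existing pipeline.

```python
def limit_daily_change(targets, max_step=0.35, start=0.0):
    """Move toward each daily target by at most `max_step` per day."""
    path, prev = [], start
    for target in targets:
        step = max(-max_step, min(max_step, target - prev))
        prev += step
        path.append(prev)
    return path

def should_rollback(annual_turnover, upside_capture):
    """Rollback rule: turnover above 12 without upside capture above 0.55."""
    return annual_turnover > 12 and not upside_capture > 0.55

# A jump from 0 to full exposure now takes three days instead of one.
print([round(x, 2) for x in limit_daily_change([1.0, 1.0, 1.0, 0.0])])
# [0.35, 0.7, 1.0, 0.65]
```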

Objective / loss redesign

Hard constraints for walk-forward selection

Use hard gates first, then a score.

Recommended hard constraints:

strategy_max_drawdown <= 0.70 * baseline_max_drawdown
upside_capture >= 0.50 for every valid OOS window
median OOS upside_capture >= 0.55
positive_window_ratio >= 0.67
annual_turnover <= 12 unless annual return improves by at least +300 bps

Only candidates that pass hard constraints are ranked.
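
The gates can be expressed as one predicate per candidate. This is a sketch with hypothetical field names; it also assumes that the +300 bps exception is measured against the baseline's annual return, which the list above does not state explicitly.

```python
from statistics import median

def passes_hard_gates(candidate, baseline_mdd, baseline_ann):
    """Return True only if every hard constraint holds.

    `candidate` is a dict with keys: max_drawdown, annual_return,
    annual_turnover, positive_window_ratio, and oos_upside_capture
    (one value per valid OOS window).
    """
    upside = candidate["oos_upside_capture"]
    turnover_ok = (
        candidate["annual_turnover"] <= 12
        or candidate["annual_return"] - baseline_ann >= 0.03  # +300 bps (assumed vs baseline)
    )
    return (
        candidate["max_drawdown"] <= 0.70 * baseline_mdd
        and min(upside) >= 0.50
        and median(upside) >= 0.55
        and candidate["positive_window_ratio"] >= 0.67
        and turnover_ok
    )
```

Keeping the gates as a plain boolean, separate from the ranking score, prevents a large return number from buying back a failed drawdown or upside constraint.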

Practical ranking score

Recommended selection score:

score = 0.35 * return_ratio + 0.30 * upside_score + 0.20 * dd_score + 0.10 * sharpe_delta_score + 0.05 * stability_score - turnover_penalty

Where:

return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)
upside_score = clip(upside_capture / 0.60, 0, 1.2)
dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)
sharpe_delta_score = clip((strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2)
stability_score = positive_window_ratio
turnover_penalty = max(0, annual_turnover - 10) * 0.02

This score is easier to interpret than the current utility-only selection.
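
A direct transcription of the score, with hypothetical argument names. It assumes baseline_ann and baseline_mdd are positive; a baseline with a zero or negative annual return would need separate handling before the ratio is meaningful.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))

def selection_score(strategy_ann, baseline_ann, upside_capture,
                    strategy_mdd, baseline_mdd,
                    strategy_sharpe, baseline_sharpe,
                    positive_window_ratio, annual_turnover):
    """Ranking score for candidates that already passed the hard gates."""
    return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)
    upside_score = clip(upside_capture / 0.60, 0, 1.2)
    dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)
    sharpe_delta_score = clip((strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2)
    stability_score = positive_window_ratio
    turnover_penalty = max(0, annual_turnover - 10) * 0.02
    return (0.35 * return_ratio + 0.30 * upside_score + 0.20 * dd_score
            + 0.10 * sharpe_delta_score + 0.05 * stability_score
            - turnover_penalty)
```

The weights sum to 1.0, so a candidate that exactly hits every target (return ratio 1, upside capture 0.60, a 35% drawdown reduction, Sharpe 0.10 above baseline, all windows positive, turnover at or below 10) scores 1.0. That gives a convenient reference point when reading results.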

Execution calibration score redesign

Problem with current formula

The current formula is dominated by -max_drawdown, not by tracking penalties.

Alternative A: utility-first deployment score

Use when execution assumptions are still approximate.

calib_A = utility_total_score + 0.40*annual_return + 0.20*upside_capture - 0.60*max_drawdown - 5*tracking_error_20_p95 - 1.5*tracking_diff_abs_mean

Alternative B: implementation-sensitive score

Use only when execution model is already close to production.

calib_B = utility_total_score + 0.30*sharpe - 0.40*max_drawdown - 2*max(0, tracking_error_20_p95 - 0.003) - 1*max(0, tracking_diff_abs_mean - 0.001)

This introduces tolerance bands so tiny tracking differences do not dominate selection.
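
Both alternatives transcribe directly into code. The sketch takes a metrics dict whose keys mirror the names in the formulas; the dict layout itself is an assumption.

```python
def calib_a(m):
    """Alternative A: utility-first deployment score."""
    return (m["utility_total_score"] + 0.40 * m["annual_return"]
            + 0.20 * m["upside_capture"] - 0.60 * m["max_drawdown"]
            - 5 * m["tracking_error_20_p95"] - 1.5 * m["tracking_diff_abs_mean"])

def calib_b(m):
    """Alternative B: tracking is penalized only beyond the tolerance bands."""
    te_excess = max(0, m["tracking_error_20_p95"] - 0.003)
    td_excess = max(0, m["tracking_diff_abs_mean"] - 0.001)
    return (m["utility_total_score"] + 0.30 * m["sharpe"]
            - 0.40 * m["max_drawdown"] - 2 * te_excess - 1 * td_excess)
```

Under B, two candidates whose tracking error sits at 0.0020 and 0.0029 receive the same score, because both are inside the 0.003 band. Under A the same pair differs by 5 * 0.0009 = 0.0045.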

Walk-forward robustness protocol

Window scheme

Given current data start in 2020, do not pretend you have 2016 windows.

Recommended:

expanding train, rolling 1-year or 18-month test
minimum 3 valid OOS windows, preferably 4+

Example:

train 2020-2021, test 2022
train 2020-2022, test 2023
train 2020-2023, test 2024
train 2020-2024, test 2025
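
The example schedule can be generated and validated rather than hard-coded. The sketch below assumes calendar-year test windows and a PIT that runs from 2020-01-01 through 2025-12-31; both dates are illustrative, and the 2016 window at the end is a made-up stand-in for the frozen schedule.

```python
from datetime import date

def expanding_windows(data_start, data_end, first_test_year):
    """Expanding train window from data_start, one calendar-year test window each."""
    windows, year = [], first_test_year
    while date(year, 12, 31) <= data_end:
        windows.append({
            "train": (data_start, date(year - 1, 12, 31)),
            "test": (date(year, 1, 1), date(year, 12, 31)),
        })
        year += 1
    return windows

def is_valid(window, data_start, data_end):
    """A window is valid only if it lies entirely inside the available data."""
    train_start, train_end = window["train"]
    test_start, test_end = window["test"]
    return data_start <= train_start < train_end < test_start <= test_end <= data_end

start, end = date(2020, 1, 1), date(2025, 12, 31)  # assumed PIT coverage
windows = expanding_windows(start, end, first_test_year=2022)
print(len(windows))  # 4

# A window that begins before the data does is rejected instead of silently skipped.
frozen = {"train": (date(2016, 1, 1), date(2019, 12, 31)),
          "test": (date(2020, 1, 1), date(2020, 12, 31))}
print(is_valid(frozen, start, end))  # False
```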

Stability checks

For every candidate, record:

median OOS annual return
median OOS max drawdown
median OOS upside capture
worst-window upside capture
worst-window drawdown ratio
selection frequency if you do candidate search

Acceptance criteria

no valid window with upside capture < 0.40
median upside capture >= 0.55
drawdown ratio vs baseline <= 0.75 in every window
positive utility in at least 3/4 windows

Two-week roadmap

Week 1

Fix NaN propagation in score aggregation
Add low-information feature gate
Remove or refine exposure quantization
Rebuild walk-forward windows to valid periods only

Week 2

Retune state thresholds after bug fix
Upgrade repair/trend exposure curves
Re-run walk-forward with new hard constraints
Replace execution calibration score

Priority experiments

Experiment 1 — Score integrity repair

change: NaN-safe score aggregation + fail-fast hazard checks
expected win-rate: very high
success metric: breadth_score and crowding_score non-null ratio > 95%; hazards not stuck at 0.5
rollback: if any required score still all-NaN

Experiment 2 — Exposure quantization removal

change: continuous exposure or 0.10 ladder
expected win-rate: high
success metric: baseline and pro-risk no longer identical; policy sweeps create genuinely different exposure paths
rollback: if turnover spikes > 14 without upside improvement

Experiment 3 — Trend/chop uplift

change: effective chop to ~0.45 and trend to 0.90~1.00
expected win-rate: medium-high
success metric: upside capture > 0.50 while max drawdown <= 0.75 * baseline
rollback: if MDD rises above 0.48 before upside reaches 0.50

Experiment 4 — Risk-off relaxation

change: down_hazard 0.70, stress 0.95, stronger crash override
expected win-rate: medium
success metric: fewer risk-off days, higher annual return, drawdown ratio still <= 0.70
rollback: if downside capture worsens materially without upside benefit

Experiment 5 — Repair cleanup

change: repair_hazard 0.62, breadth >= 0, d_trend >= 0, lower repair stress ceiling
expected win-rate: medium
success metric: lower false rebound count, repair-state annualized return positive
rollback: if repair days collapse to near zero

Experiment 6 — Walk-forward + objective redesign

change: hard constraints + new selection score
expected win-rate: high for decision quality, medium for metrics
success metric: selected candidates diversify and OOS selection remains stable
rollback: if selected candidate flips every window with no OOS gain

Bottom line

The core direction is valid, but the current bundle is still partly a false negative on offense because two of the most important score channels are effectively broken and the policy search space is partly collapsed.

Fix the integrity issues first. After that, the most likely path to materially better upside capture is:

slightly less eager risk-off
stricter but cleaner repair
earlier/more persistent trend classification
much less coarse exposure mapping
