
ChiNext50 Regime Review (2026-04-09)

Executive summary

The current system has a real defensive effect, but the present end-to-end result is not primarily a threshold-tuning problem. It is first a system integrity problem:

  1. breadth_score and crowding_score are effectively broken on the supplied PIT dataset because one z-scored component is constant, so the weighted sum becomes all-NaN.
  2. down_hazard, repair_hazard, and rebound_hazard collapse to ~0.5 everywhere because the raw hazard inputs are NaN and are filled with zero before the sigmoid, and sigmoid(0) = 0.5.
  3. This silently reduces the state machine to a 3-state controller (chop, trend, risk_off), so the repair and euphoric_late logic is mostly dead.
  4. Policy candidate differentiation is partly fake: baseline and pro_risk produce identical exposure paths on the supplied run because coarse quantization collapses their differences.
  5. Frozen walk-forward is too weak to support strong model-selection claims because only 2 windows are actually processed.

Only after fixing these should you trust threshold tuning and objective redesign.

Confirmed current behavior from the supplied bundle

  • Full-sample strategy metrics are materially below benchmark on annual return and upside capture, while max drawdown is much better.
  • Full-sample state counts in the saved run are effectively only chop, trend, and risk_off.
  • In my local replay of the supplied code + PIT:
    • breadth_score non-null ratio = 0.0
    • crowding_score non-null ratio = 0.0
    • down_hazard, repair_hazard, rebound_hazard = 0.5 nearly everywhere
    • baseline and pro_risk exposure paths are identical

Root causes

1. NaN propagation in model/scores.py

The weighted score sums do not protect against NaN sub-components. If any sub-score is all-NaN, the whole composite score becomes all-NaN.

On the supplied PIT, concentration_spread_5 = weighted_ret_5 - eq_weight_ret_5 is constant at 0.002, so its rolling standard deviation is zero and its rolling z-score (a division by that zero) is all-NaN. This breaks both:

  • breadth_score
  • crowding_score
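
The failure mode and one possible repair can be sketched as follows. This is a dependency-free illustration, not the code in model/scores.py: the helper names rolling_zscore and nan_safe_weighted_sum, the window length, the weights, and the second input series are all assumptions made for the example. The NaN-safe variant renormalises the weights over whichever components are present, which is only one of several reasonable choices.

```python
import math

EPS = 1e-12  # treat a (near-)zero rolling std as undefined


def rolling_zscore(values, window):
    """Rolling z-score; NaN when the window is incomplete or has ~zero std."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
            continue
        chunk = values[i + 1 - window : i + 1]
        mean = sum(chunk) / window
        std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / window)
        out.append((values[i] - mean) / std if std > EPS else math.nan)
    return out


def nan_safe_weighted_sum(components, weights):
    """Weighted average over the non-NaN components at each position.

    Weights are renormalised over the components that are present, so the
    result is NaN only when every component is NaN at that position.
    """
    n = len(next(iter(components.values())))
    result = []
    for i in range(n):
        num, den = 0.0, 0.0
        for name, series in components.items():
            if not math.isnan(series[i]):
                num += weights[name] * series[i]
                den += weights[name]
        result.append(num / den if den > 0 else math.nan)
    return result


# A constant spread, as observed on the supplied PIT, next to a made-up series.
z_spread = rolling_zscore([0.002] * 6, window=3)                    # all NaN
z_other = rolling_zscore([1.0, 2.0, 4.0, 3.0, 5.0, 2.0], window=3)

# Naive weighted sum: one NaN component poisons the whole composite.
naive = [0.5 * a + 0.5 * b for a, b in zip(z_spread, z_other)]

# NaN-safe aggregation: the composite falls back to the surviving component.
safe = nan_safe_weighted_sum(
    {"spread": z_spread, "other": z_other},
    {"spread": 0.5, "other": 0.5},
)
```

Renormalising hides the fact that a component is missing, so it is best paired with a data-quality check that reports the dead component rather than used on its own.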

2. Hazard collapse

Hazards are built from raw formulas that reference broken scores. Then they are fed through:

  • rolling_zscore(...)
  • _sigmoid(series.fillna(0.0))

Because sigmoid(0) = 0.5, this turns missing hazard information into the neutral constant 0.5, which prevents the system from noticing it is effectively blind.
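
A minimal sketch of the collapse, and of a stricter alternative, is shown below. The function names hazard_silent and hazard_strict and the min_valid_ratio parameter are hypothetical; only the fill-with-zero-then-sigmoid pattern comes from the supplied code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hazard_silent(raw):
    """Mimics the pattern above: NaN is replaced by 0.0 before the sigmoid."""
    return [sigmoid(0.0 if math.isnan(x) else x) for x in raw]


def hazard_strict(raw, name, min_valid_ratio=0.5):
    """Raise when inputs are mostly missing; keep remaining NaN visible.

    min_valid_ratio is an illustrative threshold, not a value from the bundle.
    """
    valid = sum(1 for x in raw if not math.isnan(x))
    if not raw or valid / len(raw) < min_valid_ratio:
        raise ValueError(f"{name}: only {valid}/{len(raw)} valid hazard inputs")
    return [math.nan if math.isnan(x) else sigmoid(x) for x in raw]


# An all-NaN raw hazard becomes a flat 0.5 series with no warning.
blind = hazard_silent([math.nan] * 5)

# The strict variant refuses to produce a hazard from the same input.
try:
    hazard_strict([math.nan] * 5, "down_hazard")
    strict_raised = False
except ValueError:
    strict_raised = True
```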

3. Candidate selection collapse

The policy layer uses coarse quantization:

  • allowed levels: {0.0, 0.25, 0.50, 0.75, 1.0}

As a result:

  • trend = 0.95 and trend = 1.00 both quantize to 1.0
  • many repair and chop parameter tweaks collapse to the same discrete levels

That is why baseline and pro_risk can become identical even though their YAML values differ.
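
The collapse is easy to reproduce in isolation. The sketch below assumes nearest-level rounding with ties going to the lower level; the actual rounding rule in the policy layer may differ, and the quantize helper is illustrative only.

```python
LEVELS_COARSE = [0.0, 0.25, 0.50, 0.75, 1.0]
LEVELS_FINE = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def quantize(x, levels):
    """Snap x to the nearest allowed level (ties go to the lower level)."""
    return min(levels, key=lambda lv: (abs(lv - x), lv))


# On the 5-level ladder, two different YAML values give the same exposure.
coarse_a = quantize(0.95, LEVELS_COARSE)
coarse_b = quantize(1.00, LEVELS_COARSE)

# On a 0.10 ladder, the same two values stay distinct.
fine_a = quantize(0.95, LEVELS_FINE)
fine_b = quantize(1.00, LEVELS_FINE)
```

Under these assumptions coarse_a and coarse_b are both 1.0, while fine_a and fine_b differ, which is the differentiation the policy sweep needs.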

4. Walk-forward sample weakness

The frozen WF windows start in 2016 while the supplied PIT starts in 2020. So the first window is skipped, leaving only 2 processed windows. This is too thin for robust selection.

5. Execution calibration objective is mis-scaled

Current calibration score:

utility_total_score - 3*tracking_diff_abs_mean - 20*tracking_error_20_p95 - max_drawdown

On the supplied run:

  • max_drawdown is ~0.32 and dominates the score
  • tracking penalties are tiny in absolute magnitude
  • utility spread between candidates is small

So the calibration is effectively “pick the lowest cost / smallest MDD” rather than meaningfully trading off return, utility, and tracking.

Direction judgment

The overall direction is still valid:

  • regime-aware exposure control
  • preserve drawdown advantage
  • recover upside via repair/trend participation

But the current implementation is not yet a true regime system. In practice, it behaves like:

  • a coarse 3-state exposure smoother
  • using mostly price/stress information
  • with breadth/crowding/repair logic mostly disabled

So the global direction is not wrong, but the current bundle is not measuring what it thinks it is measuring.

Immediate fixes before any serious threshold tuning

  1. In model/scores.py, fill NaN at the component level or aggregate with NaN-safe sums.
  2. Add a low-information / constant-series gate in data quality checks.
  3. Make hazards fail loudly when one of their prerequisite scores is entirely missing.
  4. Replace 5-level exposure quantization with either:
    • continuous exposure, or
    • finer ladder, e.g. every 0.10.
  5. Rebuild the walk-forward schedule so every reported window is valid.
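
For fix 2, a constant-series gate could look roughly like the sketch below. The function name low_information_columns and the thresholds min_unique and min_range are assumptions chosen for illustration and would need tuning against the real feature set. The sample column values other than the constant 0.002 spread are made up.

```python
import math


def low_information_columns(columns, min_unique=3, min_range=1e-9):
    """Return the names of columns that are empty, constant, or nearly constant.

    NaN values are ignored; a column with no valid values is always flagged.
    """
    flagged = []
    for name, values in columns.items():
        valid = [v for v in values if not math.isnan(v)]
        if len(set(valid)) < min_unique or (max(valid) - min(valid)) < min_range:
            flagged.append(name)
    return flagged


features = {
    "concentration_spread_5": [0.002] * 8,                        # constant
    "weighted_ret_5": [0.01, -0.02, 0.03, 0.00, 0.015, -0.01, 0.02, 0.005],
    "all_missing": [math.nan] * 8,                                # no data
}
flagged = low_information_columns(features)
```

Running such a gate before score construction would surface the dead concentration_spread_5 input at load time instead of letting it propagate as NaN.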

Practical parameter recommendations after bug fix

State machine

Risk-off

Current risk-off is likely too eager once hazards start working.

Recommended first pass:

  • down_hazard: 0.62 -> 0.70
  • stress_score: 0.85 -> 0.95
  • crash override: keep, but use a stronger trigger: 0.72 -> 0.78

Expected impact:

  • fewer premature risk-off entries
  • better upside capture
  • slightly higher drawdown, but still clearly below benchmark if crash override remains

Repair

The current repair condition will be too easy to satisfy once the hazards are fixed and live.

Recommended first pass:

  • repair_hazard: 0.58 -> 0.62
  • repair stress max: 0.85 -> 0.70
  • keep d_stress <= 0, and add d_trend >= 0
  • add minimum breadth confirmation: breadth_score >= 0.00

Expected impact:

  • fewer fake repair states
  • lower churn in weak rebounds
  • repair exposure will become cleaner and more useful

Trend

Current trend gate is too strict on signal but too weak on persistence.

Recommended first pass:

  • trend_score: 0.45 -> 0.30~0.35
  • breadth_score: -0.05 -> 0.00 after bug fix
  • stress_score: 0.45 -> 0.55

Expected impact:

  • more days classified as trend
  • stronger upside capture
  • higher exposure persistence in genuine rallies

Euphoric late

The current euphoric_late thresholds would trigger too early once crowding_score is live; the state should engage later in a trend.

Recommended first pass:

  • crowding_score: 0.70 -> 0.82
  • rebound_hazard: 0.68 -> 0.78

Expected impact:

  • fewer early caps on strong trends
  • better trend participation
  • still protects last-stage blowoff risk

Duration / persistence

Current symmetric min_state_duration = 3 is too blunt.

Recommended:

  • default persistence: 4
  • crash override: immediate
  • non-crash risk_off: 2-day confirm
  • trend exit: 4-day confirm
  • repair entry: 2-day confirm

Expected impact:

  • fewer whipsaws
  • better hold-through in trend
  • less premature de-risking
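
One way to express asymmetric persistence is a confirmation counter whose required length depends on the transition. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the review does not say which rule wins when a trend exit (4 days) targets risk_off (2 days), so the code arbitrarily lets the target's entry rule take precedence.

```python
ENTRY_CONFIRM = {"risk_off": 2, "repair": 2}  # days needed to enter these states
EXIT_CONFIRM = {"trend": 4}                   # days needed to leave these states
DEFAULT_CONFIRM = 4


def required_days(current, target):
    # Assumption: a target-specific entry rule wins over a source exit rule.
    if target in ENTRY_CONFIRM:
        return ENTRY_CONFIRM[target]
    return EXIT_CONFIRM.get(current, DEFAULT_CONFIRM)


def apply_persistence(raw_states, crash_flags):
    """Smooth raw daily states using per-transition confirmation windows.

    A crash flag forces risk_off immediately, bypassing confirmation.
    """
    if not raw_states:
        return []
    current = raw_states[0]
    pending, streak = None, 0
    out = []
    for raw, crash in zip(raw_states, crash_flags):
        if crash:
            current, pending, streak = "risk_off", None, 0
        elif raw == current:
            pending, streak = None, 0
        else:
            if raw == pending:
                streak += 1
            else:
                pending, streak = raw, 1
            if streak >= required_days(current, raw):
                current, pending, streak = raw, None, 0
        out.append(current)
    return out


# A 3-day dip into chop does not end the trend; a 4-day one does.
raw = ["trend"] * 3 + ["chop"] * 3 + ["trend"] * 2 + ["chop"] * 4
smoothed = apply_persistence(raw, [False] * len(raw))
```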

Exposure mapping recommendations

Replace coarse quantization

This is one of the biggest practical blockers.

Recommended:

  • remove quantization entirely, or
  • replace {0,0.25,0.5,0.75,1.0} with {0,0.1,0.2,...,1.0}

Without this change, many policy experiments are fake because different raw exposures map to the same discrete level.

Repair mapping

Current repair exposure is too timid if the goal is upside capture >= 0.60.

Recommended piecewise mapping:

  • weak repair: 0.30
  • confirmed repair: 0.45
  • broad repair: 0.60
  • strong repair + improving breadth: 0.75

Example:

  • if repair_hazard in [0.62, 0.70) and breadth_score >= 0.0: 0.45
  • if repair_hazard in [0.70, 0.80) and d_trend > 0: 0.60
  • if repair_hazard >= 0.80 and breadth_score > 0.25: 0.75
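
The example rules above translate directly into a small function. The sketch is a literal reading of the three rules; the function name and the fallback to the 0.30 "weak repair" level for every case the rules do not cover are assumptions.

```python
def repair_exposure(repair_hazard, breadth_score, d_trend):
    """Piecewise repair exposure following the example rules above."""
    if repair_hazard >= 0.80 and breadth_score > 0.25:
        return 0.75  # strong repair + improving breadth
    if 0.70 <= repair_hazard < 0.80 and d_trend > 0:
        return 0.60  # broad repair
    if 0.62 <= repair_hazard < 0.70 and breadth_score >= 0.0:
        return 0.45  # confirmed repair
    return 0.30      # assumed fallback: weak repair


strong = repair_exposure(0.85, breadth_score=0.30, d_trend=0.01)
broad = repair_exposure(0.75, breadth_score=0.10, d_trend=0.01)
confirmed = repair_exposure(0.65, breadth_score=0.00, d_trend=-0.01)
weak = repair_exposure(0.65, breadth_score=-0.10, d_trend=0.01)
```

Note that under this literal reading a very high repair_hazard with weak breadth drops all the way to 0.30; whether that is intended should be decided explicitly.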

Trend mapping

Trend should be close to full risk unless stress or crowding says otherwise.

Recommended:

  • base trend: 0.90
  • strong trend + breadth > 0.25: 1.00
  • late trend / early crowding: 0.75

Practical formula:

  • trend_base = 0.90
  • trend_boost = +0.10 if breadth_score > 0.25
  • trend_cut = -0.15 if crowding_score > 0.75
  • clamp to [0.75, 1.00]
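
As a sketch, the practical formula above can be written as follows; the function name trend_exposure is illustrative.

```python
def trend_exposure(breadth_score, crowding_score):
    """Base 0.90, +0.10 on strong breadth, -0.15 on crowding, clamped."""
    exposure = 0.90
    if breadth_score > 0.25:
        exposure += 0.10
    if crowding_score > 0.75:
        exposure -= 0.15
    return min(1.00, max(0.75, exposure))


base = trend_exposure(breadth_score=0.10, crowding_score=0.50)      # about 0.90
boosted = trend_exposure(breadth_score=0.30, crowding_score=0.50)   # about 1.00
crowded = trend_exposure(breadth_score=0.10, crowding_score=0.80)   # about 0.75
both = trend_exposure(breadth_score=0.30, crowding_score=0.80)      # about 0.85
```

With these constants the clamp only matters as a guard against floating-point drift, since the four reachable values already lie in [0.75, 1.00].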

Chop mapping

This is the lever most directly tied to upside capture in the current broken topology.

Observed on the supplied run:

  • chop around 0.25 produces upside capture around 0.37
  • effective chop around 0.50 lifts upside capture toward 0.54
  • effective chop around 0.75 pushes upside capture above 0.70, but drawdown rises sharply

Recommended target for next round:

  • chop = 0.40~0.45 if using continuous exposure
  • if quantized, force effective chop to 0.50 only after fixing state logic

Expected impact:

  • biggest single uplift in upside capture
  • drawdown will rise, so do not do this before fixing false repair / false trend logic

Turnover guardrails

  • max_daily_exposure_change: 0.25 -> 0.35 after quantization removal
  • annual turnover soft ceiling: <= 12
  • if turnover > 12 without upside capture > 0.55, roll back

Objective / loss redesign

Hard constraints for walk-forward selection

Use hard gates first, then a score.

Recommended hard constraints:

  1. strategy_max_drawdown <= 0.70 * baseline_max_drawdown
  2. upside_capture >= 0.50 for every valid OOS window
  3. median OOS upside_capture >= 0.55
  4. positive_window_ratio >= 0.67
  5. annual_turnover <= 12 unless annual return improves by at least +300 bps

Only candidates that pass hard constraints are ranked.
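
A gate of this kind might be sketched as below. The dictionary keys and function name are hypothetical, and the +300 bps exception is interpreted here as an improvement in annual return over the baseline, which the text does not state explicitly. The candidate numbers are invented for the example.

```python
import statistics


def passes_hard_constraints(cand, baseline_mdd, baseline_ann):
    """Apply the five hard gates to one candidate's walk-forward summary."""
    upside = cand["oos_upside_capture"]  # one value per valid OOS window
    turnover_ok = (
        cand["annual_turnover"] <= 12
        # Assumption: "+300 bps" is measured against the baseline annual return.
        or cand["annual_return"] - baseline_ann >= 0.03
    )
    return (
        cand["max_drawdown"] <= 0.70 * baseline_mdd
        and min(upside) >= 0.50
        and statistics.median(upside) >= 0.55
        and cand["positive_window_ratio"] >= 0.67
        and turnover_ok
    )


# Hypothetical candidate, for illustration only.
candidate = {
    "max_drawdown": 0.25,
    "oos_upside_capture": [0.52, 0.58, 0.61, 0.56],
    "positive_window_ratio": 0.75,
    "annual_turnover": 9.0,
    "annual_return": 0.08,
}
ok = passes_hard_constraints(candidate, baseline_mdd=0.40, baseline_ann=0.10)

# One weak window is enough to reject an otherwise identical candidate.
weak = dict(candidate, oos_upside_capture=[0.45, 0.58, 0.61, 0.56])
rejected = not passes_hard_constraints(weak, baseline_mdd=0.40, baseline_ann=0.10)
```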

Practical ranking score

Recommended selection score:

score = 0.35 * return_ratio + 0.30 * upside_score + 0.20 * dd_score + 0.10 * sharpe_delta_score + 0.05 * stability_score - turnover_penalty

Where:

  • return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)
  • upside_score = clip(upside_capture / 0.60, 0, 1.2)
  • dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)
  • sharpe_delta_score = clip((strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2)
  • stability_score = positive_window_ratio
  • turnover_penalty = max(0, annual_turnover - 10) * 0.02

This score is easier to interpret than the current utility-only selection.
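
A direct transcription of the score, with a worked example, is given below. The input values are hypothetical and chosen only to make the arithmetic easy to follow. The sketch assumes baseline_ann and baseline_mdd are strictly positive; a non-positive baseline would need separate handling.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))


def selection_score(strategy_ann, baseline_ann, upside_capture,
                    strategy_mdd, baseline_mdd,
                    strategy_sharpe, baseline_sharpe,
                    positive_window_ratio, annual_turnover):
    """Ranking score as defined above (baseline_ann and baseline_mdd > 0)."""
    return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)
    upside_score = clip(upside_capture / 0.60, 0, 1.2)
    dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)
    sharpe_delta_score = clip(
        (strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2
    )
    stability_score = positive_window_ratio
    turnover_penalty = max(0, annual_turnover - 10) * 0.02
    return (
        0.35 * return_ratio
        + 0.30 * upside_score
        + 0.20 * dd_score
        + 0.10 * sharpe_delta_score
        + 0.05 * stability_score
        - turnover_penalty
    )


# Hypothetical inputs: components are 0.8, 0.9, ~0.714, 0.5, 0.75, penalty 0.04.
example = selection_score(
    strategy_ann=0.08, baseline_ann=0.10, upside_capture=0.54,
    strategy_mdd=0.30, baseline_mdd=0.40,
    strategy_sharpe=0.90, baseline_sharpe=0.90,
    positive_window_ratio=0.75, annual_turnover=12,
)
# 0.28 + 0.27 + 0.1429 + 0.05 + 0.0375 - 0.04 is roughly 0.7404
```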

Execution calibration score redesign

Problem with current formula

The current formula is dominated by -max_drawdown, not by tracking penalties.

Alternative A: utility-first deployment score

Use when execution assumptions are still approximate.

calib_A = utility_total_score + 0.40*annual_return + 0.20*upside_capture - 0.60*max_drawdown - 5*tracking_error_20_p95 - 1.5*tracking_diff_abs_mean

Alternative B: implementation-sensitive score

Use only when execution model is already close to production.

calib_B = utility_total_score + 0.30*sharpe - 0.40*max_drawdown - 2*max(0, tracking_error_20_p95 - 0.003) - 1*max(0, tracking_diff_abs_mean - 0.001)

This introduces tolerance bands so tiny tracking differences do not dominate selection.
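
Both alternatives can be sketched as plain functions. The function names and the sample inputs are hypothetical; the coefficients and tolerance bands are the ones given above.

```python
def calib_a(utility_total_score, annual_return, upside_capture,
            max_drawdown, tracking_error_20_p95, tracking_diff_abs_mean):
    """Alternative A: utility-first deployment score."""
    return (
        utility_total_score
        + 0.40 * annual_return
        + 0.20 * upside_capture
        - 0.60 * max_drawdown
        - 5 * tracking_error_20_p95
        - 1.5 * tracking_diff_abs_mean
    )


def calib_b(utility_total_score, sharpe, max_drawdown,
            tracking_error_20_p95, tracking_diff_abs_mean):
    """Alternative B: tracking penalties apply only above tolerance bands."""
    return (
        utility_total_score
        + 0.30 * sharpe
        - 0.40 * max_drawdown
        - 2 * max(0.0, tracking_error_20_p95 - 0.003)
        - 1 * max(0.0, tracking_diff_abs_mean - 0.001)
    )


# Two hypothetical candidates that differ only in sub-tolerance tracking.
inside_1 = calib_b(0.10, 0.80, 0.30, 0.0010, 0.0005)
inside_2 = calib_b(0.10, 0.80, 0.30, 0.0020, 0.0008)

# A candidate whose tracking error exceeds the 0.003 band is penalised.
outside = calib_b(0.10, 0.80, 0.30, 0.0050, 0.0005)
```

Inside the bands the two candidates score identically, so the ranking between them is decided by utility, Sharpe, and drawdown alone.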

Walk-forward robustness protocol

Window scheme

Given current data start in 2020, do not pretend you have 2016 windows.

Recommended:

  • expanding train, rolling 1-year or 18-month test
  • minimum 3 valid OOS windows, preferably 4+

Example:

  • train 2020-2021, test 2022
  • train 2020-2022, test 2023
  • train 2020-2023, test 2024
  • train 2020-2024, test 2025
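
The example schedule can be generated rather than hard-coded, which makes it harder to report windows that the data cannot support. The helper names below are illustrative, and the sketch assumes calendar-year test windows.

```python
def expanding_windows(data_start_year, first_test_year, last_test_year):
    """Expanding-train, 1-year-test windows anchored at the data start."""
    return [
        {"train": (data_start_year, test_year - 1), "test": test_year}
        for test_year in range(first_test_year, last_test_year + 1)
    ]


def has_enough_windows(windows, minimum=3):
    """Guard against drawing conclusions from too few OOS windows."""
    return len(windows) >= minimum


windows = expanding_windows(2020, first_test_year=2022, last_test_year=2025)
enough = has_enough_windows(windows)
```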

Stability checks

For every candidate, record:

  • median OOS annual return
  • median OOS max drawdown
  • median OOS upside capture
  • worst-window upside capture
  • worst-window drawdown ratio
  • selection frequency if you do candidate search

Acceptance criteria

  • no valid window with upside capture < 0.40
  • median upside capture >= 0.55
  • drawdown ratio vs baseline <= 0.75 in every window
  • positive utility in at least 3/4 windows
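
These criteria can be checked mechanically per candidate. In the sketch below, each window is a dictionary with hypothetical keys, and the sample values are invented for illustration.

```python
import statistics


def accept(windows):
    """Apply the four acceptance criteria to one candidate's OOS windows."""
    upside = [w["upside_capture"] for w in windows]
    positive = sum(1 for w in windows if w["utility"] > 0)
    return (
        min(upside) >= 0.40
        and statistics.median(upside) >= 0.55
        and all(w["drawdown_ratio"] <= 0.75 for w in windows)
        and positive / len(windows) >= 0.75
    )


# Hypothetical per-window results; one window has negative utility.
results = [
    {"upside_capture": 0.48, "drawdown_ratio": 0.70, "utility": 0.02},
    {"upside_capture": 0.57, "drawdown_ratio": 0.65, "utility": 0.01},
    {"upside_capture": 0.62, "drawdown_ratio": 0.72, "utility": -0.01},
    {"upside_capture": 0.58, "drawdown_ratio": 0.60, "utility": 0.03},
]
accepted = accept(results)
```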

Two-week roadmap

Week 1

  1. Fix NaN propagation in score aggregation
  2. Add low-information feature gate
  3. Remove or refine exposure quantization
  4. Rebuild walk-forward windows to valid periods only

Week 2

  1. Retune state thresholds after bug fix
  2. Upgrade repair/trend exposure curves
  3. Re-run walk-forward with new hard constraints
  4. Replace execution calibration score

Priority experiments

Experiment 1 — Score integrity repair

  • change: NaN-safe score aggregation + fail-fast hazard checks
  • expected win-rate: very high
  • success metric: breadth_score and crowding_score non-null ratio > 95%; hazards not stuck at 0.5
  • rollback: if any required score still all-NaN

Experiment 2 — Exposure quantization removal

  • change: continuous exposure or 0.10 ladder
  • expected win-rate: high
  • success metric: baseline and pro-risk no longer identical; policy sweeps create genuinely different exposure paths
  • rollback: if turnover spikes > 14 without upside improvement

Experiment 3 — Trend/chop uplift

  • change: effective chop to ~0.45 and trend to 0.90~1.00
  • expected win-rate: medium-high
  • success metric: upside capture > 0.50 while max drawdown <= 0.75 * baseline
  • rollback: if MDD rises above 0.48 before upside reaches 0.50

Experiment 4 — Risk-off relaxation

  • change: down_hazard 0.70, stress 0.95, stronger crash override
  • expected win-rate: medium
  • success metric: fewer risk-off days, higher annual return, drawdown ratio still <= 0.70
  • rollback: if downside capture worsens materially without upside benefit

Experiment 5 — Repair cleanup

  • change: repair_hazard 0.62, breadth >= 0, d_trend >= 0, lower repair stress ceiling
  • expected win-rate: medium
  • success metric: lower false rebound count, repair-state annualized return positive
  • rollback: if repair days collapse to near zero

Experiment 6 — Walk-forward + objective redesign

  • change: hard constraints + new selection score
  • expected win-rate: high for decision quality, medium for metrics
  • success metric: selected candidates diversify and OOS selection remains stable
  • rollback: if selected candidate flips every window with no OOS gain

Bottom line

The core direction is valid, but the current bundle understates the system's offensive potential (a partial false negative on offense), because two of the most important score channels are effectively broken and the policy search space is partly collapsed.

Fix the integrity issues first. After that, the most likely path to materially better upside capture is:

  • slightly less eager risk-off
  • stricter but cleaner repair
  • earlier/more persistent trend classification
  • much less coarse exposure mapping