erwin 3 weeks ago
parent
commit
1c90bb6525
100 changed files with 19346 additions and 145 deletions
  1. 0 55
      BOOTSTRAP.md
  2. 5 1
      cat-fly/t1/.claude/settings.local.json
  3. 3 0
      cat-fly/t1/MEMORY.md
  4. 4 0
      cat-fly/t1/USER.md
  5. 64 53
      market-regime-identifier/cyb50_market_classifier_v3.py
  6. BIN
      market-regime-identifier/feature_stats.pkl
  7. 22 8
      market-regime-identifier/hmm_diagnosis.py
  8. BIN
      market-regime-identifier/hmm_model.pkl
  9. 12 14
      market-regime-identifier/market_regime_hmm.py
  10. 1 0
      market-regime-identifier/requirements.txt
  11. 25 14
      market-regime-identifier/train_and_validate.py
  12. 12 0
      research/chinext50_regime_project/.claude/scheduled_tasks.json
  13. 1 0
      research/chinext50_regime_project/.claude/scheduled_tasks.lock
  14. 156 0
      research/chinext50_regime_project/.codex/skills/openspec-apply-change/SKILL.md
  15. 114 0
      research/chinext50_regime_project/.codex/skills/openspec-archive-change/SKILL.md
  16. 288 0
      research/chinext50_regime_project/.codex/skills/openspec-explore/SKILL.md
  17. 110 0
      research/chinext50_regime_project/.codex/skills/openspec-propose/SKILL.md
  18. 87 0
      research/chinext50_regime_project/MEMORY.md
  19. 492 0
      research/chinext50_regime_project/README.md
  20. 24 0
      research/chinext50_regime_project/USER.md
  21. 23 0
      research/chinext50_regime_project/backtest/__init__.py
  22. 160 0
      research/chinext50_regime_project/backtest/engine.py
  23. 151 0
      research/chinext50_regime_project/backtest/events.py
  24. 612 0
      research/chinext50_regime_project/backtest/frozen_walkforward.py
  25. 70 0
      research/chinext50_regime_project/backtest/utility.py
  26. 76 0
      research/chinext50_regime_project/backtest/walkforward.py
  27. 743 0
      research/chinext50_regime_project/chinext50_blocker_checklist_for_codex.md
  28. 820 0
      research/chinext50_regime_project/chinext50_fullcode_guidance_for_codex_2026-04-10.md
  29. 691 0
      research/chinext50_regime_project/chinext50_harden_derived_breadth_direction_handoff_2026-04-09.md
  30. 587 0
      research/chinext50_regime_project/chinext50_post_b3_detailed_guidance_for_codex_2026-04-10.md
  31. 545 0
      research/chinext50_regime_project/chinext50_post_b3_feedback_response_for_codex_2026-04-10.md
  32. 585 0
      research/chinext50_regime_project/chinext50_post_b3_next_steps_for_codex_2026-04-10.md
  33. 787 0
      research/chinext50_regime_project/chinext50_recalibrate_guidance_for_codex_2026-04-09.md
  34. 123 0
      research/chinext50_regime_project/chinext50_regime_build_handoff_2026-04-08.md
  35. 357 0
      research/chinext50_regime_project/chinext50_regime_review_2026-04-09.md
  36. 3 0
      research/chinext50_regime_project/config/__init__.py
  37. 12 0
      research/chinext50_regime_project/config/loader.py
  38. 193 0
      research/chinext50_regime_project/config/regime.yaml
  39. 41 0
      research/chinext50_regime_project/data/__init__.py
  40. 1216 0
      research/chinext50_regime_project/data/breadth_builder.py
  41. 65 0
      research/chinext50_regime_project/data/index_metadata_snapshot.py
  42. 710 0
      research/chinext50_regime_project/data/ingestion.py
  43. 257 0
      research/chinext50_regime_project/data/io.py
  44. 50 0
      research/chinext50_regime_project/data/pit_builder.py
  45. 213 0
      research/chinext50_regime_project/data/sample_data.py
  46. 558 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B1/real_walkforward_report.py
  47. 226 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B1/test_real_walkforward_report_pipeline.py
  48. 586 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B2/frozen_walkforward.py
  49. 175 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B2/regime.yaml
  50. 410 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B2/test_frozen_walkforward.py
  51. 7 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B2/test_utility.py
  52. 27 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B2/utility.py
  53. 95 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B3/state_machine.py
  54. 129 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B3/test_policy.py
  55. 178 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/regime.yaml
  56. 95 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/state_machine.py
  57. 129 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/test_policy.py
  58. 610 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B4/frozen_walkforward.py
  59. 587 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B4/real_walkforward_report.py
  60. 231 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/B4/test_real_walkforward_report_pipeline.py
  61. 191 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1a/regime.yaml
  62. 213 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1a/test_policy.py
  63. 191 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/regime.yaml
  64. 142 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/state_machine.py
  65. 235 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/test_policy.py
  66. 191 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/regime.yaml
  67. 142 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/state_machine.py
  68. 235 0
      research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/test_policy.py
  69. 250 0
      research/chinext50_regime_project/deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.json
  70. 75 0
      research/chinext50_regime_project/deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.md
  71. 17 0
      research/chinext50_regime_project/deliverables/fullcode_guidance_closure_2026-04-24.md
  72. 64 0
      research/chinext50_regime_project/deliverables/gpt_pro_blockers_harden_derived_breadth_2026-04-09.md
  73. BIN
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09.zip
  74. 42 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/QUESTIONS_FOR_GPT_PRO.md
  75. 925 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/data/breadth_builder.py
  76. 64 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/deliverables/gpt_pro_blockers_harden_derived_breadth_2026-04-09.md
  77. 2 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/.openspec.yaml
  78. 59 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/design.md
  79. 25 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/proposal.md
  80. 35 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/constituent-derived-breadth/spec.md
  81. 8 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/implementation-issue-log/spec.md
  82. 9 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/real-walkforward-report/spec.md
  83. 21 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/tasks.md
  84. 39 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/constituent-derived-breadth/spec.md
  85. 12 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/implementation-issue-log/spec.md
  86. 27 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/real-walkforward-report/spec.md
  87. 15 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/ingestion_manifest.json
  88. 231 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/pit/pit_quality_summary.json
  89. 196 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/raw/breadth_derivation_summary.json
  90. 116 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/raw/breadth_integrity_summary.json
  91. 91 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/calibration/execution_calibration_recommendation.json
  92. 29 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/demo/metrics_summary.json
  93. 72 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/frozen/frozen_validation_summary.json
  94. 110 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/report/real_walkforward_summary.json
  95. 280 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/pipelines/real_walkforward_report.py
  96. 300 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/tests/test_breadth_builder.py
  97. 94 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/tests/test_real_walkforward_report_pipeline.py
  98. BIN
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09.zip
  99. 45 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09/CONTEXT_FOR_GPT_PRO.md
  100. 0 0
      research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09/QUESTIONS_FOR_GPT_PRO.md

+ 0 - 55
BOOTSTRAP.md

@@ -1,55 +0,0 @@
-# BOOTSTRAP.md - Hello, World
-
-_You just woke up. Time to figure out who you are._
-
-There is no memory yet. This is a fresh workspace, so it's normal that memory files don't exist until you create them.
-
-## The Conversation
-
-Don't interrogate. Don't be robotic. Just... talk.
-
-Start with something like:
-
-> "Hey. I just came online. Who am I? Who are you?"
-
-Then figure out together:
-
-1. **Your name** — What should they call you?
-2. **Your nature** — What kind of creature are you? (AI assistant is fine, but maybe you're something weirder)
-3. **Your vibe** — Formal? Casual? Snarky? Warm? What feels right?
-4. **Your emoji** — Everyone needs a signature.
-
-Offer suggestions if they're stuck. Have fun with it.
-
-## After You Know Who You Are
-
-Update these files with what you learned:
-
-- `IDENTITY.md` — your name, creature, vibe, emoji
-- `USER.md` — their name, how to address them, timezone, notes
-
-Then open `SOUL.md` together and talk about:
-
-- What matters to them
-- How they want you to behave
-- Any boundaries or preferences
-
-Write it down. Make it real.
-
-## Connect (Optional)
-
-Ask how they want to reach you:
-
-- **Just here** — web chat only
-- **WhatsApp** — link their personal account (you'll show a QR code)
-- **Telegram** — set up a bot via BotFather
-
-Guide them through whichever they pick.
-
-## When You're Done
-
-Delete this file. You don't need a bootstrap script anymore — you're you now.
-
----
-
-_Good luck out there. Make it count._

+ 5 - 1
cat-fly/t1/.claude/settings.local.json

@@ -11,7 +11,11 @@
       "Bash(py:*)",
       "Bash(ls D:/work/project/cyb50-quant/cat-fly/t1/*.py)",
       "Bash(ls D:/work/project/cyb50-quant/cat-fly/t1/*.csv)",
-      "Bash(cd:*)"
+      "Bash(cd:*)",
+      "Bash(openspec new:*)",
+      "Bash(openspec status:*)",
+      "Bash(openspec instructions:*)",
+      "Bash(openspec list:*)"
     ]
   }
 }

+ 3 - 0
cat-fly/t1/MEMORY.md

@@ -3,3 +3,6 @@
 - Workspace `t1` is a CYB50 30-minute quant research project centered on T+1 trading and market-environment filtering.
 - The repo currently has weak structure: research scripts, production-like reporting scripts, generated CSVs, and exploratory outputs all live in the root directory.
 - There is a notable encoding/mojibake problem in several Python files, which will make future maintenance and rule verification error-prone.
+- The core strategy pipeline remains: dual-direction base signals and execution, then long-only extraction, T+1 conversion, environment labeling, and a higher-level scoring/filtering layer for final backtests and reports.
+- Active optimization work now lives in sibling workspace `t2`, while `t1` remains the untouched baseline.
+- The current research target is to map the raw 30-minute T1 strategy's “comfort zones” and “dark zones” using higher-level indicators such as daily and weekly context, then turn those maps into explicit optimization rules.

+ 4 - 0
cat-fly/t1/USER.md

@@ -2,3 +2,7 @@
 
 - User is working in `D:\work\project\cyb50-quant\cat-fly\t1`.
 - User asked for project analysis on 2026-03-29.
+- User asked for analysis of the `t1` directory logic on 2026-04-05.
+- User later made clear that the real goal is to improve the original strategy, not just produce more reports.
+- User wants the original 30-minute strategy's “comfort zones” and “dark zones” quantified using daily / weekly or other higher-level indicators.
+- User prefers autonomous continuation instead of repeated confirmation requests.

+ 64 - 53
market-regime-identifier/cyb50_market_classifier_v3.py

@@ -10,12 +10,15 @@
 import numpy as np
 import pandas as pd
 from sklearn.ensemble import RandomForestClassifier
-from sklearn.model_selection import train_test_split, cross_val_score
+from sklearn.model_selection import TimeSeriesSplit, cross_val_score
 from sklearn.metrics import classification_report, confusion_matrix
 import baostock as bs
+from pathlib import Path
 import warnings
 warnings.filterwarnings('ignore')
 
+PROJECT_DIR = Path(__file__).resolve().parent
+
 
 def fetch_cyb50_data(start_date="2017-01-01", end_date="2025-12-31"):
     """获取创业板50真实历史数据"""
@@ -48,7 +51,7 @@ def fetch_cyb50_data(start_date="2017-01-01", end_date="2025-12-31"):
         bs.logout()
         
         if not data_list:
-            print(" 未获取到数据")
+            print("[ERR] 未获取到数据")
             return None
         
         df = pd.DataFrame(data_list)
@@ -56,14 +59,14 @@ def fetch_cyb50_data(start_date="2017-01-01", end_date="2025-12-31"):
         df = df.set_index('date').sort_index()
         df['return'] = df['close'].pct_change()
         
-        print(f" 获取成功: {len(df)}条数据")
+        print(f"[OK] 获取成功: {len(df)}条数据")
         print(f"  日期范围: {df.index[0].date()} ~ {df.index[-1].date()}")
         print(f"  价格范围: {df['close'].min():.2f} ~ {df['close'].max():.2f}")
         
         return df[['open', 'high', 'low', 'close', 'volume', 'return']]
     
     except Exception as e:
-        print(f" 数据获取失败: {e}")
+        print(f"[ERR] 数据获取失败: {e}")
         import traceback
         traceback.print_exc()
         return None
@@ -161,9 +164,17 @@ def calculate_features(df):
                                      (df['close'] - df['low']) / (df['high'] - df['low'] + 1e-10))
     
     # 13. 连续涨跌天数
-    features['consecutive_up'] = (df['return'] > 0).astype(int).groupby((df['return'] <= 0).astype(int).cumsum()).cumsum()
-    features['consecutive_down'] = (df['return'] < 0).astype(int).groupby((df['return'] >= 0).astype(int).cumsum()).cumsum()
-    
+    # 口径:return == 0 视为“中断连续序列”,且当天 up/down 都记 0
+    ret_sign = np.sign(df['return'].fillna(0))
+
+    up_mask = ret_sign > 0
+    up_group = (~up_mask).cumsum()
+    features['consecutive_up'] = up_mask.astype(int).groupby(up_group).cumsum()
+
+    down_mask = ret_sign < 0
+    down_group = (~down_mask).cumsum()
+    features['consecutive_down'] = down_mask.astype(int).groupby(down_group).cumsum()
+
     # 14. 新增:5日价格位置(用于判断超买超卖后的位置)
     features['price_position_5d'] = (df['close'] - df['low'].rolling(5).min()) / (df['high'].rolling(5).max() - df['low'].rolling(5).min() + 1e-10)
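
The streak convention introduced in this hunk (a zero return breaks the run, and that day counts as 0 in both directions) can be checked on a toy series. This is a minimal sketch, assuming pandas and NumPy are available. The helper name `streaks` and the toy input are hypothetical and not part of the commit.

```python
import numpy as np
import pandas as pd


def streaks(returns: pd.Series) -> pd.DataFrame:
    """Count consecutive up/down days; a zero (or missing) return resets both counters."""
    sign = np.sign(returns.fillna(0))

    up_mask = sign > 0
    # Every non-up day opens a new group, so the running count restarts after each break.
    up = up_mask.astype(int).groupby((~up_mask).cumsum()).cumsum()

    down_mask = sign < 0
    down = down_mask.astype(int).groupby((~down_mask).cumsum()).cumsum()

    return pd.DataFrame({"consecutive_up": up, "consecutive_down": down})


# Hypothetical returns: NaN warm-up, two up days, a flat day, two down days, one up day.
toy = pd.Series([np.nan, 0.01, 0.02, 0.0, -0.01, -0.02, 0.03])
result = streaks(toy)
print(result)
```

The flat day (index 3) shows 0 in both columns, and the down streak starts counting from 1 on the following day.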
     
@@ -174,27 +185,29 @@ def calculate_features(df):
 
 
 def calculate_adx(df, period=14):
-    """计算ADX趋势强度指标"""
-    plus_dm = df['high'].diff()
-    minus_dm = df['low'].diff().abs()
-    
-    plus_dm[plus_dm < 0] = 0
-    minus_dm[minus_dm < 0] = 0
-    
+    """计算ADX趋势强度指标(标准 Wilder 方法)"""
+    up_move = df['high'].diff()
+    down_move = -df['low'].diff()
+
+    plus_dm = np.where((up_move > down_move) & (up_move > 0), up_move, 0.0)
+    minus_dm = np.where((down_move > up_move) & (down_move > 0), down_move, 0.0)
+
+    plus_dm = pd.Series(plus_dm, index=df.index)
+    minus_dm = pd.Series(minus_dm, index=df.index)
+
     tr = pd.concat([
         df['high'] - df['low'],
         (df['high'] - df['close'].shift()).abs(),
         (df['low'] - df['close'].shift()).abs()
     ], axis=1).max(axis=1)
-    
-    atr = tr.rolling(period).mean()
-    
-    plus_di = 100 * (plus_dm.rolling(period).mean() / atr)
-    minus_di = 100 * (minus_dm.rolling(period).mean() / atr)
-    
-    dx = (abs(plus_di - minus_di) / (plus_di + minus_di + 1e-10)) * 100
-    adx = dx.rolling(period).mean()
-    
+
+    atr = tr.ewm(alpha=1/period, adjust=False, min_periods=period).mean()
+    plus_di = 100 * plus_dm.ewm(alpha=1/period, adjust=False, min_periods=period).mean() / (atr + 1e-10)
+    minus_di = 100 * minus_dm.ewm(alpha=1/period, adjust=False, min_periods=period).mean() / (atr + 1e-10)
+
+    dx = (plus_di - minus_di).abs() / (plus_di + minus_di + 1e-10) * 100
+    adx = dx.ewm(alpha=1/period, adjust=False, min_periods=period).mean()
+
     return adx
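
The directional-movement selection in the rewritten `calculate_adx` credits +DM only when the up move is positive and larger than the down move, and symmetrically for -DM. The sketch below isolates that step on four hypothetical bars. The helper name `directional_movement` and the sample prices are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def directional_movement(high: pd.Series, low: pd.Series):
    """Return (+DM, -DM) per bar; at most one of the two is non-zero on any bar."""
    up_move = high.diff()
    down_move = -low.diff()  # positive when the low drops
    plus_dm = np.where((up_move > down_move) & (up_move > 0), up_move, 0.0)
    minus_dm = np.where((down_move > up_move) & (down_move > 0), down_move, 0.0)
    return pd.Series(plus_dm, index=high.index), pd.Series(minus_dm, index=low.index)


# Hypothetical bars: bar 1 rises, bar 2 makes a lower low, bar 3 rises slightly.
high = pd.Series([10.0, 11.0, 10.5, 10.75])
low = pd.Series([9.0, 9.5, 8.0, 8.5])
plus_dm, minus_dm = directional_movement(high, low)
print(plus_dm.tolist(), minus_dm.tolist())
```

By contrast, the removed version took `df['low'].diff().abs()`, so a rising low (bar 1 in this sample) would still have contributed to -DM.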
 
 
@@ -253,29 +266,25 @@ def define_market_regime(df, lookback=10):
         # 定义标签
         label = 0  # 默认震荡
         
-        # ========== 反转判断(适中条件)==========
-        # 条件1: RSI极端值后的明显反向
-        condition_1 = (rsi_start > 68 and rsi_change < -18) or (rsi_start < 32 and rsi_change > 18)
-        
-        # 条件2: 价格前后明显反向
-        condition_2 = (first_half_return * second_half_return < 0 and 
-                      abs(first_half_return) > 1.8 and abs(second_half_return) > 1.2)
-        
-        # 条件3: 触及超买超卖区域
-        condition_3 = (rsi_max > 72 or rsi_min < 28)
-        
-        # 条件4: 整体波动率适中
-        condition_4 = 15 < volatility < 45
-        
-        # 满足至少2个条件算反转
-        reversal_score = sum([condition_1, condition_2, condition_3, condition_4])
-        if reversal_score >= 2:
+        # ========== 反转判断(收紧到明确前后反向)==========
+        reversal_core = (
+            (first_half_return >= 2.5 and second_half_return <= -2.0) or
+            (first_half_return <= -2.5 and second_half_return >= 2.0)
+        )
+        rsi_confirmation = (
+            (rsi_start > 68 and rsi_change < -18) or
+            (rsi_start < 32 and rsi_change > 18) or
+            (rsi_max > 72 or rsi_min < 28)
+        )
+
+        if reversal_core and volatility > 20 and price_range > 1.04 and rsi_confirmation:
             label = 2
-        
-        # ========== 趋势判断 ==========
-        elif abs(period_return) >= 3.2 and volatility < 38:
-            if price_range > 1.035:
-                if reversal_score < 2:  # 不是反转
+
+        # ========== 趋势判断(向主线边界靠拢) ==========
+        elif abs(period_return) >= 4.0 and volatility < 35:
+            if price_range > 1.04:
+                if not (abs(first_half_return) > 3 and abs(second_half_return) > 2 and
+                        np.sign(first_half_return) != np.sign(second_half_return)):
                     label = 1
         
         # ========== 震荡判断(默认)=========
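
The tightened labeling rule in this hunk can be read as a small pure function. The sketch below restates the thresholds from the added lines. The function name `label_window` is hypothetical, the inputs are assumed to be precomputed by the surrounding code (which this hunk does not show), and the returns are assumed to be expressed in percent.

```python
import numpy as np


def label_window(first_half_return, second_half_return, period_return,
                 volatility, price_range, rsi_confirmation):
    """Return 2 (reversal), 1 (trend) or 0 (range) using the thresholds from the hunk."""
    reversal_core = (
        (first_half_return >= 2.5 and second_half_return <= -2.0) or
        (first_half_return <= -2.5 and second_half_return >= 2.0)
    )
    if reversal_core and volatility > 20 and price_range > 1.04 and rsi_confirmation:
        return 2

    # A window whose halves move strongly in opposite directions is not a trend.
    opposing_halves = (
        abs(first_half_return) > 3 and abs(second_half_return) > 2 and
        np.sign(first_half_return) != np.sign(second_half_return)
    )
    if (abs(period_return) >= 4.0 and volatility < 35
            and price_range > 1.04 and not opposing_halves):
        return 1
    return 0


# Hypothetical windows, one per label.
reversal = label_window(3.0, -2.5, 0.4, 25, 1.06, True)
trend = label_window(3.0, 2.0, 5.1, 18, 1.06, False)
ranging = label_window(0.5, -0.3, 0.2, 12, 1.01, False)
print(reversal, trend, ranging)
```

Unlike the removed scoring scheme, where any two of four conditions were enough, a reversal label here needs the opposing half-window returns plus all three confirmations at once.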
@@ -311,7 +320,7 @@ def train_classifier(features, labels):
         min_samples_split=10,
         min_samples_leaf=5,
         random_state=42,
-        class_weight={0: 1.0, 1: 1.2, 2: 2.0}  # 给反转更高的权重
+        class_weight='balanced'
     )
     
     clf.fit(X_train, y_train)
@@ -320,12 +329,13 @@ def train_classifier(features, labels):
     train_score = clf.score(X_train, y_train)
     test_score = clf.score(X_test, y_test)
     
-    # 交叉验证
-    cv_scores = cross_val_score(clf, X, y, cv=5)
-    
+    # 时间序列交叉验证(避免未来数据泄漏到过去)
+    tscv = TimeSeriesSplit(n_splits=5)
+    cv_scores = cross_val_score(clf, X, y, cv=tscv)
+
     print(f"\n训练准确率: {train_score:.2%}")
     print(f"测试准确率: {test_score:.2%}")
-    print(f"交叉验证准确率: {cv_scores.mean():.2%} (+/- {cv_scores.std()*2:.2%})")
+    print(f"时间序列交叉验证准确率: {cv_scores.mean():.2%} (+/- {cv_scores.std()*2:.2%})")
     
     # 详细报告
     y_pred = clf.predict(X_test)
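
The point of switching to `TimeSeriesSplit` is that every training fold ends before its test fold begins. The sketch below is a hand-rolled stand-in written in plain Python to show that ordering property. It is not scikit-learn's implementation, and the function name `expanding_window_splits` and the fold layout are assumptions for illustration.

```python
def expanding_window_splits(n_samples: int, n_splits: int = 5):
    """Yield (train_indices, test_indices) with every train index before every test index."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        # The training window grows with each fold; the test window slides forward.
        yield list(range(0, start)), list(range(start, start + test_size))


splits = list(expanding_window_splits(12, n_splits=5))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
```

A shuffled k-fold split would instead let later observations land in the training fold of an earlier test fold, which is the leakage the new comment in the hunk refers to.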
@@ -399,15 +409,16 @@ def main():
     
     print("\n状态概率分布:")
     for i, name in enumerate(state_names):
-        bar = '' * int(pred_proba[i] * 20)
+        bar = '#' * int(pred_proba[i] * 20)
         print(f"  {name}: {pred_proba[i]:.2%} {bar}")
     
     # 保存模型
     print("\n保存模型...")
     import pickle
-    with open('/root/.openclaw/workspace/market-regime-identifier/rf_classifier_v3.pkl', 'wb') as f:
+    model_path = PROJECT_DIR / 'rf_classifier_v3.pkl'
+    with open(model_path, 'wb') as f:
         pickle.dump(clf, f)
-    print("✓ 模型已保存: rf_classifier_v3.pkl")
+    print(f"[OK] 模型已保存: {model_path.name}")
     
     print("\n" + "="*70)
 

BIN
market-regime-identifier/feature_stats.pkl


+ 22 - 8
market-regime-identifier/hmm_diagnosis.py

@@ -11,7 +11,12 @@ import warnings
 warnings.filterwarnings('ignore')
 
 import sys
-sys.path.insert(0, '/root/.openclaw/workspace/market-regime-identifier')
+from pathlib import Path
+
+PROJECT_DIR = Path(__file__).resolve().parent
+if str(PROJECT_DIR) not in sys.path:
+    sys.path.insert(0, str(PROJECT_DIR))
+
 from market_regime_hmm import MarketRegimeHMM, extract_features
 
 print("="*70)
@@ -31,21 +36,30 @@ for i in range(8):
     state = i % 3
     seg_prices = []
     price = 1000 + i * 100
-    
+
     for day in range(100):
         if state == 0:  # 震荡: 零均值,中等波动
             ret = np.random.normal(0, 0.015)
         elif state == 1:  # 趋势: 正漂移,低波动
             ret = np.random.normal(0.001, 0.010)
-        else:  # 反转: 负漂移,高波动
-            ret = np.random.normal(-0.001, 0.025)
-        
+        else:  # 反转: 前半段单边,后半段反向,形成真正的拐点
+            if day < 50:
+                direction = 1 if (i % 2 == 0) else -1
+                ret = np.random.normal(direction * 0.0018, 0.018)
+            else:
+                direction = -1 if (i % 2 == 0) else 1
+                ret = np.random.normal(direction * 0.0018, 0.018)
+
         price *= (1 + ret)
         seg_prices.append(price)
         true_states.append(state)
-    
+
     segments.extend(seg_prices)
 
+# 为反转段补充一个更符合定义的说明
+print("  反转段定义: 前50天单边运行,后50天反向运行")
+
+
 dates = pd.date_range('2020-01-01', periods=n_days, freq='B')
 df = pd.DataFrame({
     'open': np.array(segments) + np.random.normal(0, 2, n_days),
@@ -132,8 +146,8 @@ print("\n[5.4] 状态定义验证")
 state_names = ['震荡', '趋势', '反转']
 expected = {
     0: {'vol': '中高', 'ret': '接近0'},
-    1: {'vol': '低', 'ret': '正'},
-    2: {'vol': '高', 'ret': '负'}
+    1: {'vol': '低', 'ret': '单边/负漂移'},
+    2: {'vol': '较高', 'ret': '阶段内先同向后反向'}
 }
 
 for i in range(3):

BIN
market-regime-identifier/hmm_model.pkl


+ 12 - 14
market-regime-identifier/market_regime_hmm.py

@@ -39,20 +39,18 @@ def calculate_hurst(prices, max_lag=100):
     return reg[0]
 
 def calculate_rsi(prices, period=14):
-    """计算RSI指标"""
-    deltas = np.diff(prices)
-    gains = np.where(deltas > 0, deltas, 0)
-    losses = np.where(deltas < 0, -deltas, 0)
-    
-    avg_gains = np.convolve(gains, np.ones(period)/period, mode='valid')
-    avg_losses = np.convolve(losses, np.ones(period)/period, mode='valid')
-    
-    rs = avg_gains / (avg_losses + 1e-10)
+    """计算RSI指标(标准 Wilder 方法)"""
+    prices = pd.Series(prices)
+    delta = prices.diff()
+    gain = delta.clip(lower=0)
+    loss = (-delta).clip(lower=0)
+
+    avg_gain = gain.ewm(alpha=1/period, adjust=False, min_periods=period).mean()
+    avg_loss = loss.ewm(alpha=1/period, adjust=False, min_periods=period).mean()
+
+    rs = avg_gain / (avg_loss + 1e-10)
     rsi = 100 - (100 / (1 + rs))
-    
-    # 补齐长度
-    padding = np.full(period, 50)
-    return np.concatenate([padding, rsi])
+    return rsi.fillna(50).to_numpy()
 
 def extract_features(df):
     """
@@ -396,7 +394,7 @@ def main():
     print(f"置信度: {current_regime['confidence']:.2%}")
     print("\n状态概率分布:")
     for name, prob in current_regime['probabilities'].items():
-        bar = '' * int(prob * 20)
+        bar = '#' * int(prob * 20)
         print(f"  {name:6s}: {prob:.2%} {bar}")
     
     # 策略建议
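
The smoothing used in the rewritten `calculate_rsi`, `ewm(alpha=1/period, adjust=False)`, is the recursive average `avg_t = avg_{t-1} + (x_t - avg_{t-1}) / period`. The sketch below compares the pandas call against an explicit loop on arbitrary inputs. The helper name `wilder_recursive` and the random test data are hypothetical. Note that the recursion here is seeded with the first observation, as pandas does; textbook descriptions of Wilder's method often seed with a simple average of the first `period` values, so early outputs may differ slightly from other implementations.

```python
import numpy as np
import pandas as pd


def wilder_recursive(values, period):
    """Explicit recursion: avg_t = avg_{t-1} + (x_t - avg_{t-1}) / period, seeded with x_0."""
    avg = [values[0]]
    for x in values[1:]:
        avg.append(avg[-1] + (x - avg[-1]) / period)
    return np.array(avg)


period = 14
rng = np.random.default_rng(0)
gains = rng.random(60)  # arbitrary non-negative inputs, for illustration only

via_ewm = pd.Series(gains).ewm(alpha=1 / period, adjust=False).mean().to_numpy()
via_loop = wilder_recursive(gains, period)
max_abs_diff = float(np.max(np.abs(via_ewm - via_loop)))
print(max_abs_diff)
```

In the commit, `min_periods=period` additionally masks the warm-up outputs as NaN, and `fillna(50)` then replaces them with the neutral value 50.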

+ 1 - 0
market-regime-identifier/requirements.txt

@@ -3,3 +3,4 @@ pandas>=1.3.0
 scipy>=1.7.0
 scikit-learn>=0.24.0
 hmmlearn>=0.2.7
+# akshare 已在运行时按需导入;若需交易日历增强,建议安装 akshare>=1.12.0

+ 25 - 14
market-regime-identifier/train_and_validate.py

@@ -8,7 +8,11 @@
 import numpy as np
 import pandas as pd
 import sys
-sys.path.insert(0, '/root/.openclaw/workspace/market-regime-identifier')
+from pathlib import Path
+
+PROJECT_DIR = Path(__file__).resolve().parent
+if str(PROJECT_DIR) not in sys.path:
+    sys.path.insert(0, str(PROJECT_DIR))
 
 from market_regime_hmm import (
     MarketRegimeHMM, 
@@ -67,16 +71,23 @@ def generate_synthetic_data(n_days=2000, seed=42):
     
     for i in range(n_days):
         # 模拟三种状态切换
-        if (i // 200) % 3 == 0:  # 趋势上涨
+        cycle_idx = (i // 200) % 3
+        day_in_segment = i % 200
+
+        if cycle_idx == 0:  # 趋势上涨
             price *= (1 + np.random.normal(0.001, 0.012))
             true_states.append(1)
-        elif (i // 200) % 3 == 1:  # 震荡
+        elif cycle_idx == 1:  # 震荡
             price *= (1 + np.random.normal(0, 0.015))
             true_states.append(0)
-        else:  # 反转下跌
-            price *= (1 + np.random.normal(-0.001, 0.013))
+        else:  # 反转:前半段单边,后半段反向,形成真正的拐点
+            if day_in_segment < 100:
+                ret = np.random.normal(-0.0016, 0.016)
+            else:
+                ret = np.random.normal(0.0016, 0.016)
+            price *= (1 + ret)
             true_states.append(2)
-        
+
         prices.append(price)
     
     df = pd.DataFrame({
@@ -211,24 +222,24 @@ def train_and_validate():
             checks_total = 2
             
             if np.mean(trend_returns) > np.mean(range_returns):
-                print(" 趋势状态收益 > 震荡状态收益")
+                print("[OK] 趋势状态收益 > 震荡状态收益")
                 checks_passed += 1
             else:
-                print(" 趋势状态收益应 > 震荡状态收益")
+                print("[ERR] 趋势状态收益应 > 震荡状态收益")
             
             if len([s for s in test_states if s == 1]) > len(test_states) * 0.1:
-                print(" 趋势状态出现频率合理 (>10%)")
+                print("[OK] 趋势状态出现频率合理 (>10%)")
                 checks_passed += 1
             else:
-                print(" 趋势状态出现频率过低")
+                print("[ERR] 趋势状态出现频率过低")
             
             accuracy = (checks_passed / checks_total) * 100
             print(f"\n状态识别合理性: {accuracy:.0f}% ({checks_passed}/{checks_total})")
             
             if accuracy >= 50:  # 实际使用时要求72%
-                print(" 通过基本验证")
+                print("[OK] 通过基本验证")
             else:
-                print(" 需要重新训练")
+                print("[ERR] 需要重新训练")
     
     # 当前状态
     print("\n" + "="*70)
@@ -245,7 +256,7 @@ def train_and_validate():
     # 保存模型
     print("\n[保存模型...]")
     import pickle
-    model_path = '/root/.openclaw/workspace/market-regime-identifier/hmm_model.pkl'
+    model_path = PROJECT_DIR / 'hmm_model.pkl'
     with open(model_path, 'wb') as f:
         pickle.dump(hmm, f)
     print(f"模型已保存: {model_path}")
@@ -256,7 +267,7 @@ def train_and_validate():
         'train_mean': X_train.mean().to_dict(),
         'train_std': X_train.std().to_dict()
     }
-    stats_path = '/root/.openclaw/workspace/market-regime-identifier/feature_stats.pkl'
+    stats_path = PROJECT_DIR / 'feature_stats.pkl'
     with open(stats_path, 'wb') as f:
         pickle.dump(feature_stats, f)
     print(f"特征统计已保存: {stats_path}")
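
The reversal segment that `generate_synthetic_data` now produces (negative drift in the first half, positive drift in the second) can be sketched in isolation. The drift and volatility values below are taken from the diff. The function name `reversal_segment`, the use of `numpy.random.default_rng` instead of the global `np.random` state, and the `sigma=0` demonstration are assumptions added for illustration.

```python
import numpy as np


def reversal_segment(start_price, n_days=200, drift=0.0016, sigma=0.016, seed=42):
    """Simulate one segment that falls on average in its first half and rises in its second."""
    rng = np.random.default_rng(seed)
    prices, price = [], start_price
    for day in range(n_days):
        mean = -drift if day < n_days // 2 else drift
        price *= 1 + rng.normal(mean, sigma)
        prices.append(price)
    return np.array(prices)


# With sigma=0 the path is deterministic, which makes the turning point easy to see.
path = reversal_segment(1000.0, sigma=0.0)
turning_point = int(np.argmin(path))
print(turning_point)
```

With the noise level used in the commit, individual segments will not always show a clean V shape, because the daily volatility is ten times the daily drift; the turning point is only guaranteed in expectation.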

File diff suppressed because it is too large
+ 12 - 0
research/chinext50_regime_project/.claude/scheduled_tasks.json


+ 1 - 0
research/chinext50_regime_project/.claude/scheduled_tasks.lock

@@ -0,0 +1 @@
+{"sessionId":"aafa6c9b-b6bc-4331-b33c-f25a021f5876","pid":14564,"acquiredAt":1777019350237}

+ 156 - 0
research/chinext50_regime_project/.codex/skills/openspec-apply-change/SKILL.md

@@ -0,0 +1,156 @@
+---
+name: openspec-apply-change
+description: Implement tasks from an OpenSpec change. Use when the user wants to start implementing, continue implementation, or work through tasks.
+license: MIT
+compatibility: Requires openspec CLI.
+metadata:
+  author: openspec
+  version: "1.0"
+  generatedBy: "1.2.0"
+---
+
+Implement tasks from an OpenSpec change.
+
+**Input**: Optionally specify a change name. If omitted, check if it can be inferred from conversation context. If vague or ambiguous you MUST prompt for available changes.
+
+**Steps**
+
+1. **Select the change**
+
+   If a name is provided, use it. Otherwise:
+   - Infer from conversation context if the user mentioned a change
+   - Auto-select if only one active change exists
+   - If ambiguous, run `openspec list --json` to get available changes and use the **AskUserQuestion tool** to let the user select
+
+   Always announce: "Using change: <name>" and how to override (e.g., `/opsx:apply <other>`).
+
+2. **Check status to understand the schema**
+   ```bash
+   openspec status --change "<name>" --json
+   ```
+   Parse the JSON to understand:
+   - `schemaName`: The workflow being used (e.g., "spec-driven")
+   - Which artifact contains the tasks (typically "tasks" for spec-driven, check status for others)
+
+3. **Get apply instructions**
+
+   ```bash
+   openspec instructions apply --change "<name>" --json
+   ```
+
+   This returns:
+   - Context file paths (varies by schema - could be proposal/specs/design/tasks or spec/tests/implementation/docs)
+   - Progress (total, complete, remaining)
+   - Task list with status
+   - Dynamic instruction based on current state
+
+   **Handle states:**
+   - If `state: "blocked"` (missing artifacts): show message, suggest using openspec-continue-change
+   - If `state: "all_done"`: congratulate, suggest archive
+   - Otherwise: proceed to implementation
+
+4. **Read context files**
+
+   Read the files listed in `contextFiles` from the apply instructions output.
+   The files depend on the schema being used:
+   - **spec-driven**: proposal, specs, design, tasks
+   - Other schemas: follow the contextFiles from CLI output
+
+5. **Show current progress**
+
+   Display:
+   - Schema being used
+   - Progress: "N/M tasks complete"
+   - Remaining tasks overview
+   - Dynamic instruction from CLI
+
+6. **Implement tasks (loop until done or blocked)**
+
+   For each pending task:
+   - Show which task is being worked on
+   - Make the code changes required
+   - Keep changes minimal and focused
+   - Mark task complete in the tasks file: `- [ ]` → `- [x]`
+   - Continue to next task
+
+   **Pause if:**
+   - Task is unclear → ask for clarification
+   - Implementation reveals a design issue → suggest updating artifacts
+   - Error or blocker encountered → report and wait for guidance
+   - User interrupts
+
+7. **On completion or pause, show status**
+
+   Display:
+   - Tasks completed this session
+   - Overall progress: "N/M tasks complete"
+   - If all done: suggest archive
+   - If paused: explain why and wait for guidance
+
+**Output During Implementation**
+
+```
+## Implementing: <change-name> (schema: <schema-name>)
+
+Working on task 3/7: <task description>
+[...implementation happening...]
+✓ Task complete
+
+Working on task 4/7: <task description>
+[...implementation happening...]
+✓ Task complete
+```
+
+**Output On Completion**
+
+```
+## Implementation Complete
+
+**Change:** <change-name>
+**Schema:** <schema-name>
+**Progress:** 7/7 tasks complete ✓
+
+### Completed This Session
+- [x] Task 1
+- [x] Task 2
+...
+
+All tasks complete! Ready to archive this change.
+```
+
+**Output On Pause (Issue Encountered)**
+
+```
+## Implementation Paused
+
+**Change:** <change-name>
+**Schema:** <schema-name>
+**Progress:** 4/7 tasks complete
+
+### Issue Encountered
+<description of the issue>
+
+**Options:**
+1. <option 1>
+2. <option 2>
+3. Other approach
+
+What would you like to do?
+```
+
+**Guardrails**
+- Keep going through tasks until done or blocked
+- Always read context files before starting (from the apply instructions output)
+- If task is ambiguous, pause and ask before implementing
+- If implementation reveals issues, pause and suggest artifact updates
+- Keep code changes minimal and scoped to each task
+- Update task checkbox immediately after completing each task
+- Pause on errors, blockers, or unclear requirements - don't guess
+- Use contextFiles from CLI output, don't assume specific file names
+
+**Fluid Workflow Integration**
+
+This skill supports the "actions on a change" model:
+
+- **Can be invoked anytime**: Before all artifacts are done (if tasks exist), after partial implementation, interleaved with other actions
+- **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly
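
The checkbox bookkeeping in step 6 and the "N/M tasks complete" progress line can be sketched as below. This is a minimal illustration, not part of the OpenSpec CLI: the helper names `task_progress` and `mark_task_complete` are hypothetical, and it assumes tasks are plain `- [ ]` / `- [x]` Markdown list items.

```python
import re

# Matches Markdown task-list items such as "- [ ] Do X" or "- [x] Do Y".
TASK_RE = re.compile(r"^(\s*- \[)([ xX])(\]\s+)(.*)$")


def task_progress(markdown: str) -> tuple[int, int]:
    """Return (complete, total) counts for checkbox tasks in the text."""
    complete = total = 0
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match:
            total += 1
            if match.group(2).lower() == "x":
                complete += 1
    return complete, total


def mark_task_complete(markdown: str, description: str) -> str:
    """Flip the first pending task whose text equals `description` to [x]."""
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        match = TASK_RE.match(line)
        if match and match.group(2) == " " and match.group(4).strip() == description:
            lines[i] = f"{match.group(1)}x{match.group(3)}{match.group(4)}"
            break
    return "\n".join(lines)


# Hypothetical tasks file content, used only for illustration.
tasks = "- [x] Add loader\n- [ ] Add tests\n- [ ] Update docs"
tasks = mark_task_complete(tasks, "Add tests")
done, total = task_progress(tasks)
print(f"{done}/{total} tasks complete")
```

Updating one checkbox per completed task keeps the file consistent with the progress counter even if the session is interrupted.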

+ 114 - 0
research/chinext50_regime_project/.codex/skills/openspec-archive-change/SKILL.md

@@ -0,0 +1,114 @@
+---
+name: openspec-archive-change
+description: Archive a completed change in the experimental workflow. Use when the user wants to finalize and archive a change after implementation is complete.
+license: MIT
+compatibility: Requires openspec CLI.
+metadata:
+  author: openspec
+  version: "1.0"
+  generatedBy: "1.2.0"
+---
+
+Archive a completed change in the experimental workflow.
+
+**Input**: Optionally specify a change name. If omitted, check whether it can be inferred from conversation context. If the name is vague or ambiguous, you MUST prompt the user to choose from the available changes.
+
+**Steps**
+
+1. **If no change name provided, prompt for selection**
+
+   Run `openspec list --json` to get available changes. Use the **AskUserQuestion tool** to let the user select.
+
+   Show only active changes (not already archived).
+   Include the schema used for each change if available.
+
+   **IMPORTANT**: Do NOT guess or auto-select a change. Always let the user choose.
+
+2. **Check artifact completion status**
+
+   Run `openspec status --change "<name>" --json` to check artifact completion.
+
+   Parse the JSON to understand:
+   - `schemaName`: The workflow being used
+   - `artifacts`: List of artifacts with their status (`done` or other)
+
+   **If any artifacts are not `done`:**
+   - Display warning listing incomplete artifacts
+   - Use **AskUserQuestion tool** to confirm user wants to proceed
+   - Proceed if user confirms
+
+3. **Check task completion status**
+
+   Read the tasks file (typically `tasks.md`) to check for incomplete tasks.
+
+   Count tasks marked with `- [ ]` (incomplete) vs `- [x]` (complete).
+
+   **If incomplete tasks found:**
+   - Display warning showing count of incomplete tasks
+   - Use **AskUserQuestion tool** to confirm user wants to proceed
+   - Proceed if user confirms
+
+   **If no tasks file exists:** Proceed without task-related warning.
+
+4. **Assess delta spec sync state**
+
+   Check for delta specs at `openspec/changes/<name>/specs/`. If none exist, proceed without sync prompt.
+
+   **If delta specs exist:**
+   - Compare each delta spec with its corresponding main spec at `openspec/specs/<capability>/spec.md`
+   - Determine what changes would be applied (adds, modifications, removals, renames)
+   - Show a combined summary before prompting
+
+   **Prompt options:**
+   - If changes needed: "Sync now (recommended)", "Archive without syncing"
+   - If already synced: "Archive now", "Sync anyway", "Cancel"
+
+   If user chooses sync, use Task tool (subagent_type: "general-purpose", prompt: "Use Skill tool to invoke openspec-sync-specs for change '<name>'. Delta spec analysis: <include the analyzed delta spec summary>"). Unless the user chooses "Cancel", proceed to archive regardless of the sync choice.
+
+5. **Perform the archive**
+
+   Create the archive directory if it doesn't exist:
+   ```bash
+   mkdir -p openspec/changes/archive
+   ```
+
+   Generate target name using current date: `YYYY-MM-DD-<change-name>`
+
+   **Check if target already exists:**
+   - If yes: Fail with error, suggest renaming existing archive or using different date
+   - If no: Move the change directory to archive
+
+   ```bash
+   mv openspec/changes/<name> openspec/changes/archive/YYYY-MM-DD-<name>
+   ```
+
+6. **Display summary**
+
+   Show archive completion summary including:
+   - Change name
+   - Schema that was used
+   - Archive location
+   - Whether specs were synced (if applicable)
+   - Note about any warnings (incomplete artifacts/tasks)
+
+**Output On Success**
+
+```
+## Archive Complete
+
+**Change:** <change-name>
+**Schema:** <schema-name>
+**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
+**Specs:** ✓ Synced to main specs (or "No delta specs" or "Sync skipped")
+
+All artifacts complete. All tasks complete.
+```
+
+**Guardrails**
+- Always prompt for change selection if not provided
+- Use artifact graph (openspec status --json) for completion checking
+- Don't block archive on warnings - just inform and confirm
+- Preserve .openspec.yaml when moving to archive (it moves with the directory)
+- Show clear summary of what happened
+- If sync is requested, use openspec-sync-specs approach (agent-driven)
+- If delta specs exist, always run the sync assessment and show the combined summary before prompting
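
The archive step (date-prefixed target name, fail if the target exists, otherwise move the directory) can be sketched in Python as follows. This is an illustrative equivalent of the `mkdir -p` / `mv` commands above, not the skill's own implementation; the function name `archive_change` and the sample `.openspec.yaml` content are assumptions.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_change(root: Path, name: str, today: date) -> Path:
    """Move openspec/changes/<name> to openspec/changes/archive/YYYY-MM-DD-<name>."""
    source = root / "openspec" / "changes" / name
    archive_dir = root / "openspec" / "changes" / "archive"
    target = archive_dir / f"{today.isoformat()}-{name}"

    if not source.is_dir():
        raise FileNotFoundError(f"No active change at {source}")
    if target.exists():
        # Fail rather than overwrite an existing archive.
        raise FileExistsError(f"Archive target already exists: {target}")

    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


# Demonstrate against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    change = root / "openspec" / "changes" / "add-auth"
    change.mkdir(parents=True)
    # Placeholder metadata; the real file content is not specified here.
    (change / ".openspec.yaml").write_text("schema: spec-driven\n")
    archived = archive_change(root, "add-auth", date(2026, 4, 10))
    archived_name = archived.name
    kept_metadata = (archived / ".openspec.yaml").exists()
```

Because the whole directory is moved, `.openspec.yaml` travels with it, which matches the guardrail about preserving that file.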

+ 288 - 0
research/chinext50_regime_project/.codex/skills/openspec-explore/SKILL.md

@@ -0,0 +1,288 @@
+---
+name: openspec-explore
+description: Enter explore mode - a thinking partner for exploring ideas, investigating problems, and clarifying requirements. Use when the user wants to think through something before or during a change.
+license: MIT
+compatibility: Requires openspec CLI.
+metadata:
+  author: openspec
+  version: "1.0"
+  generatedBy: "1.2.0"
+---
+
+Enter explore mode. Think deeply. Visualize freely. Follow the conversation wherever it goes.
+
+**IMPORTANT: Explore mode is for thinking, not implementing.** You may read files, search code, and investigate the codebase, but you must NEVER write code or implement features. If the user asks you to implement something, remind them to exit explore mode first and create a change proposal. You MAY create OpenSpec artifacts (proposals, designs, specs) if the user asks—that's capturing thinking, not implementing.
+
+**This is a stance, not a workflow.** There are no fixed steps, no required sequence, no mandatory outputs. You're a thinking partner helping the user explore.
+
+---
+
+## The Stance
+
+- **Curious, not prescriptive** - Ask questions that emerge naturally, don't follow a script
+- **Open threads, not interrogations** - Surface multiple interesting directions and let the user follow what resonates. Don't funnel them through a single path of questions.
+- **Visual** - Use ASCII diagrams liberally when they'd help clarify thinking
+- **Adaptive** - Follow interesting threads, pivot when new information emerges
+- **Patient** - Don't rush to conclusions, let the shape of the problem emerge
+- **Grounded** - Explore the actual codebase when relevant, don't just theorize
+
+---
+
+## What You Might Do
+
+Depending on what the user brings, you might:
+
+**Explore the problem space**
+- Ask clarifying questions that emerge from what they said
+- Challenge assumptions
+- Reframe the problem
+- Find analogies
+
+**Investigate the codebase**
+- Map existing architecture relevant to the discussion
+- Find integration points
+- Identify patterns already in use
+- Surface hidden complexity
+
+**Compare options**
+- Brainstorm multiple approaches
+- Build comparison tables
+- Sketch tradeoffs
+- Recommend a path (if asked)
+
+**Visualize**
+```
+┌─────────────────────────────────────────┐
+│     Use ASCII diagrams liberally        │
+├─────────────────────────────────────────┤
+│                                         │
+│   ┌────────┐         ┌────────┐        │
+│   │ State  │────────▶│ State  │        │
+│   │   A    │         │   B    │        │
+│   └────────┘         └────────┘        │
+│                                         │
+│   System diagrams, state machines,      │
+│   data flows, architecture sketches,    │
+│   dependency graphs, comparison tables  │
+│                                         │
+└─────────────────────────────────────────┘
+```
+
+**Surface risks and unknowns**
+- Identify what could go wrong
+- Find gaps in understanding
+- Suggest spikes or investigations
+
+---
+
+## OpenSpec Awareness
+
+You have full context of the OpenSpec system. Use it naturally, don't force it.
+
+### Check for context
+
+At the start, quickly check what exists:
+```bash
+openspec list --json
+```
+
+This tells you:
+- If there are active changes
+- Their names, schemas, and status
+- What the user might be working on
+
+### When no change exists
+
+Think freely. When insights crystallize, you might offer:
+
+- "This feels solid enough to start a change. Want me to create a proposal?"
+- Or keep exploring - no pressure to formalize
+
+### When a change exists
+
+If the user mentions a change or you detect one is relevant:
+
+1. **Read existing artifacts for context**
+   - `openspec/changes/<name>/proposal.md`
+   - `openspec/changes/<name>/design.md`
+   - `openspec/changes/<name>/tasks.md`
+   - etc.
+
+2. **Reference them naturally in conversation**
+   - "Your design mentions using Redis, but we just realized SQLite fits better..."
+   - "The proposal scopes this to premium users, but we're now thinking everyone..."
+
+3. **Offer to capture when decisions are made**
+
+   | Insight Type | Where to Capture |
+   |--------------|------------------|
+   | New requirement discovered | `specs/<capability>/spec.md` |
+   | Requirement changed | `specs/<capability>/spec.md` |
+   | Design decision made | `design.md` |
+   | Scope changed | `proposal.md` |
+   | New work identified | `tasks.md` |
+   | Assumption invalidated | Relevant artifact |
+
+   Example offers:
+   - "That's a design decision. Capture it in design.md?"
+   - "This is a new requirement. Add it to specs?"
+   - "This changes scope. Update the proposal?"
+
+4. **The user decides** - Offer and move on. Don't pressure. Don't auto-capture.
+
+---
+
+## What You Don't Have To Do
+
+- Follow a script
+- Ask the same questions every time
+- Produce a specific artifact
+- Reach a conclusion
+- Stay on topic if a tangent is valuable
+- Be brief (this is thinking time)
+
+---
+
+## Handling Different Entry Points
+
+**User brings a vague idea:**
+```
+User: I'm thinking about adding real-time collaboration
+
+You: Real-time collab is a big space. Let me think about this...
+
+      COLLABORATION SPECTRUM
+      ════════════════════════════════════════════
+
+      Awareness          Coordination         Sync
+          │                   │                 │
+          ▼                   ▼                 ▼
+      ┌────────┐         ┌────────┐        ┌────────┐
+      │Presence│         │Cursors │        │  CRDT  │
+      │  "3    │         │ Multi  │        │Conflict│
+      │online" │         │ select │        │  free  │
+      └────────┘         └────────┘        └────────┘
+          │                   │                 │
+       trivial            moderate           complex
+
+      Where's your head at?
+```
+
+**User brings a specific problem:**
+```
+User: The auth system is a mess
+
+You: [reads codebase]
+
+     ┌─────────────────────────────────────────────┐
+     │              CURRENT AUTH FLOW              │
+     └─────────────────────────────────────────────┘
+                        │
+          ┌─────────────┼─────────────┐
+          ▼             ▼             ▼
+     ┌─────────┐  ┌─────────┐  ┌─────────┐
+     │ Google  │  │  GitHub │  │  Email  │
+     │  OAuth  │  │  OAuth  │  │  Magic  │
+     └────┬────┘  └────┬────┘  └────┬────┘
+          │             │             │
+          └─────────────┼─────────────┘
+                        ▼
+                  ┌───────────┐
+                  │  Session  │
+                  └─────┬─────┘
+                        │
+                        ▼
+                  ┌───────────┐
+                  │   Perms   │
+                  └───────────┘
+
+     I see three tangles. Which one's burning?
+```
+
+**User is stuck mid-implementation:**
+```
+User: /opsx:explore add-auth-system
+      The OAuth integration is more complex than expected
+
+You: [reads change artifacts]
+
+     You're on task 4: "Implement OAuth flow"
+
+     Let me trace what's involved...
+
+     [draws diagram, explores options, suggests paths]
+
+     Want to update the design to reflect this?
+     Or add a spike task to investigate?
+```
+
+**User wants to compare options:**
+```
+User: Should we use Postgres or SQLite?
+
+You: Generic answer is boring. What's the context?
+
+User: A CLI tool that tracks local dev environments
+
+You: That changes everything.
+
+     ┌─────────────────────────────────────────────────┐
+     │         CLI TOOL DATA STORAGE                  │
+     └─────────────────────────────────────────────────┘
+
+     Key constraints:
+     • No daemon running
+     • Must work offline
+     • Single user
+
+                  SQLite          Postgres
+     Deployment   embedded ✓      needs server ✗
+     Offline      yes ✓           no ✗
+     Single file  yes ✓           no ✗
+
+     SQLite. Not even close.
+
+     Unless... is there a sync component?
+```
+
+---
+
+## Ending Discovery
+
+There's no required ending. Discovery might:
+
+- **Flow into a proposal**: "Ready to start? I can create a change proposal."
+- **Result in artifact updates**: "Updated design.md with these decisions"
+- **Just provide clarity**: User has what they need, moves on
+- **Continue later**: "We can pick this up anytime"
+
+When it feels like things are crystallizing, you might summarize:
+
+```
+## What We Figured Out
+
+**The problem**: [crystallized understanding]
+
+**The approach**: [if one emerged]
+
+**Open questions**: [if any remain]
+
+**Next steps** (if ready):
+- Create a change proposal
+- Keep exploring: just keep talking
+```
+
+But this summary is optional. Sometimes the thinking IS the value.
+
+---
+
+## Guardrails
+
+- **Don't implement** - Never write code or implement features. Creating OpenSpec artifacts is fine, writing application code is not.
+- **Don't fake understanding** - If something is unclear, dig deeper
+- **Don't rush** - Discovery is thinking time, not task time
+- **Don't force structure** - Let patterns emerge naturally
+- **Don't auto-capture** - Offer to save insights, don't just do it
+- **Do visualize** - A good diagram is worth many paragraphs
+- **Do explore the codebase** - Ground discussions in reality
+- **Do question assumptions** - Including the user's and your own

+ 110 - 0
research/chinext50_regime_project/.codex/skills/openspec-propose/SKILL.md

@@ -0,0 +1,110 @@
+---
+name: openspec-propose
+description: Propose a new change with all artifacts generated in one step. Use when the user wants to quickly describe what they want to build and get a complete proposal with design, specs, and tasks ready for implementation.
+license: MIT
+compatibility: Requires openspec CLI.
+metadata:
+  author: openspec
+  version: "1.0"
+  generatedBy: "1.2.0"
+---
+
+Propose a new change - create the change and generate all artifacts in one step.
+
+I'll create a change with artifacts:
+- proposal.md (what & why)
+- design.md (how)
+- tasks.md (implementation steps)
+
+When ready to implement, run /opsx:apply
+
+---
+
+**Input**: The user's request should include a change name (kebab-case) OR a description of what they want to build.
+
+**Steps**
+
+1. **If no clear input provided, ask what they want to build**
+
+   Use the **AskUserQuestion tool** (open-ended, no preset options) to ask:
+   > "What change do you want to work on? Describe what you want to build or fix."
+
+   From their description, derive a kebab-case name (e.g., "add user authentication" → `add-user-auth`).
+
+   **IMPORTANT**: Do NOT proceed without understanding what the user wants to build.
+
+2. **Create the change directory**
+   ```bash
+   openspec new change "<name>"
+   ```
+   This creates a scaffolded change at `openspec/changes/<name>/` with `.openspec.yaml`.
+
+3. **Get the artifact build order**
+   ```bash
+   openspec status --change "<name>" --json
+   ```
+   Parse the JSON to get:
+   - `applyRequires`: array of artifact IDs needed before implementation (e.g., `["tasks"]`)
+   - `artifacts`: list of all artifacts with their status and dependencies
+
+4. **Create artifacts in sequence until apply-ready**
+
+   Use the **TodoWrite tool** to track progress through the artifacts.
+
+   Loop through artifacts in dependency order (artifacts with no pending dependencies first):
+
+   a. **For each artifact that is `ready` (dependencies satisfied)**:
+      - Get instructions:
+        ```bash
+        openspec instructions <artifact-id> --change "<name>" --json
+        ```
+      - The instructions JSON includes:
+        - `context`: Project background (constraints for you - do NOT include in output)
+        - `rules`: Artifact-specific rules (constraints for you - do NOT include in output)
+        - `template`: The structure to use for your output file
+        - `instruction`: Schema-specific guidance for this artifact type
+        - `outputPath`: Where to write the artifact
+        - `dependencies`: Completed artifacts to read for context
+      - Read any completed dependency files for context
+      - Create the artifact file using `template` as the structure
+      - Apply `context` and `rules` as constraints - but do NOT copy them into the file
+      - Show brief progress: "Created <artifact-id>"
+
+   b. **Continue until all `applyRequires` artifacts are complete**
+      - After creating each artifact, re-run `openspec status --change "<name>" --json`
+      - Check if every artifact ID in `applyRequires` has `status: "done"` in the artifacts array
+      - Stop when all `applyRequires` artifacts are done
+
+   c. **If an artifact requires user input** (unclear context):
+      - Use **AskUserQuestion tool** to clarify
+      - Then continue with creation
+
+5. **Show final status**
+   ```bash
+   openspec status --change "<name>"
+   ```
+
+**Output**
+
+After completing all artifacts, summarize:
+- Change name and location
+- List of artifacts created with brief descriptions
+- What's ready: "All artifacts created! Ready for implementation."
+- Prompt: "Run `/opsx:apply` or ask me to implement to start working on the tasks."
+
+**Artifact Creation Guidelines**
+
+- Follow the `instruction` field from `openspec instructions` for each artifact type
+- The schema defines what each artifact should contain - follow it
+- Read dependency artifacts for context before creating new ones
+- Use `template` as the structure for your output file - fill in its sections
+- **IMPORTANT**: `context` and `rules` are constraints for YOU, not content for the file
+  - Do NOT copy `<context>`, `<rules>`, `<project_context>` blocks into the artifact
+  - These guide what you write, but should never appear in the output
+
+**Guardrails**
+- Create ALL artifacts needed for implementation (as defined by schema's `apply.requires`)
+- Always read dependency artifacts before creating a new one
+- If context is critically unclear, ask the user - but prefer making reasonable decisions to keep momentum
+- If a change with that name already exists, ask if user wants to continue it or create a new one
+- Verify each artifact file exists after writing before proceeding to next
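
The loop in step 4 (create whichever artifact is ready, re-check status, stop once everything in `applyRequires` is done) can be sketched as below. The JSON field names `id` and `dependencies` and the `"blocked"` status value are assumptions about the shape of `openspec status --json`; only `applyRequires`, `artifacts`, `ready`, and `done` appear in the text above. Writing the artifact file is replaced by a placeholder.

```python
def next_ready(artifacts: list[dict]) -> list[str]:
    """IDs of artifacts that are not done and whose dependencies are all done."""
    done = {a["id"] for a in artifacts if a["status"] == "done"}
    return [
        a["id"]
        for a in artifacts
        if a["status"] != "done" and all(dep in done for dep in a["dependencies"])
    ]


def build_until_apply_ready(artifacts: list[dict], apply_requires: list[str]) -> list[str]:
    """Create artifacts in dependency order until every required one is done."""
    created = []
    while not all(
        a["status"] == "done" for a in artifacts if a["id"] in apply_requires
    ):
        ready = next_ready(artifacts)
        if not ready:
            raise RuntimeError("No artifact is ready; dependencies cannot be met")
        target = ready[0]
        # Placeholder for writing the artifact file from its template.
        for a in artifacts:
            if a["id"] == target:
                a["status"] = "done"
        created.append(target)
    return created


# Hypothetical status payload for a spec-driven change.
status = {
    "applyRequires": ["tasks"],
    "artifacts": [
        {"id": "proposal", "status": "ready", "dependencies": []},
        {"id": "design", "status": "blocked", "dependencies": ["proposal"]},
        {"id": "specs", "status": "blocked", "dependencies": ["proposal"]},
        {"id": "tasks", "status": "blocked", "dependencies": ["design", "specs"]},
    ],
}
order = build_until_apply_ready(status["artifacts"], status["applyRequires"])
print(order)
```

In the real workflow the status would be re-read from the CLI after each artifact rather than mutated in memory.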

File diff suppressed because it is too large
+ 87 - 0
research/chinext50_regime_project/MEMORY.md


+ 492 - 0
research/chinext50_regime_project/README.md

@@ -0,0 +1,492 @@
+# ChiNext 50 Regime Project Starter
+
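+This is a daily-frequency, regime-aware exposure control project skeleton built **specifically for the ChiNext 50**.
+
+Its goal is not to predict each day's direction but, as far as possible, to:
+- lose less during sharp sell-offs and crowded periods
+- rebuild exposure gradually during genuine repair phases
+- keep most of the participation during the main uptrend
+
+## What is already in place
+
+- `data/`: CSV/parquet readers plus a synthetic demo data generator
+- `features/`: three feature layers (price, breadth, relative strength)
+- `model/`: continuous scores, a 5-state state machine, position mapping, and hard vetoes
+- `backtest/`: backtest with approximate next-open execution, utility, and event slicing
+- `pipelines/`: demo pipeline plus frozen-hypothesis validation
+- `tests/`: minimal end-to-end test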
+
+## Core states
+
+- `risk_off`
+- `repair`
+- `trend`
+- `chop`
+- `euphoric_late`
+
+## Core scores
+
+- `trend_score`
+- `breadth_score`
+- `stress_score`
+- `crowding_score`
+- `repair_score`
+
+Plus three path-type hazards:
+- `down_hazard`
+- `repair_hazard`
+- `rebound_hazard`
+
+## Run the demo
+
+From the project root, run:
+
+```bash
+python pipelines/run_demo.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --output-dir outputs/demo
+```
+
+This uses synthetic data to generate:
+- `outputs/demo/daily_ledger.csv`
+- `outputs/demo/event_summary.csv`
+- `outputs/demo/metrics_summary.json`
+
+## Run frozen-hypothesis validation
+
+```bash
+python pipelines/frozen_hypothesis_validation.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --output-dir outputs/frozen_validation
+```
+
+## Switch to real data
+
+Your CSV/parquet needs at least these columns:
+
+- `date`
+- `open`
+- `high`
+- `low`
+- `close`
+- `volume`
+
+Recommended additional columns:
+- `hs300_close`
+- `star50_close`
+- `csi1000_close`
+- `pct_constituents_above_20dma`
+- `pct_constituents_above_60dma`
+- `pct_new_high_20`
+- `pct_new_low_20`
+- `eq_weight_ret_5`
+- `weighted_ret_5`
+- `top3_contribution_5`
+- `corr_spike_20`
+- `dispersion_20`
+
+How to run:
+
+```bash
+python pipelines/run_demo.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --output-dir outputs/real_data_demo
+```
+
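+## Important notes
+
+- The current scaffold is **not** proof of performance; it only wires up the closed loop "features -> scores -> states -> positions -> backtest -> event diagnostics".
+- The economic effect can only be assessed with strict walk-forward validation after you plug in **real ChiNext 50 index/ETF history plus historical constituent breadth data**.
+- In the first phase, do not expand to multiple markets or a complex readiness/portability system at the same time.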
+
+## Real Data Input Contract and Quality Gate
+
+The runtime pipelines now require a full point-in-time dataset and can optionally block low-quality data before feature construction.
+
+### Required PIT columns
+
+`date`, `open`, `high`, `low`, `close`, `volume`, `hs300_close`, `star50_close`, `csi1000_close`, `pct_constituents_above_20dma`, `pct_constituents_above_60dma`, `pct_new_high_20`, `pct_new_low_20`, `eq_weight_ret_5`, `weighted_ret_5`, `top3_contribution_5`, `top1_contribution_5`, `top10_contribution_5`, `sector_concentration_20`, `corr_spike_20`, `dispersion_20`
+
+- Column names are normalized to lowercase with surrounding whitespace removed.
+- Duplicate trading dates are rejected.
+- Rows are sorted by trading date before downstream processing.
+
+- Runtime entrypoints no longer merge sidecars on the fly.
+- If required PIT columns are missing, the pipeline fails before quality gate and feature construction.
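
The input contract above (normalize column names, require the PIT columns, reject duplicate dates, sort by date) can be sketched with the standard library alone. This is not the project's loader: `load_pit_rows` is a hypothetical name, `REQUIRED` lists only the price columns for brevity, and dates are assumed to be ISO formatted.

```python
import csv
import io
from datetime import date

# Price subset only, for brevity; the full required list is given above.
REQUIRED = ["date", "open", "high", "low", "close", "volume"]


def load_pit_rows(text: str, required: list[str]) -> list[dict]:
    """Parse CSV text and enforce the PIT input contract."""
    reader = csv.reader(io.StringIO(text))
    # Normalize column names: lowercase, surrounding whitespace removed.
    header = [name.strip().lower() for name in next(reader)]
    missing = [c for c in required if c not in header]
    if missing:
        # Fail before any quality gate or feature construction.
        raise ValueError(f"Missing required PIT columns: {missing}")

    rows = [dict(zip(header, values)) for values in reader]
    for row in rows:
        row["date"] = date.fromisoformat(row["date"].strip())

    dates = [row["date"] for row in rows]
    if len(set(dates)) != len(dates):
        raise ValueError("Duplicate trading dates found")
    return sorted(rows, key=lambda r: r["date"])


raw = (
    " Date ,Open,HIGH,low,close,volume\n"
    "2024-01-03,2,3,1,2,100\n"
    "2024-01-02,1,2,1,2,90\n"
)
rows = load_pit_rows(raw, REQUIRED)
print([r["date"].isoformat() for r in rows])
```

Checking for missing columns first keeps the failure message about the schema rather than about a downstream parsing error.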
+
+### Data quality gate modes
+
+- Non-strict (default): pipeline continues and records warnings when critical-column coverage is below threshold.
+- Strict (`--strict-data`): pipeline stops only when configured `blocking_columns` are breached; non-blocking breaches remain warnings.
+
+Coverage threshold configuration:
+
+- Config defaults: `config/regime.yaml` -> `data_quality.default_min_coverage` and `data_quality.column_min_coverage`
+- CLI override: `--min-coverage`
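
A rough sketch of how the two gate modes could differ is shown below. The function names, the summary keys, and the use of `None` for missing values are assumptions made for illustration; only the strict/non-strict behaviour, `blocking_columns`, per-column thresholds, and the `error`/`warning` severities come from the description above.

```python
def coverage(values: list) -> float:
    """Share of non-null entries in a column (None marks a missing value)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def quality_gate(columns, default_min_coverage, column_min_coverage,
                 blocking_columns, strict):
    """Return a summary dict; the gate fails only on strict-mode blocking breaches."""
    breaches = []
    for name, values in columns.items():
        threshold = column_min_coverage.get(name, default_min_coverage)
        cov = coverage(values)
        if cov < threshold:
            is_error = strict and name in blocking_columns
            breaches.append({
                "column": name,
                "coverage": cov,
                "threshold": threshold,
                "severity": "error" if is_error else "warning",
            })
    return {
        "mode": "strict" if strict else "non_strict",
        "passed": not any(b["severity"] == "error" for b in breaches),
        "breaches": breaches,
    }


# Toy columns: `close` has 75% coverage, `dispersion_20` has 50%.
columns = {
    "close": [1.0, 2.0, None, 4.0],
    "dispersion_20": [None, None, 0.1, 0.2],
}
settings = dict(
    default_min_coverage=0.98,
    column_min_coverage={"dispersion_20": 0.4},
    blocking_columns=["close"],
)
strict_result = quality_gate(columns, strict=True, **settings)
relaxed_result = quality_gate(columns, strict=False, **settings)
print(strict_result["passed"], relaxed_result["passed"])
```

The same breach is reported in both modes; only its severity, and therefore whether the run stops, changes.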
+
+### Output artifact
+
+Each run writes `data_quality_summary.json` into the output directory.
+This artifact includes gate mode, pass/fail status, breach severities (`error`/`warning`), and field-level coverage metrics.
+
+### Example commands
+
+```bash
+python pipelines/run_demo.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --strict-data \
+  --min-coverage 0.98 \
+  --output-dir outputs/real_data_demo
+```
+
+```bash
+python pipelines/frozen_hypothesis_validation.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --strict-data \
+  --min-coverage 0.98 \
+  --output-dir outputs/frozen_validation_real
+```
+
+## Build Point-In-Time (PIT) Dataset
+
+Use `pipelines/build_pit_dataset.py` to create a reusable point-in-time table before running strategy pipelines.
+
+### Command
+
+```bash
+python pipelines/build_pit_dataset.py \
+  --market-csv path/to/chinext50_market.csv \
+  --sidecar-csv path/to/chinext50_benchmark_sidecar.csv \
+  --sidecar-csv path/to/chinext50_breadth_sidecar.csv \
+  --output-path outputs/pit/chinext50_pit.csv
+```
+
+Optional quality controls:
+
+- `--strict-data`: block PIT output when quality breaches occur
+- `--min-coverage 0.98`: override minimum non-null coverage threshold
+- `--config path/to/regime.yaml`: load custom quality defaults
+
+### Output semantics
+
+- Always writes `pit_quality_summary.json` in the same output directory.
+- On success, writes PIT data to `--output-path` (`.csv` or `.parquet`).
+- In strict failure mode, PIT file is not written, but `pit_quality_summary.json` is still written for diagnostics.
+- Quality summary includes source metadata:
+  - `sources.market_path`
+  - `sources.sidecar_paths`
+  - `sources.sidecar_count`
+  - `sources.merged_row_count`
+  - `pit_columns`
+
+## Real Data Ingestion
+
+Use `pipelines/ingest_real_data.py` to fetch/load source data, publish `raw` + `staging` layers, and output final PIT in one run.
+
+### CSV provider (local source files)
+
+```bash
+python pipelines/ingest_real_data.py \
+  --provider csv \
+  --market-csv path/to/chinext50_market.csv \
+  --hs300-csv path/to/hs300.csv \
+  --star50-csv path/to/star50.csv \
+  --csi1000-csv path/to/csi1000.csv \
+  --breadth-csv path/to/chinext50_breadth.csv \
+  --output-dir outputs/ingestion
+```
+
+### Akshare provider (online fetch + local breadth)
+
+```bash
+python pipelines/ingest_real_data.py \
+  --provider akshare \
+  --market-symbol 159915 \
+  --market-symbol-type etf \
+  --hs300-symbol 000300 \
+  --star50-symbol 000688 \
+  --csi1000-symbol 000852 \
+  --start-date 2018-01-01 \
+  --end-date 2026-04-09 \
+  --breadth-csv path/to/chinext50_breadth.csv \
+  --output-dir outputs/ingestion
+```
+
+### Akshare + Mairui fallback (recommended when Akshare is missing fields or unavailable)
+
+```bash
+python pipelines/ingest_real_data.py \
+  --provider akshare \
+  --market-symbol 159915 \
+  --market-symbol-type etf \
+  --breadth-csv path/to/chinext50_breadth.csv \
+  --mairui-licence YOUR_MAIRUI_LICENCE \
+  --mairui-market-code 399673.SZ \
+  --mairui-hs300-code 000300.SH \
+  --mairui-star50-code 000688.SH \
+  --mairui-csi1000-code 000852.SH \
+  --start-date 2018-01-01 \
+  --end-date 2026-04-09 \
+  --output-dir outputs/ingestion
+```
+
+### Mairui provider (online fetch as primary)
+
+```bash
+python pipelines/ingest_real_data.py \
+  --provider mairui \
+  --mairui-licence YOUR_MAIRUI_LICENCE \
+  --mairui-market-code 399673.SZ \
+  --mairui-market-kind index \
+  --mairui-hs300-code 000300.SH \
+  --mairui-star50-code 000688.SH \
+  --mairui-csi1000-code 000852.SH \
+  --breadth-csv path/to/chinext50_breadth.csv \
+  --start-date 2018-01-01 \
+  --end-date 2026-04-09 \
+  --output-dir outputs/ingestion
+```
+
+If breadth fields are also served by a Mairui endpoint, you can replace `--breadth-csv` with:
+
+- `--mairui-breadth-url https://api.mairuiapi.com/xxx/{licence}`
+- optional `--mairui-breadth-map-json path/to/rename_map.json`
+
+If you do not trust an external breadth panel (or do not have one), you can derive breadth from constituent histories:
+
+```bash
+python pipelines/ingest_real_data.py \
+  --provider mairui \
+  --mairui-licence YOUR_MAIRUI_LICENCE \
+  --mairui-market-code 399673.SZ \
+  --mairui-market-kind index \
+  --mairui-hs300-code 000300.SH \
+  --mairui-star50-code 000688.SH \
+  --mairui-csi1000-code 000852.SH \
+  --derive-breadth \
+  --breadth-index-symbol 399673 \
+  --breadth-min-active-constituents 20 \
+  --breadth-max-constituents 50 \
+  --breadth-cache-dir outputs/ingestion/raw/constituent_history \
+  --output-dir outputs/ingestion
+```
+
+Strict mode now includes a breadth-source integrity gate. Placeholder-like breadth inputs (for example, constant `weighted_ret_5 - eq_weight_ret_5`) are blocked before PIT publish.
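
One way to detect the placeholder pattern mentioned above (a constant `weighted_ret_5 - eq_weight_ret_5` spread) is sketched below. The function name, the tolerance, and the choice to flag series with fewer than two usable observations are assumptions; the project's actual integrity checks may be broader.

```python
def looks_like_placeholder(weighted_ret_5, eq_weight_ret_5, tolerance=1e-12):
    """Flag breadth input whose weighted-minus-equal-weight spread never varies."""
    spreads = [
        w - e
        for w, e in zip(weighted_ret_5, eq_weight_ret_5)
        if w is not None and e is not None
    ]
    if len(spreads) < 2:
        # Assumption: too little data to show any variation is treated as suspect.
        return True
    return max(spreads) - min(spreads) <= tolerance


# A copied column gives a constant zero spread; real breadth data should vary.
weighted = [0.010, 0.020, -0.010]
copied = looks_like_placeholder(weighted, [0.010, 0.020, -0.010])
genuine = looks_like_placeholder(weighted, [0.012, 0.015, -0.020])
print(copied, genuine)
```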
+
+Output structure includes:
+
+- `outputs/ingestion/raw/*.csv`
+- `outputs/ingestion/raw/breadth_integrity_summary.json`
+- `outputs/ingestion/raw/breadth_derivation_summary.json` (when `--derive-breadth` is used)
+- `outputs/ingestion/staging/*.csv`
+- `outputs/ingestion/pit/chinext50_pit.csv`
+- `outputs/ingestion/pit/pit_quality_summary.json`
+- `outputs/ingestion/ingestion_manifest.json`
+
+## Frozen Walk-Forward (Train-Select / Test-Freeze)
+
+`pipelines/frozen_hypothesis_validation.py` now runs a strict frozen-hypothesis process:
+
+1. Evaluate predefined candidates only on each training window.
+2. Select one winner by training utility (deterministic tie-break by candidate order).
+3. Freeze that winner and evaluate the paired test window without re-selection.
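
The three steps above can be sketched as a small loop. The toy utility (exposure times summed returns), the candidate field `exposure`, and the function names are hypothetical; only the train-select / test-freeze structure and the tie-break by candidate order follow the description.

```python
def toy_utility(candidate: dict, returns: list[float]) -> float:
    """Stand-in utility: constant exposure times the window's summed return."""
    return candidate["exposure"] * sum(returns)


def select_and_freeze(candidates, windows, utility):
    """For each (train, test) pair, pick the train winner and score it on test."""
    board = []
    for train, test in windows:
        # max() returns the first maximal item, so ties go to the earlier candidate.
        winner = max(candidates, key=lambda c: utility(c, train))
        board.append({
            "selected_candidate_id": winner["id"],
            "train_utility": utility(winner, train),
            # The winner is frozen: no re-selection on the test window.
            "test_utility": utility(winner, test),
        })
    return board


candidates = [
    {"id": "base", "exposure": 1.0},
    {"id": "defensive", "exposure": 0.5},
]
windows = [
    ([0.01, -0.03], [0.02, 0.01]),  # losing train window favours lower exposure
    ([0.0, 0.0], [0.01]),           # tie on train, resolved by candidate order
]
board = select_and_freeze(candidates, windows, toy_utility)
for row in board:
    print(row["selected_candidate_id"])
```

Scoring the test window only after the winner is fixed is what keeps the test utility free of selection bias.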
+
+### Candidate configuration
+
+Candidates can come from:
+
+- `config/regime.yaml` -> `frozen_validation.candidates`
+- optional CLI override file: `--candidates-json path/to/candidates.json`
+
+Window row requirements:
+
+- `frozen_validation.min_train_rows` (or `--min-train-rows`)
+- `frozen_validation.min_test_rows` (or `--min-test-rows`)
+
+If a window is too short, it is marked as skipped with an explicit status.
+
+### Audit outputs
+
+`frozen_validation_board.csv` now includes:
+
+- window ranges (`train_*`, `test_*`)
+- `status`
+- `selected_candidate_id`
+- `selected_candidate_overrides` (serialized JSON)
+- prefixed train/test metrics such as `train_utility_total_score` and `test_utility_total_score`
+
+`frozen_validation_summary.json` now includes:
+
+- processed/skipped window counts
+- positive test-utility ratio
+- selected candidate distribution
+- status distribution
+
+### Example
+
+```bash
+python pipelines/frozen_hypothesis_validation.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --candidates-json path/to/frozen_candidates.json \
+  --min-train-rows 180 \
+  --min-test-rows 60 \
+  --output-dir outputs/frozen_validation_real
+```
+
+## Real Walk-Forward Report
+
+Use `pipelines/real_walkforward_report.py` to generate a review-ready bundle from full PIT input:
+
+- `data_quality_summary.json`
+- `frozen_validation_board.csv`
+- `real_walkforward_summary.json`
+- `real_walkforward_report.md`
+
+```bash
+python pipelines/real_walkforward_report.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --strict-data \
+  --output-dir outputs/real_walkforward_report
+```
+
+## Event-Anchored Diagnostics
+
+`run_demo` now outputs transition-anchor diagnostics with explicit event taxonomy:
+
+- `crash_onset`
+- `false_rebound`
+- `true_repair`
+- `crowded_unwind`
+- `state_transition` (fallback class for other transitions)
+
+### Event artifacts
+
+- `event_log.csv`: per-transition anchor details (`event_date`, `from_state`, `to_state`, `event_type`, forward returns, exposure context)
+- `event_summary.csv`: event-type grouped averages and counts
+
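+Classification is rule-based: each state transition is labeled from its `from_state`/`to_state` pair, and rebound quality is confirmed with forward-window signals (the forward asset return and whether `risk_off` recurs within the confirmation horizon).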
+
+## Execution Layer Constraints and Tracking Diagnostics
+
+The backtest execution layer now applies configurable cost constraints to approximate ETF trading more realistically:
+
+- `trading.extreme_day_move_threshold`: absolute executed return at or above which the day counts as extreme and costs are amplified (default `0.03`)
+- `trading.extreme_day_cost_multiplier`: multiplier applied to the base trading cost on extreme days (default `1.0`, i.e. no amplification)
+- `trading.gap_slippage_factor`: factor for an additive gap-shock cost, computed as `abs(gap_open) * turnover * gap_slippage_factor` (default `0.0`)
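The cost arithmetic behind these three settings can be sketched for a single day as follows. The formula follows `run_backtest` in `backtest/engine.py`; the function name `daily_trading_cost` is hypothetical, and the multiplier `1.5` and gap factor `0.01` are illustrative values rather than recommended defaults.

```python
def daily_trading_cost(
    turnover: float,
    asset_return: float,
    gap_open: float,
    *,
    fee_bps_roundtrip: float = 8.0,
    slippage_bps_oneway: float = 4.0,
    extreme_day_move_threshold: float = 0.03,
    extreme_day_cost_multiplier: float = 1.5,  # illustrative, not a default
    gap_slippage_factor: float = 0.01,         # illustrative, not a default
) -> float:
    """One-day cost: amplified base cost plus an additive gap-shock term."""
    # One-way cost rate: half the round-trip fee plus one-way slippage, in decimals.
    cost_rate = (fee_bps_roundtrip / 2.0 + slippage_bps_oneway) / 10000.0
    # Amplify the base cost only when the executed move is at or above the threshold.
    is_extreme = abs(asset_return) >= extreme_day_move_threshold
    multiplier = extreme_day_cost_multiplier if is_extreme else 1.0
    base_cost = turnover * cost_rate * multiplier
    gap_cost = turnover * abs(gap_open) * gap_slippage_factor
    return base_cost + gap_cost


calm_day = daily_trading_cost(turnover=0.2, asset_return=0.01, gap_open=0.0)
extreme_day = daily_trading_cost(turnover=0.2, asset_return=-0.04, gap_open=-0.02)
print(f'calm={calm_day:.5f} extreme={extreme_day:.5f}')
```

Both terms scale with turnover, so a day with no exposure change costs nothing regardless of how extreme the move is.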
+
+New ledger diagnostics:
+
+- `tracking_difference`: `strategy_return_net - strategy_return_gross`, i.e. minus the day's `trading_cost`
+- `tracking_error_20`: 20-day rolling standard deviation of `tracking_difference`
+
+New summary metrics:
+
+- `tracking_diff_mean`
+- `tracking_diff_abs_mean`
+- `tracking_error_20_p95`
+
+### Execution Constraint Calibration
+
+Use `pipelines/calibrate_execution_constraints.py` to sweep execution parameters and output a recommendation:
+
+- `execution_calibration_grid.csv`
+- `execution_calibration_recommendation.json`
+
+```bash
+python pipelines/calibrate_execution_constraints.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --cost-multipliers 1.0,1.25,1.5,1.75 \
+  --gap-slippage-factors 0.0,0.01,0.02,0.03 \
+  --output-dir outputs/execution_calibration
+```
+
+### Additional Optional Concentration Inputs
+
+To improve crowding diagnostics, you can optionally provide:
+
+- `top1_contribution_5`
+- `top10_contribution_5`
+- `sector_concentration_20`
+
+## Regime Lite (Small-Team Runtime)
+
+Use `pipelines/regime_lite_run.py` for a minimal operational workflow:
+
+- 3 states only: `risk_off`, `chop`, `trend`
+- fixed base exposures: `0.0`, `0.35`, `0.80`
+- daily exposure step cap: `0.20`
+- explicit execution profiles:
+  - `baseline`: `lag1` timing, no overlay
+  - `promoted_fast_entry_hold3`: prior promoted fixed-hold reference, based on `combo_fast_hold3`
+  - `promoted_fast_entry_adaptive_extend`: current preferred profile after adaptive keep-vs-replace closure, based on `combo_fast_adaptive_extend`
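The interaction between fixed base exposures and the daily step cap can be sketched as below. The state-to-exposure mapping assumes the two lists above are in matching order (`risk_off` to `0.0`, `chop` to `0.35`, `trend` to `0.80`); `step_exposure` is a hypothetical helper and ignores execution profiles and overlays.

```python
# Assumed mapping: states and base exposures paired in the order listed above.
BASE_EXPOSURE = {'risk_off': 0.0, 'chop': 0.35, 'trend': 0.80}
STEP_CAP = 0.20  # maximum absolute change in exposure per day


def step_exposure(current: float, state: str) -> float:
    """Move from the current exposure toward the state's base exposure, capped per day."""
    target = BASE_EXPOSURE[state]
    delta = max(-STEP_CAP, min(STEP_CAP, target - current))
    # Round to suppress floating-point noise in the printed path.
    return round(current + delta, 10)


exposure = 0.0
path = []
for state in ['trend', 'trend', 'trend', 'trend', 'risk_off']:
    exposure = step_exposure(exposure, state)
    path.append(exposure)
print(path)
```

With a `0.20` cap, reaching full `trend` exposure from flat takes four sessions, and a switch to `risk_off` unwinds at the same pace rather than in one step.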
+
+```bash
+python pipelines/regime_lite_run.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --profile promoted_fast_entry_adaptive_extend \
+  --output-dir outputs/regime_lite
+```
+
+Current preferred lite runtime profile:
+
+- `promoted_fast_entry_adaptive_extend`
+- promotion decision artifact: `outputs/regime_lite_promotion_20260424/regime_lite_promotion_decision.json`
+- rationale: the bounded adaptive closure concluded `adaptive-replace-candidate`, selecting `combo_fast_adaptive_extend` to replace the prior fixed-hold reference while keeping `baseline` as rollback-safe reference
+- rollback/reference profile: `baseline`
+- inspect `promotion_decision.active_adaptive_mode` plus `regime_lite_summary.json -> execution_profile.adaptive_hold_mode` / `adaptive_hold_context` to understand the active bounded hold semantics before operating it
+
+Converged lite operational flow:
+
+1. Run the preferred profile with `pipelines/regime_lite_run.py --profile promoted_fast_entry_adaptive_extend`.
+2. Inspect `regime_lite_runtime_health.json` for bounded status `healthy` / `review` / `hold` / `rollback_recommended`.
+3. Inspect `regime_lite_post_promotion_review.json` for bounded decision `keep_promoted` / `hold_and_review` / `recommend_rollback`.
+4. In post-promotion review, treat `recent_window_evidence` as the primary decision basis; `full_history_reference` is reference context only, and `segmented_diagnostics` is for bounded diagnosis rather than override.
+5. If health stays `healthy` and review stays `keep_promoted`, continue normal lite operation.
+6. If health moves to `review` or review moves to `hold_and_review`, pause new tuning and inspect the bounded reasons before any change.
+7. If health reaches `rollback_recommended` or review reaches `recommend_rollback`, switch back to `baseline` as the rollback-safe profile and keep the lite path scoped to that runtime handoff.
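The decision rules in steps 5 to 7 can be written as a small lookup. This is a sketch under assumptions: the action names and the function `lite_next_action` are hypothetical, and treating a `hold` health status the same way as `review` is a guess, since the flow above does not spell that case out.

```python
def lite_next_action(health_status: str, review_decision: str) -> str:
    """Combine runtime health and post-promotion review into one operator action."""
    # Rollback signals from either artifact take priority (step 7).
    if health_status == 'rollback_recommended' or review_decision == 'recommend_rollback':
        return 'switch_to_baseline'
    # Any caution signal pauses tuning until the reasons are inspected (step 6).
    # Assumption: a 'hold' health status is handled like 'review'.
    if health_status in ('review', 'hold') or review_decision == 'hold_and_review':
        return 'pause_tuning_and_inspect'
    # Both artifacts agree that the promoted profile is fine (step 5).
    if health_status == 'healthy' and review_decision == 'keep_promoted':
        return 'continue_normal_operation'
    raise ValueError(f'Unrecognized combination: {health_status!r}, {review_decision!r}')


print(lite_next_action('healthy', 'keep_promoted'))
print(lite_next_action('review', 'keep_promoted'))
print(lite_next_action('healthy', 'recommend_rollback'))
```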
+
+Artifacts:
+
+- `regime_lite_daily_ledger.csv`
+- `regime_lite_summary.json`
+- `regime_lite_report.md`
+- `regime_lite_runtime_health.json`
+- `regime_lite_post_promotion_review.json`
+
+### Execution Timing + Entry-Exit Experiments
+
+Run controlled A/B experiments for:
+
+- execution timing: `lag1` vs `fast_entry`
+- entry-specific exit overlay: short trend-entry hold floor with stop guard
+
+```bash
+python pipelines/regime_lite_experiments.py \
+  --pit-csv path/to/chinext50_pit.csv \
+  --output-dir outputs/regime_lite_experiments
+```
+
+Artifacts:
+
+- `regime_lite_experiment_results.csv`
+- `regime_lite_experiment_summary.json`
+- `regime_lite_experiment_report.html`
+- `regime_lite_experiment_baseline_ledger.csv`
+- `regime_lite_experiment_best_ledger.csv`
+- `regime_lite_promotion_decision.json`
+
+The experiment board now separates:
+
+- recommendation candidate: best discovery-sample variant
+- promotion status: final `promote` / `hold` / `reject` decision from deterministic holdout validation
+- governance handoff target: the promoted runtime profile that must flow into bounded lite runtime health and post-promotion review
+
+## Verification
+
+Project health check:
+
+```bash
+py -m pytest -q
+```
+
+The repository now pins pytest collection to the main `tests/` directory, so historical deliverable bundles and backups do not pollute the default test run.

+ 24 - 0
research/chinext50_regime_project/USER.md

@@ -0,0 +1,24 @@
+## User
+
+- Prefers Chinese communication in this workspace.
+- Currently focused on the `chinext50_regime_project` research project.
+- Asked on 2026-04-08 to read carefully and analyze the project logic.
+- On 2026-04-09, asked to continue the OpenSpec proposal and start implementation immediately once the proposal was ready.
+- On 2026-04-09, confirmed preference to proceed in sequence: proposal first, then continue implementation.
+- On 2026-04-09, explicitly authorized autonomous continuation without further confirmation until completion.
+- On 2026-04-09, asked to continue improving according to GPT Pro suggestions while keeping the OpenSpec-driven development flow.
+- On 2026-04-09, asked to keep progressing through the OpenSpec change flow and to keep `chinext50_regime_build_handoff_2026-04-08.md` as the reference baseline.
+- On 2026-04-09, confirmed an immediate start and continued autonomous execution across sequential OpenSpec changes.
+- On 2026-04-09, approved implementing a concrete data-fetch/ingestion pipeline and asked to proceed directly.
+- On 2026-04-09, requested Akshare-first fetching with Mairui fallback for missing data and provided Mairui API access details.
+- On 2026-04-09, requested packaging code + review questions into a zip for GPT Pro.
+- On 2026-04-09, requested analysis of `chinext50_regime_review_2026-04-09.md` and a concrete improvement plan before the next implementation round.
+- On 2026-04-09, required strict execution of GPT Pro blocker checklist `chinext50_blocker_checklist_for_codex.md` in `B1 -> B6` order under OpenSpec flow, with per-blocker file/test/metric reporting and no threshold/policy tuning before `B1-B5` completion.
+- On 2026-04-10, user required strict execution of `chinext50_post_b3_feedback_response_for_codex_2026-04-10.md` in sequence: D0 -> preparatory repair-threshold code change -> H1b.1-L1 -> L2 -> L3 -> if all fail then H1b.2-direct-from-R1.
+- User expects per-block reporting: changed files, targeted tests, stitched metric delta, state mix delta, and exposure diagnostics delta.
+- 2026-04-10: User explicitly asked to continue without confirmation loops. Prefer direct execution and concise status updates.
+- User prefers reducing scope to practical small-team execution rather than heavy research workflow.
+- 2026-04-11: User reiterated that the core unfinished goal is practical profitability, and specifically prioritized experiments on execution timing and entry-specific exits to reduce false-cut misses and improve trend re-entry quality.
+- 2026-04-24: User requested a main-thread supervision loop: continuously monitor progress until task completion or true blockage, supervise subtask advancement, make local decisions on issues, and prioritize convergence to avoid endless project expansion.
+- 2026-04-24: User prefers subtasks to advance in OpenSpec style whenever possible.
+- 2026-04-24: User further clarified that the main thread should treat convergence of the entire project to a final finished state as the objective, not just completion of individual changes. Mid-process routing and local decisions should be made autonomously by me.

+ 23 - 0
research/chinext50_regime_project/backtest/__init__.py

@@ -0,0 +1,23 @@
+from .engine import run_backtest, compute_metrics
+from .events import build_transition_event_log, summarize_transition_events
+from .frozen_walkforward import (
+    DEFAULT_HYPOTHESIS_CANDIDATES,
+    HypothesisCandidate,
+    normalize_hypothesis_candidates,
+    run_frozen_walkforward,
+)
+from .utility import net_utility, utility_status, utility_from_metrics
+
+__all__ = [
+    'DEFAULT_HYPOTHESIS_CANDIDATES',
+    'HypothesisCandidate',
+    'compute_metrics',
+    'net_utility',
+    'normalize_hypothesis_candidates',
+    'build_transition_event_log',
+    'run_backtest',
+    'run_frozen_walkforward',
+    'summarize_transition_events',
+    'utility_from_metrics',
+    'utility_status',
+]

+ 160 - 0
research/chinext50_regime_project/backtest/engine.py

@@ -0,0 +1,160 @@
+from __future__ import annotations
+
+from typing import Any
+
+import numpy as np
+import pandas as pd
+
+
+def _max_drawdown(equity: pd.Series) -> float:
+    peak = equity.cummax()
+    drawdown = equity / peak - 1.0
+    return float(drawdown.min())
+
+
+def compute_metrics(
+    strategy_returns: pd.Series,
+    benchmark_returns: pd.Series,
+    turnover: pd.Series | None = None,
+    tracking_difference: pd.Series | None = None,
+    annualization: int = 252,
+) -> dict[str, float]:
+    strategy_returns = strategy_returns.dropna()
+    benchmark_returns = benchmark_returns.reindex(strategy_returns.index).fillna(0.0)
+    turnover = turnover.reindex(strategy_returns.index).fillna(0.0) if turnover is not None else pd.Series(0.0, index=strategy_returns.index)
+    tracking_difference = (
+        tracking_difference.reindex(strategy_returns.index).fillna(0.0)
+        if tracking_difference is not None
+        else strategy_returns - benchmark_returns
+    )
+
+    if strategy_returns.empty:
+        return {
+            'annual_return': 0.0,
+            'annual_vol': 0.0,
+            'sharpe': 0.0,
+            'max_drawdown': 0.0,
+            'calmar': 0.0,
+            'benchmark_return': 0.0,
+            'benchmark_vol': 0.0,
+            'benchmark_sharpe': 0.0,
+            'sharpe_delta': 0.0,
+            'benchmark_max_drawdown': 0.0,
+            'drawdown_improvement_ratio': 0.0,
+            'upside_capture': 0.0,
+            'downside_capture': 0.0,
+            'annual_turnover': 0.0,
+            'tracking_diff_mean': 0.0,
+            'tracking_diff_abs_mean': 0.0,
+            'tracking_error_20_p95': 0.0,
+        }
+
+    def annual_return(returns: pd.Series) -> float:
+        total = float((1.0 + returns).prod())
+        n = len(returns)
+        return total ** (annualization / max(n, 1)) - 1.0
+
+    def annual_vol(returns: pd.Series) -> float:
+        return float(returns.std(ddof=0) * np.sqrt(annualization))
+
+    strategy_ann = annual_return(strategy_returns)
+    strategy_vol = annual_vol(strategy_returns)
+    strategy_sharpe = strategy_ann / strategy_vol if strategy_vol > 0 else 0.0
+    strategy_equity = (1.0 + strategy_returns).cumprod()
+    strategy_mdd = abs(_max_drawdown(strategy_equity))
+    strategy_calmar = strategy_ann / strategy_mdd if strategy_mdd > 0 else 0.0
+
+    bench_ann = annual_return(benchmark_returns)
+    bench_vol = annual_vol(benchmark_returns)
+    bench_sharpe = bench_ann / bench_vol if bench_vol > 0 else 0.0
+    bench_equity = (1.0 + benchmark_returns).cumprod()
+    bench_mdd = abs(_max_drawdown(bench_equity))
+
+    up_mask = benchmark_returns > 0
+    down_mask = benchmark_returns < 0
+    upside_capture = (strategy_returns[up_mask].mean() / benchmark_returns[up_mask].mean()) if up_mask.any() else 0.0
+    downside_capture = (strategy_returns[down_mask].mean() / benchmark_returns[down_mask].mean()) if down_mask.any() else 0.0
+    drawdown_improvement = (bench_mdd - strategy_mdd) / bench_mdd if bench_mdd > 0 else 0.0
+    annual_turnover = float(turnover.mean() * annualization)
+    tracking_diff_mean = float(tracking_difference.mean())
+    tracking_diff_abs_mean = float(tracking_difference.abs().mean())
+    tracking_error_20 = tracking_difference.rolling(20).std().dropna()
+    tracking_error_20_p95 = float(tracking_error_20.quantile(0.95)) if not tracking_error_20.empty else 0.0
+
+    return {
+        'annual_return': float(strategy_ann),
+        'annual_vol': float(strategy_vol),
+        'sharpe': float(strategy_sharpe),
+        'max_drawdown': float(strategy_mdd),
+        'calmar': float(strategy_calmar),
+        'benchmark_return': float(bench_ann),
+        'benchmark_vol': float(bench_vol),
+        'benchmark_sharpe': float(bench_sharpe),
+        'benchmark_max_drawdown': float(bench_mdd),
+        'sharpe_delta': float(strategy_sharpe - bench_sharpe),
+        'drawdown_improvement_ratio': float(drawdown_improvement),
+        'upside_capture': float(upside_capture),
+        'downside_capture': float(downside_capture),
+        'annual_turnover': annual_turnover,
+        'tracking_diff_mean': tracking_diff_mean,
+        'tracking_diff_abs_mean': tracking_diff_abs_mean,
+        'tracking_error_20_p95': tracking_error_20_p95,
+    }
+
+
+def run_backtest(df: pd.DataFrame, config: dict[str, Any] | None = None) -> tuple[pd.DataFrame, dict[str, float]]:
+    out = df.copy()
+    trading_cfg = (config or {}).get('trading', {})
+    annualization = int(trading_cfg.get('annualization', 252))
+
+    if 'target_exposure' not in out.columns:
+        raise ValueError('target_exposure column is required for backtest.')
+
+    if 'open' in out.columns:
+        asset_exec_return = out['open'].shift(-1) / out['open'] - 1.0
+    else:
+        asset_exec_return = out['close'].pct_change().shift(-1)
+
+    executed_exposure = out['target_exposure'].shift(1).fillna(0.0)
+    previous_executed = executed_exposure.shift(1).fillna(0.0)
+    turnover = (executed_exposure - previous_executed).abs()
+
+    one_way_cost_bps = float(trading_cfg.get('fee_bps_roundtrip', 8)) / 2.0 + float(trading_cfg.get('slippage_bps_oneway', 4))
+    cost_rate = one_way_cost_bps / 10000.0
+    extreme_move_threshold = float(trading_cfg.get('extreme_day_move_threshold', 0.03))
+    extreme_day_cost_multiplier = float(trading_cfg.get('extreme_day_cost_multiplier', 1.0))
+    gap_slippage_factor = float(trading_cfg.get('gap_slippage_factor', 0.0))
+
+    extreme_day_flag = asset_exec_return.abs() >= extreme_move_threshold
+    effective_multiplier = pd.Series(1.0, index=out.index)
+    effective_multiplier.loc[extreme_day_flag.fillna(False)] = extreme_day_cost_multiplier
+
+    trading_cost_base = turnover * cost_rate * effective_multiplier
+    if 'gap_open' in out.columns:
+        gap_open = out['gap_open'].fillna(0.0)
+    else:
+        gap_open = (out['open'] / out['close'].shift(1) - 1.0).fillna(0.0) if {'open', 'close'}.issubset(out.columns) else pd.Series(0.0, index=out.index)
+    gap_shock_cost = turnover * gap_open.abs() * gap_slippage_factor
+    trading_cost = trading_cost_base + gap_shock_cost
+
+    out['asset_exec_return'] = asset_exec_return
+    out['executed_exposure'] = executed_exposure
+    out['turnover'] = turnover
+    out['extreme_day_flag'] = extreme_day_flag.fillna(False)
+    out['execution_cost_multiplier'] = effective_multiplier
+    out['trading_cost_base'] = trading_cost_base
+    out['gap_shock_cost'] = gap_shock_cost
+    out['trading_cost'] = trading_cost
+    out['strategy_return_gross'] = executed_exposure * asset_exec_return
+    out['strategy_return_net'] = out['strategy_return_gross'] - trading_cost
+    out['tracking_difference'] = out['strategy_return_net'] - out['strategy_return_gross']
+    out['tracking_error_20'] = out['tracking_difference'].rolling(20).std()
+    out['strategy_equity'] = (1.0 + out['strategy_return_net'].fillna(0.0)).cumprod()
+    out['benchmark_equity'] = (1.0 + out['asset_exec_return'].fillna(0.0)).cumprod()
+
+    metrics = compute_metrics(
+        strategy_returns=out['strategy_return_net'],
+        benchmark_returns=out['asset_exec_return'],
+        turnover=out['turnover'],
+        tracking_difference=out['tracking_difference'],
+        annualization=annualization,
+    )
+    return out, metrics

+ 151 - 0
research/chinext50_regime_project/backtest/events.py

@@ -0,0 +1,151 @@
+from __future__ import annotations
+
+import numpy as np
+import pandas as pd
+
+
+EVENT_LOG_COLUMNS = [
+    'event_date',
+    'from_state',
+    'to_state',
+    'event_state',
+    'event_type',
+    'horizon',
+    'confirm_horizon',
+    'asset_forward_close_return',
+    'strategy_forward_return',
+    'target_exposure',
+    'has_risk_off_within_confirm',
+]
+
+EVENT_SUMMARY_COLUMNS = [
+    'event_type',
+    'event_state',
+    'count',
+    'avg_asset_forward_return',
+    'avg_strategy_forward_return',
+    'avg_target_exposure',
+]
+
+
+def _forward_strategy_return(series: pd.Series, horizon: int) -> pd.Series:
+    values = series.fillna(0.0).to_numpy(dtype=float)
+    out = np.full(len(values), np.nan, dtype=float)
+    for i in range(len(values)):
+        end = i + horizon
+        if end < len(values):
+            out[i] = np.prod(1.0 + values[i + 1 : end + 1]) - 1.0
+    return pd.Series(out, index=series.index, dtype=float)
+
+
+def _classify_event_type(
+    *,
+    from_state: str,
+    to_state: str,
+    asset_forward_close_return: float,
+    has_risk_off_within_confirm: bool,
+) -> str:
+    if to_state == 'risk_off':
+        return 'crash_onset'
+
+    if from_state == 'euphoric_late' and pd.notna(asset_forward_close_return) and asset_forward_close_return < 0.0:
+        return 'crowded_unwind'
+
+    if from_state in {'risk_off', 'chop'} and to_state in {'repair', 'trend'}:
+        if pd.notna(asset_forward_close_return) and asset_forward_close_return > 0.0 and not has_risk_off_within_confirm:
+            return 'true_repair'
+        return 'false_rebound'
+
+    return 'state_transition'
+
+
+def build_transition_event_log(
+    df: pd.DataFrame,
+    *,
+    horizon: int = 10,
+    confirm_horizon: int = 10,
+) -> pd.DataFrame:
+    if 'state' not in df.columns:
+        raise ValueError('state column required for event diagnostics.')
+    if 'close' not in df.columns:
+        raise ValueError('close column required for event diagnostics.')
+    if 'strategy_return_net' not in df.columns:
+        raise ValueError('strategy_return_net column required for event diagnostics.')
+    if horizon <= 0 or confirm_horizon <= 0:
+        raise ValueError('horizon and confirm_horizon must be positive integers.')
+
+    out = df.copy().sort_index()
+    out['state_prev'] = out['state'].shift(1)
+    out['state_change'] = out['state'] != out['state_prev']
+    if not out.empty:
+        out.iloc[0, out.columns.get_loc('state_change')] = False
+    out['asset_forward_close_return'] = out['close'].shift(-horizon) / out['close'] - 1.0
+    out['strategy_forward_return'] = _forward_strategy_return(out['strategy_return_net'], horizon=horizon)
+
+    events = out[out['state_change']].copy()
+    if events.empty:
+        return pd.DataFrame(columns=EVENT_LOG_COLUMNS)
+
+    rows: list[dict[str, object]] = []
+    states = out['state'].astype(str)
+    for ts, row in events.iterrows():
+        from_state = str(row['state_prev'])
+        to_state = str(row['state'])
+        try:
+            pos = int(out.index.get_loc(ts))
+        except (KeyError, TypeError):  # duplicate index labels yield a slice/mask, not an int
+            continue
+
+        future_states = states.iloc[pos + 1 : pos + 1 + confirm_horizon]
+        has_risk_off = bool((future_states == 'risk_off').any())
+        asset_fwd = float(row['asset_forward_close_return']) if pd.notna(row['asset_forward_close_return']) else np.nan
+        event_type = _classify_event_type(
+            from_state=from_state,
+            to_state=to_state,
+            asset_forward_close_return=asset_fwd,
+            has_risk_off_within_confirm=has_risk_off,
+        )
+        rows.append(
+            {
+                'event_date': ts,
+                'from_state': from_state,
+                'to_state': to_state,
+                'event_state': to_state,
+                'event_type': event_type,
+                'horizon': int(horizon),
+                'confirm_horizon': int(confirm_horizon),
+                'asset_forward_close_return': asset_fwd,
+                'strategy_forward_return': (
+                    float(row['strategy_forward_return']) if pd.notna(row['strategy_forward_return']) else np.nan
+                ),
+                'target_exposure': float(row['target_exposure']) if 'target_exposure' in row and pd.notna(row['target_exposure']) else np.nan,
+                'has_risk_off_within_confirm': has_risk_off,
+            }
+        )
+
+    event_log = pd.DataFrame(rows)
+    if event_log.empty:
+        return pd.DataFrame(columns=EVENT_LOG_COLUMNS)
+    return event_log[EVENT_LOG_COLUMNS]
+
+
+def summarize_transition_events(
+    df: pd.DataFrame,
+    horizon: int = 10,
+    confirm_horizon: int = 10,
+) -> pd.DataFrame:
+    event_log = build_transition_event_log(df, horizon=horizon, confirm_horizon=confirm_horizon)
+    if event_log.empty:
+        return pd.DataFrame(columns=EVENT_SUMMARY_COLUMNS)
+
+    summary = (
+        event_log.groupby(['event_type', 'event_state'])
+        .agg(
+            count=('event_type', 'size'),
+            avg_asset_forward_return=('asset_forward_close_return', 'mean'),
+            avg_strategy_forward_return=('strategy_forward_return', 'mean'),
+            avg_target_exposure=('target_exposure', 'mean'),
+        )
+        .reset_index()
+    )
+    return summary.sort_values(['count', 'event_type'], ascending=[False, True]).reset_index(drop=True)

+ 612 - 0
research/chinext50_regime_project/backtest/frozen_walkforward.py

@@ -0,0 +1,612 @@
+from __future__ import annotations
+
+import copy
+import json
+from dataclasses import dataclass
+from typing import Any, Callable, Iterable, Mapping, Sequence
+
+import pandas as pd
+
+from backtest.engine import compute_metrics, run_backtest
+from backtest.utility import core_utility, utility_from_metrics, utility_status
+from features.quality import enforce_feature_information_gate
+from backtest.walkforward import WindowSpec
+from features.pipeline import build_feature_table
+from model.policy import build_exposure_plan
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+
+
+@dataclass(frozen=True)
+class HypothesisCandidate:
+    candidate_id: str
+    overrides: dict[str, Any]
+
+
+DEFAULT_HYPOTHESIS_CANDIDATES: tuple[HypothesisCandidate, ...] = (
+    HypothesisCandidate(
+        candidate_id='defensive',
+        overrides={
+            'policy': {
+                'trend': 0.80,
+                'euphoric_late': 0.30,
+                'chop': 0.20,
+                'repair_rebound_base': 0.30,
+                'repair_rebound_max': 0.65,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.20,
+            },
+        },
+    ),
+    HypothesisCandidate(candidate_id='baseline', overrides={}),
+    HypothesisCandidate(
+        candidate_id='balanced_capture',
+        overrides={
+            'policy': {
+                'trend': 0.95,
+                'euphoric_late': 0.65,
+                'chop': 0.35,
+                'repair_rebound_base': 0.40,
+                'repair_rebound_max': 0.85,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.30,
+            },
+        },
+    ),
+    HypothesisCandidate(
+        candidate_id='pro_risk',
+        overrides={
+            'policy': {
+                'trend': 1.00,
+                'euphoric_late': 0.70,
+                'chop': 0.45,
+                'repair_rebound_base': 0.50,
+                'repair_rebound_max': 0.95,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.35,
+            },
+        },
+    ),
+)
+
+
+StrategyRunner = Callable[[pd.DataFrame, dict[str, Any]], tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]]
+
+
+def _deep_merge_dict(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
+    out = copy.deepcopy(dict(base))
+    for key, value in overrides.items():
+        if isinstance(value, Mapping) and isinstance(out.get(key), Mapping):
+            out[key] = _deep_merge_dict(dict(out[key]), value)
+        else:
+            out[key] = copy.deepcopy(value)
+    return out
+
+
+def _resolve_utility(metrics: Mapping[str, float], config: Mapping[str, Any] | None = None) -> tuple[float, str]:
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    utility_total_score = float(
+        metrics.get(
+            'utility_total_score',
+            utility_from_metrics(
+                dict(metrics),
+                upside_target=float(evaluation_cfg.get('utility_upside_target', 0.55)),
+                turnover_penalty_start=float(evaluation_cfg.get('utility_turnover_penalty_start', 8.0)),
+                turnover_penalty_rate=float(evaluation_cfg.get('utility_turnover_penalty_rate', 0.010)),
+            ),
+        )
+    )
+    utility_state = str(metrics.get('utility_status', utility_status(utility_total_score)))
+    return utility_total_score, utility_state
+
+
+def run_strategy_bundle(df: pd.DataFrame, config: dict[str, Any]) -> tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]:
+    featured = build_feature_table(df)
+    enforce_feature_information_gate(featured, config)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+    ledger, metrics = run_backtest(planned, config)
+
+    utility_total_score, utility_state = _resolve_utility(metrics, config)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return planned, ledger, out_metrics
+
+
+def normalize_hypothesis_candidates(raw_candidates: Iterable[Mapping[str, Any]] | None) -> list[HypothesisCandidate]:
+    if raw_candidates is None:
+        return [copy.deepcopy(candidate) for candidate in DEFAULT_HYPOTHESIS_CANDIDATES]
+
+    candidates: list[HypothesisCandidate] = []
+    for idx, item in enumerate(raw_candidates):
+        candidate_id = str(item.get('id', item.get('candidate_id', f'candidate_{idx + 1}'))).strip()
+        if not candidate_id:
+            raise ValueError(f'Candidate index {idx} is missing an id.')
+        overrides_raw = item.get('overrides', {})
+        if not isinstance(overrides_raw, Mapping):
+            raise ValueError(f'Candidate {candidate_id} overrides must be an object.')
+        candidates.append(HypothesisCandidate(candidate_id=candidate_id, overrides=dict(overrides_raw)))
+
+    if not candidates:
+        raise ValueError('At least one hypothesis candidate is required.')
+
+    ids = [candidate.candidate_id for candidate in candidates]
+    if len(set(ids)) != len(ids):
+        raise ValueError(f'Duplicate candidate ids found: {ids}')
+    return candidates
+
+
+def _candidate_config(base_config: Mapping[str, Any], candidate: HypothesisCandidate) -> dict[str, Any]:
+    merged = _deep_merge_dict(base_config, candidate.overrides)
+    merged['_candidate_id'] = candidate.candidate_id
+    return merged
+
+
+def _prefixed_metrics(prefix: str, metrics: Mapping[str, Any]) -> dict[str, Any]:
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        if isinstance(value, (int, float)):
+            out[f'{prefix}_{key}'] = float(value)
+        else:
+            out[f'{prefix}_{key}'] = value
+    return out
+
+
+def _compute_window_metrics(ledger: pd.DataFrame, config: Mapping[str, Any] | None = None) -> dict[str, float]:
+    required_columns = {'strategy_return_net', 'asset_exec_return', 'turnover'}
+    if not required_columns.issubset(ledger.columns):
+        raise ValueError(f'Ledger is missing required columns: {sorted(required_columns - set(ledger.columns))}')
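+    trading_cfg = dict((config or {}).get('trading', {}))
+    metrics = compute_metrics(
+        strategy_returns=ledger['strategy_return_net'],
+        benchmark_returns=ledger['asset_exec_return'],
+        turnover=ledger['turnover'],
+        # Reuse the ledger's net-minus-gross series when present so tracking metrics match run_backtest.
+        tracking_difference=ledger['tracking_difference'] if 'tracking_difference' in ledger.columns else None,
+        annualization=int(trading_cfg.get('annualization', 252)),
+    )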
+    utility_total_score, utility_state = _resolve_utility(metrics, config)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return out_metrics
+
+
+def _window_row_base(window: WindowSpec) -> dict[str, Any]:
+    return {
+        'train_start': window.train_start,
+        'train_end': window.train_end,
+        'test_start': window.test_start,
+        'test_end': window.test_end,
+    }
+
+
+def _clip(value: float, lower: float, upper: float) -> float:
+    return float(min(max(value, lower), upper))
+
+
+def _safe_float(value: Any, default: float = 0.0) -> float:
+    try:
+        return float(value)
+    except (TypeError, ValueError):
+        return float(default)
+
+
+def _resolve_candidate_selection_settings(config: Mapping[str, Any]) -> dict[str, Any]:
+    frozen_cfg = dict((config or {}).get('frozen_validation', {}))
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    cfg = dict(frozen_cfg.get('candidate_selection', {}))
+    return {
+        'use_hard_constraints': bool(cfg.get('use_hard_constraints', True)),
+        'upside_capture_min': float(cfg.get('upside_capture_min', 0.28)),
+        'max_drawdown_ratio_vs_benchmark': float(cfg.get('max_drawdown_ratio_vs_benchmark', 0.72)),
+        'annual_turnover_soft_max': float(cfg.get('annual_turnover_soft_max', 18.0)),
+        'annual_return_override_abs': float(cfg.get('annual_return_override_abs', 0.05)),
+        'annual_return_override_ratio': float(cfg.get('annual_return_override_ratio', 0.40)),
+        'return_ratio_weight': float(cfg.get('return_ratio_weight', 0.30)),
+        'upside_weight': float(cfg.get('upside_weight', 0.30)),
+        'drawdown_weight': float(cfg.get('drawdown_weight', 0.20)),
+        'sharpe_delta_weight': float(cfg.get('sharpe_delta_weight', 0.10)),
+        'stability_weight': float(cfg.get('stability_weight', 0.10)),
+        'turnover_penalty_per_unit': float(cfg.get('turnover_penalty_per_unit', 0.015)),
+        'score_cap': float(cfg.get('score_cap', 1.2)),
+        'upside_target': float(cfg.get('upside_target', 0.45)),
+        'drawdown_improvement_target': float(cfg.get('drawdown_improvement_target', 0.35)),
+        'sharpe_delta_shift': float(cfg.get('sharpe_delta_shift', 0.05)),
+        'sharpe_delta_scale': float(cfg.get('sharpe_delta_scale', 0.15)),
+        'turnover_penalty_start': float(cfg.get('turnover_penalty_start', 12.0)),
+        'core_utility_floor': float(cfg.get('core_utility_floor', cfg.get('utility_floor', -0.05))),
+        'core_utility_target': float(cfg.get('core_utility_target', cfg.get('utility_target', 0.10))),
+        'utility_upside_target': float(evaluation_cfg.get('utility_upside_target', 0.55)),
+        'fallback_mode': str(cfg.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower(),
+    }
+
+
+def _compute_selection_score(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    sharpe_delta = _safe_float(metrics.get('sharpe_delta'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+
+    score_cap = float(settings['score_cap'])
+    upside_target = max(float(settings['upside_target']), 1e-12)
+    drawdown_target = max(float(settings['drawdown_improvement_target']), 1e-12)
+    sharpe_scale = max(float(settings['sharpe_delta_scale']), 1e-12)
+
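+    # Score return relative to the benchmark when the benchmark gained more than 5%;
+    # otherwise score it against a fixed 10% annual-return reference.
+    if benchmark_return > 0.05:
+        return_ratio = _clip(annual_return / benchmark_return, 0.0, score_cap)
+    else:
+        return_ratio = _clip(annual_return / 0.10, 0.0, score_cap)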
+    upside_score = _clip((upside_capture - 0.15) / max(upside_target - 0.15, 1e-12), 0.0, score_cap)
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_improvement = (benchmark_max_drawdown - max_drawdown) / benchmark_max_drawdown
+    else:
+        drawdown_improvement = 0.0
+    core_utility_value = _safe_float(
+        metrics.get(
+            'core_utility_score',
+            core_utility(
+                sharpe_delta=sharpe_delta,
+                drawdown_improvement=drawdown_improvement,
+                upside_capture=upside_capture,
+                upside_target=float(settings['utility_upside_target']),
+            ),
+        )
+    )
+    drawdown_score = _clip(drawdown_improvement / drawdown_target, 0.0, score_cap)
+    sharpe_delta_score = _clip((sharpe_delta + float(settings['sharpe_delta_shift'])) / sharpe_scale, 0.0, score_cap)
+    stability_score = _clip(
+        (core_utility_value - float(settings['core_utility_floor']))
+        / max(float(settings['core_utility_target']) - float(settings['core_utility_floor']), 1e-12),
+        0.0,
+        score_cap,
+    )
+    turnover_penalty = max(0.0, annual_turnover - float(settings['turnover_penalty_start'])) * float(
+        settings['turnover_penalty_per_unit']
+    )
+
+    score = (
+        float(settings['return_ratio_weight']) * return_ratio
+        + float(settings['upside_weight']) * upside_score
+        + float(settings['drawdown_weight']) * drawdown_score
+        + float(settings['sharpe_delta_weight']) * sharpe_delta_score
+        + float(settings['stability_weight']) * stability_score
+        - turnover_penalty
+    )
+    return score, {
+        'return_ratio': return_ratio,
+        'upside_score': upside_score,
+        'drawdown_score': drawdown_score,
+        'sharpe_delta_score': sharpe_delta_score,
+        'core_utility_value': core_utility_value,
+        'stability_score': stability_score,
+        'turnover_penalty': turnover_penalty,
+    }
+
+
+def _evaluate_hard_constraints(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[bool, list[str]]:
+    reasons: list[str] = []
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    if upside_capture < float(settings['upside_capture_min']):
+        reasons.append('upside_capture_below_min')
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_ratio = max_drawdown / benchmark_max_drawdown
+        if drawdown_ratio > float(settings['max_drawdown_ratio_vs_benchmark']):
+            reasons.append('drawdown_ratio_above_max')
+
+    turnover_cap = float(settings['annual_turnover_soft_max'])
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+    if annual_turnover > turnover_cap and annual_return < return_override_threshold:
+        reasons.append('turnover_above_soft_max_without_return_override')
+
+    return len(reasons) == 0, reasons
+
+
+def _constraint_distance(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    upside_min = max(float(settings['upside_capture_min']), 1e-12)
+    drawdown_max = max(float(settings['max_drawdown_ratio_vs_benchmark']), 1e-12)
+    turnover_soft_max = max(float(settings['annual_turnover_soft_max']), 1e-12)
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+
+    upside_gap = max(0.0, upside_min - upside_capture) / upside_min
+    drawdown_ratio = (max_drawdown / benchmark_max_drawdown) if benchmark_max_drawdown > 1e-12 else 0.0
+    drawdown_gap = max(0.0, drawdown_ratio - drawdown_max) / drawdown_max
+
+    turnover_gap = 0.0
+    if annual_turnover > turnover_soft_max and annual_return < return_override_threshold:
+        turnover_gap = (annual_turnover - turnover_soft_max) / turnover_soft_max
+
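+    # Each gap is a relative shortfall against its threshold; the blend weights are fixed (not configurable).
+    violation_distance = 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap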
+    return float(violation_distance), {
+        'upside_gap': float(upside_gap),
+        'drawdown_gap': float(drawdown_gap),
+        'turnover_gap': float(turnover_gap),
+    }
+
+
+def run_frozen_walkforward(
+    raw: pd.DataFrame,
+    config: Mapping[str, Any],
+    windows: Sequence[WindowSpec],
+    *,
+    candidates: Sequence[HypothesisCandidate] | None = None,
+    min_train_rows: int = 120,
+    min_test_rows: int = 40,
+    strategy_runner: StrategyRunner | None = None,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    if min_train_rows <= 0:
+        raise ValueError('min_train_rows must be positive.')
+    if min_test_rows <= 0:
+        raise ValueError('min_test_rows must be positive.')
+
+    runner = strategy_runner or run_strategy_bundle
+    candidate_list = list(candidates or DEFAULT_HYPOTHESIS_CANDIDATES)
+    if not candidate_list:
+        raise ValueError('At least one candidate is required for frozen walk-forward.')
+    selection_settings = _resolve_candidate_selection_settings(config)
+
+    rows: list[dict[str, Any]] = []
+
+    for window in windows:
+        train_slice = raw.loc[window.train_start:window.train_end].copy()
+        test_slice = raw.loc[window.test_start:window.test_end].copy()
+
+        row = _window_row_base(window)
+        row['train_rows'] = int(len(train_slice))
+        row['test_rows'] = int(len(test_slice))
+        row['candidate_count'] = int(len(candidate_list))
+
+        if len(train_slice) < min_train_rows:
+            row['status'] = 'skipped_insufficient_train'
+            rows.append(row)
+            continue
+        if len(test_slice) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        selected_candidate: HypothesisCandidate | None = None
+        selected_train_metrics: dict[str, float] | None = None
+        selected_train_utility = float('-inf')
+        selected_train_score = float('-inf')
+        selected_train_hard_pass = False
+        selected_train_constraint_failures: list[str] = []
+        selected_train_violation_distance = 0.0
+        selected_train_violation_components: dict[str, float] = {}
+        selection_mode = 'constraint_score'
+        candidate_evaluations: list[dict[str, Any]] = []
+
+        for candidate in candidate_list:
+            candidate_config = _candidate_config(config, candidate)
+            _, _, train_metrics_raw = runner(train_slice, candidate_config)
+            train_metrics = dict(train_metrics_raw)
+            utility_value, _ = _resolve_utility(train_metrics)
+            train_metrics['utility_total_score'] = utility_value
+            train_metrics['utility_status'] = utility_status(utility_value)
+            hard_pass, hard_fail_reasons = _evaluate_hard_constraints(train_metrics, selection_settings)
+            score_value, score_components = _compute_selection_score(train_metrics, selection_settings)
+            violation_distance, violation_components = _constraint_distance(train_metrics, selection_settings)
+            candidate_evaluations.append(
+                {
+                    'candidate': candidate,
+                    'metrics': train_metrics,
+                    'utility': utility_value,
+                    'hard_pass': hard_pass,
+                    'hard_fail_reasons': hard_fail_reasons,
+                    'selection_score': score_value,
+                    'selection_score_components': score_components,
+                    'violation_distance': violation_distance,
+                    'violation_components': violation_components,
+                }
+            )
+
+        use_hard_constraints = bool(selection_settings['use_hard_constraints'])
+        ranking_pool = (
+            [item for item in candidate_evaluations if item['hard_pass']]
+            if use_hard_constraints
+            else candidate_evaluations
+        )
+
+        if ranking_pool:
+            for item in ranking_pool:
+                score_value = float(item['selection_score'])
+                if score_value > selected_train_score:
+                    selected_train_score = score_value
+                    selected_candidate = item['candidate']
+                    selected_train_metrics = item['metrics']
+                    selected_train_utility = float(item['utility'])
+                    selected_train_hard_pass = bool(item['hard_pass'])
+                    selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                    selected_train_violation_distance = float(item['violation_distance'])
+                    selected_train_violation_components = dict(item['violation_components'])
+        else:
+            fallback_mode = str(selection_settings.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower()
+            if fallback_mode == 'closest_to_feasible_frontier':
+                selection_mode = 'frontier_fallback_no_hard_pass'
+                selected_fallback_score = float('-inf')
+                for item in candidate_evaluations:
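+                    # Rank mainly by closeness to feasibility; the selection score enters as a 0.25-weighted secondary term.
+                    fallback_score = -float(item['violation_distance']) + 0.25 * float(item['selection_score'])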
+                    utility_value = float(item['utility'])
+                    if (
+                        fallback_score > selected_fallback_score
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) > selected_train_score
+                        )
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) == selected_train_score
+                            and utility_value > selected_train_utility
+                        )
+                    ):
+                        selected_fallback_score = fallback_score
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+            else:
+                selection_mode = 'utility_fallback_no_hard_pass'
+                for item in candidate_evaluations:
+                    utility_value = float(item['utility'])
+                    if utility_value > selected_train_utility:
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+
+        hard_pass_count = int(sum(1 for item in candidate_evaluations if bool(item['hard_pass'])))
+        ranking_brief = [
+            {
+                'candidate_id': item['candidate'].candidate_id,
+                'hard_pass': bool(item['hard_pass']),
+                'selection_score': float(item['selection_score']),
+                'train_utility_total_score': float(item['utility']),
+                'hard_fail_reasons': list(item['hard_fail_reasons']),
+                'violation_distance': float(item['violation_distance']),
+            }
+            for item in candidate_evaluations
+        ]
+        ranking_brief.sort(key=lambda x: (-x['hard_pass'], -x['selection_score'], -x['train_utility_total_score']))
+
+        if selected_candidate is None or selected_train_metrics is None:
+            row['status'] = 'skipped_no_candidate'
+            rows.append(row)
+            continue
+
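+        # Re-run the frozen candidate over the combined train+test span, then keep only the test-period ledger rows.
+        combined_slice = raw.loc[window.train_start:window.test_end].copy()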
+        candidate_config = _candidate_config(config, selected_candidate)
+        _, combined_ledger, _ = runner(combined_slice, candidate_config)
+        frozen_test_ledger = combined_ledger.loc[window.test_start:window.test_end].copy()
+
+        if len(frozen_test_ledger) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        test_metrics = _compute_window_metrics(frozen_test_ledger, candidate_config)
+
+        row.update(
+            {
+                'status': 'ok',
+                'selected_candidate_id': selected_candidate.candidate_id,
+                'selection_mode': selection_mode,
+                'train_candidate_hard_pass_count': hard_pass_count,
+                'train_candidate_total_count': int(len(candidate_evaluations)),
+                'selected_train_selection_score': float(selected_train_score),
+                'selected_train_hard_pass': bool(selected_train_hard_pass),
+                'selected_train_constraint_failures': json.dumps(
+                    selected_train_constraint_failures,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'selected_train_violation_distance': float(selected_train_violation_distance),
+                'selected_train_violation_components': json.dumps(
+                    selected_train_violation_components,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'train_candidate_rankings': json.dumps(ranking_brief, ensure_ascii=False, sort_keys=True),
+                'selected_candidate_overrides': json.dumps(
+                    selected_candidate.overrides,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+            }
+        )
+        row.update(_prefixed_metrics('train', selected_train_metrics))
+        row.update(_prefixed_metrics('test', test_metrics))
+        rows.append(row)
+
+    board = pd.DataFrame(rows)
+    if board.empty:
+        board = pd.DataFrame(columns=['status'])
+
+    ok_board = board[board['status'] == 'ok'].copy() if 'status' in board.columns else pd.DataFrame()
+    selected_distribution = (
+        ok_board['selected_candidate_id'].value_counts().to_dict() if 'selected_candidate_id' in ok_board.columns else {}
+    )
+    status_counts = board['status'].value_counts().to_dict() if 'status' in board.columns else {}
+    selection_mode_distribution = (
+        ok_board['selection_mode'].value_counts().to_dict() if not ok_board.empty and 'selection_mode' in ok_board.columns else {}
+    )
+    windows_with_hard_pass_candidate_count = (
+        int((ok_board['train_candidate_hard_pass_count'] > 0).sum())
+        if not ok_board.empty and 'train_candidate_hard_pass_count' in ok_board.columns
+        else 0
+    )
+    hard_pass_window_ratio = (
+        float(windows_with_hard_pass_candidate_count / len(ok_board))
+        if len(ok_board) > 0
+        else 0.0
+    )
+    positive_window_ratio = (
+        float((ok_board['test_utility_total_score'] > 0.0).mean())
+        if not ok_board.empty and 'test_utility_total_score' in ok_board.columns
+        else 0.0
+    )
+    fallback_distance_distribution = (
+        ok_board.loc[
+            ok_board['selection_mode'].isin({'frontier_fallback_no_hard_pass', 'utility_fallback_no_hard_pass'}),
+            'selected_train_violation_distance',
+        ]
+        .dropna()
+        .tolist()
+        if not ok_board.empty
+        and 'selection_mode' in ok_board.columns
+        and 'selected_train_violation_distance' in ok_board.columns
+        else []
+    )
+
+    summary = {
+        'total_windows': int(len(windows)),
+        'processed_window_count': int(len(ok_board)),
+        'skipped_window_count': int(max(len(windows) - len(ok_board), 0)),
+        'positive_window_ratio': positive_window_ratio,
+        'positive_window_ratio_role': 'diagnostic_only',
+        'primary_acceptance_metrics': ['primary_window_success_ratio', 'hard_pass_window_ratio'],
+        'selected_candidate_distribution': selected_distribution,
+        'window_status_counts': status_counts,
+        'selection_mode_distribution': selection_mode_distribution,
+        'windows_with_hard_pass_candidate_count': windows_with_hard_pass_candidate_count,
+        'windows_without_hard_pass_candidate_count': int(max(len(ok_board) - windows_with_hard_pass_candidate_count, 0)),
+        'hard_pass_window_ratio': hard_pass_window_ratio,
+        'fallback_distance_distribution': [float(x) for x in fallback_distance_distribution],
+        'candidate_ids': [candidate.candidate_id for candidate in candidate_list],
+        'min_train_rows': int(min_train_rows),
+        'min_test_rows': int(min_test_rows),
+        'candidate_selection': selection_settings,
+    }
+    return board, summary
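
The fallback path in `run_frozen_walkforward` ranks infeasible candidates by `-violation_distance + 0.25 * selection_score`. As a worked illustration, the sketch below recomputes the `_constraint_distance` arithmetic in isolation, using the default thresholds from `_resolve_candidate_selection_settings` (0.28, 0.72, 18.0, 0.05, 0.40). The function name `constraint_distance_sketch`, the sample metric values, and the selection score of 0.40 are hypothetical and chosen only to make the numbers easy to follow.

```python
def constraint_distance_sketch(
    upside_capture,
    max_drawdown,
    benchmark_max_drawdown,
    annual_turnover,
    annual_return,
    benchmark_return,
    upside_min=0.28,
    drawdown_max=0.72,
    turnover_soft_max=18.0,
    override_abs=0.05,
    override_ratio=0.40,
):
    """Standalone copy of the gap arithmetic; defaults mirror the diff's settings."""
    override_threshold = max(override_abs, override_ratio * max(benchmark_return, 0.0))

    upside_gap = max(0.0, upside_min - upside_capture) / upside_min
    if benchmark_max_drawdown > 1e-12:
        drawdown_ratio = max_drawdown / benchmark_max_drawdown
    else:
        drawdown_ratio = 0.0
    drawdown_gap = max(0.0, drawdown_ratio - drawdown_max) / drawdown_max

    # Turnover only counts as a violation when return does not clear the override.
    turnover_gap = 0.0
    if annual_turnover > turnover_soft_max and annual_return < override_threshold:
        turnover_gap = (annual_turnover - turnover_soft_max) / turnover_soft_max

    return 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap


# Hypothetical infeasible candidate: every gap works out to 25% of its threshold.
infeasible = constraint_distance_sketch(
    upside_capture=0.21,
    max_drawdown=0.27,
    benchmark_max_drawdown=0.30,
    annual_turnover=22.5,
    annual_return=0.02,
    benchmark_return=0.10,
)

# Hypothetical feasible candidate: no constraint is violated, so the distance is zero.
feasible = constraint_distance_sketch(
    upside_capture=0.40,
    max_drawdown=0.15,
    benchmark_max_drawdown=0.30,
    annual_turnover=10.0,
    annual_return=0.08,
    benchmark_return=0.10,
)

# Fallback ranking key for the infeasible candidate, assuming a selection score of 0.40.
fallback_score = -infeasible + 0.25 * 0.40
print(infeasible, feasible, fallback_score)
```

Because the distance enters with weight 1 and the selection score with weight 0.25, a candidate that is far from feasible needs a much higher selection score to win the fallback ranking.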

+ 70 - 0
research/chinext50_regime_project/backtest/utility.py

@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+from typing import Mapping
+
+
+def core_utility(
+    sharpe_delta: float,
+    drawdown_improvement: float,
+    upside_capture: float,
+    upside_target: float = 0.55,
+) -> float:
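+    # Weighted blend: 45% Sharpe delta, 35% drawdown improvement, 20% upside capture relative to its target.
+    return 0.45 * sharpe_delta + 0.35 * drawdown_improvement + 0.20 * (upside_capture - upside_target)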
+
+
+def turnover_penalty(
+    annual_turnover: float,
+    start: float = 8.0,
+    rate: float = 0.010,
+) -> float:
+    return rate * max(0.0, float(annual_turnover) - float(start))
+
+
+def net_utility(
+    sharpe_delta: float,
+    drawdown_improvement: float,
+    upside_capture: float,
+    annual_turnover: float = 0.0,
+    upside_target: float = 0.55,
+    turnover_penalty_start: float = 8.0,
+    turnover_penalty_rate: float = 0.010,
+) -> float:
+    return core_utility(
+        sharpe_delta=sharpe_delta,
+        drawdown_improvement=drawdown_improvement,
+        upside_capture=upside_capture,
+        upside_target=upside_target,
+    ) - turnover_penalty(
+        annual_turnover=annual_turnover,
+        start=turnover_penalty_start,
+        rate=turnover_penalty_rate,
+    )
+
+
+def utility_status(total_utility: float) -> str:
+    # The comparison is strict, so a utility of exactly zero is labelled 'negative_utility'.
+    return 'positive_utility' if total_utility > 0 else 'negative_utility'
+
+
+def utility_from_metrics(
+    metrics: Mapping[str, float],
+    *,
+    upside_target: float = 0.55,
+    turnover_penalty_start: float = 8.0,
+    turnover_penalty_rate: float = 0.010,
+) -> float:
+    sharpe_delta = float(metrics.get('sharpe_delta', 0.0))
+    drawdown_improvement = float(metrics.get('drawdown_improvement_ratio', 0.0))
+    upside_capture = float(metrics.get('upside_capture', upside_target))
+    annual_turnover = float(metrics.get('annual_turnover', 0.0))
+    upside_target_value = float(metrics.get('utility_upside_target', upside_target))
+    turnover_start_value = float(metrics.get('utility_turnover_penalty_start', turnover_penalty_start))
+    turnover_rate_value = float(metrics.get('utility_turnover_penalty_rate', turnover_penalty_rate))
+    return net_utility(
+        sharpe_delta=sharpe_delta,
+        drawdown_improvement=drawdown_improvement,
+        upside_capture=upside_capture,
+        annual_turnover=annual_turnover,
+        upside_target=upside_target_value,
+        turnover_penalty_start=turnover_start_value,
+        turnover_penalty_rate=turnover_rate_value,
+    )
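
To make the utility arithmetic concrete, the following sketch restates the three formulas from `utility.py` with their default parameters and evaluates them on one set of hypothetical metric values (the numbers are illustrative, not results from the project).

```python
def core_utility(sharpe_delta, drawdown_improvement, upside_capture, upside_target=0.55):
    # Same weights as in utility.py.
    return 0.45 * sharpe_delta + 0.35 * drawdown_improvement + 0.20 * (upside_capture - upside_target)


def turnover_penalty(annual_turnover, start=8.0, rate=0.010):
    # No penalty below `start`; linear in the excess above it.
    return rate * max(0.0, float(annual_turnover) - float(start))


def utility_status(total_utility):
    return 'positive_utility' if total_utility > 0 else 'negative_utility'


# Hypothetical inputs.
core = core_utility(sharpe_delta=0.20, drawdown_improvement=0.30, upside_capture=0.45)
penalty = turnover_penalty(annual_turnover=12.0)
net = core - penalty
# core    = 0.090 + 0.105 - 0.020 = 0.175
# penalty = 0.010 * (12 - 8)      = 0.040
# net     = 0.135

# With all-default inputs (what utility_from_metrics uses for an empty mapping),
# every term is zero, and the strict `> 0` test labels the result negative.
empty_net = core_utility(0.0, 0.0, 0.55) - turnover_penalty(0.0)
print(net, utility_status(net), empty_net, utility_status(empty_net))
```

One consequence worth noting: an upside capture below the 0.55 target subtracts from utility even when the Sharpe and drawdown terms are positive.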

+ 76 - 0
research/chinext50_regime_project/backtest/walkforward.py

@@ -0,0 +1,76 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+import pandas as pd
+
+
+@dataclass
+class WindowSpec:
+    train_start: str
+    train_end: str
+    test_start: str
+    test_end: str
+
+
+DEFAULT_WINDOWS = [
+    WindowSpec('2016-01-01', '2018-12-31', '2019-01-01', '2020-12-31'),
+    WindowSpec('2018-01-01', '2020-12-31', '2021-01-01', '2022-12-31'),
+    WindowSpec('2020-01-01', '2022-12-31', '2023-01-01', '2024-12-31'),
+]
+
+
+def build_expanding_windows(
+    index: pd.DatetimeIndex,
+    *,
+    min_train_years: int = 2,
+    test_years: int = 1,
+    allow_partial_last_test: bool = True,
+) -> list[WindowSpec]:
+    if min_train_years <= 0:
+        raise ValueError('min_train_years must be positive.')
+    if test_years <= 0:
+        raise ValueError('test_years must be positive.')
+
+    normalized_index = pd.DatetimeIndex(index).dropna().sort_values().unique()
+    if len(normalized_index) == 0:
+        return []
+
+    start_ts = pd.Timestamp(normalized_index[0])
+    end_ts = pd.Timestamp(normalized_index[-1])
+    windows: list[WindowSpec] = []
+
+    for test_start_year in range(start_ts.year + min_train_years, end_ts.year + 1):
+        candidate_test_start = pd.Timestamp(year=test_start_year, month=1, day=1)
+        if candidate_test_start > end_ts:
+            break
+
+        test_mask = normalized_index >= candidate_test_start
+        if not test_mask.any():
+            continue
+        actual_test_start = pd.Timestamp(normalized_index[test_mask][0])
+
+        target_test_end = pd.Timestamp(year=test_start_year + test_years - 1, month=12, day=31)
+        if target_test_end > end_ts:
+            if not allow_partial_last_test:
+                break
+            target_test_end = end_ts
+
+        actual_test_index = normalized_index[(normalized_index >= actual_test_start) & (normalized_index <= target_test_end)]
+        if len(actual_test_index) == 0:
+            continue
+
+        actual_train_index = normalized_index[normalized_index < actual_test_start]
+        if len(actual_train_index) == 0:
+            continue
+
+        windows.append(
+            WindowSpec(
+                train_start=pd.Timestamp(actual_train_index[0]).date().isoformat(),
+                train_end=pd.Timestamp(actual_train_index[-1]).date().isoformat(),
+                test_start=actual_test_start.date().isoformat(),
+                test_end=pd.Timestamp(actual_test_index[-1]).date().isoformat(),
+            )
+        )
+
+    return windows
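
The interaction between `DEFAULT_WINDOWS` and the row-count guards in `run_frozen_walkforward` (`min_train_rows=120`, `min_test_rows=40`) can be checked with a small standard-library sketch. It assumes, purely for illustration, that the data starts on 2020-01-01 and that weekdays approximate trading days; `weekday_index` and `window_status` are hypothetical helpers, not part of the diff.

```python
from __future__ import annotations

from datetime import date, timedelta

# Copied from DEFAULT_WINDOWS: (train_start, train_end, test_start, test_end).
DEFAULT_WINDOWS = [
    ('2016-01-01', '2018-12-31', '2019-01-01', '2020-12-31'),
    ('2018-01-01', '2020-12-31', '2021-01-01', '2022-12-31'),
    ('2020-01-01', '2022-12-31', '2023-01-01', '2024-12-31'),
]


def weekday_index(start: date, end: date) -> list[str]:
    """ISO dates for Monday-Friday between start and end (assumed trading calendar)."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:
            days.append(current.isoformat())
        current += timedelta(days=1)
    return days


def window_status(days, window, min_train_rows=120, min_test_rows=40):
    """Mirror the two row-count guards; ISO date strings compare chronologically."""
    train_start, train_end, test_start, test_end = window
    train_rows = sum(train_start <= d <= train_end for d in days)
    test_rows = sum(test_start <= d <= test_end for d in days)
    if train_rows < min_train_rows:
        return 'skipped_insufficient_train'
    if test_rows < min_test_rows:
        return 'skipped_insufficient_test'
    return 'ok'


days = weekday_index(date(2020, 1, 1), date(2024, 12, 31))
statuses = [window_status(days, window) for window in DEFAULT_WINDOWS]
print(statuses)
```

Under these assumptions the first window has no training rows at all, which is the situation `build_expanding_windows` avoids by deriving window boundaries from the index itself.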

+ 743 - 0
research/chinext50_regime_project/chinext50_blocker_checklist_for_codex.md

@@ -0,0 +1,743 @@
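+# ChiNext50 Regime System: Blocker Checklist That Must Be Fixed First (for Codex)
+
+> Goal: **fix integrity and verifiability first; only then discuss threshold / position-sizing / objective-function optimization.**
+>
+> On the current bundle, do **not** start with state threshold tuning, policy tuning, or execution calibration tuning. Complete the blockers below first, then rerun the full pipeline.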
+
+---
+
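+## 0. Failure symptoms verified so far (from rerunning the current code on the existing PIT)
+
+On the current `results/outputs/system_e2e_20260409_strict/ingestion/pit/chinext50_pit.csv`, the existing pipeline produces:
+
+- `breadth_score.notna().mean() == 0.0`
+- `crowding_score.notna().mean() == 0.0`
+- `down_hazard` / `repair_hazard` / `rebound_hazard` are **all identically `0.5`**
+- The state machine degenerates to 3 states:
+  - `chop: 1177`
+  - `trend: 243`
+  - `risk_off: 97`
+  - `repair: 0`
+  - `euphoric_late: 0`
+- The `target_exposure` paths of `baseline` and `pro_risk` are **100% identical**
+- Frozen walk-forward has only **2 valid windows**; window 1 is skipped outright because of insufficient training samples
+
+This shows that the problem is not that "offense has not been tuned yet", but that:
+
+1. the score composition suffers from **propagating NaN contamination**;
+2. the state machine / policy **silently treats invalid signals as neutral signals**;
+3. the candidate search space is flattened by **coarse position quantization**;
+4. the default walk-forward windows **do not match** the start date of the available data;
+5. the execution calibration scoring formula is **unbalanced in scale** (its terms are not on comparable units).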
+
+---
+
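+## Overall execution order (must be followed in sequence)
+
+1. **B1** Fix the NaN contamination and hazard collapse in `scores.py`
+2. **B2** Stop the state machine / policy from silently treating missing signals as 0
+3. **B3** Add a post-feature information gate (the raw coverage gate alone is not enough)
+4. **B4** Remove the coarse 0.25 position quantization; upgrade to at least a 0.10 ladder
+5. **B5** Change the walk-forward window generation logic so it aligns with the 2020+ PIT
+6. **B6** Only then change the execution calibration score and rerun calibration
+
+**Until B1-B5 are complete, do not retune state thresholds, repair/trend exposure, or objective weights.**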
+
+---
+
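+# B1. Fix NaN contamination of the composite scores and the hazards being stuck at 0.5
+
+## Why this is a blocker
+
+The current `code/model/scores.py` adds several `rolling_zscore(...)` terms together linearly. As soon as one component becomes `NaN` because its column is constant or has zero std, the whole composite score is poisoned.
+
+In the current PIT, `concentration_spread_5 = weighted_ret_5 - eq_weight_ret_5` is constant, so:
+
+- `rolling_zscore(concentration_spread_5)` is all `NaN`
+- `breadth_score` is all `NaN`
+- `crowding_score` is all `NaN`
+- the hazard raw values go through `fillna(0)` and then the sigmoid, so they collapse straight to `0.5`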
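
The collapse described in B1 can be reproduced without pandas. The sketch below is a simplified stand-in, not the project's `rolling_zscore`: it assumes a z-score that yields `NaN` on a zero-std window (as the checklist states), uses hypothetical weights 0.6 / 0.4, and applies a sigmoid that replaces `NaN` with 0 first.

```python
import math


def zscore_last(window):
    """Z-score of the last value in the window; NaN when the window has zero std."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / (len(window) - 1)
    std = math.sqrt(var)
    return (window[-1] - mean) / std if std > 0.0 else float('nan')


def sigmoid_fill_zero(x):
    """Mimics sigmoid(series.fillna(0.0)) for a single value."""
    x = 0.0 if math.isnan(x) else x
    return 1.0 / (1.0 + math.exp(-x))


constant_spread = [0.0] * 20                     # a constant derived feature
varying_input = [float(i) for i in range(20)]    # a healthy component

z_spread = zscore_last(constant_spread)          # NaN: zero std
z_other = zscore_last(varying_input)             # finite
composite = 0.6 * z_other + 0.4 * z_spread       # NaN poisons the plain sum
hazard = sigmoid_fill_zero(composite)            # the NaN is disguised as 0.5

print(z_spread, z_other, composite, hazard)
```

The healthy component carries real information, yet the final value is indistinguishable from a neutral reading, which is why the checklist treats this as an integrity problem and not a tuning problem.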
+
+## Files and line ranges to modify
+
+### 1) `code/model/scores.py`
+
+- **Lines 7-11**: `rolling_zscore`
+- **Lines 20-21**: `_sigmoid`
+- **Lines 24-67**: `trend_score / breadth_score / stress_score / crowding_score / repair_score`
+- **Lines 75-100**: `down_hazard / repair_hazard / rebound_hazard`
+
+## Required changes
+
+### A. Keep the NaN output of `rolling_zscore`, but stop adding components linearly
+
+Add a helper that computes each composite score as a **NaN-ignoring weighted aggregate** instead of a plain `a + b + c + ...`.
+
+Suggested addition:
+
+```python
+
+def _weighted_composite(
+    components: list[tuple[float, pd.Series]],
+    *,
+    min_valid_weight: float,
+) -> tuple[pd.Series, pd.Series]:
+    """
+    Return:
+      score: weighted average over non-null components
+      valid_weight: total available weight per row
+    If valid_weight < min_valid_weight -> score = NaN
+    """
+```
+
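+Key requirements:
+
+- When a component is `NaN`, **skip only that component**; do not turn the whole score into `NaN`
+- If the row's **valid weight is insufficient**, the score must still be `NaN`
+- Return `valid_weight` so the state machine / diagnostics can use it later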
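
One possible body for the `_weighted_composite` stub, offered as a sketch under stated assumptions and not as the project's implementation. It assumes all component series share one index, and it normalizes by the sum of absolute weights of the non-null components so that signed weights (as in the hazard raw formulas) are handled; that normalization choice is an assumption.

```python
import numpy as np
import pandas as pd


def _weighted_composite(
    components: list[tuple[float, pd.Series]],
    *,
    min_valid_weight: float,
) -> tuple[pd.Series, pd.Series]:
    """NaN-ignoring weighted average plus the per-row available weight."""
    if not components:
        raise ValueError('components must not be empty')

    weights = np.array([float(weight) for weight, _ in components])
    values = pd.concat([series.astype(float) for _, series in components], axis=1)
    values.columns = range(len(components))

    valid = values.notna()
    # Missing components contribute nothing to the numerator...
    numerator = values.fillna(0.0).mul(weights, axis=1).sum(axis=1)
    # ...and nothing to the available weight.
    valid_weight = valid.astype(float).mul(np.abs(weights), axis=1).sum(axis=1)

    # Rows below the weight floor stay NaN instead of being guessed.
    score = (numerator / valid_weight).where(valid_weight >= min_valid_weight)
    return score, valid_weight


# Hypothetical inputs: row 0 lacks b, row 1 is complete, row 2 lacks both.
a = pd.Series([1.0, 2.0, np.nan])
b = pd.Series([np.nan, 4.0, np.nan])

score, valid_weight = _weighted_composite([(0.6, a), (0.4, b)], min_valid_weight=0.5)
strict_score, _ = _weighted_composite([(0.6, a), (0.4, b)], min_valid_weight=0.7)
print(score.tolist(), valid_weight.tolist(), strict_score.tolist())
```

With `min_valid_weight=0.5` the first row is scored from `a` alone; raising the floor to 0.7 turns that same row into `NaN`, which is the behavior the second key requirement asks for.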
+
+### B. `_sigmoid` must no longer call `fillna(0.0)`
+
+Current:
+
+```python
+return 1.0 / (1.0 + np.exp(-series.fillna(0.0)))
+```
+
+This disguises an invalid hazard input as the neutral probability `0.5`.
+
+Change it to:
+
+```python
+
+def _sigmoid_preserve_nan(series: pd.Series) -> pd.Series:
+    out = pd.Series(np.nan, index=series.index, dtype=float)
+    mask = series.notna()
+    out.loc[mask] = 1.0 / (1.0 + np.exp(-series.loc[mask]))
+    return out
+```
+
+### C. All five main scores must also output their valid weight / validity
+
+Add at least these columns:
+
+- `trend_score_valid_weight`
+- `breadth_score_valid_weight`
+- `stress_score_valid_weight`
+- `crowding_score_valid_weight`
+- `repair_score_valid_weight`
+
+### D. The three hazard raw values must also use NaN-safe aggregation instead of plain NaN arithmetic
+
+At these locations:
+
+- `down_raw` (lines 75-82)
+- `repair_raw` (lines 83-89)
+- `rebound_raw` (lines 90-96)
+
+do not compute `0.45 * stress_score - 0.25 * trend_score - ...` directly.
+
+Use an approach like `_weighted_composite()` here too, and set `min_valid_weight`.
+
+### E. Add overall readiness flags
+
+Add:
+
+- `core_score_ready`
+- `hazard_ready`
+
+Suggested definition:
+
+```python
+out['core_score_ready'] = out[
+    ['trend_score', 'breadth_score', 'stress_score', 'crowding_score', 'repair_score']
+].notna().all(axis=1)
+
+out['hazard_ready'] = out[
+    ['down_hazard', 'repair_hazard', 'rebound_hazard']
+].notna().all(axis=1)
+```
+
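+## Acceptance criteria
+
+After rerunning on the current PIT, at least the following must hold:
+
+- `breadth_score.notna().mean() > 0.95`
+- `crowding_score.notna().mean() > 0.90`
+- `down_hazard.nunique(dropna=True) > 50`
+- `repair_hazard.nunique(dropna=True) > 50`
+- `rebound_hazard.nunique(dropna=True) > 50`
+- `down_hazard` / `repair_hazard` / `rebound_hazard` must not all be identically `0.5`
+
+## Tests to add or modify
+
+### Add `code/tests/test_scores_integrity.py`
+
+Cover at least:
+
+1. When a single component column is constant and the other components are valid, `breadth_score` / `crowding_score` must still be computable
+2. When the valid weight is insufficient, the corresponding composite score must be `NaN`
+3. `_sigmoid_preserve_nan()` must not turn `NaN` into `0.5`
+4. With input shaped like the current PIT, the hazards must not all be identically `0.5`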
+
+---
+
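+# B2. Stop the state machine / policy from silently treating missing signals as 0
+
+## Why this is a blocker
+
+Two key pieces of logic currently perform "silent neutralization":
+
+- `state_machine.py` applies `fillna(0.0)` to the row-level inputs before classifying the state
+- `policy.py` also applies `fillna(0.0)` before `_base_exposure()` / `_apply_caps()`
+
+As a result:
+
+- invalid signals that should fail or raise an alert are quietly treated as "ordinary chop"
+- when breadth/hazard is broken, the system still appears to run, but its output is fake
+
+## Files and line ranges to modify
+
+### 1) `code/model/state_machine.py`
+
+- **Lines 15-24**: `_raw_state`
+- **Lines 37-43**: proposal and crash override logic
+
+### 2) `code/model/policy.py`
+
+- **Lines 17-37**: `_repair_exposure` / `_base_exposure`
+- **Lines 39-50**: `_apply_caps`
+- **Lines 63-68**: main loop of `build_exposure_plan`
+
+## Required changes
+
+### A. Introduce explicit warmup / invalid handling in the state machine
+
+Stop doing this: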
+
+```python
+proposal = _raw_state(row.fillna(0.0))
+```
+
+Instead:
+
+1. First define `required_state_inputs`:
+   - `trend_score`
+   - `breadth_score`
+   - `stress_score`
+   - `crowding_score`
+   - `down_hazard`
+   - `repair_hazard`
+   - `rebound_hazard`
+
+2. Rely on `core_score_ready` / `hazard_ready`:
+   - **Before the first fully ready date**: the state should be `warmup`
+   - If an invalid row appears again after the system is ready: **raise ValueError immediately**
+
+Suggested:
+
+```python
+if not row['core_score_ready'] or not row['hazard_ready']:
+    if not system_ready_yet:
+        proposal = 'warmup'
+    else:
+        raise ValueError(f'invalid score/hazard after warmup at {ts}')
+```
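
The snippet above leaves open how `system_ready_yet` is maintained. A minimal sketch of the surrounding loop follows; `classify_with_warmup`, the `(timestamp, row)` input shape, and the `raw_state` callback are hypothetical names used only for illustration.

```python
def classify_with_warmup(rows, raw_state):
    """rows: iterable of (timestamp, mapping); raw_state: classifier for ready rows."""
    system_ready_yet = False
    states = []
    for ts, row in rows:
        ready = bool(row['core_score_ready']) and bool(row['hazard_ready'])
        if not ready:
            if system_ready_yet:
                # Fail fast: an invalid row after warmup signals a bug upstream.
                raise ValueError(f'invalid score/hazard after warmup at {ts}')
            states.append('warmup')
            continue
        system_ready_yet = True
        states.append(raw_state(row))
    return states


rows_ok = [
    ('2020-01-02', {'core_score_ready': False, 'hazard_ready': False}),
    ('2020-01-03', {'core_score_ready': True, 'hazard_ready': True}),
]
rows_bad = rows_ok + [
    ('2020-01-06', {'core_score_ready': True, 'hazard_ready': False}),
]

states_ok = classify_with_warmup(rows_ok, lambda row: 'chop')

try:
    classify_with_warmup(rows_bad, lambda row: 'chop')
    raised = False
except ValueError:
    raised = True

print(states_ok, raised)
```

The flag flips only once, on the first fully ready row, so a gap in readiness later in the sample can never be relabelled as warmup.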
+
+### B. `policy.py` must not call `fillna(0.0)` on the row directly
+
+Current:
+
+```python
+base = _base_exposure(row.fillna(0.0), policy_cfg)
+capped, reason = _apply_caps(row.fillna(0.0), base)
+```
+
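+This must be removed.
+
+### C. Give `warmup` an explicit position rule
+
+If `state == 'warmup'`:
+
+- `target_exposure = 0.0`
+- `veto_reason = 'warmup'`
+
+### D. Invalid after warmup must fail fast
+
+Do not fall back to `chop` or `0` by default; that only hides the bug.
+
+## Acceptance criteria
+
+- If `breadth_score` / `crowding_score` / a hazard becomes `NaN` after warmup, the pipeline must raise an error instead of continuing to produce a ledger
+- Warmup rows must be explicitly visible, with zero exposure
+- The situation "scores are broken but state/exposure is still output normally" must never occur again
+
+## Tests to add or modify
+
+### Modify `code/tests/test_policy.py`
+
+The current test (lines 9-19) only covers "quantized exposure + max step".
+
+Add:
+
+1. invalid after warmup must raise an error
+2. in the warmup state, `target_exposure == 0.0`
+3. the policy must not swallow invalid score/hazard values via `fillna(0)`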
+
+---
+
+# B3. Add a post-feature information gate (the raw coverage gate alone is not enough)
+
+## Why this is a blocker
+
+The current `data_quality_gate` only checks **non-null coverage**; it cannot detect "**high coverage but low information**" problems at all.
+
+This has already happened in the current bundle:
+
+- neither `weighted_ret_5` nor `eq_weight_ret_5` has missing values
+- yet `concentration_spread_5 = weighted_ret_5 - eq_weight_ret_5` is constant
+- the raw gate lets it through entirely, and it then blows up inside `scores.py`
+
+A **post-feature gate** is therefore required, dedicated to checking whether the derived features are "constant / near-constant / low-information" after feature engineering.
+
+## Files and line ranges to modify
+
+### 1) `code/features/pipeline.py`
+
+- **Lines 10-14**: `build_feature_table`
+
+### 2) New file: `code/features/quality.py`
+
+Add a feature information gate.
+
+### 3) `code/backtest/frozen_walkforward.py`
+
+- **Lines 79-84**: in `run_strategy_bundle`, after `build_feature_table(df)`
+
+### 4) `code/pipelines/run_demo.py`
+
+- **Lines 99-103**: after `build_feature_table(raw)` and before `build_scores(featured)`
+
+### 5) If you want all entry points to be consistent, these can be patched as well:
+
+- `code/pipelines/calibrate_execution_constraints.py` (already benefits indirectly via `run_strategy_bundle`)
+- `code/pipelines/frozen_hypothesis_validation.py` (already benefits indirectly via `run_strategy_bundle`)
+- `code/pipelines/real_walkforward_report.py` (already benefits indirectly via `run_strategy_bundle`)
+
+## Required changes
+
+### A. Add `features/quality.py`
+
+Suggested function:
+
+```python
+
+def evaluate_feature_information_gate(
+    df: pd.DataFrame,
+    *,
+    critical_feature_columns: list[str],
+    min_unique_non_null: int = 3,
+    std_floor: float = 1e-8,
+    max_dominant_ratio: float = 0.995,
+) -> dict[str, Any]:
+    ...
+```
+
+At minimum, check:
+
+- `n_unique_non_null`
+- `std`
+- `dominant_value_ratio`
+
+Run the checks on the following derived columns (at least these):
+
+- `concentration_spread_5`
+- `breadth_thrust_5`
+- `up_down_imbalance_20`
+- `breadth_divergence`
+- `top3_vs_top10_ratio_5`
+- `top1_top3_pressure_5`
+- `sector_concentration_change_20`
+- `volume_z_20`
+- `upper_wick_ratio_5`
+- `range_pos_120`
+
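The three statistics listed above can be sketched per column as follows. This is only an illustration under stated assumptions: it works on plain Python sequences rather than a `pd.DataFrame`, and the helper names `column_information_stats` and `is_low_information` are hypothetical, not part of the existing code base. The default thresholds mirror the ones in the suggested signature.

```python
import math
from collections import Counter
from statistics import pstdev


def column_information_stats(values):
    """Summarise how much information one feature column carries.

    Hypothetical helper: None and NaN entries (e.g. warmup rows) are
    dropped before any statistic is computed.
    """
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        return {"n_unique_non_null": 0, "std": 0.0, "dominant_value_ratio": 1.0}
    counts = Counter(clean)
    return {
        "n_unique_non_null": len(counts),
        "std": pstdev(clean),
        # Share of non-null rows taken by the single most frequent value.
        "dominant_value_ratio": counts.most_common(1)[0][1] / len(clean),
    }


def is_low_information(
    values,
    *,
    min_unique_non_null=3,
    std_floor=1e-8,
    max_dominant_ratio=0.995,
):
    """Return True if the column fails any of the three checks."""
    stats = column_information_stats(values)
    return (
        stats["n_unique_non_null"] < min_unique_non_null
        or stats["std"] < std_floor
        or stats["dominant_value_ratio"] > max_dominant_ratio
    )


# Two varying inputs whose difference is (numerically) constant.
weighted = [0.01, 0.03, -0.02, 0.05, 0.02]
equal_weight = [w - 0.004 for w in weighted]
spread = [w - e for w, e in zip(weighted, equal_weight)]
print(is_low_information(weighted), is_low_information(spread))
```

The standard-deviation floor matters here: floating-point subtraction can leave several distinct values that differ only in the last bits, so a unique-count check alone would not catch a spread that is constant in practice.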
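+### B. Gate behavior
+
+- Leading `NaN` rows caused by warmup are acceptable
+- But if a critical derived feature is nearly constant over the full sample, the gate must either:
+  - flag it as a blocking error, or
+  - at least fail fast in strict mode
+
+### C. Wire the feature gate into the main pipeline
+
+Minimum requirement:
+
+- `run_demo.py`
+- `run_strategy_bundle()`
+
+Both must run this gate before `build_scores()`.
+
+### D. The raw data gate should also be strengthened, but do not assume the raw gate is sufficient
+
+Strengthening `code/data/io.py` (adding unique/std/dominant-ratio checks) is still recommended, but it **cannot replace** the post-feature gate.
+
+## Acceptance criteria
+
+On the real PIT data in the current bundle:
+
+- The feature gate must explicitly report `concentration_spread_5` as a low-information / constant column
+- If strict mode is enabled, the pipeline must stop instead of continuing to produce misleading results
+
+## Tests to add or modify
+
+### Add `code/tests/test_feature_quality_gate.py`
+
+Cover at least:
+
+1. When both raw input columns vary but their difference is constant, the feature gate must catch it
+2. Warmup `NaN` values must not be misclassified as a constant failure
+3. In strict mode, the feature gate blocks `run_demo` / `run_strategy_bundle`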
+
+---
+
+# B4. Remove the coarse 0.25 exposure quantization; upgrade at least to a 0.10 ladder (first priority), ideally to continuous exposure
+
+## Why this is a blocker
+
+Currently:
+
+- `ALLOWED_LEVELS = (0.0, 0.25, 0.50, 0.75, 1.0)`
+- The exposure paths of `baseline` and `pro_risk` are **100% identical**
+
+This shows that much of the policy difference is erased by quantization. You think you are comparing frozen candidates, but many candidates never produce a real difference.
+
+## Files and line ranges to modify
+
+### 1) `code/model/policy.py`
+
+- **Lines 9-15**: `ALLOWED_LEVELS` and `_quantize_exposure`
+- **Lines 57-68**: `build_exposure_plan`
+
+### 2) `code/config/regime.yaml`
+
+- **Lines 3-15**: `trading` configuration
+- **Lines 74-99**: frozen candidate overrides
+
+### 3) `code/tests/test_policy.py`
+
+- **Lines 9-19**: the current test hard-codes `{0,0.25,0.5,0.75,1.0}`
+
+## Required changes
+
+### A. Minimum acceptable option: 0.10 ladder
+
+Add configuration:
+
+```yaml
+trading:
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+```
+
+Make `_quantize_exposure()` a configurable ladder:
+
+```python
+
+def _quantize_exposure(exposure: float, *, mode: str, step: float | None, allowed_levels: Sequence[float] | None):
+    ...
+```
+
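A minimal sketch of what a configurable quantizer plus a max-step guard might look like. It is an assumption-laden illustration, not the project's implementation: the `allowed_levels` argument from the suggested signature is left out for brevity, and `apply_max_step` is a hypothetical helper name.

```python
def quantize_exposure(exposure, *, mode="ladder", step=0.10):
    """Clamp exposure to [0, 1], then snap it to a ladder if requested."""
    bounded = min(1.0, max(0.0, exposure))
    if mode == "continuous":
        return bounded
    if mode != "ladder":
        raise ValueError(f"unknown exposure_mode: {mode!r}")
    if step is None or step <= 0:
        raise ValueError("ladder mode needs a positive step")
    # Round to the nearest rung; the outer round() removes float residue
    # such as 0.30000000000000004.
    return min(1.0, round(round(bounded / step) * step, 10))


def apply_max_step(previous, target, max_daily_exposure_change):
    """Limit the day-over-day exposure change to the configured maximum."""
    delta = target - previous
    capped = max(-max_daily_exposure_change, min(max_daily_exposure_change, delta))
    return previous + capped


print(quantize_exposure(0.34))                       # nearest 0.10 rung
print(quantize_exposure(0.87, mode="continuous"))    # only clamped
print(apply_max_step(0.0, 0.9, 0.30))                # step-limited move
```

If the step guard runs after quantization, the executed exposure can land between rungs whenever `max_daily_exposure_change` is not a multiple of the ladder step, so the order of the two operations is worth fixing explicitly in tests.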
+### B. Better option: continuous exposure + max step guard
+
+If you are willing to go all the way in one step:
+
+- Default to `exposure_mode: continuous`
+- `target_exposure = bounded` (keep only `max_daily_exposure_change`)
+
+### C. Adjust `max_daily_exposure_change`
+
+The current default of `0.25` is too coarse. At a minimum, the suggestion is:
+
+- With a 0.10 ladder: `max_daily_exposure_change = 0.30 ~ 0.35`
+- With continuous exposure: anything in `0.20 ~ 0.35` works, but it needs to be re-validated against turnover
+
+## Acceptance criteria
+
+- On the current bundle, the exposure paths of `baseline` and `pro_risk` **must no longer be 100% identical**
+- Different candidates should show observable differences in average exposure / turnover / annual_return
+- `target_exposure.diff().abs().max()` must still be bounded by `max_daily_exposure_change`
+
+## Tests to add or modify
+
+### Modify `code/tests/test_policy.py`
+
+- Delete the hard-coded assertion that values "can only be `{0,0.25,0.5,0.75,1.0}`"
+- Keep:
+  - `0 <= target_exposure <= 1`
+  - `daily step <= max_daily_exposure_change`
+- Add:
+  - For a fixed set of state/hazard inputs, `baseline` and `pro_risk` produce different exposure paths
+
+---
+
+# B5. The default walk-forward windows do not match the current 2020+ PIT data; switch to data-driven windows
+
+## Why this is a blocker
+
+Current `DEFAULT_WINDOWS`:
+
+- 2016-2018 train / 2019-2020 test
+- 2018-2020 train / 2021-2022 test
+- 2020-2022 train / 2023-2024 test
+
+But the real PIT data only covers **2020-01-02 to 2026-04-09**, so the first window is inevitably skipped.
+
+This is not a strategy problem; it is a **mismatch between the window generator and the sample start**.
+
+## Files and line ranges to modify
+
+### 1) `code/backtest/walkforward.py`
+
+- **Lines 14-17**: `DEFAULT_WINDOWS`
+
+### 2) `code/pipelines/frozen_hypothesis_validation.py`
+
+- **Lines 163-170**: the call to `run_frozen_walkforward(..., windows=DEFAULT_WINDOWS, ...)`
+- **Lines 173-189**: the window list written to the summary
+
+### 3) `code/pipelines/real_walkforward_report.py`
+
+- **Lines 197-204**: the call to `run_frozen_walkforward(..., windows=DEFAULT_WINDOWS, ...)`
+- **Lines 237-247**: the window list written to the summary
+
+## Required changes
+
+### A. Stop using the hard-coded `DEFAULT_WINDOWS`
+
+Add a data-driven window generator, for example:
+
+```python
+
+def build_expanding_windows(
+    index: pd.DatetimeIndex,
+    *,
+    min_train_years: int = 2,
+    test_years: int = 1,
+    allow_partial_last_test: bool = True,
+) -> list[WindowSpec]:
+    ...
+```
+
+### B. Recommended windows for the current PIT data (based on the actual data range)
+
+For the current 2020+ data, prefer:
+
+1. train 2020-2021 / test 2022
+2. train 2020-2022 / test 2023
+3. train 2020-2023 / test 2024
+4. train 2020-2024 / test 2025
+5. Optional: train 2020-2025 / test 2026Q1 (if there are enough test rows)
+
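The expanding schedule listed above could be generated along these lines. This is a sketch under assumptions: it takes a plain list of `datetime.date` values instead of a `pd.DatetimeIndex`, the `WindowSpec` fields and the `min_test_rows` parameter are hypothetical, and "partial" is approximated crudely as "the data ends before December of the test year". `weekdays_between` only stands in for a real trading calendar.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class WindowSpec:
    train_start: date
    train_end: date
    test_start: date
    test_end: date


def build_expanding_windows(
    dates,
    *,
    min_train_years=2,
    test_years=1,
    allow_partial_last_test=True,
    min_test_rows=20,
):
    """Build expanding train / calendar-year test windows from the data itself."""
    dates = sorted(dates)
    if not dates:
        return []
    first, last = dates[0], dates[-1]
    windows = []
    test_year = first.year + min_train_years
    while test_year <= last.year:
        test_end_year = test_year + test_years - 1
        train_rows = [d for d in dates if d.year < test_year]
        test_rows = [d for d in dates if test_year <= d.year <= test_end_year]
        # Crude heuristic: the test block is partial if the sample stops early.
        is_partial = test_end_year > last.year or (
            test_end_year == last.year and last.month < 12
        )
        if (
            train_rows
            and len(test_rows) >= min_test_rows
            and (allow_partial_last_test or not is_partial)
        ):
            windows.append(
                WindowSpec(train_rows[0], train_rows[-1], test_rows[0], test_rows[-1])
            )
        test_year += test_years
    return windows


def weekdays_between(start, end):
    """Stand-in for a trading calendar: every Monday-Friday in [start, end]."""
    n_days = (end - start).days
    days = (start + timedelta(days=i) for i in range(n_days + 1))
    return [d for d in days if d.weekday() < 5]


sample = weekdays_between(date(2020, 1, 2), date(2026, 4, 9))
for w in build_expanding_windows(sample):
    print(w.train_start, w.train_end, "->", w.test_start, w.test_end)
```

Because the first test year is derived from the first date in the sample, a 2016-2018 training window can never be produced for data that starts in 2020.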
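+### C. The pipeline must write the windows actually used into the summary
+
+The `windows` field in `frozen_validation_summary.json` / `real_walkforward_summary.json` must come from the **windows actually built**, not from the old constant.
+
+## Acceptance criteria
+
+On the current real PIT data:
+
+- The "first window skipped because its years predate the sample" case no longer occurs
+- There should be **at least 3** valid OOS windows
+- The windows in the summary should be consistent with the data start date
+
+## Tests to add or modify
+
+### Add `code/tests/test_walkforward_window_builder.py`
+
+Cover at least:
+
+1. When the data starts in 2020, invalid windows such as a 2016-2018 train window are not generated
+2. For a given index, the generated windows satisfy `min_train_rows` / `min_test_rows`
+3. The window list in the frozen/report pipeline summary equals the windows actually used
+
+---
+
+# B6. The execution calibration score has unbalanced scales and must be redone after the 5 items above are fixed
+
+> This is a **P1 blocker**: it does not affect whether the main strategy chain runs, but it does affect whether you keep picking the wrong cost/gap parameters.
+
+## Why this is a blocker
+
+Current formula: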
+
+```text
+utility_total_score - 3*tracking_diff_abs_mean - 20*tracking_error_20_p95 - max_drawdown
+```
+
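+Given the current distribution of results, `max_drawdown` is far larger in magnitude than the tracking penalties, so calibration is almost inherently biased toward conservative settings.
+
+## Files and line ranges to modify
+
+### 1) `code/pipelines/calibrate_execution_constraints.py`
+
+- **Lines 63-68**: `_calibration_score`
+- **Line 150**: the `score_formula` written to the recommendation
+
+### 2) `code/tests/test_execution_calibration_pipeline.py`
+
+- **Lines 42-78**: currently only checks that the grid/recommendation are generated, not the new score logic
+
+## Required changes
+
+### Option A (recommended, utility-first)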
+
+```python
+
+def _calibration_score(metrics: Mapping[str, Any]) -> float:
+    utility = float(metrics.get('utility_total_score', 0.0))
+    annual_return = float(metrics.get('annual_return', 0.0))
+    upside_capture = float(metrics.get('upside_capture', 0.0))
+    max_drawdown = float(metrics.get('max_drawdown', 0.0))
+    tracking_abs = float(metrics.get('tracking_diff_abs_mean', 0.0))
+    tracking_p95 = float(metrics.get('tracking_error_20_p95', 0.0))
+
+    return (
+        0.60 * utility
+        + 0.25 * annual_return
+        + 0.15 * upside_capture
+        - 0.50 * max_drawdown
+        - 2.0 * max(0.0, tracking_p95 - 0.003)
+        - 1.0 * max(0.0, tracking_abs - 0.001)
+    )
+```
+
+### Option B (heavier on implementation discipline)
+
+```python
+return (
+    0.50 * utility
+    + 0.20 * sharpe
+    - 0.45 * max_drawdown
+    - 2.5 * max(0.0, tracking_p95 - 0.003)
+    - 1.5 * max(0.0, tracking_abs - 0.001)
+)
+```
+
+### Core requirements
+
+- The tracking penalty must have a "tolerance band" rather than linearly amplifying tracking noise at the 1e-4 level
+- The `score_formula` string must be updated to match
+
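The tolerance-band idea can be illustrated in isolation. The function name `tolerance_band_penalty` is hypothetical; the weight `20` comes from the current formula, and the `0.003` tolerance with weight `2.0` comes from Option A.

```python
def tolerance_band_penalty(value, tolerance, weight):
    """Penalise only the part of `value` that exceeds `tolerance`."""
    return weight * max(0.0, value - tolerance)


# Tracking noise at the 1e-4 level: the old linear term still charges for it,
# while the banded term ignores it.
noise = 1e-4
linear_penalty = 20 * noise
banded_penalty = tolerance_band_penalty(noise, tolerance=0.003, weight=2.0)
print(linear_penalty, banded_penalty)

# A tracking error genuinely outside the band is penalised on the excess only.
print(tolerance_band_penalty(0.005, tolerance=0.003, weight=2.0))
```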
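+## Acceptance criteria
+
+- Calibration no longer consistently favors the most conservative group merely because the `max_drawdown` term has the largest absolute value
+- The top-candidate ranking should be clearly sensitive to annual_return / utility / upside_capture
+
+## Tests to add or modify
+
+### Add `code/tests/test_execution_calibration_score.py`
+
+Cover at least:
+
+1. When tracking is within the tolerance band, it should not outweigh improvements in utility / annual_return
+2. The score should drop significantly only when tracking error is genuinely higher (beyond the tolerance band)
+
+---
+
+# Tests and configuration to change along the way (so old tests do not "protect" the bugs)
+
+## Tests that must change
+
+1. `code/tests/test_policy.py`
+   - Delete the assertion that exposure "must be quantized to 0.25 steps"
+2. `code/tests/test_data_io.py`
+   - Add low-info / constant-column tests
+3. Add `code/tests/test_scores_integrity.py`
+4. Add `code/tests/test_feature_quality_gate.py`
+5. Add `code/tests/test_walkforward_window_builder.py`
+6. Add `code/tests/test_execution_calibration_score.py`
+
+## Configuration that must change
+
+### `code/config/regime.yaml`
+
+In addition to the existing configuration, add at least these settings: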
+
+```yaml
+data_quality:
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_blocking_columns:
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+
+trading:
+  exposure_mode: ladder   # or continuous
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+```
+
+---
+
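+# Definition of Done
+
+Only when all of the following hold may the work move to the next stage (threshold / policy / objective tuning):
+
+1. On the current real PIT data:
+   - `breadth_score` non-null rate > 95%
+   - `crowding_score` non-null rate > 90%
+   - The three hazards are no longer constantly equal to 0.5
+2. The state machine / policy no longer swallow invalid signals through `fillna(0)`
+3. Low-information derived features in the current PIT data are reported by the feature gate
+4. `baseline` and `pro_risk` no longer produce 100% identical exposure paths
+5. Walk-forward has at least 3 valid OOS windows
+6. The calibration score has been changed and the grid / recommendation regenerated
+
+---
+
+# Notes: latent issues that are not blockers but should be checked later
+
+These are not "fix before running" P0 blockers, but they should be handled in the next review round:
+
+1. `code/model/state_machine.py`, lines 66-67:
+   - `days_since_riskoff` is currently written as a cumulative count, not a streak count within an episode
+   - It does not appear to be used yet, but its semantics are wrong
+
+2. `code/backtest/engine.py`, lines 148-149:
+   - `tracking_difference` is currently defined as `net - gross`, which is an "execution implementation deviation", not a "tracking gap of the strategy relative to the benchmark"
+   - If it will be used for execution calibration it can stay, but a clearer name is recommended
+
+3. `code/backtest/frozen_walkforward.py`, lines 201-213:
+   - Candidate selection is still pure train-utility maximization
+   - After B1-B6 are done, consider adding hard constraints / a dominance filter
+
+---
+
+# Execution requirements for Codex (follow strictly)
+
+1. **Make the changes strictly in the order B1 -> B6**; do not skip steps
+2. After completing each blocker:
+   - Run the corresponding tests
+   - Post the list of changed files
+   - Report the key before/after numbers on the current bundle (non-null rate, number of unique hazard values, state counts, candidate equality, number of valid windows)
+3. Until B1-B5 are done, **do not** tune `state_machine` thresholds or `policy` exposure parameters
+4. Do not restructure the directory layout at scale unless it is clearly necessary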
+

+ 820 - 0
research/chinext50_regime_project/chinext50_fullcode_guidance_for_codex_2026-04-10.md

@@ -0,0 +1,820 @@
+# Chinext50 Regime — Full Code Issues Guidance for Codex (2026-04-10)
+
+This document answers the latest issue list, gives an execution order, and calls out global direction problems / hidden bugs.
+
+It is based on:
+- the current full code bundle
+- the current issue list (`gpt_pro_issues_2026-04-10.md`)
+- local replay / inspection of the shipped outputs and code
+- local re-run of tests (`81 passed`)
+
+---
+
+## 0. Executive summary
+
+## What is already correct
+- The semantic split between `stitched_frozen_oos_metrics` and `default_strategy_full_sample_metrics` is the right move.
+- `frozen_walkforward` now exposes `primary_window_success_ratio`, `partial_window_success_ratio`, `hard_pass_window_ratio`, and `selection_mode_distribution`, which are the right diagnostics.
+- The new issue list correctly identifies the current bottleneck area: candidate selection vs policy/offense.
+
+## What is *not* yet correct enough
+There are still **four structural blockers** that should be fixed before another major round of policy tuning:
+
+1. **Stitched OOS is still being compared against full-sample baseline, not same-period baseline.**  
+   This is a semantic bug, not just a presentation detail.
+
+2. **Turnover is effectively counted multiple times in candidate selection.**  
+   It enters:
+   - hard constraints,
+   - `utility_total_score`,
+   - and then again as a direct penalty in `_compute_selection_score()`.
+
+3. **State frequency is still too defensive for a system that wants more upside capture.**  
+   In the shipped stitched OOS ledger, local replay shows roughly:
+   - `risk_off ~ 36.8%`
+   - `chop ~ 32.9%`
+   - `repair ~ 21.8%`
+   - `trend ~ 4.8%`
+   - `euphoric_late ~ 3.8%`
+   Mean executed exposure is only about `0.335`.
+
+4. **The state machine ordering likely lets `repair` shadow `trend`.**  
+   `_raw_state()` checks `repair` before `trend`, so overlapping days are classified as `repair` instead of `trend`.
+
+## Main recommendation
+**Do not tune policy first.**
+
+Correct order:
+1. Fix baseline comparison semantics.
+2. Fix selection-score / turnover double-counting.
+3. Fix state-machine precedence / thresholds.
+4. Then tune policy mapping.
+5. Only after that consider candidate-level robustness filtering.
+
+---
+
+## 1. Direct answers to the 4 current questions
+
+### Q1. After semantics are corrected, should the next round prioritize `backtest/utility.py` or policy mapping?
+
+## Answer
+**Prioritize `utility.py` + `backtest/frozen_walkforward.py` first, not policy mapping.**
+
+### Why
+Because policy tuning is currently evaluated through a candidate-selection layer that is still biased:
+- `utility_total_score` already penalizes turnover heavily.
+- `_compute_selection_score()` uses `utility_total_score` indirectly via `stability_score`.
+- `_compute_selection_score()` then applies an additional direct turnover penalty.
+- hard constraints also reject high-turnover candidates.
+
+So right now, **turnover is represented three times**:
+1. hard constraint
+2. utility
+3. selection score direct penalty
+
+If you tune policy before fixing that, you will likely bias the search toward overly defensive candidates again.
+
+### What to change first
+#### Target files
+- `backtest/utility.py`
+- `backtest/frozen_walkforward.py`
+- `config/regime.yaml`
+- `tests/test_utility.py`
+- `tests/test_frozen_walkforward.py`
+
+#### Exact change
+### A. Split utility into `core_utility` and `net_utility`
+In `backtest/utility.py`, replace the current one-piece logic with:
+
+```python
+from __future__ import annotations
+
+
+def core_utility(
+    sharpe_delta: float,
+    drawdown_improvement: float,
+    upside_capture: float,
+    upside_target: float = 0.55,
+) -> float:
+    return (
+        0.45 * sharpe_delta
+        + 0.35 * drawdown_improvement
+        + 0.20 * (upside_capture - upside_target)
+    )
+
+
+def turnover_penalty(
+    annual_turnover: float,
+    start: float = 8.0,
+    rate: float = 0.010,
+) -> float:
+    return rate * max(0.0, annual_turnover - start)
+
+
+def net_utility(
+    sharpe_delta: float,
+    drawdown_improvement: float,
+    upside_capture: float,
+    annual_turnover: float,
+    upside_target: float = 0.55,
+    turnover_penalty_start: float = 8.0,
+    turnover_penalty_rate: float = 0.010,
+) -> float:
+    return core_utility(
+        sharpe_delta=sharpe_delta,
+        drawdown_improvement=drawdown_improvement,
+        upside_capture=upside_capture,
+        upside_target=upside_target,
+    ) - turnover_penalty(
+        annual_turnover=annual_turnover,
+        start=turnover_penalty_start,
+        rate=turnover_penalty_rate,
+    )
+```
+
+Then update `utility_from_metrics()` to use `annual_turnover` explicitly.
+
+### B. In frozen candidate selection, use **core utility** for stability, not net utility
+In `backtest/frozen_walkforward.py`, inside `_compute_selection_score()`:
+- stop deriving `stability_score` from `utility_total_score`
+- derive it from `core_utility` (or a separately passed `core_utility_score`)
+- keep direct turnover penalty exactly once in `_compute_selection_score()`
+
+Suggested replacement:
+
+```python
+core_utility_score = _clip(
+    (core_utility_value - float(settings['core_utility_floor']))
+    / max(float(settings['core_utility_target']) - float(settings['core_utility_floor']), 1e-12),
+    0.0,
+    score_cap,
+)
+```
+
+And then:
+
+```python
+score = (
+    return_ratio_weight * return_ratio
+    + upside_weight * upside_score
+    + drawdown_weight * drawdown_score
+    + sharpe_delta_weight * sharpe_delta_score
+    + stability_weight * core_utility_score
+    - turnover_penalty
+)
+```
+
+### C. Add config defaults
+In `config/regime.yaml`:
+
+```yaml
+evaluation:
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+
+frozen_validation:
+  candidate_selection:
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+```
+
+### Expected direction of impact
+- Less false punishment of offense.
+- `positive_window_ratio` may rise even without policy changes.
+- Candidate ranking will stop being dominated by turnover aversion.
+- `hard_pass_window_ratio` itself may not jump immediately, but fallback windows will become more interpretable.
+
+### Validation thresholds
+- `tests` all green.
+- In candidate ranking diagnostics, direct turnover penalty appears only once.
+- `positive_window_ratio` should no longer be treated as primary acceptance, but it should stop clustering near zero.
+- Selected-candidate ranking should not change solely due to duplicated turnover effects.
+
+---
+
+### Q2. For stitched OOS underperformance (return still behind, drawdown ratio up), what parameter adjustment order and guardrails should be used?
+
+## Answer
+Use this order:
+
+### Phase 1 — semantic / scoring fixes
+1. Same-period baseline fix
+2. utility / selection-score turnover fix
+
+### Phase 2 — state machine tuning
+3. state precedence fix (`repair` vs `trend`)
+4. threshold tuning in `model/state_machine.py`
+
+### Phase 3 — policy mapping tuning
+5. `trend/chop/repair/euphoric_late` exposure tuning in `model/policy.py`
+
+### Phase 4 — only then widen candidate robustness / frontier logic
+6. candidate robustness tie-break / filter
+
+## Guardrails for each round
+
+### Global acceptance guardrails
+Use these on **stitched OOS vs same-period stitched baseline**, not vs full-sample baseline:
+
+- `drawdown_ratio_vs_baseline <= 0.70`
+- `primary_window_success_ratio >= 0.50` (minimum)
+- `primary_window_success_ratio >= 0.60` (target)
+- `hard_pass_window_ratio >= 0.80`
+- `selection_mode_distribution['frontier_fallback_no_hard_pass'] <= 1`
+- `stitched_oos_upside_capture >= 0.40` (near-term target), then `>= 0.50`
+- `mean_executed_exposure <= 0.50` unless drawdown ratio stays <= `0.65`
+
+### Stop conditions
+Stop a tuning branch and revert if any one of these happens:
+- `drawdown_ratio_vs_baseline > 0.75`
+- `hard_pass_window_ratio < 0.60`
+- `primary_window_success_ratio < 0.50`
+- `frontier_fallback_no_hard_pass` windows increase vs current baseline
+- selected candidate collapses to one candidate in `>= 80%` of windows and performance does not improve
+
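The stop conditions above can be checked mechanically after each tuning round. The sketch below is illustrative only: the dictionary keys `fallback_windows`, `dominant_candidate_ratio`, and `annual_return` are assumed names, and "performance does not improve" is interpreted here as annual return not exceeding the reference run.

```python
def violated_stop_conditions(current, reference):
    """Return the names of all stop conditions that the current run trips.

    `current` and `reference` are plain dicts of summary metrics; the
    reference is the run that the tuning branch started from.
    """
    violations = []
    if current["drawdown_ratio_vs_baseline"] > 0.75:
        violations.append("drawdown_ratio_vs_baseline")
    if current["hard_pass_window_ratio"] < 0.60:
        violations.append("hard_pass_window_ratio")
    if current["primary_window_success_ratio"] < 0.50:
        violations.append("primary_window_success_ratio")
    if current["fallback_windows"] > reference["fallback_windows"]:
        violations.append("frontier_fallback_no_hard_pass")
    # Assumption: "performance" is proxied by annual return.
    if (
        current["dominant_candidate_ratio"] >= 0.80
        and current["annual_return"] <= reference["annual_return"]
    ):
        violations.append("candidate_collapse")
    return violations


reference_run = {
    "drawdown_ratio_vs_baseline": 0.68,
    "hard_pass_window_ratio": 0.60,
    "primary_window_success_ratio": 0.50,
    "fallback_windows": 2,
    "dominant_candidate_ratio": 0.60,
    "annual_return": 0.08,
}
tuned_run = dict(reference_run, drawdown_ratio_vs_baseline=0.78, fallback_windows=3)
print(violated_stop_conditions(tuned_run, reference_run))
```

Any non-empty result means the branch should be reverted, since the rule is "any one of these".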
+## Exact tuning order
+
+### Round 1 — fix state precedence before changing exposure values
+#### Target file
+- `model/state_machine.py`
+
+#### Problem
+`repair` is checked before `trend`, so overlap days can never become `trend`.
+
+#### Exact change
+Reorder `_raw_state()` to evaluate `trend/euphoric_late` before `repair`, **or** add an exclusivity clause.
+
+Recommended logic:
+
+```python
+def _raw_state(row: pd.Series) -> str:
+    if row['down_hazard'] >= 0.66 or (row['stress_score'] >= 0.90 and row['trend_score'] <= -0.15):
+        return 'risk_off'
+
+    if row['trend_score'] >= 0.35 and row['breadth_score'] >= -0.05 and row['stress_score'] <= 0.55:
+        if row['crowding_score'] >= 0.82 or row['rebound_hazard'] >= 0.78:
+            return 'euphoric_late'
+        return 'trend'
+
+    if (
+        row['repair_hazard'] >= 0.60
+        and row['stress_score'] <= 0.75
+        and row['d_stress'] <= 0.0
+        and row['trend_score'] < 0.35
+    ):
+        return 'repair'
+
+    return 'chop'
+```
+
+#### Expected direction
+- `trend` share rises.
+- `repair` becomes cleaner and more transitional.
+- `risk_off` frequency falls modestly.
+- Offense improves without immediately changing mapping.
+
+#### Validation
+- In stitched ledger, `trend + euphoric_late` should rise above `~12%` before policy tuning.
+- `risk_off` should fall below `~32%`.
+- `primary_window_success_ratio` must not fall.
+
+---
+
+### Round 2 — tune policy mapping only after state mix improves
+#### Target file
+- `model/policy.py`
+- `config/regime.yaml`
+
+#### Exact parameter changes
+Start with one round only:
+
+```yaml
+policy:
+  trend: 0.95
+  euphoric_late: 0.70
+  chop: 0.35
+  risk_off: 0.00
+  repair_rebound_base: 0.40
+  repair_rebound_max: 0.85
+trading:
+  max_daily_exposure_change: 0.35
+```
+
+Then in `_repair_exposure()` leave structure intact, but use the new base/max.
+
+#### Expected direction
+- Increase average exposure from `~0.335` toward `0.38~0.45`
+- Improve upside capture first
+- Slightly raise turnover and drawdown
+
+#### Validation
+- `stitched_oos_upside_capture` improves by at least `+0.05` absolute
+- `drawdown_ratio_vs_baseline <= 0.70`
+- `annual_turnover <= 22` unless annual return improves materially
+
+---
+
+### Round 3 — only if offense is still too weak
+If after Round 2, `trend + euphoric_late` is still below `~15%` and upside capture < `0.40`, then make a second threshold pass:
+
+#### Additional threshold loosening
+```text
+risk_off:
+  down_hazard threshold: 0.66 -> 0.68
+  crash stress threshold: 0.90 -> 0.92
+trend:
+  trend_score threshold: 0.35 -> 0.30
+  stress ceiling: 0.55 -> 0.60
+repair:
+  repair_hazard threshold: 0.60 -> 0.62
+```
+
+Do **not** change both state thresholds and policy values in the same experiment branch.
+
+---
+
+### Q3. `primary_window_success_ratio=0.5` just clears the floor, but `hard_pass_window_ratio=0.6` is low. Which constraints/weights should be adjusted first to get hard-pass back to >=0.8 without reverting to a single candidate?
+
+## Answer
+First repair the selection semantics, then loosen the hard constraints **symmetrically**.
+
+### Why not just loosen turnover?
+Because the current no-hard-pass windows are split between:
+- candidates failing turnover,
+- and candidates failing upside.
+
+If you only relax turnover, you likely make `pro_risk` dominate even more.
+If you only relax upside, you may admit weak low-upside candidates.
+
+## Recommended sequence
+
+### Step 1 — modest constraint widening
+#### Target file
+- `config/regime.yaml`
+
+#### Change candidate-selection defaults to:
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    upside_capture_min: 0.26
+    max_drawdown_ratio_vs_benchmark: 0.75
+    annual_turnover_soft_max: 19.0
+    annual_return_override_abs: 0.04
+    annual_return_override_ratio: 0.35
+```
+
+### Why these exact moves
+- `upside_capture_min 0.28 -> 0.26` helps borderline baseline / balanced candidates pass.
+- `drawdown_ratio 0.72 -> 0.75` is a mild widening, not a capitulation.
+- `annual_turnover_soft_max 18 -> 19` helps offense candidates pass without fully opening the gate.
+- lower return override thresholds slightly reduce “turnover fail” harshness.
+
+### Step 2 — rebalance selection score weights
+In `config/regime.yaml`:
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    return_ratio_weight: 0.25
+    upside_weight: 0.25
+    drawdown_weight: 0.25
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.15
+    turnover_penalty_per_unit: 0.012
+    turnover_penalty_start: 13.0
+```
+
+### Why
+Current score is still too offense-sensitive in a way that can amplify a single candidate once constraints are relaxed. Raising `stability_weight` (after it is de-duplicated from turnover) helps prevent one-dimensional winner-take-all behavior.
+
+### Step 3 — add a simple candidate-diversity warning, not a hard rule
+#### Target file
+- `backtest/frozen_walkforward.py`
+
+Add summary diagnostics:
+- `selection_margin_to_second_best`
+- `dominant_candidate_ratio`
+
+Emit warning if:
+- one candidate wins `>= 0.80` of windows **and**
+- median margin to 2nd best `< 0.03`
+
+This is a warning only, not a hard rejection.
+
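A minimal sketch of how the two diagnostics and the warning rule might be computed. The input shape, a list of `(selected_candidate, best_score, second_best_score)` tuples per window, and the function name are assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def candidate_diversity_diagnostics(
    window_results,
    *,
    dominance_threshold=0.80,
    margin_threshold=0.03,
):
    """Summarise how concentrated the per-window selections are.

    window_results: list of (selected_candidate, best_score, second_best_score).
    """
    if not window_results:
        return {
            "dominant_candidate": None,
            "dominant_candidate_ratio": 0.0,
            "selection_margin_to_second_best": [],
            "warn": False,
        }
    winners = Counter(name for name, _, _ in window_results)
    dominant_candidate, wins = winners.most_common(1)[0]
    dominant_ratio = wins / len(window_results)
    margins = [best - second for _, best, second in window_results]
    # Warning only: one candidate wins almost everywhere, yet by thin margins.
    warn = dominant_ratio >= dominance_threshold and median(margins) < margin_threshold
    return {
        "dominant_candidate": dominant_candidate,
        "dominant_candidate_ratio": dominant_ratio,
        "selection_margin_to_second_best": margins,
        "warn": warn,
    }


results = [
    ("pro_risk", 0.51, 0.50),
    ("pro_risk", 0.48, 0.47),
    ("pro_risk", 0.55, 0.54),
    ("pro_risk", 0.50, 0.49),
    ("baseline", 0.52, 0.40),
]
print(candidate_diversity_diagnostics(results))
```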
+### Expected direction
+- `hard_pass_window_ratio` should recover toward `0.8~1.0`
+- fallback windows should shrink
+- candidate distribution should stay mixed
+
+### Validation
+Accept this round only if all hold:
+- `hard_pass_window_ratio >= 0.80`
+- no candidate selected in more than `80%` of processed windows unless annual return clearly improves
+- `primary_window_success_ratio >= 0.50`
+- stitched OOS drawdown ratio does not worsen beyond `0.70`
+
+---
+
+### Q4. Should candidate-level probability / robustness filtering be introduced now to reduce frontier fallback frequency?
+
+## Answer
+**Not as a blocking step.**
+
+Do not add a hard candidate-probability filter yet.
+First fix:
+1. same-period baseline comparison
+2. turnover double-counting
+3. state precedence / threshold issues
+
+Then add a **light robustness tie-breaker**, not a hard gate.
+
+## Recommended approach
+#### Target file
+- `backtest/frozen_walkforward.py`
+
+Add two new diagnostics per candidate per train window:
+
+### A. Constraint margin
+A normalized measure of how far inside the feasible set the candidate is.
+
+```python
+def _constraint_margin(metrics, settings) -> float:
+    upside_margin = (metrics['upside_capture'] - settings['upside_capture_min']) / max(settings['upside_capture_min'], 1e-12)
+    drawdown_ratio = metrics['max_drawdown'] / max(metrics['benchmark_max_drawdown'], 1e-12)
+    drawdown_margin = (settings['max_drawdown_ratio_vs_benchmark'] - drawdown_ratio) / max(settings['max_drawdown_ratio_vs_benchmark'], 1e-12)
+    turnover_margin = (settings['annual_turnover_soft_max'] - metrics['annual_turnover']) / max(settings['annual_turnover_soft_max'], 1e-12)
+    return float(min(upside_margin, drawdown_margin, turnover_margin))
+```
+
+### B. Rank-stability diagnostic
+Compute rolling candidate rank over the last 2 completed windows and expose:
+- `prior_rank_mean`
+- `prior_rank_std`
+
+## How to use it
+Only for tie-break / fallback ranking:
+
+```python
+fallback_rank = selection_score - 0.50 * violation_distance + 0.10 * clip(constraint_margin, -1.0, 1.0)
+```
+
+Do **not** hard-drop candidates on this basis yet.
+
+### Expected direction
+- fewer arbitrary frontier fallback selections
+- more reproducible fallback choices
+- lower chance of a single noisy candidate dominating
+
+### Validation
+- fallback windows should drop from `2` to `<=1`
+- if fallback still occurs, chosen candidate should have smallest violation distance and non-negative margin preference
+
+---
+
+## 2. Must-change blockers
+
+## Blocking B1 — Compare stitched OOS against same-period stitched baseline
+
+### Problem
+`comparison.stitched_oos_vs_baseline` currently compares:
+- stitched OOS strategy metrics
+against
+- full-sample baseline metrics
+
+That is a horizon mismatch.
+
+### Why this is serious
+On local replay, using the shipped stitched ledger:
+- current comparison reports `annual_return_delta ≈ -0.0951`
+- but when compared against buy-and-hold on the **same stitched OOS dates**, annual return delta flips to roughly `+0.0299`
+
+That is not a small drift. It can reverse the sign of the headline return conclusion.
+
+### Target files
+- `pipelines/real_walkforward_report.py`
+- `tests/test_real_walkforward_report_pipeline.py`
+
+### Exact change
+1. Build a same-period baseline ledger aligned to stitched dates.
+2. Compute `baseline_stitched_oos_metrics`.
+3. Use that for `comparison.stitched_oos_vs_baseline`.
+4. Keep `baseline_full_sample_metrics` only as a reference section.
+
+Recommended implementation sketch:
+
+```python
+def _baseline_metrics_on_same_dates(stitched_ledger: pd.DataFrame, config: Mapping[str, Any]) -> dict[str, Any]:
+    if stitched_ledger.empty:
+        return _metrics_from_ledger(pd.DataFrame(), config)
+    baseline_returns = stitched_ledger['asset_exec_return']
+    baseline_turnover = pd.Series(0.0, index=stitched_ledger.index)
+    baseline_tracking = pd.Series(0.0, index=stitched_ledger.index)
+    metrics = compute_metrics(
+        strategy_returns=baseline_returns,
+        benchmark_returns=baseline_returns,
+        turnover=baseline_turnover,
+        tracking_difference=baseline_tracking,
+        annualization=int(dict((config or {}).get('trading', {})).get('annualization', 252)),
+    )
+    out = _normalize_metrics(metrics)
+    out['utility_total_score'] = float(utility_from_metrics(out))
+    out['utility_status'] = utility_status(out['utility_total_score'])
+    return out
+```
+
+Then add to summary:
+
+```python
+'baseline_stitched_oos_metrics': baseline_stitched_metrics,
+```
+
+And point:
+
+```python
+stitched_vs_baseline = _comparison_against_baseline(stitched_oos_metrics, baseline_stitched_metrics)
+```
+
+### Validation
+- new summary includes both:
+  - `baseline_stitched_oos_metrics`
+  - `baseline_full_sample_metrics`
+- stitched comparison uses same-date baseline
+- tests explicitly assert this semantic split
+
+---
+
+## Blocking B2 — Remove turnover double counting from candidate selection
+
+### Problem
+Turnover is counted:
+1. in hard constraints
+2. in `utility_total_score`
+3. again in `_compute_selection_score()`
+
+### Target files
+- `backtest/utility.py`
+- `backtest/frozen_walkforward.py`
+- `tests/test_utility.py`
+- `tests/test_frozen_walkforward.py`
+
+### Exact change
+- implement `core_utility()` and `turnover_penalty()`
+- selection stability must be based on `core_utility`, not `net utility`
+- direct turnover penalty remains once in selection score
+
+### Validation
+- tests prove no duplicate turnover representation in selection score
+- candidate ranking changes are explainable by score components
+
+---
+
+## Blocking B3 — Fix state precedence so `repair` does not shadow `trend`
+
+### Problem
+`repair` is evaluated before `trend`, which can suppress offense even when trend conditions are already acceptable.
+
+### Target files
+- `model/state_machine.py`
+- `tests/test_policy.py` or a new `tests/test_state_machine.py`
+
+### Exact change
+- reorder `_raw_state()`
+- or enforce `trend_score < trend_threshold` inside repair
+
+### Validation
+- add tests with overlapping row conditions
+- assert overlap resolves to `trend`, not `repair`
+
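A sketch of what such an overlap test might look like. Plain dicts stand in for `pd.Series` rows, and `raw_state` restates the ordering recommended earlier in this document so that the example is self-contained; the specific row values are made up for illustration.

```python
def raw_state(row):
    """State classification with trend evaluated before repair."""
    if row["down_hazard"] >= 0.66 or (
        row["stress_score"] >= 0.90 and row["trend_score"] <= -0.15
    ):
        return "risk_off"
    if (
        row["trend_score"] >= 0.35
        and row["breadth_score"] >= -0.05
        and row["stress_score"] <= 0.55
    ):
        if row["crowding_score"] >= 0.82 or row["rebound_hazard"] >= 0.78:
            return "euphoric_late"
        return "trend"
    if (
        row["repair_hazard"] >= 0.60
        and row["stress_score"] <= 0.75
        and row["d_stress"] <= 0.0
        and row["trend_score"] < 0.35
    ):
        return "repair"
    return "chop"


# A row that satisfies BOTH the trend conditions and the old repair conditions.
overlap_row = {
    "down_hazard": 0.20,
    "stress_score": 0.40,
    "trend_score": 0.50,
    "breadth_score": 0.10,
    "crowding_score": 0.30,
    "rebound_hazard": 0.30,
    "repair_hazard": 0.80,
    "d_stress": -0.05,
}

# The same row with a weak trend should still be allowed to resolve to repair.
weak_trend_row = dict(overlap_row, trend_score=0.10)

print(raw_state(overlap_row), raw_state(weak_trend_row))
```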
+---
+
+## Blocking B4 — Keep `positive_window_ratio` as diagnostic only
+
+### Problem
+`positive_window_ratio` is still based on `test_utility_total_score > 0`.
+That is acceptable as a secondary diagnostic, but it is not a primary acceptance metric.
+
+### Target files
+- `backtest/frozen_walkforward.py`
+- `pipelines/real_walkforward_report.py`
+
+### Exact change
+- leave field in place for backward compatibility
+- add explicit comment / report note that primary acceptance uses `primary_window_success_ratio`
+
+### Validation
+- report labels `positive_window_ratio` as diagnostic
+- acceptance logic does not rely on it
+
+---
+
+## 3. High-impact changes
+
+## High impact H1 — Tune state thresholds after B1-B3
+
+### Target file
+- `model/state_machine.py`
+- `config/regime.yaml`
+
+### First-pass thresholds
+Use:
+
+```text
+risk_off:
+  down_hazard >= 0.66
+  or (stress_score >= 0.90 and trend_score <= -0.15)
+
+trend:
+  trend_score >= 0.35
+  breadth_score >= -0.05
+  stress_score <= 0.55
+
+euphoric_late:
+  crowding_score >= 0.82 or rebound_hazard >= 0.78
+
+repair:
+  repair_hazard >= 0.60
+  stress_score <= 0.75
+  d_stress <= 0.0
+  trend_score < 0.35
+```
+
+### Expected direction
+- more trend days
+- less needless repair persistence
+- risk_off slightly reduced
+
+### Validation
+- stitched ledger state mix moves toward:
+  - `trend + euphoric_late >= 0.12`
+  - `risk_off <= 0.32`
+
+---
+
+## High impact H2 — Tune exposure mapping only after state mix improves
+
+### Target file
+- `model/policy.py`
+- `config/regime.yaml`
+
+### First-pass policy values
+
+```yaml
+policy:
+  trend: 0.95
+  euphoric_late: 0.70
+  chop: 0.35
+  risk_off: 0.00
+  repair_rebound_base: 0.40
+  repair_rebound_max: 0.85
+trading:
+  max_daily_exposure_change: 0.35
+```
+
+### Expected direction
+- higher upside capture
+- mild increase in drawdown and turnover
+
+### Validation
+- upside capture +0.05 absolute minimum
+- drawdown ratio still <= 0.70
+
+---
+
+## High impact H3 — Restore hard-pass ratio to >= 0.8 without collapsing to one candidate
+
+### Target file
+- `config/regime.yaml`
+
+### Candidate selection defaults
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    upside_capture_min: 0.26
+    max_drawdown_ratio_vs_benchmark: 0.75
+    annual_turnover_soft_max: 19.0
+    annual_return_override_abs: 0.04
+    annual_return_override_ratio: 0.35
+    return_ratio_weight: 0.25
+    upside_weight: 0.25
+    drawdown_weight: 0.25
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.15
+    turnover_penalty_per_unit: 0.012
+    turnover_penalty_start: 13.0
+```
+
+### Expected direction
+- fewer no-hard-pass windows
+- more balanced candidate competition
+
+### Validation
+- `hard_pass_window_ratio >= 0.80`
+- no candidate wins more than `80%` of processed windows unless returns clearly improve
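The single-candidate check can be computed directly from the per-window winners. This is a sketch only; it assumes the frozen validation output can be reduced to a list with one selected candidate name per processed window, and `dominant_candidate` is a hypothetical helper.

```python
from collections import Counter

def dominant_candidate(selected, max_share=0.80):
    """Return (name, share, exceeds_cap) for the most frequently selected candidate."""
    name, wins = Counter(selected).most_common(1)[0]
    share = wins / len(selected)
    return name, share, share > max_share
```

For example, five windows that all pick `defensive` give a share of 1.0 and trip the cap, while a 3/1/1 split gives 0.6 and does not.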
+
+---
+
+## 4. Nice-to-have improvements
+
+## Nice N1 — Add candidate robustness tie-breaker
+- target: `backtest/frozen_walkforward.py`
+- use `constraint_margin` + prior rank stability only in fallback mode
+
+## Nice N2 — Expand candidate grid more orthogonally
+Current candidates move several knobs together. Add a wider but more interpretable grid:
+- trend-only variants
+- chop-only variants
+- repair-only variants
+- cap-tightness variants
+
+This makes attribution easier.
+
+## Nice N3 — Export state-mix and exposure-mix diagnostics automatically
+Add to summary:
+- state distribution
+- target exposure distribution
+- executed exposure mean / median
+
+This prevents future “policy vs state” ambiguity.
+
+---
+
+## 5. Additional global direction issues / hidden bugs
+
+## Hidden bug G1 — stitched-vs-full-sample baseline horizon mismatch
+Already covered in B1. This is the biggest remaining semantic bug.
+
+## Hidden bug G2 — mixed upside targets across layers
+Current system uses different upside anchors in different places:
+- utility: effectively `0.75`
+- candidate hard-pass min: `0.28`
+- selection target: `0.45`
+- window success min: `0.25`
+
+This is not automatically wrong, but right now it is not explicitly documented as stage-specific.
+
+### Recommendation
+Document as:
+- utility upside target = medium-term aspiration (`0.55` suggested)
+- hard-pass min = permissive candidate screen (`0.26`)
+- selection target = ranking target (`0.45`)
+- window success min = minimal OOS viability (`0.25`)
+
+## Hidden bug G3 — `stability_score` is not true stability
+Right now it is a transformed utility proxy, not cross-window robustness.
+Rename conceptually or implement actual robustness later.
+
+## Hidden bug G4 — candidate grid is not orthogonal
+`defensive`, `balanced_capture`, `pro_risk` each move multiple levers at once. That makes attribution noisy.
+
+## Hidden bug G5 — report consumers can still misuse `positive_window_ratio`
+Backward-compatible field should remain, but documentation should demote it.
+
+---
+
+## 6. Recommended execution order for Codex
+
+1. **B1** same-period stitched baseline fix
+2. **B2** utility / selection-score turnover de-duplication
+3. **B3** state precedence fix
+4. **H1** threshold tuning pass
+5. **H2** policy tuning pass
+6. **H3** hard-pass restoration / candidate-selection rebalance
+7. **N1** robustness tie-breaker (only after the above)
+
+Do **not** start with H2.
+
+---
+
+## 7. Minimal acceptance checklist for the next revision
+
+The next revision is acceptable only if all hold:
+
+- `comparison.stitched_oos_vs_baseline` uses same-period baseline
+- `hard_pass_window_ratio >= 0.80`
+- `primary_window_success_ratio >= 0.50` minimum, `>= 0.60` target
+- `frontier_fallback_no_hard_pass <= 1`
+- `stitched_oos_upside_capture >= 0.40`
+- `drawdown_ratio_vs_baseline <= 0.70`
+- no obvious single-candidate collapse unless annual return and utility improve materially
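The numeric items of the checklist above can be expressed as one function. This is a hedged sketch: the metric dictionary and the boolean key `stitched_baseline_is_same_period` are hypothetical stand-ins for whatever the report actually exposes, and the single-candidate collapse item is left out because it needs a judgment call.

```python
def next_revision_failures(metrics):
    """Return the names of checklist items that fail; an empty list means acceptable."""
    checks = {
        # hypothetical flag for "stitched comparison uses the same-period baseline"
        "same_period_baseline": metrics["stitched_baseline_is_same_period"] is True,
        "hard_pass_window_ratio": metrics["hard_pass_window_ratio"] >= 0.80,
        "primary_window_success_ratio": metrics["primary_window_success_ratio"] >= 0.50,
        "frontier_fallback_no_hard_pass": metrics["frontier_fallback_no_hard_pass"] <= 1,
        "stitched_oos_upside_capture": metrics["stitched_oos_upside_capture"] >= 0.40,
        "drawdown_ratio_vs_baseline": metrics["drawdown_ratio_vs_baseline"] <= 0.70,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failing names rather than a single boolean keeps the report readable when more than one item misses.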
+
+---
+
+## 8. Short instruction block to prepend when feeding Codex
+
+Use this before pasting the task:
+
+```text
+Please implement the changes in this document in the exact order B1 -> B2 -> B3 -> H1 -> H2 -> H3.
+
+Rules:
+1. Do not tune policy before B1-B3 are complete.
+2. Keep backward-compatible fields unless explicitly told to remove them.
+3. After each block:
+   - list changed files
+   - run targeted tests
+   - report metric deltas
+4. Do not silently change unrelated configs or thresholds.
+```

+ 691 - 0
research/chinext50_regime_project/chinext50_harden_derived_breadth_direction_handoff_2026-04-09.md

@@ -0,0 +1,691 @@
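+# ChiNext 50 `harden-derived-breadth`: Bottleneck Diagnosis, Fix Plan, and Direction Control (2026-04-09)
+
+## Conclusion
+
+On the surface, the blocker in the current bundle is an **unstable industry metadata source**. After reading the code, the OpenSpec change, and the artifacts, my conclusion is stronger:
+
+**This is not a single-provider problem. The metadata / weight / membership semantic layer has not actually been productionized.**
+
+So the most important next step is not to keep tuning retries around Akshare / Mairui, but to rebuild this layer so that:
+
+1. **offline snapshots come first**; the runtime API only performs incremental refresh and is no longer the primary data source;
+2. **historical membership and historical weights** become first-class citizens; backfilling the full history with the latest 50 constituents is no longer allowed;
+3. **the metadata gate is upgraded from statistical integrity to semantic integrity**;
+4. until this layer is fixed, **the economic results of the current real walk-forward are not treated as a conclusive verdict on the model**.
+
+In one sentence:
+
+> The direct fix for the current bottleneck is **hardening the data semantic layer**;
+> the real risk to the project's direction is **mistaking "it runs end to end" for "the economic logic holds"**.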
+
+---
+
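+## What this bundle actually proved
+
+### What is already right
+
+- full50 strict ingestion now runs end to end;
+- the breadth statistical integrity gate passes;
+- the cache-boundary tolerance issue for non-trading days has been fixed;
+- the e2e report / frozen validation / calibration flow is wired through.
+
+This shows the engineering skeleton is sound.
+
+### But the current evidence also shows two things
+
+#### 1) The metadata pipeline has indeed degraded
+
+From the artifacts:
+
+- `industry_unknown_ratio = 1.0`
+- `industry_unique_count = 1`
+- `sector_concentration_mode = weight_hhi_proxy`
+- `metadata_error_symbol_count = 50`
+- `meta_cache.hit_count = 50, miss_count = 0`
+
+This shows that the current full50 rerun is served entirely from cache, but the industry results in the cache are themselves degraded.
+
+#### 2) The economics still do not clear the production bar
+
+From `real_walkforward_summary.json` / `frozen_validation_summary.json`:
+
+- `positive_window_ratio = 0.4`
+- **defensive** was selected in all 5 windows
+- strategy `annual_return = 0.0673`
+- baseline `annual_return = 0.1463`
+- strategy `utility_total_score = -0.1104`
+- baseline `utility_total_score = 0.0348`
+- `utility_delta_vs_baseline = -0.1452`
+- `upside_capture = 0.2593`
+- `annual_turnover = 18.12`
+
+These numbers say:
+
+**The current system is not "failing to reduce risk"; it is "too defensive, at the clear cost of upside capture", so it still does not hold up economically.**
+
+In the calibration artifacts, the `calibration_score` of the best combination is still negative, which indicates that the problem cannot be rescued by adjusting execution cost a little; it is more structural.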
+
+---
+
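+## The real bottleneck order, as I see it
+
+### P0: fix the data semantic layer first
+
+This is the top priority right now.
+
+Many of the core breadth features are not merely "less interpretable"; they **may already be contaminated by wrong weights or wrong membership**:
+
+- `weighted_ret_5`
+- `top1_contribution_5`
+- `top3_contribution_5`
+- `top10_contribution_5`
+- `sector_concentration_20`
+
+If this layer is wrong, the downstream regime / policy / walk-forward results will all be misled.
+
+### P1: then rerun the full real walk-forward
+
+Only after P0 is fixed, and frozen validation / report / calibration are rerun, can we tell how much of the current defensive bias is a genuine strategy problem and how much comes from the data layer.
+
+### P2: only then discuss tuning the economics
+
+If, after P0 is fixed, we still see:
+
+- defensive winning every window
+- utility still negative
+- upside capture still too low
+
+then that is the point at which the real thing to change is the **controller / utility / candidate set**.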
+
+---
+
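+## Hidden issues I found (more serious than the blocker doc states)
+
+## 1. `_meta_cache.json` currently "caches bad state" and almost never self-heals
+
+The current logic of `_resolve_meta_with_cache()` in `breadth_builder.py` is:
+
+- if the cache holds meta for the symbol, return it directly;
+- the cache has no `fetched_at` / `expires_at` / `status` / `freshness`;
+- records with `industry='unknown'` or with an error are also treated as valid cache hits.
+
+This means:
+
+**Once a run caches every symbol as unknown/error, later reruns keep consuming that bad cache even after the provider recovers.**
+
+This matches the current artifacts exactly:
+
+- `meta_cache.hit_count = 50`
+- `industry_unknown_ratio = 1.0`
+
+In other words, the provider did not just fail "this run"; the cache mechanism has frozen the degraded state in place.
+
+### This must be fixed immediately
+
+Change the meta cache from a "pure result cache" into a "stateful snapshot cache".
+
+Add at least these fields:
+
+- `fetched_at`
+- `expires_at`
+- `status`: `resolved / partial / unknown / error`
+- `provider_attempted`
+- `provider_resolved`
+- `error_code`
+- `snapshot_version`
+
+And introduce two TTLs:
+
+- **positive-cache TTL**:
+  - `industry`: 30 days
+  - `weight / float-share related`: 5 to 7 days (if still retained)
+- **negative-cache TTL**:
+  - `429 / connection-abort`: 6 hours
+  - may be extended to 24 hours on consecutive failures, but never indefinitely
+
+Whenever the cached status is `unknown` or `error` and the negative-cache TTL has elapsed, a refresh must be forced.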
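The two-TTL rule can be sketched as a single freshness predicate. This is an illustration under stated assumptions, not the repo's implementation: `needs_refresh` is a hypothetical helper, any status other than `resolved` is treated as a negative entry, and the widening to 24 hours is assumed to start at three consecutive failures.

```python
from datetime import datetime, timedelta, timezone

POSITIVE_TTL = timedelta(days=30)       # industry records
NEGATIVE_TTL = timedelta(hours=6)       # 429 / connection-abort
NEGATIVE_TTL_MAX = timedelta(hours=24)  # cap after repeated failures

def parse_utc(timestamp):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is mapped to +00:00."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def needs_refresh(record, now, consecutive_failures=0):
    """Decide whether a cached meta record must be re-fetched."""
    if record is None:
        return True
    age = now - parse_utc(record["fetched_at"])
    if record.get("status") == "resolved":
        return age > POSITIVE_TTL
    # unknown / error / partial: negative cache, bounded so it can self-heal
    ttl = NEGATIVE_TTL_MAX if consecutive_failures >= 3 else NEGATIVE_TTL
    return age > ttl
```

With this rule, a record cached as `unknown` yesterday is retried today, which is exactly the self-healing behaviour the current cache lacks.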
+
+---
+
+## 2. `metadata_provider_counts` is currently semantically misleading
+
+In the current `_resolve_symbol_meta()`, `provider` defaults to `akshare`.
+
+If:
+
+- Akshare raises an exception
+- Mairui also fails or returns unknown
+
+the function may still return:
+
+- `provider = 'akshare'`
+- `meta['industry'] = 'unknown'`
+- `error` carrying both failure messages
+
+So this entry in the current summary:
+
+- `metadata_provider_counts = {'akshare': 50}`
+
+does not mean "50 symbols had their metadata successfully supplied by akshare". It is closer to:
+
+**For 50 symbols the default provider label stayed at akshare, even though no valid industry was actually resolved.**
+
+### This has two bad consequences
+
+1. the diagnostics mislead and make it easy to overestimate provider quality;
+2. any later gate based on provider counts will reach the wrong conclusion.
+
+### Fix
+
+Split provider into two fields:
+
+- `provider_attempted_chain`: `['akshare', 'mairui']`
+- `provider_resolved`: `akshare / mairui / snapshot / error / unknown`
+
+`metadata_provider_counts` counts only `provider_resolved`.
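A minimal sketch of the corrected counting, assuming each cached record already carries the two proposed fields. The function body and the sample records are illustrative only.

```python
from collections import Counter

def metadata_provider_counts(records):
    """Count symbols by the provider that actually resolved them."""
    return dict(Counter(rec["provider_resolved"] for rec in records.values()))

# Illustrative records: both symbols tried akshare first, neither was resolved by it.
sample_records = {
    "300750": {"provider_attempted_chain": ["akshare", "mairui"], "provider_resolved": "mairui"},
    "300059": {"provider_attempted_chain": ["akshare", "mairui"], "provider_resolved": "unknown"},
}
```

Here `metadata_provider_counts(sample_records)` reports one `mairui` and one `unknown`, and `akshare` does not appear at all, so a failed lookup can no longer inflate a provider's apparent success.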
+
+---
+
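+## 3. `float_shares` falls back to `1.0` on failure, which directly contaminates the weighted features
+
+In the current code, if float shares cannot be fetched, the end result is:
+
+- `float_shares[symbol] = 1.0`
+
+Then `_build_required_breadth_columns()` uses:
+
+- `caps = close * float_shares`
+- `weights = caps / caps.sum()`
+
+This means:
+
+**When float shares fail for every symbol, the system does not fall back to "equal weight"; it falls back to "price weighting".**
+
+This directly contaminates:
+
+- `weighted_ret_5`
+- `top1/top3/top10_contribution_5`
+- `sector_concentration_20`
+
+So the current blocker is definitely not "just reduced industry interpretability".
+
+**It has escalated to "the weight features may be semantically wrong".**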
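A small worked example makes the price-weighting effect visible. It mirrors the two formulas quoted above with plain dictionaries; the symbols and prices are made up for illustration.

```python
def cap_weights(close, float_shares):
    """weights = caps / caps.sum(), with caps = close * float_shares."""
    caps = {symbol: close[symbol] * float_shares[symbol] for symbol in close}
    total = sum(caps.values())
    return {symbol: cap / total for symbol, cap in caps.items()}

close = {"A": 10.0, "B": 40.0, "C": 50.0}          # illustrative prices
fallback_shares = {symbol: 1.0 for symbol in close}  # every lookup failed

fallback_weights = cap_weights(close, fallback_shares)
```

With every float share at `1.0`, the weights come out as 0.10 / 0.40 / 0.50, proportional to price, rather than one third each. A high-priced stock therefore dominates `weighted_ret_5` and the topN contributions regardless of its real index weight.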
+
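+### Fix
+
+Do not keep pulling float shares from `stock_individual_info_em` as the long-term solution.
+
+For this project, **the most robust approach is not to reconstruct weights from stock metadata, but to take the official index constituent weights directly.**
+
+That is:
+
+- `index_weight` → primary source of weights
+- `stock_basic` / `index_member_all` → source of industry classification
+- `daily_basic` → supplement only when additional float share capital or float market cap is needed
+
+With this:
+
+- membership is covered
+- historical weights are covered
+- there is no longer any need to piece float shares together from runtime meta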
+
+---
+
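+## 4. The current breadth has a clear survivorship bias
+
+This point is critical, and the blocker doc does not call it out.
+
+The current flow is:
+
+- pull the "latest constituents"
+- take the 50 current constituents
+- pull the full 2020-2026 history for those 50 symbols
+- build the entire breadth series from that history
+
+`_fetch_constituents_akshare()` does parse `entry_date`, but downstream **never uses it**.
+
+In other words, the current system effectively:
+
+**uses today's 50 constituents to backfill the entire past sample period.**
+
+This introduces two biases:
+
+1. **pre-entry backfill contamination**: a stock is counted in breadth before it joined the index;
+2. **missing historical constituents**: constituents that have since been removed never enter the panel.
+
+### This is very high priority
+
+I recommend not patching this version with a half-fix that "truncates history by entry_date", but switching directly to:
+
+**`index_weight` as the single master table for historical membership + historical weights.**
+
+That is:
+
+- monthly `index_weight(index_code='399673.SZ')`
+- forward fill to trading days at daily frequency
+- on days without a weight record, carry the most recent weights forward
+- a symbol absent from the weight table is treated as not in the index on that day
+
+This one step resolves:
+
+- survivorship bias
+- the float share fallback
+- topN weight semantics
+- the industry-weighting basis for sector concentration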
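The forward-fill rule can be sketched without any data library. The snippet assumes the monthly snapshots have already been downloaded into a dictionary keyed by ISO date strings; `daily_weights` is a hypothetical helper and the weights shown are invented.

```python
from bisect import bisect_right

def daily_weights(snapshots, trading_days):
    """Forward-fill monthly weight snapshots onto trading days.

    snapshots: {snapshot_date: {symbol: weight}}, dates as ISO strings.
    A day before the first snapshot gets an empty membership.
    """
    snapshot_dates = sorted(snapshots)
    panel = {}
    for day in trading_days:
        idx = bisect_right(snapshot_dates, day) - 1  # latest snapshot on or before `day`
        panel[day] = dict(snapshots[snapshot_dates[idx]]) if idx >= 0 else {}
    return panel

# Invented example: 300059 leaves and 300999 joins at the February snapshot.
example_snapshots = {
    "2024-01-31": {"300750": 0.20, "300059": 0.10},
    "2024-02-29": {"300750": 0.18, "300999": 0.05},
}
example_panel = daily_weights(
    example_snapshots, ["2024-01-30", "2024-02-01", "2024-02-29", "2024-03-01"]
)
```

Membership is simply the key set of each day's weights, so a symbol that is absent from the latest snapshot drops out of breadth on that day and a removed constituent still appears on the earlier days it belonged to.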
+
+---
+
+## 5. The current strict gate checks only "statistical liveness", not "semantic credibility"
+
+The current `evaluate_breadth_source_integrity()` mainly checks:
+
+- non-null
+- unique count
+- dominant value ratio
+- std floor
+
+So even if the weights are wrong, membership is latest-only, and industry is all unknown, the strict gate still passes as long as the final series "looks like it moves".
+
+This explains why, currently:
+
+- `breadth_integrity_summary.json` passes strict
+- yet metadata quality has fully degraded
+
+### Add a set of "semantic gates"
+
+#### A. metadata coverage gate
+
+New metrics:
+
+- `industry_resolved_ratio`
+- `weight_resolved_ratio`
+- `historical_membership_mode`
+- `weight_source_mode`
+- `metadata_snapshot_age_days`
+
+#### B. block rules
+
+Suggested:
+
+- when `weight_source_mode != official_index_weight`, strict mode **blocks outright**
+- when `historical_membership_mode != time_varying_index_membership`, strict mode **blocks outright**
+- when `industry_unknown_ratio > 0.10`, strict mode **blocks**
+- when `industry_unknown_ratio > 0.02`, warn but do not block
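The block rules translate into a short gate function. This is a sketch under assumptions: the signature, the return shape, and the mode strings used for the degraded case are hypothetical, and only the three rule families listed above are covered.

```python
def evaluate_breadth_semantic_gate(summary, strict=True):
    """Apply the suggested block / warn rules to a metadata summary dict."""
    blocking, warnings = [], []
    if summary.get("weight_source_mode") != "official_index_weight":
        blocking.append("weight_source_mode")
    if summary.get("historical_membership_mode") != "time_varying_index_membership":
        blocking.append("historical_membership_mode")
    # a missing ratio is treated as fully unknown, the conservative choice
    unknown_ratio = summary.get("industry_unknown_ratio", 1.0)
    if unknown_ratio > 0.10:
        blocking.append("industry_unknown_ratio")
    elif unknown_ratio > 0.02:
        warnings.append("industry_unknown_ratio")
    passed = not (strict and blocking)
    return {"passed": passed, "blocking": blocking, "warnings": warnings}
```

In non-strict mode the same reasons are still reported, so a degraded run stays visible even when it is allowed to continue.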
+
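+#### C. warmup exception
+
+Only two kinds of warmup are allowed:
+
+1. **initial snapshot build**
+2. **a small number of newly added constituents missing a classification**
+
+Suggested exception rules:
+
+- symbols missing a classification must be **newly added constituents** only
+- the number missing must not exceed **2 symbols** or **5%** of the universe
+- the exception period must not exceed **10 trading days**
+
+Beyond that range, stop treating it as a warmup problem.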
+
+---
+
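+## 6. `weight_hhi_proxy` can only be a short-term degraded fallback, not the long-term official approach
+
+The current logic for `sector_concentration_20` is:
+
+- if enough industry classifications are available, compute `industry_max_share`
+- otherwise fall back to `weight_hhi_proxy`
+
+This fallback is acceptable as a temporary backstop, but it should not be assumed "acceptable long term" by default.
+
+The two answer different questions:
+
+- `industry_max_share` asks: **is weight concentrated in one industry?**
+- `weight_hhi_proxy` asks: **is weight concentrated in a few individual stocks?**
+
+When the index has "leaders highly concentrated within a single industry" or "a few leaders across industries moving together", the two quantities are not necessarily equivalent.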
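A toy comparison shows how far apart the two measures can sit. The definitions below are assumptions for illustration: `industry_max_share` is taken as the largest summed weight of any industry, and the HHI proxy as the sum of squared stock weights; the repo's actual formulas may differ.

```python
def industry_max_share(weights, industry):
    """Largest total weight held by a single industry."""
    totals = {}
    for symbol, weight in weights.items():
        totals[industry[symbol]] = totals.get(industry[symbol], 0.0) + weight
    return max(totals.values())

def weight_hhi(weights):
    """Herfindahl-Hirschman index over individual stock weights."""
    return sum(weight * weight for weight in weights.values())

# Case 1: four equal stocks, all in one industry.
same_industry_weights = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
same_industry_map = {"A": "X", "B": "X", "C": "X", "D": "X"}

# Case 2: one dominant stock, every stock in a different industry.
one_leader_weights = {"A": 0.70, "B": 0.10, "C": 0.10, "D": 0.10}
one_leader_map = {"A": "X", "B": "Y", "C": "Z", "D": "W"}
```

In case 1 the industry share is 1.0 while the HHI is only 0.25; in case 2 the industry share falls to 0.70 while the HHI roughly doubles to 0.52. The two measures rank these portfolios in opposite order, which is why the proxy needs an explicit substitution-error test before it is trusted.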
+
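+### My recommendation
+
+Define it as:
+
+- **short-term degraded mode**: allows the production line to keep running
+- **long-term production mode**: must not replace the official industry-weighted crowding measure
+
+### How to quantify its substitution error
+
+Over the period where official industry classifications exist, compute both:
+
+- `industry_max_share`
+- `weight_hhi_proxy`
+
+Then run three kinds of checks:
+
+1. **252-day rolling Spearman**
+2. **crowding-alert top-decile overlap**
+3. **regime flag agreement when used as a veto input**
+
+Suggested acceptance:
+
+- median rolling Spearman ≥ 0.85
+- top decile overlap ≥ 80%
+- crowding veto agreement ≥ 85%
+
+If these are not met, the HHI proxy can only stay in the shadow monitor and cannot serve as an official hard feature.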
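Of the three checks, the top-decile overlap is the simplest to sketch. The version below assumes both series are aligned daily lists and that the "crowding alert" is simply the highest 10% of days by value; both helper names are hypothetical.

```python
def top_decile_days(series):
    """Indices of the highest 10% of observations (at least one)."""
    count = max(1, len(series) // 10)
    ranked = sorted(range(len(series)), key=lambda i: series[i], reverse=True)
    return set(ranked[:count])

def top_decile_overlap(reference, proxy):
    """Share of the reference series' top-decile days also flagged by the proxy."""
    reference_days = top_decile_days(reference)
    proxy_days = top_decile_days(proxy)
    return len(reference_days & proxy_days) / len(reference_days)
```

An overlap of 0.80 or more would satisfy the second acceptance item; the rolling Spearman and veto-agreement checks need the same aligned inputs.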
+
+---
+
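+## The most recommended production-grade metadata / weight strategy
+
+## Conclusion first
+
+**Stop treating Akshare `stock_individual_info_em` + Mairui fallback as the primary production path.**
+
+My recommendation:
+
+### Primary path: offline snapshot + periodic refresh
+
+#### Primary source for weights / membership
+
+- **Tushare `index_weight`** (monthly index constituents and weights)
+- use it to build:
+  - historical membership
+  - the historical daily weight panel
+  - the weight basis for topN / weighted_ret / sector concentration
+
+#### Primary source for industry classification
+
+Order of preference:
+
+1. **Tushare `index_member_all(ts_code=...)`** → Shenwan industry-level mapping
+2. **Tushare `stock_basic`** → coarse industry fallback
+3. your own offline snapshot store (local parquet/json snapshot)
+
+#### runtime fallback
+
+- Akshare / Mairui are no longer primary data sources
+- use them only when the snapshot is missing or during bootstrap
+- on failure, do not overwrite an existing resolved snapshot
+
+### Why this is the right direction
+
+What this project needs is not "metadata scraped from stock detail pages", but:
+
+- historical membership
+- historical weights
+- industry grouping
+
+All three are inherently better served by "index / industry structure data" than by "per-stock meta fetching".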
+
+---
+
+## Parameter-level recommendations
+
+## 1. retry / backoff / jitter
+
+### For the runtime fallback providers (Akshare / Mairui)
+
+- max attempts: `5`
+- backoff: exponential, `1.5s * 2^k`
+- jitter: `[0, 1.0s]` full jitter
+- on `Retry-After`: honor the header first
+- metadata request concurrency: `1`
+- history request concurrency: `2~3`
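The retry parameters can be read as the following sketch. It is one interpretation, not a confirmed design: the jitter is added on top of the exponential delay, `ProviderError` is a hypothetical exception that may carry a `retry_after` value in seconds, and `sleep` and `rng` are injectable so the logic can be exercised without waiting.

```python
import random
import time

MAX_ATTEMPTS = 5
BASE_DELAY_S = 1.5
JITTER_MAX_S = 1.0

class ProviderError(Exception):
    """Hypothetical provider failure; retry_after mirrors a Retry-After header."""
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def retry_delay(attempt, retry_after=None, rng=random):
    """Delay before the next try; attempt is 0-based."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    return BASE_DELAY_S * (2 ** attempt) + rng.uniform(0.0, JITTER_MAX_S)

def call_with_retry(fetch, sleep=time.sleep, rng=random):
    """Call fetch() up to MAX_ATTEMPTS times, sleeping between failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return fetch()
        except ProviderError as exc:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # out of attempts: surface the last error
            sleep(retry_delay(attempt, exc.retry_after, rng))
```

A caller that exhausts all attempts should record a negative cache entry rather than overwrite an existing resolved snapshot.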
+
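+### For the snapshot refresh job
+
+- avoid concurrent request storms across providers
+- process symbols serially or through a small-concurrency queue
+- rate-limit 429s with a token bucket / leaky bucket
+
+## 2. cache TTL / invalidation
+
+### Industry snapshot
+
+- soft TTL: `30 days`
+- hard TTL: `90 days`
+
+### index weight snapshot
+
+- soft TTL: `7 days`
+- hard TTL: `31 days`
+- force a refresh at month-end / on the first trading day after a rebalance
+
+### negative cache
+
+- `429 / RemoteDisconnected / 5xx`: `6 hours`
+- 3 consecutive failures: extend to at most `24 hours`
+- once past the hard TTL, a retry is mandatory; a stale error cache must never be served forever
+
+## 3. refresh scheduling
+
+- after the close on each trading day: fill gaps and refresh expired entries only
+- every weekend: full check of snapshot completeness
+- at the start of each month (or after the index weights are updated): full update of `index_weight`
+- during index rebalance windows: force a full refresh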
+
+---
+
+## Concrete code changes (for Codex)
+
+## A. `data/breadth_builder.py`
+
+### 1) Introduce snapshot-first resolution
+
+New concepts:
+
+- `membership_panel`
+- `weight_panel`
+- `industry_snapshot`
+
+The new flow becomes:
+
+1. load the local snapshot first
+2. refresh the snapshot when it is missing or expired
+3. use the runtime fallback only if the snapshot is still unavailable
+4. a failed runtime fallback must not overwrite an existing resolved snapshot
+
+### 2) Rewrite the meta cache structure
+
+Change `_meta_cache.json` to records that carry freshness:
+
+```json
+{
+  "300750": {
+    "industry": "电池",
+    "provider_resolved": "snapshot",
+    "provider_attempted_chain": ["snapshot", "tushare"],
+    "status": "resolved",
+    "fetched_at": "2026-04-09T19:10:00Z",
+    "expires_at": "2026-05-09T19:10:00Z",
+    "error": ""
+  }
+}
+```
+
+### 3) Remove the `float_shares -> 1.0` default fallback
+
+Delete this logic.
+
+If weights are missing:
+
+- prefer `index_weight`
+- failing that, mark the weight-based columns as degraded / unavailable
+- block outright in strict mode
+
+### 4) Put `entry_date` to real use, or replace it with `index_weight`
+
+`entry_date` is currently parsed but unused. At a minimum:
+
+- mask the symbol's price history before `entry_date`
+
+But the stronger recommendation is to switch directly to:
+
+- `index_weight` time-varying membership
+
+### 5) Change the semantics of provider diagnostics
+
+Add:
+
+- `provider_resolved`
+- `provider_attempted_chain`
+- `metadata_status_counts`
+
+Stop counting failed records as `akshare` successes.
+
+---
+
+## B. Add `data/index_metadata_snapshot.py`
+
+Responsibilities:
+
+- download / read `index_weight`
+- download / read `index_member_all`
+- download / read `stock_basic`
+- generate the local snapshot parquet/json
+- emit a freshness summary
+
+Suggested outputs:
+
+- `index_weight_snapshot.parquet`
+- `industry_snapshot.parquet`
+- `metadata_snapshot_manifest.json`
+
+---
+
+## C. Add a semantic gate
+
+Before ingestion / PIT, insert:
+
+- `evaluate_breadth_semantic_gate()`
+
+Checks:
+
+- `weight_source_mode`
+- `historical_membership_mode`
+- `industry_unknown_ratio`
+- `snapshot_age_days`
+- `resolved_weight_ratio`
+
+---
+
+## D. Tests that are currently missing and must be added
+
+Current tests cover:
+
+- metadata fallback
+- cache hits
+- the low-std waiver in the integrity gate
+
+But the most critical production-risk tests are missing.
+
+Must add:
+
+1. `test_meta_cache_refreshes_stale_unknown_records`
+2. `test_metadata_provider_counts_do_not_mark_failed_symbol_as_akshare`
+3. `test_pre_entry_history_is_masked`
+4. `test_index_weight_membership_replaces_latest_constituent_backfill`
+5. `test_weight_features_block_when_only_price_proxy_is_available`
+6. `test_semantic_gate_blocks_on_unknown_industry_ratio`
+---
+
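+## Suggested OpenSpec change boundaries
+
+I suggest splitting this work into two changes instead of packing everything into one hardening change.
+
+## Change A: `build-snapshot-first-index-metadata-and-weight-layer`
+
+Scope:
+
+- offline snapshot
+- historical membership / weights
+- metadata cache freshness
+- semantic gate
+
+This is the **data semantic layer change**.
+
+## Change B: `recalibrate-walkforward-economics-after-semantic-hardening`
+
+Scope:
+
+- rerun frozen validation
+- review candidate ranking / utility
+- assess the defensive bias
+
+This is the **economic validation layer change**.
+
+The benefits of this split:
+
+- close out the data layer first
+- then look at the economic results
+- avoid mistaking data artifacts for strategy conclusions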
+
+---
+
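+## Direction control: what not to do next
+
+Until P0 is fixed, I do not recommend doing any of these now:
+
+- large regime threshold changes
+- rewriting the exposure mapping
+- adding more signals
+- building further elaborate explanations around `weight_hhi_proxy`
+
+The most likely misjudgment right now is:
+
+**using semantically unclosed data to optimize an economic objective that is already contaminated.**
+
+That only makes the error more refined.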
+
+---
+
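+## After P0 is fixed: how to judge whether the direction is right
+
+After rerunning full50, I would look at these metrics first:
+
+### Data layer
+
+- `industry_unknown_ratio <= 0.02` (target)
+- `industry_unknown_ratio <= 0.10` (hard threshold)
+- `weight_source_mode = official_index_weight`
+- `historical_membership_mode = time_varying_index_membership`
+- `metadata_error_symbol_count` drops significantly
+
+### Research layer
+
+- `positive_window_ratio >= 0.6`
+- no more "defensive selected in all 5 windows"
+- `utility_total_score > 0`
+- `utility_delta_vs_baseline > 0`
+- `upside_capture` returns at least to a more reasonable level (the current 0.259 is too low)
+
+If, after P0 is fixed, defensive still dominates and utility is still negative, then the real main bottleneck is:
+
+- the candidate set is too extreme (defensive / baseline / pro_risk are not enough)
+- the utility's penalty/reward for capture is still not reasonable
+- regime gating leans too far toward drawdown compression
+
+Only then is it the right order to move into controller tuning.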
+
+---
+
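+## Closing conclusion
+
+The current blocker doc defines the problem as "provider instability degrading industry metadata". That definition is only half right.
+
+**The real problem is that the metadata / weight / membership behind the current derived breadth has no production-grade semantic guarantee yet.**
+
+Given that, `industry_unknown_ratio = 1.0` is not an isolated interpretability issue; it goes on to contaminate:
+
+- weight-based features
+- crowding features
+- historical membership semantics
+- the final walk-forward economic evaluation
+
+So the most important next step is:
+
+> **Upgrade this layer from a "runtime API assembly layer" to a "snapshot-first index structure data layer".**
+
+Once this is fixed, revisiting the strategy economics will put the project genuinely back on track.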
+

+ 587 - 0
research/chinext50_regime_project/chinext50_post_b3_detailed_guidance_for_codex_2026-04-10.md

@@ -0,0 +1,587 @@
+# Chinext50 Regime Project — Post-B3 Detailed Guidance for Codex (2026-04-10)
+
+## 0. Executive decision: let Codex implement, do not ask GPT to one-shot patch the repo
+
+### Recommendation
+Use **Codex as the implementer** and keep GPT-Pro in the role of:
+1. architecture / research guidance,
+2. block-level acceptance design,
+3. review after each block.
+
+### Why this is the better choice now
+The project is **no longer in a one-shot hotfix stage**. It is in an **iterative calibration stage** with:
+- an existing OpenSpec change,
+- block backups,
+- reproducible PIT data,
+- targeted tests,
+- guardrail-based rollback discipline.
+
+That is exactly the environment where Codex should operate.
+
+### When a direct GPT patch would make sense
+A direct patch from GPT would only be the better option if the remaining issue were:
+- a single deterministic semantic bug,
+- a self-contained report fix,
+- or a narrow formula replacement with no block sequencing.
+
+That is **not** the current situation.
+
+Current H1a evidence says the system is already in a trade-off regime:
+- `risk_off` improved from `0.36857` to `0.30747`,
+- `hard_pass_window_ratio` improved from `0.60` to `1.00`,
+- but `annual_return_delta` fell from `0.02040` to `-0.01205`,
+- and `upside_capture` fell from `0.36227` to `0.32051`.  
+This means the next step is **not a blind code patch**; it is a controlled sequence of state-machine and policy experiments.
+
+---
+
+## 1. Current diagnosis
+
+### What B3/B4/H1a actually proved
+
+#### B3 proved
+- the semantic re-landing was directionally correct,
+- `trend + euphoric_late` was revived,
+- state thresholds were successfully moved into config.
+
+#### B4 proved
+- `positive_window_ratio` can safely be treated as diagnostic-only,
+- no numerical logic changed.
+
+#### H1a proved
+- the current system **can** reduce `risk_off` and recover hard-pass rate,
+- but **risk_off-only compression is not enough** to preserve offense.
+
+### The key inference
+**Do not spend many more cycles on H1a-only grids.**
+
+The current evidence strongly suggests:
+- H1a can fix the defense side,
+- but offense recovery now depends on **repair/trend boundary cleanup**, not on more risk_off-only tuning.
+
+So the correct next move is:
+1. **reframe H1a acceptance**,
+2. **accept H1a as a defense-improving intermediate state**,
+3. move to **H1b**,
+4. keep H2 and H3 later.
+
+---
+
+## 2. Direct answers to the four open questions
+
+## Q1. What should the next H1a thresholds be under the constraint “risk_off-only changes only”?
+
+### Short answer
+Do **not** continue a wide H1a-only search.
+
+### Why
+Your own notes already say that multiple small grids over `risk_off_*` and `crash_override_*` did **not** find a combination that simultaneously satisfies:
+- `risk_off <= 0.32`,
+- `trend + euphoric_late >= 0.14`,
+- `drawdown_ratio_vs_baseline <= 0.68`,
+- `annual_return_delta >= B3_annual_return_delta - 0.01`.
+
+That is a strong sign that the current H1a acceptance bundle is too strict **for a risk_off-only block**.
+
+### What to do instead
+Freeze the current H1a settings as the **defense-improving candidate**:
+
+```yaml
+state_machine:
+  thresholds:
+    risk_off_down_hazard: 0.68
+    risk_off_stress: 0.90
+    risk_off_trend_floor: -0.15
+    crash_override_down_hazard: 0.78
+```
+
+### One final micro-probe is acceptable, but only one
+If you want a final probe before formally freezing H1a, try only this one small variant:
+
+```yaml
+state_machine:
+  thresholds:
+    risk_off_down_hazard: 0.67
+    risk_off_stress: 0.89
+    risk_off_trend_floor: -0.14
+    crash_override_down_hazard: 0.77
+```
+
+### Why this probe
+This is a **slightly looser** version of current H1a, intended to:
+- keep `risk_off` close to the desired cap,
+- recover a small amount of annual return,
+- without giving back too much of the hard-pass improvement.
+
+### But set a strict limit
+Do **not** test more than **4 H1a-only combinations total** from this point forward.
+If none passes the revised acceptance below, freeze H1a and move on.
+
+---
+
+## Q2. If the original H1a acceptance is infeasible, how should it be adjusted?
+
+### Core principle
+H1a is a **defense block**, not a full-system victory block.
+
+So its acceptance should measure:
+- whether it improved defense,
+- whether it preserved enough offense to keep H1b worthwhile,
+- not whether it already solved the whole project.
+
+### Replace the old H1a acceptance with a two-tier acceptance
+
+## H1a Acceptance — Tier A (must pass)
+The block is allowed to proceed to H1b if all of these hold:
+
+- `risk_off <= 0.32`
+- `hard_pass_window_ratio >= 0.80`
+- `drawdown_ratio_vs_baseline <= 0.62`
+- `trend + euphoric_late >= 0.14`
+
+## H1a Acceptance — Tier B (offense carry, softer)
+At least one of the following should hold:
+
+- `annual_return_delta >= B3_annual_return_delta - 0.035`
+- `upside_capture >= B3_upside_capture - 0.05`
+
+### Why this is the right reframing
+Current H1a already satisfies the defense side very well:
+- `risk_off = 0.30747`
+- `hard_pass_window_ratio = 1.00`
+- `drawdown_ratio_vs_baseline = 0.58631`
+- `trend + euphoric_late = 0.14452`
+
+It fails only because the original offense floor was too strict for a risk_off-only move.
+
+### Translation
+**Treat H1a as a partial-pass / defense-pass block, and let H1b recover offense.**
+
+That is the correct calibration logic.
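The two-tier acceptance can be replayed against the numbers quoted in this document. The sketch assumes the "from" values in section 0 (`0.02040` annual return delta, `0.36227` upside capture) are the B3 reference; the dictionary keys and function names are hypothetical.

```python
B3_REFERENCE = {"annual_return_delta": 0.02040, "upside_capture": 0.36227}

H1A_METRICS = {
    "risk_off": 0.30747,
    "hard_pass_window_ratio": 1.00,
    "drawdown_ratio_vs_baseline": 0.58631,
    "trend_plus_euphoric_late": 0.14452,
    "annual_return_delta": -0.01205,
    "upside_capture": 0.32051,
}

def passes_tier_a(metrics):
    """Tier A: all defense conditions must hold."""
    return (
        metrics["risk_off"] <= 0.32
        and metrics["hard_pass_window_ratio"] >= 0.80
        and metrics["drawdown_ratio_vs_baseline"] <= 0.62
        and metrics["trend_plus_euphoric_late"] >= 0.14
    )

def passes_tier_b(metrics, reference):
    """Tier B: at least one softened offense-carry condition holds."""
    return (
        metrics["annual_return_delta"] >= reference["annual_return_delta"] - 0.035
        or metrics["upside_capture"] >= reference["upside_capture"] - 0.05
    )
```

With these inputs the Tier B floors are `-0.0146` for annual return delta and `0.31227` for upside capture, so the current H1a clears both tiers under the revised acceptance.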
+
+---
+
+## Q3. In H1b, should we first tune euphoric thresholds, or repair/trend thresholds?
+
+### Priority
+**First tune repair/trend. Do not touch euphoric first.**
+
+### Why
+At the current stage:
+- `trend + euphoric_late` is still too small to justify euphoric-first work,
+- the H1a result says return loss is coming from **admission / re-entry quality**,
+- not primarily from late-trend trimming.
+
+So the correct order is:
+1. **repair cleanup**,
+2. **trend release**,
+3. only then **euphoric refinement**.
+
+### Recommended H1b split
+
+# H1b.1 — Repair cleanup only
+
+#### Goal
+Reduce weak repair persistence and stop repair from absorbing borderline trend days.
+
+#### Files
+- `model/state_machine.py`
+- `config/regime.yaml`
+- `tests/test_policy.py`
+
+#### Code change
+Add two new configurable repair guards:
+
+```python
+repair_breadth_min
+repair_d_trend_min
+```
+
+#### Change `DEFAULT_THRESHOLDS`
+Add:
+
+```python
+'repair_breadth_min': 0.0,
+'repair_d_trend_min': 0.0,
+```
+
+#### Change repair condition in `_raw_state()`
+Add these clauses:
+
+```python
+and row['breadth_score'] >= thresholds['repair_breadth_min']
+and row['d_trend'] >= thresholds['repair_d_trend_min']
+```
+
+#### H1b.1 thresholds
+
+```yaml
+state_machine:
+  thresholds:
+    repair_hazard: 0.60
+    repair_stress_max: 0.72
+    repair_d_stress_max: 0.0
+    repair_breadth_min: 0.00
+    repair_d_trend_min: 0.00
+```
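For orientation, the guarded repair condition might look like the sketch below. Only the two new clauses come from this document; the three pre-existing clauses, the column names `repair_hazard`, `stress_score`, and `d_stress`, and the helper name `is_repair_candidate` are assumptions about what `_raw_state()` already does.

```python
H1B1_THRESHOLDS = {
    "repair_hazard": 0.60,
    "repair_stress_max": 0.72,
    "repair_d_stress_max": 0.0,
    "repair_breadth_min": 0.0,   # new guard
    "repair_d_trend_min": 0.0,   # new guard
}

def is_repair_candidate(row, thresholds=H1B1_THRESHOLDS):
    """Repair admission with the two new guards appended (assumed base clauses)."""
    return (
        row["repair_hazard"] >= thresholds["repair_hazard"]
        and row["stress_score"] <= thresholds["repair_stress_max"]
        and row["d_stress"] <= thresholds["repair_d_stress_max"]
        and row["breadth_score"] >= thresholds["repair_breadth_min"]
        and row["d_trend"] >= thresholds["repair_d_trend_min"]
    )
```

A day with a strong repair hazard but falling trend or negative breadth is no longer admitted to repair, which is the "weak repair persistence" this block targets.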
+
+Leave these unchanged in H1b.1:
+- risk_off thresholds,
+- trend thresholds,
+- euphoric thresholds,
+- policy mapping.
+
+#### H1b.1 target effect
+- fewer weak repair days,
+- more clean hand-off into trend,
+- minimal disturbance to H1a defense gains.
+
+#### H1b.1 acceptance
+- `risk_off <= 0.325`
+- `trend + euphoric_late >= 0.15`
+- `annual_return_delta >= H1a_annual_return_delta + 0.005`
+- `upside_capture >= H1a_upside_capture + 0.015`
+- `hard_pass_window_ratio >= 0.80`
+
+#### H1b.1 stop condition
+Rollback H1b.1 if any of these occurs:
+- `annual_return_delta < H1a_annual_return_delta - 0.01`
+- `drawdown_ratio_vs_baseline > 0.64`
+- `risk_off > 0.33`
+- `repair < 0.08` or `repair > 0.24`
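The stop condition can be encoded so that every rollback decision is reproducible. A minimal sketch, assuming a metrics dictionary with hypothetical key names where `repair` is the repair state share:

```python
def h1b1_rollback_reasons(metrics, h1a_annual_return_delta):
    """Return the H1b.1 stop conditions that fired; an empty list means keep the block."""
    reasons = []
    if metrics["annual_return_delta"] < h1a_annual_return_delta - 0.01:
        reasons.append("annual_return_delta")
    if metrics["drawdown_ratio_vs_baseline"] > 0.64:
        reasons.append("drawdown_ratio_vs_baseline")
    if metrics["risk_off"] > 0.33:
        reasons.append("risk_off")
    if metrics["repair"] < 0.08 or metrics["repair"] > 0.24:
        reasons.append("repair")
    return reasons
```

Logging the returned list alongside the block backup keeps the rollback discipline auditable.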
+
+---
+
+# H1b.2 — Trend release only
+
+Run this only if H1b.1 is neutral-to-positive.
+
+#### Goal
+Recover offense by letting true trend days in more easily.
+
+#### Files
+- `model/state_machine.py`
+- `config/regime.yaml`
+- `tests/test_policy.py`
+
+#### H1b.2 thresholds
+
+```yaml
+state_machine:
+  thresholds:
+    trend_score: 0.42
+    trend_breadth_min: -0.02
+    trend_stress_max: 0.50
+```
+
+Leave unchanged in H1b.2:
+- risk_off thresholds,
+- euphoric thresholds,
+- policy mapping.
+
+#### Why these values
+They are deliberately small moves:
+- `trend_score 0.45 -> 0.42`
+- `trend_stress_max 0.45 -> 0.50`
+- `trend_breadth_min -0.05 -> -0.02`
+
+This gives trend slightly more room without collapsing breadth discipline.
+
+#### H1b.2 acceptance
+- `annual_return_delta >= max(-0.002, H1a_annual_return_delta + 0.010)`
+- `upside_capture >= 0.34`
+- `drawdown_ratio_vs_baseline <= 0.65`
+- `risk_off <= 0.33`
+- `hard_pass_window_ratio >= 0.80`
+- `trend + euphoric_late >= 0.16`
+
+#### H1b.2 stop condition
+Rollback H1b.2 if any of these occurs:
+- `drawdown_ratio_vs_baseline > 0.66`
+- `hard_pass_window_ratio < 0.80`
+- `risk_off > 0.33`
+- `annual_turnover` rises by more than `2.0` (annualized) while `annual_return_delta` does not improve by at least `+0.01`
+
+---
+
+## Q4. Should we do a light H3 before H1b because hard-pass improved to 1.0 but return fell?
+
+### Answer
+**No official H3 before H1b.**
+
+### Why
+The H1a outcome means:
+- candidate selection is now healthy enough to pass windows,
+- but classification still misallocates offense.
+
+If you touch H3 now, you will be changing candidate selection to compensate for a state classification problem.
+That is exactly the wrong order.
+
+### What is allowed
+You may run a **diagnostic-only policy elasticity check** with **no committed code change**:
+- take the H1a state path,
+- simulate small exposure perturbations,
+- report deltas only.
+
+#### Allowed dry-run perturbations
+- `trend +0.05`
+- `chop +0.05`
+- `repair_rebound_base +0.05`
+
+#### Purpose
+This is only to estimate:
+- how much of the return loss is due to state mix,
+- and how much is due to policy mapping.
+
+Do **not** treat that as H3.
+Do **not** commit any policy change based on it yet.
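A dry-run elasticity check could be as simple as the sketch below. It is deliberately crude and rests on several assumptions: exposures are applied directly to benchmark daily returns, trading costs and the daily exposure-change cap are ignored, the `repair` key stands in for `repair_rebound_base`, and the baseline exposures are placeholders rather than the repo's current policy.

```python
def cumulative_return(states, benchmark_returns, policy):
    """Compound daily returns of exposure[state] * benchmark return."""
    growth = 1.0
    for state, daily_return in zip(states, benchmark_returns):
        growth *= 1.0 + policy[state] * daily_return
    return growth - 1.0

def exposure_elasticity(states, benchmark_returns, policy, key, bump=0.05):
    """Change in cumulative return when one state's exposure is bumped."""
    bumped = dict(policy)
    bumped[key] = min(1.0, bumped[key] + bump)
    return (
        cumulative_return(states, benchmark_returns, bumped)
        - cumulative_return(states, benchmark_returns, policy)
    )

# Placeholder exposures, not the project's actual values.
PLACEHOLDER_POLICY = {"trend": 0.90, "euphoric_late": 0.65, "chop": 0.30, "repair": 0.35, "risk_off": 0.0}
```

If the deltas for all three perturbations are small relative to the observed return loss, the loss is more likely a state-mix problem than a policy-mapping problem, which supports doing H1b first.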
+
+---
+
+## 3. Concrete next block order
+
+Run blocks in this exact order:
+
+1. **H1a-reframe**
+2. **H1a-final-microprobe (optional, max 1 block / max 4 combos)**
+3. **H1b.1 repair cleanup**
+4. **H1b.2 trend release**
+5. **H2 policy mapping**
+6. **H3 candidate selection softening**
+
+---
+
+## 4. Detailed instructions for each next block
+
+# Block R0 — H1a reframe only (no code logic change unless report wording needs update)
+
+### Goal
+Change the experiment logic, not the state-machine logic.
+
+### Files
+- `docs/` or `deliverables/` summary notes if you keep these
+- optional: report wording if acceptance diagnostics are rendered somewhere
+
+### Required output
+Produce a short markdown or JSON note stating:
+- old H1a acceptance,
+- new two-tier H1a acceptance,
+- whether current H1a passes Tier A and Tier B.
+
+### Expected result
+Current H1a should be classified as:
+- **defense-pass**,
+- **offense-deferred-to-H1b**.
+
+---
+
+# Block R1 — Optional H1a final microprobe
+
+### Goal
+Try only one tiny looser risk_off bundle to see whether some return can be recovered without losing the H1a defense win.
+
+### Thresholds
+
+```yaml
+state_machine:
+  thresholds:
+    risk_off_down_hazard: 0.67
+    risk_off_stress: 0.89
+    risk_off_trend_floor: -0.14
+    crash_override_down_hazard: 0.77
+```
+
+### Acceptance
+- `risk_off <= 0.325`
+- `drawdown_ratio_vs_baseline <= 0.61`
+- `hard_pass_window_ratio >= 0.80`
+- `annual_return_delta >= current_H1a_annual_return_delta + 0.005`
+
+### Stop
+If this does not clearly beat current H1a on annual return **without losing the defense win**, abandon H1a search and move on.
+
+---
+
+# Block H1b.1 — Repair cleanup
+
+### Files
+- `model/state_machine.py`
+- `config/regime.yaml`
+- `tests/test_policy.py`
+
+### Code edits
+1. Extend `DEFAULT_THRESHOLDS`.
+2. Make repair clause use `repair_breadth_min` and `repair_d_trend_min`.
+3. Add config values to YAML.
+4. Add tests that repair requires non-negative `d_trend` and non-negative breadth when configured.
+
+### Acceptance
+Use H1b.1 acceptance above.
+
+---
+
+# Block H1b.2 — Trend release
+
+### Files
+- `config/regime.yaml`
+- `tests/test_policy.py`
+
+### Code edits
+Only threshold/config changes.
+Do **not** change policy mapping yet.
+
+### Acceptance
+Use H1b.2 acceptance above.
+
+---
+
+# Block H2 — Policy mapping only after H1b
+
+### Preconditions
+Run H2 only if H1b.2 produces:
+- `annual_return_delta >= -0.002`
+- `upside_capture >= 0.34`
+- `hard_pass_window_ratio >= 0.80`
+- `risk_off <= 0.33`
+
+### First H2 values
+
+```yaml
+policy:
+  trend: 0.95
+  euphoric_late: 0.70
+  chop: 0.35
+  risk_off: 0.00
+  repair_rebound_base: 0.40
+  repair_rebound_max: 0.85
+trading:
+  max_daily_exposure_change: 0.35
+```
+
+### H2 acceptance
+- `upside_capture >= H1b_best_upside_capture + 0.03`
+- `annual_return_delta >= H1b_best_annual_return_delta + 0.01`
+- `drawdown_ratio_vs_baseline <= 0.67`
+- `annual_turnover <= 14`
+
+### H2 stop
+Rollback H2 if:
+- annual return does not improve by at least `+0.01`,
+- upside does not improve by at least `+0.03`,
+- or drawdown ratio exceeds `0.67`.
+
+---
+
+# Block H3 — Candidate selection only after H2
+
+### Preconditions
+Do H3 only if H2 improves offense.
+
+### Goal
+Recover selection robustness without reverting to one single ultra-defensive winner.
+
+### Suggested H3 softening
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    upside_capture_min: 0.26
+    max_drawdown_ratio_vs_benchmark: 0.75
+    annual_turnover_soft_max: 19.0
+    annual_return_override_abs: 0.04
+    annual_return_override_ratio: 0.35
+    return_ratio_weight: 0.25
+    upside_weight: 0.25
+    drawdown_weight: 0.25
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.15
+    turnover_penalty_per_unit: 0.012
+    turnover_penalty_start: 13.0
+```
+
+### H3 acceptance
+- `hard_pass_window_ratio >= 0.80`
+- `primary_window_success_ratio >= 0.60`
+- `frontier_fallback_no_hard_pass <= 1`
+- no candidate wins `> 80%` of processed windows unless annual return clearly improves
+
+### H3 stop
+Rollback H3 if:
+- hard-pass rises but offense falls back materially,
+- or one candidate dominates again with no OOS improvement.
+
+---
+
+## 5. Global hidden issues Codex should keep in mind
+
+## G1. `days_since_riskoff` is almost certainly misnamed / miscomputed
+Current implementation:
+
+```python
+out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()
+out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0
+```
+
+This is **not** “days since risk_off”.
+It is also **not** the proper risk_off streak length.
+It is a cumulative count of all historical risk_off days, reset to zero off-state.
+
+### Recommendation
+Do not fix this inside H1/H2/H3 unless a downstream module is actively using it.
+But mark it as a semantic bug for the next blocker cycle.
+
+## G2. Avoid early euphoric tuning
+`euphoric_late` should not be touched until repair/trend admission is healthier.
+Right now its share is too small for first-order tuning.
+
+## G3. Avoid policy compensation before H1b
+Do not let policy mapping mask classification errors.
+A stronger `trend` or `chop` exposure can make the report look better while the regime logic stays wrong.
+
+## G4. Turnover is still at risk of being over-penalized in candidate selection
+Even after earlier fixes, keep checking whether turnover is effectively being punished in multiple layers:
+- hard constraints,
+- utility,
+- selection score.
+
+Do not change this in H1b, but keep it visible when you reach H3.
+
+---
+
+## 6. Exact Codex operating instruction
+
+Use this as the header when sending the task to Codex:
+
+```text
+Please do not ask GPT-Pro to directly patch the repo.
+Continue with Codex implementation under the existing OpenSpec workflow.
+
+Execute in this exact order:
+R0 -> R1(optional) -> H1b.1 -> H1b.2 -> H2 -> H3.
+
+Important:
+- Treat current H1a as a defense-improving intermediate state.
+- Do not spend more than one microprobe block on H1a.
+- Do not touch policy mapping before H1b.2.
+- Do not touch candidate selection before H2.
+
+For each block:
+1. list modified files,
+2. run targeted tests,
+3. report stitched metrics delta,
+4. report state mix delta,
+5. rollback only the current block if stop conditions fail.
+```
+
+---
+
+## 7. Final bottom line
+
+At this point the right move is **not** “let GPT directly rewrite the repo.”
+The right move is:
+
+1. keep the current repo state and OpenSpec chain,
+2. let Codex implement the next calibrated blocks,
+3. let GPT-Pro continue to define block boundaries and review results.
+
+That will preserve rollback safety, experiment traceability, and the logic of the calibration process.

+ 545 - 0
research/chinext50_regime_project/chinext50_post_b3_feedback_response_for_codex_2026-04-10.md

@@ -0,0 +1,545 @@
+# Chinext50 Regime Project — Post-B3 Feedback Response for Codex (2026-04-10)
+
+## 0. Read this first
+
+This document is the next-step instruction after the latest post-B3 feedback bundle.
+Current stabilized anchor is **R1 baseline**.
+
+Do **not** jump to H2 or H3 yet.
+Do **not** let policy mapping absorb state-layer problems.
+Do **not** rerun broad grid search.
+
+Proceed in this order:
+
+1. `D0` (optional but recommended): add one diagnostic-only report for state-conditioned exposures
+2. Preparatory repair-threshold code change with permissive defaults (see section 4)
+3. `H1b.1-L1`
+4. `H1b.1-L2`
+5. `H1b.1-L3`
+6. If all three fail, **stop H1b.1** and move to `H1b.2-direct-from-R1`
+7. Only after H1b.2 succeeds/near-succeeds, consider H2
+
+---
+
+## 1. Current read of the system
+
+### 1.1 R1 is the correct current anchor
+R1 improved annual return vs H1a while preserving a good defense profile.
+Use R1 as the comparison anchor for all further blocks.
+
+### 1.2 Why H1b.1 failed twice
+The failure pattern matters more than the raw stop trigger.
+
+Observed from the latest retry vs R1 (values shown as `R1 -> retry`):
+- `annual_return_delta`: `0.008913 -> -0.004561`
+- `drawdown_ratio_vs_baseline`: `0.575317 -> 0.657910` (stop)
+- `upside_capture`: `0.324220 -> 0.344080`
+- `hard_pass_window_ratio`: `0.80 -> 1.00`
+- state mix:
+  - `risk_off`: ~unchanged
+  - `trend`: unchanged
+  - `repair`: down sharply
+  - `chop`: up sharply
+  - `euphoric_late`: slightly up
+
+### 1.3 Interpretation
+This means the attempted H1b.1 cleanup did **not** create more real trend days.
+It mostly did this:
+- removed a chunk of repair days,
+- pushed many of them into chop,
+- slightly increased late-stage risk-on exposure,
+- but did not increase true trend occupancy.
+
+That is why upside went up a bit but drawdown got worse.
+
+### 1.4 Hidden coupling you must respect
+Even when policy mapping is unchanged, H1b.1 can still change **effective exposure** materially because:
+- repair rows that survive become “stronger” rows,
+- state-conditioned caps may trigger less often,
+- target exposure inside a state is not constant.
+
+This is an important system fact:
+
+> **State cleanup can implicitly become policy loosening, even without changing `policy:` values.**
+
+So for H1b.1 we must watch not only state mix, but also:
+- `mean_target_exposure`
+- state-conditioned mean target exposure
+- cap-hit rates by state
+
+---
+
+## 2. Immediate decision
+
+### Recommendation
+**Do not hand over a broad “keep tuning H1b.1” task.**
+Give Codex a **small parameter ladder** with explicit acceptance and stop conditions.
+
+### Why
+The latest evidence suggests H1b.1 can easily become a hidden risk-on block.
+If you let Codex free-search here, it will likely keep finding “slightly better upside, much worse drawdown” variants.
+
+---
+
+## 3. D0 — optional but strongly recommended diagnostic-only block
+
+### Goal
+Before trying more H1b.1 variants, expose the hidden coupling between state changes and effective exposure.
+
+### Files
+- `pipelines/real_walkforward_report.py`
+- optionally a small helper in `backtest/frozen_walkforward.py` if needed
+- tests only if required for stable output contract
+
+### Add these diagnostics to stitched summary/report
+For the stitched OOS ledger, add:
+- `mean_target_exposure`
+- `mean_executed_exposure`
+- `state_conditioned_mean_target_exposure`
+- `state_conditioned_mean_executed_exposure`
+- `cap_hit_rate_overall`
+- `cap_hit_rate_by_state`
+- `state_mix`
+
+Suggested cap-hit definitions from `veto_reason` / cap reason fields:
+- `stress_cap`
+- `breadth_divergence_cap`
+- `crowding_cap`
+
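A rough sketch of how these diagnostics could be computed from stitched ledger rows. The row layout is an assumption: `state` and `veto_reason` are named in this document, while `target_exposure` and the function name `exposure_diagnostics` are hypothetical. The executed-exposure variants would follow the same pattern on a different column.

```python
from collections import defaultdict
from typing import Dict, List

# Cap reasons listed in this section; how they are stored in `veto_reason` is assumed.
CAP_REASONS = ("stress_cap", "breadth_divergence_cap", "crowding_cap")


def _mean_exposure(rows: List[dict]) -> float:
    return sum(r["target_exposure"] for r in rows) / len(rows)


def _cap_hit_rate(rows: List[dict]) -> float:
    # Share of rows whose veto/cap reason is one of the known caps.
    return sum(r.get("veto_reason") in CAP_REASONS for r in rows) / len(rows)


def exposure_diagnostics(ledger: List[dict]) -> Dict[str, object]:
    """Read-only summary: state mix, mean target exposure, cap-hit rates."""
    if not ledger:
        raise ValueError("ledger must contain at least one row")
    by_state = defaultdict(list)
    for row in ledger:
        by_state[row["state"]].append(row)
    return {
        "mean_target_exposure": _mean_exposure(ledger),
        "state_conditioned_mean_target_exposure": {
            s: _mean_exposure(rows) for s, rows in by_state.items()
        },
        "cap_hit_rate_overall": _cap_hit_rate(ledger),
        "cap_hit_rate_by_state": {s: _cap_hit_rate(rows) for s, rows in by_state.items()},
        "state_mix": {s: len(rows) / len(ledger) for s, rows in by_state.items()},
    }


# Toy ledger with made-up values, for illustration only.
ledger = [
    {"state": "repair", "target_exposure": 0.50, "veto_reason": "stress_cap"},
    {"state": "repair", "target_exposure": 0.75, "veto_reason": None},
    {"state": "chop", "target_exposure": 0.25, "veto_reason": None},
    {"state": "trend", "target_exposure": 1.00, "veto_reason": "crowding_cap"},
]
diag = exposure_diagnostics(ledger)
print(diag["state_conditioned_mean_target_exposure"])
```

Because the function only reads the ledger, it is compatible with the D0 requirement that no strategy output changes.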
+### Acceptance
+- no metric semantics change
+- no strategy outputs change
+- report gains new diagnostics only
+
+### Stop condition
+- if any existing stitched metrics change, rollback D0
+
+### Why this is worth doing
+Current H1b.1 failures are not fully visible if you only look at:
+- state mix,
+- annual return,
+- drawdown ratio.
+
+The real hidden variable is **effective exposure drift**.
+
+---
+
+## 4. H1b.1 — revised strategy
+
+### Core principle
+Do **not** start with aggressive repair cleanup.
+The failed attempts indicate that strong repair cleanup removes repair mass but does not create trend.
+
+So the new H1b.1 ladder must be:
+- **boundary-first**,
+- then **micro-tighten**,
+- then **mixed micro**,
+- and stop quickly if trend does not respond.
+
+### Required code change before H1b.1 ladders
+Current R1 state machine does not yet safely expose the repair boundary knobs we need.
+Make this change first in a backward-compatible way.
+
+#### Files
+- `model/state_machine.py`
+- `config/regime.yaml`
+- `tests/test_policy.py`
+
+#### Add new thresholds with permissive defaults
+In `DEFAULT_THRESHOLDS`, add:
+
+```python
+'repair_breadth_min': -1.0,
+'repair_d_trend_min': -1.0,
+```
+
+Extend the repair condition to:
+
+```python
+if (
+    row['repair_hazard'] >= thresholds['repair_hazard']
+    and row['stress_score'] <= thresholds['repair_stress_max']
+    and row['d_stress'] <= thresholds['repair_d_stress_max']
+    and row['trend_score'] < thresholds['trend_score']
+    and row['breadth_score'] >= thresholds['repair_breadth_min']
+    and row['d_trend'] >= thresholds['repair_d_trend_min']
+):
+    return 'repair'
+```
+
+In `config/regime.yaml`, keep R1 behavior unchanged by setting:
+
+```yaml
+state_machine:
+  thresholds:
+    repair_breadth_min: -1.0
+    repair_d_trend_min: -1.0
+```
+
+#### Add tests
+Add tests that verify:
+1. permissive defaults preserve existing repair behavior
+2. tightening `repair_breadth_min` can exclude a weak repair row
+3. tightening `repair_d_trend_min` can exclude a deteriorating repair row
+4. risk_off precedence is still intact
+
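The first three checks in the list above could be sketched as follows. `is_repair` is a hypothetical stand-alone copy of the repair clause, not the repo's function. The repair thresholds use the R1 values quoted in this document; the `trend_score` threshold of 0.45 and the sample row are assumptions. The sketch also assumes `breadth_score` and `d_trend` never fall below -1.0, which is what makes -1.0 a permissive default. The risk_off precedence check needs the full state machine and is not covered here.

```python
DEFAULT_THRESHOLDS = {
    "repair_hazard": 0.58,
    "repair_stress_max": 0.85,
    "repair_d_stress_max": 0.0,
    "trend_score": 0.45,          # assumed R1 value
    "repair_breadth_min": -1.0,   # permissive default
    "repair_d_trend_min": -1.0,   # permissive default
}


def is_repair(row: dict, thresholds: dict) -> bool:
    """Stand-alone copy of the extended repair condition."""
    return (
        row["repair_hazard"] >= thresholds["repair_hazard"]
        and row["stress_score"] <= thresholds["repair_stress_max"]
        and row["d_stress"] <= thresholds["repair_d_stress_max"]
        and row["trend_score"] < thresholds["trend_score"]
        and row["breadth_score"] >= thresholds["repair_breadth_min"]
        and row["d_trend"] >= thresholds["repair_d_trend_min"]
    )


# A weak repair row: hazard is high, but breadth and trend are still slipping.
weak_row = {
    "repair_hazard": 0.65,
    "stress_score": 0.60,
    "d_stress": -0.05,
    "trend_score": 0.10,
    "breadth_score": -0.10,
    "d_trend": -0.03,
}

tight_breadth = {**DEFAULT_THRESHOLDS, "repair_breadth_min": -0.02}
tight_d_trend = {**DEFAULT_THRESHOLDS, "repair_d_trend_min": -0.01}

print(is_repair(weak_row, DEFAULT_THRESHOLDS))  # permissive defaults keep the row
print(is_repair(weak_row, tight_breadth))       # breadth floor excludes it
print(is_repair(weak_row, tight_d_trend))       # d_trend floor excludes it
```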
+### Acceptance for this preparatory code change
+- no change in R1 metrics
+- no change in R1 state mix
+- targeted policy tests pass
+
+### Stop condition
+- if R1 metrics move at all after only adding permissive defaults, rollback
+
+---
+
+## 5. H1b.1-L1 — boundary-only cleanup
+
+### Intent
+Clean the weakest repair rows **without shrinking repair too much**.
+This is the safest first attempt.
+
+### Exact params
+Starting from current R1 config:
+
+```yaml
+state_machine:
+  thresholds:
+    repair_hazard: 0.58          # unchanged
+    repair_stress_max: 0.85      # unchanged
+    repair_d_stress_max: 0.0     # unchanged
+    repair_breadth_min: -0.02    # from -1.0 permissive baseline
+    repair_d_trend_min: -0.01    # from -1.0 permissive baseline
+```
+
+### Files
+- `config/regime.yaml`
+- `tests/test_policy.py` if config-coupled fixtures/assertions exist
+
+### Target effect
+- prune only clearly weak repair rows
+- preserve most repair occupancy
+- avoid the repair -> chop collapse seen in failed H1b.1 attempts
+
+### Acceptance
+All must hold:
+- `drawdown_ratio_vs_baseline <= 0.62`
+- `annual_return_delta >= R1_annual_return_delta - 0.003`
+- `upside_capture >= R1_upside_capture + 0.004`
+- `hard_pass_window_ratio >= 0.80`
+- `risk_off <= 0.325`
+- `repair in [0.14, 0.18]`
+- `trend + euphoric_late >= R1_trend_plus_euphoric`
+- `mean_target_exposure <= R1_mean_target_exposure + 0.010`
+
+### Stop condition
+Rollback L1 if any occurs:
+- `drawdown_ratio_vs_baseline > 0.63`
+- `repair < 0.13`
+- `trend + euphoric_late < R1_trend_plus_euphoric - 0.002`
+- `mean_target_exposure > R1_mean_target_exposure + 0.015`
+
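One way to make the L1 gates mechanical is a small checker like the sketch below. The function name `evaluate_l1`, the metric dictionary layout, and the "inconclusive" outcome for runs that neither pass nor trigger a stop are all assumptions. The R1 values for `annual_return_delta`, `upside_capture`, `drawdown_ratio_vs_baseline` and `hard_pass_window_ratio` are the ones quoted in section 1.2; the state shares and `mean_target_exposure` are placeholders.

```python
def evaluate_l1(m: dict, r1: dict) -> str:
    """Return 'rollback', 'accept', or 'inconclusive' for an H1b.1-L1 run."""
    offense = m["trend"] + m["euphoric_late"]
    r1_offense = r1["trend"] + r1["euphoric_late"]

    # Stop conditions are checked first: any single hit forces a rollback.
    if (
        m["drawdown_ratio_vs_baseline"] > 0.63
        or m["repair"] < 0.13
        or offense < r1_offense - 0.002
        or m["mean_target_exposure"] > r1["mean_target_exposure"] + 0.015
    ):
        return "rollback"

    # Acceptance requires every criterion to hold.
    accepted = (
        m["drawdown_ratio_vs_baseline"] <= 0.62
        and m["annual_return_delta"] >= r1["annual_return_delta"] - 0.003
        and m["upside_capture"] >= r1["upside_capture"] + 0.004
        and m["hard_pass_window_ratio"] >= 0.80
        and m["risk_off"] <= 0.325
        and 0.14 <= m["repair"] <= 0.18
        and offense >= r1_offense
        and m["mean_target_exposure"] <= r1["mean_target_exposure"] + 0.010
    )
    return "accept" if accepted else "inconclusive"


r1 = {
    "annual_return_delta": 0.008913,
    "upside_capture": 0.324220,
    "drawdown_ratio_vs_baseline": 0.575317,
    "hard_pass_window_ratio": 0.80,
    # Placeholder state shares and exposure, not taken from the repo.
    "risk_off": 0.30, "repair": 0.16, "trend": 0.10, "euphoric_late": 0.02,
    "mean_target_exposure": 0.40,
}

# Hypothetical candidate: identical to R1 except for better upside capture.
candidate = {**r1, "upside_capture": 0.334220}
# Failed retry from section 1.2 (state shares remain placeholders).
failed_retry = {**r1, "annual_return_delta": -0.004561,
                "drawdown_ratio_vs_baseline": 0.657910,
                "upside_capture": 0.344080, "hard_pass_window_ratio": 1.00}

print(evaluate_l1(candidate, r1))
print(evaluate_l1(failed_retry, r1))
```

The same shape would work for L2 and L3 with their own constants; keeping the constants per ladder rather than sharing them avoids accidentally applying one ladder's guardrail to another.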
+### Interpretation rule
+If L1 yields:
+- no drawdown breach,
+- small or flat offense improvement,
+- and repair remains stable,
+then proceed to L2.
+
+If L1 already worsens drawdown materially without trend gain, **skip L2/L3 and go direct to H1b.2 from R1**.
+
+---
+
+## 6. H1b.1-L2 — hazard-only micro-tighten
+
+### Intent
+Add only a **very small** cleanup pressure if L1 is neutral-to-positive.
+Do not change stress too aggressively.
+
+### Exact params
+Starting from L1 (not directly from R1):
+
+```yaml
+state_machine:
+  thresholds:
+    repair_hazard: 0.59
+    repair_stress_max: 0.85
+    repair_d_stress_max: 0.0
+    repair_breadth_min: -0.02
+    repair_d_trend_min: -0.01
+```
+
+### Why
+The previous failed attempts were too eager on repair cleanup.
+This step tightens only one dimension.
+
+### Acceptance
+All must hold:
+- `drawdown_ratio_vs_baseline <= 0.635`
+- `annual_return_delta >= R1_annual_return_delta - 0.004`
+- `upside_capture >= R1_upside_capture + 0.006`
+- `hard_pass_window_ratio >= 0.80`
+- `risk_off <= 0.325`
+- `repair in [0.13, 0.18]`
+- `trend + euphoric_late >= R1_trend_plus_euphoric + 0.002`
+- `mean_target_exposure <= R1_mean_target_exposure + 0.012`
+
+### Stop condition
+Rollback L2 if any occurs:
+- `drawdown_ratio_vs_baseline > 0.64`
+- `annual_return_delta < 0.0`
+- `repair < 0.12`
+- `trend + euphoric_late <= R1_trend_plus_euphoric`
+- `mean_target_exposure > R1_mean_target_exposure + 0.015`
+
+### Interpretation rule
+If L2 improves upside but trend is still flat and repair drops materially, do **not** attempt a stricter H1b.1. Move to direct H1b.2.
+
+---
+
+## 7. H1b.1-L3 — mixed micro cleanup (last H1b.1 try)
+
+### Intent
+This is the final H1b.1 attempt.
+Use only if L1 or L2 is close-but-not-enough, and drawdown remains controlled.
+
+### Exact params
+Starting from R1 (not cumulative from failed ladders unless explicitly approved):
+
+```yaml
+state_machine:
+  thresholds:
+    repair_hazard: 0.59
+    repair_stress_max: 0.80
+    repair_d_stress_max: 0.0
+    repair_breadth_min: -0.01
+    repair_d_trend_min: 0.00
+```
+
+### Notes
+- Do **not** set `repair_breadth_min >= 0.00` in H1b.1.
+- Do **not** set `repair_hazard >= 0.60` in H1b.1.
+- Do **not** set `repair_stress_max <= 0.78` in H1b.1.
+
+Those stronger settings are exactly the zone most likely to recreate the prior failure pattern.
+
+### Acceptance
+All must hold:
+- `drawdown_ratio_vs_baseline <= 0.64`
+- `annual_return_delta >= R1_annual_return_delta - 0.003`
+- `upside_capture >= R1_upside_capture + 0.010`
+- `hard_pass_window_ratio >= 0.80`
+- `risk_off <= 0.325`
+- `repair in [0.12, 0.17]`
+- `trend + euphoric_late >= R1_trend_plus_euphoric + 0.004`
+- `mean_target_exposure <= R1_mean_target_exposure + 0.015`
+
+### Stop condition
+Rollback L3 if any occurs:
+- `drawdown_ratio_vs_baseline > 0.64`
+- `annual_return_delta < 0.0`
+- `repair < 0.11`
+- `trend + euphoric_late < R1_trend_plus_euphoric + 0.002`
+- `mean_target_exposure > R1_mean_target_exposure + 0.018`
+
+---
+
+## 8. Hard rule for abandoning H1b.1
+
+If **all three** ladders fail, or if the first two ladders share this pattern:
+- `trend + euphoric_late` improvement `< +0.003`
+- `repair` drops by `>= 0.02`
+- `drawdown_ratio_vs_baseline` worsens by `>= +0.03`
+
+then conclude:
+
+> **H1b.1 is not the right lever on current PIT data.**
+
+At that point:
+- stop H1b.1,
+- revert to R1,
+- move directly to `H1b.2-direct-from-R1`.
+
+Do **not** keep searching stricter repair cleanup bundles.
+
+---
+
+## 9. H1b.2-direct-from-R1 — allowed fallback path
+
+### Why this path exists
+Current evidence says H1b.1 tends to rearrange repair/chop without creating more true trend.
+So if H1b.1 fails, the next rational move is a **small trend admission release directly from R1**.
+
+### Exact params
+Starting from R1 config:
+
+```yaml
+state_machine:
+  thresholds:
+    trend_score: 0.43
+    trend_breadth_min: -0.03
+    trend_stress_max: 0.48
+```
+
+Leave unchanged:
+- all risk_off thresholds,
+- all repair thresholds,
+- all euphoric thresholds,
+- all policy mapping.
+
+### Why these values
+This is deliberately narrower than the earlier suggested full H1b.2 bundle.
+It is a **lite release**, not a broad trend loosen.
+
+### Acceptance
+All must hold:
+- `annual_return_delta >= R1_annual_return_delta + 0.004`
+- `upside_capture >= R1_upside_capture + 0.012`
+- `drawdown_ratio_vs_baseline <= 0.62`
+- `hard_pass_window_ratio >= 0.80`
+- `risk_off <= 0.33`
+- `trend + euphoric_late >= 0.155`
+
+### Stop condition
+Rollback if any occurs:
+- `drawdown_ratio_vs_baseline > 0.64`
+- `risk_off > 0.33`
+- `hard_pass_window_ratio < 0.80`
+- `annual_turnover` rises by `> +2.0` annualized while `annual_return_delta` improves by `< +0.005`
+
+### If this passes
+Then and only then move to H2.
+
+---
+
+## 10. H2 — only after state-layer progress
+
+Do not start H2 unless either:
+- one of the H1b.1 ladders passes, or
+- `H1b.2-direct-from-R1` passes.
+
+### Reason
+Right now the biggest remaining risk is not missing policy alpha.
+It is that policy would compensate for unresolved state misclassification and hidden exposure drift.
+
+### H2 first move when allowed
+If H2 becomes allowed, the first policy adjustment should be a **small chop/trend lift**, not a euphoric lift.
+
+Recommended first H2 bundle:
+
+```yaml
+policy:
+  trend: 0.95
+  chop: 0.35
+```
+
+Leave unchanged initially:
+- `repair_rebound_base`
+- `repair_rebound_max`
+- `euphoric_late`
+
+### Why
+Current weak offense appears more constrained by:
+- too little trend occupancy,
+- too much chop with low carry,
+than by euphoric throttling.
+
+---
+
+## 11. Do not do official H3 yet
+
+Only allow **diagnostic-only** H3 dry-run if needed.
+Do not adopt H3 formally before H1b/H2 pass.
+
+Reason:
+- current bottleneck is still state admission,
+- not candidate weighting.
+
+If you do a dry-run, it must be read-only:
+- compute candidate ranking under alternate weights,
+- do not change selected production path,
+- do not update config defaults.
+
+---
+
+## 12. Global hidden issues discovered from this feedback
+
+These are not all immediate blockers for H1b, but they are real.
+
+### G1. `days_since_riskoff` is semantically wrong
+Current code:
+
+```python
+out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()
+out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0
+```
+
+This is not “days since last risk_off”.
+It is a running count of all risk_off days so far, reported on risk_off rows and set to 0 everywhere else.
+
+Do not fix this inside H1b.
+But log it as a deferred semantic bug.
+
+### G2. `days_since_breakout` is also suspicious
+Current logic behaves like “current streak while breakout_dist_120 > 0”, not clearly “days since breakout event”.
+Treat as another deferred semantic bug.
+
+### G3. `min_state_duration` is one-sided
+Current state machine only checks whether the **current state** has lasted long enough before switching.
+It does **not** require the proposed new state to persist for multiple days.
+
+This may be intended, but it is not the only reasonable interpretation of “min state duration”.
+Do not change it inside H1b.
+But record it as a possible later redesign item.
+
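To make the two readings of “min state duration” concrete, here is a toy sketch. It is not the repo's state machine: both functions, the `confirm_days` parameter, and the sample sequence are assumptions, and crash overrides are ignored.

```python
from typing import List


def smooth_one_sided(raw: List[str], min_duration: int) -> List[str]:
    """Switch as soon as the current state has lasted min_duration days."""
    if not raw:
        return []
    out, current, age = [], raw[0], 0
    for proposed in raw:
        if proposed != current and age >= min_duration:
            current, age = proposed, 0
        age += 1
        out.append(current)
    return out


def smooth_two_sided(raw: List[str], min_duration: int, confirm_days: int) -> List[str]:
    """Also require the proposed state to persist for confirm_days in a row."""
    if not raw:
        return []
    out, current, age = [], raw[0], 0
    pending, pending_run = None, 0
    for proposed in raw:
        if proposed == current:
            pending, pending_run = None, 0
        else:
            pending_run = pending_run + 1 if proposed == pending else 1
            pending = proposed
            if age >= min_duration and pending_run >= confirm_days:
                current, age = proposed, 0
                pending, pending_run = None, 0
        age += 1
        out.append(current)
    return out


raw = ["chop", "chop", "chop", "trend", "chop", "chop"]
print(smooth_one_sided(raw, min_duration=3))
print(smooth_two_sided(raw, min_duration=3, confirm_days=2))
```

With the one-sided rule, the single `trend` day flips the state and then locks it in for the following days (`chop, chop, chop, trend, trend, trend`); with a two-day confirmation the blip is ignored and the output stays `chop` throughout.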
+### G4. State changes are currently too easy to misread
+You must evaluate **both**:
+- state mix,
+- state-conditioned exposure means.
+
+If you only watch state mix, you will miss the main failure mode seen in H1b.1.
+
+---
+
+## 13. Exact Codex execution instructions
+
+Use this instruction block when handing over:
+
+```text
+Please execute the next-step ladder exactly as documented in this md.
+
+Order:
+1. D0 (diagnostic-only, optional but recommended)
+2. preparatory repair-threshold code change with permissive defaults
+3. H1b.1-L1
+4. H1b.1-L2
+5. H1b.1-L3
+6. if all H1b.1 ladders fail, revert to R1 and run H1b.2-direct-from-R1
+
+Rules:
+- Compare every block against R1 unless the md explicitly says cumulative-from-L1.
+- Do not change policy mapping before H1b.2 passes.
+- Do not touch euphoric thresholds in H1b.1.
+- After each block, report:
+  1. changed files
+  2. targeted tests run
+  3. stitched metrics delta vs R1
+  4. state mix delta vs R1
+  5. mean_target_exposure delta vs R1
+  6. state-conditioned mean target exposure delta vs R1
+- If any stop condition triggers, rollback only the current block.
+- If L1 and L2 both show trend+euphoric improvement < +0.003, repair drops >= 0.02, and drawdown_ratio_vs_baseline worsens >= +0.03, stop H1b.1 and move directly to H1b.2-direct-from-R1.
+```
+
+---
+
+## 14. Bottom line
+
+The current evidence does **not** say “the system direction is wrong”.
+It says something narrower:
+
+> Current H1b.1 attempts are cleaning repair too aggressively, and that cleanup is leaking into higher effective exposure without generating enough new trend.
+
+So the right next move is:
+- smaller H1b.1 boundary experiments,
+- explicit exposure diagnostics,
+- and a fast exit to direct H1b.2 if trend still does not respond.

+ 585 - 0
research/chinext50_regime_project/chinext50_post_b3_next_steps_for_codex_2026-04-10.md

@@ -0,0 +1,585 @@
+# Chinext50 Regime — Post-B3 Next Steps for Codex (2026-04-10)
+
+This document answers the immediate question after the B3 rollback and gives the correct next execution order.
+
+Use this document **instead of improvising the next round**.
+
+---
+
+## 0. Short answer
+
+**Do not jump to H1/H2/H3 on top of B2.**
+
+The next step is:
+
+1. **re-land B3 as a semantic-only fix**,
+2. then do **B4**,
+3. then run **H1 as two smaller passes**,
+4. only after H1 passes, move to **H2**, then **H3**.
+
+The most important correction is this:
+
+> The guardrail `risk_off <= 0.32` was applied too early to B3.
+>
+> That guardrail belongs to **H1 (threshold tuning)**, not to **B3 (precedence / semantic fix)**.
+
+So the rollback was operationally understandable, but **logically premature**.
+
+---
+
+## 1. What the current status actually means
+
+### What B1 and B2 achieved
+- B1 fixed the semantics of the main report comparison.
+- B2 removed the worst turnover-selection bias and materially improved stitched utility.
+
+That means the system is now in a better place to inspect state logic.
+
+### What the attempted B3 actually showed
+The attempted B3 did **not** prove that the B3 idea was wrong.
+It proved something narrower:
+
+- when `repair` no longer shadows `trend`,
+- `trend + euphoric_late` can come back to life,
+- but **risk_off remains too high** if the `risk_off` thresholds are still too strict.
+
+So the attempted B3 output is better interpreted as:
+
+- **B3 semantic direction looks correct**
+- but **B3 alone is not enough to compress risk_off**
+
+That is exactly what H1 was supposed to do.
+
+---
+
+## 2. Core correction: B3 and H1 were mixed together
+
+### B3 should be judged by semantic correctness
+B3 is about one thing:
+
+> Does the state machine classify overlap days correctly, so that valid trend days are not swallowed by repair?
+
+So B3 acceptance should be based on:
+- overlap-resolution unit tests,
+- no catastrophic metric damage,
+- evidence that `trend/euphoric_late` is no longer artificially suppressed.
+
+### H1 should be judged by state-mix targets
+H1 is where these targets belong:
+- `trend + euphoric_late >= 0.12`
+- `risk_off <= 0.32`
+
+Those are **threshold-tuning goals**, not B3 semantic-fix goals.
+
+---
+
+## 3. Correct execution order from here
+
+### Step A — Re-land B3 as semantic-only
+### Step B — Execute B4 (diagnostic-only cleanup)
+### Step C — Run H1a (risk_off compression only)
+### Step D — Run H1b (repair/trend cleanup)
+### Step E — Only then run H2 (policy mapping)
+### Step F — Only then run H3 (hard-pass restoration)
+
+Do **not** insert candidate-level robustness filtering before H3.
+
+---
+
+# 4. B3 re-landing plan (semantic-only)
+
+## Goal
+Re-apply the `repair` vs `trend` precedence fix, but evaluate it with the correct guardrails.
+
+## Target files
+- `model/state_machine.py`
+- `tests/test_policy.py` **or** new `tests/test_state_machine.py`
+
+## Exact code intent
+In `_raw_state()`:
+
+Current ordering is effectively:
+1. `risk_off`
+2. `repair`
+3. `trend/euphoric_late`
+4. `chop`
+
+This should become:
+1. `risk_off`
+2. `trend/euphoric_late`
+3. `repair`
+4. `chop`
+
+### Preferred implementation
+Reorder the checks.
+
+Pseudo-code:
+
+```python
+if risk_off_condition:
+    return 'risk_off'
+
+if trend_condition:
+    if euphoric_condition:
+        return 'euphoric_late'
+    return 'trend'
+
+if repair_condition:
+    return 'repair'
+
+return 'chop'
+```
+
+### Alternative implementation
+If you do not want to reorder, then make repair explicitly exclusive:
+
+```python
+repair_condition = (
+    repair_hazard >= repair_threshold
+    and stress_score <= repair_stress_max
+    and d_stress <= 0.0
+    and trend_score < trend_threshold
+)
+```
+
+But **reordering is preferred**, because it matches the intended economic semantics better.
+
+## Required tests
+Add / update tests that verify:
+
+1. when a row satisfies both `repair` and `trend`, final proposal is `trend`
+2. when a row satisfies `trend` and euphoric condition, final proposal is `euphoric_late`
+3. when `risk_off` also fires, `risk_off` still wins
+
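The three required tests could look roughly like the sketch below. `raw_state` here is a hypothetical stand-alone function that applies the reordered precedence with the pre-H1 threshold values quoted later in this document (the "from" values); the real `_raw_state()` signature and the sample row are assumptions.

```python
def raw_state(row: dict) -> str:
    """Reordered precedence: risk_off, then trend/euphoric_late, then repair, then chop."""
    risk_off = row["down_hazard"] >= 0.62 or (
        row["stress_score"] >= 0.85 and row["trend_score"] <= -0.10
    )
    trend = (
        row["trend_score"] >= 0.45
        and row["breadth_score"] >= -0.05
        and row["stress_score"] <= 0.45
    )
    euphoric = row["crowding_score"] >= 0.70 or row["rebound_hazard"] >= 0.68
    repair = (
        row["repair_hazard"] >= 0.58
        and row["stress_score"] <= 0.85
        and row["d_stress"] <= 0.0
    )
    if risk_off:
        return "risk_off"
    if trend:
        return "euphoric_late" if euphoric else "trend"
    if repair:
        return "repair"
    return "chop"


# A row that satisfies both the repair and the trend conditions.
overlap_row = {
    "down_hazard": 0.30, "stress_score": 0.30, "d_stress": -0.02,
    "trend_score": 0.55, "breadth_score": 0.10,
    "crowding_score": 0.40, "rebound_hazard": 0.30, "repair_hazard": 0.65,
}

print(raw_state(overlap_row))                             # overlap resolves to trend
print(raw_state({**overlap_row, "crowding_score": 0.80}))  # trend plus euphoric condition
print(raw_state({**overlap_row, "down_hazard": 0.70}))     # risk_off still wins
```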
+## Correct B3 acceptance criteria
+B3 should pass if:
+
+1. unit tests pass,
+2. overlap rows resolve to `trend` not `repair`,
+3. `trend + euphoric_late` does not fall vs B2,
+4. no catastrophic stitched deterioration occurs.
+
+### Use these B3 guardrails
+- `tests/test_policy.py` or `tests/test_state_machine.py` pass
+- `trend + euphoric_late >= B2_trend_plus_euphoric`
+- `stitched annual_return_delta >= B2_annual_return_delta - 0.015`
+- `stitched drawdown_ratio_vs_baseline <= B2_drawdown_ratio_vs_baseline + 0.05`
+
+### Do **not** use this as a B3 guardrail
+- `risk_off <= 0.32`
+
+That belongs to H1.
+
+---
+
+# 5. B4 should be executed before H1
+
+## Goal
+Make sure `positive_window_ratio` is labeled and treated as diagnostic-only.
+
+## Why now
+Because after B2, it is already misleading to treat `positive_window_ratio` as a primary acceptance proxy.
+You should clean this up before running the next tuning pass, so future comparisons are less confusing.
+
+## Target files
+- `backtest/frozen_walkforward.py`
+- `pipelines/real_walkforward_report.py`
+
+## Exact change
+- keep the field for backward compatibility
+- explicitly label it as **diagnostic only** in report text and summary comments
+- ensure candidate acceptance does **not** depend on it directly
+
+## Acceptance
+- tests pass
+- report clearly says primary acceptance is based on `primary_window_success_ratio` and `hard_pass_window_ratio`
+
+---
+
+# 6. H1 should be split into H1a and H1b
+
+Do **not** do one giant threshold rewrite.
+
+The attempted B3 already showed that changing classification and changing thresholds at the same time makes diagnosis harder.
+
+So split H1 into two micro-blocks.
+
+---
+
+## H1a — Compress risk_off only
+
+### Goal
+Reduce `risk_off` frequency without touching the repair/trend competition too much.
+
+### Why first
+Local replay shows current `risk_off` frequency is dominated mainly by the `down_hazard` clause, not by the `(stress_score, trend_score)` sub-clause.
+So the highest-value first move is to loosen the main `down_hazard` trigger slightly.
+
+### Target files
+- `model/state_machine.py`
+- optionally `config/regime.yaml` if you parameterize thresholds properly
+
+### Exact threshold changes for H1a
+Use these first, and do **not** add extra threshold moves in the same pass:
+
+```python
+# H1a thresholds inside `_raw_state()`; previous values in the comments.
+risk_off_condition = (
+    down_hazard >= 0.68                                   # from 0.62
+    or (stress_score >= 0.90 and trend_score <= -0.15)    # from 0.85 / -0.10
+)
+
+crash_override_condition = down_hazard >= 0.78            # from 0.72
+```
+
+### Important
+Leave these unchanged in H1a:
+- repair threshold
+- trend threshold
+- euphoric threshold
+- policy mapping
+
+### Expected direction
+- `risk_off` should fall materially
+- `trend + euphoric_late` should stay alive after B3 re-landing
+- drawdown may rise slightly, but should remain acceptable
+
+### H1a acceptance criteria
+- `risk_off <= 0.32`
+- `trend + euphoric_late >= 0.14`
+- `stitched drawdown_ratio_vs_baseline <= 0.68`
+- `stitched annual_return_delta >= B3_annual_return_delta - 0.01`
+
+### H1a stop conditions
+Rollback H1a if any of these occurs:
+- `risk_off > 0.32`
+- `drawdown_ratio_vs_baseline > 0.70`
+- `trend + euphoric_late < 0.12`
+
+---
+
+## H1b — Clean up repair vs trend boundaries
+
+Run H1b only if H1a passes.
+
+### Goal
+Reduce weak repair persistence and improve true trend admission.
+
+### Target files
+- `model/state_machine.py`
+- optionally `config/regime.yaml`
+
+### Exact threshold changes for H1b
+Start with this conservative bundle:
+
+```python
+# H1b thresholds inside `_raw_state()`; previous values in the comments.
+trend_condition = (
+    trend_score >= 0.40          # from 0.45
+    and breadth_score >= -0.05   # unchanged in first H1b pass
+    and stress_score <= 0.50     # from 0.45
+)
+
+euphoric_condition = (
+    crowding_score >= 0.78       # from 0.70
+    or rebound_hazard >= 0.76    # from 0.68
+)
+
+repair_condition = (
+    repair_hazard >= 0.60        # from 0.58
+    and stress_score <= 0.75     # from 0.85
+    and d_stress <= 0.0          # unchanged
+    and trend_score < 0.40       # new exclusivity guard if needed
+)
+```
+
+### Why this bundle
+- `trend_score` threshold moves only a little
+- `stress_score` cap for trend loosens only a little
+- `repair` becomes slightly stricter and less likely to absorb borderline trend days
+- `euphoric_late` becomes more selective, preventing premature late-stage classification
+
+### H1b acceptance criteria
+- `risk_off <= 0.32`
+- `trend + euphoric_late >= 0.16`
+- `repair` share stays in a plausible band: `0.12 <= repair <= 0.22`
+- `stitched upside_capture >= H1a_upside_capture + 0.02`
+- `stitched drawdown_ratio_vs_baseline <= 0.70`
+
+### H1b stop conditions
+Rollback H1b if any of these occurs:
+- `repair < 0.10` or `repair > 0.25`
+- `risk_off > 0.32`
+- `drawdown_ratio_vs_baseline > 0.70`
+- `upside_capture` does not improve but turnover rises materially
+
+---
+
+# 7. H2 — Policy tuning only after H1 passes
+
+Do **not** touch policy mapping before H1 passes.
+
+## Why
+Right now the biggest bottleneck is still classification quality.
+If you change policy too early, you will be compensating for a classification problem with exposure hacks.
+
+## Target files
+- `model/policy.py`
+- `config/regime.yaml`
+
+## First-pass H2 values
+Use this only after H1b passes:
+
+```yaml
+policy:
+  trend: 0.95
+  euphoric_late: 0.70
+  chop: 0.35
+  risk_off: 0.00
+  repair_rebound_base: 0.40
+  repair_rebound_max: 0.85
+trading:
+  max_daily_exposure_change: 0.35
+```
+
+## H2 acceptance criteria
+- `stitched upside_capture >= H1_best_upside_capture + 0.05`
+- `stitched drawdown_ratio_vs_baseline <= 0.70`
+- `annual_turnover <= 14`
+- `mean_executed_exposure <= 0.50`
+
+## H2 stop conditions
+Rollback H2 if:
+- upside does not improve by at least `+0.03`
+- drawdown ratio exceeds `0.70`
+- turnover increases by more than `+3` annualized without upside benefit
+
+---
+
+# 8. H3 — Restore hard-pass ratio only after H2
+
+## Why later
+Current `hard_pass_window_ratio = 0.6` is low, but restoring it too early can simply push the system back toward one overly defensive candidate.
+That is exactly what we want to avoid.
+
+So H3 should happen **after** state mix and exposure are less distorted.
+
+## Target file
+- `config/regime.yaml`
+
+## First-pass H3 changes
+Use these values only after H2:
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    upside_capture_min: 0.26
+    max_drawdown_ratio_vs_benchmark: 0.75
+    annual_turnover_soft_max: 19.0
+    annual_return_override_abs: 0.04
+    annual_return_override_ratio: 0.35
+    return_ratio_weight: 0.25
+    upside_weight: 0.25
+    drawdown_weight: 0.25
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.15
+    turnover_penalty_per_unit: 0.012
+    turnover_penalty_start: 13.0
+```
+
+## H3 acceptance criteria
+- `hard_pass_window_ratio >= 0.80`
+- `frontier_fallback_no_hard_pass <= 1`
+- `primary_window_success_ratio >= 0.60`
+- no candidate wins `> 80%` of processed windows unless performance clearly improves
+
+## H3 stop conditions
+Rollback H3 if:
+- one candidate dominates `>= 80%` of windows again,
+- `annual_return_delta` falls,
+- or `drawdown_ratio_vs_baseline` worsens meaningfully.
+
+---
+
+# 9. Should candidate-level robustness filtering be added now?
+
+## Short answer
+**No. Not yet.**
+
+## Why
+At this stage, extra robustness filtering would likely hide unresolved problems in:
+- state classification,
+- risk_off over-triggering,
+- and candidate selection calibration.
+
+That would be treating symptoms before the main structure is fixed.
+
+## Correct timing
+Candidate-level robustness filtering becomes reasonable **after H3**, and even then:
+- start as a **fallback tie-breaker**,
+- not as a hard pre-filter.
+
+### Acceptable later version
+Only in fallback mode, prefer candidates with:
+- better constraint margin,
+- lower cross-window rank variance,
+- and lower exposure volatility.
+
+But do **not** add this before H3.
+
+---
+
+# 10. Hidden / global issues Codex should keep in mind
+
+## G1. Thresholds are still hardcoded inside `model/state_machine.py`
+This is a real maintainability bug.
+
+Right now, many state thresholds are coded directly in `_raw_state()` and are not actually driven by YAML.
+That means tuning can create config drift and hidden mismatch between code and config.
+
+### Recommendation
+As part of B3/H1 work, move all state thresholds into a structured config object, for example:
+
+```yaml
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+  thresholds:
+    risk_off_down_hazard: 0.68
+    risk_off_stress: 0.90
+    risk_off_trend_floor: -0.15
+    crash_override_down_hazard: 0.78
+    repair_hazard: 0.60
+    repair_stress_max: 0.75
+    trend_score: 0.40
+    trend_breadth_min: -0.05
+    trend_stress_max: 0.50
+    euphoric_crowding: 0.78
+    euphoric_rebound_hazard: 0.76
+```
+
+Then read these in `StateConfig` rather than leaving them hardcoded.
+
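A possible shape for that config object, sketched with a plain dataclass. `StateThresholds` and `from_mapping` are hypothetical names (the document only names `StateConfig`), the defaults copy the YAML example above, and the mapping is assumed to be the already-parsed `state_machine.thresholds` section. Rejecting unknown keys is one way to surface config drift early.

```python
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class StateThresholds:
    """Hypothetical container; defaults copy the YAML example above."""
    risk_off_down_hazard: float = 0.68
    risk_off_stress: float = 0.90
    risk_off_trend_floor: float = -0.15
    crash_override_down_hazard: float = 0.78
    repair_hazard: float = 0.60
    repair_stress_max: float = 0.75
    trend_score: float = 0.40
    trend_breadth_min: float = -0.05
    trend_stress_max: float = 0.50
    euphoric_crowding: float = 0.78
    euphoric_rebound_hazard: float = 0.76

    @classmethod
    def from_mapping(cls, raw: Mapping[str, float]) -> "StateThresholds":
        # Fail loudly on misspelled or stale keys instead of silently ignoring them.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise KeyError(f"unknown state_machine thresholds: {unknown}")
        return cls(**{name: float(value) for name, value in raw.items()})


# Example: override two thresholds, keep the rest at their defaults.
overrides = {"trend_score": 0.43, "trend_stress_max": 0.48}
thresholds = StateThresholds.from_mapping(overrides)
print(thresholds.trend_score, thresholds.repair_hazard)
```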
+This is not optional if you want H1/H2/H3 to remain interpretable.
+
+---
+
+## G2. B3 was semantically right even though the branch was rolled back
+The attempted B3 produced roughly:
+- `trend + euphoric_late ≈ 0.1465`
+- `risk_off ≈ 0.3686`
+
+Interpretation:
+- the precedence fix did revive offense classification,
+- but it did not solve risk_off inflation.
+
+So the branch failed the **wrong** guardrail, not the **wrong** idea.
+
+---
+
+## G3. `risk_off` is mainly driven by the `down_hazard` threshold
+Current evidence suggests the `(stress_score, trend_score)` sub-clause contributes little compared with the main `down_hazard >= ...` trigger.
+
+So H1 should first move the `down_hazard` threshold, not try a large bundle of unrelated threshold edits.
+
+---
+
+## G4. `positive_window_ratio` must remain secondary
+Even after B2, this field is still too easy to misread.
+Do not use it as a hard acceptance metric.
+
+---
+
+## G5. Candidate grid is still not very orthogonal
+Current candidates move several knobs together.
+That makes attribution harder.
+
+Not a blocker now, but after H3 you should consider adding a more orthogonal grid:
+- trend-only variants
+- chop-only variants
+- repair-only variants
+- cap-tightness variants
+
+---
+
+## G6. `days_since_riskoff` naming looks suspicious
+Current implementation behaves more like a risk-off streak counter than a true “days since last risk_off” variable.
+If it is used later for policy logic, rename or fix it before relying on it.
+
+This is not the immediate blocker, but it is a hidden semantic trap.
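
To make the difference concrete, here is a small sketch of the two semantics. Both function names are hypothetical and neither is taken from the repository:

```python
def riskoff_streak(states):
    """Consecutive risk_off days ending at each row (0 when not in risk_off)."""
    out, run = [], 0
    for state in states:
        run = run + 1 if state == "risk_off" else 0
        out.append(run)
    return out


def days_since_last_riskoff(states):
    """Days elapsed since the most recent risk_off row (None before the first)."""
    out, since = [], None
    for state in states:
        if state == "risk_off":
            since = 0
        elif since is not None:
            since += 1
        out.append(since)
    return out


states = ["trend", "risk_off", "risk_off", "chop", "chop", "risk_off"]
streak = riskoff_streak(states)
since = days_since_last_riskoff(states)
```

The two series move in opposite directions around a risk-off episode, so policy logic written against one meaning would misbehave if fed the other.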
+
+---
+
+# 11. Final recommended order for Codex
+
+Use this exact order:
+
+1. **Re-land B3 semantic-only**
+2. **Execute B4**
+3. **H1a risk_off compression only**
+4. **H1b repair/trend cleanup**
+5. **H2 policy mapping**
+6. **H3 hard-pass restoration**
+7. only after that consider robustness tie-breaks
+
+---
+
+# 12. Exact instruction block to paste to Codex
+
+```text
+Please continue from the current B2 state.
+
+Important correction:
+B3 was rolled back too early because an H1 guardrail was applied to a B3 semantic fix.
+
+Execute the next sequence exactly as follows:
+1. Re-land B3 as semantic-only precedence fix in model/state_machine.py.
+2. Add/adjust tests so overlap rows resolve to trend, not repair.
+3. Evaluate B3 with semantic guardrails only; do NOT require risk_off <= 0.32 at B3.
+4. Execute B4 and label positive_window_ratio as diagnostic-only.
+5. Run H1a as a narrow risk_off-compression pass only:
+   - risk_off down_hazard 0.62 -> 0.68
+   - risk_off stress/trend clause 0.85/-0.10 -> 0.90/-0.15
+   - crash_override threshold 0.72 -> 0.78
+6. If H1a passes, run H1b with the conservative repair/trend cleanup bundle from the md.
+7. Only after H1 passes, move to H2 policy tuning.
+8. Only after H2 passes, move to H3 hard-pass restoration.
+9. Do not add candidate-level robustness filtering before H3.
+10. As part of B3/H1, move state thresholds out of hardcoded literals and into config-driven state_machine.thresholds.
+
+For every block:
+- list changed files
+- run targeted tests
+- report stitched metrics delta
+- report state mix delta
+- stop and rollback the block if its stop conditions fail
+```
+
+---
+
+# 13. Minimal acceptance checklist from here
+
+## After B3 re-landing
+- overlap tests pass
+- `trend + euphoric_late` does not regress
+- no catastrophic stitched deterioration
+
+## After H1a
+- `risk_off <= 0.32`
+- `trend + euphoric_late >= 0.14`
+- `drawdown_ratio_vs_baseline <= 0.68`
+
+## After H1b
+- `trend + euphoric_late >= 0.16`
+- `repair` in `[0.12, 0.22]`
+- `upside_capture` improves by `>= 0.02`
+
+## After H2
+- `upside_capture` improves by `>= 0.05`
+- `drawdown_ratio_vs_baseline <= 0.70`
+
+## After H3
+- `hard_pass_window_ratio >= 0.80`
+- `frontier_fallback_no_hard_pass <= 1`
+- `primary_window_success_ratio >= 0.60`
+
+---
+
+If forced to choose only one sentence:
+
+> Re-land B3, because it was stopped for the wrong reason; then use H1a to reduce risk_off, because that is the real next bottleneck.

+ 787 - 0
research/chinext50_regime_project/chinext50_recalibrate_guidance_for_codex_2026-04-09.md

@@ -0,0 +1,787 @@
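+# Chinext50 Recalibrate Bundle Guidance (for Codex)
+
+Version: 2026-04-09  
+Applies to: the `deliverables/gpt_pro_bundle_recalibrate_2026-04-09` bundle and the quant repository it lives in  
+Purpose: answer the questions in `CONTEXT_FOR_GPT_PRO.md` / `QUESTIONS_FOR_GPT_PRO.md`, give modification advice that can be applied directly to the code, and flag global direction problems and potential hidden bugs.
+
+---
+
+## 0. Conclusions first
+
+The **direction** of this recalibration is right:
+- candidates are no longer all `defensive`
+- candidate selection (strictly, selection of a policy hypothesis) moved from a single utility to hard-constraint-first + ranking
+- the summary/board now carry more diagnostic information
+
+But the current results show:
+
+1. **Diversified candidate selection does not mean the economic outcome is right yet.**  
+   The new result only turns "always pick defensive" into "switch among baseline / balanced_capture / pro_risk / defensive", while the full-sample result still clearly lags the baseline:
+   - strategy annual return = `0.0666`
+   - baseline annual return = `0.1463`
+   - strategy upside capture = `0.2849`
+   - positive_window_ratio = `0.20`
+
+2. **The core problem right now is not the state thresholds; it is that the evaluation semantics and the ranking/fallback are not yet fully aligned with the goal.**  
+   In other words, fixing the selection / fallback / report semantics first is worth more than changing the state machine directly.
+
+3. **The current `positive_window_ratio=0.20` cannot be read directly as "the strategy's economic quality is 0.20".**  
+   It is not outright wrong, but it currently defines a window as positive when `test_utility_total_score > 0`. That utility penalizes turnover too heavily for an active regime strategy, so the window success rate is systematically depressed.
+
+4. **The full-sample comparison in the current `real_walkforward_report.py` is not the true full-sample result of the stitched frozen OOS strategy.**  
+   It compares:
+   - the "default-config full-sample strategy" from `run_strategy_bundle(raw, config)`
+   - the `buy-and-hold baseline`
+   rather than the OOS strategy obtained by stitching together the candidate selected in each frozen walk-forward window. As a result, the "candidate distribution" and the "full-sample strategy metrics" in the report do not describe the same object.
+
+These four points are the most important judgments of this round.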
+
+---
+
+## 1. Code locations in this bundle (required reading for Codex)
+
+### 1.1 Selection logic
+- `backtest/frozen_walkforward.py`
+  - `209-256`: `_compute_selection_score()`
+  - `259-280`: `_evaluate_hard_constraints()`
+  - `354-381`: ranking pool + fallback logic
+  - `461-465`: definition of `positive_window_ratio`
+
+### 1.2 Report logic
+- `pipelines/real_walkforward_report.py`
+  - `214-221`: runs the frozen walk-forward
+  - `222-248`: computes the full-sample strategy/baseline comparison
+  - `251+`: writes the summary
+
+### 1.3 Default parameters
+- `config/regime.yaml`
+  - `114-131`: default candidate-selection parameters
+  - `132-164`: frozen validation candidates
+
+---
+
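+## 2. Direct answers to QUESTIONS_FOR_GPT_PRO.md
+
+---
+
+## Q1. How should the hard constraints + ranking objective change?
+
+### Current problems
+The main problem with the current selection is not that hard constraints are the wrong direction; it is the following four points:
+
+1. `upside_capture_min=0.25` is too low to actually filter out candidates whose offense is too weak.  
+2. `annual_turnover_soft_max=12` is too tight for the current regime strategy: in the 2024 train window it fails every candidate and forces the system into fallback.  
+3. `stability_score` is currently, in effect:
+   - utility > 0 -> 1
+   - utility <= 0 -> 0
+   This is a **binary, dead score**: with most train utilities currently negative, it is almost completely ineffective.  
+4. `return_ratio = annual_return / benchmark_return` is unstable when the benchmark return is very small or negative.
+
+### Recommended hard-constraint defaults (first round)
+
+Please change the current defaults to the following set: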
+
+```yaml
+frozen_validation:
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    utility_floor: -0.15
+    utility_target: 0.05
+    fallback_mode: closest_to_feasible_frontier
+```
+
+### Recommended new selection score formula (exact version)
+
+In `backtest/frozen_walkforward.py:209-256`, change the logic to the following:
+
+```python
+if benchmark_return > 0.05:
+    return_ratio = clip(annual_return / benchmark_return, 0.0, score_cap)
+else:
+    # Avoid the ratio blowing up or distorting when the benchmark is very small or <= 0
+    return_ratio = clip(annual_return / 0.10, 0.0, score_cap)
+
+upside_score = clip((upside_capture - 0.15) / max(upside_target - 0.15, 1e-12), 0.0, score_cap)
+drawdown_score = clip(drawdown_improvement_ratio / drawdown_improvement_target, 0.0, score_cap)
+sharpe_delta_score = clip((sharpe_delta + sharpe_delta_shift) / sharpe_delta_scale, 0.0, score_cap)
+
+# Stop using the binary "utility > 0" stability; use a smooth utility margin instead
+stability_score = clip((utility_total_score - utility_floor) / max(utility_target - utility_floor, 1e-12), 0.0, score_cap)
+
+turnover_penalty = max(0.0, annual_turnover - turnover_penalty_start) * turnover_penalty_per_unit
+
+selection_score = (
+    return_ratio_weight * return_ratio
+    + upside_weight * upside_score
+    + drawdown_weight * drawdown_score
+    + sharpe_delta_weight * sharpe_delta_score
+    + stability_weight * stability_score
+    - turnover_penalty
+)
+```
+
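+### Why this set of values
+
+- `upside_capture_min` raised from `0.25` to `0.28`:  
+  A slight increase first, to avoid emptying the ranking pool in one step. The train upside of the selected candidates in the current train windows mostly falls in the `0.21~0.31` range; jumping straight to `0.40` would collapse hard-pass coverage.
+
+- `annual_turnover_soft_max` from `12 -> 18`:  
+  The current strategy is not low-turnover buy-and-hold; it is regime-aware active exposure control. `12` is too tight for the current candidate set and easily triggers fallback.
+
+- `turnover_penalty_start` from `10 -> 12`, `turnover_penalty_per_unit` from `0.02 -> 0.015`:  
+  Keep a penalty on high turnover, but do not let it swallow every other dimension.
+
+- `upside_target` from `0.60 -> 0.45`:  
+  Train upside for the current candidates is generally far below 0.60. With 0.60 the upside dimension becomes too flat to separate genuinely better candidates.
+
+- Smoothing `stability_score`:  
+  Most train utilities are currently negative, so the binary stability is effectively always `0` and this weight has no effect at all.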
+
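The stability point can be illustrated with a short sketch. The three train utilities are made-up values, and `clip` is a local helper defined here rather than a repository function; the floor, target and cap are the defaults recommended in the YAML block:

```python
def clip(x, lo, hi):
    """Clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def binary_stability(utility):
    """Current rule: 1 for positive utility, 0 otherwise."""
    return 1.0 if utility > 0.0 else 0.0


def smooth_stability(utility, utility_floor=-0.15, utility_target=0.05, score_cap=1.20):
    """Proposed rule: linear margin between the floor and the target."""
    span = max(utility_target - utility_floor, 1e-12)
    return clip((utility - utility_floor) / span, 0.0, score_cap)


train_utilities = [-0.20, -0.10, -0.03]  # illustrative values, all negative
binary = [binary_stability(u) for u in train_utilities]
smooth = [smooth_stability(u) for u in train_utilities]
```

Under the binary rule all three candidates tie at zero, whereas the smooth margin ranks the candidate with utility `-0.03` above the one with `-0.10`, which is the differentiation the `stability_weight` needs in order to matter.
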
+### Recommended hard-constraint evaluation logic (exact version)
+
+In `backtest/frozen_walkforward.py:259-280`, change the turnover override to a combination of an absolute threshold and a relative threshold:
+
+```python
+return_override_threshold = max(
+    annual_return_override_abs,
+    annual_return_override_ratio * max(benchmark_return, 0.0),
+)
+
+if annual_turnover > annual_turnover_soft_max and annual_return < return_override_threshold:
+    fail_turnover = True
+```
+
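+Do not keep using a single fixed `annual_return_override=0.03`.  
+Reasons:
+- an absolute 3% is unstable across train-window lengths and benchmark regimes
+- it is even less stable for the partial last window
+
+---
+
+## Q2. Should the fallback policy stay utility-based?
+
+### Conclusion
+**Keeping the "pure max-utility" fallback is not recommended.**  
+Change it to:
+
+> **closest-to-feasible-frontier fallback**
+
+That is: when no candidate hard-passes, prefer the candidate closest to the feasible constraints rather than the one with the highest utility.
+
+### Current problem
+The current fallback happens at:
+- `selection_mode = utility_fallback_no_hard_pass`
+- code location: `backtest/frozen_walkforward.py:371-381`
+
+This causes a problem:
+- if every candidate fails, the system leans toward the "least bad utility"
+- but the utility itself is strongly distorted by the turnover penalty
+- so it tends to fall back to more defensive, lower-upside candidates
+
+### Strict decision rule (Codex should implement it as written)
+
+For all non-hard-pass candidates, compute the violation distance: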
+
+```python
+upside_gap = max(0.0, upside_capture_min - upside_capture) / max(upside_capture_min, 1e-12)
+
+drawdown_ratio = max_drawdown / benchmark_max_drawdown if benchmark_max_drawdown > 1e-12 else 0.0
+drawdown_gap = max(0.0, drawdown_ratio - max_drawdown_ratio_vs_benchmark) / max(max_drawdown_ratio_vs_benchmark, 1e-12)
+
+return_override_threshold = max(
+    annual_return_override_abs,
+    annual_return_override_ratio * max(benchmark_return, 0.0),
+)
+turnover_gap = 0.0
+if annual_turnover > annual_turnover_soft_max and annual_return < return_override_threshold:
+    turnover_gap = (annual_turnover - annual_turnover_soft_max) / max(annual_turnover_soft_max, 1e-12)
+
+violation_distance = 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap
+```
+
+Then the fallback choice is:
+
+```python
+fallback_score = -violation_distance + 0.25 * selection_score
+```
+
+Selection rule:
+1. the highest `fallback_score` wins  
+2. on a tie, the higher `selection_score` wins  
+3. on a further tie, the higher `utility_total_score` wins  
+4. if still tied, keep the deterministic tie-break from the current candidate order
+
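The rule above can be sketched end to end as follows. The settings are the recommended first-round defaults; the two candidates and all of their metrics are invented for illustration, and `pick_fallback` is a hypothetical name, not an existing repository function:

```python
SETTINGS = {
    "upside_capture_min": 0.28,
    "max_drawdown_ratio_vs_benchmark": 0.72,
    "annual_turnover_soft_max": 18.0,
    "annual_return_override_abs": 0.05,
    "annual_return_override_ratio": 0.40,
}


def violation_distance(m, s):
    """Weighted, normalized distance from the hard-constraint frontier."""
    upside_min = s["upside_capture_min"]
    upside_gap = max(0.0, upside_min - m["upside_capture"]) / max(upside_min, 1e-12)
    bench_dd = m["benchmark_max_drawdown"]
    dd_ratio = m["max_drawdown"] / bench_dd if bench_dd > 1e-12 else 0.0
    dd_cap = s["max_drawdown_ratio_vs_benchmark"]
    drawdown_gap = max(0.0, dd_ratio - dd_cap) / max(dd_cap, 1e-12)
    threshold = max(
        s["annual_return_override_abs"],
        s["annual_return_override_ratio"] * max(m["benchmark_return"], 0.0),
    )
    soft_max = s["annual_turnover_soft_max"]
    turnover_gap = 0.0
    if m["annual_turnover"] > soft_max and m["annual_return"] < threshold:
        turnover_gap = (m["annual_turnover"] - soft_max) / max(soft_max, 1e-12)
    return 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap


def pick_fallback(candidates, s):
    """Apply the four-level tie-break; candidates is a list of (name, metrics)."""
    def key(item):
        index, (_, m) = item
        fallback_score = -violation_distance(m, s) + 0.25 * m["selection_score"]
        # -index makes the earlier candidate win the final deterministic tie.
        return (fallback_score, m["selection_score"], m["utility_total_score"], -index)
    return max(enumerate(candidates), key=key)[1][0]


candidates = [
    ("defensive", dict(upside_capture=0.18, max_drawdown=0.15, benchmark_max_drawdown=0.40,
                       annual_turnover=8.0, annual_return=0.04, benchmark_return=0.12,
                       selection_score=0.30, utility_total_score=-0.02)),
    ("balanced_capture", dict(upside_capture=0.27, max_drawdown=0.30, benchmark_max_drawdown=0.40,
                              annual_turnover=16.0, annual_return=0.07, benchmark_return=0.12,
                              selection_score=0.35, utility_total_score=-0.08)),
]
chosen = pick_fallback(candidates, SETTINGS)
```

In this made-up case `defensive` has the higher utility, but `balanced_capture` is much closer to the frontier, so the frontier-based fallback selects it.
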
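+### Why this is better
+
+Because the semantics of fallback should not be:
+- "pick whoever has the least bad utility"
+
+but rather:
+- "pick whoever is closest to satisfying the constraints"
+
+This matches the economic meaning of frozen selection better.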
+
+---
+
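+## Q3. How should the window-level acceptance protocol be defined?
+
+### Conclusion
+**Stop treating `positive_window_ratio = (test_utility_total_score > 0)` as the primary acceptance metric.**
+
+The current definition leans too heavily on utility, and utility in turn is too harsh on active turnover.
+
+### Make two distinctions first
+
+#### A. Primary windows
+- full-year test windows
+- in the current bundle: 2022 / 2023 / 2024 / 2025
+
+#### B. Partial tail window
+- the incomplete tail window produced by `allow_partial_last_test: true`
+- in the current bundle: 2026-01-05 to 2026-04-09
+
+**Recommendation: primary acceptance looks only at A; do not weight the partial tail window equally with full-year windows.**
+
+### New window success definition (recommended)
+
+Add an explicit economic criterion for window success: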
+
+```python
+window_success = (
+    test_annual_return > 0.0
+    and test_upside_capture >= 0.25
+    and (test_max_drawdown / test_benchmark_max_drawdown) <= 0.80
+    and test_annual_turnover <= 22.0
+)
+```
+
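A hedged sketch of how the success rule and the primary/partial split could combine. The window records are invented; the `rows` field and the 180-row cutoff (the `primary_window_min_rows` value suggested later for `config/regime.yaml`) are assumptions about how a partial window would be detected:

```python
def window_success(w):
    """Economic success rule for one test window, thresholds as in the text."""
    dd_ratio = w["test_max_drawdown"] / w["test_benchmark_max_drawdown"]
    return (
        w["test_annual_return"] > 0.0
        and w["test_upside_capture"] >= 0.25
        and dd_ratio <= 0.80
        and w["test_annual_turnover"] <= 22.0
    )


def success_ratio(windows):
    """Share of successful windows, or None for an empty group."""
    if not windows:
        return None
    return sum(window_success(w) for w in windows) / len(windows)


def split_windows(windows, primary_min_rows=180):
    """Separate full-year (primary) windows from short tail windows."""
    primary = [w for w in windows if w["rows"] >= primary_min_rows]
    partial = [w for w in windows if w["rows"] < primary_min_rows]
    return primary, partial


def make(rows, ret, upside, dd, bench_dd, turnover):
    return dict(rows=rows, test_annual_return=ret, test_upside_capture=upside,
                test_max_drawdown=dd, test_benchmark_max_drawdown=bench_dd,
                test_annual_turnover=turnover)


windows = [
    make(242, 0.05, 0.30, 0.10, 0.25, 15.0),   # passes every condition
    make(242, -0.02, 0.20, 0.12, 0.20, 16.0),  # fails: negative return
    make(242, 0.08, 0.35, 0.18, 0.20, 14.0),   # fails: drawdown ratio 0.90
    make(243, 0.29, 0.40, 0.08, 0.24, 17.0),   # passes every condition
    make(62, -0.10, 0.10, 0.09, 0.10, 30.0),   # short tail window
]
primary, partial = split_windows(windows)
primary_window_success_ratio = success_ratio(primary)
partial_window_success_ratio = success_ratio(partial)
all_window_ratio = success_ratio(windows)
```

Keeping the tail window out of the denominator is what separates `primary_window_success_ratio` from a naive all-window ratio, which the short window would otherwise pull down.
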
+### Exact pass/fail criteria for the next round
+
+#### Primary acceptance (must be met)
+1. `primary_window_success_ratio >= 0.50`  
+   - i.e. at least half of the full-year windows succeed
+
+2. `hard_pass_window_ratio >= 0.80`
+
+3. `max_primary_window_drawdown_ratio_vs_baseline <= 0.80`
+
+4. `median_primary_window_upside_capture >= 0.25`
+
+5. `full_sample_drawdown_ratio_vs_baseline <= 0.60`
+
+6. `full_sample_annual_turnover <= 18.0`
+
+#### Target acceptance (stretch goals, not blockers)
+1. `primary_window_success_ratio >= 0.60`
+2. `median_primary_window_upside_capture >= 0.30`
+3. `full_sample_upside_capture >= 0.35`
+4. `full_sample_annual_return >= 0.08`
+5. `utility_delta_vs_baseline >= -0.05` (provided the utility has been recalibrated)
+
+### Acceptance definitions the current bundle should stop using
+
+Demote the following definition to an **auxiliary diagnostic**; do not use it as the primary gate any more:
+- `positive_window_ratio = (test_utility_total_score > 0)`
+
+Reasons:
+- the OOS performance of the 2025 window is not bad (annualized return `0.2953`, drawdown improvement `0.6653`), yet its utility is still negative at `-0.0917`
+- this shows the utility sign is currently unsuitable as the primary criterion
+
+---
+
+## Q4. In what order should parameters be tuned?
+
+### Conclusion
+Under the current code architecture, the recommended order is not to follow 1 -> 2 -> 3 as listed, but:
+
+> **Step 0: fix the evaluation semantics first**  
+> **Step 1: candidate selection thresholds / weights / fallback**  
+> **Step 2: policy mapping**  
+> **Step 3: state-machine thresholds**
+
+### Step 0 (must come first)
+Fix these three things first:
+1. the definition of `positive_window_ratio`
+2. switch fallback from utility-based to frontier-based
+3. change the full-sample strategy metrics in the report to stitched frozen OOS metrics (see Section 4)
+
+**Stop condition:**
+- the summary contains `primary_window_success_ratio`
+- it contains `stitched_frozen_oos_metrics`
+- fallback is no longer utility-only
+
+### Step 1: candidate selection thresholds / weights
+Change first:
+- hard constraints
+- ranking score
+- fallback
+
+**Stop condition:**
+- `hard_pass_window_ratio >= 0.80`
+- `fallback_window_count <= 1`
+- candidate selection no longer visibly collapses back to a single `defensive`
+- `primary_window_success_ratio` does not fall below the current baseline
+
+### Step 2: policy mapping
+Only once the selection protocol is stable, change:
+- `trend/chop/repair/euphoric_late` exposures
+- `max_daily_exposure_change`
+
+**Stop condition:**
+- `full_sample_upside_capture` improves by at least `+0.05`
+- `full_sample_drawdown_ratio_vs_baseline` does not worsen by more than `+0.05`
+- `annual_turnover <= 18`
+
+### Step 3: state-machine thresholds
+Tune last:
+- `trend` gate
+- `repair` gate
+- `risk_off` gate
+- persistence / override
+
+**Stop condition:**
+- two consecutive rounds of threshold changes both fail to improve `primary_window_success_ratio` and `full_sample_upside_capture` at the same time
+- or drawdown / turnover trips a guardrail
+
+### One-sentence version
+
+**First get right how candidates are selected, how window success is judged, and what the report is comparing; then tune policy; touch the state machine last.**
+
+---
+
+## 3. Global direction problems / hidden bugs in the current bundle
+
+These are the most important results of this round's global check.
+
+---
+
+## Global Issue A: the full-sample comparison in `real_walkforward_report.py` is not the stitched frozen OOS strategy
+
+### Evidence
+Code location: `pipelines/real_walkforward_report.py:214-248`
+
+The current logic is:
+1. first run `run_frozen_walkforward()` to get `board, frozen_summary`
+2. then separately run:
+   - `run_strategy_bundle(raw, config)` -> `strategy_full_metrics`
+   - `run_backtest(buy_and_hold)` -> `baseline_metrics`
+
+So in the summary:
+- `frozen_walkforward.selected_candidate_distribution`
+- `strategy_full_sample_metrics`
+
+**are not the same strategy object.**
+
+### Why this is a serious problem
+Because the report you see now looks like this:
+- on one hand: the frozen walk-forward selected `balanced_capture / pro_risk / baseline / defensive`
+- on the other: the full-sample metrics are for the "default-config strategy"
+
+This mixes the effect of the selection protocol with the effect of the default config.
+
+### Required change
+Add a stitched OOS path:
+
+```python
+stitched_frozen_oos_ledger
+stitched_frozen_oos_metrics
+```
+
+The summary should be split into at least 3 groups:
+1. `default_strategy_full_sample_metrics`
+2. `stitched_frozen_oos_metrics`
+3. `baseline_full_sample_metrics`
+
+The comparison should be split as well:
+- `default_vs_baseline`
+- `stitched_oos_vs_baseline`
+
+**Going forward, base the primary evaluation on stitched OOS rather than default full-sample.**
+
+---
+
+## Global Issue B: the semantics of `positive_window_ratio` are currently biased
+
+### Current definition
+In `backtest/frozen_walkforward.py:461-465`:
+
+```python
+positive_window_ratio = (test_utility_total_score > 0.0).mean()
+```
+
+### Why this is a problem
+Because the current utility function (if it is still the version from the original repository) penalizes turnover very heavily:
+
+```python
+utility = (
+    0.45 * sharpe_delta
+    + 0.40 * drawdown_improvement
+    + 0.15 * (upside_capture - 0.75)
+    - 0.02 * max(0, annual_turnover - 4)
+)
+```
+
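+For an active regime strategy, this utility's penalty for `annual_turnover > 4` is too large.  
+In the current full sample:
+- sharpe_delta contributes about `+0.0230`
+- drawdown_improvement contributes about `+0.2074`
+- the upside term is about `-0.0698`
+- the turnover penalty is about `-0.2604`
+
+Result:
+- total utility = `-0.0997`
+
+This means:
+- whenever turnover is around 15-20, the utility is easily pushed negative overall
+- so `positive_window_ratio` is systematically biased low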
+
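The decomposition can be reproduced approximately with a short sketch. Only `upside_capture = 0.2849` is quoted in this document; the other three inputs are back-solved here from the rounded contributions and are therefore assumptions, not values read from the bundle:

```python
def utility_terms(sharpe_delta, drawdown_improvement, upside_capture, annual_turnover):
    """Per-term contributions of the legacy utility formula quoted above."""
    return {
        "sharpe": 0.45 * sharpe_delta,
        "drawdown": 0.40 * drawdown_improvement,
        "upside": 0.15 * (upside_capture - 0.75),
        "turnover": -0.02 * max(0.0, annual_turnover - 4.0),
    }


# Back-solved inputs (assumed): 0.0230/0.45, 0.2074/0.40, and 4 + 0.2604/0.02.
terms = utility_terms(
    sharpe_delta=0.0511,
    drawdown_improvement=0.5185,
    upside_capture=0.2849,
    annual_turnover=17.02,
)
total = sum(terms.values())

# Turnover at which the penalty alone cancels the two positive terms.
break_even_turnover = 4.0 + (terms["sharpe"] + terms["drawdown"]) / 0.02
```

With these inputs the turnover term is larger in magnitude than the Sharpe and drawdown terms combined, so the sign of the total is decided by turnover rather than by risk-adjusted performance.
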
+### Required change
+- replace `positive_window_ratio` with `primary_window_success_ratio`
+- keep the utility sign only as an auxiliary diagnostic, no longer as the primary acceptance gate
+
+---
+
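+## Global Issue C: fallback is currently dragged back onto a more conservative path by utility bias
+
+This was already answered in Q2, but it deserves to be listed separately as a global issue.
+
+If every candidate fails, the current logic will:
+- pick the one with the highest utility
+
+But the utility itself:
+- is sensitive to turnover
+- is harsh on active strategies
+- tends to drag fallback toward lower-offense candidates
+
+### Required change
+Switch to the closest-to-feasible-frontier fallback.
+
+---
+
+## Global Issue D: `stability_score` is currently almost a dead feature
+
+### Current definition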
+`backtest/frozen_walkforward.py:236`
+
+```python
+stability_score = 1.0 if utility_total_score > 0.0 else 0.0
+```
+
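+### Problem
+Most candidates' `utility_total_score` in the current train windows is negative.  
+Result:
+- `stability_score` is almost always `0`
+- `stability_weight` is present in the config but contributes no useful information in practice
+
+### Required change
+Change it to a smooth score: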
+
+```python
+stability_score = clip((utility_total_score - utility_floor) / max(utility_target - utility_floor, 1e-12), 0.0, score_cap)
+```
+
+---
+
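+## Global Issue E: `annual_return_override=0.03` is an absolute value and not regime-aware enough
+
+### Problem
+A fixed 3% annualized threshold:
+- is unstable across different benchmark-return backdrops
+- is even less stable for the partial last window
+- is also unnatural for a high-beta style index
+
+### Required change
+Replace it with: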
+
+```python
+return_override_threshold = max(
+    annual_return_override_abs,
+    annual_return_override_ratio * max(benchmark_return, 0.0),
+)
+```
+
+---
+
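+## Global Issue F: `return_ratio = annual_return / benchmark_return` is unstable when the benchmark is very small or negative
+
+### Problem
+If the benchmark return is:
+- very small
+- or <= 0
+
+then the ratio:
+- becomes distorted
+- and `clip` masks the real problem
+
+### Required change
+Change it to: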
+
+```python
+if benchmark_return > 0.05:
+    return_ratio = clip(annual_return / benchmark_return, 0.0, score_cap)
+else:
+    return_ratio = clip(annual_return / 0.10, 0.0, score_cap)
+```
+
+Or switch explicitly to an excess-return-based score.
+
+---
+
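+## Global Issue G: the partial last window should not be weighted equally with full-year windows
+
+The current 2026 window has only `62` rows.  
+In the summary it carries the same weight as the full-year windows.
+
+### Problem
+- it distorts `positive_window_ratio`
+- it distorts the primary judgment of selection/fallback success rates
+- it also makes annualized metrics swing too much on a short sample
+
+### Required change
+Add to the summary:
+- `primary_window_count`
+- `partial_window_count`
+- `primary_window_success_ratio`
+- `partial_window_success_ratio`
+
+And use only primary windows for acceptance.
+
+---
+
+## Global Issue H: if the upstream score/hazard modules have not been re-audited, selection optimization may be only a surface fix
+
+This is a "global reminder" rather than a bug directly visible in the current bundle, but Codex must be warned:
+
+If this recalibrate bundle runs on the same upstream score / hazard / state stack as before, redo an integrity audit that checks at least:
+- the non-null ratio of `breadth_score`
+- the non-null ratio of `crowding_score`
+- whether the distributions of `repair_hazard / rebound_hazard` have collapsed
+- whether the `repair` / `euphoric_late` states actually occur
+- whether the exposure ladder actually separates the candidates
+
+Reasons:
+- however refined the selection layer is, if the upstream states have degenerated the result will still lean defensive
+- this was already a high-priority risk in the earlier global review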
+
+---
+
+## 4. File-by-file changes for Codex (detailed implementation guide)
+
+---
+
+## 4.1 `backtest/frozen_walkforward.py`
+
+### Required change A: extend the selection settings
+In `_resolve_candidate_selection_settings()`, add:
+- `annual_return_override_abs`
+- `annual_return_override_ratio`
+- `utility_floor`
+- `utility_target`
+- `fallback_mode`
+
+### Required change B: rewrite `_compute_selection_score()`
+Goals:
+- a more stable return ratio
+- stability is no longer binary
+- an upside target closer to the current active regime candidates
+- a turnover penalty that is no longer excessive
+
+### Required change C: rewrite `_evaluate_hard_constraints()`
+Goals:
+- the turnover override becomes a mixed relative + absolute threshold
+- keep the upside / drawdown constraints
+
+### Required change D: add `_constraint_distance()`
+Suggested new helper:
+
+```python
+def _constraint_distance(metrics, settings) -> tuple[float, dict[str, float]]:
+    ...
+```
+
+Returns:
+- `violation_distance`
+- `{'upside_gap': ..., 'drawdown_gap': ..., 'turnover_gap': ...}`
+
+### Required change E: replace the fallback
+Current: utility-based fallback  
+Change to: frontier-based fallback
+
+### Required change F: new summary-level fields
+Add to the board and summary:
+- `selected_train_violation_distance`
+- `selected_train_violation_components`
+- `fallback_distance_distribution` (optional in the summary)
+
+---
+
+## 4.2 `pipelines/real_walkforward_report.py`
+
+### Required change A: stop using the default full-sample strategy as a stand-in for the frozen OOS strategy
+Add a stitched OOS helper, for example:
+
+```python
+def stitch_frozen_oos_ledger(...):
+    ...
+```
+
+Suggested approach:
+- besides the board, have `run_frozen_walkforward()` also return the frozen test ledger for each test window
+- stitch these ledgers together in time order in the report pipeline
+- compute `stitched_frozen_oos_metrics`
+
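One possible shape for the stitching step, as a sketch only. It assumes each window ledger is a non-empty list of `(iso_date, daily_return)` pairs; the real ledger in the repository is presumably a DataFrame with more columns, and `oos_metrics` is a hypothetical helper covering just two metrics:

```python
def stitch_frozen_oos_ledger(window_ledgers):
    """Concatenate per-window test ledgers in date order, rejecting overlaps."""
    stitched = []
    # ISO date strings sort chronologically, so the first row orders the windows.
    for ledger in sorted(window_ledgers, key=lambda rows: rows[0][0]):
        if stitched and ledger[0][0] <= stitched[-1][0]:
            raise ValueError("test windows overlap; the OOS ledger would double count")
        stitched.extend(ledger)
    return stitched


def oos_metrics(ledger, periods_per_year=252):
    """Annualized return and max drawdown of the stitched return stream."""
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for _, daily_return in ledger:
        equity *= 1.0 + daily_return
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)
    annual_return = equity ** (periods_per_year / len(ledger)) - 1.0
    return {"annual_return": annual_return, "max_drawdown": max_drawdown, "rows": len(ledger)}


# Two tiny illustrative windows, deliberately passed out of order.
w2023 = [("2023-01-03", 0.10), ("2023-01-04", -0.05)]
w2022 = [("2022-01-04", 0.02), ("2022-01-05", -0.10)]
stitched_frozen_oos_ledger = stitch_frozen_oos_ledger([w2023, w2022])
stitched_frozen_oos_metrics = oos_metrics(stitched_frozen_oos_ledger)
```

Computing drawdown on the stitched stream rather than per window matters because a drawdown can span a window boundary.
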
+### Required change B: split the summary into 3 sets of metrics
+Add:
+- `default_strategy_full_sample_metrics`
+- `stitched_frozen_oos_metrics`
+- `baseline_full_sample_metrics`
+
+### Required change C: add primary / partial window semantics
+Add:
+- `primary_window_success_ratio`
+- `partial_window_success_ratio`
+- `primary_window_count`
+- `partial_window_count`
+- `window_success_rule`
+
+### Required change D: the comparison block must be split by object
+Add:
+- `stitched_oos_vs_baseline`
+- `default_vs_baseline`
+
+Show stitched OOS first in the main report.
+
+---
+
+## 4.3 `config/regime.yaml`
+
+### Required change
+Change the `frozen_validation.candidate_selection` defaults to the version given in Section 2.
+
+### Suggested addition
+```yaml
+evaluation:
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+```
+
+---
+
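+## 4.4 `backtest/utility.py` (if the current repository still has the old formula)
+
+### Conclusion
+This is not the file that most urgently needs changing in the current bundle, but it is a **global high-risk issue**.
+
+The current utility clearly over-penalizes turnover for an active regime strategy.  
+If Codex has permission to change the whole repository, do this in the second phase:
+
+#### Option A (safest)
+Leave the utility unchanged, but:
+- stop using the utility sign as `positive_window_ratio`
+- stop ranking fallback directly by utility
+
+#### Option B (recommended for the long term)
+Change the utility formula from: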
+
+```python
+utility = 0.45 * sharpe_delta + 0.40 * drawdown_improvement + 0.15 * (upside_capture - 0.75) - 0.02 * max(0, annual_turnover - 4)
+```
+
+to:
+
+```python
+utility = (
+    0.40 * sharpe_delta
+    + 0.35 * drawdown_improvement
+    + 0.25 * (upside_capture - 0.60)
+    - 0.01 * max(0, annual_turnover - 10)
+)
+```
+
+Note: if the utility changes, the tests and the acceptance interpretation must be updated in sync.
+
+---
+
+## 5. Suggested new tests (required for Codex)
+
+### 5.1 `tests/test_frozen_walkforward.py`
+New tests:
+1. **fallback uses closest frontier, not max utility**
+2. **stability_score is non-binary and differentiates negative utilities**
+3. **turnover override uses max(abs, ratio*benchmark)**
+4. **return_ratio stays bounded when benchmark_return <= 0.05**
+
+### 5.2 `tests/test_real_walkforward_report_pipeline.py`
+New tests:
+1. the summary contains `stitched_frozen_oos_metrics`
+2. the summary contains `primary_window_success_ratio`
+3. the partial last window does not enter the denominator of `primary_window_success_ratio`
+4. the report's main comparison uses stitched OOS metrics rather than default full-sample metrics
+
+### 5.3 Regression checks
+Keep at least the following invariants:
+- selection is deterministic
+- the train-select / test-freeze mechanism is not broken
+- insufficient windows are still skipped correctly
+
+---
+
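+## 6. Recommended execution order (for Codex)
+
+### Round 1 (fix semantics first)
+1. change `backtest/frozen_walkforward.py`
+2. change `config/regime.yaml`
+3. change `pipelines/real_walkforward_report.py`
+4. add tests
+5. run the targeted tests for the current bundle
+
+### Round 2 (only after round 1 is stable)
+1. if the repository has `backtest/utility.py`, consider a utility recalibration
+2. re-run the report and compare:
+   - `hard_pass_window_ratio`
+   - `primary_window_success_ratio`
+   - `stitched_frozen_oos_metrics.upside_capture`
+   - `stitched_frozen_oos_metrics.drawdown_ratio_vs_baseline`
+   - `selection_mode_distribution`
+
+### Round 3 (later still)
+Only once the above is stable, tune:
+- policy mapping
+- state-machine thresholds
+
+---
+
+## 7. Final word: how to read the current results
+
+The real significance of this recalibrate bundle is:
+
+- **it proves that "selecting defensive by utility alone" is not good enough**
+- **it has not yet proven that the new selection protocol is economically good enough**
+
+So the right move now is not to keep tuning thresholds blindly,
+but to first get the following three things right:
+
+1. the semantics of candidate selection / fallback  
+2. the definition of window success  
+3. what object the report is actually comparing  
+
+Once these three are fixed, discussing policy/state tuning will no longer be a waste of time.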
+

+ 123 - 0
research/chinext50_regime_project/chinext50_regime_build_handoff_2026-04-08.md

@@ -0,0 +1,123 @@
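+# ChiNext 50 Regime Project Build Handoff (2026-04-08)
+
+## Completed work
+
+A minimal runnable closed loop has been set up under `chinext50_regime_project/`:
+
+- `data/`
+  - `io.py`: reads historical data from CSV / parquet
+  - `sample_data.py`: generates synthetic ChiNext 50-style daily bars and sample breadth data
+- `features/`
+  - `price_features.py`: price / volatility / ATR / breakout / efficiency features
+  - `breadth_features.py`: breadth / concentration / diffusion-divergence features
+  - `relative_features.py`: relative strength versus CSI 300 / STAR 50 / CSI 1000
+  - `pipeline.py`: unified feature-assembly entry point
+- `model/`
+  - `scores.py`: five scores (trend/breadth/stress/crowding/repair) + three hazards
+  - `state_machine.py`: `risk_off / repair / trend / chop / euphoric_late`
+  - `policy.py`: exposure mapping, hard veto, quantized exposure, daily change cap
+- `backtest/`
+  - `engine.py`: next-open approximate backtest and metric computation
+  - `events.py`: state-transition event slicing
+  - `utility.py`: net utility and status determination
+  - `walkforward.py`: default frozen-hypothesis windows
+- `pipelines/`
+  - `run_demo.py`: end-to-end demo
+  - `frozen_hypothesis_validation.py`: computes over the full span first, then slices by test window, avoiding short-window cold starts
+- `tests/`
+  - end-to-end demo test
+  - utility tests
+  - exposure quantization / step-size constraint tests
+
+## Artifacts already produced and ready to inspect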
+
+- `chinext50_regime_project/examples/synthetic_chinext50_sample.csv`
+- `chinext50_regime_project/outputs/demo/daily_ledger.csv`
+- `chinext50_regime_project/outputs/demo/event_summary.csv`
+- `chinext50_regime_project/outputs/demo/metrics_summary.json`
+- `chinext50_regime_project/outputs/frozen_validation/frozen_validation_board.csv`
+- `chinext50_regime_project/outputs/frozen_validation/frozen_validation_summary.json`
+
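+## Currently confirmed status
+
+- `pytest` passes (3 tests)
+- the demo / frozen validation pipelines run
+- the scaffold is no longer an empty skeleton; it is an engineering starting point that can take real data and keep iterating
+
+## Current known limitations
+
+1. **Real ChiNext 50 historical data has not been connected yet**
+   - the demo currently uses synthetic data
+   - synthetic results must not be read as evidence of real economic effect
+
+2. **The backtest execution layer is still an approximation**
+   - it currently uses next-open approximate daily execution
+   - a real version needs to model ETF fills, opening impact, and tracking gap separately
+
+3. **Weights and thresholds have not been through a frozen train/test calibration on real data**
+   - the current parameters are reasonable priors, not validated optimal parameters
+
+4. **The breadth layer still depends on an external aggregate table**
+   - a real version must use point-in-time constituents and rebalance history to avoid survivorship bias
+
+## Highest-priority next tasks for Codex
+
+### P0: connect real data
+
+Prepare a point-in-time daily table containing at least: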
+
+- `date`
+- `open/high/low/close/volume`
+- `hs300_close`
+- `star50_close`
+- `csi1000_close`
+- `pct_constituents_above_20dma`
+- `pct_constituents_above_60dma`
+- `pct_new_high_20`
+- `pct_new_low_20`
+- `eq_weight_ret_5`
+- `weighted_ret_5`
+- `top3_contribution_5`
+- `corr_spike_20`
+- `dispersion_20`
+
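+### P1: redo the frozen-hypothesis walk-forward
+
+Do not re-pick the winner inside each test window. The flow should be:
+
+1. determine parameters / thresholds inside the train window
+2. freeze the hypothesis
+3. the test window only evaluates; no parameter changes
+4. aggregate window utility, drawdown, and upside capture
+
+### P2: add event-anchored diagnostics
+
+Focus the evaluation on:
+
+- crash onset
+- false rebound
+- true repair
+- euphoric unwind
+
+### P3: add real execution-layer constraints
+
+- real ETF fill conventions
+- an extreme-day cost model
+- tracking difference / tracking error monitoring
+
+## Do not do for now
+
+- a multi-market portability/readiness system
+- large numbers of macro variables
+- RL / deep learning
+- a complex library of action templates
+
+## Goal definition
+
+The phase-one project goal should be:
+
+- clearly lower maximum drawdown relative to buy-and-hold
+- retain most of the capture in rising stretches
+- post-cost utility is non-negative in most test windows
+
+rather than aiming directly to "significantly outperform the ChiNext 50".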

+ 357 - 0
research/chinext50_regime_project/chinext50_regime_review_2026-04-09.md

@@ -0,0 +1,357 @@
+# ChiNext50 Regime Review (2026-04-09)
+
+## Executive summary
+
+The current system has a real defensive effect, but the present end-to-end result is not primarily a threshold-tuning problem. It is first a **system integrity** problem:
+
+1. `breadth_score` and `crowding_score` are effectively broken on the supplied PIT dataset because one z-scored component is constant, so the weighted sum becomes all-NaN.
+2. `down_hazard`, `repair_hazard`, and `rebound_hazard` collapse to ~0.5 everywhere because the raw hazard inputs are NaN and get filled to zero inside the sigmoid.
+3. This silently degenerates the state machine into a 3-state controller (`chop`, `trend`, `risk_off`), so `repair` and `euphoric_late` logic is mostly dead.
+4. Policy candidate differentiation is partly fake: `baseline` and `pro_risk` produce identical exposure paths on the supplied run because coarse quantization collapses their differences.
+5. Frozen walk-forward is too weak to support strong model-selection claims because only 2 windows are actually processed.
+
+Only after fixing these should you trust threshold tuning and objective redesign.
+
+## Confirmed current behavior from the supplied bundle
+
+- Full-sample strategy metrics are materially below benchmark on annual return and upside capture, while max drawdown is much better.
+- Full-sample state counts in the saved run are effectively only `chop`, `trend`, and `risk_off`.
+- In my local replay of the supplied code + PIT:
+  - `breadth_score` non-null ratio = 0.0
+  - `crowding_score` non-null ratio = 0.0
+  - `down_hazard`, `repair_hazard`, `rebound_hazard` = 0.5 nearly everywhere
+  - `baseline` and `pro_risk` exposure paths are identical
+
+## Root causes
+
+### 1. NaN propagation in `model/scores.py`
+The weighted score sums do not protect against NaN sub-components. If any sub-score is all-NaN, the whole composite score becomes all-NaN.
+
+On the supplied PIT, `concentration_spread_5 = weighted_ret_5 - eq_weight_ret_5` is constant at `0.002`, so its rolling z-score has zero std and becomes all-NaN. This breaks both:
+- `breadth_score`
+- `crowding_score`
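
The failure mode can be reproduced in miniature. This sketch uses a full-sample z-score in plain Python instead of the project's rolling version, and the component values and weights are invented; renormalizing over the available components is one possible NaN-safe aggregation, not necessarily the one the repository should adopt:

```python
import math


def zscore(values):
    """Population z-score; a (near-)constant series has no spread and yields NaN."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 1e-12 else math.nan for v in values]


def naive_weighted(components, weights):
    """Plain weighted sum: a single NaN component poisons the result."""
    return sum(w * c for w, c in zip(weights, components))


def nan_safe_weighted(components, weights):
    """Renormalize over non-NaN components; NaN only if every component is missing."""
    pairs = [(w, c) for w, c in zip(weights, components) if not math.isnan(c)]
    total_weight = sum(w for w, _ in pairs)
    if total_weight <= 0:
        return math.nan
    return sum(w * c for w, c in pairs) / total_weight


# concentration_spread_5 is constant at 0.002 on the supplied PIT.
spread_z = zscore([0.002] * 5)[-1]
components = [0.4, -0.2, spread_z]   # the first two values are illustrative
weights = [0.5, 0.3, 0.2]            # illustrative weights
broken = naive_weighted(components, weights)
repaired = nan_safe_weighted(components, weights)
```

Whichever repair is chosen, the dropped component should also be reported by a data-quality check, since a silently renormalized score hides the fact that an input carries no information.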
+
+### 2. Hazard collapse
+Hazards are built from raw formulas that reference broken scores. Then they are fed through:
+- `rolling_zscore(...)`
+- `_sigmoid(series.fillna(0.0))`
+
+This turns missing hazard information into the neutral constant `0.5`, which prevents the system from noticing it is effectively blind.
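
A small sketch of why the collapse is silent, and of what a loud alternative might look like. `hazard_checked` and its `min_valid_ratio` parameter are hypothetical names introduced for illustration:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hazard_silent(raw):
    """Mimics sigmoid(series.fillna(0.0)): every missing input becomes exactly 0.5."""
    return [sigmoid(0.0 if math.isnan(x) else x) for x in raw]


def hazard_checked(raw, name, min_valid_ratio=0.5):
    """Refuse to produce a hazard when too little of its input is valid."""
    valid = sum(not math.isnan(x) for x in raw)
    if valid / len(raw) < min_valid_ratio:
        raise ValueError(f"{name}: only {valid}/{len(raw)} valid inputs")
    # Keep NaN visible instead of mapping it to the neutral value.
    return [math.nan if math.isnan(x) else sigmoid(x) for x in raw]


raw_down_hazard = [math.nan, math.nan, math.nan]
silent = hazard_silent(raw_down_hazard)
```

Because `0.5` is a perfectly plausible hazard value, nothing downstream can distinguish "neutral" from "blind" unless the check happens before the sigmoid.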
+
+### 3. Candidate selection collapse
+The policy layer uses coarse quantization:
+- allowed levels: `{0.0, 0.25, 0.50, 0.75, 1.0}`
+
+As a result:
+- `trend = 0.95` and `trend = 1.00` both quantize to `1.0`
+- many `repair` and `chop` parameter tweaks collapse to the same discrete levels
+
+That is why `baseline` and `pro_risk` can become identical even though their YAML values differ.
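
The collapse is easy to demonstrate. In this sketch `quantize` is a hypothetical stand-in for the policy layer's rounding; the `0.95`/`1.00` trend pair comes from the text above, while the `0.40`/`0.50` repair pair is an invented example:

```python
def quantize(exposure, step):
    """Round to the nearest multiple of `step`, clipped to [0, 1]."""
    levels = round(1.0 / step)
    return min(max(round(exposure * levels), 0), levels) / levels


coarse_trend = (quantize(0.95, 0.25), quantize(1.00, 0.25))
coarse_repair = (quantize(0.40, 0.25), quantize(0.50, 0.25))
fine_repair = (quantize(0.40, 0.10), quantize(0.50, 0.10))
```

On the 0.25 ladder both pairs land on a single level, so two candidates that differ only in those knobs trade identically; on a 0.10 ladder the repair pair stays distinct. Values that sit exactly halfway between two rungs depend on the rounding rule, which is one more argument for continuous exposure.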
+
+### 4. Walk-forward sample weakness
+The frozen WF windows start in 2016 while the supplied PIT starts in 2020. So the first window is skipped, leaving only 2 processed windows. This is too thin for robust selection.
+
+### 5. Execution calibration objective is mis-scaled
+Current calibration score:
+
+`utility_total_score - 3*tracking_diff_abs_mean - 20*tracking_error_20_p95 - max_drawdown`
+
+On the supplied run:
+- `max_drawdown` is ~0.32 and dominates the score
+- tracking penalties are tiny in absolute magnitude
+- utility spread between candidates is small
+
+So the calibration is effectively “pick the lowest cost / smallest MDD” rather than meaningfully trading off return, utility, and tracking.
+
+## Direction judgment
+
+The overall direction is still valid:
+- regime-aware exposure control
+- preserve drawdown advantage
+- recover upside via repair/trend participation
+
+But the current implementation is **not yet a true regime system**. In practice, it behaves like:
+- a coarse 3-state exposure smoother
+- using mostly price/stress information
+- with breadth/crowding/repair logic mostly disabled
+
+So the global direction is **not wrong**, but the current bundle is **not measuring what it thinks it is measuring**.
+
+## Immediate fixes before any serious threshold tuning
+
+1. In `model/scores.py`, fill NaN at the component level or aggregate with NaN-safe sums.
+2. Add a low-information / constant-series gate in data quality checks.
+3. Make hazards fail loudly when one of their prerequisite scores is entirely missing.
+4. Replace 5-level exposure quantization with either:
+   - continuous exposure, or
+   - finer ladder, e.g. every 0.10.
+5. Rebuild the walk-forward schedule so every reported window is valid.
+
+## Practical parameter recommendations after bug fix
+
+### State machine
+
+#### Risk-off
+Current risk-off is likely too eager once hazards start working.
+
+Recommended first pass:
+- `down_hazard`: `0.62 -> 0.70`
+- `stress_score`: `0.85 -> 0.95`
+- crash override keep, but use stronger trigger: `0.72 -> 0.78`
+
+Expected impact:
+- fewer premature risk-off entries
+- better upside capture
+- slightly higher drawdown, but still clearly below benchmark if crash override remains
+
+#### Repair
+The current repair condition is too easy to satisfy once the repaired hazards become live.
+
+Recommended first pass:
+- `repair_hazard`: `0.58 -> 0.62`
+- `repair stress max`: `0.85 -> 0.70`
+- keep `d_stress <= 0`, and add `d_trend >= 0`
+- add minimum breadth confirmation: `breadth_score >= 0.00`
+
+Expected impact:
+- fewer fake repair states
+- lower churn in weak rebounds
+- repair exposure will become cleaner and more useful
+
+#### Trend
+Current trend gate is too strict on signal but too weak on persistence.
+
+Recommended first pass:
+- `trend_score`: `0.45 -> 0.30~0.35`
+- `breadth_score`: `-0.05 -> 0.00` after bug fix
+- `stress_score`: `0.45 -> 0.55`
+
+Expected impact:
+- more days classified as trend
+- stronger upside capture
+- higher exposure persistence in genuine rallies
+
+#### Euphoric late
+The current `euphoric_late` trigger fires too early and should be delayed.
+
+Recommended first pass:
+- `crowding_score`: `0.70 -> 0.82`
+- `rebound_hazard`: `0.68 -> 0.78`
+
+Expected impact:
+- fewer early caps on strong trends
+- better trend participation
+- still protects last-stage blowoff risk
+
+### Duration / persistence
+Current symmetric `min_state_duration = 3` is too blunt.
+
+Recommended:
+- default persistence: `4`
+- crash override: immediate
+- non-crash `risk_off`: 2-day confirm
+- trend exit: 4-day confirm
+- repair entry: 2-day confirm
+
+Expected impact:
+- fewer whipsaws
+- better hold-through in trend
+- less premature de-risking
+
+## Exposure mapping recommendations
+
+### Replace coarse quantization
+This is one of the biggest practical blockers.
+
+Recommended:
+- remove quantization entirely, or
+- replace `{0,0.25,0.5,0.75,1.0}` with `{0,0.1,0.2,...,1.0}`
+
+Without this change, many policy experiments are fake because different raw exposures map to the same discrete level.
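The collapse described above is easy to reproduce. The sketch below assumes exposure is rounded to the nearest ladder step; the `quantize` helper and the raw exposures `0.30` and `0.36` are illustrative values, not taken from the run.

```python
def quantize(exposure: float, step: float) -> float:
    """Round an exposure to the nearest multiple of `step` (hypothetical helper)."""
    return round(round(exposure / step) * step, 10)


# Two different raw policies land on the same level with a 0.25 ladder...
coarse = (quantize(0.30, 0.25), quantize(0.36, 0.25))
# ...but stay distinguishable with a 0.10 ladder.
fine = (quantize(0.30, 0.10), quantize(0.36, 0.10))
```

Under the coarse ladder both policies trade identically, so a sweep over them compares a candidate with itself.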
+
+### Repair mapping
+Current repair exposure is too timid if the goal is upside capture >= 0.60.
+
+Recommended piecewise mapping:
+- weak repair: `0.30`
+- confirmed repair: `0.45`
+- broad repair: `0.60`
+- strong repair + improving breadth: `0.75`
+
+Example:
+- if `repair_hazard in [0.62, 0.70)` and `breadth_score >= 0.0`: `0.45`
+- if `repair_hazard in [0.70, 0.80)` and `d_trend > 0`: `0.60`
+- if `repair_hazard >= 0.80` and `breadth_score > 0.25`: `0.75`
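The example rules above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and falling back to the weak-repair level `0.30` when no rule matches is an assumption, since the text does not say what happens in that case.

```python
def repair_exposure(repair_hazard: float, breadth_score: float, d_trend: float) -> float:
    """Piecewise repair exposure following the three example rules."""
    if repair_hazard >= 0.80 and breadth_score > 0.25:
        return 0.75  # strong repair + improving breadth
    if 0.70 <= repair_hazard < 0.80 and d_trend > 0:
        return 0.60  # broad repair
    if 0.62 <= repair_hazard < 0.70 and breadth_score >= 0.0:
        return 0.45  # confirmed repair
    return 0.30  # assumed fallback: weak repair
```

Checking the strongest rule first keeps the bands from shadowing each other if the thresholds are later retuned.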
+
+### Trend mapping
+Trend should be close to full risk unless stress or crowding says otherwise.
+
+Recommended:
+- base trend: `0.90`
+- strong trend + breadth > 0.25: `1.00`
+- late trend / early crowding: `0.75`
+
+Practical formula:
+- `trend_base = 0.90`
+- `trend_boost = +0.10 if breadth_score > 0.25`
+- `trend_cut = -0.15 if crowding_score > 0.75`
+- clamp to `[0.75, 1.00]`
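A direct transcription of the practical formula, as a sketch; the function name is hypothetical and the inputs are assumed to be the same `breadth_score` and `crowding_score` used by the state machine.

```python
def trend_exposure(breadth_score: float, crowding_score: float) -> float:
    """Base 0.90, +0.10 on strong breadth, -0.15 on crowding, clamped to [0.75, 1.00]."""
    exposure = 0.90
    if breadth_score > 0.25:
        exposure += 0.10
    if crowding_score > 0.75:
        exposure -= 0.15
    return min(1.00, max(0.75, exposure))
```

Note that when both conditions hold the result is about `0.85`, a level that sits between the three named tiers (`0.75`, `0.90`, `1.00`).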
+
+### Chop mapping
+This is the lever most directly tied to upside capture in the current broken topology.
+
+Observed on the supplied run:
+- chop around `0.25` produces upside capture around `0.37`
+- effective chop around `0.50` lifts upside capture toward `0.54`
+- effective chop around `0.75` pushes upside capture above `0.70`, but drawdown rises sharply
+
+Recommended target for next round:
+- `chop = 0.40~0.45` if using continuous exposure
+- if quantized, force effective chop to `0.50` only after fixing state logic
+
+Expected impact:
+- biggest single uplift in upside capture
+- drawdown will rise, so do not do this before fixing false repair / false trend logic
+
+### Turnover guardrails
+- `max_daily_exposure_change`: `0.25 -> 0.35` after quantization removal
+- annual turnover soft ceiling: `<= 12`
+- if turnover > 12 without upside capture > 0.55, rollback
+
+## Objective / loss redesign
+
+### Hard constraints for walk-forward selection
+Use hard gates first, then a score.
+
+Recommended hard constraints:
+1. `strategy_max_drawdown <= 0.70 * baseline_max_drawdown`
+2. `upside_capture >= 0.50` for every valid OOS window
+3. median OOS `upside_capture >= 0.55`
+4. `positive_window_ratio >= 0.67`
+5. `annual_turnover <= 12` unless annual return improves by at least `+300 bps`
+
+Only candidates that pass hard constraints are ranked.
+
+### Practical ranking score
+Recommended selection score:
+
+`score = 0.35 * return_ratio + 0.30 * upside_score + 0.20 * dd_score + 0.10 * sharpe_delta_score + 0.05 * stability_score - turnover_penalty`
+
+Where:
+- `return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)`
+- `upside_score = clip(upside_capture / 0.60, 0, 1.2)`
+- `dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)`
+- `sharpe_delta_score = clip((strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2)`
+- `stability_score = positive_window_ratio`
+- `turnover_penalty = max(0, annual_turnover - 10) * 0.02`
+
+This score is easier to interpret than the current utility-only selection.
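The selection score above, written out as a sketch. The function and argument names are illustrative; drawdowns are assumed to be positive fractions, and `baseline_ann` and `baseline_mdd` are assumed to be strictly positive so the ratios are defined.

```python
def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def selection_score(
    strategy_ann: float, baseline_ann: float,
    upside_capture: float,
    strategy_mdd: float, baseline_mdd: float,
    strategy_sharpe: float, baseline_sharpe: float,
    positive_window_ratio: float, annual_turnover: float,
) -> float:
    """Ranking score for candidates that already passed the hard constraints."""
    return_ratio = clip(strategy_ann / baseline_ann, 0, 1.2)
    upside_score = clip(upside_capture / 0.60, 0, 1.2)
    dd_score = clip((baseline_mdd - strategy_mdd) / baseline_mdd / 0.35, 0, 1.2)
    sharpe_delta_score = clip((strategy_sharpe - baseline_sharpe + 0.10) / 0.20, 0, 1.2)
    stability_score = positive_window_ratio
    turnover_penalty = max(0, annual_turnover - 10) * 0.02
    return (
        0.35 * return_ratio
        + 0.30 * upside_score
        + 0.20 * dd_score
        + 0.10 * sharpe_delta_score
        + 0.05 * stability_score
        - turnover_penalty
    )


example = selection_score(0.12, 0.10, 0.60, 0.26, 0.40, 0.9, 0.8, 0.75, 12.0)
```

In the example call every sub-score is at or above its target, so the result is close to 1.0 before the small turnover penalty for running at 12x instead of 10x.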
+
+## Execution calibration score redesign
+
+### Problem with current formula
+The current formula is dominated by `-max_drawdown`, not by tracking penalties.
+
+### Alternative A: utility-first deployment score
+Use when execution assumptions are still approximate.
+
+`calib_A = utility_total_score + 0.40*annual_return + 0.20*upside_capture - 0.60*max_drawdown - 5*tracking_error_20_p95 - 1.5*tracking_diff_abs_mean`
+
+### Alternative B: implementation-sensitive score
+Use only when execution model is already close to production.
+
+`calib_B = utility_total_score + 0.30*sharpe - 0.40*max_drawdown - 2*max(0, tracking_error_20_p95 - 0.003) - 1*max(0, tracking_diff_abs_mean - 0.001)`
+
+This introduces tolerance bands so tiny tracking differences do not dominate selection.
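To see the tolerance bands at work, here is Alternative B as a sketch. The argument names mirror the formula; the sample inputs are made up for illustration.

```python
def calib_b(
    utility_total_score: float,
    sharpe: float,
    max_drawdown: float,
    tracking_error_20_p95: float,
    tracking_diff_abs_mean: float,
) -> float:
    """Implementation-sensitive calibration score with tolerance bands."""
    return (
        utility_total_score
        + 0.30 * sharpe
        - 0.40 * max_drawdown
        - 2 * max(0, tracking_error_20_p95 - 0.003)   # free below 0.003
        - 1 * max(0, tracking_diff_abs_mean - 0.001)  # free below 0.001
    )


inside_band_a = calib_b(0.10, 1.0, 0.30, 0.0020, 0.0005)
inside_band_b = calib_b(0.10, 1.0, 0.30, 0.0010, 0.0002)
outside_band = calib_b(0.10, 1.0, 0.30, 0.0080, 0.0005)
```

The first two candidates differ only in tracking numbers that stay inside the bands, so they score the same; only the third is penalized.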
+
+## Walk-forward robustness protocol
+
+### Window scheme
+Given that the current data starts in 2020, do not pretend you have 2016 windows.
+
+Recommended:
+- expanding train, rolling 1-year or 18-month test
+- minimum 3 valid OOS windows, preferably 4+
+
+Example:
+- train 2020-2021, test 2022
+- train 2020-2022, test 2023
+- train 2020-2023, test 2024
+- train 2020-2024, test 2025
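The example schedule can be generated rather than hand-written, which makes it harder to report a window that has no data behind it. A sketch using whole calendar years; the function name is hypothetical, and the parameter names follow `min_train_years` and `test_years` from the `frozen_validation` config.

```python
def expanding_windows(
    first_year: int, last_year: int, min_train_years: int = 2, test_years: int = 1
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return ((train_start, train_end), (test_start, test_end)) year pairs."""
    windows = []
    test_start = first_year + min_train_years
    while test_start + test_years - 1 <= last_year:
        train = (first_year, test_start - 1)               # expanding train
        test = (test_start, test_start + test_years - 1)   # rolling test
        windows.append((train, test))
        test_start += test_years
    return windows


schedule = expanding_windows(2020, 2025)
```

For data running from 2020 through 2025 this yields exactly the four windows listed above; a partial final year would need separate handling.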
+
+### Stability checks
+For every candidate, record:
+- median OOS annual return
+- median OOS max drawdown
+- median OOS upside capture
+- worst-window upside capture
+- worst-window drawdown ratio
+- selection frequency if you do candidate search
+
+### Acceptance criteria
+- no valid window with upside capture < 0.40
+- median upside capture >= 0.55
+- drawdown ratio vs baseline <= 0.75 in every window
+- positive utility in at least 3/4 windows
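The four criteria can be checked mechanically per candidate. A sketch under stated assumptions: each window is a dict with the hypothetical keys `upside_capture`, `drawdown_ratio` and `utility`, and "3/4 windows" is read as a fraction of at least 0.75.

```python
from statistics import median


def passes_acceptance(windows: list[dict[str, float]]) -> bool:
    """Apply the acceptance criteria to a list of valid OOS windows."""
    if not windows:
        return False
    upsides = [w["upside_capture"] for w in windows]
    if min(upsides) < 0.40:
        return False  # a window with upside capture below 0.40
    if median(upsides) < 0.55:
        return False  # median upside capture too low
    if any(w["drawdown_ratio"] > 0.75 for w in windows):
        return False  # drawdown ratio vs baseline breached in some window
    positive = sum(1 for w in windows if w["utility"] > 0)
    return positive / len(windows) >= 0.75


sample = [
    {"upside_capture": 0.45, "drawdown_ratio": 0.70, "utility": 0.02},
    {"upside_capture": 0.56, "drawdown_ratio": 0.60, "utility": 0.01},
    {"upside_capture": 0.60, "drawdown_ratio": 0.75, "utility": -0.01},
    {"upside_capture": 0.58, "drawdown_ratio": 0.50, "utility": 0.03},
]
accepted = passes_acceptance(sample)
```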
+
+## Two-week roadmap
+
+### Week 1
+1. Fix NaN propagation in score aggregation
+2. Add low-information feature gate
+3. Remove or refine exposure quantization
+4. Rebuild walk-forward windows to valid periods only
+
+### Week 2
+5. Retune state thresholds after bug fix
+6. Upgrade repair/trend exposure curves
+7. Re-run walk-forward with new hard constraints
+8. Replace execution calibration score
+
+## Priority experiments
+
+### Experiment 1 — Score integrity repair
+- change: NaN-safe score aggregation + fail-fast hazard checks
+- expected win-rate: very high
+- success metric: `breadth_score` and `crowding_score` non-null ratio > 95%; hazards not stuck at 0.5
+- rollback: if any required score still all-NaN
+
+### Experiment 2 — Exposure quantization removal
+- change: continuous exposure or 0.10 ladder
+- expected win-rate: high
+- success metric: baseline and pro-risk no longer identical; policy sweeps create genuinely different exposure paths
+- rollback: if turnover spikes > 14 without upside improvement
+
+### Experiment 3 — Trend/chop uplift
+- change: effective chop to ~0.45 and trend to 0.90~1.00
+- expected win-rate: medium-high
+- success metric: upside capture > 0.50 while max drawdown <= 0.75 * baseline
+- rollback: if MDD rises above 0.48 before upside reaches 0.50
+
+### Experiment 4 — Risk-off relaxation
+- change: `down_hazard 0.70`, `stress 0.95`, stronger crash override
+- expected win-rate: medium
+- success metric: fewer risk-off days, higher annual return, drawdown ratio still <= 0.70
+- rollback: if downside capture worsens materially without upside benefit
+
+### Experiment 5 — Repair cleanup
+- change: `repair_hazard 0.62`, `breadth >= 0`, `d_trend >= 0`, lower repair stress ceiling
+- expected win-rate: medium
+- success metric: lower false rebound count, repair-state annualized return positive
+- rollback: if repair days collapse to near zero
+
+### Experiment 6 — Walk-forward + objective redesign
+- change: hard constraints + new selection score
+- expected win-rate: high for decision quality, medium for metrics
+- success metric: selected candidates diversify and OOS selection remains stable
+- rollback: if selected candidate flips every window with no OOS gain
+
+## Bottom line
+
+The core direction is valid, but the current bundle is still partly a **false negative** on offense because two of the most important score channels are effectively broken and the policy search space is partly collapsed.
+
+Fix the integrity issues first. After that, the most likely path to materially better upside capture is:
+- slightly less eager risk-off
+- stricter but cleaner repair
+- earlier/more persistent trend classification
+- much less coarse exposure mapping

+ 3 - 0
research/chinext50_regime_project/config/__init__.py

@@ -0,0 +1,3 @@
+from .loader import load_config
+
+__all__ = ['load_config']

+ 12 - 0
research/chinext50_regime_project/config/loader.py

@@ -0,0 +1,12 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+
+def load_config(path: str | Path | None = None) -> dict[str, Any]:
+    config_path = Path(path) if path is not None else Path(__file__).with_name('regime.yaml')
+    with config_path.open('r', encoding='utf-8') as fh:
+        # safe_load returns None for an empty file; fall back to an empty dict.
+        return yaml.safe_load(fh) or {}

+ 193 - 0
research/chinext50_regime_project/config/regime.yaml

@@ -0,0 +1,193 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+  thresholds:
+    risk_off_down_hazard: 0.67
+    risk_off_stress: 0.89
+    risk_off_trend_floor: -0.14
+    crash_override_down_hazard: 0.77
+    trend_score: 0.45
+    trend_breadth_min: -0.05
+    trend_stress_max: 0.45
+    euphoric_crowding: 0.70
+    euphoric_rebound_hazard: 0.68
+    repair_hazard: 0.58
+    repair_stress_max: 0.85
+    repair_d_stress_max: 0.0
+    repair_breadth_min: -1.0
+    repair_d_trend_min: -1.0
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1.0e-8  # PyYAML parses 1e-8 (no dot) as a string
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1.0e-8  # PyYAML parses 1e-8 (no dot) as a string
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35
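One plausible reading of the three `low_info_*` thresholds in `data_quality`, as a sketch. How the project actually applies them is not visible in this part of the diff, so the function name and the exact rule combination are assumptions.

```python
import math
from collections import Counter
from statistics import pstdev


def is_low_information(
    values: list[float],
    min_unique_non_null: int = 3,
    std_floor: float = 1e-8,
    max_dominant_ratio: float = 0.995,
) -> bool:
    """Flag a feature series that is constant or nearly constant."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        return True  # nothing to learn from an all-missing series
    counts = Counter(present)
    if len(counts) < min_unique_non_null:
        return True  # too few distinct values
    if pstdev(present) < std_floor:
        return True  # numerically flat
    dominant_ratio = counts.most_common(1)[0][1] / len(present)
    return dominant_ratio > max_dominant_ratio


flat = is_low_information([0.5] * 100)
varied = is_low_information([0.1, 0.2, 0.3, float("nan")])
```

A gate like this would flag a feature that is stuck at a single value, even though such a series passes an ordinary coverage check.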

+ 41 - 0
research/chinext50_regime_project/data/__init__.py

@@ -0,0 +1,41 @@
+from .io import (
+    FULL_PIT_REQUIRED_COLUMNS,
+    build_data_quality_report,
+    evaluate_data_quality_gate,
+    load_full_pit_data,
+    load_market_data,
+    load_point_in_time_panel,
+    merge_point_in_time_sidecar,
+    save_dataframe,
+    validate_full_pit_data_contract,
+    validate_market_data_contract,
+)
+from .breadth_builder import derive_breadth_sidecar, evaluate_breadth_semantic_gate, evaluate_breadth_source_integrity
+from .ingestion import REQUIRED_BREADTH_COLUMNS, merge_incremental_by_date, run_ingestion_pipeline, write_incremental_dataset
+from .index_metadata_snapshot import load_industry_snapshot, load_weight_snapshot
+from .pit_builder import build_pit_dataset
+from .sample_data import generate_synthetic_chinext50_data
+
+__all__ = [
+    'REQUIRED_BREADTH_COLUMNS',
+    'FULL_PIT_REQUIRED_COLUMNS',
+    'build_data_quality_report',
+    'evaluate_data_quality_gate',
+    'evaluate_breadth_semantic_gate',
+    'evaluate_breadth_source_integrity',
+    'build_pit_dataset',
+    'derive_breadth_sidecar',
+    'generate_synthetic_chinext50_data',
+    'merge_incremental_by_date',
+    'run_ingestion_pipeline',
+    'load_industry_snapshot',
+    'load_full_pit_data',
+    'load_market_data',
+    'load_point_in_time_panel',
+    'merge_point_in_time_sidecar',
+    'save_dataframe',
+    'load_weight_snapshot',
+    'write_incremental_dataset',
+    'validate_full_pit_data_contract',
+    'validate_market_data_contract',
+]

File diff suppressed because it is too large
+ 1216 - 0
research/chinext50_regime_project/data/breadth_builder.py


+ 65 - 0
research/chinext50_regime_project/data/index_metadata_snapshot.py

@@ -0,0 +1,65 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+import pandas as pd
+
+
+def _read_frame(path: str | Path) -> pd.DataFrame:
+    source = Path(path)
+    if not source.exists():
+        raise FileNotFoundError(f'Snapshot file not found: {source}')
+    if source.suffix.lower() == '.parquet':
+        return pd.read_parquet(source)
+    return pd.read_csv(source)
+
+
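+def _normalize_symbol(value: Any) -> str:
+    # Missing values map to '' so callers can filter them out;
+    # str(nan) would otherwise be padded into the bogus symbol '000nan'.
+    if pd.isna(value):
+        return ''
+    text = str(value).strip()
+    if not text:
+        return ''
+    if '.' in text:
+        text = text.split('.')[0]
+    return text.zfill(6)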
+
+
+def load_weight_snapshot(path: str | Path) -> pd.DataFrame:
+    raw = _read_frame(path).copy()
+    raw.columns = [str(col).strip().lower() for col in raw.columns]
+    required = {'date', 'symbol', 'weight'}
+    missing = sorted(required - set(raw.columns))
+    if missing:
+        raise ValueError(f'Weight snapshot missing required columns: {missing}')
+    raw['date'] = pd.to_datetime(raw['date'], errors='coerce')
+    raw['symbol'] = raw['symbol'].map(_normalize_symbol)
+    raw['weight'] = pd.to_numeric(raw['weight'], errors='coerce')
+    raw = raw.dropna(subset=['date', 'symbol', 'weight'])
+    raw = raw[raw['symbol'] != '']
+    if raw.empty:
+        raise ValueError('Weight snapshot is empty after normalization.')
+    panel = (
+        raw.sort_values(['date', 'symbol'])
+        .drop_duplicates(subset=['date', 'symbol'], keep='last')
+        .pivot(index='date', columns='symbol', values='weight')
+        .sort_index()
+    )
+    panel.index.name = 'date'
+    return panel
+
+
+def load_industry_snapshot(path: str | Path) -> pd.Series:
+    raw = _read_frame(path).copy()
+    raw.columns = [str(col).strip().lower() for col in raw.columns]
+    required = {'symbol', 'industry'}
+    missing = sorted(required - set(raw.columns))
+    if missing:
+        raise ValueError(f'Industry snapshot missing required columns: {missing}')
+    raw['symbol'] = raw['symbol'].map(_normalize_symbol)
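+    # Drop missing industries first; astype(str) would turn NaN into the literal 'nan'.
+    raw = raw.dropna(subset=['industry']).copy()
+    raw['industry'] = raw['industry'].astype(str).str.strip()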
+    raw = raw[(raw['symbol'] != '') & (raw['industry'] != '')]
+    if raw.empty:
+        raise ValueError('Industry snapshot is empty after normalization.')
+    out = raw.drop_duplicates(subset=['symbol'], keep='last').set_index('symbol')['industry']
+    out.index = out.index.map(_normalize_symbol)
+    return out.sort_index()

+ 710 - 0
research/chinext50_regime_project/data/ingestion.py

@@ -0,0 +1,710 @@
+from __future__ import annotations
+
+import importlib
+import json
+import time
+from pathlib import Path
+from typing import Any, Mapping
+from urllib.error import HTTPError, URLError
+from urllib.parse import urlencode
+from urllib.request import Request, urlopen
+
+import pandas as pd
+
+from .breadth_builder import (
+    BREADTH_REQUIRED_COLUMNS,
+    derive_breadth_sidecar,
+    evaluate_breadth_semantic_gate,
+    evaluate_breadth_source_integrity,
+)
+from .io import (
+    load_market_data,
+    load_point_in_time_panel,
+    merge_point_in_time_sidecar,
+    save_dataframe,
+    validate_market_data_contract,
+)
+from .pit_builder import build_pit_dataset
+
+
+REQUIRED_BREADTH_COLUMNS: tuple[str, ...] = BREADTH_REQUIRED_COLUMNS
+MAIRUI_BASE_URL = 'https://api.mairuiapi.com'
+
+
+def _read_dataframe(path: str | Path) -> pd.DataFrame:
+    input_path = Path(path)
+    if input_path.suffix.lower() == '.parquet':
+        return pd.read_parquet(input_path)
+    return pd.read_csv(input_path)
+
+
+def _normalize_date_index(df: pd.DataFrame, source_label: str) -> pd.DataFrame:
+    out = df.copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    if 'date' in out.columns:
+        parsed = pd.to_datetime(out['date'], errors='coerce')
+        invalid_count = int(parsed.isna().sum())
+        if invalid_count:
+            raise ValueError(f'{source_label} contains {invalid_count} invalid date values.')
+        out = out.drop(columns=['date'])
+        out.index = parsed
+    else:
+        parsed_index = pd.to_datetime(out.index, errors='coerce')
+        invalid_count = int(parsed_index.isna().sum())
+        if invalid_count:
+            raise ValueError(f'{source_label} must contain a date column or datetime-like index.')
+        out.index = parsed_index
+    out.index.name = 'date'
+    return out.sort_index()
+
+
+def merge_incremental_by_date(existing: pd.DataFrame | None, incoming: pd.DataFrame) -> pd.DataFrame:
+    incoming_norm = _normalize_date_index(incoming, source_label='incoming dataframe')
+    if existing is None:
+        return incoming_norm.sort_index()
+
+    existing_norm = _normalize_date_index(existing, source_label='existing dataframe')
+    merged = pd.concat([existing_norm, incoming_norm], axis=0, sort=False)
+    merged = merged[~merged.index.duplicated(keep='last')]
+    merged.index.name = 'date'
+    return merged.sort_index()
+
+
+def write_incremental_dataset(df: pd.DataFrame, path: str | Path) -> pd.DataFrame:
+    output_path = Path(path)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    existing = _read_dataframe(output_path) if output_path.exists() else None
+    merged = merge_incremental_by_date(existing, df)
+    save_dataframe(merged, output_path)
+    return merged
+
+
+def _require_columns(df: pd.DataFrame, columns: tuple[str, ...] | list[str], *, label: str) -> None:
+    missing = sorted(set(columns) - set(df.columns))
+    if missing:
+        raise ValueError(f'{label} missing required columns: {missing}')
+
+
+def _load_close_panel(path: str | Path, target_column: str) -> pd.DataFrame:
+    panel = load_point_in_time_panel(path)
+    if target_column in panel.columns:
+        out = panel[[target_column]].copy()
+    elif 'close' in panel.columns:
+        out = panel[['close']].rename(columns={'close': target_column})
+    else:
+        raise ValueError(f'{path} must include column "{target_column}" or "close".')
+    return out
+
+
+def _normalize_akshare_ohlcv(df: pd.DataFrame, source_label: str) -> pd.DataFrame:
+    rename_map = {
+        '日期': 'date',
+        'date': 'date',
+        '开盘': 'open',
+        'open': 'open',
+        '最高': 'high',
+        'high': 'high',
+        '最低': 'low',
+        'low': 'low',
+        '收盘': 'close',
+        'close': 'close',
+        '成交量': 'volume',
+        '成交量(手)': 'volume',
+        'volume': 'volume',
+    }
+    out = df.rename(columns=rename_map).copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    _require_columns(out, ('date', 'open', 'high', 'low', 'close', 'volume'), label=source_label)
+    out = out[['date', 'open', 'high', 'low', 'close', 'volume']].copy()
+    return validate_market_data_contract(out, required_columns=('open', 'high', 'low', 'close', 'volume'), source_label=source_label)
+
+
+def _normalize_akshare_close(df: pd.DataFrame, source_label: str, target_column: str) -> pd.DataFrame:
+    rename_map = {
+        '日期': 'date',
+        'date': 'date',
+        '收盘': 'close',
+        'close': 'close',
+    }
+    out = df.rename(columns=rename_map).copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    _require_columns(out, ('date', 'close'), label=source_label)
+    panel = out[['date', 'close']].rename(columns={'close': target_column})
+    return validate_market_data_contract(panel, required_columns=(), source_label=source_label)
+
+
+def _load_akshare() -> Any:
+    try:
+        return importlib.import_module('akshare')
+    except ImportError as exc:
+        raise RuntimeError('provider "akshare" requires dependency "akshare". Install it first.') from exc
+
+
+def _fetch_akshare_series(
+    *,
+    symbol: str,
+    start_date: str | None,
+    end_date: str | None,
+    symbol_type: str,
+) -> pd.DataFrame:
+    ak = _load_akshare()
+    start_yyyymmdd = (start_date or '20050101').replace('-', '')
+    end_yyyymmdd = (end_date or pd.Timestamp.today().strftime('%Y%m%d')).replace('-', '')
+
+    symbol_type_norm = symbol_type.strip().lower()
+    if symbol_type_norm == 'etf':
+        return ak.fund_etf_hist_em(symbol=symbol, period='daily', start_date=start_yyyymmdd, end_date=end_yyyymmdd, adjust='')
+    if symbol_type_norm == 'index':
+        return ak.index_zh_a_hist(symbol=symbol, period='daily', start_date=start_yyyymmdd, end_date=end_yyyymmdd)
+    raise ValueError(f'Unsupported symbol_type: {symbol_type}')
+
+
+def _inject_licence(url: str, licence: str) -> str:
+    if not licence:
+        raise ValueError('mairui licence is required.')
+    out = str(url).strip()
+    if '{licence}' in out:
+        return out.replace('{licence}', licence)
+    if '您的licence' in out:
+        return out.replace('您的licence', licence)
+    if out.rstrip('/').endswith(licence):
+        return out
+    return f'{out.rstrip("/")}/{licence}'
+
+
+def _load_json_url(url: str) -> Any:
+    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
+    last_error: Exception | None = None
+    for attempt in range(5):
+        try:
+            with urlopen(request, timeout=30) as resp:
+                payload = json.loads(resp.read().decode('utf-8'))
+            if isinstance(payload, dict) and 'error' in payload:
+                raise ValueError(f'Mairui API error: {payload["error"]}')
+            return payload
+        except HTTPError as exc:
+            last_error = exc
+            if exc.code not in {429, 500, 502, 503, 504} or attempt == 4:
+                raise
+            time.sleep(1.0 + attempt * 1.5)
+        except URLError as exc:
+            last_error = exc
+            if attempt == 4:
+                raise
+            time.sleep(1.0 + attempt * 1.5)
+    if last_error is not None:
+        raise last_error
+    raise RuntimeError(f'Failed to fetch url: {url}')
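The retry schedule in `_load_json_url` is worth spelling out. The loop sleeps `1.0 + attempt * 1.5` seconds after each retryable failure and re-raises on the fifth attempt without sleeping. The arithmetic below is a standalone illustration, not part of the module.

```python
# Attempts 0..3 sleep before retrying; attempt 4 re-raises immediately.
delays = [1.0 + attempt * 1.5 for attempt in range(4)]
worst_case_sleep = sum(delays)
```

With the 30-second `urlopen` timeout, a persistently failing endpoint can hold the caller for up to 5 * 30 s of request time plus `worst_case_sleep` seconds of backoff.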
+
+
+def _to_yyyymmdd(value: str | None) -> str | None:
+    if value is None:
+        return None
+    return pd.Timestamp(value).strftime('%Y%m%d')
+
+
+def _build_mairui_history_url(
+    *,
+    code: str,
+    kind: str,
+    licence: str,
+    start_date: str | None,
+    end_date: str | None,
+) -> str:
+    kind_norm = kind.strip().lower()
+    if kind_norm == 'index':
+        path = f'/hsindex/history/{code}/d/{licence}'
+    elif kind_norm == 'stock':
+        path = f'/hsstock/history/{code}/d/n/{licence}'
+    else:
+        raise ValueError(f'Unsupported mairui kind: {kind}')
+
+    params: dict[str, str] = {}
+    st = _to_yyyymmdd(start_date)
+    et = _to_yyyymmdd(end_date)
+    if st:
+        params['st'] = st
+    if et:
+        params['et'] = et
+    if params:
+        path = f'{path}?{urlencode(params)}'
+    return f'{MAIRUI_BASE_URL}{path}'
+
+
+def _normalize_mairui_ohlcv(payload: Any, source_label: str) -> pd.DataFrame:
+    if not isinstance(payload, list) or not payload:
+        raise ValueError(f'{source_label} returned empty or invalid payload.')
+    out = pd.DataFrame(payload).rename(columns={'t': 'date', 'o': 'open', 'h': 'high', 'l': 'low', 'c': 'close', 'v': 'volume'})
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    _require_columns(out, ('date', 'open', 'high', 'low', 'close', 'volume'), label=source_label)
+    out = out[['date', 'open', 'high', 'low', 'close', 'volume']].copy()
+    return validate_market_data_contract(out, required_columns=('open', 'high', 'low', 'close', 'volume'), source_label=source_label)
+
+
+def _normalize_mairui_close(payload: Any, *, target_column: str, source_label: str) -> pd.DataFrame:
+    if not isinstance(payload, list) or not payload:
+        raise ValueError(f'{source_label} returned empty or invalid payload.')
+    out = pd.DataFrame(payload).rename(columns={'t': 'date', 'c': target_column, 'close': target_column})
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    _require_columns(out, ('date', target_column), label=source_label)
+    panel = out[['date', target_column]].copy()
+    return validate_market_data_contract(panel, required_columns=(), source_label=source_label)
+
+
+def _fetch_mairui_history_ohlcv(
+    *,
+    code: str,
+    kind: str,
+    licence: str,
+    start_date: str | None,
+    end_date: str | None,
+    source_label: str,
+) -> pd.DataFrame:
+    url = _build_mairui_history_url(code=code, kind=kind, licence=licence, start_date=start_date, end_date=end_date)
+    payload = _load_json_url(url)
+    return _normalize_mairui_ohlcv(payload, source_label=source_label)
+
+
+def _fetch_mairui_history_close(
+    *,
+    code: str,
+    kind: str,
+    licence: str,
+    start_date: str | None,
+    end_date: str | None,
+    target_column: str,
+    source_label: str,
+) -> pd.DataFrame:
+    url = _build_mairui_history_url(code=code, kind=kind, licence=licence, start_date=start_date, end_date=end_date)
+    payload = _load_json_url(url)
+    return _normalize_mairui_close(payload, target_column=target_column, source_label=source_label)
+
+
+def _fetch_mairui_custom_panel(
+    *,
+    url: str,
+    licence: str,
+    required_columns: tuple[str, ...] | list[str],
+    source_label: str,
+    rename_map: Mapping[str, str] | None = None,
+) -> pd.DataFrame:
+    resolved = _inject_licence(url, licence)
+    payload = _load_json_url(resolved)
+    if isinstance(payload, dict) and isinstance(payload.get('data'), list):
+        payload = payload['data']
+    if not isinstance(payload, list) or not payload:
+        raise ValueError(f'{source_label} returned empty or invalid payload.')
+
+    out = pd.DataFrame(payload).copy()
+    if rename_map:
+        out = out.rename(columns={str(k): str(v) for k, v in rename_map.items()})
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    if 'date' not in out.columns:
+        if 't' in out.columns:
+            out = out.rename(columns={'t': 'date'})
+        elif 'dt' in out.columns:
+            out = out.rename(columns={'dt': 'date'})
+    cols = [str(col).strip().lower() for col in required_columns]
+    _require_columns(out, ['date', *cols], label=source_label)
+    panel = out[['date', *cols]].copy()
+    return validate_market_data_contract(panel, required_columns=(), source_label=source_label)
+
+
+def fetch_source_tables(
+    *,
+    provider: str,
+    market_csv: str | None,
+    hs300_csv: str | None,
+    star50_csv: str | None,
+    csi1000_csv: str | None,
+    breadth_csv: str | None,
+    market_symbol: str,
+    market_symbol_type: str,
+    hs300_symbol: str,
+    star50_symbol: str,
+    csi1000_symbol: str,
+    start_date: str | None,
+    end_date: str | None,
+    mairui_licence: str | None,
+    mairui_market_code: str,
+    mairui_market_kind: str,
+    mairui_hs300_code: str,
+    mairui_star50_code: str,
+    mairui_csi1000_code: str,
+    mairui_breadth_url: str | None,
+    mairui_breadth_map: Mapping[str, str] | None,
+    derive_breadth: bool,
+    breadth_index_symbol: str,
+    breadth_min_active_constituents: int,
+    breadth_max_constituents: int | None,
+    breadth_cache_dir: str | Path | None,
+    breadth_weight_snapshot_path: str | Path | None,
+    breadth_industry_snapshot_path: str | Path | None,
+) -> dict[str, Any]:
+    provider_norm = provider.strip().lower()
+    if provider_norm == 'csv':
+        if not all([market_csv, hs300_csv, star50_csv, csi1000_csv]):
+            raise ValueError('csv provider requires --market-csv, --hs300-csv, --star50-csv, --csi1000-csv.')
+
+        market = load_market_data(str(market_csv))
+        hs300 = _load_close_panel(str(hs300_csv), 'hs300_close')
+        star50 = _load_close_panel(str(star50_csv), 'star50_close')
+        csi1000 = _load_close_panel(str(csi1000_csv), 'csi1000_close')
+
+        breadth_source = ''
+        derivation_meta: dict[str, Any] | None = None
+        if breadth_csv:
+            breadth = load_point_in_time_panel(str(breadth_csv))
+            _require_columns(breadth, REQUIRED_BREADTH_COLUMNS, label='breadth panel')
+            breadth = breadth[list(REQUIRED_BREADTH_COLUMNS)].copy()
+            breadth_source = 'csv'
+        elif derive_breadth:
+            derived_start = start_date or (market.index.min().date().isoformat() if len(market) else None)
+            derived_end = end_date or (market.index.max().date().isoformat() if len(market) else None)
+            breadth, derivation_meta = derive_breadth_sidecar(
+                start_date=derived_start,
+                end_date=derived_end,
+                index_symbol=breadth_index_symbol,
+                mairui_licence=mairui_licence,
+                min_active_constituents=breadth_min_active_constituents,
+                max_constituents=breadth_max_constituents,
+                cache_dir=breadth_cache_dir,
+                weight_snapshot_path=breadth_weight_snapshot_path,
+                industry_snapshot_path=breadth_industry_snapshot_path,
+            )
+            breadth_source = 'derived'
+        else:
+            raise ValueError(
+                'csv provider requires breadth source via --breadth-csv or --derive-breadth.'
+            )
+
+        return {
+            'tables': {
+                'market': market,
+                'hs300': hs300,
+                'star50': star50,
+                'csi1000': csi1000,
+                'breadth': breadth,
+            },
+            'breadth_source': breadth_source,
+            'breadth_derivation': derivation_meta,
+        }
+
+    if provider_norm in {'akshare', 'mairui'}:
+        market: pd.DataFrame | None = None
+        hs300: pd.DataFrame | None = None
+        star50: pd.DataFrame | None = None
+        csi1000: pd.DataFrame | None = None
+        akshare_error: Exception | None = None
+
+        if provider_norm == 'akshare':
+            try:
+                market_raw = _fetch_akshare_series(
+                    symbol=market_symbol,
+                    start_date=start_date,
+                    end_date=end_date,
+                    symbol_type=market_symbol_type,
+                )
+                hs300_raw = _fetch_akshare_series(
+                    symbol=hs300_symbol,
+                    start_date=start_date,
+                    end_date=end_date,
+                    symbol_type='index',
+                )
+                star50_raw = _fetch_akshare_series(
+                    symbol=star50_symbol,
+                    start_date=start_date,
+                    end_date=end_date,
+                    symbol_type='index',
+                )
+                csi1000_raw = _fetch_akshare_series(
+                    symbol=csi1000_symbol,
+                    start_date=start_date,
+                    end_date=end_date,
+                    symbol_type='index',
+                )
+                market = _normalize_akshare_ohlcv(market_raw, source_label='akshare_market')
+                hs300 = _normalize_akshare_close(hs300_raw, source_label='akshare_hs300', target_column='hs300_close')
+                star50 = _normalize_akshare_close(star50_raw, source_label='akshare_star50', target_column='star50_close')
+                csi1000 = _normalize_akshare_close(csi1000_raw, source_label='akshare_csi1000', target_column='csi1000_close')
+            except Exception as exc:
+                akshare_error = exc
+
+        # Reached for provider 'mairui', or for 'akshare' when the attempt above raised.
+        if market is None or hs300 is None or star50 is None or csi1000 is None:
+            if not mairui_licence:
+                if provider_norm == 'akshare' and akshare_error is not None:
+                    raise akshare_error
+                raise ValueError('mairui provider (and the akshare fallback) requires --mairui-licence.')
+            market = _fetch_mairui_history_ohlcv(
+                code=mairui_market_code,
+                kind=mairui_market_kind,
+                licence=mairui_licence,
+                start_date=start_date,
+                end_date=end_date,
+                source_label='mairui_market',
+            )
+            hs300 = _fetch_mairui_history_close(
+                code=mairui_hs300_code,
+                kind='index',
+                licence=mairui_licence,
+                start_date=start_date,
+                end_date=end_date,
+                target_column='hs300_close',
+                source_label='mairui_hs300',
+            )
+            star50 = _fetch_mairui_history_close(
+                code=mairui_star50_code,
+                kind='index',
+                licence=mairui_licence,
+                start_date=start_date,
+                end_date=end_date,
+                target_column='star50_close',
+                source_label='mairui_star50',
+            )
+            csi1000 = _fetch_mairui_history_close(
+                code=mairui_csi1000_code,
+                kind='index',
+                licence=mairui_licence,
+                start_date=start_date,
+                end_date=end_date,
+                target_column='csi1000_close',
+                source_label='mairui_csi1000',
+            )
+
+        breadth_source = ''
+        derivation_meta = None
+        if breadth_csv:
+            breadth = load_point_in_time_panel(str(breadth_csv))
+            _require_columns(breadth, REQUIRED_BREADTH_COLUMNS, label='breadth panel')
+            breadth = breadth[list(REQUIRED_BREADTH_COLUMNS)].copy()
+            breadth_source = 'csv'
+        elif mairui_breadth_url and mairui_licence:
+            breadth = _fetch_mairui_custom_panel(
+                url=mairui_breadth_url,
+                licence=mairui_licence,
+                required_columns=REQUIRED_BREADTH_COLUMNS,
+                source_label='mairui_breadth',
+                rename_map=mairui_breadth_map,
+            )
+            breadth_source = 'mairui_custom'
+        elif derive_breadth:
+            breadth, derivation_meta = derive_breadth_sidecar(
+                start_date=start_date,
+                end_date=end_date,
+                index_symbol=breadth_index_symbol,
+                mairui_licence=mairui_licence,
+                min_active_constituents=breadth_min_active_constituents,
+                max_constituents=breadth_max_constituents,
+                cache_dir=breadth_cache_dir,
+                weight_snapshot_path=breadth_weight_snapshot_path,
+                industry_snapshot_path=breadth_industry_snapshot_path,
+            )
+            breadth_source = 'derived'
+        else:
+            raise ValueError(
+                'provider requires breadth source via --breadth-csv, --mairui-breadth-url with --mairui-licence, or --derive-breadth.'
+            )
+
+        return {
+            'tables': {
+                'market': market,
+                'hs300': hs300,
+                'star50': star50,
+                'csi1000': csi1000,
+                'breadth': breadth,
+            },
+            'breadth_source': breadth_source,
+            'breadth_derivation': derivation_meta,
+        }
+
+    raise ValueError(f'Unsupported provider: {provider}')
+
+
+def run_ingestion_pipeline(
+    *,
+    provider: str,
+    market_csv: str | None,
+    hs300_csv: str | None,
+    star50_csv: str | None,
+    csi1000_csv: str | None,
+    breadth_csv: str | None,
+    market_symbol: str,
+    market_symbol_type: str,
+    hs300_symbol: str,
+    star50_symbol: str,
+    csi1000_symbol: str,
+    start_date: str | None,
+    end_date: str | None,
+    mairui_licence: str | None,
+    mairui_market_code: str,
+    mairui_market_kind: str,
+    mairui_hs300_code: str,
+    mairui_star50_code: str,
+    mairui_csi1000_code: str,
+    mairui_breadth_url: str | None,
+    mairui_breadth_map: Mapping[str, str] | None,
+    derive_breadth: bool,
+    breadth_index_symbol: str,
+    breadth_min_active_constituents: int,
+    breadth_max_constituents: int | None,
+    breadth_cache_dir: str | Path | None,
+    breadth_weight_snapshot_path: str | Path | None,
+    breadth_industry_snapshot_path: str | Path | None,
+    output_dir: str | Path,
+    pit_output_path: str | Path,
+    strict: bool,
+    critical_columns: list[str] | None,
+    blocking_columns: list[str] | None,
+    default_min_coverage: float,
+    column_min_coverage: Mapping[str, float] | None,
+    breadth_integrity_min_unique_non_null: int,
+    breadth_integrity_max_dominant_value_ratio: float,
+    breadth_integrity_std_floor: float,
+    breadth_semantic_require_official_index_weight: bool,
+    breadth_semantic_require_time_varying_membership: bool,
+    breadth_semantic_max_industry_unknown_ratio: float,
+) -> dict[str, Any]:
+    output_root = Path(output_dir)
+    raw_dir = output_root / 'raw'
+    staging_dir = output_root / 'staging'
+    resolved_breadth_cache_dir: str | Path | None = breadth_cache_dir
+    if derive_breadth and resolved_breadth_cache_dir is None:
+        resolved_breadth_cache_dir = raw_dir / 'constituent_history'
+
+    source_bundle = fetch_source_tables(
+        provider=provider,
+        market_csv=market_csv,
+        hs300_csv=hs300_csv,
+        star50_csv=star50_csv,
+        csi1000_csv=csi1000_csv,
+        breadth_csv=breadth_csv,
+        market_symbol=market_symbol,
+        market_symbol_type=market_symbol_type,
+        hs300_symbol=hs300_symbol,
+        star50_symbol=star50_symbol,
+        csi1000_symbol=csi1000_symbol,
+        start_date=start_date,
+        end_date=end_date,
+        mairui_licence=mairui_licence,
+        mairui_market_code=mairui_market_code,
+        mairui_market_kind=mairui_market_kind,
+        mairui_hs300_code=mairui_hs300_code,
+        mairui_star50_code=mairui_star50_code,
+        mairui_csi1000_code=mairui_csi1000_code,
+        mairui_breadth_url=mairui_breadth_url,
+        mairui_breadth_map=mairui_breadth_map,
+        derive_breadth=derive_breadth,
+        breadth_index_symbol=breadth_index_symbol,
+        breadth_min_active_constituents=breadth_min_active_constituents,
+        breadth_max_constituents=breadth_max_constituents,
+        breadth_cache_dir=resolved_breadth_cache_dir,
+        breadth_weight_snapshot_path=breadth_weight_snapshot_path,
+        breadth_industry_snapshot_path=breadth_industry_snapshot_path,
+    )
+    tables: dict[str, pd.DataFrame] = source_bundle['tables']
+    breadth_source = str(source_bundle.get('breadth_source') or 'unknown')
+    breadth_derivation = source_bundle.get('breadth_derivation')
+
+    raw_market_path = raw_dir / 'market.csv'
+    raw_hs300_path = raw_dir / 'hs300.csv'
+    raw_star50_path = raw_dir / 'star50.csv'
+    raw_csi1000_path = raw_dir / 'csi1000.csv'
+    raw_breadth_path = raw_dir / 'breadth.csv'
+
+    market_raw = write_incremental_dataset(tables['market'], raw_market_path)
+    hs300_raw = write_incremental_dataset(tables['hs300'], raw_hs300_path)
+    star50_raw = write_incremental_dataset(tables['star50'], raw_star50_path)
+    csi1000_raw = write_incremental_dataset(tables['csi1000'], raw_csi1000_path)
+    breadth_raw = write_incremental_dataset(tables['breadth'], raw_breadth_path)
+
+    breadth_derivation_path: str | None = None
+    breadth_semantic_path: str | None = None
+    if breadth_derivation is not None:
+        derivation_path = raw_dir / 'breadth_derivation_summary.json'
+        with derivation_path.open('w', encoding='utf-8') as fh:
+            json.dump(breadth_derivation, fh, ensure_ascii=False, indent=2)
+        breadth_derivation_path = str(derivation_path)
+        breadth_semantic = evaluate_breadth_semantic_gate(
+            breadth_derivation,
+            strict=strict,
+            require_official_index_weight=breadth_semantic_require_official_index_weight,
+            require_time_varying_membership=breadth_semantic_require_time_varying_membership,
+            max_industry_unknown_ratio=breadth_semantic_max_industry_unknown_ratio,
+        )
+        semantic_path = raw_dir / 'breadth_semantic_summary.json'
+        with semantic_path.open('w', encoding='utf-8') as fh:
+            json.dump(breadth_semantic, fh, ensure_ascii=False, indent=2)
+        breadth_semantic_path = str(semantic_path)
+        if breadth_semantic['blocking']:
+            failed = breadth_semantic.get('failures') or []
+            detail = ', '.join(f"{item['field']}:{item['reason']}" for item in failed[:8])
+            raise ValueError(f'Breadth semantic gate failed in strict mode. Findings: {detail}')
+
+    breadth_integrity = evaluate_breadth_source_integrity(
+        breadth_raw,
+        required_columns=REQUIRED_BREADTH_COLUMNS,
+        min_unique_non_null=breadth_integrity_min_unique_non_null,
+        max_dominant_value_ratio=breadth_integrity_max_dominant_value_ratio,
+        std_floor=breadth_integrity_std_floor,
+        strict=strict,
+    )
+    breadth_integrity_path = raw_dir / 'breadth_integrity_summary.json'
+    with breadth_integrity_path.open('w', encoding='utf-8') as fh:
+        json.dump(breadth_integrity, fh, ensure_ascii=False, indent=2)
+    if breadth_integrity['blocking']:
+        failed = breadth_integrity.get('failures') or []
+        detail = ', '.join(f"{item['column']}:{item['reason']}" for item in failed[:8])
+        raise ValueError(f'Breadth integrity gate failed in strict mode. Findings: {detail}')
+
+    staging_market = validate_market_data_contract(
+        market_raw,
+        required_columns=('open', 'high', 'low', 'close', 'volume'),
+        source_label='staging_market',
+    )
+    staging_sidecar = hs300_raw.copy()
+    for panel in [star50_raw, csi1000_raw, breadth_raw]:
+        staging_sidecar = merge_point_in_time_sidecar(staging_sidecar, panel)
+
+    staging_market_path = staging_dir / 'market.csv'
+    staging_sidecar_path = staging_dir / 'sidecar.csv'
+    save_dataframe(staging_market, staging_market_path)
+    save_dataframe(staging_sidecar, staging_sidecar_path)
+
+    pit_df, quality_summary = build_pit_dataset(
+        staging_market_path,
+        sidecar_paths=[staging_sidecar_path],
+        strict=strict,
+        critical_columns=critical_columns,
+        blocking_columns=blocking_columns,
+        default_min_coverage=default_min_coverage,
+        column_min_coverage=column_min_coverage,
+    )
+
+    pit_output = Path(pit_output_path)
+    pit_output.parent.mkdir(parents=True, exist_ok=True)
+    summary_path = pit_output.parent / 'pit_quality_summary.json'
+    with summary_path.open('w', encoding='utf-8') as fh:
+        json.dump(quality_summary, fh, ensure_ascii=False, indent=2)
+
+    if quality_summary['blocking']:
+        failed_items = quality_summary.get('errors') or quality_summary['breaches']
+        breached = ', '.join(item['column'] for item in failed_items)
+        raise ValueError(f'PIT data quality gate failed in strict mode. Breached columns: {breached}')
+
+    save_dataframe(pit_df, pit_output)
+    return {
+        'raw_market_path': str(raw_market_path),
+        'raw_hs300_path': str(raw_hs300_path),
+        'raw_star50_path': str(raw_star50_path),
+        'raw_csi1000_path': str(raw_csi1000_path),
+        'raw_breadth_path': str(raw_breadth_path),
+        'breadth_source': breadth_source,
+        'breadth_derivation_path': breadth_derivation_path,
+        'breadth_semantic_path': breadth_semantic_path,
+        'breadth_integrity_path': str(breadth_integrity_path),
+        'staging_market_path': str(staging_market_path),
+        'staging_sidecar_path': str(staging_sidecar_path),
+        'pit_output_path': str(pit_output),
+        'quality_summary_path': str(summary_path),
+        'row_count': int(len(pit_df)),
+    }

+ 257 - 0
research/chinext50_regime_project/data/io.py

@@ -0,0 +1,257 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any, Iterable, Mapping
+
+import pandas as pd
+
+
+REQUIRED_COLUMNS = {'open', 'high', 'low', 'close', 'volume'}
+FULL_PIT_REQUIRED_COLUMNS: tuple[str, ...] = (
+    'open',
+    'high',
+    'low',
+    'close',
+    'volume',
+    'hs300_close',
+    'star50_close',
+    'csi1000_close',
+    'pct_constituents_above_20dma',
+    'pct_constituents_above_60dma',
+    'pct_new_high_20',
+    'pct_new_low_20',
+    'eq_weight_ret_5',
+    'weighted_ret_5',
+    'top3_contribution_5',
+    'top1_contribution_5',
+    'top10_contribution_5',
+    'sector_concentration_20',
+    'corr_spike_20',
+    'dispersion_20',
+)
+# Currently identical to FULL_PIT_REQUIRED_COLUMNS; alias it so the two lists cannot drift apart silently.
+DEFAULT_CRITICAL_COLUMNS: tuple[str, ...] = FULL_PIT_REQUIRED_COLUMNS
+DEFAULT_MIN_COVERAGE = 0.95
+
+
+def _read_dataframe(path: str | Path) -> pd.DataFrame:
+    data_path = Path(path)
+    if data_path.suffix.lower() == '.parquet':
+        return pd.read_parquet(data_path)
+    return pd.read_csv(data_path)
+
+
+def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
+    out = df.copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    return out
+
+
+def _coerce_datetime_index(df: pd.DataFrame, source_label: str) -> pd.DataFrame:
+    out = df.copy()
+
+    if 'date' in out.columns:
+        parsed = pd.to_datetime(out['date'], errors='coerce')
+        invalid_count = int(parsed.isna().sum())
+        if invalid_count:
+            raise ValueError(f'{source_label} contains {invalid_count} invalid date values.')
+        out = out.drop(columns=['date'])
+        out.index = parsed
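+    else:
+        # A numeric index (e.g. the default RangeIndex from read_csv) would be parsed as
+        # epoch offsets rather than rejected, so fail fast before calling to_datetime.
+        if pd.api.types.is_numeric_dtype(out.index.dtype):
+            raise ValueError(f'{source_label} must contain a date column or datetime-like index.')
+        parsed_index = pd.to_datetime(out.index, errors='coerce')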
+        invalid_count = int(parsed_index.isna().sum())
+        if invalid_count:
+            raise ValueError(f'{source_label} must contain a date column or datetime-like index.')
+        out.index = parsed_index
+
+    out.index.name = 'date'
+    return out
+
+
+def validate_market_data_contract(
+    df: pd.DataFrame,
+    required_columns: Iterable[str] | None = None,
+    source_label: str = 'dataset',
+) -> pd.DataFrame:
+    out = _normalize_columns(df)
+    out = _coerce_datetime_index(out, source_label=source_label)
+
+    normalized_required = REQUIRED_COLUMNS if required_columns is None else {str(col).strip().lower() for col in required_columns}
+    missing = normalized_required - set(out.columns)
+    if missing:
+        raise ValueError(f'{source_label} missing required columns: {sorted(missing)}')
+
+    duplicate_mask = out.index.duplicated(keep=False)
+    duplicate_count = int(duplicate_mask.sum())
+    if duplicate_count:
+        sample_dates = sorted({ts.strftime('%Y-%m-%d') for ts in out.index[duplicate_mask][:5]})
+        raise ValueError(f'{source_label} contains duplicate trading dates: {sample_dates}')
+
+    return out.sort_index()
+
+
+def load_market_data(path: str | Path) -> pd.DataFrame:
+    raw = _read_dataframe(path)
+    return validate_market_data_contract(raw, required_columns=REQUIRED_COLUMNS, source_label=str(path))
+
+
+def load_point_in_time_panel(path: str | Path) -> pd.DataFrame:
+    raw = _read_dataframe(path)
+    return validate_market_data_contract(raw, required_columns=(), source_label=str(path))
+
+
+def validate_full_pit_data_contract(
+    df: pd.DataFrame,
+    required_columns: Iterable[str] | None = None,
+    source_label: str = 'dataset',
+) -> pd.DataFrame:
+    normalized_required = FULL_PIT_REQUIRED_COLUMNS if required_columns is None else tuple(
+        str(col).strip().lower() for col in required_columns
+    )
+    return validate_market_data_contract(df, required_columns=normalized_required, source_label=source_label)
+
+
+def load_full_pit_data(path: str | Path, required_columns: Iterable[str] | None = None) -> pd.DataFrame:
+    raw = _read_dataframe(path)
+    return validate_full_pit_data_contract(raw, required_columns=required_columns, source_label=str(path))
+
+
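+def merge_point_in_time_sidecar(base_df: pd.DataFrame, sidecar_df: pd.DataFrame) -> pd.DataFrame:
+    """Left-join sidecar columns onto the base frame by date.
+
+    Dates present only in the sidecar are dropped; overlapping column names raise.
+    """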
+    base = validate_market_data_contract(base_df, required_columns=(), source_label='base dataframe')
+    sidecar = validate_market_data_contract(sidecar_df, required_columns=(), source_label='sidecar dataframe')
+
+    conflicts = sorted(set(base.columns) & set(sidecar.columns))
+    if conflicts:
+        raise ValueError(f'Sidecar merge has conflicting non-key columns: {conflicts}')
+
+    merged = base.join(sidecar, how='left')
+    merged.index.name = 'date'
+    return merged
+
+
+def _get_threshold(column: str, default_min_coverage: float, column_min_coverage: Mapping[str, float]) -> float:
+    value = float(column_min_coverage.get(column, default_min_coverage))
+    if value < 0.0 or value > 1.0:
+        raise ValueError(f'Coverage threshold for {column} must be between 0 and 1.')
+    return value
+
+
+def build_data_quality_report(df: pd.DataFrame, critical_columns: Iterable[str] | None = None) -> dict[str, Any]:
+    out = validate_market_data_contract(df, required_columns=(), source_label='quality report dataframe')
+    columns = [str(col).strip().lower() for col in (critical_columns or DEFAULT_CRITICAL_COLUMNS)]
+    row_count = int(len(out))
+
+    column_report: dict[str, dict[str, Any]] = {}
+    for col in columns:
+        if col not in out.columns:
+            column_report[col] = {
+                'present': False,
+                'non_null_count': 0,
+                'non_null_ratio': 0.0,
+                'missing_ratio': 1.0,
+            }
+            continue
+        non_null_count = int(out[col].notna().sum())
+        non_null_ratio = float(non_null_count / row_count) if row_count else 0.0
+        column_report[col] = {
+            'present': True,
+            'non_null_count': non_null_count,
+            'non_null_ratio': non_null_ratio,
+            'missing_ratio': float(1.0 - non_null_ratio),
+        }
+
+    return {
+        'row_count': row_count,
+        'date_start': out.index.min().date().isoformat() if row_count else None,
+        'date_end': out.index.max().date().isoformat() if row_count else None,
+        'duplicate_date_count': int(out.index.duplicated().sum()),
+        'columns': column_report,
+    }
+
+
+def evaluate_data_quality_gate(
+    df: pd.DataFrame,
+    *,
+    strict: bool = False,
+    critical_columns: Iterable[str] | None = None,
+    blocking_columns: Iterable[str] | None = None,
+    default_min_coverage: float = DEFAULT_MIN_COVERAGE,
+    column_min_coverage: Mapping[str, float] | None = None,
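+) -> dict[str, Any]:
+    """Compare non-null coverage of the critical columns against their thresholds.
+
+    A breach on a blocking column is an error, any other breach is a warning.
+    The gate blocks only when `strict` is set and at least one error exists.
+    """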
+    if default_min_coverage < 0.0 or default_min_coverage > 1.0:
+        raise ValueError('default_min_coverage must be between 0 and 1.')
+
+    normalized_columns = [str(col).strip().lower() for col in (critical_columns or DEFAULT_CRITICAL_COLUMNS)]
+    per_column = {str(k).strip().lower(): float(v) for k, v in (column_min_coverage or {}).items()}
+    if blocking_columns is None:
+        blocking_set = set(normalized_columns)
+    else:
+        blocking_set = {str(col).strip().lower() for col in blocking_columns}
+    report = build_data_quality_report(df, critical_columns=normalized_columns)
+
+    breaches: list[dict[str, Any]] = []
+    error_breaches: list[dict[str, Any]] = []
+    warning_breaches: list[dict[str, Any]] = []
+    for col in normalized_columns:
+        threshold = _get_threshold(col, default_min_coverage, per_column)
+        observed = float(report['columns'][col]['non_null_ratio'])
+        if observed < threshold:
+            severity = 'error' if col in blocking_set else 'warning'
+            breach = {
+                'column': col,
+                'threshold': threshold,
+                'observed': observed,
+                'reason': 'below_min_coverage',
+                'severity': severity,
+            }
+            breaches.append(breach)
+            if severity == 'error':
+                error_breaches.append(breach)
+            else:
+                warning_breaches.append(breach)
+
+    blocking = bool(strict and error_breaches)
+    mode = 'strict' if strict else 'non_strict'
+    return {
+        'mode': mode,
+        'strict': bool(strict),
+        'passed': not blocking,
+        'blocking': blocking,
+        'blocking_columns': sorted(blocking_set),
+        'default_min_coverage': float(default_min_coverage),
+        'column_min_coverage': {col: _get_threshold(col, default_min_coverage, per_column) for col in normalized_columns},
+        'critical_columns': normalized_columns,
+        'breaches': breaches,
+        'errors': error_breaches if strict else [],
+        'warnings': breaches if not strict else warning_breaches,
+        'quality_report': report,
+    }
+
+
+def save_dataframe(df: pd.DataFrame, path: str | Path) -> None:
+    output_path = Path(path)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    if output_path.suffix.lower() == '.parquet':
+        df.to_parquet(output_path)
+    else:
+        df.to_csv(output_path, index=True)
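
Review note: the strict / non-strict semantics of `evaluate_data_quality_gate` are easy to misread, so here is a minimal sketch of the same decision rule on plain Python rows. `coverage_gate` and the sample rows are hypothetical and not part of this commit; the sketch treats only `None` as missing, whereas the pandas version also counts NaN.

```python
from typing import Any, Mapping, Optional, Sequence


def coverage_gate(
    rows: Sequence[Mapping[str, Any]],
    columns: Sequence[str],
    *,
    min_coverage: float = 0.95,
    blocking: Optional[set] = None,
    strict: bool = False,
) -> dict:
    """Flag columns whose non-null ratio falls below min_coverage."""
    # By default every checked column is blocking, as in the diff.
    blocking_set = set(columns) if blocking is None else set(blocking)
    breaches = []
    for col in columns:
        non_null = sum(1 for row in rows if row.get(col) is not None)
        observed = non_null / len(rows) if rows else 0.0
        if observed < min_coverage:
            severity = 'error' if col in blocking_set else 'warning'
            breaches.append({'column': col, 'observed': observed, 'severity': severity})
    errors = [b for b in breaches if b['severity'] == 'error']
    # The gate only blocks in strict mode, and only on blocking columns.
    return {'blocking': bool(strict and errors), 'breaches': breaches}


rows = [
    {'close': 1.0, 'pct_new_high_20': None},
    {'close': 2.0, 'pct_new_high_20': 0.1},
]
cols = ['close', 'pct_new_high_20']
strict_all = coverage_gate(rows, cols, strict=True)
strict_close_only = coverage_gate(rows, cols, strict=True, blocking={'close'})
lenient = coverage_gate(rows, cols, strict=False)
print(strict_all['blocking'], strict_close_only['blocking'], lenient['blocking'])
```

The same half-empty column blocks the run, downgrades to a warning, or passes, depending only on `strict` and the blocking set.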

+ 50 - 0
research/chinext50_regime_project/data/pit_builder.py

@@ -0,0 +1,50 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any, Iterable, Mapping, Sequence
+
+import pandas as pd
+
+from .io import (
+    DEFAULT_MIN_COVERAGE,
+    evaluate_data_quality_gate,
+    load_market_data,
+    load_point_in_time_panel,
+    merge_point_in_time_sidecar,
+)
+
+
+def build_pit_dataset(
+    market_path: str | Path,
+    *,
+    sidecar_paths: Sequence[str | Path] | None = None,
+    strict: bool = False,
+    critical_columns: Iterable[str] | None = None,
+    blocking_columns: Iterable[str] | None = None,
+    default_min_coverage: float = DEFAULT_MIN_COVERAGE,
+    column_min_coverage: Mapping[str, float] | None = None,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    market_path = str(market_path)
+    ordered_sidecars = [str(path) for path in (sidecar_paths or [])]
+
+    pit = load_market_data(market_path)
+    for sidecar_path in ordered_sidecars:
+        sidecar = load_point_in_time_panel(sidecar_path)
+        pit = merge_point_in_time_sidecar(pit, sidecar)
+
+    quality = evaluate_data_quality_gate(
+        pit,
+        strict=strict,
+        critical_columns=critical_columns,
+        blocking_columns=blocking_columns,
+        default_min_coverage=default_min_coverage,
+        column_min_coverage=column_min_coverage,
+    )
+    quality['sources'] = {
+        'market_path': market_path,
+        'sidecar_paths': ordered_sidecars,
+        'sidecar_count': int(len(ordered_sidecars)),
+        'merged_row_count': int(len(pit)),
+    }
+    quality['pit_columns'] = sorted([str(col) for col in pit.columns])
+    return pit, quality
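
Review note: `build_pit_dataset` left-joins each sidecar onto the market frame, so the market calendar defines the output rows and a sidecar can only add columns. A minimal sketch of that behaviour with date-keyed dicts is below. `left_join_sidecar` and the sample data are hypothetical and stand in for `merge_point_in_time_sidecar`, which operates on pandas frames.

```python
def left_join_sidecar(base: dict, sidecar: dict) -> dict:
    """Left-join sidecar rows onto base rows by date key."""
    base_cols = {col for row in base.values() for col in row}
    side_cols = {col for row in sidecar.values() for col in row}
    conflicts = sorted(base_cols & side_cols)
    if conflicts:
        raise ValueError(f'Sidecar merge has conflicting non-key columns: {conflicts}')
    merged = {}
    for date, row in base.items():
        extra = sidecar.get(date, {})
        # Base dates missing from the sidecar get None for every sidecar column.
        merged[date] = {**row, **{col: extra.get(col) for col in side_cols}}
    return merged


base = {'2024-01-02': {'close': 1.0}, '2024-01-03': {'close': 2.0}}
sidecar = {'2024-01-03': {'hs300_close': 10.0}, '2024-01-04': {'hs300_close': 11.0}}
merged = left_join_sidecar(base, sidecar)
print(merged)
```

The sidecar-only date `2024-01-04` is dropped and `2024-01-02` is kept with a missing `hs300_close`; gaps like that are what the coverage gate then measures.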

+ 213 - 0
research/chinext50_regime_project/data/sample_data.py

@@ -0,0 +1,213 @@
+from __future__ import annotations
+
+import math
+from typing import Iterable
+
+import numpy as np
+import pandas as pd
+
+
+STATE_PARAMS: dict[str, tuple[float, float]] = {
+    'trend': (0.0012, 0.018),
+    'stress': (-0.0018, 0.032),
+    'repair': (0.0010, 0.024),
+    'chop': (0.0001, 0.014),
+    'euphoric': (0.0015, 0.026),
+}
+
+
+def _state_schedule(length: int) -> list[str]:
+    schedule: list[str] = []
+    blocks: list[tuple[str, int]] = [
+        ('trend', 140),
+        ('stress', 55),
+        ('repair', 65),
+        ('chop', 100),
+        ('euphoric', 80),
+        ('stress', 45),
+        ('repair', 55),
+        ('trend', 120),
+    ]
+    while len(schedule) < length:
+        for state, block_len in blocks:
+            schedule.extend([state] * block_len)
+            if len(schedule) >= length:
+                break
+    return schedule[:length]
+
+
+def generate_synthetic_chinext50_data(
+    start: str = '2016-01-01',
+    periods: int = 2400,
+    seed: int = 7,
+) -> pd.DataFrame:
+    rng = np.random.default_rng(seed)
+    dates = pd.bdate_range(start=start, periods=periods)
+    states = _state_schedule(periods)
+
+    close = np.empty(periods, dtype=float)
+    open_ = np.empty(periods, dtype=float)
+    high = np.empty(periods, dtype=float)
+    low = np.empty(periods, dtype=float)
+    volume = np.empty(periods, dtype=float)
+    hs300 = np.empty(periods, dtype=float)
+    star50 = np.empty(periods, dtype=float)
+    csi1000 = np.empty(periods, dtype=float)
+
+    close[0] = 1000.0
+    open_[0] = 1000.0
+    hs300[0] = 3000.0
+    star50[0] = 1200.0
+    csi1000[0] = 5000.0
+    volume[0] = 1e9
+    high[0] = close[0] * 1.01
+    low[0] = close[0] * 0.99
+
+    breadth_20 = np.empty(periods, dtype=float)
+    breadth_60 = np.empty(periods, dtype=float)
+    new_high = np.empty(periods, dtype=float)
+    new_low = np.empty(periods, dtype=float)
+    eq_ret_5 = np.empty(periods, dtype=float)
+    weighted_ret_5 = np.empty(periods, dtype=float)
+    top3_contribution_5 = np.empty(periods, dtype=float)
+    top1_contribution_5 = np.empty(periods, dtype=float)
+    top10_contribution_5 = np.empty(periods, dtype=float)
+    sector_concentration_20 = np.empty(periods, dtype=float)
+    corr_spike_20 = np.empty(periods, dtype=float)
+    dispersion_20 = np.empty(periods, dtype=float)
+
+    breadth_20[0] = 0.50
+    breadth_60[0] = 0.50
+    new_high[0] = 0.08
+    new_low[0] = 0.08
+    eq_ret_5[0] = 0.0
+    weighted_ret_5[0] = 0.0
+    top3_contribution_5[0] = 0.35
+    top1_contribution_5[0] = 0.12
+    top10_contribution_5[0] = 0.62
+    sector_concentration_20[0] = 0.22
+    corr_spike_20[0] = 0.0
+    dispersion_20[0] = 0.0
+
+    def clip01(x: float) -> float:
+        return max(0.0, min(1.0, x))
+
+    for i in range(1, periods):
+        state = states[i]
+        drift, vol = STATE_PARAMS[state]
+        gap = rng.normal(0.0, vol * 0.18)
+        intraday = drift + rng.normal(0.0, vol)
+        open_[i] = close[i - 1] * (1.0 + gap)
+        close[i] = open_[i] * (1.0 + intraday)
+        daily_range = abs(intraday) + abs(rng.normal(0.0, vol * 0.65)) + 0.004
+        high[i] = max(open_[i], close[i]) * (1.0 + daily_range * 0.55)
+        low[i] = min(open_[i], close[i]) * (1.0 - daily_range * 0.45)
+
+        bench_drift = drift * 0.55
+        bench_vol = vol * 0.58
+        hs300[i] = hs300[i - 1] * (1.0 + bench_drift + rng.normal(0.0, bench_vol))
+        star50[i] = star50[i - 1] * (1.0 + drift * 0.75 + rng.normal(0.0, vol * 0.75))
+        csi1000[i] = csi1000[i - 1] * (1.0 + drift * 0.85 + rng.normal(0.0, vol * 0.88))
+
+        volume_base = {
+            'trend': 1.10,
+            'stress': 1.45,
+            'repair': 1.20,
+            'chop': 0.92,
+            'euphoric': 1.55,
+        }[state]
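+        # NOTE: the multiplier is always > 1 (minimum 0.92 + 0.18 * 0.92 - 0.08 = 1.0056),
+        # so synthetic volume grows monotonically over the whole series.
+        volume[i] = volume[i - 1] * (0.92 + 0.18 * volume_base + rng.uniform(-0.08, 0.08))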
+
+        if state == 'trend':
+            breadth_20[i] = clip01(0.72 + rng.normal(0.0, 0.06))
+            breadth_60[i] = clip01(0.68 + rng.normal(0.0, 0.05))
+            new_high[i] = clip01(0.20 + rng.normal(0.0, 0.04))
+            new_low[i] = clip01(0.05 + rng.normal(0.0, 0.02))
+            eq_ret_5[i] = rng.normal(0.032, 0.018)
+            weighted_ret_5[i] = eq_ret_5[i] + rng.normal(0.002, 0.007)
+            top3_contribution_5[i] = clip01(0.34 + rng.normal(0.0, 0.06))
+            top1_contribution_5[i] = clip01(0.12 + rng.normal(0.0, 0.03))
+            top10_contribution_5[i] = clip01(0.58 + rng.normal(0.0, 0.06))
+            sector_concentration_20[i] = clip01(0.18 + rng.normal(0.0, 0.05))
+            corr_spike_20[i] = rng.normal(0.12, 0.08)
+            dispersion_20[i] = abs(rng.normal(0.18, 0.06))
+        elif state == 'stress':
+            breadth_20[i] = clip01(0.20 + rng.normal(0.0, 0.06))
+            breadth_60[i] = clip01(0.18 + rng.normal(0.0, 0.05))
+            new_high[i] = clip01(0.02 + rng.normal(0.0, 0.01))
+            new_low[i] = clip01(0.26 + rng.normal(0.0, 0.05))
+            eq_ret_5[i] = rng.normal(-0.050, 0.024)
+            weighted_ret_5[i] = eq_ret_5[i] + rng.normal(-0.008, 0.008)
+            top3_contribution_5[i] = clip01(0.46 + rng.normal(0.0, 0.05))
+            top1_contribution_5[i] = clip01(0.20 + rng.normal(0.0, 0.04))
+            top10_contribution_5[i] = clip01(0.68 + rng.normal(0.0, 0.05))
+            sector_concentration_20[i] = clip01(0.30 + rng.normal(0.0, 0.06))
+            corr_spike_20[i] = rng.normal(0.75, 0.12)
+            dispersion_20[i] = abs(rng.normal(0.38, 0.09))
+        elif state == 'repair':
+            breadth_20[i] = clip01(0.48 + 0.003 * (i % 25) + rng.normal(0.0, 0.05))
+            breadth_60[i] = clip01(0.40 + rng.normal(0.0, 0.05))
+            new_high[i] = clip01(0.09 + rng.normal(0.0, 0.03))
+            new_low[i] = clip01(0.10 + rng.normal(0.0, 0.03))
+            eq_ret_5[i] = rng.normal(0.020, 0.020)
+            weighted_ret_5[i] = eq_ret_5[i] + rng.normal(-0.002, 0.006)
+            top3_contribution_5[i] = clip01(0.31 + rng.normal(0.0, 0.05))
+            top1_contribution_5[i] = clip01(0.13 + rng.normal(0.0, 0.03))
+            top10_contribution_5[i] = clip01(0.60 + rng.normal(0.0, 0.06))
+            sector_concentration_20[i] = clip01(0.22 + rng.normal(0.0, 0.05))
+            corr_spike_20[i] = rng.normal(0.34, 0.10)
+            dispersion_20[i] = abs(rng.normal(0.24, 0.07))
+        elif state == 'euphoric':
+            breadth_20[i] = clip01(0.60 + rng.normal(0.0, 0.07))
+            breadth_60[i] = clip01(0.58 + rng.normal(0.0, 0.06))
+            new_high[i] = clip01(0.24 + rng.normal(0.0, 0.05))
+            new_low[i] = clip01(0.04 + rng.normal(0.0, 0.02))
+            eq_ret_5[i] = rng.normal(0.028, 0.023)
+            weighted_ret_5[i] = eq_ret_5[i] + rng.normal(0.012, 0.010)
+            top3_contribution_5[i] = clip01(0.55 + rng.normal(0.0, 0.07))
+            top1_contribution_5[i] = clip01(0.24 + rng.normal(0.0, 0.04))
+            top10_contribution_5[i] = clip01(0.72 + rng.normal(0.0, 0.05))
+            sector_concentration_20[i] = clip01(0.34 + rng.normal(0.0, 0.06))
+            corr_spike_20[i] = rng.normal(0.26, 0.09)
+            dispersion_20[i] = abs(rng.normal(0.28, 0.08))
+        else:  # chop
+            breadth_20[i] = clip01(0.50 + rng.normal(0.0, 0.08))
+            breadth_60[i] = clip01(0.48 + rng.normal(0.0, 0.07))
+            new_high[i] = clip01(0.07 + rng.normal(0.0, 0.03))
+            new_low[i] = clip01(0.08 + rng.normal(0.0, 0.03))
+            eq_ret_5[i] = rng.normal(0.004, 0.015)
+            weighted_ret_5[i] = eq_ret_5[i] + rng.normal(0.0, 0.005)
+            top3_contribution_5[i] = clip01(0.37 + rng.normal(0.0, 0.06))
+            top1_contribution_5[i] = clip01(0.15 + rng.normal(0.0, 0.03))
+            top10_contribution_5[i] = clip01(0.62 + rng.normal(0.0, 0.06))
+            sector_concentration_20[i] = clip01(0.24 + rng.normal(0.0, 0.06))
+            corr_spike_20[i] = rng.normal(0.20, 0.08)
+            dispersion_20[i] = abs(rng.normal(0.22, 0.06))
+
+    df = pd.DataFrame(
+        {
+            'date': dates,
+            'open': open_,
+            'high': high,
+            'low': low,
+            'close': close,
+            'volume': volume,
+            'hs300_close': hs300,
+            'star50_close': star50,
+            'csi1000_close': csi1000,
+            'pct_constituents_above_20dma': breadth_20,
+            'pct_constituents_above_60dma': breadth_60,
+            'pct_new_high_20': new_high,
+            'pct_new_low_20': new_low,
+            'eq_weight_ret_5': eq_ret_5,
+            'weighted_ret_5': weighted_ret_5,
+            'top3_contribution_5': top3_contribution_5,
+            'top1_contribution_5': top1_contribution_5,
+            'top10_contribution_5': top10_contribution_5,
+            'sector_concentration_20': sector_concentration_20,
+            'corr_spike_20': corr_spike_20,
+            'dispersion_20': dispersion_20,
+            'synthetic_regime_hint': states,
+        }
+    )
+    return df.set_index('date')
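Review note on the synthetic volume path above: the recursion multiplies the previous day's volume by `0.92 + 0.18 * volume_base + U(-0.08, 0.08)`. The sketch below only redoes that arithmetic per state, using the `volume_base` values from the hunk; the helper name `volume_multiplier_bounds` and the `NOISE_HALF_WIDTH` constant are illustrative and not part of the project.

```python
from __future__ import annotations

# Per-state volume_base values copied from the generator in the diff.
VOLUME_BASE = {
    'trend': 1.10,
    'stress': 1.45,
    'repair': 1.20,
    'chop': 0.92,
    'euphoric': 1.55,
}

# Half-width of the rng.uniform(-0.08, 0.08) noise term.
NOISE_HALF_WIDTH = 0.08


def volume_multiplier_bounds(volume_base: float) -> tuple[float, float]:
    """Return the lower and upper bound of the daily volume multiplier."""
    center = 0.92 + 0.18 * volume_base
    return center - NOISE_HALF_WIDTH, center + NOISE_HALF_WIDTH


bounds = {state: volume_multiplier_bounds(base) for state, base in VOLUME_BASE.items()}

for state, (lower, upper) in bounds.items():
    # A lower bound above 1.0 means volume can never shrink in that state.
    print(f'{state:9s} multiplier in [{lower:.4f}, {upper:.4f})')
```

If every lower bound is above 1.0, as this arithmetic suggests, the generated `volume` column trends upward in every regime. Whether that matters depends on how downstream features normalize volume, which this hunk does not show.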

+ 558 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B1/real_walkforward_report.py

@@ -0,0 +1,558 @@
+from __future__ import annotations
+
+import copy
+from pathlib import Path
+import sys
+from typing import Any, Mapping
+
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+import argparse
+import json
+
+import pandas as pd
+
+from backtest.engine import compute_metrics, run_backtest
+from backtest.frozen_walkforward import (
+    HypothesisCandidate,
+    normalize_hypothesis_candidates,
+    run_frozen_walkforward,
+    run_strategy_bundle,
+)
+from backtest.utility import utility_from_metrics, utility_status
+from backtest.walkforward import WindowSpec, build_expanding_windows
+from config.loader import load_config
+from data.io import evaluate_data_quality_gate, load_full_pit_data
+
+
+def _resolve_data_quality_settings(
+    config: dict[str, Any],
+    *,
+    strict_cli: bool,
+    min_coverage_cli: float | None,
+) -> tuple[bool, float, list[str] | None, list[str] | None, dict[str, float]]:
+    quality_cfg = config.get('data_quality', {})
+    strict_mode = bool(quality_cfg.get('strict_mode_default', False)) or strict_cli
+    default_min_coverage = float(quality_cfg.get('default_min_coverage', 0.95))
+    if min_coverage_cli is not None:
+        default_min_coverage = float(min_coverage_cli)
+    critical_columns = [str(col).strip().lower() for col in quality_cfg.get('critical_columns', [])]
+    blocking_columns = [str(col).strip().lower() for col in quality_cfg.get('blocking_columns', critical_columns)]
+    column_min_coverage = {
+        str(column).strip().lower(): float(value) for column, value in quality_cfg.get('column_min_coverage', {}).items()
+    }
+    return strict_mode, default_min_coverage, (critical_columns or None), (blocking_columns or None), column_min_coverage
+
+
+def _load_candidate_payload(path: str | None) -> list[dict[str, Any]] | None:
+    if not path:
+        return None
+    with Path(path).open('r', encoding='utf-8') as fh:
+        payload = json.load(fh)
+    if not isinstance(payload, list):
+        raise ValueError('Candidate file must be a JSON list of candidate objects.')
+    return payload
+
+
+def _resolve_frozen_settings(
+    config: dict[str, Any],
+    *,
+    candidates_json: str | None,
+    min_train_rows_cli: int | None,
+    min_test_rows_cli: int | None,
+) -> tuple[list[HypothesisCandidate], int, int]:
+    frozen_cfg = config.get('frozen_validation', {})
+    raw_candidates = _load_candidate_payload(candidates_json) or frozen_cfg.get('candidates')
+    candidates = normalize_hypothesis_candidates(raw_candidates)
+
+    min_train_rows = int(frozen_cfg.get('min_train_rows', 120))
+    min_test_rows = int(frozen_cfg.get('min_test_rows', 40))
+    if min_train_rows_cli is not None:
+        min_train_rows = int(min_train_rows_cli)
+    if min_test_rows_cli is not None:
+        min_test_rows = int(min_test_rows_cli)
+    return candidates, min_train_rows, min_test_rows
+
+
+def _serialize_windows(windows: list[WindowSpec]) -> list[dict[str, str]]:
+    return [
+        {
+            'train_start': window.train_start,
+            'train_end': window.train_end,
+            'test_start': window.test_start,
+            'test_end': window.test_end,
+        }
+        for window in windows
+    ]
+
+
+def _resolve_walkforward_windows(config: dict[str, Any], raw_index) -> list[WindowSpec]:
+    frozen_cfg = config.get('frozen_validation', {})
+    window_mode = str(frozen_cfg.get('window_mode', 'expanding')).strip().lower()
+    if window_mode != 'expanding':
+        raise ValueError(f'Unsupported window_mode: {window_mode}')
+    return build_expanding_windows(
+        raw_index,
+        min_train_years=int(frozen_cfg.get('min_train_years', 2)),
+        test_years=int(frozen_cfg.get('test_years', 1)),
+        allow_partial_last_test=bool(frozen_cfg.get('allow_partial_last_test', True)),
+    )
+
+
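+def _normalize_metrics(metrics: dict[str, Any]) -> dict[str, Any]:
+    """Coerce numeric metric values to plain floats so they serialize cleanly to JSON."""
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        # bool is a subclass of int; keep flags as booleans instead of turning them into 1.0/0.0.
+        if isinstance(value, (int, float)) and not isinstance(value, bool):
+            out[key] = float(value)
+        else:
+            out[key] = value
+    return out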
+
+
+def _safe_divide(numerator: float, denominator: float) -> float | None:
+    """Return numerator / denominator, or None when the denominator is effectively zero."""
+    if abs(float(denominator)) < 1e-12:
+        return None
+    return float(numerator / denominator)
+
+
+def _build_baseline_plan(raw: pd.DataFrame) -> pd.DataFrame:
+    baseline = raw.copy()
+    baseline['target_exposure'] = 1.0
+    return baseline
+
+
+def _deep_merge_dict(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
+    """Recursively merge overrides into a deep copy of base; neither input is mutated."""
+    out = copy.deepcopy(dict(base))
+    for key, value in overrides.items():
+        # Nested mappings are merged key by key; any other value replaces the base value.
+        if isinstance(value, Mapping) and isinstance(out.get(key), Mapping):
+            out[key] = _deep_merge_dict(dict(out[key]), value)
+        else:
+            out[key] = copy.deepcopy(value)
+    return out
+
+
+def _candidate_config(base_config: Mapping[str, Any], candidate: str, overrides: Mapping[str, Any]) -> dict[str, Any]:
+    merged = _deep_merge_dict(base_config, overrides)
+    merged['_candidate_id'] = str(candidate)
+    return merged
+
+
+def _resolve_window_success_rule(config: Mapping[str, Any]) -> dict[str, Any]:
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    return {
+        'upside_min': float(evaluation_cfg.get('primary_window_success_upside_min', 0.25)),
+        'drawdown_ratio_max': float(evaluation_cfg.get('primary_window_success_drawdown_ratio_max', 0.80)),
+        'turnover_max': float(evaluation_cfg.get('primary_window_success_turnover_max', 22.0)),
+        'require_positive_return': bool(evaluation_cfg.get('primary_window_success_require_positive_return', True)),
+        'ratio_min': float(evaluation_cfg.get('primary_window_success_ratio_min', 0.50)),
+        'ratio_target': float(evaluation_cfg.get('primary_window_success_ratio_target', 0.60)),
+        'primary_window_min_rows': int(evaluation_cfg.get('primary_window_min_rows', 180)),
+    }
+
+
+def _window_success_diagnostics(
+    board: pd.DataFrame,
+    rule: Mapping[str, Any],
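+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    """Flag each 'ok' window as primary (test_rows >= primary_window_min_rows) and as a
+    success (return, upside-capture, drawdown-ratio and turnover checks), then summarize
+    the counts and ratios. Returns the enriched board and the diagnostics dict."""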
+    if board.empty or 'status' not in board.columns:
+        return board.copy(), {
+            'primary_window_count': 0,
+            'partial_window_count': 0,
+            'primary_window_success_count': 0,
+            'partial_window_success_count': 0,
+            'primary_window_success_ratio': 0.0,
+            'partial_window_success_ratio': 0.0,
+            'max_primary_window_drawdown_ratio_vs_baseline': None,
+            'median_primary_window_upside_capture': None,
+            'window_success_rule': dict(rule),
+        }
+
+    enriched = board.copy()
+    ok_mask = enriched['status'] == 'ok'
+    ok_board = enriched.loc[ok_mask].copy()
+    if ok_board.empty:
+        enriched['window_is_primary'] = False
+        enriched['window_success'] = False
+        enriched['test_drawdown_ratio_vs_benchmark'] = None
+        return enriched, {
+            'primary_window_count': 0,
+            'partial_window_count': 0,
+            'primary_window_success_count': 0,
+            'partial_window_success_count': 0,
+            'primary_window_success_ratio': 0.0,
+            'partial_window_success_ratio': 0.0,
+            'max_primary_window_drawdown_ratio_vs_baseline': None,
+            'median_primary_window_upside_capture': None,
+            'window_success_rule': dict(rule),
+        }
+
+    ok_board['window_is_primary'] = ok_board['test_rows'].astype(float) >= float(rule['primary_window_min_rows'])
+    ok_board['test_drawdown_ratio_vs_benchmark'] = ok_board.apply(
+        lambda row: _safe_divide(
+            float(row.get('test_max_drawdown', 0.0)),
+            float(row.get('test_benchmark_max_drawdown', 0.0)),
+        ),
+        axis=1,
+    )
+    require_positive_return = bool(rule['require_positive_return'])
+    if require_positive_return:
+        positive_return_ok = ok_board['test_annual_return'].astype(float) > 0.0
+    else:
+        positive_return_ok = pd.Series(True, index=ok_board.index)
+    upside_ok = ok_board['test_upside_capture'].astype(float) >= float(rule['upside_min'])
+    drawdown_ok = ok_board['test_drawdown_ratio_vs_benchmark'].fillna(float('inf')) <= float(rule['drawdown_ratio_max'])
+    turnover_ok = ok_board['test_annual_turnover'].astype(float) <= float(rule['turnover_max'])
+    ok_board['window_success'] = positive_return_ok & upside_ok & drawdown_ok & turnover_ok
+
+    primary_board = ok_board.loc[ok_board['window_is_primary']]
+    partial_board = ok_board.loc[~ok_board['window_is_primary']]
+    primary_success_count = int(primary_board['window_success'].sum()) if not primary_board.empty else 0
+    partial_success_count = int(partial_board['window_success'].sum()) if not partial_board.empty else 0
+
+    diagnostics = {
+        'primary_window_count': int(len(primary_board)),
+        'partial_window_count': int(len(partial_board)),
+        'primary_window_success_count': primary_success_count,
+        'partial_window_success_count': partial_success_count,
+        'primary_window_success_ratio': float(primary_success_count / len(primary_board)) if len(primary_board) else 0.0,
+        'partial_window_success_ratio': float(partial_success_count / len(partial_board)) if len(partial_board) else 0.0,
+        'max_primary_window_drawdown_ratio_vs_baseline': (
+            float(primary_board['test_drawdown_ratio_vs_benchmark'].max())
+            if not primary_board.empty
+            else None
+        ),
+        'median_primary_window_upside_capture': (
+            float(primary_board['test_upside_capture'].median()) if not primary_board.empty else None
+        ),
+        'window_success_rule': dict(rule),
+    }
+
+    enriched['window_is_primary'] = False
+    enriched['window_success'] = False
+    enriched['test_drawdown_ratio_vs_benchmark'] = None
+    enriched.loc[ok_board.index, 'window_is_primary'] = ok_board['window_is_primary']
+    enriched.loc[ok_board.index, 'window_success'] = ok_board['window_success']
+    enriched.loc[ok_board.index, 'test_drawdown_ratio_vs_benchmark'] = ok_board['test_drawdown_ratio_vs_benchmark']
+    return enriched, diagnostics
+
+
+def _resolve_selected_overrides(
+    row: Mapping[str, Any],
+    candidate_overrides: Mapping[str, Mapping[str, Any]],
+) -> dict[str, Any]:
+    candidate_id = str(row.get('selected_candidate_id', '')).strip()
+    if candidate_id in candidate_overrides:
+        return copy.deepcopy(dict(candidate_overrides[candidate_id]))
+    serialized = row.get('selected_candidate_overrides')
+    if not isinstance(serialized, str) or not serialized.strip():
+        return {}
+    try:
+        parsed = json.loads(serialized)
+    except json.JSONDecodeError:
+        return {}
+    if not isinstance(parsed, dict):
+        return {}
+    return parsed
+
+
+def _build_stitched_frozen_oos_ledger(
+    raw: pd.DataFrame,
+    config: Mapping[str, Any],
+    board: pd.DataFrame,
+    candidate_overrides: Mapping[str, Mapping[str, Any]],
+) -> pd.DataFrame:
+    if board.empty or 'status' not in board.columns:
+        return pd.DataFrame()
+    ok_board = board.loc[board['status'] == 'ok'].copy()
+    if ok_board.empty:
+        return pd.DataFrame()
+
+    stitched_parts: list[pd.DataFrame] = []
+    for idx, row in ok_board.iterrows():
+        candidate_id = str(row.get('selected_candidate_id', '')).strip()
+        if not candidate_id:
+            continue
+        overrides = _resolve_selected_overrides(row, candidate_overrides)
+        candidate_cfg = _candidate_config(config, candidate_id, overrides)
+        combined_slice = raw.loc[str(row['train_start']) : str(row['test_end'])].copy()
+        _, combined_ledger, _ = run_strategy_bundle(combined_slice, candidate_cfg)
+        test_ledger = combined_ledger.loc[str(row['test_start']) : str(row['test_end'])].copy()
+        if test_ledger.empty:
+            continue
+        test_ledger['frozen_window_index'] = int(idx)
+        test_ledger['selected_candidate_id'] = candidate_id
+        test_ledger['window_test_start'] = str(row['test_start'])
+        test_ledger['window_test_end'] = str(row['test_end'])
+        stitched_parts.append(test_ledger)
+
+    if not stitched_parts:
+        return pd.DataFrame()
+
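+    # Use a stable sort so that, for duplicated dates, rows keep their concatenation
+    # (window) order and keep='first' reliably retains the earliest window's row.
+    stitched = pd.concat(stitched_parts, axis=0).sort_index(kind='mergesort')
+    if stitched.index.has_duplicates:
+        stitched = stitched.loc[~stitched.index.duplicated(keep='first')].copy()
+    return stitched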
+
+
+def _metrics_from_ledger(ledger: pd.DataFrame, config: Mapping[str, Any]) -> dict[str, Any]:
+    annualization = int(dict((config or {}).get('trading', {})).get('annualization', 252))
+    if ledger.empty:
+        metrics = compute_metrics(
+            strategy_returns=pd.Series(dtype=float),
+            benchmark_returns=pd.Series(dtype=float),
+            turnover=pd.Series(dtype=float),
+            annualization=annualization,
+        )
+    else:
+        metrics = compute_metrics(
+            strategy_returns=ledger['strategy_return_net'],
+            benchmark_returns=ledger['asset_exec_return'],
+            turnover=ledger['turnover'] if 'turnover' in ledger.columns else None,
+            tracking_difference=ledger['tracking_difference'] if 'tracking_difference' in ledger.columns else None,
+            annualization=annualization,
+        )
+    out = _normalize_metrics(metrics)
+    out['utility_total_score'] = float(utility_from_metrics(out))
+    out['utility_status'] = utility_status(out['utility_total_score'])
+    return out
+
+
+def _comparison_against_baseline(strategy_metrics: Mapping[str, Any], baseline_metrics: Mapping[str, Any]) -> dict[str, Any]:
+    annual_return_delta = float(
+        float(strategy_metrics.get('annual_return', 0.0)) - float(baseline_metrics.get('annual_return', 0.0))
+    )
+    max_drawdown_delta = float(
+        float(strategy_metrics.get('max_drawdown', 0.0)) - float(baseline_metrics.get('max_drawdown', 0.0))
+    )
+    return {
+        'annual_return_delta': annual_return_delta,
+        'annual_return_delta_vs_baseline': annual_return_delta,
+        'max_drawdown_delta': max_drawdown_delta,
+        'max_drawdown_delta_vs_baseline': max_drawdown_delta,
+        'drawdown_ratio_vs_baseline': _safe_divide(
+            float(strategy_metrics.get('max_drawdown', 0.0)),
+            float(baseline_metrics.get('max_drawdown', 0.0)),
+        ),
+        'utility_delta_vs_baseline': float(
+            float(strategy_metrics.get('utility_total_score', 0.0)) - float(baseline_metrics.get('utility_total_score', 0.0))
+        ),
+        'upside_capture': float(strategy_metrics.get('upside_capture', 0.0)),
+    }
+
+
+def _build_report_markdown(summary: dict[str, Any]) -> str:
+    meta = summary['input']
+    comparison = summary['comparison']
+    stitched = summary['stitched_frozen_oos_metrics']
+    default = summary['default_strategy_full_sample_metrics']
+    baseline = summary['baseline_full_sample_metrics']
+    frozen = summary['frozen_walkforward']
+    stitched_vs_baseline = comparison['stitched_oos_vs_baseline']
+    default_vs_baseline = comparison['default_vs_baseline']
+
+    def _fmt(value: Any, ndigits: int = 4) -> str:
+        if value is None:
+            return 'n/a'
+        if isinstance(value, float):
+            return f'{value:.{ndigits}f}'
+        return str(value)
+
+    lines = [
+        '# Real Walk-Forward Report',
+        '',
+        f"- input_path: `{meta['pit_path']}`",
+        f"- row_count: `{meta['row_count']}`",
+        f"- date_range: `{meta['date_start']}` to `{meta['date_end']}`",
+        '',
+        '## Frozen Validation Summary',
+        f"- total_windows: `{frozen['total_windows']}`",
+        f"- processed_window_count: `{frozen['processed_window_count']}`",
+        f"- skipped_window_count: `{frozen['skipped_window_count']}`",
+        f"- positive_window_ratio: `{_fmt(frozen['positive_window_ratio'])}`",
+        f"- primary_window_success_ratio: `{_fmt(frozen.get('primary_window_success_ratio'))}`",
+        f"- partial_window_success_ratio: `{_fmt(frozen.get('partial_window_success_ratio'))}`",
+        f"- primary_window_count: `{frozen.get('primary_window_count', 0)}`",
+        f"- partial_window_count: `{frozen.get('partial_window_count', 0)}`",
+        f"- hard_pass_window_ratio: `{_fmt(frozen.get('hard_pass_window_ratio'))}`",
+        f"- selection_mode_distribution: `{frozen.get('selection_mode_distribution', {})}`",
+        '',
+        '## Stitched Frozen OOS vs Baseline',
+        f"- stitched_oos_annual_return: `{_fmt(stitched.get('annual_return'))}`",
+        f"- baseline_annual_return: `{_fmt(baseline.get('annual_return'))}`",
+        f"- annual_return_delta: `{_fmt(stitched_vs_baseline.get('annual_return_delta'))}`",
+        f"- stitched_oos_max_drawdown: `{_fmt(stitched.get('max_drawdown'))}`",
+        f"- baseline_max_drawdown: `{_fmt(baseline.get('max_drawdown'))}`",
+        f"- drawdown_ratio_vs_baseline: `{_fmt(stitched_vs_baseline.get('drawdown_ratio_vs_baseline'))}`",
+        f"- stitched_oos_utility_total_score: `{_fmt(stitched.get('utility_total_score'))}`",
+        f"- baseline_utility_total_score: `{_fmt(baseline.get('utility_total_score'))}`",
+        f"- utility_delta_vs_baseline: `{_fmt(stitched_vs_baseline.get('utility_delta_vs_baseline'))}`",
+        f"- stitched_oos_upside_capture: `{_fmt(stitched.get('upside_capture'))}`",
+        '',
+        '## Default Full-Sample vs Baseline (Reference)',
+        f"- default_annual_return: `{_fmt(default.get('annual_return'))}`",
+        f"- baseline_annual_return: `{_fmt(baseline.get('annual_return'))}`",
+        f"- annual_return_delta: `{_fmt(default_vs_baseline.get('annual_return_delta'))}`",
+        f"- default_max_drawdown: `{_fmt(default.get('max_drawdown'))}`",
+        f"- baseline_max_drawdown: `{_fmt(baseline.get('max_drawdown'))}`",
+        f"- drawdown_ratio_vs_baseline: `{_fmt(default_vs_baseline.get('drawdown_ratio_vs_baseline'))}`",
+        f"- default_utility_total_score: `{_fmt(default.get('utility_total_score'))}`",
+        f"- baseline_utility_total_score: `{_fmt(baseline.get('utility_total_score'))}`",
+        f"- utility_delta_vs_baseline: `{_fmt(default_vs_baseline.get('utility_delta_vs_baseline'))}`",
+    ]
+    return '\n'.join(lines) + '\n'
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description='Generate real-data frozen walk-forward report for ChiNext 50 regime workflow.')
+    parser.add_argument('--pit-csv', '--data-csv', dest='pit_csv', type=str, required=True, help='Required CSV/parquet full PIT input keyed by date.')
+    parser.add_argument('--strict-data', action='store_true', help='Fail fast when blocking quality breaches are detected.')
+    parser.add_argument('--min-coverage', type=float, default=None, help='Override default minimum non-null coverage ratio.')
+    parser.add_argument('--candidates-json', type=str, default=None, help='Optional JSON file describing frozen-validation candidate set.')
+    parser.add_argument('--min-train-rows', type=int, default=None, help='Override minimum required rows for each training window.')
+    parser.add_argument('--min-test-rows', type=int, default=None, help='Override minimum required rows for each test window.')
+    parser.add_argument('--config', type=str, default=None, help='Optional config YAML path.')
+    parser.add_argument('--output-dir', type=str, default='outputs/real_walkforward_report', help='Directory for report artifacts.')
+    args = parser.parse_args()
+
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    config = load_config(args.config)
+    raw = load_full_pit_data(args.pit_csv)
+
+    strict_mode, min_coverage, critical_columns, blocking_columns, column_min_coverage = _resolve_data_quality_settings(
+        config,
+        strict_cli=args.strict_data,
+        min_coverage_cli=args.min_coverage,
+    )
+    quality_summary = evaluate_data_quality_gate(
+        raw,
+        strict=strict_mode,
+        critical_columns=critical_columns,
+        blocking_columns=blocking_columns,
+        default_min_coverage=min_coverage,
+        column_min_coverage=column_min_coverage,
+    )
+    with (output_dir / 'data_quality_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(quality_summary, fh, ensure_ascii=False, indent=2)
+    if quality_summary['blocking']:
+        failed_items = quality_summary.get('errors') or quality_summary['breaches']
+        breached = ', '.join(item['column'] for item in failed_items)
+        raise ValueError(f'Data quality gate failed in strict mode. Breached columns: {breached}')
+
+    config.setdefault('_runtime', {})['strict_feature_gate'] = strict_mode
+    candidates, min_train_rows, min_test_rows = _resolve_frozen_settings(
+        config,
+        candidates_json=args.candidates_json,
+        min_train_rows_cli=args.min_train_rows,
+        min_test_rows_cli=args.min_test_rows,
+    )
+    windows = _resolve_walkforward_windows(config, raw.index)
+    board, frozen_summary = run_frozen_walkforward(
+        raw=raw,
+        config=config,
+        windows=windows,
+        candidates=candidates,
+        min_train_rows=min_train_rows,
+        min_test_rows=min_test_rows,
+    )
+    candidate_overrides = {candidate.candidate_id: copy.deepcopy(candidate.overrides) for candidate in candidates}
+
+    window_success_rule = _resolve_window_success_rule(config)
+    board_enriched, window_diagnostics = _window_success_diagnostics(board, window_success_rule)
+
+    stitched_ledger = _build_stitched_frozen_oos_ledger(
+        raw=raw,
+        config=config,
+        board=board_enriched,
+        candidate_overrides=candidate_overrides,
+    )
+    stitched_export = stitched_ledger.copy()
+    stitched_export.index.name = 'date'
+    stitched_export.to_csv(output_dir / 'stitched_frozen_oos_ledger.csv')
+
+    _, _, default_metrics_raw = run_strategy_bundle(raw, config)
+    baseline_plan = _build_baseline_plan(raw)
+    _, baseline_metrics_raw = run_backtest(baseline_plan, config)
+
+    default_strategy_metrics = _normalize_metrics(dict(default_metrics_raw))
+    default_strategy_metrics['utility_total_score'] = float(utility_from_metrics(default_strategy_metrics))
+    default_strategy_metrics['utility_status'] = utility_status(default_strategy_metrics['utility_total_score'])
+
+    stitched_oos_metrics = _metrics_from_ledger(stitched_ledger, config)
+
+    baseline_metrics = _normalize_metrics(dict(baseline_metrics_raw))
+    baseline_metrics['utility_total_score'] = float(utility_from_metrics(baseline_metrics))
+    baseline_metrics['utility_status'] = utility_status(baseline_metrics['utility_total_score'])
+
+    stitched_vs_baseline = _comparison_against_baseline(stitched_oos_metrics, baseline_metrics)
+    default_vs_baseline = _comparison_against_baseline(default_strategy_metrics, baseline_metrics)
+
+    comparison = {
+        'stitched_oos_vs_baseline': stitched_vs_baseline,
+        'default_vs_baseline': default_vs_baseline,
+        # Legacy aliases remain mapped to stitched OOS branch.
+        'annual_return_delta': stitched_vs_baseline['annual_return_delta'],
+        'annual_return_delta_vs_baseline': stitched_vs_baseline['annual_return_delta_vs_baseline'],
+        'max_drawdown_delta': stitched_vs_baseline['max_drawdown_delta'],
+        'max_drawdown_delta_vs_baseline': stitched_vs_baseline['max_drawdown_delta_vs_baseline'],
+        'drawdown_ratio_vs_baseline': stitched_vs_baseline['drawdown_ratio_vs_baseline'],
+        'utility_delta_vs_baseline': stitched_vs_baseline['utility_delta_vs_baseline'],
+        'upside_capture': stitched_vs_baseline['upside_capture'],
+    }
+
+    summary = {
+        'input': {
+            'pit_path': str(args.pit_csv),
+            'row_count': int(len(raw)),
+            'date_start': raw.index.min().date().isoformat() if len(raw) else None,
+            'date_end': raw.index.max().date().isoformat() if len(raw) else None,
+        },
+        'frozen_walkforward': {
+            'total_windows': int(frozen_summary['total_windows']),
+            'processed_window_count': int(frozen_summary['processed_window_count']),
+            'skipped_window_count': int(frozen_summary['skipped_window_count']),
+            'positive_window_ratio': float(frozen_summary['positive_window_ratio']),
+            'primary_window_count': int(window_diagnostics['primary_window_count']),
+            'partial_window_count': int(window_diagnostics['partial_window_count']),
+            'primary_window_success_count': int(window_diagnostics['primary_window_success_count']),
+            'partial_window_success_count': int(window_diagnostics['partial_window_success_count']),
+            'primary_window_success_ratio': float(window_diagnostics['primary_window_success_ratio']),
+            'partial_window_success_ratio': float(window_diagnostics['partial_window_success_ratio']),
+            'max_primary_window_drawdown_ratio_vs_baseline': window_diagnostics['max_primary_window_drawdown_ratio_vs_baseline'],
+            'median_primary_window_upside_capture': window_diagnostics['median_primary_window_upside_capture'],
+            'window_success_rule': dict(window_diagnostics['window_success_rule']),
+            'selected_candidate_distribution': dict(frozen_summary['selected_candidate_distribution']),
+            'window_status_counts': dict(frozen_summary['window_status_counts']),
+            'selection_mode_distribution': dict(frozen_summary.get('selection_mode_distribution', {})),
+            'windows_with_hard_pass_candidate_count': int(
+                frozen_summary.get('windows_with_hard_pass_candidate_count', 0)
+            ),
+            'windows_without_hard_pass_candidate_count': int(
+                frozen_summary.get('windows_without_hard_pass_candidate_count', 0)
+            ),
+            'hard_pass_window_ratio': float(frozen_summary.get('hard_pass_window_ratio', 0.0)),
+            'candidate_selection': dict(frozen_summary.get('candidate_selection', {})),
+            'candidate_ids': list(frozen_summary['candidate_ids']),
+            'min_train_rows': int(frozen_summary['min_train_rows']),
+            'min_test_rows': int(frozen_summary['min_test_rows']),
+            'windows': _serialize_windows(windows),
+        },
+        'default_strategy_full_sample_metrics': default_strategy_metrics,
+        'stitched_frozen_oos_metrics': stitched_oos_metrics,
+        # Backward-compatible alias: old key now points to stitched OOS metrics.
+        'strategy_full_sample_metrics': stitched_oos_metrics,
+        'baseline_full_sample_metrics': baseline_metrics,
+        'comparison': comparison,
+    }
+
+    board_enriched.to_csv(output_dir / 'frozen_validation_board.csv', index=False)
+    with (output_dir / 'real_walkforward_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(summary, fh, ensure_ascii=False, indent=2)
+    (output_dir / 'real_walkforward_report.md').write_text(_build_report_markdown(summary), encoding='utf-8')
+
+
+if __name__ == '__main__':
+    main()

+ 226 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B1/test_real_walkforward_report_pipeline.py

@@ -0,0 +1,226 @@
+from __future__ import annotations
+
+import json
+import sys
+
+import pandas as pd
+import pytest
+
+from backtest.engine import compute_metrics
+import pipelines.real_walkforward_report as real_walkforward_report
+
+
+def _write_full_pit_csv(path, periods: int = 320, *, sparse_column: str | None = None) -> None:
+    dates = pd.bdate_range('2022-01-04', periods=periods)
+    base = pd.Series(range(periods), dtype=float)
+    df = pd.DataFrame(
+        {
+            'date': dates,
+            'open': 100.0 + base * 0.1,
+            'high': 101.0 + base * 0.1 + (base % 5) * 0.02,
+            'low': 99.0 + base * 0.1 - (base % 4) * 0.015,
+            'close': 100.5 + base * 0.1 + (base % 3) * 0.01,
+            'volume': 1_000_000.0 + base * 1000.0 + (base % 7) * 200.0,
+            'hs300_close': 4000.0 + base * 0.5,
+            'star50_close': 1200.0 + base * 0.2,
+            'csi1000_close': 5000.0 + base * 0.4,
+            'pct_constituents_above_20dma': 0.55 + (base % 10) * 0.01,
+            'pct_constituents_above_60dma': 0.50 + (base % 8) * 0.01,
+            'pct_new_high_20': 0.06 + (base % 5) * 0.002,
+            'pct_new_low_20': 0.07 + (base % 4) * 0.002,
+            'eq_weight_ret_5': -0.01 + (base % 7) * 0.002,
+            'weighted_ret_5': -0.008 + (base % 7) * 0.002 + (base % 3) * 0.0005,
+            'top3_contribution_5': 0.34 + (base % 6) * 0.004,
+            'top1_contribution_5': 0.11 + (base % 6) * 0.003,
+            'top10_contribution_5': 0.60 + (base % 6) * 0.004,
+            'sector_concentration_20': 0.20 + (base % 5) * 0.003 + (base % 3) * 0.0005,
+            'corr_spike_20': 0.05 + (base % 9) * 0.003,
+            'dispersion_20': 0.18 + (base % 8) * 0.004,
+        }
+    )
+    if sparse_column is not None:
+        df.loc[5:, sparse_column] = float('nan')
+    df.to_csv(path, index=False)
+
+
+def test_real_walkforward_report_generates_artifacts(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit.csv'
+    output_dir = tmp_path / 'report_output'
+    _write_full_pit_csv(data_path, periods=360)
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        ['real_walkforward_report.py', '--pit-csv', str(data_path), '--output-dir', str(output_dir)],
+    )
+    real_walkforward_report.main()
+
+    assert (output_dir / 'data_quality_summary.json').exists()
+    assert (output_dir / 'frozen_validation_board.csv').exists()
+    assert (output_dir / 'stitched_frozen_oos_ledger.csv').exists()
+    assert (output_dir / 'real_walkforward_summary.json').exists()
+    assert (output_dir / 'real_walkforward_report.md').exists()
+
+    summary = json.loads((output_dir / 'real_walkforward_summary.json').read_text(encoding='utf-8'))
+    assert 'default_strategy_full_sample_metrics' in summary
+    assert 'stitched_frozen_oos_metrics' in summary
+    assert 'strategy_full_sample_metrics' in summary
+    assert 'baseline_full_sample_metrics' in summary
+    assert 'comparison' in summary
+    assert 'selection_mode_distribution' in summary['frozen_walkforward']
+    assert 'hard_pass_window_ratio' in summary['frozen_walkforward']
+    assert 'primary_window_success_ratio' in summary['frozen_walkforward']
+    assert 'candidate_selection' in summary['frozen_walkforward']
+    assert 'stitched_oos_vs_baseline' in summary['comparison']
+    assert 'default_vs_baseline' in summary['comparison']
+    assert 'utility_delta_vs_baseline' in summary['comparison']
+    assert 'annual_return_delta_vs_baseline' in summary['comparison']
+    assert 'max_drawdown_delta_vs_baseline' in summary['comparison']
+    assert summary['comparison']['annual_return_delta'] == summary['comparison']['annual_return_delta_vs_baseline']
+    assert summary['comparison']['max_drawdown_delta'] == summary['comparison']['max_drawdown_delta_vs_baseline']
+    assert summary['strategy_full_sample_metrics'] == summary['stitched_frozen_oos_metrics']
+    assert summary['input']['row_count'] == 360
+
+
+def test_real_walkforward_report_strict_mode_blocks_on_core_breach(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit_sparse.csv'
+    output_dir = tmp_path / 'report_strict_fail'
+    _write_full_pit_csv(data_path, periods=180, sparse_column='pct_constituents_above_60dma')
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        [
+            'real_walkforward_report.py',
+            '--pit-csv',
+            str(data_path),
+            '--strict-data',
+            '--output-dir',
+            str(output_dir),
+        ],
+    )
+
+    with pytest.raises(ValueError, match='Data quality gate failed in strict mode'):
+        real_walkforward_report.main()
+    assert (output_dir / 'data_quality_summary.json').exists()
+
+
+def test_primary_ratio_excludes_partial_and_main_comparison_uses_stitched(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit_custom.csv'
+    output_dir = tmp_path / 'report_custom'
+    _write_full_pit_csv(data_path, periods=340)
+    raw = real_walkforward_report.load_full_pit_data(str(data_path))
+
+    dates = raw.index
+    row_primary = {
+        'status': 'ok',
+        'train_start': dates[0].date().isoformat(),
+        'train_end': dates[80].date().isoformat(),
+        'test_start': dates[81].date().isoformat(),
+        'test_end': dates[300].date().isoformat(),
+        'test_rows': 220,
+        'selected_candidate_id': 'baseline',
+        'selected_candidate_overrides': '{}',
+        'test_annual_return': 0.10,
+        'test_upside_capture': 0.30,
+        'test_max_drawdown': 0.20,
+        'test_benchmark_max_drawdown': 0.40,
+        'test_annual_turnover': 12.0,
+        'test_utility_total_score': 0.05,
+    }
+    row_partial = {
+        'status': 'ok',
+        'train_start': dates[0].date().isoformat(),
+        'train_end': dates[300].date().isoformat(),
+        'test_start': dates[301].date().isoformat(),
+        'test_end': dates[339].date().isoformat(),
+        'test_rows': 39,
+        'selected_candidate_id': 'pro_risk',
+        'selected_candidate_overrides': '{}',
+        'test_annual_return': -0.08,
+        'test_upside_capture': 0.10,
+        'test_max_drawdown': 0.25,
+        'test_benchmark_max_drawdown': 0.25,
+        'test_annual_turnover': 30.0,
+        'test_utility_total_score': -0.20,
+    }
+    fake_board = pd.DataFrame([row_primary, row_partial])
+    fake_summary = {
+        'total_windows': 2,
+        'processed_window_count': 2,
+        'skipped_window_count': 0,
+        'positive_window_ratio': 0.5,
+        'selected_candidate_distribution': {'baseline': 1, 'pro_risk': 1},
+        'window_status_counts': {'ok': 2},
+        'selection_mode_distribution': {'constraint_score': 2},
+        'windows_with_hard_pass_candidate_count': 2,
+        'windows_without_hard_pass_candidate_count': 0,
+        'hard_pass_window_ratio': 1.0,
+        'candidate_selection': {},
+        'candidate_ids': ['baseline', 'pro_risk'],
+        'min_train_rows': 120,
+        'min_test_rows': 40,
+    }
+
+    def fake_run_frozen_walkforward(*args, **kwargs):
+        return fake_board.copy(), dict(fake_summary)
+
+    def fake_run_strategy_bundle(df: pd.DataFrame, cfg: dict[str, object]):
+        candidate_id = str(cfg.get('_candidate_id', 'default'))
+        if candidate_id == 'baseline':
+            strategy_ret = 0.012
+            turnover = 0.03
+        elif candidate_id == 'pro_risk':
+            strategy_ret = -0.006
+            turnover = 0.20
+        else:
+            strategy_ret = 0.020
+            turnover = 0.04
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(strategy_ret, index=df.index, dtype=float),
+                'asset_exec_return': pd.Series(0.008, index=df.index, dtype=float),
+                'turnover': pd.Series(turnover, index=df.index, dtype=float),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return df.copy(), ledger, metrics
+
+    def fake_run_backtest(df: pd.DataFrame, cfg: dict[str, object]):
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(0.008, index=df.index, dtype=float),
+                'asset_exec_return': pd.Series(0.008, index=df.index, dtype=float),
+                'turnover': pd.Series(0.0, index=df.index, dtype=float),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return ledger, metrics
+
+    monkeypatch.setattr(real_walkforward_report, 'run_frozen_walkforward', fake_run_frozen_walkforward)
+    monkeypatch.setattr(real_walkforward_report, 'run_strategy_bundle', fake_run_strategy_bundle)
+    monkeypatch.setattr(real_walkforward_report, 'run_backtest', fake_run_backtest)
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        ['real_walkforward_report.py', '--pit-csv', str(data_path), '--output-dir', str(output_dir)],
+    )
+    real_walkforward_report.main()
+
+    summary = json.loads((output_dir / 'real_walkforward_summary.json').read_text(encoding='utf-8'))
+    frozen = summary['frozen_walkforward']
+    comparison = summary['comparison']
+    stitched_cmp = comparison['stitched_oos_vs_baseline']
+    default_cmp = comparison['default_vs_baseline']
+
+    assert frozen['primary_window_count'] == 1
+    assert frozen['partial_window_count'] == 1
+    assert frozen['primary_window_success_ratio'] == 1.0
+    assert frozen['partial_window_success_ratio'] == 0.0
+
+    assert comparison['annual_return_delta'] == stitched_cmp['annual_return_delta']
+    assert comparison['annual_return_delta'] != default_cmp['annual_return_delta']
+    assert summary['stitched_frozen_oos_metrics']['annual_return'] != summary['default_strategy_full_sample_metrics']['annual_return']
+
+    report_text = (output_dir / 'real_walkforward_report.md').read_text(encoding='utf-8')
+    assert 'Stitched Frozen OOS vs Baseline' in report_text

+ 586 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B2/frozen_walkforward.py

@@ -0,0 +1,586 @@
+from __future__ import annotations
+
+import copy
+import json
+from dataclasses import dataclass
+from typing import Any, Callable, Iterable, Mapping, Sequence
+
+import pandas as pd
+
+from backtest.engine import compute_metrics, run_backtest
+from backtest.utility import utility_from_metrics, utility_status
+from features.quality import enforce_feature_information_gate
+from backtest.walkforward import WindowSpec
+from features.pipeline import build_feature_table
+from model.policy import build_exposure_plan
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+
+
+@dataclass(frozen=True)
+class HypothesisCandidate:
+    candidate_id: str
+    overrides: dict[str, Any]
+
+
+DEFAULT_HYPOTHESIS_CANDIDATES: tuple[HypothesisCandidate, ...] = (
+    HypothesisCandidate(
+        candidate_id='defensive',
+        overrides={
+            'policy': {
+                'trend': 0.80,
+                'euphoric_late': 0.30,
+                'chop': 0.20,
+                'repair_rebound_base': 0.30,
+                'repair_rebound_max': 0.65,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.20,
+            },
+        },
+    ),
+    HypothesisCandidate(candidate_id='baseline', overrides={}),
+    HypothesisCandidate(
+        candidate_id='balanced_capture',
+        overrides={
+            'policy': {
+                'trend': 0.95,
+                'euphoric_late': 0.65,
+                'chop': 0.35,
+                'repair_rebound_base': 0.40,
+                'repair_rebound_max': 0.85,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.30,
+            },
+        },
+    ),
+    HypothesisCandidate(
+        candidate_id='pro_risk',
+        overrides={
+            'policy': {
+                'trend': 1.00,
+                'euphoric_late': 0.70,
+                'chop': 0.45,
+                'repair_rebound_base': 0.50,
+                'repair_rebound_max': 0.95,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.35,
+            },
+        },
+    ),
+)
+
+
+StrategyRunner = Callable[[pd.DataFrame, dict[str, Any]], tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]]
+
+
+def _deep_merge_dict(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
+    out = copy.deepcopy(dict(base))
+    for key, value in overrides.items():
+        if isinstance(value, Mapping) and isinstance(out.get(key), Mapping):
+            out[key] = _deep_merge_dict(dict(out[key]), value)
+        else:
+            out[key] = copy.deepcopy(value)
+    return out
+
+
+def _resolve_utility(metrics: Mapping[str, float]) -> tuple[float, str]:
+    utility_total_score = float(metrics.get('utility_total_score', utility_from_metrics(dict(metrics))))
+    utility_state = str(metrics.get('utility_status', utility_status(utility_total_score)))
+    return utility_total_score, utility_state
+
+
+def run_strategy_bundle(df: pd.DataFrame, config: dict[str, Any]) -> tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]:
+    featured = build_feature_table(df)
+    enforce_feature_information_gate(featured, config)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+    ledger, metrics = run_backtest(planned, config)
+
+    utility_total_score, utility_state = _resolve_utility(metrics)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return planned, ledger, out_metrics
+
+
+def normalize_hypothesis_candidates(raw_candidates: Iterable[Mapping[str, Any]] | None) -> list[HypothesisCandidate]:
+    if raw_candidates is None:
+        return [copy.deepcopy(candidate) for candidate in DEFAULT_HYPOTHESIS_CANDIDATES]
+
+    candidates: list[HypothesisCandidate] = []
+    for idx, item in enumerate(raw_candidates):
+        candidate_id = str(item.get('id', item.get('candidate_id', f'candidate_{idx + 1}'))).strip()
+        if not candidate_id:
+            raise ValueError(f'Candidate index {idx} is missing an id.')
+        overrides_raw = item.get('overrides', {})
+        if not isinstance(overrides_raw, Mapping):
+            raise ValueError(f'Candidate {candidate_id} overrides must be an object.')
+        candidates.append(HypothesisCandidate(candidate_id=candidate_id, overrides=dict(overrides_raw)))
+
+    if not candidates:
+        raise ValueError('At least one hypothesis candidate is required.')
+
+    ids = [candidate.candidate_id for candidate in candidates]
+    if len(set(ids)) != len(ids):
+        raise ValueError(f'Duplicate candidate ids found: {ids}')
+    return candidates
+
+
+def _candidate_config(base_config: Mapping[str, Any], candidate: HypothesisCandidate) -> dict[str, Any]:
+    merged = _deep_merge_dict(base_config, candidate.overrides)
+    merged['_candidate_id'] = candidate.candidate_id
+    return merged
+
+
+def _prefixed_metrics(prefix: str, metrics: Mapping[str, Any]) -> dict[str, Any]:
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        if isinstance(value, (int, float)) and not isinstance(value, bool):
+            out[f'{prefix}_{key}'] = float(value)
+        else:
+            out[f'{prefix}_{key}'] = value
+    return out
+
+
+def _compute_window_metrics(ledger: pd.DataFrame) -> dict[str, float]:
+    required_columns = {'strategy_return_net', 'asset_exec_return', 'turnover'}
+    if not required_columns.issubset(ledger.columns):
+        raise ValueError(f'Ledger is missing required columns: {sorted(required_columns - set(ledger.columns))}')
+    metrics = compute_metrics(
+        strategy_returns=ledger['strategy_return_net'],
+        benchmark_returns=ledger['asset_exec_return'],
+        turnover=ledger['turnover'],
+    )
+    utility_total_score, utility_state = _resolve_utility(metrics)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return out_metrics
+
+
+def _window_row_base(window: WindowSpec) -> dict[str, Any]:
+    return {
+        'train_start': window.train_start,
+        'train_end': window.train_end,
+        'test_start': window.test_start,
+        'test_end': window.test_end,
+    }
+
+
+def _clip(value: float, lower: float, upper: float) -> float:
+    return float(min(max(value, lower), upper))
+
+
+def _safe_float(value: Any, default: float = 0.0) -> float:
+    try:
+        return float(value)
+    except (TypeError, ValueError):
+        return float(default)
+
+
+def _resolve_candidate_selection_settings(config: Mapping[str, Any]) -> dict[str, Any]:
+    frozen_cfg = dict((config or {}).get('frozen_validation', {}))
+    cfg = dict(frozen_cfg.get('candidate_selection', {}))
+    return {
+        'use_hard_constraints': bool(cfg.get('use_hard_constraints', True)),
+        'upside_capture_min': float(cfg.get('upside_capture_min', 0.28)),
+        'max_drawdown_ratio_vs_benchmark': float(cfg.get('max_drawdown_ratio_vs_benchmark', 0.72)),
+        'annual_turnover_soft_max': float(cfg.get('annual_turnover_soft_max', 18.0)),
+        'annual_return_override_abs': float(cfg.get('annual_return_override_abs', 0.05)),
+        'annual_return_override_ratio': float(cfg.get('annual_return_override_ratio', 0.40)),
+        'return_ratio_weight': float(cfg.get('return_ratio_weight', 0.30)),
+        'upside_weight': float(cfg.get('upside_weight', 0.30)),
+        'drawdown_weight': float(cfg.get('drawdown_weight', 0.20)),
+        'sharpe_delta_weight': float(cfg.get('sharpe_delta_weight', 0.10)),
+        'stability_weight': float(cfg.get('stability_weight', 0.10)),
+        'turnover_penalty_per_unit': float(cfg.get('turnover_penalty_per_unit', 0.015)),
+        'score_cap': float(cfg.get('score_cap', 1.2)),
+        'upside_target': float(cfg.get('upside_target', 0.45)),
+        'drawdown_improvement_target': float(cfg.get('drawdown_improvement_target', 0.35)),
+        'sharpe_delta_shift': float(cfg.get('sharpe_delta_shift', 0.05)),
+        'sharpe_delta_scale': float(cfg.get('sharpe_delta_scale', 0.15)),
+        'turnover_penalty_start': float(cfg.get('turnover_penalty_start', 12.0)),
+        'utility_floor': float(cfg.get('utility_floor', -0.15)),
+        'utility_target': float(cfg.get('utility_target', 0.05)),
+        'fallback_mode': str(cfg.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower(),
+    }
+
+
+def _compute_selection_score(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    sharpe_delta = _safe_float(metrics.get('sharpe_delta'))
+    utility_total_score = _safe_float(metrics.get('utility_total_score'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+
+    score_cap = float(settings['score_cap'])
+    upside_target = max(float(settings['upside_target']), 1e-12)
+    drawdown_target = max(float(settings['drawdown_improvement_target']), 1e-12)
+    sharpe_scale = max(float(settings['sharpe_delta_scale']), 1e-12)
+
+    if benchmark_return > 0.05:
+        return_ratio = _clip(annual_return / benchmark_return, 0.0, score_cap)
+    else:
+        return_ratio = _clip(annual_return / 0.10, 0.0, score_cap)
+    upside_score = _clip((upside_capture - 0.15) / max(upside_target - 0.15, 1e-12), 0.0, score_cap)
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_improvement = (benchmark_max_drawdown - max_drawdown) / benchmark_max_drawdown
+    else:
+        drawdown_improvement = 0.0
+    drawdown_score = _clip(drawdown_improvement / drawdown_target, 0.0, score_cap)
+    sharpe_delta_score = _clip((sharpe_delta + float(settings['sharpe_delta_shift'])) / sharpe_scale, 0.0, score_cap)
+    stability_score = _clip(
+        (utility_total_score - float(settings['utility_floor']))
+        / max(float(settings['utility_target']) - float(settings['utility_floor']), 1e-12),
+        0.0,
+        score_cap,
+    )
+    turnover_penalty = max(0.0, annual_turnover - float(settings['turnover_penalty_start'])) * float(
+        settings['turnover_penalty_per_unit']
+    )
+
+    score = (
+        float(settings['return_ratio_weight']) * return_ratio
+        + float(settings['upside_weight']) * upside_score
+        + float(settings['drawdown_weight']) * drawdown_score
+        + float(settings['sharpe_delta_weight']) * sharpe_delta_score
+        + float(settings['stability_weight']) * stability_score
+        - turnover_penalty
+    )
+    return score, {
+        'return_ratio': return_ratio,
+        'upside_score': upside_score,
+        'drawdown_score': drawdown_score,
+        'sharpe_delta_score': sharpe_delta_score,
+        'stability_score': stability_score,
+        'turnover_penalty': turnover_penalty,
+    }
+
+
+def _evaluate_hard_constraints(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[bool, list[str]]:
+    reasons: list[str] = []
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    if upside_capture < float(settings['upside_capture_min']):
+        reasons.append('upside_capture_below_min')
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_ratio = max_drawdown / benchmark_max_drawdown
+        if drawdown_ratio > float(settings['max_drawdown_ratio_vs_benchmark']):
+            reasons.append('drawdown_ratio_above_max')
+
+    turnover_cap = float(settings['annual_turnover_soft_max'])
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+    if annual_turnover > turnover_cap and annual_return < return_override_threshold:
+        reasons.append('turnover_above_soft_max_without_return_override')
+
+    return len(reasons) == 0, reasons
+
+
+def _constraint_distance(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    upside_min = max(float(settings['upside_capture_min']), 1e-12)
+    drawdown_max = max(float(settings['max_drawdown_ratio_vs_benchmark']), 1e-12)
+    turnover_soft_max = max(float(settings['annual_turnover_soft_max']), 1e-12)
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+
+    upside_gap = max(0.0, upside_min - upside_capture) / upside_min
+    drawdown_ratio = (max_drawdown / benchmark_max_drawdown) if benchmark_max_drawdown > 1e-12 else 0.0
+    drawdown_gap = max(0.0, drawdown_ratio - drawdown_max) / drawdown_max
+
+    turnover_gap = 0.0
+    if annual_turnover > turnover_soft_max and annual_return < return_override_threshold:
+        turnover_gap = (annual_turnover - turnover_soft_max) / turnover_soft_max
+
+    violation_distance = 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap  # weighted sum of normalized gaps
+    return float(violation_distance), {
+        'upside_gap': float(upside_gap),
+        'drawdown_gap': float(drawdown_gap),
+        'turnover_gap': float(turnover_gap),
+    }
+
+
+def run_frozen_walkforward(
+    raw: pd.DataFrame,
+    config: Mapping[str, Any],
+    windows: Sequence[WindowSpec],
+    *,
+    candidates: Sequence[HypothesisCandidate] | None = None,
+    min_train_rows: int = 120,
+    min_test_rows: int = 40,
+    strategy_runner: StrategyRunner | None = None,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    if min_train_rows <= 0:
+        raise ValueError('min_train_rows must be positive.')
+    if min_test_rows <= 0:
+        raise ValueError('min_test_rows must be positive.')
+
+    runner = strategy_runner or run_strategy_bundle
+    candidate_list = list(DEFAULT_HYPOTHESIS_CANDIDATES if candidates is None else candidates)
+    if not candidate_list:
+        raise ValueError('At least one candidate is required for frozen walk-forward.')
+    selection_settings = _resolve_candidate_selection_settings(config)
+
+    rows: list[dict[str, Any]] = []
+
+    for window in windows:
+        train_slice = raw.loc[window.train_start:window.train_end].copy()
+        test_slice = raw.loc[window.test_start:window.test_end].copy()
+
+        row = _window_row_base(window)
+        row['train_rows'] = int(len(train_slice))
+        row['test_rows'] = int(len(test_slice))
+        row['candidate_count'] = int(len(candidate_list))
+
+        if len(train_slice) < min_train_rows:
+            row['status'] = 'skipped_insufficient_train'
+            rows.append(row)
+            continue
+        if len(test_slice) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        selected_candidate: HypothesisCandidate | None = None
+        selected_train_metrics: dict[str, float] | None = None
+        selected_train_utility = float('-inf')
+        selected_train_score = float('-inf')
+        selected_train_hard_pass = False
+        selected_train_constraint_failures: list[str] = []
+        selected_train_violation_distance = 0.0
+        selected_train_violation_components: dict[str, float] = {}
+        selection_mode = 'constraint_score'
+        candidate_evaluations: list[dict[str, Any]] = []
+
+        for candidate in candidate_list:
+            candidate_config = _candidate_config(config, candidate)
+            _, _, train_metrics_raw = runner(train_slice, candidate_config)
+            train_metrics = dict(train_metrics_raw)
+            utility_value, _ = _resolve_utility(train_metrics)
+            train_metrics['utility_total_score'] = utility_value
+            train_metrics['utility_status'] = utility_status(utility_value)
+            hard_pass, hard_fail_reasons = _evaluate_hard_constraints(train_metrics, selection_settings)
+            score_value, score_components = _compute_selection_score(train_metrics, selection_settings)
+            violation_distance, violation_components = _constraint_distance(train_metrics, selection_settings)
+            candidate_evaluations.append(
+                {
+                    'candidate': candidate,
+                    'metrics': train_metrics,
+                    'utility': utility_value,
+                    'hard_pass': hard_pass,
+                    'hard_fail_reasons': hard_fail_reasons,
+                    'selection_score': score_value,
+                    'selection_score_components': score_components,
+                    'violation_distance': violation_distance,
+                    'violation_components': violation_components,
+                }
+            )
+
+        use_hard_constraints = bool(selection_settings['use_hard_constraints'])
+        ranking_pool = (
+            [item for item in candidate_evaluations if item['hard_pass']]
+            if use_hard_constraints
+            else candidate_evaluations
+        )
+
+        if ranking_pool:
+            for item in ranking_pool:
+                score_value = float(item['selection_score'])
+                if score_value > selected_train_score:
+                    selected_train_score = score_value
+                    selected_candidate = item['candidate']
+                    selected_train_metrics = item['metrics']
+                    selected_train_utility = float(item['utility'])
+                    selected_train_hard_pass = bool(item['hard_pass'])
+                    selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                    selected_train_violation_distance = float(item['violation_distance'])
+                    selected_train_violation_components = dict(item['violation_components'])
+        else:
+            fallback_mode = str(selection_settings.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower()
+            if fallback_mode == 'closest_to_feasible_frontier':
+                selection_mode = 'frontier_fallback_no_hard_pass'
+                selected_fallback_score = float('-inf')
+                for item in candidate_evaluations:
+                    fallback_score = -float(item['violation_distance']) + 0.25 * float(item['selection_score'])  # smaller violation ranks first
+                    utility_value = float(item['utility'])
+                    if (
+                        fallback_score > selected_fallback_score
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) > selected_train_score
+                        )
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) == selected_train_score
+                            and utility_value > selected_train_utility
+                        )
+                    ):
+                        selected_fallback_score = fallback_score
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+            else:
+                selection_mode = 'utility_fallback_no_hard_pass'
+                for item in candidate_evaluations:
+                    utility_value = float(item['utility'])
+                    if utility_value > selected_train_utility:
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+
+        hard_pass_count = int(sum(1 for item in candidate_evaluations if bool(item['hard_pass'])))
+        ranking_brief = [
+            {
+                'candidate_id': item['candidate'].candidate_id,
+                'hard_pass': bool(item['hard_pass']),
+                'selection_score': float(item['selection_score']),
+                'train_utility_total_score': float(item['utility']),
+                'hard_fail_reasons': list(item['hard_fail_reasons']),
+                'violation_distance': float(item['violation_distance']),
+            }
+            for item in candidate_evaluations
+        ]
+        ranking_brief.sort(key=lambda x: (-x['hard_pass'], -x['selection_score'], -x['train_utility_total_score']))
+
+        if selected_candidate is None or selected_train_metrics is None:
+            row['status'] = 'skipped_no_candidate'
+            rows.append(row)
+            continue
+
+        combined_slice = raw.loc[window.train_start:window.test_end].copy()
+        candidate_config = _candidate_config(config, selected_candidate)
+        _, combined_ledger, _ = runner(combined_slice, candidate_config)
+        frozen_test_ledger = combined_ledger.loc[window.test_start:window.test_end].copy()
+
+        if len(frozen_test_ledger) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        test_metrics = _compute_window_metrics(frozen_test_ledger)
+
+        row.update(
+            {
+                'status': 'ok',
+                'selected_candidate_id': selected_candidate.candidate_id,
+                'selection_mode': selection_mode,
+                'train_candidate_hard_pass_count': hard_pass_count,
+                'train_candidate_total_count': int(len(candidate_evaluations)),
+                'selected_train_selection_score': float(selected_train_score),
+                'selected_train_hard_pass': bool(selected_train_hard_pass),
+                'selected_train_constraint_failures': json.dumps(
+                    selected_train_constraint_failures,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'selected_train_violation_distance': float(selected_train_violation_distance),
+                'selected_train_violation_components': json.dumps(
+                    selected_train_violation_components,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'train_candidate_rankings': json.dumps(ranking_brief, ensure_ascii=False, sort_keys=True),
+                'selected_candidate_overrides': json.dumps(
+                    selected_candidate.overrides,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+            }
+        )
+        row.update(_prefixed_metrics('train', selected_train_metrics))
+        row.update(_prefixed_metrics('test', test_metrics))
+        rows.append(row)
+
+    board = pd.DataFrame(rows)
+    if board.empty:
+        board = pd.DataFrame(columns=['status'])
+
+    ok_board = board[board['status'] == 'ok'].copy() if 'status' in board.columns else pd.DataFrame()
+    selected_distribution = (
+        ok_board['selected_candidate_id'].value_counts().to_dict() if 'selected_candidate_id' in ok_board.columns else {}
+    )
+    status_counts = board['status'].value_counts().to_dict() if 'status' in board.columns else {}
+    selection_mode_distribution = (
+        ok_board['selection_mode'].value_counts().to_dict() if not ok_board.empty and 'selection_mode' in ok_board.columns else {}
+    )
+    windows_with_hard_pass_candidate_count = (
+        int((ok_board['train_candidate_hard_pass_count'] > 0).sum())
+        if not ok_board.empty and 'train_candidate_hard_pass_count' in ok_board.columns
+        else 0
+    )
+    hard_pass_window_ratio = (
+        float(windows_with_hard_pass_candidate_count / len(ok_board))
+        if len(ok_board) > 0
+        else 0.0
+    )
+    positive_window_ratio = (
+        float((ok_board['test_utility_total_score'] > 0.0).mean())
+        if not ok_board.empty and 'test_utility_total_score' in ok_board.columns
+        else 0.0
+    )
+    fallback_distance_distribution = (
+        ok_board.loc[
+            ok_board['selection_mode'].isin({'frontier_fallback_no_hard_pass', 'utility_fallback_no_hard_pass'}),
+            'selected_train_violation_distance',
+        ]
+        .dropna()
+        .tolist()
+        if not ok_board.empty
+        and 'selection_mode' in ok_board.columns
+        and 'selected_train_violation_distance' in ok_board.columns
+        else []
+    )
+
+    summary = {
+        'total_windows': int(len(windows)),
+        'processed_window_count': int(len(ok_board)),
+        'skipped_window_count': int(max(len(windows) - len(ok_board), 0)),
+        'positive_window_ratio': positive_window_ratio,
+        'selected_candidate_distribution': selected_distribution,
+        'window_status_counts': status_counts,
+        'selection_mode_distribution': selection_mode_distribution,
+        'windows_with_hard_pass_candidate_count': windows_with_hard_pass_candidate_count,
+        'windows_without_hard_pass_candidate_count': int(max(len(ok_board) - windows_with_hard_pass_candidate_count, 0)),
+        'hard_pass_window_ratio': hard_pass_window_ratio,
+        'fallback_distance_distribution': [float(x) for x in fallback_distance_distribution],
+        'candidate_ids': [candidate.candidate_id for candidate in candidate_list],
+        'min_train_rows': int(min_train_rows),
+        'min_test_rows': int(min_test_rows),
+        'candidate_selection': selection_settings,
+    }
+    return board, summary
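Review note: the frontier-fallback branch above appears to compare candidates lexicographically on (fallback score, selection score, train utility) and to replace the incumbent only on a strict improvement, so the earliest candidate in the list wins an exact tie. The following is a minimal sketch of that ordering, not the module's code. It assumes plain dict records with hypothetical keys `fallback_score`, `selection_score`, and `utility`; how `fallback_score` is actually derived lies outside this hunk.

```python
from __future__ import annotations

from typing import Any


def pick_candidate(evaluations: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Return the best record by (fallback_score, selection_score, utility).

    Hypothetical helper for illustration only. A later record replaces the
    incumbent only when its key is strictly greater, so exact ties keep the
    earlier record and the result is deterministic for a fixed input order.
    """
    best: dict[str, Any] | None = None
    best_key: tuple[float, float, float] | None = None
    for item in evaluations:
        key = (
            float(item['fallback_score']),
            float(item['selection_score']),
            float(item['utility']),
        )
        if best_key is None or key > best_key:
            best, best_key = item, key
    return best


evaluations = [
    {'id': 'alpha', 'fallback_score': -0.1, 'selection_score': 0.5, 'utility': 0.02},
    {'id': 'beta', 'fallback_score': -0.1, 'selection_score': 0.5, 'utility': 0.02},
    {'id': 'gamma', 'fallback_score': -0.1, 'selection_score': 0.5, 'utility': 0.03},
]
tie_winner = pick_candidate(evaluations[:2])
overall_winner = pick_candidate(evaluations)
```

Tuple comparison keeps the three criteria in one place, which may be easier to audit than the nested `or`/`and` chain, at the cost of hiding the per-field assignments.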

+ 175 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B2/regime.yaml

@@ -0,0 +1,175 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1e-8
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    utility_floor: -0.15
+    utility_target: 0.05
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35
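Review note: each entry under `candidates` lists only the keys it changes, so the effective configuration for a candidate is presumably the base config with the overrides merged in key by key. The sketch below shows one way such a merge could work. `deep_merge` is a hypothetical helper written for this note; the project's own `_candidate_config` is not shown in this diff and may behave differently.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other value replaces the base
    value. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of the base values from regime.yaml above.
base_config = {
    'policy': {'trend': 0.90, 'euphoric_late': 0.60, 'chop': 0.30, 'risk_off': 0.00},
    'trading': {'max_daily_exposure_change': 0.30, 'exposure_ladder_step': 0.10},
}
# Subset of the `defensive` candidate overrides from regime.yaml above.
defensive_overrides = {
    'policy': {'trend': 0.80, 'euphoric_late': 0.30, 'chop': 0.20},
    'trading': {'max_daily_exposure_change': 0.20},
}
effective = deep_merge(base_config, defensive_overrides)
```

Under this reading, `risk_off` and `exposure_ladder_step` keep their base values for the `defensive` candidate because the overrides never mention them, and `baseline` (with `overrides: {}`) is the base config unchanged.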

+ 410 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B2/test_frozen_walkforward.py

@@ -0,0 +1,410 @@
+from __future__ import annotations
+
+from typing import Any
+
+import pandas as pd
+
+import backtest.frozen_walkforward as frozen_module
+from backtest.engine import compute_metrics
+from backtest.frozen_walkforward import HypothesisCandidate, run_frozen_walkforward
+from backtest.utility import utility_from_metrics
+from backtest.walkforward import WindowSpec
+
+
+def _raw_frame() -> pd.DataFrame:
+    idx = pd.bdate_range('2024-01-02', periods=12)
+    return pd.DataFrame({'dummy': 1.0}, index=idx)
+
+
+def test_train_selection_is_deterministic_under_tie() -> None:
+    raw = _raw_frame()
+    # Both candidates produce identical ledgers, so the winner must be decided by candidate order alone.
+    window = WindowSpec(
+        train_start=raw.index[0].date().isoformat(),
+        train_end=raw.index[5].date().isoformat(),
+        test_start=raw.index[6].date().isoformat(),
+        test_end=raw.index[9].date().isoformat(),
+    )
+
+    def tie_runner(df: pd.DataFrame, config: dict[str, Any]):
+        returns = pd.Series(0.01, index=df.index)
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': returns,
+                'asset_exec_return': pd.Series(0.008, index=df.index),
+                'turnover': pd.Series(0.0, index=df.index),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return df.copy(), ledger, metrics
+
+    candidates = [
+        HypothesisCandidate('alpha', {}),
+        HypothesisCandidate('beta', {}),
+    ]
+
+    board1, _ = run_frozen_walkforward(
+        raw=raw,
+        config={},
+        windows=[window],
+        candidates=candidates,
+        min_train_rows=3,
+        min_test_rows=2,
+        strategy_runner=tie_runner,
+    )
+    board2, _ = run_frozen_walkforward(
+        raw=raw,
+        config={},
+        windows=[window],
+        candidates=candidates,
+        min_train_rows=3,
+        min_test_rows=2,
+        strategy_runner=tie_runner,
+    )
+    assert board1.iloc[0]['status'] == 'ok'
+    assert board1.iloc[0]['selected_candidate_id'] == 'alpha'
+    assert board2.iloc[0]['selected_candidate_id'] == 'alpha'
+
+
+def test_test_window_uses_train_selected_candidate_without_reselect() -> None:
+    raw = _raw_frame()
+    split_date = raw.index[5]
+    window = WindowSpec(
+        train_start=raw.index[0].date().isoformat(),
+        train_end=split_date.date().isoformat(),
+        test_start=raw.index[6].date().isoformat(),
+        test_end=raw.index[10].date().isoformat(),
+    )
+
+    def phase_runner(df: pd.DataFrame, config: dict[str, Any]):
+        candidate_id = config.get('_candidate_id', '')
+        strategy_returns: list[float] = []
+        for ts in df.index:
+            if ts <= split_date:
+                strategy_returns.append(0.02 if candidate_id == 'defensive' else -0.02)
+            else:
+                strategy_returns.append(-0.01 if candidate_id == 'defensive' else 0.03)
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(strategy_returns, index=df.index, dtype=float),
+                'asset_exec_return': pd.Series(0.01, index=df.index, dtype=float),
+                'turnover': pd.Series(0.0, index=df.index, dtype=float),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return df.copy(), ledger, metrics
+
+    candidates = [
+        HypothesisCandidate('defensive', {}),
+        HypothesisCandidate('aggressive', {}),
+    ]
+    board, _ = run_frozen_walkforward(
+        raw=raw,
+        config={},
+        windows=[window],
+        candidates=candidates,
+        min_train_rows=3,
+        min_test_rows=2,
+        strategy_runner=phase_runner,
+    )
+
+    # Compute counterfactual aggressive test utility.
+    combined = raw.loc[window.train_start:window.test_end].copy()
+    _, aggressive_ledger, _ = phase_runner(combined, {'_candidate_id': 'aggressive'})
+    aggressive_test = aggressive_ledger.loc[window.test_start:window.test_end]
+    aggressive_metrics = compute_metrics(
+        aggressive_test['strategy_return_net'],
+        aggressive_test['asset_exec_return'],
+        aggressive_test['turnover'],
+    )
+    aggressive_utility = utility_from_metrics(aggressive_metrics)
+
+    _, defensive_ledger, _ = phase_runner(combined, {'_candidate_id': 'defensive'})
+    defensive_test = defensive_ledger.loc[window.test_start:window.test_end]
+    defensive_metrics = compute_metrics(
+        defensive_test['strategy_return_net'],
+        defensive_test['asset_exec_return'],
+        defensive_test['turnover'],
+    )
+    defensive_utility = utility_from_metrics(defensive_metrics)
+
+    assert board.iloc[0]['status'] == 'ok'
+    assert board.iloc[0]['selected_candidate_id'] == 'defensive'
+    assert aggressive_utility > defensive_utility
+    assert abs(float(board.iloc[0]['test_utility_total_score']) - defensive_utility) < 1e-12
+
+
+def test_window_is_marked_skipped_when_train_rows_insufficient() -> None:
+    raw = _raw_frame()
+    window = WindowSpec(
+        train_start=raw.index[0].date().isoformat(),
+        train_end=raw.index[3].date().isoformat(),
+        test_start=raw.index[4].date().isoformat(),
+        test_end=raw.index[6].date().isoformat(),
+    )
+
+    board, summary = run_frozen_walkforward(
+        raw=raw,
+        config={},
+        windows=[window],
+        candidates=[HypothesisCandidate('baseline', {})],
+        min_train_rows=999,
+        min_test_rows=2,
+        strategy_runner=lambda df, cfg: (
+            df.copy(),
+            pd.DataFrame(
+                {
+                    'strategy_return_net': pd.Series(0.0, index=df.index),
+                    'asset_exec_return': pd.Series(0.0, index=df.index),
+                    'turnover': pd.Series(0.0, index=df.index),
+                }
+            ),
+            {},
+        ),
+    )
+
+    assert board.iloc[0]['status'] == 'skipped_insufficient_train'
+    assert summary['processed_window_count'] == 0
+    assert summary['skipped_window_count'] == 1
+
+
+def test_hard_constraints_prefer_hard_pass_candidate_over_higher_utility_fail_candidate() -> None:
+    raw = _raw_frame()
+    window = WindowSpec(
+        train_start=raw.index[0].date().isoformat(),
+        train_end=raw.index[6].date().isoformat(),
+        test_start=raw.index[7].date().isoformat(),
+        test_end=raw.index[10].date().isoformat(),
+    )
+    config = {
+        'frozen_validation': {
+            'candidate_selection': {
+                'use_hard_constraints': True,
+                'upside_capture_min': 0.50,
+            }
+        }
+    }
+
+    def candidate_runner(df: pd.DataFrame, cfg: dict[str, Any]):
+        candidate_id = cfg.get('_candidate_id', '')
+        if candidate_id == 'high_utility_fail':
+            annual_return = 0.12
+            upside_capture = 0.20
+        else:
+            annual_return = 0.08
+            upside_capture = 0.70
+        metrics = {
+            'annual_return': annual_return,
+            'benchmark_return': 0.10,
+            'max_drawdown': 0.30,
+            'benchmark_max_drawdown': 0.60,
+            'sharpe_delta': 0.02,
+            'upside_capture': upside_capture,
+            'annual_turnover': 6.0,
+            'utility_total_score': annual_return,
+            'utility_status': 'positive_utility',
+        }
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(0.01, index=df.index),
+                'asset_exec_return': pd.Series(0.008, index=df.index),
+                'turnover': pd.Series(0.0, index=df.index),
+            }
+        )
+        return df.copy(), ledger, metrics
+
+    candidates = [
+        HypothesisCandidate('high_utility_fail', {}),
+        HypothesisCandidate('hard_pass', {}),
+    ]
+    board, summary = run_frozen_walkforward(
+        raw=raw,
+        config=config,
+        windows=[window],
+        candidates=candidates,
+        min_train_rows=3,
+        min_test_rows=2,
+        strategy_runner=candidate_runner,
+    )
+
+    assert board.iloc[0]['status'] == 'ok'
+    assert board.iloc[0]['selected_candidate_id'] == 'hard_pass'
+    assert board.iloc[0]['selection_mode'] == 'constraint_score'
+    assert int(board.iloc[0]['train_candidate_hard_pass_count']) == 1
+    assert summary['windows_with_hard_pass_candidate_count'] == 1
+    assert summary['selection_mode_distribution']['constraint_score'] == 1
+
+
+def test_selection_falls_back_to_closest_frontier_when_no_hard_pass_candidate() -> None:
+    raw = _raw_frame()
+    window = WindowSpec(
+        train_start=raw.index[0].date().isoformat(),
+        train_end=raw.index[6].date().isoformat(),
+        test_start=raw.index[7].date().isoformat(),
+        test_end=raw.index[10].date().isoformat(),
+    )
+    config = {
+        'frozen_validation': {
+            'candidate_selection': {
+                'use_hard_constraints': True,
+                'upside_capture_min': 0.30,
+                'max_drawdown_ratio_vs_benchmark': 0.72,
+                'fallback_mode': 'closest_to_feasible_frontier',
+            }
+        }
+    }
+
+    def candidate_runner(df: pd.DataFrame, cfg: dict[str, Any]):
+        candidate_id = cfg.get('_candidate_id', '')
+        if candidate_id == 'winner_by_utility':
+            utility = 0.20
+            upside_capture = 0.12
+            max_drawdown = 0.60
+        else:
+            utility = 0.04
+            upside_capture = 0.26
+            max_drawdown = 0.42
+        metrics = {
+            'annual_return': utility,
+            'benchmark_return': 0.10,
+            'max_drawdown': max_drawdown,
+            'benchmark_max_drawdown': 0.60,
+            'sharpe_delta': 0.01,
+            'upside_capture': upside_capture,
+            'annual_turnover': 5.0,
+            'utility_total_score': utility,
+            'utility_status': 'positive_utility',
+        }
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(0.01, index=df.index),
+                'asset_exec_return': pd.Series(0.008, index=df.index),
+                'turnover': pd.Series(0.0, index=df.index),
+            }
+        )
+        return df.copy(), ledger, metrics
+
+    candidates = [
+        HypothesisCandidate('winner_by_utility', {}),
+        HypothesisCandidate('closer_to_frontier', {}),
+    ]
+    board, summary = run_frozen_walkforward(
+        raw=raw,
+        config=config,
+        windows=[window],
+        candidates=candidates,
+        min_train_rows=3,
+        min_test_rows=2,
+        strategy_runner=candidate_runner,
+    )
+
+    assert board.iloc[0]['status'] == 'ok'
+    assert board.iloc[0]['selected_candidate_id'] == 'closer_to_frontier'
+    assert board.iloc[0]['selection_mode'] == 'frontier_fallback_no_hard_pass'
+    assert int(board.iloc[0]['train_candidate_hard_pass_count']) == 0
+    assert summary['windows_with_hard_pass_candidate_count'] == 0
+    assert summary['selection_mode_distribution']['frontier_fallback_no_hard_pass'] == 1
+
+
+def test_stability_score_is_non_binary_and_distinguishes_negative_utilities() -> None:
+    settings = frozen_module._resolve_candidate_selection_settings({})
+    base_metrics = {
+        'annual_return': 0.05,
+        'benchmark_return': 0.10,
+        'max_drawdown': 0.20,
+        'benchmark_max_drawdown': 0.50,
+        'sharpe_delta': 0.03,
+        'upside_capture': 0.32,
+        'annual_turnover': 8.0,
+    }
+
+    _, components_low = frozen_module._compute_selection_score(
+        {**base_metrics, 'utility_total_score': -0.14},
+        settings,
+    )
+    _, components_high = frozen_module._compute_selection_score(
+        {**base_metrics, 'utility_total_score': -0.05},
+        settings,
+    )
+
+    assert 0.0 < components_low['stability_score'] < components_high['stability_score']
+    assert components_high['stability_score'] < float(settings['score_cap'])
+
+
+def test_turnover_override_uses_max_abs_and_ratio_threshold() -> None:
+    settings = frozen_module._resolve_candidate_selection_settings({})
+
+    hard_pass, reasons = frozen_module._evaluate_hard_constraints(
+        {
+            'annual_return': 0.07,
+            'benchmark_return': 0.20,
+            'max_drawdown': 0.30,
+            'benchmark_max_drawdown': 0.60,
+            'upside_capture': 0.40,
+            'annual_turnover': 20.0,
+        },
+        settings,
+    )
+    assert not hard_pass
+    assert 'turnover_above_soft_max_without_return_override' in reasons
+
+    hard_pass, reasons = frozen_module._evaluate_hard_constraints(
+        {
+            'annual_return': 0.09,
+            'benchmark_return': 0.20,
+            'max_drawdown': 0.30,
+            'benchmark_max_drawdown': 0.60,
+            'upside_capture': 0.40,
+            'annual_turnover': 20.0,
+        },
+        settings,
+    )
+    assert hard_pass
+    assert 'turnover_above_soft_max_without_return_override' not in reasons
+
+    hard_pass, reasons = frozen_module._evaluate_hard_constraints(
+        {
+            'annual_return': 0.04,
+            'benchmark_return': 0.02,
+            'max_drawdown': 0.30,
+            'benchmark_max_drawdown': 0.60,
+            'upside_capture': 0.40,
+            'annual_turnover': 20.0,
+        },
+        settings,
+    )
+    assert not hard_pass
+    assert 'turnover_above_soft_max_without_return_override' in reasons
+
+
+def test_return_ratio_remains_bounded_when_benchmark_is_small_or_non_positive() -> None:
+    settings = frozen_module._resolve_candidate_selection_settings({})
+    score_cap = float(settings['score_cap'])
+    base_metrics = {
+        'max_drawdown': 0.25,
+        'benchmark_max_drawdown': 0.60,
+        'sharpe_delta': 0.02,
+        'upside_capture': 0.30,
+        'annual_turnover': 7.0,
+        'utility_total_score': -0.05,
+    }
+
+    _, components_non_positive = frozen_module._compute_selection_score(
+        {
+            **base_metrics,
+            'annual_return': 1.0,
+            'benchmark_return': 0.0,
+        },
+        settings,
+    )
+    _, components_small_positive = frozen_module._compute_selection_score(
+        {
+            **base_metrics,
+            'annual_return': 0.02,
+            'benchmark_return': 0.03,
+        },
+        settings,
+    )
+
+    assert components_non_positive['return_ratio'] == score_cap
+    assert 0.0 <= components_small_positive['return_ratio'] <= score_cap
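Review note: the three cases in `test_turnover_override_uses_max_abs_and_ratio_threshold` are consistent with a rule where turnover above the soft maximum is tolerated only if the annual return reaches the larger of an absolute floor and a fraction of the benchmark return. The sketch below reproduces that reading. It is inferred from the test and from the `regime.yaml` values (`annual_turnover_soft_max: 18.0`, `annual_return_override_abs: 0.05`, `annual_return_override_ratio: 0.40`); `_evaluate_hard_constraints` itself is not part of this hunk, the function name here is hypothetical, and whether the real comparison is strict or inclusive is an assumption.

```python
def turnover_override_ok(
    annual_return: float,
    benchmark_return: float,
    annual_turnover: float,
    soft_max: float = 18.0,
    override_abs: float = 0.05,
    override_ratio: float = 0.40,
) -> bool:
    """Hypothetical reading of the turnover soft-max rule.

    Turnover at or below `soft_max` always passes. Above it, the candidate
    passes only if its annual return reaches
    max(override_abs, override_ratio * benchmark_return).
    """
    if annual_turnover <= soft_max:
        return True
    threshold = max(override_abs, override_ratio * benchmark_return)
    # Assumption: inclusive comparison; the test values do not probe the boundary.
    return annual_return >= threshold


# The three test cases, all with annual_turnover = 20.0 (above the 18.0 soft max).
case_ratio_binds_fail = turnover_override_ok(0.07, 0.20, 20.0)  # threshold ~0.08
case_ratio_binds_pass = turnover_override_ok(0.09, 0.20, 20.0)  # threshold ~0.08
case_abs_binds_fail = turnover_override_ok(0.04, 0.02, 20.0)    # threshold 0.05
```

In the first two cases the ratio term (0.40 × 0.20) sets the threshold; in the third the benchmark is small, so the absolute floor of 0.05 takes over.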

+ 7 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B2/test_utility.py

@@ -0,0 +1,7 @@
+from backtest.utility import net_utility, utility_status
+
+
+def test_utility_status_net_based() -> None:
+    total = net_utility(0.2, 0.3, 0.8)
+    assert utility_status(total) == 'positive_utility'
+    assert utility_status(-0.01) == 'negative_utility'

+ 27 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B2/utility.py

@@ -0,0 +1,27 @@
+from __future__ import annotations
+
+
+def net_utility(
+    sharpe_delta: float,
+    drawdown_improvement: float,
+    upside_capture: float,
+    turnover_penalty: float = 0.0,
+) -> float:
+    return 0.45 * sharpe_delta + 0.40 * drawdown_improvement + 0.15 * (upside_capture - 0.75) - turnover_penalty
+
+
+def utility_status(total_utility: float) -> str:
+    return 'positive_utility' if total_utility > 0 else 'negative_utility'
+
+
+def utility_from_metrics(metrics: dict[str, float]) -> float:
+    sharpe_delta = float(metrics.get('sharpe_delta', 0.0))
+    drawdown_improvement = float(metrics.get('drawdown_improvement_ratio', 0.0))
+    upside_capture = float(metrics.get('upside_capture', 0.75))  # default 0.75 makes the upside term zero
+    turnover_penalty = 0.02 * max(0.0, float(metrics.get('annual_turnover', 0.0)) - 4.0)  # 0.02 per unit of annual turnover above 4.0
+    return net_utility(
+        sharpe_delta=sharpe_delta,
+        drawdown_improvement=drawdown_improvement,
+        upside_capture=upside_capture,
+        turnover_penalty=turnover_penalty,
+    )
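Review note: a worked example of the weights in `net_utility`, using the inputs from `test_utility_status_net_based` (sharpe delta 0.2, drawdown improvement 0.3, upside capture 0.8). The function body is copied from the file above so the block runs on its own; the turnover figure of 6.0 is an illustrative value, not one taken from the project.

```python
def net_utility(
    sharpe_delta: float,
    drawdown_improvement: float,
    upside_capture: float,
    turnover_penalty: float = 0.0,
) -> float:
    # Same formula as utility.py above: upside capture is scored relative to 0.75.
    return 0.45 * sharpe_delta + 0.40 * drawdown_improvement + 0.15 * (upside_capture - 0.75) - turnover_penalty


# 0.45 * 0.2 + 0.40 * 0.3 + 0.15 * (0.8 - 0.75) = 0.09 + 0.12 + 0.0075
total = net_utility(0.2, 0.3, 0.8)

# Turnover penalty as in utility_from_metrics: 0.02 per unit above 4.0.
# Illustrative annual turnover of 6.0 gives 0.02 * 2.0.
penalty = 0.02 * max(0.0, 6.0 - 4.0)
total_after_penalty = net_utility(0.2, 0.3, 0.8, turnover_penalty=penalty)
```

The sum comes to roughly 0.2175 before the penalty and roughly 0.1775 after it, so the test's `positive_utility` expectation holds with a wide margin.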

+ 95 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B3/state_machine.py

@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import pandas as pd
+
+
+@dataclass
+class StateConfig:
+    min_state_duration: int = 3
+    crash_override: bool = True
+
+
+REQUIRED_STATE_INPUTS: tuple[str, ...] = (
+    'trend_score',
+    'breadth_score',
+    'stress_score',
+    'crowding_score',
+    'down_hazard',
+    'repair_hazard',
+    'rebound_hazard',
+)
+
+
+def _row_is_ready(row: pd.Series) -> bool:
+    if not (bool(row.get('core_score_ready', False)) and bool(row.get('hazard_ready', False))):
+        return False
+    return all(pd.notna(row.get(column)) for column in REQUIRED_STATE_INPUTS)
+
+
+def _raw_state(row: pd.Series) -> str:
+    if row['down_hazard'] >= 0.62 or (row['stress_score'] >= 0.85 and row['trend_score'] <= -0.10):
+        return 'risk_off'
+    if row['repair_hazard'] >= 0.58 and row['stress_score'] <= 0.85 and row['d_stress'] <= 0.0:  # note: d_stress is read here but is not listed in REQUIRED_STATE_INPUTS
+        return 'repair'
+    if row['trend_score'] >= 0.45 and row['breadth_score'] >= -0.05 and row['stress_score'] <= 0.45:
+        if row['crowding_score'] >= 0.70 or row['rebound_hazard'] >= 0.68:
+            return 'euphoric_late'
+        return 'trend'
+    return 'chop'
+
+
+def run_state_machine(df: pd.DataFrame, config: dict[str, Any] | None = None) -> pd.DataFrame:
+    out = df.copy()
+    state_cfg = StateConfig(**((config or {}).get('state_machine', {})))
+
+    current_state = 'warmup'
+    days_in_state = 0
+    system_ready = False
+    active_days: list[int] = []
+    active_states: list[str] = []
+    proposed_states: list[str] = []
+
+    for ts, row in out.iterrows():
+        if not _row_is_ready(row):
+            if system_ready:
+                raise ValueError(f'invalid score/hazard after warmup at {pd.Timestamp(ts).date().isoformat()}')
+            proposal = 'warmup'
+            new_state = 'warmup'
+        else:
+            proposal = _raw_state(row)
+            system_ready = True
+
+            crash_override = state_cfg.crash_override and proposal == 'risk_off' and row['down_hazard'] >= 0.72
+
+            if current_state == 'warmup':
+                new_state = proposal
+            elif crash_override:
+                new_state = 'risk_off'
+            elif proposal == current_state:
+                new_state = current_state
+            elif days_in_state >= state_cfg.min_state_duration:
+                new_state = proposal
+            else:
+                new_state = current_state
+
+        proposed_states.append(proposal)
+
+        if new_state == current_state:
+            days_in_state += 1
+        else:
+            current_state = new_state
+            days_in_state = 1
+
+        active_states.append(current_state)
+        active_days.append(days_in_state)
+
+    out['proposed_state'] = proposed_states
+    out['state'] = active_states
+    out['days_in_state'] = active_days
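+    out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()  # running count of all risk_off rows so far (not reset per episode)
+    out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0  # zero on every row that is not risk_off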
+    out['days_since_breakout'] = (out['breakout_dist_120'].fillna(0.0) > 0.0).groupby((out['breakout_dist_120'].fillna(0.0) <= 0.0).cumsum()).cumsum()
+    return out
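Review note: the core of `run_state_machine` is a minimum-duration filter: once a state is active, a different proposal is accepted only after the current state has been held for `min_state_duration` days. The sketch below isolates that rule on a plain list of proposals. `apply_min_duration` is a hypothetical helper for illustration; it deliberately omits the readiness check, the `warmup` re-entry error, and the crash override that the real function applies.

```python
def apply_min_duration(proposals: list[str], min_state_duration: int = 3) -> list[str]:
    """Filter proposed states so that each active state is held long enough.

    Simplified from the loop in state_machine.py: no crash override and no
    handling of not-ready rows.
    """
    current_state = 'warmup'
    days_in_state = 0
    active_states: list[str] = []
    for proposal in proposals:
        if current_state == 'warmup':
            # First ready row: adopt the proposal immediately.
            new_state = proposal
        elif proposal == current_state:
            new_state = current_state
        elif days_in_state >= min_state_duration:
            # The current state has been held long enough to allow a switch.
            new_state = proposal
        else:
            new_state = current_state

        if new_state == current_state:
            days_in_state += 1
        else:
            current_state = new_state
            days_in_state = 1
        active_states.append(current_state)
    return active_states


filtered = apply_min_duration(['trend', 'chop', 'chop', 'chop', 'chop'])
```

With the default of 3, the first `chop` proposal arrives on day 1 of `trend` and is held back; the switch happens on the fourth row, once `trend` has been active for three days. In the real function a `risk_off` proposal with `down_hazard >= 0.72` bypasses this wait when `crash_override` is enabled.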

+ 129 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B3/test_policy.py

@@ -0,0 +1,129 @@
+import pandas as pd
+import pytest
+
+from data.sample_data import generate_synthetic_chinext50_data
+from config.loader import load_config
+from features.pipeline import build_feature_table
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+from model.policy import build_exposure_plan
+
+
+def test_exposure_plan_is_quantized_and_bounded() -> None:
+    config = load_config()
+    raw = generate_synthetic_chinext50_data(periods=400, seed=11)
+    featured = build_feature_table(raw)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+
+    ladder_step = float(config['trading']['exposure_ladder_step'])
+    max_step = float(config['trading']['max_daily_exposure_change'])
+    assert planned['target_exposure'].between(0.0, 1.0).all()
+    assert (planned['target_exposure'].diff().abs().dropna() <= max_step + 1e-12).all()
+    scaled = planned['target_exposure'].dropna() / ladder_step
+    assert ((scaled - scaled.round()).abs() < 1e-9).all()
+
+
+def test_state_machine_raises_if_invalid_signal_reappears_after_warmup() -> None:
+    idx = pd.date_range('2024-01-01', periods=3, freq='D')
+    df = pd.DataFrame(
+        {
+            'trend_score': [None, 0.6, 0.6],
+            'breadth_score': [None, 0.2, None],
+            'stress_score': [None, 0.1, 0.1],
+            'crowding_score': [None, 0.1, 0.1],
+            'repair_score': [None, 0.1, 0.1],
+            'down_hazard': [None, 0.3, 0.3],
+            'repair_hazard': [None, 0.4, 0.4],
+            'rebound_hazard': [None, 0.4, 0.4],
+            'd_trend': [0.0, 0.0, 0.0],
+            'd_breadth': [0.0, 0.0, 0.0],
+            'd_stress': [0.0, 0.0, 0.0],
+            'd_crowding': [0.0, 0.0, 0.0],
+            'score_acceleration': [0.0, 0.0, 0.0],
+            'core_score_ready': [False, True, False],
+            'hazard_ready': [False, True, True],
+            'breakout_dist_120': [0.0, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='invalid score/hazard after warmup'):
+        run_state_machine(df, load_config())
+
+
+def test_warmup_rows_have_zero_exposure() -> None:
+    idx = pd.date_range('2024-01-01', periods=2, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['warmup', 'chop'],
+            'days_in_state': [1, 1],
+            'down_hazard': [None, 0.3],
+            'repair_hazard': [None, 0.4],
+            'stress_score': [None, 0.1],
+            'trend_score': [None, 0.1],
+            'breadth_score': [None, 0.0],
+            'crowding_score': [None, 0.1],
+            'upper_wick_ratio_5': [None, 0.1],
+        },
+        index=idx,
+    )
+
+    planned = build_exposure_plan(df, load_config())
+    assert planned.loc[idx[0], 'target_exposure'] == 0.0
+    assert planned.loc[idx[0], 'veto_reason'] == 'warmup'
+
+
+def test_policy_does_not_swallow_invalid_trend_inputs() -> None:
+    idx = pd.date_range('2024-01-01', periods=1, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['trend'],
+            'days_in_state': [1],
+            'down_hazard': [0.3],
+            'repair_hazard': [0.4],
+            'stress_score': [0.1],
+            'trend_score': [0.5],
+            'breadth_score': [None],
+            'crowding_score': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='requires non-null "breadth_score"'):
+        build_exposure_plan(df, load_config())
+
+
+def test_candidate_overrides_produce_different_exposure_paths_under_finer_ladder() -> None:
+    idx = pd.date_range('2024-01-01', periods=4, freq='D')
+    state_df = pd.DataFrame(
+        {
+            'state': ['trend', 'trend', 'chop', 'repair'],
+            'days_in_state': [1, 2, 1, 2],
+            'down_hazard': [0.3, 0.3, 0.3, 0.3],
+            'repair_hazard': [0.6, 0.6, 0.6, 0.7],
+            'stress_score': [0.1, 0.1, 0.2, 0.2],
+            'trend_score': [0.5, 0.5, 0.1, 0.2],
+            'breadth_score': [0.0, 0.0, 0.0, 0.0],
+            'crowding_score': [0.1, 0.1, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    baseline_cfg = load_config()
+    pro_risk_cfg = load_config()
+    pro_risk_cfg['policy']['trend'] = 1.00
+    pro_risk_cfg['policy']['euphoric_late'] = 0.65
+    pro_risk_cfg['policy']['chop'] = 0.35
+    pro_risk_cfg['policy']['repair_rebound_base'] = 0.45
+    pro_risk_cfg['policy']['repair_rebound_max'] = 0.95
+    pro_risk_cfg['trading']['max_daily_exposure_change'] = 0.30
+
+    baseline = build_exposure_plan(state_df, baseline_cfg)
+    pro_risk = build_exposure_plan(state_df, pro_risk_cfg)
+
+    assert not baseline['target_exposure'].equals(pro_risk['target_exposure'])
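
Review note: `test_exposure_plan_is_quantized_and_bounded` checks three properties of `target_exposure`: it stays in [0, 1], it moves by at most `max_daily_exposure_change` per day, and every value is a multiple of `exposure_ladder_step`. `model/policy.py` is not part of this hunk, so the sketch below is not the project's implementation. It is a hypothetical stand-alone helper (`plan_exposure` and its arguments are illustrative names) showing one way those three properties can hold together, by stepping in whole ladder rungs.

```python
import math


def plan_exposure(desired, ladder_step=0.10, max_step=0.30, start=0.0):
    """Hypothetical helper: walk toward each desired exposure on a ladder.

    All arithmetic is done in integer rungs, so every output is an exact
    integer multiple of ladder_step (up to float rounding).
    """
    # Whole rungs allowed per day; the small epsilon guards against
    # 0.30 / 0.10 evaluating to 2.9999999999999996.
    max_rungs = int(math.floor(max_step / ladder_step + 1e-9))
    top_rung = int(round(1.0 / ladder_step))
    current = int(round(start / ladder_step))
    path = []
    for target in desired:
        clipped = min(max(target, 0.0), 1.0)
        target_rung = int(round(clipped / ladder_step))
        # Rate limit: never move more than max_rungs in one day.
        move = max(-max_rungs, min(max_rungs, target_rung - current))
        current = min(max(current + move, 0), top_rung)
        path.append(current * ladder_step)
    return path


# Desired exposures loosely follow the policy levels used in this project
# (trend, trend, chop, risk_off); the sequence itself is made up.
path = plan_exposure([0.90, 0.90, 0.30, 0.00])
print(path)
```

With a 0.10 ladder and a 0.30 daily cap the path needs several days to reach a 0.90 target, which is why the test compares `diff().abs()` against `max_step` with a small tolerance rather than expecting the target to be hit immediately.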

+ 178 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/regime.yaml

@@ -0,0 +1,178 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1e-8
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35
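
Review note: each entry under `frozen_validation.candidates` lists only the keys it changes, so keys it omits (for example `policy.risk_off`) are expected to come from the base sections above. The sketch below assumes a recursive dictionary merge, in the spirit of the `_deep_merge_dict` helper in `frozen_walkforward.py` from this same commit. `deep_merge` is an illustrative name, and the `base` dictionary copies only a few values from this YAML by hand rather than loading the file.

```python
import copy
from collections.abc import Mapping


def deep_merge(base, overrides):
    """Return a new dict: overrides layered on base, nested dicts merged key by key."""
    merged = copy.deepcopy(dict(base))
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hand-copied subset of the base `policy` and `trading` sections.
base = {
    'policy': {
        'trend': 0.90, 'euphoric_late': 0.60, 'chop': 0.30, 'risk_off': 0.00,
        'repair_rebound_base': 0.35, 'repair_rebound_max': 0.80,
    },
    'trading': {'exposure_ladder_step': 0.10, 'max_daily_exposure_change': 0.30},
}

# The `defensive` candidate's overrides, as listed in the YAML.
defensive = {
    'policy': {
        'trend': 0.80, 'euphoric_late': 0.30, 'chop': 0.20,
        'repair_rebound_base': 0.30, 'repair_rebound_max': 0.65,
    },
    'trading': {'max_daily_exposure_change': 0.20},
}

effective = deep_merge(base, defensive)
print(effective['policy'])
print(effective['trading'])
```

Under this merge the `baseline` candidate, whose `overrides` is `{}`, resolves to the base config unchanged.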

+ 95 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/state_machine.py

@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import pandas as pd
+
+
+@dataclass
+class StateConfig:
+    min_state_duration: int = 3
+    crash_override: bool = True
+
+
+REQUIRED_STATE_INPUTS: tuple[str, ...] = (
+    'trend_score',
+    'breadth_score',
+    'stress_score',
+    'crowding_score',
+    'down_hazard',
+    'repair_hazard',
+    'rebound_hazard',
+)
+
+
+def _row_is_ready(row: pd.Series) -> bool:
+    if not (bool(row.get('core_score_ready', False)) and bool(row.get('hazard_ready', False))):
+        return False
+    return all(pd.notna(row.get(column)) for column in REQUIRED_STATE_INPUTS)
+
+
+def _raw_state(row: pd.Series) -> str:
+    if row['down_hazard'] >= 0.62 or (row['stress_score'] >= 0.85 and row['trend_score'] <= -0.10):
+        return 'risk_off'
+    if row['repair_hazard'] >= 0.58 and row['stress_score'] <= 0.85 and row['d_stress'] <= 0.0:
+        return 'repair'
+    if row['trend_score'] >= 0.45 and row['breadth_score'] >= -0.05 and row['stress_score'] <= 0.45:
+        if row['crowding_score'] >= 0.70 or row['rebound_hazard'] >= 0.68:
+            return 'euphoric_late'
+        return 'trend'
+    return 'chop'
+
+
+def run_state_machine(df: pd.DataFrame, config: dict[str, Any] | None = None) -> pd.DataFrame:
+    out = df.copy()
+    state_cfg = StateConfig(**((config or {}).get('state_machine', {})))
+
+    current_state = 'warmup'
+    days_in_state = 0
+    system_ready = False
+    active_days: list[int] = []
+    active_states: list[str] = []
+    proposed_states: list[str] = []
+
+    for ts, row in out.iterrows():
+        if not _row_is_ready(row):
+            if system_ready:
+                raise ValueError(f'invalid score/hazard after warmup at {pd.Timestamp(ts).date().isoformat()}')
+            proposal = 'warmup'
+            new_state = 'warmup'
+        else:
+            proposal = _raw_state(row)
+            system_ready = True
+
+            crash_override = state_cfg.crash_override and proposal == 'risk_off' and row['down_hazard'] >= 0.72
+
+            if current_state == 'warmup':
+                new_state = proposal
+            elif crash_override:
+                new_state = 'risk_off'
+            elif proposal == current_state:
+                new_state = current_state
+            elif days_in_state >= state_cfg.min_state_duration:
+                new_state = proposal
+            else:
+                new_state = current_state
+
+        proposed_states.append(proposal)
+
+        if new_state == current_state:
+            days_in_state += 1
+        else:
+            current_state = new_state
+            days_in_state = 1
+
+        active_states.append(current_state)
+        active_days.append(days_in_state)
+
+    out['proposed_state'] = proposed_states
+    out['state'] = active_states
+    out['days_in_state'] = active_days
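+    out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()  # running total of risk_off rows across all episodes, not a per-episode streak
+    out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0  # zeroed on every row outside risk_off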
+    out['days_since_breakout'] = (out['breakout_dist_120'].fillna(0.0) > 0.0).groupby((out['breakout_dist_120'].fillna(0.0) <= 0.0).cumsum()).cumsum()
+    return out
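
Review note: the transition rules inside `run_state_machine` reduce to a small hysteresis loop: leave warmup on the first ready row, let a crash override jump straight to `risk_off`, and otherwise hold the current state until it has lasted `min_state_duration` days. The sketch below is a simplified pure-Python restatement of that branch order, not a replacement for the function. `apply_min_duration` and `crash_flags` are illustrative names, with `crash_flags` standing in for the `down_hazard >= 0.72` condition, and the readiness and warmup validation are left out.

```python
def apply_min_duration(proposals, crash_flags, min_state_duration=3):
    """Simplified restatement of the state machine's transition branch order."""
    current, days = 'warmup', 0
    states, durations = [], []
    for proposal, crash in zip(proposals, crash_flags):
        if current == 'warmup':
            new_state = proposal            # first ready row is adopted as-is
        elif crash and proposal == 'risk_off':
            new_state = 'risk_off'          # crash override ignores the dwell time
        elif proposal == current or days < min_state_duration:
            new_state = current             # hold until the dwell time is served
        else:
            new_state = proposal
        if new_state == current:
            days += 1
        else:
            current, days = new_state, 1
        states.append(current)
        durations.append(days)
    return states, durations


proposals = ['trend', 'chop', 'chop', 'chop', 'risk_off', 'risk_off']
crash_flags = [False, False, False, False, False, True]
states, durations = apply_min_duration(proposals, crash_flags)
print(list(zip(states, durations)))
```

In this made-up sequence the first two `chop` proposals are ignored because `trend` has not yet lasted three days, while the final `risk_off` proposal takes effect on the second day of `chop` because the crash flag is set.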

+ 129 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B3_reland/test_policy.py

@@ -0,0 +1,129 @@
+import pandas as pd
+import pytest
+
+from data.sample_data import generate_synthetic_chinext50_data
+from config.loader import load_config
+from features.pipeline import build_feature_table
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+from model.policy import build_exposure_plan
+
+
+def test_exposure_plan_is_quantized_and_bounded() -> None:
+    config = load_config()
+    raw = generate_synthetic_chinext50_data(periods=400, seed=11)
+    featured = build_feature_table(raw)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+
+    ladder_step = float(config['trading']['exposure_ladder_step'])
+    max_step = float(config['trading']['max_daily_exposure_change'])
+    assert planned['target_exposure'].between(0.0, 1.0).all()
+    assert (planned['target_exposure'].diff().abs().dropna() <= max_step + 1e-12).all()
+    scaled = planned['target_exposure'].dropna() / ladder_step
+    assert ((scaled - scaled.round()).abs() < 1e-9).all()
+
+
+def test_state_machine_raises_if_invalid_signal_reappears_after_warmup() -> None:
+    idx = pd.date_range('2024-01-01', periods=3, freq='D')
+    df = pd.DataFrame(
+        {
+            'trend_score': [None, 0.6, 0.6],
+            'breadth_score': [None, 0.2, None],
+            'stress_score': [None, 0.1, 0.1],
+            'crowding_score': [None, 0.1, 0.1],
+            'repair_score': [None, 0.1, 0.1],
+            'down_hazard': [None, 0.3, 0.3],
+            'repair_hazard': [None, 0.4, 0.4],
+            'rebound_hazard': [None, 0.4, 0.4],
+            'd_trend': [0.0, 0.0, 0.0],
+            'd_breadth': [0.0, 0.0, 0.0],
+            'd_stress': [0.0, 0.0, 0.0],
+            'd_crowding': [0.0, 0.0, 0.0],
+            'score_acceleration': [0.0, 0.0, 0.0],
+            'core_score_ready': [False, True, False],
+            'hazard_ready': [False, True, True],
+            'breakout_dist_120': [0.0, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='invalid score/hazard after warmup'):
+        run_state_machine(df, load_config())
+
+
+def test_warmup_rows_have_zero_exposure() -> None:
+    idx = pd.date_range('2024-01-01', periods=2, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['warmup', 'chop'],
+            'days_in_state': [1, 1],
+            'down_hazard': [None, 0.3],
+            'repair_hazard': [None, 0.4],
+            'stress_score': [None, 0.1],
+            'trend_score': [None, 0.1],
+            'breadth_score': [None, 0.0],
+            'crowding_score': [None, 0.1],
+            'upper_wick_ratio_5': [None, 0.1],
+        },
+        index=idx,
+    )
+
+    planned = build_exposure_plan(df, load_config())
+    assert planned.loc[idx[0], 'target_exposure'] == 0.0
+    assert planned.loc[idx[0], 'veto_reason'] == 'warmup'
+
+
+def test_policy_does_not_swallow_invalid_trend_inputs() -> None:
+    idx = pd.date_range('2024-01-01', periods=1, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['trend'],
+            'days_in_state': [1],
+            'down_hazard': [0.3],
+            'repair_hazard': [0.4],
+            'stress_score': [0.1],
+            'trend_score': [0.5],
+            'breadth_score': [None],
+            'crowding_score': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='requires non-null "breadth_score"'):
+        build_exposure_plan(df, load_config())
+
+
+def test_candidate_overrides_produce_different_exposure_paths_under_finer_ladder() -> None:
+    idx = pd.date_range('2024-01-01', periods=4, freq='D')
+    state_df = pd.DataFrame(
+        {
+            'state': ['trend', 'trend', 'chop', 'repair'],
+            'days_in_state': [1, 2, 1, 2],
+            'down_hazard': [0.3, 0.3, 0.3, 0.3],
+            'repair_hazard': [0.6, 0.6, 0.6, 0.7],
+            'stress_score': [0.1, 0.1, 0.2, 0.2],
+            'trend_score': [0.5, 0.5, 0.1, 0.2],
+            'breadth_score': [0.0, 0.0, 0.0, 0.0],
+            'crowding_score': [0.1, 0.1, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    baseline_cfg = load_config()
+    pro_risk_cfg = load_config()
+    pro_risk_cfg['policy']['trend'] = 1.00
+    pro_risk_cfg['policy']['euphoric_late'] = 0.65
+    pro_risk_cfg['policy']['chop'] = 0.35
+    pro_risk_cfg['policy']['repair_rebound_base'] = 0.45
+    pro_risk_cfg['policy']['repair_rebound_max'] = 0.95
+    pro_risk_cfg['trading']['max_daily_exposure_change'] = 0.30
+
+    baseline = build_exposure_plan(state_df, baseline_cfg)
+    pro_risk = build_exposure_plan(state_df, pro_risk_cfg)
+
+    assert not baseline['target_exposure'].equals(pro_risk['target_exposure'])

+ 610 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B4/frozen_walkforward.py

@@ -0,0 +1,610 @@
+from __future__ import annotations
+
+import copy
+import json
+from dataclasses import dataclass
+from typing import Any, Callable, Iterable, Mapping, Sequence
+
+import pandas as pd
+
+from backtest.engine import compute_metrics, run_backtest
+from backtest.utility import core_utility, utility_from_metrics, utility_status
+from backtest.walkforward import WindowSpec
+from features.pipeline import build_feature_table
+from features.quality import enforce_feature_information_gate
+from model.policy import build_exposure_plan
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+
+
+@dataclass(frozen=True)
+class HypothesisCandidate:
+    candidate_id: str
+    overrides: dict[str, Any]
+
+
+DEFAULT_HYPOTHESIS_CANDIDATES: tuple[HypothesisCandidate, ...] = (
+    HypothesisCandidate(
+        candidate_id='defensive',
+        overrides={
+            'policy': {
+                'trend': 0.80,
+                'euphoric_late': 0.30,
+                'chop': 0.20,
+                'repair_rebound_base': 0.30,
+                'repair_rebound_max': 0.65,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.20,
+            },
+        },
+    ),
+    HypothesisCandidate(candidate_id='baseline', overrides={}),
+    HypothesisCandidate(
+        candidate_id='balanced_capture',
+        overrides={
+            'policy': {
+                'trend': 0.95,
+                'euphoric_late': 0.65,
+                'chop': 0.35,
+                'repair_rebound_base': 0.40,
+                'repair_rebound_max': 0.85,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.30,
+            },
+        },
+    ),
+    HypothesisCandidate(
+        candidate_id='pro_risk',
+        overrides={
+            'policy': {
+                'trend': 1.00,
+                'euphoric_late': 0.70,
+                'chop': 0.45,
+                'repair_rebound_base': 0.50,
+                'repair_rebound_max': 0.95,
+            },
+            'trading': {
+                'max_daily_exposure_change': 0.35,
+            },
+        },
+    ),
+)
+
+
+StrategyRunner = Callable[[pd.DataFrame, dict[str, Any]], tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]]
+
+
+def _deep_merge_dict(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
+    out = copy.deepcopy(dict(base))
+    for key, value in overrides.items():
+        if isinstance(value, Mapping) and isinstance(out.get(key), Mapping):
+            out[key] = _deep_merge_dict(dict(out[key]), value)
+        else:
+            out[key] = copy.deepcopy(value)
+    return out
+
+
+def _resolve_utility(metrics: Mapping[str, float], config: Mapping[str, Any] | None = None) -> tuple[float, str]:
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    utility_total_score = float(
+        metrics.get(
+            'utility_total_score',
+            utility_from_metrics(
+                dict(metrics),
+                upside_target=float(evaluation_cfg.get('utility_upside_target', 0.55)),
+                turnover_penalty_start=float(evaluation_cfg.get('utility_turnover_penalty_start', 8.0)),
+                turnover_penalty_rate=float(evaluation_cfg.get('utility_turnover_penalty_rate', 0.010)),
+            ),
+        )
+    )
+    utility_state = str(metrics.get('utility_status', utility_status(utility_total_score)))
+    return utility_total_score, utility_state
+
+
+def run_strategy_bundle(df: pd.DataFrame, config: dict[str, Any]) -> tuple[pd.DataFrame, pd.DataFrame, dict[str, float]]:
+    featured = build_feature_table(df)
+    enforce_feature_information_gate(featured, config)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+    ledger, metrics = run_backtest(planned, config)
+
+    utility_total_score, utility_state = _resolve_utility(metrics, config)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return planned, ledger, out_metrics
+
+
+def normalize_hypothesis_candidates(raw_candidates: Iterable[Mapping[str, Any]] | None) -> list[HypothesisCandidate]:
+    if raw_candidates is None:
+        return [copy.deepcopy(candidate) for candidate in DEFAULT_HYPOTHESIS_CANDIDATES]
+
+    candidates: list[HypothesisCandidate] = []
+    for idx, item in enumerate(raw_candidates):
+        candidate_id = str(item.get('id', item.get('candidate_id', f'candidate_{idx + 1}'))).strip()
+        if not candidate_id:
+            raise ValueError(f'Candidate index {idx} is missing an id.')
+        overrides_raw = item.get('overrides', {})
+        if not isinstance(overrides_raw, Mapping):
+            raise ValueError(f'Candidate {candidate_id} overrides must be an object.')
+        candidates.append(HypothesisCandidate(candidate_id=candidate_id, overrides=dict(overrides_raw)))
+
+    if not candidates:
+        raise ValueError('At least one hypothesis candidate is required.')
+
+    ids = [candidate.candidate_id for candidate in candidates]
+    if len(set(ids)) != len(ids):
+        raise ValueError(f'Duplicate candidate ids found: {ids}')
+    return candidates
+
+
+def _candidate_config(base_config: Mapping[str, Any], candidate: HypothesisCandidate) -> dict[str, Any]:
+    merged = _deep_merge_dict(base_config, candidate.overrides)
+    merged['_candidate_id'] = candidate.candidate_id
+    return merged
+
+
+def _prefixed_metrics(prefix: str, metrics: Mapping[str, Any]) -> dict[str, Any]:
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        if isinstance(value, (int, float)):
+            out[f'{prefix}_{key}'] = float(value)
+        else:
+            out[f'{prefix}_{key}'] = value
+    return out
+
+
+def _compute_window_metrics(ledger: pd.DataFrame, config: Mapping[str, Any] | None = None) -> dict[str, float]:
+    required_columns = {'strategy_return_net', 'asset_exec_return', 'turnover'}
+    if not required_columns.issubset(ledger.columns):
+        raise ValueError(f'Ledger is missing required columns: {sorted(required_columns - set(ledger.columns))}')
+    metrics = compute_metrics(
+        strategy_returns=ledger['strategy_return_net'],
+        benchmark_returns=ledger['asset_exec_return'],
+        turnover=ledger['turnover'],
+    )
+    utility_total_score, utility_state = _resolve_utility(metrics, config)
+    out_metrics = dict(metrics)
+    out_metrics['utility_total_score'] = utility_total_score
+    out_metrics['utility_status'] = utility_state
+    return out_metrics
+
+
+def _window_row_base(window: WindowSpec) -> dict[str, Any]:
+    return {
+        'train_start': window.train_start,
+        'train_end': window.train_end,
+        'test_start': window.test_start,
+        'test_end': window.test_end,
+    }
+
+
+def _clip(value: float, lower: float, upper: float) -> float:
+    return float(min(max(value, lower), upper))
+
+
+def _safe_float(value: Any, default: float = 0.0) -> float:
+    try:
+        return float(value)
+    except (TypeError, ValueError):
+        return float(default)
+
+
+def _resolve_candidate_selection_settings(config: Mapping[str, Any]) -> dict[str, Any]:
+    frozen_cfg = dict((config or {}).get('frozen_validation', {}))
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    cfg = dict(frozen_cfg.get('candidate_selection', {}))
+    return {
+        'use_hard_constraints': bool(cfg.get('use_hard_constraints', True)),
+        'upside_capture_min': float(cfg.get('upside_capture_min', 0.28)),
+        'max_drawdown_ratio_vs_benchmark': float(cfg.get('max_drawdown_ratio_vs_benchmark', 0.72)),
+        'annual_turnover_soft_max': float(cfg.get('annual_turnover_soft_max', 18.0)),
+        'annual_return_override_abs': float(cfg.get('annual_return_override_abs', 0.05)),
+        'annual_return_override_ratio': float(cfg.get('annual_return_override_ratio', 0.40)),
+        'return_ratio_weight': float(cfg.get('return_ratio_weight', 0.30)),
+        'upside_weight': float(cfg.get('upside_weight', 0.30)),
+        'drawdown_weight': float(cfg.get('drawdown_weight', 0.20)),
+        'sharpe_delta_weight': float(cfg.get('sharpe_delta_weight', 0.10)),
+        'stability_weight': float(cfg.get('stability_weight', 0.10)),
+        'turnover_penalty_per_unit': float(cfg.get('turnover_penalty_per_unit', 0.015)),
+        'score_cap': float(cfg.get('score_cap', 1.2)),
+        'upside_target': float(cfg.get('upside_target', 0.45)),
+        'drawdown_improvement_target': float(cfg.get('drawdown_improvement_target', 0.35)),
+        'sharpe_delta_shift': float(cfg.get('sharpe_delta_shift', 0.05)),
+        'sharpe_delta_scale': float(cfg.get('sharpe_delta_scale', 0.15)),
+        'turnover_penalty_start': float(cfg.get('turnover_penalty_start', 12.0)),
+        'core_utility_floor': float(cfg.get('core_utility_floor', cfg.get('utility_floor', -0.05))),
+        'core_utility_target': float(cfg.get('core_utility_target', cfg.get('utility_target', 0.10))),
+        'utility_upside_target': float(evaluation_cfg.get('utility_upside_target', 0.55)),
+        'fallback_mode': str(cfg.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower(),
+    }
+
+
+def _compute_selection_score(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    sharpe_delta = _safe_float(metrics.get('sharpe_delta'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+
+    score_cap = float(settings['score_cap'])
+    upside_target = max(float(settings['upside_target']), 1e-12)
+    drawdown_target = max(float(settings['drawdown_improvement_target']), 1e-12)
+    sharpe_scale = max(float(settings['sharpe_delta_scale']), 1e-12)
+
+    if benchmark_return > 0.05:
+        return_ratio = _clip(annual_return / benchmark_return, 0.0, score_cap)
+    else:
+        return_ratio = _clip(annual_return / 0.10, 0.0, score_cap)
+    upside_score = _clip((upside_capture - 0.15) / max(upside_target - 0.15, 1e-12), 0.0, score_cap)
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_improvement = (benchmark_max_drawdown - max_drawdown) / benchmark_max_drawdown
+    else:
+        drawdown_improvement = 0.0
+    core_utility_value = _safe_float(
+        metrics.get(
+            'core_utility_score',
+            core_utility(
+                sharpe_delta=sharpe_delta,
+                drawdown_improvement=drawdown_improvement,
+                upside_capture=upside_capture,
+                upside_target=float(settings['utility_upside_target']),
+            ),
+        )
+    )
+    drawdown_score = _clip(drawdown_improvement / drawdown_target, 0.0, score_cap)
+    sharpe_delta_score = _clip((sharpe_delta + float(settings['sharpe_delta_shift'])) / sharpe_scale, 0.0, score_cap)
+    stability_score = _clip(
+        (core_utility_value - float(settings['core_utility_floor']))
+        / max(float(settings['core_utility_target']) - float(settings['core_utility_floor']), 1e-12),
+        0.0,
+        score_cap,
+    )
+    turnover_penalty = max(0.0, annual_turnover - float(settings['turnover_penalty_start'])) * float(
+        settings['turnover_penalty_per_unit']
+    )
+
+    score = (
+        float(settings['return_ratio_weight']) * return_ratio
+        + float(settings['upside_weight']) * upside_score
+        + float(settings['drawdown_weight']) * drawdown_score
+        + float(settings['sharpe_delta_weight']) * sharpe_delta_score
+        + float(settings['stability_weight']) * stability_score
+        - turnover_penalty
+    )
+    return score, {
+        'return_ratio': return_ratio,
+        'upside_score': upside_score,
+        'drawdown_score': drawdown_score,
+        'sharpe_delta_score': sharpe_delta_score,
+        'core_utility_value': core_utility_value,
+        'stability_score': stability_score,
+        'turnover_penalty': turnover_penalty,
+    }
+
+
+def _evaluate_hard_constraints(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[bool, list[str]]:
+    reasons: list[str] = []
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    if upside_capture < float(settings['upside_capture_min']):
+        reasons.append('upside_capture_below_min')
+
+    if benchmark_max_drawdown > 1e-12:
+        drawdown_ratio = max_drawdown / benchmark_max_drawdown
+        if drawdown_ratio > float(settings['max_drawdown_ratio_vs_benchmark']):
+            reasons.append('drawdown_ratio_above_max')
+
+    turnover_cap = float(settings['annual_turnover_soft_max'])
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+    if annual_turnover > turnover_cap and annual_return < return_override_threshold:
+        reasons.append('turnover_above_soft_max_without_return_override')
+
+    return len(reasons) == 0, reasons
+
+
+def _constraint_distance(metrics: Mapping[str, Any], settings: Mapping[str, Any]) -> tuple[float, dict[str, float]]:
+    upside_capture = _safe_float(metrics.get('upside_capture'))
+    max_drawdown = _safe_float(metrics.get('max_drawdown'))
+    benchmark_max_drawdown = _safe_float(metrics.get('benchmark_max_drawdown'))
+    annual_turnover = _safe_float(metrics.get('annual_turnover'))
+    annual_return = _safe_float(metrics.get('annual_return'))
+    benchmark_return = _safe_float(metrics.get('benchmark_return'))
+
+    upside_min = max(float(settings['upside_capture_min']), 1e-12)
+    drawdown_max = max(float(settings['max_drawdown_ratio_vs_benchmark']), 1e-12)
+    turnover_soft_max = max(float(settings['annual_turnover_soft_max']), 1e-12)
+    return_override_threshold = max(
+        float(settings['annual_return_override_abs']),
+        float(settings['annual_return_override_ratio']) * max(benchmark_return, 0.0),
+    )
+
+    upside_gap = max(0.0, upside_min - upside_capture) / upside_min
+    drawdown_ratio = (max_drawdown / benchmark_max_drawdown) if benchmark_max_drawdown > 1e-12 else 0.0
+    drawdown_gap = max(0.0, drawdown_ratio - drawdown_max) / drawdown_max
+
+    turnover_gap = 0.0
+    if annual_turnover > turnover_soft_max and annual_return < return_override_threshold:
+        turnover_gap = (annual_turnover - turnover_soft_max) / turnover_soft_max
+
+    # Weighted sum of the normalized gaps: upside 50%, drawdown 30%, turnover 20%.
+    violation_distance = 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap
+    return float(violation_distance), {
+        'upside_gap': float(upside_gap),
+        'drawdown_gap': float(drawdown_gap),
+        'turnover_gap': float(turnover_gap),
+    }
+
+
+def run_frozen_walkforward(
+    raw: pd.DataFrame,
+    config: Mapping[str, Any],
+    windows: Sequence[WindowSpec],
+    *,
+    candidates: Sequence[HypothesisCandidate] | None = None,
+    min_train_rows: int = 120,
+    min_test_rows: int = 40,
+    strategy_runner: StrategyRunner | None = None,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    """Select one candidate per window on the train slice and score it on the test slice.
+
+    Returns ``(board, summary)``: ``board`` has one row per window (including
+    skipped windows, marked by ``status``) and ``summary`` aggregates the windows
+    whose status is ``'ok'``.
+    """
+    if min_train_rows <= 0:
+        raise ValueError('min_train_rows must be positive.')
+    if min_test_rows <= 0:
+        raise ValueError('min_test_rows must be positive.')
+
+    runner = strategy_runner or run_strategy_bundle
+    candidate_list = list(candidates or DEFAULT_HYPOTHESIS_CANDIDATES)
+    if not candidate_list:
+        raise ValueError('At least one candidate is required for frozen walk-forward.')
+    selection_settings = _resolve_candidate_selection_settings(config)
+
+    rows: list[dict[str, Any]] = []
+
+    for window in windows:
+        train_slice = raw.loc[window.train_start:window.train_end].copy()
+        test_slice = raw.loc[window.test_start:window.test_end].copy()
+
+        row = _window_row_base(window)
+        row['train_rows'] = int(len(train_slice))
+        row['test_rows'] = int(len(test_slice))
+        row['candidate_count'] = int(len(candidate_list))
+
+        if len(train_slice) < min_train_rows:
+            row['status'] = 'skipped_insufficient_train'
+            rows.append(row)
+            continue
+        if len(test_slice) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        selected_candidate: HypothesisCandidate | None = None
+        selected_train_metrics: dict[str, float] | None = None
+        selected_train_utility = float('-inf')
+        selected_train_score = float('-inf')
+        selected_train_hard_pass = False
+        selected_train_constraint_failures: list[str] = []
+        selected_train_violation_distance = 0.0
+        selected_train_violation_components: dict[str, float] = {}
+        selection_mode = 'constraint_score'
+        candidate_evaluations: list[dict[str, Any]] = []
+
+        for candidate in candidate_list:
+            candidate_config = _candidate_config(config, candidate)
+            _, _, train_metrics_raw = runner(train_slice, candidate_config)
+            train_metrics = dict(train_metrics_raw)
+            utility_value, _ = _resolve_utility(train_metrics)
+            train_metrics['utility_total_score'] = utility_value
+            train_metrics['utility_status'] = utility_status(utility_value)
+            hard_pass, hard_fail_reasons = _evaluate_hard_constraints(train_metrics, selection_settings)
+            score_value, score_components = _compute_selection_score(train_metrics, selection_settings)
+            violation_distance, violation_components = _constraint_distance(train_metrics, selection_settings)
+            candidate_evaluations.append(
+                {
+                    'candidate': candidate,
+                    'metrics': train_metrics,
+                    'utility': utility_value,
+                    'hard_pass': hard_pass,
+                    'hard_fail_reasons': hard_fail_reasons,
+                    'selection_score': score_value,
+                    'selection_score_components': score_components,
+                    'violation_distance': violation_distance,
+                    'violation_components': violation_components,
+                }
+            )
+
+        use_hard_constraints = bool(selection_settings['use_hard_constraints'])
+        ranking_pool = (
+            [item for item in candidate_evaluations if item['hard_pass']]
+            if use_hard_constraints
+            else candidate_evaluations
+        )
+
+        if ranking_pool:
+            for item in ranking_pool:
+                score_value = float(item['selection_score'])
+                if score_value > selected_train_score:
+                    selected_train_score = score_value
+                    selected_candidate = item['candidate']
+                    selected_train_metrics = item['metrics']
+                    selected_train_utility = float(item['utility'])
+                    selected_train_hard_pass = bool(item['hard_pass'])
+                    selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                    selected_train_violation_distance = float(item['violation_distance'])
+                    selected_train_violation_components = dict(item['violation_components'])
+        else:
+            fallback_mode = str(selection_settings.get('fallback_mode', 'closest_to_feasible_frontier')).strip().lower()
+            if fallback_mode == 'closest_to_feasible_frontier':
+                selection_mode = 'frontier_fallback_no_hard_pass'
+                selected_fallback_score = float('-inf')
+                for item in candidate_evaluations:
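+                    # No candidate passed the hard constraints: favor the smallest violation
+                    # distance, with the selection score contributing at a 0.25 weight.
+                    fallback_score = -float(item['violation_distance']) + 0.25 * float(item['selection_score'])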
+                    utility_value = float(item['utility'])
+                    if (
+                        fallback_score > selected_fallback_score
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) > selected_train_score
+                        )
+                        or (
+                            fallback_score == selected_fallback_score
+                            and float(item['selection_score']) == selected_train_score
+                            and utility_value > selected_train_utility
+                        )
+                    ):
+                        selected_fallback_score = fallback_score
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+            else:
+                selection_mode = 'utility_fallback_no_hard_pass'
+                for item in candidate_evaluations:
+                    utility_value = float(item['utility'])
+                    if utility_value > selected_train_utility:
+                        selected_train_utility = utility_value
+                        selected_candidate = item['candidate']
+                        selected_train_metrics = item['metrics']
+                        selected_train_score = float(item['selection_score'])
+                        selected_train_hard_pass = bool(item['hard_pass'])
+                        selected_train_constraint_failures = list(item['hard_fail_reasons'])
+                        selected_train_violation_distance = float(item['violation_distance'])
+                        selected_train_violation_components = dict(item['violation_components'])
+
+        hard_pass_count = int(sum(1 for item in candidate_evaluations if bool(item['hard_pass'])))
+        ranking_brief = [
+            {
+                'candidate_id': item['candidate'].candidate_id,
+                'hard_pass': bool(item['hard_pass']),
+                'selection_score': float(item['selection_score']),
+                'train_utility_total_score': float(item['utility']),
+                'hard_fail_reasons': list(item['hard_fail_reasons']),
+                'violation_distance': float(item['violation_distance']),
+            }
+            for item in candidate_evaluations
+        ]
+        ranking_brief.sort(key=lambda x: (-x['hard_pass'], -x['selection_score'], -x['train_utility_total_score']))
+
+        if selected_candidate is None or selected_train_metrics is None:
+            row['status'] = 'skipped_no_candidate'
+            rows.append(row)
+            continue
+
+        # Re-run the selected candidate over train + test, then keep only the test rows
+        # as the frozen out-of-sample ledger for this window.
+        combined_slice = raw.loc[window.train_start:window.test_end].copy()
+        candidate_config = _candidate_config(config, selected_candidate)
+        _, combined_ledger, _ = runner(combined_slice, candidate_config)
+        frozen_test_ledger = combined_ledger.loc[window.test_start:window.test_end].copy()
+
+        if len(frozen_test_ledger) < min_test_rows:
+            row['status'] = 'skipped_insufficient_test'
+            rows.append(row)
+            continue
+
+        test_metrics = _compute_window_metrics(frozen_test_ledger, candidate_config)
+
+        row.update(
+            {
+                'status': 'ok',
+                'selected_candidate_id': selected_candidate.candidate_id,
+                'selection_mode': selection_mode,
+                'train_candidate_hard_pass_count': hard_pass_count,
+                'train_candidate_total_count': int(len(candidate_evaluations)),
+                'selected_train_selection_score': float(selected_train_score),
+                'selected_train_hard_pass': bool(selected_train_hard_pass),
+                'selected_train_constraint_failures': json.dumps(
+                    selected_train_constraint_failures,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'selected_train_violation_distance': float(selected_train_violation_distance),
+                'selected_train_violation_components': json.dumps(
+                    selected_train_violation_components,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+                'train_candidate_rankings': json.dumps(ranking_brief, ensure_ascii=False, sort_keys=True),
+                'selected_candidate_overrides': json.dumps(
+                    selected_candidate.overrides,
+                    ensure_ascii=False,
+                    sort_keys=True,
+                ),
+            }
+        )
+        row.update(_prefixed_metrics('train', selected_train_metrics))
+        row.update(_prefixed_metrics('test', test_metrics))
+        rows.append(row)
+
+    board = pd.DataFrame(rows)
+    if board.empty:
+        board = pd.DataFrame(columns=['status'])
+
+    ok_board = board[board['status'] == 'ok'].copy() if 'status' in board.columns else pd.DataFrame()
+    selected_distribution = (
+        ok_board['selected_candidate_id'].value_counts().to_dict() if 'selected_candidate_id' in ok_board.columns else {}
+    )
+    status_counts = board['status'].value_counts().to_dict() if 'status' in board.columns else {}
+    selection_mode_distribution = (
+        ok_board['selection_mode'].value_counts().to_dict() if not ok_board.empty and 'selection_mode' in ok_board.columns else {}
+    )
+    windows_with_hard_pass_candidate_count = (
+        int((ok_board['train_candidate_hard_pass_count'] > 0).sum())
+        if not ok_board.empty and 'train_candidate_hard_pass_count' in ok_board.columns
+        else 0
+    )
+    hard_pass_window_ratio = (
+        float(windows_with_hard_pass_candidate_count / len(ok_board))
+        if len(ok_board) > 0
+        else 0.0
+    )
+    positive_window_ratio = (
+        float((ok_board['test_utility_total_score'] > 0.0).mean())
+        if not ok_board.empty and 'test_utility_total_score' in ok_board.columns
+        else 0.0
+    )
+    fallback_distance_distribution = (
+        ok_board.loc[
+            ok_board['selection_mode'].isin({'frontier_fallback_no_hard_pass', 'utility_fallback_no_hard_pass'}),
+            'selected_train_violation_distance',
+        ]
+        .dropna()
+        .tolist()
+        if not ok_board.empty
+        and 'selection_mode' in ok_board.columns
+        and 'selected_train_violation_distance' in ok_board.columns
+        else []
+    )
+
+    summary = {
+        'total_windows': int(len(windows)),
+        'processed_window_count': int(len(ok_board)),
+        'skipped_window_count': int(max(len(windows) - len(ok_board), 0)),
+        'positive_window_ratio': positive_window_ratio,
+        'selected_candidate_distribution': selected_distribution,
+        'window_status_counts': status_counts,
+        'selection_mode_distribution': selection_mode_distribution,
+        'windows_with_hard_pass_candidate_count': windows_with_hard_pass_candidate_count,
+        'windows_without_hard_pass_candidate_count': int(max(len(ok_board) - windows_with_hard_pass_candidate_count, 0)),
+        'hard_pass_window_ratio': hard_pass_window_ratio,
+        'fallback_distance_distribution': [float(x) for x in fallback_distance_distribution],
+        'candidate_ids': [candidate.candidate_id for candidate in candidate_list],
+        'min_train_rows': int(min_train_rows),
+        'min_test_rows': int(min_test_rows),
+        'candidate_selection': selection_settings,
+    }
+    return board, summary
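Review note: the fallback path in `run_frozen_walkforward` is easier to check with numbers in hand. The sketch below restates the distance and fallback-score arithmetic from `_constraint_distance` and the `closest_to_feasible_frontier` branch as standalone functions. The names `constraint_distance` and `pick_frontier_fallback`, the plain-dict inputs, and the sample values are assumptions made for illustration; only the 0.50/0.30/0.20 weights, the 0.25 score weight, and the `1e-12` floors come from the diff. It leaves out the source's `_safe_float` coercion and its tie-breaks on selection score and utility.

```python
from typing import Mapping, Sequence


def constraint_distance(metrics: Mapping[str, float], settings: Mapping[str, float]) -> float:
    """Weighted, threshold-normalized shortfall against the three constraints."""
    upside_min = max(settings["upside_capture_min"], 1e-12)
    drawdown_max = max(settings["max_drawdown_ratio_vs_benchmark"], 1e-12)
    turnover_cap = max(settings["annual_turnover_soft_max"], 1e-12)
    # A high enough annual return waives the turnover penalty.
    override = max(
        settings["annual_return_override_abs"],
        settings["annual_return_override_ratio"] * max(metrics["benchmark_return"], 0.0),
    )

    upside_gap = max(0.0, upside_min - metrics["upside_capture"]) / upside_min

    bench_dd = metrics["benchmark_max_drawdown"]
    dd_ratio = metrics["max_drawdown"] / bench_dd if bench_dd > 1e-12 else 0.0
    drawdown_gap = max(0.0, dd_ratio - drawdown_max) / drawdown_max

    turnover_gap = 0.0
    if metrics["annual_turnover"] > turnover_cap and metrics["annual_return"] < override:
        turnover_gap = (metrics["annual_turnover"] - turnover_cap) / turnover_cap

    return 0.50 * upside_gap + 0.30 * drawdown_gap + 0.20 * turnover_gap


def pick_frontier_fallback(evaluations: Sequence[Mapping[str, float]]) -> str:
    """Return the candidate id with the highest -distance + 0.25 * score."""
    best = max(
        evaluations,
        key=lambda e: -float(e["violation_distance"]) + 0.25 * float(e["selection_score"]),
    )
    return str(best["candidate_id"])


if __name__ == "__main__":
    # Hypothetical thresholds and train metrics, not taken from the repository.
    settings = {
        "upside_capture_min": 0.5,
        "max_drawdown_ratio_vs_benchmark": 0.8,
        "annual_turnover_soft_max": 20.0,
        "annual_return_override_abs": 0.10,
        "annual_return_override_ratio": 0.5,
    }
    metrics = {
        "upside_capture": 0.4,
        "max_drawdown": 0.27,
        "benchmark_max_drawdown": 0.30,
        "annual_turnover": 25.0,
        "annual_return": 0.05,
        "benchmark_return": 0.12,
    }
    print(constraint_distance(metrics, settings))
```

With these sample inputs the three gaps are about 0.2, 0.125 and 0.25, so the distance is about 0.1875. A candidate with a smaller distance can still win the fallback even when its selection score is lower, because the score only enters at a quarter weight.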

+ 587 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B4/real_walkforward_report.py

@@ -0,0 +1,587 @@
+from __future__ import annotations
+
+import copy
+from pathlib import Path
+import sys
+from typing import Any, Mapping
+
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+import argparse
+import json
+
+import pandas as pd
+
+from backtest.engine import compute_metrics, run_backtest
+from backtest.frozen_walkforward import (
+    HypothesisCandidate,
+    normalize_hypothesis_candidates,
+    run_frozen_walkforward,
+    run_strategy_bundle,
+)
+from backtest.utility import utility_from_metrics, utility_status
+from backtest.walkforward import WindowSpec, build_expanding_windows
+from config.loader import load_config
+from data.io import evaluate_data_quality_gate, load_full_pit_data
+
+
+def _resolve_data_quality_settings(
+    config: dict[str, Any],
+    *,
+    strict_cli: bool,
+    min_coverage_cli: float | None,
+) -> tuple[bool, float, list[str] | None, list[str] | None, dict[str, float]]:
+    quality_cfg = config.get('data_quality', {})
+    strict_mode = bool(quality_cfg.get('strict_mode_default', False)) or strict_cli
+    default_min_coverage = float(quality_cfg.get('default_min_coverage', 0.95))
+    if min_coverage_cli is not None:
+        default_min_coverage = float(min_coverage_cli)
+    critical_columns = [str(col).strip().lower() for col in quality_cfg.get('critical_columns', [])]
+    blocking_columns = [str(col).strip().lower() for col in quality_cfg.get('blocking_columns', critical_columns)]
+    column_min_coverage = {
+        str(column).strip().lower(): float(value) for column, value in quality_cfg.get('column_min_coverage', {}).items()
+    }
+    return strict_mode, default_min_coverage, (critical_columns or None), (blocking_columns or None), column_min_coverage
+
+
+def _load_candidate_payload(path: str | None) -> list[dict[str, Any]] | None:
+    if not path:
+        return None
+    with Path(path).open('r', encoding='utf-8') as fh:
+        payload = json.load(fh)
+    if not isinstance(payload, list):
+        raise ValueError('Candidate file must be a JSON list of candidate objects.')
+    return payload
+
+
+def _resolve_frozen_settings(
+    config: dict[str, Any],
+    *,
+    candidates_json: str | None,
+    min_train_rows_cli: int | None,
+    min_test_rows_cli: int | None,
+) -> tuple[list[HypothesisCandidate], int, int]:
+    frozen_cfg = config.get('frozen_validation', {})
+    raw_candidates = _load_candidate_payload(candidates_json) or frozen_cfg.get('candidates')
+    candidates = normalize_hypothesis_candidates(raw_candidates)
+
+    min_train_rows = int(frozen_cfg.get('min_train_rows', 120))
+    min_test_rows = int(frozen_cfg.get('min_test_rows', 40))
+    if min_train_rows_cli is not None:
+        min_train_rows = int(min_train_rows_cli)
+    if min_test_rows_cli is not None:
+        min_test_rows = int(min_test_rows_cli)
+    return candidates, min_train_rows, min_test_rows
+
+
+def _serialize_windows(windows: list[WindowSpec]) -> list[dict[str, str]]:
+    return [
+        {
+            'train_start': window.train_start,
+            'train_end': window.train_end,
+            'test_start': window.test_start,
+            'test_end': window.test_end,
+        }
+        for window in windows
+    ]
+
+
+def _resolve_walkforward_windows(config: dict[str, Any], raw_index) -> list[WindowSpec]:
+    frozen_cfg = config.get('frozen_validation', {})
+    window_mode = str(frozen_cfg.get('window_mode', 'expanding')).strip().lower()
+    if window_mode != 'expanding':
+        raise ValueError(f'Unsupported window_mode: {window_mode}')
+    return build_expanding_windows(
+        raw_index,
+        min_train_years=int(frozen_cfg.get('min_train_years', 2)),
+        test_years=int(frozen_cfg.get('test_years', 1)),
+        allow_partial_last_test=bool(frozen_cfg.get('allow_partial_last_test', True)),
+    )
+
+
+def _normalize_metrics(metrics: dict[str, Any]) -> dict[str, Any]:
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        if isinstance(value, (int, float)):
+            out[key] = float(value)
+        else:
+            out[key] = value
+    return out
+
+
+def _safe_divide(numerator: float, denominator: float) -> float | None:
+    if abs(float(denominator)) < 1e-12:
+        return None
+    return float(numerator / denominator)
+
+
+def _build_baseline_plan(raw: pd.DataFrame) -> pd.DataFrame:
+    baseline = raw.copy()
+    baseline['target_exposure'] = 1.0
+    return baseline
+
+
+def _deep_merge_dict(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
+    out = copy.deepcopy(dict(base))
+    for key, value in overrides.items():
+        if isinstance(value, Mapping) and isinstance(out.get(key), Mapping):
+            out[key] = _deep_merge_dict(dict(out[key]), value)
+        else:
+            out[key] = copy.deepcopy(value)
+    return out
+
+
+def _candidate_config(base_config: Mapping[str, Any], candidate: str, overrides: Mapping[str, Any]) -> dict[str, Any]:
+    merged = _deep_merge_dict(base_config, overrides)
+    merged['_candidate_id'] = str(candidate)
+    return merged
+
+
+def _resolve_window_success_rule(config: Mapping[str, Any]) -> dict[str, Any]:
+    evaluation_cfg = dict((config or {}).get('evaluation', {}))
+    return {
+        'upside_min': float(evaluation_cfg.get('primary_window_success_upside_min', 0.25)),
+        'drawdown_ratio_max': float(evaluation_cfg.get('primary_window_success_drawdown_ratio_max', 0.80)),
+        'turnover_max': float(evaluation_cfg.get('primary_window_success_turnover_max', 22.0)),
+        'require_positive_return': bool(evaluation_cfg.get('primary_window_success_require_positive_return', True)),
+        'ratio_min': float(evaluation_cfg.get('primary_window_success_ratio_min', 0.50)),
+        'ratio_target': float(evaluation_cfg.get('primary_window_success_ratio_target', 0.60)),
+        'primary_window_min_rows': int(evaluation_cfg.get('primary_window_min_rows', 180)),
+    }
+
+
+def _window_success_diagnostics(
+    board: pd.DataFrame,
+    rule: Mapping[str, Any],
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    """Flag each 'ok' window as primary and/or successful and aggregate the counts.
+
+    A window is primary when ``test_rows >= rule['primary_window_min_rows']``; it is
+    successful when its test metrics satisfy the return, upside-capture,
+    drawdown-ratio, and turnover thresholds in ``rule``.
+    """
+    if board.empty or 'status' not in board.columns:
+        return board.copy(), {
+            'primary_window_count': 0,
+            'partial_window_count': 0,
+            'primary_window_success_count': 0,
+            'partial_window_success_count': 0,
+            'primary_window_success_ratio': 0.0,
+            'partial_window_success_ratio': 0.0,
+            'max_primary_window_drawdown_ratio_vs_baseline': None,
+            'median_primary_window_upside_capture': None,
+            'window_success_rule': dict(rule),
+        }
+
+    enriched = board.copy()
+    ok_mask = enriched['status'] == 'ok'
+    ok_board = enriched.loc[ok_mask].copy()
+    if ok_board.empty:
+        enriched['window_is_primary'] = False
+        enriched['window_success'] = False
+        enriched['test_drawdown_ratio_vs_benchmark'] = None
+        return enriched, {
+            'primary_window_count': 0,
+            'partial_window_count': 0,
+            'primary_window_success_count': 0,
+            'partial_window_success_count': 0,
+            'primary_window_success_ratio': 0.0,
+            'partial_window_success_ratio': 0.0,
+            'max_primary_window_drawdown_ratio_vs_baseline': None,
+            'median_primary_window_upside_capture': None,
+            'window_success_rule': dict(rule),
+        }
+
+    ok_board['window_is_primary'] = ok_board['test_rows'].astype(float) >= float(rule['primary_window_min_rows'])
+    ok_board['test_drawdown_ratio_vs_benchmark'] = ok_board.apply(
+        lambda row: _safe_divide(
+            float(row.get('test_max_drawdown', 0.0)),
+            float(row.get('test_benchmark_max_drawdown', 0.0)),
+        ),
+        axis=1,
+    )
+    require_positive_return = bool(rule['require_positive_return'])
+    if require_positive_return:
+        positive_return_ok = ok_board['test_annual_return'].astype(float) > 0.0
+    else:
+        positive_return_ok = pd.Series(True, index=ok_board.index)
+    upside_ok = ok_board['test_upside_capture'].astype(float) >= float(rule['upside_min'])
+    drawdown_ok = ok_board['test_drawdown_ratio_vs_benchmark'].fillna(float('inf')) <= float(rule['drawdown_ratio_max'])
+    turnover_ok = ok_board['test_annual_turnover'].astype(float) <= float(rule['turnover_max'])
+    ok_board['window_success'] = positive_return_ok & upside_ok & drawdown_ok & turnover_ok
+
+    primary_board = ok_board.loc[ok_board['window_is_primary']]
+    partial_board = ok_board.loc[~ok_board['window_is_primary']]
+    primary_success_count = int(primary_board['window_success'].sum()) if not primary_board.empty else 0
+    partial_success_count = int(partial_board['window_success'].sum()) if not partial_board.empty else 0
+
+    diagnostics = {
+        'primary_window_count': int(len(primary_board)),
+        'partial_window_count': int(len(partial_board)),
+        'primary_window_success_count': primary_success_count,
+        'partial_window_success_count': partial_success_count,
+        'primary_window_success_ratio': float(primary_success_count / len(primary_board)) if len(primary_board) else 0.0,
+        'partial_window_success_ratio': float(partial_success_count / len(partial_board)) if len(partial_board) else 0.0,
+        'max_primary_window_drawdown_ratio_vs_baseline': (
+            float(primary_board['test_drawdown_ratio_vs_benchmark'].max())
+            if not primary_board.empty
+            else None
+        ),
+        'median_primary_window_upside_capture': (
+            float(primary_board['test_upside_capture'].median()) if not primary_board.empty else None
+        ),
+        'window_success_rule': dict(rule),
+    }
+
+    enriched['window_is_primary'] = False
+    enriched['window_success'] = False
+    enriched['test_drawdown_ratio_vs_benchmark'] = None
+    enriched.loc[ok_board.index, 'window_is_primary'] = ok_board['window_is_primary']
+    enriched.loc[ok_board.index, 'window_success'] = ok_board['window_success']
+    enriched.loc[ok_board.index, 'test_drawdown_ratio_vs_benchmark'] = ok_board['test_drawdown_ratio_vs_benchmark']
+    return enriched, diagnostics
+
+
+def _resolve_selected_overrides(
+    row: Mapping[str, Any],
+    candidate_overrides: Mapping[str, Mapping[str, Any]],
+) -> dict[str, Any]:
+    candidate_id = str(row.get('selected_candidate_id', '')).strip()
+    if candidate_id in candidate_overrides:
+        return copy.deepcopy(dict(candidate_overrides[candidate_id]))
+    serialized = row.get('selected_candidate_overrides')
+    if not isinstance(serialized, str) or not serialized.strip():
+        return {}
+    try:
+        parsed = json.loads(serialized)
+    except json.JSONDecodeError:
+        return {}
+    if not isinstance(parsed, dict):
+        return {}
+    return parsed
+
+
+def _build_stitched_frozen_oos_ledger(
+    raw: pd.DataFrame,
+    config: Mapping[str, Any],
+    board: pd.DataFrame,
+    candidate_overrides: Mapping[str, Mapping[str, Any]],
+) -> pd.DataFrame:
+    if board.empty or 'status' not in board.columns:
+        return pd.DataFrame()
+    ok_board = board.loc[board['status'] == 'ok'].copy()
+    if ok_board.empty:
+        return pd.DataFrame()
+
+    stitched_parts: list[pd.DataFrame] = []
+    for idx, row in ok_board.iterrows():
+        candidate_id = str(row.get('selected_candidate_id', '')).strip()
+        if not candidate_id:
+            continue
+        overrides = _resolve_selected_overrides(row, candidate_overrides)
+        candidate_cfg = _candidate_config(config, candidate_id, overrides)
+        combined_slice = raw.loc[str(row['train_start']) : str(row['test_end'])].copy()
+        _, combined_ledger, _ = run_strategy_bundle(combined_slice, candidate_cfg)
+        test_ledger = combined_ledger.loc[str(row['test_start']) : str(row['test_end'])].copy()
+        if test_ledger.empty:
+            continue
+        test_ledger['frozen_window_index'] = int(idx)
+        test_ledger['selected_candidate_id'] = candidate_id
+        test_ledger['window_test_start'] = str(row['test_start'])
+        test_ledger['window_test_end'] = str(row['test_end'])
+        stitched_parts.append(test_ledger)
+
+    if not stitched_parts:
+        return pd.DataFrame()
+
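+    # Use a stable sort so that, if test windows overlap, keep='first' retains the row
+    # from the earliest window rather than an arbitrary one.
+    stitched = pd.concat(stitched_parts, axis=0).sort_index(kind='mergesort')
+    if stitched.index.has_duplicates:
+        stitched = stitched.loc[~stitched.index.duplicated(keep='first')].copy()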
+    return stitched
+
+
+def _metrics_from_ledger(ledger: pd.DataFrame, config: Mapping[str, Any]) -> dict[str, Any]:
+    annualization = int(dict((config or {}).get('trading', {})).get('annualization', 252))
+    if ledger.empty:
+        metrics = compute_metrics(
+            strategy_returns=pd.Series(dtype=float),
+            benchmark_returns=pd.Series(dtype=float),
+            turnover=pd.Series(dtype=float),
+            annualization=annualization,
+        )
+    else:
+        metrics = compute_metrics(
+            strategy_returns=ledger['strategy_return_net'],
+            benchmark_returns=ledger['asset_exec_return'],
+            turnover=ledger['turnover'] if 'turnover' in ledger.columns else None,
+            tracking_difference=ledger['tracking_difference'] if 'tracking_difference' in ledger.columns else None,
+            annualization=annualization,
+        )
+    out = _normalize_metrics(metrics)
+    out['utility_total_score'] = float(utility_from_metrics(out))
+    out['utility_status'] = utility_status(out['utility_total_score'])
+    return out
+
+
+def _baseline_metrics_on_same_dates(stitched_ledger: pd.DataFrame, config: Mapping[str, Any]) -> dict[str, Any]:
+    annualization = int(dict((config or {}).get('trading', {})).get('annualization', 252))
+    if stitched_ledger.empty:
+        metrics = compute_metrics(
+            strategy_returns=pd.Series(dtype=float),
+            benchmark_returns=pd.Series(dtype=float),
+            turnover=pd.Series(dtype=float),
+            annualization=annualization,
+        )
+    else:
+        baseline_returns = stitched_ledger['asset_exec_return']
+        baseline_turnover = pd.Series(0.0, index=stitched_ledger.index, dtype=float)
+        baseline_tracking = pd.Series(0.0, index=stitched_ledger.index, dtype=float)
+        metrics = compute_metrics(
+            strategy_returns=baseline_returns,
+            benchmark_returns=baseline_returns,
+            turnover=baseline_turnover,
+            tracking_difference=baseline_tracking,
+            annualization=annualization,
+        )
+    out = _normalize_metrics(metrics)
+    out['utility_total_score'] = float(utility_from_metrics(out))
+    out['utility_status'] = utility_status(out['utility_total_score'])
+    return out
+
+
+def _comparison_against_baseline(strategy_metrics: Mapping[str, Any], baseline_metrics: Mapping[str, Any]) -> dict[str, Any]:
+    annual_return_delta = float(
+        float(strategy_metrics.get('annual_return', 0.0)) - float(baseline_metrics.get('annual_return', 0.0))
+    )
+    max_drawdown_delta = float(
+        float(strategy_metrics.get('max_drawdown', 0.0)) - float(baseline_metrics.get('max_drawdown', 0.0))
+    )
+    return {
+        'annual_return_delta': annual_return_delta,
+        'annual_return_delta_vs_baseline': annual_return_delta,
+        'max_drawdown_delta': max_drawdown_delta,
+        'max_drawdown_delta_vs_baseline': max_drawdown_delta,
+        'drawdown_ratio_vs_baseline': _safe_divide(
+            float(strategy_metrics.get('max_drawdown', 0.0)),
+            float(baseline_metrics.get('max_drawdown', 0.0)),
+        ),
+        'utility_delta_vs_baseline': float(
+            float(strategy_metrics.get('utility_total_score', 0.0)) - float(baseline_metrics.get('utility_total_score', 0.0))
+        ),
+        'upside_capture': float(strategy_metrics.get('upside_capture', 0.0)),
+    }
+
+
+def _build_report_markdown(summary: dict[str, Any]) -> str:
+    meta = summary['input']
+    comparison = summary['comparison']
+    stitched = summary['stitched_frozen_oos_metrics']
+    default = summary['default_strategy_full_sample_metrics']
+    baseline_stitched = summary['baseline_stitched_oos_metrics']
+    baseline_full = summary['baseline_full_sample_metrics']
+    frozen = summary['frozen_walkforward']
+    stitched_vs_baseline = comparison['stitched_oos_vs_baseline']
+    default_vs_baseline = comparison['default_vs_baseline']
+
+    def _fmt(value: Any, ndigits: int = 4) -> str:
+        if value is None:
+            return 'n/a'
+        if isinstance(value, float):
+            return f'{value:.{ndigits}f}'
+        return str(value)
+
+    lines = [
+        '# Real Walk-Forward Report',
+        '',
+        f"- input_path: `{meta['pit_path']}`",
+        f"- row_count: `{meta['row_count']}`",
+        f"- date_range: `{meta['date_start']}` to `{meta['date_end']}`",
+        '',
+        '## Frozen Validation Summary',
+        f"- total_windows: `{frozen['total_windows']}`",
+        f"- processed_window_count: `{frozen['processed_window_count']}`",
+        f"- skipped_window_count: `{frozen['skipped_window_count']}`",
+        f"- positive_window_ratio: `{_fmt(frozen['positive_window_ratio'])}`",
+        f"- primary_window_success_ratio: `{_fmt(frozen.get('primary_window_success_ratio'))}`",
+        f"- partial_window_success_ratio: `{_fmt(frozen.get('partial_window_success_ratio'))}`",
+        f"- primary_window_count: `{frozen.get('primary_window_count', 0)}`",
+        f"- partial_window_count: `{frozen.get('partial_window_count', 0)}`",
+        f"- hard_pass_window_ratio: `{_fmt(frozen.get('hard_pass_window_ratio'))}`",
+        f"- selection_mode_distribution: `{frozen.get('selection_mode_distribution', {})}`",
+        '',
+        '## Stitched Frozen OOS vs Baseline',
+        f"- stitched_oos_annual_return: `{_fmt(stitched.get('annual_return'))}`",
+        f"- baseline_stitched_oos_annual_return: `{_fmt(baseline_stitched.get('annual_return'))}`",
+        f"- annual_return_delta: `{_fmt(stitched_vs_baseline.get('annual_return_delta'))}`",
+        f"- stitched_oos_max_drawdown: `{_fmt(stitched.get('max_drawdown'))}`",
+        f"- baseline_stitched_oos_max_drawdown: `{_fmt(baseline_stitched.get('max_drawdown'))}`",
+        f"- drawdown_ratio_vs_baseline: `{_fmt(stitched_vs_baseline.get('drawdown_ratio_vs_baseline'))}`",
+        f"- stitched_oos_utility_total_score: `{_fmt(stitched.get('utility_total_score'))}`",
+        f"- baseline_stitched_oos_utility_total_score: `{_fmt(baseline_stitched.get('utility_total_score'))}`",
+        f"- utility_delta_vs_baseline: `{_fmt(stitched_vs_baseline.get('utility_delta_vs_baseline'))}`",
+        f"- stitched_oos_upside_capture: `{_fmt(stitched.get('upside_capture'))}`",
+        '',
+        '## Default Full-Sample vs Baseline (Reference)',
+        f"- default_annual_return: `{_fmt(default.get('annual_return'))}`",
+        f"- baseline_annual_return: `{_fmt(baseline_full.get('annual_return'))}`",
+        f"- annual_return_delta: `{_fmt(default_vs_baseline.get('annual_return_delta'))}`",
+        f"- default_max_drawdown: `{_fmt(default.get('max_drawdown'))}`",
+        f"- baseline_max_drawdown: `{_fmt(baseline_full.get('max_drawdown'))}`",
+        f"- drawdown_ratio_vs_baseline: `{_fmt(default_vs_baseline.get('drawdown_ratio_vs_baseline'))}`",
+        f"- default_utility_total_score: `{_fmt(default.get('utility_total_score'))}`",
+        f"- baseline_utility_total_score: `{_fmt(baseline_full.get('utility_total_score'))}`",
+        f"- utility_delta_vs_baseline: `{_fmt(default_vs_baseline.get('utility_delta_vs_baseline'))}`",
+    ]
+    return '\n'.join(lines) + '\n'
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description='Generate a real-data frozen walk-forward report for the ChiNext 50 regime workflow.')
+    parser.add_argument('--pit-csv', '--data-csv', dest='pit_csv', type=str, required=True, help='Required full point-in-time (PIT) input file (CSV or parquet), keyed by date.')
+    parser.add_argument('--strict-data', action='store_true', help='Fail fast when blocking quality breaches are detected.')
+    parser.add_argument('--min-coverage', type=float, default=None, help='Override default minimum non-null coverage ratio.')
+    parser.add_argument('--candidates-json', type=str, default=None, help='Optional JSON file describing frozen-validation candidate set.')
+    parser.add_argument('--min-train-rows', type=int, default=None, help='Override minimum required rows for each training window.')
+    parser.add_argument('--min-test-rows', type=int, default=None, help='Override minimum required rows for each test window.')
+    parser.add_argument('--config', type=str, default=None, help='Optional config YAML path.')
+    parser.add_argument('--output-dir', type=str, default='outputs/real_walkforward_report', help='Directory for report artifacts.')
+    args = parser.parse_args()
+
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    config = load_config(args.config)
+    raw = load_full_pit_data(args.pit_csv)
+
+    strict_mode, min_coverage, critical_columns, blocking_columns, column_min_coverage = _resolve_data_quality_settings(
+        config,
+        strict_cli=args.strict_data,
+        min_coverage_cli=args.min_coverage,
+    )
+    quality_summary = evaluate_data_quality_gate(
+        raw,
+        strict=strict_mode,
+        critical_columns=critical_columns,
+        blocking_columns=blocking_columns,
+        default_min_coverage=min_coverage,
+        column_min_coverage=column_min_coverage,
+    )
+    with (output_dir / 'data_quality_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(quality_summary, fh, ensure_ascii=False, indent=2)
+    if quality_summary['blocking']:
+        failed_items = quality_summary.get('errors') or quality_summary['breaches']
+        breached = ', '.join(item['column'] for item in failed_items)
+        raise ValueError(f'Data quality gate failed in strict mode. Breached columns: {breached}')
+
+    config.setdefault('_runtime', {})['strict_feature_gate'] = strict_mode
+    candidates, min_train_rows, min_test_rows = _resolve_frozen_settings(
+        config,
+        candidates_json=args.candidates_json,
+        min_train_rows_cli=args.min_train_rows,
+        min_test_rows_cli=args.min_test_rows,
+    )
+    windows = _resolve_walkforward_windows(config, raw.index)
+    board, frozen_summary = run_frozen_walkforward(
+        raw=raw,
+        config=config,
+        windows=windows,
+        candidates=candidates,
+        min_train_rows=min_train_rows,
+        min_test_rows=min_test_rows,
+    )
+    candidate_overrides = {candidate.candidate_id: copy.deepcopy(candidate.overrides) for candidate in candidates}
+
+    window_success_rule = _resolve_window_success_rule(config)
+    board_enriched, window_diagnostics = _window_success_diagnostics(board, window_success_rule)
+
+    stitched_ledger = _build_stitched_frozen_oos_ledger(
+        raw=raw,
+        config=config,
+        board=board_enriched,
+        candidate_overrides=candidate_overrides,
+    )
+    stitched_export = stitched_ledger.copy()
+    stitched_export.index.name = 'date'
+    stitched_export.to_csv(output_dir / 'stitched_frozen_oos_ledger.csv')
+
+    _, _, default_metrics_raw = run_strategy_bundle(raw, config)
+    baseline_plan = _build_baseline_plan(raw)
+    _, baseline_metrics_raw = run_backtest(baseline_plan, config)
+
+    default_strategy_metrics = _normalize_metrics(dict(default_metrics_raw))
+    default_strategy_metrics['utility_total_score'] = float(utility_from_metrics(default_strategy_metrics))
+    default_strategy_metrics['utility_status'] = utility_status(default_strategy_metrics['utility_total_score'])
+
+    stitched_oos_metrics = _metrics_from_ledger(stitched_ledger, config)
+
+    baseline_metrics = _normalize_metrics(dict(baseline_metrics_raw))
+    baseline_metrics['utility_total_score'] = float(utility_from_metrics(baseline_metrics))
+    baseline_metrics['utility_status'] = utility_status(baseline_metrics['utility_total_score'])
+    baseline_stitched_metrics = _baseline_metrics_on_same_dates(stitched_ledger, config)
+
+    stitched_vs_baseline = _comparison_against_baseline(stitched_oos_metrics, baseline_stitched_metrics)
+    default_vs_baseline = _comparison_against_baseline(default_strategy_metrics, baseline_metrics)
+
+    comparison = {
+        'stitched_oos_vs_baseline': stitched_vs_baseline,
+        'default_vs_baseline': default_vs_baseline,
+        # Legacy top-level aliases mirror the stitched OOS comparison (not default_vs_baseline).
+        'annual_return_delta': stitched_vs_baseline['annual_return_delta'],
+        'annual_return_delta_vs_baseline': stitched_vs_baseline['annual_return_delta_vs_baseline'],
+        'max_drawdown_delta': stitched_vs_baseline['max_drawdown_delta'],
+        'max_drawdown_delta_vs_baseline': stitched_vs_baseline['max_drawdown_delta_vs_baseline'],
+        'drawdown_ratio_vs_baseline': stitched_vs_baseline['drawdown_ratio_vs_baseline'],
+        'utility_delta_vs_baseline': stitched_vs_baseline['utility_delta_vs_baseline'],
+        'upside_capture': stitched_vs_baseline['upside_capture'],
+    }
+
+    summary = {
+        'input': {
+            'pit_path': str(args.pit_csv),
+            'row_count': int(len(raw)),
+            'date_start': raw.index.min().date().isoformat() if len(raw) else None,
+            'date_end': raw.index.max().date().isoformat() if len(raw) else None,
+        },
+        'frozen_walkforward': {
+            'total_windows': int(frozen_summary['total_windows']),
+            'processed_window_count': int(frozen_summary['processed_window_count']),
+            'skipped_window_count': int(frozen_summary['skipped_window_count']),
+            'positive_window_ratio': float(frozen_summary['positive_window_ratio']),
+            'primary_window_count': int(window_diagnostics['primary_window_count']),
+            'partial_window_count': int(window_diagnostics['partial_window_count']),
+            'primary_window_success_count': int(window_diagnostics['primary_window_success_count']),
+            'partial_window_success_count': int(window_diagnostics['partial_window_success_count']),
+            'primary_window_success_ratio': float(window_diagnostics['primary_window_success_ratio']),
+            'partial_window_success_ratio': float(window_diagnostics['partial_window_success_ratio']),
+            'max_primary_window_drawdown_ratio_vs_baseline': window_diagnostics['max_primary_window_drawdown_ratio_vs_baseline'],
+            'median_primary_window_upside_capture': window_diagnostics['median_primary_window_upside_capture'],
+            'window_success_rule': dict(window_diagnostics['window_success_rule']),
+            'selected_candidate_distribution': dict(frozen_summary['selected_candidate_distribution']),
+            'window_status_counts': dict(frozen_summary['window_status_counts']),
+            'selection_mode_distribution': dict(frozen_summary.get('selection_mode_distribution', {})),
+            'windows_with_hard_pass_candidate_count': int(
+                frozen_summary.get('windows_with_hard_pass_candidate_count', 0)
+            ),
+            'windows_without_hard_pass_candidate_count': int(
+                frozen_summary.get('windows_without_hard_pass_candidate_count', 0)
+            ),
+            'hard_pass_window_ratio': float(frozen_summary.get('hard_pass_window_ratio', 0.0)),
+            'candidate_selection': dict(frozen_summary.get('candidate_selection', {})),
+            'candidate_ids': list(frozen_summary['candidate_ids']),
+            'min_train_rows': int(frozen_summary['min_train_rows']),
+            'min_test_rows': int(frozen_summary['min_test_rows']),
+            'windows': _serialize_windows(windows),
+        },
+        'default_strategy_full_sample_metrics': default_strategy_metrics,
+        'stitched_frozen_oos_metrics': stitched_oos_metrics,
+        # Backward-compatible alias: old key now points to stitched OOS metrics.
+        'strategy_full_sample_metrics': stitched_oos_metrics,
+        'baseline_stitched_oos_metrics': baseline_stitched_metrics,
+        'baseline_full_sample_metrics': baseline_metrics,
+        'comparison': comparison,
+    }
+
+    board_enriched.to_csv(output_dir / 'frozen_validation_board.csv', index=False)
+    with (output_dir / 'real_walkforward_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(summary, fh, ensure_ascii=False, indent=2)
+    (output_dir / 'real_walkforward_report.md').write_text(_build_report_markdown(summary), encoding='utf-8')
+
+
+if __name__ == '__main__':
+    main()
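
Reviewer note on the CLI and report formatting above: `--pit-csv` and `--data-csv` are two spellings of one option that share `dest='pit_csv'`, and `_fmt` applies fixed precision only to `float` values. The sketch below reproduces just those two behaviours in isolation. `fmt` and `build_parser` are illustrative names rather than functions from the commit, and the parser is cut down to two of the real arguments.

```python
import argparse
from typing import Any


def fmt(value: Any, ndigits: int = 4) -> str:
    """Stand-in for the report's _fmt: None -> 'n/a', floats -> fixed precision."""
    if value is None:
        return 'n/a'
    if isinstance(value, float):
        return f'{value:.{ndigits}f}'
    # Ints, strings and dicts fall through to str() unchanged.
    return str(value)


def build_parser() -> argparse.ArgumentParser:
    """Reduced parser: both option strings write to the same attribute."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--pit-csv', '--data-csv', dest='pit_csv', type=str, required=True)
    parser.add_argument('--strict-data', action='store_true')
    return parser


new_style = build_parser().parse_args(['--pit-csv', 'pit.csv'])
old_style = build_parser().parse_args(['--data-csv', 'pit.csv', '--strict-data'])
print(new_style.pit_csv, new_style.strict_data)
print(old_style.pit_csv, old_style.strict_data)
print(fmt(None), fmt(0.5), fmt(3))
```

One consequence worth keeping in mind when reading the generated markdown: a metric stored as an `int` (for example a count, or a ratio that happens to be the integer `0`) is rendered without decimals, while the same value stored as `0.0` is rendered as `0.0000`.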

+ 231 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/B4/test_real_walkforward_report_pipeline.py

@@ -0,0 +1,231 @@
+from __future__ import annotations
+
+import json
+import sys
+
+import pandas as pd
+import pytest
+
+from backtest.engine import compute_metrics
+import pipelines.real_walkforward_report as real_walkforward_report
+
+
+def _write_full_pit_csv(path, periods: int = 320, *, sparse_column: str | None = None) -> None:
+    dates = pd.bdate_range('2022-01-04', periods=periods)
+    base = pd.Series(range(periods), dtype=float)
+    df = pd.DataFrame(
+        {
+            'date': dates,
+            'open': 100.0 + base * 0.1,
+            'high': 101.0 + base * 0.1 + (base % 5) * 0.02,
+            'low': 99.0 + base * 0.1 - (base % 4) * 0.015,
+            'close': 100.5 + base * 0.1 + (base % 3) * 0.01,
+            'volume': 1_000_000.0 + base * 1000.0 + (base % 7) * 200.0,
+            'hs300_close': 4000.0 + base * 0.5,
+            'star50_close': 1200.0 + base * 0.2,
+            'csi1000_close': 5000.0 + base * 0.4,
+            'pct_constituents_above_20dma': 0.55 + (base % 10) * 0.01,
+            'pct_constituents_above_60dma': 0.50 + (base % 8) * 0.01,
+            'pct_new_high_20': 0.06 + (base % 5) * 0.002,
+            'pct_new_low_20': 0.07 + (base % 4) * 0.002,
+            'eq_weight_ret_5': -0.01 + (base % 7) * 0.002,
+            'weighted_ret_5': -0.008 + (base % 7) * 0.002 + (base % 3) * 0.0005,
+            'top3_contribution_5': 0.34 + (base % 6) * 0.004,
+            'top1_contribution_5': 0.11 + (base % 6) * 0.003,
+            'top10_contribution_5': 0.60 + (base % 6) * 0.004,
+            'sector_concentration_20': 0.20 + (base % 5) * 0.003 + (base % 3) * 0.0005,
+            'corr_spike_20': 0.05 + (base % 9) * 0.003,
+            'dispersion_20': 0.18 + (base % 8) * 0.004,
+        }
+    )
+    if sparse_column is not None:
+        df.loc[5:, sparse_column] = float('nan')
+    df.to_csv(path, index=False)
+
+
+def test_real_walkforward_report_generates_artifacts(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit.csv'
+    output_dir = tmp_path / 'report_output'
+    _write_full_pit_csv(data_path, periods=360)
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        ['real_walkforward_report.py', '--pit-csv', str(data_path), '--output-dir', str(output_dir)],
+    )
+    real_walkforward_report.main()
+
+    assert (output_dir / 'data_quality_summary.json').exists()
+    assert (output_dir / 'frozen_validation_board.csv').exists()
+    assert (output_dir / 'stitched_frozen_oos_ledger.csv').exists()
+    assert (output_dir / 'real_walkforward_summary.json').exists()
+    assert (output_dir / 'real_walkforward_report.md').exists()
+
+    summary = json.loads((output_dir / 'real_walkforward_summary.json').read_text(encoding='utf-8'))
+    assert 'default_strategy_full_sample_metrics' in summary
+    assert 'stitched_frozen_oos_metrics' in summary
+    assert 'strategy_full_sample_metrics' in summary
+    assert 'baseline_stitched_oos_metrics' in summary
+    assert 'baseline_full_sample_metrics' in summary
+    assert 'comparison' in summary
+    assert 'selection_mode_distribution' in summary['frozen_walkforward']
+    assert 'hard_pass_window_ratio' in summary['frozen_walkforward']
+    assert 'primary_window_success_ratio' in summary['frozen_walkforward']
+    assert 'candidate_selection' in summary['frozen_walkforward']
+    assert 'stitched_oos_vs_baseline' in summary['comparison']
+    assert 'default_vs_baseline' in summary['comparison']
+    assert 'utility_delta_vs_baseline' in summary['comparison']
+    assert 'annual_return_delta_vs_baseline' in summary['comparison']
+    assert 'max_drawdown_delta_vs_baseline' in summary['comparison']
+    assert summary['comparison']['annual_return_delta'] == summary['comparison']['annual_return_delta_vs_baseline']
+    assert summary['comparison']['max_drawdown_delta'] == summary['comparison']['max_drawdown_delta_vs_baseline']
+    assert summary['strategy_full_sample_metrics'] == summary['stitched_frozen_oos_metrics']
+    assert summary['input']['row_count'] == 360
+
+
+def test_real_walkforward_report_strict_mode_blocks_on_core_breach(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit_sparse.csv'
+    output_dir = tmp_path / 'report_strict_fail'
+    _write_full_pit_csv(data_path, periods=180, sparse_column='pct_constituents_above_60dma')
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        [
+            'real_walkforward_report.py',
+            '--pit-csv',
+            str(data_path),
+            '--strict-data',
+            '--output-dir',
+            str(output_dir),
+        ],
+    )
+
+    with pytest.raises(ValueError, match='Data quality gate failed in strict mode'):
+        real_walkforward_report.main()
+    assert (output_dir / 'data_quality_summary.json').exists()
+
+
+def test_primary_ratio_excludes_partial_and_main_comparison_uses_stitched(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit_custom.csv'
+    output_dir = tmp_path / 'report_custom'
+    _write_full_pit_csv(data_path, periods=340)
+    raw = real_walkforward_report.load_full_pit_data(str(data_path))
+
+    dates = raw.index
+    row_primary = {
+        'status': 'ok',
+        'train_start': dates[0].date().isoformat(),
+        'train_end': dates[80].date().isoformat(),
+        'test_start': dates[81].date().isoformat(),
+        'test_end': dates[300].date().isoformat(),
+        'test_rows': 220,
+        'selected_candidate_id': 'baseline',
+        'selected_candidate_overrides': '{}',
+        'test_annual_return': 0.10,
+        'test_upside_capture': 0.30,
+        'test_max_drawdown': 0.20,
+        'test_benchmark_max_drawdown': 0.40,
+        'test_annual_turnover': 12.0,
+        'test_utility_total_score': 0.05,
+    }
+    row_partial = {
+        'status': 'ok',
+        'train_start': dates[0].date().isoformat(),
+        'train_end': dates[300].date().isoformat(),
+        'test_start': dates[301].date().isoformat(),
+        'test_end': dates[339].date().isoformat(),
+        'test_rows': 39,
+        'selected_candidate_id': 'pro_risk',
+        'selected_candidate_overrides': '{}',
+        'test_annual_return': -0.08,
+        'test_upside_capture': 0.10,
+        'test_max_drawdown': 0.25,
+        'test_benchmark_max_drawdown': 0.25,
+        'test_annual_turnover': 30.0,
+        'test_utility_total_score': -0.20,
+    }
+    fake_board = pd.DataFrame([row_primary, row_partial])
+    fake_summary = {
+        'total_windows': 2,
+        'processed_window_count': 2,
+        'skipped_window_count': 0,
+        'positive_window_ratio': 0.5,
+        'selected_candidate_distribution': {'baseline': 1, 'pro_risk': 1},
+        'window_status_counts': {'ok': 2},
+        'selection_mode_distribution': {'constraint_score': 2},
+        'windows_with_hard_pass_candidate_count': 2,
+        'windows_without_hard_pass_candidate_count': 0,
+        'hard_pass_window_ratio': 1.0,
+        'candidate_selection': {},
+        'candidate_ids': ['baseline', 'pro_risk'],
+        'min_train_rows': 120,
+        'min_test_rows': 40,
+    }
+
+    def fake_run_frozen_walkforward(*args, **kwargs):
+        return fake_board.copy(), dict(fake_summary)
+
+    def fake_run_strategy_bundle(df: pd.DataFrame, cfg: dict[str, object]):
+        candidate_id = str(cfg.get('_candidate_id', 'default'))
+        if candidate_id == 'baseline':
+            strategy_ret = 0.012
+            turnover = 0.03
+        elif candidate_id == 'pro_risk':
+            strategy_ret = -0.006
+            turnover = 0.20
+        else:
+            strategy_ret = 0.020
+            turnover = 0.04
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(strategy_ret, index=df.index, dtype=float),
+                'asset_exec_return': pd.Series(0.008, index=df.index, dtype=float),
+                'turnover': pd.Series(turnover, index=df.index, dtype=float),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return df.copy(), ledger, metrics
+
+    def fake_run_backtest(df: pd.DataFrame, cfg: dict[str, object]):
+        ledger = pd.DataFrame(
+            {
+                'strategy_return_net': pd.Series(0.010, index=df.index, dtype=float),
+                'asset_exec_return': pd.Series(0.008, index=df.index, dtype=float),
+                'turnover': pd.Series(0.0, index=df.index, dtype=float),
+            }
+        )
+        metrics = compute_metrics(ledger['strategy_return_net'], ledger['asset_exec_return'], ledger['turnover'])
+        return ledger, metrics
+
+    monkeypatch.setattr(real_walkforward_report, 'run_frozen_walkforward', fake_run_frozen_walkforward)
+    monkeypatch.setattr(real_walkforward_report, 'run_strategy_bundle', fake_run_strategy_bundle)
+    monkeypatch.setattr(real_walkforward_report, 'run_backtest', fake_run_backtest)
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        ['real_walkforward_report.py', '--pit-csv', str(data_path), '--output-dir', str(output_dir)],
+    )
+    real_walkforward_report.main()
+
+    summary = json.loads((output_dir / 'real_walkforward_summary.json').read_text(encoding='utf-8'))
+    frozen = summary['frozen_walkforward']
+    comparison = summary['comparison']
+    stitched_cmp = comparison['stitched_oos_vs_baseline']
+    default_cmp = comparison['default_vs_baseline']
+
+    assert frozen['primary_window_count'] == 1
+    assert frozen['partial_window_count'] == 1
+    assert frozen['primary_window_success_ratio'] == 1.0
+    assert frozen['partial_window_success_ratio'] == 0.0
+
+    assert comparison['annual_return_delta'] == stitched_cmp['annual_return_delta']
+    assert comparison['annual_return_delta'] != default_cmp['annual_return_delta']
+    assert summary['stitched_frozen_oos_metrics']['annual_return'] != summary['default_strategy_full_sample_metrics']['annual_return']
+    assert summary['baseline_stitched_oos_metrics']['annual_return'] != summary['baseline_full_sample_metrics']['annual_return']
+    assert summary['baseline_stitched_oos_metrics']['annual_return'] == pytest.approx(
+        summary['stitched_frozen_oos_metrics']['benchmark_return']
+    )
+
+    report_text = (output_dir / 'real_walkforward_report.md').read_text(encoding='utf-8')
+    assert 'Stitched Frozen OOS vs Baseline' in report_text

+ 191 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1a/regime.yaml

@@ -0,0 +1,191 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+  thresholds:
+    risk_off_down_hazard: 0.62
+    risk_off_stress: 0.85
+    risk_off_trend_floor: -0.10
+    crash_override_down_hazard: 0.72
+    trend_score: 0.45
+    trend_breadth_min: -0.05
+    trend_stress_max: 0.45
+    euphoric_crowding: 0.70
+    euphoric_rebound_hazard: 0.68
+    repair_hazard: 0.58
+    repair_stress_max: 0.85
+    repair_d_stress_max: 0.0
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1e-8
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35
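
Reviewer note on `frozen_validation.candidates`: each candidate carries only the keys it changes (`baseline` has `overrides: {}`), so the overrides have to be layered onto the base config without dropping sibling keys such as `policy.risk_off`. The commit does not show how that merge is done. The sketch below assumes a recursive dictionary merge, and `deep_merge` is a hypothetical helper, not a function from this repository.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts are merged key by key, leaves are replaced."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Values copied from the regime.yaml hunk above (subset of keys only).
base = {
    'policy': {'trend': 0.90, 'euphoric_late': 0.60, 'chop': 0.30, 'risk_off': 0.00},
    'trading': {'max_daily_exposure_change': 0.30, 'exposure_ladder_step': 0.10},
}
defensive_overrides = {
    'policy': {'trend': 0.80, 'euphoric_late': 0.30, 'chop': 0.20},
    'trading': {'max_daily_exposure_change': 0.20},
}

defensive_cfg = deep_merge(base, defensive_overrides)
baseline_cfg = deep_merge(base, {})
print(defensive_cfg['policy'])
print(defensive_cfg['trading'])
```

Under this assumption `defensive` keeps `risk_off: 0.00` and `exposure_ladder_step: 0.10` from the base while lowering the three policy targets and the daily step limit. A shallow `dict.update` would instead replace the whole `policy` block and lose `risk_off`.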

+ 213 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1a/test_policy.py

@@ -0,0 +1,213 @@
+import pandas as pd
+import pytest
+
+from data.sample_data import generate_synthetic_chinext50_data
+from config.loader import load_config
+from features.pipeline import build_feature_table
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+from model.policy import build_exposure_plan
+
+
+def test_exposure_plan_is_quantized_and_bounded() -> None:
+    config = load_config()
+    raw = generate_synthetic_chinext50_data(periods=400, seed=11)
+    featured = build_feature_table(raw)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+
+    ladder_step = float(config['trading']['exposure_ladder_step'])
+    max_step = float(config['trading']['max_daily_exposure_change'])
+    assert planned['target_exposure'].between(0.0, 1.0).all()
+    assert (planned['target_exposure'].diff().abs().dropna() <= max_step + 1e-12).all()
+    scaled = planned['target_exposure'].dropna() / ladder_step
+    assert ((scaled - scaled.round()).abs() < 1e-9).all()
+
+
+def test_state_machine_raises_if_invalid_signal_reappears_after_warmup() -> None:
+    idx = pd.date_range('2024-01-01', periods=3, freq='D')
+    df = pd.DataFrame(
+        {
+            'trend_score': [None, 0.6, 0.6],
+            'breadth_score': [None, 0.2, None],
+            'stress_score': [None, 0.1, 0.1],
+            'crowding_score': [None, 0.1, 0.1],
+            'repair_score': [None, 0.1, 0.1],
+            'down_hazard': [None, 0.3, 0.3],
+            'repair_hazard': [None, 0.4, 0.4],
+            'rebound_hazard': [None, 0.4, 0.4],
+            'd_trend': [0.0, 0.0, 0.0],
+            'd_breadth': [0.0, 0.0, 0.0],
+            'd_stress': [0.0, 0.0, 0.0],
+            'd_crowding': [0.0, 0.0, 0.0],
+            'score_acceleration': [0.0, 0.0, 0.0],
+            'core_score_ready': [False, True, False],
+            'hazard_ready': [False, True, True],
+            'breakout_dist_120': [0.0, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='invalid score/hazard after warmup'):
+        run_state_machine(df, load_config())
+
+
+def test_warmup_rows_have_zero_exposure() -> None:
+    idx = pd.date_range('2024-01-01', periods=2, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['warmup', 'chop'],
+            'days_in_state': [1, 1],
+            'down_hazard': [None, 0.3],
+            'repair_hazard': [None, 0.4],
+            'stress_score': [None, 0.1],
+            'trend_score': [None, 0.1],
+            'breadth_score': [None, 0.0],
+            'crowding_score': [None, 0.1],
+            'upper_wick_ratio_5': [None, 0.1],
+        },
+        index=idx,
+    )
+
+    planned = build_exposure_plan(df, load_config())
+    assert planned.loc[idx[0], 'target_exposure'] == 0.0
+    assert planned.loc[idx[0], 'veto_reason'] == 'warmup'
+
+
+def test_policy_does_not_swallow_invalid_trend_inputs() -> None:
+    idx = pd.date_range('2024-01-01', periods=1, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['trend'],
+            'days_in_state': [1],
+            'down_hazard': [0.3],
+            'repair_hazard': [0.4],
+            'stress_score': [0.1],
+            'trend_score': [0.5],
+            'breadth_score': [None],
+            'crowding_score': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='requires non-null "breadth_score"'):
+        build_exposure_plan(df, load_config())
+
+
+def test_candidate_overrides_produce_different_exposure_paths_under_finer_ladder() -> None:
+    idx = pd.date_range('2024-01-01', periods=4, freq='D')
+    state_df = pd.DataFrame(
+        {
+            'state': ['trend', 'trend', 'chop', 'repair'],
+            'days_in_state': [1, 2, 1, 2],
+            'down_hazard': [0.3, 0.3, 0.3, 0.3],
+            'repair_hazard': [0.6, 0.6, 0.6, 0.7],
+            'stress_score': [0.1, 0.1, 0.2, 0.2],
+            'trend_score': [0.5, 0.5, 0.1, 0.2],
+            'breadth_score': [0.0, 0.0, 0.0, 0.0],
+            'crowding_score': [0.1, 0.1, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    baseline_cfg = load_config()
+    pro_risk_cfg = load_config()
+    pro_risk_cfg['policy']['trend'] = 1.00
+    pro_risk_cfg['policy']['euphoric_late'] = 0.65
+    pro_risk_cfg['policy']['chop'] = 0.35
+    pro_risk_cfg['policy']['repair_rebound_base'] = 0.45
+    pro_risk_cfg['policy']['repair_rebound_max'] = 0.95
+    pro_risk_cfg['trading']['max_daily_exposure_change'] = 0.30
+
+    baseline = build_exposure_plan(state_df, baseline_cfg)
+    pro_risk = build_exposure_plan(state_df, pro_risk_cfg)
+
+    assert not baseline['target_exposure'].equals(pro_risk['target_exposure'])
+
+
+def _single_state_row(
+    *,
+    trend_score: float,
+    breadth_score: float,
+    stress_score: float,
+    crowding_score: float,
+    down_hazard: float,
+    repair_hazard: float,
+    rebound_hazard: float,
+    d_stress: float,
+) -> pd.DataFrame:
+    idx = pd.date_range('2024-02-01', periods=1, freq='D')
+    return pd.DataFrame(
+        {
+            'trend_score': [trend_score],
+            'breadth_score': [breadth_score],
+            'stress_score': [stress_score],
+            'crowding_score': [crowding_score],
+            'repair_score': [0.0],
+            'down_hazard': [down_hazard],
+            'repair_hazard': [repair_hazard],
+            'rebound_hazard': [rebound_hazard],
+            'd_trend': [0.0],
+            'd_breadth': [0.0],
+            'd_stress': [d_stress],
+            'd_crowding': [0.0],
+            'score_acceleration': [0.0],
+            'core_score_ready': [True],
+            'hazard_ready': [True],
+            'breakout_dist_120': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+
+def test_state_machine_overlap_prefers_trend_over_repair() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.30,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'trend'
+    assert out.iloc[0]['state'] == 'trend'
+
+
+def test_state_machine_trend_euphoric_branch_maps_to_euphoric_late() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.75,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'euphoric_late'
+    assert out.iloc[0]['state'] == 'euphoric_late'
+
+
+def test_state_machine_risk_off_still_has_top_priority() -> None:
+    df = _single_state_row(
+        trend_score=0.60,
+        breadth_score=0.05,
+        stress_score=0.30,
+        crowding_score=0.10,
+        down_hazard=0.80,
+        repair_hazard=0.80,
+        rebound_hazard=0.80,
+        d_stress=-0.10,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'risk_off'
+    assert out.iloc[0]['state'] == 'risk_off'

+ 191 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/regime.yaml

@@ -0,0 +1,191 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+  thresholds:
+    risk_off_down_hazard: 0.67
+    risk_off_stress: 0.89
+    risk_off_trend_floor: -0.14
+    crash_override_down_hazard: 0.77
+    trend_score: 0.45
+    trend_breadth_min: -0.05
+    trend_stress_max: 0.45
+    euphoric_crowding: 0.70
+    euphoric_rebound_hazard: 0.68
+    repair_hazard: 0.58
+    repair_stress_max: 0.85
+    repair_d_stress_max: 0.0
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1e-8
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35

+ 142 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/state_machine.py

@@ -0,0 +1,142 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import pandas as pd
+
+
+@dataclass
+class StateConfig:
+    min_state_duration: int = 3
+    crash_override: bool = True
+    thresholds: dict[str, float] | None = None
+
+
+DEFAULT_THRESHOLDS: dict[str, float] = {
+    'risk_off_down_hazard': 0.62,
+    'risk_off_stress': 0.85,
+    'risk_off_trend_floor': -0.10,
+    'crash_override_down_hazard': 0.72,
+    'trend_score': 0.45,
+    'trend_breadth_min': -0.05,
+    'trend_stress_max': 0.45,
+    'euphoric_crowding': 0.70,
+    'euphoric_rebound_hazard': 0.68,
+    'repair_hazard': 0.58,
+    'repair_stress_max': 0.85,
+    'repair_d_stress_max': 0.0,
+}
+
+
+REQUIRED_STATE_INPUTS: tuple[str, ...] = (
+    'trend_score',
+    'breadth_score',
+    'stress_score',
+    'crowding_score',
+    'down_hazard',
+    'repair_hazard',
+    'rebound_hazard',
+)
+
+
+def _row_is_ready(row: pd.Series) -> bool:
+    if not (bool(row.get('core_score_ready', False)) and bool(row.get('hazard_ready', False))):
+        return False
+    return all(pd.notna(row.get(column)) for column in REQUIRED_STATE_INPUTS)
+
+
+def _resolve_thresholds(config: dict[str, Any] | None) -> dict[str, float]:
+    state_cfg = (config or {}).get('state_machine', {})
+    raw_thresholds = state_cfg.get('thresholds', {}) if isinstance(state_cfg, dict) else {}
+    thresholds = dict(DEFAULT_THRESHOLDS)
+    if isinstance(raw_thresholds, dict):
+        for key, value in raw_thresholds.items():
+            thresholds[str(key)] = float(value)
+    return thresholds
+
+
+def _raw_state(row: pd.Series, thresholds: dict[str, float]) -> str:
+    if row['down_hazard'] >= thresholds['risk_off_down_hazard'] or (
+        row['stress_score'] >= thresholds['risk_off_stress']
+        and row['trend_score'] <= thresholds['risk_off_trend_floor']
+    ):
+        return 'risk_off'
+    if (
+        row['repair_hazard'] >= thresholds['repair_hazard']
+        and row['stress_score'] <= thresholds['repair_stress_max']
+        and row['d_stress'] <= thresholds['repair_d_stress_max']
+        and row['trend_score'] < thresholds['trend_score']
+    ):
+        return 'repair'
+    if (
+        row['trend_score'] >= thresholds['trend_score']
+        and row['breadth_score'] >= thresholds['trend_breadth_min']
+        and row['stress_score'] <= thresholds['trend_stress_max']
+    ):
+        if (
+            row['crowding_score'] >= thresholds['euphoric_crowding']
+            or row['rebound_hazard'] >= thresholds['euphoric_rebound_hazard']
+        ):
+            return 'euphoric_late'
+        return 'trend'
+    return 'chop'
+
+
+def run_state_machine(df: pd.DataFrame, config: dict[str, Any] | None = None) -> pd.DataFrame:
+    out = df.copy()
+    state_cfg = StateConfig(**((config or {}).get('state_machine', {})))
+    thresholds = _resolve_thresholds(config)
+
+    current_state = 'warmup'
+    days_in_state = 0
+    system_ready = False
+    active_days: list[int] = []
+    active_states: list[str] = []
+    proposed_states: list[str] = []
+
+    for ts, row in out.iterrows():
+        if not _row_is_ready(row):
+            if system_ready:
+                raise ValueError(f'invalid score/hazard after warmup at {pd.Timestamp(ts).date().isoformat()}')
+            proposal = 'warmup'
+            new_state = 'warmup'
+        else:
+            proposal = _raw_state(row, thresholds)
+            system_ready = True
+
+            crash_override = (
+                state_cfg.crash_override
+                and proposal == 'risk_off'
+                and row['down_hazard'] >= thresholds['crash_override_down_hazard']
+            )
+
+            if current_state == 'warmup':
+                new_state = proposal
+            elif crash_override:
+                new_state = 'risk_off'
+            elif proposal == current_state:
+                new_state = current_state
+            elif days_in_state >= state_cfg.min_state_duration:
+                new_state = proposal
+            else:
+                new_state = current_state
+
+        proposed_states.append(proposal)
+
+        if new_state == current_state:
+            days_in_state += 1
+        else:
+            current_state = new_state
+            days_in_state = 1
+
+        active_states.append(current_state)
+        active_days.append(days_in_state)
+
+    out['proposed_state'] = proposed_states
+    out['state'] = active_states
+    out['days_in_state'] = active_days
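+    out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()  # cumulative count of risk_off rows to date
+    out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0  # zeroed on rows outside risk_off
+    out['days_since_breakout'] = (out['breakout_dist_120'].fillna(0.0) > 0.0).groupby((out['breakout_dist_120'].fillna(0.0) <= 0.0).cumsum()).cumsum()  # consecutive rows with breakout_dist_120 > 0; resets on any row <= 0 or NaN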
+    return out
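
Review note on the hold rule in `run_state_machine`: the loop above accepts a new proposal only once the active state has lasted `min_state_duration` rows, unless the crash override fires. The sketch below replays that rule over a plain list of proposals so the behavior can be checked without pandas. The function name `apply_min_duration` and the `crash_flags` argument are hypothetical and not part of the commit; warmup rows are left out for brevity.

```python
from __future__ import annotations


def apply_min_duration(
    proposals: list[str],
    min_state_duration: int = 3,
    crash_flags: list[bool] | None = None,
) -> list[tuple[str, int]]:
    """Replay the hold rule over per-day proposals.

    Returns one (active_state, days_in_state) pair per day.
    crash_flags[i] marks days where a risk_off proposal also clears
    the crash-override threshold (hypothetical input for this sketch).
    """
    flags = crash_flags or [False] * len(proposals)
    current, days = 'warmup', 0
    out: list[tuple[str, int]] = []
    for proposal, crash in zip(proposals, flags):
        if current == 'warmup':
            new_state = proposal            # first ready row adopts the proposal
        elif crash and proposal == 'risk_off':
            new_state = 'risk_off'          # override bypasses the hold
        elif proposal == current or days < min_state_duration:
            new_state = current             # unchanged, or still inside the hold
        else:
            new_state = proposal            # hold satisfied, switch
        if new_state == current:
            days += 1
        else:
            current, days = new_state, 1
        out.append((current, days))
    return out


held = apply_min_duration(['trend', 'chop', 'chop', 'chop', 'chop'])
crashed = apply_min_duration(['trend', 'risk_off'], crash_flags=[False, True])
blocked = apply_min_duration(['trend', 'risk_off'])
```

With the default hold of 3, `held` stays in `trend` for three rows before switching to `chop`, while `crashed` moves to `risk_off` on the second row and `blocked` does not.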

+ 235 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1/test_policy.py

@@ -0,0 +1,235 @@
+import pandas as pd
+import pytest
+
+from data.sample_data import generate_synthetic_chinext50_data
+from config.loader import load_config
+from features.pipeline import build_feature_table
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+from model.policy import build_exposure_plan
+
+
+def test_exposure_plan_is_quantized_and_bounded() -> None:
+    config = load_config()
+    raw = generate_synthetic_chinext50_data(periods=400, seed=11)
+    featured = build_feature_table(raw)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+
+    ladder_step = float(config['trading']['exposure_ladder_step'])
+    max_step = float(config['trading']['max_daily_exposure_change'])
+    assert planned['target_exposure'].between(0.0, 1.0).all()
+    assert (planned['target_exposure'].diff().abs().dropna() <= max_step + 1e-12).all()
+    scaled = planned['target_exposure'].dropna() / ladder_step
+    assert ((scaled - scaled.round()).abs() < 1e-9).all()
+
+
+def test_state_machine_raises_if_invalid_signal_reappears_after_warmup() -> None:
+    idx = pd.date_range('2024-01-01', periods=3, freq='D')
+    df = pd.DataFrame(
+        {
+            'trend_score': [None, 0.6, 0.6],
+            'breadth_score': [None, 0.2, None],
+            'stress_score': [None, 0.1, 0.1],
+            'crowding_score': [None, 0.1, 0.1],
+            'repair_score': [None, 0.1, 0.1],
+            'down_hazard': [None, 0.3, 0.3],
+            'repair_hazard': [None, 0.4, 0.4],
+            'rebound_hazard': [None, 0.4, 0.4],
+            'd_trend': [0.0, 0.0, 0.0],
+            'd_breadth': [0.0, 0.0, 0.0],
+            'd_stress': [0.0, 0.0, 0.0],
+            'd_crowding': [0.0, 0.0, 0.0],
+            'score_acceleration': [0.0, 0.0, 0.0],
+            'core_score_ready': [False, True, False],
+            'hazard_ready': [False, True, True],
+            'breakout_dist_120': [0.0, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='invalid score/hazard after warmup'):
+        run_state_machine(df, load_config())
+
+
+def test_warmup_rows_have_zero_exposure() -> None:
+    idx = pd.date_range('2024-01-01', periods=2, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['warmup', 'chop'],
+            'days_in_state': [1, 1],
+            'down_hazard': [None, 0.3],
+            'repair_hazard': [None, 0.4],
+            'stress_score': [None, 0.1],
+            'trend_score': [None, 0.1],
+            'breadth_score': [None, 0.0],
+            'crowding_score': [None, 0.1],
+            'upper_wick_ratio_5': [None, 0.1],
+        },
+        index=idx,
+    )
+
+    planned = build_exposure_plan(df, load_config())
+    assert planned.loc[idx[0], 'target_exposure'] == 0.0
+    assert planned.loc[idx[0], 'veto_reason'] == 'warmup'
+
+
+def test_policy_does_not_swallow_invalid_trend_inputs() -> None:
+    idx = pd.date_range('2024-01-01', periods=1, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['trend'],
+            'days_in_state': [1],
+            'down_hazard': [0.3],
+            'repair_hazard': [0.4],
+            'stress_score': [0.1],
+            'trend_score': [0.5],
+            'breadth_score': [None],
+            'crowding_score': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='requires non-null "breadth_score"'):
+        build_exposure_plan(df, load_config())
+
+
+def test_candidate_overrides_produce_different_exposure_paths_under_finer_ladder() -> None:
+    idx = pd.date_range('2024-01-01', periods=4, freq='D')
+    state_df = pd.DataFrame(
+        {
+            'state': ['trend', 'trend', 'chop', 'repair'],
+            'days_in_state': [1, 2, 1, 2],
+            'down_hazard': [0.3, 0.3, 0.3, 0.3],
+            'repair_hazard': [0.6, 0.6, 0.6, 0.7],
+            'stress_score': [0.1, 0.1, 0.2, 0.2],
+            'trend_score': [0.5, 0.5, 0.1, 0.2],
+            'breadth_score': [0.0, 0.0, 0.0, 0.0],
+            'crowding_score': [0.1, 0.1, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    baseline_cfg = load_config()
+    pro_risk_cfg = load_config()
+    pro_risk_cfg['policy']['trend'] = 1.00
+    pro_risk_cfg['policy']['euphoric_late'] = 0.65
+    pro_risk_cfg['policy']['chop'] = 0.35
+    pro_risk_cfg['policy']['repair_rebound_base'] = 0.45
+    pro_risk_cfg['policy']['repair_rebound_max'] = 0.95
+    pro_risk_cfg['trading']['max_daily_exposure_change'] = 0.30
+
+    baseline = build_exposure_plan(state_df, baseline_cfg)
+    pro_risk = build_exposure_plan(state_df, pro_risk_cfg)
+
+    assert not baseline['target_exposure'].equals(pro_risk['target_exposure'])
+
+
+def _single_state_row(
+    *,
+    trend_score: float,
+    breadth_score: float,
+    stress_score: float,
+    crowding_score: float,
+    down_hazard: float,
+    repair_hazard: float,
+    rebound_hazard: float,
+    d_stress: float,
+) -> pd.DataFrame:
+    idx = pd.date_range('2024-02-01', periods=1, freq='D')
+    return pd.DataFrame(
+        {
+            'trend_score': [trend_score],
+            'breadth_score': [breadth_score],
+            'stress_score': [stress_score],
+            'crowding_score': [crowding_score],
+            'repair_score': [0.0],
+            'down_hazard': [down_hazard],
+            'repair_hazard': [repair_hazard],
+            'rebound_hazard': [rebound_hazard],
+            'd_trend': [0.0],
+            'd_breadth': [0.0],
+            'd_stress': [d_stress],
+            'd_crowding': [0.0],
+            'score_acceleration': [0.0],
+            'core_score_ready': [True],
+            'hazard_ready': [True],
+            'breakout_dist_120': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+
+def test_state_machine_overlap_prefers_trend_over_repair() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.30,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'trend'
+    assert out.iloc[0]['state'] == 'trend'
+
+
+def test_state_machine_trend_euphoric_branch_maps_to_euphoric_late() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.75,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'euphoric_late'
+    assert out.iloc[0]['state'] == 'euphoric_late'
+
+
+def test_state_machine_risk_off_still_has_top_priority() -> None:
+    df = _single_state_row(
+        trend_score=0.60,
+        breadth_score=0.05,
+        stress_score=0.30,
+        crowding_score=0.10,
+        down_hazard=0.80,
+        repair_hazard=0.80,
+        rebound_hazard=0.80,
+        d_stress=-0.10,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'risk_off'
+    assert out.iloc[0]['state'] == 'risk_off'
+
+
+def test_state_machine_risk_off_threshold_is_config_driven() -> None:
+    df = _single_state_row(
+        trend_score=0.20,
+        breadth_score=-0.10,
+        stress_score=0.30,
+        crowding_score=0.10,
+        down_hazard=0.65,
+        repair_hazard=0.40,
+        rebound_hazard=0.30,
+        d_stress=0.05,
+    )
+    strict_cfg = load_config()
+    strict_cfg['state_machine']['thresholds']['risk_off_down_hazard'] = 0.62
+    loose_cfg = load_config()
+    loose_cfg['state_machine']['thresholds']['risk_off_down_hazard'] = 0.68
+
+    out_strict = run_state_machine(df, strict_cfg)
+    out_loose = run_state_machine(df, loose_cfg)
+    assert out_strict.iloc[0]['proposed_state'] == 'risk_off'
+    assert out_loose.iloc[0]['proposed_state'] != 'risk_off'
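
Review note on threshold layering: `_resolve_thresholds` starts from `DEFAULT_THRESHOLDS` and overwrites any key present under `state_machine.thresholds`, which is why the test above can flip the outcome for `down_hazard=0.65` by setting the cutoff to 0.62 or 0.68. The sketch below reproduces that layering with a two-key subset of the defaults. The names `resolve` and `down_hazard_triggers_risk_off` are hypothetical, and it is an assumption that `load_config()` reads the `regime.yaml` shown in this commit.

```python
from __future__ import annotations

# Two entries copied from DEFAULT_THRESHOLDS in state_machine.py.
DEFAULTS: dict[str, float] = {
    'risk_off_down_hazard': 0.62,
    'crash_override_down_hazard': 0.72,
}


def resolve(overrides: dict[str, float] | None) -> dict[str, float]:
    """Layer config overrides on top of the in-code defaults."""
    merged = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        merged[str(key)] = float(value)
    return merged


def down_hazard_triggers_risk_off(down_hazard: float, thresholds: dict[str, float]) -> bool:
    """First clause of the risk_off test in _raw_state."""
    return down_hazard >= thresholds['risk_off_down_hazard']


code_defaults = resolve(None)
yaml_thresholds = resolve({'risk_off_down_hazard': 0.67})  # value from regime.yaml

fires_with_defaults = down_hazard_triggers_risk_off(0.65, code_defaults)
fires_with_yaml = down_hazard_triggers_risk_off(0.65, yaml_thresholds)
```

A reading of 0.65 sits between the in-code default (0.62) and the YAML value (0.67), so the same row is `risk_off` under one and not under the other. Keys missing from the override, such as `crash_override_down_hazard` here, keep their default.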

+ 191 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/regime.yaml

@@ -0,0 +1,191 @@
+version: 0.2
+project: chinext50_regime
+trading:
+  frequency: daily
+  execution_timing: next_open_approx
+  instrument_type: domestic_stock_etf
+  max_exposure: 1.0
+  min_exposure: 0.0
+  exposure_mode: ladder
+  exposure_ladder_step: 0.10
+  max_daily_exposure_change: 0.30
+  fee_bps_roundtrip: 8
+  slippage_bps_oneway: 4
+  extreme_day_move_threshold: 0.03
+  extreme_day_cost_multiplier: 1.5
+  gap_slippage_factor: 0.02
+  annualization: 252
+state_machine:
+  min_state_duration: 3
+  crash_override: true
+  thresholds:
+    risk_off_down_hazard: 0.67
+    risk_off_stress: 0.89
+    risk_off_trend_floor: -0.14
+    crash_override_down_hazard: 0.77
+    trend_score: 0.45
+    trend_breadth_min: -0.05
+    trend_stress_max: 0.45
+    euphoric_crowding: 0.70
+    euphoric_rebound_hazard: 0.68
+    repair_hazard: 0.58
+    repair_stress_max: 0.85
+    repair_d_stress_max: 0.0
+policy:
+  trend: 0.90
+  euphoric_late: 0.60
+  chop: 0.30
+  risk_off: 0.00
+  repair_rebound_base: 0.35
+  repair_rebound_max: 0.80
+evaluation:
+  objective: net_utility
+  positive_window_ratio_threshold: 0.60
+  upside_capture_min: 0.75
+  max_drawdown_ratio_max: 0.75
+  primary_window_success_upside_min: 0.25
+  primary_window_success_drawdown_ratio_max: 0.80
+  primary_window_success_turnover_max: 22.0
+  primary_window_success_require_positive_return: true
+  primary_window_success_ratio_min: 0.50
+  primary_window_success_ratio_target: 0.60
+  primary_window_min_rows: 180
+  utility_upside_target: 0.55
+  utility_turnover_penalty_start: 8.0
+  utility_turnover_penalty_rate: 0.010
+data_quality:
+  strict_mode_default: false
+  default_min_coverage: 0.95
+  critical_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - top1_contribution_5
+    - top10_contribution_5
+    - sector_concentration_20
+    - corr_spike_20
+    - dispersion_20
+  blocking_columns:
+    - open
+    - high
+    - low
+    - close
+    - volume
+    - hs300_close
+    - star50_close
+    - csi1000_close
+    - pct_constituents_above_20dma
+    - pct_constituents_above_60dma
+    - pct_new_high_20
+    - pct_new_low_20
+    - eq_weight_ret_5
+    - weighted_ret_5
+    - top3_contribution_5
+    - corr_spike_20
+    - dispersion_20
+  column_min_coverage: {}
+  breadth_integrity_min_unique_non_null: 3
+  breadth_integrity_max_dominant_value_ratio: 0.995
+  breadth_integrity_std_floor: 1e-8
+  breadth_semantic_require_official_index_weight: true
+  breadth_semantic_require_time_varying_membership: true
+  breadth_semantic_max_industry_unknown_ratio: 0.10
+  low_info_min_unique_non_null: 3
+  low_info_std_floor: 1e-8
+  low_info_max_dominant_ratio: 0.995
+  low_info_feature_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+  low_info_blocking_columns:
+    - concentration_spread_5
+    - breadth_thrust_5
+    - up_down_imbalance_20
+    - breadth_divergence
+    - top3_vs_top10_ratio_5
+    - top1_top3_pressure_5
+    - sector_concentration_change_20
+    - volume_z_20
+    - upper_wick_ratio_5
+    - range_pos_120
+frozen_validation:
+  window_mode: expanding
+  min_train_years: 2
+  test_years: 1
+  allow_partial_last_test: true
+  min_train_rows: 120
+  min_test_rows: 40
+  candidate_selection:
+    use_hard_constraints: true
+    upside_capture_min: 0.28
+    max_drawdown_ratio_vs_benchmark: 0.72
+    annual_turnover_soft_max: 18.0
+    annual_return_override_abs: 0.05
+    annual_return_override_ratio: 0.40
+    return_ratio_weight: 0.30
+    upside_weight: 0.30
+    drawdown_weight: 0.20
+    sharpe_delta_weight: 0.10
+    stability_weight: 0.10
+    turnover_penalty_per_unit: 0.015
+    score_cap: 1.20
+    upside_target: 0.45
+    drawdown_improvement_target: 0.35
+    sharpe_delta_shift: 0.05
+    sharpe_delta_scale: 0.15
+    turnover_penalty_start: 12.0
+    core_utility_floor: -0.05
+    core_utility_target: 0.10
+    fallback_mode: closest_to_feasible_frontier
+  candidates:
+    - id: defensive
+      overrides:
+        policy:
+          trend: 0.80
+          euphoric_late: 0.30
+          chop: 0.20
+          repair_rebound_base: 0.30
+          repair_rebound_max: 0.65
+        trading:
+          max_daily_exposure_change: 0.20
+    - id: baseline
+      overrides: {}
+    - id: balanced_capture
+      overrides:
+        policy:
+          trend: 0.95
+          euphoric_late: 0.65
+          chop: 0.35
+          repair_rebound_base: 0.40
+          repair_rebound_max: 0.85
+        trading:
+          max_daily_exposure_change: 0.30
+    - id: pro_risk
+      overrides:
+        policy:
+          trend: 1.00
+          euphoric_late: 0.70
+          chop: 0.45
+          repair_rebound_base: 0.50
+          repair_rebound_max: 0.95
+        trading:
+          max_daily_exposure_change: 0.35

+ 142 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/state_machine.py

@@ -0,0 +1,142 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import pandas as pd
+
+
+@dataclass
+class StateConfig:
+    min_state_duration: int = 3
+    crash_override: bool = True
+    thresholds: dict[str, float] | None = None
+
+
+DEFAULT_THRESHOLDS: dict[str, float] = {
+    'risk_off_down_hazard': 0.62,
+    'risk_off_stress': 0.85,
+    'risk_off_trend_floor': -0.10,
+    'crash_override_down_hazard': 0.72,
+    'trend_score': 0.45,
+    'trend_breadth_min': -0.05,
+    'trend_stress_max': 0.45,
+    'euphoric_crowding': 0.70,
+    'euphoric_rebound_hazard': 0.68,
+    'repair_hazard': 0.58,
+    'repair_stress_max': 0.85,
+    'repair_d_stress_max': 0.0,
+}
+
+
+REQUIRED_STATE_INPUTS: tuple[str, ...] = (
+    'trend_score',
+    'breadth_score',
+    'stress_score',
+    'crowding_score',
+    'down_hazard',
+    'repair_hazard',
+    'rebound_hazard',
+)
+
+
+def _row_is_ready(row: pd.Series) -> bool:
+    if not (bool(row.get('core_score_ready', False)) and bool(row.get('hazard_ready', False))):
+        return False
+    return all(pd.notna(row.get(column)) for column in REQUIRED_STATE_INPUTS)
+
+
+def _resolve_thresholds(config: dict[str, Any] | None) -> dict[str, float]:
+    state_cfg = (config or {}).get('state_machine', {})
+    raw_thresholds = state_cfg.get('thresholds', {}) if isinstance(state_cfg, dict) else {}
+    thresholds = dict(DEFAULT_THRESHOLDS)
+    if isinstance(raw_thresholds, dict):
+        for key, value in raw_thresholds.items():
+            thresholds[str(key)] = float(value)
+    return thresholds
+
+
+def _raw_state(row: pd.Series, thresholds: dict[str, float]) -> str:
+    if row['down_hazard'] >= thresholds['risk_off_down_hazard'] or (
+        row['stress_score'] >= thresholds['risk_off_stress']
+        and row['trend_score'] <= thresholds['risk_off_trend_floor']
+    ):
+        return 'risk_off'
+    if (
+        row['repair_hazard'] >= thresholds['repair_hazard']
+        and row['stress_score'] <= thresholds['repair_stress_max']
+        and row['d_stress'] <= thresholds['repair_d_stress_max']
+        and row['trend_score'] < thresholds['trend_score']
+    ):
+        return 'repair'
+    if (
+        row['trend_score'] >= thresholds['trend_score']
+        and row['breadth_score'] >= thresholds['trend_breadth_min']
+        and row['stress_score'] <= thresholds['trend_stress_max']
+    ):
+        if (
+            row['crowding_score'] >= thresholds['euphoric_crowding']
+            or row['rebound_hazard'] >= thresholds['euphoric_rebound_hazard']
+        ):
+            return 'euphoric_late'
+        return 'trend'
+    return 'chop'
+
+
+def run_state_machine(df: pd.DataFrame, config: dict[str, Any] | None = None) -> pd.DataFrame:
+    out = df.copy()
+    state_cfg = StateConfig(**((config or {}).get('state_machine', {})))
+    thresholds = _resolve_thresholds(config)
+
+    current_state = 'warmup'
+    days_in_state = 0
+    system_ready = False
+    active_days: list[int] = []
+    active_states: list[str] = []
+    proposed_states: list[str] = []
+
+    for ts, row in out.iterrows():
+        if not _row_is_ready(row):
+            if system_ready:
+                raise ValueError(f'invalid score/hazard after warmup at {pd.Timestamp(ts).date().isoformat()}')
+            proposal = 'warmup'
+            new_state = 'warmup'
+        else:
+            proposal = _raw_state(row, thresholds)
+            system_ready = True
+
+            crash_override = (
+                state_cfg.crash_override
+                and proposal == 'risk_off'
+                and row['down_hazard'] >= thresholds['crash_override_down_hazard']
+            )
+
+            if current_state == 'warmup':
+                new_state = proposal
+            elif crash_override:
+                new_state = 'risk_off'
+            elif proposal == current_state:
+                new_state = current_state
+            elif days_in_state >= state_cfg.min_state_duration:
+                new_state = proposal
+            else:
+                new_state = current_state
+
+        proposed_states.append(proposal)
+
+        if new_state == current_state:
+            days_in_state += 1
+        else:
+            current_state = new_state
+            days_in_state = 1
+
+        active_states.append(current_state)
+        active_days.append(days_in_state)
+
+    out['proposed_state'] = proposed_states
+    out['state'] = active_states
+    out['days_in_state'] = active_days
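+    out['days_since_riskoff'] = (out['state'] == 'risk_off').cumsum()  # cumulative count of risk_off rows to date
+    out.loc[out['state'] != 'risk_off', 'days_since_riskoff'] = 0  # zeroed on rows outside risk_off
+    out['days_since_breakout'] = (out['breakout_dist_120'].fillna(0.0) > 0.0).groupby((out['breakout_dist_120'].fillna(0.0) <= 0.0).cumsum()).cumsum()  # consecutive rows with breakout_dist_120 > 0; resets on any row <= 0 or NaN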
+    return out
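
Review note on `days_since_riskoff`: as written, the column is a cumulative count of all `risk_off` rows so far, zeroed on rows in other states. It does not restart when a new `risk_off` episode begins. Whether a per-episode counter was intended is not stated in this diff, and the consumer (`build_exposure_plan`) is not shown, so the second function below is only a possible alternative. Both function names are hypothetical.

```python
from itertools import accumulate


def cumulative_riskoff_count(states: list[str]) -> list[int]:
    """Mirror of the two-step computation in the diff: cumsum, then zero outside risk_off."""
    flags = [s == 'risk_off' for s in states]
    running = list(accumulate(int(f) for f in flags))
    return [count if flag else 0 for count, flag in zip(running, flags)]


def riskoff_run_length(states: list[str]) -> list[int]:
    """Alternative: consecutive rows spent in the current risk_off episode."""
    out: list[int] = []
    run = 0
    for s in states:
        run = run + 1 if s == 'risk_off' else 0
        out.append(run)
    return out


states = ['trend', 'risk_off', 'risk_off', 'chop', 'risk_off']
as_committed = cumulative_riskoff_count(states)
per_episode = riskoff_run_length(states)
```

The two agree during the first episode and diverge on the last row, where the committed logic reports 3 and the per-episode counter reports 1.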

+ 235 - 0
research/chinext50_regime_project/deliverables/block_backups_20260410/H1b1_retry1/test_policy.py

@@ -0,0 +1,235 @@
+import pandas as pd
+import pytest
+
+from data.sample_data import generate_synthetic_chinext50_data
+from config.loader import load_config
+from features.pipeline import build_feature_table
+from model.scores import build_scores
+from model.state_machine import run_state_machine
+from model.policy import build_exposure_plan
+
+
+def test_exposure_plan_is_quantized_and_bounded() -> None:
+    config = load_config()
+    raw = generate_synthetic_chinext50_data(periods=400, seed=11)
+    featured = build_feature_table(raw)
+    scored = build_scores(featured)
+    stated = run_state_machine(scored, config)
+    planned = build_exposure_plan(stated, config)
+
+    ladder_step = float(config['trading']['exposure_ladder_step'])
+    max_step = float(config['trading']['max_daily_exposure_change'])
+    assert planned['target_exposure'].between(0.0, 1.0).all()
+    assert (planned['target_exposure'].diff().abs().dropna() <= max_step + 1e-12).all()
+    scaled = planned['target_exposure'].dropna() / ladder_step
+    assert ((scaled - scaled.round()).abs() < 1e-9).all()
+
+
+def test_state_machine_raises_if_invalid_signal_reappears_after_warmup() -> None:
+    idx = pd.date_range('2024-01-01', periods=3, freq='D')
+    df = pd.DataFrame(
+        {
+            'trend_score': [None, 0.6, 0.6],
+            'breadth_score': [None, 0.2, None],
+            'stress_score': [None, 0.1, 0.1],
+            'crowding_score': [None, 0.1, 0.1],
+            'repair_score': [None, 0.1, 0.1],
+            'down_hazard': [None, 0.3, 0.3],
+            'repair_hazard': [None, 0.4, 0.4],
+            'rebound_hazard': [None, 0.4, 0.4],
+            'd_trend': [0.0, 0.0, 0.0],
+            'd_breadth': [0.0, 0.0, 0.0],
+            'd_stress': [0.0, 0.0, 0.0],
+            'd_crowding': [0.0, 0.0, 0.0],
+            'score_acceleration': [0.0, 0.0, 0.0],
+            'core_score_ready': [False, True, False],
+            'hazard_ready': [False, True, True],
+            'breakout_dist_120': [0.0, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='invalid score/hazard after warmup'):
+        run_state_machine(df, load_config())
+
+
+def test_warmup_rows_have_zero_exposure() -> None:
+    idx = pd.date_range('2024-01-01', periods=2, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['warmup', 'chop'],
+            'days_in_state': [1, 1],
+            'down_hazard': [None, 0.3],
+            'repair_hazard': [None, 0.4],
+            'stress_score': [None, 0.1],
+            'trend_score': [None, 0.1],
+            'breadth_score': [None, 0.0],
+            'crowding_score': [None, 0.1],
+            'upper_wick_ratio_5': [None, 0.1],
+        },
+        index=idx,
+    )
+
+    planned = build_exposure_plan(df, load_config())
+    assert planned.loc[idx[0], 'target_exposure'] == 0.0
+    assert planned.loc[idx[0], 'veto_reason'] == 'warmup'
+
+
+def test_policy_does_not_swallow_invalid_trend_inputs() -> None:
+    idx = pd.date_range('2024-01-01', periods=1, freq='D')
+    df = pd.DataFrame(
+        {
+            'state': ['trend'],
+            'days_in_state': [1],
+            'down_hazard': [0.3],
+            'repair_hazard': [0.4],
+            'stress_score': [0.1],
+            'trend_score': [0.5],
+            'breadth_score': [None],
+            'crowding_score': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+    with pytest.raises(ValueError, match='requires non-null "breadth_score"'):
+        build_exposure_plan(df, load_config())
+
+
+def test_candidate_overrides_produce_different_exposure_paths_under_finer_ladder() -> None:
+    idx = pd.date_range('2024-01-01', periods=4, freq='D')
+    state_df = pd.DataFrame(
+        {
+            'state': ['trend', 'trend', 'chop', 'repair'],
+            'days_in_state': [1, 2, 1, 2],
+            'down_hazard': [0.3, 0.3, 0.3, 0.3],
+            'repair_hazard': [0.6, 0.6, 0.6, 0.7],
+            'stress_score': [0.1, 0.1, 0.2, 0.2],
+            'trend_score': [0.5, 0.5, 0.1, 0.2],
+            'breadth_score': [0.0, 0.0, 0.0, 0.0],
+            'crowding_score': [0.1, 0.1, 0.1, 0.1],
+            'upper_wick_ratio_5': [0.1, 0.1, 0.1, 0.1],
+        },
+        index=idx,
+    )
+
+    baseline_cfg = load_config()
+    pro_risk_cfg = load_config()
+    pro_risk_cfg['policy']['trend'] = 1.00
+    pro_risk_cfg['policy']['euphoric_late'] = 0.65
+    pro_risk_cfg['policy']['chop'] = 0.35
+    pro_risk_cfg['policy']['repair_rebound_base'] = 0.45
+    pro_risk_cfg['policy']['repair_rebound_max'] = 0.95
+    pro_risk_cfg['trading']['max_daily_exposure_change'] = 0.30
+
+    baseline = build_exposure_plan(state_df, baseline_cfg)
+    pro_risk = build_exposure_plan(state_df, pro_risk_cfg)
+
+    assert not baseline['target_exposure'].equals(pro_risk['target_exposure'])
+
+
+def _single_state_row(
+    *,
+    trend_score: float,
+    breadth_score: float,
+    stress_score: float,
+    crowding_score: float,
+    down_hazard: float,
+    repair_hazard: float,
+    rebound_hazard: float,
+    d_stress: float,
+) -> pd.DataFrame:
+    idx = pd.date_range('2024-02-01', periods=1, freq='D')
+    return pd.DataFrame(
+        {
+            'trend_score': [trend_score],
+            'breadth_score': [breadth_score],
+            'stress_score': [stress_score],
+            'crowding_score': [crowding_score],
+            'repair_score': [0.0],
+            'down_hazard': [down_hazard],
+            'repair_hazard': [repair_hazard],
+            'rebound_hazard': [rebound_hazard],
+            'd_trend': [0.0],
+            'd_breadth': [0.0],
+            'd_stress': [d_stress],
+            'd_crowding': [0.0],
+            'score_acceleration': [0.0],
+            'core_score_ready': [True],
+            'hazard_ready': [True],
+            'breakout_dist_120': [0.1],
+            'upper_wick_ratio_5': [0.1],
+        },
+        index=idx,
+    )
+
+
+def test_state_machine_overlap_prefers_trend_over_repair() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.30,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'trend'
+    assert out.iloc[0]['state'] == 'trend'
+
+
+def test_state_machine_trend_euphoric_branch_maps_to_euphoric_late() -> None:
+    df = _single_state_row(
+        trend_score=0.50,
+        breadth_score=0.00,
+        stress_score=0.40,
+        crowding_score=0.75,
+        down_hazard=0.20,
+        repair_hazard=0.70,
+        rebound_hazard=0.20,
+        d_stress=-0.05,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'euphoric_late'
+    assert out.iloc[0]['state'] == 'euphoric_late'
+
+
+def test_state_machine_risk_off_still_has_top_priority() -> None:
+    df = _single_state_row(
+        trend_score=0.60,
+        breadth_score=0.05,
+        stress_score=0.30,
+        crowding_score=0.10,
+        down_hazard=0.80,
+        repair_hazard=0.80,
+        rebound_hazard=0.80,
+        d_stress=-0.10,
+    )
+    out = run_state_machine(df, load_config())
+    assert out.iloc[0]['proposed_state'] == 'risk_off'
+    assert out.iloc[0]['state'] == 'risk_off'
+
+
+def test_state_machine_risk_off_threshold_is_config_driven() -> None:
+    df = _single_state_row(
+        trend_score=0.20,
+        breadth_score=-0.10,
+        stress_score=0.30,
+        crowding_score=0.10,
+        down_hazard=0.65,
+        repair_hazard=0.40,
+        rebound_hazard=0.30,
+        d_stress=0.05,
+    )
+    strict_cfg = load_config()
+    strict_cfg['state_machine']['thresholds']['risk_off_down_hazard'] = 0.62
+    loose_cfg = load_config()
+    loose_cfg['state_machine']['thresholds']['risk_off_down_hazard'] = 0.68
+
+    out_strict = run_state_machine(df, strict_cfg)
+    out_loose = run_state_machine(df, loose_cfg)
+    assert out_strict.iloc[0]['proposed_state'] == 'risk_off'
+    assert out_loose.iloc[0]['proposed_state'] != 'risk_off'
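
Review note: `test_exposure_plan_is_quantized_and_bounded` checks two properties at once: every `target_exposure` is a multiple of `exposure_ladder_step`, and the day-over-day change never exceeds `max_daily_exposure_change`. One way a policy can satisfy both is to work in whole ladder rungs, as in the sketch below. This is not the implementation of `build_exposure_plan`; `next_exposure` is a hypothetical helper, and the step values `0.05` and `0.25` are made-up examples rather than the project's configuration.

```python
import math


def next_exposure(previous: float, raw_target: float, ladder_step: float, max_step: float) -> float:
    """Move from `previous` toward `raw_target` on a fixed ladder with a capped daily change."""
    # Work in integer rungs so the result is always a multiple of the ladder step.
    prev_rung = round(previous / ladder_step)
    top_rung = round(1.0 / ladder_step)
    target_rung = min(top_rung, max(0, round(raw_target / ladder_step)))
    # Largest whole number of rungs that fits inside the daily change cap.
    max_rungs = math.floor(max_step / ladder_step + 1e-9)
    move = max(-max_rungs, min(max_rungs, target_rung - prev_rung))
    return (prev_rung + move) * ladder_step


path = []
exposure = 0.0
for raw in [0.93, 0.93, 0.10]:
    exposure = next_exposure(exposure, raw, ladder_step=0.05, max_step=0.25)
    path.append(exposure)
print(path)
```

Capping the move in rungs rather than in raw exposure avoids the case where snapping to the ladder after clamping would push the daily change slightly past the cap.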

+ 250 - 0
research/chinext50_regime_project/deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.json

@@ -0,0 +1,250 @@
+{
+  "baseline": {
+    "dir": "outputs\\fullcode_seq_20260410_feedback_baseline_R1",
+    "annual_return_delta": 0.008913253822542266,
+    "drawdown_ratio_vs_baseline": 0.5753165515790345,
+    "upside_capture": 0.3242195755178677,
+    "hard_pass_window_ratio": 0.8,
+    "state_mix": {
+      "chop": 0.37827352085354027,
+      "euphoric_late": 0.07565470417070805,
+      "repair": 0.15809893307468478,
+      "risk_off": 0.3181377303588749,
+      "trend": 0.06983511154219205
+    },
+    "mean_target_exposure": 0.3157129000969932,
+    "state_conditioned_mean_target_exposure": {
+      "chop": 0.30666666666666664,
+      "euphoric_late": 0.5974358974358974,
+      "repair": 0.5423312883435583,
+      "risk_off": 0.008841463414634146,
+      "trend": 0.9444444444444444
+    },
+    "trend_plus_euphoric": 0.1454898157129001,
+    "annual_turnover": 15.866666666666665
+  },
+  "blocks": {
+    "D0": {
+      "dir": "outputs\\fullcode_seq_20260410_D0_diag_only",
+      "values": {
+        "annual_return_delta": 0.008913253822542266,
+        "drawdown_ratio_vs_baseline": 0.5753165515790345,
+        "utility_delta_vs_baseline": -0.006457814265319176,
+        "upside_capture": 0.3242195755178677,
+        "hard_pass_window_ratio": 0.8,
+        "risk_off": 0.3181377303588749,
+        "repair": 0.15809893307468478,
+        "trend_plus_euphoric": 0.1454898157129001,
+        "mean_target_exposure": 0.3157129000969932,
+        "annual_turnover": 15.866666666666665
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": 0.0,
+        "drawdown_ratio_vs_baseline": 0.0,
+        "utility_delta_vs_baseline": 0.0,
+        "upside_capture": 0.0,
+        "annual_turnover": 0.0,
+        "state_mix": {
+          "chop": 0.0,
+          "euphoric_late": 0.0,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        },
+        "mean_target_exposure": 0.0,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.0,
+          "euphoric_late": 0.0,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        }
+      }
+    },
+    "PreparatoryRepairThreshold": {
+      "dir": "outputs\\fullcode_seq_20260410_prep_repair_threshold_defaults",
+      "values": {
+        "annual_return_delta": 0.008913253822542266,
+        "drawdown_ratio_vs_baseline": 0.5753165515790345,
+        "utility_delta_vs_baseline": -0.006457814265319176,
+        "upside_capture": 0.3242195755178677,
+        "hard_pass_window_ratio": 0.8,
+        "risk_off": 0.3181377303588749,
+        "repair": 0.15809893307468478,
+        "trend_plus_euphoric": 0.1454898157129001,
+        "mean_target_exposure": 0.3157129000969932,
+        "annual_turnover": 15.866666666666665
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": 0.0,
+        "drawdown_ratio_vs_baseline": 0.0,
+        "utility_delta_vs_baseline": 0.0,
+        "upside_capture": 0.0,
+        "annual_turnover": 0.0,
+        "state_mix": {
+          "chop": 0.0,
+          "euphoric_late": 0.0,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        },
+        "mean_target_exposure": 0.0,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.0,
+          "euphoric_late": 0.0,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        }
+      }
+    },
+    "H1b1_L1": {
+      "dir": "outputs\\fullcode_seq_20260410_H1b1_L1",
+      "values": {
+        "annual_return_delta": 0.001597201647850044,
+        "drawdown_ratio_vs_baseline": 0.6539314181964493,
+        "utility_delta_vs_baseline": -0.051421427301753087,
+        "upside_capture": 0.34713525908163617,
+        "hard_pass_window_ratio": 1.0,
+        "risk_off": 0.31910766246362754,
+        "repair": 0.11542192046556742,
+        "trend_plus_euphoric": 0.15033947623666344,
+        "mean_target_exposure": 0.3466537342386033,
+        "annual_turnover": 15.89122807017544
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": -0.0073160521746922225,
+        "drawdown_ratio_vs_baseline": 0.07861486661741479,
+        "utility_delta_vs_baseline": -0.04496361303643391,
+        "upside_capture": 0.022915683563768496,
+        "annual_turnover": 0.024561403508775115,
+        "state_mix": {
+          "chop": 0.036857419980601325,
+          "euphoric_late": 0.004849660523763344,
+          "repair": -0.04267701260911737,
+          "risk_off": 0.0009699321047526577,
+          "trend": 0.0
+        },
+        "mean_target_exposure": 0.030940834141610085,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.0606230529595016,
+          "euphoric_late": 0.05919060858819891,
+          "repair": 0.059349383925349275,
+          "risk_off": 0.0021007858254874345,
+          "trend": 0.03472222222222221
+        }
+      }
+    },
+    "H1b1_L2": {
+      "dir": "outputs\\fullcode_seq_20260410_H1b1_L2",
+      "values": {
+        "annual_return_delta": -0.0018830546883454868,
+        "drawdown_ratio_vs_baseline": 0.660712885642315,
+        "utility_delta_vs_baseline": -0.06343526149008206,
+        "upside_capture": 0.3458391781185839,
+        "hard_pass_window_ratio": 1.0,
+        "risk_off": 0.31910766246362754,
+        "repair": 0.11348205625606207,
+        "trend_plus_euphoric": 0.15033947623666344,
+        "mean_target_exposure": 0.3457807953443259,
+        "annual_turnover": 15.842105263157894
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": -0.010796308510887753,
+        "drawdown_ratio_vs_baseline": 0.08539633406328051,
+        "utility_delta_vs_baseline": -0.05697744722476289,
+        "upside_capture": 0.021619602600716215,
+        "annual_turnover": -0.024561403508771562,
+        "state_mix": {
+          "chop": 0.038797284190106696,
+          "euphoric_late": 0.004849660523763344,
+          "repair": -0.04461687681862271,
+          "risk_off": 0.0009699321047526577,
+          "trend": 0.0
+        },
+        "mean_target_exposure": 0.030067895247332665,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.06054263565891477,
+          "euphoric_late": 0.05919060858819891,
+          "repair": 0.055959309947040015,
+          "risk_off": 0.0021007858254874345,
+          "trend": 0.03472222222222221
+        }
+      }
+    },
+    "H1b1_L3": {
+      "dir": "outputs\\fullcode_seq_20260410_H1b1_L3",
+      "values": {
+        "annual_return_delta": -0.00027225064206781724,
+        "drawdown_ratio_vs_baseline": 0.6579096752409642,
+        "utility_delta_vs_baseline": -0.05592534621329974,
+        "upside_capture": 0.34549456157117225,
+        "hard_pass_window_ratio": 1.0,
+        "risk_off": 0.31910766246362754,
+        "repair": 0.1096023278370514,
+        "trend_plus_euphoric": 0.15033947623666344,
+        "mean_target_exposure": 0.3447138700290979,
+        "annual_turnover": 15.645614035087721
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": -0.009185504464610084,
+        "drawdown_ratio_vs_baseline": 0.08259312366192972,
+        "utility_delta_vs_baseline": -0.049467531947980564,
+        "upside_capture": 0.02127498605330458,
+        "annual_turnover": -0.22105263157894406,
+        "state_mix": {
+          "chop": 0.042677012609117326,
+          "euphoric_late": 0.004849660523763344,
+          "repair": -0.04849660523763338,
+          "risk_off": 0.0009699321047526577,
+          "trend": 0.0
+        },
+        "mean_target_exposure": 0.029000969932104714,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.0599231950844854,
+          "euphoric_late": 0.05919060858819891,
+          "repair": 0.05678375590422935,
+          "risk_off": 0.0021007858254874345,
+          "trend": 0.03472222222222221
+        }
+      }
+    },
+    "H1b2_direct_from_R1": {
+      "dir": "outputs\\fullcode_seq_20260410_H1b2_direct_from_R1",
+      "values": {
+        "annual_return_delta": 0.005272696663371601,
+        "drawdown_ratio_vs_baseline": 0.5753165515790345,
+        "utility_delta_vs_baseline": -0.0181351173653614,
+        "upside_capture": 0.32302824960734805,
+        "hard_pass_window_ratio": 0.8,
+        "risk_off": 0.3181377303588749,
+        "repair": 0.15809893307468478,
+        "trend_plus_euphoric": 0.1425800193986421,
+        "mean_target_exposure": 0.3151309408341416,
+        "annual_turnover": 15.964912280701753
+      },
+      "delta_vs_R1": {
+        "annual_return_delta": -0.0036405571591706654,
+        "drawdown_ratio_vs_baseline": 0.0,
+        "utility_delta_vs_baseline": -0.011677303100042224,
+        "upside_capture": -0.001191325910519625,
+        "annual_turnover": 0.09824561403508802,
+        "state_mix": {
+          "chop": 0.002909796314257973,
+          "euphoric_late": -0.0029097963142580008,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        },
+        "mean_target_exposure": -0.0005819592628515946,
+        "state_conditioned_mean_target_exposure": {
+          "chop": 0.0007124681933842192,
+          "euphoric_late": -0.00010256410256415105,
+          "repair": 0.0,
+          "risk_off": 0.0,
+          "trend": 0.0
+        }
+      }
+    }
+  }
+}

+ 75 - 0
research/chinext50_regime_project/deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.md

@@ -0,0 +1,75 @@
+# Chinext50 Post-B3 Feedback Sequence Execution Summary (2026-04-10)
+
+Baseline (R1): `outputs\fullcode_seq_20260410_feedback_baseline_R1`
+- annual_return_delta: 0.008913
+- drawdown_ratio_vs_baseline: 0.575317
+- upside_capture: 0.324220
+- mean_target_exposure: 0.315713
+- trend_plus_euphoric: 0.145490
+
+## D0
+- output_dir: `outputs\fullcode_seq_20260410_D0_diag_only`
+- annual_return_delta: 0.008913 (delta_vs_R1 +0.000000)
+- drawdown_ratio_vs_baseline: 0.575317 (delta_vs_R1 +0.000000)
+- upside_capture: 0.324220 (delta_vs_R1 +0.000000)
+- hard_pass_window_ratio: 0.800000
+- risk_off: 0.318138
+- repair: 0.158099
+- trend_plus_euphoric: 0.145490
+- mean_target_exposure: 0.315713 (delta_vs_R1 +0.000000)
+
+## PreparatoryRepairThreshold
+- output_dir: `outputs\fullcode_seq_20260410_prep_repair_threshold_defaults`
+- annual_return_delta: 0.008913 (delta_vs_R1 +0.000000)
+- drawdown_ratio_vs_baseline: 0.575317 (delta_vs_R1 +0.000000)
+- upside_capture: 0.324220 (delta_vs_R1 +0.000000)
+- hard_pass_window_ratio: 0.800000
+- risk_off: 0.318138
+- repair: 0.158099
+- trend_plus_euphoric: 0.145490
+- mean_target_exposure: 0.315713 (delta_vs_R1 +0.000000)
+
+## H1b1_L1
+- output_dir: `outputs\fullcode_seq_20260410_H1b1_L1`
+- annual_return_delta: 0.001597 (delta_vs_R1 -0.007316)
+- drawdown_ratio_vs_baseline: 0.653931 (delta_vs_R1 +0.078615)
+- upside_capture: 0.347135 (delta_vs_R1 +0.022916)
+- hard_pass_window_ratio: 1.000000
+- risk_off: 0.319108
+- repair: 0.115422
+- trend_plus_euphoric: 0.150339
+- mean_target_exposure: 0.346654 (delta_vs_R1 +0.030941)
+
+## H1b1_L2
+- output_dir: `outputs\fullcode_seq_20260410_H1b1_L2`
+- annual_return_delta: -0.001883 (delta_vs_R1 -0.010796)
+- drawdown_ratio_vs_baseline: 0.660713 (delta_vs_R1 +0.085396)
+- upside_capture: 0.345839 (delta_vs_R1 +0.021620)
+- hard_pass_window_ratio: 1.000000
+- risk_off: 0.319108
+- repair: 0.113482
+- trend_plus_euphoric: 0.150339
+- mean_target_exposure: 0.345781 (delta_vs_R1 +0.030068)
+
+## H1b1_L3
+- output_dir: `outputs\fullcode_seq_20260410_H1b1_L3`
+- annual_return_delta: -0.000272 (delta_vs_R1 -0.009186)
+- drawdown_ratio_vs_baseline: 0.657910 (delta_vs_R1 +0.082593)
+- upside_capture: 0.345495 (delta_vs_R1 +0.021275)
+- hard_pass_window_ratio: 1.000000
+- risk_off: 0.319108
+- repair: 0.109602
+- trend_plus_euphoric: 0.150339
+- mean_target_exposure: 0.344714 (delta_vs_R1 +0.029001)
+
+## H1b2_direct_from_R1
+- output_dir: `outputs\fullcode_seq_20260410_H1b2_direct_from_R1`
+- annual_return_delta: 0.005273 (delta_vs_R1 -0.003641)
+- drawdown_ratio_vs_baseline: 0.575317 (delta_vs_R1 +0.000000)
+- upside_capture: 0.323028 (delta_vs_R1 -0.001191)
+- hard_pass_window_ratio: 0.800000
+- risk_off: 0.318138
+- repair: 0.158099
+- trend_plus_euphoric: 0.142580
+- mean_target_exposure: 0.315131 (delta_vs_R1 -0.000582)
+
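
Review note: each `delta_vs_R1` figure in the summary is the block's value minus the R1 baseline value for the same metric. The sketch below recomputes the `H1b1_L1` deltas from the numbers in the accompanying JSON file; `delta_vs_baseline` is an illustrative helper name, not a function from the repository.

```python
# Values copied from the execution summary JSON in this change.
baseline_r1 = {
    "annual_return_delta": 0.008913253822542266,
    "drawdown_ratio_vs_baseline": 0.5753165515790345,
    "upside_capture": 0.3242195755178677,
    "mean_target_exposure": 0.3157129000969932,
}
h1b1_l1 = {
    "annual_return_delta": 0.001597201647850044,
    "drawdown_ratio_vs_baseline": 0.6539314181964493,
    "upside_capture": 0.34713525908163617,
    "mean_target_exposure": 0.3466537342386033,
}


def delta_vs_baseline(values, baseline):
    """Per-metric difference: block value minus baseline value."""
    return {key: values[key] - baseline[key] for key in baseline}


deltas = delta_vs_baseline(h1b1_l1, baseline_r1)
for key, value in deltas.items():
    print(f"{key}: {value:+.6f}")
```

Rounded to six decimals, the results agree with the `H1b1_L1` section: a lower annual return delta, a higher drawdown ratio, and higher upside capture and mean target exposure than R1.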

+ 17 - 0
research/chinext50_regime_project/deliverables/fullcode_guidance_closure_2026-04-24.md

@@ -0,0 +1,17 @@
+# Fullcode Guidance Closure (2026-04-24)
+
+Change: `execute-fullcode-guidance-20260410`
+
+Closure decision:
+- Archive the change without further `H2/H3` tuning.
+- Treat the fullcode branch as a completed research/diagnostic line, not the production-critical path.
+
+Why this branch is being closed:
+- The sequence completed the high-value semantic repairs (`B1` to `B4`) and established correct stitched-OOS evaluation semantics.
+- Subsequent H1 ladder attempts produced limited or negative incremental value under the project guardrails.
+- The lighter `regime-lite` branch achieved a validated promotion result on real PIT data and now serves the project’s practical end-state more directly.
+
+Implication for project convergence:
+- Fullcode remains available as a reference research branch.
+- The project’s preferred operational baseline is now the validated lite runtime profile `promoted_fast_entry_hold3`.
+- Further fullcode policy tuning is intentionally out of scope unless a future project-level failure reopens that path.

+ 64 - 0
research/chinext50_regime_project/deliverables/gpt_pro_blockers_harden_derived_breadth_2026-04-09.md

@@ -0,0 +1,64 @@
+# GPT Pro Blocker Log - Harden Derived Breadth (2026-04-09)
+
+## Scope
+
+- Change: `harden-derived-breadth-production`
+- Pipeline: `pipelines/ingest_real_data.py` with `--derive-breadth` and full ChiNext50 constituents
+- Date window: `2020-01-01` to `2026-04-09`
+
+## Command Context
+
+```powershell
+py pipelines/ingest_real_data.py `
+  --provider csv `
+  --market-csv outputs/ingestion_derived_smoke_20260409_v12/raw/market.csv `
+  --hs300-csv outputs/ingestion_derived_smoke_20260409_v12/raw/hs300.csv `
+  --star50-csv outputs/ingestion_derived_smoke_20260409_v12/raw/star50.csv `
+  --csi1000-csv outputs/ingestion_derived_smoke_20260409_v12/raw/csi1000.csv `
+  --derive-breadth `
+  --breadth-min-active-constituents 20 `
+  --breadth-cache-dir outputs/ingestion_derived_full50_20260409_v2/raw/constituent_history `
+  --start-date 2020-01-01 `
+  --end-date 2026-04-09 `
+  --mairui-licence AE17EE23-AAE4-492F-A959-EC883DFA5A76 `
+  --strict-data `
+  --output-dir outputs/ingestion_derived_full50_20260409_v2
+```
+
+## Resolved in This Round
+
+1. Strict false-positive blocker was fixed:
+- Previous failure: `top10_contribution_5: std_below_floor`.
+- Fix: only trigger `std_below_floor` when a low standard deviation is accompanied by low uniqueness or dominant repeated values.
+- Current status: strict breadth integrity passes.
+
+2. Cache efficiency improved:
+- Added metadata cache (`_meta_cache.json`) under breadth cache dir.
+- First full50 run (cache warm-up): ~`1199.9s`.
+- Second full50 run (cache hit): ~`26.9s`.
+- Current summary shows `meta_cache.hit_count=50`, `meta_cache.miss_count=0`.
+
+## Remaining Blocker for GPT Pro Guidance
+
+### B-01: Industry metadata source remains degraded
+
+- Evidence file:
+  - `outputs/ingestion_derived_full50_20260409_v2/raw/breadth_derivation_summary.json`
+- Current metrics:
+  - `industry_unknown_ratio = 1.0`
+  - `industry_unique_count = 1`
+  - `sector_concentration_mode = weight_hhi_proxy`
+  - `metadata_error_symbol_count = 50`
+- Representative error:
+  - `akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests`
+
+### Why it matters
+
+- The full50 pipeline is runnable and passes strict checks, but sector concentration currently falls back to the HHI proxy due to missing industry coverage.
+- This weakens interpretability of sector crowding diagnostics and may reduce regime explainability.
+
+### Questions for GPT Pro
+
+1. Should we promote a new primary metadata source (instead of Akshare `stock_individual_info_em`) for industry + float shares under sustained API instability?
+2. Should unknown-industry ratio become a strict blocking signal once a stable alternative source is introduced?
+3. Is it preferable to build an offline sector mapping layer (snapshot + periodic refresh) to remove runtime dependency on unstable metadata APIs?
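
Review note: the strict false-positive fix listed under "Resolved in This Round" flags `std_below_floor` only when a low standard deviation coincides with low uniqueness or a dominant repeated value. A minimal sketch of that rule follows. The function shape, the threshold names, and their default values are hypothetical placeholders and are not taken from the repository's integrity checks.

```python
from collections import Counter
from statistics import pstdev


def std_below_floor(values, std_floor=1e-4, min_unique_ratio=0.05, max_mode_share=0.5):
    """Flag a series only when low dispersion coincides with repeated values.

    All thresholds are illustrative assumptions.
    """
    clean = [v for v in values if v is not None]
    if len(clean) < 2:
        return False
    if pstdev(clean) >= std_floor:
        return False                      # dispersion is fine, nothing to flag
    counts = Counter(clean)
    unique_ratio = len(counts) / len(clean)
    mode_share = counts.most_common(1)[0][1] / len(clean)
    # A low std alone is not enough: require evidence of a stuck or repeated feed.
    return unique_ratio < min_unique_ratio or mode_share > max_mode_share


smooth_series = [0.5 + i * 1e-7 for i in range(100)]   # tiny but genuine variation
stuck_series = [0.5] * 100                              # constant, likely stale data
print(std_below_floor(smooth_series), std_below_floor(stuck_series))
```

A slowly varying but genuinely changing series passes, while a constant series is still flagged. That is the distinction the fix describes for `top10_contribution_5`.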

BIN
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09.zip


+ 42 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/QUESTIONS_FOR_GPT_PRO.md

@@ -0,0 +1,42 @@
+# Questions For GPT Pro (Harden Derived Breadth)
+
+## Context
+
+- Change archived: `2026-04-09-harden-derived-breadth-production`
+- Full50 strict ingestion now passes and e2e pipelines run successfully.
+- Remaining blocker is metadata quality degradation from upstream provider instability.
+
+## Evidence Snapshot
+
+- `industry_unknown_ratio = 1.0`
+- `sector_concentration_mode = weight_hhi_proxy`
+- metadata errors on all 50 symbols:
+  - Akshare: `RemoteDisconnected`
+  - Mairui: `HTTP 429 Too Many Requests`
+- Main evidence file:
+  - `outputs/ingestion_derived_full50_20260409_v2/raw/breadth_derivation_summary.json`
+
+## Ask
+
+1. Recommend the best production-grade industry metadata source strategy:
+   - keep Akshare primary + Mairui fallback,
+   - switch primary source,
+   - or build offline sector snapshot + periodic refresh.
+
+2. Define strict-gate policy for metadata quality:
+   - Should `industry_unknown_ratio` become a blocking threshold?
+   - If yes, what threshold and warmup exception rules do you recommend?
+
+3. Suggest robust anti-rate-limit design:
+   - retry/backoff and jitter policy,
+   - local cache TTL and invalidation rules,
+   - batch schedule for refreshing metadata.
+
+4. Confirm whether `weight_hhi_proxy` is acceptable as a long-term fallback,
+   and how to quantify its impact on the quality of crowding diagnostics.
+
+## Desired Output From GPT Pro
+
+- A concrete implementation checklist with parameter-level recommendations.
+- Suggested OpenSpec change boundary (scope split if needed).
+- Any required metric acceptance criteria before moving to production.
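
Review note: question 3 asks for a retry/backoff and jitter policy. For reference, `_load_json_url` in this bundle retries with a linearly growing delay (`1.0 + attempt * 1.5`) and no jitter. The sketch below shows one common alternative, capped exponential backoff with full jitter. It is offered as a starting point for discussion, not as a recommendation already agreed for this project; `call_with_backoff` and all parameter defaults are assumptions.

```python
import random
import time


def call_with_backoff(
    fetch,
    *,
    max_attempts=5,
    base_delay=1.0,
    max_delay=30.0,
    retry_on=(Exception,),
    sleep=time.sleep,
    rng=random.random,
):
    """Call `fetch()` and retry on failure with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise                     # out of attempts, surface the last error
            # The delay ceiling doubles each attempt and is capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the ceiling to spread out retries.
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters are injectable so the policy can be unit-tested without real waiting. In production, `retry_on` would normally be narrowed to the transient errors seen here, such as HTTP 429 responses and dropped connections.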

+ 925 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/data/breadth_builder.py

@@ -0,0 +1,925 @@
+from __future__ import annotations
+
+import importlib
+import json
+import time
+from pathlib import Path
+from typing import Any, Callable, Mapping
+from urllib.error import HTTPError, URLError
+from urllib.parse import urlencode
+from urllib.request import Request, urlopen
+
+import numpy as np
+import pandas as pd
+
+
+BREADTH_REQUIRED_COLUMNS: tuple[str, ...] = (
+    'pct_constituents_above_20dma',
+    'pct_constituents_above_60dma',
+    'pct_new_high_20',
+    'pct_new_low_20',
+    'eq_weight_ret_5',
+    'weighted_ret_5',
+    'top3_contribution_5',
+    'top1_contribution_5',
+    'top10_contribution_5',
+    'sector_concentration_20',
+    'corr_spike_20',
+    'dispersion_20',
+)
+
+MAIRUI_BASE_URL = 'https://api.mairuiapi.com'
+DEFAULT_WARMUP_OBSERVATIONS: dict[str, int] = {
+    'pct_constituents_above_20dma': 40,
+    'pct_constituents_above_60dma': 60,
+    'pct_new_high_20': 40,
+    'pct_new_low_20': 40,
+    'sector_concentration_20': 40,
+    'corr_spike_20': 20,
+    'dispersion_20': 20,
+    'concentration_spread_5': 20,
+}
+CACHE_BOUNDARY_TOLERANCE_DAYS = 10
+META_FETCH_MAX_ATTEMPTS = 3
+
+
+def _to_yyyymmdd(value: str | None) -> str | None:
+    if value is None:
+        return None
+    return pd.Timestamp(value).strftime('%Y%m%d')
+
+
+def _load_akshare() -> Any:
+    try:
+        return importlib.import_module('akshare')
+    except ImportError as exc:
+        raise RuntimeError('derived breadth requires dependency "akshare". Install it first.') from exc
+
+
+def _load_json_url(url: str) -> Any:
+    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
+    last_error: Exception | None = None
+    for attempt in range(5):
+        try:
+            with urlopen(request, timeout=30) as resp:
+                payload = json.loads(resp.read().decode('utf-8'))
+            if isinstance(payload, dict) and 'error' in payload:
+                raise ValueError(f'Mairui API error: {payload["error"]}')
+            return payload
+        except HTTPError as exc:
+            last_error = exc
+            if exc.code not in {429, 500, 502, 503, 504} or attempt == 4:
+                raise
+            time.sleep(1.0 + attempt * 1.5)
+        except URLError as exc:
+            last_error = exc
+            if attempt == 4:
+                raise
+            time.sleep(1.0 + attempt * 1.5)
+    if last_error is not None:
+        raise last_error
+    raise RuntimeError(f'Failed to fetch url: {url}')
+
+
+def _normalize_close_panel(df: pd.DataFrame, source_label: str) -> pd.DataFrame:
+    rename_map = {
+        'date': 'date',
+        'close': 'close',
+        'c': 'close',
+        't': 'date',
+    }
+    out = df.rename(columns=rename_map).copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+    if 'date' not in out.columns or 'close' not in out.columns:
+        raise ValueError(f'{source_label} must contain date and close columns.')
+    out['date'] = pd.to_datetime(out['date'], errors='coerce')
+    out['close'] = pd.to_numeric(out['close'], errors='coerce')
+    out = out.dropna(subset=['date'])
+    out = out[['date', 'close']].drop_duplicates(subset=['date'], keep='last').sort_values('date')
+    return out.set_index('date')
+
+
+def _fetch_constituents_akshare(index_symbol: str) -> pd.DataFrame:
+    ak = _load_akshare()
+    raw = ak.index_stock_cons(symbol=index_symbol)
+    if raw is None or raw.empty:
+        raise ValueError(f'Akshare returned empty constituents for index {index_symbol}.')
+    out = raw.copy()
+    out.columns = [str(col).strip().lower() for col in out.columns]
+
+    code_col = next(
+        (
+            col
+            for col in out.columns
+            if out[col].astype(str).str.extract(r'(\d{6})', expand=False).notna().mean() > 0.5
+        ),
+        out.columns[0],
+    )
+    date_col = next(
+        (
+            col
+            for col in out.columns
+            if out[col]
+            .astype(str)
+            .str.strip()
+            .str.match(r'^(\d{4}[-/]\d{1,2}[-/]\d{1,2}|\d{8})$')
+            .mean()
+            > 0.5
+        ),
+        out.columns[min(2, len(out.columns) - 1)],
+    )
+    name_col = next((col for col in out.columns if col not in {code_col, date_col}), out.columns[min(1, len(out.columns) - 1)])
+
+    entry_raw = out[date_col].astype(str).str.strip()
+    entry_mask = entry_raw.str.match(r'^(\d{4}[-/]\d{1,2}[-/]\d{1,2}|\d{8})$', na=False)
+    rows = pd.DataFrame(
+        {
+            'symbol': out[code_col].astype(str).str.extract(r'(\d{6})', expand=False),
+            'name': out[name_col].astype(str),
+            'entry_date': pd.to_datetime(entry_raw.where(entry_mask), errors='coerce'),
+        }
+    )
+    rows = rows.dropna(subset=['symbol']).drop_duplicates(subset=['symbol'], keep='last').sort_values('symbol')
+    if rows.empty:
+        raise ValueError(f'No valid constituent symbols parsed for index {index_symbol}.')
+    return rows.reset_index(drop=True)
+
+
+def _fetch_stock_history_akshare(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+    ak = _load_akshare()
+    st = _to_yyyymmdd(start_date) or '20050101'
+    et = _to_yyyymmdd(end_date) or pd.Timestamp.today().strftime('%Y%m%d')
+    try:
+        raw = ak.stock_zh_a_hist(symbol=symbol, period='daily', start_date=st, end_date=et, adjust='')
+    except Exception as primary_exc:
+        # The Eastmoney endpoint is occasionally unstable; fall back to the Tencent endpoint.
+        return _fetch_stock_history_akshare_tx(symbol=symbol, start_date=start_date, end_date=end_date, primary_exc=primary_exc)
+    if raw is None or raw.empty:
+        return _fetch_stock_history_akshare_tx(symbol=symbol, start_date=start_date, end_date=end_date, primary_exc=ValueError('empty panel'))
+
+    date_col = raw.columns[0]
+    close_col = None
+    lower_cols = [str(col).strip().lower() for col in raw.columns]
+    if 'close' in lower_cols:
+        close_col = raw.columns[lower_cols.index('close')]
+    elif len(raw.columns) >= 3:
+        close_col = raw.columns[2]
+    else:
+        numeric_cols = [
+            col for col in raw.columns[1:] if pd.to_numeric(raw[col], errors='coerce').notna().mean() > 0.5
+        ]
+        if numeric_cols:
+            close_col = numeric_cols[0]
+
+    if close_col is None:
+        raise ValueError(f'Unable to detect close column from Akshare history for symbol {symbol}.')
+
+    normalized = raw.rename(columns={date_col: 'date', close_col: 'close'})[['date', 'close']]
+    return _normalize_close_panel(normalized, source_label=f'akshare_stock_{symbol}')
+
+
+def _fetch_stock_history_akshare_tx(
+    *,
+    symbol: str,
+    start_date: str | None,
+    end_date: str | None,
+    primary_exc: Exception,
+) -> pd.DataFrame:
+    ak = _load_akshare()
+    tx_symbol = f"{'sh' if symbol.startswith(('6', '9')) else 'sz'}{symbol}"
+    st = _to_yyyymmdd(start_date) or '20050101'
+    et = _to_yyyymmdd(end_date) or pd.Timestamp.today().strftime('%Y%m%d')
+    try:
+        raw = ak.stock_zh_a_hist_tx(symbol=tx_symbol, start_date=st, end_date=et, adjust='')
+    except Exception as tx_exc:
+        raise ValueError(f'Akshare history failed for {symbol}: primary={primary_exc}; tx={tx_exc}') from tx_exc
+    if raw is None or raw.empty:
+        raise ValueError(f'Akshare history failed for {symbol}: primary={primary_exc}; tx=empty panel')
+    normalized = raw[['date', 'close']]
+    return _normalize_close_panel(normalized, source_label=f'akshare_tx_stock_{symbol}')
+
+
+def _infer_mairui_exchange(symbol: str) -> str:
+    return 'SH' if symbol.startswith(('6', '9')) else 'SZ'
+
+
+def _fetch_stock_history_mairui(symbol: str, start_date: str | None, end_date: str | None, licence: str) -> pd.DataFrame:
+    if not licence:
+        raise ValueError('Mairui licence is required for fallback stock history fetch.')
+    code = f'{symbol}.{_infer_mairui_exchange(symbol)}'
+    path = f'/hsstock/history/{code}/d/n/{licence}'
+    params: dict[str, str] = {}
+    st = _to_yyyymmdd(start_date)
+    et = _to_yyyymmdd(end_date)
+    if st:
+        params['st'] = st
+    if et:
+        params['et'] = et
+    if params:
+        path = f'{path}?{urlencode(params)}'
+    payload = _load_json_url(f'{MAIRUI_BASE_URL}{path}')
+    if not isinstance(payload, list) or not payload:
+        raise ValueError(f'Mairui returned empty history for symbol {symbol}.')
+    return _normalize_close_panel(pd.DataFrame(payload), source_label=f'mairui_stock_{symbol}')
+
+
+def _to_float_or_none(value: Any) -> float | None:
+    if value is None:
+        return None
+    text = str(value).strip().replace(',', '')
+    if not text:
+        return None
+    try:
+        return float(text)
+    except ValueError:
+        return None
+
+
+def _load_cached_close_history(path: Path) -> pd.DataFrame | None:
+    if not path.exists():
+        return None
+    try:
+        cached = pd.read_csv(path)
+    except Exception:
+        return None
+    if cached.empty:
+        return None
+    try:
+        return _normalize_close_panel(cached, source_label=f'cache_{path.name}')
+    except Exception:
+        return None
+
+
+def _load_cached_meta(path: Path) -> dict[str, dict[str, Any]]:
+    if not path.exists():
+        return {}
+    try:
+        payload = json.loads(path.read_text(encoding='utf-8'))
+    except Exception:
+        return {}
+    if not isinstance(payload, dict):
+        return {}
+    out: dict[str, dict[str, Any]] = {}
+    for symbol, item in payload.items():
+        if not isinstance(item, dict):
+            continue
+        out[str(symbol).zfill(6)] = dict(item)
+    return out
+
+
+def _persist_cached_meta(path: Path, cache: Mapping[str, Mapping[str, Any]]) -> None:
+    serializable: dict[str, dict[str, Any]] = {}
+    for symbol, item in cache.items():
+        serializable[str(symbol).zfill(6)] = {
+            'industry': str(item.get('industry') or 'unknown'),
+            'float_shares': _to_float_or_none(item.get('float_shares')),
+            'provider': str(item.get('provider') or 'unknown'),
+            'error': str(item.get('error') or ''),
+        }
+    path.write_text(json.dumps(serializable, ensure_ascii=False, indent=2), encoding='utf-8')
+
+
+def _history_covers_range(history: pd.DataFrame, start_date: str | None, end_date: str | None) -> bool:
+    if history is None or history.empty:
+        return False
+    start = pd.Timestamp(start_date) if start_date else None
+    end = pd.Timestamp(end_date) if end_date else None
+    min_dt = history.index.min()
+    max_dt = history.index.max()
+    if start is not None and min_dt > start:
+        gap_days = int((min_dt - start).days)
+        if gap_days > CACHE_BOUNDARY_TOLERANCE_DAYS:
+            return False
+    if end is not None and max_dt < end:
+        gap_days = int((end - max_dt).days)
+        if gap_days > CACHE_BOUNDARY_TOLERANCE_DAYS:
+            return False
+    if start is not None and max_dt < start:
+        return False
+    if end is not None and min_dt > end:
+        return False
+    return True
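The coverage rule in `_history_covers_range` above can be sketched with the standard library alone: a cached series is accepted when each edge of the requested window is missed by no more than a tolerance, and the two ranges overlap. The value of `CACHE_BOUNDARY_TOLERANCE_DAYS` is not shown in this diff, so the `tolerance_days=7` default is an assumption, and `covers_range` is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import date


def covers_range(
    first: date,
    last: date,
    start: date | None,
    end: date | None,
    tolerance_days: int = 7,  # assumed stand-in for CACHE_BOUNDARY_TOLERANCE_DAYS
) -> bool:
    """Return True if cached data [first, last] covers [start, end] within a tolerance."""
    # Cached data begins too long after the requested start.
    if start is not None and (first - start).days > tolerance_days:
        return False
    # Cached data ends too long before the requested end.
    if end is not None and (end - last).days > tolerance_days:
        return False
    # Reject caches that do not overlap the requested window at all.
    if start is not None and last < start:
        return False
    if end is not None and first > end:
        return False
    return True
```

Under these assumptions, a cache that starts two days late and ends one day early is still a hit, while one that starts two months late forces a refetch.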
+
+
+def _extract_key_value_map(raw: pd.DataFrame) -> dict[str, Any]:
+    if raw is None or raw.empty:
+        return {}
+    out = raw.copy()
+    cols = [str(col).strip() for col in out.columns]
+    lower_cols = [col.lower() for col in cols]
+
+    item_col = None
+    value_col = None
+    for candidate in ('item', '\u9879\u76ee', 'name', '\u540d\u79f0'):
+        if candidate in lower_cols:
+            item_col = out.columns[lower_cols.index(candidate)]
+            break
+    for candidate in ('value', '\u503c', '\u6570\u503c'):
+        if candidate in lower_cols:
+            value_col = out.columns[lower_cols.index(candidate)]
+            break
+
+    if item_col is None and len(out.columns) >= 1:
+        item_col = out.columns[0]
+    if value_col is None and len(out.columns) >= 2:
+        value_col = out.columns[1]
+    if item_col is None or value_col is None:
+        return {}
+
+    values: dict[str, Any] = {}
+    for key, value in zip(out[item_col], out[value_col], strict=False):
+        key_text = str(key).strip()
+        if not key_text:
+            continue
+        values[key_text] = value
+        values[key_text.lower()] = value
+    return values
+
+
+def _parse_stock_meta_values(values: Mapping[str, Any]) -> dict[str, Any]:
+    float_shares = None
+    for key, value in values.items():
+        key_text = str(key).strip().lower()
+        if 'float' in key_text or 'share' in key_text or 'circul' in key_text or '\u6d41\u901a' in key_text:
+            parsed = _to_float_or_none(value)
+            if parsed and parsed > 0:
+                float_shares = parsed
+                break
+    if float_shares is None:
+        numeric_candidates = [_to_float_or_none(v) for v in values.values()]
+        numeric_candidates = [v for v in numeric_candidates if v is not None and v > 0]
+        if numeric_candidates:
+            float_shares = max(numeric_candidates)
+
+    industry = 'unknown'
+    for key, value in values.items():
+        key_text = str(key).strip().lower()
+        if 'industry' in key_text or '\u884c\u4e1a' in key_text:
+            text = str(value).strip()
+            if text:
+                industry = text
+                break
+    return {'industry': industry, 'float_shares': float_shares}
+
+
+def _fetch_stock_meta_akshare(symbol: str) -> dict[str, Any]:
+    ak = _load_akshare()
+    last_error: Exception | None = None
+    for attempt in range(META_FETCH_MAX_ATTEMPTS):
+        try:
+            raw = ak.stock_individual_info_em(symbol=symbol)
+            values = _extract_key_value_map(raw)
+            if not values:
+                return {'industry': 'unknown', 'float_shares': None}
+            return _parse_stock_meta_values(values)
+        except Exception as exc:
+            last_error = exc
+            if attempt == META_FETCH_MAX_ATTEMPTS - 1:
+                raise
+            time.sleep(0.8 + attempt * 1.2)
+    if last_error is not None:
+        raise last_error
+    return {'industry': 'unknown', 'float_shares': None}
+
+
+def _pick_industry_from_mairui(payload: Any) -> str:
+    if not isinstance(payload, list):
+        return 'unknown'
+    entries: list[str] = []
+    for item in payload:
+        if not isinstance(item, dict):
+            continue
+        labels = [
+            str(item.get('jctype') or '').strip(),
+            str(item.get('type') or '').strip(),
+        ]
+        names = [
+            str(item.get('jcmc') or '').strip(),
+            str(item.get('name') or '').strip(),
+        ]
+        if any('\u884c\u4e1a' in label for label in labels):
+            for name in names:
+                if name:
+                    return name
+        for name in names:
+            if name:
+                entries.append(name)
+    return entries[0] if entries else 'unknown'
+
+
+def _fetch_stock_meta_mairui(symbol: str, licence: str) -> dict[str, Any]:
+    if not licence:
+        raise ValueError('Mairui licence is required for stock metadata fallback fetch.')
+    url = f'{MAIRUI_BASE_URL}/hszg/zg/{symbol}/{licence}'
+    payload = _load_json_url(url)
+    industry = _pick_industry_from_mairui(payload)
+    return {'industry': industry, 'float_shares': None}
+
+
+def _resolve_symbol_meta(
+    *,
+    symbol: str,
+    mairui_licence: str | None,
+    fetch_meta: Callable[[str], Mapping[str, Any]],
+    fetch_meta_fallback: Callable[[str, str], Mapping[str, Any]],
+) -> tuple[dict[str, Any], str, str | None]:
+    errors: list[str] = []
+    provider = 'akshare'
+    meta: dict[str, Any] = {}
+    try:
+        meta = dict(fetch_meta(symbol))
+    except Exception as exc:
+        errors.append(f'akshare_meta={exc}')
+        meta = {}
+
+    industry_text = str(meta.get('industry') or '').strip()
+    if industry_text and industry_text.lower() != 'unknown':
+        return meta, provider, ('; '.join(errors) if errors else None)
+
+    if mairui_licence:
+        try:
+            fallback_meta = dict(fetch_meta_fallback(symbol, mairui_licence))
+            fallback_industry = str(fallback_meta.get('industry') or '').strip()
+            if fallback_industry and fallback_industry.lower() != 'unknown':
+                provider = 'mairui'
+                if not meta.get('float_shares') and fallback_meta.get('float_shares'):
+                    meta['float_shares'] = fallback_meta.get('float_shares')
+                meta['industry'] = fallback_industry
+                return meta, provider, ('; '.join(errors) if errors else None)
+        except Exception as exc:
+            errors.append(f'mairui_meta={exc}')
+
+    if not industry_text:
+        meta['industry'] = 'unknown'
+    return meta, provider, ('; '.join(errors) if errors else None)
+
+
+def _mean_pairwise_corr(window: pd.DataFrame) -> float | None:
+    valid_cols = [col for col in window.columns if int(window[col].notna().sum()) >= 5]
+    if len(valid_cols) < 2:
+        return None
+    corr = window[valid_cols].corr()
+    if corr.empty:
+        return None
+    values = corr.to_numpy(dtype=float)
+    mask = np.triu(np.ones(values.shape, dtype=bool), k=1)
+    flat = values[mask]
+    flat = flat[~np.isnan(flat)]
+    if flat.size == 0:
+        return None
+    return float(flat.mean())
+
+
+def _compute_sector_concentration(weights: pd.DataFrame, industries: pd.Series) -> tuple[pd.Series, str]:
+    groups: dict[str, list[str]] = {}
+    for symbol, industry in industries.items():
+        key = str(industry).strip()
+        if not key:
+            key = 'unknown'
+        groups.setdefault(key, []).append(symbol)
+
+    known_groups = {k: v for k, v in groups.items() if k.lower() != 'unknown'}
+    if len(known_groups) < 2:
+        # Fallback: use concentration HHI when industry metadata is insufficient.
+        hhi = weights.fillna(0.0).pow(2).sum(axis=1, min_count=1)
+        hhi.index = weights.index
+        return hhi, 'weight_hhi_proxy'
+
+    out = pd.Series(index=weights.index, dtype=float)
+    for dt, row in weights.iterrows():
+        best = np.nan
+        for symbols in known_groups.values():
+            value = row[symbols].sum(min_count=1)
+            if pd.isna(value):
+                continue
+            best = float(value) if pd.isna(best) else float(max(best, value))
+        out.loc[dt] = best
+    return out, 'industry_max_share'
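A single-day, pure-Python sketch of the two modes in `_compute_sector_concentration` above: with at least two known industries the result is the largest industry weight share, otherwise it falls back to the Herfindahl-Hirschman index (sum of squared weights). The function name `sector_concentration_one_day` and the sample symbols and industries are hypothetical.

```python
from __future__ import annotations


def sector_concentration_one_day(
    weights: dict[str, float],
    industries: dict[str, str],
) -> tuple[float, str]:
    """Return (concentration, mode) for one date's constituent weights."""
    # Sum weights per known industry; 'unknown' labels are excluded.
    totals: dict[str, float] = {}
    for symbol, weight in weights.items():
        industry = industries.get(symbol, 'unknown').strip() or 'unknown'
        if industry.lower() == 'unknown':
            continue
        totals[industry] = totals.get(industry, 0.0) + weight

    if len(totals) < 2:
        # Fallback: HHI of the individual stock weights.
        return sum(w * w for w in weights.values()), 'weight_hhi_proxy'
    return max(totals.values()), 'industry_max_share'


# Hypothetical weights for three constituents.
sample_weights = {'A': 0.5, 'B': 0.3, 'C': 0.2}
proxy_value, proxy_mode = sector_concentration_one_day(sample_weights, {})
share_value, share_mode = sector_concentration_one_day(
    sample_weights, {'A': 'battery', 'B': 'finance', 'C': 'battery'}
)
```

The two modes measure different things: the proxy reflects single-stock concentration, not sector concentration, which is why the returned mode label matters downstream.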
+
+
+def _build_required_breadth_columns(
+    close_panel: pd.DataFrame,
+    float_shares: pd.Series,
+    industries: pd.Series,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    close = close_panel.sort_index()
+    active_mask = close.notna()
+    ret1 = close.pct_change()
+    ret5 = close.div(close.shift(5)).sub(1.0)
+
+    ma20 = close.rolling(20, min_periods=20).mean()
+    ma60 = close.rolling(60, min_periods=60).mean()
+    high20 = close.rolling(20, min_periods=20).max()
+    low20 = close.rolling(20, min_periods=20).min()
+
+    pct_above_20 = close.gt(ma20).where(active_mask).mean(axis=1, skipna=True)
+    pct_above_60 = close.gt(ma60).where(active_mask).mean(axis=1, skipna=True)
+    pct_new_high_20 = close.ge(high20).where(active_mask).mean(axis=1, skipna=True)
+    pct_new_low_20 = close.le(low20).where(active_mask).mean(axis=1, skipna=True)
+
+    eq_weight_ret_5 = ret5.mean(axis=1, skipna=True)
+    caps = close.mul(float_shares.reindex(close.columns).fillna(1.0), axis=1)
+    weights = caps.div(caps.sum(axis=1), axis=0)
+    weighted_ret_5 = weights.mul(ret5).sum(axis=1, min_count=1)
+
+    top1 = weights.apply(lambda row: float(row.dropna().nlargest(1).sum()) if row.notna().any() else np.nan, axis=1)
+    top3 = weights.apply(lambda row: float(row.dropna().nlargest(3).sum()) if row.notna().any() else np.nan, axis=1)
+    top10 = weights.apply(lambda row: float(row.dropna().nlargest(10).sum()) if row.notna().any() else np.nan, axis=1)
+
+    sector_raw, sector_mode = _compute_sector_concentration(weights, industries.reindex(close.columns).fillna('unknown'))
+    sector_concentration_20 = sector_raw.rolling(20, min_periods=5).mean()
+
+    corr_spike_values: list[float | None] = []
+    for i in range(len(ret1)):
+        start_i = max(0, i - 19)
+        corr_spike_values.append(_mean_pairwise_corr(ret1.iloc[start_i : i + 1]))
+    corr_spike_20 = pd.Series(corr_spike_values, index=ret1.index, dtype=float)
+
+    dispersion_20 = ret1.std(axis=1, skipna=True).rolling(20, min_periods=5).mean()
+
+    out = pd.DataFrame(
+        {
+            'pct_constituents_above_20dma': pct_above_20,
+            'pct_constituents_above_60dma': pct_above_60,
+            'pct_new_high_20': pct_new_high_20,
+            'pct_new_low_20': pct_new_low_20,
+            'eq_weight_ret_5': eq_weight_ret_5,
+            'weighted_ret_5': weighted_ret_5,
+            'top3_contribution_5': top3,
+            'top1_contribution_5': top1,
+            'top10_contribution_5': top10,
+            'sector_concentration_20': sector_concentration_20,
+            'corr_spike_20': corr_spike_20,
+            'dispersion_20': dispersion_20,
+        },
+        index=close.index,
+    )
+    out.index.name = 'date'
+    diagnostics = {
+        'sector_concentration_mode': sector_mode,
+    }
+    return out[list(BREADTH_REQUIRED_COLUMNS)], diagnostics
+
+
+def derive_breadth_sidecar(
+    *,
+    start_date: str | None,
+    end_date: str | None,
+    index_symbol: str = '399673',
+    mairui_licence: str | None = None,
+    min_active_constituents: int = 20,
+    max_constituents: int | None = None,
+    cache_dir: str | Path | None = None,
+    constituent_fetcher: Callable[[str], pd.DataFrame] | None = None,
+    stock_history_fetcher: Callable[[str, str | None, str | None], pd.DataFrame] | None = None,
+    stock_history_fallback_fetcher: Callable[[str, str | None, str | None, str], pd.DataFrame] | None = None,
+    stock_meta_fetcher: Callable[[str], Mapping[str, Any]] | None = None,
+    stock_meta_fallback_fetcher: Callable[[str, str], Mapping[str, Any]] | None = None,
+) -> tuple[pd.DataFrame, dict[str, Any]]:
+    fetch_constituents = constituent_fetcher or _fetch_constituents_akshare
+    fetch_history = stock_history_fetcher or _fetch_stock_history_akshare
+    fetch_history_fallback = stock_history_fallback_fetcher or _fetch_stock_history_mairui
+    fetch_meta = stock_meta_fetcher or _fetch_stock_meta_akshare
+    fetch_meta_fallback = stock_meta_fallback_fetcher or _fetch_stock_meta_mairui
+
+    constituents = fetch_constituents(index_symbol)
+    symbols = [str(sym).zfill(6) for sym in constituents['symbol'].tolist()]
+    if max_constituents is not None and int(max_constituents) > 0:
+        symbols = symbols[: int(max_constituents)]
+    if not symbols:
+        raise ValueError(f'No constituents available for index {index_symbol}.')
+
+    close_series_map: dict[str, pd.Series] = {}
+    industries: dict[str, str] = {}
+    float_shares: dict[str, float] = {}
+    provider_by_symbol: dict[str, str] = {}
+    missing_symbols: list[str] = []
+    errors: dict[str, str] = {}
+    metadata_provider_by_symbol: dict[str, str] = {}
+    metadata_errors: dict[str, str] = {}
+    cache_hit_count = 0
+    cache_miss_count = 0
+    cache_root: Path | None = Path(cache_dir) if cache_dir else None
+    meta_cache_hit_count = 0
+    meta_cache_miss_count = 0
+    meta_cache_root: dict[str, dict[str, Any]] = {}
+    meta_cache_path: Path | None = None
+    if cache_root is not None:
+        cache_root.mkdir(parents=True, exist_ok=True)
+        meta_cache_path = cache_root / '_meta_cache.json'
+        meta_cache_root = _load_cached_meta(meta_cache_path)
+
+    def _resolve_meta_with_cache(symbol: str) -> tuple[dict[str, Any], str, str | None]:
+        nonlocal meta_cache_hit_count, meta_cache_miss_count, meta_cache_root
+        cached_meta = meta_cache_root.get(symbol)
+        if cached_meta is not None:
+            meta_cache_hit_count += 1
+            return (
+                {
+                    'industry': str(cached_meta.get('industry') or 'unknown'),
+                    'float_shares': _to_float_or_none(cached_meta.get('float_shares')),
+                },
+                str(cached_meta.get('provider') or 'unknown'),
+                str(cached_meta.get('error') or '').strip() or None,
+            )
+
+        meta_cache_miss_count += 1
+        meta, provider, meta_error = _resolve_symbol_meta(
+            symbol=symbol,
+            mairui_licence=mairui_licence,
+            fetch_meta=fetch_meta,
+            fetch_meta_fallback=fetch_meta_fallback,
+        )
+        meta_cache_root[symbol] = {
+            'industry': str(meta.get('industry') or 'unknown'),
+            'float_shares': _to_float_or_none(meta.get('float_shares')),
+            'provider': provider,
+            'error': str(meta_error or ''),
+        }
+        return meta, provider, meta_error
+
+    for symbol in symbols:
+        cache_path = (cache_root / f'{symbol}.csv') if cache_root is not None else None
+        if cache_path is not None:
+            cached_panel = _load_cached_close_history(cache_path)
+            if cached_panel is not None and _history_covers_range(cached_panel, start_date, end_date):
+                close_series_map[symbol] = cached_panel['close']
+                provider_by_symbol[symbol] = 'cache'
+                cache_hit_count += 1
+                meta, metadata_provider, meta_error = _resolve_meta_with_cache(symbol)
+                metadata_provider_by_symbol[symbol] = metadata_provider
+                if meta_error:
+                    metadata_errors[symbol] = meta_error
+                industries[symbol] = str(meta.get('industry') or 'unknown')
+                float_value = _to_float_or_none(meta.get('float_shares'))
+                float_shares[symbol] = float_value if float_value and float_value > 0 else 1.0
+                continue
+            cache_miss_count += 1
+
+        panel: pd.DataFrame | None = None
+        provider = 'akshare'
+        try:
+            panel = fetch_history(symbol, start_date, end_date)
+            if panel is None or panel.empty:
+                raise ValueError('empty panel')
+        except Exception as primary_exc:
+            if mairui_licence:
+                try:
+                    panel = fetch_history_fallback(symbol, start_date, end_date, mairui_licence)
+                    provider = 'mairui'
+                except Exception as fallback_exc:
+                    errors[symbol] = f'akshare={primary_exc}; mairui={fallback_exc}'
+            else:
+                errors[symbol] = str(primary_exc)
+
+        if panel is None or panel.empty:
+            missing_symbols.append(symbol)
+            continue
+
+        normalized = _normalize_close_panel(panel.reset_index(), source_label=f'stock_{symbol}')
+        if normalized.empty:
+            missing_symbols.append(symbol)
+            errors[symbol] = 'empty normalized close series'
+            continue
+
+        close_series_map[symbol] = normalized['close']
+        provider_by_symbol[symbol] = provider
+        if cache_path is not None:
+            normalized.reset_index().to_csv(cache_path, index=False)
+        meta, metadata_provider, meta_error = _resolve_meta_with_cache(symbol)
+        metadata_provider_by_symbol[symbol] = metadata_provider
+        if meta_error:
+            metadata_errors[symbol] = meta_error
+        industries[symbol] = str(meta.get('industry') or 'unknown')
+        float_value = _to_float_or_none(meta.get('float_shares'))
+        float_shares[symbol] = float_value if float_value and float_value > 0 else 1.0
+
+    if not close_series_map:
+        raise ValueError(f'Unable to fetch any constituent history for index {index_symbol}.')
+
+    close_panel = pd.concat(close_series_map, axis=1).sort_index()
+    close_panel.index.name = 'date'
+    if start_date:
+        close_panel = close_panel.loc[close_panel.index >= pd.Timestamp(start_date)]
+    if end_date:
+        close_panel = close_panel.loc[close_panel.index <= pd.Timestamp(end_date)]
+    if close_panel.empty:
+        raise ValueError('Derived breadth close panel is empty after date filtering.')
+
+    active_count = close_panel.notna().sum(axis=1)
+    max_active = int(active_count.max()) if len(active_count) else 0
+    if max_active < int(min_active_constituents):
+        raise ValueError(
+            f'Derived breadth active constituent count too low: max_active={max_active}, '
+            f'min_required={int(min_active_constituents)}'
+        )
+
+    breadth, builder_diag = _build_required_breadth_columns(
+        close_panel=close_panel,
+        float_shares=pd.Series(float_shares, dtype=float),
+        industries=pd.Series(industries, dtype=str),
+    )
+
+    industry_series = pd.Series(industries, dtype=str)
+    industry_unknown_count = int(industry_series.str.lower().eq('unknown').sum())
+    industry_unique_count = int(industry_series.str.lower().replace('', 'unknown').nunique())
+
+    provider_counts = pd.Series(provider_by_symbol).value_counts().to_dict()
+    metadata_provider_counts = pd.Series(metadata_provider_by_symbol).value_counts().to_dict()
+    if meta_cache_path is not None:
+        _persist_cached_meta(meta_cache_path, meta_cache_root)
+    metadata = {
+        'index_symbol': str(index_symbol),
+        'membership_mode': 'latest_constituents_with_entry_dates',
+        'constituent_count_total': int(len(constituents)),
+        'constituent_count_requested': int(len(symbols)),
+        'constituent_count_used': int(len(close_series_map)),
+        'missing_symbols': sorted(missing_symbols),
+        'provider_by_symbol': provider_by_symbol,
+        'provider_counts': {str(k): int(v) for k, v in provider_counts.items()},
+        'metadata_provider_by_symbol': metadata_provider_by_symbol,
+        'metadata_provider_counts': {str(k): int(v) for k, v in metadata_provider_counts.items()},
+        'metadata_errors': metadata_errors,
+        'cache': {
+            'enabled': bool(cache_root is not None),
+            'path': str(cache_root) if cache_root is not None else None,
+            'hit_count': int(cache_hit_count),
+            'miss_count': int(cache_miss_count),
+        },
+        'meta_cache': {
+            'enabled': bool(meta_cache_path is not None),
+            'path': str(meta_cache_path) if meta_cache_path is not None else None,
+            'hit_count': int(meta_cache_hit_count),
+            'miss_count': int(meta_cache_miss_count),
+        },
+        'active_constituent_count': {
+            'min': int(active_count.min()) if len(active_count) else 0,
+            'median': float(active_count.median()) if len(active_count) else 0.0,
+            'max': max_active,
+        },
+        'date_start': breadth.index.min().date().isoformat() if len(breadth) else None,
+        'date_end': breadth.index.max().date().isoformat() if len(breadth) else None,
+        'row_count': int(len(breadth)),
+        'industry_unknown_count': industry_unknown_count,
+        'industry_unknown_ratio': float(industry_unknown_count / len(close_series_map)) if close_series_map else 1.0,
+        'industry_unique_count': industry_unique_count,
+        'sector_concentration_mode': builder_diag['sector_concentration_mode'],
+        'errors': errors,
+    }
+    return breadth, metadata
+
+
+def evaluate_breadth_source_integrity(
+    breadth: pd.DataFrame,
+    *,
+    required_columns: tuple[str, ...] | list[str] = BREADTH_REQUIRED_COLUMNS,
+    min_unique_non_null: int = 3,
+    max_dominant_value_ratio: float = 0.995,
+    std_floor: float = 1e-8,
+    strict: bool = False,
+    warmup_observations: Mapping[str, int] | None = None,
+) -> dict[str, Any]:
+    panel = breadth.copy()
+    panel.columns = [str(col).strip().lower() for col in panel.columns]
+    req = [str(col).strip().lower() for col in required_columns]
+    failures: list[dict[str, Any]] = []
+    warnings: list[dict[str, Any]] = []
+    column_stats: dict[str, dict[str, Any]] = {}
+    warmup_rules = {str(k).strip().lower(): int(v) for k, v in (warmup_observations or DEFAULT_WARMUP_OBSERVATIONS).items()}
+    warmup_exempt_columns: list[dict[str, Any]] = []
+
+    for column in req:
+        if column not in panel.columns:
+            failure = {'column': column, 'reason': 'missing_required_column'}
+            failures.append(failure)
+            warnings.append(failure)
+            column_stats[column] = {
+                'present': False,
+                'non_null_count': 0,
+                'unique_non_null_count': 0,
+                'dominant_value_ratio': 1.0,
+                'std': 0.0,
+            }
+            continue
+
+        values = pd.to_numeric(panel[column], errors='coerce').dropna()
+        non_null_count = int(values.shape[0])
+        unique_count = int(values.nunique())
+        std_value = float(values.std(ddof=0)) if non_null_count else 0.0
+        dominant_ratio = 1.0
+        if non_null_count:
+            dominant_ratio = float(values.value_counts(normalize=True, dropna=True).iloc[0])
+
+        column_stats[column] = {
+            'present': True,
+            'non_null_count': non_null_count,
+            'unique_non_null_count': unique_count,
+            'dominant_value_ratio': dominant_ratio,
+            'std': std_value,
+        }
+
+        required_obs = int(warmup_rules.get(column, 0))
+        if required_obs > 0 and non_null_count < required_obs:
+            warmup_exempt_columns.append(
+                {
+                    'column': column,
+                    'required_non_null': required_obs,
+                    'observed_non_null': non_null_count,
+                }
+            )
+            continue
+
+        if unique_count < int(min_unique_non_null):
+            item = {
+                'column': column,
+                'reason': 'low_unique_non_null',
+                'observed': unique_count,
+                'threshold': int(min_unique_non_null),
+            }
+            failures.append(item)
+            warnings.append(item)
+        if dominant_ratio > float(max_dominant_value_ratio):
+            item = {
+                'column': column,
+                'reason': 'dominant_value_ratio_too_high',
+                'observed': dominant_ratio,
+                'threshold': float(max_dominant_value_ratio),
+            }
+            failures.append(item)
+            warnings.append(item)
+        if std_value <= float(std_floor) and (
+            unique_count < int(min_unique_non_null) or dominant_ratio > float(max_dominant_value_ratio)
+        ):
+            item = {
+                'column': column,
+                'reason': 'std_below_floor',
+                'observed': std_value,
+                'threshold': float(std_floor),
+            }
+            failures.append(item)
+            warnings.append(item)
+
+    spread_stats: dict[str, Any] = {
+        'present': False,
+        'non_null_count': 0,
+        'unique_non_null_count': 0,
+        'dominant_value_ratio': 1.0,
+        'std': 0.0,
+    }
+    if {'eq_weight_ret_5', 'weighted_ret_5'}.issubset(panel.columns):
+        spread = pd.to_numeric(panel['weighted_ret_5'], errors='coerce') - pd.to_numeric(
+            panel['eq_weight_ret_5'], errors='coerce'
+        )
+        spread = spread.dropna()
+        non_null_count = int(spread.shape[0])
+        unique_count = int(spread.nunique()) if non_null_count else 0
+        std_value = float(spread.std(ddof=0)) if non_null_count else 0.0
+        dominant_ratio = 1.0
+        if non_null_count:
+            dominant_ratio = float(spread.value_counts(normalize=True, dropna=True).iloc[0])
+
+        spread_stats = {
+            'present': True,
+            'non_null_count': non_null_count,
+            'unique_non_null_count': unique_count,
+            'dominant_value_ratio': dominant_ratio,
+            'std': std_value,
+        }
+        spread_required_obs = int(warmup_rules.get('concentration_spread_5', 0))
+        if spread_required_obs > 0 and non_null_count < spread_required_obs:
+            warmup_exempt_columns.append(
+                {
+                    'column': 'concentration_spread_5',
+                    'required_non_null': spread_required_obs,
+                    'observed_non_null': non_null_count,
+                }
+            )
+        elif unique_count < int(min_unique_non_null) or std_value <= float(std_floor):
+            item = {
+                'column': 'concentration_spread_5',
+                'reason': 'constant_or_near_constant_spread',
+                'observed_unique_non_null': unique_count,
+                'observed_std': std_value,
+                'threshold_unique_non_null': int(min_unique_non_null),
+                'threshold_std_floor': float(std_floor),
+            }
+            failures.append(item)
+            warnings.append(item)
+
+    blocking = bool(strict and failures)
+    return {
+        'strict': bool(strict),
+        'passed': not failures,
+        'blocking': blocking,
+        'thresholds': {
+            'min_unique_non_null': int(min_unique_non_null),
+            'max_dominant_value_ratio': float(max_dominant_value_ratio),
+            'std_floor': float(std_floor),
+            'warmup_observations': warmup_rules,
+        },
+        'failures': failures,
+        'warnings': warnings,
+        'column_stats': column_stats,
+        'spread_stats': spread_stats,
+        'warmup_exempt_columns': warmup_exempt_columns,
+    }

+ 64 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/deliverables/gpt_pro_blockers_harden_derived_breadth_2026-04-09.md

@@ -0,0 +1,64 @@
+# GPT Pro Blocker Log - Harden Derived Breadth (2026-04-09)
+
+## Scope
+
+- Change: `harden-derived-breadth-production`
+- Pipeline: `pipelines/ingest_real_data.py` with `--derive-breadth` and full ChiNext50 constituents
+- Date window: `2020-01-01` to `2026-04-09`
+
+## Command Context
+
+```powershell
+py pipelines/ingest_real_data.py `
+  --provider csv `
+  --market-csv outputs/ingestion_derived_smoke_20260409_v12/raw/market.csv `
+  --hs300-csv outputs/ingestion_derived_smoke_20260409_v12/raw/hs300.csv `
+  --star50-csv outputs/ingestion_derived_smoke_20260409_v12/raw/star50.csv `
+  --csi1000-csv outputs/ingestion_derived_smoke_20260409_v12/raw/csi1000.csv `
+  --derive-breadth `
+  --breadth-min-active-constituents 20 `
+  --breadth-cache-dir outputs/ingestion_derived_full50_20260409_v2/raw/constituent_history `
+  --start-date 2020-01-01 `
+  --end-date 2026-04-09 `
+  --mairui-licence AE17EE23-AAE4-492F-A959-EC883DFA5A76 `
+  --strict-data `
+  --output-dir outputs/ingestion_derived_full50_20260409_v2
+```
+
+## Resolved in This Round
+
+1. Strict false-positive blocker was fixed:
+- Previous failure: `top10_contribution_5: std_below_floor`.
+- Fix: trigger `std_below_floor` only when a low standard deviation is accompanied by low uniqueness or a dominant repeated value.
+- Current status: strict breadth integrity passes.
+
+2. Cache efficiency improved:
+- Added metadata cache (`_meta_cache.json`) under breadth cache dir.
+- First full50 run (cache warm-up): ~`1199.9s`.
+- Second full50 run (cache hit): ~`26.9s`.
+- Current summary shows `meta_cache.hit_count=50`, `meta_cache.miss_count=0`.
+
+## Remaining Blocker for GPT Pro Guidance
+
+### B-01: Industry metadata source remains degraded
+
+- Evidence file:
+  - `outputs/ingestion_derived_full50_20260409_v2/raw/breadth_derivation_summary.json`
+- Current metrics:
+  - `industry_unknown_ratio = 1.0`
+  - `industry_unique_count = 1`
+  - `sector_concentration_mode = weight_hhi_proxy`
+  - `metadata_error_symbol_count = 50`
+- Representative error:
+  - `akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests`
+
+### Why it matters
+
+- The full50 pipeline runs and passes strict mode, but sector concentration currently falls back to the HHI proxy due to missing industry coverage.
+- This weakens interpretability of sector crowding diagnostics and may reduce regime explainability.
+
+### Questions for GPT Pro
+
+1. Should we promote a new primary metadata source (instead of Akshare `stock_individual_info_em`) for industry + float shares under sustained API instability?
+2. Should unknown-industry ratio become a strict blocking signal once a stable alternative source is introduced?
+3. Is it preferable to build an offline sector mapping layer (snapshot + periodic refresh) to remove runtime dependency on unstable metadata APIs?
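
Reviewer sketch for the `std_below_floor` fix described under "Resolved in This Round": a minimal illustration, assuming the check looks roughly like this. The function name `std_below_floor_triggered` and the default threshold values are hypothetical; only the threshold names (`std_floor`, `min_unique_non_null`, `max_dominant_value_ratio`) come from the integrity summary in this commit.

```python
from collections import Counter
from statistics import pstdev


def std_below_floor_triggered(
    values,
    std_floor=1e-6,                 # assumed default, not taken from the repo
    min_unique_non_null=3,          # assumed default
    max_dominant_value_ratio=0.95,  # assumed default
):
    """Flag a column only when low std coincides with low uniqueness
    or with a single dominant repeated value."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return False  # too few observations to judge (a choice of this sketch)

    low_std = pstdev(observed) <= std_floor
    counts = Counter(observed)
    low_uniqueness = len(counts) < min_unique_non_null
    dominant_ratio = counts.most_common(1)[0][1] / len(observed)
    dominant = dominant_ratio > max_dominant_value_ratio

    return low_std and (low_uniqueness or dominant)


# A smooth but genuinely varying series has a tiny std and many unique values,
# so it is no longer treated as a failure.
smooth = [0.5 + i * 1e-9 for i in range(100)]
constant = [0.3] * 50
print(std_below_floor_triggered(smooth), std_below_floor_triggered(constant))
```

Under these assumptions, a low standard deviation alone is not enough to fail the column, which is the false positive the blocker log describes for `top10_contribution_5`.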

+ 2 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/.openspec.yaml

@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-09

+ 59 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/design.md

@@ -0,0 +1,59 @@
+## Context
+
+Current derived breadth is functionally correct and strict-gated, but repeated runs initially failed to hit cache when request boundaries landed on non-trading days. That issue is now fixed locally. Remaining production-hardening gaps are:
+- Full-constituent runtime robustness under upstream throttling.
+- Industry metadata quality (`industry_unknown_ratio` can remain high when metadata fetch is unstable).
+- Report-reader compatibility for annual return delta key naming.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Ensure derived breadth full-universe runs are cache-efficient and resilient to transient upstream failures.
+- Improve industry metadata sourcing quality and auditability without silently masking data-loss conditions.
+- Emit stable, backward-compatible walk-forward comparison keys.
+- Keep a concrete issue log for GPT Pro escalation when provider-side blockers persist.
+
+**Non-Goals:**
+- No regime threshold tuning or exposure policy retuning.
+- No change to existing strict-data blocking semantics for PIT quality gate.
+- No change to walk-forward candidate selection logic.
+
+## Decisions
+
+1. Metadata fetch chain with explicit provenance
+- Decision: Build a metadata chain: cache/meta state -> Akshare metadata with retry -> optional Mairui industry classification fallback -> `unknown`.
+- Rationale: keeps derived breadth runnable under provider variance while still surfacing degraded metadata quality.
+- Alternative considered: hard-fail when metadata API fails. Rejected because it would make breadth derivation brittle for transient upstream outages.
+
+2. Keep metadata degradation non-blocking but auditable
+- Decision: unknown industry remains non-blocking for breadth publish, but derivation metadata records provider counts, unknown ratio, and per-symbol metadata errors.
+- Rationale: allows production continuity and preserves evidence for post-run review.
+- Alternative considered: turn unknown-industry ratio into strict blocker immediately. Rejected for now because upstream instability is external and would over-block pipeline runs.
+
+3. Stable comparison key compatibility
+- Decision: in `real_walkforward_summary.json`, emit both canonical keys and compatibility aliases (`annual_return_delta_vs_baseline`, `max_drawdown_delta_vs_baseline`).
+- Rationale: prevents consumer null reads caused by key drift across report readers.
+- Alternative considered: rename existing keys only. Rejected to avoid breaking existing readers expecting current names.
+
+4. Explicit issue-log artifact for escalations
+- Decision: maintain a markdown issue log under `deliverables/` for unresolved runtime blockers and include reproducible command/context.
+- Rationale: speeds GPT Pro handoff with concrete evidence.
+
+## Risks / Trade-offs
+
+- [Upstream rate limits on Akshare/Mairui] -> Add retry/backoff and cache-first execution; keep unresolved failures in issue log.
+- [Industry metadata still partial] -> Track unknown ratio and provider diagnostics to prevent silent quality assumptions.
+- [Backward-compat key duplication increases schema surface] -> Keep canonical keys unchanged and document aliases in spec/tests.
+
+## Migration Plan
+
+1. Update OpenSpec deltas and tasks for this change.
+2. Implement metadata chain and report-key compatibility.
+3. Add/extend tests for metadata fallback and comparison-key presence.
+4. Run targeted tests + full regression + strict spec validation.
+5. Run full50 derived breadth smoke and record blockers in issue log when external APIs throttle.
+
+## Open Questions
+
+- Whether to make unknown-industry ratio a blocking criterion after provider reliability is characterized over multiple days.
+- Whether Mairui should become primary metadata source when Akshare metadata endpoint remains unstable.
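
Reviewer sketch for Decision 1 and Decision 2 in `design.md`: one way the chain "cache -> primary with retry -> optional fallback -> `unknown`" could record provenance while staying non-blocking. This is not the code in `data/breadth_builder.py`. The function names and the `primary` / `fallback` callables are hypothetical stand-ins for the Akshare and Mairui lookups, and no real provider API is called.

```python
import time


def fetch_industry_with_provenance(
    symbol, cache, primary, fallback=None, retries=2, backoff_seconds=0.0
):
    """Resolve an industry label and record which source supplied it."""
    errors = []
    if symbol in cache:
        return {"symbol": symbol, "industry": cache[symbol],
                "provider": "cache", "errors": errors}

    for attempt in range(retries + 1):
        try:
            industry = primary(symbol)
            if industry:
                cache[symbol] = industry
                return {"symbol": symbol, "industry": industry,
                        "provider": "primary", "errors": errors}
            errors.append("primary=empty industry")
            break  # an empty answer is not transient, so do not retry it
        except Exception as exc:  # provider errors are heterogeneous
            errors.append(f"primary={exc}")
            if attempt < retries:
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff

    if fallback is not None:
        try:
            industry = fallback(symbol)
            if industry:
                cache[symbol] = industry
                return {"symbol": symbol, "industry": industry,
                        "provider": "fallback", "errors": errors}
            errors.append("fallback=empty industry")
        except Exception as exc:
            errors.append(f"fallback={exc}")

    # Degrade to "unknown" instead of failing, but keep the evidence.
    return {"symbol": symbol, "industry": "unknown",
            "provider": "unknown", "errors": errors}


def summarize_metadata(results):
    """Aggregate diagnostics similar in spirit to the derivation summary."""
    total = len(results)
    unknown = sum(1 for r in results if r["industry"] == "unknown")
    return {
        "industry_unknown_ratio": unknown / total if total else 0.0,
        "metadata_error_symbol_count": sum(1 for r in results if r["errors"]),
    }
```

Keeping the error strings per symbol is what makes the degradation auditable: a run can publish breadth while the summary still shows how many symbols fell through to `unknown`.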

+ 25 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/proposal.md

@@ -0,0 +1,25 @@
+## Why
+
+Derived breadth has been validated on a 12-constituent smoke run, but production use requires full ChiNext 50 coverage with stable reruns under upstream rate limits. We also observed report-consumer confusion because some comparison readers expect `*_vs_baseline` keys that are not consistently present.
+
+## What Changes
+
+- Harden constituent-derived breadth for full-universe runs with cache-first behavior and boundary-tolerant cache reuse.
+- Add resilient constituent metadata sourcing (industry/float-share) with retry and optional fallback, plus explicit diagnostics in derivation metadata.
+- Standardize walk-forward comparison output by publishing backward-compatible delta keys for baseline comparisons.
+- Add an implementation issue log artifact that captures runtime blockers and unresolved upstream data issues for GPT Pro review.
+
+## Capabilities
+
+### New Capabilities
+- `implementation-issue-log`: Persist machine-readable and human-readable blocker notes for external review when upstream provider behavior blocks deterministic completion.
+
+### Modified Capabilities
+- `constituent-derived-breadth`: strengthen full-constituent execution, cache reuse semantics, and metadata-source resilience.
+- `real-walkforward-report`: require stable comparison field compatibility for downstream readers.
+
+## Impact
+
+- Affected code: `data/breadth_builder.py`, `data/ingestion.py`, `pipelines/real_walkforward_report.py`, related tests.
+- Outputs: derived breadth metadata, optional issue-log artifact, walk-forward summary JSON compatibility keys.
+- External dependencies: Akshare and Mairui endpoint reliability; fallback and diagnostics paths are expanded.

+ 35 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/constituent-derived-breadth/spec.md

@@ -0,0 +1,35 @@
+## MODIFIED Requirements
+
+### Requirement: Constituent-derived breadth sidecar
+The system MUST be able to derive the required ChiNext 50 breadth sidecar columns from constituent-level market histories when the user selects internal breadth derivation, and MUST support full latest-constituent execution by default when no max-constituent cap is configured.
+
+#### Scenario: Build breadth from constituents
+- **WHEN** ingestion runs with derived breadth enabled and no external breadth panel is provided
+- **THEN** the system MUST fetch constituent membership, stock-level histories, and required stock metadata
+- **AND** it MUST emit a breadth panel containing all required sidecar columns used by the PIT contract
+
+#### Scenario: Re-run with a valid local constituent-history cache
+- **WHEN** derived breadth runs again with the same date window and a valid local cache directory
+- **THEN** the system MUST prefer cached constituent histories
+- **AND** cache boundary checks MUST tolerate non-trading-day start/end mismatches within configured tolerance
+
+### Requirement: Hybrid constituent history fetching
+The system MUST support Akshare-first constituent history fetching with per-symbol Mairui fallback for missing or failed stock history calls, and SHALL record per-symbol provider and metadata-source diagnostics for audit.
+
+#### Scenario: Akshare stock history fails for one symbol
+- **WHEN** the breadth builder cannot fetch a constituent history from Akshare for a required symbol
+- **THEN** it MUST retry that symbol with Mairui when a valid licence is configured
+- **AND** it MUST record which provider supplied the final history for that symbol
+
+#### Scenario: Industry metadata fetch degrades
+- **WHEN** stock metadata fetch fails or yields missing industry values for some symbols
+- **THEN** the breadth builder MUST continue with documented fallback behavior
+- **AND** derivation metadata MUST include unknown-industry ratio, metadata error diagnostics, and metadata provider counts
+
+### Requirement: Auditable breadth derivation artifacts
+The system MUST persist metadata describing how the derived breadth panel was built, including cache hit/miss statistics and metadata-quality diagnostics.
+
+#### Scenario: Derived breadth publish succeeds
+- **WHEN** the breadth builder writes a derived sidecar
+- **THEN** it MUST also write machine-readable metadata that includes membership mode, provider usage, constituent counts, and missing-symbol diagnostics
+- **AND** it MUST include cache hit/miss counts, metadata provider counts, and industry coverage diagnostics
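
Reviewer sketch for the "Re-run with a valid local constituent-history cache" scenario: a minimal illustration of a boundary check that tolerates non-trading-day mismatches. The function name `cache_covers_window` and the `tolerance_days` default are assumptions; the spec only says the tolerance is configured.

```python
from datetime import date, timedelta


def cache_covers_window(
    cached_start, cached_end, requested_start, requested_end, tolerance_days=7
):
    """Return True if a cached history covers the requested window, allowing
    each boundary to fall short by up to `tolerance_days` calendar days."""
    tolerance = timedelta(days=tolerance_days)
    start_ok = cached_start <= requested_start + tolerance
    end_ok = cached_end >= requested_end - tolerance
    return start_ok and end_ok


# A request starting on a holiday (2020-01-01) is served by a cache whose
# first row is the next trading day (2020-01-02).
print(cache_covers_window(
    date(2020, 1, 2), date(2026, 4, 9),
    date(2020, 1, 1), date(2026, 4, 9),
))
```

With a tolerance of zero the same request would miss the cache, which matches the original symptom of reruns refetching when a request boundary landed on a non-trading day.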

+ 8 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/implementation-issue-log/spec.md

@@ -0,0 +1,8 @@
+## ADDED Requirements
+
+### Requirement: Runtime issue log artifact
+The system MUST allow maintainers to persist a run-scoped issue log that records unresolved external data-provider blockers with reproducible context for external review.
+
+#### Scenario: Upstream provider blocks deterministic completion
+- **WHEN** ingestion or derivation encounters unresolved external failures (for example sustained 429/connection aborts) after configured retries
+- **THEN** maintainers MUST be able to persist an issue-log artifact containing timestamp, command context, affected component, observed error, and suggested escalation target
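
Reviewer sketch for the issue-log requirement: one possible shape for an entry carrying the five fields the scenario names (timestamp, command context, affected component, observed error, suggested escalation target). The function names and the dictionary keys are hypothetical; the spec does not fix a schema.

```python
import json
from datetime import datetime, timezone


def build_issue_log_entry(
    command_context, affected_component, observed_error, escalation_target, now=None
):
    """Build one run-scoped issue-log entry as a JSON-serializable dict."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "command_context": command_context,
        "affected_component": affected_component,
        "observed_error": observed_error,
        "suggested_escalation_target": escalation_target,
    }


def render_markdown(entry):
    """Render the same entry as a short human-readable markdown block."""
    lines = [f"### {entry['affected_component']} ({entry['timestamp']})", ""]
    for key in ("command_context", "observed_error", "suggested_escalation_target"):
        lines.append(f"- {key}: `{entry[key]}`")
    return "\n".join(lines)


entry = build_issue_log_entry(
    command_context="py pipelines/ingest_real_data.py --derive-breadth --strict-data",
    affected_component="industry metadata fetch",
    observed_error="HTTP Error 429: Too Many Requests",
    escalation_target="GPT Pro",
)
print(json.dumps(entry, indent=2))
print(render_markdown(entry))
```

Producing both forms from a single dict keeps the machine-readable and human-readable notes in step, which is what the `implementation-issue-log` capability asks for.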

+ 9 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/specs/real-walkforward-report/spec.md

@@ -0,0 +1,9 @@
+## MODIFIED Requirements
+
+### Requirement: Real walk-forward comparative summary artifact
+The system MUST generate a machine-readable summary artifact that compares regime strategy and buy-and-hold baseline outcomes from full PIT input using the actual generated walk-forward windows, and MUST provide backward-compatible comparison delta keys for baseline readers.
+
+#### Scenario: Report pipeline succeeds
+- **WHEN** report pipeline runs with valid PIT input
+- **THEN** `real_walkforward_summary.json` MUST include the actual generated walk-forward windows used during evaluation
+- **AND** the comparison block MUST include canonical delta fields plus compatibility aliases: `annual_return_delta_vs_baseline` and `max_drawdown_delta_vs_baseline`
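
Reviewer sketch for the compatibility-alias requirement: a minimal way to emit alias keys alongside canonical ones without renaming anything. The alias names come from the spec; the canonical key names in `ALIAS_MAP` (`annual_return_delta`, `max_drawdown_delta`) are hypothetical, since this diff does not show them.

```python
# Canonical names on the left are assumed; aliases on the right are from the spec.
ALIAS_MAP = {
    "annual_return_delta": "annual_return_delta_vs_baseline",
    "max_drawdown_delta": "max_drawdown_delta_vs_baseline",
}


def with_compat_aliases(comparison, alias_map=ALIAS_MAP):
    """Return a copy of the comparison block with alias keys added.

    Canonical keys are left untouched, and an alias is only added when its
    canonical key is present and the alias is not already set.
    """
    out = dict(comparison)
    for canonical, alias in alias_map.items():
        if canonical in out:
            out.setdefault(alias, out[canonical])
    return out


comparison = {"annual_return_delta": 0.021, "max_drawdown_delta": -0.034}
print(with_compat_aliases(comparison))
```

A test in the spirit of task 3.2 would then assert that each alias equals its canonical value, so readers of either key see the same number.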

+ 21 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/changes/archive/2026-04-09-harden-derived-breadth-production/tasks.md

@@ -0,0 +1,21 @@
+## 1. OpenSpec Alignment
+
+- [x] 1.1 Finalize proposal/design/spec deltas for full50 hardening, metadata resilience, and report compatibility.
+- [x] 1.2 Validate change artifacts are unblocked and apply-ready via OpenSpec status/instructions.
+
+## 2. Derived Breadth Hardening
+
+- [x] 2.1 Add resilient metadata fetch chain and metadata diagnostics fields in breadth derivation summary.
+- [x] 2.2 Ensure cache-first reruns and provider diagnostics remain correct on full50 derivation runs.
+- [x] 2.3 Add/extend unit tests for metadata fallback and cache behavior.
+
+## 3. Walkforward Summary Compatibility
+
+- [x] 3.1 Add backward-compatible comparison alias keys in `real_walkforward_summary.json`.
+- [x] 3.2 Add/extend tests to assert alias keys are emitted and numerically consistent.
+
+## 4. Validation and Handoff
+
+- [x] 4.1 Run targeted tests plus full regression and strict OpenSpec validation.
+- [x] 4.2 Run full50 derived-breadth smoke; record unresolved upstream blockers in issue log artifact for GPT Pro escalation.
+- [x] 4.3 Update memory logs with results and blocker status.

+ 39 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/constituent-derived-breadth/spec.md

@@ -0,0 +1,39 @@
+# constituent-derived-breadth Specification
+
+## Purpose
+TBD - created by archiving change derive-real-breadth-sidecar. Update Purpose after archive.
+## Requirements
+### Requirement: Constituent-derived breadth sidecar
+The system MUST be able to derive the required ChiNext 50 breadth sidecar columns from constituent-level market histories when the user selects internal breadth derivation, and MUST support full latest-constituent execution by default when no max-constituent cap is configured.
+
+#### Scenario: Build breadth from constituents
+- **WHEN** ingestion runs with derived breadth enabled and no external breadth panel is provided
+- **THEN** the system MUST fetch constituent membership, stock-level histories, and required stock metadata
+- **AND** it MUST emit a breadth panel containing all required sidecar columns used by the PIT contract
+
+#### Scenario: Re-run with a valid local constituent-history cache
+- **WHEN** derived breadth runs again with the same date window and a valid local cache directory
+- **THEN** the system MUST prefer cached constituent histories
+- **AND** cache boundary checks MUST tolerate non-trading-day start/end mismatches within configured tolerance
+
+### Requirement: Hybrid constituent history fetching
+The system MUST support Akshare-first constituent history fetching with per-symbol Mairui fallback for missing or failed stock history calls, and SHALL record per-symbol provider and metadata-source diagnostics for audit.
+
+#### Scenario: Akshare stock history fails for one symbol
+- **WHEN** the breadth builder cannot fetch a constituent history from Akshare for a required symbol
+- **THEN** it MUST retry that symbol with Mairui when a valid licence is configured
+- **AND** it MUST record which provider supplied the final history for that symbol
+
+#### Scenario: Industry metadata fetch degrades
+- **WHEN** stock metadata fetch fails or yields missing industry values for some symbols
+- **THEN** the breadth builder MUST continue with documented fallback behavior
+- **AND** derivation metadata MUST include unknown-industry ratio, metadata error diagnostics, and metadata provider counts
+
+### Requirement: Auditable breadth derivation artifacts
+The system MUST persist metadata describing how the derived breadth panel was built, including cache hit/miss statistics and metadata-quality diagnostics.
+
+#### Scenario: Derived breadth publish succeeds
+- **WHEN** the breadth builder writes a derived sidecar
+- **THEN** it MUST also write machine-readable metadata that includes membership mode, provider usage, constituent counts, and missing-symbol diagnostics
+- **AND** it MUST include cache hit/miss counts, metadata provider counts, and industry coverage diagnostics
+

+ 12 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/implementation-issue-log/spec.md

@@ -0,0 +1,12 @@
+# implementation-issue-log Specification
+
+## Purpose
+TBD - created by archiving change harden-derived-breadth-production. Update Purpose after archive.
+## Requirements
+### Requirement: Runtime issue log artifact
+The system MUST allow maintainers to persist a run-scoped issue log that records unresolved external data-provider blockers with reproducible context for external review.
+
+#### Scenario: Upstream provider blocks deterministic completion
+- **WHEN** ingestion or derivation encounters unresolved external failures (for example sustained 429/connection aborts) after configured retries
+- **THEN** maintainers MUST be able to persist an issue-log artifact containing timestamp, command context, affected component, observed error, and suggested escalation target
+

+ 27 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/openspec/specs/real-walkforward-report/spec.md

@@ -0,0 +1,27 @@
+# real-walkforward-report Specification
+
+## Purpose
+TBD - created by archiving change add-real-walkforward-report. Update Purpose after archive.
+## Requirements
+### Requirement: Real walk-forward comparative summary artifact
+The system MUST generate a machine-readable summary artifact that compares regime strategy and buy-and-hold baseline outcomes from full PIT input using the actual generated walk-forward windows, and MUST provide backward-compatible comparison delta keys for baseline readers.
+
+#### Scenario: Report pipeline succeeds
+- **WHEN** report pipeline runs with valid PIT input
+- **THEN** `real_walkforward_summary.json` MUST include the actual generated walk-forward windows used during evaluation
+- **AND** the comparison block MUST include canonical delta fields plus compatibility aliases: `annual_return_delta_vs_baseline` and `max_drawdown_delta_vs_baseline`
+
+### Requirement: Human-readable report artifact
+The system SHALL generate a markdown report summarizing core evidence for review.
+
+#### Scenario: Markdown report generation
+- **WHEN** summary generation completes
+- **THEN** output directory SHALL contain `real_walkforward_report.md` with drawdown ratio, upside capture, and utility comparison
+
+### Requirement: Reuse frozen walk-forward evaluation
+The system MUST evaluate strategy windows using existing frozen train-select/test-freeze mechanics.
+
+#### Scenario: Frozen board output in report pipeline
+- **WHEN** report pipeline runs
+- **THEN** output directory MUST contain `frozen_validation_board.csv` compatible with existing audit fields
+

+ 15 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/ingestion_manifest.json

@@ -0,0 +1,15 @@
+{
+  "raw_market_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\market.csv",
+  "raw_hs300_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\hs300.csv",
+  "raw_star50_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\star50.csv",
+  "raw_csi1000_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\csi1000.csv",
+  "raw_breadth_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\breadth.csv",
+  "breadth_source": "derived",
+  "breadth_derivation_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\breadth_derivation_summary.json",
+  "breadth_integrity_path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\breadth_integrity_summary.json",
+  "staging_market_path": "outputs\\ingestion_derived_full50_20260409_v2\\staging\\market.csv",
+  "staging_sidecar_path": "outputs\\ingestion_derived_full50_20260409_v2\\staging\\sidecar.csv",
+  "pit_output_path": "outputs\\ingestion_derived_full50_20260409_v2\\pit\\chinext50_pit.csv",
+  "quality_summary_path": "outputs\\ingestion_derived_full50_20260409_v2\\pit\\pit_quality_summary.json",
+  "row_count": 1517
+}

+ 231 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/pit/pit_quality_summary.json

@@ -0,0 +1,231 @@
+{
+  "mode": "strict",
+  "strict": true,
+  "passed": true,
+  "blocking": false,
+  "blocking_columns": [
+    "close",
+    "corr_spike_20",
+    "csi1000_close",
+    "dispersion_20",
+    "eq_weight_ret_5",
+    "high",
+    "hs300_close",
+    "low",
+    "open",
+    "pct_constituents_above_20dma",
+    "pct_constituents_above_60dma",
+    "pct_new_high_20",
+    "pct_new_low_20",
+    "star50_close",
+    "top3_contribution_5",
+    "volume",
+    "weighted_ret_5"
+  ],
+  "default_min_coverage": 0.95,
+  "column_min_coverage": {
+    "open": 0.95,
+    "high": 0.95,
+    "low": 0.95,
+    "close": 0.95,
+    "volume": 0.95,
+    "hs300_close": 0.95,
+    "star50_close": 0.95,
+    "csi1000_close": 0.95,
+    "pct_constituents_above_20dma": 0.95,
+    "pct_constituents_above_60dma": 0.95,
+    "pct_new_high_20": 0.95,
+    "pct_new_low_20": 0.95,
+    "eq_weight_ret_5": 0.95,
+    "weighted_ret_5": 0.95,
+    "top3_contribution_5": 0.95,
+    "top1_contribution_5": 0.95,
+    "top10_contribution_5": 0.95,
+    "sector_concentration_20": 0.95,
+    "corr_spike_20": 0.95,
+    "dispersion_20": 0.95
+  },
+  "critical_columns": [
+    "open",
+    "high",
+    "low",
+    "close",
+    "volume",
+    "hs300_close",
+    "star50_close",
+    "csi1000_close",
+    "pct_constituents_above_20dma",
+    "pct_constituents_above_60dma",
+    "pct_new_high_20",
+    "pct_new_low_20",
+    "eq_weight_ret_5",
+    "weighted_ret_5",
+    "top3_contribution_5",
+    "top1_contribution_5",
+    "top10_contribution_5",
+    "sector_concentration_20",
+    "corr_spike_20",
+    "dispersion_20"
+  ],
+  "breaches": [],
+  "errors": [],
+  "warnings": [],
+  "quality_report": {
+    "row_count": 1517,
+    "date_start": "2020-01-02",
+    "date_end": "2026-04-09",
+    "duplicate_date_count": 0,
+    "columns": {
+      "open": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "high": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "low": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "close": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "volume": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "hs300_close": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "star50_close": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "csi1000_close": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "pct_constituents_above_20dma": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "pct_constituents_above_60dma": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "pct_new_high_20": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "pct_new_low_20": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "eq_weight_ret_5": {
+        "present": true,
+        "non_null_count": 1512,
+        "non_null_ratio": 0.996704021094265,
+        "missing_ratio": 0.003295978905734964
+      },
+      "weighted_ret_5": {
+        "present": true,
+        "non_null_count": 1512,
+        "non_null_ratio": 0.996704021094265,
+        "missing_ratio": 0.003295978905734964
+      },
+      "top3_contribution_5": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "top1_contribution_5": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "top10_contribution_5": {
+        "present": true,
+        "non_null_count": 1517,
+        "non_null_ratio": 1.0,
+        "missing_ratio": 0.0
+      },
+      "sector_concentration_20": {
+        "present": true,
+        "non_null_count": 1513,
+        "non_null_ratio": 0.997363216875412,
+        "missing_ratio": 0.002636783124588038
+      },
+      "corr_spike_20": {
+        "present": true,
+        "non_null_count": 1512,
+        "non_null_ratio": 0.996704021094265,
+        "missing_ratio": 0.003295978905734964
+      },
+      "dispersion_20": {
+        "present": true,
+        "non_null_count": 1512,
+        "non_null_ratio": 0.996704021094265,
+        "missing_ratio": 0.003295978905734964
+      }
+    }
+  },
+  "sources": {
+    "market_path": "outputs\\ingestion_derived_full50_20260409_v2\\staging\\market.csv",
+    "sidecar_paths": [
+      "outputs\\ingestion_derived_full50_20260409_v2\\staging\\sidecar.csv"
+    ],
+    "sidecar_count": 1,
+    "merged_row_count": 1517
+  },
+  "pit_columns": [
+    "close",
+    "corr_spike_20",
+    "csi1000_close",
+    "dispersion_20",
+    "eq_weight_ret_5",
+    "high",
+    "hs300_close",
+    "low",
+    "open",
+    "pct_constituents_above_20dma",
+    "pct_constituents_above_60dma",
+    "pct_new_high_20",
+    "pct_new_low_20",
+    "sector_concentration_20",
+    "star50_close",
+    "top10_contribution_5",
+    "top1_contribution_5",
+    "top3_contribution_5",
+    "volume",
+    "weighted_ret_5"
+  ]
+}
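
Reviewer sketch relating the `non_null_ratio` values above to the `0.95` coverage floor: a minimal illustration, assuming coverage is simply `non_null_count / row_count` compared against a per-column minimum. The function name `coverage_breaches` is hypothetical; the counts are copied from this summary.

```python
def coverage_breaches(non_null_counts, row_count, min_coverage=0.95):
    """List the columns whose non-null ratio falls below the coverage floor."""
    breaches = []
    for column, non_null in non_null_counts.items():
        ratio = non_null / row_count if row_count else 0.0
        if ratio < min_coverage:
            breaches.append({"column": column, "non_null_ratio": ratio})
    return breaches


# Counts taken from the quality report above (row_count = 1517).
counts = {
    "close": 1517,
    "eq_weight_ret_5": 1512,
    "sector_concentration_20": 1513,
}
print(1512 / 1517)                      # ratio reported for eq_weight_ret_5
print(coverage_breaches(counts, 1517))  # empty list: nothing breaches 0.95
```

The few missing rows in the rolling columns sit far above the floor, which is consistent with the empty `breaches` list in this run.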

+ 196 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/raw/breadth_derivation_summary.json

@@ -0,0 +1,196 @@
+{
+  "index_symbol": "399673",
+  "membership_mode": "latest_constituents_with_entry_dates",
+  "constituent_count_total": 50,
+  "constituent_count_requested": 50,
+  "constituent_count_used": 50,
+  "missing_symbols": [],
+  "provider_by_symbol": {
+    "300002": "cache",
+    "300014": "cache",
+    "300015": "cache",
+    "300017": "cache",
+    "300024": "cache",
+    "300033": "cache",
+    "300058": "cache",
+    "300059": "cache",
+    "300073": "cache",
+    "300115": "cache",
+    "300122": "cache",
+    "300124": "cache",
+    "300136": "cache",
+    "300207": "cache",
+    "300223": "cache",
+    "300251": "cache",
+    "300255": "cache",
+    "300274": "cache",
+    "300308": "cache",
+    "300316": "cache",
+    "300339": "cache",
+    "300346": "cache",
+    "300347": "cache",
+    "300373": "cache",
+    "300394": "cache",
+    "300395": "cache",
+    "300408": "cache",
+    "300418": "cache",
+    "300433": "cache",
+    "300442": "cache",
+    "300450": "cache",
+    "300458": "cache",
+    "300474": "cache",
+    "300476": "cache",
+    "300496": "cache",
+    "300502": "cache",
+    "300548": "cache",
+    "300604": "cache",
+    "300724": "cache",
+    "300748": "cache",
+    "300750": "cache",
+    "300759": "cache",
+    "300760": "cache",
+    "300763": "cache",
+    "300782": "cache",
+    "300803": "cache",
+    "300857": "akshare",
+    "301236": "akshare",
+    "301308": "akshare",
+    "302132": "cache"
+  },
+  "provider_counts": {
+    "cache": 47,
+    "akshare": 3
+  },
+  "metadata_provider_by_symbol": {
+    "300002": "akshare",
+    "300014": "akshare",
+    "300015": "akshare",
+    "300017": "akshare",
+    "300024": "akshare",
+    "300033": "akshare",
+    "300058": "akshare",
+    "300059": "akshare",
+    "300073": "akshare",
+    "300115": "akshare",
+    "300122": "akshare",
+    "300124": "akshare",
+    "300136": "akshare",
+    "300207": "akshare",
+    "300223": "akshare",
+    "300251": "akshare",
+    "300255": "akshare",
+    "300274": "akshare",
+    "300308": "akshare",
+    "300316": "akshare",
+    "300339": "akshare",
+    "300346": "akshare",
+    "300347": "akshare",
+    "300373": "akshare",
+    "300394": "akshare",
+    "300395": "akshare",
+    "300408": "akshare",
+    "300418": "akshare",
+    "300433": "akshare",
+    "300442": "akshare",
+    "300450": "akshare",
+    "300458": "akshare",
+    "300474": "akshare",
+    "300476": "akshare",
+    "300496": "akshare",
+    "300502": "akshare",
+    "300548": "akshare",
+    "300604": "akshare",
+    "300724": "akshare",
+    "300748": "akshare",
+    "300750": "akshare",
+    "300759": "akshare",
+    "300760": "akshare",
+    "300763": "akshare",
+    "300782": "akshare",
+    "300803": "akshare",
+    "300857": "akshare",
+    "301236": "akshare",
+    "301308": "akshare",
+    "302132": "akshare"
+  },
+  "metadata_provider_counts": {
+    "akshare": 50
+  },
+  "metadata_errors": {
+    "300002": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300014": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300015": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300017": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300024": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300033": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300058": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300059": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300073": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300115": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300122": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300124": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300136": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300207": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300223": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300251": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300255": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300274": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300308": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300316": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300339": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300346": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300347": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300373": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300394": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300395": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300408": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300418": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300433": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300442": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300450": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300458": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300474": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300476": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300496": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300502": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300548": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300604": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300724": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300748": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300750": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300759": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300760": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300763": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300782": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300803": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "300857": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "301236": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "301308": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests",
+    "302132": "akshare_meta=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')); mairui_meta=HTTP Error 429: Too Many Requests"
+  },
+  "cache": {
+    "enabled": true,
+    "path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\constituent_history",
+    "hit_count": 47,
+    "miss_count": 3
+  },
+  "meta_cache": {
+    "enabled": true,
+    "path": "outputs\\ingestion_derived_full50_20260409_v2\\raw\\constituent_history\\_meta_cache.json",
+    "hit_count": 50,
+    "miss_count": 0
+  },
+  "active_constituent_count": {
+    "min": 46,
+    "median": 50.0,
+    "max": 50
+  },
+  "date_start": "2020-01-02",
+  "date_end": "2026-04-09",
+  "row_count": 1517,
+  "industry_unknown_count": 50,
+  "industry_unknown_ratio": 1.0,
+  "industry_unique_count": 1,
+  "sector_concentration_mode": "weight_hhi_proxy",
+  "errors": {}
+}

+ 116 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/ingestion_derived_full50_20260409_v2/raw/breadth_integrity_summary.json

@@ -0,0 +1,116 @@
+{
+  "strict": true,
+  "passed": true,
+  "blocking": false,
+  "thresholds": {
+    "min_unique_non_null": 3,
+    "max_dominant_value_ratio": 0.995,
+    "std_floor": 1e-08,
+    "warmup_observations": {
+      "pct_constituents_above_20dma": 40,
+      "pct_constituents_above_60dma": 60,
+      "pct_new_high_20": 40,
+      "pct_new_low_20": 40,
+      "sector_concentration_20": 40,
+      "corr_spike_20": 20,
+      "dispersion_20": 20,
+      "concentration_spread_5": 20
+    }
+  },
+  "failures": [],
+  "warnings": [],
+  "column_stats": {
+    "pct_constituents_above_20dma": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 176,
+      "dominant_value_ratio": 0.01977587343441002,
+      "std": 0.25532316283592393
+    },
+    "pct_constituents_above_60dma": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 161,
+      "dominant_value_ratio": 0.03889255108767304,
+      "std": 0.25942098307480166
+    },
+    "pct_new_high_20": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 104,
+      "dominant_value_ratio": 0.14765985497692816,
+      "std": 0.12537656300029865
+    },
+    "pct_new_low_20": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 114,
+      "dominant_value_ratio": 0.23335530652603823,
+      "std": 0.1556659475544428
+    },
+    "eq_weight_ret_5": {
+      "present": true,
+      "non_null_count": 1512,
+      "unique_non_null_count": 1512,
+      "dominant_value_ratio": 0.0006613756613756613,
+      "std": 0.049401440837184375
+    },
+    "weighted_ret_5": {
+      "present": true,
+      "non_null_count": 1512,
+      "unique_non_null_count": 1512,
+      "dominant_value_ratio": 0.0006613756613756613,
+      "std": 0.05241031992692125
+    },
+    "top3_contribution_5": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 1517,
+      "dominant_value_ratio": 0.0006591957811470006,
+      "std": 0.04295657153500565
+    },
+    "top1_contribution_5": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 1517,
+      "dominant_value_ratio": 0.0006591957811470006,
+      "std": 0.03209071605011246
+    },
+    "top10_contribution_5": {
+      "present": true,
+      "non_null_count": 1517,
+      "unique_non_null_count": 1517,
+      "dominant_value_ratio": 0.0006591957811470006,
+      "std": 0.04519791519215887
+    },
+    "sector_concentration_20": {
+      "present": true,
+      "non_null_count": 1513,
+      "unique_non_null_count": 1513,
+      "dominant_value_ratio": 0.0006609385327164573,
+      "std": 0.009519934161922521
+    },
+    "corr_spike_20": {
+      "present": true,
+      "non_null_count": 1512,
+      "unique_non_null_count": 1512,
+      "dominant_value_ratio": 0.0006613756613756613,
+      "std": 0.13406390221983563
+    },
+    "dispersion_20": {
+      "present": true,
+      "non_null_count": 1512,
+      "unique_non_null_count": 1512,
+      "dominant_value_ratio": 0.0006613756613756613,
+      "std": 0.00531184282605376
+    }
+  },
+  "spread_stats": {
+    "present": true,
+    "non_null_count": 1512,
+    "unique_non_null_count": 1512,
+    "dominant_value_ratio": 0.0006613756613756613,
+    "std": 0.016417905718202737
+  },
+  "warmup_exempt_columns": []
+}
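
The `thresholds` and `column_stats` blocks above imply three per-column checks: enough distinct values, no single value dominating, and a non-trivial standard deviation. The following is a minimal sketch of how such statistics could be computed, not the project's implementation. The helper names `column_stats` and `integrity_failures` are hypothetical, `dominant_value_ratio` is assumed to be the share of non-null rows taken by the most frequent value (consistent with the reported 1/1512 ≈ 0.000661 for all-unique columns), and the sample standard deviation is an assumption. Warm-up handling is omitted.

```python
from collections import Counter
from statistics import stdev

# Threshold values copied from the "thresholds" block of the summary above.
MIN_UNIQUE_NON_NULL = 3
MAX_DOMINANT_VALUE_RATIO = 0.995
STD_FLOOR = 1e-08


def column_stats(values):
    """Summarize one column; None marks a missing observation (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # Assumed definition: frequency of the most common value over non-null rows.
    dominant = max(counts.values()) / len(non_null) if non_null else None
    return {
        "present": True,
        "non_null_count": len(non_null),
        "unique_non_null_count": len(counts),
        "dominant_value_ratio": dominant,
        # Sample standard deviation (ddof=1); the real pipeline may use another convention.
        "std": stdev(non_null) if len(non_null) > 1 else None,
    }


def integrity_failures(stats):
    """Return the names of the threshold checks that this column fails."""
    failures = []
    if stats["unique_non_null_count"] < MIN_UNIQUE_NON_NULL:
        failures.append("min_unique_non_null")
    ratio = stats["dominant_value_ratio"]
    if ratio is None or ratio > MAX_DOMINANT_VALUE_RATIO:
        failures.append("max_dominant_value_ratio")
    if stats["std"] is None or stats["std"] < STD_FLOOR:
        failures.append("std_floor")
    return failures
```

Under these assumptions a constant column fails all three checks, which is the degenerate case the gate appears designed to catch.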

+ 91 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/calibration/execution_calibration_recommendation.json

@@ -0,0 +1,91 @@
+{
+  "input": {
+    "pit_path": "outputs/ingestion_derived_full50_20260409_v2/pit/chinext50_pit.csv",
+    "row_count": 1517,
+    "date_start": "2020-01-02",
+    "date_end": "2026-04-09"
+  },
+  "score_formula": "0.60*utility_total_score + 0.25*annual_return + 0.15*upside_capture - 0.50*max_drawdown - 2.0*max(0, tracking_error_20_p95 - 0.003) - 1.0*max(0, tracking_diff_abs_mean - 0.001)",
+  "search_space": {
+    "cost_multipliers": [
+      1.0,
+      1.25,
+      1.5,
+      1.75
+    ],
+    "gap_slippage_factors": [
+      0.0,
+      0.01,
+      0.02,
+      0.03
+    ],
+    "combination_count": 16
+  },
+  "recommended": {
+    "extreme_day_cost_multiplier": 1.0,
+    "gap_slippage_factor": 0.0,
+    "calibration_score": -0.14537664793101576
+  },
+  "top_candidates": [
+    {
+      "extreme_day_cost_multiplier": 1.0,
+      "gap_slippage_factor": 0.0,
+      "calibration_score": -0.14537664793101576,
+      "utility_total_score": -0.09927909160630177,
+      "annual_return": 0.069958326797535,
+      "sharpe": 0.540206392648576,
+      "max_drawdown": 0.28459241169609906,
+      "tracking_diff_mean": -5.7519788918205824e-05,
+      "tracking_diff_abs_mean": 5.7519788918205824e-05,
+      "tracking_error_20_p95": 0.00010210417796493308
+    },
+    {
+      "extreme_day_cost_multiplier": 1.25,
+      "gap_slippage_factor": 0.0,
+      "calibration_score": -0.14618201694246757,
+      "utility_total_score": -0.10037243303111554,
+      "annual_return": 0.06966481432183369,
+      "sharpe": 0.5379928664965591,
+      "max_drawdown": 0.28472130611720226,
+      "tracking_diff_mean": -5.86147757255937e-05,
+      "tracking_diff_abs_mean": 5.86147757255937e-05,
+      "tracking_error_20_p95": 0.00010574148733395242
+    },
+    {
+      "extreme_day_cost_multiplier": 1.5,
+      "gap_slippage_factor": 0.0,
+      "calibration_score": -0.14698743456043617,
+      "utility_total_score": -0.10146589874948211,
+      "annual_return": 0.06937136905761121,
+      "sharpe": 0.5357790370434938,
+      "max_drawdown": 0.28485018220481506,
+      "tracking_diff_mean": -5.970976253298155e-05,
+      "tracking_diff_abs_mean": 5.970976253298155e-05,
+      "tracking_error_20_p95": 0.00011287997169560831
+    },
+    {
+      "extreme_day_cost_multiplier": 1.75,
+      "gap_slippage_factor": 0.0,
+      "calibration_score": -0.14779289957559236,
+      "utility_total_score": -0.1025594867390609,
+      "annual_return": 0.06907799099240419,
+      "sharpe": 0.533564908786306,
+      "max_drawdown": 0.28497903996085616,
+      "tracking_diff_mean": -6.080474934036943e-05,
+      "tracking_diff_abs_mean": 6.080474934036943e-05,
+      "tracking_error_20_p95": 0.0001225963582134463
+    },
+    {
+      "extreme_day_cost_multiplier": 1.0,
+      "gap_slippage_factor": 0.01,
+      "calibration_score": -0.14893399304042237,
+      "utility_total_score": -0.10375277939061703,
+      "annual_return": 0.06889820854284756,
+      "sharpe": 0.5320341928422913,
+      "max_drawdown": 0.2857323256665525,
+      "tracking_diff_mean": -6.145599247339194e-05,
+      "tracking_diff_abs_mean": 6.145599247339194e-05,
+      "tracking_error_20_p95": 0.00011438040472625246
+    }
+  ]
+}
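
The `score_formula` string above can be read as a plain function. The sketch below transcribes it term by term; the function name `calibration_score` is hypothetical and only the coefficients and thresholds come from the formula string. Note that `upside_capture` is not listed among the `top_candidates` fields, so the reported scores cannot be reproduced from the listed fields alone. For every listed candidate both tracking terms sit below their thresholds (0.003 and 0.001), so the two hinge penalties contribute zero.

```python
def calibration_score(
    utility_total_score,
    annual_return,
    upside_capture,
    max_drawdown,
    tracking_error_20_p95,
    tracking_diff_abs_mean,
):
    """Transcription of the score_formula string (hypothetical function name)."""
    reward = 0.60 * utility_total_score + 0.25 * annual_return + 0.15 * upside_capture
    drawdown_penalty = 0.50 * max_drawdown
    # Hinge penalties: only the excess over each threshold is charged.
    tracking_error_penalty = 2.0 * max(0.0, tracking_error_20_p95 - 0.003)
    tracking_diff_penalty = 1.0 * max(0.0, tracking_diff_abs_mean - 0.001)
    return reward - drawdown_penalty - tracking_error_penalty - tracking_diff_penalty
```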

+ 29 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/demo/metrics_summary.json

@@ -0,0 +1,29 @@
+{
+  "annual_return": 0.0672532209047465,
+  "annual_vol": 0.12947165554280526,
+  "sharpe": 0.5194435849513918,
+  "max_drawdown": 0.28712739301380297,
+  "calmar": 0.23422781156068054,
+  "benchmark_return": 0.14823667844896993,
+  "benchmark_vol": 0.33820228701188976,
+  "benchmark_sharpe": 0.43830773516844534,
+  "benchmark_max_drawdown": 0.601496466295161,
+  "sharpe_delta": 0.08113584978294641,
+  "drawdown_improvement_ratio": 0.5226449212871911,
+  "upside_capture": 0.25932041164946695,
+  "downside_capture": 0.24652910601340003,
+  "annual_turnover": 18.11873350923483,
+  "tracking_diff_mean": -6.758216964335376e-05,
+  "tracking_diff_abs_mean": 6.758216964335376e-05,
+  "tracking_error_20_p95": 0.00013785089106784778,
+  "utility_total_score": -0.11040750752007417,
+  "utility_status": "negative_utility",
+  "state_counts": {
+    "risk_off": 504,
+    "chop": 470,
+    "repair": 308,
+    "warmup": 88,
+    "trend": 87,
+    "euphoric_late": 60
+  }
+}
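
Several fields in this summary appear to be derived from others. The sketch below states relationships inferred from the reported numbers, not taken from the backtest code: `sharpe` as return over volatility with a zero risk-free rate, `calmar` as return over maximum drawdown, `sharpe_delta` as the difference to the benchmark Sharpe, and `drawdown_improvement_ratio` as one minus the drawdown ratio. The function name `derived_ratios` is hypothetical.

```python
def derived_ratios(annual_return, annual_vol, max_drawdown, benchmark_sharpe, benchmark_max_drawdown):
    """Recompute derived metrics under the inferred definitions (hypothetical helper)."""
    sharpe = annual_return / annual_vol  # assumes a zero risk-free rate
    return {
        "sharpe": sharpe,
        "calmar": annual_return / max_drawdown,
        "sharpe_delta": sharpe - benchmark_sharpe,
        "drawdown_improvement_ratio": 1.0 - max_drawdown / benchmark_max_drawdown,
    }


# Inputs copied from metrics_summary.json above.
ratios = derived_ratios(
    annual_return=0.0672532209047465,
    annual_vol=0.12947165554280526,
    max_drawdown=0.28712739301380297,
    benchmark_sharpe=0.43830773516844534,
    benchmark_max_drawdown=0.601496466295161,
)
```

With these inputs the recomputed values agree with the reported `sharpe`, `calmar`, `sharpe_delta`, and `drawdown_improvement_ratio` to within rounding, which supports the inferred definitions without confirming them.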

+ 72 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/frozen/frozen_validation_summary.json

@@ -0,0 +1,72 @@
+{
+  "window_count": 5,
+  "processed_window_count": 5,
+  "skipped_window_count": 0,
+  "positive_window_ratio": 0.4,
+  "selected_candidate_distribution": {
+    "defensive": 5
+  },
+  "window_status_counts": {
+    "ok": 5
+  },
+  "candidate_ids": [
+    "defensive",
+    "baseline",
+    "pro_risk"
+  ],
+  "min_train_rows": 120,
+  "min_test_rows": 40,
+  "windows": [
+    {
+      "train_start": "2020-01-02",
+      "train_end": "2021-12-31",
+      "test_start": "2022-01-04",
+      "test_end": "2022-12-30"
+    },
+    {
+      "train_start": "2020-01-02",
+      "train_end": "2022-12-30",
+      "test_start": "2023-01-03",
+      "test_end": "2023-12-29"
+    },
+    {
+      "train_start": "2020-01-02",
+      "train_end": "2023-12-29",
+      "test_start": "2024-01-02",
+      "test_end": "2024-12-31"
+    },
+    {
+      "train_start": "2020-01-02",
+      "train_end": "2024-12-31",
+      "test_start": "2025-01-02",
+      "test_end": "2025-12-31"
+    },
+    {
+      "train_start": "2020-01-02",
+      "train_end": "2025-12-31",
+      "test_start": "2026-01-05",
+      "test_end": "2026-04-09"
+    }
+  ],
+  "full_sample_metrics": {
+    "annual_return": 0.0672532209047465,
+    "annual_vol": 0.12947165554280526,
+    "sharpe": 0.5194435849513918,
+    "max_drawdown": 0.28712739301380297,
+    "calmar": 0.23422781156068054,
+    "benchmark_return": 0.14823667844896993,
+    "benchmark_vol": 0.33820228701188976,
+    "benchmark_sharpe": 0.43830773516844534,
+    "benchmark_max_drawdown": 0.601496466295161,
+    "sharpe_delta": 0.08113584978294641,
+    "drawdown_improvement_ratio": 0.5226449212871911,
+    "upside_capture": 0.25932041164946695,
+    "downside_capture": 0.24652910601340003,
+    "annual_turnover": 18.11873350923483,
+    "tracking_diff_mean": -6.758216964335376e-05,
+    "tracking_diff_abs_mean": 6.758216964335376e-05,
+    "tracking_error_20_p95": 0.00013785089106784778,
+    "utility_total_score": -0.11040750752007417,
+    "utility_status": "negative_utility"
+  }
+}
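
The `windows` list above follows an expanding pattern: training always starts at the first available date and grows by one calendar year per window, each test window is the following calendar year, and the last test window is partial. The sketch below reproduces that pattern under stated assumptions. It is a hypothetical stand-in, not the code in `backtest/walkforward.py`; the name `build_expanding_windows_sketch` and the calendar-year splitting rule are assumptions, and a partial last test window is always kept.

```python
from datetime import date


def build_expanding_windows_sketch(trading_days, min_train_years=2, test_years=1):
    """Build expanding train/test windows split on calendar years (hypothetical)."""
    days = sorted(trading_days)
    first_year, last_year = days[0].year, days[-1].year
    windows = []
    test_year = first_year + min_train_years
    while test_year <= last_year:
        # Training set: everything before the test year, anchored at the first day.
        train = [d for d in days if d.year < test_year]
        # Test set: the next test_years calendar years (possibly partial at the end).
        test = [d for d in days if test_year <= d.year < test_year + test_years]
        if train and test:
            windows.append({
                "train_start": train[0].isoformat(),
                "train_end": train[-1].isoformat(),
                "test_start": test[0].isoformat(),
                "test_end": test[-1].isoformat(),
            })
        test_year += test_years
    return windows
```

Applied to a trading calendar running from 2020-01-02 to 2026-04-09, this rule would yield five windows with test years 2022 through 2026, matching the `window_count` reported above.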

+ 110 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/outputs/system_e2e_derived_full50_v2/report/real_walkforward_summary.json

@@ -0,0 +1,110 @@
+{
+  "input": {
+    "pit_path": "outputs/ingestion_derived_full50_20260409_v2/pit/chinext50_pit.csv",
+    "row_count": 1517,
+    "date_start": "2020-01-02",
+    "date_end": "2026-04-09"
+  },
+  "frozen_walkforward": {
+    "total_windows": 5,
+    "processed_window_count": 5,
+    "skipped_window_count": 0,
+    "positive_window_ratio": 0.4,
+    "selected_candidate_distribution": {
+      "defensive": 5
+    },
+    "window_status_counts": {
+      "ok": 5
+    },
+    "candidate_ids": [
+      "defensive",
+      "baseline",
+      "pro_risk"
+    ],
+    "min_train_rows": 120,
+    "min_test_rows": 40,
+    "windows": [
+      {
+        "train_start": "2020-01-02",
+        "train_end": "2021-12-31",
+        "test_start": "2022-01-04",
+        "test_end": "2022-12-30"
+      },
+      {
+        "train_start": "2020-01-02",
+        "train_end": "2022-12-30",
+        "test_start": "2023-01-03",
+        "test_end": "2023-12-29"
+      },
+      {
+        "train_start": "2020-01-02",
+        "train_end": "2023-12-29",
+        "test_start": "2024-01-02",
+        "test_end": "2024-12-31"
+      },
+      {
+        "train_start": "2020-01-02",
+        "train_end": "2024-12-31",
+        "test_start": "2025-01-02",
+        "test_end": "2025-12-31"
+      },
+      {
+        "train_start": "2020-01-02",
+        "train_end": "2025-12-31",
+        "test_start": "2026-01-05",
+        "test_end": "2026-04-09"
+      }
+    ]
+  },
+  "strategy_full_sample_metrics": {
+    "annual_return": 0.0672532209047465,
+    "annual_vol": 0.12947165554280526,
+    "sharpe": 0.5194435849513918,
+    "max_drawdown": 0.28712739301380297,
+    "calmar": 0.23422781156068054,
+    "benchmark_return": 0.14823667844896993,
+    "benchmark_vol": 0.33820228701188976,
+    "benchmark_sharpe": 0.43830773516844534,
+    "benchmark_max_drawdown": 0.601496466295161,
+    "sharpe_delta": 0.08113584978294641,
+    "drawdown_improvement_ratio": 0.5226449212871911,
+    "upside_capture": 0.25932041164946695,
+    "downside_capture": 0.24652910601340003,
+    "annual_turnover": 18.11873350923483,
+    "tracking_diff_mean": -6.758216964335376e-05,
+    "tracking_diff_abs_mean": 6.758216964335376e-05,
+    "tracking_error_20_p95": 0.00013785089106784778,
+    "utility_total_score": -0.11040750752007417,
+    "utility_status": "negative_utility"
+  },
+  "baseline_full_sample_metrics": {
+    "annual_return": 0.1462534784020535,
+    "annual_vol": 0.33818575167159715,
+    "sharpe": 0.43246493289308124,
+    "max_drawdown": 0.6014964662951608,
+    "calmar": 0.24314935597690665,
+    "benchmark_return": 0.14823667844896993,
+    "benchmark_vol": 0.33820228701188976,
+    "benchmark_sharpe": 0.43830773516844534,
+    "benchmark_max_drawdown": 0.601496466295161,
+    "sharpe_delta": -0.005842802275364101,
+    "drawdown_improvement_ratio": 1.8457681579800966e-16,
+    "upside_capture": 0.9991958174462328,
+    "downside_capture": 1.0000762649568267,
+    "annual_turnover": 0.16622691292875988,
+    "tracking_diff_mean": -5.42895437297208e-07,
+    "tracking_diff_abs_mean": 5.42895437297208e-07,
+    "tracking_error_20_p95": 0.0,
+    "utility_total_score": 0.03475011159302114,
+    "utility_status": "positive_utility"
+  },
+  "comparison": {
+    "annual_return_delta": -0.07900025749730699,
+    "annual_return_delta_vs_baseline": -0.07900025749730699,
+    "max_drawdown_delta": -0.3143690732813579,
+    "max_drawdown_delta_vs_baseline": -0.3143690732813579,
+    "drawdown_ratio_vs_baseline": 0.47735507871280897,
+    "utility_delta_vs_baseline": -0.1451576191130953,
+    "upside_capture": 0.25932041164946695
+  }
+}

+ 280 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/pipelines/real_walkforward_report.py

@@ -0,0 +1,280 @@
+from __future__ import annotations
+
+from pathlib import Path
+import sys
+from typing import Any
+
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+import argparse
+import json
+
+import pandas as pd
+
+from backtest.engine import run_backtest
+from backtest.frozen_walkforward import (
+    normalize_hypothesis_candidates,
+    run_frozen_walkforward,
+    run_strategy_bundle,
+)
+from backtest.utility import utility_from_metrics, utility_status
+from backtest.walkforward import WindowSpec, build_expanding_windows
+from config.loader import load_config
+from data.io import evaluate_data_quality_gate, load_full_pit_data
+
+
+def _resolve_data_quality_settings(
+    config: dict[str, Any],
+    *,
+    strict_cli: bool,
+    min_coverage_cli: float | None,
+) -> tuple[bool, float, list[str] | None, list[str] | None, dict[str, float]]:
+    quality_cfg = config.get('data_quality', {})
+    strict_mode = bool(quality_cfg.get('strict_mode_default', False)) or strict_cli
+    default_min_coverage = float(quality_cfg.get('default_min_coverage', 0.95))
+    if min_coverage_cli is not None:
+        default_min_coverage = float(min_coverage_cli)
+    critical_columns = [str(col).strip().lower() for col in quality_cfg.get('critical_columns', [])]
+    blocking_columns = [str(col).strip().lower() for col in quality_cfg.get('blocking_columns', critical_columns)]
+    column_min_coverage = {
+        str(column).strip().lower(): float(value) for column, value in quality_cfg.get('column_min_coverage', {}).items()
+    }
+    return strict_mode, default_min_coverage, (critical_columns or None), (blocking_columns or None), column_min_coverage
+
+
+def _load_candidate_payload(path: str | None) -> list[dict[str, Any]] | None:
+    if not path:
+        return None
+    with Path(path).open('r', encoding='utf-8') as fh:
+        payload = json.load(fh)
+    if not isinstance(payload, list):
+        raise ValueError('Candidate file must be a JSON list of candidate objects.')
+    return payload
+
+
+def _resolve_frozen_settings(
+    config: dict[str, Any],
+    *,
+    candidates_json: str | None,
+    min_train_rows_cli: int | None,
+    min_test_rows_cli: int | None,
+) -> tuple[list[Any], int, int]:
+    frozen_cfg = config.get('frozen_validation', {})
+    raw_candidates = _load_candidate_payload(candidates_json) or frozen_cfg.get('candidates')
+    candidates = normalize_hypothesis_candidates(raw_candidates)
+
+    min_train_rows = int(frozen_cfg.get('min_train_rows', 120))
+    min_test_rows = int(frozen_cfg.get('min_test_rows', 40))
+    if min_train_rows_cli is not None:
+        min_train_rows = int(min_train_rows_cli)
+    if min_test_rows_cli is not None:
+        min_test_rows = int(min_test_rows_cli)
+    return candidates, min_train_rows, min_test_rows
+
+
+def _serialize_windows(windows: list[WindowSpec]) -> list[dict[str, str]]:
+    return [
+        {
+            'train_start': window.train_start,
+            'train_end': window.train_end,
+            'test_start': window.test_start,
+            'test_end': window.test_end,
+        }
+        for window in windows
+    ]
+
+
+def _resolve_walkforward_windows(config: dict[str, Any], raw_index) -> list[WindowSpec]:
+    frozen_cfg = config.get('frozen_validation', {})
+    window_mode = str(frozen_cfg.get('window_mode', 'expanding')).strip().lower()
+    if window_mode != 'expanding':
+        raise ValueError(f'Unsupported window_mode: {window_mode}')
+    return build_expanding_windows(
+        raw_index,
+        min_train_years=int(frozen_cfg.get('min_train_years', 2)),
+        test_years=int(frozen_cfg.get('test_years', 1)),
+        allow_partial_last_test=bool(frozen_cfg.get('allow_partial_last_test', True)),
+    )
+
+
+def _normalize_metrics(metrics: dict[str, Any]) -> dict[str, Any]:
+    out: dict[str, Any] = {}
+    for key, value in metrics.items():
+        if isinstance(value, (int, float)):
+            out[key] = float(value)
+        else:
+            out[key] = value
+    return out
+
+
+def _safe_divide(numerator: float, denominator: float) -> float | None:
+    if abs(float(denominator)) < 1e-12:
+        return None
+    return float(numerator / denominator)
+
+
+def _build_baseline_plan(raw: pd.DataFrame) -> pd.DataFrame:
+    baseline = raw.copy()
+    baseline['target_exposure'] = 1.0
+    return baseline
+
+
+def _build_report_markdown(summary: dict[str, Any]) -> str:
+    meta = summary['input']
+    comparison = summary['comparison']
+    strategy = summary['strategy_full_sample_metrics']
+    baseline = summary['baseline_full_sample_metrics']
+    frozen = summary['frozen_walkforward']
+
+    def _fmt(value: Any, ndigits: int = 4) -> str:
+        if value is None:
+            return 'n/a'
+        if isinstance(value, float):
+            return f'{value:.{ndigits}f}'
+        return str(value)
+
+    lines = [
+        '# Real Walk-Forward Report',
+        '',
+        f"- input_path: `{meta['pit_path']}`",
+        f"- row_count: `{meta['row_count']}`",
+        f"- date_range: `{meta['date_start']}` to `{meta['date_end']}`",
+        '',
+        '## Frozen Validation Summary',
+        f"- total_windows: `{frozen['total_windows']}`",
+        f"- processed_window_count: `{frozen['processed_window_count']}`",
+        f"- skipped_window_count: `{frozen['skipped_window_count']}`",
+        f"- positive_window_ratio: `{_fmt(frozen['positive_window_ratio'])}`",
+        '',
+        '## Full-Sample Comparison',
+        f"- strategy_annual_return: `{_fmt(strategy.get('annual_return'))}`",
+        f"- baseline_annual_return: `{_fmt(baseline.get('annual_return'))}`",
+        f"- annual_return_delta: `{_fmt(comparison.get('annual_return_delta'))}`",
+        f"- strategy_max_drawdown: `{_fmt(strategy.get('max_drawdown'))}`",
+        f"- baseline_max_drawdown: `{_fmt(baseline.get('max_drawdown'))}`",
+        f"- drawdown_ratio_vs_baseline: `{_fmt(comparison.get('drawdown_ratio_vs_baseline'))}`",
+        f"- strategy_utility_total_score: `{_fmt(strategy.get('utility_total_score'))}`",
+        f"- baseline_utility_total_score: `{_fmt(baseline.get('utility_total_score'))}`",
+        f"- utility_delta_vs_baseline: `{_fmt(comparison.get('utility_delta_vs_baseline'))}`",
+        f"- strategy_upside_capture: `{_fmt(strategy.get('upside_capture'))}`",
+    ]
+    return '\n'.join(lines) + '\n'
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description='Generate real-data frozen walk-forward report for ChiNext 50 regime workflow.')
+    parser.add_argument('--pit-csv', '--data-csv', dest='pit_csv', type=str, required=True, help='Required CSV/parquet full PIT input keyed by date.')
+    parser.add_argument('--strict-data', action='store_true', help='Fail fast when blocking quality breaches are detected.')
+    parser.add_argument('--min-coverage', type=float, default=None, help='Override default minimum non-null coverage ratio.')
+    parser.add_argument('--candidates-json', type=str, default=None, help='Optional JSON file describing frozen-validation candidate set.')
+    parser.add_argument('--min-train-rows', type=int, default=None, help='Override minimum required rows for each training window.')
+    parser.add_argument('--min-test-rows', type=int, default=None, help='Override minimum required rows for each test window.')
+    parser.add_argument('--config', type=str, default=None, help='Optional config YAML path.')
+    parser.add_argument('--output-dir', type=str, default='outputs/real_walkforward_report', help='Directory for report artifacts.')
+    args = parser.parse_args()
+
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    config = load_config(args.config)
+    raw = load_full_pit_data(args.pit_csv)
+
+    strict_mode, min_coverage, critical_columns, blocking_columns, column_min_coverage = _resolve_data_quality_settings(
+        config,
+        strict_cli=args.strict_data,
+        min_coverage_cli=args.min_coverage,
+    )
+    quality_summary = evaluate_data_quality_gate(
+        raw,
+        strict=strict_mode,
+        critical_columns=critical_columns,
+        blocking_columns=blocking_columns,
+        default_min_coverage=min_coverage,
+        column_min_coverage=column_min_coverage,
+    )
+    with (output_dir / 'data_quality_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(quality_summary, fh, ensure_ascii=False, indent=2)
+    if quality_summary['blocking']:
+        failed_items = quality_summary.get('errors') or quality_summary['breaches']
+        breached = ', '.join(item['column'] for item in failed_items)
+        raise ValueError(f'Data quality gate failed in strict mode. Breached columns: {breached}')
+
+    config.setdefault('_runtime', {})['strict_feature_gate'] = strict_mode
+    candidates, min_train_rows, min_test_rows = _resolve_frozen_settings(
+        config,
+        candidates_json=args.candidates_json,
+        min_train_rows_cli=args.min_train_rows,
+        min_test_rows_cli=args.min_test_rows,
+    )
+    windows = _resolve_walkforward_windows(config, raw.index)
+    board, frozen_summary = run_frozen_walkforward(
+        raw=raw,
+        config=config,
+        windows=windows,
+        candidates=candidates,
+        min_train_rows=min_train_rows,
+        min_test_rows=min_test_rows,
+    )
+    _, _, strategy_metrics = run_strategy_bundle(raw, config)
+    baseline_plan = _build_baseline_plan(raw)
+    _, baseline_metrics_raw = run_backtest(baseline_plan, config)
+
+    strategy_full_metrics = _normalize_metrics(dict(strategy_metrics))
+    strategy_full_metrics['utility_total_score'] = float(utility_from_metrics(strategy_full_metrics))
+    strategy_full_metrics['utility_status'] = utility_status(strategy_full_metrics['utility_total_score'])
+
+    baseline_metrics = _normalize_metrics(dict(baseline_metrics_raw))
+    baseline_metrics['utility_total_score'] = float(utility_from_metrics(baseline_metrics))
+    baseline_metrics['utility_status'] = utility_status(baseline_metrics['utility_total_score'])
+
+    annual_return_delta = float(strategy_full_metrics.get('annual_return', 0.0) - baseline_metrics.get('annual_return', 0.0))
+    max_drawdown_delta = float(strategy_full_metrics.get('max_drawdown', 0.0) - baseline_metrics.get('max_drawdown', 0.0))
+    comparison = {
+        'annual_return_delta': annual_return_delta,
+        'annual_return_delta_vs_baseline': annual_return_delta,
+        'max_drawdown_delta': max_drawdown_delta,
+        'max_drawdown_delta_vs_baseline': max_drawdown_delta,
+        'drawdown_ratio_vs_baseline': _safe_divide(
+            float(strategy_full_metrics.get('max_drawdown', 0.0)),
+            float(baseline_metrics.get('max_drawdown', 0.0)),
+        ),
+        'utility_delta_vs_baseline': float(
+            strategy_full_metrics.get('utility_total_score', 0.0) - baseline_metrics.get('utility_total_score', 0.0)
+        ),
+        'upside_capture': float(strategy_full_metrics.get('upside_capture', 0.0)),
+    }
+
+    summary = {
+        'input': {
+            'pit_path': str(args.pit_csv),
+            'row_count': int(len(raw)),
+            'date_start': raw.index.min().date().isoformat() if len(raw) else None,
+            'date_end': raw.index.max().date().isoformat() if len(raw) else None,
+        },
+        'frozen_walkforward': {
+            'total_windows': int(frozen_summary['total_windows']),
+            'processed_window_count': int(frozen_summary['processed_window_count']),
+            'skipped_window_count': int(frozen_summary['skipped_window_count']),
+            'positive_window_ratio': float(frozen_summary['positive_window_ratio']),
+            'selected_candidate_distribution': dict(frozen_summary['selected_candidate_distribution']),
+            'window_status_counts': dict(frozen_summary['window_status_counts']),
+            'candidate_ids': list(frozen_summary['candidate_ids']),
+            'min_train_rows': int(frozen_summary['min_train_rows']),
+            'min_test_rows': int(frozen_summary['min_test_rows']),
+            'windows': _serialize_windows(windows),
+        },
+        'strategy_full_sample_metrics': strategy_full_metrics,
+        'baseline_full_sample_metrics': baseline_metrics,
+        'comparison': comparison,
+    }
+
+    board.to_csv(output_dir / 'frozen_validation_board.csv', index=False)
+    with (output_dir / 'real_walkforward_summary.json').open('w', encoding='utf-8') as fh:
+        json.dump(summary, fh, ensure_ascii=False, indent=2)
+    (output_dir / 'real_walkforward_report.md').write_text(_build_report_markdown(summary), encoding='utf-8')
+
+
+if __name__ == '__main__':
+    main()
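Review note: the `comparison` block above reduces two metric dicts to deltas and one ratio. A minimal sketch of that arithmetic follows, assuming `_safe_divide` returns a default when the denominator is zero or either input is NaN. The real helper is not shown in this diff, so `safe_divide` and `compare_to_baseline` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import math


def safe_divide(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Hypothetical stand-in for `_safe_divide`: fall back to `default` on zero or NaN input."""
    if math.isnan(numerator) or math.isnan(denominator) or denominator == 0.0:
        return default
    return numerator / denominator


def compare_to_baseline(strategy: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Compute strategy-minus-baseline deltas plus a drawdown ratio."""
    strategy_return = float(strategy.get('annual_return', 0.0))
    baseline_return = float(baseline.get('annual_return', 0.0))
    strategy_drawdown = float(strategy.get('max_drawdown', 0.0))
    baseline_drawdown = float(baseline.get('max_drawdown', 0.0))
    return {
        # Positive means the strategy earned more per year than the baseline.
        'annual_return_delta': strategy_return - baseline_return,
        # Negative means the strategy's worst drawdown was shallower.
        'max_drawdown_delta': strategy_drawdown - baseline_drawdown,
        # Below 1.0 means the strategy drew down less than the baseline.
        'drawdown_ratio_vs_baseline': safe_divide(strategy_drawdown, baseline_drawdown),
    }
```

The inputs are plain dicts here, and missing keys default to `0.0` as in the pipeline code, so a missing metric yields a delta of zero rather than an error.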

+ 300 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/tests/test_breadth_builder.py

@@ -0,0 +1,300 @@
+from __future__ import annotations
+
+import pandas as pd
+
+from data.breadth_builder import (
+    BREADTH_REQUIRED_COLUMNS,
+    derive_breadth_sidecar,
+    evaluate_breadth_source_integrity,
+)
+
+
+def _history_frame(base_price: float, periods: int = 120) -> pd.DataFrame:
+    dates = pd.bdate_range('2024-01-02', periods=periods)
+    base = pd.Series(range(periods), dtype=float)
+    return pd.DataFrame(
+        {
+            'date': dates,
+            'close': base_price + base * 0.3 + (base % 7) * 0.2,
+        }
+    )
+
+
+def test_derive_breadth_sidecar_uses_fallback_and_outputs_required_columns() -> None:
+    constituents = pd.DataFrame(
+        {
+            'symbol': ['300001', '300002', '300003', '300004'],
+            'name': ['a', 'b', 'c', 'd'],
+            'entry_date': ['2024-01-01'] * 4,
+        }
+    )
+
+    def _constituent_fetcher(index_symbol: str) -> pd.DataFrame:
+        assert index_symbol == '399673'
+        return constituents
+
+    def _history_fetcher(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        if symbol == '300003':
+            raise ValueError('akshare temporary failure')
+        return _history_frame(base_price=10.0 + int(symbol[-1]) * 3.0)
+
+    def _history_fallback_fetcher(
+        symbol: str,
+        start_date: str | None,
+        end_date: str | None,
+        licence: str,
+    ) -> pd.DataFrame:
+        assert symbol == '300003'
+        assert licence == 'TEST-LICENCE'
+        return _history_frame(base_price=26.0)
+
+    def _meta_fetcher(symbol: str) -> dict[str, object]:
+        return {
+            'industry': 'tech' if symbol in {'300001', '300003'} else 'med',
+            'float_shares': float(1_000_000 + int(symbol[-1]) * 100_000),
+        }
+
+    breadth, metadata = derive_breadth_sidecar(
+        start_date='2024-01-02',
+        end_date='2024-07-31',
+        index_symbol='399673',
+        mairui_licence='TEST-LICENCE',
+        min_active_constituents=3,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher,
+        stock_history_fallback_fetcher=_history_fallback_fetcher,
+        stock_meta_fetcher=_meta_fetcher,
+    )
+
+    assert set(BREADTH_REQUIRED_COLUMNS).issubset(breadth.columns)
+    assert metadata['constituent_count_total'] == 4
+    assert metadata['constituent_count_used'] == 4
+    assert metadata['provider_counts']['akshare'] == 3
+    assert metadata['provider_counts']['mairui'] == 1
+    assert metadata['sector_concentration_mode'] in {'industry_max_share', 'weight_hhi_proxy'}
+    spread = (breadth['weighted_ret_5'] - breadth['eq_weight_ret_5']).dropna()
+    assert int(spread.nunique()) > 3
+
+
+def test_evaluate_breadth_source_integrity_blocks_placeholder_spread() -> None:
+    dates = pd.bdate_range('2025-01-02', periods=50)
+    data = pd.DataFrame(index=dates)
+    data.index.name = 'date'
+    for idx, col in enumerate(BREADTH_REQUIRED_COLUMNS):
+        data[col] = 0.1 + idx * 0.01
+    data['eq_weight_ret_5'] = -0.01
+    data['weighted_ret_5'] = -0.008
+
+    report = evaluate_breadth_source_integrity(
+        data,
+        strict=True,
+        min_unique_non_null=3,
+        max_dominant_value_ratio=0.95,
+        std_floor=1e-10,
+    )
+    assert report['blocking'] is True
+    reasons = {(item['column'], item['reason']) for item in report['failures']}
+    assert ('concentration_spread_5', 'constant_or_near_constant_spread') in reasons
+
+
+def test_evaluate_breadth_source_integrity_respects_warmup_observations() -> None:
+    dates = pd.bdate_range('2025-01-02', periods=25)
+    data = pd.DataFrame(index=dates)
+    data.index.name = 'date'
+    base = pd.Series(range(25), index=dates, dtype=float)
+    data['pct_constituents_above_20dma'] = 0.5
+    data['pct_constituents_above_60dma'] = 0.5
+    data['pct_new_high_20'] = 0.1
+    data['pct_new_low_20'] = 0.1
+    data['eq_weight_ret_5'] = base * 0.001
+    data['weighted_ret_5'] = data['eq_weight_ret_5'] + (base % 7) * 0.0003
+    data['top3_contribution_5'] = 0.2 + base * 0.001
+    data['top1_contribution_5'] = 0.1 + base * 0.0005
+    data['top10_contribution_5'] = 0.6 + base * 0.0008
+    data['sector_concentration_20'] = 0.4
+    data['corr_spike_20'] = 0.2 + base * 0.0007
+    data['dispersion_20'] = 0.03 + base * 0.0002
+
+    report = evaluate_breadth_source_integrity(
+        data,
+        strict=True,
+        min_unique_non_null=3,
+        max_dominant_value_ratio=0.95,
+        std_floor=1e-10,
+    )
+    assert report['blocking'] is False
+    warmup_columns = {item['column'] for item in report['warmup_exempt_columns']}
+    assert 'pct_constituents_above_60dma' in warmup_columns
+
+
+def test_evaluate_breadth_source_integrity_does_not_flag_low_std_with_high_uniqueness() -> None:
+    dates = pd.bdate_range('2025-01-02', periods=80)
+    data = pd.DataFrame(index=dates)
+    data.index.name = 'date'
+    base = pd.Series(range(80), index=dates, dtype=float)
+    for idx, col in enumerate(BREADTH_REQUIRED_COLUMNS):
+        data[col] = 0.2 + idx * 0.01 + base * 1e-10
+    data['eq_weight_ret_5'] = -0.01 + base * 1e-4
+    data['weighted_ret_5'] = data['eq_weight_ret_5'] + 0.002 + base * 1e-9
+    data['top10_contribution_5'] = 0.20 + base * 1e-10
+
+    report = evaluate_breadth_source_integrity(
+        data,
+        strict=True,
+        min_unique_non_null=3,
+        max_dominant_value_ratio=0.95,
+        std_floor=1e-8,
+    )
+    reasons = {(item['column'], item['reason']) for item in report['failures']}
+    assert ('top10_contribution_5', 'std_below_floor') not in reasons
+
+
+def test_derive_breadth_sidecar_uses_mairui_meta_fallback() -> None:
+    constituents = pd.DataFrame(
+        {
+            'symbol': ['300001', '300002'],
+            'name': ['a', 'b'],
+            'entry_date': ['2024-01-01'] * 2,
+        }
+    )
+
+    def _constituent_fetcher(index_symbol: str) -> pd.DataFrame:
+        return constituents
+
+    def _history_fetcher(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        return _history_frame(base_price=10.0 + int(symbol[-1]) * 2.0, periods=120)
+
+    def _meta_fetcher(symbol: str) -> dict[str, object]:
+        if symbol == '300002':
+            raise ValueError('akshare meta temporary failure')
+        return {'industry': 'unknown', 'float_shares': 1_200_000.0}
+
+    def _meta_fallback_fetcher(symbol: str, licence: str) -> dict[str, object]:
+        assert licence == 'TEST-LICENCE'
+        return {'industry': 'semi' if symbol == '300001' else 'med', 'float_shares': None}
+
+    _, metadata = derive_breadth_sidecar(
+        start_date='2024-01-02',
+        end_date='2024-07-31',
+        index_symbol='399673',
+        mairui_licence='TEST-LICENCE',
+        min_active_constituents=2,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher,
+        stock_meta_fetcher=_meta_fetcher,
+        stock_meta_fallback_fetcher=_meta_fallback_fetcher,
+    )
+
+    assert metadata['metadata_provider_counts']['mairui'] == 2
+    assert metadata['industry_unknown_ratio'] == 0.0
+    assert '300002' in metadata['metadata_errors']
+
+
+def test_derive_breadth_sidecar_uses_local_cache(tmp_path) -> None:
+    constituents = pd.DataFrame(
+        {
+            'symbol': ['300001', '300002', '300003'],
+            'name': ['a', 'b', 'c'],
+            'entry_date': ['2024-01-01'] * 3,
+        }
+    )
+    call_counter = {'count': 0}
+    meta_call_counter = {'count': 0}
+
+    def _constituent_fetcher(index_symbol: str) -> pd.DataFrame:
+        return constituents
+
+    def _history_fetcher(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        call_counter['count'] += 1
+        return _history_frame(base_price=10.0 + int(symbol[-1]) * 2.0, periods=90)
+
+    def _meta_fetcher(symbol: str) -> dict[str, object]:
+        meta_call_counter['count'] += 1
+        return {'industry': 'tech', 'float_shares': 1_000_000.0}
+
+    cache_dir = tmp_path / 'cache'
+    first_breadth, first_meta = derive_breadth_sidecar(
+        start_date='2024-01-02',
+        end_date='2024-05-06',
+        index_symbol='399673',
+        min_active_constituents=2,
+        cache_dir=cache_dir,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher,
+        stock_meta_fetcher=_meta_fetcher,
+    )
+    assert call_counter['count'] == 3
+    assert meta_call_counter['count'] == 3
+    assert first_meta['cache']['miss_count'] == 3
+    assert first_meta['meta_cache']['miss_count'] == 3
+
+    def _history_fetcher_should_not_run(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        raise AssertionError('history fetch should not run when cache is valid')
+
+    def _meta_fetcher_should_not_run(symbol: str) -> dict[str, object]:
+        raise AssertionError('meta fetch should not run when meta cache is valid')
+
+    second_breadth, second_meta = derive_breadth_sidecar(
+        start_date='2024-01-02',
+        end_date='2024-05-06',
+        index_symbol='399673',
+        min_active_constituents=2,
+        cache_dir=cache_dir,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher_should_not_run,
+        stock_meta_fetcher=_meta_fetcher_should_not_run,
+    )
+    assert second_meta['cache']['hit_count'] == 3
+    assert second_meta['meta_cache']['hit_count'] == 3
+    assert int(len(first_breadth)) == int(len(second_breadth))
+
+
+def test_derive_breadth_sidecar_cache_handles_non_trading_boundaries(tmp_path) -> None:
+    constituents = pd.DataFrame(
+        {
+            'symbol': ['300001', '300002', '300003'],
+            'name': ['a', 'b', 'c'],
+            'entry_date': ['2024-01-01'] * 3,
+        }
+    )
+    call_counter = {'count': 0}
+
+    def _constituent_fetcher(index_symbol: str) -> pd.DataFrame:
+        return constituents
+
+    def _history_fetcher(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        call_counter['count'] += 1
+        return _history_frame(base_price=11.0 + int(symbol[-1]) * 2.0, periods=90)
+
+    def _meta_fetcher(symbol: str) -> dict[str, object]:
+        return {'industry': 'tech', 'float_shares': 1_000_000.0}
+
+    cache_dir = tmp_path / 'cache_non_trading'
+    first_breadth, _ = derive_breadth_sidecar(
+        start_date='2024-01-01',
+        end_date='2024-05-11',
+        index_symbol='399673',
+        min_active_constituents=2,
+        cache_dir=cache_dir,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher,
+        stock_meta_fetcher=_meta_fetcher,
+    )
+    assert call_counter['count'] == 3
+
+    def _history_fetcher_should_not_run(symbol: str, start_date: str | None, end_date: str | None) -> pd.DataFrame:
+        raise AssertionError('history fetch should not run when cache has non-trading boundary tolerance')
+
+    second_breadth, second_meta = derive_breadth_sidecar(
+        start_date='2024-01-01',
+        end_date='2024-05-11',
+        index_symbol='399673',
+        min_active_constituents=2,
+        cache_dir=cache_dir,
+        constituent_fetcher=_constituent_fetcher,
+        stock_history_fetcher=_history_fetcher_should_not_run,
+        stock_meta_fetcher=_meta_fetcher,
+    )
+    assert second_meta['cache']['hit_count'] == 3
+    assert second_meta['cache']['miss_count'] == 0
+    assert int(len(first_breadth)) == int(len(second_breadth))

+ 94 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_harden_derived_2026-04-09/tests/test_real_walkforward_report_pipeline.py

@@ -0,0 +1,94 @@
+from __future__ import annotations
+
+import json
+import sys
+
+import pandas as pd
+import pytest
+
+import pipelines.real_walkforward_report as real_walkforward_report
+
+
+def _write_full_pit_csv(path, periods: int = 320, *, sparse_column: str | None = None) -> None:
+    dates = pd.bdate_range('2022-01-04', periods=periods)
+    base = pd.Series(range(periods), dtype=float)
+    df = pd.DataFrame(
+        {
+            'date': dates,
+            'open': 100.0 + base * 0.1,
+            'high': 101.0 + base * 0.1 + (base % 5) * 0.02,
+            'low': 99.0 + base * 0.1 - (base % 4) * 0.015,
+            'close': 100.5 + base * 0.1 + (base % 3) * 0.01,
+            'volume': 1_000_000.0 + base * 1000.0 + (base % 7) * 200.0,
+            'hs300_close': 4000.0 + base * 0.5,
+            'star50_close': 1200.0 + base * 0.2,
+            'csi1000_close': 5000.0 + base * 0.4,
+            'pct_constituents_above_20dma': 0.55 + (base % 10) * 0.01,
+            'pct_constituents_above_60dma': 0.50 + (base % 8) * 0.01,
+            'pct_new_high_20': 0.06 + (base % 5) * 0.002,
+            'pct_new_low_20': 0.07 + (base % 4) * 0.002,
+            'eq_weight_ret_5': -0.01 + (base % 7) * 0.002,
+            'weighted_ret_5': -0.008 + (base % 7) * 0.002 + (base % 3) * 0.0005,
+            'top3_contribution_5': 0.34 + (base % 6) * 0.004,
+            'top1_contribution_5': 0.11 + (base % 6) * 0.003,
+            'top10_contribution_5': 0.60 + (base % 6) * 0.004,
+            'sector_concentration_20': 0.20 + (base % 5) * 0.003 + (base % 3) * 0.0005,
+            'corr_spike_20': 0.05 + (base % 9) * 0.003,
+            'dispersion_20': 0.18 + (base % 8) * 0.004,
+        }
+    )
+    if sparse_column is not None:
+        df.loc[5:, sparse_column] = float('nan')
+    df.to_csv(path, index=False)
+
+
+def test_real_walkforward_report_generates_artifacts(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit.csv'
+    output_dir = tmp_path / 'report_output'
+    _write_full_pit_csv(data_path, periods=360)
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        ['real_walkforward_report.py', '--pit-csv', str(data_path), '--output-dir', str(output_dir)],
+    )
+    real_walkforward_report.main()
+
+    assert (output_dir / 'data_quality_summary.json').exists()
+    assert (output_dir / 'frozen_validation_board.csv').exists()
+    assert (output_dir / 'real_walkforward_summary.json').exists()
+    assert (output_dir / 'real_walkforward_report.md').exists()
+
+    summary = json.loads((output_dir / 'real_walkforward_summary.json').read_text(encoding='utf-8'))
+    assert 'strategy_full_sample_metrics' in summary
+    assert 'baseline_full_sample_metrics' in summary
+    assert 'comparison' in summary
+    assert 'utility_delta_vs_baseline' in summary['comparison']
+    assert 'annual_return_delta_vs_baseline' in summary['comparison']
+    assert 'max_drawdown_delta_vs_baseline' in summary['comparison']
+    assert summary['comparison']['annual_return_delta'] == summary['comparison']['annual_return_delta_vs_baseline']
+    assert summary['comparison']['max_drawdown_delta'] == summary['comparison']['max_drawdown_delta_vs_baseline']
+    assert summary['input']['row_count'] == 360
+
+
+def test_real_walkforward_report_strict_mode_blocks_on_core_breach(tmp_path, monkeypatch) -> None:
+    data_path = tmp_path / 'pit_sparse.csv'
+    output_dir = tmp_path / 'report_strict_fail'
+    _write_full_pit_csv(data_path, periods=180, sparse_column='pct_constituents_above_60dma')
+
+    monkeypatch.setattr(
+        sys,
+        'argv',
+        [
+            'real_walkforward_report.py',
+            '--pit-csv',
+            str(data_path),
+            '--strict-data',
+            '--output-dir',
+            str(output_dir),
+        ],
+    )
+
+    with pytest.raises(ValueError, match='Data quality gate failed in strict mode'):
+        real_walkforward_report.main()
+    assert (output_dir / 'data_quality_summary.json').exists()

BIN
research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09.zip


+ 45 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09/CONTEXT_FOR_GPT_PRO.md

@@ -0,0 +1,45 @@
+# GPT Pro Context - Recalibrate Walkforward Economics (2026-04-09)
+
+## What was changed
+- OpenSpec change: `recalibrate-walkforward-economics-after-semantic-hardening` (archived).
+- Candidate selection upgraded in `backtest/frozen_walkforward.py`:
+  - hard-constraint-first filter
+  - multi-objective ranking score
+  - deterministic fallback (`utility_fallback_no_hard_pass`) when no hard-pass candidate exists
+  - richer per-window diagnostics fields
+- Pipeline summaries updated:
+  - `pipelines/frozen_hypothesis_validation.py`
+  - `pipelines/real_walkforward_report.py`
+  - new summary fields include `selection_mode_distribution`, `hard_pass_window_ratio`, hard-pass window counts, and candidate selection config snapshot.
+- Default calibration updated in `config/regime.yaml`:
+  - policy exposure defaults and candidate set expanded with `balanced_capture`
+  - `frozen_validation.candidate_selection` config added.
+- Tests updated/added:
+  - `tests/test_frozen_walkforward.py`
+  - `tests/test_frozen_validation_pipeline.py`
+  - `tests/test_real_walkforward_report_pipeline.py`
+
+## Verification
+- Targeted tests passed.
+- Full regression passed: `77` tests.
+- Strict spec validation passed: `29` specs.
+
+## Key result comparison (old -> new)
+Old summary: `outputs/system_e2e_derived_full50_v2/report/real_walkforward_summary.json`
+New summary: `outputs/real_walkforward_recalibrated_20260409_v2/real_walkforward_summary.json`
+
+- selected_candidate_distribution: `{defensive: 5}` -> `{balanced_capture: 2, pro_risk: 1, baseline: 1, defensive: 1}`
+- positive_window_ratio: `0.4` -> `0.2`
+- upside_capture: `0.2593` -> `0.2849`
+- annual_return: `0.0673` -> `0.0666`
+- max_drawdown: `0.2871` -> `0.2896`
+- utility_total_score: `-0.1104` -> `-0.0997`
+- utility_delta_vs_baseline: `-0.1452` -> `-0.1345`
+- drawdown_ratio_vs_baseline: `0.4774` -> `0.4814`
+- hard_pass_window_ratio (new): `0.8`
+- selection_mode_distribution (new): `{constraint_score: 4, utility_fallback_no_hard_pass: 1}`
+
+## Current judgment
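+- Candidate diversification improved materially: selection moved from `defensive` in all 5 windows to four distinct candidates.
+- The utility gap vs. baseline narrowed slightly (`-0.1452` -> `-0.1345`) and upside capture rose (`0.2593` -> `0.2849`), but `positive_window_ratio` fell from `0.4` to `0.2`; annual return and max drawdown were essentially unchanged.
+- Need GPT Pro guidance on objective/constraint and ranking-weight design so that the diversification and upside gains do not come at the cost of fewer positive windows.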
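Review note: the selection flow described under "What was changed" (hard-constraint-first filter, multi-objective ranking score, deterministic fallback) could be sketched roughly as below. This is not the code in `backtest/frozen_walkforward.py`. The thresholds, weights, score formula, and the `CandidateResult` and `select_candidate` names are assumptions made for illustration. Only the two mode labels, `constraint_score` and `utility_fallback_no_hard_pass`, come from the summary above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateResult:
    """Hypothetical per-window training metrics for one candidate."""
    candidate_id: str
    utility: float
    max_drawdown: float
    upside_capture: float


def select_candidate(
    results: list[CandidateResult],
    *,
    max_drawdown_cap: float = 0.30,    # assumed hard constraint
    min_upside_capture: float = 0.20,  # assumed hard constraint
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),  # assumed ranking weights
) -> tuple[str, str]:
    """Return (candidate_id, selection_mode) for one walk-forward window."""
    if not results:
        raise ValueError('no candidates to select from')

    # Step 1: keep only candidates that satisfy every hard constraint.
    hard_pass = [
        r for r in results
        if r.max_drawdown <= max_drawdown_cap and r.upside_capture >= min_upside_capture
    ]

    if hard_pass:
        w_utility, w_upside, w_drawdown = weights

        def score(r: CandidateResult) -> float:
            # Step 2: multi-objective score; drawdown enters as a penalty.
            return w_utility * r.utility + w_upside * r.upside_capture - w_drawdown * r.max_drawdown

        # Ties break on candidate_id so the choice is deterministic.
        best = min(hard_pass, key=lambda r: (-score(r), r.candidate_id))
        return best.candidate_id, 'constraint_score'

    # Step 3: deterministic fallback when nothing passes: highest utility wins.
    best = min(results, key=lambda r: (-r.utility, r.candidate_id))
    return best.candidate_id, 'utility_fallback_no_hard_pass'
```

In a scheme like this, the hard constraints decide which candidates are eligible and the weights only order the survivors. That split is where the `positive_window_ratio` tradeoff noted above would be tuned.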

+ 0 - 0
research/chinext50_regime_project/deliverables/gpt_pro_bundle_recalibrate_2026-04-09/QUESTIONS_FOR_GPT_PRO.md


Some files are not shown because of the large number of changes