ChiNext 50 Regime Project Starter

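This is a project skeleton for daily-frequency, regime-aware exposure control, built specifically for the ChiNext 50.

Its goal is not to predict each day's direction, but to do the following as far as possible:

Lose less during sharp declines and crowded periods
Rebuild exposure gradually during genuine repair phases
Retain most of the participation during the main uptrend

What is already in place

data/: CSV/parquet readers + synthetic demo data generator
features/: three feature layers (price, breadth, relative strength)
model/: continuous scores, 5-state state machine, position mapping, and hard veto
backtest/: approximate next-open execution backtest, utility, event slicing
pipelines/: demo pipeline + frozen-hypothesis validation
tests/: minimal end-to-end test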

Core states

risk_off
repair
trend
chop
euphoric_late

Core scores

trend_score
breadth_score
stress_score
crowding_score
repair_score

Plus three path-type hazards:

down_hazard
repair_hazard
rebound_hazard

Run the demo

From the project root, run:

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/demo

This uses synthetic data to generate:

outputs/demo/daily_ledger.csv
outputs/demo/event_summary.csv
outputs/demo/metrics_summary.json

Run the frozen-hypothesis validation

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/frozen_validation

Switch to real data

Your CSV/parquet needs at least these columns:

date
open
high
low
close
volume

It is recommended to also provide:

hs300_close
star50_close
csi1000_close
pct_constituents_above_20dma
pct_constituents_above_60dma
pct_new_high_20
pct_new_low_20
eq_weight_ret_5
weighted_ret_5
top3_contribution_5
corr_spike_20
dispersion_20

How to run:

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/real_data_demo

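Important notes

The current scaffold is not evidence of performance; it only wires up the closed loop "features -> scores -> states -> positions -> backtest -> event diagnostics" end to end.
The economic effect can only be assessed with strict walk-forward validation after you plug in real ChiNext 50 index/ETF history plus historical constituent breadth data.
In the first phase, do not expand to multiple markets or to a complex readiness/portability system at the same time.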

Real Data Input Contract and Quality Gate

The runtime pipelines now require a full point-in-time dataset and can optionally block low-quality data before feature construction.

Required PIT columns

date, open, high, low, close, volume, hs300_close, star50_close, csi1000_close, pct_constituents_above_20dma, pct_constituents_above_60dma, pct_new_high_20, pct_new_low_20, eq_weight_ret_5, weighted_ret_5, top3_contribution_5, top1_contribution_5, top10_contribution_5, sector_concentration_20, corr_spike_20, dispersion_20

Column names are normalized to lowercase with surrounding whitespace removed.
Duplicate trading dates are rejected.
Rows are sorted by trading date before downstream processing.
Runtime entrypoints no longer merge sidecars on the fly.
If required PIT columns are missing, the pipeline fails before quality gate and feature construction.
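
The contract rules above can be illustrated with a minimal, standard-library sketch. It is not the project's loader: the function name `apply_input_contract`, the shortened `REQUIRED` list, and the use of ISO date strings are assumptions made for illustration only.

```python
# Illustrative subset of the required PIT columns (the full list is given above).
REQUIRED = ["date", "open", "high", "low", "close", "volume"]


def apply_input_contract(rows, required=REQUIRED):
    """Hypothetical helper: normalize names, check columns, reject duplicates, sort."""
    # Lowercase column names and strip surrounding whitespace.
    normalized = [{k.strip().lower(): v for k, v in row.items()} for row in rows]

    # Fail before any downstream step if a required column is absent.
    present = set().union(*(r.keys() for r in normalized))
    missing = [c for c in required if c not in present]
    if missing:
        raise ValueError(f"missing required PIT columns: {missing}")

    # Reject duplicate trading dates.
    dates = [r["date"] for r in normalized]
    if len(dates) != len(set(dates)):
        raise ValueError("duplicate trading dates found")

    # Sort by trading date (ISO strings sort chronologically).
    return sorted(normalized, key=lambda r: r["date"])
```

Raising before sorting keeps the failure ahead of any quality gate or feature step, which matches the ordering described above.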

Data quality gate modes

Non-strict (default): pipeline continues and records warnings when critical-column coverage is below threshold.
Strict (--strict-data): pipeline stops only when configured blocking_columns are breached; non-blocking breaches remain warnings.

Coverage threshold configuration:

Config defaults: config/regime.yaml -> data_quality.default_min_coverage and data_quality.column_min_coverage
CLI override: --min-coverage
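
As a rough sketch of how the two gate modes could interact with coverage thresholds, the snippet below computes non-null coverage per column and assigns severities. The function name, argument names, and the shape of the returned dictionary are assumptions; only the strict/non-strict semantics and the config key names come from the description above.

```python
def evaluate_quality_gate(rows, columns, default_min_coverage,
                          column_min_coverage=None, blocking_columns=(),
                          strict=False):
    """Hypothetical gate: per-column coverage, breaches, and pass/fail status."""
    column_min_coverage = column_min_coverage or {}
    total = len(rows)
    coverage, breaches = {}, []
    for col in columns:
        # Coverage is the share of rows with a non-null value.
        non_null = sum(1 for row in rows if row.get(col) is not None)
        coverage[col] = non_null / total if total else 0.0
        # A per-column threshold overrides the default one.
        threshold = column_min_coverage.get(col, default_min_coverage)
        if coverage[col] < threshold:
            # Only blocking columns in strict mode escalate to an error.
            severity = "error" if strict and col in blocking_columns else "warning"
            breaches.append({"column": col, "coverage": coverage[col],
                             "severity": severity})
    passed = not any(b["severity"] == "error" for b in breaches)
    return {"mode": "strict" if strict else "non_strict", "passed": passed,
            "coverage": coverage, "breaches": breaches}
```

In this sketch a non-strict run can never fail, and a strict run fails only on a breached blocking column, mirroring the two modes described above.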

Output artifact

Each run writes data_quality_summary.json into the output directory. This artifact includes gate mode, pass/fail status, breach severities (error/warning), and field-level coverage metrics.

Example commands

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --min-coverage 0.98 \
  --output-dir outputs/real_data_demo

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --min-coverage 0.98 \
  --output-dir outputs/frozen_validation_real

Build Point-In-Time (PIT) Dataset

Use pipelines/build_pit_dataset.py to create a reusable point-in-time table before running strategy pipelines.

Command

python pipelines/build_pit_dataset.py \
  --market-csv path/to/chinext50_market.csv \
  --sidecar-csv path/to/chinext50_benchmark_sidecar.csv \
  --sidecar-csv path/to/chinext50_breadth_sidecar.csv \
  --output-path outputs/pit/chinext50_pit.csv

Optional quality controls:

--strict-data: block PIT output when quality breaches occur
--min-coverage 0.98: override minimum non-null coverage threshold
--config path/to/regime.yaml: load custom quality defaults

Output semantics

Always writes pit_quality_summary.json in the same output directory.
On success, writes PIT data to --output-path (.csv or .parquet).
In strict failure mode, PIT file is not written, but pit_quality_summary.json is still written for diagnostics.
Quality summary includes source metadata:
- sources.market_path
- sources.sidecar_paths
- sources.sidecar_count
- sources.merged_row_count
- pit_columns

Real Data Ingestion

Use pipelines/ingest_real_data.py to fetch/load source data, publish raw + staging layers, and output final PIT in one run.

CSV provider (local source files)

python pipelines/ingest_real_data.py \
  --provider csv \
  --market-csv path/to/chinext50_market.csv \
  --hs300-csv path/to/hs300.csv \
  --star50-csv path/to/star50.csv \
  --csi1000-csv path/to/csi1000.csv \
  --breadth-csv path/to/chinext50_breadth.csv \
  --output-dir outputs/ingestion

Akshare provider (online fetch + local breadth)

python pipelines/ingest_real_data.py \
  --provider akshare \
  --market-symbol 159915 \
  --market-symbol-type etf \
  --hs300-symbol 000300 \
  --star50-symbol 000688 \
  --csi1000-symbol 000852 \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --breadth-csv path/to/chinext50_breadth.csv \
  --output-dir outputs/ingestion

Akshare + Mairui fallback (recommended when Akshare is missing fields or unavailable)

python pipelines/ingest_real_data.py \
  --provider akshare \
  --market-symbol 159915 \
  --market-symbol-type etf \
  --breadth-csv path/to/chinext50_breadth.csv \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --output-dir outputs/ingestion

Mairui provider (online fetch as primary)

python pipelines/ingest_real_data.py \
  --provider mairui \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-market-kind index \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --breadth-csv path/to/chinext50_breadth.csv \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --output-dir outputs/ingestion

If breadth fields are also served by a Mairui endpoint, you can replace --breadth-csv with:

--mairui-breadth-url https://api.mairuiapi.com/xxx/{licence}
optional --mairui-breadth-map-json path/to/rename_map.json

If you do not trust an external breadth panel (or do not have one), you can derive breadth from constituent histories:

python pipelines/ingest_real_data.py \
  --provider mairui \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-market-kind index \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --derive-breadth \
  --breadth-index-symbol 399673 \
  --breadth-min-active-constituents 20 \
  --breadth-max-constituents 50 \
  --breadth-cache-dir outputs/ingestion/raw/constituent_history \
  --output-dir outputs/ingestion

Strict mode now includes a breadth-source integrity gate. Placeholder-like breadth inputs (for example, constant weighted_ret_5 - eq_weight_ret_5) are blocked before PIT publish.
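
One way to picture the placeholder check is a test for a constant spread between the two return columns. The sketch below is an assumption about how such a check might look, not the project's gate: the function name and the tolerance `tol` are hypothetical.

```python
def looks_like_placeholder_breadth(rows, tol=1e-12):
    """Hypothetical check: is weighted_ret_5 - eq_weight_ret_5 effectively constant?"""
    spreads = [
        row["weighted_ret_5"] - row["eq_weight_ret_5"]
        for row in rows
        if row.get("weighted_ret_5") is not None
        and row.get("eq_weight_ret_5") is not None
    ]
    # Too few observations to call the spread constant.
    if len(spreads) < 2:
        return False
    # A spread that never moves suggests a synthetic or placeholder series.
    return max(spreads) - min(spreads) <= tol
```

A real gate would likely examine more than one field; this only demonstrates the constant-spread example named above.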

Output structure includes:

outputs/ingestion/raw/*.csv
outputs/ingestion/raw/breadth_integrity_summary.json
outputs/ingestion/raw/breadth_derivation_summary.json (when --derive-breadth is used)
outputs/ingestion/staging/*.csv
outputs/ingestion/pit/chinext50_pit.csv
outputs/ingestion/pit/pit_quality_summary.json
outputs/ingestion/ingestion_manifest.json

Frozen Walk-Forward (Train-Select / Test-Freeze)

pipelines/frozen_hypothesis_validation.py now runs a strict frozen-hypothesis process:

Evaluate predefined candidates only on each training window.
Select one winner by training utility (deterministic tie-break by candidate order).
Freeze that winner and evaluate the paired test window without re-selection.
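
The three steps above can be sketched for a single train/test window as follows. This is a simplified illustration under assumptions: `run_frozen_window` and its callable arguments are hypothetical, and the real pipeline evaluates full backtests rather than precomputed scores.

```python
def run_frozen_window(candidate_ids, train_utility, test_utility):
    """Hypothetical single-window run: select on train, freeze, evaluate on test.

    candidate_ids: candidates in their configured order.
    train_utility / test_utility: callables mapping a candidate id to a score.
    """
    best_id, best_score = None, float("-inf")
    for cid in candidate_ids:
        score = train_utility(cid)
        # Strict ">" keeps the earlier candidate on ties (order-based tie-break).
        if score > best_score:
            best_id, best_score = cid, score
    # The winner is frozen: the test window is evaluated once, with no re-selection.
    return {
        "selected_candidate_id": best_id,
        "train_utility_total_score": best_score,
        "test_utility_total_score": test_utility(best_id),
    }
```

Note that the test score plays no role in selection, so a winner with a poor test result is still reported as selected.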

Candidate configuration

Candidates can come from:

config/regime.yaml -> frozen_validation.candidates
optional CLI override file: --candidates-json path/to/candidates.json

Window row requirements:

frozen_validation.min_train_rows (or --min-train-rows)
frozen_validation.min_test_rows (or --min-test-rows)

If a window is too short, it is marked as skipped with an explicit status.

Audit outputs

frozen_validation_board.csv now includes:

window ranges (train_*, test_*)
status
selected_candidate_id
selected_candidate_overrides (serialized JSON)
prefixed train/test metrics such as train_utility_total_score and test_utility_total_score

frozen_validation_summary.json now includes:

processed/skipped window counts
positive test-utility ratio
selected candidate distribution
status distribution

Example

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --candidates-json path/to/frozen_candidates.json \
  --min-train-rows 180 \
  --min-test-rows 60 \
  --output-dir outputs/frozen_validation_real

Real Walk-Forward Report

Use pipelines/real_walkforward_report.py to generate a review-ready bundle from full PIT input:

data_quality_summary.json
frozen_validation_board.csv
real_walkforward_summary.json
real_walkforward_report.md

python pipelines/real_walkforward_report.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --output-dir outputs/real_walkforward_report

Event-Anchored Diagnostics

run_demo now outputs transition-anchor diagnostics with explicit event taxonomy:

crash_onset
false_rebound
true_repair
crowded_unwind
state_transition (fallback class for other transitions)

Event artifacts

event_log.csv: per-transition anchor details (event_date, from_state, to_state, event_type, forward returns, exposure context)
event_summary.csv: event-type grouped averages and counts

Classification logic is rule-based on state transitions plus forward-window confirmation signals for rebound quality.

Execution Layer Constraints and Tracking Diagnostics

Backtest execution now includes configurable constraints for better ETF-style realism:

trading.extreme_day_move_threshold: absolute executed return threshold that triggers cost amplification
trading.extreme_day_cost_multiplier: multiplier applied to base trading cost on extreme days
trading.gap_slippage_factor: additive gap shock cost factor using abs(gap_open) * turnover
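
To make the three parameters concrete, here is a minimal sketch of how they might combine into a per-day cost. The function, the `base_cost_rate` argument, the proportional base cost, and the `>=` comparison are assumptions; only the amplification and the `abs(gap_open) * turnover` term follow the descriptions above.

```python
def execution_cost(turnover, executed_return, gap_open, base_cost_rate,
                   extreme_day_move_threshold, extreme_day_cost_multiplier,
                   gap_slippage_factor):
    """Hypothetical per-day cost combining base cost, amplification, and gap shock."""
    # Assumed base cost: proportional to turnover.
    cost = base_cost_rate * turnover
    # Amplify the base cost on extreme days (absolute executed return at threshold).
    if abs(executed_return) >= extreme_day_move_threshold:
        cost *= extreme_day_cost_multiplier
    # Additive gap shock cost: factor * abs(gap_open) * turnover.
    return cost + gap_slippage_factor * abs(gap_open) * turnover


# Example: 50% turnover on a 6% move with a -2% opening gap.
example = execution_cost(0.5, 0.06, -0.02, 0.001, 0.05, 1.5, 0.01)
```

With zero turnover both terms vanish, so days without rebalancing incur no cost in this sketch.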

New ledger diagnostics:

tracking_difference: strategy_return_net - strategy_return_gross
tracking_error_20: 20-day rolling std of tracking_difference
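
The two ledger diagnostics above can be reproduced with a short standard-library sketch. It assumes a sample standard deviation (ddof=1) and marks incomplete windows as `None`; the project's actual rolling implementation may differ on both points, and the function name is hypothetical.

```python
from statistics import stdev


def tracking_diagnostics(strategy_return_net, strategy_return_gross, window=20):
    """Hypothetical helper: tracking_difference and its rolling standard deviation."""
    # tracking_difference = strategy_return_net - strategy_return_gross
    diff = [n - g for n, g in zip(strategy_return_net, strategy_return_gross)]
    rolling = []
    for i in range(len(diff)):
        if i + 1 < window:
            # Not enough history for a full window yet.
            rolling.append(None)
        else:
            # Sample std over the trailing window (assumption: ddof=1).
            rolling.append(stdev(diff[i + 1 - window:i + 1]))
    return diff, rolling
```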

New summary metrics:

tracking_diff_mean
tracking_diff_abs_mean
tracking_error_20_p95

Execution Constraint Calibration

Use pipelines/calibrate_execution_constraints.py to sweep execution parameters and output a recommendation:

execution_calibration_grid.csv
execution_calibration_recommendation.json

python pipelines/calibrate_execution_constraints.py \
  --pit-csv path/to/chinext50_pit.csv \
  --cost-multipliers 1.0,1.25,1.5,1.75 \
  --gap-slippage-factors 0.0,0.01,0.02,0.03 \
  --output-dir outputs/execution_calibration
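
The sweep implied by the two comma-separated flags is a Cartesian grid. The sketch below only enumerates that grid; the helper names are hypothetical, and the scoring rule behind the recommendation is not described here, so it is left out.

```python
from itertools import product


def parse_float_list(arg):
    """Parse a comma-separated CLI value such as '1.0,1.25,1.5'."""
    return [float(x) for x in arg.split(",")]


def build_grid(cost_multipliers, gap_slippage_factors):
    """Hypothetical helper: one parameter dict per (multiplier, factor) pair."""
    return [
        {"extreme_day_cost_multiplier": m, "gap_slippage_factor": g}
        for m, g in product(cost_multipliers, gap_slippage_factors)
    ]


# The example command above yields a 4 x 4 grid of parameter combinations.
grid = build_grid(parse_float_list("1.0,1.25,1.5,1.75"),
                  parse_float_list("0.0,0.01,0.02,0.03"))
```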

Additional Optional Concentration Inputs

To improve crowding diagnostics, you can optionally provide:

top1_contribution_5
top10_contribution_5
sector_concentration_20

Regime Lite (Small-Team Runtime)

Use pipelines/regime_lite_run.py for a minimal operational workflow:

3 states only: risk_off, chop, trend
fixed base exposures: 0.0, 0.35, 0.80
daily exposure step cap: 0.20
explicit execution profiles:
- baseline: lag1 timing, no overlay
- promoted_fast_entry_hold3: prior promoted fixed-hold reference, based on combo_fast_hold3
- promoted_fast_entry_adaptive_extend: current preferred profile after adaptive keep-vs-replace closure, based on combo_fast_adaptive_extend

python pipelines/regime_lite_run.py \
  --pit-csv path/to/chinext50_pit.csv \
  --profile promoted_fast_entry_adaptive_extend \
  --output-dir outputs/regime_lite
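
The fixed exposures and the daily step cap listed above can be pictured with the sketch below. It assumes the three exposures map to the three states in the order listed (risk_off 0.0, chop 0.35, trend 0.80) and that the cap clamps the day-over-day change; the function names are hypothetical and no execution profile or overlay is modeled.

```python
# Assumed mapping: states and exposures paired in the order they are listed.
BASE_EXPOSURE = {"risk_off": 0.0, "chop": 0.35, "trend": 0.80}
STEP_CAP = 0.20  # maximum absolute change in exposure per day


def step_toward(current, state):
    """Move exposure toward the state's base exposure, at most STEP_CAP per day."""
    target = BASE_EXPOSURE[state]
    delta = max(-STEP_CAP, min(STEP_CAP, target - current))
    return current + delta


def simulate_exposure(states, start=0.0):
    """Hypothetical helper: exposure path for a sequence of daily states."""
    path, current = [], start
    for state in states:
        current = step_toward(current, state)
        path.append(current)
    return path
```

Under these assumptions, a switch from zero exposure into trend takes four days to reach 0.80, since each day adds at most 0.20.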

Current preferred lite runtime profile:

promoted_fast_entry_adaptive_extend
promotion decision artifact: outputs/regime_lite_promotion_20260424/regime_lite_promotion_decision.json
rationale: the bounded adaptive closure concluded adaptive-replace-candidate, selecting combo_fast_adaptive_extend to replace the prior fixed-hold reference while keeping baseline as rollback-safe reference
rollback/reference profile: baseline
inspect promotion_decision.active_adaptive_mode plus regime_lite_summary.json -> execution_profile.adaptive_hold_mode / adaptive_hold_context to understand the active bounded hold semantics before operating it

Converged lite operational flow:

Run the preferred profile with pipelines/regime_lite_run.py --profile promoted_fast_entry_adaptive_extend.
Inspect regime_lite_runtime_health.json for bounded status healthy / review / hold / rollback_recommended.
Inspect regime_lite_post_promotion_review.json for bounded decision keep_promoted / hold_and_review / recommend_rollback.
In post-promotion review, treat recent_window_evidence as the primary decision basis; full_history_reference is reference context only, and segmented_diagnostics is for bounded diagnosis rather than override.
If health stays healthy and review stays keep_promoted, continue normal lite operation.
If health moves to review or review moves to hold_and_review, pause new tuning and inspect the bounded reasons before any change.
If health reaches rollback_recommended or review reaches recommend_rollback, switch back to baseline as the rollback-safe profile and keep the lite path scoped to that runtime handoff.
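
The decision rules in the flow above can be condensed into a small lookup. This is a sketch only: the function name and the returned action labels are hypothetical, and the status strings are passed in directly because the JSON field layout of the two artifacts is not specified here.

```python
def lite_action(health_status, review_decision):
    """Hypothetical mapping from health status and review decision to an action."""
    # Rollback signals take priority over everything else.
    if health_status == "rollback_recommended" or review_decision == "recommend_rollback":
        return "switch_to_baseline"
    # Either artifact asking for review pauses new tuning.
    if health_status == "review" or review_decision == "hold_and_review":
        return "pause_tuning_and_inspect"
    if health_status == "healthy" and review_decision == "keep_promoted":
        return "continue_normal_operation"
    # Combinations not covered by the flow (e.g. health "hold") need manual review.
    return "inspect_manually"
```

Checking rollback first means a single rollback signal wins even when the other artifact still looks healthy.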

Artifacts:

regime_lite_daily_ledger.csv
regime_lite_summary.json
regime_lite_report.md
regime_lite_runtime_health.json
regime_lite_post_promotion_review.json

Execution Timing + Entry-Exit Experiments

Run controlled A/B experiments for:

execution timing: lag1 vs fast_entry
entry-specific exit overlay: short trend-entry hold floor with stop guard

python pipelines/regime_lite_experiments.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/regime_lite_experiments