ChiNext 50 Regime Project Starter

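This is a project skeleton for daily-frequency, regime-aware exposure control dedicated to the ChiNext 50 index.

Its goal is not to predict each day's rise or fall, but to do the following as far as possible:

  • Lose less during sharp declines and crowded periods
  • Rebuild exposure gradually during genuine repair phases
  • Retain most of the participation during the main uptrend

What is already in place

  • data/: CSV/parquet readers + synthetic demo data generator
  • features/: three feature layers: price, breadth, and relative strength
  • model/: continuous scores, 5-state state machine, exposure mapping, and hard veto
  • backtest/: next-open approximate-execution backtest, utility, event slicing
  • pipelines/: demo pipeline + frozen-hypothesis validation
  • tests/: minimal end-to-end tests

Core states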

  • risk_off
  • repair
  • trend
  • chop
  • euphoric_late

Core scores

  • trend_score
  • breadth_score
  • stress_score
  • crowding_score
  • repair_score

Plus three path-type hazards:

  • down_hazard
  • repair_hazard
  • rebound_hazard

Run the demo

From the project root, run:

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/demo

This uses synthetic data to generate:

  • outputs/demo/daily_ledger.csv
  • outputs/demo/event_summary.csv
  • outputs/demo/metrics_summary.json

Run frozen-hypothesis validation

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/frozen_validation

Switching to real data

Your CSV/parquet needs at least these columns:

  • date
  • open
  • high
  • low
  • close
  • volume

It is recommended to also provide:

  • hs300_close
  • star50_close
  • csi1000_close
  • pct_constituents_above_20dma
  • pct_constituents_above_60dma
  • pct_new_high_20
  • pct_new_low_20
  • eq_weight_ret_5
  • weighted_ret_5
  • top3_contribution_5
  • corr_spike_20
  • dispersion_20

How to run:

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/real_data_demo

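Important notes

  • The current scaffold is not proof of performance; it only wires up the closed loop "features -> scores -> states -> exposure -> backtest -> event diagnostics" end to end.
  • Assessing the economic effect requires you to plug in real ChiNext 50 index/ETF history plus historical constituent breadth data, and then run strict walk-forward validation.
  • In the first phase, do not expand to multiple markets or to complex readiness/portability systems at the same time.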

Real Data Input Contract and Quality Gate

The runtime pipelines now require a full point-in-time dataset and can optionally block low-quality data before feature construction.

Required PIT columns

date, open, high, low, close, volume, hs300_close, star50_close, csi1000_close, pct_constituents_above_20dma, pct_constituents_above_60dma, pct_new_high_20, pct_new_low_20, eq_weight_ret_5, weighted_ret_5, top3_contribution_5, top1_contribution_5, top10_contribution_5, sector_concentration_20, corr_spike_20, dispersion_20

  • Column names are normalized to lowercase with surrounding whitespace removed.
  • Duplicate trading dates are rejected.
  • Rows are sorted by trading date before downstream processing.
  • Runtime entrypoints no longer merge sidecars on the fly.
  • If required PIT columns are missing, the pipeline fails before quality gate and feature construction.
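
The input contract above can be illustrated with a minimal stdlib sketch. The function name normalize_pit_rows, the abbreviated REQUIRED_COLUMNS tuple, and the list-of-dicts row format are assumptions made for illustration; the project's own loader may be structured differently.

```python
# Hypothetical sketch of the input contract; not the project's actual loader.
REQUIRED_COLUMNS = ("date", "open", "high", "low", "close", "volume")  # abbreviated


def normalize_pit_rows(rows, required=REQUIRED_COLUMNS):
    """Normalize column names, validate the contract, and sort by trading date."""
    # Lowercase column names and strip surrounding whitespace.
    cleaned = [{str(k).strip().lower(): v for k, v in row.items()} for row in rows]

    # Fail early if a required column is missing from any row.
    missing = [c for c in required if any(c not in row for row in cleaned)]
    if missing:
        raise ValueError(f"missing required PIT columns: {missing}")

    # Reject duplicate trading dates.
    seen = set()
    for row in cleaned:
        if row["date"] in seen:
            raise ValueError(f"duplicate trading date: {row['date']}")
        seen.add(row["date"])

    # ISO-8601 date strings sort chronologically.
    return sorted(cleaned, key=lambda r: r["date"])
```

The sketch assumes dates are ISO-8601 strings, so a plain string sort is chronological.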

Data quality gate modes

  • Non-strict (default): pipeline continues and records warnings when critical-column coverage is below threshold.
  • Strict (--strict-data): pipeline stops only when configured blocking_columns are breached; non-blocking breaches remain warnings.

Coverage threshold configuration:

  • Config defaults: config/regime.yaml -> data_quality.default_min_coverage and data_quality.column_min_coverage
  • CLI override: --min-coverage
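
As a rough illustration of the two gate modes, the sketch below computes non-null coverage per column and escalates a breach to an error only in strict mode and only for a blocking column. The function name, the summary keys, the 0.95 default, and the treatment of None as the only null value are all assumptions, not the project's actual schema.

```python
# Hypothetical sketch of the quality gate; names and defaults are assumptions.
def evaluate_quality_gate(rows, column_min_coverage, default_min_coverage=0.95,
                          blocking_columns=(), strict=False):
    """Return coverage per column plus breaches with error/warning severity."""
    total = len(rows)
    columns = sorted({key for row in rows for key in row})
    coverage, breaches = {}, []
    for col in columns:
        non_null = sum(1 for row in rows if row.get(col) is not None)
        coverage[col] = non_null / total if total else 0.0
        threshold = column_min_coverage.get(col, default_min_coverage)
        if coverage[col] < threshold:
            # Only blocking columns become errors, and only in strict mode.
            severity = "error" if strict and col in blocking_columns else "warning"
            breaches.append({"column": col, "coverage": coverage[col],
                             "threshold": threshold, "severity": severity})
    passed = not any(b["severity"] == "error" for b in breaches)
    return {"mode": "strict" if strict else "non_strict", "passed": passed,
            "coverage": coverage, "breaches": breaches}
```

In this sketch a non-strict run can never fail, which mirrors the "continue and record warnings" behavior described above.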

Output artifact

Each run writes data_quality_summary.json into the output directory. This artifact includes gate mode, pass/fail status, breach severities (error/warning), and field-level coverage metrics.

Example commands

python pipelines/run_demo.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --min-coverage 0.98 \
  --output-dir outputs/real_data_demo

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --min-coverage 0.98 \
  --output-dir outputs/frozen_validation_real

Build Point-In-Time (PIT) Dataset

Use pipelines/build_pit_dataset.py to create a reusable point-in-time table before running strategy pipelines.

Command

python pipelines/build_pit_dataset.py \
  --market-csv path/to/chinext50_market.csv \
  --sidecar-csv path/to/chinext50_benchmark_sidecar.csv \
  --sidecar-csv path/to/chinext50_breadth_sidecar.csv \
  --output-path outputs/pit/chinext50_pit.csv

Optional quality controls:

  • --strict-data: block PIT output when quality breaches occur
  • --min-coverage 0.98: override minimum non-null coverage threshold
  • --config path/to/regime.yaml: load custom quality defaults

Output semantics

  • Always writes pit_quality_summary.json in the same output directory.
  • On success, writes PIT data to --output-path (.csv or .parquet).
  • In strict failure mode, PIT file is not written, but pit_quality_summary.json is still written for diagnostics.
  • Quality summary includes source metadata:
    • sources.market_path
    • sources.sidecar_paths
    • sources.sidecar_count
    • sources.merged_row_count
    • pit_columns

Real Data Ingestion

Use pipelines/ingest_real_data.py to fetch/load source data, publish raw + staging layers, and output final PIT in one run.

CSV provider (local source files)

python pipelines/ingest_real_data.py \
  --provider csv \
  --market-csv path/to/chinext50_market.csv \
  --hs300-csv path/to/hs300.csv \
  --star50-csv path/to/star50.csv \
  --csi1000-csv path/to/csi1000.csv \
  --breadth-csv path/to/chinext50_breadth.csv \
  --output-dir outputs/ingestion

Akshare provider (online fetch + local breadth)

python pipelines/ingest_real_data.py \
  --provider akshare \
  --market-symbol 159915 \
  --market-symbol-type etf \
  --hs300-symbol 000300 \
  --star50-symbol 000688 \
  --csi1000-symbol 000852 \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --breadth-csv path/to/chinext50_breadth.csv \
  --output-dir outputs/ingestion

Akshare + Mairui fallback (recommended when Akshare is missing fields or unavailable)

python pipelines/ingest_real_data.py \
  --provider akshare \
  --market-symbol 159915 \
  --market-symbol-type etf \
  --breadth-csv path/to/chinext50_breadth.csv \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --output-dir outputs/ingestion

Mairui provider (online fetch as primary)

python pipelines/ingest_real_data.py \
  --provider mairui \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-market-kind index \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --breadth-csv path/to/chinext50_breadth.csv \
  --start-date 2018-01-01 \
  --end-date 2026-04-09 \
  --output-dir outputs/ingestion

If breadth fields are also served by a Mairui endpoint, you can replace --breadth-csv with:

  • --mairui-breadth-url https://api.mairuiapi.com/xxx/{licence}
  • optional --mairui-breadth-map-json path/to/rename_map.json

If you do not trust an external breadth panel (or do not have one), you can derive breadth from constituent histories:

python pipelines/ingest_real_data.py \
  --provider mairui \
  --mairui-licence YOUR_MAIRUI_LICENCE \
  --mairui-market-code 399673.SZ \
  --mairui-market-kind index \
  --mairui-hs300-code 000300.SH \
  --mairui-star50-code 000688.SH \
  --mairui-csi1000-code 000852.SH \
  --derive-breadth \
  --breadth-index-symbol 399673 \
  --breadth-min-active-constituents 20 \
  --breadth-max-constituents 50 \
  --breadth-cache-dir outputs/ingestion/raw/constituent_history \
  --output-dir outputs/ingestion

Strict mode now includes a breadth-source integrity gate. Placeholder-like breadth inputs (for example, constant weighted_ret_5 - eq_weight_ret_5) are blocked before PIT publish.
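
One way such a placeholder check could work is sketched below: if the spread weighted_ret_5 - eq_weight_ret_5 never moves across the sample, the breadth input is flagged. The function name, the tolerance, and the minimum row count are assumptions; the actual integrity gate may apply additional or different tests.

```python
# Hypothetical sketch of a placeholder-breadth check; thresholds are assumptions.
def looks_like_placeholder_breadth(rows, tolerance=1e-9, min_rows=20):
    """Flag breadth data whose weighted-minus-equal-weight spread is constant."""
    spreads = [
        row["weighted_ret_5"] - row["eq_weight_ret_5"]
        for row in rows
        if row.get("weighted_ret_5") is not None
        and row.get("eq_weight_ret_5") is not None
    ]
    if len(spreads) < min_rows:
        return False  # too few observations to call it a placeholder
    # A real breadth panel should show some variation in the spread.
    return max(spreads) - min(spreads) <= tolerance
```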

Output structure includes:

  • outputs/ingestion/raw/*.csv
  • outputs/ingestion/raw/breadth_integrity_summary.json
  • outputs/ingestion/raw/breadth_derivation_summary.json (when --derive-breadth is used)
  • outputs/ingestion/staging/*.csv
  • outputs/ingestion/pit/chinext50_pit.csv
  • outputs/ingestion/pit/pit_quality_summary.json
  • outputs/ingestion/ingestion_manifest.json

Frozen Walk-Forward (Train-Select / Test-Freeze)

pipelines/frozen_hypothesis_validation.py now runs a strict frozen-hypothesis process:

  1. Evaluate predefined candidates only on each training window.
  2. Select one winner by training utility (deterministic tie-break by candidate order).
  3. Freeze that winner and evaluate the paired test window without re-selection.
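
The three steps can be sketched for a single window as follows. The candidate shape ({"id": ..., "overrides": ...}) and the utility callable are assumptions for illustration; only the output key names are taken from the audit columns this README describes.

```python
# Hypothetical single-window sketch of train-select / test-freeze.
def run_frozen_window(candidates, train_rows, test_rows, utility):
    """Pick the best candidate on train, then score it unchanged on test."""
    if not candidates:
        raise ValueError("at least one predefined candidate is required")

    winner, best_score = None, float("-inf")
    for candidate in candidates:
        score = utility(candidate, train_rows)
        # Strict ">" keeps the earlier candidate on ties (candidate order wins).
        if winner is None or score > best_score:
            winner, best_score = candidate, score

    # The winner is frozen: no re-selection happens on the test window.
    return {
        "selected_candidate_id": winner["id"],
        "train_utility_total_score": best_score,
        "test_utility_total_score": utility(winner, test_rows),
    }
```

Because selection only ever sees train_rows, the test score stays an out-of-sample measurement for that window.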

Candidate configuration

Candidates can come from:

  • config/regime.yaml -> frozen_validation.candidates
  • optional CLI override file: --candidates-json path/to/candidates.json

Window row requirements:

  • frozen_validation.min_train_rows (or --min-train-rows)
  • frozen_validation.min_test_rows (or --min-test-rows)

If a window is too short, it is marked as skipped with an explicit status.

Audit outputs

frozen_validation_board.csv now includes:

  • window ranges (train_*, test_*)
  • status
  • selected_candidate_id
  • selected_candidate_overrides (serialized JSON)
  • prefixed train/test metrics such as train_utility_total_score and test_utility_total_score

frozen_validation_summary.json now includes:

  • processed/skipped window counts
  • positive test-utility ratio
  • selected candidate distribution
  • status distribution

Example

python pipelines/frozen_hypothesis_validation.py \
  --pit-csv path/to/chinext50_pit.csv \
  --candidates-json path/to/frozen_candidates.json \
  --min-train-rows 180 \
  --min-test-rows 60 \
  --output-dir outputs/frozen_validation_real

Real Walk-Forward Report

Use pipelines/real_walkforward_report.py to generate a review-ready bundle from full PIT input:

  • data_quality_summary.json
  • frozen_validation_board.csv
  • real_walkforward_summary.json
  • real_walkforward_report.md

python pipelines/real_walkforward_report.py \
  --pit-csv path/to/chinext50_pit.csv \
  --strict-data \
  --output-dir outputs/real_walkforward_report

Event-Anchored Diagnostics

run_demo now outputs transition-anchor diagnostics with explicit event taxonomy:

  • crash_onset
  • false_rebound
  • true_repair
  • crowded_unwind
  • state_transition (fallback class for other transitions)

Event artifacts

  • event_log.csv: per-transition anchor details (event_date, from_state, to_state, event_type, forward returns, exposure context)
  • event_summary.csv: event-type grouped averages and counts

Classification logic is rule-based on state transitions plus forward-window confirmation signals for rebound quality.

Execution Layer Constraints and Tracking Diagnostics

Backtest execution now includes configurable constraints for better ETF-style realism:

  • trading.extreme_day_move_threshold: absolute executed return threshold that triggers cost amplification
  • trading.extreme_day_cost_multiplier: multiplier applied to base trading cost on extreme days
  • trading.gap_slippage_factor: additive gap shock cost factor using abs(gap_open) * turnover
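
A minimal sketch of how these three parameters might combine into a per-day cost is shown below. The base_cost_rate parameter, the ">=" comparison, and the order of operations (amplify the base cost first, then add the gap term) are assumptions; the backtest code is the authority on the exact formula.

```python
# Hypothetical per-day cost sketch; the exact formula is an assumption.
def execution_cost(turnover, executed_return, gap_open, base_cost_rate,
                   extreme_day_move_threshold, extreme_day_cost_multiplier,
                   gap_slippage_factor):
    """Return the trading cost for one day as a fraction of portfolio value."""
    cost = base_cost_rate * turnover
    # Amplify the base cost when the executed move is extreme.
    if abs(executed_return) >= extreme_day_move_threshold:
        cost *= extreme_day_cost_multiplier
    # Additive gap shock term: abs(gap_open) * turnover, scaled by the factor.
    cost += gap_slippage_factor * abs(gap_open) * turnover
    return cost
```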

New ledger diagnostics:

  • tracking_difference: strategy_return_net - strategy_return_gross
  • tracking_error_20: 20-day rolling std of tracking_difference
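
The two ledger columns can be reproduced with a short stdlib sketch. Using the sample standard deviation and emitting None until a full window is available are assumptions about the rolling convention.

```python
from statistics import stdev


# Hypothetical sketch; the rolling convention (sample std, full window) is assumed.
def tracking_diagnostics(strategy_return_net, strategy_return_gross, window=20):
    """Return (tracking_difference, rolling std of tracking_difference)."""
    diff = [n - g for n, g in zip(strategy_return_net, strategy_return_gross)]
    rolling = []
    for i in range(len(diff)):
        if i + 1 < window:
            rolling.append(None)  # not enough history yet
        else:
            rolling.append(stdev(diff[i + 1 - window:i + 1]))
    return diff, rolling
```

With net returns defined after costs, tracking_difference is zero or negative on days with turnover, so its rolling dispersion indicates how unevenly costs hit the strategy.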

New summary metrics:

  • tracking_diff_mean
  • tracking_diff_abs_mean
  • tracking_error_20_p95

Execution Constraint Calibration

Use pipelines/calibrate_execution_constraints.py to sweep execution parameters and output a recommendation:

  • execution_calibration_grid.csv
  • execution_calibration_recommendation.json

python pipelines/calibrate_execution_constraints.py \
  --pit-csv path/to/chinext50_pit.csv \
  --cost-multipliers 1.0,1.25,1.5,1.75 \
  --gap-slippage-factors 0.0,0.01,0.02,0.03 \
  --output-dir outputs/execution_calibration
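
The sweep implied by the two comma-separated flags can be pictured as a Cartesian grid. The helper names below, and the assumption that --cost-multipliers maps to trading.extreme_day_cost_multiplier, are illustrative guesses rather than the script's confirmed behavior.

```python
from itertools import product


# Hypothetical sketch of the parameter grid; the flag-to-key mapping is assumed.
def parse_float_list(text):
    """Parse a comma-separated CLI value such as '1.0,1.25,1.5'."""
    return [float(item) for item in text.split(",") if item.strip()]


def build_execution_grid(cost_multipliers, gap_slippage_factors):
    """Return one parameter dict per (multiplier, slippage factor) pair."""
    return [
        {"extreme_day_cost_multiplier": m, "gap_slippage_factor": g}
        for m, g in product(cost_multipliers, gap_slippage_factors)
    ]
```

With the example command above, this yields 4 x 4 = 16 combinations to backtest.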

Additional Optional Concentration Inputs

To improve crowding diagnostics, you can optionally provide:

  • top1_contribution_5
  • top10_contribution_5
  • sector_concentration_20

Regime Lite (Small-Team Runtime)

Use pipelines/regime_lite_run.py for a minimal operational workflow:

  • 3 states only: risk_off, chop, trend
  • fixed base exposures: 0.0, 0.35, 0.80
  • daily exposure step cap: 0.20
  • explicit execution profiles:
    • baseline: lag1 timing, no overlay
    • promoted_fast_entry_hold3: prior promoted fixed-hold reference, based on combo_fast_hold3
    • promoted_fast_entry_adaptive_extend: current preferred profile after adaptive keep-vs-replace closure, based on combo_fast_adaptive_extend

python pipelines/regime_lite_run.py \
  --pit-csv path/to/chinext50_pit.csv \
  --profile promoted_fast_entry_adaptive_extend \
  --output-dir outputs/regime_lite
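
The fixed exposures and the daily step cap listed above can be sketched as follows. Pairing the three exposures with the three states in listed order (risk_off -> 0.0, chop -> 0.35, trend -> 0.80) is an assumption, and the sketch ignores execution profiles and overlays entirely.

```python
# Hypothetical sketch of lite exposure stepping; the state mapping is assumed.
BASE_EXPOSURE = {"risk_off": 0.0, "chop": 0.35, "trend": 0.80}
STEP_CAP = 0.20


def next_exposure(current, state):
    """Move toward the state's base exposure by at most STEP_CAP per day."""
    target = BASE_EXPOSURE[state]
    delta = max(-STEP_CAP, min(STEP_CAP, target - current))
    # Rounding keeps float noise from accumulating across days.
    return round(current + delta, 10)


def exposure_path(states, start=0.0):
    """Apply next_exposure day by day over a sequence of states."""
    path, current = [], start
    for state in states:
        current = next_exposure(current, state)
        path.append(current)
    return path
```

Under these assumptions, moving from flat to the full trend exposure takes four trading days.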

Current preferred lite runtime profile:

  • promoted_fast_entry_adaptive_extend
  • promotion decision artifact: outputs/regime_lite_promotion_20260424/regime_lite_promotion_decision.json
  • rationale: the bounded adaptive closure concluded adaptive-replace-candidate, selecting combo_fast_adaptive_extend to replace the prior fixed-hold reference while keeping baseline as rollback-safe reference
  • rollback/reference profile: baseline
  • inspect promotion_decision.active_adaptive_mode plus regime_lite_summary.json -> execution_profile.adaptive_hold_mode / adaptive_hold_context to understand the active bounded hold semantics before operating it

Converged lite operational flow:

  1. Run the preferred profile with pipelines/regime_lite_run.py --profile promoted_fast_entry_adaptive_extend.
  2. Inspect regime_lite_runtime_health.json for bounded status healthy / review / hold / rollback_recommended.
  3. Inspect regime_lite_post_promotion_review.json for bounded decision keep_promoted / hold_and_review / recommend_rollback.
  4. In post-promotion review, treat recent_window_evidence as the primary decision basis; full_history_reference is reference context only, and segmented_diagnostics is for bounded diagnosis rather than override.
  5. If health stays healthy and review stays keep_promoted, continue normal lite operation.
  6. If health moves to review or review moves to hold_and_review, pause new tuning and inspect the bounded reasons before any change.
  7. If health reaches rollback_recommended or review reaches recommend_rollback, switch back to baseline as the rollback-safe profile and keep the lite path scoped to that runtime handoff.
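
Steps 5 to 7 amount to a small decision table, sketched below. The function name and the returned action labels are hypothetical, and treating a hold health status the same way as review is an assumption, since the steps above do not spell it out.

```python
# Hypothetical decision helper for the lite flow; action labels are invented.
HEALTH_STATUSES = {"healthy", "review", "hold", "rollback_recommended"}
REVIEW_DECISIONS = {"keep_promoted", "hold_and_review", "recommend_rollback"}


def lite_next_action(health_status, review_decision):
    """Map runtime health plus post-promotion review to an operator action."""
    if health_status not in HEALTH_STATUSES:
        raise ValueError(f"unknown health status: {health_status}")
    if review_decision not in REVIEW_DECISIONS:
        raise ValueError(f"unknown review decision: {review_decision}")

    # Rollback signals take priority over everything else.
    if health_status == "rollback_recommended" or review_decision == "recommend_rollback":
        return "switch_to_baseline"
    # Anything short of fully healthy pauses tuning (hold handling is assumed).
    if health_status != "healthy" or review_decision == "hold_and_review":
        return "pause_tuning_and_inspect"
    return "continue_normal_operation"
```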

Artifacts:

  • regime_lite_daily_ledger.csv
  • regime_lite_summary.json
  • regime_lite_report.md
  • regime_lite_runtime_health.json
  • regime_lite_post_promotion_review.json

Execution Timing + Entry-Exit Experiments

Run controlled A/B experiments for:

  • execution timing: lag1 vs fast_entry
  • entry-specific exit overlay: short trend-entry hold floor with stop guard

python pipelines/regime_lite_experiments.py \
  --pit-csv path/to/chinext50_pit.csv \
  --output-dir outputs/regime_lite_experiments

Artifacts:

  • regime_lite_experiment_results.csv
  • regime_lite_experiment_summary.json
  • regime_lite_experiment_report.html
  • regime_lite_experiment_baseline_ledger.csv
  • regime_lite_experiment_best_ledger.csv
  • regime_lite_promotion_decision.json

The experiment board now separates:

  • recommendation candidate: best discovery-sample variant
  • promotion status: final promote / hold / reject decision from deterministic holdout validation
  • governance handoff target: the promoted runtime profile that must flow into bounded lite runtime health and post-promotion review

Verification

Project health check:

py -m pytest -q

The repository now pins pytest collection to the main tests/ directory, so historical deliverable bundles and backups do not pollute the default test run.