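This is a project skeleton for daily-frequency, regime-aware exposure control, built specifically for the ChiNext 50 (创业板50).
Its goal is not to predict each day's rise or fall, but, as far as possible, to achieve the following:

Directory layout:
- data/: CSV/parquet readers + synthetic demo data generator
- features/: three feature layers (price, breadth, relative strength)
- model/: continuous scores, 5-state state machine, exposure mapping, and hard veto
- backtest/: backtest with approximate next-open execution, utility, event slicing
- pipelines/: demo pipeline + frozen-hypothesis validation
- tests/: minimal end-to-end tests

States:
- risk_off
- repair
- trend
- chop
- euphoric_late

Scores:
- trend_score
- breadth_score
- stress_score
- crowding_score
- repair_score

and three path-type hazards:
- down_hazard
- repair_hazard
- rebound_hazard

Run from the project root: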
python pipelines/run_demo.py \
--pit-csv path/to/chinext50_pit.csv \
--output-dir outputs/demo
This uses synthetic data to generate:
- outputs/demo/daily_ledger.csv
- outputs/demo/event_summary.csv
- outputs/demo/metrics_summary.json

python pipelines/frozen_hypothesis_validation.py \
--pit-csv path/to/chinext50_pit.csv \
--output-dir outputs/frozen_validation
Your CSV/parquet file needs at least these columns:
date, open, high, low, close, volume

It is recommended to also provide:
hs300_close, star50_close, csi1000_close, pct_constituents_above_20dma, pct_constituents_above_60dma, pct_new_high_20, pct_new_low_20, eq_weight_ret_5, weighted_ret_5, top3_contribution_5, corr_spike_20, dispersion_20

To run:
python pipelines/run_demo.py \
--pit-csv path/to/chinext50_pit.csv \
--output-dir outputs/real_data_demo
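Before launching a run, it can help to confirm that the input file carries the minimum columns listed above. The following is a minimal sketch using only the standard library; `missing_required_columns` is a hypothetical helper written for illustration and is not part of the repository.

```python
import csv
import io

# Minimum columns named in this README.
REQUIRED_COLUMNS = ("date", "open", "high", "low", "close", "volume")


def missing_required_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header, in order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Example: the header below lacks the volume column.
sample = "date,open,high,low,close\n2024-01-02,1.0,1.1,0.9,1.05\n"
print(missing_required_columns(sample))  # ['volume']
```

For a file on disk, the same check can be run on the text returned by reading the file, or adapted to read only the first line.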
The runtime pipelines now require a full point-in-time dataset and can optionally block low-quality data before feature construction.
PIT columns: date, open, high, low, close, volume, hs300_close, star50_close, csi1000_close, pct_constituents_above_20dma, pct_constituents_above_60dma, pct_new_high_20, pct_new_low_20, eq_weight_ret_5, weighted_ret_5, top3_contribution_5, top1_contribution_5, top10_contribution_5, sector_concentration_20, corr_spike_20, dispersion_20
Rows are sorted by trading date before downstream processing.
Runtime entrypoints no longer merge sidecars on the fly.
If required PIT columns are missing, the pipeline fails before quality gate and feature construction.
Strict mode (--strict-data): the pipeline stops only when configured blocking_columns are breached; non-blocking breaches remain warnings.

Coverage threshold configuration:
- config/regime.yaml -> data_quality.default_min_coverage and data_quality.column_min_coverage
- --min-coverage (command-line override)

Each run writes data_quality_summary.json into the output directory.
This artifact includes gate mode, pass/fail status, breach severities (error/warning), and field-level coverage metrics.
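The gate behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names, the `mode` labels, and the treatment of empty strings as nulls are all hypothetical. Coverage is taken to mean the fraction of rows with a non-null value.

```python
def coverage_by_column(rows, columns):
    """Fraction of rows holding a non-null value (not None, not '') per column."""
    total = len(rows)
    result = {}
    for col in columns:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        result[col] = filled / total if total else 0.0
    return result


def evaluate_gate(coverage, min_coverage, blocking_columns, strict):
    """Classify breaches; only blocking-column breaches fail a strict run."""
    breaches = []
    for col, value in coverage.items():
        if value < min_coverage:
            severity = "error" if col in blocking_columns else "warning"
            breaches.append({"column": col, "coverage": value, "severity": severity})
    has_error = any(b["severity"] == "error" for b in breaches)
    return {
        "mode": "strict" if strict else "warn",  # hypothetical labels
        "passed": not (strict and has_error),
        "breaches": breaches,
    }


rows = [
    {"date": "2024-01-02", "close": "1.0", "hs300_close": "10"},
    {"date": "2024-01-03", "close": "", "hs300_close": ""},
    {"date": "2024-01-04", "close": "1.2", "hs300_close": None},
    {"date": "2024-01-05", "close": "1.3", "hs300_close": "11"},
]
cov = coverage_by_column(rows, ["close", "hs300_close"])
strict_result = evaluate_gate(cov, 0.8, {"close"}, strict=True)
print(strict_result["passed"])  # False: close is a blocking column below 0.8
```

With the same data and a blocking set that does not include `close`, both breaches would be warnings and the strict run would pass.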
python pipelines/run_demo.py \
--pit-csv path/to/chinext50_pit.csv \
--strict-data \
--min-coverage 0.98 \
--output-dir outputs/real_data_demo
python pipelines/frozen_hypothesis_validation.py \
--pit-csv path/to/chinext50_pit.csv \
--strict-data \
--min-coverage 0.98 \
--output-dir outputs/frozen_validation_real
Use pipelines/build_pit_dataset.py to create a reusable point-in-time table before running strategy pipelines.
python pipelines/build_pit_dataset.py \
--market-csv path/to/chinext50_market.csv \
--sidecar-csv path/to/chinext50_benchmark_sidecar.csv \
--sidecar-csv path/to/chinext50_breadth_sidecar.csv \
--output-path outputs/pit/chinext50_pit.csv
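Conceptually, building the PIT table means joining each sidecar onto the market rows by trading date. A minimal sketch, assuming a left join keyed on an ISO-formatted `date` string; `merge_by_date` is a hypothetical helper and the script's actual join rules may differ.

```python
def merge_by_date(market_rows, sidecars):
    """Left-join each sidecar onto the market rows by 'date', sorted by date."""
    merged = {row["date"]: dict(row) for row in market_rows}
    for sidecar in sidecars:
        for row in sidecar:
            target = merged.get(row["date"])
            if target is None:
                continue  # sidecar dates outside the market calendar are dropped
            for key, value in row.items():
                if key != "date":
                    target[key] = value
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return [merged[d] for d in sorted(merged)]


market = [
    {"date": "2024-01-03", "close": 2.0},
    {"date": "2024-01-02", "close": 1.0},
]
benchmark = [
    {"date": "2024-01-02", "hs300_close": 10.0},
    {"date": "2024-01-01", "hs300_close": 9.0},  # no matching market row
]
pit = merge_by_date(market, [benchmark])
print(pit)
```

In this sketch a market date with no sidecar row simply lacks the sidecar columns, which is the kind of gap a coverage check would then report.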
Optional quality controls:
- --strict-data: block PIT output when quality breaches occur
- --min-coverage 0.98: override minimum non-null coverage threshold
- --config path/to/regime.yaml: load custom quality defaults

- pit_quality_summary.json in the same output directory.
- --output-path (.csv or .parquet).
- pit_quality_summary.json is still written for diagnostics.

- sources.market_path
- sources.sidecar_paths
- sources.sidecar_count
- sources.merged_row_count
- pit_columns

Use pipelines/ingest_real_data.py to fetch/load source data, publish raw + staging layers, and output the final PIT in one run.
python pipelines/ingest_real_data.py \
--provider csv \
--market-csv path/to/chinext50_market.csv \
--hs300-csv path/to/hs300.csv \
--star50-csv path/to/star50.csv \
--csi1000-csv path/to/csi1000.csv \
--breadth-csv path/to/chinext50_breadth.csv \
--output-dir outputs/ingestion
python pipelines/ingest_real_data.py \
--provider akshare \
--market-symbol 159915 \
--market-symbol-type etf \
--hs300-symbol 000300 \
--star50-symbol 000688 \
--csi1000-symbol 000852 \
--start-date 2018-01-01 \
--end-date 2026-04-09 \
--breadth-csv path/to/chinext50_breadth.csv \
--output-dir outputs/ingestion
python pipelines/ingest_real_data.py \
--provider akshare \
--market-symbol 159915 \
--market-symbol-type etf \
--breadth-csv path/to/chinext50_breadth.csv \
--mairui-licence YOUR_MAIRUI_LICENCE \
--mairui-market-code 399673.SZ \
--mairui-hs300-code 000300.SH \
--mairui-star50-code 000688.SH \
--mairui-csi1000-code 000852.SH \
--start-date 2018-01-01 \
--end-date 2026-04-09 \
--output-dir outputs/ingestion
python pipelines/ingest_real_data.py \
--provider mairui \
--mairui-licence YOUR_MAIRUI_LICENCE \
--mairui-market-code 399673.SZ \
--mairui-market-kind index \
--mairui-hs300-code 000300.SH \
--mairui-star50-code 000688.SH \
--mairui-csi1000-code 000852.SH \
--breadth-csv path/to/chinext50_breadth.csv \
--start-date 2018-01-01 \
--end-date 2026-04-09 \
--output-dir outputs/ingestion
If breadth fields are also served by a Mairui endpoint, you can replace --breadth-csv with:
- --mairui-breadth-url https://api.mairuiapi.com/xxx/{licence}
- --mairui-breadth-map-json path/to/rename_map.json

If you do not trust an external breadth panel (or do not have one), you can derive breadth from constituent histories:
python pipelines/ingest_real_data.py \
--provider mairui \
--mairui-licence YOUR_MAIRUI_LICENCE \
--mairui-market-code 399673.SZ \
--mairui-market-kind index \
--mairui-hs300-code 000300.SH \
--mairui-star50-code 000688.SH \
--mairui-csi1000-code 000852.SH \
--derive-breadth \
--breadth-index-symbol 399673 \
--breadth-min-active-constituents 20 \
--breadth-max-constituents 50 \
--breadth-cache-dir outputs/ingestion/raw/constituent_history \
--output-dir outputs/ingestion
Strict mode now includes a breadth-source integrity gate. Placeholder-like breadth inputs (for example, constant weighted_ret_5 - eq_weight_ret_5) are blocked before PIT publish.
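One way to detect the placeholder pattern mentioned above is to test whether the spread weighted_ret_5 - eq_weight_ret_5 ever varies. The sketch below is an assumption about how such a check could look; `spread_is_placeholder`, its tolerance, and the choice to flag very short inputs are hypothetical.

```python
def spread_is_placeholder(rows, tolerance=1e-9):
    """Return True when weighted_ret_5 - eq_weight_ret_5 is effectively constant."""
    spreads = [
        row["weighted_ret_5"] - row["eq_weight_ret_5"]
        for row in rows
        if row.get("weighted_ret_5") is not None
        and row.get("eq_weight_ret_5") is not None
    ]
    if len(spreads) < 2:
        # Assumption: too few observations to show real variation, so flag it.
        return True
    return max(spreads) - min(spreads) <= tolerance


constant_spread = [
    {"weighted_ret_5": 0.03, "eq_weight_ret_5": 0.02},
    {"weighted_ret_5": 0.01, "eq_weight_ret_5": 0.00},
    {"weighted_ret_5": -0.01, "eq_weight_ret_5": -0.02},
]
varying_spread = [
    {"weighted_ret_5": 0.03, "eq_weight_ret_5": 0.02},
    {"weighted_ret_5": 0.01, "eq_weight_ret_5": 0.02},
]
print(spread_is_placeholder(constant_spread))  # True
print(spread_is_placeholder(varying_spread))   # False
```

The tolerance absorbs floating-point rounding, so a spread that is constant up to rounding error is still flagged.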
Output structure includes:
- outputs/ingestion/raw/*.csv
- outputs/ingestion/raw/breadth_integrity_summary.json
- outputs/ingestion/raw/breadth_derivation_summary.json (when --derive-breadth is used)
- outputs/ingestion/staging/*.csv
- outputs/ingestion/pit/chinext50_pit.csv
- outputs/ingestion/pit/pit_quality_summary.json
- outputs/ingestion/ingestion_manifest.json

pipelines/frozen_hypothesis_validation.py now runs a strict frozen-hypothesis process:
Candidates can come from:
- config/regime.yaml -> frozen_validation.candidates
- --candidates-json path/to/candidates.json

Window row requirements:
- frozen_validation.min_train_rows (or --min-train-rows)
- frozen_validation.min_test_rows (or --min-test-rows)

If a window is too short, it is marked as skipped with an explicit status.
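The skip rule for short windows can be sketched as a small status function. The status strings below are hypothetical placeholders; the README only states that a skipped window carries an explicit status, not what the labels are.

```python
def window_status(n_train, n_test, min_train_rows, min_test_rows):
    """Return a status label for one train/test window based on its row counts."""
    if n_train < min_train_rows:
        return "skipped_short_train"  # hypothetical label
    if n_test < min_test_rows:
        return "skipped_short_test"  # hypothetical label
    return "ok"


# (train rows, test rows) for three illustrative windows.
windows = [(240, 120), (150, 120), (240, 20)]
statuses = [window_status(tr, te, min_train_rows=180, min_test_rows=60) for tr, te in windows]
print(statuses)  # ['ok', 'skipped_short_train', 'skipped_short_test']
```

Recording the reason in the status, rather than silently dropping the window, keeps the validation board auditable.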
frozen_validation_board.csv now includes:
- train_*, test_*
- status
- selected_candidate_id
- selected_candidate_overrides (serialized JSON)
- train_utility_total_score and test_utility_total_score

frozen_validation_summary.json now includes:
python pipelines/frozen_hypothesis_validation.py \
--pit-csv path/to/chinext50_pit.csv \
--candidates-json path/to/frozen_candidates.json \
--min-train-rows 180 \
--min-test-rows 60 \
--output-dir outputs/frozen_validation_real
Use pipelines/real_walkforward_report.py to generate a review-ready bundle from full PIT input:
- data_quality_summary.json
- frozen_validation_board.csv
- real_walkforward_summary.json
- real_walkforward_report.md

python pipelines/real_walkforward_report.py \
--pit-csv path/to/chinext50_pit.csv \
--strict-data \
--output-dir outputs/real_walkforward_report
run_demo now outputs transition-anchor diagnostics with explicit event taxonomy:
- crash_onset
- false_rebound
- true_repair
- crowded_unwind
- state_transition (fallback class for other transitions)

- event_log.csv: per-transition anchor details (event_date, from_state, to_state, event_type, forward returns, exposure context)
- event_summary.csv: event-type grouped averages and counts

Classification is rule-based: it combines state transitions with forward-window confirmation signals for rebound quality.
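To make the idea of rule-based classification concrete, here is an illustrative sketch. The specific rules are assumptions invented for this example (the README names the event types but not the rules), and `classify_transition` is a hypothetical function, so treat it as a shape for the logic rather than the project's actual taxonomy.

```python
def classify_transition(from_state, to_state, forward_return):
    """Map one state transition to an event type using illustrative rules.

    forward_return is the return over a confirmation window after the event.
    """
    if to_state == "risk_off":
        # Assumed rule: leaving a late-euphoria state into risk_off is an unwind.
        if from_state == "euphoric_late":
            return "crowded_unwind"
        return "crash_onset"
    if from_state == "risk_off" and to_state == "repair":
        # Assumed rule: the forward window confirms or rejects the rebound.
        return "true_repair" if forward_return > 0 else "false_rebound"
    return "state_transition"  # fallback class for other transitions


print(classify_transition("trend", "risk_off", -0.05))         # crash_onset
print(classify_transition("euphoric_late", "risk_off", -0.08))  # crowded_unwind
print(classify_transition("risk_off", "repair", 0.03))          # true_repair
print(classify_transition("risk_off", "repair", -0.02))         # false_rebound
print(classify_transition("chop", "trend", 0.01))               # state_transition
```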
Backtest execution now includes configurable constraints for better ETF-style realism:
- trading.extreme_day_move_threshold: absolute executed return threshold that triggers cost amplification
- trading.extreme_day_cost_multiplier: multiplier applied to base trading cost on extreme days
- trading.gap_slippage_factor: additive gap shock cost factor using abs(gap_open) * turnover

New ledger diagnostics:
- tracking_difference: strategy_return_net - strategy_return_gross
- tracking_error_20: 20-day rolling std of tracking_difference

New summary metrics:
- tracking_diff_mean
- tracking_diff_abs_mean
- tracking_error_20_p95

Use pipelines/calibrate_execution_constraints.py to sweep execution parameters and output a recommendation:
- execution_calibration_grid.csv
- execution_calibration_recommendation.json

python pipelines/calibrate_execution_constraints.py \
--pit-csv path/to/chinext50_pit.csv \
--cost-multipliers 1.0,1.25,1.5,1.75 \
--gap-slippage-factors 0.0,0.01,0.02,0.03 \
--output-dir outputs/execution_calibration
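The execution constraints and tracking diagnostics described in this section can be sketched as below. This is an illustration only: the function names and the base cost rate are hypothetical, and details such as using >= for the threshold, applying the multiplier before adding the gap term, and using the sample standard deviation are assumptions rather than confirmed behaviour.

```python
import statistics


def trading_cost(turnover, base_cost_rate, executed_return, gap_open,
                 move_threshold, cost_multiplier, gap_slippage_factor):
    """Per-day cost: amplified base cost on extreme days plus additive gap slippage."""
    cost = base_cost_rate * turnover
    if abs(executed_return) >= move_threshold:  # assumed inclusive comparison
        cost *= cost_multiplier
    cost += gap_slippage_factor * abs(gap_open) * turnover
    return cost


def tracking_difference(net_returns, gross_returns):
    """strategy_return_net - strategy_return_gross, day by day."""
    return [net - gross for net, gross in zip(net_returns, gross_returns)]


def rolling_std(values, window=20):
    """Rolling sample std; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(values[i + 1 - window:i + 1]))
    return out


# Extreme day: a 6% move against a 5% threshold, with a 2% opening gap.
cost = trading_cost(turnover=0.5, base_cost_rate=0.001, executed_return=-0.06,
                    gap_open=-0.02, move_threshold=0.05, cost_multiplier=1.5,
                    gap_slippage_factor=0.01)
diffs = tracking_difference([0.010, -0.004], [0.011, -0.003])
print(cost, diffs, rolling_std([0.0, 1.0, 2.0], window=2))
```

Here the cost works out to roughly 0.00085: a base of 0.0005 amplified by 1.5, plus a gap term of 0.0001.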
To improve crowding diagnostics, you can optionally provide:
top1_contribution_5, top10_contribution_5, sector_concentration_20

Use pipelines/regime_lite_run.py for a minimal operational workflow:
- risk_off, chop, trend
- 0.0, 0.35, 0.80
- 0.20
- baseline: lag1 timing, no overlay
- promoted_fast_entry_hold3: prior promoted fixed-hold reference, based on combo_fast_hold3
- promoted_fast_entry_adaptive_extend: current preferred profile after adaptive keep-vs-replace closure, based on combo_fast_adaptive_extend

python pipelines/regime_lite_run.py \
--pit-csv path/to/chinext50_pit.csv \
--profile promoted_fast_entry_adaptive_extend \
--output-dir outputs/regime_lite
Current preferred lite runtime profile:
- promoted_fast_entry_adaptive_extend
- outputs/regime_lite_promotion_20260424/regime_lite_promotion_decision.json
- adaptive-replace-candidate, selecting combo_fast_adaptive_extend to replace the prior fixed-hold reference while keeping baseline as rollback-safe reference
- baseline
- promotion_decision.active_adaptive_mode plus regime_lite_summary.json -> execution_profile.adaptive_hold_mode / adaptive_hold_context to understand the active bounded hold semantics before operating it

Converged lite operational flow:
- Run pipelines/regime_lite_run.py --profile promoted_fast_entry_adaptive_extend.
- Check regime_lite_runtime_health.json for bounded status healthy / review / hold / rollback_recommended.
- Check regime_lite_post_promotion_review.json for bounded decision keep_promoted / hold_and_review / recommend_rollback.
- Use recent_window_evidence as the primary decision basis; full_history_reference is reference context only, and segmented_diagnostics is for bounded diagnosis rather than override.
- If health is healthy and review stays keep_promoted, continue normal lite operation.
- If health is review or review moves to hold_and_review, pause new tuning and inspect the bounded reasons before any change.
- If health is rollback_recommended or review reaches recommend_rollback, switch back to baseline as the rollback-safe profile and keep the lite path scoped to that runtime handoff.

Artifacts:
- regime_lite_daily_ledger.csv
- regime_lite_summary.json
- regime_lite_report.md
- regime_lite_runtime_health.json
- regime_lite_post_promotion_review.json

Run controlled A/B experiments for:
- lag1 vs fast_entry

python pipelines/regime_lite_experiments.py \
--pit-csv path/to/chinext50_pit.csv \
--output-dir outputs/regime_lite_experiments
Artifacts:
- regime_lite_experiment_results.csv
- regime_lite_experiment_summary.json
- regime_lite_experiment_report.html
- regime_lite_experiment_baseline_ledger.csv
- regime_lite_experiment_best_ledger.csv
- regime_lite_promotion_decision.json

The experiment board now separates:
- promote / hold / reject decision from deterministic holdout validation

Project health check:
py -m pytest -q
The repository now pins pytest collection to the main tests/ directory, so historical deliverable bundles and backups do not pollute the default test run.