
## 2026-04-10

  • User requested strict sequential implementation of GPT Pro guidance in chinext50_recalibrate_guidance_for_codex_2026-04-09.md.
  • Confirmed step-1/2 changes already present in codebase (backtest/frozen_walkforward.py, config/regime.yaml).
  • Completed step-3 report semantics refactor in pipelines/real_walkforward_report.py:
    • Added stitched frozen OOS ledger reconstruction and output stitched_frozen_oos_ledger.csv.
    • Summary now includes three metric branches: default_strategy_full_sample_metrics, stitched_frozen_oos_metrics, baseline_full_sample_metrics.
    • Comparison now split into stitched_oos_vs_baseline and default_vs_baseline.
    • Legacy top-level comparison aliases retained and mapped to stitched OOS branch.
    • Added primary/partial window success diagnostics and rule serialization.
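
As a rough illustration of the stitched frozen OOS ledger idea from step 3, the sketch below concatenates per-window out-of-sample rows into one date-ordered ledger. The row shape, the function names `stitch_oos_windows` and `compound_return`, and the "earlier window wins on overlapping dates" rule are all assumptions for illustration; the actual logic in pipelines/real_walkforward_report.py may differ.

```python
from typing import Dict, List

# Hypothetical row shape: {"date": "YYYY-MM-DD", "ret": daily_return}
Row = Dict[str, object]


def stitch_oos_windows(windows: List[List[Row]]) -> List[Row]:
    """Merge per-window OOS rows into a single ledger ordered by date.

    Assumption: when two windows cover the same date, the row from the
    earlier window in the input list is kept and the later one is dropped.
    """
    seen = set()
    stitched: List[Row] = []
    for rows in windows:
        for row in sorted(rows, key=lambda r: r["date"]):
            if row["date"] in seen:
                continue  # skip dates already covered by an earlier window
            seen.add(row["date"])
            stitched.append(row)
    stitched.sort(key=lambda r: r["date"])
    return stitched


def compound_return(rows: List[Row]) -> float:
    """Compound the daily returns of a ledger into one total return."""
    total = 1.0
    for row in rows:
        total *= 1.0 + float(row["ret"])
    return total - 1.0


# Illustrative input only; these are not real pipeline values.
example_windows = [
    [{"date": "2024-01-02", "ret": 0.01}, {"date": "2024-01-03", "ret": -0.02}],
    [{"date": "2024-01-03", "ret": 0.05}, {"date": "2024-01-04", "ret": 0.03}],
]
example_ledger = stitch_oos_windows(example_windows)
```

Computing metrics on the stitched ledger rather than on the full sample is what separates a stitched_frozen_oos_metrics-style branch from a full-sample branch in this sketch.
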
  • Completed step-4 tests:
    • tests/test_frozen_walkforward.py now includes frontier fallback, non-binary stability score, mixed turnover override threshold, and bounded return-ratio tests.
    • tests/test_real_walkforward_report_pipeline.py now validates stitched metrics fields, primary-window success semantics, partial-window exclusion, and stitched-first report comparison.
  • Verification:
    • Targeted tests: 11 passed.
    • Full regression: 81 passed.
  • Real pipeline rerun on full50 PIT completed at outputs/real_walkforward_recalibrated_20260410_semantic_v1/ with new stitched/report semantics.
  • Started OpenSpec change execute-fullcode-guidance-20260410 for strict block order B1 -> B2 -> B3 -> H1 -> H2 -> H3.
  • B1 completed:
    • Added same-period stitched baseline semantics in real_walkforward_report.py (baseline_stitched_oos_metrics, stitched comparison now aligned to stitched dates).
    • Updated report pipeline tests for semantic split.
    • Targeted tests passed (3 passed).
    • Key delta vs block0 baseline: stitched annual_return_delta -0.0951 -> +0.0299.
  • B2 completed:
    • Split utility to core_utility + turnover_penalty + net_utility in backtest/utility.py.
    • Updated frozen selection stability to use core utility (removed net-utility turnover duplication in stability path) in backtest/frozen_walkforward.py.
    • Added config defaults for utility calibration and core utility normalization.
    • Added/updated tests in test_utility.py and test_frozen_walkforward.py.
    • Targeted tests passed (15 passed).
    • Key delta vs B1: stitched utility_delta_vs_baseline improved -0.1012 -> +0.0257; return/drawdown unchanged.
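
The B2 utility split can be pictured with the minimal sketch below. Only the three field names (core_utility, turnover_penalty, net_utility) come from this log; the formula, the `compute_utility` helper, and both weights are hypothetical placeholders, not the contents of backtest/utility.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtilityBreakdown:
    core_utility: float      # return/risk term, independent of turnover
    turnover_penalty: float  # cost term charged for trading activity

    @property
    def net_utility(self) -> float:
        # Assumption: net utility is core utility minus the turnover penalty.
        return self.core_utility - self.turnover_penalty


def compute_utility(
    annual_return: float,
    max_drawdown: float,
    annual_turnover: float,
    drawdown_weight: float = 0.5,   # hypothetical calibration value
    turnover_weight: float = 0.01,  # hypothetical calibration value
) -> UtilityBreakdown:
    """Return the utility split so callers can pick which component to use."""
    core = annual_return - drawdown_weight * max_drawdown
    penalty = turnover_weight * annual_turnover
    return UtilityBreakdown(core_utility=core, turnover_penalty=penalty)


# Illustrative inputs only; not taken from any run in this log.
example = compute_utility(annual_return=0.10, max_drawdown=0.20, annual_turnover=5.0)
```

With the components exposed separately, a stability score can read `core_utility` while ranking still uses `net_utility`, which avoids charging turnover twice along the stability path.
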
  • B3 attempted then rolled back due to guardrail failure:
    • Attempted state precedence fix (trend before repair) + overlap test.
    • Targeted tests passed during attempt (6 passed), but guardrail failed on stitched state mix: risk_off=0.3686 (required <=0.32).
    • Rolled back B3 files to pre-block snapshots and revalidated rollback (tests/test_policy.py back to 5 passed; B3_rolled_back metrics equal B2).
  • Stopped execution per rule after guardrail failure; did not proceed to H1/H2/H3.
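
The "stop at the first guardrail failure" rule used for the B1 -> B2 -> B3 -> H1 -> H2 -> H3 sequence could be sketched as follows. The helper names, the tuple layout for guardrails, and the `evaluate` callback are assumptions for illustration; only the risk_off bound of 0.32 and the failing value 0.3686 come from this log, and the 0.30 used for the other blocks is a placeholder.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (metric name, comparison operator, bound); only "<=" and ">=" are handled.
Guardrail = Tuple[str, str, float]


def failed_guardrails(metrics: Dict[str, float], guardrails: List[Guardrail]) -> List[str]:
    """Return a readable message for every guardrail the metrics violate."""
    failures = []
    for name, op, bound in guardrails:
        value = metrics[name]
        ok = value <= bound if op == "<=" else value >= bound
        if not ok:
            failures.append(f"{name}={value:.4f} (required {op}{bound})")
    return failures


def run_blocks(
    blocks: List[str],
    evaluate: Callable[[str], Dict[str, float]],
    guardrails: List[Guardrail],
) -> Tuple[List[str], Optional[str], List[str]]:
    """Run blocks in order and stop at the first one that breaks a guardrail.

    Returns (completed blocks, failed block or None, failure messages).
    Rolling the failed block back is left to the caller.
    """
    completed: List[str] = []
    for block in blocks:
        failures = failed_guardrails(evaluate(block), guardrails)
        if failures:
            return completed, block, failures
        completed.append(block)
    return completed, None, []


def fake_evaluate(block: str) -> Dict[str, float]:
    # Placeholder metrics: B3 reproduces the failing value, others are made up.
    return {"risk_off": 0.3686 if block == "B3" else 0.30}


result = run_blocks(
    ["B1", "B2", "B3", "H1", "H2", "H3"],
    fake_evaluate,
    [("risk_off", "<=", 0.32)],
)
```
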
  • Followed new sequence document chinext50_post_b3_next_steps_for_codex_2026-04-10.md.
  • B3 re-landed successfully with semantic fix + config-driven state thresholds:
    • model/state_machine.py now uses config-driven state_machine.thresholds and trend/repair overlap exclusivity.
    • config/regime.yaml now has explicit state_machine.thresholds keys.
    • Added overlap / euphoric / risk_off-priority tests in tests/test_policy.py.
    • B3 guardrails passed against B2 baseline under semantic criteria.
  • B4 completed:
    • positive_window_ratio kept and marked diagnostic-only in frozen summary/report.
    • Added primary_acceptance_metrics + report acceptance anchor text.
    • Targeted tests passed and stitched/state deltas vs B3 are zero (semantic labeling only).
  • H1a executed (risk_off-only thresholds):
    • Applied risk_off and crash-override threshold changes in config/regime.yaml.
    • Added config-driven risk_off threshold test in tests/test_policy.py.
    • H1a stop conditions did not trigger, so no rollback.
    • H1a acceptance criteria failed on the annual_return floor (annual_return_delta below the B3 value minus 0.01), so the flow is blocked before H1b per the document rule.
    • Additional risk_off-only micro-grid checks did not find a combination satisfying all H1a acceptance constraints simultaneously on current PIT.
  • Created GPT-Pro handoff bundle with code + data + outputs + issue list:
    • deliverables/gpt_pro_post_b3_bundle_2026-04-10.zip
    • includes current codebase slices, PIT input, B2/B3/B4/H1a run outputs, block backups, and gpt_pro_post_b3_issues_2026-04-10.md.
  • Executed detailed post-B3 guidance sequence blocks:
    • R0 completed (H1a two-tier acceptance reframe note generated at deliverables/h1a_reframe_decision_2026-04-10.md).
    • R1 optional microprobe completed and adopted (risk_off thresholds set to 0.67/0.89/-0.14, crash_override 0.77), improving stitched annual_return_delta and keeping defense constraints.
    • H1b.1 attempted with repair cleanup thresholds and new repair guards, but the stop condition triggered (drawdown_ratio_vs_baseline=0.6624 > 0.64), so the block was rolled back.
  • After rollback, state restored to R1 baseline and verified by tests + report rerun (fullcode_seq_20260410_H1b1_rolled_back equals R1 metrics).
  • Ran one additional conservative H1b.1 attempt (H1b1_retry1) from R1 baseline:
    • Repair cleanup with milder thresholds still triggered the stop condition (drawdown_ratio_vs_baseline=0.6579 > 0.64).
    • Block was rolled back per rule; post-rollback metrics match R1 baseline.
  • Prepared updated GPT-Pro full handoff package with latest detailed-guidance execution context:
    • zip: deliverables/gpt_pro_fullcode_post_b3_detailed_2026-04-10.zip
    • summary note: deliverables/gpt_pro_handoff_post_b3_detailed_2026-04-10.md
    • includes code, PIT data, B3/B4/H1a/R0/R1/H1b1 attempts and rollbacks, OpenSpec change, and backups.
  • 2026-04-10: Executed chinext50_post_b3_feedback_response_for_codex_2026-04-10.md sequence in strict order: D0 -> preparatory repair-threshold code change -> H1b.1-L1 -> H1b.1-L2 -> H1b.1-L3 -> H1b.2-direct-from-R1.
  • D0 completed with diagnostic-only additions in report (stitched_diagnostics fields + markdown section), with zero drift vs R1 stitched metrics/state mix/exposure means.
  • Preparatory change completed: added repair_breadth_min and repair_d_trend_min (default -1.0) in state machine/config; added policy tests; verified zero drift vs R1.
  • H1b.1 results:
    • L1 stop triggered (drawdown_ratio_vs_baseline=0.6539, repair=0.1154, mean_target_exposure breach); rolled back current block.
    • L2 stop triggered (drawdown_ratio_vs_baseline=0.6607, annual_return_delta<0, repair=0.1135, mean_target_exposure breach); rolled back current block.
    • L3 stop triggered (drawdown_ratio_vs_baseline=0.6579, annual_return_delta<0, repair=0.1096, mean_target_exposure breach); rolled back current block.
  • Since H1b.1 ladder failed, moved to H1b.2-direct-from-R1 as instructed.
  • The H1b.2-direct-from-R1 run triggered no stop conditions but did not meet the acceptance thresholds: annual_return_delta and upside_capture were both below R1, and trend+euphoric was below 0.155.
  • Outputs saved to:
    • outputs/fullcode_seq_20260410_D0_diag_only/
    • outputs/fullcode_seq_20260410_prep_repair_threshold_defaults/
    • outputs/fullcode_seq_20260410_H1b1_L1/
    • outputs/fullcode_seq_20260410_H1b1_L2/
    • outputs/fullcode_seq_20260410_H1b1_L3/
    • outputs/fullcode_seq_20260410_H1b2_direct_from_R1/
  • Consolidated run summary artifacts:
    • deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.json
    • deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.md
  • 2026-04-10: User requested no more back-and-forth confirmations and asked to continue autonomously.
  • Pivoted to small-team scope and completed OpenSpec change build-regime-lite-mvp.
  • Implemented new lightweight pipeline pipelines/regime_lite_run.py (3 states: risk_off/chop/trend; base exposures 0.0/0.35/0.80; max daily step 0.20).
  • Added tests tests/test_regime_lite_pipeline.py (2 passed).
  • Real PIT smoke output: outputs/regime_lite_20260410/ with artifacts:
    • regime_lite_daily_ledger.csv
    • regime_lite_summary.json
    • regime_lite_report.md
  • Smoke key metrics:
    • annual_return=0.0481
    • max_drawdown=0.3124
    • sharpe=0.4789
    • annual_turnover=8.4859
  • OpenSpec archive done: openspec/changes/archive/2026-04-10-build-regime-lite-mvp/.
  • Specs validation after sync: 31 passed.
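
For reference, the regime-lite exposure rule noted above (three states with base exposures 0.0/0.35/0.80 and a max daily step of 0.20) can be sketched as below. The constants come from this log; the function names and the choice to move straight toward the state's base exposure each day are assumptions, not the code in pipelines/regime_lite_run.py.

```python
from typing import List

# Base exposure per state and the daily step cap, as recorded in this log.
BASE_EXPOSURE = {"risk_off": 0.0, "chop": 0.35, "trend": 0.80}
MAX_DAILY_STEP = 0.20


def step_exposure(current: float, state: str) -> float:
    """Move exposure toward the state's base level, capped at one daily step."""
    target = BASE_EXPOSURE[state]
    delta = max(-MAX_DAILY_STEP, min(MAX_DAILY_STEP, target - current))
    return current + delta


def exposure_path(states: List[str], start: float = 0.0) -> List[float]:
    """Apply step_exposure day by day and return the resulting exposures."""
    path = []
    exposure = start
    for state in states:
        exposure = step_exposure(exposure, state)
        path.append(exposure)
    return path


# Five trend days from flat, then two risk_off days.
example_path = exposure_path(["trend"] * 5 + ["risk_off"] * 2)
```

In this sketch a flat portfolio needs four trend days to reach 0.80, and a switch to risk_off unwinds at most 0.20 per day, so the step cap limits turnover at the cost of slower de-risking.
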