openclaw/cyb50-quant @ 4b84fd98a24b1a435e3f5ea3155463d211af5332

## 2026-04-10

User requested strict sequential implementation of GPT Pro guidance in chinext50_recalibrate_guidance_for_codex_2026-04-09.md.
Confirmed step-1/2 changes already present in codebase (backtest/frozen_walkforward.py, config/regime.yaml).
Completed step-3 report semantics refactor in pipelines/real_walkforward_report.py:
- Added stitched frozen OOS ledger reconstruction and output stitched_frozen_oos_ledger.csv.
- Summary now includes three metric branches: default_strategy_full_sample_metrics, stitched_frozen_oos_metrics, baseline_full_sample_metrics.
- Comparison now split into stitched_oos_vs_baseline and default_vs_baseline.
- Legacy top-level comparison aliases retained and mapped to stitched OOS branch.
- Added primary/partial window success diagnostics and rule serialization.
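
A minimal sketch of what "stitched frozen OOS ledger reconstruction" could look like, assuming each walk-forward window yields a list of daily rows. The row fields (`date`, `daily_return`), the helper names, and the rule that the earlier window wins on an overlapping date are illustrative assumptions, not the actual logic in pipelines/real_walkforward_report.py.

```python
from datetime import date


def stitch_oos_ledger(windows):
    """Concatenate per-window OOS rows into one date-ordered ledger.

    `windows` is a list of lists of row dicts (hypothetical schema:
    'date' and 'daily_return'). Assumption: when two windows cover the
    same date, the row from the earlier window is kept.
    """
    stitched = {}
    for rows in windows:
        for row in rows:
            # setdefault keeps the first row seen for a given date
            stitched.setdefault(row["date"], row)
    return [stitched[d] for d in sorted(stitched)]


def cumulative_return(ledger):
    """Compound the daily returns of a stitched ledger."""
    total = 1.0
    for row in ledger:
        total *= 1.0 + row["daily_return"]
    return total - 1.0
```

Computing metrics from the stitched ledger rather than per window keeps the OOS branch on a single continuous date range, which is what lets it be compared with a baseline over the same dates.
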
Completed step-4 tests:
- tests/test_frozen_walkforward.py now includes frontier fallback, non-binary stability score, mixed turnover override threshold, and bounded return-ratio tests.
- tests/test_real_walkforward_report_pipeline.py now validates stitched metrics fields, primary-window success semantics, partial-window exclusion, and stitched-first report comparison.
Verification:
- Targeted tests: 11 passed.
- Full regression: 81 passed.
Real pipeline rerun on full50 PIT completed at outputs/real_walkforward_recalibrated_20260410_semantic_v1/ with new stitched/report semantics.
Started OpenSpec change execute-fullcode-guidance-20260410 for strict block order B1 -> B2 -> B3 -> H1 -> H2 -> H3.
B1 completed:
- Added same-period stitched baseline semantics in real_walkforward_report.py (baseline_stitched_oos_metrics, stitched comparison now aligned to stitched dates).
- Updated report pipeline tests for semantic split.
- Targeted tests passed (3 passed).
- Key delta vs block0 baseline: stitched annual_return_delta -0.0951 -> +0.0299.
B2 completed:
- Split utility to core_utility + turnover_penalty + net_utility in backtest/utility.py.
- Updated frozen selection stability to use core utility (removed net-utility turnover duplication in stability path) in backtest/frozen_walkforward.py.
- Added config defaults for utility calibration and core utility normalization.
- Added/updated tests in test_utility.py and test_frozen_walkforward.py.
- Targeted tests passed (15 passed).
- Key delta vs B1: stitched utility_delta_vs_baseline improved -0.1012 -> +0.0257; return/drawdown unchanged.
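
The three-way utility split from B2 can be sketched as below. Only the component names (`core_utility`, `turnover_penalty`, `net_utility`) come from this log; the formula for the core term, the weights, and the function name are hypothetical placeholders and may not match backtest/utility.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtilityBreakdown:
    core_utility: float      # return/risk term, free of turnover effects
    turnover_penalty: float  # cost term charged for trading activity
    net_utility: float       # core_utility minus turnover_penalty


def split_utility(annual_return, max_drawdown, annual_turnover,
                  drawdown_weight=1.0, turnover_weight=0.01):
    """Return the utility components separately (illustrative formula)."""
    core = annual_return - drawdown_weight * max_drawdown
    penalty = turnover_weight * annual_turnover
    return UtilityBreakdown(core, penalty, core - penalty)
```

Keeping the components separate lets a stability score read `core_utility` alone, so turnover is penalized once in `net_utility` instead of a second time inside the stability path.
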
B3 attempted, then rolled back due to guardrail failure:
- Attempted state precedence fix (trend before repair) + overlap test.
- Targeted tests passed during attempt (6 passed), but guardrail failed on stitched state mix: risk_off=0.3686 (required <=0.32).
- Rolled back B3 files to pre-block snapshots and revalidated rollback (tests/test_policy.py back to 5 passed; B3_rolled_back metrics equal B2).
Stopped execution per rule after guardrail failure; did not proceed to H1/H2/H3.
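
For illustration, a state-mix guardrail such as `risk_off <= 0.32` could be checked as below. The function names and the shape of the limits mapping are assumptions; the real guardrail logic is not shown in this log.

```python
def state_mix(states):
    """Share of days spent in each state, e.g. {'risk_off': 0.37, ...}."""
    counts = {}
    for state in states:
        counts[state] = counts.get(state, 0) + 1
    total = len(states)
    return {state: count / total for state, count in counts.items()}


def check_guardrails(mix, max_share):
    """Return one message per state whose share exceeds its limit."""
    violations = []
    for state, limit in max_share.items():
        share = mix.get(state, 0.0)
        if share > limit:
            violations.append(f"{state}={share:.4f} > {limit}")
    return violations
```

An empty list means the block may proceed; a non-empty list corresponds to the roll-back-and-stop rule applied to B3.
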
Followed new sequence document chinext50_post_b3_next_steps_for_codex_2026-04-10.md.
B3 re-landed successfully with semantic fix + config-driven state thresholds:
- model/state_machine.py now uses config-driven state_machine.thresholds and trend/repair overlap exclusivity.
- config/regime.yaml now has explicit state_machine.thresholds keys.
- Added overlap / euphoric / risk_off-priority tests in tests/test_policy.py.
- B3 guardrails passed against B2 baseline under semantic criteria.
B4 completed:
- positive_window_ratio kept and marked diagnostic-only in frozen summary/report.
- Added primary_acceptance_metrics + report acceptance anchor text.
- Targeted tests passed and stitched/state deltas vs B3 are zero (semantic labeling only).
H1a executed (risk_off-only thresholds):
- Applied risk_off and crash-override threshold changes in config/regime.yaml.
- Added config-driven risk_off threshold test in tests/test_policy.py.
- H1a stop conditions did not trigger, so no rollback.
- H1a acceptance criteria failed on annual_return floor (annual_return_delta below B3-0.01), so flow is blocked before H1b by document rule.
- Additional risk_off-only micro-grid checks did not find a combination satisfying all H1a acceptance constraints simultaneously on current PIT.
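
A micro-grid check of this kind can be sketched generically: enumerate threshold combinations, evaluate each, and keep only those passing every acceptance constraint. The `evaluate` callback and the constraint functions are hypothetical stand-ins for a real report rerun; none of these names come from the codebase.

```python
from itertools import product


def grid_search(grid, evaluate, constraints):
    """Return every parameter combination that satisfies all constraints.

    grid:        {param_name: [candidate values]}
    evaluate:    callable(params) -> metrics dict (stand-in for a rerun)
    constraints: list of callables(metrics) -> bool
    """
    names = list(grid)
    passing = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        metrics = evaluate(params)
        if all(check(metrics) for check in constraints):
            passing.append(params)
    return passing
```

An empty result corresponds to the outcome recorded above: no combination met all H1a acceptance constraints at once on the current PIT.
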
Created GPT-Pro handoff bundle with code + data + outputs + issue list:
- deliverables/gpt_pro_post_b3_bundle_2026-04-10.zip
- includes current codebase slices, PIT input, B2/B3/B4/H1a run outputs, block backups, and gpt_pro_post_b3_issues_2026-04-10.md.
Executed detailed post-B3 guidance sequence blocks:
- R0 completed (H1a two-tier acceptance reframe note generated at deliverables/h1a_reframe_decision_2026-04-10.md).
- R1 optional microprobe completed and adopted (risk_off thresholds set to 0.67/0.89/-0.14, crash_override 0.77), improving stitched annual_return_delta and keeping defense constraints.
- H1b.1 attempted with repair cleanup thresholds and new repair guards, but a stop condition triggered (drawdown_ratio_vs_baseline=0.6624 > 0.64), so the block was rolled back.
After rollback, state restored to R1 baseline and verified by tests + report rerun (fullcode_seq_20260410_H1b1_rolled_back equals R1 metrics).
Ran one additional conservative H1b.1 attempt (H1b1_retry1) from R1 baseline:
- Repair cleanup with milder thresholds still triggered the stop condition (drawdown_ratio_vs_baseline=0.6579 > 0.64).
- Block was rolled back per rule; post-rollback metrics match R1 baseline.
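
The attempt, check, and rollback pattern used for these blocks can be sketched as follows, assuming the block's state fits in a config dict. The real workflow snapshots files and reruns the report; `apply_change`, `evaluate`, and the stop-condition mapping are hypothetical names used only for illustration.

```python
import copy


def run_block(config, apply_change, evaluate, stop_conditions):
    """Apply a change, evaluate it, and roll back if any stop condition fires.

    Returns (resulting_config, names_of_triggered_stop_conditions).
    """
    snapshot = copy.deepcopy(config)           # pre-block state to restore
    candidate = apply_change(copy.deepcopy(config))
    metrics = evaluate(candidate)
    triggered = [name for name, fires in stop_conditions.items()
                 if fires(metrics)]
    if triggered:
        return snapshot, triggered             # rolled back
    return candidate, []                       # change accepted
```

Returning the untouched snapshot makes the post-rollback equality check (metrics match the R1 baseline) trivial to assert.
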
Prepared updated GPT-Pro full handoff package with latest detailed-guidance execution context:
- zip: deliverables/gpt_pro_fullcode_post_b3_detailed_2026-04-10.zip
- summary note: deliverables/gpt_pro_handoff_post_b3_detailed_2026-04-10.md
- includes code, PIT data, B3/B4/H1a/R0/R1/H1b1 attempts and rollbacks, OpenSpec change, and backups.
2026-04-10: Executed chinext50_post_b3_feedback_response_for_codex_2026-04-10.md sequence in strict order: D0 -> preparatory repair-threshold code change -> H1b.1-L1 -> H1b.1-L2 -> H1b.1-L3 -> H1b.2-direct-from-R1.
D0 completed with diagnostic-only additions in report (stitched_diagnostics fields + markdown section), with zero drift vs R1 stitched metrics/state mix/exposure means.
Preparatory change completed: added repair_breadth_min and repair_d_trend_min (default -1.0) in state machine/config; added policy tests; verified zero drift vs R1.
H1b.1 results:
- L1 stop triggered (drawdown_ratio_vs_baseline=0.6539, repair=0.1154, mean_target_exposure breach); rolled back current block.
- L2 stop triggered (drawdown_ratio_vs_baseline=0.6607, annual_return_delta<0, repair=0.1135, mean_target_exposure breach); rolled back current block.
- L3 stop triggered (drawdown_ratio_vs_baseline=0.6579, annual_return_delta<0, repair=0.1096, mean_target_exposure breach); rolled back current block.
Since H1b.1 ladder failed, moved to H1b.2-direct-from-R1 as instructed.
H1b.2-direct-from-R1 run kept stop conditions clear but did not meet acceptance thresholds: annual_return_delta and upside_capture both below R1, trend+euphoric below 0.155.
Outputs saved to:
- outputs/fullcode_seq_20260410_D0_diag_only/
- outputs/fullcode_seq_20260410_prep_repair_threshold_defaults/
- outputs/fullcode_seq_20260410_H1b1_L1/
- outputs/fullcode_seq_20260410_H1b1_L2/
- outputs/fullcode_seq_20260410_H1b1_L3/
- outputs/fullcode_seq_20260410_H1b2_direct_from_R1/
Consolidated run summary artifacts:
- deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.json
- deliverables/chinext50_post_b3_feedback_execution_summary_2026-04-10.md
2026-04-10: User requested no more back-and-forth confirmations and asked to continue autonomously.
Pivoted to small-team scope and completed OpenSpec change build-regime-lite-mvp.
Implemented new lightweight pipeline pipelines/regime_lite_run.py (3 states: risk_off/chop/trend; base exposures 0.0/0.35/0.80; max daily step 0.20).
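
A sketch of how the three base exposures and the 0.20 daily step cap might interact. It assumes "max daily step" means the largest allowed change in exposure from one day to the next; the function name and the rounding are illustrative and not taken from pipelines/regime_lite_run.py.

```python
BASE_EXPOSURE = {"risk_off": 0.0, "chop": 0.35, "trend": 0.80}
MAX_DAILY_STEP = 0.20


def step_limited_exposure(states, start=0.0):
    """Move toward each day's base exposure by at most MAX_DAILY_STEP."""
    exposures = []
    current = start
    for state in states:
        target = BASE_EXPOSURE[state]
        # Clamp the desired change to the allowed daily step
        delta = max(-MAX_DAILY_STEP, min(MAX_DAILY_STEP, target - current))
        # Round to suppress floating-point drift across many days
        current = round(current + delta, 10)
        exposures.append(current)
    return exposures
```

Under this reading, a switch from risk_off to trend takes four trading days to reach the full 0.80 exposure, while a switch from chop to risk_off takes two.
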
Added tests tests/test_regime_lite_pipeline.py (2 passed).
Real PIT smoke output: outputs/regime_lite_20260410/ with artifacts:
- regime_lite_daily_ledger.csv
- regime_lite_summary.json
- regime_lite_report.md
Smoke key metrics:
- annual_return=0.0481
- max_drawdown=0.3124
- sharpe=0.4789
- annual_turnover=8.4859
OpenSpec archive done: openspec/changes/archive/2026-04-10-build-regime-lite-mvp/.
Specs validation after sync: 31 passed.
