- equity_curves.png: now shows large/mid/small cap tiers with Alpaca costs
vs theoretical no-cost baseline; SPY clamped to last strategy data point
- hp_sweep.png: updated to Alpaca zero-commission cost decomposition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Large- and mid-cap tiers underperform SPY significantly. Micro-cap surprisingly
beats the market by 12% despite the highest round-trip costs: per-trade alpha in
small stocks is large enough to survive friction, but missing price data and real
illiquidity are bigger concerns than the simulation can capture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- simulate.py: --cap-tier large|mid|small|micro; yfinance market cap fetch
with DB cache (ticker_meta table); argv fix for main.py dispatch
- plot.py: equity curves now show cap tiers with Alpaca costs (zero commission);
HP sweep uses Alpaca cost decomposition; SPY line clamped to last strategy date
- db/models.py: TickerMeta table
- db/db.py: get_cached_market_caps, upsert_market_caps
- README: add --cap-tier to simulate docs; backfill note (~3 days for 2 years
at SEC 10 req/s limit); remove duplicate setup block; remove em-dashes in prose;
tilde (~) estimates in the results table to be updated once cap-tier sims complete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- backtest/plot.py: generates two plots saved to plots/
  - hp_sweep.png: 7x7 heatmap of holding_days x round-trip cost, showing
    annualised excess vs SPY and raw annualised return per cell
  - equity_curves.png: portfolio equity vs SPY for 4 cost scenarios
- backtest/simulate.py: accept pre-loaded prices dict to avoid reloading
on every sweep iteration; return equity_curve in result
- main.py: add `plot` command
- README: updated results section with Alpaca-specific cost breakdown
(zero commission, costs are spread+slippage only); added honest analysis
of why insidercopytrading.com-style services show outperformance that
cannot be replicated in practice; note Alpaca integration not finished
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actual simulation results with a 1.5% round-trip cost show -2.5% annualized (vs SPY +16%).
The per-trade signal exists but the margin (~0.68% alpha) is too thin to survive
realistic small-cap execution costs and a 1-day entry delay.
Also explains why insider-copytrade sites report outperformance: they use same-day
entry and omit spread/slippage from their simulations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
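
The friction argument in the commit above comes down to one subtraction. This is a minimal sketch, assuming the ~0.68% alpha is a gross per-trade figure and that the 1.5% round-trip cost subtracts from it linearly; the function name is hypothetical and this is not the repository's simulation code.

```python
def net_edge_per_trade(gross_alpha: float, round_trip_cost: float) -> float:
    """Per-trade edge left after costs (assumes costs subtract linearly)."""
    return gross_alpha - round_trip_cost


# Figures quoted in the commit message, expressed as fractions.
net = net_edge_per_trade(gross_alpha=0.0068, round_trip_cost=0.015)
print(f"net edge per trade: {net:.2%}")  # prints "net edge per trade: -0.82%"
```

Under these assumptions the break-even round-trip cost equals the gross alpha, so any cost above roughly 0.68% leaves a negative expected edge per trade.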
- cluster_detector: pass as_of_date through to DB query so historical signal
reprocessing doesn't look into the future
- filter_engine: accept as_of_date; skip non-open-market tx_codes (only P/"");
reject placeholder tickers (NONE, N/A); propagate as_of_date to cluster detection
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
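
The look-ahead fix above amounts to bounding every lookup by the date being reprocessed. The sketch below is illustrative only: it works on plain dicts rather than the project's DB layer, and the function name, the field names (`ticker`, `tx_code`, `filed_date`) and the 30-day window are assumptions. Whether the real code clamps on the filing date or the transaction date is not stated in the commit.

```python
from datetime import date, timedelta

PLACEHOLDER_TICKERS = {"NONE", "N/A"}   # placeholders named in the commit
OPEN_MARKET_CODES = {"P", ""}           # only P or empty tx_code is kept


def eligible_buys(filings, ticker, as_of_date, window_days=30):
    """Return open-market buys for `ticker` visible on `as_of_date`.

    Anything filed after `as_of_date` is excluded, so reprocessing a
    historical date cannot see filings from its future.
    """
    if ticker.strip().upper() in PLACEHOLDER_TICKERS:
        return []
    start = as_of_date - timedelta(days=window_days)  # hypothetical window
    return [
        f for f in filings
        if f["ticker"] == ticker
        and f["tx_code"] in OPEN_MARKET_CODES
        and start <= f["filed_date"] <= as_of_date
    ]
```

Passing `as_of_date` explicitly, instead of defaulting to today inside the query, keeps live polling and historical reprocessing on the same code path.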
- sec_bulk_ingest.py: new module that downloads quarterly form.idx from SEC EDGAR,
filters Form 4/4A, fetches each filing's SGML/XML, then parses and stores it.
Adaptive token-bucket rate limiter (backs off on 429/5xx, ramps on success).
Uses filter_new_accessions for fast quarter-level dedup before any HTTP.
Marks derivative-only filings as seen so they're skipped on resume.
- form4_parser: extract tx_code (transactionCode) from each transaction row;
fix role extraction (Director/10%owner/Officer fallback); fix _text() to
handle <value> sub-elements; fix footnote text extraction
- edgar_poller: filter feed entries to Form 4/4A only; skip XSLT stylesheet URLs
when resolving XML filing links
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
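
For readers unfamiliar with the adaptive token-bucket limiter mentioned for sec_bulk_ingest.py above, here is one possible shape. It is a sketch under stated assumptions, not the module's actual code: the class name, the halve-on-error and +10%-on-success factors, and the default rates are all hypothetical. Only the 429/5xx back-off behaviour and the SEC 10 req/s ceiling come from these commit messages.

```python
import time


class AdaptiveTokenBucket:
    """Token bucket whose refill rate drops on errors and recovers on success."""

    def __init__(self, rate=5.0, min_rate=0.5, max_rate=10.0, capacity=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens (requests) added per second
        self.min_rate = min_rate
        self.max_rate = max_rate    # 10 req/s matches the SEC limit noted earlier
        self.capacity = capacity
        self.tokens = capacity
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        now = self._clock()
        elapsed = now - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self._last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1.0:
            self._sleep((1.0 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1.0

    def record(self, status_code):
        """Halve the rate on 429/5xx; otherwise ramp it up by 10%."""
        if status_code == 429 or 500 <= status_code < 600:
            self.rate = max(self.min_rate, self.rate / 2.0)
        else:
            self.rate = min(self.max_rate, self.rate * 1.1)
```

A caller would invoke `acquire()` before each HTTP request and `record(response_status)` after it, so the request rate settles just below whatever the server tolerates.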
- insert_filing: catch IntegrityError on duplicate accession instead of crashing
- filter_new_accessions: bulk pre-filter entire quarter against DB in chunked IN queries
(avoids 30min per-row accession_exists loop during resume)
- mark_accession_seen: store placeholder row for derivative-only/empty filings so they
aren't re-fetched on every resume
- get_recent_buys_for_ticker: accept as_of_date to clamp queries for historical signal gen
- get_all_buys_for_reprocess: return all buy filings ordered by transaction_date for backfill
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
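
The chunked IN pre-filter described for filter_new_accessions above can be sketched as follows. To stay self-contained this example uses the standard-library `sqlite3` module rather than the project's DB layer, and the table name `filings`, the column name `accession`, the function signature and the chunk size of 500 are assumptions for illustration.

```python
import sqlite3


def filter_new_accessions(conn, accessions, chunk_size=500):
    """Return the accessions not yet in the DB, preserving input order.

    One IN query per chunk replaces one existence query per accession.
    """
    seen = set()
    for start in range(0, len(accessions), chunk_size):
        chunk = accessions[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT accession FROM filings WHERE accession IN ({placeholders})",
            chunk,
        )
        seen.update(row[0] for row in rows)
    return [a for a in accessions if a not in seen]
```

Chunking keeps each statement under the database's bound-parameter limit while still cutting the number of round trips from one per filing to one per few hundred filings.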
- Replace db/schema.sql + raw sqlite3 with SQLAlchemy ORM (db/models.py)
  - Filing, Signal, PriceCache models with proper indexes
  - db/db.py uses SQLAlchemy sessions throughout; no raw SQL strings
- Add PriceCache table: stores daily close prices per ticker
  - backtest._fetch_prices checks DB first; skips yfinance for completed ranges
  - New data persisted via upsert_prices()
  - get_cached_prices() / upsert_prices() added to db.py
- EDGAR poller incremental fetch: get_latest_filed_date() returns newest
filed_date in DB; fetch_and_store_new_filings skips entries older than
that cutoff before even checking accession_exists
- Add get_signals_for_backtest() to db.py; backtest no longer opens its
own sqlite3 connection
- requirements.txt: add sqlalchemy>=2.0.0
Co-authored-by: dodox <dodox@users.noreply.local>
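
The cache-first price lookup introduced with PriceCache above follows a common pattern: read what is stored, fetch only what is missing, then persist the new rows. The sketch below illustrates that pattern with an in-memory dict and an injected fetch callable; the function name, its parameters and the dict layout are hypothetical and stand in for the real get_cached_prices() / upsert_prices() / yfinance calls, whose signatures are not shown in this log.

```python
def fetch_prices(ticker, days, cache, fetch_remote):
    """Return {day: close} for `days`, calling `fetch_remote` only for gaps.

    `cache` maps ticker -> {day: close} and stands in for the PriceCache table.
    `fetch_remote(ticker, missing_days)` stands in for the remote price source.
    """
    cached = cache.setdefault(ticker, {})
    missing = [d for d in days if d not in cached]
    if missing:
        fresh = fetch_remote(ticker, missing)
        cached.update(fresh)  # analogous to persisting via an upsert
    # Days the remote source could not supply are simply absent from the result.
    return {d: cached[d] for d in days if d in cached}
```

With this shape a fully cached range never touches the remote source, which is the behaviour the commit describes for completed ranges.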