- cluster_detector: pass as_of_date through to DB query so historical signal
reprocessing doesn't look into the future
- filter_engine: accept as_of_date; skip non-open-market tx_codes (keep only P or empty);
reject placeholder tickers (NONE, N/A); propagate as_of_date to cluster detection
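
A minimal sketch of the filter rules above, assuming each filing is a dict with
tx_code, ticker, and transaction_date keys. The function name passes_filter and
the dict layout are hypothetical, not the actual filter_engine API.

```python
from datetime import date
from typing import Optional

OPEN_MARKET_TX_CODES = {"P", ""}       # per the commit: only P or empty pass
PLACEHOLDER_TICKERS = {"NONE", "N/A"}  # per the commit: rejected outright


def passes_filter(filing: dict, as_of_date: Optional[date] = None) -> bool:
    """Return True if a filing row should reach cluster detection (hypothetical helper)."""
    # A missing tx_code is treated like an empty one.
    tx_code = (filing.get("tx_code") or "").strip().upper()
    if tx_code not in OPEN_MARKET_TX_CODES:
        return False

    ticker = (filing.get("ticker") or "").strip().upper()
    if ticker in PLACEHOLDER_TICKERS:
        return False

    # Clamp to as_of_date so historical reprocessing never sees later transactions.
    if as_of_date is not None and filing["transaction_date"] > as_of_date:
        return False
    return True
```

Passing as_of_date=None keeps the live (non-historical) behaviour unchanged.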
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- sec_bulk_ingest.py: new module that downloads quarterly form.idx from SEC EDGAR,
filters Form 4/4A, fetches each filing's SGML/XML, then parses and stores it.
Adaptive token-bucket rate limiter (backs off on 429/5xx, ramps on success).
Uses filter_new_accessions for fast quarter-level dedup before any HTTP.
Marks derivative-only filings as seen so they're skipped on resume.
- form4_parser: extract tx_code (transactionCode) from each transaction row;
fix role extraction (Director/10%owner/Officer fallback); fix _text() to
handle <value> sub-elements; fix footnote text extraction
- edgar_poller: filter feed entries to Form 4/4A only; skip XSLT stylesheet URLs
when resolving XML filing links
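
For reference, one way an adaptive token-bucket limiter like the one in
sec_bulk_ingest.py could look. This is a sketch only: the class name, the
default rates, and the halve-on-error / add-0.1-on-success policy are
assumptions, not the values used in the module.

```python
import time


class AdaptiveTokenBucket:
    """Token bucket whose refill rate backs off on 429/5xx and ramps on success."""

    def __init__(self, rate=5.0, min_rate=0.5, max_rate=10.0, capacity=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens (requests) added per second
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        if self.tokens < 1.0:
            self.sleep((1.0 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1.0

    def record(self, status):
        """Adjust the rate from an HTTP status code."""
        if status == 429 or 500 <= status < 600:
            self.rate = max(self.min_rate, self.rate * 0.5)   # back off
        elif 200 <= status < 300:
            self.rate = min(self.max_rate, self.rate + 0.1)   # ramp up slowly
```

Intended use is acquire() before each HTTP request and record(status) after it.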
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- insert_filing: catch IntegrityError on duplicate accession instead of crashing
- filter_new_accessions: bulk pre-filter an entire quarter against the DB in chunked IN queries
(avoids the 30-minute row-by-row accession_exists loop during resume)
- mark_accession_seen: store placeholder row for derivative-only/empty filings so they
aren't re-fetched on every resume
- get_recent_buys_for_ticker: accept as_of_date to clamp queries for historical signal gen
- get_all_buys_for_reprocess: return all buy filings ordered by transaction_date for backfill
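
A sketch of the chunked IN pre-filter and the duplicate-tolerant insert. To
stay self-contained it uses stdlib sqlite3 rather than the project's
SQLAlchemy session, and the filings table, its columns, and the chunk size of
500 are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, List


def filter_new_accessions(conn: sqlite3.Connection, accessions: Iterable[str],
                          chunk_size: int = 500) -> List[str]:
    """Return the accessions not yet in the DB, using one query per chunk."""
    accessions = list(accessions)
    seen = set()
    for i in range(0, len(accessions), chunk_size):
        chunk = accessions[i:i + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT accession FROM filings WHERE accession IN ({placeholders})",
            chunk,
        )
        seen.update(row[0] for row in rows)
    # Preserve the caller's ordering.
    return [a for a in accessions if a not in seen]


def insert_filing(conn: sqlite3.Connection, accession: str, ticker: str) -> bool:
    """Insert a filing; return False instead of raising on a duplicate accession."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO filings (accession, ticker) VALUES (?, ?)",
                (accession, ticker),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Chunking keeps each statement under the database's bound-parameter limit while
still replacing thousands of single-row existence checks with a few queries.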
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace db/schema.sql + raw sqlite3 with SQLAlchemy ORM (db/models.py)
- Filing, Signal, PriceCache models with proper indexes
- db/db.py uses SQLAlchemy sessions throughout; no raw SQL strings
- Add PriceCache table: stores daily close prices per ticker
- backtest._fetch_prices checks DB first; skips yfinance for completed ranges
- New data persisted via upsert_prices()
- get_cached_prices() / upsert_prices() added to db.py
- EDGAR poller incremental fetch: get_latest_filed_date() returns newest
filed_date in DB; fetch_and_store_new_filings skips entries older than
that cutoff before even checking accession_exists
- Add get_signals_for_backtest() to db.py; backtest no longer opens its
own sqlite3 connection
- requirements.txt: add sqlalchemy>=2.0.0
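
The incremental fetch described above could be sketched as follows. The
function select_new_entries and the entry dict layout are hypothetical; the
real logic lives in fetch_and_store_new_filings, with get_latest_filed_date()
supplying the cutoff.

```python
from datetime import date
from typing import Callable, Iterable, List, Optional


def select_new_entries(entries: Iterable[dict],
                       latest_filed_date: Optional[date],
                       accession_exists: Callable[[str], bool]) -> List[dict]:
    """Keep feed entries that still need fetching (hypothetical helper)."""
    fresh = []
    for entry in entries:
        # Cheap in-memory date check first: anything older than the newest
        # filed_date already stored is skipped without touching the DB.
        if latest_filed_date is not None and entry["filed_date"] < latest_filed_date:
            continue
        # Entries on or after the cutoff still need the per-accession check,
        # since several filings can share the cutoff date.
        if accession_exists(entry["accession"]):
            continue
        fresh.append(entry)
    return fresh
```

With an empty database the cutoff is None and every entry falls through to the
accession check.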
Co-authored-by: dodox <dodox@users.noreply.local>