smaug/README.md
claude b119b9abae feat: SQLAlchemy ORM models, filing cache incremental fetch, yfinance price cache
- Replace db/schema.sql + raw sqlite3 with SQLAlchemy ORM (db/models.py)
  - Filing, Signal, PriceCache models with proper indexes
  - db/db.py uses SQLAlchemy sessions throughout; no raw SQL strings
- Add PriceCache table: stores daily close prices per ticker
  - backtest._fetch_prices checks DB first; skips yfinance for completed ranges
  - New data persisted via upsert_prices()
  - get_cached_prices() / upsert_prices() added to db.py
- EDGAR poller incremental fetch: get_latest_filed_date() returns newest
  filed_date in DB; fetch_and_store_new_filings skips entries older than
  that cutoff before even checking accession_exists
- Add get_signals_for_backtest() to db.py; backtest no longer opens its
  own sqlite3 connection
- requirements.txt: add sqlalchemy>=2.0.0

Co-authored-by: dodox <dodox@users.noreply.local>
2026-05-04 17:21:23 +00:00

113 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Smaug — Insider Copytrade Monitor
Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
## Architecture
```
EDGAR (Form 4 feed)
ingestion/edgar_poller.py ← polls every 10 min, dedupes by accession
ingestion/form4_parser.py ← parses XML, detects 10b5-1 plans
db/models.py + db/db.py ← SQLAlchemy ORM: filings, signals, price_cache tables
signals/filter_engine.py ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
signals/cluster_detector.py ← counts unique insiders per ticker in rolling 30-day window
├──► alerts/slack_alert.py ← POST to Slack webhook when score ≥ threshold
└──► broker/alpaca_client.py ← paper/live order: 2% position size, 10% per-ticker cap
positions auto-closed after holding period expires
```
## Setup
```bash
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
```
### Environment variables (`.env`)
| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | — | Alpaca API key |
| `ALPACA_SECRET` | optional | — | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |
## Usage
```bash
# Initialize DB and ingest current EDGAR feed (one shot)
python main.py fetch-once
# Run continuous polling loop (every 10 minutes)
python main.py run
# Backtest signals already in the DB against historical prices
python main.py backtest
```
## Key configuration (`config.py`)
| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + auto-close trigger) |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |
## Scoring
```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size 1))
```
Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
## Backtesting
The backtest loads signals from the DB and fetches OHLC data via `yfinance`. Prices are cached in the `price_cache` table — completed date ranges are served entirely from the DB on repeat runs, avoiding redundant network calls. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day before or on the exit date. Raw XML filings are cached in `DATA_DIR` (`data/filings/`) by accession number.
The EDGAR poller also skips fetching XML for filings older than the newest `filed_date` already stored in the DB, so incremental runs only process truly new filings.
Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
## Position lifecycle
Positions are tracked in the `signals` table. When a trade is executed, `executed_at` is recorded. On each poll cycle the poller checks for positions where `executed_at` is older than `HOLDING_PERIOD_DAYS` and calls Alpaca to close them, marking `closed=1` in the DB.
## Modules
| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
| `db/models.py` | SQLAlchemy ORM models (`Filing`, `Signal`, `PriceCache`) |
| `db/db.py` | DB access layer (SQLAlchemy sessions) |
| `signals/filter_engine.py` | Filing → signal pipeline |
| `signals/cluster_detector.py` | Cluster detection from DB |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
| `backtest/backtest.py` | Historical backtest runner |
| `main.py` | CLI entry point |
## Requirements
- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`, `sqlalchemy`