# Smaug — Insider Copytrade Monitor

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.

## Architecture

```
EDGAR (Form 4 feed)
        │
        ▼
ingestion/edgar_poller.py      ← polls every 10 min, dedupes by accession
        │
        ▼
ingestion/form4_parser.py      ← parses XML, detects 10b5-1 plans
        │
        ▼
db/schema.sql + db/db.py       ← SQLite (WAL mode): filings + signals tables
        │
        ▼
signals/filter_engine.py       ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
signals/cluster_detector.py    ← counts unique insiders per ticker in rolling 30-day window
        │
        ├──► alerts/slack_alert.py     ← POST to Slack webhook when score ≥ threshold
        └──► broker/alpaca_client.py   ← paper/live order: 2% position size, 10% per-ticker cap
                                         positions auto-closed after holding period expires
```

## Setup

```bash
cp .env.example .env   # edit .env with your credentials
pip install -r requirements.txt
```

### Environment variables (`.env`)

| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | — | Alpaca API key |
| `ALPACA_SECRET` | optional | — | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |

## Usage

```bash
# Initialize DB and ingest current EDGAR feed (one shot)
python main.py fetch-once

# Run continuous polling loop (every 10 minutes)
python main.py run

# Backtest signals already in the DB against historical prices
python main.py backtest
```

## Key configuration (`config.py`)

| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + auto-close trigger) |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |

## Scoring

```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
```

Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0

## Backtesting

The backtest loads signals from the SQLite DB and fetches OHLC data via `yfinance` on demand (no local price cache). Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day before or on the exit date.

Raw XML filings are cached in `DATA_DIR` (`data/filings/`) by accession number.

Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.

## Position lifecycle

Positions are tracked in the `signals` table. When a trade is executed, `executed_at` is recorded. On each poll cycle the poller checks for positions where `executed_at` is older than `HOLDING_PERIOD_DAYS` and calls Alpaca to close them, marking `closed=1` in the DB.
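
The auto-close check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `broker/alpaca_client.py`: the `id` and `ticker` columns, the ISO-8601 UTC text format for `executed_at`, and the `close_position` callback (a stand-in for the real Alpaca call) are all hypothetical. Only `executed_at`, `closed`, and `HOLDING_PERIOD_DAYS` come from this README.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional

HOLDING_PERIOD_DAYS = 90  # default listed in the config table


def close_expired_positions(
    conn: sqlite3.Connection,
    close_position: Callable[[str], None],  # hypothetical stand-in for the Alpaca call
    now: Optional[datetime] = None,
) -> List[str]:
    """Close open positions older than HOLDING_PERIOD_DAYS and return their tickers."""
    now = now or datetime.now(timezone.utc)
    # Assumes executed_at is stored as ISO-8601 UTC text, so that plain
    # string comparison matches chronological order.
    cutoff = (now - timedelta(days=HOLDING_PERIOD_DAYS)).isoformat()
    rows = conn.execute(
        "SELECT id, ticker FROM signals "
        "WHERE executed_at IS NOT NULL AND closed = 0 AND executed_at < ? "
        "ORDER BY id",
        (cutoff,),
    ).fetchall()

    closed_tickers = []
    for signal_id, ticker in rows:
        close_position(ticker)
        # Mark the row only after the broker call returned without raising.
        conn.execute("UPDATE signals SET closed = 1 WHERE id = ?", (signal_id,))
        closed_tickers.append(ticker)
    conn.commit()
    return closed_tickers
```

Passing the broker call in as a callback keeps the sketch testable without Alpaca credentials; in the real poller the equivalent logic presumably runs once per poll cycle.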

## Modules

| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
| `db/schema.sql` | SQLite schema (`filings`, `signals`) |
| `db/db.py` | DB access layer |
| `signals/filter_engine.py` | Filing → signal pipeline |
| `signals/cluster_detector.py` | Cluster detection from DB |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
| `backtest/backtest.py` | Historical backtest runner |
| `main.py` | CLI entry point |

## Requirements

- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`
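
## Scoring example

As a worked illustration of the formula in the Scoring section, the sketch below computes a score from a role, a total transaction value, and a cluster size. It is not the code in `signals/filter_engine.py`: the function name `score_signal` and the `ROLE_WEIGHTS` keys are hypothetical, and the natural logarithm is an assumption, since the formula does not state the log base.

```python
import math

# Role weights as listed in the Scoring section; the dict keys are illustrative.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def score_signal(role: str, total_value: float, cluster_size: int) -> float:
    """score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))."""
    if total_value <= 0:
        raise ValueError("total_value must be positive")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    # Assumption: natural log. Swap in math.log10 if the project uses base 10.
    return ROLE_WEIGHTS[role] * math.log(total_value) * cluster_multiplier


if __name__ == "__main__":
    # A lone CEO purchase versus three directors buying within the window.
    print(round(score_signal("CEO", 100_000, 1), 2))
    print(round(score_signal("Director", 50_000, 3), 2))
```

The choice of log base changes how scores compare with `SCORE_ALERT_THRESHOLD`, so it is worth confirming against `config.py` before relying on these numbers.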