# Smaug

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.

Copying the idea from [insidercopytrading.com](https://insidercopytrading.com/). Available at [insidercopytradingcopy.com](https://www.youtube.com/watch?v=dQw4w9WgXcQ)

## Architecture

```
EDGAR (Form 4 feed)
      │
      ▼
ingestion/edgar_poller.py       ← polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py    ← bulk historical ingest via quarterly form.idx archives
      │
      ▼
ingestion/form4_parser.py       ← parses XML, detects 10b5-1 plans, extracts tx_code
      │
      ▼
db/models.py + db/db.py         ← SQLAlchemy ORM: filings, signals, price_cache tables
      │
      ▼
signals/filter_engine.py        ← buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py       min $50k, role-weighted scoring, as-of-date aware
      │
      ├──► alerts/slack_alert.py     ← POST to Slack webhook when score ≥ threshold
      └──► broker/alpaca_client.py   ← paper/live order: 2% position size, 10% per-ticker cap
                                       positions auto-closed after holding period expires

backtest/backtest.py            ← per-signal return / alpha vs SPY analysis
backtest/simulate.py            ← realistic portfolio simulation with transaction costs
```

## Setup

```bash
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
```

### Environment variables (`.env`)

| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | — | Alpaca API key |
| `ALPACA_SECRET` | optional | — | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |

## Usage

```bash
# Initialize DB and start continuous polling (every 10 minutes)
python main.py run

# Bulk-ingest historical Form 4 filings from SEC EDGAR quarterly archives
python main.py backfill --years 2023 2024            # full year range
python main.py backfill --year 2024 --quarter 1      # single quarter

# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest

# Portfolio simulation with configurable strategy and cost params
python main.py simulate [options]
```

### Simulate options

```
Strategy:
  --holding-days N     Calendar days to hold each position (default: 7)
  --buy-delay N        Days after signal trigger to enter (default: 1)
  --position-size F    Fraction of available cash per trade (default: 0.10)
  --min-score F        Minimum signal score filter (default: 0.0)
  --min-cluster N      Minimum cluster size filter (default: 1)
  --capital F          Initial capital in USD (default: 100000)

Transaction costs:
  --spread F           One-way bid-ask half-spread paid at entry and exit (default: 0.003)
  --slippage F         Entry slippage / market impact (default: 0.002)
  --commission F       Per-trade commission as fraction of notional (default: 0.001)

  Round-trip cost = spread×2 + slippage + commission×2
```

## Key configuration (`config.py`)

| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + auto-close trigger) |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |

## Scoring

```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
```

Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0

## Backtesting

The backtest loads signals from the DB and fetches OHLC data via `yfinance`.
Prices are cached in the `price_cache` table, so completed date ranges are served entirely from the DB on repeat runs.
Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day on or before the exit date.

## Results (2023–2024 backtest, 302k filings ingested)

> **⚠ Read the caveats below before drawing conclusions.**

### Per-signal statistics (pre-cost)

Across 16,279 signals generated from 302k Form 4 filings (2023–2024):

| Hold | Avg return | Avg alpha vs SPY | Sharpe | Win rate |
|------|-----------|-----------------|--------|----------|
| 3 d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7 d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14 d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30 d | +1.89% | +0.41% | ~0.70 | ~54% |
| 90 d | +5.8% | +1.0% | ~0.55 | ~57% |

Alpha is strongest and most consistent at 3–14 day holds. Beyond 30 days, market beta dominates.
Signal quality is broadly robust across `min_score` and `min_cluster` filter values.

### Portfolio simulation (1-day lag, 7-day hold, 10% of cash per signal)

Pre-cost simulation on the same period:

| Metric | Value |
|--------|-------|
| Initial capital | $100,000 |
| Final value | $782,097 |
| Total return | +682% |
| Annualized return | +177% |
| SPY annualized | +25.9% |
| Max drawdown | 12.8% |
| Sharpe | 4.67 |
| Trades executed | 13,766 |

After realistic transaction costs (~1% round-trip), expected annualized return drops to roughly **20–60%** depending on assumed spread and slippage.
Run the simulator to check your specific assumptions:

```bash
# Conservative (liquid mid-caps, ~1% round-trip)
python main.py simulate --spread 0.003 --slippage 0.002 --commission 0.001

# Realistic small-cap (~2.1% round-trip: 0.007×2 + 0.005 + 0.001×2)
python main.py simulate --spread 0.007 --slippage 0.005 --commission 0.001
```

### Reality check: with costs this strategy underperforms SPY

Actual simulation results on the full dataset (2020–2025, 16,556 signals) with a realistic 1.5% round-trip cost (the last row uses 0.67%):

| Config | Ann. return | SPY | Excess | Sharpe |
|--------|-------------|-----|--------|--------|
| 7d hold, 0d delay, 1.5% cost | +5.8% | +16.1% | -10.2% | 0.45 |
| 7d hold, 1d delay, 1.5% cost | -2.5% | +16.2% | -18.7% | -1.55 |
| 3d hold, 1d delay, 1.5% cost | -21.1% | +16.2% | -37.3% | -6.45 |
| 3d hold, 1d delay, 0.67% cost | +8.9% | +16.2% | -7.3% | 0.17 |

**The strategy underperforms SPY under any realistic execution assumption.** Even with a 0-day delay (impossible in practice, since the filing isn't visible at market open the same day) you still trail the index. The signal exists (insider buys outperform by ~0.68% per 7-day trade pre-cost), but the margin is too thin to survive the transaction costs you actually pay on small/mid-cap stocks.

### Why sites like insidercopytrading.com show outperformance

Services that claim strong returns from following insider filings typically:

- Use close-on-filing-date entry (impossible: filings arrive after hours or mid-day, you execute next open at best)
- Omit bid-ask spread and slippage from their simulations
- Cherry-pick a bull market period or high-score signal subset
- Show gross returns without benchmarking against SPY

None of that is necessarily fraudulent; it's just not what you'd actually earn. Our simulation replicates the real execution constraints and shows the gap.

### Caveats

1. **Transaction costs are everything.** Average alpha per 7-day trade is ~0.68%. A round-trip on small/mid caps costs 0.6–1.5% (spread + slippage + commission).
   At the high end of that range the strategy loses money after costs. The 177% pre-cost figure is not achievable in practice.

2. **2023–2024 was an exceptional bull market.** SPY returned +25.9% annualized. The long-only bias in insider buys captured broad market momentum. Expected performance in flat or down markets is lower and untested.

3. **Survivorship bias.** Tickers that were delisted, halted, or acquired may be underrepresented in the price cache. This slightly flatters results by dropping the worst outcomes.

4. **No extra slippage is modeled for popular signals.** When multiple insiders at the same company buy on the same day, the stock may have already moved before you execute. The 1-day delay helps but doesn't fully resolve this.

5. **Concentrated portfolio.** At 10% of cash per signal with 7-day holds, you run ~7–10 simultaneous positions on average. Individual position variance is high.

6. **Long-only.** Excess return over SPY is not directly capturable without shorting SPY, which has its own carry cost.

## Position lifecycle

Positions are tracked in the `signals` table. When a trade is executed, `executed_at` is recorded. On each poll cycle the poller checks for open positions whose `executed_at` is more than `HOLDING_PERIOD_DAYS` days old and calls Alpaca to close them, marking `closed=1` in the DB.
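
The expiry check described in the position lifecycle can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `Position` dataclass, `expired_positions`, `close_expired`, and the `close_fn` callback are hypothetical names standing in for the `signals` rows and the Alpaca close call, and it assumes a position counts as expired once it is at least `HOLDING_PERIOD_DAYS` days old.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

HOLDING_PERIOD_DAYS = 90  # default listed in the config table


@dataclass
class Position:
    """Hypothetical stand-in for one row of the `signals` table."""
    ticker: str
    executed_at: Optional[datetime]  # None means no trade was placed
    closed: int = 0


def expired_positions(positions: List[Position], now: datetime,
                      holding_days: int = HOLDING_PERIOD_DAYS) -> List[Position]:
    """Return open, executed positions at least `holding_days` old as of `now`."""
    cutoff = now - timedelta(days=holding_days)
    return [
        p for p in positions
        if p.executed_at is not None and not p.closed and p.executed_at <= cutoff
    ]


def close_expired(positions: List[Position], now: datetime,
                  close_fn: Callable[[str], None],
                  holding_days: int = HOLDING_PERIOD_DAYS) -> List[str]:
    """Call `close_fn` for each expired position, then mark it closed."""
    closed = []
    for p in expired_positions(positions, now, holding_days):
        close_fn(p.ticker)  # assumption: the real project calls Alpaca here
        p.closed = 1
        closed.append(p.ticker)
    return closed
```

Passing `now` and `close_fn` in explicitly keeps the check testable without a broker connection; for example, `close_expired(rows, datetime.now(), print)` would only print the tickers that are due to be closed.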
## Modules

| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via quarterly form.idx archives |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection; tx_code extraction |
| `db/models.py` | SQLAlchemy ORM models (`Filing`, `Signal`, `PriceCache`) |
| `db/db.py` | DB access layer: dedup-safe inserts, chunked IN queries, price cache |
| `signals/filter_engine.py` | Filing → signal pipeline (open-market-only, as-of-date aware) |
| `signals/cluster_detector.py` | Cluster detection from DB (as-of-date aware) |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
| `backtest/backtest.py` | Per-signal historical backtest runner |
| `backtest/simulate.py` | Portfolio simulator with configurable costs |
| `main.py` | CLI entry point (`run` / `backfill` / `backtest` / `simulate`) |

## Requirements

- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `cssselect`, `yfinance`, `python-dotenv`, `alpaca-trade-api`, `sqlalchemy`