# Smaug

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca. The idea is copied from [insidercopytrading.com](https://insidercopytrading.com/). Available at [insidercopytradingcopy.com](#no-hosted-version).

## Architecture

```
EDGAR (Form 4 feed)
        |
        v
ingestion/edgar_poller.py      -- polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py   -- bulk historical ingest via quarterly form.idx archives
        |
        v
ingestion/form4_parser.py      -- parses XML, detects 10b5-1 plans, extracts tx_code
        |
        v
db/models.py + db/db.py        -- SQLAlchemy ORM: filings, signals, price_cache tables
        |
        v
signals/filter_engine.py       -- buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py       min $50k, role-weighted scoring, as-of-date aware
        |
        +---> alerts/slack_alert.py     -- POST to Slack webhook when score >= threshold
        +---> broker/alpaca_client.py   -- paper/live order (NOT FULLY IMPLEMENTED -- see Results)

backtest/backtest.py           -- per-signal return / alpha vs SPY
backtest/simulate.py           -- portfolio simulation with configurable transaction costs
backtest/plot.py               -- HP sweep heatmap + equity curve plots
```

## Usage

```bash
pip install -r requirements.txt
cp .env.example .env   # fill in credentials

# Live polling (every 10 min)
python main.py run

# Bulk-ingest historical filings (2 years took ~3 days at SEC's 10 req/s rate limit)
python main.py backfill --years 2023 2024
python main.py backfill --year 2024 --quarter 1

# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest

# Portfolio simulation with transaction cost modelling
python main.py simulate [options]

# Generate HP heatmap + equity curve plots (saves to plots/)
python main.py plot
```

### Simulate options

```
Strategy:
  --holding-days N                   Days to hold each position (default: 7)
  --buy-delay N                      Days after signal to enter (default: 1)
  --position-size F                  Fraction of available cash per trade (default: 0.10)
  --min-score F                      Minimum signal score (default: 0.0)
  --min-cluster N                    Minimum cluster size (default: 1)
  --cap-tier large|mid|small|micro   Filter by market cap tier (default: all)
  --capital F                        Initial capital (default: 100000)

Transaction costs (Alpaca has zero commission, set --commission 0):
  --spread F                         One-way bid-ask half-spread at entry and exit (default: 0.003)
  --slippage F                       Entry slippage / market impact (default: 0.002)
  --commission F                     Per-trade commission as fraction of notional (default: 0.001)
```

Round-trip cost = spread × 2 + slippage + commission × 2, i.e. 1.0% with the defaults (0.8% with `--commission 0`).

Cap tiers: large >$10B, mid $2-10B, small $300M-2B, micro <$300M. Market caps are fetched from yfinance on first use and cached in the DB.

## Setup

Environment variables (`.env`):

| Variable | Default | Description |
|---|---|---|
| `SLACK_WEBHOOK_URL` | | Incoming webhook URL for alerts |
| `ALPACA_KEY` | | Alpaca API key |
| `ALPACA_SECRET` | | Alpaca API secret |
| `ALPACA_BASE_URL` | `https://paper-api.alpaca.markets` | Paper or live endpoint |
| `DB_PATH` | `insider.db` | SQLite database path |

## Key config (`config.py`)

| Parameter | Default | Description |
|---|---|---|
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Unique insiders required before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger an alert |

## Scoring

```
score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))
```

Role weights: CEO 3.0, CFO/President 2.5, COO 2.0, Director 1.5, VP 1.2, 10% owner 1.0.

## No Hosted Version

Yeah, no. There is actually no hosted version of Smaug. Bazinga! Read on to find out why. If you still want to run it yourself, see [Usage](#usage).

## Results

16,279 signals from 302k Form 4 filings (2020-2025).
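All per-signal figures below are pre-cost. To get a rough feel for how the round-trip formula from [Simulate options](#simulate-options) eats into them, here is a minimal Python sketch. `round_trip_cost` and `net_of_costs` are hypothetical helpers, not functions from `backtest/simulate.py`. Subtracting a flat cost from an average return is a simplification of what the portfolio simulation does, and it ignores SPY's own trading costs.

```python
def round_trip_cost(spread: float = 0.003, slippage: float = 0.002,
                    commission: float = 0.001) -> float:
    """Round-trip = spread x 2 + slippage + commission x 2 (fractions of notional).

    Defaults mirror the `simulate` CLI defaults listed in Simulate options.
    """
    # Half-spread is paid on entry and on exit, slippage only on entry,
    # commission on both legs.
    return spread * 2 + slippage + commission * 2


def net_of_costs(gross_return: float, cost: float) -> float:
    """Rough per-trade net return: gross return minus one round-trip cost."""
    return gross_return - cost


if __name__ == "__main__":
    default_cost = round_trip_cost()                # CLI defaults
    alpaca_cost = round_trip_cost(commission=0.0)   # Alpaca: zero commission
    print(f"default RT cost: {default_cost:.2%}")   # 1.00%
    print(f"Alpaca RT cost:  {alpaca_cost:.2%}")    # 0.80%

    # 7d hold, pre-cost averages taken from the per-signal table
    avg_return_7d, alpha_7d = 0.0119, 0.0068
    print(f"7d avg return net: {net_of_costs(avg_return_7d, alpaca_cost):+.2%}")  # +0.39%
    print(f"7d alpha net:      {net_of_costs(alpha_7d, alpaca_cost):+.2%}")       # -0.12%
```

Even with Alpaca's zero commission, the default spread and slippage alone come to 0.8% per round trip, which is already more than the +0.68% pre-cost 7d alpha. At the ~1.5% (small-cap) and ~5% (micro-cap) costs discussed below, the gap is far larger.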
### Per-signal stats (pre-cost)

| Hold | Avg return | Alpha vs SPY | Sharpe | Win rate |
|------|-----------|--------------|--------|----------|
| 3d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30d | +1.89% | +0.41% | ~0.70 | ~54% |

The signal exists. It just does not survive transaction costs.

### Portfolio simulation (7d hold, 1d delay, 10% of cash per signal)

![HP Sweep](plots/hp_sweep.png)

![Equity Curves](plots/equity_curves.png)

![Position Size Sensitivity](plots/position_size.png)

Alpaca charges $0 commission on US equities, so real costs are spread and slippage only. Cost estimates are based on SEC small-cap liquidity research and Alpaca documentation.

Results by market-cap tier, simulated on 2020-2025 data:

| Cap tier | Signals | Round-trip cost | Annual return | vs SPY |
|----------|---------|-----------------|---------------|--------|
| Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% |
| Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% |
| Small ($300M-2B) | 3,871 | ~1.5% | see plot | see plot |
| Micro (<$300M) | 5,048 | ~5% (if listed) | see plot | see plot |

**Note on micro-caps:** Alpaca does not allow opening new positions in OTC/Pink Sheet stocks (they are close-only). Most micro-cap signals involve OTC-listed names that are simply not tradeable. For exchange-listed micro-caps, realistic round-trip costs are ~5% or more based on SEC spread data, and the simulated alpha disappears entirely at that cost level.

### About insidercopytrading.com

Their website advertises backtested returns that significantly outperform the market. Those numbers cannot be replicated in practice because their backtesting methodology ignores the frictions that matter most:

- **Same-day entry** at the closing price of the filing date, a price you cannot buy at as a retail trader.
- **No spread or slippage.** SEC data shows round-trip costs of ~1.5% for small caps and ~5% for micro-caps, the same figures used in the table above.
- **Survivorship bias.** Signals for stocks that later delisted or became untradeable are excluded from their results, but they would have been part of your portfolio.

Under realistic assumptions, the strategy underperforms SPY across all tested parameters. [insidercopytrading.com](https://insidercopytrading.com) advertises performance numbers that their own subscribers cannot reproduce. Their website is rather pretty, though.

Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested.

## Modules

| Path | Purpose |
|---|---|
| `config.py` | Thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling |
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via form.idx |
| `ingestion/form4_parser.py` | Form 4 XML parser; 10b5-1 detection |
| `db/models.py` | SQLAlchemy ORM models (Filing, Signal, PriceCache, TickerMeta) |
| `db/db.py` | DB access layer |
| `signals/filter_engine.py` | Filing-to-signal pipeline |
| `signals/cluster_detector.py` | Cluster detection |
| `alerts/slack_alert.py` | Slack webhook |
| `broker/alpaca_client.py` | Alpaca order execution |
| `backtest/backtest.py` | Per-signal backtest |
| `backtest/simulate.py` | Portfolio simulator with cap-tier filtering |
| `backtest/plot.py` | Plot generator |
| `main.py` | CLI: `run`, `backfill`, `backtest`, `simulate`, `plot` |

## Requirements

Python 3.11+. See `requirements.txt`.
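For reference, here is the formula from [Scoring](#scoring) written out as a minimal Python sketch. It assumes a natural log and treats `total_value` as the dollar value of the qualifying purchases behind the signal. `ROLE_WEIGHTS` and `signal_score` are illustrative names, not taken from `signals/filter_engine.py` or `signals/cluster_detector.py`. The README does not specify how a single role weight is chosen for a multi-insider cluster, so the function simply takes one role.

```python
import math

# Role weights as listed in the Scoring section. The keys are illustrative
# labels (assumption), not necessarily the strings the Form 4 parser emits.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def signal_score(role: str, total_value: float, cluster_size: int) -> float:
    """score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1)).

    Assumes math.log (natural log). Each extra unique insider in the
    cluster adds 0.5 to the multiplier.
    """
    if total_value <= 0 or cluster_size < 1:
        raise ValueError("total_value must be > 0 and cluster_size >= 1")
    role_weight = ROLE_WEIGHTS[role]
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    return role_weight * math.log(total_value) * cluster_multiplier


if __name__ == "__main__":
    print(round(signal_score("10% owner", 50_000, 1), 1))  # 10.8
    print(round(signal_score("CEO", 250_000, 3), 1))
```

Under the natural-log assumption, a lone $50k buy by a 10% owner already scores about 10.8, above the default `SCORE_ALERT_THRESHOLD` of 5.0. With a base-10 log, the same buy would score about 4.7.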