<h1 align="center">
<img src='./icon.png' width="250px">
<br>
<b>Smaug</b>
</h1>

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.

Copying the idea from [insidercopytrading.com](https://insidercopytrading.com/). Available at [insidercopytradingcopy.com](https://www.youtube.com/watch?v=dQw4w9WgXcQ)

## Architecture

```
EDGAR (Form 4 feed)
        │
        ▼
ingestion/edgar_poller.py      ← polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py   ← bulk historical ingest via quarterly form.idx archives
        │
        ▼
ingestion/form4_parser.py      ← parses XML, detects 10b5-1 plans, extracts tx_code
        │
        ▼
db/models.py + db/db.py        ← SQLAlchemy ORM: filings, signals, price_cache tables
        │
        ▼
signals/filter_engine.py       ← buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py      min $50k, role-weighted scoring, as-of-date aware
        │
        ├──► alerts/slack_alert.py    ← POST to Slack webhook when score ≥ threshold
        └──► broker/alpaca_client.py  ← paper/live order: 2% position size, 10% per-ticker cap
                                        positions auto-closed after holding period expires

backtest/backtest.py           ← per-signal return / alpha vs SPY analysis
backtest/simulate.py           ← realistic portfolio simulation with transaction costs
```

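The filter rules named in the diagram (buy-only, open-market code `P`, no 10b5-1 plans, minimum $50k) can be sketched as a single predicate. This is an illustration only, not the code in `signals/filter_engine.py`; the `Transaction` dataclass and all of its field names are hypothetical.

```python
from dataclasses import dataclass

MIN_TRANSACTION_VALUE = 50_000  # USD, the $50k minimum from the diagram


@dataclass
class Transaction:
    """Hypothetical parsed Form 4 transaction (field names are assumptions)."""
    tx_code: str      # transaction code; "P" marks an open-market purchase
    acquired: bool    # True for a buy, False for a disposal
    is_10b5_1: bool   # True if the trade was made under a 10b5-1 plan
    shares: float
    price: float


def passes_filter(tx: Transaction) -> bool:
    """Keep only discretionary open-market buys worth at least $50k."""
    if not tx.acquired or tx.tx_code != "P":
        return False
    if tx.is_10b5_1:
        return False
    return tx.shares * tx.price >= MIN_TRANSACTION_VALUE
```
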
## Setup

```bash
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
```

### Environment variables (`.env`)

| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | (none) | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | (none) | Alpaca API key |
| `ALPACA_SECRET` | optional | (none) | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |

## Usage

```bash
# Initialize DB and start continuous polling (every 10 minutes)
python main.py run

# Bulk-ingest historical Form 4 filings from SEC EDGAR quarterly archives
python main.py backfill --years 2023 2024          # full year range
python main.py backfill --year 2024 --quarter 1    # single quarter

# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest

# Portfolio simulation with configurable strategy and cost params
python main.py simulate [options]
```

### Simulate options

```
Strategy:
  --holding-days N     Calendar days to hold each position (default: 7)
  --buy-delay N        Days after signal trigger to enter (default: 1)
  --position-size F    Fraction of available cash per trade (default: 0.10)
  --min-score F        Minimum signal score filter (default: 0.0)
  --min-cluster N      Minimum cluster size filter (default: 1)
  --capital F          Initial capital in USD (default: 100000)

Transaction costs:
  --spread F           One-way bid-ask half-spread paid at entry and exit (default: 0.003)
  --slippage F         Entry slippage / market impact (default: 0.002)
  --commission F       Per-trade commission as fraction of notional (default: 0.001)

Round-trip cost = spread×2 + slippage + commission×2
```

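The round-trip formula is easy to check by hand before running a simulation. The helper below is a hypothetical convenience, not part of `backtest/simulate.py`; it only restates the formula above with the documented defaults.

```python
def round_trip_cost(spread: float = 0.003,
                    slippage: float = 0.002,
                    commission: float = 0.001) -> float:
    """Round-trip cost as a fraction of notional.

    Spread and commission are paid at both entry and exit;
    slippage is charged once, at entry, as in the formula above.
    """
    return 2 * spread + slippage + 2 * commission


# Defaults: 0.003*2 + 0.002 + 0.001*2 = 0.010, i.e. about 1% per round trip.
print(f"{round_trip_cost():.3%}")
```
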
## Key configuration (`config.py`)

| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + auto-close trigger) |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |

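To make `CLUSTER_WINDOW_DAYS` and `MIN_CLUSTER_SIZE` concrete, here is a minimal sketch of as-of-date cluster counting. It is not the code in `signals/cluster_detector.py`: the `(ticker, insider, filed)` tuple layout is hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import date, timedelta

CLUSTER_WINDOW_DAYS = 30
MIN_CLUSTER_SIZE = 1


def cluster_size(buys, ticker: str, as_of: date,
                 window_days: int = CLUSTER_WINDOW_DAYS) -> int:
    """Count unique insiders who bought `ticker` in the window ending at `as_of`.

    `buys` is an iterable of hypothetical (ticker, insider, filed) tuples.
    Filings dated after `as_of` are ignored, so a backtest never sees the future.
    """
    start = as_of - timedelta(days=window_days)
    insiders = {
        insider
        for (t, insider, filed) in buys
        if t == ticker and start <= filed <= as_of
    }
    return len(insiders)


def signal_fires(buys, ticker: str, as_of: date) -> bool:
    """A signal fires once enough distinct insiders have bought in the window."""
    return cluster_size(buys, ticker, as_of) >= MIN_CLUSTER_SIZE
```
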
## Scoring

```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
```

Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0

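The formula translates directly into a few lines of Python. This is a sketch, not the project's implementation: the README does not state the logarithm base, so the natural log used here is an assumption (a base-10 log would scale every score by the same constant), and the dictionary keys are illustrative spellings of the roles listed above.

```python
import math

# Role weights from the list above; "CFO/President" is split into two keys.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def score(role: str, total_value: float, cluster_size: int) -> float:
    """score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1)).

    ASSUMPTION: natural log; the source formula only says `log`.
    """
    cluster_boost = 1 + 0.5 * (cluster_size - 1)
    return ROLE_WEIGHTS[role] * math.log(total_value) * cluster_boost
```

Each additional insider in the cluster adds 0.5 to the multiplier, so a three-insider cluster scores twice as high as a lone buyer with the same role and value.
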
## Backtesting

The backtest loads signals from the DB and fetches OHLC data via `yfinance`. Prices are cached in the `price_cache` table, so completed date ranges are served entirely from the DB on repeat runs. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day on or before the exit date.

## Results (2023–2024 backtest, 302k filings ingested)

> **⚠ Read the caveats below before drawing conclusions.**

### Per-signal statistics (pre-cost)

Across 16,279 signals generated from 302k Form 4 filings (2023–2024):

| Hold | Avg return | Avg alpha vs SPY | Sharpe | Win rate |
|------|-----------|-----------------|--------|----------|
| 3 d  | +0.61% | +0.52% | ~0.80 | ~53% |
| 7 d  | +1.19% | +0.68% | ~1.05 | ~54% |
| 14 d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30 d | +1.89% | +0.41% | ~0.70 | ~54% |
| 90 d | +5.8%  | +1.0%  | ~0.55 | ~57% |

Alpha is strongest and most consistent at 3–14 day holds. Beyond 30 days, market beta dominates. Signal quality is broadly robust across `min_score` and `min_cluster` filter values.

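For reference, the columns in this table can be computed from per-trade returns roughly as follows. This is a sketch under stated assumptions, not the logic in `backtest/backtest.py`: alpha is taken as the trade return minus SPY's return over the same window, and a "win" is assumed to mean a positive raw return (the README does not say whether wins are measured against zero or against SPY).

```python
from statistics import mean


def summarize(trades: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize (stock_return, spy_return) pairs measured over the same hold.

    ASSUMPTIONS: alpha = stock_return - spy_return; win = stock_return > 0.
    """
    returns = [stock for stock, _ in trades]
    alphas = [stock - spy for stock, spy in trades]
    return {
        "avg_return": mean(returns),
        "avg_alpha": mean(alphas),
        "win_rate": sum(r > 0 for r in returns) / len(returns),
    }
```
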
### Portfolio simulation (1-day lag, 7-day hold, 10% of cash per signal)

Pre-cost simulation on the same period:

| Metric | Value |
|--------|-------|
| Initial capital | $100,000 |
| Final value | $782,097 |
| Total return | +682% |
| Annualized return | +177% |
| SPY annualized | +25.9% |
| Max drawdown | 12.8% |
| Sharpe | 4.67 |
| Trades executed | 13,766 |

After realistic transaction costs (~1% round-trip), expected annualized return drops to roughly **20–60%** depending on assumed spread and slippage. The actual simulation results in the reality check below are considerably worse. Run the simulator to check your specific assumptions:

```bash
# Conservative (liquid mid-caps, ~1% round-trip)
python main.py simulate --spread 0.003 --slippage 0.002 --commission 0.001

# Realistic small-cap (~2.1% round-trip: 0.007×2 + 0.005 + 0.001×2)
python main.py simulate --spread 0.007 --slippage 0.005 --commission 0.001
```

### Reality check: with costs this strategy underperforms SPY

Actual simulation results on the full dataset (2020–2025, 16,556 signals) with a realistic 1.5% round-trip cost:

| Config | Annualized return | SPY annualized | Excess vs SPY | Sharpe |
|--------|-------------------|----------------|---------------|--------|
| 7d hold, 0d delay, 1.5% cost | +5.8% | +16.1% | -10.2% | 0.45 |
| 7d hold, 1d delay, 1.5% cost | -2.5% | +16.2% | -18.7% | -1.55 |
| 3d hold, 1d delay, 1.5% cost | -21.1% | +16.2% | -37.3% | -6.45 |
| 3d hold, 1d delay, 0.67% cost | +8.9% | +16.2% | -7.3% | 0.17 |

**The strategy underperforms SPY under any realistic execution assumption.** Even with 0-day delay (impossible in practice, since the filing isn't visible at market open the same day) you still trail the index.

The signal exists (insiders outperform by ~0.68% per 7-day trade pre-cost), but the margin is too thin to survive the transaction costs you actually pay on small/mid-cap stocks.

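A quick per-trade check of that claim, using only figures quoted in this README (0.68% pre-cost alpha per 7-day trade; 1.5% and 0.67% round-trip costs). The `net_alpha` helper is hypothetical, and subtracting cost from alpha is a first-order approximation that ignores compounding within a trade.

```python
PRE_COST_ALPHA_7D = 0.0068  # ~0.68% average alpha per 7-day trade (pre-cost)


def net_alpha(pre_cost_alpha: float, round_trip_cost: float) -> float:
    """First-order per-trade alpha after costs (approximation, not a simulation)."""
    return pre_cost_alpha - round_trip_cost


for cost in (0.0067, 0.015):
    edge = net_alpha(PRE_COST_ALPHA_7D, cost)
    print(f"round-trip {cost:.2%} -> net alpha per trade {edge:+.2%}")
```

At a 1.5% round trip the average trade loses about 0.82% relative to SPY, and even the 0.67% scenario leaves essentially nothing per trade.
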
### Why sites like insidercopytrading.com show outperformance

Services that claim strong returns from following insider filings typically:

- Use close-on-filing-date entry (impossible: filings arrive after hours or mid-day, so you execute at the next open at best)
- Omit bid-ask spread and slippage from their simulations
- Cherry-pick a bull market period or high-score signal subset
- Show gross returns without benchmarking against SPY

None of that is necessarily fraudulent; it's just not what you'd actually earn. Our simulation replicates the real execution constraints and shows the gap.

### Caveats

1. **Transaction costs are everything.** Average alpha per 7-day trade is ~0.68%. A round-trip on small/mid caps costs 0.6–1.5% (spread + slippage + commission). At the high end this strategy is negative after costs. The 177% pre-cost figure is not achievable in practice.

2. **2023–2024 was an exceptional bull market.** SPY returned +25.9% annualized. The long-only bias in insider buys captured broad market momentum. Expected performance in flat or down markets is lower and untested.

3. **Survivorship bias.** Tickers that were delisted, halted, or acquired may be underrepresented in the price cache. This slightly flatters results by dropping the worst outcomes.

4. **No extra slippage is modeled on popular signals.** When multiple insiders at the same company buy on the same day, the stock may have already moved before you execute. The 1-day delay helps but doesn't fully resolve this.

5. **Concentrated portfolio.** At 10% of cash per signal with 7-day holds, you run ~7–10 simultaneous positions on average. Individual position variance is high.

6. **Long-only.** Excess return over SPY is not directly capturable without shorting SPY, which has its own carry cost.

## Position lifecycle

Positions are tracked in the `signals` table. When a trade is executed, `executed_at` is recorded. On each poll cycle the poller checks for positions where `executed_at` is older than `HOLDING_PERIOD_DAYS` and calls Alpaca to close them, marking `closed=1` in the DB.

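The expiry check can be sketched as a pure function over rows from the `signals` table. The dict-shaped rows and the `positions_to_close` name are hypothetical; only `executed_at`, `closed`, and `HOLDING_PERIOD_DAYS` come from the description above, and the Alpaca call itself is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

HOLDING_PERIOD_DAYS = 90


def positions_to_close(rows: list[dict], now: datetime,
                       holding_days: int = HOLDING_PERIOD_DAYS) -> list[dict]:
    """Return open positions whose holding period has expired as of `now`.

    Each row is a hypothetical dict with `executed_at` (datetime) and `closed` (bool).
    The caller would then close each one at the broker and set closed=1.
    """
    cutoff = now - timedelta(days=holding_days)
    return [
        row for row in rows
        if not row["closed"] and row["executed_at"] <= cutoff
    ]
```

Keeping the selection logic separate from the broker call makes it testable without network access or credentials.
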
## Modules

| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via quarterly form.idx archives |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection; tx_code extraction |
| `db/models.py` | SQLAlchemy ORM models (`Filing`, `Signal`, `PriceCache`) |
| `db/db.py` | DB access layer: dedup-safe inserts, chunked IN queries, price cache |
| `signals/filter_engine.py` | Filing → signal pipeline (open-market-only, as-of-date aware) |
| `signals/cluster_detector.py` | Cluster detection from DB (as-of-date aware) |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
| `backtest/backtest.py` | Per-signal historical backtest runner |
| `backtest/simulate.py` | Portfolio simulator with configurable costs |
| `main.py` | CLI entry point (`run` / `backfill` / `backtest` / `simulate`) |

## Requirements

- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `cssselect`, `yfinance`, `python-dotenv`, `alpaca-trade-api`, `sqlalchemy`