- Costs updated to evidence-based values (SEC small-cap liquidity study 2013, Nasdaq spread data 2021, AQR Trading Costs paper 2018): large ~0.2% RT, mid ~0.5%, small ~1.5%, micro ~5%
- Micro-cap note: Alpaca does not allow new OTC/Pink Sheet positions; most micro-cap signals are untradeable; at realistic 5% RT, micro-cap destroys capital (-36% to -81% excess return)
- db.py: get_cached_market_caps returns already_fetched set including null rows, preventing repeated yfinance re-queries for known-missing tickers
- plot_hp_heatmap: colorbar in dedicated axes (right margin), no overlap
- plot_equity_curves: two-pass approach clips all curves to min end date
- README: updated cost table, shortened insidercopytrading.com section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

<h1 align="center">
<img src='./icon.png' width="250px">
<br>
<b>Smaug</b>
</h1>

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.

The idea is copied from [insidercopytrading.com](https://insidercopytrading.com/). Available at [insidercopytradingcopy.com](#no-hosted-version).

## Architecture

```
EDGAR (Form 4 feed)
        |
        v
ingestion/edgar_poller.py       -- polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py    -- bulk historical ingest via quarterly form.idx archives
        |
        v
ingestion/form4_parser.py       -- parses XML, detects 10b5-1 plans, extracts tx_code
        |
        v
db/models.py + db/db.py         -- SQLAlchemy ORM: filings, signals, price_cache tables
        |
        v
signals/filter_engine.py        -- buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py        min $50k, role-weighted scoring, as-of-date aware
        |
        +---> alerts/slack_alert.py    -- POST to Slack webhook when score >= threshold
        +---> broker/alpaca_client.py  -- paper/live order (NOT FULLY IMPLEMENTED -- see Results)

backtest/backtest.py            -- per-signal return / alpha vs SPY
backtest/simulate.py            -- portfolio simulation with configurable transaction costs
backtest/plot.py                -- HP sweep heatmap + equity curve plots
```

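
The filter stage in the diagram (buy-only, open-market `P` transactions, no 10b5-1 plans, at least $50k) can be pictured as a single predicate. This is a minimal sketch, not the code in `signals/filter_engine.py`; the `Transaction` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# From the config table: buys below this value are ignored.
MIN_TRANSACTION_VALUE = 50_000


@dataclass
class Transaction:
    """Hypothetical, simplified view of one parsed Form 4 transaction."""
    tx_code: str      # Form 4 transaction code; "P" marks a purchase
    acquired: bool    # True if shares were acquired, False if disposed of
    is_10b5_1: bool   # True if the trade was made under a 10b5-1 plan
    shares: float
    price: float

    @property
    def value(self) -> float:
        """Dollar value of the transaction."""
        return self.shares * self.price


def passes_filter(tx: Transaction) -> bool:
    """Keep only discretionary open-market buys of meaningful size."""
    return (
        tx.acquired
        and tx.tx_code == "P"
        and not tx.is_10b5_1
        and tx.value >= MIN_TRANSACTION_VALUE
    )


big_buy = Transaction("P", True, False, shares=10_000, price=12.5)
planned_buy = Transaction("P", True, True, shares=10_000, price=12.5)
small_buy = Transaction("P", True, False, shares=100, price=12.5)
```

Here only `big_buy` passes: `planned_buy` is excluded as a 10b5-1 trade and `small_buy` falls below the $50k minimum.
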
## Usage

```bash
pip install -r requirements.txt
cp .env.example .env   # fill in credentials

# Live polling (every 10 min)
python main.py run

# Bulk-ingest historical filings (2 years took ~3 days at SEC's 10 req/s rate limit)
python main.py backfill --years 2023 2024
python main.py backfill --year 2024 --quarter 1

# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest

# Portfolio simulation with transaction cost modelling
python main.py simulate [options]

# Generate HP heatmap + equity curve plots (saves to plots/)
python main.py plot
```

### Simulate options

```
Strategy:
  --holding-days N                  Days to hold each position (default: 7)
  --buy-delay N                     Days after signal to enter (default: 1)
  --position-size F                 Fraction of available cash per trade (default: 0.10)
  --min-score F                     Minimum signal score (default: 0.0)
  --min-cluster N                   Minimum cluster size (default: 1)
  --cap-tier large|mid|small|micro  Filter by market cap tier (default: all)
  --capital F                       Initial capital (default: 100000)

Transaction costs (Alpaca has zero commission, set --commission 0):
  --spread F                        One-way bid-ask half-spread at entry and exit (default: 0.003)
  --slippage F                      Entry slippage / market impact (default: 0.002)
  --commission F                    Per-trade commission as fraction of notional (default: 0.001)
```

Round-trip cost = 2 x spread + slippage + 2 x commission. With the defaults this is 2 x 0.003 + 0.002 + 2 x 0.001 = 0.010, i.e. 1.0% per round trip.

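
The round-trip formula can be written as a small helper. This is a sketch using a hypothetical function name, not code taken from `backtest/simulate.py`; the keyword defaults mirror the CLI defaults listed above.

```python
def round_trip_cost(spread: float = 0.003,
                    slippage: float = 0.002,
                    commission: float = 0.001) -> float:
    """Round-trip cost as a fraction of notional.

    Spread and commission are paid on both entry and exit;
    slippage is charged once, on entry.
    """
    return 2 * spread + slippage + 2 * commission


default_cost = round_trip_cost()               # CLI defaults
alpaca_cost = round_trip_cost(commission=0.0)  # equivalent to --commission 0

print(f"defaults:     {default_cost:.2%}")
print(f"commission 0: {alpaca_cost:.2%}")
```

Setting `--commission 0`, as suggested for Alpaca, lowers the default round-trip cost from 1.0% to 0.8%.
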
Cap tiers: large >$10B, mid $2-10B, small $300M-2B, micro <$300M.
Market caps are fetched from yfinance on first use and cached in the DB. Tickers for which no market cap is found are cached as null rows, so they are not re-queried.

## Setup

| Variable | Default | Description |
|---|---|---|
| `SLACK_WEBHOOK_URL` | | Incoming webhook URL for alerts |
| `ALPACA_KEY` | | Alpaca API key |
| `ALPACA_SECRET` | | Alpaca API secret |
| `ALPACA_BASE_URL` | `https://paper-api.alpaca.markets` | Paper or live endpoint |
| `DB_PATH` | `insider.db` | SQLite database path |

## Key config (`config.py`)

| Parameter | Default | Description |
|---|---|---|
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger alert |

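
To illustrate how `CLUSTER_WINDOW_DAYS` and `MIN_CLUSTER_SIZE` interact, here is a sketch of counting unique insiders in a rolling window. It is not the logic in `signals/cluster_detector.py`: the function name and the `(insider, date)` tuple layout are hypothetical, and treating both window ends as inclusive is an assumption. Buys dated after `as_of` are ignored, so the count uses only information available on that date.

```python
from datetime import date, timedelta

CLUSTER_WINDOW_DAYS = 30  # default from the config table


def cluster_size(buys: list[tuple[str, date]],
                 as_of: date,
                 window_days: int = CLUSTER_WINDOW_DAYS) -> int:
    """Count unique insiders with a buy in [as_of - window_days, as_of]."""
    start = as_of - timedelta(days=window_days)
    return len({insider for insider, day in buys if start <= day <= as_of})


buys = [
    ("ceo", date(2024, 3, 1)),       # on the window start: counted
    ("cfo", date(2024, 3, 20)),
    ("cfo", date(2024, 3, 25)),      # same insider again: counted once
    ("director", date(2024, 2, 15)), # before the window: ignored
    ("vp", date(2024, 4, 2)),        # after as_of: ignored (no look-ahead)
]
size = cluster_size(buys, as_of=date(2024, 3, 31))
print(size)
```

In this example two unique insiders fall inside the window, so a signal would fire for any `MIN_CLUSTER_SIZE` of 2 or lower.
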
## Scoring

```
score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))
```

Role weights: CEO 3.0, CFO/President 2.5, COO 2.0, Director 1.5, VP 1.2, 10% owner 1.0.

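
The formula translates directly into Python. Two things in this sketch are assumptions: `log` is taken to be the natural logarithm (the formula does not state the base), and a single role weight is passed in, since the text does not say how roles are combined when a cluster contains several insiders. The function and dictionary names are hypothetical.

```python
import math

# Role weights as listed above; CFO and President share the same weight.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def signal_score(role: str, total_value: float, cluster_size: int) -> float:
    """score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))."""
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    # Assumption: natural log. A base-10 log would rescale every score
    # by the same constant factor.
    return ROLE_WEIGHTS[role] * math.log(total_value) * cluster_multiplier


solo = signal_score("CEO", 250_000, cluster_size=1)
cluster = signal_score("CEO", 250_000, cluster_size=3)
print(f"solo={solo:.2f} cluster={cluster:.2f}")
```

Each additional insider adds 0.5 to the multiplier, so a cluster of three scores exactly twice as high as a single insider with the same role and total value.
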
## No Hosted Version

Yeah, no. There is no hosted version of Smaug. Bazinga! Read on to learn why. If you still want to run it yourself, see [Usage](#usage).

## Results

16,279 signals from 302k Form 4 filings (2020-2025).

### Per-signal stats (pre-cost)

| Hold | Avg return | Alpha vs SPY | Sharpe | Win rate |
|------|-----------|--------------|--------|----------|
| 3d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30d | +1.89% | +0.41% | ~0.70 | ~54% |

The signal exists. It just does not survive transaction costs.

### Portfolio simulation (7d hold, 1d delay, 10% of cash per signal)







Alpaca charges $0 commission on US equities, so the real costs are spread and slippage only.
Round-trip (RT) cost estimates are based on SEC small-cap liquidity research and Alpaca documentation.
Simulated on 2020-2025 data, 7d hold, 1d entry delay, 10% of cash per signal:

| Cap tier | Signals | RT cost | Ann. return | vs SPY |
|----------|---------|---------|-------------|--------|
| Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% |
| Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% |
| Small ($300M-2B) | 3,871 | ~1.5% | see plot | see plot |
| Micro (<$300M) | 5,048 | ~5% (if listed) | see plot | see plot |

**Note on micro-cap:** Alpaca does not allow opening new positions in OTC/Pink Sheet stocks (close-only). Most micro-cap signals involve OTC-listed names that are simply not tradeable. For exchange-listed micro-caps, realistic round-trip costs are ~5% or more based on SEC spread data, and the simulated alpha disappears entirely at that cost level.

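
As a rough per-trade check, the round-trip costs in the table can be set against the pre-cost 7-day alpha from the per-signal stats. This is back-of-the-envelope arithmetic under a strong simplifying assumption: the 0.68% alpha is pooled across all cap tiers, so applying it to each tier separately is illustrative only. The names below are hypothetical and the sketch is no substitute for the portfolio simulation.

```python
# Pre-cost 7-day alpha vs SPY from the per-signal table (all tiers pooled).
ALPHA_7D = 0.0068

# Approximate round-trip costs per cap tier, from the cost table.
RT_COST = {"large": 0.002, "mid": 0.005, "small": 0.015, "micro": 0.05}


def net_alpha(gross_alpha: float, rt_cost: float) -> float:
    """Per-trade alpha left after paying one round trip."""
    return gross_alpha - rt_cost


for tier, cost in RT_COST.items():
    print(f"{tier:>5}: {net_alpha(ALPHA_7D, cost):+.2%}")
```

On this measure the pooled alpha is already negative at small-cap and micro-cap cost levels. It says nothing tier-specific about large and mid caps, for which the simulated results in the table are the relevant numbers.
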
### About insidercopytrading.com

Their website advertises backtested returns that significantly outperform the market. Those numbers cannot be replicated in practice because the backtesting methodology omits the costs that matter most:

- **Same-day entry** at the closing price of the filing date, a price you cannot buy at as a retail trader.
- **No spread or slippage**, although SEC data shows small/micro-cap bid-ask spreads of 0.5-2%+ each way.
- **Survivorship bias**: signals for stocks that later delisted or became untradeable are excluded from their results but would have been part of your portfolio.

Under realistic assumptions, the strategy underperforms SPY across all tested parameters. [insidercopytrading.com](https://insidercopytrading.com) advertises performance numbers that their own subscribers cannot reproduce. Their website is rather pretty though, and their subscription revenue is presumably real.

Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested.

## Modules

| Path | Purpose |
|---|---|
| `config.py` | Thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling |
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via form.idx |
| `ingestion/form4_parser.py` | Form 4 XML parser; 10b5-1 detection |
| `db/models.py` | SQLAlchemy ORM models (Filing, Signal, PriceCache, TickerMeta) |
| `db/db.py` | DB access layer |
| `signals/filter_engine.py` | Filing to signal pipeline |
| `signals/cluster_detector.py` | Cluster detection |
| `alerts/slack_alert.py` | Slack webhook |
| `broker/alpaca_client.py` | Alpaca order execution |
| `backtest/backtest.py` | Per-signal backtest |
| `backtest/simulate.py` | Portfolio simulator with cap-tier filtering |
| `backtest/plot.py` | Plot generator |
| `main.py` | CLI: `run / backfill / backtest / simulate / plot` |

## Requirements

Python 3.11+. See `requirements.txt`.