Smaug
Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
The idea is copied from insidercopytrading.com. Available at insidercopytradingcopy.com.
Architecture
EDGAR (Form 4 feed)
│
▼
ingestion/edgar_poller.py ← polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py ← bulk historical ingest via quarterly form.idx archives
│
▼
ingestion/form4_parser.py ← parses XML, detects 10b5-1 plans, extracts tx_code
│
▼
db/models.py + db/db.py ← SQLAlchemy ORM: filings, signals, price_cache tables
│
▼
signals/filter_engine.py ← buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py min $50k, role-weighted scoring, as-of-date aware
│
├──► alerts/slack_alert.py ← POST to Slack webhook when score ≥ threshold
└──► broker/alpaca_client.py ← paper/live order: 2% position size, 10% per-ticker cap
positions auto-closed after holding period expires
backtest/backtest.py ← per-signal return / alpha vs SPY analysis
backtest/simulate.py ← realistic portfolio simulation with transaction costs
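The parse and filter stages in the diagram above can be sketched as follows. This is a minimal illustration, not the code in ingestion/form4_parser.py or signals/filter_engine.py: the sample document is invented, the dict schema and function names are hypothetical, the element names follow the public Form 4 XML layout and should be checked against real filings, and the footnote-based 10b5-1 check is an assumed heuristic.

```python
import re
import xml.etree.ElementTree as ET

MIN_TRANSACTION_VALUE = 50_000  # USD, the minimum stated in this README

# Hypothetical, heavily trimmed Form 4 body used only for this example.
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer><issuerTradingSymbol>ACME</issuerTradingSymbol></issuer>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>2000</value></transactionShares>
        <transactionPricePerShare><value>30.00</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def _num(node, path):
    """Read a numeric child value, treating missing or empty text as 0."""
    text = (node.findtext(path) or "").strip()
    return float(text) if text else 0.0


def parse_form4(xml_text):
    """Return one dict per non-derivative transaction (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    ticker = (root.findtext("issuer/issuerTradingSymbol") or "").strip()
    # Assumed heuristic: any footnote mentioning "10b5-1" marks a planned trade.
    notes = " ".join("".join(fn.itertext()) for fn in root.iter("footnote"))
    planned = re.search(r"10b5-?1", notes, re.IGNORECASE) is not None
    rows = []
    for tx in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        amounts = tx.find("transactionAmounts")
        if amounts is None:
            continue
        shares = _num(amounts, "transactionShares/value")
        price = _num(amounts, "transactionPricePerShare/value")
        code = (tx.findtext("transactionCoding/transactionCode") or "").strip()
        side = (amounts.findtext("transactionAcquiredDisposedCode/value") or "").strip()
        rows.append({
            "ticker": ticker,
            "tx_code": code,
            "acquired": side == "A",
            "value": shares * price,
            "is_10b5_1": planned,
        })
    return rows


def passes_filter(tx):
    """Buy-only, open-market (P) only, no 10b5-1 plans, minimum dollar value."""
    return (
        tx["tx_code"] == "P"
        and tx["acquired"]
        and not tx["is_10b5_1"]
        and tx["value"] >= MIN_TRANSACTION_VALUE
    )


if __name__ == "__main__":
    for row in parse_form4(SAMPLE_FORM4):
        print(row["ticker"], row["value"], passes_filter(row))
```

The sample purchase is 2,000 shares at $30.00, so it clears the $50k minimum; adding a footnote that mentions a 10b5-1 plan would cause the same transaction to be rejected.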
Setup
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
Environment variables (.env)
| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | none | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | none | Alpaca API key |
| `ALPACA_SECRET` | optional | none | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |
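As an illustration of how these variables and their defaults might be resolved, here is a small sketch using only `os.environ`. It is not the project's config.py: the function name `load_settings` and the derived `alerts_enabled` / `trading_enabled` flags are hypothetical, and the real code may load the `.env` file through python-dotenv first.

```python
import os


def load_settings(env=None):
    """Resolve the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL"),
        "ALPACA_KEY": env.get("ALPACA_KEY"),
        "ALPACA_SECRET": env.get("ALPACA_SECRET"),
        "ALPACA_BASE_URL": env.get("ALPACA_BASE_URL", "https://paper-api.alpaca.markets"),
        "DB_PATH": env.get("DB_PATH", "insider.db"),
        "DATA_DIR": env.get("DATA_DIR", "data/filings"),
    }
    # Hypothetical convenience flags: every variable is optional, so a feature
    # is treated as enabled only when its credentials are present.
    settings["alerts_enabled"] = bool(settings["SLACK_WEBHOOK_URL"])
    settings["trading_enabled"] = bool(settings["ALPACA_KEY"] and settings["ALPACA_SECRET"])
    return settings


if __name__ == "__main__":
    for key, value in load_settings({}).items():
        print(f"{key} = {value!r}")
```

With an empty environment the sketch falls back to the paper-trading endpoint and leaves both alerts and trading disabled.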
Usage
# Initialize DB and start continuous polling (every 10 minutes)
python main.py run
# Bulk-ingest historical Form 4 filings from SEC EDGAR quarterly archives
python main.py backfill --years 2023 2024 # full year range
python main.py backfill --year 2024 --quarter 1 # single quarter
# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest
# Portfolio simulation with configurable strategy and cost params
python main.py simulate [options]
Simulate options
Strategy:
--holding-days N Calendar days to hold each position (default: 7)
--buy-delay N Days after signal trigger to enter (default: 1)
--position-size F Fraction of available cash per trade (default: 0.10)
--min-score F Minimum signal score filter (default: 0.0)
--min-cluster N Minimum cluster size filter (default: 1)
--capital F Initial capital in USD (default: 100000)
Transaction costs:
--spread F One-way bid-ask half-spread paid at entry and exit (default: 0.003)
--slippage F Entry slippage / market impact (default: 0.002)
--commission F Per-trade commission as fraction of notional (default: 0.001)
Round-trip cost = spread×2 + slippage + commission×2
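The round-trip formula can be checked directly against any set of flags. The snippet below is a plain restatement of that formula; the function name is hypothetical and is not part of the simulator.

```python
def round_trip_cost(spread=0.003, slippage=0.002, commission=0.001):
    """Round-trip cost = spread x 2 + slippage + commission x 2 (all fractions of notional)."""
    return 2 * spread + slippage + 2 * commission


if __name__ == "__main__":
    print(f"defaults: {round_trip_cost():.2%}")
    print(f"wider:    {round_trip_cost(spread=0.007, slippage=0.005):.2%}")
```

With the default flags the formula gives 1.0% per round trip; with `--spread 0.007 --slippage 0.005 --commission 0.001` it gives 2.1%.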
Key configuration (config.py)
| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + auto-close trigger) |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |
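To make `CLUSTER_WINDOW_DAYS` and `MIN_CLUSTER_SIZE` concrete, here is a sketch of as-of-date cluster counting. It is not the logic in signals/cluster_detector.py: the record layout, the function names, and the choice of an inclusive window on both ends are assumptions. The key property is that filings dated after the as-of date are ignored, so a backtest cannot look ahead.

```python
from datetime import date, timedelta

CLUSTER_WINDOW_DAYS = 30
MIN_CLUSTER_SIZE = 1


def cluster_size(buys, ticker, as_of, window_days=CLUSTER_WINDOW_DAYS):
    """Count unique insiders who bought `ticker` in the window ending at `as_of`."""
    window_start = as_of - timedelta(days=window_days)
    insiders = {
        b["insider"]
        for b in buys
        if b["ticker"] == ticker and window_start <= b["date"] <= as_of
    }
    return len(insiders)


def signal_fires(buys, ticker, as_of, min_cluster=MIN_CLUSTER_SIZE):
    """A signal fires once enough distinct insiders have bought within the window."""
    return cluster_size(buys, ticker, as_of) >= min_cluster


if __name__ == "__main__":
    # Invented example data.
    buys = [
        {"ticker": "ACME", "insider": "A", "date": date(2024, 3, 1)},
        {"ticker": "ACME", "insider": "B", "date": date(2024, 3, 10)},
        {"ticker": "ACME", "insider": "A", "date": date(2024, 3, 12)},  # repeat buyer
        {"ticker": "ACME", "insider": "C", "date": date(2024, 1, 20)},  # outside window
        {"ticker": "ACME", "insider": "D", "date": date(2024, 3, 20)},  # after as_of
    ]
    print(cluster_size(buys, "ACME", as_of=date(2024, 3, 15)))
```

Counting unique insiders rather than filings means a single insider who buys repeatedly does not inflate the cluster.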
Scoring
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
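The formula and weights above translate into a few lines of Python. This sketch makes two assumptions the README does not state: that `log` is the natural logarithm, and that a single role is supplied per signal (how the role is chosen when several insiders form a cluster is not specified here). The function name is hypothetical.

```python
import math

# Role weights as listed in this README.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def signal_score(role, total_value, cluster_size):
    """score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))."""
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    # Assumption: natural log; the base is not stated in the README.
    return ROLE_WEIGHTS[role] * math.log(total_value) * cluster_multiplier


if __name__ == "__main__":
    print(round(signal_score("Director", 50_000, 1), 2))
    print(round(signal_score("CEO", 250_000, 3), 2))
```

Each additional insider in the cluster adds half of the single-insider score, so a cluster of three scores twice what a lone buyer with the same role and dollar value would.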
Backtesting
The backtest loads signals from the DB and fetches OHLC data via yfinance. Prices are cached in the price_cache table, so completed date ranges are served entirely from the DB on repeat runs. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day on or before the exit date.
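The entry and exit rule can be sketched with a binary search over the available trading days. This is an illustration under assumptions, not the code in backtest/backtest.py: prices are taken to be a date-sorted list of `(date, close)` pairs containing trading days only, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def entry_and_exit(closes, signal_date, exit_date):
    """Pick (entry, exit) rows from a date-sorted list of (date, close) pairs.

    Entry: first trading day on or after signal_date.
    Exit:  last trading day on or before exit_date.
    Returns None when no valid pair exists in the data.
    """
    days = [d for d, _ in closes]
    i = bisect_left(days, signal_date)      # first index with day >= signal_date
    j = bisect_right(days, exit_date) - 1   # last index with day <= exit_date
    if i >= len(days) or j < i:
        return None
    return closes[i], closes[j]


if __name__ == "__main__":
    # Invented prices; weekends are absent from the list.
    closes = [
        (date(2024, 1, 4), 10.0),
        (date(2024, 1, 5), 10.2),
        (date(2024, 1, 8), 10.1),
        (date(2024, 1, 9), 10.4),
        (date(2024, 1, 12), 10.9),
    ]
    picked = entry_and_exit(closes, date(2024, 1, 6), date(2024, 1, 13))
    (entry_day, entry_px), (exit_day, exit_px) = picked
    print(entry_day, exit_day, f"{exit_px / entry_px - 1:+.2%}")
```

A signal dated on a Saturday therefore enters at Monday's close, and an exit date that falls on a weekend uses the preceding Friday's close.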
Results (2023–2024 backtest, 302k filings ingested)
⚠ Read the caveats below before drawing conclusions.
Per-signal statistics (pre-cost)
Across 16,279 signals generated from 302k Form 4 filings (2023–2024):
| Hold | Avg return | Avg alpha vs SPY | Sharpe | Win rate |
|---|---|---|---|---|
| 3 d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7 d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14 d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30 d | +1.89% | +0.41% | ~0.70 | ~54% |
| 90 d | +5.8% | +1.0% | ~0.55 | ~57% |
Alpha is strongest and most consistent at 3–14 day holds. Beyond 30 days, market beta dominates. Signal quality is broadly robust across min_score and min_cluster filter values.
Portfolio simulation (1-day lag, 7-day hold, 10% of cash per signal)
Pre-cost simulation on the same period:
| Metric | Value |
|---|---|
| Initial capital | $100,000 |
| Final value | $782,097 |
| Total return | +682% |
| Annualized return | +177% |
| SPY annualized | +25.9% |
| Max drawdown | 12.8% |
| Sharpe | 4.67 |
| Trades executed | 13,766 |
After realistic transaction costs (~1% round-trip), expected annualized return drops to roughly 20–60% depending on assumed spread and slippage. Run the simulator to check your specific assumptions:
# Conservative (liquid mid-caps, ~1% round-trip)
python main.py simulate --spread 0.003 --slippage 0.002 --commission 0.001
# Realistic small-cap (~2.1% round-trip by the formula above: 0.007×2 + 0.005 + 0.001×2)
python main.py simulate --spread 0.007 --slippage 0.005 --commission 0.001
Caveats
- Transaction costs are everything. Average alpha per 7-day trade is ~0.68%. A round-trip on small/mid caps costs 0.6–1.5% (spread + slippage + commission). At the high end this strategy is slightly negative after costs. The 177% pre-cost figure is not achievable in practice.
- 2023–2024 was an exceptional bull market. SPY returned +25.9% annualized. The long-only bias in insider buys captured broad market momentum. Expected performance in flat or down markets is lower and untested.
- Survivorship bias. Tickers that were delisted, halted, or acquired may be underrepresented in the price cache. This slightly flatters results by dropping the worst outcomes.
- No slippage on popular signals. When multiple insiders at the same company buy on the same day, the stock may have already moved before you execute. The 1-day delay helps but doesn't fully resolve this.
- Concentrated portfolio. At 10% of cash per signal with 7-day holds, you run ~7–10 simultaneous positions on average. Individual position variance is high.
- Long-only. Excess return over SPY is not directly capturable without shorting SPY, which has its own carry cost.
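The first caveat is simple arithmetic, worked here with the 7-day figures from the per-signal table (+1.19% average return, +0.68% average alpha). Subtracting the round-trip cost from the gross figure is a first-order approximation that ignores compounding, and the function name is hypothetical.

```python
AVG_RETURN_7D = 0.0119  # 7 d average return from the per-signal table
AVG_ALPHA_7D = 0.0068   # 7 d average alpha vs SPY from the same table


def net_per_trade(gross, round_trip_cost):
    """First-order net result per trade: gross figure minus round-trip cost."""
    return gross - round_trip_cost


if __name__ == "__main__":
    for cost in (0.006, 0.010, 0.015):
        net_ret = net_per_trade(AVG_RETURN_7D, cost)
        net_alpha = net_per_trade(AVG_ALPHA_7D, cost)
        print(f"cost {cost:.1%}: net return {net_ret:+.2%}, net alpha {net_alpha:+.2%}")
```

The break-even round-trip cost equals the gross figure itself, so average alpha is exhausted at about 0.68% of cost and average return at about 1.19%.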
Position lifecycle
Positions are tracked in the signals table. When a trade is executed, executed_at is recorded. On each poll cycle the poller checks for positions where executed_at is older than HOLDING_PERIOD_DAYS and calls Alpaca to close them, marking closed=1 in the DB.
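The auto-close check described above might look like the following sketch. It is not the poller's actual code: signals are modeled as plain dicts rather than ORM rows, `close_position` is a hypothetical callable standing in for the Alpaca client, and treating "older than" as a strict comparison is an assumption.

```python
from datetime import datetime, timedelta

HOLDING_PERIOD_DAYS = 90


def expired_positions(signals, now, holding_days=HOLDING_PERIOD_DAYS):
    """Open, executed positions whose executed_at is older than the holding period."""
    cutoff = now - timedelta(days=holding_days)
    return [
        s for s in signals
        if s["executed_at"] is not None and not s["closed"] and s["executed_at"] < cutoff
    ]


def close_expired(signals, now, close_position):
    """Close each expired position via the supplied callable and mark closed=1."""
    closed = []
    for s in expired_positions(signals, now):
        close_position(s["ticker"])  # hypothetical stand-in for the broker call
        s["closed"] = 1
        closed.append(s["ticker"])
    return closed


if __name__ == "__main__":
    # Invented rows mirroring the executed_at / closed columns.
    signals = [
        {"ticker": "AAA", "executed_at": datetime(2024, 1, 1), "closed": 0},
        {"ticker": "BBB", "executed_at": datetime(2024, 5, 1), "closed": 0},
        {"ticker": "CCC", "executed_at": None, "closed": 0},
    ]
    print(close_expired(signals, datetime(2024, 6, 1), close_position=print))
```

Passing the broker call in as a parameter keeps the expiry logic testable without network access; rows that were never executed or are already closed are skipped.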
Modules
| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via quarterly form.idx archives |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection; tx_code extraction |
| `db/models.py` | SQLAlchemy ORM models (Filing, Signal, PriceCache) |
| `db/db.py` | DB access layer: dedup-safe inserts, chunked IN queries, price cache |
| `signals/filter_engine.py` | Filing → signal pipeline (open-market-only, as-of-date aware) |
| `signals/cluster_detector.py` | Cluster detection from DB (as-of-date aware) |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
| `backtest/backtest.py` | Per-signal historical backtest runner |
| `backtest/simulate.py` | Portfolio simulator with configurable costs |
| `main.py` | CLI entry point (run / backfill / backtest / simulate) |
Requirements
- Python 3.11+
- See `requirements.txt`: requests, lxml, cssselect, yfinance, python-dotenv, alpaca-trade-api, sqlalchemy