Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca. Copying the idea from https://insidercopytrading.com
Dominik Roth e340d59a69 docs: update README with results section, simulator usage, and caveats
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 17:49:29 +02:00


Smaug

Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
The idea is copied from insidercopytrading.com. Available at insidercopytradingcopy.com.

Architecture

EDGAR (Form 4 feed)
      │
      ▼
ingestion/edgar_poller.py    ← polls every 10 min, dedupes by accession
ingestion/sec_bulk_ingest.py ← bulk historical ingest via quarterly form.idx archives
      │
      ▼
ingestion/form4_parser.py    ← parses XML, detects 10b5-1 plans, extracts tx_code
      │
      ▼
db/models.py + db/db.py      ← SQLAlchemy ORM: filings, signals, price_cache tables
      │
      ▼
signals/filter_engine.py     ← buy-only, open-market (P) only, exclude 10b5-1,
signals/cluster_detector.py    min $50k, role-weighted scoring, as-of-date aware
      │
      ├──► alerts/slack_alert.py   ← POST to Slack webhook when score ≥ threshold
      └──► broker/alpaca_client.py ← paper/live order: 2% position size, 10% per-ticker cap
                                        positions auto-closed after holding period expires

backtest/backtest.py         ← per-signal return / alpha vs SPY analysis
backtest/simulate.py         ← realistic portfolio simulation with transaction costs
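
The filter stage in the diagram (buy-only, open-market "P" only, no 10b5-1 plans, minimum $50k) can be sketched as a single predicate. This is a minimal illustration and not the code in signals/filter_engine.py: the Transaction dataclass and all of its field names are hypothetical.

```python
from dataclasses import dataclass

MIN_TRANSACTION_VALUE = 50_000  # USD, the documented default


@dataclass
class Transaction:
    # Hypothetical field names, for illustration only.
    tx_code: str      # Form 4 transaction code; "P" marks an open-market purchase
    acquired: bool    # True for an acquisition, False for a disposal
    shares: float
    price: float
    is_10b5_1: bool   # True if the trade was made under a 10b5-1 plan


def passes_filter(tx: Transaction) -> bool:
    """Buy-only, open-market (P) only, no 10b5-1 plans, at least $50k."""
    value = tx.shares * tx.price
    return (
        tx.acquired
        and tx.tx_code == "P"
        and not tx.is_10b5_1
        and value >= MIN_TRANSACTION_VALUE
    )
```

A 1,000-share purchase at $60 passes; the same purchase under a 10b5-1 plan, or at $40 per share, does not.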

Setup

cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt

Environment variables (.env)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| SLACK_WEBHOOK_URL | optional | | Incoming webhook URL for alerts |
| ALPACA_KEY | optional | | Alpaca API key |
| ALPACA_SECRET | optional | | Alpaca API secret |
| ALPACA_BASE_URL | optional | https://paper-api.alpaca.markets | Use paper or live endpoint |
| DB_PATH | optional | insider.db | SQLite database file path |
| DATA_DIR | optional | data/filings | Directory for cached raw XML filings |
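
As a rough sketch of how these variables and their defaults might be read, the snippet below uses only os.environ. The load_settings helper is hypothetical and is not taken from config.py; treating a missing value as None is also an assumption.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    return {
        # No documented default; None here means "not configured" (assumption).
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL"),
        "ALPACA_KEY": env.get("ALPACA_KEY"),
        "ALPACA_SECRET": env.get("ALPACA_SECRET"),
        # Documented defaults.
        "ALPACA_BASE_URL": env.get("ALPACA_BASE_URL", "https://paper-api.alpaca.markets"),
        "DB_PATH": env.get("DB_PATH", "insider.db"),
        "DATA_DIR": env.get("DATA_DIR", "data/filings"),
    }


settings = load_settings({"DB_PATH": "test.db"})
```

Passing a plain dict instead of os.environ makes the defaults easy to check in isolation.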

Usage

# Initialize DB and start continuous polling (every 10 minutes)
python main.py run

# Bulk-ingest historical Form 4 filings from SEC EDGAR quarterly archives
python main.py backfill --years 2023 2024        # full year range
python main.py backfill --year 2024 --quarter 1  # single quarter

# Per-signal backtest: win rate, alpha vs SPY
python main.py backtest

# Portfolio simulation with configurable strategy and cost params
python main.py simulate [options]

Simulate options

Strategy:
  --holding-days N      Calendar days to hold each position (default: 7)
  --buy-delay N         Days after signal trigger to enter (default: 1)
  --position-size F     Fraction of available cash per trade (default: 0.10)
  --min-score F         Minimum signal score filter (default: 0.0)
  --min-cluster N       Minimum cluster size filter (default: 1)
  --capital F           Initial capital in USD (default: 100000)

Transaction costs:
  --spread F            One-way bid-ask half-spread paid at entry and exit (default: 0.003)
  --slippage F          Entry slippage / market impact (default: 0.002)
  --commission F        Per-trade commission as fraction of notional (default: 0.001)

Round-trip cost = spread×2 + slippage + commission×2
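
The formula can be checked with a few lines of arithmetic. The round_trip_cost function name is made up for this example; the inputs are the documented defaults.

```python
def round_trip_cost(spread: float, slippage: float, commission: float) -> float:
    """Half-spread and commission are paid on entry and exit; slippage on entry only."""
    return 2 * spread + slippage + 2 * commission


default_cost = round_trip_cost(spread=0.003, slippage=0.002, commission=0.001)
print(f"{default_cost:.3%}")  # 1.000%
```

With the defaults, the round trip costs 1% of notional.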

Key configuration (config.py)

| Parameter | Default | Description |
| --- | --- | --- |
| EDGAR_POLL_INTERVAL | 600 s | Polling cadence |
| MIN_TRANSACTION_VALUE | $50,000 | Ignore buys below this |
| MIN_CLUSTER_SIZE | 1 | Minimum unique insiders before a signal fires |
| CLUSTER_WINDOW_DAYS | 30 | Rolling window for cluster counting |
| HOLDING_PERIOD_DAYS | 90 | Days held per position (backtest + auto-close trigger) |
| POSITION_SIZE_PCT | 2% | Fraction of portfolio per trade |
| MAX_POSITIONS | 20 | Hard position limit |
| SCORE_ALERT_THRESHOLD | 5.0 | Minimum score to trigger Slack alert |
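
To illustrate how CLUSTER_WINDOW_DAYS and MIN_CLUSTER_SIZE interact, here is a minimal sketch of as-of-date cluster counting. It is not the logic in signals/cluster_detector.py: the tuple layout of buys, the inclusive window boundaries, and the use of the trade date are all assumptions.

```python
from datetime import date, timedelta

CLUSTER_WINDOW_DAYS = 30  # documented default
MIN_CLUSTER_SIZE = 1      # documented default


def cluster_size(buys, ticker, as_of, window_days=CLUSTER_WINDOW_DAYS):
    """Count unique insiders who bought `ticker` in the window ending at `as_of`.

    `buys` is an iterable of (ticker, insider_id, trade_date) tuples
    (a hypothetical shape). Buys dated after `as_of` are ignored, so a
    historical replay never sees future filings.
    """
    start = as_of - timedelta(days=window_days)
    insiders = {
        insider
        for t, insider, d in buys
        if t == ticker and start <= d <= as_of
    }
    return len(insiders)


buys = [
    ("ACME", "ceo", date(2024, 3, 1)),
    ("ACME", "cfo", date(2024, 3, 10)),
    ("ACME", "ceo", date(2024, 3, 12)),       # same insider, counted once
    ("ACME", "director", date(2024, 4, 20)),  # after the as-of date below
]
size = cluster_size(buys, "ACME", as_of=date(2024, 3, 15))
fires = size >= MIN_CLUSTER_SIZE
```

Here size is 2: the repeated CEO purchase is deduplicated and the April purchase lies after the as-of date.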

Scoring

score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))

Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
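
The formula translates directly into a small function, reading the cluster term as (cluster_size - 1) so that a single insider gets a multiplier of 1. Two points are assumptions, not facts from this README: the logarithm is taken as the natural log, and role_weight is taken as the weight of one named role (how weights combine across several insiders in a cluster is not specified here).

```python
import math

ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def score(role: str, total_value: float, cluster_size: int) -> float:
    """role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1)).

    Assumes the natural logarithm; the README does not state the base.
    """
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    return ROLE_WEIGHTS[role] * math.log(total_value) * cluster_multiplier


single = score("CEO", 100_000, cluster_size=1)
cluster = score("CEO", 100_000, cluster_size=3)  # multiplier is 2.0
```

Each additional insider in the cluster adds half of the single-insider score.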

Backtesting

The backtest loads signals from the DB and fetches OHLC data via yfinance. Prices are cached in the price_cache table, so completed date ranges are served entirely from the DB on repeat runs. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day on or before the exit date.
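
The entry and exit rule can be sketched with a sorted list of trading days and a binary search. This illustrates the rule only and is not the code in backtest/backtest.py: the entry_exit_closes helper and the date-to-close dict it takes are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def entry_exit_closes(closes, signal_date, exit_date):
    """Return (entry_close, exit_close), or None if no trading day fits.

    `closes` maps trading date -> closing price (hypothetical shape).
    """
    days = sorted(closes)
    i = bisect_left(days, signal_date)     # first trading day >= signal_date
    j = bisect_right(days, exit_date) - 1  # last trading day <= exit_date
    if i >= len(days) or j < i:
        return None
    return closes[days[i]], closes[days[j]]


closes = {
    date(2024, 1, 5): 10.0,   # Friday
    date(2024, 1, 8): 10.5,   # Monday
    date(2024, 1, 9): 10.4,   # Tuesday
    date(2024, 1, 12): 11.0,  # Friday
}
# Signal on Saturday, exit on Sunday: enter at Monday's close, exit at Friday's.
prices = entry_exit_closes(closes, date(2024, 1, 6), date(2024, 1, 14))
```

Weekends and holidays need no special handling because only dates with a cached close are considered.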

Results (2023–2024 backtest, 302k filings ingested)

⚠ Read the caveats below before drawing conclusions.

Per-signal statistics (pre-cost)

Across 16,279 signals generated from 302k Form 4 filings (2023–2024):

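| Hold | Avg return | Avg alpha vs SPY | Sharpe | Win rate |
| --- | --- | --- | --- | --- |
| 3 d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7 d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14 d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30 d | +1.89% | +0.41% | ~0.70 | ~54% |
| 90 d | +5.8% | +1.0% | ~0.55 | ~57% |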

Alpha is strongest and most consistent at 3–14 day holds. Beyond 30 days, market beta dominates. Signal quality is broadly robust across min_score and min_cluster filter values.

Portfolio simulation (1-day lag, 7-day hold, 10% of cash per signal)

Pre-cost simulation on the same period:

| Metric | Value |
| --- | --- |
| Initial capital | $100,000 |
| Final value | $782,097 |
| Total return | +682% |
| Annualized return | +177% |
| SPY annualized | +25.9% |
| Max drawdown | 12.8% |
| Sharpe | 4.67 |
| Trades executed | 13,766 |

After realistic transaction costs (~1% round-trip), expected annualized return drops to roughly 20–60% depending on assumed spread and slippage. Run the simulator to check your specific assumptions:

# Conservative (liquid mid-caps, ~1% round-trip)
python main.py simulate --spread 0.003 --slippage 0.002 --commission 0.001

# Realistic small-cap (~2.1% round-trip: 0.007×2 + 0.005 + 0.001×2)
python main.py simulate --spread 0.007 --slippage 0.005 --commission 0.001

Caveats

  1. Transaction costs are everything. Average alpha per 7-day trade is ~0.68%. A round-trip on small/mid caps costs 0.6–1.5% (spread + slippage + commission). At the high end this strategy is slightly negative after costs. The 177% pre-cost figure is not achievable in practice.

  2. 2023–2024 was an exceptional bull market. SPY returned +25.9% annualized. The long-only bias in insider buys captured broad market momentum. Expected performance in flat or down markets is lower and untested.

  3. Survivorship bias. Tickers that were delisted, halted, or acquired may be underrepresented in the price cache. This slightly flatters results by dropping the worst outcomes.

  4. Crowding on popular signals is not fully modeled. When multiple insiders at the same company buy on the same day, the stock may have already moved before you execute. The 1-day delay helps but doesn't fully resolve this.

  5. Concentrated portfolio. At 10% of cash per signal with 7-day holds, you run ~7–10 simultaneous positions on average. Individual position variance is high.

  6. Long-only. Excess return over SPY is not directly capturable without shorting SPY, which has its own carry cost.
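
The arithmetic behind caveat 1 can be made explicit. The sketch below subtracts an assumed round-trip cost from the 7-day averages in the per-signal table; it uses simple subtraction and ignores compounding. The net_of_costs function name is made up for this example.

```python
AVG_RETURN_7D = 0.0119  # average 7-day return, pre-cost (per-signal table)
AVG_ALPHA_7D = 0.0068   # average 7-day alpha vs SPY, pre-cost


def net_of_costs(gross: float, round_trip_cost: float) -> float:
    """Per-trade figure after subtracting one round-trip cost."""
    return gross - round_trip_cost


# Low end, mid-range, and high end of the 0.6-1.5% cost range.
for cost in (0.006, 0.010, 0.015):
    net_return = net_of_costs(AVG_RETURN_7D, cost)
    net_alpha = net_of_costs(AVG_ALPHA_7D, cost)
    print(f"cost {cost:.1%}: net return {net_return:+.2%}, net alpha {net_alpha:+.2%}")
```

At a 1.5% round-trip cost both the net return and the net alpha per trade are negative, which is the "slightly negative after costs" case.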

Position lifecycle

Positions are tracked in the signals table. When a trade is executed, executed_at is recorded. On each poll cycle the poller checks for positions where executed_at is older than HOLDING_PERIOD_DAYS and calls Alpaca to close them, marking closed=1 in the DB.
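
A minimal sketch of the expiry check described above, using plain dicts in place of ORM rows. The positions_to_close helper and the dict shape are hypothetical, and treating "older than" as a strict comparison is an assumption; the real poller would follow this with an Alpaca close call and a DB update.

```python
from datetime import datetime, timedelta

HOLDING_PERIOD_DAYS = 90  # documented default


def positions_to_close(signals, now, holding_days=HOLDING_PERIOD_DAYS):
    """Return executed, still-open signals whose holding period has expired.

    `signals` is an iterable of dicts with 'executed_at' (datetime or None)
    and 'closed' (0 or 1), mirroring the columns named above.
    """
    cutoff = now - timedelta(days=holding_days)
    return [
        s
        for s in signals
        if s["executed_at"] is not None
        and not s["closed"]
        and s["executed_at"] < cutoff
    ]


now = datetime(2024, 6, 1)
signals = [
    {"ticker": "AAA", "executed_at": datetime(2024, 1, 2), "closed": 0},  # expired
    {"ticker": "BBB", "executed_at": datetime(2024, 5, 1), "closed": 0},  # still held
    {"ticker": "CCC", "executed_at": datetime(2024, 1, 2), "closed": 1},  # already closed
    {"ticker": "DDD", "executed_at": None, "closed": 0},                  # never executed
]
expired = positions_to_close(signals, now)
```

Only AAA is returned: it was executed more than 90 days before now and is still open.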

Modules

| Path | Purpose |
| --- | --- |
| config.py | All thresholds and env-var loading |
| ingestion/edgar_poller.py | EDGAR Atom feed polling and deduplication |
| ingestion/sec_bulk_ingest.py | Bulk historical ingest via quarterly form.idx archives |
| ingestion/form4_parser.py | Form 4 XML → structured dict; 10b5-1 detection; tx_code extraction |
| db/models.py | SQLAlchemy ORM models (Filing, Signal, PriceCache) |
| db/db.py | DB access layer: dedup-safe inserts, chunked IN queries, price cache |
| signals/filter_engine.py | Filing → signal pipeline (open-market-only, as-of-date aware) |
| signals/cluster_detector.py | Cluster detection from DB (as-of-date aware) |
| alerts/slack_alert.py | Slack webhook alert |
| broker/alpaca_client.py | Alpaca order execution + position exit |
| backtest/backtest.py | Per-signal historical backtest runner |
| backtest/simulate.py | Portfolio simulator with configurable costs |
| main.py | CLI entry point (run / backfill / backtest / simulate) |

Requirements

  • Python 3.11+
  • See requirements.txt: requests, lxml, cssselect, yfinance, python-dotenv, alpaca-trade-api, sqlalchemy