- PLAN.md: full implementation plan from issue
- config.py: configurable thresholds, API keys via .env
- ingestion/: EDGAR RSS poller + Form 4 XML parser
- db/: SQLite schema + interface (WAL mode)
- signals/: filter engine (buy/10b5-1/value/role) + cluster detector
- alerts/: Slack webhook alert with score gating
- broker/: Alpaca paper/live trade execution
- backtest/: historical signal backtesting with yfinance
- main.py: CLI entrypoint (run | fetch-once | backtest)
Insider Copytrade System -- Implementation Plan
Description
A personal system that monitors SEC EDGAR Form 4 filings in near real time (polling every 10 minutes), filters for high-quality insider buying signals, alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API.
The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.
Background
Company insiders (executives, directors, >10% shareholders) must file SEC Form 4 within 2 business days of any trade. This is public data via SEC EDGAR. The signal value of insider buying is academically documented -- executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:
- Multiple insiders buy simultaneously (cluster signal)
- The trade is unplanned (not a 10b5-1 scheduled plan)
- The company is small/mid-cap (less institutional arbitrage)
The edge over copying politicians' trades: Form 4 has a 2-business-day disclosure lag versus up to 45 days for congressional disclosures, and the signal is company-specific rather than sector-level.
Key risk: This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.
System Outline
SEC EDGAR RSS Feed (poll every 10 min)
|
[Ingestion Layer]
|
Parse Form 4 XML
|
[Filter Engine]
- Buy only (flag = A)
- Exclude 10b5-1 plans
- Min transaction size
- Role weighting
- Cluster detection
|
SQLite Database
|
┌────────────┬──────────────┐
| | |
[Backtester] [Slack Alert] [Alpaca API]
(manual) (paper/live)
Actionables
Phase 1 -- Data Ingestion
Goal: Reliably pull and parse Form 4 filings as they appear.
Tasks:
- Set up project structure
insider-copytrade/
    ingestion/
        edgar_poller.py       # polls EDGAR RSS
        form4_parser.py       # parses XML -> structured dict
    db/
        schema.sql
        db.py                 # SQLite interface
    signals/
        filter_engine.py      # applies signal filters
        cluster_detector.py
    alerts/
        slack_alert.py
    broker/
        alpaca_client.py
    backtest/
        backtest.py
    config.py
    main.py
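- Poll EDGAR's current-filings feed for Form 4 filings every 10 minutes. Adding output=atom returns the Atom feed instead of the HTML listing:
https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&output=atom
Send a descriptive User-Agent header with every request and stay under SEC's limit of 10 requests per second, otherwise requests may be blocked.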
SEC also provides a structured latest filings feed:
https://efts.sec.gov/LATEST/search-index?q=&forms=4
- For each new filing, fetch and parse the XML document. Key fields to extract:
  - `issuerTradingSymbol` (ticker)
  - `rptOwnerName`, `officerTitle` (insider name + role)
  - `transactionDate`
  - `transactionAcquiredDisposedCode` (A = acquired, D = disposed; an acquisition is not necessarily an open-market buy, which is reported with `transactionCode` P)
  - `transactionShares`, `transactionPricePerShare`
  - `transactionTotalValue` (compute if not present)
  - `footnotes` (check for "10b5-1" mention)
  - `sharesOwnedFollowingTransaction`
- Store raw filing XML + parsed fields. Track `accessionNumber` as dedup key.
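The extraction step above could look roughly like the following. This is a minimal sketch, not the project's parser: it uses the standard library's `xml.etree.ElementTree` in place of `lxml`, the function name `parse_form4` and the output keys are illustrative, and the element nesting (`transactionAmounts/.../value`) is assumed from typical Form 4 documents and should be checked against real filings.

```python
import xml.etree.ElementTree as ET


def _value(node, path):
    """Return the stripped text of `path`/value under node, or None if absent."""
    text = node.findtext(f"{path}/value")
    return text.strip() if text else None


def parse_form4(xml_text):
    """Parse one Form 4 document into a list of dicts, one per non-derivative transaction."""
    root = ET.fromstring(xml_text)
    # Join all footnote text so a single scan can look for plan language.
    footnotes = " ".join("".join(fn.itertext()) for fn in root.iter("footnote"))
    common = {
        "ticker": (root.findtext("issuer/issuerTradingSymbol") or "").strip(),
        "insider_name": (root.findtext(".//rptOwnerName") or "").strip(),
        "role": (root.findtext(".//officerTitle") or "").strip(),
        "footnotes": footnotes,
        "is_10b51": int("10b5-1" in footnotes.lower()),
    }
    rows = []
    for tx in root.iter("nonDerivativeTransaction"):
        shares = float(_value(tx, "transactionAmounts/transactionShares") or 0)
        price = float(_value(tx, "transactionAmounts/transactionPricePerShare") or 0)
        post = _value(tx, "postTransactionAmounts/sharesOwnedFollowingTransaction")
        rows.append({
            **common,
            "transaction_date": _value(tx, "transactionDate"),
            "flag": _value(tx, "transactionAmounts/transactionAcquiredDisposedCode"),
            "shares": shares,
            "price": price,
            "total_value": shares * price,  # computed, since it is not always present
            "post_tx_shares": float(post or 0),
        })
    return rows
```

Only the non-derivative table is read here, which lines up with the plan's later decision to exclude derivative transactions.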
SQLite schema:
CREATE TABLE filings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
accession_number TEXT UNIQUE,
ticker TEXT,
cik TEXT,
insider_name TEXT,
role TEXT,
transaction_date TEXT,
filed_date TEXT,
shares REAL,
price REAL,
total_value REAL,
flag TEXT, -- A or D
is_10b51 INTEGER, -- 0 or 1
post_tx_shares REAL,
created_at TEXT
);
CREATE TABLE signals (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ticker TEXT,
trigger_date TEXT,
cluster_size INTEGER,
total_cluster_value REAL,
score REAL,
alerted INTEGER DEFAULT 0,
executed INTEGER DEFAULT 0,
created_at TEXT
);
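Deduplication can lean on the `UNIQUE` constraint on `accession_number`. The sketch below is one possible shape for `db.py`, with hypothetical helper names (`open_db`, `store_filing`) and a trimmed copy of the `filings` table so the block runs on its own.

```python
import sqlite3

# Trimmed copy of the filings table above, just enough for the example.
FILINGS_DDL = """
CREATE TABLE IF NOT EXISTS filings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    accession_number TEXT UNIQUE,
    ticker TEXT,
    insider_name TEXT,
    total_value REAL,
    flag TEXT,
    created_at TEXT
)
"""


def open_db(path="insider.db"):
    """Open the database, enable WAL, and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(FILINGS_DDL)
    return conn


def store_filing(conn, filing):
    """Insert a parsed filing; return True if it was new, False if already stored."""
    cur = conn.execute(
        """INSERT OR IGNORE INTO filings
           (accession_number, ticker, insider_name, total_value, flag, created_at)
           VALUES (:accession_number, :ticker, :insider_name, :total_value,
                   :flag, datetime('now'))""",
        filing,
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the UNIQUE constraint caused the row to be ignored
```

One caveat on the design: a single Form 4 can report several transactions under one accession number, so with `accession_number` as the only unique key just the first row per filing is kept. A composite key (for example accession number plus a per-filing row index) may be worth considering.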
Phase 2 -- Filter Engine
Goal: Reduce noise to actionable signals only.
Filters to apply (in order):
| Filter | Logic |
|---|---|
| Buy only | flag == 'A' (acquired). Grants and option exercises are also reported as acquisitions, so this check alone does not isolate open-market purchases |
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
| Min transaction value | total_value >= 50000 (configurable) |
| Exclude derivative transactions | Option exercises are a weaker signal than open-market purchases |
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
| Cluster detection | 2+ insiders buying same ticker within 30 days = elevated signal |
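The per-filing filters in the table can be expressed as one predicate. This is a sketch under stated assumptions: the filing is a dict using the schema's column names, and the `footnotes` and `is_derivative` keys are hypothetical fields the parser would need to supply.

```python
import re

# Phrases from the table above that suggest a pre-scheduled trading plan.
PLAN_PATTERN = re.compile(r"10b5-1|rule\s+10b5|adopted a plan", re.IGNORECASE)


def is_planned_trade(footnotes):
    """True if the footnote text mentions a 10b5-1 style plan."""
    return bool(PLAN_PATTERN.search(footnotes or ""))


def passes_filters(filing, min_value=50_000):
    """Apply the per-filing filters in the order listed in the table."""
    if filing["flag"] != "A":                      # buy only
        return False
    if is_planned_trade(filing.get("footnotes")):  # exclude 10b5-1 plans
        return False
    if filing["total_value"] < min_value:          # minimum transaction value
        return False
    if filing.get("is_derivative", False):         # exclude derivative transactions
        return False
    return True
```

Role weighting and cluster detection are left out on purpose: they adjust the score rather than reject a filing outright.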
Scoring formula (simple v1):
score = base_role_weight * log(total_value) * cluster_multiplier
# cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
Expose all thresholds in config.py for easy tuning during backtesting.
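A sketch of the cluster count and the v1 score follows. The role weights are placeholder values, not tuned numbers, the logarithm is assumed to be the natural log (the formula does not name a base), and clusters are counted as distinct insider names per ticker within the window.

```python
import math
from datetime import date, timedelta

# Placeholder weights for illustration; real values belong in config.py.
ROLE_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.4}


def role_weight(title):
    """Map an officer title to a weight. Note 'vice president' also matches 'president'."""
    t = (title or "").lower()
    if any(k in t for k in ("ceo", "chief executive", "cfo", "chief financial", "president")):
        return ROLE_WEIGHTS["high"]
    if "director" in t:
        return ROLE_WEIGHTS["medium"]
    return ROLE_WEIGHTS["low"]


def cluster_size(buys, ticker, as_of, window_days=30):
    """Count distinct insiders who bought `ticker` in the window ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    insiders = {
        b["insider_name"]
        for b in buys
        if b["ticker"] == ticker and start <= b["transaction_date"] <= as_of
    }
    return len(insiders)


def score(title, total_value, n_cluster):
    """score = base_role_weight * log(total_value) * cluster_multiplier."""
    cluster_multiplier = 1.0 + 0.5 * (max(n_cluster, 1) - 1)
    return role_weight(title) * math.log(total_value) * cluster_multiplier
```

With these placeholder weights, a lone CEO purchase of $145,000 scores ln(145000), roughly 11.9, and each additional insider in the cluster adds half of that again.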
Phase 3 -- SQLite Storage
SQLite is sufficient for this workload (low write volume, single process). Use WAL mode for concurrent reads during backtesting:
conn = sqlite3.connect('insider.db')
conn.execute('PRAGMA journal_mode=WAL')
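Keep raw filing XML in a /data/filings/ directory keyed by accession number. Parse once on ingest so normal operation never re-parses; the stored XML is there if the parser changes and old filings need to be reprocessed.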
Phase 4 -- Slack Alerts
Goal: Get notified immediately when a signal fires, with enough context to decide manually.
- Create a Slack app, get a webhook URL (takes 5 minutes)
- Alert format:
INSIDER BUY SIGNAL
Ticker: $ACME
Insider: John Smith (CEO)
Date: 2025-05-01
Shares: 10,000 @ $14.50 = $145,000
Cluster: 3 insiders in last 14 days
Score: 8.4
10b5-1: No
EDGAR: https://www.sec.gov/cgi-bin/browse-edgar?...
- Alert only on signals above configurable score threshold
- Mark `alerted = 1` in DB after sending to avoid duplicates on re-poll
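import requests

def send_slack_alert(webhook_url, signal):
    # format_signal() renders the alert text in the format shown above
    resp = requests.post(webhook_url, json={"text": format_signal(signal)}, timeout=10)
    resp.raise_for_status()  # surface failed deliveries instead of dropping them silently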
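Score gating and the `alerted` flag can be combined in one pass over the `signals` table. The sketch below assumes a hypothetical `dispatch_alerts` helper and a shortened `format_signal`; the sender is passed in as a callable so the same loop works with a Slack webhook or a test stub.

```python
import sqlite3


def format_signal(sig):
    """Render a shortened version of the alert text (illustrative fields only)."""
    return (
        "INSIDER BUY SIGNAL\n"
        f"Ticker: ${sig['ticker']}\n"
        f"Cluster: {sig['cluster_size']} insiders\n"
        f"Score: {sig['score']:.1f}"
    )


def dispatch_alerts(conn, threshold, send):
    """Send every un-alerted signal at or above `threshold`; return how many were sent."""
    rows = conn.execute(
        "SELECT id, ticker, cluster_size, score FROM signals "
        "WHERE alerted = 0 AND score >= ? ORDER BY score DESC",
        (threshold,),
    ).fetchall()
    sent = 0
    for sig_id, ticker, size, sc in rows:
        send(format_signal({"ticker": ticker, "cluster_size": size, "score": sc}))
        # Mark only after a successful send, so a failed delivery is retried next poll.
        conn.execute("UPDATE signals SET alerted = 1 WHERE id = ?", (sig_id,))
        conn.commit()
        sent += 1
    return sent
```

If `send` raises, the signal stays at `alerted = 0` and is picked up again on the next poll.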
Phase 5 -- Backtesting
Goal: Validate filter parameters on historical data before going live.
Data:
- Historical Form 4 filings: download bulk XML from https://www.sec.gov/dera/data/form-4-data
- Price data: `yfinance` (free, sufficient for backtesting)
Backtest logic:
# For each signal in historical data:
# - Entry: next market open after filed_date
# - Exit: N days later (configurable: 30/60/90/180)
# - Calculate return vs SPY over same period
# - Aggregate by role, cluster_size, market_cap bucket
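The entry and exit rules above can be sketched for a single trade with the standard library alone. Assumptions: price bars are `(date, open_price)` tuples sorted by date, N is counted in calendar days from entry, and the exit is the first bar on or after that target date. The function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def period_return(bars, filed_date, holding_days):
    """Return from the first open after filed_date to the open holding_days later.

    bars: list of (date, open_price) sorted by date. Returns None if the
    price history does not cover the entry or the exit.
    """
    dates = [d for d, _ in bars]
    i = bisect_right(dates, filed_date)  # first bar strictly after the filing date
    if i >= len(bars):
        return None
    entry_date, entry_px = bars[i]
    j = bisect_left(dates, entry_date + timedelta(days=holding_days))
    if j >= len(bars):
        return None
    return bars[j][1] / entry_px - 1.0


def excess_return(stock_bars, spy_bars, filed_date, holding_days):
    """Trade return minus the SPY return over the same window."""
    r = period_return(stock_bars, filed_date, holding_days)
    b = period_return(spy_bars, filed_date, holding_days)
    if r is None or b is None:
        return None
    return r - b
```

Returning `None` for incomplete windows keeps recent signals, whose holding period has not finished, out of the aggregates.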
Use vectorbt for performance:
import vectorbt as vbt
# Build entry/exit signal matrices aligned to price data
# Run portfolio simulation with configurable position sizing
Output metrics:
- Annualized return vs SPY benchmark
- Win rate
- Avg return by holding period
- Avg return by role / cluster size
- Max drawdown
- Sharpe ratio
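Three of these metrics are short enough to compute by hand, which is useful as a cross-check on library output. The sketch assumes per-trade returns as plain floats, an equity curve as a list of portfolio values, and daily returns with 252 periods per year for the Sharpe ratio.

```python
import math
from statistics import mean, stdev


def win_rate(returns):
    """Fraction of trades with a strictly positive return."""
    return sum(r > 0 for r in returns) / len(returns)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe_ratio(period_returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio from periodic returns (needs at least two points)."""
    excess = [r - risk_free / periods_per_year for r in period_returns]
    sd = stdev(excess)
    if sd == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * mean(excess) / sd
```

For example, an equity curve of 100, 120, 90, 130 has a maximum drawdown of -25%, the drop from 120 to 90.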
Critical: Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.
Parameter grid to test:
MIN_VALUE = [25_000, 50_000, 100_000]
HOLDING_DAYS = [30, 60, 90, 180]
CLUSTER_WINDOW = [14, 30]
MIN_CLUSTER_SIZE = [1, 2, 3]
ROLES = ['all', 'c-suite-only']
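One way to walk this grid is `itertools.product`, yielding one parameter dict per combination. The helper name `iter_param_sets` is illustrative; each dict would be handed to whatever function runs a single backtest.

```python
from itertools import product

PARAM_GRID = {
    "MIN_VALUE": [25_000, 50_000, 100_000],
    "HOLDING_DAYS": [30, 60, 90, 180],
    "CLUSTER_WINDOW": [14, 30],
    "MIN_CLUSTER_SIZE": [1, 2, 3],
    "ROLES": ["all", "c-suite-only"],
}


def iter_param_sets(grid):
    """Yield one dict per combination of the grid's values."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))
```

The grid has 3 x 4 x 2 x 3 x 2 = 144 combinations. With that many trials some will look good by chance, so it helps to confirm the best parameter set on a held-out period.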
Phase 6 -- Alpaca Integration
Goal: Optionally auto-execute signals. Start with paper trading.
Paper trading base URL: https://paper-api.alpaca.markets
Live trading base URL: https://api.alpaca.markets
Swap via config flag -- never hardcode.
from alpaca_trade_api import REST

import config

api = REST(
    key_id=config.ALPACA_KEY,
    secret_key=config.ALPACA_SECRET,
    base_url=config.ALPACA_BASE_URL,  # paper or live
)

def execute_signal(ticker, portfolio_value, signal_score):
    # Fixed fractional sizing: POSITION_SIZE_PCT (2%) of portfolio per signal.
    # signal_score is accepted but not used for sizing in v1.
    price = api.get_latest_trade(ticker).price
    allocation = portfolio_value * config.POSITION_SIZE_PCT
    qty = int(allocation / price)
    if qty < 1:
        return  # allocation is too small to buy a single share
    api.submit_order(
        symbol=ticker,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='day',
    )
Position sizing: start at 2% per signal, max 10% in any single ticker. Add a max open positions limit (e.g. 20) to cap exposure.
Exit logic (v1): time-based only (close after N days). Add trailing stop later.
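The three limits above (2% per signal, 10% per ticker, 20 open positions) can be checked in a pure function before any order is sent. This is a sketch with a hypothetical name and signature; `ticker_exposure` is the dollar value already held in the ticker, and `open_positions` is the number of distinct tickers currently held.

```python
def size_order(price, portfolio_value, ticker_exposure, open_positions,
               per_signal_pct=0.02, max_ticker_pct=0.10, max_positions=20):
    """Return the number of whole shares to buy, or 0 if a limit blocks the trade."""
    already_held = ticker_exposure > 0
    # A new ticker would add a position; adding to an existing one does not.
    if not already_held and open_positions >= max_positions:
        return 0
    room = portfolio_value * max_ticker_pct - ticker_exposure
    allocation = min(portfolio_value * per_signal_pct, room)
    if allocation <= 0 or price <= 0:
        return 0
    return int(allocation // price)
```

Keeping the sizing logic free of API calls means the backtester and the live executor can share it.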
Build Order
| Step | Deliverable | Est. Time |
|---|---|---|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
| 2 | Filter engine + cluster detector | 0.5 day |
| 3 | Slack alert | 1 hour |
| 4 | Historical data download + backtest | 1-2 days |
| 5 | Alpaca paper trading integration | 0.5 day |
| 6 | Run paper trading 4-8 weeks, monitor | -- |
| 7 | Switch to live with small capital | -- |
Do not proceed to Step 7 without meaningful paper trading history.
Dependencies
requests
lxml
sqlite3 (stdlib)
yfinance
vectorbt
alpaca-trade-api
python-dotenv
All free. No paid APIs required.
Config Template
# config.py
EDGAR_POLL_INTERVAL = 600 # seconds
MIN_TRANSACTION_VALUE = 50_000
MIN_CLUSTER_SIZE = 1 # raise to 2 for higher quality
CLUSTER_WINDOW_DAYS = 30
HOLDING_PERIOD_DAYS = 90
POSITION_SIZE_PCT = 0.02 # 2% per signal
MAX_POSITIONS = 20
SCORE_ALERT_THRESHOLD = 5.0
SLACK_WEBHOOK_URL = ""
ALPACA_KEY = ""
ALPACA_SECRET = ""
ALPACA_BASE_URL = "https://paper-api.alpaca.markets" # switch for live
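To keep secrets out of the repository, the empty strings above can be filled from environment variables. The sketch below is one possible approach, not the project's config loader: `load_settings` and the `ALPACA_LIVE` variable are hypothetical names, and it reads `os.environ` directly. With `python-dotenv` installed, calling `load_dotenv()` first would populate the environment from a `.env` file.

```python
import os

PAPER_URL = "https://paper-api.alpaca.markets"
LIVE_URL = "https://api.alpaca.markets"


def load_settings(env=None):
    """Read secrets and the paper/live switch from the environment.

    ALPACA_LIVE is a hypothetical flag: only the exact value "1" selects
    live trading, so a missing or mistyped value falls back to paper.
    """
    env = os.environ if env is None else env
    live = env.get("ALPACA_LIVE", "0") == "1"
    return {
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL", ""),
        "ALPACA_KEY": env.get("ALPACA_KEY", ""),
        "ALPACA_SECRET": env.get("ALPACA_SECRET", ""),
        "ALPACA_BASE_URL": LIVE_URL if live else PAPER_URL,
    }
```

Defaulting to the paper endpoint means a misconfigured environment cannot place live orders by accident.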