# Insider Copytrade System -- Implementation Plan

## Description

A personal system that monitors SEC EDGAR Form 4 filings in real-time, filters for high-quality insider buying signals, alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API. The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.

---

## Background

Company insiders (executives, directors, >10% shareholders) must file SEC Form 4 within 2 business days of any trade. This is public data via SEC EDGAR. The signal value of insider *buying* is academically documented -- executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:

- Multiple insiders buy simultaneously (cluster signal)
- The trade is unplanned (not a 10b5-1 scheduled plan)
- The company is small/mid-cap (less institutional arbitrage)

The edge vs. political trade copying: 2-day disclosure lag vs. 45 days, and the signal is company-specific rather than sector-level.

**Key risk:** This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.

---

## System Outline

```
SEC EDGAR RSS Feed (poll every 10 min)
             |
     [Ingestion Layer]
             |
     Parse Form 4 XML
             |
      [Filter Engine]
   - Buy only (flag = A)
   - Exclude 10b5-1 plans
   - Min transaction size
   - Role weighting
   - Cluster detection
             |
      SQLite Database
             |
┌────────────┬──────────────┐
|            |              |
[Backtester] [Slack Alert]  [Alpaca API]
             (manual)       (paper/live)
```

---

## Actionables

### Phase 1 -- Data Ingestion

**Goal:** Reliably pull and parse Form 4 filings as they appear.

**Tasks:**

1. 
   Set up project structure

   ```
   insider-copytrade/
     ingestion/
       edgar_poller.py       # polls EDGAR RSS
       form4_parser.py       # parses XML -> structured dict
     db/
       schema.sql
       db.py                 # SQLite interface
     signals/
       filter_engine.py      # applies signal filters
       cluster_detector.py
     alerts/
       slack_alert.py
     broker/
       alpaca_client.py
     backtest/
       backtest.py
     config.py
     main.py
   ```

2. Poll EDGAR RSS for Form 4 filings every 10 minutes (`output=atom` requests the feed rather than the HTML page):

   ```
   https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&search_text=&output=atom
   ```

   SEC also provides a structured latest filings feed:

   ```
   https://efts.sec.gov/LATEST/search-index?q=&forms=4
   ```

3. For each new filing, fetch and parse the XML document. Key fields to extract:
   - `issuerTradingSymbol` (ticker)
   - `rptOwnerName`, `officerTitle` (insider name + role)
   - `transactionDate`
   - `transactionAcquiredDisposedCode` (A = acquired, D = disposed; an acquisition is an open-market buy only when `transactionCode` is P)
   - `transactionShares`, `transactionPricePerShare`
   - `transactionTotalValue` (compute if not present)
   - `footnotes` (check for "10b5-1" mention)
   - `sharesOwnedFollowingTransaction`

4. Store raw filing XML + parsed fields. Track `accessionNumber` as dedup key.

**SQLite schema:**

```sql
CREATE TABLE filings (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  accession_number TEXT UNIQUE,
  ticker TEXT,
  cik TEXT,
  insider_name TEXT,
  role TEXT,
  transaction_date TEXT,
  filed_date TEXT,
  shares REAL,
  price REAL,
  total_value REAL,
  flag TEXT,              -- A or D
  is_10b51 INTEGER,       -- 0 or 1
  post_tx_shares REAL,
  created_at TEXT
);

CREATE TABLE signals (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ticker TEXT,
  trigger_date TEXT,
  cluster_size INTEGER,
  total_cluster_value REAL,
  score REAL,
  alerted INTEGER DEFAULT 0,
  executed INTEGER DEFAULT 0,
  created_at TEXT
);
```

---

### Phase 2 -- Filter Engine

**Goal:** Reduce noise to actionable signals only.
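
One possible shape for the filter engine in this phase is sketched here, covering the pass/fail filters and the v1 scoring formula described in this section. The `Filing` dataclass, the numeric values in `ROLE_WEIGHTS`, the `is_derivative` field, and the `transaction_code` check are all assumptions made for illustration. The plan only ranks roles as high, medium, or context-dependent, so the weights would need tuning during backtesting.

```python
from dataclasses import dataclass
from math import log

# Hypothetical weights: the plan ranks roles but does not assign numbers.
ROLE_WEIGHTS = {"ceo": 1.0, "cfo": 1.0, "president": 1.0, "director": 0.6}
DEFAULT_ROLE_WEIGHT = 0.4  # e.g. 10% owners, treated as context-dependent
PLAN_PHRASES = ("10b5-1", "rule 10b5", "adopted a plan")


@dataclass
class Filing:
    ticker: str
    role: str
    flag: str                    # 'A' (acquired) or 'D' (disposed)
    total_value: float
    footnotes: str = ""
    is_derivative: bool = False  # assumed to be set by the parser
    transaction_code: str = "P"  # assumption: P marks an open-market purchase


def is_planned(footnotes: str) -> bool:
    """True if the footnotes mention a 10b5-1 style trading plan."""
    text = footnotes.lower()
    return any(phrase in text for phrase in PLAN_PHRASES)


def passes_filters(f: Filing, min_value: float = 50_000) -> bool:
    """Apply the pass/fail filters in the order listed in the plan."""
    return (
        f.flag == "A"
        and f.transaction_code == "P"
        and not is_planned(f.footnotes)
        and f.total_value >= min_value
        and not f.is_derivative
    )


def role_weight(role: str) -> float:
    """Naive substring match on the title; real titles need normalizing."""
    title = role.lower()
    for key, weight in ROLE_WEIGHTS.items():
        if key in title:
            return weight
    return DEFAULT_ROLE_WEIGHT


def score(f: Filing, cluster_size: int) -> float:
    """v1 score: role weight * log(value) * cluster multiplier."""
    cluster_multiplier = 1.0 + 0.5 * (cluster_size - 1)
    return role_weight(f.role) * log(f.total_value) * cluster_multiplier
```

Keeping the pass/fail filters separate from the score makes it easier to sweep thresholds in a backtest without touching the scoring logic.
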

**Filters to apply (in order):**

| Filter | Logic |
|---|---|
| Buy only | `flag == 'A'` and `transactionCode` is `P` (open-market purchase; `A` alone also covers grants and option exercises) |
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
| Min transaction value | `total_value >= 50000` (configurable) |
| Exclude derivative transactions | Option exercises are a weaker signal than open-market purchases |
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
| Cluster detection | 2+ insiders buying same ticker within 30 days = elevated signal |

**Scoring formula (simple v1):**

```python
from math import log

# Each additional insider in the cluster adds 0.5 to the multiplier
cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
score = base_role_weight * log(total_value) * cluster_multiplier
```

Expose all thresholds in `config.py` for easy tuning during backtesting.

---

### Phase 3 -- SQLite Storage

SQLite is sufficient for this workload (low write volume, single process). Use WAL mode for concurrent reads during backtesting:

```python
import sqlite3

conn = sqlite3.connect('insider.db')
conn.execute('PRAGMA journal_mode=WAL')
```

Keep raw filing XML in a `/data/filings/` directory keyed by accession number. Parse once on ingest; re-parsing should never be needed.

---

### Phase 4 -- Slack Alerts

**Goal:** Get notified immediately when a signal fires, with enough context to decide manually.

1. Create a Slack app, get a webhook URL (takes 5 minutes)
2. Alert format:

   ```
   INSIDER BUY SIGNAL
   Ticker: $ACME
   Insider: John Smith (CEO)
   Date: 2025-05-01
   Shares: 10,000 @ $14.50 = $145,000
   Cluster: 3 insiders in last 14 days
   Score: 8.4
   10b5-1: No
   EDGAR: https://www.sec.gov/cgi-bin/browse-edgar?...
   ```

3. Alert only on signals above configurable score threshold
4. Mark `alerted = 1` in DB after sending to avoid duplicates on re-poll

```python
import requests

def send_slack_alert(webhook_url, signal):
    # format_signal() is expected to render the alert format shown above
    response = requests.post(webhook_url, json={"text": format_signal(signal)}, timeout=10)
    response.raise_for_status()  # surface failed deliveries instead of dropping them
```

---

### Phase 5 -- Backtesting

**Goal:** Validate filter parameters on historical data before going live.
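
The per-signal calculation in this phase (enter at the first market open after the filing date, exit N days later, compare against SPY over the same window) can be sketched in plain Python. The function names are hypothetical. The price inputs are assumed to be dictionaries mapping trading dates to open prices, and the holding period is assumed to be measured in calendar days. This is only an illustration of the arithmetic, not a replacement for a full portfolio simulation.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import Dict, Optional


def forward_return(prices: Dict[date, float], filed: date,
                   holding_days: int) -> Optional[float]:
    """Return from the first trading day after `filed` to the last trading
    day on or before entry + holding_days; None if history is too short."""
    days = sorted(prices)
    i = bisect_right(days, filed)  # first trading day strictly after filing
    if i >= len(days):
        return None
    entry_day = days[i]
    target = entry_day + timedelta(days=holding_days)
    if days[-1] < target:          # price history ends before the exit date
        return None
    j = bisect_right(days, target) - 1
    if j <= i:
        return None
    return prices[days[j]] / prices[entry_day] - 1.0


def excess_return(stock: Dict[date, float], benchmark: Dict[date, float],
                  filed: date, holding_days: int) -> Optional[float]:
    """Signal return minus benchmark (e.g. SPY) return over the same window."""
    r = forward_return(stock, filed, holding_days)
    b = forward_return(benchmark, filed, holding_days)
    if r is None or b is None:
        return None
    return r - b
```

Returning `None` when the history does not cover the full holding period keeps truncated windows out of the aggregate statistics.
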

**Data:**

- Historical Form 4 filings: download bulk XML from `https://www.sec.gov/dera/data/form-4-data`
- Price data: `yfinance` (free, sufficient for backtesting)

**Backtest logic:**

```python
# For each signal in historical data:
#   - Entry: next market open after filed_date
#   - Exit: N days later (configurable: 30/60/90/180)
#   - Calculate return vs SPY over same period
#   - Aggregate by role, cluster_size, market_cap bucket
```

**Use `vectorbt` for performance:**

```python
import vectorbt as vbt

# Build entry/exit signal matrices aligned to price data
# Run portfolio simulation with configurable position sizing
```

**Output metrics:**

- Annualized return vs SPY benchmark
- Win rate
- Avg return by holding period
- Avg return by role / cluster size
- Max drawdown
- Sharpe ratio

**Critical:** Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.

**Parameter grid to test:**

```python
MIN_VALUE = [25_000, 50_000, 100_000]
HOLDING_DAYS = [30, 60, 90, 180]
CLUSTER_WINDOW = [14, 30]
MIN_CLUSTER_SIZE = [1, 2, 3]
ROLES = ['all', 'c-suite-only']
```

---

### Phase 6 -- Alpaca Integration

**Goal:** Optionally auto-execute signals. Start with paper trading.

**Paper trading base URL:** `https://paper-api.alpaca.markets`

**Live trading base URL:** `https://api.alpaca.markets`

Swap via config flag -- never hardcode.

```python
from alpaca_trade_api import REST

api = REST(
    key_id=config.ALPACA_KEY,
    secret_key=config.ALPACA_SECRET,
    base_url=config.ALPACA_BASE_URL  # paper or live
)

def execute_signal(ticker, portfolio_value, signal_score):
    # Fixed fractional sizing: 2% of portfolio per signal
    price = api.get_latest_trade(ticker).price
    allocation = portfolio_value * 0.02
    qty = int(allocation / price)
    if qty < 1:
        return
    api.submit_order(
        symbol=ticker,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='day'
    )
```

Position sizing: start at 2% per signal, max 10% in any single ticker.
Add a max open positions limit (e.g. 20) to cap exposure.

Exit logic (v1): time-based only (close after N days). Add trailing stop later.

---

## Build Order

| Step | Deliverable | Est. Time |
|---|---|---|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
| 2 | Filter engine + cluster detector | 0.5 day |
| 3 | Slack alert | 1 hour |
| 4 | Historical data download + backtest | 1-2 days |
| 5 | Alpaca paper trading integration | 0.5 day |
| 6 | Run paper trading 4-8 weeks, monitor | -- |
| 7 | Switch to live with small capital | -- |

Do not proceed to Step 7 without meaningful paper trading history.

---

## Dependencies

```
requests
lxml
sqlite3 (stdlib)
yfinance
vectorbt
alpaca-trade-api
python-dotenv
```

All free. No paid APIs required.

---

## Config Template

```python
# config.py
EDGAR_POLL_INTERVAL = 600        # seconds
MIN_TRANSACTION_VALUE = 50_000
MIN_CLUSTER_SIZE = 1             # raise to 2 for higher quality
CLUSTER_WINDOW_DAYS = 30
HOLDING_PERIOD_DAYS = 90
POSITION_SIZE_PCT = 0.02         # 2% per signal
MAX_POSITIONS = 20
SCORE_ALERT_THRESHOLD = 5.0
SLACK_WEBHOOK_URL = ""
ALPACA_KEY = ""
ALPACA_SECRET = ""
ALPACA_BASE_URL = "https://paper-api.alpaca.markets"  # switch for live
```
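
Since the template leaves the secrets empty, one way to fill them without committing credentials is to read them from environment variables. The sketch below is a rough illustration under stated assumptions: `load_settings` and the `ALPACA_LIVE` switch are hypothetical names, and the environment variable names simply mirror the config keys. With `python-dotenv` installed, calling `load_dotenv()` before `load_settings()` would populate the environment from a local `.env` file.

```python
import os

PAPER_URL = "https://paper-api.alpaca.markets"
LIVE_URL = "https://api.alpaca.markets"


def load_settings(env=os.environ):
    """Build secret settings from environment variables (hypothetical names).

    Paper trading is the default; live trading needs ALPACA_LIVE=1.
    """
    live = env.get("ALPACA_LIVE", "0") == "1"
    settings = {
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL", ""),
        "ALPACA_KEY": env.get("ALPACA_KEY", ""),
        "ALPACA_SECRET": env.get("ALPACA_SECRET", ""),
        "ALPACA_BASE_URL": LIVE_URL if live else PAPER_URL,
    }
    # Fail fast rather than start polling with missing credentials.
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Making live trading an explicit opt-in keeps the paper endpoint as the default, in line with the build order.
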