# Insider Copytrade System -- Implementation Plan

## Description

A personal system that monitors SEC EDGAR Form 4 filings in real-time, filters for high-quality insider buying signals, alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API.

The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.

---

## Background

Company insiders (executives, directors, and >10% shareholders) must file SEC Form 4 within 2 business days of any trade. The filings are public via SEC EDGAR. The signal value of insider *buying* is academically documented: executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:

- Multiple insiders buy around the same time (cluster signal)
- The trade is unplanned (not a 10b5-1 scheduled plan)
- The company is small/mid-cap (less institutional arbitrage)

The edge over copying politicians' trades: the disclosure lag is 2 business days rather than 45 days, and the signal is company-specific rather than sector-level.

**Key risk:** This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.

---

## System Outline

```
SEC EDGAR RSS Feed (poll every 10 min)
                  |
          [Ingestion Layer]
                  |
          Parse Form 4 XML
                  |
           [Filter Engine]
             - Buy only (flag = A)
             - Exclude 10b5-1 plans
             - Min transaction size
             - Role weighting
             - Cluster detection
                  |
           SQLite Database
                  |
     ┌────────────┬──────────────┐
     |            |              |
[Backtester] [Slack Alert]  [Alpaca API]
               (manual)     (paper/live)
```

---

## Actionables

### Phase 1 -- Data Ingestion

**Goal:** Reliably pull and parse Form 4 filings as they appear.

**Tasks:**

1. Set up project structure
   ```
   insider-copytrade/
     ingestion/
       edgar_poller.py      # polls EDGAR RSS
       form4_parser.py      # parses XML -> structured dict
     db/
       schema.sql
       db.py                # SQLite interface
     signals/
       filter_engine.py     # applies signal filters
       cluster_detector.py
     alerts/
       slack_alert.py
     broker/
       alpaca_client.py
     backtest/
       backtest.py
     config.py
     main.py
   ```

2. Poll EDGAR RSS for Form 4 filings every 10 minutes:
   ```
   https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&search_text=&action=getcurrent
   ```
   SEC also provides a structured latest filings feed:
   ```
   https://efts.sec.gov/LATEST/search-index?q=&forms=4
   ```

3. For each new filing, fetch and parse the XML document. Key fields to extract:
   - `issuerTradingSymbol` (ticker)
   - `rptOwnerName`, `officerTitle` (insider name + role)
   - `transactionDate`
   - `transactionAcquiredDisposedCode` (A = acquired, D = disposed; A also covers grants and awards, so `transactionCode` = P is what marks an open-market purchase)
   - `transactionShares`, `transactionPricePerShare`
   - `transactionTotalValue` (compute if not present)
   - `footnotes` (check for "10b5-1" mention)
   - `sharesOwnedFollowingTransaction`

4. Store the raw filing XML plus the parsed fields. Use `accessionNumber` as the dedup key.

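As a rough illustration of step 3, the sketch below pulls the listed fields out of a Form 4 document with the standard library's `xml.etree.ElementTree` (the dependency list names `lxml`, whose `etree` API is similar). It assumes the layout of the SEC ownership XML, where per-transaction values sit in nested `<value>` elements under `nonDerivativeTransaction`. The `SAMPLE` document, the `parse_form4` name, and the returned dict keys are hypothetical and should be checked against real filings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed Form 4 document used only for illustration.
SAMPLE = """<ownershipDocument>
  <issuer><issuerTradingSymbol>ACME</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>Smith John</rptOwnerName></reportingOwnerId>
    <reportingOwnerRelationship><officerTitle>CEO</officerTitle></reportingOwnerRelationship>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionDate><value>2025-05-01</value></transactionDate>
      <transactionAmounts>
        <transactionShares><value>10000</value></transactionShares>
        <transactionPricePerShare><value>14.50</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
      <postTransactionAmounts>
        <sharesOwnedFollowingTransaction><value>60000</value></sharesOwnedFollowingTransaction>
      </postTransactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
  <footnotes><footnote id="F1">Open market purchase.</footnote></footnotes>
</ownershipDocument>"""


def _text(node, path, default=""):
    """Return stripped text at `path` below `node`, or `default` if missing."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else default


def parse_form4(xml_text):
    """Parse one Form 4 document into a list of per-transaction dicts."""
    root = ET.fromstring(xml_text)
    footnotes = " ".join((fn.text or "").strip() for fn in root.iter("footnote"))
    rows = []
    # Only non-derivative (common stock) transactions are read here.
    for tx in root.iter("nonDerivativeTransaction"):
        shares = float(_text(tx, ".//transactionShares/value", "0"))
        price = float(_text(tx, ".//transactionPricePerShare/value", "0"))
        rows.append({
            "ticker": _text(root, ".//issuerTradingSymbol"),
            "insider_name": _text(root, ".//rptOwnerName"),
            "role": _text(root, ".//officerTitle"),
            "transaction_date": _text(tx, ".//transactionDate/value"),
            "flag": _text(tx, ".//transactionAcquiredDisposedCode/value"),
            "shares": shares,
            "price": price,
            "total_value": shares * price,  # computed from shares and price
            "post_tx_shares": float(
                _text(tx, ".//sharesOwnedFollowingTransaction/value", "0")
            ),
            "is_10b51": int("10b5-1" in footnotes.lower()),
        })
    return rows


rows = parse_form4(SAMPLE)
```
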
**SQLite schema:**

```sql
CREATE TABLE filings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    accession_number TEXT UNIQUE,
    ticker TEXT,
    cik TEXT,
    insider_name TEXT,
    role TEXT,
    transaction_date TEXT,
    filed_date TEXT,
    shares REAL,
    price REAL,
    total_value REAL,
    flag TEXT,              -- A or D
    is_10b51 INTEGER,       -- 0 or 1
    post_tx_shares REAL,
    created_at TEXT
);

CREATE TABLE signals (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT,
    trigger_date TEXT,
    cluster_size INTEGER,
    total_cluster_value REAL,
    score REAL,
    alerted INTEGER DEFAULT 0,
    executed INTEGER DEFAULT 0,
    created_at TEXT
);
```

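One way to make the accession-number dedup concrete is to let the `UNIQUE` constraint do the work and insert with `INSERT OR IGNORE`, so a filing seen again on a later poll is skipped. This is only a sketch: it uses an in-memory database and a trimmed copy of the `filings` table, and `store_filing` is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime, timezone

# Trimmed version of the filings table, enough to show the dedup behaviour.
SCHEMA = """
CREATE TABLE IF NOT EXISTS filings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    accession_number TEXT UNIQUE,
    ticker TEXT,
    total_value REAL,
    flag TEXT,
    created_at TEXT
);
"""


def store_filing(conn, filing):
    """Insert a parsed filing; return True if new, False if already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO filings "
        "(accession_number, ticker, total_value, flag, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            filing["accession_number"],
            filing["ticker"],
            filing["total_value"],
            filing["flag"],
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE accession_number already existed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
filing = {
    "accession_number": "0000000000-25-000001",  # made-up accession number
    "ticker": "ACME",
    "total_value": 145000.0,
    "flag": "A",
}
first = store_filing(conn, filing)   # new row is inserted
second = store_filing(conn, filing)  # duplicate is ignored
```
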
---

### Phase 2 -- Filter Engine

**Goal:** Reduce noise to actionable signals only.

**Filters to apply (in order):**

| Filter | Logic |
|---|---|
| Buy only | `flag == 'A'` |
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
| Min transaction value | `total_value >= 50000` (configurable) |
| Exclude derivative transactions | Option exercises are a weaker signal than open-market purchases |
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
| Cluster detection | 2+ insiders buying the same ticker within 30 days = elevated signal |

**Scoring formula (simple v1):**

```python
from math import log  # natural log

cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
score = base_role_weight * log(total_value) * cluster_multiplier
```

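The filters and the scoring formula above could be wired together roughly as follows. This is a sketch under stated assumptions: the numeric role weights are invented for illustration (the plan only ranks roles as high, medium, or context-dependent), the title matching is a naive keyword search, and filings are assumed to be dicts whose `transaction_date` is a `datetime.date` rather than the TEXT stored in SQLite. All function names are hypothetical.

```python
from datetime import date, timedelta
from math import log

# Illustrative numbers only: the plan ranks roles but assigns no weights.
ROLE_WEIGHTS = {"ceo": 1.0, "cfo": 1.0, "president": 1.0, "director": 0.6}
DEFAULT_ROLE_WEIGHT = 0.4        # assumed fallback, e.g. for 10% owners
MIN_TRANSACTION_VALUE = 50_000   # mirrors config.py
CLUSTER_WINDOW_DAYS = 30         # mirrors config.py
PLAN_PHRASES = ("10b5-1", "rule 10b5", "adopted a plan")


def passes_filters(filing):
    """Buy-only, no 10b5-1 mention in footnotes, minimum dollar value."""
    footnotes = filing.get("footnotes", "").lower()
    if filing["flag"] != "A":
        return False
    if any(phrase in footnotes for phrase in PLAN_PHRASES):
        return False
    return filing["total_value"] >= MIN_TRANSACTION_VALUE


def cluster_size(buys, ticker, as_of, window_days=CLUSTER_WINDOW_DAYS):
    """Count distinct insiders buying `ticker` in the window ending `as_of`."""
    start = as_of - timedelta(days=window_days)
    return len({
        b["insider_name"]
        for b in buys
        if b["ticker"] == ticker and start <= b["transaction_date"] <= as_of
    })


def role_weight(role):
    """Naive keyword match on the free-text officer title."""
    title = role.lower()
    for keyword, weight in ROLE_WEIGHTS.items():
        if keyword in title:
            return weight
    return DEFAULT_ROLE_WEIGHT


def score_signal(role, total_value, size):
    """score = base_role_weight * log(total_value) * cluster_multiplier."""
    cluster_multiplier = 1.0 + 0.5 * (size - 1)
    return role_weight(role) * log(total_value) * cluster_multiplier
```
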
Expose all thresholds in `config.py` for easy tuning during backtesting.

---

### Phase 3 -- SQLite Storage

SQLite is sufficient for this workload (low write volume, single writer process). Use WAL mode so the backtester can read while the poller writes:

```python
import sqlite3

conn = sqlite3.connect('insider.db')
conn.execute('PRAGMA journal_mode=WAL')  # readers and the writer do not block each other
```

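To check that WAL mode is actually in effect, the pragma's return value can be inspected, since SQLite reports the journal mode it ended up with. The sketch below does that and then shows a reader querying while a writer holds an uncommitted insert. The `connect` helper and the temporary database path are assumptions made for the demo.

```python
import os
import sqlite3
import tempfile


def connect(db_path):
    """Open the database and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    return conn


# Demo database in a temporary directory rather than insider.db.
db_path = os.path.join(tempfile.mkdtemp(), "insider.db")

writer = connect(db_path)
writer.execute("CREATE TABLE signals (ticker TEXT)")
writer.commit()
writer.execute("INSERT INTO signals VALUES ('ACME')")  # transaction left open

# A second connection can still read; it sees only committed rows.
reader = sqlite3.connect(db_path)
visible_before = reader.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
writer.commit()
visible_after = reader.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
```
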
Keep raw filing XML in a `/data/filings/` directory keyed by accession number. Filings are parsed once on ingest, so routine re-parsing is not needed.

---

### Phase 4 -- Slack Alerts

**Goal:** Get notified immediately when a signal fires, with enough context to decide manually.

1. Create a Slack app, get a webhook URL (takes 5 minutes)
2. Alert format:

```
INSIDER BUY SIGNAL
Ticker: $ACME
Insider: John Smith (CEO)
Date: 2025-05-01
Shares: 10,000 @ $14.50 = $145,000
Cluster: 3 insiders in last 14 days
Score: 8.4
10b5-1: No
EDGAR: https://www.sec.gov/cgi-bin/browse-edgar?...
```

3. Alert only on signals above a configurable score threshold
4. Mark `alerted = 1` in DB after sending to avoid duplicates on re-poll

```python
import requests

def send_slack_alert(webhook_url, signal):
    # format_signal() is expected to render the alert text in the format above
    resp = requests.post(webhook_url, json={"text": format_signal(signal)}, timeout=10)
    resp.raise_for_status()  # fail loudly so `alerted` is not set on a failed send
```

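The `format_signal` helper used by `send_slack_alert` is not defined in the plan. A minimal sketch of it, together with the score gate from steps 3 and 4, might look like the following. The dict keys (`window_days`, `edgar_url`, and so on) are assumed names, not an existing schema.

```python
def format_signal(signal):
    """Render a signal dict as the plain-text alert format shown above."""
    total = signal["shares"] * signal["price"]
    return "\n".join([
        "INSIDER BUY SIGNAL",
        f"Ticker: ${signal['ticker']}",
        f"Insider: {signal['insider_name']} ({signal['role']})",
        f"Date: {signal['transaction_date']}",
        f"Shares: {signal['shares']:,.0f} @ ${signal['price']:,.2f} = ${total:,.0f}",
        f"Cluster: {signal['cluster_size']} insiders in last {signal['window_days']} days",
        f"Score: {signal['score']:.1f}",
        f"10b5-1: {'Yes' if signal['is_10b51'] else 'No'}",
        f"EDGAR: {signal['edgar_url']}",
    ])


def should_alert(signal, threshold=5.0):
    """Gate on the score threshold and on the `alerted` flag (no duplicates)."""
    return signal["score"] >= threshold and not signal.get("alerted", 0)


# Made-up signal matching the example alert.
example = {
    "ticker": "ACME", "insider_name": "John Smith", "role": "CEO",
    "transaction_date": "2025-05-01", "shares": 10000, "price": 14.5,
    "cluster_size": 3, "window_days": 14, "score": 8.4, "is_10b51": 0,
    "edgar_url": "https://www.sec.gov/",  # placeholder link
}
text = format_signal(example)
```
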
---

### Phase 5 -- Backtesting

**Goal:** Validate filter parameters on historical data before going live.

**Data:**

- Historical Form 4 filings: download bulk XML from `https://www.sec.gov/dera/data/form-4-data`
- Price data: `yfinance` (free, sufficient for backtesting)

**Backtest logic:**

```python
# For each signal in historical data:
#   - Entry: next market open after filed_date
#   - Exit: N days later (configurable: 30/60/90/180)
#   - Calculate return vs SPY over same period
#   - Aggregate by role, cluster_size, market_cap bucket
```

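The entry and exit rules above can be sketched without any price library by working on plain lists of daily bars. The version below assumes bars are `(date, open, close)` tuples sorted by date, treats "N days later" as calendar days counted from the filing date, and exits at the last close on or before that date. These are interpretive choices, and the function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def holding_return(bars, filed_date, holding_days):
    """Return from the first open after `filed_date` to the last close on or
    before `filed_date + holding_days`, or None if the data does not cover it.

    `bars` is a list of (date, open, close) tuples sorted by date.
    """
    exit_date = filed_date + timedelta(days=holding_days)
    dates = [d for d, _, _ in bars]
    if not dates or dates[-1] < exit_date:
        return None                      # history ends before the planned exit
    entry_idx = bisect_right(dates, filed_date)     # first bar after the filing
    exit_idx = bisect_right(dates, exit_date) - 1   # last bar up to exit_date
    if exit_idx < entry_idx:
        return None                      # no trading day inside the window
    entry_price = bars[entry_idx][1]     # open
    exit_price = bars[exit_idx][2]       # close
    return exit_price / entry_price - 1.0


def excess_return(stock_bars, spy_bars, filed_date, holding_days):
    """Stock return minus benchmark return over the same window."""
    stock = holding_return(stock_bars, filed_date, holding_days)
    bench = holding_return(spy_bars, filed_date, holding_days)
    if stock is None or bench is None:
        return None
    return stock - bench


# Made-up prices for illustration.
acme = [
    (date(2025, 5, 1), 10.0, 10.0),
    (date(2025, 5, 2), 10.0, 11.0),
    (date(2025, 5, 5), 11.0, 12.0),
    (date(2025, 5, 6), 12.0, 13.0),
]
spy = [
    (date(2025, 5, 1), 100.0, 100.0),
    (date(2025, 5, 2), 100.0, 101.0),
    (date(2025, 5, 5), 101.0, 105.0),
    (date(2025, 5, 6), 105.0, 106.0),
]
result = excess_return(acme, spy, filed_date=date(2025, 5, 1), holding_days=4)
```
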
**Use `vectorbt` for performance:**

```python
import vectorbt as vbt
# Build entry/exit signal matrices aligned to price data
# Run portfolio simulation with configurable position sizing
```

**Output metrics:**

- Annualized return vs SPY benchmark
- Win rate
- Avg return by holding period
- Avg return by role / cluster size
- Max drawdown
- Sharpe ratio

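Three of these metrics are simple enough to compute by hand, which can be a useful cross-check on whatever the backtesting library reports. The sketch below uses only the standard library. The function names are hypothetical, and the Sharpe calculation assumes evenly spaced per-period returns and a sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def win_rate(returns):
    """Fraction of trades with a strictly positive return."""
    return sum(r > 0 for r in returns) / len(returns)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(period_returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free / periods_per_year for r in period_returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)
```
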
**Critical:** Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.

**Parameter grid to test:**

```python
MIN_VALUE = [25_000, 50_000, 100_000]
HOLDING_DAYS = [30, 60, 90, 180]
CLUSTER_WINDOW = [14, 30]
MIN_CLUSTER_SIZE = [1, 2, 3]
ROLES = ['all', 'c-suite-only']
```

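Enumerating the grid is a Cartesian product over the five lists. A small sketch with `itertools.product` follows; the `PARAM_GRID` dict and `iter_param_sets` are hypothetical names. The grid yields 144 combinations, so some parameter sets will look good by chance and results are best judged across neighbouring settings rather than by the single top row.

```python
from itertools import product

PARAM_GRID = {
    "min_value": [25_000, 50_000, 100_000],
    "holding_days": [30, 60, 90, 180],
    "cluster_window": [14, 30],
    "min_cluster_size": [1, 2, 3],
    "roles": ["all", "c-suite-only"],
}


def iter_param_sets(grid):
    """Yield one dict per combination in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


param_sets = list(iter_param_sets(PARAM_GRID))
print(len(param_sets))  # 3 * 4 * 2 * 3 * 2 = 144 combinations
```
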
---

### Phase 6 -- Alpaca Integration

**Goal:** Optionally auto-execute signals. Start with paper trading.

**Paper trading base URL:** `https://paper-api.alpaca.markets`
**Live trading base URL:** `https://api.alpaca.markets`

Swap via config flag -- never hardcode.

```python
from alpaca_trade_api import REST

import config

api = REST(
    key_id=config.ALPACA_KEY,
    secret_key=config.ALPACA_SECRET,
    base_url=config.ALPACA_BASE_URL  # paper or live
)

def execute_signal(ticker, portfolio_value, signal_score):
    # Fixed fractional sizing: 2% of portfolio per signal
    # (signal_score is accepted but not used for sizing in v1)
    price = api.get_latest_trade(ticker).price
    allocation = portfolio_value * config.POSITION_SIZE_PCT
    qty = int(allocation / price)
    if qty < 1:
        return
    api.submit_order(
        symbol=ticker,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='day'
    )
```

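`execute_signal` sizes every order at a fixed fraction of the portfolio, but it does not yet enforce the exposure limits this phase calls for (a 10% cap per ticker and a maximum number of open positions) or the time-based exit. A minimal sketch of those rules as pure functions, with no broker calls, could look like this. `MAX_TICKER_PCT` and both function names are assumptions; the other constants mirror `config.py`.

```python
from datetime import date, timedelta

POSITION_SIZE_PCT = 0.02   # 2% of portfolio per signal (config.py)
MAX_TICKER_PCT = 0.10      # assumed name for the 10% single-ticker cap
MAX_POSITIONS = 20         # config.py
HOLDING_PERIOD_DAYS = 90   # config.py


def order_qty(price, portfolio_value, ticker_exposure, open_positions):
    """Whole shares to buy for one signal, or 0 if a cap blocks the trade.

    `ticker_exposure` is the dollar value already held in this ticker and
    `open_positions` is the number of distinct tickers currently held.
    """
    if ticker_exposure == 0 and open_positions >= MAX_POSITIONS:
        return 0  # would open a new position beyond the limit
    room = portfolio_value * MAX_TICKER_PCT - ticker_exposure
    allocation = min(portfolio_value * POSITION_SIZE_PCT, room)
    return max(int(allocation // price), 0)


def should_exit(entry_date, today, holding_days=HOLDING_PERIOD_DAYS):
    """Time-based exit: close once the holding period has elapsed."""
    return today >= entry_date + timedelta(days=holding_days)
```
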
Position sizing: start at 2% per signal, max 10% in any single ticker. Add a max open positions limit (e.g. 20) to cap exposure.

Exit logic (v1): time-based only (close after N days). Add trailing stop later.

---

## Build Order

| Step | Deliverable | Est. Time |
|---|---|---|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
| 2 | Filter engine + cluster detector | 0.5 day |
| 3 | Slack alert | 1 hour |
| 4 | Historical data download + backtest | 1-2 days |
| 5 | Alpaca paper trading integration | 0.5 day |
| 6 | Run paper trading 4-8 weeks, monitor | -- |
| 7 | Switch to live with small capital | -- |

Do not proceed to Step 7 without meaningful paper trading history.

---

## Dependencies

```
requests
lxml
sqlite3 (stdlib)
yfinance
vectorbt
alpaca-trade-api
python-dotenv
```

All free. No paid APIs required.

---

## Config Template

```python
# config.py
EDGAR_POLL_INTERVAL = 600        # seconds
MIN_TRANSACTION_VALUE = 50_000
MIN_CLUSTER_SIZE = 1             # raise to 2 for higher quality
CLUSTER_WINDOW_DAYS = 30
HOLDING_PERIOD_DAYS = 90
POSITION_SIZE_PCT = 0.02         # 2% per signal
MAX_POSITIONS = 20
SCORE_ALERT_THRESHOLD = 5.0

SLACK_WEBHOOK_URL = ""
ALPACA_KEY = ""
ALPACA_SECRET = ""
ALPACA_BASE_URL = "https://paper-api.alpaca.markets"  # switch for live
```
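Since `python-dotenv` is in the dependency list, one option is to keep the secrets out of `config.py` and read them from the environment instead. The sketch below uses only `os.environ`; calling `load_dotenv()` from `python-dotenv` beforehand would populate the environment from a local `.env` file. The `ALPACA_LIVE` flag and the `load_secrets` helper are hypothetical names, chosen so that paper trading stays the default.

```python
import os

PAPER_URL = "https://paper-api.alpaca.markets"
LIVE_URL = "https://api.alpaca.markets"


def load_secrets(env=os.environ):
    """Read credentials from the environment; default to paper trading."""
    live = env.get("ALPACA_LIVE", "0") == "1"  # hypothetical opt-in flag
    return {
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL", ""),
        "ALPACA_KEY": env.get("ALPACA_KEY", ""),
        "ALPACA_SECRET": env.get("ALPACA_SECRET", ""),
        "ALPACA_BASE_URL": LIVE_URL if live else PAPER_URL,
    }


# Example with an explicit mapping instead of the real environment.
settings = load_secrets({"ALPACA_KEY": "demo-key"})
```
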