feat: Insider Copytrade POC + PLAN.md #2

dodox merged 4 commits from claude/issue-1-insider-copytrade-poc into master 2026-05-04 19:38:21 +02:00
Showing only changes of commit 8c0085e503

README.md
# Cleopatra — Insider Copytrade POC
Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
## Architecture
```
EDGAR (Form 4 feed)
    │
    ▼
ingestion/edgar_poller.py      ← polls every 10 min, dedupes by accession
    │
    ▼
ingestion/form4_parser.py      ← parses XML, detects 10b5-1 plans
    │
    ▼
db/schema.sql + db/db.py       ← SQLite (WAL mode): filings + signals tables
    │
    ▼
signals/filter_engine.py       ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
    │
    ▼
signals/cluster_detector.py    ← counts unique insiders per ticker in rolling 30-day window
    │
    ├──► alerts/slack_alert.py       ← POST to Slack webhook when score ≥ threshold
    └──► broker/alpaca_client.py     ← paper/live order: 2% position size, 10% per-ticker cap
```
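A minimal sketch of two steps in the pipeline above: dedupe by accession and the rolling-window cluster count. It assumes a simplified `filings` table with hypothetical columns (`accession`, `ticker`, `insider`, `filed_on`); the real columns are defined in `db/schema.sql`, and `connect`, `insert_filing` and `cluster_size` are illustrative names, not the project's API.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical, simplified schema; the real columns live in db/schema.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS filings (
    accession TEXT PRIMARY KEY,  -- EDGAR accession number (dedupe key)
    ticker    TEXT NOT NULL,
    insider   TEXT NOT NULL,
    filed_on  TEXT NOT NULL      -- ISO date, so string order = date order
);
"""


def connect(path: str = "insider.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers (e.g. a backtest) query while the poller writes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def insert_filing(conn: sqlite3.Connection, accession: str, ticker: str,
                  insider: str, filed_on: str) -> bool:
    """Store a filing; return False if this accession was already seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO filings (accession, ticker, insider, filed_on) "
        "VALUES (?, ?, ?, ?)",
        (accession, ticker, insider, filed_on),
    )
    conn.commit()
    return cur.rowcount == 1


def cluster_size(conn: sqlite3.Connection, ticker: str, as_of: date,
                 window_days: int = 30) -> int:
    """Unique insiders filing for `ticker` in (as_of - window_days, as_of]."""
    start = (as_of - timedelta(days=window_days)).isoformat()
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT insider) FROM filings "
        "WHERE ticker = ? AND filed_on > ? AND filed_on <= ?",
        (ticker, start, as_of.isoformat()),
    ).fetchone()
    return count
```

`INSERT OR IGNORE` on the primary key makes re-polling the same feed idempotent, and `COUNT(DISTINCT insider)` keeps repeat filings by one insider from inflating the cluster.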
## Setup
```bash
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
```
### Environment variables (`.env`)
| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | — | Alpaca API key |
| `ALPACA_SECRET` | optional | — | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Alpaca endpoint; the default is paper trading, set `https://api.alpaca.markets` for live |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |
## Usage
```bash
# Initialize DB and ingest current EDGAR feed (one shot)
python main.py fetch-once
# Run continuous polling loop (every 10 minutes)
python main.py run
# Backtest signals already in the DB against historical prices
python main.py backtest
```
## Key configuration (`config.py`)
| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days each position is held; used by the backtest and as the trigger to close positions |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |
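A sketch of how the sizing limits could combine for one signal: 2% of equity per trade, capped by the 10% per-ticker limit from the architecture diagram and by `MAX_POSITIONS`. `MAX_TICKER_PCT` and `shares_to_buy` are hypothetical names, and whole-share orders are an assumption.

```python
import math

POSITION_SIZE_PCT = 0.02   # from the config table
MAX_POSITIONS = 20         # from the config table
MAX_TICKER_PCT = 0.10      # hypothetical name for the 10% per-ticker cap


def shares_to_buy(equity: float, price: float, ticker_exposure: float,
                  open_positions: int) -> int:
    """Whole shares to buy for one signal; 0 when a limit blocks the trade.

    equity: current portfolio value; ticker_exposure: market value already
    held in this ticker; open_positions: number of distinct open positions.
    """
    if price <= 0:
        return 0
    # A new ticker needs a free slot; adding to an existing one does not.
    if ticker_exposure == 0 and open_positions >= MAX_POSITIONS:
        return 0
    target = equity * POSITION_SIZE_PCT
    headroom = max(equity * MAX_TICKER_PCT - ticker_exposure, 0.0)
    return math.floor(min(target, headroom) / price)
```

The resulting quantity could then be sent through `alpaca-trade-api`, e.g. `REST.submit_order(symbol=..., qty=..., side="buy", type="market", time_in_force="day")`.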
## Scoring
```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
```
Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
## Backtesting
The backtest loads signals from the SQLite DB and fetches OHLC data via `yfinance` on demand (there is no local price cache). Raw Form 4 XML filings, by contrast, are cached in `DATA_DIR` (default `data/filings/`), keyed by accession number, so they are not re-downloaded.
Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
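A sketch of how these metrics could be computed from per-trade returns over the holding period. Several definitions are assumptions: "alpha" is taken as simple excess return over SPY for the same window (not a regression alpha), and Sharpe is the per-trade mean over standard deviation with a zero risk-free rate and no annualization. `summarize` and `holding_return` are hypothetical names.

```python
import statistics


def holding_return(entry_close: float, exit_close: float) -> float:
    """Simple return between entry and exit closes."""
    return exit_close / entry_close - 1.0


def summarize(trade_returns: list[float], spy_returns: list[float]) -> dict[str, float]:
    """Backtest metrics; spy_returns[i] covers the same window as trade_returns[i]."""
    if not trade_returns or len(trade_returns) != len(spy_returns):
        raise ValueError("need equal-length, non-empty return lists")
    mean = statistics.fmean(trade_returns)
    sd = statistics.stdev(trade_returns) if len(trade_returns) > 1 else 0.0
    alphas = [r - b for r, b in zip(trade_returns, spy_returns)]
    return {
        "win_rate": sum(r > 0 for r in trade_returns) / len(trade_returns),
        "avg_return": mean,
        "avg_alpha": statistics.fmean(alphas),      # excess return vs SPY
        "sharpe": mean / sd if sd > 0 else float("nan"),  # per trade, rf = 0
    }
```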
## Modules
| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
| `db/schema.sql` | SQLite schema (`filings`, `signals`) |
| `db/db.py` | DB access layer |
| `signals/filter_engine.py` | Filing → signal pipeline |
| `signals/cluster_detector.py` | Cluster detection from DB |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution |
| `backtest/backtest.py` | Historical backtest runner |
| `main.py` | CLI entry point |
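A stdlib-only sketch of the parse-then-filter path (`ingestion/form4_parser.py` into `signals/filter_engine.py`). It uses `xml.etree.ElementTree` instead of `lxml` so it runs without extra dependencies, reads elements from the EDGAR ownership XML (`ownershipDocument`, `nonDerivativeTransaction`, `transactionCode`, `transactionAcquiredDisposedCode`), and treats transaction code `P` (open-market purchase) as a buy. The 10b5-1 check is a hypothetical heuristic that searches footnote text; the amended Form 4 also has a dedicated 10b5-1 checkbox, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

MIN_TRANSACTION_VALUE = 50_000  # config.py default


def _text(node: ET.Element, path: str, default: str = "") -> str:
    """Stripped text at `path`, or `default` if the element is missing/empty."""
    found = node.find(path)
    if found is None or found.text is None or not found.text.strip():
        return default
    return found.text.strip()


def parse_form4(xml_text: str) -> dict:
    """Flatten a Form 4 ownershipDocument into the fields the filter needs."""
    root = ET.fromstring(xml_text)
    footnotes = " ".join(fn.text or "" for fn in root.iter("footnote"))
    transactions = []
    for txn in root.findall("nonDerivativeTable/nonDerivativeTransaction"):
        shares = float(_text(txn, "transactionAmounts/transactionShares/value", "0"))
        price = float(_text(txn, "transactionAmounts/transactionPricePerShare/value", "0"))
        transactions.append({
            "code": _text(txn, "transactionCoding/transactionCode"),
            "acquired": _text(
                txn, "transactionAmounts/transactionAcquiredDisposedCode/value") == "A",
            "value": shares * price,
        })
    return {
        "ticker": _text(root, "issuer/issuerTradingSymbol"),
        "insider": _text(root, "reportingOwner/reportingOwnerId/rptOwnerName"),
        "title": _text(root, "reportingOwner/reportingOwnerRelationship/officerTitle"),
        # Hypothetical heuristic: flag plan trades mentioned in footnotes.
        "is_10b5_1": "10b5-1" in footnotes,
        "transactions": transactions,
    }


def total_buy_value(filing: dict) -> float:
    """Sum of open-market purchases (code P, acquired) in one filing."""
    return sum(t["value"] for t in filing["transactions"]
               if t["code"] == "P" and t["acquired"])


def passes_filter(filing: dict) -> bool:
    """Buy-only, exclude 10b5-1 plan trades, require at least $50k bought."""
    return not filing["is_10b5_1"] and total_buy_value(filing) >= MIN_TRANSACTION_VALUE
```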
## Requirements
- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`