diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f0db139
--- /dev/null
+++ b/README.md
@@ -0,0 +1,105 @@
+# Cleopatra — Insider Copytrade POC
+
+Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
+
+## Architecture
+
+```
+EDGAR (Form 4 feed)
+        │
+        ▼
+ingestion/edgar_poller.py      ← polls every 10 min, dedupes by accession
+        │
+        ▼
+ingestion/form4_parser.py      ← parses XML, detects 10b5-1 plans
+        │
+        ▼
+db/schema.sql + db/db.py       ← SQLite (WAL mode): filings + signals tables
+        │
+        ▼
+signals/filter_engine.py       ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
+signals/cluster_detector.py    ← counts unique insiders per ticker in rolling 30-day window
+        │
+        ├──► alerts/slack_alert.py    ← POST to Slack webhook when score ≥ threshold
+        └──► broker/alpaca_client.py  ← paper/live order: 2% position size, 10% per-ticker cap
+```
+
+## Setup
+
+```bash
+cp .env.example .env
+# edit .env with your credentials
+pip install -r requirements.txt
+```
+
+### Environment variables (`.env`)
+
+| Variable | Required | Default | Description |
+|---|---|---|---|
+| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
+| `ALPACA_KEY` | optional | — | Alpaca API key |
+| `ALPACA_SECRET` | optional | — | Alpaca API secret |
+| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Use paper or live endpoint |
+| `DB_PATH` | optional | `insider.db` | SQLite database file path |
+| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |
+
+## Usage
+
+```bash
+# Initialize DB and ingest current EDGAR feed (one shot)
+python main.py fetch-once
+
+# Run continuous polling loop (every 10 minutes)
+python main.py run
+
+# Backtest signals already in the DB against historical prices
+python main.py backtest
+```
+
+## Key configuration (`config.py`)
+
+| Parameter | Default | Description |
+|---|---|---|
+| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
+| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
+| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires |
+| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
+| `HOLDING_PERIOD_DAYS` | 90 | Days held per position (backtest + close trigger) |
+| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
+| `MAX_POSITIONS` | 20 | Hard position limit |
+| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |
+
+## Scoring
+
+```
+score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))
+```
+
+Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
+
+## Backtesting
+
+The backtest loads signals from the SQLite DB and fetches OHLC data via `yfinance` on demand (no local price cache). Raw XML filings are cached in `DATA_DIR` (`data/filings/`) by accession number to avoid re-downloading.
+
+Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
+
+## Modules
+
+| Path | Purpose |
+|---|---|
+| `config.py` | All thresholds and env-var loading |
+| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
+| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
+| `db/schema.sql` | SQLite schema (`filings`, `signals`) |
+| `db/db.py` | DB access layer |
+| `signals/filter_engine.py` | Filing → signal pipeline |
+| `signals/cluster_detector.py` | Cluster detection from DB |
+| `alerts/slack_alert.py` | Slack webhook alert |
+| `broker/alpaca_client.py` | Alpaca order execution |
+| `backtest/backtest.py` | Historical backtest runner |
+| `main.py` | CLI entry point |
+
+## Requirements
+
+- Python 3.11+
+- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`
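
Note on the README's Scoring section: the formula can be sketched in a few lines of Python, which may help when checking it against `SCORE_ALERT_THRESHOLD`. This is an illustration only, not the code in `signals/filter_engine.py`. The function name `insider_score`, the `ROLE_WEIGHTS` dict, and the fallback weight for unlisted roles are hypothetical, and the README does not state the base of `log`, so the natural logarithm is assumed here.

```python
import math

# Role weights copied from the README's Scoring section.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def insider_score(role: str, total_value: float, cluster_size: int) -> float:
    """Compute role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))."""
    if total_value <= 0:
        raise ValueError("total_value must be positive")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Assumption: roles not listed in the README fall back to the lowest weight.
    role_weight = ROLE_WEIGHTS.get(role, 1.0)
    # Assumption: natural log; the README only writes `log`.
    cluster_multiplier = 1 + 0.5 * (cluster_size - 1)
    return role_weight * math.log(total_value) * cluster_multiplier


if __name__ == "__main__":
    # A lone Director buy versus the same buy inside a three-insider cluster.
    print(insider_score("Director", 50_000, 1))
    print(insider_score("Director", 50_000, 3))
```

Because the score depends on the log base, stating the base in the README alongside the default threshold of 5.0 would make the threshold easier to interpret.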