smaug/README.md
Claude 8c0085e503 docs: add README
Co-authored-by: dodox <dodox@users.noreply.local>
2026-05-04 16:24:25 +00:00

# Cleopatra — Insider Copytrade POC
Monitors SEC EDGAR Form 4 filings in near real-time, detects insider buy clusters, sends Slack alerts, and optionally executes trades via Alpaca.
## Architecture
```
EDGAR (Form 4 feed)
        │
        ▼
ingestion/edgar_poller.py     ← polls every 10 min, dedupes by accession
        │
        ▼
ingestion/form4_parser.py     ← parses XML, detects 10b5-1 plans
        │
        ▼
db/schema.sql + db/db.py      ← SQLite (WAL mode): filings + signals tables
        │
        ▼
signals/filter_engine.py      ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
        │
        ▼
signals/cluster_detector.py   ← counts unique insiders per ticker in rolling 30-day window
        │
        ├──► alerts/slack_alert.py    ← POST to Slack webhook when score ≥ threshold
        └──► broker/alpaca_client.py  ← paper/live order: 2% position size, 10% per-ticker cap
```
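
The cluster-counting stage in the diagram can be illustrated with a short sketch. This is not the code in `signals/cluster_detector.py`: the `Filing` record and the `count_cluster` helper are hypothetical names, and the window is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass
from datetime import date, timedelta

CLUSTER_WINDOW_DAYS = 30  # README default


@dataclass(frozen=True)
class Filing:
    """Hypothetical minimal record; the real schema lives in db/schema.sql."""
    ticker: str
    insider: str
    filed_on: date


def count_cluster(filings, ticker, as_of, window_days=CLUSTER_WINDOW_DAYS):
    """Count distinct insiders who filed for `ticker` in the window ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    insiders = {
        f.insider
        for f in filings
        if f.ticker == ticker and start <= f.filed_on <= as_of
    }
    return len(insiders)


# Made-up sample data, for illustration only.
filings = [
    Filing("ACME", "Jane Doe", date(2026, 4, 1)),
    Filing("ACME", "John Roe", date(2026, 4, 20)),
    Filing("ACME", "Jane Doe", date(2026, 4, 25)),  # same insider, counted once
    Filing("ACME", "Old Timer", date(2026, 1, 5)),  # outside the window
    Filing("OTHR", "Someone", date(2026, 4, 22)),   # different ticker
]
cluster_size = count_cluster(filings, "ACME", as_of=date(2026, 5, 1))
print(cluster_size)
```

Counting distinct insiders rather than filings means repeated purchases by one person do not inflate the cluster size.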
## Setup
```bash
cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt
```
### Environment variables (`.env`)
| Variable | Required | Default | Description |
|---|---|---|---|
| `SLACK_WEBHOOK_URL` | optional | — | Incoming webhook URL for alerts |
| `ALPACA_KEY` | optional | — | Alpaca API key |
| `ALPACA_SECRET` | optional | — | Alpaca API secret |
| `ALPACA_BASE_URL` | optional | `https://paper-api.alpaca.markets` | Alpaca API endpoint; the default targets paper trading, override it to trade live |
| `DB_PATH` | optional | `insider.db` | SQLite database file path |
| `DATA_DIR` | optional | `data/filings` | Directory for cached raw XML filings |
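
A minimal sketch of how these variables and their defaults might be resolved. The real loading lives in `config.py` and presumably calls `python-dotenv` first to copy `.env` into the process environment; the `load_settings` helper below is a hypothetical name and uses only the standard library.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above.
DEFAULTS = {
    "ALPACA_BASE_URL": "https://paper-api.alpaca.markets",
    "DB_PATH": "insider.db",
    "DATA_DIR": "data/filings",
}
# Optional variables with no default; None means the feature is not configured.
OPTIONAL_NO_DEFAULT = ("SLACK_WEBHOOK_URL", "ALPACA_KEY", "ALPACA_SECRET")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from `env` (defaults to os.environ), applying table defaults."""
    env = os.environ if env is None else env
    # Empty strings are treated the same as unset variables.
    settings = {name: env.get(name) or None for name in OPTIONAL_NO_DEFAULT}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings


settings = load_settings({"DB_PATH": "test.db"})
print(settings["DB_PATH"], settings["ALPACA_BASE_URL"])
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.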
## Usage
```bash
# Initialize DB and ingest current EDGAR feed (one shot)
python main.py fetch-once
# Run continuous polling loop (every 10 minutes)
python main.py run
# Backtest signals already in the DB against historical prices
python main.py backtest
```
## Key configuration (`config.py`)
| Parameter | Default | Description |
|---|---|---|
| `EDGAR_POLL_INTERVAL` | 600 s | Polling cadence |
| `MIN_TRANSACTION_VALUE` | $50,000 | Ignore buys below this |
| `MIN_CLUSTER_SIZE` | 1 | Minimum unique insiders before a signal fires (at the default of 1, a single insider's buy is enough) |
| `CLUSTER_WINDOW_DAYS` | 30 | Rolling window for cluster counting |
| `HOLDING_PERIOD_DAYS` | 90 | Days each position is held; used by the backtest and as the close trigger |
| `POSITION_SIZE_PCT` | 2% | Fraction of portfolio per trade |
| `MAX_POSITIONS` | 20 | Hard position limit |
| `SCORE_ALERT_THRESHOLD` | 5.0 | Minimum score to trigger Slack alert |
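
How the sizing limits might combine can be sketched as follows. This is an assumption-laden illustration, not the logic in `broker/alpaca_client.py`: the `order_notional` helper and the `PER_TICKER_CAP_PCT` constant name are hypothetical (the 10% figure comes from the architecture diagram), and the order in which the limits are checked is a guess.

```python
POSITION_SIZE_PCT = 0.02   # README default: 2% of portfolio per trade
MAX_POSITIONS = 20         # README default: hard position limit
PER_TICKER_CAP_PCT = 0.10  # 10% per-ticker cap; constant name is hypothetical


def order_notional(portfolio_value, open_positions, ticker):
    """Return the dollar amount to buy, or 0.0 if a limit blocks the trade.

    `open_positions` maps ticker -> market value currently held.
    """
    # Opening a new ticker is blocked once the position limit is reached.
    if ticker not in open_positions and len(open_positions) >= MAX_POSITIONS:
        return 0.0
    held = open_positions.get(ticker, 0.0)
    target = portfolio_value * POSITION_SIZE_PCT
    room = portfolio_value * PER_TICKER_CAP_PCT - held  # headroom under the cap
    return max(0.0, min(target, room))


print(order_notional(100_000, {}, "ACME"))
print(order_notional(100_000, {"ACME": 9_500.0}, "ACME"))
```

In the second call the per-ticker cap, not the 2% target, is what limits the order.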
## Scoring
```
score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size - 1))
```
Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
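
The formula can be sketched in Python as below. The README does not state the logarithm base, so natural log is an assumption here (pass `math.log10` to try the alternative). The cluster term is taken to be `cluster_size - 1`, and the `signal_score` function name and the role keys are hypothetical.

```python
import math

# Role weights from the README; the dictionary keys are illustrative labels.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def signal_score(role, total_value, cluster_size, log=math.log):
    """score = role_weight * log(total_value) * (1 + 0.5 * (cluster_size - 1))."""
    if total_value <= 0 or cluster_size < 1:
        raise ValueError("total_value must be positive and cluster_size at least 1")
    cluster_boost = 1 + 0.5 * (cluster_size - 1)
    return ROLE_WEIGHTS[role] * log(total_value) * cluster_boost


solo = signal_score("CEO", 100_000, cluster_size=1)
trio = signal_score("CEO", 100_000, cluster_size=3)
print(solo, trio)
```

Each additional insider in the cluster adds half of the single-insider score, so a three-insider cluster scores twice as high as a lone buyer with the same role and value.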
## Backtesting
The backtest loads signals from the SQLite DB and fetches OHLC data via `yfinance` on demand (no local price cache). Raw XML filings are cached in `DATA_DIR` (`data/filings/`) by accession number to avoid re-downloading.

Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
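
One plausible way to compute these metrics from per-trade returns is sketched below. The definitions are assumptions, not taken from `backtest/backtest.py`: a win is a strictly positive return, alpha is the trade return minus the SPY return over the same holding period, and the Sharpe ratio is per trade with a zero risk-free rate and no annualization. The `summarize` helper is a hypothetical name.

```python
from statistics import mean, stdev


def summarize(trade_returns, spy_returns):
    """Summarize fractional returns; spy_returns[i] covers the same period as trade i."""
    if not trade_returns or len(trade_returns) != len(spy_returns):
        raise ValueError("need at least one trade and matching SPY returns")
    alphas = [t - s for t, s in zip(trade_returns, spy_returns)]
    avg = mean(trade_returns)
    # Sample standard deviation needs at least two observations.
    sd = stdev(trade_returns) if len(trade_returns) > 1 else 0.0
    return {
        "win_rate": sum(r > 0 for r in trade_returns) / len(trade_returns),
        "avg_return": avg,
        "avg_alpha": mean(alphas),
        "sharpe": avg / sd if sd > 0 else float("nan"),
    }


# Made-up returns, for illustration only.
report = summarize([0.10, -0.05, 0.20], [0.02, 0.01, 0.05])
print(report)
```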
## Modules
| Path | Purpose |
|---|---|
| `config.py` | All thresholds and env-var loading |
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
| `db/schema.sql` | SQLite schema (`filings`, `signals`) |
| `db/db.py` | DB access layer |
| `signals/filter_engine.py` | Filing → signal pipeline |
| `signals/cluster_detector.py` | Cluster detection from DB |
| `alerts/slack_alert.py` | Slack webhook alert |
| `broker/alpaca_client.py` | Alpaca order execution |
| `backtest/backtest.py` | Historical backtest runner |
| `main.py` | CLI entry point |
## Requirements
- Python 3.11+
- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`