Claude 8c0085e503 docs: add README
Co-authored-by: dodox <dodox@users.noreply.local>
2026-05-04 16:24:25 +00:00


Cleopatra — Insider Copytrade POC

Monitors SEC EDGAR Form 4 filings in near real time (polling every 10 minutes), detects clusters of insider buying, sends Slack alerts, and optionally executes trades via Alpaca.

Architecture

EDGAR (Form 4 feed)
      │
      ▼
ingestion/edgar_poller.py   ← polls every 10 min, dedupes by accession number
      │
      ▼
ingestion/form4_parser.py   ← parses XML, detects 10b5-1 plans
      │
      ▼
db/schema.sql + db/db.py    ← SQLite (WAL mode): filings + signals tables
      │
      ▼
signals/filter_engine.py    ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
signals/cluster_detector.py ← counts unique insiders per ticker in a rolling 30-day window
      │
      ├──► alerts/slack_alert.py   ← POST to Slack webhook when score ≥ threshold
      └──► broker/alpaca_client.py ← paper/live order: 2% position size, 10% per-ticker cap
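The filter and cluster stages in the diagram can be sketched as follows. This is a minimal illustration, not the code in signals/: the Transaction record and its fields are hypothetical, it assumes buys are identified by the Form 4 transaction code "P" (open-market or private purchase), and it assumes the 30-day window is measured on filing dates. Role-weighted scoring is left out here.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_TRANSACTION_VALUE = 50_000  # README default
CLUSTER_WINDOW_DAYS = 30        # README default


@dataclass(frozen=True)
class Transaction:
    # Hypothetical flattened record; the real parser output may differ.
    ticker: str
    insider: str      # reporting owner name or CIK
    code: str         # Form 4 transaction code; "P" = purchase
    shares: float
    price: float
    is_10b5_1: bool   # trade made under a Rule 10b5-1 plan
    filed: date

    @property
    def value(self) -> float:
        return self.shares * self.price


def passes_filter(tx: Transaction) -> bool:
    """Buy-only, exclude 10b5-1 plan trades, enforce the minimum dollar value."""
    return tx.code == "P" and not tx.is_10b5_1 and tx.value >= MIN_TRANSACTION_VALUE


def cluster_size(txs: list[Transaction], ticker: str, as_of: date) -> int:
    """Unique insiders with qualifying buys in `ticker` over the window ending at `as_of`."""
    start = as_of - timedelta(days=CLUSTER_WINDOW_DAYS)
    return len({
        tx.insider
        for tx in txs
        if tx.ticker == ticker and start <= tx.filed <= as_of and passes_filter(tx)
    })


txs = [
    Transaction("ACME", "Alice", "P", 1_000, 60.0, False, date(2026, 4, 20)),
    Transaction("ACME", "Bob", "P", 2_000, 30.0, False, date(2026, 5, 1)),
    Transaction("ACME", "Bob", "P", 1_000, 55.0, False, date(2026, 5, 2)),    # same insider again
    Transaction("ACME", "Carol", "S", 5_000, 60.0, False, date(2026, 5, 2)),  # sale: filtered out
]
print(cluster_size(txs, "ACME", date(2026, 5, 4)))  # → 2
```

Counting a set of insider identifiers, rather than transactions, keeps repeated buys by one insider from inflating the cluster.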

Setup

cp .env.example .env
# edit .env with your credentials
pip install -r requirements.txt

Environment variables (.env)

| Variable | Required | Default | Description |
|---|---|---|---|
| SLACK_WEBHOOK_URL | optional | | Slack incoming webhook URL for alerts |
| ALPACA_KEY | optional | | Alpaca API key |
| ALPACA_SECRET | optional | | Alpaca API secret |
| ALPACA_BASE_URL | optional | https://paper-api.alpaca.markets | Alpaca endpoint (paper or live) |
| DB_PATH | optional | insider.db | SQLite database file path |
| DATA_DIR | optional | data/filings | Directory for cached raw XML filings |

Usage

# Initialize DB and ingest current EDGAR feed (one shot)
python main.py fetch-once

# Run continuous polling loop (every 10 minutes)
python main.py run

# Backtest signals already in the DB against historical prices
python main.py backtest

Key configuration (config.py)

| Parameter | Default | Description |
|---|---|---|
| EDGAR_POLL_INTERVAL | 600 s | Polling cadence (10 minutes) |
| MIN_TRANSACTION_VALUE | $50,000 | Buys below this value are ignored |
| MIN_CLUSTER_SIZE | 1 | Minimum number of unique insiders before a signal fires |
| CLUSTER_WINDOW_DAYS | 30 | Rolling window for cluster counting, in days |
| HOLDING_PERIOD_DAYS | 90 | Days each position is held (used by the backtest and as the close trigger) |
| POSITION_SIZE_PCT | 2% | Fraction of portfolio per trade |
| MAX_POSITIONS | 20 | Maximum number of open positions |
| SCORE_ALERT_THRESHOLD | 5.0 | Minimum score to trigger a Slack alert |
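The sizing limits can combine roughly as below. This sketch assumes sizing is based on total account equity, that the 10% per-ticker cap from the architecture diagram applies to the value already held in that ticker, and that MAX_POSITIONS counts distinct tickers; `MAX_TICKER_PCT`, `order_notional` and `should_alert` are hypothetical names.

```python
POSITION_SIZE_PCT = 0.02      # README default
MAX_POSITIONS = 20            # README default
SCORE_ALERT_THRESHOLD = 5.0   # README default
MAX_TICKER_PCT = 0.10         # hypothetical name for the 10% per-ticker cap


def should_alert(score: float) -> bool:
    """Slack alerts fire when the score reaches the threshold."""
    return score >= SCORE_ALERT_THRESHOLD


def order_notional(equity: float, open_positions: int, held_value: float = 0.0) -> float:
    """Dollar size of a new buy, or 0.0 when a limit blocks it.

    held_value is the market value already held in this ticker.
    """
    opens_new_position = held_value == 0.0
    if opens_new_position and open_positions >= MAX_POSITIONS:
        return 0.0  # position-count limit reached
    target = equity * POSITION_SIZE_PCT
    headroom = equity * MAX_TICKER_PCT - held_value  # room left under the per-ticker cap
    return max(0.0, min(target, headroom))


print(order_notional(100_000, 5))                    # → 2000.0
print(order_notional(100_000, 5, held_value=9_000))  # → 1000.0 (capped by the 10% limit)
```

Taking the minimum of the 2% target and the remaining headroom means adding to an existing holding can never push it past the per-ticker cap.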

Scoring

score = role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))

Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2 · 10% owner 1.0
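The scoring formula translates directly into a small function. The README does not state the base of `log`, so this sketch takes it as a parameter (natural log by default, an assumption). Mapping free-text Form 4 officer titles onto these keys is out of scope, and giving unknown roles the lowest weight is likewise an assumption.

```python
import math

# Role weights as listed in the README.
ROLE_WEIGHTS = {
    "CEO": 3.0,
    "CFO": 2.5,
    "President": 2.5,
    "COO": 2.0,
    "Director": 1.5,
    "VP": 1.2,
    "10% owner": 1.0,
}


def score(role: str, total_value: float, cluster_size: int, log_base: float = math.e) -> float:
    """role_weight × log(total_value) × (1 + 0.5 × (cluster_size − 1))."""
    weight = ROLE_WEIGHTS.get(role, 1.0)  # assumption: unknown roles get the lowest weight
    return weight * math.log(total_value, log_base) * (1 + 0.5 * (cluster_size - 1))


print(round(score("CEO", 100_000, 1), 2))               # → 34.54
print(round(score("CEO", 100_000, 1, log_base=10), 2))  # → 15.0
```

The base matters relative to SCORE_ALERT_THRESHOLD: with the natural log, a lone $50,000 buy by a 10% owner already scores about 10.8, above the 5.0 threshold, while with base 10 it scores about 4.7, just below it.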

Backtesting

The backtest loads signals from the SQLite DB and fetches OHLC data via yfinance on demand (no local price cache). Raw XML filings are cached in DATA_DIR (data/filings/) by accession number to avoid re-downloading.
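Caching filings by accession number amounts to a read-through cache on disk. A minimal sketch, assuming one file per accession named `<accession>.xml` under DATA_DIR (the naming scheme and `cached_filing_xml` are hypothetical). The downloader is injected so the example runs offline; in the poller it would wrap a `requests` call to EDGAR. The accession number below is an illustrative placeholder.

```python
import tempfile
from collections.abc import Callable
from pathlib import Path


def cached_filing_xml(accession: str, fetch: Callable[[str], bytes],
                      data_dir: str = "data/filings") -> bytes:
    """Return raw Form 4 XML for an accession number, downloading only on a cache miss."""
    path = Path(data_dir) / f"{accession}.xml"  # hypothetical file-naming scheme
    if path.exists():
        return path.read_bytes()
    raw = fetch(accession)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return raw


# Demo with a stub downloader in a throwaway directory.
calls: list[str] = []


def fake_fetch(accession: str) -> bytes:
    calls.append(accession)
    return b"<ownershipDocument/>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_filing_xml("0000000000-26-000001", fake_fetch, tmp)
    second = cached_filing_xml("0000000000-26-000001", fake_fetch, tmp)

print(len(calls))  # → 1: the second call is served from disk
```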

Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
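These metrics can be computed from per-trade returns as sketched below. Two definitions here are assumptions rather than documented behaviour: "alpha" is taken as the simple excess return over SPY across the same holding window (not a regression alpha), and the Sharpe ratio is per trade with a zero risk-free rate and no annualization. The trade data is illustrative.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    ret: float      # position return over the holding period (0.05 = +5%)
    spy_ret: float  # SPY return over the same dates


def summarize(trades: list[Trade]) -> dict[str, float]:
    if not trades:
        raise ValueError("no trades to summarize")
    rets = [t.ret for t in trades]
    excess = [t.ret - t.spy_ret for t in trades]  # simple excess return over SPY
    mean = statistics.fmean(rets)
    sd = statistics.stdev(rets) if len(rets) > 1 else 0.0
    return {
        "win_rate": sum(r > 0 for r in rets) / len(rets),
        "avg_return": mean,
        "avg_alpha": statistics.fmean(excess),
        # Per-trade Sharpe, zero risk-free rate, not annualized.
        "sharpe": mean / sd if sd > 0 else math.nan,
    }


trades = [Trade(0.10, 0.02), Trade(-0.05, 0.01), Trade(0.04, 0.03)]
m = summarize(trades)
```

With 90-day holding periods, annualizing the Sharpe ratio would require choosing a scaling convention, which is why the sketch leaves it per trade.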

Modules

| Path | Purpose |
|---|---|
| config.py | All thresholds and env-var loading |
| ingestion/edgar_poller.py | EDGAR Atom feed polling and deduplication |
| ingestion/form4_parser.py | Form 4 XML → structured dict; 10b5-1 detection |
| db/schema.sql | SQLite schema (filings, signals) |
| db/db.py | DB access layer |
| signals/filter_engine.py | Filing → signal pipeline |
| signals/cluster_detector.py | Cluster detection from DB |
| alerts/slack_alert.py | Slack webhook alert |
| broker/alpaca_client.py | Alpaca order execution |
| backtest/backtest.py | Historical backtest runner |
| main.py | CLI entry point |
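The form4_parser.py step (Form 4 XML → structured dict) can be sketched with the standard-library `xml.etree.ElementTree`; the project itself lists lxml, whose `etree` offers the same `findall`/`findtext` calls. Element paths follow the EDGAR `ownershipDocument` layout. The output keys and the footnote-based 10b5-1 heuristic are assumptions, not necessarily how the project detects plan trades, and the sample filing is trimmed and illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ownershipDocument>
  <issuer><issuerTradingSymbol>ACME</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>Doe Jane</rptOwnerName></reportingOwnerId>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>1000</value></transactionShares>
        <transactionPricePerShare><value>55.00</value></transactionPricePerShare>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
  <footnotes><footnote id="F1">Purchased in the open market.</footnote></footnotes>
</ownershipDocument>"""


def _num(node: ET.Element, path: str) -> float:
    # Missing or empty <value> elements (e.g. price given only as a footnote) count as 0.
    return float((node.findtext(path) or "0").strip() or "0")


def parse_form4(xml_text: str) -> list[dict]:
    """Flatten the non-derivative transactions of one Form 4 into dicts."""
    root = ET.fromstring(xml_text)
    ticker = (root.findtext("issuer/issuerTradingSymbol") or "").strip()
    insider = (root.findtext("reportingOwner/reportingOwnerId/rptOwnerName") or "").strip()
    notes = " ".join("".join(fn.itertext()) for fn in root.findall("footnotes/footnote"))
    # Heuristic (assumption): any footnote mentioning 10b5-1 flags every row in the filing.
    plan_trade = "10b5-1" in notes.lower()
    return [
        {
            "ticker": ticker,
            "insider": insider,
            "code": (tx.findtext("transactionCoding/transactionCode") or "").strip(),
            "shares": _num(tx, "transactionAmounts/transactionShares/value"),
            "price": _num(tx, "transactionAmounts/transactionPricePerShare/value"),
            "is_10b5_1": plan_trade,
        }
        for tx in root.findall("nonDerivativeTable/nonDerivativeTransaction")
    ]


rows = parse_form4(SAMPLE)
print(rows[0]["code"], rows[0]["shares"] * rows[0]["price"])  # → P 55000.0
```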

Requirements

  • Python 3.11+
  • See requirements.txt: requests, lxml, yfinance, python-dotenv, alpaca-trade-api