feat: cap-tier filtering, Alpaca cost model, README cleanup

- simulate.py: --cap-tier large|mid|small|micro; yfinance market cap fetch with DB cache (ticker_meta table); argv fix for main.py dispatch
- plot.py: equity curves now show cap tiers with Alpaca costs (zero commission); HP sweep uses Alpaca cost decomposition; SPY line clamped to last strategy date
- db/models.py: TickerMeta table
- db/db.py: get_cached_market_caps, upsert_market_caps
- README: add --cap-tier to simulate docs; backfill note (~3 days for 2 years at SEC 10 req/s limit); remove duplicate setup block; remove em-dashes in prose; results table tilde estimates to be updated once cap-tier sims complete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 56ec0b4a81
commit d0e98b9cb7
340
PLAN.md
@@ -1,340 +0,0 @@
# Insider Copytrade System -- Implementation Plan

## Description

A personal system that monitors SEC EDGAR Form 4 filings in real-time, filters for high-quality insider buying signals, alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API.

The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.

---

## Background

Company insiders (executives, directors, >10% shareholders) must file SEC Form 4 within 2 business days of any trade. This is public data via SEC EDGAR. The signal value of insider *buying* is academically documented -- executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:

- Multiple insiders buy simultaneously (cluster signal)
- The trade is unplanned (not a 10b5-1 scheduled plan)
- The company is small/mid-cap (less institutional arbitrage)

The edge vs. political trade copying: 2-day disclosure lag vs. 45 days, and the signal is company-specific rather than sector-level.

**Key risk:** This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.

---

## System Outline

```
SEC EDGAR RSS Feed (poll every 10 min)
        |
  [Ingestion Layer]
        |
  Parse Form 4 XML
        |
  [Filter Engine]
    - Buy only (flag = A)
    - Exclude 10b5-1 plans
    - Min transaction size
    - Role weighting
    - Cluster detection
        |
  SQLite Database
        |
  ┌────────────┬──────────────┐
  |            |              |
[Backtester] [Slack Alert] [Alpaca API]
             (manual)      (paper/live)
```

---

## Actionables

### Phase 1 -- Data Ingestion

**Goal:** Reliably pull and parse Form 4 filings as they appear.

**Tasks:**

1. Set up project structure
   ```
   insider-copytrade/
     ingestion/
       edgar_poller.py        # polls EDGAR RSS
       form4_parser.py        # parses XML -> structured dict
     db/
       schema.sql
       db.py                  # SQLite interface
     signals/
       filter_engine.py       # applies signal filters
       cluster_detector.py
     alerts/
       slack_alert.py
     broker/
       alpaca_client.py
     backtest/
       backtest.py
     config.py
     main.py
   ```

2. Poll EDGAR RSS for Form 4 filings every 10 minutes:
   ```
   https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&search_text=&action=getcurrent
   ```
   SEC also provides a structured latest filings feed:
   ```
   https://efts.sec.gov/LATEST/search-index?q=&forms=4
   ```

3. For each new filing, fetch and parse the XML document. Key fields to extract:
   - `issuerTradingSymbol` (ticker)
   - `rptOwnerName`, `officerTitle` (insider name + role)
   - `transactionDate`
   - `transactionAcquiredDisposedCode` (A = acquired, i.e. buy; D = disposed, i.e. sell)
   - `transactionShares`, `transactionPricePerShare`
   - `transactionTotalValue` (compute if not present)
   - `footnotes` (check for "10b5-1" mention)
   - `sharesOwnedFollowingTransaction`

4. Store raw filing XML + parsed fields. Track `accessionNumber` as dedup key.

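The field extraction in task 3 could look roughly like the sketch below. It uses only the standard library and a tiny hand-written sample document; the nesting of the sample (values wrapped in `<value>` children, a `nonDerivativeTransaction` element per trade) is an assumption for illustration, and `parse_form4` and `_text` are hypothetical helper names, not part of the plan.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; real filings carry many more fields.
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer><issuerTradingSymbol>ACME</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>John Smith</rptOwnerName></reportingOwnerId>
    <reportingOwnerRelationship><officerTitle>CEO</officerTitle></reportingOwnerRelationship>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionDate><value>2025-05-01</value></transactionDate>
      <transactionAmounts>
        <transactionShares><value>10000</value></transactionShares>
        <transactionPricePerShare><value>14.50</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
  <footnotes><footnote id="F1">Open market purchase.</footnote></footnotes>
</ownershipDocument>"""


def _text(node, tag):
    """Text of the first descendant named `tag`, unwrapping a <value> child if present."""
    found = next(node.iter(tag), None)
    if found is None:
        return None
    value = found.find("value")
    text = value.text if value is not None else found.text
    return text.strip() if text else None


def parse_form4(xml_text):
    """Return one dict per non-derivative transaction in the document."""
    root = ET.fromstring(xml_text)
    footnotes = " ".join((f.text or "") for f in root.iter("footnote"))
    rows = []
    for tx in root.iter("nonDerivativeTransaction"):
        shares = float(_text(tx, "transactionShares") or 0)
        price = float(_text(tx, "transactionPricePerShare") or 0)
        rows.append({
            "ticker": _text(root, "issuerTradingSymbol"),
            "insider_name": _text(root, "rptOwnerName"),
            "role": _text(root, "officerTitle"),
            "transaction_date": _text(tx, "transactionDate"),
            "flag": _text(tx, "transactionAcquiredDisposedCode"),
            "shares": shares,
            "price": price,
            "total_value": shares * price,  # computed, since it may be absent
            "is_10b51": int("10b5-1" in footnotes.lower()),
        })
    return rows


rows = parse_form4(SAMPLE_FORM4)
```

Searching by tag name with `iter` keeps the sketch tolerant of nesting differences, at the cost of assuming each tag appears once per transaction.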
**SQLite schema:**
```sql
CREATE TABLE filings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    accession_number TEXT UNIQUE,
    ticker TEXT,
    cik TEXT,
    insider_name TEXT,
    role TEXT,
    transaction_date TEXT,
    filed_date TEXT,
    shares REAL,
    price REAL,
    total_value REAL,
    flag TEXT, -- A or D
    is_10b51 INTEGER, -- 0 or 1
    post_tx_shares REAL,
    created_at TEXT
);

CREATE TABLE signals (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT,
    trigger_date TEXT,
    cluster_size INTEGER,
    total_cluster_value REAL,
    score REAL,
    alerted INTEGER DEFAULT 0,
    executed INTEGER DEFAULT 0,
    created_at TEXT
);
```

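Because `accession_number` is declared `UNIQUE`, deduplication on re-poll can lean on the database itself. The following is a minimal sketch, assuming a trimmed-down `filings` table and a hypothetical helper `store_filing`; the accession number shown is made up.

```python
import sqlite3


def store_filing(conn, filing):
    """Insert a parsed filing; return True if it was new, False if already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO filings (accession_number, ticker, flag, total_value) "
        "VALUES (:accession_number, :ticker, :flag, :total_value)",
        filing,
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the UNIQUE key already existed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "accession_number TEXT UNIQUE, ticker TEXT, flag TEXT, total_value REAL)"
)

filing = {
    "accession_number": "0000000000-25-000001",  # hypothetical value
    "ticker": "ACME",
    "flag": "A",
    "total_value": 145000.0,
}
first = store_filing(conn, filing)    # new row
second = store_filing(conn, filing)   # same accession number, ignored
stored = conn.execute("SELECT COUNT(*) FROM filings").fetchone()[0]
```

Returning a boolean lets the poller skip filtering and alerting for filings it has already seen.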
---

### Phase 2 -- Filter Engine

**Goal:** Reduce noise to actionable signals only.

**Filters to apply (in order):**

| Filter | Logic |
|---|---|
| Buy only | `flag == 'A'` |
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
| Min transaction value | `total_value >= 50000` (configurable) |
| Exclude derivative transactions | Options exercises are weaker signal than open market purchases |
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
| Cluster detection | 2+ insiders buying same ticker within 30 days = elevated signal |

**Scoring formula (simple v1):**
```python
score = base_role_weight * log(total_value) * cluster_multiplier
# cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
```
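A runnable version of the v1 formula might look like the sketch below. The numeric role weights and the default weight are placeholders chosen for illustration (the plan only ranks roles as high, medium, or context-dependent), and the natural logarithm is an assumption since the plan does not state the base.

```python
import math

# Placeholder weights, not values from the plan.
ROLE_WEIGHTS = {
    "ceo": 1.0,
    "cfo": 1.0,
    "president": 1.0,
    "director": 0.6,
    "10% owner": 0.4,
}
DEFAULT_ROLE_WEIGHT = 0.5  # hypothetical fallback for unlisted roles


def cluster_multiplier(cluster_size):
    """1.0 for a lone buyer, plus 0.5 for each additional insider."""
    return 1.0 + 0.5 * (cluster_size - 1)


def score_signal(role, total_value, cluster_size):
    """score = base_role_weight * log(total_value) * cluster_multiplier."""
    weight = ROLE_WEIGHTS.get(role.lower(), DEFAULT_ROLE_WEIGHT)
    return weight * math.log(total_value) * cluster_multiplier(cluster_size)


solo = score_signal("CEO", 145_000, cluster_size=1)
clustered = score_signal("CEO", 145_000, cluster_size=3)
```

With these definitions a three-insider cluster doubles the score of an otherwise identical solo buy, which is the behaviour the multiplier comment describes.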

Expose all thresholds in `config.py` for easy tuning during backtesting.

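The cluster detection row in the filter table can be sketched as a count of distinct insiders buying the same ticker inside a trailing window. The function name `cluster_size` and the dict keys are assumptions that mirror the `filings` columns; whether the window should trail the newest trade or be centred on it is a design choice the plan leaves open.

```python
from datetime import date, timedelta


def cluster_size(filings, ticker, as_of, window_days=30):
    """Number of distinct insiders with a buy in `ticker` during the trailing window."""
    start = as_of - timedelta(days=window_days)
    insiders = {
        f["insider_name"]
        for f in filings
        if f["ticker"] == ticker
        and f["flag"] == "A"
        and start <= date.fromisoformat(f["transaction_date"]) <= as_of
    }
    return len(insiders)


# Made-up sample rows for illustration.
sample = [
    {"ticker": "ACME", "insider_name": "A", "flag": "A", "transaction_date": "2025-04-05"},
    {"ticker": "ACME", "insider_name": "B", "flag": "A", "transaction_date": "2025-04-20"},
    {"ticker": "ACME", "insider_name": "C", "flag": "A", "transaction_date": "2025-04-28"},
    {"ticker": "ACME", "insider_name": "C", "flag": "A", "transaction_date": "2025-05-01"},
    {"ticker": "ACME", "insider_name": "D", "flag": "D", "transaction_date": "2025-04-30"},
    {"ticker": "OTHR", "insider_name": "E", "flag": "A", "transaction_date": "2025-04-30"},
]
wide = cluster_size(sample, "ACME", date(2025, 5, 1), window_days=30)
narrow = cluster_size(sample, "ACME", date(2025, 5, 1), window_days=14)
```

Counting names in a set means repeat purchases by the same insider do not inflate the cluster.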
---

### Phase 3 -- SQLite Storage

SQLite is sufficient for this workload (low write volume, single process). Use WAL mode for concurrent reads during backtesting:

```python
conn = sqlite3.connect('insider.db')
conn.execute('PRAGMA journal_mode=WAL')
```

Keep raw filing XML in a `/data/filings/` directory keyed by accession number. Parse on ingest; re-parsing should never be needed.

---

### Phase 4 -- Slack Alerts

**Goal:** Get notified immediately when a signal fires, with enough context to decide manually.

1. Create a Slack app, get a webhook URL (takes 5 minutes)
2. Alert format:

```
INSIDER BUY SIGNAL
Ticker: $ACME
Insider: John Smith (CEO)
Date: 2025-05-01
Shares: 10,000 @ $14.50 = $145,000
Cluster: 3 insiders in last 14 days
Score: 8.4
10b5-1: No
EDGAR: https://www.sec.gov/cgi-bin/browse-edgar?...
```

3. Alert only on signals above configurable score threshold
4. Mark `alerted = 1` in DB after sending to avoid duplicates on re-poll

```python
import requests

def send_slack_alert(webhook_url, signal):
    requests.post(webhook_url, json={"text": format_signal(signal)})
```

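The snippet above calls `format_signal` without defining it. One possible shape, assuming the signal arrives as a dict, is sketched here; every key name (`insider_name`, `cluster_window_days`, `edgar_url`, and so on) is a hypothetical choice for illustration.

```python
def format_signal(signal):
    """Render a signal dict as the multi-line alert text shown in the alert format."""
    total = signal["shares"] * signal["price"]
    lines = [
        "INSIDER BUY SIGNAL",
        f"Ticker: ${signal['ticker']}",
        f"Insider: {signal['insider_name']} ({signal['role']})",
        f"Date: {signal['transaction_date']}",
        f"Shares: {signal['shares']:,.0f} @ ${signal['price']:,.2f} = ${total:,.0f}",
        f"Cluster: {signal['cluster_size']} insiders in last {signal['cluster_window_days']} days",
        f"Score: {signal['score']:.1f}",
        f"10b5-1: {'Yes' if signal['is_10b51'] else 'No'}",
        f"EDGAR: {signal['edgar_url']}",
    ]
    return "\n".join(lines)


example = {
    "ticker": "ACME",
    "insider_name": "John Smith",
    "role": "CEO",
    "transaction_date": "2025-05-01",
    "shares": 10_000,
    "price": 14.5,
    "cluster_size": 3,
    "cluster_window_days": 14,
    "score": 8.4,
    "is_10b51": 0,
    "edgar_url": "https://www.sec.gov/",  # placeholder link
}
text = format_signal(example)
```

Keeping formatting separate from the HTTP call makes the alert text easy to check without touching Slack.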
---

### Phase 5 -- Backtesting

**Goal:** Validate filter parameters on historical data before going live.

**Data:**
- Historical Form 4 filings: download bulk XML from `https://www.sec.gov/dera/data/form-4-data`
- Price data: `yfinance` (free, sufficient for backtesting)

**Backtest logic:**
```python
# For each signal in historical data:
# - Entry: next market open after filed_date
# - Exit: N days later (configurable: 30/60/90/180)
# - Calculate return vs SPY over same period
# - Aggregate by role, cluster_size, market_cap bucket
```
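The per-signal step of the backtest logic can be sketched as below. It assumes each price series is a plain dict from `YYYY-MM-DD` strings to one price per trading day, and it treats "N days later" as calendar days rolled forward to the next available trading day; both are simplifying assumptions, and `excess_return` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def excess_return(prices, spy, filed_date, holding_days):
    """Stock return minus SPY return for one signal, or None if data is missing."""
    days = sorted(set(prices) & set(spy))       # trading days present in both series
    i = bisect_right(days, filed_date)          # first trading day strictly after filing
    if i >= len(days):
        return None
    entry = days[i]
    target = (date.fromisoformat(entry) + timedelta(days=holding_days)).isoformat()
    j = bisect_left(days, target)               # first trading day on or after the target
    if j >= len(days):
        return None
    exit_day = days[j]
    stock_ret = prices[exit_day] / prices[entry] - 1
    spy_ret = spy[exit_day] / spy[entry] - 1
    return stock_ret - spy_ret


# Made-up prices for illustration.
acme = {"2025-05-02": 10.0, "2025-05-05": 11.0, "2025-06-04": 12.1}
spy = {"2025-05-02": 99.0, "2025-05-05": 100.0, "2025-06-04": 105.0}
result = excess_return(acme, spy, "2025-05-02", holding_days=30)
```

ISO date strings sort chronologically, which is what lets `bisect` work on them directly.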

**Use `vectorbt` for performance:**
```python
import vectorbt as vbt
# Build entry/exit signal matrices aligned to price data
# Run portfolio simulation with configurable position sizing
```

**Output metrics:**
- Annualized return vs SPY benchmark
- Win rate
- Avg return by holding period
- Avg return by role / cluster size
- Max drawdown
- Sharpe ratio

**Critical:** Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.

**Parameter grid to test:**
```python
MIN_VALUE = [25_000, 50_000, 100_000]
HOLDING_DAYS = [30, 60, 90, 180]
CLUSTER_WINDOW = [14, 30]
MIN_CLUSTER_SIZE = [1, 2, 3]
ROLES = ['all', 'c-suite-only']
```

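One way to walk the grid above is to expand it into every combination with `itertools.product`. The dict layout and the helper name `iter_param_sets` are illustrative assumptions; the values are the ones listed in the plan.

```python
from itertools import product

GRID = {
    "min_value": [25_000, 50_000, 100_000],
    "holding_days": [30, 60, 90, 180],
    "cluster_window": [14, 30],
    "min_cluster_size": [1, 2, 3],
    "roles": ["all", "c-suite-only"],
}


def iter_param_sets(grid):
    """Yield one dict per combination of grid values."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))


param_sets = list(iter_param_sets(GRID))
```

The grid multiplies out to 3 x 4 x 2 x 3 x 2 = 144 runs, which is worth knowing before committing to a backtest that reloads prices for each one.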
---

### Phase 6 -- Alpaca Integration

**Goal:** Optionally auto-execute signals. Start with paper trading.

**Paper trading base URL:** `https://paper-api.alpaca.markets`
**Live trading base URL:** `https://api.alpaca.markets`

Swap via config flag -- never hardcode.

```python
from alpaca_trade_api import REST

api = REST(
    key_id=config.ALPACA_KEY,
    secret_key=config.ALPACA_SECRET,
    base_url=config.ALPACA_BASE_URL  # paper or live
)

def execute_signal(ticker, portfolio_value, signal_score):
    # Fixed fractional sizing: 2% of portfolio per signal
    price = api.get_latest_trade(ticker).price
    allocation = portfolio_value * 0.02
    qty = int(allocation / price)
    if qty < 1:
        return
    api.submit_order(
        symbol=ticker,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='day'
    )
```

Position sizing: start at 2% per signal, max 10% in any single ticker. Add a max open positions limit (e.g. 20) to cap exposure.

Exit logic (v1): time-based only (close after N days). Add trailing stop later.

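The sizing limits described above (2% per signal, 10% per ticker, 20 open positions) are not enforced by `execute_signal` as written. A broker-independent sketch of how they could be combined follows; `order_quantity` and its parameters are hypothetical, and the caller is assumed to supply current exposure and position counts.

```python
def order_quantity(price, portfolio_value, ticker_exposure, open_positions,
                   size_pct=0.02, max_ticker_pct=0.10, max_positions=20):
    """Whole shares to buy for one signal, or 0 if a limit blocks the trade.

    ticker_exposure: dollar value already held in this ticker.
    open_positions:  number of distinct tickers currently held.
    """
    is_new_position = ticker_exposure <= 0
    if is_new_position and open_positions >= max_positions:
        return 0  # position-count cap reached
    headroom = portfolio_value * max_ticker_pct - ticker_exposure
    allocation = min(portfolio_value * size_pct, headroom)
    if allocation <= 0 or price <= 0:
        return 0
    return int(allocation // price)


fresh = order_quantity(14.50, 100_000, ticker_exposure=0, open_positions=0)
near_cap = order_quantity(14.50, 100_000, ticker_exposure=9_500, open_positions=5)
full_book = order_quantity(14.50, 100_000, ticker_exposure=0, open_positions=20)
```

Keeping the arithmetic in a pure function means the limits can be unit-tested without an Alpaca account.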
---

## Build Order

| Step | Deliverable | Est. Time |
|---|---|---|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
| 2 | Filter engine + cluster detector | 0.5 day |
| 3 | Slack alert | 1 hour |
| 4 | Historical data download + backtest | 1-2 days |
| 5 | Alpaca paper trading integration | 0.5 day |
| 6 | Run paper trading 4-8 weeks, monitor | -- |
| 7 | Switch to live with small capital | -- |

Do not proceed to Step 7 without meaningful paper trading history.

---

## Dependencies

```
requests
lxml
sqlite3 (stdlib)
yfinance
vectorbt
alpaca-trade-api
python-dotenv
```

All free. No paid APIs required.

---

## Config Template

```python
# config.py
EDGAR_POLL_INTERVAL = 600  # seconds
MIN_TRANSACTION_VALUE = 50_000
MIN_CLUSTER_SIZE = 1  # raise to 2 for higher quality
CLUSTER_WINDOW_DAYS = 30
HOLDING_PERIOD_DAYS = 90
POSITION_SIZE_PCT = 0.02  # 2% per signal
MAX_POSITIONS = 20
SCORE_ALERT_THRESHOLD = 5.0

SLACK_WEBHOOK_URL = ""
ALPACA_KEY = ""
ALPACA_SECRET = ""
ALPACA_BASE_URL = "https://paper-api.alpaca.markets"  # switch for live
```
31
README.md
@@ -44,7 +44,7 @@ cp .env.example .env # fill in credentials
 # Live polling (every 10 min)
 python main.py run

-# Bulk-ingest historical filings
+# Bulk-ingest historical filings (2 years took ~3 days at SEC's 10 req/s rate limit)
 python main.py backfill --years 2023 2024
 python main.py backfill --year 2024 --quarter 1

@@ -62,14 +62,15 @@ python main.py plot

 ```
 Strategy:
   --holding-days N Days to hold each position (default: 7)
   --buy-delay N Days after signal to enter (default: 1)
   --position-size F Fraction of available cash per trade (default: 0.10)
   --min-score F Minimum signal score (default: 0.0)
   --min-cluster N Minimum cluster size (default: 1)
-  --capital F Initial capital (default: 100000)
+  --cap-tier large|mid|small|micro Filter by market cap tier (default: all)
+  --capital F Initial capital (default: 100000)

-Transaction costs:
+Transaction costs (Alpaca has zero commission, set --commission 0):
   --spread F One-way bid-ask half-spread at entry and exit (default: 0.003)
   --slippage F Entry slippage / market impact (default: 0.002)
   --commission F Per-trade commission as fraction of notional (default: 0.001)
@@ -77,12 +78,10 @@ Transaction costs:

 Round-trip = spread x 2 + slippage + commission x 2.

-## Setup
+Cap tiers: large >$10B, mid $2-10B, small $300M-2B, micro <$300M.
+Market caps are fetched from yfinance on first use and cached in the DB.

-```bash
-cp .env.example .env
-pip install -r requirements.txt
-```
+## Setup

 | Variable | Default | Description |
 |---|---|---|
@@ -147,7 +146,7 @@ Alpaca charges $0 commission on US equities. Real costs are spread + slippage on

 SPY annualised over the same period: ~+16%.

-Break-even is roughly 0.3-0.5% round-trip. On Alpaca that means large-cap stocks only -- but most insider buying happens in small and mid-cap names, so filtering aggressively kills signal count.
+Break-even is roughly 0.3-0.5% round-trip. On Alpaca that means large-cap stocks only. Most insider buying happens in small and mid-cap names, so filtering aggressively kills signal count.

 ### Is insidercopytrading.com a scam?

@@ -174,14 +173,14 @@ Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not
 | `ingestion/edgar_poller.py` | EDGAR Atom feed polling |
 | `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via form.idx |
 | `ingestion/form4_parser.py` | Form 4 XML parser; 10b5-1 detection |
-| `db/models.py` | SQLAlchemy ORM models |
+| `db/models.py` | SQLAlchemy ORM models (Filing, Signal, PriceCache, TickerMeta) |
 | `db/db.py` | DB access layer |
 | `signals/filter_engine.py` | Filing to signal pipeline |
 | `signals/cluster_detector.py` | Cluster detection |
 | `alerts/slack_alert.py` | Slack webhook |
 | `broker/alpaca_client.py` | Alpaca order execution |
 | `backtest/backtest.py` | Per-signal backtest |
-| `backtest/simulate.py` | Portfolio simulator |
+| `backtest/simulate.py` | Portfolio simulator with cap-tier filtering |
 | `backtest/plot.py` | Plot generator |
 | `main.py` | CLI: `run / backfill / backtest / simulate / plot` |

backtest/plot.py
@@ -41,15 +41,11 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR) -> str:
     hold_days = [3, 5, 7, 10, 14, 21, 30]
     rt_pcts = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 2.0]

-    # decompose round-trip into (spread, slippage, commission) that sum correctly:
-    # roundtrip = 2*spread + slippage + 2*commission
-    # allocate 40% spread, 40% slippage, 20% commission (all relative to RT)
-    # => spread = RT*0.4/2 = RT*0.2 (one-way)
-    # => slippage = RT*0.4
-    # => commission = RT*0.2/2 = RT*0.1 (one-way)
-    # verify: 2*0.2 + 0.4 + 2*0.1 = 0.4+0.4+0.2 = 1.0 * RT ✓
+    # Alpaca: zero commission. Decompose RT into spread + slippage only (50/50).
+    # roundtrip = 2*spread + slippage => spread = RT*0.25, slippage = RT*0.5
+    # verify: 2*0.25 + 0.5 = 1.0 * RT ✓
     def _costs(rt):
-        return dict(spread=rt * 0.2, slippage=rt * 0.4, commission=rt * 0.1)
+        return dict(spread=rt * 0.25, slippage=rt * 0.5, commission=0)

     rows_excess = []
     rows_ann = []
@@ -116,7 +112,7 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR) -> str:
             ax.text(j, i, txt, ha="center", va="center", fontsize=7.5, color=color)

     fig.suptitle(
-        "HP sweep: 1-day entry delay, 10% position size, buy filter only",
+        "HP sweep: Alpaca (zero commission), 1-day entry delay, 10% position size, all cap tiers",
         fontsize=12,
     )
     plt.tight_layout()
@@ -135,22 +131,25 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR) -> str:
     """
     matplotlib, plt, mdates, np = _get_matplotlib()

+    # Alpaca zero-commission costs by cap tier (spread + slippage only)
     scenarios = [
-        {"label": "0% RT cost (theoretical)", "spread": 0, "slippage": 0, "commission": 0},
-        {"label": "0.67% RT (best case)", "spread": 0.0014, "slippage": 0.0027, "commission": 0.0007},
-        {"label": "1.0% RT (mid)", "spread": 0.002, "slippage": 0.004, "commission": 0.001},
-        {"label": "1.5% RT (realistic small-cap)","spread": 0.003, "slippage": 0.006, "commission": 0.0015},
+        {"label": "Large cap (~0.2% RT)", "cap_tier": "large", "spread": 0.001, "slippage": 0.001},
+        {"label": "Mid cap (~0.5% RT)", "cap_tier": "mid", "spread": 0.0025, "slippage": 0.0025},
+        {"label": "Small cap (~0.8% RT)", "cap_tier": "small", "spread": 0.004, "slippage": 0.004},
+        {"label": "All tickers (0% RT)", "cap_tier": None, "spread": 0, "slippage": 0},
     ]

     fig, ax = plt.subplots(figsize=(13, 7))

-    colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"]
-    sim_start = sim_end = None
+    colors = ["#2ecc71", "#3498db", "#e67e22", "#aaaaaa"]
+    sim_start = None
+    last_curve_date = None

     for sc, color in zip(scenarios, colors):
         s = Strategy(
             holding_days=7, buy_delay=1,
-            spread=sc["spread"], slippage=sc["slippage"], commission=sc["commission"],
+            spread=sc["spread"], slippage=sc["slippage"], commission=0,
+            cap_tier=sc["cap_tier"],
         )
         r = simulate(s, prices=prices)
         curve = r.get("equity_curve", [])
@@ -158,7 +157,7 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR) -> str:
             continue

         sim_start = sim_start or r["period"]["start"]
-        sim_end = r["period"]["end"]
+        last_curve_date = curve[-1][0]  # actual last signal date in this curve

         dates = [datetime.strptime(d, "%Y-%m-%d") for d, _ in curve]
         values = [v for _, v in curve]
@@ -166,10 +165,10 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR) -> str:
         ax.plot(dates, [v / base * 100 for v in values],
                 label=sc["label"], color=color, linewidth=1.8)

-    # SPY buy-and-hold overlay
+    # SPY buy-and-hold overlay — clamp to last data point of strategy curves
     spy_px = prices.get("SPY", {})
-    if spy_px and sim_start and sim_end:
-        spy_dates = sorted(d for d in spy_px if sim_start <= d <= sim_end)
+    if spy_px and sim_start and last_curve_date:
+        spy_dates = sorted(d for d in spy_px if sim_start <= d <= last_curve_date)
         if spy_dates:
             base = spy_px[spy_dates[0]]
             ax.plot(
@@ -182,7 +181,7 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR) -> str:
     ax.set_xlabel("Date", fontsize=11)
     ax.set_ylabel("Portfolio value (indexed to 100)", fontsize=11)
     ax.set_title(
-        "Insider Copytrade: equity curves vs SPY (7d hold, 1d delay, 10% position size)",
+        "Insider Copytrade: equity curves by cap tier, Alpaca costs (7d hold, 1d delay, 10% position size)",
         fontsize=12,
     )
     ax.legend(fontsize=10)
backtest/simulate.py
@@ -32,7 +32,39 @@ from datetime import datetime, timedelta
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

 import config
-from db.db import get_signals_for_backtest
+from db.db import get_signals_for_backtest, get_cached_market_caps, upsert_market_caps

+CAP_TIERS = {
+    "large": (10_000_000_000, None),
+    "mid": (2_000_000_000, 10_000_000_000),
+    "small": (300_000_000, 2_000_000_000),
+    "micro": (0, 300_000_000),
+}
+
+
+def _fetch_market_caps(tickers: list[str]) -> dict[str, float]:
+    """Return market cap for each ticker, using DB cache then yfinance for misses."""
+    import yfinance as yf
+
+    cached = get_cached_market_caps(tickers)
+    missing = [t for t in tickers if t not in cached]
+
+    if missing:
+        logger.info(f"Fetching market caps for {len(missing)} tickers via yfinance...")
+        fetched = {}
+        for ticker in missing:
+            try:
+                info = yf.Ticker(ticker).fast_info
+                cap = getattr(info, "market_cap", None)
+                if cap:
+                    fetched[ticker] = float(cap)
+            except Exception:
+                pass
+        if fetched:
+            upsert_market_caps(fetched)
+            cached.update(fetched)
+
+    return cached
+
 logger = logging.getLogger(__name__)

@@ -92,6 +124,7 @@ class Strategy:
         spread: float = 0.003,
         slippage: float = 0.002,
         commission: float = 0.001,
+        cap_tier: str = None,
     ):
         self.holding_days = holding_days
         self.buy_delay = buy_delay
@@ -102,6 +135,7 @@ class Strategy:
         self.spread = spread
         self.slippage = slippage
         self.commission = commission
+        self.cap_tier = cap_tier  # "large" | "mid" | "small" | "micro" | None

     # cost applied at entry: half-spread + slippage + commission
     @property
@@ -137,6 +171,22 @@ def simulate(strategy: Strategy, prices: dict = None) -> dict:
     if not signals:
         return {"error": "No signals after filtering"}

+    if strategy.cap_tier:
+        tier = CAP_TIERS.get(strategy.cap_tier)
+        if tier is None:
+            raise ValueError(f"Unknown cap_tier {strategy.cap_tier!r}. Use: {list(CAP_TIERS)}")
+        cap_min, cap_max = tier
+        tickers = list({s["ticker"] for s in signals})
+        market_caps = _fetch_market_caps(tickers)
+        signals = [
+            s for s in signals
+            if market_caps.get(s["ticker"], 0) >= cap_min
+            and (cap_max is None or market_caps.get(s["ticker"], 0) < cap_max)
+        ]
+        logger.info(f"Cap tier '{strategy.cap_tier}': {len(signals)} signals after filtering")
+        if not signals:
+            return {"error": f"No signals after cap_tier={strategy.cap_tier} filter"}
+
     if prices is None:
         prices = _load_all_prices()

@@ -291,6 +341,7 @@ def simulate(strategy: Strategy, prices: dict = None) -> dict:
             "min_score": strategy.min_score,
             "min_cluster": strategy.min_cluster,
             "roundtrip_cost_pct": round(strategy.roundtrip_cost * 100, 3),
+            "cap_tier": strategy.cap_tier or "all",
         },
         "period": {
             "start": equity_curve[0][0] if equity_curve else "n/a",
@@ -338,7 +389,7 @@ def _print_results(r: dict):
     print(f"{'=' * w}")
     print(f" Strategy")
     print(f" Hold: {s['holding_days']}d | Delay: {s['buy_delay']}d | Size: {s['position_size']*100:.0f}% of cash")
-    print(f" Score ≥ {s['min_score']} | Cluster ≥ {s['min_cluster']}")
+    print(f" Score ≥ {s['min_score']} | Cluster ≥ {s['min_cluster']} | Cap: {s['cap_tier']}")
     print(f" Round-trip cost: {s['roundtrip_cost_pct']:.2f}%")
     print(f" Period: {period['start']} → {period['end']} ({period['years']}y)")
     print(f"{'─' * w}")
@@ -373,6 +424,8 @@ def main():
                         help="Fraction of available cash per trade (0.10 = 10%%)")
     parser.add_argument("--min-score", type=float, default=0.0)
     parser.add_argument("--min-cluster", type=int, default=1)
+    parser.add_argument("--cap-tier", choices=["large", "mid", "small", "micro"],
+                        default=None, help="Filter by market cap tier")
     parser.add_argument("--capital", type=float, default=100_000.0)
     # Costs
     parser.add_argument("--spread", type=float, default=0.003,
@@ -382,7 +435,11 @@ def main():
     parser.add_argument("--commission", type=float, default=0.001,
                         help="Per-trade commission as fraction of notional")

-    args = parser.parse_args()
+    # When invoked via `python main.py simulate ...`, argv[1] is 'simulate' -- skip it
+    raw = sys.argv[1:]
+    if raw and raw[0] == "simulate":
+        raw = raw[1:]
+    args = parser.parse_args(raw)

     from db.db import init_db
     init_db()
@@ -397,6 +454,7 @@ def main():
         spread=args.spread,
         slippage=args.slippage,
         commission=args.commission,
+        cap_tier=args.cap_tier,
     )

     result = simulate(strategy)
24
db/db.py
@@ -6,7 +6,7 @@ from sqlalchemy.exc import IntegrityError
 from sqlalchemy.orm import Session

 import config
-from db.models import Base, Filing, PriceCache, Signal
+from db.models import Base, Filing, PriceCache, Signal, TickerMeta


 def _engine():
@@ -219,6 +219,28 @@ def get_signals_for_backtest(min_score: float, min_cluster_size: int) -> list[di
         return [_signal_to_dict(r) for r in rows]


+def get_cached_market_caps(tickers: list[str]) -> dict[str, float]:
+    if not tickers:
+        return {}
+    with _session() as session:
+        rows = session.scalars(
+            select(TickerMeta).where(TickerMeta.ticker.in_(tickers))
+        ).all()
+        return {r.ticker: r.market_cap for r in rows if r.market_cap is not None}
+
+
+def upsert_market_caps(caps: dict[str, float]) -> None:
+    with _session() as session:
+        for ticker, cap in caps.items():
+            existing = session.get(TickerMeta, ticker)
+            if existing:
+                existing.market_cap = cap
+                existing.fetched_at = datetime.utcnow()
+            else:
+                session.add(TickerMeta(ticker=ticker, market_cap=cap))
+        session.commit()
+
+
 def get_cached_prices(ticker: str, start_date: str, end_date: str) -> dict[str, float]:
     with _session() as session:
         rows = session.scalars(
db/models.py
@@ -66,6 +66,14 @@ class Signal(Base):
     )


+class TickerMeta(Base):
+    __tablename__ = "ticker_meta"
+
+    ticker = Column(String, primary_key=True)
+    market_cap = Column(Float, nullable=True)
+    fetched_at = Column(DateTime, default=datetime.utcnow)
+
+
 class PriceCache(Base):
     __tablename__ = "price_cache"
