feat: Insider Copytrade POC + PLAN.md #2
6
.env.example
Normal file
6
.env.example
Normal file
@ -0,0 +1,6 @@
|
|||||||
|
SLACK_WEBHOOK_URL=
|
||||||
|
ALPACA_KEY=
|
||||||
|
ALPACA_SECRET=
|
||||||
|
ALPACA_BASE_URL=https://paper-api.alpaca.markets
|
||||||
|
DB_PATH=insider.db
|
||||||
|
DATA_DIR=data/filings
|
||||||
340
PLAN.md
Normal file
340
PLAN.md
Normal file
@ -0,0 +1,340 @@
|
|||||||
|
# Insider Copytrade System -- Implementation Plan
|
||||||
|
|
||||||
|
## Description
|
||||||
|
|
||||||
|
A personal system that monitors SEC EDGAR Form 4 filings in real-time, filters for high-quality insider buying signals, alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API.
|
||||||
|
|
||||||
|
The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Background
|
||||||
|
|
||||||
|
Company insiders (executives, directors, >10% shareholders) must file SEC Form 4 within 2 business days of any trade. This is public data via SEC EDGAR. The signal value of insider *buying* is academically documented -- executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:
|
||||||
|
|
||||||
|
- Multiple insiders buy simultaneously (cluster signal)
|
||||||
|
- The trade is unplanned (not a 10b5-1 scheduled plan)
|
||||||
|
- The company is small/mid-cap (less institutional arbitrage)
|
||||||
|
|
||||||
|
The edge vs. political trade copying: 2-day disclosure lag vs. 45 days, and the signal is company-specific rather than sector-level.
|
||||||
|
|
||||||
|
**Key risk:** This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## System Outline
|
||||||
|
|
||||||
|
```
|
||||||
|
SEC EDGAR RSS Feed (poll every 10 min)
|
||||||
|
|
|
||||||
|
[Ingestion Layer]
|
||||||
|
|
|
||||||
|
Parse Form 4 XML
|
||||||
|
|
|
||||||
|
[Filter Engine]
|
||||||
|
- Buy only (flag = A)
|
||||||
|
- Exclude 10b5-1 plans
|
||||||
|
- Min transaction size
|
||||||
|
- Role weighting
|
||||||
|
- Cluster detection
|
||||||
|
|
|
||||||
|
SQLite Database
|
||||||
|
|
|
||||||
|
┌────────────┬──────────────┐
|
||||||
|
| | |
|
||||||
|
[Backtester] [Slack Alert] [Alpaca API]
|
||||||
|
(manual) (paper/live)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Actionables
|
||||||
|
|
||||||
|
### Phase 1 -- Data Ingestion
|
||||||
|
|
||||||
|
**Goal:** Reliably pull and parse Form 4 filings as they appear.
|
||||||
|
|
||||||
|
**Tasks:**
|
||||||
|
|
||||||
|
1. Set up project structure
|
||||||
|
```
|
||||||
|
insider-copytrade/
|
||||||
|
ingestion/
|
||||||
|
edgar_poller.py # polls EDGAR RSS
|
||||||
|
form4_parser.py # parses XML -> structured dict
|
||||||
|
db/
|
||||||
|
schema.sql
|
||||||
|
db.py # SQLite interface
|
||||||
|
signals/
|
||||||
|
filter_engine.py # applies signal filters
|
||||||
|
cluster_detector.py
|
||||||
|
alerts/
|
||||||
|
slack_alert.py
|
||||||
|
broker/
|
||||||
|
alpaca_client.py
|
||||||
|
backtest/
|
||||||
|
backtest.py
|
||||||
|
config.py
|
||||||
|
main.py
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Poll EDGAR RSS for Form 4 filings every 10 minutes:
|
||||||
|
```
|
||||||
|
https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&search_text=&action=getcurrent
|
||||||
|
```
|
||||||
|
SEC also provides a structured latest filings feed:
|
||||||
|
```
|
||||||
|
https://efts.sec.gov/LATEST/search-index?q=&forms=4
|
||||||
|
```
|
||||||
|
|
||||||
|
3. For each new filing, fetch and parse the XML document. Key fields to extract:
|
||||||
|
- `issuerTradingSymbol` (ticker)
|
||||||
|
- `rptOwnerName`, `officerTitle` (insider name + role)
|
||||||
|
- `transactionDate`
|
||||||
|
- `transactionAcquiredDisposedCode` (A = buy, D = sell)
|
||||||
|
- `transactionShares`, `transactionPricePerShare`
|
||||||
|
- `transactionTotalValue` (compute if not present)
|
||||||
|
- `footnotes` (check for "10b5-1" mention)
|
||||||
|
- `sharesOwnedFollowingTransaction`
|
||||||
|
|
||||||
|
4. Store raw filing XML + parsed fields. Track `accessionNumber` as dedup key.
|
||||||
|
|
||||||
|
**SQLite schema:**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE filings (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
accession_number TEXT UNIQUE,
|
||||||
|
ticker TEXT,
|
||||||
|
cik TEXT,
|
||||||
|
insider_name TEXT,
|
||||||
|
role TEXT,
|
||||||
|
transaction_date TEXT,
|
||||||
|
filed_date TEXT,
|
||||||
|
shares REAL,
|
||||||
|
price REAL,
|
||||||
|
total_value REAL,
|
||||||
|
flag TEXT, -- A or D
|
||||||
|
is_10b51 INTEGER, -- 0 or 1
|
||||||
|
post_tx_shares REAL,
|
||||||
|
created_at TEXT
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE signals (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
ticker TEXT,
|
||||||
|
trigger_date TEXT,
|
||||||
|
cluster_size INTEGER,
|
||||||
|
total_cluster_value REAL,
|
||||||
|
score REAL,
|
||||||
|
alerted INTEGER DEFAULT 0,
|
||||||
|
executed INTEGER DEFAULT 0,
|
||||||
|
created_at TEXT
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 2 -- Filter Engine
|
||||||
|
|
||||||
|
**Goal:** Reduce noise to actionable signals only.
|
||||||
|
|
||||||
|
**Filters to apply (in order):**
|
||||||
|
|
||||||
|
| Filter | Logic |
|
||||||
|
|---|---|
|
||||||
|
| Buy only | `flag == 'A'` |
|
||||||
|
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
|
||||||
|
| Min transaction value | `total_value >= 50000` (configurable) |
|
||||||
|
| Exclude derivative transactions | Options exercises are weaker signal than open market purchases |
|
||||||
|
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
|
||||||
|
| Cluster detection | 2+ insiders buying same ticker within 30 days = elevated signal |
|
||||||
|
|
||||||
|
**Scoring formula (simple v1):**
|
||||||
|
```python
|
||||||
|
score = base_role_weight * log(total_value) * cluster_multiplier
|
||||||
|
# cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
|
||||||
|
```
|
||||||
|
|
||||||
|
Expose all thresholds in `config.py` for easy tuning during backtesting.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 3 -- SQLite Storage
|
||||||
|
|
||||||
|
SQLite is sufficient for this workload (low write volume, single process). Use WAL mode for concurrent reads during backtesting:
|
||||||
|
|
||||||
|
```python
|
||||||
|
conn = sqlite3.connect('insider.db')
|
||||||
|
conn.execute('PRAGMA journal_mode=WAL')
|
||||||
|
```
|
||||||
|
|
||||||
|
Keep raw filing XML in a `/data/filings/` directory keyed by accession number. Parse on ingest, re-parse never needed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 4 -- Slack Alerts
|
||||||
|
|
||||||
|
**Goal:** Get notified immediately when a signal fires, with enough context to decide manually.
|
||||||
|
|
||||||
|
1. Create a Slack app, get a webhook URL (takes 5 minutes)
|
||||||
|
2. Alert format:
|
||||||
|
|
||||||
|
```
|
||||||
|
INSIDER BUY SIGNAL
|
||||||
|
Ticker: $ACME
|
||||||
|
Insider: John Smith (CEO)
|
||||||
|
Date: 2025-05-01
|
||||||
|
Shares: 10,000 @ $14.50 = $145,000
|
||||||
|
Cluster: 3 insiders in last 14 days
|
||||||
|
Score: 8.4
|
||||||
|
10b5-1: No
|
||||||
|
EDGAR: https://www.sec.gov/cgi-bin/browse-edgar?...
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Alert only on signals above configurable score threshold
|
||||||
|
4. Mark `alerted = 1` in DB after sending to avoid duplicates on re-poll
|
||||||
|
|
||||||
|
```python
|
||||||
|
import requests
|
||||||
|
|
||||||
|
def send_slack_alert(webhook_url, signal):
|
||||||
|
requests.post(webhook_url, json={"text": format_signal(signal)})
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 5 -- Backtesting
|
||||||
|
|
||||||
|
**Goal:** Validate filter parameters on historical data before going live.
|
||||||
|
|
||||||
|
**Data:**
|
||||||
|
- Historical Form 4 filings: download bulk XML from `https://www.sec.gov/dera/data/form-4-data`
|
||||||
|
- Price data: `yfinance` (free, sufficient for backtesting)
|
||||||
|
|
||||||
|
**Backtest logic:**
|
||||||
|
```python
|
||||||
|
# For each signal in historical data:
|
||||||
|
# - Entry: next market open after filed_date
|
||||||
|
# - Exit: N days later (configurable: 30/60/90/180)
|
||||||
|
# - Calculate return vs SPY over same period
|
||||||
|
# - Aggregate by role, cluster_size, market_cap bucket
|
||||||
|
```
|
||||||
|
|
||||||
|
**Use `vectorbt` for performance:**
|
||||||
|
```python
|
||||||
|
import vectorbt as vbt
|
||||||
|
# Build entry/exit signal matrices aligned to price data
|
||||||
|
# Run portfolio simulation with configurable position sizing
|
||||||
|
```
|
||||||
|
|
||||||
|
**Output metrics:**
|
||||||
|
- Annualized return vs SPY benchmark
|
||||||
|
- Win rate
|
||||||
|
- Avg return by holding period
|
||||||
|
- Avg return by role / cluster size
|
||||||
|
- Max drawdown
|
||||||
|
- Sharpe ratio
|
||||||
|
|
||||||
|
**Critical:** Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.
|
||||||
|
|
||||||
|
**Parameter grid to test:**
|
||||||
|
```python
|
||||||
|
MIN_VALUE = [25_000, 50_000, 100_000]
|
||||||
|
HOLDING_DAYS = [30, 60, 90, 180]
|
||||||
|
CLUSTER_WINDOW = [14, 30]
|
||||||
|
MIN_CLUSTER_SIZE = [1, 2, 3]
|
||||||
|
ROLES = ['all', 'c-suite-only']
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 6 -- Alpaca Integration
|
||||||
|
|
||||||
|
**Goal:** Optionally auto-execute signals. Start with paper trading.
|
||||||
|
|
||||||
|
**Paper trading base URL:** `https://paper-api.alpaca.markets`
|
||||||
|
**Live trading base URL:** `https://api.alpaca.markets`
|
||||||
|
|
||||||
|
Swap via config flag -- never hardcode.
|
||||||
|
|
||||||
|
```python
|
||||||
|
from alpaca_trade_api import REST
|
||||||
|
|
||||||
|
api = REST(
|
||||||
|
key_id=config.ALPACA_KEY,
|
||||||
|
secret_key=config.ALPACA_SECRET,
|
||||||
|
base_url=config.ALPACA_BASE_URL # paper or live
|
||||||
|
)
|
||||||
|
|
||||||
|
def execute_signal(ticker, portfolio_value, signal_score):
|
||||||
|
# Fixed fractional sizing: 2% of portfolio per signal
|
||||||
|
price = api.get_latest_trade(ticker).price
|
||||||
|
allocation = portfolio_value * 0.02
|
||||||
|
qty = int(allocation / price)
|
||||||
|
if qty < 1:
|
||||||
|
return
|
||||||
|
api.submit_order(
|
||||||
|
symbol=ticker,
|
||||||
|
qty=qty,
|
||||||
|
side='buy',
|
||||||
|
type='market',
|
||||||
|
time_in_force='day'
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
Position sizing: start at 2% per signal, max 10% in any single ticker. Add a max open positions limit (e.g. 20) to cap exposure.
|
||||||
|
|
||||||
|
Exit logic (v1): time-based only (close after N days). Add trailing stop later.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Build Order
|
||||||
|
|
||||||
|
| Step | Deliverable | Est. Time |
|
||||||
|
|---|---|---|
|
||||||
|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
|
||||||
|
| 2 | Filter engine + cluster detector | 0.5 day |
|
||||||
|
| 3 | Slack alert | 1 hour |
|
||||||
|
| 4 | Historical data download + backtest | 1-2 days |
|
||||||
|
| 5 | Alpaca paper trading integration | 0.5 day |
|
||||||
|
| 6 | Run paper trading 4-8 weeks, monitor | -- |
|
||||||
|
| 7 | Switch to live with small capital | -- |
|
||||||
|
|
||||||
|
Do not proceed to Step 7 without meaningful paper trading history.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
```
|
||||||
|
requests
|
||||||
|
lxml
|
||||||
|
sqlite3 (stdlib)
|
||||||
|
yfinance
|
||||||
|
vectorbt
|
||||||
|
alpaca-trade-api
|
||||||
|
python-dotenv
|
||||||
|
```
|
||||||
|
|
||||||
|
All free. No paid APIs required.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Config Template
|
||||||
|
|
||||||
|
```python
|
||||||
|
# config.py
|
||||||
|
EDGAR_POLL_INTERVAL = 600 # seconds
|
||||||
|
MIN_TRANSACTION_VALUE = 50_000
|
||||||
|
MIN_CLUSTER_SIZE = 1 # raise to 2 for higher quality
|
||||||
|
CLUSTER_WINDOW_DAYS = 30
|
||||||
|
HOLDING_PERIOD_DAYS = 90
|
||||||
|
POSITION_SIZE_PCT = 0.02 # 2% per signal
|
||||||
|
MAX_POSITIONS = 20
|
||||||
|
SCORE_ALERT_THRESHOLD = 5.0
|
||||||
|
|
||||||
|
SLACK_WEBHOOK_URL = ""
|
||||||
|
ALPACA_KEY = ""
|
||||||
|
ALPACA_SECRET = ""
|
||||||
|
ALPACA_BASE_URL = "https://paper-api.alpaca.markets" # switch for live
|
||||||
|
```
|
||||||
0
alerts/__init__.py
Normal file
0
alerts/__init__.py
Normal file
64
alerts/slack_alert.py
Normal file
64
alerts/slack_alert.py
Normal file
@ -0,0 +1,64 @@
|
|||||||
|
import logging
|
||||||
|
import requests
|
||||||
|
|
||||||
|
import config
|
||||||
|
from db.db import mark_signal_alerted
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def format_signal(signal: dict) -> str:
|
||||||
|
filing = signal.get("filing", {})
|
||||||
|
ticker = signal["ticker"]
|
||||||
|
insider = filing.get("insider_name", "Unknown")
|
||||||
|
role = filing.get("role", "Unknown")
|
||||||
|
tx_date = filing.get("transaction_date", "")
|
||||||
|
shares = filing.get("shares")
|
||||||
|
price = filing.get("price")
|
||||||
|
total_value = filing.get("total_value") or signal.get("total_cluster_value", 0)
|
||||||
|
cluster_size = signal["cluster_size"]
|
||||||
|
score = signal["score"]
|
||||||
|
is_10b51 = "Yes" if filing.get("is_10b51") else "No"
|
||||||
|
accession = filing.get("accession_number", "")
|
||||||
|
|
||||||
|
shares_str = f"{shares:,.0f}" if shares else "N/A"
|
||||||
|
price_str = f"${price:,.2f}" if price else "N/A"
|
||||||
|
value_str = f"${total_value:,.0f}" if total_value else "N/A"
|
||||||
|
edgar_url = f"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&type=4&dateb=&owner=include&count=10&search_text=&ticker={ticker}"
|
||||||
|
|
||||||
|
return (
|
||||||
|
f"*INSIDER BUY SIGNAL*\n"
|
||||||
|
f"Ticker: ${ticker}\n"
|
||||||
|
f"Insider: {insider} ({role})\n"
|
||||||
|
f"Date: {tx_date}\n"
|
||||||
|
f"Shares: {shares_str} @ {price_str} = {value_str}\n"
|
||||||
|
f"Cluster: {cluster_size} insider(s) in last {config.CLUSTER_WINDOW_DAYS} days\n"
|
||||||
|
f"Score: {score}\n"
|
||||||
|
f"10b5-1: {is_10b51}\n"
|
||||||
|
f"EDGAR: {edgar_url}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def send_slack_alert(signal: dict) -> bool:
|
||||||
|
if not config.SLACK_WEBHOOK_URL:
|
||||||
|
logger.warning("SLACK_WEBHOOK_URL not configured")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if signal.get("score", 0) < config.SCORE_ALERT_THRESHOLD:
|
||||||
|
logger.debug(f"Signal score {signal['score']} below threshold {config.SCORE_ALERT_THRESHOLD}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
text = format_signal(signal)
|
||||||
|
try:
|
||||||
|
resp = requests.post(
|
||||||
|
config.SLACK_WEBHOOK_URL,
|
||||||
|
json={"text": text},
|
||||||
|
timeout=10,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
mark_signal_alerted(signal["id"])
|
||||||
|
logger.info(f"Slack alert sent for {signal['ticker']}")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to send Slack alert: {e}")
|
||||||
|
return False
|
||||||
0
backtest/__init__.py
Normal file
0
backtest/__init__.py
Normal file
147
backtest/backtest.py
Normal file
147
backtest/backtest.py
Normal file
@ -0,0 +1,147 @@
|
|||||||
|
import logging
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from typing import Optional
|
||||||
|
import sqlite3
|
||||||
|
|
||||||
|
import config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _load_signals_from_db(db_path: str) -> list[dict]:
|
||||||
|
conn = sqlite3.connect(db_path)
|
||||||
|
conn.row_factory = sqlite3.Row
|
||||||
|
rows = conn.execute(
|
||||||
|
"SELECT s.*, f.role FROM signals s "
|
||||||
|
"LEFT JOIN filings f ON f.ticker = s.ticker AND f.transaction_date = s.trigger_date "
|
||||||
|
"WHERE s.cluster_size >= 1"
|
||||||
|
).fetchall()
|
||||||
|
conn.close()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
|
||||||
|
|
||||||
|
def run_backtest(
|
||||||
|
db_path: str = None,
|
||||||
|
holding_days: int = None,
|
||||||
|
min_score: float = 0.0,
|
||||||
|
min_cluster_size: int = 1,
|
||||||
|
) -> dict:
|
||||||
|
try:
|
||||||
|
import yfinance as yf
|
||||||
|
except ImportError:
|
||||||
|
raise ImportError("yfinance not installed. Run: pip install yfinance")
|
||||||
|
|
||||||
|
db_path = db_path or config.DB_PATH
|
||||||
|
holding_days = holding_days or config.HOLDING_PERIOD_DAYS
|
||||||
|
|
||||||
|
signals = _load_signals_from_db(db_path)
|
||||||
|
signals = [s for s in signals if s["score"] >= min_score and s["cluster_size"] >= min_cluster_size]
|
||||||
|
|
||||||
|
if not signals:
|
||||||
|
logger.warning("No signals found matching criteria")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
results = []
|
||||||
|
spy_returns = {}
|
||||||
|
|
||||||
|
for signal in signals:
|
||||||
|
ticker = signal["ticker"]
|
||||||
|
entry_date_str = signal["trigger_date"]
|
||||||
|
|
||||||
|
try:
|
||||||
|
entry_date = datetime.strptime(entry_date_str, "%Y-%m-%d")
|
||||||
|
except ValueError:
|
||||||
|
continue
|
||||||
|
|
||||||
|
exit_date = entry_date + timedelta(days=holding_days)
|
||||||
|
|
||||||
|
try:
|
||||||
|
stock_data = yf.download(
|
||||||
|
ticker,
|
||||||
|
start=(entry_date - timedelta(days=5)).strftime("%Y-%m-%d"),
|
||||||
|
end=(exit_date + timedelta(days=5)).strftime("%Y-%m-%d"),
|
||||||
|
progress=False,
|
||||||
|
auto_adjust=True,
|
||||||
|
)
|
||||||
|
if stock_data.empty:
|
||||||
|
continue
|
||||||
|
|
||||||
|
entry_price = float(stock_data["Close"].iloc[0])
|
||||||
|
exit_price = float(stock_data["Close"].iloc[-1])
|
||||||
|
stock_return = (exit_price - entry_price) / entry_price
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"Failed to get data for {ticker}: {e}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
period_key = (entry_date_str, holding_days)
|
||||||
|
if period_key not in spy_returns:
|
||||||
|
try:
|
||||||
|
spy_data = yf.download(
|
||||||
|
"SPY",
|
||||||
|
start=(entry_date - timedelta(days=5)).strftime("%Y-%m-%d"),
|
||||||
|
end=(exit_date + timedelta(days=5)).strftime("%Y-%m-%d"),
|
||||||
|
progress=False,
|
||||||
|
auto_adjust=True,
|
||||||
|
)
|
||||||
|
if not spy_data.empty:
|
||||||
|
spy_entry = float(spy_data["Close"].iloc[0])
|
||||||
|
spy_exit = float(spy_data["Close"].iloc[-1])
|
||||||
|
spy_returns[period_key] = (spy_exit - spy_entry) / spy_entry
|
||||||
|
else:
|
||||||
|
spy_returns[period_key] = 0.0
|
||||||
|
except Exception:
|
||||||
|
spy_returns[period_key] = 0.0
|
||||||
|
|
||||||
|
spy_return = spy_returns.get(period_key, 0.0)
|
||||||
|
alpha = stock_return - spy_return
|
||||||
|
|
||||||
|
results.append({
|
||||||
|
"ticker": ticker,
|
||||||
|
"entry_date": entry_date_str,
|
||||||
|
"stock_return": round(stock_return, 4),
|
||||||
|
"spy_return": round(spy_return, 4),
|
||||||
|
"alpha": round(alpha, 4),
|
||||||
|
"cluster_size": signal["cluster_size"],
|
||||||
|
"score": signal["score"],
|
||||||
|
})
|
||||||
|
|
||||||
|
if not results:
|
||||||
|
return {"error": "No results computed"}
|
||||||
|
|
||||||
|
returns = [r["stock_return"] for r in results]
|
||||||
|
alphas = [r["alpha"] for r in results]
|
||||||
|
win_rate = sum(1 for r in returns if r > 0) / len(returns)
|
||||||
|
avg_return = sum(returns) / len(returns)
|
||||||
|
avg_alpha = sum(alphas) / len(alphas)
|
||||||
|
|
||||||
|
import math
|
||||||
|
std_dev = math.sqrt(sum((r - avg_return) ** 2 for r in returns) / len(returns))
|
||||||
|
sharpe = (avg_return / std_dev * math.sqrt(252 / holding_days)) if std_dev > 0 else 0.0
|
||||||
|
|
||||||
|
summary = {
|
||||||
|
"total_signals": len(results),
|
||||||
|
"win_rate": round(win_rate, 4),
|
||||||
|
"avg_return": round(avg_return, 4),
|
||||||
|
"avg_alpha_vs_spy": round(avg_alpha, 4),
|
||||||
|
"sharpe_ratio": round(sharpe, 4),
|
||||||
|
"holding_days": holding_days,
|
||||||
|
"results": results,
|
||||||
|
}
|
||||||
|
|
||||||
|
return summary
|
||||||
|
|
||||||
|
|
||||||
|
def print_summary(summary: dict):
|
||||||
|
if "error" in summary:
|
||||||
|
print(f"Error: {summary['error']}")
|
||||||
|
return
|
||||||
|
print(f"\n{'='*40}")
|
||||||
|
print(f"Backtest Results ({summary['holding_days']}-day hold)")
|
||||||
|
print(f"{'='*40}")
|
||||||
|
print(f"Total signals: {summary['total_signals']}")
|
||||||
|
print(f"Win rate: {summary['win_rate']:.1%}")
|
||||||
|
print(f"Avg return: {summary['avg_return']:.2%}")
|
||||||
|
print(f"Avg alpha vs SPY: {summary['avg_alpha_vs_spy']:.2%}")
|
||||||
|
print(f"Sharpe ratio: {summary['sharpe_ratio']:.2f}")
|
||||||
|
print(f"{'='*40}\n")
|
||||||
0
broker/__init__.py
Normal file
0
broker/__init__.py
Normal file
94
broker/alpaca_client.py
Normal file
94
broker/alpaca_client.py
Normal file
@ -0,0 +1,94 @@
|
|||||||
|
import logging
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import config
|
||||||
|
from db.db import mark_signal_executed
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _get_api():
|
||||||
|
try:
|
||||||
|
from alpaca_trade_api import REST
|
||||||
|
except ImportError:
|
||||||
|
raise ImportError("alpaca-trade-api not installed. Run: pip install alpaca-trade-api")
|
||||||
|
|
||||||
|
return REST(
|
||||||
|
key_id=config.ALPACA_KEY,
|
||||||
|
secret_key=config.ALPACA_SECRET,
|
||||||
|
base_url=config.ALPACA_BASE_URL,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def get_portfolio_value() -> float:
|
||||||
|
api = _get_api()
|
||||||
|
account = api.get_account()
|
||||||
|
return float(account.portfolio_value)
|
||||||
|
|
||||||
|
|
||||||
|
def get_open_positions_count() -> int:
|
||||||
|
api = _get_api()
|
||||||
|
return len(api.list_positions())
|
||||||
|
|
||||||
|
|
||||||
|
def execute_signal(signal: dict) -> bool:
|
||||||
|
if not config.ALPACA_KEY or not config.ALPACA_SECRET:
|
||||||
|
logger.warning("Alpaca credentials not configured")
|
||||||
|
return False
|
||||||
|
|
||||||
|
ticker = signal["ticker"]
|
||||||
|
|
||||||
|
try:
|
||||||
|
api = _get_api()
|
||||||
|
positions_count = get_open_positions_count()
|
||||||
|
if positions_count >= config.MAX_POSITIONS:
|
||||||
|
logger.warning(f"Max positions ({config.MAX_POSITIONS}) reached, skipping {ticker}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
portfolio_value = get_portfolio_value()
|
||||||
|
allocation = portfolio_value * config.POSITION_SIZE_PCT
|
||||||
|
|
||||||
|
latest_trade = api.get_latest_trade(ticker)
|
||||||
|
price = float(latest_trade.price)
|
||||||
|
if price <= 0:
|
||||||
|
logger.error(f"Invalid price for {ticker}: {price}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
qty = int(allocation / price)
|
||||||
|
if qty < 1:
|
||||||
|
logger.warning(f"Allocation too small for {ticker}: ${allocation:.2f} at ${price:.2f}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
existing_positions = {p.symbol: p for p in api.list_positions()}
|
||||||
|
if ticker in existing_positions:
|
||||||
|
existing_value = float(existing_positions[ticker].market_value)
|
||||||
|
if existing_value / portfolio_value >= 0.10:
|
||||||
|
logger.warning(f"Already at 10% cap for {ticker}, skipping")
|
||||||
|
return False
|
||||||
|
|
||||||
|
order = api.submit_order(
|
||||||
|
symbol=ticker,
|
||||||
|
qty=qty,
|
||||||
|
side="buy",
|
||||||
|
type="market",
|
||||||
|
time_in_force="day",
|
||||||
|
)
|
||||||
|
logger.info(f"Order submitted: {ticker} qty={qty} order_id={order.id}")
|
||||||
|
mark_signal_executed(signal["id"])
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to execute signal for {ticker}: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def close_position_after_days(ticker: str, holding_days: Optional[int] = None):
|
||||||
|
days = holding_days or config.HOLDING_PERIOD_DAYS
|
||||||
|
api = _get_api()
|
||||||
|
try:
|
||||||
|
api.close_position(ticker)
|
||||||
|
logger.info(f"Closed position: {ticker} after {days} days")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to close position {ticker}: {e}")
|
||||||
|
return False
|
||||||
40
config.py
Normal file
40
config.py
Normal file
@ -0,0 +1,40 @@
|
|||||||
|
import os
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
EDGAR_POLL_INTERVAL = 600
|
||||||
|
MIN_TRANSACTION_VALUE = 50_000
|
||||||
|
MIN_CLUSTER_SIZE = 1
|
||||||
|
CLUSTER_WINDOW_DAYS = 30
|
||||||
|
HOLDING_PERIOD_DAYS = 90
|
||||||
|
POSITION_SIZE_PCT = 0.02
|
||||||
|
MAX_POSITIONS = 20
|
||||||
|
SCORE_ALERT_THRESHOLD = 5.0
|
||||||
|
|
||||||
|
ROLE_WEIGHTS = {
|
||||||
|
"ceo": 3.0,
|
||||||
|
"chief executive officer": 3.0,
|
||||||
|
"cfo": 2.5,
|
||||||
|
"chief financial officer": 2.5,
|
||||||
|
"president": 2.5,
|
||||||
|
"coo": 2.0,
|
||||||
|
"chief operating officer": 2.0,
|
||||||
|
"director": 1.5,
|
||||||
|
"vp": 1.2,
|
||||||
|
"vice president": 1.2,
|
||||||
|
"10% owner": 1.0,
|
||||||
|
}
|
||||||
|
DEFAULT_ROLE_WEIGHT = 1.0
|
||||||
|
|
||||||
|
SLACK_WEBHOOK_URL = os.getenv("SLACK_WEBHOOK_URL", "")
|
||||||
|
ALPACA_KEY = os.getenv("ALPACA_KEY", "")
|
||||||
|
ALPACA_SECRET = os.getenv("ALPACA_SECRET", "")
|
||||||
|
ALPACA_BASE_URL = os.getenv("ALPACA_BASE_URL", "https://paper-api.alpaca.markets")
|
||||||
|
|
||||||
|
DB_PATH = os.getenv("DB_PATH", "insider.db")
|
||||||
|
DATA_DIR = os.getenv("DATA_DIR", "data/filings")
|
||||||
|
|
||||||
|
EDGAR_RSS_URL = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&search_text=&action=getcurrent"
|
||||||
|
EDGAR_SEARCH_URL = "https://efts.sec.gov/LATEST/search-index?q=&forms=4"
|
||||||
|
EDGAR_BASE_URL = "https://www.sec.gov"
|
||||||
0
db/__init__.py
Normal file
0
db/__init__.py
Normal file
123
db/db.py
Normal file
123
db/db.py
Normal file
@ -0,0 +1,123 @@
|
|||||||
|
import sqlite3
|
||||||
|
import os
|
||||||
|
from datetime import datetime
|
||||||
|
import config
|
||||||
|
|
||||||
|
|
||||||
|
def get_connection():
|
||||||
|
conn = sqlite3.connect(config.DB_PATH)
|
||||||
|
conn.row_factory = sqlite3.Row
|
||||||
|
conn.execute("PRAGMA journal_mode=WAL")
|
||||||
|
conn.execute("PRAGMA foreign_keys=ON")
|
||||||
|
return conn
|
||||||
|
|
||||||
|
|
||||||
|
def init_db():
|
||||||
|
schema_path = os.path.join(os.path.dirname(__file__), "schema.sql")
|
||||||
|
with open(schema_path, "r") as f:
|
||||||
|
schema = f.read()
|
||||||
|
conn = get_connection()
|
||||||
|
conn.executescript(schema)
|
||||||
|
conn.commit()
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def insert_filing(filing: dict) -> bool:
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
conn.execute(
|
||||||
|
"""
|
||||||
|
INSERT OR IGNORE INTO filings
|
||||||
|
(accession_number, ticker, cik, insider_name, role,
|
||||||
|
transaction_date, filed_date, shares, price, total_value,
|
||||||
|
flag, is_10b51, post_tx_shares)
|
||||||
|
VALUES
|
||||||
|
(:accession_number, :ticker, :cik, :insider_name, :role,
|
||||||
|
:transaction_date, :filed_date, :shares, :price, :total_value,
|
||||||
|
:flag, :is_10b51, :post_tx_shares)
|
||||||
|
""",
|
||||||
|
filing,
|
||||||
|
)
|
||||||
|
inserted = conn.execute("SELECT changes()").fetchone()[0] > 0
|
||||||
|
conn.commit()
|
||||||
|
return inserted
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def insert_signal(signal: dict) -> int:
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
cur = conn.execute(
|
||||||
|
"""
|
||||||
|
INSERT INTO signals
|
||||||
|
(ticker, trigger_date, cluster_size, total_cluster_value, score)
|
||||||
|
VALUES
|
||||||
|
(:ticker, :trigger_date, :cluster_size, :total_cluster_value, :score)
|
||||||
|
""",
|
||||||
|
signal,
|
||||||
|
)
|
||||||
|
signal_id = cur.lastrowid
|
||||||
|
conn.commit()
|
||||||
|
return signal_id
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def mark_signal_alerted(signal_id: int):
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
conn.execute("UPDATE signals SET alerted=1 WHERE id=?", (signal_id,))
|
||||||
|
conn.commit()
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def mark_signal_executed(signal_id: int):
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
conn.execute("UPDATE signals SET executed=1 WHERE id=?", (signal_id,))
|
||||||
|
conn.commit()
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def get_unalerted_signals():
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
rows = conn.execute(
|
||||||
|
"SELECT * FROM signals WHERE alerted=0 ORDER BY created_at ASC"
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def get_recent_buys_for_ticker(ticker: str, window_days: int) -> list:
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
rows = conn.execute(
|
||||||
|
"""
|
||||||
|
SELECT * FROM filings
|
||||||
|
WHERE ticker=?
|
||||||
|
AND flag='A'
|
||||||
|
AND is_10b51=0
|
||||||
|
AND transaction_date >= date('now', ? || ' days')
|
||||||
|
ORDER BY transaction_date DESC
|
||||||
|
""",
|
||||||
|
(ticker, f"-{window_days}"),
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def accession_exists(accession_number: str) -> bool:
|
||||||
|
conn = get_connection()
|
||||||
|
try:
|
||||||
|
row = conn.execute(
|
||||||
|
"SELECT 1 FROM filings WHERE accession_number=?", (accession_number,)
|
||||||
|
).fetchone()
|
||||||
|
return row is not None
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
34
db/schema.sql
Normal file
34
db/schema.sql
Normal file
@ -0,0 +1,34 @@
|
|||||||
|
CREATE TABLE IF NOT EXISTS filings (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
accession_number TEXT UNIQUE,
|
||||||
|
ticker TEXT,
|
||||||
|
cik TEXT,
|
||||||
|
insider_name TEXT,
|
||||||
|
role TEXT,
|
||||||
|
transaction_date TEXT,
|
||||||
|
filed_date TEXT,
|
||||||
|
shares REAL,
|
||||||
|
price REAL,
|
||||||
|
total_value REAL,
|
||||||
|
flag TEXT,
|
||||||
|
is_10b51 INTEGER DEFAULT 0,
|
||||||
|
post_tx_shares REAL,
|
||||||
|
created_at TEXT DEFAULT (datetime('now'))
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS signals (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
ticker TEXT,
|
||||||
|
trigger_date TEXT,
|
||||||
|
cluster_size INTEGER,
|
||||||
|
total_cluster_value REAL,
|
||||||
|
score REAL,
|
||||||
|
alerted INTEGER DEFAULT 0,
|
||||||
|
executed INTEGER DEFAULT 0,
|
||||||
|
created_at TEXT DEFAULT (datetime('now'))
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_filings_ticker ON filings(ticker);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_filings_transaction_date ON filings(transaction_date);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_signals_ticker ON signals(ticker);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_signals_alerted ON signals(alerted);
|
||||||
0
ingestion/__init__.py
Normal file
0
ingestion/__init__.py
Normal file
138
ingestion/edgar_poller.py
Normal file
138
ingestion/edgar_poller.py
Normal file
@ -0,0 +1,138 @@
|
|||||||
|
import time
|
||||||
|
import os
|
||||||
|
import logging
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional
|
||||||
|
import requests
|
||||||
|
from lxml import etree
|
||||||
|
|
||||||
|
import config
|
||||||
|
from ingestion.form4_parser import parse_form4
|
||||||
|
from db.db import insert_filing, accession_exists
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
HEADERS = {
|
||||||
|
"User-Agent": "insider-copytrade-poc contact@example.com",
|
||||||
|
"Accept-Encoding": "gzip, deflate",
|
||||||
|
}
|
||||||
|
|
||||||
|
EDGAR_FULL_INDEX = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&output=atom"
|
||||||
|
|
||||||
|
|
||||||
|
def _fetch(url: str, timeout: int = 30) -> requests.Response:
|
||||||
|
resp = requests.get(url, headers=HEADERS, timeout=timeout)
|
||||||
|
resp.raise_for_status()
|
||||||
|
return resp
|
||||||
|
|
||||||
|
|
||||||
|
def _get_filing_urls() -> list[tuple[str, str, str]]:
|
||||||
|
resp = _fetch(EDGAR_FULL_INDEX)
|
||||||
|
root = etree.fromstring(resp.content)
|
||||||
|
ns = {"atom": "http://www.w3.org/2005/Atom"}
|
||||||
|
entries = root.findall("atom:entry", ns)
|
||||||
|
results = []
|
||||||
|
for entry in entries:
|
||||||
|
filing_href = entry.find("atom:link", ns)
|
||||||
|
if filing_href is None:
|
||||||
|
continue
|
||||||
|
url = filing_href.get("href", "")
|
||||||
|
updated = (entry.findtext("atom:updated", namespaces=ns) or "")[:10]
|
||||||
|
accession = url.rstrip("/").split("/")[-1].replace("-index.htm", "")
|
||||||
|
accession = accession.replace("-", "")
|
||||||
|
if len(accession) == 18:
|
||||||
|
accession = f"{accession[:10]}-{accession[10:12]}-{accession[12:]}"
|
||||||
|
results.append((url, accession, updated))
|
||||||
|
return results
|
||||||
|
|
||||||
|
|
||||||
|
def _get_xml_url_from_index(index_url: str) -> Optional[str]:
|
||||||
|
try:
|
||||||
|
resp = _fetch(index_url)
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
root = etree.fromstring(resp.content)
|
||||||
|
ns = {"atom": "http://www.w3.org/2005/Atom"}
|
||||||
|
for link in root.findall("atom:link", ns):
|
||||||
|
href = link.get("href", "")
|
||||||
|
if href.endswith(".xml") and "form4" in href.lower():
|
||||||
|
return href
|
||||||
|
for link in root.findall(".//filing-href"):
|
||||||
|
if link.text and link.text.endswith(".xml"):
|
||||||
|
return link.text.strip()
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _save_raw_xml(accession: str, xml_bytes: bytes):
|
||||||
|
os.makedirs(config.DATA_DIR, exist_ok=True)
|
||||||
|
path = os.path.join(config.DATA_DIR, f"{accession}.xml")
|
||||||
|
if not os.path.exists(path):
|
||||||
|
with open(path, "wb") as f:
|
||||||
|
f.write(xml_bytes)
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_and_store_new_filings() -> list[dict]:
|
||||||
|
new_filings = []
|
||||||
|
try:
|
||||||
|
entries = _get_filing_urls()
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to fetch EDGAR index: {e}")
|
||||||
|
return new_filings
|
||||||
|
|
||||||
|
for index_url, accession, filed_date in entries:
|
||||||
|
if accession_exists(accession):
|
||||||
|
continue
|
||||||
|
|
||||||
|
xml_url = _resolve_xml_url(index_url, accession)
|
||||||
|
if not xml_url:
|
||||||
|
logger.warning(f"No XML found for {accession}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
try:
|
||||||
|
xml_resp = _fetch(xml_url)
|
||||||
|
xml_bytes = xml_resp.content
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to fetch XML for {accession}: {e}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
_save_raw_xml(accession, xml_bytes)
|
||||||
|
parsed = parse_form4(xml_bytes, accession, filed_date)
|
||||||
|
|
||||||
|
for filing in parsed:
|
||||||
|
inserted = insert_filing(filing)
|
||||||
|
if inserted:
|
||||||
|
new_filings.append(filing)
|
||||||
|
|
||||||
|
return new_filings
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_xml_url(index_url: str, accession: str) -> Optional[str]:
|
||||||
|
accession_path = accession.replace("-", "")
|
||||||
|
cik = accession_path[:10].lstrip("0")
|
||||||
|
base = f"{config.EDGAR_BASE_URL}/Archives/edgar/data/{cik}/{accession_path}/"
|
||||||
|
candidate = f"{base}{accession}-index.htm"
|
||||||
|
try:
|
||||||
|
resp = _fetch(candidate)
|
||||||
|
root = etree.fromstring(resp.content)
|
||||||
|
for node in root.iter():
|
||||||
|
text = (node.text or "").strip()
|
||||||
|
if text.endswith(".xml") and ("4" in text or "form" in text.lower()):
|
||||||
|
return base + text
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def run_poller(on_new_filing=None):
|
||||||
|
logger.info("EDGAR poller started")
|
||||||
|
while True:
|
||||||
|
logger.info("Polling EDGAR for new Form 4 filings...")
|
||||||
|
new = fetch_and_store_new_filings()
|
||||||
|
logger.info(f"Found {len(new)} new filings")
|
||||||
|
if on_new_filing:
|
||||||
|
for filing in new:
|
||||||
|
try:
|
||||||
|
on_new_filing(filing)
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error in on_new_filing callback: {e}")
|
||||||
|
time.sleep(config.EDGAR_POLL_INTERVAL)
|
||||||
95
ingestion/form4_parser.py
Normal file
95
ingestion/form4_parser.py
Normal file
@ -0,0 +1,95 @@
|
|||||||
|
import re
|
||||||
|
from lxml import etree
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
|
||||||
|
_10B51_PATTERNS = [
|
||||||
|
r"10b5-1",
|
||||||
|
r"rule 10b5",
|
||||||
|
r"adopted a plan",
|
||||||
|
r"10b5\(1\)",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _is_10b51(text: str) -> bool:
|
||||||
|
text_lower = text.lower()
|
||||||
|
return any(re.search(p, text_lower) for p in _10B51_PATTERNS)
|
||||||
|
|
||||||
|
|
||||||
|
def _text(el, tag: str) -> Optional[str]:
|
||||||
|
node = el.find(".//" + tag)
|
||||||
|
if node is not None and node.text:
|
||||||
|
return node.text.strip()
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _float(el, tag: str) -> Optional[float]:
|
||||||
|
val = _text(el, tag)
|
||||||
|
if val is None:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
return float(val.replace(",", ""))
|
||||||
|
except ValueError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def parse_form4(xml_bytes: bytes, accession_number: str, filed_date: str) -> list[dict]:
|
||||||
|
try:
|
||||||
|
root = etree.fromstring(xml_bytes)
|
||||||
|
except etree.XMLSyntaxError:
|
||||||
|
return []
|
||||||
|
|
||||||
|
ticker = _text(root, "issuerTradingSymbol") or ""
|
||||||
|
cik = _text(root, "issuerCik") or ""
|
||||||
|
insider_name = _text(root, "rptOwnerName") or ""
|
||||||
|
role = _text(root, "officerTitle") or _text(root, "isDirector") or ""
|
||||||
|
|
||||||
|
footnotes_text = " ".join(
|
||||||
|
(node.text or "") for node in root.findall(".//footnote")
|
||||||
|
)
|
||||||
|
global_10b51 = _is_10b51(footnotes_text)
|
||||||
|
|
||||||
|
transactions = root.findall(".//nonDerivativeTransaction")
|
||||||
|
results = []
|
||||||
|
|
||||||
|
for tx in transactions:
|
||||||
|
flag = _text(tx, "transactionAcquiredDisposedCode")
|
||||||
|
if not flag:
|
||||||
|
continue
|
||||||
|
|
||||||
|
shares = _float(tx, "transactionShares")
|
||||||
|
price = _float(tx, "transactionPricePerShare")
|
||||||
|
total_value = _float(tx, "transactionTotalValue")
|
||||||
|
if total_value is None and shares is not None and price is not None:
|
||||||
|
total_value = shares * price
|
||||||
|
post_tx_shares = _float(tx, "sharesOwnedFollowingTransaction")
|
||||||
|
tx_date = _text(tx, "transactionDate") or filed_date
|
||||||
|
|
||||||
|
tx_footnote_ids = [
|
||||||
|
fn.get("id", "") for fn in tx.findall(".//footnoteId")
|
||||||
|
]
|
||||||
|
tx_footnote_text = " ".join(
|
||||||
|
(root.find(f".//footnote[@id='{fid}']") or etree.Element("x")).text or ""
|
||||||
|
for fid in tx_footnote_ids
|
||||||
|
)
|
||||||
|
is_10b51 = int(global_10b51 or _is_10b51(tx_footnote_text))
|
||||||
|
|
||||||
|
results.append(
|
||||||
|
{
|
||||||
|
"accession_number": accession_number,
|
||||||
|
"ticker": ticker.upper(),
|
||||||
|
"cik": cik,
|
||||||
|
"insider_name": insider_name,
|
||||||
|
"role": role,
|
||||||
|
"transaction_date": tx_date,
|
||||||
|
"filed_date": filed_date,
|
||||||
|
"shares": shares,
|
||||||
|
"price": price,
|
||||||
|
"total_value": total_value,
|
||||||
|
"flag": flag.upper(),
|
||||||
|
"is_10b51": is_10b51,
|
||||||
|
"post_tx_shares": post_tx_shares,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return results
|
||||||
78
main.py
Normal file
78
main.py
Normal file
@ -0,0 +1,78 @@
|
|||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _on_new_filing(filing: dict):
|
||||||
|
from signals.filter_engine import process_filing
|
||||||
|
from alerts.slack_alert import send_slack_alert
|
||||||
|
import config
|
||||||
|
|
||||||
|
signal = process_filing(filing)
|
||||||
|
if signal is None:
|
||||||
|
return
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Signal: {signal['ticker']} score={signal['score']} cluster={signal['cluster_size']}"
|
||||||
|
)
|
||||||
|
|
||||||
|
if config.SLACK_WEBHOOK_URL:
|
||||||
|
send_slack_alert(signal)
|
||||||
|
|
||||||
|
if config.ALPACA_KEY and config.ALPACA_SECRET:
|
||||||
|
from broker.alpaca_client import execute_signal
|
||||||
|
execute_signal(signal)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_run():
|
||||||
|
from db.db import init_db
|
||||||
|
from ingestion.edgar_poller import run_poller
|
||||||
|
|
||||||
|
init_db()
|
||||||
|
logger.info("Database initialized")
|
||||||
|
run_poller(on_new_filing=_on_new_filing)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_backtest():
|
||||||
|
from backtest.backtest import run_backtest, print_summary
|
||||||
|
import config
|
||||||
|
|
||||||
|
logger.info("Running backtest...")
|
||||||
|
summary = run_backtest(
|
||||||
|
db_path=config.DB_PATH,
|
||||||
|
holding_days=config.HOLDING_PERIOD_DAYS,
|
||||||
|
min_score=config.SCORE_ALERT_THRESHOLD,
|
||||||
|
min_cluster_size=config.MIN_CLUSTER_SIZE,
|
||||||
|
)
|
||||||
|
print_summary(summary)
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_fetch_once():
|
||||||
|
from db.db import init_db
|
||||||
|
from ingestion.edgar_poller import fetch_and_store_new_filings
|
||||||
|
|
||||||
|
init_db()
|
||||||
|
filings = fetch_and_store_new_filings()
|
||||||
|
logger.info(f"Fetched and stored {len(filings)} new filings")
|
||||||
|
|
||||||
|
for filing in filings:
|
||||||
|
signal = _on_new_filing(filing)
|
||||||
|
|
||||||
|
|
||||||
|
COMMANDS = {
|
||||||
|
"run": cmd_run,
|
||||||
|
"backtest": cmd_backtest,
|
||||||
|
"fetch-once": cmd_fetch_once,
|
||||||
|
}
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
cmd = sys.argv[1] if len(sys.argv) > 1 else "run"
|
||||||
|
if cmd not in COMMANDS:
|
||||||
|
print(f"Usage: python main.py [{' | '.join(COMMANDS)}]")
|
||||||
|
sys.exit(1)
|
||||||
|
COMMANDS[cmd]()
|
||||||
5
requirements.txt
Normal file
5
requirements.txt
Normal file
@ -0,0 +1,5 @@
|
|||||||
|
requests>=2.31.0
|
||||||
|
lxml>=5.0.0
|
||||||
|
yfinance>=0.2.0
|
||||||
|
python-dotenv>=1.0.0
|
||||||
|
alpaca-trade-api>=3.0.0
|
||||||
0
signals/__init__.py
Normal file
0
signals/__init__.py
Normal file
13
signals/cluster_detector.py
Normal file
13
signals/cluster_detector.py
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
from db.db import get_recent_buys_for_ticker
|
||||||
|
import config
|
||||||
|
|
||||||
|
|
||||||
|
def detect_cluster(ticker: str) -> dict:
|
||||||
|
buys = get_recent_buys_for_ticker(ticker, config.CLUSTER_WINDOW_DAYS)
|
||||||
|
unique_insiders = {b["insider_name"] for b in buys}
|
||||||
|
total_value = sum(b["total_value"] or 0 for b in buys)
|
||||||
|
return {
|
||||||
|
"cluster_size": len(unique_insiders),
|
||||||
|
"total_cluster_value": total_value,
|
||||||
|
"buys": buys,
|
||||||
|
}
|
||||||
67
signals/filter_engine.py
Normal file
67
signals/filter_engine.py
Normal file
@ -0,0 +1,67 @@
|
|||||||
|
import math
|
||||||
|
import logging
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import config
|
||||||
|
from signals.cluster_detector import detect_cluster
|
||||||
|
from db.db import insert_signal
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _role_weight(role: str) -> float:
|
||||||
|
role_lower = (role or "").lower()
|
||||||
|
for key, weight in config.ROLE_WEIGHTS.items():
|
||||||
|
if key in role_lower:
|
||||||
|
return weight
|
||||||
|
return config.DEFAULT_ROLE_WEIGHT
|
||||||
|
|
||||||
|
|
||||||
|
def _score(total_value: float, role: str, cluster_size: int) -> float:
|
||||||
|
if not total_value or total_value <= 0:
|
||||||
|
return 0.0
|
||||||
|
base = _role_weight(role)
|
||||||
|
cluster_mult = 1.0 + 0.5 * (cluster_size - 1)
|
||||||
|
return base * math.log(total_value) * cluster_mult
|
||||||
|
|
||||||
|
|
||||||
|
def process_filing(filing: dict) -> Optional[dict]:
|
||||||
|
if filing.get("flag") != "A":
|
||||||
|
return None
|
||||||
|
|
||||||
|
if filing.get("is_10b51"):
|
||||||
|
logger.debug(f"Skipping 10b5-1 filing: {filing['accession_number']}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
total_value = filing.get("total_value") or 0
|
||||||
|
if total_value < config.MIN_TRANSACTION_VALUE:
|
||||||
|
logger.debug(f"Below min value: {filing['accession_number']} (${total_value:,.0f})")
|
||||||
|
return None
|
||||||
|
|
||||||
|
ticker = filing.get("ticker", "")
|
||||||
|
if not ticker:
|
||||||
|
return None
|
||||||
|
|
||||||
|
cluster_info = detect_cluster(ticker)
|
||||||
|
cluster_size = cluster_info["cluster_size"]
|
||||||
|
total_cluster_value = cluster_info["total_cluster_value"]
|
||||||
|
|
||||||
|
if cluster_size < config.MIN_CLUSTER_SIZE:
|
||||||
|
return None
|
||||||
|
|
||||||
|
score = _score(total_value, filing.get("role", ""), cluster_size)
|
||||||
|
|
||||||
|
signal = {
|
||||||
|
"ticker": ticker,
|
||||||
|
"trigger_date": filing.get("transaction_date", ""),
|
||||||
|
"cluster_size": cluster_size,
|
||||||
|
"total_cluster_value": total_cluster_value,
|
||||||
|
"score": round(score, 2),
|
||||||
|
"filing": filing,
|
||||||
|
"cluster_buys": cluster_info["buys"],
|
||||||
|
}
|
||||||
|
|
||||||
|
signal_id = insert_signal(signal)
|
||||||
|
signal["id"] = signal_id
|
||||||
|
logger.info(f"Signal generated: {ticker} score={score:.2f} cluster={cluster_size}")
|
||||||
|
return signal
|
||||||
Loading…
Reference in New Issue
Block a user