docs: update README with results section, simulator usage, and caveats
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent
8f666130b9
commit
e340d59a69
122
README.md
@ -14,21 +14,25 @@ Copying the idea from [insidercopytrading.com](https://insidercopytrading.com/).
|
||||
EDGAR (Form 4 feed)
|
||||
│
|
||||
▼
|
||||
ingestion/edgar_poller.py ← polls every 10 min, dedupes by accession
|
||||
ingestion/edgar_poller.py ← polls every 10 min, dedupes by accession
|
||||
ingestion/sec_bulk_ingest.py ← bulk historical ingest via quarterly form.idx archives
|
||||
│
|
||||
▼
|
||||
ingestion/form4_parser.py ← parses XML, detects 10b5-1 plans
|
||||
ingestion/form4_parser.py ← parses XML, detects 10b5-1 plans, extracts tx_code
|
||||
│
|
||||
▼
|
||||
db/models.py + db/db.py ← SQLAlchemy ORM: filings, signals, price_cache tables
|
||||
db/models.py + db/db.py ← SQLAlchemy ORM: filings, signals, price_cache tables
|
||||
│
|
||||
▼
|
||||
signals/filter_engine.py ← buy-only, exclude 10b5-1, min $50k, role-weighted scoring
|
||||
signals/cluster_detector.py ← counts unique insiders per ticker in rolling 30-day window
|
||||
signals/filter_engine.py ← buy-only, open-market (P) only, exclude 10b5-1,
|
||||
signals/cluster_detector.py min $50k, role-weighted scoring, as-of-date aware
|
||||
│
|
||||
├──► alerts/slack_alert.py ← POST to Slack webhook when score ≥ threshold
|
||||
└──► broker/alpaca_client.py ← paper/live order: 2% position size, 10% per-ticker cap
|
||||
positions auto-closed after holding period expires
|
||||
|
||||
backtest/backtest.py ← per-signal return / alpha vs SPY analysis
|
||||
backtest/simulate.py ← realistic portfolio simulation with transaction costs
|
||||
```
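The filter and cluster stages in the diagram above can be sketched as follows. This is a minimal illustration, not the code in `signals/filter_engine.py` or `signals/cluster_detector.py`: the `Filing` dataclass, its field names, and both function names are hypothetical, and it assumes that only filings passing the filter count toward a cluster. Role-weighted scoring is left out because the README does not give the scoring formula.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_VALUE_USD = 50_000      # "min $50k" from the diagram
CLUSTER_WINDOW_DAYS = 30    # "rolling 30-day window" from the diagram


@dataclass(frozen=True)
class Filing:
    """Hypothetical, simplified view of one parsed Form 4 transaction."""
    ticker: str
    insider: str
    tx_code: str       # Form 4 transaction code; "P" is a purchase
    is_10b5_1: bool    # True if the trade was made under a 10b5-1 plan
    value_usd: float
    tx_date: date


def passes_filter(f: Filing) -> bool:
    """Buy-only (code P), not a 10b5-1 plan trade, at least $50k."""
    return f.tx_code == "P" and not f.is_10b5_1 and f.value_usd >= MIN_VALUE_USD


def cluster_size(filings: list[Filing], ticker: str, as_of: date) -> int:
    """Count unique insiders with a qualifying buy in the 30 days up to as_of.

    Filings dated after as_of are ignored, so a backtest never sees
    information that was not available on the signal date.
    """
    window_start = as_of - timedelta(days=CLUSTER_WINDOW_DAYS)
    insiders = {
        f.insider
        for f in filings
        if f.ticker == ticker
        and window_start < f.tx_date <= as_of
        and passes_filter(f)
    }
    return len(insiders)
```

Passing `as_of` explicitly, rather than using today's date, is what "as-of-date aware" is taken to mean here.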
|
||||
|
||||
## Setup
|
||||
@ -53,14 +57,37 @@ pip install -r requirements.txt
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Initialize DB and ingest current EDGAR feed (one shot)
|
||||
python main.py fetch-once
|
||||
|
||||
# Run continuous polling loop (every 10 minutes)
|
||||
# Initialize DB and start continuous polling (every 10 minutes)
|
||||
python main.py run
|
||||
|
||||
# Backtest signals already in the DB against historical prices
|
||||
# Bulk-ingest historical Form 4 filings from SEC EDGAR quarterly archives
|
||||
python main.py backfill --years 2023 2024 # full year range
|
||||
python main.py backfill --year 2024 --quarter 1 # single quarter
|
||||
|
||||
# Per-signal backtest: win rate, alpha vs SPY
|
||||
python main.py backtest
|
||||
|
||||
# Portfolio simulation with configurable strategy and cost params
|
||||
python main.py simulate [options]
|
||||
```
|
||||
|
||||
### Simulate options
|
||||
|
||||
```
Strategy:
--holding-days N    Calendar days to hold each position (default: 7)
--buy-delay N       Days after signal trigger to enter (default: 1)
--position-size F   Fraction of available cash per trade (default: 0.10)
--min-score F       Minimum signal score filter (default: 0.0)
--min-cluster N     Minimum cluster size filter (default: 1)
--capital F         Initial capital in USD (default: 100000)

Transaction costs:
--spread F          One-way bid-ask half-spread paid at entry and exit (default: 0.003)
--slippage F        Entry slippage / market impact (default: 0.002)
--commission F      Per-trade commission as fraction of notional (default: 0.001)

Round-trip cost = spread×2 + slippage + commission×2
```
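The round-trip formula at the end of the options block can be checked by hand. The helper below is a small sketch of that arithmetic; the function name `round_trip_cost` is illustrative and is not taken from `backtest/simulate.py`.

```python
def round_trip_cost(spread: float, slippage: float, commission: float) -> float:
    """Round-trip cost as a fraction of notional.

    The half-spread and the commission are paid twice (entry and exit);
    slippage is charged once, at entry.
    """
    return 2 * spread + slippage + 2 * commission


# Simulator defaults: 2*0.003 + 0.002 + 2*0.001
default_cost = round_trip_cost(spread=0.003, slippage=0.002, commission=0.001)
print(f"default round-trip cost: {default_cost:.1%}")
```

With the default flags this comes to 1.0% of notional per completed trade.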
|
||||
|
||||
## Key configuration (`config.py`)
|
||||
@ -86,11 +113,64 @@ Role weights: CEO 3.0 · CFO/President 2.5 · COO 2.0 · Director 1.5 · VP 1.2
|
||||
|
||||
## Backtesting
|
||||
|
||||
The backtest loads signals from the DB and fetches OHLC data via `yfinance`. Prices are cached in the `price_cache` table — completed date ranges are served entirely from the DB on repeat runs, avoiding redundant network calls. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day before or on the exit date. Raw XML filings are cached in `DATA_DIR` (`data/filings/`) by accession number.
|
||||
The backtest loads signals from the DB and fetches OHLC data via `yfinance`. Prices are cached in the `price_cache` table — completed date ranges are served entirely from the DB on repeat runs. Entry price is the closing price on the first trading day on or after the signal date; exit price is the closing price on the last trading day before or on the exit date.
|
||||
|
||||
The EDGAR poller also skips fetching XML for filings older than the newest `filed_date` already stored in the DB, so incremental runs only process truly new filings.
|
||||
## Results (2023–2024 backtest, 302k filings ingested)
|
||||
|
||||
Metrics reported: win rate, average return, average alpha vs SPY, Sharpe ratio.
|
||||
> **⚠ Read the caveats below before drawing conclusions.**
|
||||
|
||||
### Per-signal statistics (pre-cost)
|
||||
|
||||
Across 16,279 signals generated from 302k Form 4 filings (2023–2024):
|
||||
|
||||
| Hold | Avg return | Avg alpha vs SPY | Sharpe | Win rate |
|------|-----------|-----------------|--------|----------|
| 3 d | +0.61% | +0.52% | ~0.80 | ~53% |
| 7 d | +1.19% | +0.68% | ~1.05 | ~54% |
| 14 d | +1.41% | +0.55% | ~0.90 | ~54% |
| 30 d | +1.89% | +0.41% | ~0.70 | ~54% |
| 90 d | +5.8% | +1.0% | ~0.55 | ~57% |
|
||||
|
||||
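Risk-adjusted performance is strongest at 3–14 day holds (Sharpe ~0.80–1.05), and alpha per day held is highest there. The 90-day hold shows the largest raw alpha (+1.0%), but most of its +5.8% average return is market beta; beyond 30 days, beta dominates. Signal quality is broadly robust across `min_score` and `min_cluster` filter values.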
|
||||
|
||||
### Portfolio simulation (1-day lag, 7-day hold, 10% of cash per signal)
|
||||
|
||||
Pre-cost simulation on the same period:
|
||||
|
||||
| Metric | Value |
|--------|-------|
| Initial capital | $100,000 |
| Final value | $782,097 |
| Total return | +682% |
| Annualized return | +177% |
| SPY annualized | +25.9% |
| Max drawdown | 12.8% |
| Sharpe | 4.67 |
| Trades executed | 13,766 |
|
||||
|
||||
After realistic transaction costs (~1% round-trip), expected annualized return drops to roughly **20–60%** depending on assumed spread and slippage. Run the simulator to check your specific assumptions:
|
||||
|
||||
```bash
# Conservative (liquid mid-caps, ~1% round-trip)
python main.py simulate --spread 0.003 --slippage 0.002 --commission 0.001

# Realistic small-cap (~2.1% round-trip: 0.007×2 + 0.005 + 0.001×2)
python main.py simulate --spread 0.007 --slippage 0.005 --commission 0.001
```
|
||||
|
||||
### Caveats
|
||||
|
||||
1. **Transaction costs are everything.** Average alpha per 7-day trade is ~0.68%, and the average raw return is +1.19%. A round-trip on small/mid caps costs 0.6–1.5% (spread + slippage + commission). At the high end, costs exceed the average raw return, so the strategy is slightly negative after costs. The 177% pre-cost figure is not achievable in practice.

2. **2023–2024 was an exceptional bull market.** SPY returned +25.9% annualized. The long-only bias in insider buys captured broad market momentum. Expected performance in flat or down markets is lower and untested.

3. **Survivorship bias.** Tickers that were delisted, halted, or acquired may be underrepresented in the price cache. This slightly flatters results by dropping the worst outcomes.

4. **No extra slippage is modeled for popular signals.** When multiple insiders at the same company buy on the same day, the stock may have already moved before you execute. The 1-day delay helps but doesn't fully resolve this.

5. **Concentrated portfolio.** At 10% of cash per signal with 7-day holds, you run ~7–10 simultaneous positions on average. Individual position variance is high.

6. **Long-only.** Excess return over SPY is not directly capturable without shorting SPY, which has its own carry cost.
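Caveat 1 can be made concrete by subtracting an assumed round-trip cost from the pre-cost average returns in the per-signal table. The sketch below is a first-order approximation that ignores compounding and position sizing, so it is not a substitute for `python main.py simulate`; the names `AVG_RETURN_PCT` and `net_return_pct` are illustrative.

```python
# Pre-cost average return per trade, in percent, keyed by holding days
# (copied from the per-signal statistics table).
AVG_RETURN_PCT = {3: 0.61, 7: 1.19, 14: 1.41, 30: 1.89, 90: 5.8}


def net_return_pct(hold_days: int, round_trip_cost_pct: float) -> float:
    """Average per-trade return after costs, in percent (simple subtraction)."""
    return round(AVG_RETURN_PCT[hold_days] - round_trip_cost_pct, 2)


# Cost range quoted in caveat 1: 0.6% to 1.5% per round trip.
for cost in (0.6, 1.0, 1.5):
    row = ", ".join(f"{d}d: {net_return_pct(d, cost):+.2f}%" for d in AVG_RETURN_PCT)
    print(f"cost {cost:.1f}% -> {row}")
```

At a 1.5% round-trip cost the 3-, 7-, and 14-day holds all have a negative average net return per trade, while the 30- and 90-day holds stay positive mainly because of market beta.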
|
||||
|
||||
## Position lifecycle
|
||||
|
||||
@ -102,17 +182,19 @@ Positions are tracked in the `signals` table. When a trade is executed, `execute
|
||||
|---|---|
|
||||
| `config.py` | All thresholds and env-var loading |
|
||||
| `ingestion/edgar_poller.py` | EDGAR Atom feed polling and deduplication |
|
||||
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection |
|
||||
| `ingestion/sec_bulk_ingest.py` | Bulk historical ingest via quarterly form.idx archives |
|
||||
| `ingestion/form4_parser.py` | Form 4 XML → structured dict; 10b5-1 detection; tx_code extraction |
|
||||
| `db/models.py` | SQLAlchemy ORM models (`Filing`, `Signal`, `PriceCache`) |
|
||||
| `db/db.py` | DB access layer (SQLAlchemy sessions) |
|
||||
| `signals/filter_engine.py` | Filing → signal pipeline |
|
||||
| `signals/cluster_detector.py` | Cluster detection from DB |
|
||||
| `db/db.py` | DB access layer — dedup-safe inserts, chunked IN queries, price cache |
|
||||
| `signals/filter_engine.py` | Filing → signal pipeline (open-market-only, as-of-date aware) |
|
||||
| `signals/cluster_detector.py` | Cluster detection from DB (as-of-date aware) |
|
||||
| `alerts/slack_alert.py` | Slack webhook alert |
|
||||
| `broker/alpaca_client.py` | Alpaca order execution + position exit |
|
||||
| `backtest/backtest.py` | Historical backtest runner |
|
||||
| `main.py` | CLI entry point |
|
||||
| `backtest/backtest.py` | Per-signal historical backtest runner |
|
||||
| `backtest/simulate.py` | Portfolio simulator with configurable costs |
|
||||
| `main.py` | CLI entry point (`run` / `backfill` / `backtest` / `simulate`) |
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- See `requirements.txt`: `requests`, `lxml`, `yfinance`, `python-dotenv`, `alpaca-trade-api`, `sqlalchemy`
|
||||
- See `requirements.txt`: `requests`, `lxml`, `cssselect`, `yfinance`, `python-dotenv`, `alpaca-trade-api`, `sqlalchemy`
|
||||
|
||||