fix: realistic transaction costs, colorbar layout, equity curve clipping

- Costs updated to evidence-based values (SEC small-cap liquidity study 2013, Nasdaq spread data 2021, AQR Trading Costs paper 2018): large ~0.2% RT, mid ~0.5%, small ~1.5%, micro ~5%
- Micro-cap note: Alpaca does not allow new OTC/Pink Sheet positions; most micro-cap signals are untradeable; at realistic 5% RT, micro-cap destroys capital (-36% to -81% excess return)
- db.py: get_cached_market_caps returns already_fetched set including null rows, preventing repeated yfinance re-queries for known-missing tickers
- plot_hp_heatmap: colorbar in dedicated axes (right margin), no overlap
- plot_equity_curves: two-pass approach clips all curves to min end date
- README: updated cost table, shortened insidercopytrading.com section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 14:23:13 +02:00 · 2026-05-27 14:23:13 +02:00 · b615920843
commit b615920843
parent 9417a9e542
7 changed files with 65 additions and 50 deletions
--- a/README.md
+++ b/README.md
@@ -138,32 +138,29 @@ The signal exists. It just does not survive transaction costs.
 ![Position Size Sensitivity](plots/position_size.png)
 Alpaca charges $0 commission on US equities. Real costs are spread + slippage only.
 Cost estimates based on SEC small-cap liquidity research and Alpaca documentation.
 Simulated on 2020-2025 data, 7d hold, 1d entry delay, 10% of cash per signal:
 | Cap tier | Signals | RT cost | Ann. return | vs SPY |
 |----------|---------|---------|-------------|--------|
 | Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% |
 | Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% |
-| Small ($300M-2B) | 3,871 | ~0.8% | +12.2% | -6.8% |
+| Small ($300M-2B) | 3,871 | ~1.5% | see plot | see plot |
-| Micro (<$300M) | 5,048 | ~1.6% | +27.5% | +11.9% |
+| Micro (<$300M) | 5,048 | ~5% (if listed) | see plot | see plot |
-The large and mid-cap tiers lose badly to the index. Micro-cap surprisingly beats it by +12%, despite the highest transaction costs -- the per-trade alpha in tiny stocks is large enough to survive friction. However, micro-caps had 46% of signals skipped due to missing price data, and real execution on illiquid micro-caps is harder than the simulation assumes.
+**Note on micro-cap:** Alpaca does not allow opening new positions in OTC/Pink Sheet stocks (close-only), and most micro-cap signals involve OTC-listed names, so they are simply not tradeable. For exchange-listed micro-caps, realistic round-trip costs are ~5% or more based on SEC spread data. At that cost level the simulated alpha does not just disappear, it destroys capital: -36% to -81% excess return vs SPY.
-### Is insidercopytrading.com a scam?
+### About insidercopytrading.com
-Kind of, yes.
+Their website advertises backtested returns that significantly outperform the market. Those numbers cannot be replicated in practice because the backtest ignores the frictions and biases that matter most:
-Their website shows backtested returns that significantly outperform the market. Those numbers are real in the sense that the simulation ran correctly. They are not real in the sense that you could ever achieve them:
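+- **Same-day entry** at the filing date's closing price. Form 4 filings are submitted after market close or intraday, so by the time a retail trader sees one and places an order, the earliest realistic entry is the next morning's open.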
+- **No spread or slippage** — SEC data shows small/micro-cap bid-ask spreads of 0.5-2%+ each way.
+- **Survivorship bias** — signals for stocks that later delisted or became untradeable are excluded from their results but would have been part of your portfolio.
-- **Same-day entry.** Form 4 filings are submitted after market close or intraday. By the time you see the filing and place an order, the earliest realistic entry is the next morning's open. Their simulations use the closing price on the filing date -- a price you cannot buy at.
+Under realistic assumptions, the strategy underperforms SPY across all tested parameters. [insidercopytrading.com](https://insidercopytrading.com) advertises performance numbers that their own subscribers cannot reproduce. Their website is rather pretty though, and their subscription revenue is presumably real.
-- **No spread or slippage.** They assume you transact at the closing mid-price with zero friction. In reality, on the small-cap and micro-cap stocks where most insider buying happens, the bid-ask spread alone is 0.3-0.8% each way.
-- **No market impact.** Their signals all execute at the same price regardless of how many people are following the service. If a meaningful number of subscribers act on the same signal, they move the stock against themselves.
-Under realistic assumptions with a 1-day entry delay and real bid-ask costs on Alpaca, our simulation shows the strategy **underperforms SPY across all tested holding periods and produces negative absolute returns for any round-trip cost above ~0.5%**. For the small and mid-cap stocks that dominate insider buying signals, you are not reaching 0.5%.
+Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested.
-This is not a unique failure of this implementation. It is a fundamental property of the strategy: the edge (~0.7% per 7-day trade) is smaller than the friction of executing it in real markets. [insidercopytrading.com](https://insidercopytrading.com) either does not know this or does not want you to know it. Either way, they are charging a subscription for backtested numbers that cannot be reproduced with real money. Their website is rather pretty though.
-Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested, for the above reason. Wiring up live execution to a strategy that burns money seemed like a bad idea.
 ## Modules
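The cap-tier tables in `backtest/plot.py` give each tier as a `(spread, slippage)` pair, while the README quotes a single round-trip (RT) figure. The simulator's cost code is not part of this commit. The sketch below is therefore only one reading that reproduces the labels to within rounding. It assumes `spread` is the full quoted bid-ask spread, paid half on entry and half on exit, and that `slippage` is charged on each side. That gives `RT = spread + 2 * slippage`. `TIERS`, `round_trip_cost` and `net_edge` are hypothetical names. Applying the README's blended ~0.7% per-trade edge to every tier is purely illustrative.

```python
# Sketch only: the simulator's real cost model is not shown in this commit.
# ASSUMPTION: `spread` is the full bid-ask spread (half paid on entry, half on
# exit) and `slippage` is charged once on each side of the round trip.

# (spread, slippage) pairs copied from the cap-tier tables in backtest/plot.py
TIERS = {
    "all":   (0.003, 0.004),
    "large": (0.0005, 0.001),
    "mid":   (0.0015, 0.002),
    "small": (0.005, 0.005),
    "micro": (0.015, 0.020),
}


def round_trip_cost(spread: float, slippage: float) -> float:
    """Assumed round-trip friction as a fraction of position value."""
    per_side = spread / 2 + slippage  # half the spread plus slippage, each way
    return 2 * per_side               # entry + exit == spread + 2 * slippage


def net_edge(gross_edge: float, spread: float, slippage: float) -> float:
    """Per-trade return left over after the assumed friction."""
    return gross_edge - round_trip_cost(spread, slippage)


if __name__ == "__main__":
    GROSS_EDGE = 0.007  # the README's blended ~0.7% per 7-day trade (illustrative)
    for tier, (spread, slip) in TIERS.items():
        rt = round_trip_cost(spread, slip)
        print(f"{tier:>5}: RT ~{rt:.2%}, net edge {net_edge(GROSS_EDGE, spread, slip):+.2%}")
```

Under that assumption the pairs work out to 0.25%, 0.55%, 1.5% and 5.5% RT for large through micro caps, and 1.1% for the all-cap tier. The illustrative edge turns negative for the all-cap, small-cap and micro-cap tiers.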
--- a/backtest/plot.py
+++ b/backtest/plot.py
@@ -45,14 +45,17 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market
    buy_delays = [0, 1, 2, 3]
    # Cap tier definitions: (label, cap_tier, spread, slippage)
-    # Costs match README results table. commission=0 (Alpaca).
+    # Costs based on SEC small-cap liquidity study (2013), Nasdaq spread data (2021),
+    # and Frazzini/Israel/Moskowitz "Trading Costs" (AQR, 2018).
+    # Alpaca charges zero commission. OTC/Pink Sheet stocks cannot be opened on Alpaca
+    # (close-only), so micro-cap signals overlap heavily with untradeable names.
    tiers = [
-        ("Theoretical (0% RT, all)",  None,    0.000,  0.000),
+        ("Theoretical (0% RT, all)",       None,    0.000,  0.000),
-        ("All cap (~0.7% RT)",         None,    0.0025, 0.002),
+        ("All cap (~1% RT)",               None,    0.003,  0.004),
-        ("Large cap (~0.2% RT)",       "large", 0.001,  0.001),
+        ("Large cap (~0.2% RT)",           "large", 0.0005, 0.001),
-        ("Mid cap (~0.5% RT)",         "mid",   0.0015, 0.0015),
+        ("Mid cap (~0.5% RT)",             "mid",   0.0015, 0.002),
-        ("Small cap (~0.8% RT)",       "small", 0.003,  0.002),
+        ("Small cap (~1.5% RT)",           "small", 0.005,  0.005),
-        ("Micro cap (~1.6% RT)",       "micro", 0.005,  0.003),
+        ("Micro cap (~5% RT, if listed)",  "micro", 0.015,  0.020),
    ]
    total = len(tiers) * len(hold_days) * len(buy_delays)
@@ -104,17 +107,18 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market
                ax.text(j, i, f"{val:+.1f}", ha="center", va="center",
                        fontsize=8, color=color)
-    # Shared colorbar
-    fig.colorbar(
-        plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"),
-        ax=axes_flat, label="Annualised excess return vs SPY (%)", shrink=0.6,
-    )
    fig.suptitle(
        "HP sweep: holding period x entry delay, by cap tier  (Alpaca, zero commission)",
        fontsize=13,
    )
-    plt.tight_layout()
+    plt.tight_layout(rect=[0, 0, 0.88, 1])
+    # Shared colorbar in reserved right margin — avoids overlapping panels
+    cbar_ax = fig.add_axes([0.905, 0.15, 0.018, 0.65])
+    fig.colorbar(
+        plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"),
+        cax=cbar_ax, label="Annualised excess return vs SPY (%)",
+    )
    os.makedirs(out_dir, exist_ok=True)
    out = os.path.join(out_dir, "hp_sweep.png")
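The fix in `plot_hp_heatmap` first reserves a right-hand strip with `tight_layout(rect=...)`. Only then does it add a dedicated colorbar axes in that strip. The old approach passed `ax=axes_flat` to `fig.colorbar`, which overlapped the panels. Below is a minimal, self-contained sketch of the same pattern on synthetic data, assuming a plain symmetric `Normalize` colour scale. `heatmaps_with_side_colorbar` is a hypothetical helper, not a function in this repository. The margin fractions are copied from the diff.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def heatmaps_with_side_colorbar(panels, out_path, cmap="RdYlGn"):
    """Hypothetical helper: heatmaps in a row, one colorbar in a reserved margin."""
    fig, axes = plt.subplots(1, len(panels), figsize=(4 * len(panels), 4), squeeze=False)
    axes = axes[0]
    # One symmetric colour scale shared by all panels so colours are comparable.
    vmax = max(float(np.abs(p).max()) for p in panels)
    norm = Normalize(vmin=-vmax, vmax=vmax)
    for ax, data in zip(axes, panels):
        ax.imshow(data, cmap=cmap, norm=norm, aspect="auto")
    # Lay the panels out inside the left 88% of the figure only...
    fig.tight_layout(rect=[0, 0, 0.88, 1])
    # ...then give the colorbar its own axes in the freed right-hand strip.
    cbar_ax = fig.add_axes([0.905, 0.15, 0.018, 0.65])
    fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), cax=cbar_ax,
                 label="Annualised excess return vs SPY (%)")
    fig.savefig(out_path)
    plt.close(fig)
    return axes, cbar_ax
```

The colorbar axes is added after the one-off `tight_layout` pass, so the layout never tries to fit it. That is why its position has to be chosen by hand.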
@@ -130,21 +134,26 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
    """
    matplotlib, plt, mdates, np = _get_matplotlib()
-    # Alpaca zero-commission costs by cap tier (spread + slippage only)
+    # Realistic Alpaca costs by cap tier (zero commission, spread + slippage only).
-    # Costs match the values used in the README results table
+    # Sources: SEC small-cap liquidity study (2013); Nasdaq spread data (2021);
+    # Frazzini/Israel/Moskowitz "Trading Costs" AQR (2018).
+    # Micro-cap: Alpaca does not allow new positions in OTC/Pink Sheet stocks — most
+    # micro-cap names fall in this category and are simply not tradeable.
    scenarios = [
-        {"label": "Large cap  (~0.2% RT)", "cap_tier": "large", "spread": 0.001,  "slippage": 0.001},
+        {"label": "Large cap  (~0.2% RT)", "cap_tier": "large", "spread": 0.0005, "slippage": 0.001},
-        {"label": "Mid cap    (~0.5% RT)", "cap_tier": "mid",   "spread": 0.0015, "slippage": 0.0015},
+        {"label": "Mid cap    (~0.5% RT)", "cap_tier": "mid",   "spread": 0.0015, "slippage": 0.002},
-        {"label": "Small cap  (~0.8% RT)", "cap_tier": "small", "spread": 0.003,  "slippage": 0.002},
+        {"label": "Small cap  (~1.5% RT)", "cap_tier": "small", "spread": 0.005,  "slippage": 0.005},
-        {"label": "Micro cap  (~1.6% RT)", "cap_tier": "micro", "spread": 0.005,  "slippage": 0.003},
+        {"label": "Micro cap  (~5% RT, if listed)", "cap_tier": "micro", "spread": 0.015, "slippage": 0.020},
    ]
    fig, ax = plt.subplots(figsize=(13, 7))
    colors  = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"]
    sim_start = None
-    last_curve_date = None  # earliest end across all scenarios — SPY clipped here
+    last_curve_date = None  # earliest end across all scenarios — all curves clipped here
+    # First pass: simulate and find common end date
+    raw_curves = []
    for sc, color in zip(scenarios, colors):
        s = Strategy(
            holding_days=7, buy_delay=1,
@@ -154,13 +163,19 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
        print(f"  equity curve: {sc['label']}...", flush=True)
        r = simulate(s, prices=prices, _signals=signals, _market_caps=market_caps)
        curve = r.get("equity_curve", [])
        raw_curves.append((sc, color, curve, r))
        if curve:
            sim_start = sim_start or r["period"]["start"]
            end = curve[-1][0]
            last_curve_date = min(last_curve_date, end) if last_curve_date else end
    # Second pass: plot all curves clipped to the minimum end date
    for sc, color, curve, r in raw_curves:
        if not curve:
            continue
        curve = [(d, v) for d, v in curve if d <= last_curve_date]
        if not curve:
            continue
        sim_start = sim_start or r["period"]["start"]
        end = curve[-1][0]
        last_curve_date = min(last_curve_date, end) if last_curve_date else end
        dates  = [datetime.strptime(d, "%Y-%m-%d") for d, _ in curve]
        values = [v for _, v in curve]
        base   = values[0]
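`plot_equity_curves` now runs in two passes:

- First, it simulates every scenario and takes the earliest final date across the curves.
- Then it plots each curve truncated to that date, so no tier's line runs past the others.

The sketch below isolates that two-pass clip as a pure function. `clip_to_common_end` and the `Curve` alias are hypothetical. The sketch assumes, as the diff's `d <= last_curve_date` comparison implies, that dates are ISO `YYYY-MM-DD` strings in ascending order.

```python
# One equity curve: (ISO "YYYY-MM-DD" date, portfolio value) pairs, ascending by date.
Curve = list[tuple[str, float]]


def clip_to_common_end(curves: dict[str, Curve]) -> dict[str, Curve]:
    """Truncate every curve to the earliest final date found among them."""
    # First pass: collect each non-empty curve's last date; the minimum is the common end.
    ends = [curve[-1][0] for curve in curves.values() if curve]
    if not ends:
        return {name: [] for name in curves}
    common_end = min(ends)
    # Second pass: drop points after the common end. ISO date strings sort
    # chronologically, so plain string comparison is enough.
    return {
        name: [(d, v) for d, v in curve if d <= common_end]
        for name, curve in curves.items()
    }


if __name__ == "__main__":
    curves = {
        "Large cap": [("2025-01-02", 100.0), ("2025-01-03", 101.0), ("2025-01-06", 102.5)],
        "Micro cap": [("2025-01-02", 100.0), ("2025-01-03", 97.0)],
    }
    for name, curve in clip_to_common_end(curves).items():
        print(name, curve)
```

Here "Large cap" loses its 2025-01-06 point because "Micro cap" ends on 2025-01-03.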
@@ -213,10 +228,10 @@ def plot_position_size(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
    pos_sizes = [0.03, 0.05, 0.07, 0.10, 0.15, 0.20, 0.25]
    tiers = [
-        ("Large (~0.2% RT)",  "large", 0.001,  0.001),
+        ("Large (~0.2% RT)",        "large", 0.0005, 0.001),
-        ("Mid (~0.5% RT)",    "mid",   0.0015, 0.0015),
+        ("Mid (~0.5% RT)",          "mid",   0.0015, 0.002),
-        ("Small (~0.8% RT)",  "small", 0.003,  0.002),
+        ("Small (~1.5% RT)",        "small", 0.005,  0.005),
-        ("Micro (~1.6% RT)",  "micro", 0.005,  0.003),
+        ("Micro (~5% RT, if listed)", "micro", 0.015,  0.020),
    ]
    colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"]
--- a/backtest/simulate.py
+++ b/backtest/simulate.py
@@ -47,9 +47,9 @@ def _fetch_market_caps(tickers: list[str]) -> dict[str, float]:
    import yfinance as yf
    from concurrent.futures import ThreadPoolExecutor, as_completed
-    cached = get_cached_market_caps(tickers)
+    cached, already_fetched = get_cached_market_caps(tickers)
-    # Skip tickers with special chars that yfinance can't handle
+    # Skip tickers already tried (even if null) and those with special chars
-    missing = [t for t in tickers if t not in cached and "/" not in t]
+    missing = [t for t in tickers if t not in already_fetched and "/" not in t]
    if missing:
        logger.info(f"Fetching market caps for {len(missing)} tickers via yfinance (parallel)...")
--- a/db/db.py
+++ b/db/db.py
@@ -219,14 +219,17 @@ def get_signals_for_backtest(min_score: float, min_cluster_size: int) -> list[di
        return [_signal_to_dict(r) for r in rows]
-def get_cached_market_caps(tickers: list[str]) -> dict[str, float]:
+def get_cached_market_caps(tickers: list[str]) -> tuple[dict[str, float], set[str]]:
+    """Return (cap_map, already_fetched_set). already_fetched includes tickers with null cap."""
    if not tickers:
-        return {}
+        return {}, set()
    with _session() as session:
        rows = session.scalars(
            select(TickerMeta).where(TickerMeta.ticker.in_(tickers))
        ).all()
-    return {r.ticker: r.market_cap for r in rows if r.market_cap is not None}
+    caps = {r.ticker: r.market_cap for r in rows if r.market_cap is not None}
+    fetched = {r.ticker for r in rows}
+    return caps, fetched
 def upsert_market_caps(caps: dict[str, float]) -> None:
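`get_cached_market_caps` now returns `(caps, already_fetched)`. With that, `_fetch_market_caps` can skip tickers that already have a `TickerMeta` row, even when the stored cap is null, instead of re-querying yfinance for them on every run. The sketch below models that negative-caching pattern:

- An in-memory dict stands in for the table.
- A `lookup` callable stands in for yfinance.
- `InMemoryCapStore`, `record` and `fetch_market_caps` are hypothetical names.
- It runs sequentially rather than in a thread pool.
- It assumes that a failed lookup is persisted as a null row. The diff does not show this write path.

```python
from typing import Callable, Optional


class InMemoryCapStore:
    """Hypothetical stand-in for the TickerMeta table: ticker -> cap or None."""

    def __init__(self) -> None:
        self.rows: dict[str, Optional[float]] = {}

    def get_cached_market_caps(self, tickers: list[str]) -> tuple[dict[str, float], set[str]]:
        # Same shape as db.py: known caps, plus every ticker that has a row at all.
        found = {t: self.rows[t] for t in tickers if t in self.rows}
        caps = {t: c for t, c in found.items() if c is not None}
        return caps, set(found)

    def record(self, results: dict[str, Optional[float]]) -> None:
        # ASSUMPTION: misses are stored as None so they count as "already fetched".
        self.rows.update(results)


def fetch_market_caps(
    tickers: list[str],
    store: InMemoryCapStore,
    lookup: Callable[[str], Optional[float]],
) -> dict[str, float]:
    caps, already_fetched = store.get_cached_market_caps(tickers)
    # Query only tickers never tried before; skip "/" symbols as _fetch_market_caps does.
    missing = [t for t in tickers if t not in already_fetched and "/" not in t]
    fresh = {t: lookup(t) for t in missing}
    store.record(fresh)
    caps.update({t: c for t, c in fresh.items() if c is not None})
    return caps
```

On a second call with the same tickers, `lookup` is not invoked at all, including for a ticker whose first lookup returned nothing.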
--- a/plots/equity_curves.png
+++ b/plots/equity_curves.png
--- a/plots/hp_sweep.png
+++ b/plots/hp_sweep.png
--- a/plots/position_size.png
+++ b/plots/position_size.png