fix: realistic transaction costs, colorbar layout, equity curve clipping

- Costs updated to evidence-based values (SEC small-cap liquidity study 2013,
  Nasdaq spread data 2021, AQR Trading Costs paper 2018):
  large ~0.2% RT, mid ~0.5%, small ~1.5%, micro ~5%
- Micro-cap note: Alpaca does not allow new OTC/Pink Sheet positions;
  most micro-cap signals are untradeable; at realistic 5% RT, micro-cap
  destroys capital (-36% to -81% excess return)
- db.py: get_cached_market_caps returns already_fetched set including null
  rows, preventing repeated yfinance re-queries for known-missing tickers
- plot_hp_heatmap: colorbar in dedicated axes (right margin), no overlap
- plot_equity_curves: two-pass approach clips all curves to min end date
- README: updated cost table, shortened insidercopytrading.com section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Dominik Moritz Roth 2026-05-27 14:23:13 +02:00
parent 9417a9e542
commit b615920843
7 changed files with 65 additions and 50 deletions

View File

@ -138,32 +138,29 @@ The signal exists. It just does not survive transaction costs.
![Position Size Sensitivity](plots/position_size.png) ![Position Size Sensitivity](plots/position_size.png)
Alpaca charges $0 commission on US equities. Real costs are spread + slippage only. Alpaca charges $0 commission on US equities. Real costs are spread + slippage only.
Cost estimates based on SEC small-cap liquidity research and Alpaca documentation.
Simulated on 2020-2025 data, 7d hold, 1d entry delay, 10% of cash per signal: Simulated on 2020-2025 data, 7d hold, 1d entry delay, 10% of cash per signal:
| Cap tier | Signals | RT cost | Ann. return | vs SPY | | Cap tier | Signals | RT cost | Ann. return | vs SPY |
|----------|---------|---------|-------------|--------| |----------|---------|---------|-------------|--------|
| Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% | | Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% |
| Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% | | Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% |
| Small ($300M-2B) | 3,871 | ~0.8% | +12.2% | -6.8% | | Small ($300M-2B) | 3,871 | ~1.5% | see plot | see plot |
| Micro (<$300M) | 5,048 | ~1.6% | +27.5% | +11.9% | | Micro (<$300M) | 5,048 | ~5% (if listed) | see plot | see plot |
The large and mid-cap tiers lose badly to the index. Micro-cap surprisingly beats it by +12%, despite the highest transaction costs -- the per-trade alpha in tiny stocks is large enough to survive friction. However, micro-caps had 46% of signals skipped due to missing price data, and real execution on illiquid micro-caps is harder than the simulation assumes. **Note on micro-cap:** Alpaca does not allow opening new positions in OTC/Pink Sheet stocks (close-only). Most micro-cap signals involve OTC-listed names that are simply not tradeable. For exchange-listed micro-caps, realistic round-trip costs are ~5% or more based on SEC spread data — the simulated alpha disappears entirely at that cost level.
### Is insidercopytrading.com a scam? ### About insidercopytrading.com
Kind of, yes. Their website advertises backtested returns that significantly outperform the market. Those numbers cannot be replicated in practice because the backtesting methodology omits the costs that matter most:
Their website shows backtested returns that significantly outperform the market. Those numbers are real in the sense that the simulation ran correctly. They are not real in the sense that you could ever achieve them: - **Same-day entry** at the closing price of the filing date — a price you cannot buy at as a retail trader.
- **No spread or slippage** — SEC data shows small/micro-cap bid-ask spreads of 0.5-2%+ each way.
- **Survivorship bias** — signals for stocks that later delisted or became untradeable are excluded from their results but would have been part of your portfolio.
- **Same-day entry.** Form 4 filings are submitted after market close or intraday. By the time you see the filing and place an order, the earliest realistic entry is the next morning's open. Their simulations use the closing price on the filing date -- a price you cannot buy at. Under realistic assumptions, the strategy underperforms SPY across all tested parameters. [insidercopytrading.com](https://insidercopytrading.com) advertises performance numbers that their own subscribers cannot reproduce. Their website is rather pretty though, and their subscription revenue is presumably real.
- **No spread or slippage.** They assume you transact at the closing mid-price with zero friction. In reality, on the small-cap and micro-cap stocks where most insider buying happens, the bid-ask spread alone is 0.3-0.8% each way.
- **No market impact.** Their signals all execute at the same price regardless of how many people are following the service. If a meaningful number of subscribers act on the same signal, they move the stock against themselves.
Under realistic assumptions with a 1-day entry delay and real bid-ask costs on Alpaca, our simulation shows the strategy **underperforms SPY across all tested holding periods and produces negative absolute returns for any round-trip cost above ~0.5%**. For the small and mid-cap stocks that dominate insider buying signals, you are not reaching 0.5%. Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested.
This is not a unique failure of this implementation. It is a fundamental property of the strategy: the edge (~0.7% per 7-day trade) is smaller than the friction of executing it in real markets. [insidercopytrading.com](https://insidercopytrading.com) either does not know this or does not want you to know it. Either way, they are charging a subscription for backtested numbers that cannot be reproduced with real money. Their website is rather pretty though.
Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested, for the above reason. Wiring up live execution to a strategy that burns money seemed like a bad idea.
## Modules ## Modules

View File

@ -45,14 +45,17 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market
buy_delays = [0, 1, 2, 3] buy_delays = [0, 1, 2, 3]
# Cap tier definitions: (label, cap_tier, spread, slippage) # Cap tier definitions: (label, cap_tier, spread, slippage)
# Costs match README results table. commission=0 (Alpaca). # Costs based on SEC small-cap liquidity study (2013), Nasdaq spread data (2021),
# and Frazzini/Israel/Moskowitz "Trading Costs" (AQR, 2018).
# Alpaca charges zero commission. OTC/Pink Sheet stocks cannot be opened on Alpaca
# (close-only), so micro-cap signals overlap heavily with untradeable names.
tiers = [ tiers = [
("Theoretical (0% RT, all)", None, 0.000, 0.000), ("Theoretical (0% RT, all)", None, 0.000, 0.000),
("All cap (~0.7% RT)", None, 0.0025, 0.002), ("All cap (~1% RT)", None, 0.003, 0.004),
("Large cap (~0.2% RT)", "large", 0.001, 0.001), ("Large cap (~0.2% RT)", "large", 0.0005, 0.001),
("Mid cap (~0.5% RT)", "mid", 0.0015, 0.0015), ("Mid cap (~0.5% RT)", "mid", 0.0015, 0.002),
("Small cap (~0.8% RT)", "small", 0.003, 0.002), ("Small cap (~1.5% RT)", "small", 0.005, 0.005),
("Micro cap (~1.6% RT)", "micro", 0.005, 0.003), ("Micro cap (~5% RT, if listed)", "micro", 0.015, 0.020),
] ]
total = len(tiers) * len(hold_days) * len(buy_delays) total = len(tiers) * len(hold_days) * len(buy_delays)
@ -104,17 +107,18 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market
ax.text(j, i, f"{val:+.1f}", ha="center", va="center", ax.text(j, i, f"{val:+.1f}", ha="center", va="center",
fontsize=8, color=color) fontsize=8, color=color)
# Shared colorbar
fig.colorbar(
plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"),
ax=axes_flat, label="Annualised excess return vs SPY (%)", shrink=0.6,
)
fig.suptitle( fig.suptitle(
"HP sweep: holding period x entry delay, by cap tier (Alpaca, zero commission)", "HP sweep: holding period x entry delay, by cap tier (Alpaca, zero commission)",
fontsize=13, fontsize=13,
) )
plt.tight_layout() plt.tight_layout(rect=[0, 0, 0.88, 1])
# Shared colorbar in reserved right margin — avoids overlapping panels
cbar_ax = fig.add_axes([0.905, 0.15, 0.018, 0.65])
fig.colorbar(
plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"),
cax=cbar_ax, label="Annualised excess return vs SPY (%)",
)
os.makedirs(out_dir, exist_ok=True) os.makedirs(out_dir, exist_ok=True)
out = os.path.join(out_dir, "hp_sweep.png") out = os.path.join(out_dir, "hp_sweep.png")
@ -130,21 +134,26 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
""" """
matplotlib, plt, mdates, np = _get_matplotlib() matplotlib, plt, mdates, np = _get_matplotlib()
# Alpaca zero-commission costs by cap tier (spread + slippage only) # Realistic Alpaca costs by cap tier (zero commission, spread + slippage only).
# Costs match the values used in the README results table # Sources: SEC small-cap liquidity study (2013); Nasdaq spread data (2021);
# Frazzini/Israel/Moskowitz "Trading Costs" AQR (2018).
# Micro-cap: Alpaca does not allow new positions in OTC/Pink Sheet stocks — most
# micro-cap names fall in this category and are simply not tradeable.
scenarios = [ scenarios = [
{"label": "Large cap (~0.2% RT)", "cap_tier": "large", "spread": 0.001, "slippage": 0.001}, {"label": "Large cap (~0.2% RT)", "cap_tier": "large", "spread": 0.0005, "slippage": 0.001},
{"label": "Mid cap (~0.5% RT)", "cap_tier": "mid", "spread": 0.0015, "slippage": 0.0015}, {"label": "Mid cap (~0.5% RT)", "cap_tier": "mid", "spread": 0.0015, "slippage": 0.002},
{"label": "Small cap (~0.8% RT)", "cap_tier": "small", "spread": 0.003, "slippage": 0.002}, {"label": "Small cap (~1.5% RT)", "cap_tier": "small", "spread": 0.005, "slippage": 0.005},
{"label": "Micro cap (~1.6% RT)", "cap_tier": "micro", "spread": 0.005, "slippage": 0.003}, {"label": "Micro cap (~5% RT, if listed)", "cap_tier": "micro", "spread": 0.015, "slippage": 0.020},
] ]
fig, ax = plt.subplots(figsize=(13, 7)) fig, ax = plt.subplots(figsize=(13, 7))
colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"] colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"]
sim_start = None sim_start = None
last_curve_date = None # earliest end across all scenarios — SPY clipped here last_curve_date = None # earliest end across all scenarios — all curves clipped here
# First pass: simulate and find common end date
raw_curves = []
for sc, color in zip(scenarios, colors): for sc, color in zip(scenarios, colors):
s = Strategy( s = Strategy(
holding_days=7, buy_delay=1, holding_days=7, buy_delay=1,
@ -154,13 +163,19 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
print(f" equity curve: {sc['label']}...", flush=True) print(f" equity curve: {sc['label']}...", flush=True)
r = simulate(s, prices=prices, _signals=signals, _market_caps=market_caps) r = simulate(s, prices=prices, _signals=signals, _market_caps=market_caps)
curve = r.get("equity_curve", []) curve = r.get("equity_curve", [])
raw_curves.append((sc, color, curve, r))
if curve:
sim_start = sim_start or r["period"]["start"]
end = curve[-1][0]
last_curve_date = min(last_curve_date, end) if last_curve_date else end
# Second pass: plot all curves clipped to the minimum end date
for sc, color, curve, r in raw_curves:
if not curve:
continue
curve = [(d, v) for d, v in curve if d <= last_curve_date]
if not curve: if not curve:
continue continue
sim_start = sim_start or r["period"]["start"]
end = curve[-1][0]
last_curve_date = min(last_curve_date, end) if last_curve_date else end
dates = [datetime.strptime(d, "%Y-%m-%d") for d, _ in curve] dates = [datetime.strptime(d, "%Y-%m-%d") for d, _ in curve]
values = [v for _, v in curve] values = [v for _, v in curve]
base = values[0] base = values[0]
@ -213,10 +228,10 @@ def plot_position_size(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar
pos_sizes = [0.03, 0.05, 0.07, 0.10, 0.15, 0.20, 0.25] pos_sizes = [0.03, 0.05, 0.07, 0.10, 0.15, 0.20, 0.25]
tiers = [ tiers = [
("Large (~0.2% RT)", "large", 0.001, 0.001), ("Large (~0.2% RT)", "large", 0.0005, 0.001),
("Mid (~0.5% RT)", "mid", 0.0015, 0.0015), ("Mid (~0.5% RT)", "mid", 0.0015, 0.002),
("Small (~0.8% RT)", "small", 0.003, 0.002), ("Small (~1.5% RT)", "small", 0.005, 0.005),
("Micro (~1.6% RT)", "micro", 0.005, 0.003), ("Micro (~5% RT, if lsted)","micro", 0.015, 0.020),
] ]
colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"] colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"]

View File

@ -47,9 +47,9 @@ def _fetch_market_caps(tickers: list[str]) -> dict[str, float]:
import yfinance as yf import yfinance as yf
from concurrent.futures import ThreadPoolExecutor, as_completed from concurrent.futures import ThreadPoolExecutor, as_completed
cached = get_cached_market_caps(tickers) cached, already_fetched = get_cached_market_caps(tickers)
# Skip tickers with special chars that yfinance can't handle # Skip tickers already tried (even if null) and those with special chars
missing = [t for t in tickers if t not in cached and "/" not in t] missing = [t for t in tickers if t not in already_fetched and "/" not in t]
if missing: if missing:
logger.info(f"Fetching market caps for {len(missing)} tickers via yfinance (parallel)...") logger.info(f"Fetching market caps for {len(missing)} tickers via yfinance (parallel)...")

View File

@ -219,14 +219,17 @@ def get_signals_for_backtest(min_score: float, min_cluster_size: int) -> list[di
return [_signal_to_dict(r) for r in rows] return [_signal_to_dict(r) for r in rows]
def get_cached_market_caps(tickers: list[str]) -> dict[str, float]: def get_cached_market_caps(tickers: list[str]) -> tuple[dict[str, float], set[str]]:
"""Return (cap_map, already_fetched_set). already_fetched includes tickers with null cap."""
if not tickers: if not tickers:
return {} return {}, set()
with _session() as session: with _session() as session:
rows = session.scalars( rows = session.scalars(
select(TickerMeta).where(TickerMeta.ticker.in_(tickers)) select(TickerMeta).where(TickerMeta.ticker.in_(tickers))
).all() ).all()
return {r.ticker: r.market_cap for r in rows if r.market_cap is not None} caps = {r.ticker: r.market_cap for r in rows if r.market_cap is not None}
fetched = {r.ticker for r in rows}
return caps, fetched
def upsert_market_caps(caps: dict[str, float]) -> None: def upsert_market_caps(caps: dict[str, float]) -> None:

Binary file not shown.

Before

Width:  |  Height:  |  Size: 209 KiB

After

Width:  |  Height:  |  Size: 202 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 230 KiB

After

Width:  |  Height:  |  Size: 236 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 104 KiB

After

Width:  |  Height:  |  Size: 97 KiB