diff --git a/README.md b/README.md index 6313f5e..adc0ecb 100644 --- a/README.md +++ b/README.md @@ -138,32 +138,29 @@ The signal exists. It just does not survive transaction costs. ![Position Size Sensitivity](plots/position_size.png) Alpaca charges $0 commission on US equities. Real costs are spread + slippage only. +Cost estimates based on SEC small-cap liquidity research and Alpaca documentation. Simulated on 2020-2025 data, 7d hold, 1d entry delay, 10% of cash per signal: | Cap tier | Signals | RT cost | Ann. return | vs SPY | |----------|---------|---------|-------------|--------| | Large (>$10B) | 4,098 | ~0.2% | +2.4% | -20.0% | | Mid ($2-10B) | 3,537 | ~0.5% | +0.9% | -15.1% | -| Small ($300M-2B) | 3,871 | ~0.8% | +12.2% | -6.8% | -| Micro (<$300M) | 5,048 | ~1.6% | +27.5% | +11.9% | +| Small ($300M-2B) | 3,871 | ~1.5% | see plot | see plot | +| Micro (<$300M) | 5,048 | ~5% (if listed) | see plot | see plot | -The large and mid-cap tiers lose badly to the index. Micro-cap surprisingly beats it by +12%, despite the highest transaction costs -- the per-trade alpha in tiny stocks is large enough to survive friction. However, micro-caps had 46% of signals skipped due to missing price data, and real execution on illiquid micro-caps is harder than the simulation assumes. +**Note on micro-cap:** Alpaca does not allow opening new positions in OTC/Pink Sheet stocks (close-only). Most micro-cap signals involve OTC-listed names that are simply not tradeable. For exchange-listed micro-caps, realistic round-trip costs are ~5% or more based on SEC spread data — the simulated alpha disappears entirely at that cost level. -### Is insidercopytrading.com a scam? +### About insidercopytrading.com -Kind of, yes. +Their website advertises backtested returns that significantly outperform the market. 
Those numbers cannot be replicated in practice because the backtesting methodology omits the costs that matter most: -Their website shows backtested returns that significantly outperform the market. Those numbers are real in the sense that the simulation ran correctly. They are not real in the sense that you could ever achieve them: +- **Same-day entry** at the closing price of the filing date — a price you cannot buy at as a retail trader. +- **No spread or slippage** — SEC data shows small/micro-cap bid-ask spreads of 0.5-2%+ each way. +- **Survivorship bias** — signals for stocks that later delisted or became untradeable are excluded from their results but would have been part of your portfolio. -- **Same-day entry.** Form 4 filings are submitted after market close or intraday. By the time you see the filing and place an order, the earliest realistic entry is the next morning's open. Their simulations use the closing price on the filing date -- a price you cannot buy at. -- **No spread or slippage.** They assume you transact at the closing mid-price with zero friction. In reality, on the small-cap and micro-cap stocks where most insider buying happens, the bid-ask spread alone is 0.3-0.8% each way. -- **No market impact.** Their signals all execute at the same price regardless of how many people are following the service. If a meaningful number of subscribers act on the same signal, they move the stock against themselves. +Under realistic assumptions, the strategy underperforms SPY across all tested parameters. [insidercopytrading.com](https://insidercopytrading.com) advertises performance numbers that their own subscribers cannot reproduce. Their website is rather pretty though, and their subscription revenue is presumably real. 
-Under realistic assumptions with a 1-day entry delay and real bid-ask costs on Alpaca, our simulation shows the strategy **underperforms SPY across all tested holding periods and produces negative absolute returns for any round-trip cost above ~0.5%**. For the small and mid-cap stocks that dominate insider buying signals, you are not reaching 0.5%. - -This is not a unique failure of this implementation. It is a fundamental property of the strategy: the edge (~0.7% per 7-day trade) is smaller than the friction of executing it in real markets. [insidercopytrading.com](https://insidercopytrading.com) either does not know this or does not want you to know it. Either way, they are charging a subscription for backtested numbers that cannot be reproduced with real money. Their website is rather pretty though. - -Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested, for the above reason. Wiring up live execution to a strategy that burns money seemed like a bad idea. +Alpaca integration exists in the codebase (`broker/alpaca_client.py`) but is not fully implemented or tested. ## Modules diff --git a/backtest/plot.py b/backtest/plot.py index 5226935..528a7df 100644 --- a/backtest/plot.py +++ b/backtest/plot.py @@ -45,14 +45,17 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market buy_delays = [0, 1, 2, 3] # Cap tier definitions: (label, cap_tier, spread, slippage) - # Costs match README results table. commission=0 (Alpaca). + # Costs based on SEC small-cap liquidity study (2013), Nasdaq spread data (2021), + # and Frazzini/Israel/Moskowitz "Trading Costs" (AQR, 2018). + # Alpaca charges zero commission. OTC/Pink Sheet stocks cannot be opened on Alpaca + # (close-only), so micro-cap signals overlap heavily with untradeable names. 
tiers = [ - ("Theoretical (0% RT, all)", None, 0.000, 0.000), - ("All cap (~0.7% RT)", None, 0.0025, 0.002), - ("Large cap (~0.2% RT)", "large", 0.001, 0.001), - ("Mid cap (~0.5% RT)", "mid", 0.0015, 0.0015), - ("Small cap (~0.8% RT)", "small", 0.003, 0.002), - ("Micro cap (~1.6% RT)", "micro", 0.005, 0.003), + ("Theoretical (0% RT, all)", None, 0.000, 0.000), + ("All cap (~1% RT)", None, 0.003, 0.004), + ("Large cap (~0.2% RT)", "large", 0.0005, 0.001), + ("Mid cap (~0.5% RT)", "mid", 0.0015, 0.002), + ("Small cap (~1.5% RT)", "small", 0.005, 0.005), + ("Micro cap (~5% RT, if listed)", "micro", 0.015, 0.020), ] total = len(tiers) * len(hold_days) * len(buy_delays) @@ -104,17 +107,18 @@ def plot_hp_heatmap(prices: dict, out_dir: str = PLOTS_DIR, signals=None, market ax.text(j, i, f"{val:+.1f}", ha="center", va="center", fontsize=8, color=color) - # Shared colorbar - fig.colorbar( - plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"), - ax=axes_flat, label="Annualised excess return vs SPY (%)", shrink=0.6, - ) - fig.suptitle( "HP sweep: holding period x entry delay, by cap tier (Alpaca, zero commission)", fontsize=13, ) - plt.tight_layout() + plt.tight_layout(rect=[0, 0, 0.88, 1]) + + # Shared colorbar in reserved right margin — avoids overlapping panels + cbar_ax = fig.add_axes([0.905, 0.15, 0.018, 0.65]) + fig.colorbar( + plt.cm.ScalarMappable(norm=norm, cmap="RdYlGn"), + cax=cbar_ax, label="Annualised excess return vs SPY (%)", + ) os.makedirs(out_dir, exist_ok=True) out = os.path.join(out_dir, "hp_sweep.png") @@ -130,21 +134,26 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar """ matplotlib, plt, mdates, np = _get_matplotlib() - # Alpaca zero-commission costs by cap tier (spread + slippage only) - # Costs match the values used in the README results table + # Realistic Alpaca costs by cap tier (zero commission, spread + slippage only). 
+ # Sources: SEC small-cap liquidity study (2013); Nasdaq spread data (2021); + # Frazzini/Israel/Moskowitz "Trading Costs" AQR (2018). + # Micro-cap: Alpaca does not allow new positions in OTC/Pink Sheet stocks — most + # micro-cap names fall in this category and are simply not tradeable. scenarios = [ - {"label": "Large cap (~0.2% RT)", "cap_tier": "large", "spread": 0.001, "slippage": 0.001}, - {"label": "Mid cap (~0.5% RT)", "cap_tier": "mid", "spread": 0.0015, "slippage": 0.0015}, - {"label": "Small cap (~0.8% RT)", "cap_tier": "small", "spread": 0.003, "slippage": 0.002}, - {"label": "Micro cap (~1.6% RT)", "cap_tier": "micro", "spread": 0.005, "slippage": 0.003}, + {"label": "Large cap (~0.2% RT)", "cap_tier": "large", "spread": 0.0005, "slippage": 0.001}, + {"label": "Mid cap (~0.5% RT)", "cap_tier": "mid", "spread": 0.0015, "slippage": 0.002}, + {"label": "Small cap (~1.5% RT)", "cap_tier": "small", "spread": 0.005, "slippage": 0.005}, + {"label": "Micro cap (~5% RT, if listed)", "cap_tier": "micro", "spread": 0.015, "slippage": 0.020}, ] fig, ax = plt.subplots(figsize=(13, 7)) colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"] sim_start = None - last_curve_date = None # earliest end across all scenarios — SPY clipped here + last_curve_date = None # earliest end across all scenarios — all curves clipped here + # First pass: simulate and find common end date + raw_curves = [] for sc, color in zip(scenarios, colors): s = Strategy( holding_days=7, buy_delay=1, @@ -154,13 +163,19 @@ def plot_equity_curves(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar print(f" equity curve: {sc['label']}...", flush=True) r = simulate(s, prices=prices, _signals=signals, _market_caps=market_caps) curve = r.get("equity_curve", []) + raw_curves.append((sc, color, curve, r)) + if curve: + sim_start = sim_start or r["period"]["start"] + end = curve[-1][0] + last_curve_date = min(last_curve_date, end) if last_curve_date else end + + # Second pass: plot all curves 
clipped to the minimum end date + for sc, color, curve, r in raw_curves: + if not curve: + continue + curve = [(d, v) for d, v in curve if d <= last_curve_date] if not curve: continue - - sim_start = sim_start or r["period"]["start"] - end = curve[-1][0] - last_curve_date = min(last_curve_date, end) if last_curve_date else end - dates = [datetime.strptime(d, "%Y-%m-%d") for d, _ in curve] values = [v for _, v in curve] base = values[0] @@ -213,10 +228,10 @@ def plot_position_size(prices: dict, out_dir: str = PLOTS_DIR, signals=None, mar pos_sizes = [0.03, 0.05, 0.07, 0.10, 0.15, 0.20, 0.25] tiers = [ - ("Large (~0.2% RT)", "large", 0.001, 0.001), - ("Mid (~0.5% RT)", "mid", 0.0015, 0.0015), - ("Small (~0.8% RT)", "small", 0.003, 0.002), - ("Micro (~1.6% RT)", "micro", 0.005, 0.003), + ("Large (~0.2% RT)", "large", 0.0005, 0.001), + ("Mid (~0.5% RT)", "mid", 0.0015, 0.002), + ("Small (~1.5% RT)", "small", 0.005, 0.005), + ("Micro (~5% RT, if listed)","micro", 0.015, 0.020), ] colors = ["#2ecc71", "#3498db", "#e67e22", "#e74c3c"] diff --git a/backtest/simulate.py b/backtest/simulate.py index fc419b5..530a586 100644 --- a/backtest/simulate.py +++ b/backtest/simulate.py @@ -47,9 +47,9 @@ def _fetch_market_caps(tickers: list[str]) -> dict[str, float]: import yfinance as yf from concurrent.futures import ThreadPoolExecutor, as_completed - cached = get_cached_market_caps(tickers) - # Skip tickers with special chars that yfinance can't handle - missing = [t for t in tickers if t not in cached and "/" not in t] + cached, already_fetched = get_cached_market_caps(tickers) + # Skip tickers already tried (even if null) and those with special chars + missing = [t for t in tickers if t not in already_fetched and "/" not in t] if missing: logger.info(f"Fetching market caps for {len(missing)} tickers via yfinance (parallel)...") diff --git a/db/db.py b/db/db.py index aa86dfc..31080eb 100644 --- a/db/db.py +++ b/db/db.py @@ -219,14 +219,17 @@ def get_signals_for_backtest(min_score: 
float, min_cluster_size: int) -> list[di return [_signal_to_dict(r) for r in rows] -def get_cached_market_caps(tickers: list[str]) -> dict[str, float]: +def get_cached_market_caps(tickers: list[str]) -> tuple[dict[str, float], set[str]]: + """Return (cap_map, already_fetched_set). already_fetched includes tickers with null cap.""" if not tickers: - return {} + return {}, set() with _session() as session: rows = session.scalars( select(TickerMeta).where(TickerMeta.ticker.in_(tickers)) ).all() - return {r.ticker: r.market_cap for r in rows if r.market_cap is not None} + caps = {r.ticker: r.market_cap for r in rows if r.market_cap is not None} + fetched = {r.ticker for r in rows} + return caps, fetched def upsert_market_caps(caps: dict[str, float]) -> None: diff --git a/plots/equity_curves.png b/plots/equity_curves.png index fc67932..518842c 100644 Binary files a/plots/equity_curves.png and b/plots/equity_curves.png differ diff --git a/plots/hp_sweep.png b/plots/hp_sweep.png index 4229e3c..ece3a21 100644 Binary files a/plots/hp_sweep.png and b/plots/hp_sweep.png differ diff --git a/plots/position_size.png b/plots/position_size.png index cb3a5b5..f034fa6 100644 Binary files a/plots/position_size.png and b/plots/position_size.png differ
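
Review note on the `get_cached_market_caps` change: the new return shape lets the caller tell "never looked up" apart from "looked up, no market cap available", so tickers with a null cap are no longer re-queried on every run. The sketch below illustrates that behaviour in isolation. It is a minimal sketch, not the repository's code: the in-memory `_TABLE` dict is a hypothetical stand-in for the `TickerMeta` table, `tickers_to_fetch` is a hypothetical helper wrapping the filter from `_fetch_market_caps`, and the ticker names and values are made up.

```python
from typing import Optional

# Hypothetical stand-in for the TickerMeta table: ticker -> market cap,
# where None means "already looked up, but no cap was available".
_TABLE: dict[str, Optional[float]] = {
    "AAPL": 3.0e12,
    "DEADCO": None,
}


def get_cached_market_caps(tickers: list[str]) -> tuple[dict[str, float], set[str]]:
    """Return (cap_map, already_fetched); already_fetched includes null-cap tickers."""
    if not tickers:
        return {}, set()
    rows = {t: _TABLE[t] for t in tickers if t in _TABLE}
    caps = {t: cap for t, cap in rows.items() if cap is not None}
    return caps, set(rows)


def tickers_to_fetch(tickers: list[str]) -> list[str]:
    """Tickers that still need a remote lookup (hypothetical helper)."""
    _, already_fetched = get_cached_market_caps(tickers)
    # Skip tickers already tried (even if null) and those containing "/".
    return [t for t in tickers if t not in already_fetched and "/" not in t]


if __name__ == "__main__":
    print(tickers_to_fetch(["AAPL", "DEADCO", "NEWCO", "BRK/B"]))
```

With the old single-dict return, `DEADCO` would have been treated as missing and fetched again on every call; with the tuple, only `NEWCO` is selected for lookup.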