The plan #1

Closed
opened 2026-05-04 18:08:16 +02:00 by dodox · 2 comments
Owner

Insider Copytrade System -- Implementation Plan

Description

A personal system that monitors SEC EDGAR Form 4 filings in near real time (polling every 10 minutes), filters for high-quality insider buying signals, sends alerts via Slack, and optionally executes trades automatically through Alpaca's paper or live trading API.

The system is fully self-hosted, uses only free/public data sources, and requires no third-party data subscriptions.


Background

Company insiders (executives, directors, >10% shareholders) must file SEC Form 4 within 2 business days of any trade. This is public data via SEC EDGAR. The signal value of insider buying is academically documented -- executives buying their own stock with personal capital is a meaningful vote of confidence, particularly when:

  • Multiple insiders buy simultaneously (cluster signal)
  • The trade is unplanned (not a 10b5-1 scheduled plan)
  • The company is small/mid-cap (less institutional arbitrage)

The edge over copying politicians' trades: Form 4 has a 2-business-day disclosure lag versus up to 45 days for political disclosures, and the signal is company-specific rather than sector-level.

Key risk: This signal is publicly known and tracked. The edge is in filtering quality and execution speed, not data exclusivity. Large-cap Form 4 signals are arbitraged quickly. Focus on small/mid-cap, clustered, unplanned buys.


System Outline

SEC EDGAR RSS Feed (poll every 10 min)
        |
   [Ingestion Layer]
        |
   Parse Form 4 XML
        |
   [Filter Engine]
    - Buy only (flag = A)
    - Exclude 10b5-1 plans
    - Min transaction size
    - Role weighting
    - Cluster detection
        |
   SQLite Database
        |
   ┌────────────┬──────────────┐
   |            |              |
[Backtester] [Slack Alert]  [Alpaca API]
             (manual)      (paper/live)

Actionables

Phase 1 -- Data Ingestion

Goal: Reliably pull and parse Form 4 filings as they appear.

Tasks:

  1. Set up project structure
insider-copytrade/
  ingestion/
    edgar_poller.py      # polls EDGAR RSS
    form4_parser.py      # parses XML -> structured dict
  db/
    schema.sql
    db.py                # SQLite interface
  signals/
    filter_engine.py     # applies signal filters
    cluster_detector.py
  alerts/
    slack_alert.py
  broker/
    alpaca_client.py
  backtest/
    backtest.py
  config.py
  main.py
  2. Poll the EDGAR current-filings feed for Form 4 filings every 10 minutes:
https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=4&dateb=&owner=include&count=40&output=atom

SEC's fair-access policy expects automated clients to send a descriptive User-Agent header and to stay under 10 requests per second.

SEC also provides a structured search endpoint for the latest filings:

https://efts.sec.gov/LATEST/search-index?q=&forms=4
  3. For each new filing, fetch and parse the XML document. Key fields to extract:

    • issuerTradingSymbol (ticker)
    • rptOwnerName, officerTitle (insider name + role)
    • transactionDate
    • transactionAcquiredDisposedCode (A = acquired, D = disposed; acquisitions include grants and option exercises, not only purchases)
    • transactionCode (P = open-market or private purchase, S = sale)
    • transactionShares, transactionPricePerShare
    • transactionTotalValue (compute as shares x price per share if not present)
    • footnotes (check for "10b5-1" mention)
    • sharesOwnedFollowingTransaction
  4. Store raw filing XML + parsed fields. Track accessionNumber as dedup key.
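
The field extraction in step 3 can be sketched with the standard library alone. This is a minimal illustration, not the project's form4_parser.py: the sample document is hand-written and trimmed, the element nesting follows the Form 4 ownership XML layout as understood here, and the helper names `_text` and `parse_form4` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample; real filings carry many more elements.
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer><issuerTradingSymbol>ACME</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>John Smith</rptOwnerName></reportingOwnerId>
    <reportingOwnerRelationship><officerTitle>CEO</officerTitle></reportingOwnerRelationship>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionDate><value>2025-05-01</value></transactionDate>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>10000</value></transactionShares>
        <transactionPricePerShare><value>14.50</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
      <postTransactionAmounts>
        <sharesOwnedFollowingTransaction><value>50000</value></sharesOwnedFollowingTransaction>
      </postTransactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
  <footnotes><footnote id="F1">Shares purchased on the open market.</footnote></footnotes>
</ownershipDocument>"""


def _text(node, path):
    """Return stripped text at `path` under `node`, or None if absent."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def parse_form4(xml_text):
    """Parse one Form 4 document into a list of non-derivative transaction dicts."""
    root = ET.fromstring(xml_text)
    footnotes = " ".join((f.text or "") for f in root.iter("footnote"))
    rows = []
    for tx in root.iter("nonDerivativeTransaction"):
        amounts = "transactionAmounts/"
        shares = float(_text(tx, amounts + "transactionShares/value") or 0)
        price = float(_text(tx, amounts + "transactionPricePerShare/value") or 0)
        post = _text(tx, "postTransactionAmounts/sharesOwnedFollowingTransaction/value")
        rows.append({
            "ticker": _text(root, "issuer/issuerTradingSymbol"),
            "insider_name": _text(root, "reportingOwner/reportingOwnerId/rptOwnerName"),
            "role": _text(root, "reportingOwner/reportingOwnerRelationship/officerTitle"),
            "transaction_date": _text(tx, "transactionDate/value"),
            "transaction_code": _text(tx, "transactionCoding/transactionCode"),
            "flag": _text(tx, amounts + "transactionAcquiredDisposedCode/value"),
            "shares": shares,
            "price": price,
            "total_value": shares * price,  # computed, since it is derived from the two fields above
            "post_tx_shares": float(post or 0),
            "is_10b51": int("10b5-1" in footnotes),
        })
    return rows


rows = parse_form4(SAMPLE_FORM4)
```

One filing can yield several rows, so the parser returns a list rather than a single dict.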

SQLite schema:

CREATE TABLE filings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    accession_number TEXT UNIQUE,
    ticker TEXT,
    cik TEXT,
    insider_name TEXT,
    role TEXT,
    transaction_date TEXT,
    filed_date TEXT,
    shares REAL,
    price REAL,
    total_value REAL,
    flag TEXT,           -- A or D
    is_10b51 INTEGER,    -- 0 or 1
    post_tx_shares REAL,
    created_at TEXT
);

CREATE TABLE signals (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT,
    trigger_date TEXT,
    cluster_size INTEGER,
    total_cluster_value REAL,
    score REAL,
    alerted INTEGER DEFAULT 0,
    executed INTEGER DEFAULT 0,
    created_at TEXT
);
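
The dedup behaviour that the UNIQUE constraint on accession_number is meant to provide can be sketched as follows. This is an illustration on an in-memory database with a trimmed copy of the filings table; `store_filing` and the accession number shown are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE filings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        accession_number TEXT UNIQUE,
        ticker TEXT,
        total_value REAL
    )
""")


def store_filing(conn, accession_number, ticker, total_value):
    """Insert a filing once; return True if it was new, False if already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO filings (accession_number, ticker, total_value) "
        "VALUES (?, ?, ?)",
        (accession_number, ticker, total_value),
    )
    conn.commit()
    # rowcount is 1 for a fresh insert and 0 when the UNIQUE constraint ignored it
    return cur.rowcount == 1


first = store_filing(conn, "0001234567-25-000001", "ACME", 145000.0)
second = store_filing(conn, "0001234567-25-000001", "ACME", 145000.0)
```

Re-polling the feed then becomes safe: a filing seen twice is silently skipped instead of raising an integrity error.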

Phase 2 -- Filter Engine

Goal: Reduce noise to actionable signals only.

Filters to apply (in order):

| Filter | Logic |
|---|---|
| Buy only | flag == 'A' (acquired) and transactionCode == 'P' (purchase); 'A' alone also matches grants and option exercises |
| Exclude 10b5-1 | Scan footnotes for "10b5-1", "Rule 10b5", "adopted a plan" |
| Min transaction value | total_value >= 50000 (configurable) |
| Exclude derivative transactions | Option exercises are a weaker signal than open-market purchases |
| Role weighting | CEO/CFO/President = high; Director = medium; 10% owner = context-dependent |
| Cluster detection | 2+ insiders buying the same ticker within 30 days = elevated signal |
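
The hard filters above (everything except role weighting and cluster detection, which feed the score) could be expressed as a single predicate. A minimal sketch, assuming each transaction is a dict; the keys `transaction_code`, `footnotes`, and `is_derivative` are hypothetical names, and the purchase check relies on Form 4 transaction code P meaning an open-market or private purchase.

```python
import re

# Phrases the plan lists as hints that a trade was made under a 10b5-1 plan.
PLAN_PATTERN = re.compile(r"10b5-1|rule 10b5|adopted a plan", re.IGNORECASE)


def mentions_10b51(footnote_text):
    """True if the footnote text contains any of the 10b5-1 hint phrases."""
    return bool(PLAN_PATTERN.search(footnote_text or ""))


def passes_filters(tx, min_value=50_000):
    """Apply the hard filters in the order the plan lists them."""
    # Buy only: acquired (A) and coded as a purchase (P)
    if tx["flag"] != "A" or tx.get("transaction_code") != "P":
        return False
    # Exclude trades that appear to be scheduled under a 10b5-1 plan
    if mentions_10b51(tx.get("footnotes")):
        return False
    # Minimum transaction value
    if tx["total_value"] < min_value:
        return False
    # Exclude derivative transactions such as option exercises
    if tx.get("is_derivative"):
        return False
    return True


good = {"flag": "A", "transaction_code": "P", "total_value": 145_000.0,
        "footnotes": "Shares purchased on the open market.", "is_derivative": False}
planned = dict(good, footnotes="Purchase made pursuant to a Rule 10b5-1 trading plan.")
small = dict(good, total_value=10_000.0)
```

Footnote scanning is a heuristic: it can miss plans described in other words and can flag footnotes that mention the rule only to say no plan applies.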

Scoring formula (simple v1):

score = base_role_weight * log(total_value) * cluster_multiplier
# cluster_multiplier = 1.0 + (0.5 * (cluster_size - 1))
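
A runnable version of the formula might look like the sketch below. The numeric role weights are placeholders (the plan only ranks roles as high, medium, or context-dependent), and the natural logarithm is an assumption because the plan does not name a base.

```python
import math

# Hypothetical weights; tune these during backtesting.
ROLE_WEIGHTS = {"ceo": 1.0, "cfo": 1.0, "president": 1.0,
                "director": 0.6, "10% owner": 0.4}
DEFAULT_ROLE_WEIGHT = 0.3  # fallback for roles not listed above


def cluster_multiplier(cluster_size):
    """1.0 for a lone buyer, plus 0.5 for each additional insider."""
    return 1.0 + 0.5 * (cluster_size - 1)


def score_signal(role, total_value, cluster_size):
    """score = base_role_weight * log(total_value) * cluster_multiplier"""
    weight = ROLE_WEIGHTS.get(role.lower(), DEFAULT_ROLE_WEIGHT)
    return weight * math.log(total_value) * cluster_multiplier(cluster_size)


solo = score_signal("CEO", 145_000, 1)
clustered = score_signal("CEO", 145_000, 3)
```

With these placeholder weights a three-insider cluster scores exactly twice a lone buy of the same size, so the alert threshold has to be calibrated against whatever weights and log base are finally chosen.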

Expose all thresholds in config.py for easy tuning during backtesting.
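
Cluster detection, the last entry in the filter list, reduces to counting distinct insiders per ticker inside a trailing window. A minimal sketch, assuming buys are dicts with `ticker`, `insider_name`, and a `transaction_date` of type `datetime.date`; the function name `cluster_size` is hypothetical.

```python
from datetime import date, timedelta


def cluster_size(buys, ticker, as_of, window_days=30):
    """Count distinct insiders who bought `ticker` in the window ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    insiders = {
        b["insider_name"]
        for b in buys
        if b["ticker"] == ticker and start <= b["transaction_date"] <= as_of
    }
    return len(insiders)


buys = [
    {"ticker": "ACME", "insider_name": "John Smith", "transaction_date": date(2025, 5, 1)},
    {"ticker": "ACME", "insider_name": "Jane Doe", "transaction_date": date(2025, 4, 20)},
    {"ticker": "ACME", "insider_name": "John Smith", "transaction_date": date(2025, 4, 25)},
    {"ticker": "ACME", "insider_name": "Old Buyer", "transaction_date": date(2025, 1, 2)},
    {"ticker": "OTHR", "insider_name": "Someone Else", "transaction_date": date(2025, 5, 1)},
]
size = cluster_size(buys, "ACME", as_of=date(2025, 5, 1))
```

Counting distinct names rather than rows keeps one insider who buys in several tranches from looking like a cluster.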


Phase 3 -- SQLite Storage

SQLite is sufficient for this workload (low write volume, single process). Use WAL mode for concurrent reads during backtesting:

import sqlite3

conn = sqlite3.connect('insider.db')
conn.execute('PRAGMA journal_mode=WAL')  # readers no longer block the writer

Keep raw filing XML in a /data/filings/ directory keyed by accession number. Parse once on ingest; normal operation never re-parses, but the raw copies allow it if the parser changes.
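
Both storage pieces can be sketched together: opening the database in WAL mode and writing the raw XML to a file named after the accession number. The helper names and the accession number are hypothetical, and a temporary directory stands in for /data/filings/ so the example leaves nothing behind.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(db_path):
    """Open the database and switch it to WAL; return the connection and active mode."""
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


def save_raw_filing(base_dir, accession_number, xml_text):
    """Write the raw XML to <base_dir>/<accession_number>.xml and return the path."""
    path = Path(base_dir) / f"{accession_number}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(xml_text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_db(str(Path(tmp) / "insider.db"))
    saved = save_raw_filing(Path(tmp) / "filings", "0001234567-25-000001",
                            "<ownershipDocument/>")
    round_trip = saved.read_text(encoding="utf-8")
    file_name = saved.name
    conn.close()
```

The pragma returns the journal mode actually in effect, which is worth checking because WAL is only available for on-disk databases.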


Phase 4 -- Slack Alerts

Goal: Get notified immediately when a signal fires, with enough context to decide manually.

  1. Create a Slack app, get a webhook URL (takes 5 minutes)
  2. Alert format:
INSIDER BUY SIGNAL
Ticker:   $ACME
Insider:  John Smith (CEO)
Date:     2025-05-01
Shares:   10,000 @ $14.50 = $145,000
Cluster:  3 insiders in last 14 days
Score:    8.4
10b5-1:   No
EDGAR:    https://www.sec.gov/cgi-bin/browse-edgar?...
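  3. Alert only on signals above a configurable score threshold
  4. Mark alerted = 1 in the DB after sending to avoid duplicates on re-poll
import requests

def send_slack_alert(webhook_url, signal):
    # format_signal() is expected to render the alert text in the format above
    resp = requests.post(webhook_url, json={"text": format_signal(signal)}, timeout=10)
    resp.raise_for_status()  # raise on failed delivery so alerted is not set by mistake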
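
The plan calls format_signal without defining it. One possible shape, assuming the signal is a dict with the hypothetical keys used below, reproduces the alert layout from step 2:

```python
def format_signal(signal):
    """Render a signal dict as the multi-line Slack alert text."""
    total = signal["shares"] * signal["price"]
    lines = [
        "INSIDER BUY SIGNAL",
        f"Ticker:   ${signal['ticker']}",
        f"Insider:  {signal['insider_name']} ({signal['role']})",
        f"Date:     {signal['transaction_date']}",
        f"Shares:   {signal['shares']:,} @ ${signal['price']:.2f} = ${total:,.0f}",
        f"Cluster:  {signal['cluster_size']} insiders in last "
        f"{signal['cluster_window_days']} days",
        f"Score:    {signal['score']:.1f}",
        f"10b5-1:   {'Yes' if signal['is_10b51'] else 'No'}",
        f"EDGAR:    {signal['edgar_url']}",
    ]
    return "\n".join(lines)


example = {
    "ticker": "ACME", "insider_name": "John Smith", "role": "CEO",
    "transaction_date": "2025-05-01", "shares": 10000, "price": 14.50,
    "cluster_size": 3, "cluster_window_days": 14, "score": 8.4, "is_10b51": 0,
    "edgar_url": "https://www.sec.gov/cgi-bin/browse-edgar?...",
}
text = format_signal(example)
```

Keeping the formatting separate from the HTTP call means the alert text can be unit-tested without a webhook.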

Phase 5 -- Backtesting

Goal: Validate filter parameters on historical data before going live.

Data:

  • Historical Form 4 filings: download bulk XML from https://www.sec.gov/dera/data/form-4-data
  • Price data: yfinance (free, sufficient for backtesting)

Backtest logic:

# For each signal in historical data:
# - Entry: next market open after filed_date
# - Exit: N days later (configurable: 30/60/90/180)
# - Calculate return vs SPY over same period
# - Aggregate by role, cluster_size, market_cap bucket
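
For a single signal, the entry/exit rule above can be written without any third-party library. This sketch assumes date-sorted daily bars as (date, open, close) tuples, that the stock and the benchmark share the same trading calendar, and that "N days" means trading days; the plan does not say whether calendar or trading days are intended. The prices are synthetic.

```python
from bisect import bisect_right
from datetime import date


def excess_return(bars, benchmark, filed_date, holding_days):
    """Stock return minus benchmark return for one signal, or None if data runs out.

    Entry is the open of the first bar after `filed_date`; exit is the close
    `holding_days` bars after the entry bar.
    """
    dates = [d for d, _, _ in bars]
    entry = bisect_right(dates, filed_date)  # first bar strictly after the filing date
    exit_ = entry + holding_days
    if exit_ >= len(bars):
        return None
    stock = bars[exit_][2] / bars[entry][1] - 1.0
    bench = benchmark[exit_][2] / benchmark[entry][1] - 1.0
    return stock - bench


days = [date(2025, 5, 1), date(2025, 5, 2), date(2025, 5, 5),
        date(2025, 5, 6), date(2025, 5, 7)]
stock_bars = [(d, 10.0, c) for d, c in zip(days, [10.0, 10.0, 11.0, 12.0, 12.0])]
spy_bars = [(d, 100.0, c) for d, c in zip(days, [100.0, 100.0, 100.0, 101.0, 101.0])]

result = excess_return(stock_bars, spy_bars, filed_date=date(2025, 5, 1), holding_days=2)
too_long = excess_return(stock_bars, spy_bars, filed_date=date(2025, 5, 1), holding_days=10)
```

Entering strictly after the filing date avoids look-ahead: a Form 4 filed after the close cannot be traded on the same day's open.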

Use vectorbt for performance:

import vectorbt as vbt
# Build entry/exit signal matrices aligned to price data
# Run portfolio simulation with configurable position sizing

Output metrics:

  • Annualized return vs SPY benchmark
  • Win rate
  • Avg return by holding period
  • Avg return by role / cluster size
  • Max drawdown
  • Sharpe ratio
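
Three of these metrics are simple enough to compute by hand, which helps when cross-checking library output. A minimal sketch under stated assumptions: returns are simple per-period returns, the equity curve is a list of portfolio values, and annualization uses 252 periods per year.

```python
import math
import statistics


def win_rate(returns):
    """Fraction of trades with a strictly positive return."""
    return sum(r > 0 for r in returns) / len(returns)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(period_returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio; `risk_free` is the per-period risk-free rate."""
    excess = [r - risk_free for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


wr = win_rate([0.10, -0.05, 0.20, 0.0])
dd = max_drawdown([100.0, 120.0, 90.0, 130.0])
sr = sharpe_ratio([0.01, 0.02, -0.01, 0.03])
```

A break-even trade counts as a loss here; whether that is the right convention is a choice to make before comparing parameter sets.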

Critical: Test on post-2022 data specifically. Pre-2022 results are likely inflated -- the signal became widely tracked after Autopilot/media coverage.

Parameter grid to test:

MIN_VALUE = [25_000, 50_000, 100_000]
HOLDING_DAYS = [30, 60, 90, 180]
CLUSTER_WINDOW = [14, 30]
MIN_CLUSTER_SIZE = [1, 2, 3]
ROLES = ['all', 'c-suite-only']
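
Enumerating the grid is a one-liner with itertools.product. The sketch below only builds the parameter combinations; `iter_param_sets` is a hypothetical helper, and running the backtest for each set is left to the backtester.

```python
from itertools import product

GRID = {
    "MIN_VALUE": [25_000, 50_000, 100_000],
    "HOLDING_DAYS": [30, 60, 90, 180],
    "CLUSTER_WINDOW": [14, 30],
    "MIN_CLUSTER_SIZE": [1, 2, 3],
    "ROLES": ["all", "c-suite-only"],
}


def iter_param_sets(grid):
    """Yield one dict per combination of grid values."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))


param_sets = list(iter_param_sets(GRID))
print(len(param_sets))  # 3 * 4 * 2 * 3 * 2 = 144 combinations
```

With 144 combinations tested on a few years of data, the best-looking set is likely to be partly luck, so it is worth holding out a period that the grid search never sees.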

Phase 6 -- Alpaca Integration

Goal: Optionally auto-execute signals. Start with paper trading.

Paper trading base URL: https://paper-api.alpaca.markets
Live trading base URL: https://api.alpaca.markets

Swap via config flag -- never hardcode.

import config
from alpaca_trade_api import REST

api = REST(
    key_id=config.ALPACA_KEY,
    secret_key=config.ALPACA_SECRET,
    base_url=config.ALPACA_BASE_URL  # paper or live
)

def execute_signal(ticker, portfolio_value, signal_score):
    # Fixed fractional sizing: POSITION_SIZE_PCT (2%) of portfolio per signal.
    # signal_score is accepted for future score-based sizing but unused in v1.
    price = api.get_latest_trade(ticker).price
    allocation = portfolio_value * config.POSITION_SIZE_PCT
    qty = int(allocation / price)
    if qty < 1:
        return None  # allocation too small to buy a single share
    return api.submit_order(
        symbol=ticker,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='day'
    )

Position sizing: start at 2% per signal, max 10% in any single ticker. Add a max open positions limit (e.g. 20) to cap exposure.

Exit logic (v1): time-based only (close after N days). Add trailing stop later.
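
The sizing limits and the time-based exit can be kept as pure functions, separate from the broker client, so they are testable without an API key. A minimal sketch: the function names are hypothetical, `ticker_exposure` is the dollar value already held in the ticker, and the holding period is counted in calendar days, which the plan does not specify.

```python
from datetime import date, timedelta


def order_quantity(price, portfolio_value, ticker_exposure, open_positions,
                   size_pct=0.02, max_ticker_pct=0.10, max_positions=20):
    """Whole shares to buy for one signal, or 0 if a limit blocks the trade."""
    holds_ticker = ticker_exposure > 0
    # A new ticker needs a free position slot; adding to a held ticker does not
    if not holds_ticker and open_positions >= max_positions:
        return 0
    # Dollar room left before the per-ticker cap is reached
    room = max_ticker_pct * portfolio_value - ticker_exposure
    allocation = min(size_pct * portfolio_value, room)
    if allocation <= 0:
        return 0
    return int(allocation / price)


def exit_due(entry_date, today, holding_days=90):
    """True once the position has been held for at least `holding_days`."""
    return today >= entry_date + timedelta(days=holding_days)


fresh = order_quantity(14.50, 100_000, ticker_exposure=0, open_positions=0)
near_cap = order_quantity(14.50, 100_000, ticker_exposure=9_500, open_positions=5)
full_book = order_quantity(14.50, 100_000, ticker_exposure=0, open_positions=20)
```

Clustered signals are exactly the case where the per-ticker cap matters: three insiders buying the same stock would otherwise stack three 2% positions.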


Build Order

| Step | Deliverable | Est. Time |
|---|---|---|
| 1 | EDGAR poller + Form 4 XML parser + SQLite storage | 1 day |
| 2 | Filter engine + cluster detector | 0.5 day |
| 3 | Slack alert | 1 hour |
| 4 | Historical data download + backtest | 1-2 days |
| 5 | Alpaca paper trading integration | 0.5 day |
| 6 | Run paper trading 4-8 weeks, monitor | -- |
| 7 | Switch to live with small capital | -- |

Do not proceed to Step 7 without meaningful paper trading history.


Dependencies

requests
lxml
sqlite3 (stdlib)
yfinance
vectorbt
alpaca-trade-api
python-dotenv

All free. No paid APIs required.


Config Template

# config.py
EDGAR_POLL_INTERVAL = 600        # seconds
MIN_TRANSACTION_VALUE = 50_000
MIN_CLUSTER_SIZE = 1             # raise to 2 for higher quality
CLUSTER_WINDOW_DAYS = 30
HOLDING_PERIOD_DAYS = 90
POSITION_SIZE_PCT = 0.02         # 2% per signal
MAX_POSITIONS = 20
SCORE_ALERT_THRESHOLD = 5.0

SLACK_WEBHOOK_URL = ""
ALPACA_KEY = ""
ALPACA_SECRET = ""
ALPACA_BASE_URL = "https://paper-api.alpaca.markets"  # switch for live
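
Rather than writing secrets into config.py, the empty placeholders can be filled from the environment. The sketch below uses only os.environ; with python-dotenv installed, calling load_dotenv() beforehand would copy values from a .env file into the environment. The helper `load_secrets` is hypothetical, and reading ALPACA_BASE_URL from the environment is an assumption.

```python
import os

PAPER_URL = "https://paper-api.alpaca.markets"


def load_secrets(env=os.environ):
    """Read credentials from the environment, defaulting to paper trading."""
    return {
        "SLACK_WEBHOOK_URL": env.get("SLACK_WEBHOOK_URL", ""),
        "ALPACA_KEY": env.get("ALPACA_KEY", ""),
        "ALPACA_SECRET": env.get("ALPACA_SECRET", ""),
        # Live trading has to be opted into explicitly
        "ALPACA_BASE_URL": env.get("ALPACA_BASE_URL", PAPER_URL),
    }


defaults = load_secrets({})
custom = load_secrets({"ALPACA_KEY": "example-key"})
```

Defaulting the base URL to the paper endpoint means a missing variable can never route orders to the live account.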
Author
Owner

@claude get to work. Create md in repo of this plan. And build the poc.

Collaborator

Claude finished @dodox's task —— View job


Done

  • Gather context and understand request
  • Check for existing branches
  • Create branch: claude/issue-1-insider-copytrade-poc
  • Create PLAN.md
  • Build POC structure
    • config.py
    • main.py
    • ingestion/edgar_poller.py
    • ingestion/form4_parser.py
    • db/schema.sql + db/db.py
    • signals/filter_engine.py + cluster_detector.py
    • alerts/slack_alert.py
    • broker/alpaca_client.py
    • backtest/backtest.py
    • requirements.txt + .env.example
  • Commit and push
  • Create PR

PR opened: dodox/Cleopatra#2

20 files committed on branch claude/issue-1-insider-copytrade-poc. Quick start:

cp .env.example .env
# fill in SLACK_WEBHOOK_URL, ALPACA_KEY, ALPACA_SECRET
pip install -r requirements.txt
python main.py fetch-once   # one-shot EDGAR ingest
python main.py run          # continuous polling loop (every 10 min)
python main.py backtest     # backtest signals stored in DB

dodox closed this issue 2026-05-04 18:24:37 +02:00
