feat: Insider Copytrade POC + PLAN.md #2
Closes #1
What's included
PLAN.md — Full implementation plan from the issue, committed as a doc in the repo.
POC implementation — All 6 phases stubbed out and functional:
- `config.py`
- `.env`
- `ingestion/edgar_poller.py`
- `ingestion/form4_parser.py`
- `db/schema.sql` + `db/db.py` (`filings` + `signals` tables)
- `signals/filter_engine.py`
- `signals/cluster_detector.py`
- `alerts/slack_alert.py`
- `broker/alpaca_client.py`
- `backtest/backtest.py`
- `main.py`
Quick start
Do we cache old data for backtesting?
Is this ready for testing?
Look it all over again. Sanity check.
@claude
And add a README
Claude finished @dodox's task
Do we cache old data for backtesting?
Partially. Raw XML filings are cached to disk at `data/filings/<accession>.xml` (see `edgar_poller.py:66-71`). However, historical price data is not cached: every `python main.py backtest` run makes fresh `yfinance` API calls for each signal. For a large DB this will be slow and could hit rate limits. A future improvement would be to persist price data to the DB or a local file.
Is this ready for testing?
For basic smoke testing: yes. `python main.py fetch-once` should work end-to-end with no credentials: it will ingest filings, store them, and attempt to generate signals. Slack/Alpaca paths are skipped gracefully when keys are absent.
Known issues / gaps found in the sanity check:
- `_get_xml_url_from_index` is defined but never called. `_resolve_xml_url` is used instead, which constructs the URL differently and parses HTML (not Atom). The Atom-based function is dead code. (`edgar_poller.py:49-63`)
- `cmd_fetch_once` calls `_on_new_filing(filing)` but ignores its return value: the `signal = ...` variable assignment is missing. Not a crash, but signals won't be processed in fetch-once mode. (`main.py:64`)
- `_resolve_xml_url` parses the index page with `etree.fromstring`, but the index is HTML, not XML. This will frequently raise `XMLSyntaxError` and silently return `None`, meaning many filings may be skipped. (`edgar_poller.py:109-123`)
- The backtest uses `stock_data["Close"].iloc[0]` as entry price regardless of whether that date is actually at or after the signal date, so it could use a date before the insider trade. (`backtest/backtest.py:69`)
- `insert_signal` tries to insert `signal["filing"]` and `signal["cluster_buys"]` keys but the SQL only has 5 columns. Those extra keys in the dict are harmless (named params ignore extras), and `total_cluster_value` from a cluster with `None` values sums to 0, which is fine. (`db/db.py:48-64`)
- `MIN_CLUSTER_SIZE = 1` means every single insider buy that passes the value filter generates a signal. This is intentional for the POC but will be very noisy. (`config.py:8`)
- `run` mode: `close_position_after_days` exists but is never called from the main loop. (`broker/alpaca_client.py:85`)
README has been added and pushed to the branch.
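The `etree.fromstring` problem noted above is easy to reproduce. The sketch below is a standard-library stand-in, not the project's code: `xml.etree.ElementTree` plays the role of lxml's strict XML parser (lxml raises `XMLSyntaxError` where the stdlib raises `ParseError`), and `SAMPLE_HTML`, `parse_as_xml`, and `collect_links` are hypothetical names with a made-up page fragment rather than a real EDGAR index.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: acceptable HTML, but not well-formed XML
# (unquoted attribute value, unclosed <br>).
SAMPLE_HTML = """
<html><body>
<table class=tableFile>
<tr><td><a href="/Archives/example/form4.xml">form4.xml</a><br></td></tr>
</table>
</body></html>
"""


def parse_as_xml(markup: str):
    """Mimic the buggy path: swallow the parse error and return None."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using a lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_links(markup: str) -> list[str]:
    """Return every link target found in the markup."""
    parser = LinkCollector()
    parser.feed(markup)
    return parser.hrefs


xml_result = parse_as_xml(SAMPLE_HTML)   # strict parser gives up
html_links = collect_links(SAMPLE_HTML)  # lenient parser finds the link
```

The strict parser returns `None` without any visible error, which matches the "silently skipped filings" symptom, while the lenient HTML parser recovers the link.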
@claude address these issues.
No ugly code. How should this be implemented?
Refactor again if necessary.
Want a pretty codebase, no smells.
Also, let's rebrand to 'Smaug'
Claude finished @dodox's task
Tasks
- `edgar_poller.py` (use `lxml.html`)
- `cmd_fetch_once` in `main.py`
- `alpaca_client.py` + `main.py`
What was done
ingestion/edgar_poller.py
- Removed `_get_xml_url_from_index` entirely
- `_resolve_xml_url`: replaced `etree.fromstring` (which explodes on HTML) with `lxml.html.fromstring` + CSS selector `table.tableFile a[href]`; this is how EDGAR index pages are actually structured
- Removed the `index_url` pass-through (no longer needed since we derive the URL from the accession number directly)
- Renamed `EDGAR_FULL_INDEX` → `EDGAR_ATOM_URL` to be explicit
main.py
- Renamed `_on_new_filing` → `_process_filing` (clearer intent)
- `cmd_fetch_once` now calls `_process_filing(filing)` directly with no dead assignment
- `cmd_run` calls `_close_expired_positions()` on every poll cycle via the `on_new_filing` closure
backtest/backtest.py
- Entry price uses `_first_close_on_or_after(data, signal_date)`, guaranteed to be at or after the insider trade
- Exit price uses `_first_close_before(data, exit_date)`, the last available close before the hold expires
- Filters (`min_score`, `min_cluster_size`) pushed into the SQL query instead of Python
- No `sqlite3` import at top level (moved inside the loader function)
- `spy_returns` renamed `spy_cache`; `import math` moved out of the loop
broker/alpaca_client.py
- `close_position(ticker, signal_id)`: closes the position and marks `closed=1` in DB
- `close_expired_positions()`: queries `signals` for executed+unclosed rows, checks `executed_at` against `HOLDING_PERIOD_DAYS`, closes anything past the cutoff
db/schema.sql
- Added `executed_at TEXT` and `closed INTEGER DEFAULT 0` columns to `signals`
- Added `idx_signals_executed` index
db/db.py
- `mark_signal_executed` now writes the UTC timestamp to `executed_at`
- Added `mark_signal_closed(signal_id)`
- Added `get_executed_unclosed_signals()` for the exit scheduler
README.md
Filings should be 'cached' in db. (Can also cache xml file, but honestly what for?)
Like we dl all filings once, and then we just have to check date and time of newest in db to fetch new and again have all.
yfinance data should also be cached.
Also I don't like naked SQL. Use sqlalchemy to define the model...
@claude
Claude finished @dodox's task
What was done
db/models.py (new)
SQLAlchemy ORM models, no more raw SQL:
- `Filing`: same columns as before, plus `idx_filings_filed_date` index
- `Signal`: booleans use `Column(Boolean)` instead of integers
- `PriceCache`: new table; `(ticker, date)` unique, stores daily close prices
db/db.py
- `Session` + ORM queries; zero raw SQL strings
- Added `get_latest_filed_date()`, `get_signals_for_backtest()`, `get_cached_prices()`, `upsert_prices()`
- `get_recent_buys_for_ticker` now uses a Python-computed cutoff date (no SQLite `date('now')` strings)
db/schema.sql
- Deleted (SQLAlchemy handles `CREATE TABLE` via `Base.metadata.create_all`)
ingestion/edgar_poller.py
- `fetch_and_store_new_filings` calls `get_latest_filed_date()` once at the start
- Any filing with `filed_date < latest_in_db` is skipped before even touching `accession_exists`; incremental polling is now date-gated first
backtest/backtest.py
- `_fetch_prices(ticker, start, end)`: checks the DB cache first; if the date range is fully in the past and we already have data, returns the cached dict without any network call
- Otherwise fetches the prices, stores them via `upsert_prices()`, and returns the merged result
- `run_backtest` uses `get_signals_for_backtest()` instead of its own `sqlite3` connection
- `spy_cache` keyed by `entry_date_str` (simpler, since `holding_days` is constant per run)
requirements.txt
- Added `sqlalchemy>=2.0.0`
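The cache-first price lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code on the branch: an in-memory dict stands in for the `PriceCache` table, the signatures of `get_cached_prices` and `upsert_prices` are guessed, and `fetch_remote` and `today` are hypothetical parameters injected so the example runs without `yfinance` or a database.

```python
from datetime import date
from typing import Callable, Dict, Tuple

Prices = Dict[date, float]

# Stand-in for the PriceCache table: (ticker, date) -> daily close.
_CACHE: Dict[Tuple[str, date], float] = {}


def get_cached_prices(ticker: str, start: date, end: date) -> Prices:
    """Return cached closes for ticker within [start, end]."""
    return {
        d: px
        for (t, d), px in _CACHE.items()
        if t == ticker and start <= d <= end
    }


def upsert_prices(ticker: str, prices: Prices) -> None:
    """Insert new rows or overwrite existing ones."""
    for d, px in prices.items():
        _CACHE[(ticker, d)] = px


def fetch_prices(
    ticker: str,
    start: date,
    end: date,
    fetch_remote: Callable[[str, date, date], Prices],
    today: date,
) -> Prices:
    """Serve from cache when the range is fully in the past and data exists."""
    cached = get_cached_prices(ticker, start, end)
    if cached and end < today:
        return cached  # no network call
    fresh = fetch_remote(ticker, start, end)
    upsert_prices(ticker, fresh)
    return {**cached, **fresh}


# Usage with a fake remote source that records how often it is hit.
calls = []


def fake_remote(ticker: str, start: date, end: date) -> Prices:
    calls.append(ticker)
    return {date(2024, 1, 8): 103.0, date(2024, 1, 9): 102.0}


today = date(2024, 6, 1)
first = fetch_prices("ACME", date(2024, 1, 8), date(2024, 1, 9), fake_remote, today)
second = fetch_prices("ACME", date(2024, 1, 8), date(2024, 1, 9), fake_remote, today)
```

The second call is answered entirely from the cache, so repeated backtest runs over the same historical window should not touch the network. One trade-off of this simple rule is that a partially cached past range is treated as complete.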