smaug/ingestion
Dominik Roth b5268f063e feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes
- sec_bulk_ingest.py: new module. Downloads the quarterly form.idx files from SEC EDGAR,
  filters for Form 4/4A, fetches each filing's SGML/XML, then parses and stores it.
  Adaptive token-bucket rate limiter (backs off on 429/5xx, ramps back up on success).
  Uses filter_new_accessions for fast quarter-level dedup before any HTTP request is made.
  Marks derivative-only filings as seen so they're skipped on resume.
- form4_parser: extract tx_code (transactionCode) from each transaction row;
  fix role extraction (Director/10% owner/Officer fallback); fix _text() to
  handle <value> sub-elements; fix footnote text extraction
- edgar_poller: filter feed entries to Form 4/4A only; skip XSLT stylesheet URLs
  when resolving XML filing links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 17:48:51 +02:00
__init__.py feat: add PLAN.md and insider copytrade POC implementation 2026-05-04 16:15:22 +00:00
edgar_poller.py feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes 2026-05-26 17:48:51 +02:00
efts_ingest.py feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes 2026-05-26 17:48:51 +02:00
form4_parser.py feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes 2026-05-26 17:48:51 +02:00
historical_ingest.py feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes 2026-05-26 17:48:51 +02:00
sec_bulk_ingest.py feat(ingestion): bulk historical ingest, form4 tx_code, parser fixes 2026-05-26 17:48:51 +02:00
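A minimal sketch of the index-filtering step in sec_bulk_ingest.py, assuming the usual layout of EDGAR's quarterly full-index form.idx: a free-text header, a dashed separator row, then one whitespace-aligned row per filing with Form Type, Company Name, CIK, Date Filed and File Name. The names IndexEntry and iter_form4_entries are hypothetical, not the module's actual API, and downloading is left out so the parsing can be shown on its own.

```python
from dataclasses import dataclass
from typing import Iterator

# Form types as they appear in the "Form Type" column of form.idx.
FORM4_TYPES = {"4", "4/A"}


@dataclass(frozen=True)
class IndexEntry:  # hypothetical record type
    form_type: str
    company: str
    cik: str
    date_filed: str
    file_name: str

    @property
    def accession(self) -> str:
        # "edgar/data/<cik>/<accession>.txt" -> "<accession>"
        return self.file_name.rsplit("/", 1)[-1].removesuffix(".txt")


def iter_form4_entries(idx_text: str) -> Iterator[IndexEntry]:
    """Yield Form 4 and 4/A rows from the text of a quarterly form.idx."""
    lines = iter(idx_text.splitlines())
    # Skip the free-text header up to and including the dashed separator row.
    for line in lines:
        if line.startswith("---"):
            break
    for line in lines:
        parts = line.split()
        if len(parts) < 5 or parts[0] not in FORM4_TYPES:
            continue
        # Take CIK, date and file name from the right: company names can
        # contain spaces, but those three fields never do.
        yield IndexEntry(
            form_type=parts[0],
            company=" ".join(parts[1:-3]),
            cik=parts[-3],
            date_filed=parts[-2],
            file_name=parts[-1],
        )
```

Splitting on whitespace instead of fixed column offsets is a deliberate simplification; it is only safe here because the Form 4 type codes themselves contain no spaces. When fetching form.idx itself, SEC asks automated clients to send a descriptive User-Agent header.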
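The commit describes the limiter only as an adaptive token bucket that backs off on 429/5xx and ramps up on success. One way to get that behavior is additive-increase/multiplicative-decrease (AIMD) on the refill rate, sketched below. The class name, its parameters and the AIMD policy itself are assumptions, not taken from sec_bulk_ingest.py; the 10 requests/second default mirrors SEC's published fair-access limit for EDGAR.

```python
import time
from typing import Callable, Optional


class AdaptiveTokenBucket:
    """Token bucket whose refill rate adapts to server feedback (AIMD).

    Hypothetical sketch: halve the rate on HTTP 429/5xx, add `step`
    requests/second on any other status, staying within [min_rate, max_rate].
    Not thread-safe; guard acquire() with a lock if workers share it.
    """

    def __init__(
        self,
        max_rate: float = 10.0,
        min_rate: float = 0.5,
        step: float = 0.5,
        capacity: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_rate = max_rate
        self.min_rate = min_rate
        self.step = step
        self.rate = max_rate
        self.capacity = capacity if capacity is not None else max_rate
        self.tokens = self.capacity
        self._clock = clock  # injectable so tests can use a fake clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self) -> None:
        """Block until one request token is available, then consume it."""
        self._refill()
        if self.tokens < 1:
            # Sleep just long enough for the deficit to refill at the current rate.
            self._sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1

    def record(self, status: int) -> None:
        """Adjust the refill rate from the HTTP status of the last response."""
        if status == 429 or 500 <= status < 600:
            self.rate = max(self.min_rate, self.rate / 2)  # multiplicative decrease
        else:
            self.rate = min(self.max_rate, self.rate + self.step)  # additive increase
```

In a fetch loop, call acquire() before each request and record() with the response status after it. Halving on errors and creeping back up slowly means a burst of 429s drops throughput quickly, while recovery toward the ceiling stays gradual.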
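filter_new_accessions dedups at quarter level before any HTTP request, but its signature and storage are not shown here. The sketch below illustrates the same idea against SQLite: one set-based query per quarter drops already-seen accessions, and a seen table also records derivative-only filings so a resumed run skips them. The table layout and the init_store, filter_new and mark_seen helpers are hypothetical.

```python
import sqlite3
from typing import Iterable, List


def init_store(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per accession already handled.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_accessions ("
        " accession TEXT PRIMARY KEY,"
        " reason TEXT NOT NULL)"  # e.g. 'stored' or 'derivative_only'
    )


def filter_new(conn: sqlite3.Connection, accessions: Iterable[str]) -> List[str]:
    """Return accessions not yet in seen_accessions, deduplicated, in input order."""
    candidates = list(dict.fromkeys(accessions))
    # Stage the quarter's candidates in a temp table so the lookup is a single
    # join rather than one query (or one HTTP request) per filing.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS candidates (accession TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM candidates")
    conn.executemany(
        "INSERT INTO candidates (accession) VALUES (?)", ((a,) for a in candidates)
    )
    seen = {
        row[0]
        for row in conn.execute(
            "SELECT c.accession FROM candidates AS c "
            "JOIN seen_accessions AS s ON s.accession = c.accession"
        )
    }
    return [a for a in candidates if a not in seen]


def mark_seen(conn: sqlite3.Connection, accession: str, reason: str) -> None:
    # INSERT OR IGNORE keeps the first recorded reason if an accession reappears.
    conn.execute(
        "INSERT OR IGNORE INTO seen_accessions (accession, reason) VALUES (?, ?)",
        (accession, reason),
    )
```

Recording derivative-only filings with their own reason, rather than leaving them unmarked, is what lets a restart skip them without refetching.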
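A sketch of the form4_parser behaviors listed above, written against the public Form 4 ownership XML layout (nonDerivativeTransaction rows, transactionCoding/transactionCode, reportingOwnerRelationship flags and a footnotes block). Many Form 4 fields wrap their text in a <value> child while others, such as transactionCode, hold it directly, which is why _text() has to handle both. The function names and the role precedence (officer title, then Director, 10% Owner, Other) are assumptions, not the module's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# If a document carries an XML encoding declaration, pass it to
# ET.fromstring() as bytes rather than str.


def _text(el: Optional[ET.Element], path: str) -> Optional[str]:
    """Stripped text at `path`, descending into a <value> child if present."""
    if el is None:
        return None
    node = el.find(path)
    if node is None:
        return None
    value = node.find("value")
    if value is not None:
        node = value
    text = (node.text or "").strip()
    return text or None


def parse_transactions(root: ET.Element) -> List[Dict[str, Optional[str]]]:
    rows = []
    for tx in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        rows.append({
            "security": _text(tx, "securityTitle"),
            "date": _text(tx, "transactionDate"),
            "tx_code": _text(tx, "transactionCoding/transactionCode"),  # e.g. P, S, A, M
            "shares": _text(tx, "transactionAmounts/transactionShares"),
            "price": _text(tx, "transactionAmounts/transactionPricePerShare"),
            "acq_disp": _text(tx, "transactionAmounts/transactionAcquiredDisposedCode"),
        })
    return rows


def _flag(el: ET.Element, tag: str) -> bool:
    # Relationship flags may appear as "1"/"0" or "true"/"false".
    return (_text(el, tag) or "").lower() in {"1", "true"}


def owner_role(root: ET.Element) -> str:
    rel = root.find("reportingOwner/reportingOwnerRelationship")
    if rel is None:
        return "Unknown"
    if _flag(rel, "isOfficer"):
        return _text(rel, "officerTitle") or "Officer"
    if _flag(rel, "isDirector"):
        return "Director"
    if _flag(rel, "isTenPercentOwner"):
        return "10% Owner"
    if _flag(rel, "isOther"):
        return _text(rel, "otherText") or "Other"
    return "Unknown"


def parse_footnotes(root: ET.Element) -> Dict[str, str]:
    # itertext() also collects text nested in child elements;
    # split/join collapses embedded newlines and indentation.
    return {
        fn.get("id", ""): " ".join("".join(fn.itertext()).split())
        for fn in root.iterfind("footnotes/footnote")
    }
```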
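For edgar_poller, a sketch of the two filters: keep only Form 4 and 4/A entries, and when resolving a filing's XML link, skip the XSLT-rendered copy. On EDGAR the primary Form 4 document is typically linked twice, once as raw XML and once under a directory such as xslF345X05/ that serves an HTML rendering of the same file. The entry shape (a form-type string plus candidate URLs) and the names is_form4 and pick_raw_xml are hypothetical.

```python
import posixpath
from typing import Iterable, Optional
from urllib.parse import urlparse

FORM4_TYPES = {"4", "4/A"}


def is_form4(form_type: str) -> bool:
    """True for Form 4 and its amendment, tolerating case and whitespace."""
    return form_type.strip().upper() in FORM4_TYPES


def pick_raw_xml(urls: Iterable[str]) -> Optional[str]:
    """Return the first .xml link that is not an XSLT-rendered copy."""
    for url in urls:
        path = urlparse(url).path
        if not path.lower().endswith(".xml"):
            continue
        # Rendered copies live in a directory whose name starts with "xsl".
        parent = posixpath.basename(posixpath.dirname(path))
        if parent.lower().startswith("xsl"):
            continue
        return url
    return None
```

Matching on the parent directory rather than the file name matters because both links usually end in the same .xml file name.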