
Practical ways AI can extract and compare KPIs from public company filings

    • #128613

      I’m a non-technical, curious reader (over 40) who wants to use AI to pull key performance indicators (KPIs) from public company filings — things like revenue, gross margin, headcount, or customer metrics — and compare them across companies and time. I often start with annual reports and filings (for example, SEC EDGAR filings: https://www.sec.gov/edgar/search/).

      What I’m hoping to learn:

      • Practical tools: simple AI tools or services (no-code or low-code) that work well for extracting KPIs.
      • Step-by-step workflow: how to go from a filing PDF/HTML to a clean table of KPIs ready for comparison.
      • Accuracy & verification: how to check extraction results and handle ambiguous wording.
      • Output formats: easy formats for comparison (CSV, spreadsheet, small databases).

      Any examples, templates, sample prompts, or recommendations for beginner-friendly tools would be very helpful. Thanks in advance — I’d appreciate practical, non-technical answers or links to simple tutorials.

    • #128621
      Jeff Bullas
      Keymaster

      Good question: wanting practical, repeatable ways to pull KPIs from filings is exactly the right focus. Try this quick win first: open a company’s latest 10‑K or 10‑Q, copy the income statement table, and paste it into an AI chat. Ask it to return the key KPIs as a CSV; you can do that in under five minutes.

      Why this works

      Public filings have the gold: revenue, margins, EPS, balance-sheet totals. The challenge is inconsistent labels and formats. AI is great at pattern-matching and turning messy tables into structured rows you can analyze in Excel or Google Sheets.

      What you’ll need

      • A recent public filing in text or PDF (10‑K or 10‑Q).
      • AI chat access (ChatGPT or similar).
      • A spreadsheet (Excel or Google Sheets).

      Step-by-step

      1. Find the table: locate the Consolidated Statements of Operations and the Balance Sheet in the filing. Copy the table text (or screenshot and OCR if needed).
      2. Paste into the AI chat and use this prompt (copy-paste below).
      3. Ask the AI to output CSV or a plain table. Copy results into your spreadsheet and set columns to numeric.
      4. Normalize units (thousands vs millions) and currency. Add a column for units if AI didn’t provide one.
      5. Calculate comparison KPIs in the spreadsheet: revenue growth %, gross margin, operating margin, net margin, EPS growth, return on assets.
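
      If you would rather script steps 4 and 5 than do them by hand, here is a minimal Python sketch using pandas. The file name kpis.csv is just an example, and the column names follow the CSV headers in the prompt below; treat it as a starting point, not a finished tool.

import pandas as pd

# Read the CSV the AI returned (the file name is an example).
df = pd.read_csv("kpis.csv", na_values=["NA"])

numeric_cols = ["Revenue", "CostOfRevenue", "GrossProfit", "OperatingIncome",
                "NetIncome", "BasicEPS", "DilutedEPS", "TotalAssets",
                "TotalLiabilities", "CashAndCashEquivalents"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Step 4: normalize units. Scale rows reported in thousands down to millions
# (EPS is already per share, so leave it alone).
scale_cols = [c for c in numeric_cols if "EPS" not in c]
in_thousands = df["Units"].str.contains("thousand", case=False, na=False)
df.loc[in_thousands, scale_cols] = df.loc[in_thousands, scale_cols] / 1000
df.loc[in_thousands, "Units"] = "USD millions"

# Step 5: comparison KPIs.
df["GrossMargin"] = df["GrossProfit"] / df["Revenue"]
df["OperatingMargin"] = df["OperatingIncome"] / df["Revenue"]
df["NetMargin"] = df["NetIncome"] / df["Revenue"]
df["ReturnOnAssets"] = df["NetIncome"] / df["TotalAssets"]

# Growth rates need at least two periods per company.
df = df.sort_values(["CompanyName", "PeriodEnd"])
df["RevenueGrowth"] = df.groupby("CompanyName")["Revenue"].pct_change()
df["EPSGrowth"] = df.groupby("CompanyName")["DilutedEPS"].pct_change()

df.to_csv("kpis_normalized.csv", index=False)

      Open kpis_normalized.csv in Excel or Google Sheets and the comparison columns are already there.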

      Copy-paste AI prompt (use as-is)

      Extract the following KPIs from the pasted filing text: CompanyName, FilingDate, PeriodEnd, Revenue, CostOfRevenue, GrossProfit, OperatingIncome, NetIncome, BasicEPS, DilutedEPS, TotalAssets, TotalLiabilities, CashAndCashEquivalents. Output as CSV with headers: CompanyName,FilingDate,PeriodEnd,Revenue,CostOfRevenue,GrossProfit,OperatingIncome,NetIncome,BasicEPS,DilutedEPS,TotalAssets,TotalLiabilities,CashAndCashEquivalents,Units. If a value is not present, write NA. Detect and convert units to USD millions if possible; if you can’t detect units, leave Units blank.

      Example result you should expect

      CompanyName,FilingDate,PeriodEnd,Revenue,CostOfRevenue,GrossProfit,OperatingIncome,NetIncome,BasicEPS,DilutedEPS,TotalAssets,TotalLiabilities,CashAndCashEquivalents,Units
      Acme Inc,2024-02-28,2023-12-31,5000,3000,2000,500,200,1.95,1.90,10000,6000,1200,USD millions

      Common mistakes & fixes

      • Wrong units — always verify the filing header (thousands vs millions). Fix: add a Units column and standardize.
      • Different line names (e.g., “Total revenue” vs “Net sales”) — fix: ask AI to map synonyms before extracting.
      • Tables split across pages — fix: paste contiguous text and tell AI to merge rows for the same period.
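
      If the AI still hands back two partial rows for the same period (one from each page of a split table), you can merge them in the spreadsheet or with a few lines of Python. A minimal sketch, using the same assumed kpis.csv layout as above:

import pandas as pd

# Collapse partial rows for the same company and period: keep the first
# non-missing value in each column within the group.
df = pd.read_csv("kpis.csv", na_values=["NA"])
merged = df.groupby(["CompanyName", "PeriodEnd"], as_index=False).first()
merged.to_csv("kpis_merged.csv", index=False)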

      Action plan (next 48 hours)

      1. Pick 3 competitors, pull their latest filings, and run the copy-paste prompt for each.
      2. Import results to one spreadsheet, normalize units, and compute growth and margin columns.
      3. Spot-check 2-3 numbers against the original PDF to validate accuracy.

      Small, repeatable steps win. Start with one company, get comfortable, then scale to a handful. AI speeds the extraction; your judgment validates the comparison.

    • #128626
      Ian Investor
      Spectator

      Nice practical starter — extracting tables into CSV is exactly the fast win most investors need. I’ll build on that with a compact, repeatable workflow that reduces manual checking and makes cross-company comparisons reliable as you scale from one firm to a dozen.

      What you’ll need

      • Latest filings for each company (10‑K/10‑Q PDFs or text).
      • AI chat or extraction tool for turning text/tables into structured rows.
      • A spreadsheet (Excel or Google Sheets) and a simple template to hold normalized KPIs.
      • Optional: OCR app for scanned PDFs and a small “mapping” note of common line-name synonyms.

      Step-by-step: a repeatable workflow

      1. Collect filings: save each filing with a consistent name (Company_Ticker_FilingDate). Keep the original PDF for provenance.
      2. Extract raw tables: copy text or use OCR, then paste into your AI chat/extractor and ask for a structured table of standard line items (you don’t need a verbatim prompt here; keep it conversational and explicit about desired fields).
      3. Normalize units and names: in your spreadsheet, convert every numeric field to the same units (e.g., USD millions) and apply a small mapping table that equates synonyms like “Net sales” = “Revenue.”
      4. Compute KPIs: add columns for revenue growth %, gross/operating/net margins, EPS growth and return-on-assets. Use cell formulas so updates auto-recalculate when you paste new data.
      5. Validate with spot-checks: for each company verify 2–3 figures (revenue, net income, cash) directly against the PDF. Flag any rows where the AI output differs materially and record the PDF page/link in a Provenance column.
      6. Compare across peers: once normalized, create a dashboard sheet that ranks peers by growth, margins and ROA. Sort and filter to spot outliers quickly.
      7. Scale safely: when comfortable, batch extract filings and paste into the same template. Keep versioning (date-stamped copies) and a small audit log of manual corrections.
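
      If you reach step 7 and the copy-paste loop gets tedious, a short script can do the batching, synonym mapping and unit conversion in one pass. A minimal sketch, assuming each extracted file sits in an extracted/ folder with LineItem, Value and Units columns; that is a rawer layout than the CSV prompts in this thread, so adjust the names to whatever your template actually uses.

import glob
import pandas as pd

# Tiny synonym dictionary (grow it as you meet new labels).
SYNONYMS = {
    "Net sales": "Revenue",
    "Total revenue": "Revenue",
    "Gross profit": "GrossProfit",
    "Income from operations": "OperatingIncome",
}
# Scale factors to a common base of USD millions.
UNIT_SCALE = {"USD thousands": 0.001, "USD millions": 1.0, "USD billions": 1000.0}

frames = []
for path in glob.glob("extracted/*.csv"):        # e.g. Company_Ticker_FilingDate.csv
    df = pd.read_csv(path, na_values=["NA"])
    df["LineItem"] = df["LineItem"].replace(SYNONYMS)
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    df["Value"] = df["Value"] * df["Units"].map(UNIT_SCALE)
    df["Units"] = "USD millions"
    df["Provenance"] = path                      # keep the source file for auditing
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv("normalized_template.csv", index=False)

      Anything with an unrecognized unit comes out blank, which is exactly the kind of row you want to spot-check against the PDF.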

      What to expect

      • Initial accuracy will be high for headline items but watch for split lines, nonstandard labels and unit misreads — plan 5–10 minutes of validation per filing.
      • Building a short synonym mapping and a Provenance column cuts reconciliation time by half.

      Concise tip: maintain a tiny dictionary of line-item synonyms and a confidence flag per row; over a few filings the dictionary quickly captures most naming quirks and saves you repeated checks.

    • #128633
      aaron
      Participant

      Quick win acknowledged: your repeatable workflow is exactly the right foundation — saving originals, normalizing names and adding provenance are the non-sexy steps that make scale possible. I’ll build on that with concrete checks, automation-friendly rules and a ready-to-use AI prompt so you get consistent KPIs across peers.

      The core problem

      Filings are inconsistent: different labels, split tables, and unit notes hidden in headers. That creates false differences between companies unless you normalize and validate.

      Why this matters

      If you don’t standardize names, units and provenance you’ll compare apples to oranges — wrong growth rates, misstated margins and bad decisions. A few extra minutes of process reduces that risk materially.

      Do / Do not (quick checklist)

      • Do store PDF originals, file-named Company_Ticker_Date.
      • Do add a Units column and a Provenance column with PDF page/line note.
      • Do create a tiny synonym dictionary (Net sales → Revenue, Gross profit → GrossProfit).
      • Do not trust AI outputs without spot checks (2–3 headline items per filing).
      • Do not mix units across rows — convert to a common base (USD millions).

      Step-by-step (what you’ll need, how to do it, what to expect)

      1. Collect: download 10‑K/10‑Q PDFs and name consistently. Keep originals.
      2. Extract: copy table text or OCR, paste into the AI chat and run the prompt below.
      3. Import: paste AI CSV into your spreadsheet template that includes mapping and Units columns.
      4. Normalize: convert all numbers to USD millions, apply synonym mapping, add Provenance and Confidence tags.
      5. Validate: spot-check Revenue, Net Income and Cash vs PDF. If discrepancy >2%, flag for manual review.
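
      If you want step 5 to be more than eyeballing, a small script can compute the discrepancy and apply the 2% gate for you. A minimal sketch; the manual_checks.csv layout (CompanyName, PeriodEnd, Field, CheckedValue) is an assumption for illustration.

import pandas as pd

extracted = pd.read_csv("kpis_normalized.csv", na_values=["NA"])
manual = pd.read_csv("manual_checks.csv")    # your hand-checked figures

merged = manual.merge(extracted, on=["CompanyName", "PeriodEnd"])
# Pull the extracted value for whichever field was checked (Revenue, NetIncome, ...).
merged["ExtractedValue"] = merged.apply(lambda row: row[row["Field"]], axis=1)
merged["PctDiff"] = (merged["ExtractedValue"] - merged["CheckedValue"]).abs() / merged["CheckedValue"].abs()
merged["NeedsReview"] = merged["PctDiff"] > 0.02     # flag anything off by more than 2%

print(merged.loc[merged["NeedsReview"],
                 ["CompanyName", "PeriodEnd", "Field", "ExtractedValue", "CheckedValue", "PctDiff"]])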

      Copy-paste AI prompt (use as-is)

      Extract these items from the pasted filing text and tables: CompanyName, Ticker, FilingDate, PeriodEnd, Revenue, CostOfRevenue, GrossProfit, OperatingIncome, NetIncome, BasicEPS, DilutedEPS, TotalAssets, TotalLiabilities, CashAndCashEquivalents. Convert values to USD millions if the filing shows thousands or millions; if units are ambiguous, leave Units blank and set Confidence to LOW. Output as CSV with headers: CompanyName,Ticker,FilingDate,PeriodEnd,Revenue,CostOfRevenue,GrossProfit,OperatingIncome,NetIncome,BasicEPS,DilutedEPS,TotalAssets,TotalLiabilities,CashAndCashEquivalents,Units,Confidence,Provenance. If an item is missing, write NA. For Provenance, include page number or nearby line text.

      Worked example (expected single-row CSV)

      Acme Inc,ACME,2024-02-28,2023-12-31,5000,3000,2000,500,200,1.95,1.90,10000,6000,1200,USD millions,HIGH,"PDF p.45: Consolidated Statements of Operations"

      Metrics to track

      • Extraction accuracy rate (spot-checks passed / total checked).
      • Average validation time per filing.
      • % of rows with LOW confidence.
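
      Once your validation log has a little history, those three metrics take a handful of lines to compute. A minimal sketch, assuming a spot_checks.csv log (FilingId, Passed as TRUE/FALSE, Minutes) and the normalized KPI sheet with its Confidence column; both layouts are illustrative.

import pandas as pd

checks = pd.read_csv("spot_checks.csv")          # one row per spot-check
rows = pd.read_csv("kpis_normalized.csv")        # extracted rows with Confidence

accuracy_rate = checks["Passed"].mean()                            # passed / total checked
avg_minutes = checks.groupby("FilingId")["Minutes"].sum().mean()   # per filing
low_share = (rows["Confidence"] == "LOW").mean()

print(f"Extraction accuracy rate: {accuracy_rate:.0%}")
print(f"Avg validation time:      {avg_minutes:.1f} min per filing")
print(f"LOW-confidence rows:      {low_share:.0%}")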

      Common mistakes & fixes

      • Wrong units — fix: read header, bulk-convert, record Units.
      • Split rows across pages — fix: tell AI to merge consecutive lines for same period; manually validate.
      • Synonym mismatch — fix: expand the dictionary and re-run mapping formula in sheet.

      1-week action plan

      1. Day 1: Pick 3 competitors, download PDFs, run the AI prompt and import CSVs.
      2. Day 2–3: Normalize units, apply mapping, add Provenance and Confidence flags; spot-check 2–3 numbers per filing.
      3. Day 4–5: Build a dashboard sheet ranking growth, margins and ROA; review outliers.
      4. Day 6–7: Tweak synonym dictionary, measure accuracy rate, and document any recurring issues.

      In short: automate extraction, require Units and Provenance columns, and treat Confidence as a gate to manual review. That converts AI speed into reliable KPIs you can act on.

      Your move.

    • #128648
      aaron
      Participant

      Strong foundation — the Units, Provenance and Confidence gates are the right controls. I’ll add a schema-first approach, built-in cross-checks, and calendarization so your outputs are immediately comparable across peers and periods.

      Do / Do not (make comparisons clean)

      • Do define a fixed schema before extracting (fields, types, period labels).
      • Do run arithmetic cross-checks (e.g., GrossProfit = Revenue − CostOfRevenue) and record pass/fail.
      • Do extract multiple periods per filing (last 4 quarters + last 3 fiscal years) to enable trendlines.
      • Do calendarize (map fiscal quarters to calendar quarters) for true peer comparability.
      • Do not convert currencies unless the filing provides explicit rates; record Currency and keep native.
      • Do not ignore non‑GAAP reconciliations; capture them separately to avoid mixing bases.

      What you’ll need

      • Filings (10‑K/10‑Q) as text or OCR’d PDF.
      • AI chat that can follow structured instructions.
      • Spreadsheet template with: Schema headers, Units, Provenance, Confidence, Checks, and a small synonym map.
      • Optional: a simple calendarization table (FiscalYearEndMonth and QuarterOffset).

      Step-by-step (schema-first, automation-friendly)

      1. Set the schema: CompanyName, Ticker, FilingDate, PeriodType (FY/Q), PeriodEnd, FiscalYear, FiscalQuarter, Currency, Units, Revenue, CostOfRevenue, GrossProfit, OperatingIncome, NetIncome, DilutedShares, DilutedEPS, TotalAssets, TotalLiabilities, CashAndCashEquivalents, NonGAAP_OperatingIncome, NonGAAP_Adjustments, SegmentBreakout, GuidanceRevenueNextQuarter, Confidence, Provenance, Checks.
      2. Extract: Paste filing text into the AI with the prompt below. Expect 4–8 rows per filing (quarters + latest FY) when available.
      3. Import and normalize: Drop the CSV into your template. Force numeric columns, standardize Units (e.g., USD millions) and keep Currency as-is.
      4. Run cross-checks: In-sheet formulas confirm arithmetic (e.g., =Revenue-CostOfRevenue-GrossProfit should be 0). Flag anything with absolute variance > 1%.
      5. Calendarize: Add a column CalendarQuarter using a simple mapping based on FiscalYearEndMonth (e.g., if FYE = Jan, FQ1 → CQ1; if FYE = Jun, FQ1 → CQ3). This makes quarter-on-quarter peer comparisons meaningful.
      6. Review exceptions: Filter Confidence = LOW or Checks contains “FAIL.” Spot-check those rows vs PDF pages noted in Provenance.
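
      For steps 4 to 6, the same checks translate directly into a short script if you outgrow in-sheet formulas. A minimal sketch: the file name, the per-ticker fiscal year-end lookup and the QuarterOffset table are assumptions you would fill in from your own calendarization tab.

import pandas as pd

df = pd.read_csv("schema_extract.csv", na_values=["NA"])

# Step 4: arithmetic cross-checks with a 1% tolerance, recorded as PASS/FAIL.
gp_gap = (df["Revenue"] - df["CostOfRevenue"] - df["GrossProfit"]).abs()
df["GP_Check"] = (gp_gap <= 0.01 * df["GrossProfit"].abs()).map({True: "PASS", False: "FAIL"})
eps_gap = (df["NetIncome"] / df["DilutedShares"] - df["DilutedEPS"]).abs()
df["EPS_Check"] = (eps_gap <= 0.01 * df["DilutedEPS"].abs()).map({True: "PASS", False: "FAIL"})

# Step 5: calendarize via a FiscalYearEndMonth -> QuarterOffset table
# (FYE Dec or Jan: FQ1 -> CQ1; FYE Jun: FQ1 -> CQ3; extend as needed).
FYE_MONTH = {"ACME": 12}                 # per-ticker fiscal year-end month (illustrative)
QUARTER_OFFSET = {12: 0, 1: 0, 6: 2}

def calendar_quarter(row):
    if row["PeriodType"] != "Q" or pd.isna(row["FiscalQuarter"]):
        return pd.NA
    fq = int(str(row["FiscalQuarter"]).lstrip("Q"))
    offset = QUARTER_OFFSET.get(FYE_MONTH.get(row["Ticker"], 12), 0)
    return "CQ" + str((fq - 1 + offset) % 4 + 1)

df["CalendarQuarter"] = df.apply(calendar_quarter, axis=1)

# Step 6: the exception list to review against the PDF pages in Provenance.
exceptions = df[(df["Confidence"] == "LOW") | (df["GP_Check"] == "FAIL") | (df["EPS_Check"] == "FAIL")]
print(exceptions[["CompanyName", "PeriodEnd", "GP_Check", "EPS_Check", "Confidence", "Provenance"]])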

      Copy-paste AI prompt (robust; multi-period; audit-friendly)

      You are extracting KPIs from a 10-K/10-Q text dump. Return a CSV with one row per period found (up to last 4 quarters and last 3 fiscal years). Use these headers exactly: CompanyName,Ticker,FilingDate,PeriodType,PeriodEnd,FiscalYear,FiscalQuarter,Currency,Units,Revenue,CostOfRevenue,GrossProfit,OperatingIncome,NetIncome,DilutedShares,DilutedEPS,TotalAssets,TotalLiabilities,CashAndCashEquivalents,NonGAAP_OperatingIncome,NonGAAP_Adjustments,SegmentBreakout,GuidanceRevenueNextQuarter,Confidence,Provenance,Checks. Rules: (1) Map synonyms (e.g., Net sales→Revenue; Income from operations→OperatingIncome). (2) Normalize units within the row (thousands/millions) and state Units (e.g., USD millions). Do not convert currency; record Currency. (3) If tables split across pages, merge. (4) Record Provenance with page number or nearby heading. (5) Compute checks: GP=Revenue−CostOfRevenue; EPS≈NetIncome/DilutedShares (if shares present); flag FAIL if difference >1%. Summarize checks in Checks field (e.g., “GP PASS; EPS FAIL”). (6) If any item missing, write NA. (7) For NonGAAP_OperatingIncome and NonGAAP_Adjustments, use the reconciliation section if available. (8) SegmentBreakout: list as “SegmentName:Value” pairs separated by semicolons. (9) Confidence=LOW if units/currency ambiguous or checks fail; else HIGH.

      Worked example (expected output snippet)

      Acme Inc,ACME,2024-02-28,FY,2023-12-31,2023,NA,USD,USD millions,5000,3000,2000,500,200,103,1.94,10000,6000,1200,450,"Stock-based comp add-back",Consumer:3200;Enterprise:1800,NA,HIGH,"p.45 Consolidated Statements","GP PASS; EPS PASS"
      Acme Inc,ACME,2024-02-28,Q,2024-03-31,2024,Q1,USD,USD millions,1300,780,520,120,55,104,0.53,10150,6050,1180,110,"SBP add-back",Consumer:800;Enterprise:500,1350,HIGH,"p.16 Condensed Ops; p.70 Reconciliation","GP PASS; EPS PASS"

      What to expect

      • First pass yields high accuracy on headline items; expect 5–10 minutes to clear LOW-confidence exceptions per filing.
      • Calendarization makes ranking by quarter and rolling 4Q trivial; add a TTM column if you track trends.
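
      If you do add a TTM column, it is one rolling sum per company over the calendarized quarterly rows. A minimal sketch on the same assumed schema_extract.csv:

import pandas as pd

q = pd.read_csv("schema_extract.csv", na_values=["NA"])
q = q[q["PeriodType"] == "Q"].sort_values(["Ticker", "PeriodEnd"])

# Trailing-twelve-month totals: sum of the last four quarters per ticker.
for col in ["Revenue", "OperatingIncome", "NetIncome"]:
    q[f"TTM_{col}"] = (
        q.groupby("Ticker")[col]
         .rolling(window=4, min_periods=4)
         .sum()
         .reset_index(level=0, drop=True)
    )

print(q[["Ticker", "PeriodEnd", "TTM_Revenue", "TTM_NetIncome"]].tail())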

      Metrics to track (weekly)

      • Cross-check pass rate: GP/EPS checks passed / total rows.
      • Exception rate: % rows with Confidence = LOW.
      • Time-to-ingest: minutes from paste to validated rows.
      • Coverage: average number of periods extracted per filing.

      Common mistakes & fixes

      • Share count mismatches (basic vs diluted) — fix: always use diluted for EPS checks; extract DilutedShares explicitly.
      • Non‑GAAP blended with GAAP — fix: separate fields; never overwrite GAAP OperatingIncome.
      • Foreign currency — fix: store Currency; if management gives guidance in a different currency, capture it as-is and note in Provenance.
      • Fiscal vs calendar drift — fix: use the FYE month to map FQ→CQ before comparing peers.

      1-week action plan

      1. Day 1: Set up schema columns and cross-check formulas in your template. Add a fiscal calendar mapping tab.
      2. Day 2: Run the prompt on 3 companies; import; normalize Units/Currency; resolve LOW-confidence rows.
      3. Day 3: Add segment and non‑GAAP fields for the same 3 companies; verify Provenance pages.
      4. Day 4: Expand to 5 more companies; enable CalendarQuarter and TTM formulas.
      5. Day 5: Build a comparison view ranking CQ growth, Gross/Operating margin, and TTM EPS.
      6. Day 6: Review exception patterns; update synonym map; tighten checks threshold if noise is low.
      7. Day 7: Freeze a baseline, measure your metrics, and decide which parts to batch next week.

      Outcome: multi-period, audit-ready CSVs with built-in checks and calendarization. That’s how you move from quick extraction to decision-grade peer comparisons.
