Accuracy methodology
How we measure Bank2XL's extraction quality, and the real numbers.
Short version: on well-structured bank statements (clear scans or text-layer PDFs from common retail banks) Bank2XL reconciles cleanly 98%+ of the time. Edge cases like degraded scans, mid-page rotation, or unusual multi-currency layouts come back with a warn or mismatch badge so you can review the data before trusting it. The reconciliation check is designed to surface mis-extractions. We can't guarantee it catches every one, but we always show the deltas instead of hiding them.
Why we publish a methodology page
Most bank statement converters claim one big number ("99% accuracy", "field-leading precision") with no methodology behind it. We'd rather show the work: the corpus we test on, the verdict breakdown, the failure modes we know about, and how the reconciliation badge on every result tells you which bucket your statement falls into.
The test corpus
Our corpus is intentionally diverse. It is also small: 61 statements. That's worth saying out loud. We're growing it, and we'll update this page when we re-run.
| Dimension | Coverage |
| --- | --- |
| Total documents | 61 bank statements |
| Unique banks | 30+ (Chase, Bank of America, Wells Fargo, Capital One, Citi, KeyBank, M&T, RBC, TD, CIBC, HSBC, ANZ, Commonwealth Bank, BNP Paribas, Deutsche Bank, plus regional credit unions and court-record extracts) |
| Countries | US, UK, Canada, Australia, France, Germany, South Africa, Brazil, New Zealand |
| Languages | English, French, German, Portuguese, Russian (small samples) |
| Document types | Personal checking / savings, credit-card statements, court-record statement extracts, business statements, summary-only PDFs |
| Format types | Text-layer PDFs and scanned (image-only) PDFs in a roughly 70/30 ratio |
Verdict breakdown on the corpus
| Verdict | Meaning | Count | Share |
| --- | --- | --- | --- |
| reconciled | Transactions extracted; opening balance plus net transactions equals closing balance within 0.5% | 30 | 49% |
| summary doc | Statement contained no transaction table to verify (e.g., cover page or balance summary only) | 18 | 30% |
| PDF inconsistency | Our extraction matched the PDF's own Activity Summary, but the PDF's reported totals disagreed with the transaction list itself | 2 | 3% |
| badge: review | Our extraction did not match the reported balance; the result page shows a warn/mismatch badge so the user knows to verify | 11 | 18% |
The 98%+ claim is for well-structured statements: text-layer PDFs or clean scans from common retail banks with a single account and a standard activity table. Among documents that fit that profile, reconciliation is near-perfect. The 11 documents in the bottom row are the harder cases (degraded multi-column scans, mid-page rotation, footer ambiguity, mixed decimal separators). They don't fail silently. The result page shows a colored badge so you know to review before trusting the data.
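The reconciled verdict rests on one balance identity: opening balance plus the signed transactions should equal the closing balance. Here is a minimal Python sketch of such a check. It is not Bank2XL's actual code. The function name `reconciles` is hypothetical, and measuring the 0.5% tolerance relative to the closing balance is an assumption.

```python
from decimal import Decimal

def reconciles(opening, closing, amounts, tolerance=Decimal("0.005")):
    """Return (ok, delta) for the identity opening + sum(amounts) == closing.

    amounts are signed: credits positive, debits negative.
    """
    expected = opening + sum(amounts, Decimal("0"))
    delta = expected - closing
    # Assumption: the 0.5% tolerance is taken relative to the closing balance.
    allowed = abs(closing) * tolerance
    return abs(delta) <= allowed, delta

# A statement whose rows were all extracted reconciles with zero delta.
ok, delta = reconciles(
    Decimal("1000.00"),
    Decimal("1180.50"),
    [Decimal("250.00"), Decimal("-69.50")],
)

# Dropping the debit row leaves a delta well outside the tolerance.
missed_ok, missed_delta = reconciles(
    Decimal("1000.00"),
    Decimal("1180.50"),
    [Decimal("250.00")],
)
```

Returning the delta alongside the pass/fail flag matches the stated goal of showing the deltas instead of hiding them.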
Different denominators give different headline numbers. We think all three are worth listing:
- 98%+ on well-structured statements with verifiable balances and standard layouts. This is the segment most users actually upload.
- 82% across the entire 61-doc corpus (50 of 61 documents came back without a review badge), including hard cases, summary-only documents, and intentionally adversarial samples.
- 70% if you only count the 43 documents with a verifiable transaction table and require strict reconciliation as the definition of success (30 of 43).
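The 82% and 70% figures can be reproduced from the verdict table with a few lines of arithmetic. The sketch below shows one reading of the denominators that matches the published numbers. The dictionary keys are illustrative names, not product identifiers. The 98%+ figure cannot be derived from the table alone, since the table does not say which documents count as well-structured.

```python
# Counts taken from the verdict breakdown table.
counts = {
    "reconciled": 30,
    "summary_doc": 18,
    "pdf_inconsistency": 2,
    "review": 11,
}

total = sum(counts.values())

# Whole corpus: every document that did not end up with a review badge.
no_review_badge = total - counts["review"]
whole_corpus_rate = no_review_badge / total

# Strict: only documents with a verifiable transaction table,
# and only a reconciled verdict counts as success.
verifiable = total - counts["summary_doc"]
strict_rate = counts["reconciled"] / verifiable

print(f"whole corpus: {no_review_badge}/{total} = {whole_corpus_rate:.0%}")
print(f"strict:       {counts['reconciled']}/{verifiable} = {strict_rate:.0%}")
```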
What the failures look like
Across the 11 documents that did not reconcile, the common failure modes are:
- Multi-column tables on scanned PDFs. When the OCR has to guess at which column a value belongs to, debit / credit polarity can flip on isolated rows.
- Mid-page table rotation. A few US bank statements pivot the transaction table 90 degrees on certain pages. Our text-layer extractor handles this fine; the OCR path sometimes misses rows in the rotated region.
- Footers that look like transactions. "Total interest paid YTD: $42.18" on a final page can be ingested as a transaction if the layout heuristics misread it.
- Inconsistent decimal separators within the same PDF. A small number of European statements mix `1.234,56` and `1,234.56` even within one document; we generally pick one and stick to it.
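To illustrate the decimal-separator problem, here is a small Python sketch of a per-value heuristic: treat whichever separator appears last as the decimal mark. This is an alternative to fixing one convention per document and is not a description of Bank2XL's pipeline. The function name `parse_amount` is hypothetical.

```python
from decimal import Decimal

def parse_amount(text):
    """Parse '1.234,56' or '1,234.56' by treating the last separator as the decimal mark."""
    cleaned = text.strip().replace(" ", "")
    last_dot = cleaned.rfind(".")
    last_comma = cleaned.rfind(",")
    if last_comma > last_dot:
        # European style: dots group thousands, the comma marks decimals.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # US/UK style: commas group thousands, the dot marks decimals.
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)

european = parse_amount("1.234,56")
us_style = parse_amount("1,234.56")
```

Both calls yield the same `Decimal` value. A value with a single separator and three trailing digits, such as `1,234`, is genuinely ambiguous; this sketch reads it as 1.234, so a production pipeline would need extra context (currency, neighboring rows, or the reconciliation check itself) to decide.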
How you know which case you're in
Every Bank2XL Excel export includes a Validation sheet showing per-account reconciliation status. The result page badge shows the same status as a single color:
- reconciled — totals match. Usually safe to review or import, but you should still verify before tax, audit, legal, or high-stakes accounting use.
- no_balance / insufficient_data / incomplete_source — we couldn't verify; check the source.
- mismatch / tx_extraction_incomplete — our numbers don't match the PDF. Investigate.
This badge is the most important UI element in the product. The point of building reconciliation in is that you never have to trust output that hasn't been checked.
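If you post-process exports in your own scripts, the three status groups above can be collapsed into a simple triage level. The sketch below is one way to do that in Python. The status strings come from the list above, but the level names (`ok`, `warn`, `mismatch`) and the function `review_level` are assumptions made for illustration, not part of the product.

```python
# Status strings as listed above; level names are hypothetical.
REVIEW_LEVEL = {
    "reconciled": "ok",
    "no_balance": "warn",
    "insufficient_data": "warn",
    "incomplete_source": "warn",
    "mismatch": "mismatch",
    "tx_extraction_incomplete": "mismatch",
}

def review_level(status):
    """Map a validation status to a triage level.

    Unknown statuses fall back to the strictest level so that nothing
    unrecognized is treated as verified.
    """
    return REVIEW_LEVEL.get(status, "mismatch")

needs_review = [
    s for s in ("reconciled", "no_balance", "mismatch")
    if review_level(s) != "ok"
]
```

Defaulting unknown statuses to the strictest level keeps the script conservative if new status values appear later.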
Performance
- Mean processing time: 6.3 seconds per document on the test corpus.
- Range: 2 seconds (small text-layer PDFs) to 30 seconds (multi-page scanned statements via OCR path).
- Cost per conversion: < $0.01 in API calls for text-layer; ~$0.05 for OCR-heavy statements.
What we DON'T claim
- We don't claim 100% accuracy. Anyone who claims that on AI extraction is lying.
- We don't claim 98% on every PDF ever made. The 98%+ headline applies to well-structured statements with verifiable balances. On degraded scans, mixed-account statements with no per-account balance, or exotic layouts, expect to see a warn or mismatch badge.
- We don't claim coverage of every bank. We have specifically tested 30+ banks. Banks not in the corpus probably work (vision AI generalizes), but we cannot promise they do until we run them.
- We don't quietly suppress failures. Mismatched extractions still produce an Excel; the Validation sheet says `mismatch` so you know to review before trusting the numbers.
How we'll improve this page
This page will be re-published whenever we run a new corpus pass. Planned next:
- Grow the corpus from 61 to 200+ documents, with focus on under-represented banks (Citi, US Bank, regional credit unions, more European banks).
- Add a per-bank breakdown so you can see, e.g., "Chase: 92% reconciled across 24 statements".
- Publish a quarterly delta showing reconciliation rate over time as the model improves.
If you have a statement that didn't reconcile and you're willing to share it (after redacting), send it to support@bank2xl.app. We use shared statements to drive prompt and pipeline improvements. Reconciliation rate is the metric we optimize for.
Open about the limitations
Bank2XL is a small product built by a small team. We chose to publish honest numbers rather than marketing-friendly ones. The trade-off:
- If your statement is a common US, UK, Canadian, or Australian retail bank in a familiar layout, you'll almost certainly see reconciled.
- If your statement is from a less common bank, has unusual multi-currency complexity, or is a poor-quality phone-scan PDF, you may see a warn or mismatch badge. In that case you need to review the data manually.
- You always get a result with a validation verdict. We aim to never silently fail. The badge tells you whether the data reconciled, and it says so plainly when it did not.