The CostPocket Automatic Expense Document Digitisation System (let's call it the robot) is a sophisticated, fast and precise system for extracting the data needed for different accounting purposes from accounting documents (invoices, receipts, waybills).
Our robot processes about 700k documents per month (as of April 2025) and is used by many software products:
We handle documents from 77 countries in 52 languages, covering Europe, Asia, Oceania, the Americas, the Arab countries and Africa, and are constantly adding more.
The robot can detect the following fields:
Issue date, due date, document number, currency, subtotal, total VAT, grand total, bank accounts, reference number, last 4 digits of the debit/credit card, electricity consumption, origin country code, VAT rows (one row for each VAT rate), item lines (one line for each product: description, product code, unit, quantity, net sum, gross sum, net price, gross price, sum without discount, isDiscount, type (product/service), order number), rounding (rounding of item lines), document type (invoice or receipt), document direction (debit or credit), order number, fuel rows (for example, from a gas station).
We developed the first prototype 10 years ago and have been improving it continuously, keeping up with modern technological possibilities and staying ahead of competitors.
The robot comprises several different subsystems:
The core of the digitisation is our own algorithms (code), which have been tailored over the years to handle both common patterns and exceptions by analysing tens of thousands of expense documents. Their output is complemented by the other subsystems mentioned above and finally cleaned and validated with many conditions and checks.
In case of high uncertainty, the robot leaves values blank rather than guessing.
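The "blank rather than guess" rule can be illustrated with a minimal sketch. It is not the robot's actual code: the function name pick_value, the candidate format and the 0.9 threshold are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical cut-off; the real system's uncertainty criteria are not published here.
CONFIDENCE_THRESHOLD = 0.9


def pick_value(candidates: list[tuple[str, float]],
               threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the most confident candidate, or None (blank) if none is confident enough.

    `candidates` is an assumed list of (value, confidence) pairs, confidence in [0, 1].
    """
    if not candidates:
        return None
    value, confidence = max(candidates, key=lambda c: c[1])
    # Leave the field blank instead of returning a low-confidence guess.
    return value if confidence >= threshold else None
```

With this sketch, a field whose best candidate scores below the threshold comes back as None, which a caller could serialise as an empty value.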
Numerical values (totals, VAT rows, item lines) are validated strictly. A multi-step process ensures that the numeric results match: finding the grand total -> finding VAT rows -> finding discounts -> finding item lines. The results are compared and depend on each other; they are not parsed individually.
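The kind of cross-check described above might look roughly like the following sketch. It only illustrates the idea that totals, VAT rows, discounts and item lines must agree with each other. The function names, the input shapes, the one-cent tolerance and the convention that discount lines carry positive amounts are assumptions, not the robot's real implementation or output format.

```python
from decimal import Decimal

# Assumed tolerance for rounding differences between rows and totals.
TOLERANCE = Decimal("0.01")


def close(a: Decimal, b: Decimal, tol: Decimal = TOLERANCE) -> bool:
    """True if two amounts differ by no more than the tolerance."""
    return abs(a - b) <= tol


def totals_consistent(grand_total: Decimal,
                      vat_rows: list[tuple[Decimal, Decimal]],
                      item_lines: list[tuple[Decimal, bool]],
                      rounding: Decimal = Decimal("0")) -> bool:
    """Check that the grand total, VAT rows and item lines agree.

    vat_rows:   assumed (net_amount, vat_amount) pairs, one per VAT rate.
    item_lines: assumed (net_sum, is_discount) pairs; discounts are positive
                amounts that get subtracted.
    """
    net_from_vat = sum((net for net, _ in vat_rows), Decimal("0"))
    vat_total = sum((vat for _, vat in vat_rows), Decimal("0"))
    # Step 1: VAT rows (plus any rounding) must add up to the grand total.
    if not close(net_from_vat + vat_total + rounding, grand_total):
        return False
    # Step 2: item lines minus discounts must add up to the net amount.
    net_from_items = sum(
        (-amount if is_discount else amount for amount, is_discount in item_lines),
        Decimal("0"),
    )
    return close(net_from_items, net_from_vat)
```

For example, a grand total of 120.00 with one VAT row (100.00 net, 20.00 VAT), an item line of 110.00 and a discount of 10.00 passes both steps; changing the grand total to 125.00 fails the first step.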
All fields are validated with the relevant subsystems (AI, machine learning, registries). Some fields have specific conditions. Please see them here: https://costpocket.com/en/digi-tutorial-format
The JSON output format is strict and doesn't change without communication from the CostPocket team.
Please view details about the format here: https://costpocket.com/en/digi-tutorial-format
Please see the digitisation precision overview here: https://costpocket.com/en/learn/robot-digitisation-precision
Please find an example JSON here: https://costpocket.com/en/digi-tutorial-format