
digi.costpocket.com precision, calculations and validation

Introduction

The CostPocket Automatic Expense Document Digitisation System (referred to here as the robot) is a complex, fast and precise system for extracting the data needed for accounting from accounting documents (invoices, receipts, waybills).

Our robot processes about 700k documents per month (as of 04.2025) and is used by many software products:

  • those owned by us (CostPocket OÜ): CostPocket app, CostPocket cloud, CostPocket DIGI, Outvoicer
  • accounting software through CostPocket DIGI
  • other companies' internal accounting systems through CostPocket DIGI

We handle documents from 77 countries in 52 languages across Europe, Asia, Oceania, the Americas, the Arab countries and Africa, and are constantly adding more.

Fields

The robot can detect the following fields:

  • Issue date
  • Due date
  • Document number
  • Currency
  • Subtotal
  • Total VAT
  • Grand total
  • Bank accounts
  • Reference number
  • Last 4 digits of the debit/credit card
  • Electricity consumption
  • Origin country code
  • VAT rows (one row for each VAT rate)
  • Item lines (one line for each product): description, product code, unit, quantity, net sum, gross sum, net price, gross price, sum without discount, isDiscount, type (product/service), order number
  • Rounding (rounding of item lines)
  • Document type (invoice or receipt)
  • Document direction (debit or credit)
  • Order number
  • Fuel rows (for example, on gas station receipts)

Finding methods

We developed the first prototype 10 years ago and have been improving it continuously, keeping up with modern technological possibilities and staying ahead of competitors.

The robot comprises several subsystems:

  • Most important: our own advanced algorithms
  • OCR (text recognition)
  • AI systems
  • Our own machine learning system
  • Country-specific property and language collections
  • Company data registries

Data digitisation

Digitisation is based on our own algorithms (code), which have been tailored over the years, by analysing tens of thousands of expense documents, to handle both common patterns and exceptions. The data they find is complemented by the other subsystems mentioned above and finally cleaned and validated with many conditions and checks.

When uncertainty is high, the robot leaves a value blank rather than guessing.
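
The "leave it blank rather than guess" rule can be pictured with a minimal Python sketch. The function name `pick_value`, the confidence scores and the 0.8 cut-off are all hypothetical and serve only as an illustration; how the robot actually measures uncertainty is not described here.

```python
from typing import Optional

# Hypothetical cut-off; the real system's criteria are not described in this document.
MIN_CONFIDENCE = 0.8


def pick_value(candidates: list[tuple[str, float]]) -> Optional[str]:
    """Return the best candidate, or None when even the best one is too uncertain.

    candidates: (value, confidence) pairs, with confidence between 0 and 1.
    """
    if not candidates:
        return None  # nothing found: leave the field blank
    value, confidence = max(candidates, key=lambda c: c[1])
    # A blank (None) is preferred over a low-confidence guess.
    return value if confidence >= MIN_CONFIDENCE else None
```

With these assumed numbers, a due date found with confidence 0.95 would be returned, while a document number found with confidence 0.5 would be left blank.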

Numerical values (totals, VAT rows, item lines) are validated strictly. A multi-step process ensures that the numeric results match: finding the grand total -> finding VAT rows -> finding discounts -> finding item lines. The results of these steps are compared with and depend on each other; they are not parsed individually.
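
The way the numeric fields depend on each other can be illustrated with a minimal Python sketch. The function `totals_consistent`, its parameter names, the 0.01 tolerance and the exact equations (for example, that item lines plus rounding add up to the subtotal) are assumptions made for illustration; they are not the robot's actual code or output format.

```python
from decimal import Decimal

# Hypothetical tolerance of one minor currency unit (an assumption for this sketch).
TOLERANCE = Decimal("0.01")


def close(a: Decimal, b: Decimal) -> bool:
    """Return True when two amounts differ by no more than the tolerance."""
    return abs(a - b) <= TOLERANCE


def totals_consistent(subtotal, total_vat, grand_total,
                      vat_rows, item_net_sums, rounding=Decimal("0")):
    """Check that separately found amounts agree with each other.

    vat_rows: list of (net_amount, vat_amount) pairs, one per VAT rate.
    item_net_sums: net sum of each item line (discount lines are negative).
    """
    zero = Decimal("0")
    checks = [
        # The grand total must equal subtotal plus total VAT.
        close(subtotal + total_vat, grand_total),
        # VAT rows must add up to the document-level subtotal and VAT.
        close(sum((net for net, _ in vat_rows), zero), subtotal),
        close(sum((vat for _, vat in vat_rows), zero), total_vat),
        # Item lines (including discounts) plus rounding must give the subtotal.
        close(sum(item_net_sums, zero) + rounding, subtotal),
    ]
    return all(checks)
```

For example, a subtotal of 100.00, VAT of 22.00 and grand total of 122.00, with one VAT row (100.00, 22.00) and item lines 60.00, 50.00 and a discount of -10.00, passes every check. If the grand total were read as 125.00 instead, the first check would fail and the values would need to be re-examined together.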

Data validation

All fields are validated with the relevant subsystems (AI, machine learning, registries). Some fields have specific conditions. Please see them here: https://costpocket.com/en/digi-tutorial-format

Data format

The JSON output format is strict and does not change without communication from the CostPocket team.

Please view details about the format here: https://costpocket.com/en/digi-tutorial-format

Data precision

Please see the digitisation precision overview here: https://costpocket.com/en/learn/robot-digitisation-precision

Example

Please find an example JSON here: https://costpocket.com/en/digi-tutorial-format