How does CostPocket robot digitisation work?

Digitisation with CostPocket takes 2-3 seconds on average (excluding document upload time, which depends on the user's device and connection) and consists of the following steps:

1. Pre-formatting input - multiple file types are supported, so the input file is first preprocessed to make it ready for digitisation
2. Optical Character Recognition (OCR) - all text with metadata (positioning) is extracted from the image
3. Parsing OCR output - CostPocket's in-house model analyses the OCR output and marks the recognisable data structures
4. Identifying document origin & language - data structures often follow different patterns depending on the document's country of origin. Over the years our robot has learned many country-specific cases and exceptions
5. Extracting accounting data - the CostPocket robot identifies data fields by combining global data structure patterns with sets of AI-generated rules specific to the document format and origin
6. Data validation - we check some of the data fields (company info, registration and VAT codes, VAT rate, etc.) against public databases
7. Returning results - the submitter confirms the digitised data, and the data is sent to the selected accounting software

Note that the sets of algorithms for data recognition in step #3 are constantly evolving. Every week we feed the CostPocket robot human-verified data so that it can learn from its mistakes and improve its recognition in the future.
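
To give a feel for how positioned OCR text (step 2) and language-specific labels (step 4) can be combined to find a field (step 5), here is a minimal sketch in Python. It is not CostPocket's actual model: the Token class, the LABELS table, the find_value function and the token coordinates are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str
    x: float  # left edge of the token's bounding box
    y: float  # vertical centre of the token's bounding box


# Hypothetical label aliases per document language (illustrative only).
LABELS = {
    "en": {"total": {"total"}, "vat": {"vat"}},
    "lv": {"total": {"kopā"}, "vat": {"pvn"}},
}


def find_value(tokens: list[Token], field: str, language: str,
               y_tolerance: float = 5.0) -> Optional[str]:
    """Return the text of the nearest token to the right of the field's label."""
    aliases = LABELS[language][field]
    # Locate the label token, ignoring case and a trailing colon.
    anchor = next(
        (t for t in tokens if t.text.lower().rstrip(":") in aliases), None
    )
    if anchor is None:
        return None
    # Candidates sit on roughly the same line, to the right of the label.
    same_line = [
        t for t in tokens
        if abs(t.y - anchor.y) <= y_tolerance and t.x > anchor.x
    ]
    if not same_line:
        return None
    return min(same_line, key=lambda t: t.x - anchor.x).text


# Made-up OCR tokens; the layout is assumed, not taken from a real receipt.
tokens = [
    Token("Kopā:", x=10, y=100),
    Token("38.08", x=120, y=101),
    Token("EUR", x=170, y=100),
    Token("PVN:", x=10, y=120),
    Token("6.61", x=120, y=119),
]

total = find_value(tokens, "total", "lv")
vat = find_value(tokens, "vat", "lv")
```

The same tokens yield nothing when looked up with the "en" labels, which is one reason why identifying the document language before extraction matters.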

Example of Input and Results

Input

From the receipt above, the CostPocket robot digitises the following data:
• Issue date: 2020-08-23
• Total amount: 38.08
• VAT: 6.61
• Document ID: 1434421
• Currency: EUR
• Supplier
   ○ Name: Circle K Latvia SIA
   ○ Address: Rīga, Duntes iela 6
   ○ Postal Code: LV-1013
   ○ Registration code: 40003064094
   ○ VAT code: LV40003064094
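
As an illustration of how such a result could be represented and sanity-checked, the sketch below stores the fields above in a Python dictionary and runs two simple consistency checks. The key names and helper functions are hypothetical and do not describe CostPocket's actual output format or validation logic. The checks also rest on two assumptions: that the whole receipt is taxed at Latvia's standard VAT rate of 21%, and that the supplier's VAT code is its registration code with the country prefix.

```python
from decimal import Decimal, ROUND_HALF_UP

# Field values copied from the example above; key names are illustrative.
result = {
    "issue_date": "2020-08-23",
    "total_amount": "38.08",
    "vat": "6.61",
    "document_id": "1434421",
    "currency": "EUR",
    "supplier": {
        "name": "Circle K Latvia SIA",
        "address": "Rīga, Duntes iela 6",
        "postal_code": "LV-1013",
        "registration_code": "40003064094",
        "vat_code": "LV40003064094",
    },
}


def vat_from_gross(total: Decimal, rate: Decimal) -> Decimal:
    """VAT contained in a VAT-inclusive total, rounded to cents."""
    vat = total * rate / (1 + rate)
    return vat.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def vat_code_matches(supplier: dict, country: str = "LV") -> bool:
    """Check that the VAT code is the country prefix plus the registration code."""
    return supplier["vat_code"] == country + supplier["registration_code"]


# Assumption: the 21% standard rate applies to every item on the receipt.
rate = Decimal("0.21")
expected_vat = vat_from_gross(Decimal(result["total_amount"]), rate)

vat_is_consistent = expected_vat == Decimal(result["vat"])
code_is_consistent = vat_code_matches(result["supplier"])
```

For this receipt both checks pass: 38.08 × 0.21 / 1.21 rounds to 6.61, and the VAT code equals "LV" followed by the registration code. Amounts are kept as Decimal rather than float so that rounding to cents behaves predictably.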