
How does CostPocket robot digitization work?

Extracting data reliably, accurately, and quickly from submitted PDF documents and pictures takes considerable effort and specialized tools. At CostPocket, we’ve developed a robot that combines several technologies, including optical character recognition (OCR), machine learning, algorithms, company databases, language-specific rules and templates, and AI, to process hundreds of thousands of documents every month.

While the CostPocket application does much more than just data digitization, if your business only needs data extraction, you can integrate our DIGI product into your systems. Learn more at digi.costpocket.com.

The digitization process with CostPocket typically takes 2-3 seconds (not including document upload time, which depends on your device and internet connection) and follows these steps:

1. Pre-formatting input. We support multiple file types, and the input file is preprocessed to prepare it for digitization. This includes cleaning up images, standardizing file formats, and enhancing quality.
2. Optical Character Recognition (OCR). All text, along with its position on the page, is extracted from the image.
3. Parsing OCR output. Our in-house model analyzes the extracted text and identifies recognizable data structures.
4. Identifying document origin & language. Because data structures vary by country, the robot recognizes the document’s origin and language, then applies the country-specific rules it has learned over years of processing international cases.
5. Extracting accounting data. The robot combines global data patterns and AI-generated, format-specific rules to accurately identify the relevant accounting fields.
6. Data validation. Some data fields (company info, registration and VAT codes, VAT rates, etc.) are cross-checked against public databases for accuracy.
7. Returning results. The submitter reviews and confirms the digitized data before it is sent to the chosen accounting software.
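
To make steps 2 and 3 more concrete, here is a minimal Python sketch of what "text with its position" and "recognizable data structures" could look like. It is not CostPocket's in-house model: the Token class, the pixel coordinates, and the two regular expressions are hypothetical stand-ins, and the sample tokens simply reuse values from the example in this article.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """One piece of OCR output: the text plus where it sat on the page."""
    text: str
    x: int  # left edge, in pixels (hypothetical unit)
    y: int  # top edge, in pixels (hypothetical unit)


# Toy patterns, assumed for illustration only.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # e.g. 2020-08-23
AMOUNT_RE = re.compile(r"^\d+\.\d{2}$")        # e.g. 38.08


def parse_tokens(tokens):
    """Group OCR tokens into candidate dates and amounts, top to bottom."""
    ordered = sorted(tokens, key=lambda t: (t.y, t.x))
    dates = [t.text for t in ordered if DATE_RE.match(t.text)]
    amounts = [float(t.text) for t in ordered if AMOUNT_RE.match(t.text)]
    return {"dates": dates, "amounts": amounts}


# Sample OCR output; positions are made up for the sketch.
tokens = [
    Token("Circle", 10, 5),
    Token("K", 60, 5),
    Token("2020-08-23", 10, 40),
    Token("38.08", 120, 200),
    Token("6.61", 120, 220),
]

result = parse_tokens(tokens)
print(result)
```

A real parser has to cope with many date and number formats, which is where the country- and language-specific rules from step 4 come in.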

Our algorithms for data recognition (step 3) are constantly improving. Every week, we update the CostPocket robot with human-verified data so it can learn from past errors and enhance its accuracy over time.

Example of Input and Results

Input: [image of the submitted document]

After submission, the CostPocket robot digitizes and returns the following data:
• Issue date: 2020-08-23
• Total amount: 38.08
• VAT: 6.61
• Document ID: 1434421
• Currency: EUR
• Supplier
   ○ Name: Circle K Latvia SIA
   ○ Address: Rīga, Duntes iela 6
   ○ Postal Code: LV-1013
   ○ Registration code: 40003064094
   ○ VAT code: LV40003064094
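
If you consume data like this in your own system, it can help to hold it in a typed structure and run a few local sanity checks before posting it to accounting software. The Python sketch below is one possible shape, not the actual DIGI response format: the class names, field names, and both checks are assumptions made for illustration. The second check relies on the fact that, in this particular example, the VAT code is the registration code prefixed with "LV".

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Supplier:
    name: str
    address: str
    postal_code: str
    registration_code: str
    vat_code: str


@dataclass
class DigitizedDocument:
    issue_date: str
    total_amount: Decimal
    vat: Decimal
    document_id: str
    currency: str
    supplier: Supplier


def implied_vat_rate(doc):
    """VAT divided by the net amount, rounded to two decimal places."""
    net = doc.total_amount - doc.vat
    return (doc.vat / net).quantize(Decimal("0.01"))


def vat_code_matches_registration(supplier, country_prefix="LV"):
    """Hypothetical check: VAT code equals country prefix + registration code."""
    return supplier.vat_code == country_prefix + supplier.registration_code


# Values copied from the example result above.
doc = DigitizedDocument(
    issue_date="2020-08-23",
    total_amount=Decimal("38.08"),
    vat=Decimal("6.61"),
    document_id="1434421",
    currency="EUR",
    supplier=Supplier(
        name="Circle K Latvia SIA",
        address="Rīga, Duntes iela 6",
        postal_code="LV-1013",
        registration_code="40003064094",
        vat_code="LV40003064094",
    ),
)

print(implied_vat_rate(doc))
print(vat_code_matches_registration(doc.supplier))
```

Amounts are kept as Decimal rather than float so that subtraction and rounding behave the way they do in bookkeeping. Checks like these complement, and do not replace, the database cross-checks described in step 6.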