CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks

Rasmus Berg Palm, Ole Winther, Florian Laws

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user-provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve average F1 scores of 0.891 and 0.887, respectively, on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with an average F1 of 0.840 compared to 0.788.
Original language: English
Title of host publication: Proceedings of 2017 14th IAPR International Conference on Document Analysis and Recognition
Publisher: IEEE
Publication date: 2017
Pages: 406-413
ISBN (Print): 9781538635858
DOIs
Publication status: Published - 2017
Event: 2017 14th IAPR International Conference on Document Analysis and Recognition
- Kyoto Terrsa, Kyoto, Japan
Duration: 13 Nov 2017 to 15 Nov 2017

Conference

Conference: 2017 14th IAPR International Conference on Document Analysis and Recognition
Location: Kyoto Terrsa
Country: Japan
City: Kyoto
Period: 13/11/2017 to 15/11/2017
Series: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
ISSN: 2379-2140
