Abstract
Document information extraction tasks performed by humans create data consisting of a PDF or document-image input and extracted string outputs. This end-to-end data is naturally produced and consumed in the course of the task, because it is valuable in and of itself, and is therefore available at no additional cost. Unfortunately, state-of-the-art word-classification methods for information extraction cannot use this data; they instead require word-level labels, which are expensive to create and consequently unavailable for many real-life tasks. In this paper we propose the Attend, Copy, Parse architecture, a deep neural network model that can be trained directly on end-to-end data, bypassing the need for word-level labels. We evaluate the proposed architecture on a large, diverse set of invoices and outperform a state-of-the-art production system based on word classification. We believe the proposed architecture can be used for many real-life information extraction tasks where word classification cannot be applied due to the lack of the required word-level labels.
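The architecture's name suggests a three-stage pipeline: attend over candidate words in the document, copy the selected word, and parse it into the normalized output string, with supervision coming only from the end-to-end target string. The toy sketch below illustrates that principle only; the function names, the character-overlap scoring, and the hard argmax (standing in for the soft, differentiable copy the paper's end-to-end training would need) are all illustrative assumptions, not the authors' implementation.

```python
def attend(words, query):
    # Toy attention: score each candidate word by how many of its
    # characters also appear in the query string.
    return [sum(c in query for c in w) for w in words]

def copy(words, scores):
    # Copy the highest-scoring word (a hard argmax stands in for the
    # soft copy a differentiable end-to-end model would use).
    return words[max(range(len(words)), key=scores.__getitem__)]

def parse(word):
    # Normalize the copied word into the target output format,
    # e.g. strip the currency symbol and thousands separator.
    return word.replace("$", "").replace(",", "")

words = ["Invoice", "Total:", "$1,200.00", "Date:"]
target = "1200.00"  # end-to-end label: only the extracted string, no word labels
scores = attend(words, target)
print(parse(copy(words, scores)))  # → 1200.00
```

The key point the abstract makes is visible here: the supervision signal (`target`) refers only to the final extracted string, never to which word in the document should be selected.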
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2019 International Conference on Document Analysis and Recognition |
| Publisher | IEEE |
| Publication date | 2019 |
| Pages | 329-336 |
| Article number | 8977951 |
| ISBN (Print) | 9781728128610 |
| DOIs | |
| Publication status | Published - 2019 |
| Event | 15th IAPR International Conference on Document Analysis and Recognition, International Convention Centre, Sydney, Australia. Duration: 20 Sept 2019 → 25 Sept 2019. Conference number: 15 |
Conference
| Conference | 15th IAPR International Conference on Document Analysis and Recognition |
|---|---|
| Number | 15 |
| Location | International Convention Centre |
| Country/Territory | Australia |
| City | Sydney |
| Period | 20/09/2019 → 25/09/2019 |