DOCUMENT LAYOUT ANALYSIS

Prof. Ann Dooms and Ph.D. student Tan Lu of the Digital Mathematics Research Group (DIMA) won first prize in the document layout analysis contest at the ICDAR conference with their DSPH algorithm.

There were 9 academic and 3 industrial participants, from France, India, China, the Czech Republic, and Vietnam. The DIMA method, Document Segmentation using Probabilistic Homogeneity (DSPH), outperformed the other algorithms. The 3 industrial contributions were the ABBYY FineReader engines (FRE 11 and FRE 12) and Google Tesseract 4.

Layout analysis is a key step in many automatic document processing applications, such as analyzing invoices or payslips, or labeling customer letters by topic. Many companies still handle a significant number of scanned documents every day, so using artificial intelligence techniques to analyze these documents efficiently can save considerable time and resources.

DSPH exploits characteristics of human perception to discriminate between text and non-text homogeneity. We define text homogeneity mathematically, which allows us to calculate the likelihood that a component is text using a comprehensive statistical model.

Please contact us if you are interested in licensing this technology. We are more than happy to give further details about the DSPH algorithm in a meeting.

Contact: Leander Schietgat, Business Developer Artificial Intelligence Lab

E-mail: leander.schietgat@vub.be

More information about the Digital Mathematics Research Group

More information about DSPH
