Layout analysis is a key step in many automatic document processing applications, such as analyzing invoices or pay slips, or labeling customer letters with topics. Because many companies still handle a significant number of scanned documents every day, using artificial intelligence techniques to analyze these documents efficiently can save considerable time and resources.

DSPH (Document Segmentation using Probabilistic Homogeneity) exploits characteristics of human perception to discriminate between text and non-text homogeneity. We define this text homogeneity mathematically, which allows us to calculate the likelihood that a component is text using a comprehensive statistical model.
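As a rough intuition for what "calculating the likelihood of a component being text" can look like, the sketch below scores each component by how closely its height matches the typical height in its group. This is not the DSPH model, whose statistical details are not given here. The function name `text_likelihood`, the use of bounding-box heights as the only feature, and the `sigma` parameter are all illustrative assumptions.

```python
import math
from statistics import median

def text_likelihood(heights, sigma=0.25):
    """Toy score in (0, 1] per component (hypothetical, not the DSPH model).

    A component whose height is close to the group's median height is
    treated as "homogeneous" and scores near 1; outliers score near 0.
    """
    ref = median(heights)
    scores = []
    for h in heights:
        # The log-ratio treats "twice as tall" and "half as tall" symmetrically.
        z = math.log(h / ref) / sigma
        scores.append(math.exp(-0.5 * z * z))
    return scores

# Four glyph-sized boxes and one large, figure-sized box (made-up pixel heights).
heights = [12, 11, 13, 12, 90]
scores = text_likelihood(heights)
for h, s in zip(heights, scores):
    print(f"height={h:3d}  score={s:.3f}")
```

In this toy setup the glyph-sized components score close to 1 while the large component scores close to 0. A real model would combine many more cues than height alone.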

On September 29th, 2019, Prof. Ann Dooms and PhD student Tan Lu from the Digital Mathematics Research Group (DIMA) won first prize in the document layout analysis contest at the ICDAR conference with the DSPH algorithm.

There were 9 academic and 3 industrial participants from France, India, China, the Czech Republic and Vietnam. The DIMA team's method, Document Segmentation using Probabilistic Homogeneity (DSPH), outperformed the other algorithms. The 3 industrial contributions were the ABBYY FineReader engines (FRE 11 and FRE 12) and Google Tesseract 4.


Please contact us if you are interested in licensing this technology. We are more than happy to give further details about the DSPH algorithm in a meeting.

Contact: Leander Schietgat, business developer, Artificial Intelligence Lab
E-mail: leander.schietgat@vub.be