A machine handwriting recognition model has been created at the National Széchényi Library

Handwriting recognition is performed by an algorithm based on artificial intelligence.

The first publicly available Hungarian-language handwriting recognition model has been released. The HTR (Handwritten Text Recognition) model was created by the Digital Humanities Center of the National Széchényi Library (OSZK DBK) and is available to everyone as a component of the Transkribus software. It is based on the professional and personal correspondence of József Kiss, a Hungarian poet who lived at the turn of the 19th and 20th centuries and edited A Hét, a periodical regarded as a forerunner of Nyugat.

Handwriting recognition is performed by an algorithm based on artificial intelligence. The algorithm must first be taught to recognize different handwritings; from the samples it is given, it builds a model with which it can interpret unknown, never-before-seen handwriting. The more varied the training material, the better the model performs on different texts. The current model first learned from the handwriting of József Kiss, then from the mixed handwriting of his correspondents.

The manuscripts used so far are held by the Petőfi Literary Museum (PIM), and the training material amounts to approximately 75,000 words. The documents include envelopes, postcards, plain and letterhead letters, and business cards. The letter writers were József Kiss and his family, as well as writers, journalists and artists of the turn of the century, such as Endre Ady, Zsigmond Móricz and István Tömörkény. In total this is 300 letters of varying length and quality, which the DBK is continuously publishing for readers at dhupla.hu/collection/kiss-jozsef-levelezes. Further manuscripts from the correspondence are currently being processed at OSZK and PIM.

The model currently works with an error rate of 9.19%, which means that it identifies the characters of the text with roughly 90% accuracy on the project's material.
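The relationship between the reported error rate and the accuracy figure can be illustrated with a short sketch. It assumes that the 9.19 figure is a character error rate (CER) expressed as a percentage, that is, the edit distance between the model's output and a human transcription divided by the length of the human transcription; the article does not spell out the metric. The function names and sample strings are made up for illustration and are not taken from Transkribus or from the Kiss correspondence.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER as a fraction: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: the recognizer misses one accent.
reference = "Kedves Barátom"    # human transcription (invented)
hypothesis = "Kedves Baratom"   # model output (invented)
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.2%}, approximate character accuracy: {1 - cer:.2%}")

# Under the CER assumption, the figure reported in the article corresponds to:
reported_cer_percent = 9.19
print(f"Approximate character accuracy: {100 - reported_cer_percent:.2f}%")
```

Reading accuracy as one minus the CER is a convenient approximation rather than an exact identity: because inserted characters also count as errors, the CER can in principle exceed 100% on very poor output.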

In the future, the various Hungarian-language projects that use automatic handwriting recognition must work together, integrating the models trained on their own text corpora, to create an increasingly general tool for digitizing Hungarian manuscript sources. It is in our common interest that the cultural treasures hidden in public collections become accessible, readable and searchable as text in the digital space, and that they can be processed and researched with computational tools, as the text editions published on dhupla.hu and the related creative content show. The first Hungarian handwriting recognition model, now made public, is an important milestone in this process.

The new Hungarian handwriting recognition model is available in both the Transkribus desktop application and the web interface.
