
Thema4675: Difference between versions

 
Line 7:
  |Partner=FIZ Karlsruhe
  |Forschungsgruppe=Information Service Engineering
- |Abschlussarbeitsstatus=Vergeben
+ |Abschlussarbeitsstatus=Abgeschlossen
  |Beginn=2021/04/01
+ |Abgabe=2021/10/15
  |Ausschreibung=Handwritten and Printed Text Separation in Historical Documents_New Version.pdf
+ |Ergebnisse=Anastasia-Prikhodina-Handwritten-and-Printed-Text-Separation-in-Historical-Documents.pdf
  |Beschreibung DE=Objective of this work:
  

Current version as of 21 January 2022, 13:06



Handwritten and Printed Text Separation in Historical Documents


Anastasia Prikhodina



Thesis information

Thesis type: Bachelor
Supervisors: Harald Sack, Oleksandra Vsesviatska, Mahsa Vafaie
Research group: Information Service Engineering
Partner: FIZ Karlsruhe
Archive number: 4675
Thesis status: Completed
Start: 1 April 2021
Submission: 15 October 2021

Further information

Objective of this work:

As the number of digitized documents grows, automatic document analysis has become increasingly important. Historical documents made available to the public vary widely in type, content, quality and structure. They can be skewed, noisy, and overlaid with graphical elements such as lines, unconstrained annotations and stamps. Most optical character recognition (OCR) systems recognize either printed or handwritten text, but not both. The task of this thesis is therefore to separate machine-printed text from handwritten text in scanned documents before they are fed to an OCR system.


In this thesis:

  • Documents containing a mix of handwritten and printed text will be collected.
  • An additional mixed dataset may be generated from historical documents.
  • Existing approaches to text separation will be reviewed and investigated.
  • A pixel-based approach for text separation based on [1] will be applied.
  • The results will be evaluated against ground-truth data.
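
As an illustration of the last step, pixel-level predictions can be compared with ground truth using a per-class intersection over union (IoU). The announcement does not name a metric, so IoU is an assumption here, as are the label encoding (0 = background, 1 = printed, 2 = handwritten) and the function name `per_class_iou`. This is a minimal sketch, not the evaluation procedure used in the thesis.

```python
from typing import Dict, Sequence

# Hypothetical label encoding, chosen for this sketch only.
BACKGROUND, PRINTED, HANDWRITTEN = 0, 1, 2
CLASS_NAMES = {BACKGROUND: "background", PRINTED: "printed", HANDWRITTEN: "handwritten"}


def per_class_iou(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Return the intersection over union per class for two label maps.

    Both arguments are 2D grids of class labels with identical shape.
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and ground truth must have the same shape")

    intersection = {c: 0 for c in CLASS_NAMES}
    union = {c: 0 for c in CLASS_NAMES}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # The pixel counts once towards both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1

    # A class absent from both maps has an undefined IoU and is skipped.
    return {CLASS_NAMES[c]: intersection[c] / union[c] for c in CLASS_NAMES if union[c] > 0}


# Toy 2x3 example: one printed pixel is misclassified as handwritten.
truth = [[0, 1, 1],
         [0, 2, 2]]
pred = [[0, 1, 2],
        [0, 2, 2]]
scores = per_class_iou(pred, truth)
print(scores)
```

Reporting the score per class rather than a single pixel accuracy keeps the large background area of a scanned page from masking errors on the two text classes.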


[1] Dutly, N., Slimane, F., & Ingold, R. (2019, September). PHTI-WS: A printed and handwritten text identification web service based on FCN and CRF post-processing. In 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) (Vol. 2, pp. 20-25). IEEE.


The project work will be supervised by Prof. Dr. Harald Sack, Mahsa Vafaie and Oleksandra Bruns of the Information Service Engineering group at the Institute AIFB, KIT, in collaboration with FIZ Karlsruhe.


Keywords:

Machine Learning, CNN, pattern recognition


Prerequisites:

Knowledge of programming in Python.


Thesis announcement: Download (pdf)


Download: Download (pdf)