Sadeghian, Rasoul, Shahin, Shahrooz and Sareh, Sina ORCID: https://orcid.org/0000-0002-9787-1798, 2024, Conference or Workshop, CVOCR: Context Vision OCR at 2024 20th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), Genova, Italy, 02-04 September 2024.
Abstract or Description: | Optical Character Recognition (OCR) technologies are crucial for automated information extraction across many domains. However, the intricate layouts and diverse text properties found on different products can complicate accurate data retrieval and categorization. This paper introduces Context Vision OCR (CVOCR), a versatile framework that addresses these challenges using advanced image processing and text analysis techniques. Although CVOCR is applicable to any OCR-related application, this paper uses pharmaceutical items as a case study because of their stringent accuracy requirements and the complexity of medicine packaging. CVOCR integrates the Fast Super-Resolution Convolutional Neural Network (FSRCNN) for enhanced image clarity, LayoutLMv2 for spatial layout understanding, Tesseract OCR for robust character recognition, and GPT-Neo for advanced contextual analysis. Together, these components form a cohesive system that significantly improves text detection and interpretation accuracy. We demonstrate the efficacy of CVOCR through testing on various pharmaceutical products, where it consistently outperforms Tesseract OCR. |
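The abstract names four components but not how they are wired together. The following is a minimal Python sketch of one plausible data flow (enhance, locate regions, recognize, correct). The stage ordering is an assumption, and every model is replaced by a trivial stand-in: nearest-neighbour upscaling for FSRCNN, a fixed half-split for LayoutLMv2, canned strings for Tesseract, and `difflib` vocabulary matching for GPT-Neo. All names (`Pipeline`, `Region`, `split_halves`, `vocabulary_corrector`, `demo`) are hypothetical. The sketch illustrates only how stages could pass data to one another; it does not reproduce the paper's models, interfaces or results.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major


@dataclass
class Region:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in upscaled pixel coordinates
    text: str = ""


@dataclass
class Pipeline:
    # Each field is a pluggable stage; the comments name the component whose
    # role it stands in for (assumed ordering, not confirmed by the abstract).
    upscale: Callable[[Image], Image]                # role of FSRCNN
    find_regions: Callable[[Image], List[Region]]    # role of LayoutLMv2
    recognize: Callable[[Image, Region], str]        # role of Tesseract OCR
    correct: Callable[[List[Region]], List[Region]]  # role of GPT-Neo

    def run(self, image: Image) -> List[Region]:
        hi_res = self.upscale(image)
        regions = self.find_regions(hi_res)
        for region in regions:
            region.text = self.recognize(hi_res, region)
        return self.correct(regions)


def upscale_nearest(image: Image, factor: int = 2) -> Image:
    """Stand-in for FSRCNN: nearest-neighbour upscaling, no learned detail."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def split_halves(image: Image) -> List[Region]:
    """Stand-in for a layout model: split the image into top and bottom halves."""
    h, w = len(image), len(image[0])
    return [Region(box=(0, 0, w, h // 2)), Region(box=(0, h // 2, w, h))]


def vocabulary_corrector(vocab: List[str]) -> Callable[[List[Region]], List[Region]]:
    """Stand-in for GPT-Neo: snap each token to the closest known word, if close enough."""
    def correct(regions: List[Region]) -> List[Region]:
        for region in regions:
            fixed = []
            for token in region.text.split():
                match = get_close_matches(token, vocab, n=1, cutoff=0.8)
                fixed.append(match[0] if match else token)
            region.text = " ".join(fixed)
        return regions
    return correct


def demo() -> List[str]:
    # Hypothetical noisy OCR readings, keyed by each region's top edge.
    readings = {0: "Paracetam0l 500 mg", 2: "Tab1ets"}
    pipeline = Pipeline(
        upscale=upscale_nearest,
        find_regions=split_halves,
        recognize=lambda img, region: readings[region.box[1]],
        correct=vocabulary_corrector(["Paracetamol", "Tablets", "mg"]),
    )
    return [r.text for r in pipeline.run([[0, 255], [255, 0]])]


print(demo())  # ['Paracetamol 500 mg', 'Tablets']
```

Keeping each stage as a plain callable means a real model could be substituted for any single stand-in without touching the others. Note that the real LayoutLMv2 normally consumes OCR tokens and their boxes, so an actual integration may order these stages differently.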
Subjects: | Other > Engineering > H600 Electronic and Electrical Engineering > H670 Robotics and Cybernetics > H671 Robotics |
School or Centre: | Other Research & Innovation School of Design |
Date Deposited: | 30 Jul 2024 12:25 |
Last Modified: | 30 Jul 2024 12:25 |
URI: | https://researchonline.rca.ac.uk/id/eprint/5915 |