Optical Character Recognition and Named Entity Recognition for Highly Confidential Documents

dc.contributor.author: Muhammad Ahmed Al-Desouki
dc.date.accessioned: 2025-06-19T10:34:51Z
dc.date.issued: 2024-05
dc.description.abstract: Optical character recognition (OCR) is a crucial technique for extracting textual data from various sources, reducing human labor and enhancing accessibility. Named entity recognition (NER) organizes and categorizes the extracted data, while regular expression (regex) pattern matching facilitates data extraction from OCR-read text. Together, these techniques reduce the human labor needed to extract large amounts of confidential and sensitive data, improving accessibility and preservation. The study uses the Tesseract OCR tool and the Marefa-NER model, combining artificial neural networks (ANN), support vector machines (SVM), and natural language processing (NLP) techniques. The technologies were integrated into websites and proved effective at accurately identifying textual content and categorizing it using OCR, NER, and regex patterns. The combination of OCR, NER, and regex pattern matching proved to be a successful and efficient method for extracting textual information from various sources, particularly for confidential and sensitive documents.
dc.identifier.uri: https://research.arabeast.edu.sa/handle/123456789/266
dc.language.iso: en
dc.publisher: International Journal of Computer Applications
dc.title: Optical Character Recognition and Named Entity Recognition for Highly Confidential Documents
dc.type: Article
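
The abstract describes running OCR and then applying regex patterns to the OCR-read text. A minimal sketch of that flow might look like the following. The pattern set, the field names, the sample string, and the helper names `ocr_image` and `extract_fields` are all hypothetical, chosen only for illustration; the record does not state which patterns or code the study used. The OCR helper assumes the `pytesseract` and `Pillow` packages plus a local Tesseract install, and the NER step is omitted.

```python
import re

# Hypothetical patterns for illustration only; the study's actual
# regexes are not given in this record.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "id_number": re.compile(r"\b\d{10}\b"),
}


def ocr_image(path, lang="eng"):
    """Read text from an image with Tesseract.

    Assumes pytesseract, Pillow, and the tesseract binary are installed;
    imports are deferred so the regex step runs without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), lang=lang)


def extract_fields(text):
    """Return all matches of each pattern found in OCR-read text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# Stand-in for OCR output, so the example runs without an image file.
sample = "Issued 2024-05-17 to holder 1234567890, contact: records@example.org"
fields = extract_fields(sample)
print(fields)
```

In practice the `sample` string would be replaced by the return value of `ocr_image(...)`, and entities that do not follow a fixed pattern (person or organization names, for example) would be left to the NER model.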

Files

Original bundle

Name: للحصول صورة.png
Size: 1.02 MB
Format: Portable Network Graphics

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission