MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech

Publisher

CMC – Computers, Materials, and Continua

Abstract

Depression is a prevalent mental health issue affecting individuals of all age groups globally. As with other mental health disorders, diagnosing depression presents significant challenges for medical practitioners and clinical experts, primarily due to societal stigma and a lack of awareness and acceptance. Although medical interventions such as therapy, medication, and brain stimulation offer hope for treatment, there remains a gap in the efficient detection of depression. Traditional methods, such as in-person therapy, are both time-consuming and labor-intensive, underscoring the need for technological assistance, especially through Artificial Intelligence. Alternatively, depression is most often diagnosed through questionnaire-based mental status assessments; however, this approach frequently yields inconsistent and inaccurate results. Moreover, there is currently no comprehensive diagnostic framework capable of achieving accurate and robust diagnostic outcomes. For a considerable time, researchers have sought to identify symptoms of depression from individuals' speech and responses by leveraging automated systems and computer technology. This research proposes MDD, a unified framework composed of multimodal data collection, preprocessing, and feature extraction (using the T5 model for text features and the WaveNet model for speech features). Canonical Correlation Analysis (CCA) is then used to create correlated projections of the text and audio features, which are fused by concatenation. Finally, depression detection is performed by a neural network with a sigmoid output layer. The proposed model achieved remarkable performance: on the Distress Analysis Interview Corpus – Wizard of Oz (DAIC-WOZ) dataset, it attained an accuracy of 92.75%, a precision of 92.05%, and a recall of 92.22%; on the E-DAIC dataset, an accuracy of 91.74%, a precision of 90.35%, and a recall of 90.95%; and on the CD-III dataset (Custom Dataset for Depression), an accuracy of 93.05%, a precision of 92.12%, and a recall of 92.85%. These results underscore the model's robust capability to diagnose depressive disorder accurately, demonstrating the efficacy of advanced feature extraction methods and an improved classification algorithm.
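
The following is a minimal Python sketch of the fusion and classification stages described in the abstract. It assumes text and audio embeddings have already been extracted upstream (by a T5 encoder and a WaveNet-style encoder in the paper; random placeholders stand in for them here), and the embedding dimensions, number of canonical components, layer sizes, and training setup are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import torch
import torch.nn as nn
from sklearn.cross_decomposition import CCA

# Placeholder embeddings standing in for the paper's T5 text features and
# WaveNet speech features; sample count and dimensions are assumptions.
rng = np.random.default_rng(0)
n, d_text, d_audio = 200, 768, 256
X_text = rng.standard_normal((n, d_text))
X_audio = rng.standard_normal((n, d_audio))
y = rng.integers(0, 2, size=(n, 1)).astype(np.float32)  # 1 = depressed

# CCA projects both modalities into a shared space whose paired
# components are maximally correlated.
k = 16  # number of canonical components (assumed)
cca = CCA(n_components=k, max_iter=1000)
text_proj, audio_proj = cca.fit_transform(X_text, X_audio)

# Feature fusion by concatenating the correlated projections.
fused = np.concatenate([text_proj, audio_proj], axis=1).astype(np.float32)

# A small feed-forward classifier ending in a sigmoid output for the
# binary depressed / not-depressed decision (illustrative architecture).
clf = nn.Sequential(
    nn.Linear(2 * k, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

X, Y = torch.from_numpy(fused), torch.from_numpy(y)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(clf(X), Y)
    loss.backward()
    opt.step()

probs = clf(X).detach().numpy()  # predicted probability of depression

In practice, the CCA projection would be fitted on the training split only and then applied to held-out interviews before classification, to avoid leaking test-set statistics into the shared subspace.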
