Approximately 300 million people worldwide suffer from depression, and more than 60% of psychiatric patients do not have access to mental health services because of the shortage of psychiatrists and the high cost of clinical diagnosis and treatment. Correct and efficient diagnosis of depression can help address these barriers, and automatic detection of depressive symptoms can improve both the accuracy and the availability of diagnosis. This paper proposes a fusion feature built from bispectral and bicoherence features obtained by higher-order spectral analysis. The fusion feature combines these higher-order spectral features with traditional speech features that have classification weights greater than 100 and are extracted with the Collaborative Voice Analysis Repository (COVAREP). Experiments were performed on the Depression Sub-Challenge dataset of the Audio/Visual Emotion Challenge 2017. To verify the proposed features, the support vector machine and k-nearest neighbor algorithms were used as the traditional machine learning models, and a convolutional neural network was used as the deep learning model. The experimental results show that, under the support vector machine, the accuracies obtained with the COVAREP speech features, the higher-order spectral features, and their fusion were 63.15%, 68.42%, and 73.68%, respectively. Under the k-nearest neighbor algorithm, the corresponding accuracies were 68.18%, 72.73%, and 77.27%, respectively. For the convolutional neural network, the corresponding accuracies were 70%, 77%, and 85%, respectively. The fusion feature therefore achieved the highest accuracy under all three models and can be used to improve depression identification with both traditional machine learning and deep learning models.
Elsevier, Speech Communication, Volume 143, September 2022, Pages 46-56
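
As an illustration of the higher-order spectral quantities named in the abstract, the following is a minimal sketch of a direct (FFT-based) bispectrum and bicoherence estimate, followed by a simple concatenation-style fusion. It is not the authors' implementation: the frame length, hop size, Hann window, the four summary statistics, and the function names `bispectrum_bicoherence`, `hos_features`, and `fuse` are all assumptions made for this example, and the traditional feature vector is a placeholder rather than real COVAREP output.

```python
import numpy as np


def bispectrum_bicoherence(signal, frame_len=128, hop=64):
    """Direct estimate of bispectrum and bicoherence, averaged over frames.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    half = frame_len // 2

    # Frequency index grids; f1 + f2 stays below frame_len because
    # both indices are restricted to the lower half of the band.
    f1, f2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    f_sum = f1 + f2

    bispec = np.zeros((half, half), dtype=complex)
    power_pair = np.zeros((half, half))
    power_sum = np.zeros((half, half))

    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len]
        frame = (frame - frame.mean()) * window
        spectrum = np.fft.fft(frame)
        pair = spectrum[f1] * spectrum[f2]
        # Bispectrum term: X(f1) X(f2) X*(f1 + f2)
        bispec += pair * np.conj(spectrum[f_sum])
        power_pair += np.abs(pair) ** 2
        power_sum += np.abs(spectrum[f_sum]) ** 2

    bispec /= n_frames
    power_pair /= n_frames
    power_sum /= n_frames

    # Squared bicoherence: normalized bispectrum magnitude, bounded by 1.
    bicoh = np.abs(bispec) ** 2 / (power_pair * power_sum + 1e-12)
    return bispec, bicoh


def hos_features(signal):
    """Hypothetical summary statistics of the two higher-order spectra."""
    bispec, bicoh = bispectrum_bicoherence(signal)
    magnitude = np.abs(bispec)
    return np.array([magnitude.mean(), magnitude.max(),
                     bicoh.mean(), bicoh.max()])


def fuse(hos_vec, traditional_vec):
    """Feature-level fusion by concatenation (an assumed fusion scheme)."""
    return np.concatenate([hos_vec, traditional_vec])


# Usage on synthetic data; real input would be a speech recording.
rng = np.random.default_rng(0)
speech_like = rng.standard_normal(1024)
placeholder_traditional = np.zeros(3)  # stands in for selected speech features
fused = fuse(hos_features(speech_like), placeholder_traditional)
print(fused.shape)
```

The fused vector could then be passed to any standard classifier. The abstract reports support vector machine, k-nearest neighbor, and convolutional neural network models, but their configurations are not given there and are not reproduced here.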