During long-term electrocardiogram (ECG) monitoring, various types of noise inevitably become mixed with the signal, potentially hindering doctors' ability to accurately assess and interpret patient data. Therefore, evaluating the quality of ECG signals before analysis and diagnosis is crucial. This paper addresses the limitations of existing ECG signal quality assessment methods, particularly their insufficient attention to multi-scale correlations among the 12 leads. We propose a novel ECG signal quality assessment method that integrates a convolutional neural network (CNN) with a squeeze-and-excitation residual network (SE-ResNet). This approach not only captures both local and global features of the ECG time series but also exploits the spatial correlation among leads. Testing on a public dataset demonstrated that our method achieved an accuracy of 99.5%, a sensitivity of 98.5%, and a specificity of 99.6%. Compared with other methods, our technique significantly improves the accuracy of ECG signal quality assessment by leveraging inter-lead correlation information, which is expected to advance intelligent ECG monitoring and diagnostic technology.
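As an illustration of the squeeze-and-excitation idea referred to above, the following is a minimal PyTorch sketch of an SE block for 1-D ECG feature maps. It is not the authors' code; the channel count, reduction ratio, and input sizes are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    """Squeeze-and-excitation block for 1-D (time-series) feature maps."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling over the time axis
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Excitation: two fully connected layers producing per-channel weights
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time), e.g. conv features of an ECG segment
        b, c, _ = x.shape
        w = self.pool(x).view(b, c)       # squeeze -> (batch, channels)
        w = self.fc(w).view(b, c, 1)      # excitation -> channel weights
        return x * w                      # recalibrate channels

if __name__ == "__main__":
    feats = torch.randn(4, 64, 500)       # (batch, channels, samples), toy sizes
    se = SEBlock1d(channels=64)
    print(se(feats).shape)                # torch.Size([4, 64, 500])
```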
Emotion reflects a person's psychological and physiological health, and human emotion is expressed mainly through voice and facial expression. How to extract and effectively integrate these two modes of emotional information is one of the main challenges in emotion recognition. This paper proposes a multi-branch bidirectional multi-scale time perception model that processes speech Mel-frequency spectral coefficients in both the forward and reverse time directions. The model also uses causal convolution to obtain temporal correlation information between features at different scales and assigns attention maps to them accordingly, yielding a multi-scale fusion of speech emotion features. In addition, this paper proposes a dynamic bimodal feature fusion algorithm that draws on the strengths of AlexNet and uses overlapping max-pooling layers to obtain richer fusion features from the concatenated feature matrices of the two modalities. Experimental results show that the proposed multi-branch bidirectional multi-scale time perception bimodal emotion recognition model reaches accuracies of 97.67% and 90.14% on two public audio-visual emotion datasets, outperforming other common methods, indicating that the proposed model can effectively capture emotional feature information and improve the accuracy of emotion recognition.
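The bidirectional, causal-convolution processing described above could be sketched as follows. This is an illustrative PyTorch fragment, not the authors' model; the channel sizes, kernel size, and dilation are assumptions.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Causal 1-D convolution: pad on the left only, so the output at time t
    depends only on inputs at times <= t."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv(x)

def bidirectional_causal_features(mfcc, conv_fwd, conv_bwd):
    """Process spectral frames in forward and reversed time order, then concatenate."""
    fwd = conv_fwd(mfcc)                         # forward-time features
    bwd = conv_bwd(torch.flip(mfcc, dims=[-1]))  # features from the reversed sequence
    bwd = torch.flip(bwd, dims=[-1])             # re-align to forward time
    return torch.cat([fwd, bwd], dim=1)

if __name__ == "__main__":
    mfcc = torch.randn(2, 40, 300)               # (batch, spectral bins, frames), toy sizes
    f = CausalConv1d(40, 64, kernel_size=3, dilation=2)
    b = CausalConv1d(40, 64, kernel_size=3, dilation=2)
    print(bidirectional_causal_features(mfcc, f, b).shape)  # torch.Size([2, 128, 300])
```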
To address the high computational complexity of the Transformer in ultrasound thyroid nodule segmentation, as well as the loss of image detail or omission of key spatial information that traditional image sampling techniques suffer on two-dimensional ultrasound images with high resolution, complex texture, or uneven density, this paper proposes a thyroid nodule segmentation method that integrates the receptance weighted key value (RWKV) architecture with spherical geometry feature (SGF) sampling. The method effectively captures the details of adjacent regions through two-dimensional offset prediction and pixel-level adjustment of sampling positions, achieving precise segmentation. In addition, this study introduces a patch attention module (PAM) that optimizes the decoder feature map through a regional cross-attention mechanism, enabling it to focus more precisely on the high-resolution features of the encoder. Experiments on the thyroid nodule segmentation dataset (TN3K) and the digital database for thyroid images (DDTI) show that the proposed method achieves Dice similarity coefficients (DSC) of 87.24% and 80.79%, respectively, outperforming existing models while maintaining lower computational complexity. This approach may provide an efficient solution for the precise segmentation of thyroid nodules.
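A regional cross-attention step of the kind the patch attention module (PAM) is described as performing might look roughly like the PyTorch sketch below, in which decoder patches attend to encoder patches. The patch counts, embedding dimension, and head count are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PatchCrossAttention(nn.Module):
    """Decoder patch tokens query encoder patch tokens via cross-attention."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):
        # dec_tokens: (batch, N_dec, dim) - patches from the decoder feature map (queries)
        # enc_tokens: (batch, N_enc, dim) - patches from the encoder feature map (keys/values)
        out, _ = self.attn(query=dec_tokens, key=enc_tokens, value=enc_tokens)
        return self.norm(dec_tokens + out)   # residual connection + normalization

if __name__ == "__main__":
    dec = torch.randn(1, 196, 64)   # e.g. 14x14 decoder patches, toy embedding dim
    enc = torch.randn(1, 784, 64)   # e.g. 28x28 encoder patches
    pam = PatchCrossAttention(dim=64)
    print(pam(dec, enc).shape)      # torch.Size([1, 196, 64])
```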
Deep learning methods can automatically analyze electrocardiogram (ECG) data and rapidly perform arrhythmia classification, which provides significant clinical value for early arrhythmia screening. How to select arrhythmia features effectively under the supervision of limited abnormal samples is an urgent issue. This paper proposed an arrhythmia classification algorithm based on an adaptive multi-feature fusion network. The algorithm extracted RR-interval features from ECG signals, employed a one-dimensional convolutional neural network (1D-CNN) to extract time-domain deep features, and employed Mel-frequency cepstral coefficients (MFCC) with a two-dimensional convolutional neural network (2D-CNN) to extract frequency-domain deep features. The features were fused using an adaptive weighting strategy for arrhythmia classification. The algorithm was evaluated under the inter-patient paradigm on the arrhythmia database jointly developed by the Massachusetts Institute of Technology and Beth Israel Hospital (MIT-BIH). Experimental results showed that the proposed algorithm achieved an average precision of 75.2%, an average recall of 70.1%, and an average F1-score of 71.3%, demonstrating good classification performance under this paradigm and providing algorithmic support for arrhythmia classification in wearable devices.
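The adaptive weighted fusion of the three feature branches could be sketched as below, assuming each branch has already been projected to a common dimension. The learnable softmax weighting shown here is one plausible reading of "adaptive weighting strategy", not necessarily the authors' exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse RR-interval, time-domain (1D-CNN) and frequency-domain (MFCC + 2D-CNN)
    feature vectors with learnable weights normalized by a softmax."""
    def __init__(self, dim: int, num_branches: int = 3, num_classes: int = 5):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_branches))  # learned fusion weights
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, branch_feats):
        # branch_feats: list of (batch, dim) tensors, one per feature branch
        w = torch.softmax(self.weights, dim=0)                   # weights sum to 1
        fused = sum(w[i] * f for i, f in enumerate(branch_feats))
        return self.classifier(fused)

if __name__ == "__main__":
    rr = torch.randn(8, 128)   # RR-interval features projected to a common dim (toy)
    td = torch.randn(8, 128)   # time-domain deep features from a 1D-CNN (toy)
    fd = torch.randn(8, 128)   # frequency-domain deep features from MFCC + 2D-CNN (toy)
    model = AdaptiveFusion(dim=128)
    print(model([rr, td, fd]).shape)   # torch.Size([8, 5])
```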
Magnetic resonance imaging (MRI) can produce multi-modal images with different contrasts, providing rich information for clinical diagnosis. However, some contrast images are not scanned, or the quality of the acquired images fails to meet diagnostic requirements, because of difficulties in patient cooperation or limitations of the scanning conditions. Image synthesis techniques have become a way to compensate for such missing images, and in recent years deep learning has been widely used in MRI synthesis. This paper proposes a synthesis network based on multi-modal fusion: a feature encoder first encodes each unimodal image separately, a feature fusion module then fuses the features of the different modalities, and finally the target modal image is generated. The similarity between the target image and the predicted image is improved by introducing a dynamically weighted combined loss function based on the spatial domain and the K-space domain. Experimental validation and quantitative comparison show that the proposed multi-modal fusion deep learning network can effectively synthesize high-quality MRI fluid-attenuated inversion recovery (FLAIR) images. In summary, the proposed method can reduce the patient's MRI scanning time and address the clinical problems of missing FLAIR images or image quality insufficient for diagnosis.
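A combined spatial-domain and K-space loss of the kind described above can be sketched as follows. The paper's dynamic weighting scheme is not specified here, so the weight `alpha` is left as a fixed placeholder; in practice it would be adjusted during training.

```python
import torch

def combined_loss(pred: torch.Tensor, target: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of an image-domain L1 loss and a K-space L1 loss.
    pred, target: (batch, 1, H, W) real-valued images.
    alpha is a fixed placeholder for the paper's dynamic weighting."""
    # Spatial-domain term
    l_img = torch.mean(torch.abs(pred - target))
    # K-space term: compare 2-D Fourier transforms of the images
    k_pred = torch.fft.fft2(pred)
    k_target = torch.fft.fft2(target)
    l_kspace = torch.mean(torch.abs(k_pred - k_target))
    return alpha * l_img + (1.0 - alpha) * l_kspace

if __name__ == "__main__":
    pred = torch.rand(2, 1, 64, 64)     # synthesized FLAIR (toy)
    target = torch.rand(2, 1, 64, 64)   # reference FLAIR (toy)
    print(combined_loss(pred, target, alpha=0.7).item())
```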
Emotion classification and recognition is a crucial area of affective computing. Physiological signals such as the electroencephalogram (EEG) reflect emotions accurately and are difficult to disguise. However, emotion recognition still faces challenges in single-modal feature extraction and multi-modal signal integration. This study collected EEG, electromyogram (EMG), and electrodermal activity (EDA) signals from participants under three emotional states: happiness, sadness, and fear. A feature-weighted fusion method was applied to integrate the signals, and both a support vector machine (SVM) and an extreme learning machine (ELM) were used for classification. The results showed that classification accuracy was highest when the fusion weights were set to 0.7 for EEG, 0.15 for EMG, and 0.15 for EDA, reaching 80.19% and 82.48% for the SVM and ELM, respectively, an improvement of 5.81% and 2.95% over using EEG alone. This study offers methodological support for emotion classification and recognition using multi-modal physiological signals.
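The feature-weighted fusion with the reported weights (EEG 0.7, EMG 0.15, EDA 0.15) followed by SVM classification could look roughly like the sketch below. The feature dimensions, the synthetic data, and the block-scaling-then-concatenation fusion rule are illustrative assumptions rather than the study's exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Toy stand-ins for extracted feature matrices (rows = samples)
rng = np.random.default_rng(0)
n = 120
eeg = rng.normal(size=(n, 32))       # EEG features (toy)
emg = rng.normal(size=(n, 8))        # EMG features (toy)
eda = rng.normal(size=(n, 4))        # EDA features (toy)
labels = rng.integers(0, 3, size=n)  # happiness / sadness / fear (toy)

# Feature-weighted fusion: scale each modality block, then concatenate
w_eeg, w_emg, w_eda = 0.7, 0.15, 0.15
fused = np.hstack([w_eeg * eeg, w_emg * emg, w_eda * eda])

x_tr, x_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(x_tr, y_tr)
print("toy accuracy:", clf.score(x_te, y_te))
```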
ST segment morphology is closely related to cardiovascular disease: it is used not only to characterize different diseases but also to predict their severity. However, its short duration, low energy, variable morphology, and contamination by various types of noise make ST segment morphology classification a difficult task. To address the limited feature extraction and low accuracy of existing ST segment morphology classification methods, this paper uses the gradient of an ST surface to improve multi-class classification accuracy. Five ST segment morphologies are identified: normal, upward-sloping elevation, arch-back elevation, horizontal depression, and arch-back depression. First, candidate ST segments are selected according to the QRS complex location and medical statistical rules. Second, the ST segment area, mean value, difference from the reference baseline, slope, and mean squared error are extracted. In addition, the ST segment is converted into a surface, gradient features of the ST surface are extracted, and these morphological features form a feature vector. Finally, a support vector machine is used to perform multi-class classification of ST segment morphology. The MIT-Beth Israel Hospital database (MITDB) and the European ST-T database (EDB) were used to validate the algorithm, which achieved average recognition rates of 97.79% and 95.60%, respectively. Based on these results, the method is expected to be introduced into clinical practice in the future to provide morphological guidance for the diagnosis of cardiovascular diseases and to improve diagnostic efficiency.
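The scalar ST segment features listed above (area, mean, difference from the baseline, slope, and mean squared error) and the final SVM step could be sketched as follows. The ST-surface gradient features are omitted, and the sampling rate, segment length, and synthetic segments are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

def st_features(st_segment: np.ndarray, baseline: float, fs: float = 360.0) -> np.ndarray:
    """Hand-crafted ST-segment features: area, mean, deviation from the
    reference baseline, slope, and mean squared error of a linear fit.
    fs = 360 Hz matches the MITDB sampling rate (assumed here)."""
    t = np.arange(len(st_segment)) / fs
    area = np.sum(st_segment - baseline) / fs           # signed area above the baseline
    mean = st_segment.mean()
    deviation = mean - baseline                          # difference from the baseline
    slope, intercept = np.polyfit(t, st_segment, 1)      # first-order trend
    mse = np.mean((st_segment - (slope * t + intercept)) ** 2)
    return np.array([area, mean, deviation, slope, mse])

# Toy example: two synthetic ST segments and a two-class SVM
seg_flat = 0.02 * np.random.randn(40)          # roughly isoelectric segment
seg_elev = 0.15 + 0.02 * np.random.randn(40)   # elevated segment
x = np.vstack([st_features(seg_flat, baseline=0.0),
               st_features(seg_elev, baseline=0.0)])
y = np.array([0, 1])
clf = SVC(kernel="rbf").fit(x, y)
print(clf.predict(x))
```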
The macrotrabecular-massive (MTM) subtype of hepatocellular carcinoma (HCC) is a histological variant with higher malignant potential. Non-invasive preoperative identification of MTM-HCC is crucial for precise treatment. Current radiomics-based diagnostic models often integrate multi-phase features by simple feature concatenation, which may inadequately explore the latent complementary information between phases. This study proposes a feature fusion-based radiomics model using multi-phase contrast-enhanced computed tomography (mpCECT) images. Features were extracted from the arterial phase (AP), portal venous phase (PVP), and delayed phase (DP) CT images of 121 HCC patients. The fusion model was constructed and compared against the traditional concatenation model. Five-fold cross-validation demonstrated that the feature fusion model combining AP and PVP features achieved the best classification performance, with an area under the receiver operating characteristic curve (AUC) of 0.839. Furthermore, for any combination of two phases, the feature fusion model consistently outperformed the traditional feature concatenation approach. In conclusion, the proposed feature fusion model effectively enhances the discrimination capability compared to traditional models, providing a new tool for clinical practice.
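A five-fold cross-validated comparison between feature concatenation and a feature fusion of AP and PVP radiomics features could be organised as in the sketch below. The element-wise fusion operator, feature counts, classifier, and synthetic labels are placeholders, since the paper's actual fusion construction is not detailed here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 121
ap = rng.normal(size=(n, 50))     # arterial-phase radiomics features (toy)
pvp = rng.normal(size=(n, 50))    # portal-venous-phase radiomics features (toy)
y = rng.integers(0, 2, size=n)    # MTM vs non-MTM labels (toy)

# Baseline: simple concatenation of the two phases
x_concat = np.hstack([ap, pvp])

# Hypothetical fusion: element-wise combinations of matched feature pairs
# (the paper's actual fusion operator is not specified here)
x_fused = np.hstack([ap + pvp, ap * pvp])

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc_concat = cross_val_score(clf, x_concat, y, cv=5, scoring="roc_auc").mean()
auc_fused = cross_val_score(clf, x_fused, y, cv=5, scoring="roc_auc").mean()
print(f"concatenation AUC: {auc_concat:.3f}, fusion AUC: {auc_fused:.3f}")
```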
In the clinical diagnosis of brain tumors, accurate segmentation based on multimodal magnetic resonance imaging (MRI) is essential for determining tumor type, extent, and spatial boundaries. However, differences in imaging mechanisms, information emphasis, and feature distributions among multimodal MRI data pose significant challenges for precise tumor modeling and fusion-based segmentation. In recent years, fusion neural networks have provided effective strategies for integrating multimodal information and have become a major research focus in multimodal brain tumor segmentation. This review systematically summarizes studies on fusion neural networks for multimodal brain tumor segmentation published since 2019. First, the fundamental concepts of multimodal data fusion and model fusion are introduced. Then, existing methods are categorized by fusion level into three types: prediction fusion models, feature fusion models, and stage fusion models, and their structural characteristics and segmentation performance are comparatively analyzed. Finally, current limitations are discussed, and potential development trends of fusion neural networks for multimodal MRI brain tumor segmentation are summarized. This review aims to provide a reference for the design and optimization of future multimodal brain tumor segmentation models.
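To make the fusion-level taxonomy concrete, the toy PyTorch fragment below contrasts feature-level fusion (concatenating intermediate features before a single prediction head) with prediction-level fusion (averaging per-modality predictions). It is purely illustrative and not drawn from any of the reviewed models; all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

# Two toy single-modality encoders (e.g. for two MRI sequences)
enc_a = nn.Conv2d(1, 8, 3, padding=1)
enc_b = nn.Conv2d(1, 8, 3, padding=1)
head_feat = nn.Conv2d(16, 2, 1)   # segmentation head after feature-level fusion
head_a = nn.Conv2d(8, 2, 1)       # per-modality heads for prediction-level fusion
head_b = nn.Conv2d(8, 2, 1)

x_a = torch.randn(1, 1, 64, 64)   # modality A image (toy)
x_b = torch.randn(1, 1, 64, 64)   # modality B image (toy)

# Feature fusion: concatenate intermediate features, then predict once
feat_fusion_logits = head_feat(torch.cat([enc_a(x_a), enc_b(x_b)], dim=1))

# Prediction fusion: predict per modality, then average the logits
pred_fusion_logits = 0.5 * (head_a(enc_a(x_a)) + head_b(enc_b(x_b)))

print(feat_fusion_logits.shape, pred_fusion_logits.shape)
```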
Motor imagery electroencephalogram (MI-EEG) decoding algorithms face multiple challenges, including incomplete feature extraction, the susceptibility of attention mechanisms to distraction under low signal-to-noise ratios, and limited capture of long-range temporal dependencies. To address these issues, this paper proposes a multi-branch differential attention temporal network (MDAT-Net). First, a multi-branch feature fusion module was constructed to extract and fuse diverse spatio-temporal features at different scales. Next, to suppress noise and stabilize attention, a novel multi-head differential attention mechanism was introduced to enhance key signal dynamics by computing the difference between attention maps. Finally, an adaptive residual separable temporal convolutional network was designed to efficiently capture long-range dependencies within the feature sequence for precise classification. Experimental results showed that the proposed method achieved average classification accuracies of 85.73%, 90.04%, and 96.30% on the public datasets BCI-IV-2a, BCI-IV-2b, and HGD, respectively, significantly outperforming several baseline models. This research provides an effective new solution for developing high-precision motor imagery brain-computer interface systems.
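The core of the differential attention idea, computing the difference between two attention maps before applying them to the values, could be sketched in single-head form as follows. The dimensions, the fixed lambda, and the simplified single-head layout are illustrative assumptions rather than the MDAT-Net design.

```python
import torch
import torch.nn as nn

class DifferentialAttention(nn.Module):
    """Single-head sketch of differential attention: two attention maps are
    computed from separate query/key projections, and their weighted difference
    is applied to the values, which can cancel common-mode (noise) attention."""
    def __init__(self, dim: int, lam: float = 0.5):
        super().__init__()
        self.q1, self.k1 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.q2, self.k2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.lam = lam                  # fixed here; typically learnable in practice
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) feature sequence from the EEG branches
        a1 = torch.softmax(self.q1(x) @ self.k1(x).transpose(-2, -1) * self.scale, dim=-1)
        a2 = torch.softmax(self.q2(x) @ self.k2(x).transpose(-2, -1) * self.scale, dim=-1)
        return (a1 - self.lam * a2) @ self.v(x)   # difference of attention maps applied to values

if __name__ == "__main__":
    x = torch.randn(2, 100, 32)          # (batch, time steps, feature dim), toy sizes
    attn = DifferentialAttention(dim=32)
    print(attn(x).shape)                 # torch.Size([2, 100, 32])
```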