The traditional approach to clustering multi-parameter flow cytometry data relies on specialized software in which an operator manually sets gates to circle out the target cells for analysis. This process is complex and demands considerable expertise. To address this, the paper proposes a clustering algorithm for multi-parameter flow data based on t-distributed stochastic neighbor embedding (t-SNE). The algorithm converts the Euclidean distances between samples in the high-dimensional space into conditional probabilities that represent similarity, and then maps the data into a low-dimensional space. Stained human peripheral blood cells were measured by flow cytometry, and the processed data were exported as the experimental sample data. The t-SNE algorithm is compared with the kernel principal component analysis (KPCA) dimensionality reduction algorithm, and the low-dimensional data obtained from each method are classified with the K-means algorithm. The results show that the t-SNE algorithm clusters cell populations with asymmetric, tailed distributions well, with a clustering accuracy of up to 92.55%, which may be helpful for the automatic analysis of multi-color, multi-parameter flow data.
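The similarity step described in this abstract can be illustrated with a minimal sketch. It computes the Gaussian conditional probabilities p(j|i) from squared Euclidean distances. The function name, the toy three-parameter "cells" and the single fixed `sigma` are assumptions made for illustration; standard t-SNE instead tunes one sigma per point to match a target perplexity, and the abstract does not state the settings used.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Return P where P[i][j] = p(j|i), the Gaussian similarity used by t-SNE.

    Assumption: one fixed sigma for every point. Standard t-SNE tunes a
    separate sigma_i per point so that each row matches a target perplexity.
    """
    n = len(points)
    P = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # p(i|i) is defined as zero
                continue
            # Squared Euclidean distance in the high-dimensional space.
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        P.append([w / total for w in weights])  # each row sums to 1
    return P

# Hypothetical toy data: three events with three fluorescence parameters.
cells = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (3.0, 3.0, 3.0)]
P = conditional_probabilities(cells)
```

In this toy case the first event assigns almost all of its probability mass to its near neighbor, which is the behavior the low-dimensional embedding then tries to preserve.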
In clinical practice, intima and media thickness are the main indicators for evaluating the development of atherosclerosis. At present, these indicators are measured by physicians who manually mark the boundaries of the intima and media on B-mode images, a process that is complicated, time-consuming and subject to many human factors. This paper therefore proposes a grayscale threshold method based on Gaussian mixture model (GMM) clustering to measure intima and media thickness in carotid arteries from B-mode images. First, the B-mode images are clustered with the GMM; the boundary between the intima and media of the vessel wall is then detected by the gray threshold method; finally, the thickness of each layer is measured. Compared with applying the gray threshold method directly, clustering the carotid B-mode images resolves the blurring of the gray-level boundary between the intima and media, thereby improving the stability and detection accuracy of the gray threshold method. In clinical tests on 120 healthy carotid arteries, the means of 4 manual measurements obtained by two experts were used as reference values. Experimental results show that the normalized root mean square errors (NRMSEs) of the estimated intima and media thickness after GMM clustering were 0.1047 ± 0.0762 and 0.0974 ± 0.0683, respectively. Compared with direct gray threshold estimation, the mean NRMSEs are reduced by 19.6% and 22.4%, respectively, indicating higher measurement accuracy, and the standard deviations are reduced by 17.0% and 21.7%, respectively, indicating better stability. In summary, this method is helpful for the early diagnosis and monitoring of vascular diseases such as atherosclerosis.
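As a rough illustration of the GMM clustering step in this abstract, the sketch below fits a two-component one-dimensional Gaussian mixture to gray values with expectation-maximization. The function name, the two-component choice, the min/max initialization and the toy intensities are all assumptions; the abstract does not specify how the mixture was configured, and real B-mode images would be processed pixel by pixel in 2-D.

```python
import math

def fit_gmm_1d(values, n_iter=50, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM; return (means, labels)."""
    n = len(values)
    mean_all = sum(values) / n
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    means = [min(values), max(values)]           # deterministic initialization
    variances = [max(var_all, var_floor)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gray value.
        resp = []
        for v in values:
            dens = [
                weights[k]
                * math.exp(-((v - means[k]) ** 2) / (2.0 * variances[k]))
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, var_floor)
    # Hard assignment: the component with the larger responsibility.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels

# Hypothetical gray values sampled across a dark layer and a bright layer.
gray = [10, 11, 12, 9, 10, 50, 51, 49, 52, 50]
means, labels = fit_gmm_1d(gray)
```

After clustering, the two groups are well separated, which is the property that makes a subsequent gray threshold less sensitive to a blurred boundary.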
Ventricle segmentation in echocardiography yields ventricular volume parameters, which help in evaluating cardiac function. However, ultrasound images are noisy and difficult to segment, so manual delineation of the target region imposes a heavy workload, while existing automatic segmentation techniques cannot guarantee accuracy. To solve this problem, a novel algorithmic framework is proposed for ventricle segmentation. First, a faster region-based convolutional neural network (Faster R-CNN) locates the target to obtain the region of interest. Second, K-means is used to pre-segment the image, and a mean shift algorithm whose kernel bandwidth is adaptive is then proposed to segment the region of interest. Finally, a region growing algorithm extracts the target region. With this framework, the ventricle is obtained automatically without manual localization. Experiments show that the framework segments the target accurately and that, in quantitative evaluation, the adaptive mean shift algorithm is more stable and accurate than mean shift with a fixed bandwidth. These results indicate that the method is helpful for automatic segmentation of the left ventricle in echocardiography.
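For reference, the fixed-bandwidth baseline mentioned in this abstract can be sketched as follows. This is plain Gaussian-kernel mean shift on one-dimensional intensities, not the adaptive-bandwidth variant the authors propose, whose bandwidth rule is not given in the abstract. The function name, the toy values and the bandwidth of 1.0 are assumptions.

```python
import math

def mean_shift_1d(values, bandwidth, n_iter=100, tol=1e-6):
    """Move each value uphill to a density mode using a fixed Gaussian bandwidth."""
    modes = []
    for x in values:
        for _ in range(n_iter):
            # Gaussian kernel weight of every sample relative to the current x.
            w = [math.exp(-(((x - v) / bandwidth) ** 2) / 2.0) for v in values]
            new_x = sum(wi * v for wi, v in zip(w, values)) / sum(w)
            if abs(new_x - x) < tol:
                x = new_x
                break
            x = new_x
        modes.append(x)
    return modes

# Hypothetical pixel intensities drawn from two tissue regions.
intensities = [1.0, 1.2, 0.9, 8.0, 8.1, 7.9]
modes = mean_shift_1d(intensities, bandwidth=1.0)
```

Samples that converge to the same mode belong to the same segment. With a fixed bandwidth the result depends strongly on that single value, which is the sensitivity an adaptive bandwidth is meant to reduce.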
Simulations of deoxyribonucleic acid (DNA) damage with an atomic-level geometric model use a traversal algorithm, which is time-consuming, converges slowly and requires high-performance computers. This work therefore presents a density-based spatial clustering of applications with noise (DBSCAN) algorithm that operates on the spatial distributions of energy depositions and hydroxyl radicals (·OH). Combined with probability and statistics, the algorithm quickly produces DNA strand break yields and helps in studying the variation pattern of clustered DNA damage. First, we used Geant4-DNA, a Monte Carlo simulation toolkit for radiobiology, to simulate the transport of protons and secondary particles through the nucleus, as well as the ionization and excitation of water molecules, and obtained the distributions of energy depositions and hydroxyl radicals. Then we applied damage probability functions to obtain the spatial distribution of DNA damage points in a simplified geometric model. The DBSCAN algorithm, based on the density of damage points, was used to determine the single-strand break (SSB) and double-strand break (DSB) yields. Finally, we analyzed how the DNA strand break yields vary with particle linear energy transfer (LET) and summarized the variation pattern of damage clusters. The simulation results show that the new algorithm is faster than the traversal algorithm while retaining good precision, and that the results are consistent with other experiments and simulations. This work provides more precise information on clustered DNA damage induced by proton radiation at the molecular level at high speed, and thus offers an essential and powerful research method for studying the mechanisms of radiation-induced biological damage.
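The clustering step in this abstract can be sketched with a plain DBSCAN over three-dimensional damage points. Everything specific here is an assumption for illustration: the coordinates, the `eps` of 3.0 (in arbitrary length units), `min_pts` of 2, the strand labels, and in particular the counting rule in `count_breaks`, which treats a cluster touching both strands as one DSB and anything else as one SSB. The abstract does not state the rule the authors applied.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:        # j is a core point, keep expanding
                queue.extend(nb)
    return labels

def count_breaks(labels, strands):
    """Assumed rule: a cluster touching both strands is one DSB;
    any other cluster or noise point is counted as one SSB."""
    ssb = sum(1 for lab in labels if lab == -1)
    dsb = 0
    for c in set(labels) - {-1}:
        touched = {s for lab, s in zip(labels, strands) if lab == c}
        if len(touched) == 2:
            dsb += 1
        else:
            ssb += 1
    return ssb, dsb

# Hypothetical damage points and the strand (1 or 2) each one hit.
damage = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 20, 20), (40, 0, 0), (41, 0, 0)]
strands = [1, 2, 1, 1, 2, 2]
labels = dbscan(damage, eps=3.0, min_pts=2)
ssb, dsb = count_breaks(labels, strands)
```

Because clustering only needs pairwise distances between damage points, it avoids traversing an atom-level geometry, which is consistent with the speed advantage the abstract reports.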
To develop safe training intensities and methods for a passive balance rehabilitation training system, this paper proposes a mathematical model of human standing balance adjustment based on the Takagi-Sugeno (T-S) fuzzy identification method. The model takes the acceleration of a multidimensional motion platform as input and human joint angles as output. An artificial bee colony optimization algorithm was used to improve the fuzzy C-means clustering algorithm, which made the identification of the antecedent parameters more efficient. Data from 9 subjects were collected experimentally and used for model training and validation. Based on the mean square error and cross-correlation between the simulated and measured data, we concluded that the model was accurate and reasonable.
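The fuzzy C-means step named in this abstract can be sketched in its standard form. This is the unmodified algorithm on one-dimensional data; the artificial bee colony improvement is not reproduced because the abstract gives no details of it. The function name, the fuzzifier `m = 2`, the initial centers and the toy data are assumptions.

```python
def fuzzy_c_means(values, centers, m=2.0, n_iter=100):
    """Standard fuzzy C-means on 1-D data; returns (centers, memberships)."""
    centers = list(centers)
    c = len(centers)
    U = []
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        U = []
        for x in values:
            d = [max(abs(x - ck), 1e-12) for ck in centers]  # avoid division by zero
            U.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ])
        # Center update: mean of the data weighted by u^m.
        centers = [
            sum(u[i] ** m * x for u, x in zip(U, values))
            / sum(u[i] ** m for u in U)
            for i in range(c)
        ]
    # U holds the memberships computed in the final iteration.
    return centers, U

# Hypothetical 1-D samples and starting centers.
samples = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
centers, U = fuzzy_c_means(samples, centers=[0.0, 6.0])
```

In T-S fuzzy identification, cluster centers of this kind are typically used to place the antecedent membership functions. Because the result depends on the initial centers, a global optimizer is a natural candidate for choosing them.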
Diagnosis of pancreatic cancer is very important. The main diagnostic method is pathological analysis of microscopic images of Pap smear slides, in which accurate segmentation and classification of the images are two important phases. This paper proposes a new automatic segmentation and classification method for microscopic images of the pancreas. In the segmentation phase, a multi-feature mean-shift clustering algorithm (MFMS) was first applied to localize nuclear regions. Then, a chain splitting model (CSM), which combines flexible mathematical morphology with curvature scale space corner detection, was applied to split overlapping cells for better accuracy and robustness. In the classification phase, 4 shape-based features and 138 textural features based on the color spaces of cell nuclei were extracted. To obtain an optimal feature set and classify different cells, a chain-like agent genetic algorithm (CAGA) combined with a support vector machine (SVM) was proposed. The method was tested on 15 cytology images containing 461 cell nuclei. Experimental results showed that the method could automatically and effectively segment and classify different types of pancreatic cells in microscopic images. The mean segmentation accuracy was 93.46% ± 7.24%. Classification of normal and malignant cells achieved 96.55% ± 0.99% accuracy, 96.10% ± 3.08% sensitivity and 96.80% ± 1.48% specificity.
The rapid development of high-throughput chromatin conformation capture (Hi-C) technology provides rich data on genomic interactions between chromosomal loci for chromatin structure analysis. However, existing methods for identifying topologically associating domains (TADs) from Hi-C data suffer from low accuracy and sensitivity to parameters. This paper therefore designs and implements a TAD identification method based on spatial density clustering. The method preprocesses the raw Hi-C data to obtain a normalized Hi-C contact matrix. It then computes the distance matrix between loci, generates a reachability graph from the core distance and reachability distance of each locus, and extracts clusters. Finally, it derives TAD boundaries from the clustering results. The method identifies TAD structures with higher coherence, and the identified TAD boundaries are enriched with more ChIP-seq factors. Experimental results demonstrate that the method offers higher accuracy and practical significance in TAD identification.
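The core distance and reachability distance mentioned in this abstract are the quantities used by OPTICS-style density clustering, and can be sketched from a precomputed distance matrix. The function names, the toy 4-locus matrix, `min_pts = 2` and the convention of excluding the locus itself from its neighbor count are assumptions; the abstract does not say how contacts are converted to distances or which convention is used.

```python
def core_distance(dist, i, min_pts):
    """Distance from locus i to its min_pts-th nearest neighbor.

    Assumption: the locus itself is not counted among its neighbors.
    """
    others = sorted(d for j, d in enumerate(dist[i]) if j != i)
    return others[min_pts - 1]

def reachability_distance(dist, o, p, min_pts):
    """Reachability of p from o: the larger of core_distance(o) and d(o, p)."""
    return max(core_distance(dist, o, min_pts), dist[o][p])

# Hypothetical symmetric distance matrix for four loci; locus 3 is far away.
dist = [
    [0, 1, 2, 9],
    [1, 0, 1, 8],
    [2, 1, 0, 7],
    [9, 8, 7, 0],
]
core0 = core_distance(dist, 0, min_pts=2)
reach_0_to_1 = reachability_distance(dist, 0, 1, min_pts=2)
reach_0_to_3 = reachability_distance(dist, 0, 3, min_pts=2)
```

Loci inside a dense block have small reachability distances, while a jump in reachability marks the edge of a cluster. Such jumps are the natural candidates for domain boundaries in a reachability graph.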
Objective To investigate the dietary patterns of rural residents in areas with a high incidence of esophageal cancer (EC), and to explore the clustering of risk factors associated with high incidence and the factors that influence it. Methods A purpose-designed structured questionnaire was used in a face-to-face survey of the dietary patterns of rural residents in Yanting county, Sichuan Province, from July to August 2021. Univariate and multivariate logistic regression models were used to analyze the factors influencing the clustering of risk factors for EC. Results A total of 838 valid questionnaires were obtained. Of the rural residents, 90.8% used clean water such as tap water. In the past year, residents who frequently ate fruits and vegetables, soybean products, and onions and garlic accounted for 69.5%, 32.8% and 74.5%, respectively; residents who rarely ate kimchi, pickled vegetables, sauerkraut, barbecue, hot food and moldy food accounted for 59.2%, 79.6%, 68.2%, 90.3%, 80.9% and 90.3%, respectively. Clustering of risk factors for EC was found in 73.3% of residents; the aggregation of two risk factors was the most common mode (28.2%), among which tumor history combined with preserved food was the main clustering pattern (4.6%). The logistic regression model showed that gender, age, marital status and occupation were independent factors influencing the clustering of risk factors for EC (P<0.05). Conclusion Most rural residents in the high-incidence areas of EC in Yanting county have good eating habits, but the clustering of some risk factors remains at a high level. Gender, age, marital status and occupation are factors influencing the clustering of risk factors for EC.
Magnetic resonance (MR) images can be used to detect lesions in the brains of patients with multiple sclerosis (MS). This paper presents an automatic method for segmenting MS lesions from multispectral MR images. First, a Pd-w image is subtracted from its corresponding T1-w image to obtain an image in which the cerebrospinal fluid (CSF) is enhanced. Second, using the kernel fuzzy c-means clustering (KFCM) algorithm, the enhanced image and the corresponding T2-w image are segmented separately to extract the CSF region and the combined CSF-MS lesion region, respectively. A raw MS lesion image is obtained by subtracting the CSF region from the combined CSF-MS lesion region. Third, a median filter and thresholding are applied to the raw image to detect the MS lesions. The method was tested on BrainWeb images and evaluated with the Dice similarity coefficient (DSC), sensitivity (Sens), specificity (Spec) and accuracy (Acc). The test results were satisfactory.
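The four evaluation measures named in this abstract have standard definitions in terms of true and false positives and negatives, sketched below for flattened binary masks. The function name and the toy masks are assumptions, and no values from the paper are reproduced.

```python
def segmentation_metrics(pred, truth):
    """Compute DSC, sensitivity, specificity and accuracy for binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # lesion found
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)  # background kept
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # lesion missed
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "acc": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical flattened masks: 1 marks a lesion voxel, 0 marks background.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
scores = segmentation_metrics(predicted, reference)
```

Because lesions occupy a small fraction of the brain volume, specificity and accuracy tend to be dominated by the background, so the Dice coefficient and sensitivity are usually the more informative of the four.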
The incidence of Parkinson’s disease (PD) is gradually increasing. The disease seriously affects patients’ quality of life, and the burden of diagnosis and treatment is growing. However, early intervention is difficult because means of early monitoring are limited. To find an effective biomarker of PD, this work extracted the correlation between each pair of electroencephalogram (EEG) channels in each frequency band using weighted symbolic mutual information and k-means clustering. The results showed that State1 of the Beta frequency band (P = 0.034) and State5 of the Gamma frequency band (P = 0.010) could be used to differentiate healthy controls from off-medication Parkinson’s disease patients. These findings indicate significant differences in the resting channel-wise correlation states between PD patients and healthy subjects. However, no significant differences were found between PD-on and PD-off patients, or between PD-on patients and healthy controls. These results may provide a reference for the clinical diagnosis of Parkinson’s disease.