Consensus Clustering

Cluster analysis · Published 2019-03-02 16:04:57

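Consensus Clustering is an unsupervised clustering method that is widely used in cancer subtype classification studies (for example, PAM50 in breast cancer). It groups samples into several subtypes based on different omics datasets, so that new disease subtypes can be discovered or existing subtypes compared with one another (see "Justification for using consensus clustering" on wiki).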

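The idea behind Consensus Clustering is to repeatedly draw subsamples of the dataset by resampling, cluster each subsample for a specified number of clusters k, and then assess how reasonable each candidate k is (for example with the PAC measure).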
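The resampling idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the internals of any package: the function name consensus_matrix, its arguments, the toy data, and the choice of average-linkage hierarchical clustering on Euclidean distance are all made up for the example. For each pair of samples it records how often the pair lands in the same cluster, divided by how often the pair was drawn together.

```r
# Minimal consensus-matrix sketch in base R (illustrative only).
# Columns of `x` are samples and rows are features.
consensus_matrix <- function(x, k, reps = 100, p_item = 0.8, seed = 1) {
  set.seed(seed)
  n <- ncol(x)
  together <- matrix(0, n, n)  # times a pair fell in the same cluster
  sampled  <- matrix(0, n, n)  # times a pair was drawn in the same subsample
  for (r in seq_len(reps)) {
    idx <- sort(sample(n, size = floor(p_item * n)))
    # Cluster the subsampled columns (assumption: average linkage, Euclidean distance)
    hc <- hclust(dist(t(x[, idx, drop = FALSE])), method = "average")
    cl <- cutree(hc, k = k)
    same <- outer(cl, cl, "==") * 1
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  cons <- together / sampled
  cons[sampled == 0] <- NA     # pairs that were never drawn together
  dimnames(cons) <- list(colnames(x), colnames(x))
  cons
}

# Toy data: two well-separated groups of 10 samples each, 50 features
set.seed(42)
toy <- cbind(matrix(rnorm(50 * 10, mean = 0), nrow = 50),
             matrix(rnorm(50 * 10, mean = 5), nrow = 50))
colnames(toy) <- paste0("S", 1:20)
cm <- consensus_matrix(toy, k = 2, reps = 50)
```

Entries of cm near 0 or 1 mean a pair is assigned consistently across resamplings; intermediate values mean the assignment is unstable for that k.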

PAC can be used to tune the clustering model by selecting the optimal K. The wiki explains it as follows:

The “proportion of ambiguous clustering” (PAC) measure quantifies this middle segment; and is defined as the fraction of sample pairs with consensus indices falling in the interval (u1, u2) ∈ [0, 1] where u1 is a value close to 0 and u2 is a value close to 1 (for instance u1=0.1 and u2=0.9). A low value of PAC indicates a flat middle segment, and a low rate of discordant assignments across permuted clustering runs. We can therefore infer the optimal number of clusters by the K value having the lowest PAC
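Written as a formula, the quoted definition might look like the following. The symbols are assumptions introduced here for illustration: N is the number of samples and M_K(i, j) is the consensus index of samples i and j at cluster number K.

```latex
\mathrm{PAC}_K
  = \frac{\bigl|\{(i,j) : i < j,\; u_1 < M_K(i,j) < u_2\}\bigr|}{N(N-1)/2},
\qquad
\hat{K} = \arg\min_{K} \mathrm{PAC}_K .
```

Apart from pairs whose consensus index falls exactly on u_1 or u_2, this equals CDF_K(u_2) - CDF_K(u_1), the rise of the consensus CDF over the middle segment.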

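As the figure above shows, the usual approach is to prefer the K whose CDF curve is flattest within the u1-u2 range. This rule does not always have to be followed, though (see "how to choose optimal K in Consensus clustering"): you can use another method for choosing the optimal K, or choose K according to your own research purpose.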

Besides Consensus Clustering, some high-profile papers use non-negative matrix factorization (NMF) consensus clustering (R package NMF) to identify subtypes, for example "Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma" (Nature).

Consensus Clustering is straightforward to run, because the ready-made R package ConsensusClusterPlus needs only an expression matrix (such as rawdata.txt) as input:

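library(ConsensusClusterPlus)

# Read the expression matrix (rows = features, columns = samples)
data <- read.table(file = "rawdata.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE, row.names = 1, check.names = FALSE)
# Keep only rows with fewer than 50% missing values
data2 <- data[apply(data, 1, function(x) sum(is.na(x)) < ncol(data) / 2), ]
data2 <- as.matrix(data2)

# Consensus clustering for k = 2..10: 1000 resamplings of 80% of the samples, all features, PAM
res <- ConsensusClusterPlus(data2, maxK = 10, reps = 1000, pItem = 0.8, pFeature = 1, clusterAlg = "pam", corUse = "complete.obs", seed = 123456, plot = "png", writeTable = TRUE)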

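The result contains the subtype assignments for every k from 2 to 10. The clustering algorithm is pam, the sampling proportion is 0.8, and the run writes PNG figures and CSV tables as output.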

Result files:

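Following the K-selection approach described above and judging by the results for this dataset, K = 7 looks like a reasonable tentative choice, although K can also be set according to the research context.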
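To make the choice of K less dependent on reading plots by eye, PAC can be computed for each k. The sketch below assumes a result object laid out like the ConsensusClusterPlus return value, that is, a list where res[[k]]$consensusMatrix holds the consensus matrix for k. The helper names pac and mock_consensus are hypothetical, and a small mock result (with k = 2 and k = 3 as placeholders) is built so that the example runs without the package or the real data.

```r
# PAC for one consensus matrix: share of sample pairs whose consensus index
# lies strictly inside (u1, u2)
pac <- function(cons, u1 = 0.1, u2 = 0.9) {
  v <- cons[lower.tri(cons)]  # each pair once, diagonal excluded
  v <- v[!is.na(v)]
  mean(v > u1 & v < u2)
}

# Mock consensus matrices (hypothetical): two blocks of samples, with the
# between-block pairs either cleanly separated (0) or ambiguous (0.5)
mock_consensus <- function(n, ambiguous) {
  m <- matrix(0, n, n)
  m[1:(n / 2), 1:(n / 2)] <- 1
  m[(n / 2 + 1):n, (n / 2 + 1):n] <- 1
  if (ambiguous) m[m == 0] <- 0.5
  m
}

# Mock result list: element 1 is unused, elements 2 and 3 stand in for k = 2 and k = 3
res <- list(NULL,
            list(consensusMatrix = mock_consensus(10, ambiguous = FALSE)),
            list(consensusMatrix = mock_consensus(10, ambiguous = TRUE)))

ks <- 2:3
pac_by_k <- sapply(ks, function(k) pac(res[[k]]$consensusMatrix))
names(pac_by_k) <- ks
best_k <- ks[which.min(pac_by_k)]  # k with the lowest PAC
```

With a real run, ks would be 2:10 to match maxK = 10. The lowest PAC is only one criterion and can still be overridden by biological considerations.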

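Once the subtypes are fixed, downstream analyses can be built on them: for example, drawing expression heatmaps per subtype, comparing expression levels between subtypes within a given category, testing for significant differences in gene expression between subtypes, and combining the subtypes with PCA or co-expression networks.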
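A few of these downstream steps can be sketched with base R alone. Everything here is illustrative: expr and subtype are hypothetical toy inputs with a planted signal, and the Kruskal-Wallis test with Benjamini-Hochberg correction is just one possible choice of test, not one prescribed by the original analysis. In a real run, subtype could be taken from res[[k]]$consensusClass for the chosen k.

```r
set.seed(7)
# Hypothetical inputs: `expr` (genes x samples) and `subtype` (one label per sample)
expr <- matrix(rnorm(100 * 30), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("S", 1:30)))
subtype <- factor(rep(c("C1", "C2", "C3"), each = 10))
# Plant a signal: the first five genes are higher in subtype C1
expr[1:5, subtype == "C1"] <- expr[1:5, subtype == "C1"] + 4

# Per-gene Kruskal-Wallis test across subtypes, with Benjamini-Hochberg correction
pvals <- apply(expr, 1, function(g) kruskal.test(g, subtype)$p.value)
padj  <- p.adjust(pvals, method = "BH")
sig_genes <- names(padj)[padj < 0.05]

# PCA on samples; the first two components can be plotted coloured by subtype
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
pc_scores <- data.frame(pca$x[, 1:2], subtype = subtype)

# Mean expression per subtype (genes x subtypes), e.g. as input for a heatmap
subtype_means <- sapply(levels(subtype),
                        function(s) rowMeans(expr[, subtype == s, drop = FALSE]))
```

The subtype_means matrix can be passed to heatmap(), and pc_scores to plot(), to visualise how well the subtypes separate.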

References:

ConsensusClusterPlus (Tutorial)

Consensus Clustering
