Datei:Datest Dendrogram PCP.JPG

From Teachwiki
Revision as of 15:58, 18 June 2018, by Maintenance script (Maintenance script uploaded Datei:Datest Dendrogram PCP.JPG)
Datest_Dendrogram_PCP.JPG (600 × 600 pixels, file size: 38 KB, MIME type: image/jpeg)

Dendrogram and Parallel Coordinate Plot

Dendrogram

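This chart is a graphical representation of the result of the hierarchical clustering algorithm: it shows the order in which countries and groups of countries are merged and the distance at which each merge takes place. For a more detailed interpretation, see Analysis of Tuberculosis.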
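As a rough illustration of what a dendrogram encodes, the following Python sketch (not the XploRe code used on this page) runs agglomerative clustering with Ward's criterion on a tiny made-up dataset and records the height of each merge. The function names and the sample points are hypothetical. The merge cost used here is the increase in the within-cluster sum of squares, which may be scaled differently from the heights that XploRe reports.

```python
from itertools import combinations


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_merges(points, k=1):
    """Merge clusters greedily until k remain; return (clusters, merge heights)."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > k:
        # Pick the pair of clusters whose merge is cheapest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        heights.append(ward_cost(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, heights


# Hypothetical toy data: two tight pairs and one distant point.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, heights = ward_merges(toy, k=3)
```

Each entry of `heights` corresponds to one horizontal bar of a dendrogram; stopping at `k=3` is the same as cutting the tree so that three groups remain.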

Parallel Coordinate Plot

This plot shows the cluster means (clusters are the groups found in the data) for each variable on a standardized vertical axis running from 0 to 1. For interpretation, see Analysis of Tuberculosis.
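The standardization of the vertical axis can be sketched in Python as a min-max rescaling of each variable across the cluster means, which is what the program code on this page does before plotting. The function name, the sample values, and the handling of a variable with no spread are assumptions made for this sketch.

```python
def min_max_scale(cluster_means):
    """Rescale each variable (column) to [0, 1] across the clusters (rows)."""
    columns = list(zip(*cluster_means))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A variable with identical means in all clusters has no spread;
        # map it to 0.0 rather than dividing by zero (an assumption of this sketch).
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical means of three variables for three clusters (one row per cluster).
means = [[2.0, 10.0, 0.5], [4.0, 30.0, 0.5], [6.0, 20.0, 0.5]]
scaled = min_max_scale(means)
```

After rescaling, the cluster with the smallest mean of a variable sits at 0 and the cluster with the largest mean sits at 1, so variables measured in different units can share one axis.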

Usage of the Program

This program can compute different numbers of clusters with several distance measures and clustering algorithms. The default settings are 3 clusters, the Euclidean distance, and the Ward clustering algorithm. The output contains the text vectors of the country names falling into each cluster as well as the cluster means and standard deviations (the square roots of the variances). Furthermore, the PCP (Parallel Coordinate Plot) of the cluster means is plotted to give a first insight into the differences between the groups found in the data. Finally, the cluster information can be saved to new files for further analysis (the clusters could, for instance, be checked for outliers with the outlier program).
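The per-cluster summary described above can be sketched in Python as follows. This is not the XploRe code of the program: the function name, the sample rows, and the labels are hypothetical, and the sample standard deviation (divisor n - 1) is an assumption about how the spread is measured.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_clusters(rows, labels):
    """Group rows by cluster label and return size, mean and std per variable."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    summary = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        summary[label] = {
            "size": len(members),
            "mean": [mean(col) for col in columns],
            # stdev needs at least two observations; report None otherwise.
            "std": [stdev(col) if len(members) > 1 else None for col in columns],
        }
    return summary


# Hypothetical data: three observations of two variables with cluster labels.
rows = [(1.0, 10.0), (3.0, 14.0), (10.0, 2.0)]
labels = [1, 1, 2]
summary = summarize_clusters(rows, labels)
```

A cluster that contains a single country has no measurable spread, which is why the sketch reports `None` for its standard deviation instead of raising an error.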

Program Code

Attention! To repeat the computation, a transformed dataset is needed. If you have not yet computed and saved the transformation, first run the transformation program on the wiki page Analysis of Tuberculosis.


library("xplore")
library("stats")

; ----- Reading Data ---------------------------------------------------------------------------

choose = "Read data from:"

defaults = "C:\Dokumente und Einstellungen\All Users\Desktop\UN_data_ordered.csv"

v1 = readvalue(choose, defaults)

x = readcsvm(v1)

data = x.double
country = x.text

; ----- Cluster Analysis ----------------------------------------------------------------------

data2 = data[,4:cols(data)]
data3 = data2/var(data2)

d=distance(data3,"euclid")
t=agglom(d,"WARD",3)

g=tree(t.g,0,"CENTER")
g=g.points
l = 5.*(1:rows(g)/5) + (0:4)' - 4

setmaskl(g, l, 0, 1, 1)
setmaskp(g, 0, 0, 0)

; ----- Graphical Options ----------------------------------------------------------------------

setsize(600, 600)
f = createdisplay(2,1)

axeson()
show(f, 1, 1, g)
title1 = "Dendrogram"
xlabel1 = "Countries"
ylabel1 = "Euclidean-distance"
setgopt(f, 1, 1, "title", title1, "xlabel", xlabel1, "ylabel", ylabel1)

; ----- Get Cluster Data and Countries ---------------------------------------------------------

cluster1=paf(data,t.pd==1)
cluster2=paf(data,t.pd==2)
cluster3=paf(data,t.pd==3)

x1=paf(data2,t.pd==1)
x2=paf(data2,t.pd==2)
x3=paf(data2,t.pd==3)

country[cluster1[,1]]
country[cluster2[,1]]
country[cluster3[,1]]

; ----- Get Basic Info about the Clusters (Mean, Variance) and Draw PCP ------------------------

mc=(mean(x1)')~(mean(x2)')~(mean(x3)')
mc

; Note: sqrt(var(.)) yields standard deviations, not variances
vc=(sqrt(var(x1))')~(sqrt(var(x2))')~(sqrt(var(x3))')
vc

col1  = grc.col.green-grc.col.blue
col2  = grc.col.red-grc.col.blue
col   = grc.col.blue+col1*(mc'[,1]<=min(mc'[,1]))+col2*(mc'[,1]>min(mc'[,1])&&mc'[,1]<max(mc'[,1]))

/*
mctrans = mc' - mean(mc')
mctrans = mctrans'/sqrt(var(mc'))'
mctrans
*/

mctrans = mc' - min(mc')
mctrans = mctrans'/(max(mc')-min(mc'))'
mctrans

gr = grpcp(mctrans',col)

; ----------- Graphical Options ----------

title2 = "Parallel Coordinate Plot of Cluster Means"
xlabel2 = "Aids"|"Mal"|"Tub"|"Con"|"Drug"|"Edu"|"Lit"|"San"|"Wat"|"CO2"|"Int"|"PC"|"Tel"
ylabel2 = "0"|"stdzd."|"1"

axesoff()
axes = graxes((0.80|13.2)~(-0.05|1.05), "origin", 7.5, "ytextpos", 9, "xtextpos", 6, "xticks", (1:13), "xtext", xlabel2, "yticks", 0|0.5|1, "ytext", ylabel2, "xtextsize", 16, "ytextsize", 16)
axes1 = graxes((0.80|13.2)~(-0.05|1.05), "origin", 1.5, "ytextpos", 3, "xtextpos", -1, "xticks", (1:13), "yticks", 0|0.5|1, "ytext", 0|0.5|1, "xtextsize", 16, "ytextsize", 16)

show(f, 2, 1, gr, axes, axes1)
setgopt(f, 2, 1, "title", title2)

; ----- Saving Options for Clusters-------------------------------------------------------------

proc()=save(c1, c2, c3)
	
	head2 = "Save Clusters"
	
	item2 = "Cluster1" | "Cluster2" | "Cluster3"
	
	sel2 = selectitem(head2, item2)
	
	switch
		
		case(sel2[1]==1 && sel2[2]==0 && sel2[3]==0)
		
			folder = "Save cluster1 to:"
			
			default3 = "C:\Dokumente und Einstellungen\All Users\Desktop\Cluster1.csv"
			
			v3=readvalue(folder, default3)
			
