# Classification trees
A regression tree models a continuous outcome. It is an alternative to carrying out a multiple regression.
A classification tree models a discrete outcome. In many cases, the outcome variable is binary (having two values), in which case the classification tree is an alternative to fitting a logistic regression model. Logistic regression is a topic covered in Stat4620 (Data Analysis).
Consider the Heart data set.
heart=read.csv("http://faculty.marshall.usc.edu/gareth-james/ISL/Heart.csv", stringsAsFactors=TRUE) #read text columns as factors, as summary() and tree() expect
head(heart)
## X Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope Ca
## 1 1 63 1 typical 145 233 1 2 150 0 2.3 3 0
## 2 2 67 1 asymptomatic 160 286 0 2 108 1 1.5 2 3
## 3 3 67 1 asymptomatic 120 229 0 2 129 1 2.6 2 2
## 4 4 37 1 nonanginal 130 250 0 0 187 0 3.5 3 0
## 5 5 41 0 nontypical 130 204 0 2 172 0 1.4 1 0
## 6 6 56 1 nontypical 120 236 0 0 178 0 0.8 1 0
## Thal AHD
## 1 fixed No
## 2 normal Yes
## 3 reversable Yes
## 4 normal No
## 5 normal No
## 6 normal No
The outcome variable AHD is binary, taking values “Yes” or “No”. This is equivalent to a variable which equals 1 when AHD=“Yes”, and 0 otherwise.
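As a small illustration of that equivalence, the sketch below recodes a Yes/No factor as 0/1. The vector `ahd` is a made-up stand-in for `heart$AHD`, not the real data.

```r
# Hypothetical stand-in for heart$AHD (not the real data)
ahd <- factor(c("No", "Yes", "Yes", "No", "No"))

# 1 when AHD is "Yes", 0 otherwise
ahd01 <- as.integer(ahd == "Yes")
ahd01

# The mean of the 0/1 variable is the proportion of "Yes" cases
mean(ahd01)
```

The mean of the 0/1 version is the sample proportion of "Yes", which is the quantity a classification tree estimates within each node.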
summary(heart)
## X Age Sex ChestPain
## Min. : 1.0 Min. :29.00 Min. :0.0000 asymptomatic:144
## 1st Qu.: 76.5 1st Qu.:48.00 1st Qu.:0.0000 nonanginal : 86
## Median :152.0 Median :56.00 Median :1.0000 nontypical : 50
## Mean :152.0 Mean :54.44 Mean :0.6799 typical : 23
## 3rd Qu.:227.5 3rd Qu.:61.00 3rd Qu.:1.0000
## Max. :303.0 Max. :77.00 Max. :1.0000
##
## RestBP Chol Fbs RestECG
## Min. : 94.0 Min. :126.0 Min. :0.0000 Min. :0.0000
## 1st Qu.:120.0 1st Qu.:211.0 1st Qu.:0.0000 1st Qu.:0.0000
## Median :130.0 Median :241.0 Median :0.0000 Median :1.0000
## Mean :131.7 Mean :246.7 Mean :0.1485 Mean :0.9901
## 3rd Qu.:140.0 3rd Qu.:275.0 3rd Qu.:0.0000 3rd Qu.:2.0000
## Max. :200.0 Max. :564.0 Max. :1.0000 Max. :2.0000
##
## MaxHR ExAng Oldpeak Slope
## Min. : 71.0 Min. :0.0000 Min. :0.00 Min. :1.000
## 1st Qu.:133.5 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000
## Median :153.0 Median :0.0000 Median :0.80 Median :2.000
## Mean :149.6 Mean :0.3267 Mean :1.04 Mean :1.601
## 3rd Qu.:166.0 3rd Qu.:1.0000 3rd Qu.:1.60 3rd Qu.:2.000
## Max. :202.0 Max. :1.0000 Max. :6.20 Max. :3.000
##
## Ca Thal AHD
## Min. :0.0000 fixed : 18 No :164
## 1st Qu.:0.0000 normal :166 Yes:139
## Median :0.0000 reversable:117
## Mean :0.6722 NA's : 2
## 3rd Qu.:1.0000
## Max. :3.0000
## NA's :4
nmiss=apply(is.na(heart),1,sum) #number of missing values, by row
heartm=heart[nmiss!=0,] #cases with missing values
heart=heart[nmiss==0,] #remove cases with missing values
head(heartm)
## X Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope
## 88 88 53 0 nonanginal 128 216 0 2 115 0 0.0 1
## 167 167 52 1 nonanginal 138 223 0 0 169 0 0.0 1
## 193 193 43 1 asymptomatic 132 247 1 2 143 1 0.1 2
## 267 267 52 1 asymptomatic 128 204 1 0 156 1 1.0 2
## 288 288 58 1 nontypical 125 220 0 0 144 0 0.4 2
## 303 303 38 1 nonanginal 138 175 0 0 173 0 0.0 1
## Ca Thal AHD
## 88 0 <NA> No
## 167 NA normal No
## 193 NA reversable Yes
## 267 0 <NA> Yes
## 288 NA reversable No
## 303 NA normal No
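The same row filtering can also be written with base R's `complete.cases()`. The sketch below uses a small made-up data frame, `toy`, rather than the Heart data, and checks that both approaches select the same rows.

```r
# Hypothetical toy data frame with a few missing values (not the Heart data)
toy <- data.frame(
  Ca   = c(0, NA, 2, 1),
  Thal = c("normal", "fixed", NA, "normal"),
  AHD  = c("No", "Yes", "Yes", "No")
)

# Row-wise count of missing values, as in the notes
nmiss <- apply(is.na(toy), 1, sum)

# complete.cases() flags rows with no missing values
ok <- complete.cases(toy)

toy_missing  <- toy[!ok, ]  # rows with at least one NA
toy_complete <- toy[ok, ]   # rows with no NA

# Both approaches pick out the same rows
all(ok == (nmiss == 0))
```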
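Let’s predict AHD using a classification tree, with all other variables in the data set used as potential predictors. Note that the formula AHD~. also includes the row index X, which is not a meaningful predictor; the output below shows the tree splitting on it, so it could reasonably be dropped before fitting.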
library(tree)
heart.tree=tree(AHD~., data=heart)
plot(heart.tree)
text(heart.tree)
print(heart.tree)
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 297 409.900 No ( 0.53872 0.46128 )
## 2) Thal: normal 164 175.100 No ( 0.77439 0.22561 )
## 4) Ca < 0.5 115 81.150 No ( 0.88696 0.11304 )
## 8) Age < 57.5 80 25.590 No ( 0.96250 0.03750 )
## 16) MaxHR < 152.5 17 15.840 No ( 0.82353 0.17647 )
## 32) Chol < 226.5 8 0.000 No ( 1.00000 0.00000 ) *
## 33) Chol > 226.5 9 11.460 No ( 0.66667 0.33333 ) *
## 17) MaxHR > 152.5 63 0.000 No ( 1.00000 0.00000 ) *
## 9) Age > 57.5 35 41.880 No ( 0.71429 0.28571 )
## 18) Fbs < 0.5 29 37.360 No ( 0.65517 0.34483 ) *
## 19) Fbs > 0.5 6 0.000 No ( 1.00000 0.00000 ) *
## 5) Ca > 0.5 49 67.910 No ( 0.51020 0.48980 )
## 10) ChestPain: nonanginal,nontypical,typical 29 32.050 No ( 0.75862 0.24138 )
## 20) X < 230.5 19 7.835 No ( 0.94737 0.05263 ) *
## 21) X > 230.5 10 13.460 Yes ( 0.40000 0.60000 ) *
## 11) ChestPain: asymptomatic 20 16.910 Yes ( 0.15000 0.85000 )
## 22) Sex < 0.5 6 8.318 No ( 0.50000 0.50000 ) *
## 23) Sex > 0.5 14 0.000 Yes ( 0.00000 1.00000 ) *
## 3) Thal: fixed,reversable 133 149.000 Yes ( 0.24812 0.75188 )
## 6) Ca < 0.5 59 81.370 Yes ( 0.45763 0.54237 )
## 12) ExAng < 0.5 33 42.010 No ( 0.66667 0.33333 )
## 24) Age < 51 13 17.320 Yes ( 0.38462 0.61538 )
## 48) ChestPain: nonanginal,nontypical 5 5.004 No ( 0.80000 0.20000 ) *
## 49) ChestPain: asymptomatic,typical 8 6.028 Yes ( 0.12500 0.87500 ) *
## 25) Age > 51 20 16.910 No ( 0.85000 0.15000 ) *
## 13) ExAng > 0.5 26 25.460 Yes ( 0.19231 0.80769 )
## 26) Oldpeak < 1.55 11 15.160 Yes ( 0.45455 0.54545 )
## 52) Chol < 240.5 6 5.407 No ( 0.83333 0.16667 ) *
## 53) Chol > 240.5 5 0.000 Yes ( 0.00000 1.00000 ) *
## 27) Oldpeak > 1.55 15 0.000 Yes ( 0.00000 1.00000 ) *
## 7) Ca > 0.5 74 41.650 Yes ( 0.08108 0.91892 )
## 14) RestECG < 0.5 34 31.690 Yes ( 0.17647 0.82353 )
## 28) MaxHR < 145 20 7.941 Yes ( 0.05000 0.95000 ) *
## 29) MaxHR > 145 14 18.250 Yes ( 0.35714 0.64286 )
## 58) MaxHR < 158 5 5.004 No ( 0.80000 0.20000 ) *
## 59) MaxHR > 158 9 6.279 Yes ( 0.11111 0.88889 ) *
## 15) RestECG > 0.5 40 0.000 Yes ( 0.00000 1.00000 ) *
\[G= \hat p_L (1- \hat p_L) + \hat p_R (1- \hat p_R)\]
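\(G\) is called the “Gini index”. You may recognize it as the estimated total variance of the binary outcome (AHD in this case), summed over the two branches of a split. Smaller values of \(G\) mean purer branches, so minimizing impurity maximizes purity.
Think of \(\hat p_L\) as the estimated probability that AHD is “Yes” among the cases sent to the left branch, and \(\hat p_R\) as the same quantity for the right branch.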
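The sketch below evaluates the formula for \(G\) on two candidate splits. It assumes \(\hat p_L\) and \(\hat p_R\) are read as the proportions of "Yes" in the left and right branches; the function name `gini_split` and the toy data are made up for illustration and are not part of the `tree` package.

```r
# Impurity of a binary split, following the formula above:
# G = pL * (1 - pL) + pR * (1 - pR)
gini_split <- function(y, go_left) {
  p_left  <- mean(y[go_left] == "Yes")   # proportion of "Yes" in the left branch
  p_right <- mean(y[!go_left] == "Yes")  # proportion of "Yes" in the right branch
  p_left * (1 - p_left) + p_right * (1 - p_right)
}

# Hypothetical toy outcome and two candidate splits (not the Heart data)
y <- c("No", "No", "No", "Yes", "Yes", "Yes", "Yes", "No")
split_a <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)  # separates the classes
split_b <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)  # mixes the classes

gini_split(y, split_a)  # 0: both branches are pure
gini_split(y, split_b)  # 0.5: each branch is half "Yes"
```

A split-selection rule based on this measure would prefer `split_a`, since it has the smaller \(G\).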
Note that in this case some of the predictor variables are discrete, for example “Sex” and “Thal”. There is only one possible split using “Sex”. Ignoring missing values, there are three ways to split using “Thal”, since each of its three levels can be separated from the other two.
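The counts above follow a general pattern for unordered categorical predictors, which the short sketch below works out. The helper name `n_splits` is made up for illustration.

```r
# Number of distinct binary splits of an unordered factor with k levels.
# Each split sends a nonempty proper subset of levels to the left, and a
# subset and its complement describe the same split, giving 2^(k-1) - 1.
n_splits <- function(k) 2^(k - 1) - 1

n_splits(2)  # two levels, as for Sex: 1
n_splits(3)  # three levels, as for Thal: 3
n_splits(4)  # four levels, as for ChestPain: 7
```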
Generally the tree growing process generates a tree with too many splits. This is an example of overfitting.
We would like to restrict the size of the tree in some manner. We do this by cutting back the initially grown tree. This is known as pruning.
How far back should we prune the tree? And along which branches?
heartcv=cv.tree(heart.tree, FUN=prune.misclass)
print(heartcv)
## $size
## [1] 19 14 11 8 6 4 2 1
##
## $dev
## [1] 83 83 74 74 74 84 84 140
##
## $k
## [1] -Inf 0.0 1.0 2.0 3.0 5.5 7.0 67.0
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(heartcv$size,heartcv$dev,type='b')
bestsize=heartcv$size[which.min(heartcv$dev)] #first size attaining the minimum CV error
The argument FUN=prune.misclass used in the call to cv.tree says to use the misclassification error rate, rather than the deviance, to guide the pruning process. This is a reasonable choice for classification trees, where the goal is generally to classify the discrete outcome.
Three sizes (11, 8, and 6) tie for the lowest cross-validated error. Selecting all of them would pass a vector to prune.misclass and trigger a warning, so which.min keeps only the first, 11. Prune the tree back to that size using prune.misclass.
heart.prune=prune.misclass(heart.tree,best=bestsize)
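The cross-validation output above has a three-way tie for the lowest error. The sketch below, using the `size` and `dev` vectors copied from that output, shows how different selection rules behave under such a tie. The variable names are made up for illustration.

```r
# Values copied from the cv.tree output above
size <- c(19, 14, 11, 8, 6, 4, 2, 1)
dev  <- c(83, 83, 74, 74, 74, 84, 84, 140)

# Every size attaining the minimum error: a vector of length 3
tied <- size[dev == min(dev)]
tied

# which.min() returns the first minimum only, giving a single size
first_best <- size[which.min(dev)]
first_best

# The smallest tied size, if a simpler tree is preferred
smallest_best <- min(tied)
smallest_best
```

Either scalar choice avoids passing a vector as `best`; which one to prefer is a judgment call between fit and simplicity.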
heart.prune
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 297 409.900 No ( 0.53872 0.46128 )
## 2) Thal: normal 164 175.100 No ( 0.77439 0.22561 )
## 4) Ca < 0.5 115 81.150 No ( 0.88696 0.11304 ) *
## 5) Ca > 0.5 49 67.910 No ( 0.51020 0.48980 )
## 10) ChestPain: nonanginal,nontypical,typical 29 32.050 No ( 0.75862 0.24138 )
## 20) X < 230.5 19 7.835 No ( 0.94737 0.05263 ) *
## 21) X > 230.5 10 13.460 Yes ( 0.40000 0.60000 ) *
## 11) ChestPain: asymptomatic 20 16.910 Yes ( 0.15000 0.85000 ) *
## 3) Thal: fixed,reversable 133 149.000 Yes ( 0.24812 0.75188 )
## 6) Ca < 0.5 59 81.370 Yes ( 0.45763 0.54237 )
## 12) ExAng < 0.5 33 42.010 No ( 0.66667 0.33333 )
## 24) Age < 51 13 17.320 Yes ( 0.38462 0.61538 )
## 48) ChestPain: nonanginal,nontypical 5 5.004 No ( 0.80000 0.20000 ) *
## 49) ChestPain: asymptomatic,typical 8 6.028 Yes ( 0.12500 0.87500 ) *
## 25) Age > 51 20 16.910 No ( 0.85000 0.15000 ) *
## 13) ExAng > 0.5 26 25.460 Yes ( 0.19231 0.80769 )
## 26) Oldpeak < 1.55 11 15.160 Yes ( 0.45455 0.54545 )
## 52) Chol < 240.5 6 5.407 No ( 0.83333 0.16667 ) *
## 53) Chol > 240.5 5 0.000 Yes ( 0.00000 1.00000 ) *
## 27) Oldpeak > 1.55 15 0.000 Yes ( 0.00000 1.00000 ) *
## 7) Ca > 0.5 74 41.650 Yes ( 0.08108 0.91892 ) *
plot(heart.prune,ylab="error rate")
text(heart.prune, pretty=0)
heart.pred=predict(heart.prune,newdata=heart, type="class") #predict.tree takes newdata, not data
table(heart.pred,heart$AHD)
##
## heart.pred No Yes
## No 146 19
## Yes 14 118
The error rate in the above analysis is unrealistically low, because the same data were used both to fit the tree and to assess its classification performance.
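For reference, the error rate in question can be computed from the confusion table printed above. The sketch below re-enters that table by hand; the object names `conf` and `err` are made up for illustration.

```r
# Confusion table copied from the output above (rows: predicted, columns: actual)
conf <- matrix(c(146, 14, 19, 118), nrow = 2,
               dimnames = list(pred = c("No", "Yes"), actual = c("No", "Yes")))

# Misclassified cases are the off-diagonal entries
err <- (sum(conf) - sum(diag(conf))) / sum(conf)
err  # 33 / 297, about 0.11
```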
A better strategy is to split the data set into training and test sets at the outset, do all of the fitting on the training set, and assess the error rate on the test set.
### randomly split the data into a training and a test set of approximately the same size.
nrec=dim(heart)[1]
nrec
## [1] 297
ntrain=ifelse(nrec%%2==0,nrec/2,(nrec+1)/2)
ntrain
## [1] 149
train=sample(1:nrec,ntrain,replace=F)
train
## [1] 202 128 214 261 247 114 244 142 175 144 90 185 286 40 281 220 228 267
## [19] 275 292 97 177 217 84 196 253 285 197 72 194 137 17 283 265 117 284
## [37] 235 134 113 46 171 93 62 51 102 14 76 33 63 249 99 56 60 218
## [55] 252 45 277 44 159 156 266 100 173 64 103 61 224 291 120 190 81 238
## [73] 280 52 289 150 271 96 29 71 234 203 145 167 3 115 49 95 242 48
## [91] 28 229 273 135 131 165 89 53 254 212 110 140 104 239 209 272 32 282
## [109] 296 223 149 210 262 160 11 268 69 225 243 263 237 108 221 132 189 54
## [127] 37 36 42 294 22 112 191 208 55 136 77 75 250 153 199 31 169 182
## [145] 122 172 256 180 290
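Because no seed is set, the split above will differ from run to run. A minimal sketch of a reproducible version follows; the seed value is arbitrary and chosen only for illustration, so it will not reproduce the indices printed above.

```r
set.seed(4620)  # arbitrary seed, chosen only for illustration

nrec   <- 297                # number of complete cases, as reported above
ntrain <- ceiling(nrec / 2)  # 149, same result as the ifelse() expression
train  <- sample(seq_len(nrec), ntrain, replace = FALSE)
test   <- setdiff(seq_len(nrec), train)  # everything not in the training set

length(train)
length(test)
```

Fixing the seed makes the training and test sets, and therefore the fitted tree and error rates, repeatable across runs.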
htrain=heart[train,]
htrain
## X Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope
## 205 205 43 1 asymptomatic 110 211 0 0 161 0 0.0 1
## 129 129 44 1 nontypical 120 220 0 0 170 0 0.0 1
## 217 217 46 0 nontypical 105 204 0 0 172 0 0.0 1
## 264 264 44 1 nonanginal 120 226 0 0 169 0 0.0 1
## 250 250 62 1 nontypical 128 208 1 2 140 0 0.0 1
## 115 115 62 0 nonanginal 130 263 0 0 97 0 1.2 2
## 247 247 58 1 asymptomatic 100 234 0 0 156 0 0.1 1
## 143 143 52 1 nontypical 128 205 1 0 184 0 0.0 1
## 177 177 52 1 asymptomatic 108 233 1 0 147 0 0.1 1
## 145 145 58 1 nonanginal 105 240 0 2 154 1 0.6 2
## 91 91 66 1 asymptomatic 120 302 0 2 151 0 0.4 2
## 187 187 42 1 nonanginal 120 240 1 0 194 0 0.8 3
## 291 291 67 1 nonanginal 152 212 0 2 150 0 0.8 2
## 40 40 61 1 nonanginal 150 243 1 0 137 1 1.0 2
## 285 285 61 1 asymptomatic 148 203 0 0 161 0 0.0 1
## 223 223 39 0 nonanginal 94 199 0 0 179 0 0.0 1
## 231 231 52 0 nonanginal 136 196 0 2 169 0 0.1 2
## 271 271 61 1 asymptomatic 140 207 0 2 138 1 1.9 1
## 279 279 57 1 nontypical 154 232 0 2 164 0 0.0 1
## 297 297 59 1 asymptomatic 164 176 1 2 90 0 1.0 2
## 98 98 60 0 asymptomatic 150 258 0 2 157 0 2.6 2
## 179 179 43 1 nonanginal 130 315 0 0 162 0 1.9 1
## 220 220 59 1 asymptomatic 138 271 0 2 182 0 0.0 1
## 84 84 68 1 nonanginal 180 274 1 2 150 1 1.6 2
## 199 199 50 0 nontypical 120 244 0 0 162 0 1.1 1
## 256 256 42 0 nonanginal 120 209 0 0 173 0 0.0 2
## 290 290 56 1 nontypical 120 240 0 0 169 0 0.0 3
## 200 200 59 1 typical 160 273 0 2 125 0 0.0 1
## 72 72 67 1 asymptomatic 125 254 1 0 163 0 0.2 2
## 197 197 69 1 typical 160 234 1 2 131 0 0.1 2
## 138 138 62 1 nontypical 120 281 0 2 103 0 1.4 2
## 17 17 48 1 nontypical 110 229 0 0 168 0 1.0 3
## 287 287 58 0 asymptomatic 170 225 1 2 146 1 2.8 2
## 269 269 40 1 asymptomatic 152 223 0 0 181 0 0.0 1
## 118 118 35 0 asymptomatic 138 183 0 0 182 0 1.4 1
## 289 289 56 1 nontypical 130 221 0 2 163 0 0.0 1
## 238 238 46 1 asymptomatic 120 249 0 2 144 0 0.8 1
## 135 135 43 0 nonanginal 122 213 0 0 165 0 0.2 2
## 114 114 43 0 asymptomatic 132 341 1 2 136 1 3.0 2
## 46 46 58 1 nonanginal 112 230 0 2 165 0 2.5 2
## 173 173 59 0 asymptomatic 174 249 0 0 143 1 0.0 2
## 94 94 44 0 nonanginal 108 141 0 0 175 0 0.6 2
## 62 62 46 0 nonanginal 142 177 0 2 160 1 1.4 3
## 51 51 41 0 nontypical 105 198 0 0 168 0 0.0 1
## 103 103 57 0 asymptomatic 128 303 0 2 159 0 0.0 1
## 14 14 44 1 nontypical 120 263 0 0 173 0 0.0 1
## 76 76 65 0 nonanginal 160 360 0 2 151 0 0.8 1
## 33 33 64 1 nonanginal 140 335 0 0 158 0 0.0 1
## 63 63 58 1 asymptomatic 128 216 0 2 131 1 2.2 2
## 252 252 58 1 asymptomatic 146 218 0 0 105 0 2.0 2
## 100 100 48 1 asymptomatic 122 222 0 2 186 0 0.0 1
## 56 56 54 1 asymptomatic 124 266 0 2 109 1 2.2 2
## 60 60 51 1 typical 125 213 0 2 125 1 1.4 1
## 221 221 41 0 nonanginal 112 268 0 2 172 1 0.0 1
## 255 255 43 1 asymptomatic 115 303 0 0 181 0 1.2 2
## 45 45 61 0 asymptomatic 130 330 0 2 169 0 0.0 1
## 281 281 57 1 asymptomatic 110 335 0 0 143 1 3.0 2
## 44 44 59 1 nonanginal 150 212 1 0 157 0 1.6 1
## 160 160 68 1 nonanginal 118 277 0 0 151 0 1.0 1
## 157 157 51 1 asymptomatic 140 299 0 0 173 1 1.6 1
## 270 270 42 1 nonanginal 130 180 0 0 150 0 0.0 1
## 101 101 45 1 asymptomatic 115 260 0 2 185 0 0.0 1
## 175 175 64 1 asymptomatic 145 212 0 2 132 0 2.0 2
## 64 64 54 0 nonanginal 135 304 1 0 170 0 0.0 1
## 104 104 71 0 nonanginal 110 265 1 2 130 0 0.0 1
## 61 61 51 0 asymptomatic 130 305 0 0 142 1 1.2 2
## 227 227 47 1 asymptomatic 112 204 0 0 143 0 0.1 1
## 296 296 41 1 nontypical 120 157 0 0 182 0 0.0 1
## 121 121 48 1 asymptomatic 130 256 1 2 150 1 0.0 1
## 192 192 51 1 asymptomatic 140 298 0 0 122 1 4.2 2
## 81 81 45 1 asymptomatic 104 208 0 2 148 1 3.0 2
## 241 241 41 1 nontypical 110 235 0 0 153 0 0.0 1
## 284 284 35 1 nontypical 122 192 0 0 174 0 0.0 1
## 52 52 65 1 asymptomatic 120 177 0 0 140 0 0.4 1
## 294 294 63 1 asymptomatic 140 187 0 2 144 1 4.0 1
## 151 151 52 1 typical 152 298 1 0 178 0 1.2 2
## 275 275 59 1 typical 134 204 0 0 162 0 0.8 1
## 97 97 59 1 asymptomatic 110 239 0 2 142 1 1.2 2
## 29 29 43 1 asymptomatic 150 247 0 0 171 0 1.5 1
## 71 71 65 0 nonanginal 155 269 0 0 148 0 0.8 1
## 237 237 56 1 asymptomatic 130 283 1 2 103 1 1.6 3
## 206 206 45 1 asymptomatic 142 309 0 2 147 1 0.0 2
## 146 146 47 1 nonanginal 108 243 0 0 152 0 0.0 1
## 169 169 35 1 asymptomatic 126 282 0 2 156 1 0.0 1
## 3 3 67 1 asymptomatic 120 229 0 2 129 1 2.6 2
## 116 116 41 1 nontypical 135 203 0 0 132 0 0.0 2
## 49 49 65 0 nonanginal 140 417 1 2 157 0 0.8 1
## 96 96 52 1 asymptomatic 128 255 0 0 161 1 0.0 1
## 245 245 60 0 nonanginal 120 178 1 0 96 0 0.0 1
## 48 48 50 1 asymptomatic 150 243 0 2 128 0 2.6 2
## 28 28 66 0 typical 150 226 0 0 114 0 2.6 3
## 232 232 55 0 asymptomatic 180 327 0 1 117 1 3.4 2
## 277 277 66 0 nonanginal 146 278 0 2 152 0 0.0 2
## 136 136 55 0 nontypical 135 250 0 2 161 0 1.4 2
## 132 132 51 1 nonanginal 94 227 0 0 154 1 0.0 1
## 166 166 57 1 asymptomatic 132 207 0 0 168 1 0.0 1
## 90 90 51 0 nonanginal 130 256 0 2 149 0 0.5 1
## 53 53 44 1 asymptomatic 112 290 0 2 153 0 0.0 1
## 257 257 67 0 asymptomatic 106 223 0 0 142 0 0.3 1
## 215 215 52 1 asymptomatic 112 230 0 0 160 0 0.0 1
## 111 111 61 0 asymptomatic 145 307 0 2 146 1 1.0 2
## 141 141 59 1 nontypical 140 221 0 0 164 1 0.0 1
## 105 105 49 1 nonanginal 120 188 0 0 139 0 2.0 2
## 242 242 41 0 nontypical 126 306 0 0 163 0 0.0 1
## 212 212 38 1 typical 120 231 0 0 182 1 3.8 2
## 276 276 64 1 typical 170 227 0 2 155 0 0.6 2
## 32 32 60 1 asymptomatic 117 230 1 0 160 1 1.4 1
## 286 286 58 1 asymptomatic 114 318 0 1 140 0 4.4 3
## 301 301 57 1 asymptomatic 130 131 0 0 115 1 1.2 2
## 226 226 34 0 nontypical 118 210 0 0 192 0 0.7 1
## 150 150 60 0 nonanginal 102 318 0 0 160 0 0.0 1
## 213 213 41 1 nonanginal 130 214 0 2 168 0 2.0 2
## 265 265 61 1 asymptomatic 138 166 0 2 125 1 3.6 2
## 161 161 46 1 nontypical 101 197 1 0 156 0 0.0 1
## 11 11 57 1 asymptomatic 140 192 0 0 148 0 0.4 2
## 272 272 66 1 asymptomatic 160 228 0 2 138 0 2.3 1
## 69 69 59 1 asymptomatic 170 326 0 2 140 1 3.4 3
## 228 228 67 0 nonanginal 152 277 0 0 172 0 0.0 1
## 246 246 67 1 asymptomatic 120 237 0 0 71 0 1.0 2
## 266 266 42 1 asymptomatic 136 315 0 0 125 1 1.8 2
## 240 240 42 1 nontypical 120 295 0 0 162 0 0.0 1
## 109 109 61 1 asymptomatic 120 260 0 0 140 1 3.6 2
## 224 224 53 1 asymptomatic 123 282 0 0 95 1 2.0 2
## 133 133 29 1 nontypical 130 204 0 2 202 0 0.0 1
## 191 191 50 1 nonanginal 129 196 0 0 163 0 0.0 1
## 54 54 44 1 nontypical 130 219 0 2 188 0 0.0 1
## 37 37 43 1 asymptomatic 120 177 0 2 120 1 2.5 2
## 36 36 42 1 asymptomatic 140 226 0 0 178 0 0.0 1
## 42 42 40 1 typical 140 199 0 0 178 1 1.4 1
## 299 299 45 1 typical 110 264 0 0 132 0 1.2 2
## 22 22 58 0 typical 150 283 1 2 162 0 1.0 1
## 113 113 52 1 typical 118 186 0 2 190 0 0.0 2
## 194 194 62 0 asymptomatic 138 294 1 0 106 0 1.9 2
## 211 211 37 0 nonanginal 120 215 0 0 170 0 0.0 1
## 55 55 60 1 asymptomatic 130 253 0 0 144 1 1.4 1
## 137 137 70 1 asymptomatic 145 174 0 0 125 1 2.6 3
## 77 77 60 1 asymptomatic 125 258 0 2 141 1 2.8 2
## 75 75 44 1 asymptomatic 110 197 0 2 177 0 0.0 1
## 253 253 64 1 asymptomatic 128 263 0 0 105 1 0.2 2
## 154 154 55 1 asymptomatic 160 289 0 2 145 1 0.8 2
## 202 202 64 0 asymptomatic 180 325 0 0 154 1 0.0 1
## 31 31 69 0 typical 140 239 0 0 151 0 1.8 1
## 171 171 70 1 nonanginal 160 269 0 0 112 1 2.9 2
## 184 184 59 1 typical 178 270 0 2 145 0 4.2 3
## 123 123 51 1 nonanginal 100 222 0 0 143 1 1.2 2
## 174 174 62 0 asymptomatic 140 394 0 2 157 0 1.2 2
## 259 259 70 1 nontypical 156 245 0 2 143 0 0.0 1
## 182 182 56 0 asymptomatic 134 409 0 2 150 1 1.9 2
## 295 295 63 0 asymptomatic 124 197 0 0 136 1 0.0 2
## Ca Thal AHD
## 205 0 reversable No
## 129 0 normal No
## 217 0 normal No
## 264 0 normal No
## 250 0 normal No
## 115 1 reversable Yes
## 247 1 reversable Yes
## 143 0 normal No
## 177 3 reversable No
## 145 0 reversable No
## 91 0 normal No
## 187 0 reversable No
## 291 0 reversable Yes
## 40 0 normal No
## 285 1 reversable Yes
## 223 0 normal No
## 231 0 normal No
## 271 1 reversable Yes
## 279 1 normal Yes
## 297 2 fixed Yes
## 98 2 reversable Yes
## 179 1 normal No
## 220 0 normal No
## 84 0 reversable Yes
## 199 0 normal No
## 256 0 normal No
## 290 0 normal No
## 200 0 normal Yes
## 72 2 reversable Yes
## 197 1 normal No
## 138 1 reversable Yes
## 17 0 reversable Yes
## 287 2 fixed Yes
## 269 0 reversable Yes
## 118 0 normal No
## 289 0 reversable No
## 238 0 reversable Yes
## 135 0 normal No
## 114 0 reversable Yes
## 46 1 reversable Yes
## 173 0 normal Yes
## 94 0 normal No
## 62 0 normal No
## 51 1 normal No
## 103 1 normal No
## 14 0 reversable No
## 76 0 normal No
## 33 0 normal Yes
## 63 3 reversable Yes
## 252 1 reversable Yes
## 100 0 normal No
## 56 1 reversable Yes
## 60 1 normal No
## 221 0 normal No
## 255 0 normal No
## 45 0 normal Yes
## 281 1 reversable Yes
## 44 0 normal No
## 160 1 reversable No
## 157 0 reversable Yes
## 270 0 normal No
## 101 0 normal No
## 175 2 fixed Yes
## 64 0 normal No
## 104 1 normal No
## 61 0 reversable Yes
## 227 0 normal No
## 296 0 normal No
## 121 2 reversable Yes
## 192 3 reversable Yes
## 81 0 normal No
## 241 0 normal No
## 284 0 normal No
## 52 0 reversable No
## 294 2 reversable Yes
## 151 0 reversable No
## 275 2 normal Yes
## 97 1 reversable Yes
## 29 0 normal No
## 71 0 normal No
## 237 0 reversable Yes
## 206 3 reversable Yes
## 146 0 normal Yes
## 169 0 reversable Yes
## 3 2 reversable Yes
## 116 0 fixed No
## 49 1 normal No
## 96 1 reversable Yes
## 245 0 normal No
## 48 0 reversable Yes
## 28 0 normal No
## 232 0 normal Yes
## 277 1 normal No
## 136 0 normal No
## 132 1 reversable No
## 166 0 reversable No
## 90 0 normal No
## 53 1 normal Yes
## 257 2 normal No
## 215 1 normal Yes
## 111 0 reversable Yes
## 141 0 normal No
## 105 3 reversable Yes
## 242 0 normal No
## 212 0 reversable Yes
## 276 0 reversable No
## 32 2 reversable Yes
## 286 3 fixed Yes
## 301 1 reversable Yes
## 226 0 normal No
## 150 1 normal No
## 213 0 normal No
## 265 1 normal Yes
## 161 0 reversable No
## 11 0 fixed No
## 272 0 fixed No
## 69 0 reversable Yes
## 228 1 normal No
## 246 0 normal Yes
## 266 0 fixed Yes
## 240 0 normal No
## 109 1 reversable Yes
## 224 2 reversable Yes
## 133 0 normal No
## 191 0 normal No
## 54 0 normal No
## 37 0 reversable Yes
## 36 0 normal No
## 42 0 reversable No
## 299 0 reversable Yes
## 22 0 normal No
## 113 0 fixed No
## 194 3 normal Yes
## 211 0 normal No
## 55 1 reversable Yes
## 137 0 reversable Yes
## 77 1 reversable Yes
## 75 1 normal Yes
## 253 1 reversable No
## 154 1 reversable Yes
## 202 0 normal No
## 31 2 normal No
## 171 1 reversable Yes
## 184 0 reversable No
## 123 0 normal No
## 174 0 normal No
## 259 0 normal No
## 182 2 reversable Yes
## 295 0 normal Yes
htest=heart[-train,]
htest
## X Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope
## 1 1 63 1 typical 145 233 1 2 150 0 2.3 3
## 2 2 67 1 asymptomatic 160 286 0 2 108 1 1.5 2
## 4 4 37 1 nonanginal 130 250 0 0 187 0 3.5 3
## 5 5 41 0 nontypical 130 204 0 2 172 0 1.4 1
## 6 6 56 1 nontypical 120 236 0 0 178 0 0.8 1
## 7 7 62 0 asymptomatic 140 268 0 2 160 0 3.6 3
## 8 8 57 0 asymptomatic 120 354 0 0 163 1 0.6 1
## 9 9 63 1 asymptomatic 130 254 0 2 147 0 1.4 2
## 10 10 53 1 asymptomatic 140 203 1 2 155 1 3.1 3
## 12 12 56 0 nontypical 140 294 0 2 153 0 1.3 2
## 13 13 56 1 nonanginal 130 256 1 2 142 1 0.6 2
## 15 15 52 1 nonanginal 172 199 1 0 162 0 0.5 1
## 16 16 57 1 nonanginal 150 168 0 0 174 0 1.6 1
## 18 18 54 1 asymptomatic 140 239 0 0 160 0 1.2 1
## 19 19 48 0 nonanginal 130 275 0 0 139 0 0.2 1
## 20 20 49 1 nontypical 130 266 0 0 171 0 0.6 1
## 21 21 64 1 typical 110 211 0 2 144 1 1.8 2
## 23 23 58 1 nontypical 120 284 0 2 160 0 1.8 2
## 24 24 58 1 nonanginal 132 224 0 2 173 0 3.2 1
## 25 25 60 1 asymptomatic 130 206 0 2 132 1 2.4 2
## 26 26 50 0 nonanginal 120 219 0 0 158 0 1.6 2
## 27 27 58 0 nonanginal 120 340 0 0 172 0 0.0 1
## 30 30 40 1 asymptomatic 110 167 0 2 114 1 2.0 2
## 34 34 59 1 asymptomatic 135 234 0 0 161 0 0.5 2
## 35 35 44 1 nonanginal 130 233 0 0 179 1 0.4 1
## 38 38 57 1 asymptomatic 150 276 0 2 112 1 0.6 2
## 39 39 55 1 asymptomatic 132 353 0 0 132 1 1.2 2
## 41 41 65 0 asymptomatic 150 225 0 2 114 0 1.0 2
## 43 43 71 0 nontypical 160 302 0 0 162 0 0.4 1
## 47 47 51 1 nonanginal 110 175 0 0 123 0 0.6 1
## 50 50 53 1 nonanginal 130 197 1 2 152 0 1.2 3
## 57 57 50 1 nonanginal 140 233 0 0 163 0 0.6 2
## 58 58 41 1 asymptomatic 110 172 0 2 158 0 0.0 1
## 59 59 54 1 nonanginal 125 273 0 2 152 0 0.5 3
## 65 65 54 1 asymptomatic 120 188 0 0 113 0 1.4 2
## 66 66 60 1 asymptomatic 145 282 0 2 142 1 2.8 2
## 67 67 60 1 nonanginal 140 185 0 2 155 0 3.0 2
## 68 68 54 1 nonanginal 150 232 0 2 165 0 1.6 1
## 70 70 46 1 nonanginal 150 231 0 0 147 0 3.6 2
## 73 73 62 1 asymptomatic 120 267 0 0 99 1 1.8 2
## 74 74 65 1 asymptomatic 110 248 0 2 158 0 0.6 1
## 78 78 51 0 nonanginal 140 308 0 2 142 0 1.5 1
## 79 79 48 1 nontypical 130 245 0 2 180 0 0.2 2
## 80 80 58 1 asymptomatic 150 270 0 2 111 1 0.8 1
## 82 82 53 0 asymptomatic 130 264 0 2 143 0 0.4 2
## 83 83 39 1 nonanginal 140 321 0 2 182 0 0.0 1
## 85 85 52 1 nontypical 120 325 0 0 172 0 0.2 1
## 86 86 44 1 nonanginal 140 235 0 2 180 0 0.0 1
## 87 87 47 1 nonanginal 138 257 0 2 156 0 0.0 1
## 89 89 53 0 asymptomatic 138 234 0 2 160 0 0.0 1
## 92 92 62 0 asymptomatic 160 164 0 2 145 0 6.2 3
## 93 93 62 1 nonanginal 130 231 0 0 146 0 1.8 2
## 95 95 63 0 nonanginal 135 252 0 2 172 0 0.0 1
## 99 99 52 1 nontypical 134 201 0 0 158 0 0.8 1
## 102 102 34 1 typical 118 182 0 2 174 0 0.0 1
## 106 106 54 1 nontypical 108 309 0 0 156 0 0.0 1
## 107 107 59 1 asymptomatic 140 177 0 0 162 1 0.0 1
## 108 108 57 1 nonanginal 128 229 0 2 150 0 0.4 2
## 110 110 39 1 asymptomatic 118 219 0 0 140 0 1.2 2
## 112 112 56 1 asymptomatic 125 249 1 2 144 1 1.2 2
## 117 117 58 1 nonanginal 140 211 1 2 165 0 0.0 1
## 119 119 63 1 asymptomatic 130 330 1 2 132 1 1.8 1
## 120 120 65 1 asymptomatic 135 254 0 2 127 0 2.8 2
## 122 122 63 0 asymptomatic 150 407 0 2 154 0 4.0 2
## 124 124 55 1 asymptomatic 140 217 0 0 111 1 5.6 3
## 125 125 65 1 typical 138 282 1 2 174 0 1.4 2
## 126 126 45 0 nontypical 130 234 0 2 175 0 0.6 2
## 127 127 56 0 asymptomatic 200 288 1 2 133 1 4.0 3
## 128 128 54 1 asymptomatic 110 239 0 0 126 1 2.8 2
## 130 130 62 0 asymptomatic 124 209 0 0 163 0 0.0 1
## 131 131 54 1 nonanginal 120 258 0 2 147 0 0.4 2
## 134 134 51 1 asymptomatic 140 261 0 2 186 1 0.0 1
## 139 139 35 1 asymptomatic 120 198 0 0 130 1 1.6 2
## 140 140 51 1 nonanginal 125 245 1 2 166 0 2.4 2
## 142 142 59 1 typical 170 288 0 2 159 0 0.2 2
## 144 144 64 1 nonanginal 125 309 0 0 131 1 1.8 2
## 147 147 57 1 asymptomatic 165 289 1 2 124 0 1.0 2
## 148 148 41 1 nonanginal 112 250 0 0 179 0 0.0 1
## 149 149 45 1 nontypical 128 308 0 2 170 0 0.0 1
## 152 152 42 0 asymptomatic 102 265 0 2 122 0 0.6 2
## 153 153 67 0 nonanginal 115 564 0 2 160 0 1.6 2
## 155 155 64 1 asymptomatic 120 246 0 2 96 1 2.2 3
## 156 156 70 1 asymptomatic 130 322 0 2 109 0 2.4 2
## 158 158 58 1 asymptomatic 125 300 0 2 171 0 0.0 1
## 159 159 60 1 asymptomatic 140 293 0 2 170 0 1.2 2
## 162 162 77 1 asymptomatic 125 304 0 2 162 1 0.0 1
## 163 163 54 0 nonanginal 110 214 0 0 158 0 1.6 2
## 164 164 58 0 asymptomatic 100 248 0 2 122 0 1.0 2
## 165 165 48 1 nonanginal 124 255 1 0 175 0 0.0 1
## 168 168 54 0 nontypical 132 288 1 2 159 1 0.0 1
## 170 170 45 0 nontypical 112 160 0 0 138 0 0.0 2
## 172 172 53 1 asymptomatic 142 226 0 2 111 1 0.0 1
## 176 176 57 1 asymptomatic 152 274 0 0 88 1 1.2 2
## 178 178 56 1 asymptomatic 132 184 0 2 105 1 2.1 2
## 180 180 53 1 nonanginal 130 246 1 2 173 0 0.0 1
## 181 181 48 1 asymptomatic 124 274 0 2 166 0 0.5 2
## 183 183 42 1 typical 148 244 0 2 178 0 0.8 1
## 185 185 60 0 asymptomatic 158 305 0 2 161 0 0.0 1
## 186 186 63 0 nontypical 140 195 0 0 179 0 0.0 1
## 188 188 66 1 nontypical 160 246 0 0 120 1 0.0 2
## 189 189 54 1 nontypical 192 283 0 2 195 0 0.0 1
## 190 190 69 1 nonanginal 140 254 0 2 146 0 2.0 2
## 195 195 68 0 nonanginal 120 211 0 2 115 0 1.5 2
## 196 196 67 1 asymptomatic 100 299 0 2 125 1 0.9 2
## 198 198 45 0 asymptomatic 138 236 0 2 152 1 0.2 2
## 201 201 50 0 asymptomatic 110 254 0 2 159 0 0.0 1
## 203 203 57 1 nonanginal 150 126 1 0 173 0 0.2 1
## 204 204 64 0 nonanginal 140 313 0 0 133 0 0.2 1
## 207 207 58 1 asymptomatic 128 259 0 2 130 1 3.0 2
## 208 208 50 1 asymptomatic 144 200 0 2 126 1 0.9 2
## 209 209 55 1 nontypical 130 262 0 0 155 0 0.0 1
## 210 210 62 0 asymptomatic 150 244 0 0 154 1 1.4 2
## 214 214 66 0 asymptomatic 178 228 1 0 165 1 1.0 2
## 216 216 56 1 typical 120 193 0 2 162 0 1.9 2
## 218 218 46 0 asymptomatic 138 243 0 2 152 1 0.0 2
## 219 219 64 0 asymptomatic 130 303 0 0 122 0 2.0 2
## 222 222 54 0 nonanginal 108 267 0 2 167 0 0.0 1
## 225 225 63 0 asymptomatic 108 269 0 0 169 1 1.8 2
## 229 229 54 1 asymptomatic 110 206 0 2 108 1 0.0 2
## 230 230 66 1 asymptomatic 112 212 0 2 132 1 0.1 1
## 233 233 49 1 nonanginal 118 149 0 2 126 0 0.8 1
## 234 234 74 0 nontypical 120 269 0 2 121 1 0.2 1
## 235 235 54 0 nonanginal 160 201 0 0 163 0 0.0 1
## 236 236 54 1 asymptomatic 122 286 0 2 116 1 3.2 2
## 239 239 49 0 nontypical 134 271 0 0 162 0 0.0 2
## 243 243 49 0 asymptomatic 130 269 0 0 163 0 0.0 1
## 244 244 61 1 typical 134 234 0 0 145 0 2.6 2
## 248 248 47 1 asymptomatic 110 275 0 2 118 1 1.0 2
## 249 249 52 1 asymptomatic 125 212 0 0 168 0 1.0 1
## 251 251 57 1 asymptomatic 110 201 0 0 126 1 1.5 2
## 254 254 51 0 nonanginal 120 295 0 2 157 0 0.6 1
## 258 258 76 0 nonanginal 140 197 0 1 116 0 1.1 2
## 260 260 57 1 nontypical 124 261 0 0 141 0 0.3 1
## 261 261 44 0 nonanginal 118 242 0 0 149 0 0.3 2
## 262 262 58 0 nontypical 136 319 1 2 152 0 0.0 1
## 263 263 60 0 typical 150 240 0 0 171 0 0.9 1
## 268 268 59 1 nonanginal 126 218 1 0 134 0 2.2 2
## 273 273 46 1 asymptomatic 140 311 0 0 120 1 1.8 2
## 274 274 71 0 asymptomatic 112 149 0 0 125 0 1.6 2
## 278 278 39 0 nonanginal 138 220 0 0 152 0 0.0 2
## 280 280 58 0 asymptomatic 130 197 0 0 131 0 0.6 2
## 282 282 47 1 nonanginal 130 253 0 0 179 0 0.0 1
## 283 283 55 0 asymptomatic 128 205 0 1 130 1 2.0 2
## 292 292 55 0 nontypical 132 342 0 0 166 0 1.2 1
## 293 293 44 1 asymptomatic 120 169 0 0 144 1 2.8 3
## 298 298 57 0 asymptomatic 140 241 0 0 123 1 0.2 2
## 300 300 68 1 asymptomatic 144 193 1 0 141 0 3.4 2
## 302 302 57 0 nontypical 130 236 0 2 174 0 0.0 2
## Ca Thal AHD
## 1 0 fixed No
## 2 3 normal Yes
## 4 0 normal No
## 5 0 normal No
## 6 0 normal No
## 7 2 normal Yes
## 8 0 normal No
## 9 1 reversable Yes
## 10 0 reversable Yes
## 12 0 normal No
## 13 1 fixed Yes
## 15 0 reversable No
## 16 0 normal No
## 18 0 normal No
## 19 0 normal No
## 20 0 normal No
## 21 0 normal No
## 23 0 normal Yes
## 24 2 reversable Yes
## 25 2 reversable Yes
## 26 0 normal No
## 27 0 normal No
## 30 0 reversable Yes
## 34 0 reversable No
## 35 0 normal No
## 38 1 fixed Yes
## 39 1 reversable Yes
## 41 3 reversable Yes
## 43 2 normal No
## 47 0 normal No
## 50 0 normal No
## 57 1 reversable Yes
## 58 0 reversable Yes
## 59 1 normal No
## 65 1 reversable Yes
## 66 2 reversable Yes
## 67 0 normal Yes
## 68 0 reversable No
## 70 0 normal Yes
## 73 2 reversable Yes
## 74 2 fixed Yes
## 78 1 normal No
## 79 0 normal No
## 80 0 reversable Yes
## 82 0 normal No
## 83 0 normal No
## 85 0 normal No
## 86 0 normal No
## 87 0 normal No
## 89 0 normal No
## 92 3 reversable Yes
## 93 3 reversable No
## 95 0 normal No
## 99 1 normal No
## 102 0 normal No
## 106 0 reversable No
## 107 1 reversable Yes
## 108 1 reversable Yes
## 110 0 reversable Yes
## 112 1 normal Yes
## 117 0 normal No
## 119 3 reversable Yes
## 120 1 reversable Yes
## 122 3 reversable Yes
## 124 0 reversable Yes
## 125 1 normal Yes
## 126 0 normal No
## 127 2 reversable Yes
## 128 1 reversable Yes
## 130 0 normal No
## 131 0 reversable No
## 134 0 normal No
## 139 0 reversable Yes
## 140 0 normal No
## 142 0 reversable Yes
## 144 0 reversable Yes
## 147 3 reversable Yes
## 148 0 normal No
## 149 0 normal No
## 152 0 normal No
## 153 0 reversable No
## 155 1 normal Yes
## 156 3 normal Yes
## 158 2 reversable Yes
## 159 2 reversable Yes
## 162 3 normal Yes
## 163 0 normal No
## 164 0 normal No
## 165 2 normal No
## 168 1 normal No
## 170 0 normal No
## 172 0 reversable No
## 176 1 reversable Yes
## 178 1 fixed Yes
## 180 3 normal No
## 181 0 reversable Yes
## 183 2 normal No
## 185 0 normal Yes
## 186 2 normal No
## 188 3 fixed Yes
## 189 1 reversable Yes
## 190 3 reversable Yes
## 195 0 normal No
## 196 2 normal Yes
## 198 0 normal No
## 201 0 normal No
## 203 1 reversable No
## 204 0 reversable No
## 207 2 reversable Yes
## 208 0 reversable Yes
## 209 0 normal No
## 210 0 normal Yes
## 214 2 reversable Yes
## 216 0 reversable No
## 218 0 normal No
## 219 2 normal No
## 222 0 normal No
## 225 2 normal Yes
## 229 1 normal Yes
## 230 1 normal Yes
## 233 3 normal Yes
## 234 1 normal No
## 235 1 normal No
## 236 2 normal Yes
## 239 0 normal No
## 243 0 normal No
## 244 2 normal Yes
## 248 1 normal Yes
## 249 2 reversable Yes
## 251 0 fixed No
## 254 0 normal No
## 258 0 normal No
## 260 0 reversable Yes
## 261 1 normal No
## 262 2 normal Yes
## 263 0 normal No
## 268 1 fixed Yes
## 273 2 reversable Yes
## 274 0 normal No
## 278 0 normal No
## 280 0 normal No
## 282 0 normal No
## 283 1 reversable Yes
## 292 0 normal No
## 293 0 fixed Yes
## 298 0 reversable Yes
## 300 2 reversable Yes
## 302 1 normal Yes
### carry out the fitting on the training set
AHD.test=heart$AHD[-train] #index the column directly instead of attach()ing the data frame
AHD.train=heart$AHD[train]
heart.train=heart[train,]
heart.test=heart[-train,]
heart.tree=tree(AHD~., data=heart.train)
heartcv=cv.tree(heart.tree, FUN=prune.misclass)
plot(heartcv$size,heartcv$dev,type='b')
bestsize=heartcv$size[which.min(heartcv$dev)] #a single size, even if several sizes tie
heart.prune=prune.misclass(heart.tree,best=bestsize)
print(heart.prune)
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 149 204.600 No ( 0.55705 0.44295 )
## 2) Thal: normal 78 76.370 No ( 0.80769 0.19231 )
## 4) ChestPain: nonanginal,nontypical 47 22.310 No ( 0.93617 0.06383 ) *
## 5) ChestPain: asymptomatic,typical 31 41.380 No ( 0.61290 0.38710 )
## 10) MaxHR < 177.5 25 34.620 No ( 0.52000 0.48000 )
## 20) Oldpeak < 0.05 9 9.535 Yes ( 0.22222 0.77778 ) *
## 21) Oldpeak > 0.05 16 19.870 No ( 0.68750 0.31250 )
## 42) X < 184 8 0.000 No ( 1.00000 0.00000 ) *
## 43) X > 184 8 10.590 Yes ( 0.37500 0.62500 ) *
## 11) MaxHR > 177.5 6 0.000 No ( 1.00000 0.00000 ) *
## 3) Thal: fixed,reversable 71 84.430 Yes ( 0.28169 0.71831 )
## 6) Oldpeak < 0.7 22 28.840 No ( 0.63636 0.36364 )
## 12) ChestPain: nonanginal,nontypical,typical 8 0.000 No ( 1.00000 0.00000 ) *
## 13) ChestPain: asymptomatic 14 19.120 Yes ( 0.42857 0.57143 )
## 26) Chol < 233.5 7 8.376 No ( 0.71429 0.28571 ) *
## 27) Chol > 233.5 7 5.742 Yes ( 0.14286 0.85714 ) *
## 7) Oldpeak > 0.7 49 36.430 Yes ( 0.12245 0.87755 ) *
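plot(heart.prune) #draw the pruned tree first; text() only adds labels to an existing plot
text(heart.prune, pretty=0)
pred.train=predict(heart.prune,newdata=heart.train,type="class") #predict.tree takes newdata, not data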
ctable=table(AHD.train,pred.train)
print(ctable)
## pred.train
## AHD.train No Yes
## No 71 12
## Yes 5 61
trainrate= (sum(ctable)-sum(diag(ctable)))/sum(ctable)
pred.test=predict(heart.prune,heart.test,type="class")
ctable=table(AHD.test,pred.test)
print(ctable)
## pred.test
## AHD.test No Yes
## No 59 18
## Yes 21 50
testrate= (sum(ctable)-sum(diag(ctable)))/sum(ctable)
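The values of `trainrate` and `testrate` are not printed above. The sketch below recomputes them from the two confusion tables, re-entered by hand; the names `train_tab`, `test_tab`, and `misclass` are made up for illustration.

```r
# Confusion tables copied from the output above (rows: actual, columns: predicted)
labs <- list(actual = c("No", "Yes"), pred = c("No", "Yes"))
train_tab <- matrix(c(71, 5, 12, 61), nrow = 2, dimnames = labs)
test_tab  <- matrix(c(59, 21, 18, 50), nrow = 2, dimnames = labs)

# Misclassification rate: share of cases off the diagonal
misclass <- function(tab) 1 - sum(diag(tab)) / sum(tab)

misclass(train_tab)  # 17 / 149, about 0.114
misclass(test_tab)   # 39 / 148, about 0.264
```

The test error is more than twice the training error, which is consistent with the earlier remark that an error rate assessed on the fitting data is too optimistic.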
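\[E=\frac{1}{n^2} \sum_{i,j} |x_i - x_j| = \bar{d}\]
\[M=\frac{1}{n}\sum_{i} x_i\]
\[G=\frac{E}{2M}\]
Here \(E\) is the mean \(\bar{d}\) of the pairwise absolute differences \(d_{ij}=|x_i-x_j|\) and \(M\) is the mean, so \(G\) is half the ratio of the mean difference to the mean.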
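The definition above translates directly into a few lines of R. This is a sketch on made-up vectors; the function name `gini_coef` is hypothetical.

```r
# Gini coefficient as half the mean absolute difference divided by the mean
gini_coef <- function(x) {
  E <- mean(abs(outer(x, x, "-")))  # (1/n^2) * sum over all pairs of |x_i - x_j|
  M <- mean(x)                      # (1/n) * sum of x_i
  E / (2 * M)
}

gini_coef(c(1, 1, 1, 1))  # 0: all values equal
gini_coef(c(0, 0, 0, 4))  # 0.75: one value holds the whole total
```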
There are some alternative measures, for instance generalized entropy indices, the Atkinson index, the Piesch index, and the Kakwani index. Each measure can exhibit different properties.
\[G=\sum_{i \neq j } p_i p_j = 1-p_1^2 -p_2^2 \quad \text{(two classes)}\]
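The two expressions for \(G\) above agree whenever the class probabilities sum to one, because \((p_1+p_2)^2=1\). The sketch below checks this numerically, using the root-node class proportions from the first tree output as the example probabilities.

```r
# Class proportions at the root node, copied from the first tree output
p <- c(0.53872, 0.46128)

# Sum of p_i * p_j over all ordered pairs with i != j
pp <- outer(p, p)
g_pairs <- sum(pp) - sum(diag(pp))

# One minus the sum of squared probabilities
g_squares <- 1 - sum(p^2)

all.equal(g_pairs, g_squares)
```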
\[1-F\]