Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Homework 2: Stat 4690: Statistical Analysis of Networks

Online submission through carmen due on Tuesday, February 13, at 11:59 PM

library(igraph)

football = read_graph("~/Downloads/football.gml", format="gml")

vertex_attr(football)

## $id

## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

## [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

## [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53

## [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71

## [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89

## [91] 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107

## [109] 108 109 110 111 112 113 114

##

## $label

## [1] "BrighamYoung" "FloridaState" "Iowa"

## [4] "KansasState" "NewMexico" "TexasTech"

## [7] "PennState" "SouthernCalifornia" "ArizonaState"

## [10] "SanDiegoState" "Baylor" "NorthTexas"

## [13] "NorthernIllinois" "Northwestern" "WesternMichigan"

## [16] "Wisconsin" "Wyoming" "Auburn"

## [19] "Akron" "VirginiaTech" "Alabama"

## [22] "UCLA" "Arizona" "Utah"

## [25] "ArkansasState" "NorthCarolinaState" "BallState"

## [28] "Florida" "BoiseState" "BostonCollege"

## [31] "WestVirginia" "BowlingGreenState" "Michigan"

## [34] "Virginia" "Buffalo" "Syracuse"

## [37] "CentralFlorida" "GeorgiaTech" "CentralMichigan"

## [40] "Purdue" "Colorado" "ColoradoState"

## [43] "Connecticut" "EasternMichigan" "EastCarolina"

## [46] "Duke" "FresnoState" "OhioState"

## [49] "Houston" "Rice" "Idaho"

## [52] "Washington" "Kansas" "SouthernMethodist"

## [55] "Kent" "Pittsburgh" "Kentucky"

## [58] "Louisville" "LouisianaTech" "LouisianaMonroe"

## [61] "Minnesota" "MiamiOhio" "Vanderbilt"

## [64] "MiddleTennesseeState" "Illinois" "MississippiState"

## [67] "Memphis" "Nevada" "Oregon"

## [70] "NewMexicoState" "SouthCarolina" "Ohio"

## [73] "IowaState" "SanJoseState" "Nebraska"

## [76] "SouthernMississippi" "Tennessee" "Stanford"

## [79] "WashingtonState" "Temple" "Navy"

## [82] "TexasA&M" "NotreDame" "TexasElPaso"

## [85] "Oklahoma" "Toledo" "Tulane"

## [88] "Mississippi" "Tulsa" "NorthCarolina"

## [91] "UtahState" "Army" "Cincinnati"

## [94] "AirForce" "Rutgers" "Georgia"

## [97] "LouisianaState" "LouisianaLafayette" "Texas"

## [100] "Marshall" "MichiganState" "MiamiFlorida"

## [103] "Missouri" "Clemson" "NevadaLasVegas"

## [106] "WakeForest" "Indiana" "OklahomaState"

## [109] "OregonState" "Maryland" "TexasChristian"

## [112] "California" "AlabamaBirmingham" "Arkansas"

## [115] "Hawaii"

##

## $value

## [1] 7 0 2 3 7 3 2 8 8 7 3 10 6 2 6 2 7 9 6 1 9 8 8 7 10

## [26] 0 6 9 11 1 1 6 2 0 6 1 5 0 6 2 3 7 5 6 4 0 11 2 4 11

## [51] 10 8 3 11 6 1 9 4 11 10 2 6 9 10 2 9 4 11 8 10 9 6 3 11 3

## [76] 4 9 8 8 1 5 3 5 11 3 6 4 9 11 0 5 4 4 7 1 9 9 10 3 6

## [101] 2 1 3 0 7 0 2 3 8 0 4 8 4 9 11

1. (60 points) The following questions are related to community detection in network. We will use the US college football dataset attached with this homework in the .gml format. You may use the above code to read the graph in R using igraph. The function vertex_attr() above queries vertex attributes of the graph. The vertices can be identified by their ids as well as the labels which are the actual names of college football teams. The third attribute “value” contains information on which conference the team belongs to.

(a) (15) Cluster the graph using the Louvain algorithm to optimize modularity score. Report how many communities you obtain, their sizes (i.e., how many vertices are there in each community), and the modularity value of the detected community structure.

(b) (15) Visualize the community structure in the graph. To do so, plot the football network and color the vertices by the communities you have detected in the previous step (you may also draw a color polygon covering the vertices in the same community to viualize the community structure). Show the vertex names. Make the plot nice enough so that the communities can be clearly seen.

(c) (15) How well the detected community structure match the conference the colleges belong to? The conference information can be obtained from the value field in the vertex attributes. You may use the compare() function in igraph with method=“nmi“. A high value indicates good agreement between two community assignment vectors

(d) (15) Identify the names of the other schools which are in the same community as OhioState.

2. (45 points) The following questions are based on some probability calculations on the ER random graph.

(a) (10) What is the expected number of q cliques (2 < q < n), i.e., a clique involving q vertices, in a random graph obtained from the Erdos-Renyi G(n, p) model. (b) (10) Suppose you observe a random graph with n vertices and find there are m edges. Then find an estimate for the expected number of triangles in the graph. [Hint: Obtain the MLE for the probability of an edge parameter p, and then plug in the MLE in the formula for expected number of triangles]

(c) (15) Let Xn be non-negative random variable. If E[Xn] → 0, then show that P(Xn = 0) → 1. [Hint: Use Markov inequality]. Now let Xn indicate the number of 4-cliques in a Erdos-Renyi random graph. Use the above result and your expectation calculation from part (a) to show that P(Xn = 0) → 1 if np → 0.

(d) (10) Let Gn be a random graph from the model G(n, 1/2). Using the theorems we have seen in class, determine whether the probability that Gn contains at least one triangle converges to 0 or 1, i.e., P(Gn has at least one triangle ) → 1 or 0, as n → ∞. No detailed proof required - just apply the result of the theorem in lecture.

The total possible points are 105, while your grade will be recorded as out of 100. Therefore if you make no mistakes in the homework then you will receive 105/100. Therefore effectively the homework has 5 bonus points that everyone who submits on time receives.