Homophily and Community Structure in Networks

Author: PRITHA DEV
Date: 01 April 2016
DOI: http://doi.org/10.1111/jpet.12142
Published date: 01 April 2016
HOMOPHILY AND COMMUNITY STRUCTURE IN NETWORKS
PRITHA DEV
ITAM
Abstract
This paper proposes a strategy to estimate the community structure for
a network accounting for the empirically established fact that commu-
nities and links are formed based on homophily. It presents a maxi-
mum likelihood approach to rank community structures where the set
of possible community structures depends on the set of salient charac-
teristics and the probability of a link between two nodes varies accord-
ing to the characteristics of the two nodes. This approach has good
large sample properties, which lead to a practical algorithm for the estimation.
To exemplify the approach, it is applied to data collected from
four village clusters in Ghana.¹
1. Introduction
This paper examines homophily in networks with a particular focus on extracting the
salient characteristics for the fragmentation of networks. Homophily has long been
documented in the sociology literature, as is evident from the seminal paper by Lazarsfeld
and Merton (1954). McPherson, Smith-Lovin, and Cook (2001) provide an excellent
overview of the vast literature studying homophily.² This literature points to networks
organizing into homophilous groups, with more links between members of the
same homophilous group than across groups. In this paper, an empirical methodol-
ogy is outlined to estimate the salient characteristics that lead to the formation of these
homophilous groups.
In particular, this paper seeks to uncover the community structure underlying a net-
work while directly accounting for the fact that communities within a network arise due
to homophily. A community is defined as a collection of nodes in which each member
of the community is more likely to have links with nodes from the community than with
¹ I would like to thank Chris Udry and Markus Goldstein for providing free access to data collected by
them from four villages in the Eastern Region of Ghana.
² In the field of economics, a recent paper by Currarini, Jackson, and Pin (2009) shows the presence of
homophily in friendship networks.
I am thankful to Hans Haller for his suggestion regarding the name of the paper and to Alberto Bisin,
Prabal De, Kaushal Kishore, Guido Ruta, Julia Schwenkenberg, and Joerg Stoye for helpful comments
and suggestions.
I would also like to thank seminar participants at the LSU Networks Conference 2013. Support from
the Asociación Mexicana de Cultura A.C. is gratefully acknowledged.
Received September 21, 2014; Accepted October 13, 2014.
© 2014 Wiley Periodicals, Inc.
Journal of Public Economic Theory, 18 (2), 2016, pp. 268–290.
Figure 1: Raw network data.
(a) Sorted by Color (b) Sorted by Shape
Figure 2: Rearranged data.
nodes outside the community. A community structure is then the collection of all such
communities in a population. The novelty of this paper lies in focusing only on those
community structures that satisfy homophily along some characteristics. This greatly
reduces the costs of searching for the community structure for any given network.
In this paper, each node in a network is assigned an identity where identity is
defined along different dimensions and each dimension is composed of a fixed set
of discrete characteristics. Each node’s identity characteristic vector³ consists of one
characteristic from each of these dimensions. This paper then seeks to find the dimensions
that lead to the fragmentation of networks. To fix ideas, consider the network in
Figure 1, where nodes have identity along the dimensions of Color (white/black) and
Shape (square/triangle). Judging by this figure, it is not clear which, if any, dimension
of identity is more important in partitioning the network. In Figures 2(a) and (b)
the network data are rearranged by Color and by Shape, respectively. These figures
make it visually clear that Shape is more important in generating the link data. The
estimation strategy is a generalization of the ideas presented in Figures 1 and 2(a)
and (b). It involves assigning a likelihood to each possible partition, together with
its probabilities of link formation (which vary by identity), and picking the partition
and corresponding link probabilities that maximize the likelihood.
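The comparison between partitions can be illustrated with a minimal sketch. It is not the estimator developed in the paper; it assumes a simplified two-parameter model in which a pair of nodes links with one probability if they share a characteristic on the chosen dimension and with another probability otherwise. The toy network, the function name `partition_log_likelihood`, and the variable names are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy network: 8 nodes, each with a Color and a Shape.
nodes = {
    0: {"Color": "white", "Shape": "square"},
    1: {"Color": "black", "Shape": "square"},
    2: {"Color": "white", "Shape": "square"},
    3: {"Color": "black", "Shape": "square"},
    4: {"Color": "white", "Shape": "triangle"},
    5: {"Color": "black", "Shape": "triangle"},
    6: {"Color": "white", "Shape": "triangle"},
    7: {"Color": "black", "Shape": "triangle"},
}
# Undirected links stored as (i, j) with i < j; mostly within Shape.
links = {(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 7), (6, 7), (3, 4)}


def partition_log_likelihood(nodes, links, dimension):
    """Maximized Bernoulli log-likelihood when communities are the groups
    of nodes sharing a characteristic on `dimension` (assumed model:
    one link probability within groups, another across groups)."""
    same = {"pairs": 0, "links": 0}
    diff = {"pairs": 0, "links": 0}
    for i, j in combinations(sorted(nodes), 2):
        bucket = same if nodes[i][dimension] == nodes[j][dimension] else diff
        bucket["pairs"] += 1
        bucket["links"] += (i, j) in links  # True counts as 1
    total = 0.0
    for bucket in (same, diff):
        n, k = bucket["pairs"], bucket["links"]
        if n == 0 or k == 0 or k == n:
            continue  # estimated probability is 0 or 1: contributes zero
        p = k / n  # maximum likelihood estimate of the link probability
        total += k * math.log(p) + (n - k) * math.log(1 - p)
    return total


scores = {d: partition_log_likelihood(nodes, links, d) for d in ("Color", "Shape")}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy network most links join nodes of the same Shape, so the Shape partition attains the higher likelihood, mirroring the visual argument made with Figures 2(a) and (b).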
The most important contribution of the paper comes from the fact that by account-
ing for homophily, the space of possible community structures can be reduced dramat-
ically. Suppressing network data, suppose the nodes are of the form given in Figure 3.
As in the previous figures, the nodes have characteristics along the dimension of Shape
and Color. Disregarding their characteristics, any partition defined over the set of nodes
is a possible community structure. Two such examples of the community structure are
given in Figure 4. The number of possible community structures is a function of
the number of nodes and grows at an increasing rate with that number. Argyle (2013)
illustrates this: with 10 nodes there are 115,975 possible community structures, while
with 25 nodes there are 4,638,590,332,330,743,949. A physical
search over all the possible community structures becomes impractical for a large
³ Note that, under the definition used in this paper, the identity characteristic vector
attached to a node need not be unique.
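The growth in the number of unrestricted partitions can be checked with a short sketch. The count of partitions of an n-element set is the Bell number, computed below with the Bell triangle in exact integer arithmetic. For contrast, the second function enumerates candidate structures under an assumption made here purely for illustration: that a homophily-consistent structure groups nodes by their characteristics on some subset of the dimensions, which gives 2^D candidates for D dimensions regardless of the number of nodes. The formal definition used in the paper is not part of this excerpt, and the names `bell_number`, `candidate_structures`, and `toy_nodes` are hypothetical.

```python
from itertools import combinations


def bell_number(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def candidate_structures(nodes, dimensions):
    """Yield (subset, communities) for every subset of dimensions.

    Assumption for illustration: a candidate structure groups together
    the nodes that agree on every dimension in the subset."""
    for size in range(len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            groups = {}
            for node, traits in nodes.items():
                key = tuple(traits[d] for d in subset)
                groups.setdefault(key, set()).add(node)
            yield subset, list(groups.values())


toy_nodes = {
    0: {"Color": "white", "Shape": "square"},
    1: {"Color": "black", "Shape": "square"},
    2: {"Color": "white", "Shape": "triangle"},
    3: {"Color": "black", "Shape": "triangle"},
}
candidates = list(candidate_structures(toy_nodes, ["Color", "Shape"]))

print(bell_number(10))   # unrestricted partitions of 10 nodes
print(len(candidates))   # candidates under the assumed subset-of-dimensions rule
```

With two dimensions the assumed rule yields four candidates (no split, Color, Shape, and Color with Shape), whereas the unrestricted count for 10 nodes already matches the 115,975 figure quoted in the text.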
