On the development of a satisfaction survey instrument using PageRank centrality.

Author: Kim, In-Jae

    Questionnaires have been recognized as one of the most popular survey instruments because they are more economical and convenient than other instruments and can be administered to large numbers of people (Rosenthal and Rosnow, 1984). Constructing a questionnaire with structured questions, however, usually involves a lengthy and tedious process. The two most critical issues in designing a survey questionnaire are determining the key areas that represent all significant aspects of the survey and developing valid, reliable, and precise questions for each area. Although questionnaires are widely used by many organizations, poorly worded questions and lengthy questionnaires often lead respondents to answer carelessly or insincerely, producing biased and meaningless answers. Survey questions should be chosen carefully so that the purpose of the survey can be measured effectively, yet questionnaire developers often fall into the trap of 'overkill', which can harm the survey. Some questions may be conceptually related to other questions, similar in wording to them, or mere duplicates of them. When this happens, respondents' undesirable responses can reduce the effectiveness of the survey.

    To overcome such drawbacks, this paper introduces PageRank centrality as a tool for developing a parsimonious survey instrument. Our goal is not to dwell on the wording of survey questions, but to focus on how the number of survey questions can be reduced effectively, where appropriate, without damaging the gist of the survey. After the conceptual relationships among survey questions are examined, PageRank centrality scores are obtained for all survey questions, and a small number of central questions that still represent the theme of the survey are selected. The mathematical background of PageRank centrality and an illustrative example are presented in the following sections.


    To explain the mathematical foundations of PageRank centrality, we first introduce some basic concepts from network theory.

    A network is a graphical configuration consisting of dots and lines/curves connecting dots. The dots are called vertices and lines/curves are called edges. If a vertex i is connected to a vertex j by an edge, then we say that vertex i is adjacent to vertex j, or vertex j is a neighbor of vertex i. The number of neighbors of a vertex i is called the degree of vertex i. If the edge between vertices i and j has a direction, for instance, from vertex i to j, then the directed edge is called an arc from vertex i to j. This arc is an out-going arc from vertex i and an in-coming arc into vertex j. An edge between vertices i and j can be considered as two different arcs with opposite directions between vertices i and j. The number of out-going arcs from a vertex i is called its out-degree, and the number of in-coming arcs into the vertex i is called its in-degree.
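    The terms above can be made concrete with a small sketch. The four-vertex network below is hypothetical (it is not the example network from the paper), and the function names are chosen only for illustration. Each arc is stored as a pair (i, j), meaning an arc from vertex i to vertex j.

```python
from collections import Counter

# Hypothetical directed network on vertices 1..4 (not the paper's example).
# Each pair (i, j) is an arc from vertex i to vertex j.
arcs = [(1, 2), (1, 4), (2, 4), (3, 4), (4, 1)]


def edge_to_arcs(i, j):
    """View an undirected edge between i and j as two arcs with opposite directions."""
    return [(i, j), (j, i)]


def degrees(arcs):
    """Return (out_degree, in_degree) counters for a list of arcs."""
    out_degree = Counter(tail for tail, _ in arcs)  # arcs leaving each vertex
    in_degree = Counter(head for _, head in arcs)   # arcs entering each vertex
    return out_degree, in_degree


out_degree, in_degree = degrees(arcs)
print("out-degree:", dict(out_degree))
print("in-degree:", dict(in_degree))
```

    In this sketch a vertex with no in-coming arcs, such as vertex 3, simply has in-degree 0, because a Counter returns 0 for keys it has never seen.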

    The arc (or edge) dynamics among the vertices of a network can be captured in an algebraic object, the adjacency matrix of the network.

    Definition 1

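    The adjacency matrix $A = [a_{ij}]$ of a network is defined as follows:

    $$a_{ij} = \begin{cases} 1 & \text{if there is an arc from vertex } j \text{ to vertex } i, \\ 0 & \text{otherwise.} \end{cases}$$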

    The order of A is equal to the number of vertices in the network.
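    As a minimal sketch, an adjacency matrix can be built from a list of arcs as follows. The sketch assumes the convention that the (i, j)-entry is 1 when there is an arc from vertex j to vertex i, which is the convention under which the row sums of A give in-degrees. The network is hypothetical and is not the example network from the paper; the function name adjacency_matrix is illustrative.

```python
def adjacency_matrix(n, arcs):
    """Build an n-by-n adjacency matrix from arcs given as (tail, head) pairs.

    Assumed convention: A[i-1][j-1] = 1 when there is an arc from
    vertex j to vertex i, and 0 otherwise (vertices are numbered from 1).
    """
    A = [[0] * n for _ in range(n)]
    for tail, head in arcs:
        A[head - 1][tail - 1] = 1  # row = receiving vertex, column = sending vertex
    return A


# Hypothetical network: arcs 1->2, 1->4, 2->4, 3->4, 4->1.
arcs = [(1, 2), (1, 4), (2, 4), (3, 4), (4, 1)]
A = adjacency_matrix(4, arcs)
for row in A:
    print(row)
```

    The order of the resulting matrix is 4, matching the number of vertices in the hypothetical network.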



    The adjacency matrix of the above network is


    Using the arc dynamics of the network, one can tell which vertices are more central (important, popular, etc.) than others by comparing their centrality scores. In the following section, we define centrality scores and discuss how to measure them.


    A centrality measure can be used to find the most important or central vertices in a network. A simple centrality measure in a network is just the in-degree of a vertex. We can consider each in-coming arc into a vertex $i$ as one "centrality point" for vertex $i$, i.e., the in-degree centrality score $d_i$ of vertex $i$ is $d_i = \sum_j a_{ij}$, where $a_{ij}$ is the $(i,j)$-entry of the adjacency matrix $A$ of the network. In a social network, for instance, it seems reasonable to assume that individuals who have connections to many others (getting many centrality points from others) might have more influence, more access to information, or more prestige than those who have fewer connections (Newman, 2010). Note that the in-degree centrality score $d_i$ is equal to the $i$th row sum of $A$. In the example in Section 2, the vertex with the highest in-degree centrality score is vertex 4 because the fourth row sum is greater than the others. For in-degree centrality, we treat all neighbors as equivalent: every neighbor contributes exactly one "centrality point". In reality, however, the vertices in a network are not likely to be equivalent. Eigenvector centrality treats the neighbors differently, giving each vertex a score proportional to the sum of the scores of its neighbors. Here is how it works.
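    The row-sum description of in-degree centrality can be sketched in a few lines. The matrix below belongs to a hypothetical four-vertex network (arcs 1->2, 1->4, 2->4, 3->4, 4->1), not to the example network of Section 2, and it assumes that the (i, j)-entry is 1 when there is an arc from vertex j to vertex i.

```python
# Adjacency matrix of a hypothetical network; row i lists the arcs
# coming INTO vertex i+1 (assumed convention, see the lead-in).
A = [
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]


def in_degree_centrality(A):
    """Return the in-degree centrality scores, i.e. the row sums of A."""
    return [sum(row) for row in A]


d = in_degree_centrality(A)
# Vertex label (1-based) of a vertex with the largest score.
most_central = max(range(len(d)), key=lambda i: d[i]) + 1
print("scores:", d)
print("most central vertex:", most_central)
```

    Every in-coming arc counts for exactly one point here, regardless of how central the sending vertex is, which is the limitation that eigenvector centrality is meant to address.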

    We first make some initial guess about the centrality $x_i$ of each vertex $i$, say $x_i = 1$ for each $i$. We use this not-so-useful measure to compute a better one, $x'_i$. We define $x'_i$ to be the sum of the centralities of vertex $i$'s neighbors, i.e.,

    $x'_i = \ldots$
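    The update described in words above can be sketched as follows. Because the formula is cut off in this copy, the sketch assumes that "the sum of the centralities of vertex i's neighbors" means the sum of the current scores of the vertices with an arc into vertex i, weighted by the entries of row i of the adjacency matrix. The matrix is that of a hypothetical four-vertex network, and the function name update is illustrative.

```python
# Adjacency matrix of a hypothetical network (arcs 1->2, 1->4, 2->4, 3->4, 4->1),
# with A[i][j] = 1 when there is an arc from vertex j+1 to vertex i+1.
A = [
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]


def update(A, x):
    """One update step (assumed form): new score of vertex i is the
    sum over j of A[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


x0 = [1, 1, 1, 1]    # initial guess: every vertex has centrality 1
x1 = update(A, x0)   # after one step the scores equal the row sums of A
x2 = update(A, x1)   # a second step reweights each neighbor by its new score
print(x1)
print(x2)
```

    Starting from the all-ones guess, the first step reproduces the in-degree scores; the second step already distinguishes vertices whose in-coming arcs originate from more central vertices.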
