Main path analysis
Main path analysis was first proposed by Hummon and Doreian[1]. It is a mathematical tool to identify the major paths in a directed acyclic graph (DAG), typically citation network. The method begins by measuring the significance of all the links in a citation network through the concept of ‘traversal count’ and then sequentially chains the most significant links into a ‘main path’, which is deemed the most significant historical path in the target DAG. The method is applicable to any human activity that can be organized in the form of a DAG. The most common applications are tracing the knowledge flow paths or development trajectories of a science or technology field through bibliographic citations or patent citations[2][3][4]. It has also been applied to judicial decision to trace the evolving changes of legal opinion[5].
History
Main path analysis is first propoed in Hummon and Doreian[1] in which they suggest a different approach for analyzing a citation network "where the connective threads through a network are preserved and the focus is on the links in the network rather than on the nodes."[1] They call the resulting chain of the most used citation links "main path" and claim that "It is our intuition that the main path, selected on the basis of the most used path will identify the main stream of a literature." The idea was verified using a set of DNA research articles. To make the method more practical, Liu and Lu[6] proposed the key-route search. The most useful feature of the key-route search is that one is able to view different level of main paths by adjusting the number key-routes.
The method
Main path analysis operates in two steps. The first step obtains the traversal count of each link in a DAG. Several algorithms are mentioned in the literature to calculate traversal count. The second step searches for the main paths by linking the significant links according to the size of traversal counts. Hereafter, the method is explained assuming the the DAG is a citation network.
Preparing a citation network
It is necessary to prepare a citation network before starting main path analysis. In a citation network, the nodes are the documents such as academic articles, patents, or legal cases. These nodes a connected using citation information. Citation networks are by nature directed. They are directed because the two nodes associated with a link is not symmetrical in their roles. As regards to the direction, this article adopts the convention that the cited node points to the citing node, signifying the fact that knowledge in the cited node flows to the citing node. Citation network are also by nature acyclic, which means that a node can never chain back to itself following the directed link. Several terms related to citation network are defined here before proceeding further .
Heads and tails: Heads are the nodes the direction arrow leads to. Tails are the nodes on other end of the direction arrow.
Sources and sinks: Sources are the nodes that are cited, but cite no others. Sinks cite other nodes, but are not cited.
Ancestors and descendants: Ancestors are the nodes that can be traced back to. Descendants are the nodes that one can reach following the links.
Traversal counts
Traversal counts measures the significance of a link. Literature discusses several types of traversal counts, including search path count (SPC), search path link count (SPLC), search path node pair (SPNP), and other variations. All these traversal counts will be noted as SPX.
Search path count (SPC)
A link’s SPC is the number of times the link is traversed if one runs through all possible paths from all the sources to all the sinks. SPC is first proposed by Vladimir Batagelj[7].
Search path link count (SPLC)
A link’s SPLC is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the sinks. SPLC is first proposed by Hummon and Doreian[1].
Search path node pair (SPNP)
A link’s SPNP is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the descendants of the head node (including itself). SPNP is first proposed by Hummon and Doreian[1].
Path search
Based on the traversal counts, one can then search for the most significant path(s). There are several ways of finding them, including local search, global, and key-route search.
Local search
This is mentioned in Hummon and Doreian[1] as "priority first" search. In the search process, local search chooses the next link with the highest SPX as the outgoing link. It keeps tracking the most traversed link thus obtain the main stream among all citation chains.
Global search
Global search simply suggest the path with the largest overall SPX. The concept of global search is similar to critical path method in project scheduling.
Key-route search
Key-route search[6] is designed to avoid the problem of missing significant links in both the local and global search.
The Variants
In addition to the key-route search approach, variations of the method include approaches that are aggregative and stochastic[8], consider decay in knowledge diffusion[9], etc.
Applications
The method has been applied to three types of documenting system that has the tradition of making references to the previous documents. They are the academic article, patent, and legal case system.
Academic article
Patent
Legal case
Software Implementation
Main path analysis is implemented in Pajek, a widely used freeware written by Vladimir Batagelj and Andrej Mrvar of University of Ljubljana, Slovenia. To run main path analysis in Pajek, one needs to prepare a citation network and have Pajek reads in the network. Next, in the Pajek main menu, computes the traversal counts of all links in the network applying one of the following command sequences.
Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPC), or
Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPLC), or
Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Node Pairs (SPNP)
After traversal counts are computed, the following command sequences find the main paths.
For local main paths
Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Forward
For global main paths
Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Standard
For local key-route main paths
Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Key-Route
For global key-route main paths
Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Key-Route
References
- ^ a b c d e f Hummon, Norman P.; Doreian, Patrick. "Connectivity in a citation network: The development of DNA theory". Social Networks. 11 (1): 39–63. doi:10.1016/0378-8733(89)90017-8.
- ^ Liu, John S.; Lu, Louis Y.Y.; Lu, Wen-Min; Lin, Bruce J.Y. "Data envelopment analysis 1978–2010: A citation-based literature survey". Omega. 41 (1): 3–15. doi:10.1016/j.omega.2010.12.006.
- ^ Verspagen, Bart (2007-03-01). "Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research". Advances in Complex Systems. 10 (01): 93–115. doi:10.1142/S0219525907000945. ISSN 0219-5259.
- ^ Lucio-Arias, Diana; Leydesdorff, Loet (2008-10-01). "Main-path analysis and path-dependent transitions in HistCite™-based historiograms". Journal of the American Society for Information Science and Technology. 59 (12): 1948–1962. doi:10.1002/asi.20903. ISSN 1532-2890.
- ^ Liu, John S.; Chen, Hsiao-Hui; Ho, Mei Hsiu-Ching; Li, Yu-Chen (2014-12-01). "Citations with different levels of relevancy: Tracing the main paths of legal opinions". Journal of the Association for Information Science and Technology. 65 (12): 2479–2488. doi:10.1002/asi.23135. ISSN 2330-1643.
- ^ a b Liu, John S.; Lu, Louis Y.Y. (2012-03-01). "An integrated approach for main path analysis: Development of the Hirsch index as an example". Journal of the American Society for Information Science and Technology. 63 (3): 528–542. doi:10.1002/asi.21692. ISSN 1532-2890.
- ^ Batagelj, V. (2003). Efficient algorithms for citation network analysis. arXiv preprint cs/0309023.
- ^ Yeo, Woondong; Kim, Seonho; Lee, Jae-Min; Kang, Jaewoo (2014-01-01). "Aggregative and stochastic model of main path identification: a case study on graphene". Scientometrics. 98 (1): 633–655. doi:10.1007/s11192-013-1140-3. ISSN 0138-9130.
- ^ Liu, John S.; Kuan, Chung-Huei (2016-02-01). "A new approach for main path analysis: Decay in knowledge diffusion". Journal of the Association for Information Science and Technology. 67 (2): 465–476. doi:10.1002/asi.23384. ISSN 2330-1643.