Jump to content

Main path analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Johnliu.tw (talk | contribs) at 02:01, 17 August 2017 (Added more contents.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Main path analysis was first proposed by Hummon and Doreian[1]. It is a mathematical tool to identify the major paths in a directed acyclic graph (DAG), typically citation network. The method begins by measuring the significance of all the links in a citation network through the concept of ‘traversal count’ and then sequentially chains the most significant links into a ‘main path’, which is deemed the most significant historical path in the target DAG. The method is applicable to any human activity that can be organized in the form of a DAG. The most common applications are tracing the knowledge flow paths or development trajectories of a science or technology field through bibliographic citations or patent citations[2][3][4]. It has also been applied to judicial decision to trace the evolving changes of legal opinion[5].

History

Main path analysis is first propoed in Hummon and Doreian[1] in which they suggest a different approach for analyzing a citation network "where the connective threads through a network are preserved and the focus is on the links in the network rather than on the nodes."[1] They call the resulting chain of the most used citation links "main path" and claim that "It is our intuition that the main path, selected on the basis of the most used path will identify the main stream of a literature." The idea was verified using a set of DNA research articles. To make the method more practical, Liu and Lu[6] proposed the key-route search. The most useful feature of the key-route search is that one is able to view different level of main paths by adjusting the key-route numbers.

The method

Main path analysis operates in two steps. The first step obtains the traversal count of each link in a DAG. Several algorithms are mentioned in the literature to calculate traversal count. The second step searches for the main paths by linking the significant links according to the size of traversal counts. Hereafter, the method is explained assuming the the DAG is a citation network.

Preparing a citation network

It is necessary to prepare a citation network before starting main path analysis. In a citation network, the nodes are the documents such as academic articles, patents, or legal cases. These nodes a connected using citation information. Citation networks are by nature directed. They are directed because the two nodes associated with a link is not symmetrical in their roles. As regards to the direction, this article adopts the convention that the cited node points to the citing node, signifying the fact that knowledge in the cited node flows to the citing node. Citation network are also by nature acyclic, which means that a node can never chain back to itself following the directed link. Several terms related to citation network are defined here before proceeding further .

Heads and tails: Heads are the nodes the direction arrow leads to. Tails are the nodes on other end of the direction arrow.  

Sources and sinks: Sources are the nodes that are cited, but cite no others. Sinks cite other nodes, but are not cited.

Ancestors and descendants: Ancestors are the nodes that can be traced back to. Descendants are the nodes that one can reach following the links.

Traversal counts

Traversal counts measures the significance of a link. Literature discusses several types of traversal counts, including search path count (SPC), search path link count (SPLC), search path node pair (SPNP), and other variations. All these traversal counts will be noted as SPX.

Search path count (SPC)

A link’s SPC is the number of times the link is traversed if one runs through all possible paths from all the sources to all the sinks. SPC is first proposed by Vladimir Batagelj[7].

Search path link count (SPLC)

A link’s SPLC is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the sinks. SPLC is first proposed by Hummon and Doreian[1].

Search path node pair (SPNP)

A link’s SPNP is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including itself) to all the descendants of the head node (including itself). SPNP is first proposed by Hummon and Doreian[1].

Based on the traversal counts, one can then search for the most significant path(s). There are several ways of finding them, including local search, global, and key-route search.

This is mentioned in Hummon and Doreian[1] as "priority first" search. In the search process, local search chooses the next link with the highest SPX as the outgoing link. It keeps tracking the most traversed link thus obtain the main stream among all citation chains.

Global search simply suggest the path with the largest overall SPX. The concept of global search is similar to critical path method in project scheduling.

Key-route search is designed to avoid the problem of missing significant links in both the local and global search. As described in Liu and Lu (2012)[6], the approach searches main paths from the specified links (key-routes) thus guarantees the inclusion of the links in the main paths. Other notable advantage of the key-route approach is that by varying the number of key-routes one is able to obtain main paths in different details. The larger the number of key-route is specified, the more detail is given. When the number of key-route increases to a certain point the search returns the whole citation network.

The Variants

In addition to the key-route search approach, variations of the method include approaches that are aggregative and stochastic[8], consider decay in knowledge diffusion[9], etc.

Applications

The method has been applied to three types of documenting system that has the tradition of making references to the previous documents. They are the academic article, patent, and legal case system.

Academic article

Academic citation databases such as Web of Science and Scopus includes comprehensive citation information thus make it possible to apply main path analysis to examine the knowledge structure or trace knowledge flow of any scientific fields. Some early works explores on the subject of centrality-productivity[10], conflict resolution[11]. More recent investigations on scientific fields includes fullerenes[4], nanotubes[4], data envelopment analysis[2][12][13], supply chain management[14], corporate social responsibility[15], IT outsourcing[16], medical tourism[17], etc.

Patent

Patents referencing prior arts is also a common practice. Patent database such as Clarivate Analytics, Webpat, contain patent citation information. Verspagen (2007)[18] and Mina (2007)[19] are the two early works that apply main path analysis to the patent data.

Software Implementation

Main path analysis is implemented in Pajek, a widely used freeware written by Vladimir Batagelj and Andrej Mrvar of University of Ljubljana, Slovenia. To run main path analysis in Pajek, one needs to prepare a citation network and have Pajek reads in the network. Next, in the Pajek main menu, computes the traversal counts of all links in the network applying one of the following command sequences.

Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPC), or

Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPLC), or

Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Node Pairs (SPNP)

After traversal counts are computed, the following command sequences find the main paths.

For local main paths 

Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Forward

For global main paths

Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Standard

For local key-route main paths

Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Key-Route

For global key-route main paths

Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Key-Route

References

  1. ^ a b c d e f Hummon, Norman P.; Doreian, Patrick. "Connectivity in a citation network: The development of DNA theory". Social Networks. 11 (1): 39–63. doi:10.1016/0378-8733(89)90017-8.
  2. ^ a b Liu, John S.; Lu, Louis Y.Y.; Lu, Wen-Min; Lin, Bruce J.Y. "Data envelopment analysis 1978–2010: A citation-based literature survey". Omega. 41 (1): 3–15. doi:10.1016/j.omega.2010.12.006.
  3. ^ Verspagen, Bart (2007-03-01). "Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research". Advances in Complex Systems. 10 (01): 93–115. doi:10.1142/S0219525907000945. ISSN 0219-5259.
  4. ^ a b c Lucio-Arias, Diana; Leydesdorff, Loet (2008-10-01). "Main-path analysis and path-dependent transitions in HistCite™-based historiograms". Journal of the American Society for Information Science and Technology. 59 (12): 1948–1962. doi:10.1002/asi.20903. ISSN 1532-2890.
  5. ^ Liu, John S.; Chen, Hsiao-Hui; Ho, Mei Hsiu-Ching; Li, Yu-Chen (2014-12-01). "Citations with different levels of relevancy: Tracing the main paths of legal opinions". Journal of the Association for Information Science and Technology. 65 (12): 2479–2488. doi:10.1002/asi.23135. ISSN 2330-1643.
  6. ^ a b Liu, John S.; Lu, Louis Y.Y. (2012-03-01). "An integrated approach for main path analysis: Development of the Hirsch index as an example". Journal of the American Society for Information Science and Technology. 63 (3): 528–542. doi:10.1002/asi.21692. ISSN 1532-2890.
  7. ^ Batagelj, V. (2003). Efficient algorithms for citation network analysis. arXiv preprint cs/0309023.
  8. ^ Yeo, Woondong; Kim, Seonho; Lee, Jae-Min; Kang, Jaewoo (2014-01-01). "Aggregative and stochastic model of main path identification: a case study on graphene". Scientometrics. 98 (1): 633–655. doi:10.1007/s11192-013-1140-3. ISSN 0138-9130.
  9. ^ Liu, John S.; Kuan, Chung-Huei (2016-02-01). "A new approach for main path analysis: Decay in knowledge diffusion". Journal of the Association for Information Science and Technology. 67 (2): 465–476. doi:10.1002/asi.23384. ISSN 2330-1643.
  10. ^ Hummon, Norman P.; Doreian, Patrick; Freeman, Linton C. (2016-08-18). "Analyzing the Structure of the Centrality-Productivity Literature Created Between 1948 and 1979". Knowledge. 11 (4): 459–480. doi:10.1177/107554709001100405.
  11. ^ Carley, Kathleen M.; Hummon, Norman P.; Harty, Martha (2016-08-17). "Scientific Influence". Knowledge. 14 (4): 417–447. doi:10.1177/107554709301400406.
  12. ^ Liu, John S.; Lu, Louis Y.Y.; Lu, Wen-Min. "Research fronts in data envelopment analysis". Omega. 58: 33–45. doi:10.1016/j.omega.2015.04.004.
  13. ^ Liu, John S.; Lu, Louis Y.Y.; Lu, Wen-Min; Lin, Bruce J.Y. "A survey of DEA applications". Omega. 41 (5): 893–902. doi:10.1016/j.omega.2012.11.004.
  14. ^ Claudia Colicchia; Fernanda Strozzi (2012-06-15). "Supply chain risk management: a new methodology for a systematic literature review". Supply Chain Management: An International Journal. 17 (4): 403–418. doi:10.1108/13598541211246558. ISSN 1359-8546.
  15. ^ Lu, Louis Y.Y.; Liu, John S. (2014-03-01). "The Knowledge Diffusion Paths of Corporate Social Responsibility – From 1970 to 2011". Corporate Social Responsibility and Environmental Management. 21 (2): 113–128. doi:10.1002/csr.1309. ISSN 1535-3966.
  16. ^ Liang, Huigang; Wang, Jian-Jun; Xue, Yajiong; Cui, Xiaocong. "IT outsourcing research from 1992 to 2013: A literature review based on main path analysis". Information & Management. 53 (2): 227–251. doi:10.1016/j.im.2015.10.001.
  17. ^ Chuang, Thomas C.; Liu, John S.; Lu, Louis Y.Y.; Lee, Yachi. "The main paths of medical tourism: From transplantation to beautification". Tourism Management. 45: 49–58. doi:10.1016/j.tourman.2014.03.016.
  18. ^ Verspagen, Bart (2007-03-01). "Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research". Advances in Complex Systems. 10 (01): 93–115. doi:10.1142/S0219525907000945. ISSN 0219-5259.
  19. ^ Mina, A.; Ramlogan, R.; Tampubolon, G.; Metcalfe, J.S. "Mapping evolutionary trajectories: Applications to the growth and transformation of medical knowledge". Research Policy. 36 (5): 789–806. doi:10.1016/j.respol.2006.12.007.