Talk:Cyclomatic complexity


Alternative Way is not helpful

The "Alternative Way" section is not helpful, as it uses the undefined term "number of closed loops". The author might be getting at the reason behind the term "cyclomatic", which is that if the (directed) control flow graph is viewed as an undirected graph, the cyclomatic number is the minimum number of edges that must be removed to remove all loops from the undirected graph. However, this calculation is hardly any easier than counting the number of branches (conditionals). I recommend removing the "Alternative Way" section or replacing it by describing the connection to the cyclomatic number in undirected graphs. Greg sullivan 19:52, 23 August 2006 (UTC)[reply]

Material merge

Ok, I merged in the material that was in the erroneously named Cyclometric complexity. I didn't do it all that perfectly, partly because I am not all that familiar with the subject. Some of it seems to conflict a bit, possibly the definitions, so if someone knowledgeable could properly fact-check it using the external links or other good resources, that would be great. But I thought I should at least do what I can. - Taxman 22:50, Feb 24, 2005 (UTC)

Clarifications

http://c2.com/cgi/wiki?CyclomaticComplexityMetric has the definition:

E = number of edges in the graph.
N = number of nodes in the graph.
P = number of nodes that are exit points
    (last instruction, return, exit, etc.)

Then

Cyclomatic complexity = E - N + P

That's a simpler explanation of 'P' than 'connected nodes'. It really means 'ways in which this module can branch out to other modules'.
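
For what it's worth, here is that computation on a minimal one-decision flow graph, using the M = E - N + 2P form (with P as the number of connected components) that the rest of this page converges on; the node names are mine:

   # flow graph of "if c: a() else: b()"
   nodes = ["if c", "a()", "b()", "exit"]
   edges = [("if c", "a()"), ("if c", "b()"),
            ("a()", "exit"), ("b()", "exit")]
   M = len(edges) - len(nodes) + 2 * 1   # 4 - 4 + 2 = 2
   print(M)   # two linearly independent paths for one decision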

The article says, "It directly measures the number of linearly independent paths through a program's source code." Wouldn't it be better to say that it gives an upper bound on the number of linearly independent paths? For example, in the first control flow graph given in the margin, there are as few as two or as many as three linearly independent paths.

75.87.186.225 (talk) 17:15, 23 October 2009 (UTC)[reply]

Rule of thumb

That rule of thumb is a bit unbelievable. I have a simple frame of C code with 4 functions: main(), parse_options(), print_version() and print_usage(). There are only two available options: "help" and "version". The file is 89 lines long, but only 67 of them are real lines of code. The cyclomatic number for such a simple file is 8, according to a measurement done by cccc. So is the file close to being "error-prone"? Is the article wrong, is the rule of thumb wrong, is cccc wrong, or do I just not understand something? Could someone explain this, please? LukMak 18:20, 28 March 2006 (UTC)
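
Not LukMak's actual file (which isn't shown), but a hypothetical sketch of how a tool like cccc can arrive at 8 for such a frame, assuming it sums per-function complexities, each starting at 1 plus one per decision:

   def print_version():                        # 0 decisions -> CC 1
       print("prog 1.0")

   def print_usage():                          # 0 decisions -> CC 1
       print("usage: prog [--help] [--version]")

   def parse_options(args):
       for arg in args:                        # loop: 1 decision
           if arg in ("--help", "--version"):  # 1 decision -> CC 3
               return arg
       return None

   def main(args):
       opt = parse_options(args)
       if opt == "--help":                     # 1 decision
           print_usage()
       elif opt == "--version":                # 1 decision -> CC 3
           print_version()

   # total: 1 + 1 + 3 + 3 = 8, with nothing that looks "error-prone"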

Your code is only error-prone if you have not handled all conditionals. We use cyclomatic complexity for two purposes.

The first is during the initial development cycle: when designing test frames using decision tables, all pathways must be addressed. Cyclomatic complexity used at this point tells us how many test frames are needed, and also highlights conditions that the programmer or architect has missed.

The second purpose is to identify functions (and programs) that use very deeply nested conditionals. This is not good because most humans have trouble reading deeply nested conditionals, and may add code during the maintenance cycle that does not address all pathways.

Hope this helps.

--Bill Buckels 12:26, 6 November 2007 (UTC)[reply]
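
A tiny illustration of the nesting problem described above (my example, not from the comment): the nested form makes it easy to overlook a pathway that the flattened form states explicitly.

   def nested(a, b, c):
       if a:
           if b:
               return "x"
           else:
               if c:
                   return "y"
       return "z"   # reached when not a, but also when a and not b and not c

   def flat(a, b, c):   # same behavior, pathways easier to enumerate
       if a and b:
           return "x"
       if a and not b and c:
           return "y"
       return "z"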

Discuss: Merge article with "essential complexity"

Against
My two cents: merging this article ("cyclomatic complexity") with "essential complexity" doesn't make sense, because cyclomatic complexity is a method of measuring complexity (of a body of code / instructions), and the complexity which this method measures isn't implicitly categorized as either "essential" or "accidental". That's one of the strengths of cyclomatic complexity measurement, frankly: it's just one objective measurement of complexity, and it doesn't try to assert whether anything can or should be done about the complexity that it measures. Cyclomatic complexity and essential complexity are effectively in different categories that I would name something like "measures of complexity" and "interpretation / meaning of complexity", respectively. Mlibby 16:20, 17 March 2007 (UTC)[reply]
Against
Cyclomatic complexity is one measure of the total complexity of a computer program; essential complexity is a kind of complexity found in a problem. Sometimes people write programs to solve problems, and in that case the essential complexity of the problem often finds its way into the program, but there's always other complexity involved as well. In short, cyclomatic complexity is very different from essential complexity, and the two are only loosely related. Kragen Sitaker 02:34, 1 April 2007 (UTC) Update: McCabe's paper defining cyclomatic complexity also defines a concept called "essential complexity", but it is also unrelated to the concept in the Essential Complexity Wikipedia page. Perhaps this explains how this otherwise bizarre proposal for merging arose. Kragen Sitaker 03:46, 1 April 2007 (UTC)[reply]

I agree that this proposal is bizarre. The Essential Complexity article is clearly talking about the notion from Fred Brooks' "No Silver Bullet" paper; that and cyclomatic complexity are very distinct concepts. Glv 19:30, 10 April 2007 (UTC)[reply]

Also signed for against; the two articles have a different focus. --BlueNovember 16:17, 20 May 2007 (UTC)[reply]

I am STRONGLY against merging this article with the Essential Complexity Wikipedia page, and I am actually shocked that this was suggested. This comment should be removed from this page altogether because it makes no sense to anyone wanting a concise definition of Cyclomatic Complexity as I have understood it in my last 12 years as a Software Developer.

--Bill Buckels 12:17, 6 November 2007 (UTC)[reply]

What about functions?

This article neglects to explain how functions are handled. If a function is called from ten places, then control flow can flow from each of those ten places to the beginning of that function, and from the function's end back to the point after each of those ten places. Are these 20 arcs part of the control-flow graph being analyzed, or not?

Consider, for example, this Python program to calculate a sum of some squares:

   total = 0
   for ii in [2, 4, 8, 10]:
       total += ii * ii
   print total

This program clearly has a cyclomatic complexity of 2, which is unsurprising since it contains one loop and no conditionals (note that the stated formula M = E - N + P gives 4 - 4 + 1 = 1 here; see the update below about the 2P form).

Now, we can rewrite this with a function as follows:

   total = 0
   def n(ii):
       global total
       total += ii * ii
   n(2); n(4); n(8); n(10)
   print total

If we treat the "def n(ii):" line as a compile-time declaration, as it is in most languages, rather than a run-time statement, which it is in Python, then we can get a couple of different measures. There are 8 statements (7 if you exclude "global") and either 6 edges and 2 connected components, if we exclude the edges from the calls to n to the top of the body of n and from the bottom of the body back to the next statement after the call, or 14 edges and 1 connected component, if we include those edges. This gives 6 - 8 + 2 = 0 or 14 - 8 + 1 = 7, both of which seem like implausible answers.

I'm reading McCabe's paper now to see what he says about this (if anything!), and then perhaps I can clarify the article.

Update: McCabe's paper uses the formula M = E - N + 2P throughout most of it, although it states the M = E - N + P version up front. With 2P, the function version gives 6 - 8 + 4 = 2 (excluding the call edges) or 14 - 8 + 2 = 8 (including them), and the original version becomes 4 - 4 + 2 = 2. At least the first of those seems like a plausible value, since adding the function shouldn't change the essential complexity of the program.
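
To spell that out for the original loop program (node names are mine):

   nodes = ["total = 0", "for ii in ...", "total += ii * ii", "print total"]
   edges = [("total = 0", "for ii in ..."),
            ("for ii in ...", "total += ii * ii"),   # enter the loop body
            ("total += ii * ii", "for ii in ..."),   # back edge
            ("for ii in ...", "print total")]        # loop exit
   print(len(edges) - len(nodes) + 2 * 1)            # 4 - 4 + 2 = 2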

So far I haven't run across anything in the paper about functions, but maybe Section IV explains it. What about methods or higher-order functions, I wonder? The first formula could treat them consistently; the second would be even more completely hosed.

Update: Section III suggests limiting each module (in FORTRAN, a subroutine) to a cyclomatic complexity of 10, and, as I had suspected, section IV explains that McCabe treats separate subroutines as independent graphs. This is sensible. There's an error on p.310 where a straight-line program is claimed to have V(G) = 2 instead of 1 (would be 0 by E - N + P), but so far the other values I've seen are consistent. I also cleaned up my code above a bit.

Kragen Sitaker 03:09, 1 April 2007 (UTC)[reply]

Update: Although the basic idea seems good, and the paper is still better than the current version of this Wikipedia entry, it is really rather a poor paper, even apart from the errors I pointed out above, some of which seem to have caused long-lasting confusion (e.g. the omission of the 2 from the formula the first time it's presented).

The author debases Dijkstra's term "structured", which Dijkstra coined to describe programs that were comprehensible and had mathematically provable behavior, to mean "using only alternation, iteration, and sequencing as control-flow operations". Indeed, at one point (p.315, section VI, second paragraph) he characterizes Knuth's article "Structured Programming With GO TO Statements" as advocating the use of "unstructured goto", which is exactly the opposite of Knuth's point in that article: that a slightly wider range of control structures is needed to structure programs comprehensibly, and hence that structured programming is occasionally mutually exclusive with the debased meaning of "structured programming" used by McCabe in this paper. (I've read Knuth's article, but you can begin to see the problem just from its title. It's currently available at http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf)

It's possible that the error on p.310 is a result of a problem in the digitization --- perhaps an arrow is missing.

There are several other errors:

  • the diagram of "if" in footnote 4 on p.315 has an arrow going the wrong way;
  • the diagram of the "following four control structures" on p.315 is missing one of the four;
  • the descriptions of those control structures on p.316 exchange (c) and (d), or at any rate they disagree with the diagram above about which ones they are;
  • the references misspell Henry Ledgard's name as "Legard" and "genealogy" (in the title of Ledgard and Marcotty's paper) as "generalogy";
  • they also misspell the title of Knuth's paper slightly.

This is not an exhaustive list.

Kragen Sitaker 04:09, 1 April 2007 (UTC)[reply]

A couple of points re McCabe's paper: first, in 1976 the term "structured programming" was a very specific term that doesn't have the broad meaning we tend to read it with today. The meaning McCabe used was a common interpretation of the phrase. Knuth's article was, effectively, advocating a broadening of this definition. It was therefore reasonable for McCabe to describe what he was advocating as "unstructured": his ideas had not become mainstream at that point. Programs with breaks from loops were not considered structured at the time. Neither were programs that contained functions with multiple returns. Some of the uses of go to that Knuth suggested were acceptable are still today not considered to be part of structured programming (e.g. using goto to jump to error handlers).
The "+P" rather than "+2P" appears to be due to the suggestion of adding an additional edge from the exit to the entrance of the function. This additional edge, of which there will be one for each connected subset of the graph, eliminates the necessity to use 2P. The graph shown on the first page includes such an edge; most of the rest of the graphs do not, hence the use of 2P in the rest of the article. This could have been explained better, but is not an error.
You appear to have a rather poor scan of the article. The copy I've linked to shows that the program you refer to on p.310 has a loop from block 2 back to itself, and it does have all four control structures on p.315. JulesH (talk) 20:48, 24 March 2009 (UTC)[reply]
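
To make the equivalence described above concrete: if each of the P components gets one extra exit-to-entrance edge, the edge count becomes E' = E + P, and the two forms agree:

   E' - N + P = (E + P) - N + P = E - N + 2P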

Functional vs Imperative Programming

Is it possible to apply cyclomatic complexity to code written in a functional programming language? If not, then the article should specify the paradigms to which cyclomatic complexity applies. If something similar is possible with functional programming languages, it would be interesting if the article discussed it. Mgsloan 04:37, 2 October 2007 (UTC)[reply]

Exceptions anyone?

In the presence of exceptions, which many modern languages feature, throwing calls need to be treated as possible exit points too. Looking at the example code provided, this means that you cannot test all the code with two test cases. Rather, you need:

  • c1==true, f1() throws
  • c1==true, f1() returns
  • c1==false, f2() throws
  • c1==false, f2() returns
  • c2==true, f3() throws
  • c2==true, f3() returns
  • c2==false, f4() throws
  • c2==false, f4() returns

Some people might argue that f3 or f4 throwing or returning doesn't change anything, so they shouldn't be treated as different cases. However, I wouldn't do that, because they exit the containing function through different exit points. The question remains what the official definition has to say on this topic.
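
As a rough sketch of how this shows up in the numbers (the article's example isn't reproduced here, so this is simplified to a single if with two calls; the names are mine):

   def m(edges, nodes, components=1):       # M = E - N + 2P
       return len(edges) - len(nodes) + 2 * components

   # "if c1: f1() else: f2()", ignoring exceptions:
   nodes = ["if c1", "f1()", "f2()", "exit"]
   edges = [("if c1", "f1()"), ("if c1", "f2()"),
            ("f1()", "exit"), ("f2()", "exit")]
   print(m(edges, nodes))   # 4 - 4 + 2 = 2 test cases

   # treating "f1() throws" and "f2() throws" as extra edges to the exit:
   edges += [("f1()", "exit"),   # f1() raises
             ("f2()", "exit")]   # f2() raises
   print(m(edges, nodes))   # 6 - 4 + 2 = 4: each branch splits in two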

Attempting to improve clarity of the article

But I have a few concerns that I can't really answer myself. First: the "formal definition" section goes beyond my understanding of this topic by straying into a field of mathematics that I know little about, and I can find no sources that discuss the topic in these terms. A Google search for '"cyclomatic complexity" "relative homology"' turns up nothing except Wikipedia mirrors. Is this original research?

There appears to be more than one formal definition of cyclomatic complexity: for example see Henderson-Sellers doi:10.1007/BF00403560. Henderson-Sellers claims that one metric is better suited to module-level testing while the other is more appropriate for system-level, integration testing. The article should at least mention other definitions. Here are some more relevant articles:
  • Baker, A.L. and Zweben, S.H. (1980) A comparison of measures of control flow complexity. IEEE Transactions on Software Engineering, 6(6), 506–512.
  • Feghali, I. and Watson, A.H. (1994) Clarification concerning modularization and McCabe's cyclomatic complexity. Communications of the ACM, 37(4), 91–92.
  • Henderson-Sellers, B. (1992) Modularization and McCabe's cyclomatic complexity. Communications of the ACM, 35(12), 17–19.
  • Shepperd, M. (1988) A critique of cyclomatic complexity as a software metric. Software Engineering Journal, 3, 30–36.
  • Weyuker, E.J. (1988) Evaluating software complexity measures. IEEE Transactions on Software Engineering, 14(9), 1357–1365.
Hope that helps. pgr94 (talk) 22:34, 24 March 2009 (UTC)[reply]
Well, the original McCabe proposal for CC relied on the definition of the cyclomatic number in graph theory. Berge defined it in Graphs and Hypergraphs. But the same number can also be defined in a branch of that theory called topological graph theory, and, if so, that definition is exactly the same as the one reported in the wiki page. Of course the first is simpler and more practically useful, but that is the subtle relation between them. --Ciosbel (talk) 20:30, 8 January 2010 (UTC)[reply]

As for the "prediction ability" of CC, are we talking about predicting the number of defects in a module based on its complexity, or something else? Is there a paper associated with this keynote speech? I've seen a set of slides for it, but can find nothing about CC in those. JulesH (talk) 19:23, 24 March 2009 (UTC)[reply]

I think "prediction ability" is intended as predicting future program failures (or defects) from complexity. I've found the slides you mention and only slide 17 matters. I've also found an article by Les Hatton that compares defects against a wide variety of well-known static measures (including CC), looking for correlations. The analysis was done on the NAG Fortran library already cited in the slides. As it turned out, he found a correlation between CC and the number of executable lines of code; quoting from the article: "[...] it is extremely likely that there will be a decision about every 3-4 executables lines of code [...]", and he even stated an equation for that. So I think the original author meant that "CC has the same prediction ability as lines of code". The article is here. Ciosbel (talk) 19:21, 8 January 2010 (UTC)[reply]

I removed "However there are certain types of modules that one would expect to have a high complexity number, such as user interface (UI) modules containing source code for data validation and error recovery.[citation needed]". Obviously the author is not familiar with the Humble View design pattern: http://martinfowler.com/eaaDev/OrganizingPresentations.html

Example where P>1

The article about Connected component does not deal with directed graphs very well. If it is applicable to interpret the flow graph as undirected, there can almost never be P > 1 (a function usually has only one entry point, and all nodes are reachable from it). Can anyone give an example where P > 1? -- 130.236.56.173 (talk) 11:00, 12 January 2010 (UTC)[reply]
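
One answer, following McCabe's treatment of separate subroutines discussed above (a sketch of mine): P > 1 appears when several routines are measured as one graph, since each routine's flow graph is its own connected component, and E - N + 2P then sums the per-routine complexities:

   # two straight-line functions measured together: each contributes a
   # chain entry -> body -> exit (3 nodes, 2 edges) as its own component
   E = 2 + 2
   N = 3 + 3
   P = 2
   print(E - N + 2 * P)   # 4 - 6 + 4 = 2, i.e. complexity 1 + 1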