Talk:Source lines of code
In the discussion comparing the quality of code produced by different programmers, the term "productivity" is used where another term, e.g. "efficiency", may be more appropriate. This assumes that the definition of "productivity" skews towards quantity, i.e. to be more productive simply means to produce more output, whereas the context skews towards efficiency, quality, or some analogue that suggests "doing more with less". Then again, the discussion actually considers two aspects: two different code artifacts written to do the same task, and the different abilities of the programmers who produce them. What is a good term to describe a worker who is good at producing higher-quality products or tools? More specifically, what is a word for the measure of that ability? "Productive" is not a good word for that measure.
Perhaps the distinctions between using SLOC to estimate software complexity, the measure of software quality in general, and the measure of programmer capability should be made more explicit.
for (i=0; i<100; ++i) {printf("hello");} /* How many lines of code is this? */
Why is that ambiguous? Because it has more than one semicolon? -from a non-programmer
- I would say one line of code. When counting lines in a large program it is usually done mechanically (i.e. by a program), so it isn't going to think about it the way a person would. If it were reformatted it would count as two lines. That is why SLOC is a rough estimate. The question is whether the above sample line conforms to the formatting standards used by the shop it was written in. Since printf("hello"); is pretty simple I think it would be OK, but since it is so simple this is a somewhat contrived example. (sigh...I just realized I hedge so much that I never actually said anything...) RJFJR 01:37, August 10, 2005 (UTC)
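To make the point about mechanical counting concrete, here is a minimal sketch of two naive counters. The function names count_nonblank_lines and count_semicolons are made up for illustration and are not taken from any real SLOC tool; real tools also strip comments and handle string literals, which this sketch deliberately does not.

```c
#include <ctype.h>
#include <stddef.h>

/* Physical lines: lines holding at least one non-whitespace character.
   Assumption: comment-only lines are NOT filtered out, unlike real tools. */
size_t count_nonblank_lines(const char *src)
{
    size_t lines = 0;
    int has_content = 0;

    for (const char *p = src; *p != '\0'; ++p) {
        if (*p == '\n') {
            lines += (size_t)has_content;
            has_content = 0;
        } else if (!isspace((unsigned char)*p)) {
            has_content = 1;
        }
    }
    /* The final line may not end with a newline. */
    return lines + (size_t)has_content;
}

/* Naive semicolon count: does not skip string literals or comments. */
size_t count_semicolons(const char *src)
{
    size_t n = 0;

    for (const char *p = src; *p != '\0'; ++p) {
        if (*p == ';') {
            ++n;
        }
    }
    return n;
}
```

Applied to the sample line above, count_nonblank_lines gives 1 and count_semicolons gives 3. If the printf call is moved to its own line, the first count becomes 2 while the second stays 3, which is the ambiguity being asked about: the number depends on what the counter is told to count, not on what the code does.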
The SLOC table for Windows appears to be very wrong (see this comment on Larry Osterman's blog):
"That wikipedia page is kinda funny. According to it, "Windows NT 5.0", released in 2000, contains 20M lines of code, whereas "Windows 2000", released in 2001, contains 35M. ::scratches head::
In fact, all of the "years" are completely wrong. Windows 3.1 in 1990? No, 3.0 was 1990. 3.1 was 1991 or 1992 I think. "Windows NT" (no version) in 1995, Windows 95 in 1997, NT4 in 1998, and so on. The table is prefixed "According to Gary McGraw." Wonder where that guy got his info from? I'd hardly believe his LOC counts if he can't even get the years right."
- I've changed the table to use the values from Andrew Tanenbaum's "Modern Operating Systems" book. Unfortunately, this only covers the NT line, not the Win 3.1/9x products. Does anyone have accurate figures for these? Bakery2k 11:48, 25 March 2006 (UTC)
Quote: "With the advent of GUI-based languages/tools such as Visual Basic, much of development work is done by drag-and-drops and a few mouse clicks, where the programmer virtually writes no piece of code, most of the time." - that is one of the most asinine things I have ever read. It sounds like a hippy ideal from the mid-70s of 4GL languages. Where is my jetpack? They promised me one by now!!!
Programs for counting lines of code
I think this section deserves to be removed. This is an uncommented collection of links that does not provide any help and does not belong in Wikipedia. 84.191.231.103 21:17, 10 November 2006 (UTC)
I agree. Vorratt 01:10, 31 December 2006 (UTC)
I'm uncertain why you think this is inappropriate for Wikipedia (and I'm not sure I understand Wikipedia's goals and rules well enough to judge that) -- but removing it would be a shame, as this kind of information is awfully useful for researchers who want to study this subject. -- Terry Hancock
Citations
There are several "citation requested" notes in the SLOC tables. I'm not sure which particular numbers come from which sources, but I can provide the following links to SLOC data, by category:
Five versions of Debian (from 2.0 "Hamm" to 3.1 "Sarge") may be found at: http://libresoft.dat.escet.urjc.es/debian-counting/
Another semi-independent paper evaluated just 2.2 "Potato": http://people.debian.org/~jgb/debian-counting/
(I say semi-independent because I believe one author is shared between the two sources. It was an independent study, although it used the same SLOCCount tool.)
Data for Red Hat 6.2 and 7.1 were published by David Wheeler at: http://www.dwheeler.com/sloc/
Direct links to the papers: http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html
In these papers, he cites the following reference for Windows versions: http://www.schneier.com/crypto-gram-0003.html#8
(even with the anchor, you'll have to scroll down a bit to find this information buried midway through the article)
and also this source for NASA Space Shuttle flight software: http://books.nap.edu/html/statsoft/chap2.html
(Note that Wheeler's notation is a bit misleading -- it appears (to me) to indicate that 420,000 SLOC are used on the onboard computer and that 1.4 million SLOC are used on the ground. But that's incorrect. The 1.4 million SLOC figure is the size of the *testbed software* used to certify the 420,000 SLOC actually used on the Shuttle. In other words, it's *all* for the onboard software. Wheeler doesn't actually say anything wrong; it's just a brief note in a table that isn't explained.)
Hopefully these links will make it easy to check the figures' sources (or replace them with equivalents). Of course, the numbers won't match exactly, because the studies use different methodologies for deciding which lines of code to include in the counts, measured the data at different times, and so on.
I haven't found anything on Mac OSs or FreeBSD, though I'm still looking.
Also, regarding the flippant comment about the "dates being all wrong" for Windows -- this betrays a misunderstanding. The dates need not be "release dates". They are the approximate dates at which the evaluation was made on the original source code, which evolves continuously over time. Proprietary software is simply released at specific points on that evolution, so it's less obvious that this is true. So the dates being different from the release dates doesn't necessarily mean anything. On the other hand, I don't have the McGraw book, so I can't see what the actual claim is.
To truly get SLOC for Windows products, you would have to have inside access to the code (which is, no doubt, only available under an NDA to people contracting with Microsoft). This limits who we can get such information from. The complained-about numbers appear to be taken from David Wheeler's introduction to his papers, which references a "Gary McGraw (of Cigital)" as the source. However, *in* the papers, he uses the Schneier citation I've listed above (so I think it's probably a more reliable source).
-- Terry Hancock
More on citations...
I had a source referencing a book by Andrew Tanenbaum from 2001. However, that's obviously not the source for the later numbers. Probably this is (already in the references): http://www.computerworld.com.au/index.php/id;1942598204;pp;1
However, there's another book with similar information in it: http://www.knowing.net/PermaLink,guid,c4bdc793-bbcf-4fff-8167-3eb1f4f4ef99.aspx
Which is quoting from the book: Vincent Maraia, "The Build Master: Microsoft's Software Configuration Management Best Practices", Addison-Wesley Microsoft Technology Series, 2005.
The source for the Mac OS 10.4 "Tiger" release is apparently Steve Jobs himself, from a keynote speech, which is described here (including a paraphrase of Jobs): http://www.macosxrumors.com/articles/2006/08/09/wwdc-2006-keynote-detailed-report
68.93.224.4 21:43, 14 February 2007 (UTC) Terry Hancock
Okay, another issue:
Sun claimed in 2005 that OpenSolaris was 10 million SLOC. http://www.boostmarketing.com/story.php?id=474
Sun also claims that *StarOffice* was 7.5 million SLOC when it was first released as OpenOffice, e.g.: http://java.sun.com/developer/jcpopensource/
The 7.5 million figure for Sun Solaris is from the Debian 3.1 paper, which in turn refers to an earlier paper for those numbers.
68.93.224.4 22:41, 14 February 2007 (UTC) Terry Hancock