Wikipedia talk:Algorithms on Wikipedia

C++

This is a simple issue. Use C++ without any object oriented constructs and without pointers except for binary trees and other dynamically allocated data structures. References should be used whenever possible. Also, no macros.

Secondly, clarity/readability should be emphasized over speed.

Advantages:

- basic syntax of C++ is essentially the same/easily understandable by programmers in other languages
- C is widely known
- Avoids Unix specific languages such as Ruby, Python, Perl, etc. (NB: Technically, these are portable but almost all users of those languages use linux/unix).

Why C++ and not C? Also although a lot of samples are in C, once you start having to deal with pointers it's probably getting too bogged down in the C-ness of it and you need a more "pseudo"-type language such as Python. —EatMyShortz 13:55, 14 February 2006 (UTC)[reply]

There's a very good reason not to use C/C++ exclusively. Their syntax has a lot of problems, and C++ in particular is extremely verbose, even for relatively simple things. A lot of the built-in operators are counterintuitive to people who just know high-school math, such as assignment/equality, not-equals, bitwise versus logical and/or, and so on. They use different bitwise operators from computer engineers. Damian Conway has written papers on how confusing C/C++'s declaration syntax is. Additionally, C is limited to fixed-size integers, which makes many algorithms in number theory and other areas much more complicated. C++ at least has string operators, That said, I do support procedural code samples for most things, and I have often used C samples where they are simple. Deco 20:07, 14 February 2006 (UTC)[reply]

I don't think using any concrete programming language is a good idea, as the clearest algorithms are written with appropriate section in words, e.g. union of X_i for i ∈ { 1 ... n } is infinitely clearer than (apply union xs) in Lisp/Scheme, foldl union [] xs in Haskell, and the corresponding C++ iterator loop. --Mgreenbe 20:35, 14 February 2006 (UTC)[reply]

I have to agree with Mgreenbe that the requirement of executable code in a concrete programming language is only going to obfuscate things. High level pseudocode allows you to condense something that would require considerable complexity in a concrete language into a simple and accessible description in pseudocode. For instance I think it would be hard to write an executable version of Buchberger's algorithm in C or C++ or Scheme or Python that is going to be as clear as the psuedocode presented - far too much code would be spent dealing with the complications of implementation rather than in explicating the important steps of the alorithm itself. Leland McInnes 22:03, 14 February 2006 (UTC)[reply]

Recursion vs. Iteration

I can appreciate the emphasis on iteration versus recursion, as recursion is not taught to some people. I don't agree that iteration is always preferable. A recursive algorithm that searches a binary tree is simpler to understand, requires less book-keeping in variables, and is in general, preferable when explaining to a neophyte. I think there are an entire class of problems that are similar. --BenBaker

I'm aware that recursion vs. iteration is one of more controversial of hints listed. There is certainly class of problems are much easier to solve using recursion than iteration. That is a hint for problems that can be solved equally easily with both methods (like fibonacci).

There is also a problem that various people find either of methods "simpler to understand". There's some point in providing both implementations, so both groups can find easy-to-understand algorithm.

Probably some weaker wording should be used. --Taw

Iterative where immediately apparent would be nice. On the other hand, if its a problem that lends itself to recursion, no reason not to. programming is complicated, there's no two ways about it.

BTW, Even with iterative constructs available in LISP, I tend to go recursive just because the language lends itself to it, but in C I go iterative if I can help it. Something to think about... --alan d

Role of this page

I think this page should be renamed "List of Algorithms", should have a one-line explanation of every algorithm, and should be listed on Reference tables. I don't think it makes much sense to try to dictate in which way people are to write sample implementations. Personally, I'd prefer pseudo code. Also, why is it directed at Unix programmers? What about Hurd hackers? --AxelBoldt

Page name is almost irrelevant.
One-line ok
Reference Tables done
It's not discating, it's setting standards
It makes LOT of sense. Bad sample implementation is almost completely useless.
Pseudo code is really bad idea. It is in no aspect better than real languages like Ruby or Python. It's impossible to test, so may contain errors and one has to assume many things like array indexing about it. Pseudocode should be discouraged.
It should be directed at Unix programmers, because majority of people interesed in this topic knows Unix or technologies that come from it, like C. No Unix-specific quirks in code of course.
Objectively speaking, Hurd is kind of Unix or at least mostly Unix-compatible.
Hurd hackers are very small group of people and I'm sure that all of them know some other Unices.

--Taw

Page names involving "Wikipedia" are generally taken to be meta articles about Wikipedia, not genuine Wikipedia articles. If this is intended to be a policy article about Wikipedia coding standards, then there should be a separate list of algorithms.
While it is true that pseudo code cannot be tested, it is intended for human understanding and may therefore contain less errors. Programming languages often contain obscure constructs (= vs == for instance, or overloaded division /) which make understanding harder. Just because you can test a program doesn't mean that it is less likely to have errors.
The reference to Hurd was a joke. I was alluding to the fact that there are many programmers which don't use Unix.

The article should be directed at programmers, period. --AxelBoldt

Pseudocode vs. real code

So the only unresolved issue is pseudocode, right ?

Arguments for pseudocode and against simple real languages:

Rarely used languages like Ruby may confuse readers
Pseudo code forces the programmer to think through the algorithm before implementing it; the danger with real code is that they just cut and paste
Pseudo code can focus in on the heart of the algorithm, glossing over trivial parts (initializing of arrays, type casts)
Pseudo code is the most concise description of an algorithm designed for human consumption, programming languages are designed for consumption by computers.
Pseudo code can use generally established conventions of mathematical notation (xⁿ, √x, x mod n), while avoiding idiosyncracies of programming languages.
Any algorithm that can be expressed at all can be expressed in pseudo code; some algorithms can not be expressed in some programming languages.
...

Arguments for simple real languages and against pseudocode:

Pseudocode can't be easily tested, so may contain subtle errors like off-by-one
Pseudocode needs explanations about things like == vs. =, type of division, array indexing etc. before each example
Pseudocode isn't standarized, so it may confuse reader
Reader can't play with sample implementation, and playing can help to understand in case of some algorithms
Pseudocode needs some redundant information, for example it must have array size specified as separate variable, while in most languages constructs like array.size can be used to get it.
Pseudocode can't be easily used for many algorithms, for example those involving pointers
...

I think the argument that "Pseudocode can't be easily tested" is a spurious one. Unless we're engaged in "original research", the pseudocode in question should be coming from some external (referenced) source, so there shouldn't be any need to test it (although there might be a need to proof-read it against the reference). With regard to the argument about sample implementations

constructing their own sample implementation based on the pseudocode will probably help the reader gain an even better understanding of the algorithm
unless we're planning on providing sample implementations in every conceivable language, there's a chance that $RANDOMREADER won't be able to make use of the sample implementation anyway
there's nothing that says that there shouldn't be a few sample implementations in addition to a good language-independent pseudocode description of the algorithm

--Allan McInnes 00:44, 29 January 2006 (UTC)[reply]

I'm moving Ruby into the list of less-recommended languages. Smalltalk and Eiffel both have many more users than Ruby does, so if you're going to recommend against them (which I think is reasonable), then you have to take out Ruby as well. The only ones I think have a significant enough user base that any programmer should at least be familiar with them are C/C++, Java, BASIC, Pascal, Perl, and Python. BASIC and Pascal are bad because there are too many incompatible variants.

-- LDC

C programs can be a mess because to write them portably, taking into account char-size, endianness etc., can make them unreadable. While I like Perl, I think it is unreadable unless you have really used it quite a bit. Ideally, I would like the sample implementations to be relevant to beginning computer science students as well as to working programmers. --AxelBoldt

Choice of language criteria

There are two things that need to be considered in choosing languages - its readability and its popularity. That's why languages like Python and Ruby, which were designed (successfully) with readability in mind are more apropriate, while Perl and C are less apropriate, than they would be if only popularity mattered. I put Eiffel and Smalltalk on discouraged list not only because of their little popularity but also because of their non-standard syntax, and it case of Smalltalk, philosophy.

About popularity:

From http://www.eros-os.org/pipermail/e-lang/2001-October/005793.html (freshmeat projects count, only relevant entries quoted):
- Python: 403
- Ruby: 31
- Eiffel: 16
- Delphi: 15
- Pascal: 15
- Smalltalk: 3
- Visual Basic: 6
- Object Pascal: 4
- Basic: 1
- so if Unix hackers are considered, BASIC, Pascal, Smalltalk and Eiffel are less popular than you stated while Ruby is more. (in other groups, popularity ranking will be different)
BASIC and Pascal aren't bad only because there are too many variants. There are many bigger problems with these languages. (details are easy to find online)
Nowadays most new programmers aren't familiar with Pascal and BASIC. They might have seen some code, but haven't used it to do anything serious. Most schools don't use these languages for teaching and they are virtually unused in UNIX world.
Ruby's popularity is comparable to Python's in Japan (there was article on /. about it, but i don't have hard data handy) --Taw

Standardized pseudocode

As a beginning computer science student, I request that all algorithms be written in a Wikipedia-standardized pseudocode. I'm not particularly familiar with any language, and IMO, the most generalized presentation would be of greatest benefit. There's no reason why there can't also be implementations in Python or Java, giving us the best of both worlds. --Stephen Gilbert

What's the point in "standard" pseudocode as compared to using simple real world language ? --Taw

I find that pseudocode helps me understand the general algorithm, as opposed to a specific implementation of it. Why can't both pseudocode and a simple real world language be used? --Stephen Gilbert

Samples of code for popularity

Freshmeat is hardly an unbiased sample, and the numbers quoted are too small to mean much anyway. Google returns twice about as many hits for "Smalltalk language" and "Eiffel language" as it does for "Ruby language". Check out any bookstore: you're likely to find 5-10 books on Eiffel and Smalltalk; I think only one Ruby book exists. Smalltalk is big in academia as a teaching language (like Pascal used to be, and Java is now), so a lot of people are familiar with it even if it doesn't get used for real projects much. I have no idea what you mean by its "philosophy" or why that matters. BASIC isn't used in academia and it's not used much in Unix, but it's still the #1 most popular programming language by many measures--let's not kid ourselves that the world is Unix--the world in Windows, and us Unix hackers are growing minority, but still a minority.

The Ruby code does have the advantage that it's pretty readable as pseudocode for simple things. All in all, I think if I were to add code samples here, I'd use Java (which you omitted from the table above--its count is 740, above all those you mentioned and below C/C++, Perl, and PHP), since it is quite common, well-defined and standardized, and quite readable. -LDC

By user count, world may be Windows, but by programmer count it's not.
I wouldn't call Java very "readable".
Google count for 'basic language' and 'basic programming language' contains mostly false positives on main page - basic is popular english word after all

Your first point may be true; it would be hard to measure. Perhaps the "bookstore" metric would be useful there as well (that is, count the sales of related books). I personally don't find Java any less readable than Python or Ruby; they all use the same operators, keywords are sensible, etc. At any rate, that's a matter of personal taste. Yes, searching for "BASIC language" on Google would be silly, which is why I didn't do or suggest it. Even the ones I did do are only the roughest possible estimate, just like your "Freshmeat" metric. Another metric that might be useful is to look at newspaper job advertisements. Java, C, and BASIC (usually MS VB) programmers are in greatest demand, followed by Perl, ASP (a BASIC derivative), and miscellaneous others.

I'd support using real programming code (my favourites would be C/C++ or Java, but take your pick), not psuedocode. The problem with psuedocode is that there are no standards, and sometimes it is difficult to interpret. At least with real programming code, there is always only one (in theory at least) interpretation for the code. -- SJK

Why don't we just let everybody use their favorite programming language? That way, not only will we get the biggest possible catalog of sample implementation, but we will also get a nice collection of sample code snippets that we can link to from the respective programming language pages. Whenever you come across an ugly sample implementation in language you don't understand, like Scheme, just add your own implementation written in Brainfuck. Can't hurt, can it? --AxelBoldt

Nobody's saying that one can't add another sample implementation, if there is some reason of doing it. Hints are made so people have easier time if they're unsure what's the good way. --Taw

I'd prefer real code to pseudocode in general, but sometimes pseudocode is far more readable. For example, I just added an algorithm to Complexity_classes_P_and_NP. It's 8 lines of pseudocode, and fairly readable (I think). In any real language, this would be far less readable, since you'd have to build an interpretor, and deal with all sorts of data structures that aren't really important to the central idea. --LC

But that's because it's hardly an "algorithm", it's rather abstract mathematical construct --Taw

That's what algorithms are! Actual code implements an algorithm, which is the abstract mathematical construct of a sequence of steps to produce a result. Taw uses pseudocode here to give a very high-level presentation of an idea. In a real language, you'd have to just put in a procedure call like "RunProgram(x)", and comment that this is left as an exercise for the reader. :-)

The list of algorithms seems to be rather slewed towards algorithms used in computer programming. Do algorithms like Euclid's algorithm (a numerical algorithm) and Grover's algorithm (a quantum algorithm) belong? Or are the lists meant to be geared towards computer programming? -- CYD

Haskell?

A couple of points:

OOP is about programming in the large. For programming in the small (to provide illustrative examples of algorithms, for instance), OOP is just a bunch of unnecessary cruft. Hence, Java and C++ don't add very much over C for the purpose of illustrating fundamental algorithms, and the cruft gets in the way of understanding.
Some of the HLL's (Perl, for instance), aren't exactly ideal for describing implementations of data structures (trees, hash tables etc).
As far as the untestability of pseudocode, most algorithms we present should be verifiable by inspection.
Perl looks ugly and encourages, er, idiosyncratic coding. It's great for getting jobs done. It's bad for presenting examples in, IMHO.
Of course, I think we should present most of our algorithms in something like Haskell, but I can't see us winning that argument :) --Robert Merkel

Yeah Robert I agree at most. Especially for C as I don't understand all the HLL's, PERL's, IMHO's, Haskell 98's, Haskell++, O'Haskell's and Mondrian's stuff you've written. C++ and Java are very active nowadays and perhaps according to C++, through C# and some other futher classification efforts C will see its own rebirth. I am also very much interested in RPN programming and in Maple V R4.00a codeing for the number theory applications. --XJamC 4 Wednesday (Thor's day) [2002.02.28) (0)

I suggest as a compromise the Python programming language one of the most readable progamming languages, often described as "executable pseudocode". -- Anon.

I would be very much in favour of real pseudo-code. This has benefits that

No programmer is "angry" because the implementation is in a - in his eyes - stupid, silly, inferior programming language (it's NPOV). Moreover, they will not feel compelled to add other versions of the same algorithm
No details of implementation have to be presented, and use of mathematical signs (also variables with subscripts) or plain text make the code very readable
Pseudocode makes it possible to focus on what's important about the algorithm.
An encyclopedia is about things, not the things themselves. Therefore, having executable code is not very useful (it's not in all languages anyway). If somebody is interested in implementing one of the algorithms, pseudocode should give him more than enough information to implement things in his favourite language, especially if the article also contains some pointers on implementation details (f.e., an algorithm on graphs can point to an article Graph representation in programming, or something like that.

I could probably go on here, but I think my point is clear. A problem of pseudo-code is of course that there's no standard, but it should not be a problem to create a Wikipedia pseudo code language, and linking to explanation when used. Jeronimo

If you fully define a pseudocode then it becomes a code. Next thing someone will write a complier for it, then people will start with the language pissing matches etc.... ;-)

Hehe, you might be right. But te definition should simply define some syntax (assignment, if-then-else, while-do, etc). The rest of the notation should be relatively free. That's why it's pseudo-code. Jeronimo

Proposal: construct our own "executable pseudocode"

I've been thinking about the real-language versus pseudocode debate, and I think an interesting compromise would be to construct an "executable pseudocode" language with a real interpreter specifically with the aims of readability, simplicity, and brevity in small code samples. To facilitate embedded English statements, we could either allow English statements to "summarize" (hide) sections of code that implement them, or we could have the interpreter pause and make the human interpret the English statement. This would allow us to test code samples without succumbing to the idiosyncrisies and verbosity of real languages, with the additional benefit of ensuring regularity of syntax for common constructs like loops. Real languages could still be used where one is particularly suitable or simple, or in a supplementary fashion.

This is all a little abstract, so here's a contrived example. Let's say you type this very C-like text into a text file:

function insert(array a, int length, value) {
    set i to index of last element
    while i >= 0 and a[i] > value {
        a[i + 1] := a[i];
        i--;
    }
    a[i + 1] := value;
}

You run it through a "beautifier" which converts it to this pretty wikitext, expanding or replacing counterintuitive constructs and operators and adding bold/italics:

function insert(array a, int length, value) {
    set i to index of last element
    while i ≥ 0 and a[i] > value
        a[i + 1] ← a[i]
        i ← i − 1
    a[i + 1] ← value
}

Next you load this up in the interpreter. It gives you an input-eval-print loop in which to enter test invocations. Let's say you type "insert({1,3,5,0},3,2)". It executes the above function, and when you get to "set i to index of last element", the interpreter stops and asks you to carry this action out. You type "i := 2" and then continue running. The result "{1,2,3,5}" comes back.

So that's a brief preview - any suggestions or criticism welcome. Deco 21:23, 14 February 2006 (UTC)[reply]

Who would be entering an algorithm that is unable to either (1) test and experiment with a working implementation or (2) prove correctness (perhaps following a proof in a book)? I feel like anyone who doesn't fall into one of these two categories shouldn't be entering algorithms. So would this be helpful to readers? I wouldn't find much use from it (biased sample). Perhaps it would be more helpful to leverage WPCS] to add real-language implementations of algorithms to Wikisource. --Mgreenbe 22:08, 14 February 2006 (UTC)[reply]