User:Dcoetzee/Wikicode/Specification

A while ago, on Talk: Binary tree, dze27 said:

Also, perhaps we need some sort of consistent pseudo-language for specifying pseudocode. Something like they do in Introduction to Algorithms, or smilar dze27

The pseudocode in Intro to Algorithms requires some rather complex typesetting (I've had to duplicate it before). This would be better for readers, but not conducive to editing by writers. I agree there needs to be consistency; I've written pseudocode on a number of pages, but I don't claim mine is ideal. I try to focus on a few aspects I consider important:

Clarity: the most important; it should be clear what the code does
Brevity: the code should not occupy much article space, and should consist mainly of algorithm content as opposed to unnecessary syntax
Ease of writing: the code should be easy to write and edit
Universality: the code should be clear to programmers familiar with nearly any common language, and not too similar to any one

To facilitate these I've adopted a few conventions, such as Wirth-style := for assignment, use of indention in place of blocks and end marks, and so on, but I appreciate any suggestions for improvements.

Derrick Coetzee 18:30, 4 May 2004 (UTC)

From that discussion I later decided to write this proposal for a standard pseudocode. Here's the somewhat incomplete proposal, intended to meet the above general goals. Please criticize, expand, and help wikify and finalize it. Please be bold in editing the proposal! Although I (DCoetzee) wrote the initial version, I do not claim it, we share it.

The Wikicode Standard

This is an informal standard for a pseudocode which I call wikicode. The purpose of wikicode is to easily facilitate both writing and reading of clear and concise descriptions of algorithms and data structures. All wikicode will be typeset in a fixed-width font, for example by prefixing each line with spaces. Do not wrap the code in HTML <pre> tags; this disallows additional markup inside the code block. If wikicode occurs in the middle of encyclopedic text, this will be indicated by wrapping it in HTML <code> tags, which also use a fixed-width font.

General Guidelines

Authors using wikicode should keep in mind that the purpose of pseudocode is to convey a process to an audience that may include peers as well as beginners. Therefore, where possible, examples should be as short and simple as practical. Comments should appear on most, if not all lines of code. Lengthy examples should be broken into segments, with each segment introduced by a description of the process, followed by the pseudocode with comments, and possibly followed by a wrap-up paragraph.

Assumptions should be clearly before they are used, and/or identified in comment statements. For example:

Linked lists can be used to ... (blah blah blah)

Assuming the following routines are available:

   function nodeInsert(node prev, node n) // insert node 'n' before 'prev'
   function nodeFind(node list, node n) // find node 'n' in list

We can do the following:

   node n, list, t  // node to add, linked list, and temp node
   t = nodeFine(list, n) // find node
   (additional code ... blah blah blah)

As can be seen from the above example, (blah blah blah)

However, if (blah blah blah) we can do the following

   node n, list, t  // node to add, linked list, and temp node
   t = nodeFine(list, n) // find node
   (additional code ... blah blah blah)

The advantages of the first and second examples are (blah blah blah)

Functions

A possibly recursive function is defined using the emboldened function keyword. If the keywords below do not appear in bold, make sure your Monospace (fixed-width) font is set to a font which has a bold version, such as Courier. It is given a list of arguments; types may or may not be specified for the arguments, but if they are they will be indicated by the argument name, a colon (:), and then the type name (see Types). Unabbreviated names should be preferred, unless the name is explained in the text or comments:

 function sort(list, comparisonPredicate)
     (indented body of function)

 function addModK(left : int, right : int, modulus : int)
     (indented body of function)

Results are returned from functions using the return keyword, as in return 5 or simply return if there is no result. Functions are called by simply specifying the name, followed by a parenthesized list of arguments:

 addModK(5, 3, 7)
 sort(myList, func(x,y) x < y)
 sin(5)

Here, sin determines the sine of a value.

As shown above, anonymous functions may be created with func. You can also generally assume all sorts of useful math, string, etc. functions are available, as long as you explain them, in comments or in the text (as also shown above). However, when manipulating the built-in lists, sets, and maps, please use their standard interface (see their sections).

Finally, we also provide the ability to declare functions which are called using a more natural, sentence-like calling syntax:

 function add (value : int) to end of (list)
     (indented body of function)

 add 5 to end of numberList
 add x to end of numberList

Comments

Comments, surrounded by parentheses and typed in italics, occupy an entire line or a suffix of a line. Parentheses are used in analogy with English text, to suggest that comments are parenthetical:

  (Do something silly)
  i := i + 0       (add nothing to i)

If a comment is so long it requires multiple lines, it is preferable to embed it in the text somehow instead; however, if necessary they may be done simply by placing the entire comment inside parentheses, italicizing each line, and indenting appropriately:

    (This is a very long and unnecessary comment which is somewhat difficult
     to edit because it would require the editor to fix the line wrapping,
     which is bound to be annoying.)
    i := 5

I'm hoping for a better proposal for multi-line comments.

Variables

Variables are mutable, are referenced by name, and are assigned new values using the := operator. See Names for naming guidelines. Local variables need not be declared, but if you wish to you can use the var form for this:

 var w               (number of free munchkins)
 var x := 2
 var y : int := 5
 var z : int
 var a := 2, b : int, c, (d : int) := 5

You can optionally place parentheses around a variable and its type, for clarity, as shown. Variables need not be initialized when declared, but may not be used before being initialized.

Conditionals and Loops

Conditional tests can be done with the if-else statement, with if and else emboldened:

 if x = 5
     (x is 5 here)
 else
     (x is not 5 here)

There is no then or endif, and there need not be an else branch. The condition must have have bool type (see section Types).

There are only four types of loops, the while loop, and the do-while loop, the for-to loop, and the for each-in loop. The while loop iterates while the condition holds; the until keyword may also be used to iterate until the condition does not hold. No extra parentheses or keywords are needed:

 while x ≠ 0
     (body of while loop)

The do-while loop, unlike the while loop, always executes at least once before testing its condition:

 do
     (body)
 while done = false

The for each-to loop steps an integer variables along a range of integer values in increasing order, including both endpoints. For decreasing order, for each-down to can be used.

 for each x from 1 to 10
     (body, executed 10 times)
 for each x from 10 down to 1
     (body, executed 10 times)

Finally, the for each-in loop is used with lists and sets (see their sections) and iterates over the elements of the list in order or set in some arbitrary order:

 for each n in numberList
     (do something with n)
 for each n in numberSet
     (do something with n)

The keyword exit loop can be used to break out of any loop at any time and execute the instructions following the loop.

Operators

From highest to lowest order of precedence, the standard operators are:

Op	Typed	Description
`^`	`^`	Exponentiate numeric values
`×,÷`	`×,÷`	Multiply/divide numeric values
`+,-`	`+,-`	Add/subtract numeric values
`=,<,>,≤,≥,≠`	`=,<,>,≤,≥,≠`	Comparison
`in`	`'''in'''`	Collection membership
`not`	`'''not'''`	Logical not
`and,or`	`'''and''','''or'''`	Logical and, or
Need more operators

All of the entities used above are rendered correctly by all modern browsers. All operators are left-associative except ^. There is no integer divide; instead use truncate(x ÷ y), or just floor(x ÷ y) for positive quantities.

New operators may be used at any time, if they are explained. If you do need to invent your own operators, consider using:

∼,≈,≡ (&sim;,≈,&equiv;) for equivalence relations
add more good candidates here

Types

There are only a few built-in types available, as well as mechanisms for creating user-defined types. Types (like comments) are always typeset in italics; no ambiguity arises because types should not occur on a line by themselves or be preceded by //.

The simplest built-in type bool has one of two built-in values: true or false. Any bool value is a valid condition for a conditional or loop construct. The built-in type int is an infinite-precision integer type including positive and negative values and zero. The type real represents infinite-precision, infinite-range real numbers, while float represents a floating point number with whatever characteristics you like; if they are important, explain them in the text. If you need additional integer types, such as unsigned 32-bit integers, in your example, name a type in the article text and then use it.

The type character includes all conceivable writable characters and is written by surrounding the character with single quotes ('), and string is a finite-length string of such characters, surrounded by double-quotes ("). Double-quotes may be used inside a string, as long as it's clear from context where the string ends; the outer double-quotes may be emphasized with bold ("hel"lo") if this is not clear.

Although these types are unrealistic, they are useful for two reasons: some of these types can be simulated to a large extent in practice (using arbitrary-precision arithmetic libraries), and, more importantly for pseudocode, the use of such types simplifies many algorithms, eliminating some relatively unimportant implementation concerns.

There is also a primitive list type, a primitive set type, and a primitive map type. See their respective sections for more details.

A new record type (called a struct in C) may be defined using the record keyword, may be recursive, and must include field names but not necessarily types:

 record employee
     name : string
     age : int
     mentor : employee
     stuff

Records, as well as tagged unions below, are only referred to by reference, and references may contain the special value null to indicate they refer to nothing. Record fields are accessed using ., as in e.name, and such values may be assigned to. Multiple fields may be listed on a line, if commas are used:

 record employee
     name : string, age : int
     mentor : employee, stuff

Types may also be constructed using the tagged union keyword, which defines a tagged union, much like ML's datatype. Tagged unions may be recursive, and assign a tag name to each branch:

 tagged union tree
     Leaf:
     Node:  left : tree, right : tree

Fields are specified after each tag; only one set of these fields is available at a time, depending on the tag. The tag of a tagged union object may be determined using the cases keyword, and within the cases, fields of the correct branch of the tagged union may be accessed:

 cases t
     Leaf: (it's a leaf)
           (do stuff)
     Node: (it's a node)
           (do stuff with t.left and t.right)

If a type is not specified for an entity, such as an argument, local variable, or record field, it can assume any value. Operations such as + may "fail" (make the pseudocode invalid) if they may be applied to a value they do not apply to.

Lists and Arrays

A list stores a sequenced list of values. The primitive list operations are:

 list(1, 2, 3)                      (the list containing 1, 2, 3 in that order)
 emptyList                          (the list containing no elements)
 insert (value) at front of (list)  (inserts new value at front of list; push)
 insert (value) at end of (list)    (inserts new value at end of list; enqueue)
 removeFront(list)                  (removes value from front of list; pop/dequeue)
 removeEnd(list)                    (removes value from end of list; undo enqueue)
 isEmpty(list)                      (returns true if and only if list is empty)
 getFront(list)                     (gets the value at the front of the list)
 getEnd(list)                       (gets the value at the end of the list)
 list[i]                            (gets ith element of the list, zero for front)
 x in list                          (x is an element of the list)
 size(list)                         (gets number of elements in the list)
 listToSet(list)                    (get a set containing all elements of the list)

You can also use for-in to iterate over a list from front to back. When a list variable with elements of one type is declared, the syntax type[lower bound..upper bound] should be used, as in:

 var numbers : int[0..k-1]
 var (names : string[1..numNames]) := list("Joe", "Harry")
 var (moreNames : string[]) := list("Joe", "Harry")

If the upper bound is omitted, the size can vary; if the lower bound is omitted, it is assumed to be zero.

The type array is a special synonym for list which implies that the operations above have the following associated costs, where n is the number of elements in the array:

 insert (value) at front, removeFront(list)      O(n)
 insert (value) at end                           O(1) amortized
 removeEnd(list)                                 O(1)
 isEmpty(list)                                   O(1)
 getFront(list), getEnd(list), list[i]           O(1)
 list[i]                                         O(1)
 x in list                                       O(n)
 size(list)                                      O(1)
 listToSet(list)                                 O(n)

There is also an array constructor array(1, 2, 3) and an object emptyArray equivalent to the list versions but meant to suggest they produce arrays.

Finally, multidimensional arrays are permitted, declared and index as shown:

 var a : int[1..j,1..k]
 var b : int[,]
 a[2,3] := a[3,4] + 1
 for each i from 1 to 10
     b[1,i] := i+1

Accesses to elements which do not exist expand the array. If you need to initialize the contents of an array, you are encouraged to do this in the text, as in: The array b is initialized to this value:

${\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}}$

Set

A set stores an unordered sequence of values with no duplicates. The primitive set operations are:

 set(1, 1, "a", 3)         (the set containing 1, "a", and 3)
 emptySet                  (the empty set; don't use the math symbol)
 insert(set, value)        (insert a new value into the set)
 remove(set, value)        (remove a value from the set)
 x := extractElement(set)  (removes a value from the set and returns it)
 x in set                  (x is in the set)
 size(set)                 (gets number of elements in the set)
 setToList(set)            (get a list containing all elements of the set)

You can also use for-in to iterate over a set in some arbitrary order.

Map

Maps are associative arrays. They take keys and produce associated values. All keys which have not been assigned values map to the special value unassigned, but to actually determine if a key is in the map it is preferable to use x in keys(map). The primitive operations are:

 map(1 -> "a", 'k' -> 7)   (the map mapping 1 to "a" and 'k' to 7)
 emptyMap                  (the empty map)
 map[x]                    (get value associated with x)
 map[x] := y               (make x map to y from now on)
 map[x] := unassigned      (unassign x)
 size(map)                 (gets number of assigned keys in the map)
 keys(map)                 (gets a list of the keys in the map)
 values(map)               (gets a list of the values in the map)

You cannot iterate over a map; if you need to do this, iterate over the list of keys instead, looking up corresponding values as needed.

Names

No two named entities, including functions, variables, and types, may share the same name in a single code listing (although duplicate names are possible in a single article). Valid names may include letters, numbers, greek letters, and mathematical markup, such as subscripts, superscripts, and primes (typed as apostrophes), but names may not start with a number. Names are case-sensitive, but names that differ only by case are strongly discouraged.

Multiword names will use mixed case, always beginning with lowercase. Underscores (_) or other word separators will not be used.

Line length

If possible, no line should exceed 78 characters, but there may be up to 100 characters on a line. Also, if lines can be made shorter without adding additional lines or sacrificing clarity, this should be done.

Indention

The entire code listing will be indented an initial 2 spaces. A function body or the body of a conditional or loop construct must always be indented exactly 4 additional spaces. For example:

 function foo(x, y)
     if x ≠ y
         (Doesn't terminate for negative y)
         while x > y
             x := x - y
         return x
     else
         foo(x + 1, y)

Any other indention performed is up to the writer, but should be done in a clear way that keeps line length under the maximum while retaining clarity and not falsely associating unrelated elements.

The following are some helpful suggestions about indentation that are not mandated. If a function call's arguments are large enough that they cannot fit on one line, a line break should either follow an operator, a comma, or an opening parenthesis. If it follows a comma, the next line is indented to the column following the initial opening parenthesis of the call. If it follows an operator or parenthesis, it is indented 2 additional spaces. For example (don't really use names like the following):

 function foo(x, y)
     if x > y
         otherFunctionWithGreatBigName(thisArgumentFillsUpTheRestOfTheLine,
                                       argument1ToThePlusOperator +
                                         argument2ToThePlusOperator,
                                       callYetAnotherFunctionWithBigName(
                                         argToYetAnotherFunction))

Naturally, since this is less clear, clarity of naming and line length should be traded off intelligently.

Links

Links may be used anywhere they are useful in pseudocode, particularly in comments and around calls to functions not specified in the current listing. However, if used on an emboldened or italic word, it should be used in addition to the existing markup. It may also be desirable to link tagged union or record the first time they are used conspicuously in an article. Also, use common link sense: link only strongly relevent articles.

A page has been created describing wikicode for readers (Wikipedia: Wikicode, still needs work), and each page which uses wikicode must immediately precede its first conspicuous use with {{wikicode}} (Template: Wikicode), a message which will provide the reader a link to the explanation for readers, as well as this page. Other uses may optionally include the tag, and are recommended to if they're in widely separated or independent sections.

Please add your comments, but also don't hesitate to edit the proposal!

The Vote

In an effort to solicit wider feedback about the proposal and to experiment with its use in practice, I recommend we move forward with changing any existing pseudocode in articles to match this proposal. If no strong objections are listed here within 5 days, this page will be cleaned up and moved to Wikipedia: Wikicode/Specification, the user's guide on Wikipedia: Wikicode will be completed, and then articles will be changed. Note that this does not affect code samples in any particular real language.

Yes, proceed

Derrick Coetzee 21:12, 25 Aug 2004 (UTC)
AdmN 21:24, 25 Aug 2004 (UTC)

No, needs more work

Ambivalent/apathetic votes

wrp103 (Bill Pringle) | Talk]] 21:44, 28 Aug 2004 (UTC) I don't know if anyone can chime in, but this standard seems a little more comprehensive than necessary for the casual user. It might be intimidating for someone who just wants to write a few lines of pseudocode to illustrate some point. Does this mean that somebody will go along behind folks and tidy up their pseudocode?

The page Wikipedia: Wikicode will describe the pseudocode in a friendly fashion for casual readers and editors. People can use their own pseudocode or make mistakes in wikicode, but others will fix these up, and the hope is that regular contributors would learn it. The specification document is mainly so detailed in order to increase consistency between articles and to be as unambiguous as possible. Any suggestions are appreciated, though, the sooner the better. Derrick Coetzee 22:52, 28 Aug 2004 (UTC)