User:Dcoetzee/Wikicode/Specification
A while ago, on Talk: Binary tree, dze27 said:
Also, perhaps we need some sort of consistent pseudo-language for specifying pseudocode. Something like they do in Introduction to Algorithms, or smilar dze27
- The pseudocode in Intro to Algorithms requires some rather complex typesetting (I've had to duplicate it before). This would be better for readers, but not conducive to editing by writers. I agree there needs to be consistency; I've written pseudocode on a number of pages, but I don't claim mine is ideal. I try to focus on a few aspects I consider important:
- Clarity: the most important; it should be clear what the code does
- Brevity: the code should not occupy much article space, and should consist mainly of algorithm content as opposed to unnecessary syntax
- Ease of writing: the code should be easy to write and edit
- Universality: the code should be clear to programmers familiar with nearly any common language, and not too similar to any one
- To facilitate these I've adopted a few conventions, such as Wirth-style := for assignment, use of indention in place of blocks and end marks, and so on, but I appreciate any suggestions for improvements.
- Derrick Coetzee 18:30, 4 May 2004 (UTC)
Here's my (that is, DCoetzee's) somewhat incomplete suggestion for a standard pseudocode, intended to meet the above general goals, please criticize, expand, and help wikify and finalize it:
The Wikicode Standard
This is an informal standard for a pseudocode which I call wikicode. The purpose of wikicode is to easily facilitate both writing and reading of clear and concise descriptions of algorithms and data structures. All wikicode will be typeset in a fixed-width font, for example by prefixing each line with spaces. Do not wrap the code in HTML <pre> tags; this disallows additional markup inside the code block. If wikicode occurs in the middle of encyclopedic text, this will be indicated by wrapping it in HTML <code> tags, which also use a fixed-width font.
Functions
A possibly recursive function is defined using the emboldened function
keyword. It is given a list of arguments; types may or may not be specified for the arguments, but if they are they will be indicated by the argument name, a colon (:), and then the type name (see Types). Unabbreviated names should be preferred, unless the name is explained in the text or comments:
function sort(list, comparisonPredicate) indented body of function
function addModK(left : int, right : int, modulus : int) indented body of function
Functions are called by simply specifying the name, followed by a parenthesized list of arguments. If there is only one argument, the parentheses may be omitted, if this is clearer:
addModK(5, 3, 7) sort(myList, func(x,y) x < y) sin 5
As shown above, anonymous functions may be created with func. You can also generally assume all sorts of useful math, string, etc. functions are available, as long as you explain them, in comments or in the text. However, when manipulating the built-in lists, sets, and maps, please use their standard interface (see their sections).
Comments
Comments are indicated by the use of italics occupying an entire line or a suffix of a line. In the former case, they should form complete (possible imperative) sentences; in the latter case, they should be preceded by //
, to delineate them:
Do something silly. i := i + 0 // add nothing to i
Variables
Variables are mutable, are referenced by name, and are assigned new values using the :=
operator. See Names for naming guidelines. Local variables need not be declared, but if you wish to you can use the var form for this:
var w // number of free munchkins var x := 2 var y : int := 5 var z : int var a := 2, b : int, c, d : int := 5
Variables need not be initialized when declared, but may not be used before being initialized.
Conditionals and Loops
Conditional tests can be done with the if
-else
statement, with if and else emboldened:
if x = 5 x is 5 here else x is not 5 here
There is no then or endif, and there need not be an else branch. The condition must have have bool
type (see section Types). The equality comparison operator is simply =
. For less-than or equal to ≤
(≤) will be used, for greater-than or equal to ≥
(≥) will be used, and for not-equal ≠
(≠) will be used, as in mathematical texts.
There are only four types of loops, the while loop, and the do-while loop, the for-to loop, and the for-in loop. The while loop iterates while the condition holds; the until keyword may also be used to iterate until the condition does not hold. No extra parentheses or keywords are needed:
while x ≠ 0 body of while loop
The do-while loop, unlike the while loop, always executes at least once before testing its condition:
do body while done = false
The for-to loop steps an integer variables along a range of integer values in increasing order. For decreasing order, for-downto can be used.
for x := 1 to 10 body, executed 10 times for x := 10 downto 1 body, executed 10 times
Finally, the for-in loop is used with lists and sets (see section Sets) and iterates over the elements of the list in order or set in some arbitrary order:
for n in numberList do something with n for n in numberSet do something with n
The keyword end loop
can be used to break out of any loop at any time.
Types
There are only a few built-in types available, as well as mechanisms for creating user-defined types. Types (like comments) are always typeset in italics; no ambiguity arises because types should not occur on a line by themselves or be preceded by //
.
The simplest built-in type bool
has one of two built-in values: true or false. Any bool
value is a valid condition for a conditional or loop construct. The built-in type int
is an infinite-precision integer type including positive and negative values and zero. The type character
includes all conceivable writable characters and is written by surrounding the character with single quotes ('), and string
is a string of such characters, surrounded by double-quotes ("). Double-quotes may be used inside a string, as long as it's clear from context where the string ends; the outer double-quotes may be emphasized with bold ("hel"lo"
) if this is not clear.
There is also a primitive list
type, a primitive set
type, and a primitive map
type. See their respective sections for more details.
A new record type (called a struct in C) may be defined using the record keyword, may be recursive, and must include field names but not necessarily types:
record employee name : string age : int mentor : employee stuff
Records, as well as tagged unions below, are only referred to by reference, and references may contain the special value null to indicate they refer to nothing. Record fields are accessed using .
, as in e.name
, and such values may be assigned to. Multiple fields may be listed on a line, if commas are used:
record employee name : string, age : int mentor : employee, stuff
Types may also be constructed using the tagged union keyword, which defines a tagged union, much like ML's datatype
. Tagged unions may be recursive, and assign a tag name to each branch:
tagged union tree Leaf: Node: left : tree, right : tree
Fields are specified after each tag; only one set of these fields is available at a time, depending on the tag. The tag of a tagged union object may be determined using the cases keyword, and within the cases, fields of the correct branch of the tagged union may be accessed:
cases t Leaf: // it's a leaf Node: // it's a node, do stuff with t.left and t.right
If a type is not specified for an entity, such as an argument, local variable, or record field, it can assume any value. Operations such as +
may "fail" (make the pseudocode invalid) if they may be applied to a value they do not apply to.
List
A list stores a sequenced list of values. The primitive list operations are:
list(1,2,3) // the list containing 1, 2, 3 in that order emptyList // the list containing no elements insertFront list value // inserts new value at front of list; push insertEnd list value // inserts new value at end of list; enqueue removeFront list // removes value from front of list; pop/dequeue removeEnd list // removes value from end of list; undo enqueue isEmpty list // returns true if and only if list is empty getFront list // gets the value at the front of the list getEnd list // gets the value at the end of the list list[i] // gets ith element of the list, zero for front size list // gets number of elements in the list listToSet list // get a set containing all elements of the list
You can also use for-in to iterate over a list from front to back.
Set
A set stores an unordered sequence of values with no duplicates. The primitive set operations are:
set(1,1,2,3) // the set containing 1, 1, 2, 3 emptySet // the empty set; don't use the math symbol insert set value // insert a new value into the set remove set value // remove a value from the set x := extractElement set // removes a value from the set and returns it x ∈ set // x is in the set (written ∈) x ∉ set // x is not in the set (written ∉) size list // gets number of elements in the set setToList set // get a list containing all elements of the set
You can also use for-in to iterate over a set in some arbitrary order.
Map
Maps are associative arrays. They take keys and produce associated values. All keys which have not been assigned values map to the special value unassigned
, but it may be clearer to readers if you use the unassigned
operation below. The primitive operations are:
map(1 -> "a", 2 -> false) // the map mapping 1 to "a" and 2 to false emptyMap // the empty map map[x] // get value associated with x map[x] := y // make x map to y from now on map[x] := unassigned // unassign x size list // gets number of assigned keys in the map
You cannot iterate over a map.
Names
No two named entities, including functions, variables, and types, may share the same name in a single code listing (although duplicate names are possible in a single article). Valid names may include letters, numbers, greek letters, and mathematical markup, such as subscripts, superscripts, and primes (typed as apostrophes), but names may not start with a number. Names are case-sensitive, but names that differ only by case are strongly discouraged.
Multiword names will use mixed case, always beginning with lowercase. Underscores (_) or other word separators will not be used.
Line length
If possible, no line should exceed 78 characters, but there may be up to 100 characters on a line. Also, if lines can be made shorter without adding additional lines or sacrificing clarity, this should be done.
Indention
The entire code listing will be indented an initial 2 spaces. A function body or the body of a conditional or loop construct must always be indented exactly 4 additional spaces. If a function call's arguments are large enough that they cannot fit on one line, a line break will either follow an operator, a comma, or an opening parenthesis. If it follows a comma, the next line is indented to the column following the initial opening parenthesis of the call. If it follows an operator or parenthesis, it is indented 2 additional spaces. For example (don't really use names like the following):
function foo(x, y) if x > y otherFunctionWithGreatBigName(thisArgumentFillsUpTheRestOfTheLine, argument1ToThePlusOperator + argument2ToThePlusOperator, callYetAnotherFunctionWithBigName( argToYetAnotherFunction))
Naturally, since this is less clear, clarity of naming and line length should be traded off intelligently.
Links
Links may be used anywhere they are useful in pseudocode, particularly in comments and around calls to functions not specified in the current listing. However, if used on an emboldened or italic word, it should be used in addition to the existing markup. It may also be desirable to link tagged union or record the first time they are used conspicuously in an article. Also, use common link sense: link only strongly relevent articles.
{{msg:wikicode}}
A page will be created describing wikicode for readers, and each page which uses wikicode must immediately precede its first conspicuous use with {{msg:wikicode}}, a message which will provide the reader a link to this page.
Please add your own comments here