Gap buffer

In computer science, a gap buffer is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in a large buffer in two contiguous segments, with a gap between them for inserting new text. Moving the cursor involves copying text from one side of the gap to the other (sometimes the copying is delayed until the next operation that changes the text). Insertion adds new text at the end of the first segment. Deletion increases the size of the gap.

It is a fairly simple technique that involves keeping track of five pointers and a sequential block (the gap) inside the buffer structure for inserting new text. The five pointers are:
(1) head of the buffer,
(2) start of the gap,
(3) first location outside the gap,
(4) end of the buffer, and
(5) location (point) within the buffer.

The main rule for point is that it must be within the buffer and cannot be anywhere inside the gap other than the beginning of it.
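
The pointer layout above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the class name, the method names, and the fixed initial gap of 16 slots are all assumptions, and point is simply kept at the start of the gap, which satisfies the rule above.

```python
class GapBuffer:
    """Minimal gap buffer over a list of characters (illustrative names)."""

    def __init__(self, text="", gap=16):
        # Layout: [ text before gap | gap | text after gap ]
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)      # (2) start of the gap
        self.gap_end = len(self.buf)    # (3) first location outside the gap
        # (1) the head is index 0 and (4) the end is len(self.buf).
        # (5) point is kept equal to gap_start in this sketch.

    def insert(self, s):
        for ch in s:
            if self.gap_start == self.gap_end:
                raise MemoryError("gap is full; a real buffer would grow here")
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete_backward(self, n=1):
        # Deleting only widens the gap; the old characters are left in place.
        self.gap_start = max(0, self.gap_start - n)

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


b = GapBuffer("hello")
b.insert(", world")
b.delete_backward(5)   # b.text() is now "hello, "
```

Neither operation touches the text after the gap, which is why edits at point are cheap.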

The advantage of using a gap buffer over more sophisticated data structures (such as linked lists) is that the text is represented simply as two literal strings, which take very little extra space and which can be searched and displayed very quickly.

The disadvantage is that operations at different locations in the text, and operations that fill the gap (requiring a new gap to be created), may require recopying most of the text, which is especially inefficient for large files. The use of gap buffers is based on the assumption that such recopying occurs rarely enough that its cost can be amortized over the more common cheap operations.
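
To make the cost of a filled gap concrete, the following sketch shows one way a new gap might be created. The function name `grow_gap` and its parameters are hypothetical, and how many slots to add is a policy choice that the text above does not specify.

```python
def grow_gap(buf, gap_start, gap_end, extra):
    """Return (new_buf, new_gap_end) with the gap widened by `extra` slots.

    Every element outside the gap is copied into the new list, which is
    the recopying cost described above.
    """
    tail = buf[gap_end:]                      # text after the gap
    new_gap = [None] * (gap_end - gap_start + extra)
    new_buf = buf[:gap_start] + new_gap + tail
    return new_buf, gap_end + extra


# A buffer holding "abcd" whose gap (at index 2) has been completely filled.
buf, gap_end = grow_gap(list("abcd"), 2, 2, extra=3)
# buf is now ['a', 'b', None, None, None, 'c', 'd'] and gap_end is 5
```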

Gap Buffer in Emacs

A gap buffer is used in most Emacs editors. Emacs buffers are implemented using an invisible gap to make insertion and deletion faster. Insertion works by filling in part of the gap, and deletion adds to the gap. Of course, this means that the gap must first be moved to the locus of the insertion or deletion. Emacs moves the gap only when you try to insert or delete. This is why your first editing command in one part of a large buffer, after previously editing in another far-away part, sometimes involves a noticeable delay.
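
The deferred gap movement described above can be illustrated with a short Python sketch. This is not Emacs's code: the class `LazyGapBuffer`, its methods, and the `moves` counter are assumptions made for the illustration, and the sketch does not grow the gap.

```python
class LazyGapBuffer:
    def __init__(self, text, gap=8):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)
        self.gap_end = len(self.buf)
        self.point = len(text)   # logical cursor position in the text
        self.moves = 0           # characters copied by gap moves so far

    def goto(self, pos):
        # Cursor motion is free: the gap stays where it is.
        self.point = pos

    def _move_gap(self, pos):
        while self.gap_start > pos:          # shift the gap left
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
            self.moves += 1
        while self.gap_start < pos:          # shift the gap right
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1
            self.moves += 1

    def insert(self, s):
        self._move_gap(self.point)           # the move is paid for only now
        assert len(s) <= self.gap_end - self.gap_start, "sketch does not grow the gap"
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1
        self.point = self.gap_start

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


b = LazyGapBuffer("hello world")
b.goto(0)
b.goto(5)        # b.moves is still 0
b.insert(",")    # copies " world" across the gap: b.moves becomes 6
```

The first insertion after a long jump pays for the whole move, and further insertions at the same place cost nothing extra, which matches the delay described above.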

This mechanism works invisibly, and Lisp code should never be affected by the gap's current location, but the following functions are available for getting information about the gap status.

Function: gap-position

This function returns the current gap position in the current buffer.

Function: gap-size

This function returns the current gap size of the current buffer.

Text Editing with Gap Buffer

Editable sequences are useful, in particular in interactive applications such as text editors, word processors, score editors, and more. In such applications, it is highly likely that an editing operation is close to the previous one, measured as the difference in positions in the sequence. This statistical behavior makes it feasible to implement the editable sequence as a gap buffer.
The basic idea is to store objects in a vector that is usually longer than the number of elements stored in it. For a sequence of N elements where editing is required at index i, elements 0 through i are stored at the beginning of the vector, and elements i + 1 through N − 1 are stored at the end of the vector. When the vector is longer than N, this storage leaves a gap. Editing operations always result in modifications at the beginning or at the end of the gap.
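
As a sketch of this storage scheme, a logical index in the sequence can be mapped to a slot in the vector by skipping over the gap. The function name `physical_index` and the half-open gap bounds `gap_start` and `gap_end` are assumptions chosen for the illustration.

```python
def physical_index(i, gap_start, gap_end):
    """Map logical index i to its slot in the underlying vector.

    Slots gap_start .. gap_end - 1 form the gap and hold no elements.
    """
    return i if i < gap_start else i + (gap_end - gap_start)


# Six elements stored in a vector of length 10, with a gap of 4 slots.
vector = list("abc") + [None] * 4 + list("def")
gap_start, gap_end = 3, 7
elements = [vector[physical_index(i, gap_start, gap_end)] for i in range(6)]
# elements is ['a', 'b', 'c', 'd', 'e', 'f']
```
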
Occasionally, the gap has to be moved, or rather, some elements have to be moved so as to leave the gap where the next editing operation is desired. In the worst case, i.e., that of an alternating sequence of editing operations at the beginning and at the end of the sequence, every element needs to be moved. While it can be argued that this case does not happen very frequently, it unfortunately corresponds to operations that might be reasonable in some clients, namely rotation of the elements or the use of the sequence as a queue.
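
The difference between clustered and alternating edits can be estimated with a simple cost model. The function below is a hypothetical sketch: it assumes that moving the gap from index a to index b copies |a − b| elements, and it ignores the elements added or removed by the edits themselves.

```python
def gap_move_cost(n, positions, start=0):
    """Total elements copied when the gap visits each position in turn.

    Moving the gap from index a to index b copies |a - b| elements;
    the sequence length n is treated as fixed in this model.
    """
    cost, gap = 0, start
    for pos in positions:
        assert 0 <= pos <= n
        cost += abs(pos - gap)
        gap = pos
    return cost


n = 1000
clustered = gap_move_cost(n, [500, 501, 502, 501, 500])   # 504
alternating = gap_move_cost(n, [0, n, 0, n, 0])           # 4000
```

Under this model, edits near one location pay once for the initial move, while queue-like use pays for the whole sequence on nearly every operation.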

Example

Below are some examples of operations on a gap buffer. The gap is represented pictorially by the empty space between the square brackets. This representation is a bit misleading: in a typical implementation, the endpoints of the gap are tracked using pointers or array indices, and the contents of the gap are ignored; this allows, for example, deletions to be done by adjusting a pointer without changing the text in the buffer. It is a common programming practice to use a semi-open interval for the gap pointers, i.e., the start-of-gap pointer points to the invalid character following the last character in the first buffer, and the end-of-gap pointer points to the first valid character in the second buffer (or equivalently, the pointers are considered to point "between" characters).
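
A small Python sketch of the semi-open convention follows. The helper names are hypothetical, and the letters XYZ stand in for stale characters that happen to sit inside the gap.

```python
buf = list("abcXYZdef")    # indices 3..5 are the gap; "XYZ" is stale data
gap_start, gap_end = 3, 6  # semi-open: buf[gap_start:gap_end] is the gap

def contents(buf, gap_start, gap_end):
    # Whatever lies inside the gap is ignored.
    return "".join(buf[:gap_start] + buf[gap_end:])

def delete_backward(gap_start, n=1):
    # Deleting before the cursor moves the start-of-gap index left.
    return max(0, gap_start - n)

def delete_forward(gap_end, size, n=1):
    # Deleting after the cursor moves the end-of-gap index right.
    return min(size, gap_end + n)

before = contents(buf, gap_start, gap_end)         # "abcdef"
gap_start = delete_backward(gap_start)             # removes "c"
gap_end = delete_forward(gap_end, len(buf))        # removes "d"
after = contents(buf, gap_start, gap_end)          # "abef"
```

Both deletions change only an index; the list `buf` itself is never written to.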

Initial state:

This is the way [                    ]out.

User inserts some new text:

This is the way the world started [   ]out.

User moves the cursor before "started"; system moves "started " from the first buffer to the second buffer.

This is the way the world [   ]started out.

User adds text filling the gap; system creates new gap:

This is the way the world as we know it [                   ]started out.
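
The steps above can be replayed with a short Python sketch. The class and method names are illustrative assumptions, as are the gap sizes (20 slots initially and 20 more whenever the gap fills), so the gap widths printed by `show` need not match the widths drawn in the pictures.

```python
class GapBuffer:
    def __init__(self, before, after, gap):
        self.buf = list(before) + [None] * gap + list(after)
        self.gap_start = len(before)
        self.gap_end = self.gap_start + gap

    def move_gap(self, pos):
        while self.gap_start > pos:          # move text from first to second segment
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:          # move text from second to first segment
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, s, new_gap=20):
        for ch in s:
            if self.gap_start == self.gap_end:   # gap filled: open a new one
                self.buf[self.gap_start:self.gap_start] = [None] * new_gap
                self.gap_end += new_gap
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def show(self):
        before = "".join(self.buf[:self.gap_start])
        after = "".join(self.buf[self.gap_end:])
        return f"{before}[{' ' * (self.gap_end - self.gap_start)}]{after}"


b = GapBuffer("This is the way ", "out.", gap=20)
b.insert("the world started ")
step1 = b.show()
b.move_gap(len("This is the way the world "))   # cursor before "started"
step2 = b.show()
b.insert("as we know it ")                      # fills the gap, so a new one opens
step3 = b.show()
```

In this sketch the second insertion exhausts the two remaining gap slots partway through, at which point a new gap is opened and the insertion continues.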

