
Consistent Overhead Byte Stuffing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 23.83.37.241 (talk) at 03:51, 4 March 2018 (Overhaul description. Overhead is mostly consistent, but some parts of preceding said it was CONSTANT, which is not true; there is a weak data-dependency.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Consistent Overhead Byte Stuffing (COBS) is an algorithm for encoding data bytes that results in efficient, reliable, unambiguous packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. It employs a particular byte value, typically zero, to serve as a packet delimiter (a special value that indicates the boundary between packets). When zero is used as the delimiter, the algorithm replaces each zero data byte with a non-zero value, so that no zero bytes appear inside the encoded packet where they could be misinterpreted as packet boundaries.

Byte stuffing is a process that transforms a sequence of data bytes that may contain 'illegal' or 'reserved' values (such as the packet delimiter) into a potentially longer sequence that contains no occurrences of those values. The extra length of the transformed sequence is typically referred to as the overhead of the algorithm. The COBS algorithm tightly bounds the worst-case overhead: for a packet of n bytes, the overhead is at least one byte and at most ⌈n/254⌉ bytes (one byte in 254, rounded up). Consequently, the time to transmit the encoded byte sequence is highly predictable, which makes COBS useful for real-time applications in which jitter may be problematic. The algorithm is computationally inexpensive and its average overhead is low compared to other unambiguous framing algorithms.[1][2]

COBS does, however, require up to 254 bytes of lookahead. Before transmitting its first byte, it needs to know the position of the first zero byte (if any) in the following 254 bytes.
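The overhead bound can be turned into a buffer-sizing helper. The following is a minimal sketch, not part of the original description: the function name cobs_max_encoded_len is hypothetical, and it assumes an encoder that omits the optional final group header when the packet ends with a full group of 254 non-zero bytes. The trailing framing byte is not counted.

```c
#include <stddef.h>

/*
 * Hypothetical helper: worst-case length of the COBS encoding of an
 * n-byte packet, excluding the trailing zero framing byte.
 * The overhead is max(1, ceil(n / 254)) bytes.
 * Assumes n is small enough that the addition does not overflow.
 */
size_t cobs_max_encoded_len(size_t n)
{
    if (n == 0)
        return 1;                 /* an empty packet still needs one header byte */
    return n + (n + 253) / 254;   /* n data bytes + ceil(n / 254) header bytes */
}
```

An encoder that always emits the final group header, even after a full group, can need one byte more than this, so such an encoder should size its buffer as n + 1 + n/254 instead.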

Packet framing and stuffing

When packetized data is sent over any serial medium, some protocol is required to demarcate packet boundaries. This is done by using a framing marker, a special bit-sequence or character value that indicates where the boundaries between packets fall. Data stuffing is the process that transforms the packet data before transmission to eliminate all occurrences of the framing marker, so that when the receiver detects a marker, it can be certain that the marker indicates a boundary between packets.

For simplicity, COBS is described assuming a framing byte of zero, but the technique can be applied to an arbitrary framing byte. COBS transforms an arbitrary string of bytes in the range [0,255] into bytes in the range [1,255]. Having eliminated all zero bytes from the data, a zero byte can now be appended to the transformed payload to unambiguously mark the end of the packet.
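Because the encoded payload contains no zero bytes, a receiver can locate packet boundaries by scanning for zero alone, even after joining the stream mid-packet. The following is a minimal sketch of such a scan; the function name next_frame and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan buf[0..len) for the zero framing byte.
 * On success, stores the length of the encoded frame (delimiter
 * excluded) in *frame_len and returns the number of bytes consumed,
 * delimiter included. Returns 0 if no complete frame is buffered yet.
 */
size_t next_frame(const uint8_t *buf, size_t len, size_t *frame_len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {        /* a zero can only be a frame boundary */
            *frame_len = i;
            return i + 1;
        }
    }
    return 0;                     /* delimiter not received yet */
}
```

A caller would decode the frame_len bytes at the start of the buffer, then advance by the returned count and repeat. A frame that fails to decode can simply be dropped, since the next zero byte re-establishes synchronization.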

There are two equivalent ways to describe the COBS encoding process:

Prefixed block description
To encode some bytes, first append a zero byte, then break them into groups of either 254 non-zero bytes, or 0–253 non-zero bytes followed by a zero byte. Because of the appended zero byte, this is always possible.
Encode each group by deleting the trailing zero byte (if any) and prepending the number of non-zero bytes, plus one. Thus, each encoded group is the same size as the original, except that 254 non-zero bytes are encoded into 255 bytes by prepending a byte of 255.
As a special exception, if a packet ends with a group of 254 non-zero bytes, it is not necessary to encode the added trailing zero byte. This saves one byte in some situations.
Linked list description
To encode some bytes, replace each zero byte with the offset to the next zero byte, or the end of the packet. Finally, at the beginning of the packet, prepend the offset to the first zero byte.
The simple form of this encoding fails if the offset to the following zero byte is more than 255, and so will not fit into one byte. To deal with this, first insert a zero byte after every run of 254 non-zero bytes. During decoding, if an offset is 255, the following offset is decoded, but deleted, rather than decoded as a zero byte (as it is for all smaller offsets).
As a special case, it is not necessary to insert a zero byte if the packet ends with 254 non-zero bytes.
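The prefixed block description, including the special exception for a final group of 254 non-zero bytes, can be sketched as follows. This is an illustrative sketch rather than a reference implementation; the function name cobs_encode_groups is hypothetical, and the appended zero byte is handled implicitly by treating the end of input as a zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode src[0..len) one group at a time.
 * dst must have room for len + len/254 + 1 bytes.
 * Returns the number of bytes written; no framing byte is appended.
 */
size_t cobs_encode_groups(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t in = 0, out = 0;

    for (;;) {
        size_t header = out++;    /* reserve room for the group header */
        uint8_t count = 0;        /* non-zero bytes copied into this group */

        while (in < len && src[in] != 0 && count < 254) {
            dst[out++] = src[in++];
            count++;
        }
        dst[header] = (uint8_t)(count + 1);

        if (count == 254) {
            /* Full group: no zero byte is consumed. Special exception:
             * if the input is exhausted, the appended zero is not encoded. */
            if (in == len)
                return out;
        } else if (in < len) {
            in++;                 /* consume the zero byte that ended the group */
        } else {
            return out;           /* the group ended at the appended zero: done */
        }
    }
}
```

With this approach an empty packet encodes to the single byte 01, and a packet of exactly 254 non-zero bytes encodes to 255 bytes rather than 256.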

It is worth noting that while other byte-stuffing algorithms have their worst case when the framing byte occurs frequently in the data to be encoded, this is the best case for COBS. Beyond the fixed one-byte overhead, COBS expands the data only when 254 or more non-zero bytes occur in a row. This is another reason that COBS normally escapes zero bytes; they are common in uncompressed binary data.

Encoding examples

These examples show how various data sequences would be encoded by the COBS algorithm. In the examples, all bytes are expressed as hexadecimal values, and encoded data is shown with text formatting to illustrate various features:

  • An overhead byte appears at the beginning of every encoded packet. This byte does not correspond to a data byte; instead it encodes the offset to the first zero in the packet.
  • Bold indicates a data byte that has not been altered by encoding. All non-zero data bytes remain unaltered.
  • Green indicates a zero data byte that was altered by encoding. All zero data bytes are replaced during encoding by the offset to the following zero byte (i.e. one plus the number of non-zero bytes that follow). It is effectively a pointer to the next packet byte that requires interpretation: if the addressed byte is non-zero then it is the following group header byte (a replaced zero data byte) that in turn points to the next byte requiring interpretation; if the addressed byte is zero then it is the end of packet.
  • A zero byte appears at the end of every packet to indicate end-of-packet to the data receiver. This packet delimiter byte is not part of COBS proper; it is an additional framing byte that is appended to the encoded output.
Example  Unencoded data (hex)  Encoded with COBS (hex)
1        00                    01 01 00
2        00 00                 01 01 01 00
3        11 22 00 33           03 11 22 02 33 00
4        11 22 33 44           05 11 22 33 44 00
5        11 00 00 00           02 11 01 01 01 00
6        01 02 ... FE          FF 01 02 ... FE 00

Below is a diagram using example 3 from the table above, to illustrate how each modified data byte is located, and how it is identified as a data byte or an end-of-frame byte.

     [OHB]                              : Overhead byte (Start of frame)
     3+ -------------->|                : First byte points to relative location of first zero symbol
                       2+-------->|     : Is a zero data byte, pointing to next zero symbol
                                  [EOP] : Location of end-of-packet zero symbol.
     0     1     2     3     4    5     : Byte Position
     03    11    22    02    33   00    : COBS Data Frame
           11    22    00    33         : Extracted Data
     
OHB = Overhead Byte (Points to next zero symbol)
EOP = End Of Packet
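The pointer-following process in the diagram can be sketched as a decoder that hops from one group header to the next. This is a minimal sketch under stated assumptions: the function name cobs_decode_walk and its error convention (returning (size_t)-1) are hypothetical, and the input is one frame with its end-of-packet zero already removed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode the frame src[0..len), which excludes
 * the trailing zero delimiter, into dst (at most len bytes are written).
 * Returns the decoded length, or (size_t)-1 if the frame is malformed.
 */
size_t cobs_decode_walk(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t in = 0, out = 0;

    while (in < len) {
        uint8_t code = src[in];   /* offset from this header to the next one */

        if (code == 0 || code > len - in)
            return (size_t)-1;    /* zero header, or pointer past the end */

        for (size_t i = 1; i < code; i++) {
            if (src[in + i] == 0)
                return (size_t)-1;  /* zero byte inside a frame */
            dst[out++] = src[in + i];
        }
        in += code;

        /* A header below 0xFF stands in for a zero data byte, unless the
         * pointer landed on the end of the frame. */
        if (code != 0xFF && in < len)
            dst[out++] = 0;
    }
    return out;
}
```

For the frame in the diagram (03 11 22 02 33), the first header points three bytes ahead to the header 02, which in turn points to the end of the frame, so the decoded data is 11 22 00 33.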

Implementation

The following code implements a COBS encoder and decoder in the C programming language:

#include <stdint.h>
#include <stddef.h>

/*
 * StuffData byte stuffs "length" bytes of data
 * at the location pointed to by "ptr", writing
 * the output to the location pointed to by "dst".
 * The trailing zero framing byte is not written.
 *
 * Returns the length of the encoded data, which is
 * guaranteed to be <= length + 1 + length/254. This
 * simple encoder does not apply the special exception
 * for a final group of 254 non-zero bytes.
 */
#define FinishBlock() (*code_ptr = code, code_ptr = dst++, code = 0x01)

size_t StuffData(const uint8_t *ptr, size_t length, uint8_t *dst)
{
  const uint8_t *start = dst, *end = ptr + length;
  uint8_t *code_ptr = dst++;  /* Where to insert the leading count */
  uint8_t code = 0x01;

  for (; ptr < end; ptr++) {
    if (*ptr != 0) {
      *dst++ = *ptr;
      if (++code != 0xFF)
        continue;
    }
    FinishBlock();  /* Zero byte or full group: start a new block */
  }

  /* Store the count of the final block without reserving another byte */
  *code_ptr = code;
  return dst - start;
}

/*
 * UnStuffData decodes "length" bytes of data at
 * the location pointed to by "ptr", writing the
 * output to the location pointed to by "dst".
 * The input must not include the zero framing byte.
 *
 * Returns the length of the decoded data, which is
 * guaranteed to be <= length.
 */
size_t UnStuffData(const uint8_t *ptr, size_t length, uint8_t *dst)
{
  const uint8_t *start = dst, *end = ptr + length;
  uint8_t code = 0xFF, copy = 0;

  while (ptr < end) {
    if (copy != 0) {
      *dst++ = *ptr++;
    } else {
      if (code != 0xFF)
        *dst++ = 0;
      copy = code = *ptr++;
      if (code == 0)
        break;   /* Should never happen */
    }
    copy--;
  }
  return dst - start;
}

Further overhead reduction

COBS uniquely decodes almost every possible string of bytes in the range [1,255]. The one exception is the last group of bytes, whose group header byte can be replaced by any greater value without affecting the decoded result. This leads to the minimum overhead of one additional byte per packet.

If it is necessary to encode small packets with minimal overhead, it is possible to reduce this by performing COBS across packet boundaries. That is, the packets are all concatenated, COBS is performed on the result, and then the framing bytes are inserted back into the encoded result. This eliminates the one-byte overhead, but introduces the problem of pauses in the packet stream. It is not possible to encode a packet without lookahead into the following packets, which may not be available yet.

This can be addressed in one of the following ways:

  • Append a dummy "no operation" packet which contains a zero byte. It cannot be encoded and sent until after the pause, but the preceding packet, the last one conveying information, can be, or
  • Transmit a continuous stream of dummy packets, or
  • Use some other special sequence outside of COBS. For example, a zero-length packet or a packet consisting of a single zero byte might be otherwise illegal, and thus available to indicate a coding restart, or
  • Modify COBS by reducing the maximum group header byte to 254, and use a group header byte of 255 to encode a pause, or
  • Use the tiny amount of remaining redundancy to mark a coding pause. Specifically, the encoded bytes mostly correspond one-to-one with raw bytes, leaving no ambiguity as to where the framing bytes go, but the group header byte after a group of 254 non-zero bytes has no corresponding raw byte, so it is possible to interpret the position of a framing byte relative to this group header byte. In the normal case, when there is no pause in packet transmission, let the group header byte come before the framing byte. If a framing byte occurs immediately before the group header byte (i.e. immediately after the last byte in the group), consider the just-completed packet a dummy to pause encoding and discard it. This requires appending a dummy packet of up to 254 bytes, but that may not be a problem as the channel is idle.

References

  1. ^ Cheshire, Stuart; Baker, Mary (April 1999). "Consistent Overhead Byte Stuffing" (PDF). IEEE/ACM Transactions on Networking. 7 (2). doi:10.1109/90.769765. Retrieved November 30, 2015.
  2. ^ Cheshire, Stuart; Baker, Mary (17 November 1997). Consistent Overhead Byte Stuffing (PDF). ACM SIGCOMM '97. Cannes. Retrieved November 23, 2010.