Algorithmic efficiency

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm^{[citation needed]}. An algorithm must be analyzed to determine its resource usage^{[citation needed]}, and the efficiency of an algorithm can be measured based on the usage of different resources.^{[citation needed]} Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.^{[citation needed]}

For maximum efficiency it is desirable to minimize resource usage.^{[citation needed]} However, different resources such as time and space complexity cannot be compared directly^{[citation needed]}, so which of two algorithms is considered to be more efficient often depends on which measure of efficiency is considered most important.^{[citation needed]}

For example, bubble sort and timsort are both algorithms to sort a list of items from smallest to largest.^{[citation needed]} Bubble sort sorts the list in time proportional to the number of elements squared ( ${\textstyle O(n^{2})}$ , see Big O notation), but only requires a small amount of extra memory which is constant with respect to the length of the list ( ${\textstyle O(1)}$ ).^{[citation needed]} Timsort sorts the list in time linearithmic (proportional to a quantity times its logarithm) in the list's length ( ${\textstyle O(n\log n)}$ ), but has a space requirement linear in the length of the list ( ${\textstyle O(n)}$ ).^{[citation needed]} If large lists must be sorted at high speed for a given application, timsort is a better choice; however, if minimizing the memory footprint of the sorting is more important, bubble sort is a better choice.^{[citation needed]}

Background

The importance of efficiency with respect to time was emphasised by Ada Lovelace in 1843 as applied to Charles Babbage's mechanical analytical engine:

"In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation"^[1]

Early electronic computers had both limited speed and limited random access memory. Therefore, a space–time trade-off occurred. A task could use a fast algorithm using a lot of memory, or it could use a slow algorithm using little memory. The engineering trade-off was then to use the fastest algorithm that could fit in the available memory.^{[citation needed]}

Modern computers are significantly faster than the early computers, and have a much larger amount of memory available (Gigabytes instead of Kilobytes).^{[citation needed]} Nevertheless, Donald Knuth emphasised that efficiency is still an important consideration:

"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"^[2]

Overview

An algorithm is considered efficient if its resource consumption, also known as computational cost, is at or below some acceptable level.^{[citation needed]} Roughly speaking, 'acceptable' means: it will run in a reasonable amount of time or space on an available computer, typically as a function of the size of the input.^{[citation needed]} Since the 1950s computers have seen dramatic increases in both the available computational power and in the available amount of memory, so current acceptable levels would have been unacceptable even 10 years ago.^{[citation needed]} In fact, thanks to the approximate doubling of computer power every 2 years, tasks that are acceptably efficient on modern smartphones and embedded systems may have been unacceptably inefficient for industrial servers 10 years ago.^{[citation needed]}

Computer manufacturers frequently bring out new models, often with higher performance. Software costs can be quite high, so in some cases the simplest and cheapest way of getting higher performance might be to just buy a faster computer, provided it is compatible with an existing computer.^{[citation needed]}

There are many ways in which the resources used by an algorithm can be measured: the two most common measures are speed and memory usage^{[citation needed]}; other measures could include transmission speed, temporary disk usage, long-term disk usage, power consumption, total cost of ownership, response time to external stimuli, etc.^{[citation needed]} Many of these measures depend on the size of the input to the algorithm, i.e. the amount of data to be processed.^{[citation needed]} They might also depend on the way in which the data is arranged; for example, some sorting algorithms perform poorly on data which is already sorted, or which is sorted in reverse order.^{[citation needed]}

In practice, there are other factors which can affect the efficiency of an algorithm, such as requirements for accuracy and/or reliability.^{[citation needed]} As detailed below, the way in which an algorithm is implemented can also have a significant effect on actual efficiency, though many aspects of this relate to optimization issues.

Theoretical analysis

In the theoretical analysis of algorithms, the normal practice is to estimate their complexity in the asymptotic sense.^{[citation needed]} The most commonly used notation to describe resource consumption or "complexity" is Donald Knuth's Big O notation^{[citation needed]}, representing the complexity of an algorithm as a function of the size of the input ${\textstyle n}$ .^{[citation needed]} Big O notation is an asymptotic measure of function complexity, where ${\textstyle f(n)=O{\bigl (}g(n){\bigr )}}$ roughly means the time requirement for an algorithm is proportional to $g(n)$ , omitting lower-order terms that contribute less than $g(n)$ to the growth of the function as ${\textstyle n}$ grows arbitrarily large.^{[citation needed]} This estimate may be misleading when ${\textstyle n}$ is small, but is generally sufficiently accurate when ${\textstyle n}$ is large as the notation is asymptotic.^{[citation needed]} For example, bubble sort may be faster than merge sort when only a few items are to be sorted; however either implementation is likely to meet performance requirements for a small list.^{[citation needed]} Typically, programmers are interested in algorithms that scale efficiently to large input sizes^{[citation needed]}, and merge sort is preferred over bubble sort for lists of length encountered in most data-intensive programs.^{[citation needed]}

Some examples of Big O notation applied to algorithms' asymptotic time complexity include:

Notation	Name	Examples
$O(1)$	constant	Finding the median from a sorted list of measurements^{[citation needed]}; Using a constant-size lookup table^{[citation needed]}; Using a suitable hash function for looking up an item.^{[citation needed]}
$O(\log n)$	logarithmic	Finding an item in a sorted array with a binary search^{[citation needed]} or a balanced search tree^{[citation needed]} as well as all operations in a Binomial heap.^{[citation needed]}
$O(n)$	linear	Finding an item in an unsorted list^{[citation needed]} or a malformed tree (worst case)^{[citation needed]} or in an unsorted array^{[citation needed]}; Adding two n-bit integers by ripple carry.^{[citation needed]}
$O(n\log n)$	linearithmic, loglinear, or quasilinear	Performing a Fast Fourier transform^{[citation needed]}; heapsort^{[citation needed]}, quicksort (best and average case)^{[citation needed]}, or merge sort^{[citation needed]}
$O(n^{2})$	quadratic	Multiplying two n-digit numbers by a simple algorithm^{[citation needed]}; bubble sort (worst case or naive implementation)^{[citation needed]}, Shell sort^{[citation needed]}, quicksort (worst case)^{[citation needed]}, selection sort^{[citation needed]} or insertion sort^{[citation needed]}
$O(c^{n}),\;c>1$	exponential	Finding the optimal (non-approximate) solution to the travelling salesman problem using dynamic programming^{[citation needed]}; determining if two logical statements are equivalent using brute-force search^{[citation needed]}

Benchmarking: measuring performance

For new versions of software or to provide comparisons with competitive systems, benchmarks are sometimes used, which assist with gauging an algorithms relative performance.^{[citation needed]} If a new sort algorithm is produced, for example, it can be compared with its predecessors to ensure that at least it is efficient as before with known data, taking into consideration any functional improvements. Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example, in the mainframe world certain proprietary sort products from independent software companies such as Syncsort compete with products from the major suppliers such as IBM for speed.^{[citation needed]}

Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages for example^[3]^[4] and The Computer Language Benchmarks Game compares the performance of implementations of typical programming problems in several programming languages.^{[citation needed]}

Even creating "do it yourself" benchmarks can demonstrate the relative performance of different programming languages, using a variety of user specified criteria.^{[citation needed]} This is quite simple, as a "Nine language performance roundup" by Christopher W. Cowell-Shah demonstrates by example.^[5]

Implementation concerns

Implementation issues can also have an effect on efficiency, such as the choice of programming language^{[citation needed]}, or the way in which the algorithm is actually coded,^[6] or the choice of a compiler for a particular language^{[citation needed]}, or the compilation options used^{[citation needed]}, or even the operating system being used.^{[citation needed]} In many cases a language implemented by an interpreter may be much slower than a language implemented by a compiler.^[3] See the articles on just-in-time compilation and interpreted languages.

There are other factors which may affect time or space issues, but which may be outside of a programmer's control; these include data alignment, data granularity, cache locality, cache coherency, garbage collection, instruction-level parallelism, multi-threading (at either a hardware or software level), simultaneous multitasking, and subroutine calls.^[7]

Some processors have capabilities for vector processing, which allow a single instruction to operate on multiple operands^{[citation needed]}; it may or may not be easy for a programmer or compiler to use these capabilities.^{[citation needed]} Algorithms designed for sequential processing may need to be completely redesigned to make use of parallel processing, or they could be easily reconfigured.^{[citation needed]} As parallel and distributed computing grow in importance in the late 2010s, more investments are being made into efficient high-level APIs for parallel and distributed computing systems such as CUDA, TensorFlow, Hadoop, OpenMP and MPI.^{[citation needed]}

Another problem which can arise in programming is that processors compatible with the same instruction set (such as x86-64 or ARM) may implement an instruction in different ways, so that instructions which are relatively fast on some models may be relatively slow on other models.^{[citation needed]} This often presents challenges to optimizing compilers, which must have a great amount of knowledge of the specific CPU and other hardware available on the compilation target to best optimize a program for performance.^{[citation needed]} In the extreme case, a compiler may be forced to emulate instructions not supported on a compilation target platform, forcing it to generate code or link an external library call to produce a result that is otherwise incomputable on that platform, even if it is natively supported and more efficient in hardware on other platforms.^{[citation needed]} This is often the case in embedded systems with respect to floating-point arithmetic, where small and low-power microcontrollers often lack hardware support for floating-point arithmetic and thus require computationally expensive software routines to produce floating point calculations.^{[citation needed]}

Measures of resource usage

Measures are normally expressed as a function of the size of the input $\scriptstyle {n}$ .^{[citation needed]}

The two most common measures are:^{[citation needed]}

Time: how long does the algorithm take to complete?
Space: how much working memory (typically RAM) is needed by the algorithm? This has two aspects: the amount of memory needed by the code (auxiliary space usage)^{[citation needed]}, and the amount of memory needed for the data on which the code operates (intrinsic space usage).^{[citation needed]}

For computers whose power is supplied by a battery (e.g. laptops and smartphones), or for very long/large calculations (e.g. supercomputers), other measures of interest are:^{[citation needed]}

Direct power consumption: power needed directly to operate the computer.
Indirect power consumption: power needed for cooling, lighting, etc.

As of 2018^[update], power consumption is growing as an important metric for computational tasks of all types and at all scales ranging from embedded Internet of things devices to system-on-chip devices to server farms.^{[citation needed]} This trend is often referred to as green computing.^{[citation needed]}

Less common measures of computational efficiency may also be relevant in some cases:

Transmission size: bandwidth could be a limiting factor.^{[citation needed]} Data compression can be used to reduce the amount of data to be transmitted.^{[citation needed]} Displaying a picture or image (e.g. Google logo) can result in transmitting tens of thousands of bytes (48K in this case) compared with transmitting six bytes for the text "Google".^{[citation needed]} This is important for I/O bound computing tasks.^{[citation needed]}
External space: space needed on a disk or other external memory device^{[citation needed]}; this could be for temporary storage while the algorithm is being carried out, or it could be long-term storage needed to be carried forward for future reference.^{[citation needed]}
Response time (latency): this is particularly relevant in a real-time application when the computer system must respond quickly to some external event.^{[citation needed]}
Total cost of ownership: particularly if a computer is dedicated to one particular algorithm.^{[citation needed]}

Time

Theory

Analyze the algorithm, typically using time complexity analysis to get an estimate of the running time as a function of the size of the input data. The result is normally expressed using Big O notation.^{[citation needed]} This is useful for comparing algorithms, especially when a large amount of data is to be processed.^{[citation needed]} More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance.^{[citation needed]} Algorithms which include parallel processing may be more difficult to analyze.^{[citation needed]}

Practice

Use a benchmark to time the use of an algorithm. Many programming languages have an available function which provides CPU time usage. For long-running algorithms the elapsed time could also be of interest.^{[citation needed]} Results should generally be averaged over several tests.^{[citation needed]}

Run-based profiling can be very sensitive to hardware configuration and the possibility of other programs or tasks running at the same time in a multi-processing and multi-programming environment.^{[citation needed]}

This sort of test also depends heavily on the selection of a particular programming language^{[citation needed]}, compiler^{[citation needed]}, and compiler options^{[citation needed]}, so algorithms being compared must all be implemented under the same conditions.^{[citation needed]}

Space

This section is concerned with use of memory resources (registers, cache, RAM, virtual memory, secondary memory) while the algorithm is being executed. As for time analysis above, analyze the algorithm, typically using space complexity analysis to get an estimate of the run-time memory needed as a function as the size of the input data. The result is normally expressed using Big O notation.^{[citation needed]}

There are up to four aspects of memory usage to consider^{[citation needed]}:

The amount of memory needed to hold the code for the algorithm.
The amount of memory needed for the input data.
The amount of memory needed for any output data.
- Some algorithms, such as sorting, often rearrange the input data and do not need any additional space for output data.^{[citation needed]} This property is referred to as "in-place" operation.^{[citation needed]}
The amount of memory needed as working space during the calculation.
- This includes local variables and any stack space needed by routines called during a calculation^{[citation needed]}; this stack space can be significant for algorithms which use recursive techniques.^{[citation needed]}

Early electronic computers, and early home computers, had relatively small amounts of working memory. For example, the 1949 Electronic Delay Storage Automatic Calculator (EDSAC) had a maximum working memory of 1024 17-bit words^{[citation needed]}, while the 1980 Sinclair ZX80 came initially with 1024 8-bit bytes of working memory.^{[citation needed]} In the late 2010s, it is typical for personal computers to have between 4 and 32 GB of RAM, an increase of over 300 million times as much memory.^{[citation needed]}

Caching and memory hierarchy

Current computers can have relatively large amounts of memory (possibly Gigabytes), so having to squeeze an algorithm into a confined amount of memory is much less of a problem than it used to be.^{[citation needed]} But the presence of four different categories of memory can be significant^{[citation needed]}:

Processor registers, the fastest of computer memory technologies with the least amount of storage space.^{[citation needed]} Most direct computation on modern computers occurs with source and destination operands in registers before being updated to the cache, main memory and virtual memory if needed.^{[citation needed]} On a processor core, there are typically on the order of hundreds of bytes or fewer of register availability,^{[citation needed]} although a register file may contain more physical registers than architectural registers defined in the instruction set architecture.^{[citation needed]}
Cache memory is the second fastest and second smallest memory available in the memory hierarchy.^{[citation needed]} Caches are present in CPUs, GPUs, hard disk drives and external peripherals,^{[citation needed]} and are typically implemented in static RAM.^{[citation needed]} Memory caches are multi-leveled; lower levels are larger, slower and typically shared between processor cores in multi-core processors.^{[citation needed]} In order to process operands in cache memory, a processing unit must fetch the data from the cache, perform the operation in registers and write the data back to the cache.^{[citation needed]} This operates at speeds comparable (about 2-10 times slower) with the CPU or GPU's arithmetic logic unit or floating-point unit if in the L1 cache.^[8] It is about 10 times slower if there is an L1 cache miss and it must be retrieved from and written to the L2 cache, and a further 10 times slower if there is an L2 cache miss and it must be retrieved from an L3 cache, if present.^{[citation needed]}
Main physical memory is most often implemented in dynamic RAM (DRAM).^{[citation needed]} The main memory is much larger (typically gigabytes compared to ≈8 megabytes) than an L3 CPU cache, with read and write latencies typically 10-100 times slower.^[8] As of 2018^[update], RAM is increasingly implemented on-chip of processors, as CPU or GPU memory.^{[citation needed]}
Virtual memory is most often implemented in terms of secondary storage such as a hard disk^{[citation needed]}, and is an extension to the memory hierarchy that has much larger storage space but much larger latency, typically around 1000 times slower than a cache miss for a value in RAM.^[8] While originally motivated to create the impression of higher amounts of memory being available than were truly available, virtual memory is more important in contemporary usage for its time-space tradeoff and enabling the usage of virtual machines.^[8] Cache misses from main memory are called page faults, and incur huge performance penalties on programs.^{[citation needed]}

An algorithm whose memory needs will fit in cache memory will be much faster than an algorithm which fits in main memory, which in turn will be very much faster than an algorithm which has to resort to virtual memory.^{[citation needed]} Because of this, cache replacement policies are extremely important to high-performance computing, as are cache-aware programming and data alignment.^{[citation needed]} To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds.^{[citation needed]} Different systems will have different amounts of these various types of memory, so the effect of algorithm memory needs can vary greatly from one system to another.^{[citation needed]}

In the early days of electronic computing, if an algorithm and its data would not fit in main memory then the algorithm could not be used.^{[citation needed]} Nowadays the use of virtual memory appears to provide much memory, but at the cost of performance.^{[citation needed]} If an algorithm and its data will fit in cache memory, then very high speed can be obtained; in this case minimizing space will also help minimize time.^{[citation needed]} This is called the principle of locality,^{[citation needed]} and can be subdivided into locality of reference, spatial locality and temporal locality.^{[citation needed]} An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.^{[citation needed]}

References

^ Green, Christopher, Classics in the History of Psychology, retrieved 19 May 2013
^ Knuth, Donald (1974), "Structured Programming with go-to Statements" (PDF), Computing Surveys, 6 (4): 261–301, CiteSeerX 10.1.1.103.6084, doi:10.1145/356635.356640, S2CID 207630080, archived from the original (PDF) on 24 August 2009, retrieved 19 May 2013
^ ^a ^b "Floating Point Benchmark: Comparing Languages (Fourmilog: None Dare Call It Reason)". Fourmilab.ch. 4 August 2005. Retrieved 14 December 2011.
^ "Whetstone Benchmark History". Roylongbottom.org.uk. Retrieved 14 December 2011.
^ OSNews Staff. "Nine Language Performance Round-up: Benchmarking Math & File I/O". osnews.com. Retrieved 18 September 2018.
^ Kriegel, Hans-Peter; Schubert, Erich; Zimek, Arthur (2016). "The (black) art of runtime evaluation: Are we comparing algorithms or implementations?". Knowledge and Information Systems. 52 (2): 341–378. doi:10.1007/s10115-016-1004-2. ISSN 0219-1377. S2CID 40772241.
^ Guy Lewis Steele, Jr. "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO". MIT AI Lab. AI Lab Memo AIM-443. October 1977.[1]
^ ^a ^b ^c ^d Hennessy, John L; Patterson, David A; Asanović, Krste; Bakos, Jason D; Colwell, Robert P; Bhattacharjee, Abhishek; Conte, Thomas M; Duato, José; Franklin, Diana; Goldberg, David; Jouppi, Norman P; Li, Sheng; Muralimanohar, Naveen; Peterson, Gregory D; Pinkston, Timothy Mark; Ranganathan, Prakash; Wood, David Allen; Young, Clifford; Zaky, Amr (2011). Computer Architecture: a Quantitative Approach (Sixth ed.). ISBN 978-0128119051. OCLC 983459758.

[1] Green, Christopher, Classics in the History of Psychology, retrieved 19 May 2013

[Knuth1974-2] Knuth, Donald (1974), "Structured Programming with go-to Statements" (PDF), Computing Surveys, 6 (4): 261–301, CiteSeerX 10.1.1.103.6084, doi:10.1145/356635.356640, S2CID 207630080, archived from the original (PDF) on 24 August 2009, retrieved 19 May 2013

[fourmilab.ch-3] "Floating Point Benchmark: Comparing Languages (Fourmilog: None Dare Call It Reason)". Fourmilab.ch. 4 August 2005. Retrieved 14 December 2011.

[4] "Whetstone Benchmark History". Roylongbottom.org.uk. Retrieved 14 December 2011.

[5] OSNews Staff. "Nine Language Performance Round-up: Benchmarking Math & File I/O". osnews.com. Retrieved 18 September 2018.

[KriegelSchubert2016-6] Kriegel, Hans-Peter; Schubert, Erich; Zimek, Arthur (2016). "The (black) art of runtime evaluation: Are we comparing algorithms or implementations?". Knowledge and Information Systems. 52 (2): 341–378. doi:10.1007/s10115-016-1004-2. ISSN 0219-1377. S2CID 40772241.

[steele1997-7] Guy Lewis Steele, Jr. "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO". MIT AI Lab. AI Lab Memo AIM-443. October 1977.[1]

[CompArc:QuantApp-8] Hennessy, John L; Patterson, David A; Asanović, Krste; Bakos, Jason D; Colwell, Robert P; Bhattacharjee, Abhishek; Conte, Thomas M; Duato, José; Franklin, Diana; Goldberg, David; Jouppi, Norman P; Li, Sheng; Muralimanohar, Naveen; Peterson, Gregory D; Pinkston, Timothy Mark; Ranganathan, Prakash; Wood, David Allen; Young, Clifford; Zaky, Amr (2011). Computer Architecture: a Quantitative Approach (Sixth ed.). ISBN 978-0128119051. OCLC 983459758.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

v t e Computer science
Note: This template roughly follows the 2012 ACM Computing Classification System.
Hardware	Printed circuit board Peripheral Integrated circuit Very-large-scale integration System on a chip (SoC) Energy consumption (green computing) Electronic design automation Hardware acceleration Processor Size / Form
Computer systems organization	Computer architecture Computational complexity Dependability Embedded system Real-time computing Cyber-physical system Fault tolerance Wireless sensor network
Networks	Network architecture Network protocol Network components Network scheduler Network performance evaluation Network service
Software organization	Interpreter Middleware Virtual machine Operating system Software quality
Software notations and tools	Programming paradigm Programming language Compiler Domain-specific language Modeling language Software framework Integrated development environment Software configuration management Software library Software repository
Software development	Control variable Software development process Requirements analysis Software design Software construction Software deployment Software engineering Software maintenance Programming team Open-source model
Theory of computation	Model of computation Stochastic Formal language Automata theory Computability theory Computational complexity theory Logic Semantics
Algorithms	Algorithm design Analysis of algorithms Algorithmic efficiency Randomized algorithm Computational geometry
Mathematics of computing	Discrete mathematics Probability Statistics Mathematical software Information theory Mathematical analysis Numerical analysis Theoretical computer science
Information systems	Database management system Information storage systems Enterprise information system Social information systems Geographic information system Decision support system Process control system Multimedia information system Data mining Digital library Computing platform Digital marketing World Wide Web Information retrieval
Security	Cryptography Formal methods Security hacker Security services Intrusion detection system Hardware security Network security Information security Application security
Human–centered computing	Interaction design Augmented reality Virtual reality Social computing Ubiquitous computing Visualization Accessibility Human–computer interaction Mobile computing
Concurrency	Concurrent computing Parallel computing Distributed computing Multithreading Multiprocessing
Artificial intelligence	Natural language processing Knowledge representation and reasoning Computer vision Automated planning and scheduling Search methodology Control method Philosophy of artificial intelligence Distributed artificial intelligence
Machine learning	Supervised learning Unsupervised learning Reinforcement learning Multi-task learning Cross-validation
Graphics	Animation Rendering Photograph manipulation Graphics processing unit Image compression Solid modeling
Applied computing	Quantum computing E-commerce Enterprise software Computational mathematics Computational physics Computational chemistry Computational biology Computational social science Computational engineering Differentiable computing Computational healthcare Digital art Electronic publishing Cyberwarfare Electronic voting Video games Word processing Operations research Educational technology Document management
Category Outline Glossaries