BRUTE FORCE
Main features
no preprocessing phase;
constant extra space needed;
always shifts the window by exactly 1 position to the right;
comparisons can be done in any order;
searching phase in O(mn) time complexity;
2n expected text character comparisons.
Description
The brute force algorithm consists of checking, at each position in the text between 0 and n-m, whether an occurrence of the pattern starts there or not. Then, after each attempt, it shifts the pattern by exactly one position to the right.
The brute force algorithm requires no preprocessing phase, and only constant extra space in addition to the pattern and the text. During the searching phase the text character comparisons can be done in any order. The time complexity of this searching phase is O(mn) (when searching for a^(m-1)b in a^n, for instance). The expected number of text character comparisons is 2n.
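The brute force search described above can be sketched in Python (function and variable names are illustrative, not from the notes):

```python
def brute_force_search(text, pattern):
    """Return every index where pattern starts in text."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):            # try each window start 0 .. n-m
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                        # comparisons may be done in any order
        if j == m:                        # the whole pattern matched here
            hits.append(i)
    return hits
```

Each failed attempt shifts the window by exactly one position, matching the features listed above.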
DIVIDE AND CONQUER
Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems hoping that the solutions of the subproblems are easier to find and then composing the partial solutions into the solution of the original problem.
A little more formally, the divide-and-conquer paradigm consists of the following major phases:
Break the problem into several sub-problems that are similar to the original problem but smaller in size,
Solve the sub-problems recursively (successively and independently), and then
Combine these solutions to the sub-problems to create a solution to the original problem.
Binary Search (simplest application of divide-and-conquer) Binary Search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that for a given element we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.
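The probing idea above can be sketched as a short Python function (illustrative; returns one index of the key in a sorted list, or -1 if absent):

```python
def binary_search(a, key):
    """Return an index of key in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # probe the middle element
        if a[mid] == key:
            return mid
        elif a[mid] < key:
            lo = mid + 1          # continue in the upper segment
        else:
            hi = mid - 1          # continue in the lower segment
    return -1
```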
DECREASE AND CONQUER
Decrease and Conquer Sorts and Graph Searches Decrease-and-conquer algorithms make the problem smaller by reducing the problem at each step. They can reduce the problem by:
a constant amount
a constant factor
a variable factor
Decrease and conquer is different from divide and conquer in that not both parts need to be solved. Binary search was not really a divide-and-conquer algorithm but rather a decrease-and-conquer algorithm. Insertion Sort Insertion sort is a decrease-by-1 algorithm. Because it only decreases by one, we should not expect it to be more efficient than linear.
We scan the sorted part of the array, going from right to left, until we find an element smaller than or equal to A[i], and insert A[i] right after that element.
Algorithm InsertionSort(A[0...n-1])
  for i ← 1 to n-1 do                 // move through the unsorted portion
    v ← A[i]                          // value to sort
    j ← i-1                           // end of the sorted part
    while j ≥ 0 and A[j] > v do       // scan the sorted part for the insertion point
      A[j+1] ← A[j]                   // shift the sorted part to make room
      j ← j-1
    A[j+1] ← v                        // insert
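The insertion sort pseudocode translates directly to Python (an illustrative in-place version):

```python
def insertion_sort(a):
    """Sort list a in place and return it (decrease-by-1 technique)."""
    for i in range(1, len(a)):      # a[0..i-1] is already sorted
        v = a[i]                    # value to insert
        j = i - 1
        while j >= 0 and a[j] > v:  # scan the sorted part right to left
            a[j + 1] = a[j]         # shift to make room
            j -= 1
        a[j + 1] = v                # insert
    return a
```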
Count the number of comparisons. The worst case is an array that is sorted in decreasing order:
Cworst(n) = ∑_{i=1..n-1} ∑_{j=0..i-1} 1 = n(n-1)/2 ∈ Θ(n²)
On average the insertion will occur at position (i-1)/2, meaning only half the comparisons will be made, so the average case cost is Cavg(n) ≈ n²/4 ∈ Θ(n²)
The best case is an already sorted array; then Cbest(n) = ∑_{i=1..n-1} 1 = n-1 ∈ Θ(n)
The exact cost is n + k, where k is the number of inversions in the original array order; k can be as large as n(n-1)/2 ∈ Θ(n²). The attractiveness of insertion sort is its good performance on nearly sorted arrays. This makes it an attractive base case sort for recursive sorts.
Depth-First Search The algorithm recursively visits adjacent vertices, going as deep as possible until it cannot find a new vertex to visit, and then backs up.
An algorithm for a graph that might not be connected, which numbers the vertices in visit order:
Algorithm DFS(G)
  count ← 0
  mark each vertex v in V with 0       // 0 means not yet visited
  for each vertex v in V do
    if v is marked 0 then dfs(v)

Algorithm dfs(v)
  count ← count + 1
  mark v with count                    // number v in visit order
  for each vertex w adjacent to v do
    if w is marked 0 then dfs(w)       // not visited yet
The efficiency is determined by the size of the graph's data structure: Θ(|V|²) for an adjacency matrix and Θ(|V| + |E|) for adjacency lists. Why is it Θ(|V|²) for an adjacency matrix? Because the cost of finding the vertices adjacent to a vertex in an adjacency matrix is Θ(|V|).
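The DFS pseudocode can be sketched in Python; the adjacency-list dict and the function names here are illustrative:

```python
def dfs_count(graph):
    """Number each vertex in DFS visit order; works for disconnected graphs.
    graph maps each vertex to a list of adjacent vertices."""
    mark = {v: 0 for v in graph}   # 0 means not yet visited
    count = 0

    def dfs(v):
        nonlocal count
        count += 1
        mark[v] = count            # number v with its visit order
        for w in graph[v]:
            if mark[w] == 0:       # not visited yet
                dfs(w)

    for v in graph:
        if mark[v] == 0:           # each call here starts a new tree in the forest
            dfs(v)
    return mark
```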
The edges can be marked as discovery and back edges. The discovery edges make a forest. Why the name back edges? Because they lead back from a vertex to one of its ancestors in the DFS forest.
DFS can determine:
Spanning tree/forest
Connectivity - a single tree in the forest
Cycles - back edges
Path - a long path
TRANSFORM AND CONQUER
Transform and Conquer: Instances and Structuring Either the problem or the algorithm can be transformed in one of three ways:
Instance simplification: the instance of the problem can be transformed into an easier instance to solve.
Representation change: the data structure can be transformed so that it is more efficient.
Problem reduction: the problem can be transformed to an easier problem to solve.
This lecture gives examples of instance simplification and representation change.
Presorting: Instance simplification "Presorting" is a common example of "instance simplification." Presorting is sorting ahead of time, to make repetitive solutions faster. For example, if you wish to find many kth order statistics in an array, then it might make sense to sort the array ahead of time so that the cost of determining each statistic is constant time.
Presorting is a form of preconditioning. Preconditioning is manipulating the data to make the algorithm faster.
Another example is the problem of determining the uniqueness of array elements. The brute force algorithm would compare each array element with the rest of the array. The cost would be Θ(n²). If we presort the array then the algorithm only needs to compare adjacent elements for uniqueness. The cost for determining uniqueness (without the sorting cost) is Θ(n).
The total cost is T(n) = Tsort(n) + Tscan(n) ∈ Θ(n log n) + Θ(n) = Θ(n log n)
Another example is computing the mode of an array. The mode is the most frequent array element. The brute force cost would be quadratic. If the array is sorted then we only need to count runs of equal values.
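Both presorting examples can be sketched in Python (illustrative code; the mode function assumes a non-empty array):

```python
def all_unique(a):
    """Presort, then compare adjacent elements only: Θ(n log n) overall."""
    s = sorted(a)
    return all(s[i] != s[i + 1] for i in range(len(s) - 1))

def mode(a):
    """Presort, then scan for the longest run of equal values."""
    s = sorted(a)
    best_val, best_run, run = s[0], 1, 1
    for i in range(1, len(s)):
        run = run + 1 if s[i] == s[i - 1] else 1
        if run > best_run:
            best_val, best_run = s[i], run
    return best_val
```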
GREEDY APPROACH
Greedy Introduction
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the sense that they make decisions on the basis of information at hand, without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement, and most of the time quite efficient. Many problems cannot be solved correctly by the greedy approach. Greedy algorithms are used to solve optimization problems.
Greedy Approach
Greedy Algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later. As an example consider the problem of "Making Change".
Coins available are:
dollars (100 cents)
quarters (25 cents)
dimes (10 cents)
nickels (5 cents)
pennies (1 cent)
Problem: Make change for a given amount using the smallest possible number of coins.
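A sketch of the greedy rule for this coin set (illustrative Python, assuming coins are listed in decreasing order):

```python
def greedy_change(amount, coins=(100, 25, 10, 5, 1)):
    """Make change by always taking the largest coin that still fits."""
    result = []
    for c in coins:
        while amount >= c:         # take as many of this coin as possible
            result.append(c)
            amount -= c
    return result
```

For 63 cents this yields two quarters, one dime, and three pennies: six coins.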
DYNAMIC PROGRAMMING
Many programs in computer science are written to optimize some value; for example, find the shortest path between two points, find the line that best fits a set of points, or find the smallest set of objects that satisfies some criteria. There are many strategies that computer scientists use to solve these problems. One of the goals of this book is to expose you to several different problem solving strategies. Dynamic programming is one strategy for these types of optimization problems.
A classic example of an optimization problem involves making change using the fewest coins. Suppose you are a programmer for a vending machine manufacturer. Your company wants to streamline effort by giving out the fewest possible coins in change for each transaction. Suppose a customer puts in a dollar bill and purchases an item for 37 cents. What is the smallest number of coins you can use to make change? The answer is six coins: two quarters, one dime, and three pennies. How did we arrive at the answer of six coins? We start with the largest coin in our arsenal (a quarter) and use as many of those as possible, then we go to the next lowest coin value and use as many of those as possible. This first approach is called a greedy method because we try to solve as big a piece of the problem as possible right away.
The greedy method works fine when we are using U.S. coins, but suppose that your company decides to deploy its vending machines in Lower Elbonia where, in addition to the usual 1, 5, 10, and 25 cent coins they also have a 21 cent coin. In this instance our greedy method fails to find the optimal solution for 63 cents in change. With the addition of the 21 cent coin the greedy method would still find the solution to be six coins. However, the optimal answer is three 21 cent pieces.
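A dynamic-programming sketch of change-making, using the Lower Elbonia coin set from the example (illustrative Python):

```python
def dp_change(amount, coins=(1, 5, 10, 21, 25)):
    """min_coins[x] holds the fewest coins needed to make x cents,
    built up from the answers for smaller amounts."""
    INF = float("inf")
    min_coins = [0] + [INF] * amount
    for x in range(1, amount + 1):
        for c in coins:
            if c <= x:             # try making x with coin c as the last coin
                min_coins[x] = min(min_coins[x], min_coins[x - c] + 1)
    return min_coins[amount]
```

With the 21-cent coin the table finds 3 coins for 63 cents, where the greedy method finds 6.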
BACKTRACKING AND BRANCH AND BOUND
Backtracking
[1] It is used to find all possible solutions available to the problem.
[2] It traverses the tree by DFS (Depth-First Search).
[3] It realizes that it has made a bad choice & undoes the last choice by backing up.
[4] It searches the state space tree until it finds a solution.
[5] It involves a feasibility function.
Branch-and-Bound
[1] It is used to solve optimization problems.
[2] It may traverse the tree in any manner, DFS or BFS.
[3] It realizes that it already has a better optimal solution than the one the pre-solution leads to, so it abandons that pre-solution.
[4] It completely searches the state space tree to get the optimal solution.
[5] It involves a bounding function.
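As an illustrative backtracking sketch (a hypothetical subset-sum instance, not from the notes): the feasibility test prunes a branch, i.e. backs up, as soon as the partial sum overshoots the target. Positive values are assumed.

```python
def subset_sum(nums, target):
    """Collect every subset of nums (positive ints) summing to target
    by exploring the state space tree depth-first."""
    solutions = []

    def explore(i, chosen, total):
        if total == target:
            solutions.append(list(chosen))       # a full solution
            return
        if i == len(nums) or total > target:     # infeasible: back up
            return
        chosen.append(nums[i])                   # branch: take nums[i]
        explore(i + 1, chosen, total + nums[i])
        chosen.pop()                             # undo the last choice
        explore(i + 1, chosen, total)            # branch: skip nums[i]

    explore(0, [], 0)
    return solutions
```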
SPACE AND TIME TRADEOFFS
Space and Time Tradeoffs Storage space can save calculation time. Tables can hold common functional evaluations to reduce recursive calls. Another technique is to store preprocessed calculations in an array to be used during the final calculations; this is called input enhancement. We consider one famous example for string matching, the Boyer-Moore algorithm. First we introduce Horspool's algorithm, which is a simpler example.
Input Enhancement in String Matching Recall the string matching problem: a pattern P[0...m-1] is searched for in a text T[0...n-1]. The brute force algorithm in the worst case makes m(n-m+1) comparisons, so the cost is Θ(nm). But on average only a few comparisons are made before shifting the pattern, so the cost is Θ(n). We consider two algorithms that also achieve this cost. Horspool's Algorithm Horspool's algorithm shifts the pattern by looking up the shift value for the character of the text aligned with the last character of the pattern, in a table made during the initialization of the algorithm. The pattern is checked against the text from right to left, and the search progresses left to right through the text.
Let c be the character in the text that aligns with the last character of the pattern. If the pattern does not match, there are 4 cases to consider.
The mismatch occurs at the last character of the pattern: Case 1: c does not exist in the pattern (note that here the mismatch occurred at the last character); then shift the pattern right by the size of the pattern.
T[0] ..... S ..... T[n-1]
      LEADER
            LEADER        (shift by m = 6; S does not occur in LEADER)
Case 2: The mismatch happens at the last character of the pattern and c does exist in the pattern; then the shift should align the rightmost c among the first m-1 characters of the pattern with c in the text.
T[0] ..... A ..... T[n-1]
      LEADER
         LEADER           (shift by 3, aligning the rightmost A among the first m-1 characters)
The mismatch happens in the middle of the pattern: Case 3: The mismatch happens in the middle (therefore c is in the pattern, since the last character matched) and there is no other c in the first m-1 characters of the pattern; then the shift should be the pattern length.
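Horspool's shift table and search can be sketched in Python (an illustrative version; the precomputed table covers all the shift cases at once):

```python
def horspool(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.
    The shift table records, for each character in pattern[0..m-2], its
    distance from the last position; every other character shifts by m."""
    m, n = len(pattern), len(text)
    shift = {}
    for i in range(m - 1):              # the last character is excluded
        shift[pattern[i]] = m - 1 - i
    i = m - 1                           # text index under the pattern's last char
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                      # compare right to left
        if k == m:
            return i - m + 1            # full match found
        i += shift.get(text[i], m)      # shift by table value, or by m
    return -1
```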