Asymptotic estimates. An asymptotically sharp estimate of the growth function. Analysis of open addressing

Comparing the time costs of algorithms that solve instances of some problem on large amounts of input data is called asymptotic analysis. The algorithm with the lowest asymptotic complexity is the most efficient.

In asymptotic analysis, the complexity of an algorithm is a function that shows how quickly the running time of the algorithm grows as the amount of data increases.

The main growth estimates encountered in the asymptotic analysis are:

  • Ο (big O) – the upper asymptotic estimate of the growth of the time function;
  • Ω (Omega) – the lower asymptotic estimate of the growth of the time function;
  • Θ (Theta) – the lower and upper asymptotic estimates of the growth of the time function.

Let n be the amount of data. Then the growth of the algorithm's time function f(n) can be bounded asymptotically by a function g(n): f(n) = O(g(n)) (bounded from above), f(n) = Ω(g(n)) (bounded from below), f(n) = Θ(g(n)) (bounded from both sides).

For example, the time needed to clean a room depends linearly on the area of that room (Θ(S)); i.e., if the area increases n times, the cleaning time also increases n times. Looking up a name in a phone book takes linear time O(n) if you use the linear search algorithm, or time that depends logarithmically on the number of records, O(log2(n)), in the case of binary search.

Of greatest interest to us is the O-function. In later chapters, the complexity of the algorithms will be given only for the upper asymptotic bound.

The phrase "the complexity of the algorithm is O(f(n))" means that as the amount of input data n increases, the running time of the algorithm grows no faster than some constant multiplied by f(n).

Important rules of asymptotic analysis:

  1. O(k*f) = O(f) – a constant factor k is discarded, because as the amount of data grows its contribution becomes negligible, for example:

O(9.1n) = O(n)

  2. O(f*g) = O(f)*O(g) – the complexity estimate of a product of two functions is equal to the product of their complexities, for example:

O(5n*n) = O(5n)*O(n) = O(n)*O(n) = O(n*n) = O(n^2)

  3. O(f/g) = O(f)/O(g) – the complexity estimate of a quotient of two functions is equal to the quotient of their complexities, for example:

O(5n/n) = O(5n)/O(n) = O(n)/O(n) = O(n/n) = O(1)

  4. O(f+g) is equal to the dominant of O(f) and O(g) – the complexity estimate of a sum of functions is defined as the estimate of the dominant of the two terms, for example:

O(n^5 + n^10) = O(n^10)

Counting the number of operations is tedious and, importantly, not at all mandatory. Based on the above rules, in order to determine the complexity of an algorithm it is not necessary, as we did before, to count all operations; it is enough to know what complexity a particular construction of the algorithm (an operator or a group of operators) has. An algorithm that contains no loops or recursion has constant complexity O(1). The complexity of a loop that executes n iterations is O(n). A construction of two nested loops that both depend on the same variable n has quadratic complexity O(n^2).
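As an illustration of these rules, the C++ fragments below (a sketch, not taken from the lecture itself) show the three typical cases: no loops, one loop, and two nested loops.

#include <vector>

// O(1): no loops or recursion - a fixed number of operations.
int first_element(const std::vector<int>& a) {
    return a.empty() ? 0 : a[0];
}

// O(n): a single loop over n elements.
long long sum(const std::vector<int>& a) {
    long long s = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i];
    return s;
}

// O(n^2): two nested loops over the same n elements.
long long count_equal_pairs(const std::vector<int>& a) {
    long long cnt = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] == a[j]) ++cnt;
    return cnt;
}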


Different approaches can be used to evaluate the performance of algorithms. The simplest is to run each algorithm on several problems and compare the execution times. Another way is to estimate the running time mathematically by counting operations.

Consider an algorithm for calculating the value of a polynomial of degree n at a given point x.

P_n(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_i x^i + ... + a_1 x + a_0

Algorithm 1: for each term except a_0, raise x to the required power by successive multiplication and then multiply by the coefficient. Then add up the terms.

Calculating the i-th term (i = 1..n) requires i multiplications, so in total 1 + 2 + 3 + ... + n = n(n+1)/2 multiplications. In addition, n + 1 additions are required. In total: n(n+1)/2 + n + 1 = n^2/2 + 3n/2 + 1 operations.
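A possible C++ rendering of Algorithm 1 is sketched below (the coefficient vector a, with a[i] holding a_i, is an illustrative assumption):

#include <vector>

// Algorithm 1: evaluate P(x) = a[0] + a[1]*x + ... + a[n]*x^n by raising x
// to the i-th power from scratch for every term. Assumes a is non-empty.
double evaluate_naive(const std::vector<double>& a, double x) {
    std::size_t n = a.size() - 1;            // degree of the polynomial
    double result = a[0];
    for (std::size_t i = 1; i <= n; ++i) {
        double power = x;
        for (std::size_t k = 1; k < i; ++k)  // i - 1 multiplications give x^i
            power *= x;
        result += a[i] * power;              // one more multiplication and one addition
    }
    return result;
}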

Algorithm 2: take x out of the brackets and rewrite the polynomial in the form

P_n(x) = a_0 + x(a_1 + x(a_2 + ... + x(a_(n-1) + a_n x)...))

For example,

P_3(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0 = a_0 + x(a_1 + x(a_2 + a_3 x))

We evaluate the expression from the inside out. The innermost bracket requires 1 multiplication and 1 addition; its value is used for the next bracket, and so on: 1 multiplication and 1 addition for each bracket, of which there are n - 1. After calculating the outermost bracket, we multiply by x and add a_0. In total: n multiplications + n additions = 2n operations.
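This scheme is known as Horner's method; a short C++ sketch (again assuming the coefficient layout above) might look like this:

#include <vector>

// Algorithm 2 (Horner's scheme): P(x) = a0 + x*(a1 + x*(a2 + ... + x*an)).
// One multiplication and one addition per bracket. Assumes a is non-empty.
double evaluate_horner(const std::vector<double>& a, double x) {
    double result = a.back();            // start with a_n, the innermost bracket
    for (std::size_t i = a.size() - 1; i-- > 0; )
        result = result * x + a[i];      // one multiplication and one addition
    return result;
}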

Often such a detailed assessment is not required. Instead, only the asymptotic rate of increase in the number of operations with increasing n is given.

The function f(n) = n^2/2 + 3n/2 + 1 grows approximately as n^2/2 (we discard the relatively slowly growing term 3n/2 + 1). We also remove the constant multiplier 1/2 and obtain the asymptotic estimate for Algorithm 1, which is denoted by the special symbol O(n^2) [read as "big O of n squared"].

This is an upper bound, i.e. the number of operations (and hence the running time) grows no faster than the square of the number of elements. To get a feel for what this means, look at the table, which shows the growth rates of several different functions.

n           log n    n*log n         n^2
1           0        0               1
16          4        64              256
256         8        2,048           65,536
4,096       12       49,152          16,777,216
65,536      16       1,048,576       4,294,967,296
1,048,576   20       20,971,520      1,099,511,627,776
16,777,216  24       402,653,184     281,474,976,710,656

If we assume that the numbers in the table correspond to microseconds, then for a problem with n = 1,048,576 elements an algorithm with running time O(log n) will take 20 microseconds, an algorithm with time O(n) about one second, an algorithm with time O(n*log n) about 21 seconds, and an algorithm with running time O(n^2) more than 12 days. Now the advantage of Algorithm 2, with its O(n) estimate, over Algorithm 1 is quite obvious.

The best estimate is O(1): the running time does not depend on n at all, i.e. it is constant for any number of elements.

Thus, O() is a "coarsened" estimate of the running time of the algorithm, which is often much easier to obtain than an exact formula for the number of operations.

So, let's formulate two rules for forming the O() estimate.

When forming the estimate, the operation whose count grows the fastest is taken as the function.
That is, if in a program one operation, for example multiplication, is executed O(n) times, and addition O(n^2) times, then the total complexity of the program is O(n^2): as n grows, additions will be performed so much more often that they affect performance far more than the slower but rarer multiplications. The symbol O() reflects only the asymptotics!

When evaluating O() , constants are not taken into account.
Let one algorithm do 2500n + 1000 operations, and the other - 2n+1. Both are rated O(n), since their execution time grows linearly.

In particular, if both algorithms are, for example, O(n*log n), this does not mean that they are equally efficient. The first can be, say, 1000 times more efficient; O() only means that the time of both grows approximately as the function n*log n.

Another consequence of omitting the constant is that an algorithm with time O(n^2) can run much faster than an algorithm with time O(n) for small n, because the real number of operations of the first algorithm may be n^2 + 10n + 6, and of the second 1,000,000n + 5. However, the second algorithm will overtake the first sooner or later: n^2 grows much faster than 1,000,000n.

The base of a logarithm inside the symbol O() is not written.
The reason is quite simple. Suppose we have O(log2 n). But log2 n = log3 n / log3 2, and log3 2, like any constant, is not taken into account by the asymptotic symbol O(). Thus, O(log2 n) = O(log3 n).

We can pass to any base in the same way, which means there is no point in writing it.

Mathematical interpretation of the symbol O().

Definition
O(g) is the set of functions f for which there exist constants C and N such that |f(x)| <= C|g(x)| for all x > N.
The notation f = O(g) literally means that f belongs to the set O(g). The inverse expression O(g) = f makes no sense.

In particular, one can say that f(n) = 50n belongs to O(n^2). Here we are dealing with an imprecise estimate. Of course f(n) <= 50n^2 for n > 1, but the stronger statement would be f(n) = O(n), because with C = 50 and N = 1 we have f(n) <= Cn for n > N.

Other types of estimates.

Along with the estimate O(n), the estimate Ω(n) [read as "big Omega of n"] is used. It denotes a lower bound on the growth of the function. For example, let the function f(n) = Ω(n^2) describe the number of operations of an algorithm. This means that even in the most favourable case at least on the order of n^2 actions will be performed.
...Whereas the estimate f(n) = O(n^3) guarantees that in the worst case there will be on the order of n^3 actions, not more.

Also used is the estimate Θ(n) ["Theta of n"], which is a hybrid of O() and Ω().
Θ(n^2) is an upper and a lower asymptotic estimate at the same time: on the order of n^2 operations will be performed. The estimate Θ() exists only when O() and Ω() coincide, and equals them.

For the polynomial-evaluation algorithms considered above, the estimates found are simultaneously O(), Ω() and Θ().
If we add to the first algorithm a check for x = 0 in the exponentiation, then on the most favourable input data (when x = 0) we have on the order of n checks, 0 multiplications and 1 addition, which gives a new estimate Ω(n) together with the old O(n^2).

As a rule, the main attention is still paid to the upper bound O(), so despite the "improvement", Algorithm 2 remains preferable.

So, O() is the asymptotic estimate of the algorithm on the worst input data, Ω() on the best input data, and Θ() is shorthand for identical O() and Ω().

For theoretical analysis, what is fundamental is not the specific function describing the dependence of the execution time on the amount of input data, but the order of its growth for large N. The key issues are, first, determining whether a computer solution in finite, acceptable time is possible in principle, and second, comparing alternatives and discarding unsuitable algorithms from consideration before effort is spent on a full, high-quality implementation.

In other words, the decisive role is played by the asymptotic estimate of the computational complexity of the algorithm. Consider the time relation for the bubble sort algorithm, which performs on the order of N(N-1)/2 comparisons, as the number of input data N tends to infinity:

T(N) = c*N(N-1)/2 = c*(N^2 - N)/2, which for N → ∞ behaves as c*N^2/2.

In the limiting estimate, lower powers are discarded because the higher powers dominate. Thus, the execution time of the bubble sort algorithm has a quadratic dependence on the amount of input data.

In general form, the execution time of any algorithm can be represented as follows: T(N) = c * f(N), where c is a constant factor and the function f(N) determines the order of growth.

When studying the properties of algorithms and comparing them, we can discard the constant factor, since for sufficiently large N it is the order of growth of the function that is the determining factor. This is easy to explain with an example. Suppose there are two alternative algorithms: the first has a quadratic order of growth, and the second is described by a linear function. Assume also that the implementation of the first algorithm is close to optimal and the program runs on a modern computer, while the implementation of the second algorithm is far from brilliant and runs on an outdated computer. Such an imbalance of external conditions can be modeled by the difference in constant factors (say, c1 = 1 for the first algorithm and c2 = 1000 for the second; the exact values here are purely illustrative). For a small N, for example 5 data items, the execution time of the first algorithm is less than that of the second:

c1*N^2 = 1*25 = 25 < c2*N = 1000*5 = 5000.

However, as the number of inputs increases, the second algorithm, which has the better computational complexity, will outperform the first despite all the unfavourable factors:

c1*N^2 > c2*N for all N > 1000.

Whatever the values ​​of the constant factors, when the computational complexity of one algorithm is better than the computational complexity of another, there is always some critical amount of input data, starting from which it is the order of growth of the algorithm's execution time that becomes the dominant factor.

For analytical reasoning about asymptotic estimates of the computational complexity of algorithms, several mathematical notations are used in the literature: the O-estimate, the Ω-estimate and the Θ-estimate.

The Θ-estimate is a comparison with an infinite set of functions that have the same order of growth and differ by a constant factor. A function f(N) belongs to the set Θ(g(N)) if there exist constants c1 and c2 that, for sufficiently large N, bound from above and below the function describing the speed of the analyzed algorithm. Thus, the following relation holds: c1*g(N) <= f(N) <= c2*g(N) for all N > N0.

The O-estimate relaxes the Θ-estimate: it bounds the function of the analyzed algorithm only from above. In other words, the O-estimate asymptotically describes the worst case for the analyzed algorithm. This estimate is used most often in analysis. Much less commonly used is the Ω-estimate, which bounds the function of the analyzed algorithm from below (the best case).

Among the typical asymptotic estimates of the computational complexity of algorithms, the following functions are common:

    linear complexity O(N) (time proportional to the growth of the data);

    quadratic complexity O(N^2);

    polynomial complexity O(N^M), in particular, cubic complexity O(N^3);

    exponential complexity O(2^N);

    factorial complexity O(N!);

    logarithmic complexity O(log(N)) (the base-2 logarithm is implied);

    linear-logarithmic complexity O(N * log(N));

    constant computational complexity O(1) (the time does not depend on the amount of data).

This list does not exclude other functions, but the vast majority of cases encountered in practice are listed here. From most to least efficient, these functions are ordered as follows: O(1), O(log(N)), O(N), O(N * log(N)), O(N^2), O(N^3), O(2^N), O(N!).

The computational complexity of the algorithms being developed should be kept as low as possible. The relationship between these estimates can be felt by looking at the number of executed instructions on specific examples, say for N = 5, 10, 25, 100 and 500 (for ease of understanding we assume that the constant coefficients are the same). With these amounts of data we get roughly the following figures:

N     O(1)  O(log N)  O(N)  O(N*log N)  O(N^2)   O(N^3)        O(2^N)        O(N!)
5     1     3         5     15          25       125           32            120
10    1     4         10    40          100      1,000         1,024         ~3.6*10^6
25    1     5         25    125         625      15,625        ~3.4*10^7     ~1.6*10^25
100   1     7         100   700         10,000   1,000,000     ~1.3*10^30    ~9.3*10^157
500   1     9         500   4,500       250,000  125,000,000   ~3.3*10^150   ~1.2*10^1134

(logarithms are rounded up to the nearest integer)

Constant computational complexity is the ideal case; often no algorithm with such complexity exists for a given problem. The logarithmic function also grows relatively slowly. Linear and linear-logarithmic functions grow at an acceptable rate. The remaining functions grow noticeably faster.

If the program is intended for research, the maximum admissible complexity is polynomial, in particular cubic. Algorithms with higher computational complexity are applicable only for small values of N; otherwise the problems have no computer solution at achievable computational cost.

At the same time, in the commercial software sector, where not only the achievability of the result in general is important, but also the acceptable waiting time for the user, algorithms whose complexity exceeds linear-logarithmic are rarely used.

A superficial estimate of computational complexity

In the classical literature on the construction and analysis of algorithms, much attention is paid to rigorous mathematical calculations that analytically derive and prove hypotheses about the computational complexity of specific algorithms. Practicing programmers pay much less attention to this process, avoiding reasoning in the formalized strict style of mathematicians. Nevertheless, if the program code implementing a particular algorithm is not too complicated, it is possible to state a hypothesis about the computational complexity that is close to the correct one based only on the general structure of the program code.

Below are a few hints for a superficial estimation of computational complexity “by eye” without a rigorous analytical approach.

    All elementary instructions - expression evaluation, variable assignment, input-output, value return - should be considered as the simplest blocks with constant computational complexity O(1).

    The computational complexity of a sequence of instructions is equal to the maximum complexity of the instructions included in it.

    If there is no special information about the probability of conditional jumps firing, then all possible conditional jumps, including implicit ones (omitted else, default branches), should be considered equiprobable. The complexity of each block of instructions is evaluated separately, and then the maximum of them is selected, which becomes the result of the evaluation for the conditional construct as a whole.

    If an instruction is in the body of a loop whose number of iterations depends on N, the computational complexity of the instruction is multiplied by a linear function of N.

    The computational complexity of two N-dependent loops nested within each other is most likely described by a quadratic function. Accordingly, nesting of 3 levels corresponds to cubic complexity.

    If at each step an algorithm splits the set of input data into parts and then recursively processes only one of them (as in binary search), the computational complexity is logarithmic.

Many algorithms are recursive, i.e. they call themselves with different arguments. At the same time, to exit the recursion there must always be an "exit condition" – argument values for which the recursion stops and some elementary action is performed. The simplest example is the factorial function:

int factorial(int _x)
{
    if (_x < 1)
        return 1;                      // 0! = 1 (also guards against non-positive input)
    else if (_x == 1)
        return 1;                      // base case: the recursion stops here
    return _x * factorial(_x - 1);     // recursive step
}

The first two branches of the main condition are elementary instructions; their computational complexity is estimated as O(1). As for the last branch, the complexity is described by the so-called recurrence relation:

T(n) = T(n - 1) + O(1).

To obtain an estimate of the computational complexity, the recurrence relation must be expanded analytically. Substituting the relation into itself step by step, we get a linear dependence: T(n) = T(n - 1) + O(1) = T(n - 2) + 2*O(1) = ... = n*O(1) = O(n).

Algorithm analysis is the theoretical study of the performance of computer programs and the resources they consume.

Types of analysis

Worst case: T(n) is the maximum time over all inputs of size n.

Average case: T(n) is the expected time over all inputs of size n.

Asymptotic estimate

O-notation: the asymptotic upper bound.

f(n) = O(g(n))  <=>  there exist c > 0 and n0 > 0 such that 0 <= f(n) <= c*g(n) for all n >= n0

O(g(n)) = { f(n) : there exist c > 0, n0 > 0 such that 0 <= f(n) <= c*g(n) for all n >= n0 }

Example: 2n^2 = O(n^3)


Merge sort

The array is split in half, each half is sorted recursively (T(n/2) each), and the two sorted halves are merged in linear time (c*n); the full pseudocode is given at the end of these notes.

Recursion tree: T(n) = 2*T(n/2) + c*n, c – const, c > 0

Technique for evaluation of recursive algorithms.

Iteration Method

Based on the formula for T(n), we write the formula for the smaller argument that appears on the right-hand side of the formula for T(n).

We substitute the right-hand side of the resulting formula into the original formula.

We repeat the first two steps until the formula expands into a series that no longer contains the function T(n).

We estimate the resulting series using an arithmetic or geometric progression.

T(n) = T(n-1) + n,  T(1) = 1

T(n) = Θ(g(n)), g(n) = ?

T(n-1) = T(n-2) + (n-1)

T(n-2) = T(n-3) + (n-2)

T(n) = T(n-1) + n = T(n-2) + (n-1) + n = T(n-3) + (n-2) + (n-1) + n = ...

... = 1 + 2 + ... + (n-2) + (n-1) + n = n(n+1)/2

Hence g(n) = n^2, i.e. T(n) = Θ(n^2).

A recursion tree is a graphical method of displaying the substitution of a recurrence relation into itself.

T(n) = 2T(n/2) + n^2

Level 0:  n^2                                         cost n^2
Level 1:  (n/2)^2  (n/2)^2                            cost (1/2)*n^2
Level 2:  (n/4)^2  (n/4)^2  (n/4)^2  (n/4)^2          cost (1/4)*n^2
...       (the tree has on the order of log n levels)

The level costs form a decreasing geometric series, n^2 * (1 + 1/2 + 1/4 + ...) <= 2*n^2, so T(n) = Θ(n^2).

Substitution method

  1. Guess (propose) a solution
  2. Check solution using induction
  3. Find and substitute constants

T(n) = 2T(n/2) + n


T(n) = (n log n)

Induction hypothesis: T(n) ≤ c * n * log n, c > 0

Let this estimate be true for n/2

T(n/2) ≤ c*(n/2)*log(n/2)

Substitute it into the original formula for T(n)

T(n) ≤ 2*(c*(n/2)*log(n/2)) + n =

c*n*log(n/2) + n =

c*n*log n - c*n*log 2 + n =

c*n*log n - c*n + n ≤

c*n*log n

for c ≥ 1 and n ≥ 2 (logarithms are base 2, so log 2 = 1).

Master theorem on recurrence estimates

T(n) = a*T(n/b) + f(n),  a ≥ 1, b > 1, f(n) > 0 for n > n0.

Let k = log_b a. Then:

1. If f(n) = O(n^(k - ε)) for some ε > 0, then T(n) = Θ(n^k).

2. If f(n) = Θ(n^k), then T(n) = Θ(n^k * log n).

3. If f(n) = Ω(n^(k + ε)) for some ε > 0 and a*f(n/b) ≤ c*f(n) for some c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

For example, for merge sort T(n) = 2*T(n/2) + c*n we have a = 2, b = 2, k = 1 and f(n) = Θ(n), so case 2 applies and T(n) = Θ(n log n).


Algorithms for sorting arrays in polynomial time

Sorting is the process of rearranging the objects of a given collection into a certain order (increasing or decreasing).

The purpose of sorting is usually to facilitate the subsequent search for elements in the sorted set.

Sorting by simple insertion (insertion sort)

void sort_by_insertion(key a[], int N)
{
    for (int i = 1; i < N; i++) {          // a[0..i-1] is already sorted
        key x = a[i];                      // x is the element to insert
        int j;
        for (j = i - 1; (j >= 0) && (x < a[j]); j--)
            a[j + 1] = a[j];               // shift larger elements to the right
        a[j + 1] = x;                      // put x into its place
    }
}

Analysis of sorting by simple insertions

Number of comparisons:

C(N) = 1 + 2 + 3 + ... + (N - 1) = N*(N - 1)/2 = O(N^2)

Total time: T(N) = Θ(N^2)

Sorting by simple exchange (the bubble method).

void bubble_sort(key a[], int N)
{
    for (int i = 0; i < N - 1; i++)            // after pass i, the i smallest elements are in place
        for (int j = N - 1; j > i; j--)
            if (a[j - 1] > a[j]) {             // neighbours are out of order - swap them
                key x = a[j]; a[j] = a[j - 1]; a[j - 1] = x;
            }
}

Simple Exchange Sort Analysis

Worst case: an array ordered in reverse.

Number of comparisons:

C(N) = (N - 1) + (N - 2) + ... + 1 = N*(N - 1)/2 = O(N^2)

Total time: T(N) = Θ(N^2)


Adding a node (insertion into a binary search tree)

Node* _Add(Node *r, T s)
{
    if (r == NULL)                      // an empty place is found - create a new node here
        r = new Node(s);
    else if (s < r->inf)                // smaller keys go into the left subtree
        r->left = _Add(r->left, s);
    else                                // larger (or equal) keys go into the right subtree
        r->right = _Add(r->right, s);
    return r;
}


Removing an element from the tree

Given: a tree T with root n and a key K.

Required: remove from the tree T the node with the key K (if there is one).

Algorithm:

If tree T is empty, stop;

Otherwise, compare K with the key X of the root node n.

If K>X, recursively remove K from the right subtree of T;

If K < X, recursively remove K from the left subtree of T;

If K=X, then it is necessary to consider three cases.

If there are no both children, then we delete the current node and reset the link to it from the parent node;

If one of the children does not exist, then we put the values ​​of the fields of the child m instead of the corresponding values ​​of the root node, overwriting its old values, and freeing the memory occupied by the node m;

If both children are present, then we find the node m, which is the next one after the given one;

copy the data (except for references to child elements) from m to n;

recursively remove the node m.
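The description above can be turned into code; the following C++ sketch follows these cases, reusing the Node structure (fields inf, left, right) and the type T from the _Add function, with _FindMin as an illustrative helper name:

// Sketch of deletion from a binary search tree, following the cases above.
Node* _FindMin(Node* r) {                 // the minimum is at the end of the left branch
    return (r->left == NULL) ? r : _FindMin(r->left);
}

Node* _Remove(Node* r, T k) {
    if (r == NULL) return NULL;           // the tree is empty - nothing to do
    if (k < r->inf)
        r->left = _Remove(r->left, k);    // the key is in the left subtree
    else if (k > r->inf)
        r->right = _Remove(r->right, k);  // the key is in the right subtree
    else {                                // k == r->inf: the node to delete is found
        if (r->left == NULL || r->right == NULL) {
            // zero or one child: replace the node by its only child (or NULL)
            Node* child = (r->left != NULL) ? r->left : r->right;
            delete r;
            return child;
        }
        Node* m = _FindMin(r->right);     // two children: take the next node after r
        r->inf = m->inf;                  // copy its data (except the child links)
        r->right = _Remove(r->right, m->inf);  // recursively remove the node m
    }
    return r;
}

A typical call replaces the root pointer: root = _Remove(root, K);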

The element following the given

Given: tree T and key x

We return a pointer to the next element after x, or NULL if x is the last element in the tree.

Algorithm:

We consider two cases separately.

If the right subtree of x is not empty, then the next element after x is the minimum element in this subtree.

Otherwise, if the right subtree of x is empty, we go up from x until we find a vertex that is the left child of its parent. This parent (if any) is the element we are looking for.
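A possible C++ sketch of this search follows. Since the Node structure used above stores no parent pointers, the sketch walks down from the root instead of walking up, which yields the same in-order successor:

// Sketch: find the element following the one with key x (the in-order successor).
Node* _Next(Node* root, T x) {
    Node* succ = NULL;                    // the last node where we turned left
    Node* cur = root;
    while (cur != NULL) {
        if (x < cur->inf) {               // cur is a candidate successor
            succ = cur;
            cur = cur->left;
        } else {                          // keys <= x: the successor must be to the right
            cur = cur->right;
        }
    }
    return succ;                          // NULL if x is the largest key in the tree
}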


Inserting Nodes

Inserting a new key into an AVL tree is done in the same way as it is done in simple search trees: we go down the tree, choosing the right or left direction of movement, depending on the result of comparing the key in the current node and the key being inserted.

The only difference is that when the recursion returns (i.e. after the key is inserted in either the right or left subtree), the current node is balanced. The imbalance arising from such an insertion in any node along the path of movement does not exceed two, which means that the application of the balancing function described above is correct.

Removing nodes

To remove a vertex from an AVL tree, an algorithm is taken as a basis, which is usually used when removing nodes from a standard binary search tree. We find the node p with the given key k, in the right subtree we find the node min with the smallest key and replace the removed node p with the found node min.

There are several options for implementation. First of all, if the found node p does not have a right subtree, then, by the property of the AVL tree, this node can have only one single child node (a tree of height 1) on the left, or node p in general is a leaf. In both of these cases, you just need to delete the p node and return a pointer to the left child node of the p node as a result.

Now let p have a right subtree. Find the minimum key in this subtree. By property of a binary search tree, this key is at the end of the left branch, starting from the root of the tree. We use the recursive findmin function.

The removemin function removes the node with the minimum key from the given tree. By the AVL property, the minimum node either has a single right child or no children at all; in both cases it is enough to return a pointer to its right child and perform balancing on the way back (when returning from the recursion).
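A compact C++ sketch of this deletion scheme is shown below. The node structure with a height field and the balance() rebalancing routine (the rotations referred to above) are assumed to exist and are not reproduced here:

struct AvlNode {
    int key;
    unsigned char height;
    AvlNode *left, *right;
};

AvlNode* balance(AvlNode* p);                 // assumed: restores the AVL property at p

AvlNode* findmin(AvlNode* p) {                // the minimum key ends the left branch
    return p->left ? findmin(p->left) : p;
}

AvlNode* removemin(AvlNode* p) {              // remove the node with the minimum key
    if (p->left == nullptr)
        return p->right;                      // it has at most a right child
    p->left = removemin(p->left);
    return balance(p);                        // rebalance on the way back
}

AvlNode* remove(AvlNode* p, int k) {
    if (p == nullptr) return nullptr;
    if (k < p->key)
        p->left = remove(p->left, k);
    else if (k > p->key)
        p->right = remove(p->right, k);
    else {                                    // k == p->key: delete this node
        AvlNode* l = p->left;
        AvlNode* r = p->right;
        delete p;
        if (r == nullptr) return l;           // no right subtree: return the left child
        AvlNode* min = findmin(r);            // replace p by the minimum of the right subtree
        min->right = removemin(r);
        min->left = l;
        return balance(min);
    }
    return balance(p);                        // rebalance on the way back from the recursion
}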


Hash tables, chaining method

Direct addressing is used for small sets of keys. Suppose we need to maintain a dynamic set, each element of which has a key from the universe U = {0, 1, ..., m - 1}, where m is not too large and no two elements have the same key.

To represent the dynamic set, an array (a direct-address table) T[0..m-1] is used, in which each position, or cell, corresponds to a key from the key space U.

Cell T[k] points to the element of the set with key k. If the set contains no element with key k, then T[k] = NULL.

Key lookup operation takes time O(1)
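A minimal C++ sketch of a direct-address table (the Item structure and member names are illustrative):

#include <cstddef>
#include <vector>

struct Item { int key; /* ... payload ... */ };

// Cell k stores a pointer to the element with key k, or nullptr if absent.
struct DirectAddressTable {
    std::vector<Item*> T;
    explicit DirectAddressTable(std::size_t m) : T(m, nullptr) {}

    Item* search(int k) const { return T[k]; }           // O(1)
    void  insert(Item* x)     { T[x->key] = x; }          // O(1)
    void  remove(Item* x)     { T[x->key] = nullptr; }    // O(1)
};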

Disadvantages of direct addressing:

If the key space U is large, storing a table T of size |U| is impractical, if not impossible, given the amount of available memory and the size of the key space.

The set K of actually stored keys may be small compared to the key space U, in which case most of the memory allocated for the table T is wasted.

A hash function is a function h that maps the elements of the set U to positions in the table T.



In hashing, the element with key k is stored in location h(k), the hash function h is used to calculate the location for the given key k. The function h maps the key space U to the cells of the hash table T [0..m - 1]:

h: U → {0, 1, ..., m-1}.

The value h(k) is called the hash value of the key k.

When the set K of keys stored in a dictionary is much smaller than the space of possible keys U, the hash table requires significantly less space than a directly addressable table.

The purpose of the hash function is to reduce the working range of array indices, and instead of |U| values, you can get by with just m values.

The memory requirement can be reduced to Θ(|K|), while the time to look up an element in the hash table remains O(1); however, this is a bound on the average lookup time, whereas for a direct-address table the bound holds in the worst case.

A collision is a situation where two keys map to the same cell.

For example, h(43) = h(89) = h(112) = k

The chaining method:

Idea: Store elements of a set with the same hash value as a list.

h(51) = h(49) = h(63) = i

Analysis

Worst case: the hash function produces the same value for all elements of the set. The access time is then Θ(n) for n stored elements.

Average case: the case where the hash values are uniformly distributed.

Any key is equally likely to hash into any cell of the table, independently of where the other keys hash (simple uniform hashing).

Let a table T be given, with n keys stored in it.

Then α = n/m is the load factor: the average number of keys per table cell.

An unsuccessful search (for a missing element) takes Θ(1 + α) on average:

constant time to compute the hash value plus the time to walk the list to its end; since the average list length is α, the result is Θ(1) + Θ(α) = Θ(1 + α).

If the number of table cells is proportional to the number of elements stored in it, then n = O(m) and therefore α = n/m = O(m)/m = O(1), which means that searching for an element in the hash table takes Θ(1) time on average.

Operations

Inserting an element into the table and deleting an element also take O(1) time.
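A sketch of such a chained hash table in C++ (the division-method hash and the int/string key-value types are illustrative choices):

#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash table with chaining: all elements whose keys hash to the same cell
// are kept in a linked list stored in that cell.
class ChainedHashTable {
    std::vector<std::list<std::pair<int, std::string>>> table;
    std::size_t h(int key) const { return static_cast<std::size_t>(key) % table.size(); }
public:
    explicit ChainedHashTable(std::size_t m) : table(m) {}

    void insert(int key, const std::string& value) {       // O(1)
        table[h(key)].push_back({key, value});
    }
    const std::string* search(int key) const {              // O(1 + alpha) on average
        for (const auto& kv : table[h(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;                                      // the key is not in the table
    }
    void remove(int key) {                                   // O(1 + alpha) on average
        auto& chain = table[h(key)];
        chain.remove_if([key](const auto& kv) { return kv.first == key; });
    }
};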

Selecting a hash function

Keys should be evenly distributed across all cells

The distribution of hash values should not correlate with patterns in the data (for example, when the data are all even numbers).

Methods:

division method

multiplication method

division method

h(k) = k mod m

The problem of a poorly chosen divisor m:

Example 1. m = 2 and all keys are even ⇒ the odd-numbered cells are never filled.

Example 2. m = 2^r ⇒ the hash does not depend on the bits above the lowest r bits.

multiplication method

Let m = 2^r and let the keys be w-bit words.

h(k) = (A*k mod 2^w) >> (w - r), where

A mod 2 = 1 and 2^(w-1) < A < 2^w.

A should not be chosen close to 2^(w-1) or to 2^w.

This method is faster than the division method.

Multiplication method: example

m = 8 = 2^3, w = 7
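A C++ sketch with the example parameters above (the constant A = 89 is an arbitrary illustrative choice satisfying the constraints):

#include <cstdint>

// Multiplication method: h(k) = (A*k mod 2^w) >> (w - r), with m = 2^r.
const unsigned w = 7;                 // keys are 7-bit words
const unsigned r = 3;                 // table size m = 2^r = 8
const uint32_t A = 89;                // odd, and 2^(w-1) = 64 < 89 < 128 = 2^w

uint32_t hash_mul(uint32_t k) {
    uint32_t product = (A * k) & ((1u << w) - 1);   // A*k mod 2^w
    return product >> (w - r);                      // keep the top r of the w bits
}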

Open addressing: search

The search examines cells sequentially, following the same probe sequence as insertion.

Success: the required value is found.

Failure: an empty cell is found, or the entire table has been traversed.

Probing strategies

Linear probing:

h(k, i) = (h′(k) + i) mod m

This strategy is easy to implement, but it suffers from primary clustering: long runs of occupied cells build up, which increases the average search time.

Quadratic probing:

h(k, i) = (h′(k) + c1*i + c2*i^2) mod m

where h′(k) is an ordinary hash function.

Double hashing:

h(k, i) = (h1(k) + i*h2(k)) mod m.

double hashing

This method gives excellent results, but h2(k) must be relatively prime to m.

This can be achieved:

either by using a power of 2 as m and making h2(k) produce only odd numbers, i.e. m = 2^r and h2(k) odd;

or by making m a prime number and the values of h2 positive integers less than m.

For prime m one can set:

h1(k) = k mod m

h2(k) = 1 + (k mod m′)

where m′ is slightly less than m (m′ = m - 1 or m - 2).
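A small C++ sketch of this probe sequence (the table size m = 13 is an illustrative prime):

#include <cstddef>

// Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m,
// with h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)).
const std::size_t m = 13;                      // prime table size

std::size_t h1(std::size_t k) { return k % m; }
std::size_t h2(std::size_t k) { return 1 + (k % (m - 1)); }

std::size_t probe(std::size_t k, std::size_t i) {
    return (h1(k) + i * h2(k)) % m;            // the i-th cell examined for key k
}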

Open addressing: insertion example

Let a table A be given, with double hashing:

h2(k) = 1 + (k mod 11)

Where will the element be inserted?

Analysis of open addressing

The additional assumption of uniform hashing: each key is equally likely to have any of the m! permutations of the table positions as its probe sequence, independently of the other keys.

Searching for a missing element: the expected number of probes in an unsuccessful search is at most 1/(1 - α).

The expected number of probes in a successful search is at most (1/α) * ln(1/(1 - α)).

Open addressing:

if α is bounded by a constant strictly less than 1, the expected number of probes is O(1).

How this behaves as α grows:

the table is 50% full  ⇒ about 2 probes;

the table is 90% full  ⇒ about 10 probes;

the table is 100% full ⇒ up to m probes.


The principle of greedy choice

We say that the greedy choice principle applies to an optimization problem if a sequence of locally optimal choices yields a globally optimal solution.

In a typical case, the proof of optimality follows the following scheme:

It is proved that the greedy choice at the first step does not close the path to the optimal solution: for every solution there is another solution that is consistent with the greedy choice and is not worse than the first one.

It is shown that the subproblem arising after the greedy choice at the first step is similar to the original one.

The argument ends by induction.

Optimal substructure (optimality for subproblems)

A problem is said to have the optimal-substructure property if an optimal solution to the problem contains optimal solutions to all of its subproblems.


Building the Huffman Code

Any message consists of a sequence of characters of some alphabet. To save memory and to increase the speed of information transfer, the task of compressing information often arises. In this case, special character encoding methods are used. These include Huffman codes, which give compression of 20% to 90% depending on the type of information.

The Huffman algorithm finds optimal character codes based on the frequency of characters in the compressed text.

Huffman's algorithm is an example of a greedy algorithm.

Let the character frequencies be known for a file of 100,000 characters (in thousands of occurrences):

character:   a    b    c    d    e    f
frequency:   45   13   12   16   9    5

We need to build a binary code in which each character is represented by a finite sequence of bits, called a code word. When a uniform (fixed-length) code is used, in which all code words have the same length, at least three bits are needed per character, and 300,000 bits will be spent on the entire file.

A variable-length code is more economical if frequently occurring characters are encoded with short bit sequences and rarely occurring ones with long sequences. Encoding the entire file then takes (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1000 = 224,000 bits, i.e. the variable-length code gives about 25% savings.

Prefix codes

Consider codes in which, for any two bit sequences representing different characters, neither is a prefix of the other. Such codes are called prefix codes.

When encoding, each character is replaced by its code word. For example, with the code words a = 0, b = 101, c = 100, the string abc looks like 0101100. For a prefix code, decoding is unambiguous and proceeds from left to right.

The first character of a text encoded with a prefix code is uniquely determined, since its code word cannot be the beginning of any other character. Having determined this character and discarded its codeword, we repeat the process for the remaining bits, and so on.

To implement decoding efficiently, we need to store information about the code in a convenient form. One possibility is to represent the code as a binary code tree whose leaves correspond to the encoded characters. In this case, the path from the root to the encoded character determines the encoding bit sequence: moving left in the tree gives 0, and moving right gives 1.

The internal nodes indicate the sum of the frequencies for the leaves of the corresponding subtree.

The optimal code for a given file always corresponds to a binary tree in which every vertex that is not a leaf has two children. The uniform (fixed-length) code is not optimal, because the corresponding tree contains a node with only one child.

The tree of an optimal prefix code for a file that uses all the characters of some set C, and only them, contains exactly |C| leaves, one for each character, and exactly |C| - 1 internal (non-leaf) nodes.

Knowing the tree T corresponding to the prefix code, it is easy to find the number of bits needed to encode the file. For each character c of the alphabet C, let f[c] be the number of times it occurs in the file and dT(c) the depth of its leaf, which is also the length of the bit sequence encoding c. Then encoding the file requires B(T) = the sum over all c in C of f[c] * dT(c) bits.

This value is called the cost of the tree T. It is necessary to minimize this value.

Huffman proposed a greedy algorithm that builds an optimal prefix code. The algorithm builds the tree T corresponding to the optimal code from the bottom up, starting with a set of |C| leaves and performing |C| - 1 merges.

For each symbol, its frequency f [c] is given. To find two objects to merge, a priority queue Q is used, using frequencies f as priorities - the two objects with the lowest frequencies are merged.

As a result of the merging, a new object (internal vertex) is obtained, the frequency of which is considered equal to the sum of the frequencies of the two merged objects. This vertex is added to the queue.

Huffman(C)

    n ← |C|                        // |C| is the cardinality of C
    Q ← C                          // Q is a priority queue keyed by frequency
    for i ← 1 to n - 1
        do z ← Create_Node()       // z is a node with fields f, left, right
           x ← left[z] ← Dequeue(Q)
           y ← right[z] ← Dequeue(Q)
           f[z] ← f[x] + f[y]
           Enqueue(Q, z)
    return Dequeue(Q)              // the root of the code tree

Complexity estimate

The queue is implemented as a binary heap.

You can create a queue in O(n).

The algorithm consists of a loop that is executed n-1 times.

Each queue operation takes O(log n).

Total running time O(n log n).
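For comparison, a C++ sketch of the same construction using std::priority_queue as the queue Q (the HuffNode structure and the build_huffman name are illustrative):

#include <queue>
#include <utility>
#include <vector>

struct HuffNode {
    long long f;                       // frequency
    char symbol;                       // meaningful only in leaves
    HuffNode *left, *right;
};

struct ByFrequency {                   // min-heap ordering by frequency
    bool operator()(const HuffNode* a, const HuffNode* b) const { return a->f > b->f; }
};

// Builds the code tree from (symbol, frequency) pairs; assumes freq is non-empty.
HuffNode* build_huffman(const std::vector<std::pair<char, long long>>& freq) {
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, ByFrequency> q;
    for (const auto& p : freq)                             // n leaves
        q.push(new HuffNode{p.second, p.first, nullptr, nullptr});
    while (q.size() > 1) {                                 // n - 1 merges
        HuffNode* x = q.top(); q.pop();                    // the two lowest frequencies
        HuffNode* y = q.top(); q.pop();
        q.push(new HuffNode{x->f + y->f, 0, x, y});        // new internal node
    }
    return q.top();                                        // root of the code tree
}

Called with the frequency table from the example above, build_huffman returns the root from which the code words are read off by walking to each leaf (left = 0, right = 1).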

The problem of building a network

Areas of origin: communication and road networks.

Given: set of network nodes (hosts, cities).

Necessary: building a network with the smallest total weight of edges (length of network cables, length of roads).

Graph model: the network nodes are the vertices of the graph, E = V × V (the graph is complete), and the weights of all edges are known.

Result: free tree.

MST construction approach

We build the tree A by adding one edge at a time; before each iteration the current edge set A is a subset of some minimum spanning tree (MST).

At each step of the algorithm, we define an edge (u, v) that can be added to A without violating this property. We will call such an edge safe.

GenericMST(G, w)

    A ← ∅
    while A is not an MST
        do find an edge (u, v) that is safe for A
           A ← A ∪ {(u, v)}
    return A

____________________________________________________________________________

Edge classification

1. Tree edges are the edges of the depth-first search forest. An edge (u, v) is a tree edge if v was first discovered while exploring this edge.

2. Back edges are edges (u, v) connecting a vertex u to its ancestor v in the depth-first search tree. Self-loops, which may occur in directed graphs, are also considered back edges.

3. Forward edges are non-tree edges (u, v) connecting a vertex u to its descendant v in the depth-first search tree.

4. Cross edges are all other edges of the graph. They can join vertices in the same depth-first search tree when neither vertex is an ancestor of the other, or join vertices in different trees.

The DFS algorithm can be modified so that it classifies the edges it encounters during its operation. The key idea is that each edge (u, v) can be classified by the colour of the vertex v at the moment the edge is first examined (although forward and cross edges cannot be distinguished by colour alone).

  1. White indicates a tree edge.
  2. Gray indicates a back edge.
  3. Black indicates a forward or cross edge.

The first case follows directly from the definition of the algorithm.

Considering the second case, note that the gray vertices always form a linear chain of children corresponding to the stack of active calls to the DFS_Visit procedure; the number of gray vertices is one more than the depth of the last opened vertex in the depth-first search tree. Exploration always starts from the deepest gray vertex, so an edge that reaches another gray vertex reaches the parent of the original vertex.

In the third case we are dealing with the remaining edges, which do not fall under the first or second case. It can be shown that an edge (u, v) is a forward edge if d[u] < d[v], and a cross edge if d[u] > d[v].
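A possible C++ sketch of such a modified DFS (the adjacency-list representation and array names are illustrative):

#include <cstdio>
#include <vector>

enum Colour { WHITE, GRAY, BLACK };

std::vector<std::vector<int>> adj;     // adjacency lists of the directed graph
std::vector<Colour> colour;            // initialized to WHITE for every vertex
std::vector<int> d;                    // discovery times
int timer = 0;

void dfs_visit(int u) {
    colour[u] = GRAY;
    d[u] = ++timer;
    for (int v : adj[u]) {
        if (colour[v] == WHITE) {                      // white target: tree edge
            std::printf("(%d,%d) tree edge\n", u, v);
            dfs_visit(v);
        } else if (colour[v] == GRAY) {                // gray target: back edge
            std::printf("(%d,%d) back edge\n", u, v);
        } else {                                       // black target: forward or cross
            std::printf("(%d,%d) %s edge\n", u, v, d[u] < d[v] ? "forward" : "cross");
        }
    }
    colour[u] = BLACK;
}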

___________________________________________________________________________

Topological sort

In a precedence graph, each edge (u, v) means that u precedes v.

A topological sort of a graph is the construction of a sequence a in which, for all a_i and a_j, the existence of an edge (a_i, a_j) implies i < j.

A topological sort of a directed acyclic graph G = (V, E) is a linear ordering of all its vertices such that if the graph G contains an edge (u, v), then u comes before v in this ordering (if the graph is not acyclic, such a sort is impossible). The topological sort of a graph can be thought of as an ordering of its vertices along a horizontal line such that all edges are directed from left to right.

Sorted sequence: C2, C6, C7, C1, C3, C4, C5, C8

// main body of the topological sort
for each (u in V) color[u] = white;   // initialize
L = new linked_list;                  // L is an empty linked list
for each (u in V)
    if (color[u] == white) TopVisit(u);
return L;                             // L gives the final order

TopVisit(u) {                         // start a search at u
    color[u] = gray;                  // mark u visited
    for each (v in Adj(u))
        if (color[v] == white) TopVisit(v);
    append u to the front of L;       // on finishing u, add it to the list
}

T (n) = Θ(V + E)



Procedures

Create-Set(u) – create a set consisting of the single vertex u.

Find-Set(u) – find the set to which the vertex u belongs: the operation returns one of the elements of the set (called its representative or leader). The representative is chosen in each set by the data structure itself (and may change over time, namely after Union calls).

If Find-Set returns the same value for two elements, they are in the same set; otherwise they are in different sets.

Union(u, v) – merge the sets containing the vertices u and v.

The sets of elements are stored as trees: one tree corresponds to one set. The root of the tree is the representative (leader) of the set.

In the implementation this means that we keep an array parent in which, for each element, we store a reference to its ancestor in the tree. For the roots of the trees we assume that their ancestor is the element itself (i.e. the reference loops back at this point).

To create a new element (the Create-Set operation), we simply create a tree rooted at the node v, noting that its ancestor is itself.

To merge two sets (the Union(a, b) operation), we first find the leaders of the set containing a and of the set containing b. If the leaders coincide, we do nothing: the sets are already merged. Otherwise we simply set the ancestor of the leader of b to be the leader of a (or vice versa), thereby attaching one tree to the other.

Finally, the implementation of the leader search operation (Find-Set(v)) is simple: we climb up through the ancestors from the vertex v until we reach the root, i.e. until the ancestor reference points to the element itself. This operation is more convenient to implement recursively.

Path Compression Heuristic

This heuristic is designed to speed up Find-Set().

It consists in the following: when a call to Find-Set(v) has found the required leader p of the set, we remember that the vertex v, and all the vertices passed along the way, have exactly this leader p. The easiest way to do this is to redirect their parent references to this vertex p.

Thus the meaning of the ancestor array parent changes somewhat: it is now a compressed ancestor array, i.e. for each vertex it may store not its immediate ancestor but an ancestor of an ancestor, an ancestor of an ancestor of an ancestor, and so on.

On the other hand, it is clear that these parent pointers cannot be made to always point to the leader: otherwise, when performing a Union operation, the leaders of O(n) elements would have to be updated.

So the parent array should be treated exactly as an ancestor array, possibly partially compressed.

Applying the path compression heuristic alone already yields logarithmic asymptotics: O(log n) per query on average.
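A compact C++ sketch of this structure with the path-compression heuristic (array and function names are illustrative):

#include <vector>

std::vector<int> parent;                       // parent[v] is the ancestor of v

void create_set(int v) {                       // Create-Set
    if ((int)parent.size() <= v) parent.resize(v + 1);
    parent[v] = v;                             // the new tree's root is v itself
}

int find_set(int v) {                          // Find-Set with path compression
    if (parent[v] == v) return v;
    return parent[v] = find_set(parent[v]);    // re-point v straight at the leader
}

void union_sets(int a, int b) {                // Union
    a = find_set(a);
    b = find_set(b);
    if (a != b) parent[b] = a;                 // attach one tree to the other
}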

Analysis

Initialization - O(V)

The loop runs V times and each extractMin operation takes O(log V), for a total of O(V log V).

The for loop runs O(E) times in total, and decreaseKey takes O(log V) time.

Total running time: O(V log V + E log V) = O(E log V)



Shortest path property

Let p = (v1, v2, ..., vk) be the shortest path from vertex v1 to vertex vk in a weighted directed graph G = (V, E) with weight function w: E → R, and let pij = (vi, v(i+1), ..., vj) be the subpath of p from vertex vi to vertex vj for arbitrary i and j (1 ≤ i < j ≤ k). Then pij is the shortest path from vi to vj.

Dijkstra(G, w, s) {
    for each (u in V) {                  // initialization
        d[u] = +infinity
        color[u] = white
    }
    d[s] = 0                             // the distance to the source is 0
    Q = new PriQueue(V)                  // put all vertices into Q
    while (Q.nonEmpty()) {               // until all vertices are processed
        u = Q.extractMin()               // select the u closest to s
        for each (v in Adj[u]) {
            if (d[u] + w(u,v) < d[v]) {  // Relax(u,v)
                d[v] = d[u] + w(u,v)
                Q.decreaseKey(v, d[v])
            }
        }
        color[u] = black
    }
}


Complexity estimate

The Bellman-Ford algorithm completes its work in time O(V*E): the initialization in line 1 takes O(V) time, each of the |V| - 1 passes over the edges in lines 2-4 takes O(E) time, and the for loop in lines 5-7 takes O(E) time.
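The line numbers above refer to the textbook pseudocode, which is not reproduced here; the following C++ sketch has the same structure (the Edge type, names and the INF sentinel are illustrative):

#include <limits>
#include <vector>

struct Edge { int u, v; long long w; };

const long long INF = std::numeric_limits<long long>::max() / 4;

// Returns false if a negative-weight cycle is reachable from s.
bool bellman_ford(int n, const std::vector<Edge>& edges, int s, std::vector<long long>& d) {
    d.assign(n, INF);                       // initialization, O(V)
    d[s] = 0;
    for (int i = 0; i < n - 1; ++i)         // |V| - 1 passes over all edges, O(V*E)
        for (const Edge& e : edges)
            if (d[e.u] < INF && d[e.u] + e.w < d[e.v])
                d[e.v] = d[e.u] + e.w;      // Relax(u, v)
    for (const Edge& e : edges)             // one more pass: negative-cycle check, O(E)
        if (d[e.u] < INF && d[e.u] + e.w < d[e.v])
            return false;
    return true;
}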

Asymptotic estimation of the algorithm

Algorithm Analysis – theoretical study of the performance of computer programs and the resources they consume.

Speed is the running time of the algorithm as a function of the amount of input data.

It is determined by the function T(n), where n is the amount of input data

Types of analysis

Worst case: T(n) is the maximum time for any input data of size n.

Medium case: T(n) is the expected time for any input of size n.

The best case is the minimum running time.

Asymptotic estimate

O- notation: asymptotic upper bound

f (n) = O(g(n))

There exist c > 0 and n0 > 0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0.

O(g(n)) = { f(n) : there exist c > 0, n0 > 0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0 }

Example: 2n^2 = O(n^3)


Recursive algorithms, construction of an asymptotic estimate. Example

Merge sort

sort(A, p, r)                  // p - the beginning of the array, r - the end; total cost T(n)
{
    if (p < r)                 // 1
    {
        q = (p + r) / 2;       // calculate the middle of the array: 1
        sort(A, p, q);         // sort the left half: T(n/2)
        sort(A, q + 1, r);     // sort the right half: T(n/2)
        merge(p, q, r);        // merge the two halves into one: n
    }
}

In total: T(n) = 2*T(n/2) + c*n, which (by the recursion tree, the substitution method or the master theorem above) gives T(n) = Θ(n log n).
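The merge step itself is not given above; a possible C++ implementation, which merges the sorted halves A[p..q] and A[q+1..r] in linear time (here the array is passed explicitly), is:

#include <cstddef>
#include <vector>

void merge(std::vector<int>& A, int p, int q, int r) {
    std::vector<int> L(A.begin() + p, A.begin() + q + 1);      // copy of the left half
    std::vector<int> R(A.begin() + q + 1, A.begin() + r + 1);  // copy of the right half
    std::size_t i = 0, j = 0;
    int k = p;
    while (i < L.size() && j < R.size())          // take the smaller head element
        A[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < L.size()) A[k++] = L[i++];         // copy whatever remains
    while (j < R.size()) A[k++] = R[j++];
}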