Jump to content

User:Juliefeng/sandbox

From Wikipedia, the free encyclopedia

In computer science, Berry-Ravindran algorithm (BR algorithm) is a fast matching algorithm that is used to find the matched strings. This algorithm efficiently provide a method that shift value from two ongoing characters to the right of the window. It can be quickly search the goal strings. Thus, it can be utilized in wide range of problem solving such as biological information, and search engine, artificial intelligence, information security and others.

Berry-Ravindran Algorithm

[edit]

Berry-Ravindran algorithm searches the text string and pattern to compare either a match or a mismatch occur. There are two phases to complete the matching. First phase is preprocessing. In this step, for each pair of characters (a,b) will be computed as follow formula[1]:

brBc[a,b]= min

              1| if x[m-1]=a
              m-i+1| if x[i]x[i+1]=ab
              m+1| if x[0]=b
              m+2| otherwise 
              
              

Then, the bad character table will be built by each pair of character.

Example

[edit]

Text=”dacbadacdcdcdbcbcacdbcad” Pattern=”dacdcdcd” ; m=8; count from 0 to 7;

When caculated the value that put in the table, each pair (a, b) calculate the minimal value of above formula and then compute all the values that can fill the table. Extracting the each character from text, note: this is no repeat the character and other character use”∗” to present. For example, the pair (b,c) we can see it is not meet the x[m-1]=a because m-1=7, the “7” is d not equal to b. Then, we find this pair also does not meet x[i]x[i+1] because there is no “b,c” showing in the pattern string. Also, it does not meet the third one x [0] =b because the “0” in pattern actually is not b=“c”. Therefore, we can calculate the value m+2= 8+2=10. So, we put the 10 in the table.


     d   a   c   b   ∗

 d   1   1   1   1   1

 a   9 10  7 10 10

 c   3 10 10 10 10

 b  9 10 10 10 10

∗  9 10 10 10 10


After we finished calculate all the characters in the text, then begin the second phase is searching. There should be several attempts to compare the string if there is a match. Firstly, we compare these two, if two characters are not equal, then we swift the bits to rightmost of the window. The value is form the next one behind the pattern string in the text strings. For example, Text=”dacbadacdcdcdbcbcacdbcad” Pattern=”dacdcdcd” the (brBc[d][c])−1, so the pattern string will shift 1; And then continue the process till there are no strings of pattern to compare, that is, we finish comparing them.

Pseudocode

[edit]

1. Preprocessing:

preBrBc(char x, int m, int brBc[Asize][Asize])

 case1: for(a=0; a<Asize; ++a)
         for (b=0;b,< Asize; ++b)
         then print brBc [a][b]=m+2;     
 case2:  for (a = 0; a < Asize; ++a);
 if x[0]=b
         then print brBc[a][x[0]] = m + 1; 
 case3: for (i = 0; if i < m - 1;then ++i) if x[i]x[i + 1]=ab;
          then print  brBc[x[i]][x[i + 1]] = m – i+1;
 case4: for (a = 0; a <Asize; ++a) if x[m-1]=a
           then print  brBc[x[m - 1]][b] = 1;

2. Searching:

BR(char *x, int m, char *y, int n)

  int j, brBc[ASIZE][ASIZE];
  using the shift value of preprocessing table
  preBrBc(x, m, brBc);
  and then do the search;
  y[n + 1] = null;
  j= 0;
  while (j <= n - m) 
 {if (strings(x, y + j, m) == 0)
        output(j);
  j+= brBc[y[j + m]][y[j + m + 1]];}

Correctness proof

[edit]

the two characters T[s+m] and T[s+m+1], m is the size of pattern P, each matching phase shift s, the substring of text T[s+m...s+m+1] is aligned with its rightmost occurrence in P. this number to the shift by brBc(T[s+m],T[s+m+1])positions,where brBc(c1,c2)= min({0<k<m|p[m-k-1]=c1, and p[m-k]=c2}∪{k|k=m and p[0]=c2}∪{m+1})

Complexity

[edit]

In Berry Ravindran algorithm, due to two phases, so the complexity time are separate. First,the preprocessing phase in O(m+δ²) space and time complexity; And the searching phase has O(mn) time complexity.

Applications

[edit]

Berry Ravindran effectively provide a quick searching method that it can be used in computer application, text processing, artificial intelligence, and biological sciences. Wide range of utilization facilitate that can quickly scan and searching the goal strings. Thus, A great number of filed begin to implement this algorithm.

[edit]

Boyer-Moore algorithm

Reference

[edit]
  1. ^ A fast string matching algorithm and experimental results

1.BERRY, T., RAVINDRAN, S., 1999, A fast string matching algorithm and experimental results, in Proceedings of the Prague Stringology Club Workshop`99, J. Holub and M. Simánek ed., Collaborative Report DC-99-05, Czech Technical University, Prague, Czech Republic, 1999, pp 16-26.

2.R.S. Boyer, J.S. Moore. A Fast String Searching Algorithm, Communications of the ACM, vol. 20, pp.762-772, 1977.

3.Knuth, D., Morris, J. H., Pratt, V., "Fast pattern matching in strings," SIAM Journal on Computing, Vol. 6, No. 2, doi: 10.1137/0206024, 1977, pp.323–350.

4.R. Thathoo,A. Virmani, S. Lakshmi, N. Balakrishnan, K. Sekar. TVSBS: A Fast Exact Pattern Matching Algorithm for Biological Sequences, Current Science, vol. 91, pp. 47-53, Jul. 2006.