Jump to content

Two-dimensional pattern matching

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by AmirOnWiki (talk | contribs) at 11:04, 28 June 2025 (See Also). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, two-dimensional pattern matching is the problem of locating occurrences of a two-dimensional matrix of characters ("the pattern") in a bigger two-dimensional matrix ("the picture", or, in analogy with string searching, "the text").

The naïve solution

Assume given a pattern and a text , where . The simplest approach is to compare P against every sub-array of size </math>m\times m</math> in T. This algorithm has a worst-case time of . It is usually assume that , whence this can be written as .

An automaton-based solution

We shall illustrate a more efficient solution by means of an example. Suppose we have the following pattern and text

                 abaab
    aba          baaab
P = bab      T = abaab
    aba          babab
                 ababa

We first construct a Aho-Corasic automaton to search a linear text for occurrences of the columns of P. As a biproduct we obtain an identifier for every column, so that two identical columns get the same identifier: in our example, say the first (and third) columns are idntified by 0, and the middle column by 1.

Next, we run the automaton on the columns of T, obtaining (in linear time) an indication whenever one of the columns of P appears in T: in our example this will be a 3×5 matrix C as follow (a "-" marks an absence of a match):

                 abaab
    aba          baaab
P = bab      T = abaaa    C = 01---
    aba          babab        10--1
                 ababa        010-0

We now use the Knuth-Morris-Pratt Algorithm to search the rows of C for a pattern identical to P, i.e., 010. When we find one, we have idnetified an occurrence of P in T.

The running time of this algorithm is (considering the size of the alphabet to be a constant).

More Efficient Solutions

See Also