Text processing

In computing, the term text processing refers to the discipline of mechanizing the creation or manipulation of electronic text. Text usually refers to all the alphanumeric characters available on the keyboard of the person performing the mechanization, but more generally text here means the abstraction layer immediately above the standard character encoding of the target text. The term processing refers to automated (or mechanized) processing, as opposed to the same manipulation done manually.
Text processing involves computer commands that invoke content, content changes, and cursor movement, for example to
- search and replace,
- format,
- generate a processed report of the content of a text file, or
- filter a text file or a report on one.
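As a rough illustration of the four operations listed above, the following Python sketch applies each of them to a small in-memory text. The sample data, the variable names, and the choice of Python's re module are assumptions made for this example only.

```python
import re

# Hypothetical sample text: one "name value" record per line.
text = "alpha 10\nbeta 20\ngamma 30\nbeta 5\n"

# Search and replace: swap every whole-word "beta" for "delta".
replaced = re.sub(r"\bbeta\b", "delta", text)

# Split each line into its two fields for the steps below.
rows = [line.split() for line in replaced.splitlines()]

# Format: left-align the name in 8 columns, right-align the number in 4.
formatted = [f"{name:<8}{int(value):>4}" for name, value in rows]

# Report: sum the values recorded for each name.
totals = {}
for name, value in rows:
    totals[name] = totals.get(name, 0) + int(value)

# Filter: keep only the lines whose value is at least 10.
filtered = [line for line in replaced.splitlines()
            if int(line.split()[1]) >= 10]

if __name__ == "__main__":
    print("\n".join(formatted))
    print(totals)
    print(filtered)
```

Each step consumes plain text and produces plain text or a simple summary of it, which is what allows such steps to be combined freely.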
Text processing with a regular expression resembles a virtual editing machine with a primitive programming language that has named registers (identifiers) and named positions in the sequence of characters comprising the text. Using these, the "text processor" can, for example, mark a region of text and then move it. Text processing with a utility relies on a filter program, or filter. Together, these two mechanisms comprise text processing.
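The idea of marking a region and then moving it can be sketched with Python's re module, where a named group loosely plays the part of a named register and the match offsets play the part of named positions. The sample string, the group name region, and the helper move_to_end are hypothetical and exist only for this illustration.

```python
import re

text = "first paragraph. [MOVE ME] second paragraph."

# The named group "region" acts like a named register holding the marked text.
MARK = re.compile(r"(?P<region>\[[^\]]*\]) ")


def move_to_end(source: str) -> str:
    """Cut the first bracketed region out of source and append it at the end."""
    match = MARK.search(source)
    if match is None:
        return source  # nothing marked, nothing to move
    # start() and end() are positions in the character sequence.
    start, end = match.start(), match.end()
    remainder = source[:start] + source[end:]
    return remainder + " " + match.group("region")


# Named groups can also be reused in a replacement, here to swap two words.
swapped = re.sub(r"(?P<a>\w+) (?P<b>\w+)", r"\g<b> \g<a>", "hello world")

if __name__ == "__main__":
    print(move_to_end(text))
    print(swapped)
```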
History
The development of computer text processing started in earnest with Kleene's formalization of the regular language. Regular expressions could then become mini-programs, complete with a compilation process, able to perform any edit once that language was extended. Similarly, filters are extended by evolving particular options.
Basic concepts
An editor essentially invokes an input stream and directs it to the text processing environment, which is either a command shell or a text editor. The resulting output can be fed into further text processing, and the final result of such a chain is comparable to a single application of an algorithm by a more sophisticated and structured computer program.
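The way the output of one step becomes the input of the next can be sketched as a chain of small filters over a stream of lines. The filter names grep, substitute, and upper, the pipeline helper, and the sample log lines are all hypothetical choices for this sketch; a command shell would express the same idea with pipes between utilities.

```python
import re
from typing import Callable, Iterable, Iterator

# A filter takes a stream of lines and yields a new stream of lines.
Filter = Callable[[Iterable[str]], Iterator[str]]


def grep(pattern: str) -> Filter:
    """Build a filter that keeps only the lines matching pattern."""
    regex = re.compile(pattern)

    def run(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if regex.search(line))

    return run


def substitute(pattern: str, replacement: str) -> Filter:
    """Build a filter that rewrites every match of pattern in each line."""
    regex = re.compile(pattern)

    def run(lines: Iterable[str]) -> Iterator[str]:
        return (regex.sub(replacement, line) for line in lines)

    return run


def upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter that converts each line to upper case."""
    return (line.upper() for line in lines)


def pipeline(lines: Iterable[str], *filters: Filter) -> list:
    """Feed the output of each filter into the next one, in order."""
    for step in filters:
        lines = step(lines)
    return list(lines)


source = ["error: disk full", "info: started", "error: net down"]
result = pipeline(source, grep(r"^error"), substitute(r"^error: ", ""), upper)

if __name__ == "__main__":
    print(result)
```

Each filter stays simple on its own. Only the ordering of the chain produces the overall effect, which is the contrast drawn above with a single, more structured program.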
Text processing, unlike an algorithm, is a manually administered sequence of simpler macros, namely the pattern-action expressions and filtering mechanisms. In either case the programmer's intention is impressed indirectly upon a given set of textual characters in the act of text processing. The results of a text processing step are sometimes only hopeful, and the attempted mechanism often goes through multiple drafts, guided by visual feedback, until the details of the regular expression or markup language, or the options of the utility, are fully mastered.
Text processing is concerned mostly with producing textual characters at the highest level of computing, where its activities are just below the practical uses of computing, that is, the manual transmission of information.
Ultimately all computing is text processing, from the self-compiling textual characters of an assembler, through the automated programming language generated to handle a blob of graphical data, and finally to the metacharacters of regular expressions which groom existing text documents.
Text processing is its own automation.
Characters
Textual characters come in standardized character sets that also contain control characters, such as the newline character, which arrange text. Other types of control characters arrange the transmission, define the character sets, and perform other housekeeping tasks.
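To make the distinction concrete, the following Python sketch locates the control characters in a short string and shows how the newline-type characters arrange the text into lines. It assumes Unicode's general category Cc as the working definition of a control character; the sample string and the helper names are hypothetical.

```python
import unicodedata

# Hypothetical sample: tabs, a CR LF pair, plain LFs, and a BEL (U+0007).
sample = "name\tvalue\r\nalpha\t1\nbeta\t2\x07\n"


def control_characters(text: str) -> list:
    """Return (index, code point) for every character in category Cc."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cc"]


def strip_housekeeping(text: str) -> str:
    """Drop control characters except tab and newline, which arrange text."""
    return "".join(ch for ch in text
                   if unicodedata.category(ch) != "Cc" or ch in "\t\n")


# Line-ending control characters arrange the text into lines.
lines = sample.splitlines()

if __name__ == "__main__":
    print(control_characters(sample))
    print(lines)
    print(repr(strip_housekeeping(sample)))
```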