Hierarchical classification
A hierarchical classifier is a classifier that maps input data into defined subsumptive output categories. Classification occurs first at a low level, on highly specific pieces of the input data. The classifications of the individual pieces are then systematically combined and classified at a higher level, and this step is repeated until a single output is produced. This final output is the overall classification of the data. Depending on application-specific details, the output can be one of a set of predefined outputs, one of a set of outputs learned online, or even a novel classification that has not been seen before. Generally, such systems rely on relatively simple individual units in the hierarchy, each of which performs the classification with a single universal function. In a sense, these machines rely on the power of the hierarchical structure itself rather than on the computational abilities of the individual components. This makes them relatively simple and easily expandable, and the structure can still yield powerful classifiers.
Hierarchical classification is sometimes referred to as instance space decomposition,[1] which splits a complete multi-class problem into a set of smaller classification problems. The decomposition helps in learning more accurate concepts, because the subtasks have simpler classification boundaries and each subtask can use its own feature selection procedure. When decomposing a classification problem, the central choice is the order in which the smaller classification steps are combined, called the classification path. Depending on the application, the path can be derived from the confusion matrix, by uncovering the reasons for typical errors and finding ways to prevent the system from making them in the future. For example,[2] on the validation set one can see which classes the system most frequently confuses with each other, and then decompose the instance space as follows: first, classification is performed among the well-recognizable classes, with the classes that are difficult to separate treated as a single joint class; then, in a second classification step, the joint class is split into the two classes that were originally confused.
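The confusion-matrix procedure described above can be illustrated with a minimal Python sketch. It is not the method of the cited works: the function names `most_confused_pair` and `two_stage`, the example matrix, and the threshold-based toy classifiers are all hypothetical, and the matrix is assumed to have true classes as rows and predicted classes as columns.

```python
from typing import Callable, Hashable, Sequence, Tuple


def most_confused_pair(confusion: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return the class indices (i, j), i < j, with the most mutual confusion."""
    n = len(confusion)
    if n < 2:
        raise ValueError("need at least two classes")
    best_pair, best_count = (0, 1), -1
    for i in range(n):
        for j in range(i + 1, n):
            # Count errors in both directions: i predicted as j, and j as i.
            count = confusion[i][j] + confusion[j][i]
            if count > best_count:
                best_pair, best_count = (i, j), count
    return best_pair


def two_stage(
    coarse: Callable[[float], Hashable],
    fine: Callable[[float], Hashable],
    joint_label: Hashable,
) -> Callable[[float], Hashable]:
    """Build a classifier that resolves the joint class in a second step."""
    def classify(x: float) -> Hashable:
        label = coarse(x)
        # Only inputs assigned to the joint class reach the second classifier.
        return fine(x) if label == joint_label else label
    return classify


# Hypothetical validation-set confusion matrix for classes 0, 1 and 2.
confusion = [
    [50, 2, 1],
    [3, 30, 17],
    [1, 20, 29],
]
pair = most_confused_pair(confusion)  # classes 1 and 2 are confused most often


# Toy stand-ins for trained models (assumptions, not real classifiers).
def coarse(x: float) -> Hashable:
    return 0 if x < 0 else "joint"  # class 0 versus the merged class {1, 2}


def fine(x: float) -> Hashable:
    return 1 if x < 10 else 2  # separates only the two confused classes


classify = two_stage(coarse, fine, "joint")
predictions = [classify(x) for x in (-1.0, 5.0, 12.0)]
```

In practice, `coarse` and `fine` would each be trained separately, which is what allows each subtask to use its own features.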
Application
Many applications can be implemented efficiently using hierarchical classifiers or variants thereof. One example lies in computer vision: image recognition is a task that suits hierarchical processing well. The model fits this application because pictures can intuitively be viewed as collections of components or objects. These objects can in turn be viewed as collections of smaller components such as shapes, which can be viewed as collections of lines, and so on. This corresponds directly to the way hierarchical processing works. If a simple unit of the processing hierarchy can classify lines into shapes, then an equivalent unit can classify shapes into objects (there are intermediate steps between these, but the idea holds). Thus, if these generic classifying units are arranged hierarchically (using a directed acyclic graph), a full step-by-step classification can proceed from colored pixels all the way up to an abstract label of what is in the picture.
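The lines-to-shapes-to-objects idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a vision system: the function `classify_tree`, the rule table, and the labels are hypothetical, the input is assumed to be already segmented into labeled lines, and the hierarchy is a tree (a special case of a directed acyclic graph). The same generic unit, a table lookup on the labels of the parts, is reused at every level.

```python
from typing import Dict, List, Tuple, Union

# A node is either a low-level label (a leaf) or a list of child nodes.
Node = Union[str, List["Node"]]
Rules = Dict[Tuple[str, ...], str]


def classify_tree(node: Node, rules: Rules) -> str:
    """Classify a node bottom-up by applying one generic unit at every level."""
    if isinstance(node, str):
        return node  # leaves already carry a low-level classification
    # Classify the parts first, then classify the combination of their labels.
    child_labels = sorted(classify_tree(child, rules) for child in node)
    # Combinations that match no rule are reported as "unknown".
    return rules.get(tuple(child_labels), "unknown")


# Hypothetical rule table: keys are sorted tuples of part labels.
rules: Rules = {
    ("line", "line", "line"): "triangle",
    ("line", "line", "line", "line"): "square",
    ("square", "triangle"): "house",
}

# A picture made of a three-line shape and a four-line shape.
picture: Node = [["line", "line", "line"], ["line", "line", "line", "line"]]
label = classify_tree(picture, rules)  # "house"
```

Because the unit itself never changes, extending the hierarchy to new shapes or objects only requires adding entries to the rule table.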
Many similar applications can also be tackled by hierarchical classification, such as written text recognition and robot awareness. It is possible that mathematical models and problem-solving methods can also be represented in this fashion. If so, future research in this area could lead to automated theorem provers that work across multiple domains. Such developments would be significant, but it is not yet clear how exactly these models would apply.
References
1. Cohen, S.; Rokach, L.; Maimon, O. (2007). "Decision-tree instance-space decomposition with grouped gain-ratio". Information Sciences. 177 (17). Elsevier: 3592–3612. doi:10.1016/j.ins.2007.01.016.
2. Sidorova, J.; Badia, T. (2008). "ESEDA: tool for enhanced speech emotion detection and analysis". The 4th International Conference on Automated Solutions for Cross Media Content and Multi-Channel Distribution (AXMEDIS 2008). Florence, 17–19 November. IEEE Press. pp. 257–260.