Multimedia Web Ontology Language

Machine interpretation of documents and services in a Semantic Web environment is primarily enabled by (a) the capability to mark documents, document segments and services with semantic tags and (b) the ability to establish contextual relations between the tags with a domain model, which is formally represented as an ontology. Human beings use natural languages to communicate an abstract view of the world. Natural language constructs are symbolic representations of human experience and are close to the conceptual model that Semantic Web technologies deal with. Natural language constructs have therefore been the obvious choice for representing ontology elements. This makes it convenient to apply Semantic Web technologies in the domain of textual information.

In contrast, multimedia documents are perceptual recordings of human experience. An attempt to use a conceptual model to interpret these perceptual records is severely impaired by the semantic gap that exists between the perceptual media features and the conceptual world. Notably, concepts have their roots in the perceptual experience of human beings, and the apparent disconnect between the conceptual and the perceptual worlds is rather artificial. The key to semantic processing of multimedia data lies in harmonizing the seemingly isolated conceptual and perceptual worlds. Representation of domain knowledge needs to be extended to enable perceptual modeling, over and above the conceptual modeling that is presently supported. The perceptual model of a domain primarily comprises the observable media properties of its concepts. Such perceptual models are useful for semantic interpretation of media documents, just as conceptual models help in the semantic interpretation of textual documents.

Concepts are formed in human minds through a complex refinement process of personal experiences. An observation of a real-world object amounts to reception of perceptual data through our sense organs. The raw data goes through a process of refinement depending on the personal viewpoint of the observer, and gets encoded in our minds to give rise to a mental model. An abstraction of many such mental models, arising out of many observations of the real world, gives rise to concepts, which are labeled with linguistic constructs to facilitate communication using spoken or written text. Further, the similarities between the perceptual properties of concepts give rise to a concept taxonomy, on which a domain ontology is built. For example, observation of a number of monuments, and analysis of their structural similarities, gives rise to concepts like “monument” and its sub-classes, such as “fort”, “palace” and “tomb”. The fact that concepts are abstractions of perceptual observations has an interesting consequence. When we look for an instance of a concept in the real world or in multimedia documents, we expect some perceptual patterns to appear. A concept is recognized on the basis of the evidential strength of those patterns that are actually observed. For instance, we recognize a monument when we observe one or more of its characteristic visual patterns, such as a dome, minarets and a facade, in still images or videos. Thus, concepts in the real world are characterized by observation models, which comprise the expected manifestations of the concept in the perceptual space and are the key to their recognition through perceptual evidence.

Creation of an observation model for a concept requires domain knowledge that embeds perceptual property descriptions of concepts and media illustrations. For example, a part of the observation model for the monument “Taj Mahal” can be derived from the knowledge that the Taj Mahal is an instance of a “medieval monument” and that medieval monuments comprise “domes”, “minarets” and “arches”, which can be recognized by their characteristic shapes. A multimedia ontology needs to encode this information.
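
The derivation described above can be illustrated with a minimal Python sketch. It is not MOWL syntax and not the reasoner that accompanies MOWL; the `Concept` class, the `observation_model` function and the treatment of "instance of" and "sub-class of" as a single parent link are all assumptions made for illustration. The sketch only shows how media properties attached to a general concept can propagate to a more specific one.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for an ontology concept (not a MOWL construct)."""
    name: str
    # Simplification: "instance of" and "sub-class of" are both modeled as parents.
    parents: list["Concept"] = field(default_factory=list)
    # Names of expected media patterns attached directly to this concept.
    media_properties: set[str] = field(default_factory=set)


def observation_model(concept: Concept) -> set[str]:
    """Collect the concept's own media properties plus those of all ancestors."""
    expected = set(concept.media_properties)
    for parent in concept.parents:
        expected |= observation_model(parent)
    return expected


monument = Concept("monument")
medieval_monument = Concept(
    "medieval monument",
    parents=[monument],
    media_properties={"dome", "minaret", "arch"},
)
taj_mahal = Concept("Taj Mahal", parents=[medieval_monument])

# The Taj Mahal has no media properties of its own in this toy ontology,
# so its expected patterns all come from "medieval monument".
print(sorted(observation_model(taj_mahal)))
```

In this toy setting the expected patterns for the Taj Mahal are exactly those of a medieval monument. A fuller model would also attach uncertainties and spatial or temporal relations to each property.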

To express such an ontology representing perceptual models, we need a multimedia ontology language that provides a means of explicitly expressing media properties for the concepts in a domain. It should have enough flexibility to capture the encoding of different types of media feature descriptions and different media formats at varying levels of abstraction. It should be possible to specify the uncertainties associated with the media properties of concepts. The language should provide the capability of expressing the spatial and temporal relations between media properties that often characterize multimedia objects and events. To enable formulation of an observation model for the domain concepts, it should support reasoning with the relations between the concepts for media property propagation. The ontology language needs to be complemented with a scheme for reasoning with uncertainties for concept recognition.

Existing work on knowledge representation for multimedia data processing lacks the rigor of modern ontology languages and their associated reasoning. The Multimedia Web Ontology Language (MOWL) is an ontology representation language for multimedia data processing that was proposed against this background.

History

The W3C has undertaken the initiative of standardizing ontology representation for web-based applications. The Web Ontology Language (OWL), standardized in 2004 after maturing through XML(S), RDF(S) and DAML+OIL, is a result of that effort. Ontologies in OWL (and some of its predecessor languages) have been successfully used in establishing the semantics of text in specific application contexts.

The concepts and properties in these traditional ontology languages are expressed as text, making an ontology readily usable for semantic analysis of textual documents. Semantic processing of media data calls for perceptual modeling of domain concepts with their media properties. Such modeling was first proposed in the Ph.D. thesis of Hiranmay Ghosh (Electrical Engineering Department, IIT Delhi, 2002) in the form of the Knowledge Description Language (KDL). With the standardization of OWL by the W3C, KDL was merged with OWL to form the Multimedia Web Ontology Language (MOWL). Several students have taken the work further to implement research prototypes of retrieval systems and ontology learning.

Key Features

Syntactically, MOWL is an extension of OWL. These extensions enable:

Definition of media properties following the MPEG-7 media description model.
Probabilistic association of media properties with the domain concepts.
Formal semantics for the media properties to enable reasoning.
Formal semantics for spatio-temporal relations across media objects and events.

MOWL is accompanied by reasoning tools that support:

Construction of a model of observation for a concept in multimedia documents from its expected media properties.
Probabilistic (Bayesian) reasoning for concept recognition with the model of observation.
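
As an illustration of the second point, the following Python sketch computes a posterior belief in a concept from observed media patterns using Bayes' rule. It is a simplified stand-in and not the MOWL reasoner: it assumes the patterns are conditionally independent given the concept (a naive Bayes assumption that the source does not state), and the function name `concept_posterior` and all probability values are hypothetical.

```python
def concept_posterior(prior, likelihoods, observed):
    """Posterior P(concept | evidence) under a naive Bayes assumption.

    prior       -- P(concept) before looking at the media document
    likelihoods -- {pattern: (P(pattern | concept), P(pattern | not concept))}
    observed    -- {pattern: True if detected, False if looked for and absent}
    Patterns missing from `observed` are treated as not examined.
    """
    p_concept = prior
    p_not_concept = 1.0 - prior
    for pattern, seen in observed.items():
        p_given_c, p_given_not_c = likelihoods[pattern]
        if seen:
            p_concept *= p_given_c
            p_not_concept *= p_given_not_c
        else:
            p_concept *= 1.0 - p_given_c
            p_not_concept *= 1.0 - p_given_not_c
    # Normalize over the two hypotheses: concept present or absent.
    return p_concept / (p_concept + p_not_concept)


# Hypothetical observation model for "medieval monument".
likelihoods = {
    "dome": (0.9, 0.1),
    "minaret": (0.8, 0.05),
    "facade": (0.7, 0.3),
}

belief = concept_posterior(
    prior=0.1,
    likelihoods=likelihoods,
    observed={"dome": True, "minaret": True},
)
print(f"belief in concept after evidence: {belief:.3f}")
```

With these made-up numbers, detecting a dome and a minaret raises the belief in the concept well above its prior, which matches the idea that a concept is recognized from the evidential strength of the patterns actually observed.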

Bibliography

A Mallik, S Chaudhury and H Ghosh. Nrityakosha: Preserving the Intangible Heritage of Indian Classical Dance. In ACM Journal of Computing and Cultural Heritage. 4(3), December 2011.
A Mallik, S Chaudhury and H Ghosh. Preservation of Intangible Heritage: A case-study of Indian Classical Dance. In eHeritage 2010: 2nd ACM Workshop on eHeritage and Digital Art Preservation [ACM Multimedia Conference], October 2010.
S Chaudhury and H Ghosh. Ontology based access to heritage artefacts on the web. In Multimedia Information Extraction and Digital Heritage Preservation. Ed. B.B. Chaudhuri and U. Munshi. World Scientific Pub Co. Inc., Mar. 2011
H. Ghosh, G. Harit and S. Chaudhury. Using ontology for building distributed digital libraries with multimedia contents. World Digital Library, 1(2), Dec 2008, pp. 83-100.
S. Wattamwar and H. Ghosh. Spatio-Temporal Query for Multimedia Database. Workshop on Multimedia Semantics. ACM Multimedia Conference 2008, Vancouver (Canada), October 2008
H. Ghosh, P. Poornachandra, A. Mallik and S. Chaudhury. Learning Ontology for Personalized Video Retrieval. International Workshop on Many Faces of Multimedia Semantics (WMS07), ACM Multimedia Conference, Augsburg (Germany), September 2007.
H. Ghosh, S. Chaudhury, K. Kashyap and B. Maiti. Ontology Specification and Integration for Multimedia Applications. In Ontologies in the Context of Information Systems, Ed. R. Sharman, R. Kishore and R. Ramesh. Springer, 2007, pp. 265-296.
H. Ghosh, G. Harit and S. Chaudhury. Ontology based interaction with multimedia collections. International Conference on Digital Libraries, New Delhi, 2006.
G. Harit, S. Chaudhury and H. Ghosh. Using Multimedia Ontology for generating conceptual annotations and hyperlinks in video collections. International conference on Web Intelligence, Hong Kong, 2006.
T. Karthik, S. Chaudhury and H. Ghosh. Specifying Spatio-Temporal Relations in Multimedia Ontologies. International Conference of Pattern Recognition and Machine Intelligence, Kolkata 2005.
H. Ghosh and S. Chaudhury. Distributed and Reactive Query Planning in R-MAGIC: An Agent based Multimedia Retrieval System. IEEE Trans KDE, 16(9), Sep 2004.
H. Ghosh, N. Rajarathnam and S. Chaudhury. Knowledge Representation for Web based Services in a Multi-cultural Environment. IEEE International Workshop on Website Evolution (WSE-2001), Florence, Nov 2001.