Automatic image annotation

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors and the training annotation words are used by machine learning techniques to attempt to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Later techniques used machine translation to try to translate between the textual vocabulary and the 'visual vocabulary', that is, clustered image regions known as blobs. Work following these efforts has included classification approaches, relevance models, and other methods.
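The idea of learning from feature vectors paired with training annotation words can be illustrated with a minimal sketch. It assumes a simple nearest-neighbour label transfer, which is only one of many possible approaches and is not the method of any particular paper listed here. The three-dimensional feature vectors, the tiny training set, and the function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def annotate(query, training_set, k=2, n_words=2):
    """Transfer the most frequent keywords of the k nearest training images."""
    neighbours = sorted(training_set, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(word for _, words in neighbours for word in words)
    # Rank by descending vote count, breaking ties alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_words]]

# Hypothetical training data: (feature vector, annotation words).
TRAINING = [
    ((0.9, 0.1, 0.1), ["sunset", "sky"]),
    ((0.8, 0.2, 0.1), ["sunset", "sea"]),
    ((0.1, 0.8, 0.2), ["forest", "tree"]),
    ((0.2, 0.9, 0.1), ["forest", "grass"]),
    ((0.1, 0.2, 0.9), ["sea", "sky"]),
]

print(annotate((0.85, 0.15, 0.1), TRAINING))  # ['sunset', 'sea']
```

In practice the feature vectors would come from an image analysis step and the vocabulary would be far larger, which is what makes the problem harder than ordinary classification.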

The advantage of automatic image annotation over content-based image retrieval (CBIR) is that queries can be more naturally specified by the user.^[1] CBIR generally (at present) requires users to search by image concepts such as color and texture, or by supplying example images as queries. Certain image features in example images may override the concept that the user is really focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images; manual annotation is expensive and time-consuming, especially given the large and constantly growing image databases in existence.
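Once images carry keyword annotations, a text query can be answered with ordinary keyword lookup. The following is a minimal sketch assuming a simple inverted index with AND semantics; the image identifiers, annotations, and function names are hypothetical.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each keyword to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, words in annotations.items():
        for word in words:
            index[word.lower()].add(image_id)
    return index

def search(index, query):
    """Return ids of images annotated with every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

# Hypothetical annotations, as an annotation system might produce them.
ANNOTATIONS = {
    "img1.jpg": ["sunset", "sea"],
    "img2.jpg": ["forest", "tree"],
    "img3.jpg": ["sea", "sky"],
}

index = build_index(ANNOTATIONS)
print(search(index, "sea"))         # ['img1.jpg', 'img3.jpg']
print(search(index, "sunset sea"))  # ['img1.jpg']
```

The user states the concept directly in words, rather than supplying an example image whose incidental features might dominate the match.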

Automatic Image Annotation Software

SuperAnnotate

SuperAnnotate is an end-to-end platform for computer vision engineers and annotation teams to annotate, manage, train, and ultimately automate computer vision pipelines.

Automation: The platform allows three distinct types of automation at both the labeling and quality assurance levels: transfer learning, active learning^[2] and mislabel detection^[3]. Because data annotation projects are linked to the neural network environment, users can train custom models, perform manual corrections and iterate within the same platform, which is intended to increase the speed and accuracy of each new annotation task. The platform also allows users to select the most relevant frames from a large set of images, with the aim of reaching high recognition accuracy with a limited dataset. Apart from annotation automation itself, SuperAnnotate helps reduce data noise by automating the detection of mislabeled training samples. The platform is built to unify and automate the entire data annotation pipeline.
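Active learning and mislabel detection can be illustrated generically. The sketch below is not SuperAnnotate's implementation and does not use its API. It assumes entropy-based uncertainty sampling for choosing which images to label next, and a simple confidence threshold for flagging suspicious labels; all names, probabilities, and the threshold value are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Pick the `budget` unlabeled images the model is least certain about."""
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:budget]

def flag_possible_mislabels(labeled, threshold=0.1):
    """Flag images whose assigned label gets a very low model probability.

    `labeled` maps image id -> (assigned class index, predicted probabilities).
    """
    return sorted(i for i, (label, probs) in labeled.items() if probs[label] < threshold)

# Hypothetical model outputs for three unlabeled images.
predictions = {
    "a": [0.98, 0.01, 0.01],  # confident prediction
    "b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "c": [0.60, 0.30, 0.10],
}
print(select_for_labeling(predictions, budget=2))  # ['b', 'c']

# Hypothetical labeled samples: "y" is labeled class 1, but the model
# assigns that class only 0.05 probability.
labeled = {"x": (0, [0.90, 0.10]), "y": (1, [0.95, 0.05])}
print(flag_possible_mislabels(labeled))  # ['y']
```

Flagged samples are candidates for human review, not confirmed errors, since the model itself may be wrong.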

API Integrations: The platform comes with a built-in Python SDK that automates project setup and distribution, team management, and scaling for larger projects. The SDK includes a variety of data transfer functions, annotation converters, and functions for manipulating images and annotations.^[4] It also allows CV engineers to run training, compare multiple training results, and automatically find risky annotations.^[5]

References

^ [1]
^ SuperAnnotate (2020-09-30), AnnotationSoftware/active_learning, retrieved 2020-11-17
^ SuperAnnotate (2020-09-17), AnnotationSoftware/qa-automation, retrieved 2020-11-17
^ SuperAnnotate (2020-09-17), AnnotationSoftware/superannotate-python-sdk, retrieved 2020-11-17
^ "SuperAnnotate Desktop". opencv.org. Retrieved 2020-11-17.

Datta, Ritendra; Dhiraj Joshi; Jia Li; James Z. Wang (2008). "Image Retrieval: Ideas, Influences, and Trends of the New Age". ACM Computing Surveys. 40 (2): 1–60. doi:10.1145/1348246.1348248.
Nicolas Hervé; Nozha Boujemaa (2007). "Image annotation : which approach for realistic databases ?" (PDF). ACM International Conference on Image and Video Retrieval. Archived from the original (PDF) on 2011-05-20.
M Inoue (2004). "On the need for annotation-based image retrieval" (PDF). Workshop on Information Retrieval in Context. pp. 44–46. Archived from the original (PDF) on 2014-08-08.

Y Mori; H Takahashi & R Oka (1999). "Image-to-word transformation based on dividing and vector quantizing images with words". Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management. CiteSeerX 10.1.1.31.1704.

Annotation as machine translation

P Duygulu; K Barnard; N de Freitas & D Forsyth (2002). "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary". Proceedings of the European Conference on Computer Vision. pp. 97–112. Archived from the original on 2005-03-05.

Statistical models

J Li & J Z Wang (2006). "Real-time Computerized Annotation of Pictures". Proc. ACM Multimedia. pp. 911–920.

J Z Wang & J Li (2002). "Learning-Based Linguistic Indexing of Pictures with 2-D MHMMs". Proc. ACM Multimedia. pp. 436–445.

Automatic linguistic indexing of pictures

J Li & J Z Wang (2008). "Real-time Computerized Annotation of Pictures". IEEE Transactions on Pattern Analysis and Machine Intelligence.

J Li & J Z Wang (2003). "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach". IEEE Transactions on Pattern Analysis and Machine Intelligence. pp. 1075–1088.

Hierarchical Aspect Cluster Model

K Barnard; D A Forsyth (2001). "Learning the Semantics of Words and Pictures". Proceedings of International Conference on Computer Vision. pp. 408–415. Archived from the original on 2007-09-28.

Latent Dirichlet Allocation model

D Blei; A Ng & M Jordan (2003). "Latent Dirichlet allocation" (PDF). Journal of Machine Learning Research. pp. 3:993–1022. Archived from the original (PDF) on 2005-05-21.

Supervised multiclass labeling

G Carneiro; A B Chan; P Moreno & N Vasconcelos (2006). "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. pp. 394–410.

Texture similarity

R W Picard & T P Minka (1995). "Vision Texture for Annotation". Multimedia Systems.

Support Vector Machines

C Cusano; G Ciocca & R Schettini (2004). "Image Annotation Using SVM". Proceedings of Internet Imaging IV. Internet Imaging V. Vol. 5304. p. 330. Bibcode:2003SPIE.5304..330C. doi:10.1117/12.526746.

Ensemble of Decision Trees and Random Subwindows

R Maree; P Geurts; J Piater & L Wehenkel (2005). "Random Subwindows for Robust Image Classification". Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. pp. 1:34–40.

Maximum Entropy

J Jeon; R Manmatha (2004). "Using Maximum Entropy for Automatic Image Annotation" (PDF). Int'l Conf on Image and Video Retrieval (CIVR 2004). pp. 24–32.

Relevance models

J Jeon; V Lavrenko & R Manmatha (2003). "Automatic image annotation and retrieval using cross-media relevance models" (PDF). Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 119–126.

Relevance models using continuous probability density functions

V Lavrenko; R Manmatha & J Jeon (2003). "A model for learning the semantics of pictures" (PDF). Proceedings of the 16th Conference on Advances in Neural Information Processing Systems NIPS.

Coherent Language Model

R Jin; J Y Chai; L Si (2004). "Effective Automatic Image Annotation via A Coherent Language Model and Active Learning" (PDF). Proceedings of MM'04.

Inference networks

D Metzler & R Manmatha (2004). "An inference network approach to image retrieval" (PDF). Proceedings of the International Conference on Image and Video Retrieval. pp. 42–50.

Multiple Bernoulli distribution

S Feng; R Manmatha & V Lavrenko (2004). "Multiple Bernoulli relevance models for image and video annotation" (PDF). IEEE Conference on Computer Vision and Pattern Recognition. pp. 1002–1009.

Multiple design alternatives

J Y Pan; H-J Yang; P Duygulu; C Faloutsos (2004). "Automatic Image Captioning" (PDF). Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME'04). Archived from the original (PDF) on 2004-12-09.

Natural scene annotation

J Fan; Y Gao; H Luo; G Xu (2004). "Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation". Proceedings of the 27th annual international conference on Research and development in information retrieval. pp. 361–368.

Relevant low-level global filters

A Oliva & A Torralba (2001). "Modeling the shape of the scene: a holistic representation of the spatial envelope" (PDF). International Journal of Computer Vision. pp. 42:145–175.

Global image features and nonparametric density estimation

A Yavlinsky; E Schofield & S Rüger (2005). "Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation" (PDF). Int'l Conf on Image and Video Retrieval (CIVR, Singapore, Jul 2005). Archived from the original (PDF) on 2005-12-20.

Video semantics

N Vasconcelos & A Lippman (2001). "Statistical Models of Video Structure for Content Analysis and Characterization" (PDF). IEEE Transactions on Image Processing. pp. 1–17.

Ilaria Bartolini; Marco Patella & Corrado Romani (2010). "Shiatsu: Semantic-based Hierarchical Automatic Tagging of Videos by Segmentation Using Cuts". 3rd ACM International Multimedia Workshop on Automated Information Extraction in Media Production (AIEMPro10).

Image Annotation Refinement

Yohan Jin; Latifur Khan; Lei Wang & Mamoun Awad (2005). "Image annotations by combining multiple evidence & wordNet". 13th Annual ACM International Conference on Multimedia (MM 05). pp. 706–715.

Changhu Wang; Feng Jing; Lei Zhang & Hong-Jiang Zhang (2006). "Image annotation refinement using random walk with restarts". 14th Annual ACM International Conference on Multimedia (MM 06).

Changhu Wang; Feng Jing; Lei Zhang & Hong-Jiang Zhang (2007). "Content-based image annotation refinement". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07). doi:10.1109/CVPR.2007.383221.

Ilaria Bartolini & Paolo Ciaccia (2007). "Imagination: Exploiting Link Analysis for Accurate Image Annotation". Springer Adaptive Multimedia Retrieval. doi:10.1007/978-3-540-79860-6_3.

Ilaria Bartolini & Paolo Ciaccia (2010). "Multi-dimensional Keyword-based Image Annotation and Search". 2nd ACM International Workshop on Keyword Search on Structured Data (KEYS 2010).

Automatic Image Annotation by Ensemble of Visual Descriptors

Emre Akbas & Fatos Y. Vural (2007). "Automatic Image Annotation by Ensemble of Visual Descriptors". IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2007, Workshop on Semantic Learning Applications in Multimedia. doi:10.1109/CVPR.2007.383484.

A New Baseline for Image Annotation

Ameesh Makadia; Vladimir Pavlovic & Sanjiv Kumar (2008). "A New Baseline for Image Annotation" (PDF). European Conference on Computer Vision (ECCV).

Simultaneous Image Classification and Annotation

Chong Wang; David Blei & Li Fei-Fei (2009). "Simultaneous Image Classification and Annotation" (PDF). Conf. on Computer Vision and Pattern Recognition (CVPR).

TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation

Matthieu Guillaumin; Thomas Mensink; Jakob Verbeek & Cordelia Schmid (2009). "TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation" (PDF). Intl. Conf. on Computer Vision (ICCV).

Image Annotation Using Metric Learning in Semantic Neighbourhoods

Yashaswi Verma & C. V. Jawahar (2012). "Image Annotation Using Metric Learning in Semantic Neighbourhoods" (PDF). European Conference on Computer Vision (ECCV). Archived from the original (PDF) on 2013-05-14. Retrieved 2014-02-26.

Automatic Image Annotation Using Deep Learning Representations

Venkatesh N. Murthy; Subhransu Maji & R. Manmatha (2015). "Automatic Image Annotation Using Deep Learning Representations" (PDF). International Conference on Multimedia Retrieval (ICMR).

Medical Image Annotation using Bayesian networks and active learning

N. B. Marvasti; E. Yörük & B. Acar (2018). "Computer-Aided Medical Image Annotation: Preliminary Results With Liver Lesions in CT". IEEE Journal of Biomedical and Health Informatics.
