General Architecture for Text Engineering

From Wikipedia, the free encyclopedia

This is an old version of this page, edited at 12:27 on 4 Feb 2011 by Rodamaker. The URL is a permanent link to this version, which may differ from the current version.

GATE

Main window of GATE Developer v5

General information
Type: Text mining, information extraction
Developer: GATE research team, Dept. of Computer Science, University of Sheffield
Initial release: 1995
License: LGPL
Languages: English

Technical information
Written in: Java
Supported platforms: Java virtual machine

Versions
Latest stable release: 5.2.1 (6 May 2010)
Latest preview release: 6.0.0 (4 February 2011)

General Architecture for Text Engineering, or GATE, is a suite of Java tools developed at the University of Sheffield. Begun in 1995, it is now used by a broad community of scientists, companies, teachers and students for natural language processing (NLP) tasks of all kinds, including information extraction, in several languages.

GATE includes[1]:

  • an IDE, GATE Developer: an integrated development environment for natural language processing, bundled with a widely used information extraction system and a comprehensive set of other plugins
  • a web app, GATE Teamware: a collaborative annotation environment for factory-style semantic annotation projects built around a workflow engine and a heavily-optimised backend service infrastructure
  • a framework, GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more
  • an architecture: a high-level organisational picture of language processing software composition
  • a process for the creation of robust and maintainable services.

Under development:

  • a wiki/CMS[2]
  • a cloud computing solution for hosted large-scale text processing, GATE Cloud

GATE aims to remove the necessity for solving common engineering problems before doing useful research, or re-engineering before deploying research results into applications. Core functions of GATE take care of the lion’s share of the engineering:

  • modelling and persistence of specialised data structures
  • measurement, evaluation, benchmarking
  • visualisation and editing of annotations, ontologies, parse trees, etc.
  • a finite state transduction language for rapid prototyping and efficient implementation of shallow analysis methods (JAPE, see below)
  • extraction of training instances for machine learning
  • pluggable machine learning implementations (Weka, SVM Light, an in-house uneven margins SVM implementation,[3] and more)
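Two of the core functions listed above, modelling annotation data and measuring quality, can be illustrated with a small sketch. It does not use the GATE API: the class `AnnotationEvalSketch`, the `Span` type and the `score` method are hypothetical names introduced only for this example. It assumes annotations are stored stand-off (a type plus character offsets into the text) and that a predicted annotation counts as correct only when its type and offsets exactly match a gold annotation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch for illustration; not part of the GATE API. */
final class AnnotationEvalSketch {

    /** A stand-off annotation: a typed span of character offsets into a text. */
    static final class Span {
        final String type;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        /** Identity used for strict matching: same type and same offsets. */
        String key() {
            return type + ":" + start + "-" + end;
        }
    }

    /** Returns {precision, recall, F1} under strict (exact-span) matching. */
    static double[] score(List<Span> gold, List<Span> predicted) {
        Set<String> goldKeys = new HashSet<>();
        for (Span s : gold) {
            goldKeys.add(s.key());
        }
        Set<String> predictedKeys = new HashSet<>();
        for (Span s : predicted) {
            predictedKeys.add(s.key());
        }

        int correct = 0;
        for (String k : predictedKeys) {
            if (goldKeys.contains(k)) {
                correct++;
            }
        }

        double precision = predictedKeys.isEmpty() ? 0.0 : (double) correct / predictedKeys.size();
        double recall = goldKeys.isEmpty() ? 0.0 : (double) correct / goldKeys.size();
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Offsets refer to: "Hamish works at the University of Sheffield."
        List<Span> gold = List.of(
                new Span("Person", 0, 6),
                new Span("Organization", 20, 43));
        // The second prediction wrongly includes the leading "the ".
        List<Span> predicted = List.of(
                new Span("Person", 0, 6),
                new Span("Organization", 16, 43));

        double[] prf = score(gold, predicted);
        System.out.println("precision=" + prf[0] + " recall=" + prf[1] + " f1=" + prf[2]);
    }
}
```

Strict matching is the simplest choice for a sketch. A more lenient scheme that gives partial credit for overlapping spans would change only the body of `score`.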

On top of the core functions, GATE includes components for diverse natural language processing tasks, e.g. parsers, morphology, tagging, information retrieval tools, information extraction components for various languages, and many others. It has been widely applied in fields such as bioinformatics.[4] GATE Developer and GATE Embedded are supplied with an information extraction system (ANNIE), which has been adapted and evaluated very widely (numerous industrial systems, and research systems evaluated in MUC, TREC, ACE, DUC, Pascal, NTCIR, etc.). ANNIE is often used to create RDF or OWL metadata for unstructured content (semantic annotation). GATE has been compared to NLTK, R and RapidMiner.[5] As well as being widely used in its own right, it forms the basis of the KIM semantic platform.[6]
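To give a feel for what an information extraction component produces, the following toy sketch marks up entity mentions by looking phrases up in a fixed word list (a gazetteer). It is not ANNIE and does not use the GATE API: the class `GazetteerSketch`, the method `annotate`, the output format and the example entries are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch for illustration; not ANNIE and not part of the GATE API. */
final class GazetteerSketch {

    /**
     * Finds every occurrence of each gazetteer phrase in the text and reports it
     * as "Type[start,end] phrase", where the offsets are character positions
     * (start inclusive, end exclusive).
     */
    static List<String> annotate(String text, Map<String, String> gazetteer) {
        List<String> annotations = new ArrayList<>();
        for (Map.Entry<String, String> entry : gazetteer.entrySet()) {
            String phrase = entry.getKey();
            if (phrase.isEmpty()) {
                continue; // an empty phrase would match at every position
            }
            int from = 0;
            int at;
            while ((at = text.indexOf(phrase, from)) >= 0) {
                int end = at + phrase.length();
                annotations.add(entry.getValue() + "[" + at + "," + end + "] " + phrase);
                from = end; // continue searching after this match
            }
        }
        return annotations;
    }

    public static void main(String[] args) {
        // Example entries are illustrative only.
        Map<String, String> gazetteer = new LinkedHashMap<>();
        gazetteer.put("GATE", "Software");
        gazetteer.put("University of Sheffield", "Organization");

        String text = "GATE was developed at the University of Sheffield.";
        for (String annotation : annotate(text, gazetteer)) {
            System.out.println(annotation);
        }
    }
}
```

Each reported span could then be serialised as metadata about the text, which is the general idea behind the semantic annotation use case mentioned above. Exact string lookup is case-sensitive and ignores word boundaries, so a realistic system would need tokenisation and further rules on top of it.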

The GATE community and its research have been involved in several European research projects, including TAO, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb, as well as many other projects.

As of December 4, 2009, 691 people were subscribed to the gate-users mailing list at SourceForge.net, and 98,858 downloads had been recorded since the project moved to SourceForge in 2005.[7] The paper "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications"[8] has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide,[9] include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady,[10] and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.[11]


References

  1. GATE Family page on the GATE website
  2. GATE Wiki
  3. "Adapting SVM for Data Sparseness and Imbalance: A Case Study on Information Extraction", by Y. Li, K. Bontcheva and H. Cunningham (Journal of Natural Language Engineering, 2009)
  4. "Combining Biological Databases and Text Mining to Support New Bioinformatics Applications", by René Witte and Christopher J.O. Baker (in Lecture Notes in Computer Science, Springer Berlin, Volume 3513, 2005)
  5. "Open Source Text Analytics" web article by Seth Grimes
  6. "KIM – a semantic platform for information extraction and retrieval", by Popov et al (Natural Language Engineering (2004), 10:375-392)
  7. GATE project page on SourceForge
  8. "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications", by H. Cunningham, D. Maynard, K. Bontcheva and V. Tablan (in Proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002)
  9. GATE User Guide
  10. "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady
  11. "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock
