Biomedical text mining

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Xtisths (talk | contribs) at 12:46, 1 August 2008 (Examples). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Biomedical text mining (also known as BioNLP) is text mining applied to the texts and literature of the biomedical and molecular biology domains. It is a relatively recent research field at the intersection of natural language processing, bioinformatics, medical informatics and computational linguistics.

Interest in text mining and information extraction strategies for the biomedical and molecular biology literature is growing, driven by the rising number of electronically available publications stored in databases such as PubMed.


Main applications

The main developments in this area have been related to the identification of biological entities (named entity recognition), such as protein and gene names, in free text; the association of gene clusters obtained from microarray experiments with the biological context provided by the corresponding literature; the automatic extraction of protein interactions; and the association of proteins with functional concepts (e.g. Gene Ontology terms). Information extraction and text mining systems have also addressed the extraction of kinetic parameters and of the subcellular location of proteins from text.
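As a rough illustration of the named entity recognition task described above, the following minimal sketch tags gene mentions by looking them up in a small dictionary. The lexicon, the function name `tag_genes` and the example sentence are assumptions made for illustration only. Real systems rely on much larger lexicons and usually on statistical or rule-based disambiguation as well.

```python
import re

# Hypothetical toy lexicon: lower-cased surface form -> normalized gene symbol.
GENE_LEXICON = {
    "p53": "TP53",
    "tp53": "TP53",
    "brca1": "BRCA1",
    "mdm2": "MDM2",
}

# One alternation pattern over all names; longer names are tried first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(GENE_LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_genes(text):
    """Return (start, end, matched text, normalized symbol) for each lexicon hit."""
    return [
        (m.start(), m.end(), m.group(0), GENE_LEXICON[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


sentence = "MDM2 binds p53 and inhibits its activity."
mentions = tag_genes(sentence)
```

Mapping each surface form to a normalized symbol lets synonyms such as "p53" and "TP53" be treated as the same entity in later steps, for example when extracting interactions.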

Examples

  • Chilibot: A tool for finding relationships between genes or gene products.
  • Information Hyperlinked Over Proteins (iHOP) (ref.: Bioinformatics, 2005 Sep 1;21 Suppl 2:ii252-ii258.): "A network of concurring genes and proteins extends through the scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural way of accessing millions of PubMed abstracts. By using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource, bringing all advantages of the internet to scientific literature research."
  • FABLE: A gene-centric text-mining search engine for MEDLINE.
  • GoPubMed: Retrieves PubMed abstracts for a search query, then detects ontology terms from the Gene Ontology and Medical Subject Headings in the abstracts. The user can browse the search results by exploring the ontologies and displaying only papers that mention selected terms, their synonyms or descendants.
  • LitInspector: Gene and signal transduction pathway data mining in PubMed abstracts.
  • PubGene: Displays co-occurrence networks of gene and protein symbols as well as MeSH, GO, PubChem and interaction terms (such as "binds" or "induces") as these appear in MEDLINE records (that is, PubMed titles and abstracts).
  • Biolab Experiment Assistant (BEA): Knowledge management and hypothesis generation software suite for the life sciences. BEA creates networks of interconnected biomedical concepts (such as genes, diseases, pathways and drugs) extracted from the scientific literature.
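Several of the tools listed above build networks from terms that appear together in the literature. The sketch below shows one simple way such co-occurrence counts could be computed. It is not the method of any listed tool. The symbol set, the toy texts and the function names are all hypothetical, and the texts are not real MEDLINE records.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical symbol list and toy texts, for illustration only.
SYMBOLS = {"TP53", "MDM2", "BRCA1"}

ABSTRACTS = [
    "MDM2 binds TP53 and promotes its degradation.",
    "BRCA1 and TP53 are both involved in the DNA damage response.",
    "Loss of TP53 alters MDM2 expression.",
]


def symbols_in(text):
    """Return the set of known symbols that occur as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & SYMBOLS


def cooccurrence_counts(docs):
    """Count, for each unordered symbol pair, the documents mentioning both."""
    counts = Counter()
    for doc in docs:
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(symbols_in(doc)), 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_counts(ABSTRACTS)
```

Each key in `edges` can be read as an edge in a network whose weight is the number of texts mentioning both symbols. Co-occurrence alone does not establish a biological relationship, which is why some tools also look for interaction terms such as "binds" or "induces".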

Conferences at which BioNLP research is presented

BioNLP research is presented at a variety of meetings.