
Document-term matrix

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 62.103.252.79 (talk) at 16:13, 2 December 2004. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

When building a database of the terms that appear in a set of documents, the "Document-Term Matrix" has one row per document and one column per term, so the entry in row i and column j records how many times term j occurs in document i. For instance, if one has the following two short documents:

D1 = "I like databases"
D2 = "I hate databases"

then the Document-Term Matrix would be:

          I   like   hate   databases
        -------------------------------
  D1 |    1    1      0        1
  D2 |    1    0      1        1
        -------------------------------

The matrix thus records which documents contain which terms, and how many times each term occurs in each document.
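A minimal Python sketch of how such a matrix can be built, assuming that terms are obtained by splitting each document on whitespace with case preserved (no further normalization is attempted). The function name `document_term_matrix` and its `vocabulary` parameter are hypothetical, chosen only for this illustration; counting uses the standard library's `collections.Counter`.

```python
from collections import Counter


def document_term_matrix(documents, vocabulary=None):
    """Hypothetical helper: return (terms, rows), where rows[i][j] is the
    number of times terms[j] occurs in documents[i].

    Assumption for this sketch: terms are whitespace-separated tokens,
    compared case-sensitively.
    """
    tokenized = [doc.split() for doc in documents]
    if vocabulary is None:
        # Collect terms in order of first appearance across all documents.
        vocabulary = []
        seen = set()
        for tokens in tokenized:
            for token in tokens:
                if token not in seen:
                    seen.add(token)
                    vocabulary.append(token)
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for terms absent from this document.
        rows.append([counts[term] for term in vocabulary])
    return list(vocabulary), rows


docs = ["I like databases", "I hate databases"]
terms, matrix = document_term_matrix(docs, vocabulary=["I", "like", "hate", "databases"])
print(terms)
for name, row in zip(["D1", "D2"], matrix):
    print(name, row)
```

Passing `vocabulary` fixes the column order so the output matches the table for D1 and D2; without it, columns follow the order in which terms first appear (`I`, `like`, `databases`, `hate`). A term that occurs twice in one document produces a 2 in the corresponding cell.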