Corpus Resources

Corpora are electronic bodies of linguistic data (texts). Linguists extract passages from them (isolating the passages from their larger texts) and concordance them (aligning occurrences by keyword) to generate natural-language samples for modeling terms, phrases or syntax.
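As a rough illustration of concordancing, the sketch below builds a keyword-in-context (KWIC) display in Python. The function name `kwic`, the `width` parameter and the sample sentences are all invented for this example; real concordancers add tokenization, sorting and metadata that are omitted here.

```python
import re

def kwic(text, keyword, width=30):
    """Return one aligned line per occurrence of `keyword` in `text`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Invented sample text, standing in for a real corpus.
sample = ("She made a strong case for reform. The tea was strong and bitter. "
          "A strong wind rose at dusk.")

for line in kwic(sample, "strong", width=20):
    print(line)
```

Because the keyword always falls in the same column, the words that habitually precede and follow it can be scanned at a glance.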

Corpora can help translators verify empirically their intuitions about sense, connotation and near-synonymy. They show patterns of actual frequency and of potential language use, reveal the lexical density of a text (particularly in translation research), identify semantic prosodies (connotations) and semantic preferences (the “clustering” of words around certain poles of meaning), and help overcome the imperfect overlap in collocational ranges across languages. Hatim and Munday (2005) map the use of corpora in translation as an interface with the discipline of language engineering. Customized corpora can be built with licensed software, while “found” corpora, some running to many millions of words, are available on web-based concordancing sites.
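Two of the measures mentioned above, word frequency and lexical density, can be approximated in a few lines of Python. This is only a sketch under stated assumptions: lexical density is taken here as the share of content words among all running words, and content words are identified with a tiny, hypothetical function-word list (`FUNCTION_WORDS`) rather than with the part-of-speech tagging a real study would use.

```python
import re
from collections import Counter

# Hypothetical, deliberately small list of English function words.
# A real study would use a fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "she", "he", "they", "we", "i", "you", "not", "as",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_density(tokens):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

def top_frequencies(tokens, n=5):
    """The n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)

# Invented sample sentence, standing in for a real corpus.
sample = "The tea was strong and the wind was strong."
tokens = tokenize(sample)

print(top_frequencies(tokens, 3))
print(round(lexical_density(tokens), 2))
```

Run over a translated text and a comparable non-translated one, figures of this kind give a first, crude basis for comparing the two.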