Published 2021 | Version v1
Dataset Open

The Corpus of Contemporary Written Kurdish

Description

The Corpus of Contemporary Written Kurdish (CCWK) comprises a selection of contemporary written texts, primarily literary, in Northern Kurdish (Kurmanjî). The corpus was compiled by Abdullah Incekan as part of his PhD project (Incekan 2018) under the supervision of Geoffrey Haig.

Please note that, due to copyright constraints, the corpus data are available only on request. To request access, please contact Geoffrey Haig.

The corpus consists of more than 900 000 words of Kurmanjî Kurdish text, predominantly fiction (~77%) with some non-fiction (~23%). The texts stem from a variety of contemporary sources, dating from the early 1990s to the present. They are intended to be approximately representative of contemporary Kurdish prose written in the largely standardized Roman-based Kurmanjî alphabet. The corpus is not tagged or translated.

 

Citation

  • Incekan, Abdullah & Haig, Geoffrey. 2021. The Corpus of Contemporary Written Kurdish (CCWK). Bamberg: University of Bamberg. (DOI: 10.48564/unibafd-hp82b-k0k26) (date accessed)

 

Files

ccwk_corpus_description.pdf (101.1 kB)
md5:d7964d93fe86f35aa1f643f5cdcc206f
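
The MD5 checksum listed for ccwk_corpus_description.pdf can be used to confirm that a downloaded copy is intact. The following is a minimal sketch, not part of the dataset itself: the function names and the local file path are assumptions chosen for illustration, and only the checksum value is taken from this record.

```python
import hashlib
from pathlib import Path

# Checksum as published in this record for ccwk_corpus_description.pdf.
EXPECTED_MD5 = "d7964d93fe86f35aa1f643f5cdcc206f"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Compare a local file's MD5 digest with the published value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the downloaded file; adjust as needed.
    local_copy = Path("ccwk_corpus_description.pdf")
    if not local_copy.exists():
        print(f"{local_copy} not found; download it first")
    elif matches_published_checksum(local_copy):
        print("checksum OK")
    else:
        print("checksum mismatch")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.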