Published 2024 | Version 1.0
Collection Open

WOWA — The Word Order in Western Asia Corpus

Description

WOWA (Word Order in Western Asia) is an open-access collection of transcribed and annotated spoken texts from 41 languages spoken across a region loosely referred to as Western Asia. Most texts are spontaneous (i.e. unscripted) narrative monologues such as oral history and traditional tales. The languages selected are generally under-researched, non-standardized minority languages, which reflect the long-term linguistic diversity of the region more faithfully than the currently dominant written official languages (Turkish, Arabic, and Persian).

The collection includes original text sources for all languages, and sound files for a large subset. WOWA was designed to investigate areal effects in word order, in particular in the post-predicate domain. The main results of the project have been published in Haig et al. (2024). For further details and references, please consult the README file included in the archive.

WOWA was funded by the Alexander von Humboldt-Stiftung (grant number 1135327-IRN-IP), awarded to Geoffrey Haig (Bamberg) and Mohammad Rasekh-Mahand (Hamedan), 2019–2023. The archive was designed and implemented by N. Schiborr.

 

Archive structure

Each of the 41 data sets in WOWA is an individually citable resource that includes, at a minimum, source texts (PDF, optionally also WAV audio recordings), original annotations (XLS spreadsheets and TSV files), a metadata sheet (PDF), and a citation file (TXT). The contents of the archive are split into three parts, packaged as separate ZIP files:

  • 1 × wowa__annotations.zip:
    • annotation and documentation files for all 41 data sets
  • 1 × wowa__sources-pdf.zip:
    • source texts for all 41 data sets
  • 23 × wowa_[...]__sources-wav.zip:
    • source recordings for the 23 data sets that have them
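
Because the collection is spread over 25 separate ZIP files, users who want all material in one place have to unpack each part. A minimal Python sketch of that step, assuming all parts were downloaded into a single directory and that every part name matches the pattern wowa_*.zip (which holds for the three name patterns listed above). The function name extract_wowa_parts and the directory names are hypothetical, and the sketch makes no assumption about how files are organized inside each archive.

```python
import zipfile
from pathlib import Path


def extract_wowa_parts(download_dir: str, target_dir: str) -> list[str]:
    """Extract every WOWA ZIP part found in download_dir into target_dir.

    Assumption: all parts (annotations, PDF sources, WAV sources) were saved
    side by side, and their file names start with "wowa_" and end in ".zip".
    Returns the names of the parts that were extracted.
    """
    src = Path(download_dir)
    dst = Path(target_dir)
    dst.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Sorted for a deterministic order. If two parts contained the same
    # internal path, the part extracted later would overwrite the earlier one.
    for part in sorted(src.glob("wowa_*.zip")):
        with zipfile.ZipFile(part) as zf:
            zf.extractall(dst)
        extracted.append(part.name)
    return extracted
```

For example, extract_wowa_parts("downloads", "wowa") would unpack every matching part from a hypothetical downloads folder into a folder named wowa. Files that do not match the pattern are ignored.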

 

Citation for the entire WOWA collection

  • Haig, Geoffrey & Stilo, Donald & Dogan, Mahîr C. & Schiborr, N. (eds.). 2024. WOWA — Word Order in Western Asia: A spoken-language-based corpus for investigating areal effects in word order variation. Bamberg: University of Bamberg. (DOI: 10.48564/unibafd-gyws0-g4218) (date accessed)

Additionally, each data set in the collection is an individually citable resource with the contributors as authors. Please refer to the citation guides included in the archive with each data set for more information.

 

List of data sets ([*] = with audio)

  • Armenian
    • Armenian (Eastern, Agulis) — Katherine Hodgson [*]
  • Hellenic
    • Pontic Greek (Madan) — Katherine Hodgson [*]
    • Pontic Greek (Romeyka) — Laurentia Schreiber
  • Indo-Aryan
    • Kholosi (Kholos) — Maryam Nourzaei [*]
  • Iranian
    • Balochi (Coastal) — Maryam Nourzaei [*]
    • Balochi (Koroshi) — Maryam Nourzaei [*]
    • Balochi (Turkmen) — Geoffrey Haig [*]
    • Bashkardi (Northern) — Agnes Korn, Ilya Gershevitch [*]
    • Bashkardi (Southern) — Agnes Korn, Ilya Gershevitch [*]
    • Gorani (Gawraju) — Masoud Mohammadirad [*]
    • Kumzari (Musandam) — Geoffrey Haig
    • Kurdish (Central, Sanandaj) — Masoud Mohammadirad [*]
    • Kurdish (Northern, Ankara) — Kateryna Iefremenko [*]
    • Kurdish (Northern, Lachin) — Donald Stilo
    • Kurdish (Northern, Mus) — Geoffrey Haig [*]
    • Kurdish (Southern, Bijar) — Masoud Mohammadirad [*]
    • Mazandarani (Kordxeyl) — Donald Stilo, Geoffrey Haig
    • Persian (New) — Elham Izadi [*]
    • Persian (New, Early Classical) — Mehdi Parizadeh
    • Talyshi (Lerik) — Donald Stilo
    • Tati (Hazarrudi) — Raheleh Izadifar [*]
    • Vafsi (Gurchani) — Mahîr Can Dogan [*]
    • Zazakî (Çewlîg) — Netîce Demir, Mahîr Dogan [*]
    • Zazakî (Siwêreg) — Netîce Demir, Mahîr Dogan [*]
  • Kartvelian
    • Laz (Arhavi) — Donald Stilo, René Lacroix
  • Semitic
    • Arabic (Jewish, Baghdad) — Assaf Bar-Moshe, Alexandru Craevschi [*]
    • Arabic (Christian, Ka'biye) — Paul Noorlander
    • Arabic (Khuzestan) — Bettina Leitner [*]
    • Central Neo-Aramaic (Mlahso) — Paul Noorlander
    • Central Neo-Aramaic (Turoyo, Midyat) — Paul Noorlander
    • NE Neo-Aramaic (Christian, Barwar) — Donald Stilo
    • NE Neo-Aramaic (Christian, Shaqlawa) — Paul Noorlander
    • NE Neo-Aramaic (Christian, Urmi) — Paul Noorlander
    • NE Neo-Aramaic (Jewish, Dohok) — Dorota Molin [*]
    • NE Neo-Aramaic (Jewish, Sanandaj) — Paul Noorlander
    • NE Neo-Aramaic (Jewish, Urmi) — Paul Noorlander
  • Turkic
    • Oghuz (Ankara) — Kateryna Iefremenko [*]
    • Oghuz (Erzurum) — Mahîr Dogan
    • Oghuz (Gagauz) — Mahîr Dogan
    • Oghuz (Qashqai) — Sohrab Dolatkhah, Laurentia Schreiber [*]
    • Oghuz (Tabriz) — Donald Stilo

 

Files (7.1 GB, including wowa__readme.pdf)

Size and MD5 checksum of each file:

  • 13.7 MB (md5:89cee5962504ed8340ebd4c09966fa4f)
  • 110.5 kB (md5:de67a30a16c1753257ddf89532cf8d13)
  • 155.6 MB (md5:249d97fdb0423cab8862ef174aa9f2af)
  • 633.0 MB (md5:9a2d144f8397fcab8da96d766edd437c)
  • 330.7 MB (md5:de0a004860ca01acefe67f42376459d0)
  • 239.3 MB (md5:89abeef934baf68310f7ca4d96ccefaa)
  • 253.4 MB (md5:cb6044392a88ca984cb1e9ee0c9ed820)
  • 322.7 MB (md5:aa67e0f46a35b5fc8905c061c277256e)
  • 179.0 MB (md5:a60034170d738220151b0490edbd1748)
  • 188.5 MB (md5:7c3686b1556d7ce201a65bdb9b4e53b8)
  • 69.9 MB (md5:4d745d72a5c394b5ccff190a4d877427)
  • 363.0 MB (md5:5f3cdec9ce2e02e4aaa276e01fd646d9)
  • 121.2 MB (md5:0a8a5208b4c230b619ba3ff67c7d49af)
  • 300.9 MB (md5:39ea22f3c5dfb54940546c0742c71789)
  • 149.1 MB (md5:6365cac2fcd9059aca0725397e91f8bf)
  • 567.4 MB (md5:631c5ac6a487ce1629fe5bb53c3a48c1)
  • 603.7 MB (md5:a7e834f23317b8d4c24a16f8d28b45b6)
  • 396.9 MB (md5:fbbe869244af3345221237d6bd30d7f7)
  • 336.6 MB (md5:f49fb99bcd26c92d20267c0a3f8c6fe7)
  • 198.9 MB (md5:75212e40a6ab9cf6548bda21796900a5)
  • 131.7 MB (md5:0c4572b8f1efc4ee113376e9e38a16d3)
  • 313.6 MB (md5:82a3ea866f227b5db34ab8d94aaf096b)
  • 670.4 MB (md5:e582008f74038452e8f332e26bff9649)
  • 212.0 MB (md5:077368be2f129817c01e38c31a9e669b)
  • 281.9 MB (md5:ca1fed100b9e388dfe18cb09e0fd5b8e)
  • 113.9 MB (md5:82cb626e6174bc6e9886493cc8271e7e)
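
The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. A minimal sketch using only Python's standard hashlib module: it computes a file's MD5 digest in chunks (so multi-hundred-megabyte parts need not fit in memory) and compares it with a listed value. The function names md5_of and verify are illustrative only and are not part of the archive.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a checksum copied from the file listing.

    Accepts the value with or without the "md5:" prefix, in any letter case.
    """
    expected = expected.strip().lower().removeprefix("md5:")
    return md5_of(path) == expected
```

For example, verify("path/to/part.zip", "md5:<checksum>") returns True only if the downloaded file matches the checksum shown for it on the record page; a False result suggests the file should be downloaded again.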