Published October 2025 | Version 3.0
Dataset Open

HamBam — The Hamedan-Bamberg Corpus of Contemporary Spoken Persian

Description

HamBam, the Hamedan-Bamberg Corpus of Contemporary Spoken Persian (Haig & Rasekh-Mahand 2022), is a freely accessible online corpus of contemporary spoken Persian. The corpus design follows the architecture and rationale of Multi-CAST (Haig & Schnell 2015), with certain modifications. As in Multi-CAST, the texts are annotated in ELAN, a free annotation tool that links sound files to annotation files. The annotated data are available in several formats: sound files, ELAN annotation files, tab-separated value (TSV) files, and XML. This archive contains version 3.0 of the corpus (published in October 2025), which has been edited and expanded with six additional recordings. It fully supersedes all earlier versions.
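
Since ELAN annotation files (.eaf) are XML documents, they can be inspected with a standard XML parser. The following is a minimal sketch, not part of the corpus tooling, that collects the annotation values on each tier of an .eaf file. The element and attribute names (TIER, TIER_ID, ANNOTATION_VALUE) come from the general ELAN file format; the function name and the tier names that HamBam actually uses are assumptions here, and the corpus description should be consulted for the real tier layout.

```python
import xml.etree.ElementTree as ET


def tier_annotations(eaf_source):
    """Map each tier ID in an ELAN .eaf document to its annotation values.

    `eaf_source` may be a file path or an open binary file object.
    This helper is illustrative and is not shipped with the corpus.
    """
    root = ET.parse(eaf_source).getroot()
    tiers = {}
    for tier in root.iter("TIER"):
        # Each annotation on a tier stores its text in ANNOTATION_VALUE;
        # empty annotations have no text, so fall back to "".
        values = [value.text or "" for value in tier.iter("ANNOTATION_VALUE")]
        tiers[tier.get("TIER_ID")] = values
    return tiers
```

Calling `tier_annotations("some_recording.eaf")` (a hypothetical file name) returns a dictionary keyed by tier ID, which gives a quick overview of a file before opening it in ELAN.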

 

HamBam at a glance

  • number of individual recordings: 44
  • total runtime: 166 minutes
  • total grammatical words: 20,000

 

The HamBam team

  • Geoffrey Haig
  • Mohammad Rasekh-Mahand
  • Elham Izadi
  • Fariba Sabouri
  • Maryam Pouyankhah
  • Iran Abdi
  • Mehdi Parizadeh
  • Mehrdad Meshkinfam
  • Laurentia Schreiber
  • N. Schiborr

 

Citation

  • Haig, Geoffrey & Rasekh-Mahand, Mohammad. 2022. HamBam: Hamedan-Bamberg Corpus of Contemporary Spoken Persian. Version 3.0. (DOI: 10.48564/unibafd-v80bg-h0243)

 

Files

hambam_corpus-description.pdf

Files (1.1 GB)

  • md5:99609df7e0fdda95ce42be4909b6b39c, 2.6 MB
  • md5:f3cfae6ed48d1f5f76b306cb59f60ae2, 197.8 kB
  • md5:c6f554ee725ec010db24e4de1152483d, 34.0 kB
  • md5:9d2b1597742fb822dba53d3fb007955c, 4.6 kB
  • md5:8db25485206e4402ee56519ecc90f3bd, 153.9 MB
  • md5:e27dc68e868d66decae8b8d7385f39ae, 974.0 MB
  • md5:09a3f24abbe1894f6131917b8bf53b49, 321 Bytes
  • md5:ec4f01f6059cb5b84f359732b579a489, 1.2 kB
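
The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted, which is worth doing for the larger archives. The sketch below shows one way to do this with Python's standard library; the function names are illustrative, and the local file path passed in is whatever name the downloaded file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not have to
    fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'.

    A bare hex digest without the 'md5:' prefix is accepted as well.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:e27dc68e868d66decae8b8d7385f39ae")` returns `True` only if the file saved at `local_path` is byte-for-byte identical to the 974.0 MB file in this record.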