Volume 20, Issue 4 (December (Special Issue on ADLEEE) 2024) | IJEEE 2024, 20(4): 3299-3299


Yamini P, Daneshfar F, Ghorbani A. KurdSM: Transformer-Based Model for Kurdish Abstractive Text Summarization with an Annotated Corpus. IJEEE 2024; 20(4): 3299-3299
URL: http://ijeee.iust.ac.ir/article-1-3299-en.html
Abstract:
With the exponential growth of unstructured data on the Web and social networks, extracting relevant information from multiple sources has become increasingly challenging, creating a need for automated summarization systems. However, developing machine learning-based summarization systems depends largely on datasets, which must be evaluated to determine their usefulness in retrieving data. In most cases, these datasets are summarized with human involvement. This approach is inadequate for some low-resource languages, which makes summarization a daunting task for them. To address this, the paper proposes a method for developing the first human-evaluated abstractive text summarization corpus and an automated summarization model for the Sorani Kurdish language. The researchers compiled documents from information available on the Web (Rudaw) and released the resulting corpus publicly. A customized and simplified version of the mT5-base transformer was then developed to evaluate the corpus. The model's performance was assessed using ROUGE-1, ROUGE-2, ROUGE-L, n-gram novelty, and manual evaluation, and its results are close to the reference summaries on all of these criteria. This Sorani Kurdish corpus and automated summarization model have the potential to pave the way for future studies and to facilitate the development of improved summarization systems for low-resource languages.
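To make the automatic criteria named in the abstract concrete, the following is a minimal sketch of how ROUGE-N and n-gram novelty are commonly computed. It is not the authors' evaluation code: the function names, the whitespace tokenization, and the toy English sentences are assumptions for illustration only, and standard ROUGE toolkits add further preprocessing that is omitted here. Text in Sorani Kurdish would need a suitable tokenizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(reference, candidate, n=1):
    """ROUGE-N recall, precision and F1 based on clipped n-gram overlap."""
    ref = Counter(ngrams(reference, n))
    cand = Counter(ngrams(candidate, n))
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


def ngram_novelty(source, summary, n=1):
    """Fraction of summary n-grams that never occur in the source document."""
    source_ngrams = set(ngrams(source, n))
    summary_ngrams = ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for g in summary_ngrams if g not in source_ngrams)
    return novel / len(summary_ngrams)


if __name__ == "__main__":
    # Toy example; whitespace tokenization is an assumption of this sketch.
    source = "the cat sat on the mat near the door".split()
    reference = "the cat sat on the mat".split()
    candidate = "a cat sat on a mat".split()
    print(rouge_n(reference, candidate, n=1))
    print(ngram_novelty(source, candidate, n=2))
```

ROUGE scores measure how much of the reference summary a generated summary recovers, while n-gram novelty measures how abstractive a summary is, that is, how much of its wording does not appear verbatim in the source document.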
Type of Study: Research Paper | Subject: Speech Processing
Received: 2024/05/26 | Revised: 2024/12/23 | Accepted: 2024/12/06

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2022 by the authors. Licensee IUST, Tehran, Iran. This is an open access journal distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.