News classification with human annotators: A case study

Classifying textual documents has become an increasingly active research field owing to the growth of online news. While most articles on news websites are categorised manually, the task becomes more strenuous given the tremendous surge of data updates every day. This paper add...


Main Authors: Fuddoly, A., Jaafar, J., Zamin, N.
Format: Article
Institution: Universiti Teknologi Petronas
Record Id: utp-eprints.31406
Published: Penerbit UTM Press 2015
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84932612791&doi=10.11113%2fjt.v74.4829&partnerID=40&md5=fd48188f8c7af2463936f05ad866bf70
http://eprints.utp.edu.my/31406/
id utp-eprints.31406
recordtype eprints
spelling utp-eprints.31406 2022-03-26T03:18:59Z Article NonPeerReviewed Fuddoly, A. and Jaafar, J. and Zamin, N. (2015) News classification with human annotators: A case study. Jurnal Teknologi, 74 (10). pp. 21-28.
institution Universiti Teknologi Petronas
collection UTP Institutional Repository
description Classifying textual documents has become an increasingly active research field owing to the growth of online news. While most articles on news websites are categorised manually, the task becomes more strenuous given the tremendous surge of data updates every day. This paper addresses the question of how text classification algorithms can replace manual classification for this task. A combined method using Bracewell's algorithm and a top-n method is demonstrated and tested on an Indonesian-language corpus. The experiment also uses human evaluation as the benchmark. The results of the human evaluation are further investigated to understand how the annotators classify documents and which aspects can be improved to enhance the method in the future. The results indicate that the method can outperform human annotators by 13 in terms of accuracy. © 2015 Penerbit UTM Press. All rights reserved.
format Article
author Fuddoly, A.
Jaafar, J.
Zamin, N.
author_sort Fuddoly, A.
title News classification with human annotators: A case study
publisher Penerbit UTM Press
publishDate 2015