News classification with human annotators: A case study

Main Authors: Fuddoly, A., Jaafar, J., Zamin, N.
Format: Article
Institution: Universiti Teknologi Petronas
Record Id: utp-eprints.25981
Published: Penerbit UTM Press 2015
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84932612791&doi=10.11113%2fjt.v74.4829&partnerID=40&md5=fd48188f8c7af2463936f05ad866bf70
http://eprints.utp.edu.my/25981/
Summary: Text classification has become an increasingly active research field with the growth of online news. Most news on news websites is still categorised manually, a task that grows more strenuous as the volume of daily updates surges. This paper addresses the question of how text classification algorithms can take over this task from manual classification. A combined method using Bracewell's algorithm and the top-n method is demonstrated and tested on an Indonesian-language corpus. The experiment also uses human evaluation as the benchmark. The results of the human evaluation are further investigated to understand how the annotators classify documents and which aspects of the method can be improved in the future. The results indicate that the method can outperform human annotators by 13 in terms of accuracy. © 2015 Penerbit UTM Press. All rights reserved.