Contextualized Word Embedding for Indonesian News Quote Extraction
DOI: https://doi.org/10.33557/2ayqqa48
Keywords: BERT, Direct Quotes, Indonesian News, Named Entity Recognition, Word2Vec
Abstract
This study develops a Named Entity Recognition (NER) model based on Recurrent Neural Networks (RNN) to extract direct quotes from Indonesian news articles. The goal is to enhance Medmon, the Kabayan Group system used to monitor the public image of public figures and brands. The study is limited to Indonesian news articles and does not cover other languages or news sources. Two models are compared: one uses the static word embedding Word2Vec, and the other uses the contextual word embedding BERT. The experiments were conducted on the PFSA-ID corpus, which consists of 1,018 Indonesian news articles annotated for direct quotes using the BILOU scheme. Both models were trained and evaluated in Python with PyTorch and Hugging Face Transformers. The results show that the BERT model outperforms the Word2Vec model, with an F1-score difference of 14.03 points. The BERT model achieved the highest F1-score, 92.28%, while the Word2Vec model reached only 78.05%. This research contributes to online media monitoring by improving the efficiency and accuracy of direct quote extraction from Indonesian news. It offers practical value for media analysts and organizations that rely on automated media analysis.
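The abstract relies on two technical ingredients. Direct quotes are annotated as BILOU tag sequences (Begin, Inside, Last, Unit, Outside), and the models are compared by F1-score. The following Python decodes BILOU tags into quote spans and scores predicted spans against gold spans. It is a minimal sketch built on two assumptions. First, it uses a single hypothetical label name, `QUOTE`; the actual PFSA-ID tag set may differ. Second, it applies exact-match span-level scoring, because the abstract does not say whether F1 is computed per token or per span. The function names `bilou_to_spans` and `span_f1` are illustrative, not the authors' evaluation code.

```python
from typing import List, Optional, Tuple

# A span is (label, first token index, last token index), inclusive.
Span = Tuple[str, int, int]


def bilou_to_spans(tags: List[str]) -> List[Span]:
    """Decode one BILOU tag sequence into labelled spans.

    Policy (an assumption, not taken from the paper): malformed
    fragments such as I-/L- tags without a matching B- are discarded.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "U":  # single-token span; drops any unfinished span
            spans.append((kind, i, i))
            start, label = None, None
        elif prefix == "B":  # open a new span; drops any unfinished span
            start, label = i, kind
        elif prefix == "I":  # continue only if it matches the open span
            if label != kind:
                start, label = None, None
        elif prefix == "L":  # close the open span if the labels agree
            if start is not None and label == kind:
                spans.append((kind, start, i))
            start, label = None, None
        else:  # "O" (or any unknown tag) ends any open span
            start, label = None, None
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall and F1 over a corpus."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g = set(bilou_to_spans(gold_tags))
        p = set(bilou_to_spans(pred_tags))
        tp += len(g & p)  # spans with identical label and boundaries
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence tags using the hypothetical label QUOTE.
    gold = [["O", "B-QUOTE", "I-QUOTE", "L-QUOTE", "O", "U-QUOTE"]]
    pred = [["O", "B-QUOTE", "I-QUOTE", "L-QUOTE", "O", "O"]]
    print(bilou_to_spans(gold[0]))  # [('QUOTE', 1, 3), ('QUOTE', 5, 5)]
    print(span_f1(gold, pred))      # (1.0, 0.5, 0.6666666666666666)
```

Exact matching means that a predicted quote off by even one token counts as both a false positive and a false negative. This is the strict convention of CoNLL-style NER evaluation. A library such as seqeval, in its strict mode with the BILOU scheme, can serve as a cross-check for this kind of scoring.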
License
Copyright (c) 2025 Jurnal Ilmiah Matrik

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Jurnal Ilmiah Matrik (https://journal.binadarma.ac.id/index.php/jurnalmatrik) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.