Supervised learning and resampling techniques on DISC personality classification using Twitter information in Bahasa Indonesia

Bibliographic Details
Main Authors: Ema Utami, Irwan Oyong, Suwanto Raharjo, Anggit Dwi Hartanto, Sumarni Adi
Format: Article
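Language: English
Published: Emerald Publishing 2025-01-01
Series: Applied Computing and Informatics
Subjects: Supervised learning; Resampling techniques; Profiling analysis; DISC; Twitter information; Bahasa Indonesia
Online Access: https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0054/full/pdf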
author Ema Utami
Irwan Oyong
Suwanto Raharjo
Anggit Dwi Hartanto
Sumarni Adi
collection DOAJ
description Purpose – Gathering knowledge about personality traits has long interested academics and researchers in psychology and computer science. Analyzing profile data from personal social media accounts reduces data collection time, as this method does not require users to fill out questionnaires. A pure natural language processing (NLP) approach can give decent results, and previous studies show that its reliability can be improved by combining it with machine learning. Design/methodology/approach – In this study, cleaning the dataset and extracting relevant potential features, as assessed by psychological experts, are essential, as Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. Raw data were derived from a predefined dominance, influence, stability and conscientiousness (DISC) quiz website, yielding 316,967 tweets from 1,244 Twitter accounts, filtered to include only personal and Indonesian-language accounts. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and a more robust model, especially for the Indonesian language. Findings – The authors find that employing a SMOTETomek resampling technique and hyperparameter tuning boosts the model’s performance on formalized datasets by 57%, as measured by the F1-score. Originality/value – Cleaning the dataset and extracting relevant potential features from it, as assessed by psychological experts, are essential because Indonesian people tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. Organic data derived from a predefined DISC quiz website yielded 1,244 Twitter account records and 316,967 tweets.
format Article
id doaj-art-8722f33ad26e4c3abb1168ac56a4b237
institution Kabale University
issn 2634-1964
2210-8327
language English
publishDate 2025-01-01
publisher Emerald Publishing
record_format Article
series Applied Computing and Informatics
spelling doaj-art-8722f33ad26e4c3abb1168ac56a4b237 2025-01-28T12:19:18Z eng
Emerald Publishing, Applied Computing and Informatics, ISSN 2634-1964, 2210-8327
Published 2025-01-01, volume 21, issue 1/2, pages 141-151, DOI 10.1108/ACI-03-2021-0054
Ema Utami (0), Universitas Amikom Yogyakarta, Sleman, Indonesia
Irwan Oyong (1), Universitas Amikom Yogyakarta, Sleman, Indonesia
Suwanto Raharjo (2), IST AKPRIND, Yogyakarta, Indonesia
Anggit Dwi Hartanto (3), Universitas Amikom Yogyakarta, Sleman, Indonesia
Sumarni Adi (4), Universitas Amikom Yogyakarta, Sleman, Indonesia
title Supervised learning and resampling techniques on DISC personality classification using Twitter information in Bahasa Indonesia
topic Supervised learning
Resampling techniques
Profiling analysis
DISC
Twitter information
Bahasa Indonesia
url https://www.emerald.com/insight/content/doi/10.1108/ACI-03-2021-0054/full/pdf
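
The description in this record credits SMOTETomek resampling for part of the reported F1-score gain, but the record does not include the authors' code. As a rough illustration only, the following Python sketch shows the general idea behind that family of techniques: SMOTE-style oversampling of a minority class by interpolation, followed by removal of Tomek links. The function names, the toy two-dimensional points, the "majority"/"minority" labels and the parameter k are all assumptions made for this example and are not taken from the article. Real work would more likely rely on a maintained library such as imbalanced-learn.

```python
import math
import random


def nearest(idx, points):
    """Return the index of the point closest to points[idx], excluding idx itself."""
    return min((j for j in range(len(points)) if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))


def smote(X, y, minority, k=3, seed=0):
    """Add interpolated minority points until a two-class dataset is balanced.

    Assumes at least two minority samples (hypothetical helper, binary case only).
    """
    rng = random.Random(seed)
    mino = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(X) - len(mino)) - len(mino)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(mino)
        # k nearest minority neighbours of the base point, excluding the base itself
        neighbours = sorted((p for p in mino if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        # The synthetic point lies on the segment between base and its neighbour
        X_out.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different labels.
    """
    nn = [nearest(i, X) for i in range(len(X))]
    drop = {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed for illustration): six majority points, two minority points
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0),
     (1.0, 1.0), (1.2, 0.9)]
y = ["majority"] * 6 + ["minority"] * 2

X_bal, y_bal = smote(X, y, minority="minority")
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
print(len(X), len(X_bal), len(X_clean))
```

In this sketch the oversampling step runs first and the cleaning step second, which mirrors the usual ordering of the combined technique. Whether the authors used the same ordering, neighbour count or removal policy is not stated in this record.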