Abstract: The rapid growth of user-generated content on video-sharing platforms such as YouTube has broadened global communication, but it has also increased the prevalence of cyberbullying in comment sections. Offensive and harmful comments damage users’ psychological well-being and degrade online interactions. Manual moderation cannot keep pace with text at this scale, which motivates automated detection systems. This study proposes a machine learning-based framework for multi-class classification of YouTube comments into bullying, non-bullying, and supportive categories. The system applies Natural Language Processing (NLP) techniques including text cleaning, tokenization, stop-word removal, and lemmatization. Features are extracted using TF-IDF vectorization over n-grams, together with sentiment-based features. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied during training. The classification framework combines XGBoost and Logistic Regression models in a stacking ensemble to improve generalization and predictive performance. Experimental results indicate that the ensemble model captures linguistic patterns in cyberbullying-related content and performs reliably in terms of accuracy, precision, recall, and F1-score. The proposed system offers a scalable and computationally efficient approach to automated cyberbullying detection on YouTube.

Keywords: Cyberbullying Detection, YouTube Comment Analysis, Social Media Monitoring, Machine Learning, Natural Language Processing, Text Classification, Sentiment Analysis.
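
The TF-IDF n-gram feature extraction named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the choice of unigrams plus bigrams, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, the L2 normalization, and the three sample comments are all assumptions made here for illustration, and the function names `ngrams` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase the text and return word n-grams of length 1..n_max."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(corpus, n_max=2):
    """Return one {ngram: weight} dict per document, L2-normalized."""
    docs = [Counter(ngrams(doc, n_max)) for doc in corpus]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    vectors = []
    for counts in docs:
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        weights = {
            gram: tf * (math.log((1 + n_docs) / (1 + df[gram])) + 1.0)
            for gram, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({gram: w / norm for gram, w in weights.items()})
    return vectors


# Made-up comments, one per class named in the abstract.
comments = [
    "you are so stupid",
    "great video, thank you",
    "you are so kind, stay strong",
]
vectors = tfidf(comments)
```

In this toy corpus the word "you" occurs in every comment, so it receives a lower weight than "stupid", which occurs in only one. This down-weighting of common terms is what lets a downstream classifier focus on the more distinctive n-grams.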


DOI: 10.17148/IARJSET.2026.13225

How to Cite:

[1] Inbanathan S, Dr. P. Menaka, "Cyberbullying Detection on Social media Platforms using ML and NLP," International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2026.13225
