ADA Library Digital Repository

Exploring Multi-Modal Natural Language Processing Methods for Effective Social Media Post Classification

Show simple item record

dc.contributor.author Talibzade, Nazim
dc.date.accessioned 2025-10-28T07:43:35Z
dc.date.available 2025-10-28T07:43:35Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/20.500.12181/1509
dc.description.abstract People now connect through social media as much as through in-person interaction. Every year brings more social media applications, and with them increased user engagement and volumes of shared posts. A capable machine learning model for classifying such posts is therefore required. One key difficulty is effectively classifying multimodal social media content: if a system cannot identify a post as harmful or misleading, misinformation and its social risks can spread. This thesis investigates whether multi-modal NLP models can classify social media posts effectively. The posts are in Azerbaijani, which introduces a further obstacle: limited dataset availability for low-resource languages. A dataset of approximately 10,000 Azerbaijani-language social media posts containing both textual and visual data was therefore prepared. The dataset underwent extensive preprocessing, including text tokenization, image resizing to 224×224 pixels, and feature normalization. We evaluated FLAVA, BLIP, ViLT, ALBEF, and custom BERT+ResNet-based fusion baselines, using early, late, and hybrid fusion strategies to assess multimodal classification effectiveness. Performance was measured by accuracy, macro-F1, and evaluation loss, with results contextualized against known benchmarks in multi-modal classification. FLAVA showed excellent cross-modal representation learning, reaching 87.6% accuracy and 87.1% macro-F1. After task-specific fine-tuning, BLIP achieved 83.8% accuracy and 83.3% macro-F1. Among the fusion baselines, BERT+ResNet with early fusion performed well (85.5% accuracy, 85.2% macro-F1), highlighting the potential of lightweight alternatives. ViLT provided an efficient transformer-only solution with reasonable results (83.2% accuracy, 82.7% macro-F1). ALBEF, although architecturally promising due to its hybrid fusion and contrastive alignment, underperformed on this task (66.6% accuracy, 48.9% macro-F1), possibly due to vocabulary mismatch and insufficient adaptation to Azerbaijani content. The results reveal trade-offs between accuracy, model complexity, and computational cost in multimodal NLP, especially for low-resource languages such as Azerbaijani. Future work will focus on attention-based fusion architectures, dataset enlargement to better represent language variation, and explainable AI tools such as Grad-CAM and SHAP. Overall, this study aims to create inclusive, multimodal artificial intelligence systems for low-resource languages in order to facilitate social media monitoring and analysis. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject Machine learning -- Data processing. en_US
dc.subject Natural language processing (Computer science). en_US
dc.subject Multimodal learning -- Algorithms. en_US
dc.subject Social media -- Data analysis. en_US
dc.subject Artificial intelligence -- Application to low-resource languages. en_US
dc.title Exploring Multi-Modal Natural Language Processing Methods for Effective Social Media Post Classification en_US
dc.type Thesis en_US
dcterms.accessRights Absolute Embargo (No access without the author's permission)


Files in this item

There are no files associated with this item.
