MoArLex: An Arabic Sentiment Lexicon Built Through Automatic Lexicon Expansion

Youssef M.

El-Beltagy S.R.

Sentiment analysis research has received considerable attention over the last decade, especially following the sharp increase in social media usage. Social networks such as Facebook and Twitter generate vast amounts of data daily, with posts covering topics that range from sports and products to politics and current events. Because this data is created by users from all over the world, it is multilingual in nature. Arabic is one of the important languages recently targeted by many sentiment analysis efforts. However, compared to English, Arabic is considered under-resourced in terms of lexicons and datasets. This paper presents a novel technique for automatically expanding an Arabic sentiment lexicon using word embeddings. The quality of the automatically added terms was evaluated in multiple ways, all of which showed that lexicon entries added using the presented technique are more accurate than sentiment lexicon entries obtained using machine learning or distant supervision methods. © 2018 The Authors. Published by Elsevier B.V.
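
The abstract does not describe the expansion procedure itself, so the following is only a generic sketch of how embedding-based lexicon expansion can work, not the authors' method. It assumes a nearest-seed rule: a candidate word inherits the polarity of its most similar seed word when the cosine similarity reaches a threshold. The function name `expand_lexicon`, the threshold of 0.8, and the three-dimensional toy vectors are all hypothetical and chosen purely for illustration; real embeddings would be trained on a large Arabic corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_lexicon(seed_lexicon, embeddings, threshold=0.8):
    """Return a copy of seed_lexicon extended with candidate words.

    Hypothetical rule (an assumption, not taken from the paper): each
    word outside the seed lexicon takes the polarity of its most similar
    seed word, provided the similarity is at least `threshold`.
    """
    expanded = dict(seed_lexicon)
    for word, vector in embeddings.items():
        if word in seed_lexicon:
            continue
        best_seed, best_sim = None, -1.0
        for seed in seed_lexicon:
            if seed not in embeddings:
                continue  # seed has no vector, so it cannot be compared
            sim = cosine(vector, embeddings[seed])
            if sim > best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None and best_sim >= threshold:
            expanded[word] = seed_lexicon[best_seed]
    return expanded


# Toy, hand-made vectors (illustrative only, not real embeddings).
toy_embeddings = {
    "جميل": [0.90, 0.10, 0.00],    # "beautiful"
    "رائع": [0.85, 0.20, 0.05],    # "wonderful"
    "سيء": [-0.80, 0.30, 0.10],    # "bad"
    "فظيع": [-0.75, 0.35, 0.00],   # "terrible"
    "كتاب": [0.00, 0.10, 0.95],    # "book" (neutral)
}
seed_lexicon = {"جميل": "positive", "سيء": "negative"}

expanded = expand_lexicon(seed_lexicon, toy_embeddings)
```

With these toy vectors, the two sentiment-bearing candidates sit close to a seed and are added, while the neutral word falls below the threshold and is left out. A stricter threshold trades coverage for precision, which is the kind of quality question the evaluation in the abstract addresses.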