Sentiment analysis is central to understanding customer opinions and feedback, especially in online reviews. This project applies several sentiment analysis techniques to the Amazon Fine Food Reviews dataset, which contains over 500,000 reviews spanning more than a decade, and compares traditional machine learning models, word embeddings, and deep learning models. Preprocessing covered handling missing values, converting ratings into binary sentiment labels, and text cleaning. Traditional machine learning models were trained on TF-IDF features, while Word2Vec embeddings were used with both traditional and deep learning models. Deep learning models, including BERT and its variant LoraBERT, were fine-tuned for sentiment analysis. The results show that deep learning models, particularly BERT, achieve higher accuracy than traditional methods.
The dataset
Dataset after cleaning: tokenization, stop-word removal, lemmatization, and stemming.
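The cleaning and labeling steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the stop-word list, the regex tokenizer, and the rating threshold (4 to 5 stars positive, 1 to 2 negative, 3 dropped as neutral) are all assumptions. Lemmatization and stemming are left out because they would normally rely on a library such as NLTK.

```python
import re

# Tiny illustrative stop-word list (an assumption; the project's real list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "this", "it", "and", "of", "to", "i"}


def clean_text(text):
    """Lowercase, strip HTML tags, tokenize on letters, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())  # remove HTML remnants such as <br />
    tokens = re.findall(r"[a-z']+", text)          # simple regex tokenizer
    return [t for t in tokens if t not in STOP_WORDS]


def to_binary_label(score):
    """Map a 1-5 star rating to 1 (positive) or 0 (negative).

    Returns None for a 3-star rating, treated here as neutral (assumed threshold).
    """
    if score == 3:
        return None
    return 1 if score > 3 else 0


tokens = clean_text("This coffee is <br />GREAT, and the price was fair!")
print(tokens)
print(to_binary_label(5), to_binary_label(1), to_binary_label(3))
```

Dropping 3-star reviews is one common convention for turning ratings into two classes; mapping them to one side or the other is an equally valid choice and would change the class balance.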
Results
Sentiment analysis, a subfield of natural language processing, focuses on extracting subjective information from text so that the sentiment expressed in reviews, comments, or social media posts can be interpreted. In e-commerce, it helps in understanding customer satisfaction, identifying trends, and informing business decisions. The Amazon Fine Food Reviews dataset, with its large and varied collection of reviews, is a suitable testbed for exploring and evaluating sentiment analysis techniques. This project investigates and compares the performance of traditional machine learning models, word embeddings, and deep learning models in sentiment analysis, measured by accuracy, precision, recall, and F1-score.
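For reference, the four metrics named above can be computed from the counts of true and false positives and negatives. The sketch below uses made-up labels purely for illustration; the function name `binary_metrics` and the sample data are hypothetical and are not results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Review datasets often contain far more positive than negative reviews, in which case accuracy alone can look high even when the minority class is handled poorly. Reporting precision, recall, and F1 alongside it guards against that.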
On the Amazon Fine Food Reviews dataset, traditional machine learning models with TF-IDF vectorization perform reasonably well, while Word2Vec embeddings yield slightly lower accuracy, possibly because contextual information is lost. Deep learning models, particularly BERT, outperform traditional methods and achieve the highest accuracy of all models evaluated. Fine-tuning BERT significantly improves its performance, underscoring the potential of advanced deep learning techniques for sentiment analysis.
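One way to picture the loss of contextual information is to assume that each review is represented by the average of its word vectors, a common way to feed Word2Vec features to a traditional classifier (the write-up does not state how the vectors were combined). The toy 2-dimensional vectors below are hypothetical stand-ins for learned embeddings.

```python
# Hypothetical 2-dimensional word vectors; real Word2Vec vectors are learned
# from data and typically have hundreds of dimensions.
TOY_VECTORS = {
    "not": [-1.0, 0.5],
    "good": [1.0, 1.0],
    "bad": [-1.0, -1.0],
}


def average_vector(tokens, vectors):
    """Return the element-wise mean of the vectors of known tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Opposite sentiments, same words in a different order.
a = average_vector("good not bad".split(), TOY_VECTORS)
b = average_vector("bad not good".split(), TOY_VECTORS)
print(a == b)
```

Because averaging ignores word order, the two phrases collapse to the same feature vector, so a downstream classifier cannot tell them apart. A model such as BERT reads the whole token sequence and does not have this limitation.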
Find my projects on my GitHub profile.