Author name | Marina Nteva |
---|---|
Title | Aspect Based Sentiment Analysis on User-Generated Content: Detect and Classify Aspect Terms from Greek Hotel Reviews |
Year | 2019-2020 |
Supervisor | Georgios Petasis |
A large body of natural language processing research has addressed sentiment analysis (SA), and further work has focused on the specific aspect to which a sentiment refers, so the field evolved into aspect-based sentiment analysis (ABSA). Whereas SA tries to elicit the overall sentiment of a piece of text or a sentence, ABSA extracts the individual aspects within a sequence together with the positive or negative sentiment expressed about each of them. In this study, we extract and classify aspect terms from Greek hotel reviews. Prior research is mainly based on English; there are only a few attempts at ABSA in Greek texts, so the data available for training language models is limited and harder to use.

For this research, we fine-tuned a pre-trained BERT model on Greek hotel reviews taken from a large international travel site that collects reviews and ratings for many facilities from visitors around the world. To annotate the aspects in each review, we used the Ellogon annotation tool developed at NCSR "Demokritos"; the tool exports a JSON file containing the span of each aspect together with the sentiment assigned to it. We encode the annotations with BIO tags, marking each token as at the Beginning, Inside, or Outside of an aspect term, and combine each tag with the positive or negative sentiment expressed in the sequence (a sketch of the scheme follows below). To our knowledge, this joint encoding has not appeared in prior research, where aspect extraction and aspect sentiment classification were treated as two separate tasks.

The next step was to feed the data to the BERT model, using Greek word embeddings available on Hugging Face, and predict the tag of each token. Although the accuracy score was high, the F1 scores were unsatisfactory. We then tried the FLAIR framework, which allows different combinations of word embeddings to be stacked; its core component is a bi-directional LSTM tagger trained on top of them. The per-class F1 scores for the minority classes remained low. These minority classes were the tags marking tokens inside a positive or negative aspect term; they are rare because most aspect terms consist of a single token. To further improve the outcome, we up-sampled the minority classes and under-sampled the majority class, the tag for tokens outside any aspect. The model's performance was visibly better, and the F1 scores improved satisfactorily. Finally, to further improve the prediction of inner-aspect tags, we built a neural network classifier that predicts whether the token following a detected aspect term belongs to it. Its input is the word embeddings obtained from the fine-tuned BERT model, and it classifies each token as inside or outside an aspect. In effect, the model re-positions the boundaries of multi-word aspect terms to better capture the ground-truth aspects.
We call this network the Aspect Corrector Network, and the whole approach can be seen as an enhancement of the base BERT model, which we call BERT with Aspect Corrector Network (BERT-ACN). The methodology and experiments are explained in detail in chapters 3 and 4.
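The combined tagging scheme described above can be illustrated with a short sketch. The tag set pairs the BIO position with the sentiment (B-POS, I-POS, B-NEG, I-NEG, plus O); the span format below is an assumption about what the parsed Ellogon export might look like, not the tool's actual JSON schema.

```python
# Sketch of the combined BIO + sentiment tag scheme:
# B-POS / I-POS / B-NEG / I-NEG for aspect tokens, O elsewhere.
def spans_to_tags(tokens, spans):
    """tokens: list of (text, start, end); spans: list of (start, end, sentiment)."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, sentiment in spans:
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("B-" if first else "I-") + sentiment
                first = False
    return tags

# "Το προσωπικό ήταν φιλικό" with "προσωπικό" annotated as a positive aspect:
tokens = [("Το", 0, 2), ("προσωπικό", 3, 12), ("ήταν", 13, 17), ("φιλικό", 18, 24)]
print(spans_to_tags(tokens, [(3, 12, "POS")]))  # ['O', 'B-POS', 'O', 'O']
```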
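Fine-tuning BERT for this tagging task follows the standard Hugging Face token-classification recipe. Below is a minimal sketch, assuming the GreekBERT checkpoint `nlpaueb/bert-base-greek-uncased-v1` (the abstract does not name the exact weights used) and omitting the training loop and the alignment of word-level tags to sub-word tokens.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-POS", "I-POS", "B-NEG", "I-NEG"]
CHECKPOINT = "nlpaueb/bert-base-greek-uncased-v1"  # assumed checkpoint, not confirmed by the thesis

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(CHECKPOINT, num_labels=len(LABELS))

# Predict one tag per word-piece token; real training would batch the
# annotated reviews and align the word-level BIO tags to sub-word tokens.
inputs = tokenizer("Το προσωπικό ήταν φιλικό", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, num_labels)
print([LABELS[i] for i in logits.argmax(dim=-1)[0].tolist()])
```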
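The FLAIR experiments can be sketched along the same lines. The embedding choices below (Greek fastText word embeddings stacked with multilingual contextual Flair embeddings) are illustrative assumptions rather than the thesis configuration, and the code follows the classic circa-2019 FLAIR interface, which has since shifted across versions.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import StackedEmbeddings, WordEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Two-column CoNLL-style files (token <tab> tag); the paths are placeholders.
corpus = ColumnCorpus("data/", {0: "text", 1: "tag"},
                      train_file="train.txt", dev_file="dev.txt", test_file="test.txt")
tag_dictionary = corpus.make_tag_dictionary(tag_type="tag")

# FLAIR's key idea: stack several embedding types into one representation.
embeddings = StackedEmbeddings([
    WordEmbeddings("el"),              # Greek fastText word embeddings (assumed id)
    FlairEmbeddings("multi-forward"),  # multilingual contextual string embeddings
    FlairEmbeddings("multi-backward"),
])

# Bi-directional LSTM tagger over the stacked embeddings (no CRF, to match
# the plain BiLSTM described in the abstract).
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="tag", use_crf=False)
ModelTrainer(tagger, corpus).train("models/absa-flair", max_epochs=10)
```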
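The resampling step can be sketched at the sentence level, assuming (as is common for sequence labelling) that up-sampling duplicates sentences containing the rare inner-aspect tags and under-sampling drops a fraction of sentences containing only the outside tag. The ratios below are illustrative, not the settings used in the thesis.

```python
import random

def resample(sentences, up_factor=3, keep_all_o=0.5, seed=13):
    """sentences: list of (tokens, tags) pairs with BIO+sentiment tags."""
    rng = random.Random(seed)
    out = []
    for tokens, tags in sentences:
        if any(t in ("I-POS", "I-NEG") for t in tags):
            out.extend([(tokens, tags)] * up_factor)   # up-sample minority tags
        elif all(t == "O" for t in tags):
            if rng.random() < keep_all_o:              # under-sample all-O sentences
                out.append((tokens, tags))
        else:
            out.append((tokens, tags))
    rng.shuffle(out)
    return out
```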
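Finally, a minimal sketch of the Aspect Corrector Network idea, assuming a small feed-forward classifier over BERT embeddings: it takes the embedding of the last token of a detected aspect term together with the embedding of the following candidate token and predicts whether the candidate continues that aspect term. The exact architecture and input layout are described in chapters 3 and 4; everything below is an assumption.

```python
import torch
import torch.nn as nn

class AspectCorrectorNetwork(nn.Module):
    """Binary classifier: does the candidate token continue the previous aspect term?"""
    def __init__(self, hidden_size=768):  # 768 matches BERT-base embeddings
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 2),  # 0 = outside, 1 = inside the previous aspect term
        )

    def forward(self, prev_aspect_emb, candidate_emb):
        # Concatenate the aspect-term embedding with the candidate-token embedding.
        return self.classifier(torch.cat([prev_aspect_emb, candidate_emb], dim=-1))

# Usage with dummy tensors standing in for fine-tuned BERT token embeddings:
acn = AspectCorrectorNetwork()
logits = acn(torch.randn(4, 768), torch.randn(4, 768))  # a batch of 4 token pairs
print(logits.argmax(dim=-1))                            # inside/outside decisions
```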