The impact of different representations in the presence of language drift

Author name: Ioannis Christodoulou
Title: The impact of different representations in the presence of language drift
Year: 2020-2021
Supervisor: Ilias Zavitsanos

Summary

Natural language inherently encodes an interpretation of the world through its vocabulary and the different meanings of words. Changes in language can reflect sociocultural evolution; their systematic exploration is therefore a valuable tool for researchers in the social sciences and humanities. In this thesis, we examine the detection of semantic change between two time periods t1 and t2. For the empirical study, we use datasets in four languages (English, German, Latin, and Swedish) provided by SemEval-2020 Task 1. All experiments are evaluated on a binary classification task: deciding whether or not a word's sense has changed between the two periods.

Based on the results of our empirical study, we answer three questions. The first concerns the most suitable alignment method for the word embeddings Wt1 and Wt2. The methods under investigation are Orthogonal Procrustes, Incremental Training, and Temporal Word Embeddings with a Compass. The second concerns how pre-trained Word2vec embeddings perform compared with embeddings whose weights were not initialized beforehand. Finally, by applying LDA2vec, we explore whether LDA (Latent Dirichlet Allocation) topics improve the performance of SGNS (Skip-gram with Negative Sampling).
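
As an illustration of the first of the three alignment methods, the following is a minimal sketch of Orthogonal Procrustes alignment followed by a thresholded cosine distance for the binary decision. It is not the thesis implementation. The function names, the `threshold` parameter, and the assumption that the two matrices hold one row per word of a shared vocabulary in the same order are all hypothetical, and preprocessing such as mean-centering or length normalization is omitted.

```python
import numpy as np

def procrustes_align(W1, W2):
    """Rotate W1 onto W2 with the orthogonal R that minimises ||W1 @ R - W2||_F."""
    # Closed-form solution: SVD of the cross-covariance matrix, then R = U @ Vt.
    U, _, Vt = np.linalg.svd(W1.T @ W2)
    R = U @ Vt
    return W1 @ R, R

def cosine_distance(A, B):
    """Row-wise cosine distance between two matrices of equal shape."""
    dots = (A * B).sum(axis=1)
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return 1.0 - dots / norms

def detect_change(W1, W2, threshold):
    """Hypothetical decision rule: flag a word as changed when the distance
    between its aligned t1 vector and its t2 vector exceeds the threshold."""
    aligned, _ = procrustes_align(W1, W2)
    return cosine_distance(aligned, W2) > threshold
```

In this sketch, row i of `W1` and row i of `W2` must refer to the same word, so the vocabularies of the two periods would first have to be intersected. The choice of threshold is left open here; it is an assumption of the sketch rather than something stated in the summary.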