National Institute for Research and Development in Informatics – ICI Bucharest
Abstract: This paper examines how a computer can extract a simple human sentiment, “liked” or “disliked”, from a text. Specifically, the computer learns to classify a movie review as positive or negative. Starting from input values paired with known output values, called labels, a model built with supervised learning learns to predict the correct output: the digit 0 for a negative sentiment or the digit 1 for a positive one. The objective is therefore to predict the sentiment (encoded as 0 or 1) for a new, unseen input once the model has been trained. The exercise uses the Keras API built on TensorFlow, a set of movie reviews taken from IMDB, and a recurrent neural network (RNN) with LSTM (Long Short-Term Memory) cells, which retain a memory of previously encountered words. Keras ships with a set of 50,000 pre-processed movie reviews (the pre-processing is explained below). Trained on 25,000 of these reviews and tested on the other 25,000, the model built with Keras uses the relationships between words to predict, with good accuracy, the positive or negative sentiment, in other words the polarity, of a text. Sentiment analysis has many applications, from social media monitoring, voice of the customer (VOC) programs, and the analysis of tweets and Facebook posts to text-based business analysis.
Keywords: library, vector, tensor, matrix, LSTM cells, labels, variable, back propagation, forward propagation.
CITE THIS PAPER AS:
Paul TEODORESCU, Extracting a human feeling from a text (a natural language processing task called Sentiment Analysis) using a recurrent neural network together with Keras library, Romanian Journal of Information Technology and Automatic Control, ISSN 1220-1758, vol. 30(3), pp. 119-132, 2020. https://doi.org/10.33436/v30i3y202009
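The pipeline summarized in the abstract can be sketched in a few lines of Keras. This is a minimal illustration and not necessarily the exact model developed in the paper: the vocabulary size, review length, embedding dimension, number of LSTM units, number of epochs, and batch size are all assumed values, and the helper names `build_model`, `to_labels`, and `train_and_evaluate` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Assumed hyperparameters, chosen only for illustration.
VOCAB_SIZE = 10_000  # keep only the most frequent words
MAX_LEN = 200        # pad or truncate every review to this many words


def build_model(vocab_size=VOCAB_SIZE, embedding_dim=32, lstm_units=32):
    """Build a small LSTM classifier that outputs P(review is positive)."""
    model = tf.keras.Sequential([
        # Map each integer word index to a dense vector.
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        # LSTM cells carry information about previously seen words.
        tf.keras.layers.LSTM(lstm_units),
        # A single sigmoid unit gives a probability between 0 and 1.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def to_labels(probabilities, threshold=0.5):
    """Convert probabilities to labels: 0 = negative, 1 = positive."""
    return (np.asarray(probabilities).ravel() >= threshold).astype(int)


def train_and_evaluate(epochs=2, batch_size=64):
    """Train on the 25,000 training reviews, then test on the other 25,000."""
    # Reviews arrive already encoded as lists of integer word indices.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
        num_words=VOCAB_SIZE)
    pad = tf.keras.preprocessing.sequence.pad_sequences
    x_train = pad(x_train, maxlen=MAX_LEN)
    x_test = pad(x_test, maxlen=MAX_LEN)

    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
              validation_split=0.2)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return model, accuracy
```

Calling `train_and_evaluate()` downloads the IMDB data set on first use and returns the trained model together with its accuracy on the test reviews; `to_labels(model.predict(x))` then maps the predicted probabilities for new padded reviews `x` to the digits 0 and 1.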