For the past few months, I have been working on a project, and the official release is coming closer and closer. My colleagues and I have put a lot of thought and work into it in order to provide the best possible experience to the consumer.
So, I came up with the idea of finding out what people think about our new product once it is released. My first thought was Twitter, of course (where you can follow me, btw! :P). I figured I would track Twitter’s feed and extract the tweets that mention the name of our product. But then what? Am I going to go through the entire database by hand and decide what the general opinion is? NO, of course not! That’s too much work! Instead, I will use a technique called sentiment analysis, and fortunately Python has some great libraries for that!
So without further ado, let’s start! In this post, I will cover only the sentiment analysis part. I will later add a post in which I explain how to get the Twitter feed in real time. For now, you can check one of my previous posts, Mining the Social Media using Python 2.7, on how to get tweets from Twitter.
1. Set up the environment
- Download Python
- Get a sentiment analysis package
I like to use TextBlob or a deep learning model (later post) for sentiment analysis.
To install TextBlob, simply type:
pip install -U textblob
2. Seeing TextBlob in practice
Let’s start with a simple example. You can find every single line of code on my GitHub page.
    # Import the TextBlob package
    from textblob import TextBlob

    # A simple sentiment analysis test
    # Create a TextBlob from a string or a text
    test_blob = TextBlob("This is an awesome day!")

    # Extract sentiment from blob
    print(test_blob.sentiment)
    # Printed: Sentiment(polarity=1.0, subjectivity=1.0)
As we can see, the result is a tuple in the form of (polarity, subjectivity). Polarity is the actual sentiment and ranges over [-1.0, 1.0], where -1 is negative and 1 is positive (makes sense), and subjectivity ranges over [0.0, 1.0], where 0 is very objective and 1 is very subjective.
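Since the printed result has named fields, the two values can be read by name or unpacked like a regular tuple. Here is a minimal sketch of that; it builds a stand-in `Sentiment` namedtuple by hand (an assumption, so that the snippet runs without TextBlob installed), whereas with TextBlob you would take the value from `test_blob.sentiment`.

```python
from collections import namedtuple

# Stand-in for the value TextBlob returns (assumption for illustration);
# with TextBlob installed you would use a blob's `.sentiment` instead.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])
sentiment = Sentiment(polarity=1.0, subjectivity=1.0)

# Read each value by field name...
print(sentiment.polarity)
print(sentiment.subjectivity)

# ...or unpack it like a regular tuple
polarity, subjectivity = sentiment
print(polarity, subjectivity)
```

Reading the fields by name tends to be easier to follow later on than indexing with `[0]` and `[1]`.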
You can play around with the thresholds and see what fits best for your dataset. Usually, you do not treat a polarity of exactly 0 as its own sentiment; instead, you put the cutoff really, really near 0, like 0.001 or 0.01. In theory you have three sentiment states (positive, negative and neutral), but in practice you mostly end up with two, positive and negative. If you want to account for neutral, you can widen the thresholds: for example, negative could be anything less than -0.1, positive anything more than 0.1, and neutral anything between -0.1 and 0.1.
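The thresholding idea can be sketched as a small helper. The function name `label_sentiment` and its default cutoff of 0.1 are my own choices for illustration, not part of TextBlob; the polarity you pass in would come from a blob's sentiment.

```python
def label_sentiment(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    `threshold` is a hypothetical cutoff: scores within
    [-threshold, threshold] are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(label_sentiment(1.0))    # positive
print(label_sentiment(-1.0))   # negative
print(label_sentiment(0.05))   # neutral
```

Passing a tiny cutoff such as `threshold=0.001` gives the near-two-state behavior described above, where almost everything ends up either positive or negative.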
    # Import the TextBlob package
    from textblob import TextBlob

    # A negative text
    negative_blob = TextBlob("This is an awful day!")

    # Extract sentiment from blob
    print(negative_blob.sentiment)
    # Printed: Sentiment(polarity=-1.0, subjectivity=1.0)

    # A neutral text
    neutral_blob = TextBlob("My day is neutral")

    # Extract sentiment from blob
    print(neutral_blob.sentiment)
    # Printed: Sentiment(polarity=0.0, subjectivity=0.0)
This neutral text is a bit too obvious, so let’s try another one:
    # Another neutral text
    neutral_blob = TextBlob("My day is neither positive nor negative")

    # Extract sentiment from blob
    print(neutral_blob.sentiment)
    # Printed: Sentiment(polarity=-0.03636363636363636, subjectivity=0.4727272727272727)
As we can see, the statement is neutral but the polarity is slightly negative. I haven’t played with TextBlob enough to figure out the “perfect” thresholds, but it seems to work really well with a cutoff somewhere between 0.03 and 0.05 in absolute value, i.e. treating anything closer to 0 than that as neutral.
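To get a feel for how much the cutoff matters, here is a rough sketch that counts labels for the four polarity scores printed in this post, once with a cutoff of 0.03 and once with 0.05. The `summarize` helper is a hypothetical name of mine, and the scores are hard-coded so the snippet runs without TextBlob.

```python
from collections import Counter

# Polarity scores printed by the four examples in this post
polarities = [1.0, -1.0, 0.0, -0.03636363636363636]


def summarize(polarities, threshold):
    """Count positive/negative/neutral labels for a given cutoff (illustrative helper)."""
    counts = Counter()
    for polarity in polarities:
        if polarity > threshold:
            counts["positive"] += 1
        elif polarity < -threshold:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


for threshold in (0.03, 0.05):
    print(threshold, dict(summarize(polarities, threshold)))
```

With a cutoff of 0.03 the “neither positive nor negative” sentence is still counted as negative, while with 0.05 it lands in the neutral bucket, so it is worth trying a few values on your own data.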
That’s all for today’s post! Please let me know if you have any questions in the comments section below or post on my Twitter @siaterliskonsta! Till next time, take care and bye-bye!
2 thoughts on “An Introduction to Sentiment Analysis with Python”
I didn’t realize there were Python packages for sentiment analysis. The last time I saw something like this in the works was a senior project a couple years ago. The team was using sentiment analysis on Amazon reviews to categorize books by mood. This is some cool stuff! Thanks for sharing. 🙂
Thank you very much for your feedback! Yeah, there are some pretty amazing tools! I couldn’t believe the accuracy of TextBlob. There are some other packages for sentiment analysis, but this one seems to outperform the others. I still have to compare it with the deep learning model, something I am currently building/training.