Exploring Apple’s new built-in sentiment analysis and NLP’s text classification on the Rotten Tomatoes dataset
Apple showed good progress in natural language processing at WWDC 2019, bringing improvements to both pillars of NLP: text classification and word tagging.
In this article, we will only discuss the progress made in text classification, which deals with classifying input text into a set of predefined class labels.
This year, Apple has introduced transfer learning for training text classifier models. Transfer learning takes the semantics of the overall context of the text into account and can detect cases where the same word has different meanings in different parts of the text.
Although state-of-the-art transfer learning is better equipped for semantic analysis, it takes a little longer to train than the maximum entropy algorithm.
Apple has also introduced built-in sentiment analysis this year. Using the Natural Language framework, you now obtain a score in the range of -1 (most negative) to 1 (most positive) indicating the degree of sentiment.
Here is sample code showing how the Natural Language framework's built-in sentiment analysis predicts the sentiment score of a string:
import NaturalLanguage

let text = "The movie was absolutely brilliant."
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = text
let (sentiment, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
print(sentiment?.rawValue ?? "0")
The underlying sentiment analysis is very accurate as we can see below:
In the next sections, we will use the Rotten Tomatoes dataset to build a text classifier Core ML model using Create ML and deploy it with the Natural Language framework.
- Converting the CSV dataset into train and test folders.
- Training the model using Create ML.
- Deploying the Core ML model in the Natural Language framework with SwiftUI.
Our dataset is a CSV file as shown below:
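The preview of the file is an image in the original post; as an illustrative sketch only (these rows and column headers are assumptions, not actual dataset entries), the file pairs a binary freshness label with each review text:

```python
import io
import pandas as pd

# Illustrative rows only; the real file has roughly 480,000 reviews
csv = io.StringIO(
    "Freshness,Review\n"
    "1,A gripping and well-acted drama.\n"
    "0,Two hours I will never get back.\n"
)
df = pd.read_csv(csv)
print(df.shape)  # (2, 2)
```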
The following Python script is used to split the CSV into training and test data folders:
import os
import pandas as pd

df = pd.read_csv("rotten_tomatoes_reviews.csv", nrows=50000)
df.columns = ["freshness", "review"]

# Split data into training and testing sets by label
train = df.sample(frac=0.8, random_state=42)
train_good = train[train.freshness == 1]
train_bad = train.drop(train_good.index)
test = df.drop(train.index)
test_good = test[test.freshness == 1]
test_bad = test.drop(test_good.index)

# Create folders for data
for path in ["train/good/", "train/bad/", "test/good/", "test/bad/"]:
    os.makedirs(path, exist_ok=True)

# Write out each review as a numbered .txt file
def write_text(path, df):
    for i, r in df.iterrows():
        with open(path + str(i) + ".txt", "w") as f:
            f.write(r["review"])

write_text("train/good/", train_good)
write_text("train/bad/", train_bad)
write_text("test/good/", test_good)
write_text("test/bad/", test_bad)
For the sake of simplicity and training time, we parse only the first 50,000 of the 480,000 Rotten Tomatoes reviews and split the dataset in a standard 80:20 ratio.
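As a sanity check on the 80:20 split, the same pandas calls can be run on a small synthetic frame (the real CSV is replaced here by generated rows, since the file itself is not part of this article):

```python
import pandas as pd

# Synthetic stand-in for the reviews CSV: 1,000 rows, alternating labels
df = pd.DataFrame({
    "freshness": [i % 2 for i in range(1000)],
    "review": ["review %d" % i for i in range(1000)],
})

# Same split logic as the script above
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

print(len(train), len(test))  # 800 200
```

Because `test` is built by dropping the sampled training indices, no row is lost or duplicated between the two sets.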
Once the CSV is split into the respective folders, we can launch the Create ML application, which became a standalone app this year.
Create a new text classifier model project in Create ML and add the training folder. You can choose either technique to train your model, and enjoy a cup of coffee while training and validation run. Training a transfer learning model took me four hours.
Here is a comparison of the model metrics for the two text classification techniques:
The transfer learning-based model generalizes better, although you can try to get better accuracy by increasing the size of the dataset (training the transfer learning model took me four hours on a dataset of 15,000 texts).
Alternatively, you can create your model programmatically using Create ML. Just pass the desired algorithm as a parameter:
init(trainingData: MLTextClassifier.DataSource, parameters: MLTextClassifier.ModelParameters)
Evaluating Our Model
Once the model is trained and tested, go to the Output tab in Create ML and enter text to run predictions on. The following illustration shows some of the predictions I ran.
Now our model is ready to be deployed for natural language processing.
Create a new Xcode SwiftUI based project and drag and drop the Core ML model we created earlier.
We are building an application that shows a list of texts (Rotten Tomatoes reviews) on which we will run our Core ML model through the Natural Language framework to determine whether each review is good or bad.
In addition, we will run NLP's built-in sentiment analysis on the same reviews to see how accurately the sentiment scores line up with the Core ML predictions.
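Comparing the two outputs requires mapping the continuous sentiment score in [-1, 1] onto the classifier's two labels. A minimal sketch of that mapping (the zero threshold and the helper name are my assumptions, shown in Python for brevity rather than Swift):

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a review label.
    The 0.0 threshold is a simplifying assumption."""
    return "Good" if score >= 0 else "Bad"

print(sentiment_label(0.8))   # Good
print(sentiment_label(-0.4))  # Bad
```

With both predictions expressed as the same pair of labels, the two approaches can be compared review by review.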
The following code shows how to feed the Core ML model into the Natural Language Framework:
import NaturalLanguage

let predictor = try NLModel(mlModel: ReviewClassifier().model)
predictor.predictedLabel(for: "Fight Club was a masterpiece. One of its kind cinema.")
Creating SwiftUI Lists with Navigation
Let’s start by creating a SwiftUI list that is populated using a review structure conforming to the Identifiable protocol.
In the above code, we have added navigation links to the list items that take them to the ReviewDetail view which we will see next:
The above code is straightforward. We’ve added two SwiftUI buttons: one runs the built-in sentiment analysis on the review text, and the other runs our Core ML classifier to tell whether the review is good or bad.
In return, we get the following results in our SwiftUI preview:
NLP’s built-in sentiment analysis worked great on a dataset it is largely unfamiliar with. At the same time, our Core ML model performed decently alongside the Natural Language framework in determining whether the reviewer liked the film or panned it.
That is the essence of Core ML and the Natural Language framework. We saw what a powerful tool the sentiment analysis built into the framework can be.
The full source code, along with the Python script for parsing the CSV, is available in the GitHub repository. It also includes the models built using maximum entropy and transfer learning, so you can play with them and see which one fits the bill better.
That’s it for now. I hope you enjoyed reading and training models!