Text Classification Model – Classify Text SMS As Spam Or Ham

Text classification is a technique for organizing or grouping text under labels or categories.

Anton

Nov 4, 2022 — 3 min read

Text classification, also known as text tagging, assigns labels or categories to text. For example, sentiment analysis, one form of text classification, would label the sentence “I am feeling very happy and energetic today” as ‘Positive’. Machine Learning algorithms and Deep Learning techniques are used together with Natural Language Processing (NLP) to organize text data this way. This article discusses how a pre-trained Machine Learning model can classify a given SMS text as Spam or Ham (not spam). The model is based on the Multinomial Naive Bayes algorithm, and the complete code is written in Python.
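To make the idea behind Multinomial Naive Bayes concrete, here is a minimal from-scratch sketch in plain Python. It is not the pre-trained model used in this article: the four training messages, the whitespace tokenizer and the function names are all made up for illustration. It counts words per label, applies add-one (Laplace) smoothing, and combines the log prior with the log likelihood of each known word.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a real vectorizer does more than this.
    return text.lower().split()

def train(messages):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in messages:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict_proba(model, text):
    """Return a dict mapping each label to its normalized probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are skipped
            # Add-one (Laplace) smoothing keeps unseen word/label pairs non-zero.
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    max_score = max(log_scores.values())
    exp_scores = {lbl: math.exp(s - max_score) for lbl, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {lbl: value / norm for lbl, value in exp_scores.items()}

# Toy training data, invented for this sketch.
TOY_MESSAGES = [
    ("win a free prize now", "spam"),
    ("free cash claim your prize", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at home tonight", "ham"),
]

model = train(TOY_MESSAGES)
probs = predict_proba(model, "claim your free prize")
print(max(probs, key=probs.get), probs)
```

On this toy data the message "claim your free prize" is labelled spam, because its words occur far more often in the spam examples than in the ham ones.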

Pykit- Text Classification Model SMS Spam Detection Pretrained Model — AI Market Place- Pykit

Examples Where We Can Use Text Classification

Sentiment Analysis of a given text- Positive or Negative
Categorization of News Articles or Topic Detection
Detection of Language

Steps Involved In SMS Text Classification- Python-Based

Load Pre-Trained Model
Provide Text Data
Get Prediction

We have created a Flask-based REST API for the model and will use its endpoint to get the prediction. The model labels the text as Spam or Ham and reports its level of confidence in each label.

Open Python Editor & Load Pre-Trained Model

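# Load the label encoder and count vectorizer, then the trained classifier.
# Only unpickle files from a source you trust: pickle can run code on load.
import pickle

with open('parameters.pickle', "rb") as f:
    Le, cv = pickle.load(f)
with open('classifier.pickle', "rb") as m:
    clf = pickle.load(m)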
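The unpacking into two names when loading parameters.pickle implies that the file holds a two-item tuple. The sketch below shows how such a file could be written and read back. It is an assumption about how the file was produced, not the author's actual training script, and it uses plain dictionaries as stand-ins for the fitted label encoder and count vectorizer so that it runs without any model files.

```python
import os
import pickle
import tempfile

# Stand-ins for the fitted label encoder and count vectorizer (hypothetical values).
label_encoder_stub = {"classes": ["ham", "spam"]}
vectorizer_stub = {"vocabulary": {"free": 0, "prize": 1}}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Demo file name; the article's real file is parameters.pickle.
    path = os.path.join(tmp_dir, "parameters_demo.pickle")

    # Save both objects together as a single tuple.
    with open(path, "wb") as f:
        pickle.dump((label_encoder_stub, vectorizer_stub), f)

    # Load them back; tuple unpacking restores them in the order they were saved.
    with open(path, "rb") as f:
        le_loaded, cv_loaded = pickle.load(f)

print(le_loaded["classes"])
```

The order matters: whichever object was saved first in the tuple comes back first when unpacking.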

Complete Implementation with Python-Flask Module

import pickle
from flask import Flask, jsonify

# Load the label encoder, count vectorizer and classifier once at startup
# instead of on every request.
with open('parameters.pickle', "rb") as f:
    Le, cv = pickle.load(f)
with open('classifier.pickle', "rb") as m:
    clf = pickle.load(m)

# function to predict result
def model_prediction(usr_txt):
    # Convert the text into the bag-of-words features the classifier expects.
    cv_text = cv.transform([usr_txt]).toarray()
    pred_res = clf.predict(cv_text)
    # Compute the class probabilities once. Columns follow clf.classes_,
    # assumed here to be ordered ham, spam as in the original code.
    ham_prob, spam_prob = clf.predict_proba(cv_text)[0]
    return jsonify(pred_label=str(Le.inverse_transform(pred_res)[0]),
                   confidence={'ham': float(ham_prob), 'spam': float(spam_prob)},
                   input_text=usr_txt)

# flask implementation
app = Flask(__name__)

@app.route("/")
def home():
    return '''Created & Distributed by Pykit: https://pykit.org/'''

@app.route("/smsPredict/<string:txt>")
def smsPredict(txt):
    return model_prediction(txt)

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

Execute the above Python script.

Open the link printed in the console (by default http://127.0.0.1:5000/) in a new browser tab:

Append a text to the /smsPredict/ path in the address bar and press Enter to get the result:
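The same request can also be made from Python instead of the browser. The following is a minimal client sketch using only the standard library. It assumes the app is running locally on Flask's default address, and the helper names build_predict_url and fetch_prediction are made up for this example. The message is percent-encoded so that spaces and punctuation survive in the URL path.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed address: Flask's default host and port for a local run.
BASE_URL = "http://127.0.0.1:5000"

def build_predict_url(text, base_url=BASE_URL):
    # Percent-encode the message so it can be placed safely in the URL path.
    return f"{base_url}/smsPredict/{quote(text, safe='')}"

def fetch_prediction(text, base_url=BASE_URL):
    # Calls the running API and decodes the JSON body into a dict.
    # This only works while the Flask script above is running.
    with urlopen(build_predict_url(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the URL does not need the server, so it is safe to run anywhere.
print(build_predict_url("Are we meeting today?"))
```

With the server running, fetch_prediction("Are we meeting today?") should return a dictionary with the fields described in the next section.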

Result- JSON

Explanation of JSON Result- Text Classification Model Result

confidence– The level of confidence with which the Text Classification Model predicts each label, Spam and Ham. For example, the above sample has around a 99.2% chance of being Ham and a 0.8% chance of being Spam.
input_text– Text provided by the user.
pred_label– Label predicted by the Text Classification Model.
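As a sketch of how a caller might use these three fields, the snippet below parses a response body and decides whether to flag the message. The sample payload is hypothetical: it only mirrors the field names and the rough percentages quoted above. The 0.5 threshold and the summarize helper are likewise assumptions made for illustration.

```python
import json

# Hypothetical response body, shaped like the fields described above.
sample_response = '''
{"confidence": {"ham": 0.992, "spam": 0.008},
 "input_text": "See you at lunch",
 "pred_label": "ham"}
'''

def summarize(response_text, spam_threshold=0.5):
    """Extract the predicted label and flag the message if spam is likely enough."""
    result = json.loads(response_text)
    spam_probability = result["confidence"]["spam"]
    return {
        "label": result["pred_label"],
        "spam_probability": spam_probability,
        "flag_as_spam": spam_probability >= spam_threshold,
    }

summary = summarize(sample_response)
print(summary)
```

Lowering spam_threshold makes the caller stricter than the model's own label, which can be useful when missing a spam message costs more than a false alarm.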

The complete implementation of the code is available at my GitHub repository.

If you want to learn the complete steps of building an SMS Spam Classification Model from scratch, you can check out my recent article “Build Email Spam Classification Model (Naive Bayes Classifier)”.

Summary

In this article, we discussed how to use a pre-trained model to classify a given SMS text as Spam or Ham (not spam). For text classification, we used a Multinomial Naive Bayes model, written in Python, to predict the result. We also implemented the complete setup as a Flask-based REST API.