Text Classification Model – Classify Text SMS As Spam Or Ham

Text classification is a technique for organizing or grouping text under labels or categories. It is also known as text tagging. For example, sentiment analysis, one form of text classification, would label the sentence “I am feeling very happy and energetic today” as ‘Positive’. Machine Learning algorithms and Deep Learning techniques are used together with Natural Language Processing (NLP) to organize text data in this way. This article discusses how a pre-trained Machine Learning model can classify a given SMS text as Spam or Ham (not spam). The text classification model is based on the Multinomial Naive Bayes algorithm, and the complete code is written in Python.


Examples Where We Can Use Text Classification

  • Sentiment Analysis of a given text- Positive or Negative
  • Categorization of News Articles or Topic Detection
  • Detection of Language

Steps Involved In SMS Text Classification- Python-Based

  • Load Pre-Trained Model
  • Provide Text Data
  • Get Prediction

We have created a Flask-based REST API for the model and will use its endpoint to get predictions. The model labels a text as Spam or Ham and reports a confidence level for each of the two classes.

Open Python Editor & Load Pre-Trained Model

# load the label encoder and count vectorizer, then the trained classifier
# note: only unpickle files that come from a source you trust
import pickle
with open('parameters.pickle', "rb") as f:
    Le, cv = pickle.load(f)
with open('classifier.pickle', "rb") as m:
    clf = pickle.load(m)
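
To make clear what the three loaded objects do, here is a minimal sketch that builds toy stand-ins for them and runs the same transform, predict, and inverse-transform steps. It assumes the pickled objects are a scikit-learn LabelEncoder, CountVectorizer, and MultinomialNB; the pre-trained files themselves may have been built differently. The four training messages and the classify helper are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Toy training data, invented for illustration only.
messages = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch today",
    "see you at home tonight",
]
labels = ["spam", "spam", "ham", "ham"]

# Stand-ins for the objects stored in the pickle files (an assumption).
Le = LabelEncoder()
y = Le.fit_transform(labels)      # classes are sorted, so ham -> 0 and spam -> 1
cv = CountVectorizer()
X = cv.fit_transform(messages)    # bag-of-words count matrix
clf = MultinomialNB()
clf.fit(X, y)

def classify(text):
    """Return the predicted label and per-class probabilities for one message."""
    vec = cv.transform([text])
    label = Le.inverse_transform(clf.predict(vec))[0]
    proba = clf.predict_proba(vec)[0]
    # predict_proba columns follow clf.classes_, which here is [0, 1] = [ham, spam]
    return str(label), {"ham": float(proba[0]), "spam": float(proba[1])}

label, confidence = classify("free prize now")
print(label, confidence)
```

Because LabelEncoder sorts its classes alphabetically, ham maps to 0 and spam to 1, which is why the first probability column is read as ham and the second as spam.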

Complete Implementation with Python-Flask Module

import pickle
from flask import Flask, jsonify

# load the label encoder, count vectorizer and classifier once at startup
with open('parameters.pickle', "rb") as f:
    Le, cv = pickle.load(f)
with open('classifier.pickle', "rb") as m:
    clf = pickle.load(m)

# function to predict result
def model_prediction(usr_txt):
    # convert the text into the count-vector format the classifier expects
    cv_text = cv.transform([usr_txt]).toarray()
    pred_res = clf.predict(cv_text)
    # class probabilities, computed once; assumes the classes are ordered ham, spam
    proba = clf.predict_proba(cv_text)[0]
    return jsonify(pred_label=str(Le.inverse_transform(pred_res)[0]),
                   confidence={'ham': float(proba[0]), 'spam': float(proba[1])},
                   input_text=usr_txt)

# flask implementation
app = Flask(__name__)

@app.route("/")
def home():
    return '''Created & Distributed by Pykit: https://pykit.org/'''

@app.route("/smsPredict/<string:txt>")
def smsPredict(txt):
    return model_prediction(txt)

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

Execute the above Python script

Run the script, then click the link shown in the console to open the home (index) page.

Provide a Text and Press Enter To Get Results:

Provide A Text- Sample
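
Because the text travels as part of the URL path, spaces and punctuation need to be percent-encoded when the request is built outside a browser. The sketch below uses only the standard library. The helper name build_predict_url is hypothetical, and the base address http://127.0.0.1:5000 assumes Flask's default development host and port.

```python
from urllib.parse import quote

def build_predict_url(base_url, text):
    """Build the /smsPredict/<txt> URL for a message (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including "/"
    return base_url.rstrip("/") + "/smsPredict/" + quote(text, safe="")

# Assumed address: Flask's default development host and port.
url = build_predict_url("http://127.0.0.1:5000", "Free entry now!")
print(url)
```

With the server running, the resulting URL can be opened in a browser or fetched with urllib.request.urlopen(url). Messages that contain a slash may still fail to match the route, since Flask's string converter does not accept slashes.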

Result- JSON

Prediction Result- Spam or Ham

Explanation of JSON Result- Text Classification Model Result

  1. confidence– The probability that the Text Classification Model assigns to each label, Ham and Spam. For example, the above sample has around a 99.2% chance of being Ham and a 0.8% chance of being Spam.
  2. input_text– The text provided by the user.
  3. pred_label– The label predicted by the Text Classification Model.
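
On the client side, the JSON can be parsed and the three fields read back. Here is a minimal sketch using a hand-written response whose confidence values mirror the example above. The input text is a made-up placeholder, since the sample message is not reproduced here.

```python
import json

# Hand-written response for illustration; not actual model output.
response_body = '''
{
  "confidence": {"ham": 0.992, "spam": 0.008},
  "input_text": "See you at lunch",
  "pred_label": "ham"
}
'''

result = json.loads(response_body)
confidence = result["confidence"]
# The label with the highest probability should match pred_label.
top_label = max(confidence, key=confidence.get)
print(f"{result['pred_label']} ({confidence[top_label]:.1%} confident)")
```

Comparing the highest-probability label with pred_label is a cheap sanity check that the two fields agree.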

The complete implementation of the code is available at my GitHub repository.

If you want to learn the complete steps of building an SMS Spam Classification Model from scratch, you can check out my recent article “Build Email Spam Classification Model (Naive Bayes Classifier)”.

Summary

In this article, we discussed how to use a pre-trained model to classify a given SMS text as Spam or Ham (not spam). For text classification, we used a Multinomial Naive Bayes model, written in Python, to predict the result. We also implemented the complete setup as a Flask-based REST API.