Text Classification Using Natural Language Processing and Deep Learning
Muhammad Zeeshan, Department of Computer Sciences, University of Engineering and Technology, Taxila, Pakistan.
Zeshan Iqbal, Department of Computer Sciences, University of Engineering and Technology, Taxila, Pakistan.
Corresponding Author:
Muhammad Zeeshan (im.zishan0303@gmail.com)
Abstract:
Text classification is a Natural Language Processing (NLP) task that assigns categories to text data such as sentences, documents, and questions. It has become increasingly important as large volumes of digital documents have become common, especially for businesses looking to increase productivity or profitability. Researchers are therefore keen to develop automated methods for the task. We propose a fully automated text classification method that employs Deep Learning (DL) frameworks, namely Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Two publicly available datasets are used in this study. First, the text data is cleaned by removing special characters and stop words that do not contribute to classification. The cleaned text is then tokenized and supplied to the frameworks for deep feature extraction and classification. Various experiments are conducted to find the optimal architectures and hyper-parameter values. The final architecture attained the highest validation accuracy, 98%, and can be deployed in real-time text classification scenarios.
Keywords:
Deep Learning; Text Classification; CNN; LSTM; BERT
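The cleaning and tokenization steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the function names (clean_text, build_vocab, encode), and the reserved ids for padding and unknown tokens are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the paper does not say which list it uses.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for"}


def clean_text(text):
    """Lowercase the text, replace special characters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id (assumed: 0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or pad to a fixed length for the model input."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


tokens = clean_text("The price of oil is rising!!!")
vocab = build_vocab([tokens])
encoded = encode(["oil", "gold"], vocab, max_len=4)
```

Fixed-length integer sequences of this kind are a common input format for LSTM and CNN classifiers; a BERT model would normally use its own subword tokenizer instead of a word-level vocabulary like this one.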