Purchase Prediction in E-Commerce Business using Machine Learning Approaches
Fahad Saeed, Department of Computer Sciences, University of Engineering and Technology, Taxila, Pakistan.
Farrukh Zeeshan, Department of Computer Sciences, University of Engineering and Technology, Taxila, Pakistan.
Corresponding Author:
Fahad Saeed (fahad.saeed6825@gmail.com)
Abstract:
As e-commerce platforms continue to grow at an exponential rate, accurately predicting consumer purchase behavior has become essential for companies seeking to improve marketing tactics, raise customer satisfaction, and increase revenue. Although online shopping is now routine for most people, understanding the behavior and intent of online visitors remains the subject of a long line of research. Understanding visitors' purchase intent is necessary to increase the number of visits that end in a purchase. To predict customer purchase intention, we apply several machine learning algorithms to a standard dataset and select the one with the highest accuracy. The first step is pre-processing the data. In this step, we apply a semi-supervised clustering technique because the data contains both complete and incomplete orders, and the resulting clusters are formed according to whether an order was completed. Orders may be left incomplete for several reasons, such as product price, low user demand for the product, or user rejection. Overall, most algorithms performed very well, particularly on the sequence and full datasets. The full dataset produced the best results, with the sequence dataset second, while the customer dataset produced by far the worst results across all algorithms. These results are valuable to management and decision makers because they provide a foundation for business decisions. The memory and processing limitations of a local machine constrained the amount of training data, the generation of padded sequence data for the recurrent neural network (RNN) input, the number of training iterations, and the hyper-parameters.
With regard to the training data, there was not enough of it, particularly for the feedforward neural network (FNN) and the RNN, two algorithms that need large amounts of data to identify patterns correctly and make accurate predictions. Using more data would most likely have improved the prediction results, at the cost of longer training times.
Keywords:
Machine Learning Approaches; Natural Language Processing (NLP); Support Vector Machines (SVMs); Logistic Regression (LR); Random Forest (RF).