Author Age and Gender Identification using Query Likelihood and Vector Space Models
Lala Rukh, Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture Peshawar, Pakistan.
Muhammad Arshad, Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture Peshawar, Pakistan.
Bilal Khan, Department of Computer Science, City University of Science and Information Technology Peshawar, Pakistan.
Asfandyar Khan, Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture Peshawar, Pakistan.
Mohib Ullah, Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture Peshawar, Pakistan.
Sana Zahir, Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture Peshawar, Pakistan.
Corresponding Author:
Lala Rukh (rukh82@aup.edu.pk)
Abstract:
Author profiling is a branch of information retrieval in which distinctive traits of an author, such as native language, gender, and age, are inferred from the author's writing. Various text analysis methods are used to extract this information, for example to identify authors on social media and in Short Message Service (SMS) texts. Author profiling supports security and marketing applications by capturing authors' writing behavior in emails, posts, comments, blogs, tweets, and chat logs. Most work in this area has been done for English and other regional languages. Roman Urdu is also receiving attention for the author profiling task, but existing approaches must convert Roman Urdu to English in order to extract important features, such as those based on Named Entity Recognition (NER) and other semantic features. This conversion may lose important information, and translating from one language to another has its own limitations. This research explores machine learning techniques that can be applied to any language, overcoming the conversion limitation. The Vector Space Model (VSM) and the Query Likelihood (QL) model are used to identify the author's age and gender. Experimental results show that QL produces better results in terms of accuracy.
Keywords:
Vector Space Model; Query Likelihood Model; Information Retrieval (IR); Text Mining; Author Profiling; Security