As the world population grows, so does the demand for workers, and online job advertisements are increasingly used to connect employers with potential employees on a national scale. This shift also exposes job seekers to the risk of fraud. Reported commercial crimes in Malaysia rose by 15.3% in 2021, with fraud the most frequently reported among them. Several studies have proposed Machine Learning models to classify genuine and fraudulent job advertisements, but the analysis of certain techniques remains limited. This paper aims to develop a predictive model for identifying fraudulent job advertisements using selected features from imbalanced and balanced datasets. The Employment Scam Aegean Dataset was used to build Machine Learning classification models with the Logistic Regression, Support Vector Machine, Decision Tree, and Naïve Bayes algorithms. These models were combined with different vectorizers, such as Term Frequency-Inverse Document Frequency, Bag of Words, and Hash. The Decision Tree model with the Bag of Words vectorizer on a balanced dataset outperformed the other models, achieving an accuracy of 0.705, precision of 0.73, recall of 0.70, F1-score of 0.71, and Area Under Curve score of 0.68. This model shows promise for identifying fraudulent job advertisements and helping to safeguard job seekers from scams in the online job market.
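To make the terms in the abstract concrete, the following minimal Python sketch illustrates three of the building blocks it names: a Bag of Words vectorizer, a balanced dataset, and the precision, recall, and F1-score metrics. It is an illustration only, not the authors' pipeline. The function names, the whitespace tokenizer, and the toy advertisements are hypothetical, and random undersampling is an assumed balancing method, since the abstract does not say how the balanced dataset was produced.

```python
import random
from collections import Counter


def bag_of_words(texts):
    """Map each text to a vector of raw token counts over a shared vocabulary."""
    tokenized = [text.lower().split() for text in texts]  # naive whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocab, vectors


def undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many as the rarest class.

    Assumption: the abstract does not name a balancing method; random
    undersampling is used here purely for illustration.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n = min(len(items) for items in by_class.values())
    balanced = [(s, y) for y, items in by_class.items() for s in rng.sample(items, n)]
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive (fraudulent) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy advertisements: label 1 = fraudulent, 0 = genuine.
ads = ["earn money fast", "software engineer role", "accountant needed", "data analyst role"]
labels = [1, 0, 0, 0]

balanced_ads, balanced_labels = undersample(ads, labels)
vocab, vectors = bag_of_words(balanced_ads)
```

The resulting count vectors could then be passed to any classifier, such as a decision tree. The scores reported in the abstract come from the authors' experiments and are not reproduced by this sketch.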