Cross-site scripting (XSS) attacks are among the main security threats facing current web applications, and attackers often use multi-layer encoding and obfuscation techniques to evade traditional detection mechanisms. This paper proposes an improved feature modeling and preprocessing method that addresses two issues: incomplete semantic restoration of multiply obfuscated scripts in the preprocessing stage, and neglect of cross-domain contextual associations between HTML and JavaScript in the feature modeling stage. The core of the method comprises: 1) a layered trigger decoding mechanism that achieves efficient and accurate semantic restoration of complex obfuscated payloads through a detection-driven recursive decoding strategy; and 2) a domain-aware feature modeling method that combines HTML structural features with JavaScript behavioral features to construct a joint feature representation capturing cross-domain semantics. To verify the effectiveness of the method, a dataset of 30,000 samples was crawled and annotated from the open-source XSS vulnerability library XSSed and real network traffic. Experimental results on this dataset show that the proposed method outperforms traditional preprocessing pipelines in terms of accuracy, recall, and F1 score. In addition, compared with mainstream deep learning models, the method reduces inference time to only 2.1 ms/sample while maintaining a high F1 score of 99.59%, demonstrating its practical value and potential in web security applications that require high real-time performance and low resource consumption.
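To illustrate the idea of detection-driven recursive decoding, the following is a minimal Python sketch and not the authors' implementation. It assumes three encoding layers (URL percent-encoding, HTML entities, JavaScript `\u`/`\x` escapes). The trigger patterns, the names `TRIGGERS`, `decode_js_escapes`, and `layered_decode`, and the `max_depth` limit are all hypothetical choices made for this example. A decoder runs only when its pattern is detected in the payload, and decoding recurses until no trigger changes the text.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical pattern for JavaScript \uXXXX and \xXX escapes.
_JS_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\x([0-9a-fA-F]{2})")


def decode_js_escapes(text: str) -> str:
    """Replace \\uXXXX and \\xXX escapes with the characters they encode."""
    return _JS_ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


# Assumed trigger table: (layer name, detection pattern, decoder).
# A decoder is invoked only if its detection pattern matches the payload.
TRIGGERS = [
    ("url", re.compile(r"%[0-9a-fA-F]{2}"), unquote),
    ("html_entity",
     re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);"),
     html.unescape),
    ("js_escape", _JS_ESCAPE, decode_js_escapes),
]


def layered_decode(payload: str, depth: int = 0, max_depth: int = 10) -> str:
    """Recursively strip encoding layers until no trigger changes the text."""
    if depth >= max_depth:  # guard against pathological inputs
        return payload
    for _name, pattern, decoder in TRIGGERS:
        if pattern.search(payload):
            decoded = decoder(payload)
            if decoded != payload:
                # One layer was removed; look for further layers underneath.
                return layered_decode(decoded, depth + 1, max_depth)
    return payload


if __name__ == "__main__":
    # Double URL-encoded script tag.
    print(layered_decode("%253Cscript%253Ealert(1)%253C/script%253E"))
    # HTML entities wrapping a JavaScript unicode escape.
    print(layered_decode("&lt;img src=x onerror=\\u0061lert(1)&gt;"))
```

Because each decoder is gated by a cheap pattern check, clean inputs pass through without any decoding work, which is one plausible reading of how a trigger-based design keeps preprocessing efficient. A real system would likely need more layers (for example Base64 or `String.fromCharCode` sequences) and stricter validation of decoded output.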
Li, H., & Kamsin, A. (2025). Research on Feature Modeling and Preprocessing Methods for Cross-Site Scripting Attack Detection. International Journal of Academic Research in Business and Social Sciences, 15(11), 630–643.
Copyright: © 2025 The Author(s)
Published by Knowledge Words Publications (www.kwpublications.com)
This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at: http://creativecommons.org/licences/by/4.0/legalcode