Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering
- Vélez de Mendizabal, Iñaki 1
- Vidriales, Xabier 1
- Basto-Fernandes, Vitor 2
- Ezpeleta, Enaitz 1
- Méndez, José Ramón 3, 4, 5
- Zurutuza, Urko 1
- 1 Universidad de Mondragón/Mondragon Unibertsitatea
- 2 Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL
- 3 Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Universidade de Vigo
- 4 CINBIO - Biomedical Research Centre, Universidade de Vigo
- 5 SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), Hospital Álvaro Cunqueiro, Bloque Técnico
ISSN: 1989-1660
Year of publication: 2023
Volume: 8
Issue: 4
Pages: 46-55
Type: Article
Journal: International Journal of Interactive Multimedia and Artificial Intelligence
Abstract
The evolution of anti-spam filters has forced spammers to make greater efforts to bypass filters in order to distribute content over networks. The distribution of content encoded in images or the use of Leetspeak are concrete and clear examples of techniques currently used to bypass filters. Despite the importance of dealing with these problems, the number of studies to solve them is quite small, and the reported performance is very limited. This study reviews the work done so far (very rudimentary) for Leetspeak deobfuscation and proposes a new technique based on using neural networks for decoding purposes. In addition, we distribute an image database specifically created for training Leetspeak decoding models. We have also created and made available four different corpora to analyse the performance of Leetspeak decoding schemes. Using these corpora, we have experimentally evaluated our neural network approach for decoding Leetspeak. The results obtained have shown the usefulness of the proposed model for addressing the deobfuscation of Leetspeak character sequences.
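For readers unfamiliar with the problem, the sketch below shows what Leetspeak obfuscation looks like and why a fixed lookup table falls short. It is not the neural network approach proposed in the article, and it is not taken from the article. The table `LEET_TABLE` and the function `naive_deleet` are hypothetical names introduced here for illustration only, and the table covers just a handful of common substitutions.

```python
# Illustrative only: a fixed one-symbol-to-one-letter table.
# LEET_TABLE and naive_deleet are hypothetical names, not from the article.
LEET_TABLE = {
    "4": "a", "@": "a",
    "3": "e",
    "1": "i", "!": "i",
    "0": "o",
    "5": "s", "$": "s",
    "7": "t",
}


def naive_deleet(text: str) -> str:
    """Lowercase the text and replace each known symbol with one fixed letter."""
    return "".join(LEET_TABLE.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    # Simple substitutions are recovered.
    print(naive_deleet("fr33 m0n3y"))  # free money
    # "1" is ambiguous (i or l), so a fixed table guesses wrong here.
    print(naive_deleet("c1ick"))       # ciick
    # Genuine digits and symbols are corrupted as well.
    print(naive_deleet("win $100"))    # win sioo
```

The last two calls show the limits of this kind of table: one symbol can stand for several letters, and digits that are meant literally get rewritten. Decoding such sequences correctly requires something beyond a static mapping.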