Transformer-based models have emerged as a powerful solution for network traffic classification, achieving high accuracy by autonomously learning patterns in raw traffic data. However, their high computational costs make real-time deployment impractical. In contrast, industry-proven tools like Snort and Suricata offer efficient network analysis but rely on manually crafted signatures, resulting in slower updates and limited adaptability to emerging threats. In this work, we propose a cascading model that leverages the strengths of both approaches. During training, a transformer-based model learns traffic patterns; these patterns are then extracted via SHAP analysis and used to enrich the knowledge base of a signature-based IDS. In deployment, the IDS handles routine classifications, while only complex cases are escalated to the transformer model. Our experiments, which combine SHAP analysis of ET-BERT with Snort, demonstrate a four-fold performance improvement over running ET-BERT alone, without compromising false positive or false negative rates.
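
To make the deployment-time cascade concrete, the sketch below shows one possible shape of the escalation logic: a fast signature stage (standing in for Snort rules derived offline from SHAP-ranked features) handles traffic it can match, and only unmatched flows are passed to the transformer (standing in for ET-BERT). All class names, callbacks, and the toy rule set are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of the cascading inference path described above.
# The signature stage stands in for Snort; the transformer stage stands in
# for ET-BERT. Names and placeholder logic are assumptions for illustration.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    label: str   # e.g. "benign", "malware-c2"
    source: str  # "signature" or "transformer"


class CascadingClassifier:
    def __init__(
        self,
        signature_match: Callable[[bytes], Optional[str]],
        transformer_classify: Callable[[bytes], str],
    ):
        # Fast path: rule lookup built offline from SHAP-extracted patterns.
        self.signature_match = signature_match
        # Slow path: full transformer inference, used only on escalation.
        self.transformer_classify = transformer_classify

    def classify(self, flow: bytes) -> Verdict:
        label = self.signature_match(flow)
        if label is not None:
            # Routine case: the signature stage decides, no escalation needed.
            return Verdict(label=label, source="signature")
        # Complex or unseen case: escalate to the transformer model.
        return Verdict(label=self.transformer_classify(flow), source="transformer")


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    rules = {b"GET /index.html": "benign"}
    cascade = CascadingClassifier(
        signature_match=lambda flow: next(
            (lbl for sig, lbl in rules.items() if sig in flow), None
        ),
        transformer_classify=lambda flow: "suspicious",  # placeholder model call
    )
    print(cascade.classify(b"GET /index.html HTTP/1.1"))      # handled by signatures
    print(cascade.classify(b"\x16\x03\x01 unknown TLS blob"))  # escalated
```

Because routine traffic never reaches the transformer, the expensive model runs on only a small fraction of flows, which is where the reported speedup over running ET-BERT alone comes from.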