This repository contains all the digital artifacts associated with the paper "AutoFlow: An Autoencoder-based Approach for IP Flow Record Compression with Minimal Impact on Traffic Classification."
The dataset used in this work is available in the IFLforTFC repository. VanillaAE.ipynb downloads it automatically, but you can also obtain it manually:
- Direct link: dataset.parquet
- Contents: 3,163,462 IP flow records with 72 features and application labels for 10 traffic classes
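For a manual setup, the download step can be approximated with a small helper that fetches the file only when it is not already present. This is a minimal sketch using the standard library, not the notebook's own download code; `fetch_if_missing` is a hypothetical helper, and the dataset URL is deliberately left as a placeholder to be filled in from the direct link above.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def fetch_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless a non-empty file already exists there."""
    path = Path(dest)
    if path.exists() and path.stat().st_size > 0:
        return path  # reuse the local copy, no network access

    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".part")
    # Stream to a temporary file so an interrupted download
    # never leaves a truncated dataset.parquet behind.
    with urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(path)
    return path


# Example call (DATASET_URL is a placeholder, not a real address):
# fetch_if_missing(DATASET_URL, "dataset.parquet")
```

Once the file is in place, it can be loaded with `pandas.read_parquet("dataset.parquet")`.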
1. VanillaAE
This Jupyter Notebook contains all resources related to the Vanilla Autoencoder used in the paper. It includes the entire workflow:
- Data Preprocessing: Outlier removal and robust scaling of the flow features.
- Autoencoder Architecture: The Vanilla Autoencoder model used for IP flow record compression.
- Training Process: Training configuration, loss function, optimizer, and training duration.
- Results Analysis: Visualizations and metrics of model performance.
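To make the preprocessing step concrete, the sketch below shows one common way to combine outlier removal with robust scaling. It is an illustration under stated assumptions rather than the notebook's exact procedure: the interquartile-range (IQR) rule and the threshold `k=1.5` are assumed here, and the function names are hypothetical.

```python
import numpy as np


def remove_outliers_iqr(X: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule and k=1.5 are illustrative choices only.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    keep = np.all((X >= lower) & (X <= upper), axis=1)
    return X[keep]


def robust_scale(X: np.ndarray) -> np.ndarray:
    """Center each feature on its median and divide by its IQR."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    iqr = np.where(iqr == 0, 1.0, iqr)  # leave constant features unscaled
    return (X - median) / iqr


# Toy data: one feature with a single extreme value.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
clean = remove_outliers_iqr(X)
scaled = robust_scale(clean)
```

With default settings, `sklearn.preprocessing.RobustScaler` applies the same median and IQR transformation and additionally stores the fitted statistics, which is convenient when the scaling learned on training data must be reused on test data.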
2. VanillaAEvsDAE
This Jupyter Notebook provides a comparative analysis between the Vanilla Autoencoder and the Denoising Autoencoder (DAE).
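The two models differ mainly in how training pairs are built: a vanilla autoencoder reconstructs its own input, while a denoising autoencoder receives a corrupted input and is trained to recover the clean one. The sketch below illustrates that difference only. Additive Gaussian noise and the value of `sigma` are assumptions made for the example, and the function names are hypothetical; the notebook may corrupt inputs differently.

```python
import numpy as np


def add_gaussian_noise(x: np.ndarray, sigma: float = 0.1, seed: int = 0) -> np.ndarray:
    """Return a corrupted copy of x (assumed corruption: additive Gaussian noise)."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, size=x.shape)


def training_pair(x: np.ndarray, denoising: bool, sigma: float = 0.1, seed: int = 0):
    """Return (model_input, reconstruction_target) for one batch.

    Vanilla AE:   input = x,        target = x
    Denoising AE: input = noisy(x), target = x
    """
    if denoising:
        return add_gaussian_noise(x, sigma, seed), x
    return x, x


batch = np.zeros((4, 3))
vanilla_in, vanilla_target = training_pair(batch, denoising=False)
dae_in, dae_target = training_pair(batch, denoising=True, sigma=0.5)
```

In both cases the reconstruction loss is computed against the clean batch, so the comparison isolates the effect of input corruption.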
- Install dependencies:
  pip install torch scikit-learn pandas numpy matplotlib seaborn scipy joblib requests
- Run VanillaAE.ipynb: Downloads the dataset and reproduces all experiments and results from the paper.
- Run VanillaAEvsDAE.ipynb: Runs the comparative analysis between the Vanilla AE and the Denoising AE. Requires dataset.parquet (downloaded when VanillaAE.ipynb is run).
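Before opening the notebooks, it can help to confirm that every dependency is importable in the active environment. The following is an optional sketch, not part of the repository; `missing_packages` is a hypothetical helper. Note that the pip package scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> top-level import name
REQUIRED = {
    "torch": "torch",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "joblib": "joblib",
    "requests": "requests",
}


def missing_packages(required=None):
    """Return the pip names of packages whose import name cannot be found."""
    required = REQUIRED if required is None else required
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


# Example: print(missing_packages()) shows an empty list when everything is installed.
```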
Both notebooks include pre-computed cell outputs matching the final results reported in the paper.