@datajoely, no, not yet. I may have a play around soon.
@Nok Lam Chan the CSV files are converted from pcap files, where each pcap file represents a TCP session (or "flow"). Some of these sessions are huge in the source datasets, which unfortunately can't be helped. I will sample from these CSV files later in the process, so the data samples I use to train my models will only contain a fraction of each large file (in terms of my original problem statement, this happens in the Y node).
My reasons for not wanting to move away from CSV files are:
1. this is what the current code is doing, and I would prefer to stick more closely with that for now
2. some encrypted network traffic source datasets come preprocessed in CSV files (i.e. no pcap files are provided) where each CSV file represents a session
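
For illustration, here is a rough sketch of how sampling a fraction of a large session file could work without loading the whole CSV into memory. It uses reservoir sampling, which is just one possible approach and not necessarily what the current code does. The function name `sample_csv_rows` and its parameters `path`, `k`, and `seed` are made up for this example, and it assumes each CSV has a header row.

```python
import csv
import random


def sample_csv_rows(path, k, seed=None):
    """Return (header, rows): a uniform random sample of up to k data rows.

    Hypothetical helper: reads the file once, row by row, so memory use
    stays proportional to k rather than to the size of the session file.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumes the first row is a header
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Replace an existing entry with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = row
    return header, reservoir
```

If a file has fewer than `k` rows, all of its rows are returned, so small sessions pass through unchanged. Note that the sampled rows are not guaranteed to stay in their original order.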