Hadoop (https://hadoop.apache.org) is a big data processing framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Due to the increasing importance of Hadoop in industry, it has been widely studied in the literature.
@@ -12,8 +15,6 @@ Each application has been run for several times, simulating both normal and abno
We provide the labeled abnormal/normal job IDs in `abnormal_label.txt`.
### Download
The raw logs are available for downloading at https://github.com/logpai/loghub.
### Citation
If you use this dataset from loghub in your research, please cite the following paper.