@@ -49,6 +49,23 @@ To the best of our knowledge, this is the largest competition-validated dataset
Both datasets are reusable beyond microservice RCA. The time-aligned Metrics, Logs, and Traces support multimodal time-series anomaly detection. RCA100's causal-chain labels and entity-relation graphs provide ground truth for causal discovery and graph neural network methods. AIOps2025's per-modality key-evidence labels are a rare resource for studying where expert and agent reasoning paths diverge, and for agent self-evaluation.
## Citation
If you use AgenticOpsEval in your research, please cite the accompanying paper ([arXiv:2606.29193](https://arxiv.org/abs/2606.29193)):
```bibtex
@misc{cai2026agenticopseval,
title={A Multi-Dataset Benchmark for Evaluating LLM Agents in Microservice Failure Diagnosis},
author={Cai, Yuanhong and Nie, Xiaohui and Yin, Kanglin and Pei, Changhua and Sun, Yongqian and Zhang, Shenglin and Liu, Haibin and Liu, Guiyang and Wen, Xidao and Situ, Fang and Pei, Dan},