Merge branch 'master' into 'main' (7cd85ba6) · 提交 · AIOps-NanKai / model / Nezha

.gitignore

0 → 100644

+2 −0

原始行号	差异行号	差异行
		.vscode/
		__pycache__
		No newline at end of file

FSE2023_Nezha.pdf

0 → 100644

+3.6 MB

添加文件。

此文件类型的文件没有差异预览。

查看文件

INSTALL.md

0 → 100644

+73 −0

原始行号	差异行号	差异行
		# Quick Start

		### 1.1 Requirements

		- Python3.6 is recommended to run the anomaly detection. Otherwise, any python3 version should be fine.
		- Git is also needed.

		### 1.2 Setup

		download `Nezha` first by `git clone git@github.com:IntelligentDDS/Nezha.git`

		`python3.6 -m pip install -r requirements.txt` to install the dependency for Nezha

		### 1.3 Running Nezha

		#### 1.3.1 Localize OnlineBoutique at service level


		```
		python3.6 ./main.py --ns hipster --level service

		pattern_ranker.py:622: -------- hipster Fault numbuer : 56-------
		pattern_ranker.py:623: --------AS@1 Result-------
		pattern_ranker.py:624: 92.857143 %
		pattern_ranker.py:625: --------AS@3 Result-------
		pattern_ranker.py:626: 96.428571 %
		pattern_ranker.py:627: --------AS@5 Result-------
		pattern_ranker.py:628: 96.428571 %
		```

		#### 1.3.2 Localize OnlineBoutique at inner service level

		```
		python3.6 ./main.py --ns hipster --level inner

		pattern_ranker.py:622: -------- hipster Fault numbuer : 56-------
		pattern_ranker.py:623: --------AIS@1 Result-------
		pattern_ranker.py:624: 92.857143 %
		pattern_ranker.py:625: --------AIS@3 Result-------
		pattern_ranker.py:626: 96.428571 %
		pattern_ranker.py:627: --------AIS@5 Result-------
		pattern_ranker.py:628: 96.428571 %
		```

		#### 1.3.3 Localize Trainticket at service level

		```
		python3.6 ./main.py --ns ts --level service

		pattern_ranker.py:622: -------- ts Fault numbuer : 45-------
		pattern_ranker.py:623: --------AS@1 Result-------
		pattern_ranker.py:624: 86.666667 %
		pattern_ranker.py:625: --------AS@3 Result-------
		pattern_ranker.py:626: 97.777778 %
		pattern_ranker.py:627: --------AS@5 Result-------
		pattern_ranker.py:628: 97.777778 %
		```

		#### 1.3.4 Localize Trainticket at inner service level

		```
		python3.6 ./main.py --ns ts --level inner

		pattern_ranker.py:622: -------- ts Fault numbuer : 45-------
		pattern_ranker.py:623: --------AIS@1 Result-------
		pattern_ranker.py:624: 86.666667 %
		pattern_ranker.py:625: --------AIS@3 Result-------
		pattern_ranker.py:626: 97.777778 %
		pattern_ranker.py:627: --------AIS@5 Result-------
		pattern_ranker.py:628: 97.777778 %
		```

		The details of service level results and inner-service level results will be printed and recorded in `./log`
		No newline at end of file

LICENSE

0 → 100644

+21 −0

原始行号	差异行号	差异行
		MIT License

		Copyright (c) 2023 IntelligentDDS

		Permission is hereby granted, free of charge, to any person obtaining a copy
		of this software and associated documentation files (the "Software"), to deal
		in the Software without restriction, including without limitation the rights
		to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
		copies of the Software, and to permit persons to whom the Software is
		furnished to do so, subject to the following conditions:

		The above copyright notice and this permission notice shall be included in all
		copies or substantial portions of the Software.

		THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
		IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
		FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
		AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
		LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
		OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
		SOFTWARE.
		No newline at end of file

README.md

0 → 100644

+175 −0

原始行号	差异行号	差异行
		# Nezha

		This repository is the basic implementation of our publication in `FSE'23` conference paper [Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data](./FSE2023_Nezha.pdf)

		## Description

		`Nezha` is an interpretable and fine-grained RCA approach that pinpoints root causes at the code region and resource type level by incorporative analysis of multimodal data. `Nezha` transforms heterogeneous multi-modal data into a homogeneous event representation and extracts event patterns by constructing and mining event graphs. The core idea of `Nezha` is to compare event patterns in the fault-free phase with those in the fault-suffering phase to localize root causes in an interpretable way.

		## Quick Start

		### Requirements

		- Python3.6 is recommended to run the anomaly detection. Otherwise, any python3 version should be fine.
		- Git is also needed.

		### Setup

		Download `Nezha` first via `git clone git@github.com:IntelligentDDS/Nezha.git`

		Enter `Nezha` content by `cd Nezha`

		`python3.6 -m pip install -r requirements.txt` to install the dependency for Nezha


		### Running Nezha

		#### OnlineBoutique at service level


		```
		python3.6 ./main.py --ns hipster --level service

		pattern_ranker.py:622: -------- hipster Fault numbuer : 56-------
		pattern_ranker.py:623: --------AS@1 Result-------
		pattern_ranker.py:624: 92.857143 %
		pattern_ranker.py:625: --------AS@3 Result-------
		pattern_ranker.py:626: 96.428571 %
		pattern_ranker.py:627: --------AS@5 Result-------
		pattern_ranker.py:628: 96.428571 %
		```

		#### OnlineBoutique at inner service level

		```
		python3.6 ./main.py --ns hipster --level inner

		pattern_ranker.py:622: -------- hipster Fault numbuer : 56-------
		pattern_ranker.py:623: --------AIS@1 Result-------
		pattern_ranker.py:624: 92.857143 %
		pattern_ranker.py:625: --------AIS@3 Result-------
		pattern_ranker.py:626: 96.428571 %
		pattern_ranker.py:627: --------AIS@5 Result-------
		pattern_ranker.py:628: 96.428571 %
		```

		#### Trainticket at service level

		```
		python3.6 ./main.py --ns ts --level service

		pattern_ranker.py:622: -------- ts Fault numbuer : 45-------
		pattern_ranker.py:623: --------AS@1 Result-------
		pattern_ranker.py:624: 86.666667 %
		pattern_ranker.py:625: --------AS@3 Result-------
		pattern_ranker.py:626: 97.777778 %
		pattern_ranker.py:627: --------AS@5 Result-------
		pattern_ranker.py:628: 97.777778 %
		```

		#### Trainticket at inner service level

		```
		python3.6 ./main.py --ns ts --level inner

		pattern_ranker.py:622: -------- ts Fault numbuer : 45-------
		pattern_ranker.py:623: --------AIS@1 Result-------
		pattern_ranker.py:624: 86.666667 %
		pattern_ranker.py:625: --------AIS@3 Result-------
		pattern_ranker.py:626: 97.777778 %
		pattern_ranker.py:627: --------AIS@5 Result-------
		pattern_ranker.py:628: 97.777778 %
		```

		The details of service level results and inner-service level results will be printed and recorded in `./log`

		## Dataset

		[2022-08-22](./rca_data/2022-08-22/) and [2022-08-23](./rca_data/2022-08-23/) is the fault-suffering dataset of OnlineBoutique


		[2023-01-29](./rca_data/2023-01-29/) and [2023-01-30](./rca_data/2023-01-30/) is the fault-suffering dataset of Trainticket

		### Fault-free data

		[construct_data](./construct_data/) is the data of fault-free phase

		[root_cause_hipster.json](./construct_data/root_cause_hipster.json) is the inner-servie level label of root causes in OnlineBoutique

		[root_cause_ts.json](./construct_data/root_cause_ts.json) is the inner-servie level label of root causes in Trainticket

		As an example,

		```
		"checkoutservice": {
		"return": "Start charge card_Charge successfully",
		"exception": "Start charge card_Charge successfully",
		"network_delay": "NetworkP90(ms)",
		"cpu_contention": "CpuUsageRate(%)",
		"cpu_consumed": "CpuUsageRate(%)"
		},
		```

		The label of `checkoutservice` means that the label `return` fault of `checkoutservice` is core regions between log statement contains `Start charge card` and `Charge successfully`.


		### Fault-suffering Data

		[rca_data](./rca_data/) is the data of fault-suffering phase

		[2022-08-22-fault_list](./rca_data/2022-08-22-fault_list) and [2022-08-23-fault_list](./rca_data/2022-08-23-fault_list) is the servie level label of root causes in OnlineBoutique

		[2023-01-29-fault_list](./rca_data/2022-01-29-fault_list) and [2022-01-30-fault_list](./rca_data/2022-01-30-fault_list) is the servie level label of root causes in TrainTicket


		## Project Structure
		```
		.
		├── LICENSE
		├── README.md
		├── construct_data
		│ ├── 2022-08-22
		│ │ ├── log
		│ │ ├── metric
		│ │ ├── trace
		│ │ └── traceid
		│ ├── 2022-08-23
		│ ├── 2023-01-29
		│ ├── 2023-01-30
		│ ├── root_cause_hipster.json: label at inner-service level for OnlineBoutique
		│ └── root_cause_ts.json: label at inner-service level for ts
		├── rca_data
		│ ├── 2022-08-22
		│ │ ├── log
		│ │ ├── metric
		│ │ ├── trace
		│ │ ├── traceid
		│ │ └── 2022-08-22-fault_list.json: label at service level
		│ ├── 2022-08-23
		│ ├── 2023-01-29
		│ └── 2023-01-30
		├── log: RCA result
		├── log_template: drain3 config
		├── alarm.py: generate alarm
		├── data_integrate.py: transform metric, log, and trace to event graph
		├── log_parsing.py: parsing logs
		├── log.py: record logs
		├── pattern_miner.py: mine patterns from event graph
		├── pattern_ranker.py: rank suspicious patterns
		├── main.py: running nezha
		└── requirements.txt

		```

		## Reference
		Please cite our FSE'23 paper if you find this work is helpful.
		```
		@inproceedings{nezha,
		title={Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data},
		author={Yu, Guangba and Chen, Pengfei and Li, Yufeng and Chen, Hongyang and Li, Xiaoyun and Zheng, Zibin},
		booktitle={ESEC/FSE 2023},
		pages={},
		year={2023},
		organization={ACM}
		}
		```
		No newline at end of file