Motor State Fault Classification: Prediction and Study
- date: 2020-12
- author: 小知
- describe:
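1. Dataset processing
- In the 电机状态.txt (motor state) dataset, the last column is the motor state label; all other columns are features.
- This dataset is suitable for learning classification algorithms on industrial data:
- (1) analyze how the features are distributed for each motor state;
- (2) build a classification model to get hands-on experience with classification algorithms.
Dataset download: motor state data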
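import pandas as pd
DATA_PATH = '../database/电机状态.txt'
SAVE_PATH = '../database/motor.csv'
# Read the raw text file; every line holds 48 feature values followed by the label
with open(DATA_PATH) as fp:
    # split() without arguments also tolerates repeated spaces and tabs
    data = [line.split() for line in fp if line.strip()]
df = pd.DataFrame(data)
# Column 48 (the last one) is the motor state label
df = df.rename(columns={48: "label"})
df.to_csv(SAVE_PATH, index=False)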
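The conversion above can also be sketched with `pd.read_csv`, which parses whitespace-separated columns directly into numeric dtypes. This is only an illustration: the helper name `load_whitespace_table` and the tiny sample file are assumptions made for the example, not part of the original workflow.

```python
import os
import tempfile

import pandas as pd


def load_whitespace_table(path, label_name="label"):
    """Read a headerless, whitespace-separated numeric file and name the last column."""
    df = pd.read_csv(path, sep=r"\s+", header=None)
    # The last column holds the class label, as in the motor dataset
    return df.rename(columns={df.columns[-1]: label_name})


# Tiny synthetic file standing in for the real data (3 features + label)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w") as fp:
        fp.write("0.1 0.2  0.3 1\n-1.5e-06 2.0 3.0 2\n")
    sample_df = load_whitespace_table(sample_path)

print(sample_df.dtypes)
```

Because the values are parsed as numbers right away, the resulting frame can be used directly or written out with `to_csv(..., index=False)` as in the original code.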
2. Import modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn import metrics
from sklearn import tree
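# Use the SimHei font so that Chinese text in figures renders correctly
plt.rcParams['font.family'] = 'SimHei'
# Keep the minus sign readable on the axes when a CJK font is active
plt.rcParams['axes.unicode_minus'] = False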
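3. Configuration parameters
# Path parameters
DATA_PATH = '../database/motor.csv'
SAVE_PATH = ''
# Model parameters (these values equal the scikit-learn defaults for DecisionTreeClassifier)
model_params = {
    "criterion": "gini",
    "splitter": "best",
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
}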
4. Load the dataset
df = pd.read_csv(DATA_PATH)
print(df.shape)
df.head()
(58509, 49)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -3.014600e-07 | 8.260300e-06 | -0.000012 | -0.000002 | -1.438600e-06 | -0.000021 | 0.031718 | 0.031710 | 0.031721 | -0.032963 | ... | -0.63308 | 2.9646 | 8.1198 | -1.4961 | -1.4961 | -1.4961 | -1.4996 | -1.4996 | -1.4996 | 1 |
1 | 2.913200e-06 | -5.247700e-06 | 0.000003 | -0.000006 | 2.778900e-06 | -0.000004 | 0.030804 | 0.030810 | 0.030806 | -0.033520 | ... | -0.59314 | 7.6252 | 6.1690 | -1.4967 | -1.4967 | -1.4967 | -1.5005 | -1.5005 | -1.5005 | 1 |
2 | -2.951700e-06 | -3.184000e-06 | -0.000016 | -0.000001 | -1.575300e-06 | 0.000017 | 0.032877 | 0.032880 | 0.032896 | -0.029834 | ... | -0.63252 | 2.7784 | 5.3017 | -1.4983 | -1.4983 | -1.4982 | -1.4985 | -1.4985 | -1.4985 | 1 |
3 | -1.322600e-06 | 8.820100e-06 | -0.000016 | -0.000005 | -7.282900e-07 | 0.000004 | 0.029410 | 0.029401 | 0.029417 | -0.030156 | ... | -0.62289 | 6.5534 | 6.2606 | -1.4963 | -1.4963 | -1.4963 | -1.4975 | -1.4975 | -1.4976 | 1 |
4 | -6.836600e-08 | 5.666300e-07 | -0.000026 | -0.000006 | -7.940600e-07 | 0.000013 | 0.030119 | 0.030119 | 0.030145 | -0.031393 | ... | -0.63010 | 4.5155 | 9.5231 | -1.4958 | -1.4958 | -1.4958 | -1.4959 | -1.4959 | -1.4959 | 1 |
5 rows × 49 columns
5. Dataset dimensions
df.shape
(58509, 49)
6. Dataset information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58509 entries, 0 to 58508
Data columns (total 49 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 58509 non-null float64
1 1 58509 non-null float64
2 2 58509 non-null float64
3 3 58509 non-null float64
4 4 58509 non-null float64
5 5 58509 non-null float64
6 6 58509 non-null float64
7 7 58509 non-null float64
8 8 58509 non-null float64
9 9 58509 non-null float64
10 10 58509 non-null float64
11 11 58509 non-null float64
12 12 58509 non-null float64
13 13 58509 non-null float64
14 14 58509 non-null float64
15 15 58509 non-null float64
16 16 58509 non-null float64
17 17 58509 non-null float64
18 18 58509 non-null float64
19 19 58509 non-null float64
20 20 58509 non-null float64
21 21 58509 non-null float64
22 22 58509 non-null float64
23 23 58509 non-null float64
24 24 58509 non-null float64
25 25 58509 non-null float64
26 26 58509 non-null float64
27 27 58509 non-null float64
28 28 58509 non-null float64
29 29 58509 non-null float64
30 30 58509 non-null float64
31 31 58509 non-null float64
32 32 58509 non-null float64
33 33 58509 non-null float64
34 34 58509 non-null float64
35 35 58509 non-null float64
36 36 58509 non-null float64
37 37 58509 non-null float64
38 38 58509 non-null float64
39 39 58509 non-null float64
40 40 58509 non-null float64
41 41 58509 non-null float64
42 42 58509 non-null float64
43 43 58509 non-null float64
44 44 58509 non-null float64
45 45 58509 non-null float64
46 46 58509 non-null float64
47 47 58509 non-null float64
48 label 58509 non-null int64
dtypes: float64(48), int64(1)
memory usage: 21.9 MB
7. Missing values in the dataset
df.isnull().sum()
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
41 0
42 0
43 0
44 0
45 0
46 0
47 0
label 0
dtype: int64
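With 49 columns, the per-column listing is long. A more compact check might look like the sketch below; the helper name `missing_summary` and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return (total missing cells, per-column counts for columns that have gaps)."""
    per_column = df.isnull().sum()
    return int(per_column.sum()), per_column[per_column > 0]


# Synthetic frame with one deliberately missing value
demo = pd.DataFrame({"0": [0.1, np.nan, 0.3], "1": [1.0, 2.0, 3.0], "label": [1, 2, 3]})
total, columns_with_gaps = missing_summary(demo)
print(total)
print(columns_with_gaps)
```

For the motor dataset, where every column reports zero missing values, the total would be 0 and the second result would be an empty Series.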
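8. Class balance check
# Count the samples per label, ordered by label value
label_counts = df['label'].value_counts().sort_index()
# Use the Series index and values directly: the column names produced by
# to_frame().reset_index() differ between pandas versions
plt.bar(label_counts.index, label_counts.values, label='Samples per label', color='r')
plt.legend()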
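Goal (1) from the dataset description, analyzing how the features are distributed for each motor state, can be approached with a grouped summary. The following is a minimal sketch on synthetic data: the frame `demo`, its two features and the class shift are assumptions for the example, and only the column name `label` is taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 2 features, 3 classes with 100 samples each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "0": rng.normal(size=300),
    "1": rng.normal(size=300),
    "label": np.repeat([1, 2, 3], 100),
})
# Shift feature "1" by class so that the per-class means differ visibly
demo["1"] += demo["label"] * 5.0

# Mean and standard deviation of every feature within each class
class_stats = demo.groupby("label").agg(["mean", "std"])
print(class_stats)
```

Applied to the real frame, the same `groupby("label")` call yields one row per motor state, which makes it easy to spot features whose mean or spread changes between states.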
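9. Split into training and validation sets
# Features are all columns except the last; the label column is the target
X, Y = df.iloc[:, :-1], df['label'].values
# Hold out 30% for validation; no random_state is set, so the split changes between runs
x_train, x_valid, y_train, y_valid = train_test_split(X, Y, test_size=0.3)
print(x_train.shape, x_valid.shape)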
(40956, 48) (17553, 48)
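The split above is random and unseeded. If a repeatable split with equal class proportions in both parts is wanted, `train_test_split` accepts `stratify` and `random_state`. The sketch below uses synthetic arrays (`X_demo`, `y_demo`) and an arbitrary seed of 0; it is not the split that produced the results in this article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1100 samples, 11 balanced classes labelled 1..11
X_demo = np.arange(1100 * 4, dtype=float).reshape(1100, 4)
y_demo = np.repeat(np.arange(1, 12), 100)

x_tr, x_va, y_tr, y_va = train_test_split(
    X_demo, y_demo,
    test_size=0.3,
    stratify=y_demo,   # keep class proportions equal in both parts
    random_state=0,    # fixed seed (arbitrary value) for a repeatable split
)
print(x_tr.shape, x_va.shape)
# Number of validation samples per class (index 0 is unused because labels start at 1)
print(np.bincount(y_va)[1:])
```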
10. Feature importance analysis
from sklearn import tree
# Fit a decision tree with default settings and read its impurity-based importances
m = tree.DecisionTreeClassifier()
m.fit(x_train, y_train)
imp = m.feature_importances_
print(">>> Feature importances:", imp)
plt.figure(figsize=(16,4))
plt.bar(np.arange(len(imp)), imp, 0.9, label='FeatureImportances', color='r')
# Tick labels count features from 1, whereas the DataFrame columns are named 0 to 47
plt.xticks(np.arange(len(imp)), np.arange(1, len(imp)+1).astype(str), rotation=90)
plt.xlabel('Features')
plt.ylabel('Importances')
plt.legend()
>>> Feature importances: [5.98440225e-04 3.08229107e-04 1.79175577e-04 1.38681841e-03
4.76279751e-04 1.64382030e-04 1.63175566e-01 1.42651633e-02
9.49281490e-02 2.27506770e-01 3.14778801e-02 2.20768156e-01
1.99832800e-02 1.64328257e-03 3.79598939e-04 4.09215028e-02
2.95897557e-04 6.24109633e-04 1.06046500e-02 6.31802097e-03
1.44326689e-02 1.74156237e-02 5.58853781e-03 6.33530587e-03
4.25448464e-02 2.51972101e-04 1.57817267e-04 1.22483584e-02
1.23145170e-04 1.46994902e-04 1.71503682e-02 2.49675915e-03
1.35631032e-03 8.04364163e-03 1.86114910e-02 3.98997444e-03
3.14444030e-03 2.12384190e-04 8.10758828e-05 3.74770769e-03
5.02345318e-04 2.75927645e-04 3.57242162e-04 9.54643152e-04
1.91576281e-03 5.65266378e-04 1.13399277e-03 2.10044738e-04]
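A raw array of 48 numbers is hard to scan; in the output above the largest values sit at columns 9, 11 and 6. One way to make that ranking explicit is to attach the column names and sort, as in this sketch. The data here is synthetic (only feature "2" determines the label), and the names `X_demo`, `y_demo`, `clf` and `ranked` are chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Synthetic stand-in: only feature "2" determines the label
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 4)), columns=["0", "1", "2", "3"])
y_demo = (X_demo["2"] > 0).astype(int)

clf = tree.DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Pair each importance with its column name and sort in descending order
ranked = pd.Series(clf.feature_importances_, index=X_demo.columns).sort_values(ascending=False)
print(ranked.head(3))
```

On the motor data, the same two lines applied to `m.feature_importances_` and `x_train.columns` would list the most informative columns first.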
11. Build the motor state prediction model
def train(model):
    # Fit on the training split
    model.fit(x_train, y_train)
    train_acc = model.score(x_train, y_train)
    # Evaluate on the held-out validation split
    y_pred = model.predict(x_valid)
    valid_acc = metrics.accuracy_score(y_valid, y_pred)
    valid_mat = metrics.confusion_matrix(y_valid, y_pred)
    valid_report = metrics.classification_report(y_valid, y_pred)
    print(">>> Training accuracy: {}".format(train_acc))
    print(">>> Validation accuracy: {}".format(valid_acc))
    print(">>> Validation confusion matrix:\n{}".format(valid_mat))
    print(">>> Validation classification report:\n{}".format(valid_report))
    print(">>> Training and evaluation finished....")
    return model
12. Evaluate model performance
dc = tree.DecisionTreeClassifier(**model_params)
model = train(model=dc)
>>> Training accuracy: 1.0
>>> Validation accuracy: 0.9835355779638808
>>> Validation confusion matrix:
[[1542 0 0 0 0 26 0 0 8 0 0]
[ 0 1630 0 0 0 0 0 0 0 38 0]
[ 1 0 1548 2 14 0 0 1 1 0 0]
[ 0 0 3 1572 3 0 2 3 0 0 0]
[ 0 1 16 4 1523 0 0 29 0 0 0]
[ 27 0 3 0 0 1552 0 0 17 0 0]
[ 0 0 0 2 0 0 1618 0 0 0 0]
[ 0 0 0 4 24 0 0 1538 0 0 0]
[ 4 2 0 0 1 30 0 0 1595 3 0]
[ 0 20 0 0 0 0 0 0 0 1618 0]
[ 0 0 0 0 0 0 0 0 0 0 1528]]
>>> Validation classification report:
              precision    recall  f1-score   support
           1       0.98      0.98      0.98      1576
           2       0.99      0.98      0.98      1668
           3       0.99      0.99      0.99      1567
           4       0.99      0.99      0.99      1583
           5       0.97      0.97      0.97      1573
           6       0.97      0.97      0.97      1599
           7       1.00      1.00      1.00      1620
           8       0.98      0.98      0.98      1566
           9       0.98      0.98      0.98      1635
          10       0.98      0.99      0.98      1638
          11       1.00      1.00      1.00      1528
    accuracy                           0.98     17553
   macro avg       0.98      0.98      0.98     17553
weighted avg       0.98      0.98      0.98     17553
>>> Training and evaluation finished....
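The reported metrics can be recomputed from the confusion matrix alone, which also shows which pair of states is confused most often. The sketch below copies the validation matrix printed above into a NumPy array; the variable names (`cm`, `off_diag`, and so on) are chosen for the example. In scikit-learn's convention, rows are true labels and columns are predicted labels.

```python
import numpy as np

# Validation confusion matrix copied from the output above
# (rows: true labels 1..11, columns: predicted labels 1..11)
cm = np.array([
    [1542, 0, 0, 0, 0, 26, 0, 0, 8, 0, 0],
    [0, 1630, 0, 0, 0, 0, 0, 0, 0, 38, 0],
    [1, 0, 1548, 2, 14, 0, 0, 1, 1, 0, 0],
    [0, 0, 3, 1572, 3, 0, 2, 3, 0, 0, 0],
    [0, 1, 16, 4, 1523, 0, 0, 29, 0, 0, 0],
    [27, 0, 3, 0, 0, 1552, 0, 0, 17, 0, 0],
    [0, 0, 0, 2, 0, 0, 1618, 0, 0, 0, 0],
    [0, 0, 0, 4, 24, 0, 0, 1538, 0, 0, 0],
    [4, 2, 0, 0, 1, 30, 0, 0, 1595, 3, 0],
    [0, 20, 0, 0, 0, 0, 0, 0, 0, 1618, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1528],
])
labels = np.arange(1, 12)

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all samples
recall = np.diag(cm) / cm.sum(axis=1)       # per true class
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class

# The largest off-diagonal cell marks the most frequent confusion
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)

print(f"accuracy = {accuracy:.4f}")
print(f"most frequent confusion: true {labels[i]} predicted as {labels[j]} ({off_diag[i, j]} samples)")
```

The diagonal sums to 17264 of 17553 validation samples, which reproduces the validation accuracy of about 0.9835. The errors are concentrated in a few pairs of states (for example 2 and 10, 6 and 9, 5 and 8) rather than spread evenly.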