Motor State Fault Classification and Prediction: A Study

Tags: fault classification, motor state prediction, fault diagnosis

- date: 2020-12
- author:小知
- describe:

1. Dataset preprocessing

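  • In the 电机状态.txt ("motor state") dataset, the last column is the motor state label and all other columns are features.
  • The dataset is well suited to learning classification algorithms on industrial data:
    • (1) analyze how the features are distributed across the different motor states;
    • (2) build a classification model to get hands-on experience applying classification algorithms.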

Dataset download: motor state data

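import pandas as pd


DATA_PATH = '../database/电机状态.txt'
SAVE_PATH = '../database/motor.csv'

# Each line holds 48 feature values followed by the state label.
# split() with no argument also copes with repeated spaces; `with` closes the file.
with open(DATA_PATH) as fp:
    data = [line.split() for line in fp if line.strip()]
df = pd.DataFrame(data)
# Column 48 (the 49th and last column) holds the label
df = df.rename(columns={48: "label"})
df.to_csv(SAVE_PATH, index=False)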
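The line-by-line parsing above can also be done with a single `pandas.read_csv` call using a whitespace separator. A minimal sketch, assuming the file is purely whitespace-delimited with no header row; the two-row inline `SAMPLE` (with only three toy features instead of 48) and the helper name `load_motor_txt` are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the layout of 电机状态.txt: whitespace-separated
# features followed by an integer state label (3 toy features instead of 48).
SAMPLE = """-3.0146e-07 8.2603e-06 0.031718 1
2.9132e-06  -5.2477e-06 0.030804 2
"""


def load_motor_txt(source):
    """Parse a whitespace-delimited, header-less file and name the last column 'label'."""
    df = pd.read_csv(source, sep=r"\s+", header=None)
    # The last column holds the motor state label; the others are features
    return df.rename(columns={df.columns[-1]: "label"})


df = load_motor_txt(io.StringIO(SAMPLE))
print(df.shape)  # (2, 4)
print(df.dtypes)
# With the real file (paths as above):
# load_motor_txt('../database/电机状态.txt').to_csv('../database/motor.csv', index=False)
```

Unlike the manual split, `read_csv` parses the columns straight into numeric dtypes and tolerates runs of several spaces, as the second sample row shows.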

2. Import modules

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn import metrics
from sklearn import tree

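# Use a CJK-capable font (SimHei) so Chinese text in plots renders, and draw
# minus signs as ASCII hyphens because SimHei lacks the Unicode minus glyph
plt.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False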

3. Configuration parameters

# Path parameters
DATA_PATH = '../database/motor.csv'
SAVE_PATH = ''

# Model parameters (passed to tree.DecisionTreeClassifier)
model_params = {
    "criterion": "gini",
    "splitter": "best",
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
}
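The values in `model_params` coincide with scikit-learn's `DecisionTreeClassifier` defaults, and `criterion="gini"` scores candidate splits by Gini impurity, 1 - sum_k p_k^2, where p_k is the fraction of samples of class k in a node. A small sketch that checks both; the helper `gini_impurity` is hypothetical and written only for illustration:

```python
import numpy as np
from sklearn import tree

model_params = {"criterion": "gini", "splitter": "best", "max_depth": None,
                "min_samples_split": 2, "min_samples_leaf": 1}

# Compare each configured value with the default of a freshly built classifier
defaults = tree.DecisionTreeClassifier().get_params()
matches = {name: defaults[name] == value for name, value in model_params.items()}
print(matches)


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a node holding `labels` (hypothetical helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


print(gini_impurity([1, 1, 1, 1]))  # pure node -> 0.0
print(gini_impurity([1, 2, 3, 4]))  # four equally frequent classes -> 0.75
```

Because `max_depth=None` lets the tree keep splitting until every leaf is pure or holds fewer than `min_samples_split` samples, a training accuracy of 1.0 (as reported in step 12) is typical with these settings.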

4. Load the dataset

df = pd.read_csv(DATA_PATH)
print(df.shape)
df.head()

(58509, 49)

              0             1         2         3             4         5         6         7         8         9  ...       39      40      41      42      43      44      45      46      47  label
0 -3.014600e-07  8.260300e-06 -0.000012 -0.000002 -1.438600e-06 -0.000021  0.031718  0.031710  0.031721 -0.032963  ... -0.63308  2.9646  8.1198 -1.4961 -1.4961 -1.4961 -1.4996 -1.4996 -1.4996      1
1  2.913200e-06 -5.247700e-06  0.000003 -0.000006  2.778900e-06 -0.000004  0.030804  0.030810  0.030806 -0.033520  ... -0.59314  7.6252  6.1690 -1.4967 -1.4967 -1.4967 -1.5005 -1.5005 -1.5005      1
2 -2.951700e-06 -3.184000e-06 -0.000016 -0.000001 -1.575300e-06  0.000017  0.032877  0.032880  0.032896 -0.029834  ... -0.63252  2.7784  5.3017 -1.4983 -1.4983 -1.4982 -1.4985 -1.4985 -1.4985      1
3 -1.322600e-06  8.820100e-06 -0.000016 -0.000005 -7.282900e-07  0.000004  0.029410  0.029401  0.029417 -0.030156  ... -0.62289  6.5534  6.2606 -1.4963 -1.4963 -1.4963 -1.4975 -1.4975 -1.4976      1
4 -6.836600e-08  5.666300e-07 -0.000026 -0.000006 -7.940600e-07  0.000013  0.030119  0.030119  0.030145 -0.031393  ... -0.63010  4.5155  9.5231 -1.4958 -1.4958 -1.4958 -1.4959 -1.4959 -1.4959      1

5 rows × 49 columns

5. Dataset dimensions

df.shape

(58509, 49)

6. Dataset information

df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 58509 entries, 0 to 58508
    Data columns (total 49 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   0       58509 non-null  float64
     1   1       58509 non-null  float64
     2   2       58509 non-null  float64
     3   3       58509 non-null  float64
     4   4       58509 non-null  float64
     5   5       58509 non-null  float64
     6   6       58509 non-null  float64
     7   7       58509 non-null  float64
     8   8       58509 non-null  float64
     9   9       58509 non-null  float64
     10  10      58509 non-null  float64
     11  11      58509 non-null  float64
     12  12      58509 non-null  float64
     13  13      58509 non-null  float64
     14  14      58509 non-null  float64
     15  15      58509 non-null  float64
     16  16      58509 non-null  float64
     17  17      58509 non-null  float64
     18  18      58509 non-null  float64
     19  19      58509 non-null  float64
     20  20      58509 non-null  float64
     21  21      58509 non-null  float64
     22  22      58509 non-null  float64
     23  23      58509 non-null  float64
     24  24      58509 non-null  float64
     25  25      58509 non-null  float64
     26  26      58509 non-null  float64
     27  27      58509 non-null  float64
     28  28      58509 non-null  float64
     29  29      58509 non-null  float64
     30  30      58509 non-null  float64
     31  31      58509 non-null  float64
     32  32      58509 non-null  float64
     33  33      58509 non-null  float64
     34  34      58509 non-null  float64
     35  35      58509 non-null  float64
     36  36      58509 non-null  float64
     37  37      58509 non-null  float64
     38  38      58509 non-null  float64
     39  39      58509 non-null  float64
     40  40      58509 non-null  float64
     41  41      58509 non-null  float64
     42  42      58509 non-null  float64
     43  43      58509 non-null  float64
     44  44      58509 non-null  float64
     45  45      58509 non-null  float64
     46  46      58509 non-null  float64
     47  47      58509 non-null  float64
     48  label   58509 non-null  int64  
    dtypes: float64(48), int64(1)
    memory usage: 21.9 MB
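The `memory usage: 21.9 MB` line can be checked by hand: all 49 columns are 8-byte numeric types (48 `float64` plus one `int64`), and pandas' `info()` reports sizes in units of 1024 bytes. A quick arithmetic sketch (the `RangeIndex` adds only a negligible amount and is ignored here):

```python
# Rows and columns from df.info(); float64 and int64 both use 8 bytes per value
n_rows, n_cols, bytes_per_value = 58509, 49, 8
data_bytes = n_rows * n_cols * bytes_per_value  # 22,935,528 bytes
data_mb = data_bytes / 1024 ** 2                # info() divides by 1024 per unit
print(data_bytes, round(data_mb, 1))            # 22935528 21.9
```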

7. Missing values in the dataset

df.isnull().sum()
    0        0
    1        0
    2        0
    3        0
    4        0
    5        0
    6        0
    7        0
    8        0
    9        0
    10       0
    11       0
    12       0
    13       0
    14       0
    15       0
    16       0
    17       0
    18       0
    19       0
    20       0
    21       0
    22       0
    23       0
    24       0
    25       0
    26       0
    27       0
    28       0
    29       0
    30       0
    31       0
    32       0
    33       0
    34       0
    35       0
    36       0
    37       0
    38       0
    39       0
    40       0
    41       0
    42       0
    43       0
    44       0
    45       0
    46       0
    47       0
    label    0
    dtype: int64

8. Is the label distribution balanced?

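# Count samples per label, sorted by label so the bars appear in class order.
# Plotting the Series directly avoids relying on reset_index() column names,
# which changed in pandas 2.0 ('index'/'label' became 'label'/'count').
label_counts = df['label'].value_counts().sort_index()
plt.bar(label_counts.index, label_counts.values, label='Samples per label', color='r')
plt.xlabel('label')
plt.ylabel('count')
plt.legend()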
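A bar chart is easy to eyeball, but a single number such as the ratio between the largest and the smallest class count makes "balanced" concrete. A minimal sketch with a hypothetical helper `class_balance` and toy labels (pass `df['label']` for the real data). For reference, the per-class validation supports in the classification report of step 12 range from 1528 to 1668, a ratio of about 1.09, so the classes look close to balanced.

```python
import pandas as pd


def class_balance(labels):
    """Return per-class counts, class proportions and the max/min count ratio (hypothetical helper)."""
    counts = pd.Series(labels).value_counts().sort_index()
    proportions = counts / counts.sum()
    imbalance_ratio = counts.max() / counts.min()  # 1.0 means perfectly balanced
    return counts, proportions, imbalance_ratio


# Toy labels for illustration; with the real data call class_balance(df['label'])
counts, proportions, ratio = class_balance([1, 1, 2, 2, 2, 3])
print(counts.tolist(), ratio)  # [2, 3, 1] 3.0
```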

9. Split into training and validation sets

X, Y = df.iloc[:, :-1], df['label'].values
x_train, x_valid, y_train, y_valid = train_test_split(X, Y, test_size=0.3)
print(x_train.shape, x_valid.shape)
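The split above is drawn at random with no seed, so the exact scores reported later will vary slightly from run to run. A sketch of a reproducible, stratified variant on hypothetical toy data with an 80/20 class mix; `stratify` and `random_state` are standard `train_test_split` parameters, and the seed value 42 is arbitrary. For the real data the call would be `train_test_split(X, Y, test_size=0.3, stratify=Y, random_state=42)`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, two imbalanced classes (80 vs 20)
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the class proportions the same in both splits;
# random_state fixes the shuffle so the split, and later scores, are reproducible
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Expect 56/14 samples per class in training and 24/6 in validation
print(np.bincount(y_train), np.bincount(y_valid))
```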

(40956, 48) (17553, 48)

10. Feature importance analysis

from sklearn import tree

# Fit a default decision tree and read its impurity-based feature importances
m = tree.DecisionTreeClassifier()
m.fit(x_train, y_train)
imp = m.feature_importances_
print(">>> Feature importances:", imp)
plt.figure(figsize=(16, 4))
plt.bar(np.arange(len(imp)), imp, 0.9, label='FeatureImportances', color='r')
# Label each bar with its actual column name ('0'..'47') rather than a 1-based index
plt.xticks(np.arange(len(imp)), x_train.columns, rotation=90)
plt.xlabel('Features')
plt.ylabel('Importances')
plt.legend()
    >>> Feature importances: [5.98440225e-04 3.08229107e-04 1.79175577e-04 1.38681841e-03
     4.76279751e-04 1.64382030e-04 1.63175566e-01 1.42651633e-02
     9.49281490e-02 2.27506770e-01 3.14778801e-02 2.20768156e-01
     1.99832800e-02 1.64328257e-03 3.79598939e-04 4.09215028e-02
     2.95897557e-04 6.24109633e-04 1.06046500e-02 6.31802097e-03
     1.44326689e-02 1.74156237e-02 5.58853781e-03 6.33530587e-03
     4.25448464e-02 2.51972101e-04 1.57817267e-04 1.22483584e-02
     1.23145170e-04 1.46994902e-04 1.71503682e-02 2.49675915e-03
     1.35631032e-03 8.04364163e-03 1.86114910e-02 3.98997444e-03
     3.14444030e-03 2.12384190e-04 8.10758828e-05 3.74770769e-03
     5.02345318e-04 2.75927645e-04 3.57242162e-04 9.54643152e-04
     1.91576281e-03 5.65266378e-04 1.13399277e-03 2.10044738e-04]
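The printed array is easier to interpret once sorted. A sketch that ranks the features and accumulates their share of the total importance, using the values printed above (they come from one unseeded tree, so they will shift between runs):

```python
import numpy as np

# Feature importances printed above, in column order '0'..'47'
imp = np.array([
    5.98440225e-04, 3.08229107e-04, 1.79175577e-04, 1.38681841e-03,
    4.76279751e-04, 1.64382030e-04, 1.63175566e-01, 1.42651633e-02,
    9.49281490e-02, 2.27506770e-01, 3.14778801e-02, 2.20768156e-01,
    1.99832800e-02, 1.64328257e-03, 3.79598939e-04, 4.09215028e-02,
    2.95897557e-04, 6.24109633e-04, 1.06046500e-02, 6.31802097e-03,
    1.44326689e-02, 1.74156237e-02, 5.58853781e-03, 6.33530587e-03,
    4.25448464e-02, 2.51972101e-04, 1.57817267e-04, 1.22483584e-02,
    1.23145170e-04, 1.46994902e-04, 1.71503682e-02, 2.49675915e-03,
    1.35631032e-03, 8.04364163e-03, 1.86114910e-02, 3.98997444e-03,
    3.14444030e-03, 2.12384190e-04, 8.10758828e-05, 3.74770769e-03,
    5.02345318e-04, 2.75927645e-04, 3.57242162e-04, 9.54643152e-04,
    1.91576281e-03, 5.65266378e-04, 1.13399277e-03, 2.10044738e-04,
])

order = np.argsort(imp)[::-1]       # column indices, most important first
cumulative = np.cumsum(imp[order])  # running share of the total importance
n_90 = int(np.searchsorted(cumulative, 0.9)) + 1  # features needed to reach 90%

print("top 4 features:", order[:4].tolist())              # [9, 11, 6, 8]
print("share of top 4:", round(float(cumulative[3]), 3))  # 0.706
print("features for 90%:", n_90)                          # 12
```

On these numbers, columns 9, 11, 6 and 8 carry about 71% of the total importance, and 12 of the 48 features cover 90%. scikit-learn's documentation cautions that impurity-based importances are computed from training data and can favor high-cardinality features; `sklearn.inspection.permutation_importance` evaluated on the validation split is a common cross-check.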

11. Build the motor state prediction model

def train(model):
    """Fit `model` on the training split, then print accuracy, confusion matrix
    and per-class metrics on the validation split."""
    model.fit(x_train, y_train)
    train_acc = model.score(x_train, y_train)
    y_pred = model.predict(x_valid)
    valid_acc = metrics.accuracy_score(y_valid, y_pred)
    valid_mat = metrics.confusion_matrix(y_valid, y_pred)
    valid_report = metrics.classification_report(y_valid, y_pred)
    print(">>> Training accuracy: {}".format(train_acc))
    print(">>> Validation accuracy: {}".format(valid_acc))
    print(">>> Validation confusion matrix:\n{}".format(valid_mat))
    print(">>> Validation classification report:\n{}".format(valid_report))
    print(">>> Training and evaluation finished.")
    return model

12. Evaluate the motor model's performance

dc = tree.DecisionTreeClassifier(**model_params)
model = train(model=dc)
    >>> Training accuracy: 1.0
    >>> Validation accuracy: 0.9835355779638808
    >>> Validation confusion matrix:
    [[1542    0    0    0    0   26    0    0    8    0    0]
     [   0 1630    0    0    0    0    0    0    0   38    0]
     [   1    0 1548    2   14    0    0    1    1    0    0]
     [   0    0    3 1572    3    0    2    3    0    0    0]
     [   0    1   16    4 1523    0    0   29    0    0    0]
     [  27    0    3    0    0 1552    0    0   17    0    0]
     [   0    0    0    2    0    0 1618    0    0    0    0]
     [   0    0    0    4   24    0    0 1538    0    0    0]
     [   4    2    0    0    1   30    0    0 1595    3    0]
     [   0   20    0    0    0    0    0    0    0 1618    0]
     [   0    0    0    0    0    0    0    0    0    0 1528]]
    >>> Validation classification report:
                  precision    recall  f1-score   support
    
               1       0.98      0.98      0.98      1576
               2       0.99      0.98      0.98      1668
               3       0.99      0.99      0.99      1567
               4       0.99      0.99      0.99      1583
               5       0.97      0.97      0.97      1573
               6       0.97      0.97      0.97      1599
               7       1.00      1.00      1.00      1620
               8       0.98      0.98      0.98      1566
               9       0.98      0.98      0.98      1635
              10       0.98      0.99      0.98      1638
              11       1.00      1.00      1.00      1528
    
        accuracy                           0.98     17553
       macro avg       0.98      0.98      0.98     17553
    weighted avg       0.98      0.98      0.98     17553
    
    >>> Training and evaluation finished.
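The confusion matrix can be summarized programmatically to see which classes the tree mixes up. A sketch using the matrix printed above (rows are true labels 1 to 11, columns are predicted labels); it recomputes the accuracy, finds the classes with the lowest recall and precision, and finds the pair of classes with the most mutual confusion:

```python
import numpy as np

# Validation confusion matrix from the output above (rows: true 1..11, cols: predicted 1..11)
cm = np.array([
    [1542,    0,    0,    0,    0,   26,    0,    0,    8,    0,    0],
    [   0, 1630,    0,    0,    0,    0,    0,    0,    0,   38,    0],
    [   1,    0, 1548,    2,   14,    0,    0,    1,    1,    0,    0],
    [   0,    0,    3, 1572,    3,    0,    2,    3,    0,    0,    0],
    [   0,    1,   16,    4, 1523,    0,    0,   29,    0,    0,    0],
    [  27,    0,    3,    0,    0, 1552,    0,    0,   17,    0,    0],
    [   0,    0,    0,    2,    0,    0, 1618,    0,    0,    0,    0],
    [   0,    0,    0,    4,   24,    0,    0, 1538,    0,    0,    0],
    [   4,    2,    0,    0,    1,   30,    0,    0, 1595,    3,    0],
    [   0,   20,    0,    0,    0,    0,    0,    0,    0, 1618,    0],
    [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0, 1528],
])
labels = np.arange(1, 12)

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per true class
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class

# Mutual confusion between classes i and j: cm[i, j] + cm[j, i]
pair = cm + cm.T
np.fill_diagonal(pair, 0)
i, j = np.unravel_index(np.argmax(np.triu(pair)), pair.shape)

print(f"accuracy={accuracy:.4f}")                                 # accuracy=0.9835
print("lowest recall: class", labels[np.argmin(recall)])          # 5
print("lowest precision: class", labels[np.argmin(precision)])    # 6
print("most confused pair:", labels[i], labels[j], pair[i, j])    # 2 10 58
```

Classes 1 and 6, and classes 5 and 8, follow closely with 53 mutual errors each. The gap between training accuracy (1.0) and validation accuracy (about 0.984) is what a fully grown tree tends to show; constraining `max_depth` or `min_samples_leaf` in `model_params` is a natural next experiment, although whether it helps on this data has not been tested here.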