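Learning the sklearn library: generating data, splitting it into folds, fitting classifiers, and analyzing their performance.
Steps
Data generation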
dataset = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)
data, target = dataset  # make_classification returns a (features, labels) tuple
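To make the shape of the generated data concrete, here is a minimal self-contained sketch of the same call. The `random_state=0` argument is an assumption added only for reproducibility; the original call does not set it.

```python
from sklearn import datasets

# random_state=0 is an assumption for reproducibility; the original call omits it.
data, target = datasets.make_classification(
    n_samples=1000, n_features=10, n_informative=2,
    n_redundant=2, n_repeated=0, n_classes=2, random_state=0,
)

print(data.shape)    # feature matrix: one row per sample, one column per feature
print(target.shape)  # label vector: one class label per sample
```

Of the 10 features, only 2 are informative and 2 are linear combinations of those; the rest are noise, which is what makes the dataset a reasonable test for comparing classifiers.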
Data splitting
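# Note: the cross_validation module was removed in scikit-learn 0.20 (model_selection replaces it)
kf = cross_validation.KFold(len(data), n_folds=10, shuffle=True)
for train_index, test_index in kf:
    X_train, y_train = data[train_index], target[train_index]
    X_test, y_test = data[test_index], target[test_index]
For each fold, train the classifiers: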
Gaussian Naive Bayes:
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
SVC:
clf = SVC(C=1, kernel='rbf')
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
Random Forest:
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
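The split and the three classifiers can be put together as one loop. The sketch below is not the author's full code; it assumes a recent scikit-learn, where `model_selection.KFold(n_splits=...)` and `kf.split(data)` take the place of `cross_validation.KFold`. The helper `make_classifiers`, the `predictions` and `fold_sizes` containers, and every `random_state=0` are hypothetical additions for illustration.

```python
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# random_state values are assumptions for reproducibility, not part of the original.
data, target = datasets.make_classification(
    n_samples=1000, n_features=10, n_informative=2,
    n_redundant=2, n_repeated=0, n_classes=2, random_state=0,
)


def make_classifiers():
    """Hypothetical helper: return fresh, unfitted models for one fold."""
    return {
        "GaussianNB": GaussianNB(),
        "SVC": SVC(C=1, kernel="rbf"),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    }


kf = model_selection.KFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = []                                          # (train size, test size) per fold
predictions = {name: [] for name in make_classifiers()}  # per-fold predictions per model

for train_index, test_index in kf.split(data):
    X_train, y_train = data[train_index], target[train_index]
    X_test, y_test = data[test_index], target[test_index]
    fold_sizes.append((len(train_index), len(test_index)))
    # Fit every classifier on the same fold so the comparison is like for like.
    for name, clf in make_classifiers().items():
        clf.fit(X_train, y_train)
        predictions[name].append(clf.predict(X_test))
```

Creating new estimators inside the loop keeps each fold independent, so no fitted state carries over from one fold to the next.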
Performance analysis:
accuracy:
metrics.accuracy_score(y_test, pred)
F1-score:
metrics.f1_score(y_test, pred)
AUC ROC:
metrics.roc_auc_score(y_test, pred)
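As an illustration of the three metrics, the sketch below evaluates a single classifier on one held-out split instead of the full 10-fold loop. The `train_test_split` call, the `random_state=0` values, and the variable names `scores`, `auc_from_labels`, and `auc_from_scores` are assumptions for this example. It also computes ROC AUC a second way: `roc_auc_score` accepts continuous scores such as class probabilities, whereas hard 0/1 predictions reduce the ROC curve to a single operating point.

```python
from sklearn import datasets, metrics, model_selection
from sklearn.naive_bayes import GaussianNB

# random_state values are assumptions for reproducibility, not part of the original.
data, target = datasets.make_classification(
    n_samples=1000, n_features=10, n_informative=2,
    n_redundant=2, n_repeated=0, n_classes=2, random_state=0,
)

# One 90/10 split stands in for a single fold; stratify keeps both classes in the test set.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size=0.1, stratify=target, random_state=0,
)

clf = GaussianNB().fit(X_train, y_train)
pred = clf.predict(X_test)                # hard class labels (0 or 1)
scores = clf.predict_proba(X_test)[:, 1]  # estimated probability of class 1

acc = metrics.accuracy_score(y_test, pred)
f1 = metrics.f1_score(y_test, pred)
auc_from_labels = metrics.roc_auc_score(y_test, pred)    # as in the text above
auc_from_scores = metrics.roc_auc_score(y_test, scores)  # uses the full ranking

print("acc:", acc)
print("f1:", f1)
print("auc (labels):", auc_from_labels)
print("auc (scores):", auc_from_scores)
```

For `SVC`, which does not expose `predict_proba` unless it is constructed with `probability=True`, `clf.decision_function(X_test)` can supply the scores instead.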
Full code
from sklearn import datasets, cross_validation
Results
acc: