If you have supervised data in Excel and you want to do machine learning on that data, as in the scikit-learn example at the link below
http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
you need to convert the Excel data into the X and y arrays that the example expects.
In the linked example, X is a two-dimensional NumPy float array and y is a one-dimensional NumPy array.
For example, if you have the data shown below:
fishlength fishwidth classlabel
1.2 1.4 1
1.5 4.3 2
We require X=array([[1.2, 1.4], [1.5, 4.3]]) and y=array([1, 2]).
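To make the target format concrete, here is a minimal sketch that builds the two arrays by hand from the two example rows. It only illustrates the shapes involved; the real data comes from the CSV files.

```python
import numpy as np

# The two example rows from the table: fishlength, fishwidth
X = np.array([[1.2, 1.4],
              [1.5, 4.3]])
# The matching class labels, one per row of X
y = np.array([1, 2])

# X is 2-D (samples x features) and y is 1-D (one label per sample)
print(X.shape, X.dtype)
print(y.shape)
```

Each row of X must line up with the label at the same position in y.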
To convert the data into the format required by the linked example:
In the Excel sheet, remove the header row (fishlength, fishwidth, classlabel), cut the entire classlabel column, and save the remaining columns as .csv (comma delimited) in the Save As dialog box (for example, here the file name is finaldata.csv).
Then paste the classlabel column into a separate Excel workbook using Paste Special with the Transpose option.
Save this workbook as .csv (comma delimited) in the Save As dialog box (for example, here the file name is label.csv).
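If you would rather not edit the sheet by hand, one possible alternative (not the method used in this post) is to export the sheet once with the header row and the label column still in place, then split it in NumPy. The sketch below assumes the class label is the last column. The in-memory text is a stand-in for the exported file.

```python
import io
import numpy as np

# Stand-in for a CSV exported from Excel with header and label column left in place
raw = io.StringIO(
    "fishlength,fishwidth,classlabel\n"
    "1.2,1.4,1\n"
    "1.5,4.3,2\n"
)

# skip_header=1 drops the header row; the remaining values are parsed as floats
table = np.genfromtxt(raw, delimiter=",", skip_header=1)

X = table[:, :-1]             # every column except the last is a feature
y = table[:, -1].astype(int)  # the last column holds the class labels
```

The rest of this post continues with the two-file layout described above (finaldata.csv and label.csv).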
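import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
# Load the two files saved above: features as a 2-D float array, labels as a 1-D int array
X = np.genfromtxt("finaldata.csv", delimiter=",")
y = np.genfromtxt("label.csv", delimiter=",").astype(int)
# The code below is copied from the linked example, with its noisy features removed.
# classes must list every label in your data (three classes are assumed here)
y = label_binarize(y, classes=[1, 2, 3])  # array([[1, 0, 0], [1, 0, 0], ...
n_classes = y.shape[1]  # shape[1] gives columns (3 here) and shape[0] gives rows
# test_size=.5 keeps 50% of the rows as training data (X_train, with labels y_train)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True))  # create the classifier
# y_score = array([[-2.49503189e+00, 4.13933465e-01, 9.99997811e-01], ...
# the highest score in that row is in the third column, so that X_test row belongs to class 3
y_score = classifier.fit(X_train, y_train).decision_function(X_test)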
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    # y_test holds the actual labels and y_score the predicted scores; fpr = false positive rate
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()
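To see what the binarized labels and the per-class and micro-average ROC steps do without needing the CSV files, here is a small sketch on made-up numbers. The labels and scores below are hypothetical values chosen for illustration, not output from the classifier above.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# Hypothetical labels for six samples and three classes (made up for illustration)
y_true = np.array([1, 2, 3, 1, 2, 3])
y_bin = label_binarize(y_true, classes=[1, 2, 3])  # one indicator column per class

# Hypothetical decision scores, one column per class
y_score = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.1, 0.3, 0.7],
    [0.6, 0.5, 0.2],
    [0.7, 0.4, 0.1],  # a class-2 sample that scores highest for class 1
    [0.2, 0.1, 0.9],
])

# Per-class curve: compare one indicator column with the matching score column
fpr0, tpr0, _ = roc_curve(y_bin[:, 0], y_score[:, 0])
auc0 = auc(fpr0, tpr0)

# Micro-average: flatten both arrays so each (sample, class) pair is one binary decision
fpr_micro, tpr_micro, _ = roc_curve(y_bin.ravel(), y_score.ravel())
auc_micro = auc(fpr_micro, tpr_micro)

print(y_bin)
print(auc0, auc_micro)
```

The per-class curve treats one class against all the others, while the micro-average pools every class into a single curve.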