PYTHON CODING! Need help on the parts that say "YOUR CODE HERE"

Posted: Mon Jun 06, 2022 1:27 pm
by answerhappygod
Task 2. Logistic regression

In the second task, logistic regression classifiers are created to predict the risk of developing diabetes in women of the Pima Native American tribe. The data were collected from women over the age of 21 in the Pima tribe by measuring variables that correlate with diabetes: number of pregnancies, fasting plasma glucose, blood pressure, skin thickness, blood insulin, body mass index, a coefficient calculated from the incidence of diabetes in the family, and age. The data include 268 patients with diabetes and 500 healthy patients.

Load the data and evaluate the distribution of the variables between diabetic and healthy patients using box plots (called box-segment patterns in the code) and histograms. In a box plot, the yellow line inside the box marks the median, the top and bottom of the box mark the upper and lower quartiles (the medians of the upper and lower halves of the data), and the whiskers extend to the largest and smallest values that are not outliers. Outliers are marked with circles.

In [ ]:
    import numpy as np
    import matplotlib.pyplot as plt

    # Load data pima-indians.
    # Variable data refers to data's values and variable classes refers to data's class information
    data = np.genfromtxt('data/pima-indians.txt', delimiter=',', usecols=(0,1,2,3,4,5,6,7))
    classes = np.genfromtxt('data/pima-indians.txt', delimiter=',', usecols=(8))
    data_diabetes = data[np.where(classes == 1)]
    data_no_diabetes = data[np.where(classes == 0)]

    # From the box-segment patterns analyze which variables correlate best with the class information
    headers = ['Amount of pregnancies', 'Amount of fasting plasma glucose in blood', 'Blood pressure', 'Skin thickness', 'Amount of in [truncated]
    fig, ax = plt.subplots(2, 4)
    for i in range(2):
        for j in range(4):
            ax[i, j].boxplot([data_diabetes[:,j*2+i], data_no_diabetes[:,j*2+i]], positions=[0,1])
            ax[i, j].set_title(headers[j*2+i])
            ax[i, j].axes.set_xticklabels(['diabetes', 'no diabetes'])
    plt.suptitle('Box-segment patterns of both classes for data variables')

    fig, ax = plt.subplots(2, 4)
    for i in range(2):
        for j in range(4):
            ax[i, j].hist([data_diabetes[:,j*2+i], data_no_diabetes[:,j*2+i]], histtype='stepfilled', alpha=0.8, bins=40, label=['diat [truncated]
            ax[i, j].legend(loc='upper right')
            ax[i, j].set_title(headers[j*2+i])
    plt.suptitle('Histograms of both classes for data variables')
    plt.show()

Based on the box plots and histograms, the two variables that correlate best with diabetes are selected. The clearest difference between the distributions of the "diabetes" and "no diabetes" classes is in fasting plasma glucose: in its box plot, for example, the boxes of the two classes are clearly separated. For the second variable the differences between the classes are less clear, but we choose the body mass index (called weight index in the code). Next, the data is divided into teaching (training) data and test data in a 60:40 ratio. Logistic regression classifiers are then taught for each selected variable individually and for their combination. Finally, the classification results of the three classifiers are compared.
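For anyone unsure how to read the box plots described above, here is a minimal sketch of the numbers a box plot draws. It assumes the common 1.5 x IQR outlier rule (matplotlib's default, whis=1.5). The helper name box_plot_summary and the sample values are made up for illustration and are not part of the assignment.

```python
import numpy as np


def box_plot_summary(values):
    """Return the statistics a box plot draws for one variable."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Points farther than 1.5 * IQR from the box count as outliers
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        # Whiskers end at the most extreme values that are not outliers
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


# Made-up measurements, not taken from the Pima data
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50])
print(summary)
```

Two classes whose boxes (q1 to q3) barely overlap for a variable, as reported for fasting plasma glucose, are a sign that the variable separates the classes well.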

As an example, a logistic regression classifier is taught for fasting plasma glucose. The fitted model is drawn and the test data samples are placed on the same graph. Finally, the classification accuracy is calculated for the test data. Classification accuracy is the percentage of test data samples that the classifier assigned to the correct class.

In [ ]:
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Divide data to teaching data and test data
    data_teaching, data_test, classes_teaching, classes_test = train_test_split(data, classes, test_size=0.4, random_state=0)

    # Convert teaching data for fasting plasma glucose in right format
    fasting_plasma_glucose_teaching = np.reshape(data_teaching[:,1], (-1,1))
    # Convert test data for fasting plasma glucose in right format
    fasting_plasma_glucose_test = np.reshape(data_test[:,1], (-1,1))

    # Initialize the classifier
    classifier_fasting_plasma_glucose = LogisticRegression()
    # Teach Logistic regression classifier for fasting plasma glucose
    classifier_fasting_plasma_glucose.fit(fasting_plasma_glucose_teaching, classes_teaching)
    # Predict classes for test data samples
    classes_fasting_plasma_glucose_predicted = classifier_fasting_plasma_glucose.predict(fasting_plasma_glucose_test)
    # Calculate classification accuracy for fasting plasma glucose's Logistic regression classifier
    classification_accuracy_fasting_plasma_glucose = accuracy_score(classes_test, classes_fasting_plasma_glucose_predicted)

    def logistic_sigmoid_function(X, a, b):
        """
        In this function logistic regression's classifier is calculated according to sigmoid function
        """
        return 1/(1+np.exp(-(a*X+b)))

    # Draw fitted function and place test data samples to same graph
    plt.plot(fasting_plasma_glucose_test, classes_test, 'o', c='r', alpha=0.5)
    X_fasting_plasma_glucose = np.linspace(min(fasting_plasma_glucose_teaching), max(fasting_plasma_glucose_teaching)+50)
    Y_fasting_plasma_glucose = logistic_sigmoid_function(X_fasting_plasma_glucose, classifier_fasting_plasma_glucose.coef_, classifi [truncated]
    plt.plot(X_fasting_plasma_glucose, Y_fasting_plasma_glucose, color='k', linewidth=3)
    plt.xlabel('Amount fasting plasma glucose in blood')
    plt.ylabel('Likelihood of illness P(Y="diabetes")')
    plt.show()
    print('The classification accuracy for the fasting plasma glucose logistic regression classifier is {}%'.format(round(100*classifi [truncated]

    def sign(number):
        """
        In this function we get sign of equation parameter for printing
        """
        if number >= 0:
            return '+'
        return '-'
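To make the sigmoid model and the accuracy measure above concrete, here is a small plain-Python sketch. The slope a and intercept b are hypothetical numbers chosen for illustration, not the values the classifier actually fits, and the four samples are made up. The helper names are also my own.

```python
import math


def logistic_sigmoid(x, a, b):
    """Probability P(Y="diabetes") for input x, slope a and intercept b."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))


def accuracy(true_classes, predicted_classes):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return correct / len(true_classes)


# Hypothetical parameters, not the fitted values from the task
a, b = 0.04, -5.0

# The probability crosses 0.5 where a*x + b = 0, i.e. at x = -b/a
boundary = -b / a

# Made-up glucose values and true classes
glucose = [90, 110, 140, 170]
true_classes = [0, 1, 1, 1]

# Predict class 1 when the modelled probability is at least 0.5
predicted = [1 if logistic_sigmoid(x, a, b) >= 0.5 else 0 for x in glucose]
print(boundary, predicted, accuracy(true_classes, predicted))
```

With these made-up numbers the sample at 110 falls below the boundary and is misclassified, so three of four predictions are correct. Multiplying the fraction by 100 gives the percentage the notebook prints.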

    # Print the Logistic equation of the Logistic regression classifier for fasting plasma glucose
    print('\nLogistic function: P(Y="diabetes")= 1/(1+exp^(-({} x {} {})))'.format(round(classifier_fasting_plasma_glucose.coef_[0][ [truncated]

In the same way, teach a logistic regression classifier for the weight index (body mass index), predict classes for the test data samples, and calculate the classification accuracy.

In [ ]:
    # Convert teaching data for weight index in right format
    weight_index_teaching = np.reshape(data_teaching[:,5], (-1,1))
    # Convert test data for weight index in right format
    weight_index_test = np.reshape(data_test[:,5], (-1,1))

    # Initialize classifier
    classifier_weight_index = LogisticRegression()

    #-------- Your code here
    # Teach Logistic regression classifier for weight index (Hint: .fit(teaching_data, classes_teaching_data))

    # Predict classes for test data samples (Hint: .predict(testdata))
    classes_weight_index_predicted =
    # Calculate classification accuracy for weight index's Logistic regression classifier (Hint: accuracy_score(classes_test_data, cl [truncated]
    classification_accuracy_weight_index =

    # Draw fitted function and place test data samples to same graph
    plt.plot(weight_index_test, classes_test, 'o', c='r', alpha=0.5)
    X_weight_index = np.linspace(min(weight_index_teaching), max(weight_index_teaching)+10)
    Y_weight_index = logistic_sigmoid_function(X_weight_index, classifier_weight_index.coef_, classifier_weight_index.intercept_).rav [truncated]
    plt.plot(X_weight_index, Y_weight_index, color='k', linewidth=3)
    plt.xlabel('Weight index')
    plt.ylabel('Likelihood of illness P(Y="diabetes")')
    plt.show()
    print('The classification accuracy for the weight index logistic regression classifier is {}%'.format(round(100*classification_a [truncated]

    # Print the Logistic equation of the Logistic regression classifier for weight index
    print('\nLogistic function: P(Y="diabetes") = 1/(1+exp^(-({} x {} {})))'.format(round(classifier_weight_index.coef_[0][0],4), sig [truncated]

Finally, teach a logistic regression classifier on the two variables together (fasting plasma glucose and body mass index), predict the classes of the test data samples, and calculate the classification accuracy. Combine both variables in the teaching data and the test data.

In [ ]:
    # Combine both variables in teaching data
    combined_teaching = np.concatenate((fasting_plasma_glucose_teaching, weight_index_teaching), axis=1)
    # Combine both variables in test data
    combined_test = np.concatenate((fasting_plasma_glucose_test, weight_index_test), axis=1)

    # Initialize classifier
    classifier_combined = LogisticRegression()

    #-------- Your code here ------
    # Teach Logistic regression classifier for combined data (Hint: .fit(teaching_data, classes_teaching_data))

    # Predict classes for test data samples (Hint: .predict(testdata))
    classes_combined_predicted =
    # Calculate classification accuracy for combined data's Logistic regression classifier (Hint: accuracy_score(classes_test_data, c [truncated]
    classification_accuracy_combined =
    #--------

    print('The classification accuracy for the two-variable logistic regression classifier is {}%'.format(round(100*classification_a [truncated]

    # Print the Logistic equation of the Logistic regression classifier for combined data
    print('\nLogistic function: P(Y="diabetes")= 1/(1+exp^(-({} x1 {} {} x2 {} {})))'.format(round(classifier_combined.coef_[0][0],4 [truncated]
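Both "Your code here" blocks follow the same fit / predict / accuracy_score pattern that the hints describe. Below is a minimal sketch of how they might be filled in. Because data/pima-indians.txt is not available here, it runs on synthetic data generated with NumPy, which is purely an assumption for illustration; the variable names mirror the notebook, and in the notebook itself only the fit, predict and accuracy_score lines would be needed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the glucose and weight index columns (assumption)
rng = np.random.default_rng(0)
classes = rng.integers(0, 2, size=200)
glucose = rng.normal(110 + 30 * classes, 20)
weight_index = rng.normal(30 + 4 * classes, 6)

# Same 60:40 split as in the task
(glucose_teaching, glucose_test,
 weight_index_teaching, weight_index_test,
 classes_teaching, classes_test) = train_test_split(
    glucose, weight_index, classes, test_size=0.4, random_state=0)

# reshape(-1, 1) turns each flat vector into the (n, 1) column scikit-learn expects
glucose_teaching = np.reshape(glucose_teaching, (-1, 1))
glucose_test = np.reshape(glucose_test, (-1, 1))
weight_index_teaching = np.reshape(weight_index_teaching, (-1, 1))
weight_index_test = np.reshape(weight_index_test, (-1, 1))

# Single-variable classifier for weight index
classifier_weight_index = LogisticRegression()
classifier_weight_index.fit(weight_index_teaching, classes_teaching)
classes_weight_index_predicted = classifier_weight_index.predict(weight_index_test)
classification_accuracy_weight_index = accuracy_score(
    classes_test, classes_weight_index_predicted)

# Two-variable classifier: axis=1 puts the two columns side by side
combined_teaching = np.concatenate((glucose_teaching, weight_index_teaching), axis=1)
combined_test = np.concatenate((glucose_test, weight_index_test), axis=1)
classifier_combined = LogisticRegression()
classifier_combined.fit(combined_teaching, classes_teaching)
classes_combined_predicted = classifier_combined.predict(combined_test)
classification_accuracy_combined = accuracy_score(
    classes_test, classes_combined_predicted)

print(round(100 * classification_accuracy_weight_index, 2))
print(round(100 * classification_accuracy_combined, 2))
```

The accuracies printed here come from the synthetic data and say nothing about the real Pima results. Note that fit() modifies the classifier in place, so its return value does not need to be stored.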

Visualize the two-dimensional test data and the class boundary of the two-variable classifier in a two-dimensional coordinate system.

In [ ]:
    # Place samples of test data on a graph for diabetic and healthy patients
    plt.scatter(combined_test[np.where(classes_test==1), 0], combined_test[np.where(classes_test==1), 1], s=10, label='diabetes')
    plt.scatter(combined_test[np.where(classes_test==0), 0], combined_test[np.where(classes_test==0), 1], s=10, label='no diabetes')
    plt.title('Fasting plasma glucose and body mass index in diabetic and healthy patients')
    plt.xlabel('Amount of fasting plasma glucose in blood')
    plt.ylabel('Weight index')
    plt.legend()

    def draw_classifier_class_bound(classifier, X):
        """
        In this function the class boundary of the classifier is drawn on a two-dimensional graph
        """
        x_min, x_max = X[:,0].min(), X[:,0].max()
        y_min, y_max = X[:,1].min(), X[:,1].max()
        xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.2), np.arange(y_min, y_max, 0.1))
        Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        plt.contour(xx, yy, Z, colors='k', linewidths=0.7)

    draw_classifier_class_bound(classifier_combined, combined_test)
    plt.show()

When a woman of the Pima tribe comes in for an appointment with a body mass index of 25 and a fasting plasma glucose level of 180 mg/dL, what is the probability that she has diabetes? Use the predict_proba() function of the two-variable logistic regression classifier classifier_combined to predict the posterior probabilities of the classes.

In [ ]:
    patient_data = [[180,25]]

    #-------- Your code here
    posterior_probability =

    print('The patient has a {} % posterior probability of developing diabetes'.format(round(100*posterior_probability[0][1], 2)))
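For the last blank, a minimal sketch of how predict_proba() could be called. The classifier here is taught on synthetic data (an assumption, since the real file is not available), so the printed probability is not the answer to the task. In the notebook, the single line assigning posterior_probability should be enough, given that classifier_combined has already been fitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic teaching data standing in for the real columns (assumption)
rng = np.random.default_rng(1)
classes_teaching = rng.integers(0, 2, size=300)
glucose = rng.normal(110 + 30 * classes_teaching, 20)
weight_index = rng.normal(30 + 4 * classes_teaching, 6)
combined_teaching = np.column_stack((glucose, weight_index))

classifier_combined = LogisticRegression()
classifier_combined.fit(combined_teaching, classes_teaching)

# Column order must match the teaching data: [glucose, weight index]
patient_data = [[180, 25]]
posterior_probability = classifier_combined.predict_proba(patient_data)

# One row per patient, one column per class in classifier_combined.classes_,
# so [0][1] is the probability of class 1 ("diabetes") for the first patient
print(classifier_combined.classes_)
print(round(100 * posterior_probability[0][1], 2))
```

The two probabilities in each row sum to 1, which is why the notebook's print statement only needs the second column.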