
After writing code please also send working screenshots This the txt file data sample:- -3.633271721169406110e+00 -2.749

Posted: Thu May 05, 2022 12:43 pm
by answerhappygod
After writing the code, please also send screenshots showing that it
works.
This is a sample of the data in the txt file:
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00
The purpose of this program is to use
this preliminary dataset, which contains the geographical (2D)
locations of habitual pizza eaters. Based on
this information, we have to identify the clusters of potential
customers and their location centers by implementing the k-means
clustering algorithm (see the k-means article on Wikipedia).
Start the code with these modules:
import numpy as np
import matplotlib.pyplot as plt
These are the steps:
1. Load and visualize the
dataset, then compute the cluster assignment: Compute a new
1-dimensional integer array of n elements, called assignment, which
describes the customer-to-cluster assignment. The assignment is
decided by geographical distance: for each customer, store the
integer index of the cluster whose current location center (see
centroids in step 3) is closest to the customer. Do not use explicit
for loops.
2. Determine the optimal number of
clusters: Use your human intuition, based on the
scatter plot of the customers, to decide how many clusters of
customers you want to identify. This is the only step that relies on
human brainpower (in practice, this can be automated as well, but
that is beyond the scope of the assignment). Set the variable k to
the number of desired clusters.
3. Initialize the cluster centers: The
algorithm keeps track of (and updates) the center of each cluster
and assigns each customer to exactly one cluster based on the
distances between the customer and the centers. For the initial
locations of the cluster centers, pick k random customer locations
and store this array in a new variable called centroids. Verify
that the shape of this array is (k, 2).
Verify this part using this code:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Initial cluster assignment");
4. Update the cluster centers: Compute the
updated locations of the cluster centers. Based on the cluster
assignment, compute the mean location of the customers in each
cluster. This becomes the new/updated location of the
cluster center (centroids).
Verify the output of this part using this code:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Initial cluster centers");
5. Iterative optimization: Based on step 4 and the cluster
assignment from step 1 (you can use copy & paste), implement a loop
that iteratively updates the cluster assignment and the cluster
centers as long as there is some change in the cluster assignment.
Verify this part using this code:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Final clusters");
The preliminary dataset contains the geographical (2D) locations of habitual pizza eaters. Based on this information, identify the clusters of potential customers and their location centers by implementing the k-means clustering algorithm. To do that, follow the steps below.
Modules used in this code:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
1. Load and visualize the dataset
Use the np.loadtxt() function to read the customters.txt file into a NumPy array (use customers as the variable name for the array). This array contains the 2D coordinates of 250 (n) potential customers. Make sure you understand the shape and meaning of this array. Use the plt.scatter() function to create a 2D plot of the customer locations. Note: the plot functions expect the x and y coordinates as separate arrays. For example, you can select the x coordinates of all customers with customers[:, 0].
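As a rough sketch of the loading step, the snippet below reads the three sample rows quoted earlier in this post from an in-memory buffer instead of the real file (the buffer is only a stand-in so the example runs on its own). With the actual data file you would pass its path to np.loadtxt().

```python
import io
import numpy as np

# Stand-in for the data file: the three sample rows quoted in the post.
# With the real file, pass its path to np.loadtxt() instead of this buffer.
sample = io.StringIO(
    "-3.633271721169406110e+00 -2.749418531032250979e+00\n"
    "-2.976795656232069209e+00 -3.130130619095286004e+00\n"
    "-4.046241035806734665e+00 -2.413595797540917687e+00\n"
)

customers = np.loadtxt(sample)  # one row per customer, columns are x and y
n = customers.shape[0]          # number of customers (3 here, 250 in the full file)

x = customers[:, 0]  # x coordinates of all customers
y = customers[:, 1]  # y coordinates of all customers

print(customers.shape)  # (3, 2)
```

The x and y arrays are in the form plt.scatter() expects, so plt.scatter(x, y) would draw the customer locations.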
2. Determine the optimal number of clusters
Use your human intuition based on the scatter plot above to decide how many clusters of customers you want to identify. This is the only step which relies on human brainpower (in practice, this can be automated as well, but that is beyond the scope of the assignment). Set variable k to the number of desired clusters.
3. Initialize the cluster centers
The algorithm keeps track of (and updates) the center of each cluster and assigns each customer to exactly one cluster based on the distances between the customer and the centers. For the initial locations of the cluster centers, pick k random customer locations and store this array in a new variable called centroids. Verify that the shape of this array is (k, 2).
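One possible way to pick the initial centers is sketched below. The customers array here is synthetic random data standing in for the loaded file, k = 3 is an arbitrary illustrative value, and the use of np.random.default_rng() with replace=False is my own choice rather than something the assignment prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)         # seed value is arbitrary
customers = rng.normal(size=(250, 2))  # synthetic stand-in for the loaded data
k = 3                                  # hypothetical choice; pick k from your own plot

# Draw k distinct row indices, then use them to select k customer locations.
picked = rng.choice(customers.shape[0], size=k, replace=False)
centroids = customers[picked]

print(centroids.shape)  # (3, 2)
```

Sampling without replacement avoids starting two clusters at the same customer, which would otherwise leave one of them empty after the first assignment.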
4. Initialize the cluster assignment
Compute a new 1-dimensional integer array of n elements, called assignment, which describes the customer-to-cluster assignment. The assignment is decided by geographical distance: for each customer, store the integer index of the cluster whose current location center (see centroids) is closest to the customer. Do not use explicit for loops.
Hint: first, compute all the pairwise distances between the customers and all cluster centers using a 3D array and automatic broadcasting. Try to understand the shape and meaning of the following expression: customers - centroids[:, np.newaxis, :]. Use this expression with squaring and the NumPy sum function (on the proper axis) to compute the pairwise distances. Finally, use argmin (on the proper axis) to find the indexes of the closest cluster centers.
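The hint can be traced on a tiny hand-made example. The five customer points and two centers below are invented for illustration only; the shapes in the comments show what broadcasting does at each step.

```python
import numpy as np

# Made-up data: two customers near (0, 0) and three near (10, 10).
customers = np.array([[0.0, 0.0], [0.0, 1.0],
                      [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# centroids[:, np.newaxis, :] has shape (k, 1, 2); subtracting it from the
# (n, 2) customers array broadcasts to (k, n, 2): one difference vector for
# every (cluster, customer) pair.
diff = customers - centroids[:, np.newaxis, :]

# Summing the squared components over the last axis gives squared distances
# with shape (k, n).
sq_dist = (diff ** 2).sum(axis=2)

# For each customer (column), take the index of the closest center.
assignment = sq_dist.argmin(axis=0)

print(diff.shape, sq_dist.shape, assignment)  # (2, 5, 2) (2, 5) [0 0 1 1 1]
```

Squared distances are enough here because taking the square root would not change which center is closest.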
5. Verify the cluster assignment
Execute the code below to verify that the customers, centroids, and assignment arrays are properly initialized. You should see the k cluster centers in red and the assignments in different colors. Note: this is (obviously) not yet the final/optimal clustering.
This is the code to verify your work:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Initial cluster assignment");
6. Update the cluster centers
Compute the updated location of the cluster centers. Based on the cluster assignment, compute the mean location of the customers in each cluster. This is going to be the new/updated location of the cluster center (centroids).
Hint: This time you may use a (short) loop on the number of clusters. Try to use boolean indexing (masking) with the assignment to select all the customers belonging to the current cluster. You can also use the NumPy mean() function (with the proper axis) to compute the mean location of the cluster.
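A minimal sketch of the update step, following the hint with a short loop and boolean masking. The data is made up, and the guard for empty clusters is an extra assumption of mine that the assignment text does not mention.

```python
import numpy as np

# Made-up data with an assignment already computed for k = 2 clusters.
customers = np.array([[0.0, 0.0], [0.0, 1.0],
                      [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
assignment = np.array([0, 0, 1, 1, 1])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
k = centroids.shape[0]

for j in range(k):
    # Boolean mask selects the rows of the customers assigned to cluster j.
    members = customers[assignment == j]
    # Assumption: if a cluster has no members, keep its old center
    # (the mean of an empty selection would be NaN).
    if len(members) > 0:
        centroids[j] = members.mean(axis=0)  # mean over rows: (mean x, mean y)

print(centroids)
```

After the loop, the first center sits at (0, 0.5) and the second at roughly (10.33, 10.33), the means of their assigned customers.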
7. Verify the updated cluster centers
Execute the code below to verify that centroids is properly updated. While this is still not the final clustering, each center should be at the center of its assigned customers.
This is the code to verify this:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Initial cluster centers");
8. Iterative optimization
Based on steps 4 & 6 (you can use copy & paste), implement a loop that iteratively updates the cluster assignment and the cluster centers as long as there is some change in the cluster assignment.
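The loop could be put together along these lines, combining the assignment and update steps. The helper names assign and update, the two synthetic blobs, and the seed are all illustrative assumptions, not part of the assignment.

```python
import numpy as np

def assign(customers, centroids):
    """Index of the closest center for every customer (no explicit loops)."""
    diff = customers - centroids[:, np.newaxis, :]  # shape (k, n, 2)
    return (diff ** 2).sum(axis=2).argmin(axis=0)   # shape (n,)

def update(customers, assignment, centroids):
    """Move each center to the mean location of its assigned customers."""
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = customers[assignment == j]
        if len(members) > 0:  # assumption: an empty cluster keeps its old center
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

# Synthetic stand-in data: two blobs of 50 customers each.
rng = np.random.default_rng(1)
customers = np.vstack([
    rng.normal(loc=(-4, -3), scale=0.5, size=(50, 2)),
    rng.normal(loc=(3, 2), scale=0.5, size=(50, 2)),
])
k = 2
centroids = customers[rng.choice(len(customers), size=k, replace=False)]
assignment = assign(customers, centroids)

# Repeat update and re-assignment until the assignment stops changing.
while True:
    centroids = update(customers, assignment, centroids)
    new_assignment = assign(customers, centroids)
    if np.array_equal(new_assignment, assignment):
        break
    assignment = new_assignment

print(centroids)
```

Comparing the whole assignment array with np.array_equal() is one simple way to express "as long as there is some change in the cluster assignment".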
9. Verify the final clusters
Execute the code below to see the final clusters. Note: while the results should look reasonable, it is not guaranteed that the algorithm finds the optimal clusters and centers (because of the initial random picks of the centers). Try running the notebook multiple times (you may want to use np.random.seed() with different parameters at the start) to see if you can get the desired/optimal result.
This is the code to verify that:
plt.scatter(customers[:, 0], customers[:, 1], c=assignment)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red")
plt.title("Final clusters");
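The advice in step 9 to rerun with different seeds can also be scripted. The sketch below is only one possible approach: the kmeans helper, the synthetic three-blob data, and the idea of ranking runs by the total squared distance to the nearest center are my assumptions, since the assignment itself only asks you to compare the plots by eye. It uses np.random.default_rng(seed) in place of np.random.seed().

```python
import numpy as np

def kmeans(customers, k, seed):
    """Hypothetical helper: run k-means once from a seeded random start."""
    rng = np.random.default_rng(seed)
    centroids = customers[rng.choice(len(customers), size=k, replace=False)]
    assignment = None
    while True:
        # Squared distances between every center and every customer: (k, n).
        sq_dist = ((customers - centroids[:, np.newaxis, :]) ** 2).sum(axis=2)
        new_assignment = sq_dist.argmin(axis=0)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break  # no customer changed cluster, so the run has converged
        assignment = new_assignment
        for j in range(k):
            members = customers[assignment == j]
            if len(members) > 0:  # assumption: empty clusters keep their center
                centroids[j] = members.mean(axis=0)
    # Total squared distance of each customer to its nearest center.
    total = sq_dist.min(axis=0).sum()
    return centroids, assignment, total

# Synthetic stand-in data: three blobs of 40 customers each.
data_rng = np.random.default_rng(0)
customers = np.vstack([
    data_rng.normal(loc=center, scale=0.4, size=(40, 2))
    for center in [(-4, -3), (0, 3), (4, -1)]
])

# Run from several seeds and keep the one with the smallest total.
totals = {seed: kmeans(customers, 3, seed)[2] for seed in range(5)}
best_seed = min(totals, key=totals.get)
print(best_seed, totals[best_seed])
```

A smaller total means the customers sit closer to their centers, which gives a numeric way to compare runs alongside the scatter plots.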
Contents of the txt file (excerpt):
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00
-3.183621547459694057e+00 -2.977639683920154212e+00
-3.904108587053154888e+00 -2.951091432822544380e+00
-4.071339081664882897e+00 -2.115075173563672806e+00
-3.447559215925898091e+00 -2.980188866448493457e+00
-4.287105999033476778e+00 -2.585069959723080846e+00
-4.814356372871987588e+00 -3.429133604769490695e+00
-4.608749571748512963e+00 -2.559789061555480583e+00
-3.284028185085224205e+00 -2.575397699107908611e+00
-4.107026100720188033e+00 -2.312489516765867670e+00
-3.745452631681100986e+00 -2.272984902724009437e+00
-4.498589231871656047e+00 -1.940104407638526318e+00
-2.904279496557304885e+00 -2.600874593005827240e+00
-4.084467024266913882e+00 -2.862792504061023369e+00
-3.271673844947790677e+00 -3.413581559927078679e+00
-3.931534472019772686e+00 -3.212371884819077206e+00
-3.466785710627703132e+00 -3.138573013614168961e+00
-3.612295593413866079e+00 -3.023156182750712961e+00
-3.727258144722126687e+00 -2.848107858858544450e+00
-3.953124565531799917e+00 -3.224786800329186853e+00
-4.705704394830918957e+00 -3.262516703252164696e+00
-3.460736643741590512e+00 -1.986497950173193416e+00
-4.009482121549115874e+00 -3.203895027118378813e+00
-3.920261751360571534e+00 -2.531711316055035965e+00
-3.629803635077848867e+00 -2.968804447175029892e+00
-3.891567316336832505e+00 -2.812282510113611522e+00
-3.748114007276682536e+00 -2.379283960750413485e+00
-3.948081672063044056e+00 -3.535852976432075412e+00
-3.363040524165844758e+00 -3.056029241842494582e+00
-3.138654804442309043e+00 -2.827927829381158720e+00
-3.513565383118311125e+00 -2.262256857668634033