Please keep it simple and let me know where to paste each section of code. Thank you!
Posted: Tue Jul 12, 2022 8:17 am
Overview
TargetCRM is a CRM software company that is currently looking for a way to improve the architecture of its solution. The software sends emails to registered users every week, but today the system has no proper, structured way to persist the mailing list data, so the data is kept in in-memory data structures such as dictionaries and lists. Because Python dictionaries can be memory (resource) consuming and are not failure-proof (any hardware or software failure can lose the data forever if memory is the only place it lives), the project needs to move to a more robust, structured way of storing data.
What you need to know, part 1
Files are a great way to persist data, especially small to mid-size datasets. As part of a company effort to build a more robust infrastructure for the email address data, you have been asked to persist the user emails in csv files. CSV stands for comma-separated values: as the name implies, a csv file stores data with a comma separating each field. The csv format was chosen for its flexibility and its strong support in analytics tools such as Excel and, increasingly in recent years, BI tools. As the data generated by users grows, the company wants to analyze it in the future and extract useful information from it.
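Because fields are separated by commas, a field that itself contains a comma has to be quoted. The minimal sketch below uses Python's standard `csv` module on made-up rows (hypothetical, not taken from the mailing list) to show that `csv.writer` quotes such a field automatically and `csv.reader` restores it unchanged:

```python
import csv
import io

# Hypothetical sample rows; the second username contains a comma on purpose
rows = [
    ['uuid', 'username', 'email', 'subscribe_status'],
    ['id-1', 'Doe, Jane', 'jane@example.com', 'active'],
]

# Write to an in-memory text buffer instead of a real file
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)  # the field with a comma is written as "Doe, Jane"

# Reading the text back yields the original fields, comma included
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)  # True
```

This is why it is safer to let the `csv` module build each line than to join fields with `','` by hand.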
Another advantage of files is that they can be distributed across the company's local environment and processed in parallel. If the mailing list grows so fast that every user can no longer be stored in a single file, the users can be distributed among several csv files, processed separately or in batches, and then gathered together when some sort of analysis is needed. This capability is very desirable in real applications today, as the data generated by online users is growing very fast.
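A minimal sketch of the split-and-gather idea, assuming the four columns of the sample file. The helper names, the `mailing_list_part` file prefix and the batch size are hypothetical choices for illustration; the parts are written and read sequentially here, though each part could just as well be handed to a separate worker:

```python
import csv
import os
import tempfile

# Assumption: every part uses the header of the sample mailing list
HEADER = ['uuid', 'username', 'email', 'subscribe_status']


def split_into_batches(rows, batch_size, out_dir, prefix='mailing_list_part'):
    """Write rows into csv files of at most batch_size data rows each.

    Every part repeats the same header so the parts can be combined later.
    Returns the list of file paths written.
    """
    paths = []
    for start in range(0, len(rows), batch_size):
        path = os.path.join(out_dir, f'{prefix}_{start // batch_size}.csv')
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(HEADER)
            writer.writerows(rows[start:start + batch_size])
        paths.append(path)
    return paths


def gather(paths):
    """Read the data rows (header skipped) from every part back into one list."""
    combined = []
    for path in paths:
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row of each part
            combined.extend(reader)
    return combined


# Hypothetical demo data: five users split into parts of two rows each
sample_rows = [[f'id-{i}', f'user{i}', f'user{i}@example.com', 'active'] for i in range(5)]
with tempfile.TemporaryDirectory() as out_dir:
    part_paths = split_into_batches(sample_rows, 2, out_dir)
    n_parts = len(part_paths)
    round_trip_ok = gather(part_paths) == sample_rows
print(n_parts, round_trip_ok)  # 3 True
```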
The company is shifting from in-memory persistence to a more robust strategy. To justify the investment in this new architecture, it will create a local server infrastructure to hold the data files. Because a csv file can have any number of columns (as long as they are comma-separated), it is very important that your algorithm writes a standardized data format, in case there are multiple csv files that need to be combined in the future. Consider the scalability of your solution: with thousands of active users, you need an architecture that keeps working as the data grows. Therefore, if it is not viable to store all the data in a single file, the user data will be split across multiple files.
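One way to enforce a standardized format is to write every file through `csv.DictWriter` with a fixed column list. In this sketch, `FIELDNAMES` mirrors the header of the sample `mailing_list.csv`, and the constant and helper names are hypothetical. The column order is always the same regardless of dictionary key order, unknown keys raise `ValueError`, and missing keys are written as empty fields:

```python
import csv
import io

# Assumption: the standard schema, copied from the sample mailing list header
FIELDNAMES = ['uuid', 'username', 'email', 'subscribe_status']


def write_standard_csv(records, file_obj):
    """Write dict records in the fixed FIELDNAMES column order.

    extrasaction='raise' (also DictWriter's default) raises ValueError for a
    key outside FIELDNAMES; missing keys are filled with '' (restval default).
    """
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES, extrasaction='raise')
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
# The keys are deliberately out of order; the output still follows FIELDNAMES
write_standard_csv(
    [{'email': 'a@example.com', 'uuid': 'id-1', 'username': 'a', 'subscribe_status': 'active'}],
    buffer,
)
print(buffer.getvalue())
```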
There is a problem with multiple files, however: maintaining the integrity of the dataset is difficult. To make integration easy, make sure that all your files follow the same structure (for this assessment); otherwise the system could break.
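A minimal sketch of guarding that integrity when combining files: compare each file's header with the first one and refuse to merge on a mismatch. The function name is hypothetical, and `skipinitialspace=True` is used so a header written as `uuid, username, ...` (with a blank after each comma, as in the sample file) still matches `uuid,username,...`:

```python
import csv
import io


def combine_same_structure(file_objs):
    """Combine csv files that must all share the first file's header.

    Raises ValueError for an empty file or a different header, so
    structurally different files are rejected instead of silently merged.
    Returns (header, data_rows).
    """
    expected = None
    rows = []
    for index, f in enumerate(file_objs):
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader, None)
        if header is None:
            raise ValueError(f'file #{index} is empty')
        if expected is None:
            expected = header
        elif header != expected:
            raise ValueError(f'file #{index} has header {header}, expected {expected}')
        rows.extend(reader)
    return expected, rows


# Hypothetical parts; the second one uses the spaced header style of the sample file
part_a = io.StringIO('uuid,username,email,subscribe_status\nid-1,a,a@example.com,active\n')
part_b = io.StringIO('uuid, username, email, subscribe_status\nid-2, b, b@example.com, opt-out\n')
header, rows = combine_same_structure([part_a, part_b])
print(len(rows))  # 2
```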
TargetCRM asked you to implement a mechanism to read, process, and persist the data for its platform. For this task, you will be working with file operations: open the raw mailing list saved in a csv file, filter out the users who have unsubscribed, and write the resulting mailing list to another csv file.
TargetCRM is a fictitious CRM software company created especially for the purpose of this code assessment.
What you need to know, part 2
In this assessment, you will read the mailing list from the sample file shown in Figure 8.1, update the mailing list by filtering out the users whose flag is different from "active", and write the results back to another csv file.
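The sample data spells its flags inconsistently (`opt-out`, `OPT-OUT`, with or without a blank after the comma), so it helps to normalize a flag before comparing it. A small sketch with a hypothetical `is_active` helper; it covers only the "active" rule stated here, while the `update_mailing_list()` stub's comments also mention an `@gmail` check:

```python
def is_active(subscribe_status):
    """Return True only when the flag is 'active', ignoring case and surrounding spaces."""
    return subscribe_status.strip().lower() == 'active'


# Flags as they appear in the sample file, including the inconsistent spacing and case
flags = [' opt-out', 'OPT-OUT', ' active', ' unsubscribed']
print([is_active(flag) for flag in flags])  # [False, False, True, False]

# Hypothetical rows in the [uuid, username, email, subscribe_status] layout
rows = [
    ['id-1', 'user1', 'user1@example.com', 'OPT-OUT'],
    ['id-2', 'user2', 'user2@example.com', ' active'],
    ['id-3', 'user3', 'user3@example.com', 'unsubscribed'],
]
active_ids = [row[0] for row in rows if is_active(row[3])]
print(active_ids)  # ['id-2']
```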
To update the mailing list, there are some requirements:
You will create a user-defined mailinglist_validation_util() function that receives the following parameters: filename, output_filename, and io_mode.
Then you will save the output, that is, the ids of the active users, into the resulting csv file.
Input:
Output:
To help you start this activity, we provide a function template that can be followed to solve this challenge. The code block below is one sample way this project could be structured.
To make the code more readable, we can divide each task into a separate function: one function to read the raw file, a second to write back the results, and a third one to call the other two.
Finally, create a python module to encapsulate the functions you just created, so this code can be reused by other files or even other python projects.
You will create another file called main.py. This file will serve as a tester for your package. To check that everything is in place and working properly, create a function that calls the function from your package and returns the results. The resulting output must be the same as in the previous example, except that now you are calling the function from a user-defined package. This is explained in more detail in the following tasks.
What you need to do, part 1
Update the mailing list
Create a python function called update_mailing_list to update the original mailing list passed as a parameter and filter out the invalid email addresses. The function stub can be found in the update_mailing_list.py file. Consider the rules below to filter out (and update) the original mailing list:
What you need to do, part 2
Trigger the file operations
This function is the main entry point that triggers the other functions handling file operations. First, call read_mailing_list_file() to read the original dataset and process it with the update_mailing_list() function. Then, cache the resulting list of active user ids. Next, call the save_output_file() function to persist the user ids to an output csv file.
Finally, compute the length of the output file to check whether it matches the result of the previous function that updated the mailing list. The template for this function is shown below:
main.py (the tester that calls your package):

```python
# Import mailinglist_validation_util from your package
from your_package import your_module


def mailing_list_utils():
    """Run the mailing list validation from your package and return the output file length."""
    # Returning the output of the `mailinglist_validation_util` function.
    # Assumption: input/output file names, and 'w' so output.csv is overwritten on each run
    return your_module.mailinglist_validation_util('mailing_list.csv', 'output.csv', 'w')


if __name__ == '__main__':
    # Calling the function from your package
    print('The output file has length {}.'.format(mailing_list_utils()))
```
Your module inside your package (imported above as `your_package`/`your_module`):

```python
import csv

# Import the mailing list updater from the appropriate file.
# If this module sits inside a package next to update_mailing_list.py,
# a relative import (from .update_mailing_list import update_mailing_list) may be needed.
from update_mailing_list import update_mailing_list

# Global variable to set the base path to our dataset folder
base_url = '../dataset/'


def read_mailing_list_file(filename, io_mode):
    """Read the raw mailing list csv file and return the ids of the active users."""
    # Open the file with the `with` context manager
    with open(base_url + filename, io_mode, newline='') as csv_file:
        # Open the csv file, passing the ',' delimiter, which is generally the case for csv files;
        # skipinitialspace=True drops the blank that follows each comma in the sample file
        file_reader = csv.reader(csv_file, delimiter=',', skipinitialspace=True)
        # Counter for the lines read (useful to skip the header row and keep only data values)
        line_count = 0
        # List variable to hold the rows read from file
        mailing_list = []
        # Looping through each row of the file
        for row in file_reader:
            # Skip the header row (line 0) and any blank lines
            if line_count > 0 and row:
                # Append each data line to `mailing_list`, trimming stray spaces
                mailing_list.append([field.strip() for field in row])
            # Increment the counter by 1
            line_count += 1

    # Temporary buffer used to transform the list into a dictionary,
    # which is the data structure expected by update_mailing_list()
    mailing_list_buffer = []
    # Looping through the mailing list object
    for item in mailing_list:
        # Creating a (uuid, [username, email, subscribe_status]) tuple from each row
        mailing_list_buffer.append((item[0], item[1:]))
    # Transforming the list of tuples into a python dictionary
    mailing_dict = dict(mailing_list_buffer)
    # Call `update_mailing_list` passing the mailing list dictionary
    updated_mailing_list_ids = update_mailing_list(mailing_dict)
    # Return the resulting ids of the active users
    return updated_mailing_list_ids


def save_output_file(updated_mailing_list, output_filename, io_mode):
    """Write each active user id as its own row in the output csv file."""
    # Open the output file with the `with` context manager
    with open(base_url + output_filename, io_mode, newline='') as active_users_file:
        # csv_writer persists the active user ids to the resulting csv file
        csv_writer = csv.writer(active_users_file)
        # Write each user id as a new row to the file
        for user_id in updated_mailing_list:
            csv_writer.writerow([user_id])


def mailinglist_validation_util(filename, output_filename, io_mode):
    """Read, filter and persist the mailing list; return the number of lines in the output file."""
    # Call the function to read the original mailing list file and cache the results.
    # Assumption: the input is always opened for reading ('r'); io_mode is the write mode (e.g. 'w')
    updated_mailing_list = read_mailing_list_file(filename, 'r')
    # Call the function to write the results back to a csv file
    save_output_file(updated_mailing_list, output_filename, io_mode)
    # Open output file to count the number of lines written
    output_file = open(base_url + output_filename, 'r')
    # Compute the length of the output file
    output_file_length = len(output_file.readlines())
    # Closing the output file
    output_file.close()
    # Return the output file length
    return output_file_length
```
update_mailing_list.py:

```python
# Import suggested package (assumption: deepcopy, so the caller's dictionary is not modified)
from copy import deepcopy


def update_mailing_list(mailing_list):
    """
    Filter the mailing list and return the ids of the active users.

    For more information on how to properly document your function, please refer to the official PEP 8:
    https://www.python.org/dev/peps/pep-000 ... on-strings.
    """
    mailing_list_copy = deepcopy(mailing_list)
    # Checks if the flag `opt-out` is present. lower() lowercases the flags so both
    # `opt-out` and `OPT-OUT` cases are handled.
    # Then, checks for the presence of the `unsubscribed` flag. Finally,
    # checks if the email address contains the `@gmail` provider.
    # Each value is assumed to be [username, email, subscribe_status];
    # list(...) takes a snapshot so keys can be removed safely while looping.
    for key, value in list(mailing_list_copy.items()):
        email, flag = value[1], value[2].strip().lower()
        # Conditional logic to filter out the unsubscribed users
        if flag == 'opt-out' or flag == 'unsubscribed' or '@gmail' in email:
            # Remove the key if one of the above conditions is satisfied
            del mailing_list_copy[key]

    # A list to collect the final output
    ids = []
    # Loop through the updated mailing list and append the ids of the active users to the id list
    for key, value in mailing_list_copy.items():
        # Append only the ids of the active users
        if value[2].strip().lower() == 'active':
            ids.append(key)
    # Returns the ids of the active users
    return ids
```
Optional pandas version:

```python
import pandas as pd

# Global variable to set the base path to our dataset folder
base_url = '../dataset/'


def update_mailing_list_pandas(filename):
    """Return the number of rows whose subscribe_status flag is 'active'."""
    # Read the csv file with pandas; skipinitialspace drops the blank after each comma
    df = pd.read_csv(base_url + filename, skipinitialspace=True)
    df.columns = df.columns.str.strip()
    # Keep only rows with the `active` flag, then return the number of rows
    status = df['subscribe_status'].astype(str).str.strip().str.lower()
    return len(df[status == 'active'])


# Calling the function to test your code
print(update_mailing_list_pandas('mailing_list.csv'))
```
Sample data (mailing_list.csv):

```
uuid, username, email, subscribe_status
307919e9-d6f0-4ecf-9bef-c1320db8941a, afarrimondo, thartus@@reuters.com, opt-out
8743d75d-c62a-4bae-8990-3390fefbe5c7, tdelicatel, [email protected], opt-out
68a32cae-847a-47c5-a77c-0d14ccf11e70, edelahuntyk, [email protected],OPT-OUT
a50bd76f-bc4d-4141-9b5d-3bfb9cb4c65d, tdelicate10, [email protected], active
26edd0b3-0040-4ba9-8c19-9b69d565df36, ogelder2, [email protected], unsubscribed
5c96189f-95fe-4638-9753-081a6e1a82e8, bnornable3, [email protected], opt-out
480fb04a-d7cd-47c5-8079-b580cb14b4d9, csheraton4, [email protected], active
d08649ee-62ae-4d1a-b578-fdde309bb721, tstodart5, [email protected], active
5772c293-c2a9-41ff-a8d3-6c666fc19d9a, mbaudino6, [email protected], unsubscribed
9e8fb253-d80d-47b5-8e1d-9a89b5bcc41b, paspling7, [email protected], active
055dff79-7d09-4194-95f2-48dd586b8bd7, mknapton8, [email protected], active
5216dc65-05bb-4aba-a516-3c1317091471, ajelf9, [email protected], unsubscribed
41c30786-aa84-4d60-9879-0c53f8fad970, cgoodleyh, ccowlinj @hp.com, active
3fd55224-dbff-4c89-baec-629a3442d8f7, smcgonnelli, [email protected], opt-out
2ac17a63-a64b-42fc-8780-02c5549f23a7, mmayoralj, [email protected], unsubscribed
```