This lab takes you through a real world example of query optimization. First, let's examine the original query. Don't fr


Post by answerhappygod »

This lab takes you through a real-world example of query optimization. First, let's examine the original query. Don't fret at its length, as we just want to look through a few key parts at this point. Download and examine it while reading the next section.

Notice the use of the WITH keyword. This feature is known as Common Table Expressions (CTEs). The easiest way to think of this is that it creates a named temporary table which gets used later as a nested query. This is important, as it is crucial to our issue. The MySQL docs cover this feature.

Also notice that there are several operators in this query that are specific to the Amazon Redshift database this query was written for and will not work in MySQL. These are :: for casting types and || for concatenation. There are a couple of other features specific to Redshift, such as ILIKE as a case-insensitive version of LIKE, and the syntax for INTERVAL being slightly different. A version of this query modified to work with MySQL will be provided below.

One more feature to examine is the CASE statement. This is not relevant to this optimization task, but it is a highly useful feature which functions similarly to SWITCH statements in programming languages.

Now that you've looked through the original query and realize it is a series of smaller nested queries, let's try the MySQL version. First, run the initialization query. Second, load the converted query. The query should run with no problem, despite no data being entered in the tables. We don't need data; we need the query plan.

Try running EXPLAIN on this query. A query this complicated gives a messy result, which makes optimization difficult. If this were your starting point, the best approach would be to pull out each temporary table and EXPLAIN it in isolation, step by step. But since this is originally from Redshift, let's look at Redshift's EXPLAIN results. What you should draw your attention to are the cost values. While these are arbitrary, they are not meaningless.
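To make the Redshift-to-MySQL differences above concrete, here is a minimal sketch assuming MySQL 8.0 or later (CTEs are not available in earlier versions). The table demo_listings and all of its columns are hypothetical and exist only for illustration; they are not part of the lab's schema. Each comment shows the Redshift form next to its MySQL counterpart.

```sql
-- Hypothetical table, for illustration only (not part of the lab's schema).
CREATE TABLE IF NOT EXISTS demo_listings (
    id           INT PRIMARY KEY,
    street       VARCHAR(100),
    unit         VARCHAR(20),
    listed_on    VARCHAR(10),  -- date kept as text so the cast below has work to do
    listing_type VARCHAR(20)
);

-- The CTE "recent" is a named temporary result that the final SELECT reads like a table.
WITH recent AS (
    SELECT
        id,
        -- Redshift: street || ' ' || unit
        CONCAT(street, ' ', unit) AS pretty_address,
        -- Redshift: listed_on::date
        CAST(listed_on AS DATE) AS listed_date,
        -- CASE behaves much like a switch statement
        CASE listing_type
            WHEN 'rental' THEN 'Rental'
            WHEN 'sale'   THEN 'For Sale'
            ELSE 'Other'
        END AS type_label
    FROM demo_listings
    -- Redshift: street ILIKE '%main%'
    WHERE LOWER(street) LIKE '%main%'
)
SELECT *
FROM recent
-- Redshift: listed_date >= CURRENT_DATE - INTERVAL '7 days'
WHERE listed_date >= CURRENT_DATE - INTERVAL 7 DAY;
```

With MySQL's default case-insensitive collations a plain LIKE would already ignore case, so the LOWER() call is only a safeguard for case-sensitive collations. Prefixing the WITH statement with EXPLAIN should show the plan for just this piece, which is the "explain in isolation" approach described above.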
Find the section of the query where the cost grows the largest; that is the slowest part of the query and the one in need of optimization. (Save yourself some time and search for other_publisher_views.) This query had previously been reviewed for proper indexing, so the tables involved should already be indexed and we need to find another optimization. Draw your attention instead to the rows returned. If you look at the next query, which makes use of these results, you'll notice its rows value is much smaller. So our problem is that we are looking at far more data than we need: we are pulling over 100,000 rows when we are trying to find 4 of them.
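The "too many rows" symptom can be reproduced in miniature. The sketch below uses a hypothetical table, demo_views, with made-up column names and a made-up property id; it is not the lab's schema. It compares the plan for a lookup that only filters on the property with one that also filters on the date range. On a populated table, the rows estimate in the EXPLAIN output would be expected to drop for the second statement; on an empty table the estimates carry little information.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS demo_views (
    property_id INT,
    view_date   DATE,
    page_views  INT,
    INDEX idx_property_date (property_id, view_date)
);

-- Plan A: every row for the property is read, whatever its date.
EXPLAIN
SELECT view_date, page_views
FROM demo_views
WHERE property_id = 42;

-- Plan B: the date range is applied in the same lookup,
-- so only the rows that are actually needed are read.
EXPLAIN
SELECT view_date, page_views
FROM demo_views
WHERE property_id = 42
  AND view_date BETWEEN '2020-11-09' AND '2020-11-15';
```

Both statements can use the same index; the difference is how much of it has to be scanned, which is what the rows column reflects.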

Now let's extract the critical section to work with, along with simplified versions of its dependencies:

    WITH
    listing_info_raw AS (
        SELECT
            '' AS property_id_sha,
            '' AS mc_export_property_id_sha,
            '(Rental)' AS pretty_address,
            '(Rental)' AS display_address
    ),
    periods AS (
        SELECT
            true AS is_current_period,
            CAST('2020-11-09' AS date) AS start_date,
            CAST('2020-11-15' AS date) AS end_date
    ),
    other_publisher_views AS (
        SELECT
            clv.date,
            clv.streeteasy_page_views,
            clv.trulia_page_views,
            clv.zillow_page_views,
            clv.realtor_page_views
        FROM agent_insights_lt.agent_insights_combined_listing_views AS clv
        JOIN listing_info_raw
            ON clv.property_id_sha = listing_info_raw.property_id_sha
    ),
    other_publisher_views_by_period AS (
        SELECT
            is_current_period,
            start_date,
            end_date,
            coalesce(sum(streeteasy_page_views), 0) AS streeteasy_page_views,
            coalesce(sum(trulia_page_views), 0) AS trulia_page_views,
            coalesce(sum(zillow_page_views), 0) AS zillow_page_views,
            coalesce(sum(realtor_page_views), 0) AS realtor_page_views
        FROM periods
        LEFT JOIN other_publisher_views
            ON other_publisher_views.date BETWEEN periods.start_date AND periods.end_date
    )
    SELECT * FROM other_publisher_views_by_period;

The task for this lab is to optimize this section of the query, which includes other_publisher_views and other_publisher_views_by_period. Submit your optimized version of this section of the query.

Extra Credit: Find and optimize similar sections of the larger query. (MySQL version)
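One possible direction, offered as a sketch under stated assumptions and not as the lab's official answer: since the problem is that other_publisher_views pulls every row for the listing and the date range is only applied afterwards, the date restriction can be moved into other_publisher_views itself. The identifiers below are reconstructed from the garbled listing above, the GROUP BY clause is an addition that does not appear in the source, and the query assumes MySQL 8.0 or later with the lab's initialization query already run.

```sql
WITH
listing_info_raw AS (
    SELECT
        '' AS property_id_sha,
        '' AS mc_export_property_id_sha,
        '(Rental)' AS pretty_address,
        '(Rental)' AS display_address
),
periods AS (
    SELECT
        true AS is_current_period,
        CAST('2020-11-09' AS DATE) AS start_date,
        CAST('2020-11-15' AS DATE) AS end_date
),
other_publisher_views AS (
    SELECT
        clv.date,
        clv.streeteasy_page_views,
        clv.trulia_page_views,
        clv.zillow_page_views,
        clv.realtor_page_views
    FROM agent_insights_lt.agent_insights_combined_listing_views AS clv
    JOIN listing_info_raw
        ON clv.property_id_sha = listing_info_raw.property_id_sha
    -- Added filter: keep only rows that can fall inside some period,
    -- instead of every row ever recorded for the listing.
    WHERE clv.date BETWEEN (SELECT MIN(start_date) FROM periods)
                       AND (SELECT MAX(end_date) FROM periods)
),
other_publisher_views_by_period AS (
    SELECT
        periods.is_current_period,
        periods.start_date,
        periods.end_date,
        COALESCE(SUM(streeteasy_page_views), 0) AS streeteasy_page_views,
        COALESCE(SUM(trulia_page_views), 0)     AS trulia_page_views,
        COALESCE(SUM(zillow_page_views), 0)     AS zillow_page_views,
        COALESCE(SUM(realtor_page_views), 0)    AS realtor_page_views
    FROM periods
    LEFT JOIN other_publisher_views
        ON other_publisher_views.date BETWEEN periods.start_date AND periods.end_date
    -- Assumption: grouping added so the query is valid under ONLY_FULL_GROUP_BY.
    GROUP BY periods.is_current_period, periods.start_date, periods.end_date
)
SELECT * FROM other_publisher_views_by_period;
```

The MIN/MAX bounds cover every period, so the later LEFT JOIN still sees all the rows it could match and the per-period sums should be unchanged. Whether the row count actually drops depends on the table having an index that covers property_id_sha and date, which is worth confirming with EXPLAIN.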