SQL Joins

Image
Joins are used to combine rows from two or more tables based on related columns between them. Joins allow you to retrieve data from multiple tables simultaneously, enabling you to create complex queries that fetch data from different sources. There are different types of joins in SQL, including: INNER JOIN Returns only the rows that have matching values in both tables based on the specified join condition. It discards non-matching rows from both tables. Example:           create table t1(x int); insert into t1 values(1); insert into t1 values(1); insert into t1 values(0); create table t2(y int); insert into t2 values(0); insert into t2 values(1); insert into t2 values(1);           select * from t1 inner join t2 on t1.x = t2.y Output: 2. LEFT JOIN (or) LEFT OUTER JOIN Returns all the rows from the left (or first) table and the matching rows from the right (or second) table. If there is no match, NULL values are

WLM, Summary Tables and Subquery

Work Load Management

    By default , redshift cluster is configured with default parameter group with concurrency level of 5 with single queue that means maximum of 5 queries run in parallel in single queue.
So suppose you are firing more than five queries than remaining queries will wait untill the query slot getting freed in a queue.

With WLM we can have maximum of 8 queue and concurrnecy level of 50 across all the queues.
With effective WLM we can maxmize  the query performance , so that the small queries no need to wait untill the long running queries get completed.

We cannot add queues in the default parameter group . So in first we need to create our custom parameter group then only only we can define our queues.

We can map the query to queue based on either by user who running the query or by query type.

query type example.

set query_group to 'small';

Summary Tables

   Try to work on the summarised data . For example suppose multiple queries resulting in aggregating data on around 10 million records . Then we can bring that result as a separate summarised table . So that we can simply going to select records instead of doing the  repeated complex operations that to on minimal data.

SubQueries
 Subqueries helps us to considerably reduce the time taken by queries.for example, Consider table1 & table2 each having 10 million rows.
select columns,... from table1 inner join table2 on table1.column1=table2.column where table1.column='somevalue'
the below query perform better than the above query
select columns,... from (select columns,... from table1 where table1.column='somevalue') table1 inner join table2 on table1.column1=table2.column 

Other important points to consider
  • Always select the required minimal columns instead of using '* from '.
  • When designing table for the varchar columns set the minimum size.

Comments

Popular posts from this blog

Machine Learning Foundations - Deep Learning Summary - Quiz

Machine Learning Foundation - Deep Learning - Programming Assignment

Machine Learning Foundations - Recommender System - Quiz