SortKey, DistKey & Compression

- March 10, 2017

Sortkeys

Compound Sortkeys

Effecient when used in the where conditions are in same order as specified in the sortkey.
This is the default sortkey .

ex : sortkey (column1, column2)

where column1='somevalue' and column2='somevalue' give better performance than where column2='somevalue'

Interleaved Sortkeys

Gives equal weightage to all the columns specified in the sortkey. Gives better results than the compound sortkey in the circumstances when cannot able to identity columns used in the where conditions.

ex : interleaved sortkey (column1, column2)

As time grows the data added to the existing table grows , it will degrade the performance on both the compound and interleved sortkeys , but it has greter effect on interleaved key.Need to Vaccum Reindex sort key tables whenever new data added.

http://docs.aws.amazon.com/redshift/latest/dg/r_vacuum-decide-whether-to-reindex.html

Distribution Key

Defines how rows should be stored across the nodes.Usually moving the data between the nodes during joins consumes more time, so we have to more careful in defining distribution keys.
we can define only one distkey per fact table.

if sort and distkey uses the same column , it would help the query optimizer to choose effective MERGE JOIN step instead of other join types.

In some cases we cannot come to the conclusion in identifying the distkey , in that case we can use diststyle even , it would randomly distribute the rows across the nodes.

for the dimension table if data are minimal then use diststyle all, it would copy the entire table across all the nodes , so it would avoid the cost effective DB_BDCAST,DB_DB_INNER.

Compresssion Encoding

Compression techniques helps to reduce the data size in disk , so during the join operation the size of the data moved between nodes will get reduced , it will gradually increase the performance.
We can enable the compression encoding for all the columns in table.
Its better to avoid enabling the encoding for the sort keys , so while processing it has to decode the sort column and where conditions will be applied it will cause performance degrading.
We can enable compression either manually or automatically.
If its manual , then we need to analyze the columns before enabling encoding.

analyze compression tablename

will do this.But with sufficient data it would suggest more accurate encoding type.

The best option is to leave the finding of perfect encoding type to redshift by enabling endoing during copy like

COPY ... COMPUPDATE ON

Search This Blog

Mgk work's

SQL Joins