Join the Review in Azure SQL Data Warehouse

How are you planning to create fact and dimension tables too quickly joining the new Azure SQL Data Warehouse?

Will hash propagate large fact tables and replicate smaller tables to speed up joins or to make indexing a priority?

+3


source to share


3 answers


Azure SQL Data Warehouse natively offers two types of tables - Round Robin and Hash Distributed (see DW Table Docs at https://azure.microsoft.com/documentation/articles/sql-data-warehouse-develop-table-design/ ) ,

Typically, for size charts, you go for the round distribution. For fact tables, you will want to choose HASH based distributed table schemas.



** Edit: Replication is now supported, which can be a useful option for some dimension tables.

+3


source


Your basic premise of hash spreading large fact tables and replicating smaller tables works fine in MPP environments such as PDW, but since SQL DW does not assume replicated data (yet hopefully one day), you will need to use Round distribution to do this. Robin.



If you can minimize data movement, then there are some good steps you can take to improve connection performance. However, having data on the right server is only half the battle, and you should also consider indexing strategies as you would in a regular (SMP) SQL Server environment.

+1


source


Please note that ADW REPLICATE is in public view, but I think it is still faulty. I have a few small tables that I changed to REPLICATE, but when I join these replicated tables and go through the explain xml plan, I still see data movement steps that shouldn't be in the data, REPLICATED for all nodes. So to figure out why I did DBCC PDW_SHOWSPACEUSED on multiple replicated tables, and instead of seeing that the row count is the same for all nodes, they differ in that the node count has zero rows. I am by no means an expert, but I believe their job should be done, but I cannot find any forums, discussions or review pages to report these issues.

0


source







All Articles