"Union All" and "Full Outer Join"

I sometimes come across a situation where I need to get different columns from different tables which do not have anything in common, all into a single resultset as multiple rows.  Examples include junk dimensions or analytical resultset which source from different tables. Those source tables may not be related in a way to warrant a straightforward join.


Situation 1: Junk dimension
For the record, a Junk Dimension is a dimension table with combinations of multiple flags or other descriptive attributes. If those attributes are completely independent and are not tired to any entity, the brute method would be store all possible combinations in the dimension table. For example, if there are 5 just attributes with 3 values each, then it would result in 5X5X5=125 combinations. This is well and good when the number of attributes and the distinct values for each comprise a small set.

Say, if those attributes originate from a single entity in the source, we can then get only those specific combinations instead of storing all the possible combinations. In the earlier example, if the attributes are associated with an entity, and only 50 out of the possible combinations ever occur in practice, then we can query the source entity to get only those 50 combinations to be stored in our dimension.

Now, let's complicate this a little bit. Say, we are adding a couple of attributes to this junk dimension but they are either not coming from the same source table and not associated with the entity. What we want here is that for all the existing 50 records, the attributes 6 and 7, which are coming from an independent source, need to be NULL. And have additional rows with combinations of 6 and 7 populated and the first 5 attributes as NULL.

A simple way to do this is to use Union All. Select all columns from the first source table, the attributes 6 and 7 will be hard-coded as NULL. Union this with the select statement selecting all columns from second table, where only attributes 6 and 7 have values.


Select Col1, Col2, Col3, Col4, Col5, NULL as Col6, NULL as Col7 From TableA UNION ALL Select NULL as Col1, NULL as Col2, NULL as Col3, NULL as Col4, NULL as Col5, Col6, Col7 From TableB

Depending on the number of columns and the complexity of transformations involved, an easier and more compact form of code can be as below, using Full Outer Join:

Select Col1, Col2, Col3, Col4, Col5, Col6, Col7 From TableA FULL OUTER JOIN TableB ON 1 = 2

Situation 2: Combine metrics from two different tables into a single row
Consider two tables representing two different entities. For example, Sales and Inventory. Each has its own metrics (Sales Amount, Profit, Inventory, Safety Stock Days) but have only a couple of  common attributes that we would like to analyze by (Product and Date). The grains of both the tables include other disparate columns. If we would like to see all these metrics together in a row for a given product and time period, we cannot join on these common columns and expect to get right values for the metrics from the right table, as those two columns do not represent the grain of either table. That kind of join leads to inflated values resulting from double-counting.

So, we can use the trusted Union All method to get what we want:
SELECT Month, Product, Sum(SalesAmount) TotalSales, Sum(Profit) TotalProfit, Avg(SafetyStockDays) AvgSafetyStockDays FROM ( SELECT Product, MONTH(DateOfSale) as Month, SalesAmount, Profit, NULL AS SafetyStockDays FROM Sales WHERE DateOfSale BETWEEN '2018-01-01' AND '2018-03-31' UNION ALL SELECT Product, Month(Date) as Month, NULL AS SalesAmount, NULL AS Profit, SafetyStockDays, FROM Inventory I WHERE I.Date BETWEEN '2018-01-01' AND '2018-03-31' ) Combined GROUP BY Month, Product

Or the full outer join like this:

SELECT COALESCE(Sales.MonthOfSale, Inventory.Month) Month, COALESCE(Sales.Product, Inventory.Product) Product, Sum(SalesAmount) TotalSales, Sum(Profit) TotalProfit, Avg(SafetyStockDays) AvgSafetyStockDays FROM ( SELECT Product, SalesAmount, Profit, MONTH(DateOfSale) as MonthOfSale FROM Sales WHERE DateOfSale BETWEEN '2018-01-01' AND '2018-03-31' ) Sales FULL OUTER JOIN ( SELECT Product, Inventory, SafetyStockDays, Month(Date) as Month FROM Inventory I WHERE I.Date BETWEEN '2018-01-01' AND '2018-03-31' ) Inventory ON 1 = 2 GROUP BY COALESCE(Sales.MonthOfSale, Inventory.Month),
COALESCE(Sales.Product, Inventory.Product)


Comments