Invalid Relationship – Where the Blanks Come from?

Summary

When you’re sure there is no blanks in your dimension data but still see blanks in your visuals, chances are, you have an invalid relationship.

A valid relationship between two tables is based on a column which values should be in common. This way, any row in the fact table can find the corresponding rows in the dimension table via the same values. However, if some values in the fact table don’t exist in the dimension table, the engine will create an additional blank row to store those data.

 

Blanks? It can be scary …

You go to the data view to check if there are blanks in the Color field.

If yes, it’s easy to make the blanks disappear by assign them a more meaningful name such as “Unknown”.

What if no?

If it’s in this case, the reason is, you have an invalid relationship between the two tables and this invalid relationship is due to bad data.

Yes, bad data.

 

In this example, a valid relationship between two tables is based on a column which values (Product Key) should be in common. This way, any row in the fact table (Sales) can find the corresponding colors in the dimension table (Product) via the same values of Product Key.

What if some rows in the fact table (Sales) have the Product Keys not existing in the dimension table (Product)? These rows of data flow into Product table but can’t find corresponding colors. They have to go to an additional row which is created only for them, the blank row.

So even you can’t see the blank in the data view, you still can spot them in a slicer visual. In report view, drag and drop a slicer visual on the canvas, and then drag the Color in Product table into this slicer. Now you see the additional blank.

We can even see the numbers of blanks by creating a measure Product Count and put it in the table visual:

We see there are 25295 rows of data has no corresponding colors.

Let’s take a closer look at it by taking Product Key “215” as an example. In the Data view, we see there are 201 rows of sales with Product Key of “215” exist in Sales table while not in Product table.

We can find it out even clearer by creating another slicer for Product Key from Sales table:

Now we have proven this bad data issue. How to fix it? The only way is to fix the data.

In some cases, we need to fix it in data source, make sure all of fact data has corresponding rows in the dimension table. In another cases, we need to check how the data is refreshed and make sure all tables are refreshed at the same time. For instance, when you work with real time data in database or data warehouse, you may refresh a table when doing transformation in Power Query but the others have not. Make sure to refresh all the tables right after you finish the transformation.

Now let’s pretend we have fixed the data by adding two colors’ data in data source. Now you see the blank disappear!

Related Posts

Leave a Reply