Summarizing data – Model the data

Summarizing data

If your source data provides a level of detail that’s not required by reporting, then you may want to consider summarizing your data to reduce cardinality.

For example, if the source data contains daily sales information but you only report monthly values, you may want to summarize your sales data to be at the month level instead of the day level. This approach will reduce the size of your model dramatically, though it will make the reporting of daily data impossible.

It’s preferable to summarize your data as close to the data source as possible. Power Query also allows you to summarize data by using the Group By functionality on the Transform ribbon.

Data summarization involves a trade-off between data model size and the available level of detail; whether you should summarize data depends on your business requirements.

Need More Review? Data Reduction Techniques

All possible data reduction techniques are outside the scope of this book. For more examples of reducing your data when working with imported data, see “Data reduction techniques for Import modeling” at https://docs.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction.

Chapter summary

  • Power BI supports various types of schemas: flat (fully denormalized), star, and snowflake. The preferred schema for Power BI is the star schema.
  • You can configure various column and table properties in the Model view.
  • In some cases, it may be preferable to define role-playing dimensions, which allow you to use a single dimension to filter one fact table by using different keys in the table.
  • Power BI supports the following three cardinality types for relationships: one-to-one, one-to-many, and many-to-many. For one-to-one relationships, the cross-filter direction is always Both (each table filters the other). One-to-many dimensions can have their cross-filter direction be set to either Single (the one side filters the many side) or Both. You choose the cross-filter direction of many-to-many relationships depending on your business requirements. Relationships whose cross-filter direction is set to Both are also known as bidirectional relationships.
  • For bidirectional relationships, security filters won’t flow in both directions automatically, though you can configure that behavior in the relationship properties.
  • For best performance, look carefully at the storage mode of each table, the cardinality and cross-filter direction of relationships, and the cardinality of columns (the number of distinct values).
  • Besides measures, you can use DAX to create calculated tables and calculated columns in Power BI.
  • You can create a common table in Power BI by using Power Query or DAX, or you can load it from a data source.
  • Power BI supports the creation of hierarchies, which can be useful to make models more user-friendly, though they have no technical advantages over several fields being used together in a visual without being combined in a hierarchy.
  • You can secure your data model by using row-level security, which can use static DAX filters on one or more tables, or dynamic row-level security that considers which user is viewing the report.
  • Power BI allows you to use natural language queries by using the Q&A visual. You can add synonyms to your data model to make Q&A work better.
  • CALCULATE is one of the most important functions in DAX, and you can use it to manipulate filters. More specifically, you can add, ignore, and update filters. CALCULATE is also used for context transition.
  • The Time Intelligence family of DAX functions allows you to aggregate values across time; for instance, you can use DATESYTD to calculate year-to-date values, or you can use DATEADD to calculate a value during the same period last year. There are also functions that allow you to create semi-additive measures, such as OPENINGBALANCEMONTH.
  • Power BI has a feature called Quick Measures, which allows you to define calculations without writing any DAX code.
  • In some cases, it may be preferable to replace numeric columns with measures to reduce the size of the data model.
  • In general, you should only load data that is necessary for analysis by removing columns or filtering rows in Power Query, especially for primary keys of fact tables.
  • Performance Analyzer in Power BI can be useful to identify performance bottlenecks.
  • You can improve the cardinality of columns by selecting appropriate data types, as well as summarizing data.