- Dejan Pajk
Ten Techniques for Optimising Memory Usage in Microsoft Power BI
With Power BI you can connect to a multitude of data sources. You can either import data, which is the most common way, or you can connect directly to the original data source via DirectQuery or Live Connection. When you import data to Power BI, you take advantage of the high-performance in-memory query engine known as VertiPaq.
In this post, I will share with you a few simple techniques that you can apply to optimise the memory usage when importing your data to Power BI. A smaller model not only saves computer memory, it also delivers better performance due to faster scans.
A smaller model not only saves computer memory, it also delivers better performance due to faster scans.
If you would like to speed up data refresh time or improve the responsiveness of your Power BI reports and dashboards, consider applying the following techniques:
Filter source data – Import only rows you need for your analysis, e.g. set date filter to import only transactions for the last two years and not the entire sales history.
Remove columns – Remove all columns that are not relevant to your analysis, such as primary keys not used in relationships, columns that can be calculated from other columns without performance penalty, or description columns stored in a fact table.
Lower granularity – Higher the granularity of a fact table, the more rows it will have. Our data sources usually store fine granular transactional data. Transaction-level grain is too detailed for most reporting scenarios. Lowering the granularity by grouping and pre-aggregating data is a great way to reduce the number of rows in your fact table. For example, consider grouping your data by month if you only need to slice by month and year.
Decrease cardinality – i.e. the number of distinct values of a column. VertiPaq is a columnar database. It stores each column in a separate data structure and applies sophisticated compression and encoding to reduce the memory footprint. With dictionary encoding, which is commonly used, only distinct values of a column are stored in a database. The cardinality is therefore considered as the main factor that determines a column size. Reducing the cardinality is recommended for all columns stored in large tables, such as a fact table. Here are few actions you can apply to decrease the cardinality: round numbers to remove obsolete decimals; round time to remove milliseconds/seconds; separate text columns into two or more parts; split DateTime into date and time columns.
Avoid calculated columns – Calculated columns come with hidden costs. They consume memory, they are not optimally compressed as native columns and they increase data refresh time. Rather than creating a calculated column after data has been loaded into the model, a native column can be added to your source database or within Query Editor. Another efficient way is to replace the calculated column with a measure, which will not consume any memory. You should always gauge performance improvements of a calculated column and related memory costs when deciding whether or not to use it.
Disable load – When you import data from a source, you apply transformations, such as merging and appending queries. As a result, you may end up with queries that are only used as intermediate transformation steps. By default, all queries from Query Editor are loaded into the memory of Power BI model. It is crucial to disable load for all queries that are not required in the final model.
Disable Auto Date/Time – Power BI automatically creates a built-in date table for each date field in the model to support time intelligence DAX functions. Those tables are hidden, they consume memory, and there is no flexibility to add custom columns, e.g. to support the fiscal year in Australia. Many analysts prefer to create their own date table, over which they have full control. Such tables should be marked as a Date Table (feature added in Power BI Desktop February update). To remove all hidden date tables from your model, in Power BI Desktop select File / Options and Settings / Options / Data Load and untick the Auto Date/Time.
Choose appropriate data type – Make sure all columns have the correct data type. Note: Fixed Decimal Number data type is internally stored as an integer. This data type is recommended for all decimal numbers with the precision of four or fewer decimals.
Transform data at the right place – Most data transformations generally take place in Query Editor in Power BI Desktop. Query Editor is a powerful and user-friendly tool. It will keep track of all applied transformation steps which is useful for traceability and future maintenance. However, you may obtain improved performance when you apply transformations directly at the source database. For example, grouping your sales data by month in your transactional database will increase the source query execution times and as a result, only grouped data will be sent over the network to Power BI. Another option is to import fine granular source data and perform grouping later in Query Editor. This will bring opposite effects – faster reads at the source, larger dataset to send over the network and more intensive processing in Power BI. There are many factors that will influence this decision, e.g. network bandwidth, source database speed. Notice that shaping your data in the source database can introduce performance issues for other database users and systems. It is essential to measure query execution times and data refresh times before making a decision.
Consider using DirectQuery – You should import data to Power BI wherever possible, however, if your goals cannot be met by importing data, then consider using DirectQuery. DirectQuery is generally only feasible when the underlying data source can provide interactive queries and is able to handle the query load. Follow the official guidance to implement DirectQuery successfully.
Remember, memory is the biggest asset in Power BI.
Remember, memory is the biggest asset in Power BI. Techniques represented in this post will reduce memory footprint which has a direct impact on the performance of your reports and dashboards.
Link to the original article: https://www.linkedin.com/pulse/ten-techniques-optimising-memory-usage-microsoft-power-dejan-pajk