top of page
  • Writer's pictureJason Cockington

Speed Up Power BI Development by Pointing to an Alternate Data Source

Last week I launched my blog with an illustration in how to Combine Multiple Excel Files in SharePoint using Power BI. One of the techniques mentioned in that article is the idea of creating a FileList query, that offers an easy entry point to update the folder location path should the SharePoint folder ever be moved or renamed. This week I'd like to continue on this journey, by discussing how we can leverage this approach to easily update the entire data load process, so that we have flexibility to create both a development and production version of the Power BI report data.


SharePoint's Plusses and Minuses

PLUS: Scheduled Refresh is Easy with SharePoint

One of the main reasons you might choose to load data from a SharePoint data source, is that when you publish your finished Power BI report up to the Power BI Service, it can easily be configured with a Scheduled Refresh so that the data in your report stays up to date. This is because SharePoint is a Microsoft Cloud data source, and therefore does not require a Power BI Gateway in order for Power BI Service to access the data.

To configure the scheduled refresh, simply select the Schedule Refresh icon from the Semantic Model of the report in the Workspace.



When the Settings page opens, select the Data Source Credentials (1), click on the Edit Credentials, and sign into SharePoint. Once authenticated, the Refresh settings will be available (2). Simply enable the Refresh schedule by clicking on the button (3). By default, this will configure a Daily refresh to occur at approximately midnight each day. If you need more specific control over when the refresh occurs, you can simply set the refresh frequency to occur Daily or Weekly, and click Add Another Time (4) to specify exactly when you want the data to be updated.



NOTE: The number of times you can set the refresh to occur is limited by your Power BI licence. Power BI Pro users can schedule up to 8x every 24 hours. Power BI Premium users (includes PPU) can schedule up to 48x every 24 hours. Unfortunately, if you are working with a Power BI Free licence, then scheduled refreshes are not available to you, and you will have to update the data manually.


MINUS: Data Preparation is SLOW with SharePoint

Accessing data from SharePoint is relatively straight forward in Power BI Desktop, however if there is a need to do any transformation to the data as part of the overall data acquisition, the process of preparing the data can become exceedingly slow. This is in part due to the fact that Power Query cannot take advantage of Query Folding when working with a SharePoint data source, but it is more complex than that, because modifying the equivalent data from a local folder/file also cannot Query Fold, but local files don't suffer the same degree of performance degradation.

As a guide, to save you the pain of experiencing this first hand, I have built Power BI models working directly with SharePoint data sources and while preparing the data in Power Query, have been stuck waiting for 5 minutes for the Power Query preview to update before I could move on. This is not an efficient use of time, and there must be a better way.


Develop Using Local Files but Deploy with SharePoint Files

The solution I now prefer to use is to develop the Power BI Reports using local copies of the data source files, then manually switch to the SharePoint files prior to deployment. This approach gives you the best of both worlds; the speed of developing on fast performant local files and the ease of setting a Scheduled Refresh against SharePoint files when the report is ready to publish. To do this, I configure the report to connect to both a local folder and SharePoint folder in Power Query and toggle between the two depending on if I am developing or deploying.

NOTE: The file path structure for SharePoint is different to the file path structure for local files, so you need to use some caution during the set up. Providing you follow the techniques described in the previous article, you won't run into any issues.


Copying the Files to Your PC

Now there are a few ways you can get a copy of the files from SharePoint onto your PC. The method I would recommend is to use OneDrive to Sync the folders you need. That way your PC files (1) will stay in sync with the SharePoint files (2) and the sub folder structure will be the same.



But you can also set up a local folder on your PC and simply place the files you need there manually. If you do it this manual way, I recommend that you use at least the lowest level subfolder name so it matches the lowest level SharePoint subfolder name exactly (spelling, spaces and capitalisation). This is needed later in the process to ensure that you can swap easily between local files and SharePoint files.  It is not mandatory that you replicate the sub folder structure from SharePoint exactly, as long as you can identify your files using at least one sub folder.


Get Files from Local Folder

Ordinarily, when building a SharePoint solution, I would actually start the data preparation directly from the local files, and only switch to the SharePoint files once everything is in place. For this demonstration however, I am modifying the data acquisition of the report developed in the previous article.

Once again, start by clicking the Get Data button on the Home ribbon of Power BI Desktop. This time however, instead of choosing from SharePoint Folder (2), select the Folder option (1).



In the from Folder wizard that appears, either Browse for the desired files or enter your local files' folder path. Click OK. Then when the Navigator window appears, click the Transform Data option.



The Power Query Editor will open at your newly created Folder query. Double click on the query name to give it a new name (1). I am naming mine loc_FileList.



Next, right-click on the loc_FileList, and untick the Enable Load setting. The query should now be italicised, allowing you to see that the query is a connection-only query.

NOTE: In this example, I am repointing from SharePoint to Local Files. In a real-world example, I would start with the Local Files, then perform the File Combine on the Local Files, and only just before publishing the Power BI report would the data source be re-pointed to the SharePoint files.


Pointing to the New Data Source

There are now two different source locations available for our data transformations and load. This gives us the option to develop our acquisition logic on the Local Files data, getting the benefit of the performance gains this data source offers, with the flexibility to re-point the process to run against the SharePoint files when we're ready to deploy.

Looking at the Dependencies view of our current setup reveals we need one more stepping stone in place that sits between the Sample File and the sp_FileList before the technique can be wrapped up. That stepping stone's only job is to allow easy switching of data sources.



To create the stepping stone we need to start with right-clicking the sp_FileList, and choose Duplicate. The new query "sp_FilesList (2)" that is generated contains all the logic necessary to connect to the SharePoint files.

Next, double click on the sp_FileList query, and rename it to FileList. Then rename the "sp_FileList (2)" to sp_FileList.

NOTE: It is important to follow this technique when converting and existing File Combine process, otherwise the repoint may not work correctly.



There are now three source queries: FileList, sp_FileList & loc_FileList. The FileList and sp_FileList both point to the SharePoint source and the loc_FileList points to the Local Folder. The final step in repointing the query from one source to the other is by updating the m-code in the FileList.



Modifying the FileList M-Code

The final stage in the process is to modify the underlying code that is currently resulting in the FileList query pointing to the SharePoint folder. This is most simply achieved by updating the m-code in the Formula Bar but can also be done via the Advanced Editor.

Select the FileList query and replace the formula in the Formula Bar with "= sp_FileList".




From this point on, to switch between sources just requires the formula in the FileList query to be changed from sp_FileList to loc_FileList.



Switching Source with a Parameter Toggle

Now at this point you may be thinking that it would be neat if you could configure a parameter to toggle between both. I agree – great idea. But in order to get this to work the FileList query will need an IF statement to toggle the loc_FileList and the sp_FileList depending on the parameter value. This all works great while in Power BI Desktop, but once the report is published, the Power BI Service will identify that one potential setting on the Parameter will connect to local Folder, and therefore will require a Power BI Gateway. That being the case, a minor manual adjustment to the acquisition logic prior to publishing the report is a minor inconvenience considering what can be gained by following this setup.


Final Thoughts

While using SharePoint data sources are very convenient when it comes to configuring reports with automated refresh schedules in the Power BI Service, SharePoint can be problematically slow as a development data source. By splitting the connection to the data source apart from the transformation logic used to prepare the data destined for the Power BI Semantic Model, it is possible to get the best of both worlds. During development, a local copy of the files can be used. This copy can either be a complete copy of the desired data, or just a subset of the data that is indicative of the overall target data. The transformation steps can be authored, previewed and tested against this faster more performant data source. Then when the report is ready to publish and share with your target audience, simply return to Power Query and re-point the transformation queries from the local files to the desired SharePoint files.

As a final point to consider, this technique can be modified slightly and implemented when accessing large database data sources. By creating a local copy of a subset of the target database, you can reduce Power BI Desktop refresh speeds, allowing you to get straight into to building the report on the development data, and once again, simply re-point the transformation queries to the target data source prior to publishing the report for final review.

111 views0 comments

コメント

5つ星のうち0と評価されています。
まだ評価がありません

評価を追加
bottom of page