A beginning is a very delicate time (excuse the DUNE reference there 😉). Know then that it was 10 years ago today, February 23, 2014, that my dear friend and mentor Matt Allington posted his first blog article on Excelerator BI. The article was titled Consolidate Multiple Excel Files in SharePoint using Power Query.
Things have changed a lot in 10 years, and consolidating files using Power Query is much easier now, but there are still some tricks that can make this process more durable. It is for this reason that I have chosen to write this article as a new spin on a familiar technique, as the first article I am posting to my new Blog.
When and why do we need to combine files
A common mistake I see among new Power BI starters, is when they have data spanning multiple years stored in separate Excel source files (common with budgets) they load each year’s data as a separate table into Power BI.
While this doesn’t prevent analysis of the data, it makes calculation logic much more complex than it needs to be. In the example above, to calculate [Total Units Sold], the measure will look something like this:
Total Units Sold = SUM('2013'[Value]) + SUM('2014'[Value])
The calculation will then need to be updated each time an additional year of data is added to the model. Needless to say, this is not the most efficient way to structure the data.
There’s a better way
Whenever we load data into Power BI, we are engaging the Power Query mashup engine. This means we have ample opportunity to modify and tweak the way the data gets loaded into Power BI. The key is to click on the Transform Data button instead of the Load button whenever you need to make adjustments.
In the example above, each data table has been loaded by choosing to get data from Excel. Unfortunately, there is no simple way to modify an Excel request to allow you to combine multiple files together. Power Query does however have an inbuilt feature that allows you to import data from a folder, and it is via this connector that we can use to combine all of our data files together.
Note: The Folder (1) connector is the local files version of the SharePoint Folder (2) connector. In this article I will be using the SharePoint Folder connector, however the technique is the same when connecting to a local Folder.
Getting Setup
To start the process, I have added a few basic Excel files into a new folder in my SharePoint site. This folder should contain all the files you want to consolidate, however they do not all need to be at the same level of folder path. For example, in the demo I am presenting below, I have added a folder called FileCombine to my SharePoint site as the top-level folder, and then inside this folder is three sub folders, 2013, 2014 and 2015, and in each of these folders is two Excel workbooks. The idea here is that we want to consolidate all the files inside the FileCombine folder, irrespective of which particular year they are from. This is possible because Power Query looks in the specified folder and all its subfolders. A process known as recursion.
Getting a clean start
To start the process, we need to click on the Get Data icon from the Home ribbon of Power BI, and select the SharePoint Folder connector. Once selected, the connection parameter wizard asks for the root URL for the SharePoint Folder. This is generally everything before the Shared%20Documents in the URL of your SharePoint location.
After Power Query has successfully connected to your SharePoint Folder, a navigator window will appear, asking how you would like to proceed. While the Combine button will be tempting to click at this stage, I recommend you instead click the Transform Data button.
There are two main reasons for opting down this path.
1. The Combine button will result in the folder location being hardcoded in two places in Power Query, making it harder to update the load if your files ever move to a new location.
2. The Transform Data button gives us an opportunity to create a single-entry point into the file-combine, which can easily be adjusted in the future if there is a need to create another entry point for development vs production (I will discuss this in another blog).
Create a FileList
When the Power Query Editor window opens, the first thing we need to do is rename the Query. This can be done by simply double-clicking on the query in the Queries pane (1). Let’s call it sp_FileList. I recommend adding the prefix sp_ so that it is easy to ascertain that this is a SharePoint Folder data source.
Next, right-click on the sp_FileList, and untick the Enable Load setting.
The query should now be italicised, allowing you to see that the query is a connection-only query.
Once again, right-click on the query, and this time select Reference from the popup menu.
This new dependent query will be called “sp_FileList (2)” by default, so double-click and rename it appropriately. This query will end up becoming the final table that is loaded into Power BI with all the combined data (the Master Query). For this demo, I am renaming mine CombinedSales.
Peaking at the Query Dependencies, I can now see the logical flow of the data load.
Housekeeping the Master Query
Before we are ready to combine the files, we must ensure that we are only combining the files we intended to combine. This means we need to locate those files and protect the process against rouge files that could appear in the SharePoint Folder. For example, if a PDF document was accidentally placed into the SharePoint Folder, given that this file-combine is combining Excel documents, the PDF would actually break the data load process, and the entire report would fail to update!
Finding the Files
The first step of housekeeping is to ensure we only target the SharePoint Folder with our files. Because the initial connection parameter only accepts the root URL, at present, the Master Query has every file inside that root folder.
To filter the data down to the few files we do want to combine, navigate to the Folder Path column. Click on the filter arrow (1), and choose Text Filters (2), Contains... (3).
When the wizard appears, provide the name of the Folder your data lives in.
Note: If your data is spread across multiple folder paths, make sure you provide the appropriate keywords as an Or Contains condition to locate and keep all of the correct folders.
It is important that we identify the correct SharePoint Folder(s) via the Text Filter Contains… option, as it ensures we don’t hard code the Folder Path, which will be important if the folder ever moves.
Avoiding refresh issues from rogue files
The last task in housekeeping the Master Query is to ensure that only the file type we are working with exists in the list of files. To achieve this, navigate to the Extension column, click the filter arrow (1), and untick Select All (2).
This will also untick the .xlsx file type, which is unavoidable, but the result will be that an Applied Step will be generated by the interface, producing a formula that just needs a minor tweak to fix.
To fix, simply replace the <> in the formula with =
This forces Power Query to only keep the .xlsx files in the folder, so any rogue PDFs or CSV’s will be removed from the FileList before the file-combine commences.
Note: if you don’t already have the formula bar active in Power Query, you can activate it by going to the View ribbon (1) and ticking the Formula Bar check box (2).
Why all this trouble?
While this process appears much more complicated than if we had clicked the Combine button all the way back at the folder navigator, by going through these preparation steps, we will ultimately have developed a much more robust file-combine data load than would otherwise exist.
In the image below, you can see the lological flow of the two techniques. The navigator’s Combine button is the green arrow process. By performing our housekeeping steps, we have followed the black arrows and are now ready to reengage the One-click Combine.
Combining the files in the Master Query
To begin the file-combine, navigate to the Content column and click the Combine Files arrow.
When the Combine Files wizard appears, be sure to keep the Sample File set to First file (unless there is a valid reason why you need to hard code a specific file), select the object you wish to combine from each file (in this case, Sheet1), and click OK.
Power Query will then prepare several Helper Queries, all of which are important to the overall process; however, discussing their role is a topic for another day.
Now that the folder of files has been successfully combined, you can perform any necessary adjustments to the transformation logic, to further clean and prepare your data. For now, however, my data is adequately prepared, so I will close and apply.
Wrap Up
The result of this technique ensures that all of the data in separate Excel files ends up in a single data table within my Power BI data model, making it easy to filter and write DAX measures to calculate any necessary aggregations.
The entire data load logic flow can be seen in the dependencies schematic below, highlighting that even though the entire data load is complex, the sp_FileList query is easy to identify, and update if the SharePoint Folder ever moves location.
While I am sure many of you reading this article have performed a file-combine before, I hope I have shown you an alternate approach to the normal built-in technique, adding durability and portability to a process that is otherwise delicate and easily disrupted.
Great blog!
Like it , good to go, appreciated