Back to top

Power BI Meta Data extraction using Python

  • Data & Analytics
  • Yash Kumar Shrivastava
  • May 17, 2023

In this blog we are going to learn about Power BI.pbit files, Power BI desktop file Meta data, Extraction of Power BI Meta data and saving it as an excel file using .pbit file and a simple Python code using libraries like Pandas, OS, Regex, JSON and dax_extract.

What is Power BI and .pbix files?

Power BI is a market leading business intelligence tool by Microsoft for Cleaning, Modifying and Visualizing raw data to come up with actionable insights. Power BI comes with its own data transformation engine called power query and a formula expression language called DAX (Data Analysis Expressions).

DAX gives power BI the ability to calculate new columns, dynamic measures, and tables inside Power Bi desktop.

By default, Power BI report files are saved with .pbix extension which is a renamed version of a ZIP file which contains multiple components, such as the visuals, report canvas, model metadata, and data.

What is Power BI .pbit file

.pbit is a template file created by Power Bi desktop which is also a renamed version of a ZIP file that contains all the Meta data for the Power BI report but doesn’t contain the data itself. Once we extract .pbit file we get a DataModelSchema file along with other files which contain all the Meta data of a Power BI desktop files.

Later in this blog we will be using these .pbit and DataModelSchema files to extract Power BI desktop Meta data.

What is the Meta data in a Power BI Desktop file

Regarding what you see in the Report View in a Power BI desktop, meta data is everything. You can think of all the information as meta data, including the name, source, expression, data type, calculated tables, calculated columns, calculated measures, relationships and lineage between the model’s various tables, hierarchies, parameters, etc.

We will mainly concentrate on extracting Calculated Measures, Calculated Columns, and Relationships in this blog.

Extraction of Meta data using Python

Python was used to process and extract the JSON from the.pbit file and DataModelSchema. We first converted JSON to a Python dictionary before extracting the necessary Meta data.

Below are the steps we will need to achieve the requirement:

    1. Exporting .pbix file as .pbit file

    There are two ways to save our power BI desktop file as .pbit file.

    • Once we are in Power BI desktop, we have an option to save our file as power BI template(.pbit) file
    • We can go to File–>Export–>Power BI Template and save the .pbit file at the desired directory.

    2. Unzipping .pbit file to get DataModelSchema file

    We can directly unzip the .pbit file using the 7z-Zip file manager or any other file manager. Once we Unzip the file, we will get a folder with the same name as that of the .pbit file. Inside the folder we will get the DataModelSchema file, we will have to change its extension to .txt for reading in python.

    3. Reading .pbit and Data model schema file in python

    We have an option to directly read the .pbit file in python using the dax_extract library. Second option to read the text file in python and using the JSON module convert it into a Python dictionary. Code can be found in the GitHub repository link given at the end of this file.

    4. Extracting Measures from the dictionary

    The dictionary that we get consists details of all the tables as separate lists, Individuals tables have details related to the columns and measures belonging to that table, we can loop on each table one by one and get details of columns, Measures etc. Below is an example of the Python code can be found in the GitHub Repository link given at the end of this file.

    table Numbertable NameMeasure NameMeasure Expression
    05Query Data% Query ResolvedCALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
    15Query DataSpecial Query PercentageCALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
    26Asset DataClient Retention RateCALCULATE(COUNT(‘Asset Data'[Client ID]),’Asse…

    5. Extracting calculated columns from the Dictionary

    Like how we extracted the measures we can loop on each table and get details of all the calculated columns. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file.


    table noTable Namenameexpression
    82CalendarQuarterCONCATENATE(“Q”,QUARTER(‘Calendar'[Date]) )

    Also Read:  Certainty in streaming real-time ETL

    6. Extracting relationships from the dictionary

    Data for relationships is available in the model key of the data dictionary and can be easily extracted. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file. 

    From TableFrom ColumnTo TableTo ColumnState
    0Operational DataRefresh DateLocalDateTable_50948e70-816c-4122-bb48-2a2e442…Dateready
    1Operational DataClient IDClient DataClient IDready
    2Query DataQuery DateCalendarDateready
    3Asset DataClient IDClient DataClient IDready
    4Asset DataContract Maturity DateLocalDateTable_d625a62f-98f2-4794-80e3-4d14736…Dateready
    5Asset DataEnrol DateCalendarDateready

    7. Saving Extracted data as an Excel file

    All the extracted data can be saved in empty lists and these lists can be used to derive a Pandas data frame. This Pandas data frame can be exported as Excel and easily used for reference and validation purposes in a complex model. Below snapshot gives an idea of how this can be done.

    Do you want to know more about Power BI meta data using Python? Then reach out to our experts today.

    Click here


    In this blog we learnt about extracting metadata from .pbit and DataModelSchema file. We have created a Python script that allows users to enter the file location of .pbit and DataModelSchema file and then metadata extraction along with excel generation can be automated. The code can be found on the below GitHub also sample excel files can be downloaded from below GitHub link. Hope this is helpful and will see you soon with another interesting topic.

    Yash Kumar Shrivastava