Indium is a Sponsor at HIMSS23. Meet us at Booth #8300-12

Indium Software Indium Software
Indium software Logo
  • SERVICES
    • Application Engineering
          • Agile Product Engineering
            • Product & Platform Engineering
            • Product Modernization
            • Product Maintenance & Support
          • Agile Application Development
            • Enterprise Application Development
            • Rapid Application Development
            • Application Modernization
            • Application Maintenance & Support
          • Low-Code Development
            • Mendix Services
            • PowerApps
            • OutSystems
    • Data & Analytics
          • Data Engineering
            • Data Architecture
            • Data Ingestion
            • Data Warehouse & Data Lake
            • Cloud Data Management
            • DataOps
          • Data Modernization
            • Data Migration Services
            • Data Workload Optimization
            • Data Pipelines
          • Data Analytics
            • Self service BI
            • Embedded Analytics
            • Data Visualization Services
            • Data Democratization
          • Data Science & AI
            • Data Discovery
            • Predictive Analytics
            • Data Annotation and Labelling
            • Data Augmentation
            • Text Analytics
            • MLOps
            • IoT Analytics
    • Cloud Engineering
          • Cloud Services
            • Cloud Strategy & Advisory Services
            • Cloud Solution Architecture
            • Hybrid Cloud & Optimization
            • Cloud Migration
            • Cloud Modernization
          • Cloud Platforms
            • AWS
            • GCP
          • DevOps
            • CI/CD Services
            • Deployment Automation
            • DevOps Containerization Services
            • Shared DevOps Services
    • Digital Assurance
          • Quality Assurance Services
            • Functional Testing
            • Test Automation
            • Performance Testing
            • Test Data Management
            • Business Value Chain Testing
            • UAT Testing Services
          • Quality Engineering Services
            • Data Assurance
            • API/Microservices Testing
            • Resiliency & Chaos Engineering
            • Low Code Platform Testing
            • TestOps Services
          • Future Tech Testing Services
            • In-sprint Automation
            • NFT and Blockchain Testing
            • IoT Testing
            • ETL Testing
          • Test Advisory & Consulting
            • QA Maturity Assessment
            • Automation Strategy
            • Transformation from QA to QE
            • Agile Testing Services
    • Low Code Development
          • Mendix
            • Mendix Design and Architecture
            • Mendix App development
            • Mendix QA
            • Mendix Reseller
          • PowerApps
            • PowerApps development Services
            • Indium Power stack for PowerApps
          • Outsystems
            • Indium OutSystems Services
            • Why OutSystems
          • Low Code Framework
            • LOCAS
  • DIGITAL ACCELERATORS
    • teX.ai
    • uphoriX
    • iDAF
  • INDUSTRIES
    • Healthcare
    • BFSI
    • Retail
    • Manufacturing
  • PARTNERS
    • Mendix
    • AWS
    • Striim
    • Databricks
    • GCP
  • INSIGHTS
    • Blogs
    • Case Studies
    • Success Stories
    • Whitepapers
    • Webinars & Podcasts
  • ABOUT
    • About Us
    • News and Events
    • CSR
    • Contact
  • CAREERS
  • INQUIRE NOW
Data & Analytics

Power BI Meta Data extraction using Python

By Yash Kumar Shrivastava May 17, 2023 5 Mins Read
285
SHARES
ShareTweet

In this blog we are going to learn about Power BI.pbit files, Power BI desktop file Meta data, Extraction of Power BI Meta data and saving it as an excel file using .pbit file and a simple Python code using libraries like Pandas, OS, Regex, JSON and dax_extract.

Topics Covered

  • What is Power BI and .pbix files?
  • What is Power BI .pbit file
  • What is the Meta data in a Power BI Desktop file
  • Extraction of Meta data using Python
    • 1. Exporting .pbix file as .pbit file
    • 2. Unzipping .pbit file to get DataModelSchema file
    • 3. Reading .pbit and Data model schema file in python
    • 4. Extracting Measures from the dictionary
    • 5. Extracting calculated columns from the Dictionary
    • 6. Extracting relationships from the dictionary
    • 7. Saving Extracted data as an Excel file
  • Conclusion

What is Power BI and .pbix files?

Power BI is a market leading business intelligence tool by Microsoft for Cleaning, Modifying and Visualizing raw data to come up with actionable insights. Power BI comes with its own data transformation engine called power query and a formula expression language called DAX (Data Analysis Expressions).

DAX gives power BI the ability to calculate new columns, dynamic measures, and tables inside Power Bi desktop.

By default, Power BI report files are saved with .pbix extension which is a renamed version of a ZIP file which contains multiple components, such as the visuals, report canvas, model metadata, and data.

What is Power BI .pbit file

.pbit is a template file created by Power Bi desktop which is also a renamed version of a ZIP file that contains all the Meta data for the Power BI report but doesn’t contain the data itself. Once we extract .pbit file we get a DataModelSchema file along with other files which contain all the Meta data of a Power BI desktop files.

Later in this blog we will be using these .pbit and DataModelSchema files to extract Power BI desktop Meta data.

What is the Meta data in a Power BI Desktop file

Regarding what you see in the Report View in a Power BI desktop, meta data is everything. You can think of all the information as meta data, including the name, source, expression, data type, calculated tables, calculated columns, calculated measures, relationships and lineage between the model’s various tables, hierarchies, parameters, etc.

We will mainly concentrate on extracting Calculated Measures, Calculated Columns, and Relationships in this blog.

Extraction of Meta data using Python

Python was used to process and extract the JSON from the.pbit file and DataModelSchema. We first converted JSON to a Python dictionary before extracting the necessary Meta data.

Below are the steps we will need to achieve the requirement:

    1. Exporting .pbix file as .pbit file

    There are two ways to save our power BI desktop file as .pbit file.

    • Once we are in Power BI desktop, we have an option to save our file as power BI template(.pbit) file
    • We can go to File–>Export–>Power BI Template and save the .pbit file at the desired directory.

    2. Unzipping .pbit file to get DataModelSchema file

    We can directly unzip the .pbit file using the 7z-Zip file manager or any other file manager. Once we Unzip the file, we will get a folder with the same name as that of the .pbit file. Inside the folder we will get the DataModelSchema file, we will have to change its extension to .txt for reading in python.

    3. Reading .pbit and Data model schema file in python

    We have an option to directly read the .pbit file in python using the dax_extract library. Second option to read the text file in python and using the JSON module convert it into a Python dictionary. Code can be found in the GitHub repository link given at the end of this file.

    4. Extracting Measures from the dictionary

    The dictionary that we get consists details of all the tables as separate lists, Individuals tables have details related to the columns and measures belonging to that table, we can loop on each table one by one and get details of columns, Measures etc. Below is an example of the Python code can be found in the GitHub Repository link given at the end of this file.

    table Numbertable NameMeasure NameMeasure Expression
    05Query Data% Query ResolvedCALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
    15Query DataSpecial Query PercentageCALCULATE(COUNT(‘Query Data'[Client ID]),’Quer…
    26Asset DataClient Retention RateCALCULATE(COUNT(‘Asset Data'[Client ID]),’Asse…

    5. Extracting calculated columns from the Dictionary

    Like how we extracted the measures we can loop on each table and get details of all the calculated columns. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file.

     

    table noTable Namenameexpression
    62CalendarDayDAY(‘Calendar'[Date])
    72CalendarMonthMONTH(‘Calendar'[Date])
    82CalendarQuarterCONCATENATE(“Q”,QUARTER(‘Calendar'[Date]) )
    92CalendarYearYEAR(‘Calendar'[Date])

    Also Read:  Certainty in streaming real-time ETL

    6. Extracting relationships from the dictionary

    Data for relationships is available in the model key of the data dictionary and can be easily extracted. Below is the sample output of the Python code can be found in the GitHub Repository link given at the end of this file. 

    From TableFrom ColumnTo TableTo ColumnState
    0Operational DataRefresh DateLocalDateTable_50948e70-816c-4122-bb48-2a2e442…Dateready
    1Operational DataClient IDClient DataClient IDready
    2Query DataQuery DateCalendarDateready
    3Asset DataClient IDClient DataClient IDready
    4Asset DataContract Maturity DateLocalDateTable_d625a62f-98f2-4794-80e3-4d14736…Dateready
    5Asset DataEnrol DateCalendarDateready

    7. Saving Extracted data as an Excel file

    All the extracted data can be saved in empty lists and these lists can be used to derive a Pandas data frame. This Pandas data frame can be exported as Excel and easily used for reference and validation purposes in a complex model. Below snapshot gives an idea of how this can be done.

    Do you want to know more about Power BI meta data using Python? Then reach out to our experts today.

    Click here

    Conclusion

    In this blog we learnt about extracting metadata from .pbit and DataModelSchema file. We have created a Python script that allows users to enter the file location of .pbit and DataModelSchema file and then metadata extraction along with excel generation can be automated. The code can be found on the below GitHub also sample excel files can be downloaded from below GitHub link. Hope this is helpful and will see you soon with another interesting topic.

    Author Yash Kumar Shrivastava

    Prev Post

    Enabling intercommunication of distributed Google Virtual Machines via a secured private network

    May 17, 2023 6 Mins Read
    Next Post

    Data Wrangling 101 – A Practical Guide to Data Wrangling

    May 17, 2023 5 Mins Read

    Related Posts

    Streamline Snowflake Error Logs with Real-time Notifications to Slack Channel

    June 5, 2023

    Unveiling the Shadows: Understanding the Reach and Possible Security Threats of Your Digital Footprint

    May 31, 2023

    Seamless Communication: Exploring the Advanced Message Queuing Protocol (AMQP)

    May 30, 2023
    • Most view
      • OutSystems: The Low-Code Platform Empowering Business Growth
      • Empowering Testing Excellence: Exploring the Synergy between Azure DevOps and Diverse Testing Techniques
      • 1 Click Deployment Framework for Mendix Application on Public Cloud(s)
      • Streamline Snowflake Error Logs with Real-time Notifications to Slack Channel
      • Enhancing Sensory Perception: Developing an Olfactory Detection App with Flutter
      • Unveiling the Shadows: Understanding the Reach and Possible Security Threats of Your Digital Footprint
      • Seamless Communication: Exploring the Advanced Message Queuing Protocol (AMQP)
      • Neo Banking: Exploring Achievements, Failures, and the Role of Technology
      • Testing Assistive Technologies in a Product
      • Mastering Data Visualization: Tips and Tricks to Effectively Analyze Information
      • Application Engineering
      • Product Engineering
      • Application Modernization
      • Rapid Application Development
      • Low Code Development
      • Data & Analytics
      • Data Engineering
      • Data Modernization
      • Data Analytics
      • AI & ML Solutions
      • Databricks
      • Cloud Engineering
      • Cloud Migration
      • Cloud Modernization
      • Cloud Optimization
      • Hybrid Cloud
      • Devops
      • Digital Assurance
      • Test Advisory & Consulting
      • Quality Assurance Services
      • Quality Engineering Services
      • Future Tech Testing Services
      • Smart Test Automation
      • Low Code Development
      • Mendix Development
      • Power Platform
      • Digital Accelerators
      • teX.ai
      • uphoriX

      Indium Software provides digital engineering services that make technology work

      Cupertino, CA 95014-2358, USA
      +1 (888) 207 5969

      Facebook Twitter YouTube LinkedIn

      © 2023 All Rights Reserved

      Sitemap | Privacy Policy

    Indium Software

      Type above and press Enter to search. Press Esc to cancel.