#dataOp ramblings

Adventures in Business Intelligence

Extracting tables from SQL by using Sqlfluff

The other day I got an assignment to produce documentation to send to a customer. Part of the task was extracting the names of the tables required to execute a certain Databricks notebook, with the goal of building an object dependency tree. The query spanned 279 lines. How can you extract only the table names from a file without hunting for them manually? And can we reuse this technique in the future?
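sqlfluff's parser is the robust way to do this, since it yields a parse tree you can crawl for table references. As a rough standard-library illustration of the idea only (the regex and function name are mine, not from the post, and this handles none of the edge cases a real parser does), a naive sketch could scan for identifiers following FROM/JOIN keywords:

```python
import re

def extract_table_names(sql: str) -> set[str]:
    """Naive extraction: grab identifiers that follow FROM or JOIN.

    A real SQL parser (e.g. sqlfluff) handles subqueries, CTEs and
    quoted identifiers correctly; this regex is only a rough sketch.
    """
    pattern = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
    return set(pattern.findall(sql))

sql = """
SELECT a.col1, b.col2
FROM dbo.Orders a
JOIN dbo.Customers b ON a.CustomerId = b.Id
LEFT JOIN dbo.Regions r ON b.RegionId = r.Id
"""
print(sorted(extract_table_names(sql)))  # -> ['dbo.Customers', 'dbo.Orders', 'dbo.Regions']
```

For a 279-line production query, the parse-tree route is the one to trust; the regex above will happily pick up CTE names and derived-table aliases too.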

Azure Active Directory extraction with Databricks

During data engineering projects I try to minimize the number of tools being used. I think it's good practice: having too many tools sometimes causes errors to go unnoticed by team members. One of the advantages of a tool like Databricks is that it gives us all the power of Python, avoiding the need, as I had in the past, for something like Azure Functions to compensate for the limitations of some platform.

Calculating workdays in Databricks

There's no official function to calculate workdays in Databricks. Here are some solutions. Having a DimCalendar with holidays and at least Databricks Runtime 12: if you have a DimCalendar in the system, you can now do a LEFT LATERAL JOIN without the correlated-subquery errors when using non-equality predicates. Check SPARK-36114 for more details. Calculating working days is then as simple as running a query like:

```sql
SELECT mt.StartDate, mt.EndDate, dc.Workdays
FROM myTable mt
LEFT JOIN LATERAL (
  SELECT COUNT(DateKey) - 1 AS Workdays
  FROM dimCalendar dc
  WHERE mt.
```
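The query above counts calendar rows between the two dates and subtracts one. The same idea in plain Python, for readers without a DimCalendar handy (function name and the exclusive-of-start convention are mine, chosen to mirror the `COUNT(DateKey) - 1` shape):

```python
from datetime import date, timedelta

def workdays(start: date, end: date, holidays: frozenset = frozenset()) -> int:
    """Count working days after start, up to and including end,
    skipping weekends and any dates in the holidays set."""
    if end < start:
        raise ValueError("end must be on or after start")
    days = 0
    d = start
    while d < end:
        d += timedelta(days=1)
        if d.weekday() < 5 and d not in holidays:  # Mon-Fri, not a holiday
            days += 1
    return days

# Mon 2024-01-01 to Mon 2024-01-08, no holidays -> Tue..Fri + Mon
print(workdays(date(2024, 1, 1), date(2024, 1, 8)))  # -> 5
```

A DimCalendar-based query stays the better option in Databricks itself, since holidays live in one governed table instead of being passed around in code.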

Mass changing power query connection strings in Excel using PowerShell

A while ago I received an email from Jason Alvarez asking if I knew a way to change Power Query connections inside an Excel file. The problem is similar to my previous post on Mass changing pivot table connection strings in Excel using PowerShell. It turns out you can, and he was able to find this solution:

```powershell
# Root path to the collection of excel docs:
$rootPath = "C:\Temp\PowerQueryDemo\"
# Current query file path:
$currentPath = "C:\Temp\PowerQueryDemo\"
# Updated query file path:
$newPath = "C:\Temp\PowerQueryDemo\PowerQueryNewSource"
Get-ChildItem -Path $rootPath -Recurse -Include *.
```
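Under the hood an .xlsx file is just a zip archive, and the Power Query/connection metadata lives in XML parts inside it. As a rough cross-platform sketch of the same find-and-replace idea using only the Python standard library (the member path `xl/connections.xml` and function name are my assumptions, not from the post — the exact part holding the query text varies by workbook):

```python
import shutil
import zipfile

def replace_in_zip_member(path: str, member: str, old: str, new: str) -> None:
    """Rewrite one member of a zip archive (e.g. an .xlsx part),
    replacing old text with new. The archive is rebuilt, because
    zip members cannot be edited in place."""
    tmp = path + ".tmp"
    with zipfile.ZipFile(path, "r") as src, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == member:
                data = data.replace(old.encode(), new.encode())
            dst.writestr(item, data)
    shutil.move(tmp, path)
```

For real workbooks, the PowerShell/COM route above is safer because Excel revalidates the file; a raw zip rewrite assumes you know exactly which part and string to touch.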

PBIPT #41 - Solving Wordle with Power BI

Yesterday I had the pleasure of being back in person to give a small, fun presentation about bending Power BI to help us solve Wordle puzzles. The content is in Portuguese, but I will upload an English version this week. Here's the link on GitHub where you can download the file: https://github.com/fpvmorais/presentations. If you have any questions, please reach out. Have fun!
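The core of any Wordle solver, in Power BI or otherwise, is filtering a word list by the green/yellow/gray feedback. A minimal Python sketch of that filtering logic (my own simplified version — it ignores the duplicate-letter subtleties of real Wordle, and the word list is made up):

```python
def matches(word: str, guess: str, feedback: str) -> bool:
    """Check a candidate word against one guess.

    feedback per letter: 'g' = green (right letter, right spot),
    'y' = yellow (in word, wrong spot), 'x' = gray (not in word).
    """
    for i, (g, f) in enumerate(zip(guess, feedback)):
        if f == "g" and word[i] != g:
            return False
        if f == "y" and (g not in word or word[i] == g):
            return False
        if f == "x" and g in word:
            return False
    return True

words = ["crane", "slate", "trace", "grace", "brace"]
# Guess "crane": c yellow, r green, a green, n gray, e green
print([w for w in words if matches(w, "crane", "yggxg")])  # -> ['trace', 'grace', 'brace']
```

In Power BI the same constraints become filter steps over a words table; the logic is identical, just expressed in M/DAX instead of Python.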

JavaException: Must have Java 8 or newer installed.

While setting up a new machine to be the Integration Runtime for Purview, and after installing the mandatory JRE that allows the connection to Snowflake, I kept getting this error: Error: (3913) JavaException: Must have Java 8 or newer installed. This puzzled me because I had installed version 17. I started troubleshooting and followed this guide to check whether the installation was correct. Everything seemed fine, but the error was still there.

Python Azure Functions Tips

Here are some tips on things I've learned while creating them. App Plan: if you already have a Windows 'App Service Plan' you'll need a different one for running Python, as you can't host Linux and Windows apps in the same resource group. If you have an existing resource group named AzureFunctionsQuickstart-rg with a Windows function app or web app, you must use a different resource group. Python version: if you need to use 'azure-identity' you'll need to use python 3.

Accessing local settings while unit testing Azure Functions

There's a lot to chew on while unit testing Azure Functions. I'm going to be quite liberal with the terminology, because technically some of this is in fact integration testing rather than unit testing per se. Either way, Azure Functions loads local.settings.json on startup, creating several environment variables that we can then use in our code. In C# we would access them like so:

```csharp
tenantId = configRoot["TenantId"];
appId = configRoot["AppId"];
```

or in Python:
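In Python those settings surface through `os.environ`. For tests run outside the Functions host, a small helper along these lines (the helper name is mine, not from the post) can push the "Values" section of local.settings.json into the environment first:

```python
import json
import os

def load_local_settings(path: str = "local.settings.json") -> None:
    """Mimic the Functions host for tests: copy each entry of the
    "Values" section of local.settings.json into os.environ."""
    with open(path, encoding="utf-8") as f:
        settings = json.load(f)
    for key, value in settings.get("Values", {}).items():
        os.environ.setdefault(key, value)

# The function code under test then reads settings the usual way:
# tenant_id = os.environ["TenantId"]
# app_id = os.environ["AppId"]
```

`setdefault` keeps any variables already set by a CI pipeline from being overwritten by local values.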

Add multiple scans using the Purview API

As I explained in a previous post about Purview, with the official Snowflake connector you need to create a scan per database/schema pair. At the customer where I was working there are a few hundred databases plus their schemas. Doing this through the UI would mean wasting a lot of time clicking away, and being prone to forgetting some database/schema pair. I explored the documentation for the Scans REST API, and the rest of this post is just some PowerShell code to automate the creation of scans and their schedules in Purview.
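The post's automation is PowerShell, but the shape of the REST call is the same from any language. As a sketch of the endpoint you'd loop over per database/schema pair (account, datasource and scan names are made up, and the `api-version` value is an assumption — check the Scans REST API docs for the current one):

```python
def scan_url(account: str, datasource: str, scan_name: str,
             api_version: str = "2022-02-01-preview") -> str:
    """Build the PUT endpoint for creating or updating one scan
    in the Purview scanning API."""
    return (f"https://{account}.purview.azure.com/scan"
            f"/datasources/{datasource}/scans/{scan_name}"
            f"?api-version={api_version}")

# One scan per database/schema pair, exactly the loop the UI forces you to click through:
pairs = [("SnowflakeDS", "SALES_DB-PUBLIC"), ("SnowflakeDS", "HR_DB-PUBLIC")]
for ds, scan in pairs:
    print(scan_url("myaccount", ds, scan))
```

Each PUT would carry a JSON body describing the scan (connection details, database and schema names) plus a bearer token; the win over the UI is that the pairs come from a list you can diff and re-run.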

Add user(s) to multiple Power BI workspaces quickly

A few days ago I had to add a new team member to more than 50 workspaces. I started dying inside just thinking of going into each workspace and adding them through the portal UI. Here's a little snippet that saved me a few hundred clicks:

```powershell
Login-PowerBI
$workspaces = Get-PowerBIWorkspace
$user = "mySpecial.User@MyDomain.com"
foreach ($workspace in $workspaces) {
    Write-Host "Adding user to: " $workspace.Name
    try {
        Add-PowerBIWorkspaceUser -Id $workspace.Id -UserPrincipalName $user -AccessRight Member
    } catch {
        Write-Host "Message: [$($_.
```