SQL

Calculating workdays in Databricks

There’s not an official function to calculate workdays in Databricks. Here are some solutions.

Having a DimCalendar with holidays and at least Databricks Runtime 12

If you have a DimCalendar in the system you can now do LEFT LATERAL JOIN without the correlated subquery errors when using non-equality predicates. Check SPARK-36114 for more details.

Calculating working days is then as simple as run a query like:

SELECT mt.StartDate,
       mt.EndDate,
       dc.Workdays
 FROM myTable mt
  LEFT JOIN LATERAL
    (SELECT COUNT(DateKey) - 1 AS Workdays
     FROM dimCalendar dc
     WHERE mt.StartDate <= dc.DateKey
        AND mt.EndDate >= dc.Datekey
        AND dc.IsDayWeekDay = TRUE
        AND dc.IsDayHoliday = FALSE
    )

If this query is slow, please check if the data types for the dates columns match. All are DATE or all are TIMESTAMP. If the sluginess remains check if any of the fields are not part of the statistics of the table.