Calculating workdays in Databricks
There’s not an official function to calculate workdays in Databricks. Here are some solutions.
Having a DimCalendar with holidays and at least Databricks Runtime 12
If you have a DimCalendar in the system you can now do LEFT LATERAL
JOIN
without the correlated subquery errors when using non-equality
predicates. Check SPARK-36114 for more details.
Calculating working days is then as simple as run a query like:
SELECT mt.StartDate,
mt.EndDate,
dc.Workdays
FROM myTable mt
LEFT JOIN LATERAL
(SELECT COUNT(DateKey) - 1 AS Workdays
FROM dimCalendar dc
WHERE mt.StartDate <= dc.DateKey
AND mt.EndDate >= dc.Datekey
AND dc.IsDayWeekDay = TRUE
AND dc.IsDayHoliday = FALSE
)
If this query is slow, please check if the data types for the dates
columns match. All are DATE
or all are TIMESTAMP
. If the sluginess
remains check if any of the fields are not part of the statistics of
the table.