Advent of Code 2020 - Day 13 · A #dataOp ramblings

28 Feb 2021, 18:00

Power BI / DAX / Power Query / Advent of Code 2020

We arrived to the island after the ferry trip and we see ourselves in need of transportation to the nearest airport. Fortunately there’s a shuttle bus service between the sea port and the airport that could take us there.

Day 13 is a wonderful mathematical puzzle, following the theme of modular arithmetic of the previous one. Let’s dive into it as there some surprises with the numeric limits on Power BI.

If you want to follow along please download AdventOfCode2020.pbit and check Day 1 post for instructions.

Part 1

To solve the first part of the puzzle we have to find the earliest time we could depart on a bus. The first line of our input is a numeric value representing a timestamp and the second line the ids of the bus on service. There’s some ‘x’ characters amidst the bus ids that represent out of service bus that are of no interest to find the answer.

The timestamp marks the moment after we can depart aboard of the buses and each bus id represents the duration of the trip. So a bus with id 7, departs each 7 minutes \([0, 7, 14, 21,…]\).

Now that we understand the rules we need to find which bus leaves first after a certain timestamp, calculate the difference between the two timestamps and multiply that value by the bus id.

This looks like a nice problem to solve using some DAX.

But first we need to import our data.

Split by delimiter, add an index:

And remove those useless ‘x’s:

With our data ready, a way to solve this problem is to subtract to the bus id the remainder of the division of the timestamp by the bus id. The smallest difference will be the answer. Taking the sample data we can test our algorithm:

\[7 - (939 \bmod 7) = 6 \] \[13 - (939 \bmod 13) = 10 \] \[59 - (939 \bmod 59) = 5 \] \[31 - (939 \bmod 31) = 22 \] \[19 - (939 \bmod 19) = 11 \]

The smallest is 5 and multiplying by 59 we have 295.

Translating this idea to DAX, we create a variable to store the timestamp in our file and on variable tt the remaining rows of the table. In our return value, we calculate the difference between the bus id and the remainder of the division of the timestamp by the bus ids. Then we sort that value by ascending order and pick the first row using the TOPN function.

Multiplying the difference by the bus id returns the answer for the first part of our puzzle.

VAR earliestTimestamp =
    CALCULATE (
        MAX ( input_day_12_sample[BusId] ),
        input_day_12_sample[Index] = 0
    )
VAR tt =
    FILTER (
        input_day_12_sample,
        input_day_12_sample[BusId] <> earliestTimestamp
    )
RETURN
    SELECTCOLUMNS (
        TOPN (
            1,
            ADDCOLUMNS ( tt, "Mod", [BusId] - MOD ( earliestTimestamp, [BusId] ) ),
            [Mod], ASC
        ),
        "Part 1", [BusId] * [Mod]
    )

Part 2

Now the fun begins. We need to find the first timestamp from which the first bus on the list departs and in the subsquent timestamp the second one departs and so on until the end of the list. An ‘x’ in the schedule means that there are no departures in that minute.

If you check the provided example:

7,13,x,x,59,x,31,19

| timestamp | bus departure |
+-----------+---------------+
| t         |        7      |
| t+1       |        13     |
| t+4       |        59     |
| t+6       |        31     |
| t+7       |        19     |

How do we find that t? A possibility would be to try and apply some brute force testing incrementing numbers until we find one which is multiple of seven, will have a remainder of one when divided by 13, 4 when divided by 59, 6 when divided by 31 and 7 when divided by 19. But reading through the text on the puzzle we see that brute forcing will be impractical as the numbers can be really big.

A more doable approach is being smart about the increment steps. We would start with increments of size 7 plus 1, the delay, and when a number that’s divisible by 13 is found we would increment the loop step by 13*7. An example to illustrate the algorithm:

time stamp:   7     loop interval:  7      check: ( 7+1) mod 13 == 0? FALSE
time stamp:  14     loop interval:  7      check: (14+1) mod 13 == 0? FALSE
...
time stamp:  78     loop interval:  7      check: (78+1) mod 13 == 0? TRUE
time stamp: 169     loop interval: 91      check: (91+4) mod 59 == 0? FALSE
...

And we will keep going till the last bus, getting our answer. This will work but there’s a more elegant of solving the problem by using the method known as the “Chinese Remainder Theorem”. We are in conditions to apply it because the examples and the input file are composed of only prime numbers.

I’ll try my best to show you how we can use the theorem to solve this and implement the code in Power Query.

We want a number that will satisfy the conditions:

\begin{align*} x &\equiv 0 \:(\bmod 7) \\\ x &\equiv 12 \:(\bmod 13) \\\ x &\equiv 55 \:(\bmod 59) \\\ x &\equiv 25 \:(\bmod 31) \\\ x &\equiv 12 \:(\bmod 19) \\\ \end{align*}

First step is to calculate \(N\).

\begin{align*} N &= 7 * 13 * 59 * 31 * 19\\\ &= 3162341 \end{align*}

Then each of our \(Ni\):

\begin{align*} N_1 &= N/7\ \; =& 451\,763\\\ N_2 &= N/13 =& 243\,257\\\ N_3 &= N/59 =& 53\,599 \\\ N_4 &= N/25 =& 102\,011\\\ N_5 &= N/12 =& 166\,439\\\ \end{align*}

After we calculate the inverse \(mod\) for each of our values:

\begin{align*} 0x_1 &\equiv 1 (\bmod 7) \\\ 12x_2 &\equiv 1 (\bmod 13) \\\ 55x_3 &\equiv 1 (\bmod 59) \\\ 25x_4 &\equiv 1 (\bmod 31) \\\ 12x_5 &\equiv 1 (\bmod 19) \\\ \end{align*}

We will get:

\begin{align*} 2x_1 &\equiv 1 (\bmod 7) \\\ 1x_2 &\equiv 1 (\bmod 13) \\\ 35x_3 &\equiv 1 (\bmod 59) \\\ 3x_4 &\equiv 1 (\bmod 31) \\\ 18x_5 &\equiv 1 (\bmod 19) \\\ \end{align*}

Putting our data in a table, we just have to sum the products and will have our solution:

\(bi\)	\(Ni\)	\(x_i\)	\(b_iN_ix_i\)
0	451763	2	0
12	243257	1	2919084
55	53599	35	103178075
25	102011	3	7650825
12	166439	18	35950824

\begin{equation} x = 0+2\,919\,084+103\,178\,075+7\,650\,825+35\,950\,824 = 149\,698\,808\\\ \\\ x \equiv 149\,698\,808\: (\bmod 3\,162\,341)\\\ \\\ x \equiv \mathbf{1\,068\,781}\: (\bmod 3\,162\,341) \end{equation}

We got to the result. So it seems we are ready to implement this algorithm in Power BI. Starting from this table:

\(N\) List.Product([Modulus])
\(b_i\) Number.Mod([Modulus] - Number.Mod([Index]-1,[Modulus]), [Modulus])
\(N_i\) N / [Modulus]
\(x_i\) fnInvertMod([Ni], [Modulus])
\(b_iN_ix_i\) List.Product({[Remainders], [Ni], [Xi]}, Precision.Decimal)

Here’s the fnInvertMod code that implements the Extended Euclidian Algorithm:

fnInvertMod = (ni as number, modulus as number) =>
let
    fnRecInvertMod =
        (a as number, b as number, x as number, y as number) =>
            if b = 0
            then if x < 0 then x + modulus else x
            else @fnRecInvertMod(b, Number.Mod(a, b), y, x - y * Number.IntegerDivide(a,b))
in
    fnRecInvertMod(ni, modulus, 1, 0)

What you are seeing here is a recursive function. These functions in Power Query have a the @ character on the recursive call which makes them easy to spot. They are an alternative to the list looping functions we have been using while solving the puzzles. Beware as they can be slow or can fault when the recursive tree is long.

Another interesting thing you can see is the use of the Precison.Decimal parameter when calculating the product. This is need because by default Power Query works with double precision which conforms with IEEE standard. You can check more details on this Power Query M Primer (Part 7): Types—Numbers by Ben Gribaudo. To shown you what I mean, when working with the full set of values:

Not forcing the precision causes these differences and you should be aware when working with such large numbers of the limits of Power BI.

And finally our answer can be obtained by doing:

Number.Mod(List.Sum([BiNiXi],Precision.Decimal),N,Precision.Decimal)

Conclusion

I have had a lot fun with this one. Really liked to use some Math to solve problems instead of going full brute force mode on. Also was a great exploration on the numeric limits to the calculations you can use on Power BI. The part 2 of this puzzle was impossible to solve by using DAX due to the numeric rounding. There’s an implicit conversion from ‘Whole Number’ to ‘Decimal Number’ when dividing numbers that causes the imprecision. (https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-data-types#implicit-and-explicit-data-type-conversion-in-dax-formulas) Also be aware due to JavaScript limitations visuals are limited to numbers below 9 007 199 254 740 991 (16 digits long). Here’s what you can see when trying to see a number bigger than the limit:

And here’s the real number as shown with Power Query:

I’ll leave you with an excellent read about this theme - Choosing Numeric Data Types in DAX article from Marco Russo. It can help you avoid issues when designing your data models.

Have fun!!

Advent of Code 2020 - Day 13

Part 1

Part 2

Conclusion

Share!