r/adventofcode • u/musifter • Apr 21 '26

Other [2018 Day 21] In Review (Chronal Conversion)

And we're back to time slipping, with the final Chronal problem. Which is also the third and final part of the assembly problems this year. As we work on hacking the device to underflow to get back to future.

The first bit of the input is clearly there because of Perl's bizarre bitwise string operators. Basically, if you don't handle things correctly, the "123" & "456" ends up producing "002" instead of 72. Version 5.22 of Perl added separate versions of the operators for the string versions and warnings... but I avoided it by making sure the input was turned into numbers and stayed that way (there is one way that strings can seep back in, and that's with the boolean checks, because Perl doesn't return 0 and 1, but the empty string and 1).

But after that, you get the real function, which is essentially a 24-bit PRNG, that's basically hashing the values. It adds a bit in the high byte so that all three bytes get processed, and then it proceeds to rip off the bytes one at a time to run through an LCG. After we have a hash, we check it against register 0 at the bottom. Register 0 isn't touched, so we don't have to worry about it changing, only matching the hash. And for the shortest time, that's the first hash value produced. And for the longest... well, Pidgeon Hole Principle says it will loop eventually, and the longest will be the value just before the first repeat (ie the end of the cycle... and the start of the cycle will not be the hash from part 1, this hash is not one-to-one reversible).

As for running it in the VM... part 2 takes about 2.5 billion instructions for my input. It took about 40 minutes on my old hardware with a simple interpreter in Perl (that also was outputting the hashes as it went). A good compiled version is going to be a lot faster. So this one is doable using the VM to do the hashing if you want. But you'd still need to figure out what you're looking for and code for that. So it's not just running the VM on the input.

Transcoding is going to be a lot faster... a direct interpretation is a little slow because of a lack of a right bit-shifter (the rest is really fast though). If you optimize that out, then it's pretty much immediate... the cycle on mine starts about 2600 hashes in and ends before 11k.

So, another fun little assembly problem. This is one that you aren't going to make sense of things just by outputting registers at any point, you need to get a little more dirty in the reverse engineering.

4 Upvotes

permalink
reddit

You are about to leave Redlib

Do you want to continue?

https://www.reddit.com/r/adventofcode/comments/1srjcz1/2018_day_21_in_review_chronal_conversion/
No, go back! Yes, take me to Reddit

83% Upvoted

u/musifter Apr 21 '26

Out of curiosity, I decided to see how much better the VM would be with a right shift opcode. I replaced the routine with just that opcode (followed by the jump). It now needs less than 360k instructions to do part 2. So 7000x less instructions, and so is 7000x faster (it now runs in under .4s).

u/terje_wiig_mathisen Apr 21 '26

This one obviously required quite a bit of reverse engineering to become usably fast:

My primary solver contains lots & lots of commented-out (Rust) tracing code, it actually interpret the instructions in the source code, but then it replaces a few chunks, might be the same idea as u/musifter 's improved VM. This one runs in about 200 (Surface) ms, while my second-try solver which is a full manual translation drops this to 15 ms.

fn innerloop(input:usize) -> usize
{
    let mut bytes = input | 65536;
    let mut hash = 10373714;
    while bytes > 0 {
        hash += bytes & 255;
        hash &= 0xffffff;
        hash *= 65899;
        hash &= 0xffffff;
        bytes >>= 8;
    }
    return hash;
}

fn p2()->usize
{
    let mut i = 0;
    let mut count = 0;
    let mut counts:Vec<usize> = vec![0;0x1000000];
    let mut prev = 0;
    loop {
        count += i * (256+65536);
        i = innerloop(i);
        if counts[i] > 0 {break;}
        counts[i] = count;
        prev = i;
    }
    return prev;
}

I have not looked at u/maneatingape, but the 66 us proves that he is doing _far_ less work. :-)

2

u/DelightfulCodeWeasel Apr 21 '26

You shouldn't need to hard-code your input hash btw. It should always be on the 8th instruction; as the immediate argument to a seti instruction.

1

u/ednl Apr 27 '26 edited Apr 27 '26

I got it down to 46 (edit: 42.7) µs but it depends on what & how you measure (e.g. no file read from disk for me) and of course my CPU is probably faster than his in 2018 (if that is when he timed it as 66 µs) https://www.reddit.com/r/adventofcode/comments/1srjcz1/2018_day_21_in_review_chronal_conversion/oiikaiu/

u/e_blake Apr 21 '26

I did notice for part 1 that register 0 is never modified, so tracing the comparisons to the register made it easy to figure out what value to try, without actually restarting the machine with different candidate values. My git history also showed that my C solution in 2018 tried to transcode the assembly into C for speed, but had a logic bug that cost me a couple of failed attempts before getting the star when I didn't mask by the right constant.

I also noticed on the megathread that different people mentioned different encodings (for example, the ip register assignment is not identical across inputs); my git comments mention that it was harder to write a peephole optimization that would (hopefully) work across all valid inputs, when compared to other assembly puzzles where only one or two magic constants differ but the rest of the assembly was identical for everyone. Eric must have had fun writing his generator, for deciding how to scramble the outputs while still reusing the same flow control.

u/ednl Apr 27 '26 edited Apr 27 '26

I made a very nice compiled version (if I say so myself..) with function pointers including an "undocumented" right shift opcode, and cycle detection which ran in 1.7 ms. Then I came here and saw that people made full native versions with "seen" arrays which were much faster. So I made one myself. Fastest time on an Apple M4 includes parsing for the two main parameters but not reading from disk: 46.1 µs.

static uint hash(uint prev)
{
    uint next = seed;
    for (prev |= 0x10000U; prev; prev >>= 8)
        next = (next + (prev & 0xFFU)) * mult;
    return next & 0xFFFFFFU;
}

https://github.com/ednl/adventofcode/blob/main/2018/21.c

Timing values are highly dependent on how I measure. One single timing inside the program: about 250 µs. One timing inside the program, many program runs to get a minimum: about 100 µs. A thousand timing loops inside, one program run: about 70 µs. Thousand inside, many outside: 46.1 µs. This is all single-threaded. I guess the CPU throttles down very quickly.

EDIT: found a way to speed up the hash function, best measurement is now 42.7 µs:

static uint hash(uint prev)
{
    uint next = (seed + (prev & 0xFFU)) * mult;
    next = (next + (prev >> 8 & 0xFFU)) * mult;
    return ((next + (prev >> 16 | 1U)) * mult) & 0xFFFFFFU;
}

Other [2018 Day 21] In Review (Chronal Conversion)

You are about to leave Redlib