Intended Readers: IT-Nerds and everyone
TL;DR: The new UUIDs in Things 3.13 are basically calculated like this: uuid = base58(sha1(old_uuid).slice(0,16));
I want to tell the story of how I wasted^H^H^H^H^H^H used 30h to find out how the new uuids are generated.
I maintain a project that shows the tasks from the Today list of Things on a Raspberry Pi: MMM-rusty-things. I use it on a screen in my flat to remind me of the tasks that need doing whenever I walk past. Some time after upgrading Things to the newest version, 3.13, I noticed that the Pi no longer showed any of the changes I made to my Today list; it simply didn't get updated. So I logged into the Pi and was greeted by an error message along the lines of: "Can't create a new Task without a title". I knew this meant something was wrong with the UUIDs. (I built this error message so that no new entries would wrongly be created without a title. I keep the same SQLite db as the original application, although stripped down.)
So then I nerd-sniped myself. I looked into the SQLite file on my Mac that Things itself maintains and found strange UUIDs (e.g. 2GsLNkuMTiupYex3GZhibh, Be5S7VD3EqXQHiJqEt8EPS). I was intrigued by what they were, as I wanted to fix my project. So I nerd-sniped myself quite hard.
First act: static analysis
I knew that the "old" SQLite db must somehow be upgraded to use the new identifiers. I assumed there would be a migration of the form update table set uuid = ... or something along those lines. In the ThingsModel framework I found UPDATE 'TMTask' SET uuid=migrate_uuid_to_entity_identifier_base58string(uuid), which looked exactly like what I wanted. Now I only needed to find out what migrate_uuid_to_entity_identifier_base58string did. Searching for this as a function name in the same framework in Hopper, I found a function named __Z46migrate_uuid_to_entity_identifier_base58stringP15sqlite3_contextiPP13sqlite3_value, which looked somewhat like what I wanted. (Later I found out that this is C++ name mangling, which encodes the parameter types in the symbol name.) It called a class method, [THMSyncSchemaNormalizer normalizeUUID]. But I didn't understand what happened there. So I took a step back and searched the internet for what base58 was, because I hadn't heard of it before diving into this adventure.
It turns out it was invented by Satoshi Nakamoto for Bitcoin addresses. In short, it is an encoding like base64 but without +, / and some characters that are easily confused: lowercase l and uppercase I, and zero and uppercase O.
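To make the encoding concrete, here is a minimal sketch of a base58 encoder. It assumes the Bitcoin alphabet and the usual convention of writing leading zero bytes as "1"; whether Things uses exactly this alphabet is an assumption on my part, and the function name base58_encode is just for illustration.

```python
# Bitcoin-style base58 alphabet: digits and letters without 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, treating them as one big-endian integer."""
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        # Peel off the least significant base58 digit.
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    # By convention every leading zero byte becomes the first alphabet char.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


if __name__ == "__main__":
    print(base58_encode(b"hello world"))
```

Unlike base64, this is not a bit-shuffling scheme: the whole input is one big number that gets divided by 58 over and over, which is why the decompiled code is full of divisions and modulo operations.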
Then I thought I had found the solution, because base58 requires 22 chars for a 16-byte value (the length of a UUID). So I assumed that the UUID in hex representation was "just" base58 encoded. But everything I tried here failed miserably. I thought maybe they didn't use a default base58 implementation, and the method that does the base58 encoding looked suspicious to me. Hopper has a built-in decompiler which spits out pseudo-C code. The code had many shifts and multiplications with seemingly random, magic values, and I thought they must have changed something about the algorithm.
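The 22-character figure can be checked with a bit of arithmetic. This is only a back-of-the-envelope sketch: each base58 character carries log2(58) bits, so 128 bits need at most the rounded-up quotient.

```python
import math

UUID_BYTES = 16
bits = UUID_BYTES * 8                 # 128 bits in a UUID
bits_per_char = math.log2(58)         # a bit less than 6 bits per character
max_chars = math.ceil(bits / bits_per_char)

print(max_chars)  # 22
```

Individual values can come out one character shorter, but 22 is the upper bound, which matches the length of the identifiers in the Things database.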
Quick question: what does this line do, i.e. what is the equivalent C code?
value = (SAR(HIQWORD((rbx - r12) * 0x51eb851eb851ebee) + (rbx - r12) * 0x8a, 0x6)) + (HIQWORD((rbx - r12) * 0x51eb851eb851ebee) + (rbx - r12) * 0x8a >> 0x3f);
Take a second and try to figure it out. Did you figure it out? Read on to follow my journey.
I didn't figure it out. In the meantime I looked around for implementations of base58 encoding in different languages. I compiled a C version, and to my surprise clang and gcc produced the same line. It is the fast way to multiply by 138 and divide by 100, which calculates the number of chars required for the base58 string. WTF are these compiler optimizations??!?!? The other fun compiler optimization is modulo 58 and division by 58. After some time I had decompiled the complete function; it is more or less a default implementation, so I had to look somewhere else to find the difference between my simple method and the actual implementation.
In the end I had to go back a step and approach it from a different angle. I figured I would try to use Xcode to call the method directly.
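The trick behind those magic values is that compilers replace a division by a constant with a multiplication and a shift. The sketch below shows the idea for unsigned 32-bit values, with the constant 0x51EB851F, which is 2^37 / 100 rounded up. The decompiled line above is a 64-bit variant with different constants, so this is an analogue and not a transcription of it. The "+ 1" in the size estimate is taken from common base58 implementations and is an assumption here, as are the function names.

```python
MAGIC = 0x51EB851F  # ceil(2**37 / 100)
SHIFT = 37


def div100_by_multiply(x: int) -> int:
    """Divide an unsigned 32-bit value by 100 using only multiply and shift."""
    assert 0 <= x < 2**32
    return (x * MAGIC) >> SHIFT


def base58_buffer_size(byte_count: int) -> int:
    """Estimate the chars needed: byte_count * 138 / 100 + 1.

    138/100 slightly overestimates log(256) / log(58), which is about 1.37.
    """
    return div100_by_multiply(byte_count * 138) + 1


if __name__ == "__main__":
    for value in (0, 99, 100, 2208, 2**32 - 1):
        print(value, div100_by_multiply(value), value // 100)
    print(base58_buffer_size(16))
```

A multiplication plus a shift is much cheaper than a hardware division, which is why both clang and gcc emit this pattern even for innocent-looking source code.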
Learnings:
- Calling convention on macOS (x86-64): Params: rdi, rsi, rdx, rcx, ... Return value: rax
- Using Hopper and IDA to decompile and disassemble
- Creating decompiled output using RetDec and other tools
- Reading decompiled code as pseudo-code and assembler
Second act: using Xcode
Why shouldn't I use the frameworks in a project of my own (just a single file, but including all the needed frameworks) and call the function myself? So I created a new Xcode project, included the frameworks, created an in-memory SQLite db, and ran the following query: select normalize('71247D15-698E-4308-B2DB-B2CF252A378F') from my_table;. This gave me the correct result, the one I saw after the migration in the Things SQLite db. To use this function I dynamically obtained a pointer to it using:
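// Needs <dlfcn.h>, <stdio.h> and <sqlite3.h>; db is an already opened sqlite3* handle.
void *BaseHandle = dlopen("/Applications/Things3.app/Contents/Frameworks/ThingsModel.framework/Versions/A/ThingsModel", RTLD_LAZY | RTLD_LOCAL);
if (BaseHandle) {
    // SQLite scalar functions take (context, argc, argv) and hand back their result through the context.
    void (*normalize)(sqlite3_context *, int, sqlite3_value **) =
        dlsym(BaseHandle, "_Z46migrate_uuid_to_entity_identifier_base58stringP15sqlite3_contextiPP13sqlite3_value");
    if (normalize) {
        // Register the loaded function as the one-argument SQL function normalize().
        // SQLITE_ANY is deprecated in current SQLite; SQLITE_UTF8 is the usual choice today.
        sqlite3_create_function(db, "normalize", 1, SQLITE_ANY, NULL, normalize, NULL, NULL);
    } else {
        printf("Not available\n");
    }
}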
I have to admit that it felt like I was hacking quite hard at that moment: loading a function from somewhere else and registering it for use with SQLite.
Using this method I gathered more info on how it worked. Things calls CC_SHA1 on the whole UUID string representation and base58-encodes the first 16 bytes of the result. I thought I had figured it out. I touted to my friends that I had figured it out. But as it turns out, I hadn't. At least not for all UUIDs. The strange thing was my algo was
Learnings:
- lldb commands like re re (short for register read) to print all registers
- One can create breakpoints for strangely named functions in Xcode by using lldb directly, like:
b ___lldb_unnamed_symbol1077$$ThingsModel
b libcommonCrypto.dylib`CC_SHA1
Third act: dynamic analysis
You need to know that Things doesn't only store plain UUIDs in the uuid column: for repeating tasks it stores the UUID of the template + "-yyyymmdd" (a date representation, e.g. 20200529 for May 29th 2020). My calculation was wrong for repeating tasks. So I set out to find out how they were calculated.
The breakpoint in sha1 triggered twice for these UUIDs and only once for UUIDs without extension. Looking at the input of the second invocation, I saw some gibberish followed by the remaining "-yyyymmdd" string, but I couldn't figure out what the first part was. After some time I figured I would look at the input of an invocation I knew: the first one, or better yet the only invocation for UUIDs without extension. I found the string starting at $rdi, like I would expect. OK, let's try another sha1 implementation, put the gibberish + extension into it, and look at the output. It was the same. What? At that point I remembered that sha1 operates on bytes and not on chars. As it turns out, the gibberish is the first 16 bytes of the output of the first sha1 invocation. Now the puzzle was solved for me.
For UUIDs without extension: uuid = b58(sha1(old_uuid).slice(0,16))
For UUIDs with extension: uuid = b58(sha1(sha1(old_uuid.slice(0,36)).slice(0,16) + old_uuid.slice(36)).slice(0,16))
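Put together, the two formulas can be sketched in a few lines. This is a minimal sketch and not the code from the project: the names new_uuid and base58_encode are made up for illustration, the Bitcoin base58 alphabet is assumed, and the old UUID is hashed exactly as given (no case normalization).

```python
import hashlib

# Assumed alphabet: Bitcoin-style base58 without 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58; leading zero bytes become '1'."""
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def new_uuid(old_uuid: str) -> str:
    """Derive the new identifier from an old UUID, with or without extension."""
    # The first 36 chars are the UUID itself, the rest is e.g. "-20200529".
    base, extension = old_uuid[:36], old_uuid[36:]
    digest = hashlib.sha1(base.encode("ascii")).digest()[:16]
    if extension:
        # Second round: the raw 16 digest bytes followed by the extension bytes.
        digest = hashlib.sha1(digest + extension.encode("ascii")).digest()[:16]
    return base58_encode(digest)


if __name__ == "__main__":
    print(new_uuid("71247D15-698E-4308-B2DB-B2CF252A378F"))
    print(new_uuid("71247D15-698E-4308-B2DB-B2CF252A378F-20200529"))
```

The important detail is in the second round: the input is the raw digest bytes plus the extension, not a hex or text rendering of the digest.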
Learnings:
- lldb command x/20x $rdi to print 20 ints in hex representation (reversed because of little-endianness), x/s $rdi to print a string at $rdi
- If it feels too complicated, take a step back and look for an easier way.
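The reversed look of the hex dump is easy to reproduce outside the debugger. The following sketch takes the first four characters of a UUID string as they would sit in memory and reads them as one little-endian 32-bit integer, which is roughly what a word-sized hex dump does on x86-64.

```python
import struct

memory = b"7124"  # the first four bytes of a UUID string in memory

# "<I" reads the bytes as one little-endian unsigned 32-bit integer.
(word,) = struct.unpack("<I", memory)

print(hex(word))                 # 0x34323137, i.e. "4", "2", "1", "7"
print([hex(b) for b in memory])  # byte by byte the order is as expected
```

Printing the memory as a string or byte by byte avoids the confusion entirely.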
Closing act
Before I put the code into Rust, which is what the project needs, I coded it in Node to see if it really worked. And lo and behold, it now worked for every UUID (with or without extension) I threw at it. Success. Now I only had to make the Rust compiler happy and I was done. This commit concludes the work I put into this over the course of one week: The one commit resulting from this work
Helpful links and Tools for this Project
General to understand the problem
ASCII table (I can't remember the ASCII codes for the life of me)
x86 registers on Wikipedia (which registers overlap)
Decompile
HopperApp (Can use trial version)
IDA Pro (free version for students exists)
Debug
npm packages: base-58 and sha1 to build a Node.js version to test hypotheses