# Saving precious bytes by storing ULID strings as binary with Drizzle
I recently read The problem with using UUID primary key in MySQL by PlanetScale where they argue the benefits of using auto-incrementing integers as primary keys in MySQL databases. I am not a fan of this approach, as it's hard to ensure unique IDs across multiple databases without some sort of central ID authority.
However, that's not the only point this article makes. It also underlines the importance of IDs that follow a predictable and sequential order. Mainly for performance reasons, as it allows for better storage utilization and smaller, faster indexes (with fully random IDs, InnoDB may only use 50% of the available space in each page!).
Therefore I prefer ULID, which promises to offer (as its name implies) "Universally Unique Lexicographically Sortable Identifiers". In my opinion, this is a great compromise between fully random UUIDs and auto-incrementing integers (I also find them to be quite pretty :p). But there's a problem with how most people store ULIDs in databases: as strings. 😱
This is a big waste of memory, as ULIDs are just 128 bits, or 16 bytes long. Storing them as CHAR(26)
will use 72 bytes if you're using the default utf8mb4
encoding in MySQL. That's 4.5 times more than if youre using a binary column! It's understandable that most people do this, as it's way easier to work with strings than with binary data in most programming languages. But what if I told you that you can have the best of both worlds?
Intoducing: Custom column types in Drizzle! With this feature, you can define your own column types and how they are de/serialized. This allows you to store ULIDs as binary data in the database, while still working with strings in your application code.
But there's a problem with this code. Our Uint8Array will actually be 17 bytes, not 16. This is because ULIDs are 128 bits long, and base32 encoding uses 5 bits per character. And guess what, 128 is not divisible by 5!
An easy way to fix this is by first padding the ULID with zeroes to 32 characters, then slicing the first four bytes after decoding it. This way, we'll always have a 16-byte buffer.
When re-encoding our buffer back into a base32 string, we'll also have to pad the buffer with zeroes before encoding it to ensure the bytes line up with the original ULID.
Now we can use this custom column type in your Drizzle schema:
And that's it! Now you can store ULIDs as binary data in your database, while still working with strings in your application code.
Now, let's see how we can use this in our application code:
And the best part? This technique can be used with any binary data while still working with strings in your application code, not just ULIDs! (but please, don't store massive blobs in your database, that's what object storage is for! ðŸ˜)