I made a new side project! Ever since I listened to this episode of ATP, the idea of writing a program to check data integrity and hopefully detect bit rot had been rattling around in my head. The co-hosts (specifically Siracusa) suggested this would be an interesting project for a budding programmer. I just completed my undergraduate CS degree, so I'd like to imagine I'm slightly farther along than budding; however, I had just finished taking an Operating Systems course, and the project sounded interesting enough that I wanted to do it. I thought I'd also give myself a little extra challenge and do it in a new language, so at the suggestion of my brother James, I wrote it in Rust. My program is called bitwrought: afraid that your bits wrought havoc? Try bitwrought!

How it works

The program itself is pretty simple; it's only about 300 lines excluding tests. The general logic looks like this:

for file in files {
  if !file.has_saved_hash() {
    // save the file's hash as an xattr
    // save the file's modified timestamp as an xattr
  } else {
    saved_hash = file.saved_hash()
    new_hash = file.calculate_hash()
    if saved_hash == new_hash {
      println!("Your file is okay")
    } else {
      if file.last_modified() != file.saved_last_modified() {
        println!("Your file was written to")
      } else {
        // if the file hashes are not equal but the file was
        // not modified, that is likely a data integrity issue!
        println!("Your file is not okay")
      }
    }
  }
}

Pretty simple! The hashes are saved as extended attributes directly on the file, which is reasonable because macOS verifies the integrity of file system metadata, but not of user data (i.e. the contents of files). I got this idea from Howard Oakley of The Eclectic Light Company, who has written lots of helpful articles about bit rot, file integrity, ECC, his own solution called cinch, and much more. Some things not pictured: deleting saved hashes, handling nonexistent files or files with bad permissions, and traversing directories (possibly the first time recursion has seemed to me like the most straightforward way to solve a problem).
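To make the pseudocode above a bit more concrete, here is a minimal sketch of the two pieces that need nothing beyond the standard library: the comparison logic and the recursive directory traversal. This is not bitwrought's actual source. The names `Verdict`, `classify`, and `collect_files` are hypothetical, the hashes are treated as opaque byte slices, the timestamps are assumed to be plain integers, and reading and writing the xattrs themselves is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Possible outcomes for a file that already has a saved hash.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Saved and freshly computed hashes match.
    Okay,
    /// Hashes differ, but so does the modification time: the file was written to.
    Modified,
    /// Hashes differ while the modification time is unchanged: likely bit rot.
    Corrupted,
}

/// Mirrors the branch structure of the pseudocode above.
fn classify(saved_hash: &[u8], new_hash: &[u8], saved_mtime: u64, current_mtime: u64) -> Verdict {
    if saved_hash == new_hash {
        Verdict::Okay
    } else if saved_mtime != current_mtime {
        Verdict::Modified
    } else {
        Verdict::Corrupted
    }
}

/// Recursively collects every regular file under `path` into `out`.
/// Symlinks are skipped rather than followed (an assumption of this sketch),
/// which also avoids looping forever on symlink cycles.
fn collect_files(path: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    // symlink_metadata does not follow symlinks; it returns an error for
    // nonexistent paths or paths we lack permission to inspect.
    let meta = fs::symlink_metadata(path)?;
    if meta.is_dir() {
        for entry in fs::read_dir(path)? {
            collect_files(&entry?.path(), out)?;
        }
    } else if meta.is_file() {
        out.push(path.to_path_buf());
    }
    Ok(())
}
```

Keeping `classify` free of any file system access makes the three outcomes easy to unit test, and returning `io::Result` from `collect_files` lets the caller decide how to report missing files or permission errors.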

What I learned

Does it work?

Pretty much! I have a folder with a bunch of large MP3s that I care about, totaling 10 GB in size. I've installed my own program with cargo and run it on these files, and it gets through them in a minute or two. The one thing I want to improve is to add some kind of progress indicator, as right now the program basically looks frozen until it's done and then prints all of its output. Overall, this has been a fun project and I feel like I've learned a lot. It's also made me feel excited about little side projects again, and I intend to do some more!
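One way the progress indicator could work is to hash each file in fixed-size chunks and report the running byte count after every chunk. The sketch below is only an illustration of that idea, not something bitwrought does today. The function `hash_with_progress` and its callback are hypothetical, and `DefaultHasher` from the standard library is a stand-in chosen to keep the example dependency-free; it is not a cryptographic hash and its output is not guaranteed to be stable across Rust releases, so a real integrity checker would substitute a proper hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Read};

/// Hashes everything `reader` yields, calling `on_progress(done, total)`
/// after each chunk. `total_len` is supplied by the caller (for a file it
/// could come from its metadata) and is only passed through to the callback.
fn hash_with_progress<R: Read>(
    mut reader: R,
    total_len: u64,
    mut on_progress: impl FnMut(u64, u64),
) -> io::Result<u64> {
    // Stand-in hasher (assumption): NOT suitable for real integrity checks.
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 64 * 1024];
    let mut done: u64 = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        hasher.write(&buf[..n]);
        done += n as u64;
        on_progress(done, total_len);
    }
    Ok(hasher.finish())
}
```

A caller could pass a `std::fs::File` as the reader and a closure that prints a percentage to stderr, so the terminal shows activity while large files are being read.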

Written: March 1, 2023
Last edited: April 23, 2024