Comparing File Modified Date FAT32 vs NTFS

For the last year, I've been doing system backups using a program I wrote in Rust. If you're interested in how it works, you can scroll to the bottom, but this post is mostly about a new feature I added that compares file modification dates to decide whether a file needs to be copied.

Basically, I have a source folder (A) and a backup folder (B). When the backup runs, it copies files and folders from A to B. If the backup runs a second time, it does the same thing, but if a file already exists in B and the modified dates in A and B match, I skip the copy to speed things up. In Rust, you can get the modified date of a file like so...

use std::path::Path;
use std::time::SystemTime;

fn main() -> std::io::Result<()> {
    // Placeholder paths for a file in the source folder (A) and its copy in the backup folder (B).
    let file = Path::new("A/example.txt");
    let target = Path::new("B/example.txt");

    let source_metadata = file.metadata()?;
    let target_metadata = target.metadata()?;
    let src_modified: SystemTime = source_metadata.modified()?;
    let target_modified: SystemTime = target_metadata.modified()?;

    Ok(())
}

So now you have two SystemTime variables. But keep in mind...

"Although a SystemTime cannot be directly inspected, the UNIX_EPOCH constant is provided in this module as an anchor in time to learn information about a SystemTime. By calculating the duration from this fixed point in time, a SystemTime can be converted to a human-readable time, or perhaps some other string representation."

So you need to convert these to a Duration before you compare them...

use std::time::Duration;

let src_mod_from_unix_epoch: Duration = src_modified
    .duration_since(SystemTime::UNIX_EPOCH)
    .expect("Could not get duration since unix epoch");

let target_mod_from_unix_epoch: Duration = target_modified
    .duration_since(SystemTime::UNIX_EPOCH)
    .expect("Could not get duration since unix epoch");

And you might think that comparing the two would be as easy as checking for equality...

if src_mod_from_unix_epoch == target_mod_from_unix_epoch {
    // Same modified date!
}

However, that's not always the case, because the modified date on a file depends on the file system you're using. For example, my backup drive uses FAT32 and my system drive uses NTFS, and FAT32 records the modified date with a 2-second resolution. That means that on my particular system, if I have two files that I consider to be the same (one on the backup drive, the other on my system), their modified dates can be up to 2 seconds apart and will rarely be an exact match.

So that means I need to determine equality slightly differently. To do that, I convert the durations to milliseconds and take the absolute difference between the two.

let difference: u128 = u128::abs_diff(
    src_mod_from_unix_epoch.as_millis(),
    target_mod_from_unix_epoch.as_millis()
);

If the difference is less than or equal to 2000 milliseconds, then I can consider them the same.

let is_same_modified_date: bool = difference <= 2000;

And if that's the case, then I skip copying bytes and my backup workflow runs faster.
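
Putting the pieces together, here's a minimal sketch of the whole check as a single function. The function name and the 2000 ms tolerance constant are my own framing of what's described above, not necessarily how the backup tool actually structures it.

use std::io;
use std::path::Path;
use std::time::SystemTime;

// Sketch: returns true if the two files' modified dates are within 2 seconds,
// which is close enough given FAT32's 2-second timestamp resolution.
fn same_modified_date(source: &Path, target: &Path) -> io::Result<bool> {
    let src_modified: SystemTime = source.metadata()?.modified()?;
    let target_modified: SystemTime = target.metadata()?.modified()?;

    let src_millis = src_modified
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("Could not get duration since unix epoch")
        .as_millis();
    let target_millis = target_modified
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("Could not get duration since unix epoch")
        .as_millis();

    Ok(src_millis.abs_diff(target_millis) <= 2000)
}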


How The Backup Program Works

There are likely better ways to do backups, but this is just something simple that works for me. My backup program basically copies files from folder A to folder B with a few extra features.

I have a lot of programming projects with giant dependency folders like node_modules, .stack-work, target, etc., so I skip copying anything matched by .gitignore files. I don't want to back up these ignored files and folders.
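
The post doesn't say how the .gitignore matching is implemented; one way to do it in Rust is the ignore crate (the same library ripgrep uses), which walks a directory tree and skips ignored entries automatically. A minimal sketch:

use ignore::WalkBuilder;

fn main() {
    // Walk the source folder, honoring .gitignore files along the way,
    // so node_modules, target, .stack-work, etc. are never visited.
    // "A" stands in for the real source folder.
    for entry in WalkBuilder::new("A").build() {
        match entry {
            Ok(entry) => println!("would back up: {}", entry.path().display()),
            Err(err) => eprintln!("walk error: {err}"),
        }
    }
}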

For some folders, I synchronize the content from source to target: if the target directory has files that are not in the source, I delete them. That way I can always be sure that my backup is in sync with my file system.
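
Here's a minimal sketch of that delete pass, assuming a flat folder for simplicity (the real tool presumably recurses into subdirectories); prune_target is a hypothetical name:

use std::fs;
use std::io;
use std::path::Path;

// Remove any file in the target folder that no longer exists in the source folder.
fn prune_target(source: &Path, target: &Path) -> io::Result<()> {
    for entry in fs::read_dir(target)? {
        let entry = entry?;
        let name = entry.file_name();
        if !source.join(&name).exists() {
            // The file exists only in the backup, so delete it to keep
            // the target in sync with the source.
            fs::remove_file(entry.path())?;
        }
    }
    Ok(())
}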

I also have three "backup tiers": daily, incremental, and archive.

The daily backup is fairly small and has the most important files in it. It's always backed up to Daily/YYYY-MM-DD, so if I need to, I can always go back in time to get something I may have accidentally deleted. As time goes on, I can delete older snapshots.
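
Building the dated snapshot folder is simple; here's a sketch assuming the chrono crate for date formatting (the post doesn't say what the tool actually uses, and the backup path is a placeholder):

use chrono::Local;
use std::path::{Path, PathBuf};

fn main() {
    // Format today's date as YYYY-MM-DD and use it as the snapshot folder name.
    // "E:/Backup/Daily" is a placeholder for the real backup-drive path.
    let today = Local::now().format("%Y-%m-%d").to_string();
    let snapshot_dir: PathBuf = Path::new("E:/Backup/Daily").join(today);
    println!("backing up to {}", snapshot_dir.display());
}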

The incremental files are much larger and less important, and I trust myself not to accidentally delete something in these folders, so I run this backup as a synchronized backup from the source to the Incremental folder. At any given time, I will always have the files on my system and also on the backup drive. I guess this is technically more of a "mirror" than a "backup"? I suppose I should probably keep some YYYY-MM-DD snapshots in case I do accidentally delete something on my system and then run the synchronized backup. If that happened, the content would be gone forever.

The archive folders are even larger and rarely updated. This tier is also a synchronized backup, and it goes to the Archive folder.

I run the backup tool automatically using Windows Task Scheduler. It runs every hour. Within the program, I check whether the Daily, Incremental, or Archive backup needs to run: Daily happens every day, Incremental every week, and Archive every month. If a backup tier has already run within its time frame, it's skipped.

When one of the tiered backups runs, I record when it happened in a plaintext file on the backup drive.
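
Here's a minimal sketch of that "has this tier already run?" check, assuming the plaintext record just stores seconds since the Unix epoch (the post doesn't describe the actual file format); tier_is_due and the record path are hypothetical names:

use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

// Returns true if the tier's last-run record is missing or older than `interval`.
fn tier_is_due(record: &Path, interval: Duration) -> bool {
    let last_run = fs::read_to_string(record)
        .ok()
        .and_then(|s| s.trim().parse::<u64>().ok())
        .map(|secs| SystemTime::UNIX_EPOCH + Duration::from_secs(secs));

    match last_run {
        // No record yet, so the tier has never run.
        None => true,
        // Run again only if at least `interval` has elapsed since the last run.
        Some(last) => SystemTime::now()
            .duration_since(last)
            .map(|elapsed| elapsed >= interval)
            .unwrap_or(true),
    }
}

fn main() {
    let one_day = Duration::from_secs(24 * 60 * 60);
    if tier_is_due(Path::new("E:/Backup/daily-last-run.txt"), one_day) {
        println!("running the Daily tier");
    }
}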

When one of the tiered backups runs, I also write log files that record what happened: which files were copied, which were skipped, and whether anything went wrong.
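
As a sketch of what that logging might look like (the log path, format, and entries here are my own invention, not what the tool actually writes):

use std::fs::OpenOptions;
use std::io::{self, Write};

// Append a single line to the run's log file.
fn log_line(path: &str, message: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{message}")
}

fn main() -> io::Result<()> {
    // Hypothetical log path and entries.
    let log = "E:/Backup/Logs/daily.txt";
    log_line(log, "copied: A/notes.txt")?;
    log_line(log, "skipped (same modified date): A/photo.jpg")?;
    log_line(log, "error: could not read A/locked.db")?;
    Ok(())
}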