Updating the navigation on my site (again)

A while ago I wrote about attempting to use TypeScript, then rust, and finally settling on Java to write a little script to update the navigation on my website without parsing the HTML. I've learned a bit more rust since then, so I wanted to revisit the whole thing and see if I could succeed with rust this time.

What I did before

You're welcome to read the original post about writing a small script to update the navigation on this website. But the basic gist of it is that I wanted to automate a bunch of tedious HTML updates on this site. When I decided to do it, rather than use a language I was familiar with, I attempted to dive into TypeScript. I ran into trouble, got frustrated, and the next day decided to try doing it with rust, which resulted in a repeat of the same.

So, I swapped to a language I knew well to just get the dang task done. Which worked, but also left me unsatisfied at not being able to apply myself to a new language the way I wanted to. Of course, a couple months went by, I actually read a bit more of the rust book, and I completed my first ever Advent of Code, entirely in rust.

So, today is for revenge. Let's see if, now that I've managed to make it 13 chapters into the rust book and successfully parsed 25 files for the coding challenge of AoC, I can actually write the dang executable I had wanted to write before.

Parsing the command line arguments in rust

First off, I've recently finished the 13th chapter, which covers making a simple CLI tool. How perfect! So, I can use the pattern they introduced to define my program's expected inputs pretty trivially. We've got two inputs at the moment: the file to use as a reference for what I want the header to look like, and the path I want it to update. Following the convention of having a lib file for the guts of the program and keeping the main file for the binary as a thin caller of it, I can define the configuration for the CLI tool as this:

pub struct Config {
    pub template_file: String,
    pub path_to_update: String,
}

I suppose I could make this fancier and default the path to update to . if there's no path given, and use an Option for it, but I'm going to skip that for now I think and be explicit instead. We can refactor this later after our first pass after all.
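For the record, the fancier version might look something like the sketch below. It is not what the tool does; the Option field and the path_or_default helper are hypothetical names, made up here only to show the shape of that refactor:

```rust
// Hypothetical variant of Config where the path is optional.
pub struct Config {
    pub template_file: String,
    pub path_to_update: Option<String>,
}

impl Config {
    /// The path to update, falling back to "." when none was given.
    pub fn path_or_default(&self) -> &str {
        // as_deref turns Option<String> into Option<&str> without cloning.
        self.path_to_update.as_deref().unwrap_or(".")
    }
}
```

The rest of the code would then call path_or_default() instead of reading the field directly, so the fallback lives in exactly one place.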

Next up is parsing the CLI arguments with the args iterator, which is how chapter 13 does it too. We can create a build method to do that:

use std::fs; // needed by path_is_directory below

impl Config {
    pub fn build(mut args: impl Iterator<Item = String>) -> Result<Config, &'static str> {
        // The first argument is always the program name, so skip past it.
        if args.next().is_none() {
            return Err("Didn't get the program name somehow.");
        }

        let template_file = match args.next() {
            Some(arg) => arg,
            None => return Err("First argument should be the template file"),
        };
        let path_to_update = match args.next() {
            Some(arg) => arg,
            None => return Err("Second argument should be file or path to update"),
        };

        Ok(Config {
            template_file,
            path_to_update,
        })
    }

    // Anything we can't stat (a missing path, say) is treated as "not a directory".
    pub fn path_is_directory(&self) -> bool {
        match fs::metadata(&self.path_to_update) {
            Ok(metadata) => metadata.is_dir(),
            Err(_) => false,
        }
    }
}

And then we follow the pattern of having a run method that will actually do the work we want done:

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let template_file = fs::read_to_string(&config.template_file)?;
    let header_data = header_nav_slice_from_template(&template_file);

    if config.path_is_directory() {
        update_files_in_dir(&config, &header_data)?;
    } else {
        let contents_to_update = fs::read_to_string(&config.path_to_update)?;
        let new_contents = get_updated_file_contents(&header_data, contents_to_update);
        fs::write(&config.path_to_update, new_contents)?;
    }
    Ok(())
}

Breaking things apart like this makes our actual main method trivial to verify for correctness and is the whole point of the pattern. In main.rs this is all we've got:

use nav_update::Config;
use std::env;
use std::process;

fn main() {
    let config = Config::build(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = nav_update::run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

Replacing the data in the input with a templated nav

Now, onto the fun part. Most of the code so far is just boilerplate. What we actually want to do is replace the lines between the <header> tags of any HTML file we see. To get the lines we'll be replacing them with, we can write it in a procedural style like so:

fn header_nav_slice_from_template(template: String) -> String {
    let mut iter = template.lines();
    let mut header_copy = String::new();
    while let Some(line) = iter.next() {
        if !line.contains("<header>") {
            continue;
        }

        while let Some(line) = iter.next() {
            if !line.contains("</header>") {
                header_copy.push_str(format!("{line}\n").as_ref());
            } else {
                break;
            }
        }
        break;    
    }
    header_copy
}

Or, we can write it in a more functional way with iterators:

fn header_nav_slice_from_template(template: &str) -> Vec<&str> {
    template
        .lines()
        .skip_while(|line| !line.contains("<header>"))
        .take_while(|line| !line.contains("</header>"))
        .skip(1)
        .collect()
}

Both of these accomplish the same thing, but one is probably a tad easier to read for one group of people, and the other for the folks who hate all things good in the world. Identifying which is which is an exercise left to the reader. The more observant of you will notice that I swapped from returning a String to returning a vector of slices of the input. It just seemed like a good idea to me based on what I've read in the book so far, even if my first instinct is to just build a string.
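To make it concrete what the iterator version hands back, here is a small sketch that runs it over a made-up template. The SAMPLE constant is an assumption for illustration, not the real template from this site:

```rust
fn header_nav_slice_from_template(template: &str) -> Vec<&str> {
    template
        .lines()
        // Drop everything before the opening tag...
        .skip_while(|line| !line.contains("<header>"))
        // ...keep lines until the closing tag shows up...
        .take_while(|line| !line.contains("</header>"))
        // ...and drop the <header> line itself.
        .skip(1)
        .collect()
}

// Made-up template, only here for illustration.
const SAMPLE: &str =
    "<body>\n<header>\n  <nav>\n    <a href=\"/\">Home</a>\n  </nav>\n</header>\n<p>hi</p>\n</body>";
```

Calling header_nav_slice_from_template(SAMPLE) gives back the three lines between the tags, indentation included. Each one is a slice borrowed from SAMPLE, so nothing gets copied. A template with no <header> at all just produces an empty vector.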

String vs slice aside, to take these lines and splice them into the contents of another file, we can write out the code procedurally first, like so:

fn get_updated_file_contents(
    template_header_lines: &Vec<&str>,
    contents_to_update: String
) -> String {
    let mut iter = contents_to_update.lines();
    let mut new_contents = String::with_capacity(contents_to_update.len() + template_header_lines.len());
    while let Some(line) = iter.next() {
        if line.contains("<header>") {
            new_contents.push_str(line);
            new_contents.push_str("\n");

            for templated_line in template_header_lines.iter() {
                new_contents.push_str(templated_line);
                new_contents.push_str("\n");
            }
            while let Some(line) = iter.next() {
                if line.contains("</header>") {
                    new_contents.push_str(line);
                    break;
                } else {
                    continue;
                }
            }
        } else {
            new_contents.push_str(line);
        }
        new_contents.push_str("\n");
    }
    new_contents
}

In a nutshell, we'll consume the contents of a file until we see a header tag, then emit the templated data, then skip to the closing tag for header and carry on. This is pretty similar to the Java code we wrote in the previous blog post that looked like this:

private void updateFile(Path file, List<String> newHeaderLines) throws IOException {                    
    List<String> lines = Files.readAllLines(file);
    StringBuilder newFile = new StringBuilder(lines.size() * 100);

    for (int i = 0; i < lines.size(); i++) {
        if (lines.get(i).contains("<header>")) {
            for (String l : newHeaderLines) {
                newFile.append(l);
                newFile.append('\n');
            }

            for (int j = i; j < lines.size(); j++) {
                if (lines.get(j).contains("</header>")) {
                    i = j;
                    break;
                }
            }
        }

        newFile.append(lines.get(i));
        newFile.append('\n');
    }
    Files.writeString(file, newFile.toString());
}
                

In the Java code we used some simple for loops, while in the rust code we used iterators. You could probably swap and write them either way in either language if you felt like one was simpler to follow. I will say that I did attempt to work on a functional style version of this, but ran into a bit of trouble getting my test to pass.

This is probably a skill issue on my part, but I was attempting (and failing) to use the .chain method for iterators to take the data leading up to and including the header, then combining the templated code with the chain method, and then chaining an iterator that had skipped over to the closing header tag to the end. Conceptually it sounds like it should work for me, in practice?

A broken test where my iterator wasn't supplying the rest of the output. I might return to the code to figure out what's going on with it in the future, but for now, let's move on because I want to talk about the slightly more interesting part.
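For what it's worth, the chain idea can be made to work. The sketch below is one possible route, not a fix of the attempt described above: it assumes it's fine to collect the lines into a Vec first and slice around the two tag positions, and splice_header_with_chain is a made-up name. Unlike the loop version, it only handles the first <header> block in a file, and it returns the contents untouched when there is no header at all:

```rust
/// Sketch: splice the template lines in using slices and `chain`.
fn splice_header_with_chain(template_header_lines: &[&str], contents: &str) -> String {
    let lines: Vec<&str> = contents.lines().collect();

    let open = match lines.iter().position(|line| line.contains("<header>")) {
        Some(index) => index,
        // No header: nothing to splice, hand the contents back as they were.
        None => return contents.to_string(),
    };
    // Look for the closing tag after the opening line. If it is missing,
    // the rest of the file is dropped, same as in the loop version.
    let close = lines[open + 1..]
        .iter()
        .position(|line| line.contains("</header>"))
        .map_or(lines.len(), |offset| open + 1 + offset);

    lines[..=open] // everything up to and including <header>
        .iter()
        .chain(template_header_lines.iter()) // the templated nav
        .chain(lines[close..].iter()) // </header> and whatever follows
        .fold(String::new(), |mut out, line| {
            out.push_str(line);
            out.push('\n');
            out
        })
}
```

Slicing a Vec sidesteps the awkward part of doing this with a single lines() iterator, where take_while consumes the first line that fails its test and an "up to and including" step has nothing left to include.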

A replacement for the Java FileVisitor?

So, one of the things I really loved about working with Java for these little scripts is the FileVisitor interface because it makes everything so easy for me to think about. Just get a visitor, supply it with your method of choice for the visitation, and you're good to go!

How do we do the same thing in Rust?

Well, it's not quite the same. But rust does have some useful methods for traversing a file tree. The fs (for filesystem) module has the useful function read_dir, whose documentation even includes a recursive example that basically lets us write a similar method:

fn visit_files(dir: &Path, action: &dyn Fn(&fs::DirEntry) -> Result<(), Box<dyn Error>>) -> Result<(), Box<dyn Error>> {
    if dir.is_dir() {
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            let path = entry.path();
            if path.is_dir() {
                visit_files(&path, action)?;
            } else {
                action(&entry)?;
            }
        }
    }
    Ok(())
}

This is nearly the same as the example in the docs, but I've updated it so the closure can return an error, which lets the calling code look like this:

fn update_files_in_dir(
    config: &Config,
    template_header_lines: &Vec<&str>,
) -> Result<(), Box<dyn Error>> {
    visit_files(Path::new(&config.path_to_update), &|dir_entry: &fs::DirEntry| {
        if dir_entry.path().extension().and_then(|e| e.to_str()) != Some("html") { 
            return Ok(());
        }

        let contents_to_update = fs::read_to_string(dir_entry.path())?;
        let new_contents = get_updated_file_contents(
            template_header_lines,
            contents_to_update
        );
        fs::write(&dir_entry.path(), new_contents)?;
        Ok(())
    })?;
    Ok(())
}

As you can see, we check if the file is an HTML one, then call our handy method to replace the navigation and then write the file contents out. Not bad, right? I was initially concerned that I wasn't using the buffered methods for reading and writing that I saw in the rust documentation. But that concern disappeared when I read the documentation for the open method on the File struct, which explicitly says:

If you only need to read the entire file contents, consider std::fs::read() or std::fs::read_to_string() instead.

Not bad, right? But... can we make it better? Well, my initial thought to make things "better" is to figure out a way to treat the list of files as an iterator itself. Remembering that you can't modify an iterator you're currently iterating over, we're probably not going to get very far with fold and friends. But... what if we encapsulated a queue?

#[derive(Debug)]
struct RecursiveDirIterator {
    q: VecDeque<fs::DirEntry>,
}

impl RecursiveDirIterator {
    fn new(d: &Path) -> Result<RecursiveDirIterator, Box<dyn Error>> {
        let mut q = VecDeque::new();
        if d.is_dir() {
            let entries = fs::read_dir(&d)?;
            for entry in entries {
                if let Ok(entry) = entry {
                    q.push_back(entry);
                }
            }
        }

        Ok(RecursiveDirIterator { q })
    }
}

impl Iterator for RecursiveDirIterator {
    type Item = fs::DirEntry;

    fn next(&mut self) -> Option<fs::DirEntry> {
        let n = self.q.pop_front();
        match n {
            None => n,
            Some(dir_entry) => {
                let path = dir_entry.path();
                if path.is_dir() {
                    if let Ok(entries) = fs::read_dir(&path) {
                        for entry in entries {
                            if let Ok(entry) = entry {
                                self.q.push_back(entry);
                            }
                        }
                    } else {
                        eprintln!("Could not read entry in path {}", path.display());
                    }
                }
                Some(dir_entry)
            }
        }
    }
}

Alright, that's a lot of code at once. First off, the new method. Our constructor for our struct is pretty similar to the visit_files method in that we're reading a list of directory entries from whatever path we're given. But, rather than immediately recursing and fetching the various entries, we hold off, only adding the top level entries from the current path.

The next method handles the "recursion". Every time you ask for another item from the iterator, it checks to see if the current DirEntry being dequeued is a directory or not. If it is, then we list off those nested files or folders and add them to the queue before handing the directory we just listed off to the user for processing. In this way, we do the same work as the visit_files method, but delayed until the user wants it done. We're now a little bit lazy.
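The delayed-recursion idea can be boiled down further. The sketch below is a hypothetical, trimmed-down variation (LazyWalk is a made-up name, not part of the tool) that queues PathBuf values rather than DirEntry ones, which means it also hands back the starting directory itself as its first item. Unreadable directories are skipped without a message, an assumption made to keep it short:

```rust
use std::collections::VecDeque;
use std::fs;
use std::path::PathBuf;

/// Hypothetical, smaller cousin of RecursiveDirIterator.
struct LazyWalk {
    queue: VecDeque<PathBuf>,
}

impl LazyWalk {
    fn new(root: impl Into<PathBuf>) -> LazyWalk {
        let mut queue = VecDeque::new();
        queue.push_back(root.into());
        LazyWalk { queue }
    }
}

impl Iterator for LazyWalk {
    type Item = PathBuf;

    fn next(&mut self) -> Option<PathBuf> {
        // An empty queue means the walk is finished.
        let path = self.queue.pop_front()?;
        if path.is_dir() {
            // Only now, when the directory is handed out, do we list it.
            if let Ok(entries) = fs::read_dir(&path) {
                // flatten() drops entries that failed to read.
                self.queue.extend(entries.flatten().map(|entry| entry.path()));
            }
        }
        Some(path)
    }
}
```

Since it's just an Iterator, LazyWalk::new(".") composes with filter, map and friends the same way RecursiveDirIterator does. Because the queue is first-in, first-out, the walk comes out breadth-first.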

The nice thing about Iterator is that we only have to implement the next method, and then all the other handy fancy methods we're used to are available. So, we can rewrite our update_files_in_dir method to look like this now:

fn update_files_in_dir(
    config: &Config,
    template_header_lines: &Vec<&str>,
) -> Result<(), Box<dyn Error>> {
    let entries = RecursiveDirIterator::new(Path::new(&config.path_to_update))?;
    entries
        .filter(|dir_entry| {
            dir_entry
                .path()
                .extension()
                .and_then(|e| e.to_str()) == Some("html")
        })
        .for_each(|dir_entry| {
            if let Ok(contents_to_update) = fs::read_to_string(dir_entry.path()) {
                let new_contents = get_updated_file_contents(template_header_lines, contents_to_update);
                if let Err(e) = fs::write(&dir_entry.path(), new_contents) {
                    eprintln!("{e}");
                }
            }
        });
    Ok(())
}

Is this shorter? Eh, no, not really. But we did separate the extension check (which used to be an early return) from the actual action of writing out the new contents of the file. So that feels a bit clearer from a do-one-thing-at-a-time perspective, which might be valuable if we were doing a lot of different actions on the files in a directory.
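One side effect of moving to for_each is worth a note: the closure can't use ?, so in the version above a failed read is skipped silently, a failed write is only printed, and the function returns Ok either way. If stopping at the first error is preferred, try_for_each is one option. A minimal sketch, where rewrite_html_files and its transform parameter are made-up names standing in for the real update logic:

```rust
use std::error::Error;
use std::fs;
use std::path::PathBuf;

/// Hypothetical helper: rewrite every .html path with `transform`,
/// stopping at the first I/O error instead of swallowing it.
fn rewrite_html_files(
    paths: impl Iterator<Item = PathBuf>,
    transform: impl Fn(String) -> String,
) -> Result<(), Box<dyn Error>> {
    paths
        .filter(|path| path.extension().and_then(|e| e.to_str()) == Some("html"))
        // try_for_each short-circuits on the first Err the closure returns.
        .try_for_each(|path| -> Result<(), Box<dyn Error>> {
            let contents = fs::read_to_string(&path)?;
            fs::write(&path, transform(contents))?;
            Ok(())
        })
}
```

Whether bailing out early is actually better here is a judgment call. For a batch job over a whole site, carrying on and reporting the failures, the way the for_each version does for writes, can be just as reasonable.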

Wrap up

The proof is in the pudding though. So, does the new tool do the same thing as the Java one?

Yes it does. And the nice thing? I don't have to call java -jar or write a script to wrap things up to make it look and feel like an exe on Windows. It just is. Of course, we can confirm that this works for more than just one file for good measure. So, if I update the navigation on my website, then I should be able to see a pleasant diff in git. Let's say I wanted to change my header from this:

<header>
    <nav>
      <a href="/"><button>Home</button></a>
      <a href="/cooking/index.html"><button>Recipes</button></a>
      <a href="/blag/index.html"><button>Blog</button></a>
      <a href="https://www.twitch.tv/peetseater"><button>Stream</button></a>
      <a href="/video-archive"><button>Video Archive</button></a>
      <a id="rss" href="/blag/feed.xml"><img src="/images/RSS_Logo.png" title="RSS Reader Link "/></a>
    </nav>
</header>

To this, for whatever reason depriving the world of my lovely cookie recipes and other random ideas about how food can be made:

<header>
    <nav>
      <a href="/"><button>Home</button></a>
      <a href="/blag/index.html"><button>Blog</button></a>
      <a href="https://www.twitch.tv/peetseater"><button>Stream</button></a>
      <a href="/video-archive"><button>Video Archive</button></a>
      <a id="rss" href="/blag/feed.xml"><img src="/images/RSS_Logo.png" title="RSS Reader Link "/></a>
    </nav>
</header>

So then, I run the exe on the folder like so:

$ ../tools/target/release/nav-update.exe blag/template.html .

And I can see that just the cooking line has disappeared! Just like if I had run the Java version.

Isn't that a nice way to wrap up a weekend and get a little bit more practice writing rust for fun? If you want to look at the code in all its glory (it even has some doc strings and some minor tweaks from my running cargo clippy and fixing things up), then you can find the code over on GitHub here. Happy hacking!