An Incomplete Explanation of the Proc Macro That Saved Me 4000 Lines of Rust

Recently I’ve been working on a project to surface census data via a GraphQL API, mostly as a way to learn GraphQL. I did end up learning GraphQL, but I also ended up learning a lot about proc macros. I was using Juniper, which creates a GraphQL schema using structs like this:

#[derive(GraphQLObject, Copy, Clone, Debug)]
struct Demographics {
    female: Option<i32>,
    male: Option<i32>,
}

The problem is that the census data I wanted to surface had far too many variables: I would have had to write out 207 structs, with a total of 352 fields. Here’s a nice tree-view of the census data, courtesy of the frangipanni tool. Obviously, I didn’t want to write out all these structs and fields by hand. Even with some fancy vim macros, that would have been too much tedious work for a side project that’s supposed to be fun.

I dipped my toes into proc macros for the first time, and found they really weren’t that hard, although resources on them are scarce. All I wanted was a few examples of non-trivial macros, but I couldn’t find many. So this isn’t a full proc macros walkthrough, just a quick look at the macro I ended up writing for my use case.

The input and output

The tree of census fields from above is the input to my macro, and the output looks like this.

Set up

I was surprised to find that I couldn’t just create a macro within the binary I was developing. Instead, I had to create a new library and use that as a crate. It’s trivial enough (just run cargo new macro-example --lib, then add it as a path dependency), but slightly annoying. Another gotcha is that you’ll then need to add the following lines to the Cargo.toml of that new cargo project:

[lib]
proc-macro = true
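
On the consuming side, the path dependency mentioned above might look like the fragment below. This is only a sketch: it assumes the binary crate and the macro-example crate sit next to each other on disk, so the relative path is hypothetical and will depend on your layout.

```toml
# Cargo.toml of the binary crate that uses the macro
[dependencies]
# Hypothetical relative path; point it at wherever the macro crate lives
macro-example = { path = "../macro-example" }
```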

The macro

You can check out the full macro here. Note it’s slightly simplified from the one that generated the output I linked previously, in the interest of keeping only the interesting parts. In the full macro there’s a lot of extra code to handle bad characters and repeated field names. I’m not going to go over the whole thing, but I’ll note some of the macro-related parts I had to learn about.

Testing the macro / workflow

proc_macro2 was an essential crate for testing. The signature of a proc_macro function will look something like this:

#[proc_macro]
pub fn generate_census_structs(item: proc_macro::TokenStream) -> proc_macro::TokenStream

Unfortunately, you can’t directly call this function in a #[test]. One option is to create a new project just to use the crate you’re developing, and call the macro in there. The better option (IMO) is to move all your macro generation code into a separate function, which will produce a proc_macro2::TokenStream. This can be easily converted to a proc_macro::TokenStream using into(), so you get something like this:

fn generate_census_structs_inner(item: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
  // ...
}

#[proc_macro]
pub fn generate_census_structs(item: proc_macro::TokenStream) -> proc_macro::TokenStream {
    generate_census_structs_inner(proc_macro2::TokenStream::from(item)).into()
}

This way, you can directly call that generate_census_structs_inner function in your tests. I’ve got a test like this:

#[test]
fn census_gen_test() {
    use std::{fs::File, io::Write};
    let generated = generate_census_structs_inner(
        quote!(
            "
dp03
    DEMOGRAPHICS
        Male
        Female
    INCOME
        Less than 50 k
        More than 50 k
                    "
        )
        .into(),
    );
    let mut file = File::create("test.rs").unwrap();
    file.write_all(format!("{}", generated).as_bytes()).unwrap();
}

When I started developing this macro, I’d assert that the output matched what I expected, but as the output got larger it became more practical to just write the output to a file and see if it checked out. Not best practice obviously, but it worked for me.

In the end I had cargo watch -x test running at the same time as a process to print out the contents of the test.rs file, formatted with rustfmt. This gave me a quick REPL to test stuff out.

Generating arbitrary identifiers

quote! is the primary way of generating TokenStreams. One speed bump I encountered is that it will treat strings as string literals rather than identifiers, e.g. if you have something like:

// Pretend this is a string we're getting from somewhere
let struct_name = "Blah"; 
return quote! {
    struct #struct_name;
};

Then the output will be

struct "Blah";

Which wasn’t what I wanted. To convert your Strings to identifiers that quote! will print out directly, you use format_ident!(). This macro works just like print! and format!, but returns an Ident.

So the example above would instead be:

// Again pretend this string is coming from somewhere, not static
let struct_name = format_ident!("{}", "Blah");
return quote! {
    struct #struct_name;
};

Iterating over multiple Vectors in a quote!

In my macro, I generate a Vec of field names and a Vec of field types. I wanted to iterate through both of them together, something that would usually be done with a zip, but inside a quote! you can’t do stuff like that. It turns out that if you have two vectors and refer to both inside a #( )* section (which is just the special syntax for iterating in quote!), then it will iterate over both in parallel:

generated_structs.push(quote! {
    #[derive(GraphQLObject, Copy, Clone, Debug)]
    struct #struct_name {
       #( #field_names: #field_types ),* // <---
    }
})
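
For intuition, the repetition above behaves roughly like zipping the two vectors and joining the pairs with commas. The plain-Rust sketch below builds a String instead of a TokenStream, so it is an analogy rather than what quote! does internally. The render_fields helper is a hypothetical name, and the sketch assumes both slices have the same length.

```rust
/// Pairs the i-th field name with the i-th field type and joins the
/// resulting `name: type` entries with commas, mimicking what the
/// `#( #field_names: #field_types ),*` repetition produces.
/// Hypothetical helper for illustration; assumes equal-length inputs.
fn render_fields(field_names: &[&str], field_types: &[&str]) -> String {
    field_names
        .iter()
        .zip(field_types.iter())
        .map(|(name, ty)| format!("{}: {}", name, ty))
        .collect::<Vec<_>>()
        .join(", ")
}
```

Calling render_fields(&["female", "male"], &["Option<i32>", "Option<i32>"]) gives the field list of the Demographics struct from the top of the post, on a single line.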

Casing

For census categories like “Total households”, I needed to generate a pascal case version for the struct (TotalHouseholds), and a snake case version for the field (total_households). The convert_case crate was extremely useful for that.
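
To make the two transformations concrete, here is a hand-rolled sketch using only the standard library. It is not how convert_case is implemented, and the crate may treat edge cases (digits, acronyms, non-ASCII text) differently. The function names are hypothetical, and the sketch assumes words are separated by anything that isn’t an ASCII letter or digit.

```rust
/// Splits the input into lowercase words, treating every character that
/// is not an ASCII letter or digit as a separator.
fn words(input: &str) -> Vec<String> {
    input
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .collect()
}

/// "Total households" -> "TotalHouseholds" (for struct names).
fn to_pascal_case(input: &str) -> String {
    words(input)
        .iter()
        .map(|w| {
            let mut chars = w.chars();
            match chars.next() {
                // Uppercase the first character, keep the rest as-is
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// "Total households" -> "total_households" (for field names).
fn to_snake_case(input: &str) -> String {
    words(input).join("_")
}
```

Note that a result starting with a digit is still not a valid Rust identifier, so real input would need more handling than this sketch provides.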

Using invalid rust as macro input

Macros are really meant to transform valid Rust syntax to other valid Rust syntax. That wasn’t my use case, so I cheated a bit by calling the macro with a single string, and then parsing that:

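// Collect the macro input's tokens; the first one should be the string literal
let input: Vec<proc_macro2::TokenTree> = item.into_iter().collect();
let literal = match input.first() {
    // to_string() gives the literal as written in the source, quotes included
    Some(proc_macro2::TokenTree::Literal(literal)) => literal.to_string(),
    _ => panic!("expected a single string literal as macro input"),
};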

The approach works, but it’s a bit strange dealing with a Literal. For example, I had to trim the quote characters off myself, as there didn’t seem to be a way to get the actual string contents.
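
A minimal sketch of that post-processing, using only the standard library, might look like the following. Both helpers are hypothetical names and not necessarily what the full macro does. The sketch assumes the literal is a plain (non-raw) string with no escape sequences, and that the tree is indented with four spaces per level, as in the test input earlier. If pulling in another dependency is acceptable, the syn crate’s LitStr type has a value() method that returns the unescaped contents, which may be a cleaner route.

```rust
/// Removes one leading and one trailing `"` from a literal's source text.
/// Returns None if the text is not wrapped in double quotes.
/// Assumes a plain string literal with no escapes (hypothetical helper).
fn strip_literal_quotes(raw: &str) -> Option<&str> {
    raw.strip_prefix('"')?.strip_suffix('"')
}

/// Turns indented lines into (depth, name) pairs, skipping blank lines.
/// Assumes four spaces of indentation per level (hypothetical helper).
fn parse_indented(text: &str) -> Vec<(usize, String)> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            // Leading whitespace, measured in bytes; fine for spaces
            let indent = line.len() - line.trim_start().len();
            (indent / 4, line.trim().to_string())
        })
        .collect()
}
```

From the (depth, name) pairs, a depth of 0 would be the table name, with deeper entries becoming nested structs and fields.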

Closing thoughts

All in all, I’m pretty happy with macro development in Rust, but I wish it didn’t require creating a separate crate, and that there were more useful tools for debugging macros.