An Incomplete Explanation of the Proc Macro That Saved Me 4000 Lines of Rust
Recently I’ve been working on a project to surface census data via a GraphQL API, mostly as a way to learn GraphQL. I did end up learning GraphQL, but I also ended up learning a lot about proc macros. I was using Juniper, which creates a GraphQL schema using structs like this:
#[derive(GraphQLObject, Copy, Clone, Debug)]
struct Demographics {
    female: Option<i32>,
    male: Option<i32>,
}
The problem is that the census data I wanted to surface had far too many variables: I would have had to write out 207 structs, with a total of 352 fields. Here’s a nice tree-view of the census data, courtesy of the frangipanni tool. Obviously, I didn’t want to write out all these structs and fields by hand. Even with some fancy vim macros, that would have been too much tedious work for a side project that’s supposed to be fun.
I dipped my toes into proc macros for the first time, and found they really weren’t that hard, but the resources out there are so scarce. All I wanted was a few examples of non-trivial macros, but I couldn’t find many. So this isn’t a full proc macros walkthrough, just a quick look at the macro I ended up writing for my use case.
The input and output
The tree of census fields from above is the input to my macro, and the output looks like this.
Set up
I was surprised to find that I couldn’t just create a macro within the binary I was developing. Instead, I had to create a new library crate and depend on it. It’s trivial enough (just run cargo new macro-example --lib, then add it as a path dependency), but slightly annoying. Another gotcha is that you’ll then need to add the following lines to the Cargo.toml of that new Cargo project:
[lib]
proc-macro = true
The macro
You can check out the full macro here. Note it’s slightly simplified from the one that generated the output I linked previously, in the interest of only keeping the interesting parts. In the full macro there’s a lot of extra code to handle bad characters, and repeated field names. I’m not going to go over the whole thing, but I’ll note some of the macro-related parts I had to learn about.
Testing the macro / workflow
proc_macro2 was an essential crate for testing. The signature of a proc_macro function will look something like this:
#[proc_macro]
pub fn generate_census_structs(item: proc_macro::TokenStream) -> proc_macro::TokenStream
Unfortunately, you can’t directly call this function in a #[test], because the proc_macro types only work inside an actual macro invocation. One option is to create a new project just to use the crate you’re developing, and call the macro in there. The better option (IMO) is to move all your macro generation code into a separate function, which will produce a proc_macro2::TokenStream. This can be easily converted to a proc_macro::TokenStream using into(), so you get something like this:
fn generate_census_structs_inner(item: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
    // ...
}

#[proc_macro]
pub fn generate_census_structs(item: proc_macro::TokenStream) -> proc_macro::TokenStream {
    generate_census_structs_inner(proc_macro2::TokenStream::from(item)).into()
}
This way, you can call generate_census_structs_inner directly in your tests. I’ve got a test like this:
#[test]
fn census_gen_test() {
use std::{fs::File, io::Write};
let generated = generate_census_structs_inner(
quote!(
"
dp03
DEMOGRAPHICS
Male
Female
INCOME
Less than 50 k
More than 50 k
"
)
.into(),
);
let mut file = File::create("test.rs").unwrap();
file.write_all(format!("{}", generated).as_bytes()).unwrap();
}
When I started developing this macro, I’d assert that the output matched what I expected, but as the output got larger it became more practical to just write the output to a file and see if it checked out. Not best practice, obviously, but it worked for me.
In the end I had cargo watch -x test running alongside a process that printed out the contents of the test.rs file, formatted with rustfmt. This gave me a quick REPL-like loop to test stuff out.
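As a variation on that workflow, the formatting step could also live in Rust itself. Here’s a rough sketch that pipes a string through rustfmt over stdin and returns the result. It assumes a rustfmt binary is on the PATH, and the helper name format_with_rustfmt is made up for this example; it isn’t part of my macro.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Pipe `source` through `rustfmt` and return the formatted text.
/// Hypothetical helper: assumes a `rustfmt` binary is on the PATH.
/// Returns an error if rustfmt is missing or rejects the input.
fn format_with_rustfmt(source: &str) -> std::io::Result<String> {
    // With no file arguments, rustfmt reads stdin and writes to stdout.
    let mut child = Command::new("rustfmt")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Write the unformatted code, then drop the handle so rustfmt sees EOF.
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(source.as_bytes())?;
    }

    let output = child.wait_with_output()?;
    if !output.status.success() {
        // Surface rustfmt's own error message to the caller.
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}
```

In a test, you’d call it on the generated code’s string form before writing test.rs, falling back to the raw string if it returns an error.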
Generating arbitrary identifiers
quote! is the primary way of generating TokenStreams. One speedbump I encountered is that it will treat strings as strings, rather than identifiers. For example, if you have something like:
// Pretend this is a string we're getting from somewhere
let struct_name = "Blah";
return quote! {
    struct #struct_name;
};
Then the output will be
struct "Blah";
Which wasn’t what I wanted. To convert your Strings to identifiers that quote! will print out directly, you use format_ident!(). This macro takes a format string just like print! and format!, but returns an Ident. So the example above would instead be:
// Again pretend this string is coming from somewhere, not static
let struct_name = format_ident!("{}", "Blah");
return quote! {
    struct #struct_name;
};
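One thing to watch for: format_ident! panics if the formatted string isn’t a valid identifier, so a label like “Less than 50 k” has to be cleaned up first. Below is a minimal sketch of that kind of clean-up using only the standard library. The helper name sanitize_ident and its rules are assumptions for illustration, not the code from my macro.

```rust
/// Turn an arbitrary label into text that is shaped like a Rust identifier.
/// Hypothetical helper; keywords (e.g. "type") and a lone "_" are NOT handled.
fn sanitize_ident(label: &str) -> String {
    // Keep ASCII letters, digits and underscores; replace everything else.
    let mut out: String = label
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // Identifiers cannot be empty or start with a digit, so prefix those.
    if out.is_empty() || out.starts_with(|c: char| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}
```

The result can then be passed along as format_ident!("{}", sanitize_ident(label)).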
Iterating over multiple Vectors in a quote!
In my macro, I generate a Vec of field names and a Vec of field types. I wanted to iterate through both of them together, something that would usually be done with a zip. But inside a quote! you can’t do stuff like that. It turns out that if you have two vectors and refer to both inside a #( )* section (the special syntax for repetition in quote!), it will iterate over both in parallel:
generated_structs.push(quote! {
    #[derive(GraphQLObject, Copy, Clone, Debug)]
    struct #struct_name {
        #( #field_names: #field_types ),* // <---
    }
});
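To make the pairing concrete, here’s roughly what that repetition corresponds to in plain Rust with zip. This sketch builds a string instead of tokens purely so it runs without any crates; render_struct is a made-up name, and the real macro emits tokens through quote!.

```rust
/// Render a struct definition as text by pairing each name with its type.
/// Plain-string illustration only; not how the actual macro produces output.
fn render_struct(struct_name: &str, field_names: &[&str], field_types: &[&str]) -> String {
    // zip walks both slices in lockstep: names[0] with types[0], and so on.
    let fields: Vec<String> = field_names
        .iter()
        .zip(field_types.iter())
        .map(|(name, ty)| format!("    {}: {},", name, ty))
        .collect();
    format!("struct {} {{\n{}\n}}", struct_name, fields.join("\n"))
}
```

Calling render_struct("Demographics", &["female", "male"], &["Option<i32>", "Option<i32>"]) yields the Demographics struct from the start of the post, minus the derive attribute.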
Casing
For census categories like “Total households”, I needed to generate a Pascal case version for the struct (TotalHouseholds), and a snake case version for the field (total_households). The convert_case crate was extremely useful for that.
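For a feel of what the two conversions do, here’s a small hand-rolled sketch using only the standard library. It is not how convert_case works internally, and the function names are made up. It only splits on non-alphanumeric characters, so unlike a real casing library it won’t break up input that is already camelCase.

```rust
/// Split a label into lowercase words on any non-alphanumeric character.
fn words(label: &str) -> Vec<String> {
    label
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .collect()
}

/// "Total households" -> "total_households"
fn to_snake_case(label: &str) -> String {
    words(label).join("_")
}

/// "Total households" -> "TotalHouseholds"
fn to_pascal_case(label: &str) -> String {
    words(label)
        .iter()
        .map(|w| {
            // Uppercase the first character and keep the rest as-is.
            let mut chars = w.chars();
            match chars.next() {
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}
```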
Using invalid rust as macro input
Macros are really meant to transform valid Rust syntax to other valid Rust syntax. That wasn’t my use case, so I cheated a bit by calling the macro with a single string, and then parsing that:
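let input: Vec<proc_macro2::TokenTree> = item.into_iter().collect();
// The macro expects exactly one string literal as its input.
let literal = match input.first() {
    Some(proc_macro2::TokenTree::Literal(literal)) => literal.to_string(),
    _ => panic!("expected a string literal as macro input"),
};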
The approach works, but it’s a bit strange dealing with a Literal. For example, I had to trim the quote characters off myself, as there didn’t seem to be a way to get the actual string contents.
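The trimming itself can be sketched in a couple of lines. This assumes the literal is an ordinary double-quoted string (not a raw string), and strip_literal_quotes is a name invented for this example. If you’re willing to pull in the syn crate, its LitStr type has a value() method that returns the unescaped contents, which is probably the more robust route.

```rust
/// Strip the surrounding double quotes from a string literal's source text.
/// Returns None if the text is not wrapped in plain double quotes.
/// Escape sequences such as \n are left exactly as written in the source.
fn strip_literal_quotes(literal_text: &str) -> Option<&str> {
    literal_text.strip_prefix('"')?.strip_suffix('"')
}
```

Returning an Option means the caller decides how to handle input that wasn’t a string literal at all.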
Closing thoughts
All in all I’m pretty happy with macro development in Rust, but I wish it didn’t require creating a separate crate, and that there could be some more useful tools for debugging them.