bandarra.me

Rust Markdown Syntax Highlighting: A Practical Guide

You're a Rust developer, and you love Markdown's simplicity and readability. You might use it to write blog posts, documentation, or even as part of an interactive code editor. However, displaying plain code within Markdown can be tough on the eyes. Enter syntax highlighting, a feature that adds color and structure to your code, making it more visually appealing and easier to understand.

This blog post will guide you on combining two powerful Rust libraries – pulldown-cmark and syntect – to seamlessly add syntax highlighting to your Markdown files and output the result as a styled HTML file.

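We'll cover:

  • How pulldown-cmark represents a Markdown document as a stream of events
  • How to highlight a code snippet with syntect
  • How to integrate the two libraries to render Markdown with highlighted code blocks
  • How to keep the rendering pipeline efficient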

Let's get started!

Understanding Markdown Events with pulldown-cmark

You're already familiar with Markdown's simple syntax, but the key to working with it programmatically is understanding how pulldown-cmark represents the parsed content. This library uses events to model the structure of your Markdown document. Think of each event as a signal about what's being encountered while parsing.

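Let's break down the key events you'll be working with:

  • Event::Start(Tag::CodeBlock(kind)): marks the beginning of a code block. For a fenced block, kind is CodeBlockKind::Fenced and carries the text that follows the opening fence, such as rust.
  • Event::Text(text): carries a piece of text. Inside a code block, this is the code itself.
  • Event::End(TagEnd::CodeBlock): marks the end of the code block.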

To illustrate how these events work in identifying code blocks, here's a basic example:

use pulldown_cmark::{Event, Parser, Tag, TagEnd};

fn main() {
    let markdown = r#"
# Hello, World

Here's a code block:

```rust
fn main() {
    println!("Hello, World");
}
```
"#;

    let parser = Parser::new(markdown);

    for event in parser {
        match event {
            Event::Start(Tag::CodeBlock(_)) => {
                println!("Code block start");
            }
            Event::End(TagEnd::CodeBlock) => {
                println!("Code block end");
            }
            Event::Text(t) => {
                println!("Text: {}", t);
            }
            _ => {}
        }
    }
}

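In this example, the loop iterates through the events emitted by pulldown-cmark and reports three kinds: the start of a code block, the end of a code block, and Text events. Note that Text events are emitted for all text, including the heading and the paragraph, so the example prints those as well. To isolate the code, you need to remember whether the parser is currently between a code block's start and end events.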
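One way to pick out only the code is to keep a small piece of state while looping over the events. The following is a minimal sketch of that idea, not pulldown-cmark's API: to stay self-contained it uses a hypothetical SketchEvent enum as a simplified stand-in for the library's Event type, and collect_code_blocks is an illustrative name. It also assumes that the content of one code block may arrive as several Text events, which is why the text is accumulated.

```rust
/// Hypothetical, simplified stand-in for `pulldown_cmark::Event`.
/// It models only the three event kinds discussed above.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchEvent {
    /// Start of a code block, carrying the language named on the fence.
    CodeStart(String),
    /// End of a code block.
    CodeEnd,
    /// A piece of text, inside or outside a code block.
    Text(String),
}

/// Returns one `(language, code)` pair per code block, ignoring all text
/// that appears outside code blocks.
pub fn collect_code_blocks(events: &[SketchEvent]) -> Vec<(String, String)> {
    let mut blocks = Vec::new();
    // `Some` while we are between a CodeStart and its CodeEnd.
    let mut current: Option<(String, String)> = None;

    for event in events {
        match event {
            SketchEvent::CodeStart(lang) => {
                current = Some((lang.clone(), String::new()));
            }
            SketchEvent::Text(text) => {
                // Only accumulate text while inside a code block.
                if let Some((_, body)) = current.as_mut() {
                    body.push_str(text);
                }
            }
            SketchEvent::CodeEnd => {
                if let Some(block) = current.take() {
                    blocks.push(block);
                }
            }
        }
    }
    blocks
}
```

With the real library, the same state would be updated in the Event::Start, Event::Text, and Event::End arms of the match shown earlier.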

Now that you understand these core concepts, you're ready to move on to incorporating syntect for syntax highlighting!

Highlighting Code with syntect

Now that you've learned how to identify code blocks using pulldown-cmark events, let's bring in the powerful syntax highlighting capabilities of syntect. This library makes applying beautiful syntax coloring to your code incredibly straightforward.

What syntect brings to the table

The syntect library shines by providing you with the tools to define and apply custom syntax definitions and color themes. It even leverages Sublime Text's widely popular syntax definitions, enabling you to instantly support a plethora of programming languages.

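Here's a breakdown of what syntect offers:

  • SyntaxSet: a collection of syntax definitions, with a bundled default set that covers many languages.
  • ThemeSet: a collection of color themes, with a bundled default set that includes base16-ocean.dark.
  • HTML helpers such as highlighted_html_for_string, which turn a piece of code into HTML with inline styles.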

Getting Started with syntect

Here's a quick demonstration on how to apply syntax highlighting using syntect:

use syntect::{highlighting::ThemeSet, html::highlighted_html_for_string, parsing::SyntaxSet};

fn main() {
    let code = r#"
fn main() {
    println!("Hello, World");
}
    "#;
    let syntax_set = SyntaxSet::load_defaults_newlines();
    let syntax_reference = syntax_set.find_syntax_by_token("rust").unwrap();
    let theme = ThemeSet::load_defaults().themes["base16-ocean.dark"].clone();
    let html = highlighted_html_for_string(code, &syntax_set, syntax_reference, &theme).unwrap();
    println!("{}", html);
}

In this snippet:

  1. SyntaxSet::load_defaults_newlines() loads the default set of syntax definitions, including definitions for Rust, JavaScript, Python, and many other languages.
  2. syntax_set.find_syntax_by_token("rust") retrieves the specific syntax definition for Rust, which is later used to highlight the code.
  3. ThemeSet::load_defaults().themes["base16-ocean.dark"].clone() accesses the base16-ocean.dark theme from the default set of themes, offering a clean and modern dark theme.
  4. highlighted_html_for_string() is the main function responsible for applying highlighting. It takes the code, the syntax set, the syntax definition for the chosen language, and the theme, in that order, and generates a syntax highlighted HTML snippet. It returns a Result, which the example unwraps.
  5. The generated html string is then printed to the console.

Next, let's combine it with pulldown-cmark!

Integrating pulldown-cmark and syntect for Syntax Highlighting

Now you're ready to combine the power of pulldown-cmark and syntect to bring syntax highlighting to your Markdown content. This section walks you through the process, step by step, with code examples to guide you.

Let's start by outlining the key steps:

  1. Parse Markdown with pulldown-cmark: Use pulldown-cmark's event iterator to extract the relevant data from your Markdown content.
  2. Identify Code Blocks: Specifically look for Event::Start(Tag::CodeBlock(_)) events to pinpoint code sections.
  3. Apply Syntax Highlighting with syntect: For each code block:
    • Determine the language used (e.g., "rust").
    • Use syntect to apply the appropriate syntax highlighting.
    • Replace the code block content with syntax highlighted HTML.
  4. Render the Final HTML Output: Stitch the highlighted code blocks back into the pulldown-cmark event stream. Finally, use pulldown_cmark::html::push_html to generate the HTML representation of your Markdown.

Here's how you can implement these steps within a function named markdown_to_html:

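use std::sync::LazyLock;

use pulldown_cmark::{CodeBlockKind, Event, Parser, Tag, TagEnd};
use syntect::{
    highlighting::{Theme, ThemeSet},
    html::highlighted_html_for_string,
    parsing::SyntaxSet,
};

pub fn markdown_to_html(markdown: &str) -> String {
    // Load the syntax definitions and the theme once, on first use.
    static SYNTAX_SET: LazyLock<SyntaxSet> = LazyLock::new(SyntaxSet::load_defaults_newlines);
    static THEME: LazyLock<Theme> = LazyLock::new(|| {
        let theme_set = ThemeSet::load_defaults();
        theme_set.themes["base16-ocean.dark"].clone()
    });

    // State for the code block being processed: its syntax definition,
    // its accumulated text, and whether we are inside a code block.
    let mut sr = SYNTAX_SET.find_syntax_plain_text();
    let mut code = String::new();
    let mut code_block = false;
    let parser = Parser::new(markdown).filter_map(|event| match event {
        Event::Start(Tag::CodeBlock(kind)) => {
            // Fenced blocks may name a language; indented blocks never do.
            // Unknown or missing languages fall back to plain text.
            sr = match &kind {
                CodeBlockKind::Fenced(lang) => SYNTAX_SET.find_syntax_by_token(lang.trim()),
                CodeBlockKind::Indented => None,
            }
            .unwrap_or_else(|| SYNTAX_SET.find_syntax_plain_text());
            code_block = true;
            None
        }
        Event::End(TagEnd::CodeBlock) => {
            // Replace the whole block with one HTML event. If highlighting
            // fails, fall back to the raw code (note: not HTML-escaped).
            let html = highlighted_html_for_string(&code, &SYNTAX_SET, sr, &THEME)
                .unwrap_or_else(|_| code.clone());
            code.clear();
            code_block = false;
            Some(Event::Html(html.into()))
        }
        Event::Text(t) => {
            if code_block {
                // Accumulate the code and drop the original event.
                code.push_str(&t);
                return None;
            }
            Some(Event::Text(t))
        }
        // Everything else passes through unchanged.
        _ => Some(event),
    });
    let mut html_output = String::new();
    pulldown_cmark::html::push_html(&mut html_output, parser);
    html_output
}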

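Let's examine this code:

  • SYNTAX_SET and THEME are static LazyLock values, so the syntax definitions and the theme are loaded once, on first use.
  • sr, code, and code_block hold the state for the code block being processed: its syntax definition, its accumulated text, and a flag that records whether the parser is currently inside a code block.
  • When a fenced code block starts, the closure looks up the syntax for the language named on the fence, falls back to plain text if the language is unknown, and returns None so that the original event is dropped.
  • Text events inside a code block are appended to code and dropped. Text outside code blocks passes through unchanged.
  • When the code block ends, the accumulated code is highlighted and emitted as a single Event::Html, which takes the place of the whole block.
  • All other events pass through untouched, and push_html renders the resulting stream.

This example shows the essentials of using pulldown-cmark and syntect together. The core idea is to filter the event stream: the events that make up a code block are dropped and replaced with a single event that carries the highlighted HTML.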
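One detail worth guarding against: the text after the opening fence (the info string) can contain more than a language name, for example rust,ignore in rustdoc-style snippets. Passing such a string unchanged to find_syntax_by_token would likely not match any syntax. The sketch below shows one possible way to extract the first token before the lookup. It is an assumption about how you might want to treat info strings: fence_language is a hypothetical helper name, and splitting on commas as well as whitespace is a convention borrowed from rustdoc, not something either library does for you.

```rust
/// Hypothetical helper: returns the first token of a fenced code block's
/// info string, or `None` if the info string is empty.
///
/// Tokens are separated by whitespace or commas (an assumption; adjust the
/// separators to the conventions used in your own Markdown files).
pub fn fence_language(info: &str) -> Option<&str> {
    info.split(|c: char| c.is_whitespace() || c == ',')
        .find(|part| !part.is_empty())
}
```

In markdown_to_html, the returned token could be passed to find_syntax_by_token, with None treated the same way as an unknown language.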

From here, you can adapt the same pattern to build tools and applications for your own use cases.
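One caveat before moving on: when highlighting fails, markdown_to_html falls back to the raw code string and still wraps it in Event::Html, so characters such as < and & would reach the page unescaped. A minimal sketch of a safer fallback, using only the standard library, might look like this. Both escape_html and plain_code_html are hypothetical helper names introduced here for illustration; they are not part of pulldown-cmark or syntect.

```rust
/// Hypothetical helper: escapes the characters that have a special
/// meaning in HTML text and attribute values.
pub fn escape_html(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical fallback: renders unhighlighted code as an escaped
/// `<pre><code>` block.
pub fn plain_code_html(code: &str) -> String {
    format!("<pre><code>{}</code></pre>\n", escape_html(code))
}
```

The fallback branch could then call plain_code_html with the accumulated code instead of cloning the raw string.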

Optimization and Performance Best Practices

You've now got a good understanding of how to use pulldown-cmark and syntect for syntax highlighting. However, for real-world use cases, you'll likely want to optimize the process for speed and efficiency, particularly when dealing with large Markdown files. Here are some essential best practices to keep in mind:

Optimizing Syntax Set and Theme Loading

Loading the syntax sets and themes is a relatively expensive operation, so it should happen only once rather than on every call. You can use LazyLock (std::sync::LazyLock, stable since Rust 1.80) to initialize these resources on first access and reuse them afterwards:

static SYNTAX_SET: LazyLock<SyntaxSet> = LazyLock::new(SyntaxSet::load_defaults_newlines);
static THEME: LazyLock<Theme> = LazyLock::new(|| {
    let theme_set = ThemeSet::load_defaults();
    theme_set.themes["base16-ocean.dark"].clone()
});

This way, SYNTAX_SET and THEME are loaded only once, the first time they are used, and every later call reuses the same values instead of paying the loading cost again.
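To see the once-only behavior in isolation, here is a small standard-library sketch. It does not use syntect: LANGUAGES is a hypothetical stand-in for an expensive resource such as a SyntaxSet, and LOAD_COUNT exists only to make the number of initializer runs observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how often the initializer below runs (for demonstration only).
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive resource such as a `SyntaxSet`.
// The closure runs on first access, not at program start.
static LANGUAGES: LazyLock<Vec<&'static str>> = LazyLock::new(|| {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    vec!["rust", "python", "javascript"]
});

/// Looks up a language token; the first call triggers the initializer.
pub fn is_known_language(token: &str) -> bool {
    LANGUAGES.iter().any(|name| *name == token)
}

/// Reports how many times the initializer has run so far.
pub fn load_count() -> usize {
    LOAD_COUNT.load(Ordering::SeqCst)
}
```

No matter how many times is_known_language is called, the initializer runs a single time, which is the property the SYNTAX_SET and THEME statics rely on.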

Efficient Event Processing Techniques

A naïve approach to handling the events is to collect() the pulldown-cmark event iterator into a Vec of Events, modify the vector, and then render it. This keeps every event of the document in memory at once and adds extra passes over the data, which becomes noticeable for larger Markdown files.

The markdown_to_html function avoids this by adapting the iterator with filter_map, so events are processed as they are parsed. Here is the skeleton of that approach:

// ...
    let parser = Parser::new(markdown).filter_map(|event| { 
        match event {
            Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(lang))) => {
                // ... Handle start of a code block.
            }
            Event::End(TagEnd::CodeBlock) => {
                // ... Handle the end of a code block.
            }
            Event::Text(t) => {
                // ... Handle Text within a code block
            }
            _ => Some(event), // Return other events to continue the processing 
        }
    });

    // This uses a `filter_map`, and the `match` inside creates the output based on the events.
    let mut html_output = String::new();
    pulldown_cmark::html::push_html(&mut html_output, parser);
    // ...

In this snippet, filter_map filters and maps the events in a single pass. The pulldown_cmark::html::push_html function pulls events through the adaptor one at a time, so the closure runs on the fly and only the events that belong to code blocks are modified.
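The lazy, streaming nature of this pattern can be sketched without either library. The example below is an illustration under stated assumptions: StreamEvent is a hypothetical, simplified stand-in for pulldown-cmark's Event type, replace_code_blocks is an illustrative name, and wrapping the code in a pre element is only a placeholder for the real highlighting step.

```rust
/// Hypothetical, simplified stand-in for `pulldown_cmark::Event`.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    CodeStart,
    CodeEnd,
    Text(String),
    Html(String),
}

/// Wraps an event iterator so that each code block is replaced by a single
/// `Html` event. Nothing is collected into a `Vec`; events are pulled from
/// `events` only when the caller asks for the next item.
pub fn replace_code_blocks<I>(events: I) -> impl Iterator<Item = StreamEvent>
where
    I: Iterator<Item = StreamEvent>,
{
    // The state lives inside the closure and travels with the iterator.
    let mut code = String::new();
    let mut in_code = false;

    events.filter_map(move |event| match event {
        StreamEvent::CodeStart => {
            in_code = true;
            None
        }
        StreamEvent::CodeEnd => {
            in_code = false;
            // Placeholder for the real highlighting step.
            let html = format!("<pre>{}</pre>", code);
            code.clear();
            Some(StreamEvent::Html(html))
        }
        StreamEvent::Text(text) if in_code => {
            code.push_str(&text);
            None
        }
        // Everything else passes through unchanged.
        other => Some(other),
    })
}
```

Calling replace_code_blocks does no work by itself. Each call to next() on the returned iterator consumes just enough input events to produce one output event, which mirrors how push_html drives the filter_map adaptor.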

Summary of Optimizations

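By embracing these optimizations, you can improve the performance of your syntax highlighting code while reducing its memory consumption:

  • Load the syntax set and the theme once, using LazyLock statics, instead of on every call.
  • Process events lazily with filter_map instead of collecting them into a Vec first.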

Conclusion: Elevating Markdown Rendering with Syntax Highlighting

Combining the power of pulldown-cmark and syntect allows you to unlock a whole new level of polish and functionality when working with Markdown files in your Rust projects. This approach transforms Markdown rendering into something truly delightful, enhancing your ability to produce visually engaging and easy-to-read content for blogs, documentation, and code editors.

Imagine generating your documentation with beautifully highlighted code, creating blog posts with captivating syntax highlighting, or empowering your interactive code editor with the elegance of colored code – this dynamic duo empowers you to achieve all this and more.

By mastering these libraries, you streamline the process of creating Markdown-based content and make it easier to read. You can focus on writing clear, structured content, knowing that your code will be presented with the style it deserves.

Take the time to experiment with these powerful tools, explore different themes, languages, and use cases. As you become comfortable with the capabilities of pulldown-cmark and syntect, you'll discover new ways to create compelling and engaging content with Markdown.

Note: This post was created with the assistance of AI. While a human carefully reviewed and edited the content, it's important to remember that AI tools may introduce errors or biases. If you have any concerns or questions, please feel free to reach out.