Nov 19, 2024 · 7 min read

Convert Markdown to PDF using Pandoc

This tutorial offers a comprehensive guide on converting Markdown to PDF using Pandoc. It includes essential installation instructions, prerequisites, fundamental commands, and advanced customization techniques to produce high-quality, professional documents efficiently.

Pandoc, a powerful command-line tool, simplifies the process of converting Markdown files to PDF, enabling users to create professional-quality documents with ease.

In this tutorial, I’ll show you how to convert Markdown files to PDF using Pandoc. It covers installation, prerequisites, basic commands, and advanced customization options to help you create professional documents efficiently.

Installing Pandoc

To install Pandoc, download it from the official Pandoc installation page. Follow the instructions specific to your operating system (Windows, macOS, or Linux) and verify the installation by running:

pandoc --version

Prerequisites for PDF Conversion

For PDF output, you’ll need a LaTeX distribution. Popular options include:

TeX Live (Linux/Windows)
MacTeX (macOS)
MiKTeX (Windows)

Install a distribution suitable for your platform and verify it with:

pdflatex --version

This ensures Pandoc can generate PDFs without issues.

Converting Markdown to PDF using Pandoc

Pandoc simplifies document conversion with an intuitive command-line syntax. For example, to convert a Markdown file to a PDF:

pandoc input.md -o output.pdf

In this command:

input.md specifies the source file in Markdown format.
-o output.pdf defines the output filename and format. Pandoc determines the output format from the file extension (.pdf in this case).

Pandoc supports additional options for customization. For instance, you can explicitly specify the input and output formats, although it’s often unnecessary:

pandoc -f markdown -t pdf input.md -o output.pdf

You can also convert multiple Markdown files into a single PDF:

pandoc chapter1.md chapter2.md chapter3.md -o complete_document.pdf

By default, Pandoc uses system settings or defaults for formatting the PDF. However, for advanced customizations (like fonts or layouts), you can include metadata or reference templates during conversion. For example:

pandoc input.md -o output.pdf --metadata title="Document Title"

This makes Pandoc a flexible tool for converting Markdown files to professional-grade PDFs efficiently.

Customizing Output while Converting Markdown to PDF

Pandoc allows extensive customization of the output, making it easy to create personalized, professional documents.

Using Templates for Formatting

Templates control the structure and styling of the output. For example, when converting Markdown to PDF, you can use a custom LaTeX template to define margins, fonts, and headers:

pandoc input.md -o output.pdf --template=custom_template.tex

Specifying Metadata

Metadata allows you to add details like the document’s title, author, and date directly during conversion. These can be passed as command-line options:

pandoc input.md -o output.pdf --metadata title="My Document" --metadata author="John Doe" --metadata date="2024-11-19"

Alternatively, you can define metadata in the Markdown file itself using a YAML block:

---
title: My Document
author: John Doe
date: 2024-11-19
---

This metadata will be reflected in the final output format, such as the title page in a PDF.

Adjusting Styles with CSS or YAML

For HTML output, you can include custom CSS to define styles:

pandoc input.md -o output.html --css=styles.css

In Markdown, you can also define additional styles and settings using YAML metadata, such as custom fonts or colors:

---
css: "styles.css"
highlight-style: pygments
---

This flexibility ensures your documents meet specific design requirements across all supported output formats.

Handling Images in Markdown while Converting to PDF

Pandoc provides robust support for embedding and customizing images in documents during conversion.

Embedding Images in Markdown:

Images in Markdown are added using the following syntax:

![Alt text](path/to/image.jpg "Optional title")

Alt text: Describes the image for accessibility or as fallback text.
Path: Specifies the file location, which can be relative (images/photo.jpg) or absolute (/home/user/images/photo.jpg).
Title: An optional tooltip displayed when hovering over the image.

For example:

![A scenic view](images/scenic.jpg "Mountain Landscape")

Adjusting Image Size and Alignment:

Pandoc allows size and alignment customization using Markdown or HTML-like syntax:

To set the image size in LaTeX-based outputs like PDF:
```
![Alt text](path/to/image.jpg){width=50%}
```
This scales the image to 50% of the page width. You can also specify exact dimensions:
```
![Alt text](path/to/image.jpg){width=200px height=150px}
```
For alignment, Pandoc interprets raw HTML or LaTeX depending on the output format:
```
<img src="path/to/image.jpg" style="float: right; width: 50%;">
```
This aligns the image to the right in HTML output. For PDFs, LaTeX templates can achieve similar results:
```
\includegraphics[width=0.5\textwidth, align=c]{path/to/image.jpg}
```

Ensuring Proper File Paths

During conversion, image file paths must be accessible relative to the working directory or explicitly defined.

Relative paths: Work when images are in the same directory as the Markdown file or a subdirectory. For example:
```
![Example](images/example.jpg)
```
If converting from a different directory, ensure paths are adjusted or use the --resource-path option to set search paths:
```
pandoc input.md -o output.pdf --resource-path=./images
```
Absolute paths: Ensure that the specified paths are correct for the system running the conversion.

To include remote images hosted online, provide a valid URL:

![Remote Image](https://example.com/images/photo.jpg)

By combining these features, you can ensure images are displayed correctly and formatted to fit the style and structure of your output documents.

Handling Tables in Markdown While converting to PDF

Pandoc supports creating and converting tables from Markdown to various formats, including PDF, Word, and HTML, while preserving table structure and formatting.

Creating Tables in Markdown

Markdown supports simple tables using pipes (|) and dashes (-):

| Header 1   | Header 2   | Header 3   |
|------------|------------|------------|
| Row 1 Col 1| Row 1 Col 2| Row 1 Col 3|
| Row 2 Col 1| Row 2 Col 2| Row 2 Col 3|

This creates a basic table with headers and rows. For more complex tables, Pandoc’s grid tables or multiline tables are ideal:

+------------+------------+------------+
| Header 1   | Header 2   | Header 3   |
+============+============+============+
| Row 1 Col 1| Row 1 Col 2| Row 1 Col 3|
+------------+------------+------------+
| Row 2 Col 1| Row 2 Col 2| Row 2 Col 3|
+------------+------------+------------+

Formatting Tables for Different Outputs

PDF Output:
When converting to PDF, Pandoc uses LaTeX for rendering tables. It automatically adjusts column widths and aligns text, but you can use custom LaTeX templates or options for more control:
```
| Product  | Price ($) | Availability |
|----------|-----------|--------------|
| Item A   | 10.00     | In Stock     |
| Item B   | 15.50     | Out of Stock|
```
Include this in a Markdown file and convert it:
```
pandoc input.md -o output.pdf
```
Use the --table-width option for complex tables that require column adjustments.

Handling Large or Complex Tables:

For tables with many rows or columns, Markdown’s grid table syntax can become unwieldy. In such cases:

Use CSV files as input and convert them to tables during conversion. Pandoc can process CSV with table filters or pre-process it into Markdown tables.
Break large tables into sections to maintain readability, especially in PDF or Word outputs.
Add captions and labels for better context and indexing:
```
: Table 1: Product Pricing
```

By leveraging Markdown’s flexibility and Pandoc’s support for table formatting, you can seamlessly integrate tables into professional documents.

Advanced Features

Pandoc’s advanced capabilities enable users to customize transformations, automate workflows, and combine multiple documents into cohesive outputs.

Using Filters for Custom Transformations:

Filters allow custom processing of documents during conversion. These scripts modify the intermediate representation (AST) Pandoc uses, enabling highly specific changes.

Filters can be written in languages like Python or Lua. For example, to capitalize all headers in a Markdown file, you can use a Lua filter:
```
function Header(el)
  el.content[1].text = string.upper(el.content[1].text)
  return el
end
```
Apply the filter during conversion:
```
pandoc input.md -o output.pdf --lua-filter=capitalize_headers.lua
```
Filters can also be used for advanced tasks like adding custom metadata, transforming specific elements, or injecting custom code snippets.

Scripting Workflows with Pandoc in Automation:

Pandoc integrates seamlessly with scripts to automate repetitive tasks, such as batch conversions or multi-format outputs.

For example, a shell script to convert all Markdown files in a folder to PDFs:

#!/bin/bash
for file in *.md; do
  pandoc "$file" -o "${file%.md}.pdf"
done

This script iterates through all .md files and generates corresponding PDFs. You can extend it to include options like templates, metadata, or filters.

For more complex workflows, tools like Makefiles or CI/CD pipelines can orchestrate Pandoc tasks to automatically generate documents during development or deployment processes.

Troubleshooting

Missing Dependencies: Errors like “pdflatex not found” occur when required tools (e.g., LaTeX for PDF output) are not installed. Install the necessary dependencies and verify using commands like pdflatex --version or pandoc --version.
Incorrect File Paths: Ensure all input files, images, and templates have correct paths. Use relative paths for portability or the --resource-path option to set directories:
```
pandoc input.md -o output.pdf --resource-path=./images
```

Debugging with Verbose Mode:

The --verbose flag provides detailed output during conversion, helping identify issues:

pandoc --verbose input.md -o output.pdf

Use this to spot errors in file paths, metadata, or dependencies.

Conclusion

Pandoc is a versatile tool for converting and customizing documents across a wide range of formats. Its flexibility makes it suitable for simple tasks like Markdown-to-PDF conversion and advanced workflows involving templates, filters, and automation.

Explore its advanced features to unlock greater potential, such as custom scripting, complex formatting, and large-scale document processing. Join the Pandoc community to contribute, share use cases, and access support.

For further learning, consult the official Pandoc documentation, the GitHub repository, and user forums for tips and examples.