Feb 25, 2023

Exploring generating PDF files from HTML in ASP.NET Core

Author: Mike Brind

Back in 2008, I wrote a series of articles about using iTextSharp to generate PDF files in an ASP.NET application. I still use iTextSharp in a large MVC 5 application that I'm in the process of migrating to ASP.NET Core. The version I use is very old (4.1.6), and the API is very low level so it takes quite a while to write the code required to generate even a moderately complex PDF. Ideally I need a replacement for the new application that can generate PDF files purely from HTML, which is an API I'm much more comfortable with. This ancient version of iTextSharp doesn't support HTML as a source of content. In this article, I consider some alternatives.

For this exercise, I only want to generate a PDF. No editing, reading or password-protecting of PDFs is required. The PDF content is a report consisting of a table of data from a database. The design makes use of Bootstrap 5 CSS and icons. I also want to use web fonts (Open Sans from Google Fonts) within the PDF. Here's a screenshot of the web version of the report. The table uses the table-striped CSS class to apply alternating backgrounds to table rows. Discontinued items are displayed using the text-black-50 class from Bootstrap 5. It also uses Bootstrap icons to indicate whether items need to be reordered. The colour of the icon in these instances is controlled by the text-danger CSS class. The header and the logo are placed in a flex container and positioned using the justify-content-between CSS class from Bootstrap. You can see the source code on GitHub if you are interested.

Reorder Report

I decided to have a look at three different options: iText 7 - an up-to-date replacement for the iTextSharp library that I'm familiar with; DinkToPdf - a free open source project that does well in searches relating to PDF from HTML in ASP.NET Core; and ChromeHtmlToPdf - another free open source option.

Each option can generate a PDF file from a string of HTML (as well as from other sources such as files, streams and URLs, depending on the library). I'm generating my HTML by rendering Razor partials to a string using the technique I blogged about previously. The content of the partial is essentially a complete HTML5 file. It includes references to local CSS assets using relative paths:

<link href="/lib/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css" />
<link href="/css/bootstrap-icons.css" rel="stylesheet" type="text/css" />
<link href="/css/pdf.css" rel="stylesheet" type="text/css" />

The first two references bring in Bootstrap 5 and Bootstrap icons, while the final one brings in some rules that set the font to Open Sans and ensure that tables are broken nicely over multiple pages.

iText

I should start by mentioning that, like the other two options, iText is also open source software. However, unlike the other two options, it is not free for commercial use. I have no idea what the cost of a commercial licence is. Their site requires that you fill out a form and have a sales person contact you to "establish the best licencing model for you". Note that you can use iText free of charge under the AGPL licence.

I'll show the full code for the PageModel class for the Razor page that produces the PDF generated by iText. It includes the code for getting the data for the partial and for rendering it to a string. The services for both of these tasks (IProductManager and IRazorTemplateRenderer) are registered with the service container as scoped services (because they both depend on other scoped services) and injected into the constructor. The PageModel also includes a string property called BaseHref which consists of the current request's Scheme and Host properties, resulting in an absolute URL. This is important for iText.

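using iText.Html2pdf;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using System.Net.Mime;

public class ITextVersionModel : PageModel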
{
    private readonly IProductManager productManager;
    private readonly IRazorTemplateRenderer renderer;
 
    public ITextVersionModel(IProductManager productManager, IRazorTemplateRenderer renderer)
    {
        this.productManager = productManager;
        this.renderer = renderer;
    }
    string BaseHref => $"{HttpContext.Request.Scheme}://{HttpContext.Request.Host}";
    public List<Product> Products { get; set; } = new();
    public async Task<FileResult> OnGetReportFromPartialAsync()
    {
        Products = await productManager.GetProducts();
        var html = await renderer.RenderPartialToStringAsync("_ProductReport-v3", this);
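        // Without a base URI, iText cannot resolve the relative CSS and image URLs in the partial.
        ConverterProperties converterProperties = new();
        converterProperties.SetBaseUri(BaseHref);
        using var stream = new MemoryStream();
        // Write the PDF generated from the HTML string into the in-memory stream.
        HtmlConverter.ConvertToPdf(html, stream, converterProperties);
        return File(stream.ToArray(), MediaTypeNames.Application.Pdf, "Reorder Report (iText).pdf");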
    }
}
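
The BaseHref property matters because the links in the partial are root-relative. As a rough illustration of what resolving them against a base URI means, here's a minimal sketch that uses the framework's System.Uri class rather than anything from iText. The BaseUriDemo class and its method names are hypothetical, and the host and port are made up for the example. I'm not claiming this is how iText resolves resources internally, only that this is the standard outcome of combining a base URI with a relative reference.

```csharp
using System;

// Hypothetical helper for illustration only; not part of iText or the article's source code.
public static class BaseUriDemo
{
    // Mirrors the BaseHref property: scheme plus host (including any port).
    public static string BuildBaseHref(string scheme, string host) =>
        $"{scheme}://{host}";

    // Resolves a reference found in the HTML (such as a link href)
    // against the base URI and returns the resulting absolute URL.
    public static string Resolve(string baseHref, string reference) =>
        new Uri(new Uri(baseHref), reference).AbsoluteUri;
}
```

With a base of https://localhost:5001, calling BaseUriDemo.Resolve with "/css/pdf.css" returns https://localhost:5001/css/pdf.css. Without a base, a converter that only has a string of HTML has nothing to combine "/css/pdf.css" with.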

The primary method for generating a PDF from HTML in iText is the HtmlConverter.ConvertToPdf method. This overload takes a string, a stream for the output and a ConverterProperties object that holds options for the converter. You can tell this library was written by Java developers. They love to provide methods for setting property values, whereas .NET developers are more likely to let you set the value via a public property. Otherwise, the API for generating an A4 portrait PDF (the default document size and orientation) is straightforward.

We use the ConverterProperties object to set the BaseUri, without which iText is unable to resolve the relative URLs in the CSS references in the partial, resulting in no styling being applied to the final PDF. I only discovered this from a Stack Overflow post. The one example on the iText site that demonstrates generating a PDF from HTML fails to mention it.

Let's take a look at what is actually generated. The resulting file size is 43KB:

The image, CSS and icons were all located and applied, but there are some shortcomings. iText claims to support flex, but clearly it had trouble with the Bootstrap justify-content-between class. The header and the logo should have been flush with the start and the end of the container respectively. In addition, iText has not applied the text-danger class to the icons that appear in the Reorder column. In Bootstrap 5, text-danger uses CSS custom properties:

.text-danger {
    --bs-text-opacity: 1;
    color: rgba(var(--bs-danger-rgb),var(--bs-text-opacity))!important;
}

It appears that iText does not support these yet, which is further evidenced by the fact that the striped effect has not been applied to the table. This also relies on custom properties in Bootstrap 5.

iText Pros

  • Fully supported (with a commercial licence).
  • In active development.
  • Includes support for advanced PDF features including editing, reading, forms, security.
  • Product developers seem reasonably active on Stack Overflow.
  • No third party dependencies.
  • Reasonably straightforward API for simple requirements.

iText Cons

  • Only free under the AGPL Licence terms.
  • No indication of the cost of a commercial licence.
  • Only partial support for more modern CSS features.
  • Website doesn't contain much by way of tutorials or guides.

DinkToPdf

As I mentioned, DinkToPdf comes up in searches for generating PDFs in ASP.NET, which is why I looked at it. The first thing to mention, however, is that there has been no new release since April 2017 and GitHub issues go unanswered, so this looks like a dead project. When you dig further, you find that it depends on the wkhtmltopdf library, which is also a dead project. This in turn depends on QtWebKit, which has had no updates since 2012. Nevertheless, it is available, free and relatively simple to use for generating PDFs from HTML. However, it requires that you manually deploy the wkhtmltopdf native library (the Windows DLL is about 40MB) as part of your application. The NuGet installation process does not do this for you, although a forked version of this project, Haukcode.WkHtmlToPdfDotNet, does.

In a web application, the recommended way to generate PDFs using DinkToPdf is to use the thread-safe SynchronizedConverter. This is best registered as a singleton service:

builder.Services.AddSingleton<IConverter>(provider => new SynchronizedConverter(new PdfTools()));

I inject this into a service that uses the converter to return a byte array:

public class PdfGenerator : IPdfGenerator
{
    private readonly IConverter converter;
    public PdfGenerator(IConverter converter) => this.converter = converter;
    public byte[] Render(GlobalSettings globalSettings, ObjectSettings objectSettings) =>
        converter.Convert(new HtmlToPdfDocument() { GlobalSettings = globalSettings, Objects = { objectSettings } });
}

This service is also registered as a singleton:

builder.Services.AddSingleton<IPdfGenerator, PdfGenerator>();

The PageModel that uses DinkToPdf to generate the PDF is shown here. This time, along with the services required to get the data, render the partial to a string and generate the PDF, I've injected the IWebHostEnvironment service. This is required so I can obtain the value of the WebRootPath property, which is the absolute file path to the wwwroot folder. Whereas iText requires you to explicitly set the base URL for static assets, I could only get DinkToPdf to pick up the CSS and images if I provided a full file path instead of a URL. We'll see the changes needed to the partial to accommodate this in a minute. The converter wants some global settings (page size, orientation etc.) and some object settings (the content).

public class DinkToPdfVersionModel : PageModel
{
    private readonly IProductManager productManager;
    private readonly IWebHostEnvironment environment;
    private readonly IRazorTemplateRenderer renderer;
    private readonly IPdfGenerator pdfGenerator;
 
    public DinkToPdfVersionModel(
        IProductManager productManager,
        IWebHostEnvironment environment,
        IRazorTemplateRenderer renderer,
        IPdfGenerator pdfGenerator)
    {
        this.productManager = productManager;
        this.environment = environment;
        this.renderer = renderer;
        this.pdfGenerator = pdfGenerator;
    }
 
    public List<Product> Products { get; set; } = new();
    public string WebRootPath => environment.WebRootPath;
    public async Task<FileResult> OnGetReportFromPartialAsync()
    {
        Products = await productManager.GetProducts();
        var html = await renderer.RenderPartialToStringAsync("_ProductReport-dink", this);
        var globalSettings = new GlobalSettings
        {
            Orientation = Orientation.Portrait,
            PaperSize = PaperKind.A4,
        };
        var objectSettings = new ObjectSettings()
        {
            HtmlContent = html,
        };
        return File(pdfGenerator.Render(globalSettings, objectSettings), MediaTypeNames.Application.Pdf, "Reorder Report (DinkToPDF).pdf");
    }
}

Here are the links in the partial for the CSS files. The image src attribute also uses a file path:

<link href="@System.IO.Path.Combine(Model.WebRootPath,"lib\\bootstrap\\dist\\css\\bootstrap.min.css")" rel="stylesheet" type="text/css" />
<link href="@System.IO.Path.Combine(Model.WebRootPath,"css\\bootstrap-icons.css")" rel="stylesheet" type="text/css" />
<link href="@System.IO.Path.Combine(Model.WebRootPath,"css\\pdf.css")" rel="stylesheet" type="text/css" />
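
The hard-coded backslashes in those links are fine on Windows, which is where I'm running this. If the application might also run on Linux, one option is to pass each path segment to Path.Combine separately so that the framework chooses the separator. Here's a minimal sketch of that idea. The AssetPaths class and its ForAsset method are hypothetical names for illustration and are not part of the article's source code or of DinkToPdf, and I haven't tested how DinkToPdf behaves on other platforms.

```csharp
using System.IO;

// Hypothetical helper for illustration only.
public static class AssetPaths
{
    // Joins the web root with the individual segments of an asset path.
    // Passing the segments separately lets Path.Combine insert the
    // directory separator that suits the current operating system.
    public static string ForAsset(string webRootPath, params string[] segments)
    {
        var path = webRootPath;
        foreach (var segment in segments)
        {
            path = Path.Combine(path, segment);
        }
        return path;
    }
}
```

In the partial, a link could then use something like AssetPaths.ForAsset(Model.WebRootPath, "css", "pdf.css") in place of the string containing escaped backslashes.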

Here's the rendered result which came in at 29KB, a 33% decrease on the iText version:

Unsurprisingly, given that this library's rendering engine is 10 years old, flex is not supported at all. Nor are custom properties. I found that a reasonable result can be obtained by downgrading to Bootstrap 3 and using older ways to control position. If you have spent much time downgrading your HTML to accommodate the desktop version of Outlook for mailers, this is a small price to pay.

DinkToPdf Pros

  • Free and always will be.
  • Smaller final file than iText.
  • All you need can be deployed with your web application.
  • Easy API.

DinkToPdf Cons

  • Dead project, so no support or new features.
  • No support for modern CSS.
  • Minimal documentation.
  • No support for advanced PDF features such as reading, editing, securing, forms.

ChromeHtmlToPdf

The ChromeHtmlToPdf library makes use of Chrome headless, basically the Chrome browser without a UI. This means that the Chrome browser needs to be installed on the server and your application requires access to it. Assuming that you can resolve these requirements, here's the PageModel code for the Chrome version:

public class ChromeVersionModel : PageModel
{
    private readonly IProductManager productManager;
    private readonly IRazorTemplateRenderer renderer;
 
    public ChromeVersionModel(IProductManager productManager, IRazorTemplateRenderer renderer)
    {
        this.productManager = productManager;
        this.renderer = renderer;
    }
    public List<Product> Products { get; set; } = new();
    public string BaseHref => $"{HttpContext.Request.Scheme}://{HttpContext.Request.Host}";
    public async Task<FileResult> OnGetReportFromPartialAsync()
    {
        Products = await productManager.GetProducts();
        var html = await renderer.RenderPartialToStringAsync("_ProductReport-chrome", this);
        var pageSettings = new PageSettings(ChromeHtmlToPdfLib.Enums.PaperFormat.A4);
        using var stream = new MemoryStream();
        using var converter = new Converter();
        converter.ConvertToPdf(html, stream, pageSettings);
        return File(stream.ToArray(), MediaTypeNames.Application.Pdf, "Reorder Report (Chrome).pdf");
    }
}

Very similar stuff to the other examples. As with the iText version, a base URL is required, only this time, it needs to be set in the partial file itself. In addition we need to import the fonts and icons explicitly because they don't seem to resolve when the imports are placed in CSS files:

<base href="@Model.BaseHref" />
<style>
    @@import url('https://fonts.googleapis.com/css2?family=Open+Sans:wght@400&display=swap');
    @@import url("https://cdn.jsdelivr.net/npm/bootstrap-icons@1.10.2/font/bootstrap-icons.css");
</style>

We create an instance of the Converter with a using declaration so that it is disposed when the handler returns. We pass the HTML, a stream for the output, and a PageSettings object containing basic PDF options to its ConvertToPdf method. The resulting PDF looks like this:

As you can see, this is the closest we get to the web version. However, the resulting file size comes in at a whopping 333KB. That's roughly 7.7 times the size of the iText file and 11.5 times the size of the DinkToPdf file.
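
The size comparisons can be recomputed from the three file sizes reported in this article (43KB, 29KB and 333KB). Here's a small sketch of that arithmetic. The SizeComparison class and its member names are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SizeComparison
{
    // File sizes reported in the article, in kilobytes.
    public const double ITextKb = 43;
    public const double DinkToPdfKb = 29;
    public const double ChromeKb = 333;

    // How many times the size of "baseline" the given size is, to one decimal place.
    public static double TimesTheSize(double size, double baseline) =>
        Math.Round(size / baseline, 1);

    // Percentage by which "size" is smaller than "baseline", to the nearest whole number.
    public static double PercentSmaller(double size, double baseline) =>
        Math.Round((baseline - size) / baseline * 100);
}
```

TimesTheSize(ChromeKb, ITextKb) gives 7.7, TimesTheSize(ChromeKb, DinkToPdfKb) gives 11.5, and PercentSmaller(DinkToPdfKb, ITextKb) gives 33.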

ChromeHtmlToPdf Pros

  • Free and always will be.
  • Easy API.
  • Full support for modern CSS.

ChromeHtmlToPdf Cons

  • Requires Chrome to be installed on the target server.
  • Massive file size compared with alternatives.
  • No support for advanced PDF features such as reading, editing, securing, forms.
  • No technical support available.
  • Minimal documentation.

Summary

I've taken a look at generating PDF files from HTML within an ASP.NET Core application using three different tools. Each has its own features and requirements. Hopefully this exploration will help you choose a suitable solution for your use. If not, there are a large number of other solutions available, most of them purely commercial.

All the code in this article is made available under the AGPL licence on GitHub.
