
Programmatically convert HTML to PDF in .NET Core C# without dependencies


HyperText Markup Language, commonly referred to as HTML, has been the foundation for creating and navigating web pages from the very beginning. Its significance keeps growing as the world moves toward digitization. Working with this format can't be confined to the web browser, though: users also expect to access the information both online and offline. What better format to serve this purpose than PDF?

GrapeCity Documents offers the GrapeCity Documents for Html (GcHtml) library, which is dedicated to converting HTML content to PDF and images. With the v6 release, the GcHtml package no longer depends on a specific browser version or on GPL/LGPL-licensed components. GcHtml no longer requires a custom Chromium build. Instead, it can work with the Chrome or Edge browser installed in the operating system, or it can download Chromium from a public website and install it in a local folder for use by the application.

In this blog, we will learn about the new GcHtml package, tips for migrating from the old to the new GcHtml package, and finally, explore how to use the new GcHtml package for converting HTML to PDF.

GcHtml Package

GcHtml runs a Chromium-based browser in headless mode and communicates with it over the WebSocket protocol. The browser can be Chrome or Edge already installed on the current system, or Chromium downloaded from a public website. Additionally, unlike previous versions, GcHtml no longer requires platform (OS)-specific NuGet packages.

The list below describes the key classes in the GrapeCity.Documents.Html namespace that are required for the conversion:

  1. BrowserFetcher: This class helps discover the path to the installed browser or download the Chromium browser from a public server.
  2. GcHtmlBrowser: This class represents a browser process such as Chrome, Edge, or Chromium.
  3. HtmlPage: This class represents a browser tab with HTML content and provides various methods to save the HTML content to PDF or images.
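Applications often combine the two ways BrowserFetcher can supply a browser: use an installed Chrome if there is one, and otherwise download Chromium. The minimal sketch below expresses that fallback with the two lookups passed in as delegates, so it compiles without the GcHtml package. BrowserPathResolver is a hypothetical helper, not part of GcHtml. The sketch also assumes, without confirmation from this article, that BrowserFetcher.GetSystemChromePath() returns null or an empty string when no installed Chrome is found.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical helper (not part of GcHtml): prefers an installed browser and
// falls back to a downloaded one.
public static class BrowserPathResolver
{
    // Download folder used by the migration tips in this article: <temp>/.gc-chromium
    public static string DefaultDownloadFolder() =>
        Path.Combine(Path.GetTempPath(), ".gc-chromium");

    // findInstalled: e.g. BrowserFetcher.GetSystemChromePath
    //                (assumed to return null or "" when nothing is installed).
    // download:      e.g. () => fetcher.GetDownloadedPath().
    public static string Resolve(Func<string?> findInstalled, Func<string> download)
    {
        var installed = findInstalled();
        if (!string.IsNullOrEmpty(installed))
            return installed;

        // Only download when no usable installed browser was found.
        return download();
    }
}
```

In an application, you might call BrowserPathResolver.Resolve(BrowserFetcher.GetSystemChromePath, () => fetcher.GetDownloadedPath()), where fetcher is a BrowserFetcher whose DestinationFolder is set to the download folder. You would then pass the result to the GcHtmlBrowser constructor.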

For detailed information on these and other classes available in the package, refer to the documentation.

NOTE: The old GcHtmlRenderer class is now obsolete, but it is still available (for backward compatibility) and works internally through the GcHtmlBrowser class.

Migration tips

The following tips will help you migrate from the old GcHtml packages to the new GcHtml package:

1. Remove any references to the following old packages from your project dependencies, as they are no longer required. The new single, unified package now serves all these platforms.

  • GrapeCity.Documents.Html.Windows.X64
  • GrapeCity.Documents.Html.Linux.X64
  • GrapeCity.Documents.Html.Mac.X64

2. The licensing code in your project should be updated as described below:

GcHtmlRenderer.SetGcImagingLicenseKey(key); -> GcHtmlBrowser.SetGcImagingLicenseKey(key);
GcHtmlRenderer.SetGcPdfLicenseKey(key); -> GcHtmlBrowser.SetGcPdfLicenseKey(key);

3. As noted above, the GcHtmlRenderer class is now obsolete, so the conversion should start with the GcHtmlBrowser class instead. To create an instance of GcHtmlBrowser, you need the path to a Chromium-based browser on the current system. You can obtain this path using the BrowserFetcher class, as shown in the code snippets below:

  • Get the path to an existing instance of Chrome installed on the current system:

    var path = BrowserFetcher.GetSystemChromePath();
  • Or, download Chromium and install it in a location of your choice, for example:

    var tp = Path.GetTempPath();
    var bf = new BrowserFetcher() { DestinationFolder = Path.Combine(tp, ".gc-chromium") };
    var path = bf.GetDownloadedPath();

4. Once we have the browser path, we can create an instance of GcHtmlBrowser class by specifying various options using the LaunchOptions class. For example, the code snippet below depicts instantiating GcHtmlBrowser with the RunWithNoSandbox option, which may be needed on some Linux systems:

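static GcHtmlBrowser CreateBrowser(string path)
{
    // RuntimeInformation and OSPlatform are in System.Runtime.InteropServices
    if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        return new GcHtmlBrowser(path, new LaunchOptions { RunWithNoSandbox = true });
    else
        return new GcHtmlBrowser(path);
}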

NOTE: Some options from the old Pdf/Jpeg/PngSettings classes have been moved to the LaunchOptions class, while others are now in PageOptions.

5. All calls to the GcGraphics.DrawHtml() method must be updated to pass the browser instance as the first argument:

g.DrawHtml(html, ...); -> g.DrawHtml(browser, html, ...);

6. The code that uses the RenderToPdf method of GcHtmlRenderer to render URIs to PDF, for example:

using var re = new GcHtmlRenderer(uri);
...
re.RenderToPdf(file, new PdfSettings() {…});

should be replaced with the SaveAsPdf method of HtmlPage class as depicted below:

// Create an HtmlPage from the URI
// (DefaultBackgroundColor and WindowSize options from Pdf/Jpeg/PngSettings
// have moved to PageOptions, while some other options are now in LaunchOptions):
using var htmlPage = browser.NewPage(uri, new PageOptions() { WindowSize = pixelSize, … });
...
htmlPage.SaveAsPdf(file, new PdfOptions() {...});

7. Lastly, it is highly recommended to dispose of GcHtmlBrowser and HtmlPage instances as soon as they are no longer needed.
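In C#, using declarations dispose objects in reverse order of declaration when the enclosing scope ends. Declaring the browser first and the page second therefore closes the page before the browser process. The sketch below demonstrates that ordering with two hypothetical stand-in classes, FakeBrowser and FakePage, which only record when they are disposed. It does not use GcHtml itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for GcHtmlBrowser and HtmlPage that only log disposal.
public sealed class FakeBrowser : IDisposable
{
    private readonly List<string> _log;
    public FakeBrowser(List<string> log) => _log = log;
    public FakePage NewPage() => new FakePage(_log);
    public void Dispose() => _log.Add("browser disposed");
}

public sealed class FakePage : IDisposable
{
    private readonly List<string> _log;
    public FakePage(List<string> log) => _log = log;
    public void Dispose() => _log.Add("page disposed");
}

public static class DisposalDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        RenderOnce(log);
        return log;
    }

    private static void RenderOnce(List<string> log)
    {
        using var browser = new FakeBrowser(log); // declared first, disposed last
        using var page = browser.NewPage();       // declared last, disposed first
        log.Add("page used");
    } // end of scope: page.Dispose() runs, then browser.Dispose()
}
```

Real code has the same shape: using var browser = new GcHtmlBrowser(path); followed by using var htmlPage = browser.NewPage(uri);.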

Render HTML to PDF

To begin with, let’s look at how the GcHtml library converts HTML to PDF. It provides two ways to perform the conversion, summarized below:

1. Using the GcHtmlBrowser class: Consider this approach when you want to generate a PDF document from scratch, or a PDF document that consists solely of the HTML content you want to render.

To implement this approach, call the NewPage method of the GcHtmlBrowser class to prepare a browser page with HTML content. This method has two overloads: one accepts a Uri of the source HTML page, and the other accepts HTML as a plain string.

This method returns an instance of the HtmlPage class, whose SaveAsPdf method converts the source HTML to PDF. SaveAsPdf accepts the output file path as the first parameter. The optional second parameter is a PdfOptions instance that defines settings for the output PDF file.

2. Using the DrawHtml method: Consider this approach when you want to add HTML content to a PDF document that also contains other content. The HTML is drawn at a specified position on a page of the document, for example a page newly added to an existing document.

DrawHtml is an extension method for GcGraphics that renders an HTML string or page into a PDF. This allows HTML fragments to be combined with other (non-HTML) content in the same PDF file.

The DrawHtml method has two overloads:

  • Draws an HTML string on a GcPdfGraphics instance at a specified position:

    bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, string html, float x, float y, HtmlToPdfFormat format, out SizeF size);

    Here, the HtmlToPdfFormat class contains attributes for rendering HTML on a GcPdfGraphics instance using the DrawHtml extension methods, and size receives the size of the rendered content.

  • Draws an HTML page specified by a URI on a GcPdfGraphics instance at a specified position:

    bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, Uri htmlUri, float x, float y, HtmlToPdfFormat format, out SizeF size);

The following sections describe in detail how to use these approaches to perform the conversion in different scenarios.
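A browser navigates to absolute URLs. When the source HTML is a local file, it is therefore safest to pass the Uri overloads an absolute file:// URI built from the file's full path, not a relative Uri. This is an assumption on our part; the article does not discuss it. The sketch below uses only System.Uri and System.IO.Path, and LocalHtmlUri is a hypothetical helper name.

```csharp
using System;
using System.IO;

public static class LocalHtmlUri
{
    // Turns a (possibly relative) file path into an absolute file:// URI,
    // resolved against the current working directory.
    public static Uri FromPath(string path) => new Uri(Path.GetFullPath(path));

    // True when the URI is absolute, i.e. something a browser can navigate to.
    public static bool IsNavigable(Uri uri) => uri.IsAbsoluteUri;
}
```

For example, LocalHtmlUri.FromPath("Invoice.html") yields a file URI pointing at Invoice.html in the working directory. By contrast, new Uri("Invoice.html", UriKind.Relative) has no absolute form.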

HTML Files to PDF

Consider a scenario where an e-commerce firm’s transactions are carried out online. The invoices for these transactions are generated over the same platform in HTML format. The style and layout of these invoices may not remain intact when viewed offline or on other devices.

These invoices are distributed to customers who may use different devices or browsers to view them, so converting them to PDF is a better way to preserve their content, layout, and formatting. Hence, the company converts the HTML files to PDF before emailing the invoices to its customers.

Here is a quick view of an invoice in HTML format:

Invoice in HTML

To serve this purpose, GcPdf and GcHtml packages can be used. Let us see how to go about it from scratch:

  1. Open Visual Studio and create a .NET Core Console application from the project templates.
  2. In your application, right-click ‘Dependencies’ and select ‘Manage NuGet Packages’.
  3. With the “Package source” set to the NuGet website, search for GrapeCity.Documents.Pdf under the ‘Browse’ tab and click Install.
  4. Similarly, install the “GrapeCity.Documents.Html” package.

Note: During installation, you’ll see two confirmation dialogs: ‘Preview Changes’ (if the “Show preview window” option for the package is checked) and ‘License Acceptance’. Click ‘OK’ and ‘I Agree’, respectively, to continue.

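5. Add using directives for the following namespaces to the Program.cs file (System.Drawing provides types such as Size, SizeF, RectangleF, and Color used below):

using System;
using System.Drawing;
using System.IO;
using GrapeCity.Documents.Html;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Drawing;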

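6. Now, we can perform the conversion using either approach described above: the GcHtmlBrowser class or the DrawHtml method. The code snippets below implement both. Each assumes that a GcHtmlBrowser instance named browser has already been created, for example from the path returned by BrowserFetcher's GetDownloadedPath method, as shown in the migration tips above: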

Using GcHtmlBrowser class:

// Define the HTML file URI (absolute, so that the browser can load the local file)
var uri = new Uri(Path.GetFullPath("Invoice.html"));

// Invoke the NewPage method to generate a browser page with HTML content
using var pg = browser.NewPage(uri, new PageOptions
{
    WindowSize = new Size(1024, 1024)
});

// Save HTML to PDF using the SaveAsPdf method
pg.SaveAsPdf("Invoice_Save.pdf", new PdfOptions
{
    FullPage = false
});

Using DrawHtml method:

// Create a GcPdfDocument instance
var doc = new GcPdfDocument();

// Add a new page to the document
var page = doc.Pages.Add();

// Take the Graphics instance of the page
var g = page.Graphics;

// Draw the invoice HTML (read from the file as a string) 72 points (1 inch) from the top-left corner of the page
g.DrawHtml(browser, File.ReadAllText("Invoice.html"), 72, 72, new HtmlToPdfFormat(false) { MaxPageWidth = 6.5f, MaxPageHeight = 9f }, out SizeF size);

// Save the PDF Document
doc.Save("Invoice_Draw.pdf");
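The numbers in the DrawHtml call follow PDF conventions. Coordinates such as x = 72 and y = 72 are in points (1/72 inch), so the HTML starts 1 inch from the top-left corner of the page. HtmlToPdfFormat.MaxPageWidth and MaxPageHeight are, as we understand the GcHtml documentation, given in inches. A US Letter page (8.5 × 11 in) with 1-inch margins leaves 6.5 × 9 in, which matches the values used; the Letter size and margins are our assumption about the intended layout. The sketch below does this arithmetic. It also shows how the size reported through out SizeF size, unused in the snippet above, can be turned into a padded frame with RectangleF.Inflate. PageMath is a hypothetical helper.

```csharp
using System.Drawing;

public static class PageMath
{
    public const float PointsPerInch = 72f;

    public static float InchesToPoints(float inches) => inches * PointsPerInch;

    // Content size left after subtracting equal margins on both sides (inches).
    public static float ContentInches(float pageInches, float marginInches) =>
        pageInches - 2 * marginInches;

    // Frame around HTML drawn at (x, y) whose rendered size DrawHtml reported.
    // RectangleF.Inflate grows the rectangle by 'padding' on every side.
    public static RectangleF Frame(float x, float y, SizeF rendered, float padding) =>
        RectangleF.Inflate(new RectangleF(new PointF(x, y), rendered), padding, padding);
}
```

PageMath.Frame(72, 72, size, 4) produces the same rectangle as new RectangleF(72 - 4, 72 - 4, size.Width + 8, size.Height + 8). That is the frame drawn around the HTML string in the next example.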

With these quick steps, you have a PDF file generated from an HTML file, as shown in the screenshot below:

Invoice

HTML String to PDF

Simple HTML strings can also be rendered directly to PDF using the DrawHtml method. This way, you don't need an HTML file; you can specify the HTML content directly in code.

Follow steps (1) to (5) as mentioned above. After that, add the following code in Program.cs file, which performs the HTML to PDF conversion using the DrawHtml method approach:

// Create a variable containing the HTML code as string
var html = "<!DOCTYPE html>" +
"<html>" +
"<head>" +
"<style>" +
"p.round {" +
"font: 36px verdana;" +
"color: Red;" +
"border: 4px solid SlateBlue;" +
"border-radius: 16px;" +
"padding: 3px 5px 3px 5px;" +
"}" +
"</style>" +
"</head>" +
"<body>" +
"<p class='round'>Thank You for shopping with us!</p>" +
"<p class='round'>Hope to see you again soon.</p>" +
"</body>" +
"</html>";

// Create a GcPdfDocument instance
var doc = new GcPdfDocument();
// Add a new page to the document
var page = doc.Pages.Add();
// Take the Graphics instance of the page
var g = page.Graphics;

//Define GcHtmlBrowser instance
var path = new BrowserFetcher().GetDownloadedPath();
using (var browser = new GcHtmlBrowser(path))
{
    // Render the HTML string on the PDF, using the DrawHtml method
    var ok = g.DrawHtml(browser, html, 72, 72, new HtmlToPdfFormat(false) { MaxPageWidth = 6.5f }, out SizeF size);
    
    // Additionally, draw a rounded rectangle around this HTML string
    if (ok)
    {
        var rc = new RectangleF(72 - 4, 72 - 4, size.Width + 8, size.Height + 8);
        g.DrawRoundRect(rc, 8, Color.PaleVioletRed);
    }

    //Save the PDF Document
    doc.Save("HTMLStringToPDF.pdf");
}

The screenshot below depicts the PDF file generated by executing the above code snippet:

HTML string to PDF
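Concatenating many short string fragments works, but the same markup stays more readable as a C# 11 raw interpolated string literal, which requires the .NET 7 SDK or later. With the $$ prefix, single braces such as those around the CSS rule are literal text, and {{...}} marks an interpolation hole. WebUtility.HtmlEncode escapes the inserted text so that characters like < or & cannot break the markup. ThankYouHtml is a hypothetical name. The resulting string can be passed to DrawHtml in place of the html variable above.

```csharp
using System.Net;

public static class ThankYouHtml
{
    // Same markup as the concatenated string above; the two messages are
    // HTML-encoded before they are inserted into the paragraphs.
    public static string Build(string line1, string line2) => $$"""
        <!DOCTYPE html>
        <html>
        <head>
        <style>
        p.round {
          font: 36px verdana;
          color: Red;
          border: 4px solid SlateBlue;
          border-radius: 16px;
          padding: 3px 5px 3px 5px;
        }
        </style>
        </head>
        <body>
        <p class='round'>{{WebUtility.HtmlEncode(line1)}}</p>
        <p class='round'>{{WebUtility.HtmlEncode(line2)}}</p>
        </body>
        </html>
        """;
}
```

For example, ThankYouHtml.Build("Thank You for shopping with us!", "Hope to see you again soon.") reproduces the page rendered above.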

Web Pages to PDF

The GcHtmlBrowser and HtmlPage classes can be used to render web pages to PDF. Suppose the firm discussed above wants to update its customers and stakeholders about new products every month by sending them a PDF generated from the New Releases page on its website. The process should be automated so that stakeholders learn about new launches regularly and a consolidated report can be created at the end of every year.

Here is a view of one such web page:

Web page to PDF

The GcHtmlBrowser class, along with HtmlPage class, can be used to serve this purpose. The Uri of the webpage is used, and the required settings of the PDF are applied using PdfOptions class.

Follow steps (1) to (5) as mentioned above. After that, add the following code in Program.cs file, which performs the HTML to PDF conversion using the GcHtmlBrowser class approach:

// Specify a PDF file name
var fn = @"webpage.pdf";

// Specify the url to be used for PDF conversion
var uri = new Uri(@"https://www.amazon.com/gp/new-releases/electronics/ref=zg_bs_tab_t_bsnr");

// Define GcHtmlBrowser instance
var path = new BrowserFetcher().GetDownloadedPath();
using (var browser = new GcHtmlBrowser(path))
{
    // The PdfOptions instance is created to specify the pdf related settings that will show up in the generated PDF.
    var pdfOptions = new PdfOptions()
    {
        PageRanges = "1-100",
        Margins = new PdfMargins(0.2f), // narrow margins all around          
        Landscape = false,
        PreferCSSPageSize = true
    };
    
    // Create an HtmlPage instance rendering the source Uri:
    using var htmlPage = browser.NewPage(uri);
    
    // Render the source web page to the PDF file:
    htmlPage.SaveAsPdf(fn, pdfOptions);
}
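Setting PageRanges = "1-100" limits the output to the first 100 pages. Chromium's DevTools Page.printToPDF command documents page ranges as a one-based, comma-separated list such as "1-5, 8, 11-13". The sketch below assumes GcHtml forwards the string in that format, which the article does not state. It expands such a string so you can check which pages a given value selects. PageRangeParser is hypothetical, and it rejects open-ended forms such as "-5".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageRangeParser
{
    // Expands a one-based range list such as "1-5, 8, 11-13" into sorted,
    // distinct page numbers. Throws FormatException on malformed input.
    public static IReadOnlyList<int> Expand(string ranges)
    {
        var pages = new SortedSet<int>();
        foreach (var raw in ranges.Split(',', StringSplitOptions.RemoveEmptyEntries))
        {
            var part = raw.Trim();
            if (part.Length == 0) continue;

            var bounds = part.Split('-');
            if (bounds.Length == 1)
            {
                pages.Add(ParsePage(bounds[0]));
            }
            else if (bounds.Length == 2)
            {
                int start = ParsePage(bounds[0]), end = ParsePage(bounds[1]);
                if (start > end)
                    throw new FormatException($"Range start exceeds end: '{part}'");
                for (var p = start; p <= end; p++) pages.Add(p);
            }
            else
            {
                throw new FormatException($"Invalid range: '{part}'");
            }
        }
        return pages.ToList();
    }

    private static int ParsePage(string s)
    {
        if (!int.TryParse(s.Trim(), out var page) || page < 1)
            throw new FormatException($"Invalid page number: '{s}'");
        return page;
    }
}
```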

Here is a quick view of the PDF file generated from the web page:

Web page to PDF
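The monthly snapshots and the year-end consolidated report described above are easier to automate when each run writes a predictable, date-stamped file name. The sketch below is illustrative only. The new-releases-yyyy-MM.pdf naming scheme and the ReleaseReportNames helper are hypothetical, and scheduling the job itself is left to whatever task scheduler the application uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ReleaseReportNames
{
    private const string Prefix = "new-releases-";

    // e.g. 15 Dec 2022 -> "new-releases-2022-12.pdf"
    public static string ForMonth(DateTime date) =>
        Prefix + date.ToString("yyyy-MM", CultureInfo.InvariantCulture) + ".pdf";

    // Picks the monthly files that belong to one year, in month order,
    // as input for the consolidated year-end report.
    public static IReadOnlyList<string> ForYear(IEnumerable<string> fileNames, int year)
    {
        var yearPrefix = Prefix + year.ToString(CultureInfo.InvariantCulture) + "-";
        return fileNames
            .Where(n => n.StartsWith(yearPrefix, StringComparison.Ordinal))
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

Each monthly run would pass ReleaseReportNames.ForMonth(DateTime.Today) as the file name to SaveAsPdf. At the end of the year, ForYear selects that year's files for the consolidated report.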

How do you use GcHtml in your applications? Let us know in the comments.


• • •

Originally published at https://www.grapecity.com on December 15, 2022.
