Mar 12, 2023

C# 11.0 new features: raw string literals

Author: Ian Griffiths

C# 11.0 became available with .NET 7.0 in November 2022. It has made a few improvements for string literals. In this post, I'll show how the new raw string literals feature can significantly improve readability.

Problems with string literals

There are two problems that have long plagued string literals in C#:

  • Strings that include double quotes tend to be unreadable
  • Indentation looks messy in multiline strings

For example, suppose you wanted to put the following JSON in a string literal:

{
  "number": 42,
  "text": "Hello, world",
  "nested": { "flag": true }
}

(Note: In many scenarios you won't want to create JSON text this way, because there are security issues if you plug in user-supplied data. It might be better to use the System.Text.Json.Nodes types to build up your JSON structure. However, there are some scenarios in which a simple string literal is a valid approach.)

As an ordinary quoted string literal, it looks like this:

string json = "{\r\n  \"number\": 42,\r\n  \"text\": \"Hello, world\",\r\n  \"nested\": { \"flag\": true }\r\n}";

This is pretty horrible. JSON requires property names to be delimited with double quotes, which is unfortunate, because C# uses the same character to delimit string literals. This means we've had to escape all of the double quotes that are in there as part of the JSON, with the effect that the text is now bristling with backslashes.

Also, it's all on one line. We've had to denote new lines with escaped control characters, and the indentation is not at all clear. C# veterans will note that verbatim string literals work slightly better here. These can span multiple lines, making it a little easier to see the structure:

string json = @"{
  ""number"": 42,
  ""text"": ""Hello, world"",
  ""nested"": { ""flag"": true }
}";

(The @ at the start marks this out as a verbatim string.) Although that looks better, this approach doesn't work quite as well as we might hope in practice. That example is left-aligned, but look what happens if we want to use a verbatim string literal in some more deeply nested code:

foreach (var x in y)
{
    if (Test(x))
    {
        string json = @"{
  ""number"": 42,
  ""text"": ""Hello, world"",
  ""nested"": { ""flag"": true }
}";
        Use(json);
    }
}

We've had to leave all the lines inside the verbatim string literal over on the left, because any whitespace will be included. If we had indented it to be aligned with the rest of the code, the resulting JSON would include all of the corresponding whitespace. This is annoying.
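To see the effect concretely, here is a minimal sketch (not from the original example) of what happens when the contents of a verbatim literal are indented to match the surrounding code. The variable names are made up for illustration, and the JSON is cut down to a single property.

```csharp
using System;

// Hypothetical demo: a verbatim literal whose content is indented
// to line up with the surrounding code.
string indented = @"{
        ""number"": 42
    }";

// Split on '\n' and trim a possible '\r' so the check works whether the
// source file uses LF or CRLF line endings.
string[] lines = indented.Split('\n');
string secondLine = lines[1].TrimEnd('\r');
string thirdLine = lines[2];

// The source indentation has become part of the string's value.
Console.WriteLine(secondLine.StartsWith("        \"number\"", StringComparison.Ordinal)); // True
Console.WriteLine(thirdLine == "    }");                                                  // True
```

The eight spaces in front of "number" and the four in front of the closing brace are all in the string, which is exactly the excess whitespace we were trying to avoid.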

The quotes also look different. They still need to be escaped, because the string is still delimited by double quotes, but in verbatim literals, this works differently. Verbatim literals do not treat backslash as an escape character. (This makes them particularly good for strings that need a lot of backslashes. For example, instead of "C:\\Windows\\System32\\drivers\\etc" we can write @"C:\Windows\System32\drivers\etc".) A side effect of this is that \" no longer represents an escaped double quotation mark; it is a backslash followed by the end of the string. To enable verbatim strings to contain double quotes, C# recognizes pairs of double quotes as signifying a single double quote, and not the end of the string.
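A short sketch of those two verbatim rules side by side. The path is the one used above; the second string and all variable names are invented for illustration.

```csharp
using System;

// Backslashes: escaped in an ordinary literal, taken literally in a verbatim one.
string regular = "C:\\Windows\\System32\\drivers\\etc";
string verbatim = @"C:\Windows\System32\drivers\etc";
Console.WriteLine(regular == verbatim);  // True

// Double quotes: a doubled "" inside a verbatim literal produces one " character.
string quoted = @"She said ""hi""";
Console.WriteLine(quoted);         // She said "hi"
Console.WriteLine(quoted.Length);  // 13
```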

Frankly, this is still not ideal. C# 11.0 adds a new feature that enables string literals to avoid this kind of mess. Moreover, the feature is designed to avoid producing yet another version of this problem in the future. (If you pick any one character to work differently in a string, there's always the possibility that the future will bring a string format that requires you to use that particular character a lot. For example, JSON was not popular when C# was invented, so there was much less call for string literals that contained double quotes back then, meaning that most of the time, either normal or verbatim literals worked pretty well. But JSON's double-quote-heavy syntax spoiled that. Languages that allow single-quote-delimited string literals sidestep this particular problem, but they would still have issues if you needed a mixture of single and double quotes in a string.) The language designers wanted to avoid designating any one character sequence as being off-limits.

C# 11.0 raw string literals

Here's how that last example looks with a raw literal:

foreach (var x in y)
{
    if (Test(x))
    {
        string json = """
        {
          "number": 42,
          "text": "Hello, world",
          "nested": { "flag": true }
        }
        """;
        Use(json);
    }
}

The raw string is delimited by """ (three double quotes) in this particular example, although as we'll see, it doesn't necessarily have to be. Both of the earlier problems have been fixed: inside the literal, we've been able to use individual double quote characters—no need to escape them here. So the JSON looks like JSON. Furthermore, the indentation aligns with our code. And unlike with a verbatim literal, not all of that indentation is included in the final value. However, the compiler hasn't stripped all of it out either. It has retained just what we want: the value of this raw literal is exactly the JSON shown at the start of this post, including the two spaces of indentation for the contents of the object.

How has the C# compiler worked out exactly how much indentation to trim? It knows because we've written this as a multiline raw string literal. We indicate that we want a multiline literal by making the opening """ the last thing on its line. The closing """ must then be on a line where it is preceded only by whitespace. (It's an error for the closing """ of a multiline raw string literal to appear after non-whitespace content on the same line.) The compiler takes the whitespace from the start of that line up to the closing """ and removes exactly that much from the start of each line in the string.

When you write a multiline raw string literal, the compiler considers the string to start on the line after the opening """ and to finish on the line before the closing """. So although it looks here like there's a newline before the opening {, there won't be. Likewise, there will be no newline after the closing }. (If you want newlines at the start and end, you can just add blank lines.)
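These trimming rules can be checked with a small sketch like the following. It uses a shortened version of the JSON and compares the raw literal against the equivalent ordinary literal. The call to ReplaceLineEndings (available from .NET 6) is my addition, there only so that the comparison does not depend on whether the source file is saved with LF or CRLF line endings.

```csharp
using System;

string json = """
    {
      "number": 42,
      "nested": { "flag": true }
    }
    """;

// Normalize line endings so the check is independent of how the file was saved.
string normalized = json.ReplaceLineEndings("\n");

// The four spaces in front of the closing """ have been removed from every line;
// the two extra spaces of JSON indentation remain.
string expected = "{\n  \"number\": 42,\n  \"nested\": { \"flag\": true }\n}";
Console.WriteLine(normalized == expected);  // True

// No newline before the opening brace or after the closing one.
Console.WriteLine(normalized.StartsWith('{') && normalized.EndsWith('}'));  // True
```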

Interpolated raw string literals

If you put a $ in front of a raw string literal, it can include expressions, delimited by braces (just like when you do that with an ordinary string literal). For example:

string markdown = $"""
    # {title}
    
    The topic for today is {topic}.
    """;

If you've ever attempted to write JSON with C# string interpolation, you will know that it can be problematic. Just as with the mess around double quotes, there's a problem because both C# and JSON have special meanings for braces ({ and }). However, raw string literals have a solution for this. Look at this code:

string json = $$"""
    {
        "number": 42,
        "text": "{{message}}",
        "nested": { "flag": true }
    }
    """;

Here, I've been able to use braces for my JSON structure without problems. Why hasn't the compiler attempted to process these as embedded expressions in the interpolated string? The clue is that this string has not one but two $ signs at the start. This indicates that in this particular string, single { and } characters should be handled normally, and that only double ones should be treated as delimiters for embedded expressions. This is a little confusing because with ordinary interpolated strings it's the other way around: we double up the braces to indicate that we don't mean them to be processed as delimiters. However, the inversion certainly works better in this example, and it's also more generalizable: we can put as many $ signs as we like at the start of the string, and the number we use determines the number of consecutive { or } symbols there need to be for the compiler to interpret them as expression delimiters.
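As a rough illustration of the resulting value, here is a cut-down version of that literal with a hypothetical message variable supplied (the name and its value are assumptions, since the example above doesn't define it). Note that the interpolation inserts the text as-is with no JSON escaping, so the earlier caveat about user-supplied data still applies.

```csharp
using System;

string message = "Hello, world";  // hypothetical value for illustration

string json = $$"""
    {
      "text": "{{message}}",
      "nested": { "flag": true }
    }
    """;

// Single braces came through literally; only {{message}} was replaced.
string expected = "{\n  \"text\": \"Hello, world\",\n  \"nested\": { \"flag\": true }\n}";
Console.WriteLine(json.ReplaceLineEndings("\n") == expected);  // True
```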

So if we wanted to be able to include pairs of braces, we could indicate that only triple braces denote delimiters:

string m = $$$"""{{{name}}} has an exceptionally bushy moustache: :-{{""";

In fact, it's not just the embedded expression delimiters that can play this trick.

Changing the string delimiter

Not only can we adjust the number of $ signs to configure the number of braces required to denote an expression, we can also change the number of double quotation marks at the start and end of the string. Earlier, I said that with the raw string literals feature, the C# language team wanted to avoid ever getting into a situation where the string delimiters clash with something we want to put in a string. So what if some future syntax happens to require three quotes in the middle of a string? Easy: we can just make our raw string literal use four-quote delimiters:

string whoIsInventingTheseSyntaxes = """"
    For some reason, it's necessary to include """ in this text.
    """";

You can put any number of double quotation marks in a row as the opening of a raw literal. You must then use the same number to mark the end. Your string is then free to contain consecutive runs of double quotes of any length, provided that they are shorter than the delimiters. Since there's no practical limit to the number of quotes that appear in the delimiters, there's no limit to the number of consecutive double quotes that can appear inside the string itself.
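A small sketch of the delimiter rule, using strings I've made up for illustration: a single-line literal with four-quote delimiters that contains a run of three quotes, and a multiline literal with five-quote delimiters that contains a run of four.

```csharp
using System;

// Four-quote delimiters, so a run of three quotes is ordinary content.
string three = """"three quotes: """ inside"""";
Console.WriteLine(three);  // three quotes: """ inside

// Five-quote delimiters, so a run of four quotes is ordinary content.
string four = """""
    Four quotes: """"
    """"";
Console.WriteLine(four);         // Four quotes: """"
Console.WriteLine(four.Length);  // 17
```

In the single-line form the content can't begin or end with a double quote, because that quote would be read as part of the delimiter. The multiline form doesn't have that restriction, since the delimiters sit on their own lines.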

Summary

Raw string literals improve the readability of strings. By giving us the freedom to change the delimiter sequences used for the string itself (and also for embedded expressions, if the string is interpolated) we can write string literals that contain any normal characters in any sequence without needing to sprinkle escape characters everywhere. And the special whitespace handling for multiline strings means we can indent our strings along with the rest of our code without including excess indentation in the string value.
