Поиск  
Always will be ready notify the world about expectations as easy as possible: job change page
May 9

Performance benchmarks of PostgreSQL .NET with Npgsql, Dapper, and Entity Framework Core

Performance benchmarks of PostgreSQL .NET with Npgsql, Dapper, and Entity Framework Core
Автор:
Источник:
Просмотров:
3819

In a previous blog post, I showed you how to use PostgreSQL in C# with Npgsql, Dapper, and Entity Framework Core. But if you’re going to use one of them, it’s probably a good idea to make sure you’re not choosing a library that has really bad performance. Or at least to understand how much it’s going to hurt.

This blog post compares the performance of our three .NET clients for Postgres: Npgsql, Dapper, and Entity Framework Core. How much of an overhead are you going to be paying for each of them? We’ll look at benchmarks that compare the main operations of Insert, Select, and Update, and the best ways to implement them. So let’s get to it.

Master plan

For the actual benchmarks, I’ll be using the canonic .NET benchmarking tool: BenchmarkDotNet. Doing benchmarks properly is not that easy, and this library takes care to do it right. It runs operations many times to extract the mean, runs warmup operations, checks error deviation, considers CPU cache, and so on.

I’ll be using a table of teachers, that corresponds to the following Teacher class in C#:

public class Teacher : DbContext
{
    public int Id { get; set; } // primary key
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Subject { get; set; }
    public int Salary { get; set; }
}

You can find the benchmark project in a GitHub repo here.

By the way, it’s entirely possible that I made mistakes. Performance benchmarks are hard. I’m open to feedback in the comments below or with issues on GitHub.

Inserting records

I created two benchmarks for insertion. The first inserts a single row into a table and the second inserts 1000 rows in bulk.

Here’s the code for inserting a single Teacher record. Note that I have two different ways of inserting rows with Npgsql (which is the default C# PostgreSQL client) because the naïve implementation turned out to be about 15% slower and I wanted to show the difference. Using raw SQL, as opposed to parameterized, and synchronous code, as opposed to asynchronous, went a long way.

public async Task NpgsqlInsertParameterized(Teacher teacher)
{
    // cmd (of type 'NpgsqlCommand') is reused here
    cmd.CommandText = $"INSERT INTO teachers (first_name, last_name, subject, salary) VALUES (@firstName, @lastName, @subject, @salary)";
    cmd.Parameters.Clear();
    cmd.Parameters.AddWithValue("firstName", teacher.FirstName);
    cmd.Parameters.AddWithValue("lastName", teacher.LastName);
    cmd.Parameters.AddWithValue("subject", teacher.Subject);
    cmd.Parameters.AddWithValue("salary", teacher.Salary);
    await cmd.ExecuteNonQueryAsync().ConfigureAwait(false);
}

public void NpgsqlInsertRawSql(Teacher teacher)
{
    cmd.CommandText = $"INSERT INTO teachers (first_name, last_name, subject, salary) VALUES ('{teacher.FirstName}', '{teacher.LastName}', '{teacher.Subject}', {teacher.Salary})";
    cmd.ExecuteNonQuery();   
}

public async Task DapperInsert(Teacher teacher)
{
    string sqlCommand = $"INSERT INTO teachers (first_name, last_name, subject, salary) VALUES (@firstName, @lastName, @Subject, @salary)";
    
    var queryArguments = new
    {
        firstName = teacher.FirstName,
        lastName = teacher.LastName,
        subject = teacher.Subject,
        salary = teacher.Salary,
    };
    
    // '_connection' is of a type from Npgsql and ExecuteAsync is an extension method from Dapper
    await _connection.ExecuteAsync(sqlCommand, queryArguments).ConfigureAwait(false);
}

public async Task EFInsert(Teacher teacher)
{
    await db.Teachers.AddAsync(teacher).ConfigureAwait(false); // db (of type SchoolContext) is reused here
    await db.SaveChangesAsync().ConfigureAwait(false);
}

There’s a little setup for this, like creating a connection, but you can read my previous post where I show how to run Npgsql, Dapper, and Entity Framework from scratch. Or peek in the GitHub project for the actual implementation.

And here are the results:

Method Mean Error StdDev
NpgsqlInsert 683.5 us 30.99 us 90.41 us
NpgsqlInsertRawSql 528.8 us 20.75 us 60.52 us
DapperInsert 637.3 us 25.30 us 73.79 us
EFInsert 980.6 us 38.54 us 111.18 us

As you can see, using Npgsql with Raw SQL is fastest, better than Dapper or parameterized Npgsql. Entity Framework Core is 1.5 times slower than the other methods. Not that it’s very slow by user perception standards. The operations are measured in microseconds (1000us = 1ms), so a single insert with Entity Framework is still going to be slower than 1 millisecond.

Tip: Running Entity Framework in NoTracking mode reduced speed from 2xDapper to just 1.5xDapper (thank you @ShayRojansky for suggesting it).

Bulk Insert

Inserting many items at once is both important and sensitive to performance problems. Adding 100,000 rows can take many milliseconds or whole seconds, as opposed to the microseconds it takes to insert a single item. This becomes a big deal if any kind of user interaction depends on it.

In the case of bulk insert, we can’t compare Npgsql to either Dapper or Entity Framework because they don’t support it. So we’re going to compare 4 different ways to do it with Npgsql. The first two ways are the correct ways to bulk insert data, whereas the last two implementations are just for comparison’s sake and general interest.

To do this, I’ve generated 1000 random Teacher objects upfront.

public void BulkInsertRegular(IEnumerable<Teacher> teachers)
{
    // using raw SQL here as opposed to a parameterized approach because it's faster
    cmd.CommandText = "INSERT INTO teachers (first_name, last_name, subject, salary) VALUES " +
        (String.Join(',',teachers.Select(t => $"('{t.FirstName}','{t.LastName}','{t.Subject}',{t.Salary})")));
    cmd.ExecuteNonQuery();
}

public async Task BulkInsertBinaryImporter(IEnumerable<Teacher> teachers)
{
    using (var writer = con.BeginBinaryImport("COPY teachers (first_name, last_name, subject, salary) FROM STDIN (FORMAT BINARY)"))
    {
        foreach (var teacher in teachers)
        {
            await writer.StartRowAsync().ConfigureAwait(false);
            await writer.WriteAsync(teacher.FirstName, NpgsqlTypes.NpgsqlDbType.Varchar).ConfigureAwait(false);
            await writer.WriteAsync(teacher.LastName, NpgsqlTypes.NpgsqlDbType.Varchar).ConfigureAwait(false);
            await writer.WriteAsync(teacher.Subject, NpgsqlTypes.NpgsqlDbType.Varchar).ConfigureAwait(false);
            await writer.WriteAsync(teacher.Salary, NpgsqlTypes.NpgsqlDbType.Integer).ConfigureAwait(false);
        }
        await writer.CompleteAsync().ConfigureAwait(false);
    }
}

// This is the WRONG way to do bulk insert because inserts items one by one
public void OneByOneInsert(IEnumerable<Teacher> teachers)
{
    cmd.Parameters.Clear();
    cmd.CommandText = "INSERT INTO teachers (first_name, last_name, subject, salary) VALUES (@firstName, @lastName, @subject, @salary)";

    foreach (var teacher in teachers)
    {
        cmd.Parameters.AddWithValue("firstName", teacher.FirstName);
        cmd.Parameters.AddWithValue("lastName", teacher.LastName);
        cmd.Parameters.AddWithValue("subject", teacher.Subject);
        cmd.Parameters.AddWithValue("salary", teacher.Salary);
        cmd.ExecuteNonQuery();
        cmd.Parameters.Clear();
    }
}

public void BulkInsertInTransaction(IEnumerable<Teacher> teachers)
{
    using var transaction = con.BeginTransaction();
    var sql = "INSERT INTO teachers (first_name, last_name, subject, salary) VALUES (@firstName, @lastName, @subject, @salary)";
    using var command = new NpgsqlCommand(sql, con, transaction);
    foreach (var teacher in teachers)
    {
        command.Parameters.AddWithValue("firstName", teacher.FirstName);
        command.Parameters.AddWithValue("lastName", teacher.LastName);
        command.Parameters.AddWithValue("subject", teacher.Subject);
        command.Parameters.AddWithValue("salary", teacher.Salary);
        command.ExecuteNonQuery();
        command.Parameters.Clear();
    }
    transaction.Commit();
}

This produced the following results:

Method Mean Error StdDev
NpgsqlInsertRegular 9.850 ms 0.1898 ms 0.2259 ms
NpgsqlInsertBinaryImporter 3.922 ms 0.0774 ms 0.1059 ms
NpgsqlInsertOneByOne 188.604 ms 3.6940 ms 3.4553 ms
NpgsqlInsertInTransaction 88.293 ms 1.7622 ms 2.2913 ms

As you can see, using a BinaryImporter (with the COPY command) is by far the most efficient way to insert multiple rows with Postgres. It’s also the recommended way according to PostgreSQL documentation. Using INSERT with raw SQL is also fast, but almost 3 times slower than COPY.

In comparison, you can see that inserting items one by one is about 20 times slower than an INSERT operation and 50 times slower than a COPY operation.

I thought it was also interesting to show that grouping many INSERT operation into a single transaction makes it more than 2 times faster.

Selecting items

To check SELECT operations, I created a table of 10,000 random teacher rows. I then retrieved teachers by their subject, which should return 10 items on average (there are 1000 subjects). I tried reading with and without an index.

Here are the implementations:

public async Task<int> NpgsqlRead(string subject)
{
    cmd.CommandText = $"SELECT * FROM teachers WHERE subject='{subject}'";
    NpgsqlDataReader reader = await cmd.ExecuteReaderAsync().ConfigureAwait(false);
    var result = new List<Teacher>();
    while (await reader.ReadAsync().ConfigureAwait(false))
    {
        result.Add(new Teacher(
                id: (int)reader[0],
                first_name: reader[1] as string,
                last_name: reader[2] as string,
                subject: reader[3] as string,
                salary: (int)reader[4]));

    }
    await reader.CloseAsync().ConfigureAwait(false);
    return result.Count(); // return Count() to make sure the LINQ was evaluated
}

public async Task<int> DapperRead(string subject)
{
    string sql = $"SELECT * FROM {Teacher.TableName} WHERE subject='{subject}'";
    var result = await _connection.QueryAsync<Teacher>(sql).ConfigureAwait(false);
    return result.Count(); // return Count() to make sure the LINQ was evaluated
}

public async Task<int> EFRead(string subject)
{
    var result = await _db.Teachers.Where(t => t.Subject == subject).ToListAsync().ConfigureAwait(false);
    return result.Count;
}

When running without indexing subjects, the benchmark results were:

Method Mean Error StdDev
NpgsqlRead 882.3 us 17.16 us 33.06 us
DapperRead 929.8 us 6.47 us 5.40 us
EFRead 1,014.5 us 11.39 us 10.66 us

Not too surprisingly, Npgsql is the fastest. Faster by 5% than Dapper and by 13% than Entity Framework. Not a big deal by any means.

After introducing an index, the results were the following:

Method Mean Error StdDev
NpgsqlRead 261.3 us 4.44 us 5.11 us
DapperRead 261.1 us 5.15 us 7.55 us
EFRead 381.1 us 7.34 us 10.99 us

The index seems to improve the results by 600 milliseconds across the board. Other than that, Npgsql and Dapper are now equally fast, while Entity Framework Core became a whole 32% slower in comparison.

Update

To check Update, I looked for a teacher with a certain Id and changed their last name to something else. Here’s the code:

public async Task NpgsqlUpdate(int id, string newLastName)
{
    cmd.CommandText = $"UPDATE teachers SET last_name='{newLastName}' WHERE id = {id}";
    await cmd.ExecuteNonQueryAsync().ConfigureAwait(false);
}

public async Task DapperUpdate(int id, string newLastName)
{
    string sqlCommand = $"UPDATE {Teacher.TableName} SET last_name='{newLastName}' WHERE id = {id}";
    await _connection.ExecuteAsync(sqlCommand).ConfigureAwait(false);
}

public async Task EFUpdate(int id, string newLastName)
{
    var t = await _db.Teachers.FindAsync(id).ConfigureAwait(false);
    t.LastName = newLastName;
    await _db.SaveChangesAsync().ConfigureAwait(false);
}

public async Task EFUpdateWithExecuteUpdate(int id, string newLastName)
{
    var t = await _db.Teachers.Where(t => t.Id== id)
            .ExecuteUpdateAsync(t => t.SetProperty(t => t.LastName, t => newLastName))
            .ConfigureAwait(false);
}

And here are the results:

Method Mean Error StdDev
NpgsqlUpdate 393.9 us 6.14 us 5.74 us
DapperUpdate 405.2 us 7.85 us 10.20 us
EFUpdate 599.7 us 7.93 us 7.42 us
EFUpdateWithExecuteUpdate 541.4 us 6.90 us 6.46 us

Note that Id is already indexed as a primary key.

Npgsql and Dapper are about equally fast. Entity Framework is 33% slower in its regular mode. The new ExecuteUpdate mode introduced in EF Core 7 improves it to be only 27% slower. This is consistent with everything we measured so far.

Overview of what we got

First of all, I admit that this is far from a complete benchmark. I used a single table without any foreign key constraints. I didn’t check joins, and I could have tested more operations, like DELETE and GROUP BY, where raw SQL would have probably been even faster in comparison to Entity Framework. And I could have tested more environments. But I think these results are enough for a ballpark idea of what you’re getting with each technology.

Here’s a summary of the insights we saw:

  • Npgsql and Dapper are about equally fast, but Npgsql can be slightly faster if you’re not using parameters and not using async API.
  • For Bulk Insert, using a COPY operation with Npgsql was much faster than other alternatives.
  • Entity Framework Core came in slowest in all operations.

Considering these results, I’d probably recommend going with Dapper for the nicer syntax, or Npgsql if you don’t want to introduce another dependency for some reason. However, Entity Framework has a lot of advantages, and it’s perfectly fine to mix and match these usages. EF doesn’t require your team to know SQL, or the particular flavor of SQL that PostgreSQL uses. It’s arguably less prone to mistakes since you’re using strongly-typed C# code instead of writing raw SQL strings. And it abstracts away your database, so the developer doesn’t even have to know or think whether you’re using SQL Server, Postgres, or something else. So you might want to choose to use Npgsql in your application’s low-level infrastructure and to allow using Entity Framework in day-to-day development. Although, I personally like writing the SQL myself.

That’s it for this one, cheers.

Похожее
Mar 20
...
Jun 13, 2023
One of the things that we may struggle with as developers when working on a green field project is our stack. Choosing the right tech to solve a problem can be a harrowing experience. Databases, in particular, can be a...
Oct 25, 2023
Author: Ignacio Chiazzo
As a developer, which areas should we focus on for performance optimizations? How can we measure the impact? Read this post to learn the framework. Performance optimization is any software or hardware modifications that make the platform faster and more...
Jul 16, 2021
Author: Edward Pollack
Monitoring SQL Server for performance is a critical part of a DBA’s job. In this article, Edward Pollack explains some of the SQL Server performance monitor data and how to collect it. Windows collects a wide array of useful metrics...
Написать сообщение
Тип
Почта
Имя
*Сообщение