Feb 6

Calculating optimistic memory footprint of managed object


A way of calculating the amount of memory occupied by an object in C# .NET.

Introduction

C# .NET is a high-level, multipurpose, modern language with only two flaws compared to C++: generics and the sizeof operator. Anyone coming from the C++ world who tried to use generics in C# the same way as templates in C++ was quickly disappointed. Not only can you not access static members of the generic type parameter, which renders policy-based design less elegant (you need to create an instance of the generic type), but the generic type is also (almost) useless unless a constraint (the where clause) exposes some interface on it. That would not be such a pain, except that you cannot even add two instances of the generic type, even when it implements the IConvertible interface (the interface all primitives implement). There is the dynamic keyword, which enables this kind of duck typing, but, as its name says, it is dynamic: there is no compile-time type checking and there is a slight performance penalty. So, as a programmer and not a C# language implementer, one cannot do much about it. In fact, the abstraction layers meticulously built on top of the C# language make changing this such a risky project that it would be pure madness to even try.

The second thing, the sizeof operator, is something we can do something about, and this tip shows a method (actually a helper class) that makes it possible to calculate the optimistic size of an object in memory ("optimistic" meaning the object occupies at least the returned amount of memory). Obviously it makes use of reflection, but that is what reflection is for.
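A minimal sketch of the generics limitation described above (the DuckTypedMath class and its Add method are hypothetical names, not part of the original tip): writing a + b for an unconstrained T does not compile, while casting both operands to dynamic compiles but defers the operator lookup to run time.

```csharp
using System;

public static class DuckTypedMath
{
    // `return a + b;` would not compile here (CS0019: operator '+' cannot be
    // applied to operands of type 'T' and 'T'), even if T is IConvertible.
    // Casting to dynamic moves the operator lookup to the runtime binder.
    public static T Add<T>(T a, T b)
    {
        return (T)((dynamic)a + (dynamic)b);
    }
}
```

Add(2, 3) returns 5 and Add("a", "b") returns "ab", but Add(new object(), new object()) also compiles and only fails at run time with a RuntimeBinderException, which is the missing compile-time check mentioned above.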

Background

There have been many approaches to calculating the size of a managed object living in the CLR. They can be categorized as follows.

Calculating the difference in memory before and after the object allocation.

This one is straightforward:

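var before = GC.GetTotalMemory(true);
var obj = new int[1000]; // do some allocation
Console.Write("Memory used = {0}", GC.GetTotalMemory(true) - before);
GC.KeepAlive(obj); // keep the object reachable until after the measurement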

The obvious drawbacks: it fails in a multithreaded environment, and even on a single thread it requires the rest of the CLR to remain completely frozen. Even then, a negative result when measuring small objects should not be a surprise (oops, the GC just collected some garbage). In short, it is neither reliable nor to the point.
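One way to sidestep two of these drawbacks, which is not part of the original approach, is to read the current thread's allocation counter, GC.GetAllocatedBytesForCurrentThread (available in modern .NET, for example .NET 5 and later), instead of the total heap size. A rough sketch with a hypothetical AllocationMeter helper:

```csharp
using System;

public static class AllocationMeter
{
    // Returns how many bytes the current thread allocated on the managed heap
    // while running `factory`. This counter only grows, so a collection in the
    // middle cannot produce a negative result, and allocations made by other
    // threads are not included. It does count everything the factory allocates,
    // including temporary objects that are discarded immediately.
    public static long MeasureAllocation(Func<object> factory)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        object result = factory();
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(result); // keep the result reachable until after the second reading
        return after - before;
    }
}
```

Note that this measures what was allocated, not what remains reachable afterwards, so it is still a measurement rather than a calculation of the object's footprint.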

The second approach, proposed here and here, relies on serialization. There are three types of serialization in .NET, which I will not describe here; see the MSDN article Serialization (C# and Visual Basic). The reason this approach is not optimal is that serialization is not meant to measure an object's size, but to persist it or send it somewhere. When measuring the bytes of a serialized object, one therefore also measures field names and the XML markup overhead or, in the case of binary serialization, an encoding that adds type and assembly metadata and stores values differently from their in-memory layout.
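To make the point concrete, a small sketch assuming a hypothetical SerializableClient type with an int and a string field: the XmlSerializer output for it includes the XML declaration, namespace declarations and element names, so its byte count has little to do with the 4 bytes of the int and the few characters of the name.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for this illustration.
public class SerializableClient
{
    public int Age;          // 4 bytes of actual data in memory
    public string Name = ""; // 2 bytes per character in memory
}

public static class SerializedSize
{
    // Returns the number of bytes XmlSerializer writes for obj. The result
    // includes the XML declaration, namespaces and element names, none of
    // which exist in the object's in-memory representation.
    public static long XmlBytes<T>(T obj)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.Serialize(stream, obj);
            return stream.Length;
        }
    }
}
```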

The third approach is to calculate the sizes of all fields the object references, add them up, and voila. Or at least it seems that simple. One way to do it in .NET is to create an interface exposing a Size() method that returns an int representing the number of bytes an instance occupies. This is fine, however: (1) built-in or third-party objects cannot be measured this way, and (2) it is only a contract: nothing guarantees that a class implementing the interface does so as one would expect, and even when it does, hand-written sizes are error-prone.

The better way is divide and conquer (or at least something resembling it): we start from the sizes we know, which include all primitives, decimal, string (str.Length * sizeof(char)) and a few more. Then the question must be asked: are there any (managed) objects that do not ultimately consist of those primitives? The answer is (arguably) no, or at least we are not interested in the others.

Consider an example: a class that reads data (clients, stock quotes) from a database into an internal buffer, say a list of client instances, and manages the DB connection. The client class has a couple of standard fields (int age, string name, a reference to a product), and a product can be referred to by many clients and internally contains a list of references to all clients that bought it.
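A sketch of the interface-based variant, assuming a hypothetical ISizeable interface and Quote class: the size is written by hand next to the fields, and nothing keeps the two in sync.

```csharp
// Hypothetical contract: every class reports its own size in bytes.
public interface ISizeable
{
    int Size();
}

public class Quote : ISizeable
{
    public string Symbol = "";
    public decimal Price;

    // Hand-written and easy to get wrong: adding a field without updating this
    // method silently makes the result wrong, and built-in types such as string
    // cannot be made to implement ISizeable.
    public int Size() => sizeof(char) * Symbol.Length + sizeof(decimal);
}
```

For a Quote with Symbol "MSFT", Size() returns 4 * 2 + 16 = 24.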

This example contains all four dangers one has to address when calculating memory size:

  1. An unmanaged resource: the database.
  2. Objects referencing other objects (the product) that the caller of the size function did not necessarily intend to include.
  3. Objects without a one-to-one relation: the same product can be referenced by many clients, so its size could be counted more than once. And the final boss:
  4. Circular references from the product back to (!) all clients that bought it. What could go wrong?
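The example can be modeled roughly as follows (a hypothetical sketch; the database connection is left out). Two clients can share one product, and the product's Buyers list points back at them, which reproduces dangers 2, 3 and 4 in a few lines:

```csharp
using System.Collections.Generic;

// Hypothetical model of the example: a product shared by many clients,
// holding back-references to every client that bought it.
public class Product
{
    public string Name = "";
    public List<Client> Buyers = new List<Client>();
}

public class Client
{
    public int Age;
    public string Name;
    public Product Product; // shared reference (danger 3)

    public Client(int age, string name, Product product)
    {
        Age = age;
        Name = name;
        Product = product;
        product.Buyers.Add(this); // circular reference: product -> client -> product (danger 4)
    }
}
```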

How to address these problems:

  1. Unmanaged resources do not interest us. Simple. What does interest us is the connection object, whose size should be calculated: probably mostly the connection string.
  2. Objects referencing other objects. The assumption is straightforward: you reference it, you own it. This may not be good reasoning in other situations, but here it saves a lot of trouble, for example finding out which object created the referenced one.
  3. No one-to-one relation: a HashSet of already visited references. The function has to keep this state between recursive calls, so a helper wrapper class is needed, but that is all. Bonus: it also solves the problem of circular references.
  4. See 3.
  5. Additionally, because only fields are counted, every property backed by a field is counted exactly once, and computed properties such as FullName { get { return Forename + Surname; } } are not counted at all (the Forename and Surname fields are counted separately).
  6. There is also a not-so-obvious problem with System.Reflection.Pointer, discussed under Points of interest.
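Point 3 depends on the set tracking references, not values. A small sketch with a hypothetical IdentityComparer (on .NET 5 and later, the framework's System.Collections.Generic.ReferenceEqualityComparer does the same job) shows the difference: with the default comparer, a HashSet&lt;object&gt; treats two separately boxed ints with the same value as one object, so a second field holding the same number would be skipped as "already visited".

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical identity-based comparer: two entries match only if they are
// the very same object, regardless of any Equals override.
public sealed class IdentityComparer : IEqualityComparer<object>
{
    public static readonly IdentityComparer Instance = new IdentityComparer();
    public new bool Equals(object x, object y) => ReferenceEquals(x, y);
    public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class VisitedSetDemo
{
    // The default comparer calls Int32.Equals, so a second, distinct box
    // holding 5 is wrongly reported as already visited (returns true).
    public static bool DefaultComparerConfusesValues()
    {
        var visited = new HashSet<object> { 5 }; // boxes one int
        return visited.Contains(5);              // boxes another int
    }

    // With identity comparison the second box is a different object (returns true).
    public static bool IdentityComparerKeepsThemApart()
    {
        var visited = new HashSet<object>(IdentityComparer.Instance) { 5 };
        return !visited.Contains(5);
    }
}
```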

Using the code

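using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class Utilities
{
    // Compares objects by identity, so the visited set tracks references rather than
    // values (the default comparer would treat two distinct boxed 5s, or two nulls, as one).
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        public static readonly ReferenceComparer Instance = new ReferenceComparer();
        public new bool Equals(object x, object y) => ReferenceEquals(x, y);
        public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }

    /// <summary>
    /// Nice way to calculate the size of managed object!
    /// </summary>
    /// <typeparam name="TT">Static type of the object being measured.</typeparam>
    internal class Size<TT>
    {
        private readonly TT _obj;
        private readonly HashSet<object> references;
        // Size of a reference in the current process: 4 bytes in a 32-bit process,
        // 8 bytes in a 64-bit one (regardless of the operating system's bitness).
        private static readonly int PointerSize = IntPtr.Size;

        public Size(TT obj)
        {
            _obj = obj;
            references = new HashSet<object>(ReferenceComparer.Instance);
            if (obj != null) references.Add(obj);
        }

        public long GetSizeInBytes()
        {
            return this.GetSizeInBytes(_obj);
        }

        // Visits a field value or array element. Reference-type objects that were already
        // counted (shared or circular references) contribute nothing the second time.
        // Value types are boxed afresh on every read, so they are never deduplicated.
        private long Visit(object value)
        {
            if (value != null && !value.GetType().IsValueType && !references.Add(value))
                return 0;
            return this.GetSizeInBytes(value);
        }

        // The core functionality. Recursively calls itself (through Visit) when an object
        // has fields, until all fields have been visited or were already counted.
        private long GetSizeInBytes(object obj)
        {
            if (obj == null) return PointerSize; // a null reference still occupies a pointer-sized slot
            var type = obj.GetType();

            if (type.IsPrimitive || type.IsEnum)
            {
                // For an enum, Type.GetTypeCode returns the code of its underlying integral type.
                switch (Type.GetTypeCode(type))
                {
                    case TypeCode.Boolean:
                    case TypeCode.Byte:
                    case TypeCode.SByte:
                        return sizeof(byte);
                    case TypeCode.Char:
                        return sizeof(char);
                    case TypeCode.Single:
                        return sizeof(float);
                    case TypeCode.Double:
                        return sizeof(double);
                    case TypeCode.Int16:
                    case TypeCode.UInt16:
                        return sizeof(short);
                    case TypeCode.Int32:
                    case TypeCode.UInt32:
                        return sizeof(int);
                    case TypeCode.Int64:
                    case TypeCode.UInt64:
                        return sizeof(long);
                    default:
                        return PointerSize; // IntPtr and UIntPtr
                }
            }
            else if (obj is decimal)
            {
                return sizeof(decimal);
            }
            else if (obj is string str)
            {
                return sizeof(char) * str.Length;
            }
            else if (type.IsArray)
            {
                long size = PointerSize;
                foreach (var item in (IEnumerable)obj)
                {
                    // Elements go through the same visited check as fields,
                    // so an array that (indirectly) contains itself cannot loop forever.
                    size += this.Visit(item);
                }
                return size;
            }
            else if (obj is System.Reflection.Pointer)
            {
                // Boxed unmanaged pointer returned by FieldInfo.GetValue for pointer-typed
                // fields: count the pointer itself and never walk into it.
                return PointerSize;
            }
            else
            {
                // A reference-type instance carries an object header and a method-table
                // pointer (counted once); a struct is stored inline without them.
                long size = type.IsValueType ? 0 : 2L * PointerSize;
                for (var t = type; t != null; t = t.BaseType)
                {
                    var fields = t.GetFields(BindingFlags.Instance | BindingFlags.Public |
                            BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
                    foreach (var field in fields)
                    {
                        size += this.Visit(field.GetValue(obj));
                    }
                }
                return size;
            }
        }
    }

    // The actual, exposed method:
    public static long SizeInBytes<T>(this T someObject)
    {
        var temp = new Size<T>(someObject);
        return temp.GetSizeInBytes();
    }
}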

Points of interest

The trickiest part, once the whole recursive reference-jumping checked against the HashSet is understood, is System.Reflection.Pointer. It is a hellish creature to meet as a field value in reflection-based code: FieldInfo.GetValue returns the value of a pointer-typed field boxed in a Pointer object, pointer types are not CLS-compliant, and unless it is explicitly excluded, walking into it quickly causes a stack overflow.

Also note that generic collections, and even ArrayList, are not arrays in the sense of Type.IsArray. That is actually good: such an object falls through to the last case, where all its fields are counted, for example the size that is kept internally and incremented/decremented behind the scenes (the backing storage is itself an array field, which then takes the array branch).
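A quick way to check which branch of the helper a given object takes (the ArrayCheck class is hypothetical): only genuine arrays report Type.IsArray, while List&lt;T&gt; and ArrayList go through the field walk, which then reaches their internal fields.

```csharp
using System;
using System.Reflection;

public static class ArrayCheck
{
    // Mirrors the helper's branch choice: only true arrays take the
    // element-by-element path; collections fall through to the field walk.
    public static string Classify(object obj) =>
        obj.GetType().IsArray ? "array branch" : "field-walk branch";

    // Lists the instance fields reflection exposes for a type; for List<T> and
    // ArrayList these are private, and their names are a runtime implementation detail.
    public static string[] InstanceFieldNames(Type type)
    {
        var fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        return Array.ConvertAll(fields, f => f.Name);
    }
}
```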
