Jul 31, 2023

Structs in C# are fun - Part 2/9: A brief introduction to Value Types vs Reference Types

  1. Structs in C# are fun
  2. Brief introduction to Value Types vs Reference Types (this post)
  3. Field initialization in structs.
  4. Constructors and struct behavior.
  5. Other scenarios in which struct constructors behavior may surprise you.
  6. Struct with default argument values in constructors, a.k.a, are you not confused yet ?
  7. `required` feature from C# 11 will not save your a** job.
  8. Struct used as default argument values.
  9. Bonus: Struct evolution in C#.

Since the first official release of C# (version 1.0, back in January 2002), developers have faced a decision when introducing new types: declare them as a class (a reference type) or as a struct (a value type), represented in the .NET type system by System.Object and System.ValueType respectively.

Picking one over the other has non-trivial implications for usability, performance and extensibility, to list a few. In this post I want to briefly cover the main differences (more details will be presented in future posts) and clarify one misconception (which I am probably guilty of helping to spread). So, without further ado, let's get into the most important distinction between the two...

Reference type vs Value type semantics

The most important characteristic distinguishing these two kinds of types is how equality and assignment/parameter passing behave. The table below shows these differences (assuming the types in question do not override the Equals() method or overload the ==/!= operators):


| | Value Type | Reference Type |
|---|---|---|
| Assignment semantics | by value, i.e., the instance contents are copied, leaving two independent copies. | by reference, i.e., assignment just copies a reference, and changes made through either reference will be observed when accessing the object through the other one. |
| Equality (identity semantics) | two instances are equal if they are instances of the same type and all their fields are equal. | two instances are equal if they reference the same object (i.e., the exact same reference). |
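The equality row can be checked with a minimal sketch like the one below. The declarations of AValueType and AReferenceType (a single public int field named IntValue each) are assumptions made for illustration, mirroring the names used later in this post, and neither type overrides Equals().

```csharp
using System;

// Hypothetical types for illustration: one public int field each.
struct AValueType { public int IntValue; }
class AReferenceType { public int IntValue; }

static class EqualityDemo
{
    static void Main()
    {
        var s1 = new AValueType { IntValue = 42 };
        var s2 = new AValueType { IntValue = 42 };

        // The Equals() inherited from System.ValueType compares the fields.
        Console.WriteLine(s1.Equals(s2)); // True

        var c1 = new AReferenceType { IntValue = 42 };
        var c2 = new AReferenceType { IntValue = 42 };
        var c3 = c1; // copies the reference, not the object

        // The Equals() inherited from System.Object compares references.
        Console.WriteLine(c1.Equals(c2)); // False: same contents, different objects
        Console.WriteLine(c1.Equals(c3)); // True: same object
        Console.WriteLine(c1 == c2);      // False: == on classes compares references by default
    }
}
```

Note that `s1 == s2` would not compile here, since a struct only supports == if it declares that operator.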

In order to make it easier to visualise, imagine the following code:
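A minimal sketch of such code follows. The type declarations (one public int field named IntValue each) and the exact output format are assumptions; the statements follow the order of the walkthrough, but the line numbers mentioned in the text refer to the original listing and will not necessarily match this sketch.

```csharp
using System;

// Assumed declarations: one public int field each.
struct AValueType { public int IntValue; }
class AReferenceType { public int IntValue; }

static class Program
{
    static void Main()
    {
        var v1 = new AValueType { IntValue = 42 };
        var v3 = new AReferenceType { IntValue = 42 };

        var v2 = v1; // copies the contents of v1
        var v4 = v3; // copies the reference stored in v3

        Console.WriteLine($"{v1.IntValue} {v2.IntValue}"); // 42 42
        v2.IntValue = 71;                                   // only v2 changes
        Console.WriteLine($"{v1.IntValue} {v2.IntValue}"); // 42 71

        Console.WriteLine($"{v3.IntValue} {v4.IntValue}"); // 42 42
        v4.IntValue = 71;                                   // changes the object both variables reference
        Console.WriteLine($"{v3.IntValue} {v4.IntValue}"); // 71 71
    }
}
```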

After running the code through line 15, we can represent the memory used by v1 and v3, in an overly simplified way, as something like:


i.e., variable v1 is stored at address 0x100 while v3 is stored at address 0x400. Notice, though, that v3 does not store the actual instance data of the AReferenceType object. Instead, it stores a reference (or, in an oversimplification, a pointer) to the actual object allocated at address 0x1000. In contrast, AValueType instances store their data inline.

If we inspect the memory state when the program reaches line 19, we'd observe something like:

Notice that both variables (v1 & v3) had their contents copied to the newly declared ones (v2 & v4), but since v3 is of a reference type, copying its value means copying a reference, a fact that will become evident later.

When the code prints the IntValue fields of v1 & v2, the same value is displayed for both, but after storing the value 71 in v2.IntValue (at line 21), the memory will look like:

so line 22 will print the values 42 & 71; in other words, v1 & v2 are independent of each other.

Reference types work differently. Let's start with line 25, in which the IntValue fields of v3 and v4 are displayed. To figure out which values will be passed to the Console.WriteLine() method, first the value of v3 (i.e., 0x1000) is read from the location at address 0x400 (v3), and then the contents at the referenced address (actually the four bytes that make up an int in C#) are read, producing the value 42. Next, the same process is repeated for v4; since v4 references the same object as v3 (0x1000), it produces the same result.

Since a similar process is applied when changing the state of a reference type instance, after executing line 26, the memory will look like:

 

and line 27 will print 71 & 71, i.e., changes applied through v4 are observable through v3.
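Parameter passing follows the same rules as assignment. The sketch below is only an illustration: the two Change() overloads and the type declarations are hypothetical, not taken from the original listing.

```csharp
using System;

// Assumed declarations: one public int field each.
struct AValueType { public int IntValue; }
class AReferenceType { public int IntValue; }

static class ParameterPassingDemo
{
    // Receives a copy of the caller's struct; the change is lost on return.
    public static void Change(AValueType value)
    {
        value.IntValue = 71;
    }

    // Receives a copy of the reference; the change hits the caller's object.
    public static void Change(AReferenceType value)
    {
        value.IntValue = 71;
    }

    static void Main()
    {
        var v1 = new AValueType { IntValue = 42 };
        var v3 = new AReferenceType { IntValue = 42 };

        Change(v1);
        Change(v3);

        Console.WriteLine(v1.IntValue); // 42
        Console.WriteLine(v3.IntValue); // 71
    }
}
```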

Misconceptions

Over the years I have seen multiple articles, interviews and even some books claiming that one of the main differences between value types and reference types is that value types are always stored on the stack while reference types are always stored in the managed heap (as I mentioned above, most likely I am guilty of propagating this misconception too :().

This misconception is so widespread that Eric Lippert (who worked as one of the designers of C# in the past) wrote 2 blog posts to debunk it.

As Eric mentions, the fact that value types are usually allocated on the stack is an implementation detail, but IMO one that will hardly change; that said, this implementation behavior is important in scenarios where keeping heap allocations (and consequently GC pressure) low is crucial to achieving predictable performance characteristics.
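To see why "always on the stack" cannot be right, consider a few common cases in which a struct value typically ends up inside a heap object. This is a sketch under assumptions: the Holder class and the type declarations are hypothetical, and where the runtime actually places each value remains its own decision.

```csharp
using System;

// Assumed declaration: one public int field.
struct AValueType { public int IntValue; }

// Hypothetical class: its struct field is stored inline, inside the object itself.
class Holder { public AValueType Inline; }

static class StorageDemo
{
    static void Main()
    {
        // The struct is part of the Holder object, which is a reference type instance.
        var holder = new Holder();
        holder.Inline.IntValue = 42;

        // Arrays are reference types; struct elements are stored inline in the array object.
        var values = new AValueType[3];
        values[1].IntValue = 42;

        // Boxing copies the value into a separate object, so later
        // changes to the local variable do not affect the boxed copy.
        var local = new AValueType { IntValue = 42 };
        object boxed = local;
        local.IntValue = 71;

        Console.WriteLine(holder.Inline.IntValue);       // 42
        Console.WriteLine(values[1].IntValue);           // 42
        Console.WriteLine(((AValueType)boxed).IntValue); // 42
        Console.WriteLine(local.IntValue);               // 71
    }
}
```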

As always, all feedback is welcome.

Have fun!
