Tuesday, April 14, 2009

Smart Pointers != Managed Automatic Memory Management

I posted an answer on a stackoverflow question about C++ and garbage collection. My answer wasn't ideal and caused a bit of discussion in the comments so I thought I would just sum things up here.

Manual (Traditional) Memory Management

Manual memory management places the onus of memory management on the programmer. It is the style of memory management available to C and C++ developers. Here are some of the attributes of manual memory management:

  • Slow free-list allocation. To allocate memory in this style of memory management the runtime library consults a free list to find a free block. When allocating lots of blocks this can be slower than bump-pointer allocation.
  • No locality benefits. Blocks allocated sequentially won't usually be allocated next to each other. Later when some blocks are freed the remaining blocks a fixed in place and can't be moved.
  • Data is harder to share between modules. Module writers must manually decide who is responsible for freeing data and when it can be freed.
  • Invalid and dangling pointers are possible. Blocks can be freed multiple times. Every allocation must be checked. Etc, etc, etc..

Automatic Dynamic Memory Management

This is usually referred to as garbage collection and is the type of memory management used in Java, C#, Python, Ruby, etc, etc. Here are some memory management strategies possible with automatic GC:

  • Fast bump pointer allocation. Objects can be allocated using a bump pointer and then promoted to a free-list manage space only if they live. Alternatively a copying or compacting GC may always allocate with a bump pointer.
  • Locality benefits. Since objects can be moved and references updated objects can be allocated and later moved to improve locality. This results in better CPU cache usage.
  • No pointer problems, easy to share data between modules, etc, etc.

Smart Pointers

Some people argue that the answer to the problems of manual memory management is using smart pointers. They are certainly better than using nothing however they only provide the most primitive implementation of reference counting:

  • Unbounded cost on decrement. If the root of a large data structure is decremented to zero there is an unbounded cost to free all the data. (I personally find this property amusing since this problem is often an argument against GC)
  • Manual cycle collection. To prevent cyclic data structures from leaking memory the programmer must manually break any potential structures by replacing part of the cycle with a weak smart pointer. This is simply another source of potential defects.

First Class Reference Counting

Reference counting garbage collection has several advantages over smart pointers (beyond the base GC advantages).

  • Changes to an object reference count are ignored for stack and register references. Instead when a GC is triggered these objects are retained by collecting a root set.
  • Changes to the reference count can be deferred and processed in batches. This results in higher throughput.
  • It is possible to coalesce changes to the reference count. This makes it possible to ignore most changes to an objects reference count improving RC performance for frequently mutated references.
  • Incremental cycle detection cost. It is possible to process cycle detection tasks in bounded chunks at GC time.

Summary

Automatic memory management has several technical advantages over manual memory management. When you factor in the reduced cost of development and maintenance when using garbage collection you should be in front.

For details of a high performance reference counting GC implementation look at this paper.

2 comments:

Anonymous said...

I should point out that only old *nix implementations of malloc() linear search a free list. Every implementation I've read recently buckets based on size and allocates with a bump pointer in the hot path (sizes with large churn will use free lists, but since the list is of [aprox.] the same size due to bucketing, there is no need to search).

Do you have some links to describe this "Reference counted GC" you describe? I can't figure out how that would work.

Unknown said...

Doy, you give a link in the summary.