The following rewrites Sections 7.22 (General Utilities <stdlib.h>) and 7.22.3 (Memory management functions) of the C1X draft standard (June 2010) which is available at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1494.pdf. Your suggestions on this proposal are most welcome.

DEADLINE IS 15^th OCTOBER 2010

LAST UPDATED: 8^th SEPTEMBER 2010 (v1.0)

Big questions for this proposal still to be answered:

The type attribute of object alignment is one of the major features of the C1X standard (see Section 6.2.8). Given its pervasiveness throughout statically declared object storage, it would be more consistent if dynamically allocated storage also kept its alignment across calls to realloc without having to respecify it in the arguments as the API below does. Given that alignment ought to be a two’s power, this requires perhaps an extra four bits of storage per block allocated in the memory allocator implementation. Is this too great an overhead for a feature which is not presently used to a great extent? Note that with increasing vectorisation and stream computing CPU technologies, the chances are that alignment is going to become much more important in the near future.
How far ought we to go in adding introspection routines which allow the querying of the layout of virtual address space? Such querying isn’t of much use unless one can also force the allocator to try to use a certain address for an allocation.
Should we also add independent_malloc and independent_comalloc? These functions allocate a sequence (of same sized and differently sized blocks respectively) where that sequence is guaranteed to be consecutive in memory, but where each block can be thereafter treated independently. This can greatly improve cache locality for certain kinds of code implementation, and it may be a worthy addition as cache locality becomes ever more important in modern architectures.

Notes on this proposal:

Separate memory pools have been deliberately left out. Rationales:

There are plenty of existing third party memory allocators which already do this.
There are significant changes coming in the next decade in memory allocator implementation theory, and the ISO C specification is not the place to predict these developments. The exponentially rising number of CPU cores in processors will surely additionally introduce elements of non-unified and virtualized memory allocation theory, and one can see how separate memory pools could be an elegant solution to this problem.

One will surely note how the extra parameters, when defaulted to zero, equal the original C function. It is expected that malloc, calloc, realloc and free simply become parameter defaulting wrappers around the new functions – and equally, that on C++ the extra parameters are given a default of zero. If C ever gets default parameters then this case is covered.

The Proposed Text

7.22 General utilities <stdlib.h>

The header <stdlib.h> declares five types and several functions of general utility, and defines several macros.
The types declared are size_t and wchar_t (both described in 7.19),div_twhich is a structure type that is the type of the value returned by the div function,ldiv_twhich is a structure type that is the type of the value returned by the ldiv function, and
lldiv_t

which is a structure type that is the type of the value returned by the lldiv function.

The macros defined are NULL (described in 7.19);EXIT_FAILUREandEXIT_SUCCESS
which expand to integer constant expressions that can be used as the argument to the exit function to return unsuccessful or successful termination status, respectively, to the host environment;

RAND_MAX

which expands to an integer constant expression that is the maximum value returned by the rand function;

MB_CUR_MAX

which expands to a positive integer expression with type size_t that is the maximum number of bytes in a multibyte character for the extended character set specified by the current locale (category LC_CTYPE), which is never greater than MB_LEN_MAX;

M2_ZERO_MEMORY[ned1]

which requests malloc2 and realloc2 to return any newly allocated space initialized to all bits zero;

M2_PREVENT_MOVE[ned2]

which inhibits the address relocation of an object being resized via realloc2;

M2_CONSTANT_TIME[ned3]

which may cause the implementation to avoid any atypical (e.g. housekeeping) operations taking an unpredictable length of time during this particular operation;

M2_RESERVE_MULT(N)

which may cause malloc2 and realloc2 to reserve N times the initial allocation size of address space, thus allowing subsequent object size expansions via realloc2 up to that reservation to not relocate that object in memory; and

M2_RESERVE_SHIFT(N)

which may cause malloc2 and realloc2 to reserve 2^N bytes of address space for this object, thus allowing subsequent object size expansions via realloc2 up to that reservation to not relocate that object in memory, with this setting overriding M2_RESERVE_MULT(N) if also present. [ned4]

7.22.3 Memory management functions

The order and contiguity of storage allocated by successive calls to the calloc, malloc, malloc2, realloc and realloc2 functions is unspecified. The pointer, unless otherwise specified by a non-zero value for an alignment parameter, returned if the allocation succeeds is default aligned, which is defined as the alignment such that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated according to the values of the parameters supplied, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
If there is a flags parameter taken by the call, this consists of a bitwise addition (i.e. operator |) of the following macro defined flags as defined at the top of Section 7.22:

M2_ZERO_MEMORY
M2_PREVENT_MOVE
M2_CONSTANT_TIME
M2_RESERVE_MULT(N)
M2_RESERVE_SHIFT(N)

It is expected that implementations may add additional flags not specified here. The right to use up to half the bits provided by a uintmax_t in future versions of this specification is reserved.

7.22.3.1 The calloc function

Synopsis

1. #include <stdlib.h>

void *calloc(size_t nmemb, size_t size);

Description

The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.

Returns

The calloc function returns either a null pointer or a pointer to the allocated space.

7.22.3.2 The free function

Synopsis

1. #include <stdlib.h>

void free(void *ptr);

Description

The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free, free2, realloc or realloc2 function, the behavior is undefined.

Returns

The free function returns no value.

7.22.3.3 The free2 function

Synopsis

1. #include <stdlib.h>

void free2(void *ptr, uintmax_t flags);

Description

The free2 function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free, free2, realloc or realloc2 function, the behavior is undefined.

Returns

The free2 function returns no value.

7.22.3.4 The malloc function

Synopsis

1. #include <stdlib.h>

void *malloc(size_t size);

Description

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

Returns

The malloc function returns either a null pointer or a pointer to the allocated space.

7.22.3.5 The malloc2 function

Synopsis

1. #include <stdlib.h>

void *malloc2(size_t size, size_t alignment, uintmax_t flags);

Description

The malloc2 function allocates space for an object whose alignment is specified by alignment if that parameter is non-zero, whose size is specified by size, and whose value is indeterminate unless the flag M2_ZERO_MEMORY is specified. If non-zero, the value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Returns

The malloc2 function returns either a null pointer or a pointer to the allocated space.

7.22.3.6 The malloc_usable_size function[ned5]

Synopsis

1. #include <stdlib.h>

size_t malloc_usable_size(void *ptr);

Description

The malloc_usable_size function returns the size in bytes of the object pointed to by ptr. This will always be no smaller than the size requested when allocating or resizing the object, however it may or may not be larger.
If ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free, free2, realloc or realloc2 function, the behavior is undefined.

Returns

The malloc_usable_size function returns the usable space in bytes of the object pointed to by ptr.

7.22.3.7 The realloc function

Synopsis

1. #include <stdlib.h>

void *realloc(void *ptr, size_t size);

Description

The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free, free2, realloc or realloc2 function, the behavior is undefined. If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.

Returns

The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.

7.22.3.8 The realloc2 function[ned6]

Synopsis

1. #include <stdlib.h>

void *realloc2(void *ptr, size_t size, size_t alignment, uintmax_t flags);

Description

The realloc2 function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size and whose alignment is specified by alignment if that parameter is non-zero. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values unless flags contains M2_ZERO_MEMORY, in which case the newly allocated bytes are initialized to all bits zero.
If non-zero, the value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment. If the value of alignment differs from the value of alignment as specified when the old object pointed to by ptr was last allocated or resized, the behavior is undefined.
If ptr is a null pointer, the realloc2 function behaves like the malloc2 function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free, free2, realloc or realloc2 function, the behavior is undefined.
If ptr is non-zero and flags contains M2_PREVENT_MOVE, the pointer returned is guaranteed to either be ptr or a null pointer if the new object could not be allocated. If ptr is a null pointer and flags contains M2_PREVENT_MOVE, the realloc2 function behaves like the malloc2 function.
If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.

Returns

The realloc2 function returns a pointer to the new object (which will have the same value as a pointer to the old object if M2_PREVENT_MOVE was specified, and may have the same value as a pointer to the old object otherwise), or a null pointer if the new object could not be allocated.

[ned1]The choice of the M2_ macro prefix is from the M_ prefixed options for the mallopt function in SVID/XPG.

[ned2]This solves a huge problem for any code which can’t have reallocation suddenly move an object in address space (e.g. almost every object orientated language). Implementations may handle not supporting its usage by always returning a null pointer from realloc2 when this flag is used.

Example of usage in pseudo-C++:

template<typename T, typename A> void vector<T, A>::resize(size_t n)
{
  if(n>size()) {
    void *newdata;
    if(!(newdata=realloc2(data, n*sizeof(T), myalignment(),  M2_PREVENT_MOVE))) {
      if(type_traits<T>::isPOD)
        if(!(newdata=realloc2(data, n*sizeof(T), myalignment(), 0))) throw bad_alloc();
      else {
        if(!(newdata=malloc2(n*sizeof(T), myalignment(), 0))) throw  bad_alloc();
        move_construct(newdata, data, n);
        free(data);
      }
    }
    data=newdata;
  }
}

[ned3]This solves a very longstanding problem in interrupt handler implementations where a free function call could introduce catastrophic interrupt handling latency due to a sudden free space coalescing. This flag MAY inhibit such behaviour for the duration of the call, thus making life very much easier for operating system and driver implementations. This flag MAY also use a totally separate heap and heap implementation.

[ned4]These guys finally add virtual address space awareness to C code and those languages using the C memory allocator API. It is anticipated that they are combined with M2_PREVENT_MOVE such that (for example) repeatedly extending large arrays avoids almost all memory copying. They are however optional and may do absolutely nothing.

Example of usage in pseudo-C:

void *mem1=malloc2(1Mb, 0, M2_RESERVE_SHIFT(1Gb));
void *mem2=malloc2(4Kb, 0, 0); /* Would cause next realloc to relocate block normally */
void *mem3=realloc2(mem1, 1Gb, 0, M2_PREVENT_MOVE|M2_RESERVE_SHIFT(1Gb));
assert(mem1==mem3); /* True */

[ned5]Some may wonder why this function has been included – it is simply a codification of existing practice in that every major operating system out there already has an equivalent call. By the way, this call is very useful when building per-thread lookaside memory block caches as well as a whole load of other uses.

[ned6]This function is now the most complex in this section of the specification, and without doubt the hardest to “get right” if there ever can be such a thing. Comments on this are most welcome.

9 Comments

Jeffrey Yasskin

Posted September 22, 2010 at 10:00 pm | Permalink | Reply

1) realloc() is easy to use improperly in that a call like “ptr = realloc(ptr, …)” is always incorrect, even though that’s the overall semantics most uses want. A signature like “bool realloc(void**ptr, …)”, where realloc updates *ptr to the new address of the object and returns true iff the usable size has been updated, would be harder to use incorrectly. On the other hand, it would make it harder for LLVM’s (current) IR to capture the semantics.

2) To accommodate malloc_usable_size(), you probably need to update malloc’s spec from “The malloc function allocates space for an object whose size is specified by *size*” to “The malloc function allocates space for an object whose size is at least *size*”. And similarly for realloc.
- nialldouglas
  
  Posted September 23, 2010 at 5:32 pm | Permalink | Reply
  
  Point 1: Good idea. I’ve raised a poll for this at https://mallocv2.wordpress.com/polls/how-should-realloc-return-blocks/.
  
  Point 2: Well caught, thank you.
  - nialldouglas
    
    Posted September 27, 2010 at 5:12 pm | Permalink
    
    A thought I had:
    
    void* realloc2(void **mem, size_t bytes, size_t alignment, unsigned flags);
    
    T *mem=0;
    realloc2(&mem, sizeof(T), 0, 0);
    
    Unfortunately C will choke here because void ** won’t take a T**. You can get around this via templates in C++, but requiring:
    
    realloc2((void **) &mem, sizeof(T), 0, 0);
    
    … also looks ugly when compared to if((newmem=realloc2(oldmem))). I suppose you could always define realloc2 as a macro which does the appropriate casting, but that’s another can of worms.
    
    Niall
Gene B

Posted September 25, 2010 at 1:15 pm | Permalink | Reply

I think a better function would be:

void * try_realloc(void *ptr, size_t size);

which returns ptr if reallocation in place is performed or NULL otherwise. In any case the memory pointed by ptr should be preserved. Since memory address is the same, alignment doesn’t change and therefore, there is no need in alignment parameter. There is also no need in zeroing memory flag, as the user program can do it effortlessly whenever it’s needed, and in most cases it’s not needed, because memory is filled with real data after allocation anyway. There is no need in PREVENT_MOVE flag either, because the only interesting function is the one that doesn’t move.
- nialldouglas
  
  Posted September 27, 2010 at 5:18 pm | Permalink | Reply
  
  Thanks for the comment.
  
  After all these discussions I am tending towards removing malloc2 and free2 altogether, and coalescing the entire thing into:
  
  void* malloc_do(void *mem, size_t bytes, size_t alignment, uintmax_t flags);
  
  Now to allocate: T *t=malloc_do(0, sizeof(T), alignof(T), 0);
  To resize: T *t_=malloc_do(t, sizeof(T), alignof(T), 0);
  To free: malloc_do(t, 0, 0, 0);
  
  Thoughts?
  
  Niall
Jeffrey Yasskin

Posted September 27, 2010 at 7:33 pm | Permalink | Reply

I forwarded this to Lawrence Crowl, and he pointed out that tcmalloc spends a significant amount of time looking up the size of a freed block. He also made this comment to the C++ committee, but I think they’re likely to defer to C, rather than inventing something on their own. See US 26, “C++ Sized Deallocation” in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3102.pdf. To fix this problem, it’d be good to have a way to pass the object’s size to the new free() function, however it’s spelled.
- nialldouglas
  
  Posted September 29, 2010 at 1:38 pm | Permalink | Reply
  
  I take it that Lawrence Crowl is one of the authors of tcmalloc? If so, that’s great because the email provided by tcmalloc’s docs for contacting them bounces. Thanks for that.
  
  In the very first, non-public version of this proposal free2() took a size parameter which could be zero – in which case it meant the value returned by malloc_usable_size – or the size of the block being freed. My intention for adding it was more one of support for tiny bootstrap allocators which don’t store the size of the block rather than as a performance feature.
  
  Anyway it got shot down real early as it was viewed as opening a security hole, and if it didn’t then the logic checking it for a sane value would be slower than not specifying it at all. In other words, while it does make things considerably faster, in practical real world use it could not be allowed to be faster than not having it at all. Hence it was dropped.
  
  Niall
Jeffrey Yasskin

Posted September 27, 2010 at 8:00 pm | Permalink | Reply

I remembered another thing to think about: numa-node affinity. http://developer.amd.com/Assets/NUMA_aware_heap_memory_manager_article_final.pdf describes AMD’s work to make tcmalloc numa-aware, but their recommendations only cover the OS interface used by the memory allocator, not the allocator’s interface used by the user’s program. They assume that the allocating thread will be the heaviest user of the allocated block, which isn’t true in all applications, and it might be useful to tell the allocator which thread will actually use the result.

On the other hand, any clean interface I can think of for that gets into a lot of threading details, which may not belong in the core malloc interface.
- nialldouglas
  
  Posted September 29, 2010 at 2:02 pm | Permalink | Reply
  
  I think it’s still too early to make definitive assumptions regarding how NUMA will be implemented. I understand that what they’re going to try is that local core clusters will stay SMP as we know it today, but you’ll have “relaxed SMP” between the clusters which is a sort of lazy or delayed symmetry with high (asynchronous) latencies. So local cluster latencies might be in the dozens of clock cycles range but non-local clusters would be in the hundreds or thousands range. In such a scenario, one might have a thread-local pool, a cluster-local pool and then the process pool. However, equally, it might be that “relaxed SMP” might prove difficult or impossible to get working well in which case all of the above is no longer valid.
  
  I think adopting virtual memory awareness is a definite must – we’ve had it for decades now, and it won’t be going away. Explicit threading and NUMA support I think it’s still too early. After all, third party libraries can go ahead with such designs if they want and let the ISO standard adopt whatever emerges as best.
  
  Niall

A Collaborative Latency Reducing Enhancement Of The C/C++ malloc API

v1.00 of the C Proposal Text

The Proposed Text

9 Comments

Post a Comment

RSS Links

Blog Stats

Search

Blogroll

RSS Feeds

Meta

A Collaborative Latency Reducing Enhancement Of The C/C++ malloc API

v1.00 of the C Proposal Text

The Proposed Text

Share this:

9 Comments

Post a Comment

RSS Links

Blog Stats

Search

Blogroll

RSS Feeds

Meta