April 18, 2004
| Revision History | |
|---|---|
| Revision 1 | April 18, 2004 |
|
Initial version for comment. | |
Abstract
The Notes on Works in Progress series is a set of ramblings prompted by issues I have run into as I implement various projects. Think of it as a sort of ``hacker's diary,'' written by someone concerned as much about the art of programming as about the end result.
In this note, I discourse on an obscure corner of C++ that cost me nearly three months.
Every programming language has its share of complexities. C++ is certainly no exception. In fact, because C++ carries a heavy burden of legacy compatibility requirements with C, it may have (arguably) more than its share. This article describes a problem that I ran into while implementing the latest version of the OpenCM system that cost me several months and nearly led me to abandon a design path in a detrimental way.
By way of background, let me first describe the project. OpenCM is an open source configuration management system that is the joint effort of The EROS Group, Ltd and Johns Hopkins University. The system design and architecture are primarily my own, but a number of very talented people have contributed to the code.
OpenCM has evolved through several implementations in different languages. I initially (circa 200) did some quick prototyping in Python, but at the time there wasn't a standard high-speed Python implementation of the SHA-1 hash available. I implemented a version in Python itself, found that it was impossibly slow, and abandoned the language.
The next implementation, also in 2000 was done in Scheme. I'm a fan of Scheme, and I'ld already pretty much decided to use Scheme as the OpenCM scripting language for security reasons. At the time, it seemed like my life would be simpler if I used one language rather than two. At the time, I was exploring the wonderful DrScheme implementation, I had the OpenCM project near to hand, so I jumped in. The DrScheme environment is simply amazing, and Scheme is a beautiful language. Unfortunately, there isn't enough standardization of useful runtime environment in Scheme to do serious programming in it. For example, I couldn't get such basics as reliable file size information, which OpenCM unquestionably needed. Out went the Scheme implementation, with some regret.
At this point, it was time to choose a production implementation language for OpenCM. I considered Java, but in spite of the hype Java really doesn't run everywhere, and I wanted to be sure that we could run our stuff on EROS someday. I looked at GCJ (a native-code Java Compiler), but the class library wasn't ready yet. For reasons that will become clear below, the decision not to use GCJ was a good one, because GCJ relies on the Boehm-Weiser conservative garbage collector.
I flirted briefly with C++, but decided not to use it for two reasons:
It was clear by this point that the EROS project needed to get out of C++. Whatever its merits as an application-level programming language — which today are few, in my opinion[1] — C++ simply isn't appropriate for low-level systems code, because there is too much code transformation going on within the compiler to easily understand what is going on at the machine level.
In the absence of any experience, I was concerned that interfacing C++ code to a scripting language when we added one later might be difficult. It was clear, for example, that the scripting version of MzScheme (see the DrScheme pages) were designed to tie into C code rather than C++ code, and I was doubtful about how the C++ exception support requirements would interact with this.
None of these was a good reason to avoid C++ as a language for OpenCM. Probably I was just burned out on C++ at the time. In any case, I allowed the scripting language issue to persuade me, and I decided to use C as the OpenCM implementation language. The C language decision forced us to hack together an exceptions package. Exception handling is one of those incredibly wonderful features. It substantially reduces the amount of code devoted to error checking, which clarifies everything. Unfortunately, it's only easy to use if the programmer doesn't have to explicitly manage the storage leak consequences. Exceptions are a real pain in the neck if you have to handle all the storage consequences yourself. This led us to look creatively at storage allocation.
One lesson from the Python and Scheme attempts did stick with me: garbage collection surely is convenient when it works. Some form of automatic storage management is essential when you are using exception handling in a system like OpenCM. We tried, in sequence, two collectors before giving up on the idea of garbage collection. Each failed us.
I'ld heard good things about the Boehm-Weiser conservative garbage collector, and decided that we would rely on it for OpenCM. Because the Boehm collector is a conservative collector, it is a ``drop in'' solution when it works. There is no need to modify existing C code for compatibility. In fact, there is no need to have the source code at all. This was important to us because we were relying on existing, standard libraries that were not implemented with garbage collection in mind. On balance, the Boehm collector looked very promising, and in some respects this was a good design decision, because it allowed us to work around some memory allocation bugs in one version of OpenSSL.
Unfortunately, it didn't scale well. We experienced two problems with the Boehm collector. The first is purely a function of size. There are points in the execution of the OpenCM server where it has an absolutely enormous heap, almost all of which is reachable. Attempts to garbage collect this heap are essentially useless, because none of it is garbage. Unfortunately, the Boehm collector cannot tell this, and we found ourselves spending an absolutely incredible amount of time in the garbage collector.
This isn't an intrinsic problem with garbage collection. There are collectors that would do perfectly fine in this situation, but they all require the ability to relocate and compact objects. C isn't designed for collection, so it can't support these collectors. In any case, Boehm wasn't going to work for us,
The second problem might be described as ``obscurity.'' When we did run encounter problems with the Boehm collector, they were nearly impossible to debug. On occasion, we hit bugs within the collector itself, which was challenging. The bigger issue is that trying to determine why the collector wasn't working the way we expected was extremely difficult. It would have helped a lot to have the ability to do an annotating trace, but the Boehm collector doesn't provide this ability.[2]
In the end, we found that we were unable to keep the server heap size within manageable limits, and we were forced to look for a different solution. It's likely that we were the victims of false object aliasing, but we didn't have the time or the energy to run the problem down. A significant part of the problem was that the aliasing was coming out of libraries that we didn't control. We could create private versions, but the continuing maintainance and patch process involved would have killed the OpenCM project.
Having played with the Boehm Collector, liked the concept, but not found it a direct solution, we independently reinvented the concept of the collector that Keith Packard built for the Nickel language. This collector requires source annotation, but it isn't terribly invasive. We didn't actually use the Nickel collector directly, because we wanted to support our exception handling package, and that required some tricky handling. Nothing major, and the concept of our collector is essentially similar to that of the Nickel collector.
The key idea in the Nickel collector is that there is a natural point to do garbage collection — procedure returns. At the return statement from a procedure, you know that the only surviving pointer variable is the return variable. This makes it a good point to collect storage.
From a practical perspective, we found two problems with the Nickel-style collector:
We had some cases where there were loops that iterated over very large numbers of objects and needed to do collection inside the loop. These don't happen in many places in a given program, but when they do it is critical to handle them well, and the Nickel mechanism isn't friendly to them.
As with the Boehm collector, when something goes wrong it goes very wrong, and it's hard to track down.
In the end, we were defeated by a stray pointer error that clobbered the collector's data structures. This proved impractical to debug, and ultimately it convinced me that we needed to be in a somewhat safer programming language. First class exceptions and some ability to do storage management were critical in making a new selection.
Another key issue was that I had begun to see that the server would probably need to be multithreaded, and there simply aren't any good concurrent garbage collectors.
With a safer programming language in mind, we looked longingly at both Java and C#. I was eager to try C#, but I had users on a short fuse. While the C# runtime is well documented and relatively open, there are fast public implementations only for the x86 (the Mono project), and they were still too young to depend on for something as critical as source code control. This status has recently changed, and if I had it to do over, I might reconsider using Mono. One issue is that Mono uses the Boehm-Weiser collector, and we already knew that that wasn't working for us. The issues are a bit different, because Mono is in a position to make better use of the collector, but it's still a concern. Meanwhile, Java still had all the problems that concerned us originally.
So, with a great deal of trepidation, I chose C++ for the next attempt. This decision was motivated by three considerations:
The huge majority of our garbage generation was the result of string manipulation. The essential problem came about because of the absence of a first-class C string type; string copying cost wasn't the driving consideration. Simply by moving to the C++ std::string implementation we would eliminate the vast majority of the unmanaged storage and eliminate the majority of the pointers that appeared in the code.
Somewhere back before templates were incorporated into the language, I had built what was probably the first reference-counting pointer implementation for C++. With a mostly-working templates implementation, this would be a lot easier to build, and we could use it to implement storage management for OpenCM.
With the storage management issue mostly solved, we could go ahead and use the C++ exceptions mechanism directly.
All of which brings us to the actual point of this note.
It seems like everyone has done a smart pointer implementation in C++ at some point. For all I know, this may be partly my fault. Smart pointers were examined in one of the chapters of A C++ Toolkit, and something like 35,000 copies of that book were purchased, read, and presumably influenced the behavior of the first crop of mainstream C++ programmers. Whether or not it was my fault, a quick web search will find you 15 or 20 smart pointer implementations. Books on template programming will find you a bunch more.
All of them are wrong — including the one that I am going to describe below.
I don't mean that there are limits to how much a smart pointer can look like a real pointer. There are, and it's irritating, but that's largely beside the point. I mean that some of the really basic issues aren't addressed in most of the current implementations. My three favorite shortcomings are:
The following code fails for most smart pointer implementations:
class B : public class A {
};
. . .
SmartPtr<A> aPtr;
SmartPtr<B> bPtr;
aPtr = bPtr; // fails on most smart pointers
Most of them fail when using smart pointers to constant objects.
Most of them fail when using smart pointers to global variables. Sooner or later the global gets destroyed prematurely, and the system can later crash when the global destructor gets run on the same object.
The constant object problem is simple enough. The problem is that you cannot do invasive reference counting on constant objects because the reference count field is constant! I found this out the hard way. The global variable problem is actually trickier, because you need a way to determine whether the target pointer is global, and there isn't any standard way to do that. But the pointer assignment is the real issue, so let me focus on that.
The problem with the smart pointer assignment above is that the template mechanism is a lexical mechanism. The problem is that most smart pointer implementations define a naive assignment operator that looks something like:
template<class T>
class SmartPtr {
...
inline SmartPtr&
SmartPtr(const SmartPtr& other)
{
IncrementRefCount(other.pObject);
DecrementRefCount(pObject); // mine
pObject = other.pObject;
return *this;
}
};
Even though the object pointers embedded in aPtr and bPtr are assignment compatible, the two variables have different types, so the assignment fails to match the type signature of the assignment operator. Having figured this out the hard way, an excessively clever programmer (to wit, me) might say to themselves ``okay, what I need here is another template,'' and use the following assignment operator instead:
template<class T>
class SmartPtr {
...
template<typename T2>
inline SmartPtr& operator=(const SmartPtr<T2>& p)
{
// Following works around the fact that GCC doesn't
// know how to do template friend declarations.
T2* pOtherObject = &*p;
IncrementRefCount(pOtherObject); // theirs
DecrementRefCount(pObject); // mine
pObject = pOtherObject;
return *this;
}
};
Given this definition of the assignment operator, the assignment statement:
aPtr = bPtr;
works as expected, and (equally important) the reverse assignment:
bPtr = aPtr;
Fails with a type violation.
So I implemented this trick for both the assignment operator and the SmartPtr from pointer constructor: This is the one that lets me write things like:
aPtr = new B;
Imagine my surprise when my program crashed while executing the destructor on a globally declared SmartPtr. It turned out that the target object had already been destroyed!
Fortunately the culprit smart pointer was (a) global, and (b) assigned in only one place. This made it pretty easy to find the offending line of code:
aPtr = new A;
Those of you who can figure out what is wrong without reading the rest of this note are truly experts of C++ obscurity. Given your evident level of skill, you should probably try to master the Pentium protection architecture next. For the rest of you who were as stumped as I was, read on.
Stepping through this code with a debugger, my first surprise was that a SmartPtr constructor was getting run. It turned out that I had ommitted the ``assign from pointer type'' operator, and the compiler was helping me by using the ``construct from pointer type'' constructor. It was a good thing I had, or I would never have detected the problem here. There is probably a lesson in there somewhere, but at the moment it escapes me.
The second problem was that this temporary SmartPtr was then destructed, reducing the reference count of hte newly allocated target object to zero and promptly destroying it.
The question: how was the variable aPtr getting set up in a way that later prompted a second destruct?
The answer: in converting the assignment operator to template form, I had accidentally eliminated the default assignment overload, which is absolutely necessary for SmartPtr. I looked at the two template form, decided that it would work fine when T2 matched T, and happily went on with my development.
Not so fast.
The C++ compiler only tries template expansion when it doesn't have an operator already defined. Because the class definition did not specify an overload of the default assignment operator, the compiler did have an assignment operator defined: the default ``copy by fields'' assignment operator that C++ inherited from the C language. Given the presence of this ``exact match'' assignment operator, there was no need to consider template expansion, and my template assignment never got consulted.
The solution was to keep both the original and the two template assignment operators. This solved the problem. The program now compiles and executes correctly (in its current, truncated form), and I can go on to work on other issues.
While I've solved the immediate problem to my satisfaction, it took me three months to puzzle this one out. It's not just that the C++ behavior is non-obvious in this case. Rather, it's that the program clearly expressed the behavior that I desired and the actions of the compiler didn't match expectations. There was a clearly expressed assignment operator — it just wasn't considered as an option for default assignment. Several observations jump to mind from this experience.
The first is that I'm still not sure whether this is a compiler bug or not. It's clear enough that in the absence of an assignment operator the compiler generates a fieldwise copy, but it's not clear (to me, at least) if a templated assignment operator should have been considered first.
I'm sure that some language lawyer can give us an authoritative answer, but it doesn't really matter. C++ violates the expectations of even expert programmers. This isn't the only area where this is so, but so far it's the most expensive one (in terms of time lost) that I've run into. It would be useful to have some tool that, given some input C++ code, would render explicit all of the operations that the compiler will do to it. Perhaps this could be implemented as a variant of the existing compiler, in which the output is a line by line explanation of how each line is compiled.
Programming languages should support rather than surprise. For C++ this is a daunting challenge because of the legacy burden that it carries. It's time to start shifting to cleaner languages.
In abstract, I'ld love to switch this code base over t oJava or C#. God knows we've rewritten it enough times that we should be able to do so with our eyes more or less closed. At the end of the day, though, Java remains a language without any conforming, open reference implementation or a publicly available, conforming class library.
C# is in much the same state, with the added burden that Microsoft will continue its ``embrace and extend'' policy by providing an ever growing pool of proprietary libraries that are ``must have'' if your application is to look and feel current. Perhaps it's time to consider creating a branding program for ``Open C#'' to indicate code that relies exclusively on open, publicly specified library interfaces that have portable licenses. I'm not looking for Open Source, necessarily (though that would be nice too). Rather, I'm looking for libraries that are implemented exclusively within the CLR framework and talk to a universally available UI toolkit. C# and CLR should become the next development target platform, letting us ignore the Windows vs. UNIX issue entirely.
[1] To put that in perspective, my group at Bell Labs built the first code in C++ that was ever intended for commercial deployment, which was quite possibly the first commercial C++ code anywhere. I'm also the author of A C++ Toolkit, which was the first book on how to do reusable programming in C++. The point being only that I've been at this C++ thing for a while.
[2] The kind of annotating trace we needed requires (a) adding a user-defined word to every object so that we can determine what kind of object we are looking at (a user-defined type tag), and (b) having a version of the mark pass that called a user-defined function on every object. This would have allowed us to build a map of the reachable state and determine where the storage was going.