Delphi: Understanding and analysing access violation errors.
The CPU of a computer can run in a few different modes, such as real mode, protected mode and virtual 8086 mode (and, on 64-bit CPUs, long mode). When you use a modern operating system, your CPU runs in protected mode or long mode. In these modes, the operating system maintains tables that the CPU consults to decide which parts of memory a process can access, and in which ways. That means some process X may have the right to read and write some part of memory Y, but not to execute code there. To some other part of memory, it may have no access rights at all.
So what happens if an application does try to access a part of memory in a way that it doesn’t have the rights to? The CPU notices this and signals a fault, and Windows turns that fault into an exception with code 0xC0000005, which is called an “access violation”. The exception is then raised (or thrown, for the C++ people among you) in the application.
After that, a few things can happen. Code can set up protective fences on the stack to stop exceptions from travelling up to the operating system. In Delphi these fences are the try..except and try..finally blocks. A try..except block can “catch” the exception and do something to handle it; a try..finally block only guarantees that its cleanup code runs, after which the exception continues on its way. When no handler catches the exception, the operating system eventually does: it stops the application, cleans up its memory, and shows a message that the application has stopped because of an error.
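A minimal console sketch of both kinds of fence, assuming Delphi on Windows with SysUtils in the uses clause (SysUtils is what turns the hardware exception into an EAccessViolation object). It deliberately reads through a nil pointer; the try..finally block runs its cleanup and passes the exception on, and the try..except block catches it. Reading E.ExceptionRecord is Windows-specific, hence the {$IFDEF MSWINDOWS}.

```pascal
program CatchAccessViolation;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  P: PInteger;
begin
  P := nil;
  try
    try
      // Reading through a nil pointer touches address 0, which the process may not read
      WriteLn(P^);
    finally
      // Runs first, but does not stop the exception from travelling further up
      WriteLn('Cleanup in the finally block');
    end;
  except
    on E: EAccessViolation do
    begin
      // This handler stops the exception here, so the program carries on
      WriteLn('Caught: ', E.Message);
      {$IFDEF MSWINDOWS}
      // The raw Windows exception code behind EAccessViolation
      WriteLn('Exception code: $', IntToHex(E.ExceptionRecord^.ExceptionCode, 8));
      {$ENDIF}
    end;
  end;
  WriteLn('Still running');
end.
```

The exact message text, and whether addresses are shown with 8 or 16 hex digits, depends on the Delphi version and on whether you compile for 32 or 64 bits.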
To keep the application from simply disappearing, Delphi GUI applications catch the exception just before it would reach the operating system, display an error message, and (try to) continue their normal operation.
You can learn a few very valuable things from this error message. Let’s analyse some example access violation messages:
1. Access violation at address 5005FD2C in module ‘rtl260.bpl’. Read of address 00000000.
Here the exception occurred while executing code at memory location 5005FD2C, which looks like a normal location for an application’s code. Also note that it occurred in Delphi’s run time library (rtl260.bpl). The system doesn’t always tell us in which module it happened, but in this case, thankfully, it did. Just because the exception occurred in rtl260.bpl does not mean there is a bug in the run time library, though. More likely, something somewhere in the application created a situation where the run time library was eventually forced to do something that the hardware does not allow.
To find out more we need to look at the rest of the message, which says “Read of address 00000000”. That address is very interesting. When a pointer (such as a variable that refers to an object) is set to nil, its actual value becomes zero, which in 8-digit hexadecimal is formatted as 00000000. You can see this for yourself by casting nil to an integer and showing it on the screen, with ShowMessage() for example. So that 00000000 gives us a clue that some variable was nil here, which the programmer who wrote the offending code apparently did not expect. A small address such as 00000008 usually points the same way: a field was read through a nil object reference, and the address is that field’s offset from the start of the object.
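A small console sketch, assuming Delphi with the Classes and SysUtils units: it prints nil as a number to show that it really is 0, and shows the usual defence of checking Assigned() before using an object reference. PrintCount is a hypothetical helper made up for this example.

```pascal
program NilIsZero;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: checks for nil before touching the object
procedure PrintCount(List: TStringList);
begin
  if not Assigned(List) then
  begin
    WriteLn('List is nil, nothing to count');
    Exit;
  end;
  WriteLn('Count = ', List.Count);
end;

var
  P: Pointer;
  List: TStringList;
begin
  P := nil;
  // nil is stored as the number 0: prints 00000000
  WriteLn(IntToHex(NativeUInt(P), 8));

  List := nil;
  PrintCount(List); // safe: the guard catches the nil reference
end.
```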
The best way to find the cause is to reproduce the error while running the application in the debugger. If the exception occurs again while debugging, the debugger stops and lets you look at the code where the exception was raised. Because it happened in the run time library, that code may not make a lot of sense to you, but you can walk up the call stack (View > Debug Windows > Call Stack in the IDE) to find your way back to your own code. Look for a place where your code calls a function in the run time library and passes nil to it. It’s also possible that your code called a function in the Visual Component Library (VCL), passing an object that contained a field set to nil.
2. Access violation at address 00000000. Read of address 00000000.
This is a funny one. It looks like the application was running at address 0, but how is that possible? You might think address 0 in memory is where the kernel lives, but address 0 is simply not used by anything; Windows deliberately keeps the lowest part of the address space unmapped so that nil pointer mistakes are caught. Because the addresses an application uses are virtual, and not the same as physical locations in the actual hardware memory, virtual address 0 is not even assigned a physical location. Doesn’t that make it even stranger that the application was running at an address that doesn’t exist in physical memory? It may seem that way.
First, realise that the application wasn’t actually running at address 0; it was trying to. Next, consider that the destructor Destroy of an object is a virtual method. So what happens when you have a variable that refers to an object and you call Free() on it twice? You might think that after the first call the object’s code no longer exists in memory, so that the second call jumps into this “no-man’s-land”. But when you analyse the assembly code and the data of an object in memory, you find that code is never duplicated when an object is created. For a normal object, the only thing stored in memory is the object’s data; the code is shared by all objects of the same type. There is, however, one piece of data that many people are not aware of: every object starts with a hidden pointer to its class’s virtual method table. That table is administrative data that tells the system the address of each virtual method. It is what makes it possible to have a variable of type TObject that refers to a TStringList and still execute the correct destructor when freeing it: one entry in the virtual method table points to the destructor to use.
After the first Free, the object’s memory is returned to the memory manager, which may reuse or overwrite it, including that hidden pointer. The second Free then follows whatever data is there now to look up the destructor, and if the entry it ends up reading contains 0, the CPU is sent to address 0, which produces exactly this message. Calling a procedure variable or event handler that is nil causes the same message.
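A sketch that makes the hidden pointer and the virtual destructor visible, assuming Delphi’s usual object layout, in which the first pointer-sized field of an object is its class reference (which also points to its virtual method table). TNoisy is a hypothetical class invented for this example. The variable is declared as TObject, yet Free ends up in TNoisy.Destroy.

```pascal
program VirtualDestructor;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical class used only to make the destructor call visible
  TNoisy = class(TObject)
  public
    destructor Destroy; override;
  end;

destructor TNoisy.Destroy;
begin
  WriteLn('TNoisy.Destroy called');
  inherited;
end;

var
  Obj: TObject;
begin
  Obj := TNoisy.Create;

  // The first pointer-sized field of every object is the hidden class (VMT) reference
  WriteLn('Hidden pointer matches class: ',
    BoolToStr(PPointer(Obj)^ = Pointer(Obj.ClassType), True));
  WriteLn('Actual class: ', Obj.ClassName);

  // Free is declared on TObject, yet TNoisy.Destroy runs: Destroy is looked up through the VMT
  Obj.Free;
end.
```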
To solve this problem, you need to find out who “owns” the object. Every creation must be matched with exactly one free. The function FreeAndNil(x) can help a lot here: it frees the object and changes the value of x to nil, so that when you call Free() on it again, Free() sees that Self = nil and exits instead of calling the destructor, preventing the problem. Note that FreeAndNil() only resets the variable you pass to it; any other variable that refers to the same object still points at freed memory. Also be aware that FreeAndNil() is a workaround: it hides a second Free instead of removing it, and every redundant call costs a little processing time for no benefit. That being said, I find FreeAndNil() worth that extra bit of processing, because freeing an object more than once can have far nastier side effects, such as memory corruption, leading to data corruption, which, if you’re really unlucky, can even damage your company’s reputation.
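A minimal sketch of FreeAndNil in the usual try..finally pattern, assuming Delphi with the Classes and SysUtils units. The final List.Free only illustrates that a second Free becomes harmless; in real code you would not write it deliberately.

```pascal
program FreeAndNilDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('hello');
    WriteLn('Count = ', List.Count);
  finally
    // Destroys the object and sets List to nil in one step
    FreeAndNil(List);
  end;

  WriteLn('List is nil: ', BoolToStr(List = nil, True));

  // A second Free is now harmless: TObject.Free only calls Destroy when Self <> nil
  List.Free;
end.
```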
3. Access violation at address 5005FD2C. Read of address 5005FD2C.
This one is trickier. The address where the exception occurred looks valid, but the address that could not be read is that very same address. In other words, the CPU could not even read the instructions it was supposed to execute there, so it was not a valid address after all; otherwise there would have been no violation. But unless some memory corruption has occurred, it must have been a valid address at some point. That leaves a few possible situations:
- a. Memory corruption has occurred, altering the value of a pointer to a function. When that function pointer was used, the application tried to jump to an invalid address, resulting in this access violation.
- b. There was no memory corruption, and the address used to be valid but no longer is, because the code at that location was removed from memory. This can happen when a library is loaded dynamically at runtime, the application takes a pointer to a function in it, the library is unloaded, and then the application calls the function through that pointer even though the code is no longer in memory. The same can happen with a method of a class from a dynamically loaded library (or package).
- c. The application uses an advanced technique called self-modifying code, which is outside the scope of this article. In short: a pointer referred to a function that was generated at runtime in dynamic memory with execute privileges, but after that dynamic memory was freed, the pointer was not reset to nil, and something tried to call the generated code that was no longer available.
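A sketch of case b and the matching fix, assuming Delphi on Windows. MyPlugin.dll, its exported procedure DoWork and the TDoWork signature are all hypothetical. The point is the finally block: once FreeLibrary has unloaded the code, the procedure variable is reset to nil so that it can no longer send the CPU to an address that is gone.

```pascal
program PluginLifetime;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical signature of a procedure exported by a hypothetical plugin
  TDoWork = procedure; stdcall;

var
  Lib: HMODULE;
  DoWork: TDoWork;
begin
  DoWork := nil;
  Lib := LoadLibrary('MyPlugin.dll'); // hypothetical DLL name
  if Lib = 0 then
  begin
    WriteLn('MyPlugin.dll could not be loaded');
    Exit;
  end;
  try
    @DoWork := GetProcAddress(Lib, 'DoWork'); // hypothetical export name
    if Assigned(DoWork) then
      DoWork;
  finally
    FreeLibrary(Lib);
    // The code DoWork pointed to has just been unloaded: forget the stale address
    DoWork := nil;
  end;

  // Without the reset above, calling DoWork here could end in
  // "Access violation at address X. Read of address X"
  if Assigned(DoWork) then
    DoWork
  else
    WriteLn('Plugin unloaded, DoWork cleared');
end.
```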
4. Access violation at address 016FE0B0. Execution of address 016FE0B0.
Here the application tried to execute code at the given address. Note that the message says “Execution of” rather than “Read of”: the memory at that address was accessible, most likely even readable, so the pointer was potentially a valid one; it just was not allowed to be used for executing instructions.
This can happen when the application uses self-modifying code: it generates some machine code in dynamic memory and then jumps to that memory to execute it. Windows has an option, Data Execution Prevention (DEP), that checks whether a process has execute rights on the area of memory it tries to run code in. It is a good idea to have it on by default as an extra security measure against viruses, spyware, ransomware and hackers. Not all software is compatible with it, so some applications might break when you switch it on, but these should be few.
This access violation suggests that this is what happened: the option was enabled, the application allocated some dynamic memory, put some machine code there and tried to jump to it, but forgot to request execute permission on that memory. Another possibility is that memory corruption occurred and a pointer to a function was overwritten with data, so that at some point the application jumped to the specified address, on which it had read permission but no execute permission. A third possibility is that a malicious party found a way to insert executable instructions into a memory buffer via input and made the application jump to it, but (fortunately) the application did not have execute permission on that memory. This is exactly the kind of abuse this protection was designed to prevent.
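A sketch of generating code the DEP-friendly way, assuming Delphi on Windows for x86 or x64 and a 4096-byte page size. It writes a single ret instruction ($C3) into memory that starts out as read/write only, then uses VirtualProtect to switch it to execute/read before calling it. If DEP is enforced and the VirtualProtect step is left out, the call is where an “Execution of address” access violation would be expected.

```pascal
program ExecutableMemory;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  TGeneratedProc = procedure;

const
  PageSize = 4096; // assumption: a typical x86/x64 page size

var
  Mem: Pointer;
  OldProtect: DWORD;
  Generated: TGeneratedProc;
begin
  // Start with memory that may be read and written, but not executed
  Mem := VirtualAlloc(nil, PageSize, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
  if Mem = nil then
    RaiseLastOSError;
  try
    // $C3 is the x86/x64 'ret' instruction: a function that returns immediately
    PByte(Mem)^ := $C3;

    // Calling Mem at this point could fail under DEP with an "Execution of address" error.
    // Ask for execute rights (and give up write rights) before jumping there.
    if not VirtualProtect(Mem, PageSize, PAGE_EXECUTE_READ, OldProtect) then
      RaiseLastOSError;
    FlushInstructionCache(GetCurrentProcess, Mem, PageSize);

    @Generated := Mem;
    Generated;
    WriteLn('Generated code ran without an access violation');
  finally
    VirtualFree(Mem, 0, MEM_RELEASE);
  end;
end.
```

Allocating with PAGE_EXECUTE_READWRITE would also avoid the exception, but keeping memory writable and executable at the same time weakens exactly the protection DEP provides.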
So do you always get an access violation when memory access goes bad?
No, unfortunately not. When a pointer goes bad or you access memory in a way you didn’t intend, but your application still has the right to access that memory in the way it tried to, you won’t get an exception, even though the data you get might be garbage.
When you allocate a bit of dynamic memory, the memory manager does not simply ask the operating system for exactly the amount you requested. It usually reserves larger chunks (several kilobytes or more) and hands out small pieces of them to your application, keeping its own administration of which pieces are in use. So when you request multiple small pieces of memory (say, 2 blocks of 8 bytes), you may get two pointers that both lie within the same chunk the memory manager set aside. As long as any piece of such a chunk is still in use, the whole chunk stays allocated to your process, so the CPU allows access anywhere within it. The two blocks may even be next to each other in the address space, for instance from $10000000 to $1000000F, for a total of 16 bytes. (Real memory managers often put a small header or some padding between blocks, but the principle is the same.) If you then use the first pointer and write 10 bytes to it, you will overwrite the first two bytes of the second block, corrupting its data. This is probably not what you intended, but because your application has the right to access that memory, you don’t get an access violation for it.
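A deterministic simulation of that example, which deliberately does not rely on how the real memory manager lays out blocks: a 16-byte array stands in for memory the process owns, and First and Second are two hypothetical 8-byte blocks inside it. Writing 10 bytes through First silently changes the first two bytes of Second, and no exception is raised, because every byte written belongs to the process.

```pascal
program SilentOverflow;

{$APPTYPE CONSOLE}
{$POINTERMATH ON}

uses
  SysUtils;

var
  Arena: array[0..15] of Byte;  // stands in for 16 bytes the process is allowed to use
  First, Second: PByte;         // two hypothetical 8-byte blocks inside it
  Source: array[0..9] of Byte;  // 10 bytes: 2 more than First can hold
  I: Integer;
begin
  FillChar(Arena, SizeOf(Arena), 0);
  First := @Arena[0];
  Second := @Arena[8];
  FillChar(Second^, 8, $AA);            // the second block's own data
  FillChar(Source, SizeOf(Source), $11);

  // Writes 10 bytes into an 8-byte block: no exception, the memory is ours
  Move(Source, First^, SizeOf(Source));

  // prints: 11 11 AA AA AA AA AA AA
  for I := 0 to 7 do
    Write(IntToHex(Second[I], 2), ' ');
  WriteLn;
end.
```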
In the end, it always comes down to knowing what you’re doing with memory. Before you write TStringList.Create, think about which part of the application is responsible for freeing that object. Usually it’s a good idea to make the code that creates an object also responsible for freeing it. With a factory pattern, however, that doesn’t work, because the factory hands the new object over to its caller, so you need to decide explicitly where that responsibility goes.
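A sketch of how ownership can be made explicit with a factory, assuming Delphi with the Classes unit. CreateGreetingList is a hypothetical factory: it frees the list itself only if filling it fails, and otherwise hands ownership to the caller, who pairs the single Create with a single Free in a try..finally block.

```pascal
program OwnershipDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical factory: creates the list, then hands ownership to the caller
function CreateGreetingList: TStringList;
begin
  Result := TStringList.Create;
  try
    Result.Add('hello');
    Result.Add('world');
  except
    // If filling fails, nobody else will ever free this object
    Result.Free;
    raise;
  end;
end;

var
  Greetings: TStringList;
begin
  Greetings := CreateGreetingList; // from here on, this code owns Greetings
  try
    WriteLn(Greetings.CommaText);
  finally
    Greetings.Free; // exactly one Free for the one Create
  end;
end.
```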