
DLL Dump utility -- using Reflection to explore a .NET DLL or EXE

This article discusses the techniques and structures used to access type and structure information in a DLL or EXE file created by .NET. It uses Reflection, a .NET technology, to read information inside the DLL. The source and project files are available for download, and the sample application can be run as well. The classes in the DLLWrapper DLL are minimal, general purpose, and meant to be extended, so they should be straightforward to customize for your application. You could use them, for instance, to get a list of all the methods or properties in a DLL.

The sample code and application associated with this article are located at this link.

This is probably a familiar tale. I was working on a project for a client who had had different programmers working on pieces of his app at different times, and there was no centralized source code store like Subversion or SourceSafe, so different versions of source and binaries were floating around. The task was to enhance one of the many small programs in use. Although it required only a few lines of code, the major hurdle was creating a build, and to do that I needed the source. Well, there was an old version here with source, another version there that was compiled, and so on.

Now, there was a running version, so of course that's the authority, and I needed source as close to it as possible. I realized I needed a quick and easy way to see how alike (or not) the multiple DLLs holding various versions of the code were. I didn't need a full decompile of each DLL; just getting the names and parameters of the methods and properties would tell me most of what I needed to know. Then I could decide whether to decompile or do something else, but there was a good chance the source was around somewhere, so it was at least worth a peek into the various versions.

While you can use something like ILDASM or the Object Browser to poke inside a DLL, I really wanted an ASCII listing with sections sorted by method and property name, so I could put different versions side by side in plain old Notepad and quickly see the differences.

I wanted to be able to run this at a command prompt:

dlldump someversion.dll > out.txt
Actually, what I really wanted was this:
dlldiff someversion.dll anotherversion.dll > out.txt

However, I'd settle for text files of different dumps and just look at the damn things in Notepad. The key was knowing what I needed, and what I didn't need, for the task at hand.
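The core of the dlldiff idea can be sketched with plain Reflection, before any wrapper classes come into it. This is a minimal sketch, not the author's utility: the class name DllDiff and its methods Signatures and Diff are hypothetical, and the sketch relies on MemberInfo.ToString() to produce readable signature text for each public member of each exported type. Lines found only in the first set are prefixed "- ", and lines found only in the second set are prefixed "+ ".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper for a "dlldiff"-style comparison; not part of DLLWrapper.
public static class DllDiff
{
    // One line per public member of each public (exported) type:
    // the type's full name, ": ", then whatever MemberInfo.ToString() returns.
    public static IEnumerable<string> Signatures(Assembly assembly)
    {
        foreach (Type type in assembly.GetExportedTypes())
            foreach (MemberInfo member in type.GetMembers())
                yield return type.FullName + ": " + member;
    }

    // Lines only in 'left' come first, prefixed "- "; lines only in 'right' follow, prefixed "+ ".
    // Each group is sorted ordinally so the output is stable from run to run.
    public static List<string> Diff(IEnumerable<string> left, IEnumerable<string> right)
    {
        var l = new HashSet<string>(left);
        var r = new HashSet<string>(right);
        var result = new List<string>();
        result.AddRange(l.Where(s => !r.Contains(s)).OrderBy(s => s, StringComparer.Ordinal).Select(s => "- " + s));
        result.AddRange(r.Where(s => !l.Contains(s)).OrderBy(s => s, StringComparer.Ordinal).Select(s => "+ " + s));
        return result;
    }
}
```

A command-line front end could call Diff(Signatures(Assembly.LoadFrom(args[0])), Signatures(Assembly.LoadFrom(args[1]))) and print each line. One caveat: if both files carry the same assembly identity, the runtime may hand back the copy it has already loaded instead of loading the second file. Dumping each DLL in its own process, dlldump style, and comparing the text files avoids that problem.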

One of the most potent things about .NET is that it keeps type information around, both in compiled executables and at runtime, and uses a technology called "Reflection" to give programs access to it.

Traditionally, type information was considered something only human programmers needed, and it was discarded when compiling a release version. Assembler language is all integers and floats and doesn't need no stinkin' types!

I had known about Reflection for some time, although I had done very little with it. There is excellent documentation for it, so this seemed like a good time to become compulsive and stay up until midnight writing a utility that I would spend 10 minutes actually using. But hey, I got an article and another sample application out of it, plus I now have my dlldump utility, as well as classes I can use as a base for future work of this sort.

Plus, it's structured as a DLL of classes, so you can take it and use it as the starting point for your own utility, because the exact thing you need will be different from what I need.

Starting from the top

The highest-level construct in this world is the "assembly", which is, intuitively, just a DLL or an executable. This code loads a DLL from a file using Reflection:
Assembly assembly = System.Reflection.Assembly.LoadFrom(file);
Once that's done, you have a world of data structures and collections available to you. Let's march down the tree for the parts I used in this app.
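Since the goal here was telling versions apart, a natural first thing to print is the assembly's identity. The sketch below is hypothetical (AssemblyInfo.Describe is not part of DLLWrapper). It uses only standard members: Assembly.GetName(), Assembly.ImageRuntimeVersion, and the ModuleVersionId GUID of the manifest module, which the compiler stamps into each module it produces. Two files with different MVIDs came out of different compilations, even when their version numbers match.

```csharp
using System;
using System.Reflection;

// Hypothetical helper; not part of DLLWrapper.
public static class AssemblyInfo
{
    // One-line identity summary: simple name, version, module version id (MVID),
    // and the runtime version the image was built against.
    public static string Describe(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        Guid mvid = assembly.ManifestModule.ModuleVersionId;
        return string.Format("{0} version {1} (MVID {2}, runtime {3})",
            name.Name, name.Version, mvid, assembly.ImageRuntimeVersion);
    }
}
```

Printing this as the first line of each dump makes it obvious when two "different" DLLs are actually the same build.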

An assembly contains a set of Types

A DLL, especially a class library, will contain some number of related classes, and of course a class is a type. Here is a loop that processes all the types (classes) in a DLL:
foreach (System.Type type in assembly.GetTypes()) {
	// sample fields: type.FullName, type.GUID
}        
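One practical snag arises when pointing this loop at an arbitrary DLL. GetTypes() throws a ReflectionTypeLoadException if some of the types reference assemblies that can't be found, which is common with DLLs gathered from different machines. The exception's Types property still holds the types that did load, with null entries for the rest. Below is a minimal sketch that keeps the loadable types and sorts them by full name so two dumps line up. TypeLister is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper; not part of DLLWrapper.
public static class TypeLister
{
    // Returns every type that could be loaded, sorted by full name.
    public static Type[] GetLoadableTypes(Assembly assembly)
    {
        Type[] types;
        try
        {
            types = assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // Types that failed to load (e.g. a missing dependency) appear as null entries;
            // ex.LoaderExceptions explains why each one failed.
            types = ex.Types.Where(t => t != null).ToArray();
        }
        Array.Sort(types, (a, b) => string.CompareOrdinal(a.FullName, b.FullName));
        return types;
    }
}
```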

A type contains Constructors, Methods, Properties, etc.

Your class definition contains the things that make up the class: all of the methods, properties, and other members. Here is some code that shows how they are accessed:
/*
 * Process the list of constructors
 */
foreach (ConstructorInfo info in type.GetConstructors())
{
	// do something with 'info'
}
/*
 * Process the methods in this class
 */
foreach (MethodInfo info in type.GetMethods())
{
    // do something with info
} 
/*
 * Process the list of properties
 */
foreach (PropertyInfo info in type.GetProperties())
{
     // do something with info
}
As you can see, the System.Reflection namespace gives us a type for each aspect of a class: ConstructorInfo, MethodInfo, PropertyInfo, and so on.
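This is worth knowing when comparing versions. The parameterless GetMethods() returns only public methods, includes methods inherited from base classes (for example GetType from System.Object shows up on every type), and includes the get_X/set_X accessor methods behind properties. Below is a sketch of a stricter listing that uses BindingFlags.DeclaredOnly and MethodBase.IsSpecialName. DeclaredMembers is a hypothetical helper name, not part of DLLWrapper.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper; not part of DLLWrapper.
public static class DeclaredMembers
{
    // Public and non-public, instance and static, but only members declared on the type itself.
    const BindingFlags Declared = BindingFlags.Public | BindingFlags.NonPublic |
                                  BindingFlags.Instance | BindingFlags.Static |
                                  BindingFlags.DeclaredOnly;

    // Distinct method names declared directly on 'type', sorted ordinally.
    // Accessors such as get_X/set_X (and operators like op_Equality) have
    // IsSpecialName set, so they are filtered out.
    public static string[] MethodNames(Type type)
    {
        return type.GetMethods(Declared)
                   .Where(m => !m.IsSpecialName)
                   .Select(m => m.Name)
                   .Distinct()
                   .OrderBy(n => n, StringComparer.Ordinal)
                   .ToArray();
    }
}
```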

Constructors, Methods, Properties, etc. have various properties

As you drill further down into the types that Reflection exposes, you get specific information about each constructor, method, property, and so on, as this sample code for a method shows:
void dostuff(MethodInfo info) {
	string s;
	// here is the method name
	s = info.Name;
	
	// here is the method return type
	s = info.ReturnType.Name; 
	
	// Here is the parameter list
	ParameterInfo[] plist = info.GetParameters();
	
	// Let's reach down further to show the name and type of the parameters.
	// This gives us enough information to build a prototype.
	
	foreach (ParameterInfo pinfo in plist) {
		s = pinfo.ParameterType.Name; 	// type of parameter
		s = pinfo.Name;					// name of parameter
	}
	
	// check this out. We can get the MSIL code for this method.
	// GetMethodBody() returns null for abstract or extern methods, so check first.
	MethodBody mb = info.GetMethodBody();
	if (mb != null) {
		byte[] il = mb.GetILAsByteArray();
		// the il array contains the MSIL executable code. We can
		// use other Reflection classes to help decompile it. cool, huh?
	}
}
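The comment about building a prototype can be made concrete. Below is a minimal sketch (Prototype.Format is a hypothetical name) that turns a MethodInfo into a one-line "ReturnType Name(ParamType paramName, ...)" string, using the same short Type.Name values as the code above. Generic methods and params arrays get no special treatment here. ref and out parameters simply show up with a trailing "&" on the type name.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical helper; not part of DLLWrapper.
public static class Prototype
{
    // Builds "ReturnType Name(ParamType paramName, ...)" from a MethodInfo.
    public static string Format(MethodInfo info)
    {
        string parameters = string.Join(", ",
            info.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name));
        return info.ReturnType.Name + " " + info.Name + "(" + parameters + ")";
    }
}
```

For example, String.Substring(int, int) formats as "String Substring(Int32 startIndex, Int32 length)", which sorts and diffs nicely as plain text.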

The DLLWrapper classes

Because the Reflection types contain all of the information we need, it doesn't make sense to create a data structure that duplicates it. As a result, when you look at the classes in the DLLWrapper DLL, you will see that they are, for the most part, just wrappers.

The DLLWrapper DLL contains these classes, and most are just a layer around the corresponding Reflection class:

  • DLL -- container for all the types in a DLL
  • OneType -- Information about a single type in the DLL
  • Constructor -- holds the ConstructorInfo
  • Event -- holds the EventInfo
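  • Interface -- holds information about an implemented interface (Reflection has no InterfaceInfo type; interfaces are Type objects returned by Type.GetInterfaces())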
  • Member -- holds the MemberInfo
  • Method -- holds the MethodInfo
  • Parameter -- holds the ParameterInfo
  • Property -- holds the PropertyInfo
  • Disassembler -- has logic to disassemble MSIL code

Declaration and initialization look pretty ordinary; each of my classes wraps an instance of the corresponding Reflection type:

using System.Reflection;

public class Parameter
{
    System.Reflection.ParameterInfo info;

    public Parameter(ParameterInfo info)
    {
        this.info = info;
    }
}
The idea is that each wrapper gives me a place to keep the Reflection object where I can get to it, and also a specific place to write the logic I care about: the smart access I will want in any particular situation. The alternative is a container class with many methods that operate on the built-in Reflection types, and that gets cluttered pretty quickly.
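To illustrate the "smart access" idea, here is a hypothetical wrapper in the same spirit. PropertyWrapper is not the DLLWrapper Property class, and its IsReadOnly and IsIndexer properties are invented for this example. It keeps the PropertyInfo, passes the name straight through instead of copying it, and adds two small pieces of tool-specific logic.

```csharp
using System.Reflection;

// Hypothetical wrapper in the style described above; not the DLLWrapper Property class.
public class PropertyWrapper
{
    private readonly PropertyInfo info;

    public PropertyWrapper(PropertyInfo info)
    {
        this.info = info;
    }

    // Pass-through: the name already lives in the PropertyInfo, so nothing is copied.
    public string Name
    {
        get { return info.Name; }
    }

    // "Smart access": small pieces of logic specific to this tool.
    public bool IsReadOnly
    {
        get { return info.CanRead && !info.CanWrite; }
    }

    // Indexers show up in Reflection as properties that take index parameters.
    public bool IsIndexer
    {
        get { return info.GetIndexParameters().Length > 0; }
    }
}
```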

My first application is a simple ASCII dump, so I added a ToString(StringWriter) method to each of the classes that writes out the values of the item. Here is the one for the Property class:

public void ToString(StringWriter sw)   // StringWriter is in System.IO
{
    // e.g. "String Name { get { } set { } }"; "{{" writes a literal brace
    sw.Write("{0} {1} {{",
        info.PropertyType.Name,
        info.Name);

    if (info.CanRead) sw.Write(" get { }");
    if (info.CanWrite) sw.Write(" set { }");

    sw.Write(" }");
}
I'm using a StringWriter to collect the output so I can get to it later; the StringWriter class is very handy for building large strings. Note how I refer back to the 'info' member, which contains all of the information, to get what I want. The property name, for instance, is already stored there, so I don't need to create a "Name" property of my own; I can get it whenever I need it.
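This sketch puts the StringWriter approach together with the name-sorted output the author was after. PropertyDump.DumpProperties is a hypothetical standalone helper that writes one line per public property of a type, ordered by name, in a "Type Name { get { } set { } }" style. It talks to PropertyInfo directly rather than going through the DLLWrapper classes.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

// Hypothetical standalone helper; not part of DLLWrapper.
public static class PropertyDump
{
    // One line per public property, ordered by name so that dumps of two versions line up.
    public static string DumpProperties(Type type)
    {
        using (StringWriter sw = new StringWriter())
        {
            foreach (PropertyInfo info in type.GetProperties().OrderBy(p => p.Name, StringComparer.Ordinal))
            {
                // "{{" in a format string writes a literal "{".
                sw.Write("{0} {1} {{", info.PropertyType.Name, info.Name);
                // These calls use Write(string), so the braces are written as-is.
                if (info.CanRead) sw.Write(" get { }");
                if (info.CanWrite) sw.Write(" set { }");
                sw.WriteLine(" }");
            }
            return sw.ToString();
        }
    }
}
```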

Using the Wrapper classes

Here is how to use the wrapper classes:
using System;
using System.Reflection;
using DLLWrapper;
/*
 * Load the assembly. LoadFrom never returns null; it throws instead
 * (for example BadImageFormatException if the file is not a .NET assembly).
 */
Assembly assembly;
try {
    assembly = Assembly.LoadFrom(file);
} catch (BadImageFormatException ex) {
    throw new Exception("Invalid DLL: " + file, ex);
}
/*
 * Make a pass through the DLL to set up easy access to info
 */
DLL thedll = new DLL(assembly);
/*
 * At this point it's loaded. Add your own methods to get what you want;
 * each of the things you need has easy access to the Reflection type
 * that contains all of the detail.
 */

Adding in a disassembler, just for fun

I really didn't need disassembly for my project, but when I saw how easy it was to get at the byte codes of a method, I was compelled to see how much more effort it would take to disassemble them into text. The language is called CIL (Common Intermediate Language); it was formerly called MSIL (Microsoft Intermediate Language).

Because of my operating system work, I've spent a fair amount of time around assemblers, disassemblers, linkers, compilers, and so on. I wrote a full disassembler for 80386 binaries, covering both real mode and 32-bit protected mode.

<Blast_From_Past>
In 80386 assembly language, 0x66 and 0x67 are the operand-size override and address-size
override prefixes, and they appear before an instruction that operates on data or an address.
In 16-bit mode they mean 'use 32 bits'; in 32-bit mode they mean 'use 16 bits'.
</Blast_From_Past>
Sorry, that came out of nowhere, and I had to write it down so I could move on. I wonder what search query will pull up this page because of it? Hmm...

Anyway, to disassemble something, you take a binary value and look up the instruction it corresponds to. In .NET this is extraordinarily easy, because all of the opcodes and their values are built into the framework's reflection classes. If you look at the Disassembler class in the wrapper DLL, you will see that going from the binary to the opcode is just a matter of running through the members of the OpCodes class (in System.Reflection.Emit), which has an OpCode field for each valid CIL opcode.
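The lookup described above can be sketched in a few lines. The sketch reflects over the public static fields of System.Reflection.Emit.OpCodes, unboxes each OpCode, and compares its Value. This is a hypothetical helper (OpCodeScan.Find), not the author's Disassembler class. Two-byte opcodes such as ceq (0xFE 0x01) carry the 0xFE prefix in the high byte of Value, so as a short their value is negative.

```csharp
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper; not the DLLWrapper Disassembler class.
public static class OpCodeScan
{
    // Linear search over every OpCode field of the OpCodes class.
    // Returns null when no opcode has the requested value.
    public static OpCode? Find(short value)
    {
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            OpCode op = (OpCode)field.GetValue(null);
            if (op.Value == value) return op;
        }
        return null;
    }
}
```

For example, Find(0x2A) yields the OpCode named "ret", and Find(unchecked((short)0xFE01)) yields "ceq".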

Granted, this isn't very efficient; a lookup table keyed by opcode value would be much faster, but this was just for demo and learning purposes.
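For comparison, here is a table-based version. It builds a Dictionary from opcode value to OpCode once, then walks the IL bytes. To step from one instruction to the next, a decoder also has to skip each instruction's operand. The operand size follows from OpCode.OperandType; the switch instruction is the exception, carrying a 4-byte count followed by that many 4-byte targets. This is a minimal sketch that assumes well-formed IL. IlDecoder is a hypothetical name, and the sketch returns only the opcode names rather than decoding operands into tokens and branch targets.

```csharp
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper; not the DLLWrapper Disassembler class.
public static class IlDecoder
{
    // Built once: maps each opcode's 16-bit Value to its OpCode.
    static readonly Dictionary<short, OpCode> Table = BuildTable();

    static Dictionary<short, OpCode> BuildTable()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType == typeof(OpCode))
            {
                OpCode op = (OpCode)field.GetValue(null);
                table[op.Value] = op;
            }
        }
        return table;
    }

    // Number of operand bytes after an opcode (InlineSwitch is handled separately).
    static int OperandSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            default: return 4; // metadata tokens, 32-bit ints and branches, ShortInlineR (float32)
        }
    }

    // Returns one opcode name per instruction; operands are skipped, not decoded.
    public static List<string> Decode(byte[] il)
    {
        var names = new List<string>();
        int pos = 0;
        while (pos < il.Length)
        {
            short value = il[pos++];
            if (value == 0xFE && pos < il.Length)
                value = unchecked((short)(0xFE00 | il[pos++]));  // two-byte opcode

            OpCode op;
            if (!Table.TryGetValue(value, out op))
            {
                names.Add("??? 0x" + value.ToString("X"));
                continue;
            }
            names.Add(op.Name);

            if (op.OperandType == OperandType.InlineSwitch)
            {
                // Little-endian 32-bit target count, then count 32-bit targets.
                int count = il[pos] | (il[pos + 1] << 8) | (il[pos + 2] << 16) | (il[pos + 3] << 24);
                pos += 4 + 4 * count;
            }
            else
            {
                pos += OperandSize(op.OperandType);
            }
        }
        return names;
    }
}
```

Passing it the array returned by MethodBody.GetILAsByteArray() produces a rough instruction listing for a method.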
