A very generic datamodel.

I’ve come up with several projects in the past and a few have been mentioned here before. For example, the GarageSale project, which was based on a system I called “CART”. Or the WordChain project, which was a bit similar in structure. And because of those similarities, I’ve been thinking about a very generic datamodel that could be applied to almost any project.

The advantage of a generic database is that you can focus on the business layer without needing to change much in the database itself. The datamodel would still need development, but by using the existing model and mapping it to existing entities, you can keep it all very simple. And it resulted in this datamodel: [ClassDiagram image]

The top class is ‘Identifier’, which is just an ID of type GUID used to find the records. That works fine in derived classes too. Since I’m using Entity Framework 6, I can just use POCOs to keep it all very simple. All I have to do is define a DbContext that tells me which tables (classes) I want. If I don’t create an entry for ‘Identifier’, that table won’t be created either.
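
To give an idea of how little is needed, a minimal sketch in EF6 Code First could look like this. The class and property names are just illustrative, not the exact ones from my diagram:

```csharp
using System;
using System.Data.Entity; // Entity Framework 6

// The top class: every record is found by a GUID.
public abstract class Identifier
{
    public Guid Id { get; set; }
}

// A derived class inherits the GUID key automatically.
public class BaseItem : Identifier
{
    public string Name { get; set; }
}

// Only the classes that get a DbSet become tables. There is no
// DbSet for 'Identifier', so no table is created for it.
public class GenericModelContext : DbContext
{
    public DbSet<BaseItem> Items { get; set; }
}
```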

The next class is ‘DataContent’, which can hold any XML. That way, this class can contain all the information that I define in code without the need to create new tables. I also linked it to a ‘DataTemplate’ class, which can be used to validate the content of the XML against an XML schema or a special style sheet. (I still need to work out how, exactly.)
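
I haven’t worked out the validation part yet, but a rough sketch of the idea, assuming both the content and the template schema are stored as plain strings, could be something like this:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ContentValidator
{
    // Validates the XML stored in a DataContent record against the XSD
    // stored in its DataTemplate. Both are assumed to be plain strings here.
    public static bool IsValid(string contentXml, string templateXsd, out string error)
    {
        error = null;
        try
        {
            var schemas = new XmlSchemaSet();
            schemas.Add(null, XmlReader.Create(new StringReader(templateXsd)));

            var settings = new XmlReaderSettings
            {
                ValidationType = ValidationType.Schema,
                Schemas = schemas
            };

            using (var reader = XmlReader.Create(new StringReader(contentXml), settings))
            {
                while (reader.Read()) { /* reading triggers the validation */ }
            }
            return true;
        }
        catch (XmlSchemaValidationException ex)
        {
            error = ex.Message;   // content does not match the template
            return false;
        }
        catch (XmlException ex)
        {
            error = ex.Message;   // content is not even well-formed XML
            return false;
        }
    }
}
```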

The ‘BaseItem’ and ‘BaseLink’ classes are the more important ones here. ‘BaseItem’ contains all fixed data within my system. In the CART system, this would be the catalog. And ‘BaseLink’ defines transactions of a specific item from one item to another. And that’s basically three-fourths of the CART system. (The template is already defined in the ‘DataTemplate’ class.)

I also created two separate link types. One deals with whole numbers and is called ‘CountLink’; you generally use it for items. (One cup, two girls, etc.) The other is for fractional numbers like weights or money and is called ‘AmountLink’. These two will be the most used transaction types, although ‘BaseLink’ itself can be used to transfer unique items. Derived links could be created to support more special situations, but I can’t think of any.
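
Building on the ‘Identifier’ sketch above, the link classes could be as simple as this; the property names are my own guesses:

```csharp
using System;

// A transaction of a specific (unique) item from one item to another.
public class BaseLink : Identifier
{
    public Guid FromItemId { get; set; }
    public Guid ToItemId { get; set; }
    public Guid SubjectItemId { get; set; }   // the item being transferred
    public DateTime Timestamp { get; set; }
}

// Whole numbers: one cup, two girls, ten bags of flour.
public class CountLink : BaseLink
{
    public int Count { get; set; }
}

// Fractional numbers: weights, money, lengths.
public class AmountLink : BaseLink
{
    public decimal Amount { get; set; }
}
```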

The ‘BaseItem’ class will be used to derive more special items. These special items will define the relations with other items in the system. The simplest of them is the ‘ChildItem’ class, which defines extra information related to a specific item. Child items are strongly linked to their parent item, like wheels on a car or keys on a keyboard.

The ‘Relation’ class is used to group multiple items together. For example, we can have ‘Books’ defined as a relation with multiple book items linked to it. A second group called ‘Possessions’ could also be created to contain all the things I own. Items that are in both groups would make up my personal library.

A special relation type is ‘Property’, which indicates that all items in the relation are owned by a specific owner. No matter what happens with those items, their owner stays the same. A bank account, for example, could be such a property, with the bank as owner. Even though customers use such accounts, the account itself cannot be transferred to some other bank.

But the ‘Asset’ class is more interesting since assets are the only items that we can transfer. Any transaction will be about an asset moving from one item to another. Assets can still be anything and this class doesn’t differ much from the ‘BaseItem’ class.

A special asset is a contract. Contracts have a special purpose in transactions. Transactions are always between an item and a contract: either you put an asset into a contract or you extract it from a contract. And contracts themselves can be part of bigger contracts. By checking how much has been sent to or received from a contract, you can check whether all transactions combined are valid. Transactions will have to specify whether they’re sending items to the contract or receiving them from the contract.

The ‘BaseContract’ class is the more generic contract type and manages a list of transactions. When it has several transactions, it is important that no ‘phantom items’ remain. (A phantom item would be something that’s sent to the contract but not received by another item, or vice versa.) These contracts will need to be balanced as a check to see whether they can be closed or not. They should be temporary and last from the first transaction to the last.
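
Checking whether such a contract is balanced boils down to summing, per asset, what went into the contract and what came out of it. A minimal sketch, with a made-up transaction shape; the direction flag is my own assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal shape of a transaction for the balance check. In the real model
// this would be a CountLink or AmountLink.
public class ContractTransaction
{
    public Guid AssetId { get; set; }
    public decimal Amount { get; set; }
    public bool ToContract { get; set; }   // true = sent to the contract, false = received from it
}

public static class ContractBalance
{
    // Balanced = per asset, everything sent into the contract has also been
    // taken out again, so no 'phantom items' remain and the contract can be closed.
    public static bool IsBalanced(IEnumerable<ContractTransaction> transactions)
    {
        return transactions
            .GroupBy(t => t.AssetId)
            .All(g => g.Sum(t => t.ToContract ? t.Amount : -t.Amount) == 0m);
    }
}
```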

The ‘Contract’ type derived from ‘BaseContract’ contains an extra owner. This owner will be the one who owns any phantom items in the contract. This reduces the number of transactions and makes the contract everlasting. (Although it can still be closed.) Balancing these contracts is not required, making them ideal as e.g. bank accounts.

Yes, it’s a bit more advanced than my earlier CART system, but I’ve considered how I could use this for various projects that I have in mind. Not just the GarageSale project, but also a simple banking application, a chess notation application, a project to keep track of blood sugar measurements for people with diabetes, and my WordChain application.

The banking application would be interesting. It would start with two ‘Relation’ records: “Banks” and “Clients”. The Banks relation would contain Bank records with information about multiple banks. The Clients relation would contain the client records for those banks. And because of the datamodel, clients can be linked to multiple banks.

Banks would be owners of bank accounts, and those accounts would be contracts. All the bank needs to do is keep track of all the money going in or out of the account. (Making money just another item, with all transactions of type ‘AmountLink’.) But to link those accounts to the persons who are authorized to receive money from the account, each account would need to be the owner of a Property record. The property record then has a list of clients authorized to manage the account.

And we will need six different methods to create transactions. Authorized clients can add money to or withdraw money from the account. Other clients can send payments to or receive payments from the account, where any money received from the contract needs to be authorized. Finally, the bank charges interest or pays interest. (Or not.) These interest transactions don’t need authorization from the client.

The Chess Notation project would also be interesting. It would start with a Board item and 64 Square items, plus a bunch of piece assets. The game itself would be a basic contract without an owner. The Game contract would contain a collection of transactions transferring all pieces to their starting locations. A collection of ‘Move’ contracts, owned by the Game contract, would also be needed. Each Move would record which move it is (including branches of the game) and the transactions that take place on the board. (White rook gone from A1, white rook added to A4 and black pawn removed from A4, which translates into rook takes pawn at A4.)

It would be a very complex way to store a chess game, but it can be done in the same datamodel as my banking application.

With the diabetes project, each transaction would be a measurement. The contract would be owned by the person who is measuring his or her blood and we don’t need to send or receive these measurements, just link them to the contract.

The WordChain project would be a bit more complex. It would be a bunch of items with relations, properties and children. Contracts and assets would be used to support updates to the texts with every edit of a WordChain item kicking the old item out of the contract and adding a new item into the contract. That would result in a contract per word in the database.

A lot of work is still required to make sure it works as well as I expect. It would not be the ideal datamodel for every one of these projects, but it helps me to focus more on the business layer and the GUI without worrying about any database changes. Once the business model becomes more advanced, I could create a second data layer with a better datamodel to improve the performance of the data management.


Multithreading, multi-troubling.

Recently, I worked on a small project that needed to make a catalog of the image files and folders on my hard disk and save this catalog in a database. Since my CGI and photography hobbies generate a lot of images, it would be practical to have something simple to support it all. Plenty of software already does something like this, but nothing that I liked. Especially since I want to connect images to derived images, group them, tag them, share them, assign licenses to them and publish them. And I want to keep track of where I’ve shared them already. Are they on Flickr? CafePress? DeviantArt? Plus, I wanted to know if they should be rated as adult. Some of my CGI artwork is naughty by nature (because nude models are easier to work with) and thus unsuitable for a broad audience.

But for this simple catalog I just wanted to store the image folder, the image filename, an image name that would be the filename without extension and without diacritics, plus the width and height of the image so I could calculate the image ratio. To make it slightly more complex, the folder name would be a relative folder name based on a root folder that’s set in the configuration. This would allow me to move the images to a different folder or use the same database on a different machine without the need to adjust all records.

So, the database structure is simple. One table for the folders, one table containing image ratios and one for the image names and sizes. The ratio table will help me to group images based on the ratio between width and height. The folder table would do the same for grouping by folder. The Entity Framework would help me connect to this database and take away a lot of my troubles. All I have to do now is write a simple library that fills and maintains this catalog, plus a console application to call those methods. Sounds simple enough.

Within 30 minutes, the first version was ready. I would first enumerate all folders below the source folder, then for each folder in that list I would collect all image files of type PNG, JPG and BMP. The folder would be written to the folder table and the file would be put in the Image table. Just one minor challenge, though…

I want to add the width and height of the image to the image table too, and based on the ratio between width and height, I would have to either add a new ratio record or link to an existing one. And this meant that I had to read every file into memory to find its size and then look whether there’s already a ratio record related to it. If not, I would need to add the new ratio record and make sure the next request for ratio records would include the new one. Plus, I needed to check whether the image and folder records already exist in the database, because this tool only needs to add new images.

The performance was horrible, as could easily be predicted. Especially since I make images and photos at high resolutions, so reading those files takes dozens of milliseconds. No matter that my six cores at 3.5 GHz and 32 GB of RAM turn my system into a speed demon, these read actions are just slow. And I did it inefficiently, since I have six cores but my code was just single-threaded. So, redo from start and this time do it multithreaded.

But multithreading and the Entity Framework don’t go well together. The database connection isn’t thread-safe and thus you cannot access the database methods from multiple threads. Besides, the ratio table could generate collisions when two images with the same, new ratio are processed. Both threads would notice the ratio doesn’t exist and thus both would add it. But one of them would then fail because the other had added it first. So I needed to change my approach.

So I used ‘Parallel.ForEach’ to walk through the folder list and then again for all files within each folder. I would collect the data in internal lists and when the file loop was done, I would loop through all images and add those that didn’t exist yet. And yes, that improved performance a lot and kept the conflicts with the ratio table away. Too bad I was still reading all images, but that was not a big issue. Performance went up from hours to slightly over one hour. Still slow.

So one more addition. I would first read all existing folders and images from the database, and if a file already existed in this list, I would not read its size anymore since it wasn’t needed. I could skip the image. As a result, it still took an hour the first time I imported all images, but the second run would finish within a minute, since there wasn’t anything left to read or add. The speed was now limited to just reading the files and folders from the database and from the disk.
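
A rough sketch of this approach; the class and property names are made up for the example, but the pattern is: load what the database already knows into a set, scan the folders in parallel, and only open files that are actually new:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Drawing;               // Image.FromFile gives us width and height
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public class NewImage
{
    public string Folder { get; set; }
    public string FileName { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}

public static class CatalogScanner
{
    // 'knownFiles' contains the relative paths already present in the database,
    // loaded once up front, so existing images never have to be opened again.
    public static List<NewImage> FindNewImages(string root, HashSet<string> knownFiles)
    {
        var extensions = new[] { ".png", ".jpg", ".bmp" };
        var found = new ConcurrentBag<NewImage>();
        var folders = Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories)
                               .Concat(new[] { root });

        Parallel.ForEach(folders, folder =>
        {
            var files = Directory.EnumerateFiles(folder)
                .Where(f => extensions.Contains(Path.GetExtension(f).ToLowerInvariant()));

            foreach (var file in files)
            {
                var relative = file.Substring(root.Length).TrimStart('\\', '/');
                if (knownFiles.Contains(relative))
                    continue;                          // already catalogued, skip the slow read

                using (var image = Image.FromFile(file))   // only new files are opened
                {
                    found.Add(new NewImage
                    {
                        Folder = Path.GetDirectoryName(relative),
                        FileName = Path.GetFileName(relative),
                        Width = image.Width,
                        Height = image.Height
                    });
                }
            }
        });

        // The database work happens afterwards, on a single thread,
        // because the Entity Framework context is not thread-safe.
        return found.ToList();
    }
}
```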

When you’re operating these kinds of projects in an Agile team and you’re scrumming around, things will slow down considerably if you haven’t thought about these challenges before you start the sprint to create the code. Since the first version looks quite simple, you might have planned it as a very short task and thus end up with extremely slow code. In the next sprint you would have to consider options to speed things up, and you would realize that making it multithreaded is a bigger task. And while you are working on the multithreaded version, you might discover the conflicts with the Entity Framework plus the possible collisions within the tables. So the second sprint might end with a buggy but faster solution with lots of exception handling to catch all possible problems. The third sprint would then fix these, if you manage to find a better solution. Else, this problem might haunt you until the deadline of the project…

And this is where teams have to be really careful. The task sounds very simple, but it’s not. These things are easily underestimated by a team and should be well planned before you start writing code. Experienced developers will detect these problems before they start, knowing that they should take their time and plan carefully instead of writing code immediately. (I only did it so I could write this post.) The task seems extremely simple and I managed to describe it in the second paragraph of this post in just three lines. But a high-performance solution requires me to think before I start writing code.

My last approach is the most promising, though. And it can be done by using multithreading but it’s far more complex than you’d assume at first. And it will be memory-hungry because you need to create several lists in memory.

You would have to start with two threads. One thread will read the database and generate lists of files, folders and ratios. These lists must be completely in memory, because if you keep them as queryable lists, the system would keep going back to the database to read them. Besides, once you’re done generating these lists, you will want to close the database connection. This all tells you what you already have. The second thread will read all folders, and by using parallel threads it would read all image files within those folders. But you would not read the image sizes yet, nor calculate any ratios.

When you’re done collecting the data, you will have to compare it all. You would start by comparing the lists of folders. Folders that exist in both lists can be ignored (but not their files). Folders that exist in the database list but not in the disk list should be deleted, including all files within those folders! Folders that are on disk but not in the database need to be added. Thus you can now start two threads, each with its own database connection. One deletes from the database all folders, plus their related images, that no longer exist on disk, while the other adds all new folders found on disk. And by using two database connections, you can speed things up. You will have to wait for both threads to finish, though. But it shouldn’t be slow.
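
In code, comparing and processing the folder lists could look roughly like this. The entity and context names are made up; the point is the two set differences and the two independent connections:

```csharp
using System.Collections.Generic;
using System.Data.Entity;   // Entity Framework 6
using System.Linq;
using System.Threading.Tasks;

// Minimal entity and context for the example.
public class ImageFolder
{
    public int Id { get; set; }
    public string Path { get; set; }
}

public class CatalogContext : DbContext
{
    public DbSet<ImageFolder> Folders { get; set; }
}

public static class FolderSync
{
    // dbFolders: relative folder paths read from the database up front.
    // diskFolders: relative folder paths found on disk.
    public static void Synchronize(HashSet<string> dbFolders, HashSet<string> diskFolders)
    {
        var toDelete = dbFolders.Except(diskFolders).ToList(); // in the database, gone from disk
        var toAdd    = diskFolders.Except(dbFolders).ToList(); // on disk, not yet in the database

        // Two threads, each with its own context (and thus its own connection),
        // so deleting and adding can run at the same time.
        var deleteTask = Task.Run(() =>
        {
            using (var db = new CatalogContext())
            {
                var doomed = db.Folders.Where(f => toDelete.Contains(f.Path));
                db.Folders.RemoveRange(doomed);   // related images should go too (cascade or explicit delete)
                db.SaveChanges();
            }
        });

        var addTask = Task.Run(() =>
        {
            using (var db = new CatalogContext())
            {
                db.Folders.AddRange(toAdd.Select(p => new ImageFolder { Path = p }));
                db.SaveChanges();
            }
        });

        Task.WaitAll(deleteTask, addTask);
    }
}
```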

The next step would be the comparison of images. Here you do something similar to what you did with the folders. You split the lists into three different lists. One with all images that are unchanged. One with all images that need to be deleted. And one with all images that need to be added. And you would create a separate thread with its own database connection to delete the images, so your main process can start working on the ratios table.

Because we now know which images need to be added, we can go through those files using parallel processing, read the image width and height and add this information to the image file records. When we have enriched this list with the sizes, we can use a LINQ query to generate a list of all ratios of those images and remove all duplicate ratios from that list. This gives us the list of ratios that we need to check.
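
The ratio part is mostly a matter of reducing width and height by their greatest common divisor and throwing away duplicates. A sketch, reusing the NewImage class from the scanner sketch above:

```csharp
using System.Collections.Generic;
using System.Linq;

// Simple value type for a reduced ratio, e.g. 1920x1080 becomes 16:9.
public struct Ratio
{
    public int Width;
    public int Height;
}

public static class RatioHelper
{
    public static Ratio Reduce(int width, int height)
    {
        int a = width, b = height;
        while (b != 0) { var t = a % b; a = b; b = t; }   // greatest common divisor
        return new Ratio { Width = width / a, Height = height / a };
    }

    // From the new images (already enriched with width and height), produce
    // the distinct ratios that may have to be added to the ratio table.
    public static List<Ratio> DistinctRatios(IEnumerable<NewImage> images)
    {
        return images
            .Select(i => Reduce(i.Width, i.Height))
            .Distinct()
            .ToList();
    }
}
```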

Before we add the new images, we will have to check the ratios table. As with the folders table, we check for all differences. However, we cannot delete ratios that we haven’t found among the images, because we skipped the images that already exist. We will do this later, though. We will first start adding the new ratios to the database. This too could be done in a separate thread, but it’s pretty fast anyway, so why bother? A performance gain of two seconds isn’t worth the extra effort if a process takes minutes to finish. So add the new ratios.

Once all ratios are added, we can add all images. We could do this using parallel threads, with each thread creating a new database connection and processing all images from one specific folder or with one specific ratio. But if you want to add them multithreaded, I would simply recommend dividing the images into groups of similar size. Keep the number of groups proportional to the number of cores (e.g. 24 for my six cores) and let the system do its work. By evenly dividing the images over multiple threads, they should all take about the same amount of time.
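
The even split itself is a small LINQ helper; something like this sketch, which divides any work list into a fixed number of groups:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WorkSplitter
{
    // Split a list into 'groupCount' groups of roughly equal size, e.g. 24 groups
    // on a six-core machine, so every thread ends up with a fair share of the work.
    public static List<List<T>> Split<T>(IList<T> items, int groupCount)
    {
        return items
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index % groupCount, x => x.item)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Each of those groups would then get its own thread with its own database connection to add its images.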

When adding the new images, you will have to find the related folder and ratio in the database again. This makes adding images slower than adding folders or ratios because you need the extra lookup. This performance would improve if we had kept the Folders and Ratio lists as queryable lists, but then we could not open and close the connections, nor could we use multiple connections to add those images. And we want multiple connections to speed things up. So we accept a slightly worse performance at this point, although we could probably speed it up a bit by using a stored procedure to add the images. The stored procedure would have parameters for the image name, the image filename, the width and height, the folder name and the ratio width and height. I’m not too fond of procedures with many parameters and I haven’t tested whether this would increase the performance, but in theory it should be faster, especially if the database is on a different machine than the application.

And thus a simple task of adding images to a database turns out to be complex, simply because we need better performance. It would still take hours if it has a lot of new images to add but once you have it mostly filled, it will do quite well.

But you will have to ask yourself and your team whether you are capable of detecting these problems before you start a new sprint. Designs are simple, because designers don’t always keep performance in mind. These things are easily asked for because they appear very simple, but they have a lot of consequences. Similar problems might arise when you work on projects that need to be secure. The design might ask for a login screen with username and password, and optionally a few OpenID providers as alternative logins, but the amount of code needed to manage all this data and keep it secure is quite complex. These are the moments when you need to write some technical documentation first, which is something people often forget when working on an Agile project.

Still, you cannot blame the developer if the designer just writes a few lines and the developer chooses the first, slow solution. The result would be the requested task. It is the designer who needs to be aware of these possible performance pitfalls. And with Agile, you have a team. All team members should be able to point out that this simple description has these pitfalls, thus making it a long and complex task. They should all realise that they will have to discuss possible solutions for this, and preferably they do so as a team with just one computer. (The computer would be used to find information, not to write code!) Only when they agree on the proper solution should one or two of them start writing code. And they would know how long this task will take. Thus, the task would finish within two sprints. In the first sprint, all team members would have a small task to meet and discuss the options. In the second sprint, one or more members would have a big task of implementing the code.

Or, to keep it simple: think before you start writing code!

The challenge for the CART system.

In The CART datamodel I showed the datamodel that would be required for the CART system. Basically, the data model stores the items, transactions and contracts, while the templates are stored in code, as XML structures that are serialized to objects and back again. As a result, I split the relations from the data, which allows me to avoid regular updates to the database structure. All I might want to update are the templates, but even that might not be required as long as the software supports “older versions” of those objects.

But serializing those objects to and from XML isn’t as easy as it seems. Since I’ve separated data from relations, the data itself doesn’t know its own ID. Why? Because the ID is part of the relation, thus I would not need to store it within the XML. (It would be redundant.) But if I want to use these objects through a web service, having the ability to send this ID to the client is very practical, so I can send back changes through the same web service. I would need the ID to tell the server what to do with which object.

Thus I’ll end up with two methods of serialization. One is to serialize just the data itself. The other is the data plus its ID. And now I will have to decide on a way to support both. Preferably in a simple way that does not require me to generate lots and lots of code.

In the data layer, I would split up every object into a relation part and a data part. The data would be stored as XML within the relation object. To make sure the relation object can expose the object, I would need to give it a DataObject property that returns the data object. Its get/set methods should connect to the XML inside, preferably even by using an object as a buffer so I don’t have to serialize it over and over again.
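
A sketch of what that buffered property could look like. Since a normal property can’t be generic, I’ve used a get/set pair of methods here; the class and member names are just illustrative:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// The relation record as stored by the Entity Framework: the ID plus the raw XML.
public class Relation
{
    public Guid Id { get; set; }
    public string Data { get; set; }      // the serialized data object

    private object _buffer;               // deserialized once, reused afterwards

    // Expose the data part as an object; the XML is only deserialized when
    // needed, and the buffer avoids doing it over and over again.
    public T GetDataObject<T>() where T : class
    {
        if (_buffer == null && !string.IsNullOrEmpty(Data))
        {
            var serializer = new XmlSerializer(typeof(T));
            using (var reader = new StringReader(Data))
                _buffer = serializer.Deserialize(reader);
        }
        return (T)_buffer;
    }

    public void SetDataObject<T>(T value) where T : class
    {
        _buffer = value;
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            Data = writer.ToString();
        }
    }
}
```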

In the business layer, I should not have an XML property, nor should I have a DataObject property. The data fields should be there, with the ID. And basically, I would need a mapping to convert between the data layer and the business layer. The trouble with this approach is that I would define all data twice. Once in the data template and once in the business layer. That’s not very smart. I need to re-use things…

I’m considering adding my own serialization method for the data templates. This means that I will include the ID within the template, so it becomes part of the object. All properties would be defined as data members, including the ID. That way, the ID is sent from the business layer to the client. But to store the template in the relation object, I would need to create my own solution.

One solution would be implementing methods to convert the data to XML, plus a constructor that accepts XML to create the object. It would also mean that I need a way to recognize each object type so I can call the proper constructor, and I would probably need to inherit a bunch of code or write interfaces to make these objects practical to work with. It would be complex…

Another solution would be defining my own attributes. One would be for the class name, allowing me to find classes based on this custom attribute. The other would be for the properties, and my code would just use all of those to generate the XML or to read it back again. This too would allow custom field names. It would be a cleaner solution, since I would define the template as just a bunch of properties. Which, basically, they are.
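
A sketch of that second solution, with made-up attribute names. It only handles simple property types, and the sore point is the Activator.CreateInstance call, which is exactly the construction problem I mention below:

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Marks a template class and gives it the name used in the stored XML.
[AttributeUsage(AttributeTargets.Class)]
public class TemplateAttribute : Attribute
{
    public string Name { get; private set; }
    public TemplateAttribute(string name) { Name = name; }
}

// Marks a property that takes part in the serialization, optionally under a custom field name.
[AttributeUsage(AttributeTargets.Property)]
public class TemplateFieldAttribute : Attribute
{
    public string Name { get; set; }
}

public static class TemplateSerializer
{
    // Writes all marked properties of a template object to XML.
    public static XElement ToXml(object template)
    {
        var type = template.GetType();
        var templateAttribute = type.GetCustomAttribute<TemplateAttribute>();
        var root = new XElement(templateAttribute != null ? templateAttribute.Name : type.Name);

        foreach (var property in type.GetProperties())
        {
            var field = property.GetCustomAttribute<TemplateFieldAttribute>();
            if (field == null) continue;

            var name = string.IsNullOrEmpty(field.Name) ? property.Name : field.Name;
            root.Add(new XElement(name, property.GetValue(template, null)));
        }
        return root;
    }

    // Finds the class that carries the template name and fills a new instance from the XML.
    public static object FromXml(XElement xml, Assembly assembly)
    {
        var type = assembly.GetTypes().FirstOrDefault(t =>
        {
            var attribute = t.GetCustomAttribute<TemplateAttribute>();
            return attribute != null && attribute.Name == xml.Name.LocalName;
        });
        if (type == null) return null;

        // This is the sore point: it requires a parameterless constructor.
        var instance = Activator.CreateInstance(type);

        foreach (var property in type.GetProperties())
        {
            var field = property.GetCustomAttribute<TemplateFieldAttribute>();
            if (field == null) continue;

            var name = string.IsNullOrEmpty(field.Name) ? property.Name : field.Name;
            var element = xml.Element(name);
            if (element != null)
                property.SetValue(instance,
                    Convert.ChangeType(element.Value, property.PropertyType), null);
        }
        return instance;
    }
}
```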

But this second solution is a bit complex, since I still need a way to call the constructor of these specific classes. So I’ve opened a question on StackOverflow, hoping I will get an interesting answer that solves this easily. Why? Because part of being a good developer is asking other experts for possible solutions when you don’t have a good answer yourself! 🙂

The CART datamodel

Well, my back problems made me think a lot about the CART system that I’ve mentioned before. And it made me consider how I need to handle the difference between plain data and the relationship between the objects. Because the most troubling thing I had to deal with was that I did not want to change my datamodel for every new item that I want to store. So it made me think…

The CART system is mostly created to handle relationships between items, transactions and contracts. It’s not about handling the data itself. Actually, the system doesn’t even care about the data. All that matters are the relationships. Data is just something to display to the user, something to store, but normally not something that you’ll need to process very often at the data layer. So, considering the fact that you can serialize .NET objects to XML, I’ve decided to support a basic structure for my Garage Sale project for all the items, transactions and contracts. Each of them will contain a Data property that holds the serialized data, which I can convert to data objects and back again.

This idea also makes it clearer where the templates are within my system. My templates are these object definitions! I will have a very generic database with a very simple layout, and I can build a very complex business layer around it all that I can change as often as I like without the need to change my database model. As a result, I might never have to change the data model again!

Of course it will have a drawback, since the database will contain serialized objects. If I change those objects, I will also need to keep track of the changes in those stored structures and either update them or keep multiple versions of those objects around. Updating those structures would mean that I have to create update apps that know both the old structures and the new structures. They would then convert each old structure to a new structure. Maintaining multiple versions might be more practical, since that would leave all old data intact in the system. Anyway, it’s some added complexity that I’ll have to consider.

But now, my datamodel as created in Visual Studio 2012 using the Entity Framework: [EF-CART diagram]


So, what do you see?

  • The DataObject is the base class for any CART object. It has a unique identifier, a name that can be used to show the related object and a Data property that will contain an object as XML.
  • DataItem is a generic item class, containing an additional description just for more practical reasons. When a user wants to select an existing item, having a description makes it possible to avoid reading the object within the data.
  • The Collection table is just an item with an extra bonus. It can contain multiple child items without the need for any transactions. Actually, this is just a shortcut solution to allow more complex structures within your items. For example, you might create a complete Household collection containing husband, wife, four children and a dog. And although you could link them together by using transactions, having them in a collection just saves the need to create those transactions.
  • DataTransactions is the base class for the transactions, having a sender, receiver and subject item connected together. It also has a link to a rule and a timestamp, indicating when the transaction took place. (Or will take place, for future transactions.)
  • IntegerTransaction is just a basic transaction with a multiplier. This way, you don’t have to add a lot of transactions when your customer buys ten bags of flour.
  • DecimalTransaction is also a basic transaction that will handle items that can be traded in all kinds of different numbers, including fractional amounts. For example, the price of a product, or its weight, length or light intensity.
  • DataRule is the basic contract type. It’s a collection of transactions that are all related to one another. For example, the sale of a product would result in a sale rule.
  • The Contract class is more than just a rule. It’s a rule that can contain child rules, thus allowing structured contracts that are made up of several subcontracts. These are used when you have to deal with complex situations, like mortgages. A mortgage would include a rule for purchasing a house, a rule for lending money and paying it back, plus other rules for all kinds of insurances.
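
Put into code, the classes listed above could look roughly like this POCO sketch. The property names are my own guesses, not the exact names from the generated model:

```csharp
using System;
using System.Collections.Generic;

// The base class for any CART object: an ID, a display name and the data as XML.
public class DataObject
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Data { get; set; }
}

// A generic item, with a description so users can pick one without opening the data.
public class DataItem : DataObject
{
    public string Description { get; set; }
}

// An item that can contain child items without needing transactions (e.g. a Household).
public class Collection : DataItem
{
    public virtual ICollection<DataItem> Children { get; set; }
}

// The basic contract type: a set of related transactions (e.g. the sale of a product).
public class DataRule : DataObject
{
    public virtual ICollection<DataTransaction> Transactions { get; set; }
}

// A rule that can contain child rules, for structured contracts like a mortgage.
public class Contract : DataRule
{
    public virtual ICollection<DataRule> ChildRules { get; set; }
}

// The base transaction: sender, receiver and subject, linked to a rule, with a timestamp.
public class DataTransaction : DataObject
{
    public virtual DataItem Sender { get; set; }
    public virtual DataItem Receiver { get; set; }
    public virtual DataItem Subject { get; set; }
    public virtual DataRule Rule { get; set; }
    public DateTime Timestamp { get; set; }
}

// Whole numbers: ten bags of flour become one record with a multiplier of ten.
public class IntegerTransaction : DataTransaction
{
    public int Multiplier { get; set; }
}

// Fractional numbers: prices, weights, lengths, light intensity...
public class DecimalTransaction : DataTransaction
{
    public decimal Amount { get; set; }
}
```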

Now, as I’ve said before, this datamodel should offer me more than enough structural parts to use for my Garage Sale project. All I need to do is compile it and then just leave it alone. There should not be a need to include anything else.

Well, okay… That’s not completely true, since I might want to handle user accounts and store large images of products. But these things would actually require new database structures and should preferably be part of separate databases.

Looking back at this design, it surprises even me how little data it actually holds. But the trick here is that I’ve separated the relationships between objects from the actual data itself. Each object can still contain a huge amount of data. But that data is just not important for the model itself.

Code quality with NDepend

In the ICT industry, quality tends to be overlooked. Plenty of products need regular updates and patches to fix problems in the existing code. The problem is that the software market has many developers who don’t have a good grasp of the concept of quality, because they’re afraid that writing quality code will slow them down. Quality tends to be difficult and hard to understand, while writing programs is quite easy, since all you do is tell a computer what to do in a language the computer understands.

One of the things that will always annoy me is when I see a project created by someone else, and it works great, yet the code is an unreadable mess of jibber-jabber. Often that’s because multiple people have added code to it, each of them in their own style and based on their own knowledge. And some of them are very good at their work, while others are bad. And even when a project started with some great code, all changes and updates just tend to make it too complex. Often it’s just deadlines or inexperience that make code go bad. Or the lack of courage among developers to just throw away bad code and completely rewrite it using a better style and technique.

When I was developing in Delphi (which is still a hobby of mine) I had Peganza’s Pascal Analyzer available to tell me about the status of the code. And although the Delphi compiler would tell me about errors and warnings, those alone weren’t enough to judge the quality of the code, because badly written code can still compile without warnings or errors. So I used the analyzer to tell me where I needed to re-factor the code to improve its quality.

So I wanted something similar to use with Visual Studio and C#. And although VS2012 offers its own analysis tools, they didn’t show me very much information. I discovered that I had to select a set of rules first, so I did. I chose the “Extended Design Guidelines” rule set and it gave me a few more warnings about my code: I have to avoid namespaces with just a few types, I need to validate the arguments of public methods, and I need to mark a member as static. Too bad that the code I used started as a project generated by Visual Studio itself. All I did was add a bit of meat to its bones, but the meat wasn’t the problem. The bones were…

So Microsoft’s own analysis complains about Microsoft’s own templates. I see. Yeah. Hum.

The next in line is Resharper, and Resharper doesn’t just analyse code, it also makes suggestions while typing and can even correct things with a simple mouse-click. Resharper is great when you want to re-factor your code and it even analyses JavaScript and other languages. It’s a great tool to fix things, but if you need to fix things, it means you’ve broken something first. And generally, when something is fixed, you would prefer to get an immediate warning when your code starts to “smell” again, so you can do a rollback and tell the person responsible to do a better job. You don’t want to fix things at the end of a sprint when things start to smell bad. You want warnings sooner, when the project is built by one of the developers in your team. And basically, you want a report telling you about the current quality of the project, which you could show to clients and use to compliment your team on their good work. (Or to bash some heads when they do it badly.)

So, NDepend… It’s a bit expensive, but for those who use a dedicated build server to make new daily builds from the latest code, it is also a practical solution. It allows you to define your own set of rules based on a query language that’s based on LINQ. And it gave me plenty of warnings on my little project that are actually a bit more helpful than those VS2012 or Resharper gave me. It advised me to re-factor one complex method, which was indeed a bit too complex. It told me to make some classes ‘sealed’ since I’m not inheriting from them. Practical, since I wasn’t planning to inherit from them anyway. There was a funny warning about a namespace without types, which isn’t very practical; unfortunately, this default rule didn’t tell me which namespace it was or where it was located, so I have to search a bit. And a warning that I should avoid defining multiple types in a single source file. And indeed, I had declared three types in a single file, which isn’t good practice. Then again, that file was auto-generated too…
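
To give an idea of those rules: a custom rule in NDepend is written as a CQLinq query, a LINQ-like language over your code. From memory it looks roughly like the query below, but do check the exact metric names against NDepend’s own documentation.

```
// <Name>Avoid overly complex methods (my own variation)</Name>
warnif count > 0
from m in JustMyCode.Methods
where m.CyclomaticComplexity > 20 || m.NbLinesOfCode > 50
select new { m, m.CyclomaticComplexity, m.NbLinesOfCode }
```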

As it turns out, this simple project of mine didn’t have a lot of issues. But some large solutions that I’ve worked on do have plenty to report, and the analyzer makes it very clear that solutions maintained for over 7 years without regular quality check-ups on the code collect a lot of stink. I already knew some of those solutions were bad, but the analyzer made them look even worse, barely able to list all the smelly parts. It would take months just to fix those smelly parts and thus cleaning it up becomes a bit too expensive. You wouldn’t need a tool that tells you it smells, you’d need a tool that cleans it up. So for those big solutions, Resharper and common sense would be better.

I also decided to use it on a solution that I’ve worked on myself, just curious about the amount of code smell in that solution compared to some of the other large solutions I’ve seen. And I wasn’t surprised that it would have plenty of warnings too. And it did. My small project only violated 13 rules, while one of the large solutions I’ve seen had broken 86 rules. My own solution broke 71 rules… So, almost as bad as the big ones, until I started to check the number of times each rule was violated. In my solution, the counts ranged from perhaps a dozen violations of rules that I consider important, up to 3302 methods that could have had a lower visibility. Lowering the visibility of those methods isn’t that important to me, so those I won’t even fix. But the other big solutions I’ve analyzed had more than 100 violations per rule with only a few exceptions. Several rules had over a thousand violations, with one rule having more than 12,000 violations. I would almost feel sorry for those who have to support that!

So, NDepend tells me my code stinks. Does it tell me even more? Yes, it does! It provides practical statistics about my code, displaying all namespaces and types, but also the dependencies between the assemblies within a solution. And it shows the dead code within my solution. I had one dead method, which was actually part of a third-party component.

NDepend gets its power from the ability to create new rules, to group rules together, and from the fact that it generates an HTML report that you could share among all developers, your managers and CEO, and perhaps even send to customers who want to check if you’re making quality software. It can help you to improve your code, but its main purpose is to tell you the quality of your code. It gives you a challenge: the challenge to rewrite your solution so it won’t have anything to warn you about. Once you’re at that point, NDepend can become part of your build process to make sure your code maintains such a high quality.

So, would you need NDepend in your tool collection? Is it worth the price you’d have to pay for it, or is it too expensive?

Well, for me as a single developer it’s practical, since it challenges me to write code that generates no warnings. And that can be a real tough challenge when you’re working on complex solutions. But my personal budget is reasonably big, thus allowing me to make a few large expenses and probably even to overpay a bit. It is an expensive product and I would recommend first buying a Resharper license before considering buying NDepend. As long as Resharper tells you about code smell, you’re still in need of learning better design practices.

For a team of professional developers it is basically the same as for single developers. However, teams of developers will need reports telling them whether the quality of the code is improving or not. You need to depend on the others in your team, and often you will lack the time to do a code review on the work of each team member. For teams of developers I would recommend using NDepend if you, as a team, want to improve the code quality and are willing to check reports on the code quality with every new build. A license for your build server would be most practical in that case, especially when the build server does regular (daily) builds automatically.

When your job is doing code reviews, you might want to use the built-in analyzer of VS2012 or Resharper, but those are more meant to allow developers to fix code immediately. Code reviewing isn’t just meant to improve code, it is meant to improve developers. When you’re doing regular code reviews, NDepend can do a lot of the work for you. You would have to write your own rules and change several of the existing rules that NDepend provides, but it will help you to review the work of others.

If you’re a manager in an ICT department, then reviewing code yourself will probably be a bit complex. However, you would still need to know about the code quality and the quality of your team of developers. Here, too, a license for the build server would be practical to keep you informed about the health of your product and to take steps when your developers start adding smelly code. Especially when you need to outsource the code, the use of code analysis becomes very important, since you need to know whether that far-away team is actually doing a good job. Too many outsourced projects have become extremely smelly, even though they seem to work just great. It would be a shame if a great start-up became a rotten egg simply because you forgot to check the code quality.

When you look at NDepend more closely, you will see how powerful this tool actually can be. You just have to keep in mind that this tool isn’t really meant to improve code, like Resharper and the VS2012 analyzer do. It’s meant to improve the developer, because it tells him whether he’s improving or not. It tells teams of developers where their weaknesses are and will help to get rid of those weaknesses. And it will help companies to keep up a quality in the code that should equal the quality of their products.

Just don’t use it when your projects are already extremely smelly. You would need to clean up your code before you can clean up your developers…