The need for security, part 2 of 3.

[Image: Azra Yilmaz Poses II]

What is encryption, and what do we need to encrypt? Those are important questions that I hope to answer now.

Encryption is a way to protect sensitive data by making it harder to read. It has to prevent people from looking at the data and immediately recognizing it. Encryption is thus a very practical solution for hiding data from plain view, but it doesn’t stop machines from taking a few extra steps to read your data again.

Encryption can be very simple. There’s the Caesar cipher, which basically shifts letters in the alphabet. In a time when most people were illiterate, this was actually a good solution. But nowadays, many people can decipher these texts without much trouble, and some can do it in their heads without taking notes. Still, some people like to use ROT13 as a very simple encryption solution, even though it’s almost the same as having no encryption at all. But combined with other encryption methods or even hashing methods, it can make encrypted messages harder to read, because the input for the more complex encryption method already has a simple layer of encryption.
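To show just how simple, here is a minimal ROT13 sketch in C# (my examples throughout are C#, my preferred .NET language). Applying it twice returns the original text, which is exactly why it offers no real protection:

using System;
using System.Linq;

static class Rot13
{
    // Rotate letters 13 positions; all other characters pass through unchanged.
    public static string Apply(string input) =>
        new string(input.Select(c =>
            c >= 'a' && c <= 'z' ? (char)('a' + (c - 'a' + 13) % 26) :
            c >= 'A' && c <= 'Z' ? (char)('A' + (c - 'A' + 13) % 26) :
            c).ToArray());
}

// Rot13.Apply("Secret") == "Frperg" and Rot13.Apply("Frperg") == "Secret".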

Encryption generally comes with a key. And while ROT13 and Caesar’s cipher don’t seem to have one, you can still build one by creating a table that tells how each character gets translated. Then again, even the mathematical formula can be considered a key.

Having a single key allows secret communication between two or more persons and thus keeps data secure. Every person receives the key and can use it to decrypt any incoming messages. These are called symmetric-key algorithms, and they basically allow communication between multiple parties, where each member can read all messages.

The biggest problem of using a single key is that the key might fall into the wrong hands, allowing more people access to the data than originally intended. That makes the use of a single key more dangerous in the long run, but it is still practical for smaller sessions between multiple parties, as long as each member has secure access to the proper key. And the key needs to be replaced often.

A single key could be used by chat applications where several people join the chat. They would all retrieve the key from a central environment and thus be able to read all messages. But you should not store the information for a long time.

A single key can also be used to store sensitive data in a database, since you would only need a single key to read the data.
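As a sketch of how such a single key looks in .NET, here is the symmetric case using the built-in Aes class. The key (16, 24 or 32 bytes) and IV (16 bytes) are placeholders that both sides would have to share through a secure channel:

using System.IO;
using System.Security.Cryptography;

static class SymmetricDemo
{
    // Anyone holding the same key and IV can run the matching decryption.
    public static byte[] Encrypt(byte[] plain, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv))
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
                cs.Write(plain, 0, plain.Length);
            return ms.ToArray();
        }
    }
}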

A more popular solution is an asymmetric-key algorithm, or public-key algorithm. Here, you have two keys: you keep the private (master) key and give others the public key. The advantage of this system is that you can encrypt data with one of the two keys, but you can’t use the same key to reverse that action again. That makes it very useful for sending data in a single direction: the private key encrypts data and you need the public key to decrypt it, or the public key encrypts data and you need the private key to decrypt it.

Using two keys thus limits communication to a central hub and a group of people. Everything needs to be sent to the central hub, and from there it can be broadcast to the others. For a chat application this is less useful, since it means the central hub has more work to do: it needs to continuously decrypt and encrypt data, even if it doesn’t need to know the content of this data.

For things like email and secure web pages, two keys are practical, though. The mail or web server gives the public key to anyone who wants to connect, so they can encrypt sensitive data before sending it to the server. And only the server can read it, using the private key. The server can then use the private key to encrypt new data and send it to the visitor, who uses the public key to decrypt the message again. Thus, you have secure communication between two parties.
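A minimal sketch of that one-way traffic with RSA in .NET: the server keeps the private key, hands out only the public half, and is the only party able to read what visitors encrypt:

using System;
using System.Security.Cryptography;
using System.Text;

class RsaDemo
{
    static void Main()
    {
        byte[] message = Encoding.UTF8.GetBytes("sensitive data");

        using (var server = new RSACryptoServiceProvider(2048))
        {
            // Export only the public half; the private key never leaves the server.
            RSAParameters publicKey = server.ExportParameters(false);

            using (var visitor = new RSACryptoServiceProvider())
            {
                visitor.ImportParameters(publicKey);
                byte[] cipher = visitor.Encrypt(message, true);  // OAEP padding

                // Only the holder of the private key can decrypt this.
                byte[] plain = server.Decrypt(cipher, true);
                Console.WriteLine(Encoding.UTF8.GetString(plain));
            }
        }
    }
}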

Both methods have some very secure algorithms but also some drawbacks. Using a single key is risky if that key falls into the wrong hands. One way to solve this is by sending the single key to the other side using a two-key algorithm! That way, it is transferred securely, as long as the receiver’s key stays secret. In general, the receiver would decrypt with a private key, so only the recipient can read the single key you’ve sent.
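A rough sketch of that trick: generate a fresh symmetric session key and wrap it with the recipient’s public key before sending it. The wrapped key can travel over an insecure channel:

using System.Security.Cryptography;

static class KeyExchange
{
    // Only the owner of the matching private key can unwrap
    // the session key and join the conversation.
    public static byte[] WrapSessionKey(RSAParameters recipientPublicKey)
    {
        using (var aes = Aes.Create())                 // fresh single (session) key
        using (var rsa = new RSACryptoServiceProvider())
        {
            rsa.ImportParameters(recipientPublicKey);
            return rsa.Encrypt(aes.Key, true);         // OAEP padding
        }
    }
}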

A single key is also useful when encrypting files and data inside databases, since it requires only one key for both actions. Again, you would need to store that key in a secure way, which would again use a two-key algorithm: you would use a private key to encrypt the single key, and include a public key in your application to decrypt it again. You would only use that public key inside your applications, but it would allow you to use a single public key in multiple applications for access to the same data.

As I said, you need to limit access to data as much as possible. This generally means that you will be using various keys for various purposes. Right now, many different encryption algorithms are already in use, but most developers don’t even know whether the algorithm they use is symmetric or asymmetric, or maybe even a combination of both.

Algorithms like AES, Blowfish and RC4 use a single key, while systems like SSH, PGP and TLS are built on two-key algorithms. Single-key algorithms are often used for long-term storage of data, but the key needs additional security to avoid easy access to it. Two-key algorithms are often used for message systems, broadcasts and other forms of communication, because the traffic is meant to go in a single direction. You don’t want an application to store both a private key and the matching public key, because it makes encryption a bit more complex and would give a hacker a way to get the complete pair.

And as I said, a single key allows easier communication between multiple participants without the need for a central hub to translate all messages. All the hub needs to do is create a symmetric key and provide it to all participants, so they can communicate with each other without even bothering the central hub. And once the key is deleted, no one will be able to read this data anymore, destroying almost all traces of it.

So, what solution is best for your project? Well, for communications you have to decide whether you use a central hub or not. The central hub could archive everything if it stays involved in all communications, but you might not always want this. If you can provide a single key to all participants, the hub won’t be needed afterwards.

For communication in a single direction, a two-key algorithm is better, though. Both sides send their public key to the other side and use the other’s public key to encrypt messages, which can only be decrypted with the private key that only one party has. It does mean that you actually have four keys: two private keys and two public keys. But it happens to be very secure.

For data storage, using a single key is generally more practical, since applications need this key to read the data. But this single key should be considered sensitive, so you need to encrypt it with a private key and use a public key as part of your application to decrypt the original key again.

In general, you should use encryption whenever you need to store sensitive data in a way that lets you retrieve it again. This is true for most data, but not all.

In the next part, I will explain hashing and why we use it.

The need for security, part 1 of 3.

[Image: Azra Yilmaz Poses I]

Of all the things developers have to handle, security tends to be a very important one. However, no one really likes security, and we would rather live in a society where you can leave your home with the front door open. We generally don’t want to deal with security because it’s a nuisance!

The reality? We lock our doors, afraid that someone gets inside and steals things. Or worse, waits for us to return to kill us. We need security to protect ourselves, since we’re living in a world where a few people have very bad intentions. And we hate it, because security costs money: someone has to pay for the lock. And it takes time to use, because locking and unlocking a door is still an extra action you need to take.

And when you’re developing software, you generally have the same problem! Security costs money and slows things down a bit. And it is also hard to explain to a client why they have to pay for security and why the security has to cost so much. Clients want the cheapest locks, yet expect their stuff is as safe as Fort Knox or even better.

The worst part of all security measures is that they never keep everyone out. A lock on your door won’t help if you leave the window open. And if the window is locked, it is still glass that can be broken. The door can be kicked in too. There are always plenty of ways for the Bad People to get inside, so what use is security anyway?

Well, the answer is simple: to slow down any would-be attacker so he can be detected and dealt with, and to make the break-in more expensive than the value of the loot stored inside. The latter means that the more valuable the loot, the stronger your security needs to be. Fort Knox contains very valuable materials, so it has a very strong security system with cameras, lots of armed guards and extremely thick walls.

So, how does this all translate to software? Well, simple. The data is basically the loot that people are trying to get at. Legally, data isn’t property and doesn’t even have much legal protection, so it can’t be stolen. However, data can be copyrighted, or it can contain personal information about people. Or, in some cases, the data happens to be secrets that should not be exposed to the outside world. Examples of these three would be digital artwork, your name and bank account number, or the formula for a deadly poison that can be made from basic household items.

Of all this data, copyrighted material is the most common item to protect, and its protection is made harder because this material is meant to be distributed. The movie and music industries have a very hard time protecting all their copyrighted material, and the same applies to photographers and other graphic artists. And to software developers. The main problem is that you want to distribute a product in return for payment, and people are getting it without paying you. You could consider this lost profit, although if people had no option but to pay for your product, they might not have wanted it in the first place. So the lost profit is hard to prove.

To protect this kind of material, you generally need an application that can handle the data you’re publishing. For software this is easy, because you can include additional code in your project that checks whether the software has been legally installed. Often this involves a serial number and additional license information, and nowadays it tends to include calling a special web server to check whether licenses are still valid.

For music and films, you can use a technique called DRM, which works together with proper media players to make additional checks to see whether the media copy comes from a legal source. But it limits the use of your media to media players that support your DRM method. And to get media players to support your DRM method, you need to publish that method and hope it’s secure enough. But DRM has already been bypassed by hackers many times, so it has proven less effective than people hoped.

And then there’s a simpler option. Add a copyright notice to the media. This is the main solution for artwork anyways, since there’s no DRM for just graphic images. You might make the image part of an executable but then you have to build your own picture viewer and users won’t be able to use your image. Not many people want to just see images, unless it is pornography. So you will have to support the basic image file formats, which are generally .JPG or .PNG for any image on the Internet. Or .GIF for animations. And you protect them by adding a warning in the form of a copyright notice. Thus, if someone is misusing your artwork and you discover the use of your art without a proper license, then you can start legal actions against the violator and claim damages. This would start by sending a bill and if they don’t pay, go to court and have a judge force them to pay.

But media like films, music and images tend to be hard to protect, and protecting your intellectual property often requires going to court. And you won’t always win such cases either.

Next on the list is sensitive, personal information. Things like usernames and passwords, for example. One important rule to remember is that usernames should always be encrypted and passwords should always be hashed. These are two different techniques to protect data and will be explained in the next parts.

But there is more sensitive data that might need to be stored and that would be valuable. An email address can be misused to spam people, so it needs to be encrypted. Name, address and phone number can be used to look people up and harass them by ordering stuff all over the Internet and having it sent to their address. Or to file fake address changes, so they won’t receive any mail or other services. Or even to visit the address, wait until the residents have left the house and then break in. And what has happened in the past with the addresses of young children is that a child molester learns where they live and goes to visit them to rape and/or kidnap them. So this information is also sensitive and needs to be encrypted.

Other important information, such as bank account details, medical data and employment history, would also be sensitive enough to encrypt. Order information from visitors might be sensitive too if the items were expensive, since those items become interesting things to steal. Basically, you should evaluate every piece of information to determine whether it needs to be encrypted. When in doubt, encrypt it, just to be more secure.

Do keep in mind that you can often generate all kinds of reports from this personal data: a simple address list of all your customers, for example, or the complete medical file of a patient. These documents are sensitive too and need to be protected, but they’re also just basic media, like films and artwork, so copies of those reports are hard to protect and often not covered by copyright. So be very careful with report generators, and have reports contain warnings about how sensitive their data actually is. It is also useful to include a cover page as the first page of a report, in case people print it. The cover page covers the content as long as the reader keeps the report closed. It’s not much protection, but all small bits help, and a cover page prevents passers-by from easily reading the top page of the report.

Personal information is generally protected by privacy laws, and misuse of personal information is thus often considered a criminal offense. This is unlike copyright violations, which are generally civil offenses. But if you happen to be the source of leaked personal information, you and your company could be considered guilty of the same offense, and you will probably be forced to pay damages, and sometimes a large fine in cases of clear negligence in protecting this data.

The last category of sensitive data tends to be ideas, trade secrets and the like. In general, these are just media files, like reports, and thus hard to protect, although there are systems that can store specific data as personal data so you can limit access to it. Ideas and similar data are often not copyrightable. You can’t get copyright on an idea; you can only get copyright on the document that explains your idea, and anyone who hears about the idea can just use it. So if you find a solution for cleaner energy, anyone else could build your idea into something working and make a profit from it without compensating you. They don’t even have to say it was your idea!

Still, to protect an idea you can use a patent, which you have to register in many countries to protect your idea everywhere. Patents are open to the public, so everyone will know about your idea and be able to use it, but they will need to compensate you for using it. And you can basically set any price you like. This system tends to be abused by patent trolls, who describe very generic ideas and then go after anyone who seems to use something very similar. They often claim an amount of damages that is lower than what it would legally cost the accused to fight back, so they tend to get paid for this trolling. This is why many are calling for patent reform to stop these patent trolls from abusing the system.

So, ideas are very sensitive. You generally don’t want to share them with the general public, since that would allow others to implement them. Patents are a bit expensive and not always easy to defend, and you can’t patent everything anyway; some patents will be refused because the idea has already been patented before. Yet you still need to share ideas with others to build them into a project. For this, you use a non-disclosure agreement, or NDA.

An NDA is basically a contract that lets you share your idea with others while forbidding them to share it with more people without your permission. And if your idea does get leaked, those others have to compensate your financial losses due to the leak, as specified in the NDA. It’s not very secure, but it generally does keep people from leaking your ideas.

Well, except for possible whistleblowers, who might leak information about any illegal or immoral parts of your idea. For example, if your idea happens to be blowing up the subway in Amsterdam and you have an NDA with a few other terrorists to help you, things become difficult when one of them just walks to the police and reports you and your helpers. An NDA is just a contract and can be invalidated for many reasons, including the more obvious criminal actions related to it.

There are also so-called blacklists of things you can’t put in an NDA, depending on the country where you live. An NDA is just a contract and thus handled by the civil courts, and if it violates the rights of those who sign it, it can be declared invalid. One such right is free speech: you can’t ban people from even discussing whether your idea is any good.

Other sensitive information includes things like instructions for making explosives, or business information about the future plans of Intel, which could influence the stock market. Some of this information could get you into deep trouble if it leaks, up to and including civil or criminal court, resulting in fines and possibly imprisonment.

In general, sensitive information isn’t meant to be shared with lots of people, so you should seriously limit access to it. It should not be printed, and you should not email it either. The most secure location for this information is a computer with no internet connection, but a strong firewall that blocks most access methods is good enough for many purposes.

So we have media, which is hard to protect because it is meant to be published. We have sensitive data, which should not fall into the wrong hands for various reasons. And we have personal data, which is basically a special case of sensitive data that relates to people and thus has additional laws as protection.

And the way to secure it all is by posting warnings and limiting access to the data, which is difficult if the data was meant to be published. But for data we want to keep private, we have two ways of protecting it, besides limiting physical access.

To keep things private, you need user accounts with passwords or other security keys to lock the data and limit access to it. And these user accounts are themselves sensitive data, so the protection should already start right there.

Of all the things software developers do, security happens to be the most complex and expensive part, since it doesn’t provide any return on investment. All it does is provide assurance that data will only be available to those who are meant to use it.

The two ways to protect data are encryption and hashing: two similar techniques that differ in purpose. I will discuss both in my next posts.

Four models on Shapeways (NSFW)

I like Shapeways, since you can upload your own 3D designs and end up with a 3D-printed model. This allows me to, for example, create custom boxes for small hardware experiments. These boxes are combined with my Poser models, which results in very interesting designs. But like everything in 3D, you have to do some experiments first. I created three new models in Poser, named Nora, Tommi and Cassiopa, and I used an interesting trick to create a special rack to include in the pose. But first, let’s look at Nora:

[Photos]

Nora was printed in two versions: white plastic and colored sandstone. And in both models, a few flaws were already visible. Nora’s shoes were made of very thin material, and the upload to Shapeways performed a repair that removed the thinnest parts. As a result, the shoes are flawed.

[Photos]

Well, a bit of glue and plastic can fix that. But her fingers were also a bit delicate, and the sandstone version ended up with broken fingers because they are actually too thin. Again, some glue and they’re back in place.

[Photos]

Her thumb is still missing, though. Then again, I was more interested in checking how well the 3D printer handles holes, like the area where she keeps her left hand: in front of her genitals, to keep it decent, yet far enough away that it doesn’t touch. Combined with the position of her legs, this makes a complex hole to print, but it came out flawless. Even her left hand was intact.

[Photos]

So, what I’ve learned from Nora is that thin elements like fingers and shoes won’t print very well. White plastic does a better job than sandstone, though. That’s because sandstone needs further processing after the printing is done, which requires some manual labour. Thus, small parts can end up being damaged.

Another part that’s important with the sandstone version is the textures. For this, I will check her face:

[Photos]

And in case you’re wondering why her hair is covered by a towel, well… Hair really doesn’t print very well. It tends to generate loose shells or often to parts that are too thin to print. Besides, the towel makes her look as if she’s just out of bath, relaxing.

The white plastic version shows a reasonable amount of detail in her face. Even her open mouth is printed quite nicely. The sandstone model also has an open mouth, and you might see her tongue and teeth if you look inside with a microscope. But I’m more interested in her face and eyes.

Printing in colored sandstone has an ink density of about 50 DPI. Normally, a printer prints at 300 DPI, so the colors lose detail. But I chose a light-colored iris, and Nora has good-looking pupils in this print. That is important to remember, since dark eye colors might darken the whole eye. It still looks good, in my opinion. At least better than what I can do with paint and a brush.


The next model is Cassiopa. Since I knew that thin parts don’t print well, I placed her on a towel, hoping for a better result. The result is okay, but the sandstone version did not survive the print because the towel was too thin. So I uploaded a newer version of Cassiopa on a more solid floor, and in this version I also adjusted her clothing. Why? Because I need to test more than just panties on topless women. Still, the white plastic version looks okay, although it is a bit small:

[Photos]

The model is almost 15 cm long, but that’s the length of the towel; Cassiopa uses only two-thirds of it, so she’s smaller than my other models. (This also happens with one of my Tommi models.) Smaller means fewer visible details, but she is still detailed enough.

The towel she’s on has a hole in it, which is too bad but I’m not too worried about it. I now know that I can’t use these kinds of thin plateaus for my models to rest upon. In the sandstone version, the towel had crumbled away.


The last model is Tommi, which I’ve combined with a rack. I made a second version of Tommi climbing this rack, but Tommi herself becomes small if you do this, thus losing details. Let’s look at the climbing version first:

[Photos]

I gave Tommi a skirt instead of panties, so you should have been able to look up her skirt. However, Shapeways repairs this automatically, and as a result the skirt became solid. That’s a flaw in the skirt model.

This is a colored print, so her texture helps to add details, but she’s too small to show them clearly. She did have a flaw in her right hand: her fingers were too thin and either did not get printed or broke off afterwards. A bit of paint will fix that, though. It is just something to remember.

So, remember: make sure thin parts are well supported and preferably resting against something else. And with clothes, be aware that Shapeways might fill in areas that you hoped would stay hollow. In this case it was her skirt, but I also tried another interesting top on Tommi, and that added a white mass over her breasts because Shapeways filled the area between the left and right cup.

Next, the bigger version of Tommi, resting on the rack. That one was perfect, although one of the legs of the rack broke off during transport. So even if a part is thick enough to print, it might still be very vulnerable; with a length of over 4 cm, the legs can’t handle a lot of stress. Still, this model is great, with no broken appendages, and even her toenails are visible!

[Photos]

Well, at least I glued the leg back in place. I might decide to remove all four legs, though, if I fear they will break again. This model happens to be quite heavy too, which makes sense, since she has the biggest volume of all. Her eyes are nicely detailed, and her skin color even has some variation around her knees. And you can see her toenails! A bigger model is nice in that regard, so if your model has a lot of fine details, have it printed at a larger scale! Although the price will scale up too, since more material is required.

Well, these three models all look reasonably good and taught me what I need to know about printing Poser models: use a reasonably large scale, support all small parts, and be aware that hollow spaces might end up filled with extra material because Shapeways “repairs” thin areas.

I kept these models mostly undressed because I know their textures and needed to see how well color printing supports the texture details. Also, it is difficult to find Poser clothing models that work well when uploaded to Shapeways. These models are made to be rendered, not to be printed in 3D. So finding good clothes to print is difficult. For Victoria 4, her bikini top and bottom do print quite well, though. They too are filled up, but the filling is towards the body of the model and not between both cups.

Another problem is the limitations Shapeways sets on models: there’s a size limit and a polygon limit (64 MB or 1 million polygons). Poser models can easily exceed that polygon count, so you have to find a way to reduce it while keeping the textures intact.


And then there’s the rack used by both models. The rack is the same length for both and I’ve created it myself by using the Firestorm viewer with the Second Life virtual worlds, but I could have used my own OpenSim world too. I just joined several cylinders for the rounded sides and balls for the rounded corners to build the framework. I also created a square plane with a hole inside, which I copied three times and put next to one another. I then exported the whole model from the SL viewer to a Collada file, which I imported in AccuTrans 3D to clean it up a bit and to reduce the complexity of it. (For example, by merging all parts into one single part.)

And then I checked if the rack has enough space for other hardware.

[Photos]

Well, the rack isn’t wide enough for an Arduino board

Since I copied the square plane three times, I expected all holes to have the same size. The rack was made so I could add some hardware in the empty rack space and run wires or other parts through the open holes, e.g. to shine an LED light on the model. So I was surprised to discover that the middle hole is slightly bigger than the other two, which I found out by trying to fit an Arduino board. (The YUN is shown in the picture.) The rack is long enough for an Arduino Mega, but the board would stick out a few millimeters on the sides; the pins sit exactly at the location of the long bars. So you could actually put an Arduino in the rack if you don’t mind the width.

But smaller devices like the Arduino Mini, the Trinket, the NetDuino mini and the Digispark have plenty of room inside the rack.

But back to the holes!

[Photos]

Using the climbing Tommi version, I tried a green LED. It doesn’t fit the top or bottom hole, but it does fit the middle one. Trying again with a regular lamp of 5 mm diameter, it goes through the middle hole without effort, but the top and bottom holes don’t fit. A laser light won’t even fit the middle hole, though.

The conclusion is that these holes are a bit too small for LED lights. No problem, since I can take a drill bit and widen them. Still, I had hoped they would be big enough for an LED light, so I have to redo my calculations. And I have to wonder why the middle hole is bigger than the other two, while they’re basically identical in my 3D software.

Anyway, I now have two great models for housing some of my experimental hardware. I know the racks are open, so the hardware would be exposed, but that’s something I will solve in the next version of my rack. I also know how thin the walls can be and how thin the walls of my rack are. I can keep the rounded areas, but the rack should get more solid walls. Thin walls, though, since the rack already has a lot of volume.

Next is the question of what I would like to create with these models. Whatever I think of should match the model. The three holes in the rack are meant for lights, cables, buttons or something else, but I don’t want to show too much hardware on the model side of the rack. I also need to find a way to attach the additional hardware to the rack, since it doesn’t have any special pins to hold it. Then again, these models were created to see how well the racks would print. The different hole size was a surprise that I need to include in my calculations.

And the three rack-less models? They’re just nice desk ornaments. I have ordered more prints, so I will likely have more ornaments soon.

My next designs will have better racks, preferably with extra points to hold my hardware in place. The sandstone prints still look great, but I have to consider the size of the whole thing. And I will need to experiment with clothing, to see which items print best. The same goes for hair, since I still have to find hair that prints well in 3D.

All in all, 3D printing is a very interesting challenge. Slightly expensive too, though.

An example of bad development…

I recently received an email from a company that runs questionnaires. Well, I subscribed to this and did some of their questionnaires before, so I wanted to do this new one too. Unfortunately, the page loaded quite slowly, only to return a very nasty error message. A message that told me this organisation is using amateurs as developers and administrators.

Let me be clear about one thing: errors will happen. Every developer should expect weird things to happen, but this case is not just an error but evidence of amateurs at work. So, let’s start analyzing the message…

Server Error in ‘/’ Application.

Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

So, what’s wrong with this? Users should never see these messages! When you develop in ASP.NET you can just tell the system to just keep these error messages only when the user is connected locally. A remote user should see a much simpler message.

This is something the administrator of the website should have known and checked. He did not. By failing at this simple configuration setting, the organisation is leaking sensitive information about its website. Information that’s enough to convince me they’re amateurs.

This is also quite a common error message. Basically, it tells me the system has too many database connections open. One common cause is code that fails to close connections after opening them. Keep that in mind, because I will show that this is what caused the error…

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

This is the standard follow-up message. The fact that users of the site get to see this stack trace too is just bad.

Exception Details: System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

A timeout error, with a reference to the connection pool and the max pool size. This already indicates that more connections are being opened than closed and the system can’t handle that correctly. There are .NET frameworks better suited to prevent these kinds of errors, precisely because these errors happened to be very common in ASP.NET applications, and in generic database applications written in .NET.

Basically, the top of the error message just repeats itself. Blame Microsoft for that, since this is a generic message from ASP.NET itself. Developers can change the way it looks, but that’s not very common. Actually, developers should prevent users from seeing these kinds of messages to begin with. Preferably, the error should be caught by an exception handler that writes it to a log file or database and sends an alert to the administrator.

Considering that I received this error on a Friday afternoon, I bet the developer and administrators are already back home, watching television like I do now. Law & Order is just on…

Source Error:

Line 1578:
Line 1579:        cmSQL = New SqlCommand(strSQL, cnSQLconfig)
Line 1580:        cnSQLconfig.Open()
Line 1581:
Line 1582:        Try

This is interesting… The use of SqlCommand is a bit old-fashioned. Modern developers would have switched to e.g. the Entity Framework or another, more modern solution for database access. But the developers of this site just connect to the database in code, probably to execute a query, collect the data and then close the connection again. The developers are clearly using ADO.NET for this site. And I can’t help but wonder why; they could have used more modern techniques instead. But probably they just have to maintain an existing site and aren’t allowed to use more modern solutions.

But it seems to me that closing the connection is not going to happen here. There are too many connections already open, so this red line of code fails: the code creates an SqlCommand on a connection called cnSQLconfig and then tries to open that connection. Unfortunately, opening the connection happens outside the Try block, so when something fails, it is very likely that the connection won’t be closed either.

If this happened once or twice, it would not be a big problem; the connection pool is big enough. But here it just happened too often.
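In C# (the idea is identical in VB.NET), a using block would have prevented this: the connection is disposed and returned to the pool even when the query throws. A sketch, with placeholder names:

using System.Data.SqlClient;

static void RunQuery(string connectionString, string query)
{
    // Dispose happens even on exceptions, so the connection
    // always returns to the pool.
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(query, connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // process each row...
            }
        }
    }
}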

Another problem is that the ADO.NET technique used here is also vulnerable to SQL injection, which would be another good reason to use a different framework for database access. They might still be using secure code to protect against it, but what I see here doesn’t give me much confidence.
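Even within plain ADO.NET you can defend against SQL injection by using parameters instead of pasting user input into the query text. A sketch (the table and column names are made up):

using System.Data.SqlClient;

static SqlCommand BuildCommand(SqlConnection connection, string country)
{
    // The user's input travels as a parameter value, never as SQL text.
    var command = new SqlCommand(
        "SELECT * FROM Questionnaires WHERE Country = @country", connection);
    command.Parameters.AddWithValue("@country", country);
    return command;
}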

Source File: E:\wwwroot\beta.example.com\index.aspx.vb    Line: 1580

A few other interesting facts. First of all, the code was written in Visual Basic. That was already clear from the code, but this just confirms it. Personally, I prefer C# over Visual Basic, even though I’ve developed in both, and in a few other languages. Language should not matter much, especially with .NET, but C# is often considered more professional than BASIC. (Because the ‘B’ in BASIC stands for ‘Beginners’.)

Second, this file has over 1580 lines of code. I don’t know what the rest of the code is doing, but it’s probably a lot. Again, this is an old-fashioned way of developing software. Nowadays you see more use of frameworks that let developers write a lot less code, which makes the code more readable. Even in the main index of a web site, the amount of code should be reasonably low. You can use views to display the pages, models to handle the data and controllers to connect both.

Yes, that’s Model-View-Controller, or MVC. A technique that’s practical in reducing the amount of code, if used well enough.

And one more strange thing. While I replaced the name of the site with ‘example.com’, I kept the word ‘beta’ in front of it. I, a user, was using a beta version of their website! That’s bad. Users should not be used as testers, because it scares them off when things go wrong. Like in this case, where the error might even last the whole weekend because the developers and administrators are at home, enjoying their weekend.

Never let users use your beta versions! That’s what testers are for. You can ask users to become testers, but then they know they can expect errors like these.

Stack Trace:

[InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) +4863482
   System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) +117
   System.Data.SqlClient.SqlConnection.Open() +122
   _Default.XmlLangCountry(String FileName) in E:\wwwroot\beta.example.com\index.aspx.vb:1580
   _Default.selectCountry() in E:\wwwroot\beta.example.com\index.aspx.vb:1706
   _Default.Page_Load(Object sender, EventArgs e) in E:\wwwroot\beta.example.com\index.aspx.vb:251
   System.Web.UI.Control.OnLoad(EventArgs e) +99
   System.Web.UI.Control.LoadRecursive() +50
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

And that’s the stack trace. We see the site loading its controls and resources and the ‘Page_Load’ method is called at line 251. At line 1706 the system is apparently loading country-information which would be needed to set the proper language. Then it returns to line 1580 where it probably opens some table based on information from the language file.

Again, this is a lot of code for basically loading the main page. I even wonder why it needs to load data from the database based on country information. Then again, I was about to fill in a questionnaire, so it probably wanted to load the questionnaire in the proper language. If the questionnaire is multilingual, that would make sense.

Version Information: Microsoft .NET Framework Version:2.0.50727.3655; ASP.NET Version:2.0.50727.3658

And here’s one more bad thing. This site still uses .NET version 2.0 while the modern version is 4.5 and we’re close to version 5.0… It would not surprise me if these developers still use Visual Studio 2005 or 2008 for this all. If that’s the case then their budget for development is probably quite low. I wonder if the developers who are maintaining this site are even experts at software development. It’s not a lot of information that I can base this upon but in short:

  • The administrator did not prevent error messages from being shown to users.
  • The use of ADO.NET adds vulnerabilities related to the connection pool and SQL injection.
  • The use of VB.NET is generally associated with less experienced developers.
  • The amount of code is quite large, though common for sites developed years ago.
  • Not using a more modern framework makes the site more vulnerable.
  • Country information seems to be stored in XML, while the questionnaire is most likely stored in the database.
  • The .NET version has been out of date for a few years now.

My advice would be to just rewrite the whole site from scratch: use the Entity Framework for the database and MVC 4 for the site itself, rewrite it in C#, and hire more professional developers to do the work.

A very generic datamodel.

I’ve come up with several projects in the past and a few have been mentioned here before. For example, the Garagesale project which was based on a system I called “CART”. Or the WordChain project that was a bit similar in structure. And because those similarities, I’ve been thinking about a very generic datamodel that should be handled to almost any project.

The advantage of a generic database is that you can focus on the business layer without needing to change much in the database itself. The datamodel would still need development, but by using the existing model and mapping it to existing entities, you can keep it all very simple. It resulted in this datamodel:

[Class diagram]

The top class is ‘Identifier’, which is just an ID of type GUID used to find the records. It works fine in derived classes too. Since I’m using Entity Framework 6, I can just use POCO classes to keep it all very simple. All I have to do is define a DbContext that tells me which tables (classes) I want. If I don’t create an entry for ‘Identifier’, that table won’t be created either.
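A minimal sketch of that setup with Entity Framework 6 POCO classes. The class names come from the diagram; the properties are my own guesses, and I’ve flattened the Identifier base class into each POCO to keep the mapping trivial:

using System;
using System.Data.Entity;

// Conceptually every class derives from 'Identifier'; in this sketch
// the GUID key is shown directly on each POCO.
public class BaseItem
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class BaseLink
{
    public Guid Id { get; set; }
    public Guid FromItemId { get; set; }
    public Guid ToItemId { get; set; }
}

public class GenericContext : DbContext
{
    // Only the classes listed here get tables; there is no DbSet for 'Identifier'.
    public DbSet<BaseItem> BaseItems { get; set; }
    public DbSet<BaseLink> BaseLinks { get; set; }
}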

The next class is ‘DataContent’, which can hold any XML. That way, this class can contain all information that I define in code, without the need to create new tables. I also linked it to a ‘DataTemplate’ class, which can be used to validate the content of the XML with an XML schema or a special style sheet. (I still need to work out how, exactly.)

The ‘BaseItem’ and ‘BaseLink’ classes are the more important here. ‘BaseItem’ contains all fixed data within my system. In the CART system, this would be the catalog. And ‘BaseLink’ defines transactions of a specific item from one item to another. And that’s basically three-fourth of the CART system. (The template is already defined in the ‘DataTemplate’ class.)

I also created two separate link types. One deals with whole numbers and is called ‘CountLink’; you generally use it for items. (One cup, two girls, etc.) The other is for fractional numbers like weights or money and is called ‘AmountLink’. These two will be the most-used transaction types, although ‘BaseLink’ itself can be used to transfer unique items. Derived links could be created to support more special situations, but I can’t think of any.
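As POCO classes, the two derived link types would be tiny (again a sketch, with the property names being mine):

public class CountLink : BaseLink
{
    public int Count { get; set; }        // whole items: one cup, two girls...
}

public class AmountLink : BaseLink
{
    public decimal Amount { get; set; }   // fractions: weight, money...
}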

The ‘BaseItems’ class will be used to derive more special items. These special items will define the relations with other items in the system. The simplest of them being the ‘ChildItem’ class that will define more information related to a specific item. They are strongly linked to the parent item, like wheels on a car or keys on a keyboard.

The ‘Relation’ class is used to group multiple items together. For example, we can have ‘Books’ defined as relation with multiple book items linked to it. A second group called ‘Possessions’ could also be created to contain all things I own. Items that would be in both groups would be what is in my personal library.

A special relation type is ‘Property’, which indicates that all items in the relation are owned by a specific owner. No matter what happens with those items, their owner stays the same. Such a property could be, for example, a bank account with a bank as owner: even though customers use such accounts, the account itself cannot be transferred to another bank.

But the ‘Asset’ class is more interesting since assets are the only items that we can transfer. Any transaction will be about an asset moving from one item to another. Assets can still be anything and this class doesn’t differ much from the ‘BaseItem’ class.

A special asset is the contract. Contracts have a special purpose in transactions: transactions are always between an item and a contract. Either you put an asset into a contract or you extract it from a contract. And contracts themselves can be part of bigger contracts. By checking how much has been sent to and received from a contract, you can check whether all transactions combined are valid. Transactions have to specify whether they’re sending items to the contract or receiving them from it.

The ‘BaseContract’ class is the more generic contract type and manages a list of transactions. When it has several transactions, it is important that there are no more ‘phantom items’. (A phantom item would be something that’s sent to the contract but not received by another item, or vice versa.) These contracts will need to be balanced as a check to see if they can be closed or not. They should be temporary and last from the first transaction to the last.

The ‘Contract’ type derived from ‘BaseContract’ contains an extra owner. This owner will be the one who owns any phantom items in the contract. This reduces the amount of transactions and makes the contract everlasting. (Although it can still be closed.) Balancing these contracts is not required, making them ideal as e.g. bank accounts.

Yes, it’s a bit more advanced than my earlier CART system but I’ve considered how I could use this for various projects that I have in mind. Not just the GarageSale project, but also a simple banking application, a chess notation application, a project to keep track of sugar measurements for people with diabetics and my WordChain application.

The banking application would be interesting. It would start with two ‘Relation’ records: “Banks” and “Clients”. The Banks relation would contain Bank records with information about multiple banks. The Clients relation would contain the client records for those banks. And because of the datamodel, clients can have multiple banks.

Banks would be owners of bank accounts, and those accounts would be contracts. All the bank needs to do is keep track of all money going into or out of the account. (Money is just another item, and all transactions will be of type ‘AmountLink’.) But to link those accounts to the persons authorized to receive money from them, each account would need to own a Property record. The property record then lists the clients authorized to manage the account.

And we would need six different methods to create transactions. Authorized clients can add or withdraw money from the account. Other clients can send payments to or receive payments from the account, where any money received from the contract needs to be authorized. Finally, the bank charges interest, or pays it. (Or not.) These interest transactions don’t need authorization from the client.

The chess notation project would also be interesting. It would start with a Board item and 64 Square items, plus a bunch of Piece assets. The game itself would be a basic contract without an owner. The Game contract would contain a collection of transactions transferring all pieces to their starting locations. A collection of ‘Move’ contracts, owned by the Game contract, would also be needed. Each Move would show which move it is (including branches of the game) and the transactions that take place on the board. (White rook gone from A1, white rook added to A4 and black pawn removed from A4 translates into “rook takes pawn at A4”.)

It would be a very complex way to store a chess game, but it can be done in the same datamodel as my banking application.

For the diabetes project, each transaction would be a measurement. The contract would be owned by the person measuring his or her blood, and we don’t need to send or receive these measurements, just link them to the contract.

The WordChain project would be a bit more complex. It would be a bunch of items with relations, properties and children. Contracts and assets would be used to support updates to the texts, with every edit of a WordChain item kicking the old item out of the contract and adding a new item to it. That would result in one contract per word in the database.

A lot of work is still required to make sure it works as well as I expect. It would not be the most ideal datamodel for all these projects, but it helps me focus on the business layer and the GUI without worrying about database changes. Once the business model becomes more advanced, I could create a second data layer with a better datamodel to improve the performance of the data management.

Great photography, licensed or self-made…

The Internet has become extremely important in our daily lives. More importantly, it requires many developers to think more graphically. Twenty-five years ago, computers were mostly text-based with a little graphics. The Internet was about to be born, and graphics were mostly restricted to small icons and images with a limited number of colors. If you were lucky, your graphics card was a VGA card, able to handle images with 256 colors at a resolution of 640×480 pixels. A graphics standard was needed back then, and a few new formats were born.

The PCX format, created by the now-defunct Zsoft Corporation, turned out reasonably successful because it supported up to 256 colors, with a color palette that allowed those 256 colors to be selected from the full true-color range. It also supported data compression, keeping files reasonably small. Yet the decompression method was pretty fast, so the processor would not need to work hard to display the image.

The PCX format was later extended to true color, but the JPG format turned out to be better. As processors improved, the more complex compression of the JPG format became fast enough to use, and it resulted in smaller files, although the images lose some detail.

Another popular format was GIF, which allowed images with 255 colors plus a transparent color. (Or 256 colors without transparency.) This format is still popular, since it’s great for logos and cartoons and it allows animations. And the compression of GIF files reduces the image size considerably without losing any detail.

The PNG format was created as the successor of GIF and has become more popular. It was needed because modern graphics required more colors and there was a demand for better transparency. The PNG format uses 24 or 48 bits for its colors, allowing more colors than the human eye can distinguish, plus an alpha channel (24-bit only) that defines, per pixel, a transparency level anywhere between fully transparent and opaque. This was great for, e.g., dirty glass windows or thin, silk nightgowns as graphics.

There are, of course, many other graphic formats, but I want to talk about art, not formats. And this time, I want to talk about Pavel Kiselev, also known as photoport (NSFW), who likes to create glamorous pictures of pretty women. Today he posted this picture of Irene, one of his models. (I’ve licensed it for personal use, and this is my personal blog, so it should be okay.)

[Photo: Irene]

And this is the kind of photography that I love to see. Should I say more?

Well, okay… I do have to keep in mind that I wanted to relate this to software development, so I should not get distracted by continuously looking into those pretty eyes. 🙂 So, back to the software development part…

When you’re designing websites, you have to keep in mind that you will need a lot of graphics. Something simple like an icon to display in the browser is already a requirement these days, else people have some trouble finding your site among their favorites. They can, of course, read the labels in the menu but most people will glance over all icons first and clicking on the icon that they recognise as your icon. Without the icon, they have more trouble finding you so never forget to add a Favicon to your site! Something that people will easily recognize as your brand.

Next, your site will need a logo and a background image. Or at least a logo. The best logos are PNG or GIF images, because they are small and allow transparency. The image of Irene would be bad as a logo, since it’s big and takes a lot of bytes. When people visit your site over a slow internet connection, it just looks bad if the logo takes too long to download. So keep it small, yet detailed enough to be recognizable.

The background image can be bigger, unless you’re designing websites for mobile devices. For mobile devices, no background image is better, since it takes less bandwidth. Many mobile devices access the Internet through providers who charge by the megabyte of data sent or received. So for mobile sites you need to keep the amount of data to an absolute minimum; otherwise it becomes expensive to visit your mobile website, forcing visitors to stay away when they’re roaming around…

But a favicon, logo and background aren’t always enough. Let’s forget mobile devices for now and focus on regular browsers and users who pay a fixed price for their connection. Your website will probably offer some services to customers, and you need them to easily recognize what they’re looking at. And these days, more and more people dislike reading descriptions and prefer to see something graphical. You might consider hieroglyphs on your website, but not many people can read ancient Egyptian. So you need your own set of icons and images for the most important actions on your website, preferably icons with a label next to them.

Take a look at your browser and find the following buttons: Back, Next, Refresh and Home. Did you read any text to find them? Most likely you found them by looking at the images: arrows for the back and next buttons, an arrow in a circle for the refresh button, and a symbol of a house for the home button. These images have become standard, so make sure you have a few of your own to put on your website, especially if you want navigation buttons on your own site. However, do keep in mind that you either have to create these images yourself or get a proper license for images created by someone else. Considering that many icons are already in the public domain or have been released under a Creative Commons license, it should be no big problem to find some for free.

Next, you will probably need images for the products you want to sell or display. While Irene looks very pretty, I would not use her picture to sell socks; I would use a picture of socks instead. And I would make sure I licensed that picture or created it myself. Preferably, I would create multiple versions at different sizes, so I can display thumbnails first and a larger version if the user wants to see more details. Again, this speeds up loading your site.

It does create a bit of a challenge, though. Do you resize the image to a thumbnail dynamically, or do you store both the thumbnail and the original? Both have their advantages. Dynamic resizing allows you to change the thumbnail size whenever you like and even lets you create all kinds of custom sizes. However, your server needs more processing power to do the resizing, which is slow if your original images are created at huge resolutions. (Like most of my artwork.) If you’re expecting a lot of visitors, storing images at different sizes improves performance considerably but requires more disk space, which could be a minor problem when you have your site hosted and pay for storage per megabyte. Then again, hosts don’t charge much for extra disk space these days, if they charge anything at all.
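As a minimal sketch of the dynamic approach, assuming System.Drawing on .NET (the class, method name and size limit below are my own, not a prescribed implementation):

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

public static class Thumbnails
{
    // Resize an image on the fly; maxWidth is an illustrative limit.
    public static Bitmap CreateThumbnail(string path, int maxWidth)
    {
        using (var original = new Bitmap(path))
        {
            // Keep the aspect ratio of the original image intact.
            int width = Math.Min(maxWidth, original.Width);
            int height = (int)((double)original.Height * width / original.Width);

            var thumbnail = new Bitmap(width, height);
            using (var graphics = Graphics.FromImage(thumbnail))
            {
                graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
                graphics.DrawImage(original, 0, 0, width, height);
            }
            return thumbnail;
        }
    }
}
```

The trade-off from the paragraph above is visible here: every call loads the full original into memory, which is exactly the cost that pre-generated, stored thumbnails avoid.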

The image of Irene would be practical for dating sites and sites for bathing products. Her hair has a wet look, giving the impression that she just washed it. She also looks very seductive, which would certainly attract the attention of many men and probably a few women too. However, on a dating site the members would probably recognize her as a professional model and thus consider it a fake image; she’s too pretty to need a dating site. You’d probably scare a few members away if you used this image. It would still look great for selling shampoo, though.

So, you’re designing a website and you will need images to fill it. This is often the biggest problem for many companies. In many cases, developers just use Google to find some image and copy it into the project, ignoring the need for a license. They have a reason to work this way: finding proper images isn’t really a developer’s task. But it can cause legal trouble once the site is published and some photographer recognizes his images. Without a proper license, it can cost you hundreds of euros to correct the situation, and that’s without any other legal costs. So it is a really bad idea to let developers search for the proper images themselves.

A better solution is to create placeholder images. Provide the developers with dummy images that you’ve created yourself by adding a textual description to a newly created image at the preferred size. Make sure it has a proper filename too. The developer can insert this placeholder at the proper location and continue his work while you start looking for a nice image to replace it. This buys you time to get a proper license or to create the image yourself. Once you’re about to publish the site, all you have to do is replace the placeholders with the images that you actually want to display.
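Such placeholders are easy to generate in code. A minimal sketch, where the class, font and color choices are my own illustrative assumptions:

```csharp
using System.Drawing;
using System.Drawing.Imaging;

public static class Placeholders
{
    // Create a gray image with the description and the expected size drawn on it.
    public static void Create(string fileName, int width, int height, string description)
    {
        using (var bitmap = new Bitmap(width, height))
        using (var graphics = Graphics.FromImage(bitmap))
        using (var font = new Font("Arial", 16))
        {
            graphics.Clear(Color.LightGray);
            graphics.DrawString(description + "\n" + width + "x" + height,
                font, Brushes.Black, new RectangleF(0, 0, width, height));
            bitmap.Save(fileName, ImageFormat.Png);
        }
    }
}
```

Called with, say, a hypothetical filename like product-photo-socks.png at the size the design asks for, it gives the developer something with the right name and dimensions to lay out the page with.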

One more very important thing to remember: when you get a license for any image that you use, keep track of the specific details of that license. Ideally you have your own database where you store the image together with where you licensed it, where you found it, the license itself and the name of the author. You will need this information if the author, or some company representing the author, finds your image online and thinks you don’t have a proper license.

Of course, there’s the risk of a fraudulent license: you might have gotten a license from someone pretending to be the author. You can reduce this risk by keeping track of the origins of every image used by your organisation. And yes, it’s a lot of additional bookkeeping. But with this information about where you got your license, you have a good chance of getting away without financial damages if the license turns out to be fraudulent. Whether you can continue to use the image depends on the legislation of the country where your organisation is located and of the country where your website is hosted.

My personal preference is to just create images myself. This takes time, and I need opportunities to create them. For CGI artwork, my computer is fast enough to render an image in the background while I continue to work on developing my sites. Still, I am limited to one image per computer at a time, and my license for Vue limits me to using the software on just a single computer. Rendering can easily take a few hours, even days, so I have to be patient.

Of course, I could just take one of my digital cameras, but that often means I need a model, a location and, if I’m shooting outside, the right weather. That’s a lot of work for a bunch of images, and the photos need extra work once I’ve taken them: they need to be cropped, lighting needs to be adjusted, colors need to be enhanced. This is just too much work for a software developer to do on the side. So you’d better hire a professional, if you don’t have someone in your organisation dedicated to this. Do make sure the photographer you hire does it as “work for hire” so you’re the official author; otherwise, the photographer keeps influence over how you can use the photos he took!

So, organisations have the complex task of maintaining licenses and their own images. A lot of organisations tend to forget these details, which can result in costly problems. Make sure your developers have something to work with while they are developing, and make sure they don’t have to waste time on those images themselves, since developers are costly too; they should focus on the code, not the graphics. Appoint someone in your organisation to manage all images and to check anything that’s about to be published for unknown images. If an image isn’t in the system maintained by this image manager, block the publication until that is fixed.

Multithreading, multi-troubling.

Recently, I worked on a small project that needed to make a catalog of the image files and folders on my hard disk and save this catalog in a database. Since my CGI and photography hobbies generate a lot of images, it would be practical to have something simple to support it all. There’s plenty of software that already does something like this, but none that I liked, especially since I want to connect images to derived images, group them, tag them, share them, assign licenses to them and publish them. And I want to keep track of where I’ve shared them already. Are they on Flickr? CafePress? DeviantArt? Plus, I wanted to know if they should be rated as adult. Some of my CGI artwork is naughty by nature (because nude models are easier to work with) and thus unsuitable for a broad audience.

But for this simple catalog I just wanted to store the image folder, the image filename, an image name (the filename without extension and without diacritics), plus the width and height of the image so I could calculate the image ratio. To make it slightly more complex, the folder name would be a relative path based on a root folder set in the configuration. This allows me to move the images to a different folder, or use the same database on a different machine, without having to adjust all records.
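For the image name and the relative folder, something like this sketch would do. The diacritics removal via Unicode normalization is a common pattern, not necessarily what my code uses:

```csharp
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

public static class Naming
{
    // Strip diacritics by decomposing characters and dropping the combining marks.
    public static string RemoveDiacritics(string text)
    {
        var decomposed = text.Normalize(NormalizationForm.FormD);
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }

    // Make a folder path relative to the configured root folder.
    public static string RelativeFolder(string fullPath, string rootFolder)
    {
        return Path.GetDirectoryName(fullPath)
                   .Substring(rootFolder.Length)
                   .TrimStart(Path.DirectorySeparatorChar);
    }
}
```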

So, the database structure is simple: one table for the folders, one for the image ratios and one for the image names and sizes. The ratio table helps me group images by the ratio between width and height; the folder table does the same for grouping by folder. The Entity Framework connects to this database and takes away a lot of my troubles. All I have to do is write a simple library that fills and maintains this catalog, plus a console application that calls those methods. Sounds simple enough.
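In Entity Framework Code First terms, the three tables could look roughly like this; the property names are my guesses, not the actual schema:

```csharp
using System.Collections.Generic;
using System.Data.Entity;

public class Folder
{
    public int Id { get; set; }
    public string RelativePath { get; set; }           // relative to the configured root
    public virtual ICollection<ImageFile> Images { get; set; }
}

public class Ratio
{
    public int Id { get; set; }
    public int Width { get; set; }                     // reduced ratio, e.g. 16:9
    public int Height { get; set; }
    public virtual ICollection<ImageFile> Images { get; set; }
}

public class ImageFile
{
    public int Id { get; set; }
    public string FileName { get; set; }
    public string Name { get; set; }                   // no extension, no diacritics
    public int Width { get; set; }
    public int Height { get; set; }
    public virtual Folder Folder { get; set; }
    public virtual Ratio Ratio { get; set; }
}

public class CatalogContext : DbContext
{
    public DbSet<Folder> Folders { get; set; }
    public DbSet<Ratio> Ratios { get; set; }
    public DbSet<ImageFile> Images { get; set; }
}
```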

Within 30 minutes, the first version was ready. I would first enumerate all folders below the source folder, then for each folder collect all image files of type PNG, JPG and BMP. Each folder would be written to the folder table and each file to the image table. Just one minor challenge, though…

I want to add the width and height of the image to the image table too, and based on the ratio between width and height, I have to either add a new ratio record or link to an existing one. This means I have to read every file into memory to find its size and then check whether there’s already a ratio record related to it. If not, I need to add the new ratio record and make sure the next request for ratio records includes it. Plus, I need to check whether the image and folder records already exist in the database, because this tool should only add what’s new.
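The naive first version, roughly sketched; rootFolder is assumed to come from the configuration and SaveImage is a hypothetical helper standing in for the Entity Framework calls:

```csharp
using System.Drawing;
using System.IO;
using System.Linq;

var extensions = new[] { ".png", ".jpg", ".bmp" };

foreach (var folder in Directory.EnumerateDirectories(rootFolder, "*", SearchOption.AllDirectories))
{
    var files = Directory.EnumerateFiles(folder)
        .Where(f => extensions.Contains(Path.GetExtension(f).ToLowerInvariant()));

    foreach (var file in files)
    {
        // Loading the whole bitmap just to get two integers is the bottleneck.
        using (var image = Image.FromFile(file))
        {
            SaveImage(folder, file, image.Width, image.Height); // hypothetical helper
        }
    }
}
```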

The performance was horrible, as could easily be predicted, especially since I create images and photos at high resolutions, so reading those files takes dozens of milliseconds each. No matter that my six cores at 3.5 GHz and 32 GB of RAM turn my system into a speed demon, these read actions are just slow. And I did it inefficiently: I have six cores, but my code was single-threaded. So, redo from start, and this time do it multithreaded.

But multithreading and the Entity Framework don’t go well together. The database connection isn’t thread-safe, so you cannot access the database from multiple threads. Besides, the ratio table could generate collisions when two images with the same new ratio are processed at the same time: both threads would notice the ratio doesn’t exist, so both would add it, and one of them would fail because the other added it first. So I needed to change my approach.

So I used ‘Parallel.ForEach’ to walk through the folder list, and again for all files within each folder. I would collect the data in internal lists, and when the file loop was done, loop through all images and add those that didn’t exist yet. And yes, that improved performance a lot and kept the conflicts with the ratio table away. Too bad I was still reading all the images, but that was not a big issue. Performance went up from hours to slightly over one hour. Still slow.
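A minimal sketch of that approach, assuming a ConcurrentBag as the thread-safe internal list; GetImageFiles and AddIfMissing are hypothetical helpers:

```csharp
using System.Collections.Concurrent;
using System.Drawing;
using System.Threading.Tasks;

var found = new ConcurrentBag<(string Folder, string File, int Width, int Height)>();

Parallel.ForEach(folders, folder =>
{
    Parallel.ForEach(GetImageFiles(folder), file =>
    {
        using (var image = Image.FromFile(file))
        {
            found.Add((folder, file, image.Width, image.Height));
        }
    });
});

// Only after the parallel loops are done do we touch the database, single-threaded,
// which avoids both the thread-unsafe connection and the ratio collisions.
foreach (var item in found)
{
    AddIfMissing(item.Folder, item.File, item.Width, item.Height); // hypothetical helper
}
```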

So, one more addition. I would first read all existing folders and images from the database, and if a file existed in this list, I would not read its size anymore, since it wasn’t needed; I could skip the image. As a result, the first import of all images still took an hour, but the second run finished within a minute, since there wasn’t anything left to read or add. The speed was now limited to just reading the files and folders from the database and from the disk.
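The skip can be as simple as a HashSet lookup, sketched here with relative paths as the key; allFilesOnDisk and ToRelativePath are hypothetical stand-ins:

```csharp
using System.Collections.Generic;
using System.Linq;

HashSet<string> known;
using (var context = new CatalogContext())
{
    // Pull the known paths into memory once, then close the connection.
    known = new HashSet<string>(
        context.Images.Select(i => i.Folder.RelativePath + "/" + i.FileName));
}

// Only files that are not yet in the database need their sizes read.
var newFiles = allFilesOnDisk.Where(f => !known.Contains(ToRelativePath(f))).ToList();
```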

When you take on these kinds of projects in an Agile team and you’re scrumming around, things will slow down considerably if you haven’t thought about these challenges before the sprint starts. Since the first version looks quite simple, you might plan it as a very short task and end up with extremely slow code. In the next sprint you would have to consider options to speed things up, and you’d realize that making it multithreaded is a bigger task. And while working on the multithreaded version, you might discover the conflicts with the Entity Framework plus the possible collisions within the tables. So the second sprint might end with a faster but buggy solution, with lots of exception handling to catch all possible problems. The third sprint would then fix these, if you manage to find a better solution. Otherwise, this problem might haunt you until the deadline of the project…

And this is where teams have to be really careful. The task sounds very simple, but it isn’t. These things are easily underestimated by a team and should be well planned before you start writing code. Experienced developers will detect these problems before they start, and thus know that they should take their time and plan carefully instead of writing code immediately. (I only did it so I could write this post.) The task seems extremely simple, and I managed to describe it in the second paragraph of this post in just three lines. But a solution with high performance requires me to think before I start writing code.

My latest approach is the most promising, though. It can be done with multithreading, but it’s far more complex than you’d assume at first. And it will be memory-hungry, because you need to keep several lists in memory.

You start with two threads. One thread reads the database and generates lists of files, folders and ratios. These lists must be completely in-memory, because if you keep them as queryable lists, the system will keep going back to the database. Besides, once you’re done generating these lists, you want to close the database connection. This gives you everything you already have. The second thread reads all folders and, using parallel threads, reads all image files within those folders. But you don’t read the image sizes yet, nor calculate any ratios.
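Sketched with Tasks, reusing the CatalogContext from the earlier sketch; rootFolder is again assumed to come from the configuration:

```csharp
using System.Collections.Generic;
using System.Data.Entity;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

List<Folder> dbFolders = null;
List<ImageFile> dbImages = null;
List<string> diskFolders = null;

var dbTask = Task.Run(() =>
{
    using (var context = new CatalogContext())
    {
        // ToList() forces everything into memory, so the connection can close.
        dbFolders = context.Folders.ToList();
        dbImages = context.Images.Include(i => i.Folder).ToList();
    }
});

var diskTask = Task.Run(() =>
{
    diskFolders = Directory
        .EnumerateDirectories(rootFolder, "*", SearchOption.AllDirectories)
        // Store relative paths so they compare directly with the database list.
        .Select(d => d.Substring(rootFolder.Length).TrimStart(Path.DirectorySeparatorChar))
        .ToList();
});

Task.WaitAll(dbTask, diskTask);
```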

When you’re done collecting the data, you have to compare it all. Start by comparing the lists of folders. Folders that exist in both lists can be ignored (but not their files). Folders that exist in the database list but not on disk should be deleted, including all files within them! Folders that are on disk but not in the database need to be added. Now you can start two threads, each with its own database connection: one deletes the removed folders plus their related images, while the other adds the new folders found on disk. By using two database connections, you speed things up. You do have to wait for both threads to finish, but it shouldn’t be slow.
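The three-way split falls out of two set operations, sketched here on the lists from the previous sketch; DeleteFolders and AddFolders are hypothetical helpers that would each open their own connection:

```csharp
using System.Linq;
using System.Threading.Tasks;

var dbPaths = dbFolders.Select(f => f.RelativePath).ToList();

var toDelete = dbPaths.Except(diskFolders).ToList();  // in database, gone from disk
var toAdd    = diskFolders.Except(dbPaths).ToList();  // on disk, new to the database

// Two threads, two connections: deletes and inserts run side by side.
Task.WaitAll(
    Task.Run(() => DeleteFolders(toDelete)),
    Task.Run(() => AddFolders(toAdd)));
```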

The next step is the comparison of images. Here you do something similar as with the folders: you split the lists into three different lists. One with all images that are unchanged, one with all images that need to be deleted, and one with all images that need to be added. You create a separate thread with its own database connection to delete the images, so your main process can start working on the ratios table.

Because we now know which images need to be added, we can go through those files using parallel processing, read the image width and height, and add this information to the image records. Once this list is enriched with the sizes, we can use a LINQ query to generate the list of all ratios of those images, removing the duplicates. This gives us the list of ratios that we need to check.
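The ratio of an image reduces by the greatest common divisor of its width and height, and LINQ’s Distinct over an anonymous type drops the duplicates. A sketch, assuming the enriched newImages list:

```csharp
using System.Linq;

// Euclid's algorithm: reduces e.g. 1920x1080 to 16:9.
static int Gcd(int a, int b) => b == 0 ? a : Gcd(b, a % b);

var candidateRatios = newImages
    .Select(i =>
    {
        var g = Gcd(i.Width, i.Height);
        return new { Width = i.Width / g, Height = i.Height / g };
    })
    .Distinct() // anonymous types compare by value, so duplicates vanish
    .ToList();
```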

Before we add the new images, we have to check the ratios table. As with the folders table, we check for differences. However, we cannot delete ratios that we haven’t found among the new images, because we skipped the images that already exist; we’ll do that later. So we first add the new ratios to the database. This too could be done in a separate thread, but it’s pretty fast anyway, so why bother? A performance gain of two seconds isn’t worth the extra effort when the whole process takes minutes. So: add the new ratios.

Once all ratios are added, we can add all images. We could do this using parallel threads, with each thread creating a new database connection and processing all images from one specific folder or with one specific ratio. But if you want to add them multithreaded, I would just recommend dividing the images into groups of similar size. Keep the number of groups proportional to the number of processors (e.g. 24 groups for my six cores) and let the system do its work. By dividing the images evenly over multiple threads, they should all take about the same amount of time.
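Grouping by index modulo the group count gives evenly sized groups, and each thread then gets its own context. A sketch, again reusing the hypothetical CatalogContext:

```csharp
using System.Linq;
using System.Threading.Tasks;

int groupCount = 24; // rule of thumb from above: a few groups per core

var groups = newImages
    .Select((image, index) => new { image, index })
    .GroupBy(x => x.index % groupCount, x => x.image);

Parallel.ForEach(groups, group =>
{
    // One DbContext, and thus one connection, per thread.
    using (var context = new CatalogContext())
    {
        foreach (var image in group)
        {
            context.Images.Add(image);
        }
        context.SaveChanges();
    }
});
```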

When adding the new images, you have to find the related folder and ratio records in the database again. This makes adding images slower than adding folders or ratios, because of the extra lookups. It would be faster if we had kept the folder and ratio lists as queryable lists, but then we could not open and close the connections, nor could we use multiple connections to add the images. And we want multiple connections, to speed things up. So we accept slightly worse performance at this point, although we could probably speed it up a bit by using a stored procedure to add the images. The stored procedure would have parameters for the image name, the image filename, the width and height, the folder name and the ratio width and height. I’m not too fond of procedures with many parameters and I haven’t tested whether this would improve performance, but in theory it should be faster, especially when the database is on a different machine than the application.
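Calling such a procedure through the Entity Framework could look like this; the procedure name AddImage and its parameter list are hypothetical, and the procedure itself is not shown:

```csharp
// ExecuteSqlCommand sends the call straight to the database,
// bypassing the change tracker entirely.
context.Database.ExecuteSqlCommand(
    "EXEC AddImage @p0, @p1, @p2, @p3, @p4, @p5, @p6",
    image.Name, image.FileName, image.Width, image.Height,
    folderPath, ratioWidth, ratioHeight);
```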

And thus a simple task of adding images to a database turns out to be complex, simply because we need better performance. It will still take hours when there are a lot of new images to add, but once the catalog is mostly filled, it will do quite well.

But you have to ask yourself and your team whether you are capable of detecting these problems before you start a new sprint. Designs look simple, because designers don’t always keep performance in mind. These things are easily asked for because they appear very simple, but they have a lot of consequences. Similar problems arise in projects that need to be secure. The design might ask for a login screen with username and password, and optionally a few OpenID providers as alternative logins, but the amount of code needed to manage all this data and keep it secure is quite complex. These are the moments when you need to write some technical documentation first, something people often forget when working on an Agile project.

Still, you cannot blame the developer if the designer writes just a few lines and the developer picks the first, slow solution; the result would be the requested task. It is the designer who needs to be aware of these possible performance pitfalls. And with Agile, you have a team: all team members should be able to point out that this simple description hides these pitfalls, making it a long and complex task. They should all realise that they have to discuss possible solutions, preferably as a team around a single computer. (The computer would be used to look up information, not to write code!) Only when they agree on the proper solution should one or two of them start writing code, and by then they will know how long the task will take. The task would thus span two sprints: in the first sprint, all team members get a small task to meet and discuss the options; in the second sprint, one or more members get the big task of implementing the code.

Or, to keep it simple: think before you start writing code!