The need for security, part 3 of 3.

Do we really need to hash data? And how do we use those hashed results? That is the current topic.

Hashing is a popular method for generating a key for a piece of data. This key can be used to check whether the data is unmodified and thus still valid. It is often used as an index for data, but also as a way to store passwords. The hashed value isn’t unique in general, though. It is often just a large number within a specific range of values. If this range happens to be 0 to 9, any piece of data will map to one of 10 values as its identifier, so if we store 11 pieces of data as hashes, at least two of them will generate the same hash value. Those duplicates are called collisions.

There are various hashing algorithms designed to have a large numerical range, because chances of collisions are much bigger in smaller ranges. Many algorithms have also been created to generate a more even distribution of hash values, which further reduces the chance of collisions.

So, let’s have a simple example. I will hash a positive number into a value between 0 and 9 by adding all its digits to get a smaller number. I will repeat this for as long as the resulting number is larger than 9. So the value 654321 would be 6+5+4+3+2+1, or 21. That would become 2+1, thus the hash value would be 3. Unfortunately, this algorithm won’t distribute the hash values equally. The value 0 will only occur when the original value is 0. Fortunately, the other values are distributed equally, as can be demonstrated by the following piece of code:

using System;
 
namespace SimpleHash
{
    class Program
    {
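        // Sums the decimal digits of value, repeating until a single digit (0..9) remains.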
        static int Hash(int value)
        {
            int result = 0;
            while (value > 0)
            {
                result += value % 10;
                value /= 10;
            }
            if (result >= 10) result = Hash(result);
            return result;
        }
 
        static void Main(string[] args)
        {
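            // Count how many of the first million integers end up in each of the 10 hash buckets.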
            int[] index = new int[10];
            for (int i = 0; i < 1000000; i++) { index[Hash(i)]++; }
            for (int i = 0; i < 10; i++) { Console.WriteLine("{0}: {1}", i, index[i]); }
            Console.ReadKey();
        }
    }
}

Well, it only demonstrates this for the values up to a million, but it shows that 999,999 of those numbers result in a value between 1 and 9 (111,111 for each value) and only one results in 0, so exactly one million values map onto just 10 hash values.

As you can imagine, I can use such a hash to divide a large group of numbers into 10 smaller groups. This becomes practical when you need to search through data and you have a bigger hash range. Imagine having 20 million unsorted records and a hash value between 1 and 100,000. Normally, you would have to look through all 20 million records, but if they’re indexed by hash value, you just calculate the hash for a piece of data and would only have to compare about 200 records. That improves performance, but at the cost of maintaining an index, which you need to build. And the data needs to be an exact match, otherwise the hash value will be different and you won’t find it.
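
As a sketch, such a hash index can be as simple as a dictionary of buckets; the string records here are just an example, not anything from a real system:

using System.Collections.Generic;

class HashIndex
{
    // Groups records by their hash value so a lookup only has to scan
    // one small bucket instead of the whole collection.
    private readonly Dictionary<int, List<string>> buckets =
        new Dictionary<int, List<string>>();

    public void Add(string record, int hash)
    {
        if (!buckets.TryGetValue(hash, out List<string> bucket))
        {
            bucket = new List<string>();
            buckets[hash] = bucket;
        }
        bucket.Add(record);
    }

    // Only the records in the matching bucket need an exact comparison.
    public List<string> Candidates(int hash)
    {
        return buckets.TryGetValue(hash, out List<string> bucket)
            ? bucket
            : new List<string>();
    }
}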

But we’re focusing on security now, and the fact that you need an exact match makes hashing a perfect way to check a password. Because you want to limit the amount of sensitive data you keep, you should not store passwords at all. If a user forgets a password, it can be reset, but you should never be able to simply tell them their current password. That’s a secret only the user should know.

Thus, by using a hash, you make sure the user provided the right password. But there is a risk of collisions, so passwords like “Wodan5tr1ke$Again” and “123456” might actually result in the same hash value! The user thinks his password is secure, yet something almost everyone seems to have used as a password would also unlock all treasures! That would be bad, so you need two things to prevent this.

First of all, the hash algorithm needs to provide a huge range of possible values. The more, the better. If the result happens to be a 256-bit value, that would be great, and bigger numbers are even more welcome. The related math will be more complex, but hashing algorithms don’t need to be fast anyway. Fast algorithms actually speed up brute-force attacks, so with hashing, slower algorithms are better. The user can wait a second or two. But for a hacker, two seconds per attempt means he’ll spend weeks or longer just to try a million possible passwords through brute force.
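
To make this concrete, here is a minimal sketch of a deliberately slow password hash in C#. The article doesn’t prescribe an algorithm; I’m assuming PBKDF2 here, via .NET’s Rfc2898DeriveBytes, and the iteration count is something you would tune yourself:

using System;
using System.Security.Cryptography;

class SlowHashExample
{
    // Derives a 256-bit hash from a password using PBKDF2 (Rfc2898DeriveBytes).
    // The iteration count is what makes each attempt deliberately slow.
    static byte[] HashPassword(string password, byte[] salt, int iterations)
    {
        using (var pbkdf2 = new Rfc2898DeriveBytes(password, salt, iterations))
        {
            return pbkdf2.GetBytes(32); // 32 bytes = 256 bits
        }
    }

    static void Main()
    {
        // A random 16-byte salt; store it next to the resulting hash.
        var salt = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }

        // Tune the iteration count so one hash takes a noticeable fraction
        // of a second on your own hardware.
        byte[] hash = HashPassword("Wodan5tr1ke$Again", salt, 100000);
        Console.WriteLine(Convert.ToBase64String(hash));
    }
}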

Second, it is a good idea to filter out simple, easy-to-guess passwords and to use a minimum length requirement together with complexity requirements, like upper and lower case letters combined with a digit and a special character. And users should not only pick a password that qualifies for these requirements; whenever they enter a password, these checks should be performed before you check the hash value. This way, even if a simple password collides with one of the more complex ones, it will still be denied, since it doesn’t match the requirements.

Use regular expressions, if possible, to check whether a password matches all your requirements, and please allow users to enter special characters and long passwords. I’ve seen too many sites which block the use of special characters and only use the first 6 characters for whatever reason, making their security a lot weaker. (And they also tend to store passwords in plain text, to add insult to injury!)
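
Such a check could look like this in C#; the exact rules (length 10, the four character classes) are just example requirements, not something from the article:

using System.Text.RegularExpressions;

class PasswordRules
{
    // Example requirements: at least 10 characters, with an upper case letter,
    // a lower case letter, a digit and a special character.
    static readonly Regex Requirements = new Regex(
        @"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9]).{10,}$");

    public static bool IsAcceptable(string password)
    {
        return Requirements.IsMatch(password);
    }
}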

Security is a serious business and you should never store more sensitive data than needed. Passwords should never be stored anyway. Only hashes.

If you want an even stronger password check, concatenate the user name to the password. Convert the user name to upper case (or lower case) first, so the user name is case-insensitive. Don’t do the same with the password, though! The result is that the user name and password together produce the hash value, so even if multiple people use the same password, they will still have different hashes.

Why is this important? Because some passwords happen to be very common, and if a hacker knows one such password, he could look in the database for identical hashes and would then know the proper passwords for those accounts too! By adding the user name, the hash will be different for every user, even if they all use the same password. This trick is often forgotten, yet it is simple enough to make your system a lot more secure.
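
A minimal sketch of that trick in C#. SHA-256 keeps the example short; in real code you would feed the combined string into a slow algorithm like the PBKDF2 example above:

using System.Security.Cryptography;
using System.Text;

class AccountHash
{
    // Combines the case-insensitive user name with the case-sensitive password
    // before hashing, so identical passwords still give different hashes per user.
    public static byte[] Compute(string userName, string password)
    {
        string combined = userName.ToUpperInvariant() + password;
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(Encoding.UTF8.GetBytes(combined));
        }
    }
}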

You can also include the timestamp of when the user registered their account, their gender or other fixed data that won’t change after the account is created. And if you do allow users to change their account name, require them to provide their password too, so you can calculate the new hash value.

The need for security, part 2 of 3.

What is encryption and what do we need to encrypt? That is an important question that I hope to answer now.

Encryption is a way to protect sensitive data by making it harder to read. It basically has to prevent people from looking at the data and immediately recognizing it. Encryption is thus a very practical solution for hiding data from plain view, but it doesn’t stop machines from using a few extra steps to read your data again.

Encryption can be very simple. There’s the Caesar cipher, which basically shifts letters in the alphabet. In a time when most people were illiterate, this was actually a good solution. But nowadays, many people can decipher these texts without a lot of trouble, and some can do it in their heads without making notes. Still, some people like to use ROT13 as a very simple encryption solution, even though it’s almost the same as having no encryption at all. But combined with other encryption methods or even hashing methods, it can make encrypted messages harder to read, because the input for the more complex encryption method already has a simple layer of encryption.
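
For illustration, a minimal ROT13 implementation in C#; applying it twice yields the original text, which shows how little protection it offers:

class Rot13
{
    // Shifts every letter 13 places within its alphabet. Applying the method
    // twice returns the original text, which is why it offers so little protection.
    public static string Apply(string text)
    {
        var result = new char[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c >= 'a' && c <= 'z') result[i] = (char)('a' + (c - 'a' + 13) % 26);
            else if (c >= 'A' && c <= 'Z') result[i] = (char)('A' + (c - 'A' + 13) % 26);
            else result[i] = c;
        }
        return new string(result);
    }
}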

Encryption generally comes with a key. And while ROT13 and Caesar’s cipher don’t seem to have one, you can still construct one by creating a table that tells how each character gets translated. Then again, even the mathematical formula can be considered a key.

Having a single key allows secret communications between two or more persons and thus keeps data secure. Every person receives the key and will be able to use it to decrypt any incoming messages. These are called symmetric-key algorithms, and they basically allow communication between multiple parties, where each member can read all messages.

The biggest problem of using a single key is that the key might fall into the wrong hands, thus allowing more people access to the data than originally intended. That makes the use of a single key more dangerous in the long run, but it is still practical for smaller sessions between multiple parties, as long as each member has secure access to the proper key. And the key needs to be replaced often.

A single key could be used by chat applications where several people will join the chat. They would all retrieve a key from a central environment and thus be able to read all messages. But you should not store the information for a long time.

A single key can also be used to store sensitive data in a database, since you would only need a single key to read the data.
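
A minimal sketch of such single-key (symmetric) encryption in C#, using the AES implementation from System.Security.Cryptography; storing the IV next to the ciphertext is my assumption of a typical setup:

using System.Security.Cryptography;

class SymmetricExample
{
    // Encrypts data with AES using a single key. Anyone who holds the key
    // (plus the IV stored alongside the ciphertext) can decrypt it again.
    public static byte[] Encrypt(byte[] data, byte[] key, out byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key; // e.g. a 256-bit key shared by all participants
            aes.GenerateIV();
            iv = aes.IV;
            using (var encryptor = aes.CreateEncryptor())
            {
                return encryptor.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }
}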

A more popular solution is an asymmetric-key algorithm, or public-key algorithm. Here, you have two keys: you keep the private (master) key and give others the public key. The advantage of this system is that you can encrypt data with either of the two keys, but you can’t use the same key to reverse that action again. That makes it very useful for sending data in a single direction. The private key encrypts data and you would need the public key to decrypt it, or the public key encrypts data and you would need the private key to decrypt it.
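
A minimal sketch of this one-way property in C#, assuming a modern .NET runtime for the OAEP-SHA256 padding:

using System;
using System.Security.Cryptography;
using System.Text;

class AsymmetricExample
{
    static void Main()
    {
        using (var rsa = RSA.Create())
        {
            // Hand out the public key; the private key never leaves this side.
            RSAParameters publicKey = rsa.ExportParameters(false);
            byte[] secret = Encoding.UTF8.GetBytes("top secret");

            // Anyone with the public key can encrypt...
            byte[] encrypted;
            using (var sender = RSA.Create())
            {
                sender.ImportParameters(publicKey);
                encrypted = sender.Encrypt(secret, RSAEncryptionPadding.OaepSHA256);
            }

            // ...but only the holder of the private key can decrypt.
            byte[] decrypted = rsa.Decrypt(encrypted, RSAEncryptionPadding.OaepSHA256);
            Console.WriteLine(Encoding.UTF8.GetString(decrypted));
        }
    }
}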

Using two keys thus limits communication to a central hub and a group of people. Everything needs to be sent to the central hub, and from there it can be broadcast to the others. For a chat application this would be less useful, since it means the central hub has to do more work. It needs to continuously decrypt and encrypt data, even if the hub doesn’t need to know the content of that data.

For things like email and secure web pages, two keys are practical, though. The mail or web server gives the public key to anyone who wants to connect, so they can encrypt sensitive data before sending it to the server, and only the server can read it by using the private key. The server can then use the private key to encrypt new data and send it to the visitor, who uses the public key to decrypt the message again. Thus, you have secure communications between two parties. (In practice, protocols like TLS use this exchange mainly to set up a shared session key, but the principle stands.)

Both methods have some very secure algorithms, but also some drawbacks. Using a single key is risky if that key falls into the wrong hands. One way to solve this is by sending the single key to the other side using a two-key algorithm! That way, it is transferred securely, as long as the receiving key is secret enough. In general, the recipient’s key should be the private key, so only the recipient can read the single key you’ve sent.

A single key is also useful when encrypting files and data inside databases, since it would only require one key for both actions. Again, you would need to store that key in a secure way, which would again use a two-key algorithm. You would encrypt the single key with a public key and let your application use the matching private key, stored securely, to decrypt it again. That also allows multiple applications to share one key pair for access to the same data.
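
A sketch of that key-wrapping idea in C#, under the same modern-runtime assumption as before; the RSAParameters values would come from wherever you store your key pair:

using System.Security.Cryptography;

class KeyWrappingExample
{
    // Protects a symmetric (single) key by encrypting it with a public key;
    // only the holder of the matching private key can recover the AES key.
    public static byte[] WrapKey(byte[] aesKey, RSAParameters publicKey)
    {
        using (var rsa = RSA.Create())
        {
            rsa.ImportParameters(publicKey);
            return rsa.Encrypt(aesKey, RSAEncryptionPadding.OaepSHA256);
        }
    }

    public static byte[] UnwrapKey(byte[] wrappedKey, RSAParameters privateKey)
    {
        using (var rsa = RSA.Create())
        {
            rsa.ImportParameters(privateKey);
            return rsa.Decrypt(wrappedKey, RSAEncryptionPadding.OaepSHA256);
        }
    }
}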

As I said, you need to limit access to data as much as possible. This generally means that you will be using various different keys for various purposes. Right now, many different encryption algorithms are already in use but most developers don’t even know if the algorithm they use is symmetrical or asymmetrical. Or maybe even a combination of both.

Algorithms like AES, Blowfish and RC4 use a single key, while systems like SSH, PGP and TLS use two-key (public-key) algorithms. Single-key algorithms are often used for long-term storage of data, but the key would have additional security to avoid easy access to it. Two-key algorithms are often used for message systems, broadcasts and other forms of communication, because the data is meant to go in a single direction. You don’t want an application to store both a private key and the matching public key, because it makes encryption a bit more complex and would provide a hacker a way to get the complete pair.

And as I said, a single key allows easier communications between multiple participants without the need for a central hub to translate all messages. All the hub needs to do is create a symmetric key and provide it to all participants so they can communicate with each other without even bothering the central hub. And once the key is deleted, no one will be able to read this data anymore, thus destroying almost all traces of the data.

So, what solution would be best for your project? Well, for communications you have to decide if you use a central hub or not. The central hub could archive it all if it stays involved in all communications, but you might not always want this. If you can provide a single key to all participants then the hub won’t be needed afterwards.

For communications in one single direction, a two-key algorithm would be better, though. Both sides would send their public key to the other side and use the other’s public key to send messages, which can only be decrypted by the private key that only one party has. It does mean that you actually have four keys: two private keys and two public keys. But it happens to be very secure.

For data storage, using a single key is generally more practical, since applications will need this key to read the data. But this single key should be considered sensitive itself, so you encrypt it with a public key and let the application use a securely stored private key to decrypt the original key again.

In general, you should use encryption whenever you need to store sensitive data in a way that you can also retrieve it again. This is true for most data, but not always.

In the next part, I will explain hashing and why we use it.

The need for security, part 1 of 3.

Of all the things developers have to handle, security tends to be a very important one. However, no one really likes security, and we would rather live in a society where you can leave your home with the front door open. We generally don’t want to deal with security because it’s a nuisance!

The reality? We lock our doors, afraid that someone will get inside and steal things. Or worse, wait for us to return and kill us. We need security to protect ourselves, since we’re living in a world where a few people have very bad intentions. And we hate it, because security costs money: someone has to pay for the lock. And it takes time to use it, because locking and unlocking a door is still an extra action you need to take.

And when you’re developing software, you generally have the same problem! Security costs money and slows things down a bit. It is also hard to explain to a client why they have to pay for security and why it has to cost so much. Clients want the cheapest locks, yet expect their stuff to be as safe as in Fort Knox, or better.

The worst part of all security measures is that they are never able to keep everyone out. A lock on your door won’t help if you leave the window open. And if the window is locked, it is still glass that can be broken. The door can be kicked in too. There are always plenty of ways for the Bad People to get inside, so what use is security anyway?

Well, the answer is simple: to slow down any would-be attacker so he can be detected and dealt with, and to make the break-in more expensive than the value of the loot stored inside. The latter means that the more valuable the loot is, the stronger your security needs to be. Fort Knox contains very valuable materials, so it has a very strong security system with cameras, lots of armed guards and extremely thick walls.

So, how does this all translate to software? Well, simple. The data is basically the loot that people are trying to get at. Legally, data isn’t property and doesn’t even have much legal protection, so it can’t really be stolen. However, data can be copyrighted, or it can contain personal information about people. Or, in some cases, the data happens to be a secret that should not be exposed to the outside world. Examples of these three would be digital artwork, your name and bank account number, or the formula for a deadly poison that can be made from basic household items.

Of all this data, copyrighted material is the most common item to protect, and this protection is made harder because the material is meant to be distributed. The movie and music industry is having a very hard time protecting all the copyrighted material it has, and the same applies to photographers, other graphic artists and software developers. The main problem is that you want to distribute a product in return for payment, and people are getting it without paying you. You could consider this lost profit, although if people had no option but to pay for your product, they might not have wanted it in the first place. So the loss is hard to prove.

To protect this kind of material you will generally need an application that can handle the data you’re publishing. For software, this would be easy, because you would include additional code in your project that checks whether the software has been legally installed or not. Often, this includes a serial number and additional license information, and nowadays it tends to include calling a special web server to check if licenses are still valid.

For music and films, you can use a technique called DRM, which works together with proper media players to make additional checks to see whether the media copy happens to come from a legal source or not. But it limits the use of your media to media players that support your DRM methods. And to get media players to support your DRM methods, you need to publish those methods and hope they’re secure enough. But DRM has been bypassed by hackers many times already, so it has proven to be less effective than people hoped.

And then there’s a simpler option: add a copyright notice to the media. This is the main solution for artwork anyway, since there’s no DRM for plain graphic images. You could make the image part of an executable, but then you have to build your own picture viewer and users won’t be able to use your image. Not many people want to just see images, unless it is pornography. So you will have to support the basic image file formats, which are generally .JPG or .PNG for any image on the Internet, or .GIF for animations. And you protect them by adding a warning in the form of a copyright notice. Thus, if someone is misusing your artwork and you discover the use of your art without a proper license, you can start legal action against the violator and claim damages. This would start by sending a bill, and if they don’t pay, going to court and having a judge force them to pay.

But media like films, music and images tend to be hard to protect and often require going to court to protect your intellectual property. And you won’t always win such cases either.

Next on the list is sensitive, personal information. Things like usernames and passwords, for example. One important rule to remember is that usernames should always be encrypted and passwords should always be hashed. These are two different techniques to protect data and will be explained in the next parts.

But there is more sensitive data that might need to be stored and which would be valuable. An email address can be misused to spam people, so it needs to be encrypted. Names, addresses and phone numbers can be used to look people up and annoy them by ordering stuff all over the Internet and having it sent to their address. Or to file fake change-of-address notices so they won’t receive any mail or other services. Or even to visit the address, wait until the people have left the house and then break in. And it has happened in the past that a child molester learned the address of young children and went to visit them to rape and/or kidnap them. So, this information is also sensitive and needs to be encrypted.

Other information, like bank account details, medical data and employment history, would also be sensitive enough to encrypt. Order information from visitors might be sensitive too if the items were expensive, since those items would become interesting things to steal. You should basically evaluate every piece of information to determine whether it needs to be encrypted or not. When in doubt, encrypt it, just to be more secure.

Do keep in mind that you can often generate all kinds of reports from this personal data. A simple address list of all your customers, for example. Or the complete medical file of a patient. These documents are sensitive too and need to be protected, but they’re also just basic media, like films and artwork, so copies of those reports are hard to protect and often not protected by copyright. So be very careful with report generators and have reports contain warnings about how sensitive the data in them actually is. Also useful is to include a cover page as the first page of a report, in case people print it. The cover page will then cover the content if the user keeps it closed. It’s not much protection, but all small bits are useful, and a cover page prevents passers-by from easily reading the top page of the report.

Personal information is generally protected by privacy laws, and misuse of personal information is thus often considered a criminal offense. This is unlike copyright violations, which are generally just civil offenses. But if you happen to be the source of leaked personal information, you and your company could be considered guilty of the same offense and will probably be forced to pay for damages, and sometimes a large fine in case of clear negligence in protecting this data.

The last category of sensitive data tends to be ideas, trade secrets and more. In general, these are just media files, like reports, and thus hard to protect, although there are systems that can store specific data as personal data so you can limit access to it. Ideas and other similar data are often not copyrightable. You can’t get copyright on an idea; you can only get copyright on the document that explains your idea, and anyone who hears about your idea can just use it. So if you find a solution for cleaner energy, anyone else could basically build your idea into something working and make a profit from it without providing you any compensation. They don’t even have to say it was your idea!

Still, to protect ideas you can use a patent, which you will have to register in many countries just to protect your idea everywhere. Patents become open to the public, so everyone will know about them and be able to use them, but they will need to compensate you for using your idea. And you can basically set any price you like. This system tends to be abused by patent trolls, who describe very generic ideas and then go after anyone who seems to use something similar. They often claim an amount of damages that is lower than the legal costs the accused would incur by fighting back, so they tend to get paid for this trolling. This is why many are calling for patent reforms to stop these trolls from abusing the system.

So, ideas are very sensitive. You generally don’t want to share them with the general public, since that would allow others to implement them. Patents are a bit expensive and not always easy to defend. And you can’t patent everything anyway; some patents will be refused because the idea has been patented before. Yet you still need to share your idea with others so you can build it into a project. And for this, you would use a non-disclosure agreement, or NDA.

An NDA is basically a contract that makes sure you can share your idea with others while they are not allowed to share it with more people without your permission. And if your idea does get leaked, those others will have to compensate your financial losses due to the leak, as specified in the NDA. It’s not very secure, but it generally does prevent people from leaking your ideas.

Well, except for possible whistleblowers who might leak information about any illegal or immoral parts of your idea. For example, if your idea happens to be to blow up the subway in Amsterdam and you have an NDA with a few other terrorists who will help you, things become difficult when one of those others just walks to the police to report you and your helpers. An NDA is just a contract and can be invalidated for many reasons, including the more obvious criminal actions that would relate to it.

There are also so-called blacklists of things you can’t enforce in an NDA, depending on the country where you live. An NDA is just a contract and is thus handled by the civil courts. And if the NDA violates the rights of those who sign it, it could be invalid. One such right would be free speech, where you would ban people from even discussing whether your idea is any good.

Other sensitive information would be things like instructions on how to make explosives, or business information about the future plans of Intel, which could influence the stock market. If such information leaks, it could get you into deep trouble, with civil court or criminal court as part of your problems, resulting in fines and possibly imprisonment.

In general, sensitive information isn’t meant to be shared with lots of people, so you should seriously limit access to it. It should not be printed and you should not email it either. The most secure location for this information would be a computer with no internet connection, but having a strong firewall that blocks most access methods would be good enough for many purposes.

So we have media, which is hard to protect because it is meant to be published. We have sensitive data which should not fall in the wrong hands for various reasons and we have personal data, which is basically a special case of sensitive data that relates to people and thus has additional laws as protection.

And the way to secure it all is by posting warnings and limiting access to the data, which is difficult if the data was meant to be published. But for the data we want to keep private, we have two ways of protecting it, next to limiting physical access.

To keep things private, you need user accounts with passwords or other security keys to lock the data and limit access to it. And those user accounts are themselves sensitive data, so protection should start right there.

Of all the things software developers do, security happens to be the most complex and expensive part, since it doesn’t provide any return on investment. All it does is provide assurance that data will only be available to those who are meant to use it.

The two ways to protect data are encryption and hashing, which are similar techniques, yet differ in purpose. I will discuss both in my next posts.

An example of bad development…

I recently received an email from a company that does questionnaires. I subscribed to this and had done some of their questionnaires before, so I wanted to do this new one too. Unfortunately, the page loaded quite slowly, only to return a very nasty error message. A message that told me that this organisation is using amateurs for developers and administrators.

Let me be clear about one thing: errors will happen. Every developer should expect weird things to happen, but this case is not just an error; it is evidence of amateur work. So, let’s start with analyzing the message…

Server Error in ‘/’ Application.

Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

So, what’s wrong with this? Users should never see these messages! When you develop in ASP.NET, you can tell the system to show these detailed error messages only when the user is connected locally. A remote user should see a much simpler message.
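
In classic ASP.NET this is one setting in web.config. A sketch of the relevant section; the name of the friendly error page is just a placeholder:

<configuration>
  <system.web>
    <!-- RemoteOnly: full details for local debugging, a friendly page for remote visitors. -->
    <customErrors mode="RemoteOnly" defaultRedirect="~/Error.aspx" />
  </system.web>
</configuration>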

This is something the administrator of the website should have known and checked. He did not. By failing at this simple configuration setting, the organisation is leaking sensitive information about their website. Information that’s enough to convince me they’re amateurs.

This is also a quite common error message. Basically, it’s telling me that the system has too many database connections open. One common cause for this error is code that fails to close connections after opening them. Keep that in mind, because I will show that this is what caused the error…

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

This is a standard follow-up message. The fact that users of the site would see this stack trace too is just bad.

Exception Details: System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

A timeout error, with a reference to the connection pool and the max pool size. This already indicates that more connections are being opened than closed and the system can’t handle that correctly. There are frameworks for .NET that are better suited to prevent these kinds of errors, because these errors happened to be very common in ASP.NET applications, and in generic database applications written in .NET.

Basically, the top of the error message is just repeating itself. Blame Microsoft for that, since this is a generic message from ASP.NET itself. Developers can change the way it looks, but that’s not very common. Actually, developers should prevent users from seeing these kinds of messages to begin with. Preferably, the error should be caught by an exception handler, which would write it to a log file or database and send an alert to the administrator.
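
Such a handler could be a few lines in Global.asax. A sketch in C# (the site itself uses VB, and the logging calls are placeholders for whatever your project uses):

using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();

        // Log the full details for the administrator (placeholder calls).
        // Logger.Write(ex);
        // AlertAdministrator(ex);

        // Show the user a simple error page instead of a stack trace.
        Server.ClearError();
        Response.Redirect("~/Error.aspx");
    }
}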

Considering that I received this error on a Friday afternoon, I bet the developer and administrators are already back home, watching television like I do now. Law & Order is just on…

Source Error:

Line 1578:
Line 1579:        cmSQL = New SqlCommand(strSQL, cnSQLconfig)
Line 1580:        cnSQLconfig.Open()
Line 1581:
Line 1582:        Try

This is interesting… The use of SqlCommand is a bit old-fashioned. Modern developers would have switched to e.g. the Entity Framework or another, more modern solution for database access. But the developers of this site are just connecting to the database in code, probably to execute a query, collect the data and then, hopefully, close the connection again. The developers are clearly using ADO.NET for this site, and I can’t help but wonder why. They could have used more modern techniques instead. But probably they just need to keep an existing site up and aren’t allowed to use more modern solutions.

But it seems to me that closing the connection is not going to happen here. There are too many connections already open, so the highlighted line of code fails. The code creates a command on a connection called cnSQLconfig and then tries to open that connection. Unfortunately, opening it happens outside the Try block, and if anything fails here, it is very likely that the connection won’t be closed either.

If this happens once or twice, then it still would not be a big problem. The connection pool is big enough. But here it just happened too often.

Another problem is that the ADO.NET technique used here is also vulnerable to SQL injection. This would be another good reason to use a different framework for database access. It could still be that they’re using secure code to protect against this, but what I see here doesn’t give me much confidence.
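
For comparison, here is a sketch of the same kind of lookup with guaranteed cleanup and a parameterized query; the table, column and connection string are made up for the example:

using System.Data.SqlClient;

class CountryData
{
    // The same kind of lookup with guaranteed cleanup and a parameterized query.
    public static object LoadCountry(string connectionString, string countryCode)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Name FROM Countries WHERE Code = @code", connection))
        {
            // Parameters keep user input out of the SQL text, preventing injection.
            command.Parameters.AddWithValue("@code", countryCode);

            connection.Open();
            return command.ExecuteScalar();
            // Leaving the using blocks closes the connection even when an
            // exception occurs, so the connection pool doesn't run dry.
        }
    }
}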

Source File: E:\wwwroot\beta.example.com\index.aspx.vb    Line: 1580

A few other interesting facts. First of all, the code was written in Visual Basic. That was already clear from the code, but this just confirms it. Personally, I prefer C# over Visual Basic, even though I’ve developed in both, and in a few other languages. Language should not matter much, especially with .NET, but C# is often considered more professional than BASIC. (Because the ‘B’ in BASIC stands for ‘Beginners’.)

Second, this file contains over 1,700 lines of code, judging from the stack trace. I don’t know what the rest of the code is doing, but it’s probably a lot. Again, this is an old-fashioned way of software development. Nowadays, you see more usage of frameworks that allow developers to write a lot less code, which makes the code more readable. Even in the main index of a web site, the amount of code should be reasonably low. You can use views to display the pages, models to handle the data and controllers to connect both.

Yes, that’s Model-View-Controller, or MVC. A technique that’s practical in reducing the amount of code, if used well enough.

And one more thing is strange. While I replaced the name of the site with ‘example.com’, I kept the word ‘beta’ in front of it. I, a user, am using a beta version of their website! That’s bad. Users should not be used as testers, because it will scare them off when things go wrong. Like in this case, where the error might even last the whole weekend because the developers and administrators are at home, enjoying their weekend.

Never let users use your beta versions! That’s what testers are for. You can ask users to become testers, but then users know they can expect errors like these.

Stack Trace:

[InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) +4863482
   System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) +117
   System.Data.SqlClient.SqlConnection.Open() +122
   _Default.XmlLangCountry(String FileName) in E:\wwwroot\beta.example.com\index.aspx.vb:1580
   _Default.selectCountry() in E:\wwwroot\beta.example.com\index.aspx.vb:1706
   _Default.Page_Load(Object sender, EventArgs e) in E:\wwwroot\beta.example.com\index.aspx.vb:251
   System.Web.UI.Control.OnLoad(EventArgs e) +99
   System.Web.UI.Control.LoadRecursive() +50
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

And that’s the stack trace. We see the site loading its controls and resources, and the ‘Page_Load’ method being called at line 251. That calls ‘selectCountry’ at line 1706, which apparently loads country information, needed to set the proper language. That in turn calls ‘XmlLangCountry’ at line 1580, where the connection is opened, probably to load data based on the country information.

Again, this is a lot of code for basically loading the main page. I even wonder why it needs to load data from the database based on the country information. Then again, I was about to fill in a questionnaire, so it probably wanted to load the questionnaire in the proper language. If the questionnaire is multilingual, that would make sense.

Version Information: Microsoft .NET Framework Version:2.0.50727.3655; ASP.NET Version:2.0.50727.3658

And here’s one more bad thing. This site still uses .NET version 2.0 while the modern version is 4.5 and we’re close to version 5.0… It would not surprise me if these developers still use Visual Studio 2005 or 2008 for all this. If that’s the case, their development budget is probably quite low. I wonder if the developers who maintain this site are even experts at software development. It’s not a lot of information to base this on, but in short:

  • The administrator did not prevent detailed error messages from showing up for users.
  • The use of ADO.NET adds vulnerabilities related to the connection pool and SQL injection.
  • The use of VB.NET is generally associated with less experienced developers.
  • The amount of code is quite large, but common for sites that were developed years ago.
  • Not using a more modern framework makes the site more vulnerable.
  • Country information seems to be stored in XML, while the questionnaire is most likely stored inside the database.
  • The .NET version has been out of date for a few years now.

My advice would be to just rewrite the whole site from scratch. Use the Entity Framework for the database and MVC 4 for the site itself. Rewrite it in C# and hire more professional developers to do the work.

A very generic datamodel.

I’ve come up with several projects in the past, and a few have been mentioned here before. For example, the GarageSale project, which was based on a system I called “CART”. Or the WordChain project, which was a bit similar in structure. And because of those similarities, I’ve been thinking about a very generic datamodel that could be applied to almost any project.

The advantage of a generic database is that you can focus on the business layer while hardly needing to change the database itself. The datamodel would still need development, but by using the existing model and mapping to existing entities, you can keep it all very simple. It resulted in the datamodel described below.

The top class is ‘Identifier’, which is just an ID of type GUID used to find the records, and which works fine in derived classes too. Since I’m using Entity Framework 6, I can just use POCO classes to keep it all very simple. All I have to do is define a DbContext that tells me which tables (classes) I want. If I don’t create an entry for ‘Identifier’, that table won’t be created either.
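
To illustrate, a sketch of how those POCO classes and the DbContext could look in C# with Entity Framework 6; the property names are my guesses based on the description, not the actual model:

using System;
using System.Data.Entity; // Entity Framework 6

public abstract class Identifier
{
    public Guid Id { get; set; }
}

public class BaseItem : Identifier
{
    public string Name { get; set; }
}

public class BaseLink : Identifier
{
    public virtual BaseItem From { get; set; }
    public virtual BaseItem To { get; set; }
}

public class GenericModelContext : DbContext
{
    // Only the classes listed here get tables; 'Identifier' itself does not.
    public DbSet<BaseItem> Items { get; set; }
    public DbSet<BaseLink> Links { get; set; }
}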

The next class is the ‘DataContent’ class, which can hold any XML. That way, this class can contain any information I define in code without the need to create new tables. I also linked it to a ‘DataTemplate’ class, which can be used to validate the content of the XML with an XML schema or a special style sheet. (I still need to work out how, exactly.) The template can be used to validate the data inside the content.

The ‘BaseItem’ and ‘BaseLink’ classes are the more important ones here. ‘BaseItem’ contains all fixed data within my system. In the CART system, this would be the catalog. And ‘BaseLink’ defines transactions of a specific item from one item to another. And that’s basically three-fourths of the CART system. (The template part is already covered by the ‘DataTemplate’ class.)

I also created two separate link types. One deals with whole numbers and is called ‘CountLink’, which you generally use for items. (One cup, two girls, etc.) The other is for fractional numbers like weights or money and is called ‘AmountLink’. These two will be the most used transaction types, although ‘BaseLink’ itself can be used to transfer unique items. Derived links could be created to support more special situations, but I can’t think of any.

The ‘BaseItem’ class will be used to derive more special items. These special items will define the relations with other items in the system. The simplest of them is the ‘ChildItem’ class, which defines additional information related to a specific item. Child items are strongly linked to their parent item, like wheels on a car or keys on a keyboard.

The ‘Relation’ class is used to group multiple items together. For example, we can have ‘Books’ defined as a relation with multiple book items linked to it. A second group called ‘Possessions’ could also be created to contain all the things I own. Items that are in both groups would make up my personal library.

A special relation type is ‘Property’, which indicates that all items in the relation are owned by a specific owner. No matter what happens with those items, their owner stays the same. Such a property could e.g. be a bank account with a bank as owner. Even though customers use such accounts, the account itself cannot be transferred to some other bank.

But the ‘Asset’ class is more interesting since assets are the only items that we can transfer. Any transaction will be about an asset moving from one item to another. Assets can still be anything and this class doesn’t differ much from the ‘BaseItem’ class.

A special asset is a contract. Contracts have a special purpose in transactions. Transactions are always between an item and a contract. Either you put an asset into a contract or extract it from a contract. And contracts themselves can be part of bigger contracts. By checking how much has been sent or received to a contract you can check if all transactions combined are valid. Transactions will have to specify if they’re sending items to the contract or receiving them from the contract.

The ‘BaseContract’ class is the more generic contract type and manages a list of transactions. When it has several transactions, it is important that there are no ‘phantom items’ left. (A phantom item is something that was sent to the contract but not received by another item, or vice versa.) These contracts need to be balanced as a check to see whether they can be closed or not. They should be temporary and last from the first transaction to the last.

The ‘Contract’ type derived from ‘BaseContract’ contains an extra owner. This owner will be the one who owns any phantom items in the contract. This reduces the amount of transactions and makes the contract everlasting. (Although it can still be closed.) Balancing these contracts is not required, making them ideal as e.g. bank accounts.

Yes, it’s a bit more advanced than my earlier CART system, but I’ve considered how I could use this for various projects that I have in mind. Not just the GarageSale project, but also a simple banking application, a chess notation application, a project to keep track of blood sugar measurements for people with diabetes, and my WordChain application.

The banking application would be interesting. It would start with two ‘Relation’ records: “Banks” and “Clients”. The Banks relation would contain Bank records with information of multiple banks. The Clients relation would contain the client records for those banks. And because of the datamodel, clients can have multiple banks.

Banks would be owners of bank accounts, and those accounts would be contracts. All the bank needs to do is keep track of all money going in or out of the account. (Making money just another item, with all transactions of type ‘AmountLink’.) But to link those accounts to the persons who are authorized to receive money from the account, each account would need to own a Property record. The property record then has a list of clients authorized to manage the account.

And we will need six different methods to create transactions. Authorized clients can add or withdraw money from the account. Other clients can send payments to or receive payments from the account, where any money received from the contract needs to be authorized. Finally, the bank charges interest, or pays interest. (Or not.) These interest transactions don’t need authorization from the client.

The chess notation project would also be interesting. It would start with a Board item and 64 square items, plus a bunch of piece assets. The game itself would be a basic contract without an owner. The Game contract would contain a collection of transactions transferring all pieces to their initial locations. A collection of ‘Move’ contracts would also be needed, owned by the Game contract. Each Move would show which move it is (including branches of the game) and the transactions that take place on the board. (White rook gone from A1, white rook added to A4 and black pawn removed from A4, which translates into rook takes pawn at A4.)

It would be a very complex way to store a chess game, but it can be done in the same datamodel as my banking application.

With the diabetes project, each transaction would be a measurement. The contract would be owned by the person who is measuring his or her blood and we don’t need to send or receive these measurements, just link them to the contract.

The WordChain project would be a bit more complex. It would be a bunch of items with relations, properties and children. Contracts and assets would be used to support updates to the texts with every edit of a WordChain item kicking the old item out of the contract and adding a new item into the contract. That would result in a contract per word in the database.

A lot of work is still required to make sure it works as well as I expect. It would not be the most ideal datamodel for all these projects but it helps me to focus more on the business layer and the GUI without worrying about any database changes. Once the business model becomes more advanced, I could create a second data layer with a better datamodel to improve the performance of the data management.


Is XML in decline?

I happen to be one of those older software developers who saw the rise of XML. I even remember the older SGML standard, although I never used SGML. Version 1.0 of XML became an official standard in 1998. Once it became a standard, many companies started working to create the killer app to work with XML without much hassle. And although at first many companies created their own XML parsers, not all of them conformed completely to the standard. Those parsers disappeared fast enough too.

Right now, version 1.1 of XML is the latest standard. Yes, in 16 years not much has happened to this standard. And the changes that have been applied are mostly about supporting EBCDIC platforms and the newer Unicode definitions. There are discussions about a version 2.0, but it’s not likely to become a standard soon. Strange as it might sound, XML seems to be in decline if you look at how it’s used.

The power of XML was, of course, in how you defined these files and how you could do transformations on them. While we used DTD definition files at first to define the structure of an XML file, some smart people came up with the XSD schema format, which allows more flexibility and is itself an XML file. Combined with some nice graphical tools, the XSD made it easier to define an XML file and to validate whether an XML file conforms to the proper structure. And I’ve made plenty of XSD files between 2000 and 2010, since my work required a lot of XML data exchanges.

Of course, transformations are also important, and here we use stylesheets. An XSLT file is itself written in XML and defines how you convert an XML file to some other output format. In general, this output would be another XML file, an HTML document to display in a web browser, a simple text file or even a comma-separated file. And in some special cases it could even create a complete rich text document that you could open in Word. This meant that you could e.g. send an XML file to a server and the server would then process it. It would validate the file with a schema and could do additional validation by using a style sheet. If it passed these validation style sheets, other stylesheets could then be used to extract data from the XML and send it to other servers for further processing, while it could also generate documentation to return to the user. You could do a lot of processing with just XML files.

Of course, XML also became popular because more developers started to create web services. And they used the SOAP protocol for this, which is a slightly complex protocol that’s heavily dependent on XML standards. Since SOAP also had a built-in versioning mechanism, you could always make sure whether the client was still using the right SOAP definitions or not. You could even use several SOAP message formats on the same system, with only the version number as the difference. It wasn’t easy to set up, but it worked extremely well.

And more was developed to support XML even further. XPath expressions allow you to point to specific elements within an XML document. With XQuery, you can execute queries on XML files and process the result. With namespaces you can even combine multiple XML definitions that use similar entities. And then we have things like XLink, XPointer and XForms, which never became very popular.

Between 2000 and 2010, it seemed that XML would become a dominant development technique. No more writing code in other programming languages that needed to be compiled, simply because XML happens to be a fast scripting environment. Many platforms started to provide standard objects for processing XML files, and knowledge of XML became a hard requirement for developers. So, what changed?

Well, many developers consider the XML format a bit bulky, especially because tags are written twice: once to open the element and once to close it. Thus, if an element is called ‘NumberOfElements’, you have to write <NumberOfElements>10</NumberOfElements>, and that’s a lot of text to store the number 10. As a result, some developers shorten those tag names so the resulting XML becomes smaller. If you have 10,000 of these tags in your XML file, shortening the name to e.g. ‘TOE’ would save 26 characters per element, thus 260,000 characters in total. This might not seem like much, but developers feel they gain a lot by these kinds of optimizations. With modern multi-core processors and systems with 8 or more GB of RAM, such optimizations might make the code half a second faster, which you barely notice with web services, but still… developers think it saves a lot. And yes, when resources are truly limited, it makes a lot of sense, but the modern mentality is that companies will just add a second server if one is too slow. Or more, if need be. This is because the cost of more hardware is lower than the cost of having developers optimize the code even further.

These kinds of optimizations make XML files less human-readable while the purpose was to make this kind of data more readable. It becomes slightly worse when the XML file uses namespaces, since those namespaces are also shortened to just a few letters.

Another problem is the need to parse XML to extract the data. More and more companies are creating web applications that run within web browsers and rely heavily on JavaScript. These apps need to be able to run on multiple devices too. Unfortunately, not all browsers support parsing XML files, and even those that do are a bit complex to use. With regular expressions it’s still possible to extract some data from the XML, but if you need to fill a grid with 50 rows and 20 columns, things become really complex. To solve this, developers started to send data to web applications as JavaScript instead of XML. This could then be executed, and thus the data would load itself into memory. Since JavaScript objects are less bulky than the begin/end tags of XML elements, it made this new format very practical, and thus JSON was born.

The birth of JSON also demanded a change in web services. Since web applications would call these services directly, it would be very clumsy if they had to set up SOAP messages and then parse the SOAP results. A newer, simpler style of web services arose, which uses REST. Of course, there are many other web service protocols, but REST seems to have become the new standard, especially because it’s a simpler approach that relies on the HTTP(S) protocol.

Of course, web applications have become more important these days because we’re getting more and more devices with all kinds of different operating systems, which all have web browsers. And, as I said, not all of those devices have a native XML parser built in. They do support JavaScript, though, and as a result it becomes quite easy to develop web applications for all devices which use data in JSON format.

Of course, many devices also allow special platform-dependent apps that can be created with development tools for their specific platforms. For OS X and iOS-based devices you would use Objective-C, while you would use C++ or Java for Android devices. (Java is the preferred development platform for Android.) For Windows RT you would use .NET for Metro-style applications, with either VB or C# as the primary language. This makes it a bit difficult to develop software that runs on all three device families, but several parties have created compilers that produce platform-dependent executables from platform-independent code. Unfortunately, working with XML parsers still differs on all these platforms, and those third-party compilers need to wrap their parsers around the built-in parsers of the underlying platform. That makes them a bit slow.

Since the number of operating systems has risen as the market gets more and more new devices, it becomes more difficult to keep a single standard that’s supported by all those systems. And the XML standard is quite complex, so the different parsers might not all support the same things. In that regard, JSON is much simpler, since it consists of simple assignment statements. And these assignment statements are based on the Java syntax, which also happens to be similar to the C++, C# and Objective-C syntax. The only difference with these languages is that JSON puts the field names between quotes too, which you can’t do inside these languages.

So, XML is becoming less useful because it requires too much work to use. JSON makes data serialization simpler and is less bulky. Especially now that developers are focusing more on web applications and apps for specific devices, the use of XML is declining in favor of JSON and other solutions. But there’s one more reason why XML is in decline, and that is something within the .NET framework called LINQ.

LINQ was implemented as a separate library for .NET version 3.5 but has become popular since then. Basically, LINQ allows you to take data in a structured object and use simple queries to extract data from it, or to execute transformations on it. This is similar to XPath and XSLT, but now it’s part of your development language, allowing you more choice in the functions that you apply to the data. This is especially important for date fields, since XML doesn’t work well with date formats. LINQ actually makes extracting data from object trees quite easy and can be used on an XML document if you’ve read the document into a proper XDocument or XmlDocument object. Thus, the need for XSLT to transform data has disappeared, since you can do the same in C#, VB, F# or Oxygene.
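
A small example of what that looks like with LINQ to XML in C#; the order data is made up for the example:

using System;
using System.Linq;
using System.Xml.Linq;

class LinqToXmlExample
{
    static void Main()
    {
        // Query an XML document in plain C#, where XPath/XSLT used to be needed.
        var doc = XDocument.Parse(
            "<orders>" +
            "<order id='1'><total>15.00</total></order>" +
            "<order id='2'><total>42.50</total></order>" +
            "</orders>");

        var bigOrders =
            from order in doc.Descendants("order")
            where (decimal)order.Element("total") > 20m
            select (string)order.Attribute("id");

        Console.WriteLine(string.Join(", ", bigOrders)); // prints: 2
    }
}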

The result is that .NET developers don’t have to learn much about XML anymore. Their .NET knowledge combined with LINQ is more than enough. Since .NET also supports serialization to and from XML, it’s quite easy to read and write XML files in .NET. You can import an existing XSD file into your .NET application and have it converted to code. But since most XML data starts as objects that need to be stored as XML, you will often see that developers just define the objects and add attributes telling whether the object and its fields should become elements or attributes, and have the serialization library use these definitions to serialize the objects to and from XML. Thus, knowledge of XML schemas is no longer a requirement.
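
A sketch of that attribute-driven serialization in C#; the Order class is just an example:

using System;
using System.IO;
using System.Xml.Serialization;

// Attributes on a plain class drive the XML shape; no schema knowledge needed.
public class Order
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("total")]
    public decimal Total { get; set; }
}

class SerializationExample
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new Order { Id = 2, Total = 42.50m });
            Console.WriteLine(writer.ToString());
        }
    }
}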

Because .NET development made deep knowledge of XML almost obsolete, the popularity of XML is in decline. It’s still used quite often, but the knowledge needed to do practical things with XML tools is disappearing. And similar things are happening on other platforms: LINQ-style query libraries have appeared for Java and PHP too. As a result, those environments can work on structured objects instead of raw XML data. Thus, XML is only needed when data has to be sent to some other process, and even then, other formats might be chosen.

In fact, many developers are less concerned about the data format that's used for inter-process communication. The system handles this for them, and they just use a specific serialization library that does the bulk of the work. XML isn't really declining, but fewer developers need knowledge of the XML format, since development tools have nice wrappers around it that allow these developers to use XML without even realizing they're using it. It's not XML that's in decline. It's the knowledge about XML that is in decline…

Motivating developers…

One of the biggest problems for software developers is finding the proper motivation to sit behind a screen for 8 hours per day, designing and developing new code and new projects. It's generally boring work that requires a lot of mental effort. And the reward tends to be just more of the same work the next day, and the day after. Creating new code or fixing existing code is like working on an assembly line in a factory, placing a lid on a pot which someone else will close, over and over and over.

But developing code is a mental job, unlike adding lids to pots. During physical jobs, your mind can wander off to what you're going to do at the weekend, what's on television or whatever else is on your mind. A mental job makes that very difficult, since you can't think about your last holiday while also thinking about how to solve this bug. And thus developers have a much more complex job than those on the assembly line. A job that causes a lot of mental fatigue. (And sitting behind a screen for so long is a physical challenge as well.)

Three things will generally motivate people. Three basic things, actually, that humans have in common with most animals. We all like a good night of sleep, we all like to eat good food and we're all more or less interested in sex. Three things that apply to almost anyone. Three things that an employer might help with.

First of all, sleep. Developers can be very busy with their jobs, both at home and at work. Many of them have a personal interest in their own field and can spend many hours at home learning, playing or even doing some personal work on their own computers. Thus, a developer might start at 8:30 and work until 17:00. The trip home, dinner and the meet-and-greet with the family will take some time, but around 19:30 the developer will be back online on Facebook and other social media, play some online games or study new things. This might go on until well past midnight before they go to bed. Some six hours of sleep later, they get up again, have breakfast, read the morning paper and go back to work.

But a job that is mentally challenging requires more than six hours of sleep per day. So you might want to tell your employees to take good care of themselves if you notice they're up past midnight. You need them well rested, or else they're less productive. Even though those developers might do a great job, they could improve even more if they got those eight hours of sleep every day. And as an employer you can help by allowing employees to visit social sites during work hours, since it will help them relax. It lowers the need to check those sites while they're at home. The distraction of e.g. Facebook might actually improve their mental skills, because it relaxes the mind.

The second motivation is food. Employers should consider providing free lunches for their employees, preferably shared all together in a meeting room or even a dining room. Have someone do the grocery shopping at the local supermarket to get bread, spread, cheese, butter, milk, sodas and other drinks and snacks. While it might seem a waste of the money spent on those groceries, the shared meal will increase morale, allow employees to have all kinds of discussions with one another and improve team building. It also makes sure everyone has lunch at the same moment, so they will all be back at work at the same time again.

Developers tend to have lunch between 11:30 and 14:00, and if they have to get their own lunch, it's not unlikely for them to just go to the local supermarket themselves or to bring lunch from home. When they go shopping for lunch, they're unavailable during that time. Of course, lunch time is their own time, but if you need them, you don't want to wait until they're back from the supermarket. Another problem is that those employees will start storing food at work, in their desks or wherever else they can keep it. This could attract mice, and I don't mean computer mice but the live, walking, eating animals.

If an employer provides the lunch and other snacks, there is also a common storage for food products. This storage is easier to keep clean than the desks of developers. Besides, those developers now know their food requirements are satisfied during work hours, so they feel more comfortable.

The third motivation is sex. And here, employers have to be extra careful, because this is a very sensitive subject. For example, a developer might spend some time on dating websites or even porn sites. Like social websites, a small distraction often helps during mental processes, but while a social website might take two minutes to read a post and respond, a dating website takes far more time to process the profiles of possible dating partners. A porn site will also be distracting for too long and might put the developer in the wrong mood.

The situation at home might also be problematic. An employee might be dealing with a divorce, which will impact their sex life. It also puts them back into the world of dating and thus interferes with their nightlife a bit more. This is a time when they will be less productive, simply because they have too much of their personal lives on their minds. And not much can be done to help them, because they need to find a way to stabilize their personal lives again. Do consider referring the employee to a proper counselor for help, though.

Single developers might be a good option, though. They are already used to a life of being single and thus will be less distracted by their dates. Still, if they're young, their single status might change, and when that happens, it can have an impact on their jobs. But the impact might even be an improvement, because their partner might actually force them to go to bed sooner, thus fulfilling the sleep motivation.

Married developers who also have children might be the best option, since their family lives require them to live a very regular life. The care for their children forces this regularity. But the well-being of those children might cause the occasional distraction too. For example, when a child gets sick, the developer needs someone to care for the child at home. And they might want to work from home a few days a week to take care of their children.

As an employer, you can't deal with the sex lives of your employees at work. Those things are private. However, it can be helpful for employees if they can spend more time at home, in a private area, if they have certain needs in this regard. Allowing them to work from home gives them more options. Since they don't need to travel to work, they have more time available. If they decide to visit a dating site for half an hour, they can just work half an hour longer and no one will even know about it. If their child is sick, they can take care of them and still get work done too.

In conclusion: make sure your employees sleep well, give them free lunches and other snacks at the workplace, and allow them to work from home for their personal needs. All this will help make them more productive and allow them to improve themselves.