So, you want to be a software developer? Part 1.

Ramona on Computer

So, you want to be a software developer too? Great! Where do you start?

I started with a mentor, my father: someone who could teach me some of the more important principles, even though it was still early days in the world of computers. I had to learn from books and magazines but, more importantly, I had to learn by practicing! And that’s how you should start too. You need to practice your computer skills first.

But nowadays, you have one major advantage over me. You have the Internet, and there are thousands of mentors willing to help you with anything you need. So, before you learn to write programs, you need to know a few important resources.

The most important resource for all programmers is a simple search engine called Google. You probably know it already and it is used by quite a few people, although some people dislike Google and how powerful this company actually is. They might use Yahoo, Bing or even DuckDuckGo. That’s just fine, but in my own experience, Google tends to provide the most relevant answers to your questions.

Google does this by profiling. They keep a history of things you’ve searched for and if you use Google Chrome as your web browser, they will most likely have a good idea of your interests. If you also have a GMail or Google Apps for Work account, then they will have an even better profile of you and your preferences. And yes, in a way that invades your privacy! So you have to be smart in how you use Chrome and Google!

One simple way to get around this is by using multiple accounts. See this link on how to add more than one profile to Chrome. Make one account for any development-related stuff and make sure a Google account and mailbox are assigned to this profile. This is what you will use to let Google profile your development requests. Create a second account for anything that you want to keep more private and use whatever you like there. This second account can avoid all Google products and can be used if you like to surf for porn or whatever else using Yahoo. Or to write on Facebook and read tweets on Twitter. But use at least two accounts in Chrome if you want to develop software!

While Google is profiling your development account, you will notice how all search results will start to relate more and more to software development. This way, you take advantage of what many people consider an invasion of privacy. Just make sure that anything you do with the development account is development-related.

So, now you have the most powerful online resource available to you and as you continue to use it, it will learn more and more about the information you need. So, let’s focus on a few other resources.

An old resource is Experts Exchange, which has a somewhat unfortunate name. People tend to confuse it with Expert Sexchange, which happens when people forget the dash in the URL or put it in the wrong place. You don’t want to confuse those two! Anyway, EE has been a very useful resource for me for about 15 years now, although the knowledge base of this site has become saturated. They also turned it into a paid website at one point, started serving advertisements and have been losing members ever since, because new members had no free access to any answers, which annoyed many people. To get free access, you needed to earn a minimum number of points per month by answering the questions of other members, for which points could be awarded to you. It’s a good system and they’re continuously searching for new ways to get extra revenue. And they have a huge knowledge base that they are continuously expanding. Still, there is a slightly better resource available.

This alternative is Stack Overflow, where membership is free, although not ad-free. When you answer questions on this site, you can receive points which increase your rank. When your rank increases, you gain more access rights on the whole system and can even become a moderator. And like EE, this site already has a huge knowledge base. And this is just one of the many Stack Exchange sites! The makers of SO decided to use the same web software to set up various other sites for other topics and this received a lot of support, allowing them to keep maintaining the software and sites. With over 150 different communities and topics, this is one of the best resources online. But for developers, Stack Overflow should be enough.

Another good resource is Quora. But Quora is more generic and can have all kinds of topics. It is a good place to ask non-technical questions about software development, like what kind of chair you need, what the best setup for your monitor and mouse is or even what to do when someone copied your source code without permission.

And of course, there are many more resources online that you can use, but to start, download Google Chrome and set up that developer account. Then use the resources I’ve just mentioned and start searching for more resources that you think are useful. Add all these resources to your favorites and make sure Chrome is set to synchronize your settings with Google. Why? Because it allows you to share your favorites with your developer account on multiple computers! Or to synchronize between your computer, your laptop, your tablet with Chrome and your Android phone…

You may have noticed that I talk about what programmers have to do and I have said nothing about programming yet! This is because you will need to prepare first. As a developer, the first thing you need to know is how to find the information you need.

One thing to keep in mind is that programming languages aren’t as important as they seem. A programming language is just a tool, like a hammer or screwdriver. You use it to create a product. For developers, that product is a piece of software. And for the people who will use it, it doesn’t matter how you created it as long as it works as it is supposed to. So relax! Programming isn’t really about learning programming languages but about making products with your typing hands and brains! Just like a carpenter who makes furniture…

My history as a software developer

In the following posts I will speak about the things you need to learn to become a professional software developer. But before I start that series, I will first talk about my own experiences.

I was born in 1966 so I’ve seen this world for about half a century already. And I’ve seen how computers developed from devices as large as a house to devices the size of my fingernails. I’ve seen how difficult it was to write code in the beginning, how standards started to become more popular and made things easier and how people started to complain about standards and, as XKCD makes clear, “solved” the problem by making more standards…

XKCD: Standards

And this is still continuing. It is interesting to see how people just continue to invent new standards, new languages and new formats just because they dislike the old ones. It is also annoying for developers because you will miss some job opportunities if you can’t use one or more of these standards.

But back to me! I was lucky, since my father happened to be a software developer back then, until the day he retired in 1994. He worked for a bank called the “RijksPostSpaarbank” or RPS. (You could translate it to “State Mail Savings Bank”.) This was later known as just the “Postbank” (mail bank) and is now known as ING. My dad was actually part of the “Postcheque- en Girodienst” (PCGD), which handled a lot of financial transactions.

The PCGD was one of the first fully automated Giro (order?) services in the world working with punched cards to automate a lot of stuff. Working with punched cards was quite fun back then and my father once brought a bag full of the punched-out paper to use as confetti for some party we were holding. That stuff was so nasty that when we moved out of our house in 1984, we could still find small pieces of this confetti in various locations. Still, fun stuff!

And yes, my interest in programming and computers was inherited from my father. He once worked on one of the earlier Apple systems to help an uncle of mine automate his bookkeeping, and when he didn’t use it, I was allowed to play games on it and experiment a bit.

When he later bought a programmable calculator, the TI-58, I was allowed to use it for school and quickly learned to write simple programs for it, within the 50 instructions the memory allowed. The first program I wrote myself, on this calculator and a piece of paper, was for the quadratic formula! And yes, I needed it written down on paper since this calculator would lose the contents of its memory once you turned it off, so I had to reprogram it very often!

Later, he bought a ZX-81 for me to learn more about computer programming. He used it too, for his bookkeeping, but he found out it wasn’t as powerful as he’d hoped. So I ended up being the one to use it. Mostly for games, but also for programming purposes. And by this time I had been programming various things already, reading magazines to learn even more.

This was around 1982 and my school also decided to teach about computers, so I got a simple education in programming through my school. Funnily enough, the school started with punched cards where we had to fill in the holes with a black pencil. They would then be sent to some university where a machine would stamp holes for all the black spots and would then run the code, returning a printed output of the results.

Compiling my code and getting the results back would often take two to three weeks. Which XKCD also documented:

XKCD: Compiling

Soon afterwards, we got the Sinclair QL, which was a somewhat more serious computer. Back in those days it was fast, and the two microdrives of 100 KB each made storage of applications a lot easier. By then, I was also studying at the “Higher Laboratory School” to become a lab assistant, simply because this school offered an extra ICT education, teaching me how to use Pascal on a Minix system. It was the most interesting part of school while the rest sucked, so after a year, I quit and went to look for a job as programmer or whatever else in ICT. But since I’d learned Pascal, I had bought a Pascal compiler for the QL, which was quite rare back then.

I had one of the earlier PCs, in my case the Tulip System PC Advance/Extend with a hard disk of 20 MB, 640 KB of RAM and an EGA video card. I had an illegal copy of Turbo Pascal 3.0 which I used for writing my own programs and I had a copy of NetHack, a fun game that I could play for hours.

Around 1987 I had the chance to get some AMBI (Dutch) modules. Part of this was learning to program in COBOL and a special Assembly language called EXAT. (Exam Assembly Language, specifically created for exams.) I also worked as an intern for 6 months at IBM Netherlands as a COBOL developer, although my job didn’t involve as many programming tasks as I had hoped for.

During that internship, I was also given a first look at SQL and even got some kind of diploma indicating that I was good enough as an SQL developer, in a time when SQL was still mostly restricted to the “SELECT” statement.

I’ve upgraded to newer computers several times, working all kinds of jobs and trying to get a job as a software developer, meaning that I had to extend my skills. I got an illegal copy of Turbo Pascal 5 too, but when Turbo Pascal 6 came out, I had the financial means to just purchase a license. Which I also did when Borland Pascal 7 arrived on the market. (It also supported Windows 3.11 development, which I was using back then.) And around 1994 there was the first Borland Delphi version, which I purchased. And upgraded all the way until Delphi 2007.

The only Delphi version I bought after the 2007 edition was Delphi XE5. The problem was that Delphi was losing the battle against .NET and I needed to switch my skills. So in 2002 I started with the first Visual Studio compilers and I later upgraded to the versions 2008, 2010, 2012 and now 2015. Why? Because since 2001, the .NET environment was gaining control over the Windows market and development jobs shifted from desktop applications to web development.

Nowadays, the focus is shifting towards mobile development, embedded “smart” devices and the “Internet of Things”.

Around 1993, I also got my first Linux distribution, which was interesting. It was difficult to get software for it, since the Internet was still under development back then and it was hard to find good sources and good documentation. Still, I managed to run Linux from a floppy disk and use the console for some simple things. But I would rarely use it until I started using virtual machines around 2002 with VMware. Since then, I have created (and deleted) various virtual machines running some version of Linux. Also one or two versions of FreeBSD and once even Solaris.

And 1993 was also the year when I started to really focus on other languages. It was when the Internet started to rise and I had a dial-up connection through CompuServe. They were well-known back then since they provided a free CD for Internet access with almost every computer magazine. A lot of people had huge stacks of CompuServe CDs at home, not knowing what to do with them, although they were great frisbees and also practical to scare off birds in your garden. But I used one to subscribe and kept using it until 2005, even though I had moved to Chello in 1998. (And Chello later became UPC and is now called Ziggo.)

Anyway, I had learned some Forth and a few other languages, tried my best with Perl, got some early experience with HTML and through Delphi I learned more about dBase IV and Paradox. (Both databases.) I also started to focus on C and C++, which was important since the Windows kernel was built in C and exposed functions that Delphi could call using the C calling conventions and logic. The Windows API was well documented for C developers, but I had to learn to convert this C code to Pascal code so I could use the same libraries in Delphi.

With my copy of Borland Pascal 7 also came a copy of Turbo Assembler, so I also focused on Assembly. I wrote a mouse driver in Assembly to use with Pascal on MS-DOS, and when I worked as a software developer in 1994 for a company called Duware B.V., I also used Assembly to create a screen saver to use within the applications we created.

Applications that were created with Microsoft Basic Professional Development System 7.1, which technically was the latest version of QuickBasic before Microsoft created Visual Basic. Since my employer wanted to move from MS-DOS to Windows, he was also looking for a good programming language for Windows. My suggestion of Delphi was ignored because that meant my boss would need to learn Pascal, which he did not want to do. We also looked at Gupta SQL Windows, which seemed promising, but when he hired a new employee who had PowerBuilder experience, he decided that we would move to PowerBuilder instead! This language was similar to the BASIC he knew and seemed somewhat promising.

Still, when it took two days for two of his employees to make an animated button on a form and he allowed them to waste that much time on an unimportant feature, I realised that this job wasn’t very promising. For my boss, BASIC was like a Golden Hammer. And in my experience, you need to stay far away from people who use golden hammers since they think they’re always right and always have the right tool. What matters to them is that the problem must fit the tool, else the project needs to be changed to match.

Real developers realise that it’s the opposite! A tool must match the project, else you have to pick a different tool. What matters is that you have a large toolkit of programming languages and various techniques, APIs and frameworks that you’re familiar with so you can pick the right tool for each project.

And while I’ve been working as a Delphi developer for almost two decades, I have always focused on other languages too. I’ve done some projects partly in Assembly to speed some processes up. I’ve worked on C projects that needed to compile on various mainframes and Unix systems so I could only use the standard libraries. I’ve worked with techniques like ActiveX, COM, DCOM and COM+. I’ve created web pages in PHP that were served from a Delphi server application. I’ve written code in C++ whenever that was required. And since 2001 I also focused on .NET and specifically C# and ASP.NET for web development and web services. I’ve used Python, Perl, JavaScript and I’ve specialized in XML with style sheets and creating XML schemas. I even worked with ASN.1 for a project where I had to communicate with an external device that used a BER encoding standard.

And these days, my main focus is on Visual Studio 2015 with C# and C++, Clang, JavaScript and jQuery. I’m also learning more about electronics, writing C programs and libraries to use with an ATtiny85 and other Atmel microcontrollers to make my own hardware and to communicate with these self-made devices from e.g. my web server.

As a developer, it is a good thing to experiment with various electronic devices and microcontrollers to hone your skills. It provides better insight into hardware and how to communicate between devices. You will often have to consider techniques like WiFi, Bluetooth or infrared communications and come up with proper protocols to send information between devices.

All in all, I have varied experience with lots of hardware and software, I can manage my own web servers and am experienced with various operating systems like Windows, Linux, BSD and iOS. I am now focusing on embedded devices and Android/iOS development, but I still keep all my skills up to date, including my Delphi knowledge next to C#. I need various tools in my toolbox, which is important for each and every software developer in this world.

And no, I don’t think that language X is better than language Y. Good developers care as much about programming languages as expert carpenters do about their hammers and screwdrivers. Because it is not the tool that matters, but the final product that you’re building!

The need for security, part 3 of 3.

Azra Yilmaz Poses III

Do we really need to hash data? And how do we use those hashed results? That is the current topic.

Hashing is a popular method to generate a key for a piece of data. This key can be used to check if the data is unmodified and thus still valid. It is often used as an index for data but also as a way to store passwords. The hashed value isn’t unique in general, though. It is often just a large number within a specific range of values. If this range happens to be 0 to 9, it would basically mean that any data will result in one of 10 values as identifier, so if we store 11 pieces of data as hashes, there will always be at least two pieces of data that generate the same hash value. That’s called a collision.

There are various hashing algorithms that were created to have a large numerical range to avoid collisions, since the chance of collisions is much bigger in smaller ranges. Many algorithms have also been created to generate a more even distribution of hash values, which further reduces the chance of collisions.

So, let’s look at a simple example. I will hash a positive number into a value between 0 and 9 by adding all its digits to get a smaller number. I will repeat this for as long as the resulting number is larger than 9. So the value 654321 would be 6+5+4+3+2+1, or 21. That would become 2+1, thus the hash value would be 3. Unfortunately, this algorithm won’t distribute all possible hash values equally. The value 0 will only occur when the original value is 0. Fortunately, the other numbers will be distributed equally, as the following piece of code shows:


using System;
 
namespace SimpleHash
{
    class Program
    {
        // Repeatedly add the digits of a value until a single digit (0-9) remains.
        static int Hash(int value)
        {
            int result = 0;
            while (value > 0)
            {
                result += value % 10;
                value /= 10;
            }
            if (result >= 10) result = Hash(result);
            return result;
        }
 
        static void Main(string[] args)
        {
            // Count how often each hash value (0-9) occurs for the first million numbers.
            int[] index = new int[10];
            for (int i = 0; i < 1000000; i++) { index[Hash(i)]++; }
            for (int i = 0; i < 10; i++) { Console.WriteLine("{0}: {1}", i, index[i]); }
            Console.ReadKey();
        }
    }
}

Well, it only proves this for values up to a million, but it shows that 999,999 of the numbers result in a value between 1 and 9 and only one results in a value of 0, accounting for exactly 1 million values spread over 10 hash values.

As you can imagine, I use a hash here to divide a large group of numbers into 10 smaller groups. This is practical when you need to search through data, especially if you have a bigger hash range. Imagine having 20 million unsorted records and a hash value between 1 and 100,000. Normally, you would have to look through 20 million records, but if they’re indexed by a hash value, you just calculate the hash for a piece of data and would only have to compare about 200 records. That increases performance, but at the cost of maintaining an index which you need to build. And the data needs to be an exact match, else the hash value will be different and you would not find it.
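
Here is a minimal sketch of that idea in C#, reusing the toy Hash() function from the snippet above; the numbers themselves act as the records and the ten buckets play the role of the index:

using System;
using System.Collections.Generic;

class HashIndexDemo
{
    // Same toy hash as before: add digits until a single digit remains.
    static int Hash(int value)
    {
        int result = 0;
        while (value > 0) { result += value % 10; value /= 10; }
        return result >= 10 ? Hash(result) : result;
    }

    static void Main()
    {
        // Build the index: one bucket per possible hash value (0..9).
        var buckets = new Dictionary<int, List<int>>();
        for (int record = 0; record < 1000000; record++)
        {
            int key = Hash(record);
            if (!buckets.ContainsKey(key)) buckets[key] = new List<int>();
            buckets[key].Add(record);
        }

        // To find a value, calculate its hash and only scan that one bucket.
        int wanted = 654321;
        bool found = buckets[Hash(wanted)].Contains(wanted);
        Console.WriteLine("Found {0}: {1}", wanted, found);
    }
}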

But we’re focusing on security now, and the fact that you need a perfect match makes hashing a perfect way to check a password. Because you want to limit the amount of sensitive data you keep, you should not store any passwords at all. If a user forgets a password, it can be reset, but you should not be able to just tell them their current password. That’s a secret only the user should know.

Thus, by using a hash, you make sure the user provides the right password. But there is a risk of collisions, so passwords like “Wodan5tr1ke$Again” and “123456” might actually result in the same hash value! So, the user thinks his password is secure, yet something almost everyone seems to have used as a password will also unlock all treasures! That would be bad, so you need two things to prevent this.

First of all, the hash algorithm needs to provide a huge range of possible values. The more, the better. If the result happens to be a 256-bit value then that would be great. Bigger numbers are even more welcome. The related math would be more complex, but hashing algorithms don’t need to be fast anyway. Fast algorithms actually speed up brute-force attacks, so for hashing, slower algorithms are better. The user can wait a second or two. But for a hacker, two seconds per attempt means he’ll spend weeks, months or longer just to try a million possible passwords through brute force.

Second of all, it is a good idea to filter out all simple and easy-to-guess passwords and use a minimum length requirement together with an added complexity requirement, like requiring upper and lower case letters together with a digit and a special character. Users should not only pick a password that meets these requirements; the same checks should also be performed every time a password is entered, before you check the hash value for the password. This way, even if a simple password collides with one of the more complex ones, it will still be denied since it doesn’t match the requirements.

Use regular expressions, if possible, for checking if a password matches all your requirements, and please allow users to enter special characters and long passwords. I’ve seen too many sites which block the use of special characters and only use the first 6 characters for whatever reason, thus making their security a lot weaker. (And they also tend to store passwords in plain text, to add insult to injury!)
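
As a sketch, such a complexity check could look like this in C#; the exact rules (at least 8 characters, an upper case letter, a lower case letter, a digit and a special character) are just example requirements, not a standard:

using System;
using System.Text.RegularExpressions;

class PasswordRules
{
    // Example requirements: minimum length 8, at least one upper case letter,
    // one lower case letter, one digit and one non-alphanumeric character.
    static readonly Regex Requirements =
        new Regex(@"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$");

    static bool IsAcceptable(string password)
    {
        return Requirements.IsMatch(password);
    }

    static void Main()
    {
        Console.WriteLine(IsAcceptable("123456"));            // False
        Console.WriteLine(IsAcceptable("Wodan5tr1ke$Again")); // True
    }
}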

Security is a serious business and you should never store more sensitive data than needed. Passwords should never be stored anyways. Only hashes.

If you want an even stronger password check, then concatenate the user name to the password. Convert the user name to upper case (or lower case), though, so the user name is case-insensitive. Don’t do the same with the password! The result of this is that the user name and password together produce the hash value, so even if multiple people use the same password, they will still have different hashes.

Why is this important? It is because some passwords happen to be very common, and if a hacker knows one such password, he could look in the database for identical hashes and would immediately know the proper passwords for those accounts too! By adding the user name, the hash will be different for every user, even if they all use the same password. This trick is often forgotten, yet it is simple enough to make your security a lot stronger.
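
A minimal sketch of that trick, using the SHA-256 implementation that ships with .NET (as explained above, a deliberately slow algorithm would be preferable in a real system, but the principle is the same):

using System;
using System.Security.Cryptography;
using System.Text;

class PasswordHasher
{
    // Hash the (case-insensitive) user name together with the password, so
    // identical passwords still produce different hashes for different users.
    static string HashCredentials(string userName, string password)
    {
        using (var sha = SHA256.Create())
        {
            byte[] input = Encoding.UTF8.GetBytes(userName.ToUpperInvariant() + password);
            return Convert.ToBase64String(sha.ComputeHash(input));
        }
    }

    static void Main()
    {
        // Same password, different users, different hashes.
        Console.WriteLine(HashCredentials("Alice", "123456"));
        Console.WriteLine(HashCredentials("Bob", "123456"));
    }
}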

You can also include the timestamp of when the user registered their account, their gender or other fixed data that won’t change after the account is created. Or if you allow users to change their account name, you would require them to provide their (new) password too, so you can calculate the new hash value.

The need for security, part 2 of 3.

Azra Yilmaz Poses II

What is encryption and what do we need to encrypt? That is an important question that I hope to answer now.

Encryption is a way to protect sensitive data by making it harder to read. It basically has to prevent people from looking at the data and immediately recognizing it. Encryption is thus a very practical solution to hide data from plain view, but it doesn’t stop machines from using a few extra steps to read your data again.

Encryption can be very simple. There’s the Caesar cipher, which basically shifts letters in the alphabet. In a time when most people were illiterate, this was actually a good solution. But nowadays, many people can decipher these texts without a lot of trouble. And some can do it just inside their heads without making notes. Still, some people like to use ROT13 as a very simple encryption solution even though it’s almost the same as having no encryption at all. But combined with other encryption methods or even hashing methods, it can make encrypted messages harder to read, because the input for the more complex encryption method already has a simple layer of encryption.
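
For illustration, ROT13 fits in a few lines of C#; applying it twice returns the original text, which shows how little protection it offers on its own:

using System;

class Rot13
{
    // Shift each letter 13 places within the alphabet; other characters stay as they are.
    static string Apply(string text)
    {
        var chars = text.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            if (c >= 'a' && c <= 'z') chars[i] = (char)('a' + (c - 'a' + 13) % 26);
            else if (c >= 'A' && c <= 'Z') chars[i] = (char)('A' + (c - 'A' + 13) % 26);
        }
        return new string(chars);
    }

    static void Main()
    {
        string secret = Apply("Attack at dawn");
        Console.WriteLine(secret);         // Nggnpx ng qnja
        Console.WriteLine(Apply(secret));  // Attack at dawn
    }
}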

Encryption generally comes with a key. And while ROT13 and Caesar’s cipher don’t seem to have one, you can still build one by creating a table that tells how each character gets translated. Then again, even the mathematical formula can be considered a key.

Having a single key will allow secret communications between two or more persons and thus keep data secure. Every person will receive the key and will be able to use it to decrypt any incoming messages. These are called symmetric-key algorithms and they basically allow communication between multiple parties, where each member will be able to read all messages.

The biggest problem of using a single key is that the key might fall into the wrong hands, thus allowing more people access to the data than originally intended. That makes the use of a single key more dangerous in the long run, but it is still practical for smaller sessions between multiple groups, as long as each member has secure access to the proper key. And the key needs to be replaced often.

A single key could be used by chat applications where several people will join the chat. They would all retrieve a key from a central environment and thus be able to read all messages. But you should not store the information for a long time.

A single key can also be used to store sensitive data into a database, since you would only need a single key to read the data.
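
A minimal sketch of such single-key encryption, using the AES implementation in .NET (key and IV handling is deliberately simplified here; in a real application you would store and protect both with care):

using System;
using System.Security.Cryptography;
using System.Text;

class SingleKeyDemo
{
    static void Main()
    {
        using (var aes = Aes.Create())
        {
            // Aes.Create() generates a random key and IV; whoever holds them can decrypt.
            byte[] encrypted;
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] plain = Encoding.UTF8.GetBytes("Sensitive record");
                encrypted = encryptor.TransformFinalBlock(plain, 0, plain.Length);
            }

            // The same key (and IV) turns the encrypted bytes back into readable data.
            using (var decryptor = aes.CreateDecryptor())
            {
                byte[] restored = decryptor.TransformFinalBlock(encrypted, 0, encrypted.Length);
                Console.WriteLine(Encoding.UTF8.GetString(restored));
            }
        }
    }
}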

A more popular solution is an asymmetric-key or public-key algorithm. Here, you have two keys: you keep the private (master) key and give others the public key. The advantage of this system is that you can encrypt data with one of the two keys, but you can’t use that same key to reverse the action again. This makes it very useful for sending data in a single direction. Either the private key encrypts data and you need the public key to decrypt it, or the public key encrypts data and you need the private key to decrypt it.

Using two keys thus limits communication to a central hub and a group of people. Everything needs to be sent to the central hub and from there it can be broadcasted to the others. For a chat application it would be less useful since it means the central hub has to do more tasks. It needs to continuously decrypt and encrypt data, even if the hub doesn’t need to know the content of this data.

For things like email and secure web pages, using two keys is practical, though. The mail or web server would give the public key to anyone who wants to connect to it, so they can encrypt sensitive data before sending it to the server. And only the server can read it, by using the private key. The server can then use the private key to encrypt new data and send it to the visitor, who will use the public key to decrypt the message again. Thus, you have secure communications between two parties.
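
A sketch of that public/private arrangement with the RSA implementation in .NET: the visitor encrypts with the public key the server handed out, and only the server, holding the private key, can decrypt it.

using System;
using System.Security.Cryptography;
using System.Text;

class TwoKeyDemo
{
    static void Main()
    {
        using (var server = new RSACryptoServiceProvider(2048))
        {
            // The server only publishes its public key.
            string publicKey = server.ToXmlString(false);

            // The visitor encrypts sensitive data with that public key.
            byte[] encrypted;
            using (var visitor = new RSACryptoServiceProvider())
            {
                visitor.FromXmlString(publicKey);
                encrypted = visitor.Encrypt(Encoding.UTF8.GetBytes("my secret message"), true);
            }

            // Only the holder of the private key can read it again.
            byte[] restored = server.Decrypt(encrypted, true);
            Console.WriteLine(Encoding.UTF8.GetString(restored));
        }
    }
}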

Both methods have some very secure algorithms but also some drawbacks. Using a single key is risky if that key falls into the wrong hands. One way to solve this is by sending the single key to the other side using a two-key algorithm! That way, it is transferred in a secure way, as long as the key used by the receiver is secret enough. In general, the receiving side should hold the private key, so only the recipient can read the single key you’ve sent.

A single key is also useful when encrypting files and data inside databases, since it would only require one key for both actions. Again, you would need to store that key in a secure way, which would again use a two-key algorithm. You would use a private key to encrypt the single key and include a public key in your application to decrypt this data again. You would only use that public key inside your applications, but it would allow you to use a single public key in multiple applications for access to the same data.

As I said, you need to limit access to data as much as possible. This generally means that you will be using various different keys for various purposes. Right now, many different encryption algorithms are already in use but most developers don’t even know if the algorithm they use is symmetrical or asymmetrical. Or maybe even a combination of both.

Algorithms like AES, Blowfish and RC4 actually use a single key, while systems like SSH, PGP and TLS use two-key algorithms. Single-key algorithms are often used for long-term storage of data, but the key would have additional security to avoid easy access to it. Two-key algorithms are often used for message systems, broadcasts and other forms of communication, because the data is meant to go in a single direction. You don’t want an application to store both a private key and the matching public key, because it makes encryption a bit more complex and would provide a hacker a way to get the complete pair.

And as I said, a single key allows easier communications between multiple participants without the need for a central hub to translate all messages. All the hub needs to do is create a symmetrical key and provide it to all participants so they can communicate with each other without even bothering the central hub. And once the key is deleted, no one would even be able to read this data anymore, thus destroying almost all traces of the data.

So, what solution would be best for your project? Well, for communications you have to decide if you use a central hub or not. The central hub could archive it all if it stays involved in all communications, but you might not always want this. If you can provide a single key to all participants then the hub won’t be needed afterwards.

For communications in one single direction, a two-key algorithm would be better, though. Both sides would send their public key to the other side and use this public key to send messages, which can only be decrypted by the private key which only one party has. It does mean that you actually have four keys, though. Two private keys and two public keys. But it happens to be very secure.

For data storage, using a single key is generally more practical, since applications will need this key to read the data. But this single key should be considered to be sensitive thus you need to encrypt it with a private key and use a public key as part of your application to decrypt the original key again.

In general, you should use encryption whenever you need to store sensitive data in a way that you can also retrieve it again. This is true for most data, but not always.

In the next part, I will explain hashing and why we use it.

The need for security, part 1 of 3.

Azra Yilmaz Poses I

Of all the things developers have to handle, security tends to be a very important one. However, no one really likes security and we would rather live in a society where you can leave your home while keeping your front door open. We generally don’t want to deal with security because it’s a nuisance!

The reality? We lock our doors, afraid that someone will get inside and steal things. Or worse, wait for us to return and kill us. We need security to protect ourselves since we’re living in a world where a few people have very bad intentions. And we hate it, because security costs money, since someone has to pay for the lock. And it takes time to use it, because locking and unlocking a door is still an extra action you need to take.

And when you’re developing software, you generally have the same problem! Security costs money and slows things down a bit. And it is also hard to explain to a client why they have to pay for security and why the security has to cost so much. Clients want the cheapest locks, yet expect their stuff to be as safe as Fort Knox or even better.

The worst part of all security measures is that they are never able to keep everyone out. A lock on your door won’t help if you still leave the window open. And if the window is locked, it is still glass that can be broken. The door can be kicked in too. There are always a lot of ways for the Bad People to get inside, so what use is security anyway?

Well, the answer is simple: to slow down any would-be attacker so he can be detected and dealt with, and to make the break-in more expensive than the value of the loot stored inside. The latter means that the more valuable the loot is, the stronger your security needs to be. Fort Knox contains very valuable materials, so it has a very strong security system with cameras, lots of armed guards and extremely thick walls.

So, how does this all translate to software? Well, simple. The data is basically the loot that people are trying to get at. Legally, data isn’t property and doesn’t even have much legal protection, so it can’t really be stolen. However, data can be copyrighted or it can contain personal information about people. Or, in some cases, the data happens to be secrets that should not be exposed to the outside world. Examples of these three would be digital artwork, your name and bank account number, or the formula for a deadly poison that can be made from basic household items.

Of all this data, copyrighted material is the most common item to protect, and this protection is made harder because the material is meant to be distributed. The movie and music industry has a very hard time protecting all the copyrighted material it owns, and the same applies to photographers and other graphical artists. But also to software developers. The main problem is that you want to distribute a product in return for payment and people are getting it without paying you. You could consider this lost profit, although if people had no option but to pay for your product, they might not have wanted it in the first place. So the profit loss is hard to prove.

To protect this kind of material you will generally need some application that can handle the data that you’re publishing. For software, this would be easy because you would include additional code to your project that will check if the software has been legally installed or not. Often, this includes a serial number and additional license information and nowadays it tends to include calling a special web server to check if licenses are still valid.

For music and films, you can use a technique called DRM which works together with proper media players to make additional checks to see if the media copy happens to be from a legal source or not. But it would limit the use of your media to media players that support your DRM methods. And to get media players to support your DRM methods, you need to publish those methods and hope they’re secure enough. But DRM has already been bypassed by hackers many times so it has proven to be not as effective as people hoped.

And then there’s a simpler option. Add a copyright notice to the media. This is the main solution for artwork anyways, since there’s no DRM for just graphic images. You might make the image part of an executable but then you have to build your own picture viewer and users won’t be able to use your image. Not many people want to just see images, unless it is pornography. So you will have to support the basic image file formats, which are generally .JPG or .PNG for any image on the Internet. Or .GIF for animations. And you protect them by adding a warning in the form of a copyright notice. Thus, if someone is misusing your artwork and you discover the use of your art without a proper license, then you can start legal actions against the violator and claim damages. This would start by sending a bill and if they don’t pay, go to court and have a judge force them to pay.

But media like films, music and images tend to be hard to protect and often require going to court to protect your intellectual property. And you won’t always win such cases either.

Next on the list is sensitive, personal information. Things like usernames and passwords, for example. One important rule to remember is that usernames should always be encrypted and passwords should always be hashed. These are two different techniques to protect data and will be explained in the next parts.

But there is more sensitive data that might need to be stored and which would be valuable. An email address could be misused to spam people, so that needs to be encrypted. Name, address and phone numbers can be used to look people up and annoy them by ordering stuff all over the Internet and having it sent to their address. Or to file fake address changes to move their address somewhere else, so they won’t receive any mail or other services. Or even to visit the address, wait until the people have left the house and then break in. And what has happened in the past with addresses of young children is that a child molester learns of their address and goes to visit them to rape and/or kidnap them. So, this information is also sensitive and needs to be encrypted.

Other important information like bank account details, medical data and employment history would also be sensitive enough to encrypt. Order information from visitors might also be sensitive if the items were expensive, since those items would become interesting things to steal. You should basically evaluate every piece of information to determine if it needs to be encrypted or not. In case of doubt, encrypt it just to make it more secure.

Do keep in mind that you can often generate all kinds of reports about this personal data. A simple address list of all your customers, for example. Or the complete medical file of a patient. These documents are sensitive too and need to be protected, but they’re also just basic media like films and artwork, so copies of those reports are hard to protect and often not protected by copyrights. So be very careful with report generators and have reports contain warnings about how sensitive the data in them actually is. Also useful is to include a cover page as the first page of a report, in case people print it. The cover page would thus cover the content if the user keeps it closed. It’s not much protection, but all small bits are useful and a cover page prevents passers-by from easily reading the top page of the report.

Personal information is generally protected by privacy laws and thus misuse of personal information is often considered a criminal offense. This is unlike copyright violations, which are just civil offenses in general. But if you happen to be a source of leaking personal information, you and your company could be considered guilty of the same offense and will probably be forced to pay for damages and sometimes a large fine in case of clear negligence in protecting this data.

The last part of sensitive data tends to be ideas, trade secrets and more. In general, these are just media files like reports and thus hard to protect, although there are systems that could store specific data as personal data so you can limit access to it. Ideas and other similar data are often not copyrightable. You can’t get copyright on an idea. You can only get copyright over the document that explains your idea but anyone who hears about your idea can just use it. So if you find a solution for cleaner energy, anyone else could basically build your idea into something working and make profits from it without providing you any compensation. They don’t even have to say it was your idea!

Still, to protect ideas you can use a patent, which you will have to register in many countries just to protect your idea everywhere. Patents become open to the public so everyone will know about them and be able to use them, but they will need to compensate you for using your idea. And you can basically set any price you like. This system tends to be used by patent trolls, since they describe very generic ideas and then go after anyone who seems to use something very similar to their idea. They often claim an amount of damages that is lower than what it would cost the accused in legal fees to fight back, so they tend to get paid for this trolling. This is why many are calling for patent reforms to stop these patent trolls from abusing the system.

So, ideas are very sensitive. You generally don’t want to share them with the generic public since it would allow others to implement your ideas. Patents are a bit expensive and not always easy to protect. And you can’t patent everything anyways. Some patents will be refused because they’ve already been patented before. And yet you still need to share them with others so you can build the idea into a project. And for this, you would use a non-disclosure agreement or NDA.

An NDA is basically a contract to make sure you can share your idea with others and they won’t be allowed to share it with more people without your permission. And if your idea does get leaked, those others would have to compensate your financial losses due to leakage as mentioned in the NDA contract. It’s not very secure but it generally does prevent people from leaking your ideas.

Well, except for possible whistleblowers who might leak information about any illegal or immoral parts of your idea. For example, if your idea happens to be to blow up the subway in Amsterdam and have an NDA with a few other terrorists to help you then it becomes difficult when one of those others just walks to the police to report you and those who help you. The NDA just happens to be a contract and can be invalidated for many reasons, including the more obvious criminal actions that would relate to it.

But there are also so-called blacklists of things you can’t force in an NDA, depending on the country where you live. It is just a contract and thus handled by the Civil courts. And if the NDA violates the rights of those who sign it then it could be invalid. One such thing would be the right of free speech, where you would ban people from even discussing if your idea happens to be good or not.

Other sensitive information would be things like instructions on how to make explosives or business information about the future plans of Intel, which could influence the stock market. Some of this information could get you into deep trouble, including the Civil Court or Criminal Court as part of your troubles, resulting in fines and possibly imprisonment if they are leaked.

In general, sensitive information isn’t meant to be shared with lots of people so you should seriously limit access to such information. It should not be printed and you should not email this information either. The most secure location for this information would be on a computer with no internet connection but having a strong firewall that blocks most access methods would be good enough for many purposes.

So we have media, which is hard to protect because it is meant to be published. We have sensitive data which should not fall in the wrong hands for various reasons and we have personal data, which is basically a special case of sensitive data that relates to people and thus has additional laws as protection.

And the way to secure it is by posting warnings and limiting access to the data, which is difficult if it was meant to be published. But for those data that we want to keep private, we have two ways of protecting it next to limiting the physical access to this data.

To keep things private, you will need to have user accounts with passwords or other security keys to lock the data and limit access to it. And these user accounts are already sensitive data themselves, so that is where your protection should start.

Of all the things software developers do, security happens to be the most complex and expensive part, since it doesn’t provide any returns on investments made. All it does is try to provide assurance that data will only be available for those who are meant to use it.

The two ways to protect data are encryption and hashing, which are similar things, yet differ in their purpose. I will discuss both in my next posts.

Four models on Shapeways (NSFW)

I like Shapeways since you can upload your own 3D designs and end up with a 3D printed model. This allows me to e.g. create custom boxes for small hardware experiments. These boxes will be combined with my Poser models and will thus result in very interesting designs. But like everything with 3D, you will have to do some experiments first. I created three new models in Poser named Nora, Tommi and Cassiopa and I used an interesting trick to create a special rack to include in the pose. But first, let’s look at Nora:


Nora was printed in two versions: White plastic and Colored sandstone. And in both models a few flaws were already visible. Nora’s shoes were made of a very thin material and the upload to Shapeways did a repair that removed the very thin parts. As a result, the shoes are flawed.


Well, a bit of glue and plastic can fix that. But her fingers were also a bit delicate and the sandstone version ended up with broken fingers because the fingers are actually too thin. Again, some glue and they’re back in place.


Her thumb is still missing, though. Then again, I was more interested in checking how well the 3D printer handles holes, like the area where she keeps her left hand. In front of her genitals, to keep it decent, yet far away so it doesn’t touch. Combined with the position of her legs, this results in a complex hole to print but it ended up flawless. Even her left hand was intact.


So, what I’ve learned from Nora is that thin elements like fingers and shoes won’t print very well. White plastic does a better job than sandstone, though. That’s because sandstone needs further processing after the printing is done, which requires some manual labour. Thus, small parts can end up being damaged.

Another part that’s important with the sandstone version is the textures. For this, I will check her face:


And in case you’re wondering why her hair is covered by a towel, well… Hair really doesn’t print very well. It tends to generate loose shells or parts that are too thin to print. Besides, the towel makes her look as if she’s just out of the bath, relaxing.

The white plastic version shows a reasonable amount of detail in her face. Even her open mouth is printed quite nicely. The sandstone model also has an open mouth and you might see her tongue and teeth if you look inside with a microscope. But I’m looking more at her face and eyes.

Printing in colored sandstone has an ink density of about 50 DPI. Normally, a printer would print at 300 DPI so the colors will lose details. But I chose a light-colored iris and Nora has good-looking pupils in this print. Which is important to remember, since dark eye colors might darken the whole eye. It still looks good in my opinion. At least better than what I can do with paint and a brush.


The next model is Cassiopa. Since I know that thin parts won’t print well, I’ve placed her on a towel, hoping for a better result. The result is okay but the sandstone version did not survive the print because the towel was too thin. So I uploaded a newer version of Cassiopa on a more solid floor and in this version, I also adjusted her clothing. Why? Because I need to test more than just panties on topless women. Still, the white plastic version looks okay, although it is a bit small:


The model was almost 15 cm long, but that’s the length of the towel. Cassiopa uses only two-thirds of this length, so she’s smaller than my other models. (This also happens with one of my Tommi models.) Smaller means that fewer details will be visible, but it is still detailed enough.

The towel she’s on has a hole in it, which is too bad but I’m not too worried about it. I now know that I can’t use these kinds of thin plateaus for my models to rest upon. In the sandstone version, the towel had crumbled away.


The last model is Tommi which I’ve combined with a rack. I made a second version of Tommi climbing this rack but Tommi herself becomes small if you do this, thus losing details. Let’s look at the climbing version first:


I gave Tommi a skirt instead of panties so you should have been able to look up her skirt. However, Shapeways repairs this automatically and as a result, the skirt became solid. And that’s a flaw in the skirt model.

This is a colored print so her texture helps to add details, but she’s too small to be very clear in details. She did have a flaw in her right hand, since her fingers were too thin and either did not get printed or broke off afterwards. A bit of paint will fix that, though. It is just something to remember.

So, remember: make sure thin parts are well-supported and preferably resting against something else, and with clothes, be aware that Shapeways might fill in specific areas that you’d hoped would stay hollow. In this case it was her skirt, but I also tried another interesting top on Tommi and that added a white mass over her breasts, since Shapeways was filling the area between the left and right cup.

Next, the bigger version of Tommi with her resting upon the rack. That one was perfect, although one of the legs of the rack had broken off during transport. So, even if a part is thick enough to print, it might still be very vulnerable. With a length of over 4 cm, the legs can’t handle a lot of stress. Still, this model is great, with no broken appendages, and even her toenails are visible!


Well, at least I glued the leg back in place. I might decide to remove all four instead, though, if I fear they will break again. This model happens to be quite heavy too, which makes sense since she has the biggest volume of all. Her eyes are nicely detailed and her skin color even has some variation around her knees. And you can see her toenails! A bigger model is nice in that regard, so if your model has a lot of fine details, have it printed at a larger scale! Although the price will scale up too, since more materials will be required.

Well, these three models all look reasonably good and taught me what I need to know about printing Poser models: use a reasonably large scale, support all small parts and be aware that hollow spaces might end up being filled with extra material because Shapeways “repairs” some thin materials.

I kept these models mostly undressed because I know the textures of these models and needed to see how the color printing supports the texture details. Also, it is difficult to find Poser clothing models that work well when uploaded to Shapeways. These models are not made to be printed in 3D but to be rendered. So finding good clothes to print is difficult. For Victoria 4, her bikini top and bottom do print quite well, though. They too are filled up, but the filling is towards the body of the model and not between both cups.

Another problem is the limitations on models set by Shapeways. There’s a size limit and there’s a polygon limit. (64 MB or 1 million polygons.) Poser models can easily go over this amount of polygons so you will have to find a way to reduce those, while keeping textures intact.


And then there’s the rack used by both models. The rack is the same length for both and I created it myself using the Firestorm viewer with the Second Life virtual world, but I could have used my own OpenSim world too. I just joined several cylinders for the rounded sides and balls for the rounded corners to build the framework. I also created a square plane with a hole in it, which I copied three times and put next to one another. I then exported the whole model from the SL viewer to a Collada file, which I imported into AccuTrans 3D to clean it up a bit and to reduce its complexity. (For example, by merging all parts into one single part.)

And then I checked if the rack has enough space for other hardware.


Well, the rack isn’t wide enough for an Arduino board

Since I copied the square plane three times, I had expected all holes to have the same size. And the rack was made so I can add some hardware in the empty rack space and have some wires or other parts go through the open holes to e.g. shine an LED light on the model. So, I was surprised when I discovered that the middle hole was slightly bigger than the other two. Which I discovered by trying to fit an Arduino board. (The Yún is shown in the picture.) The rack is long enough for the Arduino Mega, but the board sticks out a few millimeters on the sides of the rack. The pins are actually at the exact location of the long bars. So you could actually put an Arduino in the rack if you don’t mind the width.

But smaller devices like the Arduino Mini, the Trinket, the NetDuino mini and the Digispark have plenty of room inside the rack.

But back to the holes!


Using the climbing Tommi version, I tried a green LED. It doesn’t fit the top or bottom hole, but it does fit the middle hole. Trying again with a regular lamp of 5 mm diameter, I see it going through the middle hole without effort, but the top and bottom ones don’t fit. A laser light won’t even fit the middle hole, though.

The conclusion is that these holes are a bit too small for LED lights. No problem, since I can take a drill bit and make them wider. Still, I had hoped they would be big enough for an LED light. So I have to redo my calculations. And I have to wonder why the middle hole is bigger than the other two, while they’re basically all the same in my 3D software.

Anyway, I now have two great models for containing some of my experimental hardware. I know the racks are open so the hardware would be exposed but that’s something I will solve with a next version of my rack. I also know how thin the walls can be and how thin the walls of my rack are. I can still have the rounded areas but the rack should get more solid walls. Thin walls too, since the rack has a lot of volume.

Next is the question of what I would like to create with these models. Whatever I think of should match the model. The three holes in the rack are meant for lights, cables, buttons or something else, but I don’t want to show too much hardware on the model side of the rack. I also need to find a solution to attach the additional hardware to the rack, since it doesn’t have any special pins or whatever to hold it. Then again, these models were created to see how well these racks would print. The different hole size was a surprise for me which I need to include in my calculations.

And the three rack-less models? They’re just nice desk ornaments. I have ordered more prints, so I will likely have more ornaments soon.

My next designs will have better racks, preferably with extra points to hold my hardware in place. The sandstone prints still look great but I have to consider the size of the whole thing. And I will need to experiment with clothing, to see which items will print best. The same is true with hair, since I still have to find hair that prints well in 3D.

All in all, 3D printing is a very interesting challenge. Slightly expensive too, though.

An example of bad development…

I recently received an email from a company that does questionnaires. And well, I subscribed to this and did some of their questionnaires before, so I wanted to do this new one too. Unfortunately, the page loaded quite slowly, only to return a very nasty error message. A message that told me this organisation is using amateurs as developers and administrators.

Let me be clear about one thing: errors will happen. Every developer should expect weird things to happen, but this case is not just an error; it is evidence of amateur work. So, let’s start by analyzing the message…

Server Error in ‘/’ Application.

Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

So, what’s wrong with this? Users should never see these messages! When you develop in ASP.NET, you can tell the system to show these detailed error messages only when the user is connected locally. A remote user should see a much simpler message.
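
For classic ASP.NET this is a single setting in web.config. A rough sketch (the name of the friendly error page is just an example):

<configuration>
  <system.web>
    <!-- RemoteOnly: detailed errors only on the local machine, a friendly page for everyone else. -->
    <customErrors mode="RemoteOnly" defaultRedirect="~/Error.aspx" />
  </system.web>
</configuration>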

This is something the administrator of the website should have known, and checked. He did not. By failing at this simple configuration setting, the organisation is leaking sensitive information about its website. Information that’s enough to convince me they’re amateurs.

This is also quite a common error message. Basically, it’s telling me that the system has too many database connections open. One common cause is code that fails to close connections after opening them. Keep that in mind, because I will show that this is what caused the error…

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

This is a standard follow-up message. The fact that users of the site would see this stack trace too is just bad.

Exception Details: System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

A timeout error, with a reference to the connection pool and the max pool size. This already indicates that more connections are being opened than closed and the system can’t handle that correctly. There are .NET frameworks that are better suited for database access and that help prevent these kinds of errors, precisely because these errors used to be very common in ASP.NET applications and in generic database applications written in .NET.

Basically, the top of the error message is just repeating itself. Blame Microsoft for that since this is a generic message from ASP.NET itself. Developers can change the way it looks but that’s not very common. Actually, developers should prevent users from seeing these kinds of messages to begin with. Preferably, the error should be caught by an exception handler which would write it to a log file or database and send an alert out to the administrator.
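
In an ASP.NET site, one common place to do that is a global error handler in Global.asax. A rough C# sketch, with the logging call left as a placeholder since every site logs differently:

protected void Application_Error(object sender, EventArgs e)
{
    Exception ex = Server.GetLastError();

    // Write the exception to a log file or database and alert the administrator.
    // (Logger is a placeholder for whatever logging library the site uses.)
    Logger.Write(ex);

    Server.ClearError();
    Response.Redirect("~/Error.aspx");
}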

Considering that I received this error on a Friday afternoon, I bet the developer and administrators are already back home, watching television like I do now. Law & Order is just on…

Source Error:

Line 1578:
Line 1579:        cmSQL = New SqlCommand(strSQL, cnSQLconfig)
Line 1580:        cnSQLconfig.Open()
Line 1581:
Line 1582:        Try

This is interesting… The use of SqlCommand is a bit old-fashioned. Modern developers would have switched to e.g. the Entity Framework or another, more modern solution for database access. But the developers of this site are connecting to the database directly in code, probably to execute a query, collect the data and then, hopefully, close the connection again. The developers are clearly using plain ADO.NET for this site, and I can’t help but wonder why. They could have used more modern techniques instead. But they probably just have to maintain an existing site and aren’t allowed to use more modern solutions.

But it seems to me that closing the connection is not going to happen here. There are already too many connections open, so the highlighted line of code fails. The code creates a command on an existing connection called cnSQLconfig and then tries to open that connection. Unfortunately, opening the connection happens outside the Try block, and if anything fails here, it is very likely that the connection won’t be closed either.
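
The usual fix is to let using statements close the connection for you, even when an exception is thrown. A simplified C# sketch of the idea (the site itself is written in VB, and the variable names are only illustrative):

// Requires: using System.Data.SqlClient;
using (var cnSQLconfig = new SqlConnection(connectionString))
using (var cmSQL = new SqlCommand(strSQL, cnSQLconfig))
{
    cnSQLconfig.Open();
    using (var reader = cmSQL.ExecuteReader())
    {
        // Process the results here.
    }
}   // The connection is returned to the pool here, even if an exception occurred.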

If this happens once or twice, then it still would not be a big problem. The connection pool is big enough. But here it just happened too often.

Another problem is that the way ADO.NET is used here can also make the site vulnerable to SQL injection, which would be another good reason to use a different framework for database access. They might still be using secure code to protect against this, but what I see here doesn’t give me much confidence.
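
Even if you stick with plain ADO.NET, parameters keep user input out of the SQL text. A hedged sketch, since I obviously don’t know their actual query:

// Instead of concatenating user input into strSQL...
var cmSQL = new SqlCommand(
    "SELECT QuestionText FROM Questions WHERE CountryCode = @country", cnSQLconfig);
cmSQL.Parameters.AddWithValue("@country", countryCode);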

Source File: E:\wwwroot\beta.example.com\index.aspx.vb    Line: 1580

A few other interesting facts. First of all, the code was written in Visual Basic. That was already clear from the code, but this confirms it. Personally, I prefer C# over Visual Basic, even though I’ve developed in both myself. And in a few other languages. Language should not matter much, especially with .NET, but C# is often considered more professional than BASIC. (Because the ‘B’ in BASIC stands for ‘Beginners’.)

Second, this code-behind file has over 1580 lines of code. I don’t know what the rest of the code is doing, but it’s probably a lot. Again, this is an old-fashioned way of developing software. Nowadays you see more use of frameworks that allow developers to write a lot less code, which makes the code more readable. Even in the main index of a web site, the amount of code should be reasonably low. You can use views to display the pages, models to handle the data and controllers to connect the two.

Yes, that’s Model-View-Controller, or MVC: a pattern that’s quite effective at reducing the amount of code, if used well.
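
To give an idea of the difference, in ASP.NET MVC the page logic shrinks to a small controller action, with the markup living in a separate view. A minimal sketch with invented names (QuestionnaireRepository is not their code, just an illustration):

using System.Web.Mvc;

public class QuestionnaireController : Controller
{
    // GET: /Questionnaire?country=NL
    public ActionResult Index(string country)
    {
        // The model/repository handles the data access; the view only renders the result.
        var questionnaire = QuestionnaireRepository.GetByCountry(country);
        return View(questionnaire);
    }
}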

And one more thing is strange. While I replaced the name of the site with ‘example.com’, I kept the word ‘beta’ in front of it. I, a user, am using a beta-version of their website! That’s bad. Users should not be used as testers because it will scare them off when things go wrong. Like in this case, where the error might even last the whole weekend because developers and administrators are at home, enjoying their weekend.

Never let users use your beta versions! That’s what testers are for. You can ask users to become testers, but then users know they can expect errors like these.

Stack Trace:

[InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) +4863482
   System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) +117
   System.Data.SqlClient.SqlConnection.Open() +122
   _Default.XmlLangCountry(String FileName) in E:\wwwroot\beta.example.com\index.aspx.vb:1580
   _Default.selectCountry() in E:\wwwroot\beta.example.com\index.aspx.vb:1706
   _Default.Page_Load(Object sender, EventArgs e) in E:\wwwroot\beta.example.com\index.aspx.vb:251
   System.Web.UI.Control.OnLoad(EventArgs e) +99
   System.Web.UI.Control.LoadRecursive() +50
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

And that’s the stack trace. We see the site loading its controls and resources, and the ‘Page_Load’ method is called at line 251. That method calls ‘selectCountry’ at line 1706, which apparently loads country information needed to set the proper language. That in turn ends up at line 1580, where it probably opens some table based on information from the language file.

Again, this is a lot of code for basically loading the main page. I even wonder why it needs to load data from the database based on the country information. Then again, I was about to fill in a questionnaire so it probably wanted to load the questionnaire in the proper language. If the questionnaire is multi-lingual then that would make sense.

Version Information: Microsoft .NET Framework Version:2.0.50727.3655; ASP.NET Version:2.0.50727.3658

And here’s one more bad thing. This site still uses .NET version 2.0 while the modern version is 4.5 and we’re close to version 5.0… It would not surprise me if these developers still use Visual Studio 2005 or 2008 for all this. If that’s the case, then their budget for development is probably quite low. I wonder if the developers who are maintaining this site are even experts at software development. It’s not a lot of information to base this on, but in short:

  • The administrator did not prevent detailed error messages from being shown to users.
  • The use of ADO.NET adds vulnerabilities related to the connection pool and SQL injection.
  • The use of VB.NET is generally associated with less experienced developers.
  • The amount of code is quite large, though that is common for sites that were developed years ago.
  • Not using a more modern framework makes the site more vulnerable.
  • Country information seems to be stored in XML while the questionnaire is most likely stored inside the database.
  • The .NET version has been out-of-date for a few years now.

My advice would be to just rewrite the whole site from scratch. Use the Entity Framework for the database and MVC 4 for the site itself. Rewrite it in C# and hire more professional developers to do the work.

A very generic datamodel.

I’ve come up with several projects in the past and a few have been mentioned here before. For example, the GarageSale project, which was based on a system I called “CART”, or the WordChain project, which was a bit similar in structure. And because of those similarities, I’ve been thinking about a very generic datamodel that could be applied to almost any project.

The advantage of a generic database is that you can focus on the business layer without needing to change much in the database itself. The datamodel would still need development, but by using the existing model and mapping it to existing entities, you can keep it all very simple. And it resulted in this datamodel: (class diagram; click the image to see a bigger version.)

The top class is ‘Identifier’, which is just an ID of type GUID used to find the records. That will work fine in derived classes too. Since I’m using Entity Framework 6, I can just use POCO classes to keep it all very simple. All I have to do is define a DbContext that tells me which tables (classes) I want. If I don’t create an entry for ‘Identifier’, no table will be created for it either.
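
A rough sketch of what I mean, with names taken from the diagram (the exact properties are still likely to change):

using System;
using System.Data.Entity;   // Entity Framework 6

public abstract class Identifier
{
    public Guid Id { get; set; }
}

public class DataTemplate : Identifier
{
    public string Schema { get; set; }
}

public class DataContent : Identifier
{
    public string Xml { get; set; }
    public virtual DataTemplate Template { get; set; }
}

public class GenericModelContext : DbContext
{
    // No DbSet for Identifier itself, so no separate table is created for it.
    public DbSet<DataTemplate> Templates { get; set; }
    public DbSet<DataContent> Contents { get; set; }
}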

The next class is the ‘DataContent’ class, which can hold any XML. That way, this class can contain all information that I define in code without the need to create new tables. I also linked it to a ‘DataTemplate’ class, which can be used to validate the XML content against an XML schema or a special style sheet. (I still need to work out how, exactly.)

The ‘BaseItem’ and ‘BaseLink’ classes are the more important ones here. ‘BaseItem’ contains all fixed data within my system; in the CART system, this would be the catalog. And ‘BaseLink’ defines transactions of a specific item from one item to another. That’s basically three-fourths of the CART system. (The template part is already covered by the ‘DataTemplate’ class.)

I also created two separate link types. One deals with whole numbers and is called ‘CountLink’; this is what you generally use for items. (One cup, two girls, etc.) The other is for fractional numbers like weights or money and is called ‘AmountLink’. These two will be the most-used transaction types, although ‘BaseLink’ itself can be used to transfer unique items. Derived links could be created to support more special situations, but I can’t think of any.
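
Continuing the sketch above, the item and link classes stay tiny as well (again, the property names are just my shorthand):

public class BaseItem : Identifier
{
    public string Name { get; set; }
}

// A transaction moving one specific item from one item to another.
public class BaseLink : Identifier
{
    public virtual BaseItem Item { get; set; }
    public virtual BaseItem From { get; set; }
    public virtual BaseItem To { get; set; }
}

public class CountLink : BaseLink
{
    public int Count { get; set; }        // whole items: one cup, two girls, etc.
}

public class AmountLink : BaseLink
{
    public decimal Amount { get; set; }   // fractional values: weights, money
}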

The ‘BaseItem’ class will be used to derive more specialized items. These special items define the relations with other items in the system. The simplest of them is the ‘ChildItem’ class, which defines extra information related to a specific item. Child items are strongly linked to their parent item, like wheels on a car or keys on a keyboard.

The ‘Relation’ class is used to group multiple items together. For example, we can have ‘Books’ defined as a relation with multiple book items linked to it. A second group called ‘Possessions’ could also be created to contain all the things I own. Items that appear in both groups would make up my personal library.

A special relation type is ‘Property’ which indicates that all items in the relation are owned by a specific owner. No matter what happens with those items, their owner stays the same. Such a property could e.g. be a bank account with a bank as owner. Even though customers use such accounts, the account itself could not be transferred to some other bank.

But the ‘Asset’ class is more interesting since assets are the only items that we can transfer. Any transaction will be about an asset moving from one item to another. Assets can still be anything and this class doesn’t differ much from the ‘BaseItem’ class.

A special asset is a contract. Contracts have a special purpose in transactions: transactions are always between an item and a contract. Either you put an asset into a contract or you extract it from a contract. And contracts themselves can be part of bigger contracts. By checking how much has been sent to and received from a contract, you can check whether all transactions combined are valid. Transactions will have to specify whether they’re sending items to the contract or receiving them from it.

The ‘BaseContract’ class is the more generic contract type and manages a list of transactions. When it has several transactions, it is important that there are no more ‘phantom items’. (A phantom item would be something that’s sent to the contract but not received by another item, or vice versa.) These contracts will need to be balanced as a check to see if they can be closed or not. They should be temporary and last from the first transaction to the last.
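
That balance check is a simple aggregation once you have the transactions of a contract; a sketch, with the amounts passed in as two plain lists:

using System.Collections.Generic;
using System.Linq;

// A BaseContract balances when everything sent to it has also been received from it again,
// so no phantom items remain.
static bool IsBalanced(IEnumerable<decimal> amountsSentToContract,
                       IEnumerable<decimal> amountsReceivedFromContract)
{
    return amountsSentToContract.Sum() == amountsReceivedFromContract.Sum();
}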

The ‘Contract’ type derived from ‘BaseContract’ contains an extra owner. This owner will be the one who owns any phantom items in the contract. This reduces the amount of transactions and makes the contract everlasting. (Although it can still be closed.) Balancing these contracts is not required, making them ideal as e.g. bank accounts.

Yes, it’s a bit more advanced than my earlier CART system, but I’ve considered how I could use this for various projects that I have in mind. Not just the GarageSale project, but also a simple banking application, a chess notation application, a project to keep track of blood sugar measurements for people with diabetes, and my WordChain application.

The banking application would be interesting. It would start with two ‘Relation’ records: “Banks” and “Clients”. The Banks relation would contain Bank records with information of multiple banks. The Clients relation would contain the client records for those banks. And because of the datamodel, clients can have multiple banks.

Banks would be owners of bank accounts, and those accounts would be contracts. All the bank needs to do is keep track of all money going into or out of the account. (That makes money just another item, and all transactions will be of type ‘AmountLink’.) But to link those accounts to the persons who are authorized to receive money from the account, each account would need to own a Property record. The property record then has a list of clients authorized to manage the account.

And we will need six different methods to create transactions. Authorized clients can deposit money into or withdraw money from the account. Other clients can send payments to or receive payments from the account, where any money received from the contract needs to be authorized. Finally, the bank charges interest, or pays interest. (Or not.) These interest transactions don’t need authorization from the client.

The Chess Notation project would also be interesting. It would start with a Board item and 64 square items, plus a bunch of piece assets. The game itself would be a basic contract without an owner. The Game contract would contain a collection of transactions transferring all pieces to their starting locations. A collection of ‘Move’ contracts would also be needed, owned by the Game contract. Each Move would show which move it is (including branches of the game) and the transactions that take place on the board. (White rook removed from A1, white rook added to A4 and black pawn removed from A4, which translates into rook takes pawn at A4.)

It would be a very complex way to store a chess game, but it can be done in the same datamodel as my banking application.

With the diabetes project, each transaction would be a measurement. The contract would be owned by the person who is measuring his or her blood and we don’t need to send or receive these measurements, just link them to the contract.

The WordChain project would be a bit more complex. It would be a bunch of items with relations, properties and children. Contracts and assets would be used to support updates to the texts, with every edit of a WordChain item kicking the old item out of the contract and adding a new item into it. That would result in one contract per word in the database.

A lot of work is still required to make sure it works as well as I expect. It would not be the ideal datamodel for all these projects, but it helps me focus on the business layer and the GUI without worrying about database changes. Once the business model becomes more advanced, I could create a second data layer with a better datamodel to improve the performance of the data management.

Great photography, licensed or self-made…

The Internet has become extremely important in our daily lives. And more importantly, the Internet requires many developers to think more graphically. Twenty-five years ago, computers were mostly text-based with only simple graphics. The Internet was about to be born and graphics were mostly restricted to small icons and images with a limited number of colors. If you were lucky, your graphics card would be a VGA card, able to handle 256 colors at a resolution of 320×200 pixels, or 16 colors at 640×480. Graphics standards were needed back then, and a few new formats were born.

The PCX format, created by the now-defunct ZSoft Corporation, turned out reasonably successful because it supported up to 256 colors, with a color palette that allowed those 256 colors to be picked from the full true-color range. It also supported data compression, making files reasonably small, yet the decompression method was pretty fast, so the processor did not need to work hard to display the image.

The PCX format was later extended to true color, but the JPG format turned out to be better. As processors improved, the more complex compression of JPG became fast enough to use and resulted in smaller files, although the images lose some detail.

Another popular format was the GIF format, which allowed images with 255 colors plus a transparent color. (Or 256 colors without transparency.) This format is still popular since it’s great for logos and cartoons, and it allows animations. And the compression of GIF files reduces the image size considerably without losing any detail.

The PNG format has become more popular and was created as the successor of the GIF format. It was needed because modern graphics required more colors and there was a demand for better transparency. The PNG format uses 24 or 48 bits for its colors, allowing more colors than the human eye can distinguish, plus an alpha channel that lets an image define the transparency of each pixel, anywhere between fully transparent and opaque. This was great for e.g. creating dirty glass windows or thin, silk nightgowns as graphics.

There are, of course, many other graphic formats, but I want to talk about art, not formats. And this time, I want to talk about Pavel Kiselev, also known as photoport (NSFW), who likes to create glamorous pictures of pretty women. Today, he posted this picture of Irene, one of his models. (I’ve licensed it for personal use, and this is my personal blog, so it should be okay.)

(Photo: Irene.) And this is the kind of photography that I love to see. Should I say more?

Well, okay… I do have to keep in mind that I wanted to relate this to software development so I should not distract myself by continuously looking in those pretty eyes. 🙂 So, back to the software development part…

When you’re designing websites, you have to keep in mind that you will need a lot of graphics. Something simple like an icon to display in the browser is already a requirement these days, or else people have trouble finding your site among their favorites. They can, of course, read the labels in the menu, but most people will glance over all the icons first and click on the one they recognise as yours. Without the icon, they have more trouble finding you, so never forget to add a favicon to your site! Something that people will easily recognise as your brand.

Next, your site will need a logo and a background image. Or at least a logo. The best logos are PNG or GIF images, because they are small and allow transparency. The image of Irene would be a bad logo since it’s big and takes a lot of bytes. When people visit your site over a slow internet connection, it would just look bad if the logo takes too long to download. Thus, keep it small yet detailed enough to be recognisable.

The background image might be bigger, unless you’re designing websites for mobile devices. For mobile devices, no background image is better, since it takes less bandwidth. Many mobile devices access the Internet through providers who charge by the megabyte of data sent or received. Thus, for mobile sites you need to keep the amount of data to an absolute minimum, or it becomes expensive to visit your mobile website, forcing visitors to stay away when they’re roaming around…

But a favicon, logo and background aren’t always enough. Let’s forget the mobile devices for now and focus on regular browsers and users who pay a fixed price for their connection. Your website will probably offer some services to customers, and you need them to easily recognise what they’re looking at. And these days, more and more people dislike reading descriptions and prefer to see something more graphical. You might consider hieroglyphs on your website, but not many people are capable of reading ancient Egyptian. So you need your own set of icons and images for the most important actions on your website. Preferably icons with an extra label next to them.

Take a look at your browser and find the following buttons: Back, Next, Refresh and Home. Did you read any text to find them? Most likely, you found them by looking at the images: arrows for the back and next buttons, an arrow in a circle for the refresh button and a symbol of a house for the home button. These images have become standard, so make sure you have a few of your own to put on your website, especially when you want navigation buttons on your own site. However, do keep in mind that you either have to create these images yourself or get a proper license for images created by someone else. Considering that many icons are already in the public domain or have been created under a Creative Commons license, it should be no big problem to find some for free.

Next, you will probably need images for the products that you want to sell or display. While Irene looks very pretty, I would not use it when I want to sell socks. I would use a picture of socks instead. And make sure I have licensed that picture or created it myself. Preferably, I would create multiple images at different sizes so I can display thumbnails first and a larger version if the user wants to see more details. Again, this would speed up loading your site.

It does create a bit of a challenge, though. Would you resize the image to a thumbnail dynamically, or will you store both the thumbnail and the original? Both have their advantages. Dynamic resizing allows you to change the thumbnail size whenever you like and even lets you create all kinds of custom sizes. However, your server will need more processing power to do the resizing, which is slow if your original images are created at huge resolutions. (Like most of my artwork.) If you’re expecting a lot of visitors, storing images at different sizes would improve performance considerably but will require more disk space, which could be a minor problem when you have your site hosted and have to pay for the storage per megabyte. Then again, hosts don’t charge much for extra disk space these days, if they’re even charging anything at all.
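
For the dynamic variant, a rough System.Drawing sketch of resizing an original to a thumbnail on the server (GDI+ is not the fastest option, but it shows the idea):

using System.Drawing;

static void SaveThumbnail(string originalPath, string thumbnailPath, int maxWidth)
{
    using (var original = Image.FromFile(originalPath))
    {
        // Keep the aspect ratio while limiting the width.
        int height = original.Height * maxWidth / original.Width;
        using (var thumbnail = new Bitmap(original, maxWidth, height))
        {
            thumbnail.Save(thumbnailPath);
        }
    }
}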

The image of Irene would be practical for dating sites and sites for bathing products. Her hair has a wet look, giving the impression that she just washed it. She also looks very seductive which would certainly attract attention of many men and probably a few women too. However, on dating sites the members would probably recognise her as a professional model and thus consider it a fake image. She’s too pretty to use a dating site. You’d probably scare a few members away if you would use this image. It would still look great for selling shampoo, though.

So, you’re designing a website and thus you will need images to fill it up. This is often the biggest problem for many companies. In many cases, developers will just use Google to find some image and copy it to the project, ignoring the need for any license. They have good reasons to work this way, because adding proper images isn’t a real task for developers. But it could cause legal troubles if the site is published and some photographer recognizes his images. Without a proper license, it could cost you hundreds of euros to correct the situation and that’s without any other legal costs. Thus it is really bad when developers have to search for the proper images themselves.

A better solution is to create placeholder images. Provide the developers with dummy images that you’ve created yourself by adding a textual description to a newly created image of the preferred size. Make sure it has a proper filename too. The developer can then insert this placeholder in the proper location and continue his work while you start looking for a nice image to replace it. This will allow time to get a proper license or to make the image yourself. Once you’re about to publish the site, all you have to do is replace the placeholders with the images that you want to display.
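
Generating such a placeholder takes only a few lines with System.Drawing; a sketch that writes the description and size onto a grey image:

using System.Drawing;

static void CreatePlaceholder(string path, int width, int height, string description)
{
    using (var bitmap = new Bitmap(width, height))
    using (var graphics = Graphics.FromImage(bitmap))
    using (var font = new Font("Arial", 14))
    {
        graphics.Clear(Color.LightGray);
        graphics.DrawString(description + " (" + width + "x" + height + ")", font, Brushes.Black, 10, 10);
        bitmap.Save(path);
    }
}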

One more, very important thing to remember: when you get a license for any image that you use, make sure you keep track of the specific details of that license. It would be best if you have your own database where you store the image together with information about where you licensed it, where you found it, the license itself and the name of the author. You will need this information if the author, or some company representing the author, finds your image online and thinks you don’t have a proper license.

Of course, there’s a risk of ending up with a fraudulent license. You might have gotten a license from someone pretending to be the author. This is a risk you can reduce by keeping track of the origins of every image used by your organisation. And yes, it’s a lot of additional bookkeeping. With this information about where you got your license, you will have a good case for getting away without financial damages if the license turns out to be fraudulent. Whether you can continue to use the image will depend on the legislation of the country where your organisation is located and of the country where your website is hosted.

My personal preference is to just create the images myself. This takes time, and I need opportunities to create those images. For CGI artwork, my computer is fast enough to render an image in the background while I continue to work on developing my sites. Still, I am limited to one image per computer at any time, and my license for Vue limits me to using the software on just a single computer. Rendering can easily take a few hours, even days, so I have to be patient.

Of course, I could just take one of my digital cameras but that often means that I need a model, a place and the right weather if I’m going to take pictures outside. This is a lot of work for a bunch of images and I will need to do extra work on those photos once I’ve taken them. They need to be cropped, lighting needs to be adjusted, colors need to be enhanced. This is just too much work for a software developer to do. Thus, you’d better hire a professional to do this work if you don’t have someone in your organisation dedicated to this. Do make sure the photographer you hire will do a “Work for hire” so you’re the official author. Otherwise, the photographer will have influence on how you can use the photos he took!

So, organisations have the complex task of maintaining licenses and their own images. A lot of organisations tend to forget about these details, which can result in costly problems. Make sure your developers have something to work with while they are developing. Make sure they don’t have to waste time on those images themselves, since developers are costly too. They should focus on the code, not the graphics. Make sure someone in your organisation manages all images and is responsible for checking anything that’s about to be published for unknown images. If an image isn’t in the system maintained by the image manager, then you should block the publication until this is fixed.

Multithreading, multi-troubling.

Recently, I worked on a small project that needed to make a catalog of the image files and folders on my hard disk and save this catalog in a database. Since my CGI and photography hobbies generate a lot of images, it would be practical to have something simple to support it all. Plenty of software already does something like this, but none of it quite the way I liked. Especially since I want to connect images to derived images, group them, tag them, share them, assign licenses to them and publish them. And I want to keep track of where I’ve shared them already. Are they on Flickr? CafePress? DeviantArt? Plus, I wanted to know if they should be rated as adult. Some of my CGI artwork is naughty by nature (because nude models are easier to work with) and thus unsuitable for a broad audience.

But for this simple catalog I just wanted to store the image folder, the image filename, an image name that would be the filename without extension and without diacritics, plus the width and height of the image so I could calculate the image ratio. To make it slightly more complex, the folder name would be a relative folder name based on a root folder that’s set in the configuration. This would allow me to move the images to a different folder or use the same database on a different machine without the need to adjust all records.
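
Deriving those stored values from a full path is mostly string work; a sketch of how I would strip the root folder, the extension and the diacritics (assuming the root folder is always a prefix of the full path):

using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

static string RelativeFolder(string rootFolder, string fullPath)
{
    // Store the folder relative to the configured root, so the root can move later.
    return Path.GetDirectoryName(fullPath)
               .Substring(rootFolder.Length)
               .TrimStart(Path.DirectorySeparatorChar);
}

static string ImageName(string fullPath)
{
    // Filename without extension, with diacritics removed (é becomes e, etc.).
    string name = Path.GetFileNameWithoutExtension(fullPath);
    string decomposed = name.Normalize(NormalizationForm.FormD);
    var letters = decomposed.Where(c =>
        CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
    return new string(letters.ToArray()).Normalize(NormalizationForm.FormC);
}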

So, the database structure is simple. One table that has the folders, one table containing image ratios and one for the image names and sizes. The ratio table will help me to group images based on the ratio between width and height. The folder table would do the same for grouping by folder. The Entity Framework would help to connect to this database and take away a lot of my troubles. All I have to do now is write a simple library that would fill and keep up this catalog plus a console application to call those methods. Sounds simple enough.

Within 30 minutes, the first version was ready. I would first enumerate all folders below the source folder, then for each folder in that list I would collect all image files of type PNG, JPG and BMP. The folder would be written to the folder table and the file would be put in the Image table. Just one minor challenge, though…

I want to add the width and height of the image to the image table too, and based on the ratio between width and height, I would have to either add a new ratio record or link to an existing one. This meant that I had to read every file into memory to find its size and then check whether there’s already a ratio record related to it. If not, I would need to add the new ratio record and make sure the next request for ratio records would include it. Plus, I needed to check whether the image and folder records already exist in the database, because this tool only needs to add new images.
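
Reading the size means loading and decoding the file, and that is exactly where the cost sits; a minimal sketch:

using System.Drawing;

static Size GetImageSize(string path)
{
    // Image.FromFile reads and decodes the whole file, which is slow for large images.
    using (var image = Image.FromFile(path))
    {
        return new Size(image.Width, image.Height);
    }
}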

The performance was horrible, as could easily be predicted. Especially since I make images and photos at high resolutions, so reading those files takes dozens of milliseconds each. No matter that my six cores at 3.5 GHz and 32 GB of RAM turn my system into a speed demon, these read actions are just slow. And I did it inefficiently, since I have six cores but my code was just single-threaded. So, redo from start and this time do it multithreaded.

But multithreading and the Entity Framework don’t go well together. The database connection isn’t thread-safe, so you cannot access the database methods from multiple threads. Besides, the ratio table could generate collisions when two images with the same new ratio are processed: both threads would notice the ratio doesn’t exist, so both would add it, but one of them would then fail because the other added it first. So I needed to change my approach.

So I used ‘Parallel.ForEach’ to walk through the folder list, and then again for all files within each folder. I would collect the data in internal lists, and when the file loop was done, I would loop through all images and add those that didn’t exist. And yes, that improved performance a lot and kept the conflicts with the ratio table away. Too bad I was still reading all images, but that was not a big issue. Performance went from several hours to slightly over one hour. Still slow.
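
A sketch of that approach: parallel loops that collect everything in memory first, so the Entity Framework context is only touched from a single thread afterwards. (It reuses the GetImageSize helper from the sketch above; only JPG is shown here.)

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static ConcurrentBag<Tuple<string, System.Drawing.Size>> CollectImages(string rootFolder)
{
    var found = new ConcurrentBag<Tuple<string, System.Drawing.Size>>();
    var folders = Directory.EnumerateDirectories(rootFolder, "*", SearchOption.AllDirectories);

    Parallel.ForEach(folders, folder =>
    {
        Parallel.ForEach(Directory.EnumerateFiles(folder, "*.jpg"), file =>
        {
            // Reading the size is the slow part, so spread it over all cores.
            found.Add(Tuple.Create(file, GetImageSize(file)));
        });
    });

    return found;   // compare with the database and insert on a single thread afterwards
}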

So, one more addition. I would first read all existing folders and images from the database, and if a file already existed in that list, I would not read its size anymore since it wasn’t needed; I could skip the image. As a result, it still took an hour the first time I imported all images, but a second run would finish within a minute, since there wasn’t anything left to read or add. The speed was then limited to just reading the files and folders from the database and from the disk.

When you’re doing these kinds of projects in an Agile team and you’re scrumming around, things will slow down considerably if you haven’t thought about these challenges before you start the sprint to create the code. Since the first version looks quite simple, you might have planned it as a very short task and thus end up with extremely slow code. In the next sprint you would have to consider options to speed things up, and you would realize that making it multithreaded is a bigger task. And while you are working on the multithreaded version, you might discover the conflicts with the Entity Framework plus the possible collisions within the tables. So the second sprint might end with a buggy but faster solution with lots of exception handling to catch all possible problems. The third sprint would then fix these, if you manage to find a better solution. Otherwise, this problem might haunt you until the deadline of the project…

And this is where teams have to be really careful. The task sounds very simple, but it’s not. These things are easily underestimated by a team and should be well planned before you start writing code. Experienced developers will detect these problems before they start, knowing that they should take their time and plan carefully instead of writing code immediately. (I only did it so I could write this post.) The task seems extremely simple, and I managed to describe it in the second paragraph of this post in just three lines. But a high-performance solution requires me to think before I start writing code.

My last approach is the most promising, though. And it can be done by using multithreading but it’s far more complex than you’d assume at first. And it will be memory-hungry because you need to create several lists in memory.

You would have to start with two threads. One thread reads the database and generates lists of files, folders and ratios. These lists must be completely in memory, because if you keep them as queryable lists, the system would keep going back to the database to read them. Besides, once you’re done generating these lists, you will want to close the database connection. This all tells you what you already have. The second thread reads all folders, and by using parallel threads it reads all image files within those folders. But you would not read the image sizes yet, nor calculate any ratios.

When you’re done collecting the data, you will have to compare it all. You would start by comparing the lists of folders. Folders that exist in both lists can be ignored (but not their files). Folders that exist in the database list but not in the disk list should be deleted, including all files within those folders! Folders that are on disk but not in the database need to be added. You can now start two threads, each with their own database connection. One deletes from the database all folders, plus their related images, that no longer exist on disk, while the other adds all new folders found on disk. And by using two database connections, you can speed things up. You will have to wait for both threads to finish, though. But it shouldn’t be slow.
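
Splitting the folder lists is a straightforward set difference once both are in memory; a sketch:

using System.Collections.Generic;
using System.Linq;

// dbFolders comes from the database thread, diskFolders from the disk scan.
static void SplitFolders(List<string> dbFolders, List<string> diskFolders,
                         out List<string> foldersToDelete, out List<string> foldersToAdd)
{
    foldersToDelete = dbFolders.Except(diskFolders).ToList();   // no longer on disk
    foldersToAdd    = diskFolders.Except(dbFolders).ToList();   // new on disk
}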

The next step is the comparison of images. Here you do something similar to the folders: you split the images into three lists. One with all images that are unchanged, one with all images that need to be deleted, and one with all images that need to be added. And you would create a separate thread with its own database connection to delete the images, so your main process can start working on the ratios table.

Because we now know which images need to be added, we can go through those files using parallel processing, read the image width and height and add this information to the image records. When we have enriched this list with the sizes, we can use a LINQ query to generate a list of all ratios of those images and remove all duplicates from it. This gives us the list of ratios that we need to check.
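
That LINQ query can be as simple as reducing each width/height pair by its greatest common divisor, so 1920×1080 and 3840×2160 both end up as 16:9, and keeping only the unique results. A sketch:

using System.Collections.Generic;
using System.Drawing;
using System.Linq;

static int Gcd(int a, int b)
{
    return b == 0 ? a : Gcd(b, a % b);
}

static List<Size> DistinctRatios(IEnumerable<Size> imageSizes)
{
    return imageSizes
        .Select(s => new Size(s.Width / Gcd(s.Width, s.Height), s.Height / Gcd(s.Width, s.Height)))
        .Distinct()
        .ToList();
}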

Before we add the new images, we will have to check the ratios table. As with the folders table, we check for all differences. However, we cannot delete ratios that we haven’t found among the new images, because we skipped the images that already exist. We will do that later. We first start adding the new ratios to the database. This too could be done in a separate thread, but it’s pretty fast anyway, so why bother? A performance gain of two seconds isn’t worth the extra effort if the process takes minutes to finish. So, add the new ratios.

Once all ratios are added, we can add all images. We could do this using parallel threads, with each thread creating a new database connection and processing all images from one specific folder or with one specific ratio. But if you want to add them multithreaded, I would just recommend dividing the images into groups of similar size. Keep the number of groups proportional to the number of cores (e.g. 24 groups for my six cores) and let the system do its work. By evenly dividing the images over multiple threads, they should all take about the same amount of time.

When adding the new images, you will have to find the related folder and ratio in the database again. This makes adding images slower than adding folders or ratios because of the extra lookup. This performance would improve if we had kept the folder and ratio lists as queryable lists, but then we could not open and close the connections, nor could we use multiple connections to add those images. And we want multiple connections to speed things up. So we accept slightly worse performance at this point, although we could probably speed it up a bit by using a stored procedure to add the images. The stored procedure would have parameters for the image name, the image filename, the width and height, the folder name and the ratio width and height. I’m not too fond of procedures with many parameters and I haven’t tested whether this would increase the performance, but in theory it should be faster, especially if the database is on a different machine than the application.

And thus a simple task of adding images to a database turns out to be complex, simply because we need better performance. It would still take hours if there are a lot of new images to add, but once the catalog is mostly filled, it will do quite well.

But you will have to ask yourself and your team whether you are capable of detecting these problems before you start a new sprint. Designs look simple, because designers don’t always keep performance in mind. These things are easily asked for because they appear very simple, but they have a lot of consequences. Similar problems arise when you work on projects that need to be secure. The design might ask for a login screen with username and password, and optionally a few OpenID providers as alternative logins, but the amount of code needed to manage all this data and keep it secure is quite complex. These are the moments when you need to write some technical documentation first, which is something people often forget when working on an Agile project.

Still, you cannot blame the developer if the designer just writes a few lines and the developer picks the first, slow solution. The result would still be the requested task. It is the designer who needs to be aware of these possible performance pitfalls. And with Agile, you have a team. All team members should be able to point out that this simple description hides these pitfalls, making it a long and complex task. They should all realise that they have to discuss possible solutions for this, and preferably they do so as a team with just one computer. (The computer would be used to find information, not to write code!) Only when they agree on the proper solution should one or two of them start writing code. And they would then know how long this task will take. Thus, the task would finish within two sprints. In the first sprint, all team members have a small task to meet and discuss the options. In the second sprint, one or more members have the big task of implementing the code.

Or, to keep it simple: think before you start writing code!