How SSL is failing…

Every web developer should know about the Secure Sockets Layer and its successor, the Transport Layer Security (SSL and TLS). These techniques are nowadays a requirement to keep the Internet secure and to keep private matters private. But while the technique itself is great, its implementation has some flaws.

Flaws that aren’t easy to resolve, though. To see why, you first need to understand how this technique works when you use it to protect the communication between a visitor and a website.

This is a very long post as it is a complex topic and I just don’t want to split it into multiple posts. So I’ve added several headers to separate the parts…

What is SSL?

When you communicate with a website, you basically have a line with two endpoints: you on one side and the web server on the other. But this isn’t a direct line; it goes from your computer to your router/modem, to your provider, over some other nodes on the Internet, to the provider of the host, to the router of the host and finally to the computer that runs the website. And this is actually a very simplified explanation!

But it shows a problem. The communication between the two endpoints goes over various nodes, and at every node someone might be monitoring the communication, listening for sensitive information like usernames, passwords, social security numbers, credit card numbers and a whole lot more. And to make it even harder, you don’t know if your information will travel over nodes that you can trust! So you need to make sure that the information you send between both endpoints is secure. And that’s where SSL is used.

SSL is an encryption technique with asymmetric keys. That means that the host has a private key and a public key. It keeps the private key hidden while giving everyone the public key. You can use one key to encrypt a message, but you need the other key to decrypt it again. And that’s basically how the security works! You have the public key and you encrypt your data with it before sending it to me. No one on those nodes has the private key, except me, so I’m the only one who can decrypt it. I can then encrypt a response with the private key and send that back to you, as you can decrypt it again with the public key.

Unfortunately, everyone on those nodes can decrypt that response too, as they would all know this public key since I had just sent it to you. So things need to be a little more secure here, which fortunately is the case. But in general, you should never encrypt sensitive data with a private key, as anyone with the public key would be able to read it!
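
To make this concrete, here is a minimal C# sketch of asymmetric encryption, using the .NET RSACryptoServiceProvider class (a toy illustration; a real TLS handshake involves far more than this):

using System;
using System.Security.Cryptography;
using System.Text;

class AsymmetricDemo
{
    static void Main()
    {
        // The host generates a key pair; the public part may be shared freely.
        using (var rsa = new RSACryptoServiceProvider(2048))
        {
            byte[] message = Encoding.UTF8.GetBytes("my secret");

            // Anyone with the public key can encrypt...
            byte[] encrypted = rsa.Encrypt(message, true);   // true = OAEP padding

            // ...but only the holder of the private key can decrypt.
            byte[] decrypted = rsa.Decrypt(encrypted, true);
            Console.WriteLine(Encoding.UTF8.GetString(decrypted));
        }
    }
}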

Session keys…

But the Internet uses a better trick. Once you receive the public key from the web server, your browser will generate a session key, which is a symmetric encryption key. This session key can be used by anyone who knows it, and right now you would be the only one. But as you have my public key, you use it to encrypt the session key and send it to me. Now I know it too and we can have a communication session using this session key for security. And once the session is done, the session key can be discarded, as you would make a new one for every session.

Most people think the encryption is done using the SSL certificate itself, but that’s not the case! The certificate is only used to send a session key to the server so both sides can communicate safely. That is, as long as no one else knows that session key. Fortunately, session keys are short-lived, so there isn’t much time for them to fall into the wrong hands…
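
The idea looks roughly like this in C# (a hedged sketch: the real TLS key exchange is more elaborate, but the principle of wrapping a random symmetric session key with the server’s public key is the same):

using System;
using System.Security.Cryptography;

class SessionKeyDemo
{
    static void Main()
    {
        using (var serverRsa = new RSACryptoServiceProvider(2048))   // server's key pair
        using (var aes = Aes.Create())                               // client's random session key
        {
            // Client: encrypt the session key with the server's public key.
            byte[] encryptedKey = serverRsa.Encrypt(aes.Key, true);

            // Server: only the private key holder can recover the session key.
            byte[] sessionKey = serverRsa.Decrypt(encryptedKey, true);

            // From here on, both sides use the same fast symmetric key,
            // which is simply discarded when the session ends.
            Console.WriteLine(Convert.ToBase64String(sessionKey));
        }
    }
}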

The main weakness…

So, this seems pretty secure, right? My site sends you a public key, you create a session key and then we communicate using this session key, and no one who is listening in on us can know what we are talking about! And for me it would be really easy to make the SSL certificate, as most web servers can do this without any costs, and generally within just a few minutes. So, what can go wrong?

Well, the problem is called the “Man in the Middle” attack, and it means that one of the nodes on the line will listen in on the communication and intercept the request for secure communications! It notices that you ask for a public key, so it gives you its own public key instead of that of the host. It also asks for the public key of the host so it can relay all communications. You would then basically set up a secure line with the node, and the node does the same with the host, and it will be able to listen in on anything that moves between you, as it has to decrypt and then encrypt each and every message. So it can listen to sensitive data without you realizing that this is happening!

Authorities…

So the problem is that you need to know that the public key I gave you is my public key, and not the key of this node. How do you know for sure that this is my key? Well, this is where the Certificate Authorities (CAs) have a role.

The CA has a simple role of validating the certificates that I use for my website. I want to secure my host so I make a certificate. I then ask the CA to sign this certificate for me. The CA then checks if I really am the owner of the domain at that specific moment or have at least some control over the domain. And when they believe that I’m the owner, they will sign my certificate.

Then, when you receive my public key, you can check the credentials of this certificate. It should tell you if it is for the specific domain that you’re visiting, and it should be signed by the CA that issued it. If it is signed correctly, then the CA has confirmed that this certificate is linked to my host and not to some node between you and me.
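
In .NET you can watch this chain-of-trust check at work; a small sketch, where “server.cer” is just a placeholder for a certificate file you want to inspect:

using System;
using System.Security.Cryptography.X509Certificates;

class ChainCheckDemo
{
    static void Main()
    {
        // Load the certificate to verify; the path is a placeholder.
        var certificate = new X509Certificate2("server.cer");

        var chain = new X509Chain();
        chain.ChainPolicy.RevocationMode = X509RevocationMode.Online;

        // Build() walks from the certificate up to a trusted root CA.
        bool isValid = chain.Build(certificate);
        Console.WriteLine("Trusted chain: {0}", isValid);

        foreach (X509ChainStatus status in chain.ChainStatus)
            Console.WriteLine(status.StatusInformation);
    }
}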

Trust…

But the problem is that when your connection isn’t secure because some node is trying to listen in, asking the CA whether my certificate is properly signed won’t work, as you would be requesting that validation over the same unsafe connection. The node would just claim the certificate is valid, so that option is out. No, you already need to have the public key of the CA on your system so you can verify my signature. And you need to have received the CA’s certificate from some secure location. Otherwise, you can’t trust that my public key is the real thing.

So most web browsers have a list of public keys as part of their setup. When they are installed, they also include several trustworthy public keys from the more popular CAs; basically, the CAs they deem reliable. Your browser will validate the public key from my site against the public keys it knows and trusts, and if everything is okay, you will be notified that the connection is secure and the session key can be generated for further secure communications.

Otherwise, your browser will give you a warning telling you what’s wrong with the certificate. For example, it might be outdated or not meant for the specific domain. In general, you should not continue as the connection with the host has security problems!
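
On Windows you can inspect such a trust list yourself; a short C# sketch that enumerates the machine’s trusted root CAs (browsers like Firefox ship their own list, so this shows the idea rather than any particular browser’s store):

using System;
using System.Security.Cryptography.X509Certificates;

class RootStoreDemo
{
    static void Main()
    {
        // Open the system's list of trusted root CA certificates read-only.
        var store = new X509Store(StoreName.Root, StoreLocation.LocalMachine);
        store.Open(OpenFlags.ReadOnly);

        foreach (X509Certificate2 root in store.Certificates)
            Console.WriteLine("{0} (expires {1})", root.Subject, root.NotAfter);

        store.Close();
    }
}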

Distrust!…

But here’s the thing… The list of trusted CAs in your browser can be modified, and to be honest, it sometimes is modified, for various reasons. Some are legitimate, others are not.

For example, this list is modified when a specific CA is deemed unreliable. This happens regularly with smaller CAs, but once in a while some major scandal happens. For example, in 2011 it was discovered that the company DigiNotar had suffered a security breach, which had resulted in several certificates being falsified. Most likely, the Iranian government was behind this, in an attempt to read all emails that their citizens were sending through Gmail. The fake certificates allowed them to listen in on all emails using the Man in the Middle technique. DigiNotar went bankrupt shortly afterwards, as all the certificates they had issued had become worthless.

Similar problems occurred at StartCom, a CA that actually gave away free certificates. The Israeli company was purchased by a Chinese company, and some suspicious behavior happened soon afterwards. The fear was that this Chinese company (and perhaps even the Chinese government) would use the trust that StartCom had built up to make fake certificates and listen in on all communications in China. Both Mozilla and Google started to raise questions about this and didn’t get satisfying answers, so they decided to drop the StartCom certificates. This CA had become controversial.

And then there’s Symantec, an American company that has been making security-related software for decades and has been trustworthy for a long time. In 2010 Symantec acquired Verisign’s authentication business unit, which includes issuing SSL certificates for websites. But in 2015 Google discovered that Symantec had issued several test certificates impersonating Google and Opera. Further research led Google to believe that Symantec had been publishing questionable certificates for over seven years, and thus they announced that they will distrust Symantec SSL certificates in the near future. In April 2018, all Symantec certificates will become useless in Google Chrome, and other browsers might follow soon.

Also interesting is that Symantec is selling their SSL business to DigiCert. This could solve the problem, as DigiCert is still trusted. Or it could make things worse, if browser manufacturers decide to distrust DigiCert from now on as well.

But also: Telecom!

But there are more risks! Many people nowadays have mobile devices like tablets and phones. These devices often come with a subscription to the services of some mobile phone company (T-Mobile and Vodafone, for example). These companies also sell mobile devices to their customers and have even provided “free phones” with new subscriptions.

However, these companies will either provide you with a phone that has some of their software pre-installed or will encourage you to install their software to make better use of their services. The manufacturers of these mobile devices will generally do similar things if given a chance. And part of the additions they make to your Android or iOS device is to include their own root certificates with the others. This means that they are considered trustworthy by the browser on your device.

Is that bad? Actually, it is, as it allows these companies to perform a Man in the Middle attack on you. Your telecom provider and the manufacturer of your phone would be able to listen to all the data that you’re sending and receiving! This is worse, as a local government might require these companies to listen in on your connection. It is basically a backdoor to your device, and you should wonder why you would need to trust your provider directly. After all, your provider is just another node in your connection to the host.

Did you check the certificate?

So the problem with SSL is that it’s only as reliable as the Certificate Authorities who made those certificates. It’s even worse if your device has been in the hands of someone who wants to listen in on your secure connections, as they could install a custom trusted certificate. Then again, even some malware could install extra public keys in your trusted certificates list without you noticing. So while it is difficult to listen in on your secure conversations, it is not impossible.

You should make a habit of checking any new SSL certificate that you see pop up in your browser, and it would be a good idea if browsers would first ask you to check a certificate whenever they detect that a site has a new one. It would then be up to the user to decide whether to trust the certificate or not. And those certificates would then be stored so the browser doesn’t need to ask again.

Unfortunately, that would mean that you get this trust question very often when you’re browsing various sites, and users will tend to just click ‘OK’ to get rid of the question. So that’s not a very good idea. Most users aren’t even able to tell whether a certificate is trustworthy or not!

For example, while you’re reading this blog entry, you might not have noticed that this page is also a secured page! But did you check the certificate? Did you notice that it is signed by Automattic and not by me? That it’s a certificate issued by “Let’s Encrypt”? Well, if the certificate is showing this, then it should be the right one. (But when my host Automattic changes to another CA, this statement becomes invalid.)

ACME protocol…

And here things become interesting again. “Let’s Encrypt” gives away free certificates, but they are only valid for a short time. This makes sense, as domains can be transferred to other owners and you want to invalidate the old owner’s certificates as soon as possible. It uses a protocol called ACME, which stands for “Automatic Certificate Management Environment”. It’s basically a toolkit that automates the generation of certificates for domains, so even though the certificates are only valid for a short period, they are replaced regularly. This is a pretty secure setup, although you’d still have to trust this CA.

Problem is that “Let’s Encrypt” seems to prefer Linux over Windows, as there is almost no good information available on how to use ACME on Windows with IIS. But another problem is that this protocol is still under development and thus still has some possible vulnerabilities. Besides, it is very complex, making it unusable for less technical developers. The whole usage of certificates is already complex, and ACME doesn’t make things easier to understand.

Also troublesome is that I tried to download the ACME client “Certify the Web” for Windows, but my virus scanner blocked the download. So now I have to ask myself if I still trust this download. I decided that it was better not to, especially as I am trying to be secure. Too bad, as it seemed to have a complete GUI, which would have made things quite easy.

Don’t ignore security warnings! Not even when a site tells you to ignore them…

Additional problems?

Another problem with SSL is that it is an expensive solution, so it is disliked by many companies who host websites. It’s not the cost of the certificates, though. It’s the cost of hiring an expert on this matter and making sure they stay with the company! A minor issue is that these security specialists do have access to very sensitive company material, so you need to be sure you can trust the employee.

Of course, for very small companies and developers who host websites as a hobby, using SSL makes things a bit more expensive, as the costs are generally per domain or subdomain. So if you have three domains and five subdomains, then you need to purchase eight certificates! That can easily cost hundreds of euros per year. (You could use a Multi-Domain (SAN) certificate, but that will cost about €200 or more per year.)

Plus, there’s the risk that your CA does something stupid and becomes distrusted. That generally means they will have to leave the business and that the certificates you own are now worthless. Good luck trying to get a refund…

But another problem is that the whole Internet is slowly moving away from insecure connections (HTTP) to secure connections (HTTPS), forcing everyone to start using SSL. Which is a problem, as it is becoming a very profitable business, and more and more malicious people are trying to fool people into buying fake or useless certificates, or keep copies of the private keys so they can keep listening in on any communications done with those keys. This security business has become highly profitable!

So, alternatives?

I don’t know if there are better solutions. The problem is very simple: the Man in the Middle. And the biggest problem with a MITM is that it can intercept all communications, so you need something on your local system that you can already trust. Several CAs have already been proven untrustworthy, so whom do you trust? How do you make sure that you can communicate with my server without any problems?

There is the Domain Name System, but again, as the MITM is intercepting all your traffic, it can also listen in on DNS requests and provide false information. So if a public key were stored within the DNS system, a node could just fake it when you request it. The MITM would again succeed, as you would not be able to detect the difference.

Blockchain?

So maybe some kind of blockchain technology? Blockchains have proven reliable with Bitcoin, as the only reason people have lost bitcoins is that they were careless with the storage of their coins, not because the technique itself was hacked. And as a peer-to-peer system, you would not need a central authority. You just need to keep the blocks on your system updated at all times.

As we want the Internet to be decentralized, this technology would seem a good option. But if we want to move security into blockchains, then we might have to move the whole DNS system into a blockchain. This would be interesting, as all transactions in blockchains are basically permanent, so you wouldn’t have to pay a yearly fee to keep your domain registered.

But this is likely to fail, as Bitcoin has also shown. Some people have lost their bitcoins because their disks crashed and they forgot to make a copy of their coins. Also, the file size of the Bitcoin blockchain had grown to 100 GB of data in 2017, which is quite huge. The whole DNS system is much bigger than Bitcoin, so it would quickly run into various problems. Your browser would need to synchronize its data with the network, which would take longer and longer as the amount of data grows every day.

So, no. That’s not really a good option. Even though a decentralized system sounds good.

Conclusion?

So, SSL has flaws. Then again, every security system will have flaws. The main flaw in SSL is that you need to trust others. And that has already proven to be a problem. You have to be ultra-paranoid to want to avoid all risks, and few people are that paranoid.

Richard Stallman, for example, is a great software expert, yet he doesn’t use a mobile phone and avoids using the Internet directly. Mobile phones are “portable surveillance and tracking devices”, so he won’t use them. He also avoids key cards and other items that allow people to track wherever he goes. And he generally doesn’t access the Web directly, as this too would allow people to track what he’s doing. (He does use Tor, though.) And maybe he’s on to something. Maybe we are putting ourselves in danger with all this online stuff and the various devices that we have on our wrists, in our pockets and in our homes.

Thing is that there is no alternative to SSL at this moment, so being paranoid is useful to protect yourself. Everyone should be aware of the risks they take when they visit the Internet. This also depends on how important or wealthy you are, as poor, boring people are generally not interesting to malicious people. There isn’t much to gain from people with no money.

Still, people are too careless when they’re online. And SSL isn’t as secure as most people think, as past events have already proven…

 

The Binary Search problem

Many developers will have to learn all kinds of algorithms in their lives so they can write highly optimized code. Many of these algorithms have long histories and are well-tested. And one of them is the binary search method.

The binary search is a fast algorithm to find a record in a sorted list of records. Most people are familiar with it from guessing games: guess a value between 1 and 10, or 1 and 100. The principle is quite simple. You have an x amount of records and you pick the record in the middle of the list. For guessing between 1 and 100, you would pick 50 (100/2). If it is the correct value, you’ve won. If it is too high, you now know the value must be between 1 and 50, so you guess again with value 25. Too low, and you pick the value between 50 and 100, which would be 75. This way, you can guess any value between 1 and 100 in at most seven tries, since 2^7 = 128 > 100.
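
In C#, such a binary search over a sorted array looks something like this (note the midpoint calculation; it becomes important later on):

using System;

class BinarySearchDemo
{
    static int BinarySearch(int[] sortedValues, int target)
    {
        int floor = 0;
        int ceiling = sortedValues.Length - 1;
        while (floor <= ceiling)
        {
            int middle = (floor + ceiling) / 2;        // the midpoint "guess"
            if (sortedValues[middle] == target) return middle;
            if (sortedValues[middle] < target)
                floor = middle + 1;                    // guess was too low
            else
                ceiling = middle - 1;                  // guess was too high
        }
        return -1;                                     // not found
    }

    static void Main()
    {
        int[] values = { 2, 3, 5, 7, 11, 13, 17, 19 };
        Console.WriteLine(BinarySearch(values, 13));   // prints 5
    }
}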

Actually, the binary search is easiest explained as bit-wise checking of values. A single byte goes from 00000000 to 11111111, so basically all you do is a bitwise comparison from the highest bit to the lowest. You start with value 10000000 (128), and if the value you search for is higher or equal, you know that the first bit is 1, else it needs to be 0.

Your second guess would either be 11000000 (192) or 01000000 (64), and you would continue testing bits until all bits have been tested. However, your last test could also indicate that you guessed wrong, so the maximum number of guesses would be equal to the number of bits plus one.
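
Here is a sketch of that bit-toggling approach in C#, guessing a byte value through a compare callback that reports whether a guess is too high, too low or correct:

using System;

class BitwiseGuessDemo
{
    // compare(guess) returns 0 on a match, a negative number when the
    // guess is too low and a positive number when the guess is too high.
    static int GuessByte(Func<int, int> compare)
    {
        int guess = 0;
        for (int bit = 7; bit >= 0; bit--)
        {
            guess |= 1 << bit;           // tentatively set this bit (first guess: 10000000)
            if (compare(guess) > 0)
                guess &= ~(1 << bit);    // guess too high: this bit must be 0
        }
        return guess;
    }

    static void Main()
    {
        int secret = 100;
        Console.WriteLine(GuessByte(guess => guess.CompareTo(secret)));   // prints 100
    }
}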

And that’s basically what a binary search is. But in practice it tends to be slightly more complicated. You’re not comparing numbers from 0 to some maximum value; those numbers are generally an index into an array, and you compare the value at that position in the array. You basically have a value X, which could be any data type and even a multi-field record, and you have an array of records in which all data is sorted on some specific index. And these arrays can be reasonably large. Still, the binary search allows for a very quick search.

Now, the biggest problem with the binary search is how people calculate the index value for the comparison. I already said that you could basically check the bits from high to low, but most developers will use a formula like (floor+ceiling)/2, where floor is the lowest index value and ceiling the highest. This can cause an interesting problem in several programming languages, because there’s a risk of overflow when you do it like this!

So, overflow? Yes! If the index is an unsigned byte, then it can only hold 11111111 (255) as a maximum value. So as soon as you have a floor value of 10000000 (128) and a ceiling of at least 10000000 (128), the sum requires 9 bits. But bytes can’t contain 9 bits, so an overflow occurs. And what happens next is difficult to predict.

For a signed byte it would be worse, since the value 10000000 is -128, so you effectively have 7 bits to use. If the 8th bit is set, your index value becomes negative! This means that with a signed byte, your array can never be longer than 64 records, or this math will generate an overflow. (With 65 records, a floor of 64 plus a ceiling of 64 would be 128, which wraps around to -128 for signed bytes.)

Fortunately, most developers use integers as index, not bytes. They generally have arrays larger than 256 records anyway. So that reduces the risk of overflow. Still, integers use one bit for the sign and the other bits for the number. A 16-bit integer thus has 15 bits for the value. So an overflow can happen as soon as an index has the highest value bit set, meaning any index of 16384 and over. If your array has more than 16384 records, then the calculation (floor+ceiling)/2 will sometimes generate an overflow.

So, people solved this by changing the formula to floor+((ceiling-floor)/2), because ceiling-floor cannot cause an overflow. It does make the math slightly more complex, but this is the formula most people are familiar with for a binary search!

Yet this formula makes no sense if you want high performance! If you want a binary search, you should actually just toggle each bit of the index until you find the value. To do so, you need to know how many bits you need for the highest value. And that will also tell you how many guesses you will need, at most, to find the value. But this kind of bitwise math tends to be too complex for most people.

So, there is another solution. You can promote the index value to a bigger type. You could use a 32-bit value if the index is a 16-bit value. Thus, you could use (int16)(((int32)floor + (int32)ceiling) / 2) and the overflow is gone again. And for a 32-bit index, you could promote the math to a 64-bit integer type and again avoid any overflow.

It is still less optimal than just toggling bits but the math still looks easy and you can explain why you’re promoting the values.
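
A small C# demonstration of all three midpoint calculations with a 16-bit index (C# silently promotes short arithmetic to int, so the overflow is simulated with an explicit cast back to short):

using System;

class MidpointDemo
{
    static void Main()
    {
        short floor = 20000, ceiling = 30000;

        // Naive: the sum 50000 does not fit in a signed 16-bit value,
        // so forcing it back into a short wraps around to a negative number.
        short naive = (short)((short)(floor + ceiling) / 2);
        Console.WriteLine(naive);        // -7768 instead of 25000!

        // Fix 1: subtract first; ceiling - floor cannot overflow.
        short safe = (short)(floor + (ceiling - floor) / 2);
        Console.WriteLine(safe);         // 25000

        // Fix 2: promote the math to a wider type.
        short promoted = (short)(((int)floor + (int)ceiling) / 2);
        Console.WriteLine(promoted);     // 25000
    }
}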

But what if the index is a 64-bit value? There are almost no 128-bit integer types in most programming languages. So how do you avoid overflows in those languages?

Well, here’s another thing. As I said, the index value is part of an array. And this array is sorted and should not have any duplicate values. So if you have 200 records, you also need 200 unique values, with each value being at least 1 byte in size. If the index is a 16-bit signed integer (15 value bits), then the values in the array must also be at least 15 bits and would generally be longer. Most likely, the array would contain pointers to records elsewhere in memory, and pointers are generally 32 bits. (In the old MS-DOS era, addresses were 20 bits, so those systems could manage up to 1,048,576 bytes or 1 megabyte of memory.)

So, let’s do the math! For an overflow to occur with a signed 16-bit index, you need more than 16384 records. Each record would then be at least 2 bytes in size, so you would have at least 32 kilobytes of data to search through. Most likely even more, since the array is probably made up of pointers pointing to string values or whatever. But 32 KB would be the minimum for an overflow to occur with a 16-bit signed index.

A signed 32-bit index would need at least bit 30 set to 1 before an overflow can occur. The array would also need to contain 32-bit values to make sure every value is unique, so you would have 4 GB of data to search through. And yes, that is the minimum amount of data required before an overflow can occur. You would also need up to 31 comparisons to find the value you’re searching for, which is getting a bit high already.

A signed 64-bit index would have records of at least 8 bytes in size! This requires 36,893,488,147,419,103,232 bytes of data! That’s 33,554,432 terabytes! 32,768 petabytes! 32 exabytes! That’s a huge amount of data, twice the amount of data stored by Google! And you need more data than this to get an overflow. And basically, this assumes that you’re just storing 64-bit integer values in the array; in general, the data stored will be more complex.

So, chances of overflow with a 32-bit index are rare, and with a 64-bit index it would be very unlikely. The amount of data required would be huge. And once you’re dealing with that much data, you will have to consider alternative solutions instead.

The alternative solution would be hash tables. By using a hash function, you can reduce any value to e.g. a 16-bit value. This would be the index into an array of pointers, which at 16 bits would be 256 KB for the whole array. Each entry in this array could point to a second, sorted array of records, so you would have 65536 different sorted arrays, and within each of them you could use a binary search. This would be ideal for huge amounts of data, although things can be optimized further by calculating an even bigger hash value (e.g. 20 bits).

The use of a hash table is quite easy. You calculate the hash over the value you’re searching for and then check the list at that specific address in your hash table. If it is empty, then your value isn’t in the system. Otherwise, you search the list at that specific location. Especially when the hash formula distributes all possible values evenly, a hash table is extremely effective.
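
A rough C# sketch of such a setup: a 16-bit hash selects one of 65536 buckets, and each bucket is kept sorted so it can be searched with a binary search:

using System;
using System.Collections.Generic;

class HashBucketsDemo
{
    const int TableSize = 1 << 16;                     // 65536 buckets
    static List<string>[] table = new List<string>[TableSize];

    // Reduce any string to a 16-bit bucket index.
    static int Hash(string value)
    {
        return (value.GetHashCode() & 0x7FFFFFFF) % TableSize;
    }

    static void Add(string value)
    {
        int bucket = Hash(value);
        if (table[bucket] == null) table[bucket] = new List<string>();
        table[bucket].Add(value);
        table[bucket].Sort();                          // keep the bucket sorted (fine for a sketch)
    }

    static bool Contains(string value)
    {
        int bucket = Hash(value);
        if (table[bucket] == null) return false;       // empty bucket: value isn't stored
        return table[bucket].BinarySearch(value) >= 0; // binary search within the bucket
    }

    static void Main()
    {
        Add("apple"); Add("banana"); Add("cherry");
        Console.WriteLine(Contains("banana"));         // True
        Console.WriteLine(Contains("durian"));         // False
    }
}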

Which brings me to a clear point: the binary search isn’t really suitable for large amounts of data! First of all, your data needs to be sorted! And you need to maintain this sort order every time you add or delete records, or when you change the key value of a record! Hash tables are generally unsorted and have better performance, especially with large amounts of data.

So, people who fear overflows with a 32-bit index for a binary search are just bringing trouble upon themselves. When they start using floor+((ceiling-floor)/2) for their math, they’re showing that they don’t understand the algorithm that well. The extra math slows the algorithm down slightly, while the risk of overflow should not exist. If it does exist with a 32-bit index, then you’re already using the wrong algorithm to search for data: you’re maintaining an index of at least 4 GB in size, making it really difficult to insert new records. The time needed to sort that much data is also considerable and, again, far from optimal.

Thing is, developers often tend to use the wrong algorithms and often have the wrong fears. Whenever you use a specific algorithm, you will have to consider all options. Are you using the right algorithm for the problem? Are you at risk of overflows or underflows? How much data do you expect to handle? And what are the alternative options?

Finally, as I said, doing a binary search basically means toggling bits for the index. Instead of doing math to calculate the half value, you could instead just toggle bits from high to low. That way, you never even have a chance of overflows.

So, you want to be a software developer? Part 2.

Here’s part two about becoming a software developer. In the first part I told you about useful resources that will help you learn and once you’re working as a developer, those same resources will help you finish your projects.

By now, you should also have a few basic computer skills. Yet if you expect me to start talking about programming languages then no, not in this part. Because, as I said before, programming languages are not important. They are just tools, and a good developer can use any tools he likes, even though it sometimes takes a short moment to get accustomed to new tools.

To become a developer, you need to understand logic and you need to be able to visualize what your project will look like. And unfortunately, not everyone is born with these two abilities, although you can train to improve both.

Having a mathematical background will help, since Math requires good logic. But there are other ways to train yourself.

To train your logic I would advise you to learn to play Chess, Checkers or Go. Just simple board games that require you to think ahead and outsmart your opponent. Chess in particular is a good game, because you have various pieces, each with different values, that you have to deploy over the ‘battlefield’ of 8×8 squares to capture the King of your opponent. Shredder Chess is a good chess engine that you can play against online for free, or on your mobile phone or tablet. And the various pieces will force you to think more strategically.

And although a game like Stratego and other games also require strategic insight, the advantage of Chess is that the game starts with no hidden secrets for either player. Each player knows where the pieces of the opponent are and there’s no luck involved. Then again, if your opponent makes a mistake, you could consider that lucky, but it’s just a mistake by your opponent.

Learning Chess means that you have to think ahead in the game. Each move needs to be calculated. How will your opponent respond? And what will you do next? Good chess players can see a dozen or more moves ahead in the game, calculating which moves will win or lose. And to win, you must think further ahead than your opponent.

This is why training against a computer is a good way to improve your thinking. Chess engines use databases for the opening and the end game, which allow them to play strong, but in between they have to think just like humans, and they generally calculate just a few moves ahead. It should not be difficult to out-think the lower levels. And to improve, you should keep playing against higher and higher levels until you win about half the games you play.

Next to learning strategy, logic and thinking ahead, you need to learn to visualize. For this, the best thing to do is to read. And since programming requires a good knowledge of the English language, I would advise you to read a lot of English science fiction and fantasy literature. The stories of Conan the Barbarian, Tarzan or even the Dragonlance series are very good reading material. But if you don’t like this genre, read other stories where you can imagine how the main character is walking around in the environment set inside the book. Turn the words into a movie inside your head; visualize what is going on.

If that is a bit difficult, then read a book that has also been made into a movie. Movies like “The Hobbit” and “Lord of the Rings” were based on books written by Tolkien and make very good reading material. The movies follow the books reasonably closely too, so visualizing them becomes easier. Other books would be the series “A Song of Ice and Fire”, which is used for “Game of Thrones”, or “The Southern Vampire Mysteries”, which resulted in the series “True Blood”. The movies should help you visualize the worlds described inside the books.

Another way to learn how to visualize is through role-playing games, preferably tabletop games with dice. Dungeons & Dragons is such a game, but GURPS is my favorite. GURPS is less about superpowers and more adjustable to different worlds and settings. In GURPS, you can play a time-traveller who goes back to WWII with the mission to take down the Nazis in 1940 to prevent some disaster in the future. Or you are a time-travelling wildlife photographer sent to the Age of Dinosaurs, 65 million years ago, filming how these dinosaurs came to their end. Or maybe you’re just a Civil War-era cowboy fighting alien invaders. Then again, in GURPS you can also just be a dungeon crawler, bashing monsters and gathering loot.

Thing is, you need things that require you to visualize inside your head what the result will look like. This is important, since you will be working on various projects in the future and you will need to think ahead to know what each one will look like once it is done!

As a developer, you have to understand the whole process, starting with a good idea that you want to work out, to finishing it completely and providing it to those who can use it. If you’re a carpenter, you might decide to build some chairs and a table and visualize what they should look like while the wood is still part of a tree. Once you are done, it likely won’t look like what you visualized at the start, but that’s basically because you kept changing your mind during the process.

However, if you made this for a customer, then it should closely match what you agreed upon with this customer, since he expects you to deliver what you suggested to him. You might have made drawings and descriptions, so your customer will expect your product to match those. Then again, even clients can change their minds, and together you could agree on changing some of the details. But in general, you start each project by visualizing what you are going to make and by thinking ahead about all the steps needed to turn the idea into a final product.

Without visualization, the whole process of building your project will be awfully slow and full of challenges. (Developers don’t talk about problems! We call them challenges, since problems are things that need to be solved and solving problems is a challenge!)

So, learn to use logic and learn to visualize things.

So, you want to be a software developer? Part 1.

So, you want to be a software developer too? Great! So, where do you start?

I started with a mentor: my father. Someone who could teach me some of the more important principles, even though it was still early in the world of computers. I had to learn from books and magazines but, more importantly, I had to learn by practicing! And that’s how you should start too. You need to practice your computer skills first.

But nowadays you have one major advantage over me: you have the Internet, and there are thousands of mentors willing to help you with anything you need. So, before you learn to write programs, you need to know a few important resources.

The most important resource for all programmers is a simple search engine called Google. You probably know it already, and it is used by quite a few people, although some dislike Google and how powerful this company actually is. They might use Yahoo, Bing or even DuckDuckGo. That’s just fine, but in my own experience, Google tends to provide the most relevant answers to your questions.

Google does this by profiling. They keep a history of things you’ve searched for, and if you use Google Chrome as your web browser, they will most likely have a good idea of your interests. If you also have a GMail or Google Apps for Work account, then they will have an even better profile of you and your preferences. And yes, in a way that invades your privacy! So you have to be smart about how you use Chrome and Google!

One simple way to get around this is by using multiple accounts. See this link on how to add more than one profile to Chrome. Make one account for any development-related stuff, and make sure a Google account and mailbox are assigned to this profile. This is what you will use to let Google profile your development requests. Create a second account for anything that you want to keep more private, and use whatever you like there. This second account can avoid all Google products and can be used if you like to surf for porn or whatever else using Yahoo. Or to write on Facebook and read tweets on Twitter. But use at least two accounts in Chrome if you want to develop software!

While Google is profiling your development account, you will notice how all search results start to relate more and more to software development. This way, you take advantage of what many people consider an invasion of privacy. Just make sure that anything you do with the development account is development-related.

Now you have the most powerful online resource available to you, and as you continue to use it, it will learn more and more about the information you need. So, let’s focus on a few other resources.

An old resource is Experts Exchange, which has a bit of an unfortunate name. People tend to confuse it with Expert Sexchange, which happens when people forget the dash in the URL or put it in the wrong place. You don’t want to confuse those two! Anyway, EE has been a very useful resource for me for about 15 years now, although the knowledge-base of this site has become saturated. They also turned it into a paid website once and started serving advertisements, and they have been losing members, since new members had no free access to any answers, which annoyed many people. To get free access, you needed to earn a minimum number of points per month by answering the questions of other members. It’s a reasonable system, and they’re continuously searching for new ways to get extra revenue while continuing to expand their huge knowledge-base. Still, there is a slightly better resource available.

This alternative is Stack Overflow, where membership is free, although not ad-free. When you answer questions on this site, you can receive points which increase your rank. When your rank increases, you gain more access rights on the whole system and can even become a moderator. And like EE, this site already has a huge knowledge-base. And this is just one of the many Stack Exchange sites! The makers of SO decided to use the same web software to set up various other sites for other topics, and this received a lot of support, allowing them to keep maintaining the software and sites. With over 150 different communities and topics, this is one of the best resources online. But for developers, Stack Overflow should be enough.

Another good resource is Quora. But Quora is more generic and covers all kinds of topics. It is a good place to ask non-technical questions about software development, like what kind of chair you need, what the best setup for your monitor and mouse is, or even what to do when someone copies your source code without permission.

And of course there are many more resources online that you can use, but to start, download Google Chrome and set up that developer account. Then use the resources I’ve just mentioned and start searching for more resources that you think are useful. Add all these resources to your favorites and make sure your Google settings are set to synchronize with your account. Why? Because it allows you to share your favorites with your developer account on multiple computers! Or to synchronize between your computer, your laptop, your tablet with Chrome and your Android phone…

You may have noticed that I talk about what programmers have to do and I have said nothing about programming yet! This is because you will need to prepare first. As a developer, the first thing you need to know is how to find the information you need.

One thing to keep in mind is that programming languages aren’t as important as they seem. A programming language is just a tool, like a hammer or screwdriver. You use it to create a product. For developers, that product is a piece of software. And for people who will use it, it just doesn’t matter how you created it as long as it just works as it is supposed to. So relax! Programming isn’t really about learning programming languages but about making products with your typing hands and brains! Just like a carpenter who makes furniture…

My history as a software developer

In the following posts I will speak about the things you need to learn to become a professional software developer. But before I start that series, I will first talk about my own experiences.

I was born in 1966, so I’ve seen this world for about half a century already. And I’ve seen how computers developed from devices as large as a house to devices the size of my fingernails. I’ve seen how difficult it was to write code in the beginning, how standards became more popular and made things easier, and how people started to complain about standards and, as XKCD makes clear, “solved” the problem by making more standards…

[xkcd: “Standards”]

And this is still continuing. It is interesting to see how people just keep inventing new standards, new languages and new formats just because they dislike the old ones. It is also annoying for developers, because you will miss some job opportunities if you can’t use one or more of these standards.

But back to me! I was lucky, since my father happened to be a software developer back then, until the day he retired in 1994. He worked for a bank called the “RijksPostSpaarbank” or RPS. (You could translate it as “State Mail Savings Bank”.) This was later known as just the “Postbank” (mail bank) and is now known as ING. My dad was actually part of the “Postcheque- en Girodienst” (PCGD), which handled a lot of financial transactions.

The PCGD was one of the first fully automated giro services in the world, working with punched cards to automate a lot of stuff. Working with punched cards was quite fun back then, and my father once brought home a bag full of the punched-out paper to use as confetti for some party we were holding. That stuff was so nasty that when we moved out of our house in 1984, we could still find small pieces of this confetti in various locations. Still, fun stuff!

And yes, my interest in programming and computers was inherited from my father. He once worked on one of the earlier Apple systems to help an uncle of mine automate his bookkeeping, and when he didn’t use it, I was allowed to play games on it and experiment a bit.

When he later bought a programmable calculator, the TI-58, I was allowed to use it for school and quickly learned to write simple programs for it, within the 50 instructions the memory allowed. The first program I wrote myself was on this calculator and a piece of paper, for the quadratic formula! And yes, I needed it written down on paper, since this calculator would lose the contents of its memory once you turned it off, so I had to reprogram it very often!

Later, he bought a ZX-81 for me to learn more about computer programming. He used it too, for his bookkeeping, but found out it wasn’t as powerful as he’d hoped. So I ended up being the one to use it. Mostly for games, but also for programming purposes. And by this time I had been programming various things already, reading magazines to learn even more.

This was around 1982, and my school also decided to teach about computers, so I got a simple education in programming through school. Funnily enough, the school started with punched cards where we had to fill in the holes with a black pencil. They would then be sent to some university, where a machine would punch holes for all the black spots and run the code, returning a printed output of the results.

Compiling my code and getting the results back would often take two to three weeks. Which XKCD also documented:

[xkcd: “Compiling”]

Soon afterwards, we got the Sinclair QL, which was a bit more serious computer. Back in those days it was fast, and the two microdrives of 100 KB each made storage of applications a lot easier. By then, I was also studying at the “Higher Laboratory School” to become a lab assistant, simply because this school offered an extra ICT education, teaching me how to use Pascal on a Minix system. It was the most interesting part of a school that otherwise sucked, so after a year I quit and went to look for a job as programmer or whatever else in ICT. But since I’d learned Pascal, I had bought a Pascal compiler for the QL, which was quite rare back then.

I had one of the earlier PCs, in my case the Tulip System PC Advance/Extend with a hard disk of 20 MB, 640 KB of RAM and an EGA video card. I had an illegal copy of Turbo Pascal 3.0, which I used for writing my own programs, and I had a copy of NetHack, a fun game that I could play for hours.

Around 1987 I had the chance to get some AMBI (Dutch) modules. Part of this was learning to program in COBOL and a special assembly language called EXAT (Exam Assembly Language, specifically created for exams). I also worked as an intern for six months at IBM Netherlands as a COBOL developer, although my job didn’t involve as many programming tasks as I had hoped for.

During my internship, I was also given a first look at SQL and even got some kind of diploma indicating that I was good enough as an SQL developer, in a time when SQL was still mostly restricted to the “SELECT” statement.

I’ve upgraded to newer computers several times, working all kinds of jobs and trying to get a job as software developer, meaning that I had to extend my skills. I got an illegal copy of Turbo Pascal 5 too, but when Turbo Pascal 6 came out, I had the financial means to just purchase a license. Which I also did when Borland Pascal 7 arrived on the market. (It also supported Windows 3.11 development, which I was using back then.) And around 1994 there was the first Borland Delphi version, which I purchased and kept upgrading all the way until Delphi 2007.

The only Delphi version I bought after the 2007 edition was Delphi XE5. The problem was that Delphi was losing the battle against .NET and I needed to switch my skills. So in 2002 I started with the first Visual Studio compilers and later upgraded to the versions 2008, 2010, 2012, 2013 and now 2015. Why? Because since 2001, the .NET environment has been gaining control over the Windows market, and development jobs have shifted from desktop applications to web development.

Nowadays, the market shifts towards mobile development, embedded “smart” devices and the “Internet of Things”.

Around 1993, I also got my first Linux distribution, which was interesting. It was difficult to get software for it, since the Internet was still under development back then and it was hard to find good sources and good documentation. Still, I managed to run Linux from a floppy disk and use the console for some simple things. But I would rarely use it until I started using virtual machines around 2002 with VMware. Since then, I have created (and deleted) various virtual machines running some version of Linux. Also one or two versions of FreeBSD, and once even Solaris.

And 1993 was also the year when I started to really focus on other languages. It was when the Internet started to rise, and I had a dial-up connection through CompuServe. CompuServe was well-known back then, since they provided a free CD for Internet access with almost every computer magazine. A lot of people had huge stacks of CompuServe CDs at home, not knowing what to do with them, although they were great frisbees and also practical for scaring off birds in your garden. But I used one to subscribe and kept using CompuServe until 2005, even though I had moved to Chello in 1998. (And Chello later became UPC and is now called Ziggo.)

Anyway, I had learned some Forth and a few other languages, tried my best with Perl, got some early experience with HTML, and through Delphi I learned more about dBase IV and Paradox (both databases). I also started to focus on C and C++, which was important since the Windows kernel was built in C and exposed methods to call from Delphi using the C calling conventions and logic. The Windows API was well-documented for C developers, but I had to learn to convert this C code to Pascal code so I could use the same libraries in Delphi.

With my copy of Borland Pascal 7 also came a copy of Turbo Assembler, so I also focused on assembly. I wrote a mouse driver in assembly to use with Pascal on MS-DOS, and when I worked as a software developer in 1994 for a company called Duware B.V., I also used assembly to create a screen saver to use within the applications we created.

Those applications were created with Microsoft BASIC Professional Development System 7.1, which technically was the latest version of QuickBasic before Microsoft created Visual Basic. Since my employer wanted to move from MS-DOS to Windows, he was also looking for a good programming language for Windows. My suggestion of Delphi was ignored because that meant my boss would need to learn Pascal, which he did not want to do. We also looked at Gupta SQL Windows, which seemed promising, but when he hired a new employee who had PowerBuilder experience, he decided that we would move to PowerBuilder instead! This language was similar to the BASIC he knew and seemed a bit promising.

Still, when it took two days for two of his employees to make an animated button on a form, and he allowed them to waste that much time on an unimportant feature, I realised that this job wasn’t very promising. For my boss, BASIC was like a golden hammer. And in my experience, you need to stay far away from people who use golden hammers, since they think they’re always right and always have the right tool. What matters to them is that the problem must fit the tool; otherwise the project needs to be changed to match.

Real developers realise that it’s the opposite! A tool must match the project; otherwise you have to pick a different tool. What matters is that you have a large toolkit of programming languages and various techniques, APIs and frameworks that you’re familiar with, so you can pick the right tool for each project.

And while I’ve worked as a Delphi developer for almost two decades, I have always focused on other languages too. I’ve done some projects partly in assembly to speed up certain processes. I’ve worked on C projects that needed to compile on various mainframes and Unix systems, so I could only use the standard libraries. I’ve worked with techniques like ActiveX, COM, DCOM and COM+. I’ve created web pages in PHP that were served from a Delphi server application. I’ve written code in C++ whenever that was required. And since 2001 I have also focused on .NET, specifically C# and ASP.NET, for web development and web services. I’ve used Python, Perl and JavaScript, and I’ve specialized in XML with style sheets and XML schemas. I even worked with ASN.1 for a project where I had to communicate with an external device that used the BER encoding standard.

And these days, my main focus is on Visual Studio 2015 with C# and C++, Clang, JavaScript and jQuery. I’m also learning more about electronics, writing C programs and libraries to use with an ATtiny85 and other Atmel microcontrollers, to make my own hardware and to communicate with these self-made devices from e.g. my web server.

As a developer, it is a good thing to experiment with various electronic devices and micro controllers to hone your skills. It provides a better insight in hardware and how to communicate between devices. You will often have to consider techniques like WiFi, Bluetooth or Infrared communications and come up with proper protocols to send information between devices.

All in all, I have varied experience with lots of hardware and software; I can manage my own web servers and am experienced with various operating systems like Windows, Linux, BSD and iOS. I am now focusing on embedded devices and Android/iOS development, but I still keep all my skills up to date, including my Delphi knowledge next to C#. I need various tools in my toolbox, which is important for each and every software developer in this world.

And no, I don’t think that language X is better than language Y. Good developers care as much about programming languages as expert carpenters do about their hammers and screwdrivers. Because it is not the tool that matters, but the final product that you’re building!

The need of security, part 3 of 3.


Do we really need to hash data? And how do we use those hashed results? That is the current topic.

Hashing is a popular method for generating a key for a piece of data. This key can be used to check if the data is unmodified and thus still valid. It is often used as an index for data, but also as a way to store passwords. The hashed value isn’t unique in general, though. It is often just a large number within a specific range of values. If this range happens to be 0 to 9, it would basically mean that any data will result in one of 10 identifier values, so if we store 11 pieces of data as hashes, there will always be two pieces of data that generate the same hash value. And that’s called a collision.

There are various hashing algorithms that were created to have a large numerical range to avoid collisions. Chances of collisions are much bigger in smaller ranges. Many algorithms have also been created to generate a more even distribution of hash values, which further reduces the chance of collisions.

So, let’s have a simple example. I will hash a positive number into a value between 0 and 9 by adding all its digits to get a smaller number. I will repeat this for as long as the resulting number is larger than 9. So the value 654321 becomes 6+5+4+3+2+1, or 21. That becomes 2+1, thus the hash value would be 3. Unfortunately, this algorithm won’t divide all possible hash values equally. The value 0 will only occur when the original value is 0. Fortunately, the other numbers will be divided equally, as can be proven by the following piece of code:


using System;
 
namespace SimpleHash
{
    class Program
    {
        // Repeatedly add the digits of a value until a single digit (0-9) remains.
        static int Hash(int value)
        {
            int result = 0;
            while (value > 0)
            {
                result += value % 10;   // add the lowest digit
                value /= 10;            // drop the lowest digit
            }
            if (result >= 10) result = Hash(result);   // more than one digit left: hash again
            return result;
        }
 
        static void Main(string[] args)
        {
            // Count how often each of the 10 possible hash values occurs
            // for the numbers 0 up to a million.
            int[] index = new int[10];
            for (int i = 0; i < 1000000; i++) { index[Hash(i)]++; }
            for (int i = 0; i < 10; i++) { Console.WriteLine("{0}: {1}", i, index[i]); }
            Console.ReadKey();
        }
    }
}

Well, it only proves it for the values up to a million, but it shows that 999,999 of the numbers will result in a value between 1 and 9 and only one in a value of 0, resulting in exactly one million values spread over 10 hash values.

As you can imagine, I use a hash to divide a large group of numbers into 10 smaller groups. This is practical when you need to search for data, and even more practical with a bigger hash range. Imagine having 20 million unsorted records and a hash value between 1 and 100,000. Normally, you would have to look through 20 million records, but if they’re indexed by hash value, you just calculate the hash for a piece of data and would only have to compare about 200 records. That increases the performance, but at the cost of maintaining an index which you need to build. And the data needs to be an exact match, else the hash value will be different and you would not find it.

But we’re focusing on security now, and the fact that you need an exact match makes hashing a perfect way to check a password. Because you want to limit the amount of sensitive data you keep, you should not store any passwords. If a user forgets a password, it can be reset, but you should not be able to just tell them their current password. That’s a secret only the user should know.

Thus, by storing only a hash, you can still make sure the user provides the right password. But there is a risk of collisions, so passwords like “Wodan5tr1ke$Again” and “123456” might actually result in the same hash value! The user thinks his password is secure, yet a password that almost everyone seems to have used would unlock the same treasures. That would be bad, so you need two things to prevent this.

First of all, the hash algorithm needs to provide a huge range of possible values. The more, the better. If the result happens to be a 256-bit value, that would be great, and bigger numbers are even more welcome. The related math will be more complex, but hashing algorithms don’t need to be fast anyway. Fast algorithms actually speed up brute-force attacks, so for password hashing, slower algorithms are better. The user can wait a second or two. But for a hacker, two seconds per attempt means he’ll spend weeks, months or longer just to try a million possible passwords through brute force.
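
PBKDF2, available in .NET as Rfc2898DeriveBytes, is one example of such a deliberately slow algorithm: an iteration count repeats the internal work many times, which barely bothers a single login but cripples brute force. A rough sketch, where the iteration count of 600,000 is just an illustrative choice:

using System;
using System.Security.Cryptography;

namespace SlowHashDemo
{
    class Program
    {
        // Derives a 256-bit hash from a password; the iteration count
        // makes each attempt deliberately expensive.
        static byte[] HashPassword(string password, byte[] salt)
        {
            using var pbkdf2 = new Rfc2898DeriveBytes(
                password, salt, 600000, HashAlgorithmName.SHA256);
            return pbkdf2.GetBytes(32);
        }

        static void Main()
        {
            // A random salt, stored next to the hash, makes identical
            // passwords produce different hashes.
            byte[] salt = new byte[16];
            using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(salt);

            byte[] hash = HashPassword("Wodan5tr1ke$Again", salt);
            Console.WriteLine(Convert.ToBase64String(hash));
        }
    }
}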

Second, it is a good idea to filter out all simple, easy-to-guess passwords and to use a minimum length requirement together with a complexity requirement, like requiring upper and lower case letters together with a digit and a special character. And don’t just apply these checks when users pick a password: perform them every time a password is entered, before you even compare the hash value. This way, even if a simple password collides with one of the more complex ones, it will still be denied since it doesn’t match the requirements.

Use regular expressions, if possible, to check whether a password matches all your requirements, and please allow users to enter special characters and long passwords. I’ve seen too many sites which block the use of special characters and only use the first 6 characters for whatever reason, making their security a lot weaker. (And they also tend to store passwords in plain text, to add insult to injury!)
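
As a sketch, a single regular expression can enforce such rules. The exact requirements below (at least 10 characters, with an upper case letter, a lower case letter, a digit and a special character) are just an example policy:

using System;
using System.Text.RegularExpressions;

namespace PasswordRules
{
    class Program
    {
        // Four lookaheads check the character classes; .{10,} checks the length.
        static bool MeetsRequirements(string password) =>
            Regex.IsMatch(password,
                @"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9]).{10,}$");

        static void Main()
        {
            Console.WriteLine(MeetsRequirements("123456"));            // False
            Console.WriteLine(MeetsRequirements("Wodan5tr1ke$Again")); // True
        }
    }
}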

Security is a serious business and you should never store more sensitive data than needed. Passwords should never be stored anyway. Only hashes.

If you want an even stronger password check, concatenate the user name to the password. Convert the user name to upper case (or lower case) first, so the user name is case-insensitive. Don’t do the same with the password, though! The result is that the user name and password together produce the hash value, so even if multiple people use the same password, they will still have different hashes.

Why is this important? Because some passwords happen to be very common, and if a hacker knows one such password, he could look in the database for matching hashes and would immediately know the passwords for those accounts too! By adding the user name, the hash will be different for every user, even if they all use the same password. This trick is often forgotten, yet it is simple enough to make your security a lot stronger.
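
A minimal sketch of the idea; SHA-256 keeps the example short, but in practice you would feed the combined string into a slow password hash as described above:

using System;
using System.Security.Cryptography;
using System.Text;

namespace UserHashDemo
{
    class Program
    {
        // The upper-cased user name is prepended to the password, so the
        // same password still yields a different hash for every user.
        static string HashCredentials(string userName, string password)
        {
            byte[] input = Encoding.UTF8.GetBytes(
                userName.ToUpperInvariant() + password);
            using var sha = SHA256.Create();
            return Convert.ToBase64String(sha.ComputeHash(input));
        }

        static void Main()
        {
            Console.WriteLine(HashCredentials("Alice", "123456"));
            Console.WriteLine(HashCredentials("Bob", "123456")); // different hash
        }
    }
}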

You can also include the timestamp of when the user registered their account, their gender or other fixed data that won’t change after the account is created. And if you allow users to change their account name, you will have to require them to provide their password too, so you can calculate the new hash value.

The need for security, part 2 of 3.

[Image: Azra Yilmaz Poses II]

What is encryption and what do we need to encrypt? Those are important questions that I hope to answer now.

Encryption is a way to protect sensitive data by making it harder to read. It has to prevent people from looking at the data and immediately recognizing it. Encryption is thus a very practical solution to hide data from plain view, but it doesn’t stop machines from using a few extra steps to read your data again.

Encryption can be very simple. There’s the Caesar cipher, which basically shifts letters in the alphabet. In a time when most people were illiterate, this was actually a good solution. But nowadays, many people can decipher these texts without a lot of trouble, and some can do it in their heads without making notes. Still, some people like to use ROT13 as a very simple encryption solution, even though it’s practically the same as having no encryption at all. Combined with other encryption or hashing methods, it could still make encrypted messages harder to read, because the input for the more complex encryption method already has a simple layer of encryption.
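
For the curious, ROT13 fits in a few lines, which also shows why it offers no real protection; applying it twice restores the original text:

using System;
using System.Linq;

namespace Rot13Demo
{
    class Program
    {
        // Shift letters 13 places and wrap around; other characters pass through.
        static string Rot13(string text) =>
            new string(text.Select(c =>
                char.IsUpper(c) ? (char)('A' + (c - 'A' + 13) % 26) :
                char.IsLower(c) ? (char)('a' + (c - 'a' + 13) % 26) : c).ToArray());

        static void Main()
        {
            string secret = Rot13("Attack at dawn");
            Console.WriteLine(secret);        // Nggnpx ng qnja
            Console.WriteLine(Rot13(secret)); // Attack at dawn
        }
    }
}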

Encryption generally comes with a key. And while ROT13 and Caesar’s cipher don’t seem to have one, you can still construct one by creating a table that tells how each character gets translated. Then again, even the mathematical formula can be considered a key.

Having a single key allows secret communication between two or more persons and thus keeps data secure. Every person receives the key and can use it to decrypt any incoming messages. These are called symmetric-key algorithms; they basically allow communication between multiple parties, where each member is able to read all messages.

The biggest problem of using a single key is that it might fall into the wrong hands, allowing more people access to the data than originally intended. That makes the use of a single key more dangerous in the long run, but it is still practical for shorter sessions between multiple groups, as long as each member has secure access to the proper key. And the key needs to be replaced often.

A single key could be used by chat applications where several people join the chat. They would all retrieve the key from a central environment and thus be able to read all messages. But you should not store that information for a long time.

A single key can also be used to store sensitive data in a database, since you would only need that one key to read the data.
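
As a sketch, .NET’s Aes class shows the single-key principle: the same key (and IV) that encrypts the data also decrypts it again:

using System;
using System.Security.Cryptography;
using System.Text;

namespace SymmetricDemo
{
    class Program
    {
        static void Main()
        {
            // Aes.Create() generates a random key and IV; in a real system
            // you would store these securely and reuse them to read the data.
            using var aes = Aes.Create();

            byte[] secret = Encoding.UTF8.GetBytes("very sensitive data");

            using var encryptor = aes.CreateEncryptor();
            byte[] encrypted = encryptor.TransformFinalBlock(secret, 0, secret.Length);

            using var decryptor = aes.CreateDecryptor();
            byte[] decrypted = decryptor.TransformFinalBlock(encrypted, 0, encrypted.Length);

            Console.WriteLine(Encoding.UTF8.GetString(decrypted)); // very sensitive data
        }
    }
}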

A more popular solution is an asymmetric-key or public-key algorithm. Here, you have two keys: you keep the private (master) key and give others the public key. The advantage of this system is that you can encrypt data with either of the two keys, but you can’t use the same key to reverse that action again. This makes it very useful for sending data in a single direction: the private key encrypts data and you would need the public key to decrypt it, or the public key encrypts data and you would need the private key to decrypt it.

Using two keys thus limits communication to a central hub and a group of people. Everything needs to be sent to the central hub, and from there it can be broadcast to the others. For a chat application this is less useful, since it means the central hub has to do more work: it needs to continuously decrypt and encrypt data, even if it doesn’t need to know the content of this data.

For things like email and secure web pages, two keys are practical, though. The mail or web server gives the public key to anyone who wants to connect, so they can encrypt sensitive data before sending it to the server, and only the server can read it by using the private key. The server can then use the private key to encrypt new data and send it to the visitor, who uses the public key to decrypt the message again. Thus, you have secure communication between two parties.
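
A small sketch with .NET’s RSA class shows that one-direction principle: the visitor encrypts with the server’s public key, and only the server’s private key can decrypt it:

using System;
using System.Security.Cryptography;
using System.Text;

namespace AsymmetricDemo
{
    class Program
    {
        static void Main()
        {
            // The server generates a key pair and hands out only the public part.
            using var server = RSA.Create(2048);
            using var visitor = RSA.Create();
            visitor.ImportRSAPublicKey(server.ExportRSAPublicKey(), out _);

            // The visitor encrypts with the public key...
            byte[] message = visitor.Encrypt(
                Encoding.UTF8.GetBytes("my password"),
                RSAEncryptionPadding.OaepSHA256);

            // ...and only the server's private key can decrypt it.
            byte[] plain = server.Decrypt(message, RSAEncryptionPadding.OaepSHA256);
            Console.WriteLine(Encoding.UTF8.GetString(plain)); // my password
        }
    }
}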

Both methods have some very secure algorithms, but also some drawbacks. Using a single key is risky if that key falls into the wrong hands. One way to solve this is by sending the single key to the other side using a two-key algorithm! That way, it is transferred securely: you encrypt the single key with the recipient’s public key, so only the recipient can use the matching private key to read the single key you’ve sent.
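
That hybrid trick is small enough to sketch too: wrap a freshly generated AES key with the recipient’s public RSA key, after which both sides share the fast symmetric key:

using System;
using System.Security.Cryptography;

namespace HybridDemo
{
    class Program
    {
        static void Main()
        {
            using var recipient = RSA.Create(2048); // recipient's key pair
            using var aes = Aes.Create();           // the single (session) key

            // Sender: wrap the session key with the recipient's public key.
            byte[] wrapped = recipient.Encrypt(aes.Key, RSAEncryptionPadding.OaepSHA256);

            // Recipient: unwrap with the private key; both sides now share it.
            byte[] sessionKey = recipient.Decrypt(wrapped, RSAEncryptionPadding.OaepSHA256);

            Console.WriteLine(Convert.ToBase64String(sessionKey) ==
                              Convert.ToBase64String(aes.Key)); // True
        }
    }
}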

A single key is also useful when encrypting files and data inside databases, since it requires just one key for both actions. Again, you would need to store that key in a secure way, which would again use a two-key algorithm. You would use a private key to encrypt the single key and include a public key in your application to decrypt this data again. You would use that public key inside your applications only, but it would allow a single public key in multiple applications to access the same data.

As I said, you need to limit access to data as much as possible. This generally means that you will be using several different keys for various purposes. Many different encryption algorithms are already in use right now, but most developers don’t even know if the algorithm they use is symmetric or asymmetric, or maybe even a combination of both.

Algorithms like AES, Blowfish and RC4 use a single key, while systems like SSH, PGP and TLS use two-key algorithms, typically to exchange a symmetric session key. Single-key algorithms are often used for long-term storage of data, where the key itself gets additional security to avoid easy access to it. Two-key algorithms are often used for message systems, broadcasts and other communication that is meant to go in a single direction. You don’t want an application to store both a private key and the matching public key, because it makes encryption a bit more complex and would provide a hacker a way to get the complete pair.

And as I said, a single key allows easier communication between multiple participants without the need for a central hub to translate all messages. All the hub needs to do is create a symmetric key and provide it to all participants, so they can communicate with each other without even bothering the central hub. And once the key is deleted, no one will be able to read this data anymore, destroying almost all traces of it.

So, what solution would be best for your project? Well, for communication you have to decide whether you use a central hub or not. The central hub can archive everything if it stays involved in all communication, but you might not always want this. If you can provide a single key to all participants, the hub won’t be needed afterwards.

For communication in one single direction, a two-key algorithm is better, though. Both sides send their public key to the other side and use the received public key to encrypt messages, which can then only be decrypted with the private key that only the other party has. It does mean that you actually have four keys: two private keys and two public keys. But it happens to be very secure.

For data storage, a single key is generally more practical, since applications need this key to read the data. But this single key should be considered sensitive, thus you need to encrypt it with a private key and use a public key as part of your application to decrypt the original key again.

In general, you should use encryption whenever you need to store sensitive data in a way that you can also retrieve it again. This is true for most data, but not always.

In the next part, I will explain hashing and why we use it.