The need for security, part 1 of 3.


Of all the things developers have to handle, security tends to be one of the most important. Yet no one really likes security; we would rather live in a society where you can leave your home with the front door wide open. We generally don’t want to deal with security because it’s a nuisance!

The reality? We lock our doors, afraid that someone will get inside and steal things. Or worse, wait for us to return and kill us. We need security to protect ourselves, since we live in a world where a few people have very bad intentions. And we hate it, because security costs money: someone has to pay for the lock. It also takes time to use, because locking and unlocking a door is still an extra action you have to take.

And when you’re developing software, you have the same problem! Security costs money and slows things down a bit. It’s also hard to explain to a client why they have to pay for security, and why it has to cost so much. Clients want the cheapest locks, yet expect their stuff to be as safe as Fort Knox, or better.

The worst part of all security measures is that none of them can keep everyone out. A lock on your door won’t help if you leave the window open. And if the window is locked, it’s still glass that can be broken. The door can be kicked in too. There are always plenty of ways for the Bad People to get inside, so what use is security anyway?

Well, the answer is simple: to slow down any would-be attacker so they can be detected and dealt with, and to make the break-in more expensive than the value of the loot stored inside. The latter means that the more valuable the loot, the stronger your security needs to be. Fort Knox contains very valuable materials, so it has a very strong security system with cameras, lots of armed guards and extremely thick walls.

So, how does this all translate to software? Simple: the data is the loot that people are trying to get at. Legally, data isn’t property and often doesn’t have much legal protection of its own, so in the strict sense it can’t even be stolen. However, data can be copyrighted, it can contain personal information about people, or it can be a secret that should not be exposed to the outside world. Examples of these three would be digital artwork, your name and bank account number, and the formula for a deadly poison that can be made from basic household items.

Of all this data, copyrighted material is the most common thing to protect, and that protection is made harder because the material is meant to be distributed. The movie and music industry has a very hard time protecting its copyrighted material, and the same applies to photographers, other graphic artists and software developers. The main problem is that you want to distribute a product in return for payment, and people are getting it without paying you. You could consider this lost profit, although if people had no option but to pay for your product, they might not have wanted it in the first place. So the loss is hard to prove.

To protect this kind of material, you generally need an application that can handle the data you’re publishing. For software this is relatively easy: you include additional code in your project that checks whether the software has been legally installed. Often this involves a serial number and additional license information, and nowadays it tends to include calling a special web server to check whether the license is still valid.
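As a minimal sketch of that last step, assuming a hypothetical licensing endpoint (real products add request signing, offline grace periods and retries), such a check could look like this in C#:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class LicenseChecker
    {
        private static readonly HttpClient client = new HttpClient();

        // Asks the (hypothetical) license server whether this serial number is still valid.
        public static async Task<bool> IsLicenseValidAsync(string serialNumber)
        {
            HttpResponseMessage response = await client.GetAsync(
                "https://licensing.example.com/check?serial=" + Uri.EscapeDataString(serialNumber));
            return response.IsSuccessStatusCode; // 200 means the license is still valid
        }
    }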

For music and films, you can use a technique called DRM, which works together with compatible media players to run additional checks on whether a media copy comes from a legal source. But it limits the use of your media to players that support your DRM method. And to get media players to support your DRM method, you have to publish that method and hope it’s secure enough. DRM has already been bypassed by hackers many times, so it has proven less effective than people hoped.

And then there’s a simpler option: add a copyright notice to the media. This is the main solution for artwork anyway, since there’s no DRM for plain graphic images. You could make the image part of an executable, but then you would have to build your own picture viewer and users wouldn’t be able to use the image. Not many people want to just look at images, unless it is pornography. So you have to support the basic image file formats, which are generally .JPG or .PNG for any image on the Internet, or .GIF for animations. And you protect them by adding a warning in the form of a copyright notice. Thus, if someone misuses your artwork and you discover the use of your art without a proper license, you can start legal action against the violator and claim damages. That starts with sending a bill and, if they don’t pay, going to court and having a judge force them to pay.

But media like films, music and images tend to be hard to protect, and protecting your intellectual property often means going to court. And you won’t always win such cases either.

Next on the list is sensitive, personal information. Things like usernames and passwords, for example. One important rule to remember: usernames should always be encrypted and passwords should always be hashed. These are two different techniques to protect data, and I will explain both in the next parts.
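To show the difference already, here is a minimal sketch of password hashing in C#, using the PBKDF2 implementation that ships with .NET; the iteration count and key size are illustrative values, not a recommendation:

    using System.Security.Cryptography;

    public static class PasswordHasher
    {
        // Hashing is one-way: you can only verify a password, never recover it.
        public static byte[] HashPassword(string password, byte[] salt)
        {
            using (var pbkdf2 = new Rfc2898DeriveBytes(password, salt, 10000))
            {
                return pbkdf2.GetBytes(32); // 256-bit derived key
            }
        }

        public static bool Verify(string attempt, byte[] salt, byte[] storedHash)
        {
            byte[] hash = HashPassword(attempt, salt);
            if (hash.Length != storedHash.Length) return false;
            int diff = 0;
            for (int i = 0; i < hash.Length; i++)
                diff |= hash[i] ^ storedHash[i]; // constant-time comparison
            return diff == 0;
        }
    }

Encryption, on the other hand, is reversible, which is why it fits usernames: you still need to read them back. More on that in the next parts.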

But there is more sensitive data that might need to be stored and that would be valuable. An email address can be misused to spam people, so it needs to be encrypted. Names, addresses and phone numbers can be used to look people up and harass them: by ordering stuff all over the Internet and having it sent to their address, or by filing fake change-of-address requests so they no longer receive mail or other services. Or even by visiting the address, waiting until the residents leave the house and then breaking in. And it has happened in the past that the address of a young child leaked to a child molester, who went to visit the child to rape or kidnap them. So this information is also sensitive and needs to be encrypted.

Other information, such as bank account details, medical data and employment history, is also sensitive enough to encrypt. Order information from visitors might be sensitive too if the items were expensive, since those items become interesting things to steal. You should basically evaluate every piece of information to determine whether it needs to be encrypted. When in doubt, encrypt it, just to be safe.

Do keep in mind that you can often generate all kinds of reports from this personal data: a simple address list of all your customers, for example, or the complete medical file of a patient. These documents are sensitive too and need to be protected, but they’re also just basic media, like films and artwork, so copies of those reports are hard to protect and often not covered by copyright. So be very careful with report generators, and have reports contain warnings about how sensitive the data in them actually is. It’s also useful to include a cover page as the first page of a report, in case people print it. The cover page hides the content as long as the report is kept closed. It’s not much protection, but every bit helps, and a cover page at least stops passers-by from casually reading the top page.

Personal information is generally protected by privacy laws, so misuse of personal information is often a criminal offense. This is unlike copyright violations, which are generally just civil offenses. But if you happen to be the source of leaked personal information, you and your company could be held liable and will probably be forced to pay damages, and sometimes a large fine in cases of clear negligence in protecting this data.

The last category of sensitive data is ideas, trade secrets and the like. In general, these live in ordinary media files, like reports, and are thus hard to protect, although there are systems that can store specific data as personal data so you can limit access to it. Ideas themselves are usually not copyrightable: you can’t get copyright on an idea, only on the document that explains it, and anyone who hears about your idea can simply use it. So if you find a solution for cleaner energy, anyone else could build your idea into something working and profit from it without paying you any compensation. They don’t even have to say it was your idea!

Still, to protect an idea you can use a patent, which you have to register in many countries separately if you want to protect it everywhere. Patents become open to the public, so everyone will know about your idea and be able to use it, but they will have to compensate you for doing so. And you can basically set any price you like. This system also tends to be abused by patent trolls: they describe very generic ideas and then go after anyone who seems to use something similar. They often claim damages just below what it would cost the accused to fight back in court, so they tend to get paid for this trolling. This is why many people are calling for patent reform to stop patent trolls from abusing the system.

So, ideas are very sensitive. You generally don’t want to share them with the general public, since that would allow others to implement them. Patents are expensive and not always easy to enforce, and you can’t patent everything anyway; some applications will be refused because the idea has already been patented before. Yet you still need to share your idea with others to build it into a product. For this, you use a non-disclosure agreement, or NDA.

An NDA is basically a contract that lets you share your idea with others while forbidding them to share it further without your permission. And if your idea does get leaked, the others have to compensate your financial losses from the leak, as specified in the NDA. It’s not very secure, but it generally does discourage people from leaking your ideas.

Well, except for possible whistleblowers, who might leak information about any illegal or immoral parts of your idea. For example, if your idea happens to be blowing up the subway in Amsterdam and you have an NDA with a few fellow terrorists, it becomes a problem when one of them just walks to the police to report you and your helpers. An NDA is just a contract and can be invalidated for many reasons, the criminal actions related to it being the most obvious one.

There are also so-called blacklists of things you can’t enforce in an NDA, depending on the country where you live. It is just a contract, and thus handled by the civil courts, and if the NDA violates the rights of those who sign it, it can be ruled invalid. One such right would be free speech: you can’t ban people from even discussing whether your idea is any good.

Other sensitive information would be things like instructions for making explosives, or business information about the future plans of Intel that could influence the stock market. Some of this information could get you into deep trouble if it leaks, up to civil or criminal court, fines and possibly imprisonment.

In general, sensitive information isn’t meant to be shared with many people, so you should seriously limit access to it. It should not be printed, and it should not be emailed either. The most secure location for this information would be a computer with no Internet connection at all, but a strong firewall that blocks most access methods is good enough for many purposes.

So we have media, which is hard to protect because it is meant to be published. We have sensitive data, which should not fall into the wrong hands for various reasons. And we have personal data, which is basically a special case of sensitive data that relates to people and thus has additional legal protection.

And the way to secure all of it is by posting warnings and limiting access to the data, which is difficult when it was meant to be published. But for the data we want to keep private, we have two ways of protecting it besides limiting physical access.

To keep things private, you need user accounts with passwords or other security keys to lock the data and limit access to it. And those user accounts are themselves sensitive data, so the protection should start right there.

Of all the things software developers do, security tends to be one of the most complex and expensive parts, since it provides no direct return on the investment made. All it does is provide assurance that data will only be available to those who are meant to use it.

The two ways to protect data are encryption and hashing: two similar-looking techniques that differ in purpose. I will discuss both in my next posts.

An example of bad development…

I recently received an email from a company that runs questionnaires. I had subscribed and done some of their questionnaires before, so I wanted to do this new one too. Unfortunately, the page loaded quite slowly, only to return a very nasty error message. A message that tells me this organisation has amateurs for developers and administrators.

Let me be clear about one thing: errors will happen. Every developer should expect weird things to happen. But this case is not just an error; it’s evidence of amateur work. So, let’s start analyzing the message…

Server Error in ‘/’ Application.

Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

So, what’s wrong with this? Users should never see these messages! When you develop in ASP.NET, you can tell the system to show detailed error messages only when the user is connected locally. A remote user should see a much simpler message.
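In classic ASP.NET this is a single setting in web.config; a minimal example (the friendly Error.aspx page is whatever the site provides):

    <!-- Show detailed errors only to local users; everyone else gets a friendly page. -->
    <system.web>
      <customErrors mode="RemoteOnly" defaultRedirect="~/Error.aspx" />
    </system.web>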

This is something the administrator of the website should have known, and checked. He did not. By failing at this simple configuration setting, the organisation is leaking sensitive information about its website. Information that’s enough to convince me they’re amateurs.

This is also a quite common error message. Basically, it tells me the system has too many database connections open. One common cause is code that fails to close connections after opening them. Keep that in mind, because I will show that this is exactly what caused the error…

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

This is a standard follow-up message. The fact that users of the site would see this stack trace too is just bad.

Exception Details: System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

A timeout error. A reference to the connection pool and the max pool size. This already indicates that more connections are being opened than closed and the system can’t handle that correctly. There are .NET frameworks better suited to database access that prevent exactly these kinds of errors, which have been very common in ASP.NET applications and in generic database applications written in .NET.

Basically, the top of the error message just repeats itself. Blame Microsoft for that, since this is the generic message from ASP.NET itself. Developers can change the way it looks, but that’s not very common. Actually, developers should prevent users from seeing these kinds of messages to begin with. Preferably, the error should be caught by an exception handler, which writes it to a log file or database and sends an alert to the administrator.
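A minimal sketch of such a catch-all handler in Global.asax.cs; the alert mechanism is left as a comment, because that part depends on what the site uses, and Error.aspx is the same hypothetical friendly page as above:

    using System;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Application_Error(object sender, EventArgs e)
        {
            Exception ex = Server.GetLastError();
            System.Diagnostics.Trace.TraceError(ex.ToString()); // write to the configured log
            // ...send an alert to the administrator here (mail, monitoring hook)...
            Server.ClearError();
            Response.Redirect("~/Error.aspx"); // show the user a friendly page instead
        }
    }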

Considering that I received this error on a Friday afternoon, I bet the developer and administrators are already back home, watching television like I do now. Law & Order is just on…

Source Error:

Line 1578:
Line 1579:        cmSQL = New SqlCommand(strSQL, cnSQLconfig)
Line 1580:        cnSQLconfig.Open()
Line 1581:
Line 1582:        Try

This is interesting… The use of SqlCommand is a bit old-fashioned. Modern developers would have switched to e.g. the Entity Framework or another, more modern solution for database access. But the developers of this site are connecting to the database directly in code, probably to execute a query, collect the data and then, hopefully, close the connection again. They are clearly using plain ADO.NET, and I can’t help but wonder why. They could have used more modern techniques. But probably they just have to maintain an existing site and aren’t allowed to use more modern solutions.

But it seems to me that closing the connection is not going to happen here. There are already too many connections open, which is why this red line of code fails: the code creates a command on a connection called cnSQLconfig and then tries to open that connection, but the pool has run dry. Unfortunately, that Open call happens outside the Try block, so when something fails, it is very likely the connection is never closed either.

If this happened once or twice, it would not be a big problem; the connection pool is big enough. But here it clearly happened too often.

Another problem is that the ADO.NET technique used here is also vulnerable to SQL injection, which is another good reason to use a different framework for database access. They might still be using secure code to protect against it, but what I see here doesn’t give me much confidence.
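For comparison, a minimal sketch in C# of the same kind of call done safely: the using blocks guarantee the connection returns to the pool even when something throws, and the parameter keeps user input out of the SQL text. The query and names are illustrative, not taken from the site:

    using System.Data.SqlClient;

    public static class CountryData
    {
        public static string GetCountryName(string connectionString, int countryId)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT Name FROM Countries WHERE Id = @id", connection))
            {
                command.Parameters.AddWithValue("@id", countryId);
                connection.Open();
                return (string)command.ExecuteScalar(); // connection is closed even on errors
            }
        }
    }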

Source File: E:\wwwroot\beta.example.com\index.aspx.vb    Line: 1580

A few other interesting facts. First of all, the code was written in Visual Basic. That was already clear from the code, but this confirms it. Personally, I prefer C# over Visual Basic, even though I’ve developed in both, and in a few other languages. The language should not matter much, especially with .NET, but C# is often considered more professional than BASIC. (Because the ‘B’ in BASIC stands for ‘Beginners’.)

Second of all, this code-behind file has over 1580 lines of code. I don’t know what the rest of it does, but it’s probably a lot. Again, this is an old-fashioned way of developing software. Nowadays you see more use of frameworks that let developers write far less code, which makes the code more readable. Even the main index of a web site should have a reasonably low amount of code. You can use views to display the pages, models to handle the data and controllers to connect both.

Yes, that’s Model-View-Controller, or MVC: a technique that’s very practical for reducing the amount of code, if used well.
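As a sketch of what that looks like, with a hypothetical repository standing in for the data access:

    using System.Web.Mvc;

    public class QuestionnaireController : Controller
    {
        // GET /Questionnaire?language=en
        public ActionResult Index(string language)
        {
            var model = QuestionnaireRepository.LoadFor(language); // hypothetical data access
            return View(model); // the view only renders the model; no database code here
        }
    }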

And one more thing is strange. While I replaced the name of the site with ‘example.com’, I kept the word ‘beta’ in front of it. I, a user, was using a beta version of their website! That’s bad. Users should not be used as testers, because it scares them off when things go wrong. Like in this case, where the error might last the whole weekend because the developers and administrators are at home, enjoying theirs.

Never let users use your beta versions! That’s what testers are for. You can ask users to become testers, but then they know to expect errors like these.

Stack Trace:

[InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.]
   System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) +4863482
   System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) +117
   System.Data.SqlClient.SqlConnection.Open() +122
   _Default.XmlLangCountry(String FileName) in E:\wwwroot\beta.example.com\index.aspx.vb:1580
   _Default.selectCountry() in E:\wwwroot\beta.example.com\index.aspx.vb:1706
   _Default.Page_Load(Object sender, EventArgs e) in E:\wwwroot\beta.example.com\index.aspx.vb:251
   System.Web.UI.Control.OnLoad(EventArgs e) +99
   System.Web.UI.Control.LoadRecursive() +50
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

And that’s the stack trace. We see the site loading its controls and resources, and the ‘Page_Load’ method being called at line 251. At line 1706 the system is apparently loading country information, which would be needed to set the proper language. That call ends up at line 1580, where it probably opens some table based on information from the language file.

Again, that is a lot of code for basically loading the main page. I even wonder why it needs to load data from the database based on the country information. Then again, I was about to fill in a questionnaire, so it probably wanted to load the questionnaire in the proper language. If the questionnaire is multilingual, that would make sense.

Version Information: Microsoft .NET Framework Version:2.0.50727.3655; ASP.NET Version:2.0.50727.3658

And here’s one more bad thing. This site still uses .NET version 2.0, while the current version is 4.5 and we’re close to version 5.0… It would not surprise me if these developers still use Visual Studio 2005 or 2008 for all of this. If so, their development budget is probably quite low. I wonder whether the developers maintaining this site are even experts at software development. It’s not a lot of information to base this on, but in short:

  • The administrator did not prevent error messages from being shown to users.
  • The use of plain ADO.NET adds vulnerabilities related to the connection pool and SQL injection.
  • The use of VB.NET is generally associated with less experienced developers.
  • The amount of code is quite large, though that is common for sites developed years ago.
  • Not using a more modern framework makes the site more vulnerable.
  • Country information seems to be stored in XML, while the questionnaire is most likely stored in the database.
  • The .NET version has been out of date for a few years now.

My advice would be to rewrite the whole site from scratch: use the Entity Framework for the database and MVC 4 for the site itself, rewrite it in C# and hire more experienced developers to do the work.

A very generic datamodel.

I’ve come up with several projects in the past, and a few have been mentioned here before. For example, the GarageSale project, which was based on a system I called “CART”. Or the WordChain project, which had a rather similar structure. And because of those similarities, I’ve been thinking about a very generic datamodel that could be applied to almost any project.

The advantage of a generic database is that you can focus on the business layer without needing to change much in the database itself. The datamodel would still need development, but by using the existing model and mapping to existing entities, you can keep it all very simple. And it resulted in the datamodel shown in the class diagram. (Click the image to see a bigger version.)

The top class is ‘Identifier’, which is just an ID of type GUID used to find the records. This works fine in derived classes too. Since I’m using Entity Framework 6, I can use POCO classes to keep it all very simple. All I have to do is define a DbContext that lists which tables (classes) I want. If I don’t create an entry for ‘Identifier’, that table won’t be created either.

The next class is ‘DataContent’, which can hold any XML. That way, this class can contain all information I define in code, without the need to create new tables. I also linked it to a ‘DataTemplate’ class, which can be used to validate the content of the XML with an XML schema or a special style sheet. (I still need to work out how, exactly.)

The ‘BaseItem’ and ‘BaseLink’ classes are the more important ones here. ‘BaseItem’ contains all fixed data within my system; in the CART system, this would be the catalog. ‘BaseLink’ defines transactions of a specific item from one item to another. And that’s basically three-fourths of the CART system. (The template part is already covered by the ‘DataTemplate’ class.)

I also created two separate link types. One deals with whole numbers and is called ‘CountLink’; you generally use it for items. (One cup, two girls, etc.) The other is for fractional numbers like weights or money and is called ‘AmountLink’. These two will be the most-used transaction types, although ‘BaseLink’ itself can be used to transfer unique items. Derived links could be created to support more special situations, but I can’t think of any.

The ‘BaseItem’ class will be used to derive more special items, which define the relations with other items in the system. The simplest of these is the ‘ChildItem’ class, which holds additional information related to a specific item. Child items are strongly linked to their parent item, like wheels on a car or keys on a keyboard.

The ‘Relation’ class is used to group multiple items together. For example, we can have ‘Books’ defined as relation with multiple book items linked to it. A second group called ‘Possessions’ could also be created to contain all things I own. Items that would be in both groups would be what is in my personal library.

A special relation type is ‘Property’ which indicates that all items in the relation are owned by a specific owner. No matter what happens with those items, their owner stays the same. Such a property could e.g. be a bank account with a bank as owner. Even though customers use such accounts, the account itself could not be transferred to some other bank.

But the ‘Asset’ class is more interesting, since assets are the only items that can be transferred. Every transaction is about an asset moving from one item to another. Assets can still be anything; this class doesn’t differ much from ‘BaseItem’.

A special asset is the contract. Contracts have a special purpose in transactions: a transaction is always between an item and a contract. Either you put an asset into a contract or you extract it from a contract, and contracts themselves can be part of bigger contracts. By checking how much has been sent to and received from a contract, you can check whether all transactions combined are valid. Each transaction has to specify whether it sends items to the contract or receives them from it.

The ‘BaseContract’ class is the generic contract type and manages a list of transactions. When it has several transactions, it is important that no ‘phantom items’ remain. (A phantom item is something that’s sent to the contract but not received by another item, or vice versa.) These contracts need to balance as a check before they can be closed. They are meant to be temporary, lasting from the first transaction to the last.

The ‘Contract’ type derived from ‘BaseContract’ adds an owner. This owner owns any phantom items in the contract, which reduces the number of transactions and makes the contract everlasting. (Although it can still be closed.) Balancing these contracts is not required, making them ideal as e.g. bank accounts.
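To make this more concrete, here is a minimal code-first sketch of how I picture these classes in Entity Framework 6. Only the class names follow the diagram; the property details are simplified guesses:

    using System;
    using System.Collections.Generic;
    using System.Data.Entity;   // Entity Framework 6

    public abstract class Identifier { public Guid Id { get; set; } }

    public class DataTemplate : Identifier
    {
        public string Definition { get; set; }              // XSD schema or style sheet
    }

    public class DataContent : Identifier
    {
        public string Xml { get; set; }                     // free-form XML payload
        public virtual DataTemplate Template { get; set; }  // validates the content
    }

    public class BaseItem : Identifier
    {
        public string Name { get; set; }
        public virtual DataContent Content { get; set; }
    }

    public class ChildItem : BaseItem { public virtual BaseItem Parent { get; set; } }
    public class Relation : BaseItem { public virtual ICollection<BaseItem> Members { get; set; } }
    public class Property : Relation { public virtual BaseItem Owner { get; set; } }
    public class Asset : BaseItem { }                       // the only items that can be transferred

    public class BaseContract : Asset
    {
        public virtual ICollection<BaseLink> Transactions { get; set; }
    }

    public class Contract : BaseContract { public virtual BaseItem Owner { get; set; } }

    public class BaseLink : Identifier                      // one transaction on a contract
    {
        public virtual Asset Asset { get; set; }
        public virtual BaseContract Contract { get; set; }
        public bool ToContract { get; set; }                // sending to, or receiving from
    }

    public class CountLink : BaseLink { public int Count { get; set; } }
    public class AmountLink : BaseLink { public decimal Amount { get; set; } }

    public class GenericContext : DbContext
    {
        // No DbSet for Identifier itself, so no table is created for it.
        public DbSet<BaseItem> Items { get; set; }
        public DbSet<BaseLink> Links { get; set; }
        public DbSet<DataContent> Contents { get; set; }
        public DbSet<DataTemplate> Templates { get; set; }
    }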

Yes, it’s a bit more advanced than my earlier CART system, but I’ve considered how I could use this for various projects I have in mind: not just the GarageSale project, but also a simple banking application, a chess notation application, a project to track blood sugar measurements for people with diabetes, and my WordChain application.

The banking application would be interesting. It would start with two ‘Relation’ records: “Banks” and “Clients”. The Banks relation would contain Bank records with information about multiple banks. The Clients relation would contain the client records for those banks. And because of the datamodel, clients can have multiple banks.

Banks would own bank accounts, and those accounts would be contracts. All the bank needs to do is keep track of all money going into and out of the account. (Money is just another item, and all transactions would be of type ‘AmountLink’.) But to link an account to the persons authorized to use it, each account would also own a Property record, which holds the list of clients authorized to manage the account.

And we will need six different methods to create transactions. Authorized clients can deposit or withdraw money. Other clients can send payments to or receive payments from the account, where any money received from the contract needs to be authorized. Finally, the bank charges interest, or pays it. (Or not.) These interest transactions don’t need authorization from the client.
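Using the sketch above, one of those six methods could be as small as this; the names are illustrative, and the authorization check is left as a comment:

    using System;

    public static class Banking
    {
        // An authorized deposit: an AmountLink that puts money into the account contract.
        public static AmountLink Deposit(Contract account, Asset money, decimal amount)
        {
            // A real implementation would first check the account's Property record
            // to see whether this client may manage the account.
            return new AmountLink
            {
                Id = Guid.NewGuid(),
                Contract = account,
                Asset = money,
                ToContract = true,   // money flows into the account
                Amount = amount
            };
        }
    }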

The chess notation project would also be interesting. It would start with a Board item, 64 Square items and a bunch of Piece assets. The game itself would be a basic contract without an owner, containing the transactions that move all pieces to their starting squares. A collection of ‘Move’ contracts, owned by the Game contract, would record the moves: each Move shows which move it is (including branches of the game) and the transactions that take place on the board. (White rook removed from A1, white rook added to A4 and black pawn removed from A4, which translates into rook takes pawn at A4.)

It would be a very complex way to store a chess game, but it can be done in the same datamodel as my banking application.

With the diabetes project, each transaction would be a measurement. The contract would be owned by the person measuring his or her blood, and we don’t need to send or receive these measurements, just link them to the contract.

The WordChain project would be a bit more complex: a bunch of items with relations, properties and children. Contracts and assets would be used to support updates to the texts, with every edit of a WordChain item kicking the old item out of the contract and adding a new item into it. That would result in one contract per word in the database.

A lot of work is still required to make sure it all works as well as I expect. It would not be the ideal datamodel for every one of these projects, but it lets me focus on the business layer and the GUI without worrying about database changes. Once the business model becomes more advanced, I could create a second data layer with a better datamodel to improve the performance of the data management.


Is XML in decline?

I happen to be one of those older software developers who saw the rise of XML. I even remember the older SGML standard, although I never used SGML myself. Version 1.0 of XML became an official standard in 1998. Once it did, many companies started working on the killer app for working with XML without much hassle. At first, many companies created their own XML parsers, not all of which fully conformed to the standard; those parsers disappeared fast enough.

Right now, version 1.1 of XML is the latest standard. Yes, in 16 years not much has happened to it. The changes that have been applied are mostly about supporting EBCDIC platforms and newer Unicode definitions. There are discussions about a version 2.0, but it’s not likely to become a standard soon. Strange as it might sound, XML seems to be in decline if you look at how it’s used.

The power of XML was, of course, in how you defined these files and in the transformations you could run on them. While at first we used DTD files to define the structure of an XML file, some smart people came up with the XSD schema format, which allows more flexibility and is itself an XML file. Combined with some nice graphical tools, XSD made it easier to define an XML file and to validate that a file conforms to the proper structure. I made plenty of XSD files between 2000 and 2010, since my work involved a lot of XML data exchanges.

Of course, transformations are also important, and for those we use style sheets. An XSLT file is itself written in XML and defines how to convert an XML file to some other output format. In general, that output would be another XML file, an HTML document for a web browser, a plain text file or even a comma-separated file. In some special cases it could even create a complete rich-text document that you could open in Word. This meant you could e.g. send an XML file to a server, which would validate it against a schema, run additional validations with one style sheet, then use other style sheets to extract data and send it to other servers for further processing, while also generating documentation to return to the user. You could do a lot of processing with just XML files.

Of course, XML also became popular because more developers started to create web services, using the SOAP protocol: a slightly complex protocol that leans heavily on the XML standards. Since SOAP had a built-in version mechanism, you could always tell whether a client was still using the right SOAP definitions. You could even use several SOAP message formats on the same system, with only the version number as the difference. It wasn’t easy to set up, but it worked extremely well.

And more was developed to support XML even further. XPath expressions let you point to specific elements within an XML document. With XQuery, you can run queries on XML files and process the results. With namespaces, you can combine multiple XML definitions that use similar entities. And then there are XLink, XPointer and XForms, which never became very popular.

Between 2000 and 2010, it seemed that XML would dominate development. No more writing code in other programming languages that needed to be compiled; XML could serve as a fast scripting environment. Many platforms got standard objects for processing XML files, and XML knowledge became a hard requirement for developers. So, what changed?

Well, many developers consider the XML format a bit bulky, mainly because every tag is written twice: once to open the element and once to close it. If an element is called ‘NumberOfElements’, you have to write <NumberOfElements>10</NumberOfElements>, and that’s a lot of text to store the number 10. As a result, some developers shorten the tag names to make the resulting XML smaller. If you have 10,000 of these tags in your XML file, shortening the name to TOE saves 26 characters per element, thus 260,000 characters in total. That doesn’t seem like much: with modern multi-core processors and systems with 8 GB of RAM or more, such optimizations might make the code half a second faster, which you barely notice with web services. But still, developers feel they gain a lot from these kinds of optimizations. When resources are truly limited, it makes sense, but the modern mentality is that a company simply adds a second server if one is too slow. Or more, if need be, because extra hardware costs less than having developers optimize the code even further.

These kinds of optimizations make XML files less human-readable, while the whole point was to make data more readable. It gets slightly worse when the XML file uses namespaces, since those namespaces tend to be shortened to just a few letters too.

Another problem is the need to parse XML to extract the data. More and more companies create web applications that run in the browser and rely heavily on JavaScript, and these apps need to run on many devices too. Unfortunately, not all browsers support parsing XML files, and even those that do are a bit complex to use. With regular expressions it’s still possible to extract some data from XML, but if you need to fill a grid with 50 rows and 20 columns, things get really complex. To solve this, developers started to send data to web applications as JavaScript instead of XML. The browser could simply execute it, loading the data into memory. Since JavaScript object literals are less bulky than the begin/end tags of XML elements, the new format proved very practical, and thus JSON was born.
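The difference is easy to see with the same (made-up) record in both formats:

    <!-- XML: every value wrapped in an opening and a closing tag -->
    <Customer>
      <Name>John Doe</Name>
      <NumberOfElements>10</NumberOfElements>
    </Customer>

    // JSON: the same data as a JavaScript object literal, noticeably shorter
    { "Name": "John Doe", "NumberOfElements": 10 }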

The birth of JSON also demanded a change in web services. Since web applications call these services directly, it would be clumsy to build SOAP messages and then parse the SOAP results. A newer, simpler style of web service arose, using REST. Of course, there are many other web service protocols, but REST seems to have become the new standard, especially because it’s a simpler protocol that builds on the HTTP(S) protocol.

Of course, web applications have become more important these days because we’re getting more and more devices with all kinds of operating systems, all of which have web browsers. And, as I said, not all of those devices have a native XML parser built in. They do support JavaScript, though, and as a result it becomes quite easy to develop web applications for all devices using data in JSON format.

Of course, many devices also allow platform-specific apps built with the development tools for those platforms. For OS X and iOS devices you would use Objective-C, while you would use C++ or Java for Android devices. (Java is the preferred development language for Android.) For Windows RT you would use .NET for Metro-style applications, with either VB or C# as the primary language. This makes it a bit difficult to develop software that runs on all three platforms, but several parties have created compilers that build platform-dependent executables from platform-independent code. Unfortunately, working with XML parsers still differs on all these platforms, and those third-party compilers have to wrap their parsers around the built-in parsers of the underlying platform. That makes them a bit slow.

Since the number of operating systems has risen as the market keeps getting new devices, it becomes harder to maintain a single standard that all those systems support. And the XML standard is quite complex, so the different parsers might not all support the same things. JSON is much simpler in that regard, since it consists of simple assignment statements. Their syntax is based on JavaScript, which happens to resemble the C++, C# and Objective-C syntax. The only difference with those languages is that JSON puts the field names between quotes too, which you can’t do inside those languages.

So, XML is becoming less useful because it requires too much work to use. JSON makes data serialization simpler and less bulky. Especially now that developers focus more on web applications and apps for specific devices, the use of XML is declining in favor of JSON and other solutions. But there’s one more reason why XML is in decline, and it’s something within the .NET framework called LINQ.

LINQ was implemented as a separate library for .NET 3.5 but has become popular since then. Basically, LINQ lets you run simple queries on structured objects and execute transformations on the data you extract from them. That is similar to XPath and XSLT, but now it’s part of your development language, giving you far more choice in the functions you can apply to the data. This is especially important for date fields, since XML doesn’t work well with date formats. LINQ makes extracting data from object trees quite easy, and it can be used on an XML document once you’ve read that document into memory in a proper XDocument or XmlDocument object. Thus, the need for XSLT to transform data has disappeared, since you can do the same in C#, VB, F# or Oxygene.
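A minimal sketch of what used to require XPath and XSLT, now done with LINQ to XML in C# (the file and element names are made up):

    using System;
    using System.Linq;
    using System.Xml.Linq;

    class LinqToXmlDemo
    {
        static void Main()
        {
            XDocument doc = XDocument.Load("orders.xml");
            var customers = from order in doc.Descendants("Order")
                            where (decimal)order.Element("Total") > 100m
                            orderby (DateTime)order.Element("Date") // typed date handling, which XSLT lacks
                            select (string)order.Element("Customer");
            foreach (string customer in customers)
                Console.WriteLine(customer);
        }
    }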

The result is that .NET developers don’t have to learn much about XML anymore; their .NET knowledge combined with LINQ is more than enough. Since .NET also supports serialization to and from XML, it’s quite easy to read and write XML files in .NET. You can import an existing XSD file into your .NET application and have it converted to code, but since most XML data starts out as objects that need to be stored as XML, you will often see developers just define the objects and add attributes that tell whether each object and its fields become elements or attributes, letting the serialization library do the rest. Thus, knowledge of XML schemas is no longer a requirement.
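A minimal sketch of that attribute-driven approach (the class and names are made up):

    using System;
    using System.IO;
    using System.Xml.Serialization;

    public class Order
    {
        [XmlAttribute]                // becomes an attribute: <Order Id="1">
        public int Id { get; set; }

        [XmlElement("Customer")]      // becomes a child element: <Customer>...</Customer>
        public string CustomerName { get; set; }
    }

    class SerializeDemo
    {
        static void Main()
        {
            var serializer = new XmlSerializer(typeof(Order));
            using (var writer = new StringWriter())
            {
                serializer.Serialize(writer, new Order { Id = 1, CustomerName = "John Doe" });
                Console.WriteLine(writer); // prints the order as XML, no schema needed
            }
        }
    }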

Because .NET development made deep XML knowledge almost obsolete, the popularity of XML is in decline. It’s still used quite often, but the knowledge needed to do practical things with XML tools is disappearing. And similar things are happening on other platforms: Java and PHP now have their own LINQ-style query libraries, so those environments can also work on structured objects instead of XML data. Thus, XML is only needed when data has to be sent to some other process, and even then other formats might be chosen.

In fact, many developers are less and less concerned about the data format used for inter-process communication. The system handles it for them; they just use a serialization library that does the bulk of the work. So XML isn’t really declining; rather, fewer developers need knowledge of the XML format, because development tools wrap it so nicely that developers use XML without even realizing it. It’s not XML that’s in decline. It’s the knowledge about XML that is in decline…

Motivating developers…

One of the biggest problems for software developers is finding the motivation to sit behind a screen for eight hours a day, designing and developing new code and new projects. It’s generally tedious work that requires a lot of mental effort. And the reward tends to be just more of the same work the next day, and the day after. Creating new code or fixing existing code is like working an assembly line, placing a lid on a pot that someone else will close, over and over and over.

But developing code is a mental job, unlike adding lids to pots. During a physical job, your mind can wander to what you’re doing in the weekend, what’s on television, or whatever else is on your mind. A mental job makes that very difficult, since you can’t think about your last holiday while also thinking about how to solve this bug. Developers thus have a much more complex job than assembly-line workers, a job that causes a lot of mental fatigue. (And sitting that long behind a screen is a physical challenge too.)

Three things will generally motivate people. Three basic things, actually, that humans have in common with most animals: we all like a good night of sleep, we all like to eat good food, and we’re all more or less interested in sex. Three things that apply to almost anyone, and three things an employer might help with.

First of all, sleep. Developers can be very busy with their craft both at home and at work. Many of them have a personal interest in their own field and spend many hours at home learning, playing or even doing some personal work on their own computers. So a developer might start at 8:30 and work until 17:00. The trip home, dinner and meeting the family take some time, but around 19:30 the developer is back online on Facebook and other social media, playing some online game or studying new things. This might go on until well past midnight before they go to bed. Some six hours of sleep later, they get up again, have breakfast, read the morning paper and go back to work.

But a mentally challenging job requires more than six hours of sleep per day. So you might want to tell your employees to take good care of themselves if you notice they’re up past midnight. You need them well-rested, otherwise they’re less productive. Even developers who do a great job could improve if they got their eight hours of sleep every day. And as an employer you can help, by allowing employees to visit social sites during work hours: it helps them relax and lowers the need to check those sites at home. A small distraction like Facebook might even improve their mental skills, because it relaxes the mind.

The second motivation is food. Employers should consider providing free lunches for their employees, preferably shared all together in a meeting room or even a dining room. Have someone do groceries at the local supermarket: bread, spread, cheese, butter, milk, soda and other drinks and snacks. It might seem a waste of money, but the shared meal raises morale, gives employees a chance to discuss all kinds of things with one another and improves team building. It also ensures everyone has lunch at the same moment, so they’re all back at work at the same time.

Developers tend to have lunch between 11:30 and 14:00, and if they have to arrange their own, it’s not unlikely they’ll walk to the local supermarket or bring lunch from home. When they go shopping for lunch, they’re unavailable during that time. Of course, lunch time is their own time, but if you need them, you don’t want to wait until they’re back from the supermarket. Another problem is that those employees will start storing food at work, in their desks or wherever else they can. This could attract mice, and I don’t mean computer mice, but live, walking, eating animals.

If an employer provides lunch and other snacks, there is also a single storage place for food, which is easier to keep clean than the desks of developers. Besides, those developers now know their food requirements are covered during work hours, so they feel more comfortable.

The third motivation is sex. And here, employers have to be extra careful, because this is a very sensitive subject. For example, a developer might spend some time on dating sites or even porn sites. Like social media, a small distraction often helps during mental work, but where a social site takes two minutes to read a post and respond, a dating site takes far more time to process the profiles of possible partners. A porn site will also distract for too long, and might put the developer in the wrong mood.

The situation at home might also be a problem. An employee might be going through a divorce, which affects their sex life, puts them back into the world of dating and interferes with their nights. During such a time they will be less productive, simply because their personal lives weigh on their minds, and not much can be done to help beyond giving them room to stabilize their personal lives again. Do consider referring the employee to a proper counselor, though.

Single developers might seem a good option: they already lead a single life and are less distracted by dates. Still, if they’re young, that single status might change, and when it does, it can affect their job. Although the impact might even be an improvement, because a partner might force them to go to bed sooner, fulfilling the sleep motivation.

Married developers with children might be the best option, since their family life requires them to live a very regular life; the care for their children forces this regularity. But the well-being of those children can cause occasional distractions too. When a child gets sick, for example, the developer needs someone to care for the child at home, and they might want to work from home a few days a week to take care of their children.

As an employer, you can’t deal with the sex lives of your employees at work; those things are private. However, it helps if employees can spend more time at home, in privacy, when they have needs in this area. Allowing them to work from home gives them more options: since they don’t have to travel to work, they have more time available. If they spend half an hour on a dating site, they can simply work half an hour longer, and no one will even notice. If their child is sick, they can provide care and still do their work.

In conclusion: make sure your employees sleep well, give them free lunches and other snacks at the workplace, and allow them to work from home for their personal needs. All this helps make them more productive and lets them improve themselves.

To Agile/Scrum or not?

The Internet is full of buzzwords used to make things sound more colorful than they are. Today’s buzzword seems to be “cloud solutions”; it sounded so new a few years ago that many people applied the term to whatever they were doing, simply to be part of the new revolution. Not realizing that the cloud is little more than a subset of websites and web services, which in turn are a subset of the thin client/server technology of over a decade ago. (Cross-breeding client/server with the web will do that.) It’s just how things evolve: once in a while a new buzzword is needed, and marketeers are already working on the next one to make the cloud sound obsolete, simply because new products need to be sold.

Still, the software development world hasn’t been quiet either. In the past, a project was completed through a series of steps. It started with an idea that was turned into a concept, and this concept included all requirements for the project. Designers were then called in to come up with basic principles and further planning. When they were done, implementation would start, which included writing all the code and integrating the project into existing products. It would then be tested, and once the tests were satisfying, the whole project could be deployed and maintenance would begin.

If the project hit problems in one of these steps, the team often had to go back one step. (Or more, on rare occasions.) This principle is called the “waterfall model”, and its drawback is that every step can take weeks to finish. It generally means you can only release updates twice per year. Not very popular these days.

So, new ideas were needed to make more frequent updates possible. It started with the Agile Manifesto in 2001, which has become a very popular method since. Most groups of developers have heard about it and have started implementing its principles. Well, more or less…

Agile has just four basic values to keep in mind:

  • Individuals and interactions over processes and tools.
  • Working software over comprehensive documentation.
  • Customer collaboration over contract negotiation.
  • Responding to change over following a plan.

That’s basically the whole idea, and it sounds simple because it makes clear what is important in the whole process. Agile focuses a lot on teamwork and tries to keep every team member involved in the whole process: make sure every member is comfortable with it and, basically, talk a lot with one another. People tend to forget it, but communication is the key element between people.

Of course, whatever you publish should work, and work well enough that users don’t complain about crashing applications or lost data. You might be missing features that customers would like, but that should not be the main focus of the whole process: keep it working and keep the customer happy.

Of course, since you’re dealing with customers, you need to know what they actually want. It’s fine if the CEO has decided that the project needs features X and Y, but if all customers tell you they want A or B, then either the CEO has to change his mind or the company should start looking for a new CEO.

And keep in mind that things change, sometimes very fast. It’s hard to predict what next year will bring, even online. Development systems get new updates, new plug-ins and new possibilities, and you have to keep up to get the most out of the tools available.

So, where do things go wrong?

Well, companies tend to violate these principles quite easily. I’ve seen enough projects fail because of this, causing major damage or even bankrupting companies, simply because the company failed at Agile. Failure can be devastating with Agile, since you’re developing at high speed. And as we all know: the faster you go, the harder you can fall…

Most problems with Agile start with management. Older managers especially tend to live in the past or don’t understand the whole process. Many Scrum sprints are disrupted because management needs one or more developers from that sprint for some other task. I’ve seen a sprint disrupted because the main programmer was also responsible for maintaining a couple of web servers, and one of those servers broke down during the sprint. Since fixing it had priority, his tasks for that sprint could not be finished in time, and unfortunately other tasks depended on this one being ready.

Of course, the solution would have been for another team member to take over the task, but that did not fit the process the company had set up. The task concerned a major component that was under the control of just one developer, so he could not be replaced, because that would disturb the process. (Another developer might have slightly different ideas about some of the implementation.)

Fortunately, this only meant a delay of a few weeks, and we had plenty of time before we needed to publish the new product. We’d just have to hurry a bit more…

Agile also tends to fail when teams don’t work well together. Another company had several teams all working on the same project. Unfortunately, the project wasn’t divided into pieces giving each team its own part; all teams worked on all the code, all the pieces. And that, of course, spells trouble.

When multiple teams work on the same code, you often need an extra step to merge their code. This is not a problem if one team worked on part A and the other on part B. It does become a problem when both teams worked on part C and wrote code that overlaps. Things go fine when you test just the code of one team, but after the merge you need to test everything again, so the whole process is delayed by one more sprint just to test the merged code. And it still leaves plenty of chances for bugs to slip through testing. Especially manual testing: when a tester has already run process X a dozen times for both teams and now has to test it again for the merged code, they might decide to just skip it. They’ve seen it work dozens of times before, so what could go wrong?

As it turned out, each team did its own merge with the main branch, then built the main branch and told the testers. So while the testers were busy testing the main branch that team 1 provided, team 2 was also merging and would notify them again a few days later. The result: all tests had to be redone, and days of testing were wasted. Team 3 would follow, again wasting days of testing. Then team 1 would include a small bug fix, and testing started from the beginning all over again.

With automated testing, this is not a problem. You would have thousands of tests that should pass, and after each update to the main branch, those tests would run from beginning to end. Computers don’t complain. However, some tests are done manually, and the people who execute those tests will be really annoyed if they have to do the same test over and over with every new build. It would be better if they’d just automate their manual tests, but that doesn’t always happen. So, occasionally they decide that they’ve tested part X often enough and it never failed, so why should it fail the next time?
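To make that concrete, here’s a minimal sketch (in Python) of the kind of automated check that can rerun after every build of the main branch. The merge_discount() function is entirely hypothetical, a stand-in for a piece of “part X”:

    # A tiny automated regression test. merge_discount() is a made-up
    # stand-in for code in "part X" that both teams touched; the point is
    # that a computer happily reruns this after every single merge.
    import unittest

    def merge_discount(base_price: float, discount_pct: float) -> float:
        # Hypothetical production code under test.
        return round(base_price * (1 - discount_pct / 100), 2)

    class TestPartX(unittest.TestCase):
        def test_discount_applied(self):
            self.assertEqual(merge_discount(100.0, 25), 75.0)

        def test_no_discount(self):
            self.assertEqual(merge_discount(80.0, 0), 80.0)

    if __name__ == "__main__":
        unittest.main()

Run something like this on every update to the main branch and nobody has to test part X by hand for the thirteenth time.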

Well, because team 1 and team 2 wrote code that conflicts with one another, and that code is in part X. The testers skip it, so the customer notices the bug. Painful!

There are, of course, more problems. I’ve seen a small company that had a nice, exclusive contract with a very big company. Let’s call them company Small and company Big. Company Small had created a product that company Big really liked, so they asked for an exclusive version of it, with features that company Big would choose. This contract would be worth tens of millions for company Small and its ten employees.

And things would have gone fine if company Small had just focused on delivering what company Big wanted, on time, instead of continuing to work on its own products. But no, other things were more important, and the customer would simply get what company Small made, with some minor adjustments. And the CEO was quite happy with this progress. That is, until company Big noticed that its wishes weren’t being heard. Apparently, all company Big was supposed to do was sign the contract and pay the bill, and once things were done, they would just have to accept what was given to them. So company Big found another company willing to do the same project and dumped company Small. End of contract, and thus end of income, since company Small worked exclusively for the bigger company. And within five months, company Small went tits-up, bankrupt. Why? Because they did not listen to the customer and did not keep them happy.

Another problem is that companies respond very slowly to change. I’ve worked for companies that used development tools that were five years old, simply because they did not want to upgrade. I still see the occasional job posting where companies ask for developers skilled with Visual Studio 2008, while there are three newer versions available already. (Versions 2010, 2012 and 2013.) In 2003 I was still working on a 16-bit project meant to run on Windows 3.1 and up, simply because one single user still used an old Windows 3.11 system. At least, we thought they did, because no one ever asked them if they had upgraded. And that customer never told us that they had indeed upgraded, and didn’t think of asking for a 32-bit version…

I’ve seen management hang on to a certain solution even though there was plenty of evidence that better options were available. I’ve developed software on 32-bit systems with 2 GB of memory when 64-bit systems with up to 8 GB of memory, and more speed, were available. I had to use a single monitor on a PC that supported multiple monitors, while we had extra monitors available, because management considered it a waste. The world is changing, and many systems now easily support two or more monitors, but some companies don’t want to follow.

So, what is Agile anyway? It’s a method to respond quickly to changes and to the desires of customers, with a well-informed team that feels committed to delivering something the customer wants. (And customers want something they can use and that works…)

Would there be a reason not to use Agile? Actually, yes. It’s not a silver bullet or golden axe that solves everything. It’s a mindset that everyone in the team has to follow. A single team member can disrupt the whole process. One manager who is still used to “the old ways” can devastate whole sprints. When Agile fails, it can fail quite hard. And if you lack the reserves, failure at Agile can break your company.

Agile also works better for larger projects, with reasonably big teams. A small project with one team of three members is actually too small to fully implement the Agile way of working, although it can use parts of it. Such a small team makes planning a bit more difficult, especially if team members aren’t always available for the daily scrum meetings. When you’re that small, it’s better to just meet when everyone is available and discuss the next steps. No hard deadlines, since the planning is too complex. What matters is that goals are set and an estimate is made of when they’ll be finished. Whenever the team meets, they can decide if the estimate is still correct or needs to be adjusted.

Another problem can be the specialists who are part of the team. Say, for example, that you have a PHP project that needs to communicate with a mainframe and some code written in COBOL. The team might have hundreds of PHP developers, but chances are that none of them knows anything about COBOL. So you need a COBOL specialist, and basically he alone carries the tasks of maintaining the mainframe side of the project. You can make him part of the Scrum meetings, but since he has to do his part all by himself, he doesn’t have much use for the other team members. So again, just decide on a specific goal and estimate when it should be finished. Get regular updates to allow adjustments, and let the COBOL developer do his work.

The specialist can become even more troublesome if you have to interact with a project that another company is creating. If you do things correctly, you and the other company will discuss a generic interface for the interaction between both projects. You then both build a stub for the other company to use for testing. This stub just has to offer some dummy information, but it should be usable.

When both companies have the stubs they need, they can each work on their own part. They will have to keep each other informed if parts of the interface change, or if the rules change about the data that can be provided. Preferably, this is done by providing a new stub. Both teams have just one goal: implementing all the methods that are part of the stubs. And when parts are fully implemented, they can give the other company new stubs that already contain some working parts.

Still, when two companies have to work together this way, they have to think small. Don’t create a stub with thousands of methods for all the things you want to add during the next five years. Start small. Just add the things to the stub that you want to finish in the next sprint. Keep adding things sprint by sprint, and communicate with the other company about what they’re going to add next. You don’t have to work on the same stub methods anyway. One company might start on the GUI part that lets users enter their name, address and phone number, while the other works on storing employment data and import/export management. The stubs should just provide dummy methods for the parts that aren’t implemented yet. Each company should develop the parts they consider most important, although both should be aware that everything is finished only when all stub methods are implemented.
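To give an idea, here’s a rough sketch of what such a stub could look like. Everything here is hypothetical (I’m pretending the shared interface is a small employee-data service); the real interface would be whatever both companies agreed on:

    # A stub the other company can code against while the real
    # mainframe/COBOL side is still being built. Dummy data, but usable.
    class EmployeeServiceStub:
        def get_employee(self, employee_id: int) -> dict:
            # Realistic shape, fake content.
            return {"id": employee_id, "name": "Test Person", "department": "QA"}

        def store_employment_data(self, employee: dict) -> bool:
            # Not implemented yet; pretend it always succeeds for now.
            return True

Each sprint, one more dummy method gets replaced by a working one, and the updated stub is handed over to the other company.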

Agile is just a mindset. Used properly, it can be very powerful. However, keep in mind that not all of Agile may be practical for your own situation. Agile requires a lot of time for meetings: with developers, with customers and with management. Everyone needs to be involved and everyone needs to be available for those meetings. Scrum becomes more difficult if not all team members are available on all five workdays of the week. And worst of all, team members have to prepare for the meetings. Even for the daily ones, since they have to keep track of their own progress.

Don’t be afraid to implement just part of the whole Agile/Scrum principle. It is made to hybridise with other methods. Use the method; don’t let the method force itself upon you.

Let’s talk about social media…

When I was a kid, there just wasn’t any internet. If you wanted to speak with someone, you had to pick up the phone or go visit them. Being social was complex, because it involved plenty of travel to meet others. And even when the Internet was born, being social was still something people did in real life, not behind a computer screen. Still, things slowly changed about 15 years ago, when people started to use the Internet for all kinds of fun things. It helped that proper internet tools became more popular. (And free!) The increased speed and the change from the 33k6 modem to ADSL or cable also helped a lot. And now, just one generation later, being social is something we do online, with bits and bytes.

But enough history. And no, I won’t explain what social media are, because right now you’re reading stuff I wrote on such a social media website. (Yeah, a hosted WordPress site, but I could have used Blogger or Tumblr too.) This discussion is about the complexity of all those social media, not their history.

Most people will be familiar with both Twitter and Facebook. On Twitter you post a message that you’ve just pooped, and on Facebook you post a picture of the result. And if you’re a professional, you might also post it on LinkedIn, if you pooped during office hours. Since you can connect these three together, you start to build a practical resource with all kinds of personal information about you online. Twitter is used to send small but important updates about yourself, your company or your products to every subscriber, while Facebook is practical for connecting with consumers at home. But if you’re looking for a new job, or need to hire or find some experts, you use LinkedIn for your search.

Search? That reminds me. There’s also Google Plus, although not many people use it as a social platform. Still, people like it because you can use your Google Plus account to log in to many other websites. (Facebook, LinkedIn and Twitter also support this.) Google also provides email accounts and document management tools, plus plenty of online storage, so it’s a very attractive site to use, even if people are less social on Google Plus than they are elsewhere.

Yahoo also used to be a great social media center, but competition from other sites has lessened its influence considerably. Many things that Yahoo offers are also available on other sites. Yahoo also used to be great with their email services, until they decided to drop support for email through POP/SMTP just when Google decided to start expanding its email services. By doing so, Yahoo lost much of its influence and never really managed to win it back, although their photo service Flickr still holds plenty of value. (But here too, the competition is becoming murderous.)

Pinterest, for example, can also be used to share photos with others, although Pinterest is mostly used to share pictures from others, to promote those people. Basically, it’s a site for fans. DeviantArt is a bigger challenge for Flickr and has a huge amount of graphics, especially cartoons and CGI alongside photos. But DeviantArt is missing an easy way to connect your other social media to your DeviantArt account.

Behance is another interesting photo site where you can build your gallery and, more importantly, allow people to contact you and offer you jobs and other career opportunities. It also connects better with other social media, and if it were free, it would definitely kill Flickr. Unfortunately, the free version has limitations, and the commercial version is a bit expensive if you just want to share some of your work. Or maybe you’d prefer Bitpine.

Then again, if you’re into images and photos, you might like to try to make some profit by selling merchandise. Cafepress is known for this: it allows you to upload pictures and put them on all kinds of things, including a cape for your dog or panties for your girlfriend. There are plenty of other sites that offer simpler merchandise like t-shirts, but Cafepress just has a huge collection of things you don’t need but which still look nice with your picture on them.

There are more social media sites, of course. Including sites that combine all your social media profiles into a single reference, so all your friends know where you hang around. About.me combines your bio, your résumé and all kinds of social media connections. Mine tends to have plenty of connections. Connect.me is also practical for connecting with other people and lets you build up your online reputation. TrustCloud is another medium that links people you know to your account. (Or mine.) Or go to Visify and tell others how active you are online.

An oldie is Reddit, which is more like an online forum. However, it has so many users that all discussions move very fast. Vimeo can be used to share videos, just like YouTube. Or use GitHub if you’re a software developer and want to share your code with others. Or Society3 for those who need social media for their marketing strategies. Or, the simplest one: FourSquare, where you can tell where you are and where you went.

Well, I’ve mentioned plenty of social media sites, and it’s all great to share your personal information with the world and get your 15 minutes of fame. And they all connect to one another, often via ID providers from Google, Facebook, Twitter or LinkedIn, and lately also from Adobe. (Although Adobe mostly uses its ID provider to have others connect to the Creative Cloud.) If you’re connected to even a third of these sites, then there’s a lot of information about you online. And this is where it starts to become creepy and dangerous.

First of all, the amount of personal information that people share is huge. The joke I started with, that people tell others on Twitter that they’ve just pooped, isn’t just a joke. It happens! And when people are on holiday, they also tend to use Twitter, FourSquare and TwitPic to tell the world where they are. With more information from Facebook, thieves can find out where those people live and rob their empty homes. They might also check LinkedIn to see if someone has interesting stuff at home. For example, a CEO of a company who’s on holiday in Italy is a more interesting target than a teacher visiting his aunt in Almelo. And these are just a few of the media that can be abused by others, without the need to hack anything.

So guard your privacy and avoid sharing sensitive information online. Or at least be less interesting than the other people online.

But getting robbed is just one risk. You can protect your home and make sure there’s at least one person there while you’re on holiday. The real problem is that all these media are connected to one another. And in general, you have given them permission to combine their information. And systems are only as strong as their weakest link.

Take, for example, Facebook. Many websites use your Facebook ID to let you log in. Thus, if someone hacks your Facebook account, they also have access to those other websites. And one of those sites might have your credit card information, your bank account information or your PayPal information. The attacker might not even need this information to make purchases in your name, simply because those connected sites remember it internally. I checked which sites I use that are connected to Facebook, and it turns out I’m connected with over a hundred other websites! A few friends of mine have an average of around 40 other sites connected to their Facebook account, and it’s easy to increase that number, since plenty of sites want to connect to Facebook.

Admittedly, I have created several websites that connect to Facebook myself, so several of those connected apps are actually my own sites. Still, it’s a lot. It means that anyone who hacks my Facebook account will be able to use all those other sites. What they can do there depends on how those sites have implemented their security. And the same applies to apps connected to Google Plus, Twitter or LinkedIn.

If you use Flickr or Yahoo, you might have connected that account with Facebook or Google Plus. Since Yahoo itself is used as an ID provider for even more websites, you can see a complete chain fall down once your Facebook account is taken over. This makes Yahoo less reliable than the others. With Facebook, Twitter, LinkedIn and Google, you can try to add more security. For example, only copy the ID key and the email address from the provider, and force the user to create a separate password for your site. Thus, if Facebook is hacked, the attacker still needs a password for your site.
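As a rough sketch of that idea (assuming a simple in-memory user store; none of this is production code, and all names are made up), it could look like this:

    # Store only the provider's user ID and email address, plus a hash of a
    # password that exists only on this site. Even a hijacked Facebook
    # session then still lacks the site-specific password.
    import hashlib, os, secrets

    users = {}  # keyed by (provider, provider_user_id)

    def hash_password(password: str, salt: bytes) -> bytes:
        # PBKDF2 from the standard library; a real site might prefer bcrypt.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(provider: str, provider_user_id: str, email: str, password: str):
        salt = os.urandom(16)
        users[(provider, provider_user_id)] = {
            "email": email,
            "salt": salt,
            "pw_hash": hash_password(password, salt),
        }

    def login(provider: str, provider_user_id: str, password: str) -> bool:
        # The visitor arrives with a valid token from the ID provider, but
        # still has to know the password that only this site stores.
        user = users.get((provider, provider_user_id))
        if user is None:
            return False
        candidate = hash_password(password, user["salt"])
        return secrets.compare_digest(candidate, user["pw_hash"])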

Which brings up another problem. When people have a few dozen social media accounts, they start having trouble remembering all the passwords. I use an email alias for every site. Websites tend to let visitors log in with an email address and password, so I could use the same password for many sites, because my email address is different for every site. (I still use different passwords too, though.) Most people, however, just use the same address and password for many sites. And that’s a big risk, because if one of those sites is hacked, the hackers can use that information on all the other sites.

The bigger websites do have proper security. At least, that’s what most people think. However, both Adobe and LinkedIn have had serious trouble with their user databases, and users of both websites have received notices in the past urging them to change their password immediately because of those hacks. And these were just the bigger sites that dared to publicly admit they had been hacked. Smaller social media sites can be a bigger risk if their security isn’t strong enough. Which is why it’s actually better that they use the ID providers of the bigger sites instead of implementing their own systems.

Developers often ignore security, thinking that what they’re making isn’t very interesting to hackers. But I can’t say it often enough: social media are chained together. One weak link exposes all.

When you want to build your own social media website, be very aware of security. Don’t build your own security system unless you have an expert on your team. And even then, have the code audited by another expert. Since social media chain together, a weak link in this chain can take it all down. Which reminds me of this xkcd comic:

[xkcd comic on competing standards]

When you create your own ID provider, you’re just adding to the competing standards that already exist. What would make your system better than the others? Your site will be more secure if you use an existing provider, but if that provider has a weakness, your site will fall too, unless you require more information.

My suggestion would be to let people log in using Google Plus, Facebook, Twitter or LinkedIn, but to combine it with some extra security. For example, you know the visitor’s IP address, so you can remember it. As long as it matches the history, it’s unlikely that the account is hacked. Once it changes, you should ask for one extra piece of information, like a separate password. The visitor should know this, since he entered it during registration.
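A minimal sketch of that IP heuristic, with made-up in-memory storage, might look like this:

    # Remember which IP addresses we've seen for each user. A new address
    # triggers the extra question; a known one doesn't.
    known_ips = {}  # user_id -> set of IP addresses seen before

    def needs_extra_check(user_id: str, current_ip: str) -> bool:
        return current_ip not in known_ips.get(user_id, set())

    def remember_ip(user_id: str, current_ip: str):
        # Call this once the visitor answered the extra question correctly.
        known_ips.setdefault(user_id, set()).add(current_ip)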

Another option is to ask the visitor for his mobile phone number during registration, so you can send an SMS message as part of the authentication process. If a user logs in from a different computer, you send an SMS with a security code. The user has to enter that code, and then you know you can trust that system. Add it to the user’s list of trusted computers and you can keep the visitor safe. (Microsoft does something like this with Windows Live.)
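And a similar sketch for the SMS variant, where send_sms() is a hypothetical stand-in for whatever SMS gateway you’d actually use:

    import secrets

    trusted_devices = {}  # user_id -> set of device identifiers
    pending_codes = {}    # (user_id, device_id) -> one-time code

    def send_sms(phone_number: str, text: str):
        # Hypothetical placeholder; wire up a real SMS gateway here.
        print(f"(pretend SMS to {phone_number}): {text}")

    def start_login(user_id: str, device_id: str, phone_number: str) -> bool:
        # Known device: no extra step needed.
        if device_id in trusted_devices.get(user_id, set()):
            return True
        # Unknown device: text a one-time code to the registered number.
        code = f"{secrets.randbelow(1_000_000):06d}"
        pending_codes[(user_id, device_id)] = code
        send_sms(phone_number, f"Your security code is {code}")
        return False

    def verify_code(user_id: str, device_id: str, code: str) -> bool:
        expected = pending_codes.pop((user_id, device_id), None)
        if expected is not None and secrets.compare_digest(code, expected):
            # Trust this device from now on.
            trusted_devices.setdefault(user_id, set()).add(device_id)
            return True
        return False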

So, a long story, just to start a discussion about the best way to secure social media, and to remind everyone that a lot of sites are chained together through all of this.