Seriously?

Posted on 2019/01/27 by W.A. ten Brink

I noticed a very interesting post today on LinkedIn and I seriously have to rant about how dumb it really is. But look for yourself first and see if you’re good enough as a developer to see the flaws in it.

This is a preview image for a free course on C++ programming. Now, I’m no expert at C++ as I prefer C# and Delphi but I can see several flaws in this little preview. Yes, even though it’s a very tiny piece of code, I see things that trouble me…

And the reason for ranting is not because the code won’t work. But because the code is badly written. And a course is useless when it encourages writing code badly… So, here it is:

But keep in mind that code can be bad but if it works just fine then that’s good enough. Good developers will go for great products, not great code. It’s just that some bad code makes software development a bit harder… In general, coding practices include the use of proper design patterns and to make sure code is reusable.

So, not counting the lines with just curly brackets, I see five lines of code and two serious problems. And that’s bad. But the flaws are related because the person who wrote this C++ code seems to ignore the fact that C++ supports objects and classes. But it probably works so these aren’t really errors. They’re flaws in the design…

So, the first flaw should be obvious. I see various variable names that all start with a single letter and an underscore. The “m+” seems to suggest that these variables are all related. That means, they should be part of an object instead of just some random variables. The “m_ probably stands for “man” or whatever so you would need a class called “Man” and it has properties like ‘IsJumping’, ‘IsFalling’, ‘TimeThisJump’ and ‘JustJumped’. It suggests that Man is in a specific state like falling or jumping. And as it is a state, you would expect a second class for the “Jump” state. And probably others for “Falling”, “Walking” and other states.

The action triggered is a jump so the Man needs to be in a state that supports a jump. Jumping and Falling would not support this state so if man is in one of those states, he can’t jump. Otherwise, his state is changed to Jumping and that would start the timer. And if the timer is zero then ‘JustJumped’ would be true, right? So, no need to keep that property around.

So, all this code should have been classes and objects instead. One object for the thing that is doing something and various “State” classes that indicates what’s he doing at that moment. The fact that the author did not use classes and objects for this example shows a lack of OO knowledge. Yet C++ requires a lot of OO experience to use it properly and allow code reuse. After all, other items might jump and fall also, so they would use the same states.

The next flaw is less obvious but relates to “Keyboard::IsPressed”. This is an event that gets triggered when the user presses a key. It’s incomplete and there’s a parenthesis in front of it so I don’t see the complete context. But we have an event that is actually changing data inside itself, rather than calling some method from some class and let that method alter any data. That’s bad as it makes code reuse more difficult.

One thing to avoid when writing code is doing copy and paste of code from somewhere else. If a piece of code is needed in two or more places then it should be encapsulated in a single method that can be called from multiple locations. So this event should call a method “DoJump”, for example, so you can also have other calls to this method from other locations. For example, if you also want to support jumping on a mouse click.

Too many developers write events and fill these events with dozens of lines of code. Which is okay if this code is only called at one moment. But it’s better to keep in mind that these event actions might also get triggered by other events so you need to move them into a separate method which you can then call.

That would allow you to reuse the same code over and over again without the need for copy and paste programming.

So, in conclusion, I see a preview for a C++ course that looks totally flawed to me. No errors, just flawed. This is not the proper way to write maintainable code!

A fifth pillar for programming?

Posted on 2019/01/26 by W.A. ten Brink

For many years I’ve been comfortable knowing that programming is build on just four pillars. These are all you need to develop anything you want and these would be:

Statements, or basically all the actions that need to perform.
Conditions, which determines in which direction the code will flow.
Loops for situations where actions need to be repeated.
Structures to organize both code and data into logical units like records and classes, databases with indices and even protocols so two applications can communicate with one another.

But as I’m developing more and more asynchronous code where you just trigger an action and later wait for the response, I start wondering if there should be a fifth one. So, should there be a fifth one?

Parallel tasks, where two or more flows run concurrently.

It’s a tricky one and if you learned to use flow charts and Nassi–Shneiderman diagrams to design applications then you’ll discover that these diagrams don’t have any popular option to represent parallel tasks in any way. Well, except in the way I just said, by triggering an external event and ignore the result until you need it.

Well, Nassi-Shneiderman does have this type for concurrent execution:

File:NassiShneiderman-Parallel.svg — Image by Homer Landskirty, CC-BY-SA 4.0 International.

And flowcharts also have a fork-join model to represent concurrent execution. So these diagrams do provide some way to indicate parallel execution. But the representation is one that suggests several different tasks are executed at the same time. But what if you have an array of records and need to execute the same task on every record in a parallel way instead?

Parallel execution is quite old yet I never found a satisfying way to display concurrency in these kinds of diagrams. Which makes sense as they are meant to define a single flow while parallel execution results in multiple, simultaneous flows. And parallel execution can make things quite complex as you might just want to cancel all these flows if one of them results in an error.

But it is possible to represent them in diagrams for a long time already even though their usage was rare in the past. But in modern times where computers have numerous cores and multithreading has become a mainstream development technique, it has become clear that parallel or concurrent execution is becoming an important standard in programming.

So it’s time to recognize that there’s a fifth pillar to programming now. But mainly because it has become extremely common nowadays to split the flow of execution only to join things at a later moment.

But this will bring many challenges. Especially because forking can result in similar tasks being executed while they don’t have to join all at once either. It basically adds a complete new dimension to programming and while it’s an old technique, it isn’t until recently that mainstream developers have started using it. The diagrams are still limited in how it can be executed while developers can be extremely creative.

Of course, there’s also a thing like thread pools as computers still have limits to how many threads can be executed simultaneously. You might trigger 50 threads at once yet add limitations so only 20 will run at the same time while the rest will wait for one of those threads to finish.

It becomes even more complex when two or more threats start sending signals to one another while they continue running. This is basically a start for event-driven programming, which doesn’t really fit inside the old diagram styles. You could use UML for various of these scenarios but it won’t fit in a single diagram anymore. A sequence diagram could be used to display parallel execution with interprocess communications but this focuses only on communication between processes. Not the flow of a single process. Then again, UML doesn’t even have those kinds of flow diagrams.

So, while programming might have a fifth pillar, it also seems it breaks the whole system due to the added complexity. It also makes programming extremely different, especially for beginners. When you start to learn programming then the four pillars should teach you what you need to know. But once you start on the fifth pillar, everything will turn upside down again…

So maybe I should even forget about my four pillars theory? Or add the fifth one and ignore the added complexity?

Delicious spam!

Posted on 2017/02/02 by W.A. ten Brink

Once more, a post about spam. Why? Because I have one more interesting email in my spam-box, sent by someone who clearly is confused by the whole topic. So, here’s the email, with some annotations:

Why is it spam? Because Google Apps/GMail says it is. And google is often right in these things. And as I don’t know Adam Collier, nor see any name of his company, it clearly seems like spam to me too, from some wannabe web developer in India looking for customers without understanding the rules.

Why from India? Well, the English writing is more British than American. The writing style is similar to how Indian spam is generally written, with only single-line paragraphs. The skill set used is also very common among Indian developers. The extreme politeness in the writing also is similar to what you see in mostly Asian countries, as people there are generally more polite. Then of course, it mentions India in the email too so that wasn’t difficult.

First of all, this email was sent from a genuine, free email address like those offered by Outlook, Gmail and Yahoo. I’m not going to say if it’s Outlook or not as I allow this guy some anonymity, even though his name is probably fake and the address already closed for sending spam. But for me that’s the first sign of spam. If it is sent from a free mail provider then you should make sure you know the sender before continuing! As usual, check the sender first for every email you receive!

Next is the address to where it was sent. While it seems to be my “info” account, it just isn’t! It was received by the account I used for my registrar and used in my domain registration where it is visible in the WhoIs information, including my name and some other details. The “info” address happens to be the address of some other website, who has also received this email. My address was actually part of the BCC header so other recipients would not see that I had received it. Smart, but it is to be expected from mass mailers as they would really piss off a lot of people if they only use the TO or CC fields, as many people tend to ‘Reply to all’ on spam messages, making even more spam.

So they got my address from the WhoIs database. So they should have known my name too! They just can’t use it as this is a mass email that’s probably sent to hundreds or even more people.As this spammer doesn’t seem to use any mass mailer application, I suspect that he just collected a lot of email addresses from interesting-looking domains and just mailed to them all from Outlook so the amount of recipients is likely to be hundreds, maybe thousands. Not the millions that more experienced spammers will use.

Interesting is how he’s called a webmanager in his email address while calling himself an online marketing manager in the email. No name for his business so maybe he doesn’t even have a real business. This could be a simple PHP developer who is trying to make a freelance web development business and is hoping to get some customers so he can expand his business. He might have a few friends who are also doing development and likely is a student at Computer Science classes in India who wants to put his lessons to the Test. This doesn’t look like a hardcore spammer, even though he is spamming. He’s more a lightweight spammer.

The prices he mentions are very reasonable. Then again, he basically uses standard frameworks like WordPress, Joomla, Magento and Drupal to build those sites which is generally not too much work. I call these “Do not expect too much from us” prices.

There is one major alert in all this, though. The grey line mentions a “Payment Gateway” which you should immediately distrust! Why? Because this developer is probably setting up this payment gateway and might have control over it later on. He could be siphoning off some of the payments made through it or even at one point empty all the money collected and put it in his own bank account! Good luck getting your money back!

Well, he could be honest but you should not take that risk to begin with…

It is interesting to see that he also provides Android and IOS applications. He seems to be specialized in PHP so he would need to know Swift or Objective-C to do the IOS development and Java for the Android development. Or have some other programming environment that allows him to develop for both platforms. He might be using Visual Studio with Xamarin which would allow him to focus on different platforms. Or he has friends who specialized in app development.

At the bottom of his email he tells you that this isn’t spam and that he actually hates spam. So if you aren’t interested you should just reply to him so he can confirm that your email address exists and is in use so he won’t be sending emails to it. Wait… Why does he need that? People who aren’t interested generally won’t respond! So he might actually be collecting confirmations for other purposes…

Anyways, it shows that many spammers are generally amateurs, not knowing what they’re doing. Some might work for some business and think they can promote it this way while others are just freelance developers trying to find a work in the current market. Both will generally learn that these kinds of emails are spam and generally end up being blacklisted or loose their free email account. The problem is not that they really want to spam people, but they are misguided in thinking that you can just send emails to everyone as part of their marketing strategy!

Unfortunately, it doesn’t work that way! If you send these kinds of messages unsolicited then you are spamming. If you seek new customers then you should start by registering your own domain name and provide proper information about yourself. Use your own domain name for sending emails and not some free provider and more important: use mailer software where people can subscribe and unsubscribe and only mail people who have subscribed! Also provide a simple web-based solution to unsubscribe as a link in your email. People might still consider it spam but at least the risks of being blacklisted becomes less as you’re conforming to the anti-spamming rules.

If you want to do proper business online then you need to be familiar with the rules. You should know about spam and how to avoid to becoming a spammer. You should have a clear profile of your business online, preferably under your own domain name. And you need to know about the legislations of the countries that you’re targeting like the cookie-laws and privacy laws in Europe. Thing is, if your site and services are targeting foreign nations then you are operating under their laws also! Never forget that!

And with that, this lesson ends…

Making 3D prints with OpenSCAD, Poser and Shapeways.

Posted on 2016/03/27 by W.A. ten Brink

As you might know, I’ve played for years with Poser Pro now. And for about a year, I have created printed models through the services of Shapeways while also creating all kind of artwork with those Poser models and E-ON Vue. But more recently (well, less than a week ago) I started experimenting with OpenSCAD, which is an open-source CAD application where you just “program” a special script file and it will generate a 3D model for you based on that script.

So, to start I will show you how I created a special box for my electronic experiments, which is my Shapeways model. It is a rectangular box with five holes in the side for wires that lead to a small, round container. Not sure what I want to build inside it, but I just like the shape and it is a nice experiment to start with.

This is a box with two different lids. One round, one rectangular. The rectangular lid will also contain a small engraved text.

To start, I create a simple module to create the lid:

module BoxLid(width, height, depth, groove){
    union(){
        cube([width, height, depth/3]); 
        translate([groove, 0, depth*1/3])cube([width-2*groove, height-2*groove, depth*1/3]);
        translate([groove/2, 0, depth*2/3]) cube([width-groove, height-groove, depth*1/3]);
        translate([width*6/20, groove*1, -depth*1/3]) cube([width*8/20, groove/2, depth*1/3]);
        translate([width*7/20, groove*2, -depth*1/3]) cube([width*6/20, groove/2, depth*1/3]);
        translate([width*8/20, groove*3, -depth*1/3]) cube([width*4/20, groove/2, depth*1/3]);
    }
}

This set of cubes will create the lid that will slide in a special slot that we will add to the box. The first three cubes are of various sizes and create a groove to slide over. The lid will completely cover the side of the box.

The last cubes are used to put some relief on the lid to allow it to slide easier up and down. But I want to have it engraved with some text, so I create an engraving module:

module label(name, depth, fontSize){
    linear_extrude(height = depth) {
         text(name, size = fontSize, font = "Lucida Calligraphy", halign = "center", valign = "center", $fn = 50);
     };
}

And I need a second module for the lid including the engraved name:

module NamedBoxLid(width, height, depth, groove){
    difference(){
         BoxLid(width, height, groove, groove);
         translate([width/2, height/2, groove/2]) rotate([0, 180, 270]) label("Team Katje", groove, 8);
     }
}

Yeah, engraving is just that simple! Subtract the shape of the text from the shape of the lid. The most tricky part is actually trying to rotate it and making it fit. But it also tells us how we can create a box. We basically make a box and subtract the lid from it! I will also subtract 6 cylinders from a side for the holes to the round box on the side. And we will also subtract the inner space from the box so it has space:

module Box(width, height, depth, groove){
    difference(){
         cube([width, height, depth]);
         union(){
             translate([groove, groove, groove]) cube([width-2*groove, height-2*groove, depth-2*groove]);
             translate([width*1/5, groove*3/2, depth/2]) rotate([90, 0, 0]) cylinder(h=groove*2, r=groove*2, $fn=precision);
             translate([width*4/5, groove*3/2, depth/2]) rotate([90, 0, 0]) cylinder(h=groove*2, r=groove*2, $fn=precision);
             translate([width/2, groove*3/2, depth*1/5]) rotate([90, 0, 0]) cylinder(h=groove*2, r=groove*2, $fn=precision);
             translate([width/2, groove*3/2, depth*4/5]) rotate([90, 0, 0]) cylinder(h=groove*2, r=groove*2, $fn=precision);
             translate([width/2, groove*3/2, depth/2]) rotate([90, 0, 0]) cylinder(h=groove*2, r=groove*2, $fn=precision);
             BoxLid(width, height, groove, groove); 
         }
     }
}

Or actually, I combine the five holes, the box content and the lid together into a single shape and subtract the whole shape from the box. This way, I know the lid will fit perfectly on the box.

All I have to do now is create the second, cylindrical box on top of the holes. For this, I again start with creating a lid:

module CylinderLid(width, groove){
    union(){
         cylinder(h=groove, r=(width-groove*2)/2, $fn=precision);
        translate([0, 0, groove])
         difference(){
             cylinder(h=groove, r=(width-groove*2)/2-groove, $fn=precision);
             cylinder(h=groove, r=(width-groove*2)/2-groove*2, $fn=precision);
         }
     }
}

Again, a simple procedure of stacking two cylinders on top of each other. However, to make slightly more room, the smaller cylinder is hollowed by subtracting an even smaller cylinder. The result is a ridge to keep the lid in place.

And like the box, I will subtract the cylinder lid from a hollowed cylinder:

module CylinderBox(width, height, groove){
    difference(){
         cylinder(h=height, r=(width-groove*2)/2, $fn=precision);
         union(){
                 CylinderLid(width, groove);
                 translate([0, 0, groove]) cylinder(h=height-groove, r=(width-groove*2)/2-groove, $fn=precision);
         }
     }
}

Now I know that the cylinder plus lid will be exactly the proper height together. All I have to do now is put them all together:

width=40;
height=80;
depth=width;
groove=2;
cylinderhight=12;

Box(width, height, depth, groove);
translate([0, 0, depth+3]) NamedBoxLid(width, height, depth, groove);
translate([width/2, groove, depth/2]) rotate([90, 0, 0]) CylinderBox(width, cylinderhight, groove);
translate([width/2, height/2, depth+10]) rotate([180, 0, 0]) CylinderLid(width, groove);
translate([width/2, height/2, depth]) cylinder(h=10, r=1, $fn=precision);

I first specify the sizes for the box in millimeters. It will be 4x4x9.2 CM in size, which is large enough for a 9 volt battery plus holder, some wires and maybe some other stuff.

Next, I create the box at the standard location. Above the box, I put the lid, with some space in-between to keep them separate. The cylinder is put against the box itself around the holes but the lid for the cylinder is put on top of the lid for the box itself.

The last thing connects the box and the two lids to make it all a single part. On Shapeways, you have to pay extra if it is all separate parts so make sure you connect it all together with thin connectors that you can easily cut away with a sharp knife. And yes, I checked to make sure it doesn’t go through the line of text on the lid!

Now, what does it all look like? Well, just look at these images:

It is all quite easy to do, although it requires plenty of math to get things in the proper locations. I haven’t explained the math part and I won’t. I’m just showing what you can do with OpenSCAD and a few hours of free time.

The result is a .STL file that you can upload to Shapeways to have it printed. As an alternative option, you can also convert it to an .OBJ file format and import the box into Poser, and add a Poser model on top of the cylinder lid, just for fun.

Or you will add textures to the shape and use it in Poser/Vue to create some new image. It can be even more interesting if you can combine a 3D printed model together with a rendered image of the same model.

I still have a lot to learn about OpenSCAD but this tool allows you to specify exactly the shapes that you want for your 3D model and allows you to create modules that will allow you to do some complex things like separating a lid from a box.

The Celebrity Hacks…

Posted on 2014/09/23 by W.A. ten Brink

For those who are still hiding in some cave, there’s something going on called “The Fappening“. It’s a celebrity scandal that involve ‘selfies‘ taken by some famous people, most of them female and in various stages between clothed and fully nude. People claim that these celebrities should not have taken nude selfies to begin with but I strongly disagree with that opinion. People should just have respect for the privacy of others and this includes the privacy of celebrities.

Unfortunately, we live in a society where the price of a used tampon can be hundreds of dollars worth, if used by someone very famous, like Miley Cyrus or Jennifer Anniston. Preferably with a certificate explaining how it was retrieved and when it was used. Just no respect for their privacy, since people can earn lots of money with it. And that’s also true with these celebrity hacks.

To make matters worse, there are plenty of people out there who will make fake pictures of those celebrities. Some are very obviously fake. Others use a look-alike model to make the photo more real. But in this case, the photo’s seem to be mostly real pictures of those celebrities with maybe a few fake ones to make it appear an even bigger hack.

Now, telling celebrities to stop taking nudies (nude selfies) is like telling people to not use their right of free speech. It would violate their own freedom of expression. Why would the girl next-door be allowed to take nudies while Victoria Justice should not do so? Well, the girl next-door is not as interesting as a target than someone famous. Besides, thousands of girls (and boys) have ended up as victims of the same crime because they shared the nudie with their lover, and that lover would then publish the nudie once the relation ends. (Something called revenge porn.) But those pictures will maybe draw attention from 10 to 20 other viewers while a nudie of Ariana Grande would draw the attention of thousands, maybe even millions, of people.

Basically, exposing nudies of other people without their permission should be considered a criminal offense, almost as criminal as rape. (And as a copyright violation, but that’s generally a misdemeanor and often something for the Civil Court, not the Criminal Court.) So if your ex-lover uploads your nudie to a revenge porn site, he (or she) should be arrested and punished for it. And sites that allow this kind of revenge porn should be considered to be criminal organisations and anyone visiting them or uploading pictures to those sites should face criminal charges too. Harsh? Yes, but our modern society seems to require such hard actions against these offenders. Besides, there are plenty of legal ways to publish nudes. You just need the consent of the model, and plenty of models are willing to pose for such images.

Problem with the Fappening is that no one seems to know how the hacker(s) gained access to these selfies, although it is assumed that iCloud from Apple isn’t secure enough. Most celebrities seem to favour Apple products over Android products and all investigations seem to focus on the iCloud. Since the iPhone camera can synchronize any picture it takes with the cloud, it also explains how those photo’s ended up on the Internet in the first place. Thus, if an iCloud account is hacked, those photo’s can start roaming all over the Internet.

One cause of this leak is the insecurity of the iCloud. It seems as if the photos are stored without any form of encryption on the iCloud servers. I’m not 100% sure about that, but Apple has a good reason to not use encryption: decryption takes time and thus slows down the system. But I don’t know why the phone itself cannot do the encryption or decryption of those pictures.

Basically, the iCloud account would contain a private key and every device that is used to connect to this account will receive a private key, after the user requests for it. When this happens, an email should also be sent to the user account to warn her (or him) that a new public key has been generated. Thus, if a hacker gains access to the account, he will need the public key, which will warn the user.

This public key would then be used when the user is uploading or downloading photos. Thus, the encryption happens in the phone, not in the cloud. So if anyone has access to the cloud data, they still won’t be able to see the pictures. This will generate much more privacy for the user. Besides, the encryption could also happen within iTunes so the user can synchronize with her computer. And all data the user has should be encrypted before the iCloud receives it.

That would be an important security upgrade by Apple, but users should also take some steps to secure themselves. Perez Hilton has an expert naming a few options but I don’t fully agree with those. To start with, he advises that every celebrity start using a new email address. That’s just wrong, because they can’t throw away the old one.

A better option for celebrities is to register their own domain name. Most of them already have one anyways. Use this domain to generate a bunch of email aliases and use each alias for any specific contact or account that you have. For example, your address to register your iPhone would be apple.phone@example.com while your registration for Amazon would be amazon.com@example.com. Or, if you want to keep up some bookkeeping, just use random codes for every contact. For example, bb001@example.com for Apple and bb002@example.com for Amazon. (In which case you could apply filters that will label incoming emails based on those aliases.)

Maintaining your email aliases this way is quite easy with e.g. the services of Google. There are plenty of alternatives too, but I just like Google so I use them as example. All you need to do is to create one user account and connect it to your domain. If you want to give your lover, your mom or your children with their own accounts then just add a few more user accounts. And if you have some expert handling your ICT requirements, then let the expert have his own user account too. Your own account should be set up as a catch-all address and you could set it up to filter all incoming emails based on the alias that receives it. You will have to add each address in the settings of your email application too, so you can respond with that alias too, instead of your regular mail address.

Another advice is to strengthen your security questions. Please don’t do that. Security questions are just a crappy way to make people think they’re secure while it just opens an extra attack vector to your account. It’s easier to just answer these questions with about 20 random letters and quickly forget about them. Just make sure you don’t forget your password.

The third advise is similar to my advice of using a whole domain. The difference is that my option is still a single account and all your mail will be received by that single account. Thus, you can create a lot of aliases and still support them with ease. Creating multiple email accounts will become troublesome once you have over 20 of those accounts. Basically, it means that you have to check 20 accounts every day instead of just one…

And the last advice sucks if you’re a celebrity, but I have to agree with it. Still, if you want to be famous, you need people to talk about you so some private information needs to leak out. Or you should get some reputation as bad Diva or whatever, walking on stage in a dress made of meat, having your nipple pierced and filmed to show on YouTube or start dating a homeless person with a criminal record just to draw attention. (Sitting naked on a wrecking ball seems to help too…)

But other information, like the name of your dog, the names of your family members and even where you’re taking your date out for dinner are things that are better kept private. You can still have a big influence on the Internet without exposing much of your private information. Your fans will continue to follow you no matter how crazy you act online.

Do keep in mind that you need to uphold a large fan base if you want to continue to profit from your fame. Having these nudies exposed to the public is horrific, but it is also an opportunity to get more fame. For example, Paris Hilton made a sex video of herself and her lover that got exposed to the Internet. Before that happened, barely anyone knew her. But the attention of this scandal did increase her popularity and provided her a lot of new opportunities. Some of the actresses who have become victims are already trying to spin the event into new opportunities for the future. They are trying to still get something positive about all this negative attention.

Besides, beneath our clothes, we’re all naked. We all are sexual beings who often do silly things that are better left to our own private information. You, the victimized celebrities have done nothing wrong. The ones who took those pictures from your private accounts are the real criminals. They are the ones to blame, they are the ones who need to be punished for the whole thing…

Tricky spammer!

Posted on 2014/06/05 by W.A. ten Brink

As usual, spammers trying to fool me and many others, and the best way to protect you against them is by sharing how they operate. (And by using a proper spam filter, which is part of Google mail. And today some message was in my spam folder which seemed to be legitimate. Well, okay… There was another hint telling me something wasn’t right. Multiple hints even.

Delivered-To: 
Received: by 10.50.83.72 with SMTP id o8csp50152igy;
        Thu, 5 Jun 2014 10:35:17 -0700 (PDT)
X-Received: by 10.180.76.210 with SMTP id m18mr17979380wiw.49.1401989716698;
        Thu, 05 Jun 2014 10:35:16 -0700 (PDT)
Return-Path: 
Received: from sm1.white-lines.net (sm1.white-lines.net. [188.65.149.28])
        by mx.google.com with ESMTP id cn1si16467631wib.60.2014.06.05.10.35.16
        for <vip@watb.nl>;
        Thu, 05 Jun 2014 10:35:16 -0700 (PDT)
Received-SPF: pass (google.com: domain of  designates 188.65.149.28 as permitted sender) client-ip=188.65.149.28;
Received: by sm1.white-lines.net id hi2736000dsi for ; Thu, 5 Jun 2014 17:35:15 +0200 (envelope-from )
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
From: Security Team <security@security-fix-required.com>
Return-Path: bounce-
To: 
Subject: Your website has a security leak!
Message-ID: 
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101
 Thunderbird/24.3.0
Date: Thu, 05 Jun 2014 17:35:15 +0200

Hello,

during a routine check, we discovered that the server hosting your domain h=
as a security leak and is currently vulnerable. Your website is at risk of =
being hacked! It's also running an outdated PHP version.

For further security details and secure managed server offers, please visit=
 our website:

http://www.security-fix-required.com//

Thank you,

Security Division
Managed Root Server

So, what did they do to make it seem legitimate? Well, it was a simple plain-text email with just a small amount of text. Apparently someone discovered a security leak in my website and is warning me about it. Since there are always white-hat hackers on the Internet who search for such things to warn the site owners, it could be legitimate. It even seems an automated message from an automated vulnerability scanner. So, it will probably fool a few people into clicking on the link in the email.

And that was the first thing that set me off. The domain name is a bit long and the URL ends with what seems to be a GUID or other identifier. If I would click on it, the site would confirm my address as legitimate and perhaps it would redirect me to some online advertisement or even a malware site. So, first lesson: If a URL has a weird number in it, it should be automatically suspicious!

Of course, the message doesn’t give me any information, just a warning. If they had detected something, they could have included a few more details. At least, they could have named the domain that they’ve checked. I have multiple domain names so this warning tells me nothing about the site.

They also mentioned a leak in an older version of PHP in my website, but my website doesn’t use PHP. I know this blog does, but this blog is hosted. It’s not on my server. And the host is making sure it stays safe with the latest updates. (At least, I hope they do but fortunately they have many other customers too.) If they had left away the remark about PHP, it might have looked more legitimate.

The fact that they don’t leave a name is reasonable, since hackers prefer to be anonymous. But hackers would use an alias instead, not some name of some server.

Of course, it also helped that this email ended up in my spam folder. Reporting spam thus helps protect others.If it had not been in my spam folder I would have reported it as spam myself, so Google would recognise it as spam in the future.

Some further analysis by using RobTex tells me the domain is very new. It was registered today, so probably not blacklisted yet. A Google search for the domain name is also interesting. These two should offer plenty of warnings about the site.

Of course, this wasn’t the only spam message, but it was the most tempting. Another message I received tried to sell me a specific kind of blue pills. A third one tempted me with some video but not only did Google detect it as spam, My virus scanner detected the URL inside the spam as potentially malicious. And Ruby Palace wants me as visitor, even though online gambling sites are illegal in the Netherlands if they target Dutch consumers. Since the email was in Dutch, one extra law was broken.

Again, the best weapon against spam is educating people about all the tricks spammers use and to make sure spam gets reported as such. If you use Yahoo mail, Windows Live email or Google mail, reporting spam as such should be a simple option.

MtGox is close to bankrupt.

Posted on 2014/05/22 by W.A. ten Brink

TodaY I received a PDF file called “Announcement of Commencement of Bankruptcy Proceedings_212014” And basically, it tells me that MtGox, a bitcoin market, is definitely going bankrupt. But that was to be expected. I have less than a single euro in bitcoins at MtGox I have no regrets for trying out their service. But plenty of other people have made big investments in bitcoins and stored them at MtGox. Chances are that they will have lost it all, since MtGox has plenty of bills it needs to pay first.

To make it more complex, its unclear if bitcoins can be considered equal to money or not. They’re just a collection of bytes in a specific order and format and they’re worth exactly what people are willing to pay for them. It will be interesting to see what the Japanese court system will think of the value of bitcoins. People might still get their bitcoins if the Liquidator thinks they’re worthless. But if the system in Japan is similar to the Netherlands, that Liquidator could just auction off all bitcoins that MtGox still have to pay off the debts. The remaining cash would then be compensation for anyone who had their bitcoins stored there.

Of course, plenty of other countries (the USA and UK) are probably willing to dig into the action and try to get some financial compensation too. Plenty of American people have lost a lot of money because of this. But the Japanese government goes first and all others have to pick the remaining bones. And I don’t think there will be a lot of meat left on those bones…

The lesson learned from this is, of course, that bitcoins aren’t that safe. Especially if you have them stored at some bitcoin site as MtGox. You are losing control over your money and considering how much bitcoins have been worth in the past, being careless with them can cause a big financial blow. Then again, people can also lose bitcoins if they store them on their own systems. Bitcoins on your phone can get lost if your phone is stolen or damaged. Bitcoins on your computer are always at risk of getting wiped away. I’ve heard of one guy who threw away his old laptop and later learned that he had a few thousands of bitcoins on it, each worth over $1,000 in cash! A very expensive mistake, although he had mined them himself so he did not really lose money. He just made no profits from the mining.

So, please consider what you’re doing when you will use some crypto-money like bitcoins. Make sure you’re well-informed and don’t buy them in large quantities if you just want to save your money somehow. It’s better to just start mining them yourself so your losses can be under control.

And yes, banks can go bankrupt too, but crypto-currency is a bit more riskier since there’s no proof to tell that you really owned them. Once they’re gone, you won’t get them back. This is still something that you should leave to true pioneers who are willing to take risks.

The email itself:

関係人各位

株式会社MTGOX（以下「MTGOX」といいます。）につき、平成26年4月24日午後5時00分、東京地方裁判所より破産手続開始決定がなされ、当職が破産管財人に選任されました（東京地方裁判所平成26年（フ）第3830号）。
今後、破産管財人において、MTGOXの財産管理換価、債権調査等の破産手続を遂行していきます。
つきましては、関係者に対する情報提供を目的として、破産手続に関する基本的事項を添付のとおりお知らせいたしますので、ご確認ください。

なお、このメールアドレス（mtgox_trustee@noandt.com ）は破産管財人からの送信専用であり、貴殿が本メールアドレス宛の返信等をされても内容確認及び回答などの対応はできません。
破産手続の進行等については、ウェブサイト（ http://www.mtgox.com/ ）で情報提供をする予定ですので、当該ウェブサイトをご確認ください。
宜しくお願いいたします。

破産者株式会社MTGOX 破産管財人弁護士小林信明

To whom it may concern,

At 5:00 p.m. on April 24, 2014, the Tokyo District Court granted the order for the commencement of the bankruptcy proceedings vis-à-vis MtGox Co., Ltd. (“MtGox”), and based upon such order, I was appointed as the bankruptcy trustee (Tokyo District Court 2014 (fu) no. 3830).
The bankruptcy trustee will implement the bankruptcy proceedings, including the administration and realization of the assets and investigation of the claims.
For the purpose of providing information to the related parties, we hereby inform you of the basic matters regarding the bankruptcy proceedings as attached.

This email address（mtgox_trustee@noandt.com） is used only for the purpose of sending messages, and we are unable to check and respond to any replies to this email address.
Since we plan to provide the information regarding the bankruptcy proceedings by posting it on the website hosted by the bankruptcy trustee ( http://www.mtgox.com/ ), please check this website.

Bankrupt MtGox Co., Ltd. Bankruptcy trustee Attorney-at-law Nobuaki Kobayashi

Betaalverzoek inzake CJIB

Posted on 2014/05/19 by W.A. ten Brink

Once more some stupid spammer trying to get people to pay them lots of money. It was sent to my sister who could not understand how she had to pay so she asked me how. I quickly discovered that this is a big scam and told her so. And I’m posting it here to warn other people about this scam too and how scammers try new tricks every time hoping for the suckers who are scared enough to pay.

Since this scam was written in Dutch, I will continue in the Dutch language.

Mijn zus ontving vandaag deze email van het “CJIB” betreffende een verkeersboete van 155 euro. Het dreigt ermee dat haar bankrekening wordt geblokkeerd met ingang van 13 mei, wat dus al gebeurd zou zijn. Ze moet voor 19 mei betalen, dus op de dag dat ze de email ontving. En ja, dat is de manier waarop spammers proberen om hun slachtoffers mee onder druk te zetten zodat ze betalen zonder na te denken.

Wat belangrijk is, is hoe de spammers aanwijzingen geven om een prepaid credit card aan te schaffen om zo de boete mee te betalen. Vervolgens moet je naar een site toe, waar geeneens een domeinnaam aan hangt. Het is een URL met IP adres 153.122.39.197 en daarbinnen een folder. Daar zie je vervolgend een vrij kaal scherm met een betaalknop.

Klik je vervolgens verder dan krijg ik met Google Chrome al een waarschuwing dat de site is geblokkeerd wegens phishing. Ik neem even het risico en kom bij het volgende plaatje. Daar moet de 3B pincode worden ingevuld, waarna de oplichter de gehele creditcard kan leeghalen. Wie uiteindelijk een 19-cijferig nummer invoert krijgt vervolgens een pagina te zien die aangeeft dat de betaling succesvol was (terwijl ik een willekeurig nummer gebruikte) en ik zal binnen drie tot 5 dagen bericht krijgen van de belastingdienst.

Belastingdienst?

Het bedrag van 155 euro komt mooi overeen met de hoogste waarde van de betreffende maatschappij. Gelukkig hebben ze al door dat er dergelijke nepmails over het Internet gaan zodat iedereen op Beltegoed Opwaarderen daar nog eens de waarschuwing over deze oplichterij te zien krijgt.

Jammer dat de waarschuwing onder de betaalknoppen staat en niet erboven, waar ze nog beter opvallen. Maar iedereen zou dit toch als een waarschuwing moeten zien. Hopelijk is het duidelijk genoeg maar er zullen altijd mensen zijn die in dit soort oplichterij trappen.

Hoe komt het dat er zoveel mensen in trappen? Dat is heel simpel. Dergelijke berichten worden vaak naar grote aantallen adressen verstuurd. Als 1% van de bevolking er in trapt en ze versturen het naar 100.000 adressen dan zijn dat toch al weer 1.000 slachtoffers. En dat maal 150 euro maakt het een winstgevende actie, maar wel illegaal. Gelukkig is het percentage slachtoffers nog veel lager dan 1% maar al zijn er 10 slachtoffers in die grote groep, het geld komt dan wel binnen met relatief weinig moeite.

Hoe kun je je wapenen tegen deze oplichters? Eigenlijk moet je daarvoor gewoon goed opletten en goed weten hoe bepaalde bedrijven en organisaties werken. Het CJIB zal echt niet via prepaid creditcards betaald willen worden. Het CJIB zal sowieso nooit via het Internet boetes proberen te innen.

Dergelijke constructies zijn vooral bedoeld om geld weg te sluizen zodat het slachtoffer er niet meer bij komt. Je bent het geld gewoon kwijt zodra je op deze manier hebt betaald. Ook de creditcard maatschappij kan het niet terugkrijgen omdat ze het beltegoed erop gebruiken om bijvoorbeeld een duur 06-nummer mee te bellen. Dan is de creditcard leeg en ligt het geld bij een telefoon maatschappij die het weer moet doorbetalen aan een bel-bedrijf. En van daar gaat het geld weer verder weg van het slachtoffer.

Wat ook van belang is, is dat de site nergens om mijn persoonlijke gegevens vraagt. Deze staan zelfs niet in de email. Het is gericht aan de bestuurder, zonder zelfs een nummer van een kentekenplaat te vermelden. Dat kunnen de oplichters ook niet want ze hebben deze gegevens niet. Als iemand een rekening per email verstuurt dan zou je toch meer gegevens in de email verwachten. Het gebrek aan deze persoonlijke gegevens is ook een waarschuwing.

Wie technisch iets handiger is kan ook nog eens naar de ‘headers’ van de email kijken om te bepalen waar de email vandaan komt. En dan blijkt dat de email afkomstig is van hetzelfde IP adres als de site zelf. Een adres dat ergens in Japan te vinden is. Mogelijk een Japanse computer die onderdeel is geworden van een botnet en dus misbruikt wordt zonder dat de eigenaar dit beseft. Om de oplichter te vinden is dit dus geen behulpzame manier. Daarvoor zul je het geld moeten volgen…

Maar sowieso moet je altijd oppassen met verzoeken tot betalen per email. Eigenlijk zou je dat standaard moeten weigeren, tenzij je zeker bent dat het iets betreft dat je nog moet betalen.

Nu nog even de volledige email zoals deze is ontvangen via de hotmail account van mijn zuster:

x-store-info:4r51+eLowCe79NzwdU2kRyU+pBy2R9QCj0/8P6fDMVumMo6iGJG5XQGQsGw4y+KC5jGdX6A7+/ZVHRw3c8psWXtc+cAfssqe5kw3LdG9RbC+kh049fg5aL5vFishJNonRedbn/JCR2Y=
Authentication-Results: hotmail.com; spf=none (sender IP is 153.122.39.197) smtp.mailfrom=cjibnoreply@cjib.nl; dkim=none header.d=cjib.nl; x-hmca=none header.id=cjibnoreply@cjib.nl
X-SID-PRA: cjibnoreply@cjib.nl
X-AUTH-Result: NONE
X-SID-Result: NONE
X-Message-Status: s1:n
X-Message-Delivery: Vj0xLjE7dXM9MDtsPTA7YT0wO0Q9MjtHRD0yO1NDTD02
X-Message-Info: OR3oMfwJnYHF1wanhF69C9Yey20TK9h7x9GWXuv5yaEGAfYu81s5sUj6V3GqMLsbaFOGIxV4jNuK1YTPnnwB8khYxF5czLKOeqtp5CEeiwA6KP8+eQfiSR4aZ+C9AR+10UtHFivL+rY5J1BgXCW7aHs
+IXGFCGuG7VDEq8ZxsEs1ttSXkle85ecru4AU5KBKfNEdJylVvJENsulQeQGWmUjowK3sd7ew
Received: from vps1.cpanel.net ([153.122.39.197]) by BAY0-MC6-F21.Bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4900);
Fri, 16 May 2014 18:16:02 -0700
Received: from [62.140.132.229] (port=27929 helo=newran)
by vps1.cpanel.net with esmtpa (Exim 4.82)
(envelope-from <cjibnoreply@cjib.nl>)
id 1WlTE6-0002gc-Bo; Sat, 17 May 2014 10:15:51 +0900
Reply-To: <noreply@cjib.nl>
From: “Centraal Justitieel Incassobureau”<cjibnoreply@cjib.nl>
Subject: Betaalverzoek inzake CJIB
Date: Sat, 17 May 2014 03:15:51 +0200
MIME-Version: 1.0
Content-Type: multipart/related;
boundary=”—-=_NextPart_000_0040_01C2A9A6.59B75712″
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname – vps1.cpanel.net
X-AntiAbuse: Original Domain – hotmail.com
X-AntiAbuse: Originator/Caller UID/GID – [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain – cjib.nl
X-Get-Message-Sender-Via: vps1.cpanel.net: authenticated_id: newran/only user confirmed/virtual account not confirmed
Bcc:
Return-Path: cjibnoreply@cjib.nl
Message-ID: <BAY0-MC6-F21LjANJQ000b8ac21@BAY0-MC6-F21.Bay0.hotmail.com>
X-OriginalArrivalTime: 17 May 2014 01:16:02.0669 (UTC) FILETIME=[91B0C9D0:01CF716D]

This is a multi-part message in MIME format.

——=_NextPart_000_0040_01C2A9A6.59B75712
Content-Type: text/html;
charset=”Windows-1251″
Content-Transfer-Encoding: 7bit

<HTML><HEAD><TITLE></TITLE>
</HEAD>
<BODY bgcolor=#FFFFFF leftmargin=5 topmargin=5 rightmargin=5 bottommargin=5>
<FONT size=2 color=#000000 face=”Arial”>
<DIV>
<IMG align=middle border=0 width=400 height=69 src=”cid:00E9BAC800C5$03195E81$0100007f@uhxyhwczmgwjdgc”></DIV>
<DIV align=center>
 </DIV>
<DIV align=center>
 </DIV>
<DIV>
 </DIV>
<DIV>
Geachte bestuurder,</DIV>
<DIV>
 </DIV>
<DIV align=center>
 </DIV>
<DIV>
U hebt een beschikking en vervolgens twee aanmaningen ontvangen voor het overtreden van een verkeersvoorschrift.</DIV>
<DIV>
Het openstaande bedrag is niet volledig op de rekening van het Centraal Justitieel Incassobureau (CJIB) bijgeschreven.</DIV>
<DIV>
Daarom zullen wij de bank opdracht gegeven uw rekening te blokkeren per dinsdag 13 mei 2014.</DIV>
<DIV>
Alleen persoonlijk bij het BKR zelf kunt u inzage krijgen in de informatie die het BKR over u ontvangt.</DIV>
<DIV>
Het blokkeren van rekening betekent dat de toegang tot uw rekening geblokkkeerd is met ingang 13-05-2014 voor een periode van vier werken.</DIV>
<DIV>
 </DIV>
<DIV>
 </DIV>
<DIV>
Met de 3v online krediet kunt u online op onze website de betaling voldoen. U dient hieronder te klikken op<B><I> </B></I><I>3v credit kopen</I> .</DIV>
<DIV>
<B> </B></DIV>
<DIV>
<B> </B></DIV>
<DIV>
<A href=”http://beltegoedopwaarderen.nl/3v”><FONT color=#0000FF><B><U>3v</B></U></FONT></A><A href=”http://beltegoedopwaarderen.nl/3v”><FONT color=#0000FF><B><U> credit
kopen</B></U></FONT></A></DIV>
<DIV>
<B> </B></DIV>
<DIV>
Let op: nadat uw de 3v (prepaid credit) heeft gekocht dient u de 19 cijferige nummercode hieronder te activeren om de betaling te voldoen.</DIV>
<DIV>
Klik hieronder op <I>aanmaning betalen</I><B><I>.</B></I></DIV>
<DIV>
<B> </B></DIV>
<DIV>
<B> </B></DIV>
<DIV>
<A href=”http://153.122.39.197/~newran/”><FONT color=#0000FF><B><U>Aanmaning betalen</B></U></FONT></A></DIV>
<DIV>
Het volledige bedrag van Eur 155,00 (inclusief kosten) moet uiterlijk 19-05-2013 worden betaald. Doet u dit niet, dan wordt u per 19-05-2014 geregisteerd bij BKR.</DIV>
<DIV>
Voorkom blokkade van uw rekening.</DIV>
<DIV>
 </DIV>
<DIV>
<B> </B></DIV>
<DIV>
<B> </B></DIV>
<DIV>
Hoogachtend,</DIV>
<DIV>
<IMG align=middle border=0 width=120 height=60 src=”cid:00C18EFDDDDC$00C87F7D$0100007f@uhxyhwczmgwjdgc”></DIV>
<DIV>
Centraal Justitieel Incassobureau.</DIV>
<DIV>
<B> </B></DIV>
<DIV align=center>
 </DIV>
<DIV align=center>
 </DIV>
<DIV align=center>
 </DIV>
</FONT>
</BODY></HTML>

——=_NextPart_000_0040_01C2A9A6.59B75712
Content-Type: image/jpeg;
name=”2007-04-05_handtekening.jpg”
Content-Transfer-Encoding: base64
Content-ID: <00C18EFDDDDC$00C87F7D$0100007f@uhxyhwczmgwjdgc>

[SNIP – Some UUEncoded data]

——=_NextPart_000_0040_01C2A9A6.59B75712
Content-Type: image/jpeg;
name=”download.jpg”
Content-Transfer-Encoding: base64
Content-ID: <00E9BAC800C5$03195E81$0100007f@uhxyhwczmgwjdgc>

[SNIP – Some UUEncoded data]

——=_NextPart_000_0040_01C2A9A6.59B75712–

Multithreading, multi-troubling.

Posted on 2014/02/27 by W.A. ten Brink

Recently, I worked on a small project that needed to make a catalog of image files and folders on my hard disk and save this catalog in a database. Since my CGI and my photography hobby generated a lot of images, it would be practical to have something easy to support it all. Plenty of software that already does something like this, but none that I liked. Especially since I want to connect images to derived images, group them, tag them, share them, assign licenses to them and publish them. And I want to keep track of where I’ve shared them already. Are they on Flickr? CafePress? DeviantArt? Plus, I wanted to know if they should be rated as adult. Some of my CGI artwork is naughty by nature (because nude models are easier to work with) and thus unsuitable for a broad audience.

But for this simple catalog I just wanted to store the image folder, the image filename, an image name that would be the filename without extension and without diacritics, plus the width and height of the image so I could calculate the image ratio. To make it slightly more complex, the folder name would be a relative folder name based on a root folder that’s set in the configuration. This would allow me to move the images to a different folder or use the same database on a different machine without the need to adjust all records.

So, the database structure is simple. One table that has the folders, one table containing image ratios and one for the image names and sizes. The ratio table will help me to group images based on the ratio between width and height. The folder table would do the same for grouping by folder. The Entity Framework would help to connect to this database and take away a lot of my troubles. All I have to do now is write a simple library that would fill and keep up this catalog plus a console application to call those methods. Sounds simple enough.

Within 30 minutes, the first version was ready. I would first enumerate all folders below the source folder, then for each folder in that list I would collect all image files of type PNG, JPG and BMP. The folder would be written to the folder table and the file would be put in the Image table. Just one minor challenge, though…

I want to add the width and height of the image to the image table too, and based on the ratio between width and height, I would have to either add a new ratio record, or change an existing one. And this meant that I had to read every file into memory to find its size and then look if there’s already a ratio record related to it. If not, I would need to add the new ratio record and make sure the next request for ratio records would now include the new ratio record. Plus, I needed to check if the image and folder records also exist in the database, because this tool needs to update only for new images.

The performance was horrible, as could easily be predicted. Especially since I make images and photo’s at high resolutions, so reading those files does take dozens of milliseconds. No matter that my six cores at 3.5 GHz and 32 GB of RAM turns my system in a Speed Demon, these read actions are just slow. And I did it inefficiently since I have six cores but my code is just single-threaded. So, redo from start and this time do it multithreaded.

But multithreading and the Entity Framework don’t go well together. The database connection isn’t threadsafe and thus you cannot access the database methods from multiple threads. Besides, the ratio table could generate collisions when two images with the same, new ratio are processed. Both threads would notice the ratio doesn’t exist thus both would add it. But one of those would then fail because the other would have added it first. So I needed to change my approach.

So I Used ‘Parallel.ForEach’ to walk through the folder list and then again for all files within the folder. I would collect the data in internal lists and when the file loop was done, I would loop through all images and add those that didn’t exist. And yes, that improved performance a lot and kept the conflicts with the ratio table away. Too bad I was still reading all images but that was not a big issue.Performance went up from hours to slightly over one hour. Still slow.

So one more addition. I would first read all existing folders and images from the database and if a file existed in this list, I would not read it’s size anymore since it wasn’t needed. I could skip the image. As a result, it still took an hour the first time I imported all images, but the second run would finish within a minute, since there wasn’t anything left to read or add. The speed was limited to just reading the files and folders from the database and from the disk.

When you’re operating these kinds of projects in an Agile team and you’re scrumming around, things will slow down considerably if you haven’t thought about these challenges before you started the sprint to create the code. Since the first version looks quite simple, you might have planned it as a very short task and thus end up with extremely slow code. In the next sprint you would have to consider options to speed things up and thus you will realize that making it multithreaded is a bigger task. And while you are working on the multithreaded version, you might discover the conflicts with the Entity Framework plus the possible collisions within the tables. So the second sprint might end with a buggy but faster solution with lots of exception handling to catch all possible problems. The third sprint would then fix these, if you manage to find a better solution. Else, this problem might haunt you to the deadline of the project…

And this is where teams have to be real careful. The task sounds very simple, but it’s not. These things are easily underestimated by a team and should be well-planned before you start writing code. Experienced developers will detect these problems before they start, thus knowing that they should take their time and plan carefully without writing code immediately. (I only did it so I could write this post.) The task seems extremely simple and I managed to describe it in the second paragraph of this post with just three lines. But the solution with a high performance will require me to think before I start writing code.

My last approach is the most promising, though. And it can be done by using multithreading but it’s far more complex than you’d assume at first. And it will be memory-hungry because you need to create several lists in memory.

You would have to start with two threads. One thread will read the database and generate lists of files, folders and ratios. These lists must be completely in-memory because if you keep them as queryable lists, the system would try to continuously read them. Besides, once you’re done generating these lists you will want to close the database connection. This all tells you what you already have. The second thread will read all folders and by using parallel threads it would have to read all image files within those folders. But you would not read the image sizes yet, nor calculate all ratios.

When you’re done collecting the data, you will have to compare it all. You would start by comparing the lists of folders. Folders that exist in both lists can be ignored (but not their files.) Folders that exist in the database list but not the disk list should be deleted, including all files within those folders! Folders that are on disk but not in the database need to be added. Thus you can now start two threads, each with their own database connection. One will delete all folders plus their related images from the database that have been deleted while the other adds all new folders that are found on the disk. And by using two database connections, you can speed things up. You will have to wait for both threads to finish, though. But it shouldn’t be slow.

The next step would be the comparison of images. Here you do something similar as with folders. You split the lists in three different lists. One with all images that are unchanged. One with all images that need to be deleted. And one with all images that need to be added. And you would create a separate thread with its own database connection to delete the images so your main process can start working on the ratios table.

Because we now know which images need to be added, we can go through those files using parallel processing, read the image width and height and add this information to the image file records. When we have enriched this list with these sizes, we can use a LINQ query to generate a list of all ratios of those images and removing all duplicate ratios in this list. This generates the list of ratios that we would need to check.

Before we add the new images, we will have to check the ratios table. As with the folders table, we check for all differences. However, we cannot delete ratios that we haven’t found among the images, because we skipped the images that already exist. We will do this later, though. We will first start adding the new ratios to the database. This too can be done in a separate thread but it’s pretty fast anyways so why bother? A performance gain of two seconds isn’t worth the extra effort if a process takes minutes to finish. So add the new ratios.

Once all ratios are added, we can add all images. We could do this using parallel threads, with each thread creating a new database connection and processing all images from one specific folder or with one specific ratio. But if you want to add them multi-threaded I would just recommend to divide the images in groups of similar sizes. Keep the amount of groups relative to the number of processes (e.g. 24 for my six cores) and let the system do its work. By evenly dividing the images over multiple threads, they should all take about the same amount of time.

When adding the new images, you will have to find the related folder and ratio in the database again. This makes adding images slower than adding folders or ratios because you need the extra lookup. This performance would increase if we had kept the Folders and Ratio lists as queryable lists but then we could not open and close the connections, not could we use multiple connections to add those images. And we want multiple connections to speed things up. So we accept a slightly worse performance at this point, although we could probably speed it up a bit by using a stored procedure to add the images. The stored procedure would have parameters for the image name, the image filename, the width and height, the folder name and the ratio width and height. I’m not too fond of procedures with many parameters and I haven’t tested if this would increase the performance, but in theory it should be faster, especially if the database is on a different machine than the application.

And thus a simple task of adding images to a database turns out to be complex, simply because we need better performance. It would still take hours if it has a lot of new images to add but once you have it mostly filled, it will do quite well.

But you will have to ask yourself and your team if you are capable to detect these problems before you start a new sprint. Designs are simple, because designers don’t always keep the performance in mind. These things are easily asked for because they appear very simple, but have a lot of consequences. Similar problems might arise when you work with projects that need to be secure. The design might ask for a login screen with username and password, and optionally a few OpenID providers as alternative logins, but the amount of code to manage all this data and keep it secure is quite complex. These are real moments when you need to design some technical documentation first, which is something people often forget when working on an Agile project.

Still, you cannot blame the developer if the designer just writes a few lines and the developer chooses the first, slow solution. The result would be the requested task. It is the designer who needs to be aware of these possible performance pitfalls. And with Agile, you have a team. All team members should be able to point out that this simple description would have these pitfalls, thus making it a long and complex task. They should all realise that they will have to discuss possible solutions for this and preferably they do so as a team with just one computer. (The computer would be used to find information, not to write code!) Only when they agree on the proper solution then one or two of them could start writing code. And they would know how long this task will take. Thus, the task would finish within two sprints. In the first sprint, all team members would have a small task to meet and discuss the options. In the second sprint, one or more members would have a big task of implementing the code.

Or, to keep it simple: think before you start writing code!

Is XML in decline?

Posted on 2014/02/21 by W.A. ten Brink

I happen to be one of those older software developers who saw the rise of XML. I even remember the older SGML standard, although I never used SGML. Version 1.0 of XML became an official standard in 1998. Once it became a standard, many companies started working to create the Killer App to work with XML without much of a hassle. And although at first many companies started to create their own XML parsers, not all of them were completely conform the standard. Those parsers disappeared fast enough too.

Right now, version 1.1 of XML is the latest standard. Yes, in 16 years not much has happened to this standard. And the changes that have been applied are more about supporting EBCDIC platforms and the newer Unicode definitions. There are discussions about a version 2.0 but it’s not likely to become a standard soon. Strange as it might sound, XML seems to be in decline if you look at how it’s used.

The power of XML was, of course, in the way how you defined these files and how you could do transformations on these file types. While we used DTD definition files at first to define the structure of an XML file, some smart people came up with the XSD schema format, which allowed more flexibility and is by itself an XML file. Combined with some nice, graphical tools, the XSD made it easier to define an XML file and to validate if an XML file conforms to the proper structure. And I’ve made plenty of XSD files between 2000 and 2010 since my work required a lot of XML data exchanges.

Of course, transformations are also important and here we use stylesheets. An XSLT file would be made in XML itself and define how you would convert an XML file to some other output format. In general, this output would be another XML file, an HTML document to display it in a web browser, a simple text file or even a comma-separated file. And in some special cases it could even create a complete rich text document that you could open in Word. This meant that you could e.g. send an XML file to a server and the server would then process it. It would validate the file with a schema and could do additional validation tools by using a style sheet. If it passed these validation style sheets, other stylesheets could then be used to extract data from the XML and send it to other servers for further processing, while it could also generate documentation to return to the user. You could do a lot of processing with just XML files.

Of course, XML also became popular because more developers started to create web services. And they used the SOAP protocol for this, which is a slightly complex protocol that’s heavily dependant on XML standards. Since SOAP also had some build-in version mechanism, you could always make sure if the client was still using the right SOAP definitions or not. You could even use several SOAP message formats on the same system with only the version number as difference. It wasn’t easy to set up, but it worked extremely well.
And more has been developed to support XML even more. The XPath expressions would allow you to point to specific elements within an XML document. With XQuery, you could execute queries on XML files and process the result. With namespaces you could even combine multiple XML definitions that uses similar entities. And then we have things like XLink, XPointer and XForms, which never have been very popular.

Between 2000 and 2010, it seemed that XML would be a dominating development technique. No more writing code in other programming languages that needed to be compiled, simply because XML happens to be a fast scripting environment. Many platforms started to have a standard for objects that could process XML files and knowledge of XML became a hard-needed requirement for developers. So, what changed?

Well, many developers consider the XML format a bit bulky, especially because tags are often used twice. Once to open the element and once to close it. Thus, if an element is called ‘NumberOfElements‘ then you have to write <NumberOfElements>10</NumberOfElements> and that’s a lot of text to store the number 10. As a result, some developers would then shorten those tag names so the resulting XML would be smaller. If you have 10,000 of these tags in your XML file, shortening it to TOE would save 26 characters per element, thus 260,000 characters in total. This doesn’t seem much but developers feel they gain more by these kinds of optimizations. With modern multi-core processors and systems with 8 or more GB of RAM, such optimizations might make the code half a second faster, which you barely notice with web services, but still… Developers think it saves a lot. And yes, when resources are truly limited, it makes a lot of sense but modern mentalities are that companies will just add a second server if one is too slow. Or more, if need be. This is because the costs of the more hardware is less expensive than the costs of having developers optimize the code even further.

These kinds of optimizations make XML files less human-readable while the purpose was to make this kind of data more readable. It becomes slightly worse when the XML file uses namespaces, since those namespaces are also shortened to just a few letters.

Another problem is the need to parse XML to extract the data. More and more companies are creating web applications that run within web browsers and heavily rely on JavaScript. These apps need to be able to run on multiple devices too. Unfortunately, not all browsers support parsing XML files and even those who do are a bit complex to use. With regular expressions it’s still possible to extract some data from the XML but if you need to fill a grid with 50 rows and 20 columns, things become real complex. And to solve this, developers started to send data to web applications as JavaScript instead of XML. This could then be executed and thus the data would load itself into memory. Since JavaScript objects are less bulky than the begin/end tags of XML elements, it made this new format very practical and thus JSON was born.

The birth of JSON also demanded a change in web services. Since web applications would call these services directly, it would be very clumsy if they have to set up SOAP messages and then parse the SOAP results. A newer, simpler style of web services arose, which uses the REST protocol. Of course, there are many other web service protocols but REST seems to become the new standard. Especially because it’s a simpler protocol that relies on the HTTP(s) protocol.

Of course, web applications have become more important these days because we’re getting more and more devices with all kinds of different operating systems, which all have web browsers. And, as I said, not all of those devices have a native XML parser built-in. They do support JavaScript though, and as a result it becomes quite easy to develop web applications for all devices which use data in JSON formats.

Of course, many devices also allow special platform-dependant apps that can be created with development tools for their specific platforms. For OS X and iOS-based devices you would use Objective C while you would use C++ or Java for Android devices. (Java is the preferred development platform for Android.) For Windows RT you would use .NET for Metro-style applications with either VB or C# as primary language. This makes it a bit difficult to develop software that runs on all three devices but there are several parties who have created compilers that will compile platformdependent executables from platform-independent code. Unfortunately, working with XML parsers still differs on all these platforms and those third-party compilers need to wrap their parsers around the built-in parsers of the underlying platform. That makes them a bit slow.

Since the number of operating systems have risen since the market starts getting more and more new devices, it becomes more difficult to keep a single standard that’s supported by all those systems. And the XML standard is quite complex so the different parsers might not all support the same things. In that regard, JSON is much simpler since these are just simple assignment statements. And these assignment statements are based on the Java syntax, which also happens to be similar to the C++, C# and Objective C syntax. The only difference with these languages is the fact that JSON puts the field names between quotes too, which you can’t do inside these languages.

So, XML is becoming less useful because it requires too much work to use. JSON makes data serialization simpler and is less bulky. Especially when developers are more focussing on web applications and apps for specific devices, the use of XML is in decline in favor of JSON and other solutions. But there’s one more reason why XML is in decline. And this is something within the .NET framework that’s called LINQ.

LINQ was implemented as a separate library for .NET version 3.5 but has become popular since then. Basically, LINK allows you to support data in a structured object and use simple queries to, or to execute transformations on extract data from those objects. This would be similar to XPath and XSLT but now it’s part of your development language, allowing you more choice in functions that you can apply to the data. This is especially important for date fields, since XML doesn’t work well with date formats. LINQ actually makes extracting data from object trees quite easy and can be used on an XML document if you’ve read this document in memory in a proper XDocument or XmlDocument object. Thus, the need for XSLT to transform data has disappeared since you can do the same in C#, VB, F# or Oxygene.

The result is that .NET developers don’t have to learn about XML anymore. Their .NET knowledge combined with LINQ is more than enough. Since .NET also allows serialization to and from XML formats, it’s also quite easy to read and write XML files in .NET. You can import an existing XSD file into your .NET application and have it converted to code, but since most XML data starts as objects that need to be stored in XML before serialization, you will often see that developers just define the objects and include attributes to tell if the object and its fields are elements or attributes, and have the serialization library use these object definitions to serialize it to and from XML. Thus, knowledge of XML schemas is not a requirement anymore.

Because .NET development made the dependency on XML knowledge almost obsolete, the popularity of XML is in decline. It’s still used quite often, but the knowledge that you need to do practical things with XML with XML tools is disappearing. And similar things are happening on other platforms. Java and PHP also started supporting LINQ queries. And, as a result, those environments can work on structured objects instead of XML data. Thus, XML is only needed if the data needs to be sent to some other process and even then, other formats might be chosen too.

In fact, many developers are less concerned about the data format that’s used for inter-process communication. The system is handling this for them and they just use a specific serialization library that does the bulk of the work for them. XML isn’t really declining, but less developers need knowledge about the XML format since development tools have nice wrappers around them that allow these developers to use XML without even realizing they’re using XML. It’s not XML that’s in decline. It’s the knowledge about XML that is in decline…

Software engineering and CGI art

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: