In my previous blog post I talked about the reasons we started NewVoiceMedia. I also want to talk about some of the experiences we had before the business began, and how they led us to architect things in the way we did and helped determine our cloud rules.
Before we created NewVoiceMedia, we had worked on enterprise call routing systems used by customers such as BT, Microsoft and Hewlett-Packard. These systems shared many characteristics of the cloud systems we see today:
- They had to be scalable to support tens to hundreds of thousands of users
- They had to be very reliable – users expect telephony to always work
- They had to be easy to deploy and use, with simple web user interfaces
Our experience with these systems led us to the following cloud tenets:
1. Never install software on the customer’s site. This includes Java and plug-ins
2. Never require firewall configuration by the customer
3. Make it multi-tenant
4. Log everything. Whatever you think is too much logging is never enough. Acquire data aggressively and measure everything
5. Keep everything. Our biggest asset is our data, and we'll always want to come back and analyse it again later
6. Performance matters. Beware of premature optimisation. Optimise what your measurements tell you needs optimising. Hardware is cheap, but make sure your algorithms scale
However, this experience was hard won; we made many mistakes along the way:
(1) & (2) Never install software on the customer’s site and never require firewall configuration by the customer
Before NewVoiceMedia, we wanted to deliver a rich user experience over the web. So we made what seemed like the obvious decision and used Java applets for our user interface. The first problem with Java is that you must have it installed on each machine. This can be an obstacle on its own when you are talking about tens of thousands of machines in an enterprise environment, where it is yet another application to be installed, managed and patched. Then your application will only work on a subset of Java versions, and there are probably other Java applications that the user wishes to run. If the version requirements of your application and those other applications don't match, you reach an impasse that forces tough choices on the customer. You are also tied to the customer's upgrade schedule for Java versions.
Chances are, your Java application needs to talk back to your server, and unless you do complex HTTP tunnelling, you’re going to need the customer to configure their firewall to allow this. This is much more complex than you would first think; it's often difficult, if not impossible, to get some customers to do this, for very valid security reasons on their side.
We learnt the hard way that use of plugins such as Java and Flash, and any need for firewall configuration, gives you enormous friction with the customer and big support headaches. It also removes device and location independence, one of the big benefits of the cloud, as you can't just log in from any device at any location.
Avoiding plugins did cost us in the early days of NewVoiceMedia, as our user interfaces were not as slick as others', but it has paid off greatly now that pure web technologies have caught up in richness.
(3) Make it multi-tenant
From our experience in scaling, we realised that we had to make everything multi-tenant from day zero. It's the only way you can make your system truly scalable and get the best performance from it. Also, if you’re not multi-tenant then the support of multiple systems and multiple software versions will kill you!
I know of one SaaS company we spoke to that had 27 versions of its software running in the field. There is no way you could make that reliable and support it without an enormous and expensive team.
(4) Log everything
In the software we wrote before NewVoiceMedia, we always found it difficult to debug those complex, infrequent issues. We learned the hard way that the only way to have a truly reliable complex system is to have great insight into what is happening at all levels of the system. You need very detailed, configurable logging to show you what is going on so you can understand how your system is behaving and what exactly happened in that one in a million circumstance when something went wrong.
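As a minimal sketch of what "detailed, configurable logging" can look like, the Python example below uses the standard `logging` module. The `make_logger` and `route_call` helpers, the `routing` logger name and the call and agent identifiers are all hypothetical and purely illustrative; this is not how our own system is implemented.

```python
import io
import logging


def make_logger(name, level, stream):
    """Return a logger whose verbosity can be changed at runtime."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep output confined to our own handler
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def route_call(logger, call_id, agent):
    """Hypothetical routing step that records what it did at two levels of detail."""
    logger.debug("call=%s evaluating agent=%s", call_id, agent)
    logger.info("call=%s routed to agent=%s", call_id, agent)


stream = io.StringIO()
log = make_logger("routing", logging.INFO, stream)

route_call(log, "c-1001", "alice")  # DEBUG detail is suppressed at INFO level
log.setLevel(logging.DEBUG)         # turn the detail up without changing code
route_call(log, "c-1002", "bob")    # now both DEBUG and INFO lines are recorded

output_lines = stream.getvalue().splitlines()
```

The useful property here is that the detailed messages are always present in the code, and the level decides how much of it is recorded, so extra detail is available when a rare problem needs investigating.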
When you are dealing with a complex system that is running at scale, those one-in-a-million things are happening all the time, far more frequently than you would think.
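A quick back-of-the-envelope calculation shows why. The figure of five million events per day below is an assumption chosen purely for illustration, not a measurement from our system, and the function names are hypothetical.

```python
def expected_rare_events(events_per_day, probability):
    """Average number of rare events per day, assuming independent events."""
    return events_per_day * probability


def chance_of_at_least_one(events_per_day, probability):
    """Probability that the rare event occurs at least once in a day."""
    return 1 - (1 - probability) ** events_per_day


EVENTS_PER_DAY = 5_000_000  # assumed volume, for illustration only
ONE_IN_A_MILLION = 1e-6

per_day = expected_rare_events(EVENTS_PER_DAY, ONE_IN_A_MILLION)
p_any = chance_of_at_least_one(EVENTS_PER_DAY, ONE_IN_A_MILLION)

print(f"expected per day: {per_day:.1f}")
print(f"chance of at least one per day: {p_any:.3f}")
```

Under these assumptions a one-in-a-million event is expected several times every day, and a day without one is the exception.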
(5) Keep everything
Data is an enormously powerful asset and storing it is getting cheaper and cheaper. Our data is important for many reasons. We store a lot of data about how the system has performed because it helps us understand how our usage and performance change over time.
We are constantly finding new uses for our data and it enables us to provide a more reliable, higher performing service and better predict future usage.
(6) Beware of premature optimisation
Making systems scale is difficult. Optimising systems is difficult. It's easy to get bogged down in optimising the details, the individual elements of the system, when the real issue is the way the system is architected or the choice of algorithms used. You shouldn't be afraid to throw lots of computing power at a problem; it's always worth betting on Moore's Law. We have a big advantage in applying a lot more computing power to an individual call than traditional vendors would. However, you do need to have a good understanding of Big O Notation and the differences in latency between cache, memory, disk, local network and wide area networks.
In summary, we had the experience of building cloud telephony solutions and made our mistakes early – before we started NewVoiceMedia.
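To make the point in rule (6) about algorithm choice concrete, here is a minimal Python sketch comparing two ways of checking a list for duplicates. The function names and the `call_ids` data are hypothetical; the step counters exist only to make the difference in growth visible.

```python
def has_duplicate_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    steps = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            steps += 1
            if items[i] == items[j]:
                return True, steps
    return False, steps


def has_duplicate_linear(items):
    """Remember items already seen in a set: O(n) expected-time lookups."""
    seen = set()
    steps = 0
    for item in items:
        steps += 1
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps


call_ids = list(range(1000))  # worst case: no duplicates at all

found_q, steps_q = has_duplicate_quadratic(call_ids)
found_l, steps_l = has_duplicate_linear(call_ids)

print(f"quadratic: {steps_q} steps, linear: {steps_l} steps")
```

For 1,000 items the pairwise version does n(n-1)/2 = 499,500 comparisons against 1,000 set lookups, and the gap widens as n grows. Faster hardware shifts both numbers by a constant factor, which is why measuring first and then fixing the algorithm tends to pay off more than tuning the inner loop.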