Friday 16 September 2016

Notes on Scalability


Moore's law is often summarized as computing power doubling roughly every 18 months, but with the ever-increasing demand for information, it is still a challenge to build and maintain highly available applications that can withstand heavy load.

Recently, I started building services that need to handle several thousand requests per second and, like every engineer, I wanted to know whether the system was performing well on average and, if not, what I could do about it.

A typical Java EE application exposes a thread pool to handle requests and delegates one thread to handle each request.
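To make the thread-per-request idea concrete, here is a minimal sketch using a plain ExecutorService. It is not how any particular Java EE container is implemented; the class name RequestPoolSketch, the method handleAll, and the pool size are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a container's request-handling thread pool.
final class RequestPoolSketch {

    // Submits `requests` fake requests to a pool of `poolSize` threads and
    // returns one result line per request, in submission order.
    static List<String> handleAll(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                // Each submitted task is picked up by exactly one pool thread.
                pending.add(pool.submit(
                        () -> "request " + id + " handled by " + Thread.currentThread().getName()));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // wait for the worker thread to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("request handling failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String line : handleAll(4, 10)) {
            System.out.println(line);
        }
    }
}
```

With 4 threads and 10 requests, some requests have to wait in the pool's queue until a thread is free, which is the bottleneck the ideas in this post try to address.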

Ideas

1. Let us increase the number of threads

As easy as it may sound, this is not always the right choice. System resources are not infinite, and thread context switching is expensive. Moreover, each thread has its own execution stack, so if we are running in a memory-constrained environment, adding threads can exhaust memory and crash the application.

2. Let us limit the number of concurrent requests we can service

This is another nice idea, but it makes the system seem unresponsive at peak times. It turns out that in certain applications, we want the system to be always available no matter the load. What can we do?
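Before moving on, idea 2 can be sketched with a counting semaphore. This is only one possible way to cap concurrency, not something a container does for you in this form; RequestLimiter, tryHandle and availableSlots are hypothetical names.

```java
import java.util.concurrent.Semaphore;

// Hypothetical limiter: at most `maxConcurrent` handlers run at the same time.
final class RequestLimiter {
    private final Semaphore permits;

    RequestLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the handler if a slot is free; otherwise rejects immediately.
    boolean tryHandle(Runnable handler) {
        if (!permits.tryAcquire()) {
            return false; // the caller could answer with e.g. HTTP 503 here
        }
        try {
            handler.run();
            return true;
        } finally {
            permits.release(); // always free the slot, even if the handler throws
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

A rejected call returning false is exactly the "not responsive at peak times" behaviour described above: the server protects itself, but the client sees a failure.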

When the JVM process starts, it is allocated a default heap size and a stack size per thread (typically 64k-1024k depending on the platform, though you can change this with the -Xss option if you want to).

Let:
M = number of concurrent requests
A = stack size per thread
B = object allocation per thread
D = total memory assigned to the JVM process

For the system to keep performing well without running out of memory, we need

M(A+B) <= D    (1)

What this means is that if we have M concurrent requests, the total memory consumed while all requests are being serviced is M(A+B), and we want this to be no more than the total memory we have available. How can we ensure that the above inequality holds?
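As a quick worked example of inequality (1), the sketch below computes the largest M that still fits. The numbers in main are assumptions chosen for illustration (A = 1 MiB, the upper end of the stack range mentioned above; B = 3 MiB; D = 4 GiB), and CapacityEstimate is a made-up class name.

```java
// Back-of-the-envelope check of M(A+B) <= D. All sizes are in bytes.
final class CapacityEstimate {

    // Largest M such that M * (A + B) <= D.
    static long maxConcurrentRequests(long stackPerThread, long heapPerThread, long totalMemory) {
        if (stackPerThread < 0 || heapPerThread < 0 || totalMemory < 0
                || stackPerThread + heapPerThread == 0) {
            throw new IllegalArgumentException("sizes must be non-negative and A + B must be positive");
        }
        return totalMemory / (stackPerThread + heapPerThread);
    }

    // True if M concurrent requests fit in the memory budget.
    static boolean fits(long m, long stackPerThread, long heapPerThread, long totalMemory) {
        return m * (stackPerThread + heapPerThread) <= totalMemory;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long a = 1 * mib;    // assumed stack size per thread
        long b = 3 * mib;    // assumed object allocation per thread
        long d = 4096 * mib; // assumed total memory for the JVM process (4 GiB)
        System.out.println("max concurrent requests: " + maxConcurrentRequests(a, b, d));
    }
}
```

With these assumed numbers each request costs 4 MiB, so M can be at most 4096 / 4 = 1024. This is a rough model: it treats every request as costing the same and ignores memory shared between requests.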

1. Reduce the value of M.

How do we reduce the number of concurrent requests per system? By horizontal scaling. Horizontal scaling means adding more servers to handle the same requests and putting a load balancer in front of them. This way, each server receives a smaller number of concurrent requests.
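The simplest way a load balancer can spread requests is round robin. The sketch below is only an illustration of that one strategy, not a description of any real load balancer product; RoundRobinBalancer, pick and the server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin balancer: hands out servers in a repeating cycle.
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    // Returns the server that should receive the next request.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

With N servers behind such a balancer, each one sees roughly M/N of the concurrent requests, which is how horizontal scaling lowers M per server in inequality (1).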

2. Increase the value of D

How do we increase the total memory available to the system? By vertical scaling. Vertical scaling is the process of making the existing server more powerful by adding more RAM and CPU power. This way, the same server can handle more concurrent requests.
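After scaling a server up, it is worth checking what the JVM actually sees. The sketch below uses the standard Runtime API to print the maximum heap and the CPU count; the class name JvmResources is made up. Note that extra RAM may only help if the JVM is allowed to use it, for example through a larger -Xmx setting.

```java
// Prints the resources the running JVM reports for itself.
final class JvmResources {

    // Maximum heap the JVM will attempt to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Number of processors available to the JVM.
    static int cpuCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapBytes() / (1024L * 1024L));
        System.out.println("processors:     " + cpuCount());
    }
}
```

The heap figure covers only part of D: thread stacks are allocated outside the heap, so the real memory footprint of the process is larger than this number.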

3. Reduce the value of B

B is the object allocation on the Java heap, where Java objects are stored. To reduce object allocation per thread, we need to write better code that does not misuse or leak memory. This can be achieved by following standard programming practices.
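One small, well-known example of wasteful allocation is building a string with += in a loop. The sketch below contrasts it with a StringBuilder; AllocationSketch and both method names are made up, and this is just one of many possible illustrations.

```java
// Two ways to join strings; both return the same text.
final class AllocationSketch {

    // Allocation-heavy: every += creates a new String and copies the old contents.
    static String joinWasteful(String[] parts) {
        String out = "";
        for (String p : parts) {
            out += p + ",";
        }
        return out;
    }

    // Frugal: one StringBuilder is reused for the whole loop.
    static String joinFrugal(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "a", "b", "c" };
        System.out.println(joinWasteful(parts)); // a,b,c,
        System.out.println(joinFrugal(parts));   // a,b,c,
    }
}
```

Multiplied across thousands of concurrent requests, habits like this are what push B up or down.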

A lot more work goes into making a system performant and scalable. These are just basic tips.


Disclaimer Alert ****
I am not by any means a systems designer. I don't claim to know anything about programming.
Please take everything you read here with a grain of salt.