Technologies on which the world of Instagram runs
Click, edit and upload. That’s the mantra when it comes to social media these days, especially Instagram. It is the one-stop destination where picture-perfect settings and people come together to create an eye-soothing feed. But with millions of users, it is worth asking: what makes this giant function? Here are the technologies that keep Instagram’s wheels turning:
Instagram’s engineers run Ubuntu Linux 11.04 on Amazon EC2. They chose this specific version because previous Ubuntu versions were found to freeze unpredictably on EC2 under high traffic.
For application servers, they run Django on Amazon High-CPU Extra-Large machines, adding new machines as usage grows. They chose High-CPU instances because they found their workload to be CPU-bound rather than memory-bound, so these machines provide the right balance of CPU to memory. To run commands on many machines at once, such as when deploying code, they use Fabric.
Task queue and push notifications
You probably share photos from Instagram to other social media sites often. For jobs like this, and to notify real-time subscribers of a newly posted photo, Instagram uses a task queue system called Gearman.
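The task queue idea can be sketched with nothing but Python’s standard library. This is not Gearman’s API: the `notify_subscribers` job, the `submit` helper and the photo IDs are hypothetical stand-ins, and the queue lives inside one process rather than on a separate server. It only shows how a request can hand work to a background worker and move on.

```python
import queue
import threading

# In-process stand-in for a task queue. A real deployment would use a
# queue server such as Gearman so workers can run on other machines.
jobs = queue.Queue()
delivered = []  # records what the worker did, for demonstration only


def notify_subscribers(photo_id):
    """Hypothetical job: pretend to send a notification for one photo."""
    delivered.append(f"notified subscribers about photo {photo_id}")


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            func, arg = job
            func(arg)
        finally:
            jobs.task_done()


def submit(func, arg):
    """Called from the request path: enqueue the job and return at once."""
    jobs.put((func, arg))


def run_demo():
    delivered.clear()
    thread = threading.Thread(target=worker)
    thread.start()
    for photo_id in (101, 102, 103):
        submit(notify_subscribers, photo_id)
    jobs.put(None)  # tell the worker to stop after the queued jobs
    jobs.join()     # wait until every queued item has been processed
    thread.join()
    return list(delivered)


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The point of the design is that `submit` costs only a queue insert, so the slow part (talking to another service) never holds up the user’s request.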
Every request to the servers passes through load-balancing machines. They use Amazon’s Elastic Load Balancer (ELB) with three NGINX instances behind it, which can easily be taken out of rotation if they fail. They also terminate SSL at the ELB level, which reduces the CPU load on nginx.
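Taking a failed machine out of rotation can be illustrated with a toy round-robin balancer. This is a rough sketch, not how ELB works internally: the class, its method names and the backend names `nginx-1` to `nginx-3` are all made up for the example, and health is flipped by hand instead of by real health checks.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        """Take a backend out of rotation (e.g. after a failed check)."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Put a known backend back into rotation."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend in round-robin order."""
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["nginx-1", "nginx-2", "nginx-3"])
    print([lb.next_backend() for _ in range(4)])
    lb.mark_down("nginx-2")  # simulate a failure
    print([lb.next_backend() for _ in range(3)])
```

Once `nginx-2` is marked down, requests simply alternate between the remaining two machines until it is marked up again.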
Most of the site’s data, including users, tags, photo metadata and more, is stored in PostgreSQL. The main cluster consists of 12 Quadruple Extra-Large memory instances. To get reasonably good IO performance, they set up their EBS drives (EBS is a network disk system) in a software RAID using mdadm. To manage which data is held in memory, they use a tool called vmtouch.
- PostgreSQL: All of their PostgreSQL instances run in a master-replica setup that uses Streaming Replication. They also use XFS as their file system, which allows them to freeze and unfreeze the RAID arrays while snapshotting. This guarantees a consistent snapshot.
- Photos: The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data. For the CDN, they use Amazon CloudFront, which improves image load times for users all over the world.
- Redis: Redis powers the main feed, the activity feed, the sessions system and other related systems. Since all of Redis’ data needs to fit in memory, they run several Quadruple Extra-Large Memory instances for it too.
- Caching: For this, the engineers use Memcached. They connect to it using pylibmc and libmemcached.
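The last two bullets can be pictured with a small in-memory sketch. It does not call Redis or Memcached: `CappedFeed` and `CacheAside` are hypothetical classes written in plain Python, and the idea of a capped, newest-first list of photo IDs is an assumption about how a feed could be laid out, not something Instagram has confirmed here.

```python
from collections import deque


class CappedFeed:
    """Newest-first list of photo IDs that keeps only the latest `limit`."""

    def __init__(self, limit):
        self._items = deque(maxlen=limit)

    def push(self, photo_id):
        # appendleft on a full deque drops the oldest item from the right.
        self._items.appendleft(photo_id)

    def latest(self, count):
        """Return up to `count` photo IDs, newest first."""
        return list(self._items)[:count]


class CacheAside:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._cache = {}            # dict standing in for a cache server
        self._load = load_from_db   # hypothetical slow lookup function
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying data changes."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    feed = CappedFeed(limit=3)
    for photo_id in range(1, 6):
        feed.push(photo_id)
    print(feed.latest(2))

    profiles = CacheAside(lambda user: {"name": user})
    profiles.get("alice")
    profiles.get("alice")  # second read is served from the cache
    print(profiles.db_reads)
```

Both pieces trade memory for speed, which is why the article notes that the data has to fit in memory.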
Running a juggernaut like Instagram takes a lot of tools and workers. This isn’t just because of the site’s huge size, but also because of the changing needs of the site and its subscribers. So the technologies used keep changing with the needs and the times.
The editorial unit