Scaling Instagram Infrastructure

Notes

  • Sending notifications to a person whose photo you liked: RabbitMQ -> Celery (see the Celery sketch after this list)
  • Django / Python for web server / application
  • PostgreSQL to store users, media, friendships, etc.
    • Master with multiple replicas, where reads happen on the replicas (master-slave replication; see the Django router sketch after this list)
    • To deal with increased write latency, batch requests wherever possible
    • Replication lag from the master to the replicas was not a big issue (for them)
  • Cassandra NoSQL (wide column store) to store user feeds, activities, etc.
    • All replicas have the same copy of data, with eventual consistency
  • Image conversion is done at read time, not upload time
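A minimal sketch of the RabbitMQ -> Celery notification flow mentioned above. The broker URL, task name, and push_to_device helper are illustrative assumptions, not Instagram's actual code.

```python
# tasks.py: enqueue the "someone liked your photo" notification instead of
# sending it inside the web request. Broker URL and names are hypothetical.
from celery import Celery

app = Celery("notifications", broker="amqp://guest@localhost//")  # RabbitMQ broker

def push_to_device(user_id, message):
    """Stand-in for the real push-notification service call."""
    print(f"push to user {user_id}: {message}")

@app.task
def send_like_notification(liker_id, owner_id, photo_id):
    # The Django view only enqueues; a Celery worker does the slow work.
    push_to_device(owner_id, f"User {liker_id} liked your photo {photo_id}")

# In the view handling the like, the call is fire-and-forget:
#   send_like_notification.delay(request.user.id, photo.owner_id, photo.id)
```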
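A sketch of sending reads to replicas and writes to the master using Django's database-router hook. The database aliases ("default", "replica_1", "replica_2") are assumptions; any number of replicas works the same way.

```python
# db_routers.py: reads hit a replica, writes hit the master ("default").
# Assumes DATABASES in settings.py defines these aliases.
import random

class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        # Replication lag is tolerated, so any replica will do.
        return random.choice(["replica_1", "replica_2"])

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations only on the master.
        return db == "default"

# settings.py:
#   DATABASE_ROUTERS = ["path.to.db_routers.PrimaryReplicaRouter"]
```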

Scaling

  • Used a "pod", a group of Django and RabbitMQ servers, to represent the application stack for one region
    • Problem #1: Initially memcached lived in each local pod, but that led to data staleness (memcached data in pod #1 could be stale vs. pod #2). Couldn't use memcached across regions and keep global consistency
    • Solution #1: Cache invalidation. When a PostgreSQL insert comes in for pod #1, a daemon on the Postgres read replicas invalidates the affected keys in its local pod (see the invalidation sketch after this list)
    • Problem #2: More database reads, especially for really simple things like the number of likes on a photo, which used to come straight from the cache
    • Solution #2: An indexed table increased read speed by orders of magnitude (see the model sketch after this list)
    • Problem #3: Still high load on the DB though
    • Solution #3: Use memcached lease-get / lease-set (see the lease sketch after this list). Good UX tradeoff: showing 1,000,000 likes vs. 1,000,010 likes (some staleness is okay)
  • Server growth was outpacing user growth - not good
  • Switched from PostgreSQL to TaoDB to avoid long replication times
    • TaoDB is an internal Facebook DB + cache that uses a write-through architecture
    • Simplified object model which makes development easier
  • Use canary deployments to load test in production, watching 500s vs. 200s (see the canary sketch after this list)
  • Deployments to 20,000 servers in about 10 minutes
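A rough sketch of the per-pod cache-invalidation daemon from Solution #1, assuming a changes() generator that yields (table, primary_key) pairs from the local PostgreSQL replica's replication stream; the key format and server address are illustrative.

```python
# invalidator.py: delete local memcached entries when the local Postgres
# replica applies a write that originated in another pod.
from pymemcache.client.base import Client

local_cache = Client(("127.0.0.1", 11211))

def run_invalidator(changes):
    # changes is assumed to yield (table, primary_key) tuples from the
    # replica's replication stream (e.g. via a logical decoding plugin).
    for table, pk in changes:
        # Delete instead of update: the next read repopulates the key from
        # the local replica, so every pod converges on fresh data.
        local_cache.delete(f"{table}:{pk}")
```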
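One way to read the "indexed table" fix in Solution #2 is an index on the column the hot count query filters on, so counting likes no longer scans rows; the model and field names below are hypothetical.

```python
# models.py: index the column the hot count query filters on.
from django.db import models

class Like(models.Model):
    photo_id = models.BigIntegerField(db_index=True)  # makes the count an index scan
    user_id = models.BigIntegerField()
    created_at = models.DateTimeField(auto_now_add=True)

# Like.objects.filter(photo_id=some_id).count() can now be served from the index.
```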
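Stock memcached doesn't expose lease-get / lease-set (those are Facebook extensions), so the sketch below only approximates the idea with add() acting as a short lease: one caller recomputes from the database while everyone else serves a slightly stale copy. Key names, TTLs, and compute_from_db are assumptions.

```python
# lease_sketch.py: approximate lease-get / lease-set to stop a thundering herd.
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def get_like_count(photo_id, compute_from_db):
    key = f"likes:{photo_id}"
    value = cache.get(key)
    if value is not None:
        return int(value)

    # add() succeeds for exactly one caller; that caller "holds the lease"
    # and recomputes the value from the database.
    if cache.add(key + ":lease", b"1", expire=10, noreply=False):
        fresh = compute_from_db(photo_id)
        cache.set(key, str(fresh).encode(), expire=60)
        cache.set(key + ":stale", str(fresh).encode(), expire=3600)  # long-lived fallback
        cache.delete(key + ":lease")
        return fresh

    # Losers of the race serve the stale copy instead of hitting the DB:
    # 1,000,000 vs. 1,000,010 likes is an acceptable tradeoff.
    stale = cache.get(key + ":stale")
    return int(stale) if stale is not None else compute_from_db(photo_id)
```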
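A tiny sketch of the canary gate: compare the 500-vs-200 ratio on canary hosts against the rest of the fleet before rolling forward. How the counts are fetched and the threshold are assumptions.

```python
# canary_check.py: promote the new build only if canary 500s look sane.
def error_rate(counts):
    # counts is assumed to look like {"200": ..., "500": ...}
    total = counts["200"] + counts["500"]
    return counts["500"] / total if total else 0.0

def canary_is_healthy(canary_counts, fleet_counts, max_ratio=1.5):
    # Illustrative threshold: canary error rate may not exceed 1.5x the fleet's.
    return error_rate(canary_counts) <= max_ratio * error_rate(fleet_counts)
```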