FB may not be the best example of quality design (most of us probably run into at least one error message a day).
However, since it's the 2nd largest site in the world with constant exponential growth, it's a good one to learn from. Most interesting: their tech team is only about 250 people running 30K servers, which is pretty amazing.
Not long ago, Jeff Rothschild, Vice President of Technology at Facebook, gave a presentation at UC San Diego. You can find a detailed summary by Prof. Amin Vahdat, but here are the top points that I found useful.
- FB software development: mostly PHP (compiled), though other languages are used as well.
- Common interface between services: Thrift, an internally developed framework that was later open-sourced. It makes it easy to connect services written in different languages.
- Logging framework that does not depend on a central repository or on its availability. FB uses Hadoop as well as Hive (also developed at Facebook). Log volume grows by 25 TB a day.
- Operational monitoring: kept separate from the logging mechanism.
- The LAN can be a bottleneck as well: if you stress it too much, expect packet loss.
- CDN: Facebook uses an external CDN to distribute images.
- Haystack, a dedicated file system that combines simple storage with a cached directory: the directory structure is retrieved from the cache, so the file system is accessed only once to read an image.
- Most data is served from Memcached. The database is used mainly for persistence and for replicating data between sites (Memcached is warmed from MySQL itself):
  - Top challenge: keeping data consistent, since Memcached can easily get out of sync (it offers no way to search for keys).
- Mixing data of different sizes and types is better, since it spreads the load on CPU, memory, etc. evenly.
- Shared nothing: keep your system's components independent and avoid any single bottleneck. Therefore, data has been stored in sharded MySQL from day one. However, MySQL is used mostly for persistence rather than in the conventional database usage pattern:
  - No joins in MySQL
  - Chosen for its good data consistency and management software
  - 4K servers
  - Data replication between sites is based on MySQL replication
  - Memcached is warmed from MySQL replication using a custom API
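To make the Haystack point more concrete, here is a minimal Python sketch of the idea, not Facebook's actual implementation: photos are appended to one large volume file, and a small in-memory index maps each photo id to its offset and size, so a read needs a single seek and read. The class and method names (`HaystackVolume`, `put`, `get`) are hypothetical, and an `io.BytesIO` object stands in for the file on disk.

```python
import io
from dataclasses import dataclass


@dataclass
class Needle:
    """Location of one photo inside the volume file (hypothetical layout)."""
    offset: int
    size: int


class HaystackVolume:
    """Toy append-only photo store: one big file plus an in-memory index.

    Illustration only: the directory lookup happens in memory, so each
    read costs a single seek + read on the underlying file.
    """

    def __init__(self, fileobj):
        self._file = fileobj   # any seekable binary file object
        self._index = {}       # photo_id -> Needle, kept in RAM

    def put(self, photo_id, data: bytes) -> None:
        self._file.seek(0, io.SEEK_END)   # append at the end of the volume
        offset = self._file.tell()
        self._file.write(data)
        self._index[photo_id] = Needle(offset, len(data))

    def get(self, photo_id) -> bytes:
        needle = self._index[photo_id]    # metadata comes from memory, no disk I/O
        self._file.seek(needle.offset)    # one positioned read on the "disk"
        return self._file.read(needle.size)


volume = HaystackVolume(io.BytesIO())
volume.put(1, b"jpeg-bytes-of-photo-1")
volume.put(2, b"photo-2")
print(volume.get(2))  # b'photo-2'
```

Keeping the index small enough to live in RAM is what turns "find the file, then read it" into a single disk access.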
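The Memcached and sharded MySQL points can be combined into a small look-aside cache sketch. This is an illustration under stated assumptions, not Facebook's code: plain dicts stand in for Memcached and for each MySQL shard, users are assigned to shards by `user_id % num_shards`, and writes invalidate the cached copy after updating the shard (one common way to handle the consistency challenge mentioned above). All names are hypothetical.

```python
class FakeShard:
    """Hypothetical stand-in for one MySQL shard: a dict keyed by user_id."""
    def __init__(self):
        self.rows = {}


class UserStore:
    """Look-aside cache (Memcached stand-in) over sharded storage (MySQL stand-in)."""

    def __init__(self, num_shards: int):
        self.shards = [FakeShard() for _ in range(num_shards)]
        self.cache = {}      # plays the role of Memcached
        self.db_reads = 0    # counts how often we fell through to "MySQL"

    def _shard_for(self, user_id: int) -> FakeShard:
        # Simple modulo sharding: each user lives on exactly one shard,
        # so a lookup never needs a join across shards.
        return self.shards[user_id % len(self.shards)]

    def get_user(self, user_id: int):
        key = f"user:{user_id}"
        if key in self.cache:            # cache hit: serve from memory
            return self.cache[key]
        self.db_reads += 1               # cache miss: read the owning shard
        row = self._shard_for(user_id).rows.get(user_id)
        if row is not None:
            self.cache[key] = row        # populate the cache for next time
        return row

    def update_user(self, user_id: int, row: dict) -> None:
        # Write to the persistent store first, then invalidate the cached copy.
        # Memcached cannot search its keys, so the writer must know exactly
        # which key to delete; here it is derived from user_id.
        self._shard_for(user_id).rows[user_id] = row
        self.cache.pop(f"user:{user_id}", None)


store = UserStore(num_shards=4)
store.update_user(42, {"name": "Alice"})
store.get_user(42)       # miss: reads shard 42 % 4 == 2
store.get_user(42)       # hit: served from the cache
print(store.db_reads)    # 1
store.update_user(42, {"name": "Alice B."})
print(store.get_user(42)["name"], store.db_reads)  # Alice B. 2
```

Every read touches a single shard, which is what makes the "no joins" rule workable in practice.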
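The last point, warming Memcached from MySQL replication, can be sketched as a consumer that applies replicated row changes to a remote site's cache. Facebook's custom API is not described in the talk summary, so the `ReplicationEvent` record and `apply_to_cache` function below are hypothetical stand-ins, with a dict playing the role of Memcached.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ReplicationEvent:
    """Hypothetical, simplified row change as seen in a replication stream."""
    table: str
    pk: int
    row: Optional[dict]   # None means the row was deleted


def apply_to_cache(events: Iterable[ReplicationEvent], cache: dict) -> int:
    """Keep a remote site's cache in step with replicated MySQL changes.

    Each replicated change refreshes (or removes) the matching cache entry,
    so the remote cache is warmed by the same stream that updates its
    database replica. Returns the number of events applied.
    """
    applied = 0
    for ev in events:
        key = f"{ev.table}:{ev.pk}"
        if ev.row is None:
            cache.pop(key, None)   # row deleted upstream: drop the stale entry
        else:
            cache[key] = ev.row    # row inserted/updated: warm the cache
        applied += 1
    return applied


remote_cache = {"user:7": {"name": "old"}}
stream = [
    ReplicationEvent("user", 7, {"name": "new"}),
    ReplicationEvent("user", 8, {"name": "Bob"}),
    ReplicationEvent("user", 8, None),
]
print(apply_to_cache(stream, remote_cache), remote_cache)
# 3 {'user:7': {'name': 'new'}}
```

Driving the cache from the replication stream means the remote site never serves data newer than its own database replica.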
 
Keep Performing,
Moshe Kaplan. Performance Expert.






