FB may not be the best example of quality design (each of us probably sees at least one error message a day).
However, since it's the 2nd largest site in the world, with constant exponential growth, there is a good lesson to learn from it. Most interesting is that their tech team is only about 250 people, running 30K servers. Pretty amazing.
Not long ago, Jeff Rothschild, the Vice President of Technology at Facebook, gave a presentation at UC San Diego. You can find a detailed summary by Prof. Amin Vahdat, but I'll highlight the top points that I found useful.
- FB software development: mostly PHP (compiled), although other languages are used as well.
- A common interface between services, built on Thrift, an internal project that was later open sourced. It makes it easy to connect services written in different languages.
- A logging framework that does not depend on a central repository or on its availability. FB uses Hadoop as well as Hive (which was developed there). The logs grow by 25TB a day.
- Operational monitoring: separated from the logging mechanism.
- The LAN can be a bottleneck as well: expect packet loss and dropped packets in the LAN if you stress it too much.
- CDN: Facebook uses an external CDN for image distribution.
- A dedicated file system named Haystack that combines simple storage with a cached directory: the file system is accessed only once to fetch an image, while the directory structure is retrieved from the cache.
- Most data is served from Memcached. The database is used mostly for persistence and for data replication between sites (Memcached is warmed up from MySQL itself):
  - Top challenge: keeping data consistent, since Memcached can easily get out of sync (there is no way to search for keys).
  - Mixing information of different sizes and types works better: it makes sure that the load on CPU, memory, etc. is distributed equally.
- Shared nothing: keep your systems independent and avoid a single bottleneck. Therefore, data has been stored in sharded MySQL from day 1. However, MySQL is used mostly for data persistence and not in the conventional relational database usage pattern:
  - No joins in MySQL.
  - Chosen for its good data consistency and management software.
  - 4K servers.
  - Data replication between sites is based on MySQL replication.
  - Memcached is warmed up from the MySQL replication stream using a custom API.
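
To make the sharding and caching points more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Facebook's actual code: the names `ShardedStore`, `shard_for`, `warm_cache_from_log` and `friends_of`, the shard count, and the hash-based routing are all hypothetical, and plain dictionaries stand in for the MySQL shards and for Memcached. It shows three ideas from the list: each row lives on exactly one shard, the cache is refreshed from a replication log rather than by the reader, and a "join" is done in application code instead of in the database.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard index with a stable hash (an assumed scheme)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy model: dicts stand in for the MySQL shards and for Memcached."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [{} for _ in range(num_shards)]  # one dict per "MySQL shard"
        self.cache = {}              # stand-in for Memcached
        self.replication_log = []    # stand-in for the replication stream
        self.db_reads = 0            # counts reads that had to hit a shard

    def write(self, user_id, row):
        # Persist the row on its shard and record the change for replication.
        self.shards[shard_for(user_id, self.num_shards)][user_id] = row
        self.replication_log.append((user_id, row))

    def warm_cache_from_log(self):
        # Apply pending replication events to the cache, then drop them.
        for user_id, row in self.replication_log:
            self.cache[user_id] = row
        self.replication_log.clear()

    def read(self, user_id):
        # Serve from the cache when possible, otherwise go to the owning shard.
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        return self.shards[shard_for(user_id, self.num_shards)].get(user_id)

    def friends_of(self, user_id):
        # "No joins": fetch the user, then fetch each friend by key.
        user = self.read(user_id)
        if user is None:
            return []
        return [self.read(friend_id) for friend_id in user["friend_ids"]]


store = ShardedStore()
store.write(1, {"name": "alice", "friend_ids": [2]})
store.write(2, {"name": "bob", "friend_ids": []})
store.warm_cache_from_log()
print([friend["name"] for friend in store.friends_of(1)])
```

Once the cache has been refreshed from the log, the friend lookup above is answered entirely from the cache, which matches the point that the database is there for persistence rather than for serving reads.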
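
The Haystack idea (one storage access per image, with the directory held in a cache) can also be sketched in a few lines of Python. This is only an illustration of the principle, assuming an append-only blob and an in-memory index; the class name `TinyHaystack`, its methods, and the use of `io.BytesIO` in place of a real file are hypothetical and say nothing about how Haystack is actually implemented.

```python
import io


class TinyHaystack:
    """Toy model: many images appended to one blob, with an in-memory index."""

    def __init__(self):
        self.store = io.BytesIO()  # stand-in for one large file on disk
        self.index = {}            # photo_id -> (offset, size), kept in memory
        self.store_reads = 0       # counts accesses to the underlying store

    def put(self, photo_id, data):
        # Append the image bytes and remember where they were written.
        self.store.seek(0, io.SEEK_END)
        offset = self.store.tell()
        self.store.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # The location comes from memory, so the store is touched only once.
        offset, size = self.index[photo_id]
        self.store.seek(offset)
        self.store_reads += 1
        return self.store.read(size)


haystack = TinyHaystack()
haystack.put("photo-1", b"first image bytes")
haystack.put("photo-2", b"second image bytes")
print(haystack.get("photo-2"))
```

The design choice to notice is that looking up where an image lives never touches the storage layer, so each request costs a single read, whereas a regular file system may need several metadata lookups per file.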
Keep Performing,
Moshe Kaplan. Performance Expert.