1. YARN Failures
Even perfect software fails. YARN's goal is to minimize downtime, not to prevent component failures.
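The figure below shows how YARN's components communicate with one another during failure monitoring to confirm that each one is still alive. When a failure occurs, each component has its own restart mechanism.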
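To check whether each component is alive, YARN checks them periodically (through heartbeats) and handles any component found to have failed.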
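As a rough illustration of the detect-then-handle loop described above, here is a minimal Python sketch of a heartbeat-based liveness monitor. It is not YARN code: `Component`, `LivenessMonitor`, and their methods are hypothetical names, and the sketch assumes a single supervisor both detects expired components and restarts them. In real YARN the handling differs per component (for example, the ResourceManager relaunches a failed ApplicationMaster up to `yarn.resourcemanager.am.max-attempts` times, default 2, while a NodeManager that stops heartbeating is marked lost). The 600-second expiry here mirrors the default of `yarn.am.liveness-monitor.expiry-interval-ms` (600000 ms).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    """A monitored component (hypothetical model, not a YARN class)."""
    name: str
    last_heartbeat: float
    restarts: int = 0
    failed: bool = False


class LivenessMonitor:
    """Expires components whose last heartbeat is older than expiry_s,
    restarts them, and gives up after max_restarts restarts."""

    def __init__(self, expiry_s: float, max_restarts: int):
        self.expiry_s = expiry_s
        self.max_restarts = max_restarts
        self.components: dict[str, Component] = {}

    def register(self, name: str, now: float) -> None:
        # Registration counts as the first heartbeat.
        self.components[name] = Component(name, last_heartbeat=now)

    def heartbeat(self, name: str, now: float) -> None:
        # Each heartbeat refreshes the component's liveness timestamp.
        self.components[name].last_heartbeat = now

    def check(self, now: float) -> list[str]:
        """One periodic sweep; returns the names restarted in this sweep."""
        restarted = []
        for comp in self.components.values():
            if comp.failed or now - comp.last_heartbeat <= self.expiry_s:
                continue  # already given up on, or still alive
            if comp.restarts < self.max_restarts:
                comp.restarts += 1
                comp.last_heartbeat = now  # treat the restart as a fresh start
                restarted.append(comp.name)
            else:
                comp.failed = True  # too many restarts: stop retrying
        return restarted


if __name__ == "__main__":
    mon = LivenessMonitor(expiry_s=600.0, max_restarts=2)
    mon.register("am-1", now=0.0)
    mon.register("am-2", now=0.0)
    mon.heartbeat("am-1", now=500.0)  # only am-1 keeps heartbeating
    print(mon.check(now=700.0))  # → ['am-2']
```

Time is passed in explicitly rather than read from a clock so the sweep is deterministic and easy to test; a real monitor would run `check` on a timer thread.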
YARN Work-Preserving Restarts configuration:
- Enabled by default in HDP 2.3
- Enables long-term storage of NodeManager logs by storing them in a central location in HDFS
- Avoids the need to truncate logs in order to conserve space on a local file system
- Provides the ability to view log files centrally through a single web UI (the Job History Server)
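A sketch of what the work-preserving restart settings named in the heading might look like in `yarn-site.xml`, generated with Python's standard-library `xml.etree.ElementTree`. The property names are Apache Hadoop 2.x YARN settings, but the values are placeholders assumed for illustration (the recovery directory, the fixed NodeManager port, and the choice of `ZKRMStateStore`); the connection setting the chosen state store also needs is omitted, and `to_yarn_site` is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative values only; paths and ports are placeholder assumptions.
WORK_PRESERVING_PROPS = {
    # ResourceManager restart: persist state and keep running containers.
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
    # NodeManager restart: keep local state in a recovery directory.
    "yarn.nodemanager.recovery.enabled": "true",
    "yarn.nodemanager.recovery.dir": "/var/lib/hadoop-yarn/nm-recovery",
    # NM restart needs a fixed RPC port instead of the default ephemeral one.
    "yarn.nodemanager.address": "0.0.0.0:45454",
}


def to_yarn_site(props: dict[str, str]) -> str:
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_yarn_site(WORK_PRESERVING_PROPS))
```

Generating the file from a dictionary keeps each property's purpose next to its value; in practice these settings are usually managed through a tool such as Ambari rather than written by hand.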
YARN Log Aggregation default configuration:
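To make the log-aggregation settings concrete, the sketch below records the Apache Hadoop 2.x `yarn-default.xml` defaults for the main log-aggregation properties and computes where a finished application's logs land in HDFS under the classic 2.x directory layout. Distributions may override these defaults (the bullet list above notes aggregation is enabled by default in HDP 2.3). `aggregated_log_dir` is a hypothetical helper, and the user names and application IDs in the example are made up.

```python
from __future__ import annotations

# Apache Hadoop 2.x yarn-default.xml values for log aggregation.
LOG_AGGREGATION_DEFAULTS = {
    "yarn.log-aggregation-enable": "false",
    "yarn.nodemanager.remote-app-log-dir": "/tmp/logs",
    "yarn.nodemanager.remote-app-log-dir-suffix": "logs",
    "yarn.log-aggregation.retain-seconds": "-1",  # -1: never delete aggregated logs
}


def aggregated_log_dir(conf: dict[str, str], user: str, app_id: str) -> str:
    """HDFS directory holding one application's aggregated logs,
    assuming the Hadoop 2.x layout {remote-app-log-dir}/{user}/{suffix}/{app_id}."""
    root = conf["yarn.nodemanager.remote-app-log-dir"].rstrip("/")
    suffix = conf["yarn.nodemanager.remote-app-log-dir-suffix"]
    return f"{root}/{user}/{suffix}/{app_id}"


if __name__ == "__main__":
    # Override a default the same way a site file would.
    conf = {**LOG_AGGREGATION_DEFAULTS, "yarn.log-aggregation-enable": "true"}
    print(aggregated_log_dir(conf, "alice", "application_1450000000000_0001"))
    # → /tmp/logs/alice/logs/application_1450000000000_0001
```

On a running cluster, `yarn logs -applicationId <application ID>` reads these aggregated logs back without browsing HDFS directly.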