sonic-buildimage/files/image_config
Samuel Angebault 0c4d4ace76
[kdump] Fix OOM events in crashkernel (#6447)
A few issues were discovered with the crashkernel on Arista platforms.

1) Platforms using `docker_inram=on` would run out of memory (OOM) in the kdump environment.
This happens because SONiC and the crashkernel use the same initramfs.
With `docker_inram=on`, `dockerfs.tar.gz` is extracted into a `tmpfs` created for the occasion.
Since `dockerfs.tar.gz` weighs more than 1.5G, it does not fit into the kdump environment's memory and triggers an OOM event.
This OOM event can in turn trigger a panic.

2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameters added by SONiC (such as `crashkernel`) were discarded on boot.
This happens because `boot0` is strict about kernel parameter injection in secureboot mode.

3) The secureboot path allowlist would remove kernel crash reports.

4) The kdump service would fail on Arista products because `/boot/` is empty in `secureboot` mode.

**- How I did it**

1) To prevent an OOM event in the crashkernel, the fix avoids the codepaths in `union-mount` that create and populate tmpfs mounts. Some additional codepaths specific to Arista devices are also skipped to make the kdump process faster.
This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization.
The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel.
The most distinctive one is `systemd.unit=kdump-tools.service`, which a few initramfs hooks use to set `in_kdump`.
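
The detection step can be sketched as a small POSIX shell function. This is an illustration, not the actual SONiC hook: the function name `is_kdump_cmdline` is hypothetical, and the sketch only assumes that `/usr/sbin/kdump-config` puts the `systemd.unit=kdump-tools.service` token on the crashkernel's command line, as described above.

```sh
#!/bin/sh
# Sketch only: is_kdump_cmdline is a hypothetical helper, not SONiC code.
# Succeeds if the given kernel command line contains the exact token
# systemd.unit=kdump-tools.service, which kdump-config appends when it
# loads the crashkernel.
is_kdump_cmdline() {
    # Pad with spaces so the token also matches at the start or the end.
    case " $1 " in
        *" systemd.unit=kdump-tools.service "*) return 0 ;;
    esac
    return 1
}

in_kdump=""
if [ -r /proc/cmdline ] && is_kdump_cmdline "$(cat /proc/cmdline)"; then
    in_kdump=1
fi

if [ -n "$in_kdump" ]; then
    # Hypothetical gate: a hook such as union-mount would skip creating
    # and populating the docker_inram tmpfs here.
    echo "kdump environment detected"
else
    echo "regular boot"
fi
```

Matching the whole space-delimited token, rather than a substring, keeps a value like `systemd.unit=kdump-tools.service.bak` from being mistaken for the kdump marker.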

2) To allow `kdump` to work in `secureboot` environment the cmdline generation in boot0 was slightly modified.
The codepath that loads the kernel parameters changed by SONiC now also runs when booting in secure mode.
It was also altered to avoid an append-only behavior that would grow the `kernel-cmdline` at every reboot.
Left unchecked, this ever-growing cmdline would eventually make `kexec` fail to load the kernel because the cmdline was too long.
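
The non-growing update can be illustrated with a hypothetical helper, `set_cmdline_param`, that drops every existing occurrence of a parameter before appending its new value. This sketch is not the `boot0` code; it only shows why replacing instead of appending keeps the cmdline length bounded across reboots. The `crashkernel` values below are placeholders.

```sh
#!/bin/sh
# set_cmdline_param CMDLINE KEY VALUE (hypothetical helper, not boot0 code).
# Prints CMDLINE with every existing KEY or KEY=... token dropped and a
# single KEY=VALUE appended, so repeated calls never grow the cmdline.
set_cmdline_param() {
    cmdline=$1 key=$2 value=$3
    out=""
    set -f  # disable globbing while splitting the cmdline into words
    for arg in $cmdline; do
        case "$arg" in
            "$key"|"$key"=*) ;;              # drop the old setting
            *) out="${out:+$out }$arg" ;;    # keep everything else
        esac
    done
    set +f
    printf '%s\n' "${out:+$out }$key=$value"
}

# Example with placeholder values:
set_cmdline_param "root=/dev/sda1 crashkernel=256M quiet" crashkernel 512M
# prints: root=/dev/sda1 quiet crashkernel=512M
```

Feeding the result back in with the same arguments returns the same string, so running the update at every boot leaves the cmdline unchanged instead of growing it.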

3) To keep kernel crash dumps under `/var/crash`, this path was added to `allowlist_paths`.
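
How a path allowlist decides what survives can be sketched as below. The file format here (one extended regular expression per line, matched against the whole absolute path) is an assumption for illustration and may not match the real `allowlist_paths` format; `path_is_allowed` is a hypothetical name.

```sh
#!/bin/sh
# Sketch only. Assumed allowlist format (hypothetical): one extended regular
# expression per line, each matched against the whole absolute path.
# path_is_allowed PATH ALLOWLIST_FILE: exit 0 if PATH matches any pattern.
path_is_allowed() {
    printf '%s\n' "$1" | grep -Eqx -f "$2"
}

allowlist=$(mktemp)
# Without the /var/crash entry, crash dumps would be treated as removable.
cat > "$allowlist" <<'EOF'
/var/log(/.*)?
/var/crash(/.*)?
EOF

# Example paths only.
for p in /var/crash/202102020155/dump.202102020155 /var/tmp/scratch; do
    if path_is_allowed "$p" "$allowlist"; then
        echo "keep   $p"
    else
        echo "remove $p"
    fi
done

rm -f "$allowlist"
```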

4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used.

**- How to verify it**

Regular boot:
 - enable kdump
 - enable `docker_inram=on` via `kernel-params`
 - reboot
 - generate a crash `echo c > /proc/sysrq-trigger`
 - before: witness OOM events on the console
 - after: crash kernel works and crash available under /var/crash

Secure boot:
 - enable kdump
 - reboot
 - generate a crash `echo c > /proc/sysrq-trigger`
 - before: witness no kdump
 - after: crash kernel works and crash available under /var/crash
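
The final check of both procedures can be scripted. This sketch assumes the Debian `kdump-tools` layout, where each crash is saved in a timestamped directory such as `/var/crash/202102020155/` containing a `dump.*` file; `latest_crash_dir` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only. Assumes each crash is saved as
# <crash_dir>/<YYYYMMDDHHMM>/dump.<YYYYMMDDHHMM> (kdump-tools layout).
# latest_crash_dir [CRASH_DIR]: print the newest directory holding a dump.
latest_crash_dir() {
    crash_dir=${1:-/var/crash}
    latest=""
    for d in "$crash_dir"/*/; do
        [ -d "$d" ] || continue          # the glob matched nothing
        for f in "$d"dump.*; do
            if [ -f "$f" ]; then
                # Glob results are sorted, so the last dump dir is the newest.
                latest=${d%/}
                break
            fi
        done
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

if dir=$(latest_crash_dir); then
    echo "crash dump found in $dir"
else
    echo "no crash dump under /var/crash" >&2
fi
```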


Co-authored-by: Boyang Yu <byu@arista.com>
2021-02-02 01:55:09 -08:00

| Name | Last commit | Date |
| --- | --- | --- |
| apt | change image apt source list from stretch to buster for arm | 2020-05-25 13:15:19 +00:00 |
| bash | [bash.bashrc] Add reverse SSH script to bash.bashrc (#5438) | 2020-11-24 14:11:53 +08:00 |
| config-chassisdb | [ChassisDB]: bring up ChassisDB service (#5283) | 2020-10-14 15:15:24 -07:00 |
| config-setup | Take a copy of existing TACACS credentials and restore it during upgrade (#6285) | 2021-01-07 16:45:38 -08:00 |
| constants | [bgpcfgd]: Fixes for BBR (#5956) | 2020-11-19 00:07:58 -08:00 |
| copp | Copp Manager Changes (#4861) | 2020-11-23 09:31:42 -08:00 |
| corefile_uploader | [Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109) | 2020-12-03 15:57:50 -08:00 |
| cron.d | [core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659) | 2019-10-23 15:55:47 -07:00 |
| ebtables | [baseimage]: Updates for Ebtables and support for multi-asic (#6542) | 2021-01-27 08:36:10 -08:00 |
| environment | [image]: Update login message (#706) | 2017-06-14 15:18:02 -07:00 |
| fstrim | [sonic-utilities] Build and install as a Python wheel package (#5409) | 2020-09-20 20:16:42 -07:00 |
| hostname | [hostname-config] improve hostname-config process (#3676) | 2019-10-29 08:30:27 -07:00 |
| interfaces | Set preference for forced mgmt routes (#5844) | 2020-11-10 14:20:13 -08:00 |
| kdump | [kdump] Add more kernel panic conditions for vmcore dump (#6095) | 2020-12-15 08:54:13 -08:00 |
| kubernetes | [baseimage]: Install Kubernetes packages if enabled in image (#4374) | 2020-04-13 08:41:18 -07:00 |
| logrotate | [Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310) | 2021-01-22 09:42:19 -08:00 |
| misc | [Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109) | 2020-12-03 15:57:50 -08:00 |
| monit | [Monit] Monitoring the running status of containers. (#6251) | 2021-01-07 19:52:22 -08:00 |
| ntp | [ntp]: Source interface support for NTP (#6033) | 2020-12-21 05:34:13 -08:00 |
| pcie-check | Fix bug with pcie-check.service (#5368) | 2020-09-15 15:21:31 -07:00 |
| platform | [baseimage]: Updates for Ebtables and support for multi-asic (#6542) | 2021-01-27 08:36:10 -08:00 |
| rsyslog | Move frr logs from syslog to /var/log/frr/*.log (#5988) | 2020-12-10 08:44:34 -08:00 |
| secureboot | [kdump] Fix OOM events in crashkernel (#6447) | 2021-02-02 01:55:09 -08:00 |
| snmp | mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from con… (#4057) | 2020-01-28 17:41:21 -08:00 |
| sudoers | [baseimage]: add docker ps to the sudoer file (#6604) | 2021-01-29 08:16:32 -08:00 |
| sysctl | Set sock rx Buf size to 3MB. (#5566) | 2020-10-15 14:40:59 -07:00 |
| syslog | [baseimage]: /host unmount timeout issue during reboot. (#5032) | 2020-07-25 01:27:58 -07:00 |
| system-health | [system-health] Add support for monitoring system health (#4835) | 2020-10-12 11:12:49 +03:00 |
| systemd | [services] Restart SwSS service upon unexpected critical process exit (#2845) | 2019-05-01 08:02:38 -07:00 |
| topology | [platform] Add Support For Environment Variable File (#5010) | 2020-07-31 17:59:09 -07:00 |
| updategraph | [platform] Add Support For Environment Variable File (#5010) | 2020-07-31 17:59:09 -07:00 |
| warmboot-finalizer | [warm boot finalizer] only wait for enabled components to reconcile (#6454) | 2021-01-15 07:48:11 -08:00 |
| watchdog-control | [sonic-utilities] Build and install as a Python wheel package (#5409) | 2020-09-20 20:16:42 -07:00 |