Reduce the disk space taken up during bootup and runtime.
1. Remove python package cache from the base image and from the containers.
2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup.
3. For the partition containing `/host`, don't reserve any blocks for the root user alone. This makes all of the disk space available to every user when it is needed, for example during upgrades.
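As a rough sketch of how much space item 3 can free up, the arithmetic below assumes the filesystem was created with the mke2fs default of 5% reserved blocks. The 32 GiB partition size and the helper name `reserved_bytes` are purely illustrative and do not describe any specific device.

```python
GIB = 1024 ** 3


def reserved_bytes(partition_bytes: int, reserved_percent: float = 5.0) -> int:
    """Approximate space held back for root on an ext2/3/4 filesystem."""
    return int(partition_bytes * reserved_percent / 100)


# Illustrative partition size only (assumption, not from the commit).
partition = 32 * GIB
freed = reserved_bytes(partition)
print(f"about {freed / GIB:.1f} GiB becomes usable by all users")
```

Setting the reserved percentage to zero corresponds to `reserved_bytes(partition, 0)`, which returns 0.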
* Remove pip2 and pip3 caches from some containers
Only containers which appeared to have a significant pip cache size are
included here.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Don't create var-log.ext4 if we're storing logs in memory
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Run tune2fs on the device containing /host to not reserve any blocks for just the root user
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 5617b1ae3e)
The haveged service file in Debian Buster specifies that haveged should
start after systemd-random-seed starts (this was removed in Bullseye
after systemd changes caused a bootloop). This is a bit
counterproductive, since haveged is meant to be used in environments
with minimal sources of entropy, but one of the checks that
systemd-random-seed does is to verify that entropy is present.
Therefore, override the default .service file for haveged with one that
moves systemd-random-seed to the Before list, allowing haveged to start
before systemd-random-seed checks the system entropy level. (systemd doesn't
allow removing items from dependency/ordering entries such as After= and
Before=, so the entire .service file has to be overwritten.)
Note that despite this, haveged takes up to two seconds to actually
start working, so systemd-random-seed may still block for about two
seconds. However, this still allows other work (such as running
rc.local) to proceed a bit sooner.
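
The ordering change described above can be sketched as a small text transformation on a unit file. This is only an illustration: the sample unit contents below are made up and are not the actual Debian haveged unit, and `move_after_to_before` is a hypothetical helper, not part of the commit.

```python
def move_after_to_before(unit_text: str, target: str) -> str:
    """Drop `target` from After= lines and append it to a Before= line."""
    out, added = [], False
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "After":
            remaining = [u for u in value.split() if u != target]
            if remaining:  # keep the After= line only if something is left
                out.append("After=" + " ".join(remaining))
        elif sep and key.strip() == "Before" and not added:
            out.append(line.rstrip() + " " + target)
            added = True
        else:
            out.append(line)
    if not added:
        # No Before= line existed; add one right after the [Unit] header.
        out.insert(out.index("[Unit]") + 1, "Before=" + target)
    return "\n".join(out) + "\n"


# Made-up sample unit, for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=Illustrative entropy daemon unit
After=local-fs.target systemd-random-seed.service
Before=sysinit.target

[Service]
ExecStart=/usr/sbin/haveged --Foreground
"""

result = move_after_to_before(SAMPLE_UNIT, "systemd-random-seed.service")
print(result)
```

The output is a complete unit file, which matches the constraint noted above: the whole .service file has to be replaced rather than patched with a drop-in.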
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This can save 6 sec for teamd LAG restoration - the time between:
```
Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```
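
For reference, the interval between the two log lines above can be computed with a short script. This is a minimal sketch that assumes both lines fall in the same year (syslog stamps of this form carry no year); `syslog_time` is a hypothetical helper.

```python
from datetime import datetime


def syslog_time(line: str) -> datetime:
    """Parse the leading 'Mon D HH:MM:SS.ffffff' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    # The year is absent from the stamp, so strptime defaults it to 1900.
    return datetime.strptime(stamp, "%b %d %H:%M:%S.%f")


start = "Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1."
end = "Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP"

elapsed = (syslog_time(end) - syslog_time(start)).total_seconds()
print(f"{elapsed:.3f} s between SIGUSR1 and carrier UP")
```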
- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.
- How I did it
Kill teamd docker after graceful shutdown of teamd processes.
- How to verify it
Run warm reboot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
* Update container_checker for multi-asic devices
Update container_checker for multi-asic devices to add database containers in always_running_containers.
The previous change was made for single-asic devices only, and database containers were not treated as a feature when writing to state_db.
* Update container_checker
Fix indentation.
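
A minimal sketch of the idea, assuming that a multi-asic device runs one global `database` container plus one per-ASIC container named `database0`, `database1`, and so on. The function name and the naming scheme are assumptions made for illustration; the real container_checker may build its set differently.

```python
def always_running_containers(num_asics: int) -> set:
    """Database containers that must always run (hypothetical helper)."""
    containers = {"database"}
    if num_asics > 1:
        # Assumed per-ASIC naming: database0 .. database<N-1>.
        containers.update(f"database{i}" for i in range(num_asics))
    return containers


print(sorted(always_running_containers(1)))
print(sorted(always_running_containers(2)))
```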
Why I did it
Refactor the s6100 platform initialization to reduce the processing time of rc.local.
Fixing [warm-upgrade][202012] Slow DELL platform init in rc.local causes lacp-teardown #10150
How I did it
On branch 202012-s6100-rclocal
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: ../../../../files/image_config/platform/rc.local
modified: ../debian/platform-modules-s6100.install
modified: scripts/fast-reboot_plugin
modified: scripts/s6100_platform.sh
renamed: scripts/s6100_i2c_enumeration.sh -> scripts/s6100_platform_startup.sh
renamed: systemd/s6100-i2c-enumerate.service -> systemd/s6100-platform-startup.service
The marvell-armhf build hangs; it does not exit even after waiting for a long time.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
$ cat postinst
sh /usr/sbin/nokia-7215_plt_setup.sh
...
$ cat usr/sbin/nokia-7215_plt_setup.sh | tail
python /etc/entropy.py &
$ cat etc/entropy.py
if path.exists("/proc/sys/kernel/random/entropy_avail"):
    while 1:
        while avail() < 2048:
            with open('/dev/urandom', 'rb') as urnd, open("/dev/random", mode='wb') as rnd:
                d = urnd.read(512)
                t = struct.pack('ii', 4 * len(d), len(d)) + d
                fcntl.ioctl(rnd, RNDADDENTROPY, t)
        time.sleep(30)
This change is a workaround for the build issue. The Debian package still needs to be fixed, after which this change should be reverted.
Add the "always_enabled" field to the copp_cfg.j2 file so that traps without an entry in the features table are installed automatically.
This is a cherry-pick of https://github.com/Azure/sonic-buildimage/pull/9302
- Why I did it
To allow traps without an entry in the features table to be installed automatically.
- How I did it
Add the always_enabled field to traps that have no feature.
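
The selection rule can be sketched as follows. This is a simplified illustration, not the actual trap-installation code: the trap and feature names are examples, the `"true"` string value and the `state` field are assumptions about how the tables are encoded, and `traps_to_install` is a hypothetical helper.

```python
def traps_to_install(traps: dict, features: dict) -> list:
    """Pick traps that are always enabled or whose feature is enabled."""
    selected = []
    for name, cfg in traps.items():
        if cfg.get("always_enabled") == "true":
            # No features-table entry is required for this trap.
            selected.append(name)
        elif features.get(name, {}).get("state") == "enabled":
            selected.append(name)
    return selected


# Example data only (assumed shapes, not real configuration).
traps = {"arp": {"always_enabled": "true"}, "sflow": {}, "bgp": {}}
features = {"bgp": {"state": "enabled"}, "sflow": {"state": "disabled"}}

print(traps_to_install(traps, features))
```

Here `arp` is selected despite having no feature entry, `bgp` is selected because its feature is enabled, and `sflow` is skipped.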
Why I did it
Fix unreliability seen on eMMC devices with some AMD CPUs
How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
Description for the changelog
Add emmc quirks for Upperlake
Why I did it
Partly fixes #8980.
The corresponding changes in sonic-sairedis are here:
Azure/sonic-sairedis#975
How I did it
Include changes from both repos and build an image for verification.
How to verify it
Trigger fast-reboot with the changes, see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level.
Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>
Why I did it
Cherry pick changes in #9197 to 202012 branch
Add bgpcfgd support to advertise routes.
How I did it
Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly.
How to verify it
Added unit tests in bgpcfgd and verified route advertisement on KVM.
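
To illustrate the kind of mapping involved, the sketch below turns a prefix into FRR-style `network` statements. It is not the bgpcfgd implementation, which uses its own manager classes and templates. The function name `advertise_commands` and the ASN 65100 are hypothetical, and the layout of ADVERTISE_NETWORK keys is not modeled here.

```python
import ipaddress


def advertise_commands(asn: int, prefix: str, add: bool = True) -> list:
    """Build FRR-style config lines to advertise or withdraw a prefix."""
    net = ipaddress.ip_network(prefix, strict=False)
    family = "ipv4" if net.version == 4 else "ipv6"
    action = "" if add else "no "
    return [
        f"router bgp {asn}",
        f" address-family {family} unicast",
        f"  {action}network {net}",
    ]


# Hypothetical ASN and prefixes, for illustration only.
for line in advertise_commands(65100, "10.1.0.0/24"):
    print(line)
for line in advertise_commands(65100, "fc00:1::/64", add=False):
    print(line)
```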
Why I did it
Eliminate a benign first-time boot error reported when running on platforms that do not support kdump.
How I did it
Change rc.local to check for presence of the file /etc/default/kdump-tools before referencing it.
How to verify it
Install a new image on an armhf or arm64 platform and check for a failed reference to /etc/default/kdump-tools on first-time boot.
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
- Remove unused imports
- Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced on Mellanox platforms in PR #8768, "[Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles", which will also contain test cases to verify the behavior.
Rendering is done here so that the Azure pipeline passes.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
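
A minimal sketch of generating an inactive port list, assuming that "inactive" means a port whose `admin_status` is not `up`. That definition, the table shape, and the helper name `inactive_ports` are assumptions for illustration; the actual templates may use different criteria.

```python
def inactive_ports(port_table: dict) -> list:
    """Return ports that are not administratively up, in table order."""
    return [
        name
        for name, attrs in port_table.items()
        # A missing admin_status is treated as down (assumption).
        if attrs.get("admin_status", "down") != "up"
    ]


# Example port table (illustrative data only).
ports = {
    "Ethernet0": {"admin_status": "up"},
    "Ethernet4": {"admin_status": "down"},
    "Ethernet8": {},
}

print(inactive_ports(ports))
```

Ports in this list would be the candidates for having zero buffer profiles applied.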
Signed-off-by: Stephen Sun <stephens@nvidia.com>
1. CS00012211718 [4.3] Pfcwd getting continuously triggered/restored when pause frames are sent continuously to both queues of a port (TD2/Th/Th2/TD3) MSFT Default
Preliminary tests look fine: BGP neighbors were all up with proper routes programmed, and interfaces were all up.
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
fib/test_fib.py
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
decap/test_decap.py
ipfwd/test_dip_sip.py
ipfwd/test_dir_bcast.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
```
- add a new service "mark_dhcp_packet" to mux container
- apply packet marks on a per-interface basis in ebtables
- write packet marks to "DHCP_PACKET_MARK" table in state_db
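
The per-interface marking above can be sketched as follows. This is an illustration under assumptions, not the actual mark_dhcp_packet service: the base mark value `0x67001`, the `DHCP_PACKET_MARK|<interface>` key layout, the `mark` field name, and the helper `build_dhcp_marks` are all hypothetical. The ebtables arguments use the `mark` target with `--mark-set`, and are only built here, not executed.

```python
def build_dhcp_marks(interfaces, base_mark=0x67001):
    """Assign one mark per interface; return state entries and commands."""
    state_entries = {}
    commands = []
    for offset, ifname in enumerate(interfaces):
        mark = hex(base_mark + offset)
        # Assumed state_db layout: one key per interface with a 'mark' field.
        state_entries[f"DHCP_PACKET_MARK|{ifname}"] = {"mark": mark}
        commands.append([
            "ebtables", "-t", "filter", "-A", "INPUT",
            "-i", ifname, "-j", "mark", "--mark-set", mark,
        ])
    return state_entries, commands


entries, cmds = build_dhcp_marks(["Ethernet0", "Ethernet4"])
for key, value in entries.items():
    print(key, value)
for cmd in cmds:
    print(" ".join(cmd))
```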
Pick up BRCM SAI 4.3.5.1-7, which contains the following fixes:
1. CS00012209390: SONIC-50037, Used SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP as a default decap map for IPinIP tunnels.
2. CS00012212995: SONIC-50948 SAI_API_QUEUE:_brcm_sai_cosq_stat_get:1353 egress Min limit get failed with error Invalid parameter
3. SONIC-51583: Fixed acl group member creation failure with priority of -1
4. CS00012215744:SONIC-51395 [TH, TH2] WB 3.5 to 4.3 fails at APPLY_VIEW while setting SAI_PORT_ATTR_EGRESS_ACL
5. SONIC-51638: SDK-249337 ERROR: AddressSanitizer: heap-buffer-overflow in _tlv_print_array
Preliminary tests look fine: BGP neighbors were all up with proper routes programmed, and interfaces were all up.
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
fib/test_fib.py
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
decap/test_decap.py
ipfwd/test_dip_sip.py
ipfwd/test_dir_bcast.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
```