Why I did it
During an upgrade via k8s, the feature's systemd service is restarted as well. Every feature's systemd service has a restart limit, and that limit is small: only three restarts. If a fallback happens during the upgrade, the start count is already 2, so one more restart brings the systemd service down. We need to bypass this limit. This restart function is called for local -> kube, kube -> kube, and kube -> local transitions, and every call must restart the service successfully, so we run reset-failed before each restart.
When we need to go back to local mode, we restart the systemd service immediately instead of waiting for the default restart interval, which reduces container downtime.
Work item tracking
Microsoft ADO (number only):
24172368
How I did it
Before every restart during the upgrade, reset the feature's restart counter to 0 to bypass the restart limit.
When we need to go back to local mode, restart the systemd service immediately.
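A minimal Python sketch of the restart sequence described above, assuming the feature is a plain systemd unit driven through systemctl. The names restart_feature and run_checked are hypothetical, not the actual SONiC code. systemctl reset-failed clears the unit's failed state and its start rate limit counter, so the restart that follows is not refused by the limit; an explicit systemctl restart also acts right away instead of waiting for the unit's automatic restart interval.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command; injectable so the sequence can be tested.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def restart_feature(service: str, run: Runner = run_checked) -> None:
    """Hypothetical helper: restart a feature's unit, bypassing its start limit.

    'reset-failed' resets the unit's failed state and start rate counter,
    so the restart below is accepted even if earlier fallbacks have used
    up most of the allowed restarts.
    """
    run(["systemctl", "reset-failed", service])
    # Explicit restart: happens now, not after the unit's restart interval.
    run(["systemctl", "restart", service])
```

For example, restart_feature("snmp.service") issues the two commands in order; passing a recording function as run lets the sequence be checked without touching systemd.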
How to verify it
The feature's systemd service can always be restarted successfully during the upgrade process via k8s.
src/sonic-platform-daemons
* bef58aa - (HEAD -> 202205, origin/202205) Added PCIe transaction check for all peripherals on the bus (#331) (10 hours ago) [Ashwin Srinivasan]
#### Why I did it
After k8s upgrades a container, k8s only knows that the container is running; it does not know the status of the services inside the container. So we need a probe inside the container, which k8s calls to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally; if the container-specific Python check script exists, the probe calls it to perform the container's own checks. The Python script should be implemented by the feature owner if it is needed.
More details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
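A minimal Python sketch of the probe's decision logic, assuming the start service's exit status is already known (None when the container has no start service) and that the feature-specific check is a Python script at some path. The real probe is the shell script /usr/bin/readiness_probe.sh; the function names, parameters, and return codes here are hypothetical. k8s exec probes treat exit status 0 as ready, so the sketch returns 0 for "ready" and non-zero otherwise.

```python
import os
import subprocess
from typing import Callable, Optional


def _run_python(path: str) -> int:
    """Run a Python check script and return its exit status."""
    return subprocess.run(["python3", path]).returncode


def readiness_probe(
    start_exit_code: Optional[int],
    check_script: str,
    run_script: Callable[[str], int] = _run_python,
) -> int:
    """Hypothetical readiness check: 0 means ready, non-zero means not ready.

    start_exit_code: exit status of the start service, or None if the
        container has no start service.
    check_script: path of the optional feature-owned Python check.
    """
    # 1. If a start service exists, it must have exited normally.
    if start_exit_code is not None and start_exit_code != 0:
        return 1
    # 2. If the feature owner provided a self-check script, defer to it.
    if os.path.isfile(check_script):
        return run_script(check_script)
    # No start service failure and no extra checks: report ready.
    return 0
```

Keeping the feature-specific part in a separate, optional script means containers without special requirements need no extra code, while owners can add their own checks without touching the probe itself.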
#### How to verify it
Check that /usr/bin/readiness_probe.sh exists inside the container.
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211
#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
src/sonic-platform-common
* 56f227a - (HEAD -> 202205, origin/202205) More prevention of fatal exception caused by VDM dictionary missing fields when a transceiver has just been pulled (#376) (3 hours ago) [snider-nokia]
Why I did it
To reduce the container's dependency on the host system.
Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script into the config engine container instead of mounting it from the host.
How to verify it
Check that the file /usr/share/sonic/scripts/container_startup.py exists inside the config engine container.
Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
How I did it
Free up the multiprocessing Manager resources when a task stop is requested, by calling self.mpmgr.shutdown() in task_stop.
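A minimal Python sketch of the fix, assuming a task class that keeps a multiprocessing.Manager in self.mpmgr as described above. The class name HealthTask and the manager_factory parameter (added only so the stop path can be tested without spawning a process) are hypothetical. multiprocessing.Manager() starts a separate server process that keeps running until shutdown() is called, so releasing it in task_stop frees that process as soon as the stop is requested.

```python
import multiprocessing
from typing import Any, Callable


class HealthTask:
    """Hypothetical task that owns a multiprocessing Manager."""

    def __init__(self, manager_factory: Callable[[], Any] = multiprocessing.Manager) -> None:
        # Manager() starts a server process that hosts the shared objects.
        self.mpmgr = manager_factory()
        self.shared = self.mpmgr.dict()

    def task_stop(self) -> None:
        """Handle a stop request: release the manager's server process."""
        self.mpmgr.shutdown()


if __name__ == "__main__":
    task = HealthTask()
    task.shared["status"] = "OK"
    task.task_stop()
```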
How to verify it
time systemctl stop system-health.service
* [Arista] Update hwsku.json for Arista-7050QX-32S-S4Q31
* Change to 3x10G(3)+1x1G(1) on Arista-7050QX-32S-S4Q31
Co-authored-by: byu343 <byu@arista.com>
* [chassis][lldp] Fix the LLDP error log in the host instance, which doesn't contain front panel ports
---------
Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
* [buildsystem] Fix hiredis package version: 0.14.1-1 (#15461)
- Why I did it
To fix hiredis compilation
- How I did it
Changed package version: 0.14.0-3~bpo9+1 -> 0.14.1-1
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Update Makefile
---------
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
Why I did it
Fix an incorrectly specified table name in the extra queues and extra PGs j2 files for 8101-32FH-O
How I did it
Update platform module to 202205.2.2.7
Update SAI xgs version to 7.1.50.4 to include the following changes:
Patch fix from CSP CS00012282080, needed to support speed change from 400G to 100G on chassis line cards.
Backport SONIC-71507 VSQF/VSQE are not created after port creation. JIRA# SONIC-71507
Backport JIRA SONIC-70704 to rel_ocp_sai_7_1. JIRA# SONIC-70704
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* Updated default ECN settings for T2 chassis (#14388)
Why I did it
Update ECN settings for T2 chassis
How I did it
Updated qos config file to load these settings during switch bootup
How to verify it
Verified on line card on T2 chassis
* Fix for test failures
* Test case failures
* test case fix
DNX fixes:
CS00012287482 - support for 1024 LAGs on DNX
Other changes (XGS fixes)
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER