src/sonic-utilities
* 1ed5b5a9 - (HEAD -> 202205, origin/202205) Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (cherry-pick to 202205) (#2950) (4 days ago) [longhuan-cisco]
* ba327726 - Fix in config override when all asic namespaces not present in golden_config_db (#2946) (4 days ago) [judyjoseph]
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Because we cannot split the string on commas (,) and validate that each substring is a valid SONiC port name, the only restriction on them is that they must be strings.
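The per-port check that the YANG type cannot express could instead be done in application code. The following is a minimal sketch only: `parse_ports` and the `Ethernet<N>` name pattern are hypothetical and are not part of sonic-acl.yang or of this change.

```python
import re

# Hypothetical pattern for illustration; real SONiC port names are not
# limited to this form.
PORT_NAME = re.compile(r"^Ethernet\d+$")


def parse_ports(value):
    """Split a comma-separated port string and reject malformed names."""
    ports = [p.strip() for p in value.split(",") if p.strip()]
    bad = [p for p in ports if not PORT_NAME.match(p)]
    if bad:
        raise ValueError("invalid port name(s): " + ", ".join(bad))
    return ports
```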
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Built a SONiC image based on the 202205 branch and installed it on a physical DUT. Retried the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and saw the success response below:
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
src/sonic-platform-common
* a6dd67e - (HEAD -> 202205, origin/202205) Comment out tx power validation check and program the passed value (#389) (29 hours ago) [abdosi]
Adding yang model for CONFIG_DB table MUX_LINKMGR|SERVICE_MGMT.
sign-off: Jing Zhang zhangjing@microsoft.com
Co-authored-by: Jing Zhang <zhangjing@microsoft.com>
src/sonic-swss
* 17c4d731 - (HEAD -> 202205, origin/202205) Remove system neighbor DEL operation in m_toSync if SET operation for (#2853) (3 hours ago) [Song Yuan]
src/sonic-platform-daemons
* 8147e25 - (HEAD -> 202205, origin/202205) Revert "Added PCIe transaction check for all peripherals on the bus (#331)" (12 hours ago) [Ying Xie]
sonic-buildimage side change to fix source interface selection in the dual-ToR scenario.
dhcprelay related PR:
[master]fix dhcpv6 relay dual tor source interface selection issue sonic-dhcp-relay#42
Advance dhcprelay submodule to 6a6ce24 to include PR #42
Graceful restart is a key event for bgpd, but the related log message is printed at debug level. Change it to info level to get more visibility when this kind of event is triggered.
Why I did it
When SONiC is managed by k8s, each SONiC container is managed by a k8s daemonset, and the daemonset identifies its members by labels. Currently, when a SONiC service is restarted via systemctl and the service's container is already managed by k8s, the systemd script stops the container by removing the feature label so that it leaves the k8s daemonset, and then starts it by adding the label so that it joins the daemonset again.
This behavior causes a problem during k8s container upgrades. Containers in a daemonset are upgraded in a rolling fashion: the daemonset version is updated first, and then the new version is rolled out to the containers one by one, with a precheck and postcheck for each. However, when a SONiC device joins a daemonset, k8s directly deploys a pod with the daemonset's current version. This is expected when a device joins the k8s cluster for the first time.
But for a device that has already joined the k8s cluster, re-joining the daemonset upgrades the container to the new version without a precheck. So if a systemd service is restarted during a daemonset upgrade, the container may be upgraded without a precheck, which breaks the rolling update policy. To fix this, we need to remove the logic that drops the k8s label in the systemd service stop script for kube mode.
Work item tracking
Microsoft ADO (number only): 24304563
How I did it
Don't drop the label in the systemd service stop script when the feature's set_owner is kube. Only drop the label when the feature's set_owner is local.
How to verify it
The label feature_enabled should always be true if the feature's set_owner is kube.
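The rule described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the actual stop script: `labels_after_stop` is a hypothetical helper, and the `<feature>_enabled` label key is assumed from the verification note.

```python
def labels_after_stop(labels, feature, set_owner):
    """Return the node labels that should remain after a service stop.

    The label is dropped only for locally owned features; for kube-owned
    features it is left untouched so the node never leaves the daemonset.
    """
    updated = dict(labels)  # do not mutate the caller's mapping
    if set_owner == "local":
        updated.pop(f"{feature}_enabled", None)  # assumed label key format
    return updated
```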
Why I did it
The current code for cleaning up container images has two bugs that need to be fixed. Some variable names are also confusing, so rename them.
Work item tracking
Microsoft ADO (number only): 24502294
How I did it
We do the clean-up after tagging latest succeeds. Currently, the tag-latest function only returns 0 or 1: 0 means success and 1 means failure. When we get 1 we retry, and when we get 0 we clean up. However, return code 0 also covers a case in which no clean-up is needed: while we are tagging latest, the container image we want to tag may not be running, so we cannot tag latest and do not need to clean up. This case needs to be separated from 0, so it now returns -1.
When local mode (v1) -> kube mode (v2) happens, one problem is how to handle the local image. There are two cases. In the first case, there was a kube v1 container dry-run (because we don't replace the local container if the kube version equals the local version); we remove the kube v1 image, tag the local version with the ACR prefix, and remove the local v1 tag. In the second case, there was no kube v1 container dry-run; we remove the local v1 image directly, because the local v1 image should not be the last desired version.
The docker_id variable is confusing because it actually holds the docker image ID, so rename it. Also rename the two dicts and the list to be more readable.
How to verify it
Check the tag-latest and image clean-up results.
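The three return codes described above could be handled as follows. This is a sketch, not the code from the PR: the `TagResult` names and the `next_action` helper are hypothetical, and only the numeric values -1, 0, and 1 come from the description.

```python
from enum import IntEnum


class TagResult(IntEnum):
    NOT_RUNNING = -1  # image to tag is not running: nothing tagged, no clean-up
    SUCCESS = 0       # latest was tagged: safe to clean up old images
    FAILED = 1        # tagging failed: retry


def next_action(result):
    """Map a tag-latest return code to the follow-up step."""
    if result == TagResult.SUCCESS:
        return "cleanup"
    if result == TagResult.FAILED:
        return "retry"
    return "skip_cleanup"
```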
Why I did it
During an upgrade via k8s, the feature's systemd service restarts as well. Every feature systemd service has a restart limit, and the limit is too small: only three times. If a fallback happens during an upgrade, the start count will already be 2, so after just one more restart the systemd service will go down. We need to bypass this limit. The restart function is called for local -> kube, kube -> kube, and kube -> local transitions, and each call needs to restart successfully, so do reset-failed every time we restart.
When we need to go back to local mode, we restart via systemd immediately without waiting for the default restart interval, which reduces container downtime.
Work item tracking
Microsoft ADO (number only):
24172368
How I did it
Before every restart for an upgrade, reset the feature's restart count. The count is reset to 0 to bypass the restart limit.
When we need to go back to local mode, we restart via systemd immediately.
How to verify it
The feature's systemd service can always be restarted successfully during the upgrade process via k8s.
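The reset-then-restart order described above can be sketched as a command builder. The helper name `restart_commands` and the `<feature>.service` unit naming are assumptions for illustration; `systemctl reset-failed` and `systemctl restart` are standard systemctl subcommands.

```python
def restart_commands(feature):
    """Return the systemctl invocations for one upgrade-time restart.

    reset-failed clears the unit's start-rate counter first, so the
    restart that follows is not rejected by the restart limit.
    """
    unit = f"{feature}.service"  # assumed unit naming
    return [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "restart", unit],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; building the commands separately keeps the ordering easy to test without touching systemd.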
src/sonic-platform-daemons
* bef58aa - (HEAD -> 202205, origin/202205) Added PCIe transaction check for all peripherals on the bus (#331) (10 hours ago) [Ashwin Srinivasan]
#### Why I did it
After k8s upgrades a container, k8s only knows that the container is running; it doesn't know the status of the service inside the container. So we need a probe inside the container that k8s can call to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally. If the Python script is present, the probe calls it to run container-specific checks. The Python script should be implemented by the feature owner if it's needed.
more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211
#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
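The two-stage readiness decision described under "How I did it" can be sketched as below. The real probe is the shell script at /usr/bin/readiness_probe.sh; this Python function only illustrates the decision logic, and `is_ready` and its exit-code convention (0 for success, `None` for "not present") are assumptions.

```python
def is_ready(start_exit_code, check_exit_code):
    """Decide readiness from two optional exit codes.

    start_exit_code: exit status of the start service, or None if the
                     container has no start service.
    check_exit_code: exit status of the feature owner's check script,
                     or None if no script is installed.
    """
    if start_exit_code is not None and start_exit_code != 0:
        return False  # start service exists but did not exit normally
    if check_exit_code is not None and check_exit_code != 0:
        return False  # container-specific checks failed
    return True
```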
src/sonic-platform-common
* 56f227a - (HEAD -> 202205, origin/202205) More prevention of fatal exception caused by VDM dictionary missing fields when a transceiver has just been pulled (#376) (3 hours ago) [snider-nokia]
Why I did it
To reduce the container's dependency on the host system
Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script into the config engine container, rather than mounting it from the host.
How to verify it
Check the file path (/usr/share/sonic/scripts/container_startup.py) inside the config engine container.
Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
How I did it
Free up Multiprocessing Manager resource at task stop request
[self.mpmgr.shutdown() in task_stop]
How to verify it
time systemctl stop system-health.service
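A minimal sketch of the pattern named above: a task that owns a `multiprocessing.Manager` and shuts it down in `task_stop`. The class name `HealthTask`, the `stats` attribute, and the injectable `manager` parameter are hypothetical; only `self.mpmgr.shutdown()` in `task_stop` comes from the description.

```python
import multiprocessing


class HealthTask:
    def __init__(self, manager=None):
        # By default a real Manager (a separate server process) is started.
        # The parameter exists only so the sketch can be exercised without one.
        self.mpmgr = manager if manager is not None else multiprocessing.Manager()
        self.stats = self.mpmgr.dict()  # shared state served by the manager

    def task_stop(self):
        # Stop the manager's server process so stopping the service
        # does not have to wait on it.
        self.mpmgr.shutdown()
```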
* [chassis][lldp] Fix the lldp error log in host instance which doesn't contain front pannel ports
---------
Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
* [buildsystem] Fix hiredis package version: 0.14.1-1 (#15461)
- Why I did it
To fix hiredis compilation
- How I did it
Changed package version: 0.14.0-3~bpo9+1 -> 0.14.1-1
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Update Makefile
---------
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Updated default ECN settings for T2 chassis (#14388)
Why I did it
Update ECN settings for T2 chassis
How I did it
Updated qos config file to load these settings during switch bootup
How to verify it
Verified on line card on T2 chassis
* Fix for test failures
* Test case failures
* test case fix
Why I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
Work item tracking
Microsoft ADO (number only): 24101023
How I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
How to verify it
Ran unittest to verify this update:
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
* [YANG] Add MUX_CABLE yang model (#11797)
Why I did it
Address issue #10970
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-mux-cable.yang and unit tests.
How to verify it
Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Pass sonic-config-engine unit test.
Link to config_db schema for YANG module changes
f8fe41a023/src/sonic-yang-models/doc/Configuration.md (mux_cable)
* [YANG] add peer switch model (#11828)
Why I did it
Address issue #10966
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-peer-switch.yang and unit tests.
How to verify it
Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Link to config_db schema for YANG module changes
b721ff87b9/src/sonic-yang-models/doc/Configuration.md (peer-switch)
src/sonic-swss
* 6a193e0 - (HEAD -> 202205, origin/202205) [Dual-ToR][ACL] bind LAG to ACL table in order to guarantee rule coverage if lag menber will be added to LAG after binding (#2749) (5 hours ago) [Andriy Yurkiv]
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
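The ordering described in this commit message (add 127.0.0.1/8 before the existing configuration is removed) can be illustrated with a short command builder. This is not the actual change, which lives in the interface configuration; `lo_down_commands` is a hypothetical helper, and the `ip addr` invocations are shown only to make the order concrete.

```python
def lo_down_commands():
    """Return iproute2 commands in the order that keeps 127.0.0.1 reachable."""
    return [
        # Add the /8 first so 127.0.0.1 never disappears from lo.
        ["ip", "addr", "add", "127.0.0.1/8", "dev", "lo"],
        # Only then remove the /16 that was configured while up.
        ["ip", "addr", "del", "127.0.0.1/16", "dev", "lo"],
    ]
```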
src/sonic-utilities
* 0b878087 - (HEAD -> 202205, origin/202205) Add display support for serial field in show chassis modules status CLI (#2858) (4 days ago) [amulyan7]
src/sonic-swss
* 2aec547 - (HEAD -> 202205, origin/202205) [pfcwd] Enhance DLR_INIT based recovery and DLR_PACKET_ACTION for broadcom platforms (#2807) (4 days ago) [Neetha John]
* fa4acd3 - Fix to substract the macsec sectag size from port MTU during InitializePort (#2789) (5 days ago) [judyjoseph]
src/sonic-utilities
* cd08aa69 - (HEAD -> 202205, origin/202205) [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2853) (2 days ago) [vdahiya12]
* 4b96bd7e - Update pcieutil error message on loading common pcie module (#2786) (6 days ago) [cytsao1]
* 77b725ca - Fix the show interface counters throwing exception on device with no external interfaces (#2851) (6 days ago) [abdosi]
Why I did it
Update cable length for uplink/downlink ports for chassis and update PG/pool headroom size accordingly.
Work item tracking
17880812
How I did it
Updated cable length as well as buffer config in HWSKU files.
Why I did it
To improve readability of config.bcm, fixed the alignment of soc properties
How to verify it
Build sonic_config_engine-1.0-py3-none-any.whl successfully
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Our k8s feature pulls a new container image for each upgrade, so the number of container images on a SONiC device keeps growing. We currently have no way to clean up old container images, so the disk may fill up. We need to add logic that cleans up old container images.
Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove old container images, keeping only the feature's current version and last version images. The last version image is kept to support fallback.
How to verify it
Check whether the old version images are removed
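The retention rule above (keep the current and last version, remove everything else) can be sketched as a pure function. `images_to_remove` and the version strings are hypothetical; the real clean-up operates on docker images for a feature.

```python
def images_to_remove(versions, current, last):
    """Return the image versions that are safe to delete.

    The current version is in use and the last version is kept so that
    a fallback is still possible; every other version is removed.
    """
    keep = {current, last}
    return [v for v in versions if v not in keep]
```

For example, with versions `["v1", "v2", "v3", "v4"]`, current `"v4"`, and last `"v3"`, only `"v1"` and `"v2"` are returned for removal.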
Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
Why I did it
Introduce a new valid neighbor element type to YANG.
Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.
How to verify it
Passes UTs
Co-authored-by: Jing Kan <jika@microsoft.com>
This is backport of #14757
SONiC Yang model support for IPv6 link local
What I did
Created SONiC Yang model for IPv6 link local
How I did it
Defined Yang models for IPv6 link local based on https://github.com/sonic-net/SONiC/blob/master/doc/ipv6/ipv6_link_local.md
How to verify it
Added enable test case.
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Why I did it
We need to store power shelf information in config_db for the SONiC MX switch. The current minigraph parser cannot parse the rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
src/sonic-platform-common
* ff72811 - (HEAD -> 202205, origin/202205) Fix issue '<' not supported between instances of 'NoneType' and 'int' (#371) (5 hours ago) [Junchao-Mellanox]
* f2a419d - Render Media lane and Media assignment options info from Application Code (#368) (8 hours ago) [rajann]
* d8bad10 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (8 hours ago) [mihirpat1]
Why I did it
This PR is to backport PR #11117 into 202205 branch.
This PR is to define Yang model for SYSTEM_DEFAULTS table.
The table was introduced in PR sonic-net/SONiC#982
The table will be like
"SYSTEM_DEFAULTS": {
    "tunnel_qos_remap": {
        "status": "enabled"
    }
}
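As an illustration of how a consumer might read this table, the sketch below checks a feature's status in a config dictionary shaped like the entry above. `is_default_enabled` is a hypothetical helper, and treating a missing entry as disabled is an assumption, not documented behavior.

```python
import json

# Same shape as the SYSTEM_DEFAULTS entry shown above.
CONFIG = json.loads("""
{
    "SYSTEM_DEFAULTS": {
        "tunnel_qos_remap": {
            "status": "enabled"
        }
    }
}
""")


def is_default_enabled(config, feature):
    """Return True if SYSTEM_DEFAULTS marks the feature as enabled."""
    entry = config.get("SYSTEM_DEFAULTS", {}).get(feature, {})
    return entry.get("status") == "enabled"  # missing entry: assumed disabled
```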
Work item tracking
Microsoft ADO (https://msazure.visualstudio.com/One/_workitems/edit/23037078)
How I did it
Add a new yang file sonic-system-defaults.yang.
How to verify it
Verified by UT