What I did:
Workaround for the issue seen here : FRRouting/frr#13682
It seems there is timing issue where there are multiple recursive lookup needed to resolve nexthop of the route it's possible that it does not happen correctly causing route to remain in inactive state
Issue is seen on chassis-packet as there 2 level of recursive lookup needed for a given e-BGP learnt route
- Level1 to resolve e-BGP peer (connected route via bgp ) over Loopback4096 (i-BGP peering)
- Level 2 Loopback4096 over backend port-channels next-hops
For VOQ chassis there is no e-BGP peer (connected route via bgp ) resolution as route is added as Static route by orchagent over Ethernet-IB.
Also as part of this remove route-map policy from instance.conf.j2 as same is define in peer-group.j2.
Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507
How I verify:
Functional Verification manually
Updated UT.
We will be adding sanity check in sonic-mgmt to make sure none of route are in inactive state.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
Work item tracking
Microsoft ADO (24182162):
How I did it
update the config.bcm to set the default fec RS 100G Linecard
How to verify it
Tests on chassis
- Why I did it
Update SAI xml file to align with the default SKU
- How I did it
Update the SN5600 SAI xml file
- How to verify it
Install image on SN5600 device
- Why I did it
Update the sensors.conf and pcie.yaml according to the real hardware.
- How I did it
Update the sensors.conf and pcie.yaml
- How to verify it
run relevant sonic-mgmt test cases.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add fan direction check to system health, all fans should be in the same direction
- How I did it
Add fan direction check to system health, all fans should be in the same direction
- How to verify it
Manual test
Unit test
Added sonic-mgmt test case to verify
Manual Cherrypick of #15260
Why I did it
Bug fix:
I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.
Work item tracking
Microsoft ADO (number only):
How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer
How to verify it
run full sonic-mgmt regression
Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles
How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold
How to verify it
Verified the changes manually on a Th device.
Existing unit tests render Th template from the RDMA-CENTRIC folder. Updated the expected output to use the correct threshold
Why I did it
Our k8s feature will pull new version container images for each upgrade, the container images inside sonic will be more and more, but for now we don’t have a way to clean up the old version container images, the disk may be filled up. Need to add cleaning up the old version container images logic.
Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove the old version container images besides the feature's current version and last version image, last version image is saved for supporting fallback.
How to verify it
Check whether the old version images are removed
Why I did it
Introduce a new valid neighbor element type to YANG.
Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.
How to verify it
Passes UTs
Why I did it
Remove 'kvmtest-t0' and 'kvmtest-t1-lag' test jobs since all the test jobs are required (continueOnError: false) already, and will only enable one of classical and testbedV2 tests, no need to do an unnecessary 'or' compute test job.
Change agent pool to reduce cost and avoid congestion
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
Why I did it
Manually cherry-pick and resolve conflicts of this PR: #15109
Extend device_metadata yang model.
Work item tracking
Microsoft ADO (number only): 22912178
How I did it
Add rack_mgmt_map field in yang model.
How to verify it
Build image.
- Why I did it
In order to reduce sonic build time, there is an option to acquire sonic slave docker(s) from artifact server (reduce sonic make configure time).
Current implementation supports only convention of:
<REGISTRY_SERVER>:<REGISTRY_PORT>/<SLAVE_BASE_IMAGE>:<SLAVE_BASE_TAG>
In case the SLAVE_BASE_IMAGE appear in internal path inside the server, the convention should be like that:
<REGISTRY_SERVER>:<REGISTRY_PORT><REGISTRY_SERVER_PATH>/<SLAVE_BASE_IMAGE>:<SLAVE_BASE_TAG>
When REGISTRY_SERVER_PATH (that is set on rules/config) will have to start with "/".
If REGISTRY_SERVER_PATH will not be set, the behavior will remain the same it works today.
- How I did it
Add ability to set REGISTRY_SERVER_PATH and update the code for docker image tag and docker image pull accordingly
- How to verify it
Use sonic slave docker image from artifact server in which the image is kept in internal folder and make sure it consume it.
Why I did it
systemd stop event on service with timers can sometime delete the state_db entry for the corresponding service.
Note: This won't be observed on the latest master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers
root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
Expected
Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
snmp Down Down Inactive
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
How I did it
Happens because the timer is usually a PartOf service and thus a stop on service is propagated to timer. Fixed the logic to handle this
Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ]
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]
Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ]
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
* To resolve NEIGH table entries present in CONFIG_DB. Without this change arp/ndp entries which we wish to resolve, and configured via CONFIG_DB are not resolved.
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822
How I did it
Add downloading ptf afpacket module in docker file.
How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
Why I did it
When reboot the chassis by issuing "sudo reboot" on Supervisor card. The internal midplane communication xe0 should be shutdown to avoid double reboot on the linecard.
Added a udev link rule to disable the autoneg on AMD xgbe port Xe0 and Xe1 and make the setting in sync with the peer Broadcom greyhound ports.
How I did it
Modify the Nokia-7250IXRE specific reboot script on the Supervisor card to shutdown the internal interface xe0. Also move reboot linecard code to the top of the script to make sure the notification has been send to Linecard before shutdown the xe0 interface.
Introduced a new rule 80-net-by-driver.link to disable the autoneg on the AMD size. This change requires the latest NDK which contains the change to set the autoneg on the xe0 and xe1 port on the Greyhound.
Signed-off-by: mlok <marty.lok@nokia.com>
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
#### Why I did it
When user enable TACACS per-command authorization, and run a command with wildcard , if the command match more than hundreds of files, the per-command authorization will failed with following message:
*** authorize failed by TACACS+ with given arguments, not executing
The root cause of this issue is because bash will match files with wildcard and replace with wildcard args with matched files. when there are too many files, TACACS plugin will generate a big authorization request, which will be reject by server side.
##### Work item tracking
- Microsoft ADO **(number only)**: 18074861
#### How I did it
Fix bash patch file, use original user inputs as authorization parameters.
#### How to verify it
Pass all UT.
Create new UT to validate the TACACS authorization request are using original command arguments.
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8115
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [X] 202205
- [X] 202211
#### Tested branch (Please provide the tested image version)
- [x] 202205.258490-412b83d0f
- [x] 202211.71966120-1b971c54b5
#### Description for the changelog
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
Depends on https://github.com/sonic-net/sonic-linux-kernel/pull/315
#### Why I did it
The name SECURE_UPGRADE_DEV_SIGNING_CERT is misleading, this flag is relevant to both to dev and prod signing.
#### How I did it
Rename all mentions of name SECURE_UPGRADE_DEV_SIGNING_CERT to SECURE_UPGRADE_SIGNING_CERT - this is also done with PR in sonic-linux-kernel repository
#### How to verify it
Build SONiC using your own prod script
- Why I did it
Fix issue with signing tool not running due to being call with the path from the host and not the path it is mounted on inside the docker-slave
- How I did it
Modified the path on the SECURE_UPGRADE_PROD_SIGNING_TOOL flag to the path where it is mounted inside the slave docker
- How to verify it
Build SONiC using your own prod script