Why I did it
To reduce the container's dependency from host system
Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.
How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.
Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
How I did it
Free up Multiprocessing Manager resource at task stop request
[self.mpmgr.shutdown() in task_stop]
How to verify it
time systemctl stop system-health.service
* [chassis][lldp] Fix the lldp error log in host instance which doesn't contain front pannel ports
---------
Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
* [buildsystem] Fix hiredis package version: 0.14.1-1 (#15461)
- Why I did it
To fix hiredis compilation
- How I did it
Changed package version: 0.14.0-3~bpo9+1 -> 0.14.1-1
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Update Makefile
---------
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Updated default ECN settings for T2 chassis (#14388)
Why I did it
Update ECN settings for T2 chassis
How I did it
Updated qos config file to load these settings during switch bootup
How to verify it
Verified on line card on T2 chassis
* Fix for test failures
* Test case failures
* test case fix
Why I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
Work item tracking
Microsoft ADO (number only): 24101023
How I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
How to verify it
Ran unittest to verify this update:
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
* [YANG] Add MUX_CABLE yang model (#11797)
Why I did it
Address issue #10970
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-mux-cable.yang and unit tests.
How to verify it
Compile Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Pass sonic-config-engine unit test.
Which release branch to backport (provide reason below if selected)
201811
201911
202006
202012
202106
202111
202205
Description for the changelog
Link to config_db schema for YANG module changes
f8fe41a023/src/sonic-yang-models/doc/Configuration.md (mux_cable)
* [YANG] add peer switch model (#11828)
Why I did it
Address issue #10966
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-peer-switch.yang and unit tests.
How to verify it
Compile Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Which release branch to backport (provide reason below if selected)
201811
201911
202006
202012
202106
202111
202205
Description for the changelog
Link to config_db schema for YANG module changes
b721ff87b9/src/sonic-yang-models/doc/Configuration.md (peer-switch)
src/sonic-swss
* 6a193e0 - (HEAD -> 202205, origin/202205) [Dual-ToR][ACL] bind LAG to ACL table in order to guarantee rule coverage if lag menber will be added to LAG after binding (#2749) (5 hours ago) [Andriy Yurkiv]
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.01 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
src/sonic-utilities
* 0b878087 - (HEAD -> 202205, origin/202205) Add display support for serial field in show chassis modules status CLI (#2858) (4 days ago) [amulyan7]
src/sonic-swss
* 2aec547 - (HEAD -> 202205, origin/202205) [pfcwd] Enhance DLR_INIT based recovery and DLR_PACKET_ACTION for broadcom platforms (#2807) (4 days ago) [Neetha John]
* fa4acd3 - Fix to substract the macsec sectag size from port MTU during InitializePort (#2789) (5 days ago) [judyjoseph]
src/sonic-utilities
* cd08aa69 - (HEAD -> 202205, origin/202205) [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2853) (2 days ago) [vdahiya12]
* 4b96bd7e - Update pcieutil error message on loading common pcie module (#2786) (6 days ago) [cytsao1]
* 77b725ca - Fix the show interface counters throwing exception on device with no external interfaces (#2851) (6 days ago) [abdosi]
Why I did it
Update cable length for uplink/downlink ports for chassis and and update PG/pool headroom size accordingly.
Work item tracking
17880812
How I did it
Updated cable length as well as buffer config in HWSKU files.
Why I did it
To improve readability of config.bcm, fixed the alignment of soc properties
How to verify it
Build sonic_config_engine-1.0-py3-none-any.whl successfully
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Our k8s feature will pull new version container images for each upgrade, the container images inside sonic will be more and more, but for now we don’t have a way to clean up the old version container images, the disk may be filled up. Need to add cleaning up the old version container images logic.
Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove the old version container images besides the feature's current version and last version image, last version image is saved for supporting fallback.
How to verify it
Check whether the old version images are removed
Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
Why I did it
Introduce a new valid neighbor element type to YANG.
Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.
How to verify it
Passes UTs
Co-authored-by: Jing Kan <jika@microsoft.com>
This is backport of #14757
SONiC Yang model support for IPv6 link local
What I did
Created SONiC Yang model for IPv6 link local
How I did it
Defined Yang models for IPv6 link local based on https://github.com/sonic-net/SONiC/blob/master/doc/ipv6/ipv6_link_local.md
How to verify it
Added enable test case.
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
src/sonic-platform-common
* ff72811 - (HEAD -> 202205, origin/202205) Fix issue '<' not supported between instances of 'NoneType' and 'int' (#371) (5 hours ago) [Junchao-Mellanox]
* f2a419d - Render Media lane and Media assignment options info from Application Code (#368) (8 hours ago) [rajann]
* d8bad10 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (8 hours ago) [mihirpat1]
Why I did it
This PR is to backport PR #11117 into 202205 branch.
This PR is to define Yang model for SYSTEM_DEFAULTS table.
The table was introduced in PR sonic-net/SONiC#982
The table will be like
"SYSTEM_DEFAULTS": {
"tunnel_qos_remap": {
"status": "enabled"
}
}
Work item tracking
Microsoft ADO (https://msazure.visualstudio.com/One/_workitems/edit/23037078)
How I did it
Add a new yang file sonic-system-defaults. Yang.
How to verify it
Verified by UT
What I did:
In FRR command update source <interface-name> is not at address-family level. Because of this
internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result
TSA over iBGP for Ipv6 was not getting applied.
How I verify:
Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.
Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
systemd stop event on service with timers can sometime delete the state_db entry for the corresponding service.
Note: This won't be observed on the latest master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers
root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
Expected
Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
snmp Down Down Inactive
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
How I did it
Happens because the timer is usually a PartOf service and thus a stop on service is propagated to timer. Fixed the logic to handle this
Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ]
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]
Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ]
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
There are chassis-packet and Single asic platforms which support this 400G to 100G/40G speed change via config.
Enabling this feature for all platforms which can support this. Keeping it enabled for all does not affect the platforms
which do not support this feature yet.
Work item tracking
Microsoft ADO (number only):
17952356
- How I did it
Removed switch_type and role type check.
- How to verify it
Loaded router with default 400G config. Loaded minigraph to convert 400G to 100G speed.
Signed-off-by: anamehra <anamehra@cisco.com>
src/sonic-platform-common
* c7ce1a5 - (HEAD -> 202205, origin/202205) Prevent VDM dictionary related KeyError when a transceiver module is pulled while a bulk get method is interrogating said module (#360) (5 days ago) [snider-nokia]
* Support ACL interface type BmcData in minigraph parser
* Support ACL interface type BmcData in minigraph parser
* add unittest
* Add a global dict for storing the defination of custom acl tables
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
#### Why I did it
When user enable TACACS per-command authorization, and run a command with wildcard , if the command match more than hundreds of files, the per-command authorization will failed with following message:
*** authorize failed by TACACS+ with given arguments, not executing
The root cause of this issue is because bash will match files with wildcard and replace with wildcard args with matched files. when there are too many files, TACACS plugin will generate a big authorization request, which will be reject by server side.
##### Work item tracking
- Microsoft ADO **(number only)**: 18074861
#### How I did it
Fix bash patch file, use original user inputs as authorization parameters.
#### How to verify it
Pass all UT.
Create new UT to validate the TACACS authorization request are using original command arguments.
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8115
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [X] 202205
- [X] 202211
#### Tested branch (Please provide the tested image version)
- [x] 202205.258490-412b83d0f
- [x] 202211.71966120-1b971c54b5
#### Description for the changelog
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
* yang mode support for neighbor metadata
* add description in leaf node
* modify description
Co-authored-by: jcaiMR <111116206+jcaiMR@users.noreply.github.com>
* [minigraph] add support for changing T1 ports speed from 400G to 100G and vice-versa (#14505)
Open
[minigraph] add support for changing T1 ports speed from 400G to 100G and vice-versa
vdahiya12 wants to merge 9 commits into sonic-net:master from vdahiya12:dev/vdahiya/minigraph_parser
Conversation 10
Commits 9
Checks 18
Files changed 5
Conversation
vdahiya12
@vdahiya12 vdahiya12 commented 2 weeks ago •
On SONiC T1 cisco 8101 HwSku, the speed changes are done from 400G to 100G needs to be supported on 400G ports.
To enable this, along with speed change the port lanes need to be changed. This PR has the changes to update the port lanes when such speed change happens.
Basically if Banwidth in minigraph.xml intends to enable a 100G speed on a 400G port, then the appropriate lane change and speed change needs to be invoked in mingraph parser
Example if port_config.ini dicatates the speed to be 400G and minigraph has 100G speed, then this changeneeds to be accommodated
Ethernet96 1536,1537,1538,1539,1540,1541,1542,1543 etp12 12 400000 0
<DeviceLinkBase>
<ElementType>DeviceInterfaceLink</ElementType>
<EndDevice>ARISTA01T2</EndDevice>
<EndPort>Ethernet1</EndPort>
<StartDevice>Device-8101-01</StartDevice>
<StartPort>etp12</StartPort>
<Bandwidth>100000</Bandwidth>
</DeviceLinkBase>
These platforms today have 400g port with 8 serdes lines, and 100g will operate with 4 serdes lane. When the port speed changes from 400G to 100G the first 4 lanes will be used for 100G port.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* add all
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* fix unit
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
---------
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
src/sonic-platform-common
* c97af3c - (HEAD -> 202205, origin/202205) Modify sfputil show fwversion to include build version for active/inactive FW version fields (#367) (2 days ago) [mihirpat1]
* 7705a20 - Adding electrical for 800G and 100G (#365) (2 days ago) [mihirpat1]
* d0038fc - SFF-8472: Fix tx_disable_channel to avoid write to read-only bit (#364) (2 days ago) [mihirpat1]
* 518a471 - fix get module hardware minor revision (#361) (2 days ago) [Qingxiao Ren]
src/sonic-platform-daemons
* f913e9c - (HEAD -> 202205, origin/202205) [CMIS] Add power up duration for power up timeout (#345) (2 hours ago) [ChiouRung Haung]
src/sonic-sairedis
* 7f6abdd - (HEAD -> 202205, origin/202205) Revert "Ignore removing switch for mellanox platform due to known limitation (#1216)" (#1231) (8 days ago) [Junchao-Mellanox]
Why I did it
refine reproducible build.
How I did it
Fix reset map variable in bash.
Ignore empty web file md5sum value.
If web file didn't backup in azure storage, use file on web.
How to verify i
What/Why I did:
Allow traffic with source and destination as chassis eth1-midplane ip. Needed for Supervisor Redis-db connection (Redis packet has source and destination ip as eth1-midpane) after we load acl.json that has catch-all drop rule. Changes are generic and not specific to supervisor and applies on LC also.
Made multi_asic_ns_to_host_fwd as False for ACL service for External Client. This flag is needed for service SSH and SNMP where traffic can come in namespace over front-panel ports and we need to send the traffic in host where corresponding docker/service are running. There is no use-case of External client service for multi-asic as of now. Having flag as True creates failure when we try to load acl.json.
src/sonic-utilities
* ece22b7d - (HEAD -> 202205, origin/202205) Revert "[GCU] Add PFC_WD RDMA validator (#2781)" (4 minutes ago) [Ying Xie]
* 7d16b184 - Remove the no use new line in show version (#2792) (21 hours ago) [xumia]
* 3a880a2b - Support to display the SONiC OS Version in the command show version (#2787) (21 hours ago) [xumia]
* a5199f75 - [voq][chassis][generate_dump] [BCM] Dump only the relevant BCM commands for fabric cards (#2606) (21 hours ago) [saksarav-nokia]
* 2410d364 - Fixed a bug in "show vnet routes all" causing screen overrun. (#2644) (#2801) (
Why I did it
Change static route expiry timer max timeout value from 1800 to 172800.
To keep same value range as defined in sonic-restapi/sonic_api.yaml
How I did it
How to verify it
apply change to bgpcfd, restart bgp container see if the value take action.
#### Why I did it
When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds.
Fix issue https://github.com/sonic-net/sonic-buildimage/issues/14381
#### How I did it
The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port.
Implement the op in LACP runner to disable the port which leads to proper LACPDU send.
#### How to verify it
Set LAG between 2 switches.
Set LAGs to be router port and set ip address.
In switch A send ping to ip address of LAG in switch B.
In switch B, while ping is running remove port from LAG.
Verify ping is not stopping.