* [YANG] Add MUX_CABLE yang model (#11797)
Why I did it
Address issue #10970
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-mux-cable.yang and unit tests.
How to verify it
Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Pass sonic-config-engine unit test.
Link to config_db schema for YANG module changes
f8fe41a023/src/sonic-yang-models/doc/Configuration.md (mux_cable)
* [YANG] add peer switch model (#11828)
Why I did it
Address issue #10966
sign-off: Jing Zhang zhangjing@microsoft.com
How I did it
Add sonic-peer-switch.yang and unit tests.
How to verify it
Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Link to config_db schema for YANG module changes
b721ff87b9/src/sonic-yang-models/doc/Configuration.md (peer-switch)
src/sonic-swss
* 6a193e0 - (HEAD -> 202205, origin/202205) [Dual-ToR][ACL] bind LAG to ACL table in order to guarantee rule coverage if lag menber will be added to LAG after binding (#2749) (5 hours ago) [Andriy Yurkiv]
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
src/sonic-utilities
* 0b878087 - (HEAD -> 202205, origin/202205) Add display support for serial field in show chassis modules status CLI (#2858) (4 days ago) [amulyan7]
src/sonic-swss
* 2aec547 - (HEAD -> 202205, origin/202205) [pfcwd] Enhance DLR_INIT based recovery and DLR_PACKET_ACTION for broadcom platforms (#2807) (4 days ago) [Neetha John]
* fa4acd3 - Fix to substract the macsec sectag size from port MTU during InitializePort (#2789) (5 days ago) [judyjoseph]
src/sonic-utilities
* cd08aa69 - (HEAD -> 202205, origin/202205) [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2853) (2 days ago) [vdahiya12]
* 4b96bd7e - Update pcieutil error message on loading common pcie module (#2786) (6 days ago) [cytsao1]
* 77b725ca - Fix the show interface counters throwing exception on device with no external interfaces (#2851) (6 days ago) [abdosi]
Why I did it
Update cable length for uplink/downlink ports for chassis and update PG/pool headroom size accordingly.
Work item tracking
17880812
How I did it
Updated cable length as well as buffer config in HWSKU files.
Why I did it
To improve readability of config.bcm, fixed the alignment of soc properties
How to verify it
Build sonic_config_engine-1.0-py3-none-any.whl successfully
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Our k8s feature pulls new container image versions on each upgrade, so the number of container images inside SONiC keeps growing. There is currently no way to clean up old image versions, so the disk may fill up. Logic for cleaning up old container image versions needs to be added.
Work item tracking
Microsoft ADO (number only): 17979809
How I did it
Remove old container image versions except the feature's current and previous version images. The previous version image is kept to support fallback.
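The retention rule can be sketched in Python as follows. This is a minimal illustration only: the function name, the tag list, and the version strings are hypothetical and are not taken from the PR.

```python
def images_to_remove(tags, current, last=None):
    """Return the image tags that are safe to delete.

    `current` is always kept; `last` (the previous version) is kept
    when given, so that a fallback remains possible.
    """
    keep = {current}
    if last is not None:
        keep.add(last)
    # Preserve the input order so the result is easy to review.
    return [tag for tag in tags if tag not in keep]


# Hypothetical tags for one feature's container image.
tags = ["1.0.0", "1.1.0", "1.2.0", "1.3.0"]
print(images_to_remove(tags, current="1.3.0", last="1.2.0"))  # ['1.0.0', '1.1.0']
```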
How to verify it
Check whether the old version images are removed
Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
Why I did it
Introduce a new valid neighbor element type to YANG.
Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.
How to verify it
Passes UTs
Co-authored-by: Jing Kan <jika@microsoft.com>
This is backport of #14757
SONiC Yang model support for IPv6 link local
What I did
Created SONiC Yang model for IPv6 link local
How I did it
Defined Yang models for IPv6 link local based on https://github.com/sonic-net/SONiC/blob/master/doc/ipv6/ipv6_link_local.md
How to verify it
Added enable test case.
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
src/sonic-platform-common
* ff72811 - (HEAD -> 202205, origin/202205) Fix issue '<' not supported between instances of 'NoneType' and 'int' (#371) (5 hours ago) [Junchao-Mellanox]
* f2a419d - Render Media lane and Media assignment options info from Application Code (#368) (8 hours ago) [rajann]
* d8bad10 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (8 hours ago) [mihirpat1]
Why I did it
This PR is to backport PR #11117 into 202205 branch.
This PR is to define Yang model for SYSTEM_DEFAULTS table.
The table was introduced in PR sonic-net/SONiC#982
The table will look like:
"SYSTEM_DEFAULTS": {
    "tunnel_qos_remap": {
        "status": "enabled"
    }
}
Work item tracking
Microsoft ADO (https://msazure.visualstudio.com/One/_workitems/edit/23037078)
How I did it
Add a new yang file sonic-system-defaults.yang.
How to verify it
Verified by UT
What I did:
In FRR, the command update source <interface-name> is not at the address-family level. Because of this,
the internal peer route-map for IPv6 was getting applied to the IPv4 address family. As a result,
TSA over iBGP for IPv6 was not getting applied.
How I verify:
Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.
Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
A systemd stop event on a service with timers can sometimes delete the state_db entry for the corresponding service.
Note: This won't be observed on the latest master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers
root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
Expected
Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
snmp Down Down Inactive
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
How I did it
This happens because the timer is usually PartOf the service, so a stop on the service is propagated to the timer. Fixed the logic to handle this.
Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ]
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]
Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ]
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
There are chassis-packet and Single asic platforms which support this 400G to 100G/40G speed change via config.
Enabling this feature for all platforms which can support this. Keeping it enabled for all does not affect the platforms
which do not support this feature yet.
Work item tracking
Microsoft ADO (number only): 17952356
- How I did it
Removed switch_type and role type check.
- How to verify it
Loaded router with default 400G config. Loaded minigraph to convert 400G to 100G speed.
Signed-off-by: anamehra <anamehra@cisco.com>
src/sonic-platform-common
* c7ce1a5 - (HEAD -> 202205, origin/202205) Prevent VDM dictionary related KeyError when a transceiver module is pulled while a bulk get method is interrogating said module (#360) (5 days ago) [snider-nokia]
* Support ACL interface type BmcData in minigraph parser
* add unittest
* Add a global dict for storing the definition of custom acl tables
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
#### Why I did it
When a user enables TACACS per-command authorization and runs a command with a wildcard, and the command matches hundreds of files, per-command authorization fails with the following message:
*** authorize failed by TACACS+ with given arguments, not executing
The root cause is that bash expands the wildcard and replaces the wildcard args with the matched files. When there are too many files, the TACACS plugin generates a large authorization request, which is rejected by the server side.
##### Work item tracking
- Microsoft ADO **(number only)**: 18074861
#### How I did it
Fix bash patch file, use original user inputs as authorization parameters.
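
To illustrate why using the original input keeps the request small, here is a minimal Python sketch. The actual fix lives in the bash patch; the function name and the sample file list below are hypothetical.

```python
import shlex


def authorization_args(user_input, expanded_argv, use_original=True):
    """Pick the arguments sent in the authorization request.

    user_input: the command line as typed by the user.
    expanded_argv: the argument list after the shell expanded wildcards.
    """
    if use_original:
        # Wildcards stay as typed, so the request size does not depend
        # on how many files match.
        return shlex.split(user_input)
    return expanded_argv


# Simulate a wildcard that matches 500 files.
expanded = ["ls"] + [f"file{i}.log" for i in range(500)]
print(len(authorization_args("ls *.log", expanded, use_original=False)))  # 501
print(authorization_args("ls *.log", expanded))  # ['ls', '*.log']
```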
#### How to verify it
Pass all UT.
Create a new UT to validate that TACACS authorization requests use the original command arguments.
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8115
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [X] 202205
- [X] 202211
#### Tested branch (Please provide the tested image version)
- [x] 202205.258490-412b83d0f
- [x] 202211.71966120-1b971c54b5
#### Description for the changelog
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
* yang model support for neighbor metadata
* add description in leaf node
* modify description
Co-authored-by: jcaiMR <111116206+jcaiMR@users.noreply.github.com>
* [minigraph] add support for changing T1 ports speed from 400G to 100G and vice-versa (#14505)
On the SONiC T1 Cisco 8101 HwSku, speed changes from 400G to 100G need to be supported on 400G ports.
To enable this, the port lanes need to be changed along with the speed. This PR has the changes to update the port lanes when such a speed change happens.
Basically, if Bandwidth in minigraph.xml intends to enable a 100G speed on a 400G port, then the appropriate lane change and speed change need to be invoked in the minigraph parser.
For example, if port_config.ini dictates the speed to be 400G and the minigraph has 100G speed, then this change needs to be accommodated:
Ethernet96 1536,1537,1538,1539,1540,1541,1542,1543 etp12 12 400000 0
<DeviceLinkBase>
<ElementType>DeviceInterfaceLink</ElementType>
<EndDevice>ARISTA01T2</EndDevice>
<EndPort>Ethernet1</EndPort>
<StartDevice>Device-8101-01</StartDevice>
<StartPort>etp12</StartPort>
<Bandwidth>100000</Bandwidth>
</DeviceLinkBase>
These platforms today have 400G ports with 8 serdes lanes, and 100G operates with 4 serdes lanes. When the port speed changes from 400G to 100G, the first 4 lanes are used for the 100G port.
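The lane selection can be sketched in Python as follows. This is an illustration under stated assumptions, not the minigraph parser code: the function name and its parameters are hypothetical, and only the 8-lane 400G to 100G case described above is handled.

```python
def lanes_for_speed(lanes, port_speed, link_speed):
    """Return the lane string to use for a port.

    lanes: comma-separated lane list as found in port_config.ini.
    port_speed: speed from port_config.ini (in Mbps).
    link_speed: Bandwidth from the minigraph (in Mbps).
    """
    lane_list = lanes.split(",")
    if port_speed == 400000 and link_speed == 100000 and len(lane_list) == 8:
        # 100G runs on 4 serdes lanes: keep the first 4 of the 8.
        return ",".join(lane_list[:4])
    # Any other combination is left untouched in this sketch.
    return lanes


# Lanes of Ethernet96 from the port_config.ini line above.
print(lanes_for_speed("1536,1537,1538,1539,1540,1541,1542,1543", 400000, 100000))
# 1536,1537,1538,1539
```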
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* add all
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* fix unit
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
---------
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
src/sonic-platform-common
* c97af3c - (HEAD -> 202205, origin/202205) Modify sfputil show fwversion to include build version for active/inactive FW version fields (#367) (2 days ago) [mihirpat1]
* 7705a20 - Adding electrical for 800G and 100G (#365) (2 days ago) [mihirpat1]
* d0038fc - SFF-8472: Fix tx_disable_channel to avoid write to read-only bit (#364) (2 days ago) [mihirpat1]
* 518a471 - fix get module hardware minor revision (#361) (2 days ago) [Qingxiao Ren]
src/sonic-platform-daemons
* f913e9c - (HEAD -> 202205, origin/202205) [CMIS] Add power up duration for power up timeout (#345) (2 hours ago) [ChiouRung Haung]
src/sonic-sairedis
* 7f6abdd - (HEAD -> 202205, origin/202205) Revert "Ignore removing switch for mellanox platform due to known limitation (#1216)" (#1231) (8 days ago) [Junchao-Mellanox]
Why I did it
refine reproducible build.
How I did it
Fix reset map variable in bash.
Ignore empty web file md5sum value.
If web file didn't backup in azure storage, use file on web.
How to verify it
What/Why I did:
Allow traffic with the chassis eth1-midplane IP as source and destination. This is needed for the Supervisor Redis-db connection (Redis packets have the eth1-midplane IP as source and destination) after we load an acl.json that has a catch-all drop rule. The changes are generic, not specific to the supervisor, and apply on LCs as well.
Made multi_asic_ns_to_host_fwd as False for ACL service for External Client. This flag is needed for service SSH and SNMP where traffic can come in namespace over front-panel ports and we need to send the traffic in host where corresponding docker/service are running. There is no use-case of External client service for multi-asic as of now. Having flag as True creates failure when we try to load acl.json.
src/sonic-utilities
* ece22b7d - (HEAD -> 202205, origin/202205) Revert "[GCU] Add PFC_WD RDMA validator (#2781)" (4 minutes ago) [Ying Xie]
* 7d16b184 - Remove the no use new line in show version (#2792) (21 hours ago) [xumia]
* 3a880a2b - Support to display the SONiC OS Version in the command show version (#2787) (21 hours ago) [xumia]
* a5199f75 - [voq][chassis][generate_dump] [BCM] Dump only the relevant BCM commands for fabric cards (#2606) (21 hours ago) [saksarav-nokia]
* 2410d364 - Fixed a bug in "show vnet routes all" causing screen overrun. (#2644) (#2801) (
Why I did it
Change static route expiry timer max timeout value from 1800 to 172800.
To keep same value range as defined in sonic-restapi/sonic_api.yaml
How I did it
How to verify it
Apply the change to bgpcfgd, restart the bgp container, and check that the value takes effect.
#### Why I did it
When removing a port from a LAG while traffic is running through the LAG, there is a traffic disruption of 60 seconds.
Fix issue https://github.com/sonic-net/sonic-buildimage/issues/14381
#### How I did it
The patch I added introduces a "port_removing" op and calls it right before the kernel is asked to remove the port.
The op is implemented in the LACP runner to disable the port, which leads to a proper LACPDU being sent.
#### How to verify it
Set LAG between 2 switches.
Set LAGs to be router port and set ip address.
In switch A send ping to ip address of LAG in switch B.
In switch B, while ping is running remove port from LAG.
Verify ping is not stopping.
src/sonic-platform-common
* 24009de - (HEAD -> 202205, origin/202205) Fix issue: should always check return value of a function if the function may return None (#350) (#356) (3 hours ago) [Junchao-Mellanox]
Why I did it
Optimize the version control for Debian packages.
Fix sonic-slave-buster/sources.list.amd64 not found display issue, need to generate the file before running the shell command to evaluate the sonic image tag.
When using the snapshot mirror, it is not necessary to update the version file based on the base image. It will reduce the version dependency issue, when an image is not run when freezing the version.
How I did it
Do not update the version file when the snapshot mirror is enabled.
How to verify it
#### Why I did it
Update sonic-swss-common submodule pointer to include the following:
* 55fd28a [202205] Non recursive automake and Debian packaging changes (sonic-net/sonic-swss-common#772)
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.
How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.
How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.
Why I did it
This PR is to update the check of IP_TYPE from sonic-acl.yang.
It's because if the ACL rule is added by loading a json file with acl-loader, there is no IP_TYPE for ACL rule. If such rule exists in ACL_RULE table, the GCU (generic config updater) refuses to update any ACL rules because the existing one is invalid.
This PR updates the yang model for ACL. If the IP_TYPE leaf doesn't exist, then we don't check the field.
How I did it
Accept the rule if IP_TYPE is absent.
How to verify it
The change is verified by UT.
Why I did it
During the pilot we found a bug: the tag function does not remove the ACR domain when tagging, so the latest tag does not work. The original tag function also calls os.system and os.popen, which are not recommended, so it needs refactoring.
How I did it
Split image_rep on "/" when reading it to fix the ACR domain bug.
Refactor the tag function code and add test cases.
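A minimal Python sketch of the two ideas follows. It assumes image_rep has the form `<acr-domain>/<name>`; the function names, the example registry, and the tagging scheme are hypothetical and may differ from the real tag function.

```python
def strip_registry_domain(image_rep):
    """Drop everything up to the last "/" (the assumed registry domain prefix)."""
    # A name without "/" is returned unchanged.
    return image_rep.split("/")[-1]


def build_tag_command(image_rep, version):
    """Build an argument list for `docker tag`.

    A list like this can be passed to subprocess.run(argv, check=True),
    which avoids the shell string handling of os.system and os.popen.
    """
    name = strip_registry_domain(image_rep)
    return ["docker", "tag", f"{image_rep}:{version}", f"{name}:latest"]


print(strip_registry_domain("example.azurecr.io/snmp"))  # snmp
print(build_tag_command("example.azurecr.io/snmp", "1.0.0"))
```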
How to verify it
Check whether container images are tagged as latest when in kube mode.
Why I did it
src/sonic-swss
* bf496d1 - (HEAD -> 202205, origin/202205) [202205] Fixed set admin_status for deleted subintf due to late notification (#2704) (3 days ago) [EdenGri]
src/sonic-swss-common
* ecf60a3 - (HEAD -> 202205, origin/202205) [logger] Add public `restartLogger` to restart logger thread (#768) (2 days ago) [Longxiang Lyu]
Backport #14372 to 202205
Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles
How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Add the yang model definition for CHASSIS_MODULE, which is defined and implemented for SONiC chassis. The HLD for this configuration is included in https://github.com/sonic-net/SONiC/blob/master/doc/pmon/pmon-chassis-design.md#configuration
Fixes #12640
How I did it
Added yang model definition, unit tests, sample config and documentation for the table
How to verify it
Validated config tree generation using "pyang -Vf tree -p /usr/local/share/yang/modules/ietf ./yang-models/sonic-voq-inband-interface.yang"
Built the below python-wheels to validate unit tests and other changes
target/python-wheels/bullseye/sonic_yang_mgmt-1.0-py3-none-any.whl
target/python-wheels/bullseye/sonic_yang_models-1.0-py3-none-any.whl
target/python-wheels/bullseye/sonic_config_engine-1.0-py3-none-any.whl
Why I did it
After the renaming of the asic_port_name in port_config.ini file (PR: #13053 ), the asic_ifname in port_config.ini is changed from '-ASIC<asic_id>' to just port. Example: 'Eth0-ASIC0' to 'Eth0'.
However, with this change a config_db generated via config load_minigraph would cause the EVERFLOW and EVERFLOWV6 tables under ACL_TABLE to not have any of non-LAG front panel interfaces. This was causing the EVERFLOW suite to fail.
How I did it
In parse_asic_external_neigbhors in minigraph.py there was a check that asic_name.lower() (like asic0) is present in the port_alias_asic_map. However, with -ASIC removed from the asic_ifname, the port_alias_asic_map would not have the asic_name, and thus no non-LAG neighbor would be included.
The fix was to drop this asic name check, as the port_alias_asic_map already only looks at ports in the same asic as asic_name.
How to verify it
Execute "config load_minigraph" with the minigraph generated by the sonic-mgmt gen-minigraph script, and confirm that non-LAG interfaces are present in the EVERFLOW table in the config_dbs.
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
Dhcpmon had incorrect RX count for server side packets. It does not raise any false alarms, but could miss catching server side packet count mismatch between snapshot and current counter.
Add debug mode which prints counter to syslog
How I did it
Due to dualtor inbound filter requirement, there are currently two filters, each for listening to rx / tx packets.
Originally, we opened an rx/tx socket for each interface specified, which caused duplicate sockets. Now we initialize the sockets only once. Neither socket is bound to an interface, and we use a vlan to interface mapping to filter packets. For inbound uplinks, we use a portchannel to interface mapping.
Previous dhcpmon counter before dual tor change:
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
Dhcpmon counter after this PR:
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
How to verify it
Ran dhcp relay test to send all four packets in singles and batches on both single ToR and dual ToR. Counter was as expected.
- Why I did it
Healthd checks system status every 60 seconds. However, running the checkers may take several seconds. If the checkers take X seconds, healthd takes (60 + X) seconds to finish one iteration. This makes the sonic-mgmt test case unstable because the value of X is hard to predict and differs among platforms. This PR introduces an interval compensation mechanism to the healthd main loop.
- How I did it
Introduces an interval compensation mechanism to healthd main loop: healthd should wait (60 - X) seconds for next iteration
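The compensation can be sketched in Python as follows. This is a minimal illustration, not the healthd code: the function names are hypothetical, and clamping the wait at zero when a check overruns the interval is an assumption of this sketch.

```python
import time


def compensated_wait(interval, elapsed):
    """Time left to wait so that one iteration takes `interval` seconds."""
    # Never return a negative wait if the check overran the interval.
    return max(0.0, interval - elapsed)


def run_loop(check, interval=60.0, iterations=1, sleep=time.sleep):
    """Run `check` every `interval` seconds, compensating for its runtime."""
    for _ in range(iterations):
        start = time.monotonic()
        check()
        elapsed = time.monotonic() - start
        sleep(compensated_wait(interval, elapsed))


print(compensated_wait(60.0, 7.5))   # 52.5
print(compensated_wait(60.0, 75.0))  # 0.0
```

Passing `sleep` as a parameter keeps the loop testable without real delays.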
- How to verify it
Manual test
Unit test
Why I did it
Add 'channel' to the CONFIG_DB PORT table. This will be needed to support PORT breakout to multiple channel ports so that Xcvrd can understand which datapath or channel to initialize on the CMIS compliant optics
How I did it
Add 'channel' to the CONFIG_DB PORT table.
How to verify it
Added unit test for valid and invalid channel number
Channel 0 -> No breakout
Channel 1 to 8 -> Breakout channel 1,2, ..8
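The valid range can be sketched as a small Python check. This only mirrors the mapping listed above; the function name is hypothetical and the real validation is done by the YANG model.

```python
def describe_channel(channel):
    """Map a PORT table 'channel' value to its meaning.

    0 means no breakout; 1 to 8 select the breakout channel.
    Anything else is rejected.
    """
    if isinstance(channel, bool) or not isinstance(channel, int):
        raise ValueError(f"channel must be an integer, got {channel!r}")
    if not 0 <= channel <= 8:
        raise ValueError(f"channel must be in 0..8, got {channel}")
    return "no breakout" if channel == 0 else f"breakout channel {channel}"


print(describe_channel(0))  # no breakout
print(describe_channel(3))  # breakout channel 3
```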
Signed-off-by: Prince George <prgeor@microsoft.com>
Porting fix https://github.com/sonic-net/sonic-buildimage/pull/14045 to 202205. 202205 doesn't have fast_rate and hence only fallback is updated in yang model
#### Why I did it
Added Missing fields in sonic-portchannel yang model.
"fallback" is present in configuration schema but not in yang model. This leads to traceback when yang is validated
#### How I did it
Updated yang model
#### How to verify it
Added tests to verify
#### Link to config_db schema for YANG module changes
Part of the PR
What I did
Check /etc/pam.d/sshd integrity after modifying it in hostcfgd.
Why I did it
Found incidents in which /etc/pam.d/sshd became an empty file during OR upgrade.
How I verified it
Pass all UT.
Add new UT to cover new code.
Why I did it
Porting/cherry-pick PR sonic-net/sonic-host-services#46
"show reboot-cause history" shows an empty history when previous-reboot-cause is a broken symlink. Rebooting the system then cannot generate a new previous-reboot-cause symlink.
admin@sonic:~$ show reboot-cause history
Name Cause Time User Comment
------ ------- ------ ------ ---------
How I did it
When the symlink /host/reboot-cause/previous-reboot-cause is broken (its destination file doesn't exist), the current condition check "if os.path.exists(PREVIOUS_REBOOT_CAUSE_FILE)" returns False in the determine-reboot-cause script. Hence, the current previous-reboot-cause is not removed and the creation of the new previous-reboot-cause fails. To handle a broken symlink, add the condition os.path.islink(PREVIOUS_REBOOT_CAUSE_FILE) so that the remove operation can happen.
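The behavior can be reproduced with a short Python sketch. The helper name and the temporary paths are hypothetical; only os.path.exists and os.path.islink come from the description above.

```python
import os
import tempfile


def remove_if_present(path):
    """Remove `path` even when it is a broken symlink.

    os.path.exists follows symlinks, so it is False for a link whose
    target is missing; os.path.islink still detects the link itself.
    Returns True if something was removed.
    """
    if os.path.exists(path) or os.path.islink(path):
        os.remove(path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "previous-reboot-cause")
    # Point the link at a file that does not exist.
    os.symlink(os.path.join(tmp, "missing.json"), link)
    print(os.path.exists(link), os.path.islink(link))  # False True
    print(remove_if_present(link))  # True
```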
How to verify it
Manually make the /host/reboot-cause/previous-reboot-cause to be a broken symlink file by removing its destination file
reboot the system. "show reboot-cause history" should show the correct info
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
[Build] Support to use loosen version when failed to install python packages
It is to fix the issue #14012
How I did it
Try to use the installation command without constraint
How to verify it
To support 64 cores on arista skus. Fixes aristanetworks/sonic#77
Remapped recycle ports to lower core port ids and set appl_param_nof_ports_per_modid to 64.
Co-authored-by: Sambath Kumar Balasubramanian <63021927+skbarista@users.noreply.github.com>
What I did:
Added an IP Table rule to make sure we do not drop chassis internal traffic on eth1-midplane when Control Plane ACLs are installed.
Why I did:
When Control Plane ACLs are installed, a default catch-all rule is added to drop all traffic that is not explicitly white-listed: https://github.com/sonic-net/sonic-host-services/blob/master/scripts/caclmgrd#L735. In this case, internal traffic between the Supervisor and LC gets dropped. To fix this, an explicit rule was added to allow all traffic coming from eth1-midplane.
Fixes #11873.
When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.
In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.
system-health indicates errors in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
system-health process error messages were also altered to indicate which container had the issue; multiple containers may run processes with the same name, which can result in identical system-health error messages, causing ambiguity.
How I did it
Port container_checker logic from #11442 into service_checker for system-health.
How to verify it
Bring up a Supervisor card with one or more missing fabric cards. Execute 'show system-health summary'. The command should not report failure due to missing dockers for the asics on the fabric cards which are not present.
linkmgrd:
* d2227d8 2023-02-22 | [active-standby] Toggle to standby if link down and config auto (#173) (HEAD -> 202205) [Longxiang Lyu]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Update sonic-swss submodule pointer to include the following:
* 3aeb4be Align watermark flow with port configuration ([#2672](https://github.com/sonic-net/sonic-swss/pull/2672))
Signed-off-by: dprital <drorp@nvidia.com>
#### Why I did it
Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:
```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```
A wrong file name is generated (e.g database.**servcee**).
#### How I did it
Fixed overlapping strings being passed to strcpy/strcat, which take restrict* pointers (the strings must not overlap).
#### How to verify it
Perform a first boot and observe that services start immediately after boot.
- Why I did it
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState'. The error log is only seen on shutdown flow such as fast-reboot/warm-reboot.
In shutdown flow, 'LoadState' might not be available in systemctl status output, using [] might cause a KeyError.
- How I did it
Use dict.get instead of []
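The difference can be shown with a short Python sketch. The function name and the sample dictionaries are hypothetical; the real code reads unit properties obtained from systemd.

```python
def get_load_state(unit_properties):
    """Read 'LoadState' without raising when the key is absent.

    unit_properties["LoadState"] would raise KeyError if the key is
    missing; dict.get returns None instead.
    """
    return unit_properties.get("LoadState")


print(get_load_state({"LoadState": "loaded", "ActiveState": "active"}))  # loaded
print(get_load_state({"ActiveState": "inactive"}))  # None
```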
- How to verify it
Manual test
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Added SONiC YANG model for RADIUS.
How I did it
Added the RADIUS and RADIUS_SERVER tables for global and per RADIUS server configuration. RADIUS statistics reside in COUNTERS_DB and are not part of the configuration. These are not a part of this PR.
How to verify it
Compiled sonic_yang_mgmt-1.0-py3-none-any.whl.
Why I did it
Cherry pick from #13097
[Build] Support Debian snapshot mirror to improve build stability
This enhances reproducible builds by supporting the Debian snapshot mirror. It guarantees that all docker images use the same Debian mirror snapshot and fixes the temporary build failures caused by remote Debian mirror indexes changing during the build. It also fixes the version conflict issue caused by some Debian packages having no fixed versions.
How I did it
Add a new feature to support the Debian snapshot mirror.
How to verify it
- Why I did it
Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8.
- How I did it
Regarding buffer settings, there were originally two lossless PGs and queues: 3 and 4. In the dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back.
To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches.
On uplink ports, DSCP 2/6 are mapped to TC 2/6 respectively.
On downlink ports, both DSCP 2/6 are still mapped to TC 1.
Buffer adjusted according to the ports information:
Mellanox-SN4600c-C64:
56 downlinks 50G + 8 uplinks 100G
Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8:
24 downlinks 50G + 8 uplinks 100G
- How to verify it
Unit test.
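The DSCP to TC mapping described under "How I did it" can be sketched as a small lookup table. This is only an illustration of the stated values; the dictionary layout, the role names, and the `tc_for` helper are hypothetical and do not reflect the actual QoS templates:

```python
# DSCP -> TC values from the description above; the layout and names
# here are assumptions made for illustration.
DSCP_TO_TC = {
    "uplink": {2: 2, 6: 6},    # uplink ports: DSCP 2 -> TC 2, DSCP 6 -> TC 6
    "downlink": {2: 1, 6: 1},  # downlink ports: both still map to TC 1
}


def tc_for(port_role, dscp):
    """Look up the TC for DSCP 2 or 6 on the given port role."""
    try:
        return DSCP_TO_TC[port_role][dscp]
    except KeyError:
        # Other DSCP values and roles are outside this sketch.
        raise ValueError(
            f"no mapping sketched for {port_role!r} DSCP {dscp}"
        ) from None
```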
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
* Enable marvell-armhf saiserver docker
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* fix libsaithrift build env
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* fix thrift 014 dependent issue in armhf
* fix build env
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* [sai_ptf]fix thrift armhf build
the armhf build failed because no python command was available
how
add a check for the available python command (python or python3) and use the right command based on the result
verify
container build
* [Thrift_014[armhf]]Fix libboost_unit_test_framework.a not found during build
Why
an error occurs when building thrift on armhf
How
to fix this issue, add a soft link for the dependent file
Verify
Build pipeline
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* add metadata dependence
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* change build pipeline
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
---------
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
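The python/python3 check mentioned in the sai_ptf commit above could look roughly like the following. This is a Python rendering for illustration only: the actual fix lives in the build scripts, and the `find_python_command` name and its parameters are hypothetical:

```python
import shutil


def find_python_command(candidates=("python", "python3"), which=shutil.which):
    """Return the first candidate command that is found on PATH.

    `which` is injectable so the lookup can be exercised without
    depending on the interpreters installed on the build host.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    raise RuntimeError("no Python interpreter found on PATH")
```

On a host that ships only `python3`, the first candidate is skipped and `python3` is returned, so the build can invoke whichever command exists.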
Why I did it
The golang lib xmlquery v1.2.1 has a critical security issue. MS Component Governance created an alert.
Update submodule HEAD to fix the CG alert about CVE-2020-25614
How I did it
sonic-mgmt-framework
a72d9ee Fix CG alert CVE-2020-25614 about xmlquery v1.2.1 (#91)
sonic-telemetry
727aefd Fix CG alert CVE-2020-25614 about xmlquery v1.2.1 (#107)
utilities:
* c63a62b 2023-01-23 | [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when `radv` service is stopped (#2622) (HEAD -> 202205) [Jing Zhang]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single ToR and dual ToR: sent packets and verified that the printed dhcpmon rx and tx counters are correct
Correct number of tx increases
Tx does not increase when ToR is on standby