Why I did it
This PR is to update the check of IP_TYPE from sonic-acl.yang.
It's because if the ACL rule is added by loading a json file with acl-loader, there is no IP_TYPE for ACL rule. If such rule exists in ACL_RULE table, the GCU (generic config updater) refuses to update any ACL rules because the existing one is invalid.
This PR updates the yang model for ACL. If the IP_TYPE leaf doesn't exist, then we don't check the field.
How I did it
Accept the rule if IP_TYPE is absent.
How to verify it
The change is verified by UT.
Why I did it
We found a bug when pilot, the tag function doesn't remove the ACR domain when do tag, it makes the latest tag not work. And in the original tag function, it calls os.system and os.popen which are not recommend, need to refactor.
How I did it
Do a split("/") when get image_rep to fix the acr domain bug
Refactor the tag function code and add test cases
How to verify it
Check whether container images are tagged as latest when in kube mode.
Why I did it
src/sonic-swss
* bf496d1 - (HEAD -> 202205, origin/202205) [202205] Fixed set admin_status for deleted subintf due to late notification (#2704) (3 days ago) [EdenGri]
src/sonic-swss-common
* ecf60a3 - (HEAD -> 202205, origin/202205) [logger] Add public `restartLogger` to restart logger thread (#768) (2 days ago) [Longxiang Lyu]
Backport #14372 to 202205
Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles
How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Add yang model definition for CHASSIS_MODULE define and implemented for sonic chassis. HLD for this configuration is included in https://github.com/sonic-net/SONiC/blob/master/doc/pmon/pmon-chassis-design.md#configurationFixes#12640
How I did it
Added yang model definition, unit tests, sample config and documentation for the table
How to verify it
Validated config tree generation using "pyang -Vf tree -p /usr/local/share/yang/modules/ietf ./yang-models/sonic-voq-inband-interface.yang"
Built the below python-wheels to validate unit tests and other changes
target/python-wheels/bullseye/sonic_yang_mgmt-1.0-py3-none-any.whl
target/python-wheels/bullseye/sonic_yang_models-1.0-py3-none-any.whl
target/python-wheels/bullseye/sonic_config_engine-1.0-py3-none-any.whl
Why I did it
After the renaming of the asic_port_name in port_config.ini file (PR: #13053 ), the asic_ifname in port_config.ini is changed from '-ASIC<asic_id>' to just port. Example: 'Eth0-ASIC0' to 'Eth0'.
However, with this change a config_db generated via config load_minigraph would cause the EVERFLOW and EVERFLOWV6 tables under ACL_TABLE to not have any of non-LAG front panel interfaces. This was causing the EVERFLOW suite to fail.
How I did it
In parse_asic_external_neigbhors in minigraph.py there was a check that the asic_name.lower() (like asic0) is present in the port_alias_asic_map. However with -ASIC removed from the asic_ifname, the port_alias_asic_map would not have the asic_name and thus any non-LAG neighbor would not be included.
Fix was the ignore the asic name change as the port_alias_asic_map is already only looking for ports in just the same asic as asic_name.
How to verify it
Execute "config load_minigraph" with the mingraph which is generated by sonic-mgmt gen-minigraph script. And confirm ono-lag interface are present in the Everfloe table in the config_dbs.
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
Dhcpmon had incorrect RX count for server side packets. It does not raise any false alarms, but could miss catching server side packet count mismatch between snapshot and current counter.
Add debug mode which prints counter to syslog
How I did it
Due to dualtor inbound filter requirement, there are currently two filters, each for listening to rx / tx packets.
Originally, we opened up an rx/tx socket for each interface specified, which causes duplicate socket. Now we initialize the sockets only once. Both sockets are not binded to an interface, and we use vlan to interface mapping to filter packets. For inbound uplinks, we use a portchannel to interface mapping.
Previous dhcpmon counter before dual tor change:
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
Dhcpmon counter after this PR:
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
How to verify it
Ran dhcp relay test to send all four packets in singles and batches on both single ToR and dual ToR. Counter was as expected.
- Why I did it
Healthd check system status every 60 seconds. However, running checker may take several seconds. Say checker takes X seconds, healthd takes (60 + X) seconds to finish one iteration. This implementation makes sonic-mgmt test case not so stable because the value X is hard to predict and different among different platforms. This PR introduces an interval
compensation mechanism to healthd main loop.
- How I did it
Introduces an interval compensation mechanism to healthd main loop: healthd should wait (60 - X) seconds for next iteration
- How to verify it
Manual test
Unit test
Why I did it
Add 'channel' to the CONFIG_DB PORT table. This will be needed to support PORT breakout to multiple channel ports so that Xcvrd can understand which datapath or channel to initialize on the CMIS compliant optics
How I did it
Add 'channel' to the CONFIG_DB PORT table.
How to verify it
Added unit test for valid and invalid channel number
Channel 0 -> No breakout
Channel 1 to 8 -> Breakout channel 1,2, ..8
Signed-off-by: Prince George <prgeor@microsoft.com>
Porting fix https://github.com/sonic-net/sonic-buildimage/pull/14045 to 202205. 202205 doesn't have fast_rate and hence only fallback is updated in yang model
#### Why I did it
Added Missing fields in sonic-portchannel yang model.
"fallback" is present in configuration schema but not in yang model. This leads to traceback when yang is validated
#### How I did it
Updated yang model
#### How to verify it
Added tests to verify
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
#### Link to config_db schema for YANG module changes
Part of the PR
What I did
Check /etc/pam.d/sshd integrity after modify it in hostcfgd.
Why I did it
Found some incident that /etc/pam.d/sshd become empty file during OR upgrade.
How I verified it
Pass all UT.
Add new UT to cover new code.
Why I did it
Porting/cherry-pick PR sonic-net/sonic-host-services#46
"show reboot-cause history" shows empty history. When the previous-reboot-cause has a broken symlink, And rebooting the system will not be able to generate a new symlink of the new previous-reboot-cause.
admin@sonic:~$ show reboot-cause history
Name Cause Time User Comment
------ ------- ------ ------ ---------
How I did it
Somehow, when the symlink file /host/reboot-cause/previous-reboot-cause is broken (which its destination files doesn't exist in this case), the current condition check "if os.path,exists(PREVIOUS_REBOOT_CAUSE_FILE)" will return False in determine-reboot-cause script. Hence, the current previous-reboot-cause is not been removed and the recreation of the new previous-reboot-cause failed. In case of previous-reboot-cause is a broken synlink file, add condition os.path.islink(PREVIOUS_REBOOT_CAUSE) to check and allow the remove operation happens.
How to verify it
Manually make the /host/reboot-cause/previous-reboot-cause to be a broken symlink file by removing its destination file
reboot the system. "show reboot-cause history" should show the correct info
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
[Build] Support to use loosen version when failed to install python packages
It is to fix the issue #14012
How I did it
Try to use the installation command without constraint
How to verify it
To support 64 cores on arista skus. Fixesaristanetworks/sonic#77
Remapped recycle ports to lowers core port ids and set appl_param_nof_ports_per_modid to 64.
Co-authored-by: Sambath Kumar Balasubramanian <63021927+skbarista@users.noreply.github.com>
What I did:
Added IP Table rule to make sure we do not drop chassis internal traffic on eth1-midpplane when Control Plane ACL's are installed.
Why I did:
When Control Plane ACL's are installed there is default Catch All rule is added to drop all traffic that is not white-listed explicitly https://github.com/sonic-net/sonic-host-services/blob/master/scripts/caclmgrd#L735. In this case Internal Traffic between Supervisor and LC will get drop. To fix this added explicit rule to allow all traffic coming from eth1-midplane.
Fixes#11873.
When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.
In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.
system-health indicates errors in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
system-health process error messages were also altered to indicate which container had the issue; multiple containers may run processes with the same name, which can result in identical system-health error messages, causing ambiguity.
How I did it
Port container_checker logic from #11442 into service_checker for system-health.
How to verify it
Bringup Supervisor card with one or more missing fabric cards. Execute 'show system-health summary'. The command should not report failure due to missing dockers for the asics on the fabric cards which are not present.
linkmgrd:
* d2227d8 2023-02-22 | [active-standby] Toggle to standby if link down and config auto (#173) (HEAD -> 202205) [Longxiang Lyu]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Update sonic-swss submodule pointer to include the following:
* 3aeb4be Align watermark flow with port configuration ([#2672](https://github.com/sonic-net/sonic-swss/pull/2672))
Signed-off-by: dprital <drorp@nvidia.com>
#### Why I did it
Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:
```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```
A wrong file name is generated (e.g database.**servcee**).
#### How I did it
Fixed overlapping strings being passed to strcpy/strcat that receive restirct* pointers (strings should not overlap).
#### How to verify it
Perform first boot and observe services start immidiatelly after boot.
- Why I did it
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState'. The error log is only seen on shutdown flow such as fast-reboot/warm-reboot.
In shutdown flow, 'LoadState' might not be available in systemctl status output, using [] might cause a KeyError.
- How I did it
Use dict.get instead of []
- How to verify it
Manual test
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Added SONiC YANG model for RADIUS.
How I did it
Added the RADIUS and RADIUS_SERVER tables for global and per RADIUS server configuration. RADIUS statistics reside in COUNTERS_DB and are not part of the configuration. These are not a part of this PR.
How to verify it
Compiled sonic_yang_mgmt-1.0-py3-none-any.whl.
Why I did it
Cherry pick from #13097
[Build] Support Debian snapshot mirror to improve build stability
It is to enhance the reproducible build, supports the Debian snapshot mirror. It guarantees all the docker images using the same Debian mirror snapshot and fixes the temporary build failure which is caused by remote Debain mirror indexes changed during the build. It is also to fix the version conflict issue caused by no fixed versions of some of the Debian packages.
How I did it
Add a new feature to support the Debian snapshot mirror.
How to verify it