#### Why I did it
src/sonic-host-services
```
* 6767bc7 - (HEAD -> master, origin/master, origin/HEAD) [FeatureD] Move the Feature related config from Hostcfgd into a new daemon (#71) (6 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
8111 800G interface, split to 2x400G (each has 4 lanes) fails to change interface speed from 400G to 100G during deploy mg. In minigraph.xml, the interface speed configuration is good, but fails to generate the right value to config_db.json.
In order to support this SKU the speed transitioning should support both 4 lanes and 8 lanes in the port_config.ini.
Why I did it
before this change for a 400G to 100G transition, in all cases except when lanes are 8, we would continue and the line
ports.setdefault(port_name, {})['speed'] = port_speed_png[port_name]
would not be executed, hence the default speed will never be set for a case and config_db will not be updated,
where speed is transitioning from 400G to 100G or 40G, but lanes are not equal to 8.
In order for those cases to pass where lanes are not specifically 8, we need the change
Work item tracking
24242657
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
#### Why I did it
When a kernel crash occurs, the system will reboot to the kdump capture kernel if kdump is enabled (`config kdump enable`). In the kdump capture boot, it only stores the crash information, and then reboot the system to a normal boot.
In this boot, no SONiC service is started but it invokes `reboot` which is actually the SONiC reboot that depends on SONiC services. There is a logic to skip all SONiC stuff and invoke platform reboot in SONiC reboot to avoid issues.
However, on Nvidia platforms, the platform reboot still depends on SONiC services, which can cause issues.
So, the Debian reboot is called directly in platform reboot if it is invoked from the kdump capture boot.
#### How I did it
Manual test
### Why I did it
- Hostcfgd is handling a lot of tasks and Feature table is by itself an important and big task which can benefit from separation into a new daemon
- Currently, Hostcfgd handles feature table first before other tables an thus other taska such as Aaa, Ntp are delayed. With the split, they can run in paralell
- After the recent config-reload enhancements, Hostcfgd uses a multi-threading approach to listen to PortInitDone. BY splitting the daemon into two, we can avoid having a separate thread by using SubscriberStateTable and Select,.
#### Note:
Depends on host-services PR : https://github.com/sonic-net/sonic-host-services/pull/71
Once the host-services is merged, updating the submodule along with this PR should fix the CI problem
#### How I did it
Refactor the feature related tasks from hostcfgd into a seperate daemon.
#### How to verify it
UT's and Tested on DUT
```
admin@r-tigris-22:~$ show logging -f | grep featured
Jun 28 22:13:33.870021 r-tigris-22 INFO featured: ConfigDB connect success
Jun 28 22:14:05.638063 r-tigris-22 INFO featured: Updating feature 'radv' systemd config file related to auto-restart ...
Jun 28 22:14:06.169184 r-tigris-22 INFO featured: Feature radv is enabled and started
Jun 28 22:14:06.172343 r-tigris-22 INFO featured: Updating feature 'sflow' systemd config file related to auto-restart ...
Jun 28 22:14:06.844322 r-tigris-22 INFO featured: Feature sflow is stopped and disabled
Jun 28 22:14:06.846761 r-tigris-22 INFO featured: Updating feature 'snmp' systemd config file related to auto-restart ...
Jun 28 22:14:07.129090 r-tigris-22 INFO featured: Feature is snmp delayed for port init
Jun 28 22:14:07.132052 r-tigris-22 INFO featured: Updating feature 'swss' systemd config file related to auto-restart ...
Jun 28 22:14:08.368948 r-tigris-22 INFO featured: Feature swss is enabled and started
Jun 28 22:14:08.369240 r-tigris-22 INFO featured: Updating feature 'syncd' systemd config file related to auto-restart ...
Jun 28 22:14:08.718357 r-tigris-22 INFO featured: Feature syncd is enabled and started
Jun 28 22:14:08.721496 r-tigris-22 INFO featured: Updating feature 'teamd' systemd config file related to auto-restart ...
Jun 28 22:14:09.042495 r-tigris-22 INFO featured: Feature teamd is enabled and started
Jun 28 22:14:09.045441 r-tigris-22 INFO featured: Updating feature 'telemetry' systemd config file related to auto-restart ...
Jun 28 22:14:09.359831 r-tigris-22 INFO featured: Feature is telemetry delayed for port init
Jun 28 22:14:30.740499 r-tigris-22 INFO featured: Updating delayed features after port initialization
Jun 28 22:14:33.914178 r-tigris-22 INFO featured: Feature lldp is enabled and started
Jun 28 22:14:35.536264 r-tigris-22 INFO featured: Feature mgmt-framework is enabled and started
Jun 28 22:14:38.098571 r-tigris-22 INFO featured: Feature snmp is enabled and started
Jun 28 22:14:39.555727 r-tigris-22 INFO featured: Feature telemetry is enabled and started
Jun 28 22:13:33.977011 r-tigris-22 INFO hostcfgd: ConfigDB connect success
Jun 28 22:13:33.993878 r-tigris-22 INFO hostcfgd: Waiting for systemctl to finish initialization
Jun 28 22:13:34.274818 r-tigris-22 INFO hostcfgd: systemctl has finished initialization -- proceeding ...
Jun 28 22:13:34.391623 r-tigris-22 INFO hostcfgd: file size check pass: /etc/pam.d/sshd size is (2139) bytes
Jun 28 22:13:34.427273 r-tigris-22 INFO hostcfgd: file size check pass: /etc/pam.d/login size is (4132) bytes
Jun 28 22:13:34.433390 r-tigris-22 INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes
Jun 28 22:13:34.455110 r-tigris-22 INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes
Jun 28 22:13:34.478882 r-tigris-22 INFO hostcfgd: Found audisp-tacplus PID: 442
Jun 28 22:13:34.482365 r-tigris-22 INFO hostcfgd: cmd - ['service', 'aaastatsd', 'stop']
Jun 28 22:13:36.108569 r-tigris-22 INFO hostcfgd: NtpCfg load ...
Jun 28 22:13:36.108699 r-tigris-22 INFO hostcfgd: ntp server update key 0
Jun 28 22:13:36.108763 r-tigris-22 INFO hostcfgd: ntp server update, restarting ntp-config, ntp servers configured set()
Jun 28 22:14:06.691693 r-tigris-22 INFO hostcfgd: KdumpCfg init ...
Jun 28 22:14:06.691771 r-tigris-22 DEBUG hostcfgd: passw_policies_update - key: POLICIES
Jun 28 22:14:06.691832 r-tigris-22 DEBUG hostcfgd: passw_policies_update - data: {'digits_class': 'true', 'expiration': '180', 'expiration_warning': '15', 'history_cnt': '10', 'len_min': '8', 'lower_class': 'true', 'reject_user_passw_match': 'true', 'special_class': 'true', 'state': 'disabled', 'upper_class': 'true'}
Jun 28 22:14:06.691891 r-tigris-22 DEBUG hostcfgd: modify_conf_file: passw_policies - {'digits_class': True, 'expiration': '180', 'expiration_warning': '15', 'history_cnt': '10', 'len_min': '8', 'lower_class': True, 'reject_user_passw_match': True, 'special_class': True, 'state': 'disabled', 'upper_class': True}
Jun 28 22:14:06.701982 r-tigris-22 DEBUG hostcfgd: Initial hostname: r-tigris-22
Jun 28 22:14:06.702075 r-tigris-22 DEBUG hostcfgd: Initial mgmt interface conf: {('eth0', '10.210.24.108/22'): {'gwaddr': '10.210.24.1'}}
Jun 28 22:14:06.702115 r-tigris-22 DEBUG hostcfgd: Initial mgmt VRF state:
Jun 28 22:14:06.702177 r-tigris-22 INFO hostcfgd: RSyslogCfg: Initial config: {'config': {'GLOBAL': {'rate_limit_burst': '0', 'rate_limit_interval': '0'}}, 'servers': {}}
Jun 28 22:14:06.709455 r-tigris-22 INFO hostcfgd[39326]: Failed to restart resolv-config.service: Unit resolv-config.service not found.
Jun 28 22:14:06.709560 r-tigris-22 ERR hostcfgd: ['systemctl', 'restart', 'resolv-config'] - failed: return code - 5, output:#012None
admin@r-tigris-22:~$ Connection to r-tigris-22 closed by remote host.
```
- Why I did it
Increase UT coverage for Nvidia platform API code
Work item tracking
Microsoft ADO (number only):
- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py
- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
- Why I did it
Added the fwtrace config files in order to be able to call the mlxstrace utility during the show techsupport dump.
Work item tracking
Microsoft ADO (number only):
- How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.
- How to verify it
Execute the show techsupport command and check if mlxstrace output is in system dump.
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
#### Why I did it
src/sonic-gnmi
```
* c548cc2 - (HEAD -> master, origin/master, origin/HEAD) Support empty protobytes (#141) (2 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* cd882cc8 - (HEAD -> master, origin/master, origin/HEAD) Input check for timeout in generate_dump (#2925) (4 hours ago) [ycoheNvidia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* [E1031] add platform specific reboot command support
Why I did it
E1031: add platform specific cold reboot support
How I did it
Use the CPLD to trigger board level power cycle when cold reboot
How to verify it
Do reboot stress test and check the reboot cause history
* [E1031] try to umount filesystem before power cycle reboot
* [E1031] remove fstrim in customized reboot script
#### Why I did it
src/sonic-gnmi
```
* 58a7b20 - (HEAD -> master, origin/master, origin/HEAD) Add delete field to On change response when key is deleted (#139) (8 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Enable rpc target in PR checker to avoid build break for rpc target.
Work item tracking
Microsoft ADO (number only): 24708372
How I did it
How to verify it
#### Why I did it
src/sonic-utilities
```
* a56b11b6 - (HEAD -> master, origin/master, origin/HEAD) revert unit test tests/test_clear_tag (#2934) (10 hours ago) [Mai Bui]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Use remote PR test template from sonic-mgmt master to run PR test.
How I did it
Modify PR test azure pipeline yml file.
How to verify it
PR test executing normally.
Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
Why I did it
The protoc-dev library with the wrong declaration.
Work item tracking
Microsoft ADO (number only): 24707066
How I did it
Revise the wrong declaration from:
PROTOC = libprotoc_$(PROTOBUF_VERSION_FULL)_$(CONFIGURED_ARCH).deb to PROTOC_DEV = libprotoc-dev$(PROTOBUF_VERSION_FULL)_$(CONFIGURED_ARCH).deb
How to verify it
Check Azp log error.
Why I did it
Support default DNS configuration
How I did it
Use j2 template to generate default DNS configuration.
How to verify it
Run sonic-config-engine unit test.
This reverts commit e0927e28af.
Why I did it
Reverts #15720
It breaks build for target/debs/bullseye/syncd_1.0.0_amd64.deb
make[2]: Entering directory '/sonic/src/sonic-sairedis'
dh_install
# Note: escape with an extra symbol
if [ -f debian/syncd-rpc/usr/bin/syncd_init_common.sh ] ; then
/bin/sh: 1: Syntax error: end of file unexpected (expecting "fi")
make[2]: *** [debian/rules:65: override_dh_install] Error 2
make[2]: Leaving directory '/sonic/src/sonic-sairedis'
make[1]: *** [debian/rules:51: binary] Error 2
make[1]: Leaving directory '/sonic/src/sonic-sairedis'
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2
Work item tracking
Microsoft ADO (number only): 24691535
How I did it
How to verify it
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.
How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Midstone platform has compilation error in master branch, fixed the same.
How I did it
Due to bullseye migration i2c_new_dummy API is deprecated modified with i2c_new_dummy_device.
How to verify it
Verified target/debs/bullseye/platform-modules-midstone-200i_0.2.2_amd64.deb is generated
Co-authored-by: Kannan Selvaraj <skannan@celestica.com>
Why I did it
[E1031] fix pca9548 initializes failed occasionally in stress test.
When failure happened, ismt i2c bus hang up and need power cycle to
recover it.
How I did it
Add 0.5s delay between setuping and configuring pca9548 i2c mux.
How to verify it
Reboot stress test at least 100 times without failure.
#### Why I did it
src/sonic-gnmi
```
* 2c8e4ab - (HEAD -> master, origin/master, origin/HEAD) Support proto encoding (#140) (22 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
Tacplus package has missed cache configuration
#### How I did it
Defined cache configuration for tacplus package
#### How to verify it
Build image with cache enabled and make sure you don't see any warnings related to tacplus
Why I did it
sonic-host-services depends on sonic-utilities because of FIPS feature.
Add dependency to unblock submodule sonic-host-services HEAD pointer update.
Work item tracking
Microsoft ADO (number only): 24671218
How I did it
Why I did it
Support FIPS DB configuration
Design Doc: sonic-net/SONiC#1372
Work item tracking
Microsoft ADO (number only): 24411148
How I did it
Add the FIPS Yang model to make FIPS configurable in ConfigDB.
How to verify it
See TestPlan: sonic-net/sonic-mgmt#9092
Build the image and run the tests: sonic-net/sonic-mgmt#9091
#### Why I did it
src/linkmgrd
```
* aa902a3 - (HEAD -> master, origin/master, origin/HEAD) [link prober] Increase pause/restart probe log verbosity (#213) (3 days ago) [Longxiang Lyu]
* 736cdda - [active-standby] Write `unhealthy` is default route `N/A` (#214) (3 days ago) [Longxiang Lyu]
* e923e15 - Add ADO to the PR template (#215) (4 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* ce8f642 - (HEAD -> master, origin/master, origin/HEAD) [vs] Use boost join to concatenate switch types in config (#1266) (6 days ago) [Kamil Cudnik]
* d6055a2 - [vslib]: Temporaily map DPU switch type to NVDA_MBF2H536C (#1259) (13 days ago) [prabhataravind]
* e1cdb4d - [CodeQL]: Use dependencies with relevant versions in azp template. (#1262) (3 weeks ago) [Nazarii Hnydyn]
* c08f9a2 - [CI]: Fix collect log error in azp template. (#1260) (3 weeks ago) [Nazarii Hnydyn]
* eed856c - [CodeQL]: Fix syncd compilation in azp template. (#1261) (3 weeks ago) [Nazarii Hnydyn]
* a3f1f1a - Reland 'Make changes to building and packaging sairedis (#1116)' (#1194) (3 weeks ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
https://github.com/sonic-net/sonic-utilities/pull/472 Added SNMP_AGENT_ADDRESS_CONFIG table in config db.
This PR is to add corresponding YANG model for that table.
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Added YANG modesl for SNMP_AGENT_ADDRESS_CONFIG.
keys: agent_ip, port number, vrf.
CLI implementaion checks if agent_ip, port number already exists in CONFIG_DB table, if it does, then new entry is not added.
So added another condition to ensure combination of agent_ip and port is unique.
Below is an example of how data looks like in DB:
```
127.0.0.1:6379[4]> HGETALL "SNMP_AGENT_ADDRESS_CONFIG|10.1.1.1|161|foo"
1) "NULL"
2) "NULL"
127.0.0.1:6379[4]> HGETALL "SNMP_AGENT_ADDRESS_CONFIG|10.1.0.32|161|"
1) "NULL"
2) "NULL"
```
#### How to verify it
Added unit-test for various combinations and ensures that it passes.
Why I did it
get_system_mac was returning 'None' mac for system without eeprom.
get_system_mac for marvell platform checks for mac in eeprom, profile.ini(hwsku file) and eth0. Check for valid mac returned by syseeprom was incorrect. Which was resulting in bypassing mac get from profile.ini and eth0.
How I did it
get_system_mac already has a logic to get first valid mac.
Removed null check for mac returned by eeprom.
Corrected the check for profile.ini file by checking if file exist.
How to verify it
Executed sonic-cfggen to check valid mac address is getting configured in config_db.json with/without profile.ini.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Line:7 will exit when k8s file didn't change.
Use 'System.PullRequest.TargetBranchName' instead of 'System.PullRequest.TargetBranch'. Because git server in AzDevOps don't support 'System.PullRequest.TargetBranch'.
Work item tracking
Microsoft ADO (number only): 24636791
How I did it
How to verify it
This submodule update needs to be manually done due to build changes
done in the sairedis submodule. Specifically, Debian build profiles are
now being used instead of dpkg build targets, and dbgsym packages are
being used instead of dbg packages. Because of this, there needs to be
changes on the sonic-buildimage side for this.
This submodule update brings in the following changes:
ce8f642 [vs] Use boost join to concatenate switch types in config (#1266)
d6055a2 [vslib]: Temporaily map DPU switch type to NVDA_MBF2H536C (#1259)
e1cdb4d [CodeQL]: Use dependencies with relevant versions in azp template. (#1262)
c08f9a2 [CI]: Fix collect log error in azp template. (#1260)
eed856c [CodeQL]: Fix syncd compilation in azp template. (#1261)
a3f1f1a Reland 'Make changes to building and packaging sairedis (#1116)' (#1194)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Currently, k8s master image is generated from a separate branch which we created by ourselves, not release ones. We need to commit these k8s master related code to master branch for a better way to do k8s master image build out.
Work item tracking
Microsoft ADO (number only):
19998138
How I did it
Install k8s dashboard docker images
Install geneva mds and mdsd and fluentd docker images and tag them as latest, tagging latest will help create container always with the latest version
Install azure-storage-blob and azure-identity, this will help do etcd backup and restore.
Install kubernetes python client packages, this will help read worker and container state, we can send these metric to Geneva.
Remove mdm debian package, will replace it with the mdm docker image
Add k8s master entrance script, this script will be called by rc-local service when system startup. we have some master systemd services in compute-move repo, when VMM service create master VM, VMM will copy all master service files inside VM, the entrance script will setup all services according to the service files.
When the entrance script content changed, the PR build will set include_kubernetes_master=y to help do validation for k8s master related code change. The default value of include_kubernetes_master should be always n for public master branch. We will generate master image from internal master branch
How to verify it
Build with INCLUDE_KUBERNETES_MASTER = y
There is a redundant line in init_cfg.json.j2. It would cause pmon service always has "delayed=False". However, we know that PMON has a timer now. So, I try to fix it here.