Why I did it
The sonic-mgmt test test_syslog_rate_limit (syslog/test_syslog_rate_limit.py) was failing on SKUs with gbsyncd. This includes the Arista 720DT when testing on the 202305 branch.
How I did it
The issue was that "show syslog rate-limit-container" showed no value for gbsyncd,
because gbsyncd has no SYSLOG_CONFIG_FEATURE|gbsyncd entry in
config_db, which in turn is because the gbsyncd feature is not enabled
through init_cfg.json.j2.
How to verify it
The test now passes on the 720DT in the 202305 branch.
Co-authored-by: Boyang Yu <byu@arista.com>
These changes provide for the automatic shutdown of NIF ports on LC when an ungraceful reboot scenario occurs. Reboot and panic notifier hooks are now registered so that callback occurs from the kernel and NIF ports are subsequently shut down.
Why I did it
To facilitate the timely movement of traffic away from a crashed LC when its peers recognize that the associated links have gone down.
How I did it
Linux kernel reboot and panic notifier hooks are used to register a callback routine that, when invoked, stuffs all present transceiver modules into reset.
How to verify it
Cause an ungraceful reboot (whether via /usr/sbin/reboot or by causing a kernel panic) and verify that all LC native NIF links are brought down at reboot/panic time (on the way down). It may be necessary to monitor the LC link peer(s) in order to verify in real-time.
Added YANG related changes for adding `dom_polling` field in PORT table of CONFIG_DB. This field can be set with `config interface transceiver dom PORT_NAME (enable|disable)` CLI.
The `dom_polling` field was added through https://github.com/sonic-net/sonic-utilities/pull/3187. Please refer to this PR for the details on the reason for adding `dom_polling` field.
Added `dom_polling` field to CONFIG_DB PORT table.
Added unit tests for both valid and invalid options for controlling `dom_polling`.
Valid values for `dom_polling` are `enabled` and `disabled`.
Any other value is treated as invalid.
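As a rough illustration of the rule above, here is a minimal Python sketch. The real check is enforced by the YANG model; the function names and the choice to treat the field as optional are assumptions made for this example. Note that the CLI takes `enable|disable`, while the stored field values are `enabled` and `disabled`.

```python
# Illustrative only: the actual validation lives in the YANG model.
VALID_DOM_POLLING = {"enabled", "disabled"}


def is_valid_dom_polling(value):
    """Return True only for the two values accepted in the PORT table."""
    return value in VALID_DOM_POLLING


def validate_port_entry(entry):
    """Raise ValueError if a PORT entry carries an invalid dom_polling value.

    Treating the field as optional is an assumption of this sketch.
    """
    value = entry.get("dom_polling")
    if value is not None and not is_valid_dom_polling(value):
        raise ValueError(f"invalid dom_polling value: {value!r}")
    return entry
```

With this sketch, `validate_port_entry({"dom_polling": "enable"})` raises, because the CLI keyword is not itself a valid field value.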
- Why I did it
The field 'subport' represents the index of the split port within a physical port. For example, if a port is split into 4, the subport of the first logical port is 1, the subport of the second logical port is 2, and so on.
In xcvrd, the CMIS manager uses the subport to calculate the lane mask, which is used to control the data path per lane. On Nvidia platforms, the subport is missing and is always set to 0. According to the xcvrd code, subport=0 always corresponds to the first logical port. Therefore, if we shut down any logical port that is not the first one, the operational status of the first logical port also goes down.
This PR adds the subport field to CONFIG DB to prevent such scenarios. This is applicable only for static default breakout mode. For DPB, subport calculation will happen on the fly (changes are not in SONiC yet).
(Subport HLD: [link to the HLD document])
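To make the collision described above concrete, the following Python sketch shows one way a subport index could map to a lane mask. The formula, the function name `host_lane_mask`, and the `lanes_per_subport` parameter are assumptions for illustration; the xcvrd code is not quoted in this description and may differ in detail.

```python
def host_lane_mask(subport, lanes_per_subport):
    """Return a bitmask of the host lanes owned by one logical port.

    subport is 1-based. A value of 0 means "no subport configured" and is
    treated as the first logical port, matching the behaviour described above.
    """
    if subport < 0 or lanes_per_subport <= 0:
        raise ValueError("subport must be >= 0 and lanes_per_subport > 0")
    index = max(subport, 1) - 1            # 0 and 1 both select the first group
    group = (1 << lanes_per_subport) - 1   # e.g. 2 lanes -> 0b11
    return group << (index * lanes_per_subport)
```

For an 8-lane port split into 4, subports 1 to 4 give the masks 0x03, 0x0C, 0x30 and 0xC0. If every logical port reports subport 0, they all resolve to 0x03, so an action on any of them lands on the lanes of the first logical port.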
- How I did it
I have added the 'subport' field to all relevant Nvidia hwsku.json files (minigraph generation is based on them). Additionally, I introduced the new 'subport' field to portconfig.py, so that sonic-cfggen will be able to generate the minigraph with it. In this file, I also fixed an error that caused all attributes from hwsku.json to be applied only to the first logical ports associated with a physical port.
Furthermore, I updated hwsku_json_checker to include the new field and applied a fix to the sample_hwsku.json file. sample_hwsku.json is the file that sonic-config-engine's unit tests rely on. Previously, it only included attributes for the first logical port of a split physical port. For example, if Ethernet4, a 4-lane port, was split into 2 ports, then sample_hwsku.json included only the entry for Ethernet4, with no entry for Ethernet6. This misalignment with the structure of other hwsku.json files has been corrected as well.
- How to verify it
Ensure that each logical port has the correct value of 'subport' in CONFIG DB, and that shutting down a logical port affects only that port and not other ports in the split.
* [Mellanox] Support DSCP remapping in Dual-ToR topo for SN4700-O8V48, update buffers for t0
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
* [Mellanox] Support DSCP remapping in Dual-ToR topo for SN4700-O8V48, update buffers for t0 (fixes after recalculation)
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
---------
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
* Fix debug package variables for syncd
PR #16072 renamed the debug package variables from `*_DBG` to
`*_DBGSYM`, since the package names had changed. However, the references
weren't updated. Since all the other debug packages (including ones that
are named `*-dbgsym`) use `*_DBG`, just use that here as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Update sairedis.mk as well
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-platform-daemons
```
* 395d8d7 - (HEAD -> 202311, origin/202311) Enable periodic polling of TRANSCEIVER_FIRMWARE_INFO table in DomInfoUpdateTask (#443) (22 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* a037503 - (HEAD -> 202311, origin/202311) arm64: dts: marvell: Add DTS for 7215-IXS-A1 board (#379) (6 days ago) [Pavan-Nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* c68f4b16 - (HEAD -> 202311, origin/202311) Skip the validation of action in acl-loader if capability table in STATE_DB is empty (#3199) (3 days ago) [bingwang-ms]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* ba250d8 - (HEAD -> 202311, origin/202311) Combine psu presence/status update with data update (#424) (2 days ago) [Yuanzhe]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* dd4810b1 - (HEAD -> 202311, origin/202311) Set HOST_TX_READY_NOTIFY attribute only after query capabilities(#3070) (2 days ago) [noaOrMlnx]
```
#### How I did it
#### How to verify it
#### Description for the changelog
If a line without RequiredBy or WantedBy is encountered, the code passes an uninitialized pointer to get_install_targets_from_line(), where it can either fail with a segfault or silently pass, at random.
- Why I did it
Uninitialized target_suffix is passed to get_install_targets_from_line() when other fields are present in [Install] section, like this:
root@sonic:/home/admin# systemctl cat ntpsec
...
[Install]
Alias=ntp.service
Alias=ntpd.service
WantedBy=multi-user.target
- How I did it
Initialize target_suffix with NULL, put an assert in get_install_targets_from_line(). Edited test to cover this scenario.
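The fix itself is in the C generator code. As a language-neutral illustration of the same idea, here is a small Python sketch in which the suffix is always explicitly defined (None when the line is neither WantedBy nor RequiredBy). The function names and the ".wants"/".requires" suffix strings are assumptions for this example, not the generator's actual identifiers.

```python
def install_target_suffix(line):
    """Return '.wants' or '.requires' for a dependency line, else None."""
    key, sep, _ = line.partition("=")
    if not sep:
        return None
    key = key.strip()
    if key == "WantedBy":
        return ".wants"
    if key == "RequiredBy":
        return ".requires"
    return None  # e.g. Alias=: explicitly "no suffix", never left undefined


def collect_install_targets(lines):
    """Collect (target, suffix) pairs from the lines of an [Install] section."""
    targets = []
    for line in lines:
        suffix = install_target_suffix(line)
        if suffix is None:
            continue  # skip lines that are neither WantedBy nor RequiredBy
        _, _, value = line.partition("=")
        for target in value.split():
            targets.append((target, suffix))
    return targets
```

Fed the three [Install] lines of the ntpsec unit shown above, this sketch skips both Alias lines and yields only the multi-user.target entry.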
- How to verify it
Run the unit tests and verify on the switch.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
On certain routers with a baud rate of 9600, the crash kernel takes a long time, close to 5 minutes, to complete the kernel dump and reload the box. In contrast, on routers with a baud rate of 115200, the crash kernel dump process completes in 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console, which contributes to the delay. Unless the router is monitored on the console in real time, these messages are not very useful. Setting the loglevel to warning reduces the verbosity of logs on the console, which in turn allows the crash kernel dump process to complete in a reasonable time and improves overall router recovery time.
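A back-of-the-envelope Python sketch of why console verbosity matters at low baud rates. It assumes 8N1 framing (10 bits on the wire per character) and that message output is bounded by the serial line; the log volume used is a hypothetical figure, not a measurement from any platform.

```python
def console_seconds(num_bytes, baud, bits_per_char=10):
    """Seconds needed to push num_bytes through a serial console.

    bits_per_char=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit).
    """
    if baud <= 0 or bits_per_char <= 0:
        raise ValueError("baud and bits_per_char must be positive")
    return num_bytes * bits_per_char / baud


log_bytes = 200_000  # hypothetical volume of console messages, not measured
slow = console_seconds(log_bytes, 9600)     # time spent printing at 9600 baud
fast = console_seconds(log_bytes, 115200)   # time spent printing at 115200 baud
```

Under these assumptions the same output takes 12 times longer at 9600 baud than at 115200, so cutting the number of printed messages has a much larger effect on the slower console.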
How I did it
Set the loglevel attribute in the crashkernel cmdline.
How to verify it
Install a SONiC image whose crashkernel cmdline has loglevel set to warning, and induce a crash (sysrq-trigger).
The crashkernel boot and dump process completes in 20s-30s, depending on the platform.
Why I did it
Fix the broken build:
Processing /sonic_host_services-1.0-py3-none-any.whl
Requirement already satisfied: dbus-python in /usr/lib/python3/dist-packages (from sonic-host-services==1.0) (1.2.16)
Requirement already satisfied: systemd-python in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (235)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (3.1.2)
Collecting PyGObject (from sonic-host-services==1.0)
Downloading pygobject-3.48.0.tar.gz (714 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 714.2/714.2 kB 13.1 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'error'
error: subprocess-exited-with-error
Work item tracking
Microsoft ADO (number only): 27124786
How I did it
Install pygobject before installing sonic_host_services.
If it is pulled in as a dependency while installing the .whl, pip tries to install the latest version (3.48.0), which fails to build. Prefer version 3.46.0, see
sonic-buildimage/files/build/versions/host-image/versions-py3
Line 55 in a6437d8
pygobject==3.46.0
This does not add a new package; it only installs the dependency first.
#### Why I did it
src/sonic-utilities
```
* 9d5dacab - (HEAD -> 202311, origin/202311) CLI to skip polling for periodic information for a port in DomInfoUpdateTask thread (#3187) (4 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
pkgs.k8s.io: Introducing Kubernetes Community-Owned Package Repositories | Kubernetes
The source repo for the 1.22.2 k8s packages has been deprecated, so these packages will be stored in SONiC build storage for installation to mitigate the issue. We will migrate to the new repo when we are ready to upgrade the k8s version.
Work item tracking
Microsoft ADO (number only): 27075924
How I did it
Store the 1.22.2 k8s packages in SONiC build storage and install them from there.
How to verify it
Run "apt list" to check that it is installed.
#### Why I did it
src/sonic-swss
```
* dd1432a2 - (HEAD -> 202311, origin/202311) [ci] Allow partially success build artifact in PR checker pipeline. #2986 (10 hours ago) [Liu Shilong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
The time it takes for an LPMODE change to take effect differs between cables.
We want to add functionality to make sure LPMODE has changed.
For that, the wait_until utility is used: every 1 second (until timeout), it checks with the lower layers what the current Lpmode is.
Once it is the expected mode, the set_lpmode() function returns True.
If Lpmode is still not in the expected mode when the timeout expires, the set_lpmode() function returns False.
- How I did it
Add use of wait_until function to make sure lpmode was changed.
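The polling pattern can be sketched in Python as follows. This is a self-contained approximation, not the platform code: the `wait_until` signature, the `FakeModule` class, and the default timeout are all assumptions made for illustration.

```python
import time


def wait_until(timeout, interval, predicate):
    """Poll predicate every `interval` seconds until true or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeModule:
    """Stand-in for a cable whose low-power state changes after a few reads."""

    def __init__(self, reads_until_applied):
        self._remaining = reads_until_applied
        self._requested = False
        self._actual = False

    def write_lpmode(self, enabled):
        self._requested = enabled

    def read_lpmode(self):
        if self._remaining > 0:
            self._remaining -= 1
        else:
            self._actual = self._requested
        return self._actual


def set_lpmode(module, enabled, timeout=2.0, interval=1.0):
    """Request the mode, then confirm it; True only if the module reports it."""
    module.write_lpmode(enabled)
    return wait_until(timeout, interval, lambda: module.read_lpmode() == enabled)
```

The point of the design is that the return value reflects what the module actually reports, not merely that the write was issued, so a slow cable yields True a little later and a cable that never changes yields False.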
- How to verify it
sfputil lpmode on
sfputil lpmode off
#### Why I did it
src/linkmgrd
```
* 1f5fcfd - (HEAD -> 202311, origin/202311) Exclude DbInterface in PR coverage check (#224) (21 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* 83e5106 - (HEAD -> 202311, origin/202311) Updated supported CMIS module types in xcvrd to include new module for SPC4 (#440) (4 hours ago) [Tomer Shalvi]
* f390d8d - Mark sub-port interfaces as invalid ports in xcvrd (#412) (21 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* c4fd095e - (HEAD -> 202311, origin/202311) Fix multi VLAN neighbor learning (#3049) (#3064) (65 minutes ago) [Lawrence Lee]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
The creation of the system EEPROM VPD file "/var/run/hw-management/eeprom/vpd_info" is triggered by a udev event during system boot. If the CPU is busy during boot, the udev event handling can be delayed, so more time is needed to wait for the file to be created.
- How I did it
Extend the waiting time from 10s to 20s to cover such extreme cases.
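A minimal Python sketch of a bounded wait for a file that is created asynchronously, such as the VPD file above. The helper name `wait_for_file` and the polling interval are assumptions; only the 20s limit comes from this change.

```python
import os
import time


def wait_for_file(path, timeout=20.0, interval=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would fall back to the existing error path only when this returns False, that is, when the file has still not appeared after the full wait.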
- How to verify it
Run the reboot case continuously and check whether the error message "ERR decode-syseeprom: Nowhere to read syseeprom from! No symlink found" still appears.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
deb11u1 is deprecated.
Use deb11u2 instead.
Other branches are not impacted, because their reproducible build version files are up to date.
Work item tracking
Microsoft ADO (number only): 26964185
How I did it
How to verify it
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
#### Why I did it
src/sonic-platform-common
```
* 4dfc01f - (HEAD -> 202311, origin/202311) Certain VDM fields not populating after encountering KeyError on 400ZR optics (#442) (28 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 64d5fdd9 - (HEAD -> 202311, origin/202311) [intfsorch] Enable ipv6 proxy ndp along with proxy arp (#3045) (2 days ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* b2bea12c - (HEAD -> 202311, origin/202311) CLI enhancements to revtrieve data from TRANSCEIVER_FIRMWARE_INFO table (#3177) (4 hours ago) [mihirpat1]
* 02ae33f3 - Modify transceiver PM CLI to handle N/A value for DOM threshold (#3174) (28 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* edb2b17 - (HEAD -> 202311, origin/202311) Add new functionality to syncd_init_common.sh, to use common sai.profile (#1352) (22 hours ago) [noaOrMlnx]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
- Why I did it
The cable thermal sensors will be deprecated from the kernel driver. When cable host management is enabled, the NOS fetches the cable temperature from the cable EEPROM, and the kernel driver no longer provides the sysfs.
- How I did it
Remove the relevant sensors from the conf files.
- How to verify it
Run the sonic-mgmt sensor test.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
src/sonic-platform-common
```
* 5430f6f - (HEAD -> 202311, origin/202311) Change get_transceiver_info_firmware_versions return type to dict (#440) (2 days ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog