These changes provide for the automatic shutdown of NIF ports on LC when an ungraceful reboot scenario occurs. Reboot and panic notifier hooks are now registered so that callback occurs from the kernel and NIF ports are subsequently shut down.
Why I did it
To facilitate the timely movement of traffic away from a crashed LC when its peers recognize that the associated links have gone down.
How I did it
Linux kernel reboot and panic notifier hooks are used to register a callback routine that, when invoked, stuffs all present transceiver modules into reset.
How to verify it
Cause an ungraceful reboot (whether via /usr/sbin/reboot or by causing a kernel panic) and verify that all LC native NIF links are brought down at reboot/panic time (on the way down). It may be necessary to monitor the LC link peer(s) in order to verify in real-time.
Why I did it
Upgrade the xgs SAI version to 10.1.6.0 to include the following fix:
10.1.6.0: [CS00012332630][SAI_BRANCH rel_ocp_sai_10_1] SAI - OTHER - [SAI BUG] sflow use psample to send packet, but the psample in linux version is not right.
10.1.4.0: [CS00012329827]ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
10.1.3.0: Double commit test code fixes in EM for 10.1.
10.1.2.0: fix ODP packaging in rel_ocp_sai_10_1
10.1.1.0: Use knet-cb procfs path for DNX port speed sampling rate (does not use new genl)
Work item tracking
Microsoft ADO (number only): 26720003
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run full qual on s6100 T1: https://elastictest.org/scheduler/testplan/65c1c2e69e3e72f540cae34b
These changes, in conjunction with NDK version >= 22.9.17 address the thermal logging issues discussed at Nokia-ION/ndk#27. While the changes contained at this PR do not require coupling to NDK version >= 22.9.17, thermal logging enhancements will not be available without updated NDK >= 22.9.17. Thus, coupling with NDK >=22.9.17 is preferred and recommended.
Why I did it
To address thermal logging deficiencies.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:
Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.
Modify the module.py reboot to support reboot linecard from Supervisor
- Modify reboot to call _reboot_imm for single IMM card reboot
- Add log to the ndk_cmd to log the operation of "reboot-linecard" and "shutdown/satrtup the sfm"
Add new nokia_cmd set command and modify show ndk-status output
- Add a new function reboot_imm() to nokia_common.py to support reboot a single IMM slot from CPM
- Added new command: nokia_cmd set reboot-linecard <slot> [forece] for CPM
- Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
- Provide ability for IMM to disable all transceiver module TX at reboot time
- Remove defunct xcvr-resync service
To modify EEPROM API serial_number_str to return service tag instead of serial number in Dell S6100.
Ref PR: #1239
How I did it
Update EEPROM API serial_number_str to return service tag instead of serial number.
How to verify it
Verify decode-syseeprom -s returns service tag in Dell S6100.
Why I did it
[Bookworm] Update platform-modules-dell for Bookworm #16735
How I did it
Modified platform driver to comply with bookworm kernel.
Removed MODULE_SUPPORTED_DEVICE wherever used.
Modified python build commands for building whl packages.
How to verify it
Verify whether all the platform bookworm debs are built.
make target/debs/bookworm/platform-modules-z9100_1.1_amd64.deb
Load the platform debian into the device and install it in bookworm image.
Verify the platform related CLI and the functionality
* sonic-platform-modules-cel: broadcom: adapt for kernel 6.1 and bookworm
The i2c_driver->remove API declaration has been updated to return void instead
of int, as part of cleanup patches in 6.1. More details can be referred from
here: [1]. Update the remove API definition in the modules accordingly and
cleanup variables that go unused from the remove API.
Update python build commands for bookworm. The packaging based on calling
setup.py is deprecated and using build module/pip utility is the recommended
method for python packaging/installation. Further details can be referred to
from here: [2], [3]. The build module is picky about the package information file,
which needs to be either setup.py or pyproject.toml.
Additionally, fix formatting inconsistencies in debian/changelog reported by
`dh_installchangelogs` during the build.
Tested the changes by compiling the changes as below:
make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
cd platform/broadcom/sonic-platform-modules-cel
KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage
Also verified the python scripts under the sonic-platform-modules-cel with
pyflakes to ensure no new errors are flagged (with exception of unused modules).
References:
[1] - https://github.com/torvalds/linux/commit/ed5c2f5f
[2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
[3] - 0b20a4863 (Update Python build commands for Bookworm, 2023-09-07)
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
* platform/pddf: i2c: adapt for kernel 6.1 and bookworm
* Fixup i2c_driver->remove API due to changes in the function
prototype (ref: [1]).
* Cleanup `MODULE_SUPPORTED_DEVICE` macros that were cleaned up in
the upstream (ref: [2]).
* Sanitize python packaging and installation using the `build` module
instead of calling the setup.py directly (ref: [3]. [4]).
Tested the changes by compiling pddf module as below:
make sonic-slave-bash NOBUSTER=1 NOBULLSEYE=1
sudo dpkg -i target/debs/bookworm/linux-headers-6.1.0-11-2-*.deb
cd platform/pddf/i2c
KVERSION=6.1.0-11-2-amd64 dpkg-buildpackage
References:
[1] - https://github.com/torvalds/linux/commit/ed5c2f5f
[2] - https://github.com/torvalds/linux/commit/6417f031
[2] - https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.htm
[3] - 0b20a4863 (Update Python build commands for Bookworm, 2023-09-07)
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
* platform/broadcom: include platform-modules-cel in builds
With pddf modules patched for 6.1, platform-modules-cel can be compiled
and included in the final image.
Testing by building sonic-broadcom.bin/sonic-broadcom-dnx.bin.
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
* pddf/i2c: revert correct rootdir for pip install
The pip install directory has been set to test-pkg1/ for testing the build and
incorrectly retained as is. Revert this to the correct path $(PACKAGE_PRE_NAME).
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
* platform/broadcom: include pddf/modules-cel in the base package
Without this change, the modules were built but not packaged in the final .bin.
The final sonic-broadcom.bin has been tested for bootup on Celestica's
Silverstone platform.
admin@sonic:~$ uname -a
Linux sonic 6.1.0-11-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
admin@sonic:~$ show platform summary
Platform: x86_64-cel_silverstone-r0
HwSKU: Silverstone
ASIC: broadcom
ASIC Count: 1
Serial Number: R4009B2F062504LK200024
Model Number: N/A
Hardware Revision: N/A
admin@sonic:~$ show version | head
SONiC Software Version: SONiC.g0aad6c67c-rachandr
SONiC OS Version: 12
Distribution: Debian 12.2
Kernel: 6.1.0-11-2-amd64
Build commit: 0aad6c67c
Build date: Thu Oct 26 07:13:47 UTC 2023
Built by: rachandr@AZUHPS14
Platform: x86_64-cel_silverstone-r0
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
---------
Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
Why I did it
- Convert hw-dump into generate-dump plugins
- Enable DRAM scrubber on some products
- Fix xcvr driver active low register bit logic
- Improve cooling algorithm (now considers xcvrs and modules)
- Add linecard graceful shutdown (disabled by default)
The scrubber was enabled for the following products:
- DCS-7050QX-32S
- DCS-7050CX3-32S
- DCS-7060CX-32S
This is CSP CS00012280996.
The issue to fix is that the checksum was incorrect for all TCP packets leaving the system so that the BGP connection cannot be established. We found the issue on BCM56993, and it is possible to affect all platforms using linux_ngknet.
Why I did it
XGS saibcm-modules 8.4 is needed. #14471
Work item tracking
Microsoft ADO (number only): 24917414
How I did it
Copy files from xgs SDK 8.4 repo and modify makefiles to build the image.
Upgrade version to 8.4.0.2 in saibcm-modules.mk.
How to verify it
Build a private image and run full qualification with it: https://elastictest.org/scheduler/testplan/650419cb71f60aa92c456a2b
Why I did it
In an effort to allow people to build a slim version of SONiC to fit on devices to small storage, there is a need to disable some unneeded features.
The docker-gbsyncd are only applicable to devices with external gearboxes and might not apply to devices that need a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n will not include the platform/components/docker-gbsyncd-*.mk.
How to verify it
Set INCLUDE_GBSYNCD = n and witness that docker-gbsyncd images are not present in the final image.
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:
8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Preformance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Why I did it
Add info syslog for cpu_wdt.service when trigger watchdog arm action.
How I did it
Add info syslog for cpu_wdt.service when trigger watchdog arm action.
This fixesNokia-ION/ndk#22
Note that this PR must be coupled with NDK version >= 22.9.13
Why I did it
To provide proper support for CMIS compliant transceiver module CDB operations (including FW related operations).
How I did it
Enhanced the transport subsystem so as to provide for up to 2k bytes of data to be passed to/from modules (as contrasted with the prior max of 128 bytes).
How to verify it
Ensure that new FW (firmware) can be programmed to CMIS compliant module(s) using the 'sfputil firmware ...' commands.
To include files in path platform/broadcom/sonic-platform-modules-dell/s6100/bin during build in Dell S6100 platform deb package.
How I did it
During Dell S6100 platform deb package build, copy the files to the install location.
How to verify it
- Copy the required files to platform/broadcom/sonic-platform-modules-dell/s6100/bin.
- Build the SONiC image and install in a Dell S6100 device.
- The files will be available in /usr/share/sonic/device/x86_64-dell_s6100_c2538-r0/bin/.
Management port currently broken for Edgecore AS4630-54PE platform due to NIC hardware numbering.
Created new PR with typo from Edgecore in original PR fixed. Here is a link to the old PR that has broken logic:
#9560
#15926 updated the submodule hash to point to a commit on a private branch that included changes for compiling with the 5.10.179 kernel. However, this submodule hash should've pointed to one on the master branch, not on a private branch. Fix this with this PR.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
sonic-mgmt test failure is seen for update_firmware component API
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
Ignore intermittent IO errors during get_change_event in the Platform API
Fix tunings for some ports on CatalinaDD
Fix kernel module build for 6.1 kernel in preparation of bookworm upgrade
Why I did it
System health config is missing in few Dell platforms.
How I did it
Added system health monitoring config and its related API's
How to verify it
show system-health summary/detail commands.
Why I did it
Add PDDF support on following Ufispace platforms with Broadcom ASIC
S9110-32X
S8901-54XC
S7801-54XS
S6301-56ST
How I did it
Add PDDF configuration files, scripts and python files
How to verify it
Run pddf commands and show commands.
Signed-off-by: nonodark <ef67891@yahoo.com.tw>
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.
How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Why I did it
[E1031] fix pca9548 initializes failed occasionally in stress test.
When failure happened, ismt i2c bus hang up and need power cycle to
recover it.
How I did it
Add 0.5s delay between setuping and configuring pca9548 i2c mux.
How to verify it
Reboot stress test at least 100 times without failure.
Why I did it
To fixsonic-net/sonic-mgmt#8786
How I did it
Modified Fan API to check whether the data retrieved is valid or not and return accordingly
How to verify it
Verify whether API 2.0 is loaded properly or not.
Execute CLI's like "show version", "show interface status", "show platform psustatus" etc..
Why I did it
To support dynamic swapping of module types/speeds (400G/100G/40G)
To optimize CMIS ZR optics operation
How I did it
Reinitialize xcvr_api at module removal/insertion time, and also optimize cache for ZR optics.
How to verify it
Verify that different (supported) module types can be dynamically swapped (removed/inserted) and that each is properly provisioned by Xcvrd and has its EEPROM information accurately reported in Redis DB (using "show transceiver eeprom") as well as "sfputil show eeprom" direct access.
Also verify that Xcvrd initialization and operation with 400G CMIS ZR optics is both efficient and functional.
** edit 6/14/23: pushed enhanced caching (full memory map) support and elimination of base class APIs override.
Why I did it
FPGA driver crash was observed in Dell FPGA based platforms.
How I did it
Fixed FPGA crash
How to verify it
Load FPGA driver and check whether the kernel crashes.
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822
How I did it
Add downloading ptf afpacket module in docker file.
How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
Fix lpmode on 7060DX5-32
Fix psu led issue on 7060DX5-64
Use sonic_xcvr lpmode if platform does not support hw lpmode
Add chassis cooling algorithm
Change cooling algorithm default interval to 10s
Force filesystem sync on linecard reboot
- Fix watchdog reboot cause for wolverine linecard
- Fix PSU fan speed of 0% by adding max RPM to most psu descriptions
- Add product DCS-7060DX5-64
- Add product DCS-7060DX5-32
Why I did it
Update sonic-platform submodule for Nokia-7250IXRE Platform. This requires the new NDK 22.9.8 and above
How I did it
Update submodule sonic-platform for Nokia-7250IXRE platform.
c9f316e Disparate process and thread-safe protection for MDIPC transport, and refactored presence logic to better align with SfpStateUpdateTask operation
a3486cc Added _get_module_bulk_info() and cache the info for 5 seconds to optimize the chassisd update.
4b2e729 Fixed the nokia_cmd show qfpga help display
7b87049 Fixed the nokia_cmd show midplane helper dispaly.
83eabea Add "nokia_cmd set ndk-monitor-action" and "nokia_cmd set ndk-log-level" commands
8aad7de Add nokia_cmd show ndk-version
d2c55e3 Modify the psu.py and module.py to optimize the psud running time
Signed-off-by: mlok <marty.lok@nokia.com>
[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.
#### Why I did it
On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.
However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.
To avoid the false alert, improve the monitor to wait and re-check.
Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the serial-getty@ttyS1.service service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'
#### How I did it
Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.
#### How to verify it
Pass all UT.
Manually check fixed code work correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```
syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```
#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Why I did it
There is rare condition, emc2305 hold SMBus and cause SMBus completion wait timed out.
How I did it
Enable EMC2305 SMBus timeout feature, 30ms period of inactivity will reset the interface.
How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read EMC2305 configuration register and check DIS_TO bit not set.
Signed-off-by: Eric Zhu <erzhu@celestica.com>