Ignore intermittent IO errors during get_change_event in the Platform API
Fix tunings for some ports on CatalinaDD
Fix kernel module build for 6.1 kernel in preparation of bookworm upgrade
Why I did it
System health config is missing in few Dell platforms.
How I did it
Added system health monitoring config and its related API's
How to verify it
show system-health summary/detail commands.
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
- How to verify it
Manual test
Unit test
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.
- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs
- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
Add PDDF support on following Ufispace platforms with Broadcom ASIC
S9110-32X
S8901-54XC
S7801-54XS
S6301-56ST
How I did it
Add PDDF configuration files, scripts and python files
How to verify it
Run pddf commands and show commands.
Signed-off-by: nonodark <ef67891@yahoo.com.tw>
* Update sairedis submodule
This submodule update needs to be manually done due to build changes
done in the sairedis submodule. Specifically, Debian build profiles are
now being used instead of dpkg build targets, and dbgsym packages are
being used instead of dbg packages. Because of this, there needs to be
changes on the sonic-buildimage side for this.
This is a reland of #15720, which was reverted in #15995 due to the RPC
package build failing. That failure has since been fixed, and the
PR pipeline has been updated to build the RPC package so that this is
checked at the PR stage.
This submodule update brings in the following changes:
```
4dbdb21 Fix RPC package build failure due to shell syntax issue (#1268)
588d596 Make sure new binaries replace existing binaries in docker-sonic-vs (#1269)
ce8f642 [vs] Use boost join to concatenate switch types in config (#1266)
d6055a2 [vslib]: Temporaily map DPU switch type to NVDA_MBF2H536C (#1259)
e1cdb4d [CodeQL]: Use dependencies with relevant versions in azp template. (#1262)
c08f9a2 [CI]: Fix collect log error in azp template. (#1260)
eed856c [CodeQL]: Fix syncd compilation in azp template. (#1261)
a3f1f1a Reland 'Make changes to building and packaging sairedis (#1116)' (#1194)
```
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Update sairedis submodule with the fix for the RPC package build
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
Increase UT coverage for Nvidia platform API code
Work item tracking
Microsoft ADO (number only):
- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py
- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
- Why I did it
Added the fwtrace config files in order to be able to call the mlxstrace utility during the show techsupport dump.
Work item tracking
Microsoft ADO (number only):
- How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.
- How to verify it
Execute the show techsupport command and check if mlxstrace output is in system dump.
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
This reverts commit e0927e28af.
Why I did it
Reverts #15720
It breaks build for target/debs/bullseye/syncd_1.0.0_amd64.deb
make[2]: Entering directory '/sonic/src/sonic-sairedis'
dh_install
# Note: escape with an extra symbol
if [ -f debian/syncd-rpc/usr/bin/syncd_init_common.sh ] ; then
/bin/sh: 1: Syntax error: end of file unexpected (expecting "fi")
make[2]: *** [debian/rules:65: override_dh_install] Error 2
make[2]: Leaving directory '/sonic/src/sonic-sairedis'
make[1]: *** [debian/rules:51: binary] Error 2
make[1]: Leaving directory '/sonic/src/sonic-sairedis'
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2
Work item tracking
Microsoft ADO (number only): 24691535
How I did it
How to verify it
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.
How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Midstone platform has compilation error in master branch, fixed the same.
How I did it
Due to bullseye migration i2c_new_dummy API is deprecated modified with i2c_new_dummy_device.
How to verify it
Verified target/debs/bullseye/platform-modules-midstone-200i_0.2.2_amd64.deb is generated
Co-authored-by: Kannan Selvaraj <skannan@celestica.com>
Why I did it
[E1031] fix pca9548 initializes failed occasionally in stress test.
When failure happened, ismt i2c bus hang up and need power cycle to
recover it.
How I did it
Add 0.5s delay between setuping and configuring pca9548 i2c mux.
How to verify it
Reboot stress test at least 100 times without failure.
This submodule update needs to be manually done due to build changes
done in the sairedis submodule. Specifically, Debian build profiles are
now being used instead of dpkg build targets, and dbgsym packages are
being used instead of dbg packages. Because of this, there needs to be
changes on the sonic-buildimage side for this.
This submodule update brings in the following changes:
ce8f642 [vs] Use boost join to concatenate switch types in config (#1266)
d6055a2 [vslib]: Temporaily map DPU switch type to NVDA_MBF2H536C (#1259)
e1cdb4d [CodeQL]: Use dependencies with relevant versions in azp template. (#1262)
c08f9a2 [CI]: Fix collect log error in azp template. (#1260)
eed856c [CodeQL]: Fix syncd compilation in azp template. (#1261)
a3f1f1a Reland 'Make changes to building and packaging sairedis (#1116)' (#1194)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
To fixsonic-net/sonic-mgmt#8786
How I did it
Modified Fan API to check whether the data retrieved is valid or not and return accordingly
How to verify it
Verify whether API 2.0 is loaded properly or not.
Execute CLI's like "show version", "show interface status", "show platform psustatus" etc..
= Why I did it
To optimize Mellanox platform SAI build
- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.
- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
- Why I did it
The reset cause "reset_from_comex" has been removed by hw-management, hence removing it from platform API code
- How I did it
Remove reset_from_comex from reboot cause mapping
- How to verify it
Manual test
- Why I did it
BIOS on new generation switch can come with a file type of cap or cab. Needs to add support to these file type.
Also ONIE version on new devices can have a suffix of 'dev'.
- How I did it
Added cap & cab as possible component extensions for ComponentBIOS.
Update the ONIE version regex to include dev signed versions.
- How to verify it
Update BIOS.
Why I did it
port_config.ini and hwsku.json are needed to generate the default config
switch_type needs to be "dpu" to spawn the right set of processes during dvs initialization and to make sure that DASH APIs can be handled properly
Work item tracking
Microsoft ADO 24375371:
How I did it
Use the same hwsku.json and port_config.ini for DPU-2P as the ones used for Nvidia-MBF2H536C SKU in nvidia-sonic sonic-buildimage repo.
Set switch_type to "dpu" in DEVICE_METADATA configuration to make sure DASH specific APIs are handled properly
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
* [202012][platform/barefoot] (#8543)
Why I did it
Pcied running by python 2.
How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2
How to verify it
docker exec pmon supervisorctl status
* [Netberg][nba710] Added initial support for Aurora 710
Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
---------
Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
Co-authored-by: Kostiantyn Yarovyi <kostiantynx.yarovyi@intel.com>
Why I did it
To support dynamic swapping of module types/speeds (400G/100G/40G)
To optimize CMIS ZR optics operation
How I did it
Reinitialize xcvr_api at module removal/insertion time, and also optimize cache for ZR optics.
How to verify it
Verify that different (supported) module types can be dynamically swapped (removed/inserted) and that each is properly provisioned by Xcvrd and has its EEPROM information accurately reported in Redis DB (using "show transceiver eeprom") as well as "sfputil show eeprom" direct access.
Also verify that Xcvrd initialization and operation with 400G CMIS ZR optics is both efficient and functional.
** edit 6/14/23: pushed enhanced caching (full memory map) support and elimination of base class APIs override.
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.
NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker
How I did it
Implemented new service to clean the shared storage.
How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Define a generic 2-port NPU SKU for docker-sonic-vs to
enable DASH vstests to pass on azure pipelines
Work item tracking
Microsoft ADO 24375371:
How I did it
Define a generic 2-port NPU hwsku that is used only for DASH-specific vstests.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
- Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.
- How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.
- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
- Why I did it
Adjust the warning threshold implementation according to the latest algorithm update
- How I did it
Modify power warning and critical thresholds methods
- How to verify it
Unit test updated to cover the change
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Add the patchwork link to the commit description for non-upstream patches if present
- How I did it
Parse the patchwork/<patch_name>.txt file from hw-mgmt
Why I did it
FPGA driver crash was observed in Dell FPGA based platforms.
How I did it
Fixed FPGA crash
How to verify it
Load FPGA driver and check whether the kernel crashes.
Why I did it
When git clone -b xxx command is used the versions-git will reset the HEAD of the git to the commit ID in the versions-git file. Which causes incorrect commit to be checked out causing build errors.
Work item tracking
Microsoft ADO (number only):
How I did it
Split ‘git clone -b’ into two steps to avoid owerwrite
Git clone
cd mrvl-prestera; git checkout ; cd ..
How to verify it
Build marvell-arm64 target using below instructions
make init
make configure PLATFORM=marvell-arm64 PLATFORM_ARCH=arm64
make target/sonic-marvell-arm64.bin SONIC_BUILD_JOBS=2
- Why I did it
Bug fix:
- * I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.
- How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer
-How to verify it
Run full sonic-mgmt regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
Facilitate Automatic integration of sdk kernel patches into SONiC.
**Inputs to the Script:**
1) `MLNX_SDK_VERSION` Eg: `4.5.4206`
2) `MLNX_SDK_ISSU_VERSION` Eg: `101`
**Note: If nothing is provided the one already present in the sdk.mk file is used**
3) `MLNX_SDK_SOURCE_BASE_URL:`
**Note: If nothing is provided the upstream sdk drivers url is used**
4) `CREATE_BRANCH: (y|n)` Creates a branch instead of a commit (optional, default: n)
5) `BRANCH_SONIC`: Only relevant when CREATE_BRANCH is y. `Default: master`.
Note: These should be provided through `SONIC_OVERRIDE_BUILD_VARS ` parameter
**Output:**
1) Script creates a commit in sonic-linux-kernel with any updates to sdk-kernel patches in sonic in accordance with the version provided by `MLNX_SDK_VERSION`
**Note: Script Doesn't commit anything to linux-kernel when there aren't any changes required..**
#### How I did it
1) Added a new make target which can be invoked by calling `make integrate-mlnx-sdk`
```
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6f38dca_integrate_4.5.4206
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 1
d64d1e7 (HEAD -> master_6f38dca_integrate_4.5.4206) Intgerate MLNX SDK 4.5.4206 Kernel Patches
```
Changes made will be summarized under `sonic-buildimage/integrate-mlnx-sdk_user.out` file. Debugging and troubleshooting output is written to `sonic-buildimage/integrate-mlnx-sdk.log` files
[log_files.zip](https://github.com/sonic-net/sonic-buildimage/files/11226441/log_files.zip)
#### Limitations:
1) Assumes that the sdk kernel patches are always upstreamed
#### How to verify it
Build the Kernel and test
Why I did it
Align with SAI headers v1.12.0
Work item tracking
Microsoft ADO (number only):
How I did it
Update Mellanox SAI submodule
How to verify it
Compile SONiC image
- Why I did it
The current implementation of SFP reset, LPM, present relies on SDK API. This PR moves the implementation to SDK sysfs. By this PR, it gains following benefit:
1. SDK sysfs provides better performance.
2. Host side and container side share the same code.
3. Code is much cleaner.
- How I did it
Use SDK sysfs to implement SFP reset, LPM, present.
- How to verify it
1. Manual test.
2. Unit test.
SDK/FW Fixed Issues:
• When a system has more than 256 ACL entries, on rare occasion, removing/adding entries may cause some ACL entries not to work.
• When using mirror session policer on spectrum-2, spectrum-3, the actual CIR was 1.28 times more than the configured CIR value
• After warm boot process, when enabling ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
• Warm boot might fail if the key value SAI_KEY_ACCUMULATED_FLOW_COUNTER_UNITS_IN_KB is set
• If counters are bound to an next hop group, there is a probability the next API calls that modify the next-hop group members will fail.
• In Spectrum platforms Fastboot mode is not operational for Split port with Force mode in 50G speed
• When fine grain next hop group has a size of 2K or 4K members, and group is removed, FW will remove only (size % 2048) members, resulting in leakage of KVD resources
• When reading some port statistics, or bulk reading some Queue or PG statistics, and in parallel reading or writing other counters, FW may, in rare cases, get stuck
• SN2201 Module 1 is considered to be present/linked while no cable/module is plugged
• On Spectrum-3 when port configure to 400G FW might stuck after running mlxlink while 400G interface connected and swap between upper and lower 4 lanes
SAI New features:
• ACL: Added support for an ACL match on the AETH field (SAI_ACL_TABLE_ATTR_FIELD_AETH_SYNDROME, SAI_ACL_ENTRY_ATTR_FIELD_AETH_SYNDROME) to count RoCE NAK and CNP packets.
• PLL Status: Added a new logging entry that alerts the user upon a PLL lock loss event.
• Dual ToR - Additional MAC Address: Added support for setting a MAC address for the router interface which is not part of the 10 bit MAC address available for RIFs on Spectrum-1, as part of the Dual ToR scenario.
• Dual ToR: DSCP Remapping Added support for tunnel QoS maps as part of the Dual TOR scenario.
SAI Fixed issues:
• When setting a WRED profile attribute for a color that was not enabled during the profile create time, an error would be returned. After the fix, a default profile is create on such scenario and the set attribute is applied on top of it
• When calling the flush FDB by using the SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID attribute, the bridge bv_id value was filled on the notification callback where it should have been left empty.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Add new Nokia build target and establish an arm64 build:
Platform: arm64-nokia_ixs7215_52xb-r0
HwSKU: Nokia-7215-A1
ASIC: marvell
Port Config: 48x1G + 4x10G
How I did it
- Change make files for saiserver and syncd to use Bulleseye kernel
- Change Marvell SAI version to 1.11.0-1
- Add Prestera make files to build kernel, Flattened Device Tree blob and ramdisk for arm64 platforms
- Provide device and platform related files for new platform support (arm64-nokia_ixs7215_52xb-r0).
The S3IP (Simplified Switch System INtegration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.
PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is first step in combining the two frameworks.
This specific PR adds S3IP supported sysfs attribute in common FAN driver of PDDF.
The S3IP (Simplified Switch System INtegration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.
PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is first step in combining the two frameworks.
This specific PR adds the S3IP supported sysfs attributes in PDDF common LED driver.
Why I did it
The S3IP (Simplified Switch System INtegration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.
PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is first step in combining the two frameworks.
This specific PR adds support for pddf-s3ip-init.service and enables it in PDDF.
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822
How I did it
Add downloading ptf afpacket module in docker file.
How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
- SAI-1.11.0 support
- SONIC 20220531.25 OC Failure: Everflow testcases failing due to SAI orchagent crash
- SONIC 20220531.25 OC Failure: ACL IPv6 testcases.
- TPID support
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
- Why I did it
As a LED indicator to help user to find switch location in the lab, UID LED is a useful LED in Mellanox switch.
- How I did it
I add a new member _led_uid in Mellanox/Chassis.py, and extend Mellanox/led.py to support blue color.
Relevant platform-common PR sonic-net/sonic-platform-common#369
- How to verify it
Add unit test cases in test.py, and do manual test including turn-on/off/show uid led.
Signed-off-by: David Xia <daxia@nvidia.com>
Fix lpmode on 7060DX5-32
Fix psu led issue on 7060DX5-64
Use sonic_xcvr lpmode if platform does not support hw lpmode
Add chassis cooling algorithm
Change cooling algorithm default interval to 10s
Force filesystem sync on linecard reboot
- Why I did it
Add the commit-id patch map in the commit message.
- How I did it
By parsing the patch DB from hw-mgmt
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Currently, LED sysfs path is hardcoded. We will need change LED code if new LED color is supported for new platforms. This PR is aimed to improve this. By this PR, LED sysfs path is deduced from LED capability file.
- How I did it
Improve LED management on Nvidia platform:
get LED capability from capability file and deduce sysfs name according to the capability
- How to verify it
Unit test
Manual test
- Why I did it
To improve ASAN backtrace output when the call stack contains a code that is not compiled with -fno-omit-frame-pointer.
- How I did it
Added fast_unwind_on_malloc=0 to the ASAN_OPTIONS
- How to verify it
Build and test docker-syncd-mlnx.gz with ENABLE_ASAN=y
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Using timer-override.conf, we modify the fstrim.timer service.
For armhf, Nokia-7215 platform, we modify fstrim.timer to run daily
instead of weekly. This is required because the size of the SSD on
this platform is 16GB, which on average is nearly 10 times smaller than
most other sonic platforms. With smaller disk and the ever increasing
level of logging done by sonic, this change is required to prevent
the SSD from entering a read-only state due to inadequate free blocks.
- Fix watchdog reboot cause for wolverine linecard
- Fix PSU fan speed of 0% by adding max RPM to most psu descriptions
- Add product DCS-7060DX5-64
- Add product DCS-7060DX5-32
- Why I did it
Add NVIDIA Copyright header to NVIDIA files added lately
- How I did it
Add NVIDIA Copyright header for the relevant files
- How to verify it
N/A (only commented text was added).
Normally doesn't need to measure i2c calls.
Also switched to use timespec64_sub() to ensure time delta normalized
Co-authored-by: Kostiantyn Yarovyi <kostiantynx.yarovyi@intel.com>
- Why I did it
Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2
- How I did it
Download iproute2 from Debian repository, apply patches and compile to create a new target.
The target is then deployed in syncd container of Mellanox switches only.
The new target is called IPROUTE2_MLNX.
- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
Why I did it
Update sonic-platform submodule for Nokia-7250IXRE Platform. This requires the new NDK 22.9.8 and above
How I did it
Update submodule sonic-platform for Nokia-7250IXRE platform.
c9f316e Disparate process and thread-safe protection for MDIPC transport, and refactored presence logic to better align with SfpStateUpdateTask operation
a3486cc Added _get_module_bulk_info() and cache the info for 5 seconds to optimize the chassisd update.
4b2e729 Fixed the nokia_cmd show qfpga help display
7b87049 Fixed the nokia_cmd show midplane helper dispaly.
83eabea Add "nokia_cmd set ndk-monitor-action" and "nokia_cmd set ndk-log-level" commands
8aad7de Add nokia_cmd show ndk-version
d2c55e3 Modify the psu.py and module.py to optimize the psud running time
Signed-off-by: mlok <marty.lok@nokia.com>
Signed-off-by: Stepan Blyschak stepanb@nvidia.com
DEPENDS: #12852
Why I did it
To support BGP pending FIB suppression.
How I did it
I backported patches from FRR 8.4 feature that allows communicating ASIC route status back to FRR.
Also, added a new field in DEVICE_METADATA YANG model table. Added UT for YANG model changes.
How to verify it
Run on the switch.
- Why I did it
Package Marvell/Innovium CLI shell.
- How I did it
Include shell packages.
- How to verify it
Platform specific shell commands.
Signed-off-by: rck-innovium rck@innovium.com
- Why I did it
Facilitate Automatic integration of new hw-mgmt version into SONiC.
Inputs to the Script:
MLNX_HW_MANAGEMENT_VERSION Eg: 7.0040.5202
CREATE_BRANCH: (y|n) Creates a branch instead of a commit (optional, default: n)
BRANCH_SONIC: Only relevant when CREATE_BRANCH is y. Default: master.
Note: These should be provided through SONIC_OVERRIDE_BUILD_VARS parameter
Output:
Script creates a commit (in each of sonic-buildimage, sonic-linux-kernel) with all the changes required for upgrading the hw-management version to a version provided by MLNX_HW_MANAGEMENT_VERSION
Brief Summary of the changes made:
MLNX_HW_MANAGEMENT_VERSION flag in the hw-management.mk file
hw-mgmt submodule is updated to the corresponding version
Updates are made to non-upstream-patches/patches and series.patch file
series, kconfig-inclusion and kconfig-exclusion files can be updated in the sonic-linux-kernel repo
sonic-linux-kernel/patches folder is updated with the corresponding upstream patches
Based on the inputs, there could be a branch seen in the local for each of the repo's. Branch is named as <branch>_<parent_commit>_integrate_<hw_mgmt_version>
- How I did it
Added a new make target which can be invoked by calling make integrate-mlnx-hw-mgmt
user@server:/sonic-buildimage$ git rev-parse --abbrev-ref HEAD
master_23193446a_integrate_7.0020.5052
user@server:/sonic-buildimage$ git log --oneline -n 2
f66e01867 (HEAD -> master_23193446a_integrate_V.7.0020.5052, show) Intgerate HW-MGMT V.7.0020.5052 Changes
23193446a (master_intg_hw_mgmt) Update logic
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6847319_integrate_7.0020.4104
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 2
6094f71 (HEAD -> master_6847319_integrate_V.7.0020.5052) Intgerate HW-MGMT V.7.0020.5052 Changes
6847319 (origin/master, origin/HEAD) Read ID register for optoe1 to find pageable bit in optoe driver (#308)
Changes made will be summarized under sonic-buildimage/integrate-mlnx-hw-mgmt_user.out file. Debugging and troubleshooting output is written to sonic-buildimage/integrate-mlnx-hw-mgmt.log files
User output file & stdout file:
log_files.tar.gz
Limitations:
Assumes the changes would only work for amd64
Assumes the non-upstream patches in mellanox only belong to hw-mgmt
- How to verify it
Build the Kernel
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.
SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
How I did it
- Why I did it
Currently, non upstream patches are applied only after upstream patches.
Depends on sonic-net/sonic-linux-kernel#313. Can be merged in any order, preferably together
- What I did it
Non upstream Patches that reside in the sonic repo will not be saved in a tar file bur rather in a folder pointed out by EXTERNAL_KERNEL_PATCH_LOC. This is to make changes to the non upstream patches easily traceable.
The build variable name is also updated to INCLUDE_EXTERNAL_PATCHES
Files/folders expected under EXTERNAL_KERNEL_PATCH_LOC
EXTERNAL_KERNEL_PATCH_LOC/
├──── patches/
├── 0001-xxxxx.patch
├── 0001-yyyyyyyy.patch
├── .............
├──── series.patch
series.patch should contain a diff that is applied on the sonic-linux-kernel/patch/series file. The diff should include all the non-upstream patches.
How to verify it
Build the Kernel and verified if all the patches are applied properly
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.
#### Why I did it
On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.
However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.
To avoid the false alert, improve the monitor to wait and re-check.
Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the serial-getty@ttyS1.service service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'
#### How I did it
Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.
#### How to verify it
Pass all UT.
Manually check fixed code work correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```
syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```
#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Why I did it
After sonic-install install a new image, print_menu is set echo without any data. No image info between Hit any key to stop autoboot: 0 and Start USB
Board configuration detected:
Net:
| port | Interface | PHY address |
|--------|-----------|--------------|
No ethernet found.
Hit any key to stop autoboot: 0
(Re)start USB...
USB0: Port (usbActive) : 0 Interface (usbType = 2) : USB EHCI 1.00
scanning bus 0 for devices... 3 USB Device(s) found
scanning usb for storage devices... 0 Storage Device(s) found
How I did it
The fw_setenv print_menu is missing the double quotes. That causes the value is truncated. Using double quotes to in the environment setting.
How to verify it
Install new image with this fix. And reboot the system. The following section should be shown:
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
Add platform files for critical processes and default qos config for Innovium platforms
How I did it
Added default files for critical processes and qos config
How to verify it
Tested with autorestart/test_container_autorestart.py::test_containers_autorestart
Signed-off-by: rck-innovium rck@innovium.com
Why I did it
Fix the installation candidate not found issue when building docker-sonic-vs
How I did it
Need to run the command "apt-get update" to update the mirror indexes before installing the package gnupg
How to verify it
Why I did it
There is rare condition, emc2305 hold SMBus and cause SMBus completion wait timed out.
How I did it
Enable EMC2305 SMBus timeout feature, 30ms period of inactivity will reset the interface.
How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read EMC2305 configuration register and check DIS_TO bit not set.
Signed-off-by: Eric Zhu <erzhu@celestica.com>
* Upgrade docker-sonic-vs and docker-syncd-vs to Bullseye
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* iproute2: Force a new version and timestamp to be used for the package
There is an issue with Docker's overlay2 storage driver when not using
native diffs (and thus falling back to naive diff mode), which is the
case in the CI builds. The way the naive diff mode detects changes is by
comparing the file size and comparing the timestamps (specifically, I
believe it's the modification timestamp), and if there's a change there,
then it's considered a change that needs to be recorded as part of that
layer.
The problem is that with the code being added in the patch, the file
size remains the same, and the timestamp of binary files appear to be
the same timestamp as the changelog entry (likely for reproducible build
purposes). The file size remains the same likely due to extra padding
within the file introduced by relro. Because of this, Docker doesn't
detect this file has changed, and doesn't save the new file as part of
this layer.
To work around this, create a new changelog entry (with a new version as
well) with a new timestamp. This will result in the binary files having
a different timestamp, and thus will get saved by Docker as part of that
layer.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
Sometimes Nvidia watchdog device isn't ready when watchdog-control service is up after first installation from ONIE
need to delay watchdog control service to go up after hw-mgmt which gets devices up and ready
- How I did it
Delay Nvidia watchdog-control service before hw-mgmt has started on Mellanox platform in order to avoid missing or not ready watchdog device.
- How to verify it
verification test of ONIE installation of image in a loop
making sure watchdog service is always up (not failed) after first installation from ONIE
Why I did it
To enhance pddf_eeprom.py to use caching and fix#13835
How I did it
Utilising the in-built caching mechanism in the base class eeprom_base.py.
Adding a cache file to store the eeprom data.
How to verify it
By running 'decode-syseeprom' or 'show platform syseeprom' commands.
Why I did it
To enable FPGA support in PDDF.
How I did it
Added FPGAI2C and FPGAPCI in the build path for the PDDF debian package
Added the support for FPGA access APIs in the drivers of fan, xcvr, led etc.
Added the FPGA device creation support in PDDF utils and parsers
How to verify it
These changes can be verified on some platform using such FPGAs. For testing purpose, we took Dell S5232f platform and brought it up using PDDF. In doing so, FPGA devices are created using PDDF and optics eeproms were accessed using common FPGA drivers. Below are some of the logs.
- Why I did it
To include latest fixes:
Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten.
Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value.
Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete.
Fix False errors during SDK deinitialization may be seen in the syslog
- How I did it
Updated SDK submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Why I did it
Upgrade SAI XGS version to 8.4.0.2 and migrate to DMZ repo.
How I did it
Update SAI XGS version in sai.mk.
How to verify it
Run the SONiC and SAI test with the SAI pipeline.
Signed-off-by: zitingguo-ms zitingguo@microsoft.com
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1
- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK
- How to verify it
Manual testing
Fixes#13568
Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:
admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.
- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.
- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
- Why I did it
To optimize Mellanox platform build
- How I did it
sdk debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release
sx kernel is downloaded as zip from Spectrum-SDK-Drivers
- How to verify it
configure/build for Mellanox platform
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Why I did it
Platform cases test_tx_disable, test_tx_disable_channel, test_power_override failed in dx010.
How I did it
Add i2c access algorithm for CPLD i2c adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.