- Why I did it
Update SDK/FW version to 4.6.2202/2012.2202
Fixed issues:
1. On Spectrum-3 systems, ports' toggling while sending traffic on 400G speed ports, might result in stuck FW.
2. In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
3. On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
4. When performing warmboot from version prior to 202205 to 202205 and above , no aging and mac move take place
- How I did it
Updating make files.
-How to verify it
Running regression
- Why I did it
Update SAI version to SAIBuild2311.26.0.28
Fixed issues
1. Traffic with unicast destination ip and multicast destination mac wasn't properly dropped
2. When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
3. Optional feature of Port IP counters (SAI_PORT_STAT_IP*) , enabled by SAI XML per-port-ip-counter-enabled config node, wasn't initialized properly.
4. Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
5. The default value for port FEC during switch init for Spectrum3 was initialized as 'auto' and not aligned to SAI header default 'none'. Note if setups has invalid configuration and relied previously on auto, now it might be necessary for the user to provide explicit valid value for SAI_PORT_ATTR_FEC_MODE
Update SDK/FW version to 4.6.2134/2012.2134
Fixed issues:
1. Updated SN3700C to enable limit to 100G speed.
2. Recovering from Low power mode might ends with port down.
- How I did it
Updating the versions in makefile
- How to verify it
Confirm issues fixed and run sonic-mgmt tests
Why I did it
Update SDK/SAI and FW for Mellanox Platform
How I did it
Update SDK/FW to v4.6.2104/v2012.2104
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
New Features:
Debian 12 and kernel 6.1 support
Update SAI
New Features:
Auto Fec Support
FDB entries are now restored after warmboot to prevent temporary system flooding.
Minor Enhancement and Bug Fix in integrate-mlnx-sdk
How to verify it
Build Image and run tests
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support
SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
Facilitate Automatic integration of sdk kernel patches into SONiC.
**Inputs to the Script:**
1) `MLNX_SDK_VERSION` Eg: `4.5.4206`
2) `MLNX_SDK_ISSU_VERSION` Eg: `101`
**Note: If nothing is provided the one already present in the sdk.mk file is used**
3) `MLNX_SDK_SOURCE_BASE_URL:`
**Note: If nothing is provided the upstream sdk drivers url is used**
4) `CREATE_BRANCH: (y|n)` Creates a branch instead of a commit (optional, default: n)
5) `BRANCH_SONIC`: Only relevant when CREATE_BRANCH is y. `Default: master`.
Note: These should be provided through `SONIC_OVERRIDE_BUILD_VARS ` parameter
**Output:**
1) Script creates a commit in sonic-linux-kernel with any updates to sdk-kernel patches in sonic in accordance with the version provided by `MLNX_SDK_VERSION`
**Note: Script Doesn't commit anything to linux-kernel when there aren't any changes required..**
#### How I did it
1) Added a new make target which can be invoked by calling `make integrate-mlnx-sdk`
```
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6f38dca_integrate_4.5.4206
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 1
d64d1e7 (HEAD -> master_6f38dca_integrate_4.5.4206) Intgerate MLNX SDK 4.5.4206 Kernel Patches
```
Changes made will be summarized under `sonic-buildimage/integrate-mlnx-sdk_user.out` file. Debugging and troubleshooting output is written to `sonic-buildimage/integrate-mlnx-sdk.log` files
[log_files.zip](https://github.com/sonic-net/sonic-buildimage/files/11226441/log_files.zip)
#### Limitations:
1) Assumes that the sdk kernel patches are always upstreamed
#### How to verify it
Build the Kernel and test
SDK/FW Fixed Issues:
• When a system has more than 256 ACL entries, on rare occasion, removing/adding entries may cause some ACL entries not to work.
• When using mirror session policer on spectrum-2, spectrum-3, the actual CIR was 1.28 times more than the configured CIR value
• After warm boot process, when enabling ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
• Warm boot might fail if the key value SAI_KEY_ACCUMULATED_FLOW_COUNTER_UNITS_IN_KB is set
• If counters are bound to an next hop group, there is a probability the next API calls that modify the next-hop group members will fail.
• In Spectrum platforms Fastboot mode is not operational for Split port with Force mode in 50G speed
• When fine grain next hop group has a size of 2K or 4K members, and group is removed, FW will remove only (size % 2048) members, resulting in leakage of KVD resources
• When reading some port statistics, or bulk reading some Queue or PG statistics, and in parallel reading or writing other counters, FW may, in rare cases, get stuck
• SN2201 Module 1 is considered to be present/linked while no cable/module is plugged
• On Spectrum-3 when port configure to 400G FW might stuck after running mlxlink while 400G interface connected and swap between upper and lower 4 lanes
SAI New features:
• ACL: Added support for an ACL match on the AETH field (SAI_ACL_TABLE_ATTR_FIELD_AETH_SYNDROME, SAI_ACL_ENTRY_ATTR_FIELD_AETH_SYNDROME) to count RoCE NAK and CNP packets.
• PLL Status: Added a new logging entry that alerts the user upon a PLL lock loss event.
• Dual ToR - Additional MAC Address: Added support for setting a MAC address for the router interface which is not part of the 10 bit MAC address available for RIFs on Spectrum-1, as part of the Dual ToR scenario.
• Dual ToR: DSCP Remapping Added support for tunnel QoS maps as part of the Dual TOR scenario.
SAI Fixed issues:
• When setting a WRED profile attribute for a color that was not enabled during the profile create time, an error would be returned. After the fix, a default profile is create on such scenario and the set attribute is applied on top of it
• When calling the flush FDB by using the SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID attribute, the bridge bv_id value was filled on the notification callback where it should have been left empty.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2
- How I did it
Download iproute2 from Debian repository, apply patches and compile to create a new target.
The target is then deployed in syncd container of Mellanox switches only.
The new target is called IPROUTE2_MLNX.
- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
- Why I did it
To include latest fixes:
Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten.
Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value.
Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete.
Fix False errors during SDK deinitialization may be seen in the syslog
- How I did it
Updated SDK submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
- Why I did it
To optimize Mellanox platform build
- How I did it
sdk debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release
sx kernel is downloaded as zip from Spectrum-SDK-Drivers
- How to verify it
configure/build for Mellanox platform
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
To include latest fixes and new functionality
SDK/FW
1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module.
2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
4. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error.
5. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
6. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work.
7. On SN2201 system, on RJ45 port, the link might appear in 'down' state even if it operations properly.
8. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event.
9. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port.
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
Add SDK hash calculator Debian and update SDK makefile to compile it.
- How I did it
SDK hash calculator Debian will be used by ECMP calculator (PR #12482)
- How to verify it
Compile sonic-buildimage and verify SDK hash calculator Debian exist in target folder.
- Why I did it
Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes:
New functionality:
1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system
Fix the following issues:
1. On very rare occasion (~1/100K), during I2C transaction with MMS1V50-WM and MMS1V90-WR modules on SN4700 system, the module may send unexpected stop which violate the I2C specification, possibly affecting the link up flow
2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed
3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received
4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted
5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU
6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized
7. SLL configuration is missing in SDK dump
8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero)
9. PCI calibration changes from a static to a dynamic mechanism
10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event
11. SDK returned error when FEC mode is set on twisted pair, when FEC was set to None
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Signed-off-by: dprital <drorp@nvidia.com>
- Why I did it
To include latest fixes and new functionality
SAI fixes and new features
fix#3205239, incorrect object type returned for SG child list
Fix VRF-VNI map entries remove issue
ECC health event and logging
[Port Buffers] restore default queue and pg configuration when all user pools are deleted
Fix EVPN type3 error on removal of uc/bc flood group
Fix EVPN type2 MAC move from local to remote results in SAI failure
Fix Disable learning on VXLAN tunnel
Fix error on VXLAN v6 tunnel removal
Fix port cannot apply schedule group when it is a lag member
Fix BFD add more detailed message on BFD packet not related to any existing session
gcc10 compilation fixes
Disable learning on VXLAN tunnel
Support BFD remote-disc exchange in negotiation stage
Tunnel Loopback packet action attribute implementation (for Dual TOR)
Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only)
Support for CRC2 hash algorithm
Bulk counter support for PGs, queues
Support mirror sample rate attribute (SPC2+)
[Functional] [QoS] | Unable to remove SCHEDULE profile table even if there is no object referencing it
Next hop group optimized bulk API
Reduce verbosity of shared database already exists print
Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+
use same size descriptor pool for rx/tx
fix bfd - notify Sonic for admin-down event
2201 - empty list for supported fec for RJ45 ports
Fix don't disable used tunnel underlay interfaces
SDK fixes
100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning' which caused the switch firmware to read page 0x9f (which unsupported in the cable) and to report this cable as having "bad eeprom".
Added remote peer UDP port information in BFD packet event.
After editing an ECMP, the resilient ECMP next-hop counter may not count correctly.
Fixed potential memory leaks in some APIs related to LPM
If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero).
In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides
In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected.
In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped.
In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt.
On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
PCI calibration changes from a static to a dynamic mechanism.
SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group.
SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group.
SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump.
SLL configuration is missing in SDK dump.
Spectrum-2 systems, do no support 1GbE on supported 40GbE modules.
When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly.
When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU.
When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed.
When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received.
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes:
• Spectrum-3 | PCI calibration changes from a static to a dynamic mechanism.
• [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
- Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
1. Cr space timeout on Hold and Release GW - at warm boot
2. Spectrum Port in stuck PHY_UP after peer side rebooted
3. Memory leak in sx_api_router_ecmp_update_set
- How I did it
Update the make file with the new version number
Update submodule Switch-SDK-drivers pointer
- How to verify it
Run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add support for mellanox platform building for target architecture arm64.
- How I did it
Contains the following changes:
1. Change instances of hard-coded amd64 to $(CONFIGURED_ARCH)
2. Add logic to download correct binary for MFT package
3. Add TARGET_BOOTLOADER=grub definition to rules.mk to override default arm64 bootloader
- How to verify it
Build mellanox platform with TARGET_ARCH set as arm64
- Why I did it
This is for the eventual support of multiple architectures for the mellanox platform.
- How I did it
Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for.
This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image.
- How to verify it
Build SONiC for mellanox and verify it compiles successfully.
- Why I did it
To include latest fixes:
1. Warmboot | When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
2. Link Up | When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
3. Shared buffer | While moving from lossless to lossy while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized.
- How I did it
Updated SDK submodule along with the relevant Makefiles
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.1.1
SDK/FW features:
1. Added support for Finisar DR4 (FTCD4523E2PCM) on Spectrum-2 and Spectrum-3 systems.
SAI Features:
1. ECMP overlay support for IPv6
2. BFD offloading / 4K scale
3. Host interface user traps + improved trap registration (table entry)
4. gcc11 compilation fixes
5. Read support for ACL redirect action
6. Optimize ECMP DB size
7. Buffer descriptors new defaults
8. Updated port mapping for SN2201
SAI Fixes:
1. Debug counter removal when configured with all drop reasons
- Why I did it
Upgrade Mellanox SDK and SAI versions to latest
- How I did it
Updated submodule pointers
- How to verify it
Regression tested
- Why I did it
Update NVIDIA Copyright header to "mellanox" files which were changed since 1.1.2022
- How I did it
Update the copyright header
- How to verify it
Sanity tests and PR checkers.
- Why I did it
To include latest SDK fixes:
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds.
and to include SAI fixes \ changes:
1. Reduce verbosity for resource check vendor data not found
2. Fix metadata validation, check default value on conditions check
3. Add 100MB, 10MB to 2201 system
4. L3 VXLAN overlay ECMP
5. VXLAN srcport API implementation
6. Fix scheduler profile null (default values) when set on sub group scheduler group
7. Fix ACL binding restoration when port leaves a LAG
8. Fix route logic for set next hop/action and reference counter for ECMP overlay
- How I did it
1. Updated SDK/FW submodule and relevant makefiles with the required versions.
2. Update SAI submodule and relevant makefile with the required version.
- How to verify it
Build an image and run tests from "sonic-mgmt".
- Why I did it
To include latest fixes.
SAI
1. Reclaim buffers for port which is admin down
2. Support for Spectrum-4 os Nvidia ASIC simulation
3. Support for SN2201
4. Fix host interface table entry, one channel per trap (fix sflow double registration)
5. 2 new queue counters - ecn marked packets + shared current occupancy
6. Fix storm policer unknown unicast
7. Add key/value for accuflow counters
8. Add MAC move
9. Add mirror congestion mode attribute
SDK
1. Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
2. In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
3. Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
6. Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
7. Tying the SCL and SDA of the optical modules to 3.3V causes errors.
8. On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
9. While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
10. In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY.
11. The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well.
12. When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow.
13. Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
14. Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason.
15. Fixed a memory leak in sx_api_port_sflow_statistics_get API.
16. During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.
17. Fix route issue on Kernel 5.10
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Allow mellanox platform to build and successfully switch packets in
Debian 11
Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches
Adjusted build system to support host system running bullseye and
dockers running buster.
- Why I did it
Add NVIDIA Copyright header to "mellanox" files
- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files
- How to verify it
Sanity tests and PR checkers.
- Why I did it
Update SDK\FW version to 4.4.3326\2008.3326. This version contains:
New Features:
1. Add support for Fast Boot for SN3800
Bug Fixing:
1. In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 &L3)
2. On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored.
3. On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
4. Some Molex cables do not support speed after reboot
- How I did it
- How to verify it
Was verified by running regression tests that includes complete sonic-mgmt tests supported
Update SDK\FW version to 4.4.3320\2008.3324. This version contains:
New Features:
* Add support for Fast Boot for SN3800
Bug Fixing:
* In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 &L3)
* On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored.
* On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
* Some Molex cables do not support speed after reboot
Signed-off-by: Dror Prital <drorp@nvidia.com>
- Why I did it
* For SAI - Advance to adopt the following fixes:
1. Better handle not implement object type for resource availability
2. Fix ext dump when saidump is triggered from 2nd process (saidump utility) other than main adapter host (syncd in SONiC)
* For SDK\FW:
- Changes and new features:
1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM).
2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE
3. Improved performance of "per-port-buffer" counters
4. Added support for Kernel 5.10
- Bugs fixes:
On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds
Signed-off-by: Dror Prital <drorp@nvidia.com>
Why I did it
* For SAI - Upgrade to Version 1.19.0
- Add support for VxLAN encap TTL uniform model on SPC2/3
- Add ACL entry actions set VRF, set do no learn, add VLAN ID, add VLAN priority
- Add ACL field has VLAN tag
- Bulk counters (improve port statistics performance)
- Create async dump extra as part of debug generate dump
- Create irisc dump on severe health event
- Support 0 port systems (modify get switch mac to work accordingly)
- Set interface vlan up state for ping tool in SONiC
- Support attributes SAI_PORT_ATTR_QOS_SCHEDULER_PROFILE_ID, SAI_PORT_ATTR_QOS_INGRESS_BUFFER_PROFILE_LIST,
SAI_PORT_ATTR_QOS_EGRESS_BUFFER_PROFILE_LIST, SAI_PORT_ATTR_POLICER_ID as part of port create Git stats
* For SDK\FW - Upgrade to Version SDK 4.4.3106, FW 2008_3110
Added Features:
- Increased ACL table
- Enhanced PSAMPLE support
- Added support for Finisar SR4 module in SN3700 systems
- Added support for Python 3.0 in examples.
Fix bugs:
- On LR4 transceivers 00YD278, the firmware incorrectly identified the transceiver
- Reduce memory consumption for virtual LAG
- Fixed PSAMPLE listeners cleanup on SDK drivers unloading.
- On Spectrum-2 and Spectrum-3 systems, slow reaction time to Rx pause packets may lead to buffer overflow on servers.
- BER may be experienced when using 5m DAC cables between SN4700 and SN2700 in 100GbE speed.
- On very rare occasion, when connecting DR4 PAM4 transceiver to 100GbE DR1 NRZ, low BER may be experienced.
- Unexpected packet drops on the port ingress buffer may be experienced when working in 400GbE mode.
Note: When performing ISSU from an older version, this fix won't be applied. For fix to apply, a non-ISSU reset is required.
- Fix SN3800 specific warm boot scenario: Disable interface, Warm Boot, Enable Interface --> link will remain down.
Signed-off-by: Dror Prital <drorp@nvidia.com>
New features and fixes in the new SDK/FW:
SN4600C | AN/LT support
SN2700 | AN/LT bugs fixes
WJH | FID_MISS support
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Fix the following issues:
Spectrum-2, Spectrum-3 | Port | Fix link issue when using 25 GbE rate between two ports while one is on Spectrum-2-based system and the other is on Spectrum-3-based system
All | warmboot | fail to upgrade from earlier SONiC versions with official SDK/FW 4.4.2306 (was on SONiC 201911)
All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low probability conditions, an error could occur, that may result in loss of data, channel failure or in extreme cases SW failure
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
To pick up new features and fix from SDK/FW and SAI
SDK/FW new Feature:
All | Added support for multiple modules and cable types. For full list contact Nvidia networking support
Spectrum-3 | SN46000C | Added support for up to 5W on ports 49 to 64 .
SDK/FW bugs' fix:
All | fast reboot | fast boot failure from latest 201811 to 201911 and above
Spectrum | 10GbE/1GbE Transceiver (FTLX8574D3BCV) stopped working after firmware upgrade
Spectrum-2 | When device is rebooted with locked Optical Transceivers in split mode, the firmware may get stuck
Spectrum-2 | SN3700 | When connecting at 200GbE to Ixia K400, Ixia receives CRC errors
Spectrum-2 | SN3800 | On rare occasions packets loss may be experienced due to signal integrity issues
Spectrum-2 | When the port is a member of a LAG, after a warmboot and port toggle on the peer-side, the port remains down
Spectrum-3 | SN4700 | While using Optic cable in Split 4x1 mode in PAM4, when two first ports are toggled, the other 2 ports go down
Spectrum-3 | SN4700 | When working in 400GbE, deleting the headroom configuration (changing buffer size to zero) on the fly may cause continual packet drops
SAI
All | sFlow | Use hardcoded value 1 as netlink group number ax expected by hsflowd
- How I did it
Update the related version number in the make files and update the submodule pointer accordingly.
- How to verify it
Run regression test and everything works good.
To have the following fixes:
* All | Port status remains down after warm boot and flapping the port on peer side
* All | LAG HASH | IPv6 SRC_IP is not accounted in LAG hashing [
* All | ASIC driver | Kernel crash observed when driver reload is initiated before it fully loaded
* Spectrum-3 | Buffer | In lossless configuration, headroom is been evicted only when the shared buffers is free
* All | prevent FW access during ISSU
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Bugs fixes:
All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
Spectrum-3 | SN4700, Port Split | On rare occasion SN4700, conducting 100G split (4x25G) in NRZ when splitter port 1 or 2 are down, ports 3 and 4 will also go down.
Enahncments:
All | Kernel | new notification on ISSU start, so other kernel drivers can disable any interface to ASIC
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Features:
Spectrum-3 | Systems | Added GA-level support for SN4700 A0 system
All | Shared headroom | Added GA-level support for Shared headroom between PGs
Bugs fixes:
All | Counters | Sent traffic in certain size is wrongly increase to a smaller size counter, because port extended counter has a counter for sent traffic per packet-size range
All | Shared buffer | Configuring shared buffer on the fly may, on occasion, cause the chip to get stuck
Spectrum-2 | Modules | On occasion, link down is experienced with INPHI COLORZ PAM4 100G optic cables on SN3700 systems
* [Mellanox] Update SAI to 1.18.0
* [Mellanox] Update SDK to 4.4.2112
* Updated Mellanox SAI to 1.18.0.2
* Updated bcmsai debians to use SAI 1.7.1
* Updated Mellanox to use SAI 1.7.1
* Updated submodule sonic-sairedis using SAI 1.7.1
Co-authored-by: Vineet Mittal <vmittalmittal@microsoft.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
In case of non-GA SDK version there is '-' symbol in Mellanox SDK version name. (For example: 4.4.1306-006)
In appropriate .deb packet there is '.' instead of '-'. Because of this there was problem while building SDK
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
- SN3800 vs Cisco9236 - no link copper or optics - start sending IDLE before PHY_UP for specific OPNs
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* Update sonic-sairedis (sairedis with SAI 1.6 headers)
* Update SAIBCM to 3.7.4.2, which is built upon SAI1.6 headers
* missed updating BRCM_SAI variable, fixed it
* Update SAIBCM to 3.7.4.2, updated link to libsaibcm
* [Mellanox] Update SAI (release:v1.16.3; API:v1.6)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
* Update sonic-sairedis pointer to include SAI1.6 headers
* [Mellanox] Update SDK to 4.4.0914 and FW to xx.2007.1112 to match SAI 1.16.3 (API:v1.6)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
* ensure the veth link is up in docker VS container
* ensure the veth link is up in docker VS container
* [Mellanox] Update SAI (release:v1.16.3.2; API:v1.6)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
* use 'config interface startup' instead of using ifconfig command, also undid the previous change'
Co-authored-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
Update SAI/SDK/FW and MSN4700 device files to support 8 lanes 400G
Update SAI to 1.16.3
Update SDK to 4.4.0914
Update FW to *.2007.1112
Update MSN4700 device files to support 8 lanes 400G