- Why I did it
Because Spectrum-4 devices don't support the mlxtrace utility.
- How I did it
Edit sai.profile and remove the mlxtrace_spectrum4_itrace_*.cfg.ext files.
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the HW-MGMT package patch that prevents HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
On the S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.
We now disable the i801 driver interrupt and enable polling instead.
Microsoft ADO (number only): 24910530
How I did it
Disable the interrupt by passing the interrupt-disable feature argument to the i2c-i801 driver.
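The kernel's i2c-i801 documentation describes a `disable_features` module parameter in which bit 0x10 means "don't use interrupts". Below is a minimal sketch that builds and decodes that bitmask. It assumes this is the feature argument the change passes. The exact delivery mechanism (module option or kernel command line) is not shown here, and the helper names are hypothetical.

```python
from typing import List

# Bit meanings of the i2c-i801 "disable_features" module parameter,
# as listed in the Linux kernel's i2c-i801 documentation.
I801_DISABLE_FEATURES = {
    0x01: "SMBus PEC",
    0x02: "block buffer",
    0x08: "I2C block read",
    0x10: "interrupts",
    0x20: "SMBus Host Notify",
}

# Setting this bit makes the driver poll instead of using its IRQ.
FEATURE_IRQ = 0x10


def decode_disable_features(value: int) -> List[str]:
    """Return the names of the features disabled by a disable_features value."""
    return [name for bit, name in sorted(I801_DISABLE_FEATURES.items()) if value & bit]


def irq_disabled(value: int) -> bool:
    """True if the bitmask tells i2c-i801 to poll rather than use interrupts."""
    return bool(value & FEATURE_IRQ)


if __name__ == "__main__":
    print(hex(FEATURE_IRQ), decode_disable_features(FEATURE_IRQ))
```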
How to verify it
This fix is NOT applicable to ARM-based platforms; it applies only to Intel-based platforms:
- On the SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3
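The "interrupts are no longer incrementing" check can be scripted. This is a minimal sketch that assumes the controller appears in /proc/interrupts under the i2c-i801 driver's IRQ name `i801_smbus`. The helper names are hypothetical.

```python
import time


def irq_count(proc_interrupts: str, device: str) -> int:
    """Sum the per-CPU counts of every /proc/interrupts row naming `device`."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("18:"), then one count per CPU,
        # then the controller, trigger type and device name(s).
        if not fields or device not in fields[1 + ncpus:]:
            continue
        total += sum(int(x) for x in fields[1:1 + ncpus] if x.isdigit())
    return total


def interrupts_increasing(device: str = "i801_smbus", interval: float = 5.0) -> bool:
    """Sample /proc/interrupts twice and report whether the count grew."""
    with open("/proc/interrupts") as f:
        before = irq_count(f.read(), device)
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        after = irq_count(f.read(), device)
    return after > before
```

With the fix applied, `interrupts_increasing()` is expected to return False on the listed Intel platforms.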
Signed-off-by: Prince George <prgeor@microsoft.com>
Why I did it
Update the platform_reboot script of the Nokia IXR-7250E-36x400G platform to display the correct reboot-cause history when the reboot is issued from the supervisor card.
Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboot-cause.txt file from the NDK to the /host/reboot-cause directory during the shutdown cycle when the reboot is issued from the Supervisor (both for a reboot right after installing a new image and for a normal reboot).
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
The Dell S6100 platform components need to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run the sonic-mgmt component test and check that it passes.
- Why I did it
Revise label names and fix a typo in sensor.conf of the 4600C
- How I did it
Revise label names and fix a typo in sensor.conf of the 4600C
- How to verify it
Manual test
sonic-mgmt test_sensors.py
- Why I did it
Add new breakout modes for use with PAM4-capable cables
- How I did it
- How to verify it
Verified that the 50G-per-lane breakout modes are applied properly on the switch
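Here is a worked example of the per-lane arithmetic. It assumes breakout modes are written in the `<count>x<speed>G` form used in platform.json, optionally followed by bracketed alternate speeds. It also assumes an 8-lane port. The helper name is hypothetical.

```python
import re

# Matches the leading "<count>x<speed>G" part of a breakout mode such as
# "4x100G" or "2x200G[100G]"; bracketed alternate speeds are ignored.
BREAKOUT_RE = re.compile(r"^(\d+)x(\d+)G")


def per_lane_speed_gbps(breakout_mode: str, port_lanes: int) -> float:
    """Per-lane speed when `port_lanes` lanes are split evenly among sub-ports."""
    m = BREAKOUT_RE.match(breakout_mode)
    if m is None:
        raise ValueError(f"unrecognized breakout mode: {breakout_mode!r}")
    subports, speed = int(m.group(1)), int(m.group(2))
    if port_lanes % subports:
        raise ValueError("lanes do not divide evenly among sub-ports")
    lanes_per_subport = port_lanes // subports
    return speed / lanes_per_subport
```

For example, `4x100G` on an 8-lane port gives each sub-port 2 lanes, i.e. 50G per lane.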
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Enable port late create on the SN5600 (Spectrum-4) switch, so that it boots up with no ports created
Work item tracking
N/A
- How I did it
Updated the SAI XML config file
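Below is a minimal sketch for checking the setting in a SAI XML file. Two details are assumptions, not taken from this change: the tag name `late-create-all-ports` and the value `1` meaning "enabled". Confirm both against the actual SN5600 XML file.

```python
import xml.etree.ElementTree as ET

# Assumed tag name for the late-create flag; verify against the real XML.
LATE_CREATE_TAG = "late-create-all-ports"


def late_create_enabled(xml_text: str, tag: str = LATE_CREATE_TAG) -> bool:
    """True if the first element named `tag` anywhere in the XML holds "1"."""
    root = ET.fromstring(xml_text)
    node = next(root.iter(tag), None)
    return node is not None and (node.text or "").strip() == "1"
```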
- How to verify it
Run the sonic-mgmt fastboot tests
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
When a kernel crash occurs, the system reboots into the kdump capture kernel if kdump is enabled (`config kdump enable`). The kdump capture boot only stores the crash information and then reboots the system into a normal boot.
No SONiC service is started in this boot, yet it invokes `reboot`, which is actually the SONiC reboot script and depends on SONiC services. To avoid issues, the SONiC reboot script contains logic to skip all SONiC-specific steps and invoke the platform reboot instead.
However, on Nvidia platforms, the platform reboot still depends on SONiC services, which can cause issues.
So, the Debian reboot is called directly in platform reboot if it is invoked from the kdump capture boot.
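The change itself is in the platform reboot script; this is a sketch of the decision only. It assumes the capture boot is detected through /proc/vmcore, which the kernel exposes in a kdump capture kernel. It also assumes /sbin/reboot is the Debian reboot. Neither detail comes from this change, and the function names are hypothetical.

```python
import os

# Assumed path of Debian's own reboot, as opposed to the SONiC reboot script.
DEBIAN_REBOOT = ["/sbin/reboot"]


def in_kdump_capture_boot(vmcore_path: str = "/proc/vmcore") -> bool:
    """The kernel provides /proc/vmcore only in a kdump capture kernel."""
    return os.path.exists(vmcore_path)


def choose_reboot_command(platform_reboot, vmcore_path: str = "/proc/vmcore"):
    """Bypass service-dependent platform steps when in the kdump capture boot."""
    if in_kdump_capture_boot(vmcore_path):
        return list(DEBIAN_REBOOT)
    return platform_reboot
```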
#### How I did it
Manual test
Added support data for fabric monitoring in CONFIG_DB
The CONFIG_DB now has the FABRIC_MONITOR|FABRIC_MONITOR_DATA table, which holds the default values for fabric port monitoring. An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_MONITOR|FABRIC_MONITOR_DATA"
{'monErrThreshCrcCells': '1', 'monErrThreshRxCells': '61035156', 'monPollThreshIsolation': '1', 'monPollThreshRecovery': '8'}
The CONFIG_DB now also has a table for each fabric port for its isolate status.
An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_PORT|Fabric20"
{'alias': 'Fabric20', 'isolateStatus': 'False', 'lanes': '20'}
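Scripts that consume these tables can parse the command output directly. This minimal sketch assumes sonic-db-cli prints a Python dict literal exactly as in the examples above. On the device itself, swsscommon's `ConfigDBConnector.get_entry()` is the usual alternative. The helper names are hypothetical.

```python
import ast


def parse_hgetall(output: str) -> dict:
    """Parse the dict literal printed by `sonic-db-cli ... hgetall`."""
    return ast.literal_eval(output.strip())


def fabric_monitor_thresholds(output: str) -> dict:
    """FABRIC_MONITOR_DATA fields are stored as strings; convert them to ints."""
    return {key: int(value) for key, value in parse_hgetall(output).items()}


def is_isolated(output: str) -> bool:
    """Read the isolateStatus field of a FABRIC_PORT entry."""
    return parse_hgetall(output).get("isolateStatus") == "True"
```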
Co-authored-by: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com>
Why I did it
sonic-mgmt tests are failing due to invalid test data in platform.json.
fwutil rejects the chassis name in the platform_component.json of the 7060CX-32S.
How I did it
Fixed the aforementioned issues
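Here is a quick sanity check for this kind of mismatch. It assumes platform_components.json nests components under `{"chassis": {"<chassis name>": {"component": {...}}}}`. It also assumes the expected name is what the platform API's `chassis.get_name()` returns. The chassis name in the test data is illustrative, and the helper names are hypothetical.

```python
import json


def chassis_names(components_json: str) -> list:
    """Return the chassis keys declared in a platform_components.json document."""
    data = json.loads(components_json)
    return list(data.get("chassis", {}).keys())


def chassis_name_matches(components_json: str, expected_name: str) -> bool:
    """True if exactly one chassis is declared and it carries the expected name."""
    return chassis_names(components_json) == [expected_name]
```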
Why I did it
Work item tracking
Microsoft ADO (number only): 24182162
How I did it
Update config.bcm to set the default FEC to RS on the 100G linecard.
How to verify it
Tests on chassis
Why I did it
Fix a possible CPLD read race between the watchdog and the reboot-cause process.
How I did it
Use fcntl.flock to serialize access to the CPLD sysfs file.
How to verify it
It can be simulated and verified with the following Python script:
``` python3
import fcntl
import signal
import threading

CPLD_GETREG_PATH = "/sys/devices/platform/dx010_cpld/getreg"
exit_flag = False


def get_cpld_reg_value(getreg_path, register):
    with open(getreg_path, 'w+') as file:
        # Acquire an exclusive lock so that the write/read pair below cannot
        # interleave with another reader selecting a different register
        fcntl.flock(file, fcntl.LOCK_EX)
        try:
            # Select the register to read
            file.write(register + '\n')
            file.flush()
            # Seek to the beginning of the file
            file.seek(0)
            # Read the register value
            result = file.readline().strip()
        finally:
            # Release the lock; the with-block closes the file
            fcntl.flock(file, fcntl.LOCK_UN)
    return result


def cpld_read(thread_num, cpld_reg, expect_val):
    while not exit_flag:
        val = get_cpld_reg_value(CPLD_GETREG_PATH, cpld_reg)
        # print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value {val}")
        if val != expect_val:
            print(f"Thread {thread_num}: get cpld reg {cpld_reg}, "
                  f"value {val}, expect_val {expect_val}")


def signal_handler(sig, frame):
    global exit_flag
    print("Ctrl+C detected. Quitting...")
    exit_flag = True


if __name__ == '__main__':
    # Register the signal handler for Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)
    # Two threads repeatedly read different registers; without the lock,
    # one thread could read back the value of the other's register
    t1 = threading.Thread(target=cpld_read, args=(1, '0x103', '0x11'))
    t2 = threading.Thread(target=cpld_read, args=(2, '0x141', '0x00'))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```
Why I did it
Update the device data files to support 1024 LAGs on the Nokia IXR7250E platform.
Fixes https://github.com/Nokia-ION/ndk/issues/15
How I did it
Set lag_id_end=1024 in the chassisdb.conf file and add trunk_group_max_members=16 to the BCM config file.
How to verify it
Check that LAG IDs up to 1024 can be created, each with up to 16 member ports.
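Both settings can be checked together with a small parser. This sketch assumes both files use one `key=value` pair per line with `#` comments. The file contents are passed in as strings, and the helper names are hypothetical.

```python
# Values this change expects across chassisdb.conf and the BCM config file.
EXPECTED = {"lag_id_end": "1024", "trunk_group_max_members": "16"}


def parse_kv(text: str) -> dict:
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def missing_settings(*config_texts: str) -> dict:
    """Return the expected settings that are absent or have a different value."""
    merged = {}
    for text in config_texts:
        merged.update(parse_kv(text))
    return {k: v for k, v in EXPECTED.items() if merged.get(k) != v}
```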
Signed-off-by: mlok <marty.lok@nokia.com>
* [202205] Update SOC properties for DLR_INIT based pfcwd recovery (#15217)
Why I did it
Update SOC properties for the roles that need to use the DLR_INIT-based PFC watchdog (pfcwd) recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual ToR and a 7260 T1, which satisfy these conditions, and validated that pfcwd recovery uses the DLR_INIT-based mechanism. Also validated that this mechanism is not used on a 7050cx3 single ToR with the updated templates.
Signed-off-by: Neetha John <nejo@microsoft.com>
#### Why I did it
To add the new SKU Mellanox-SN4700-O8C48 with the following requirements:
| Port configuration | Value |
| ------ |--------- |
| Breakout mode for each port | **Defined in port mapping** |
| Speed of the port | **Defined in port mapping** |
| Auto-negotiation enable/disable | **No setting required** |
| FEC mode | **No setting required** |
| Type of transceiver used | **Not needed** |

| Buffer configuration | Value |
| ------ |--------- |
| Shared headroom | **Enabled** |
| Shared headroom pool factor | **2** |
| Dynamic buffer | **Disabled** |
| In the static buffer scenario, how many uplinks and downlinks? | **48x100G downlinks and 8x400G uplinks** |
| 2km cable support required? | **Yes** |

| Switch configuration | Value |
| ------ |--------- |
| Warmboot enabled? | **Yes** |
| Should warmboot be added to the SAI profile when enabled? | **Yes** |
| Is the VxLAN source port range set? | **No** |
| Should the VxLAN source port range be added to the SAI profile when set? | **No** |
| Is static policy-based hashing enabled? | **No** |

Port mapping:

| Ports | Mode |
| ------ |--------- |
| 1-12 | 2x100G |
| 13-20 | 1x400G |
| 21-32 | 2x100G |
Number of Uplinks / Downlinks:
T1 topology: **48x100G Downlinks 8x400G uplinks**.
Length of downlink: **40m**
Length of uplink: **2000m**
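As an arithmetic check, the port mapping above yields exactly the stated uplink and downlink counts. Each mode string is read as `<sub-ports>x<speed>`.

```python
from collections import Counter

# Port ranges and breakout modes copied from the port mapping table.
PORT_MAPPING = [
    (range(1, 13), "2x100G"),   # ports 1-12
    (range(13, 21), "1x400G"),  # ports 13-20
    (range(21, 33), "2x100G"),  # ports 21-32
]


def count_interfaces(mapping):
    """Count resulting interfaces per speed: ports x sub-ports per port."""
    totals = Counter()
    for ports, mode in mapping:
        count, speed = mode.split("x")
        totals[speed] += len(ports) * int(count)
    return totals
```

This gives 24 + 24 = 48 interfaces at 100G (downlinks) and 8 at 400G (uplinks).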
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Defined the SKU as per requirements
#### How to verify it
Load the SKU and verify that all links come up and traffic passes.
* [armhf][Nokia-7215]Add HWSKU files for new SAI
Add new easy bringup (EZB) files for new SAI 1.11.0
* [Nokia][devicedata]Modified the port autoneg default setting for the Nokia-7215 platform
[armhf][Nokia-7215]Update profile.ini
Why I did it
Optimize the Silverstone LED init process: its linkscan = off setting can leave the SONiC port link status out of sync with the BCM shell after reboot.
How I did it
Remove redundant code.
How to verify it
After a reboot, the ports link up normally.
* Update PG headroom settings for ports based on port speed/cable length
* Updated XOFF settings to use chip-level numbers rather than core-level numbers
* Updated PG headroom based on uplink/downlink side
* Fix for sonic-config-gen tests
* More fixes for unit test cases
* More test fixes
* Merged multiple functions into one
Add new Nokia build target and establish an arm64 build:
Platform: arm64-nokia_ixs7215_52xb-r0
HwSKU: Nokia-7215-A1
ASIC: marvell
Port Config: 48x1G + 4x10G
How I did it
- Change the makefiles for saiserver and syncd to use the Bullseye kernel
- Change Marvell SAI version to 1.11.0-1
- Add Prestera makefiles to build the kernel, Flattened Device Tree blob and ramdisk for arm64 platforms
- Provide device and platform related files for new platform support (arm64-nokia_ixs7215_52xb-r0).
- Why I did it
Update the SAI XML file to align with the default SKU
- How I did it
Update the SN5600 SAI xml file
- How to verify it
Install the image on an SN5600 device
Why I did it
Update ECN settings for the T2 chassis
How I did it
Updated the QoS config file to load these settings during switch bootup
How to verify it
Verified on a line card of a T2 chassis
- Why I did it
Update the sensors.conf and pcie.yaml according to the real hardware.
- How I did it
Update the sensors.conf and pcie.yaml
- How to verify it
Run the relevant sonic-mgmt test cases.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
Today at most 128 LAGs are supported. This is not sufficient when there are many LAGs with just a few ports each.
How I did it
Increase the number of LAG IDs to 1024 for DNX devices.
Why I did it
When the chassis is rebooted by issuing "sudo reboot" on the Supervisor card, the internal midplane communication interface xe0 should be shut down to avoid a double reboot on the linecard.
Added a udev link rule to disable autoneg on the AMD xgbe ports xe0 and xe1, keeping the setting in sync with the peer Broadcom Greyhound ports.
How I did it
Modify the Nokia-7250IXRE-specific reboot script on the Supervisor card to shut down the internal interface xe0. Also move the linecard reboot code to the top of the script to make sure the notification has been sent to the linecard before the xe0 interface is shut down.
Introduced a new rule, 80-net-by-driver.link, to disable autoneg on the AMD side. This change requires the latest NDK, which contains the change that sets autoneg on the Greyhound ports facing xe0 and xe1.
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
Support Egress Mirroring on supported Arista platforms
How I did it
Add the SOC properties needed to create the egress mirroring recycle ports
Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Updated asic_port_names for all Arista LC SKUs to follow the latest naming
conventions, which remove the redundant ASICx suffix. For
Arista-7800R3-48CQ2-C48, added the asic_port_name mapping.
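Below is a minimal sketch of the renaming. It assumes the old names carry a `-ASIC<n>` suffix, so that the hypothetical `Eth0-ASIC0` becomes `Eth0`. The helper name is also hypothetical.

```python
import re

# Assumed form of the redundant suffix: "-ASIC" followed by the ASIC index.
ASIC_SUFFIX_RE = re.compile(r"-ASIC\d+$")


def strip_asic_suffix(asic_port_name: str) -> str:
    """Drop a trailing -ASIC<n> suffix; names without it are returned unchanged."""
    return ASIC_SUFFIX_RE.sub("", asic_port_name)
```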
Why I did it
The sonic-sfp-based SFP implementation will be deprecated in the future, so switch to the sfp-refactor-based implementation.
How I did it
Use the new sfp-refactor-based SFP implementation for Seastone.
How to verify it
Manually test the SFP platform API or run the SFP platform test cases.
Why I did it
For better accounting, update the ingress lossy traffic profile to use a static threshold. This change is intended only for Th devices using RDMA-CENTRIC profiles.
How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold
How to verify it
Verified the changes manually on a Th device.
Existing unit tests render the Th template from the RDMA-CENTRIC folder; their expected output was updated to use the correct threshold.
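A static threshold is easier to account for because it is a fixed byte count. A dynamic limit, by contrast, scales with whatever shared buffer happens to be free. This sketch assumes the common SONiC BUFFER_PROFILE convention that `dynamic_th` is the base-2 logarithm of alpha, so -1 means alpha = 0.5. It also assumes the limit is alpha times the free shared buffer. The function name is hypothetical.

```python
def max_occupancy_bytes(profile: dict, free_shared_bytes: int) -> float:
    """Upper bound on buffer a PG/queue may use under a BUFFER_PROFILE-like dict.

    static_th:  a fixed byte limit, independent of current buffer usage.
    dynamic_th: log2(alpha); the limit is alpha * currently free shared buffer.
    """
    if "static_th" in profile:
        return int(profile["static_th"])
    alpha = 2.0 ** int(profile["dynamic_th"])
    return alpha * free_shared_bytes
```

The same static profile always yields the same limit. A dynamic profile's limit moves as other ports consume the shared pool.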
Why I did it
Update the dynamic threshold to -1 for optimal RDMA traffic performance
How I did it
Modified pg_profile_lookup.ini to reflect the correct value
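Below is a sketch of the edit. It assumes a whitespace-separated pg_profile_lookup.ini layout of `speed cable size xon xoff threshold xon_offset`, with `#` comment lines. Check the header of the real file before relying on the column index. The row values in the test are illustrative, and the helper name is hypothetical.

```python
# Assumed column layout of pg_profile_lookup.ini rows.
COLUMNS = ["speed", "cable", "size", "xon", "xoff", "threshold", "xon_offset"]


def set_threshold(ini_text: str, new_threshold: str = "-1") -> str:
    """Rewrite the threshold column of every data row, keeping comments as-is."""
    idx = COLUMNS.index("threshold")
    out = []
    for line in ini_text.splitlines():
        fields = line.split()
        if not fields or line.lstrip().startswith("#") or len(fields) <= idx:
            out.append(line)
            continue
        fields[idx] = new_threshold
        out.append("\t".join(fields))
    return "\n".join(out)
```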
Signed-off-by: Neetha John <nejo@microsoft.com>
Provide platform-components.json for Clearwater2 and Wolverine
These files are needed for fwutil platform sonic-mgmt tests to pass.
Fix PikeZ platform_components.json
Co-authored-by: Patrick MacArthur <pmacarthur@arista.com>
Co-authored-by: Andy Wong <andywong@arista.com>
Why I did it
The platform test cases test_tx_disable, test_tx_disable_channel and test_power_override failed on the DX010.
How I did it
Add an I2C access algorithm for the CPLD I2C adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
Support 64 cores on Arista SKUs. Fixes aristanetworks/sonic#77
Remapped recycle ports to lower core port IDs and set appl_param_nof_ports_per_modid to 64.