This commit adds support for pensando asic called ELBA. ELBA is used in pci based cards and in smartswitches.
#### Why I did it
This commit introduces pensando platform which is based on ELBA ASIC.
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Created platform/pensando folder and created makefiles specific to pensando.
This mainly creates pensando docker (which OEM's need to download before building an image) which has all the userspace to initialize and use the DPU (ELBA ASIC).
Output of the build process creates two images which can be used from ONIE and goldfw.
Recommendation is use to use ONIE.
#### How to verify it
Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP.
##### Description for the changelog
Add pensando platform support.
This change was submitted directly to 202205 but it's also needed in master and 202305 with SAI9.x
#13346
There has been a couple CSPs for this as well:
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012320965 - SAI9.2: iBGP doesn't work due to SA_EQUALS_DA trap
If SA_EQUALS_DA trap is enabled iBGP won't work as the Ethernet-IB0 ports are expected to get packets with SA==DA.
In the VOQ chassis design, for outgoing control plane packets, the packets goes the recycle port for routing, therefore the dmac of the packet should be the asic router mac. The source mac is assigned by the kernel, so it is also the asic router mac.
Why I did it
Add platform support for Debian 12 (Bookworm) on Mellanox Platform
How I did it
Update hw-management to v7.0030.2008
Deprecate the sfp_count == module_count approach in favour of asic init completion
Ref: Mellanox/hw-mgmt@bf4f593
Add xxd package to base image which is required by hw-management scripts
Add the non-upstream flag into linux kernel cache options
Update the thermalctl logic based on new sysfs attributes
Fix the integrate-mlnx-hw-mgmt script to not populate the arm64 Kconfig
How to verify it
Build kernel and run platform tests
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
- Why I did it
Fix the issue with configuration generation from the minigrapth:
- How I did it
Change the default breakout mode for internal ports to the mode that corresponds platfom.json configuration.
- How to verify it
1. Deploy minigraph
2. Run config load_minigraph -y command
* [Marvell-arm64] Add platform support for rd98DX35xx
This change adds following two variants of rd98DX35xx board to arm64
build.
Board with CPU integrated into the 98DX35xx switching chip:
Platform: arm64-marvell_rd98DX35xx-r0
HwSKU: rd98DX35xx
ASIC: marvell
Port Config: 32x1G + 16x2.5G + 6x25G
Board with external CN9131 CPU connected over PCI to 98DX35xx
switching chip:
Platform: arm64-marvell_rd98DX35xx_cn9131-r0
HwSKU: rd98DX35xx_cn9131
ASIC: marvell
Port Config: 32x1G + 16x2.5G + 6x25G
Change-Id: I21dc9fe972417daaabb20a5bddf7779d72b7972e
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Add HWSKU for rd98DX35xx and rd98DX35xx_cn9131
This patch adds new HWSKU's for Marvell arm64 platforms rd98DX35xx
and rd98DX35xx_cn9131.
Change-Id: Id7c14f49f0e304335cc4ca73dcae52362c49d231
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
---------
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
- Why I did it
Support running hw-management service on MSN4700 emulation platform.
- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it
- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
SAI 9.x requires a SYNCD_SHM_SIZE specified otherwise it will default to 64mb which is insufficient for syncd.
E.G. of a few failures seen when insufficient shmem was set
ha_init: The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.
Syncd hangs here:
syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1gb for DNX devices.
Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
- Why I did it
New introduced MSN2700 platform has a different platform name compared to the old one, it should be "MSN2700-A1".
- How I did it
Update the name to the new one in platform.json and platform_components.json.
- How to verify it
run platform-related sonic-mgmt test cases on the new platform.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Removing 8x split DPB mode from platform files since it is not fully supported yet.
- How I did it
Updating platform file.
- How to verify it
Manual testing.
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
On some products from this line one of the management NIC might be unpopulated.
On such products this leads to errors from pcied and pcie-check.sh
How I did it
Remove this PCIe device from pcie.yaml
How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there is no more error logs in the syslog.
ADO: 25447788
- Why I did it
To add new SKU for Virtual Smart Switch. T1 switch with 28x400G ports.
- How I did it
Add new SKU with all relevant files.
- How to verify it
run sonic-mgmt t1-28 test suites based on master.
Few issues observed not relevant to the topology but to the stability of master
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific and requested SKUs. Default SKU remain untouched.
- How I did it
Added vendor SAI config options
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
run sonic-mgmt test suits while this option is enabled.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
* [buffers] Align the sonic-device_metadata.yang
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
---------
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Microsoft ADO (25266920)
sonic-mgmt xoff test was failing for [100g,120km]. Needed to update total headroom pool size when 100G line card is used as T2 uplink.
This size was calculated assuming 100g is used for downlink so cable length was 2km whereas it can also be used for uplink (cable length - 120km). so we need to do calculation based on 120km not 2km. Although it will be some wastage for 2km scenario but it should cover both cases.
Add two new hwskus for different port speed layouts
Arista-7060DX5-32-25Gx96-100Gx8-200Gx8
Arista-7060DX5-32-200Gx50-100Gx14
Disable bfd on all hwskus for x86_64-arista_7060dx5_32 as its dependencies have not been ready, which will result in a runtime error if not disabled.
Why I did it
Added DPB support for x86_64-dell_z9100_c2538-r0 device
How I did it
Added new SKU folder Force10-Z9100 based on Force10-Z9100-C32
Added platform.json and hwsku.json
Added generic th-z9100-flex-all.config.bcm
How to verify it
On x86_64-dell_z9100_c2538-r0 with changes from this PR
change default SKU to Force10-Z9100
do factory reset
reboot
Signed-off-by: Myron Sosyak <myron.sosyak@plvision.eu>
Co-authored-by: Andriy Kokhan <andriy.kokhan@gmail.com>
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
On S6100 we are seeing almost 100K interrupts per second on intels i801 SMBUS controller which affects systems performance.
We now disable the i801 driver interrupt and instead enable polling
Microsoft ADO (number only): 24910530
How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver
How to verify it
This fix is NOT applicable for ARM based platforms. Applicable only for intel based platforms:-
- On SN2700 its already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3
Signed-off-by: Prince George <prgeor@microsoft.com>
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.
- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Why I did it
Support Intel Tofino based platforms Netberg Aurora 750
ASIC: Intel Tofino BFN-T10-064Q
Pors: 64x 100G
How I did it
Added specification to device/netberg directory
Added platform/barefoot/sonic-platform-modules-netberg contains kernel modules, scripts and sonic_platform packages.
Modified the platform/barefoot/platform-modules-netberg.mk to include Aurora 750 related ID.
Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
- Why I did it
Revise lable name and fix typo in sensor.conf of 4600C
- How I did it
Revise lable name and fix typo in sensor.conf of 4600C
- How to verify it
Manual test
sonic-mgmt test_sensors.py
What I did it
Add new platform arm64-ragile_ra-b6010-48gt4x-r0 (Centec)
ASIC Vendor: Centec
Switch ASIC: Centec
Port Config: 48x1G+4x10G
Why I did it
Add new platform RA-B6010-48GT4X
How I did it
Add new platform RA-B6010-48GT4X
Signed-off-by: pettershao-ragilenetworks <pettershao@ragilenetworks.com>
- Why I did it
Enabled port late create on SN5600 Spectrum-4 switch boots up with no ports
Work item tracking
N/A
- How I did it
Updated SAI xml config file
- How to verify it
Run sonic-mgmt tests of fastboot
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Why I did it
System health config is missing in few Dell platforms.
How I did it
Added system health monitoring config and its related API's
How to verify it
show system-health summary/detail commands.
Why I did it
Dell S6100 Platform components needs to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run sonic-mgmt component test and check whether it passes.
These devices will not reliabily report the proper devid and vendorid
when reading it is read directly from the pci config space.
It can be read but shouldn't be compared against some fixed value like
the one stored in pcie.yaml.
Since this makes pcied unhappy, the simplest path forward is to just
remove this device from monitoring.
* sonic-buildimage: Add 7060DX5-64S brcm tunnel config
Add bcm_tunnel_term_compatible_mode: 1 support, which allows
Loopback configuration to no longer result in SAI failure
"tunnel terminator add failed with error Feature unavailable"
that caused Orchagent SIGABRT
Signed-off-by: Aaron Payment <aaronp@arista.com>
* sonic-buildimage: Set port config ENABLE:0 in 7060DX5-64S brcm config
Set ENABLE:0 for the front panel ports in the brcm config so that the
ports are default admin down. This change prevents the issue that ports
are able to link up and pass traffic resulting in mac learn events after
SAI create switch and before SAI admin state up. The unexpected mac learn events
resulted in Orch agent crash in PortsOrch init, which occurs after SAI
create switch and before SAI admin state up.
* fix sensors.conf on CatalinaDD
* Add support for two sfp ports
* Add copper 50g tuning to babbagelp on catalina
---------
Signed-off-by: Aaron Payment <aaronp@arista.com>
Co-authored-by: enes.oncu <enes.oncu@arista.com>
Co-authored-by: Boyang Yu <byu@arista.com>
Why I did it
Update the platform_reboot of Nokia Platform IXR-7250E-36x400G to displays the correct reboot-cause history when reboot from supervisor card.
Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboo-cause.txt file from NDK to the /host/reboot-cause directory at the down cycle when the reboot is issued from Supervisor (for both reboot right after install a new image and normal reboot)
Signed-off-by: mlok <marty.lok@nokia.com>
- Why I did it
Add new breakout modes to be used in PAM4 supported cables
- How I did it
- How to verify it
Verified the 50G per lane breakout modes are applied properly on the switch
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
Add PDDF support on following Ufispace platforms with Broadcom ASIC
S9110-32X
S8901-54XC
S7801-54XS
S6301-56ST
How I did it
Add PDDF configuration files, scripts and python files
How to verify it
Run pddf commands and show commands.
Signed-off-by: nonodark <ef67891@yahoo.com.tw>
Why I did it
To overwrite the default DSCP_TO_TC_MAP for tunnel traffic, the attribute sai_tunnel_support must be set to 1.
Before this change, the attribute is set only on dual-tor platform when remap is enabled.
This PR is to set the attribute on all Arista-7260cx3 devices.
Work item tracking
Microsoft ADO 24785776
How I did it
Update the config.bcm template for Arista-7260cx3 devices.
How to verify it
The change is verified by manually rendering the j2 on a T1 testbed.
Syncd will abort in handleSaiCreateStatus with
'Encountered failure in create operation, exiting orchagent,SAI API: SAI_API_TUNNEL, status: SAI_STATUS_NOT_SUPPORTED'
The fix is to add the following brcm config to prevent the error:
sai_tunnel_global_sip_mask_enable=1
bcm_tunnel_term_compatible_mode=1
Signed-off-by: Aaron Payment <aaronp@arista.com>
#### Why I did it
When a kernel crash occurs, the system will reboot to the kdump capture kernel if kdump is enabled (`config kdump enable`). In the kdump capture boot, it only stores the crash information, and then reboot the system to a normal boot.
In this boot, no SONiC service is started but it invokes `reboot` which is actually the SONiC reboot that depends on SONiC services. There is a logic to skip all SONiC stuff and invoke platform reboot in SONiC reboot to avoid issues.
However, on Nvidia platforms, the platform reboot still depends on SONiC services, which can cause issues.
So, the Debian reboot is called directly in platform reboot if it is invoked from the kdump capture boot.
#### How I did it
Manual test