Commit Graph

1301 Commits

Author SHA1 Message Date
Vadym Hlushko
1d57472eb0
[graceful reboot] Rename the platform_reboot to the pre_reboot_hook, remove the sysfs power cycle (#18324)
**DEPENDS ON: [[graceful reboot] Add the pre_reboot_hook script execution, add the watchdog arm before the reboot](https://github.com/sonic-net/sonic-utilities/pull/3203)**

#### Why I did it
Add support for the `graceful reboot` instead of the `sysfs power cycle` to avoid filesystem corruption.

#### How I did it
Rename the `platform_reboot` script to the `pre_reboot_hook`.
Remove the sysfs power cycle function. From now on, the Debian reboot (`/sbin/reboot`) is executed instead of the sysfs power cycle.

#### How to verify it
1. Start watching logs by using `show log -f` and `journalctl -p debug -f`
2. Execute the `reboot` command from the switch CLI
3. Check in logs that all systemd services terminated
2024-03-23 16:45:36 -07:00
jfeng-arista
84ec2a43e0
Fix arista_7800r3a_36d2_lc fabric link down issue (#18198) 2024-03-22 16:25:39 -07:00
Zhijian Li
ab966ceeea
[E1031] Bugfix for Python syntax error in sonic_platform/common.py (#18386)
Why I did it
Bugfix for Python syntax error in sonic_platform/common.py.
An instance method of a class needs to have self as its first parameter.

This fixes the issue below:

e1031:~$ show int st
Traceback (most recent call last):
  File "/usr/local/bin/intfutil", line 836, in <module>
    main()
  File "/usr/local/bin/intfutil", line 819, in main
    interface_stat.display_intf_status()
  File "/usr/local/bin/intfutil", line 448, in display_intf_status
    self.get_intf_status()
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/multi_asic.py", line 157, in wrapped_run_on_all_asics
    func(self,  *args, **kwargs)
  File "/usr/local/bin/intfutil", line 529, in get_intf_status
    self.portchannel_speed_dict = po_speed_dict(self.po_int_dict, self.db)
  File "/usr/local/bin/intfutil", line 334, in po_speed_dict
    optics_type = port_optics_get(appl_db, value[0], PORT_OPTICS_TYPE)
  File "/usr/local/bin/intfutil", line 224, in port_optics_get
    if is_rj45_port(intf_name):
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/platform_sfputil_helper.py", line 120, in is_rj45_port
    platform_chassis = sonic_platform.platform.Platform().get_chassis()
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
    self._chassis = Chassis()
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 37, in __init__
    self._is_host = self._api_common.is_host()
TypeError: is_host() takes 0 positional arguments but 1 was given
Work item tracking
Microsoft ADO (number only): 27208152
How I did it
Add self parameter to function Common::is_host().
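
The failure mode and the fix can be reproduced in isolation. This is a minimal sketch: `BrokenCommon` and `FixedCommon` are hypothetical stand-ins for the helper class in sonic_platform/common.py, not the actual code, and the return value `True` is a placeholder.

```python
class BrokenCommon:
    # Hypothetical stand-in for the pre-fix helper: the method is missing `self`.
    def is_host():
        return True


class FixedCommon:
    # Same helper after the fix: `self` is the first parameter.
    def is_host(self):
        return True


def call_is_host(common):
    """Call is_host() on an instance and report what happened."""
    try:
        return common.is_host()
    except TypeError as exc:
        return f"TypeError: {exc}"


broken_result = call_is_host(BrokenCommon())
fixed_result = call_is_host(FixedCommon())
print(broken_result)  # message ends with "takes 0 positional arguments but 1 was given"
print(fixed_result)
```

Calling a method on an instance always passes the instance as the first positional argument, which is why the version without self raises the TypeError seen in the traceback.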

How to verify it
Verified on E1031 DUT with this patch.
2024-03-19 14:04:38 +08:00
Andriy Yurkiv
2e1410c7b7
[Mellanox] Support DSCP remapping in Dual-ToR topo for SN4700-O8V48, update buffers for t0 (#18293)
* [Mellanox] Support DSCP remapping in Dual-ToR topo for SN4700-O8V48, update buffers for t0

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>

* [Mellanox] Support DSCP remapping in Dual-ToR topo for SN4700-O8V48, update buffers for t0 (fixes after recalculation)

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>

---------

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2024-03-18 14:32:24 -07:00
Pavan-Nokia
9505f820c1
[armhf][Nokia-7215]Update HWSKU files for new SAI (#18294) 2024-03-10 21:44:31 -07:00
Xichen96
c66acd93b4
replace host check command in e1031 (#18279) 2024-03-07 08:46:28 -08:00
Pavan-Nokia
d4ca86bf9d
[Nokia-7215-A1]Update Nokia-7215-A1 Platform (#18147)
1) Update Nokia-7215-A1 platform to address UT and OC test failures
2) Enable watchdog service
3) EZB files for SAI upgrade
2024-03-04 10:53:00 -08:00
Tomer Shalvi
744a152685
[Mellanox] Adding a new field to CONFIG DB: "subport" (#18204)
- Why I did it
The field 'subport' represents the index of the split port within a physical port. For example, if a port is split into 4, the subport of the first logical port is 1, the subport of the second logical port is 2, and so on.
In xcvrd, the CMIS manager uses the subport to calculate the lane mask, which is used to control the data path per lane. On Nvidia platforms, the subport is missing and is always set to 0. According to the xcvrd code, subport=0 always corresponds to the first logical port. Therefore, if we shut down any logical port that is not the first one, the operational status of the first logical port also goes down.
This PR aims to add the subport field to CONFIG DB and prevent such scenarios. This is applicable only for static default breakout mode. For DPB, subport calculation will happen on the fly (changes are not in SONiC yet).
(Subport HLD: [link to the HLD document])
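
The effect of a missing subport on the lane mask can be sketched as follows. This is an illustration only: `host_lane_mask` is a hypothetical helper, not the xcvrd function, and it assumes a 1-based subport, an equal number of lanes per logical port, and that subport 0 falls back to the first lane group as described above.

```python
def host_lane_mask(subport: int, lanes_per_port: int) -> int:
    """Return a bitmask of the host lanes owned by one logical port.

    Assumption: subport is 1-based, and subport 0 (field missing) falls back
    to the first lane group, which is the behaviour the commit describes.
    """
    index = max(subport - 1, 0)
    return ((1 << lanes_per_port) - 1) << (index * lanes_per_port)


# An 8-lane physical port split into 4 logical ports of 2 lanes each.
masks_with_subport = [host_lane_mask(s, 2) for s in (1, 2, 3, 4)]
masks_without_subport = [host_lane_mask(0, 2) for _ in range(4)]

print([f"{m:#04x}" for m in masks_with_subport])     # ['0x03', '0x0c', '0x30', '0xc0']
print([f"{m:#04x}" for m in masks_without_subport])  # ['0x03', '0x03', '0x03', '0x03']
```

With the subport set, every logical port gets its own lanes. With subport 0 everywhere, all four logical ports map to the lanes of the first one, so shutting down any of them affects the first port.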

- How I did it
I have added the 'subport' field to all relevant Nvidia hwsku.json files (minigraph generation is based on them). Additionally, I introduced the new 'subport' field to portconfig.py, so that sonic-cfggen will be able to generate the minigraph with it. In this file, I also fixed an error that caused all attributes from hwsku.json to be applied only to the first logical port associated with a physical port.
Furthermore, I updated hwsku_json_checker to include the new field and applied a fix to the sample_hwsku.json file. sample_hwsku.json is the file that sonic-config-engine's unit tests rely on for its tests. Previously, it only included attributes for the first logical port of a split physical port. For example, if Ethernet4, a 4-lane port, was split into 2 ports, then sample_hwsku.json included only the entry for Ethernet4, with no entry for Ethernet6. This misalignment with the structure of other hwsku.json files has been corrected as well.

- How to verify it
Ensure that each logical port has the correct value of 'subport' in CONFIG DB, and that shutting down a logical port affects only that port and not other ports in the split.
2024-02-29 16:07:10 +02:00
noaOrMlnx
b25dfa91c1
[Mellanox] Update Nvidia sai.profile SKU files to have common file (#18074)
* Update Nvidia sai.profile SKU files to have common file

* Remove SAI_DUMP_MFT_CFG_PATH from sai-common.profile as it is not in use
2024-02-28 11:05:20 -08:00
Kebo Liu
1b5f72127a
[Mellanox] Remove SFP sensors from sensors.conf (#17631)
- Why I did it
The cable thermal sensors will be deprecated from the kernel driver. When cable host management is enabled, the NOS fetches the cable temperature from the cable EEPROM, and the kernel driver no longer provides the sysfs.

- How I did it
Remove the relevant sensors from the conf files

- How to verify it
Run sonic mgmt sensor test

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2024-02-12 16:12:57 +02:00
snider-nokia
7f3fd1377d
[Nokia-IXR7250E][Devicedata] Update the device data for Nokia IXR7250E platform (thermal logging thresholds) (#18063)
These changes adjust Nokia IXR7250 thermal sensor logging thresholds.

Why I did it
To modify the thermal sensor logging thresholds used on LC and Supervisor.

How I did it
Modified the JSON based thermal logging thresholds used to determine when to log current high sensor temperature and hottest sensor margin fluctuations.

How to verify it
Verify that syslog messages indicating current (high) temperature and margin values are only logged when these respective values fluctuate by at least 5 degrees.
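
The verification criterion amounts to a change-threshold filter on the logged value. A minimal sketch, assuming the comparison is made against the last logged value and that the first reading is always logged (both are assumptions; `FluctuationLogger` is a hypothetical name, not part of the Nokia platform code):

```python
class FluctuationLogger:
    """Report a reading only when it moved enough since the last report."""

    def __init__(self, threshold: float = 5.0):
        self.threshold = threshold
        self.last_logged = None

    def should_log(self, value: float) -> bool:
        # Assumption: compare against the last *logged* value, not the last reading.
        if self.last_logged is None or abs(value - self.last_logged) >= self.threshold:
            self.last_logged = value
            return True
        return False


logger = FluctuationLogger(threshold=5.0)
readings = [60, 62, 64, 65, 66, 59, 61]
logged = [r for r in readings if logger.should_log(r)]
print(logged)  # [60, 65, 59]
```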
2024-02-08 13:03:05 -08:00
Arvindsrinivasan Lakshmi Narasimhan
4703192d0f
[nokia][chassis][voq] update the sai_post_init soc file with interrupt ids (#18066)
Update/Add the sai_postinit_cmd.soc with the interrupt-ids

Microsoft ADO 26730061:

How to verify it
Verify on the Chassis LCs
2024-02-08 13:01:51 -08:00
Volodymyr Samotiy
f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Research indicates that some products might experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors have already disabled NCQ on their platforms in SONiC due to a similar issue:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ
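
The commit message does not show the exact parameter. The Linux kernel documents `libata.force=noncq` (optionally prefixed with a port ID, such as `1.00:noncq`) for this purpose, so a check of the boot command line might look like the sketch below. `ncq_disabled` is a hypothetical helper, and whether this PR uses precisely that form is an assumption.

```python
def ncq_disabled(cmdline: str) -> bool:
    """Return True if a libata.force option on the command line contains noncq."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # libata.force takes a comma-separated list of [ID:]VAL entries.
        if key == "libata.force" and any(
            opt.split(":")[-1] == "noncq" for opt in value.split(",")
        ):
            return True
    return False


# Illustrative command lines, not taken from a real device.
print(ncq_disabled("BOOT_IMAGE=/vmlinuz root=/dev/sda1 libata.force=noncq"))
print(ncq_disabled("BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet"))
```

On a running switch the input would be the contents of /proc/cmdline.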

- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
2024-01-28 16:26:07 +02:00
vdahiya12
9f18587234
[Arista] Update config.bcm of 7060_cx32s for handling 40g optics with unreliable los settings (#17768)
For 40G optics, SAI sets T0-facing ports to the SR4 type with unreliable LOS for a fixed set of ports. For this handling to be invoked, phy_unlos_msft=1 must be set in config.bcm.
This change meets that requirement. Once the property is set, SAI applies the LOS/interface type settings on the required ports.

Why I did it
On Arista-7060CX-32S-Q32 T1, RX_ERR on 40G ports during a connected device reboot
can be minimized by turning on Unreliable LOS and the SR4 media_type for all ports which are connected to T0.

Setting phy_unlos_msft=1 is what enables this behavior.

Microsoft ADO: 25941176

How I did it
Changes in SAI and turning on property

How to verify it
Ran the changes on a testbed and verified configurations are as intended.

with property

admin@sonic2:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on                    = 0
Media Type                  = 2
Unreliable LOS              = 1
Scrambling Disable          = 0
Lane Config from PCS        = 0

without property

admin@sonic:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on                    = 0
Media Type                  = 0
Unreliable LOS              = 0
Scrambling Disable          = 0
Lane Config from PCS        = 0

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2024-01-16 11:34:19 -08:00
Marty Y. Lok
a5d443d60a
[Nokia-IXR7250E] Modify the platform_reboot on the IXR7250E for PMON API reboot and Disable all SFPs (#17483)
Why I did it
When the Supervisor card is rebooted by using the PMON API, it takes about 90 seconds to trigger the shutdown in the down path. By that time the linecards are already up. This delays linecard database initialization, which is trying to PING/PONG the database-chassis. To address this issue, we modified the NDK to use the system call with "sudo reboot" when the request comes from the PMON API on the Supervisor. The NDK version is 22.9.20 and greater. This new NDK requires this modification of platform_reboot.

Work item tracking
Microsoft ADO (number only): 26365734
How I did it
Modify the platform_reboot on the Supervisor not to reboot all IMMs, since this is already done in the function reboot() in module.py. Also handle the reboot-cause.txt on the Supervisor when the reboot is requested from the PMON API.
Modify the Nokia platform specific platform_reboot on the linecard to disable all SFPs.
This PR works with NDK version 22.9.20 and above

Signed-off-by: mlok <marty.lok@nokia.com>
2024-01-08 11:39:30 -08:00
snider-nokia
98f24b639e
[Nokia][sonic-platform] Update Nokia sonic-platform submodule and device data (#17378)
These changes, in conjunction with NDK version >= 22.9.17, address the thermal logging issues discussed at Nokia-ION/ndk#27. While the changes contained in this PR do not require coupling to NDK version >= 22.9.17, thermal logging enhancements will not be available without an updated NDK >= 22.9.17. Thus, coupling with NDK >= 22.9.17 is preferred and recommended.

Why I did it
To address thermal logging deficiencies.

Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:

Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.

Modify the module.py reboot to support reboot linecard from Supervisor

 - Modify reboot to call _reboot_imm for single IMM card reboot
 - Add log to the ndk_cmd to log the operation of "reboot-linecard" and "shutdown/startup the sfm"
Add new nokia_cmd set command and modify show ndk-status output

 - Add a new function reboot_imm() to nokia_common.py to support reboot a single IMM slot from CPM
 - Added new command: nokia_cmd set reboot-linecard <slot> [force] for CPM
 - Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
 - Provide ability for IMM to disable all transceiver module TX at reboot time
 - Remove defunct xcvr-resync service
2024-01-08 11:38:46 -08:00
bktsim
c0bc1d9753
[Arista] Remove aggregate port config files for multi-asic devices (#16923)
An aggregate port_config.ini file for Arista multi-asic devices was first introduced by mistake. This PR cleans up these unnecessary files.
2023-12-22 17:10:41 -08:00
byu343
2559d7e541
[Arista] Use port_config.ini for Arista-7050QX-32S-S4Q31 (#17253)
This change of removing hwsku.json is to correct the port index for
sfp ports (Ethernet0, Ethernet1, Ethernet2, Ethernet3) by using
port_config.ini, which should be '1, 2, 3, 4'. We could not do it
with hwsku.json, as it is defined as '5, 5, 5, 5' by platform.json
for the breakout_mode 1x40G[10G].
2023-12-20 15:29:43 +08:00
Junchao-Mellanox
c1cb292310
[Mellanox] implement platform wait in python code (#17398)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
wait hw-management ready
wait SDK sysfs nodes ready

- How to verify it
manual test
unit test
sonic-mgmt regression
2023-12-14 12:04:24 +02:00
DavidZagury
ee598deced
[Mellanox][SKU] Adding Mellanox-SN4700-O8V48 SKU (#17425)
- Why I did it
To add new SKU Mellanox-SN4700-O8V48 with following requirements:

- How I did it
Create new SKU files based on the below definition:
* Port Mapping: 1-12 2x200G, 13-20 1x400G, 21-32 2x200G
   T0 topology: 48x200G Downlinks 8x400G uplinks.
   Length of downlink: 5m
   Length of uplink: 40m
* Auto-negotiation enable/disable: Yes
* FEC mode: RS
* Shared headroom: Enabled
* Shared headroom pool factor: 2
* Warmboot enabled: yes
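
The port mapping and the T0 topology line can be cross-checked with a few lines of arithmetic. A small sketch (the tuple layout and the name `count_logical_ports` are illustrative, not part of the SKU files):

```python
# (first_port, last_port, logical_ports_per_physical, speed_gbps), taken from
# the port mapping line in the commit message.
PORT_MAPPING = [
    (1, 12, 2, 200),
    (13, 20, 1, 400),
    (21, 32, 2, 200),
]


def count_logical_ports(mapping):
    """Return {speed_gbps: number_of_logical_ports}."""
    counts = {}
    for first, last, split, speed in mapping:
        physical = last - first + 1
        counts[speed] = counts.get(speed, 0) + physical * split
    return counts


counts = count_logical_ports(PORT_MAPPING)
print(counts)  # {200: 48, 400: 8}
```

The result matches the stated topology of 48x200G downlinks and 8x400G uplinks.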

- How to verify it
A SONiC build with the new SKU finishes init with all ports up and passes the QoS test suite from sonic-mgmt
2023-12-10 16:18:11 +02:00
Junchao-Mellanox
c02c8f0cc3
[Mellanox] remove log in RAM kernel option for 2700 A1 platform (#17254)
- Why I did it
Remove logs_inram kernel option

- How I did it
Remove logs_inram kernel option

- How to verify it
SONiC mgmt regression test of 202305
2023-12-05 17:52:38 +02:00
Ashwin Hiranniah
ada7c6a72e
Add pensando platform (#15978)
This commit adds support for the Pensando ASIC called ELBA. ELBA is used in PCI-based cards and in smartswitches.

#### Why I did it
This commit introduces pensando platform which is based on ELBA ASIC.
##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Created platform/pensando folder and created makefiles specific to pensando.
This mainly creates the pensando docker (which OEMs need to download before building an image), which has all the userspace to initialize and use the DPU (ELBA ASIC).
Output of the build process creates two images which can be used from ONIE and goldfw.
The recommendation is to use ONIE.
#### How to verify it
Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP.

##### Description for the changelog
Add pensando platform support.
2023-12-04 14:41:52 -08:00
arista-nwolfe
865f33c62d
[Arista]: Disable SA_EQUALS_DA trap on DNX LC SKUs (#17206)
This change was submitted directly to 202205 but it's also needed in master and 202305 with SAI9.x
#13346

There have been a couple of CSPs for this as well:
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012320965 - SAI9.2: iBGP doesn't work due to SA_EQUALS_DA trap

If SA_EQUALS_DA trap is enabled iBGP won't work as the Ethernet-IB0 ports are expected to get packets with SA==DA.

In the VOQ chassis design, outgoing control plane packets go through the recycle port for routing, so the dmac of the packet should be the asic router mac. The source mac is assigned by the kernel, so it is also the asic router mac.
2023-11-28 16:25:43 -08:00
Pavan-Nokia
90ff72c885
[armhf][Nokia-7215] Remove platform reboot (#17010) 2023-11-27 11:00:12 -08:00
Vivek
787dd7221d
[Mellanox] Upgrade HW-MGMT to 7.0030.2008 and update platform-api (#17134)
Why I did it
Add platform support for Debian 12 (Bookworm) on Mellanox Platform

How I did it
Update hw-management to v7.0030.2008
Deprecate the sfp_count == module_count approach in favour of asic init completion
Ref: Mellanox/hw-mgmt@bf4f593
Add xxd package to base image which is required by hw-management scripts
Add the non-upstream flag into linux kernel cache options
Update the thermalctl logic based on new sysfs attributes
Fix the integrate-mlnx-hw-mgmt script to not populate the arm64 Kconfig
How to verify it
Build kernel and run platform tests

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-11-21 18:53:15 -08:00
Oleksandr Ivantsiv
c2af11064f
[Mellanox] Change the default breakout mode for internal ports of the Mellanox-SN4700-O28 SKU. (#17192)
- Why I did it
Fix the issue with configuration generation from the minigraph.

- How I did it
Change the default breakout mode for internal ports to the mode that corresponds to the platform.json configuration.

- How to verify it
1. Deploy minigraph
2. Run config load_minigraph -y command
2023-11-21 09:51:36 +02:00
jfeng-arista
6dfaf5e293
[sonic-vs]: Add fabric port data for vs test, and start fabricmgrd in vs environment (#16791)
Add fabric port data for vs test, and start fabricmgrd in vs environment.

This PR depends on sonic-net/sonic-sairedis#1301

sonic-net/sonic-swss#2920 needs this one merged first.
2023-11-20 16:21:03 -08:00
Pavan Naregundi
307e39bde4
[Marvell-arm64] Add platform support for rd98DX35xx (#16874)
* [Marvell-arm64] Add platform support for rd98DX35xx

This change adds following two variants of rd98DX35xx board to arm64
build.

Board with CPU integrated into the 98DX35xx switching chip:

 Platform: arm64-marvell_rd98DX35xx-r0
 HwSKU: rd98DX35xx
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Board with external CN9131 CPU connected over PCI to 98DX35xx
switching chip:

 Platform: arm64-marvell_rd98DX35xx_cn9131-r0
 HwSKU: rd98DX35xx_cn9131
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Change-Id: I21dc9fe972417daaabb20a5bddf7779d72b7972e
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

* Add HWSKU for rd98DX35xx and rd98DX35xx_cn9131

This patch adds new HWSKUs for Marvell arm64 platforms rd98DX35xx
and rd98DX35xx_cn9131.

Change-Id: Id7c14f49f0e304335cc4ca73dcae52362c49d231
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

---------

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-11-20 09:43:02 -08:00
Stephen Sun
b93852d53d
[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584)
- Why I did it
Support running hw-management service on MSN4700 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it

- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-11-19 11:03:46 +02:00
saksarav-nokia
534eed9de7
[Nokia][Nokia-IXR7250E-SUP-10] Update BCM config for supervisor card to reduce the CPU usage (#16790)
Disabled the bcmCNTR thread to reduce the CPU usage for Nokia SFM cards.

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
2023-11-17 15:11:05 -08:00
arista-nwolfe
00a9412880
[Arista]: Set SYNCD_SHM_SIZE for Arista DNX Devices (#17205)
SAI 9.x requires SYNCD_SHM_SIZE to be specified; otherwise it defaults to 64 MB, which is insufficient for syncd.

Examples of failures seen when insufficient shmem was set:

ha_init:  The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.

Syncd hangs here:

syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1 GB for DNX devices.

Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
2023-11-17 09:06:25 -08:00
Kebo Liu
8b62e7a5b2
[Mellanox] fix new MSN2700-A1 platform name (#17151)
- Why I did it
The newly introduced MSN2700 platform has a different platform name from the old one: it should be "MSN2700-A1".

- How I did it
Update the name to the new one in platform.json and platform_components.json.

- How to verify it
run platform-related sonic-mgmt test cases on the new platform.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-11-15 08:29:11 +02:00
Sudharsan Dhamal Gopalarathnam
070d488e9d
[Mellanox] [SN5600] Removing 8x DPB mode from platform files (#17071)
- Why I did it
Removing 8x split DPB mode from platform files since it is not fully supported yet.

- How I did it
Updating platform file.

- How to verify it
Manual testing.
2023-11-07 08:45:23 +02:00
Nazarii Hnydyn
845bb80a3c
[ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945)
HLD: sonic-net/SONiC#1084

To reduce dataplane downtime during FAST reboot

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-01 21:50:14 -07:00
Pavan Naregundi
add98b221b
[Marvell-arm64]: Add hugepage cmdline agrument
The updated SDK and driver require hugepages to be reserved during kernel
boot. These kernel command line arguments are passed from installer.conf
in the device folder.

Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-10-12 02:07:36 -07:00
Samuel Angebault
be22217b64
[Arista] Remove pcie device monitoring for 7260CX3-64 (#12734)
On some products from this line, one of the management NICs might be unpopulated.
On such products this leads to errors from pcied and pcie-check.sh

How I did it
Remove this PCIe device from pcie.yaml

How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there are no more error logs in the syslog.

ADO: 25447788
2023-10-11 22:57:34 -07:00
Ashwin Srinivasan
61683d9d64
Revert "Move /var/log to RAM for Mellanox SN2700, Nokia 7215 and Dell S6100 (#15077)" (#16775)
This reverts commit 05f326eed9.

Microsoft ADO 25355843:
2023-10-11 10:36:29 -07:00
Yakiv Huryk
5719d1a59a
[Mellanox] add Mellanox-SN4700-O28 SKU (#16784)
- Why I did it
To add new SKU for Virtual Smart Switch. T1 switch with 28x400G ports.

- How I did it
Add new SKU with all relevant files.

- How to verify it
run sonic-mgmt t1-28 test suites based on master.
A few issues were observed that relate to the stability of master rather than to the topology.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-10-10 19:20:10 +03:00
Nazarii Hnydyn
875a6d9a1f
[Mellanox][Switching Mode] Enable Store-And-Forward switching mode on specific platforms (#16781)
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific, requested SKUs. Default SKUs remain untouched.

- How I did it
Added vendor SAI config options

- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
run sonic-mgmt test suites while this option is enabled.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-10-09 19:00:02 +03:00
Vadym Hlushko
3bd396043e
[buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-03 08:35:57 -07:00
Nazarii Hnydyn
d1ea3620c0
[Mellanox]: Update default SKU for SN2700. (#16663)
Set default SKU for SN2700: Mellanox-SN2700 -> ACS-MSN2700

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-09-30 01:43:30 -07:00
vmittal-msft
9068bd986b
[nokia]: Updated total headroom pool size to accommodate 100G ports on T2 uplinks (#16690)
Microsoft ADO (25266920)

The sonic-mgmt xoff test was failing for [100g,120km]. The total headroom pool size needed to be updated for the case where a 100G line card is used as a T2 uplink.

This size was calculated assuming 100G is used for downlinks, so the cable length was 2km, whereas it can also be used for uplinks (cable length 120km). The calculation therefore needs to be based on 120km, not 2km. This wastes some buffer in the 2km scenario, but it covers both cases.
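
Why the cable length matters so much can be seen from a rough estimate of the data in flight during one round trip. This is only a sketch of the dominant term, assuming a signal speed of about 2e8 m/s in fibre; it is not the formula used to derive the actual pool size, which includes further terms. `in_flight_bytes` is a hypothetical helper.

```python
def in_flight_bytes(speed_gbps: float, cable_km: float,
                    propagation_m_per_s: float = 2.0e8) -> float:
    """Bytes that can arrive during one round trip on the cable.

    Assumption: signal speed in fibre is about 2e8 m/s; real headroom
    calculations add further terms (MTU, response time, cell size).
    """
    round_trip_s = 2 * cable_km * 1000 / propagation_m_per_s
    return speed_gbps * 1e9 / 8 * round_trip_s


downlink = in_flight_bytes(100, 2)
uplink = in_flight_bytes(100, 120)
print(f"2 km:   {downlink / 1e6:.2f} MB")
print(f"120 km: {uplink / 1e6:.2f} MB")
```

Under these assumptions the 120km case needs roughly 60 times the per-port headroom of the 2km case, which is consistent with a pool sized for 2km failing the [100g,120km] xoff test.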
2023-09-26 15:58:34 -07:00
byu343
504f1163d3
[Arista] Add new hwskus to x86_64-arista_7060dx5_32 (#16077)
Add two new hwskus for different port speed layouts

Arista-7060DX5-32-25Gx96-100Gx8-200Gx8
Arista-7060DX5-32-200Gx50-100Gx14

Disable bfd on all hwskus for x86_64-arista_7060dx5_32, as its dependencies are not ready yet and leaving it enabled results in a runtime error.
2023-09-23 01:42:31 -07:00
Junchao-Mellanox
5138afe4e7
[Mellanox] add new platform 2700 a1 (#16515)
- new pcie.yaml
- new sensors.conf
- new thermal support
- new platform.json file
- adjust test code
2023-09-23 00:15:17 -07:00
Myron Sosyak
d35bf7ef57
[devices] Add DPB support for x86_64-dell_z9100_c2538-r0 (#16538)
Why I did it
Added DPB support for x86_64-dell_z9100_c2538-r0 device

How I did it
Added new SKU folder Force10-Z9100 based on Force10-Z9100-C32
Added platform.json and hwsku.json
Added generic th-z9100-flex-all.config.bcm

How to verify it
On x86_64-dell_z9100_c2538-r0 with changes from this PR

change default SKU to Force10-Z9100
do factory reset
reboot

Signed-off-by: Myron Sosyak <myron.sosyak@plvision.eu>
Co-authored-by: Andriy Kokhan <andriy.kokhan@gmail.com>
2023-09-23 00:12:43 -07:00
Kebo Liu
e286869b24
[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239)
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems

- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-06 11:32:08 +03:00
Prince George
a4e37a5cd6
[platform]: Disable interrupt for intel i2c-i801 driver (#16309)
On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.

We now disable the i801 driver interrupt and instead enable polling.

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver

How to verify it
This fix is NOT applicable to ARM based platforms. It applies only to Intel based platforms:

- On SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3
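
One way to confirm that the interrupts are no longer incrementing is to compare two snapshots of /proc/interrupts. A minimal sketch: `irq_total` is a hypothetical helper, the sample text is made up for illustration, and the IRQ name `i801_smbus` is assumed to be the name the driver registers.

```python
def irq_total(proc_interrupts_text: str, name: str) -> int:
    """Sum the per-CPU counters of every IRQ line whose description mentions name."""
    lines = proc_interrupts_text.strip().splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        if name in fields[cpu_count + 1:]:
            total += sum(int(f) for f in fields[1:cpu_count + 1])
    return total


# Illustrative snapshots, not captured from a real device.
BEFORE = """\
           CPU0       CPU1
 16:     100000     200000   IO-APIC   16-fasteoi   i801_smbus
 24:         10         20   PCI-MSI 327680-edge    xhci_hcd
"""
AFTER = """\
           CPU0       CPU1
 16:     100000     200000   IO-APIC   16-fasteoi   i801_smbus
 24:         15         27   PCI-MSI 327680-edge    xhci_hcd
"""

delta = irq_total(AFTER, "i801_smbus") - irq_total(BEFORE, "i801_smbus")
print(delta)  # 0
```

On a device, the two inputs would be the contents of /proc/interrupts read a few seconds apart. A delta of zero (or no matching line at all) indicates the driver is polling.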

Signed-off-by: Prince George <prgeor@microsoft.com>
2023-09-05 10:23:57 -07:00
Pavan-Nokia
31194124b5
[armhf][Nokia-7215]Add HWSKU files for new SAI (#16321)
Add new easy bringup (EZB) files for new SAI 1.12.0
2023-09-05 10:21:53 -07:00
Vadym Hlushko
78587cedc3
[Mellanox] Remove mlxtrace support for SPC4 (#16373)
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.

- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-04 10:53:20 +03:00
Andrew Sapronov
0405b369af
[Netberg][Barefoot] Added support for Aurora 750 (#16342)
Why I did it
Support the Intel Tofino based platform Netberg Aurora 750
ASIC: Intel Tofino BFN-T10-064Q
Ports: 64x 100G

How I did it
Added specification to device/netberg directory
Added platform/barefoot/sonic-platform-modules-netberg, which contains kernel modules, scripts and sonic_platform packages.
Modified the platform/barefoot/platform-modules-netberg.mk to include Aurora 750 related ID.

Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
2023-09-01 22:52:39 -07:00