sonic-buildimage

Author	SHA1	Message	Date
Vivek	f49ae28948	[Mellanox] Fix the hw-mgmt intg tool case sensitivity for KConfig (#14709 ) Fix the script to consider case sensitivity while writing the kconfig Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2023-05-06 12:32:19 +08:00
mssonicbld	10e635be93	[Mellanox] Facilitate automatic integration of new hw-mgmt (#14594 ) (#14966 )	2023-05-06 09:08:54 +08:00
Lior Avramov	d7d8d7754d	[Mellanox] [202211] Replace iproute2 supplied by SDK to iproute2 downloaded from Debian repository (#14726 ) (#14724 ) - Why I did it Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2 - How I did it Download iproute2 from Debian repository, apply patches and compile to create a new target. The target is then deployed in syncd container of Mellanox switches only. The new target is called IPROUTE2_MLNX. - How to verify it Compile and load on switch, verify interfaces network devices created successfully. Verify LLDP shows connections to neighbors. Verify ping between 2 hosts over 2 router ports is successful.	2023-05-02 10:29:02 +03:00
mssonicbld	6781c4a4fb	Made non-upstream patch design order aware (#14434 ) (#14650 )	2023-04-14 03:29:35 +08:00
Sudharsan Dhamal Gopalarathnam	156189dbad	[Mellanox]Fix lpmode set when logical port is larger than 64 (#14138 ) - Why I did it In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1 - How I did it Remove the hardcoded value of 64. Obtained the number of logical ports from SDK - How to verify it Manual testing	2023-03-19 20:50:58 +08:00
Dror Prital	ba14f728de	Update SDK/FW to version 4.5.4206/4.5.4204 (#14164 ) - Why I did it To include latest fixes: Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten. Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value. Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete. Fix False errors during SDK deinitialization may be seen in the syslog - How I did it Updated SDK submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt".	2023-03-19 20:50:49 +08:00
dbarashinvd	d7ba89a95b	[Mellanox] fix for watchdog device not found, adding dependency on hw-management (#14182 ) - Why I did it Sometimes the Nvidia watchdog device isn't ready when the watchdog-control service comes up after the first installation from ONIE. The watchdog-control service needs to be delayed until after hw-mgmt, which gets the devices up and ready. - How I did it Delay the Nvidia watchdog-control service until hw-mgmt has started on the Mellanox platform, in order to avoid a missing or not-ready watchdog device. - How to verify it Run a verification test that installs the image from ONIE in a loop, making sure the watchdog service is always up (not failed) after the first installation from ONIE.	2023-03-19 20:50:44 +08:00
Volodymyr Samotiy	cc5ed4b632	[Mellanox] Update MFT to 4.22.1-15 (#14133 ) Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2023-03-19 18:33:57 +08:00
Stepan Blyshchak	969166d769	[Mellanox] Place FW binaries under platform directory instead of squashfs (#13837 ) Fixes #13568 Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation: admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa - Why I did it 202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change. - How I did it Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation. /etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade. - How to verify it Upgrade from 201911 to master master to 201911 downgrade master -> master reboot ONIE -> master boot (First FW burn) Which release branch to backport (provide reason below if selected)	2023-03-08 13:50:18 +08:00
mssonicbld	aea96da04d	[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710 ) (#13962 )	2023-03-06 16:47:31 +08:00
mssonicbld	1757f53290	[Mellanox] update sdk/fw build procedure (#14025 ) (#14059 )	2023-03-03 02:43:19 +08:00
mssonicbld	18bc044179	Remove support to Mellanox SPC4 ASIC (#13932 ) (#13957 )	2023-02-23 22:22:35 +08:00
mssonicbld	310827c26c	Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847 ) (#13959 )	2023-02-23 20:32:15 +08:00
mssonicbld	50aaf92590	[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792 ) (#13960 )	2023-02-23 20:32:09 +08:00
Junchao-Mellanox	e8789a2e11	[Mellanox] Check system eeprom existence in a retry manner (#13884 ) - Why I did it On the Mellanox platform, the system EEPROM is a soft link provided by hw-management. There is a chance that the config-setup service accesses the EEPROM before hw-management creates it, which causes errors. This PR aims to fix that. - How I did it Wait up to 10 seconds in the platform API for the EEPROM to be created. - How to verify it Manual test	2023-02-23 20:31:29 +08:00
mssonicbld	6a12ca9332	[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814 ) (#13931 )	2023-02-22 22:14:09 +08:00
Stephen Sun	b0416a5c2c	[Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372 ) - Why I did it Advance hw-mgmt service to V.7.0020.4100 Add missing thermal sensors that are supported by hw-mgmt package Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready. Depends on sonic-net/sonic-linux-kernel#305 - How I did it 1. Update hw mgmt version 2. Add missing sensors 3. Delay service - How to verify it Regression test. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-02-20 14:38:53 +08:00
Stephen Sun	4f3b649f8e	[Mellanox] Support per PSU slope value for PSU power threshold (#13757 ) - Why I did it Support per PSU slope value for PSU power threshold according to hardware team requirement - How I did it Pass the PSU number as a parameter when fetching the slope value of PSU. - How to verify it Running regression and manual test Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-02-20 12:38:20 +08:00
Sudharsan Dhamal Gopalarathnam	a993fc205f	[Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533 ) - Why I did it Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker - How I did it Added script in docker-syncd of mellanox and copied it to /usr/bin - How to verify it Manual UT and new sonic-mgmt tests	2023-02-18 06:34:29 +08:00
mssonicbld	94e59a841e	[Mellanox] Enhance MFT make file to download source code from any valid URL (#13801 ) (#13868 )	2023-02-18 02:14:00 +08:00
Volodymyr Samotiy	e849455742	[Mellanox] Update SDK/FW to 4.5.4150/2010.4150 (#13480 ) - Why I did it To include latest fixes and new functionality SDK/FW 1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module. 2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-through mode, a pipeline might get stuck. 3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked. 4. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error. 5. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck. 6. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work. 7. On SN2201 system, on RJ45 port, the link might appear in 'down' state even if it operates properly. 8. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event. 9. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2023-02-16 18:36:43 +08:00
Lior Avramov	e6b1ed366b	[Mellanox] [ECMP calculator] Add script usage and more information to script description in help option (#13493 ) Add script usage and more information to script description being printed in help option. - Why I did it Missing information in script description in help option. - How I did it Expand script description and add script usage. - How to verify it Run the script with -h option.	2023-02-16 18:36:36 +08:00
mssonicbld	8832ddd60b	[Mellanox] Improve FW upgrade logging (#13465 ) (#13681 )	2023-02-12 23:53:33 +08:00
Vadym Hlushko	3530fdbea1	[SFP] Change logging severity when failed to read EEPROM (#13011 ) - Why I did it To prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test from failing on the log analyzer step. The mentioned test performs sfputil reset EthernetX for every interface on the SONiC switch; this action flaps the SFP device status (INSERTED -> REMOVED -> INSERTED). The SONiC XCVRD daemon catches this SFP device status change (because it is monitoring the presence status of the cable). To judge the cable presence status, we currently still rely on reading the first bytes of the EEPROM. The EEPROM may not be ready at that moment, in which case the SONiC XCVRD daemon prints an error log to syslog: ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register - How I did it Change logging severity from ERR to WARNING - How to verify it Run sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py or, much faster, run the following script on the switch: #!/bin/bash START=0 END=248 for (( intf=$START; intf<=$END; intf+=8)) do sfputil reset Ethernet"${intf}" done sfputil show presence	2023-02-04 02:36:51 +08:00
Junchao-Mellanox	cf6f31b215	[Mellanox] Remove TODO comments which are no longer needed (#13023 ) - Why I did it Remove TODO comments which are no longer needed - How I did it Remove TODO comments which are no longer needed - How to verify it Only comment change	2023-02-04 02:36:47 +08:00
Kebo Liu	9680479661	[Mellanox] change the implementation of is_host() to fix a stuck issue on simx platform (#13100 ) - Why I did it Following code to judge whether a process is running inside a docker could get stuck on the simx platform subprocess.Popen(["docker", "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True) When it gets stuck, the config-chassisdb service can not be successfully started, thus the system can not be booted up. root@sonic:/# service config-chassisdb status config-chassisdb.service - Config chassis_db Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled) Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago Main PID: 571 (config-chassisd) Tasks: 14 (limit: 9501) Memory: 132.4M CGroup: /system.slice/config-chassisdb.service ├─571 /bin/bash /usr/bin/config-chassisdb ├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform ├─602 /bin/sh -c sudo decode-syseeprom -m ├─603 sudo decode-syseeprom -m ├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m ├─616 /bin/sh -c docker --version 2>/dev/null └─617 docker --version - How I did it Use an alternative way to implement this function and issue can be avoided: docker_env_file = '/.dockerenv' return os.path.exists(docker_env_file) is False - How to verify it run regression on real hardware and simx platform.	2023-02-04 02:36:43 +08:00
Kebo Liu	ab54549d53	[Mellanox] Skip the leftover hardware reboot cause in case of last boot is warm/fast reboot (#13246 ) - Why I did it In case of warm/fast reboot, the hardware reboot cause will NOT be cleared because CPLD will not be touched in this flow. To not confuse the reboot cause determine logic, the leftover hardware reboot cause shall be skipped by the platform API, platform API will return the 'REBOOT_CAUSE_NON_HARDWARE' instead of the "hardware" reboot cause. - How I did it Check the proc cmdline to see whether the last reboot is a warm or fast reboot, if yes skip checking the leftover hardware reboot cause. - How to verify it a. Manual test: - Perform a power loss - Perform a warm/fast reboot - Check the reboot cause should be "warm-reboot" or "fast-reboot" instead of "power loss" b. Run reboot cause related regression test. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-01-31 18:34:36 +08:00
Junchao-Mellanox	e631f426f4	[infra] Support syslog rate limit configuration (#12490 ) (#13535 ) Backport of https://github.com/sonic-net/sonic-buildimage/pull/12490 into 202211 - Why I did it Support syslog rate limit configuration feature - How I did it Remove unused rsyslog.conf from containers Modify docker startup script to generate rsyslog.conf from template files Add metadata/init data for syslog rate limit configuration - How to verify it Manual test New sonic-mgmt regression cases	2023-01-30 20:11:44 +02:00
Dror Prital	d12c3b79bc	[202211][Mellanox] Add ASIC simulation version tag to fw.mk (#13473 ) Signed-off-by: dprital <drorp@nvidia.com>	2023-01-23 13:28:19 +02:00
mssonicbld	1dc71aa4ff	[Mellanox] Update ECMP calculator README (#13051 ) (#13362 )	2023-01-14 11:46:42 +08:00
mssonicbld	1e522ff3a9	Add ECMP calculator tool (#12482 ) (#13301 )	2023-01-09 00:48:56 +08:00
Kebo Liu	28f8da80ea	[Mellanox] Add support to Mellanox Spectrum-4 ASIC Firmware compiling and upgrade (#12844 ) - Why I did it Add support for compiling Spectrum-4 ASIC firmware to the SONiC image Add support for Spectrum-4 ASIC firmware upgrade - How I did it Update Mellanox fw make files to include Spectrum-4 ASIC firmware binaries. Update firmware upgrade scripts to be able to detect Spectrum-4 ASIC. - How to verify it Run regression tests Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-12-10 10:33:21 +08:00
Lior Avramov	f3821c6d2f	[Mellanox] Add SDK hash calculator debian and update SDK makefile to compile it (#12840 ) - Why I did it Add SDK hash calculator Debian and update SDK makefile to compile it. - How I did it SDK hash calculator Debian will be used by ECMP calculator (PR #12482) - How to verify it Compile sonic-buildimage and verify SDK hash calculator Debian exist in target folder.	2022-12-10 10:33:21 +08:00
Stephen Sun	91e12d7b49	[Mellanox] Support PSU power threshold checking (#11863 ) * Support power threshold Signed-off-by: Stephen Sun <stephens@nvidia.com> * get_psu_power_warning_threshold => get_psu_power_warning_suppress_threshold Signed-off-by: Stephen Sun <stephens@nvidia.com> * Fix comments Signed-off-by: Stephen Sun <stephens@nvidia.com> Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-12-10 10:33:21 +08:00
Richard.Yu	c34e3ff86b	[submodule]Advance sairdis with sai 1.11 and add brcm and mlnx sai sdk (#12471 ) (#12820 ) * Why I did it* Advance submodule sairdis with sai 1.11 and add brcm and mlnx sai sdk How I did it Advance sairedis which contains Todo: cause sairedis 202211 branch blocked by some dependences repo, map to sairedis master, will move to 202211 when branch ready [submodule][SAI]Advance SAI head pointer sonic-sairedis#1155 [Recorder]: Acquire lock for ofstream changes sonic-sairedis#1145 [SAI submodule update] Enable support for SAI v1.11.0 sonic-sairedis#1140 Add brcm sdk 7.1 which update with sai 1.11 Add mlnx sdk which update with sai 1.11 How to verify it Test with pipeline which enable RPC build as well https://github.com/sonic-net/sonic-buildimage/pull/12770/files Test with sonic smoke test cases Test with sai test cases Signed-off-by: richardyu-ms <richard.yu@microsoft.com> Signed-off-by: richardyu-ms <richard.yu@microsoft.com> Signed-off-by: Kebo Liu <kebol@nvidia.com> Co-authored-by: Kebo Liu <kebol@nvidia.com> Signed-off-by: richardyu-ms <richard.yu@microsoft.com> Signed-off-by: richardyu-ms <richard.yu@microsoft.com> Signed-off-by: Kebo Liu <kebol@nvidia.com> Co-authored-by: Kebo Liu <kebol@nvidia.com>	2022-11-24 23:30:54 +08:00
Junchao-Mellanox	20d885dbc2	[Mellanox] Add new thermal sensors for SN5600 (#12671 ) - Why I did it Add new thermal sensors for SN5600 - How I did it Add new thermal sensors for SN5600: PCH and SODIMM - How to verify it Manual test	2022-11-14 11:10:33 -08:00
Kebo Liu	c8c2b7fc45	[Mellanox] [Platform API] Update SN2201 dynamic minimum fan speed table (#12602 ) - Why I did it Update SN2201 dynamic minimum fan speed table according to data provided by the thermal team. - How I did it Update the thermal table in device_data.py - How to verify it Run platform related regression Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-11-08 13:37:10 +02:00
Junchao-Mellanox	830b7d8cb4	[Mellanox] Use sdk sysfs instead of ethtool (#12480 )	2022-11-03 11:17:44 -07:00
Vivek	5d83d424b1	Added BUILD flags to provision for building the kernel with non-upstream patches (#12428 ) * Added ENV vars for non-upstream patches Signed-off-by: Vivek Reddy <vkarri@nvidia.com> * Made MLNX_PATCH_LOC an absolute path Signed-off-by: Vivek Reddy <vkarri@nvidia.com> * Added non-upstream-patches dir Signed-off-by: Vivek Reddy <vkarri@nvidia.com> * Update README.md * Addressed comments * Env vars updated Signed-off-by: Vivek Reddy <vkarri@nvidia.com> * Readme updated Signed-off-by: Vivek Reddy <vkarri@nvidia.com> Signed-off-by: Vivek Reddy <vkarri@nvidia.com>	2022-10-31 12:16:05 -07:00
Dror Prital	917ad1ffe0	[Mellanox] Update SDK/FW to version 4.5.3186/2010.3186 (#12542 ) - Why I did it Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes: New functionality: 1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system Fix the following issues: 1. On very rare occasion (~1/100K), during I2C transaction with MMS1V50-WM and MMS1V90-WR modules on SN4700 system, the module may send unexpected stop which violates the I2C specification, possibly affecting the link up flow 2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed 3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received 4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted 5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will return an error even if the configuration was identical to that done before performing the ISSU 6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized 7. SLL configuration is missing in SDK dump 8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero) 9. PCI calibration changes from a static to a dynamic mechanism 10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event 11. SDK returned error when FEC mode is set on twisted pair, when FEC was set to None - How I did it Update pointer for the SDK/FW - How to verify it Run regression tests Signed-off-by: dprital <drorp@nvidia.com>	2022-10-30 09:31:09 +02:00
Stephen Sun	8c73e68468	Remove \n from the end of fs_path in ONIEUpdater (#12465 ) This fixes the following error ``` admin@sonic:~$ sudo fwutil show status mount: /mnt/onie-fs: special device /dev/sda2 does not exist. Error: Command '['mount', '-n', '-r', '-t', 'ext4', '/dev/sda2\n', '/mnt/onie-fs']' returned non-zero exit status 32.. Aborting... Aborted! admin@sonic:~$ sudo vi /usr/local/lib/python3.9/dist-packages/sonic_platform/ ``` Seems like #11877 the rstrip('\n') was removed. Probably by mistake. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-10-23 09:59:20 +03:00
Mai Bui	648ca075c7	[device/mellanox] Mitigation for security vulnerability (#11877 ) Signed-off-by: maipbui <maibui@microsoft.com> Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first. #### Why I did it `subprocess.Popen()` and `subprocess.check_output()` is used with `shell=True`, which is very dangerous for shell injection. #### How I did it Disable `shell=True`, enable `shell=False` #### How to verify it Tested on DUT, compare and verify the output between the original behavior and the new changes' behavior. [testresults.zip](https://github.com/sonic-net/sonic-buildimage/files/9550867/testresults.zip)	2022-10-06 17:51:31 -04:00
Dror Prital	44356fa8d7	[Mellanox] Add NVIDIA copyright header for NVIDIA added files (#12130 ) - Why I did it Add NVIDIA Copyright header for new "NVIDIA" files - How I did it Add the copyright header as remark at the head of the file	2022-10-02 11:34:24 +03:00
Volodymyr Samotiy	eea8ebd0a9	[Mellanox] Update MFT to v4.21.0-100 (#11758 ) - Why I did it To update MFT package to the latest version. - How I did it Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk. - How to verify it Build an image and deploy to the switch Check MFT version by dpkg -l \| grep mft Verify that all the SONiC services up and running Run regression testing using tests from sonic-mgmt Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-09-30 09:48:40 +03:00
Volodymyr Samotiy	92bd6dae28	[Mellanox] Update SAI to v2205.22.1.19 and SDK/FW to v4.5.3168/v2010.3170 (#12205 ) - Why I did it To include latest fixes and new functionality SAI fixes and new features fix #3205239, incorrect object type returned for SG child list Fix VRF-VNI map entries remove issue ECC health event and logging [Port Buffers] restore default queue and pg configuration when all user pools are deleted Fix EVPN type3 error on removal of uc/bc flood group Fix EVPN type2 MAC move from local to remote results in SAI failure Fix Disable learning on VXLAN tunnel Fix error on VXLAN v6 tunnel removal Fix port cannot apply schedule group when it is a lag member Fix BFD add more detailed message on BFD packet not related to any existing session gcc10 compilation fixes Disable learning on VXLAN tunnel Support BFD remote-disc exchange in negotiation stage Tunnel Loopback packet action attribute implementation (for Dual TOR) Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only) Support for CRC2 hash algorithm Bulk counter support for PGs, queues Support mirror sample rate attribute (SPC2+) [Functional] [QoS] \| Unable to remove SCHEDULE profile table even if there is no object referencing it Next hop group optimized bulk API Reduce verbosity of shared database already exists print Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+ use same size descriptor pool for rx/tx fix bfd - notify Sonic for admin-down event 2201 - empty list for supported fec for RJ45 ports Fix don't disable used tunnel underlay interfaces SDK fixes 100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning" which caused the switch firmware to read page 0x9f (which is unsupported in the cable) and to report this cable as having "bad eeprom". Added remote peer UDP port information in BFD packet event. After editing an ECMP, the resilient ECMP next-hop counter may not count correctly. Fixed potential memory leaks in some APIs related to LPM If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero). In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected. In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped. In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets. On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-through mode, a pipeline might get stuck. PCI calibration changes from a static to a dynamic mechanism. SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group. SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group. SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump. SLL configuration is missing in SDK dump. Spectrum-2 systems do not support 1GbE on supported 40GbE modules. When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly. When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU. When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will return an error even if the configuration was identical to that done before performing the ISSU. While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-09-30 09:40:12 +03:00
Junchao-Mellanox	1d69f0916e	[Mellanox] Provide dummy implementation for get_rx_los and get_tx_fault (#12231 ) - Why I did it get_rx_los and get_tx_fault are not supported via the existing interface, so a dummy implementation is needed for them. NOTE: in later releases we will get them back via a different interface. - How I did it Return False * lane_num for get_rx_los and get_tx_fault - How to verify it Added unit test	2022-09-30 09:38:05 +03:00
Stephen Sun	4d317aff94	[Mellanox] Fix typo in platform API (#12136 ) - Why I did it Fix a typo in chassis platform API which causes the following error >>> import sonic_platform as P >>> c = P.platform.Platform().get_chassis() >>> sl = c.get_all_sfps() >>> sl[0].get_lpmode() Sep 28 07:48:33 INFO LOG: Initializing SX log with STDOUT as output file. False >>> del c Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0> Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__ self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle) NameError: name 'sfp_module' is not defined - How I did it Use self while using the SDK handle - How to verify it Manual test Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-09-28 11:09:18 +03:00
Junchao-Mellanox	f890606d82	Revert "[Mellanox] Redirect ethtool stderr to subprocess for better error log (#12038 )" (#12183 ) This reverts commit `9750cb4`. There is a PR to handle 202205 branch revert: #12184 - Why I did it The PR to be reverted introduced many notice logs every 1 minute if an SFP is not plugged in: Cannot get module EEPROM information: Input/output error Before the "bad" PR, the message format was like this: INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error It was truncated by rsyslog because every message was the same. However, the "bad" PR introduces the SFP index to the message: NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error Rsyslog no longer truncates such logs, so many such messages flood the syslog. - How I did it Revert the PR - How to verify it Manual test	2022-09-28 10:15:26 +03:00
Dror Prital	54b146f56c	[Mellanox] Update SDK/FW to version 4.5.2320/2010.2320 (#11990 ) - Why I did it Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes: • Spectrum-3 \| PCI calibration changes from a static to a dynamic mechanism. • [VxLAN] TTL was set to 0 for non IP traffic (such as ARP) - How I did it Update pointer for the SDK/FW - How to verify it Run regression tests	2022-09-14 20:43:38 +03:00
Junchao-Mellanox	9750cb48c6	[Mellanox] Redirect ethtool stderr to subprocess for better error log (#12038 ) - Why I did it ethtool prints error logs when the EEPROM of an SFP is not available. It prints errors like this: INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument However, this log does not contain the relevant SFP index, which makes it hard for developers/QA to find the exact SFP. - How I did it Redirect ethtool stderr to subprocess and log it better - How to verify it Manual test	2022-09-14 20:41:43 +03:00
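
Note on commit 9680479661 (is_host): the message says the `docker --version` subprocess check was replaced by a test for the `/.dockerenv` marker file. A minimal sketch of that check follows. The `docker_env_file` parameter is an assumption added here so the function can be tried against an arbitrary path; the commit message shows only the hard-coded path.

```python
import os


def is_host(docker_env_file='/.dockerenv'):
    """Return True when the process appears to run on the host.

    Docker creates /.dockerenv inside containers, so its absence is
    taken to mean "not inside a container". The parameter is a
    hypothetical addition for testability, not part of the commit.
    """
    return not os.path.exists(docker_env_file)
```

Because this is a plain file-existence test, it cannot block the way the spawned `docker --version` process did on the simx platform.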

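
Note on commit e8789a2e11 (system EEPROM retry): the message describes waiting up to 10 seconds for hw-management to create the EEPROM soft link. One way this kind of bounded wait could look is sketched below. The function name, the polling interval, and the return convention are all assumptions made for illustration; only the 10-second limit comes from the commit message.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    os.path.exists() follows symlinks, so a soft link whose target has
    not been created yet still counts as "not ready".
    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would check the return value and report an error only after the full timeout has passed, instead of failing on the first missing read.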

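
Note on commits 648ca075c7 and 8c73e68468 (shell=False and the lost rstrip): the first commit moved subprocess calls away from `shell=True`, and the second restored the `rstrip('\n')` whose removal left a newline inside the `mount` argument list (`'/dev/sda2\n'`). The sketch below shows the two ideas together. Both helper names are hypothetical; the mount arguments are copied from the error message quoted in the commit.

```python
import subprocess
import sys


def read_command_output(argv):
    """Run `argv` (a list, so no shell is involved) and return its
    stdout with trailing newlines removed."""
    output = subprocess.check_output(argv, universal_newlines=True)
    return output.rstrip('\n')


def build_mount_command(fs_path, mountpoint='/mnt/onie-fs'):
    """Build the read-only ext4 mount command as an argument list."""
    return ['mount', '-n', '-r', '-t', 'ext4', fs_path, mountpoint]


if __name__ == '__main__':
    # Stand-in for a command that prints a device path followed by '\n'.
    fs_path = read_command_output([sys.executable, '-c', "print('/dev/sda2')"])
    print(build_mount_command(fs_path))
```

With an argument list, each element reaches the program exactly as given, which is why a stray `'\n'` at the end of the device path is no longer silently absorbed by a shell and must be stripped explicitly.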
545 Commits
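
Note on commit ab54549d53 (leftover hardware reboot cause): the message says the platform API checks the proc cmdline to see whether the last reboot was a warm or fast reboot. A rough sketch of such a check is given below. The `SONIC_BOOT_TYPE` key and the accepted values are assumptions for illustration; the commit message does not name the exact token it looks for.

```python
# ASSUMPTION: key name and values are illustrative, not taken from the commit.
BOOT_TYPE_KEY = 'SONIC_BOOT_TYPE'
WARM_OR_FAST_VALUES = ('warm', 'fast')


def last_boot_was_warm_or_fast(cmdline):
    """Return True if the kernel command line marks a warm or fast reboot."""
    for token in cmdline.split():
        key, sep, value = token.partition('=')
        if sep and key == BOOT_TYPE_KEY:
            return value in WARM_OR_FAST_VALUES
    return False


def read_cmdline(path='/proc/cmdline'):
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read()
```

When the check returns True, the platform API would skip the leftover hardware cause and report a non-hardware reboot cause, as the commit describes.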