Why I did it
Advance SAI Redis head pointer
How I did it
changes:
sonic-net/sonic-sairedis@cf679e7sonic-net/sonic-sairedis@8d6688e
[202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185 sonic-net/sonic-sairedis@66f2961
remove parameter --skip_error, which removed from [202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185
How to verify it
local image build
This PR is to backport #13100 to the 202205 branch since can not be cleanly cherry-picked.
Following code to judge whether a process is running inside a docker could get stuck on the simx platform
subprocess.Popen(["docker", "--version"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True)
When it gets stuck, the config-chassisdb service can not be successfully started, thus the system can not be booted up.
root@sonic:/# service config-chassisdb status
config-chassisdb.service - Config chassis_db
Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago
Main PID: 571 (config-chassisd)
Tasks: 14 (limit: 9501)
Memory: 132.4M
CGroup: /system.slice/config-chassisdb.service
├─571 /bin/bash /usr/bin/config-chassisdb
├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform
├─602 /bin/sh -c sudo decode-syseeprom -m
├─603 sudo decode-syseeprom -m
├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m
├─616 /bin/sh -c docker --version 2>/dev/null
└─617 docker --version
- How I did it
Use an alternative way to implement this function and issue can be avoided:
docker_env_file = '/.dockerenv'
return os.path.exists(docker_env_file) is False
- How to verify it
run regression on real hardware and simx platform.
Why I did it
In order to prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test failing on the log analyzer step.
The mentioned test is performing the sfputil reset EthernetX for every interface on the SONiC switch, this action will flap the SFP device status (INSTERTED -> REMOVED -> INSTERTED).
The SONiC XCVRD daemon will catch this SFP device status change (because it is monitoring the presence status of the cable).
To judge the cable presence status, currently, we are still leveraging to read the first bytes of the EEPROM, and the EEPROM could be not ready at some moment and the SONiC XCVRD daemon will print the error log to Syslog:
ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register
How I did it
Change logging severity from ERR to WARNING
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
Why I did it
To bring the following fixes:
Applied customer patch for limited port breakout support in rel_ocp_sai_7_1.
Revert "Merged PR 7224850: SAI7.1_DNX: Temp workaround for Nexthop Group Scale Issue(CS00012251649)".
backport SONIC-67662 to SAI7.1:JR2C ECMP partition for NHgroup members.
How I did it
Updated SAI code with the fixes above.
How to verify it
Run the SONiC and SAI test with the SAI pipeline.
Why I did it
Update Marvell armhf sai debian version. Includes supports for,
Autoneg
Everflow
How to verify it
Compile and load SONiC armhf bin
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Why I did it
smartctl tool is available only in PMON docker. Hence, the tool may be not accessible incase PMON docker goes down.
Using iSMART_64 tool to fetch the SSD firmware version and device model information.
How I did it
Replacing smartctl with iSMART_64.
1d53bf4 Skip platform NDK health check two times in watchdog.sh
d68297c Added code to shutdown the channel after the grpc call also fixed the show fp-status command
0769efe Impelemented the module API to return the correct eeprom info for fabric card.
171569c Remove explicit logger identifier for transceiver module operations; use inherited id
6c4d651 Corrected the log messages for firmware install
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
enable sai-ptf logger in sai_adapter to log all the sai api invcations
How I did it
add build parameter to enable the sai-ptf logger when build sai PRC
How to verify it
local build test
test the generated sai_adapter
test with pipeline
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Why I did it
Move armhf syncd docker compilation to bullseye.
How I did it
compile syncd docker for armhf platform using below commands,
NOJESSIE=1 NOSTRETCH=1 NOBUSTER=1 BLDENV=bullseye make configure PLATFORM=marvell-armhf PLATFORM_ARCH=armhf
NOJESSIE=1 NOSTRETCH=1 NOBUSTER=1 BLDENV=bullseye make target/docker-syncd-mrvl.gz
How to verify it
upgrade the syncd docker and verify ports are up.
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
cherry-pick #12761
Make syncd rpc docker which supports sai-ptf v2
Part of previous PR #11610
local bulild the target
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=broadcom NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-syncd-brcm-rpcv2.gz NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
add partial reboot cause support for linecards
add watchdog support for linecards
add power draw information for chassis
properly implement Chassis.get_port_or_cage_type
fix pcieutil on chassis with powered off cards
fix watchdog-control.service crash
misc fixes and cleanups
* Build docker-gbsyncd-broncos image
* Correct typo in LIBSAI_BRONCOS_URL_PREFIX
* Update docker-gbsyncd-broncos/Dockerfile.j2
* Enable debug shell support on docker-gbsyncd-broncos
* Include bcmsh in docker-gbsyncd-broncos
Why I did it
In docker-gbsyncd-broncos image, enable debug shell support for BRCM broncos PHY.
How I did it
How to verify it
Note: need enable attr SAI_SWITCH_ATTR_SWITCH_SHELL_ENABLE support in BCM PAI library
# bcmsh
Press Enter to show prompt.
Press Ctrl+C to exit.
NOTICE: Only one bcmsh or bcmcmd can connect to the shell at same time.
BRCM:> help
help
List of available commands
- h or help => Print command menu
- l => Print list of active ports on the PHY
- ps <port_id> <options> => Print port status
<options> => 1 -> Link status
=> 2 -> Link training failure status
=> 3 -> Link training RX status
=> 4 -> PRBS lock status
=> 5 -> PRBS lock loss status
- rd <port_id> <addr> <no of registers to read> => Read register contents
- wr <port_id> <addr> <data> => Write register data
- rrd <lanemap> <if_side> <addr> <no of registers to read> => Raw read register contents using lanemap and if_side (line = 0, system = 1)
- rwr <lanemap> <if_side> <addr> <data> => Raw write register data using lanemap and if_side (line = 0, system = 1)
- fw or firmware => Print firmware version of the PHY
- pd or port_dump <port_id> <flags> => Dump port status
- eyescan <port_id> => Display eye scan
- fec_status <port_id> => Get fec status of the port
- polarity <lanemap> <if_side> <TX polarity> <RX Polarity> => Set TX and RX polarity
<lanemap> => 0xF, 0xFF, or 0xFFFF based on number of lanes
<if_side > => Line = 0, System = 1
<TX/RX Polarity> =>_TX/RX Polarity bitmap of all lanes
Each bit represents a lane number.
E.g. Lane 0's polarity value (0 or 1) is populated in Bit 0.
- polarity <lanemap> <if_side> => Print TX and RX polarity
- lb <port_id> <lb_value> => Enable loopback on the port
lb_value = 0 -> Disable, 1 -> PHY, 2 -> MAC
- lb <port_id> => Print loopback configuration of the port
- prbs <port_id> <options> <val> => Set/Get PRBS configuration
<options> => 1 -> Get PRBS state and polynomial
2 -> Set PRBS Polynomial, <val> - PRBS Polynomial
Please refer to phy/chip documentation for valid values
3 -> Enable PRBS
<val> => 0 Disable PRBS
1 Enable both PRBS Transmitter and Receiver
2 Enable PRBS Receiver
3 Enable PRBS Transmitter
exit or q => Exit the diagnostic shell
- Why I did it
Update SN2201 dynamic minimum fan speed table according to data provided by the thermal team.
- How I did it
Update the thermal table in device_data.py
- How to verify it
Run platform related regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes:
New functionality:
1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system
Fix the following issues:
1. On very rare occasion (~1/100K), during I2C transaction with MMS1V50-WM and MMS1V90-WR modules on SN4700 system, the module may send unexpected stop which violate the I2C specification, possibly affecting the link up flow
2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed
3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received
4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted
5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU
6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized
7. SLL configuration is missing in SDK dump
8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero)
9. PCI calibration changes from a static to a dynamic mechanism
10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event
11. SDK returned error when FEC mode is set on twisted pair, when FEC was set to None
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
fix linecard provisioning issue (500 error)
fix some value types for get_system_eeprom_info API
refactor code to leverage pci topology (enabling dynamic Pcie plugin)
refactor asic declaration logic to new style
misc fixes
- Why I did it
get_rx_los and get_tx_fault is not supported via the exisitng interface used, need provide dummy implementation for them.
NOTE: in later releases we will get them back via different interface.
- How I did it
Return False * lane_num for get_rx_los and get_tx_fault
- How to verify it
Added unit test
- Why I did it
To include latest fixes and new functionality
SAI fixes and new features
fix#3205239, incorrect object type returned for SG child list
Fix VRF-VNI map entries remove issue
ECC health event and logging
[Port Buffers] restore default queue and pg configuration when all user pools are deleted
Fix EVPN type3 error on removal of uc/bc flood group
Fix EVPN type2 MAC move from local to remote results in SAI failure
Fix Disable learning on VXLAN tunnel
Fix error on VXLAN v6 tunnel removal
Fix port cannot apply schedule group when it is a lag member
Fix BFD add more detailed message on BFD packet not related to any existing session
gcc10 compilation fixes
Disable learning on VXLAN tunnel
Support BFD remote-disc exchange in negotiation stage
Tunnel Loopback packet action attribute implementation (for Dual TOR)
Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only)
Support for CRC2 hash algorithm
Bulk counter support for PGs, queues
Support mirror sample rate attribute (SPC2+)
[Functional] [QoS] | Unable to remove SCHEDULE profile table even if there is no object referencing it
Next hop group optimized bulk API
Reduce verbosity of shared database already exists print
Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+
use same size descriptor pool for rx/tx
fix bfd - notify Sonic for admin-down event
2201 - empty list for supported fec for RJ45 ports
Fix don't disable used tunnel underlay interfaces
SDK fixes
100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning' which caused the switch firmware to read page 0x9f (which unsupported in the cable) and to report this cable as having "bad eeprom".
Added remote peer UDP port information in BFD packet event.
After editing an ECMP, the resilient ECMP next-hop counter may not count correctly.
Fixed potential memory leaks in some APIs related to LPM
If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero).
In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides
In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected.
In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped.
In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt.
On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
PCI calibration changes from a static to a dynamic mechanism.
SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group.
SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group.
SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump.
SLL configuration is missing in SDK dump.
Spectrum-2 systems, do no support 1GbE on supported 40GbE modules.
When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly.
When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU.
When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed.
When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received.
How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "soni-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
Fix a typo in chassis platform API which causes the following error
>>> import sonic_platform as P
>>> c = P.platform.Platform().get_chassis()
>>> sl = c.get_all_sfps()
>>> sl[0].get_lpmode()
Sep 28 07:48:33 INFO LOG: Initializing SX log with STDOUT as output file.
False
>>> del c
Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0>
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__
self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle)
NameError: name 'sfp_module' is not defined
- How I did it
Use self while using the SDK handle
- How to verify it
Manual test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
This reverts commit 9750cb4.
There is a PR to handle master branch revert: #12183
- Why I did it
The PR to be reverted introduced many notice logs every 1 minute if SFP is not plugged:
Cannot get module EEPROM information: Input/output error
Before the "bad" PR, the message format is like this:
INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error
It was truncated by rsyslog because every message is the same. However, the "bad" PR introduces SFP index to the message:
NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error
Rsyslog no longer truncate such log and many such messages are flooded to syslog.
- How I did it
Revert the PR
- How to verify it
Manual test
- Why I did it
ethtool print error logs when EEPROM of a SFP is not available. It prints error like this:
INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error
INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument
However, this log does not contain the relevant SFP index which is hard for developer/qa to find the exactly SFP.
- How I did it
Redirect ethtool stderr to subprocess and log it better
- How to verify it
Manual test
- Why I did it
Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes:
• Spectrum-3 | PCI calibration changes from a static to a dynamic mechanism.
• [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Why I did it
Enable syncd container autorestart for Innovium platforms
How I did it
Add critical_process file and sypervisord.conf entry
How to verify it
Tested with autorestart/test_container_autorestart.py::test_containers_autorestart
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-xxx-dut-sonic-xxx-dut|syncd]
Signed-off-by: rck-innovium rck@innovium.com
* [SAIServer] support saiserver v2 in bullseye
Support build saiserverv2 in bullseye
Add dependencies for building saiserverv2
Upgrade libboost-atomic1.71 to libboost-atomic1.74
Test done:
Local builded with NOSTRETCH=y NOJESSIE=y NOBUSTER=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
* update libboost-atomic from 1.71 to 1.74 for bullseye
Why I did it
The directory /var/warmboot as top directory for warmboot feature is also needed in docker gbsyncd. Some vendor SAI might save data under it. Without it, the SAI init/creation API failure has happened on PikeZ platform.
How I did it
Mount host directory /host/warmboot as /var/warmboot in docker gbsyncd, which is same as what it has done on docker syncd.
Why I did it
It solves a swss orchagent crash issue on PikeZ device, due to link-training setting of external PHY port.
How I did it
Catch up the fix for CS00012257483 in version 7.1.7.2.
- Why I did it
To update MFT package to the latest version.
- How I did it
Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk.
- How to verify it
Run regression testing using tests from sonic-mgmt
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
Cr space timeout on Hold and Release GW - at warm boot
SPC-1 Port in stuck PHY_UP after peer side rebooted
memory leak in sx_api_router_ecmp_update_set
How I did it
update the make file with the new version number
update submodule Switch-SDK-drivers pointer
How to verify it
run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Update HW-MGMT to V.7.0020.3006
1. Support new system SN2201
2. Add COMEX BRDWL respin support
- How I did it
Update the version number of the makefile
Advance the hw-mgmt submodule pointer
- How to verify it
Run full regression on Nvidia platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add more log while doing sysfs reading to increase the debug capability
- How I did it
Log the relevant file path and error number while sysfs reading return None
- How to verify it
Manual test
To reduce rc.local script execution time. Porting changes from [DellEMC] S6100 Platform Service optimization #10989
Changes:
Moving platform-modules-s6100.service and s6100-lpc-monitor.service asynchronous to rc.local script.
- Why I did it
Fix bug: pmon report error on start up because some SKUs do not have hwsku.json
- How I did it
If hwsku.json, do not extract RJ45 port information
- How to verify it
Manual test.
Unit test.
- Why I did it
Support get_port_or_cage_type for RJ45 ports
- How I did it
Implement the new platform API get_port_or_cage_type
Fix the issue: unable to import SFP when chassis object is destructed
- How to verify it
Manually test and regression test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support new platform SN2201 and RJ45 port
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* remove unused import and redundant function
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* fix error introduced by rebase
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* Revert the special handling of RJ45 ports (#56)
* Revert the special handling of RJ45 ports
sfp.py
sfp_event.py
chassis.py
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove deadcode
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support CPLD update for SN2201
A new class is introduced, deriving from ComponentCPLD and overloading _install_firmware
Change _install_firmware from private (starting with __) to protected, making it overloadable
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Initialize component BIOS/CPLD
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove swb_amb which doesn't on DVT board any more
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove the unexisted sensor - switch board ambient - from platform.json
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Do not report error on receiving unknown status on RJ45 ports
Translate it to disconnect for RJ45 ports
Report error for xSFP ports
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Add reinit for RJ45 to avoid exception
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
Give more room for the kernel image in memory
Change-Id: I015856d173d50d94e30d8c555590efb70eb712ae
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
- Why I did it
Advance to new SAI version for bugs fixes as well as new features/enhacements:
New:
1. ARM64 support
2. FG ECMP performance optimization
3. Support setting empty list for port ingress/egress buffer profile list
4. Add service port for SN5600
5. Add CR8/SR8/LR8/KR8 interface type
6. Disable mlxtrace during debug dump
Fixes:
1. Fix SAI_ACL_ENTRY_ATTR_FIELD_TC
2. Fix Packets loop back if no member in portchannel
3. Fix optimize descriptors apply time (and fast boot time)
4. Add flush fdb entries for vxlan tunnel bridge port
5. Don't disable used tunnel underlay interfaces
- How I did it
Advanced SAI submodule
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* [Mellanox]Check dmi file permission before access (#11309)
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Why I did it
During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog
Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied')
How I did it
Check the file permission before accessing it.
How to verify it
Added UT to verify. Manually verified if the error log is not thrown.
- Why I did it
This is for the eventual support of multiple architectures for the mellanox platform.
- How I did it
Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for.
This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image.
- How to verify it
Build SONiC for mellanox and verify it compiles successfully.
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Why I did it
During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog
Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied')
How I did it
Check the file permission before accessing it.
How to verify it
Added UT to verify. Manually verified if the error log is not thrown.
- Why I did it
To include latest fixes:
1. Warmboot | When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
2. Link Up | When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
3. Shared buffer | While moving from lossless to lossy while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized.
- How I did it
Updated SDK submodule along with the relevant Makefiles
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker
- How I did it
Added the "print_suppressions=0" to ASAN configs.
- How to verify it
add a suppression to some ASAN-enabled component (the suppression should catch some leak)
build with ENABLE_ASAN=y
run a test and see that the ASAN report is empty instead of having the suppression summary
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
To ensure that ASAN logs are always generated. Currently, the way to get the logs is to map the "/var/log/asan" outside of a container, which doesn't work for DVS test run with "--imgname" option.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This fixes the build for armhf to be able to use '/device///installer.conf' files. Specifically, armhf needs support to be able to change the size of /var/log/ directory. It is hardcoded to 512 bytes on all armhf platforms currently. This change will allow any armhf platform to be able to use an installer.conf file to customize the installed image.
- Why I did it
"import sonic_platform" takes about 600ms ~ 1000ms, it is kind of slow. After this optimization, the time is about 100ms. The benefit is that those CLIs which does not need the slow import sentence would be faster than before.
- How I did it
Find slow import and call them when need.
- How to verify it
Measure the import time.
Currently, the build with ASAN_ENABLE=y reuses the packages built with
ASAN_ENABLE=n (and vice versa). To address this issue, ASAN_ENABLE is added to DEP_FLAGS for asan-enabled packages (docker-syncd-mlnx, syncd, docker-orchagent, swss).
- Why I did it
To make dpkg cache use/rebuild the packages for ASAN_ENABLE=y/n.
- How I did it
Added ASAN_ENABLE to the DEP_FLAGS for asan-enabled packages.
- How to verify it
Built with ASAN_ENABLE=y/n and checked the .flags .log files.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This is to improve the readability of ASAN reports. The debug package adds function names and source code references to the backtrace (currently, there are only binary addresses of functions)
Another way to address this issue is to build the image with "INSTALL_DEBUG_TOOLS=y". The downside of this approach is that the image size and compilation time are unnecessarily big. Also, the idea is to make the "ENABLE_ASAN" self-sufficient, which would not be the case for this approach.
- Why I did it
To improve the readability of asan logs.
- How I did it
Added SYNCD_DBG and SWSS_DBG to corresponding docker images for ASAN_ENABLE=y build
- How to verify it
Add artificial memory leak
Build with ASAN_ENABLE=y
Test the image and check the ASAN report
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
Update MFT to newer version
- How I did it
Update MFT_VERSION in platform/mellanox/mft.mk
- How to verify it
Check version via dpkg -l | grep mft
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Why I did it
add celestica belgite platform
How I did it
add belgite platform in celestica
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Co-authored-by: anjian <anjian@celestica.com>
Co-authored-by: sandycelestica <sandyli@celestica.com>
This update has following changes
Refactor pci topology logic for chassis (fixes some chassis commands and chassisd on linecard)
Introduce new cooling algorithm
Fix linecard poweroff logic when supervisor is going down
Fix linecard status led leading to system-health crashing
Misc fixes
- Why I did it
Script fails when there is an exception while reading
- How I did it
Add more logs and checks. Fix wrong variable naming and messages.
- How to verify it
Provoke exception while read_eeprom() and check that it is handled properly
Why I did it
To include ONIE version in show platform firmware status command output in DellEMC S6100 and Z9332f platforms.
How I did it
Include ‘ONIE’ in the list of components provided by platform APIs in DellEMC S6100 and Z9332f.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins in DellEMC S6100.
Fixes#9279
- Why I did it
Part of larger effort to move all SONiC systems to bullseye
- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3
- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed