Why I did it
fix the dx010 system eeprom unavailable issue
How I did it
enable the i2c slave 30ms timeout mechanism
How to verify it
i2cstress test in DX010 iSMT controller bus
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Why I did it
Update Makefile, so it does the following:
For a given platform, verify if platform/checkout/.ini exists and hence run the platform/checkout/template.j2. This allows platform code to be checked out during the 'make configure' stage.
How I did it
git clone git@github.com:Azure/sonic-buildimage.git
mkdir platform/cisco-8000
make init
make configure PLATFORM=cisco-8000
make all
Why I did it
serial-getty service exited in Dell S6100 device randomly.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
Why I did it
platform test suite failed for few API's in DellEMC Z9332f platform.
How I did it
Modified the API's to return the expected values in the script.
How to verify it
Run platform test suite after making the changes.
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
Move SAI libraries for Broadcom for XGS and DNX families to 5.0.0.6
Included fixes
```
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012192505 | [4.3] Re-encap IPinIP decap packets
CS00012192502 | [3.7.5.2] Start LED shell script execution on all DELL based platforms causing all ports flapping on SAI 3.7.5.2
CS00012191363 | [4.3] Support of memscan thread to detect TCAM parity error
CS00012190932 | [4.3] SAI_PORT_PFC_X_RX_PKTS incremented incorrectly even when no PFC frames are received on that priority
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00011382163 | [4.4] Support warm-boot from 3.5 to 4.3
CS00011318937 | [4.3] MACSec SAI Support for Jericho2c+
CS00011318926 | [4.3] Provide SAI support for Jericho2c+
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012195261 | [4.3][5.0][TD3]VLAN tagged IP packet received on untagged interface being routed instead of dropped
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012196056 | [4.3.3.8][WARMBOOT] syncd[2584]: segfault at 5616ad6c3d80 ip 00007f61e0c6bc65 sp 00007fff0c5a7a90 error 4 in libsai.so.1.0[7f61e0a95000+3cd8000]
CS00012195262 | [4.3][5.0][TD3] Malformed IP packet(missing IP header) received on a VLAN Interface is flooded to other LVAN members instead of being dropped
CS00012195956 | [4.3.3.8] [TD3]Syncd Crash at brcm_sai_tnl_mp_create_tunnel()
PR 4346163: Add support for AN/LT
```
Why I did it
BIOS upgrade on rare cases cannot guarantee bus value remain the same on every BIOS release. Ignoring this field in order for pcied not to fail but still verify device id in a different way. The solution is future proof and will not require changes in code when new BIOS version is available
How I did it
Since bus is not a fixed value (it is determined by the bios version) we are ignoring this field, and instead checking if there is a device that match on all other fields that and in addition has a matching device id.
How to verify it
Verify no errors or failures in pcied on different BIOS version with the same code base.
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>
Avoid initializing sfp/thermal/components/fan/psu/leds on simx and create vpd_info file on hw_management when we use mellanox simulator platform
- Why I did it
this is a fix for issue in mellanox simulator platforms. the syseepromd failed on the pmon docker. also "decode-syseeprom" failed also
- How I did it
before initializing thermal/components/fan/psu/leds --> check if we are running on simx
creating the vpd_info on the hw_management folder.
- How to verify it
check if syseepromd process was loaded properly on the pmon docker.
decode-syseeprom is working well without errors/warnings
- Why I did it
to prevent python exception error when executing warm-reboot command on mellanox simulator platform
- How I did it
return None on the watchdog python script on cases that watchdog file is not exist
- How to verify it
warm-reboot is running well without the python error. error message will appear on log on these cases.
in order to avoid this error message we can simulate the watchdog on mellanox simulator platform
Why I did it
Update XGS and DNX SAI to 5.0.0.4 and additional flags needed in saibcm-modules
The following CSP's are merged in 5.0.0.4
CS00012182148 [4.3] Rate Limit Parity error message to syncd/sonic.
CS00012178692 [4.3] ACL drops counted as interface drops
CS00012183901 [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012070713 [SAI 4.3 , DNX, 8690] Everflow ACL creation fails - brcm_sai_dnx_create_acl_table API fails, with unknown attribute error.
CS00012023263 [4.4] TD3/TH2 : Support 4 lossless queues(2 SW PFCWD and 2 HW PFCWD)
CS00012019578 [4.4] Pre FEC bit-error rate (BER) - DNX and XGS (TD and TH 50/100G)
How I did it
Changes the various make files to include the new SAI release + update the opennsl-modules.
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot
Signed-off-by: Dror Prital <drorp@nvidia.com>
- Why I did it
* For SAI - Advance to adopt the following fixes:
1. Better handle not implement object type for resource availability
2. Fix ext dump when saidump is triggered from 2nd process (saidump utility) other than main adapter host (syncd in SONiC)
* For SDK\FW:
- Changes and new features:
1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM).
2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE
3. Improved performance of "per-port-buffer" counters
4. Added support for Kernel 5.10
- Bugs fixes:
On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds
Signed-off-by: Dror Prital <drorp@nvidia.com>
Why I did it
We hit an issue recently in the chassis bringup where the linux bde attach failed with the following ioctl error.
[ 9058.585960] linux-user-bde (897363): Error: Invalid ioctl (00004c1d)
[ 9105.668237] linux-user-bde (901002): Error: Invalid ioctl (00004c1d)
Debugged with Broadcom team, who suggested to use this flag BCM_INSTANCE_SUPPORT to support multi-instance scenarios ( platforms with more than one asic where there are separate sai/syncd docker instances running controller each asic instance).
This flag was introduced since SDK-6.5.21 and need to be present in SAI and SAI GPL kernel module makefile.
How I did it
Add the flag in this flag BCM_INSTANCE_SUPPORT in gpl modules
Why I did it
To determine the revision of the pcie.yaml to be used based on BIOS version in DellEMC S6100 platform.
Depends on: Azure/sonic-platform-common#195
How I did it
Added two revisions of pcie.yaml pcie_1.yaml and pcie_2.yaml
Included a platform-specific Pcie class to provide the revision of the pcie.yaml to be used by pcieutil/pcied.
How to verify it
Execute pcieutil check (Azure/sonic-utilities#1672) command and verify the list of PCIe devices displayed.
Logs: UT_logs.txt
#### Why I did it
Support API 2.0 for S5248F platform
#### How I did it
Making changes to S5248F platform specific directory
Co-authored-by: Arun LK <Arun_L_K@dell.com>
#### Why I did it
Currently, SONiC use a single value to represent SFP error, however, multiple SFP errors could exist at the same time. This PR is aimed to support it
#### How I did it
Return bitmap instead of single value when a SFP event occurs
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Split and bulk counter bug fixes:
- Init port auto neg to default on static (SAI XML) port split for 2nd+ port
- Fix port stats SAI_PORT_STAT_WRED_DROPPED_PACKETS, SAI_PORT_STAT_ECN_MARKED_PACKETS, SAI_PORT_STAT_ETHER_TX_OVERSIZE_PKTS
- Hide error message when reading not implemented port stat
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
The debian install files are required for installing sonic_platform packages
#### How I did it
Add install files to under debian folder
Co-authored-by: robert.hong <robert.hong@qct.io>
- Why I did it
The methods get_model, get_serial, and get_revision have been implemented by reading relevant information from VPD and then recording the information into relevant fields.
However, there is no VPD data on platforms with fixed PSUs and relevant fields haven't been initialized, which causes the methods to throw exceptions. which in turn prevents psud from inserting fields into PSU table.
Eventually, this causes show platform psustatus doesn't output correct info.
- How I did it
Initialize those fields as N/A on systems with fixed PSUs.
- How to verify it
Manually test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
vs has components from swss, bgp, teamd and nat. This table is needed by this change https://github.com/Azure/sonic-utilities/pull/1554.
Because Azure/sonic-utilities#1554 requires "config warm_restart enable FEATURE" and the FEATURE has to be in feature table.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).
How I did it
Remove quagga-related code.
#### Why I did it
- To build flashrom properly with dependency tracking.
#### How I did it
- Moved flashrom code from platform/broadcom/sonic-platform-modules-dell/tools directory to src/flashrom directory.
- At the end, flashrom_0.9.7_amd64.deb package is build which will be installed in the devices.
- Currently flashrom builds only for Dell S6100 platforms.
Introduce new sonic-buildimage images for Broadcom DNX ASIC family.
sonic-broadcom-dnx.bin
sonic-aboot-broadcom-dnx.swi
How I did it
NO CHANGE to existing make commands
make init; make configure PLATFORM=broadcom; make target/sonic-aboot-broadcom.swi; make target/sonic-broadcom.bin
The difference now is that it will result in new broadcom images for DNX asic family as well.
sonic-broadcom.bin, sonic-broadcom-dnx.bin
sonic-aboot-broadcom.swi, sonic-aboot-broadcom-dnx.swi
Note: This PR also adds support for Broadcom SAI 5.0 (based on 1.8 SAI ) for DNX based platform + changes in platform x86_64-arista_7280cr3_32p4 bcm config files and platform_env.conf files
#### Why I did it
Failures observed when running the open community platform test suite (sonic-mgmt)
#### How I did it
Call PSUBase class initializer from derived class
Add device and platform code for ix7-bwde, ix8a-bwde.
Support platform API 2.0 for all quanta platforms except for ix1b
Co-authored-by: robert.hong <robert.hong@qct.io>
- Why I did it
Remove EEPROM cache file and use DB instead
- How I did it
Read EEPROM data from DB if possible
If data is not ready in DB, read from hardware using a visitor pattern
- How to verify it
Manual test and regression
Why I did it
The Mellanox platform is required to support the fwutil auto-update feature defined here
This is to allow switches, when performing SONiC upgrades to choose whether to perform firmware upgrades that may interrupt the data plane through a cold boot.
How I did it
Two methods were added to the component implementations for mellanox.
In the base Component class we add a default function that chooses to skip the installation of any firmware unless the cold boot option is provided. This is because the Mellanox platform, by default, does not support installing firmware on ONIE, the CPLD, or the BIOS "on-the-fly".
In the ComponentSSD class we add a function that behaves similarly but uses the Mellanox specific SSD firmware upgrade tool to check if the current SSD supports being upgraded on the fly in order to decide whether to skip or perform the installation.
How to verify it
Unit tests are included with this PR. These test will run on build of target sonic-mellanox.bin
You may also perform fwutil auto-update ... commands after Azure/sonic-utilities#1242 is merged in.
- Why I did it
Enhance the Python3 support for platform API. Originally, some platform APIs call SDK API which didn't support Python 3. Now the Python 3 APIs have been supported in SDK 4.4.3XXX, Python3 is completely supported by platform API
- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly
- How to verify it
Manually test and run regression platform test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Adjust the Makefile for SDK/python-SDK-API to support both python2 and python3
- How to verify it
Build the image and check whether python2 and python3 are both supported by SDK API.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Added support BRCM SAI 5.0.0.1.
Major changes here:
CS00012019568 Link Training (all 100G ASICs - TH families and TD3)
CS00012184310 [attribute_capability| for port SAI_PORT_ATTR_TPID returns CREATE_IMP=false|SET_IMP=true|GET_IMP=true
CS00012182145 [IPinIP][Tunnel Delete] If IPinIP tunnel delete is performed observed following SYNCd error: ERR syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_remove_tunnel_term_table_entry:4026 _brcm_sai_mptnl_sip_tnl_lookup failed with error -7.
CS00012182148 Rate Limit Parity error message to syncd/sonic.
CS00012178692 ACL drops counted as interface drops
CS00012183901 [WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012023263 TD3/TH2 : Support 4 lossless queues(2 SW PFCWD and 2 HW PFCWD)
CS00012019578 Pre FEC bit-error rate (BER) - DNX and XGS (TD and TH 50/100G)
To fix determine-reboot-cause service which was failing due to non-implemented thrown from get_reboot_case, if the reboot was done with `sudo reboot` (cold reboot)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Why I did it
* For SAI - Upgrade to Version 1.19.0
- Add support for VxLAN encap TTL uniform model on SPC2/3
- Add ACL entry actions set VRF, set do no learn, add VLAN ID, add VLAN priority
- Add ACL field has VLAN tag
- Bulk counters (improve port statistics performance)
- Create async dump extra as part of debug generate dump
- Create irisc dump on severe health event
- Support 0 port systems (modify get switch mac to work accordingly)
- Set interface vlan up state for ping tool in SONiC
- Support attributes SAI_PORT_ATTR_QOS_SCHEDULER_PROFILE_ID, SAI_PORT_ATTR_QOS_INGRESS_BUFFER_PROFILE_LIST,
SAI_PORT_ATTR_QOS_EGRESS_BUFFER_PROFILE_LIST, SAI_PORT_ATTR_POLICER_ID as part of port create Git stats
* For SDK\FW - Upgrade to Version SDK 4.4.3106, FW 2008_3110
Added Features:
- Increased ACL table
- Enhanced PSAMPLE support
- Added support for Finisar SR4 module in SN3700 systems
- Added support for Python 3.0 in examples.
Fix bugs:
- On LR4 transceivers 00YD278, the firmware incorrectly identified the transceiver
- Reduce memory consumption for virtual LAG
- Fixed PSAMPLE listeners cleanup on SDK drivers unloading.
- On Spectrum-2 and Spectrum-3 systems, slow reaction time to Rx pause packets may lead to buffer overflow on servers.
- BER may be experienced when using 5m DAC cables between SN4700 and SN2700 in 100GbE speed.
- On very rare occasion, when connecting DR4 PAM4 transceiver to 100GbE DR1 NRZ, low BER may be experienced.
- Unexpected packet drops on the port ingress buffer may be experienced when working in 400GbE mode.
Note: When performing ISSU from an older version, this fix won't be applied. For fix to apply, a non-ISSU reset is required.
- Fix SN3800 specific warm boot scenario: Disable interface, Warm Boot, Enable Interface --> link will remain down.
Signed-off-by: Dror Prital <drorp@nvidia.com>
This is due to the fact that we use SONIC_OVERRIDE_BUILD_VARS internally
in our build jobs and this is not accounted in caching framework.
So we add MLNX_SDK_DEB_VERSION to force rebuild if we changed it via
SONIC_OVERRIDE_BUILD_VARS.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).
How I did it
Remove quagga-related code.
#### Why I did it
- After [sonic-linux-kernel#177](https://github.com/Azure/sonic-linux-kernel/pull/177) changes, the I2C mux channels of Baseboard and Switchboard CPLDs are moved from i2c-4 and i2c-5 to i2c-36 and i2c-37 respectively.
- This caused QSFP driver initialization of i2c-36 to i2c-41 to fail causing the ports from Ethernet208 to Ethernet248 fail.
#### How I did it
- The fix to this problem is to change the order of QSFP driver initialization to I2C mux channels.
- Instead of the order i2c-10 to i2c-41, the order i2c-4 to i2c-35 is being utilized.
- Also, need to change the i2c-mux-channel number for Baseboard CPLD and switchboard CPLD in scripts to access them.
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.
How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.
How to verify it
I verified this on the device str-7260cx3-acs-1.