Signed-off-by: Stephen Sun stephens@nvidia.com
Why I did it
Support zero buffer profiles
Add buffer profiles and pool definition for zero buffer profiles
Support applying zero profiles on INACTIVE PORTS
Enable dynamic buffer manager to load zero pools and profiles from a JSON file
Dependency: It depends on Azure/sonic-swss#1910 and submodule advancing PR once the former merged.
How I did it
Add buffer profiles and pool definition for zero buffer profiles
If the buffer model is static:
Apply normal buffer profiles to admin-up ports
Apply zero buffer profiles to admin-down ports
If the buffer model is dynamic:
Apply normal buffer profiles to all ports
buffer manager will take care when a port is shut down
Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists
Originally, all the macros to generate the above buffer objects took active ports only as an argument
Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added
To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments
The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called
Only vendors who support zero profiles need to change their buffer templates
Enable buffer manager to load zero pools and profiles from a JSON file:
The JSON file is provided on a per-platform basis
It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files:
One in Mellanox-SN2700-D48C8 for single ingress pool mode
The other in ACS-MSN2700 for double ingress pool mode
Those files of all other SKUs will be symbol link to the above files
Update sonic-cfggen test accordingly:
Adjust example output file of JSON template for unit test
Add unit test in for Mellanox's new buffer templates.
How to verify it
Regression test.
Unit test in sonic-cfggen
Run regression test and manually test.
* Add macsec-xpn-support iproute2 in syncd
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Polish code
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Remove useless files
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Add self-compiled iproute2 to docker sonic vs
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Enhance apt install for iproute2 dependencies
Signed-off-by: Ze Gan <ganze718@gmail.com>
Sonic master has moved to use SAI v1.9.1. For consistency, use the latest credo sai v0.7.2 package, built with
SAI header v1.9.1 too.
How I did it
Update credo sai url for v0.7.2
Add a debug tool crshell
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.
- How I did it
When PSU is powered of, don't treat it as absent.
- How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
- add `set_write_max` and `get_write_max` apis to Sfp API
- add `get_revision` platform API
- implement `is_midplane_reacheable` API
- add software fallback for setting Xcvr low power mode
- add `get_error_description` API
- add support for Woodleaf SKU
- various fixes/refactors/cleanups
upgrade centec arm64 sai to v1.9.1 and fix syncd compile error:
```
2021-11-21T19:58:09.0294367Z /usr/bin/ld: libSyncd.a(libSyncd_a-VendorSai.o): in function `syncd::VendorSai::queryStatsCapability(unsigned long, _sai_object_type_t, _sai_stat_capability_list_t*)':
2021-11-21T19:58:09.0297367Z ./syncd/VendorSai.cpp:439: undefined reference to `sai_query_stats_capability'
2021-11-21T19:58:09.0298900Z collect2: error: ld returned 1 exit status
```
Co-authored-by: shil <shil@centecnetworks.com>
- Why I did it
Support PSU voltage high/low thresholds and power max threshold
1. Add thresholds support for voltage and power.
2. As thresholds are not supported on all platforms, we need to check the capability first and fetch thresholds only if it is supported.
- How I did it
- How to verify it
Run regression test and manual test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
Signed-off-by: pettershao-ragilenetworks pettershao@ragilenetworks.com
What I did it
Add new platform x86_64-ragile_ra-b6510-32c-r0 (Trident 3)
ASIC Vendor: Broadcom
Switch ASIC: Trident 3
Port Config: 32x100G
Add new platform x86_64-ragile_ra-b6920-4s-r0 (Tomahawk 3)
ASIC Vendor: Broadcom
Switch ASIC: Tomahawk 3
Port Config: 128x100G
-How I did it
Provide device and platform related files.
-How to verify it
show platform fan
show platform ssdhealth
show platform psustatus
show platform summary
show platform syseeprom
show platform temperature
show interface status
#### Why I did it
Mellanox builds were failing intermittently due to the `issue_version` file and MFT package not building correctly in the Azure pipeline environment (both of these packages were patched to build correctly with bullseye running on the host and buster running on the dockers)
#### How I did it
Fixed two problems:
1. BLDENV is not passed to the Makefiles so the references to this were replaced with correct logic
2. `issue_version` was not defined as a target for bullseye and as such was not cached. Altered the build such that it is defined as a target for bullseye (in the case of buster it builds the file, in the case of bullseye it copies from buster)
The previous PR fixing this was reverted as it is no longer necessary for a passing build and was not a long-term fix. https://github.com/Azure/sonic-buildimage/pull/9235
#### How to verify it
Build on AZP and verify success.
Remove the "4.19..." specific code to add "-unsigned" suffix and just do so for any linux version.
For the syseeprom API part, have the Arista syseeprom class inherit from a class that can populate db.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
To include capabilities fields in platform.json of DellEMC S6000, S6100, Z9332f platforms.
How I did it
Add the capabilities fields in each platform's respective platform.json.
How to verify it
Ran sonic-mgmt platform api test cases that use capabilities fields and verified that the results are as expected.
This commit fixes/avoids the following errors encountered during the
marvell-armhf build for bullseye
- Fix Marvell prestera DMA driver build failure due to kallsyms_lookup_name()
no longer being exported by the updated bullseye kernel. This is a temporary
fix that will be replaced by a future version of the DMA driver.
- Update qemu-user-static version to align with the new glibc version included
in bullseye
- Skip systemd-sonic-generator unit tests to avoid test failures. Root cause is
still TBD
#### Why I did it
Fix the following build errors observed when building marvell-armhf for bullseye
1. Marvell Prestera DMA driver uses kernel API no longer exported
ERROR: modpost: "kallsyms_lookup_name" [/sonic/platform/marvell-armhf/prestera/mrvl-prestera/cpssEnabler/linuxNoKernelModule/drivers//mvDmaDrv.ko] undefined!
2. Old qemu-user-static version does not support semop() leading to following build failure
semop(1): encountered an error: Function not implemented
3. systemd-sonic-generator unit test failure
ssg-test.cc:217: Failure
Expected equality of these values:
find_string_in_file(str_t, target, num_asics)
Which is: false
expected_result
Which is: true
Error validating Before=single_inst.service in test.service
[ FAILED ] SsgMainTest.ssg_main_40_npu (20 ms)
[----------] 4 tests from SsgMainTest (36 ms total)
[----------] Global test environment tear-down
[==========] 10 tests from 3 test suites ran. (54 ms total)
[ PASSED ] 7 tests.
[ FAILED ] 3 tests, listed below:
[ FAILED ] SsgMainTest.ssg_main_single_npu
[ FAILED ] SsgMainTest.ssg_main_10_npu
[ FAILED ] SsgMainTest.ssg_main_40_npu
3 FAILED TESTS
BRCM SAI missed implementing the SAI API "sai_query_stats_capability()" which is causing build issue.
The build issue is impacting PR(s) that need to use this API.
This PR is to stubbed BRCM SAI to add this SAI API and return not implemented so that it will fix build issue that it is causing.
No other functional changes were made.
The issu-version file for Mellanox is generated from the Mellanox SDK
libraries. The SDK is installed into a Buster docker container, but the
issu-version file goes onto the base OS, which is Bullseye. To work
around this, the issu-version build rules explicitly copies the
issu-version file to target/files/bullseye/ during the Buster build.
Because of our build infra, if caching is enabled and a cache is being
used, then for issu-version, since it is technically built as part of
Buster, then only target/files/buster/issu-version is saved into the
cache, and target/files/bullseye/issu-version isn't cached. If this
cache gets used, then target/files/bullseye/issu-version is missing, and
the final image build fails.
This is to work around the current build issue where Mellanox builds are
failing. This is so that issu-version is always "built", so that copy is
made into the bullseye directory.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
ipmihelper files are repeated for few DellEMC platforms. Removed the
files in sonic_platform since as part of debian rules,ipmihelper will be
copied to necessary directory.
* Make neccesary changed to mellanox platform code to build on Debian 11
* Revert use of backported kernel to build mft and elect to only build kernel module under bullseye
Allow mellanox platform to build and successfully switch packets in
Debian 11
Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches
Adjusted build system to support host system running bullseye and
dockers running buster.
Also add out of tree pca9548 mux driver to use platform data to mapping i2c bus with front panel port.
Signed-off-by: Jakkapan Jangmuang <jjangmua@celestica.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10
Co-authored-by: Shi Lei <shil@centecnetworks.com>
Add an include in saibcm-modules and saibcm-modules-dnx that are now
needed due to Mellanox kernel patches.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
ISSU will likely be broken. As of right now, the issu-version file is
not being generated during build.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The kernel already provides psample, and with module versioning being
done in modpost, having the SDK compile its own copy of psample breaks
loading dependent modules.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Remove Python 2 package installation from the base image. For container
builds, reference Python 2 packages only if we're not building for
Bullseye.
For libyang, don't build Python 2 bindings at all, since they don't seem
to be used.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Update the build rules in saibcm-modules to use the 5.10 kernel instead
of searching for the 4.19 kernel. In addition, some code changes were
done to get it to compile. The main categories of such changes are as
follows:
* For /proc files, `struct file_operations` has been replaced with
`struct proc_ops`.
* Y2038 changes to use the new APIs, since `do_gettimeofday()` is no
longer available.
* Minor changes in how external kernel module symbols are read by
modpost.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
All docker containers will be built as Buster containers, from a Buster
slave. The base image and remaining packages that are installed onto the
host system will be built for Bullseye, from a Bullseye slave.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
*Removing fake_platform environment variable. Following the merge of #9044 and Azure/sonic-swss#1978 the fake_platform environment variable is not used in any place and removing the stale references.
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Added support for Mellanox-SN2700 based SKU for docker-sonic-vs and to differentiate platform based on hw-sku rather than on fake_platform in VS.
Currently SAI VS library uses hwsku based SAI profile to differentiate and mock different platform implementations. The same functionality in swss is achieved using a fake_platform env variable.
Using a fake_platform has some drawbacks that the vs container appears to still use a different vendor hw-sku
env
PLATFORM=x86_64-kvm_x86_64-r0
HOSTNAME=dd21a1637723
PWD=/
HOME=/root
TERM=xterm
HWSKU=Force10-S6000
SHLVL=1
fake_platform=mellanox
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DEBIAN_FRONTEND=noninteractive
_=/usr/bin/env
In order to unify the approach at both swss and vs SAI and to be uniform throughout this PR introduces the approach of using hw-sku to differentiate different platforms. This requires support for Mellanox-SN2700 HWSKU for Mellanox platform which is also addressed by this PR.
root@23c9ba83b0aa:/# show platform summary
/bin/sh: 1: sudo: not found
Platform: x86_64-kvm_x86_64-r0
HwSKU: Mellanox-SN2700
ASIC: vs
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
root@23c9ba83b0aa:/# env
PLATFORM=x86_64-kvm_x86_64-r0
HOSTNAME=23c9ba83b0aa
PWD=/
HOME=/root
TERM=xterm
HWSKU=Mellanox-SN2700
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DEBIAN_FRONTEND=noninteractive
_=/usr/bin/env
root@23c9ba83b0aa:/#
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.
- How I did it
Changed the way how to version in RPC and DBG images are constructed.
- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
* [Nokia ixs7215] Platform API fixes
This commit delivers the following fixes
- Fix bug preventing access to second PSU eeprom
- Fix bug preventing updates to front panel PSU status led
- Fix SFP reset test case failure
* Fix LGTM alert
Fixed issue introduced by PR#7857. The problem was that introduction of
using init_cfg.json.j2 template did not have instruction in
init_config.json.j2 to merge the virtual chassis default_config.json.
Ths fix is to have a separate invokiation of sonic-cfggen to combine
init_config.json and virtual chassis default_config.json
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
* [202012][platform/barefoot] (#8543)
Why I did it
Pcied running by python 2.
How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2
How to verify it
docker exec pmon supervisorctl status
* [Netberg][nba715] Initial support for Netberg Aurora 715 switch
Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
Co-authored-by: Kostiantyn Yarovyi <kostiantynx.yarovyi@intel.com>
Adding platform support for centec v682-48y8c_d.
Why I did it
Adding platform support for centec v682-48y8c_d.
This switch has 48 SFP+ (1G/10G/25G) ports, 8 QSFP28 (40G/100G) ports on CENTEC TsingMa.MX.
CPU used in v682-48y8c_d is Intel(R) Xeon(R) CPU D-1527.
How I did it
Modify related code in platform and device directory.
Upgrade sai to v1.8.
fix bug for parallel compile centec platform modules.
How to verify it
Build centec amd64 sonic image, verify platform functions (port, sfp, led etc) on centec v682-48y8c_d board.
Co-authored-by: Shi Lei <shil@centecnetworks.com>
Why I did it
Added Support for Dell EMC S5212f platform
How I did it
Implemented the support for Dell EMC S5212f platform
Platform: x86_64-dellemc_s5212f_c3538-r0
HwSKU: DellEMC-S5212f-P-25G
ASIC: broadcom
ASIC Count: 1
How to verify it
Verified the show command outputs
- Why I did it
* To support systems with dynamic port configuration
* Apply lazy initialization to faster the speed of loading platform API
- How I did it
* Add module.py to implement dynamic port configuration (aka line card model)
* Adjust chassis.py, platform.py, thermal.py, sfp.py to support dynamic port configuration
* Optimize existing code
- How to verify it
Platform regression on MSN4700, MSN3800 and MSN2700, 100% pass
Unit test covers all new changes.
Why I did it
Added Support for Alphanetworks SNJ60D0-320F platform
How I did it
Implemented the support for Alphanetworks SNJ60D0-320F platform
Platform: x86_64-alphanetworks_snj60d0_320f-r0
HwSKU: Alphanetworks-SNJ60D0-320F
ASIC: broadcom
ASIC Count: 1
How to verify it
Verified the show command outputs
- Why I did it
Bug fix:
bad_param request due to missing parser rest command while running mlxlink
- How I did it
Advance to MFT tool version to 4.17.2-12.
- How to verify it
Manually tested on all mellanox platforms.
- Why I did it
Add NVIDIA Copyright header to "mellanox" files
- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files
- How to verify it
Sanity tests and PR checkers.
Why I did it
Currently the mellanox platform API is validating the file extensions of firmware packages to be installed for basic sanity checking. However, ONIE packages do not have an extension and as such if there is a "." in the name it is taken to be an extension and then fails the sanity check.
How I did it
I removed the check which ensures that ONIE images don't have a file extension.
How to verify it
Name the ONIE updater file 2021.onie and attempt to install it via fwutil install fw 2021.onie --yes
Why I did it
The fwutil update all utility expects the auto_update_firmware method in the Platform API to execute the update_firmware() call and not the install_firmware() call.
How I did it
Changed the method in the mellanox platform API component implementation.
How to verify it
Run fwutil update all with a CPLD update on a Mellanox platform and verify that it properly updates the firmware using the MPFA file.
- Why I did it
Advance to new Nvidia SAI release with the following changes:
New features:
- Align with new SDK/FW version 4.5.1006 and above and in parallel to existing used SDK/FW bundle
- Implement timestamp and egress_queue_index hostif packet attributes.
Bugs fixes:
- Fix compilation issues with gcc10
- Fix return code for buffer overflow for query enum values and query statistics capabilities
- Reduce verbosity of print in case packet ingress on invalid port
- Fix mirror Qos settings
- How I did it
Updated SAI version and submodule pointer
- How to verify it
Run regression tests from sonic-mgmt
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
PCIe Gen1 settings was needed for Dell S6000 device.
How I did it
Modified from Gen2 to Gen 1 speed for Dell S6000 PCIe devices
How to verify it
Check lspci output, verify the syslogs
- Why I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high, so that thermal algorithm would set fan speed to minimum allowed earlier and save power.
- How I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high
- How to verify it
Manual test
Why I did it
Added support for the device S5224F
How I did it
Implemented the support for the platform S5224F
Switch Vendor: DellEMC
Switch SKU: S5224F-ON
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform/interface commands
Why I did it
Added support for the device N3248PXE
How I did it
Implemented the support for the platform N3248PXE
n3248pxe_unit_test_log.txt
Switch Vendor: DellEMC
* Switch SKU: N3248PXE
* ASIC Vendor: Broadcom
* SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform commands
Why I did it
Added support for the device N3248TE
How I did it
Implemented the support for the platform N3248TE
Switch Vendor: DellEMC
Switch SKU: N3248TE
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform commands
To Fix#8697 . The config load_minigraph initializes 'admin_status' to up when platform.json has DPB configs. This doesn't happen when using port_config.ini
The update minigraph has logic to initialize only the ports whose neighbors are defined or those belonging to portchannel
However, a change was introduced to have default admin status to be 'up' in portconfig.py when the minigraph was using platform.json
This will lead to sanity check failure in sonic-mgmt and thus no test cases could be run
* [Nokia ixs7215] Miscellaneous platform API fixes
This commit delivers the following fixes for the Nokia ixs7215 platform
- Fix bug in a fan API error path
- Add support for setting the fan drawer led
- Add support for getting/setting the front panel PSU status led
- Add support for getting the min/max observed temperature value
* [Nokia ixs7215] code review changes: temperature min/max values
- Why I did it
Advance to Mellanox SAI ver 1.19.2 to pick up dynamic Policy Based Hashing support.
For this version above the static Policy Based Hashing is no longer supported.
For detailed release notes check https://github.com/Mellanox/SAI-Implementation/blob/sonic2111/release%20notes.txt
- How I did it
Updated SAI-Implementation submodule
- How to verify it
1. make configure PLATFORM=mellanox
2. make target/sonic-mellanox.bin
Run full regression as well as new dynamic PBH tests
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
What I did it
Add new platform x86_64-ragile_ra-b6910-64c-r0 (Tomahawk 3)
ASIC Vendor: Broadcom
Switch ASIC: Tomahawk 3
Port Config: 64x100G
-How I did it
Provide device and platform related files.
-How to verify it
show platform fan
show platform ssdhealth
show platform psustatus
show platform summary
show platform syseeprom
show platform temperature
show interface status
Why I did it
Power cycle test case fails for Z9332f in sonic-mgmt framework(#8605).
How I did it
Modified the platform API to return expected strings.
How to verify it
Power cycle the device and verify the reboot reason.
Run sonic-mgmt test_reboot script.
Catch up on fixes from BRCM SUG repo to pick up fixes after 5.0.0.6 all the way up to 5.0.0.8
Fixes include the following:
```
CS00012201827: Warmreboot causes syncd crash with SAI_API_UNSPECIFIED:syncdb_data_file_read:2230 Failed to parse JSON: error -2
DNX: Fix for ACL table create with v6 next hdr attr
and many unspecified changes that also went into 5.0.0.8
```
#### How to verify it
Preliminary tests looks fine on both XGS (gechiang) and DNX (judyjoseph)
On XGS testing done as following:
BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7260CX3 T0 DUT and all passed:
```
ipfwd/test_dir_bcast.py
fdb/test_fdb.py
fib/test_fib.py
vlan/test_valn.py
```
Also validated for for CS00012201827 (https://github.com/Azure/sonic-buildimage/issues/8300)
To enable saiserver docker on different platforms, it needs different configuration files. make the saiserver docker mount them in hwsku folder.
Co-authored-by: Ubuntu <richardyu@richardyu-ubuntu-vm0.trsxrdzozv2e1czsze2t05vqzh.ix.internal.cloudapp.net>
- Disable health monitoring of `psu.voltage` until support is implemented
- chassis: disable provisioning bit once linecard has booted
- chassis: fix issue in `show version` when running as `admin`
- chassis: fix race when reading an eeprom before it's available
- chassis: implement `get_all_asics` call
- api: fix `ChassisBase.get_system_eeprom_info` implementation
- api: add missing thermal condition and info
- api: fix return value of `ChassisBase.set_status_led`
- sfp: introduce SfpOptoeBase implementation used based on configuration knob
- psu: rely on pmbus to read input/output status when other mechanism is missing
- misc: other refactors and improvements
Why I did it
End goal: To have azure pipeline job to run multi-asic VS tests.
Intermediate goal: Require multi-asic KVM image so that the test can be run.
The difference between single asic and multi-asic KVM image is asic.conf file which has different NUM_ASIC values.
Idea behind the approach in this PR to attain the intermediate goal above:
Ease of building multi-asic KVM image so that any user or azure pipeline can use a simple make command to generate single or multi-asic KVM images as required.
Use a single onie installer image and multiple KVM images for single and multi-asic images.
Current scenario:
For VS platform, sonic-vs.bin is generated which is the onie installer image and sonic-vs.img.gz is generated which is the KVM iamge.
Scenario to be achieved:
sonic-vs.bin - which will include single asic platform, 4 asic platform and 6 asic platform.
sonic-vs.img.gz - single asic KVM image
sonic-4asic-vs.img.gz - 4 asic KVM image
sonic-6asic-vs.img.gz - 6 asic KVM image
In this PR, 2 new platforms are added for 4-asic and 6-asic VS.
How I did it
Create 4-asic and 6-asic device directories with the required files and hwsku files.
Add onie-recovery image information in vs platform.
How to verify it
After this PR change, no build change.
sonic-vs.bin onie installer image should include information of new multi-asic vs platforms.
Why I did it
To monitor the SSD health condition in DellEMC S6100 platform post upgrade.
A daemon is introduced to monitor the SSD every one hour.
To check for SSD status at boot time and at the time of cold-reboot.
All these changes are supported only for newer SSD firmware.
Porting changes from 201911 branch
Added a platform_reboot_pre_check script to prevent cold-reboot based on SSD status.
Depends on Azure/sonic-utilities#1556
DO NOT MERGE UNTIL ABOVE PR IS MERGED
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.
A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
#### Why I did it
New PSU could install different type of fan, so fan max/min speed should be read per PSU
#### How I did it
The existing implementation read PSU max/min fan speed from a common file, change it to read from per PSU file
#### How to verify it
Manual test
- Why I did it
Update SDK\FW version to 4.4.3326\2008.3326. This version contains:
New Features:
1. Add support for Fast Boot for SN3800
Bug Fixing:
1. In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 &L3)
2. On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored.
3. On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
4. Some Molex cables do not support speed after reboot
- How I did it
- How to verify it
Was verified by running regression tests that includes complete sonic-mgmt tests supported
- Why I did it
New release of MFT has the following changelog / RN
Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot.
Fixed mlxuptime sometimes generating a time less than previous due the wrong frequency calculation
- How I did it
Update makefile pointer to new version.
- How to verify it
Manually tested on all Mellanox platforms.
CPLD offset in led drv is incorrect.
#### How to verify it
echo 1 > brightness, show green on in DIAG led
echo 3 > brightness, show red on in DIAG led
echo 4 > brightness, show blue on in DIAG led
echo 5 > brightness, show green blinking in DIAG led
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards
How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
Update SDK\FW version to 4.4.3320\2008.3324. This version contains:
New Features:
* Add support for Fast Boot for SN3800
Bug Fixing:
* In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 &L3)
* On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored.
* On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
* Some Molex cables do not support speed after reboot
Signed-off-by: Dror Prital <drorp@nvidia.com>
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes
Co-authored-by: Carl Keene <keene@nokia.com>
The start template script that this value is used in will determine what
network namespace to use, and will add --net=host if it needs to run in
the host namespace.
With Docker 20.10, if --net=host is specified twice in docker run, then
it errors out. Therefore, remove the explicit --net=host in the run
options setting and let the start template script specify it.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
fix the dx010 system eeprom unavailable issue
How I did it
enable the i2c slave 30ms timeout mechanism
How to verify it
i2cstress test in DX010 iSMT controller bus
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
Support for show system-health command in s5232f
How I did it
Added the configuration, API changes to support system health
How to verify it
Execute "show system-health summary/detail/monitor-list" CLI.
the branch refers the branch name that the commit is in,
for example master, 202012, 201911, ...
In case there is no branch, the name will be HEAD.
release is encoded in /etc/sonic/sonic_release file.
the file is only available for a release branch.
It is not available in master branch.
example for master branch
```
build_version: 'master.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: 'master'
release: 'none'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```
example for 202012 release branch
```
build_version: '202012.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: '202012'
release: '202012'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Why I did it
platform test suite failed for few API's in DellEMC Z9332f platform.
How I did it
Modified the API's to return the expected values in the script.
How to verify it
Run platform test suite after making the changes.
Why I did it
Update Makefile, so it does the following:
For a given platform, verify if platform/checkout/.ini exists and hence run the platform/checkout/template.j2. This allows platform code to be checked out during the 'make configure' stage.
How I did it
git clone git@github.com:Azure/sonic-buildimage.git
mkdir platform/cisco-8000
make init
make configure PLATFORM=cisco-8000
make all
This change is to add a gbsyncd container to accommodate the syncd process and the SAI libraries for the Credo gearbox chips.
How I did it
This container works similar to the existing Broadcom syncd container. Its main difference is that the SAI-related dynamic libraries are replaced by the ones for Credo gearbox chips, and the container only reacts to SAI events for the gearbox chips. The SAI libraries will be provided by the package libsai-credo_1.0_amd64.deb.
For the image build, the added container will be built and included in the Broadcom platform image, after $(LIBSAI_CREDO)_URL = is replaced to the correct value. For now, as $(LIBSAI_CREDO)_URL is empty, the container build is skipped in the image build.
After the container is included in the image, in the runtime, the container will begin with checking the existence of /usr/share/sonic/hwsku/gearbox_config.json; if that file is not provided, the container will exit by itself. Therefore, for platforms unrelated to the Credo chips, as long as they are not providing the file, they will not be affected by this change.
Why I did it
Added a check in determining CPU reset in fast-/warm-reboot in their respective platform plugin.
Introducing reboot plugin for "reboot" command to handle its own platform plugin.
How I did it
On branch s6100_fast_warm_check
Changes to be committed:
(use "git reset HEAD ..." to unstage)
modified: ../../debian/platform-modules-s6100.install
modified: ../scripts/fast-reboot_plugin
modified: ../scripts/platform_reboot_override
new file: ../scripts/reboot_plugin
modified: ../scripts/track_reboot_reason.sh
modified: chassis.py
How to verify it
Triggered cold-reset inside fast-reboot to test out the reboot-cause 2.0 API.
Why I did it
serial-getty service exited in Dell S6100 device randomly.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
Why I did it
BIOS upgrade on rare cases cannot guarantee bus value remain the same on every BIOS release. Ignoring this field in order for pcied not to fail but still verify device id in a different way. The solution is future proof and will not require changes in code when new BIOS version is available
How I did it
Since bus is not a fixed value (it is determined by the bios version) we are ignoring this field, and instead checking if there is a device that match on all other fields that and in addition has a matching device id.
How to verify it
Verify no errors or failures in pcied on different BIOS version with the same code base.
Add needed code to pddf_custom_psu.c to deal with multi PSU and get SN.
How to verify it
Plugin new PSU (3Y) and test,
```
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_serial_num
S0A000X601919000013
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_model_name
YESM1300AM
root@sonic:/home/admin# pddf_psuutil mfrinfo
PSU Status Manufacturer ID Model Serial Fan Airflow Direction
PSU1 NOT OK 3Y POWER YESM1300AM S0A000X601919000007 exhaust
PSU2 OK 3Y POWER YESM1300AM S0A000X601919000013 exhaust
```
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
Move SAI libraries for Broadcom for XGS and DNX families to 5.0.0.6
Included fixes
```
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012192505 | [4.3] Re-encap IPinIP decap packets
CS00012192502 | [3.7.5.2] Start LED shell script execution on all DELL based platforms causing all ports flapping on SAI 3.7.5.2
CS00012191363 | [4.3] Support of memscan thread to detect TCAM parity error
CS00012190932 | [4.3] SAI_PORT_PFC_X_RX_PKTS incremented incorrectly even when no PFC frames are received on that priority
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00011382163 | [4.4] Support warm-boot from 3.5 to 4.3
CS00011318937 | [4.3] MACSec SAI Support for Jericho2c+
CS00011318926 | [4.3] Provide SAI support for Jericho2c+
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012195261 | [4.3][5.0][TD3]VLAN tagged IP packet received on untagged interface being routed instead of dropped
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012196056 | [4.3.3.8][WARMBOOT] syncd[2584]: segfault at 5616ad6c3d80 ip 00007f61e0c6bc65 sp 00007fff0c5a7a90 error 4 in libsai.so.1.0[7f61e0a95000+3cd8000]
CS00012195262 | [4.3][5.0][TD3] Malformed IP packet(missing IP header) received on a VLAN Interface is flooded to other LVAN members instead of being dropped
CS00012195956 | [4.3.3.8] [TD3]Syncd Crash at brcm_sai_tnl_mp_create_tunnel()
PR 4346163: Add support for AN/LT
```
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>
Avoid initializing sfp/thermal/components/fan/psu/leds on simx and create vpd_info file on hw_management when we use mellanox simulator platform
- Why I did it
this is a fix for issue in mellanox simulator platforms. the syseepromd failed on the pmon docker. also "decode-syseeprom" failed also
- How I did it
before initializing thermal/components/fan/psu/leds --> check if we are running on simx
creating the vpd_info on the hw_management folder.
- How to verify it
check if syseepromd process was loaded properly on the pmon docker.
decode-syseeprom is working well without errors/warnings