This PR aims to fix the healthd crash issue by adding system health monitoring configuration file for platform Celestica E1031 by adding a new configuration file under the path device/celestica/x86_64-cel_e1031-r0/.
How to verify it
I manually restart the system-health.service and confirmed that healthd is running.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
What I did:
add platform components
How I did it:
In platform_components.json add chassis and empty component
How to verify it:
Run show platform firmware updates
- Why I did it
Removed 2x40G for SN3800. This mode is not supported by hardware.
- How I did it
Removing it from hwsku.json and platform.json
- How to verify it
Load it in the device and check supported modes
Why I did it
Fix an issue on the Clearwater2 linecard.
When the linecard is started with a fresh image without configuration, phys would not be initialized.
How I did it
Added default_sku for Clearwater2 which prevents config-setup from failing to create a default config_db.json.
Added some extra logic in the phy-credo-init script to run the phy_config.sh of the hwsku pointed by default_sku if the DEVICE_METADATA.localhost.hwsku information is not populated in CONFIG_DB.
How to verify it
Booting an image with this change and without configuration will lead to the phys being initialized using the phy_config.sh from default_sku.
What I did it
Add new platform x86_64-ragile_ra-b6910-64c-r0 (Tomahawk 3)
ASIC Vendor: Broadcom
Switch ASIC: Tomahawk 3
Port Config: 64x100G
-How I did it
Provide device and platform related files.
-How to verify it
show platform fan
show platform ssdhealth
show platform psustatus
show platform summary
show platform syseeprom
show platform temperature
show interface status
- Why I did it
SN4600 A0 platform was EOL, so there is no need to support it, sensor conf can be removed and we don't need to maintain 2 sensor conf files, only A1 platform is needed.
- How I did it
Remove get_sensors_conf_path which intends to load correct sensor conf for different(A0/A1) platforms.
Remove the sensor conf for A0 platform, rename previous sensor.conf.a1 to sensor.conf
- How to verify it
Run sensor test on the SN4600 platform.
- Disable health monitoring of `psu.voltage` until support is implemented
- chassis: disable provisioning bit once linecard has booted
- chassis: fix issue in `show version` when running as `admin`
- chassis: fix race when reading an eeprom before it's available
- chassis: implement `get_all_asics` call
- api: fix `ChassisBase.get_system_eeprom_info` implementation
- api: add missing thermal condition and info
- api: fix return value of `ChassisBase.set_status_led`
- sfp: introduce SfpOptoeBase implementation used based on configuration knob
- psu: rely on pmbus to read input/output status when other mechanism is missing
- misc: other refactors and improvements
Why I did it
End goal: To have azure pipeline job to run multi-asic VS tests.
Intermediate goal: Require multi-asic KVM image so that the test can be run.
The difference between single asic and multi-asic KVM image is asic.conf file which has different NUM_ASIC values.
Idea behind the approach in this PR to attain the intermediate goal above:
Ease of building multi-asic KVM image so that any user or azure pipeline can use a simple make command to generate single or multi-asic KVM images as required.
Use a single onie installer image and multiple KVM images for single and multi-asic images.
Current scenario:
For VS platform, sonic-vs.bin is generated which is the onie installer image and sonic-vs.img.gz is generated which is the KVM iamge.
Scenario to be achieved:
sonic-vs.bin - which will include single asic platform, 4 asic platform and 6 asic platform.
sonic-vs.img.gz - single asic KVM image
sonic-4asic-vs.img.gz - 4 asic KVM image
sonic-6asic-vs.img.gz - 6 asic KVM image
In this PR, 2 new platforms are added for 4-asic and 6-asic VS.
How I did it
Create 4-asic and 6-asic device directories with the required files and hwsku files.
Add onie-recovery image information in vs platform.
How to verify it
After this PR change, no build change.
sonic-vs.bin onie installer image should include information of new multi-asic vs platforms.
Why I did it
The first 4 ports on this dut are breakout ports. They might not always be connected in lab. Mark them as 'RJ45' to skip the SFP check since they are by default disabled.
How to verify it
run platform test_reboot.py
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- Why I did it
Update platform data files for SN4800 to support chassis management
- How I did it
Update pcie.yml
Update sensors.conf
Update platform.json
- How to verify it
Run platform test suite in sonic-mgmt
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json
- How I did it
Added the aforementioned breakout modes to platform.json
- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards
How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
Why I did it
The SFP1 port was disabled in the default configuration.
How I did it
This commit enables the 10G front port for usage.
This was done using a DX030 together with an Arista switch using bcmsh and phy diag xe0 dsc to figure out what lane mappings make sense.
Validated on Celestica Seastone2 DX030.
How to verify it
Own a Celestica DX030
Connect the front SFP1 port to something.
It works :-)
You can do tcpdump -i Ethernet128 -n and you will see both incoming and outgoing LLDP.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
To support dynamic port breakout Broadcom configurations on Arista platforms.
How I did it
Updated platform.json for platforms and added new hwsku, Broadcom config, and hwsku.json for dynamic port breakout usage.
The name of the new hwsku name used is very similar to the platform name (platform x86_64-arista_7050_qx32s hwsku Arista-7050QX-32S) as the flex hwsku is meant to be the default in the future.
How to verify it
Boot up device with the new hwsku, interfaces are up.
Change hwsku.json with new default breakout mode and reload device, breakout will have successfully been applied.
#### Why I did it
The values of the syseeprom were not valid
#### How I did it
I took the correct hexdump values from a real switch and created this hex file again
#### How to verify it
decode-syseeprom will display the new values
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes
Co-authored-by: Carl Keene <keene@nokia.com>
* add hwsku.json for the Nokia-7215
* added required default_brkout_mode to hwsku as its not optional
* remove tabs from the file so spacing consistent
Co-authored-by: Carl Keene <keene@nokia.com>
#### Why I did it
New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.
#### How I did it
Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.
#### How to verify it
Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
Support for show system-health command in s5232f
How I did it
Added the configuration, API changes to support system health
How to verify it
Execute "show system-health summary/detail/monitor-list" CLI.
* Update default cable len to 0m for TD2 (#8298)
* Update sonic-cfggen tests with the correct cable len
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
- With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Cfggen tests passed with the cable len update
Why I did it
To support "pcied" and "pcieutil" commands in DellEMC Z9332f.
How I did it
Add 'pcie.yaml' in device/dell/[PLATFORM]/ directory.
How to verify it
Execute "pcieutil check" command.
Logs: UT_logs.txt
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
1. Implement FanDrawer-Fan hierarchy.
2. Enable thermalctld, disable pcied.
3. Implement SystemLED in Chassis.
4. Correct Fan direction
5. Implement require Fan APIs for SystemHealthMonitoring.
6. Handle non-ascii character while reading PSU model/serial num.
```
Check if System-health can pass the check and display the SystemLED correctly.
///////// booting, DIAG_LED = GREEN_BLINKING /////////
root@sonic:/tmp# show system-health detail
System is currently booting...
root@sonic:/tmp# cat /sys/class/leds/diag/brightness
5
///////// container_checker fail, DIAG_LED = AMBER /////////
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_AMBER
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: container_checker is not Status ok
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
container_checker Not OK Program
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
--------------- -------- ------
asic Ignored Device
psu.temperature Ignored Device
///////// skip container_checker, DIAG_LED = GREEN /////////
root@sonic:/sys/bus/i2c/devices# vi /usr/share/sonic/device/x86_64-accton_as4630_54te-r0/system_health_monitoring_config.json
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_GREEN
Services:
Status: OK
Hardware:
Status: OK
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
----------------- -------- -------
container_checker Ignored Service
psu.temperature Ignored Device
asic Ignored Device
```
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
*Edited platform.json for 4600 & 4600C
*Edited hwsku.json and port_config.ini files for all the SKU's present under these platforms
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
This reverts commit a557dbd97e.
Reverting this PR as it is not required currently for multi-asic VS.
multi-asic VS will come up with multiple instances of swss and syncd. syncd will use default hwinfo string, same as in single asic VS.
Why I did it
To determine the revision of the pcie.yaml to be used based on BIOS version in DellEMC S6100 platform.
Depends on: Azure/sonic-platform-common#195
How I did it
Added two revisions of pcie.yaml pcie_1.yaml and pcie_2.yaml
Included a platform-specific Pcie class to provide the revision of the pcie.yaml to be used by pcieutil/pcied.
How to verify it
Execute pcieutil check (Azure/sonic-utilities#1672) command and verify the list of PCIe devices displayed.
Logs: UT_logs.txt
#### Why I did it
Support API 2.0 for S5248F platform
#### How I did it
Making changes to S5248F platform specific directory
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
MMU configuration for DellEMC Z9332 systems in T0/T1 topology
How I did it
Updated config.bcm, QoS/Buffer pool and lossy/lossless profile settings
How to verify it
Verified that Dell systems are booting up fine and basic test cases passing.
Introduce new sonic-buildimage images for Broadcom DNX ASIC family.
sonic-broadcom-dnx.bin
sonic-aboot-broadcom-dnx.swi
How I did it
NO CHANGE to existing make commands
make init; make configure PLATFORM=broadcom; make target/sonic-aboot-broadcom.swi; make target/sonic-broadcom.bin
The difference now is that it will result in new broadcom images for DNX asic family as well.
sonic-broadcom.bin, sonic-broadcom-dnx.bin
sonic-aboot-broadcom.swi, sonic-aboot-broadcom-dnx.swi
Note: This PR also adds support for Broadcom SAI 5.0 (based on 1.8 SAI ) for DNX based platform + changes in platform x86_64-arista_7280cr3_32p4 bcm config files and platform_env.conf files
Add device and platform code for ix7-bwde, ix8a-bwde.
Support platform API 2.0 for all quanta platforms except for ix1b
Co-authored-by: robert.hong <robert.hong@qct.io>
- Why I did it
To create SDK dump on Mellanox devices when SDK event has occurred.
- How I did it
Set the SKUs keys needed to initialize the feature in SAI.
- How to verify it
Simulate SDK event and check that dump is created in the expected path.
- Why I did it
The default breakout mode according to hwsku.json for the MSN4410 is 1x400G and this is not a supported breakout mode according to its platform.json
This causes a conflict on boot of this platform and no containers on the switch will init successfully.
- How I did it
Referenced the platform specification files and updated platform.json
- How to verify it
Install master version of SONiC on MSN4410
Boot switch and verify swss is successfully running using docker ps
What I did:
Updated 7260 MMU Profile based on latest MSFT Tier 1 Tomahawk2_MMU_Setting_48x100G_40m_16x100G_300m_v1.0 and
TH2_PGHdrm_MSFT.
How I verify:
Made sure image is up/traffic is flowing/mmu dump looked fine.
SAI qos test need will be updated to support this SKU.
- Why I did it
Enhance the Python3 support for platform API. Originally, some platform APIs call SDK API which didn't support Python 3. Now the Python 3 APIs have been supported in SDK 4.4.3XXX, Python3 is completely supported by platform API
- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly
- How to verify it
Manually test and run regression platform test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Process pcied failed on Arista-7170-32CD-C32
```
root@sonic:/# supervisorctl
chassis_db_init EXITED Jun 03 08:48 AM
dependent-startup EXITED Jun 03 08:48 AM
ledd RUNNING pid 28, uptime 3:07:49
lm-sensors EXITED Jun 03 08:48 AM
pcied FATAL Exited too quickly (process log may have details)
```
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
Why I did it
7050 S4Q31 mmu configuration is missing ALPM configurations, causing not enough memory reserved for routes. Orchagent crashes on a nightly testbed with 6400 route entries.
How I did it
Add the missing ALPM configurations.
How to verify it
Load the configuration on testbed and verified new configuration exists and no more crash.
Signed-off-by: Ying Xie ying.xie@microsoft.com
#### Why I did it
- After [sonic-linux-kernel#177](https://github.com/Azure/sonic-linux-kernel/pull/177) changes, the I2C mux channels of Baseboard and Switchboard CPLDs are moved from i2c-4 and i2c-5 to i2c-36 and i2c-37 respectively.
- This caused QSFP driver initialization of i2c-36 to i2c-41 to fail causing the ports from Ethernet208 to Ethernet248 fail.
#### How I did it
- The fix to this problem is to change the order of QSFP driver initialization to I2C mux channels.
- Instead of the order i2c-10 to i2c-41, the order i2c-4 to i2c-35 is being utilized.
- Also, need to change the i2c-mux-channel number for Baseboard CPLD and switchboard CPLD in scripts to access them.
hwskus.
Why I did it
For multi-asic platforms, orchagent process in swss docker is started by passing device_ids(or asic_ids).
Each swss docker starts orchagent with a different device_id. This device_id is passed as Hardware info to syncd. For syncd to start with the right hwinfo, context_config.json is passed as an argument. context_config.json file is looked up to get the hwinfo information.
sonic-sairedis PRs required for this diff to be used to bring up multi-asic VS:
Azure/sonic-sairedis#830Azure/sonic-sairedis#832
How I did it
Add context_config.json for each asic in the same structure as provided here: https://github.com/Azure/sonic-sairedis/blob/master/lib/src/context_config.json
Each asic context_config.json will have different hwinfo string.
hwinfo string will be same as device id retrieved from asic.conf file.
Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
#### Why I did it
Support 2km cables for Microsoft SKUs
#### How I did it
1. Update pg_profile_lookup.ini with 2000m cable supported
2. Update buffer configuration for t1 with uplink cable 2000m
- For SN3800 platform:
- C64:
- t0: 32 100G down links and 32 100G up links.
- t1: 56 100G down links and 8 100G up links with 2 km cable.
- D112C8: 112 50G down links and 8 100G up links.
- D24C52: 24 50G down links, 20 100G down links, and 32 100G up links.
- D28C50: 28 50G down links, 18 100G down links, and 32 100G up links.
- For SN2700 platform:
- D48C8: 48 50G down links and 8 100G up links.
- C32:
- t0: 16 100G down links and 16 100G up links.
- t1: 24 100G down links and 8 100G up links with 2 km cable.
- For SN4600C platform:
- D112C8: 112 50G down links and 8 100G up links.
#### How to verify it
Run regression test
This PR contains the following changes
Original Arista-7050-QX-32S sku (32x40G ports) has been renamed to Arista-7050QX32S-Q32
Arista-7050-QX-32S is symlinked to Arista-7050QX-32S-S4Q31 (4x10G, 31x40G ports)
Signed-off-by: Neetha John <nejo@microsoft.com>
**- Why I did it**
- To fix failed test cases of Seastone-DX010 platform APIs that found on [platform_tests](https://github.com/Azure/sonic-mgmt/tree/master/tests/platform_tests/api) script
**- How I did it**
1. Add device/celestica/x86_64-cel_seastone-r0/platform.json
2. Update functions to support python3.7
3. Add more functions follow latest sonic_platform_base
4. Fix the bug
Why I did it
Arista-7260CX3-Q64 is missing T1 MMU configuration.
How I did it
Define T1 MMU configuration for Arista-7260CX3-Q64.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
#### Why I did it
To get rid of obsolete code
#### How I did it
Removed plugins folder from device/barefoot
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
#### Why I did it
The label for PSU related sensors on the Spectrum-2 platform is not aligned with the physical location of the PSU.
#### How I did it
Update the label in the sensor conf file for those relevant platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
PG profile settings need to be aligned with Arista-7050-QX-32S
How I did it
Copy over the current settings from Arista-7050-QX-32S and define params for 10G and 1G speeds as well
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Need proper MMU and Qos settings for Arista-7050QX-32S-S4Q31
How I did it
Updated the settings based on Arista-7050-QX-32S
#### Why I did it
Add initial support of SN4800 platform for Mellanox ASIC simulation device.
NOTE: This is work in progress and not full support of the platform.
#### How I did it
Add new folders for SN4800 with zero ports based on SN4700 Spectrum-3 switch.
#### Why I did it
Improve readability of `show environment` output.
#### How I did it
In all sensors.conf, give the customized labels according to HW specifications for each model.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
- Why I did it
Add initial support of SN4800 platform .
NOTE: This is work in progress and not full support of the platform.
- How I did it
Add new folders for SN4800 with zero ports based on SN4700 Spectrum-3 switch.
- How to verify it
Simulator device was tested. See #7448
Add support for Accton as9726-32d platform
This pull request is based on as9716-32d, so I reference as9716-32d to create new model: as9726-32d.
This module do not need led driver to control led, FPGA can handle it.
I also implement API2.0(sonic_platform) for this model, CPLD driver, PSU driver, Fan driver to control these HW behavior.
Platform library changes
- Fix the use of /proc/modules during testing, fixes#7463
- Add `libsfp-eeprom.so` build to read/write xcvr eeproms in C
- Add some more reboot-cause information
- Write down temperature hw thresholds to the sensors
- Report software thresholds through platform api
- Writ `port_name sysfs` file of optoe`
- Tests enhancements
- Fix dependency issues for chassis provisioning
Platform configuration changes
- Add `pcie.yaml` configuration for a few platforms
- Mount `libsfp-eeprom.so` inside `pmon`
- Fix `Arista-7050SX3-48C8` and `Arista-7050SX3-48YC8' platform and hwsku
- Miscellaneous fixes
Co-authored-by: Boyang Yu <byu@arista.com>
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
#### Why I did it
MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0
#### How I did it
Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
- Why I did it
Enable VXLAN src port range configuration via SAI profile for Mellanox-SN3800-D28C49S1 SKU
- How I did it
Added SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 configuration to appropriate sai.profile
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
#### Why I did it
Current platform.json lacks some peripheral device related facts, like chassis/fan/pasu/drawer/thermal/components names, numbers, etc.
#### How I did it
Add platform device facts to the platform.json file
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json