Depends on Azure/sonic-utilities#1626
Depends on Azure/sonic-swss#1754
QOS tables in config db used ABNF format i.e "[TABLE_NAME|name] to refer fieldvalue to other qos tables.
Example:
Config DB:
"Ethernet92|3": {
"scheduler": "[SCHEDULER|scheduler.1]",
"wred_profile": "[WRED_PROFILE|AZURE_LOSSLESS]"
},
"Ethernet0|0": {
"profile": "[BUFFER_PROFILE|ingress_lossy_profile]"
},
"Ethernet0": {
"dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE]",
"pfc_enable": "3,4",
"pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]",
"tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE]"
},
This format is not consistent with other DB schema followed in sonic.
And also this reference in DB is not required, This is taken care by YANG "leafref".
Removed this format from all platform files to consistent with other sonic db schema.
Example:
"Ethernet92|3": {
"scheduler": "scheduler.1",
"wred_profile": "AZURE_LOSSLESS"
},
Dependent pull requests:
#7752 - To modify platfrom files
#7281 - Yang model
Azure/sonic-utilities#1626 - DB migration
Azure/sonic-swss#1754 - swss change to remove ABNF format
Why I did it
Added support for the device S5224F
How I did it
Implemented the support for the platform S5224F
Switch Vendor: DellEMC
Switch SKU: S5224F-ON
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform/interface commands
Why I did it
Added support for the device N3248PXE
How I did it
Implemented the support for the platform N3248PXE
n3248pxe_unit_test_log.txt
Switch Vendor: DellEMC
* Switch SKU: N3248PXE
* ASIC Vendor: Broadcom
* SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform commands
Why I did it
Added support for the device N3248TE
How I did it
Implemented the support for the platform N3248TE
Switch Vendor: DellEMC
Switch SKU: N3248TE
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin
How to verify it
Verified the show platform commands
This PR aims to fix the healthd crash issue by adding system health monitoring configuration file for platform Celestica E1031 by adding a new configuration file under the path device/celestica/x86_64-cel_e1031-r0/.
How to verify it
I manually restart the system-health.service and confirmed that healthd is running.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
What I did:
add platform components
How I did it:
In platform_components.json add chassis and empty component
How to verify it:
Run show platform firmware updates
- Why I did it
Removed 2x40G for SN3800. This mode is not supported by hardware.
- How I did it
Removing it from hwsku.json and platform.json
- How to verify it
Load it in the device and check supported modes
Why I did it
Fix an issue on the Clearwater2 linecard.
When the linecard is started with a fresh image without configuration, phys would not be initialized.
How I did it
Added default_sku for Clearwater2 which prevents config-setup from failing to create a default config_db.json.
Added some extra logic in the phy-credo-init script to run the phy_config.sh of the hwsku pointed by default_sku if the DEVICE_METADATA.localhost.hwsku information is not populated in CONFIG_DB.
How to verify it
Booting an image with this change and without configuration will lead to the phys being initialized using the phy_config.sh from default_sku.
What I did it
Add new platform x86_64-ragile_ra-b6910-64c-r0 (Tomahawk 3)
ASIC Vendor: Broadcom
Switch ASIC: Tomahawk 3
Port Config: 64x100G
-How I did it
Provide device and platform related files.
-How to verify it
show platform fan
show platform ssdhealth
show platform psustatus
show platform summary
show platform syseeprom
show platform temperature
show interface status
- Why I did it
SN4600 A0 platform was EOL, so there is no need to support it, sensor conf can be removed and we don't need to maintain 2 sensor conf files, only A1 platform is needed.
- How I did it
Remove get_sensors_conf_path which intends to load correct sensor conf for different(A0/A1) platforms.
Remove the sensor conf for A0 platform, rename previous sensor.conf.a1 to sensor.conf
- How to verify it
Run sensor test on the SN4600 platform.
- Disable health monitoring of `psu.voltage` until support is implemented
- chassis: disable provisioning bit once linecard has booted
- chassis: fix issue in `show version` when running as `admin`
- chassis: fix race when reading an eeprom before it's available
- chassis: implement `get_all_asics` call
- api: fix `ChassisBase.get_system_eeprom_info` implementation
- api: add missing thermal condition and info
- api: fix return value of `ChassisBase.set_status_led`
- sfp: introduce SfpOptoeBase implementation used based on configuration knob
- psu: rely on pmbus to read input/output status when other mechanism is missing
- misc: other refactors and improvements
Why I did it
End goal: To have azure pipeline job to run multi-asic VS tests.
Intermediate goal: Require multi-asic KVM image so that the test can be run.
The difference between single asic and multi-asic KVM image is asic.conf file which has different NUM_ASIC values.
Idea behind the approach in this PR to attain the intermediate goal above:
Ease of building multi-asic KVM image so that any user or azure pipeline can use a simple make command to generate single or multi-asic KVM images as required.
Use a single onie installer image and multiple KVM images for single and multi-asic images.
Current scenario:
For VS platform, sonic-vs.bin is generated which is the onie installer image and sonic-vs.img.gz is generated which is the KVM iamge.
Scenario to be achieved:
sonic-vs.bin - which will include single asic platform, 4 asic platform and 6 asic platform.
sonic-vs.img.gz - single asic KVM image
sonic-4asic-vs.img.gz - 4 asic KVM image
sonic-6asic-vs.img.gz - 6 asic KVM image
In this PR, 2 new platforms are added for 4-asic and 6-asic VS.
How I did it
Create 4-asic and 6-asic device directories with the required files and hwsku files.
Add onie-recovery image information in vs platform.
How to verify it
After this PR change, no build change.
sonic-vs.bin onie installer image should include information of new multi-asic vs platforms.
Why I did it
The first 4 ports on this dut are breakout ports. They might not always be connected in lab. Mark them as 'RJ45' to skip the SFP check since they are by default disabled.
How to verify it
run platform test_reboot.py
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- Why I did it
Update platform data files for SN4800 to support chassis management
- How I did it
Update pcie.yml
Update sensors.conf
Update platform.json
- How to verify it
Run platform test suite in sonic-mgmt
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json
- How I did it
Added the aforementioned breakout modes to platform.json
- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards
How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
Why I did it
The SFP1 port was disabled in the default configuration.
How I did it
This commit enables the 10G front port for usage.
This was done using a DX030 together with an Arista switch using bcmsh and phy diag xe0 dsc to figure out what lane mappings make sense.
Validated on Celestica Seastone2 DX030.
How to verify it
Own a Celestica DX030
Connect the front SFP1 port to something.
It works :-)
You can do tcpdump -i Ethernet128 -n and you will see both incoming and outgoing LLDP.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
To support dynamic port breakout Broadcom configurations on Arista platforms.
How I did it
Updated platform.json for platforms and added new hwsku, Broadcom config, and hwsku.json for dynamic port breakout usage.
The name of the new hwsku name used is very similar to the platform name (platform x86_64-arista_7050_qx32s hwsku Arista-7050QX-32S) as the flex hwsku is meant to be the default in the future.
How to verify it
Boot up device with the new hwsku, interfaces are up.
Change hwsku.json with new default breakout mode and reload device, breakout will have successfully been applied.
#### Why I did it
The values of the syseeprom were not valid
#### How I did it
I took the correct hexdump values from a real switch and created this hex file again
#### How to verify it
decode-syseeprom will display the new values
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes
Co-authored-by: Carl Keene <keene@nokia.com>
* add hwsku.json for the Nokia-7215
* added required default_brkout_mode to hwsku as its not optional
* remove tabs from the file so spacing consistent
Co-authored-by: Carl Keene <keene@nokia.com>
#### Why I did it
New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.
#### How I did it
Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.
#### How to verify it
Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
Support for show system-health command in s5232f
How I did it
Added the configuration, API changes to support system health
How to verify it
Execute "show system-health summary/detail/monitor-list" CLI.
* Update default cable len to 0m for TD2 (#8298)
* Update sonic-cfggen tests with the correct cable len
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
- With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Cfggen tests passed with the cable len update
Why I did it
To support "pcied" and "pcieutil" commands in DellEMC Z9332f.
How I did it
Add 'pcie.yaml' in device/dell/[PLATFORM]/ directory.
How to verify it
Execute "pcieutil check" command.
Logs: UT_logs.txt
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
1. Implement FanDrawer-Fan hierarchy.
2. Enable thermalctld, disable pcied.
3. Implement SystemLED in Chassis.
4. Correct Fan direction
5. Implement require Fan APIs for SystemHealthMonitoring.
6. Handle non-ascii character while reading PSU model/serial num.
```
Check if System-health can pass the check and display the SystemLED correctly.
///////// booting, DIAG_LED = GREEN_BLINKING /////////
root@sonic:/tmp# show system-health detail
System is currently booting...
root@sonic:/tmp# cat /sys/class/leds/diag/brightness
5
///////// container_checker fail, DIAG_LED = AMBER /////////
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_AMBER
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: container_checker is not Status ok
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
container_checker Not OK Program
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
--------------- -------- ------
asic Ignored Device
psu.temperature Ignored Device
///////// skip container_checker, DIAG_LED = GREEN /////////
root@sonic:/sys/bus/i2c/devices# vi /usr/share/sonic/device/x86_64-accton_as4630_54te-r0/system_health_monitoring_config.json
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_GREEN
Services:
Status: OK
Hardware:
Status: OK
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
----------------- -------- -------
container_checker Ignored Service
psu.temperature Ignored Device
asic Ignored Device
```
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
*Edited platform.json for 4600 & 4600C
*Edited hwsku.json and port_config.ini files for all the SKU's present under these platforms
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
This reverts commit a557dbd97e.
Reverting this PR as it is not required currently for multi-asic VS.
multi-asic VS will come up with multiple instances of swss and syncd. syncd will use default hwinfo string, same as in single asic VS.
Why I did it
To determine the revision of the pcie.yaml to be used based on BIOS version in DellEMC S6100 platform.
Depends on: Azure/sonic-platform-common#195
How I did it
Added two revisions of pcie.yaml pcie_1.yaml and pcie_2.yaml
Included a platform-specific Pcie class to provide the revision of the pcie.yaml to be used by pcieutil/pcied.
How to verify it
Execute pcieutil check (Azure/sonic-utilities#1672) command and verify the list of PCIe devices displayed.
Logs: UT_logs.txt
#### Why I did it
Support API 2.0 for S5248F platform
#### How I did it
Making changes to S5248F platform specific directory
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
MMU configuration for DellEMC Z9332 systems in T0/T1 topology
How I did it
Updated config.bcm, QoS/Buffer pool and lossy/lossless profile settings
How to verify it
Verified that Dell systems are booting up fine and basic test cases passing.
Introduce new sonic-buildimage images for Broadcom DNX ASIC family.
sonic-broadcom-dnx.bin
sonic-aboot-broadcom-dnx.swi
How I did it
NO CHANGE to existing make commands
make init; make configure PLATFORM=broadcom; make target/sonic-aboot-broadcom.swi; make target/sonic-broadcom.bin
The difference now is that it will result in new broadcom images for DNX asic family as well.
sonic-broadcom.bin, sonic-broadcom-dnx.bin
sonic-aboot-broadcom.swi, sonic-aboot-broadcom-dnx.swi
Note: This PR also adds support for Broadcom SAI 5.0 (based on 1.8 SAI ) for DNX based platform + changes in platform x86_64-arista_7280cr3_32p4 bcm config files and platform_env.conf files
Add device and platform code for ix7-bwde, ix8a-bwde.
Support platform API 2.0 for all quanta platforms except for ix1b
Co-authored-by: robert.hong <robert.hong@qct.io>
- Why I did it
To create SDK dump on Mellanox devices when SDK event has occurred.
- How I did it
Set the SKUs keys needed to initialize the feature in SAI.
- How to verify it
Simulate SDK event and check that dump is created in the expected path.
- Why I did it
The default breakout mode according to hwsku.json for the MSN4410 is 1x400G and this is not a supported breakout mode according to its platform.json
This causes a conflict on boot of this platform and no containers on the switch will init successfully.
- How I did it
Referenced the platform specification files and updated platform.json
- How to verify it
Install master version of SONiC on MSN4410
Boot switch and verify swss is successfully running using docker ps
What I did:
Updated 7260 MMU Profile based on latest MSFT Tier 1 Tomahawk2_MMU_Setting_48x100G_40m_16x100G_300m_v1.0 and
TH2_PGHdrm_MSFT.
How I verify:
Made sure image is up/traffic is flowing/mmu dump looked fine.
SAI qos test need will be updated to support this SKU.
- Why I did it
Enhance the Python3 support for platform API. Originally, some platform APIs call SDK API which didn't support Python 3. Now the Python 3 APIs have been supported in SDK 4.4.3XXX, Python3 is completely supported by platform API
- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly
- How to verify it
Manually test and run regression platform test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Process pcied failed on Arista-7170-32CD-C32
```
root@sonic:/# supervisorctl
chassis_db_init EXITED Jun 03 08:48 AM
dependent-startup EXITED Jun 03 08:48 AM
ledd RUNNING pid 28, uptime 3:07:49
lm-sensors EXITED Jun 03 08:48 AM
pcied FATAL Exited too quickly (process log may have details)
```
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
Why I did it
7050 S4Q31 mmu configuration is missing ALPM configurations, causing not enough memory reserved for routes. Orchagent crashes on a nightly testbed with 6400 route entries.
How I did it
Add the missing ALPM configurations.
How to verify it
Load the configuration on testbed and verified new configuration exists and no more crash.
Signed-off-by: Ying Xie ying.xie@microsoft.com
#### Why I did it
- After [sonic-linux-kernel#177](https://github.com/Azure/sonic-linux-kernel/pull/177) changes, the I2C mux channels of Baseboard and Switchboard CPLDs are moved from i2c-4 and i2c-5 to i2c-36 and i2c-37 respectively.
- This caused QSFP driver initialization of i2c-36 to i2c-41 to fail causing the ports from Ethernet208 to Ethernet248 fail.
#### How I did it
- The fix to this problem is to change the order of QSFP driver initialization to I2C mux channels.
- Instead of the order i2c-10 to i2c-41, the order i2c-4 to i2c-35 is being utilized.
- Also, need to change the i2c-mux-channel number for Baseboard CPLD and switchboard CPLD in scripts to access them.
hwskus.
Why I did it
For multi-asic platforms, orchagent process in swss docker is started by passing device_ids(or asic_ids).
Each swss docker starts orchagent with a different device_id. This device_id is passed as Hardware info to syncd. For syncd to start with the right hwinfo, context_config.json is passed as an argument. context_config.json file is looked up to get the hwinfo information.
sonic-sairedis PRs required for this diff to be used to bring up multi-asic VS:
Azure/sonic-sairedis#830Azure/sonic-sairedis#832
How I did it
Add context_config.json for each asic in the same structure as provided here: https://github.com/Azure/sonic-sairedis/blob/master/lib/src/context_config.json
Each asic context_config.json will have different hwinfo string.
hwinfo string will be same as device id retrieved from asic.conf file.
Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
#### Why I did it
Support 2km cables for Microsoft SKUs
#### How I did it
1. Update pg_profile_lookup.ini with 2000m cable supported
2. Update buffer configuration for t1 with uplink cable 2000m
- For SN3800 platform:
- C64:
- t0: 32 100G down links and 32 100G up links.
- t1: 56 100G down links and 8 100G up links with 2 km cable.
- D112C8: 112 50G down links and 8 100G up links.
- D24C52: 24 50G down links, 20 100G down links, and 32 100G up links.
- D28C50: 28 50G down links, 18 100G down links, and 32 100G up links.
- For SN2700 platform:
- D48C8: 48 50G down links and 8 100G up links.
- C32:
- t0: 16 100G down links and 16 100G up links.
- t1: 24 100G down links and 8 100G up links with 2 km cable.
- For SN4600C platform:
- D112C8: 112 50G down links and 8 100G up links.
#### How to verify it
Run regression test
This PR contains the following changes
Original Arista-7050-QX-32S sku (32x40G ports) has been renamed to Arista-7050QX32S-Q32
Arista-7050-QX-32S is symlinked to Arista-7050QX-32S-S4Q31 (4x10G, 31x40G ports)
Signed-off-by: Neetha John <nejo@microsoft.com>
**- Why I did it**
- To fix failed test cases of Seastone-DX010 platform APIs that found on [platform_tests](https://github.com/Azure/sonic-mgmt/tree/master/tests/platform_tests/api) script
**- How I did it**
1. Add device/celestica/x86_64-cel_seastone-r0/platform.json
2. Update functions to support python3.7
3. Add more functions follow latest sonic_platform_base
4. Fix the bug
Why I did it
Arista-7260CX3-Q64 is missing T1 MMU configuration.
How I did it
Define T1 MMU configuration for Arista-7260CX3-Q64.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
#### Why I did it
To get rid of obsolete code
#### How I did it
Removed plugins folder from device/barefoot
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
#### Why I did it
The label for PSU related sensors on the Spectrum-2 platform is not aligned with the physical location of the PSU.
#### How I did it
Update the label in the sensor conf file for those relevant platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
PG profile settings need to be aligned with Arista-7050-QX-32S
How I did it
Copy over the current settings from Arista-7050-QX-32S and define params for 10G and 1G speeds as well
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Need proper MMU and Qos settings for Arista-7050QX-32S-S4Q31
How I did it
Updated the settings based on Arista-7050-QX-32S
#### Why I did it
Add initial support of SN4800 platform for Mellanox ASIC simulation device.
NOTE: This is work in progress and not full support of the platform.
#### How I did it
Add new folders for SN4800 with zero ports based on SN4700 Spectrum-3 switch.
#### Why I did it
Improve readability of `show environment` output.
#### How I did it
In all sensors.conf, give the customized labels according to HW specifications for each model.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
- Why I did it
Add initial support of SN4800 platform .
NOTE: This is work in progress and not full support of the platform.
- How I did it
Add new folders for SN4800 with zero ports based on SN4700 Spectrum-3 switch.
- How to verify it
Simulator device was tested. See #7448
Add support for Accton as9726-32d platform
This pull request is based on as9716-32d, so I reference as9716-32d to create new model: as9726-32d.
This module do not need led driver to control led, FPGA can handle it.
I also implement API2.0(sonic_platform) for this model, CPLD driver, PSU driver, Fan driver to control these HW behavior.
Platform library changes
- Fix the use of /proc/modules during testing, fixes#7463
- Add `libsfp-eeprom.so` build to read/write xcvr eeproms in C
- Add some more reboot-cause information
- Write down temperature hw thresholds to the sensors
- Report software thresholds through platform api
- Writ `port_name sysfs` file of optoe`
- Tests enhancements
- Fix dependency issues for chassis provisioning
Platform configuration changes
- Add `pcie.yaml` configuration for a few platforms
- Mount `libsfp-eeprom.so` inside `pmon`
- Fix `Arista-7050SX3-48C8` and `Arista-7050SX3-48YC8' platform and hwsku
- Miscellaneous fixes
Co-authored-by: Boyang Yu <byu@arista.com>
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
#### Why I did it
MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0
#### How I did it
Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
- Why I did it
Enable VXLAN src port range configuration via SAI profile for Mellanox-SN3800-D28C49S1 SKU
- How I did it
Added SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 configuration to appropriate sai.profile
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
#### Why I did it
Current platform.json lacks some peripheral device related facts, like chassis/fan/pasu/drawer/thermal/components names, numbers, etc.
#### How I did it
Add platform device facts to the platform.json file
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
#### Why I did it
- To start support of dynamic port breakout as the norm for Arista platforms.
- Add a DPB hwsku for the 7060CX-32S
#### How I did it
- Expand platform.json for the 7060CX-32S
- Added a new hwsku specifically for DPB
- Added a flex Broadcom configuration
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
This change introduces dynamic port breakout (DPB) for Arista 7050 QX32 model by adding a new SKU suffixed with `-Flex`.
The breakout configuration allowed is the same as in mainline Arista EOS, i.e. 24 first ports are allowed to be used in 4x10G in addition to the default 40G mode. The last 8 ports are fixed to 40G. This is due to ASIC limitations of a total of 104 max ports.
**NOTE**: As described in https://github.com/aristanetworks/sonic/issues/30#issuecomment-820584113 front panel LEDs are likely not working when operating in breakout mode. It is not clear if the LEDs work correctly in 40G mode as I have not had a chance to physically inspect the switch with this patch.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Set hierarchical ecmp level to 2 instead of 3. Based on CS00011833367, ecmp level must be set to 2.
This is already handled for TH2 platforms. Change is required only for TD3
Co-authored-by: Ubuntu <prsunny@prince-vm.vzw1i4tqyeburcdz5lrgulxi2c.yx.internal.cloudapp.net>
Fix to the correct value for all SPC1 devices.
For 10G added 10GB_CX4_XAUI, 10GB_KX4, 10GB_KR, 10GB_SR and 10GB_ER_LR
For 50G added 50GB_SR2
This bitmask represents all the options available for interface type and some were missing.
Note: it was working just fine if you were setting the value from SONiC CLI but not from the default SAI Profile.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
- xcvrd crash was seen in latest 201811 images.
- For Dell S6100,API 2.0 uses poll mode while 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in 1.0 to poll mode.
The platform name for MSN4600C in sfputil pliugin is not complete: "x86_64-mlnx_msn4600c" -> "x86_64-mlnx_msn4600c-r0"
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add missed files for dynamic buffer calculation for ACS-MSN3420 and ACS-MSN4410
- How I did it
asic_table.j2: Add mapping from platform to ASIC
Add buffer_dynamic.json.j2 for ACS-MSN4410.
- How to verify it
Check whether the dynamic buffer calculation daemon starts successfully.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
These ports are being enumerated by the latest SAI. But they are not defined in port_config.ini.
SONiC end up trying to delete these 3 ports and hit SAI error and crash.
How I did it
Add the GbE and the 2 HiGig ports in the port_config.ini.
How to verify it
Put the port_config.ini on a device crashing with port deleting. load minigraph and the crash stopped.
Signed-off-by: Ying Xie ying.xie@microsoft.com
* 7260cx3 DualToR config.bcm support based on DualToR setting in device metadata at boot time.
For HWSKU Arista-7260CX3-C64 the MMU setting SOC for T0/T1 is also combined into the config.bcm.j2 logic so use just one config file and adding delta based on Switch Roles.
Dynamic Port Breakout fall in case "autoneg" field exist in config_db.
- How I did it
Added "autoneg" field in sonic-port yang model.
- How to verify it
Add "autoneg" field into config_db like this:
"Ethernet8": {
"index": "2",
"lanes": "8,9,10,11",
"fec": "rs",
"pfc_asym": "off",
"mtu": "9100",
"alias": "Ethernet8",
"admin_status": "up",
"autoneg": "on",
"speed": "100000",
},
The file device/mellanox/x86_64-mlnx_msn4410-r0/plugins/sfputil.py is not a software link for device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py. And it is still using python2 syntex which causes some SFP CLI error. The PR is to change it to a softlink and add 4410 support in device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py.
Incorporate the below changes in DellEMC Z9332F platform:
- Implemented watchdog platform API support
- Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
- Change return type of SFP methods to match specification in sonic_platform_common/sfp_base.py
- Added platform.json file in device directory.
Co-authored-by: V P Subramaniam <Subramaniam_Vellalap@dell.com>
#### Why I did it
- The xcvrd service requires an event detection function, unplug or plug in the transceiver.
#### How I did it
- Add sysfs interrupt to notify userspace app of external interrupt
- Implement get_change_event() in chassis api.
- Also begin installing Python 3 sonic-platform package for Celestica platforms
#### Why I did it
Additional file for DPB in order to support SKU SN2700-D40C8S8 on master
#### How I did it
Add hwsku.json file
#### How to verify it
Enforce "Mellanox-SN2700-D40C8S8 SKU on Master and see it works as expected, meaning:
Port 1/3 will be used as 4x10G
Port 2/4 - Not exist (blocked since 1 and 3 split to 4)
Port 7/8/9/10/23/24/25/26 will used as 100G
All other ports will be used as 2x50G
This PR should be added on top of PR:
https://github.com/Azure/sonic-buildimage/pull/6876
#### Description for the changelog
Adding hwsku.json file to SN2700-D40C8S8 SKU
As booting on DCS-7060DX4-32 would use the default sku of DCS-7060PX4-32 which is not compatible,
thus move some files around to properly separate the configurations that are device specific.
Signed-off-by: Samuel Angebault <staphylo@arista.com>
#### Why I did it
Change buffer config for new SKU Mellanox-SN2700-D40C8S8
#### How I did it
Reuse the buffer config of SKU Mellanox-SN2700-D48C8
#### How to verify it
Run sonic-mgmt qos test and all passed
- Why I did it
Fix the build and fix the SN4600 DPB support
- How I did it
Fix port configuration file for SN4600 based on recent changes
- How to verify it
System bringup is completed, all interfaces are up.
Platform tests suits all is passing.
- Why I did it
To add support for the dynamic breakout on Mellanox platform x86_64-mlnx_msn4600
- How I did it
Add the relevant files describing Mellanox platform x86_64-mlnx_msn4600 breakout modes to a new device folder.
- How to verify it
System bringup is completed, all interfaces are up.
Platform tests suits all is passing.
- Why I did it
To fix PCIEd errors in log.
- How I did it
Update pcie.yaml with the right PCI addresses.
- How to verify it
Check logs, operation occurs each minute.
Signed-off-by: liora <liora@nvidia.com>
To fix [DPB| wrong aliases for interfaces](https://github.com/Azure/sonic-buildimage/issues/6024) issue, implimented flexible alias support [design doc](https://github.com/Azure/SONiC/pull/749)
> [[dpb|config] Fix the validation logic of breakout mode](https://github.com/Azure/sonic-utilities/pull/1440) depends on this
#### How I did it
1. Removed `"alias_at_lanes"` from port-configuration file(i.e. platfrom.json)
2. Added dictionary to "breakout_modes" values. This defines the breakout modes available on the platform for this parent port, and it maps to the alias list. The alias list presents the alias names for individual ports in order under this breakout mode.
```
{
"interfaces": {
"Ethernet0": {
"index": "1,1,1,1",
"lanes": "0,1,2,3",
"breakout_modes": {
"1x100G[40G]": ["Eth1"],
"2x50G": ["Eth1/1", "Eth1/2"],
"4x25G[10G]": ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"],
"2x25G(2)+1x50G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"],
"1x50G(2)+2x25G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"]
}
}
}
```
#### How to verify it
`config interface breakout`
Signed-off-by: Sangita Maity <samaity@linkedin.com>
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.
It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
- Why I did it
Add support for new 64x200G SN4600 systems
- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.
- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100.
Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100
#### Why I did it
Add new SKU for SN2700 Mellanox system that supports the following port configuration:
8 X 100G
40 X 50G
8 X 10G
#### How I did it
Add new Folder - "Mellanox-SN2700-D40C8S8" under /sonic-buildimage/device/mellanox/x86_64-mlnx_msn2700-r0/
that contains the relevant files supporting this SKU
the buffers are based on SKU: D48C8 . Later on it will be configured specific for this SKU
#### How to verify it
Bring up the image, run "show interface status" and make sure that all ports are up and reflect the following requirement:
Port 1/3 will be used as 4x10G
Port 2/4 - Not exist (blocked since 1 and 3 split to 4)
Port 7/8/9/10/23/24/25/26 will used as 100G
All other ports will be used as 2x50G
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [X] 202012
#### Description for the changelog
Support new SKU under the name of SN2700-D40C8S8
- Why I did it
Current mutli-asic vs hwsku consists of 6 asics with each asic having 32 interfaces. When bringing this up, below issue was seen:
When all 32 interfaces(sonic interfaces and linux interface) are set to 9100 mtu, DMA error is seen "DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:06:03.0" which can be fixed by updating swiotlb=65536 in /host/grub/grub.cfg .In order to keep multi-asic VS lighter and easier to bring up and test, new hwsku 'msft_four_asic_vs' is added to represent 4-asic hwsku with 2 frontend asics and 2 backend asics and each asic having 8 interfaces interconnected by port-channels.
- How I did it
Add msft_four_asic_hwsku directory to have the right number of directories (4) and update port_config.ini and lanemap.ini files to include 8 ports information.
Add topology.sh script to create the internal asic-asic connectivity.
- How to verify it
Update asic.conf with the 4 asic information as below and build sonic-vs.img:
NUM_ASIC=4
DEV_ID_ASIC_0=0
DEV_ID_ASIC_1=1
DEV_ID_ASIC_2=2
DEV_ID_ASIC_3=3
Modify sonic_multiasic.xml to have 8 front panel interfaces.
create virtual switch using "sudo virsh sonic_mutliasic.xml" command.
Start topology service and Load config_db files for switch and each asic.
Ensure that that all internal interfaces and port_channels are coming up.
multi-asic vs testbed:
Bring up mutli-asic VS testbed with a multi-asic image(asic.conf updated to 4 asics) and using t1-lag topology.
./testbed-cli.sh -t vtestbed.csv -m veos_vtb -k ceos add-topo vms-kvm-four-asic-t1-lag password.txt
Load minigraph/config_dbs.
Ensure all internal and external interfaces come up.
No change on single asic vs.
- Improve sonic-mgmt platform test suite pass rate
- Improve coverage of platform unit tests
- Provide platform specific reboot logic as per platform porting guide
- Fix bug due to pcie.yaml file being located in the wrong directory
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437
- How I did it
Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
- Add support for more variants of `DCS-7280CR3-32[PD]4`
- Add Supervisor to Linecard consutil support
- Complete Watchdog platform API support
- Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
- Fix SEU management on `DCS-7060CX-32S`
- Allow kernel modules to build up to linux 5.10
- Rename led color `orange` to `amber`
- Miscellaneous fixes
Update topology script to retrieve hwsku from minigraph
if hwsku information is not available in config_db.
Fix clean up of interfaces in msft_multi_asic_vs hwsku
topology script.
- Why I did it
When bringing up multi-asic VS switch, topology service is started during boot up.
Topology service starts a shell script which runs the topology script present in /usr/share/sonic/device// directory. To invoke hwsku specific script, the topology script tries to retrieve hwsku information from config_db.
During initial boot up config_db might not be populated. In order to start topology service before config_db is updated,
update topology script to get hwsku information from minigraph.xml if it is available.
This will be helpful to bring up multi-asic VS testbed by loading minigraph and starting topology service.
- How I did it
Update topology.sh script to retrieve hwsku information from minigraph.xml.
Fix clean up function on msft_multi_asic_vs toplogy script.
- How to verify it
single-asic VS - no change; topology service is only enabled for multi-asic VS.
multi-asic VS - Bring up multi-asic VS image, copy minigraph to vs image, start topology service. Topology service should be successful.
to test clean up function fix, start topology service - make sure interfaces are created and moved to the right namespaces.
stop topology service - make sure namespace do not have any interface and all front end interfaces are present in default namespace.
- Why I did it
Mellanox-SN4600C-D112C8 SKU is not configured properly.
It should have 112 50G interfaces and 8 100G interfaces as described on this PR.
- How I did it
Modify sai_profile, port_config.ini and hwsku.json for DPB.
- How to verify it
Apply this HwSKU to a MSN4600C Mellanox platform.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
- Why I did it
Support shared headroom pool
Signed-off-by: Stephen Sun stephens@nvidia.com
- How I did it
Port configurations for SKUs based on 2700/3800 platform from 201911
For SN3800 platform:
C64: 32 100G down links and 32 100G up links.
D112C8: 112 50G down links and 8 100G up links.
D24C52: 24 50G down links, 20 100G down links, and 32 100G up links.
D28C50: 28 50G down links, 18 100G down links, and 32 100G up links.
For SN2700 platform:
D48C8: 48 50G down links and 8 100G up links
C32: 16 100G downlinks and 16 100G uplinks
Add configuration for Mellanox-SN4600C-D112C8
112 50G down links and 8 100G up links.
- How to verify it
Run regression test.
- Why I did it
Enable platform API tests to run successfully by providing required test infrastructure files along with supporting changes.
- How I did it
Added platform.json along with supporting changes.
- Addition of pcie.yaml supporting pcied
- Addition of Real fan drawer support vs Virtual
- Removal of python2 wheel with support in place for python3
- supporting changes platform api tests
Submodule commits included:
* src/sonic-platform-common 6ad0004...bd4dc03 (1):
> [sonic_sfp/qsfp_dd.py] Update DOM capability method name to align with other drivers (#163)
Also align all calling function names to match.
Port_config update for hwsku 7050CX3-32S-C3 - add two 10G ports.
This change is added to fix issue of "PortsOrch initialization failure" seen by previous removal of these 10G ports.
Tested on the device with new minigraph, and the PortsOrch initialization failure is not seen.
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.
**- How I did it**
Update the name of the function globally for all calls into the SFF-8436 driver.
Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
**- Why I did it**
To incorporate the below changes in DellEMC S6100, S6000 platforms.
- S6100, S6000:
- Enable 'thermalctld'
- Implement DeviceBase methods (presence, status, model, serial) for Fantray and Component
- Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
- Implement ‘get_status’ method for Fantray
- Implement ‘get_temperature’, ‘get_temperature_high_threshold’, ‘get_voltage_high_threshold’, ‘get_voltage_low_threshold’ methods for PSU
- Implement ‘get_status_led’, ‘set_status_led’ methods for Chassis
- SFP:
- Make EEPROM read both Python2 and Python3 compatible
- Fix ‘get_tx_disable_channel’ method’s return type
- Implement ‘tx_disable’, ‘tx_disable_channel’ and ‘set_power_override’ methods
- S6000:
- Move PSU thermal sensors from Chassis to respective PSU
- Make available the data of both Fans present in each Fantray
**- How I did it**
- Remove 'skip_thermalctld:true' in pmon_daemon_control.json
- Implement the platform API methods in the respective device files
- Use `bytearray` for data read from transceiver EEPROM
- Change return type of 'get_tx_disable_channel' to match specification in sonic_platform_common/sfp_base.py
**- Why I did it**
- Add as4630_54pe SDK configuration parameters.
**- How I did it**
- Add l3_alpm_enable=2 and ipv6_lpm_128b_enable=1 in hx5-as4630-48x1G+4x25G+2x100G.bcm.
Co-authored-by: derek_sun <derek_sun@edge-core>
Co-authored-by: derek_sun <ecsonic@edge-core.com>
Current mutli-asic vs hwsku consists of 6 asics with each asic having 32 interfaces.
When bringing this up, below issue was seen:
When all 32 interfaces in each namespace (sonic interfaces and linux interface) is set to 9100 mtu, DMA error is seen "DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:06:03.0" which can be fixed by updating swiotlb=65536 in /host/grub/grub.cfg .
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
port_config.ini for HWSKU Arista-7050CX3-32S-C32 has missing speed column and duplicated lanes.
The incorrect speed causes issues in Orchagent RESTARTCHECK as the below task remains as the remaining item during swss shutdown.
BRCM SDK 6.5.21 includes firmware updates (premier cancun) for TD3 platforms. The firmware update is required on TD3 platforms, which is packaged with BCMSAI 4.3.0.10.
**- How I did it**
Updated BCM config with a new variable that specifies the firmware package path. SDK uses this path to locate firmware packages and load during cold boot.
**- How to verify it**
bsv
BRCM SAI ver: [4.3.0.10], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.21] CANCUN ver: [5.3.3]
drivshell>
admin@str2-7050cx3-acs-02:~$ bcmsh
Press Enter to show prompt.
Press Ctrl+C to exit.
NOTICE: Only one bcmsh or bcmcmd can connect to the shell at same time.
drivshell>cancun stat
cancun stat
UNIT0 CANCUN:
CIH: LOADED
Ver: 06.06.01
CMH: LOADED
Ver: 06.06.01
SDK Ver: 06.05.21
CCH: LOADED
Ver: 06.06.01
SDK Ver: 06.05.21
CEH: LOADED
Ver: 06.06.01
SDK Ver: 06.05.21
drivshell>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN4600C
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
NOTE: breakout to 4 is currently not available as of missing functionality in DPB implementation.
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN4410
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN3700
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN2410
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
NOTE: breakout to 4 is currently not available as of missing functionality in DPB implementation.
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN2100
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN2010
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN3800
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
[DPB] added capability files for SN2700 platform
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout
- How I did it
Created capability files according to platform specification SN2700
- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage
NOTE: breakout to 4 is currently not available as of missing functionality in DPB implementation.
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.
**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
In order to build up device hierachy, PSU and module thermals are no longer child of chassis. PSU thermal belongs to PSU objects and SFP thermals belong to SFP object now. Need align this change in platform.json. Move thermal objects to correct parent device
fix platform driver breakage due to python3 upgrade and fix load minigraph errors with config load_minigraph -y
**- How I did it**
added python3-smbus to the pmon docker template since the previous was python2 specific
fixed additional "ord" python2 specific code
fixed the jinja templates used by qos reload - the template logic required data to be parsed
**- How to verify it**
run "show platform XXX" commands and verify output
run "sudo config load_minigraph -y" and verify configuration
run "show interfaces XXX" and verify output
Co-authored-by: Carl Keene <keene@nokia.com>
Y* profile is the name pattern for p4 programs that developed for the current platform. The difference between them is features enabled and resource reservation.
For this platform, it is expected to work on any Y profile. but after the latest changes, the first Y profile is always used.
Prevent system-healthd from service from failing at boot time due to missing configuration.
Also adds basic support for healthd.
The following caveat exists with this placeholder configuration:
- No PSU monitoring (sensors/fans)
- No ASIC temperature monitoring
- Why I did it
The sai.profile file in kvm images overrides the warmboot file with path /var/cache/sai_warmboot.bin. Since the directory /var/cache is not mounted in syncd, it will be cleared in an image upgrade, the warm-reboot image upgrade will fail if the file is put in the directory.
Fix#6183
- How I did it
Remove the path that overrides the default path. The warmboot file path will then be the default value /var/warmboot/sai-warmboot.bin. Since /var/warmboot/ is mounted by /host/warmboot/ in the host, it could survive an image upgrade.
- How to verify it
Tested warm reboot upgrading kvm image locally.
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338)
- [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361)
- [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973)
**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
- `dynamic` for the dynamic buffer calculation model
- `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
- ASIC_TABLE in platform/\<vendor\>/asic_table.j2
- PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
- PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
- DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
- Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and rendering them when the system starts
4. Update the CLI options for buffermgrd so that it can start with dynamic mode
5. Fetches the ASIC vendor name in orchagent:
- fetch the vendor name when creates the docker and pass it as a docker environment variable
- `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
Update the buffer settings for dynamic buffer calculation
python2 is end of life and SONiC is going to support python3. This PR is going to support:
1. Mellanox SONiC platform API python3 support
2. Install both python2 and python3 verson of Mellanox SONiC platform API or pmon and host side
platform.json is needed for sonic-mgmt testing. Also in the future it will be used as part of dynamic port breakout.
Also removed the folder symlink for BlackhawkDD because it has a different platform.json than BlackhawkO.
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
Fixes#6126.
There is a bug in getting the path of voltage, current and power. The
list object is directly converted to string to format the file path. As
a result, read_txt_file will get None value and a WARNING will be
recorded. This commit fix the issue.
Signed-off-by: bingwang <bingwang@microsoft.com>
Current support for the 7060PX4-32 and 7060DX4 was broken.
With this change, ports are now linking fine.
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
Rename chip name dps1900-i2c-X-58 to pmbus-i2c-X-58 in sensors.conf for Arista 7170 due to latest updates for Arista driver submodules #5686. After these updates adapter dps1900 was renamed and sensors.conf file is not applied properly. Issue was observed started from BFN SONiC image 16.
Signed-off-by: Nazar Tkachuk <nazarx.tkachuk@intel.com>
Submodule updates include the following commits:
* src/sonic-utilities 9dc58ea...f9eb739 (18):
> Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260)
> [generate_dump] Ignoring file/directory not found Errors (#1201)
> Fixed porstat rate and util issues (#1140)
> fix error: interface counters is mismatch after warm-reboot (#1099)
> Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255)
> [acl-loader] Make list sorting compliant with Python 3 (#1257)
> Replace hard-coded fast-reboot with variable. And some typo corrections (#1254)
> [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247)
> Remove unnecessary conversions to list() and calls to dict.keys() (#1243)
> Clean up LGTM alerts (#1239)
> Add 'requests' as install dependency in setup.py (#1240)
> Convert to Python 3 (#1128)
> Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238)
> [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233)
> Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224)
> [cli]: NAT show commands newline issue after migrated to Python3 (#1204)
> [doc]: Update Command-Reference.md (#1231)
> Added 'import sys' in feature.py file (#1232)
* src/sonic-py-swsssdk 9d9f0c6...1664be9 (2):
> Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96)
> FieldValueMap `contains`(`in`) will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94)
- Also fix Python 3-related issues:
- Use integer (floor) division in config_samples.py (sonic-config-engine)
- Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform
- Update all platform plugins to be compatible with both Python 2 and Python 3
- Remove shebangs from plugins files which are not intended to be executable
- Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict
- Remove trailing whitespace from plugins files
Fix 259 alerts reported by the LGTM tool:
- 245 for Unused import
- 7 for Testing equality to None
- 5 for Duplicate key in dict literal
- 1 for Module is imported more than once
- 1 for Unused local variable
**- Why I did it**
We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.
**- How I did it**
- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
- Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
Need A mmu configuration to get the device going without generating lots of warnings.
Similar to dummy MMU configuration for Arista-7050CX3-32S-C32, this configuration will need to be updated with correct numbers. This MMU configuration is copied from 7060 comparable hwsku.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This change introduces PDDF which is described here: https://github.com/Azure/SONiC/pull/536
Most of the platform bring up effort goes in developing the platform device drivers, SONiC platform APIs and validating them. Typically each platform vendor writes their own drivers and platform APIs which is very tailor made to that platform. This involves writing code, building, installing it on the target platform devices and testing. Many of the details of the platform are hard coded into these drivers, from the HW spec. They go through this cycle repetitively till everything works fine, and is validated before upstreaming the code.
PDDF aims to make this platform driver and platform APIs development process much simpler by providing a data driven development framework. This is enabled by:
JSON descriptor files for platform data
Generic data-driven drivers for various devices
Generic SONiC platform APIs
Vendor specific extensions for customisation and extensibility
Signed-off-by: Fuzail Khan <fuzail.khan@broadcom.com>
* [Juniper] Platform bug fixes / improvements
This patch set introduces the following changes for
the two platforms.
- QFX5210
- Fixes a driver bug related to reboot notifier
- Disable pcied
- Introduces a wrapper script for fast / warm reboots
for unloading the driver containing reboot handler
- Support for PSM4 optics in media_settings
- QFX5200
- BCM configuration file updates
- Bug fixes for EM policy
- Fixes a driver bug related to reboot notifier
- Introduces a wrapper script for fast / warm reboots
for unloading the driver containing reboot handler
- Disable pcied
- Support for PSM4 optics
Signed-off-by: Ciju Rajan K <crajank@juniper.net>
Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates
Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created
Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
On Arista-7050CX3-32S-C32, there are constant stream of errors like following.
Nov 3 21:56:24.415190 str2-7050cx3-acs-06 NOTICE swss#orchagent: :- registerInWdDb: No lossless TC found on port Ethernet68
Which causes:
loganalyzer to claim test failed.
leaving the system without MMU configuration. Which couldn't be good for any IO test.
- How I did it
Added these MMU configuraions are copied from another platform and guaranteed to be incorrect for hwsku Arista-7050CX3-32S-C32.
Adding them so that we have A MMU configuration and the system won't throw a whole bunch of errors and leave MMU unconfigured. The correct MMU configuration will come later.
This configuration is definitely not suitable for testing system performance or QoS behavior.
Signed-off-by: Ying Xie ying.xie@microsoft.com
- How to verify it
Test will have chance to pass. Ran a few test that would fail otherwise.
This PR has a dependency on community change to move PCIe config files from $PLATFORM/plugin folder to $PLATFORM/ folder
- Why I did it
To support PCIed daemon on Mellanox platforms
- How I did it
Add PCIed config yaml files for all Mellanox platforms
Update pmon daemon config files for SimX platforms
We don't need a custom platform reboot on Clearwater2(Ms). They are expected to be rebooted via a normal linux soft reboot.
Remove symlink to the arista common platform reboot for those 2 platforms.
**- Why I did it**
Converted two SP model to single pool model and modified the buffer size.
**- How I did it**
Changed buffer_default settings for all the DellEMC Z9264f HWSKU's.
**- How to verify it**
Check SP register values in NPU shell.
**- Which release branch to backport (provide reason below if selected)**
Need to be cherry picked for 201911 branch.
- Enable thermalctld support for our platforms
- Fix Chassis.get_num_sfp which had an off by one
- Implement read_eeprom and write_eeprom in SfpBase
- Refactor of Psus and PsuSlots. Psus they are now detected and metadata reported
- Improvements to modular support
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
The `get_serial_number()` method in the ChassisBase and ModuleBase classes was redundant, as the `get_serial()` method is inherited from the DeviceBase class. This method was removed from the base classes in sonic-platform-common and the submodule was updated in https://github.com/Azure/sonic-buildimage/pull/5625.
This PR aligns the existing vendor platform API implementations to remove the `get_serial_number()` methods and ensure the `get_serial()` methods are implemented, if they weren't previously.
Note that this PR does not modify the Dell platform API implementations, as this will be handled as part of https://github.com/Azure/sonic-buildimage/pull/5609
- Make DellEMC platform modules Python3 compliant.
- Change return type of PSU Platform APIs in DellEMC Z9264, S5232 and Thermal Platform APIs in S5232 to 'float'.
- Remove multiple copies of pcisysfs.py.
- PEP8 style changes for utility scripts.
- Build and install Python3 version of sonic_platform package.
- Fix minor Platform API issues.
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.
**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.
Signed-off-by: Honggang Xu <hxu@arista.com>
* system health first commit
* system health daemon first commit
* Finish healthd
* Changes due to lower layer logic change
* Get ASIC temperature from TEMPERATURE_INFO table
* Add system health make rule and service files
* fix bugs found during manual test
* Change make file to install system-health library to host
* Set system LED to blink on bootup time
* Caught exceptions in system health checker to make it more robust
* fix issue that fan/psu presence will always be true
* fix issue for external checker
* move system-health service to right after rc-local service
* Set system-health service start after database service
* Get system up time via /proc/uptime
* Provide more information in stat for CLI to use
* fix typo
* Set default category to External for external checker
* If external checker reported OK, save it to stat too
* Trim string for external checker output
* fix issue: PSU voltage check always return OK
* Add unit test cases for system health library
* Fix LGTM warnings
* fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon
* Remove boot_timeout configuration because it will get from monit config file
* Fix argument miss
* fix unit test failure
* fix issue: summary status is not correct
* Fix format issues found in code review
* rename th to threshold to make it clearer
* Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead
* Fix unit test failure
* Fix LGTM alert
* Fix LGTM alert
* Fix review comments
* Fix review comment
* 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker
* Ignore check for unknown service type
* Fix unit test issue
* Rename user define checker to user defined checker
* Rename user_define_checkers to user_defined_checkers for configuration file
* Renmae file user_define_checker.py -> user_defined_checker.py
* Fix typo
* Adjust import order for config.py
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
* Adjust import order for src/system-health/health_checker/hardware_checker.py
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
* Adjust import order for src/system-health/scripts/healthd
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
* Adjust import orders in src/system-health/tests/test_system_health.py
* Fix typo
* Add new line after import
* If system health configuration file not exist, healthd should exit
* Fix indent and enable pytest coverage
* Fix typo
* Fix typo
* Remove global logger and use log functions inherited from super class
* Change info level logger to notice level
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>