Signed-off-by: bingwang <wang.bing@microsoft.com>
Why I did it
This PR brings two changes
Add lossy PG profile for PG2 and PG6 on T1 for ports between T1 and T2.
After PR Update qos config to clear queues for bounced back traffic #10176 , the DSCP_TO_TC_MAP and TC_TO_PG_MAP is updated when remapping is enable
DSCP_TO_TC_MAP
Before After Why do this change
"2" : "1" "2" : "2" Only change for leaf router to map DSCP 2 to TC 2 as TC 2 will be used for lossless TC
"6" : "1" "6" : "6" Only change for leaf router to map DSCP 6 to TC 6 as TC 6 will be used for lossless TC
TC_TO_PRIORITY_GROUP_MAP
Before After Why do this change
"2" : "0" "2" : "2" Only change for leaf router to map TC 2 to PG 2 as PG 2 will be used for lossless PG
"6" : "0" "6" : "6" Only change for leaf router to map TC 6 to PG 6 as PG 6 will be used for lossless PG
So, we have two new lossy PGs (2 and 6) for the T2 facing ports on T1, and two new lossless PGs (2 and 6) for the T0 facing port on T1.
However, there is no lossy PG profile for the T2 facing ports on T1. The lossless PGs for ports between T1 and T0 have been handled by buffermgrd .Therefore, We need to add lossy PG profiles for T2 facing ports on T1.
We don't have this issue on T0 because PG 2 and PG 6 are lossless PGs, and there is no lossy traffic mapped to PG 2 and PG 6
Map port level TC7 to PG0
Before the PCBB change, DSCP48 -> TC 6 -> PG 0.
After the PCBB change, DSCP48 -> TC 7 -> PG 7
Actually, we can map TC7 to PG0 to save a lossy PG.
How I did it
Update the qos and buffer template.
How to verify it
Verified by UT.
Why I did it
As part of PCBB changes, we need to enable 2 extra lossless queues. The changes in this PR are done to adjust only the reserved sizes on Th2 for the additional 2 lossless queues
Calculations are done based on 40 downlinks for T1 and 16 uplinks for dual ToR
How to verify it
Verified that the rendering works fine on Th2 dut
Unit tests have been updated to reflect the modified buffer sizes when pcbb is enabled. There are existing testcases that will test the original buffer sizes when pcbb is disabled. With these changes, was able to build sonic-config-engine wheel successfully
Signed-off-by: Neetha John <nejo@microsoft.com>
Signed-off-by: bingwang <bingwang@microsoft.com>
Why I did it
This PR is to add two extra lossless queues for bounced back traffic.
HLD sonic-net/SONiC#950
SKUs include
Arista-7050CX3-32S-C32
Arista-7050CX3-32S-D48C8
Arista-7260CX3-D108C8
Arista-7260CX3-C64
Arista-7260CX3-Q64
How I did it
Update the buffers.json.j2 template and buffers_config.j2 template to generate new BUFFER_QUEUE table.
For T1 devices, queue 2 and queue 6 are set as lossless queues on T0 facing ports.
For T0 devices, queue 2 and queue 6 are set as lossless queues on T1 facing ports.
Queue 7 is added as a new lossy queue as DSCP 48 is mapped to TC 7, and then mapped into Queue 7
How to verify it
Verified by UT
Verified by coping the new template and generate buffer config with sonic-cfggen
* [Tunnel PFC] Tests for adding property 'sai_remap_prio_on_tnl_egress'
Add tests for adding property 'sai_remap_prio_on_tnl_egress', this
property should only be added in dual tor environment.
Test done:
Run test test_j2files.py
Co-authored-by: richardyu <richardyu@contoso.com>
- Why I did it
1. SN2201 sai profile needs to be updated according to the latest hardware.
2. In the reboot script, need to use the common symbol link of the power_cycle sysfs instead of directly accessing it due to SN2201 sysfs is different than other platforms.
3. echo 1 > $SYSFS_PWR_CYCLE will trigger the reboot immediately, the following sleep 3 and echo 0 > $SYSFS_PWR_CYCLE will never be executed, can be removed.
- How I did it
1. Replace the SN2201 sai profile with the latest one.
2. In the platform_reboot script, replace the direct sysfs path with the symbol link path.
3. Remove the redundant code from platform_reboot
- How to verify it
Perform reboot on all the Nvidia platforms, and check all can be rebooted successfully.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* [Tunnel PFC] Add property for tunnel PFC
Replace the config.bcm file with j2 template file
- Add 'sai_remap_prio_on_tnl_egress=1' property when device metadata local
- Host subtype is 'dualtor'
- Change sai.profile foe the new config.bcm.j2
* Removed unused default_config.json
* Remove asic.conf file from HW SKUs directories as they are not used by upstream code
* Enable dynamic PCI ID identification on Otterlake2
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
What/Why I did:
Issue1: By setting up of ipvlan interface in interface-config.sh we are not tolerant to failures. Reason being interface-config.service is one-shot and do not have restart capability.
Scenario: For example if let's say database service goes in fail state then interface-services also gets failed because of dependency check but later database service gets restart but interface service will remain in stuck state and the ipvlan interface nevers get created.
Solution: Moved all the logic in database service from interface-config service which looks more align logically also since the namespace is created here and all the network setting (sysctl) are happening here.With this if database starts we recreate the interface.
Issue 2: Use of IPVLAN vs MACVLAN
Currently we are using ipvlan mode. However above failure scenario is not handle correctly by ipvlan mode. Once the ipvlan interface is created and ip address assign to it and if we restart interface-config or database (new PR) service Linux Kernel gives error "Error: Address already assigned to an ipvlan device." based on this:https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978Reason being if we do not do cleanup of ip address assignment (need to be unique for IPVLAN) it remains in Kernel Database and never goes to free pool even though namespace is deleted.
Solution: Considering this hard dependency of unique ip macvlan mode is better for us and since everything is managed by Linux Kernel and no dependency for on user configured IP address.
Issue3: Namespace database Service do not check reachability to Supervisor Redis Chassis Server.
Currently there is no explicit check as we never do Redis PING from namespace to Supervisor Redis Chassis Server. With this check it's possible we will start database and all other docker even though there is no connectivity and will hit the error/failure late in cycle
Solution: Added explicit PING from namespace that will check this reachability.
Issue 4:flushdb give exception when trying to accces Chassis Server DB over Unix Sokcet.
Solution: Handle gracefully via try..except and log the message.
Why I did it
add celestica belgite platform
How I did it
add belgite platform in celestica
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Co-authored-by: anjian <anjian@celestica.com>
Co-authored-by: sandycelestica <sandyli@celestica.com>
- Why I did it
Platform_reboot files for simx doesn't do aything different apart from calling /sbin/reboot. which is anyway done in the /usr/local/bin/reboot script i.e. the parent script which calls the platform specific reboot scripts if present.
Moreover, /sbin/reboot invoked in the platform specific reboot script is a non-blocking call and thus it returns back to the original script (although /sbin/reboot does it job in the background) and we see messages like this.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
To include ONIE version in show platform firmware status command output in DellEMC S6100 and Z9332f platforms.
How I did it
Include ‘ONIE’ in the list of components provided by platform APIs in DellEMC S6100 and Z9332f.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins in DellEMC S6100.
Why I did it
Fixes some pmon errors/warnings by providing missing configuration files
How I did it
Add missing pcie.yaml and sensors.conf for supported linecards
How to verify it
pcie-check should pass
sensors should display proper sensor names
* [PDDF] Rename temp for 7816/7326/7726
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
* Change naming to pddf device
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
Co-authored-by: ecsonic <ecsonic@edge-core.com>
Why I did it
The customer report of the PCIe Bus Errors upon the SDK initialization of as7816-64x.
How I did it
Based on the internal info and discussion, update "pcie_aspm=off" into ONIE_PLATFORM_EXTRA_CMDLINE_LINUX of installer.conf to resolve it.
Why I did it
The buffer pool & profile setting in buffer template was not correct and caused the errors like the following:
ERR swss#orchagent: :- parseReference: malformed reference:[BUFFER_PROFILE|ingress_lossless_profile]. Must not be surrounded by [ ]
How I did it
Fix the buffer pool & profile setting by removing "[]".
How to verify it
Loaded image with this fix in a switch and made sure the error was not seen anymore.
The v0.7.5 has bug fix for the support of gearbox port and macsec counters. It also includes a owl firmware update with owl.lz4.fw.1.94.0.bin.
How I did it
Update credo sai url for v0.7.5
Update gearbox_config.json with using firmware owl.lz4.fw.1.94.0.bin instead of owl.lz4.fw.1.92.1.bin
How to verify it
Test gearbox port and macsec counter successfully on A7280.
- Why I did it
There is a hardware bug that PSU voltage threshold sysfs returns incorrect value. The workaround is to call "sensor -s" to refresh it.
- How I did it
Call "sensor -s" when the threshold value is not incorrect and PSU is "DELTA 1100"
- How to verify it
Unit test and Manual test
This PR includes necessary changes for correct generating BUFFER_QUEUE values in DB. Changes are based on the schema.md
Why I did it
Change format of generating BUFFER_QUEUE in DB according to schema.md and yang-model.
Old format:
"BUFFER_QUEUE": {
"Ethernet0,Ethernet100,Ethernet104,Ethernet108,Ethernet112,Ethernet116,Ethernet12,Ethernet120,Ethernet124,Ethernet16,Ethernet20,Ethernet24,Ethernet28,Ethernet32,Ethernet36,Ethernet4,Ethernet40,Ethernet44,Ethernet48,Ethernet52,Ethernet56,Ethernet60,Ethernet64,Ethernet68,Ethernet72,Ethernet76,Ethernet8,Ethernet80,Ethernet84,Ethernet88,Ethernet92,Ethernet96|queue": {
"profile": "profile"
},
"Ethernet0,Ethernet100,Ethernet104,Ethernet108,Ethernet112,Ethernet116,Ethernet12,Ethernet120,Ethernet124,Ethernet16,Ethernet20,Ethernet24,Ethernet28,Ethernet32,Ethernet36,Ethernet4,Ethernet40,Ethernet44,Ethernet48,Ethernet52,Ethernet56,Ethernet60,Ethernet64,Ethernet68,Ethernet72,Ethernet76,Ethernet8,Ethernet80,Ethernet84,Ethernet88,Ethernet92,Ethernet96|queue": {
"profile": "profile"
}
},
New format:
"BUFFER_QUEUE": {
"Ethernet0|queue": {
"profile": "profile"
},
"Ethernet0|queue": {
"profile": "profile"
},
"Ethernet4|queue": {
"profile": "profile"
},
"Ethernet4|queue": {
"profile": "profile"
},
"Ethernet8|queue": {
"profile": "profile"
},
"Ethernet8|queue": {
"profile": "profile"
},
...
}
How I did it
Updated structure of buffers_defaults jinja templates.
Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com>
* [device config] Adding configuration for default route fallback
* Set sai_tunnel_underlay_route_mode attribute to fallback to default route if more specific route is unavailable.
Why I did it
Prevent from i2c bus to get locked.
How I did it
Add sysfs driver to access ioport.
Command to reset i2c mux:
echo 1 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Command to bring i2c mux out of reset:
echo 0 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
Why I did it
Support pddf to as4630/as7816/as7326
How I did it
Send needed file to the PR for these platform
How to verify it
Test sensors and show platform cmd.
root@as7326-56x-3:/home/admin# show platform psustatus
PSU Model Serial HW Rev Voltage (V) Current (A) Power (W) Status LED
PSU 1 FSF045-611 FSF0451912000505 N/A 12.06 5.50 66.00 OK green
PSU 2 FSF045-611 FSF0451912000568 N/A 12.00 5.50 66.00 OK green
root@as7326-56x-3:/home/admin# sensors
lm75-i2c-15-4a
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +35.5 C (high = +80.0 C, hyst = +75.0 C)
lm75-i2c-15-4b
Adapter: i2c-1-mux (chan_id 6)
CPU Board Temperature: +29.0 C (high = +80.0 C, hyst = +75.0 C)
fan_ctrl-i2c-11-66
Adapter: i2c-1-mux (chan_id 2)
fan1: 9100 RPM
fan2: 9400 RPM
fan3: 9300 RPM
fan4: 9600 RPM
fan5: 9000 RPM
fan6: 9100 RPM
fan7: 9100 RPM
fan8: 9300 RPM
fan9: 9200 RPM
fan10: 9400 RPM
fan11: 9200 RPM
fan12: 9400 RPM
pch_haswell-virtual-0
Adapter: Virtual device
temp1: +43.0 C
psu_pmbus-i2c-17-59
Adapter: i2c-1-mux (chan_id 0)
in3: +12.06 V
fan1: 6272 RPM
temp1: +37.0 C
power2: 60.00 W
curr2: +6.00 A
lm75-i2c-15-49
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +40.0 C (high = +80.0 C, hyst = +75.0 C)
lm75-i2c-15-48
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +39.0 C (high = +80.0 C, hyst = +75.0 C)
psu_pmbus-i2c-13-5b
Adapter: i2c-1-mux (chan_id 4)
in3: +12.00 V
fan1: 6144 RPM
temp1: +36.0 C
power2: 66.00 W
curr2: +5.50 A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 0: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 1: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 2: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 3: +50.0 C (high = +82.0 C, crit = +104.0 C)
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
Why I did it
Added platform.json file for N3248TE
How I did it
Defined the platform.json file with the required components under chassis.
How to verify it
validated the API 2.0 test suite
Co-authored-by: Arun LK <Arun_L_K@dell.com>
DellEMC: N3248TE platform API2.0 changes
Why I did it
N3248TE Platform API 2.0 changes
How I did it
Implemented the functional API's needed for Platform API 2.0
Added system_health_monitoring_config.json file
How to verify it
Used the API 2.0 test suite to validate the test cases.
Co-authored-by: Arun LK <Arun_L_K@dell.com>
When do "skip_thermalcltd: true" will let "show platform fan" and "show platform temp" fail.
When enable thermalctld, these cmd will work well.
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
- Why I did it
Update NVIDIA Copyright header to "mellanox" files which were changed since 1.1.2022
- How I did it
Update the copyright header
- How to verify it
Sanity tests and PR checkers.
The following changes are provided to support bullseye and the latest master
branch content.
- Accommodate relocated fan and thermal sysfs entries in bullseye
- Add support for chassis and PSU HW revision
Why I did it
Fix platform issues introduced by the bullseye kernel upgrade.
How I did it
Minor fixes to Nokia ixs7215 platform code
How to verify it
Execute the following CLI commands
show platform summary
show platform fan
show platform temperature
Why I did it
The current code assumes that the value part does not have whitespace. So everything after the whitespace will be ignored. The syseeprom values returned from platform API do not match the output of "show platform syseeprom" on dx010 and e1031 device.
How I did it
This change improved the regular expression for parsing syseeprom values to accommodate whitespaces in the value.
PR 10021 provides the solution, but committed to the wrong place for dx010 and e1031.
How to verify it
Compile the sonic_platform wheel for dx010, then upload to device and install the wheel, verify the platform eeprom API.
Signed-off-by: Eric Zhu <erzhu@celestica.com>
Update device-specific files for new platform SN2201, including:
device/mellanox/x86_64-nvidia_sn2201-r0/ACS-SN2201/buffers_defaults_objects.j2
device/mellanox/x86_64-nvidia_sn2201-r0/ACS-SN2201/hwsku.json
device/mellanox/x86_64-nvidia_sn2201-r0/default_sku
device/mellanox/x86_64-nvidia_sn2201-r0/pcie.yaml
device/mellanox/x86_64-nvidia_sn2201-r0/platform.json
device/mellanox/x86_64-nvidia_sn2201-r0/platform_components.json
device/mellanox/x86_64-nvidia_sn2201-r0/sensors.conf
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>