Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic
How I did it
Modified pg_profile_lookup.ini to reflect the correct value
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
The 720DT-48S platform has variants with different chassis names, and these need to all be included in platform_components.json to ensure that sonic-mgmt platform_tests/fwutil/test_fwutil.py::test_fwutil_show passes
How I did it
Updated platform_components.json with the variant names for 720DT-48S.
How to verify it
Ran aforementioned testcase and verified that it passes on the different variants.
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305
- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service
- How to verify it
Regression test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
By specifying 'status_led' 'controllable' to false for psu section, it means the platform is not yet supporting psu status led
How I did it
specify 'status_led' 'controllable' to false for psu section
How to verify it
by running test in pdb, manually add {'status_led' : {'controllable' : False}} in dictionary
this flag will be able to get False and skip testing:
ce290c735d/tests/platform_tests/api/test_psu.py (L337)
Why I did it
1. fix chassis test_set_fans_led case
2. fix chassis get_name case mismatch issue
3. fix fan_drawer test_set_fans_speed
4. fix component test_components test case
How I did it
Add corresponding configuration into chassis json file
How to verify it
Run platform tests cases to verify these failure cases
After upgrade to brcmsai 8.1, the sdk running environment (container) recommended with mininum memory size as below
TH4/TD4(ltsw) uses 512MB
TH3 used 300MB
Helix4/TD2/TD3/TH/TH 256 MB
Base on this requirement, adjust the default syncd share memory size and set the memory size for special ACISs in platform_env.conf file for different types of Broadcom ASICs.
How I did it
Add the platform_env.conf file if none of it for broadcom platform (base on platform_asic file)
Add the 'SYNCD_SHM_SIZE' and set the value
for ltsw(TD4/TH4) devices set to 512M at least (update the platform_env.conf)
for Td2/TH2/TH devices set to 256M
for TH3 set to 300M
verify
How to verify it
verify the image with code fix
Check with UT
Check on lab devices
On a problematic device which cannot start successfully
Run with the command
$ cat /proc/linux-kernel-bde
Broadcom Device Enumerator (linux-kernel-bde)
Module parameters:
maxpayload=128
usemsi=0
dmasize=32M
himem=(null)
himemaddr=(null)
DMA Memory (kernel): 33554432 bytes, 0 used, 33554432 free, local mmap
No devices found
$ docker rm -f syncd
syncd
$ sudo /usr/bin/syncd.sh start
Cannot get Broadcom Chip Id. Skip set SYNCD_SHM_SIZE.
Creating new syncd container with HWSKU Force10-S6000
a4862129a7fea04f00ed71a88715eac65a41cdae51c3158f9cdd7de3ccc3dd31
$ docker inspect syncd | grep -i shm
"ShmSize": 67108864,
"Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",
On Normal device
$ docker inspect syncd | grep -i shm
"ShmSize": 268435456,
"Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e"
change the config syncd_shm.ini to b85=128m
$ docker rm -f syncd
syncd
$ sudo /usr/bin/syncd.sh start
Creating new syncd container with HWSKU Force10-S6000
3209ffc1e5a7224b99640eb9a286c4c7aa66a2e6a322be32fb7fe2113bb9524c
$ docker inspect syncd | grep -i shm
"ShmSize": 134217728,
"Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",
change the config under
/usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/Force10-S6000/platform_env.conf
and run command
$ cat /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/platform_env.conf
SYNCD_SHM_SIZE=300m
$ sudo /usr/bin/syncd.sh start
Creating new syncd container with HWSKU Force10-S6000
897f6fcde1f669ad2caab7da4326079abd7e811bf73f018c6dacc24cf24bfda5
$ docker inspect syncd | grep -i shm
"ShmSize": 314572800,
"Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Eariler the SDK stat polling was erroneously set to once every msec
which is far more frequent than required by SWSS. The new setting, which
is consistent with other vendor SKUs, is once a second. The net result
is reduced CPU MHz by syncd.
Why I did it
fix DX010 fan drawer and watchdog platform test case issues
How I did it
1. Add fan_drawer get_maximum_consumed_power support
2. Adjust maximum watchdog timeout value check
How to verify it
Run test_fan_drawer and test_watchdog test cases.
Why I did it
Fix the following issues for Seastone platform:
- system-health issue: show system-health detail will not complete #9530, Celestica Seastone DX010-C32: show system-health detail fails with 'Chassis' object has no attribute 'initizalize_system_led' #11322
- show platform firmware updates issue: Celestica Seastone DX010-C32: show platform firmware updates #11317
- other platform optimization
How I did it
Modify and optimize the platform implememtation.
How to verify it
Manual run the test commands described in these issues.
Why I did it
This is to fix test_forward_ip_packet_with_0xffff_chksum_tolerant test failure on 720DT-48S. IP packets with checksum set to 0xffff will be forwarded with the same checksum on this platform, instead of updating to the correct value.
How I did it
Add bcm config sai_verify_incoming_chksum=0 so that checksum is updated instead of being left unchanged when checksum is 0xffff. Note that packets with invalid checksum are still dropped with this config.
Why I did it
Add two platform that support s3IP framework
How I did it
Add two platforms supporting S3IP SYSFS (TCS8400, TCS9400)
How to verify it
Manual test
Co-authored-by: tianshangfei <31125751+tianshangfei@users.noreply.github.com>
#### Why I did it
`os` and `commands` modules are not secure against maliciously constructed input
`getstatusoutput` is detected without a static string, uses `shell=True`
#### How I did it
Eliminate the use of `os` and `commands`
Use `subprocess` instead
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
ECN parameters need to be updated for storage backend
How I did it
Included the check for storage backend devices to update qos configs
How to verify it
Verified that the new ecn settings are applied on storage backend device.
Verified that the old ecn settings are applied for storage frontend, non storage frontend/backend devices
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
The [xml.etree.ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) module is not secure against maliciously constructed data.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
`subprocess.getstatusoutput` is dangerous because include shell=True in the implementation
#### How I did it
Remove xml. Use [lxml](https://pypi.org/project/lxml/) XML parsers package that prevent potentially malicious operation.
Replace `os` by `subprocess`
Use command as an array instead of string
Use `getstatusoutput_noshell` in `sonic_py_common` lib
Why I did it
Added to allow test_crm_route to pass; the test tries to add a /126 ipv6 route and this change is required in order for the count of available routes to be updated correctly.
Why I did it
TX FIR tuning should be done based on the type of inserted transceiver
How I did it
Add media_settings.json which contains the tuning data for 100G optic and 400G optic.
How to verify it
Tested against x86_64-arista_7800r3a_36d2_lc
Why I did it
The PR is to apply separated DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP to uplink ports on dualtor.
The traffic with DSCP 2 and DSCP 6 from T1 is treated as lossless traffic.
DSCP TC Queue
2 2 2
6 6 6
Traffic with DSCP 2 or DSCP 6 from downlink is still treated as lossy traffic as before.
How I did it
Define DSCP_TO_TC_MAP|AZURE_UPLINK and TC_TO_QUEUE_MAP|AZURE_UPLINK.
How to verify it
Verified by UT
Verified by coping the new template to a testbed, and rendering a config_db.json
Why I did it
Some sonic-mgmt platform_tests/api were failing on the 7060CX-32S
How I did it
Added the missing metadata in platform.json and platform_components.json
This is purely test data and does not impact our API implementation.
How to verify it
Run platform_tests / api and expect 100% pass rate.
Why I did it
Some sonic-mgmt platform_tests/api were failing on the 7260CX3-64
How I did it
Added the missing metadata in platform.json and platform_components.json
This is purely test data and does not impact our API implementation.
How to verify it
Run platform_tests/api and expect 100% passrate.
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
There is a need to have separate profiles on compute and storage and this infra update will help achieve that
How I did it
Moved buffer pool/profile and qos definitions on TD2 to a common folder and all TD2 hwsku's will reference that folder
Why I did it
Fixes#12614
How I did it
In the container_checker the database_chassis is added to expected container if device is supervisor
To detect the device is superviso, add supervisor=1 to the platform_env.conf of 7808 sup platform
How to verify it
run container_checker monit check
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`subprocess.Popen()` and `subprocess.run()` is used with `shell=True`, which is very dangerous for shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
Replace `os` by `subprocess`
Remove unused functions
Why I did it
This changes should go with updated SDE for BFN. Without update we do see orchagent core dump.
How I did it
Revert changes
How to verify it
Deploy topology. No core dump appears
- Why I did it
A new SKU for MSN4700 Platform i.e. Mellanox-SN4700-V16A96
Requirements:
Breakout:
Port 1-24: 4x25G(4)[10G,1G]
Port 25-28: 2x100G[200G,50G,40G,25G,10G,1G]
Port 29-32: 2x200G[100G,50G,40G,25G,10G,1G]
Downlinks: 96 (1-24) + 4 (25-28)
Uplinks: 4 (29-32)
Shared Headroom: Enabled
Over Subscribe Ratio: 1:4
Default Topology: T0
Default Cable Length for T1: 5m
VxLAN source port range set: No
Static Policy Based Hashing Supported: No
Additional Details:
QoS params: The default ones defined in qos_config.j2 will be applied
Small Packet Percentage: Used 50% for traditional buffer model Note: For dynamic model, the value defined in LOSSLESS_TRAFFIC_PATTERN|AZURE|small_packet_percentage is used
SKU was drafted under the assumption that the downlink ports uses xcvr's that will only support the first 4 lanes of the physical port they are connected to. Hence for the ports 1-24, the last four lanes are not used
Cable Lengths used for generating buffer_defaults_{t0,t1}.j2 values
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Barefoot uses hysteresis, instead of 'xon-threshold'. 'xon' is only
supported in static mode, so there is a need to add this attribute
to every mode in PG profile init file
- How I did it
'xon_offset' was added to pg_profile_lookup.ini
- How to verify it
Install and basic sanity tests including traffic.
Checked with:
pfcwd/test_pfc_config.py pfcwd/test_pfcwd_all_port_storm.py
pfcwd/test_pfcwd_function.py pfcwd/test_pfcwd_war_reboot.py
pfc_asym/test_pfc_asym.py
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
What I did
Adding the dynamic headroom calculation support for Barefoot platforms.
Why I did it
Enabling dynamic mode for barefoot case.
How I verified it
The community tests are adjusted and pass.
A new SKU for MSN4700 Platform i.e. Mellanox-SN4700-V48C32
Requirements:
Breakout:
Port 1-24: 2x200G
Port 25-32: 4x100G
Downlinks: 48 (1-24)
Uplinks: 32 (25-32)
Shared Headroom: Enabled
Over Subscribe Ratio: 1:8
Default Topology: T1
Default Cable Length for T1: 300m
VxLAN source port range set: No
Static Policy Based Hashing Supported: No
Additional Details:
QoS params: The default ones defined in qos_config.j2 will be applied
Small Packet Percentage: Used 50% for traditional buffer model Note: For dynamic model, the value defined in LOSSLESS_TRAFFIC_PATTERN|AZURE|small_packet_percentage is used
Cable Lengths used for generating buffer_defaults_{t0,t1}.j2 values
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>