This is to address an issue where it was observed that SAI operations sometime make take a very long to time complete (over 45ms). It was determined that the ALPM distributed thread was causing this issue.
The fix is to disable this debug thread that has no functional purpose.
Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the fib test cases on 7050CX3 (TD3), TD2, TH, TH2, and TH3 based platforms and
thy all passed.
Why I did it
During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage.
How I did it
Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
Wrong SKU configuration will lead to longer init flow.
This will affect fast-reboot feature by increasing the traffic downtime.
Since MLNX met the required downtime period with this SKU this bug found with a delay.
- How I did it
Add the required split labels for ports.
- How to verify it
Run fast-reboot with this platform using SN3800-D112C8 SKU.
These changes are included in this PR:
07e1f79 [syncd] Add workaround for warm boot new objects (#959)
50fd353 Fix the option missing in kernel config issue (#956)
e77503c [syncd] Comparison logic workaround for empty buffer profile (#906) (#941)
This is to pick up BRCM SAI 4.3.5.1-6 fixes which contains the following fixes:
1. CS00012213351 SONIC-50679: [TH, TH2] Warm-reboot from 3.5 to 4.3 fails due to null objects discovered
2. CS00012182162: SONIC-49805 TD3 MMU config profile optimization changes
3. CS00012210826:SONIC-50205/760c60fc: Should read MMU_INTFI_MMU_PORT_TO_MMU_QUEUES_FC_BKP for TH3
Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
fib/test_fib.py
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
decap/test_decap.py
ipfwd/test_dip_sip.py
ipfwd/test_dir_bcast.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
```
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Storage T0's have all vlan members as tagged
How I did it
Since currently minigraph does not have a unique way to identify if a vlan member is tagged/untagged and to ensure other scenarios are not broken, the logic used is to just update the vlan member type as 'tagged' when we determine that it is a storage backend device. This change will apply only to storage backend T0's since storage backend T1's will not have vlan member information
How to verify it
Updated the storage backend T0 testcases to check for tagged vlan members
Added testcase to check if a T1 and backend T1 device generates an empty vlan member table
Existing vlan member testcases are good enough for checking if any regression has been caused for regular T0's
Build sonic_config_engine-1.0-py3-none-any.whl successfully
#### Why I did it
Fixed an issue that changing SDK version leads to cache framework taking cached syncd RPC image rather then rebuilding syncd RPC based on new syncd with new SDK.
Investigation showed that cache framework calculates a component hash based on direct dependencies. Syncd RPC image hash consists of two parts: one is the flags of syncd RPC (platform, ENABLE_SYNCD_RPC) and syncd RPC direct dependencies makefiles. None of the syncd RPC direct dependencies are modified when SDK version changes, so hash is unchanged.
#### How I did it
To fix this issue, include the hash of dependencies into current component hash calculation, e.g.:
In calcultation of the hash ```docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz```, the hash of syncd is included: ```docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz``` in which the hash of SDK is included.
#### How to verify it
Build with cache enabled and check that changing SDK version leads to a different hash of syncd rpc image:
SDK version 4.5.1002:
```
docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz
docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz
```
SDK version 4.5.1002-005:
```
docker-syncd-mlnx.gz-18baf952e3e0eda7cda7c3c-e5668f4784390d5dffd55af.tgz
docker-syncd-mlnx-rpc.gz-4a6e59580eda110b5709449-552f76be135deaf750aeab2.tgz
```
bf7214c (HEAD -> 202012, origin/202012) [y_cable] add support for manual/standby mode in xcvrd for muxcable; fix download firmware version retrieval logic while download firmware in progress (#220)
fc6a41e (HEAD -> 202012, origin/202012) Fix typo in the simulated y_cable driver (#226)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* [Nokia ixs7215] Platform API fixes
This commit delivers the following fixes
- Fix bug preventing access to second PSU eeprom
- Fix bug preventing updates to front panel PSU status led
- Fix SFP reset test case failure
* Fix LGTM alert
This commit more fully declares the HW capabilities of the Nokia-7215
platform. For example, support for the threshold values associated with each
thermal sensor is described. The intent here is to inform the sonic-mgmt
platform test cases of which HW features are supported.
This commit must align with PR# 4521 within the sonic-mgmt git repo which is
currently under review. Any changes to that PR will need to be reflected in
this commit.
Fix the check used to wait for interfaces to come up. The group name in
the supervisor config files has changed from isc-dhcp-relay to
dhcp-relay.
Also, in the wait script, wait 10 additional seconds after the vlans,
port channels, and any interfaces are up. This is because dhcrelay
listens on all interfaces (in addition to port channels and vlans), and
to ensure that it stays in a clean state during runtime, wait some extra
time to make sure that those interfaces are created as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This PR updates the following commits
0770dec Add retry reading/setting mux status to simulated y-cable driver (#221)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Redis 4.0.0b1 has been uploaded to pip as a prerelease version. This
version drops support for Python 2 and only supports Python 3. Because
setup.py is being run, it will use the latest version of a package and
not the latest stable version (which is still 3.5.3).
Therefore, pin the redis package to version 3.5.3, so that it will work
for both Python 2 and 3.
#### How to verify it
Make sure that redis-dump-load for Python 2 builds today.
copp-config service needs to be started after sonic.target so that it could
render the copp-config with the latest information.
It also needs to be restarted when config reload or load_minigraph is invoked.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Since database.service has been moved to execute after rc-local.service,
and determine-reboot-cause.service rely on database.service, we have to
specify that in "After=".
Signed-off-by: Xichen Lin <xichenlin@microsoft.com>
Co-authored-by: Xichen Lin <xichenlin@microsoft.com>
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
The previous PR #8914 was reverted due to crashing the 202012 syseepromd.
How I did it
Tested the 202012 image with change and fixed the disparity between master and 202012.
How to verify it
Run the built image on the dut and syseepromd will not crash, and in redis-cli can fetch the eeprom information.
Include the following commits:
f9bbed3cb86a3bab9a07745096835dbdbe5a4db6
Convert Unit Tests from unittest framework to pytest framework
e842c5ff317c67919dcbcab3358143cb9a16c9dd
Generate code coverage for Unit Tests
Why I did it
Sujin noticed that Arista eeprom platform API cannot update the redis database. Although Arista and Guohan believe that database update logic should be part of the daemon, it is easy enough to implement the fix for Arista for now.
How I did it
Made Arista eeprom platform API inherit from TlvInfoDecoder, then write Arista's own visit_eeprom method.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
Vaibhav Dahiya notified me that invalid fan speed policy was expecting an error raised in sonic-mgmt testing, but it was not raised.
This change will fix test_platform_info.py::test_thermal_control_load_invalid_value_json
How I did it
Add in the suggested code chunk to Arista platform submodule to raise ValueError when an invalid fan speed is set in thermal policy.
How to verify it
Vaibhav Dahiya has verified it through sonic-mgmt testing.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
- Why I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high, so that thermal algorithm would set fan speed to minimum allowed earlier and save power.
- How I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high
- How to verify it
Manual test
A new config option `sai_verify_incoming_chksum` was added to control the value of IPV4_INCR_CHECKSUM_ORIGINAL_VALUE_VERIFY in the EGR_FLEX_CONFIG control register (this prevents checksums of 0xffff from being propagated to other devices)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
How I did it
Added if multi npu check before invoking the load global config.
How to verify it
Restart caclmgrd after this change and check if no error log is thrown.