Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs
How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
The quirk added is SDHCI_QUIRK2_BROKEN_HS200 used to downgrade the link speed for the eMMC.
Co-authored-by: Samuel Angebault <staphylo@arista.com>
- Why I did it
Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker
- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin
- How to verify it
Manual UT and new sonic-mgmt tests
Co-authored-by: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
Why I did it
To address error sometimes seen when running sonic-mgmt test_stress_routes.py::test_announce_withdraw_route on 720DT-48S
How I did it
Update boot0 logic to set platform specific varlog size for 720DT-48S
How to verify it
Verified that /var/log size increased and error is no longer observed when running test
Co-authored-by: andywongarista <78833093+andywongarista@users.noreply.github.com>
- Why I did it
Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8.
- How I did it
Regarding buffer settings, originally, there are two lossless PGs and queues 3, 4. In dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back.
To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches.
On uplink ports, map DSCP 2/6 to TC 2/6 respectively
On downlink ports, both DSCP 2/6 are still mapped to TC 1
Buffer adjusted according to the ports information:
Mellanox-SN4600c-C64:
56 downlinks 50G + 8 uplinks 100G
Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8:
24 downlinks 50G + 8 uplinks 100G
- How to verify it
Unit test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
- Why I did it
fixes#12907
When the management interface IP address configuration changes from dynamic to static the DNS configuration (retrieved from the DHCP server) in /etc/resolv.conf remains uncleared. This leads to a DNS configuration pointing to the wrong nameserver. To make the behavior clear DNS configuration received from DHCP should be cleared.
- How I did it
Use resolvconf package for managing DNS configuration. It is capable of tracking the source of DNS configuration and puts the configuration retrieved from the DHCP servers into a separate file. This allows the implementation of DNS configuration cleanup retrieved from DHCP during networking reconfiguration.
- How to verify it
Ensure that the management interface has no static configuration.
Check that /etc/resolv.conf has DNS configuration.
Configure a static IP address on the management interface.
Verify that /etc/resolv.conf has no DNS configuration.
Remove the static IP address from the management interface.
Verify that /etc/resolv.conf has DNS configuration retrieved form DHCP server.
* [Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips (#13737)
Why I did it
[Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips
f6df7303d8 Update expired certs.
84540b59c1 CVE-2022-2068
f763d8a93e Prepare 1.1.1n-0+deb11u2
576562cebe CVE-2022-1292
How I did it
Upgrade the OpenSSL version
* [Security] Upgrade OpenSSL version for armhf
Changing the default config knob value to be True for killing radv, due to the reasons below:
Killing RADV is to prevent sending the "cease to be advertising interface" protocol packet.
RFC 4861 says this ceasing packet as "should" instead of "must", considering that it's fatal to not do this.
In active-active scenario, host side might have difficulty distinguish if the "cease to be advertising interface" is for the last interface leaving.
6.2.5. Ceasing To Be an Advertising Interface
shutting down the system.
In such cases, the router SHOULD transmit one or more (but not more
than MAX_FINAL_RTR_ADVERTISEMENTS) final multicast Router
Advertisements on the interface with a Router Lifetime field of zero.
In the case of a router becoming a host, the system SHOULD also
depart from the all-routers IP multicast group on all interfaces on
which the router supports IP multicast (whether or not they had been
advertising interfaces). In addition, the host MUST ensure that
subsequent Neighbor Advertisement messages sent from the interface
have the Router flag set to zero.
sign-off: Jing Zhang zhangjing@microsoft.com
Co-authored-by: Jing Zhang <zhangjing@microsoft.com>
What I did:
Fix : #13117
How I did:
During build time mask only those feature/services that are disabled explicitly. Some of the features ((eg: teamd/bgp/dhcp-relay/mux/etc..)) state is determine run-time so for those feature by default service will be up and running and then later hostcfgd will mask them if needed.
So Default behavior will be
init_cfg.json.j2 during build time make state as disabled then mask the service
init_cfg.json.j2 during build time make state as another jinja2 template render string than do no mask the service
init_cfg.json.j2 during build time make state as enabled then do not mask the service
How I verify:
Manual Verification.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Why I did it
In the voq chassis the buffer_queue configuration needs to be applied on system_port instead of the sonic port.
This PR has the change to do this.
How I did it
Modify buffer_config.j2 to generate buffer_queue configuration on system_ports if the device is Voq Chassis
How to verify it
Verify the buffer_queue configuration is generated properly using sonic-cfggen
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Why I did it
Unify the Debian mirror sources
Make easy to upgrade to the next Debian release, not source url code change required. Support to customize the Debian mirror sources during the build
Relative issue: #12523
How I did it
How to verify it
Debian is shipping a systemd timer unit for logrotate, but we're also
packaging in a cron job, which means both of them will run, potentially
at the same time. Remove our cron file, and add an override to the
shipped timer file to have it be run every 10 minutes.
Fixes#12392.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Limiting #12804 changes to PikeZ platform only (Arista-720DT-48S). Note that this is a short term workaround for this platform until SDK investigation on SDK init failure on docker syncd restart due to DMA issues is resolved.
How I did it
Retrieve platform name from /host/machine.conf and only reload SDK kmods on Arista-720DT-48S platform.
Signed-off-by: Michael Li <michael.li@broadcom.com>
- Why I did it
There's a slowdown in bootup related to the execution of a show command during startup of swss service. show is a pretty heavy command and takes long time to execute ~2 sec.
- How I did it
I replaced show with sonic-db-cli which takes a ms to run.
- How to verify it
Boot the switch and verify swss is active.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
There is an issue on the Arista PikeZ platform (using T3.X2: BCM56274) while running SONiC. If the 'syncd' container in SONiC is restarted, the expected behaviour is that syncd will automatically restart/recover; however it does not and always fails at create_switch due to BCM SDK kmod DMA operation cancellation getting stuck.
Sep 16 22:19:44.855125 pkz208 ERR syncd#syncd: [none] SAI_API_SWITCH:platform_process_command:428 Platform command "init soc" failed, rc = -1. Sep 16 22:19:44.855206 pkz208 INFO syncd#supervisord: syncd CMIC_CMC0_PKTDMA_CH4_DESC_COUNT_REQ:0x33#015 Sep 16 22:19:44.855264 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:platformInit:1909 initialization command "init soc" failed, rc = -1 (Internal error). Sep 16 22:19:44.855403 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:sai_driver_init:642 Error initializing driver, rc = -1. ... Sep 16 22:19:44.855891 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:brcm_sai_create_switch:1173 initializing SDK failed with error Operation failed (0xfffffff5).
Reloading the BCM SDK kmods allows the switch init to continue properly.
How I did it
If BCM SDK kmods are loaded, unload and load them again on syncd docker start script.
How to verify it
Steps to reproduce:
In SONiC, run 'docker ps' to see current running containers; 'syncd' should be present.
Run 'docker stop syncd'
Wait ~1 minute.
Run 'docker ps' to see that syncd is missing.
Check logs to see messages similar to the above.
Signed-off-by: Michael Li <michael.li@broadcom.com>
Why I did it
The PR is to apply separated DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP to uplink ports on dualtor.
The traffic with DSCP 2 and DSCP 6 from T1 is treated as lossless traffic.
DSCP TC Queue
2 2 2
6 6 6
Traffic with DSCP 2 or DSCP 6 from downlink is still treated as lossy traffic as before.
How I did it
Define DSCP_TO_TC_MAP|AZURE_UPLINK and TC_TO_QUEUE_MAP|AZURE_UPLINK.
How to verify it
Verified by UT
Verified by coping the new template to a testbed, and rendering a config_db.json
Why I did it
The current lazy installer relies on a filename sort for both unpack and configuration steps. When systemd services are configured [started] by multiple packages the order is by filename not by the declared package dependencies. This can cause the start order of services to differ between first-boot and subsequent boots. Declared systemd service dependencies further exacerbate the issue (e.g. blocking the first-boot script).
The current installer leaves packages un-configured if the package dependency order does not match the filename order.
This also fixes a trivial bug in [Build]: Support to use symbol links for lazy installation targets to reduce the image size #10923 where externally downloaded dependencies are duplicated across lazy package device directories.
How I did it
Changed the staging and first-boot scripts to use apt-get:
dpkg -i /host/image-$SONIC_VERSION/platform/$platform/*.deb
becomes
apt-get -y install /host/image-$SONIC_VERSION/platform/$platform/*.deb
when dependencies are detected during image staging.
How to verify it
Apt-get critical rules
Add a Depends= to the control information of a package. Grep the syslog for rc.local between images and observe the configuration order of packages change.
Fix the issue where arp_update will not ping some of the ip's even
though they are in failed state since grep of that ip on ip neigh show
command does not do exact word match and can return multiple match.