Currently, whenever isc-dhcp-relay forwards a packet upstream, it
internally tries to send it on a "fallback" interface. My understanding
is that this isn't meant to be a real interface; instead, it tells the
relay to let Linux's regular routing stack route the packet
appropriately (rather than having isc-dhcp-relay pick the outgoing
interface itself).
The problem is that on systems with a weak CPU, a large number of
interfaces, and many upstream servers specified, this can introduce a
noticeable delay in packets getting sent. The delay comes from trying to
get the ifindex of the fallback interface. In one test case, it got to
the point that only 2 packets could be processed per second. Because of
this, dhcrelay will easily get backlogged and likely get to a point
where packets get dropped in the kernel.
Fix this by adding a check: if we're using the fallback interface, don't
try to get its ifindex. SONiC will never have a real interface with this
name.
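As a rough illustration of the check described above: the actual change lives in dhcrelay's C code, so this Python sketch only mirrors the idea. The names `FALLBACK_IFNAME`, `resolve_ifindex`, and the injectable `lookup` parameter are hypothetical, and the literal name "fallback" is an assumption.

```python
import socket

# Assumed name of the pseudo-interface; hypothetical, for illustration only.
FALLBACK_IFNAME = "fallback"


def resolve_ifindex(ifname, lookup=socket.if_nametoindex):
    """Return the ifindex to send on, or 0 to let the kernel route the packet.

    The potentially slow name-to-index lookup is skipped entirely for the
    fallback pseudo-interface, since no real interface has that name.
    """
    if ifname == FALLBACK_IFNAME:
        return 0  # no lookup: leave interface selection to the routing stack
    return lookup(ifname)
```

Passing the lookup in as a parameter keeps the sketch testable without touching real interfaces; the short-circuit means the cost of the lookup is never paid for packets sent via the fallback path.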
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-snmpagent
```
* b0a4bcc - (HEAD -> 202305, origin/202305) Set the execute bit on sysDescr_pass.py (#306) (22 hours ago) [Andre Kostur]
```
#### Why I did it
src/linkmgrd
```
* f5e9b54 - (HEAD -> 202305, origin/202305) [CodeQL] fix unmet build dependency (#222) (10 hours ago) [Jing Zhang]
* 2282cc5 - [active-standby] Probe the link in suspend timeout (#235) (22 hours ago) [Longxiang Lyu]
```
#### Why I did it
src/sonic-utilities
```
* 93c42272 - (HEAD -> 202305, origin/202305) [chassis]: Support show ip bgp summary to display without error when no external neighbors are configured on chassis LC (#3099) (22 hours ago) [Arvindsrinivasan Lakshmi Narasimhan]
```
Why I did it
Upgrade the xgs SAI version to 8.4.39.2 to include the following fixes:
8.4.36.0: [submodule upgrade] [SAI_BRANCH rel_ocp_sai_8_4] SID: SDK-381039 Cosq control dynamic type changes
8.4.37.0: SID: MMU cosq control configuration with Dynamic Type Check
8.4.38.0: [submodule upgrade] [CSP 0001232212][SAI_BRANCH rel_ocp_sai_8_4] Back-porting SONIC-82415 to SAI 8.4
8.4.39.0: [CSP CS00012320979] Port SONIC-81867 SAI spec compliance for get SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO
8.4.39.1: Changes for phy-re-init of 40G ports for TH platforms CS00012327470
8.4.39.2: Fix capability for Hostif queue by changing the SET operation of SAI_HOSTIF_ATTR_QUEUE to true
Work item tracking
Microsoft ADO (number only): 26491005
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Ran the basic SONiC tests using the SAI release pipeline; all cases passed.
https://dev.azure.com/mssonic/internal/_build/results?buildId=457869&view=results
What I did:
In chassis TSA mode, the Loopback0 IPs of each LC should be advertised through the e-BGP peers of each remote LC.
How I did:
- Route-map policy to advertise the LC's own Loopback IP to the other internal iBGP peers with the community internal_community, as defined in constants.yml.
- Route-map policy to match on the above internal_community when a route is received from internal iBGP peers, set an internal tag as defined in constants.yml, and delete the internal_community so that it is not sent to any e-BGP peers.
- In TSA, a new route-map matches on the above internal tag, permits the route (Loopback0 IPs of remote LCs), and sets the community to traffic_shift_community.
- In TSB, the above new route-map is deleted.
How I verify:
Manual Verification
UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
Double commit of PR-16450 because of a cherry-pick conflict for PR#202305.
Why I did it
When telemetry.service is disabled, sonic-hostservice does not start. The root cause is that sonic-hostservice is only wanted by telemetry.service.
Work item tracking
Microsoft ADO (number only):
How I did it
Add dependency for gnmi.service.
How to verify it
Disable telemetry.service, build a new image, and then check that sonic-hostservice starts on the new image.
Fix #16204
Microsoft ADO (number only): 25746782
How I did it
The multiarch/debian-debootstrap:arm64-bullseye image is too old.
Some GPG keys need to be added before running 'apt-get update'.
#### Why I did it
src/sonic-swss
```
* ac94f0b7 - (HEAD -> 202305, origin/202305) [202305][routeorch] Fixing bug with multiple routes pointing to nhg (#3002) (2 hours ago) [Nikola Dancejic]
```
What I did:
Enable BFD for Static Route for chassis-packet. This triggers use of the feature defined in #13789.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
What I did:
Added a flag in sonic_version.yml that indicates whether the compiled image is secured or non-secured. This is done using the build/compile-time environment variable SECURE_UPGRADE_MODE, as defined in the HLD: https://github.com/sonic-net/SONiC/blob/master/doc/secure_boot/hld_secure_boot.md
This flag does not provide the runtime status of whether the image has booted securely or not. A compile-time signed image (secured image) can still boot on a non-secure platform.
Why I did:
The flag can be used for manual checks or by test cases.
ADO: 24319390
How I verify:
Manual Verification
---
build_version: 'master-16191.346262-cdc5e72a3'
debian_version: '11.7'
kernel_version: '5.10.0-18-2-amd64'
asic_type: broadcom
asic_subtype: 'broadcom'
commit_id: 'cdc5e72a3'
branch: 'master-16191'
release: 'none'
build_date: Fri Aug 25 03:15:45 UTC 2023
build_number: 346262
built_by: AzDevOps@vmss-soni001UR5
libswsscommon: 1.0.0
sonic_utilities: 1.2
sonic_os_version: 11
secure_boot_image: 'no'
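One way a test case might consume this flag is sketched below. It is only an illustration: `parse_flat_version_file` and `is_secured_build` are hypothetical helpers, the parser handles only flat `key: value` lines like the output above, and the value `'yes'` for a secured build is an assumption (only `'no'` appears in this output).

```python
def parse_flat_version_file(text):
    """Parse flat 'key: value' lines, as in the sonic_version.yml output above.

    This is not a general YAML parser; it splits each line on the first
    colon and strips surrounding quotes from the value.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "---" or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        result[key.strip()] = value.strip().strip("'\"")
    return result


def is_secured_build(text):
    """Return True if the image was compiled as secured.

    Assumption: a secured build reports secure_boot_image: 'yes'.
    A missing flag is treated as non-secured.
    """
    return parse_flat_version_file(text).get("secure_boot_image", "no") == "yes"
```

As noted above, a `True` result only describes how the image was built, not whether the running system actually booted securely.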
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
These changes, in conjunction with NDK version >= 22.9.17, address the thermal logging issues discussed at Nokia-ION/ndk#27. The changes in this PR do not strictly require NDK version >= 22.9.17, but the thermal logging enhancements are not available without it, so pairing with NDK >= 22.9.17 is recommended.
Why I did it
To address thermal logging deficiencies.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:
Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.
Modify the module.py reboot to support rebooting a linecard from the Supervisor
- Modify reboot to call _reboot_imm for a single IMM card reboot
- Add a log to ndk_cmd to record the "reboot-linecard" and "shutdown/startup the sfm" operations
Add new nokia_cmd set command and modify show ndk-status output
- Add a new function reboot_imm() to nokia_common.py to support rebooting a single IMM slot from the CPM
- Add a new command for the CPM: nokia_cmd set reboot-linecard <slot> [force]
- Append a new column "RebootStatus" at the end of the output of "nokia_cmd show ndk-status"
- Provide the ability for an IMM to disable all transceiver module TX at reboot time
- Remove defunct xcvr-resync service
Why I did it
When the Supervisor card is rebooted using the PMON API, it takes about 90 seconds to trigger the shutdown in the down path. By that time the linecards are already up. This delays linecard database initialization, which is trying to PING/PONG the database-chassis. To address this issue, we modified the NDK to use a system call with "sudo reboot" when the request comes from the PMON API on the Supervisor. This applies to NDK version 22.9.20 and greater. The new NDK requires this modification of platform_reboot in order to work.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
Modify platform_reboot on the Supervisor so that it does not reboot all IMMs, since that is already done in the reboot() function in module.py. Also handle reboot-cause.txt on the Supervisor when the reboot is requested from the PMON API.
Modify the Nokia platform-specific platform_reboot on the linecard to disable all SFPs.
This PR works with NDK version 22.9.20 and above
Signed-off-by: mlok <marty.lok@nokia.com>
For 40G optics, SAI handles T0-facing ports by setting the SR4 interface type and unreliable LOS on a fixed set of ports. To invoke this handling,
phy_unlos_msft=1 must be set in config.bcm.
This change meets that requirement; once the property is set, SAI applies the LOS/interface type settings on the required ports.
Why I did it
For Arista-7060CX-32S-Q32 T1, RX_ERR on 40G ports during a connected device's reboot
can be minimized by turning on Unreliable LOS and the SR4 media_type for all ports that are connected to T0.
The property phy_unlos_msft=1 exists exclusively to enable this behavior.
Microsoft ADO: 25941176
How I did it
Changes in SAI and turning on property
How to verify it
Ran the changes on a testbed and verified configurations are as intended.
with property
admin@sonic2:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on = 0
Media Type = 2
Unreliable LOS = 1
Scrambling Disable = 0
Lane Config from PCS = 0
without property
admin@sonic:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on = 0
Media Type = 0
Unreliable LOS = 0
Scrambling Disable = 0
Lane Config from PCS = 0
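The manual comparison above could be scripted along these lines. This is a minimal sketch that only parses text in the `name = value` shape shown in the two outputs; `parse_dsc_config` and `unreliable_los_applied` are hypothetical helper names, and the expected values (Unreliable LOS = 1, Media Type = 2) are taken from the "with property" output rather than from any SAI documentation.

```python
def parse_dsc_config(output):
    """Turn 'name = value' lines from the dsc config dump into a dict."""
    fields = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def unreliable_los_applied(output):
    """Return True when the port matches the 'with property' output above.

    Assumption: Unreliable LOS = 1 and Media Type = 2 indicate that the
    phy_unlos_msft=1 settings were applied to the port.
    """
    fields = parse_dsc_config(output)
    return fields.get("Unreliable LOS") == "1" and fields.get("Media Type") == "2"
```

The input would be the captured text of the `bcmcmd` command shown above for a given port; running the command itself is left out of the sketch.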
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Why I did it
For some devices with small memory, after upgrading to the latest image, the available memory is not enough.
Work item tracking
Microsoft ADO (number only):
26324242
How I did it
Disable the restapi feature for LeafRouter devices running the slim image.
How to verify it
verified on 7050qx T1 (slim image), restapi disabled
verified on 7050qx T0 (slim image), restapi enabled
verified on 7260 T1 (normal image), restapi enabled
#### Why I did it
src/sonic-utilities
```
* 651a80b1 - (HEAD -> 202305, origin/202305) Modify teamd retry count script to base BGP status on default BGP status (#3069) (22 hours ago) [Saikrishna Arcot]
```
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
This improvement does not bring any warm-reboot degradation, since the database availability (tcp/ip access over the loopback interface) was fixed by these PRs:
Re-add 127.0.0.1/8 when bringing down the interfaces #15080
Fix potentially not having any loopback address on lo interface #16490
Why I did it
Removed dependency on interfaces-config.service to speed up the boot, because interfaces-config.service takes a lot of time on init
Work item tracking
N/A
How I did it
Changed service files for swss/syncd
How to verify it
Boot and compare the swss/syncd start time with that of interfaces-config.
#### Why I did it
src/linkmgrd
```
* 2f5971f - (HEAD -> 202305, origin/202305) [warmboot] use config_db connector to update mux mode config instead of CLI (#223) (4 hours ago) [Jing Zhang]
```
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Why I did it
Improved switch init time
Work item tracking
N/A
How I did it
Replaced: sonic-cfggen -> sonic-db-cli
Aggregated template list for sonic-cfggen
How to verify it
Run warm-reboot
#### Why I did it
src/linkmgrd
```
* 2089ab6 - (HEAD -> 202305, origin/202305) Exclude DbInterface in PR coverage check (#224) (3 hours ago) [Jing Zhang]
```
#### Why I did it
src/sonic-snmpagent
```
* e5fd192 - (HEAD -> 202305, origin/202305) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (9 hours ago) [DavidZagury]
```
DEPENDS ON: sonic-net/sonic-swss#2997 sonic-net/sonic-utilities#3093
What I did
Revert the feature.
Why I did it
Revert the BGP suppress FIB functionality because of FRR memory consumption issues and bugs that were found.
How I verified it
Basic sanity check on t1-lag, regression in progress.
Backport PR #17458 due to conflict.
Why I did it
Optimize syslog rate limit feature for fast and warm boot
Work item tracking
Microsoft ADO (number only):
How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
How to verify it
Manual test
Regression test
* [Celestica-E1031] Enable CPU watchdog (#16083)
Enable CPU watchdog on Celestica-E1031.
* Add info syslog for cpu_wdt.service (#16678)
Why I did it
Add an info syslog for cpu_wdt.service when the watchdog arm action is triggered.
How I did it
Add an info syslog for cpu_wdt.service when the watchdog arm action is triggered.
Why I did it
Release notes for Cisco 8111-32EH-O, 8102-64H-O and 8101-32FH-O:
• Fixed a bug in PFC-WD where watchdog is triggered too often when sparse traffic is present, failing to detect the traffic traversal - (SR 696617830)
• Resolved an issue where SAI_STATUS_ITEM_NOT_FOUND error was seen while adding LAG members - (MIGSMSFT-354)
• Fixed Thermal API related error message (MIGSMSFT-354)
• Fixed an issue related to default config trap - (MIGSMSFT-354)
• Changed the message log level from error to debug in situations when the HW offloaded session is not found or was never created for the packet received. (MIGSMSFT-354)
• Fixed an issue where drop option was not working when encap and decap IPinIP tunnels share the same SDK tunnel port.
• Fixed an error while running VRF testcase (MIGSMSFT-354)
• Fixed an issue where BFD packets were not egressing using Queue 7
• SAI support for additional FEC related attributes:
· SAI_PORT_ATTR_MAX_FEC_SYMBOL_ERRORS_DETECTABLE
· SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0
· SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S16
Work item tracking
Microsoft ADO (number only):