Commit Graph

8342 Commits

Author SHA1 Message Date
spilkey-cisco
3b982c073c Fix system-health hardware_checker to consume fan tolerance details (#16689)
Why I did it

Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:

ADO: 25279165

root@sonic/# show system-health summary
System status summary

  System status LED  red_blink
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Failed to get speed tolerance for fantray5.fan1
	     Failed to get speed tolerance for fantray5.fan0
	     Failed to get speed tolerance for fantray4.fan1
	     Failed to get speed tolerance for fantray4.fan0
	     Failed to get speed tolerance for fantray3.fan1
	     Failed to get speed tolerance for fantray3.fan0
	     Failed to get speed tolerance for fantray2.fan1
	     Failed to get speed tolerance for fantray2.fan0
	     Failed to get speed tolerance for fantray1.fan1
	     Failed to get speed tolerance for fantray1.fan0
	     Failed to get speed tolerance for fantray0.fan1
	     Failed to get speed tolerance for fantray0.fan0
	     Failed to get speed tolerance for PSU1.fan0
	     Failed to get speed tolerance for PSU0.fan0

How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.

How to verify it
root@sonic:/# show system-health summary
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK
2024-02-02 14:33:14 +08:00
jingwenxie
93eaa3cac0 Update TELEMETRY_CLIENT YANG model (#16861)
### Why I did it
Github issue: https://github.com/sonic-net/sonic-buildimage/issues/16356. The YANG definition breaks GCU feature.

We can either update sonic_yang and GCU's search algorithm to enable the same key count case or simply update YANG model to solve the issue.

The pros for update YANG model are it could solve the issue directly and we don't need to handle the complicate search algorithm in sonic_yang and GCU. This is the only YANG model that has this issue.

### How I did it
Combine two list into one. The previous YANG validation unit tests are still applicable.
#### How to verify it
Unit test and E2E test
2024-02-02 14:33:11 +08:00
Baorong Liu
770ffb1ecd [staticroutebfd] fix an error in error logging (#17043)
Why I did it
Fix an error in the log_err call.
this error can be triggered by an invalid static route key. usually the code cannot go here with normal config file. but hit this issue with an invalid key by manual testing with redis-cli directly. the file is scanned by Python lint to prevent such errors.

Work item tracking
Microsoft ADO ():26250268

How I did it
fix the format error.

How to verify it
1, ran pylint to check the design, make sure no such error in the design file.
2, wrote a separate python program to verify the log call.
In the current logging related testing, usually use patch/mock for logging. for this specific error, could not trigger it if we call mock function instead the real function in the design. so need to do lint checking for code change.
2024-02-02 14:33:08 +08:00
SuvarnaMeenakshi
6905ab74dc [SNMP]: Modify minigraph parser to update SNMP_AGENT_ADDRESS_CONFIG table (#17045)
#### Why I did it
SNMP query over IPv6 does not work due to issue in net-snmp where IPv6 query does not work on multi-nic environment.
To get around this, if snmpd listens on specific ipv4 or ipv6 address, then the issue is not seen.
We plan to configure Management IP and Loopback IP configured in minigraph.xml as SNMP_AGENT_ADDRESS in config_db., based on changes discussed in https://github.com/sonic-net/SONiC/pull/1457.

##### Work item tracking
- Microsoft ADO **(number only)**:26091228

#### How I did it
Modify minigraph parser to update SNMP_AGENT_ADDRESS_CONFIG with management and Loopback0 IP addresses.
Modify snmpd.conf.j2 to use SNMP_AGENT_ADDRESS_CONFIG table if it is present in config_db, if not listen on any IP.
Main change:
1. if minigraph.xml is used to configure the device, then snmpd will listen on mgmt and loopback IP addresses,
2. if config_db is used to configure the device, snmpd will listen IP present in SNMP_AGENT_ADDRESS_CONFIG  if that table is present, if table is not present snmpd will listen on any IP.
#### How to verify it
config_db.json created from minigraph.xml for single asic VS image with mgmt and Loopback IP addresses.
```
    "SNMP_AGENT_ADDRESS_CONFIG": {
        "10.1.0.32|161|": {},
        "10.250.0.101|161|": {},
        "FC00:1::32|161|": {},
        "fec0::ffff:afa:1|161|": {}
    },
 .....
 
 snmpd listening on the above IP addresses:
 admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      71522/snmpd         
udp        0      0 10.250.0.101:161        0.0.0.0:*                           71522/snmpd         
udp        0      0 10.1.0.32:161           0.0.0.0:*                           71522/snmpd         
udp6       0      0 fec0::ffff:afa:1:161    :::*                                71522/snmpd         
udp6       0      0 fc00:1::32:161          :::*                                71522/snmpd  
```
2024-02-02 14:33:04 +08:00
byu343
c469359cef [Arista] Use port_config.ini for Arista-7050QX-32S-S4Q31 (#17253)
This change of removing hwsku.json is to correct the port index for
sfp ports (Ethernet0, Ethernet1, Ethernet2, Ethernet3) by using
port_config.ini, which should be '1, 2, 3, 4'. We could not do it
with hwsku.json, as it is defined as '5, 5, 5, 5' by platform.json
for the breakout_mode 1x40G[10G].
2024-02-02 14:33:01 +08:00
Liping Xu
1e4dcbc75d disable restapi for leafRouter in slim image (#17713)
Why I did it
For some devices with small memory, after upgrading to the latest image, the available memory is not enough.

Work item tracking
Microsoft ADO (number only):
26324242
How I did it
Disable restapi feature for LeafRouter which with slim image.

How to verify it
verified on 7050qx T1 (slim image), restapi disabled
verified on 7050qx T0 (slim image), restapi enabled
verified on 7260 T1 (normal image), restapi enabled
2024-02-02 14:32:57 +08:00
Liu Shilong
015ce751a4 [build] Fix a bash script some times called by sh issue. (#17761)
Why I did it
Fix a bug that sometimes the script runs in sh not bash.

Work item tracking
Microsoft ADO (number only): 26297955
How I did it
2024-02-02 14:32:53 +08:00
Hua Liu
b84e3f9e8a Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281)
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.

#### Why I did it
When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface.

After investigation, I found the IPV6 'default' route table does not add to route lookup:

admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
admin@vlab-01:~$

As compare:
admin@vlab-01:~$ ip -4 rule list
1001:   from all lookup local
32764:  from all to 172.17.0.1/24 lookup default
32765:  from 10.250.0.101 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table exist in IPV4 route lookup

Issue fix by add 'default' route table to route lookup with following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$

##### Work item tracking
- Microsoft ADO: 25798732

#### How I did it
When management interface using 'default' route table, add 'default' route table to IPV6 route lookup.

#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:

### Tested branch (Please provide the tested image version)

- [x]  master-17281.417570-2133d58fa

#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
2024-02-02 14:32:50 +08:00
Yaqiang Zhu
8f9e58c033 [dhcp_relay] Optimize j2 file in dhcp_relay container (#17506) 2024-02-02 14:32:46 +08:00
Yaqiang Zhu
0b84b8fc30 [dhcp_server] Remove dependency in port-name-alias-map.txt.j2 (#17858)
* [dhcp_server] Remove dependency in port-name-alias-map.txt.j2
2024-02-02 14:32:41 +08:00
vdahiya12
9eef01d7a7 [Arista] Update config.bcm of 7060_cx32s for handling 40g optics with unreliable los settings (#17768)
For 40G optics there is SAI handling of T0 facing ports to be set with SR4 type and unreliable los set for a fixed set of ports. For this property to be invoked the requirement is set
phy_unlos_msft=1 in config.bcm.
This change is to meet the requirement and once this property is set, the los/interface type settings is applied by SAI on the required ports.

Why I did it
For Arista-7060CX-32S-Q32 T1, 40G ports RX_ERR minimalization during connected device reboot
can be achieved by turning on Unreliable LOS and SR4 media_type for all ports which are connected to T0.

The property phy_unlos_msft=1 is to exclusively enable this property.

Microsoft ADO: 25941176

How I did it
Changes in SAI and turning on property

How to verify it
Ran the changes on a testbed and verified configurations are as intended.

with property

admin@sonic2:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on                    = 0
Media Type                  = 2
Unreliable LOS              = 1
Scrambling Disable          = 0
Lane Config from PCS        = 0

without property

admin@sonic:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on                    = 0
Media Type                  = 0
Unreliable LOS              = 0
Scrambling Disable          = 0
Lane Config from PCS        = 0

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2024-02-02 14:32:38 +08:00
davidpil2002
765377ac81 password-hardening: Add support to disable expiration date like in Linux (PAM) (#17426)
- Why I did it
Enhance the feature to support disabling password hardening as Linux support.
-1: expiration will never occur
0: expiration will expired immediately

Opened bug:
#17427

- How I did it
Added the -1 value to be supported in hostcfgd and this value will propagate to the relevant Linux files

- How to verify it
Pls see the details in the bug description that link attached above
2024-02-02 14:32:34 +08:00
Hua Liu
f35512ef0a [TACACS] Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue. (#17749)
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.

#### Why I did it
When set TACACS to "tacacs+, local", user still can run a blocked command with local permission.

##### Work item tracking
- Microsoft ADO: 26399545

#### How I did it
Fix code to reject command when authorized failed from TACACS server side.

#### How to verify it
Pass all UT.

### Description for the changelog
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.
2024-02-02 14:32:29 +08:00
Ze Gan
34a86bd8f9 [Azp]: Fix azp on building ubuntu20.04 and sonic-mgmt (#17439)
The Azp failed on ubuntu20.04 and sonic-mgmt building due to sonic-dash-api updating.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2024-02-02 14:32:23 +08:00
Sudharsan Dhamal Gopalarathnam
fcef7d1095
[202311][Mellanox]Update SAI to 2311.26.0.28, SDK/FW to 4.6.2202/2012.2202 (#17975) 2024-02-01 17:53:49 -08:00
Sudharsan Dhamal Gopalarathnam
27b05ddac3
[FRR] Fix zebra memory leak when bgp fib suppress pending is enabled (#17484) (#17977)
Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983

While running test_stress_route.py, systems with lower memory started to throw low memory logs. On further investigation, a memory leak has been found in zebra which was fixed in the FRR community.
2024-02-01 16:28:21 -08:00
mssonicbld
e676ad1aa0
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#17979)
#### Why I did it
src/sonic-linux-kernel
```
* 342f6c3 - (HEAD -> 202311, origin/202311) [kconfig] Set default SATA Link Power Management policy (#363) (4 hours ago) [Volodymyr Samotiy]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-01 18:33:13 +08:00
mssonicbld
747ddb1978
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17980)
#### Why I did it
src/sonic-platform-common
```
* 83a8c7a - (HEAD -> 202311, origin/202311) Fix issue: QSFP module with id 0x0d can be parsed using 8636 (#412) (4 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-01 16:32:49 +08:00
Nazarii Hnydyn
48f7613f95 [swss/syncd]: Remove dependency on interfaces-config.service (#17739)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2024-02-01 09:34:16 +08:00
Nazarii Hnydyn
f419319e0e [installer] Create a blank grubenv if doesn't exist. (#17414)
- Why I did it
To fix BIOS firmware update after fresh image installation from ONiE

- How I did it
Initialized empty GRUB environment file after ONiE installation

- How to verify it
Install image from ONiE
Run BIOS firmware upgrade

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2024-02-01 09:34:12 +08:00
Stepan Blyshchak
23190ad000 [config-chassisdb] use cached variables (#17342)
- Why I did it
Improve boot performance mostly needed for fast and warmboot

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-02-01 09:34:09 +08:00
Junchao-Mellanox
d53fba12cb Fix error log while creating PSU thermal object (#17789)
- Why I did it
If a PSU is not present, there could be error log while restarting psud or thermalctld:

Jan  8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist

Jan  8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist

- How I did it
if a PSU is not present, we should not check the PSU temperature sysfs.
2024-02-01 09:34:05 +08:00
Oleksandr Ivantsiv
3ba603960b [dns] Do not apply dynamic DNS configuration when MGMT interface has static IP address. (#17769)
### Why I did it
Fix the issue detected by[ TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured ](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test.

### How I did it
Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address.

#### How to verify it
Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.
2024-02-01 09:34:02 +08:00
Nazarii Hnydyn
d88051cc3a [sonic-cfggen]: Optimize template rendering and database access. (#17740)
#### Why I did it
* Improved switch init time

### How I did it
* Replaced: `sonic-cfggen` -> `sonic-db-cli`
* Aggregated template list for `sonic-cfggen`

#### How to verify it
1. Run `warm-reboot`
2024-02-01 09:33:58 +08:00
mssonicbld
e2ae581028
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17954)
#### Why I did it
src/sonic-utilities
```
* 9c1d489c - (HEAD -> 202311, origin/202311) Fix database initialization for db_migrator (#3100) (10 hours ago) [ganglv]
* e9ae14d2 - Support golden config in db migrator (#3076) (16 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-31 16:32:37 +08:00
Lior Avramov
4599f7aeaf [Nvidia] Update syncd docker to use python version 3 (#17735)
* Remove python2 from compilation of python-sdk-api

* Upgrade Python version in syncd RPC docker image to Python3
2024-01-31 10:32:20 +08:00
Prince George
6b8549c3bb Fix the fsck script that does filesystem repair (#17424)
Fix the fsck check which is not working. Potentially fixes #16938
Modified fsck script to run on the ext4.fsck on the appropriate disk where SONiC resides

Microsoft ADO: 26098631
2024-01-30 06:32:12 +08:00
Kevin Wang
2b21e64ffc [qos] change the template keyword from Compute-AI to ComputeAI (#17902)
Why I did it
Align the keywords to make qos configuration take effect

Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI

How to verify it
reload minigraph and check the qos configuration
2024-01-29 16:32:34 +08:00
mssonicbld
4166d83f04
[ci/build]: Upgrade SONiC package versions (#17917) 2024-01-27 18:39:42 -08:00
mssonicbld
9f7166b5d2
Change tcp port range to support telemetry and gnmi (#17907) (#17916)
* Reserve tcp port for telemetry and gnmi

* Use ip_local_port_range instead

* Fix sysctl config

Co-authored-by: ganglv <88995770+ganglyu@users.noreply.github.com>
2024-01-26 13:27:30 -08:00
Ying Xie
388c3f5f90
[202311][sonic-utilities] Revert bgp suppress fib pending (#17915)
Revert "Revert "[202311] Revert bgp suppress fib pending" (#17882)"

This reverts commit 1ccee478e2.

sonic-utilities:
* be294f39 2024-01-25 | [202311] Revert bgp suppress fib pending (#3003) (HEAD -> 202311, github/202311) [Stepan Blyshchak]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2024-01-26 13:12:10 -08:00
Liu Shilong
d41a23986e
[ci] Remove barefoot build in Pinning Version pipeline. (#17901)
Why I did it
Remove barefoot platform in pinning version pipeline.

Work item tracking
Microsoft ADO (number only): 26515265
2024-01-25 19:00:57 +08:00
mssonicbld
c7d84d26d5
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17898)
#### Why I did it
src/sonic-utilities
```
* e70b0546 - (HEAD -> 202311, origin/202311) [202311] Revert bgp suppress fib pending (#3109) (9 hours ago) [Stepan Blyshchak]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-25 16:34:50 +08:00
mssonicbld
7430574fc8
[submodule] Update submodule sonic-snmpagent to the latest HEAD automatically (#17890)
src/sonic-snmpagent

* 03e8bcd - (HEAD -> 202311, origin/202311) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (10 hours ago) [DavidZagury]
2024-01-23 23:42:51 -08:00
mssonicbld
05dafc9653
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17888)
src/linkmgrd

* 74e455e - (HEAD -> 202311, origin/202311) [active-standby] Probe the link in suspend timeout (#235) (22 hours ago) [Longxiang Lyu]
2024-01-23 23:42:23 -08:00
mssonicbld
30fe193900
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17880)
src/sonic-swss

* b78a4181 - (HEAD -> 202311, origin/202311) Add host_tx_ready enhancements (#2930) (2 days ago) [noaOrMlnx]
2024-01-23 23:41:58 -08:00
Ying Xie
1ccee478e2
Revert "[202311] Revert bgp suppress fib pending" (#17882) 2024-01-23 08:43:43 -08:00
dbarashinvd
d7a77601e4 fix low polarity wrong value for hw_reset deassert and seek(0) before reading sysfs upon poll event (#17627)
* fix hw_reset low polarity (reverse values)

* move seek to beginning of sysfs fd before reading to resolve power_good
sysfs returns empty upon plug out cable
2024-01-23 09:38:50 +08:00
mssonicbld
204c5528d7
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17846)
src/sonic-utilities

* 7a242eeb - (HEAD -> 202311, origin/202311) [202311] Support reading/writing module EEPROM data by page and offset (#3008) (#3073) (2 days ago) [Junchao-Mellanox]
* cb0fd428 - [202311] Collect module EEPROM data in dump (#3009) (#3124) (3 days ago) [Junchao-Mellanox]
2024-01-22 10:19:52 -08:00
mssonicbld
aa691f1f90
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17845)
src/sonic-platform-common

* 570bb3f - (HEAD -> 202311, origin/202311) Fix memory map parsing issue (#427) (3 days ago) [Stephen Sun]
2024-01-22 10:19:25 -08:00
mssonicbld
51ba2cc071
[Mellanox][SKU] Adding Mellanox-SN4700-O8V48 SKU (#17425) (#17831)
- Why I did it
To add new SKU Mellanox-SN4700-O8V48 with following requirements:

- How I did it
Create new SKU files based on the below definition:
* Port Mapping: 1-12 2x200G, 13-20 1x400G, 21-32 2x200G
   T0 topology: 48x200G Downlinks 8x400G uplinks.
   Length of downlink: 5m
   Length of uplink: 40m
* Auto-negotiation enable/disable: Yes
* FEC mode: RS
* Shared headroom: Enabled
* Shared headroom pool factor: 2
* Warmboot enabled: yes

- How to verify it
SONiC build with new SKU finish init, all ports up, qos tests suite from sonic-mgmt

Co-authored-by: DavidZagury <32644413+DavidZagury@users.noreply.github.com>
2024-01-22 10:18:30 -08:00
Stepan Blyshchak
b527372642
[202311] Revert bgp suppress fib pending (#17660)
* [FRR] Bring back patches required for FPM plugin

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

* [zebra] use fpm plugin instead of dplane_fpm_nl

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

* Revert BGP suppress FIB pending

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

---------

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-01-22 10:17:50 -08:00
ganglv
87caed6f62
Fix host service for 202311 branch (#17782) 2024-01-21 21:15:22 -08:00
Ye Jianquan
9fb1b889cc
[202311, PR test] Use 202311 as the test branch (#17843)
* [PR test] Use 202311 as the test branch
2024-01-21 21:04:16 -08:00
Junchao-Mellanox
8d65e2c517 [Mellanox] Fix issues found for CMIS host management (#17637)
- Why I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How to verify it
Manual test
Unit test
2024-01-20 06:32:58 +08:00
Longxiang Lyu
3f29b28b36 [dualtor] Disable zebra link-detect for vlan interfaces (#17784)
* [dualtor] Disable zebra link-detect for vlan interfaces

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2024-01-19 06:33:12 +08:00
Junchao-Mellanox
0fbdc2b8ed [Mellanox] wait until hw-management watchdog files ready (#17618)
- Why I did it
watchdog-control service always disarm watchdog during system startup stage. It could be the case that watchdog is not fully initialized while the watchdog-control service is accessing it. This PR adds a wait to make sure watchdog has been fully initialized.

- How I did it
adds a wait to make sure watchdog has been fully initialized.

- How to verify it
Manual test
sonic regression
2024-01-19 04:32:53 +08:00
Saikrishna Arcot
9d918889e4 dhcrelay: Don't look up the ifindex for the fallback interface (#17797)
Currently, whenever isc-dhcp-relay forwards a packet upstream,
internally, it will try to send it on a "fallback" interface. My
understanding is that this isn't meant to be a real interface, but
instead is basically saying to use Linux's regular routing stack to
route the packet appropriately (rather than having isc-dhcp-relay
specify specifically which interface to use).

The problem is that on systems with a weak CPU, a large number of
interfaces, and many upstream servers specified, this can introduce a
noticeable delay in packets getting sent. The delay comes from trying to
get the ifindex of the fallback interface. In one test case, it got to
the point that only 2 packets could be processed per second. Because of
this, dhcrelay will easily get backlogged and likely get to a point
where packets get dropped in the kernel.

Fix this by adding a check saying if we're using the fallback interface,
then don't try to get the ifindex of this interface. We're never going
to have an interface named this in SONiC.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2024-01-18 16:33:48 +08:00
Nazarii Hnydyn
1687a442de [frr]: Force disable next hop group support. (#17344)
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com

Closes #17345

This W/A was proposed by Nvidia FRR team before the long term solution is ready.

Why I did it
A W/A to fix default route installation during LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
2024-01-18 14:36:23 +08:00
Junchao-Mellanox
56ba5b10b4 [Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
For CMIS host management module, we need a different implementation for sfp.reset. This PR is to implement it

- How I did it
For SW control modules, do reset from hw_reset
For FW control modules, do reset as the original way

- How to verify it
Manual test
sonic-mgmt platform test
2024-01-17 06:33:11 +08:00