Commit Graph

7676 Commits

Author SHA1 Message Date
Vaibhav Hemant Dixit
9649a44470
Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)
This reverts commit 02b17839c3.

Reverts #14933

The earlier commit caused a race condition that particularly broke cross-branch warm upgrade.

The issue happens when db_migrator is still migrating the DB while the finalizer is checking the DB for the list of components to reconcile.

If the migration is not complete, the finalizer gets an empty list to wait for. As a result, the finalizer concludes warmboot (deleting the system-wide warmboot flag) and causes all the services to do a cold restart.
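
The failure mode described above can be modelled with a small sketch. This is a simplified illustration only: the dictionary keys and function names are hypothetical and do not correspond to the real db_migrator or finalizer code.

```python
# Hypothetical model of the race; keys and names are illustrative, not SONiC APIs.

def components_to_reconcile(db):
    """Return the components the finalizer should wait for."""
    return list(db.get("reconcile_components", []))


def finalize_warmboot(db):
    """Conclude warmboot if there is nothing left to wait for."""
    pending = components_to_reconcile(db)
    if not pending:
        # Nothing to wait for: clear the system-wide warmboot flag.
        db["warmboot_flag"] = False
        return "concluded"
    return "waiting"


# Finalizer runs before the migration has populated the component list.
db_mid_migration = {"warmboot_flag": True}
early = finalize_warmboot(db_mid_migration)

# Finalizer runs after the migration has completed.
db_migrated = {"warmboot_flag": True, "reconcile_components": ["orchagent"]}
late = finalize_warmboot(db_migrated)
```

In this model the early run clears the flag even though a component still needs to reconcile, which is the cold-restart symptom the revert avoids.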

ADO: 24274591
2023-06-16 13:58:38 -07:00
Prince Sunny
6df70097b4
Fix a check for yang validation (#15498)
[Sonic-Config-Engine] Re-add the yang validation check accidentally removed by #13409
2023-06-16 10:34:22 -07:00
mssonicbld
078b18df6d
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15468)
#### Why I did it
src/sonic-swss
```
* 87e0b08 - (HEAD -> master, origin/master, origin/HEAD) [portsorch]: Enhancing SWSS OA logs to capture host_tx_ready change events (#2822) (11 hours ago) [mihirpat1]
* c7e52a0 - [subinterface]: Fix admin state handling. (#2806) (34 hours ago) [Nazarii Hnydyn]
* ebfda13 - [aclorch] Fix TODO: use SAI object API to query capabilities (#2743) (2 days ago) [Stepan Blyshchak]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-16 16:29:34 +08:00
mssonicbld
c6d242180b
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#15504)
#### Why I did it
src/sonic-gnmi
```
* a600dc9 - (HEAD -> master, origin/master, origin/HEAD) Fix threading issues in Event Client (#121) (9 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-16 16:29:27 +08:00
mssonicbld
0d10c7cbd9
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15507)
#### Why I did it
src/sonic-swss-common
```
* 2320ddc - (HEAD -> master, origin/master, origin/HEAD) Add ZMQ port for orchagent (#795) (19 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-16 16:29:22 +08:00
Stepan Blyshchak
e2e5b77f16
[mlnx-ffb.sh] Update issu-version location (#14925)
#### Why I did it

ISSU version check fails due to inability to mount squashfs from 202211 on 201911

#### How I did it

Put ISSU version file under platform directory

#### How to verify it

Warm-upgrade matrix:
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211
- 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master
- 202205 (with this change cherry-picked) to master
2023-06-15 15:14:52 -07:00
mssonicbld
4819b85a3d [submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically 2023-06-15 16:32:43 +08:00
mssonicbld
dd8f3e6172
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15469) 2023-06-15 15:52:07 +08:00
byu343
c2b2407335
[Arista] Update hwsku.json for Arista-7050QX-32S-S4Q31 (#15251)
* [Arista] Update hwsku.json for Arista-7050QX-32S-S4Q31

* Change to 3x10G(3)+1x1G(1) on Arista-7050QX-32S-S4Q31
2023-06-14 16:16:24 -07:00
Prince Sunny
f75116ab7a
Create default Vxlan and Vnet configs (#13409)
* Create default Vxlan and Vnet configs for ToRs with Appliance Resource type
2023-06-14 16:07:46 -07:00
Samuel Angebault
afc6f7acc7
[Arista] fix platform.json for a few devices (#15308)
Why I did it
sonic-mgmt is failing tests due to invalid test data in platform.json
Fwutil is upset about the chassis name in the platform_component.json of the 7060CX-32S

How I did it
Fixed the aforementioned issues
2023-06-14 13:19:28 -07:00
pavannaregundi
bdc1d7ac35
[Marvell] Update armhf driver version (#15138)
Changes in MRVL_PRESTERA_DRIVER_1.4:
- Memory leak fixed by releasing pci device after retrieval.
- Fixes for 5.10 kernel porting.

Change-Id: I1d7ee4ec02ec17a29ddb8473725ab68ca399748b

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-06-14 10:54:30 -07:00
Rajesh Perumal R
ff4be8e8bc
sonic-yang-models: WRED statistics yang (#14758)
* Yang added for WRED_ECN_QUEUE flex counter group
* Yang added for WRED_ECN_PORT flex counter group

  Signed-off-by: rperumal@marvell.com
2023-06-13 22:29:35 -07:00
Saikrishna Arcot
f84dfd2345
Re-add 127.0.0.1/8 when bringing down the interfaces (#15080)
* Re-add 127.0.0.1/8 when bringing down the interfaces

With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.

To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
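
The ordering argument above can be checked with a minimal sketch. It only models the set of addresses on the interface; the helper names are hypothetical and this is not the actual interface script.

```python
import ipaddress

LOOPBACK = ipaddress.ip_address("127.0.0.1")


def has_loopback(addresses):
    """True if any configured interface address is 127.0.0.1."""
    return any(ipaddress.ip_interface(a).ip == LOOPBACK for a in addresses)


def bring_down(addresses, add_first):
    """Return 127.0.0.1 availability after each step of going down."""
    addrs = set(addresses)
    availability = []
    if add_first:
        addrs.add("127.0.0.1/8")
        availability.append(has_loopback(addrs))
        addrs.discard("127.0.0.1/16")
        availability.append(has_loopback(addrs))
    else:
        addrs.discard("127.0.0.1/16")
        availability.append(has_loopback(addrs))
        addrs.add("127.0.0.1/8")
        availability.append(has_loopback(addrs))
    return availability


add_then_remove = bring_down({"127.0.0.1/16"}, add_first=True)
remove_then_add = bring_down({"127.0.0.1/16"}, add_first=False)
```

Adding 127.0.0.1/8 first keeps the address present at every step, whereas removing 127.0.0.1/16 first leaves a window with no loopback address.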

Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-13 18:45:39 -07:00
Lior Avramov
c05d017091
[Mellanox] Remove iproute2 SDK patches from SONiC tree and consume them from SDK github (#15062)
- Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.

- How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.

- How to verify it
Compile and load on the switch, then verify that the interface network devices are created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-06-13 15:17:52 +03:00
Stephen Sun
238e6ffcc1
[Mellanox] Adjust warning threshold implementation according to the latest algorithm update (#15092)
- Why I did it
Adjust the warning threshold implementation according to the latest algorithm update

- How I did it
Modify power warning and critical thresholds methods

- How to verify it
Unit test updated to cover the change

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-06-13 15:14:10 +03:00
Kebo Liu
3cb13226be
Update SN5600 platform.json with service port sfp (#15337)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-06-13 14:15:15 +03:00
mssonicbld
1343b1eba3 [submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically 2023-06-13 18:32:53 +08:00
mssonicbld
d7e75f48bf [submodule] Update submodule sonic-host-services to the latest HEAD automatically 2023-06-13 16:32:51 +08:00
mssonicbld
2227365107 [submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically 2023-06-13 16:32:46 +08:00
mssonicbld
9ddb9d6852 [submodule] Update submodule sonic-mgmt-framework to the latest HEAD automatically 2023-06-13 16:32:42 +08:00
mssonicbld
713a8a8a7e [submodule] Update submodule sonic-swss to the latest HEAD automatically 2023-06-13 16:32:34 +08:00
jingwenxie
54a1ad10f9
[yang] Change asn to start from 0 for bgp monitor (#15350)
#### Why I did it
ASN 0 in BGP_MONITOR is invalid by the YANG definition. However, ASN 0 in BGP_MONITOR is found on many devices.
It was introduced by minigraph, which sets the value to 0.
To unblock the Config Updater test, the short-term fix is to accept ASN 0 in BGP_MONITOR.
We can revert this after the NGS team makes all the ASN changes in minigraph.
##### Work item tracking
- Microsoft ADO **(24186140)**:

#### How I did it
Change the range
#### How to verify it
Unit test.
2023-06-12 21:59:34 -07:00
Hua Liu
05f1a5a31e
Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429)
Add watchdog mechanism to swss service and generate alert when swss have issue. 

**Work item tracking**
Microsoft ADO (number only): 16578912

**What I did**
Add an orchagent watchdog to detect a stuck orchagent and raise an alert.

**Why I did it**
Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit cannot detect or report it.

**How I verified it**
Pass all UT.

Manually verified that process_monitoring/test_critical_process_monitoring.py passes.

Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check watchdog works correctly.

Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a message like the following appears in the log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).
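
A heartbeat-based watchdog of this kind can be sketched as follows. This is a simplified illustration, not the actual listener: the class, the 60-second threshold, and the function names are assumptions, and only the alert text mirrors the log line above.

```python
class HeartbeatWatchdog:
    """Track the last heartbeat per process (hypothetical helper)."""

    def __init__(self, timeout_seconds=60.0):
        # Assumed threshold; the real value is not stated in this commit.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, process, now):
        """Record that `process` reported in at time `now` (seconds)."""
        self.last_seen[process] = now

    def stuck_processes(self, now):
        """Return {process: silent_seconds} for processes past the timeout."""
        return {
            process: now - seen
            for process, seen in self.last_seen.items()
            if now - seen >= self.timeout_seconds
        }


def format_alert(process, namespace, stuck_seconds):
    """Build an alert string in the style of the log message above."""
    return "Process '{}' is stuck in namespace '{}' ({:.1f} minutes).".format(
        process, namespace, stuck_seconds / 60.0
    )


watchdog = HeartbeatWatchdog(timeout_seconds=60.0)
watchdog.heartbeat("orchagent", now=0.0)
healthy = watchdog.stuck_processes(now=30.0)
stuck = watchdog.stuck_processes(now=60.0)
alert = format_alert("orchagent", "host", stuck["orchagent"])
```

Unlike a process-existence check, this approach flags a process that is still running but has stopped reporting, which matches the `kill -STOP` scenario.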

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-12 17:53:54 -07:00
Alpesh Patel
633fff8c10
enable ethernet backplane port support in port config for packet mode T2 devices (#14533)
For T2 systems using packet mode, the backplane interfaces (Ethernet-BP#) and the fabric card ethernet interfaces are not visible as neighbor interfaces.
In packet mode, these interfaces need QoS and buffer config as well.
This fix addresses that issue and adds the backplane interfaces to the PORTS_ACTIVE list
2023-06-12 14:02:22 -07:00
mssonicbld
cb9d9e57a6
[ci/build]: Upgrade SONiC package versions (#15431)
Upgrade SONiC Versions
2023-06-12 22:27:29 +08:00
mssonicbld
c74629a83a [submodule] Update submodule sonic-utilities to the latest HEAD automatically 2023-06-12 16:32:51 +08:00
mssonicbld
6b9c100974 [submodule] Update submodule sonic-host-services to the latest HEAD automatically 2023-06-11 16:32:32 +08:00
mssonicbld
50238d8039 [submodule] Update submodule sonic-platform-common to the latest HEAD automatically 2023-06-11 16:32:27 +08:00
mssonicbld
a45595158b
[ci/build]: Upgrade SONiC package versions (#15345) 2023-06-10 20:38:13 +08:00
mssonicbld
df20467b29
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15425) 2023-06-10 17:03:02 +08:00
mssonicbld
7f3d68f4c2 [submodule] Update submodule sonic-gnmi to the latest HEAD automatically 2023-06-10 16:32:55 +08:00
mssonicbld
bad9099fba [submodule] Update submodule linkmgrd to the latest HEAD automatically 2023-06-10 16:32:50 +08:00
mssonicbld
5c18870688
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#15402) 2023-06-10 16:30:05 +08:00
mssonicbld
a48a813d08
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15370) 2023-06-10 16:17:01 +08:00
mssonicbld
dc4eb9e90d
[submodule] Update submodule sonic-ztp to the latest HEAD automatically (#15426) 2023-06-10 16:05:44 +08:00
mssonicbld
e662c480dc
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15403) 2023-06-10 15:57:18 +08:00
mssonicbld
516e7930b2
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15401) 2023-06-10 15:30:27 +08:00
Liping Xu
78c41a1e58
allow docker_inram to kernel cmd list (#15374)
Why I did it
After docker_inram is enabled, the docker folder's default max size is 1.5G.
It's not big enough for some tests which need to install additional docker images or install extra packages.

Work item tracking
Microsoft ADO 24199761:
How I did it
add docker_inram into cmdline_allowlist

How to verify it
sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append'
sudo reboot and check the docker folder size
2023-06-10 14:19:44 +08:00
Sudharsan Dhamal Gopalarathnam
162856ad9a
[sflow]Delay starting sflow service until ports are created (#15333)
* [sflow]Delay starting sflow service until ports are created
* Removing sflow from sonic.target dependency since it will be managed by hostcfgd
2023-06-09 16:28:15 -07:00
Saikrishna Arcot
d466994e91
teamd: Add support for custom retry counts for LACP sessions (#13453)
Why I did it
This is to add support for specifying custom retry counts for LACP sessions. This is to make warmboot easier on low-storage and low-memory platforms, by allowing more than 90 seconds of downtime.

How I did it
How to verify it
Tested manually with these cases:

- Verify that changing the retry count using `teamdctl PortChannel101 state item set runner.retry_count 5` takes effect
- Verify that the retry count change actually affects when the LAG goes down by forcefully killing teamd on one side (i.e. setting the retry count to 5 causes the LAG to go down after 150 seconds)
- Verify that the retry count gets reset to 3 after the LAG goes down for whatever reason
- Verify that the retry count gets reset to 3 after some period of time (30 seconds * retry count)
Test cases are in sonic-net/sonic-mgmt#7961 and sonic-net/sonic-mgmt#8152.
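
The timing relationship in the cases above (30 seconds per retry, default retry count of 3) works out as in this short sketch. The constant and function names are hypothetical and not part of teamd.

```python
# Values taken from the description above; names are illustrative only.
LACP_INTERVAL_SECONDS = 30
DEFAULT_RETRY_COUNT = 3


def lag_down_after_seconds(retry_count=DEFAULT_RETRY_COUNT):
    """Seconds without LACP traffic before the LAG goes down."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    return LACP_INTERVAL_SECONDS * retry_count


default_timeout = lag_down_after_seconds()
custom_timeout = lag_down_after_seconds(5)
```

With the default count the LAG tolerates 90 seconds of silence, and a retry count of 5 extends that to 150 seconds.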


Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-09 10:03:25 -07:00
mssonicbld
2b5c0dd0c6
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15404) 2023-06-09 15:57:30 +08:00
Ye Jianquan
cec9d7b83a
Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)" (#15390)
This reverts commit 44427a2f6b.
The Docker image was not updated during PR validation, which caused PR check failures.
This revert is force-merged. Once the cache is updated after this PR is merged, the issue should be fixed.
2023-06-09 09:10:35 +08:00
Arvindsrinivasan Lakshmi Narasimhan
0f194c5a03
set the default value for the port fec to RS on J2 based LC (#15346)
Why I did it
Work item tracking
Microsoft ADO (24182162):
How I did it
Update config.bcm to set the default FEC to RS on the 100G linecard

How to verify it
Tests on chassis
2023-06-08 11:08:48 -07:00
Vivek
9d8ab1b8e4
[Mellanox] Added patchwork link to commit message (#15301)
- Why I did it
Add the patchwork link to the commit description for non-upstream patches if present

- How I did it
Parse the patchwork/<patch_name>.txt file from hw-mgmt
2023-06-08 18:51:58 +03:00
Liu Shilong
96cac8e918
[ci] Add marvell-arm64 build in PR checks. (#15356)
Why I did it
Add marvell-arm64 platform build in PR checks to avoid build break.

Work item tracking
Microsoft ADO (number only): 17257160
How I did it
How to verify it
2023-06-08 09:40:20 +08:00
Ikki Zhu
9fcbd5ed1d
fix possible cpld race access issue (#15371)
Why I did it
Fix a possible CPLD read race between the watchdog and the reboot cause process

How I did it
Use fcntl.flock to limit parallel access to cpld sys file

How to verify it
It can be simulated and verified with the following Python script

``` python3
import fcntl
import signal
import threading

exit_flag = False

def get_cpld_reg_value(getreg_path, register):
    file = open(getreg_path, 'w+')
    # Acquire an exclusive lock on the file
    fcntl.flock(file, fcntl.LOCK_EX)

    try:
        file.write(register + '\n')
        file.flush()

        # Seek to the beginning of the file
        file.seek(0)

        # Read the content of the file
        result = file.readline().strip()
    finally:
        # Release the lock and close the file
        fcntl.flock(file, fcntl.LOCK_UN)
        file.close()

    return result

def cpld_read(thread_num, cpld_reg, expect_val):
    while not exit_flag:
        val = get_cpld_reg_value("/sys/devices/platform/dx010_cpld/getreg",
                                 cpld_reg)
        # print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value {val}")
        if val != expect_val:
            print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value "
                  f"{val}, expect_val {expect_val}")

def signal_handler(sig, frame):
    global exit_flag
    print("Ctrl+C detected. Quitting...")
    exit_flag = True

if __name__ == '__main__':
    # Register the signal handler for Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)

    t1 = threading.Thread(target=cpld_read, args=(1, '0x103', '0x11',))
    t2 = threading.Thread(target=cpld_read, args=(2, '0x141', '0x00',))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```
2023-06-07 11:29:18 -07:00
Yevhen Fastiuk
8a6d45227e
[Clock] Add timezone config YANG model (#14651)
* Add the ability to configure timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add YANG model for timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add timezone reference

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

---------

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2023-06-07 10:39:24 -07:00
abdosi
6139c525d2
updated internal route policy for chassis-packet (#15349)
What I did:

Workaround for the issue seen here : FRRouting/frr#13682
There appears to be a timing issue: when multiple recursive lookups are needed to resolve the nexthop of a route, the resolution may not happen correctly, leaving the route in an inactive state.

The issue is seen on chassis-packet because two levels of recursive lookup are needed for a given e-BGP learnt route:
- Level 1 to resolve e-BGP peer (connected route via bgp) over Loopback4096 (i-BGP peering)
- Level 2 Loopback4096 over backend port-channels next-hops

For VOQ chassis there is no e-BGP peer (connected route via bgp) resolution, as the route is added as a static route by orchagent over Ethernet-IB.

Also, as part of this change, remove the route-map policy from instance.conf.j2, as the same policy is defined in peer-group.j2.

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507

How I verify:
Functional Verification manually
Updated UT.
We will be adding a sanity check in sonic-mgmt to make sure no routes are in an inactive state.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-06-07 09:17:44 -07:00
Rajkumar-Marvell
94790bef04
[sflow] Add egress sflow support. (#14630)
* [sflow] Add egress sflow support.
- Updated sonic-yang-model
- change hsflowd version to 2.0.45
2023-06-06 11:23:39 -07:00