Commit Graph

7704 Commits

Author SHA1 Message Date
jingwenxie
54a1ad10f9
[yang] Change asn to start from 0 for bgp monitor (#15350)
#### Why I did it
The asn 0 in BGP_MONITOR is invalid by YANG definition. However, the asn 0 in BGP_MONITOR is found in many devices. 
It was introduced by minigraph where its value is set to 0.
To unblock Config Updater test, the short term fix is to accept the asn 0 in BGP_MONITOR. 
We can revert this after NGS team make all the ASN change in minigraph.
##### Work item tracking
- Microsoft ADO **(24186140)**:

#### How I did it
Change the range
#### How to verify it
Unit test.
2023-06-12 21:59:34 -07:00
Hua Liu
05f1a5a31e
Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429)
Add watchdog mechanism to swss service and generate alert when swss have issue. 

**Work item tracking**
Microsoft ADO (number only): 16578912

**What I did**
Add orchagent watchdog to monitor and alert orchagent stuck issue.

**Why I did it**
Currently SONiC monit system only monit orchagent process exist or not. If orchagent process stuck and stop processing, current monit can't find and report it.

**How I verified it**
Pass all UT.

Manually test process_monitoring/test_critical_process_monitoring.py can pass.

Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check watchdog works correctly.

Manually test, after pause orchagent with 'kill -STOP <pid>', check there are warning message exist in log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-12 17:53:54 -07:00
Alpesh Patel
633fff8c10
enable ethernet backplane port support in port config for packet mode T2 devices (#14533)
For T2 systems using packet mode, the backplane interfaces (Ethernet-BP#) and the fabric card ethernet interfaces are not visible as neighbor interfaces.
In packet mode, these interfaces needs qos and buffer config as well.
This fix addresses that issue and adds the backplane interfaces to the PORTS_ACTIVE list
2023-06-12 14:02:22 -07:00
mssonicbld
cb9d9e57a6
[ci/build]: Upgrade SONiC package versions (#15431)
Upgrade SONiC Versions
2023-06-12 22:27:29 +08:00
mssonicbld
c74629a83a [submodule] Update submodule sonic-utilities to the latest HEAD automatically 2023-06-12 16:32:51 +08:00
mssonicbld
6b9c100974 [submodule] Update submodule sonic-host-services to the latest HEAD automatically 2023-06-11 16:32:32 +08:00
mssonicbld
50238d8039 [submodule] Update submodule sonic-platform-common to the latest HEAD automatically 2023-06-11 16:32:27 +08:00
mssonicbld
a45595158b
[ci/build]: Upgrade SONiC package versions (#15345) 2023-06-10 20:38:13 +08:00
mssonicbld
df20467b29
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15425) 2023-06-10 17:03:02 +08:00
mssonicbld
7f3d68f4c2 [submodule] Update submodule sonic-gnmi to the latest HEAD automatically 2023-06-10 16:32:55 +08:00
mssonicbld
bad9099fba [submodule] Update submodule linkmgrd to the latest HEAD automatically 2023-06-10 16:32:50 +08:00
mssonicbld
5c18870688
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#15402) 2023-06-10 16:30:05 +08:00
mssonicbld
a48a813d08
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15370) 2023-06-10 16:17:01 +08:00
mssonicbld
dc4eb9e90d
[submodule] Update submodule sonic-ztp to the latest HEAD automatically (#15426) 2023-06-10 16:05:44 +08:00
mssonicbld
e662c480dc
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15403) 2023-06-10 15:57:18 +08:00
mssonicbld
516e7930b2
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15401) 2023-06-10 15:30:27 +08:00
Liping Xu
78c41a1e58
allow docker_inram to kernel cmd list (#15374)
Why I did it
After docker_inram is enabled, the docker folder's default max size is 1.5G.
It's not big enough for some tests which need to install additional docker images or install extra packages.

Work item tracking
Microsoft ADO 24199761:
How I did it
add docker_inram into cmdline_allowlist

How to verify it
sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append'
sudo reboot and check the docker folder size
2023-06-10 14:19:44 +08:00
Sudharsan Dhamal Gopalarathnam
162856ad9a
[sflow]Delay starting sflow service until ports are created (#15333)
* [sflow]Delay starting sflow service until ports are created
* Removing sflow from sonic.target dependency since it will be managed by hostcfgd
2023-06-09 16:28:15 -07:00
Saikrishna Arcot
d466994e91
teamd: Add support for custom retry counts for LACP sessions (#13453)
Why I did it
This is to add support for specifying custom retry counts for LACP sessions. This is to make warmboot easier on low-storage and low-memory platforms, by allowing more than 90 seconds of downtime.

How I did it
How to verify it
Tested manually with these cases:

Verify that changing the retry count using teamdctl PortChannel101 state item set runner.retry_count 5 takes effect
Verify that the retry count change actually affects when the LAG goes down by forcefully killing teamd on one side (i.e. setting the retry count to 5 causes the LAG to go down after 150 seconds)
Verify that the retry count gets reset to 3 after the LAG goes down for whatever reason
Verify that the retry count gets reset to 3 after some period of time (30 seconds * retry count)
Test cases are in sonic-net/sonic-mgmt#7961 and sonic-net/sonic-mgmt#8152.


Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-09 10:03:25 -07:00
mssonicbld
2b5c0dd0c6
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15404) 2023-06-09 15:57:30 +08:00
Ye Jianquan
cec9d7b83a
Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)" (#15390)
This reverts commit 44427a2f6b.
Docker image not updated during PR validation and caused PR check failures.
Force merge this revert. After cache is updated after this PR is merged, issue should be fixed.
2023-06-09 09:10:35 +08:00
Arvindsrinivasan Lakshmi Narasimhan
0f194c5a03
set the default value for the port fec to RS on J2 based LC (#15346)
Why I did it
Work item tracking
Microsoft ADO (24182162):
How I did it
update the config.bcm to set the default fec RS 100G Linecard

How to verify it
Tests on chassis
2023-06-08 11:08:48 -07:00
Vivek
9d8ab1b8e4
[Mellanox] Added patchwork link to commit message (#15301)
- Why I did it
Add the patchwork link to the commit description for non-upstream patches if present

- How I did it
Parse the patchwork/<patch_name>.txt file from hw-mgmt
2023-06-08 18:51:58 +03:00
Liu Shilong
96cac8e918
[ci] Add marvell-arm64 build in PR checks. (#15356)
Why I did it
Add marvell-arm64 platform build in PR checks to avoid build break.

Work item tracking
Microsoft ADO (number only): 17257160
How I did it
How to verify it
2023-06-08 09:40:20 +08:00
Ikki Zhu
9fcbd5ed1d
fix possible cpld race access issue (#15371)
Why I did it
fix possible cpld race read issue between watchdog and reboot cause
process

How I did it
Use fcntl.flock to limit parallel access to cpld sys file

How to verify it
It can be simulated and verified with following python script

``` python3
import fcntl
import signal
import threading

exit_flag = False

def get_cpld_reg_value(getreg_path, register):
    file = open(getreg_path, 'w+')
    # Acquire an exclusive lock on the file
    fcntl.flock(file, fcntl.LOCK_EX)

    try:
        file.write(register + '\n')
        file.flush()

        # Seek to the beginning of the file
        file.seek(0)

        # Read the content of the file
        result = file.readline().strip()
    finally:
        # Release the lock and close the file
        fcntl.flock(file, fcntl.LOCK_UN)
        file.close()

    return result

def cpld_read(thread_num, cpld_reg, expect_val):
    while not exit_flag:
        val
= get_cpld_reg_value("/sys/devices/platform/dx010_cpld/getreg",
cpld_reg)
        #print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value
{val}")
        if val != expect_val:
            print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value
{val}, expect_val {expect_val}")

def signal_handler(sig, frame):
    global exit_flag
    print("Ctrl+C detected. Quitting...")
    exit_flag = True

if __name__ == '__main__':
    # Register the signal handler for Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)

    t1 = threading.Thread(target=cpld_read, args=(1, '0x103', '0x11',))
    t2 = threading.Thread(target=cpld_read, args=(2, '0x141', '0x00',))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```
2023-06-07 11:29:18 -07:00
Yevhen Fastiuk
8a6d45227e
[Clock] Add timezone config YANG model (#14651)
* Add the ability to configure timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add YANG model for timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add timezone reference

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

---------

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2023-06-07 10:39:24 -07:00
abdosi
6139c525d2
updated internal route policy for chassis-packet (#15349)
What I did:

Workaround for the issue seen here : FRRouting/frr#13682
It seems there is timing issue where there are multiple recursive lookup needed to resolve nexthop of the route it's possible that it does not happen correctly causing route to remain in inactive state

Issue is seen on chassis-packet as there 2 level of recursive lookup needed for a given e-BGP learnt route
- Level1 to resolve e-BGP peer (connected route via bgp ) over Loopback4096 (i-BGP peering)
- Level 2 Loopback4096 over backend port-channels next-hops

For VOQ chassis there is no e-BGP peer (connected route via bgp )  resolution as route is added as Static route by orchagent over Ethernet-IB.

Also as part of this remove route-map policy from instance.conf.j2 as same is define in peer-group.j2.

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507

How I verify:
Functional Verification manually
Updated UT.
We will be adding sanity check in sonic-mgmt to make sure none of route are in inactive state.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-06-07 09:17:44 -07:00
Rajkumar-Marvell
94790bef04
[sflow] Add egress sflow support. (#14630)
* [sflow] Add egress sflow support.
- Updated sonic-yang-model
- change hsflowd version to 2.0.45
2023-06-06 11:23:39 -07:00
mssonicbld
084d012749 [submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically 2023-06-06 16:32:12 +08:00
mssonicbld
40eb97c2f3
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15294) 2023-06-06 14:44:24 +08:00
mssonicbld
f78261cbac
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15355) 2023-06-06 14:34:37 +08:00
mssonicbld
ac56598db1 [submodule] Update submodule dhcprelay to the latest HEAD automatically 2023-06-06 14:33:15 +08:00
mssonicbld
ba241bbe3f [submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically 2023-06-06 14:33:06 +08:00
Hua Liu
44427a2f6b
Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)
This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 merge first.

**What I did**
Add orchagent watchdog to monitor and alert orchagent stuck issue.

**Why I did it**
Currently SONiC monit system only monit orchagent process exist or not. If orchagent process stuck and stop processing, current monit can't find and report it.

**How I verified it**
Pass all UT.
Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check watchdog works correctly.
Manually test, after pause orchagent with 'kill -STOP <pid>', check there are warning message exist in log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-05 22:21:17 -07:00
siqbal1986
381cfe4485
Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE,VNET_ROUTE_TUNNEL_TABLE to the list (#14992)
* The 3 tables in state DB need to be cleaned up after SWSS restart for have consistant state.
2023-06-05 13:18:50 -07:00
Kalimuthu-Velappan
2627dcc5b4
07.Version Cache - Support for PIP (#14613)
During build, lots of pip packages are getting installed through pip
install command.

This feature adds support for caching all the pip packages into local
cache path, so that subsequent build always loads from the cache.
2023-06-05 12:02:33 -07:00
Marty Y. Lok
d4a81ea121
[Nokia-IXR7250E][Devicedata] update the device data for Nokia IXR7250E platform (#15216)
Why I did it
Update the device data files to support 1024 LAGs for Nokia IXR7250E platform
fixes https://github.com/Nokia-ION/ndk/issues/15

How I did it
Update the lag_id_end=1024 in chassisdb.conf file and add the trunk_group_max_members=16 in the BCM config file

How to verify it
check to allow to create lag ids up to 1024 with 16 port members

Signed-off-by: mlok <marty.lok@nokia.com>
2023-06-05 12:02:05 -07:00
Kalimuthu-Velappan
9dce453552
06.Version Cache - Support for wget (#14612)
When a package is referenced from the web through wget command,
it downloads the package for every build.

This feature caches all the packages that are being downloaded from the
web, so that subsequent build always loads the cache instead of from
web.
2023-06-05 12:00:58 -07:00
Aravind Mani
b26445cf7b
Dell FPGA driver fix (#15144)
Why I did it
FPGA driver crash was observed in Dell FPGA based platforms.

How I did it
Fixed FPGA crash

How to verify it
Load FPGA driver and check whether the kernel crashes.
2023-06-05 11:01:46 -07:00
mssonicbld
4335690de7 [ci/build]: Upgrade SONiC package versions 2023-06-05 20:51:47 +08:00
Liu Shilong
d2915969d4
[ci] Add OVERRIDE_BUILD_OPTIONS in image build template. (#15309)
Why I did it
Set build options in pipeline UI.
Support setting reproducible build options to py2,py3 in release branch and none in master branch.

Work item tracking
Microsoft ADO (number only): 22335854
How I did it
How to verify it
2023-06-05 18:42:06 +08:00
Mai Bui
1477f779de
modify commands using utilities_common.cli.run_command and advance sonic-utilities submodule on master (#15193)
Dependency:
sonic-net/sonic-utilities#2718

Why I did it
This PR sonic-net/sonic-utilities#2718 reduce shell=True usage in utilities_common.cli.run_command() function.

Work item tracking
Microsoft ADO (number only): 15022050
How I did it
Replace strings commands using utilities_common.cli.run_command() function to list of strings

due to circular dependency, advance sonic-utilities submodule
72ca4848 (HEAD -> master, upstream/master, upstream/HEAD) Add CLI configuration options for teamd retry count feature (sonic-net/sonic-utilities#2642)
359dfc0c [Clock] Implement clock CLI (sonic-net/sonic-utilities#2793)
b316fc27 Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (sonic-net/sonic-utilities#2772)
dc59dbd2 Replace pickle by json (sonic-net/sonic-utilities#2849)
a66f41c4 [show] replace shell=True, replace xml by lxml, replace exit by sys.exit (sonic-net/sonic-utilities#2666)
57500572 [utilities_common] replace shell=True (sonic-net/sonic-utilities#2718)
6e0ee3e7 [CRM][DASH] Extend CRM utility to support DASH resources. (sonic-net/sonic-utilities#2800)
b2c29b0b [config] Generate sysinfo in single asic (sonic-net/sonic-utilities#2856)
2023-06-05 17:08:13 +08:00
DavidZagury
29051072ab
[FRR][CVE] Add FRR patches to fix CVEs: CVE-2022-43681 CVE-2022-40318 CVE-2022-40302 (#15262)
Add patches from PRs
https://github.com/FRRouting/frr/pull/12043
https://github.com/FRRouting/frr/pull/12247

#### Why I did it
To fix CVEs found in FRR 8.2

#### How I did it
Take commit from  the FRR repo and created a patch from them
2023-06-04 23:53:27 -07:00
mssonicbld
e1cb774b7d
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15328) 2023-06-04 15:47:32 +08:00
Tejaswini Chadaga
8058550c09
[bgp]: Add sudo check for TSA/B/C command execution (#15288)
TSA/B/C scripts invoke commands that require root permissions. If the user does not have sudo permissions, the scripts today execute until the command and throw a backtrace with error at the specific command. Added a check to ensure the operations check for root permissions upfront.
2023-06-03 23:47:26 +02:00
Neetha John
6a8f1bad63
[brcm] Update SOC properties for DLR_INIT based pfcwd recovery (#15286)
* [202205] Update SOC properties for DLR_INIT based pfcwd recovery (#15217)

Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism

How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-06-03 14:39:38 -07:00
Arvindsrinivasan Lakshmi Narasimhan
3f4b959d3f
[chassis] add libffi-dev for sonic-utilities (#15218)
In the PR sonic-net/sonic-utilities#2850 , for support remote access of linecards paramiko package is installed in sonic-utilities. libffi-dev needs to installed to be able to compile for armhf image

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-06-03 14:36:50 -07:00
mssonicbld
f80e182c22
[ci/build]: Upgrade SONiC package versions (#15325) 2023-06-03 19:45:07 +08:00
mssonicbld
e94e3f27e7
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15323) 2023-06-03 14:40:29 +08:00
mssonicbld
d4e0b99727
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#15322) 2023-06-03 14:39:33 +08:00