7929 Commits

Author SHA1 Message Date
Gokulnath Raja
de2f7bd7b9
Upgrade hsflowd from 2.0.35 to 2.0.51-26 to address "[sflow]ERR sflow#hsflowd: device Loopback0 Get SIOCGIFFLAGS failed : No such device" #13407 (#15362)
Signed-off-by: Gokulnath-Raja <Gokulnath_R@dell.com>
Co-authored-by: mohanapriya-meganathan <mohanapriya.m1@dell.com>
2023-09-13 17:18:34 -07:00
anamehra
78981d93b8
Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527)
Signed-off-by: anamehra <anamehra@cisco.com>

Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
2023-09-13 14:10:56 -07:00
ShiyanWangMS
42126ccf7d
Revert "Upgrade Ansible to 6.7.0 and make Python3 as the default interpreter in sonic-mgmt-docker (#15836)" (#16537)
This reverts commit 51fb6d7d9f.

The new sonic-mgmt docker image has Ansible upgraded. We encountered some issues that are hard to debug and cannot be fixed quickly, so let's revert the change for now. The new sonic-mgmt docker image is kept for further debugging and fixing. After all the issues are fixed, we'll need to apply this change again.
2023-09-13 16:20:17 +08:00
Zain Budhwani
337a9dbcf4
Add rsyslog plugin support for frr log (#16192)
### Why I did it

Currently there is only rsyslog plugin support for /var/log/syslog, so we do not detect events that appear only in frr logs, such as BGP Hold Timer Expiry in frr/bgpd.log.

##### Work item tracking
- Microsoft ADO **(number only)**: 13366345

#### How I did it

Add omprog action to frr/bgpd.log and frr/zebra.log. Add appropriate regex for both events.

#### How to verify it

sonic-mgmt test case
2023-09-12 16:53:45 -07:00
ShiyanWangMS
51fb6d7d9f
Upgrade Ansible to 6.7.0 and make Python3 as the default interpreter in sonic-mgmt-docker (#15836)
Why I did it
This PR is part of sonic-mgmt-docker Python3 migration project.

Work item tracking
Microsoft ADO (number only): 24397943

How I did it
Upgrade Ansible to 6.7.0.
Make Python3 the default interpreter: python is a soft link to python3. To use python2, invoke python2 explicitly.
Upgrade some pip packages to higher versions to meet security requirements.

How to verify it
Build a private sonic-mgmt-docker successfully.
Verify python is python3.
Verify python2 is working with 202012 and 202205 branch.
Verify python3 is working with master branch.
2023-09-12 17:34:57 +08:00
Saikrishna Arcot
f27aac7f0b
[ci] For vstest, make sure kernel modules are built and installed (#16479)
* [ci] For vstest, make sure kernel modules are built and installed

Make sure that the agent that vstest runs on has the team module
available. If it is not available, then build and install it.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Use version of script that's checked into sonic-swss-common

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-09-11 21:54:40 -07:00
Hua Liu
b0be5824f8
Write an error message to syslog when adding a user fails or connecting to the TACACS server fails. (#16240)
Write an error message to syslog when adding a user fails or connecting to the TACACS server fails.

#### Why I did it
With these messages, we can downgrade a TACACS server that has issues to a lower priority.

##### Work item tracking
- Microsoft ADO: 24667696

#### How I did it
Write an error message to syslog when adding a user fails or connecting to the TACACS server fails.

#### How to verify it
Pass all UT.
Manually verify that the error message is generated.

### Description for the changelog
Write an error message to syslog when adding a user fails or connecting to the TACACS server fails.
2023-09-11 15:35:54 -07:00
Liu Shilong
78415800a5
[ci] Disable building broadcom raw image because of S6100 device disk space limit. (#16516)
* [ci] Disable building broadcom raw image because of S6100 device disk space limit.
2023-09-11 15:10:03 -07:00
Christian Svensson
566fe1eb1b
[pddf] Enable deselect logic for CPLDMUX (#14631)
This feature was meant to be enabled but was accidentally left disabled.

Also downgrades the select/deselect messages to KERN_INFO to reduce log
spam.

Fixes #14546.

Signed-off-by: Christian Svensson <blue@cmd.nu>
2023-09-11 11:36:39 -07:00
Yaqiang Zhu
76b7cb8b64
[dhcp_server] Add dhcp_server container (#14031)
Why I did it
Add dhcp_server ipv4 feature to SONiC.
HLD: sonic-net/SONiC#1282

How I did it
To clarify: this container is disabled by INCLUDE_DHCP_SERVER = n for now, so the container is not built.

Add INCLUDE_DHCP_SERVER to indicate whether to build the dhcp_server container.
Add a docker file for dhcp_server; build and install kea-dhcp4 inside the container.
Add a template file for dhcp_server container services.
Add an entry for dhcp_server to the FEATURE table in config_db.

How to verify it
Build an image with INCLUDE_DHCP_SERVER = y to verify:

The image can be installed successfully without crashing.
dhcp_server can be enabled with config feature state dhcp_server enabled.
2023-09-11 09:15:56 -07:00
vganesan-nokia
b13b41fc22
[swss] Chassis db clean up optimization and bug fixes (#16454)
* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, before starting to clear the system interface entries,
    wait for some time only if some system neighbors were deleted.
    If no system neighbor entries were deleted for the asic coming up,
    there is no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag members were
    deleted. If no system lag members were deleted, there is no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While the asic
    is coming up, when system neigh entries are deleted from chassis app
    db (as part of chassis db clean up), there is no orch/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh entries are present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss service comes up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when the service comes up.
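
The conditional wait described above can be sketched in Python as follows. This is an illustration only, not the code from this commit: the in-memory tables, the `asic|name` key format, and the names `SYSTEM_NEIGH`, `SYSTEM_LAG_MEMBER_TABLE`, `delete_for_asic` and `clean_chassis_db` are assumptions made for the example.

```python
import time
from typing import Callable, Dict

# Hypothetical in-memory stand-in for chassis db tables: {table: {key: entry}}.
ChassisDb = Dict[str, Dict[str, dict]]


def delete_for_asic(table: Dict[str, dict], asic: str) -> int:
    """Delete entries whose key starts with '<asic>|' and return how many were removed."""
    stale = [key for key in table if key.startswith(asic + "|")]
    for key in stale:
        del table[key]
    return len(stale)


def clean_chassis_db(db: ChassisDb, asic: str, wait_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Clean up entries for one asic, waiting only when something was deleted."""
    # Neighbors first; wait before touching interfaces only if any were removed.
    if delete_for_asic(db["SYSTEM_NEIGH"], asic) > 0:
        sleep(wait_seconds)
    delete_for_asic(db["SYSTEM_INTERFACE"], asic)

    # Same pattern for lag members and the lags themselves.
    if delete_for_asic(db["SYSTEM_LAG_MEMBER_TABLE"], asic) > 0:
        sleep(wait_seconds)
    delete_for_asic(db["SYSTEM_LAG_TABLE"], asic)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; the default wait value is arbitrary.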

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
2023-09-11 08:28:27 -07:00
jcaiMR
9c1c82e9ff
add show dhcp_relay ipv4 counter entry, fix interface name offset issue (#16507)
Why I did it
Add another cli entry: show dhcp_relay ipv4 counter
Fix get all interface offset issue

Work item tracking
Microsoft ADO (17271822):
How I did it
show dhcp_relay ipv4 counter -i [ifname]
show dhcp4relay_counters counts -i [ifname]

How to verify it
show dhcp4relay_counters counts | more 10
Message Type Ethernet144(RX)
2023-09-11 21:08:06 +08:00
Yakiv Huryk
2b1c39e6f6
[vs] support for ARM build (#15692)
- Why I did it
To support the building of ARM-based docker-sonic-vs.gz

- How I did it
Fixed SYNCD_VS build rule to be architecture-specific.

- How to verify it
make configure PLATFORM=vs PLATFORM_ARCH=arm64
make target/docker-sonic-vs.gz

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-09-10 18:27:04 +03:00
mssonicbld
6f2f28975b
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16496)
2023-09-09 10:25:38 +08:00
ganglv
666879b867
Upgrade gnxi to support dash (#16498)
### Why I did it
Need new gnmi client for dash test.

### How I did it
I have updated gnxi repo, and this PR is used to get latest change.

#### How to verify it
Run end2end test for DASH.
2023-09-08 08:56:51 -07:00
mssonicbld
dae7022920
[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically (#16495)
#### Why I did it
src/sonic-mgmt-common
```
* ee3029d - (HEAD -> master, origin/master, origin/HEAD) DB Access Layer Merges: (#96) (11 hours ago) [a-barboza]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-09-08 18:32:54 +08:00
mssonicbld
084a6e1a3e
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16494)
#### Why I did it
src/sonic-linux-kernel
```
* fa40db7 - (HEAD -> master, origin/master, origin/HEAD) Change the system.map file permission only readable by root (#329) (21 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-09-08 16:32:40 +08:00
mssonicbld
7986aba097
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16497)
2023-09-08 14:57:35 +08:00
lixiaoyuner
4f53819efa
Install parted package for k8s master (#16484)
### Why I did it
Need a tool to extend disk size
##### Work item tracking
- Microsoft ADO **(number only)**: 25094467
#### How I did it
Install parted package
#### How to verify it
Use apt list parted command to check if it's installed
2023-09-07 23:22:47 -07:00
snider-nokia
2f69a0eaa6
[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#16348)
This likely fixes Nokia-ION/ndk#21

To fix a failure that results when edge condition results in MDIPC channel being freed with mismatched ownership.
2023-09-07 11:20:06 -07:00
Mai Bui
e07d435553
[telemetry] limit privileged flag for telemetry container (#16350)
Signed-off-by: Mai Bui <maibui@microsoft.com>
2023-09-07 11:04:11 -07:00
Arun Saravanan Balachandran
154c0c628b
[build] Change raw image disk size to 1700MB (#16463)
Maximum RAM availability for NOS to SONiC migration using raw image in Dell S6100 is 1700MB.
Raw images larger than that cannot be used for NOS to SONiC migration.
2023-09-07 09:19:54 -07:00
Arun Saravanan Balachandran
d04e3523cd
[build] Remove compression of raw image (#16462)
2023-09-07 09:19:17 -07:00
Arun Saravanan Balachandran
d758e44c2c
[build] Make the build fail if raw image generation is not successful (#16461)
2023-09-07 09:15:03 -07:00
Dror Prital
d7b85af18b
[Mellanox] Update SDK/FW/SAI to 4.6.1062/2012.1062/SAIBuild2211.25.1.4 (#16478)
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 would remain programmed for the rule. The issue is resolved by this fix.
2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2 and SPC-3, which is 255 when fastboot is enabled and 511 when fastboot is disabled
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support

SDK/FW bug fixes
1. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
2023-09-07 14:05:33 +03:00
mssonicbld
92d20cc9a3
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#16480)
#### Why I did it
src/sonic-gnmi
```
* 6fd461c - (HEAD -> master, origin/master, origin/HEAD) Get origin from prefix (#149) (17 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-09-07 18:34:19 +08:00
Aman Singhal
e22136dd9f
[cisco]: Enable Kdump config by default for cisco-8000 (#16224)
Why I did it
Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf.
After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot.

How I did it
Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms.

How to verify it
Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot.
The kdump config should persist on subsequent reboots, and kdump should be loaded during bootup.
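
One way to check the verification step above is to look for the crashkernel argument in the kernel command line after each reboot. The helper below is a minimal sketch, not part of this change; `crashkernel_arg` is a hypothetical name and the sample command line is made up for illustration.

```python
from typing import Optional


def crashkernel_arg(cmdline: str) -> Optional[str]:
    """Return the value of crashkernel= from a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # On a running system the command line would be read from /proc/cmdline.
    sample = "BOOT_IMAGE=/image/boot/vmlinuz root=UUID=abcd rw crashkernel=2G-4G:320M quiet"
    print(crashkernel_arg(sample))
```

If the function returns None after a reboot, the crashkernel argument did not persist.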

Signed-off-by: Aman Singhal <amans@cisco.com>
2023-09-07 01:30:24 -07:00
Liu Shilong
52568ceab0
[action] Update workflow to parse & monitor pending automation PRs. (#16446)
Why I did it
Many automation PRs are pending because of PR checker failures.
As the number of PRs grows, the GitHub API for listing PRs reaches its limit.
We need to monitor these PRs and send alerts for them.

Work item tracking
Microsoft ADO (number only): 25064441
How I did it
For auto-cherry pick PRs:
- more than 3 days, comment @author to check
- more than 10 days, stop commenting.
- more than 28 days, comment @author PR will be closed
- more than 30 days, close PR

For submodule update HEAD PRs:
- more than 3 days, send alert(submodule PR)
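
The age thresholds above can be read as a small decision function. The sketch below is one interpretation in Python, not the workflow code itself: the action names and both function names are hypothetical, and it assumes "stop commenting" means no action between 10 and 28 days.

```python
def cherry_pick_action(age_days: int) -> str:
    """Return the action for an auto-cherry-pick PR of the given age in days."""
    if age_days > 30:
        return "close"        # close the PR
    if age_days > 28:
        return "warn_close"   # comment that the PR will be closed
    if age_days > 10:
        return "none"         # reminders have stopped
    if age_days > 3:
        return "remind"       # comment @author to check
    return "none"


def submodule_action(age_days: int) -> str:
    """Return the action for a submodule update HEAD PR of the given age in days."""
    return "alert" if age_days > 3 else "none"


if __name__ == "__main__":
    for days in (1, 5, 15, 29, 31):
        print(days, cherry_pick_action(days), submodule_action(days))
```

Checking the largest threshold first keeps each branch a single comparison.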

How to verify it
Which release bra
2023-09-07 13:34:34 +08:00
judyjoseph
7d2e3cb011
Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388)
* Change the CAK key length check in config plugin, macsec test profile changes

* Fix the format in add_profile api

These are the changes needed in the various macsec unit tests and the config plugin when we move to accepting the type 7 encoded key format for macsec. This goes along with PR sonic-net/sonic-swss#2892, raised earlier.
2023-09-06 21:11:02 -07:00
Saikrishna Arcot
065c35cc34
Add nlohmann-json3-dev package into the slave container (#16308)
### Why I did it

The json.hpp header file from that package is used in the sonic-swss-common build. An old version of that header file (from 2016) has been checked into the sonic-swss-common repo. However, since then, there have been changes to that header file, and, starting with GCC 12 in Bookworm, the old version generates some errors about variables being possibly uninitialized before use.

##### Work item tracking
- Microsoft ADO **(number only)**: 25027439

#### How I did it

To fix this, install the nlohmann-json3-dev package, and allow using the header file from the Debian package instead of a static checked-in version. The version in Debian Bullseye is much newer than this version.

#### How to verify it

With this change alone, sonic-swss-common will still be using the json.hpp file in its own codebase. The change to actually use the system header file instead of the local header file will happen in a separate PR in the necessary repos.
2023-09-06 19:23:07 -07:00
Saikrishna Arcot
24ae0a9606
Don't build libhiredis anymore (#15633)
### Why I did it

We're not adding any patch on top of hiredis, and there's no apparent reason to build this. Remove the build step here, and just install the package from the Debian repos.

##### Work item tracking
- Microsoft ADO **(number only)**: 24381590

#### How to verify it

Build the SONiC image, and load it. Verify that services come up.
2023-09-06 16:23:34 -07:00
Kebo Liu
e286869b24
[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239)
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems

- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-06 11:32:08 +03:00
Konstantin Vasin
1e7db2ab01
[build]: Don't build ethtool from source (#15856)
Why I did it
There is no reason to build deb package ethtool from source code.
We can install the same version from Debian bullseye mirror.

How I did it
Remove ethtool Makefiles from sonic-buildimage.
Install ethtool via apt-get in pmon container.
2023-09-05 23:42:34 -07:00
mssonicbld
204579a0cc
[ci/build]: Upgrade SONiC package versions
2023-09-06 12:32:47 +08:00
Prince George
a4e37a5cd6
[platform]: Disable interrupt for intel i2c-i801 driver (#16309)
On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.

We now disable the i801 driver interrupt and enable polling instead.

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt-disable feature argument to the i2c-i801 driver

How to verify it
This fix is NOT applicable to ARM-based platforms. It applies only to Intel-based platforms:

- On SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3
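
To check that the interrupts are no longer incrementing, two snapshots of /proc/interrupts can be compared. The Python sketch below is only an illustration: the function names are hypothetical, the sample text is made up, and the IRQ label `i801_smbus` is an assumption that should be confirmed against the target system's /proc/interrupts.

```python
def irq_total(interrupts_text: str, name: str) -> int:
    """Sum the per-CPU counts of every /proc/interrupts row that mentions `name`."""
    total = 0
    for line in interrupts_text.splitlines():
        if name not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ number ("16:"); per-CPU counters follow it.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def is_incrementing(before: str, after: str, name: str = "i801_smbus") -> bool:
    """Return True if the counter for `name` grew between two snapshots."""
    return irq_total(after, name) > irq_total(before, name)


if __name__ == "__main__":
    # Made-up sample rows in the layout of /proc/interrupts.
    snapshot = (
        "           CPU0       CPU1\n"
        "  8:          0          1   IO-APIC   8-edge      rtc0\n"
        " 16:     100000     250000   IO-APIC  16-fasteoi   i801_smbus\n"
    )
    print(irq_total(snapshot, "i801_smbus"))
```

On a device, the two snapshots would be read from /proc/interrupts a few seconds apart.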

Signed-off-by: Prince George <prgeor@microsoft.com>
2023-09-05 10:23:57 -07:00
Pavan-Nokia
31194124b5
[armhf][Nokia-7215]Add HWSKU files for new SAI (#16321)
Add new easy bringup (EZB) files for new SAI 1.12.0
2023-09-05 10:21:53 -07:00
Rajkumar-Marvell
782a92213d
[Marvell] Update armhf sai debian to add SAI 1.12 support (#16299)
- SAI 1.12 support

Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
2023-09-05 10:20:27 -07:00
jcaiMR
a522a63e25
[dhcp-relay]: dhcp/dhcpv6 per interface counter support (#16377)
Why I did it
Support DHCP/DHCPv6 per-interface counter, code change in sonic-build image.

Work item tracking
Microsoft ADO (17271822):

How I did it
- Introduce libjsoncpp-dev in dhcpmon and dhcprelay repo
- Show CLI changes after counter format change

How to verify it
- Manually run show command
- dhcpmon, dhcprelay integration tests
2023-09-05 10:16:39 -07:00
Stephen Sun
b5e8c16134
[Mellanox] Enhance FW upgrade mechanism (#16090)
### Why I did it

1. Enhance the diagnosis information collecting mechanism
   - If the option `-v` is fed, it will pass additional diagnosis flags to mlxfwmanager
   - Collect all the output from mlxfwmanager and print them to syslog if it fails
2. Abort syncd in case waiting for device or upgrading firmware fails

Signed-off-by: Stephen Sun <stephens@nvidia.com>

### How I did it

#### How to verify it

Regression and manual test
2023-09-04 11:28:53 -07:00
Vadym Hlushko
78587cedc3
[Mellanox] Remove mlxtrace support for SPC4 (#16373)
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.

- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-04 10:53:20 +03:00
mssonicbld
c787d51f29
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16391)
#### Why I did it
src/sonic-linux-kernel
```
* 7ee50c9 - (HEAD -> master, origin/master, origin/HEAD) [Mellanox] Upstream kernel patches with HW-MGMT 7.0030.1011 (#327) (29 hours ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-09-03 18:33:09 +08:00
mssonicbld
ccfef69ac4
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16392)
#### Why I did it
src/sonic-platform-daemons
```
* c1c43f6 - (HEAD -> master, origin/master, origin/HEAD) [pmon][chassis][voq] Chassis DB cleanup when module is down (#394) (2 days ago) [vganesan-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-09-03 18:33:05 +08:00
Yoush
559151b41e
[centec]: update sonic master centec-sai reference to v1.12.0-1 (#16238)
Signed-off-by: yoush <yoush@centec.com>
2023-09-01 23:22:00 -07:00
Vadym Hlushko
9e3fdded69
[Mellanox][SFP] Remove unused function parameter (#16318)
Why I did it
To avoid errors when the sfputil show error-status -hw is called from the host OS (not from the pmon docker).

How I did it
Remove the self.sdk_handle parameter from the _get_module_info() function.

How to verify it
Execute the sfputil show error-status -hw

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-01 23:06:04 -07:00
Mai Bui
ff5f46955c
[database] make Redis process runs as non-root user (#16326)
Why I did it
Running the Redis server as the "root" user is not recommended. It is suggested that the server should be operated by a non-privileged user.

Work item tracking
Microsoft ADO (number only): 15895240

How I did it
Ensure the Redis process is operating under the 'redis' user in supervisord and make redis user own REDIS_DIR inside db container.

How to verify it
Build a new image, then verify that the redis process is running as the 'redis' user and all containers are up.
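
The user check can be scripted by parsing process listings. The sketch below assumes the listing comes from `ps -eo user=,comm=` run inside the database container and that the process name is `redis-server`; `users_running` is a hypothetical helper, and the sample listing is made up.

```python
from typing import Set


def users_running(ps_output: str, command: str) -> Set[str]:
    """Return the set of users that own processes named `command`.

    Expects one "<user> <command>" pair per line, as printed by `ps -eo user=,comm=`.
    """
    users = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == command:
            users.add(fields[0])
    return users


if __name__ == "__main__":
    sample = "root     supervisord\nredis    redis-server\nroot     rsyslogd\n"
    print(users_running(sample, "redis-server"))
```

The verification passes when the returned set contains only `redis`.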

Signed-off-by: Mai Bui <maibui@microsoft.com>
2023-09-01 23:03:15 -07:00
Zain Budhwani
84cfc3bc69
[eventd]: Remove unnecessary log (#16166)
Work item tracking
Microsoft ADO (number only): 16789053
2023-09-01 23:01:46 -07:00
Riff
7c1d720a65
[sonic-mgmt]: Adding sshconf 0.2.5 into sonic-mgmt container. (#16344)
Why I did it
This change helps us run SSH config generation for our testbed in the mgmt container.

Original PR in sonic-mgmt repo can be found here: sonic-net/sonic-mgmt#9773.

Work item tracking
Microsoft ADO (number only): 25007799

How I did it
Updating sonic-mgmt docker file to add sshconf 0.2.5 into pip install under venv.
2023-09-01 22:58:27 -07:00
Andrew Sapronov
0405b369af
[Netberg][Barefoot] Added support for Aurora 750 (#16342)
Why I did it
Support the Intel Tofino based platform Netberg Aurora 750
ASIC: Intel Tofino BFN-T10-064Q
Ports: 64x 100G

How I did it
Added the specification to the device/netberg directory
Added platform/barefoot/sonic-platform-modules-netberg, which contains kernel modules, scripts and sonic_platform packages.
Modified platform/barefoot/platform-modules-netberg.mk to include the Aurora 750 related ID.

Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>
2023-09-01 22:52:39 -07:00
Guohan Lu
3bdfdd95ea
Revert "[Ragile]: Add new centec platform ra-b6010 (#14819)"
This reverts commit 75062436e8.
2023-09-01 22:43:18 -07:00
anamehra
f6897bb585
chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received on the port for a while. When the far end goes down, BFD is expected to stop sending packets on the session for which the far end is not reachable. But because the entry is still known as STALE, BFD keeps sending packets on the Cisco chassis. Refreshing the STALE entry keeps active links reachable in the neighbor table, while entries whose far end is down enter the FAILED state. FAILED entries are retried and become reachable when the far end comes back up.
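
The token-position problem from item 1 can be illustrated with a short Python sketch. The arp_update script itself is not shown here; `neighbor_state` and `needs_refresh` are hypothetical names, and the sketch assumes the state is always the last token of a neighbor line.

```python
from typing import Optional, Tuple


def neighbor_state(line: str) -> Optional[Tuple[str, str, str]]:
    """Parse one neighbor line into (ip, device, state), or None if it does not match.

    The state is taken from the last token, so an extra keyword such as
    "router" before it does not shift the result.
    """
    tokens = line.split()
    if len(tokens) < 4 or tokens[1] != "dev":
        return None
    return tokens[0], tokens[2], tokens[-1]


def needs_refresh(line: str) -> bool:
    """Return True for entries that should be re-resolved (FAILED or STALE)."""
    parsed = neighbor_state(line)
    return parsed is not None and parsed[2] in ("FAILED", "STALE")


if __name__ == "__main__":
    for entry in (
        "2603:10e2:400:3::1 dev PortChannel19 router FAILED",
        "2603:10e2:400:3::1 dev PortChannel19 FAILED",
    ):
        print(neighbor_state(entry), needs_refresh(entry))
```

Reading the state from the end of the line handles both formats quoted in item 1 with the same code path.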
2023-09-01 11:41:46 -07:00