8232 Commits

Author SHA1 Message Date
Kebo Liu
31451295d5
Add special rsyslog filter for MSN2700 platform (#16684)
- Why I did it
Mellanox MSN2700 platforms emit an error log that has no functional impact: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be safely ignored.

- How I did it
Add a new rsyslog rule to rsyslog-container.conf.j2. If the docker name is pmon and the platform name matches, the new rule is inserted into the docker's rsyslogd.conf.
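
The matching condition described here can be sketched in Python. This is only an illustration of the decision logic, not the rsyslog rule itself: the regular expression is modeled on the error text quoted in this commit, and the function name and the platform string check are assumptions.

```python
import re

# Hypothetical pattern modeled on the error text quoted in this commit; the
# actual rule in rsyslog-container.conf.j2 may be expressed differently.
SENSORD_NOISE = re.compile(
    r"sensord: Error getting sensor data: dps460/#\d+: Can't read"
)


def should_drop(message: str, docker_name: str, platform: str) -> bool:
    """Return True when the message is the known-harmless PSU sensor error."""
    # The filter applies only to the pmon docker on MSN2700 platforms.
    if docker_name != "pmon" or "msn2700" not in platform.lower():
        return False
    return SENSORD_NOISE.search(message) is not None
```

Messages from other dockers or other platforms are never dropped, which mirrors the "docker name is pmon and the platform name matches" condition.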

- How to verify it
Run regression on the MSN2700 platform to make sure the error log is not printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-10-24 17:54:44 +03:00
mssonicbld
72a7051690
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16978)
#### Why I did it
src/sonic-platform-common
```
* 6d804d6 - (HEAD -> master, origin/master, origin/HEAD) Fix SSD health percentage issue for vendor Virtium (#407) (3 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-24 16:33:04 +08:00
mssonicbld
9f7dfc4668
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16980)
2023-10-24 16:27:30 +08:00
mssonicbld
f8d4614683
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16981)
2023-10-24 15:57:33 +08:00
Samuel Angebault
e4a497183a
Add build option to reduce final image size (#16729)
* Reduce SONiC image filesystem size

Add a build option to reduce the image size.
The image reduction process affects the build in two ways:
 - change some packages that are installed in the rootfs
 - apply a rootfs reduction script

The script itself will perform a few steps:
 - remove file duplication by leveraging hardlinks
   - under /usr/share/sonic since the symlinks under the device folder are lost during the build.
   - under /var/lib/docker since the files there will only be mounted ro
 - remove some extra files (man, docs, licenses, ...)
 - some image specific space reduction (only for aboot images currently)

The script can later be improved but for now it's reducing the rootfs
size by ~30%.
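
The hardlink step can be sketched as follows. This is a minimal Python illustration of deduplicating by content, not the reduction script from this commit: the function names are hypothetical, and a real tool would also need to compare ownership and permissions before linking.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash file contents in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root: Path) -> int:
    """Replace duplicate regular files under root with hardlinks.

    Returns the number of files that were replaced by a link.
    """
    seen = {}
    replaced = 0
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    for path in files:
        key = (path.stat().st_size, file_digest(path))
        original = seen.get(key)
        if original is None:
            seen[key] = path
        elif not os.path.samefile(original, path):
            # Link under a temporary name, then atomically swap it into place.
            tmp = path.with_name(path.name + ".dedup-tmp")
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced
```

Hardlinks only work within one filesystem, which fits the description above of deduplicating inside /usr/share/sonic and /var/lib/docker separately.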

* restore fully featured vim package
2023-10-24 10:01:58 +08:00
Liu Shilong
1eae34993e
[build] Add config to set pip http timeout (#16748)
Why I did it
Add a config option to set the pip HTTP timeout value during the build process so that builds are more stable.
Default value is 60.

Work item tracking
Microsoft ADO (number only): 25190067
How I did it
Insert timeout options in all pip commands.
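
As a rough sketch of what inserting a timeout option into a pip command looks like, the Python helper below adds pip's `--timeout` option to an argument list. The helper name and the way commands are represented are assumptions; the commit itself edits the pip invocations in the build files.

```python
from typing import List


def with_pip_timeout(cmd: List[str], timeout: int = 60) -> List[str]:
    """Return a copy of cmd with `--timeout <seconds>` after the pip executable."""
    # Leave the command alone if a timeout was already given explicitly.
    if "--timeout" in cmd or any(a.startswith("--timeout=") for a in cmd):
        return list(cmd)
    for i, arg in enumerate(cmd):
        if arg in ("pip", "pip2", "pip3"):
            return cmd[: i + 1] + ["--timeout", str(timeout)] + cmd[i + 1:]
    # Not a pip command: return it unchanged.
    return list(cmd)
```

For example, `with_pip_timeout(["sudo", "pip3", "install", "psutil"])` yields `["sudo", "pip3", "--timeout", "60", "install", "psutil"]`.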
2023-10-23 18:05:22 +08:00
Junchao-Mellanox
c2edc6f9d5
Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)" (#16956)
This reverts commit 0846322e9a.
2023-10-23 11:55:27 +03:00
lixiaoyuner
b2c8ad8b10
Fix K8S_OPTIONS maybe empty issue (#16891)
Why I did it
K8S_OPTIONS may be empty, which causes a syntax error. This issue needs to be fixed.

Work item tracking
Microsoft ADO (number only): 25495020
How I did it
Wrap K8S_OPTIONS in "" to avoid the exception.

How to verify it
No more exceptions are thrown in the PR build validation pipeline.
2023-10-23 14:02:07 +08:00
Yaqiang Zhu
73dd38a5ce
[dhcp_server] Add dhcpservd to dhcp_server container (#16560)
Why I did it
Part implementation of dhcp_server. HLD: sonic-net/SONiC#1282.
Add dhcpservd to dhcp_server container.

How I did it
Add installing required pkg (psutil) in Dockerfile.
Add copying required file to container in Dockerfile (kea-dhcp related and dhcpservd related)
Add critical_process and supervisor config.
Add support for generating kea config (only in dhcpservd.py) and updating lease table (in dhcpservd.py and lease_update.sh)

How to verify it
Build an image with INCLUDE_DHCP_SERVER set to y and enable the dhcp_server feature after installing the image; the container starts as expected.
Enter the container and confirm that all processes defined in the supervisor configuration are running as expected.
Kill processes defined in critical_processes; the container exits.
2023-10-20 09:52:05 -07:00
mssonicbld
1dd0becda0
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16953)
#### Why I did it
src/sonic-utilities
```
* 244ad2d6 - (HEAD -> master, origin/master, origin/HEAD) Revert "Remove syslog service validator in GCU (#2991)" (#3015) (2 hours ago) [jingwenxie]
* d857eb09 - [db_migrator] Fix the broken version chain (#3014) (11 hours ago) [Vivek]
* 424be9ca - [fwutil] Fix python SyntaxWarning for 'is' with literals (#3013) (23 hours ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-20 16:33:00 +08:00
Mai Bui
a850f8b2f5
Fix privileged and volumes for macsec container (#16894)
### Why I did it
Privileges and volumes were incorrectly set in macsec container. Privileged flag is set to false and volumes are not mounted properly.
 ```
admin@vlab-01:~$ docker inspect macsec0 | grep Privi
            "Privileged": false,
admin@vlab-01:~$ docker inspect macsec0 | grep -A 10 Binds
            "Binds": [
                "/var/run/redis0:/var/run/redis:rw",
                "/var/run/redis-chassis:/var/run/redis-chassis:ro",
                "/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0/Nokia-IXR7250E-36x100G/0:/usr/share/sonic/hwsku:ro",
                "/var/run/redis0/:/var/run/redis0/:rw",
                "/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0:/usr/share/sonic/platform:ro"
            ],
```
### How I did it

#### How to verify it
Make sure privileged settings remain unchanged and make sure volumes are properly mounted
```
admin@vlab-01:~$ docker inspect macsec | grep Privi
            "Privileged": false,
admin@vlab-01:~$ docker inspect macsec | grep -A 10 Binds
            "Binds": [
                "/etc/timezone:/etc/timezone:ro",
                "/var/run/redis:/var/run/redis:rw",
                "/var/run/redis-chassis:/var/run/redis-chassis:ro",
                "/etc/fips/fips_enable:/etc/fips/fips_enable:ro",
                "/usr/share/sonic/templates/rsyslog-container.conf.j2:/usr/share/sonic/templates/rsyslog-container.conf.j2:ro",
                "/etc/sonic:/etc/sonic:ro",
                "/host/warmboot:/var/warmboot",
                "/usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/:/usr/share/sonic/hwsku:ro",
                "/usr/share/sonic/device/x86_64-kvm_x86_64-r0:/usr/share/sonic/platform:ro"
            ],
```
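
The manual `docker inspect` check can also be scripted. The Python sketch below parses the JSON that `docker inspect <container>` prints and reports the privileged flag and any missing bind mounts; the function names and the list of required binds passed in are assumptions for illustration.

```python
import json


def container_settings(inspect_json: str) -> tuple:
    """Return (privileged, binds) from `docker inspect <name>` JSON output."""
    info = json.loads(inspect_json)[0]  # docker inspect prints a JSON array
    host_config = info["HostConfig"]
    return host_config["Privileged"], host_config.get("Binds") or []


def missing_binds(binds, required):
    """Return the required bind strings that are absent from the container."""
    return [b for b in required if b not in binds]
```

Feeding it the output of `docker inspect macsec` and a list such as `["/etc/sonic:/etc/sonic:ro", "/host/warmboot:/var/warmboot"]` would return an empty list when the volumes are mounted as expected.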
2023-10-19 11:17:05 -07:00
Liu Shilong
25842ec6d1
Disable read cache when building SONiC fs part 1 (#16936)
Why I did it
The RFS cache has issues that break the official build and PR checker.
When the cache is read, the fsroot-vs/lib/modules folder does not exist.

Work item tracking
Microsoft ADO (number only): 25481484
How I did it
Disable read cache currently.

How to verify it
2023-10-19 10:14:10 +08:00
Stepan Blyshchak
7ab27c1b90
[frr] fix default zebra config not inserted into empty zebra.conf (#16747)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-10-19 08:47:24 +08:00
xumia
5f224327a9
[Security] Upgrade the OpenSSL/OpenSSH to fix CVE alerts (#16902)
### Why I did it
[Security] Upgrade the OpenSSL/OpenSSH to fix CVE alerts

Upgrade OpenSSL to 1.1.1n-0+deb11u5
Fix CVEs:
      CVE-2023-0464 (Excessive Resource Usage Verifying X.509 Policy
      CVE-2023-0465 (Invalid certificate policies in leaf certificates are
      CVE-2023-0466 (Certificate policy check not enabled).
      CVE-2022-4304 (Timing Oracle in RSA Decryption).
      CVE-2023-2650 (Possible DoS translating ASN.1 object identifiers).

Upgrade OpenSSH to 8.4p1-5+deb11u2
Fix CVEs:
    CVE-2023-38408 (Lacks SSH agent restriction)

##### Work item tracking
- Microsoft ADO **(number only)**: 25506776

#### How I did it
Upgrade the OpenSSL/OpenSSH package version and fix the UT failure.

#### How to verify it
Verified by UTs with and without FIPS enabled.
2023-10-18 15:52:26 -07:00
Vivek
6410e66f35
[Mellanox] Enhance the processing of Kconfig in the hw-mgmt integration (#16752)
- Why I did it
Add the ability to add arm64 Mellanox-specific kconfig using the integration tool.
Fix the existing duplicate kconfig problem by using the vanilla .config.
Add the ability to patch the kconfig-inclusions file. Renamed series.patch to external-changes.patch to reflect the behavior.
NOTE: Min hw-mgmt version to use with these changes: V.7.0030.2000, which is not yet upstream but is required prior to it.
This option will be enabled once the new hw-mgmt is upstream.

Depends on sonic-net/sonic-linux-kernel#336

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-10-18 19:32:59 +03:00
mssonicbld
0aa0854113
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16889)
#### Why I did it
src/sonic-swss
```
* f31ccd09 - (HEAD -> master, origin/master, origin/HEAD) Add refillToSync() into ConsumerBase to support warmboot. (#2866) (2 days ago) [mint570]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-18 18:34:21 +08:00
mssonicbld
38749e82d6
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#16900)
#### Why I did it
src/sonic-gnmi
```
* 07e0b36 - (HEAD -> master, origin/master, origin/HEAD) Recover from potential panic when doing map to JSON serialization (#161) (29 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-18 18:34:15 +08:00
mssonicbld
dd0d4a7689
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16931)
#### Why I did it
src/sonic-linux-kernel
```
* 6508505 - (HEAD -> master, origin/master, origin/HEAD) Add drop monitor Kernel Patches for buffer support (#338) (3 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-18 18:33:58 +08:00
mssonicbld
c90bffebbd
[submodule] Update submodule sonic-restapi to the latest HEAD automatically (#16932)
#### Why I did it
src/sonic-restapi
```
* ccad4a2 - (HEAD -> master, origin/master, origin/HEAD) [Tunnel] Support co-existence of IPv4 and IPv6 tunnels (#147) (8 hours ago) [Prince Sunny]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-18 18:33:53 +08:00
Rajkumar-Marvell
357ab54e08
[Marvell] Updated SAI 1.13.0 amd64 debian (#16811)
Why I did it
Added Marvell SAI-1.13.0 debian support for x86_64 platform.

Work item tracking
Microsoft ADO (number only):
How I did it
Compile Marvell libsai.so (with SAI headers from version 1.13.0) and package it with version 1.13.0-1.

How to verify it
2023-10-18 16:47:53 +08:00
Saikrishna Arcot
963d40a77b
Re-add missing dependency for derived debs. (#16896)
* Re-add missing dependency for derived debs.

My previous change removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
re-adding a dependency on the main deb being built/pulled from cache.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Add the kernel and initramfs as dependencies for RFS build

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-18 10:08:45 +08:00
mssonicbld
5ce2a71dff
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16885)
2023-10-14 15:01:31 +08:00
Samuel Angebault
d760fb928c
Disable CPU C-States other than C1 (#16703)
Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU changes state.
There is a latency penalty when a CPU is idle (e.g. C2) and needs to exit this state to come back to the C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.

How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally, Intel CPUs also need intel_idle.max_cstate=0 to fall back to the acpi_idle driver.

How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
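
The two checks above can be expressed as a small Python helper. It is a sketch only: the function name and the `cpu_vendor` argument are hypothetical, and the command line string is assumed to come from `/proc/cmdline`.

```python
def missing_cstate_params(cmdline: str, cpu_vendor: str) -> list:
    """Return the expected C-state kernel parameters missing from cmdline."""
    required = ["processor.max_cstate=1"]
    if cpu_vendor.lower() == "intel":
        # Intel CPUs additionally need this to fall back to acpi_idle.
        required.append("intel_idle.max_cstate=0")
    present = set(cmdline.split())
    return [param for param in required if param not in present]
```

On a device, the input could be read with `open("/proc/cmdline").read()`; an empty result means the expected parameters are all present.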
2023-10-13 20:24:39 -07:00
mssonicbld
f88a5f5d2c
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16835)
#### Why I did it
src/sonic-linux-kernel
```
* fee7d7e - (HEAD -> master, origin/master, origin/HEAD) Add nvidia arm section and an ability to patch kconfig-inc and fix manage-config (#336) (3 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 10:32:24 +08:00
mssonicbld
07827d3776
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16785)
#### Why I did it
src/sonic-swss
```
* b9313df0 - (HEAD -> master, origin/master, origin/HEAD) Reducing the severity of oper fec attribute get failure (#2924) (89 minutes ago) [Sudharsan Dhamal Gopalarathnam]
* cb98893f - Add support for SEND_TO_INGRESS port table.  (#2816) (19 hours ago) [Yilan Ji]
* 966c5bb0 - [Dash] Fix wrong table name for acl_out_table (#2911) (2 days ago) [Ze Gan]
* 35996350 - [FEC]Auto FEC initial changes (#2893) (8 days ago) [Sudharsan Dhamal Gopalarathnam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 09:29:04 +08:00
mssonicbld
cc4eda78e0
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#16836)
#### Why I did it
src/sonic-sairedis
```
* 65323ca - (HEAD -> master, origin/master, origin/HEAD) [VOQ][saidump] To move saidump.sh from the sonic-buildimage repo to the sairedis repo (#1298) (3 days ago) [JunhongMao]
* d520642 - [syncd] Respect each api log level after sai discovery (#1303) (3 days ago) [Kamil Cudnik]
* 7c07d81 - [vslib]: Fix method signatures. (#1299) (3 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 09:28:56 +08:00
mssonicbld
64282bf723
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16857)
#### Why I did it
src/sonic-platform-common
```
* 76a8590 - (HEAD -> master, origin/master, origin/HEAD) Fix exception occurred during decode vendor name and pn (#406) (2 days ago) [Anoop Kamath]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 09:28:50 +08:00
mssonicbld
0e964bf72f
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16858)
#### Why I did it
src/sonic-utilities
```
* bf9c07c4 - (HEAD -> master, origin/master, origin/HEAD) Add target mode to sfputil firmware (#3002) (22 hours ago) [Anoop Kamath]
* 0e43e4dc - [sflow] Added egress Sflow support. (#2790) (2 days ago) [Rajkumar-Marvell]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 09:28:45 +08:00
mssonicbld
6693b63d86
[submodule] Update submodule sonic-ztp to the latest HEAD automatically (#16876)
#### Why I did it
src/sonic-ztp
```
* 739470d - (HEAD -> master, origin/master, origin/HEAD) [ZTP] 'config reload' use -f to avoid system checks (#52) (4 hours ago) [Peter Yu]
* 04cd8e8 - [ZTP] bufsize=1 not supported in binary mode (#51) (4 hours ago) [Peter Yu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 09:28:41 +08:00
Hua Liu
f0d88f3c5c
[TACACS] Improve per-command authorization performance by read passwd entry with getpwent (#16460)
Improve per-command authorization performance by reading passwd entries with getpwent.

#### Why I did it
Currently, per-command authorization checks whether the user is a remote user with the getpwnam API, which triggers tacplus-nss to authenticate with the TACACS server.
This is not necessary because when a user logs in, the user information is already added to the local passwd file.
The getpwent API can read directly from the passwd file, which improves per-command authorization performance.

##### Work item tracking
- Microsoft ADO: 25104723

#### How I did it
Improve per-command authorization performance by reading passwd entries with getpwent.

#### How to verify it
Pass all UT.

### Description for the changelog
Improve per-command authorization performance by reading passwd entries with getpwent.
2023-10-13 17:43:10 -07:00
Longxiang Lyu
072eaed2e3
[snmp] Check intfmgrd running before start (#16588)
Add pre start check to ensure intfmgrd is running.
The check will run for 20 seconds at most.
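
A bounded pre-start check of this kind can be sketched in Python as a polling loop with a deadline. The helper below is an illustration, not the script added by this commit; `check` stands for whatever test decides that intfmgrd is running, and the `clock` and `sleep` parameters exist only to make the loop testable.

```python
import time


def wait_until(check, timeout=20.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # give up; the caller decides how to proceed
        sleep(interval)
```

One possible `check`, stated as an assumption, is a function that runs `pgrep -x intfmgrd` through `subprocess.run` and tests for a zero return code.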

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2023-10-13 16:00:51 -07:00
mssonicbld
465ccde3d5
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#16833)
#### Why I did it
src/sonic-gnmi
```
* 8e13400 - (HEAD -> master, origin/master, origin/HEAD) Fix random build failures due to sonic_internal.proto (#157) (3 days ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 06:32:28 +08:00
mssonicbld
35b6d3f6ed
[submodule] Update submodule sonic-restapi to the latest HEAD automatically (#16871)
#### Why I did it
src/sonic-restapi
```
* c8fa96b - (HEAD -> master, origin/master, origin/HEAD) Remove command to install libhiredis deb file (#146) (23 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-10-14 06:32:18 +08:00
mssonicbld
43c474a80b
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#16872)
2023-10-14 06:21:25 +08:00
Saikrishna Arcot
9ae77bc2dd
Remove main deb installation for derived deb build (#16859)
* Don't install dependencies of derived debs

When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.

* Re-add openssh-client and openssh-sftp-server as derived debs

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-13 10:16:45 -07:00
Mai Bui
23badd68ea
[docker-dhcp-relay] limit privileged flag for dhcp_relay container (#16817)
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420

#### How I did it
Reduce linux capabilities in privileged flag

#### How to verify it
Run dhcprelay sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
```
admin@vlab-05:~$ docker inspect dhcp_relay | grep Privilege
            "Privileged": false,
admin@vlab-05:~$ docker exec -it dhcp_relay bash
root@vlab-05:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
```
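
The capability check can be automated by parsing the `capsh --print` output. The Python sketch below assumes the output format shown in this commit (a `Current:` line with a comma-separated list and an `=ep` suffix) and compares it with Docker's default capability set; the function names are hypothetical.

```python
# Docker's default capability set, matching the list printed above.
DOCKER_DEFAULT_CAPS = {
    "cap_chown", "cap_dac_override", "cap_fowner", "cap_fsetid", "cap_kill",
    "cap_setgid", "cap_setuid", "cap_setpcap", "cap_net_bind_service",
    "cap_net_raw", "cap_sys_chroot", "cap_mknod", "cap_audit_write",
    "cap_setfcap",
}


def current_caps(capsh_output: str) -> set:
    """Extract capability names from the 'Current:' line of capsh --print."""
    for line in capsh_output.splitlines():
        if line.startswith("Current:"):
            caps = line.split(":", 1)[1].strip()
            caps = caps.split("=", 1)[0]  # drop the trailing "=ep" flags
            return {c for c in caps.split(",") if c}
    return set()


def extended_caps(capsh_output: str) -> set:
    """Return capabilities beyond the Docker defaults (empty set if none)."""
    return current_caps(capsh_output) - DOCKER_DEFAULT_CAPS
```

An empty result from `extended_caps` corresponds to "container only has default Linux caps"; other capsh versions may print a different format, which this sketch does not handle.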
2023-10-13 10:05:54 -07:00
Pavan Naregundi
add98b221b
[Marvell-arm64]: Add hugepage cmdline agrument
The updated SDK and driver require hugepages to be reserved during kernel
boot. These kernel command line arguments are passed from installer.conf
in the device folder.

Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-10-12 02:07:36 -07:00
Pavan Naregundi
5c5e4c77f4
[Marvell-arm64] Support lazy install of sdk drivers
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).

Additionally, the DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.

Change-Id: Id5b011e6bd67accf7b1579d91cb7affad464e916
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-10-12 02:07:36 -07:00
Samuel Angebault
be22217b64
[Arista] Remove pcie device monitoring for 7260CX3-64 (#12734)
On some products from this line, one of the management NICs might be unpopulated.
On such products, this leads to errors from pcied and pcie-check.sh.

How I did it
Remove this PCIe device from pcie.yaml

How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there are no more error logs in the syslog.

ADO: 25447788
2023-10-11 22:57:34 -07:00
Saikrishna Arcot
469aed2cf7
[baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826)
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. 
Since we're building openssh with some patches, we need to update our version as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-11 10:42:20 -07:00
Ashwin Srinivasan
61683d9d64
Revert "Move /var/log to RAM for Mellanox SN2700, Nokia 7215 and Dell S6100 (#15077)" (#16775)
This reverts commit 05f326eed9.

Microsoft ADO 25355843:
2023-10-11 10:36:29 -07:00
mssonicbld
ac77abe70b
[submodule] Update submodule sonic-snmpagent to the latest HEAD automatically (#16837)
2023-10-11 14:47:21 +08:00
Yakiv Huryk
6cb8893180
[build] add support for 2 stage rootfs build (#15924)
This adds optimization for the SONiC image build by splitting the final build step into two stages. It allows running the first stage in parallel, improving build time.

The optimization is enabled via new rules/config flag ENABLE_RFS_SPLIT_BUILD (disabled by default)

- Why I did it
To improve build time.

- How I did it
Added logic to run build_debian.sh in two stages, transferring the progress via a new build artifact.

- How to verify it
make ENABLE_RFS_SPLIT_BUILD=y SONIC_BUILD_JOBS=32 target/<IMAGE_NAME>.bin

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-10-11 09:33:17 +03:00
abdosi
7059f42385
[chassis/multi-asic] Make sure iBGP session established as directly connected (#16777)
What I did:
Make sure that for internal iBGP we are one hop away (directly connected) by using the Generalized TTL Security Mechanism (GTSM).

Why I did:
Without this change, on a packet chassis an i-BGP session can be established even if there is no direct connection. An example follows:

- Let's say we have 3 LCs (LC1/LC2/LC3), each having an i-BGP session with the others over Loopback4096.
- Each LC has a static route towards the other LCs' Loopback4096 to establish the i-BGP session.
- LC1 learns the default route 0.0.0.0/0 from its e-BGP peers and sends it to LC2 and LC3 over i-BGP.
- Now suppose that, for some reason, the static route towards LC3 is removed or missing on LC2. We expect the i-BGP session between LC2 and LC3 to go down.
- However, the i-BGP session between LC2 and LC3 does not go down because of the feature ip nht-resolve-via-default, with which LC2 uses the default route to reach Loopback4096 of LC3. Because it is using the default route, BGP packets from LC2 towards LC3 are first routed to LC1 and then go to LC3 from there.

The above scenario can result in packet mis-forwarding on the data plane.

How I fixed it:

To make sure BGP packets between i-BGP peers do not take an extra routing hop, enable the GTSM feature:

neighbor PEER ttl-security hops NUMBER

This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.

We set the hop count to 1, which makes FRR reject the BGP connection if it receives BGP packets with TTL < 255. Setting this attribute also makes sure i-BGP frames are originated with an IP TTL of 255.
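
The TTL rule can be written down as a one-line check. The Python sketch below illustrates the GTSM acceptance condition from RFC 5082 for a configured hop count; it is not FRR code, and the function name is hypothetical.

```python
MAX_TTL = 255  # senders using GTSM originate packets with the maximum TTL


def gtsm_accepts(received_ttl: int, hops: int = 1) -> bool:
    """Return True if a packet's TTL is consistent with at most `hops` hops."""
    if not 1 <= hops <= 254:
        raise ValueError("hops must be between 1 and 254")
    # Each routing hop decrements the TTL by one, so a directly connected
    # peer (hops=1) must deliver packets whose TTL is still 255.
    return received_ttl >= MAX_TTL - hops + 1
```

With `hops=1`, a packet arriving with TTL 255 is accepted, while a packet that took the detour through LC1 arrives with TTL 254 and is rejected.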

How I verify:

Manual verification of the above scenario. When BGP packets are received with IP TTL 254 (an additional routing hop), we see TCP FIN flags because BGP is rejecting the connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-10-10 11:51:40 -07:00
zitingguo-ms
7f706329f8
upgrade xgs SAI version to 8.4.21.0 (#16805)
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:

8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Performance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-10-10 09:59:15 -07:00
Vadym Hlushko
9d5bcdae74
[sflow]: Remove the ENABLE_SFLOW_DROPMON flag (#16607)
- Why I did it
To simplify usability and increase adoption of the sFlow + dropmon feature without rebuilding an image.

- How I did it
Remove the ENABLE_SFLOW_DROPMON compilation flag, and remove unnecessary patches.

- How to verify it
1. Configure the sFlow on the switch
2. Configure the Host (PTF)
3. Launch the sflowtool on Host (PTF)
4. Send the dropped packets from Host (PTF) to the switch via scapy
5. Check the L3 counters on the switch
6. Check the samples that were captured by the sflowtool on the Host (PTF)

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-10 19:27:12 +03:00
Junchao-Mellanox
0846322e9a
[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)
- Why I did it
hw-management renamed PSU temperature related sysfs:

psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR is to align the change in SONiC.
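
The rename follows a regular pattern, which the Python sketch below captures. It is an illustration of the mapping listed above, not the platform API change itself; the function name is hypothetical and only the PSU temperature nodes named in this commit are covered.

```python
import re

# Matches the old node names: psu<N>_temp and psu<N>_temp_max.
_OLD_PSU_TEMP = re.compile(r"^psu(\d+)_temp(_max)?$")


def new_psu_temp_node(name: str) -> str:
    """Map an old PSU temperature sysfs node name to its renamed form."""
    m = _OLD_PSU_TEMP.match(name)
    if not m:
        return name  # already renamed, or not a PSU temperature node
    return f"psu{m.group(1)}_temp1{m.group(2) or ''}"
```

Names that already use the new form, such as `psu1_temp1`, do not match the pattern and are returned unchanged.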

- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold

- How to verify it
Manual test
sonic-mgmt Regression test
2023-10-10 19:21:27 +03:00
Yakiv Huryk
5719d1a59a
[Mellanox] add Mellanox-SN4700-O28 SKU (#16784)
- Why I did it
To add new SKU for Virtual Smart Switch. T1 switch with 28x400G ports.

- How I did it
Add new SKU with all relevant files.

- How to verify it
Run sonic-mgmt t1-28 test suites based on master.
A few issues were observed that are related not to the topology but to the stability of master.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-10-10 19:20:10 +03:00
Hua Liu
6e3260098f
Enable ZMQ between GNMI and Orchanget (#16661)
Enable ZMQ on gnmi and orchagent

#### Why I did it
Improve GNMI API performance for Dash resources

#### How I did it
Modify gnmi and orchagent service start script, add ZMQ parameter.

#### How to verify it
Pass all UT & E2E test
Manually verify by creating Dash resources via the gnmi API.
2023-10-09 14:22:50 -07:00
Nazarii Hnydyn
875a6d9a1f
[Mellanox][Switching Mode] Enable Store-And-Forward switching mode on specific platforms (#16781)
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific, requested SKUs. The default SKU remains untouched.

- How I did it
Added vendor SAI config options

- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Run sonic-mgmt test suites while this option is enabled.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-10-09 19:00:02 +03:00