- Why I did it
Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely.
- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf
- How to verify it
run regression on the MSN2700 platform to make the error log will not be printed to the syslog.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
src/sonic-platform-common
```
* 6d804d6 - (HEAD -> master, origin/master, origin/HEAD) Fix SSD health percentage issue for vendor Virtium (#407) (3 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Reduce SONiC image filesystem size
Add a build option to reduce the image size.
The image reduction process is affecting the builds in 2 ways:
- change some packages that are installed in the rootfs
- apply a rootfs reduction script
The script itself will perform a few steps:
- remove file duplication by leveraging hardlinks
- under /usr/share/sonic since the symlinks under the device folder are lost during the build.
- under /var/lib/docker since the files there will only be mounted ro
- remove some extra files (man, docs, licenses, ...)
- some image specific space reduction (only for aboot images currently)
The script can later be improved but for now it's reducing the rootfs
size by ~30%.
* restore fully featured vim package
Why I did it
Add config to set pip HTTP timeout value in building process for build to be more stable.
Default value is 60.
Work item tracking
Microsoft ADO (number only): 25190067
How I did it
Insert timeout options in all pip commands.
Why I did it
K8S_OPTIONS maybe empty, so there will be syntax error. Need to fix this issue.
Work item tracking
Microsoft ADO (number only): 25495020
How I did it
Add "" for K8S_OPTIONS to avoid exception.
How to verify it
No more exception is throwed in PR build validation pipeline.
Why I did it
Part implementation of dhcp_server. HLD: sonic-net/SONiC#1282.
Add dhcpservd to dhcp_server container.
How I did it
Add installing required pkg (psutil) in Dockerfile.
Add copying required file to container in Dockerfile (kea-dhcp related and dhcpservd related)
Add critical_process and supervisor config.
Add support for generating kea config (only in dhcpservd.py) and updating lease table (in dhcpservd.py and lease_update.sh)
How to verify it
Build image with setting INCLUDE_DHCP_SERVER to y and enabled dhcp_server feature after installed image, container start as expected.
Enter container and found that all processes defined in supervisor configuration running as expected.
Kill processes defined in critical_processes, container exist.
#### Why I did it
src/sonic-utilities
```
* 244ad2d6 - (HEAD -> master, origin/master, origin/HEAD) Revert "Remove syslog service validator in GCU (#2991)" (#3015) (2 hours ago) [jingwenxie]
* d857eb09 - [db_migrator] Fix the broken version chain (#3014) (11 hours ago) [Vivek]
* 424be9ca - [fwutil] Fix python SyntaxWarning for 'is' with literals (#3013) (23 hours ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
### Why I did it
Privileges and volumes were incorrectly set in macsec container. Privileged flag is set to false and volumes are not mounted properly.
```
admin@vlab-01:~$ docker inspect macsec0 | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec0 | grep -A 10 Binds
"Binds": [
"/var/run/redis0:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0/Nokia-IXR7250E-36x100G/0:/usr/share/sonic/hwsku:ro",
"/var/run/redis0/:/var/run/redis0/:rw",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0:/usr/share/sonic/platform:ro"
],
```
### How I did it
#### How to verify it
Make sure privileged settings remain unchanged and make sure volumes are properly mounted
```
admin@vlab-01:~$ docker inspect macsec | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec | grep -A 10 Binds
"Binds": [
"/etc/timezone:/etc/timezone:ro",
"/var/run/redis:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/etc/fips/fips_enable:/etc/fips/fips_enable:ro",
"/usr/share/sonic/templates/rsyslog-container.conf.j2:/usr/share/sonic/templates/rsyslog-container.conf.j2:ro",
"/etc/sonic:/etc/sonic:ro",
"/host/warmboot:/var/warmboot",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/:/usr/share/sonic/hwsku:ro",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0:/usr/share/sonic/platform:ro"
],
```
Why I did it
RFS cache have issues which breaks official build and PR checker.
By reading cache, fsroot-vs/lib/modules folder don't exist.
Work item tracking
Microsoft ADO (number only): 25481484
How I did it
Disable read cache currently.
How to verify it
### Why I did it
[Security] Upgrade the OpenSSL/OpenSSH to fix CVE alerts
Upgrade OpenSSL to 1.1.1n-0+deb11u5
Fix CVEs:
CVE-2023-0464 (Excessive Resource Usage Verifying X.509 Policy
CVE-2023-0465 (Invalid certificate policies in leaf certificates are
CVE-2023-0466 (Certificate policy check not enabled).
CVE-2022-4304 (Timing Oracle in RSA Decryption).
CVE-2023-2650 (Possible DoS translating ASN.1 object identifiers).
Upgrade OpenSSH to 8.4p1-5+deb11u2
Fix CVEs:
CVE-2023-38408 (Lacks SSH agent restriction)
##### Work item tracking
- Microsoft ADO **(number only)**: 25506776
#### How I did it
Upgrade the OpenSSL/OpenSSH package version and fix the UT failure.
#### How to verify it
Verified by UTs with and without FIPS enabled.
- Why I did it
Add an ability to add arm64 mellanox specific kconfig using the integration tool
Fix the existing duplicate kconfig problem by using the vanilla .config
Add an ability to patch kconfig-inclusions file. Renamed series.patch to external-changes.patch to reflect the behavior
NOTE: Min hw-mgmt version to use with these changes: V.7.0030.2000 not yet upstream but required prio to it.
This option will be enabled one the new hw mgmt will be upstream.
Depends on sonic-net/sonic-linux-kernel#336
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
#### Why I did it
src/sonic-swss
```
* f31ccd09 - (HEAD -> master, origin/master, origin/HEAD) Add refillToSync() into ConsumerBase to support warmboot. (#2866) (2 days ago) [mint570]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-gnmi
```
* 07e0b36 - (HEAD -> master, origin/master, origin/HEAD) Recover from potential panic when doing map to JSON serialization (#161) (29 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* 6508505 - (HEAD -> master, origin/master, origin/HEAD) Add drop monitor Kernel Patches for buffer support (#338) (3 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-restapi
```
* ccad4a2 - (HEAD -> master, origin/master, origin/HEAD) [Tunnel] Support co-existence of IPv4 and IPv6 tunnels (#147) (8 hours ago) [Prince Sunny]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Added Marvell SAI-1.13.0 debian support for x86_64 platform.
Work item tracking
Microsoft ADO (number only):
How I did it
compile marvel libsai.so (with SAI headers from version 1.13.0) and package it with version 1.13.0-1
How to verify it
* Re-add missing dependency for derived debs.
My previous changed removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
readding a dependency on the main deb being built/pulled from cache.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Add the kernel and initramfs as dependencies for RFS build
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state.
There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.
How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver.
How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
#### Why I did it
src/sonic-linux-kernel
```
* fee7d7e - (HEAD -> master, origin/master, origin/HEAD) Add nvidia arm section and an ability to patch kconfig-inc and fix manage-config (#336) (3 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* b9313df0 - (HEAD -> master, origin/master, origin/HEAD) Reducing the severity of oper fec attribute get failure (#2924) (89 minutes ago) [Sudharsan Dhamal Gopalarathnam]
* cb98893f - Add support for SEND_TO_INGRESS port table. (#2816) (19 hours ago) [Yilan Ji]
* 966c5bb0 - [Dash] Fix wrong table name for acl_out_table (#2911) (2 days ago) [Ze Gan]
* 35996350 - [FEC]Auto FEC initial changes (#2893) (8 days ago) [Sudharsan Dhamal Gopalarathnam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 65323ca - (HEAD -> master, origin/master, origin/HEAD) [VOQ][saidump] To move saidump.sh from the sonic-buildimage repo to the sairedis repo (#1298) (3 days ago) [JunhongMao]
* d520642 - [syncd] Respect each api log level after sai discovery (#1303) (3 days ago) [Kamil Cudnik]
* 7c07d81 - [vslib]: Fix method signatures. (#1299) (3 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 76a8590 - (HEAD -> master, origin/master, origin/HEAD) Fix exception occurred during decode vendor name and pn (#406) (2 days ago) [Anoop Kamath]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* bf9c07c4 - (HEAD -> master, origin/master, origin/HEAD) Add target mode to sfputil firmware (#3002) (22 hours ago) [Anoop Kamath]
* 0e43e4dc - [sflow] Added egress Sflow support. (#2790) (2 days ago) [Rajkumar-Marvell]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-ztp
```
* 739470d - (HEAD -> master, origin/master, origin/HEAD) [ZTP] 'config reload' use -f to avoid system checks (#52) (4 hours ago) [Peter Yu]
* 04cd8e8 - [ZTP] bufsize=1 not supported in binary mode (#51) (4 hours ago) [Peter Yu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Improve per-command authorization performance by read passwd entry with getpwent.
#### Why I did it
Currently per-command authorization will check if user is remote user with getpwnam API, which will trigger tacplus-nss for authentication with TACACS server.
But this is not necessary because when user login the user information already add to local passwd file.
Use getpwent API can directly read from passwd file, this will improve per-command authorization performance.
##### Work item tracking
- Microsoft ADO: 25104723
#### How I did it
Improve per-command authorization performance by read passwd entry with getpwent.
#### How to verify it
Pass all UT.
### Description for the changelog
Improve per-command authorization performance by read passwd entry with getpwent.
#### Why I did it
src/sonic-gnmi
```
* 8e13400 - (HEAD -> master, origin/master, origin/HEAD) Fix random build failures due to sonic_internal.proto (#157) (3 days ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-restapi
```
* c8fa96b - (HEAD -> master, origin/master, origin/HEAD) Remove command to install libhiredis deb file (#146) (23 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Don't install dependencies of derived debs
When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.
* Re-add openssh-client and openssh-sftp-server as derived debs
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run dhcprelay sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
```
admin@vlab-05:~$ docker inspect dhcp_relay | grep Privilege
"Privileged": false,
admin@vlab-05:~$ docker exec -it dhcp_relay bash
root@vlab-05:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
```
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionaly, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Change-Id: Id5b011e6bd67accf7b1579d91cb7affad464e916
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
On some products from this line one of the management NIC might be unpopulated.
On such products this leads to errors from pcied and pcie-check.sh
How I did it
Remove this PCIe device from pcie.yaml
How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there is no more error logs in the syslog.
ADO: 25447788
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408.
Since we're building openssh with some patches, we need to update our version as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This adds optimization for the SONiC image build by splitting the final build step into two stages. It allows running the first stage in parallel, improving build time.
The optimization is enabled via new rules/config flag ENABLE_RFS_SPLIT_BUILD (disabled by default)
- Why I did it
To improve a build time.
- How I did it
Added a logic to run build_debian.sh in two stages, transferring the progress via a new build artifact.
- How to verify it
make ENABLE_RFS_SPLIT_BUILD=y SONIC_BUILD_JOBS=32 target/<IMAGE_NAME>.bin
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.
Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example
- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.
Above scenario can result in packet mis-forwarding on data plane
How I fixed it:-
To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature
neighbor PEER ttl-security hops NUMBER
This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.
We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.
How I verify:
Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:
8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Preformance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
- Why I did it
To simplify usability and increase adoption of the sFlow + dropmon feature without rebuilding an image.
- How I did it
Remove the ENABLE_SFLOW_DROPMON compilation flag, and remove unnecessary patches.
- How to verify it
1. Configure the sFlow on the switch
2. Configure the Host (PTF)
3. Launch the sflowtool on Host (PTF)
4. Send the dropped packets from Host (PTF) to the switch via scapy
5. Check the L3 counters on the switch
6. Check the samples that were captured by the sflowtool on the Host (PTF)
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
- Why I did it
hw-management renamed PSU temperature related sysfs:
psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR is to align the change in SONiC.
- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold
- How to verify it
Manual test
sonic-mgmt Regression test
- Why I did it
To add new SKU for Virtual Smart Switch. T1 switch with 28x400G ports.
- How I did it
Add new SKU with all relevant files.
- How to verify it
run sonic-mgmt t1-28 test suites based on master.
Few issues observed not relevant to the topology but to the stability of master
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Enable ZMQ on gnmi and orchagent
#### Why I did it
Improve GNMI API performance for Dash resources
#### How I did it
Modify gnmi and orchagent service start script, add ZMQ parameter.
#### How to verify it
Pass all UT & E2E test
Manually verify with create Dash resources via gnmi API.
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific and requested SKUs. Default SKU remain untouched.
- How I did it
Added vendor SAI config options
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
run sonic-mgmt test suits while this option is enabled.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>