Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state.
There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.
How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver.
How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
#### Why I did it
src/sonic-linux-kernel
```
* fee7d7e - (HEAD -> master, origin/master, origin/HEAD) Add nvidia arm section and an ability to patch kconfig-inc and fix manage-config (#336) (3 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* b9313df0 - (HEAD -> master, origin/master, origin/HEAD) Reducing the severity of oper fec attribute get failure (#2924) (89 minutes ago) [Sudharsan Dhamal Gopalarathnam]
* cb98893f - Add support for SEND_TO_INGRESS port table. (#2816) (19 hours ago) [Yilan Ji]
* 966c5bb0 - [Dash] Fix wrong table name for acl_out_table (#2911) (2 days ago) [Ze Gan]
* 35996350 - [FEC]Auto FEC initial changes (#2893) (8 days ago) [Sudharsan Dhamal Gopalarathnam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 65323ca - (HEAD -> master, origin/master, origin/HEAD) [VOQ][saidump] To move saidump.sh from the sonic-buildimage repo to the sairedis repo (#1298) (3 days ago) [JunhongMao]
* d520642 - [syncd] Respect each api log level after sai discovery (#1303) (3 days ago) [Kamil Cudnik]
* 7c07d81 - [vslib]: Fix method signatures. (#1299) (3 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 76a8590 - (HEAD -> master, origin/master, origin/HEAD) Fix exception occurred during decode vendor name and pn (#406) (2 days ago) [Anoop Kamath]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* bf9c07c4 - (HEAD -> master, origin/master, origin/HEAD) Add target mode to sfputil firmware (#3002) (22 hours ago) [Anoop Kamath]
* 0e43e4dc - [sflow] Added egress Sflow support. (#2790) (2 days ago) [Rajkumar-Marvell]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-ztp
```
* 739470d - (HEAD -> master, origin/master, origin/HEAD) [ZTP] 'config reload' use -f to avoid system checks (#52) (4 hours ago) [Peter Yu]
* 04cd8e8 - [ZTP] bufsize=1 not supported in binary mode (#51) (4 hours ago) [Peter Yu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Improve per-command authorization performance by read passwd entry with getpwent.
#### Why I did it
Currently per-command authorization will check if user is remote user with getpwnam API, which will trigger tacplus-nss for authentication with TACACS server.
But this is not necessary because when user login the user information already add to local passwd file.
Use getpwent API can directly read from passwd file, this will improve per-command authorization performance.
##### Work item tracking
- Microsoft ADO: 25104723
#### How I did it
Improve per-command authorization performance by read passwd entry with getpwent.
#### How to verify it
Pass all UT.
### Description for the changelog
Improve per-command authorization performance by read passwd entry with getpwent.
#### Why I did it
src/sonic-gnmi
```
* 8e13400 - (HEAD -> master, origin/master, origin/HEAD) Fix random build failures due to sonic_internal.proto (#157) (3 days ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-restapi
```
* c8fa96b - (HEAD -> master, origin/master, origin/HEAD) Remove command to install libhiredis deb file (#146) (23 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Don't install dependencies of derived debs
When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.
* Re-add openssh-client and openssh-sftp-server as derived debs
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run dhcprelay sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
```
admin@vlab-05:~$ docker inspect dhcp_relay | grep Privilege
"Privileged": false,
admin@vlab-05:~$ docker exec -it dhcp_relay bash
root@vlab-05:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
```
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionaly, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Change-Id: Id5b011e6bd67accf7b1579d91cb7affad464e916
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
On some products from this line one of the management NIC might be unpopulated.
On such products this leads to errors from pcied and pcie-check.sh
How I did it
Remove this PCIe device from pcie.yaml
How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there is no more error logs in the syslog.
ADO: 25447788
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408.
Since we're building openssh with some patches, we need to update our version as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This adds optimization for the SONiC image build by splitting the final build step into two stages. It allows running the first stage in parallel, improving build time.
The optimization is enabled via new rules/config flag ENABLE_RFS_SPLIT_BUILD (disabled by default)
- Why I did it
To improve a build time.
- How I did it
Added a logic to run build_debian.sh in two stages, transferring the progress via a new build artifact.
- How to verify it
make ENABLE_RFS_SPLIT_BUILD=y SONIC_BUILD_JOBS=32 target/<IMAGE_NAME>.bin
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.
Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example
- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.
Above scenario can result in packet mis-forwarding on data plane
How I fixed it:-
To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature
neighbor PEER ttl-security hops NUMBER
This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.
We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.
How I verify:
Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:
8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Preformance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
- Why I did it
To simplify usability and increase adoption of the sFlow + dropmon feature without rebuilding an image.
- How I did it
Remove the ENABLE_SFLOW_DROPMON compilation flag, and remove unnecessary patches.
- How to verify it
1. Configure the sFlow on the switch
2. Configure the Host (PTF)
3. Launch the sflowtool on Host (PTF)
4. Send the dropped packets from Host (PTF) to the switch via scapy
5. Check the L3 counters on the switch
6. Check the samples that were captured by the sflowtool on the Host (PTF)
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
- Why I did it
hw-management renamed PSU temperature related sysfs:
psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR is to align the change in SONiC.
- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold
- How to verify it
Manual test
sonic-mgmt Regression test
- Why I did it
To add new SKU for Virtual Smart Switch. T1 switch with 28x400G ports.
- How I did it
Add new SKU with all relevant files.
- How to verify it
run sonic-mgmt t1-28 test suites based on master.
Few issues observed not relevant to the topology but to the stability of master
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Enable ZMQ on gnmi and orchagent
#### Why I did it
Improve GNMI API performance for Dash resources
#### How I did it
Modify gnmi and orchagent service start script, add ZMQ parameter.
#### How to verify it
Pass all UT & E2E test
Manually verify with create Dash resources via gnmi API.
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific and requested SKUs. Default SKU remain untouched.
- How I did it
Added vendor SAI config options
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
run sonic-mgmt test suits while this option is enabled.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
src/sonic-linux-kernel
```
* d5232ab - (HEAD -> master, origin/master, origin/HEAD) arm64: ac5: Fix watchdog timeleft (#334) (7 days ago) [pavannaregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 91e7a27a - (HEAD -> master, origin/master, origin/HEAD) [buffers] Add handler for the 'create_only_config_db_buffers' configuration knob (#2883) (11 hours ago) [Vadym Hlushko]
* 7f7bc33d - Do not set internal port count to the PortConfigDone DB value. (#2910) (34 hours ago) [mint570]
* d0f1108b - [muxorch] Reorder the neighbor disable operations (#2917) (2 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/linkmgrd
```
* f34cb09 - (HEAD -> master, origin/master, origin/HEAD) [warmboot] config all interfaces back to `auto` if reconciliation times out (#220) (8 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
1. Upgrade Centec SAI debian package version to v1.13, in order to match syncd's requirement.
2. Fix syncd compile fail for missing sai_query_api_version function in verdor sai
Signed-off-by: Xianghong Gu <xgu@centec.com>
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
* [buffers] Align the sonic-device_metadata.yang
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
---------
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run radv sonic-mgmt tests
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run eventd sonic-mgmt tests
### Why I did it
When FRR is built with Cache enabled, the build failed with the following error logs
```
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/grpc_basic/lib
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/ospfapi/lib
[2023-09-20T15:17:00.273Z] make: *** [Makefile.cache:528: target/debs/bullseye/frr_8.5.1-sonic-0_amd64.deb.smdep] Error 123
[2023-09-20T15:17:00.273Z] make: *** Waiting for unfinished jobs....
```
#### How I did it
Currently symlinks are excluded in hardcoded fashion. With FRR upgrades new symlinks might get introduced. To overcome it modified the way in which symlinks are excluded by finding symlinks using find command
#### How to verify it
Build FRR with cache enabled
Why I did it
fixes#15949
Problem 1: Setting ONIE_IMAGE_PART_SIZE using env variable or using "make ONIE_IMAGE_PART_SIZE=65536 USERNAME=test PASSWORD=test all" did not work.
Problem 2: The platform specific file for example "device/x86_64-8201_32fh_o-r0/installer.conf" cannot override it by setting value of ONIE_IMAGE_PART_SIZE in the file. change 2 adds support to do that.
How I did it
Change 1: when ONIE_IMAGE_PART_SIZE, the files Makefile.work and slave.mk should pass that setting along all the way to build_image.sh. Please see commit 1.
Change 2: In installer/install.sh, save the value set during build time string replace into a value and then let this value be overridden later when installer.conf get read which is platform specific. If platform does not override it, the original value will continue to work. Please see commit 2.
How to verify it
1: The below command works now
make ONIE_IMAGE_PART_SIZE=65536 USERNAME=test PASSWORD=test all"
The image properly was installed using ONIE and the partition size reflects what was passed in the above build command.
If the above value is not set, the default from "onie-image.conf" takes effect and still works.
2: Set ONIE_IMAGE_PART_SIZE in platform specific file like below example
--------------Diff----
device/x86_64-8201_32fh_o-r0/installer.conf
@@ -1 +1,2 @@
ONIE_PLATFORM_EXTRA_CMDLINE_LINUX=" intel_iommu=off"
+ONIE_IMAGE_PART_SIZE=128000
and built the image using "make USERNAME=test PASSWORD=test all" and verified that the final installation properly partitioned the disk to the requested value from installer.conf file.
Created patches to address two CVEs from FRR CVE-2023-41358 and CVE-2023-38802.
Patch FRR commit CVE fixed
0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch FRRouting/frr@f291f1e CVE-2023-41358
0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch FRRouting/frr@8a4a88c CVE-2023-38802