This adds an optimization for the SONiC image build by splitting the final build step into two stages. The first stage can run in parallel with other build steps, which improves build time.
The optimization is enabled via the new rules/config flag ENABLE_RFS_SPLIT_BUILD (disabled by default).
- Why I did it
To improve build time.
- How I did it
Added logic to run build_debian.sh in two stages, passing progress between them via a new build artifact.
- How to verify it
make ENABLE_RFS_SPLIT_BUILD=y SONIC_BUILD_JOBS=32 target/<IMAGE_NAME>.bin
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
What I did:
Make sure internal iBGP peers are one hop away (directly connected) by using the Generalized TTL Security Mechanism (GTSM).
Why I did:
Without this change, on a packet chassis an i-BGP session can be established even if there is no direct connection. An example follows:
- Say we have three line cards (LC1, LC2, LC3), each having an i-BGP session with the others over Loopback4096.
- Each LC has a static route towards the other LCs' Loopback4096 to establish the i-BGP sessions.
- LC1 learns the default route 0.0.0.0/0 from its e-BGP peers and sends it to LC2 and LC3 over i-BGP.
- Now suppose that, for some reason, the static route on LC2 towards LC3 is removed or missing. We expect the i-BGP session between LC2 and LC3 to go down.
- However, the i-BGP session between LC2 and LC3 does not go down because of the ip nht-resolve-via-default feature, which lets LC2 use the default route to reach Loopback4096 of LC3. Because it uses the default route, BGP packets from LC2 towards LC3 are first routed to LC1 and reach LC3 from there.
The above scenario can result in packet mis-forwarding on the data plane.
How I fixed it:
To make sure BGP packets between i-BGP peers do not take an extra routing hop, enable the GTSM feature:
neighbor PEER ttl-security hops NUMBER
This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.
We set the hop count to 1, which makes FRR reject the BGP connection if received BGP packets have an IP TTL < 255. Setting this attribute also ensures that i-BGP packets are originated with an IP TTL of 255.
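The TTL check described above can be illustrated with a minimal Python sketch. It is not FRR code: the function names are hypothetical, and it only models the GTSM rule that a neighbor allowed to be N hops away must deliver packets with a TTL of at least 255 - N + 1.

```python
MAX_TTL = 255  # highest value the 8-bit IP TTL field can hold


def gtsm_min_ttl(hops: int) -> int:
    """Lowest TTL accepted from a neighbor that is at most `hops` hops away."""
    if not 1 <= hops <= 254:
        raise ValueError("hops must be between 1 and 254")
    return MAX_TTL - hops + 1


def gtsm_accepts(received_ttl: int, hops: int = 1) -> bool:
    """Return True if a packet with `received_ttl` passes the GTSM check."""
    return received_ttl >= gtsm_min_ttl(hops)


# With hops=1 the peer must be directly connected: only TTL 255 is accepted.
print(gtsm_accepts(255))  # directly connected peer
print(gtsm_accepts(254))  # packet crossed one extra routing hop
```

With a hop count of 1, a packet that was routed through LC1 arrives with TTL 254 and fails the check, which matches the rejection described above.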
How I verify:
Manual verification of the above scenario. When BGP packets are received with IP TTL 254 (one additional routing hop), we see the FIN TCP flag because BGP rejects the connection.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:
8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Performance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
- Why I did it
To simplify usability and increase adoption of the sFlow + dropmon feature without rebuilding an image.
- How I did it
Remove the ENABLE_SFLOW_DROPMON compilation flag, and remove unnecessary patches.
- How to verify it
1. Configure the sFlow on the switch
2. Configure the Host (PTF)
3. Launch the sflowtool on Host (PTF)
4. Send the dropped packets from Host (PTF) to the switch via scapy
5. Check the L3 counters on the switch
6. Check the samples that were captured by the sflowtool on the Host (PTF)
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
- Why I did it
hw-management renamed the PSU temperature related sysfs nodes:
psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR aligns SONiC with that change.
- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold
- How to verify it
Manual test
sonic-mgmt Regression test
- Why I did it
To add a new SKU for the Virtual Smart Switch: a T1 switch with 28x400G ports.
- How I did it
Add new SKU with all relevant files.
- How to verify it
Run the sonic-mgmt t1-28 test suites based on master.
A few issues were observed that relate to the stability of master rather than to the topology.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Enable ZMQ on gnmi and orchagent
#### Why I did it
Improve GNMI API performance for Dash resources
#### How I did it
Modify the gnmi and orchagent service start scripts to add the ZMQ parameter.
#### How to verify it
Pass all UT and E2E tests.
Manually verify by creating Dash resources via the gnmi API.
- Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700 on specific, requested SKUs. The default SKUs remain untouched.
- How I did it
Added vendor SAI config options
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Run the sonic-mgmt test suites while this option is enabled.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
src/sonic-linux-kernel
```
* d5232ab - (HEAD -> master, origin/master, origin/HEAD) arm64: ac5: Fix watchdog timeleft (#334) (7 days ago) [pavannaregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 91e7a27a - (HEAD -> master, origin/master, origin/HEAD) [buffers] Add handler for the 'create_only_config_db_buffers' configuration knob (#2883) (11 hours ago) [Vadym Hlushko]
* 7f7bc33d - Do not set internal port count to the PortConfigDone DB value. (#2910) (34 hours ago) [mint570]
* d0f1108b - [muxorch] Reorder the neighbor disable operations (#2917) (2 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/linkmgrd
```
* f34cb09 - (HEAD -> master, origin/master, origin/HEAD) [warmboot] config all interfaces back to `auto` if reconciliation times out (#220) (8 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
1. Upgrade the Centec SAI debian package to v1.13 to match syncd's requirement.
2. Fix the syncd compile failure caused by the missing sai_query_api_version function in the vendor SAI.
Signed-off-by: Xianghong Gu <xgu@centec.com>
Why I did it
The SONiC service determine-reboot-cause might run before the driver creates the reset cause files. In that case, the reset cause is reported as "Unknown". This PR introduces a mechanism that waits for the reset cause sysfs files to be ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file that indicates all reset cause files are ready. The chassis.get_reboot_cause function waits for /run/hw-management/config/reset_attr_ready for up to 45 seconds.
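The wait described above can be sketched as a simple deadline-based poll. This is an illustration only, not the platform API code: the helper name `wait_for_path` and the one-second poll interval are assumptions; only the file path and the 45-second limit come from the description.

```python
import os
import time

RESET_ATTR_READY = "/run/hw-management/config/reset_attr_ready"


def wait_for_path(path, timeout=45.0, interval=1.0):
    """Poll until `path` exists; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # A short timeout keeps the demo quick on machines without hw-management.
    ready = wait_for_path(RESET_ATTR_READY, timeout=2.0)
    print("reset cause files ready:", ready)
```

Returning a boolean instead of raising lets the caller fall back to reporting an unknown reboot cause when the file never appears.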
How to verify it
Manual test on master/202211/202205
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
* [buffers] Align the sonic-device_metadata.yang
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
---------
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run radv sonic-mgmt tests
### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run eventd sonic-mgmt tests
### Why I did it
When FRR is built with the cache enabled, the build fails with the following errors:
```
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/grpc_basic/lib
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/ospfapi/lib
[2023-09-20T15:17:00.273Z] make: *** [Makefile.cache:528: target/debs/bullseye/frr_8.5.1-sonic-0_amd64.deb.smdep] Error 123
[2023-09-20T15:17:00.273Z] make: *** Waiting for unfinished jobs....
```
#### How I did it
Currently, symlinks are excluded through a hardcoded list, and FRR upgrades might introduce new symlinks. To handle this, symlinks are now discovered with the find command and excluded automatically.
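The idea of discovering symlinks instead of hardcoding them can be sketched in Python. The actual change uses the find command in the build rules; the functions below are hypothetical stand-ins that only show the filtering logic.

```python
import os


def find_symlinks(root):
    """Return paths (relative to `root`) of all symlinks under `root`."""
    links = []
    # os.walk does not follow symlinked directories by default, so a link
    # such as tests/topotests/ospfapi/lib is reported but never descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                links.append(os.path.relpath(full, root))
    return sorted(links)


def hashable_files(root):
    """Return regular files under `root`, skipping every symlink."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                files.append(os.path.relpath(full, root))
    return sorted(files)
```

Because the list is computed from the tree on each run, a symlink added by a future FRR upgrade is excluded without touching the build rules.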
#### How to verify it
Build FRR with cache enabled
Why I did it
Fixes #15949
Problem 1: Setting ONIE_IMAGE_PART_SIZE through an environment variable or through "make ONIE_IMAGE_PART_SIZE=65536 USERNAME=test PASSWORD=test all" did not work.
Problem 2: A platform-specific file, for example "device/x86_64-8201_32fh_o-r0/installer.conf", cannot override it by setting ONIE_IMAGE_PART_SIZE in the file. Change 2 adds support for that.
How I did it
Change 1: When ONIE_IMAGE_PART_SIZE is set, Makefile.work and slave.mk pass that setting all the way to build_image.sh. Please see commit 1.
Change 2: In installer/install.sh, save the value set by the build-time string replacement into a variable, and let that variable be overridden later when the platform-specific installer.conf is read. If the platform does not override it, the original value continues to apply. Please see commit 2.
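The precedence introduced by change 2 can be modeled with a short Python sketch. The real logic lives in the installer/install.sh shell script; the parser and function names here are hypothetical and only illustrate "build-time value first, installer.conf wins if it sets the variable".

```python
def parse_conf(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def effective_part_size(build_time_value, installer_conf_text):
    """Build-time value applies unless installer.conf overrides it."""
    settings = parse_conf(installer_conf_text)
    return int(settings.get("ONIE_IMAGE_PART_SIZE", build_time_value))


platform_conf = (
    'ONIE_PLATFORM_EXTRA_CMDLINE_LINUX=" intel_iommu=off"\n'
    "ONIE_IMAGE_PART_SIZE=128000\n"
)
print(effective_part_size(65536, platform_conf))  # platform override applies
print(effective_part_size(65536, ""))             # build-time value is kept
```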
How to verify it
1: The below command works now
make ONIE_IMAGE_PART_SIZE=65536 USERNAME=test PASSWORD=test all
The image was installed properly using ONIE, and the partition size reflects the value passed in the above build command.
If the above value is not set, the default from "onie-image.conf" takes effect and still works.
2: Set ONIE_IMAGE_PART_SIZE in platform specific file like below example
--------------Diff----
device/x86_64-8201_32fh_o-r0/installer.conf
@@ -1 +1,2 @@
ONIE_PLATFORM_EXTRA_CMDLINE_LINUX=" intel_iommu=off"
+ONIE_IMAGE_PART_SIZE=128000
Then built the image using "make USERNAME=test PASSWORD=test all" and verified that the final installation partitioned the disk to the value requested in the installer.conf file.
Created patches to address two FRR CVEs: CVE-2023-41358 and CVE-2023-38802.
| Patch | FRR commit | CVE fixed |
| --- | --- | --- |
| 0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch | FRRouting/frr@f291f1e | CVE-2023-41358 |
| 0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch | FRRouting/frr@8a4a88c | CVE-2023-38802 |
Previously, asic_count was taken from get_num_asics(), which returns the maximum number of ASICs. However, asic_count
should be the actual number of populated ASICs, which can be obtained from get_asic_presence_list().
ADO: 25158825
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-linux-kernel
```
* ecba611 - (HEAD -> master, origin/master, origin/HEAD) arm64: Enable CONFIG_KEXEC_FILE (#333) (6 hours ago) [pavannaregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fix: #16699
Fast reboot fails when upgrading from old OS versions (e.g., a 201911 image) to the latest (e.g., the master branch) after PR #15685.
The system-wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump).
Remove the 15s unconditional sleep.
Instead, check every second whether /proc is still mounted.
Go to the next step as soon as /proc is no longer mounted, or after 15s.
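The replacement for the unconditional sleep can be sketched as a bounded poll. This is a Python illustration, not the actual script: the function name is hypothetical, and the mount check and sleep are injectable so the logic can be exercised without unmounting anything.

```python
import os
import time


def wait_until_unmounted(path="/proc", timeout=15, interval=1.0,
                         is_mounted=os.path.ismount, sleep=time.sleep):
    """Check once per `interval` whether `path` is still mounted.

    Returns True as soon as the path is no longer mounted, or False if it
    is still mounted after `timeout` checks. The caller proceeds either way.
    """
    for _ in range(timeout):
        if not is_mounted(path):
            return True
        sleep(interval)
    return not is_mounted(path)


# Simulated run: /proc stays mounted for three checks, then disappears.
states = iter([True, True, True, False])
print(wait_until_unmounted(is_mounted=lambda p: next(states),
                           sleep=lambda s: None))
```

Compared with a fixed 15s sleep, the worst case is unchanged, but the common case finishes as soon as the unmount completes.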
Why I did it
The build currently fails with:
fatal: Unable to hash src/sonic-frr/frr/tests/topotests/grpc_basic/lib
fatal: Unable to hash src/sonic-frr/frr/tests/topotests/ospfapi/lib
make: *** [Makefile.cache:528: target/debs/buster/frr_8.5.1-sonic-0_amd64.deb.smdep] Error 123
make: *** Waiting for unfinished jobs....
The root cause is that these files are symbolic links.
git hash-object can't hash symbolic links.
Work item tracking
Microsoft ADO (number only): 25271730
How I did it
These two files are symbolic links.
Skip them when calculating the SHA value.
#### Why I did it
src/sonic-gnmi
```
* cbb7631 - (HEAD -> master, origin/master, origin/HEAD) Debug grpc to fetch subscribe preferences of a path (#130) (6 hours ago) [Sachin Holla]
* 099ff7c - Remove command to install libhiredis deb file (#151) (9 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
Fix issue #16533: the telemetry service exits on the master and 202305 branches because there are no telemetry configs in the redis DB.
#### How I did it
Use the default config if there are no TELEMETRY configs in the redis DB.
#### How to verify it
After the fix, the telemetry service works in the following two scenarios:
1. With TELEMETRY config in redis DB, load service configs from DB.
2. No TELEMETRY config in redis DB, use default service configs.
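The two scenarios above amount to a fallback lookup, sketched here in Python. The redis DB is modeled as a plain dict, the function name is hypothetical, and the default values are placeholders rather than the settings SONiC actually ships.

```python
# Placeholder defaults for illustration only; not the values SONiC ships.
DEFAULT_TELEMETRY_CONFIG = {"gnmi": {"port": "50051"}}


def load_telemetry_config(config_db, defaults=DEFAULT_TELEMETRY_CONFIG):
    """Return the TELEMETRY table from the DB, or the defaults if it is absent or empty."""
    table = config_db.get("TELEMETRY")
    if not table:
        return dict(defaults)
    return dict(table)


# Scenario 1: TELEMETRY config present, so it is loaded from the DB.
db_with_config = {"TELEMETRY": {"gnmi": {"port": "8080"}}}
print(load_telemetry_config(db_with_config))

# Scenario 2: no TELEMETRY config, so the defaults are used.
print(load_telemetry_config({}))
```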
#### Why I did it
src/sonic-mgmt-common
```
* 42ca0a6 - (HEAD -> master, origin/master, origin/HEAD) DB Access Layer Merges: GetTablePattern ... (#103) (10 hours ago) [a-barboza]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 45547e66 - (HEAD -> master, origin/master, origin/HEAD) [Buffer Orch] Retry one more time when it fails to set buffer profiles' attributes to SAI (#2890) (11 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
When SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT changes, almost all dockers need to be rebuilt.
Currently, however, they are loaded from the cache instead.
Work item tracking
Microsoft ADO (number only): 25123348
How I did it
Add $(DOCKER)_FILES into dependencies.
#### Why I did it
src/sonic-swss-common
```
* b0f148e - (HEAD -> master, origin/master, origin/HEAD) [chassis][voq] Add fabric monitoring tables definitions. (#808) (10 hours ago) [jfeng-arista]
```
#### How I did it
#### How to verify it
#### Description for the changelog
### Why I did it
##### Work item tracking
- Microsoft ADO **(number only)**: 24851367
#### How I did it
Read subscription message when capture service starts, before reading cached events.
#### How to verify it
UT/Manual testing
### Why I did it
### How I did it
Fix the regex so that the DHCP bind failure event is detected, and fix the process name match, since the DHCP relay processes that need to be detected are dhcprelay6 and dhcrelay.
#### How to verify it
Manual testing and nightly test event
Microsoft ADO (25266920)
The sonic-mgmt xoff test was failing for [100g,120km]. The total headroom pool size needed to be updated for the case where a 100G line card is used as a T2 uplink.
The size was calculated assuming 100G is used for downlink, with a cable length of 2km, but the card can also be used for uplink (cable length 120km). The calculation therefore needs to be based on 120km, not 2km. This wastes some headroom in the 2km scenario, but it covers both cases.
What I did:
Enable sending BGP communities to internal neighbors over iBGP sessions.
Microsoft ADO: 25268695
Why I did:
Without this change, BGP communities sent by e-BGP peers are not carried forward to other e-BGP peers.
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52141
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 16:08:26 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52688
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 15:45:51 2023
After the change
str2-xxxx-lc2-2(config)# router bgp 65100
str2-xxxx-lc2-2(config-router)# address-family ipv4
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V4 send-community
str2-xxxx-lc2-2(config-router-af)# exit
str2-xxxx-lc2-2(config-router)# address-family ipv6
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V6 send-community
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52400
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:19 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52947
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:09 2023
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-sairedis
```
* c22b76b - (HEAD -> master, origin/master, origin/HEAD) [VOQ][saidump] Enhance saidump with new option -r to parser the JSON file and displays/format the right output (#1288) (17 hours ago) [JunhongMao]
* 31bd92a - Add log for git revision (#1293) (4 days ago) [Kamil Cudnik]
* edf6597 - [submodule] Update SAI submodule to v1.13 (#1292) (6 days ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog