Commit Graph

1152 Commits

Author SHA1 Message Date
mssonicbld
0d709a3655
[ci/build]: Upgrade SONiC package versions (#14888) 2023-04-29 17:42:19 +08:00
Tejaswini Chadaga
ca224863cb
Changes to support TSA from supervisor (#14691)
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module

Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards

TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)

Changes also include no-stats option with TSC for quick retrieval of the current system isolation state

This PR also reverts changes in #11403

How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing

Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
2023-04-28 16:28:06 +08:00
Stephen Sun
9e56fea091
Temporary WA for the issue that asic_table.json can not be rendered (#13888)
- Why I did it
We suspect the issue #13791 is caused by redis server being temporarily unavailable during system initialization so we do not use -d in sonic-cfggen, for now, to avoid accessing redis server

- How I did it
Provide a string containing required json data when calling sonic-cfggen

- How to verify it
Manually test it

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-04-24 17:02:35 +03:00
mssonicbld
5ad844f185 [ci/build]: Upgrade SONiC package versions 2023-04-24 18:33:06 +08:00
mssonicbld
81a557885b
[ci/build]: Upgrade SONiC package versions (#14799) 2023-04-22 17:47:40 +08:00
mssonicbld
d006219e2d
[ci/build]: Upgrade SONiC package versions (#14718) 2023-04-19 18:59:16 +08:00
Aryeh Feigin
039a9c998a
[Fast-boot] Clear teamd-timer when finalizing fast-reboot (#14583)
Part of sonic-net/sonic-utilities#2760
Similar to #14295

- Why I did it
To clear teamd timer when fast-reboot is finalized to prevent any further affect.

- How I did it
Deleted teamd timer from config-db in fast-reboot finalizer.
config save call is moved to after clearing teamd-timer so it won't have any further affect as well.

- How to verify it
Verified manually that entry was deleted after fast-reboot was finailized.
2023-04-18 09:15:42 +03:00
Stepan Blyshchak
d73c810e86
[image_config] add rasdaemon.timer (#14300)
rasdaemon is a tool to log hardware errors. It takes 100% CPU during
boot for a few seconds. It impacts fast/warm boot by delaying control
plane restoration for 5 sec on some platforms.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-04-17 08:58:45 -07:00
mssonicbld
7f262d71da
[ci/build]: Upgrade SONiC package versions (#14685) 2023-04-17 19:58:43 +08:00
mssonicbld
49dbaeb649
[ci/build]: Upgrade SONiC package versions (#14672) 2023-04-15 18:21:50 +08:00
Sudharsan Dhamal Gopalarathnam
2804998766
[config reload]Config Reload Enhancement (#13969)
#### Why I did it
Implementing code changes for https://github.com/sonic-net/SONiC/pull/1203

#### How I did it
Removed the timers and delayed target since the delayed services would start based on event driven approach.
Cleared port table during config reload and cold reboot scenario.
Modified yang model, init_cfg.json to change has_timer to delayed

#### How to verify it
Running regression
2023-04-12 11:20:03 -07:00
mssonicbld
f9eb849d75
[ci/build]: Upgrade SONiC package versions (#14620) 2023-04-12 20:05:30 +08:00
anamehra
f34360f101
chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py

The test set the STATIC_ROUTE entry in conifg db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that the arp_update gets the proper ARP_UPDATE_VARDS using arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }

validate route/test_static_route.py testcase pass.
2023-04-12 15:07:42 +08:00
xumia
f1fd42558a
Support to add SONiC OS Version in device info (#14601)
Why I did it
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.

SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
How I did it
2023-04-12 09:20:08 +08:00
mssonicbld
4e5c8988b1
[ci/build]: Upgrade SONiC package versions (#14586) 2023-04-10 18:10:37 +08:00
Aryeh Feigin
41a9813018
Finalize fast-reboot in warmboot finalizer (#14238)
- Why I did it
To solve an issue with upgrade with fast-reboot including FW upgrade which has been introduced since moving to fast-reboot over warm-reboot infrastructure.
As well, this introduces fast-reboot finalizing logic to determine fast-reboot is done.

- How I did it
Added logic to finalize-warmboot script to handle fast-reboot as well, this makes sense as using fast-reboot over warm-reboot this script will be invoked. The script will clear fast-reboot entry from state-db instead of previous implementation that relied on timer. The timer could expire in some scenarios between fast-reboot finished causing fallback to cold-reboot and possible crashes.

As well this PR updates all services/scripts reading fast-reboot state-db entry to look for the updated value representing fast-reboot is active.

- How to verify it
Run fast-reboot and check that fast-reboot entry exists in state-db right after startup and being cleared as warm-reboot is finalized and not due to a timer.
2023-04-09 16:59:15 +03:00
mssonicbld
e32624d362
[ci/build]: Upgrade SONiC package versions (#14571) 2023-04-08 18:00:30 +08:00
Stephen Sun
152148fb81
Enhance the error message output mechanism (#14384)
#### Why I did it

Enhance the error message output mechanism during swss docker creating

#### How I did it

Capture the output to stderr of `sonic-cfggen` and output it using `echo` to make sure the error message will be logged in syslog.

#### How to verify it

Manually test
2023-04-07 14:23:35 -07:00
Devesh Pathak
d74055e12c
Increase wait_for_tunnel() timeout to 90s (#14279)
Why I did it
Orchagent sometimes take additional time to execute Tunnel tasks. This cause write_standby script to error out and mux state machines are not initialized. It results in show mux status missing some ports in output.

Mar 13 20:36:52.337051 m64-tor-0-yy41 INFO systemd[1]: Starting MUX Cable Container...
Mar 13 20:37:52.480322 m64-tor-0-yy41 ERR write_standby: Timed out waiting for tunnel MuxTunnel0, mux state will not be written
Mar 13 20:37:58.983412 m64-tor-0-yy41 NOTICE swss#orchagent: :- doTask: Tunnel(s) added to ASIC_DB.
How I did it
Increase timeout from 60s to 90s

How to verify it
Verified that mux state machine is initialized and show mux status has all needed ports in it.
2023-04-07 11:30:58 +08:00
Ying Xie
737d0e57ad
[write standby] force DB connections to use unix socket to connect (#14524)
Why I did it
At service start up time, there are chances that the networking service is being restarted by interface-config service. When that happens, write_standby could fail to make DB connections due to loopback interface is being reconfigured.

How I did it
Force the db connector to use unix socket to avoid loopback reconfig timing window.

How to verify it
Run config reload test 20+ times and no issue encountered.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* use unix socket instead

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-04-06 13:54:56 -07:00
Ye Jianquan
6c04ed987d
Revert "chassis-packet: resolve the missing static routes (#14230)" (#14544)
This reverts commit a8f8ea3b50.
2023-04-06 10:36:10 -07:00
mssonicbld
41c46aedf6
[ci/build]: Upgrade SONiC package versions (#14528) 2023-04-05 18:36:57 +08:00
Ying Xie
d3f3ac6411
Delay mux/sflow/snmp timer after interface-config service (#14506)
Why I did it
All these 3 services started after swss service, which used to start after interface-config service. But #13084 remove the time constraints for swss.

After that, these 3 services has the chance of start earlier when the inteface-config service is restarting the networking service, which could cause db connect request to fail.

How I did it
Delay mux/sflow/snmp timer after the interface-config service.

How to verify it
PR test.
Config reload can repro the issue in 1-3 retries. With this change. config reload run 30+ iterations without hitting the issue.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-04-04 16:23:00 -07:00
mssonicbld
884dfa5427
[ci/build]: Upgrade SONiC package versions (#14498) 2023-04-03 18:34:35 +08:00
mssonicbld
66d3586fd4
[ci/build]: Upgrade SONiC package versions (#14487) 2023-04-01 18:45:34 +08:00
anamehra
a8f8ea3b50
chassis-packet: resolve the missing static routes (#14230)
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping to
resolve the missing entry.

Why I did it
Fixes #14179

chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.

Signed-off-by: anamehra <anamehra@cisco.com>
2023-03-29 09:53:32 -07:00
mssonicbld
6e11833a6c
[ci/build]: Upgrade SONiC package versions (#14430) 2023-03-29 18:39:10 +08:00
Hua Liu
4c059d8eb5
Improve sudo cat command for RO user. (#14428)
Improve sudo cat command for RO user.

#### Why I did it
RO user can use sudo command show none syslog files.

#### How I did it
Improve sudo cat command for RO user.

#### How to verify it
Pass all UT.
Manually check fixed code work correctly.

#### Description for the changelog
Improve sudo cat command for RO user.
2023-03-27 17:08:14 -07:00
oleksandrx-kolomeiets
4da51b07ad
Set owner after restoring counters folder during warmboot (#13507)
Why I did it
After warm reboot, show environment prints the following error:
failed to import plugin show.plugins.macsec: [Errno 13] Permission denied: '/tmp/cache/macsec'

How I did it
Set owner back to admin after restoring counters folder.

How to verify it
sudo warm-reboot, then ensure show environement does not print errors.

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>
2023-03-27 10:32:07 -07:00
mssonicbld
fb6e37819b
[ci/build]: Upgrade SONiC package versions (#14414) 2023-03-25 19:21:56 +08:00
mssonicbld
20f1ab8203
[ci/build]: Upgrade SONiC package versions (#14383) 2023-03-22 19:34:21 +08:00
mssonicbld
4429bdd091
[ci/build]: Upgrade SONiC package versions (#14354) 2023-03-21 01:10:17 +08:00
xumia
7209666374
[Security] Fix some of vulnerability issue relative python packages (#14269)
Why I did it
Fix some of vulnerability issue relative python packages #14269
Pillow: [CVE-2021-27921]
Wheel: [CVE-2022-40898]
lxml: [CVE-2022-2309]

How I did it
2023-03-20 14:15:45 +08:00
mssonicbld
1e8e993a94 [ci/build]: Upgrade SONiC package versions 2023-03-20 09:00:28 +08:00
mssonicbld
89ebd43c81
[ci/build]: Upgrade SONiC package versions (#14311)
Upgrade SONiC Versions
2023-03-19 10:16:41 +08:00
Dev Ojha
de17f72d9a
[Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14280)
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.

How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.

Signed-off-by: dojha <devojha@microsoft.com>
2023-03-17 11:01:17 -07:00
mssonicbld
96817c4357
[ci/build]: Upgrade SONiC package versions (#14102)
Upgrade SONiC Versions
2023-03-17 10:12:30 +08:00
Neetha John
f30fb6ec58
[storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload

How to verify it
Build an image with the following changes and did the following tests

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-16 14:18:28 -07:00
davidpil2002
8098bc4bf5
Add Secure Boot Support (#12692)
- Why I did it
Add Secure Boot support to SONiC OS.
Secure Boot (SB) is a verification mechanism for ensuring that code launched by a computer's UEFI firmware is trusted. It is designed to protect a system against malicious code being loaded and executed early in the boot process before the operating system has been loaded.

- How I did it
Added a signing process to sign the following components:
shim, grub, Linux kernel, and kernel modules when doing the build, and when feature is enabled in build time according to the HLD explanations (the feature is disabled by default).

- How to verify it
There are self-verifications of each boot component when building the image, in addition, there is an existing end-to-end test in sonic-mgmt repo that checks that the boot succeeds when loading a secure system (details below).

How to build a sonic image with secure boot feature: (more description in HLD)

Required to use the following build flags from rules/config:
SECURE_UPGRADE_MODE="dev"
SECURE_UPGRADE_DEV_SIGNING_KEY="/path/to/private/key.pem"
SECURE_UPGRADE_DEV_SIGNING_CERT="/path/to/cert/key.pem"
After setting those flags should build the sonic-buildimage.
Before installing the image, should prepared the setup (switch device) with the follow:
check that the device support UEFI
stored pub keys in UEFI DB

enabled Secure Boot flag in UEFI
How to run a test that verify the Secure Boot flow:
The existing test "test_upgrade_path" under "sonic-mgmt/tests/upgrade_path/test_upgrade_path", is enough to validate proper boot
You need to specify the following arguments:
Base_image_list your_secure_image
Taget_image_list your_second_secure_image
Upgrade_type cold
And run the test, basically the test will install the base image given in the parameter and then upgrade to target image by doing cold reboot and validates all the services are up and working correctly
2023-03-14 14:55:22 +02:00
Stepan Blyshchak
f908dfe919
[Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-06 13:36:43 +02:00
mssonicbld
506f372533
[ci/build]: Upgrade SONiC package versions (#14072)
Upgrade SONiC Versions
2023-03-05 11:29:38 +08:00
anamehra
4a93e4cfa4
Add support for platform syncd pre shutdown plugin (#13564)
Why I did it
Vendor platform may require running platform specific pre-shutdown routine before shutting down the syncd process which runs the SAI and vendor sdk instance.

How I did it
Added a platform script hook which will be executed if the plugin script is provided by the platform in device//plugins/
2023-03-03 15:53:33 -08:00
Sudharsan Dhamal Gopalarathnam
8883259673
[netlink] Increse netlink buffer size from 3MB to 16MB (#13965)
#### Why I did it
Following the PR https://github.com/sonic-net/sonic-swss-common/pull/739 increasing netlink buffer size in linux kernel
As error is seen in fdbsyncd with netlink reports "out of memory on reading a netlink socket" It is seen when kernel is sending 10k remote mac to fdbsyncd.


#### How I did it
Increase the buffer size of the netlink buffer from 3MB to 16MB


#### How to verify it
Verified with 10k remote mac, and restarting the fdbsyncd process. So that kernel send the bridge fdb dump to the fdbsyncd.
Verified that the netlink buffer error is not reported in the sys log.
2023-02-27 15:41:22 -08:00
mssonicbld
8d0d3e57ba
[ci/build]: Upgrade SONiC package versions (#13989)
Upgrade SONiC Versions
2023-02-27 13:45:49 +08:00
mssonicbld
58592e6c49
[ci/build]: Upgrade SONiC package versions (#13526)
The initial version files for the SONiC reproducible build
2023-02-25 08:16:38 +08:00
Samuel Angebault
b9dffcbaaf
[Arista] Disable SSD NCQ on Lodoga (#13964)
Why I did it
Fix similar issue seen on #13739 but only for DCS-7050CX3-32S

How I did it
Add a kernel parameter to tell libata to disable NCQ

How to verify it
The message ata2.00: FORCE: horkage modified (noncq) should appear on the dmesg.

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ

   READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec
  WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec
without NCQ

   READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec
  WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec
2023-02-24 10:08:04 -08:00
DavidZagury
ee1b6b3751
Remove support to Mellanox SPC4 ASIC (#13932)
- Why I did it
FW for Spectrum-4 ASIC not yet available

- How I did it
Remove in Mellanox fw make files to Spectrum-4 ASIC firmware binaries.
Remove from firmware upgrade scripts to be able Spectrum-4 ASIC.

- How to verify it
Run regression test
2023-02-23 08:25:34 +02:00
Andriy Yurkiv
5ad78abea0
[Dual-ToR] add default value for ACL rule for mellanox platform (#13547)
- Why I did it
Need to add the possibility to choose between dropping packets (using ACL) on ingress or egress in Dual ToR scenario

- How I did it
Add new attribute "mux_tunnel_ingress_acl" to SYSTEM_DEFAULTS table

- How to verify it
check that new attribute exists in redis:
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl
1."state"
2."false"

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2023-02-22 20:25:54 +02:00
Marty Y. Lok
2c22d9affc
[Chassis][multiasic] Fix the sonic-db-cli core files issue on multiasic platform after the c++ implementation of sonic-db-cli (#13207)
Fixe #12047. After the c++ implementation of the sonic-db-cli, sonic-db-cli PING command tries to initialize the global database for all instances database starting. If all instance database-config.json are not ready yet. it will crash and generate core file. PR sonic-net/sonic-swss-common#701 only fix the crash and the process abortion. 

Signed-off-by: mlok <marty.lok@nokia.com>
2023-02-21 11:23:22 -08:00
Saikrishna Arcot
56d732a0a0
Use tmpfs for /var/log on Arista 7050CX3-32S (#13805)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-16 19:13:39 -08:00