Commit Graph

605 Commits

Author SHA1 Message Date
StormLiangMS
2c28502ddd
Revert "Share docker image and use telemetry container for 202305 (#17255)" (#17356)
This reverts commit 2c7d53e5fb.
2023-11-30 20:41:38 +08:00
ganglv
2c7d53e5fb
Share docker image and use telemetry container for 202305 (#17255)
Why I did it
Need to share docker image for telemetry and gnmi, and only use telemetry container for 202305 branch

Work item tracking
Microsoft ADO (number only):
How I did it
Add a new docker image, base-gnmi, build sonic-gnmi and sonic-telemetry on this docker image.
Enable telemetry container.

How to verify it
Run end to end test for telemetry and gnmi.
2023-11-24 11:22:48 +08:00
ganglv
733a902a70
Revert "[202305] Share image for gnmi and telemetry (#17137)" (#17261)
This reverts commit f2a495f7e5.
2023-11-22 23:51:34 +08:00
mssonicbld
1337d295a3
[chassisd]: Add alternate to the bridge interface created on chassis supervisor. (#16505) (#17223) 2023-11-19 14:42:00 +08:00
ganglv
f2a495f7e5
[202305] Share image for gnmi and telemetry (#17137)
Why I did it
Share docker image to support gnmi container and telemetry container
backport #16863

Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.

How to verify it
Run end to end test.
2023-11-15 11:28:21 +08:00
Vadym Hlushko
28ecd068d4
[202305][buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#17006)
Why I did it
Add the create_only_config_db_buffers attribute to the DEVICE_METADATA|localhost. If the "create_only_config_db_buffers" exists and is equal to "true" - the buffers will be created according to the config_db configuration (for example BUFFER_QUEUE|* table), otherwise the maximum available buffers (which are read from SAI) will be created, regardless of the CONFIG_DB buffers configuration.

Work item tracking
Microsoft ADO (number only):
How I did it
Add the create_only_config_db_buffers.json files for Mellanox devices (not MSFT SKU's), and inject the content to the CONFIG_DB during the swss docker container start.
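
The selection rule described under "Why I did it" can be sketched in a few lines of Python. This is only an illustration: the function name and argument shapes are hypothetical, and the real decision is made by the switch software using CONFIG_DB and SAI.

```python
def queues_to_create(device_metadata, configured_queues, max_available_queues):
    """Return the queue names to create for a port (illustrative only).

    device_metadata: dict standing in for DEVICE_METADATA|localhost.
    configured_queues: queues named in CONFIG_DB (for example BUFFER_QUEUE|*).
    max_available_queues: queues reported as available by SAI.
    """
    flag = device_metadata.get("create_only_config_db_buffers")
    if flag == "true":
        # Create only what CONFIG_DB configures.
        return list(configured_queues)
    # Attribute missing or not "true": create everything that is available.
    return list(max_available_queues)
```

With the attribute set to "true" only the configured queues are returned; with "false" or no attribute at all, the full available list is returned.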

How to verify it
Manual verification:

Install the image with this PR included on the not MSFT SKU switch
Check the show queue counters output and verify that only configured in CONFIG_DB buffers are created
root@sonic:/home/admin# show queue counters
     Port    TxQ    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
---------  -----  --------------  ---------------  -----------  ------------
Ethernet0    UC0               0                0            0           N/A
Ethernet0    UC1               0                0            0           N/A
Ethernet0    UC2               0                0            0           N/A
Ethernet0    UC3               0                0            0           N/A
Ethernet0    UC4               0                0            0           N/A
Ethernet0    UC5               0                0            0           N/A
Ethernet0    UC6               0                0            0           N/A
Open the /usr/share/sonic/device/$DEVICE/$SKU/create_only_config_db_buffers.json and change it to:
"create_only_config_db_buffers": "false"
Do config reload
Check the show queue counters output and verify that all available buffers are created
root@sonic:/home/admin# show queue counters
     Port    TxQ    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
---------  -----  --------------  ---------------  -----------  ------------
Ethernet0    UC0               0                0            0           N/A
Ethernet0    UC1               0                0            0           N/A
Ethernet0    UC2               0                0            0           N/A
Ethernet0    UC3               0                0            0           N/A
Ethernet0    UC4               0                0            0           N/A
Ethernet0    UC5               0                0            0           N/A
Ethernet0    UC6               0                0            0           N/A
Ethernet0    UC7              60            15346            0           N/A
Ethernet0    MC8             N/A              N/A          N/A           N/A
Ethernet0    MC9             N/A              N/A          N/A           N/A
Ethernet0   MC10             N/A              N/A          N/A           N/A
Ethernet0   MC11             N/A              N/A          N/A           N/A
Ethernet0   MC12             N/A              N/A          N/A           N/A
Ethernet0   MC13             N/A              N/A          N/A           N/A
Ethernet0   MC14             N/A              N/A          N/A           N/A
Ethernet0   MC15             N/A              N/A          N/A           N/A
2023-11-03 14:27:17 +08:00
mssonicbld
feaa855346
Add special rsyslog filter for MSN2700 platform (#16684) (#17078) 2023-11-03 03:05:44 +08:00
Samuel Angebault
274e929f11
Reduce SONiC image filesystem size (#16948)
Why I did it
Running SONiC releases past 202012 has become really challenging on systems with small storage devices (4GB).
Some of these devices can also be limited by only having 4GB of RAM which complicates mitigations.
The main contributor to these issues is the SONiC image growth.
Being able to reduce it by some decent amount should allow these systems to run SONiC longer.
It would also reduce some impacts related to space savings mitigations.

Work item tracking
Microsoft ADO (number only):
How I did it
Add a build option to reduce the image size.
The image reduction process is affecting the builds in 2 ways:

- change some packages that are installed in the rootfs
- apply a rootfs reduction script

The script itself will perform a few steps:

- remove file duplication by leveraging hardlinks
  - under /usr/share/sonic since the symlinks under the device folder are lost during the build.
  - under /var/lib/docker since the files there will only be mounted ro
- remove some extra files (man, docs, licenses, ...)
- some image specific space reduction (only for aboot images currently)
The script can later be improved but for now it's reducing the rootfs size by ~30%.
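
The hardlink step can be pictured with a minimal Python sketch. This is not the script from this change: the function names are hypothetical, and detecting duplicates by size, mode, and SHA-256 digest is an assumption made for the illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_hardlinks(root):
    """Replace duplicate regular files under root with hardlinks.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}  # (size, mode, digest) -> first path seen with that content
    replaced = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.lstat(path)
            key = (st.st_size, st.st_mode, file_digest(path))
            original = first_seen.setdefault(key, path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already hardlinked
            # Link under a temporary name, then atomically swap it in.
            tmp = path + ".hardlink-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced
```

Hardlinks only work within one filesystem, and they are only safe when the linked files are never modified in place, which fits the read-only mount mentioned for /var/lib/docker.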

How to verify it
Compare the size of an image built with this option enabled against one built with it disabled.
Expect the fully extracted content to be ~30% less.

Which release branch to backport (provide reason below if selected)
This is a backport of #16729

Description for the changelog
Add build option to reduce final image size
2023-10-24 21:08:38 +08:00
Longxiang Lyu
dd20597e4d [snmp] Check intfmgrd running before start (#16588)
Add pre start check to ensure intfmgrd is running.
The check will run for 20 seconds at most.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2023-10-21 12:32:42 +08:00
Aman Singhal
f265c79541 [cisco]: Enable Kdump config by default for cisco-8000 (#16224)
Why I did it
Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf.
After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot.

How I did it
Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms.

How to verify it
Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot.
Kdump config should persist on subsequent reboots and kdump loaded during bootup

Signed-off-by: Aman Singhal <amans@cisco.com>
2023-10-18 00:37:30 +08:00
Saikrishna Arcot
39cdee57e1 [baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826)
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. 
Since we're building openssh with some patches, we need to update our version as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-17 16:34:18 +08:00
mssonicbld
185a63bc7f
[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733) (#16753) 2023-09-29 05:29:20 +08:00
Alpesh Patel
6b48346ff5 qos template change for backend compute-ai deployment (#16150)
#### Why I did it

To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI".
This deployment has the following requirement:

- Config below enabled if DEVICE_TYPE as one of backend_device_types
- Config below enabled if ResourceType is 'Compute-AI'
- 2 lossless TCs' (2, 3)
- 2 lossy TCs' (0,1)
- DSCP to TC map uses 4 DSCP code points and maps to the TCs' as follows:
    "DSCP_TO_TC_MAP": {
        "AZURE": {
            "48" : "0",
            "46" : "1",
            "3"  : "3",
            "4"  : "4"
        }
    }

- WRED profile has green {min/max/mark%} as {2M/10M/5%}

This required template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here).

### How I did it

#### How to verify it
- with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met

- verified no error

### Description for the changelog
Update qos_config.j2 for Compute-AI deployment on one of backend device type roles
2023-09-21 18:34:11 +08:00
Kebo Liu
fe7eeed051
[202305][Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3(#16096) (#16298)
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)

SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3

SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

SDK/FW Features
1. On SN2700 all ports can support y cable by credo

SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule. Issue resolved after the fix.
2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3 when fastboot is enabled
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile

- How I did it
Update SDK/FW/SAI make files

- How to verify it
Run full sonic-mgmt regression on Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
	platform/mellanox/mlnx-sai.mk

* Fix issue: unprintable character is rendered when handling comments in j2

Use "{#-" and "-#}" to mark comments in jinja template

Signed-off-by: Stephen Sun <stephens@nvidia.com>

---------

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
2023-09-10 22:28:46 +08:00
xumia
288ebd5dd3 Support FIPS DB configuration (#15632)
Why I did it
Support FIPS DB configuration
Design Doc: sonic-net/SONiC#1372

Work item tracking
Microsoft ADO (number only): 24411148
How I did it
Add the FIPS Yang model to make FIPS configurable in ConfigDB.

How to verify it
See TestPlan: sonic-net/sonic-mgmt#9092
Build the image and run the tests: sonic-net/sonic-mgmt#9091
2023-09-03 16:33:25 +08:00
mssonicbld
adfc486456
Run db_migrator for non first-time reboots (#16116) (#16306) 2023-08-29 05:36:36 +08:00
Vaibhav Hemant Dixit
0b83639068
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685) (#16217)
Cherry-pick of #15685

MSFT ADO: 24274591

Why I did it
Two changes:

1 Fix a day1 issue, where check to wait until CONFIG_DB_INITIALIZED is incorrect.
There are multiple places where same incorrect logic is used.

Current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) will always result in pass, irrespective of the result of GET operation.

root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 
Fix this logic by checking for value of flag to be "1".

root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.

2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case
Currently, during warm shutdown CONFIG_DB_INITIALIZED's value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of CONFIG_DB_INITIALIZED does not depend on config db's state; instead it remains what it was before reboot.

Fix this by setting CONFIG_DB_INITIALIZED to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
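
The first fix comes down to comparing the flag's value instead of testing whether the command printed anything. Python has the same trap, since the string "0" is non-empty and therefore truthy. A minimal sketch under stated assumptions: `get_flag` is a hypothetical callable standing in for `sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"`, and the timeout is an addition for the example (the shell loop in this commit has none).

```python
import time


def wait_until_initialized(get_flag, timeout_s=5.0, poll_s=0.01, sleep=time.sleep):
    """Poll get_flag() until it returns "1".

    Returns True once the flag is "1", or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Compare the value, not its truthiness: "0" is a non-empty string,
        # so `if get_flag():` would pass immediately, like the original bug.
        if get_flag() == "1":
            return True
        sleep(poll_s)
    return False
```

A caller that only checked `if get_flag():` would return on the very first poll even while the flag is still "0".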

Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
2023-08-24 16:58:24 +08:00
Vaibhav Hemant Dixit
2969d84e58 Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684)
This reverts commit 9649a44470.
2023-08-15 04:32:38 +08:00
vmittal-msft
5ee18ece65 Update WRED profile on system ports (#15612)
* Update WRED profile on system ports
2023-08-07 14:33:42 +08:00
mssonicbld
33a10b479a
[nvidia] make sure shared storage with syncd is cleared on restarts (#14547) (#16046)
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in different ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2023-08-07 09:27:43 +08:00
Junchao-Mellanox
bf37c3162c Fix issue: set delayed attribute to true for platform monitor service (#15816)
There is a redundant line in init_cfg.json.j2. It causes the pmon service to always have "delayed=False". However, we know that PMON has a timer now, so I fix it here.
2023-08-07 00:34:12 +08:00
lixiaoyuner
c59f55f6a3
Move k8s script to docker-config-engine (#14788) (#15768)
Why I did it
To reduce the container's dependency on the host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to the config engine container, rather than mounting it from the host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-17 23:21:01 +08:00
mssonicbld
bb3eff6ab4
Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464) (#15618) 2023-06-29 22:35:47 +08:00
Stepan Blyshchak
e2e5b77f16
[mlnx-ffb.sh] Update issu-version location (#14925)
#### Why I did it

ISSU version check fails due to inability to mount squashfs from 202211 on 201911

#### How I did it

Put ISSU version file under platform directory

#### How to verify it

Warm-upgrade matrix:
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211
- 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master
- 202205 (with this change cherry-picked) to master
2023-06-15 15:14:52 -07:00
Alpesh Patel
633fff8c10
enable ethernet backplane port support in port config for packet mode T2 devices (#14533)
For T2 systems using packet mode, the backplane interfaces (Ethernet-BP#) and the fabric card ethernet interfaces are not visible as neighbor interfaces.
In packet mode, these interfaces need qos and buffer config as well.
This fix addresses that issue and adds the backplane interfaces to the PORTS_ACTIVE list
2023-06-12 14:02:22 -07:00
Sudharsan Dhamal Gopalarathnam
162856ad9a
[sflow]Delay starting sflow service until ports are created (#15333)
* [sflow]Delay starting sflow service until ports are created
* Removing sflow from sonic.target dependency since it will be managed by hostcfgd
2023-06-09 16:28:15 -07:00
Yevhen Fastiuk
8a6d45227e
[Clock] Add timezone config YANG model (#14651)
* Add the ability to configure timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add YANG model for timezone

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add timezone reference

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

---------

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2023-06-07 10:39:24 -07:00
Arvindsrinivasan Lakshmi Narasimhan
3f4b959d3f
[chassis] add libffi-dev for sonic-utilities (#15218)
In the PR sonic-net/sonic-utilities#2850, the paramiko package is installed in sonic-utilities to support remote access of linecards. libffi-dev needs to be installed to be able to compile it for the armhf image

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-06-03 14:36:50 -07:00
Vaibhav Hemant Dixit
02b17839c3
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)
Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, it has Requires=database.service.
Hence, config-setup starts only when the database service is started, and the database service is started only when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after the config-setup service loads old-config/minigraph
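
The resulting ordering can be sketched as two small Python functions. The function names and the callables passed in are hypothetical; in the actual change this logic is spread across the database and config-setup services.

```python
def database_service_start(pending_config_migration, run_db_migrator):
    """Database service startup step (illustrative).

    Skips db_migrator while an old config is still waiting to be loaded.
    """
    if not pending_config_migration:
        run_db_migrator()


def config_setup_start(pending_config_migration, load_old_config, run_db_migrator):
    """config-setup step (illustrative): load the old config, then migrate it."""
    if pending_config_migration:
        load_old_config()
        run_db_migrator()
```

With the flag set, the migrator runs exactly once and only after the old config has been loaded; with the flag clear, it runs during database startup as before.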
2023-05-30 10:16:21 -07:00
vmittal-msft
ecb4db58a9
Update PG headroom settings ports based on port speed/cable length (#14908)
* Update PG headroom settings ports based on port speed/cable length

* Updated XOFF settings to use chip level numbers than core

* Updated PG headroom based on uplink/downlink side

* fix for sonic-config-gen tests

* More fixes for unit test cases

* more test fixes

* Merged multiple functions into one
2023-05-19 08:19:27 -07:00
Zain Budhwani
a738c39328
Add fix to monit_regex.json for catching mem_usage and cpu_usage (#14954)
Why I did it
The current regex is not able to capture logs; modify the regex to capture syslog messages

Work item tracking
Microsoft ADO (number only): 13366345
How I did it
Code change

How to verify it
sonic-mgmt test case
2023-05-08 11:48:17 -07:00
Stephen Sun
9e56fea091
Temporary WA for the issue that asic_table.json can not be rendered (#13888)
- Why I did it
We suspect the issue #13791 is caused by redis server being temporarily unavailable during system initialization so we do not use -d in sonic-cfggen, for now, to avoid accessing redis server

- How I did it
Provide a string containing required json data when calling sonic-cfggen

- How to verify it
Manually test it

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-04-24 17:02:35 +03:00
Stepan Blyshchak
d73c810e86
[image_config] add rasdaemon.timer (#14300)
rasdaemon is a tool to log hardware errors. It takes 100% CPU during
boot for a few seconds. It impacts fast/warm boot by delaying control
plane restoration for 5 sec on some platforms.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-04-17 08:58:45 -07:00
Sudharsan Dhamal Gopalarathnam
2804998766
[config reload]Config Reload Enhancement (#13969)
#### Why I did it
Implementing code changes for https://github.com/sonic-net/SONiC/pull/1203

#### How I did it
Removed the timers and delayed target since the delayed services would start based on event driven approach.
Cleared port table during config reload and cold reboot scenario.
Modified yang model, init_cfg.json to change has_timer to delayed

#### How to verify it
Running regression
2023-04-12 11:20:03 -07:00
anamehra
f34360f101
chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv6 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py

The test sets the STATIC_ROUTE entry in config db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that the arp_update gets the proper ARP_UPDATE_VARS using arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }

validate route/test_static_route.py testcase pass.
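
The check for missing entries amounts to a set difference between the static route nexthops and the current neighbor table. A minimal Python sketch, assuming the STATIC_ROUTE shape shown above; the function names are hypothetical and the real arp_update logic is not reproduced here.

```python
def static_route_nexthops(static_route_table):
    """Collect unique nexthop IPs from a STATIC_ROUTE-style table, in order."""
    seen = []
    for entry in static_route_table.values():
        for ip in entry.get("nexthop", "").split(","):
            ip = ip.strip()
            if ip and ip not in seen:
                seen.append(ip)
    return seen


def missing_neighbors(nexthops, neighbor_ips):
    """Return the nexthops that have no neighbor (ARP/NDP) entry yet."""
    known = set(neighbor_ips)
    return [ip for ip in nexthops if ip not in known]
```

Each address returned by `missing_neighbors` is a candidate for the ping that this change uses to trigger resolution.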
2023-04-12 15:07:42 +08:00
xumia
f1fd42558a
Support to add SONiC OS Version in device info (#14601)
Why I did it
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.

SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
How I did it
2023-04-12 09:20:08 +08:00
Stephen Sun
152148fb81
Enhance the error message output mechanism (#14384)
#### Why I did it

Enhance the error message output mechanism during swss docker creating

#### How I did it

Capture the output to stderr of `sonic-cfggen` and output it using `echo` to make sure the error message will be logged in syslog.

#### How to verify it

Manually test
2023-04-07 14:23:35 -07:00
Ye Jianquan
6c04ed987d
Revert "chassis-packet: resolve the missing static routes (#14230)" (#14544)
This reverts commit a8f8ea3b50.
2023-04-06 10:36:10 -07:00
Ying Xie
d3f3ac6411
Delay mux/sflow/snmp timer after interface-config service (#14506)
Why I did it
All 3 of these services start after the swss service, which used to start after the interface-config service. But #13084 removed the time constraints for swss.

After that, these 3 services have a chance of starting earlier, while the interface-config service is restarting the networking service, which could cause db connect requests to fail.

How I did it
Delay mux/sflow/snmp timer after the interface-config service.

How to verify it
PR test.
Config reload can repro the issue in 1-3 retries. With this change, config reload ran 30+ iterations without hitting the issue.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-04-04 16:23:00 -07:00
anamehra
a8f8ea3b50
chassis-packet: resolve the missing static routes (#14230)
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping to
resolve the missing entry.

Why I did it
Fixes #14179

chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv6 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.

Signed-off-by: anamehra <anamehra@cisco.com>
2023-03-29 09:53:32 -07:00
Dev Ojha
de17f72d9a
[Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14280)
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.

How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.

Signed-off-by: dojha <devojha@microsoft.com>
2023-03-17 11:01:17 -07:00
Neetha John
f30fb6ec58
[storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload

How to verify it
Built an image with these changes and ran the following tests

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-16 14:18:28 -07:00
davidpil2002
8098bc4bf5
Add Secure Boot Support (#12692)
- Why I did it
Add Secure Boot support to SONiC OS.
Secure Boot (SB) is a verification mechanism for ensuring that code launched by a computer's UEFI firmware is trusted. It is designed to protect a system against malicious code being loaded and executed early in the boot process before the operating system has been loaded.

- How I did it
Added a signing process to sign the following components:
shim, grub, Linux kernel, and kernel modules when doing the build, and when feature is enabled in build time according to the HLD explanations (the feature is disabled by default).

- How to verify it
There are self-verifications of each boot component when building the image, in addition, there is an existing end-to-end test in sonic-mgmt repo that checks that the boot succeeds when loading a secure system (details below).

How to build a sonic image with secure boot feature: (more description in HLD)

Required to use the following build flags from rules/config:
SECURE_UPGRADE_MODE="dev"
SECURE_UPGRADE_DEV_SIGNING_KEY="/path/to/private/key.pem"
SECURE_UPGRADE_DEV_SIGNING_CERT="/path/to/cert/key.pem"
After setting those flags should build the sonic-buildimage.
Before installing the image, prepare the setup (switch device) as follows:
check that the device support UEFI
stored pub keys in UEFI DB

enabled Secure Boot flag in UEFI
How to run a test that verify the Secure Boot flow:
The existing test "test_upgrade_path" under "sonic-mgmt/tests/upgrade_path/test_upgrade_path", is enough to validate proper boot
You need to specify the following arguments:
Base_image_list your_secure_image
Target_image_list your_second_secure_image
Upgrade_type cold
Then run the test. It installs the base image given in the parameter, upgrades to the target image with a cold reboot, and validates that all the services are up and working correctly
2023-03-14 14:55:22 +02:00
Stepan Blyshchak
f908dfe919
[Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.
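
The prefer-then-fall-back lookup can be sketched as below. This is a Python illustration rather than the shell code in mlnx-fw-upgrade.sh, and the function name and both path arguments are hypothetical placeholders supplied by the caller.

```python
import os


def pick_fw_path(preferred_path, fallback_path):
    """Return preferred_path if it exists, else fallback_path if it exists.

    Returns None when neither location holds a firmware file.
    """
    for candidate in (preferred_path, fallback_path):
        if os.path.exists(candidate):
            return candidate
    return None
```

Keeping the old location as a fallback is what lets a downgrade to an image without the new layout still find its firmware.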

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-06 13:36:43 +02:00
DavidZagury
ee1b6b3751
Remove support to Mellanox SPC4 ASIC (#13932)
- Why I did it
FW for Spectrum-4 ASIC not yet available

- How I did it
Remove the Spectrum-4 ASIC firmware binaries from the Mellanox fw make files.
Remove Spectrum-4 ASIC handling from the firmware upgrade scripts.

- How to verify it
Run regression test
2023-02-23 08:25:34 +02:00
Andriy Yurkiv
5ad78abea0
[Dual-ToR] add default value for ACL rule for mellanox platform (#13547)
- Why I did it
Need to add the possibility to choose between dropping packets (using ACL) on ingress or egress in Dual ToR scenario

- How I did it
Add new attribute "mux_tunnel_ingress_acl" to SYSTEM_DEFAULTS table

- How to verify it
check that new attribute exists in redis:
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl
1."state"
2."false"

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2023-02-22 20:25:54 +02:00
Marty Y. Lok
2c22d9affc
[Chassis][multiasic] Fix the sonic-db-cli core files issue on multiasic platform after the c++ implementation of sonic-db-cli (#13207)
Fixes #12047. After the c++ implementation of sonic-db-cli, the sonic-db-cli PING command tries to initialize the global database for all instances while the databases are starting. If the database-config.json files of all instances are not ready yet, it crashes and generates a core file. PR sonic-net/sonic-swss-common#701 only fixes the crash and the process abortion.

Signed-off-by: mlok <marty.lok@nokia.com>
2023-02-21 11:23:22 -08:00
zhixzhu
f0f7639fa2
set cable length to 1m for backplane ports (#13572)
Signed-off-by: Zhixin Zhu zhixzhu@cisco.com

Why I did it
The cable length of backplane ports needs to be specified.

How I did it
separated handling for the specific port name.
2023-02-10 19:01:49 -08:00
Stephen Sun
e3ff08833e
[Mellanox] Support DSCP remapping in dual ToR topo on T0 switch (#12605)
- Why I did it
Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8.

- How I did it
Regarding buffer settings, originally, there are two lossless PGs and queues 3, 4. In dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back.
To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches.

On uplink ports, map DSCP 2/6 to TC 2/6 respectively
On downlink ports, both DSCP 2/6 are still mapped to TC 1
Buffer adjusted according to the ports information:
Mellanox-SN4600c-C64:
56 downlinks 50G + 8 uplinks 100G
Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8:
24 downlinks 50G + 8 uplinks 100G
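
The uplink/downlink rule for DSCP 2 and 6 can be written as a small lookup. A minimal Python sketch, not the QoS template itself: the function name and the "uplink"/"downlink" role strings are hypothetical, and other DSCP values are deliberately left out.

```python
def tc_for_dscp(port_role, dscp):
    """Return the TC for DSCP 2 or 6 on a dual-ToR T0 port (illustrative).

    port_role is "uplink" or "downlink". DSCP values other than 2 and 6 are
    outside this sketch and return None.
    """
    if dscp not in (2, 6):
        return None
    if port_role == "uplink":
        return dscp  # DSCP 2 -> TC 2, DSCP 6 -> TC 6
    if port_role == "downlink":
        return 1  # both DSCP 2 and 6 still map to TC 1
    raise ValueError(f"unknown port role: {port_role!r}")
```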

- How to verify it
Unit test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-07 16:21:59 +02:00
Chun'ang Li
eea54717b8
Fix rsyslogd start failure caused by an empty rsyslog.conf. (#13669)
- Why I did it
In to-sonic and multi-asic KVM-test, the pretest sometimes failed. The reason is that the rsyslogd process cannot start in the teamd container, because rsyslog.conf is empty when sonic-cfggen fails to execute.

- How I did it
If sonic-cfggen -d fails to execute, run it again without -d, because the template file has the default value.

- How to verify it
Build image and test it over 40 times, all passed pretest.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
2023-02-06 16:38:04 +02:00