489 Commits

Author SHA1 Message Date
shlomibitton
4ec3af86af
[Fastboot] Delay PMON service for better fastboot performance (#10567)
- Why I did it
Profiling the system state on init after fast-reboot, during execution of the create_switch function, shows a few Python scripts running at the same time.
This parallel execution consumes CPU time, so create_switch takes longer than it should.
Following this finding, and to ensure these services do not interfere in the future, PMON is delayed by 90 seconds until the system finishes the init flow after fastboot.

- How I did it
Add a timer for PMON service.
For the MLNX platform, exclude the trigger that starts PMON when SYNCD starts in the fastboot case.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
2022-05-02 10:44:17 +03:00
shlomibitton
1d84e0d7df
[Fastboot] Delay LLDP service for better fastboot performance (#10568)
- Why I did it
Profiling the system state on init after fast-reboot, during execution of the create_switch function, shows a few Python scripts running at the same time.
This parallel execution consumes CPU time, so create_switch takes longer than it should.
Following this finding, and to ensure these services do not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
2022-04-28 10:35:14 +03:00
ganglv
9d7387a18e
[sonic-host-services]: Fix import and invalid path (#10660)
Why I did it
sonic-hostservice cannot be started.

How I did it
Install python3-dbus and systemd-python, and replace invalid path

How to verify it
Start the service with the commands below:
sudo systemctl start sonic-hostservice
sudo systemctl status sonic-hostservice

Signed-off-by: Gang Lv <ganglv@microsoft.com>
2022-04-27 07:14:51 +08:00
Saikrishna Arcot
64187a1b15
Remove SSH host keys after installing the custom version of sshd (#10633)
* Remove SSH host keys after installing the custom version of sshd

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Use an override for sshd instead of overwriting the service file

Don't overwrite upstream's .service file, and instead use an override
file for making sure the host key(s) are generated.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-04-25 10:38:52 -07:00
bingwang-ms
3fc3259a35
Define qos map AZURE_TUNNEL for QoS remapping of tunnel traffic (#10565)
* Add AZURE_TUNNEL map

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-04-25 15:06:10 +08:00
kellyyeh
2a516a7763
[dhcp_relay] Enable dhcp_relay on EPMS, MgmtTsTor, MgmtToRRouter and BackEndToRRouter (#10474)
2022-04-15 18:01:24 -07:00
Yakiv Huryk
d9117d9411
[Mellanox][asan] add address sanitizer support for syncd (#10266)
Why I did it
To support address sanitizer for Mellanox syncd

How I did it
/var/log/asan is mapped for syncd container (the same as for swss)
container stop() has a timeout (60s) for syncd (the same as for swss)
This is so libasan has enough time to generate a report.
added ASAN's log path to Mellanox syncd supervisord.conf
added "asan: yes" to sonic_version.yml
How to verify it
Added artificial memory leaks
Compiled with ENABLE_ASAN=y
Installed the image on DUT
Rebooted the DUT
Verified that /var/log/asan/syncd-asan.log contains the leaks

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2022-04-14 15:00:32 -07:00
byu343
f7a6553933
[docker-syncd]: Add optional shm-size to syncd container (#10516)
Why I did it
In the bringup of tomahawk4/trident4, we realized that such chips need a larger size of /dev/shm in syncd container, so we added the option --shm-size to the docker create for syncd. The default value for shm-size is 64m; after this change, people can add SYNCD_SHM_SIZE=128m to platform_env.conf to change it to 128m.
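
The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `read_env_conf` and `shm_size_args` are hypothetical, the KEY=VALUE parsing is a guess at the platform_env.conf layout, and the real container start script may build the option differently. Only `SYNCD_SHM_SIZE`, the 64m default and the `--shm-size` option come from the commit message.

```python
DEFAULT_SHM_SIZE = "64m"  # default stated in the commit message


def read_env_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (assumed layout)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def shm_size_args(conf_text):
    """Return the --shm-size option for 'docker create' (hypothetical helper)."""
    size = read_env_conf(conf_text).get("SYNCD_SHM_SIZE", DEFAULT_SHM_SIZE)
    return [f"--shm-size={size}"]


print(shm_size_args(""))                        # no platform_env.conf content
print(shm_size_args("SYNCD_SHM_SIZE=128m\n"))   # platform override
```

Falling back to the default when the key is absent mirrors the verification step in this commit: platforms without platform_env.conf keep the 64M /dev/shm.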

How to verify it
We verified that after this change, 1) on existing platforms without platform_env.conf, the size of /dev/shm in syncd container (df -h | grep shm) is still the default 64M; 2) after we add SYNCD_SHM_SIZE=128m to platform_env.conf, /dev/shm in syncd becomes 128M.
2022-04-09 10:47:18 -07:00
bingwang-ms
b9dd1df372
Update qos config to clear queues for bounced back traffic (#10176)
* Update qos config to clear queues for bounced back traffic

Signed-off-by: bingwang <bingwang@microsoft.com>
2022-04-05 22:32:25 +08:00
judyjoseph
8e642848c2
Introduce the asic_subtype field for adding the sub platform variants. (#10235)
* Introduce the asic_subtype field for adding the sub platform variants. 
   It uses the value of TARGET_MACHINE variable in slave.mk.
2022-03-28 11:22:32 -07:00
xumia
1017ee6002
[Build]: Use one debian mirror config (#10274)
Why I did it
Use one debian mirror config.
The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), which does not make sense.
None of the content in files/image_config/apt is used; anyone who wants to add a mirror config should add it in files/apt.

How I did it
Remove files/image_config/apt and the reference.
2022-03-21 16:47:20 +08:00
xumia
0243ed9538
[build]: Fix marvell-armhf build hung issue (#10156) (#10229)
Why I did it
The marvell-armhf build hangs; it does not exit even after a long wait.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
2022-03-15 10:03:54 +08:00
xumia
eea3cc7ad1
[Build]: only install grpc in amd64 (#10212)
[Build]: only install grpc in amd64
Unblock marvell-armhf build.
2022-03-14 13:41:37 +08:00
xumia
9cdf81230b
[Build]: Fix /proc not mounted issue (#10164)
[Build]: Fix /proc not mounted issue
2022-03-11 09:23:37 +08:00
Song Yuan
01798447ab
[Chassis][QoS template] Skip configuring buffer and QoS config on recirc ports (#7869)
* Added test case to verify the template changes.
2022-03-09 16:04:36 -08:00
Kebo Liu
fe0a7693f4
[smartmontools] Install smartmontools with apt-get and upgrade it to 7.2-1 (#10087)
Why I did it
Smartmontools 6.6 has an issue reading SMART info from NVMe SSDs.
Smartmontools can be installed with apt-get, so there is no need to build and install it.

How I did it
Use apt-get to install smartmontools 7.2-1
Remove previous make files for smartmontools 6.6

How to verify it
Verify that "smartctl" reads out correct SMART info on an NVMe SSD.
Verify that "show platform ssdhealth" still works.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-03-07 09:39:33 -08:00
Aravind Mani
1740beb1f2
[sonic-cfggen]: Fix sonic-cfggen build failures for armhf (#10132)
Why I did it
The armhf build fails while building the sonic-config-engine whl package:
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9

The failure occurs because a new line is generated at the top of the file in the buffer config test cases when building for Broadcom-based platforms; this issue is not seen on Marvell-based platforms.

How I did it
Removed the new line from all the buffer test cases, as there is no need for it, and changed buffer_config.j2, where the new line is generated, accordingly.
2022-03-02 13:06:20 -08:00
vmittal-msft
bc1dfea619
Updated traffic scheduler settings for HWSKUs : DellEMC-Z9332f-O32 and DellEMC-Z9332f-M-O16C64 (#9828)
2022-02-23 17:22:41 -08:00
byu343
155220be9b
Support multi-asic on macsec container (#9921)
This change enables the support of running multiple macsec containers, each for one ASIC.
2022-02-13 22:45:24 -08:00
Oleksandr Ivantsiv
25a0ce5eb1
[asan] Add address sanitizer support. (#9857)
Implement infrastructure that allows enabling address sanitizer
for docker containers. Enable address sanitizer for SWSS container.

- Why I did it
To add a possibility to compile SONiC applications with address sanitizer (ASAN).
ASAN is a memory error detector for C/C++. It finds:
1. Use after free (dangling pointer dereference)
2. Heap buffer overflow
3. Stack buffer overflow
4. Global buffer overflow
5. Use after return
6. Use after the scope
7. Initialization order bugs
8. Memory leaks

- How I did it
By adding new ENABLE_ASAN configuration option.

- How to verify it
By default ASAN is disabled and the SONiC image is not affected.
When ASAN is enabled, it inspects all allocations, deallocations, and memory usage of the application at run time. To verify whether the application has memory errors, tests that trigger the application's memory usage should be run. Ideally, the whole regression test suite should be run. Memory leak reports are placed in the /var/log/asan/ directory of the SONiC host OS.

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
2022-02-09 13:29:18 +02:00
Alexander Allen
8a07af95e5
[Mellanox] Modified Platform API to support all firmware updates in single boot (#9608)
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.

How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
2022-01-24 00:56:38 -08:00
Shyam
20f32dc072
Added gbsyncd infra for multi-ASIC, multi-PHY mode (#9722)
- External PHY is managed via gearbox (gbsyncd docker container) in SONiC
  - Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC
  - Enhanced gbsyncd docker container from single-Namespace to multi-Namespace mode
  - Added gbsyncd.service.j2 on a per_namespace basis.
  - Each namespace/ASIC now has its unique gbsyncd<ASIC#> docker container with its
    own Gearbox table, redis-DB

Signed-off-by: Shyam Kumar <shyakuma@cisco.com>
2022-01-21 10:08:16 +08:00
Alexander Allen
5f596aef63
[pmon] Move smartctl from pmon to host (#9607)
Why I did it
Need to be able to run smartctl when pmon docker is not running.

How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.

How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
2022-01-19 10:53:10 -08:00
Sudharsan Dhamal Gopalarathnam
bd0a19aa17
[rsyslog]Setting log file size to 16Mb (#9504)
Why I did it
The existing log file size in SONiC is 1 Mb. Over time this leads to a huge number of log files, which becomes difficult for monitoring applications to handle.
Instead of a large number of small files, the log file size is now set to 16 Mb, which reduces the number of files over time.
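
As a rough illustration of the effect, the arithmetic below counts how many files are needed to hold the same amount of log data at each size limit. The 160 Mb total is a hypothetical figure chosen for the example, not a measurement from the commit, and `rotated_file_count` is an illustrative helper.

```python
import math


def rotated_file_count(total_log_mb, file_size_mb):
    """Number of files needed to hold total_log_mb when each file is capped at file_size_mb."""
    return math.ceil(total_log_mb / file_size_mb)


TOTAL_LOG_MB = 160  # hypothetical volume of retained logs

for size_mb in (1, 16):
    count = rotated_file_count(TOTAL_LOG_MB, size_mb)
    print(f"{size_mb} Mb per file -> {count} files")
```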

How I did it
Changed the size parameter and related macros in logrotate config for rsyslog

How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.

Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
2022-01-14 10:24:07 -08:00
Marty Y. Lok
04a4b8dcb1
[multiasic][database]database.sh failed to create the database for namespace (#9502)
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
With Docker version 20.10.x in the latest code, the "docker create" command no longer accepts the optional "NET=" setting with an empty value, so the current docker create command in database.sh fails with a syntax error. See issue #9503.

How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
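
The defaulting logic can be sketched as below. This is a minimal illustration, not the template's actual code: the function names, the container and image names, and the assumption that NET ends up in a `--net=` option of `docker create` are all hypothetical; only the "bridge" default comes from the commit message.

```python
def net_option(net=""):
    """Return the network setting, falling back to 'bridge' when NET is empty."""
    return net or "bridge"


def docker_create_args(name, image, net=""):
    """Build an illustrative 'docker create' argument list (hypothetical helper)."""
    return ["docker", "create", f"--net={net_option(net)}", f"--name={name}", image]


# Hypothetical container/image names, used only to show the resulting command line.
print(" ".join(docker_create_args("database0", "docker-database:latest")))
print(" ".join(docker_create_args("database", "docker-database:latest", net="host")))
```

With a non-empty default, the option is never rendered with an empty value, which is the case Docker 20.10.x rejects according to this commit.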
2021-12-13 10:17:05 -08:00
Qi Luo
cf4011d526
Revert "CRM init config for SRV6 Nexthop and MY_SID resource (#9238)" (#9506)
This reverts commit 8187d473af.
2021-12-12 12:16:39 -08:00
Brian O'Connor
46bcda359c
[PINS] Build P4RT container for PINS (#9083)
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files

Submission containing materials of a third party:
    Copyright Google LLC; Licensed under Apache 2.0

#### Why I did it

Adds P4RT container to SONiC for PINS

The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md

#### How I did it

Followed the pattern and templates used for other SONiC applications

#### How to verify it

Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.

#### Which release branch to backport (provide reason below if selected)

None

#### Description for the changelog

Build P4RT container for PINS
2021-12-07 11:11:25 -08:00
Marty Y. Lok
cb4c66ae98
[chassis][multiasic] fixed rsyslogd FATAL issue in the database container in multi-asic box (#8390)
Why I did it
Fix for issue #8389

How I did it
The /etc/rsyslog.conf is an empty file, which causes the FATAL error of the rsyslogd process in the global instance database container. The function updateSyslogConf() should only generate rsyslog.conf for containers in a namespace; it should not do so for containers in the global instance, where the default rsyslog.conf should be used instead. For the database container in particular, updateSyslogConf() is called before the database container is created, which causes sonic-cfggen to fail to generate rsyslog.conf.

Signed-off-by: mlok <marty.lok@nokia.com>
2021-12-01 07:16:49 -08:00
liuh-80
739c45645c
[TACACS+] Add audisp-tacplus for per-command accounting. (#8750)
This pull request integrates audisp-tacplus into SONiC for per-command accounting.

#### Why I did it
To support TACACS per-command accounting, we integrate the audisp-tacplus project into SONiC.

#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC

#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.

#### Which release branch to backport (provide reason below if selected)
N/A

#### Description for the changelog
Add audisp-tacplus for per-command accounting.

2021-12-01 11:50:09 +08:00
Kumaresh Perumal
8187d473af
CRM init config for SRV6 Nexthop and MY_SID resource (#9238)
* Enable CRM for SRV6 Nexthop and SRV6 MY_SID entries.
2021-11-30 09:21:19 -08:00
Lawrence Lee
6e1a477ce0
[mux]: Fix mark_dhcp_packet (#9373)
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
    - Remove unused imports
    - Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-29 12:04:06 -08:00
Stephen Sun
b3ccef9c08
[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-11-24 15:00:23 +02:00
Junhua Zhai
240596ec7d
[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9332)
Why I did it
Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.

How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
2021-11-23 10:44:29 -08:00
Guohan Lu
f3faf6111b
Revert "[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286)"
This reverts commit 1d2a11bbb8.
2021-11-19 10:10:55 -08:00
Junhua Zhai
1d2a11bbb8
[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286)
Why I did it
Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.

How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
2021-11-17 23:49:49 -08:00
Vivek Reddy
ff32ac3ed4
[Auto Techsupport] Event driven Techsupport Changes (#8670)
#### Why I did it

Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](https://github.com/Azure/SONiC/pull/818)

Requires: https://github.com/Azure/sonic-utilities/pull/1796.
Merging in any order would be fine.

Summary of the changes:

- Added the YANG Models for the new tables introduced as a part of this feature.
- Enhanced init_cfg.json with the default config required
- Added a compile-time flag which enables/disables the config required for this feature inside the init_cfg.json
- Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes a proc exit notification for the critical processes running inside the docker.
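
The `<feature>:<critical_proc> = <comm>:<pid>` layout from the last item can be illustrated with a short sketch. The helper name, the sample values and the plain dict standing in for a STATE_DB table are all hypothetical; the real listener script writes to the database through its own interfaces.

```python
def exit_record(feature, critical_proc, comm, pid):
    """Build the (field, value) pair in the '<feature>:<critical_proc> = <comm>:<pid>' layout."""
    return f"{feature}:{critical_proc}", f"{comm}:{pid}"


# A plain dict stands in for the STATE_DB table in this sketch.
state_table = {}
field, value = exit_record("swss", "orchagent", "orchagent", 1234)  # hypothetical values
state_table[field] = value
print(state_table)
```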
2021-11-15 21:56:37 -08:00
liuh-80
ff09b8b8ed
[TACACS+] Add Bash TACACS+ plugin for per-command authorization. (#8715)
This pull request adds a bash plugin for TACACS+ per-command authorization.

#### Why I did it
1. To support TACACS per-command authorization, the user command is checked before it is executed.
2. Fix the issue where libtacsupport.so cannot parse tacplus_nss.conf correctly:
            Support the debug=on setting.
            Support putting the server address and secret in the same row.
3. Fix the issue where the parse_config_file method does not reset the server list before parsing the config file.

#### How I did it
The bash plugin is called before every user command and checks the command against the remote TACACS+ server for per-command authorization.

#### How to verify it
UT with CUnit covers all code in this plugin.
Also pass all current UT.

#### Which release branch to backport (provide reason below if selected)
N/A

#### Description for the changelog
Add Bash TACACS+ plugin.


2021-11-13 09:57:30 +08:00
Stepan Blyshchak
a2c2d67098
[ACL] enable ACL FC when generating config from minigraph but disable by default (#8908)
* [ACL] enable ACL FC when generating config from minigraph but disable by default
Why I did it
To support ACL counters on Flex Counter Infrastructure.

How I did it
Enable ACL FC in init_cfg and minigraph. Disable it when generating configuration from a preset.

How to verify it
Together with the dependent PRs, run the ACL/Everflow test suite.

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-11 09:07:54 +08:00
Guohan Lu
5f11eb320e
Revert "sysready (#8889)"
This reverts commit d7e5372e54.
2021-11-10 15:36:20 -08:00
Alexander Allen
2847265bfd
Mellanox bullseye merge (#1)
Allow mellanox platform to build and successfully switch packets in
Debian 11

Upgraded

* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches

Adjusted build system to support host system running bullseye and
dockers running buster.
2021-11-10 15:27:22 -08:00
LuiSzee
5b284767f6
Update Centec platform support for Bullseye and 5.10 kernel (#7)
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10

Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-11-10 15:27:22 -08:00
Saikrishna Arcot
1d00613305
Add support for building Mellanox image
ISSU will likely be broken. As of right now, the issu-version file is
not being generated during build.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
Saikrishna Arcot
33e4b7f90e
Fix Python 3 syntax in SONiC container startup scripts
The common startup script used for SONiC containers is calling an inline
python command that uses Python 2 syntax, and thus errors out when run
with Python 3. Make this work with Python 3.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
Saikrishna Arcot
2b0ad74db6
Update kdump-tools for bullseye
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
Saikrishna Arcot
a1d30e3aa0
Python 2 removal/cleanup
Remove Python 2 package installation from the base image. For container
builds, reference Python 2 packages only if we're not building for
Bullseye.

For libyang, don't build Python 2 bindings at all, since they don't seem
to be used.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
Saikrishna Arcot
b8a7a6355b
Update the base Debian system installation script to get Bullseye
Python 2 is no longer available, so remove those packages, and remove
the pip2 commands. For picocom and systemd, just install from the
regular repo, since there's no backports yet.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
Senthil Kumar Guruswamy
d7e5372e54
sysready (#8889)
2021-11-10 14:52:52 -08:00
Lawrence Lee
475bfc9625
[mux.service]: Remove pmon dependency (#9211)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 08:08:03 -08:00
tjchadaga
8544147a70
Fix for additional intf flap during fast-reboot (#9166)
2021-11-08 15:21:11 -08:00
Stepan Blyshchak
2ef97bb5df
[dockers] change RPC, DBG dockers version: put RPC, DBG sign in build metadata part of the version (#8920)
- Why I did it
If an app.ext requires the dependency syncd^1.0.0, the RPC version of syncd does not satisfy this constraint, since 1.0.0-rpc < 1.0.0. It is therefore not correct to use 'rpc' as a prerelease identifier. Instead, put 'rpc' in the version as build metadata: 1.0.0+rpc, which satisfies the constraint ^1.0.0.
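
The ordering argument can be checked with a small sketch of semantic-version precedence. This is a simplified illustration, not the package manager's implementation: the function names are hypothetical, prerelease identifiers are compared as plain strings rather than field by field, and the caret check assumes a base major version of at least 1.

```python
import re

SEMVER = re.compile(
    r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?"
)


def parse(version):
    """Split 'MAJOR.MINOR.PATCH[-prerelease][+build]' into (core tuple, prerelease)."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    core = tuple(int(part) for part in match.group(1, 2, 3))
    return core, match.group(4)  # build metadata is ignored for precedence


def precedence_key(version):
    """Sort key: a prerelease ranks below the same core version without one."""
    core, prerelease = parse(version)
    rank = (0, prerelease) if prerelease else (1, "")
    return core, rank


def satisfies_caret(version, base):
    """Simplified '^base' check (base major >= 1): version >= base, same major."""
    same_major = parse(version)[0][0] == parse(base)[0][0]
    return same_major and precedence_key(version) >= precedence_key(base)


print(satisfies_caret("1.0.0-rpc", "1.0.0"))  # prerelease sorts below 1.0.0
print(satisfies_caret("1.0.0+rpc", "1.0.0"))  # build metadata does not affect ordering
```

Because build metadata is dropped before comparison, 1.0.0+rpc compares equal to 1.0.0 and passes the caret check, while 1.0.0-rpc compares lower and fails it.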

- How I did it
Changed how the versions of the RPC and DBG images are constructed.

- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-01 19:02:57 +02:00