- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
Why I did it
Can not start sonic-hostservice
How I did it
Install python3-dbus and systemd-python, and replace invalid path
How to verify it
Start the service with below commands:
sudo systemctl start sonic-hostservice
sudo systemctl status sonic-hostservice
Signed-off-by: Gang Lv ganglv@microsoft.com
* Remove SSH host keys after installing the custom version of sshd
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Use an override for for sshd instead of overwriting the service file
Don't overwrite upstream's .service file, and instead use an override
file for making sure the host key(s) are generated.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
To support address sanitizer for Mellanox syncd
How I did it
/var/log/asan is mapped for syncd container (the same as for swss)
container stop() has a timeout (60s) for syncd (the same as for swss)
This is so libasan has enough time to generate a report.
added ASAN's log path to Mellanox syncd supervisord.conf
added "asan: yes" to sonic_version.yml
How to verify it
Added artificial memory leaks
Compiled with ENABLE_ASAN=y
Installed the image on DUT
Rebooted the DUT
Verified that /var/log/asan/syncd-asan.log contains the leaks
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Why I did it
In the bringup of tomahawk4/trident4, we realized that such chips need a larger size of /dev/shm in syncd container, so we added the option --shm-size to the docker create for syncd. The default value for shm-size is 64m; after this change, people can add SYNCD_SHM_SIZE=128m to platform_env.conf to change it to 128m.
How to verify it
We verified that after this change, 1) on existing platforms without platform_env.conf, the size of /dev/shm in syncd container (df -h | grep shm) is still the default 64M; 2) after we add SYNCD_SHM_SIZE=128m to platform_env.conf, /dev/shm in syncd becomes 128M.
Why I did it
The marvel-armhf build is hung, it does not exit after waiting for a long time.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
Why I did it
Smartmontools 6.6 has an issue with reading SMART info of nvme SSD
Smartmontools can be installed with apt-get, no need to build and install
How I did it
Use apt-get to install smartmontools 7.2-1
Remove previous make files for smartmontools 6.6
How to verify it
verify with "smartctl" can read out correct SMART info on NVME ssd.
verify "show platform ssdhealth" can still work
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
amrhf build fails while building sonic-config-engine whl package
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9
The reason for the failure is due to the fact that there is a new line generated at the top of the file in buffer config test cases while building for broadcom based platform and this issue is not seen in Marvell based platforms.
How I did it
Removed the new line for all the buffer test cases as there is no need to add it and accordingly changed the buffer_config.j2 where the new line is generated.
Implement infrastructure that allows enabling address sanitizer
for docker containers. Enable address sanitizer for SWSS container.
- Why I did it
To add a possibility to compile SONiC applications with address sanitizer (ASAN).
ASAN is a memory error detector for C/C++. It finds:
1. Use after free (dangling pointer dereference)
2. Heap buffer overflow
3. Stack buffer overflow
4. Global buffer overflow
5. Use after return
6. Use after the scope
7. Initialization order bugs
8. Memory leaks
- How I did it
By adding new ENABLE_ASAN configuration option.
- How to verify it
By default ASAN is disabled and the SONiC image is not affected.
When ASAN is enabled it inspects all allocation, deallocation, and memory usage that the application does in run time. To verify whether the application has memory errors tests that trigger memory usage of the application should be run. Ideally, the whole regression tests should be run. Memory leaks reports will be placed in /var/log/asan/ directory of SONiC host OS.
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.
How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order with prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
- External PHY is managed via gearbox (gbsybcd docker container) in SONiC
- Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC
- Enhanced gbsyncd docker container from single Namespace to multi-Namspace mode
- Added gbsyncd.service.j2 on per_namespace basis.
- Each namepace/ASIC now to have its unique gbsyncd<ASIC#> docker container with its
own Gearbox table, redis-DB
Signed-off-by: Shyam Kumar <shyakuma@cisco.com>
Why I did it
Need to be able to run smartctl when pmon docker is not running.
How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.
How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
Why I did it
The existing log file size in sonic is 1 Mb. Over a period of time this leads to huge number of log files which becomes difficult for monitoring applications to handle.
Instead of large number of small files, the size of the log file is not set to 16 Mb which reduces the number of files over a period of time.
How I did it
Changed the size parameter and related macros in logrotate config for rsyslog
How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503
How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files
Submission containing materials of a third party:
Copyright Google LLC; Licensed under Apache 2.0
#### Why I did it
Adds P4RT container to SONiC for PINS
The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md
#### How I did it
Followed the pattern and templates used for other SONiC applications
#### How to verify it
Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.
#### Which release branch to backport (provide reason below if selected)
None
#### Description for the changelog
Build P4RT container for PINS
Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.
Signed-off-by: mlok <marty.lok@nokia.com>
This pull request integrate audisp-tacplus to SONiC for per-command accounting.
#### Why I did it
To support TACACS per-command accounting, we integrate audisp-tacplus project to sonic.
#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC
#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add audisp-tacplus for per-command accounting.
#### A picture of a cute animal (not mandatory but encouraged)
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
- Remove unused imports
- Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
#### Why I did it
Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](https://github.com/Azure/SONiC/pull/818 )
Requires: https://github.com/Azure/sonic-utilities/pull/1796.
Merging in any order would be fine.
Summary of the changes:
- Added the YANG Models for the new tables introduces as a part of this feature.
- Enhanced init_cfg.json with the default config required
- Added a compile Time flag which enables/disables the config required for this feature inside the init_cfg.json
- Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes an proc exit notification for the critical processes running inside the docker.
This pull request add a bash plugin for TACACS+ per-command authorization
#### Why I did it
1. To support TACACS per command authorization, we check user command before execute it.
2. Fix libtacsupport.so can't parse tacplus_nss.conf correctly issue:
Support debug=on setting.
Support put server address and secret in same row.
3. Fix the parse_config_file method not reset server list before parse config file issue.
#### How I did it
The bash plugin will be called before every user command, and check user command with remote TACACS+ server for per-command authorization.
#### How to verify it
UT with CUnit cover all code in this plugin.
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add Bash TACACS+ plugin.
#### A picture of a cute animal (not mandatory but encouraged)
* [ACL] enable ACL FC when genereting config from minigraph but disable by default
Why I did it
To support ACL counters on Flex Counter Infrastructure.
How I did it
Enable ACL FC in init_cfg and minigraph. Disable when genereting configuration from preset.
How to verify it
Together with depends PRs. Run ACL/Everflow test suite.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Allow mellanox platform to build and successfully switch packets in
Debian 11
Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches
Adjusted build system to support host system running bullseye and
dockers running buster.
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10
Co-authored-by: Shi Lei <shil@centecnetworks.com>
ISSU will likely be broken. As of right now, the issu-version file is
not being generated during build.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The common startup script used for SONiC containers is calling an inline
python command that uses Python 2 syntax, and thus errors out when run
with Python 3. Make this work with Python 3.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Remove Python 2 package installation from the base image. For container
builds, reference Python 2 packages only if we're not building for
Bullseye.
For libyang, don't build Python 2 bindings at all, since they don't seem
to be used.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Python 2 is no longer available, so remove those packages, and remove
the pip2 commands. For picocom and systemd, just install from the
regular repo, since there's no backports yet.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.
- How I did it
Changed the way how to version in RPC and DBG images are constructed.
- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>