- Why I did it
Need to execute mlxreg inside pmon docker
- How I did it
Add MFT package to pmon Makefile
- How to verify it
Install image, go to pmon : docker exec -it pmon bash, exec mlxreg
Verifiy warm, fast and cold reboot while MFT is being called in pmon constantly
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Why I did it
The docker storage driver vfs is not a good option for build, it uses the “deep copy” when building a new layer, leads to lower performance and more space used on disk than other storage drivers.
A better docker storage driver is the default one overlay2, it is a modern union filesystem.
Currently, the build with ASAN_ENABLE=y reuses the packages built with
ASAN_ENABLE=n (and vice versa). To address this issue, ASAN_ENABLE is added to DEP_FLAGS for asan-enabled packages (docker-syncd-mlnx, syncd, docker-orchagent, swss).
- Why I did it
To make dpkg cache use/rebuild the packages for ASAN_ENABLE=y/n.
- How I did it
Added ASAN_ENABLE to the DEP_FLAGS for asan-enabled packages.
- How to verify it
Built with ASAN_ENABLE=y/n and checked the .flags .log files.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This is to improve the readability of ASAN reports. The debug package adds function names and source code references to the backtrace (currently, there are only binary addresses of functions)
Another way to address this issue is to build the image with "INSTALL_DEBUG_TOOLS=y". The downside of this approach is that the image size and compilation time are unnecessarily big. Also, the idea is to make the "ENABLE_ASAN" self-sufficient, which would not be the case for this approach.
- Why I did it
To improve the readability of asan logs.
- How I did it
Added SYNCD_DBG and SWSS_DBG to corresponding docker images for ASAN_ENABLE=y build
- How to verify it
Add artificial memory leak
Build with ASAN_ENABLE=y
Test the image and check the ASAN report
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Official build fails complaining missing below targets:
2022-05-25T10:50:38.0560306Z tar: target/debs/buster/libyang2-cpp1_2.0.112-6_amd64.deb: Cannot stat: No such file or directory
2022-05-25T10:50:38.0571392Z tar: target/debs/buster/libyang2-cpp-dev_2.0.112-6_amd64.deb: Cannot stat: No such file or directory
2022-05-25T10:50:38.0588893Z tar: target/debs/buster/libyang2-cpp1-dbgsym_2.0.112-6_amd64.deb: Cannot stat: No such file or directory
2022-05-25T10:50:38.0590887Z tar: target/debs/buster/yang-tools_2.0.112-6_all.deb: Cannot stat: No such file or directory
Why I did it
Upgrade FRR to version 8.2.2. Build libyang2 required by FRR.
How I did it
Update FRR version and tag.
How to verify it
Following tests were performed on sonic-vs:
BGP docker status check
BGP configuration and session establishment
Route redistribution and ping
Issued show commands to check the bgp neighbor and routes
Checked app-db to ensure bgp routes are installed with correct interface and nexthop.
Create VRF and check FRR knows the VRF
Check VRF routes are installed in app-db with correct Vrf name and next-hop
Establish BGP Evpn session and check if Evpn routes (multicast, mac, prefix) are exchanged and installed correctly in app-db.
Fixes#9279
- Why I did it
Part of larger effort to move all SONiC systems to bullseye
- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3
- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
Python 2 support for sonic-pcied was removed, and the Python 2 version
of the variable no longer exists.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are
created with a fixed docker name and common to all the users.
docker-database:latest
docker-swss:latest
When multiple builds are triggered on the same build server that creates parallel building issue because
all the build jobs are trying to create the same docker with latest tag.
This happens only when sonic dockers are built using native host dockerd for sonic docker image creation.
This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, it rename the user sonic
dockers into correct sonic dockers with tag as latest.
docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag
The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env
variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for
loading and saving the docker image.
Why I did it
Migrate ptftests script to python3, in order to do an incremental migration, add python virtual environment firstly, install all required python packages in virtual env as well.
Then migrate ptftests scripts from python2 to python3 one by one avoid impacting non-changed scripts.
Signed-off-by: Zhaohui Sun zhaohuisun@microsoft.com
How I did it
Add python3 virtual environment for docker-ptf.
Add submodule ptf-py3 and install patched ptf 0.9.3 into virtual environment as well, two ptf issues were reported here:
p4lang/ptf#173p4lang/ptf#174
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
* [CG-Fix-CVE-2021-44906] Patching on thrift.0.14.1 for package minimist
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* add more information in patch
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Update 0003-Remove-minimist-packages.patch
* change the thrift 0.14.1 to package download
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* use the series file for patching
* fix a code defect
Why I did it
Missing the dependency of macsecmgrd in swss so that the MACsec feature cannot be enabled.
How I did it
Add SWSS dependency in docker-macsec.mk
How to verify it
Check the Azp of sonic-mgmt
sign-off: Jing Zhang zhangjing@microsoft.com
#### Why I did it
As part of the process moving containers from buster to bullseye.
#### How I did it
1. change base image from buster to bullseye.
2. remove unused addition to orchagent run options
#### How to verify it
Tested building locally.
* [build]: Patch debootstrap to not unmount the host's /proc filesystem
Currently, when the final image is being built (sonic-vs.img.gz,
sonic-broadcom.bin, or similar), each invocation of sudo in the
build_debian.sh script takes 0.8 seconds to run and execute the actual
command. This is because the /proc filesystem in the slave container has
been unmounted somehow. This is happening when debootstrap is running,
and it incorrectly unmounts the host's (in our case, the slave
container's) /proc filesystem because in the new image being built,
/proc is a symlink to the host's (the slave container's) /proc. Because
of that, /proc is gone, and each invocation of sudo adds 0.8 seconds
overhead. As a side effect, docker exec into the slave container during
this time will fail, because /proc/self/fd doesn't exist anymore, and
docker exec assumes that that exists.
Debootstrap has fixed this in 1.0.124 and newer, so backport the patch
that fixes this into the version that Bullseye has.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* [build_debian.sh]: Use eatmydata to speed up deb package installations
During package installations, dpkg calls fsync multiples times (for each
package) to ensure that tht efiles are written to disk, so that if
there's some system crash during package installation, then it is in at
least a somewhat recoverable state. For our use case though, we're
installing packages in a chroot in fsroot-* from a slave container and
then packaging it into an image. If there were a system crash (or even
if docker crashed), the fsroot-* directory would first be removed, and
the process would get restarted. This means that the fsync calls aren't
really needed for our use case.
The eatmydata package includes a library that will block/suppress the
use of fsync (and similar) system calls from applications and will
instead just return success, so that the application is not blocked on
disk writes, which can instead happen in the background instead as
necessary. If dpkg is run with this library, then the fsync calls that
it does will have no effect.
Therefore, install the eatmydata package at the beginning of
build_debian.sh and have dpkg be run under eatmydata for almost all
package installations/removals. At the end of the installation, remove
it, so that the final image uses dpkg as normal.
In my testing, this saves about 2-3 minutes from the image build time.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Change ln syntax to use chroot
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
To sign SONiC kernel image and allow secure boot based system to verify SONiC image before loading into the system.
How I did it
Pass following parameter to rules/config.user
Ex:
SONIC_ENABLE_SECUREBOOT_SIGNATURE := y
SIGNING_KEY := /path/to/key/private.key
SIGNING_CERT := /path/to/public/public.cert
How to verify it
Secure boot enabled system enrolled with right public key of the, image in the platform UEFI database will able to verify image before load.
Alternatively one can verify with offline sbsign tool as below.
export SBSIGN_KEY=/abc/bcd/xyz/
sbverify --cert $SBSIGN_KEY/public_cert.cert fsroot-platform-XYZ/boot/vmlinuz-5.10.0-8-2-amd64 mage
O/P:
Signature verification OK
Removed python2 support for sonic-platform-daemons that was causing unit
test errors in sonic_pcied.
* Removed config from docker supervisord jinja templates per VD review comment
* Removed space and python3 per QL comments
Why I did it
Running warm-reboot in a loop for 500 times leads to this error on 318-th iteration:
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors Traceback (most recent call last):
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors File "/usr/bin/restore_neighbors.py", line 24, in <module>
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors from scapy.all import conf, in6_getnsma, inet_pton, inet_ntop, in6_getnsmac, get_if_hwaddr, Ether, ARP, IPv6, ICMPv6ND_NS, ICMPv6NDOptSrcLLAddr
Apr 2 15:56:27.346795 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/all.py", line 25, in <module>
Apr 2 15:56:27.346956 sonic INFO swss#/supervisord: restore_neighbors from scapy.route import *
Apr 2 15:56:27.346995 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/route.py", line 205, in <module>
Apr 2 15:56:27.347089 sonic INFO swss#/supervisord: restore_neighbors conf.iface = get_working_if()
Apr 2 15:56:27.347129 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/linux.py", line 128, in get_working_if
Apr 2 15:56:27.347213 sonic INFO swss#/supervisord: restore_neighbors ifflags = struct.unpack("16xH14x", get_if(i, SIOCGIFFLAGS))[0]
Apr 2 15:56:27.347250 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/common.py", line 31, in get_if
Apr 2 15:56:27.347345 sonic INFO swss#/supervisord: restore_neighbors return ioctl(sck, cmd, struct.pack("16s16x", iff.encode("utf8")))
Apr 2 15:56:27.347365 sonic INFO swss#/supervisord: restore_neighbors OSError: [Errno 19] No such device
The issue was reported to scapy devs secdev/scapy#3369, the fix is secdev/scapy#3371, however there is no released scapy version with this fix right now, thus decided to build scapy v2.4.5 from sources and apply the fix in a form of a patch.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Change the base image from `docker-config-engine-buster` to
`docker-config-engine-bullseye`, and remove the hardcoded
`radvd` version from the Dockerfile.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
SONiC is migrating to bullseye. This will update the sonic-pins container to bullseye.
#### How I did it
The [sonic-pins code](https://github.com/Azure/sonic-buildimage/blob/master/rules/p4rt.mk) isn't dependent on any architecture so it will already build successfully for bullseye. This PR updates the docker to use bullseye.
#### How to verify it
Today we cannot build the docker-sonic-p4rt.gz target (e.g. Issue #9885). With this change the docker will build successfully. The P4RT executable will not run, because of a missing runtime library, libgmpxx, which I'll address in a followup PR.
#### Description for the changelog
Update docker-sonic-p4rt.gz target to build with Bullseye instead of Buster.
- Why I did it
Stopping swss and syncd causes some driver module unloading. Those driver modules are depended by PMON. This could trigger ERROR logs in syslog.
- How I did it
Adjust warmboot shutdown order in make file
- How to verify it
Manual test
#### Why I did it
The current redis version of SONiC is `6.0.6`, which contains many high-risky security issues like CVEs that are fixed in the latest version. The Redis release notes also highly recommend to upgrade with SECURITY urgency.
```
================================================================================
Redis 6.0.16 Released Mon Oct 4 12:00:00 IDT 2021
================================================================================
Upgrade urgency: SECURITY, contains fixes to security issues.
Security Fixes:
* (CVE-2021-41099) Integer to heap buffer overflow handling certain string
commands and network payloads, when proto-max-bulk-len is manually configured
to a non-default, very large value [reported by yiyuaner].
* (CVE-2021-32762) Integer to heap buffer overflow issue in redis-cli and
redis-sentinel parsing large multi-bulk replies on some older and less common
platforms [reported by Microsoft Vulnerability Research].
* (CVE-2021-32687) Integer to heap buffer overflow with intsets, when
set-max-intset-entries is manually configured to a non-default, very large
value [reported by Pawel Wieczorkiewicz, AWS].
* (CVE-2021-32675) Denial Of Service when processing RESP request payloads with
a large number of elements on many connections.
* (CVE-2021-32672) Random heap reading issue with Lua Debugger [reported by
Meir Shpilraien].
* (CVE-2021-32628) Integer to heap buffer overflow handling ziplist-encoded
data types, when configuring a large, non-default value for
hash-max-ziplist-entries, hash-max-ziplist-value, zset-max-ziplist-entries
or zset-max-ziplist-value [reported by sundb].
* (CVE-2021-32627) Integer to heap buffer overflow issue with streams, when
configuring a non-default, large value for proto-max-bulk-len and
client-query-buffer-limit [reported by sundb].
* (CVE-2021-32626) Specially crafted Lua scripts may result with Heap buffer
overflow [reported by Meir Shpilraien].
Other bug fixes:
* Fix appendfsync to always guarantee fsync before reply, on MacOS and FreeBSD (kqueue) (#9416)
* Fix the wrong mis-detection of sync_file_range system call, affecting performance (#9371)
* Fix replication issues when repl-diskless-load is used (#9280)
```
#### How I did it
Edit `Dockerfile.j2` file
#### How to verify it
Check redis version
#### Description for the changelog
This PR will upgrade redis-server version to `6.0.16`.
#### Why I did it
To bump the Thrift version to 0.14.1
- To avoid [CVE-2020-13949](https://nvd.nist.gov/vuln/detail/CVE-2020-13949)
- to fix some dependencies issues
#### How I did it
- rename `src/thrfit_0_13_0` to `src/thrift_2` to remove version number in the path. (`src/thrift` contains rules to build thrift 0.11.0 )
- Add thrift sources as submodule as there are no prepared debian packages for version >0.13.0 on [debian.org](https://packages.debian.org/search?searchon=sourcenames&keywords=thrift)
- Added patches with fixes for original thrift debian rules:(remove unneeded packages, fix multi job build)
#### How to verify it
```
BLDENV=buster make -f Makefile.work target/debs/buster/libthrift-dev_0.14.1_amd64.deb
```
Implement infrastructure that allows enabling address sanitizer
for docker containers. Enable address sanitizer for SWSS container.
- Why I did it
To add a possibility to compile SONiC applications with address sanitizer (ASAN).
ASAN is a memory error detector for C/C++. It finds:
1. Use after free (dangling pointer dereference)
2. Heap buffer overflow
3. Stack buffer overflow
4. Global buffer overflow
5. Use after return
6. Use after the scope
7. Initialization order bugs
8. Memory leaks
- How I did it
By adding new ENABLE_ASAN configuration option.
- How to verify it
By default ASAN is disabled and the SONiC image is not affected.
When ASAN is enabled it inspects all allocation, deallocation, and memory usage that the application does in run time. To verify whether the application has memory errors tests that trigger memory usage of the application should be run. Ideally, the whole regression tests should be run. Memory leaks reports will be placed in /var/log/asan/ directory of SONiC host OS.
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
- Why I did it
Remove obsolete parameter that enables static VXLAN src port range
provide functionality no generate json config file according to appropriate parameter in config_db
Done for
SN3800:
• Mellanox-SN3800-D28C50
• Mellanox-SN3800-C64
• Mellanox-SN3800-D28C49S1 (New 10G SKU)
SN2700:
• Mellanox-SN2700-D48C8
- How I did it
Remove SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 from appropriate sai.profile files
Created vxlan.json file and added few params that depends on DEVICE_METADATA.localhost.vxlan_port_range
- How to verify it
File /etc/swss/config.d/vxlan.json should be generated inside swss docker when it restart
[
{
"SWITCH_TABLE:switch": {
"vxlan_src": "0xFF00",
"vxlan_mask": "8"
},
"OP": "SET"
}
]
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Enable dbgsym package for dhcpmon.
Allow CFLAGS and LDFLAGS from environment variables to be used
in the dhcp6relay build. This makes sure that the -O2 flag from
dpkg-buildflags gets used.
Finally, enable all hardening flags in dpkg-buildflags for
dhcp6relay and dhcpmon. The change from the default set of flags is that
during linking, immediate binding of symbols is done instead of lazy
binding.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* [y_cable] Support for initialization of new Daemon ycable to support
ycables
This PR also adds the commit in sonic-platform-daemons
94fa239 [y_cable] refactor y_cable to a seperate logic and new daemon from xcvrd (#219)
Why I did it
This PR separates the logic of Y-Cable from xcvrd. Before this change we were utilizing xcvrd daemon to control all aspects of Y-Cable right from initialization to processing requests from other entities like orch,linkmgr.
Now we would have another daemon ycabled which will serve this purpose.
Logically everything still remains the same from the perspective of other daemons.
it also take care aspects like init/delete daemon from Y-Cable perspective.
How I did it
To serve the purpose we build a new wheel sonic_ycabled-1.0-py3-none-any.whl and install it inside pmon.
We also initalize the daemon ycabled which serves our purpose for refactor inside pmon
How to verify it
Ran the changes with an image for dualtor tests on a 7050cx3 platform
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Why I did it
Need to be able to run smartctl when pmon docker is not running.
How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.
How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
As part of this, update the isc-dhcp package to match the Bullseye
version (this fixes some compile errors related to BIND), clean up some
of the build dependencies and runtime dependencies for debian packaging,
and use the default Boost version to compile against instead of
explicitly saying using 1.74.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Improve the Linux kernel build cache hit rate.
Current the the hit rate is around 85.8% (based on the last 3 month, 3479 PR builds totally, 494 PR build not hit).
We can improve the hit rate up to 95% or better.
The Linux kernel build will take really long time, most of the PRs are nothing to do with the kernel change. The remaining cache options should be enough to detect the Linux kernel cache status (dirty or not).
Correct thrift.0.13.0 dependent package name.
In previous code, the buildout target was named as PYTHON3_THRIFT_0_13_0
But when add the prackage to LIBTHRIFT_0_13_0, it typo as PYTHON_THRIFT_0_13_0
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files
Submission containing materials of a third party:
Copyright Google LLC; Licensed under Apache 2.0
#### Why I did it
Adds P4RT container to SONiC for PINS
The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md
#### How I did it
Followed the pattern and templates used for other SONiC applications
#### How to verify it
Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.
#### Which release branch to backport (provide reason below if selected)
None
#### Description for the changelog
Build P4RT container for PINS
Bring in the following commit:
405f1df Use build profiles instead of distro version for Python 2 binding build (#558)
This change requires a corresponding change in this repo to set a build
profile to not build the python 2 bindings on Bullseye.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This pull request integrate audisp-tacplus to SONiC for per-command accounting.
#### Why I did it
To support TACACS per-command accounting, we integrate audisp-tacplus project to sonic.
#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC
#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add audisp-tacplus for per-command accounting.
#### A picture of a cute animal (not mandatory but encouraged)
* Add macsec-xpn-support iproute2 in syncd
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Polish code
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Remove useless files
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Add self-compiled iproute2 to docker sonic vs
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Enhance apt install for iproute2 dependencies
Signed-off-by: Ze Gan <ganze718@gmail.com>
HLD updated here: https://github.com/Azure/SONiC/pull/887
#### Why I did it
Command `monit summary -B` can no longer display the status for each critical process, system-health should not depend on it and need find a way to monitor the status of critical processes. The PR is to address that. monit is still used by system-health to do file system check as well as customize check.
#### How I did it
1. Get container names from FEATURE table
2. For each container, collect critical process names from file critical_processes
3. Use “docker exec -it <container_name> bash -c ‘supervisorctl status’” to get processes status inside container, parse the output and check if any critical processes exit
#### How to verify it
1. Add unit test case to cover it
2. Adjust sonic-mgmt cases to cover it
3. Manual test