Why I did it
Smartmontools 6.6 has an issue with reading SMART info of nvme SSD
Smartmontools can be installed with apt-get, no need to build and install
How I did it
Use apt-get to install smartmontools 7.2-1
Remove previous make files for smartmontools 6.6
How to verify it
verify with "smartctl" can read out correct SMART info on NVME ssd.
verify "show platform ssdhealth" can still work
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
amrhf build fails while building sonic-config-engine whl package
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9
The reason for the failure is due to the fact that there is a new line generated at the top of the file in buffer config test cases while building for broadcom based platform and this issue is not seen in Marvell based platforms.
How I did it
Removed the new line for all the buffer test cases as there is no need to add it and accordingly changed the buffer_config.j2 where the new line is generated.
Implement infrastructure that allows enabling address sanitizer
for docker containers. Enable address sanitizer for SWSS container.
- Why I did it
To add a possibility to compile SONiC applications with address sanitizer (ASAN).
ASAN is a memory error detector for C/C++. It finds:
1. Use after free (dangling pointer dereference)
2. Heap buffer overflow
3. Stack buffer overflow
4. Global buffer overflow
5. Use after return
6. Use after the scope
7. Initialization order bugs
8. Memory leaks
- How I did it
By adding new ENABLE_ASAN configuration option.
- How to verify it
By default ASAN is disabled and the SONiC image is not affected.
When ASAN is enabled it inspects all allocation, deallocation, and memory usage that the application does in run time. To verify whether the application has memory errors tests that trigger memory usage of the application should be run. Ideally, the whole regression tests should be run. Memory leaks reports will be placed in /var/log/asan/ directory of SONiC host OS.
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.
How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order with prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
- External PHY is managed via gearbox (gbsybcd docker container) in SONiC
- Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC
- Enhanced gbsyncd docker container from single Namespace to multi-Namspace mode
- Added gbsyncd.service.j2 on per_namespace basis.
- Each namepace/ASIC now to have its unique gbsyncd<ASIC#> docker container with its
own Gearbox table, redis-DB
Signed-off-by: Shyam Kumar <shyakuma@cisco.com>
Why I did it
Need to be able to run smartctl when pmon docker is not running.
How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.
How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
Why I did it
The existing log file size in sonic is 1 Mb. Over a period of time this leads to huge number of log files which becomes difficult for monitoring applications to handle.
Instead of large number of small files, the size of the log file is not set to 16 Mb which reduces the number of files over a period of time.
How I did it
Changed the size parameter and related macros in logrotate config for rsyslog
How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503
How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files
Submission containing materials of a third party:
Copyright Google LLC; Licensed under Apache 2.0
#### Why I did it
Adds P4RT container to SONiC for PINS
The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md
#### How I did it
Followed the pattern and templates used for other SONiC applications
#### How to verify it
Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.
#### Which release branch to backport (provide reason below if selected)
None
#### Description for the changelog
Build P4RT container for PINS
Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.
Signed-off-by: mlok <marty.lok@nokia.com>
This pull request integrate audisp-tacplus to SONiC for per-command accounting.
#### Why I did it
To support TACACS per-command accounting, we integrate audisp-tacplus project to sonic.
#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC
#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add audisp-tacplus for per-command accounting.
#### A picture of a cute animal (not mandatory but encouraged)
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
- Remove unused imports
- Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
#### Why I did it
Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](https://github.com/Azure/SONiC/pull/818 )
Requires: https://github.com/Azure/sonic-utilities/pull/1796.
Merging in any order would be fine.
Summary of the changes:
- Added the YANG Models for the new tables introduces as a part of this feature.
- Enhanced init_cfg.json with the default config required
- Added a compile Time flag which enables/disables the config required for this feature inside the init_cfg.json
- Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes an proc exit notification for the critical processes running inside the docker.
This pull request add a bash plugin for TACACS+ per-command authorization
#### Why I did it
1. To support TACACS per command authorization, we check user command before execute it.
2. Fix libtacsupport.so can't parse tacplus_nss.conf correctly issue:
Support debug=on setting.
Support put server address and secret in same row.
3. Fix the parse_config_file method not reset server list before parse config file issue.
#### How I did it
The bash plugin will be called before every user command, and check user command with remote TACACS+ server for per-command authorization.
#### How to verify it
UT with CUnit cover all code in this plugin.
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add Bash TACACS+ plugin.
#### A picture of a cute animal (not mandatory but encouraged)
* [ACL] enable ACL FC when genereting config from minigraph but disable by default
Why I did it
To support ACL counters on Flex Counter Infrastructure.
How I did it
Enable ACL FC in init_cfg and minigraph. Disable when genereting configuration from preset.
How to verify it
Together with depends PRs. Run ACL/Everflow test suite.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Allow mellanox platform to build and successfully switch packets in
Debian 11
Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches
Adjusted build system to support host system running bullseye and
dockers running buster.
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10
Co-authored-by: Shi Lei <shil@centecnetworks.com>
ISSU will likely be broken. As of right now, the issu-version file is
not being generated during build.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The common startup script used for SONiC containers is calling an inline
python command that uses Python 2 syntax, and thus errors out when run
with Python 3. Make this work with Python 3.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Remove Python 2 package installation from the base image. For container
builds, reference Python 2 packages only if we're not building for
Bullseye.
For libyang, don't build Python 2 bindings at all, since they don't seem
to be used.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Python 2 is no longer available, so remove those packages, and remove
the pip2 commands. For picocom and systemd, just install from the
regular repo, since there's no backports yet.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.
- How I did it
Changed the way how to version in RPC and DBG images are constructed.
- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Enable gbsyncd support for cisco platforms
Signed-off-by: Sachin Naik sachnaik@cisco.com
Why I did it
To enable cisco gbsyncd container for cisco gearbox hardwares.
How I did it
Create symlink to gbsyncd.service.j2 to start gearbox systemd service.
How to verify it
Verify that the gbsyncd-cisco container started for x86_64-88_lc0_36fh_mo-r0 Line card
root@localhost:/home/cisco# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
50d309ea9967 docker-sonic-telemetry:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes telemetry
65cebc9e181b docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes mgmt-framework
5a9b510da24d docker-snmp:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes snmp
c291b0a1fc87 26195cc7c042 "/usr/bin/docker_ini…" 26 minutes ago Up 6 minutes dhcp_relay
d85aa5e6b78c docker-router-advertiser:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes radv
46c787329374 docker-lldp:latest "/usr/bin/docker-lld…" 28 minutes ago Up 6 minutes lldp
6643f53e4ceb docker-gbsyncd-cisco:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes gbsyncd-cisco
f05ae8af4aaa docker-syncd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes syncd
02e0e53b62cf docker-teamd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes teamd
fc7bc2dbb6a9 docker-orchagent:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes swss
5c5147c986c9 docker-fpm-frr:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes bgp
63b5ce3d4c80 docker-platform-monitor:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes pmon
7e6f34dca0e5 docker-database:latest "/usr/local/bin/dock…" 28 minutes ago Up 29 minutes database
Signed-off-by: Sachin Naik <sachnaik@cisco.com>
Co-authored-by: Sachin Naik <sachnaik@cisco.com>
#### Why I did it
Nokia IXR7250E platform requires grpcio, grpcio-tools python library, and libprotobuf-dev, libgrpc++ library
#### How I did it
Modified the build_debian.sh install libprotobuf-dev and libgrpc++ to support nokia ndk
Modified the sonic_debian_extension.j2 to install the grpcio and grpcio-tools in the host
Modified the docker-platform-monitor/Dockerfile.js to install grpcio and grpcio-tools for the pmon container.
#### How to verify it
Image running success.
- add a new service "mark_dhcp_packet" to mux container
- apply packet marks on a per-interface basis in ebtables
- write packet marks to "DHCP_PACKET_MARK" table in state_db
[mux] Update Service Install With SONiC Target
Recent PR grouped all SONiC service into sonic.taget. The install section
of mux.service was not update and this causes delays when using config
reload as the service failed state is not being reset.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
[mux] Start Mux on Only Dual-ToR Platform
mux docker depends on the presence of mux cable hardware and is
supposed to run only Gemini ToRs. This PR change the mux feature
config in order to enable mux docker based on device configuration.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.
Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y
Edit: linkmgrd PR will follow.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Related work items: #2315, #3146150
This pull request add plugin support library to bash.
And we will create a TACACS+ plugin for bash in an other PR, which will bring per command authorization feature to bash.
Why I did it
To support TACACS per command authorization, we check user command before execute it.
How I did it
Add plugin support to bash.
How to verify it
UT with CUnit under bash project cover all new code in plugin.c.
Also pass all current UT.
Which release branch to backport (provide reason below if selected)
N/A
Description for the changelog
Add plugin support to bash.
Depends on Azure/sonic-utilities#1626
Depends on Azure/sonic-swss#1754
QOS tables in config db used ABNF format i.e "[TABLE_NAME|name] to refer fieldvalue to other qos tables.
Example:
Config DB:
"Ethernet92|3": {
"scheduler": "[SCHEDULER|scheduler.1]",
"wred_profile": "[WRED_PROFILE|AZURE_LOSSLESS]"
},
"Ethernet0|0": {
"profile": "[BUFFER_PROFILE|ingress_lossy_profile]"
},
"Ethernet0": {
"dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE]",
"pfc_enable": "3,4",
"pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]",
"tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE]"
},
This format is not consistent with other DB schema followed in sonic.
And also this reference in DB is not required, This is taken care by YANG "leafref".
Removed this format from all platform files to consistent with other sonic db schema.
Example:
"Ethernet92|3": {
"scheduler": "scheduler.1",
"wred_profile": "AZURE_LOSSLESS"
},
Dependent pull requests:
#7752 - To modify platfrom files
#7281 - Yang model
Azure/sonic-utilities#1626 - DB migration
Azure/sonic-swss#1754 - swss change to remove ABNF format
As a part of warmboot, redis database is dumped:
c97fe546e5/scripts/fast-reboot (L269)
However, this dump file is deleted, after it is loaded back into db post reboot.
The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful.
Instead of deleting the dump, rename and keep the dump.