Why I did it
Need to be able to run smartctl when pmon docker is not running.
How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.
How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
[image]: Prevent radius passkey and snmp community string into syslog. (#9727)
#### Why I did it
Prevent radius passkey and snmp community string into syslog.
#### How I did it
Add radius and snmp config command to PASSWD_CMDS
#### How to verify it
Run and pass all UTs.
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
#### Description for the changelog
Add radius and snmp config command to PASSWD_CMDS to prevent radius passkey and snmp community string into syslog.
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
The existing log file size in sonic is 1 Mb. Over a period of time this leads to huge number of log files which becomes difficult for monitoring applications to handle.
Instead of large number of small files, the size of the log file is not set to 16 Mb which reduces the number of files over a period of time.
How I did it
Changed the size parameter and related macros in logrotate config for rsyslog
How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503
How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
- Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
- Add support for `Woodleaf` product
- Move `libsfp-eeprom.so` to a different `.deb` package
- Add new logrotate configuration for arista logs
- Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
- Initialize chassis cards in parallel
- Refactor of `get_change_event` to fix interrupts treated as presence change
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files
Submission containing materials of a third party:
Copyright Google LLC; Licensed under Apache 2.0
#### Why I did it
Adds P4RT container to SONiC for PINS
The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md
#### How I did it
Followed the pattern and templates used for other SONiC applications
#### How to verify it
Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.
#### Which release branch to backport (provide reason below if selected)
None
#### Description for the changelog
Build P4RT container for PINS
Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.Why I did it
Fix for issue #8389
How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.
Signed-off-by: mlok <marty.lok@nokia.com>
This pull request integrate audisp-tacplus to SONiC for per-command accounting.
#### Why I did it
To support TACACS per-command accounting, we integrate audisp-tacplus project to sonic.
#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC
#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add audisp-tacplus for per-command accounting.
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
Add bgpcfgd support to advertise routes.
How I did it
Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly.
How to verify it
Added unit tests in bgpcfgd and verify on KVM about route advertisement.
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
- Remove unused imports
- Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
Why I did it
Fix#9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'.
How I did it
All of platform specific gbsyncd dockers use a common name 'gbsyncd'
Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker
#### Why I did it
Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](https://github.com/Azure/SONiC/pull/818 )
Requires: https://github.com/Azure/sonic-utilities/pull/1796.
Merging in any order would be fine.
Summary of the changes:
- Added the YANG Models for the new tables introduces as a part of this feature.
- Enhanced init_cfg.json with the default config required
- Added a compile Time flag which enables/disables the config required for this feature inside the init_cfg.json
- Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes an proc exit notification for the critical processes running inside the docker.
This pull request add a bash plugin for TACACS+ per-command authorization
#### Why I did it
1. To support TACACS per command authorization, we check user command before execute it.
2. Fix libtacsupport.so can't parse tacplus_nss.conf correctly issue:
Support debug=on setting.
Support put server address and secret in same row.
3. Fix the parse_config_file method not reset server list before parse config file issue.
#### How I did it
The bash plugin will be called before every user command, and check user command with remote TACACS+ server for per-command authorization.
#### How to verify it
UT with CUnit cover all code in this plugin.
Also pass all current UT.
#### Which release branch to backport (provide reason below if selected)
N/A
#### Description for the changelog
Add Bash TACACS+ plugin.
#### A picture of a cute animal (not mandatory but encouraged)
* [ACL] enable ACL FC when genereting config from minigraph but disable by default
Why I did it
To support ACL counters on Flex Counter Infrastructure.
How I did it
Enable ACL FC in init_cfg and minigraph. Disable when genereting configuration from preset.
How to verify it
Together with depends PRs. Run ACL/Everflow test suite.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Allow mellanox platform to build and successfully switch packets in
Debian 11
Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches
Adjusted build system to support host system running bullseye and
dockers running buster.
1. Fix build for armhf and arm64
2. upgrade centec tsingma bsp support to 5.10 kernel
3. modify centec platform driver for linux 5.10
Co-authored-by: Shi Lei <shil@centecnetworks.com>
ISSU will likely be broken. As of right now, the issu-version file is
not being generated during build.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The common startup script used for SONiC containers is calling an inline
python command that uses Python 2 syntax, and thus errors out when run
with Python 3. Make this work with Python 3.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
There appears to be some network issue in the pipeline builds when
downloading packages from our mirror. Change the source to be from the
main debian repos to try to get around this issue.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Remove Python 2 package installation from the base image. For container
builds, reference Python 2 packages only if we're not building for
Bullseye.
For libyang, don't build Python 2 bindings at all, since they don't seem
to be used.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Python 2 is no longer available, so remove those packages, and remove
the pip2 commands. For picocom and systemd, just install from the
regular repo, since there's no backports yet.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.
- How I did it
Changed the way how to version in RPC and DBG images are constructed.
- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Enable gbsyncd support for cisco platforms
Signed-off-by: Sachin Naik sachnaik@cisco.com
Why I did it
To enable cisco gbsyncd container for cisco gearbox hardwares.
How I did it
Create symlink to gbsyncd.service.j2 to start gearbox systemd service.
How to verify it
Verify that the gbsyncd-cisco container started for x86_64-88_lc0_36fh_mo-r0 Line card
root@localhost:/home/cisco# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
50d309ea9967 docker-sonic-telemetry:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes telemetry
65cebc9e181b docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes mgmt-framework
5a9b510da24d docker-snmp:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes snmp
c291b0a1fc87 26195cc7c042 "/usr/bin/docker_ini…" 26 minutes ago Up 6 minutes dhcp_relay
d85aa5e6b78c docker-router-advertiser:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes radv
46c787329374 docker-lldp:latest "/usr/bin/docker-lld…" 28 minutes ago Up 6 minutes lldp
6643f53e4ceb docker-gbsyncd-cisco:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes gbsyncd-cisco
f05ae8af4aaa docker-syncd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes syncd
02e0e53b62cf docker-teamd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes teamd
fc7bc2dbb6a9 docker-orchagent:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes swss
5c5147c986c9 docker-fpm-frr:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes bgp
63b5ce3d4c80 docker-platform-monitor:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes pmon
7e6f34dca0e5 docker-database:latest "/usr/local/bin/dock…" 28 minutes ago Up 29 minutes database
Signed-off-by: Sachin Naik <sachnaik@cisco.com>
Co-authored-by: Sachin Naik <sachnaik@cisco.com>
This is due to the SERVICE variable declared after reading a file
#### Why I did it
To fix an issue that dhcp_relay does not restart with swss.
#### How I did it
Fixed in the swss.sh script
#### How to verify it
sudo systemctl restart swss
verify dhcp_relay restarts as well.
Add code to interfaces-config.sh to configure eth1 in multi-asic
containers so that they can access midplane subnet.
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
#### Why I did it
Nokia IXR7250E platform requires grpcio, grpcio-tools python library, and libprotobuf-dev, libgrpc++ library
#### How I did it
Modified the build_debian.sh install libprotobuf-dev and libgrpc++ to support nokia ndk
Modified the sonic_debian_extension.j2 to install the grpcio and grpcio-tools in the host
Modified the docker-platform-monitor/Dockerfile.js to install grpcio and grpcio-tools for the pmon container.
#### How to verify it
Image running success.
- add a new service "mark_dhcp_packet" to mux container
- apply packet marks on a per-interface basis in ebtables
- write packet marks to "DHCP_PACKET_MARK" table in state_db
[write_standby]: Ignore non-auto interfaces
* In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
[mux] Update Service Install With SONiC Target
Recent PR grouped all SONiC service into sonic.taget. The install section
of mux.service was not update and this causes delays when using config
reload as the service failed state is not being reset.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
[mux] Start Mux on Only Dual-ToR Platform
mux docker depends on the presence of mux cable hardware and is
supposed to run only Gemini ToRs. This PR change the mux feature
config in order to enable mux docker based on device configuration.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.
Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y
Edit: linkmgrd PR will follow.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Related work items: #2315, #3146150
copp-config service needs to be started after sonic.target so that it could
render the copp-config with the latest information.
It also needs to be restarted when config reload or load_minigraph is invoked.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This pull request add plugin support library to bash.
And we will create a TACACS+ plugin for bash in an other PR, which will bring per command authorization feature to bash.
Why I did it
To support TACACS per command authorization, we check user command before execute it.
How I did it
Add plugin support to bash.
How to verify it
UT with CUnit under bash project cover all new code in plugin.c.
Also pass all current UT.
Which release branch to backport (provide reason below if selected)
N/A
Description for the changelog
Add plugin support to bash.
Depends on Azure/sonic-utilities#1626
Depends on Azure/sonic-swss#1754
QOS tables in config db used ABNF format i.e "[TABLE_NAME|name] to refer fieldvalue to other qos tables.
Example:
Config DB:
"Ethernet92|3": {
"scheduler": "[SCHEDULER|scheduler.1]",
"wred_profile": "[WRED_PROFILE|AZURE_LOSSLESS]"
},
"Ethernet0|0": {
"profile": "[BUFFER_PROFILE|ingress_lossy_profile]"
},
"Ethernet0": {
"dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE]",
"pfc_enable": "3,4",
"pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]",
"tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE]"
},
This format is not consistent with other DB schema followed in sonic.
And also this reference in DB is not required, This is taken care by YANG "leafref".
Removed this format from all platform files to consistent with other sonic db schema.
Example:
"Ethernet92|3": {
"scheduler": "scheduler.1",
"wred_profile": "AZURE_LOSSLESS"
},
Dependent pull requests:
#7752 - To modify platfrom files
#7281 - Yang model
Azure/sonic-utilities#1626 - DB migration
Azure/sonic-swss#1754 - swss change to remove ABNF format
As a part of warmboot, redis database is dumped:
c97fe546e5/scripts/fast-reboot (L269)
However, this dump file is deleted, after it is loaded back into db post reboot.
The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful.
Instead of deleting the dump, rename and keep the dump.
Added logrotate file for wtmp and btmp to override default conf and set size cap as 100K as done in
PR: #865. For buster this is control by separate file wtmp and btmp.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
*Removed execute permissions from the systemd copp-config.service file.
Without this we will get a warning: "Configuration file /lib/systemd/system/copp-config.service is marked executable. Please remove executable permission bits. Proceeding anyway."
Why I did it
fstrim has dependency on pmon docker.
How I did it
start fstrim timer after sonic.target.
How to verify it
local test and PR test.
Signed-off-by: Ying Xie ying.xie@microsoft.com
The Lodoga platform also matched crow which was hardcoding the flash
size to 3700. This change enables autodetect on Clearlake which in turns
allows autodetect for Lodoga.
The threshold was bumped from 3700 to 4000 because size computation can
differ slightly and report slightly above 3700.
Lodoga actually has a 8GB storage device.
LodogaSsd variant has a 30GB SSD drive.
However, in boot0 both were mishandled and assigned 4GB for legacy reasons.
Remove the hardcoding of the flash size and let boot0 autodetect the available space.
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.
A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Enable fib_multipath_use_neigh for v4
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
Why I did:
This is helpful if the neighbor are not directly connected then Kernel forward to unreachable neighbor option. With this option forwarding using neighbor state to be valid.
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created
#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`
#### How to verify it
Manually test.
Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now.
Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Fix#7968
Issue is detected on SONiC.20201231.11
In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 180 0.1 0.0 6296 1272 pts/0 S 17:03 0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
In version 3.0.0, If a broadcast address is specified in
/etc/network/interfaces, then when ifup is run, it will fail with an
error saying `'str' object has no attribute 'packed'`. This appears to
be because it expects all attributes for an interface to be "packable"
into a compact binary representation. However, it doesn't actually
convert the broadcast address into an IPNetwork object (other addresses
are handled).
Therefore, convert the broadcast address it reads in from a str to an
IPNetwork object.
Also explicitly specify the scope of the loopback address in
/etc/network/interfaces as host scope. Otherwise, it will get added as
global scope by default. As part of this, use JSON to parse ip's output
instead of text, for robustness.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Use DOCKER_HOST. Every client including docker command and python docker API uses this environment variable to connect to dockerd.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
the branch refers the branch name that the commit is in,
for example master, 202012, 201911, ...
In case there is no branch, the name will be HEAD.
release is encoded in /etc/sonic/sonic_release file.
the file is only available for a release branch.
It is not available in master branch.
example for master branch
```
build_version: 'master.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: 'master'
release: 'none'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```
example for 202012 release branch
```
build_version: '202012.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: '202012'
release: '202012'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```
Signed-off-by: Guohan Lu <lguohan@gmail.com>
#### Why I did it
* `arp_update` fails to ping those neighbors over vlan sub interfaces.
#### How I did it
* modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned.
* modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
This change is to add a gbsyncd container to accommodate the syncd process and the SAI libraries for the Credo gearbox chips.
How I did it
This container works similar to the existing Broadcom syncd container. Its main difference is that the SAI-related dynamic libraries are replaced by the ones for Credo gearbox chips, and the container only reacts to SAI events for the gearbox chips. The SAI libraries will be provided by the package libsai-credo_1.0_amd64.deb.
For the image build, the added container will be built and included in the Broadcom platform image, after $(LIBSAI_CREDO)_URL = is replaced to the correct value. For now, as $(LIBSAI_CREDO)_URL is empty, the container build is skipped in the image build.
After the container is included in the image, in the runtime, the container will begin with checking the existence of /usr/share/sonic/hwsku/gearbox_config.json; if that file is not provided, the container will exit by itself. Therefore, for platforms unrelated to the Credo chips, as long as they are not providing the file, they will not be affected by this change.
This PR creates a directory firmware on the HOST with the path /usr/share/sonic/firmware, as well as this is
mounted on PMON container with the same path /usr/share/sonic/firmware. This is required for firmware
upgrade support for muxcable as currently by design all Y-Cable API's are called by xcvrd. As such if CLI has
to transfer a file to PMON we need to mount a directory from host to PMON just for getting the firmware files.
Hence we require this change.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Changes to allow starting per asic services like swss and syncd only if the platform vendor codedetects the asic is detected and notified. The systemd services ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@
There is also a requirement that management, telemetry, snmp dockers can start even if all asic services are not up.
Why I did it
For VOQ chassis, the fabric cards will have 1-N asics. Also, there could be multiple removable fabric cards. On the supervisor, swss and syncd containers need to be started only if the fabric-card is in Online state and respective asics are detected by the kernel. Using systemd, the dependent services can be in inactive state.
How I did it
Introduce a mechanism where all ASIC dependent service wait on its state to be published via PMON to REDIS. Once the subscription is received, the service proceeds to create respective dockers.
For fixed platforms, systemd is unchanged i.e. the service bring up and docker creation happens in the start()/ExecStartPre routine of the .sh scripts.
For VOQ chassis platform on supervisor, the service bringup skips docker creation in the start() routine, but does it in the wait()/ExecStart routine of the .sh scrips.
Management dockers are decoupled from ASIC docker creation.
- Why I did it
Make DHCP relay docker an extension. DHCP relay now carries dhcp relay commands CLI plugin and has a complete manifest.
It is installed as extension if INCLUDE_DHCP_REALY is set to y.
DEPENDS on #5939
- How I did it
Modify DHCP relay docker makefile and dockerfile. Make changes to sonic_debian_extension.j2 to install sonic packages.
I moved DHCP related CLI tests from sonic-utilities to DHCP relay docker.
This PR introduces a way to write a plugin as part of docker image and run the tests from cli-plugin-tests directory under docker directory.
The test result is available in target/docker-dhcp-relay.gz.log:
[ REASON ] : target/docker-dhcp-relay.gz does not exist NON-EXISTENT PREREQUISITES: docker-start target/docker-config-engine-buster.gz-load target/python-wheels/sonic_utilities-1.2-py3-none-any.whl-in
stall target/debs/buster/python3-swsscommon_1.0.0_amd64.deb-install
[ FLAGS FILE ] : []
[ FLAGS DEPENDS ] : []
[ FLAGS DIFF ] : []
============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /sonic/dockers/docker-dhcp-relay/cli-plugin-tests, inifile:
plugins: cov-2.6.0
collecting ... collected 10 items
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_plugin_registration PASSED [ 10%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_nonexist_vlanid PASSED [ 20%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_vlanid PASSED [ 30%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_ip PASSED [ 40%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_exist_ip PASSED [ 50%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_del_dhcp_relay_dest PASSED [ 60%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_nonexist_dhcp_relay_dest PASSED [ 70%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_dhcp_relay_dest_with_nonexist_vlanid PASSED [ 80%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_plugin_registration PASSED [ 90%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_dhcp_relay_column_output PASSED [100%]
=============================== warnings summary ===============================
/usr/local/lib/python3.7/dist-packages/tabulate.py:7
/usr/local/lib/python3.7/dist-packages/tabulate.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import namedtuple, Iterable
-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================== 10 passed, 1 warnings in 0.35 seconds =====================
#### Why I did it
I made this change to support warm/fast reboot for SONiC extension packages as per HLD Azure/SONiC#682.
#### How I did it
I extended manifest.json.j2 with new warm/fast reboot related fields and also extended sonic_debian_extension.j2 script template to generate the shutdown order files for warm and fast reboot.
After https://github.com/Azure/sonic-buildimage/pull/7598 the packages.json generation is broken. This change fixes it make the whole build fail in case generation failed.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Currently dhcp packets are disabled by the COPP manager for non ToRRouter type switches.
Even if the feature is enabled, DHCP packets wont hook to the CPU since the COPP manager will not trap this packets.
This change is to disable dhcp_relay by default for non ToRRouter switches from init_cfg.json.
With this approach, if the user want to enable the feature for non ToRRouter switches, manual enablement is required by the 'feature' configuration.
This is to keep the current approach for MSFT production issue with dhcp relay for non ToRRouter switched and allow the user to decide if to use it or not.
- How I did it
Configure dhcp_relay 'disabled' by default on init_cfg.json for non ToRRouter switches.
Remove the exclusion of dhcp packets on copp_cfg.json
- How to verify it
Enable dhcp_relay feature on a non ToRRouter switch.
Unit-tests modified so the default values on mocked CONFIG DB in 'test_vectors.py' for dhcp_relay will be 'disabled'.
This is by the change for 'init_cfg.json.j2'.
For ToRRouter the state will change from 'disabled' to 'enabled'.
Another test case added for a 'ToR' switch type, this is to test the state is 'enabled' if the user configured it to be so.
dash doesn't support += operation to append to a variable's value. Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead
The below error message is seen when a reboot is issued.
[ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found
The voq system lag id boundary is set in redis-chassis. Changes include
setting this from database-chassis container. This fixes a timing issue
in finding datbase_config.json file from redis directory which is
created from database container. Since database container usually
starts after database-chassis container the existence of this file is
unreliable while running the command. Running the command under
database-chassis container makes sure that the database_config.json form
redis-chassis directory is guaranteed to be available and hence fixes the
timing issue.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
Signed-off-by: Stepan Blyschak stepanb@mellanox.com
Why I did it
To support building DHCP relay as extension and installing it during build time.
How I did it
Created infrastructure. Users need to define their packages in rules/sonic-packages.mk
How to verify it
Together with #6531
Before this change, a process running inside every SONiC container dealt with FEATURE table 'auto_restart' field and depending on the value decided whether a container has to be killed or not.
If killed service auto restart mechanism restarts the container.
This change moves the logic from container to the host daemon - hostcfgd.
The 'auto_restart' handling is kept in supervisor-proc-exit-listener but now it is not required for container that wants to support auto restart feature.
hostcfgd refactoring - move feature handling in another class.
override systemd service Restart= setting from hostcfgd.
remove default systemd Restart=always.
Signed-off-by: Stepan Blyshchak stepanb@nvidia.com
- Why I did it
Remove the need to deal with container orchestration logic from the container itself. Leave this logic to the orchestrator - host OS.
- How I did it
hostcfgd configures 'Restart=' value for systemd service.
- How to verify it
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp enabled
root@r-tigon-11:/home/admin# show feature status | grep lldp
lldp enabled enabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 20 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 5 seconds lldp
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp disabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 35 seconds lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 3 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 39 seconds ago lldp
root@r-tigon-11:/home/admin#
#### Why I did it
- To build flashrom properly with dependency tracking.
#### How I did it
- Moved flashrom code from platform/broadcom/sonic-platform-modules-dell/tools directory to src/flashrom directory.
- At the end, flashrom_0.9.7_amd64.deb package is build which will be installed in the devices.
- Currently flashrom builds only for Dell S6100 platforms.
Introduce new sonic-buildimage images for Broadcom DNX ASIC family.
sonic-broadcom-dnx.bin
sonic-aboot-broadcom-dnx.swi
How I did it
NO CHANGE to existing make commands
make init; make configure PLATFORM=broadcom; make target/sonic-aboot-broadcom.swi; make target/sonic-broadcom.bin
The difference now is that it will result in new broadcom images for DNX asic family as well.
sonic-broadcom.bin, sonic-broadcom-dnx.bin
sonic-aboot-broadcom.swi, sonic-aboot-broadcom-dnx.swi
Note: This PR also adds support for Broadcom SAI 5.0 (based on 1.8 SAI ) for DNX based platform + changes in platform x86_64-arista_7280cr3_32p4 bcm config files and platform_env.conf files
#### Why I did it
Following the discussion in another PR https://github.com/Azure/sonic-buildimage/pull/7708#discussion_r642933510 , since there will be multi subfolders under **/var/log/mellanox**, so we agreed to only mount this folder and the subfolders will be created afterward on demand.
#### How I did it
during the syncd docker creation, only mount folder **/var/log/mellanox**
#### How to verify it
build an Mellanox image and verify the related folder on the host and docker side.
#### Why I did it
Create a target for delayed service timers. Few services in sonic have delayed to speed up the bring up of the system and essential services. However there is no way to track when they start. This will be a problem when executing config reload as config reload expects all services to be up. Hence grouped all the timers that trigger the delayed services under one target so that they could be tracked in 'config reload' command
#### How I did it
Created delay.target service and add created dependency on the delayed targets.