Commit Graph

959 Commits

Author SHA1 Message Date
mssonicbld
2339907cf1
[ci/build]: Upgrade SONiC package versions (#11339) 2022-07-06 04:43:34 +00:00
mssonicbld
51327e6125
[ci/build]: Upgrade SONiC package versions (#11258)
Upgrade SONiC Versions
2022-07-04 20:18:21 +08:00
Hua Liu
9b4387ace9
[swsscommon] Add c++ version sonic-db-cli from sonic-swss-common (#10825) (#11262)
Fix sonic-db-cli high CPU usage on SONiC startup issue: https://github.com/Azure/sonic-buildimage/issues/10218
    ETA of this issue will be 2022/05/31

    Re-write sonic-cli with c++ in sonic-swss-common: https://github.com/Azure/sonic-swss-common/pull/607
    Modify swss-common rules and slave.mk to install c++ version sonic-db-cli.

    Pass all E2E test scenario.

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111

    Build and install c++ version sonic-db-cli from swss-common.

<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/SONiC/wiki/Configuration.
-->
2022-06-30 09:34:13 +08:00
Sudharsan Dhamal Gopalarathnam
5188fdee0c [lldp]Fix lldp spawned after reboot when disabled (#11080)
- Why I did it
When LLDP is disabled through feature command, it gets spawned after reboot.

- How I did it
In syncd.sh check if the service is enabled before spawning automatically during cold reboot.

- How to verify it
Disable lldp feature. Perform cold reboot and verify its not spawned.
2022-06-22 19:50:03 -07:00
mssonicbld
8d72c484f8
[ci/build]: Upgrade SONiC package versions (#11114)
Co-authored-by: mssonicbld <vsts@fv-az131-194.obwncbgs1wzu1bhwgvhcl5zkeg.jx.internal.cloudapp.net>
2022-06-21 13:55:06 +08:00
shlomibitton
20deb7985a [Mellanox] [pmon] Fix for PMON service not starting when restarting SWSS service after fast/warm reboot (#10901)
- Why I did it
Recent change to delay PMON service in case of fast/warm reboot introduce an issue when restarting only SWSS service after fast/warm reboot for Nvidia platform.
Since the timer is triggered only when the system boot, in a scenario when the system is after a fast/warm reboot and the user restart SWSS service, as part of syncd.sh script, PMON service will stop but the timer will not start again.

- How I did it
On syncd.sh script, in case of fast/warm indication, check if pmon.timer is running.
If it is running it means we are at the first boot and continue normally.
If it is not running, meaning the service was restarted, start the timer to keep the system behavior consistent.

- How to verify it
Run fast/warm reboot.
service swss restart.
Observe PMON service starting.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2022-06-20 08:21:10 -07:00
mssonicbld
94e8be646a
[ci/build]: Upgrade SONiC package versions (#10973)
Co-authored-by: mssonicbld <vsts@fv-az48-122.y11my21s2nfuzmiq0sccgy5und.cx.internal.cloudapp.net>
[ci/build]: Upgrade SONiC package versions (#10973)
2022-06-07 14:11:41 +08:00
mssonicbld
bb0c71246d
[ci/build]: Upgrade SONiC package versions (#10906) 2022-05-23 21:40:47 +00:00
Marty Y. Lok
b1c3ab73ca [VoQ][config] Multiasic Supervisor card fails to load config_db#.json in chassis when system is reboot (#10106)
Supervisor card fails to load config_db#.json in chassis when system reboot. 
This is an intermittent issue, fixes #10105
2022-05-15 23:11:01 -07:00
mssonicbld
a4283019cd
[ci/build]: Upgrade SONiC package versions (#10724) 2022-05-08 23:13:04 +00:00
Junchao-Mellanox
4dabc46d82 Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:

Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.

How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.

How to verify it
Manual test.
2022-05-07 23:17:07 -07:00
shlomibitton
d3d6d0fb52 [Fastboot] Delay PMON service for better fastboot performance (#10567)
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
2022-05-07 23:16:41 -07:00
shlomibitton
94f271c667 [Fastboot] Delay LLDP service for better fastboot performance (#10568)
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
2022-05-01 23:16:18 -07:00
Saikrishna Arcot
f1ec7107cb Remove SSH host keys after installing the custom version of sshd (#10633)
* Remove SSH host keys after installing the custom version of sshd

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Use an override for for sshd instead of overwriting the service file

Don't overwrite upstream's .service file, and instead use an override
file for making sure the host key(s) are generated.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-05-01 23:16:14 -07:00
mssonicbld
e2a2b30676
[ci/build]: Upgrade SONiC package versions (#10722) 2022-05-01 22:40:25 +00:00
mssonicbld
ff48ad4e9b
[ci/build]: Upgrade SONiC package versions (#10658) 2022-04-29 22:37:55 +00:00
mssonicbld
b7d77c7193
[ci/build]: Upgrade SONiC package versions (#10653) 2022-04-23 00:43:12 +00:00
Samuel Angebault
df8eaa0544 [Arista] Fix arista-net initramfs hook
The interface renaming logic fails if one interface is missing.
Because of the `set -e` the whole initramfs hook would abort early on
error.
This change fixes the current behavior to make sure missing interfaces
are properly skipped and ensure existing interface are renamed.
2022-04-20 10:04:21 -07:00
Samuel Angebault
eaf9a0bde8 [Arista] rename management interface in initrd (#9856)
On some products the pci enumeration adds randomness into which nic gets
initialized first.
Because SONiC doesn't use deterministic interface naming but instead old
style interface naming, this leads to eth0 not always being the
management port.
To make sure eth0 is always the management port (SONiC expectation)
rename the interfaces in the initramfs for Arista products.
2022-04-20 10:04:21 -07:00
mssonicbld
ae6caab040
[ci/build]: Upgrade SONiC package versions (#10521)
Upgrade SONiC Versions
2022-04-20 12:19:56 +08:00
Vivek R
4902c26bd8 [interfaces-config] "main exception: cannot find interfaces: eth0" error log avoided (#10463)
- Why I did it
Fixes #9628
During bootup, this error log is seen

Dec 22 04:26:29 sonic interfaces-config.sh[2546]: error: main exception: cannot find interfaces: eth0 (interface was probably never up ?)
This is of non-functional nature and doesn't affect the flow.

- How I did it
Dont take the ifdown if not needed

- How to verify it
Verified during reboot. Log did not appear and IP was acquired on eth0 as expected

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2022-04-10 23:17:30 -07:00
mssonicbld
a90ca48e1e
[ci/build]: Upgrade SONiC package versions (#10476)
Why I did it
[ci/build]: Upgrade SONiC package versions
2022-04-08 07:28:32 +08:00
xumia
7ddad86a15
[Build]: Fix armhf 202111 build broken issue (#10423)
* [Marvell] Update armhf SAI deb version 1.9.1 (#9865)

Move marvell armhf SAI deb to 1.9.1 to address build failures.

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>

* [Marvell] Update armhf driver/sai deb version (#10126)

Fixed Marvell SAI deb version naming issue reported in Marvell-switching/sonic-marvell-binaries#62

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>

* [Build]: only install grpc in amd64 (#10212)

[Build]: only install grpc in amd64
Unblock marvell-armhf build.

Co-authored-by: Rajkumar-Marvell <54936542+rajkumar38@users.noreply.github.com>

Why I did it
Cherry-pick commits from master to 202111 to fix build broken issue.
See detail in the commits.
2022-04-01 22:25:11 +08:00
Samuel Angebault
e10504592a
[202111][Arista] Update driver submodules (#9495)
cherry-pick of #9393 for 202111

 - Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
 - Add support for `Woodleaf` product
 - Move `libsfp-eeprom.so` to a different `.deb` package
 - Add new logrotate configuration for arista logs
 - Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
 - Initialize chassis cards in parallel
 - Refactor of `get_change_event` to fix interrupts treated as presence change
2022-03-23 21:19:13 +05:30
xumia
d5354df3d8 [Build]: Fix armhf mirrors not existing issue (#10312)
Why I did it
[Build]: Fix armhf mirrors not existing issue
The mirror endpoint debian-archive.trafficmanager.net does not support armhf, change to use deb.debian.org and security.debian.org.
2022-03-22 07:28:21 +00:00
xumia
4fccda4ab9 [Build]: Use one debian mirror config (#10274)
Why I did it
Use one debian mirror config.
The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), it does not make sense.
All the content in files/image_config/apt is no use, any one wants to add mirror config, please add in files/apt.

How I did it
Remove files/image_config/apt and the reference.
2022-03-21 08:56:24 +00:00
xumia
cab6ac6e19 [Build]: Fix /proc not mounted issue (#10164)
[Build]: Fix /proc not mounted issue
2022-03-20 15:26:27 -07:00
Stepan Blyshchak
f506751d28 [teamd.sh] kill teamd docker on warm shutdown for faster shutdown (#10219)
This can save 6 sec for teamd LAG restoration - the time between:

```
Mar  9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar  9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```

- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.

- How I did it
Kill teamd docker after graceful shutdown of teamd processes.

- How to verify it
Run warm reboot.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-03-20 15:26:23 -07:00
xumia
9999b9cf63 [build]: Fix marvell-armhf build hung issue (#10156) (#10229)
Why I did it
The marvel-armhf build is hung, it does not exit after waiting for a long time.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
2022-03-20 15:26:09 -07:00
Saikrishna Arcot
26079ac8f9 Specify the filesystem type when mounting to /host (#10169)
When mounting the partition that contains `/host` during initramfs, the
mount binary available there (coming from busybox) tries each filesystem
in `/proc/filesystems` and sees which one succeeds. During this time,
there may be some error messages logged into dmesg because some of the
incorrect filesystems failed to mount the partition.

Specify the filesystem type explicitly so that initramfs knows it's that
type, and we know what filesystem will always get used there.

Fixes #9998

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-20 15:26:06 -07:00
Stepan Blyshchak
cec403eec2 [hostcfgd] record feature state in STATE DB (#9842)
- Why I did it
To implement blocking feature state change.

- How I did it
Record the actual feature state in STATE DB from hostcfg.

- How to verify it
UT + verification by running on the switch and checking STATE DB.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-03-20 15:26:02 -07:00
Marty Y. Lok
76ee6448b5 [chassis][supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9042 (#9043)
Why I did it
Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes #9042

How I did it
Added database-chassis to the always running docker list if platform is supervisor card.

How to verify it
Execute the CLI command "sudo monit status container_checker"


Signed-off-by: mlok <marty.lok@nokia.com>
2022-03-07 09:21:32 -08:00
wenyiz2021
043fcfd099 Update container_checker for multi-asic devices when state is 'always_enabled' (#10067)
* Update container_checker for multi-asic devices 

Update container_checker for multi-asic devices to add database containers in always_running_containers. 
Previous change was made for single-asic, and that database containers were not considered as feature when writing to state_db.

* Update container_checker

Update an indent
2022-03-07 09:21:28 -08:00
Aravind Mani
0ab166a823 [sonic-cfggen]: Fix sonic-cfggen build failures for armhf (#10132)
Why I did it
amrhf build fails while building sonic-config-engine whl package
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9

The reason for the failure is due to the fact that there is a new line generated at the top of the file in buffer config test cases while building for broadcom based platform and this issue is not seen in Marvell based platforms.

How I did it
Removed the new line for all the buffer test cases as there is no need to add it and accordingly changed the buffer_config.j2 where the new line is generated.
2022-03-07 09:21:17 -08:00
Stepan Blyshchak
e5e1b70966 [rsyslog.j2] fix typo in VAR_LOG_SIZE_KB (#9954)
This issue causes negative threshold value and thus deleting log files even when there is enough space.

This issue causes negative threshold value and thus deleting log files even when there is enough space.

- Why I did it
To fix an issue when log files get deleted even if there is enough space.

- How I did it
Fixed an typo.

- How to verify it
Run the portion of the script that calculates threshold, see that the threshold is calculated correctly.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-02-21 23:48:19 -08:00
Dror Prital
30b69ed621
[202111] Add sonic_release file in order to have "release 202111" on sonic_version.yml (#9985)
- Why I did it
Fix Issue 9972: Incorrect information about release version in sonic_version.yml

- How I did it
Add "sonic_release" file to /sonic-buildimage/files/image_config/

- How to verify it
Install the image and run: cat /etc/sonic/sonic_version.yml
Verify the following item on sonic_version.yml file: release: '202111'
2022-02-15 10:46:42 +02:00
byu343
9b5bb887c8 Support multi-asic on macsec container (#9921)
This change enables the support of running multiple macsec containers, each for one ASIC.
2022-02-13 22:46:33 -08:00
Prince George
fa0de261aa Close console session due to user inactivity (#9890)
Signed-off-by: Prince George <prgeor@microsoft.com>
2022-02-13 18:00:54 -08:00
Alexander Allen
6b6b40c046 [pmon] Move smartctl from pmon to host (#9607)
Why I did it
Need to be able to run smartctl when pmon docker is not running.

How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.

How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
2022-02-13 17:54:58 -08:00
Alexander Allen
e8418fd2da [Mellanox] Modified Platform API to support all firmware updates in single boot (#9608)
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.

How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order with prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
2022-01-30 22:48:54 -08:00
dflynn-Nokia
e94ef351f4 [firsttime boot] suppress error message on platforms not supporting kdump (#9521)
Why I did it
Eliminate benign firsttime boot error reported when running on platforms that do not support kdump.

How I did it
Change rc.local to check for presence of the file /etc/default/kdump-tools before referencing it.

How to verify it
Install a new image on an armhf or arm64 platform and check for a failed reference to /etc/default/kdump-tools on firsttime boot.
2022-01-22 22:42:30 -08:00
Shyam
dad9a73004 Added gbsyncd infra for multi-ASIC, multi-PHY mode (#9722)
- External PHY is managed via gearbox (gbsybcd docker container) in SONiC
  - Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC
  - Enhanced gbsyncd docker container from single Namespace to multi-Namspace mode
  - Added gbsyncd.service.j2 on per_namespace basis.
  - Each namepace/ASIC now to have its unique gbsyncd<ASIC#> docker container with its
    own Gearbox table, redis-DB

Signed-off-by: Shyam Kumar <shyakuma@cisco.com>
2022-01-22 22:42:26 -08:00
Sudharsan Dhamal Gopalarathnam
1524d6569d [rsyslog]Setting log file size to 16Mb (#9504)
Why I did it
The existing log file size in sonic is 1 Mb. Over a period of time this leads to huge number of log files which becomes difficult for monitoring applications to handle.
Instead of large number of small files, the size of the log file is not set to 16 Mb which reduces the number of files over a period of time.

How I did it
Changed the size parameter and related macros in logrotate config for rsyslog

How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.

Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
2022-01-16 22:45:08 -08:00
xumia
6fd332791e
[Build]: Add debian host base image version to support the reproducible build (#9672)
[Build]: Add debian host base image version to support the reproducible build
2022-01-05 17:27:09 +08:00
Marty Y. Lok
4783268902 [multiasic][database]database.sh failed to create the database for namespace (#9502)
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503

How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
2021-12-26 20:55:17 -08:00
Qi Luo
c2b60bda71 Revert "CRM init config for SRV6 Nexthop and MY_SID resource (#9238)" (#9506)
This reverts commit 8187d473af.
2021-12-26 20:54:56 -08:00
Brian O'Connor
6bffcb9e71 [PINS] Build P4RT container for PINS (#9083)
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files

Submission containing materials of a third party:
    Copyright Google LLC; Licensed under Apache 2.0

#### Why I did it

Adds P4RT container to SONiC for PINS

The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md

#### How I did it

Followed the pattern and templates used for other SONiC applications

#### How to verify it

Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.

#### Which release branch to backport (provide reason below if selected)

None

#### Description for the changelog

Build P4RT container for PINS
2021-12-08 20:59:23 +00:00
Marty Y. Lok
cb4c66ae98
[chassis][multiasic] fixed rsyslogd FATAL issue in the database container in multi-asic box (#8390)
Why I did it
Fix for issue #8389

How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.Why I did it
Fix for issue #8389

How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.

Signed-off-by: mlok <marty.lok@nokia.com>
2021-12-01 07:16:49 -08:00
liuh-80
739c45645c
[TACACS+] Add audisp-tacplus for per-command accounting. (#8750)
This pull request integrate audisp-tacplus to SONiC for per-command accounting.

#### Why I did it
To support TACACS per-command accounting, we integrate audisp-tacplus project to sonic.

#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC

#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.

#### Which release branch to backport (provide reason below if selected)
N/A

#### Description for the changelog
Add audisp-tacplus for per-command accounting.

#### A picture of a cute animal (not mandatory but encouraged)
2021-12-01 11:50:09 +08:00
noaOrMlnx
0908f9ec49
[CoPP] Add always_enabled field (#9302)
*Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.
2021-11-30 11:04:15 -08:00