Commit Graph

804 Commits

Author SHA1 Message Date
mssonicbld
839527def8
[ci/build]: Upgrade SONiC package versions (#7659) 2021-05-20 14:05:56 +00:00
Sujin Kang
d1043e3c91 add config-setup.service as dependency for pcie-check.service (#7599)
Why I did it
start pcie-check.service after config-setup.service since pcie_util depends on device_info which is available with config db metadata.

How I did it
Add config-setup.service as a dependency of pcie-check.service

How to verify it
Upon reboot, check if the pcie-check.sh throws the platform api error which is dependent on DEVICE_METADATA
2021-05-19 18:14:19 +00:00
mssonicbld
1461c9f98f
[ci/build]: Upgrade SONiC package versions (#7613) 2021-05-19 14:02:10 +00:00
gechiang
3eaadec21c
[202012] BRCM SAI 4.3.3.5-2 picked up BRCM hsdk-6.5.21-p1.patch and CS00012097141 fix (#7581)
This is to pick up BRCM SAI 4.3.3.5-2 which contains 2 main changes:
1.  hsdk-6.5.21-p1.patch (to address some field problems related to SDK issue.
2. Fix for CS00012097141 (remove some unnecessary debug setting that was causing non functional impacting problem at boot time)

Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on S6100 DUT and all passed:
```
     ipfwd/test_dir_bcast.py
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     decap/test_decap.py
     fdb/test_fdb.py
```
2021-05-12 07:56:46 -07:00
mssonicbld
c8ee27319b
[ci/build]: Upgrade SONiC package versions (#7589) 2021-05-12 14:11:06 +00:00
mssonicbld
aac5c04d91
[ci/build]: Upgrade SONiC package versions (#7583) 2021-05-12 07:19:54 +00:00
Qi Luo
262d0948f1
[build]: Update the netifaces pip3 package version (#7574)
202012 branch is using reproducible build. The versions-py3 file must contains correct pip3 package version. Otherwise we will get a build error

```
Step 31/74 : RUN pip3 install          scapy==2.4.4          pyroute2==0.5.14          netifaces==0.10.9
 ---> Running in d7a2401dd21d
Collecting scapy==2.4.4
  Downloading scapy-2.4.4.tar.gz (1.0 MB)
Collecting pyroute2==0.5.14
  Downloading pyroute2-0.5.14.tar.gz (873 kB)
ERROR: Cannot install netifaces==0.10.9 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested netifaces==0.10.9
    The user requested (constraint) netifaces==0.10.7

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
The command '/bin/sh -c pip3 install          scapy==2.4.4          pyroute2==0.5.14          netifaces==0.10.9' returned a non-zero code: 1
[  FAIL LOG END  ] [ target/docker-sonic-vs.gz ]
```
2021-05-11 09:19:52 -07:00
mssonicbld
a3980f948a
[ci/build]: Upgrade SONiC package versions (#7576) 2021-05-11 14:35:57 +00:00
Renuka Manavalan
99958304c1 [container_checker] Use Feature table to get running containers (#7474)
Why I did it
Finding running containers through "docker ps" breaks when kubernetes deploys container, as the names are mangled.

How I did it
The data is is available from FEATURE table, which takes care of kubernetes deployment too.

How to verify it
Deploy a feature via kubernetes and don't expect error from container_check.
2021-05-10 15:59:57 -07:00
mssonicbld
b95f86722c
[ci/build]: Upgrade SONiC package versions (#7540) 2021-05-09 04:44:39 +00:00
mssonicbld
b0460c7ce1
[ci/build]: Upgrade SONiC package versions (#7480)
Co-authored-by: mssonicbld <vsts@fv-az196-264.g5lmldmxpfterdw5iojffagm3c.gx.internal.cloudapp.net>
2021-05-06 06:50:02 +08:00
Nazarii Hnydyn
0e970582c1
[swss_vars]: Add 'resource_type' attribute. (#7188)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-05-03 10:38:11 -07:00
guxianghong
a0fde3a626 [arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-05-02 08:11:56 -07:00
Samuel Angebault
30cc959787 [Arista] Fix dockerd issue on Arista platforms (#7376)
Why I did it
Recent systemd upgrade from #7228 requires an extra cmdline parameter for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.

How I did it
Just added the missing cmdline parameter in files/Aboot/boot0.j2
This change fixes #7372

How to verify it
Boot the image and dockerd should start normally.
2021-05-01 19:43:51 -07:00
Stepan Blyshchak
ae574ab000 [systemd] disable default systemd udev rules for interfaces (#7369)
Fix #7364

99-default.link - was always in SONiC, but previous systemd (<247) had an issue and it did not work due to issue systemd/systemd#3374. Now systemd 247 works.

However, such policy overrides teamd provided mac address which causes teamd netdev to use a random mac
address. Therefore, needs to be disabled.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-05-01 19:43:41 -07:00
xumia
1b05982727 Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh

How I did it
Check if the command starting with "show", and verify only contains single command in script.
2021-04-29 10:08:55 -07:00
mssonicbld
f7adf9c180
[ci/build]: Upgrade SONiC package versions (#7415) 2021-04-24 15:17:07 +00:00
Kuanyu Chen
66dedf38c2 [config-setup]: Fix a bug in checking if updategraph is enabled (#7093)
Encounter error during "config-setup boot" if the updategraph is enabled.

How I did it
Correct the code inside the config-setup script.
Remove the space between the assignment operator.

How to verify it
Remove the /etc/sonic/config_db.json and reboot the device.
Originally, it will return following error after boot up.
rv: command not found
After modification, it can correctly parse the status of updategraph without error.
2021-04-21 13:58:03 -07:00
yozhao101
c63b59698c [container_checker] Exclude the 'always_disabled' container from expected running container list (#7217)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Since we introduced a new value always_disabled for the state field in FEATURE table, the expected running container list
should exclude the always_diabled containers. This bug was found by nightly test and posted at here: issue. This PR fixes #7210.

How I did it
I added a logic condition to decide whether the value of state field of a container was always_disabled or not.

How to verify it
I verified this on the device str-dx010-acs-1.

Which release branch to backport (provide reason below if selected)
 201811
 201911
 202006
[ x] 202012
2021-04-02 11:52:35 -07:00
mssonicbld
505db8e91a
[ci/build]: Upgrade SONiC package versions (#7042)
Co-authored-by: mssonicbld <vsts@fv-az80-884.nqsemdo0cabejmrqkclmmohwag.dx.internal.cloudapp.net>
2021-03-29 08:07:44 -07:00
Tamer Ahmed
6fb1600234 [hostcfgd]: Add Ability To Configure Feature During Run-time (#6700)
Features may be enabled/disabled for the same topology based on run-time
configuration. This PR adds the ability to enable/disable feature based
on config db data.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-03-15 19:09:31 -07:00
arlakshm
cc6e521b40 [baseimage] add ipintutil in sudoer file (#6845)
show ip interfaces is enhanced recently to support multi ASIC platforms in this PR- https://github.com/Azure/sonic-utilities/pull/1396 .
The ipintutil script as to run as sudo user, to get the ip interface from each namespace.
Add this script to the sudoer file so that show ip interface command is available for user with read-only permissions

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-03-13 23:29:24 -08:00
mssonicbld
77477857b4
Update SONiC version files (#6996)
Co-authored-by: mssonicbld <vsts@fv-az113-375.lunlmptkugju1kgiw3yhqmpbea.bx.internal.cloudapp.net>
2021-03-12 10:01:12 +08:00
Renuka Manavalan
a4d81f3c19 Copy dummy flannel.conf to get around absence of CNI Network (#6985)
Why I did it
We skip install of CNI plugin, as we don't need. But this leaves node in "not ready" state, upon joining master.
To fix, we copy this dummy .conf file in /etc/cni/net.d

How I did it
Keep this file in /usr/share/sonic/templates and copy to /etc/cni/net.d upon joining k8s master.

How to verify it
Upon configuring master-IP and enable join, watch node join and move to ready state.
You may verify using kubectl get nodes command
2021-03-10 09:32:49 -08:00
mssonicbld
0830738503
Update SONiC version files (#6972)
Co-authored-by: mssonicbld <vsts@fv-az131-135.jj2e24u0tnvezfdztknplege1f.xx.internal.cloudapp.net>
2021-03-08 14:55:35 -08:00
mssonicbld
57085e4a6a
Update SONiC version files (#6963)
Co-authored-by: mssonicbld <vsts@fv-az124-394.1jx3ho342nguppyzzg0wtvoj2f.bx.internal.cloudapp.net>
2021-03-05 13:45:16 +08:00
yozhao101
7748597fa2 [Supervisord] Deduplicate the alerting messages of critical processes from Supervisord. (#6849)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
In the configuration of rsyslog, duplicate messages will be suppressed and reported in the format of message repeated n times.
Due to this behavior, if a critical process in a container exited unexpectedly, the alerting message will be written into syslog once
and not be written into syslog anymore until the second critical process exited. This PR aims to differentiate these alerting messages such that they will not be suppressed by rsyslogd and can appear in the syslog periodically.

How I did it
This PR adds a counter into the alerting message and shows how many minutes a critical process was not running.

How to verify it
I verified and test this implementation on a physical DUT.
2021-03-04 21:23:05 +00:00
Sujin Kang
15aed52ef2 [pcie.yaml] Move pcie configuration file path to platform directory (#6475)
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437

- How I did it

Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
2021-03-04 21:23:05 +00:00
Stepan Blyshchak
7fb5a72d23 [services] introduce sonic.target (#5705)
- Why I did it
Group all SONiC services together and able to manage them together. Will be used in config reload command as much simpler and generic way to restart services.

- How I did it
Add services to sonic.target

- How to verify it
Together with Azure/sonic-utilities#1199
config reload -y

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-03-04 21:23:05 +00:00
mssonicbld
193fc248a0
Update SONiC version files (#6927)
Co-authored-by: mssonicbld <vsts@fv-az95-714.mvvw4rc1ki0utgz3kiduxkzutd.ex.internal.cloudapp.net>
2021-03-03 19:05:52 +08:00
dflynn-Nokia
e3ab6b0494 [armhf build] Fix azure-storage dependency on cryptography package (#6780)
Fix marvell-armhf build break

The azure-storage package depends on the cryptography package. Newer
versions of cryptography require the rust compiler, the correct version
for which is not readily available in buster. Hence we pre-install an
older version here to satisfy the azure-storage dependency.
Note: This is not a problem for other architectures as pre-built versions
of cryptography are available for those. This sequence can be removed
after upgrading to debian bullseye.
2021-03-01 09:40:00 -08:00
Samuel Angebault
a654518968
[Arista] Driver and platform update (#6468) (#6872)
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
 - Add support for more variants of `DCS-7280CR3-32[PD]4`
 - Add Supervisor to Linecard consutil support
 - Complete Watchdog platform API support
 - Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
 - Fix SEU management on `DCS-7060CX-32S`
 - Allow kernel modules to build up to linux 5.10
 - Rename led color `orange` to `amber`
 - Miscellaneous fixes
2021-02-24 10:09:52 -08:00
SuvarnaMeenakshi
b6aaeb979e [multi_asic][vs]: Add dependency in teamd service to start after topology service(#6594)
[multi_asic][vs]: Add dependency in teamd service to start after topology service.
- Why I did it
In multi-asic VS, topology service is run after database service to set up the internal asic topology.
swss and syncd have a dependency to start after topology service is run so that the interfaces are moved to right namespace and created in the right namespace. In case of multi-asic vs, during the initial boot up, when there is no configuration added, teamd service starts and swss/syncd do not start as topology service does not start. Upon loading configuration using config_db or minigraph, swss and sycnd start up , but teamd is not restarted as swss is not stopped and started. This causes teamd to be in a bad state and requires a reload of config.

- How I did it
Add dependency in teamd service to start after topology service is completed.

- How to verify it
No change in single asic vs or platform.
No change in multi-asic regular image.
Change only in multi-asic VS. Bring up a multi-asic VS image without any configration, teamd service will fail to start due to dependency failure. Load minigraph, start topology service, load configuration, ensure all services come up.
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
2021-02-23 23:56:01 +00:00
arlakshm
d7be5a021a [Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
This PR has the changes to support having different swss.rec and sairedis.rec for each asic.
The logrotate script is updated as well

- How I did it

Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747)
In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec

Update the logrotate script for multiasic platform .
2021-02-23 23:56:01 +00:00
Joe LeVeque
d7517a704c [PDDF] Build and install Python 3 package (#6286)
- Make PDDF code compliant with both Python 2 and Python 3
- Align code with PEP8 standards using autopep8
- Build and install both Python 2 and Python 3 PDDF packages
2021-02-23 23:56:01 +00:00
xumia
7c00106817
Add the version files for the branch 202012 (#6836) 2021-02-23 10:36:32 +08:00
xumia
fa7d636226 Add mirrors for reproducible build (#6813) 2021-02-19 11:51:34 -08:00
shlomibitton
6361d36fb2 Stop teamd service before syncd (#6755)
- What I did
All SWSS dependent services should stop before SWSS service to avoid future possible issues.
For example 'teamd' service will stop before to allow the driver unload netdev gracefully.
This is to stop all LAG's before restarting syncd service when running 'config reload' command.

- How I did it
Change the order of dependent services of SWSS.

- How to verify it
Run 'config reload' command.
Previously the operation failed when a large number of PortChannel configured on the system.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-02-16 15:33:10 -08:00
Lawrence Lee
e0efbc1e14 [swss]: Clear MUX-related state DB tables on start (#6759)
* Add *MUX_CABLE_TABLE* to set of tables to clear on SWSS start, which
will clear HW_MUX_CABLE_TABLE and MUX_CABLE_TABLE
* Order swss to start before pmon to ensure that DBs are cleared before
xcvrd (running inside pmon) starts and re-populates the tables

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-02-16 15:33:03 -08:00
Lior Avramov
80b048d4d1 [systemd] Increase syncd startup script timeout to support FW upgrade on init. (#6709)
**- Why I did it**
To support FW upgrade on init.

**- How I did it**
Change timeout value

**- How to verify it**
I manually changed ASIC and Gearbox FW followed by hard reset in order for FW upgrade to take place on init.

Signed-off-by: liora <liora@nvidia.com>
2021-02-16 15:31:10 -08:00
judyjoseph
0c17839908 [teamd]: Increase wait timeout for teamd docker stop to clean Port channels. (#6537)
The Portchannels were not getting cleaned up as the cleanup activity was taking more than 10 secs which is default docker timeout after which a SIGKILL will be send.
Fixes #6199
To check if it works out for this issue in 201911 ? #6503

This issue is significantly seen in master branch compared to 201911 because the Portchannel cleanup takes more time in master. Test on a DUT with 8 Port Channels.

master

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m15.599s
    user    0m0.061s
    sys     0m0.038s
Sonic 201911.v58

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m5.541s
    user    0m0.020s
    sys     0m0.028s
2021-02-05 16:22:28 -08:00
Joe LeVeque
57a6fb9f39 [pcie-check] Update underlying pcieutil command and add to sudoers file (#6682)
- Why I did it

As of Azure/sonic-utilities#1297, subcommands of pcieutil have changed to remove the redundant pcie- prefix. This PR adapts calling applications (pcie-check) to the new syntax.

Resolves #6676

- How I did it

Remove pcie- prefix from pcieutil subcommands in calling applications
Also add pcieutil * to sudoers file, as pcieutil requires elevated permissions
2021-02-05 15:47:58 -08:00
Guohan Lu
bab136fc8f [proc-exit-listener]: fix syntax error
the bug is introduced in commit 34cca20c

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-03 10:46:07 -08:00
arlakshm
24d785a64d [baseimage]: add docker ps to the sudoer file (#6604)
fixes Azure/sonic-utilities#1389

With the recent changes in sudoer files. The  show commands fails for the read-only users. 
The problem here is the 'docker ps' is failing in the function [get_routing_stack()](8a1109ed30/show/main.py (L54)) therefore all the CLI commands are failing.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-02-03 10:39:00 -08:00
arlakshm
197f75a246 [multi asic] add ip netns identify command to sudoer (#6591)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

- Why I did it
The command sudo ip netns identify <pid> is used in function get_current_namespace
to check in the cli command is running in host context or within a namespace.

This function is used for every CLI command and command sudo ip netns identify <pid> needs to be added in sudoer files to allow users with RO access to run show cli commands

This problem is not there on single asic platforms.

- How I did it
Add ip netns identify [0-9]* to sudoers file.
2021-02-03 10:38:24 -08:00
Guohan Lu
f00bb52f7c [proc-exit-listener]: ignore blank lines
make proc-exit-listener more rebust

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:28:52 -08:00
yozhao101
cc9c3f567e [supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-28 09:28:27 -08:00
Qi Luo
c5b7370a8f [baseimage]: Cleanup sudoers file (#6518) 2021-01-21 08:41:23 -08:00
Ying Xie
a1951ea198 [warm boot finalizer] only wait for enabled components to reconcile (#6454)
* [warm boot finalizer] only wait for enabled components to reconcile

Define the component with its associated service. Only wait for components that have associated service enabled to reconcile during warm reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-01-15 08:20:28 -08:00
yozhao101
bfec282a82 [Monit] Monitoring the running status of containers. (#6251)
**- Why I did it**
This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. 

**- How I did it**
We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog.

**- How to verify it**
I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-01-09 08:27:53 -08:00