Commit Graph

4343 Commits

Author SHA1 Message Date
Mahesh Maddikayala
f6b842edd3
[BCMSAI] Update BCMSAI debian to 4.3.0.10 with 6.5.21 SDK, and opennsl module to 6.5.21 (#6526)
BCMSAI 4.3.0.10, 6.5.21 SDK release with enhancements and fixes for vxlan, TD3 MMU, TD4-X9 EA support, etc.
2021-01-28 08:38:47 -08:00
Qi Luo
0e7287856c
[build]: stop prompt during build (#6585)
Some commands used during build will prompt user interactively, but this is not expected during build. Since most output is collected into log file, user could not see the prompt and feel the build process hangs.

- How I did it

Use mv command in non interactive mode
Redirect stdin to null if command output is collected into log file.
2021-01-28 02:21:38 -08:00
Guohan Lu
fb0b999361 [ci]: append job.attempt in memdump/log artifacts
azure pipepline does not allow upload same artifacts again.
thus, use job.attempt to uniquely name the test artifacts

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 01:05:44 -08:00
Vaibhav Hemant Dixit
98298f7f7f
[sonic-sairedis] advance submodule to include fix for syncd crash during shutdown (#6581)
Remove unregisterMessageHandler from NetMsgRegistrar thread (#779)
2021-01-27 21:55:18 -08:00
lguohan
7d01613300
[ci]: correct ownership of artifacts (#6582)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 20:57:17 -08:00
Qi Luo
69c5832153
[ci]: Download artifact instead of using nfs storage (#6570)
I notice that I rerun a failed job (not the stages), the nfs store is already cleaned by previous failed jobs.
2021-01-27 19:58:58 -08:00
Guohan Lu
f7346cca32 [docker-fmp-frr]: remove blank lines in generated critical_process
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 19:41:59 -08:00
Guohan Lu
34cca20cb6 [proc-exit-listener]: ignore blank lines
make proc-exit-listener more rebust

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 19:41:59 -08:00
Shi Su
aab37b7f42
[FRR] Create a separate script to wait zebra to be ready to receive connections (#6519)
The requirement for zebra to be ready to accept connections is a generic problem that is not 
specific to bgpd. Making the script to wait for zebra socket a separate script and let bgpd and 
staticd to wait for zebra socket.
2021-01-27 12:36:02 -08:00
Kebo Liu
7f222e7bc1
[mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566)
Changes in the new release:

1. Policy based hashing optimization
2. New attribute support for Max port headroom
3. Tunnel ECN map fixes
4. Tunnel EVPN skeleton extensions (peer attrib, maps)
5. Bridge port admin not affecting port admin (optimize port down time)
6. CRM new API for neighbors and tunnel termination entries
7. Improve FDB event for flush by bridge port (before, null bridge was reported to SONiC, now the bridge will be extracted from bridge port)
8. DHCP L2 v4+v6 traps (for ZTP use case)
9. Generic counter implementation

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-27 12:29:28 -08:00
dflynn-Nokia
1f2797a56d
[docker-config-engine-stretch]: Fix dependency typo PYTHON2_SWSSCOMMON (#6568)
This commit fixes a typo in the fix delivered in PR #6538

syncd fails on the armhf platform within sonic-config-engine/portconfig.py when importing the following
'from swsscommon.swsscommon import ConfigDBConnector'
2021-01-27 12:27:41 -08:00
abdosi
cfa8fbbf1a
[baseimage]: Updates for Ebtables and support for multi-asic (#6542)
Following changes were done for ebtables:

- Support for Multi-asic platforms. Ebtable filters are installed in namespace for multi-asic and not host. On Single asic installed on  host.

- For Multi-asic platforms we don't want to install on host otherwise Namespace-to-Namespace communication does not happens since ARP Request are not forwarded.

- Updated to use text file to restore ebtables rules then the binary format. Rules are restore as part of Database docker init instead of rc.local

- Removed the ebtable service files for buster as not needed as filters are restored/installed as part of database docker init.
   All the binaries are pre-installed with ebtables* binary are same as ebatbles-legacy-* 

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-01-27 08:36:10 -08:00
Guohan Lu
f3a901c41e [ci]: build docker-ptf on vs platform
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Guohan Lu
044efe72ca [build]: add _BUILD_ENV to specify env for dpkg-buildpackage
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Guohan Lu
ca0e8cbe0e [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Kebo Liu
9ff56445c9
Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550)
During ISSU, "mlxsw_minimal" driver still trying to access firmware, in some cases FW could return some wrong critical threshold value which will cause switch shutdown.

**- How I did it**
In order to prevent "mlxsw_minimal" driver from accessing ASIC during ISSU, SDK will raise "OFFLINE" 'udev' event
at the early beginning of such flow. When this event is received, hw-management will remove "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event since this flow is ended up with "kexec".

**- How to verify it**
repeatedly perform warm reboot, make sure there is no switch shutdown occurred.
2021-01-27 15:39:54 +02:00
bingwang-ms
6fa807d0d0
[bgpmon]: Fix exception in bgpmon caused by duplicate bgp neighbor ID (#6546)
* Fix exception in bgpmon caused by duplicate keys
It is possible that BGP neighbors in IPv4 and IPv6 address families
share the same name (such as bgp monitor). However, such case is not
handled in bgpmon, and an Exception will be raised. This commit will
address the issue by Using set instead of list to avoid duplicate keys.
2021-01-26 23:01:42 -08:00
Qi Luo
e616a32950
[ci]: add master and 202012 into azure-pipelines trigger (#6560) 2021-01-26 15:42:04 -08:00
Prince Sunny
7337483381
[submodule]: update sonic-swss (#6561)
f4aefba - 2021-01-25 : [Mux] Fix repeating logs in case of tunnel creation fail (#1610) [Prince Sunny]
2021-01-26 13:35:11 -08:00
lguohan
a9a0e3062c
[ci]: archive kvmtest artifacts (#6567)
- archive logs
- archive kvm memdump when failed
- publish kvm test results

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-26 13:09:56 -08:00
lguohan
30ae46ea7f
[ci]: add -k ceos option to setup t0 testbed (#6565)
this is due to command line change in
1e12790a93

this is due to command line change in
Azure/sonic-mgmt@1e12790

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-26 02:28:27 -08:00
lguohan
6957e376be
[ci]: reset the owner for all files under working directory (#6557)
reset the owner for all files under working directory. some files were owned by root after build, which cause
next build to fail since directory cannot be cleanned.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-25 19:05:01 -08:00
Kebo Liu
84985e103d
[mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552)
Bugs fixes:
    All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
    All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
    Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
    Spectrum-3 | SN4700, Port Split | On rare occasion SN4700, conducting 100G split (4x25G) in NRZ when splitter port 1 or 2 are down, ports 3 and 4 will also go down.

Enahncments:
    All | Kernel | new notification on ISSU start, so other kernel drivers can disable any interface to ASIC

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-25 10:52:22 -08:00
Tamer Ahmed
8d857fab16
[dhcp-relay]: Launch DHCP Relay On L3 Vlan (#6527)
Recent changes brought l2 vlan concept which do not have DHCP
clients behind them and so DHCP relay is not required. Also,
dhcpmon fails to launch on those vlans as their interfaces
lack IP addresses. This PR limit launch of both DHCP relay
and dhcpmon to L3 vlans only.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-25 10:48:48 -08:00
Guohan Lu
a38377e96d [submodule]: update sonic-swss
* f4e8245 2021-01-24 | [fpmsyncd] Skip routes to eth0 or docker0 (#1606) (HEAD, origin/master, origin/HEAD) [Shi Su]
* f4c3579 2021-01-23 | Enhance dynamic buffer calculation and bug fixes (#1601) [Stephen Sun]
* e800c9f 2021-01-22 | [logfile]: Add option to specify swss rec file name (#1546) [arlakshm]
* 1acf60e 2021-01-17 | Implementation of System ports initialization, Interface & Neighbor Synchronization... (#1431) [minionatwork]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 22:25:11 -08:00
Guohan Lu
4b5212b8b2 [vstest]: add default vs test
Check #6483

add test to make sure default route change in eth0 does not
affect the default route in the default vrf

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 22:25:11 -08:00
lguohan
cd3ed549c3
[submodule]: update sonic-sairedis (#6544)
* 20b9573 2021-01-24 | [SAI]: update SAI submodule (#775) (HEAD, origin/master, origin/HEAD) [lguohan]
* 667c33d 2021-01-22 | [syncd] Comparison logic add support to LABEL attribute with higher priority (#764) [Kamil Cudnik]
* aaf5b98 2021-01-22 | [vslib]: Fix missing MACsec Create Port action (#770) [Ze Gan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 21:01:47 -08:00
lguohan
3bc82e5556
[ci]: add vs tests (#6506)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 21:01:19 -08:00
Zhenhong Zhao
a171e6c5e4
[frrcfgd] introduce frrcfgd to manage frr config when frr_mgmt_framework_config is true (#5142)
- Support for non-template based FRR configurations (BGP, route-map, OSPF, static route..etc) using config DB schema.
- Support for save & restore - Jinja template based config-DB data read and apply to FRR during startup

**- How I did it**

- add frrcfgd service
- when frr_mgmg_framework_config is set, frrcfgd starts in bgp container
- when user changed the BGP or other related table entries in config DB, frrcfgd will run corresponding VTYSH commands to program on FRR.
- add jinja template to generate FRR config file to be used by FRR daemons while bgp container restarted

**- How to verify it**
1. Add/delete data on config DB and then run VTYSH "show running-config" command to check if FRR configuration changed.
1. Restart bgp container and check if generated FRR config file is correct and run VTYSH "show running-config" command to check if FRR configuration is consistent with attributes in config DB

Co-authored-by: Zhenhong Zhao <zhenhong.zhao@dell.com>
2021-01-24 17:57:03 -08:00
Dmytro Shevchuk
dd0e1100a5
[sonic-cfggen] parse optional fec and autoneg fields from hwsku.json (#6155)
**- Why I did it**

For now `hwsku.json` and `platform.json` dont support optional fields. For example no way to add `fec` or `autoneg` field using `platform.json` and `hwsku.json`.

**- How I did it**
Added parsing of optional fields from hwsku.json.

**- How to verify it**
Add optional field to `hwsku.json`. After first boot will be generated new `config_db.json` or you can generate it using `sonic-cfggen` command. In this file must be optional field from `hwsku.json` or check using command `redis-cli hgetall PORT_TABLE:Ethernet0`
Example of `hwsku.json`, that must be parsed:
```
{
    "interfaces": {
        "Ethernet0": {
            "default_brkout_mode": "1x100G[40G]",
            "fec": "rs",
            "autoneg": "0"
        },
...
}
```
Example of generated `config_db.json`:
```
    "PORT": {
        "Ethernet0": {
            "alias": "Ethernet0",
            "lanes": "0,1,2,3",
            "speed": "100000",
            "index": "1",
            "admin_status": "up",
            "fec": "rs",
            "autoneg": "0",
            "mtu": "9100"
        },
```
So, we can see this entries in redis db:
```
admin@sonic:~$ redis-cli hgetall PORT_TABLE:Ethernet0

 1) "alias"
 2) "Ethernet0"
 3) "lanes"
 4) "0,1,2,3"
 5) "speed"
 6) "100000"
 7) "index"
 8) "1"
 9) "admin_status"
10) "up"
11) "fec"
12) "rs"
13) "autoneg"
14) "0"
15) "mtu"
16) "9100"
17) "description"
18) ""
19) "oper_status"
20) "up"
```

Also its way to fix `show interface status`, `FEC` field but also need add `FEC` field to `hwsku.json`.
Before:
```
admin@sonic:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -----------  ------  ------  -------  ---------------  ----------
  Ethernet0          0,1,2,3     100G   9100     N/A    Ethernet0  routed      up       up  QSFP28 or later         N/A
```
After:
```
admin@sonic:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -----------  ------  ------  -------  ---------------  ----------
  Ethernet0          0,1,2,3     100G   9100     rs    Ethernet0  routed      up       up  QSFP28 or later         N/A
```
2021-01-24 17:46:33 -08:00
Praveen Chaudhary
24df482e0e
[yang_model_test]: Tests for default value of docker_routing_config_mode and Empty ACL ports. (#6470)
Tests for default value of docker_routing_config_mode and Empty ACL ports.

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
2021-01-24 17:33:12 -08:00
lguohan
0daad0b51d
[ci]: build syncd-rpc for broadcom and mellanox (#6522)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 17:31:47 -08:00
Vadym Hlushko
48e7116bcd
[DPB][SN3700C] extended set of speeds for split modes (#6277)
platform.json and hwsku.json files has not a full set of speeds for split modes

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2021-01-24 16:33:52 -08:00
Vadym Hlushko
709c1ecb06
[DPB][SN4700] extended set of speeds for split modes (#6278)
platform.json and hwsku.json files has not a full set of speeds for split modes

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2021-01-24 16:33:25 -08:00
Antonina Melnyk
da7f80d70b
[barefoot] Fixes for platform API (#6487)
There was a mismatch with Eeprom class methods names and methods called from Eeprom class.

Signed-off-by: Antonina Melnyk antoninax.melnyk@intel.com
2021-01-24 16:32:18 -08:00
judyjoseph
46b3bd5503
[teamd]: Increase wait timeout for teamd docker stop to clean Port channels. (#6537)
The Portchannels were not getting cleaned up as the cleanup activity was taking more than 10 secs which is default docker timeout after which a SIGKILL will be send.
Fixes #6199
To check if it works out for this issue in 201911 ? #6503

This issue is significantly seen in master branch compared to 201911 because the Portchannel cleanup takes more time in master. Test on a DUT with 8 Port Channels.

master

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m15.599s
    user    0m0.061s
    sys     0m0.038s
Sonic 201911.v58

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m5.541s
    user    0m0.020s
    sys     0m0.028s
2021-01-23 20:57:52 -08:00
Joe LeVeque
238803d6bf
[sonic-host-services] Report unit test coverage (#6533)
To view unit test coverage of sonic-host-services package upon build
2021-01-23 00:32:06 -08:00
Tamer Ahmed
8ce1e3ed92
[build-docker-buster]: Install libboost 1.171 In Build Docker (#6532)
Installing newst buster version of libboost (v1.71) in build docker.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-23 00:29:35 -08:00
Joe LeVeque
d4cde6d310
[process-reboot-cause] Make process-reboot-cause executable (#6534)
process-reboot-cause script should be executable.
2021-01-23 00:29:13 -08:00
Joe LeVeque
4a8e513460
[sonic-platform-daemons] Update submodule (#6535)
Submodule changes to be committed:

* src/sonic-platform-daemons 81318f7...e72f6cd (3):
  > [ledd] Minor refactor; add unit tests (#143)
  > [thermalctld] Report unit test coverage (#141)
  > [psud] Increase unit test coverage (#140)
2021-01-23 00:28:04 -08:00
Danny Allen
ef6a05f07e
[DellEMC Z9332f] Remove duplicate ipmihelper.py script (#6536)
Fixes #6445

Because the ipmihelper.py script in the 9332 folder is slightly different than the common one (due to LGTM fixes), when the common one gets copied during build time it causes the workspace/build to become dirty.

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-23 00:26:46 -08:00
Qi Luo
1c13340f8e
[docker-config-engine-stretch]: Add missing dependency PYTHON2_SWSSCOMMON (#6538)
Otherwise all the docker image derived from docker-config-engine-stretch will have broken SONIC_CONFIG_ENGINE_PY2
The bug is introduced in #6406
2021-01-23 00:25:11 -08:00
Guohan Lu
104367838d Revert "[files/build/versions]: support reproduceable build for git (#5774)"
This reverts commit d75c290f00.
2021-01-22 10:28:44 -08:00
arlakshm
0e12ca81c7
[Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
This PR has the changes to support having different swss.rec and sairedis.rec for each asic.
The logrotate script is updated as well

- How I did it

Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747)
In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec

Update the logrotate script for multiasic platform .
2021-01-22 09:42:19 -08:00
Prince Sunny
07200ee4f3
Submodule update swss-common (#6525) 2021-01-22 09:31:53 -08:00
abdosi
5f39926646
Updated BBR to use peer group name as prefix. (#6515)
To make BBR configured for peer-group if it's name starts with (prefixed) with the string define in constants.yml instead of exact string match.
2021-01-22 08:29:27 -08:00
Samuel Angebault
0464d15b18
[pmon]: Run ledd using python3 unless excluded (#6528)
**- Why I did it**

Ledd is the last daemon that is not enabled to run in python3.
Even though there is a plan to deprecate this daemon and to replace it by something else it's one simple step toward python2 deprecation.

**- How I did it**

Changed the `command=` line for `ledd` in the `supervisord` configuration of `pmon`.
Copied what was done for other daemons.

**- How to verify it**

Booting a product that has a `led_control.py` should now show the ledd running in python3.
I ran `python3 -m pylint` on all `led_control.py` plugin which means that most of them should be python3 compliant.
There is however still a risk that some might not work.
2021-01-22 07:12:01 -08:00
Lawrence Lee
8729fdc9ed
[minigraph.py]: Force /128 prefix for server IPv6 loopbacks (#6524)
Meet the requirement for the MUX_CABLE table that IPv6 loopbacks have a /128 prefix

Note that this change only affects the MUX_CABLE table, all other tables continue to use the loopback address provided in minigraph.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-01-21 15:05:35 -08:00
yozhao101
be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-21 12:57:49 -08:00
Tamer Ahmed
5c31f6d8cc
[sonic-swss-common]: Update Submodule (#6508)
Update in this change:
640a218 [packaging]: Add Support For Libboost v1.71.0 (#449)

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-21 12:11:04 -08:00