Commit Graph

857 Commits

Author SHA1 Message Date
kellyyeh
6e17ef311a [dhcp_relay] Remove dhcp6mon (#10467) 2022-04-12 18:39:19 +00:00
Stepan Blyshchak
721a53b9a0 [scapy] update scapy to 2.4.5 and patch it (#10457)
Why I did it
Running warm-reboot in a loop 500 times leads to the following error on the 318th iteration:

Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors Traceback (most recent call last):
Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/bin/restore_neighbors.py", line 24, in <module>
Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors     from scapy.all import conf, in6_getnsma, inet_pton, inet_ntop, in6_getnsmac, get_if_hwaddr, Ether, ARP, IPv6, ICMPv6ND_NS, ICMPv6NDOptSrcLLAddr
Apr  2 15:56:27.346795 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/all.py", line 25, in <module>
Apr  2 15:56:27.346956 sonic INFO swss#/supervisord: restore_neighbors     from scapy.route import *
Apr  2 15:56:27.346995 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/route.py", line 205, in <module>
Apr  2 15:56:27.347089 sonic INFO swss#/supervisord: restore_neighbors     conf.iface = get_working_if()
Apr  2 15:56:27.347129 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/arch/linux.py", line 128, in get_working_if
Apr  2 15:56:27.347213 sonic INFO swss#/supervisord: restore_neighbors     ifflags = struct.unpack("16xH14x", get_if(i, SIOCGIFFLAGS))[0]
Apr  2 15:56:27.347250 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/arch/common.py", line 31, in get_if
Apr  2 15:56:27.347345 sonic INFO swss#/supervisord: restore_neighbors     return ioctl(sck, cmd, struct.pack("16s16x", iff.encode("utf8")))
Apr  2 15:56:27.347365 sonic INFO swss#/supervisord: restore_neighbors OSError: [Errno 19] No such device
The issue was reported to the scapy developers in secdev/scapy#3369 and fixed in secdev/scapy#3371. However, no released scapy version contains this fix yet, so we decided to build scapy v2.4.5 from source and apply the fix as a patch.
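The traceback shows `scapy.all` failing to import because the interface-flags query inside `get_working_if()` hits an interface that no longer exists (`ENODEV`). The Python sketch below isolates that defensive pattern: skip interfaces that vanish between being listed and being queried. It is not the code from secdev/scapy#3371. The names `pick_working_iface` and `linux_get_flags`, and the `"lo"` fallback, are hypothetical.

```python
import errno
import socket
import struct
from typing import Callable, Iterable

IFF_UP = 0x1          # from <linux/if.h>
IFF_LOOPBACK = 0x8    # from <linux/if.h>
SIOCGIFFLAGS = 0x8913 # Linux ioctl number for "get interface flags"


def linux_get_flags(name: str) -> int:
    """Hypothetical helper: query flags the way the traceback does (Linux only)."""
    import fcntl  # Unix-only module, imported lazily
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sck:
        res = fcntl.ioctl(sck, SIOCGIFFLAGS, struct.pack("16s16x", name.encode("utf8")))
    return struct.unpack("16xH14x", res)[0]


def pick_working_iface(names: Iterable[str],
                       get_flags: Callable[[str], int] = linux_get_flags,
                       fallback: str = "lo") -> str:
    """Return the first interface that is up and not loopback (sketch)."""
    for name in names:
        try:
            flags = get_flags(name)
        except OSError as exc:
            if exc.errno == errno.ENODEV:
                continue  # interface disappeared mid-scan, e.g. during warm-reboot
            raise  # any other error is still reported
        if flags & IFF_UP and not flags & IFF_LOOPBACK:
            return name
    return fallback
```

On a live Linux host the names could come from `socket.if_nameindex()`, which returns `(index, name)` pairs.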

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-04-07 22:57:47 +00:00
kellyyeh
0e6f1833e0 Update docker-router-advertiser.supervisord.conf.j2 (#10375) 2022-04-07 22:57:37 +00:00
Lawrence Lee
5b0f0c1d99 [tun_pkt]: Wait for AsyncSniffer to init fully (#10346)
Fix for the tunnel packet handler crashing at system startup.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-03-30 21:16:18 +00:00
Lior Avramov
07c170fa04 Remove quagga from SONiC (#10384)
Quagga is no longer used in SONiC. Cherry-picked from master PR #7898.

Co-authored-by: liora <liora@nvidia.com>
2022-03-30 13:57:34 -07:00
Saikrishna Arcot
e9db38594d Image disk space reduction (#10172) (#10371)
Reduce the disk space taken up during bootup and runtime.

1. Remove python package cache from the base image and from the containers.
2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup.
3. For the partition containing `/host`, don't reserve any blocks for the root user only. This makes all disk space available to all users, which may be needed during upgrades, for example.
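Item 3 relies on the fact that mke2fs reserves 5% of an ext4 filesystem's blocks for root by default, and that `tune2fs -m 0` sets that percentage to zero. The arithmetic below is a small sketch of how much space this returns. The partition size and the device path `/dev/sda3` are hypothetical examples, not values taken from this commit.

```python
from typing import List

GIB = 1024 ** 3


def reserved_bytes(partition_bytes: int, reserved_pct: int = 5) -> int:
    """Bytes held back for root by ext4's reserved-blocks percentage."""
    return partition_bytes * reserved_pct // 100


def tune2fs_no_reserve_cmd(device: str) -> List[str]:
    """argv that sets the reserved-blocks percentage to 0 (tune2fs -m 0)."""
    return ["tune2fs", "-m", "0", device]


# Hypothetical 100 GiB partition holding /host
print(reserved_bytes(100 * GIB) // GIB)  # 5
print(tune2fs_no_reserve_cmd("/dev/sda3"))
```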

* Remove pip2 and pip3 caches from some containers

Only containers which appeared to have a significant pip cache size are
included here.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Don't create var-log.ext4 if we're storing logs in memory

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Run tune2fs on the device containing /host to not reserve any blocks for just the root user

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 5617b1ae3e)
2022-03-29 10:11:28 -07:00
Saikrishna Arcot
e4b30e3090 [restapi]: Don't use python/python2 for restapi start scripts (#10285)
Python 2 isn't installed by default in Buster and Bullseye containers,
and the scripts/modules can be used with Python 3, so make sure Python 3
is used.

Why I did it
After the Buster and Bullseye upgrade for the restapi container, processes no longer start, because supervisord tries to call python and python2, neither of which is available.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-22 18:35:27 -07:00
kellyyeh
adaec6337f [radv] Support multiple ipv6 prefixes per vlan interface (#9934) (#10253)
Why I did it
The radvd.conf.j2 template creates two copies of the VLAN interface block when more than one IPv6 address is assigned to a single VLAN interface. Changed the format to add all prefixes under the same VLAN interface block.
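The fix amounts to grouping prefixes by interface before rendering. The Python sketch below illustrates that grouping. It is not the Jinja template itself. It emits one radvd `interface` block per VLAN, derives each advertised prefix from the interface address with `ipaddress`, and skips IPv4 addresses. `AdvSendAdvert`, `AdvOnLink` and `AdvAutonomous` are standard radvd.conf options, but this commit does not say which options the SONiC template sets, so treat them as placeholders. The addresses in the test are examples.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def render_radvd(addrs: Iterable[Tuple[str, str]]) -> str:
    """Render one interface block per VLAN with all of its IPv6 prefixes inside."""
    prefixes: Dict[str, List[str]] = {}  # vlan -> prefixes, first-seen order
    for vlan, addr in addrs:
        iface = ipaddress.ip_interface(addr)
        if iface.version != 6:
            continue  # radvd only advertises IPv6 prefixes
        nets = prefixes.setdefault(vlan, [])
        net = str(iface.network)  # e.g. fc02:1000::1/64 -> fc02:1000::/64
        if net not in nets:
            nets.append(net)

    lines: List[str] = []
    for vlan, nets in prefixes.items():
        lines += [f"interface {vlan}", "{", "    AdvSendAdvert on;"]
        for net in nets:  # every prefix stays inside the same interface block
            lines += [f"    prefix {net}", "    {",
                      "        AdvOnLink on;", "        AdvAutonomous on;", "    };"]
        lines.append("};")
    return "\n".join(lines)
```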

How I did it
Modified radvd.conf.j2 and added unit tests.

How to verify it
Configure multiple IPv6 addresses on the same VLAN and start radvd.
The unit test checks that radvd.conf is generated correctly with multiple IPv6 addresses.
2022-03-20 17:17:59 -07:00
Shilong Liu
3455e99d45 Add a config variable to override default container registry instead of dockerhub. (#10166) (#10262)
* Add variable to reset default docker registry
* fix bug in docker version control
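The commit does not show the variable's name or how the build applies it. The Python sketch below is one plausible way a registry override can be applied to image references. It follows Docker's rule that the first path component names a registry only if it contains `.` or `:` or is `localhost`. The function `qualify_image` and the registry `myreg.example.com` are hypothetical.

```python
def qualify_image(image: str, registry: str = "") -> str:
    """Prefix an image reference with `registry` unless it already names one."""
    if not registry:
        return image  # no override: Docker resolves the name against Docker Hub
    first, sep, _ = image.partition("/")
    # Docker treats the first component as a registry host only if it
    # contains '.' or ':' or is exactly 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return image
    return f"{registry.rstrip('/')}/{image}"
```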
2022-03-18 12:01:52 +08:00
Longxiang Lyu
259aa0856b Add dualtor TSA/B/C support (#9726)
Why I did it
Add TSA/B/C dualtor support

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
For TSA, toggle all mux ports to standby if the device type is dualtor and there are active mux ports.
For TSC, add mux status output.
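The TSA rule above reduces to a small selection step. The following Python sketch only models that decision and is not the TSA script. The subtype string `"DualToR"` and the shape of the mux status mapping (port name to `"active"`/`"standby"`) are assumptions for illustration.

```python
from typing import Dict, List


def ports_to_standby(device_subtype: str, mux_status: Dict[str, str]) -> List[str]:
    """Return the mux ports TSA would toggle to standby (sketch)."""
    if device_subtype != "DualToR":  # assumed subtype value for a dual ToR
        return []
    # Only ports that are currently active need to be toggled.
    return sorted(port for port, state in mux_status.items() if state == "active")
```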

How to verify it
Run TSA/B/C on a dualtor setup
2022-03-08 19:02:06 +00:00
Saikrishna Arcot
ee2b08e988 [202012] Upgrade restapi docker to Buster (#10003)
Backport the changes done in #9791 to the 202012 branch, and change the base image to Buster.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-04 20:44:07 -08:00
Lawrence Lee
d162ffe0a5 [swss]: Wait for vlan intf to start ndppd (#10119) (#10153)
202012 version of #10119

Why I did it
If the VLAN interface is not up when ndppd starts, ndppd fails to enable allmulti mode on the interface and cannot process received NDP packets.

The following logs are seen:

/var/log/syslog.33.gz:Feb 18 10:33:12.825406 sonic INFO swss#/supervisord: ndppd (error) Failed to set allmulti: No such device

How I did it
Use the wait_for_link script, which radv already uses, to delay ndppd startup until the VLAN interface is ready.
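The wait_for_link script itself is not shown in this log. The Python sketch below only illustrates the idea: poll the kernel's link state in `/sys/class/net/<ifname>/operstate` until it reads `up`, and only then start the dependent daemon. The function names, the 60-second timeout and the 1-second interval are assumptions.

```python
import time
from pathlib import Path
from typing import Callable


def read_operstate(ifname: str) -> str:
    """Read the kernel's link state for ifname, e.g. 'up' or 'down'."""
    return Path(f"/sys/class/net/{ifname}/operstate").read_text().strip()


def wait_for_link(ifname: str, timeout: float = 60.0, interval: float = 1.0,
                  read_state: Callable[[str], str] = read_operstate,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until ifname reports 'up'; return False if timeout elapses first."""
    waited = 0.0
    while True:
        try:
            if read_state(ifname) == "up":
                return True
        except FileNotFoundError:
            pass  # the VLAN interface has not been created yet
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```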

How to verify it
Apply the changes to a device, run config reload, and confirm that the above error log is not observed when ndppd starts. Run the arp/test_arp_dualtor.py::test_proxy_arp test case and verify that it passes.
2022-03-04 20:40:29 -08:00
xumia
2a7378b8c4 [Security]: Upgrade urllib3 to fix CVE-2021-33503
See https://security.archlinux.org/CVE-2021-33503
2022-02-25 09:11:56 +00:00
Richard.Yu
38f5e3bc66 [PTF-SAIv2]Add ptf docker for sai-ptf (saiv2) (#9729)
* [PTF-SAIv2] Add ptf docker for sai-ptf (saiv2)

Based on the current ptf docker, create a new docker for sai-ptf (saiv2)
Upgrade related packages
Use the latest ptf and install it

test done:
NOJESSIE=1 NOSTRETCH=1 NOBULLSEYE=1 ENABLE_SYNCD_RPC=y make target/docker-ptf-sai.gz
BLDENV=buster make -f Makefile.work target/docker-ptf-sai.gz

* upgrade the thrift to 014
2022-02-23 22:46:33 +00:00
Travis Van Duyn
d18b7fa24c updated jinja template for snmp contact python2 vs python3 issue (#9949) 2022-02-12 01:06:13 +00:00
arlakshm
14bbccc9d6 [multi-asic] fix network command for internal loopback (#7878)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
On multi-ASIC platforms, all ASICs advertise the same IPv6 /64 network from Loopback4096.
Therefore, the IPv6 loopback address of a backend ASIC is not learned on the frontend ASIC.
Changed the bgpd.conf.main.conf.j2 template file to advertise the Loopback4096 IPv6 address as a /128.
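Python's `ipaddress` module makes the /64 versus /128 difference concrete. When each ASIC advertises its Loopback4096 network as a /64, every announcement is the same prefix, so they cannot be told apart. As /128 host routes, each announcement names exactly one ASIC's address. The two addresses below are hypothetical examples.

```python
import ipaddress

# Hypothetical Loopback4096 addresses on two ASICs sharing one /64.
asic0 = ipaddress.ip_interface("fc00:1::32/64")
asic1 = ipaddress.ip_interface("fc00:1::33/64")

# Advertised as /64: both ASICs announce the identical prefix.
print(asic0.network == asic1.network)  # True

# Advertised as /128: each ASIC announces a distinct host route.
host0 = ipaddress.ip_network(f"{asic0.ip}/128")
host1 = ipaddress.ip_network(f"{asic1.ip}/128")
print(host0, host1)                    # fc00:1::32/128 fc00:1::33/128
print(host0 == host1)                  # False
```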
2022-02-09 19:27:46 +00:00
abdosi
17a8f42704 [multi-asic] Updated BGP community for Internal routes (#7617)
Following changes are done:

Internal routes are tagged with no-export instead of local-AS
Option to add a user-defined BGP community on top of no-export
2022-02-09 19:27:32 +00:00
Lawrence Lee
59a7dc9f1e [swss]: Reduce tunnel_packet_handler memory usage (#9762)
* Configure scapy to not store sniffed packets

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-02-08 19:07:40 +00:00
vdahiya12
73b27b7c9e fix build error (#9902)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2022-02-03 08:52:29 +05:30
Shi Su
0b9077dc47 Add openbfdd to ptf docker (#9488)
Why I did it
To enable test support for BFD-related features, the PTF docker needs BFD support. This PR adds BFD support to the PTF docker.

How I did it
Clone and build OpenBFDD for PTF docker.

How to verify it
Build locally and verify BFD is supported.
2022-01-31 20:08:49 +00:00
Saikrishna Arcot
5f3269a61b Create a docker-swss-layer that holds the swss package.
This is to save about 40MB of disk space, since 5 containers
individually install this package.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit bd479cad29)
2022-01-27 23:53:09 -08:00
SuvarnaMeenakshi
d2ee7a5bef [docker-snmp]: Modify log level of snmpd (#9734)
#### Why I did it
resolves https://github.com/Azure/sonic-buildimage/issues/8779
snmpd writes the following error message to syslog:
snmp#snmpd[27]: truncating integer value > 32 bits
This message is written to syslog when hrSystemUptime (1.3.6.1.2.1.25.1.1.0, system uptime) or sysUpTime (1.3.6.1.2.1.1.3, the network management portion, i.e. snmpd uptime) is queried after either counter has grown beyond a 32-bit value. This happens when the device uptime or snmpd uptime exceeds 497 days.
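The 497-day figure follows from the counter type. sysUpTime and hrSystemUptime are SNMP TimeTicks, which count hundredths of a second in an unsigned 32-bit value. A quick check of when such a counter wraps:

```python
# TimeTicks count hundredths of a second in an unsigned 32-bit integer.
TICKS_PER_SECOND = 100
MAX_TICKS = 2 ** 32
SECONDS_PER_DAY = 86400

days_until_wrap = MAX_TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY
print(f"{days_until_wrap:.2f}")  # 497.10
```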

#### How I did it
Reference: https://access.redhat.com/solutions/367093 and https://linux.die.net/man/1/snmpcmd

To avoid this message once a counter grows that large, the snmpd log level is changed to log only LOG_EMERG, LOG_ALERT, LOG_CRIT, and LOG_DEBUG.

Without this change, LOG_ERR and LOG_WARNING would also be logged in syslog.

#### How to verify it
On a device that has been up for more than 497 days, modify supervisord.conf with the change and restart snmp.
Query 1.3.6.1.2.1.1.3 and verify that log message is not seen.
2022-01-14 23:01:19 +00:00
Shi Su
60ac485f96 Reduce route selection deferral timer for bgp graceful restart (#7533)
Why I did it
In some scenarios, the End-of-RIB from some of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has its default value of 360 seconds, FRR does not set up routes, and all routes are removed after reconciliation. This PR reduces the route selection deferral timer so that at least the routes learned from some of the peers are restored by the time reconciliation happens.

Fix #7488

How I did it
Reduce route selection deferral timer for bgp graceful restart to 15 seconds.
2021-12-20 19:24:58 +00:00
Lawrence Lee
a41c15a329 [swss]: Listen for undeliverable tunnel packets (#9348)
- Create a script in the orchagent docker container that listens for encapsulated tunnel packets trapped to the CPU (indicating that they cannot be routed or that no neighbor info exists for the inner packet). When such a packet is received, the script issues a ping command to the packet's inner destination IP to start the neighbor learning process.
- This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.
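The resilience rule in the second bullet is a small piece of state tracking. An interface going down needs no action, because sniffing continues on the remaining interfaces. An interface coming up means the sniffer must be restarted to cover it. The Python sketch below models only that decision. The class `SnifferSupervisor` and the `restart_sniffer` call mentioned in a comment are hypothetical and are not taken from the actual script.

```python
from typing import Set


class SnifferSupervisor:
    """Track which portchannels are up and decide when to restart sniffing."""

    def __init__(self) -> None:
        self.up: Set[str] = set()
        self.restarts = 0

    def on_link_change(self, ifname: str, is_up: bool) -> bool:
        """Return True if the sniffer must be restarted to cover ifname."""
        was_up = ifname in self.up
        if is_up:
            self.up.add(ifname)
        else:
            self.up.discard(ifname)  # sniffing on the other interfaces continues
        if is_up and not was_up:
            self.restarts += 1  # a hypothetical restart_sniffer(self.up) would go here
            return True
        return False
```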
2021-12-16 11:59:34 -08:00
Travis Van Duyn
0226140e9c [snmp]: updated to support snmp config from redis configdb (#6134)
**- Why I did it**
I'm updating the jinja2 template to support getting SNMP information from the redis configdb. 
I'm using the format approved here: 
https://github.com/Azure/SONiC/pull/718

This will pave the way for us to phase out snmp.yml in the future.
Right now we still use both snmp.yml and configdb to get variable information in order to create snmpd.conf via the sonic-cfggen tool.

**- How I did it**
I first updated the SNMP Schema in PR #718 to get that approved as a standardized format. 
Then I verified that I could add SNMP configs to configdb using this standard schema. Once the configs were added to configdb, I updated the snmpd.conf.j2 file to support them while still using the variables in the snmp.yml file in parallel. This way we keep backward compatibility until we can fully migrate to configdb only.

By updating the snmpd.conf.j2 template and running the sonic-cfggen tool, snmpd.conf is generated using the values in both configdb and the snmp.yml file.

Co-authored-by: trvanduy <trvanduy@microsoft.com>
2021-12-13 17:42:48 +00:00
kellyyeh
2019ccaa2a [radv] Run radv on MgmtToRRouter (#9424)
* Allow radv to run on mgmt tor and EPMS
2021-12-06 21:32:33 +00:00
arlakshm
9f0fc89cff remove staticd.conf.j2 (#9182)
Why I did it
resolves #8979 and #9055

How I did it
Remove the file staticd.conf.j2, which adds the default route on eth0, from the bgp docker.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-12-01 02:28:51 +00:00
Stephen Sun
fafd5327bd [Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced by PR #8768, "[Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles", on Mellanox platforms, where test cases will verify the behavior.
Rendering is done here so that the Azure pipeline passes.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
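The last step, generating the inactive port list, can be pictured as a filter over port configuration. In the Python sketch below, the PORT-table shape (port name mapped to a field dict), the `admin_status` field, and treating a missing value as `down` are all assumptions. It is not the buffer manager's code.

```python
from typing import Dict, List


def inactive_ports(port_table: Dict[str, Dict[str, str]]) -> List[str]:
    """Ports whose admin_status is not 'up': candidates for zero buffer profiles."""
    def port_number(name: str) -> int:
        digits = "".join(ch for ch in name if ch.isdigit())
        return int(digits) if digits else 0  # natural order: Ethernet4 before Ethernet16

    inactive = [name for name, attrs in port_table.items()
                if attrs.get("admin_status", "down") != "up"]  # missing -> down (assumed)
    return sorted(inactive, key=port_number)
```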

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-12-01 02:28:46 +00:00
Lawrence Lee
77378b4364 [mux]: Call write_standby from host only
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
25712c712e [mux]: Make write_standby available on host
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

[write_standby]: Cleanup and fix build

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
84cd0e9471 [mux]: Initialize all mux ports as standby
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b8f70f8986 Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd
Linkmgrd monitors link status, mux status, and link state. When
the link becomes unhealthy, linkmgrd triggers a mux switchover
on a standby ToR, ensuring uninterrupted service to servers/blades.
This PR is the initial implementation of linkmgrd.

Also, the docker-mux container holds packages related to maintaining and
managing the mux cable. It currently runs the linkmgrd binary, which monitors
and switches the mux if needed.
This PR also introduces the mux container and starts linkmgrd at startup when
the build is configured with INCLUDE_MUX=y.

Edit: linkmgrd PR will follow.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Related work items: #2315, #3146150
2021-11-10 18:54:33 -08:00
tjchadaga
9a1b1bc44e Fix for additional intf flap during fast-reboot (#9166) 2021-11-09 23:20:06 +00:00
Lawrence Lee
8ada006302 [swss]: Start ndppd after vlanmgrd (#9155)
Why I did it
During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage.

How I did it
Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-05 00:39:10 +00:00
Saikrishna Arcot
bb1bc59a22 docker-dhcp-relay: Fix waiting for interfaces to get set up (#9034)
Fix the check used to wait for interfaces to come up. The group name in
the supervisor config files has changed from isc-dhcp-relay to
dhcp-relay.

Also, in the wait script, wait 10 additional seconds after the vlans,
port channels, and any interfaces are up. This is because dhcrelay
listens on all interfaces (in addition to port channels and vlans), and
to ensure that it stays in a clean state during runtime, wait some extra
time to make sure that those interfaces are created as well.
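The wait logic described above comes down to two phases: block until every required VLAN and port channel is up, then sleep a fixed grace period so that other interfaces dhcrelay listens on can appear. The Python sketch below models those two phases. `wait_for_interfaces` and its `is_up` callback are hypothetical. The 10-second grace period matches the commit. Like the described script, the sketch has no timeout.

```python
import time
from typing import Callable, Iterable


def wait_for_interfaces(names: Iterable[str], is_up: Callable[[str], bool],
                        grace: float = 10.0, interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until every named interface is up, then wait grace extra seconds."""
    pending = set(names)
    while pending:  # no timeout: waits until all required interfaces are up
        pending = {n for n in pending if not is_up(n)}
        if pending:
            sleep(interval)
    # dhcrelay also listens on interfaces outside this list; give them time to appear.
    sleep(grace)
```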

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-10-22 17:14:22 +00:00
kellyyeh
d4a6a009cf Change radv interval to 3min (#8891)
(cherry picked from commit 0e175e6d6c)
2021-10-01 23:00:17 -07:00
kellyyeh
a4b6788b4b Replace isc-dhcp with DHCPv6 Relay in dhcp_relay docker (#8884) 2021-10-01 19:55:03 -07:00
kellyyeh
47ba7a9091 [dhcp_relay] DHCP relay support for IPv6 (#7772) (#8871) 2021-09-30 01:33:02 -07:00
Christian Svensson
5dce093464 [mgmt-framework]: Fix typo in mgmt_vars.j2 (#8475)
Signed-off-by: Christian Svensson <blue@cmd.nu>
2021-08-25 04:11:16 +00:00
Kostiantyn Yarovyi
387ae82c5d [Pcied] run by python 3
Why I did it
Pcied was run by Python 2.

How I did it
Dropped Python 2 support and added Python 3 support for pcied in docker-pmon.supervisord.conf.j2.

How to verify it
docker exec pmon supervisorctl status
2021-08-23 03:34:48 +00:00
xumia
b1c2659044 Support to build armhf/arm64 platforms on arm based system (#7731) (#8458)
Why I did it
Support building armhf/arm64 platforms on ARM-based systems without the QEMU simulator.
When building armhf/arm64 on an ARM-based system, it is not necessary to use the QEMU simulator.

How I did it
When building armhf on an armhf system, or arm64 on an arm64 system, the QEMU simulator is not used by default.
When building armhf on arm64 with armhf docker enabled, images are built without the simulator automatically. This depends on how the docker service is run.

Docker base image change:
For amd64, change from debian: to amd64/debian:
For arm64, change from multiarch/debian-debootstrap:arm64- to arm64v8/debian:
For armhf, change from multiarch/debian-debootstrap:armhf- to arm32v7/debian:
See https://github.com/docker-library/official-images#architectures-other-than-amd64
The mapping relations:
arm32v6 --- armel
arm32v7 --- armhf
arm64v8 --- arm64

Docker image armhf deprecated info: https://hub.docker.com/r/armhf/debian, using arm32v7 instead.
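The base-image mapping and the "use QEMU or not" rule above can be expressed as a short lookup. The Python sketch below restates them. The image prefixes come from the list above. The function names are hypothetical, and modeling "armhf docker enabled" as a boolean is an assumption. The real build implements this in its own build scripts.

```python
BASE_IMAGE_PREFIX = {
    "amd64": "amd64/debian",    # previously debian:
    "arm64": "arm64v8/debian",  # previously multiarch/debian-debootstrap:arm64-
    "armhf": "arm32v7/debian",  # previously multiarch/debian-debootstrap:armhf-
}


def base_image(arch: str, distro: str) -> str:
    """Docker base image for a target arch and Debian release, e.g. buster."""
    return f"{BASE_IMAGE_PREFIX[arch]}:{distro}"


def needs_qemu(host_arch: str, target_arch: str, armhf_docker_enabled: bool = False) -> bool:
    """Native builds skip QEMU; so does armhf on arm64 when armhf docker is enabled."""
    if host_arch == target_arch:
        return False
    if host_arch == "arm64" and target_arch == "armhf" and armhf_docker_enabled:
        return False
    return True
```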
2021-08-13 19:33:08 +08:00
richardyu
36ab000557 PTF adds unittest-xml-reporting (#8417)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-08-12 07:09:58 +00:00
Sujin Kang
ae7fa32691 [pmon]: Enable Autorestart of the daemons in PMON for unexpected exit (#8358)
Enable autorestart of the daemons in PMON on unexpected exit.
Remove the daemon list from critical_process. This prevents PMON
from restarting when an individual daemon crashes.
2021-08-07 22:43:38 -07:00
Blueve
d2f2a07c7c [ARM] Fix issue where the ping6 tool is missing from orchagent docker (#8345)
Signed-off-by: Jing Kan jika@microsoft.com
2021-08-05 15:25:53 +00:00
VenkatCisco
3aed7eab8f [pmon]: add python3-jsonschema pmon (#8018)
jsonschema is an implementation of JSON Schema for Python.

Signed-off-by: Venkat Garigipati <venkatg@cisco.com>
2021-08-05 15:23:06 +00:00
novikauanton
08dc00f817 [iccpd][docker] fix initial startup configuration (#7982)
#### Why I did it
Config generation (sonic-cfggen) fails, but the services continue to run with an invalid config.

#### How I did it
* Exit with an error when start.sh encounters errors, because supervisord relies on the start.sh return code.
* Fix the Jinja template. Jinja uses Python expressions under the hood, and the `has_key` method was removed from dict in Python 3, so check membership with the `in` operator, which is supported by both Python 2 and Python 3.
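A quick Python illustration of why the template change is needed. The dict below reuses the MC-LAG values from the verification steps. `has_key` does not exist on Python 3 dicts, while the `in` operator works on both Python versions and in Jinja2 expressions (for example `{% if 'peer_link' in mclag %}`).

```python
mclag = {"local_ip": "10.0.0.2", "peer_ip": "10.0.0.3"}

# Python 2 only: mclag.has_key("peer_link") raises AttributeError on Python 3.
print(hasattr(mclag, "has_key"))  # False on Python 3

# Portable membership test for Python 2, Python 3 and Jinja2 templates.
print("local_ip" in mclag)        # True
print("peer_link" in mclag)       # False
```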
#### How to verify it
* Compile SONiC with iccp enabled.
* Add an mclag config to CONFIG_DB.
    ```
    'MC_LAG|1' => {
        "local_ip": "10.0.0.2",
        "peer_ip": "10.0.0.3",
        "peer_link": "Ethernet8",
        "mclag_interface": "Ethernet12"
    }
    ```
* Unmask, enable, and start the swss and iccpd services in SONiC.
* Log in to the iccpd container and check the config file `/etc/iccpd/iccpd.conf`.
* Expected config:
    ```
    mclag_id:1
        local_ip:10.0.0.2
        peer_ip:10.0.0.3
        peer_link:Ethernet8
        mclag_interface:Ethernet12
    system_mac:YOUR_SYSTEM_MAC
    ```

#### Description for the changelog
Fixed initial iccpd startup configuration.
2021-08-05 15:21:33 +00:00
Vivek Reddy
67202cc2bb autorestart inside restapi docker is disabled (#8006)
Fix an issue where a critical process in the restapi docker restarted immediately after being killed.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-07-27 05:14:28 +00:00
Guohan Lu
bed4c26b09 Revert "Add ethtool to docker-platform-monitor (#8017)"
This reverts commit d66425dd76.
2021-07-07 23:37:28 -07:00
VenkatCisco
d66425dd76 Add ethtool to docker-platform-monitor (#8017)
#### Why I did it
ethtool can be used to query and change settings such as speed, auto-negotiation, and checksum offload on many network devices, especially Ethernet devices.

#### How I did it
add package extension to docker-platform-monitor/Dockerfile.j2
2021-07-07 09:40:11 +00:00
VenkatCisco
36d7dfbea3 Add libpci3 pkg to docker-platform-monitor (#8016)
#### Why I did it
The libpci library provides portable access to configuration registers of devices connected to the PCI bus.

#### How I did it
update dockers/docker-platform-monitor/Dockerfile.j2
2021-07-07 09:40:06 +00:00