Commit Graph

281 Commits

Author SHA1 Message Date
Tamer Ahmed
7bf0537c9e [redis] Add redis Group And Grant Read/Write Access to Members (#5289)
sonic-cfggen is now using Unix Domain Socket for Redis DB. The socket
is created using root account. Subsequently, services that are started
as admin fails to start. This PR creates redis group and add admin
user to redis group. It also grants read/write access on redis.sock
for redis group members.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-04 21:16:16 +00:00
Joe LeVeque
9048d7ae4d
[201911] Remove sonic-daemon-base package (#5181)
sonic-daemon-base package has been deprecated in favor of the sonic-py-common package. All related functionality has been moved there.

This is a backport of https://github.com/Azure/sonic-buildimage/pull/5131 and parts of https://github.com/Azure/sonic-buildimage/pull/5168 to the 201911 branch
2020-08-22 17:55:27 -07:00
abdosi
8437f4a6c3 Fix unwanted python exception in syslog during database container (#5227)
startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-21 07:41:32 -07:00
Tamer Ahmed
9decadfed2 [services] Fix Delay Start of SNMP And Telemetry (#5211)
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-20 16:26:08 -07:00
abdosi
f785a0a270 [caclmgrd] Add support for multi-ASIC platforms (#5022)
* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
 1) Moved from using blocking listen() on Config DB to the select() model
 via python-swsscommon since we have to wait on event from multiple
 config db's
 2) Since  python-swsscommon is not available on host added libswsscommon and python-swsscommon
    and dependent packages in the base image (host enviroment)
 3) Made iptables programmed in all namespace using ip netns exec

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Fix Comments

* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Added more comments on logic.

* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.

* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Verified with swsscommon package. Fix issue for single asic platforms.

* Moved to new python package

* Address Review Comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.
2020-08-20 16:01:12 -07:00
Mahesh Maddikayala
42ab5d43fa + Modified buffer config template to include internal ASIC with 5m ca… (#4959)
+ Modified buffer config template to include internal ASIC with 5m cable length
+ added unit test to verify the changes.
2020-08-19 15:07:44 -07:00
Prince Sunny
9ad1971c78 Enable restapi, update sonic-restapi (#5169)
* Enable restapi if included in image
* [Submodule update] sonic-restapi
2020-08-14 09:22:09 -07:00
Abhishek Dosi
b19dab4ba5 As part of commit 78c803851c
that cherry-pick PR#5081 ICCPD feature got include din 201911
which is not needed so removed it back

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-09 12:00:09 -07:00
lguohan
78c803851c [build]: combine feature and container feature table (#5081)
1. remove container feature table
2. do not generate feature entry if the feature is not included
   in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities

* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-09 11:55:40 -07:00
Sujin Kang
ff6cb6c402 Add disabling HW watchdog during boot for fast-reboot and warm-reboot (#4927)
* Add disabling HW watchdog during boot for fast-reboot and warm-reboot case

* typo
2020-08-09 11:25:31 -07:00
yozhao101
517592afb8 [dockers] Default container autorestart feature to "enabled" for all except database (#4853)
Set the default auto_restart state to "enabled" in init_cfg.json for all containers except database

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-08-09 10:50:39 -07:00
isabelmsft
c56ddf0dba [Kubernetes Setup] Remove flannel, kube-proxy images (#5098)
Removes installation of kube-proxy (117 MB) and flannel (53 MB) images from Kubernetes-enabled devices. These images are tested to be unnecessary for our use case, as we do not rely on ClusterIPs for Kubernetes Services or a CNI for pod networking.
2020-08-09 10:48:59 -07:00
rkdevi27
f1bbda19f0 Fix "/host unmount failure" during reboot (#4558) 2020-08-09 10:34:02 -07:00
Joe LeVeque
6556c40040
[201911] Introduce sonic-py-common package (#5063)
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.

The package currently includes four modules:
- daemon_base
- device_info
- logger
- task_base

NOTE: This is a combination of all changes from https://github.com/Azure/sonic-buildimage/pull/5003, https://github.com/Azure/sonic-buildimage/pull/5049 and some changes from https://github.com/Azure/sonic-buildimage/pull/5043 backported to align with the 201911 branch. As part of the 201911 port, I am not installing the Python 3 package in the base image or in the VS container, because we do not have pip3 installed, and we do not intend to migrate to Python 3 in 201911.
2020-08-03 11:50:06 -07:00
Nazarii Hnydyn
4e558bca25
[201911][Mellanox] Update MFT to 4.15.0-104 (#5077)
* [Mellanox] Update MFT to 4.15.0-104.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Remove build system W/A.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Add MFT DKMS build support.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-08-03 13:53:33 +03:00
Stepan Blyshchak
b100ec559e [services] remove swss from WantedBy for nat service (#4991)
Otherwise, it may cause issues for warm restarts, warm reboot.
Warm restart of swss will start nat which is not expected for warm
restart. Also it is observed that during warm-reboot script execution
nat container gets started after it was killed. This causes removal of
nat dump generated by nat previously:

A check [ -f /host/warmboot/nat/nat_entries.dump ] || echo "NAT dump
does not exists" was added right before kexec:

```
Fri Jul 17 10:47:16 UTC 2020 Prepare MLNX ASIC to fastfast-reboot:
install new FW if required
Fri Jul 17 10:47:18 UTC 2020 Pausing orchagent ...
Fri Jul 17 10:47:18 UTC 2020 Stopping nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopped nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopping radv ...
Fri Jul 17 10:47:19 UTC 2020 Stopping bgp ...
Fri Jul 17 10:47:19 UTC 2020 Stopped bgp ...
Fri Jul 17 10:47:21 UTC 2020 Initialize pre-shutdown ...
Fri Jul 17 10:47:21 UTC 2020 Requesting pre-shutdown ...
Fri Jul 17 10:47:22 UTC 2020 Waiting for pre-shutdown ...
Fri Jul 17 10:47:24 UTC 2020 Pre-shutdown succeeded ...
Fri Jul 17 10:47:24 UTC 2020 Backing up database ...
Fri Jul 17 10:47:25 UTC 2020 Stopping teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopped teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopping syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopped syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopping all remaining containers ...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Fri Jul 17 10:47:37 UTC 2020 Stopped all remaining containers ...
NAT dump does not exists
Fri Jul 17 10:47:39 UTC 2020 Rebooting with /sbin/kexec -e to
SONiC-OS-201911.140-08245093 ...
```

With this change, executed warm-reboot 10 times without hitting this
issue, while without this change the issue is easily reproducible almost
every warm-reboot run.

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2020-07-26 11:11:35 -07:00
arlakshm
7c699df654 Add support for bcmsh and bcmcmd utlitites in multi ASIC devices (#4926)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes done
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms
2020-07-11 09:47:24 -07:00
abdosi
4869fa7173 [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838)
* [sonic-buildimage] Changes to make network specific sysctl
common for both host and docker namespace (in multi-npu).

This change is triggered with issue found in multi-npu platforms
where in docker namespace
net.ipv6.conf.all.forwarding was 0 (should be 1) because of
which RS/RA message were triggered and link-local router were learnt.

Beside this there were some other sysctl.net.ipv6* params whose value
in docker namespace is not same as host namespace.

So to make we are always in sync in host and docker namespace
created common file that list all sysctl.net.* params and used
both by host and docker namespace. Any change will get applied
to both namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments and made sure to invoke augtool
only one and do string concatenation of all set commands

* Address Review Comments.
2020-07-05 15:32:30 -07:00
SuvarnaMeenakshi
fad2d47421 [systemd-generator]: Fix dependency update for multi-asic platform (#4820)
* [systemd-generator]: Fix the code to make sure that dependencies
of host services are generated correctly for multi-asic platforms.
Add code to make sure that systemd timer files are also modified
to add the correct service dependency for multi-asic platforms.

Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>

* [systemd-generator]: Minor fix, remove debug code and
remove unused variable.
2020-07-05 15:26:07 -07:00
abdosi
75d5e30f07 Changes to make default route programming correct in multi-npu platforms (#4774)
* Changes to make default route programming
correct in multi-asic platform where frr is not running
in host namespace. Change is to set correct administrative distance.
Also make NAMESPACE* enviroment variable available for all dockers
so that it can be used when needed.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix review comments

* Review comment to check to add default route
only if default route exist and delete is successful.
2020-07-05 15:21:09 -07:00
arlakshm
b6b1f3fac8 syslog changes Multi ASIC platforms (#4738)
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-07-05 15:19:22 -07:00
Joe LeVeque
5df5015835 [build][systemd] Mask disabled services by default (#4721)
When building the SONiC image, used systemd to mask all services which are set to "disabled" in init_cfg.json.

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph will fail when trying to restart disabled services.
2020-06-28 07:28:56 -07:00
abdosi
c2981b8cdf [build] Ensure /usr/lib/systemd/system/ directory exists before referencing (#4788)
* Fix the Build on 201911 (Stretch) where the directory
/usr/lib/systemd/system/ does not exist so creating
manually. Change should not harm Master (buster) where
the directory is created by Linux

* Fix as per review comments
2020-06-17 09:59:53 -07:00
Renuka Manavalan
f8a9a1b805 [k8s]: switching to Flannel from Calico. (#4768)
Switching to Flannel from Calico which brings down the image size by around 500+MB.
2020-06-16 08:18:54 -07:00
Joe LeVeque
c625e0e3e6 [build] Enable telemetry service by default (#4760)
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image

**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
2020-06-16 08:17:47 -07:00
Joe LeVeque
42bc14f44c [systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673)
This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation
2020-06-16 08:12:47 -07:00
abdosi
9ea746e25f Changes for LLDP docker to support multi-npu platforms (#4530)
* Changes for LLDP for Multi NPU Platoforms:-
a) Enable LLDP for Host namespace for Management Port
b) Make sure Management IP is avaliable in per asic namespace
   needed for LLDP Chassis configuration
c) Make sure chassis mac-address is correct in per asic namespace
d) Do not run lldp on eth0 of per asic namespace and avoid chassis
   configuration for same
e) Use Linux hostname instead from Device Metadata for lldp chassis
   configuration since in multi-npu platforms device metadata hostname
   will be differnt

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comment with following changes:
a) Use Device Metadata hostname even in per namespace conatiner.
   updated minigraph parsing for same to have hostname as system
   hostname and add new key for asic name

b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace
   config also as needed for LLDP for setting chassis management IP.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments
2020-05-20 07:51:49 -07:00
lguohan
710d176162 [baseimage]: pin down package version for azure-storage, watchdog and futures (#4575)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-05-12 06:19:05 +00:00
judyjoseph
c808640f4e Multi DB with namespace support, Introducing the database_global.json… (#4477)
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host

* Updates based on comments

* Adding the j2 templates for database_config and database_global files.

* Updating to retrieve the redis DIR's to be mounted from database_global.json file.

* Additional check to see if asic.conf file exists before sourcing it.

* Updates based on PR comments discussion.

* Review comments update

* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.

* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.

* Removing the database_config.json file to avioid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.

* Update the comments for sudo usage in docker_image_ctrl.j2

* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.

* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli

* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
2020-05-09 21:33:07 -07:00
Dong Zhang
3faa4e936e [MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541) 2020-05-09 18:16:48 -07:00
Akhilesh Samineni
3be7c5786b [NAT] : Removed requires dependency on swss (#4551)
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
2020-05-09 18:16:02 -07:00
Neetha John
596bec1b32 [qos]: Alpha and ECN settings change for Th (#4564)
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices

Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present

Signed-off-by: Neetha John <nejo@microsoft.com>
2020-05-09 18:13:10 -07:00
arlakshm
542f722055 [docker]: Enabled ipv6 in dockers when using docker bridge network (#4426)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-04-27 08:50:23 -07:00
Renuka Manavalan
9b017a83b5
[baseimage]: Install Kubernetes packages if enabled in image (#4374) (#4432)
Install kubeadm, which transparently installs kubelet & kubectl
As well download required Kubernetes images required to run as kubernetes node.
The kubelet service is intentionally kept in disabled state, as it would otherwise
continuously restart wasting resources, until join to master.
2020-04-16 21:54:45 -07:00
SuvarnaMeenakshi
0099305475 Multi-ASIC implementation (#3888)
Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.
2020-04-15 13:08:34 -07:00
Nazarii Hnydyn
0b35fcf3bf [mellanox]: Add SSD FW update tool (#4351)
* [mellanox]: Add SSD FW update tool.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Align Platform API.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Fix firmware description.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Update SSD tool.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-04-15 13:02:36 -07:00
Abhishek Dosi
249265ad99 Revert "Multi-ASIC implementation (#3888)"
This reverts commit 2e87a16941.
2020-04-03 14:34:38 -07:00
SuvarnaMeenakshi
2e87a16941 Multi-ASIC implementation (#3888)
Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.
2020-04-01 23:21:49 -07:00
Kebo Liu
2fd1641feb copy spc3 fw file to image (#4328) 2020-03-29 22:48:10 -07:00
Stepan Blyshchak
ee84dca683 [docker_image_ctl.j2] Share UTS namespace with host OS (#4169)
Instead of updating hostname manualy on Config DB hostname change,
simply share containers UTS namespace with host OS.
Ideally, instead of setting `--uts=host` for every container in SONiC,
this setting can be set per container if feature requires.
One behaviour change is introduced in this commit, when `--privileged`
or `--cap-add=CAP_SYS_ADMIN` and `--uts=host` are combined, container
has privilege to change host OS and every other container hostname.
Such privilege should be fixed by limiting containers capabilities.

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2020-03-22 23:04:02 -07:00
Andriy Kokhan
39889a3c35 [Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247. (#4250)
* [Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247.

Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>
2020-03-19 22:18:13 -07:00
Joe LeVeque
8e36068237 [sonic-cfggen] Loading the configuration from init_cfg.json and then from config_db.json (#4148) 2020-03-15 08:54:05 -07:00
Olivier Singla
a8baca0d6e [kernel]: security kernel update to 4.9.189 (#3913)
This patch upgrade the kernel from version
4.9.0-9-2 (4.9.168-1+deb9u3) to 4.9.0-11-2 (4.9.189-3+deb9u2)

Co-authored-by: rajendra-dendukuri <47423477+rajendra-dendukuri@users.noreply.github.com>
2020-03-15 08:52:29 -07:00
Joe LeVeque
102cb83097 [Services] Restart NAT service upon unexpected critical process exit. (#4208) 2020-03-14 18:03:29 -07:00
Stephen Sun
c700127101 [Mellanox]Take advantage of sdk variable to customize the location where sdk_socket exists. (#4223)
Take advantage of an SDK environment variable to customize the location where sdk_socket exists.
In the latest SDK sdk_socket has been moved from /tmp to /var/run which is a better place to contain this kind of file.
However, this prevents the subdirs under /var/run from being mapped to different volumes. To resolve this, we take advantage of an SDK variable to designate the location of sdk_socket.
This requires every process that requires to access sdk_socket have this environment variable defined. However, to define environment variable for each process is less scalable. We take advantage of the docker scope environment variable to avoid that.
It depends on PR 4227
2020-03-14 18:02:43 -07:00
rajendra-dendukuri
8581a52571 ZTP infrastructure changes to support DHCP discovery provisioning data (#3298)
* ZTP infrastructure changes to support DHCP discovery provisioning data

- Dynamically generate DHCP client configuration based on current ZTP state
- Added support to request and process hostname when using DHCPv6
- Do not process graphservice url dhcp option if ZTP is enabled, ZTP service
will process it
- Generate /e/n/i file with all active interfaces seeking address assignment
via DHCP. Only interfaces that are created in Linux will be added to /e/n/i.
Also DHCP is started only on linked up in-band interfaces.

Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
2020-03-03 22:23:59 -08:00
yozhao101
5c8c4b2a50 [Services] Restart BGP service upon unexpected critical process exit. (#4207) 2020-03-03 19:19:44 -08:00
rajendra-dendukuri
1edb69647e [sonic-ztp]: Build sonic-ztp package (#3299)
* Build sonic-ztp package

- Add changes in make rules to conditionally include sonic-ztp package

Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
2020-02-24 14:27:24 -08:00
Stepan Blyshchak
398929c622 [mgmt-framework] start after syncd (#4174)
every service starts after syncd to start the most critical parts first

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2020-02-24 11:04:51 -08:00
Prince Sunny
6740b2d3df Fix service and container name to be same (#4151) 2020-02-24 10:24:11 -08:00