Commit Graph

589 Commits

Author SHA1 Message Date
Baptiste Covolato
a35609faaf [arista/aboot]: Zero out 1st MB before repartitioning (#5220)
The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97f1e. On systems that are misaligned before conversion
(partition start is the first sector), the relic of the old partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.

Zeroing this old relic makes sure that nothing of the old partition is
left lying around. There is then no risk of Aboot corrupting the new
filesystem because of the old relic.

Signed-off-by: Baptiste Covolato <baptiste@arista.com>
2020-08-22 18:48:57 -07:00
rkdevi27
80dc2b71d1 [baseimage]: /host unmount timeout issue during reboot. (#5032)
The fix for the host unmount issue in PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 causes a timeout on syslog.socket closure during reboot, because the journald socket closure was included in syslog.socket.

Removed the journal socket closure. The host unmount is fixed by just stopping the services, which get restarted only after the /var/log unmount and therefore do not cause the unmount issues.
2020-07-25 08:31:05 +00:00
Joe LeVeque
210dc90d0d [caclmgrd] Filter DHCP packets based on dest port only (#4995) 2020-07-21 10:15:08 +00:00
arlakshm
40e37f385e syslog changes Multi ASIC platforms (#4738)
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On multi ASIC platforms:

- The rsyslog service runs only on the host. There is no rsyslog service running in each namespace.
- The rsyslog service on the host listens on the docker0 IP address instead of the loopback address.
- The rsyslog.conf in the containers is modified so that the omfwd target IP is the docker0 IP address instead of the loopback IP.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-07-12 18:16:44 +00:00
abdosi
15440b6e43
Changes to make default route programming correct in multi-npu platforms (#4774)
* Changes to make default route programming
correct in multi-asic platform where frr is not running
in host namespace. Change is to set correct administrative distance.
Also make NAMESPACE* environment variable available for all dockers
so that it can be used when needed.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix review comments

* Review comment to check to add default route
only if default route exists and delete is successful.
2020-06-29 11:38:46 -07:00
SuvarnaMeenakshi
ab2177b4a9
[systemd-generator]: Fix dependency update for multi-asic platform (#4820)
* [systemd-generator]: Fix the code to make sure that dependencies
of host services are generated correctly for multi-asic platforms.
Add code to make sure that systemd timer files are also modified
to add the correct service dependency for multi-asic platforms.

Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>

* [systemd-generator]: Minor fix, remove debug code and
remove unused variable.
2020-06-29 09:39:23 -07:00
Praveen Chaudhary
07930c39ba
[build] Add essential PY PKGs on host for sonic-utilities/config/config_mgmt.py (#4740)
Add essential PY PKGs on host by installing them in sonic_debian_extension.j2

Signed-off-by: Praveen Chaudhary pchaudhary@linkedin.com
2020-06-28 11:03:48 -07:00
Qi Luo
6849a0351c
[redis] Install vanilla redis packages for Buster and Stretch; upgrade Buster to 6.0.5 (#4732)
upgrade redis server to 5:6.0.5-1~bpo10+1
2020-06-27 01:17:20 -07:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contained either the name of a critical process or the name of a group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical
processes and test whether a container can be restarted correctly if one of its critical processes is
killed. However, it is difficult to tell whether the names in the critical_processes file are
critical processes or group names. Changing the syntax in this file separates the individual processes from the groups and also makes the distinction clear to the user.

Right now the critical_processes file contains two different kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes
managed by supervisord using the name "xxx". I also updated the logic that
parses the critical_processes file in the supervisor-proc-event-listener script.
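
A minimal sketch of how the two entry kinds could be told apart when the file is parsed. This is not the actual supervisor-proc-event-listener code; the function name `parse_critical_processes` and the sample entry `program:example-daemon` are hypothetical.

```python
def parse_critical_processes(lines):
    """Split critical_processes entries into program names and group names.

    Hypothetical helper: assumes each non-blank line has the form
    "program:<name>" or "group:<name>", as described in the commit message.
    """
    programs, groups = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError("malformed entry: %r" % raw)
        if kind == "program":
            programs.append(name)
        elif kind == "group":
            groups.append(name)
        else:
            raise ValueError("unknown entry kind: %r" % raw)
    return programs, groups


# Sample input; "example-daemon" is made up, the group name is from the text.
programs, groups = parse_critical_processes(
    ["program:example-daemon", "group:isc-dhcp-relay"]
)
```

Rejecting lines without a recognized prefix makes an old-style file (bare names) fail loudly instead of being silently misread.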

**- How to verify it**
First, enable the autorestart feature of a specific container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then select a critical process from the output of `docker top dhcp_relay` and kill it with `sudo kill -SIGKILL <pid>`. The final step is to check whether the container is restarted correctly.
2020-06-25 21:18:21 -07:00
Qi Luo
719c8e68c8
[secureboot] only remove exec bit in secureboot (#4836)
Address issue #4832
2020-06-25 10:07:50 -07:00
Joe LeVeque
63d2efbe03
[build][systemd] Mask disabled services by default (#4721)
When building the SONiC image, use systemd to mask all services which are set to "disabled" in init_cfg.json.

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
2020-06-24 15:25:16 -07:00
Samuel Angebault
f7d43173a2 [secureboot] only remove exec bit in secureboot
Address issue #4832
2020-06-23 11:34:07 -07:00
Samuel Angebault
67987e9c0e
[secureboot] Add secureboot support for Arista devices (#4741)
* Add secureboot support in boot0
* Initramfs changes for secureboot on Aboot devices
* Do not compress squashfs and gz in fs.zip
It doesn't make much sense to do so since these files are already
compressed.
Also not compressing the squashfs has the advantage of making it
mountable via a loop device.
* Add loopoffset parameter to initramfs-tools
2020-06-22 09:30:31 -07:00
Kebo Liu
2b568ec136
Add with_i2cdev for mst start to have I2C device loaded properly (#4790) 2020-06-21 16:27:05 +03:00
Joe LeVeque
4d2d95e8e6
[hostcfgd] Synchronize all feature statuses once upon start (#4714)
- Ensure all features (services) are in the configured state when hostcfgd starts
- Better functionalization of code
- Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword.
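
As a small illustration of the last point, the `in` keyword replaces `dict.has_key()`, which exists only in Python 2. The dictionary below is made-up sample data, not the real TACACS table contents.

```python
# Hypothetical sample entry; the field names are illustrative only.
tacacs_entry = {"address": "10.0.0.1", "priority": "1"}

# Python 2 only (removed in Python 3): tacacs_entry.has_key("address")
# Works in both Python 2 and Python 3:
has_address = "address" in tacacs_entry
has_timeout = "timeout" in tacacs_entry
```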

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
2020-06-20 12:09:29 -07:00
padmanarayana
95e3cda5da
[DELL]: FTOS to SONiC fast conversion fixes (#4807)
While migrating to SONiC 20181130, identified a couple of issues:
1. union-mount needs /host/machine.conf parameters for vendor-specific checks; however, in the case of migration, /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127.
2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.
2020-06-19 11:02:08 -07:00
Joe LeVeque
1f8a78cef1
[build] No longer install Python 'click-default-group' package (#4811)
All dependencies upon the Python 'click-default-group' package have been removed from sonic-utilities as of https://github.com/Azure/sonic-utilities/pull/903. The submodule was updated to include this patch as of https://github.com/Azure/sonic-buildimage/pull/4601, therefore we no longer need to install this package in the SONiC image.
2020-06-19 10:54:10 -07:00
Joe LeVeque
6960477cc2
[caclmgrd] Don't limit connection tracking to TCP (#4796)
Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.
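
The difference can be sketched as two argument lists for `iptables`. This is an illustration under assumptions, not the caclmgrd source: the helper name `conntrack_accept_rule` is hypothetical, and the exact chain and state list caclmgrd uses may differ.

```python
def conntrack_accept_rule(tcp_only=False):
    """Build an iptables argument list that accepts tracked reply traffic.

    Hypothetical helper. With tcp_only=True the rule is restricted to TCP,
    so replies to UDP requests (such as NTP) would not match it.
    """
    rule = ["iptables", "-A", "INPUT"]
    if tcp_only:
        rule += ["-p", "tcp"]
    rule += ["-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED",
             "-j", "ACCEPT"]
    return rule


old_rule = conntrack_accept_rule(tcp_only=True)  # TCP-limited form
new_rule = conntrack_accept_rule()               # protocol-agnostic form
```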
2020-06-18 00:18:20 -07:00
abdosi
30d7ce0004
[build] Ensure /usr/lib/systemd/system/ directory exists before referencing (#4788)
* Fix the build on 201911 (Stretch), where the directory
/usr/lib/systemd/system/ does not exist, by creating it
manually. The change should not harm master (Buster), where
the directory is created by Linux

* Fix as per review comments
2020-06-17 09:16:58 -07:00
xumia
76a395cdbf
[secure boot] Support rw files allowlist (#4585)
* Support rw files allowlist for Sonic Secure Boot
* Improve the performance
* fix bug
* Move the config description into a md file
* Change to use a simple way to remove the blank line
* Support chmod a-x in rw folder
* Change function name
* Change some unnecessary words
2020-06-13 00:10:13 -07:00
Renuka Manavalan
edeb40ffcf
[k8s]: switching to Flannel from Calico. (#4768)
Switching to Flannel from Calico, which brings down the image size by more than 500 MB.
2020-06-12 18:06:08 -07:00
Joe LeVeque
4e482c16ba
[build] Enable telemetry service by default (#4760)
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image

**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
2020-06-12 16:20:31 -07:00
Ying Xie
ae7bf3db52
[ntp] disable ntp long jump (#4748)
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-06-11 13:01:21 -07:00
yozhao101
4ea2e5e6dc
[docker-syncd] Add timeout to force stop syncd container (#4617)
**- Why I did it**
When I tested the auto-restart feature of the swss container by manually killing one of its critical processes, swss was stopped. The syncd container, as the peer container, should then also be
stopped. However, I found that the syncd container is sometimes stopped and sometimes
not. The reason the syncd container cannot be stopped is that the process
(/usr/local/bin/syncd.sh stop) executing the stop() function gets stuck between lines 164-167. Systemd waits for 90 seconds and then kills this process.

164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done

The first thing I did was to profile how long this while loop spins when the syncd container
stops normally after the swss container is stopped. The result is 5 or 6 seconds. If the syncd
container stops normally, two messages are written to syslog:

str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134

The second thing I did was to add a timer to the condition of the while loop so that the loop is forced to exit after 20 seconds.

After that, testing shows that the syncd container stops normally if swss is stopped
first. One more thing worth mentioning: if the syncd container stops within 5 or 6 seconds, the two log messages still appear in syslog. However, if the while loop
runs longer than 20 seconds and is forced to exit, the syncd container is stopped but I did not see these two messages in syslog. Further, although I observed that the auto-restart feature of the swss container works correctly now, I cannot be sure that the issue of the syncd container failing to stop will not occur in the future.

**- How I did it**
I added a timer around the while loop in the stop() function. The while loop exits after spinning
for 20 seconds.
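
The bounded wait described above can be sketched as follows. The real change is in the syncd.sh shell script; this Python version only illustrates the technique, and the name `wait_until_stopped` and its parameters are hypothetical.

```python
import time


def wait_until_stopped(is_running, timeout=20.0, interval=0.1):
    """Poll is_running() until it returns False or the timeout elapses.

    Returns True if the process stopped on its own, False if the wait
    was cut short by the timeout. is_running is any zero-argument
    callable, standing in for the `docker top ... | grep` check.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False  # give up instead of spinning forever
        time.sleep(interval)
    return True


# Usage with stand-in checks instead of a real container query.
stopped_cleanly = wait_until_stopped(lambda: False)
timed_out = not wait_until_stopped(lambda: True, timeout=0.05, interval=0.01)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by wall-clock adjustments.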

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-06-04 15:17:28 -07:00
Joe LeVeque
7b8037770d
[caclmgrd] Get first VLAN host IP address via next() (#4685)
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.

This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
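
A minimal sketch of the approach using the standard `ipaddress` module under Python 3. The helper name `first_host` and the sample prefixes are illustrative and not taken from caclmgrd.

```python
import ipaddress


def first_host(prefix):
    """Return the first usable host address of a network prefix.

    Takes only the first item from hosts() rather than materializing
    every host with list(), which is impractical for a large IPv6 network.
    """
    network = ipaddress.ip_network(prefix, strict=False)
    # iter() is a no-op on a generator and also copes with Python versions
    # that return a list from hosts() for very small networks.
    return next(iter(network.hosts()))


v4_gateway = first_host("192.168.0.0/24")
v6_gateway = first_host("2001:db8::/64")
```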
2020-06-02 02:11:21 -07:00
Joe LeVeque
eff8a89523
[hostcfgd] Get service enable/disable feature working (#4676)
Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here:

1. Fix indenting such that the handling of each key actually occurs in the `for key in status_data.keys():` loop
2. Add calls to `sudo systemctl mask` and `sudo systemctl unmask` as appropriate to ensure changes persist across reboots
3. Substitute `return`s with `continue`s so that even if one service fails, we still try to handle the others
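
The three points can be sketched together as below. This is not the hostcfgd source: the function name `apply_feature_states`, the injected `run` callable, the "status" field name, and the exact systemctl command sequence are assumptions made for illustration.

```python
def apply_feature_states(status_data, run):
    """Apply each feature's configured state; return the names that failed.

    run is any callable that executes one command (an argument list) and
    raises on failure, for example subprocess.check_call.
    """
    failed = []
    for key in status_data.keys():
        unit = key + ".service"
        status = status_data[key].get("status")
        if status == "enabled":
            commands = [["sudo", "systemctl", "unmask", unit],
                        ["sudo", "systemctl", "enable", unit],
                        ["sudo", "systemctl", "start", unit]]
        elif status == "disabled":
            commands = [["sudo", "systemctl", "stop", unit],
                        ["sudo", "systemctl", "disable", unit],
                        ["sudo", "systemctl", "mask", unit]]
        else:
            failed.append(key)
            continue  # unknown state: skip this feature, keep going
        try:
            for cmd in commands:
                run(cmd)
        except Exception:
            failed.append(key)
            continue  # a return here would skip the remaining features
    return failed
```

Passing `run` in as a parameter keeps the sketch testable without touching systemd; in real use it could be `subprocess.check_call`.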

Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.
2020-06-02 02:07:22 -07:00
Joe LeVeque
1e369b0998
[systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673)
This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation
2020-05-30 13:46:44 -07:00
Qi Luo
65e7a84509
[baseimage]: Build and install redis-dump-load Python 3 package in host image (#4661)
Fix #4656
2020-05-30 05:52:27 -07:00
Samuel Angebault
d35a8a3800
[arista]: Add SmartsvilleDDBK and SmartsvilleBkMs (#4662)
Co-authored-by: Boyang Yu <byu@arista.com>
2020-05-28 14:59:00 -07:00
taocy
4cd36175ce arm arch: 1. install required libraries; 2. umount /proc after dockerfs. 2020-05-25 13:15:19 +00:00
taocy
ea2dd9541d change image apt source list from stretch to buster for arm 2020-05-25 13:15:19 +00:00
Praveen Chaudhary
0ccdd70671
[sonic-yang-mgmt]: sonic-yang-mgmt package for configuration validation. (#3861)
**- What I did**

#### wheel package Makefiles

- wheel package Makefiles for sonic-yang-mgmt package.

#### libyang Python APIs:
- python APIs based on libyang
- functions to load/merge yang models and Yang data files
- function to validate data trees based on Yang models
- functions to merge yang data files/trees
- add/set/delete node in schema and data trees
- find data/schema nodes from xpath from the Yang data/schema tree in memory
- find dependencies
- dump the data tree in json/xml

#### Extension of libyang Python APIs:
- Crop input config based on Yang model.
- Translate input config based on Yang model.
- Reverse-translate input config based on Yang model.
- Find xpath of port, portleaf and a yang list.
- During deletion, find whether a node is the key of a list; if yes, delete the parent.

Signed-off-by: Praveen Chaudhary pchaudhary@linkedin.com
Signed-off-by: Ping Mao pmao@linkedin.com
2020-05-21 16:27:57 -07:00
simonJi2018
0b6253baa1
[platform/nephos] Optimize the code to reduce changes due to the kernel upgrade (#4332)
- Bug fix: fixed an issue in which the nps ko file was not loaded due to the wrong service file name
- Optimize the code to reduce changes due to the kernel upgrade
- Remove the nephos ko file load from swss.service.j2 because it is already loaded in syncd.service.j2
2020-05-21 02:21:07 -07:00
anand-kumar-subramanian
34586032dc
[mgmt-framework] removed requires dependency on swss (#4548)
fixes #4473
2020-05-20 20:47:09 -07:00
Joe LeVeque
bce42a7595
[caclmgrd] Allow more ICMP types (#4625) 2020-05-20 17:45:07 -07:00
abdosi
a44fc07e78
Changes to support config-setup service for multi-npu (#4609)
* Changes to support config-setup service for multi-npu
platforms. For multi-npu we do not support config
initialization and ZTP as of now. It supports creating the
config db from minigraph or using the config db from the previous
file system

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Address Review comments

* Address Review Comments of using python based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also while running from shell we will skip stop/start services
done by those commands.

* Updated to use python command so no code duplication.
2020-05-20 16:32:33 -07:00
rkdevi27
32f58b5864
Fix "/host unmount failure" during reboot (#4558) 2020-05-20 11:18:11 -07:00
Ying Xie
cdfb1ced44
[ntp] enable/disable NTP long jump according to reboot type (#4577)
* [ntp] enable/disable NTP long jump according to reboot type

- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warm/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* fix typo

* further refactoring

* use sonic-db-cli instead
2020-05-20 10:57:21 -07:00
rajendra-dendukuri
9c7105b5f3
Install swsssdk-py3 in the base Debian image for python3 based apps (#4542)
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
2020-05-19 11:15:05 -07:00
Joe LeVeque
5150e7b655
[caclmgrd] Ignore keys in interface-related tables if no IP prefix is present (#4581)
Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.
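
A minimal sketch of that filtering. It assumes that keys carrying an IP prefix arrive as `(interface, prefix)` tuples while interface-only keys arrive as plain strings; the helper name `keys_with_ip_prefix` and the sample keys are hypothetical.

```python
def keys_with_ip_prefix(table_keys):
    """Return only the keys that carry an IP prefix.

    Assumption: a key with a prefix is a 2-tuple (interface, prefix);
    a key holding only the interface name is a plain string.
    """
    result = []
    for key in table_keys:
        if isinstance(key, tuple) and len(key) == 2:
            result.append(key)
        # interface-only entries are ignored
    return result


sample_keys = ["Ethernet0", ("Ethernet0", "10.0.0.0/31")]
prefixed = keys_with_ip_prefix(sample_keys)
```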
2020-05-12 18:16:55 -07:00
abdosi
5fe2216ea3
Fix for issue where image is compiled with flag ENABLE_DHCP_GRAPH_SERVICE (#4573)
and then we load the image and reboot: even if there was an existing
config_db.json, we look for the DHCP service. We should disable
update_graph in such cases. This behaviour is similar to what we have in
the 201811 image.
2020-05-12 14:49:56 -07:00
lguohan
1066f238ba
[baseimage]: pin down package version for azure-storage, watchdog and futures (#4575)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-05-11 23:17:47 -07:00
Joe LeVeque
5e8e0d76fc
[caclmgrd] Add some default ACCEPT rules and lastly drop all incoming packets (#4412)
Modified caclmgrd behavior to enhance control plane security as follows:

Upon starting or receiving notification of ACL table/rule changes in Config DB:
1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions
2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute
3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute
4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages
5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets
6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets
7. Add iptables/ip6tables commands to allow all incoming BGP traffic
8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP)
9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured)
10. Add iptables rules to drop all packets destined for loopback interface IP addresses
11. Add iptables rules to drop all packets destined for management interface IP addresses
12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses
13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses
14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute)
15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets
2020-05-11 12:36:47 -07:00
abdosi
a96f9ecee9
Changes for LLDP docker to support multi-npu platforms (#4530)
* Changes for LLDP for Multi NPU Platforms:
a) Enable LLDP for Host namespace for Management Port
b) Make sure Management IP is available in per asic namespace
   needed for LLDP Chassis configuration
c) Make sure chassis mac-address is correct in per asic namespace
d) Do not run lldp on eth0 of per asic namespace and avoid chassis
   configuration for same
e) Use Linux hostname instead of the one from Device Metadata for lldp chassis
   configuration since in multi-npu platforms device metadata hostname
   will be different

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comment with following changes:
a) Use Device Metadata hostname even in per namespace container.
   Updated minigraph parsing for same to have hostname as system
   hostname and add new key for asic name

b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace
   config also as needed for LLDP for setting chassis management IP.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments
2020-05-11 11:05:44 -07:00
Neetha John
286aa35ac6
[qos]: Alpha and ECN settings change for Th (#4564)
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices

Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present

Signed-off-by: Neetha John <nejo@microsoft.com>
2020-05-09 11:21:18 -07:00
judyjoseph
acf465b43b
Multi DB with namespace support, Introducing the database_global.json… (#4477)
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host

* Updates based on comments

* Adding the j2 templates for database_config and database_global files.

* Updating to retrieve the redis DIR's to be mounted from database_global.json file.

* Additional check to see if asic.conf file exists before sourcing it.

* Updates based on PR comments discussion.

* Review comments update

* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.

* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.

* Removing the database_config.json file to avoid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.

* Update the comments for sudo usage in docker_image_ctrl.j2

* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.

* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli

* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
2020-05-08 21:24:05 -07:00
Akhilesh Samineni
86627dfd35
[NAT] : Removed requires dependency on swss (#4551)
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
2020-05-08 00:01:48 -07:00
Joe LeVeque
dfdd94d8ad
[process-reboot-cause] If software reboot cause is unknown add note if first boot into new image (#4538) 2020-05-06 22:48:33 -07:00
wangshengjun
bed4a799df
[ebtables]add the filter rule for ARP packets with vlan tag: (#3945)
1. ebtables -t filter -A FORWARD -p 802_1Q --vlan-encap 0806 -j DROP
The ARP packet with vlan tag can't match the default rule.

Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>
2020-05-06 20:03:09 -07:00
Dong Zhang
340cf826a6
[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541) 2020-05-06 15:41:28 -07:00