Commit Graph

363 Commits

Author SHA1 Message Date
Samuel Angebault
77cde50541 [device/Arista] Improvements to the boot of Arista devices. (#2898)
* Fix showing systemd shutdown sequence when verbose is set

* Fix creation of kernel-cmdline file

Sometimes boot0 prints error
"mv: can't preserve ownership of '/mnt/flash/image-arsonic.xxxx/kernel-cmdline': Operation not permitted"

* Improve flash space usage during installation

Some older systems only have 2GB of flash available. Installing a second
image on these can prove to be challenging.
The new installation process moves the installer swi to memory in order
to avoid free up space from the flash before uncompressing it there.
It removes all the flash space usage spike and also improves the IO
since the installation is no more reading and writting to the flash at
the same time.

* Add support of 7060CX-32S-SSD

* 7260CX3: use inventory powerCycle procedures

* 7050QX-32S: use inventory powerCycle procedures

* 7050QX-32: use inventory powerCycle procedures

* platform: arista: add common platform_reboot

Replace platform_reboot by a link to new common for devices already
using a similar script.

* 7060CX-32S: use inventory powerCycle procedures

* Install python smbus in pmon

Some platform plugin need the python smbus library to perform some actions.
This installs the dependency.
2019-05-15 12:45:05 -07:00
Renuka Manavalan
a357693f52 [tacacs]: skip accessing tacacs servers for local non-tacacs users (#2843)
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.

* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
2019-05-09 14:36:32 -07:00
Ying Xie
9efcf1759a
[ebtables] install ebtables in base image and install filter rules (#2805)
- Add ebtables package, and install some filter rules:
  1. ebtables -A FORWARD -d BGA -j DROP
  2. ebtables -A FORWARD -p ARP -j DROP

Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-09 09:44:41 -07:00
lguohan
5fb185cd83
[docker-frr]: bring quagga docker features to frr docker (#2870)
- use superviord to manage process in frr docker
- intro separated configuration mode for frr
- bring quagga configuration template to frr.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-05-08 23:00:49 -07:00
Joe LeVeque
6eca27e564 [services] Restart SwSS service upon unexpected critical process exit (#2845)
* [service] Restart SwSS Docker container if orchagent exits unexpectedly

* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes

* Move supervisor-proc-exit-listener script

* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB

* Ensure dependent services stop/start/restart with SwSS

* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)

* Also update journald.conf options

* Remove 'PartOf' option from unit files

* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile

* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container

* Update critical_processes file for swss container
2019-05-01 08:02:38 -07:00
Joe LeVeque
2736da97c7 [sudoers] Add /usr/bin/teamshow to READ_ONLY_CMDS (#2846) 2019-05-01 08:01:44 -07:00
Ying Xie
6431248243
[db migrator] migrate the DB to latest schema when needed (#2808)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-30 14:46:18 -07:00
Qi Luo
6b3a26f0cc
Remove unused packages in docker images and host (#2807)
* Remove unneeded packages in docker images and host
* Remove libpython3.6 from snmp docker image
2019-04-29 17:21:24 -07:00
Ying Xie
c7af19a4db
[teamd service] start teamd service after swss (#2829)
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-26 15:12:33 -07:00
Andriy Moroz
ca7924eb27 Increase syncd start timeout (#2776)
* Increase syncd start timeout

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Replace TimeoutSec to TimeoutStartSec

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>
2019-04-24 17:51:26 +03:00
zhenggen-xu
75964ef243 [baseimage]: Add fstrim service and fstrim timer by default (#2804)
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-04-21 14:21:16 -07:00
Stepan Blyshchak
6a4ffef1fd [snmp.service] Make swss.service a requisite (#2790) 2019-04-16 18:32:36 -07:00
Ying Xie
8bf9247c5e
[tmpfs var/log] mount /var/log as tmpfs for some platforms (#2780)
SONiC is a heavy writer to /var/log partition, we noticed that this
behavior causes certain flash drive to become read-only over time.
To avoid this issue, we mount /var/log parition on these devices as
tmpfs.

- Mount /var/log as tmpfs
- /var/log default size is 128M
- Adjust size according to existing var-log.ext4 file size.
- Adjust size to between 5% to 10% of total memory size.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-14 22:46:26 -07:00
Ying Xie
f583f57af6
[service] add warmboot finializer service (#2715)
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-12 15:45:58 -07:00
Renuka Manavalan
6d7ecc426c [hostcfgd] -- Fix the default for failthrough as false.
This implies that by default, if TACACS is configured properly and it reported auth_err, then don't try fail through to traditional unix authentication through /etc/passwd.

If this failthrough is intended, make it explicit through "sudo config aaa authentication failthrough enable"

Removed an unused variable "aaa.fallback"

Tested manually. Note the presence of 'auth_err=die' in all cases except when failthrough is explicitly enabled.

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough default; date
Wed Apr  3 23:05:18 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1316 Apr  3 23:05 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough enable; date ; h4 "AAA|authentication"
Wed Apr  3 23:06:37 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1294 Apr  3 23:06 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough disable; date ; h4 "AAA|authentication"
Wed Apr  3 23:07:09 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1321 Apr  3 23:07 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass
2019-04-03 23:16:56 +00:00
Ying Xie
00a0f22f38
Revert "[teamd service] teamd service should start after syncd (#2724)" (#2733)
This reverts commit 0d1efb131c.
2019-04-03 08:20:44 -07:00
paavaanan
b56124bf48 removing dhcp- turn- off option from initrd (#2555)
* removing dhcp changes from initrd

* removing mgmt-intf-dhcp file
2019-04-02 15:48:04 -07:00
Ying Xie
0d1efb131c
[teamd service] teamd service should start after syncd (#2724)
* [teamd service] teamd service should start after syncd

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* combine after lines
2019-04-01 15:40:22 -07:00
Qi Luo
9c83b5480d
[security] Do not generate ssh server keys for non RSA protocols (#2718) 2019-03-29 15:27:33 -07:00
Ying Xie
698b248a13
[docker script] skip docker mount point checking for database container (#2683)
database container doesn't mount hwsku folder.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-03-19 20:14:07 -07:00
Renuka Manavalan
ae05579c67 [baseos]: Install ipaddress python package that has deprecated current ipaddr. … (#2674)
* Install ipaddress python package that has deprecated current ipaddr. ipaddress has backport to python2.7

* Install python ipaddress module as required by route_check.py sonic utility. BTW, ipaddress deprecates ipaddr and ipaddress has python2 backport

* Revert the old chaneg per review comments.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
2019-03-18 11:12:47 -07:00
Pavlo Yadvichuk
11c2e9ee3d [barefoot]: Allow configuration of platform-specific interfaces used for internal purposes (#2631)
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).

Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.

- What I did

removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it

build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog

Allow configuration of platform-specific interfaces
2019-03-09 06:22:32 -08:00
Joe LeVeque
2bb5400948 [services] Services which start containers now use 'docker wait' instead of 'docker attach' (#2661) 2019-03-08 10:59:41 -08:00
Wenda Ni
f9c9fa8ba1 [qos]: Map tc 1, 2, 5, and 6 back to pg 0 (#2650)
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.

Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-03-08 02:23:32 -08:00
Nazarii Hnydyn
b22fe37670 [mellanox]: Upgraded hw-management V.2.0.0160. (#2643)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-03-06 18:51:46 -08:00
Wenda Ni
784bf77a92 Add hook to allow customizing link cable lengths
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-03-05 22:06:00 +00:00
Ying Xie
66f5202b9f
[swss/syncd] cold start syncd service in swss in attach method (#2639)
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.

attach() is called by service start method, which is non-blocking.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-03-04 16:46:55 -08:00
RAMA CHANDRA REDDY GADDAM
b9edb7153d [aaa] Fix common-auth-sonic.j2 template issue (#2613) 2019-03-02 15:36:35 -08:00
Joe LeVeque
5eb7872a07 [services] Ensure swss and syncd services start before dependent services (#2634)
* [services] Ensure swss and syncd services start before dependent services

* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each

* Add 'After=swss.service' to syncd.service
2019-03-02 15:28:34 -08:00
yurypm
d632569a6a Add initramfs hook for Arista devices (#2595)
We are going to use initramfs hook for firmware upgrades
To install Arista hook:
- create folder /mnt/flash/<image dir>/platform/hooks/boot1/ from Aboot or
  /host/<image dir>/platform/hooks/boot1/ from Sonic
- add executable script to created folder
2019-02-27 10:28:04 -08:00
Ying Xie
3086f4f391
Revert "[baseimage] Delay ntp-config service to start after 5 minutes (#2494)" (#2590)
This reverts commit 33fe8d298e.
2019-02-21 10:04:54 -08:00
Nikos
1158277533 [frr]: staticd terminating due to inadequate permissions (#2580)
Signed-off-by: nikos <ntriantafillis@gmail.com>
2019-02-19 21:50:19 -08:00
lguohan
572db1e0a9
[swss]: flush asic db in swss.sh for non warm-boot (#2582)
need to flush asic db in swss.sh instead of syncd.sh

orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-02-19 21:48:43 -08:00
Jipan Yang
ff74daaf13 Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#2538)
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
2019-02-19 17:06:56 -08:00
Renuka Manavalan
fa7c46611e [hostcfgd]: Promote logs for update-notifications-from-DB from DEBUG to INFO (#2576)
* Add a log message for each notification of add/del TACACS server.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>

* Moved another syslog message from DEBUG to INFO to be able to see those notifications.

All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
2019-02-16 10:17:13 -08:00
Stepan Blyshchak
2dd769bf46 [syncd.sh] Don't stop sxdkernel during warm shutdown on Mellanox platform (#2572)
/etc/init.d/sxdkernel stop may take up to 15 sec which has impact on
control plane downtime

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2019-02-15 16:08:08 -08:00
Nazarii Hnydyn
d53df059d4 [devices]: Added new SN3700/SN3700C Mellanox platforms (#2548)
* [mlnx-msn3700]: Added MSN3700 platform.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Upgrade FW burn: use ASIC auto detect.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Updated HW-MGMT/FW/MFT/SAI/SDK.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Added MSN3700C platform.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-02-13 23:08:04 -08:00
Ying Xie
44551d0fb5
[swss/syncd] log swss/syncd service script activities (#2545)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-02-10 11:56:31 -08:00
zzhiyuan
6037707abc [devices]: Add device data for Arista 7060PX/DX4-32 (#2534)
* Add boot0 definition for Arista 7060PX4-32 and 7060DX4-32

* Add port configuration for Arista 7060PX4-32

* Add plugins for Arista 7060PX4-32

* Add platform_reboot for Arista 7060PX4-32

* Add Arista 7060DX4-32 as symlink of 7060PX4-32

* Add sensors configuration and fancontrol for Arista 7060PX4-32

* Update arista-driver submodules for barefoot/broadcom

* Add platform_reboot script for Alhambra

* Rook fancontrol CPLD rename
2019-02-08 22:02:01 -08:00
Nadiia Stetskovych
bb5a171ffc [minigraph]: Do not fail for minigraphs which do not have neighbors listed in <Devices> section (#2522)
Signed-off-by: Nadiya.Stetskovych <nstetskovych@barefootnetworks.com>
2019-02-04 22:43:08 -08:00
lguohan
f20665008c
[build]: put stretch debian packages under target/debs/stretch/ (#2519)
* [build]: put stretch debian packages under target/debs/stretch/

* in stretch build phase, all debian packages built in that stage are placed under target/debs/stretch directory.
* for python-based debian packages, since they are really the same for jessie and stretch, they are placed under target/python-debs directory.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-02-04 22:06:37 -08:00
zhenggen-xu
982eddfaa4 [updategraph] After system upgrade, restore files/directories with original attributes etc. (#2368)
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Removed deployment_id_asn_map.yml from copy list

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-02-02 12:50:19 -08:00
lguohan
9c2d7240ea
[vs]: Force10-S6000 buffer settings for virtual switch (#2515)
Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-02-01 11:18:02 -08:00
Prince Sunny
39e12a1d82 [swss]: Change VrfMgrd startup order, cleanup VRF_TABLE from state DB (#2510) 2019-01-31 23:28:31 -08:00
Wenda Ni
58adf06cc0 [QoS]: Link pg 2 and 6 to lossy buffer profile (#2511)
* Link pg 2 and 6 to lossy buffer profile

Signed-off-by: Wenda <wenni@microsoft.com>
2019-01-31 23:27:58 -08:00
Joe LeVeque
33fe8d298e [baseimage] Delay ntp-config service to start after 5 minutes (#2494) 2019-01-30 19:01:21 -08:00
Wenda Ni
ce9a3f0c5a [QoS]: QoS Config change for multiple devices (#2505)
* QoS config change: 1) DSCP mapping; 2) link pg/queue 6 to lossy buffer;
3) redistribute scheduler

Signed-off-by: Wenda <wenni@microsoft.com>

* Add scheduling weight to queue 2

Signed-off-by: Wenda <wenni@microsoft.com>

* Link pg/queue 2 to lossy buffer

Signed-off-by: Wenda <wenni@microsoft.com>

* Update the pg headroom for a7060-D48C8 50G

Signed-off-by: Wenda <wenni@microsoft.com>

* Update config gen test for qos

Signed-off-by: Wenda <wenni@microsoft.com>

* Update pg headroom size, and update egress lossy pool size accordingly

Signed-off-by: Wenda <wenni@microsoft.com>

* Update headroom pool size; Update ingress service pool and egress lossy
pool sizes accordingly;

Signed-off-by: Wenda <wenni@microsoft.com>

* a7260: update headroom pool size; Update ingress service pool and egress lossy pool sizes accordingly;

Signed-off-by: Wenda <wenni@microsoft.com>

* Update config gen test for buffer

Signed-off-by: Wenda <wenni@microsoft.com>
2019-01-30 19:00:13 -08:00
Joe LeVeque
39b60d2a50 [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades (#2490)
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades

* [sonic-utilities] Update submodule to include related changes
2019-01-29 03:42:19 -08:00
Joe LeVeque
8f43cad061 [rsyslog] Suppress duplicate messages from base image and all Docker containers (#2497) 2019-01-29 03:41:40 -08:00
lguohan
4ccd35bc25
[kernel]: update sonic kernel to 4.9.0-8-2 (#2468)
* [kernel]: update sonic kernel to 4.9.0-8-2

* 3b2114d 2019-01-20 | [sonic-linux-kernel] add udp_l3mdev_accept kernel upstream patch (#70) (HEAD, azure/master) [Harish Venkatraman]
* 37734aa 2019-01-10 | L3mdev cgroup (#73) [lguohan]
* d631eeb 2018-12-15 | yet another uart race condition fix (#75) [lguohan]

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* Update Mellanox SDK

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* Update arista platform driver to match 4.9.0-8-2 kernel

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-01-25 00:46:09 -08:00