114 Commits

Author SHA1 Message Date
Renuka Manavalan
76bf5a0bc4 [build]: Added debug symbols to many debug dockers. (#3098)
* Added debug symbols to many debug dockers.

* For debug images *only*:
1) Archive source files into debug image
2) Archived source is copied into /src
3) Created an empty dir /debug
4) Mount both /src as ro & /debug as rw into every docker
5) Login banner will give some details on /src & /debug
6) Devs can copy core file into /debug and view it from inside a container.
7) Dev may create all gdb logs and other data directly into /debug.

* Dropped redundant REDIS_TOOLS per review comments.

* Added debug symbols to frr package and hence FRR based BGP docker.

* 1) Moved dbg_files.sh to scripts/
2) Src directories to archive are now collected from individual Makefiles.
3) Added few more debug symbols
4) Added few more debug dockers.

Hereafter, no more changes except per review comments.

To debug:
Install required version of debug image in Switch or VM.
Copy core file into /debug of host
Get into Docker
gdb /usr/bin/<daemon> -c /debug/<your core file>
set directory /src/... <-- inside gdb to get the source

For non-in-depth debugging:

Download corresponding debug Docker image (docker-...-dbg.gz) to your VM
Load the image
Run image with entrypoint as 'bash' with dir containing core mapped in.
Run gdb on the core.
2019-07-03 22:13:55 -07:00
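The non-in-depth debugging steps from 76bf5a0bc4 might be wrapped in a small shell helper along these lines. This is only a sketch: the function name `debug_shell`, the `DOCKER` override variable, and the image and directory arguments are hypothetical and do not come from the commit; only the /debug mount point follows the commit's description.

```shell
# Hypothetical helper, not part of the commit: open a bash shell in a debug
# image with the directory that holds the core file mounted at /debug.
# $1 = name of the loaded debug image, $2 = host directory containing the core.
# DOCKER may be overridden (for example DOCKER=echo) to preview the command.
debug_shell() {
    image=$1
    core_dir=$2
    ${DOCKER:-docker} run --rm -it --entrypoint bash \
        -v "$core_dir":/debug "$image"
}
```

Inside the container, the gdb invocation from the commit message applies unchanged: gdb /usr/bin/<daemon> -c /debug/<your core file>.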
Joe LeVeque
f14354f003
[monit] Restart rsyslog service if rsyslogd consumes > 800 MB memory (#3117) 2019-07-03 18:21:05 -07:00
Qi Luo
e7b1988638
[submodule] update sonic-linux-kernel (#2985)
* [submodule] update sonic-linux-kernel
* update linux kernel version
* Fix many version strings
* update mellanox components (built with new kernel)
* [mlnx] add make files for SDK WJH libs
* Update arista driver submodule (#8)
Make the debian packaging point to a newer kernel version.
2019-06-18 10:00:16 -07:00
SuvarnaMeenakshi
0f665bdd06 [baseimage] kernel oom-killer to panic when the system is truly out of memory (#2988)
- What I did
Currently, when the system is under memory pressure, the OOM killer kicks in and kills a rogue process. Killing a rogue process can leave the device unhealthy, leading to blackholing of traffic.

To avoid this, configure the OOM to do a kernel panic which will cause the device to reboot and come back up healthy.

- How I did it
Added the sysctl variable panic_on_oom and set the value to 2.
Setting it to 2 ensures that the OOM killer always triggers a kernel panic.
2019-06-11 16:19:49 -07:00
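As a rough illustration of the setting described in 0f665bdd06, the snippet below prints the sysctl line for vm.panic_on_oom. The function name `oom_panic_conf` and the drop-in path in the comment are assumptions for illustration; the commit only says that panic_on_oom is set to 2. Automatic reboot after a panic is governed separately by the kernel.panic timeout.

```shell
# Hypothetical helper: print the sysctl line that makes the kernel panic on
# out-of-memory instead of letting the OOM killer pick a process.
# Value 2 means "always panic"; the default of 0 leaves the OOM killer active.
oom_panic_conf() {
    printf 'vm.panic_on_oom = %s\n' "${1:-2}"
}

# Example (requires root; the file name is an assumption):
#   oom_panic_conf > /etc/sysctl.d/90-oom-panic.conf && sysctl --system
```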
lguohan
30b37ec6fb
[build]: make sonic-slave-stretch as the default build docker (#2921)
Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-05-27 15:50:51 -07:00
Qi Luo
62ef8593e7 [monit] Set memory usage alert at 50% (#2939)
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2019-05-24 02:38:41 -07:00
Ying Xie
9efcf1759a
[ebtables] install ebtables in base image and install filter rules (#2805)
- Add ebtables package, and install some filter rules:
  1. ebtables -A FORWARD -d BGA -j DROP
  2. ebtables -A FORWARD -p ARP -j DROP

Basically, ARP packets in the VLAN are forwarded by the ASIC; the kernel
also gets a copy of these ARP packets, and forwarding from the kernel is
dropped. So there is always only one copy of each ARP request/response in the VLAN.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-09 09:44:41 -07:00
paavaanan
e7485fdcef [baseimage]: Flashrom utility support for BIOS upgrade (#2867) 2019-05-09 00:09:01 -07:00
paavaanan
b56124bf48 removing dhcp turn-off option from initrd (#2555)
* removing dhcp changes from initrd

* removing mgmt-intf-dhcp file
2019-04-02 15:48:04 -07:00
Qi Luo
145c1348b3
[docker] Update docker package version for CVE-2019-5736 fix (#2663) 2019-03-18 18:50:05 -07:00
yurypm
d632569a6a Add initramfs hook for Arista devices (#2595)
We are going to use initramfs hook for firmware upgrades
To install Arista hook:
- create folder /mnt/flash/<image dir>/platform/hooks/boot1/ from Aboot or
  /host/<image dir>/platform/hooks/boot1/ from Sonic
- add executable script to created folder
2019-02-27 10:28:04 -08:00
Ying Xie
97f5e05b70
[ntp] disable ntp time jump (#2589)
- removing -g to disable jump when time difference is greater than 1000s
- add -x to disable initial jump
2019-02-20 08:04:12 -08:00
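The option change in 97f5e05b70 can be pictured as a rewrite of the ntpd option string: drop -g, add -x. The helper below is a hypothetical sketch; the name `adjust_ntpd_opts` and the idea of editing an option string are assumptions, since the commit message does not show where the options are stored.

```shell
# Hypothetical helper: take an ntpd option string, remove -g (which permits
# one arbitrarily large initial time step) and make sure -x (prefer slewing
# over stepping) is present exactly once. All other options are kept in order.
adjust_ntpd_opts() {
    out=""
    for opt in $1; do
        [ "$opt" = "-g" ] && continue
        [ "$opt" = "-x" ] && continue
        out="${out:+$out }$opt"
    done
    printf '%s\n' "${out:+$out }-x"
}
```

For instance, adjust_ntpd_opts '-g -4' yields the string "-4 -x".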
Ramesh Santhanakrishnan
734dcb2185 [build]: apply proxy setting to curl. (#2544) 2019-02-08 22:01:29 -08:00
Mykola F
460281d663 [baseimage] install dmidecode (#2540)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-02-08 11:52:12 -08:00
lguohan
f20665008c
[build]: put stretch debian packages under target/debs/stretch/ (#2519)
* [build]: put stretch debian packages under target/debs/stretch/

* in stretch build phase, all debian packages built in that stage are placed under target/debs/stretch directory.
* for python-based debian packages, since they are really the same for jessie and stretch, they are placed under target/python-debs directory.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-02-04 22:06:37 -08:00
Prince Sunny
5fe13ad93b
Disable IPv6 ra for eth0 interface (#2493)
* Disable IPv6 ra for eth0 interface
2019-01-29 09:03:23 -08:00
lguohan
36a9678d3e
[baseimage]: install cgroup-tools and set udp_l3mdev_accept=1 (#2485)
cgroup-tools is required to bind a process to l3mdev master

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-01-25 11:47:09 -08:00
lguohan
4ccd35bc25
[kernel]: update sonic kernel to 4.9.0-8-2 (#2468)
* [kernel]: update sonic kernel to 4.9.0-8-2

* 3b2114d 2019-01-20 | [sonic-linux-kernel] add udp_l3mdev_accept kernel upstream patch (#70) (HEAD, azure/master) [Harish Venkatraman]
* 37734aa 2019-01-10 | L3mdev cgroup (#73) [lguohan]
* d631eeb 2018-12-15 | yet another uart race condition fix (#75) [lguohan]

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* Update Mellanox SDK

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* Update arista platform driver to match 4.9.0-8-2 kernel

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-01-25 00:46:09 -08:00
Samuel Angebault
bfe46e0f1d [docker-engine] fix systemd shutdown hang (#2451)
When rebooting without the platform_reboot plugin, systemd takes a few
minutes to shut down properly. It blocks on a docker cleanup
operation.

As described by https://github.com/docker/for-linux/issues/421 there
is a race between docker.service and containerd.service.
docker needs containerd to properly stop the containers.
2019-01-16 01:54:14 -08:00
Nikos
e55a7d7db7 [baseimage]: Initial changes for dhcp to support eth0 in a mgmt vrf (#2348)
* Initial changes to support eth0 in a mgmt vrf
2019-01-15 18:15:56 -08:00
lguohan
b57a376622
[docker-engine]: upgrade docker engine to 18.09 (#2417)
* [docker-engine]: upgrade docker engine to 18.09
2019-01-04 20:47:43 -08:00
Nikos
1eecdb31bf [baseimage]: Install netifaces package in sonic-slave docker and sonic image (#1353) 2018-12-15 11:52:36 -08:00
zhenggen-xu
f093ef2a9f [security kernel] Upgrade kernel from 4.9.110-3+deb9u2 to 4.9.110-3+deb9u6 (#2367)
* [security kernel] Upgrade kernel from 4.9.110-3+deb9u2 to 4.9.110-3+deb9u6
short version: 4.9.0-7 to 4.9.0-8

See changelogs for security fixes:
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.110-3deb9u6

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Update sonic-linux-kernel submodule after it was merged

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2018-12-11 04:17:17 -08:00
lguohan
64a2b1ce99
[vs]: build sonic vs kvm image (#2269)
Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-11-20 22:32:40 -08:00
Joe LeVeque
1e1add90f9
Remove Arista-specific service ACL solution; All platforms now use caclmgrd (#2202) 2018-10-29 10:25:18 -07:00
ironjosh
2d43385927 [baseimage] set default locale en_US.UTF-8 (#1988)
* [baseimage] set default locale en_US.UTF-8

Signed-off-by: chenhu <chenhu@didichuxing.com>

* [baseimage]set default locale to en_US.UTF-8, clean all other unused

* [baseimage] update-locale after locale-gen

* correct update-locale command line

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-09-11 03:15:19 -07:00
Shuotian Cheng
f85a75c811
[debian]: Enable keep_addr_on_down to keep IPv6 addresses (#1992)
Starting from kernel 4.6, this new attribute keep_addr_on_down
is introduced (https://kernelnewbies.org/Linux_4.6).

If set, static global addresses with no expiration time are not
flushed.

Ref:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
2018-08-28 14:40:01 -07:00
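To check whether a running kernel exposes the attribute from f85a75c811, one could read it from /proc/sys. This is a minimal sketch; the function name `keep_addr_state` is hypothetical, and the commit message does not say which interface scope (all, default, or a specific interface) it configures.

```shell
# Hypothetical helper: print the current keep_addr_on_down value for an
# interface scope ("all" by default), or "unavailable" if the kernel does not
# expose the attribute (older than 4.6, IPv6 disabled, or unknown interface).
keep_addr_state() {
    f="/proc/sys/net/ipv6/conf/${1:-all}/keep_addr_on_down"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo unavailable
    fi
}
```

Enabling the attribute by hand would then be a matter of running sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 as root.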
Joe LeVeque
98082d56a0 [baseimage]: Download picocom version 3.1-2 from stretch-backports; No longer build from source (#1946) 2018-08-17 17:38:20 -07:00
lguohan
c059d9982a
[baseimage]: install picocom 3.1 in base image (#1943)
* [baseimage]: install picocom 3.1 in base image

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* add picocom to stretch build

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* fix slave.mk bug

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-08-17 09:06:05 -07:00
lguohan
38f3eba695
[kernel]: upgrade kernel to 4.9.0-7 (4.9.110-3+deb9u1) (#1922)
* [kernel]: upgrade kernel to 4.9.0-7 (4.9.110-3+deb9u1)

Signed-off-by: Guohan Lu <gulv@microsoft.com>

* [mellanox]: Update SDK pointer for 4.9.0-7 kernel (#44)

Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>

* Update arista drivers for 4.9.0-7 linux kernel (#43)
2018-08-16 08:56:56 -07:00
cawand
9f545456c9 Added picocom and pexpect to base image, for use in consutil (#1935)
Signed-off-by: Cayla Wanderman-Milne <t-cawand@microsoft.com>
2018-08-15 21:41:12 -07:00
Qi Luo
40bb27ca1c Simplify script to install docker (#1925)
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2018-08-13 18:30:00 -07:00
lguohan
647af39ff0
[build]: create empty /var/lib/docker if needed (#1920)
stretch docker-engine in base image is not started by default
in the build process. Need to create empty /var/lib/docker

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-08-12 22:23:43 -07:00
zhenggen-xu
d761630f73 Fix potential blackholing/looping traffic when link-local was used and refresh ipv6 neighbor to avoid CPU hit (#1904)
* Fix potential blackholing/looping traffic and refresh ipv6 neighbor to avoid CPU hit

In case ipv6 global addresses were configured on L3 interfaces and used for peering,
and the routing protocol was using link-local addresses on the same interfaces as preferred nexthops,
the link-local addresses could be aged out after a while due to no activity towards the link-local
addresses themselves. When we then receive new routes with the link-local nexthops, SONiC won't insert
them into the HW, which causes looping or blackholing traffic.

Global ipv6 addresses on L3 interfaces between switches are refreshed by BGP keepalive and other messages.

On the server-facing side, traffic may hit the forwarding plane only, so the ipv6 neighbor entries are not refreshed regularly.
This could age out the linux kernel ipv6 neighbor entries, and HW neighbor table entries could be removed,
so traffic going to those neighbors would hit the CPU and cause traffic drops and temporary high CPU load.

Also, if link-local addresses were not learned, we may not get them at all later.

This change is intended to fix all of the above issues.

Changes:
Add ndisc6 package in swss docker and use it for ipv6 ndp ping to update the neighbors' state on Vlan interfaces
Change the default ipv6 neighbor reachable timer to 30mins
Add periodic ipv6 multicast ping to ff02::11 to get/refresh link-local neighbor info.

* Fix review comments:
Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping
format issue

* Combine regular L3 interface and portchannel interface for looping

* Add ndisc6 package to vs docker
2018-08-12 03:14:55 -07:00
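The periodic multicast ping from d761630f73 could look roughly like the loop below. This is a sketch under assumptions: the function name `refresh_link_local`, the `PING` and `MCAST_ADDR` variables, and the interface names passed in are hypothetical; the multicast address is copied from the commit message, and the actual script in the commit may differ.

```shell
# Multicast address as written in the commit message; override if needed.
MCAST_ADDR=${MCAST_ADDR:-ff02::11}

# Hypothetical helper: send one IPv6 multicast echo request out of each
# interface given as an argument, so that replies refresh the kernel's
# link-local neighbor entries. PING may be overridden (e.g. PING=echo).
refresh_link_local() {
    for intf in "$@"; do
        # A failed ping on one interface must not stop the loop.
        ${PING:-ping} -6 -I "$intf" -c 1 "$MCAST_ADDR" || true
    done
}
```

A caller would pass both regular L3 interfaces and portchannel interfaces in a single argument list, matching the combined loop mentioned in the commit.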
Guohan Lu
002bff4e45 [baseimage]: use rsyslog in baseimage from stretch repo
Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-08-11 18:00:10 +00:00
Guohan Lu
0d2ffd885f [baseimage]: move update initramfs to later stage
Some platform drivers install blacklist.conf in /etc/modprobe.d.
That configuration should be propagated into initramfs to avoid
loading the blacklisted drivers.
2018-08-11 09:09:03 +00:00
Guohan Lu
5ae64e7f2d [ixgbe]: compile and install ixgbe to 4.9.0-5 kernel 2018-08-11 09:09:03 +00:00
lguohan
35ab7a6e09 [kernel]: upgrade linux kernel to 4.9.0-5 (4.9.65-3+deb9u2) (#8) 2018-08-11 09:09:03 +00:00
Guohan Lu
0e141a54e9 [baseimage]: install acl package 2018-08-11 09:07:59 +00:00
Guohan Lu
2449fafae0 [kernel]: update kernel submodule and remove standalone igb driver 2018-08-11 09:07:59 +00:00
Guohan Lu
f64ffe8571 [baseimage]: build root filesystem via overlay fs instead of aufs 2018-08-11 09:07:59 +00:00
Guohan Lu
72d70e98be [baseimage]: install systemd-sysv in the base image 2018-08-11 09:07:59 +00:00
Guohan Lu
b6af83ccf8 [baseimage]: upgrade initramfs to 0.130 2018-08-11 09:07:59 +00:00
Guohan Lu
ff1f508f33 [baseimage]: use debian 4.9.0-3 kernel 2018-08-11 09:07:59 +00:00
Guohan Lu
4d701ad037 [baseimage]: update base image from jessie to stretch 2018-08-11 09:07:59 +00:00
Joe LeVeque
7aefa185d4 Download newer version (8.23.0-2) of rsyslog from jessie-backports in hopes of eliminating memory leaks (#1912) 2018-08-09 23:56:41 -07:00
Qi Luo
162e9b6f56 Add monit for /var/log disk usage (#1836)
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2018-07-03 17:00:31 -07:00
lguohan
0e5c5f2601
[baseimage]: add commonly used network tools (#1832)
* [baseimage]: add commonly used network tools
2018-07-01 09:46:25 -07:00
Serhey Popovych
8d88455509 [baseimage]: Improve password hashing for default user account (#1748)
* [slave.mk]: Fix displaying username and password in build summary

We display the contents of DEFAULT_USERNAME and DEFAULT_PASSWORD, while
the image can be built with USERNAME and/or PASSWORD given on the make(1)
command line. For example:

  $ make USERNAME=adm PASSWORD=mypass target/sonic-broadcom.bin

Fix by displaying USERNAME and PASSWORD variables in build summary.

Signed-off-by: Sergey Popovich <sergey.popovich@ordnance.co>

* [baseimage]: Improve default user account handling

There are a couple of issues with the current implementation of default
user account management in baseimage:

  1) It uses DES to encrypt the account's password. Furthermore, this
     effectively limits password length to 8 symbols, even if more are
     provided with PASSWORD or DEFAULT_PASSWORD from rules/config.

  2) The salt value for the password is the same on all builds, even with
     different passwords, increasing the attack surface.

  3) During the build process, the password passed as a command line
     parameter, either as plain text (if given to make(1) as
     "make PASSWORD=...") or DES encrypted (if given to build_debian.sh),
     can be seen by non-build users via the /proc/<pid>/cmdline file,
     which has group and world readable permissions.

Both 1) and 2) come from:

  perl -e 'print crypt("$(PASSWORD)", "salt"),"\n"')"

that by default uses DES if salt does not have format $<id>$<salt>$,
where <id> is hashing function id. See crypt(3) for more details on
valid <id> values.

To address the issues above, we propose the following changes:

  1) Do not create the password hash by hand (e.g. using the perl snippet
     above): leave this job to chpasswd(8), which is aware of the system-wide
     password hashing policy specified in /etc/login.defs with
     ENCRYPT_METHOD (by default it is SHA512 for Debian 8).

  2) Now chpasswd(8) will take care of a proper salt value.

  3) This has two steps:

    3.1) For compatibility reasons, accept USERNAME and PASSWORD as
         make(1) parameters, but warn the user that this is unsafe.

    3.2) Use the process environment to pass the USERNAME and PASSWORD
         variables from Makefile to build_debian.sh as a more secure
         alternative to passing them via command line parameters:
         /proc/<pid>/environ is readable only by the user running the
         process or by privileged users like root.

Before change:
--------------

  hash1
  -----
  # u='admin'
  # p="$(LANG=C perl -e 'print crypt("YourPaSs", "salt"),"\n"')"
                                      ^^^^^^^^
                                      8 symbols
  # echo "$u:$p" | chpasswd -e

  # getent shadow admin
  admin:sazQDkwgZPfSk:17680:0:99999:7:::
        ^^^^^^^^^^^^^
        Note the hash (DES encrypted password)

  hash2
  -----
  # u='admin'
  # p="$(LANG=C perl -e 'print crypt("YourPaSsWoRd", "salt"),"\n"')"
                                      ^^^^^^^^^^^^
                                      12 symbols
  # echo "$u:$p" | chpasswd -e

  # getent shadow admin
  admin:sazQDkwgZPfSk:17680:0:99999:7:::
        ^^^^^^^^^^^^^
        Hash is the same as for "YourPaSs"

After change:
-------------

  hash1
  -----
  # echo "admin:YourPaSs" | chpasswd
  # getent shadow admin
  admin:$6$1Nho1jHC$T8YwK58FYToXMFuetQta7/XouAAN2q1IzWC3bdIg86woAs6WuTg\
           ^^^^^^^^
           Note salt here
  ksLO3oyQInax/wNVq.N4de6dyWZDsCAvsZ1:17681:0:99999:7:::

  hash2
  -----
  # echo "admin:YourPaSs" | chpasswd
  # getent shadow admin
  admin:$6$yKU5g7BO$kdT02Z1wHXhr1VCniKkZbLaMPZXK0WSSVGhSLGrNhsrsVxCJ.D9\
           ^^^^^^^^
           Here salt completely different from case above
  plFpd8ksGNpw/Vb92hvgYyCL2i5cfI8QEY/:17681:0:99999:7:::

Since the salt is different, the hashes for the same password are different too.

  hash1
  -----
  # LANG=C perl -e 'print crypt("YourPaSs", "\$6\$salt\$"),"\n"'
                                             ^^^^^
                                             We want SHA512 hash
  $6$salt$qkwPvXqUeGpexO1vatnIQFAreOTXs6rnDX.OI.Sz2rcy51JrO8dFc9aGv82bB\
  yd2ELrIMJ.FQLNjgSD0nNha7/

  hash2
  -----
  # LANG=C perl -e 'print crypt("YourPaSsWoRd", "\$6\$salt\$"),"\n"'
  $6$salt$1JVndGzyy/dj7PaXo6hNcttlQoZe23ob8GWYWxVGEiGOlh6sofbaIvwl6Ho7N\
  kYDI8zwRumRwga/A29nHm4mZ1

Now, with the same "salt" and $<id>$ and the same 8-symbol prefix in the password, but
a different password length, we get different hashes.

Signed-off-by: Sergey Popovich <sergey.popovich@ordnance.co>
2018-06-09 11:29:16 -07:00
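Step 3.2 of 8d88455509, passing credentials through the environment and letting chpasswd(8) do the hashing, might be sketched as follows. The function name `chpasswd_line` is hypothetical, and the real build_debian.sh may be structured differently.

```shell
# Hypothetical helper: emit the "user:password" line that chpasswd(8) reads
# on stdin. USERNAME and PASSWORD are taken from the environment rather than
# from command line arguments, so they do not appear in /proc/<pid>/cmdline.
# printf is a shell builtin in bash and dash, so no extra process is spawned
# with the password in its argument list.
chpasswd_line() {
    : "${USERNAME:?USERNAME must be set}"
    : "${PASSWORD:?PASSWORD must be set}"
    printf '%s:%s\n' "$USERNAME" "$PASSWORD"
}

# Example (requires root): chpasswd_line | chpasswd
```

Because chpasswd is run without -e here, it hashes the plain-text password itself according to ENCRYPT_METHOD and picks a fresh salt on every run.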
Qi Luo
d54a7ae566
[baseimage] Adding setuid permissions to ping binaries, so sudo is no longer needed (#1765) 2018-06-04 21:01:53 -07:00