sonic-buildimage

Author	SHA1	Message	Date
roman_savchuk	c357a56c70	[201911] Add executable permission back to supervisor-proc-exit-listener file (#4891 ) While testing reboot case for 201911 facing error: supervisor-proc-exit-listener FATAL command at '/usr/bin/supervisor-proc-exit-listener' is not executable Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>	2020-07-03 14:32:35 -07:00
yozhao101	d32beffed0	[201911][docker-lldp] Correct lldp-syncd program name in critical_processes file (#4863 ) The program name in critical_processes file must match the program name defined in supervisord.conf file. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-28 11:09:27 -07:00
yozhao101	bbcd4c6235	[Monit] Use the string "/usr/bin/syncd\s" to monitor the syncd process (#4706 ) - Why I did it After discussing with Joe, we use the string "/usr/bin/syncd\s" in the Monit configuration file to monitor the syncd process on Broadcom and Mellanox. Due to my carelessness, I did not find this bug during the previous testing. If we use the string "/usr/bin/syncd" in the Monit configuration file to monitor the syncd process, Monit cannot detect whether the syncd process is running. If we run the command `sudo monit procmatch "/usr/bin/syncd"` on Broadcom, three processes in the syncd container match "/usr/bin/syncd": `/bin/bash /usr/bin/syncd.sh wait`, `/usr/bin/dsserve /usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile` and `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`. Monit selects the process with the highest uptime (here `/bin/bash /usr/bin/syncd.sh wait`) and does not select `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`. Similarly, on Mellanox Monit also selects the process with the highest uptime (here `/bin/bash /usr/bin/syncd.sh wait`) and does not select `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`. That is why Monit is unable to detect whether the syncd process is running if we use the string "/usr/bin/syncd" in the Monit configuration file. If we use the string "/usr/bin/syncd\s" in the Monit configuration file, Monit filters out the process `/bin/bash /usr/bin/syncd.sh wait` and thus correctly monitors the syncd process. - How I did it - How to verify it Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-28 07:29:59 -07:00
Aravind Mani	2f97faaf7c	[DellEMC] S52xx fix SFP reset in 1.0 API (#4858 ) Issue: Port with AOC cable does not come up when "sfputil reset <port_name>" is executed. Modified the incorrect mask used in reset API to resolve the issue.	2020-06-28 07:29:24 -07:00
Joe LeVeque	5df5015835	[build][systemd] Mask disabled services by default (#4721 ) When building the SONiC image, use systemd to mask all services which are set to "disabled" in init_cfg.json. This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.	2020-06-28 07:28:56 -07:00
Joe LeVeque	0768bf7733	[hostcfgd] Synchronize all feature statuses once upon start (#4714 ) - Ensure all features (services) are in the configured state when hostcfgd starts - Better functionalization of code - Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword. This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.	2020-06-28 07:28:33 -07:00
Tamer Ahmed	10cd212577	[fast-reboot] Back up FDB/ARP/Default routes (#4795 ) FDB/ARP/Default routes files are deleted after swssconfig. This makes debugging/validation of device conversion hard. This PR saves those files in order to facilitate debugging of device conversion. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-06-28 07:27:55 -07:00
Wirut Getbamrung	3d0126baeb	[platform-celestica]: Update fancontrol service for Seastone-DX010 device (#3690 ) * [platform/cel]: add fancontrol service support for dx010 * [device/celestica]: add hysteresis temp to dx010 fancontrol configuration	2020-06-28 07:27:20 -07:00
Kebo Liu	1bade2c67b	Add with_i2cdev for mst start to have I2C device loaded properly (#4790 )	2020-06-28 07:24:35 -07:00
Junchao-Mellanox	acafde1895	[Mellanox] Change port index in port_config.ini to 1-based (#4781 ) * Change port index in port_config.ini to 1-based * Add default port index to port_config.ini, change platform plugins to accept 1-based port index * fix port index in sfp_event.py	2020-06-28 07:24:10 -07:00
madhanmellanox	33e2264746	added files to create SKU Mellanox-SN3800-D24C52 (#4808 ) * added files to create SKU Mellanox-SN3800-D24C52 Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-06-28 07:22:12 -07:00
Kebo Liu	9db492e31f	[Mellanox] Update SDK to 4.4.0952, FW to *.2007.1280 (#4842 )	2020-06-28 07:19:25 -07:00
Andriy Kokhan	7d436b336e	[BFN] Disabled thermalctld for Barefoot platforms (#4823 ) Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>	2020-06-27 01:23:12 -07:00
abdosi	a84b534ed7	[broadcom-sai]: Updated broadcom SAI to fix High CPU on TH/Th2 platform. (#4859 ) Verified after loading on TH platforms cpu usage gone down: Previous: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2521 root 20 0 1512860 360452 63540 S 144.6 4.4 8:00.03 syncd After Fix: 7500 root 20 0 1592420 350912 64184 S 45.4 4.3 3:50.99 syncd	2020-06-27 01:10:41 -07:00
yozhao101	c2364cf03e	[201911][dockers] Update critical_processes file syntax (#4854 ) Backport of https://github.com/Azure/sonic-buildimage/pull/4831 to the 201911 branch	2020-06-26 11:37:05 -07:00
madhanmellanox	337502c220	converting to Platform based utils (#4830 ) Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-06-23 09:12:54 -07:00
madhanmellanox	67e183d0dc	added files to create SKU Mellanox-SN3800-C64 (#4812 ) * added files to create SKU Mellanox-SN3800-C64 Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-06-22 17:39:31 -07:00
madhanmellanox	b94c12d920	modified files relevant to SKU Mellanox-SN3800-D112C8 (#4810 ) * modified files relevant to SKU Mellanox-SN3800-D112C8 Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-06-22 17:38:42 -07:00
madhanmellanox	415000b8b2	added files to create SKU Mellanox-SN3800-D28C50 (#4809 ) * added files to create SKU Mellanox-SN3800-D28C50 Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-06-22 17:37:46 -07:00
Abhishek Dosi	16d39c6c19	[submodule update] sonic-utilities Fix for command. show interface transceiver eeprom -d Ethernet (#955) Updated natshow script to support DNAT Pool changes (#921)	2020-06-22 16:29:36 -07:00
padmanarayana	7564b060e4	[DELL]: FTOS to SONiC fast conversion fixes (#4807 ) While migrating to SONiC 20181130, identified a couple of issues: 1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127. 2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.	2020-06-20 08:15:05 -07:00
Joe LeVeque	d8886ba473	[caclmgrd] Don't limit connection tracking to TCP (#4796 ) Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.	2020-06-20 08:13:11 -07:00
pavel-shirshov	9ffd87044b	Update .gitignore for platform (#4803 )	2020-06-20 08:12:25 -07:00
Qi Luo	817ce94215	Fix bug: check port alias even when port_config_file parameter is not provided (#4787 )	2020-06-20 08:11:02 -07:00
abdosi	c480de4769	[Submodule update] sonic-dbsyncd (#4801 ) lldp: For multi-npu platforms make sure to add Backplane Interface also as Interface Match List.	2020-06-20 08:09:38 -07:00
abdosi	173168ca86	kubeadm package apt-get install has unmet dependency error (#4804 ) to other packages so installing them explicitly.	2020-06-18 23:16:30 -07:00
abdosi	9244925943	[baseimage]: increase docker ramfs from 900MB to 1300MB (#4800 ) Images built from the 201911 branch and installed on devices where we mount /var/lib/docker in RAM (because the HDD is small) were failing as there was not enough space to untar docker.tar.gz. This is due to the increase in the total number of containers in the image. As of today, /var/lib/docker contains 1.1 GB of data. Therefore, this PR increases the size of the ramdisk to 1.3 GB to accommodate all the containers as of now and any new container going forward. Example output below from an Arista-7050-QX32 SKU: admin@str-a7050-acs-2:~$ df -h Filesystem Size Used Avail Use% Mounted on ..... tmpfs 1.3G 1.1G 221M 84% /var/lib/docker ..... Verified that all dockers are running fine and interfaces/bgp are up. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-06-18 10:21:01 -07:00
judyjoseph	a6877ad1b5	Using (pci) device id to identify the ASIC on sai_switch_create (#4705 ) * Update sonic_cfggen and utilities to populate the (pci) device id in the "asic_id" field in the DEVICE_METADATA. This ID is parsed from the "asic.conf" file in the device/<platform> dir. The format of the entries is, e.g. for a 2-ASIC platform: DEV_ID_ASIC_0=03:00.0 DEV_ID_ASIC_1=04:00.0 Going forward, this device id will be used as the ASIC instance ID passed to syncd/sai while doing create_switch. Current support is limited: only one TD2 platform is supported.	2020-06-17 18:23:08 -07:00
Nazarii Hnydyn	7b18d9c15c	[Mellanox] Update MFT to v4.14.5-2. (#4784 ) Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2020-06-17 10:18:01 -07:00
Abhishek Dosi	1d384be3d3	[submodule update] sonic-sairedis [syncd] Use steady clock for TimerWatchdog (#613)	2020-06-17 10:15:39 -07:00
abdosi	c2981b8cdf	[build] Ensure /usr/lib/systemd/system/ directory exists before referencing (#4788 ) * Fix the Build on 201911 (Stretch) where the directory /usr/lib/systemd/system/ does not exist so creating manually. Change should not harm Master (buster) where the directory is created by Linux * Fix as per review comments	2020-06-17 09:59:53 -07:00
Volodymyr Samotiy	2f82cce3e8	[Mellanox] Update SDK 4.4.0940 and FW xx.2007.1244 (#4777 )	2020-06-16 10:28:22 -07:00
Abhishek Dosi	bce65bbd32	[submodule update] sonic-platform-common [eeprom] Add try-except to catch the IOError (#85) [sfputilbase.py] Don't try to print EEPROM sysfs file name if we failed to read from it (#81)	2020-06-16 10:12:00 -07:00
Abhishek Dosi	a30a2cebcf	[submodule update] sonic-swss-common Add missed BGP tables into the schema (#351)	2020-06-16 10:01:55 -07:00
Abhishek Dosi	85eb651d17	[submodule update] sonic-platform-daemons [syseepromd] Prevent the syseepromd from termination (#56) [thermalctld] Fix invalid warning status (#58)	2020-06-16 10:00:44 -07:00
Abhishek Dosi	96a8e24055	[Submodule update] sonic-swss Revert "[portsorch] Enable port-level buffer drop counters (#1237)" (#1308) Add more log message, fix test code (#1239)	2020-06-16 09:12:41 -07:00
Abhishek Dosi	c656b4c582	[Submodule update] sonic-util [201911][thermal control] Backport changes from master branch (#929) [201911][config] Support abbreviation (#933) Add 'hw-management-generate-dump.sh' to 'show techsupport' command (#934) [fwutil]: Update fwutil to v2.0.0.0. (#942) Fixes bug for PFCWD feature parameters (#838) Fixed fast-reboot for BFN platform (#871) [config] Add 'interface transceiver' subgroup with 'lpmode' and 'reset' subcommands (#904) [warm-reboot]: added pre-check for ISSU file (#915) [config] Don't attempt to restart disabled services (#944)	2020-06-16 09:09:22 -07:00
yozhao101	4846fc0337	[docker-syncd] Add timeout to force stop syncd container (#4617 ) - Why I did it When I tested the auto-restart feature of the swss container by manually killing one of its critical processes, swss was stopped. The syncd container, as the peer container, should then also be stopped. However, I found that the syncd container is sometimes stopped and sometimes not. The syncd container cannot be stopped when the process (/usr/local/bin/syncd.sh stop) that executes the stop() function gets stuck between lines 164-167. Systemd will wait for 90 seconds and then kill this process. 164 # wait until syncd quit gracefully 165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do 166 sleep 0.1 167 done The first thing I did was to profile how long this while loop spins when the syncd container is stopped normally after the swss container is stopped. The result is 5 or 6 seconds. If the syncd container is stopped normally, two messages are written to syslog: str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134 str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134 The second thing I did was to add a timer to the condition of the while loop so that the loop is forced to exit after 20 seconds: After that, testing showed that the syncd container is stopped normally if swss is stopped first. One more thing I want to mention is that if the syncd container is stopped within 5 or 6 seconds, the two log messages can still be seen in syslog. However, if the while loop runs longer than 20 seconds and is forced to exit, the syncd container is stopped but I did not see these two messages in syslog. Further, although I observed that the auto-restart feature of the swss container works correctly now, I cannot be sure that the issue of the syncd container not stopping will not occur in the future. - How I did it I added a timer around the while loop in the stop() function. The loop exits after spinning for 20 seconds. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-16 08:21:15 -07:00
Dong Zhang	143e4f524c	[MultiDB] Add REDIS_TIMEOUT_MSECS back which is removed by mistake (#4757 )	2020-06-16 08:19:38 -07:00
Renuka Manavalan	f8a9a1b805	[k8s]: switching to Flannel from Calico. (#4768 ) Switching to Flannel from Calico which brings down the image size by around 500+MB.	2020-06-16 08:18:54 -07:00
arlakshm	c5807c2dd2	[bgp]:Add redistribution connected for ipv6 also for Frontend ASICs (#4767 ) * fix redistribution connected for ipv6 also Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-06-16 08:18:19 -07:00
Joe LeVeque	c625e0e3e6	[build] Enable telemetry service by default (#4760 ) - Why I did it To ensure telemetry service is enabled by default after installing a fresh SONiC image - How I did it Set telemetry feature status to "enabled" when generating init_cfg.json file	2020-06-16 08:17:47 -07:00
shlomibitton	eb2fe4b16e	Fix MSN4700 sensors (#4753 ) Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>	2020-06-16 08:16:55 -07:00
Prince Sunny	b4f45e9c15	Submodule update - sonic-restapi (#4749 )	2020-06-16 08:16:11 -07:00
Arun Saravanan Balachandran	030570de81	[DellEMC]: EEPROM decoder for S6000, S6000-ON (#4718 ) - Why I did it For decoding system EEPROM of S6000 based on Dell offset format and S6000-ON’s system EEPROM in ONIE TLV format. - How I did it - Differentiate between S6000 and S6000-ON using the product name available in ‘dmi’ ( “/sys/class/dmi/id/product_name” ) - For decoding S6000 system EEPROM in Dell offset format and updating the redis DB with the EEPROM contents, added a new class ‘EepromS6000’ in eeprom.py, - Renamed certain methods in both Eeprom, EepromS6000 classes to accommodate the plugin-specific methods. - How to verify it - Use 'decode-syseeprom' command to list the system EEPROM details. - Wrote a python script to load chassis class and call the appropriate methods. UT Logs: [S6000_eeprom_logs.txt](https://github.com/Azure/sonic-buildimage/files/4735515/S6000_eeprom_logs.txt), [S6000-ON_eeprom_logs.txt](https://github.com/Azure/sonic-buildimage/files/4735461/S6000-ON_eeprom_logs.txt) Test script: [eeprom_test_py.txt](https://github.com/Azure/sonic-buildimage/files/4735509/eeprom_test_py.txt)	2020-06-16 08:15:28 -07:00
Ying Xie	aecebac86b	[ntp] disable ntp long jump (#4748 ) Found another syncd timing issue related to clock going backwards. To be safe disable the ntp long jump. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-06-16 08:15:00 -07:00
Junchao-Mellanox	d10b597f50	[Mellanox] Upgrade mft to 4.14.1-8 (#4701 )	2020-06-16 08:14:18 -07:00
Joe LeVeque	ed0e6aed1c	[hostcfgd] Get service enable/disable feature working (#4676 ) Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here: 1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop 2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots 3. Substitute returns with continues so that even if one service fails, we still try to handle the others Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.	2020-06-16 08:13:32 -07:00
Joe LeVeque	42bc14f44c	[systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673 ) This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation	2020-06-16 08:12:47 -07:00
Olivier Singla	18bbbb3c02	[baseimage]: Run fsck filesystem check support prior mounting filesystem (#4431 ) * Run fsck filesystem check support prior mounting filesystem If the filesystem becomes non-clean ("dirty"), SONiC does not run fsck to repair it and mark it as clean again. This patch adds the functionality to run fsck on each boot, prior to the filesystem being mounted. This allows the filesystem to be repaired if needed. Note that if the filesystem is marked as clean, fsck does nothing and simply returns, so it is fine to call fsck every time before mounting the filesystem. How to verify this patch (using bash): Using an image without this patch: Make the filesystem "dirty" (not clean) [we are assuming that the filesystem is stored in /dev/sda3 - please adjust depending on the platform] [do this only on a test platform!] dd if=/dev/sda3 of=superblock bs=1 count=2048 printf "$(printf '\\x%02X' 2)" | dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null dd of=/dev/sda3 if=superblock bs=1 count=2048 Verify that the filesystem is not clean: tune2fs -l /dev/sda3 | grep "Filesystem state:" Reboot and verify that the filesystem is still not clean. Redo the same test with an image with this patch, and verify that at the next reboot the filesystem is repaired and becomes clean. The fsck log is stored in syslog, using the string FSCK as a marker.	2020-06-16 08:12:11 -07:00
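
Note on commit bbcd4c6235 ([Monit] Use the string "/usr/bin/syncd\s"): the matching behaviour described in that message can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for Monit's own pattern matching, which is an assumption, and the helper name `matching` is hypothetical. The three command lines are the ones quoted in the commit message, with the option written as `--diag` (also an assumption about the original dash).

```python
import re

# Command lines quoted in the commit message for the syncd container on Broadcom.
CMDLINES = [
    "/bin/bash /usr/bin/syncd.sh wait",
    "/usr/bin/dsserve /usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile",
    "/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile",
]


def matching(pattern, cmdlines=CMDLINES):
    """Return the command lines in which `pattern` is found anywhere."""
    regex = re.compile(pattern)
    return [cmd for cmd in cmdlines if regex.search(cmd)]


# Without the trailing \s the wrapper script syncd.sh also matches.
loose = matching(r"/usr/bin/syncd")

# With the trailing \s, "syncd" must be followed by whitespace,
# so "/usr/bin/syncd.sh" no longer matches.
strict = matching(r"/usr/bin/syncd\s")

print(len(loose), len(strict))
```

In this sketch the stricter pattern still matches the `dsserve` line as well as syncd itself; it only drops the `syncd.sh wait` wrapper, which is the process the commit message says Monit was wrongly selecting.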


3294 Commits