Commit Graph

5777 Commits

Author SHA1 Message Date
yozhao101
04cd1d61e8
[Monit] Monitoring the running status of containers. (#6251)
**- Why I did it**
This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. 

**- How I did it**
We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog.

**- How to verify it**
I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-01-07 19:52:22 -08:00
lguohan
fc3cb76dcf
[ci]: Set up CI with Azure Pipelines (#6384)
[skip ci]
2021-01-07 19:50:11 -08:00
Renuka Manavalan
dbc6718408
Take a copy of existing TACACS credentials and restore it during upgrade (#6285)
In scenario where upgrade gets config from minigraph, it could miss tacacs credentials as they are not in minigraph. Hence restore explicitly upon load-minigraph, if present.

- Why I did it
Upon boot, when config migration is required, the switch could load config from minigraph. The config-load from minigraph would wipe off TACACS key and disable login via TACACS, which would disable all remote user access. This change, would re-configure the TACACS if there is a saved copy available.

- How I did it
When config is loaded from minigraph, look for a TACACS credentials back up (tacacs.json) under /etc/sonic/old_config. If present, load the credentials into running config, before config-save is called.

- How to verify it
Remove /etc/sonic/config_db.json and do an image update. Upon reboot, w/o this change, you would not be able ssh in as remote user. You may login as admin and check out, "show tacacs" & "show aaa" to verify that tacacs-key is missing and login is not enabled for tacacs.
With this change applied, remove /etc/sonic/config_db.json, but save tacacs & aaa credentials as tacacs.json in /etc/sonic/. Upon reboot, you should see remote user access possible.
2021-01-07 16:45:38 -08:00
Joe LeVeque
e52581e919
[PDDF] Build and install Python 3 package (#6286)
- Make PDDF code compliant with both Python 2 and Python 3
- Align code with PEP8 standards using autopep8
- Build and install both Python 2 and Python 3 PDDF packages
2021-01-07 10:03:29 -08:00
Danny Allen
0ad2098402
[README] Update build badges to include 202012 build status (#6373)
Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-07 10:02:39 -08:00
Joe LeVeque
2d77a36658
[system-health] Make run_command() Python 3-compliant (#6371)
Pass universal_newlines=True parameter to subprocess.Popen(); no longer use .encode('utf-8') on resulting stdout.
This was missed in #5886

Note: I would prefer to use text=True instead of universal_newlines=True, as the former is an alias only available in Python 3 and is more understandable than the latter. However, Even though the setup.py file for this package only specifies Python 3, the LGTM tool finds other Python 2 code in the repo and validates the code as Python 2 code and alerts that text=True is an invalid parameter. Will stick with universal_newlines=True for now. Once all Python code in the repo has been converted to Python 3, I will change all universal_newlines=True to text=True.
2021-01-07 05:48:13 -08:00
gechiang
a6907a7c62
[brcm]: BRCM SAI 4.2.1.5-9 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)
Starting from build (master) 176 the warm reboot on BRCM Platform started to experience syncd crash. Upon further debug by Ying it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found The crash was caused by buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH

A case has filed with BRCM while a potential fix was tried by Ying that seems to have addressed this issue and we are making this change available in master branch so that it will allow further feature validation/testing especially in the warm reboot area.
Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.

- How to verify it
Just perform warm reboot with any master code 175 or above you should see this issue or issue the following cmd will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
2021-01-07 05:46:48 -08:00
abdosi
afc87b8ccd
Updated imfile configuration for supervisord logs (#6368)
Updated imfile configuration for supervisord logs for stretch and buster.
2021-01-06 18:47:36 -08:00
carl-nokia
f99dbff722
[Nokia]: Enable Telemetry for armhf and provide required qos files (#6364)
* [platform][Nokia]: Add buffers and qos files for config qos reload

   - providing required files

* [platform][armhf]: remove hardcoded disable for Telemetry on armhf

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-06 15:09:13 -08:00
Joe LeVeque
a013b8c247
[sonic-platform-common][sonic-platform-daemons] Update submodules (#6352)
src/sonic-platform-common 9935fca...8664efc (2):

Make sonic_sfp Python2 and Python3 compatible (#157)
[sffbase.py] Fix to make Python 3-compatible (#156)

src/sonic-platform-daemons e6c786b...81318f7 (1):

[psud] Fix issue where PSU Fan info is not updated in State DB (#137)

Fixes #6341
2021-01-06 14:47:30 -08:00
sudhanshukumar22
8a3ac8ff9c
[docker-lldp]: sonic advertise meaningful SysDescription instead of debian (#6114)
Sonic devices advertise meaningful system description along with Debian package information.

before the fix:

-------------
admin@sonic:~$ show lldp neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: Ethernet0, via: LLDP, RID: 3, Time: 0 day, 16:36:30
SysName: sonic
SysDescr: Debian GNU/Linux 9 (stretch) Linux 4.9.0-11-2-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
-------------------------------------------------------------------------------

After the fix:

root@sonic:~# show lldp neighbors Ethernet16
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    Ethernet16, via: LLDP, RID: 10, Time: 0 day, 00:01:00
    SysName:      sonic
    SysDescr:     SONiC Software Version: SONiC.sonic_upstream_1.0_daily_201130_1501_62-dirty-20201130.203529 - HwSku: Accton-AS7816-64X - Distribution: Debian 10.6 - Kernel: 4.19.0-9-2-amd64
-------------------------------------------------------------------------------

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
2021-01-06 12:24:57 -08:00
Aravind Mani
4632e0b5b1
DellEMC: Z9332f change SFP detection logic (#6261)
- Dynamically change EEPROM driver based on media type.
- Otherwise, EEPROM INFO and DOM INFO might not be fetched properly and will result in erroneous output.
2021-01-06 11:12:04 -08:00
Ying Xie
78dae94e5e
[sairedis] advance sairedis submodule head (#6365)
To incldue following changes:

- [ci]: add build for arm64 and armhf (#757)
- Use template hgetall, because we will tune the return types of library functions (#759)
- [syncd] Fix bulk multi attrs for same key db update (#761)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-01-06 06:14:25 -08:00
abdosi
afd60bdc48
[rsyslog]: Explicitly set the notify mode for rsyslog imfile module (#6351)
Enable the notify mode of rsyslogd imfile module used for supervisord
logs in docker container. 

Setup the mode="inotify" when loading imfile, made sure we are are getting 
supervisord logs in host immediately.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-01-06 00:00:18 -08:00
Travis Van Duyn
d769ef2abd
[snmp]: updated to support snmp config from redis configdb (#6134)
**- Why I did it**
I'm updating the jinja2 template to support getting SNMP information from the redis configdb. 
I'm using the format approved here: 
https://github.com/Azure/SONiC/pull/718

This will pave the way for us to decrement using the snmp.yml in the future.  
Right now we will still be using both the snmp.yml and configdb to get variable information in order to create the snmpd.conf via the sonic-cfggen tool. 

**- How I did it**
I first updated the SNMP Schema in PR #718 to get that approved as a standardized format. 
Then I verified I could add snmp configs to the configdb using this standard schema.  Once the configs were added to the configdb then I updated the snmpd.conf.j2 file to support the updates via the configdb while still using the variables in the snmp.yml file in parallel.  This way we will have backward compatibility until we can fully migrate to the configdb only. 

By updating the snmpd.conf.j2 template and running the sonic-cfggen tool the snmpd.conf gets generated with using the values in both the configdb and snmp.yml file. 

Co-authored-by: trvanduy <trvanduy@microsoft.com>
2021-01-05 13:43:29 -08:00
Vaibhav Hemant Dixit
6fd78b6e3d
[determine-reboot-cause] Ignore non-hardware reboot cause (#6349)
- Why I did it - Reboot cause prints "Non-Hardware (N/A)" instead of showing the software reboot cause.
The issue is mishandling of hardware reboot cause in determine-reboot-cause script.

- How I did it
Fixed the handling for Non Hardware reboot cause. Ignore if Non-Hardware is present in the hardware_reboot_cause output. Added some code refactoring for simplicity.

- How to verify it - With fix, the hardware reboot cause is ignored (if it is non hw):
2021-01-05 10:47:42 -08:00
Junchao-Mellanox
4460076db1
[xcvrd] Remove dependency on SONIC_PLATFORM_API_PY2 and SONIC_PLATFORM_API_PY3 (#6344)
Remove the build time dependency on SONIC_PLATFORM_API_PY2 and SONIC_PLATFORM_API_PY3 from xcvrd make rule
2021-01-05 09:39:52 -08:00
Akhilesh Samineni
62e7c452d0
After first bootup, the FEATURE table is not present in CONFIG_DB (#5911)
Fix the After first bootup(onie-install), the FEATURE table is not present in CONFIG_DB. 
Fix is done by calling config reload.
2021-01-05 09:22:16 -08:00
xumia
95936805e0
Install the latest version of the sonic build hooks in slave container (#6348) 2021-01-05 19:05:13 +08:00
sudhanshukumar22
7fc2d381a9
[lldp] lldp upstream patches (#6118)
The details are as follows:

    1. 0010-Ported-fix-for-length-exceeded-from-lldp-community.patch

    Patch taken from 78243478dc

    lib: remove limit on system description length

    The limit was introduced in 9c49ced while fixing a memory leak.
    The state data is used to ensure we don't interleave operations. We
    need to handle the case where the value is truncated because it is
    larger than the allocated size.

    Fix issue https://github.com/lldpd/lldpd/issues/408

    2. 0011-fix-med-location-len.patch
    
    Patch taken from 5c3479463a

    lib: fix LLDP-MED location parsing in liblldpctl

    Some bounds were not checked correctly when parsing LLDP-MED civic
    location fields. This triggers out-of-bound reads (no write) in
    lldpcli, ultimately leading to a crash.

    Fix https://github.com/lldpd/lldpd/pull/420

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
2021-01-04 23:19:50 -08:00
lguohan
6fbd52b1c1
[docker-sonic-vs]: reduce the build steps for docker-sonic-vs (#6350)
combine multiple same operation into one operation to reduce
the build steps. this is to avoid max depth exceeded issue
in the build.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-04 21:57:58 -08:00
Arun Saravanan Balachandran
19b2b44638
[DellEMC] Add platform-modules as prerequisite for determine-reboot-cause (#6322)
Add a systemd dependency to make platform-modules service as a prerequisite for determine-reboot-cause service to ensure platform initialization is complete before determine-reboot-cause.service executes.
2021-01-04 20:33:44 -08:00
sandycelestica
c38eee8294
Update udev rules to support 96 ttyUSB ports (#6334)
Signed-off-by: Jing Kan jika@microsoft.com
2021-01-05 10:26:05 +08:00
Myron Sosyak
9006b961b7
[BFN] Convert platform modules to python 3 (#6347)
Fix syntax errors during xcvrd start with Python 3 daemons
2021-01-04 15:00:48 -08:00
Sabareesh-Kumar-Anandan
d6b92da643
[libyang1] Adding LFS support for arm32 (#6346)
In the emulated armhf environment, the function readdir()returns NULL on a ext4 file system directory. When running the libyang1 test cases, it will require to load the plugins from the files (such as metadata.so), because the readdir() is failing, the plugins can’t be loaded in the emulated armhf environment, so it causes libyang1 test error. This error is a combination of the following reasons.
• Emulation of a 32-bit target from a 64-bit host –> qemu from x86_64 to armhf
• Glibc version > 2.27 – Debian buster is using glibc 2.28

- How I did it
Enabled large file support by setting _FILE_OFFSET_BITS=64 for libyang1.

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
2021-01-04 12:28:32 -08:00
Denys Petryshyn
9e57f5157e
[BFN] Upgrade docker-syncd-bfn to buster (#6345)
* Add changes to allow migration of bfn syncd to buster

* Update BFN packages for Debian 10

Signed-off-by: Denys Petryshyn <denysx.petryshyn@intel.com>
2021-01-04 10:38:53 -08:00
lguohan
ae5caee515
[frr]: change frr debug package to extra to avoid build break for dbg image (#6340)
build frr dbg image force to install frr in the build process
which breaks the current build and is uneccessary.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-03 17:06:05 -08:00
lguohan
618b4bf243
[broadcom]: match the brcm sai filename version to control file version (#6339)
the control file version is 4.2.1.5-7

sonic$ sudo dpkg -i target/debs/buster/libsaibcm-dev_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm-dev_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm-dev (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm-dev (4.2.1.5-7) ...
lgh@491d842369cf:/sonic$ sudo dpkg -i target/debs/buster/libsaibcm_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm (4.2.1.5-7) ...
Processing triggers for libc-bin (2.28-10) ...

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-03 08:28:05 -08:00
Mahesh Maddikayala
c6253f6334
[sonic-py-common] Added an API to get file path containing SONiC version (#6309)
* [sonic-py-common] add an API to get file path containing SONiC version so that the API can be mocked for unit tests.
2021-01-03 08:10:28 -08:00
xumia
36fbc01a59
Fix the hostimage version path permission issue (#6337) 2021-01-03 17:32:20 +08:00
Guohan Lu
ae2cb47091 [build]: add artifical dependency between libyang and frr
frr build requires libyang 1.0.184 which conflicts with
libyang 1.0.73. Solution here is to compile frr and libyang 1.0.184
first, and then uninstall libyang 1.0.184 after frr build.
Then, compile libyang 1.0.73 and all packages depend on it later.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-02 12:45:32 -08:00
Guohan Lu
f2c418e78b [frr]: remove dependency betwee frr and frr-snmp
no instalation of frr during the build process

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-02 12:45:32 -08:00
Guohan Lu
d365d0d99e [build]: check if package with given version is installed or not
in case conflicting packages have same name but different version,
the install step needs to wait till package with the conflicting
version is uninstalled

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-02 12:45:32 -08:00
Guohan Lu
30a51c1ff7 [build]: fix dpkg uninstall bug
fix a bug when there are multiple debian packages to be uninstalled

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-02 12:45:32 -08:00
Guohan Lu
66fffedc8a [doc]: add 202012 branch in the PR request template
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-02 11:20:17 -08:00
brandonchuang
c4156b8745
[device/accton] Fix accton driver not been installed (#6327)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
2021-01-02 08:02:30 -08:00
Kebo Liu
3acf7006ed
[mellanox]: Update Mellanox SDK to 4.4.2208 FW to *.2008.2208 (#6333)
Features:
    Spectrum-3 | Systems | Added GA-level support for SN4700 A0 system
    All | Shared headroom | Added GA-level support for Shared headroom between PGs

  Bugs fixes:
    All | Counters | Sent traffic in certain size is wrongly increase to a smaller size counter, because port extended counter has a counter for sent traffic per packet-size range
    All | Shared buffer | Configuring shared buffer on the fly may, on occasion, cause the chip to get stuck
    Spectrum-2 | Modules | On occasion, link down is experienced with INPHI COLORZ PAM4 100G optic cables on SN3700 systems
2020-12-31 17:44:02 -08:00
Tony Titus
70ba59109d
[sonic-swss-common] Update submodule (#6317)
Including commits in sonic-swss-common repo:

b423b9c Add support for hexists call (#432) [Tony Titus]
0982996 Remove extension of tableNameSeparatorMap (#430) [Qi Luo]
d16cc76 [build]: add azure pipeline build badge (#429) [lguohan]
f2aaf55 Set up CI with Azure Pipelines (#428) [lguohan]
2020-12-31 17:43:09 -08:00
abdosi
ef0088c29f
Enable the notify mode of rsyslogd imfile module used for supervisord (#6298)
Enable the notify mode of rsyslogd imfile module used for supervisord logs in docker container
2020-12-31 17:01:57 -08:00
Guohan Lu
a165e632e3 [build]: fix syntax error when DOCKER_BASE_PULL is enabled
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-12-31 00:46:21 -08:00
Vaibhav Hemant Dixit
40f69f0aac
[determine-reboot-cause] Skip invoking platform code for unit tests for hardware_reboot_cause (#6325)
What: Modify unit test to not call any platform dependent api in test_find_hardware_reboot_cause.

- Why I did it
MELLANOX build is failing for the recent PRs. The errors are due to platform library being invoked in a unit test for determine-reboot-cause script.

Verified by running unit tests and a successful Mellanox build.

Co-authored-by: Vaibhav Hemant Dixit <vadixit@microsoft.com>
2020-12-30 17:48:56 -08:00
kktheballer
ba92a081ce
Minigraph ECMP parsing support (cleaner format) (#4985)
Why I did it
To support FG_ECMP  scenarios
- How I did it
Modified minigraph parser to parse ECMP fields in the case they are present in minigraph
- How to verify it
Loaded ensuing config_db file on a DUT to verify the fields are parsed and configure device correctly
2020-12-30 15:18:21 -08:00
carl-nokia
a1fe203788
[Nokia]: EEPROM platform API Python3 compliance changes (#6318)
- Why I did it
Make EEPROM platform APIs Python3 compliant in Nokia platform.

- How I did it
Handle bytearray type returned by read_eeprom and read_eeprom_bytes methods.

- How to verify it
Boot Nokia ixs7215 and verify PMON docker running and show platform syseeprom

Co-authored-by: Carl Keene <keene@nokia.com>
2020-12-30 07:34:57 -08:00
Danny Allen
a64994ec29
[sysctl] Increase hung_task_timeout_secs to 300 (#6312)
Depending on the performance characteristics of a given hardware platform, it's possible to exceed the default 120 second kernel timeout during I/O intensive operations like image installation. This can cause a kernel panic like so:

kernel:[ 852.441781] Kernel panic - not syncing: hung_task: blocked tasks

If this happens during image installation, it's possible for the install to become corrupted and leave the device in an unreachable state that requires a power cycle to resolve. This risk increases as image size continues to increase. So, we need to increase the timeout so that we don't encounter kernel panics on devices with lower disk throughput.

Signed-off-by: Danny Allen <daall@microsoft.com>
2020-12-30 05:00:16 -08:00
Nazarii Hnydyn
119fd7f577
[buildsystem] Fix syntax error: unexpected end of file in Makefile.work (#6315)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-12-30 04:59:16 -08:00
lguohan
de4a3c8f2f
[build]: change user name to lower case when used in sonic-slave tag (#6319)
sonic-slave tag only allows all lower case. In case the user
name is mixed case, we need to change user name to all lower case.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-12-30 04:58:20 -08:00
lguohan
727a451fed
[build]: setup -t option in docker run correctly (#6320)
use bash -t test flag to check if input device is tty or not

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-12-30 04:57:44 -08:00
Joe LeVeque
566ea4f601
[system-health] Convert to Python 3 (#5886)
- Convert system-health scripts to Python 3
- Build and install system-health as a Python 3 wheel
- Also convert newlines from DOS to UNIX
2020-12-29 14:04:09 -08:00
Joe LeVeque
62662acbd5
No longer install some unnecessary Python 2 packages in host (#6301)
- No longer install Python 2 packages in host:
    - libpython2.7-dev
    - docker
    - ipaddress
    - netifaces
    - azure-storage
    - watchdog
    - futures

- Install Python 3 versions of the following packages in host:
    - docker
    - azure-storage
    - watchdog
    - redis
    - swsssdk (install unconditionally)
2020-12-29 13:02:11 -08:00
Pavel Shirshov
a7b8f8914e Patch libyang1.0.184 so version and let frr 7.5 use the patched version 2020-12-29 03:44:49 -08:00