Recently, we found that on some of our testbeds the entropy collection process finishes more than 60 seconds after the system starts.
This sporadically prevents swss from starting.
Installing haveged accelerates the entropy collection process.
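One way to measure how long entropy collection takes on a given box is to poll the kernel's `/proc/sys/kernel/random/entropy_avail` counter. The sketch below is a hypothetical diagnostic, not part of this change. The 256-bit threshold and the timeout are assumptions, and the reader function can be swapped out so the logic can be exercised without a live system.

```python
import time
from typing import Callable, Optional

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate in bits."""
    with open(path) as f:
        return int(f.read().strip())


def seconds_until_entropy(
    threshold_bits: int = 256,  # assumed "ready" level
    timeout_s: float = 120.0,
    poll_s: float = 1.0,
    reader: Callable[[], int] = read_entropy_avail,
) -> Optional[float]:
    """Poll until the entropy estimate reaches threshold_bits.

    Returns the elapsed seconds, or None if timeout_s passes first.
    """
    start = time.monotonic()
    while True:
        if reader() >= threshold_bits:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_s)
```

Running this on a testbed with and without haveged installed would show whether the roughly 60-second delay goes away.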
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
This PR uses Monit to monitor the critical processes in the PMon container on the 201911 branch.
How I did it
I created a Monit configuration template, which the service `generate_monit_config.service` renders
to generate the Monit configuration file for the PMon container.
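The rendering step can be sketched as follows, assuming a critical-process list made of `program:<name>` lines. The `process_checker` path, the command-line pattern and the alert thresholds are hypothetical placeholders, not the PR's actual template.

```python
from string import Template
from typing import Iterable, List

# Hypothetical Monit stanza: checker path, command-line pattern and
# thresholds are assumptions, not the template shipped in the PR.
STANZA = Template(
    'check program pmon|$proc with path "/usr/bin/process_checker pmon /usr/bin/$proc"\n'
    "    if status != 0 for 5 times within 5 cycles then alert\n"
)


def parse_critical_processes(text: str) -> List[str]:
    """Collect names from 'program:<name>' lines; other entries are ignored."""
    procs = []
    for line in text.splitlines():
        kind, _, name = line.strip().partition(":")
        if kind == "program" and name:
            procs.append(name)
    return procs


def render_monit_config(procs: Iterable[str]) -> str:
    """Render one Monit 'check program' stanza per critical process."""
    return "".join(STANZA.substitute(proc=p) for p in procs)
```

Because the stanzas are rendered from the process list, adding a daemon to PMon only requires updating that list, not the Monit file by hand.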
How to verify it
I verified this on a Mellanox device str-msn2700-03 and an Arista device str-a7050-acs-1.
Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [x] 201911
- [ ] 202006
- [ ] 202012
New features and fixes in the new SDK/FW:

| Area | Change |
| --- | --- |
| SN4600C | AN/LT support |
| SN2700 | AN/LT bug fixes |
| WJH | FID_MISS support |

Signed-off-by: Kebo Liu <kebol@nvidia.com>
The issue is that get-pip.py has moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main), which is not compatible with Python 3.6.
The issue in pip itself is fixed in pip 21.1.1 (pypa/pip#9835).
However, get-pip.py has not yet been updated to the latest pip. Also, get-pip.py does not explicitly support Python 3.6 (pypa/get-pip#88).
Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
---> Running in bece31f49267
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 1891k 100 1891k 0 0 9564k 0 --:--:-- --:--:-- --:--:-- 9600k
Traceback (most recent call last):
File "<stdin>", line 24298, in <module>
File "<stdin>", line 139, in main
File "<stdin>", line 115, in bootstrap
File "<stdin>", line 96, in monkeypatch_for_cert
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module>
ImportError: cannot import name 'NoReturn'
The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py | python3.6' returned a non-zero code: 1
How I did it:
Took get-pip.py from https://github.com/pypa/get-pip/tree/21.0 and added it to the buildimage.
Pinned pip to the previous release, 21.0.1 (other public repos do the same, e.g. grpc/grpc-java#8115).
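The traceback fails on importing `NoReturn`. As far as we know, pip 21.1 imports `typing.NoReturn`, which the standard library only provides from Python 3.6.2 (and 3.5.4). The sketch below shows an alternative, conditional pin based on that check. The PR itself pins unconditionally, so treat this as an illustration of the root cause only.

```python
import typing

PINNED_PIP = "21.0.1"  # previous release, as pinned in this PR


def pip_requirement(pinned: str = PINNED_PIP) -> str:
    """Pick the pip requirement to hand to get-pip.py for this interpreter.

    Interpreters whose typing module lacks NoReturn (3.6.0/3.6.1) would hit
    the ImportError shown above with pip 21.1, so fall back to the pin.
    """
    if hasattr(typing, "NoReturn"):
        return "pip"
    return "pip==" + pinned
```

The result would be passed on the command line, e.g. `python3.6 get-pip.py pip==21.0.1`. This relies on get-pip.py forwarding requirement arguments to `pip install`.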
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
a364614 2021-04-22 | [201911][acl] Use a list instead of a comma-separated string for ACL port list (#1576) [Danny Allen]
391e524 2021-04-15 | [201911] Fix Multi-ASIC show specific recursive route (#1563) [gechiang]
#### Why I did it
Since there will be multiple `dhcrelay` processes if the `VLAN_INTERFACE` table of `CONFIG_DB` contains different VLANs,
each `dhcrelay` process needs a unique service name in the Monit configuration file. Otherwise, the Monit service will fail to work.
#### How I did it
I appended the VLAN name to the end of each service name so that the names are unique.
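The naming scheme can be sketched as follows: one Monit stanza per VLAN, with the VLAN name appended to the service name. The stanza text, including the `process_checker` path and the `-id` command-line fragment, is a hypothetical placeholder rather than the PR's template.

```python
from typing import Iterable

# Hypothetical stanza: the checker path and the dhcrelay command-line
# fragment are assumptions; only the unique-name scheme is the point.
CHECK_TEMPLATE = (
    "check program dhcp_relay|dhcrelay_{vlan} with path "
    '"/usr/bin/process_checker dhcp_relay /usr/sbin/dhcrelay -id {vlan}"\n'
    "    if status != 0 for 5 times within 5 cycles then alert\n"
)


def render_dhcrelay_checks(vlans: Iterable[str]) -> str:
    """Emit one stanza per distinct VLAN; sorting keeps output deterministic."""
    return "".join(CHECK_TEMPLATE.format(vlan=v) for v in sorted(set(vlans)))
```

Without the `_{vlan}` suffix, every stanza would declare the same service name, and Monit rejects duplicate service names.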
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
#### Why I did it
- An xcvrd crash was seen in the latest 201811 images.
- On Dell S6100, API 2.0 uses poll mode, while API 1.0 was still using interrupt mode.
#### How I did it
- Modified `get_transceiver_change_event` in API 1.0 to use poll mode in all related branches.
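Poll mode replaces waiting on an interrupt with periodically re-reading presence and diffing it against the last known state. The sketch below is modelled loosely on the platform API's change-event query. The signature, the `get_presence` callback and the timeout semantics are assumptions, not the Dell implementation.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def poll_transceiver_change_event(
    ports: Iterable[int],
    get_presence: Callable[[int], bool],
    last_state: Dict[int, bool],
    timeout_ms: int = 0,
    poll_interval_s: float = 1.0,
) -> Tuple[bool, Dict[str, str]]:
    """Poll-mode sketch of a transceiver change-event query.

    Compares each port's presence with last_state (updated in place) and
    returns (True, {port: "1" | "0"}) for ports that changed. timeout_ms == 0
    blocks until a change is seen; otherwise an empty dict is returned once
    the timeout expires. These semantics are assumptions.
    """
    ports = list(ports)
    deadline = None if timeout_ms == 0 else time.monotonic() + timeout_ms / 1000.0
    while True:
        events: Dict[str, str] = {}
        for port in ports:
            present = bool(get_presence(port))
            if last_state.get(port) != present:
                last_state[port] = present
                events[str(port)] = "1" if present else "0"
        if events:
            return True, events
        if deadline is not None and time.monotonic() >= deadline:
            return True, {}
        time.sleep(poll_interval_s)
```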
Backport of https://github.com/Azure/sonic-buildimage/pull/7309 to the 201911 branch
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
This PR uses Monit to monitor the critical processes in the router advertiser and dhcp_relay containers.
How I did it
The router advertiser container runs only on T0 devices, and such a device should have at least one VLAN interface
configured with an IPv6 address. In addition, the router advertiser container does not run on devices whose
deployment type is 8.
Therefore, I created a service that dynamically generates the router advertiser's Monit configuration file from a
template.
Similarly, the Monit configuration file for dhcp_relay is also generated from a template, since the number of dhcrelay processes in the dhcp_relay container depends on the number of VLANs.
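The conditions for generating the router advertiser's Monit file can be sketched as a single predicate. The sketch assumes a JSON-style CONFIG_DB dump in which T0 devices have `DEVICE_METADATA|localhost` type `ToRRouter` and the deployment is stored in `deployment_id`. Both field names are assumptions, not taken from the PR.

```python
import ipaddress
from typing import Any, Dict


def should_monitor_radv(config_db: Dict[str, Any]) -> bool:
    """Decide whether radv's Monit configuration should be generated.

    VLAN_INTERFACE keys are assumed to look like "Vlan1000|fc02:1000::1/64".
    """
    meta = config_db.get("DEVICE_METADATA", {}).get("localhost", {})
    # T0 devices are assumed to carry type "ToRRouter".
    if meta.get("type") != "ToRRouter":
        return False
    # radv does not run on deployment 8 (field name assumed).
    if str(meta.get("deployment_id", "")) == "8":
        return False
    for key in config_db.get("VLAN_INTERFACE", {}):
        _, sep, prefix = key.partition("|")
        if not sep:
            continue  # bare "VlanXXXX" entries carry no address
        try:
            if ipaddress.ip_interface(prefix).version == 6:
                return True
        except ValueError:
            continue
    return False
```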
How to verify it
I verified this implementation on a DuT.
See the error below:
+ sudo https_proxy= LANG=C chroot ./fsroot easy_install pip==20.3.3
Searching for pip==20.3.3
Reading https://pypi.python.org/simple/pip/
Couldn't find index page for 'pip' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
No local packages or working download links found for pip==20.3.3
error: Could not find suitable distribution for Requirement.parse('pip==20.3.3')
How I fixed it:
Install python-pip via apt-get.
Pin the version to 20.3.3.
The master branch has the same changes.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
With the latest 201911 image, the following error was seen on staging devices when running the TSB command (on both single-ASIC and multi-ASIC devices). Although this error message does not affect TSB functionality, it should be fixed.
admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30
In addition, this PR fixes the message displayed to the user when no BGP neighbors are configured on a BGP instance. On a multi-ASIC device, there can be cases where no BGP neighbors are configured on a particular ASIC.
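The "Could not find route-map entry" errors come from issuing `no route-map ...` for entries that are not configured. One way to avoid them is to remove only entries present in the running configuration. The helper below is a hypothetical sketch of that idea, not the PR's actual change to the TSB script.

```python
import re
from typing import Iterable, List, Set, Tuple

Entry = Tuple[str, str, int]  # (route-map name, action, sequence)

ROUTE_MAP_RE = re.compile(r"^route-map\s+(\S+)\s+(permit|deny)\s+(\d+)$")


def existing_entries(running_config: str) -> Set[Entry]:
    """Collect route-map entries declared in 'show running-config' text."""
    entries: Set[Entry] = set()
    for line in running_config.splitlines():
        m = ROUTE_MAP_RE.match(line.strip())
        if m:
            entries.add((m.group(1), m.group(2), int(m.group(3))))
    return entries


def removal_commands(running_config: str, wanted: Iterable[Entry]) -> List[str]:
    """Emit 'no route-map ...' only for entries that actually exist."""
    present = existing_entries(running_config)
    return [f"no route-map {name} {action} {seq}"
            for name, action, seq in wanted
            if (name, action, seq) in present]
```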
4a497407c8697a8c531ab999da95936ac1e71c9b (HEAD -> 201911, origin/201911) Fix the LLDP_LOC_CHASSIS not getting populated if no remote neighbors are present (#39)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Fix #7248
The issue is similar to martinblech/xmltodict#47.
The correct solution is to change mockredispy to move
nose from `setup_requires` to `tests_require`.
The quick workaround is to install nose explicitly.
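The "correct solution" above would look roughly like this in mockredispy's `setup.py`. This is a hypothetical excerpt showing only the relevant setuptools keywords. The entry-point group shown is the conventional nose plugin group and is an assumption about the original file.

```python
# Hypothetical excerpt of a fixed mockredispy setup.py; these keywords
# would be passed to setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = dict(
    name="mockredispy",
    # Nothing has to be fetched just to run setup.py (e.g. `egg_info`).
    setup_requires=[],
    # nose is needed only to run the test suite.
    tests_require=["nose"],
    # Declaring the plugin entry point is plain metadata and does not
    # require nose to be importable at install time.
    entry_points={
        "nose.plugins.0.10": [
            "with_redis = mockredis.noseplugin:WithRedis",
        ],
    },
)
```

With nose out of `setup_requires`, `pip install mockredispy` no longer triggers the `easy_install` fetch that fails in the log below.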
This fixes the following build issue:
05:09:37 Downloading mockredispy-2.9.3.tar.gz (17 kB)
05:09:39 ERROR: Command errored out with exit status 1:
05:09:39 command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-sypos2ry/mockredispy_ab86cd14995544df90f78a63ab7041a3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-sypos2ry/mockredispy_ab86cd14995544df90f78a63ab7041a3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-ymhn19ne
05:09:39 cwd: /tmp/pip-install-sypos2ry/mockredispy_ab86cd14995544df90f78a63ab7041a3/
05:09:39 Complete output (23 lines):
05:09:39 Couldn't find index page for 'nose' (maybe misspelled?)
05:09:39 No local packages or working download links found for nose
05:09:39 Traceback (most recent call last):
05:09:39 File "<string>", line 1, in <module>
05:09:39 File "/tmp/pip-install-sypos2ry/mockredispy_ab86cd14995544df90f78a63ab7041a3/setup.py", line 29, in <module>
05:09:39 'with_redis = mockredis.noseplugin:WithRedis'
05:09:39 File "/usr/lib/python3.5/distutils/core.py", line 108, in setup
05:09:39 _setup_distribution = dist = klass(attrs)
05:09:39 File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 317, in __init__
05:09:39 self.fetch_build_eggs(attrs['setup_requires'])
05:09:39 File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 372, in fetch_build_eggs
05:09:39 replace_conflicting=True,
05:09:39 File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 846, in resolve
05:09:39 dist = best[req.key] = env.best_match(req, ws, installer)
05:09:39 File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1118, in best_match
05:09:39 return self.obtain(req, installer)
05:09:39 File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1130, in obtain
05:09:39 return installer(requirement)
05:09:39 File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 440, in fetch_build_egg
05:09:39 return cmd.easy_install(req)
05:09:39 File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 693, in easy_install
05:09:39 raise DistutilsError(msg)
05:09:39 distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('nose')
05:09:39 ----------------------------------------
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Problem:
By default, `groupadd` for redis takes GID 1000. This forces the subsequently created admin group to get GID 1001.
As all TACACS users are created with GID 1000, they end up in the redis group.
Fix:
Create the redis group *after* the admin group is created.
Add a check that the admin group ID is 1000.
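Why the creation order matters can be shown with a simplified model of `groupadd` GID allocation. The model assumes the Debian default `GID_MIN` of 1000 and that each new group gets one more than the highest GID already allocated in that range. The check helper parses `/etc/group`-formatted text. Both are illustrative sketches, not the build script's code.

```python
from typing import Dict, Iterable, Optional

GID_MIN = 1000  # assumed default from /etc/login.defs


def allocate_gids(order: Iterable[str], gid_min: int = GID_MIN) -> Dict[str, int]:
    """Simplified groupadd model: next GID above the highest one in range."""
    gids: Dict[str, int] = {}
    for name in order:
        used = [g for g in gids.values() if g >= gid_min]
        gids[name] = max(used) + 1 if used else gid_min
    return gids


def gid_of(group_file: str, name: str) -> Optional[int]:
    """Look up a group's GID in /etc/group-formatted text."""
    for line in group_file.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


def check_admin_gid(group_file: str, expected: int = 1000) -> None:
    """Fail loudly if the admin group did not get the expected GID."""
    gid = gid_of(group_file, "admin")
    if gid != expected:
        raise RuntimeError(f"admin group GID is {gid}, expected {expected}")
```

In this model, `allocate_gids(["redis", "admin"])` reproduces the bug (admin gets 1001). Reversing the order gives admin 1000.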
#### Why I did it
Plexus-utils before 3.0.16 is vulnerable to command injection because it does not correctly process the contents of double-quoted strings.
#### How I did it
Upgrade to 3.0.16
The motivation for these changes is to fix #6051:
- Why I did it
To fix the CPU C-states configuration.
- How I did it
Updated the code to be POSIX-compatible.
- How to verify it
root@sonic:/home/admin# sonic_installer install sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Feb 17 Fix tests failing due to duplicate vxlan tunnel creation (#75)
Mar 11 Update route api to specify limitation (#77)
Apr 01 Add host_ifname field while adding entry in VLAN table (#80)
Fix the following issues:

| Systems | Area | Issue |
| --- | --- | --- |
| Spectrum-2, Spectrum-3 | Port | Link issue when using the 25 GbE rate between two ports where one is on a Spectrum-2-based system and the other is on a Spectrum-3-based system |
| All | warmboot | Failure to upgrade from earlier SONiC versions with the official SDK/FW 4.4.2306 (was on SONiC 201911) |
| All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low-probability conditions, an error could occur that may result in loss of data, channel failure, or, in extreme cases, SW failure |

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Make sure the Everflow table is always classified as a Mirror table, and not as a Control Plane table, on multi-ASIC platforms.
Why I did it:
On multi-ASIC platforms, we generate Everflow ACL table data from minigraph for both the host and the namespaces.
In a multi-ASIC minigraph with no external port channels (only router-port IP interfaces), the Everflow table has no bound interface in the host and is classified as a Control Plane ACL, while in the namespaces it is classified as a Mirror table.
For ACL rule generation, we read the global DB as the source of truth for ACL table information. So if the Everflow table is classified as Control Plane there, the generated Everflow rules can have an invalid action, causing orchagent to throw a runtime error.
How I did it:
If the table is attached to the ERSPAN interface in minigraph, it is always classified as a Mirror table.
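The classification rule can be sketched as a small decision function: an ERSPAN attachment is checked first, so the host (with no bound ports) and the namespaces both land on `MIRROR`. The input shapes and the simplified fallback to `L3` are assumptions, not the minigraph parser's actual logic.

```python
from typing import Iterable


def classify_acl_table(attach_to: Iterable[str], bound_ports: Iterable[str]) -> str:
    """Simplified ACL table type decision for one minigraph ACL entry.

    attach_to: raw AttachTo tokens from minigraph (assumed shape).
    bound_ports: ports/port channels the table binds to in this namespace.
    """
    tokens = [t.strip().upper() for t in attach_to]
    # An ERSPAN attachment wins regardless of bound ports, so host and
    # namespaces agree on MIRROR.
    if any(t.startswith("ERSPAN") for t in tokens):
        return "MIRROR"
    # No data-plane ports left: treat it as a control-plane ACL.
    if not list(bound_ports):
        return "CTRLPLANE"
    return "L3"
```

The key design point is ordering. Checking the port list first would reproduce the bug, because the host has no external port channels to bind.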
ecc1f9b1bb0ad18843e0f969fe8564cf37bf2080 (HEAD -> 201911, origin/201911) [acl_loader]: add iptype match to the rules for dataplane acl
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
ad9022ebf9c13b59ef8dc47aaa1f89628e64315e (HEAD -> 201911, origin/201911) Reduce time taken by show commands on multi-asic platforms (#1544)
4993a3644bff689701aac2ee2b10c351a9d241ef [fast-reboot]: Fix fail to execute fast-reboot problem (#1047)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
On S6000 devices, the cold reboot is abrupt and is likely to cause issues that leave the device stuck in the EFI shell. Hence, the platform reboot now happens only after all filesystems are gracefully unmounted, as on the S6100.
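A graceful unmount needs child mount points to be unmounted before their parents. The sketch below derives that order from `/proc/mounts`-formatted text. It is a hypothetical helper, not the S6000/S6100 reboot script. The set of skipped pseudo/root filesystems is an assumption.

```python
from typing import Iterable, List

# Root and pseudo filesystems left alone in this sketch (assumed list).
SKIP = ("/", "/proc", "/sys", "/dev")


def unmount_order(proc_mounts: str, skip: Iterable[str] = SKIP) -> List[str]:
    """Return mount points from /proc/mounts text, deepest first."""
    skipped = set(skip)
    points = set()
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] not in skipped:
            points.add(fields[1])
    # More path components sort first, so children precede their parents.
    return sorted(points, key=lambda p: (p.rstrip("/").count("/"), p), reverse=True)
```

Each returned path would then be unmounted in turn before the platform reboot is triggered.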
Bug fixes
- Remove the critical trip points from thermal zones to prevent unexpected system shutdown triggered by software:
  - Kernel 4.9: 0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
  - Kernel 4.19: 076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
- hw-mgmt: thermal: Add hardcoded critical trip point
- Remove the redundant link for cpld3 on fixed systems (SN2100, SN2010).
- Fix an issue with a missing attribute for cpld3 (port CPLD) on SN2700 and SN2410.
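To verify on a running system which thermal zones still expose a critical trip point, one could scan the standard thermal sysfs tree. Trip point temperatures there are in millidegrees Celsius. This is a hypothetical verification helper, not part of the patches.

```python
import glob
import os
from typing import List, Tuple


def critical_trip_points(root: str = "/sys/class/thermal") -> List[Tuple[str, int]]:
    """Return (zone, temp in millidegrees C) for each 'critical' trip point."""
    found = []
    pattern = os.path.join(root, "thermal_zone*", "trip_point_*_type")
    for type_path in sorted(glob.glob(pattern)):
        with open(type_path) as f:
            if f.read().strip() != "critical":
                continue
        # trip_point_N_type has a sibling trip_point_N_temp.
        temp_path = type_path[: -len("_type")] + "_temp"
        with open(temp_path) as f:
            temp = int(f.read().strip())
        found.append((os.path.basename(os.path.dirname(type_path)), temp))
    return found
```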
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Run the VNET route consistency check periodically.
On any failure, Monit raises an alert based on the return code.
The tool logs the required details.
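The check-and-alert contract can be sketched as follows: diff the expected VNET routes against the installed ones, log every mismatch, and return a nonzero code on failure. The `(vnet, prefix)` route representation and the log messages are assumptions, and loading routes from the databases is left out.

```python
import logging
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (VNET name, prefix); assumed representation

log = logging.getLogger("vnet_route_check")


def diff_routes(expected: Iterable[Route], installed: Iterable[Route]) -> Tuple[List[Route], List[Route]]:
    """Return (missing, unexpected) routes, each sorted."""
    exp, inst = set(expected), set(installed)
    return sorted(exp - inst), sorted(inst - exp)


def run_check(expected: Iterable[Route], installed: Iterable[Route]) -> int:
    """Log each inconsistency; return 0 when consistent, 1 otherwise."""
    missing, extra = diff_routes(expected, installed)
    for vnet, prefix in missing:
        log.error("VNET route missing from ASIC: %s %s", vnet, prefix)
    for vnet, prefix in extra:
        log.error("Unexpected VNET route in ASIC: %s %s", vnet, prefix)
    return 1 if (missing or extra) else 0
```

A wrapper script would pass the result to `sys.exit()`. A Monit `check program` stanza with `if status != 0 then alert` could then raise the alert.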