Improve SSHD config to use more secure settings
#### Why I did it
According to Sonic OS review result, SSHD config file /etc/ssh/sshd_config using insecure settings.
#### How I did it
Change build_debian.sh script to set following settings to /etc/ssh/sshd_config:
ClientAliveInterval is set to 300
MaxAuthTries is set to default of 3
Banner set to /etc/issue
LogLevel is set to VERBOSE
#### How to verify it
Pass all E2E test case.
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Improve SSHD config to use more secure settings
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
Some Arista products do not have an SSD but use an eMMC instead.
The SsdUtil plugin is therefore extended to support both.
How I did it
Implemented ssd_util.py platform plugin loaded by ssdutil.
This plugin fallback to the default SONiC implementation if the arista one can't be found.
How to verify it
Run show platform ssdhealth on a product with an eMMC
- Why I did it
As part of Persistent log level HLD , LOGLEVEL_DB content is moved to CONFIG_DB.
In addition, it was decided to remove jinja2_cache which currently appear on LOGLEVEL_DB
This cache was added to speed up template rendering in start scripts. There were a lot of them rendered during system start. This caused a delay in warm boot LAG restore time. It was tested and verified that with and without the cache we don't see any difference in this timing now. It is probably due to a lot of other optimizations done to sonic-cfggen. Since there is no noticeable improvement made by j2 cache now it is safe to remove it.
- How I did it
Remove redis_bcc.py file and and remove the bytcode_cache from sonic-sfggen
- How to verify it
Warm boot was tested with \ without this jinja2_cache and it there is no difference in performance
- Why I did it
interfaces-config service restarts networking service, during the restart loopback interface address is being removed and reassigned back, leaving loopback without an ipv4 address for a while.
On SONiC startup and config reload interfaces-config and bgp services start in parallel and sometimes
fpmsyncd in bgp attempts bind to loopback while it does not have an address, fails with the log
Exception "Cannot assign requested address" had been thrown in daemon
and exits with rc 0.
root@sonic:/# supervisorctl status
fpmsyncd EXITED Jul 20 05:04 AM
zebra RUNNING pid 35, uptime 6:15:05
zsocket EXITED Jul 20 05:04 AM
docker logs bgp
INFO exited: fpmsyncd (exit status 0; expected)
With fpmsyncd dead, configured routes do not appear in the database.
- How I did it
Added ordering dependency on interfaces-config service into bgp.config
- How to verify it
Itself the issue reproduces quite rarely, but one can gain the time interval between networking down and networking up in interfaces-config.sh like this:
diff --git a/files/image_config/interfaces/interfaces-config.sh b/files/image_config/interfaces/interfaces-config.sh
index f6aa4147a..87caceeff 100755
--- a/files/image_config/interfaces/interfaces-config.sh
+++ b/files/image_config/interfaces/interfaces-config.sh
@@ -63,7 +63,11 @@ done
# Read sysctl conf files again
sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf
-systemctl restart networking
+# systemctl restart networking
+
+systemctl start networking
+sleep 10
+systemctl stop networking
# Clean-up created files
rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json
with this change the issue reproduces on every config reload.
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Why I did it
Replace configuration parameter for gnmi write, and we will add other gnmi write features in the future.
How I did it
Update rules/config and other Makefile.
How to verify it
Build sonic image.
Why I did it
test_sai_qos failed because of the following error:
"stderr_lines": [
"Traceback (most recent call last):",
" File \"/usr/bin/ptf\", line 522, in <module>",
" test_modules = load_test_modules(config)",
" File \"/usr/bin/ptf\", line 413, in load_test_modules",
" mod = imp.load_module(modname, *imp.find_module(modname, [root]))",
" File \"saitests/switch.py\", line 19, in <module>",
" import switch_sai_thrift",
"ImportError: No module named switch_sai_thrift"
],
It's because test_sai_qos runs ptf script which imports switch_sai_thrift, switch_sai_thrift is installed from python-saithrift_0.9.4_amd64.deb.
For master image, the deb file is for python3, but ptf only has virtual python3 environment, that's why we add --system-site-packages to allow virtual env to access system site-packeges.
Add thrift package in docker ptf virtual python3 env, because currently env-python3 doesn't have thrift module which is needed in switch_sai_thrift.
How I did it
Enable --system-site-packages for virtual py3 env in ptf docker and install thrift for test_qos_sai
How to verify it
load and login ptf conatiner
dpkg - i python-saithrift_0.9.4_amd64.deb
source /root/env-python3/bin/activate
python
import switch_sai_thrift.switch_sai_rpc
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Why I did it
Current isc-dhcp uses below code to remove DHCP option:
memmove(sp, op, op[1] + 2);
sp += op[1] + 2;
sp points to the option to be stripped, we can call it as option S.
op points to the option after options S, we can call it as option O.
DHCP option is a typical type-length-value structure, the first byte is type, the second byte is length, and remain parts are value.
In this case, option O length is bigger than option S, and more than 2 bytes, after the memmove, we will get this result:
Now Option S and Option O are overwritten, op[1] was the length of Option O, and it's modified after memmove.
But current implementation is still using op[1] as length to update sp (sp+=op[1]+2), so we get the wrong sp.
How I did it
Create patch from https://github.com/isc-projects/dhcp
The new impelementation use mlen to store the length of Option O before memmove, that's how it fixed the bug.
size_t mlen = op[1] + 2;
memmove(sp, op, mlen);
sp += mlen;
How to verify it
I have a PR for sonic-mgmt to cover this issue:
sonic-net/sonic-mgmt#6330
Signed-off-by: Gang Lv ganglv@microsoft.com
* Make client indentity by AME cert
* Join k8s cluster by ipv6
* Change join test cases
* Test case bug fix
* Improve read node label func
* Configure kubelet and change test cases
* For kubernetes version 1.22.2
* Fix undefine issue
Signed-off-by: Yun Li <yunli1@microsoft.com>
Multi-asic Docker instances are created behind Docker's default bridge
which doesn't allow talking to other Docker instances that are in the
host network (like database-chassis).
On linecards, we configure midplane interfaces to let per-asic docker
containers talk to CHASSIS_DB on the supervisor through internal chassis
network.
On the supervisor we don't need to use chassis internal network, but we
still need a similar setup in order to allow fabric containers to talk
to database-chassis
- Why I did it
Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes:
• Spectrum-3 | PCI calibration changes from a static to a dynamic mechanism.
• [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
- Why I did it
ethtool print error logs when EEPROM of a SFP is not available. It prints error like this:
INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error
INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument
However, this log does not contain the relevant SFP index which is hard for developer/qa to find the exactly SFP.
- How I did it
Redirect ethtool stderr to subprocess and log it better
- How to verify it
Manual test
Why I did it
Replace unsafe functions in iccpd
How I did it
Replace memset() by zero initialization
Replace strtok() by strtok_r()
Signed-off-by: maipbui <maibui@microsoft.com>
*The sonic-frr was upgraded to FRR 8.2.2 as part of PR #10691. However, sonic-frr/frr submodule was still referring to previous 7.5 version. Update the sonic-frr/frr submodule to 8.2.2 commit id. Fixes issue #11484.
* Adding support for get/set low pwer mode for QSFPs in PDDF common APIs
* Adding support for get/set low pwer mode for QSFPs in PDDF common APIs - Review comments
The timer execution may fail if triggered during a config reload
(when the sonic.target is stopped). This might happen in a rare
situation if config reload is executed after reboot in a small
time slot (for 0 to 30 seconds) before the tacacs-config timer
is triggered. To ensure that timer execution will be resumed after
a config reload the WantedBy section of the systemd service is updated
to describe relation to sonic.target.
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
Why I did it
This PR is to update TC_TO_QUEUE_MAP|AZURE for SKU Arista-7050CX3-32S-D48C8 and Arista-7260CX3 T0.
The change is only to align the TC_TO_QUEUE_MAP for regular traffic and bounced traffic. It has no impact on business because we have no traffic being mapped to TC2 or TC6.
How I did it
Update TC_TO_QUEUE_MAP|AZURE , and test cases as well.
How to verify it
Verified by running test case test_j2files.py
/sonic/src/sonic-config-engine$ python3 setup.py test -s tests/test_j2files.py
running test
......
----------------------------------------------------------------------
Ran 29 tests in 25.390s
OK
Why I did it
If the SWSS services was restarted, the MACsec service should also be restarted. Otherwise the data in wpa_supplicant and orchagent will not be consistent.
How I did it
Add dependency in docker-macsec.mk.
How to verify it
Manually check by 'sudo service swss restart'.
The MACsec container should be started after swss, the syslog will look like
Sep 8 14:36:29.562953 sonic INFO swss.sh[9661]: Starting existing swss container with HWSKU Force10-S6000
Sep 8 14:36:30.024399 sonic DEBUG container: container_start: BEGIN
...
Sep 8 14:36:33.391706 sonic INFO systemd[1]: Starting macsec container...
Sep 8 14:36:33.392925 sonic INFO systemd[1]: Starting Management Framework container...
Signed-off-by: Ze Gan <ganze718@gmail.com>
Why I did it
Approve step needs special permission settings.
We already added permission setting to enable bypass merging PR.
So, approve step is not necessary.
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the serivce restart.
#### Why I did it
It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` wait forever for ```swss``` to be in ```"Running"``` state and it results in:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1abef1ecebff bcbca2b74df6 "/usr/local/bin/supe…" 22 hours ago Up 22 hours what-just-happened
3c924d405cd5 docker-lldp:latest "/usr/bin/docker-lld…" 22 hours ago Up 22 hours lldp
eb2b12a98c13 docker-router-advertiser:latest "/usr/bin/docker-ini…" 22 hours ago Up 22 hours radv
d6aac4a46974 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours mgmt-framework
d880fd07aab9 docker-platform-monitor:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours pmon
75f9e22d4fdd docker-snmp:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours snmp
76d570a4bd1c docker-sonic-telemetry:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours telemetry
ee49f50344b3 docker-syncd-mlnx:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours syncd
1f0b0bab3687 docker-teamd:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours teamd
917aeeaf9722 docker-orchagent:latest "/usr/bin/docker-ini…" 22 hours ago Exited (0) 22 hours ago swss
81a4d3e820e8 docker-fpm-frr:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours bgp
f6eee8be282c docker-database:latest "/usr/local/bin/dock…" 22 hours ago Up 22 hours database
```
The check for ```"Running"``` state is not needed because for cold boot case we do ```start_peer_and_dependent_services``` and for warm boot case the loop will retry to wait for container if this container is doing warm boot:
d01a91a569/files/image_config/misc/docker-wait-any (L56)
#### How I did it
Removed the check for ```"Running"```.
#### How to verify it
Kill swss before ```docker-wait-any``` is reached and verify auto restart will restart swss serivce.
- Why I did it
Update sonic-platform-daemons submodule pointer to include the following:
[ycabled] enable telemetry for 'active-active'; fix gRPC portid ordering (#284)
[ycabled] remove some spurious logs (#282)
Correct the peer forwarding state table (#281)
add psu input voltage and current (#276)
[ycabled] add capability to enable/disable telemetry (#279)
Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-utilities submodule pointer to include the following:
Replace cmp in acl_loader with operator.eq (#2328)
Subinterface vrf bind issue fix (#2211)
[VRF]Adding CLI checks to ensure Vrf is valid in interface bind and static route commands (#2333)
[doc]: Add MACsec CLI doc (#2334)
[sonic-package-manager] Drop 'expires_in' (#2002)
Handle non-front-panel ports in is_rj45_port (#2327)
[service_mgmt]: Fix fetch MULTI_INST_DEPENDENT bug in service_mgmt.sh.j2 (#2319)
correct an error by changing "show bgp summary" to "show bfd summary" (#2324)
Update VRF unbind command (#2331)
Fix issue: port_type is referenced before initialized (#2323)
Fix issue: exception in is_rj45_port in multi ASIC env (#2313)
Delete .DS_Store (#2244)
Fix bug with checking VRF's routes in route_check.py (#2301)
[decode-syseeprom] Fix setting use_db based on support_eeprom_db (#2270)
Fix vrf UT failed issue (#2309)
add lacp_rate to portchannel (#2036)
- Why I did it
Remove extra empty lines in the SN2201 pg_profile_lookup.ini to make it aligned with other platforms.
This extra empty line could confuse some test cases which need to parse this file.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
* Define whether a test is required by code
Why I did it
Define whether a test job is required before merging by code.
Let the failure of multi-asic and t0-sonic don't block pr merge.
The 'required' configuration can be modified by the owner in the future.
How I did it
Required:
t1-lag, t0
Not required:
multi-asic, t0-sonic
How to verify it
AZP itself verifies it.
Signed-off-by: jianquanye@microsoft.com
* [mux] skip mux operations during warm shutdown
- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>