Commit Graph

6949 Commits

Author SHA1 Message Date
Saikrishna Arcot
9251d4ba8b
[docker-wait-any]: Exit worker thread if main thread is expected to exit (#12255)
There's an odd crash that intermittently happens after the teamd container
exits, and a signal is raised to the main thread to exit. This thread (watching
teamd) continues execution because it's in a `while True`. The subsequent wait
call on the teamd container very likely returns immediately, and it calls
`is_warm_restart_enabled` and `is_fast_reboot_enabled`. In either of these
cases, sometimes, there is a crash in the transition from C code to Python code
(after the function gets executed).  Python sees that this thread got a signal
to exit, because the main thread is exiting, and tells pthread to exit the
thread.  However, during the stack unwinding, _something_ is telling the
unwinder to call `std::terminate`.  The reason is unknown.

This then results in a python3 SIGABRT, and systemd then doesn't call the stop
script to actually stop the container (possibly because the main process exited
with a SIGABRT, so it's a hard crash). This means that the container doesn't
actually get stopped or restarted, resulting in an inconsistent state
afterwards.

The workaround appears to be that if we know the main thread needs to exit,
just return here, and don't continue execution. This at least tries to avoid it
from getting into the problematic code path. However, it's still feasible to
get a SIGABRT, depending on thread/process timings (i.e. teamd exits, signals
the main thread to exit, and then syncd exits, and syncd calls one of the two C
functions, potentially hitting the issue).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-10-05 18:14:10 -07:00
kannankvs
3686454c6e
Updated the template with comment recieved (#12276)
Updated the PR template with comment received on removing the reference link on GCU. Hence added text to show reference for GCU PR.
2022-10-05 17:51:42 -07:00
xumia
1f0699f51e
Fix sonic-config low dpkg hit rate issue (#12244)
Why I did it
When sending a PR only CI change, as expected, the target target/python-wheels/buster/sonic_config_engine-1.0-py2-none-any.whl should be from the cache, because the depended files were not changed, but it rebuilt.

How I did it
Sort the files by name.
2022-10-05 08:10:54 +08:00
Kalimuthu-Velappan
c691b73959
01.Version-cache - restructuring of Makefile.work (#12000)
- The Makefile.work becomes complex and it is very difficult to manage the changes across branches.
- Restructured the Makefile.work and it becomes more readable.
- Added $(QUIET) option to turn on command echo mode through command line option.
- Exported the SONIC_BUILD_VARS variable, through which make options can be set dynamically.
	Eg: make SONIC_BUILD_VARS='INCLUDE_NAT=y'
2022-10-04 14:13:40 -07:00
Mai Bui
95f4af3407
[actions] Support Semgrep by Github Actions (#12249)
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
[Semgrep](https://github.com/returntocorp/semgrep) is a static analysis tool to find security vulnerabilities.
When opening a PR or commtting to PR, Semgrep performs a diff-aware scanning, which scans changed files in PRs.
When merging PR, Semgrep performs a full scan on master branch and report all findings.
Ref: - [Supported Language](https://semgrep.dev/docs/supported-languages/#language-maturity) - [Semgrep Rules](https://registry.semgrep.dev/rule)
#### How I did it
Integrate Semgrep into this repository by committing a job configuration file
#### How to verify it
PR: https://github.com/maipbui/sonic-buildimage/pull/2
Master branch full scan findings: [Master branch findings results](https://github.com/maipbui/sonic-buildimage/actions/runs/3160181876/jobs/5144332404)
PR https://github.com/maipbui/sonic-buildimage/pull/2 scan findings: [Pull request findings results](https://github.com/maipbui/sonic-buildimage/actions/runs/3160193505/jobs/5144357859)
2022-10-03 14:38:55 -04:00
andywongarista
2f46689a05
[Arista] Add components for 720DT-48S (#12217)
Why I did it
Add components data for sonic-mgmt testing

How I did it
Update platform.json and add platform_components.json

How to verify it
Ran sonic-mgmt tests (test_chassis and test_component)
2022-10-03 13:53:34 +08:00
Dror Prital
44356fa8d7
[Mellanox] Add NVIDIA copyright header for NVIDIA added files (#12130)
- Why I did it
Add NVIDIA Copyright header for new "NVIDIA" files

- How I did it
Add the copyright header as remark at the head of the file
2022-10-02 11:34:24 +03:00
Muhammad Danish
8c10851c2a
Update azure.github.io links to sonic-net.github.io (#12209)
Why I did it
azure.github.io/SONiC/ no longer works and returns 404 Not Found. Updated it to the correct sonic-net.github.io/SONiC/
2022-10-02 14:02:10 +08:00
jingwenxie
0a2743d5e4
[submodule] update sonic-utilities (#12138)
0a7557bd9 [minigraph] add option to specify golden path in load_minigraph (#2350)
322aefc37 [GCU]Remove GCU unique lane check for duplicate lanes platforms (#2343)
7099fffa7 [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot (#2286)
09026edbb [warm-reboot] fix warm-reboot when /tmp/cache is missing (#2367)
a3c404c74 Fix typo in platform_sfputil_helper.is_rj45_port (#2374)
637d834ce Vnet_route_check Vxlan tunnel route update. (#2281)
29a3e5180 Added support for tunnel route status in show vnet routes all. (#2341)
1ac584bb3 Use 'default' VRF when VRF name is not provided (#2368)
4d377a620 [subinterface]Added additional checks in portchannel and subinterface commands (#2345)
bbcdf2ed7 disk_check: Publish event  for RO state (#2320)
3fd537b0a Support the bandit check by GitHub Action (#2358)
491d3d380 [generate dump]Added error message when saisdkdump fails (#2356)
6830e01ec [counterpoll]Fixing counterpoll show for tunnel and acl stats (#2355)
3be2ad7de [fast-reboot]Avoid stopping masked services during fast-reboot (#2335)
0e1b0cf20 [GCU] Fix missing backend in dry run (#2347)
676c31bd0 Add verification for override (#2305)
48997c266 Add Password Hardening CLI support (#2338)
414e239ea update unit tests for swap allocator
a91a4922f consider swap checking memory in installer
f0ce58635 [route_check]: Ignore standalone tunnel routes (#2325)
2022-10-01 11:36:55 -07:00
Samuel Angebault
18850e4e28
[Arista] Update platform submodules (#12225)
Implement input power psu API
Report DC power output via API
Add bootloader Component in API
Fix issue where naming was not unique for Component
2022-09-30 16:03:40 +08:00
Hua Liu
004a8b6eae
[AzurePipeline] Fix vstest step failed by libyang missing. (#12240)
Why I did it
Fix PR merge failed because 'vstest' step does not install libyang.

How I did it
Install libyang in azure pipeline.

How to verify it
Pass vstest step.
2022-09-30 15:56:46 +08:00
Volodymyr Samotiy
eea8ebd0a9
[Mellanox] Update MFT to v4.21.0-100 (#11758)
- Why I did it
To update MFT package to the latest version.

- How I did it
Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk.

- How to verify it
Build an image and deploy to the switch
Check MFT version by dpkg -l | grep mft
Verify that all the SONiC services up and running
Run regression testing using tests from sonic-mgmt

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2022-09-30 09:48:40 +03:00
Volodymyr Samotiy
92bd6dae28
[Mellanox] Update SAI to v2205.22.1.19 and SDK/FW to v4.5.3168/v2010.3170 (#12205)
- Why I did it
To include latest fixes and new functionality

SAI fixes and new features
fix #3205239, incorrect object type returned for SG child list
Fix VRF-VNI map entries remove issue
ECC health event and logging
[Port Buffers] restore default queue and pg configuration when all user pools are deleted
Fix EVPN type3 error on removal of uc/bc flood group
Fix EVPN type2 MAC move from local to remote results in SAI failure
Fix Disable learning on VXLAN tunnel
Fix error on VXLAN v6 tunnel removal
Fix port cannot apply schedule group when it is a lag member
Fix BFD add more detailed message on BFD packet not related to any existing session
gcc10 compilation fixes
Disable learning on VXLAN tunnel
Support BFD remote-disc exchange in negotiation stage
Tunnel Loopback packet action attribute implementation (for Dual TOR)
Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only)
Support for CRC2 hash algorithm
Bulk counter support for PGs, queues
Support mirror sample rate attribute (SPC2+)
[Functional] [QoS] | Unable to remove SCHEDULE profile table even if there is no object referencing it
Next hop group optimized bulk API
Reduce verbosity of shared database already exists print
Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+
use same size descriptor pool for rx/tx
fix bfd - notify Sonic for admin-down event
2201 - empty list for supported fec for RJ45 ports
Fix don't disable used tunnel underlay interfaces

SDK fixes
100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning' which caused the switch firmware to read page 0x9f (which unsupported in the cable) and to report this cable as having "bad eeprom".
Added remote peer UDP port information in BFD packet event.
After editing an ECMP, the resilient ECMP next-hop counter may not count correctly.
Fixed potential memory leaks in some APIs related to LPM
If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero).
In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides
In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected.
In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped.
In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt.
On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
PCI calibration changes from a static to a dynamic mechanism.
SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group.
SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group.
SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump.
SLL configuration is missing in SDK dump.
Spectrum-2 systems, do no support 1GbE on supported 40GbE modules.
When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly.
When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU.
When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed.
When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2022-09-30 09:40:12 +03:00
Junchao-Mellanox
1d69f0916e
[Mellanox] Provide dummy implementation for get_rx_los and get_tx_fault (#12231)
- Why I did it
get_rx_los and get_tx_fault is not supported via the exisitng interface used, need provide dummy implementation for them.
NOTE: in later releases we will get them back via different interface.

- How I did it
Return False * lane_num for get_rx_los and get_tx_fault

- How to verify it
Added unit test
2022-09-30 09:38:05 +03:00
Ye Jianquan
5510d9c03b
Make t0 part1 and part2 be able to be rerun if failed (#12221)
Why I did it
With continueOnError: true, a failed job returns the result: partiallySuccess, which cause it can't be rerun, since AZP consider it as passed. Then we can't only rerun t0 jobs when it fails.

How I did it
Mark t0 part1 and part2 as continueOnError: false.

How to verify it
The pipeline will verify it.
2022-09-30 08:17:01 +08:00
Prince George
179882398c
Revert "Support for serdes platform library debian installation for Innovium SONiC image (#11920)" (#12227)
This reverts commit 8c7e0f8e02.
2022-09-29 17:12:20 -07:00
Andriy Kokhan
9bb0a7f33c
[BFN] Canceling PSU platform API calls on SIGTERM (#10720)
* [BFN] Canceling PSU platform API calls on SIGTERM

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN] Fixed SONiC fwutil exec time (#31)

Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
Co-authored-by: Taras Keryk <tarasx.keryk@intel.com>
2022-09-29 15:18:43 -07:00
Dmytro Lytvynenko
d08fcc971c
[BFN] Updated syseeprom platform plugin to use onie-eeprom (#10556)
* Align system eeprom info with ONIE

* revert linked sonic_platform implementation

* refactor into one class

* refactor after review
2022-09-29 15:13:46 -07:00
Dmytro Lytvynenko
d9c9c70fb5
[BFN] Move qsfp eeprom reading to new cached api (#9909)
* Move qsfp eeprom reading to new cached api

* provide reading multiple pages in recursive manner

* workaround with flat memory on cmis

* remove workaround with memory model

* Remove unused imports
2022-09-29 15:12:01 -07:00
Hua Liu
1f9c89a8d3
[sonic-py-common] porting sonic_db_dump_load.py from sonic-py-swsssdk to sonic-py-common (#12185)
Porting sonic_db_dump_load.py from sonic-py-swsssdk to sonic-py-common.

#### Why I did it
sonic-py-swsssdk will be deprecate, so porting sonic_db_dump_load.py to sonic-py-common.

#### How I did it
Copy sonic_db_dump_load.py to sonic-py-common, and fix minor API different.

#### How to verify it
Pass all E2E test.
The platform_tests/test_advanced_reboot.py::test_warm_reboot will cover this script.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Porting sonic_db_dump_load.py from sonic-py-swsssdk to sonic-py-common.

#### Ensure to add label/tag for the feature raised. example - [PR#2174](https://github.com/sonic-net/sonic-utilities/pull/2174) where, Generic Config and Update feature has been labelled as GCU.

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-09-29 10:27:57 +08:00
vijayvyasm
8c7e0f8e02
Support for serdes platform library debian installation for Innovium SONiC image (#11920)
Signed-off-by: vijayvyasm vijayvyasm@marvell.com

Signed-off-by: vijayvyasm vijayvyasm@marvell.com
2022-09-28 18:37:33 -07:00
Stephen Sun
4d317aff94
[Mellanox] Fix typo in platform API (#12136)
- Why I did it
Fix a typo in chassis platform API which causes the following error

>>> import sonic_platform as P
>>> c = P.platform.Platform().get_chassis()
>>> sl = c.get_all_sfps()
>>> sl[0].get_lpmode()
Sep 28 07:48:33 INFO    LOG: Initializing SX log with STDOUT as output file.
False
>>> del c
Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__
    self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle)
NameError: name 'sfp_module' is not defined

- How I did it
Use self while using the SDK handle

- How to verify it
Manual test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-09-28 11:09:18 +03:00
Junchao-Mellanox
f890606d82
Revert "[Mellanox] Redirect ethtool stderr to subprocess for better error log (#12038)" (#12183)
This reverts commit 9750cb4.

There is a PR to handle 202205 branch revert: #12184

- Why I did it
The PR to be reverted introduced many notice logs every 1 minute if SFP is not plugged:

Cannot get module EEPROM information: Input/output error
Before the "bad" PR, the message format is like this:

INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error
It was truncated by rsyslog because every message is the same. However, the "bad" PR introduces SFP index to the message:

NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error
Rsyslog no longer truncate such log and many such messages are flooded to syslog.

- How I did it
Revert the PR

- How to verify it
Manual test
2022-09-28 10:15:26 +03:00
Ye Jianquan
7666af9403
Fix pip install error (#12198)
Fix the error of pip install introduced in PR #12197
2022-09-28 14:39:33 +08:00
Ye Jianquan
9c602320c3
install missed package python-dateutil (#12197)
Why I did it
Fix issue of can't import dateutil.parser in show_techsupport/test_auto_techsupport.py

How I did it
install python-dateutil
2022-09-28 11:38:41 +08:00
ShiyanWangMS
1995540758
Upgrade docker-sonic-mgmt base image from Ubuntu18.04 to 20.04 (#12056)
Upgrade docker-sonic-mgmt base image from Ubuntu18.04 to 20.04
2022-09-27 09:15:48 +08:00
xumia
60c80ad26d
[Build] Fix the build unstalbe issue caused by the kvm not ready (#12180)
Why I did it
Fix the build unstable issue caused by the kvm 9000 port is not ready to use in 2 seconds.

2022-09-02T10:57:30.8122304Z + /usr/bin/kvm -m 8192 -name onie -boot order=cd,once=d -cdrom target/files/bullseye/onie-recovery-x86_64-kvm_x86_64_4_asic-r0.iso -device e1000,netdev=onienet -netdev user,id=onienet,hostfwd=:0.0.0.0:3041-:22 -vnc 0.0.0.0:0 -vga std -drive file=target/sonic-6asic-vs.img,media=disk,if=virtio,index=0 -drive file=./sonic-installer.img,if=virtio,index=1 -serial telnet:127.0.0.1:9000,server
2022-09-02T10:57:30.8123378Z + sleep 2.0
2022-09-02T10:57:30.8123889Z + '[' -d /proc/284923 ']'
2022-09-02T10:57:30.8124528Z + echo 'to kill kvm:  sudo kill 284923'
2022-09-02T10:57:30.8124994Z to kill kvm:  sudo kill 284923
2022-09-02T10:57:30.8125362Z + ./install_sonic.py
2022-09-02T10:57:30.8125720Z Trying 127.0.0.1...
2022-09-02T10:57:30.8126041Z telnet: Unable to connect to remote host: Connection refused

How I did it
Waiting more time until the tcp port 9000 is ready, waiting for 60 seconds in maximum.
2022-09-27 06:55:19 +08:00
Tal Berlowitz
1b50a2b721
Patch ifupdown2 (#9630) (#11548) 2022-09-26 09:30:38 -07:00
Aryeh Feigin
2c10ebb4fe
Use warm-boot infrastructure for fast-boot (#11594)
This PR should be merged together with the sonic-utilities PR (sonic-net/sonic-utilities#2286) and sonic-sairedis PR (sonic-net/sonic-sairedis#1100).

Use redis contents from dump file in fast-reboot.

Improve fast-reboot flow by utilizing the warm-reboot infrastructure.
This followes https://github.com/sonic-net/SONiC/blob/master/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md
2022-09-26 09:01:49 -07:00
Xin Wang
f50dc28789
[docker-sonic-mgmt] Deprecate azure-kusto-data & azure-kusto-ingest for py2 (#12143)
Why I did it
The python packages azure-kusto-data and azure-kusto-ingest packages for python2 are too old and not really used. The python3 environment has newer version of these packages installed. This change is to deprecate these two packages for python2 in docker-sonic-mgmt image.

How I did it
Removed the lines for installing old version of packages azure-kusto-data and azure-kusto-ingest in python2 in the Dockerfile template.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
2022-09-26 10:48:02 +08:00
Hua Liu
cc0781b40b
Build swss-common with libyang (#12087)
Build swss-common with libyang

#### Why I did it
sonic-swss-common lib add dependency to libyang recently, so need update make file before update sonic-swss-common submodule.

#### How I did it
Add dependency to libyang in rules/swss-common.mk 

#### How to verify it
Pass all E2E test case.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Add new Redis database PROFILE_DB

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-09-25 03:37:35 +08:00
Liu Shilong
c968114a36
[ci] Use absolute template file path in docker-sonic-slave pipeline. (#12153) 2022-09-23 12:54:57 +08:00
Samuel Angebault
27032bfb84
Add BUILD_DATE to SWI (#11915)
Add the BUILD_DATE to the SWI version info, as this is a requirement of Secure Boot.
2022-09-22 17:52:40 -07:00
Marty Y. Lok
57ff7a2308
[chassis][supervisor] show system-health summary fails on the supervisor card (#10631)
Fix the command "sudo show system-health summary" shows the following error on the supervisor card. Fixes #10630
2022-09-22 16:39:31 -07:00
Mai Bui
283efeeacc
[sonic-py-common] Add getstatusoutput_noshell() functions to general module (#12065)
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`getstatusoutput()` function from `subprocess` module has shell injection issue because it includes `shell=True` in the implementation
Eliminate duplicate code
#### How I did it
Reimplement `getstatusoutput_noshell()` and `getstatusoutput_noshell_pipe()` functions with `shell=False`
Add `check_output_pipe()` function
#### How to verify it
Pass UT
2022-09-22 09:40:42 -04:00
Xichen96
8af369a7c9
Enable swap for haliburton device. (#11746)
Signed-off-by: Xichen Lin <lukelin0907@gmail.com>

Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
2022-09-22 13:57:52 +08:00
Hua Liu
f8494d10ad
Improve SSHD config to use more secure settings (#12109)
Improve SSHD config to use more secure settings

#### Why I did it
According to Sonic OS review result, SSHD config file /etc/ssh/sshd_config using insecure settings.


#### How I did it
Change build_debian.sh script to set following settings to /etc/ssh/sshd_config:
ClientAliveInterval is set to 300
MaxAuthTries is set to default of 3
Banner set to /etc/issue
LogLevel is set to VERBOSE

#### How to verify it
Pass all E2E test case.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Improve SSHD config to use more secure settings

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-09-22 09:25:29 +08:00
Zain Budhwani
fd6a1b0ce2
Add events to host and create rsyslog_plugin deb pkg (#12059)
Why I did it

Create rsyslog plugin deb for other containers/host to install
Add events for bgp and host events
2022-09-21 09:20:53 -07:00
Liu Shilong
8211c850f1
[ci] Update docker sonic slave pipeline to build slave base docker (#11908)
* [ci] Update docker sonic slave pipeline to build slave base docker
2022-09-21 15:50:30 +08:00
Samuel Angebault
5ff45c5846
Implement ssd_util plugin for Arista products (#11981)
Why I did it
Some Arista products do not have an SSD but use an eMMC instead.
The SsdUtil plugin is therefore extended to support both.

How I did it
Implemented ssd_util.py platform plugin loaded by ssdutil.
This plugin fallback to the default SONiC implementation if the arista one can't be found.

How to verify it
Run show platform ssdhealth on a product with an eMMC
2022-09-21 14:56:14 +08:00
Junhua Zhai
63c14d2e9e
[PikeZ] Update port alias in Arista-720DT-48S (#12086)
Fix #12037, by following HLD https://github.com/sonic-net/SONiC/blob/master/doc/sonic-port-name.md.
2022-09-20 20:04:14 +08:00
Dror Prital
f30fc76278
Remove jinja2_cache (#11996)
- Why I did it
As part of Persistent log level HLD , LOGLEVEL_DB content is moved to CONFIG_DB.
In addition, it was decided to remove jinja2_cache which currently appear on LOGLEVEL_DB

This cache was added to speed up template rendering in start scripts. There were a lot of them rendered during system start. This caused a delay in warm boot LAG restore time. It was tested and verified that with and without the cache we don't see any difference in this timing now. It is probably due to a lot of other optimizations done to sonic-cfggen. Since there is no noticeable improvement made by j2 cache now it is safe to remove it.

- How I did it
Remove redis_bcc.py file and and remove the bytcode_cache from sonic-sfggen

- How to verify it
Warm boot was tested with \ without this jinja2_cache and it there is no difference in performance
2022-09-20 10:22:33 +03:00
Stepan Blyshchak
e662008f72
[services] kill container on stop in warm/fast mode (#10510)
- Why I did it
To optimize stop on warm boot.

- How I did it
Added kill for containers
2022-09-19 19:34:33 +03:00
Volodymyr Boiko
c243af0cce
[bgp][service] Start bgp service after interfaces-config service (#11827)
- Why I did it
interfaces-config service restarts networking service, during the restart loopback interface address is being removed and reassigned back, leaving loopback without an ipv4 address for a while.
On SONiC startup and config reload interfaces-config and bgp services start in parallel and sometimes
fpmsyncd in bgp attempts bind to loopback while it does not have an address, fails with the log
Exception "Cannot assign requested address" had been thrown in daemon
and exits with rc 0.

root@sonic:/# supervisorctl status
fpmsyncd                         EXITED    Jul 20 05:04 AM
zebra                            RUNNING   pid 35, uptime 6:15:05
zsocket                          EXITED    Jul 20 05:04 AM
docker logs bgp
INFO exited: fpmsyncd (exit status 0; expected)
With fpmsyncd dead, configured routes do not appear in the database.

- How I did it
Added ordering dependency on interfaces-config service into bgp.config

- How to verify it
Itself the issue reproduces quite rarely, but one can gain the time interval between networking down and networking up in interfaces-config.sh like this:

diff --git a/files/image_config/interfaces/interfaces-config.sh b/files/image_config/interfaces/interfaces-config.sh
index f6aa4147a..87caceeff 100755
--- a/files/image_config/interfaces/interfaces-config.sh
+++ b/files/image_config/interfaces/interfaces-config.sh
@@ -63,7 +63,11 @@ done
 # Read sysctl conf files again
 sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf

-systemctl restart networking
+# systemctl restart networking
+
+systemctl start networking
+sleep 10
+systemctl stop networking

 # Clean-up created files
 rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json
with this change the issue reproduces on every config reload.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2022-09-19 17:25:10 +03:00
ganglv
c1d2e88de9
Replace configuration parameter for gnmi write (#11780)
Why I did it
Replace configuration parameter for gnmi write, and we will add other gnmi write features in the future.

How I did it
Update rules/config and other Makefile.

How to verify it
Build sonic image.
2022-09-19 14:54:08 +08:00
Richard.Yu
de68f10923
[submodule] Advance sairedis head (#12098)
Advance sairedis head

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2022-09-18 22:34:38 -07:00
Zhaohui Sun
1effff9836
Enable system-site-packages for ptf docker and install thrift for test_qos_sai (#12094)
Why I did it
test_sai_qos failed because of the following error:

"stderr_lines": [
        "Traceback (most recent call last):", 
        "  File \"/usr/bin/ptf\", line 522, in <module>", 
        "    test_modules = load_test_modules(config)", 
        "  File \"/usr/bin/ptf\", line 413, in load_test_modules", 
        "    mod = imp.load_module(modname, *imp.find_module(modname, [root]))", 
        "  File \"saitests/switch.py\", line 19, in <module>", 
        "    import switch_sai_thrift", 
        "ImportError: No module named switch_sai_thrift"
    ], 

It's because test_sai_qos runs ptf script which imports switch_sai_thrift, switch_sai_thrift is installed from python-saithrift_0.9.4_amd64.deb.
For master image, the deb file is for python3, but ptf only has virtual python3 environment, that's why we add --system-site-packages to allow virtual env to access system site-packeges.

Add thrift package in docker ptf virtual python3 env, because currently env-python3 doesn't have thrift module which is needed in switch_sai_thrift.

How I did it
Enable --system-site-packages for virtual py3 env in ptf docker and install thrift for test_qos_sai

How to verify it
load and login ptf conatiner
dpkg - i python-saithrift_0.9.4_amd64.deb
source /root/env-python3/bin/activate
python
import switch_sai_thrift.switch_sai_rpc
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
2022-09-17 13:33:53 +08:00
ganglv
5650762f2c
Fix dhcp option buffer issue (#12033)
Why I did it
Current isc-dhcp uses below code to remove DHCP option:
memmove(sp, op, op[1] + 2);
sp += op[1] + 2;

sp points to the option to be stripped, we can call it as option S.
op points to the option after options S, we can call it as option O.
DHCP option is a typical type-length-value structure, the first byte is type, the second byte is length, and remain parts are value.
In this case, option O length is bigger than option S, and more than 2 bytes, after the memmove, we will get this result:

Now Option S and Option O are overwritten, op[1] was the length of Option O, and it's modified after memmove.
But current implementation is still using op[1] as length to update sp (sp+=op[1]+2), so we get the wrong sp.

How I did it
Create patch from https://github.com/isc-projects/dhcp
The new impelementation use mlen to store the length of Option O before memmove, that's how it fixed the bug.
size_t mlen = op[1] + 2;
memmove(sp, op, mlen);
sp += mlen;

How to verify it
I have a PR for sonic-mgmt to cover this issue:
sonic-net/sonic-mgmt#6330

Signed-off-by: Gang Lv ganglv@microsoft.com
2022-09-17 06:08:10 +08:00
lixiaoyuner
a1b50cac41
Make client indentity by AME cert (#11946)
* Make client indentity by AME cert

* Join k8s cluster by ipv6

* Change join test cases

* Test case bug fix

* Improve read node label func

* Configure kubelet and change test cases

* For kubernetes version 1.22.2

* Fix undefine issue

Signed-off-by: Yun Li <yunli1@microsoft.com>
2022-09-16 13:13:39 +08:00
juntseng62
23de13feeb
[Alphanetworks] Add new platform BES2348T (#11196)
* Add BES2348T

Signed-off-by: juntseng62 <juntseng62@gmail.com>

* add get_serial_number

Signed-off-by: juntseng62 <juntseng62@gmail.com>

Signed-off-by: juntseng62 <juntseng62@gmail.com>
2022-09-15 21:34:52 -07:00