Commit Graph

963 Commits

Author SHA1 Message Date
賓少鈺
f92aca837d
PDE migration to bullseye (#10836)
#### Why I did it
Upgrade docker-pde to bullseye

#### How to verify it
Check Azp status
2022-07-13 11:58:47 -07:00
tjchadaga
849eb4bf32
Changes to persist TSA/B state across reloads (#11257) 2022-07-12 00:22:48 -07:00
Hua Liu
a9b7a1facd
Replace swsssdk with swsscommon (#11215)
#### Why I did it
Update scripts in sonic-buildimage from py-swsssdk to swsscommon


#### How I did it
Change code to use swsscommon.

#### How to verify it
Pass all E2E test case

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Update scripts in sonic-buildimage from py-swsssdk to swsscommon

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-07-11 10:01:10 +08:00
Ye Jianquan
27d53cbba7
FIX the build error introduced by textfsm 1.1.3(Published on 2022/7/6) (#11394)
Why I did it
sonic-mgmt docker image build error, because of the textfsm new version(1.1.3).
https://dev.azure.com/mssonic/build/_build/results?buildId=119147&view=logs&j=3dc8fd7e-4368-5a92-293e-d53cefc8c4b3&t=44e6c678-cb87-52d9-8547-bcdbd0ad6ae4&l=43043

How I did it
Fix textfsm version to 1.1.2

How to verify it
I build the image on my local env, and reproduce the issue with 1.1.3, it's fixed after I change the version to 1.1.2 .

Signed-off-by: jianquanye@microsoft.com
2022-07-09 19:02:10 +08:00
Ye Jianquan
5873e8ef9c
Upgrade snappi version to 0.7.44 (#11335)
Why I did it
Keysight provide a new version with some snappi API source code related fix: snappi[ixnetwork,convergence]==0.7.44

How I did it
Upgrade snappi version to 0.7.44

How to verify it
Whether it's installed in sonic-mgmt docker container
2022-07-06 16:59:19 +08:00
yozhao101
1720fa21d9
[tunnel_packet_handler] Add a whitespace in the warning syslog message. (#11232)
*This PR aims to add a whitespace in the warning syslog message of process tunnel_packet_handler.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2022-06-28 17:29:02 -07:00
Yakiv Huryk
0ced7081c7
[asan] add print_suppressions=0 to ASAN configs (#11252)
- Why I did it
To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker

- How I did it
Added the "print_suppressions=0" to ASAN configs.

- How to verify it
add a suppression to some ASAN-enabled component (the suppression should catch some leak)
build with ENABLE_ASAN=y
run a test and see that the ASAN report is empty instead of having the suppression summary

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2022-06-28 18:45:52 +03:00
Prince George
11ff80b786
Skip CMIS manager (#10907)
* Removed unwanted changes

* Fix j2 compilation error

* Address review comment

* Add newline
2022-06-22 10:14:06 -07:00
Junhua Zhai
04ea32b0c2
[macsec] CLI Supports display of gearbox macsec counter (#11113)
Why I did it
To support gearbox macsec counter display, following Azure/sonic-swss-common#622.

How I did it
Use swsscommon CounterTable API
2022-06-18 19:17:05 +08:00
Saikrishna Arcot
ead106eda8
Add ping to swss-layer docker (#11093)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-06-10 07:40:37 -07:00
Sudharsan Dhamal Gopalarathnam
14f6f70ca3
[BGP]Adding configuration knob to allow advertise Loopback ipv6 /128 prefix (#10958)
* [BGP]Adding configuration knob to allow advertise Loopback ipv6 /128 prefix
By default when IPv6 address is configured with /128 as subnet mask in Loopback0 interface, it will be advertised as prefix with /64 subnet.
To control this behavior a new field 'bgp_adv_lo_prefix_as_128' is introduced in DEVICE_METADATA table which when set to true will advertise prefix with /128 subnet as it is.
2022-06-06 08:51:04 -07:00
kellyyeh
6bdbe14975
[dhcp_relay] Add "vlan missing ip helper" dhcp relay unittest (#10654) 2022-06-04 11:37:04 -07:00
abdosi
0285bfe42e
[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500)
What/Why I did:

Issue1: By setting up of ipvlan interface in interface-config.sh we are not tolerant to failures. Reason being interface-config.service is one-shot and do not have restart capability. 

Scenario: For example if let's say database service goes in fail state  then interface-services also gets failed because of dependency check but later database service gets restart but interface service will remain in stuck state and the ipvlan interface nevers get created.

Solution: Moved all the logic in database service from interface-config service which looks more align logically also since the namespace is created here and all the network setting (sysctl) are happening here.With this if database starts we recreate the interface.

Issue 2: Use of IPVLAN vs MACVLAN

Currently we are using ipvlan mode.  However above failure scenario is not handle correctly by ipvlan mode. Once the ipvlan interface is created and ip address assign to it and if we restart interface-config or database (new PR) service Linux Kernel gives error "Error: Address already assigned to an ipvlan device."  based on this:https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978Reason being if we do not do cleanup of ip address assignment (need to be unique for IPVLAN)  it remains in Kernel Database and never goes to free pool even though namespace is deleted. 

Solution: Considering this hard dependency of unique ip macvlan mode is better for us and since everything is managed by Linux Kernel and no dependency for on user configured IP address.

Issue3: Namespace database Service do not check reachability to Supervisor Redis Chassis   Server.

Currently there is no explicit check as we never do Redis PING from namespace to Supervisor Redis Chassis  Server. With this check it's possible we will start database and all other docker even though there is no connectivity and will hit the error/failure late in cycle

Solution: Added explicit PING from namespace that will check this reachability.

Issue 4:flushdb give exception when trying to accces Chassis Server DB over Unix Sokcet.

Solution: Handle gracefully via try..except and log the message.
2022-05-24 16:54:12 -07:00
kellyyeh
2ead3aaefc
[dhcp6relay] Fix option parsing and add dhcpv6 client messages (#10819) 2022-05-24 14:37:16 -07:00
Ze Gan
0156c21eff
[macsec-cli]: Fixing to config MACsec on the port will clear port attributes in config db (#10903)
Why I did it
There is a bug that the Port attributes in CONFIG_DB will be cleared if using sudo config macsec port add Ethernet0 or sudo config macsec port del Ethernet0

How I did it
To fetch the port attributes before set/remove MACsec field in port table.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2022-05-24 18:42:54 +08:00
Ze Gan
910e1c6eb4
[docker-macsec]: MACsec CLI Plugin (#9390)
#### Why I did it
To provide MACsec config and show CLI for manipulating MACsec

#### How I did it
Add `config macsec` and `show macsec`.

#### How to verify it

This PR includes unittest for MACsec CLI, check Azp status.
- Add MACsec profile
```
admin@sonic:~$ sudo config macsec profile add --help
Usage: config macsec profile add [OPTIONS] <profile_name>

  Add MACsec profile

Options:
  --priority <priority>           For Key server election. In 0-255 range with
                                  0 being the highest priority.  [default:
                                  255]
  --cipher_suite <cipher_suite>   The cipher suite for MACsec.  [default: GCM-
                                  AES-128]
  --primary_cak <primary_cak>     Primary Connectivity Association Key.
                                  [required]
  --primary_ckn <primary_cak>     Primary CAK Name.  [required]
  --policy <policy>               MACsec policy. INTEGRITY_ONLY: All traffic,
                                  except EAPOL, will be converted to MACsec
                                  packets without encryption.  SECURITY: All
                                  traffic, except EAPOL, will be encrypted by
                                  SecY.  [default: security]
  --enable_replay_protect / --disable_replay_protect
                                  Whether enable replay protect.  [default:
                                  False]
  --replay_window <enable_replay_protect>
                                  Replay window size that is the number of
                                  packets that could be out of order. This
                                  field works only if ENABLE_REPLAY_PROTECT is
                                  true.  [default: 0]
  --send_sci / --no_send_sci      Send SCI in SecTAG field of MACsec header.
                                  [default: True]
  --rekey_period <rekey_period>   The period of proactively refresh (Unit
                                  second).  [default: 0]
  -?, -h, --help                  Show this message and exit.
```
- Delete MACsec profile
```
admin@sonic:~$ sudo config macsec profile del --help
Usage: config macsec profile del [OPTIONS] <profile_name>

  Delete MACsec profile

Options:
  -?, -h, --help  Show this message and exit.
```
- Enable MACsec on the port
```
admin@sonic:~$ sudo config macsec port add --help
Usage: config macsec port add [OPTIONS] <port_name> <profile_name>

  Add MACsec port

Options:
  -?, -h, --help  Show this message and exit.
```
- Disable MACsec on the port
```
admin@sonic:~$ sudo config macsec port del --help
Usage: config macsec port del [OPTIONS] <port_name>

  Delete MACsec port

Options:
  -?, -h, --help  Show this message and exit.

```
Show MACsec
```
MACsec port(Ethernet0)
---------------------  -----------
cipher_suite           GCM-AES-256
enable                 true
enable_encrypt         true
enable_protect         true
enable_replay_protect  false
replay_window          0
send_sci               true
---------------------  -----------
	MACsec Egress SC (5254008f4f1c0001)
	-----------  -
	encoding_an  2
	-----------  -
		MACsec Egress SA (1)
		-------------------------------------  ----------------------------------------------------------------
		auth_key                               849B69D363E2B0AA154BEBBD7C1D9487
		next_pn                                1
		sak                                    AE8C9BB36EA44B60375E84BC8E778596289E79240FDFA6D7BA33D3518E705A5E
		salt                                   000000000000000000000000
		ssci                                   0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN         179
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED    0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED    0
		SAI_MACSEC_SA_STAT_OUT_PKTS_ENCRYPTED  0
		SAI_MACSEC_SA_STAT_OUT_PKTS_PROTECTED  0
		-------------------------------------  ----------------------------------------------------------------
		MACsec Egress SA (2)
		-------------------------------------  ----------------------------------------------------------------
		auth_key                               5A8B8912139551D3678B43DD0F10FFA5
		next_pn                                1
		sak                                    7F2651140F12C434F782EF9AD7791EE2CFE2BF315A568A48785E35FC803C9DB6
		salt                                   000000000000000000000000
		ssci                                   0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN         87185
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED    0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED    0
		SAI_MACSEC_SA_STAT_OUT_PKTS_ENCRYPTED  0
		SAI_MACSEC_SA_STAT_OUT_PKTS_PROTECTED  0
		-------------------------------------  ----------------------------------------------------------------
	MACsec Ingress SC (525400edac5b0001)
		MACsec Ingress SA (1)
		---------------------------------------  ----------------------------------------------------------------
		active                                   true
		auth_key                                 849B69D363E2B0AA154BEBBD7C1D9487
		lowest_acceptable_pn                     1
		sak                                      AE8C9BB36EA44B60375E84BC8E778596289E79240FDFA6D7BA33D3518E705A5E
		salt                                     000000000000000000000000
		ssci                                     0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN           103
		SAI_MACSEC_SA_STAT_IN_PKTS_DELAYED       0
		SAI_MACSEC_SA_STAT_IN_PKTS_INVALID       0
		SAI_MACSEC_SA_STAT_IN_PKTS_LATE          0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_USING_SA  0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_VALID     0
		SAI_MACSEC_SA_STAT_IN_PKTS_OK            0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNCHECKED     0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNUSED_SA     0
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED      0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED      0
		---------------------------------------  ----------------------------------------------------------------
		MACsec Ingress SA (2)
		---------------------------------------  ----------------------------------------------------------------
		active                                   true
		auth_key                                 5A8B8912139551D3678B43DD0F10FFA5
		lowest_acceptable_pn                     1
		sak                                      7F2651140F12C434F782EF9AD7791EE2CFE2BF315A568A48785E35FC803C9DB6
		salt                                     000000000000000000000000
		ssci                                     0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN           91824
		SAI_MACSEC_SA_STAT_IN_PKTS_DELAYED       0
		SAI_MACSEC_SA_STAT_IN_PKTS_INVALID       0
		SAI_MACSEC_SA_STAT_IN_PKTS_LATE          0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_USING_SA  0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_VALID     0
		SAI_MACSEC_SA_STAT_IN_PKTS_OK            0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNCHECKED     0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNUSED_SA     0
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED      0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED      0
		---------------------------------------  ----------------------------------------------------------------
MACsec port(Ethernet1)
---------------------  -----------
cipher_suite           GCM-AES-256
enable                 true
enable_encrypt         true
enable_protect         true
enable_replay_protect  false
replay_window          0
send_sci               true
---------------------  -----------
	MACsec Egress SC (5254008f4f1c0001)
	-----------  -
	encoding_an  1
	-----------  -
		MACsec Egress SA (1)
		-------------------------------------  ----------------------------------------------------------------
		auth_key                               35FC8F2C81BCA28A95845A4D2A1EE6EF
		next_pn                                1
		sak                                    1EC8572B75A840BA6B3833DC550C620D2C65BBDDAD372D27A1DFEB0CD786671B
		salt                                   000000000000000000000000
		ssci                                   0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN         4809
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED    0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED    0
		SAI_MACSEC_SA_STAT_OUT_PKTS_ENCRYPTED  0
		SAI_MACSEC_SA_STAT_OUT_PKTS_PROTECTED  0
		-------------------------------------  ----------------------------------------------------------------
	MACsec Ingress SC (525400edac5b0001)
		MACsec Ingress SA (1)
		---------------------------------------  ----------------------------------------------------------------
		active                                   true
		auth_key                                 35FC8F2C81BCA28A95845A4D2A1EE6EF
		lowest_acceptable_pn                     1
		sak                                      1EC8572B75A840BA6B3833DC550C620D2C65BBDDAD372D27A1DFEB0CD786671B
		salt                                     000000000000000000000000
		ssci                                     0
		SAI_MACSEC_SA_ATTR_CURRENT_XPN           5033
		SAI_MACSEC_SA_STAT_IN_PKTS_DELAYED       0
		SAI_MACSEC_SA_STAT_IN_PKTS_INVALID       0
		SAI_MACSEC_SA_STAT_IN_PKTS_LATE          0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_USING_SA  0
		SAI_MACSEC_SA_STAT_IN_PKTS_NOT_VALID     0
		SAI_MACSEC_SA_STAT_IN_PKTS_OK            0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNCHECKED     0
		SAI_MACSEC_SA_STAT_IN_PKTS_UNUSED_SA     0
		SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED      0
		SAI_MACSEC_SA_STAT_OCTETS_PROTECTED      0
		---------------------------------------  ----------------------------------------------------------------
```
2022-05-19 21:59:37 +08:00
Lawrence Lee
0eeb249fd8
[swss]: Convert swss docker to bullseye (#10484)
* [swss]: Convert swss docker to bullseye

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-05-17 13:55:59 -07:00
Alexander Allen
d202bf26d7
Upgrade mellanox platform containers (syncd / saiserver / syncd-rpc) and pmon to bullseye (#10580)
Fixes #9279

- Why I did it
Part of larger effort to move all SONiC systems to bullseye

- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3

- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
2022-05-10 12:45:28 +03:00
Jing Zhang
322363b9ab
[master][sonic-linkmgrd] submodule updates (#10763)
[master][sonic-linkmgrd] submodule updates

df51322 Longxiang Lyu   Fri May 6 10:01:46 2022 +0800   Add `ActiveActiveStateMachine` implementation (#64)
e721ceb Jing Zhang      Wed May 4 10:07:14 2022 -0700   Add doc for default route related changes  (#63)
7bb06fb Jing Zhang      Tue May 3 09:48:28 2022 -0700   Add Cli support to enable or disable default route related feature (#68)
e4b02cb Jing Zhang      Mon May 2 13:27:54 2022 -0700   Reset WaitActiveUp count before switching to active (#70)
212d960 Jing Zhang      Wed Apr 27 10:35:05 2022 -0700  lower log level to warning (#69)
48abc9e Jing Zhang      Thu Apr 14 16:50:04 2022 -0700  Add support to enable switchover time measurement (with link prober interval decreased to 10ms) feature  (#61)
c4858a6 Jing Zhang      Thu Apr 14 11:27:55 2022 -0700  Avoid proactively switching to `active` if default route is missing  (#62)

sign-off: Jing Zhang zhangjing@microsoft.com
2022-05-06 13:42:23 -07:00
kellyyeh
cfdb8431df
[dhcp6relay] Add dhcpv6 option check (#10486) 2022-05-05 18:04:14 -07:00
xumia
8ec8900d31
Support SONiC OpenSSL FIPS 140-3 based on SymCrypt engine (#9573)
Why I did it
Support OpenSSL FIPS 140-3, see design doc: https://github.com/Azure/SONiC/blob/master/doc/fips/SONiC-OpenSSL-FIPS-140-3.md.

How I did it
Install the fips packages.
To build the fips packages, see https://github.com/Azure/sonic-fips
Azure pipelines: https://dev.azure.com/mssonic/build/_build?definitionId=412

How to verify it
Validate the SymCrypt engine:

admin@sonic:~$ dpkg-query -W | grep openssl
openssl 1.1.1k-1+deb11u1+fips
symcrypt-openssl        0.1

admin@sonic:~$ openssl engine -v | grep -i symcrypt
(symcrypt) SCOSSL (SymCrypt engine for OpenSSL)
admin@sonic:~$
2022-05-06 07:21:30 +08:00
Lior Avramov
6284d136ec
[LLDP] Enhance lldmgrd Redis events handling (#10593)
Why I did it
When lldpmgrd handled events of other tables besides PORT_TABLE, error message was printed to log.

How I did it
Handle event according to its file descriptor instead of looping all registered selectables for each coming event.

How to verify it
I verified same events are being handled by printing events key and operation, before and after the change.
Also, before the change, in init flow after config reload, when lldpmgrd handled events of other tables besides PORT_TABLE, error messages were printed to log, this issue is solved now.
2022-05-04 08:21:02 -07:00
Kalimuthu-Velappan
bc30528341
Parallel building of sonic dockers using native dockerd(dood). (#10352)
Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are
created with a fixed docker name and common to all the users.

    docker-database:latest
    docker-swss:latest

When multiple builds are triggered on the same build server that creates parallel building issue because
all the build jobs are trying to create the same docker with latest tag.
This happens only when sonic dockers are built using native host dockerd for sonic docker image creation.

This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, it rename the user sonic
dockers into correct sonic dockers with tag as latest.

	docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag

The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env
variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for
loading and saving the docker image.
2022-04-28 08:39:37 +08:00
Zhaohui Sun
cc30771f6b
Add python3 virtual environment for docker-ptf (#10599)
Why I did it
Migrate ptftests script to python3, in order to do an incremental migration, add python virtual environment firstly, install all required python packages in virtual env as well.
Then migrate ptftests scripts from python2 to python3 one by one avoid impacting non-changed scripts.

Signed-off-by: Zhaohui Sun zhaohuisun@microsoft.com

How I did it
Add python3 virtual environment for docker-ptf.
Add submodule ptf-py3 and install patched ptf 0.9.3 into virtual environment as well, two ptf issues were reported here:
p4lang/ptf#173
p4lang/ptf#174

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
2022-04-26 09:13:26 +08:00
Song Yuan
aa62e33339
[chassis] Do not configure LLDP on recirc ports (#7909)
Why I did it
Recirc port is used to only forward traffic from one asic to another asic. So it's not required to configure LLDP on it.

How I did it
Add interface prefix helper for recirc port. Similar to skip configuring LLDP on inband port, add check in lldpmgrd to skip recirc port by checking interface prefix.
2022-04-25 13:14:17 -07:00
Maxime Lorrillere
0606add017
[chassis] Get asic PCI ID from CHASSIS_STATE_DB and update asic_id in CONFIG_DB (#9681)
Asic PCI ID (PCI address) is collected by chassisd (inside pmon -
Azure/sonic-platform-daemons#175) and saved in CHASSIS_STATE_DB (in
redis_chassis). CHASSIS_STATE_DB is accessible by swss containers.

At docker-init.sh (script is called after swss container is created and before
anything that could run in swss like orchagent...), we wait until asic PCI ID
of the corresponding asic is populated by chassisd. We then update asic_id in
CONFIG_DB of asic's database.

A system supporting dynamic asic PCI ID identification requires to have a file
(empty) use_pci_id_chassis in its platform dir.

When orchagent runs, it has correct asic PCI ID in its CONFIG_DB.

Together with this PR:

Azure/sonic-platform-daemons#175
Azure/sonic-platform-common#185

Signed-off-by: Maxime Lorrillere <mlorrillere@arista.com>

Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
2022-04-25 13:09:42 -07:00
yozhao101
e24fe9bc60
[Monit] Fix the issue which shows Monit can not reset its counter. (#10288)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>

Why I did it
This PR aims to fix the Monit issue which shows Monit can't reset its counter when monitoring memory usage of telemetry container.

Specifically the Monit configuration file related to monitoring memory usage of telemetry container is as following:

  check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
      if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
If memory usage of telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted.
Recently we observed, after telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted anymore during this 1 hour sliding window.

The reason is Monit can't reset its counter to count again and Monit can reset its counter if and only if the status of monitored service was changed from Status failed to Status ok. However, during this 1 hour sliding window, the status of monitored service was not changed from Status failed to Status ok.

Currently for each service monitored by Monit, there will be an entry showing the monitoring status, monitoring mode etc. For example, the following output from command sudo monit status shows the status of monitored service to monitor memory usage of telemetry:

    Program 'container_memory_telemetry'
         status                             Status ok
         monitoring status          Monitored
         monitoring mode          active
         on reboot                      start
         last exit value                0
         last output                    -
         data collected               Sat, 19 Mar 2022 19:56:26
Every 1 minute, Monit will run the script to check the memory usage of telemetry and update the counter if memory usage is larger than 400MB. If Monit checked the counter and found memory usage of telemetry is larger than 400MB for 10 times
within 20 minutes, then telemetry container was restarted. Following is an example status of monitored service:

    Program 'container_memory_telemetry'
         status                             Status failed
         monitoring status          Monitored
         monitoring mode          active
         on reboot                      start
         last exit value                0
         last output                    -
         data collected               Tue, 01 Feb 2022 22:52:55
After telemetry container was restarted. we found memory usage of telemetry increased rapidly from around 100MB to more than 400MB during 1 minute and status of monitored service did not have a chance to be changed from Status failed to Status ok.

How I did it
In order to provide a workaround for this issue, Monit recently introduced another syntax format repeat every <n> cycles related to exec. This new syntax format will enable Monit repeat executing the background script if the error persists for a given number of cycles.

How to verify it
I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in sonic-mgmt repo for review.
2022-04-20 18:08:06 -07:00
Jing Zhang
8b5d908c92
Upgrade mux container to Bullseye (#10498)
sign-off: Jing Zhang zhangjing@microsoft.com

#### Why I did it
As part of the process moving containers from buster to bullseye.

#### How I did it
1. change base image from buster to bullseye. 
2. remove unused addition to orchagent run options 

#### How to verify it
Tested building locally.
2022-04-19 09:27:45 -07:00
Ze Gan
87036c34ec
[macsec]: Upgrade docker-macsec to bullseye (#10574)
Following the patch from : https://packages.debian.org/bullseye/wpasupplicant, to upgrade sonic-wpa-supplicant for supporting bullseye and upgrade docker-macsec.mk as a bullseye component.
2022-04-17 20:32:51 +08:00
Zhaohui Sun
44cf773a96
Revert "[docker-ptf]: Upgrade scapy to 2.4.5 in docker-ptf (#10507)" (#10537)
It upgraded scapy to 2.4.5 in docker-ptf container, after this upgrade, all scripts under ansible/roles/test/files/ptftests will import scapy 2.4.5, some test cases will fail because they are not upgraded accordingly.

Reverts #10507 to avoid breaking regression test.

This reverts commit 92efc01270.
2022-04-13 08:57:10 +08:00
kellyyeh
396a92cb2e
[dhcp_relay] Remove dhcp6mon (#10467) 2022-04-12 10:44:17 -07:00
Ashwin Srinivasan
b4f8f1dd22
Removed python2 dependency for sonic-pcied in sonic-platform-daemons (#10421)
Removed python2 support for sonic-platform-daemons that was causing unit
test errors in sonic_pcied.
* Removed config from docker supervisord jinja templates per VD review comment
* Removed space and python3 per QL comments
2022-04-09 13:16:50 -07:00
Ze Gan
92efc01270
[docker-ptf]: Upgrade scapy to 2.4.5 in docker-ptf (#10507)
Why I did it
Existing dataplane tests cannot be tested under MACsec environment due to the traffic under MACsec link is encrypted. So, I will override the dp_poll of ptf to MACsec dp_poll to decrypt the MACsec packets on injected ports (PR: Azure/sonic-mgmt#5490). MACsec decryption library depends on scapy 2.4.5.

How I did it
Upgrade scapy library to 2.4.5 by pip.

How to verify it
Check the scapy version in docker-ptf by

python -c "import scapy; print(scapy.__version__)"
2.4.5

Signed-off-by: Ze Gan <ganze718@gmail.com>
2022-04-08 22:26:40 -07:00
kellyyeh
330d11a128
Add EPMS and MgmtTsToR (#10478) 2022-04-07 21:49:42 -07:00
Stepan Blyshchak
4426f7715f
[scapy] update scapy to 2.4.5 and patch it (#10457)
Why I did it
Running warm-reboot in a loop for 500 times leads to this error on 318-th iteration:

Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors Traceback (most recent call last):
Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/bin/restore_neighbors.py", line 24, in <module>
Apr  2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors     from scapy.all import conf, in6_getnsma, inet_pton, inet_ntop, in6_getnsmac, get_if_hwaddr, Ether, ARP, IPv6, ICMPv6ND_NS, ICMPv6NDOptSrcLLAddr
Apr  2 15:56:27.346795 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/all.py", line 25, in <module>
Apr  2 15:56:27.346956 sonic INFO swss#/supervisord: restore_neighbors     from scapy.route import *
Apr  2 15:56:27.346995 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/route.py", line 205, in <module>
Apr  2 15:56:27.347089 sonic INFO swss#/supervisord: restore_neighbors     conf.iface = get_working_if()
Apr  2 15:56:27.347129 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/arch/linux.py", line 128, in get_working_if
Apr  2 15:56:27.347213 sonic INFO swss#/supervisord: restore_neighbors     ifflags = struct.unpack("16xH14x", get_if(i, SIOCGIFFLAGS))[0]
Apr  2 15:56:27.347250 sonic INFO swss#/supervisord: restore_neighbors   File "/usr/local/lib/python3.7/dist-packages/scapy/arch/common.py", line 31, in get_if
Apr  2 15:56:27.347345 sonic INFO swss#/supervisord: restore_neighbors     return ioctl(sck, cmd, struct.pack("16s16x", iff.encode("utf8")))
Apr  2 15:56:27.347365 sonic INFO swss#/supervisord: restore_neighbors OSError: [Errno 19] No such device
The issue was reported to scapy devs secdev/scapy#3369, the fix is secdev/scapy#3371, however there is no released scapy version with this fix right now, thus decided to build scapy v2.4.5 from sources and apply the fix in a form of a patch.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-04-07 14:23:35 +03:00
kellyyeh
8cd346d80b
Update docker-router-advertiser.supervisord.conf.j2 (#10375) 2022-04-06 09:44:21 -07:00
Saikrishna Arcot
588ed0b760
Upgrade router-advertiser container to Bullseye (#10374)
Change the base image from `docker-config-engine-buster` to
`docker-config-engine-bullseye`, and remove the hardcoded
`radvd` version from the Dockerfile.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-04-01 16:12:43 -07:00
Junchao-Mellanox
106fac5f09
[counter] Fix issue: non default counters will be delayed forever after fastboot (#10413)
- Why I did it
Fastboot will delay all counters in CONFIG DB, it relies on enable_counters.py to recover the delayed counters. However, enable_counters.py does not recover those non-default counters.

- How I did it
For non-default counters, if it is in CONFIG DB, put delay status to false after the waiting.

- How to verify it
Manual test
2022-03-31 15:23:57 +03:00
Lawrence Lee
b31df59c7c
[tun_pkt]: Wait for AsyncSniffer to init fully (#10346)
Fix for Tunnel packet handler can crash at system startup 
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-03-30 14:03:29 -07:00
judyjoseph
8e642848c2
Introduce the asic_subtype field for adding the sub platform variants. (#10235)
* Introduce the asic_subtype field for adding the sub platform variants. 
   It uses the value of TARGET_MACHINE variable in slave.mk.
2022-03-28 11:22:32 -07:00
tomer-israel
cc938e73a3
Dynamic port configuration - solve lldp issues when adding/removing ports (#9386)
#### Why I did it
when adding and removing ports after init stage we saw two issues:

first:
In several cases, after removing a port, lldpmgr is continuing to try to add a port to lldp with lldpcli command. the execution of this command is continuing to fail since the port is not existing anymore.

second:
after adding a port, we sometimes see this warning messgae:
"Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18"

we added these changes in order to solve it.
#### How I did it
port create events are taken from app db only.
lldpcli command is executed only when linux port is up.

when delete port event is received we remove this command from  pending_cmds dictionary

#### How to verify it
manual tests and running lldp tests


#### Description for the changelog
Dynamic port configuration - solve lldp issues when adding/removing ports
2022-03-25 17:47:24 -07:00
Robert J. Halstead
147d631065
[PINS] update sonic-p4rt docker to bullseye (#10182)
#### Why I did it
SONiC is migrating to bullseye. This will update the sonic-pins container to bullseye.

#### How I did it
The [sonic-pins code](https://github.com/Azure/sonic-buildimage/blob/master/rules/p4rt.mk) isn't dependent on any architecture so it will already build successfully for bullseye. This PR updates the docker to use bullseye.

#### How to verify it
Today we cannot build the docker-sonic-p4rt.gz target (e.g. Issue #9885). With this change the docker will build successfully. The P4RT executable will not run, because of a missing runtime library, libgmpxx, which I'll address in a followup PR.

#### Description for the changelog
Update docker-sonic-p4rt.gz target to build with Bullseye instead of Buster.
2022-03-23 17:21:36 -07:00
Saikrishna Arcot
4a5e75e45e
[restapi]: Don't use python/python2 for restapi start scripts (#10285)
Python 2 isn't installed by default in Buster and Bullseye containers,
and the scripts/modules can be used with Python 3, so make sure Python 3
is used.

Why I did it
After the Buster and Bullseye upgrade for the restapi container, processes will no longer start because supervisord is trying to call python and python2, both of which are unavailable.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-22 18:34:42 -07:00
Oleksandr Kozodoi
c5849c9650
Add scapy support for python3 virtual environment in the sonic-mgmt docker container (#10234)
Why I did it
Migration of sonic-mgmt codebase from Python 2 to Python 3

How I did it
Added scapy dependencies to the env-python3 virtual environment.

How to verify it
Run test case:
py.test --testbed=testbed-t0 --inventory=../ansible/lab --testbed_file=../ansible/testbed.csv --host-pattern=testbed-t0 -- module-path=../ansible/library lldp

Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com>
2022-03-16 12:00:51 +08:00
Saikrishna Arcot
5617b1ae3e
Image disk space reduction (#10172)
# Why I did it

Reduce the disk space taken up during bootup and runtime.

# How I did it

1. Remove python package cache from the base image and from the containers.
2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup.
3. For the partition containing `/host`, don't reserve any blocks for just the root user. This just makes sure all disk space is available for all users, if needed during upgrades (for example).


* Remove pip2 and pip3 caches from some containers

Only containers which appeared to have a significant pip cache size are
included here.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Don't create var-log.ext4 if we're storing logs in memory

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Run tune2fs on the device containing /host to not reserve any blocks for just the root user

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-15 18:12:49 -07:00
Shilong Liu
3fa627f290
Add a config variable to override default container registry instead of dockerhub. (#10166)
* Add variable to reset default docker registry
* fix bug in docker version control
2022-03-14 18:09:20 +08:00
xwjiang2021
b73da484c4
Install the allure-pytest package globally in sonic-mgmt docker (#10216)
Why I did it
This fix is to address issue: Azure/sonic-mgmt#5280

In the sonic-mgmt Dockerfile, python package allure-pytest is installed after ENV USER $user.
Consequently the package is installed to path /home/$user/.local and is only available to the $user
account. If we use root account in sonic-mgmt docker container to run tests, any script importing
the allure package will fail with ImportError. We need to install the allure-pytest package to global
directory instead of user local directory.

How I did it
Update the sonic-mgmt Dockerfile to ensure that the allure-pytest package is installed to global directory

How to verify it
Build a new sonic-mgmt docker image based on the changes.
Use sonic-mgmt docker container of the newly built image to run test scripts that depend on the
allure-pytest package. No ImportError is raised.
2022-03-12 20:18:12 +08:00
Oleksandr Kozodoi
3fa18d18d4
Add necessary changes for python3 virtual environment of sonic-mgmt docker container (#9277)
This PR includes necessary changes for the setup of the Python3 virtual environment in the sonic-mgmt docker container.

How to activate Python3 virtual environment?
Connect to the sonic-mgmt container
$ docker exec -ti sonic-mgmt bash
Activate the virtual environment
$ source /var/user/env-python3/bin/activate

Why I did it
Migration of sonic-mgmt codebase from Python 2 to Python 3

How I did it
Added all necessary dependencies to the env-python3 virtual environment.

Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com>
2022-03-09 12:28:01 +08:00
Alexander Allen
d0ff8b5f48
[pmon] Clean up supervisord chassis_db_init entry and fix startsecs (#10071)
Why I did it
Code review was still in progress when #9858 was merged and upon further testing I have arrived at a better solution.

How I did it
Modified supervisord configuration j2 template for pmon to require no minimum uptime for chassisd_db_init and to remove the redundant exit_codes directive

How to verify it
Boot switch and verify in syslog that there are no errors related to chassis_db_init
2022-03-03 17:10:15 -08:00
Lawrence Lee
4d2a55d373
[swss]: Wait for vlan intf to start ndppd (#10119)
- Use the `wait_for_link.sh` script to delay ndppd start until after the VLAN interface is ready
- Avoids issue where ndppd tries to change interface attributes before the interface is ready
2022-03-02 16:23:56 -08:00