3901 Commits

Author SHA1 Message Date
Qi Luo
4ed2ff805f
[swss-common]: Advance submodule (#5799)
ec96868 2020-11-03 | Fix: treat DBConnector timeout=0 as infinite timeout (#408) [Qi Luo]
b4b8334 2020-11-03 | Add lua script for redis multi keys api hmset and del (#406) [Kamil Cudnik]
2020-11-04 05:49:52 -08:00
Qi Luo
215ce13890
[swss-common]: Advance submodule (#5780)
9b0e955 2020-10-30 | [schema]: Add MACsec support (#403) [Ze Gan]
f24dc97 2020-10-28 | Implementation of System ports initialization, Interface & Neighbor Synchronization... (#380) [minionatwork]
91e0885 2020-10-27 | Mux cable schema definitions (#398) [Prince Sunny]
d0cedea 2020-10-27 | Change log level (#402) [Prince Sunny]
64b3cfe 2020-10-24 | SonicV2Connector supports host and decode_responses in constructor parameters (#401) [Qi Luo]
f8b0065 2020-10-23 | Implement FieldValueMap update method (#400) [Qi Luo]
2020-11-03 14:28:42 -08:00
Joe LeVeque
e3164d5fb4
[lldpmgrd] Convert to Python 3 (#5785)
- Convert lldpmgrd to Python 3
- Install Python 3 swsscommon package in docker-lldp
2020-11-03 12:50:11 -08:00
Andriy Kokhan
0a1c5792a1
[BFN] Updated SDK packages to 20201023 (#5708)
- BFN platform was affected by ACL changes that add IPV6_NEXT_HEADER support.
- Bugfixes

Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>
2020-11-03 11:26:26 -08:00
Joe LeVeque
84d3a26000
[sonic-py-swsssdk] Update submodule (#5757)
Commits included:

* src/sonic-py-swsssdk 748c404...1ea30d2 (1):
  > Fix bug: ConfigDBConnector.get_table does not work in python3 (#92)
2020-11-02 20:31:14 -08:00
Kebo Liu
1158701edc
add pcied config files for mellanox platform (#5669)
This PR has a dependency on community change to move PCIe config files from $PLATFORM/plugin folder to $PLATFORM/ folder
- Why I did it
To support PCIed daemon on Mellanox platforms
- How I did it
Add PCIed config yaml files for all Mellanox platforms
Update pmon daemon config files for SimX platforms
2020-11-02 19:45:36 -08:00
lguohan
9d7355287f
[submodule]: update sairedis (#5772)
* b458e6f 2020-11-02 | [syncd] Disable use bulk api by default (#688) (HEAD, origin/master, origin/HEAD) [Kamil Cudnik]
* d978789 2020-11-01 | [syncd] Set modify redis flag in RedisSelectableChannel (#689) [Kamil Cudnik]
* 5df11f5 2020-11-01 | [syncd] Lower bulk missing api message level from error to info (#687) [Kamil Cudnik]
* dc73a1d 2020-10-30 | [saiplayer] Fix log messages (#686) [Kamil Cudnik]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-02 16:59:47 -08:00
Lawrence Lee
10ab46f7a0
Revert "[docker-base]: Rate limit priority INFO and lower in syslog" (#5763)
* This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.
2020-11-02 08:49:40 -08:00
Junchao-Mellanox
1be9c4a33a
[Mellanox] Update SDK 4.4.1956 and FW *.2008.1956 (#5768)
* [Mellanox] Update SDK 4.4.1956 and FW *.2008.1956

* Update submodule pointer for Switch-SDK-drivers
2020-11-02 09:56:02 +02:00
lguohan
98d370bcef
[submodule]: swss/sairedis module update (#5765)
swss:

* d7643f2 2020-11-01 | [tlm_teamd]: Make the destination for std::transform() use std::back_inserter() to allocate new space for the copied objects (#1490) (HEAD, origin/master, origin/HEAD) [pavel-shirshov]
* 7fa7cd6 2020-10-31 | [vstest]: stabilize fgnhg test (#1491) [lguohan]
* 9b0696e 2020-10-29 | Create vnet tunnel map only if it doesn't exist (#1482) [Prince Sunny]
* 0481e99 2020-10-29 | [acl] Update CRM to include LAG bindings for ACL tables (#1487) [Danny Allen]

sairedis

* 5df11f5 2020-11-01 | [syncd] Lower bulk missing api message level from error to info (#687) (HEAD, origin/master, origin/HEAD) [Kamil Cudnik]
* dc73a1d 2020-10-30 | [saiplayer] Fix log messages (#686) [Kamil Cudnik]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-01 20:08:55 -08:00
Blueve
698b5544c9
[openssh] Introduce custom openssh-server package for supporting reverse console SSH (#5717)
* Build and install openssh from source
* Copy openssh deb package to dest folder
* Update make rule
* Update sonic debian extension
* Append empty line before EOF
* Update openssh patch
* Add openssh-server to base image dependency
* Fix indent type
* Fix comments
* Use commit id instead of tag id and add comment

Signed-off-by: Jing Kan jika@microsoft.com
2020-11-02 10:31:15 +08:00
Joe LeVeque
f2a258aca9
[docker-platform-monitor] Check if sonic_platform is available before installing (#5764)
On Arista platforms, sonic_platform packages are not installed in the PMon container, but are rather mounted into the container from the host OS. Therefore, pip show sonic_platform will fail in the PMon container. This change will first check if we can import sonic_platform. If this fails, it will then fall back to checking if the package is installed. If both fail, it will attempt to install the package.
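The three-step fallback described above can be sketched in Python. This is only an illustration of the decision order, not the actual container start-up script; the helper names `module_importable`, `package_installed`, and `needs_install` are hypothetical.

```python
import importlib.util
import subprocess
import sys


def module_importable(name):
    """Return True if a top-level module can be located on the import path.

    This also succeeds for packages mounted from the host rather than
    installed through pip.
    """
    return importlib.util.find_spec(name) is not None


def package_installed(name):
    """Return True if `pip show` reports the package as installed."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def needs_install(name, importable=module_importable, installed=package_installed):
    """Install only when both the import check and the pip check fail."""
    if importable(name):
        return False
    if installed(name):
        return False
    return True
```

The two checks are passed in as parameters so the decision logic can be exercised without touching pip.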
2020-11-01 01:48:07 -08:00
lguohan
c8a00eda95
[mgmt ip]: mvrf ip rule priority change to 32765 (#5754)
Fix Azure/SONiC#551

When an eth0 IP address is configured, an ip rule is added for that address through the interfaces.j2 template.

This eth0 ip rule causes a problem when a VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, the kernel automatically adds a new rule, "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never deleted, even if the VRF is deleted.

Once this l3mdev IP rule has been added, if the user configures an IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". The kernel chooses priority 1000 automatically, so this rule takes precedence over the existing rule "1001:from all lookup local ".

As a result, ping from the console to the eth0 IP stops working once a VRF is created, as explained in Issue 551.
More details and possible solutions are discussed in the comments on Issue 551.

This PR resolves the issue by always assigning the fixed low priority 32765 to the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address.
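
The effect of the priority change can be illustrated with a small Python model. It relies only on the fact that the kernel consults ip rules in ascending priority number; the rule tuples reuse the values quoted above, and the helper names are hypothetical.

```python
def evaluation_order(rules):
    """Return rules in the order they are consulted: ascending priority number."""
    return sorted(rules, key=lambda rule: rule[0])


def position(rules, table):
    """Index of the first rule that looks up `table`, in evaluation order."""
    tables = [rule_table for _, _, rule_table in evaluation_order(rules)]
    return tables.index(table)


def eth0_rule_precedes_local(rules):
    """True when the eth0 rule (lookup default) is consulted before lookup local."""
    return position(rules, "default") < position(rules, "local")


# (priority, selector, table), as quoted in the description above.
before_fix = [
    (1000, "from all", "l3mdev-table"),
    (1000, "from 100.104.47.74", "default"),
    (1001, "from all", "local"),
]
after_fix = [
    (1000, "from all", "l3mdev-table"),
    (1001, "from all", "local"),
    (32765, "from 100.104.47.74", "default"),
]

print(eth0_rule_precedes_local(before_fix))
print(eth0_rule_precedes_local(after_fix))
```

With the fixed priority, the eth0 rule is consulted only after the local table, which matches the ordering the PR aims for.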

Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
2020-10-31 20:45:59 -07:00
gechiang
908787d2a2
Added new method get_back_end_interface_set() to speed up back-end in… (#5731)
Added a new MultiASIC util method, "get_back_end_interface_set()", to speed up back-end interface checks by letting the caller cache the back-end interfaces in a set. The caller can then use this set for all subsequent back-end interface checks instead of reading from the redis DB each time, which becomes a scaling issue in cases such as checking thousands of nexthop routes for filtering purposes.
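The caching pattern can be sketched as follows. The function names, the reader callback, and the interface names are hypothetical stand-ins; the real method's signature is not shown in this log.

```python
def build_back_end_set(read_back_end_interfaces):
    """Read the back-end interfaces once and keep them in a set."""
    return set(read_back_end_interfaces())


def drop_back_end_nexthops(nexthop_interfaces, back_end_set):
    """Filter with O(1) set membership tests instead of one DB read per check."""
    return [name for name in nexthop_interfaces if name not in back_end_set]


# Stand-in for the expensive database read; counts how often it is called.
reads = {"count": 0}


def read_back_end_interfaces():
    reads["count"] += 1
    return ["Ethernet-BP0", "Ethernet-BP4"]  # hypothetical interface names


back_end = build_back_end_set(read_back_end_interfaces)
nexthops = ["Ethernet0", "Ethernet-BP0", "Ethernet4"] * 1000
front_end_only = drop_back_end_nexthops(nexthops, back_end)
```

However many nexthops are filtered, the stand-in database read runs exactly once.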
2020-10-31 18:06:06 -07:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Renuka Manavalan
8d8aadb615
Load config after subscribe (#5740)
- Why I did it
The update_all_feature_states call can take anywhere from 20+ seconds to one minute. With the load of AAA & TACACS preceding it, any DB updates to AAA/TACACS made during the long-running feature updates would be missed. To avoid this, switch the order.

- How I did it
Do a load after updating all feature states.

- How to verify it
Not an easy one
Have a script that
restart hostcfgd
sleep 2s
run redis-cli/config command to update AAA/TACACS table

Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute.

- When it reproduces:
The updates will not reflect in /etc/pam.d/common-auth-sonic
2020-10-31 16:38:32 -07:00
Shi Su
279943c11f
[sonic-swss] Update submodule (#5745)
Update the sonic-swss submodule. The following are the commits in the submodule.

[neighorch] Remove pending DEL operation after SET operation for the same key
2265f548386929b7827d1079efd453128f1ec1f9

[NAT]: Update nat entries to use nat_type to support DNAT Pool changes.
8696e939f973895ead4731ad499a72f257a3b510

[intfsorch] Init proxy_arp variable while adding router interface.
1da3c773762fa637a5ea47017715361bede50a4a
2020-10-31 02:26:19 -07:00
Junchao-Mellanox
781188f549
[thermalctld] Enlarge startretries value to avoid thermalctld not able to restart during regression test (#5633)
Increase the startretries value from the default of 10 to 50 to prevent supervisor from placing thermalctld in the FATAL state during regression testing. This also ensures supervisord tries hard to get thermalctld running in production, as thermalctld is critical to preventing the device from overheating.
2020-10-30 12:01:17 -07:00
Joe LeVeque
6333bb73b0
Explicitly call pip2 rather than pip in locations where both pip2 and pip3 are installed (#5747)
As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias.

Also some other pip-related cleanup
2020-10-30 09:43:14 -07:00
Samuel Angebault
12911ba619
[Arista] Update arista driver submodules (#5736)
- Change `/run/arista` mount to pmon by `/var/run/platform_cache`
 - Python3 by default for Arista platform initialisation
 - Fix outstanding py2/3 compatibility issues (eeprom mostly)
 - Use pytest for unit testing
 - Miscellaneous modular fixes
2020-10-30 04:17:30 -07:00
Joe LeVeque
b132ca0980
[build]: Upgrade pip3 before pip2 (#5743)
Upgrading pip3 after pip2 caused the pip command to be aliased to the pip3 command. However, since we are still transitioning from Python 2 to Python 3, most pip commands in the codebase are expecting pip to alias to pip2. The proper solution here is to explicitly call pip2 and pip3, and no longer call pip, however this will require extensive changes and testing, so to quickly fix this issue, we upgraded pip2 after pip3 to ensure that pip2 is installed after pip3.
2020-10-29 19:01:17 -07:00
Arun Saravanan Balachandran
6145e4f6f1
[DellEMC]: FanDrawer and get_high_critical_threshold Platform API implementation for S6000, S6100, Z9100 and Z9264F (#5673)
- Implement FanDrawer and get_high_critical_threshold Platform API for S6000, S6100, Z9100 and Z9264F.
- Fix incorrect fan direction values in S6100, Z9100
2020-10-29 18:05:16 -07:00
Joe LeVeque
e111204206
[caclmgrd] Convert to Python 3; Add to sonic-host-services package (#5739)
Convert caclmgrd to Python 3 and add it to the sonic-host-services package. This consolidates host services so they are installed via packages instead of file-by-file, and is part of migrating all of SONiC to Python 3, as Python 2 is no longer supported.
2020-10-29 16:29:12 -07:00
Baptiste Covolato
527a69dfbf
[arista/7800r3_48cq(m)2_lc] remove platform_reboot (#5653)
We don't need a custom platform reboot on Clearwater2(Ms). They are expected to be rebooted via a normal linux soft reboot.

Remove symlink to the arista common platform reboot for those 2 platforms.
2020-10-29 16:26:41 -07:00
Volodymyr Boiko
fd7e2a12bc
[submodule-update][sonic-platform-daemon] Update submodule (#5741)
95b1696 [xcvrd] Remove dependence on enum; Add 'sonic-py-common' as dependencies in setup.py (#106)
61ed24e [thermalctld] Print exception using repr(e) to get more information (#103)
8507085 [psud] Fix psud logging (#98)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2020-10-29 09:46:55 -07:00
Shi Su
5ee5c13f32
Enable synchronous mode by default and add in minigraph parser (#5735)
2020-10-29 09:15:12 -07:00
Aravind Mani
42d2bf1a53
[devices]: DellEMC Z9264f buffer changes (#5429)
**- Why I did it**
Converted two SP model to single pool model and modified the buffer size.
**- How I did it**
Changed buffer_default settings for all the DellEMC Z9264f HWSKU's.
**- How to verify it**
Check SP register values in NPU shell.
**- Which release branch to backport (provide reason below if selected)**
Need to be cherry picked for 201911 branch.
2020-10-29 01:52:24 -07:00
Dong Zhang
d95e1969c8
[swsssdk] update submodule for adding new MultiDB API (#5737)
2020-10-28 21:47:01 -07:00
judyjoseph
6088bd59de
[multi-ASIC] BGP internal neighbor table support (#5520)
* Initial commit for BGP internal neighbor table support.
  > Add new template named "internal" for the internal BGP sessions
  > Add a new table in database "BGP_INTERNAL_NEIGHBOR"
  > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"

* Changes in template generation tests with the introduction of internal neighbor template files.
2020-10-28 16:41:27 -07:00
Shi Su
09d5a62fad
[sonic-sairedis] Update submodule (#5728)
Update the sonic-sairedis submodule. The following are the commits in the submodule.

[syncd_init_common.sh] Use template file to retrieve vars (#683)
0bf336a3e895167357d5d2e5a988471e115522e8

[syncd/FlexCounter]:Fix last remove bug (#679)
4d21a264d5956501bf69ad3a89ea2ebccd369654
2020-10-27 20:42:57 -07:00
Lawrence Lee
a639021af2
[minigraph.py]: Parse VLAN MAC address from minigraph when present (#5726)
2020-10-27 17:20:55 -07:00
Danny Allen
e0b09d0998
[swss] Update swss submodule (#5719)
Signed-off-by: Danny Allen <daall@microsoft.com>
2020-10-27 14:38:43 -07:00
lguohan
07748a939f
[gbsyncd]: add gbsyncd to FEATURE table (#5683)
remove syncd from critical process list because
gbsyncd process will exit for platform without
gearbox.

closes #5623

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-10-27 11:40:23 -07:00
vdahiya12
dfe005545a
[sonic-platform-common] update submodule (#5721)
a659219 [SONIC_SFP] adding abstract methods for reading and writing the eeprom address space within platform api (#126)
848f4a6 Add third-party licenses (#138)
c2ecd9a Add license file (#137)
403747a [sonic-platform-common] Add new platform API for SONiC Physical MIB Extension feature (#134)
19b8545 [sonic_y_cable] fix the unpacking (#135)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2020-10-27 09:36:26 -07:00
bingwang-ms
36c52cca2b
Fix 'NoSuchProcess' exception in process_checker (#5716)
The psutil library used in process_checker creates a cache for each
process when process_iter is called. It is therefore possible that a
process exists when process_iter is called but no longer exists when
cmdline is called, which raises a NoSuchProcess exception. This commit
fixes the issue.
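
The race and its handling can be sketched without psutil itself. `NoSuchProcess` and `FakeProcess` below are local stand-ins for `psutil.NoSuchProcess` and a process object whose `cmdline()` may fail; `running_cmdlines` is a hypothetical helper, not the code from the commit.

```python
class NoSuchProcess(Exception):
    """Stand-in for the exception raised when a process has already exited."""


class FakeProcess:
    """Stand-in for a cached process object returned by enumeration."""

    def __init__(self, cmdline, alive=True):
        self._cmdline = cmdline
        self._alive = alive

    def cmdline(self):
        # A process that exited after enumeration can no longer be queried.
        if not self._alive:
            raise NoSuchProcess()
        return self._cmdline


def running_cmdlines(processes):
    """Collect command lines, skipping processes that vanished in the meantime."""
    result = []
    for proc in processes:
        try:
            result.append(" ".join(proc.cmdline()))
        except NoSuchProcess:
            continue  # exited between enumeration and the cmdline() call
    return result


procs = [
    FakeProcess(["python3", "/usr/bin/lldpmgrd"]),
    FakeProcess(["sleep", "1"], alive=False),
]
print(running_cmdlines(procs))
```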

Signed-off-by: bingwang <bingwang@microsoft.com>
2020-10-27 09:25:35 +08:00
Joe LeVeque
9e34003136
[sonic-config-engine] Clean up dependencies, pin versions; install Python 3 package in Buster container (#5656)
To clean up the image build procedure, and let setuptools/pip[3] implicitly install Python dependencies. Also use ipaddress package instead of ipaddr.
2020-10-26 13:48:50 -07:00
Junchao-Mellanox
7bee5093f1
[Mellanox] Support max/min speed for PSU fan (#5682)
As the new hw-mgmt exposes the sysfs entry for PSU fan max speed, we need to support max/min speed for the PSU fan in the Mellanox platform API.
2020-10-26 12:47:12 -07:00
shlomibitton
e66d49a57c
[LLDP] Fix for LLDP advertisements being sent with wrong information. (#5493)
* Fix for LLDP advertisements being sent with wrong information.
Since lldpd starts before lldpmgrd, some advertisement packets might be sent with the default value, the MAC address, as Port ID.
This fix holds the packets from being sent by lldpd until all interfaces are fully configured by lldpmgrd.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Fix comments

* Fix unit-test output caused a failure during build

* Add 'run_cmd' function and use it

* Resume lldpd even if port init timeout reached
2020-10-26 19:38:09 +02:00
judyjoseph
c14b41dc30
[submodule]: Advance sonic-swss-common and sonic-sairedis module (#5703)
Advance sonic-swss-common submodule by adding the following  commits 

3ec30ef Deprecate RedisClient and remove unused header file (#399)
165a679 Schema update for BGP internal neighbor table (#389)
262e330 Fix SonicV2Connector interfaces (#396)

Advance sonic-sairedis submodule by adding the following  commits 

bc3e044  [Sai]: Change Sai::set log to level INFO (#680)
b16bc8b Clean code: remove unused header file (#678)
40439b4 [syncd] Remove deprecated dependency on swss::RedisClient (#681)
1b6fc2e [syncd] Add supports of bulk api in syncd (#656)
a9f69c1 [syncd] Add to handle FDB MOVE notification (#670)
c7ef5e9 [gbsyncd] exit with zero when platform has no gearbox (#676)
57228fd [gbsyncd]: add missing python dependency (#675)
02a57a6 [vs] Add CRM SAI attributes to virtual switch interface (#673)
609445a fix boot type for fast boot (#674)
1325cdf Add support for saiplayer bulk API and add performance timers (#666)
1d84b90 Add ZeroMQ communication channel between sairedis and syncd (#659)
017056a Support System ports config (#657)
0f3668f Enable fabric counter for syncd's FlexCounter (#669)
2020-10-26 09:19:16 -07:00
Lawrence Lee
c4f9bec562
[minigraph.py]: Add support for parsing mux cable (#5676)
Find LogicalLinks in minigraph and parse the port information. A new field called `mux_cable` is added to each port's entry in the Port table in config DB:

```
PORT|Ethernet0: {
	"alias": "Ethernet4/1"
	...
	"mux_cable": "true"
}
```

If a mux cable is present on a port, the value for `mux_cable` will be `"true"`. If no mux cable is present, the attribute will either be omitted (default behavior) or set to `"false"`.
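
A consumer of this field might interpret it as in the sketch below. The helper name `has_mux_cable` and the sample entries are hypothetical; only the `mux_cable` field and its string values come from the description above.

```python
def has_mux_cable(port_entry):
    """Return True only when the port entry carries mux_cable == "true".

    A missing attribute is treated the same as "false".
    """
    return port_entry.get("mux_cable", "false") == "true"


# Hypothetical PORT table entries keyed by port name.
port_table = {
    "Ethernet0": {"alias": "Ethernet4/1", "mux_cable": "true"},
    "Ethernet4": {"alias": "Ethernet5/1"},
    "Ethernet8": {"alias": "Ethernet6/1", "mux_cable": "false"},
}

mux_ports = [name for name, entry in port_table.items() if has_mux_cable(entry)]
```

The values are compared as strings because the field holds `"true"` or `"false"`, not booleans.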
2020-10-26 08:52:20 -07:00
lguohan
7d4ab4237a
[docker-base]: swss/syncd support use zmq as rpc channel (#5715)
install libzmq5 in docker-base

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-10-25 20:39:38 -07:00
Nazarii Hnydyn
5486f87afc
[Mellanox] Update platform components config files. (#5685)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-10-25 19:44:37 +02:00
Nazarii Hnydyn
ba6f012cc6
[sonic-py-common]: Fix syslog implicit min priority override (#5707)
Current implementation of logger class is based on standard python syslog library.  
Thus, logger class can be instantiated in different places and share the same context across the entire process.  
This means that reducing log severity level will affect other modules which use logging facility.

**- Why I did it**
* To fix syslog implicit min priority override

**- How I did it**
* Added per instance log severity check

**- How to verify it**
1. Run code snippet
```
from sonic_py_common import logger

log1 = logger.Logger(log_identifier='myApp1')
log1.set_min_log_priority_debug()
log1.log_error("=> this is error")
log1.log_warning("=> this is warning")
log1.log_notice("=> this is notice")
log1.log_info("=> this is info")
log1.log_debug("=> this is debug")

log2 = logger.Logger(
    log_identifier='myApp2',
    log_facility=logger.Logger.LOG_FACILITY_DAEMON,
    log_option=(logger.Logger.LOG_OPTION_NDELAY | logger.Logger.LOG_OPTION_PID)
)
log2.log_error("=> this is error")
log2.log_warning("=> this is warning")
log2.log_notice("=> this is notice")
log2.log_info("=> this is info")
log2.log_debug("=> this is debug")
```
2. Sample output:
```
Oct 23 15:08:30.447301 sonic ERR myApp1: => this is error
Oct 23 15:08:30.447908 sonic WARNING myApp1: => this is warning
Oct 23 15:08:30.448305 sonic NOTICE myApp1: => this is notice
Oct 23 15:08:30.448696 sonic INFO myApp1: => this is info
Oct 23 15:08:30.449063 sonic DEBUG myApp1: => this is debug

Oct 23 15:08:30.449442 sonic ERR myApp2[19178]: => this is error
Oct 23 15:08:30.449819 sonic WARNING myApp2[19178]: => this is warning
Oct 23 15:08:30.450183 sonic NOTICE myApp2[19178]: => this is notice
```
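
The idea of a per-instance severity check can be sketched as follows. This is a minimal illustration, not the sonic_py_common implementation: `InstanceLogger`, its `sink` parameter, and the default of NOTICE (inferred from the sample output above) are assumptions.

```python
import syslog


class InstanceLogger:
    """Keeps its own minimum priority instead of changing process-wide syslog state."""

    def __init__(self, identifier, min_priority=syslog.LOG_NOTICE, sink=None):
        self._identifier = identifier
        self._min_priority = min_priority
        # The sink receives (priority, message); it defaults to the real syslog call.
        self._sink = sink if sink is not None else syslog.syslog

    def set_min_priority(self, priority):
        # Affects this instance only.
        self._min_priority = priority

    def log(self, priority, message):
        """Emit the message if it is severe enough; return whether it was emitted."""
        # In syslog, a lower number means a more severe message.
        if priority > self._min_priority:
            return False
        self._sink(priority, "{}: {}".format(self._identifier, message))
        return True
```

Because the threshold lives on the instance, lowering it for one logger leaves every other logger in the same process unchanged.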

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-10-24 12:36:22 -07:00
Shi Su
67408c85aa
[synchronous-mode] Add template file for synchronous mode (#5644)
The orchagent and syncd need to have the same default synchronous mode configuration. This PR adds a template file to translate the default value in CONFIG_DB (empty field) to an explicit mode so that the orchagent and syncd could have the same default mode.
2020-10-23 13:08:35 -07:00
Junchao-Mellanox
15c59e1d8c
[Mellanox] Re-initialize SFP object when detecting a new SFP insertion (#5695)
When detecting a new SFP insertion, read its SFP type and DOM capability from EEPROM again.

SFP object will be initialized to a certain type even if no SFP present. A case could be:

1. A SFP object is initialized to QSFP type by default when there is no SFP present
2. The user inserts an SFP with an adapter into this QSFP port
3. The SFP object fails to read the EEPROM because it still treats itself as QSFP.

This PR fixes this issue.
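
The scenario above can be modeled in a few lines of Python. The class, the `read_type` callback, and the type strings are hypothetical; the sketch only shows the type being re-read on insertion rather than fixed at construction time.

```python
class Sfp:
    """Toy model of an SFP object whose type must follow the inserted module."""

    DEFAULT_TYPE = "QSFP"

    def __init__(self, read_type):
        # read_type stands in for an EEPROM read; it returns a type string,
        # or None when no module is present.
        self._read_type = read_type
        self.sfp_type = self.DEFAULT_TYPE
        self.refresh()

    def refresh(self):
        """Re-read the module type; keep the current type if nothing is present."""
        detected = self._read_type()
        if detected is not None:
            self.sfp_type = detected

    def on_insertion(self):
        """Called when a new module is detected in the port."""
        self.refresh()


port_state = {"module": None}          # empty port at start-up
sfp = Sfp(lambda: port_state["module"])
type_at_boot = sfp.sfp_type            # falls back to the default type

port_state["module"] = "SFP"           # an SFP is plugged in via an adapter
sfp.on_insertion()
type_after_insertion = sfp.sfp_type
```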
2020-10-23 12:36:11 -07:00
Samuel Angebault
5bfe37ca42
[Arista] Update driver submodules (#5686)
- Enable thermalctld support for our platforms
 - Fix Chassis.get_num_sfp which had an off by one
 - Implement read_eeprom and write_eeprom in SfpBase
 - Refactor of Psus and PsuSlots. PSUs are now detected and their metadata reported
 - Improvements to modular support

Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
2020-10-23 12:28:36 -07:00
Joe LeVeque
3a4435eb53
Add sonic-host-services and sonic-host-services-data packages (#5694)
**- Why I did it**

Install all host services and their data files in package format rather than file-by-file

**- How I did it**

- Create sonic-host-services Python wheel package, currently including procdockerstatsd
  - Also add the framework for unit tests by adding one simple procdockerstatsd test case
- Create sonic-host-services-data Debian package which is responsible for installing the related systemd unit files to control the services in the Python wheel. This package will also be responsible for installing any Jinja2 templates and other data files needed by the host services.
2020-10-23 09:52:29 -07:00
Guohan Lu
bb641913c4
Revert "[build]: Fixes the missing dependency in the debian package is not triggering the docker rebuild (#5650)"
This reverts commit 5c5e42454d.
2020-10-23 10:47:18 +00:00
judyjoseph
ace7f24cba
[docker-teamd]: Add teamd as a dependent service to swss (#5628)
**- Why I did it**
On teamd docker restart, swss and syncd need to be restarted, as dependent resources are present.

**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service   

**- How to verify it**

Verified the following scenarios with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2,  ping traffic continuous between VMs

1. Stop teamd docker alone  
      >  swss, syncd dockers seen going away
      >  The LAG reference count error messages seen for a while till swss docker stops.
      >  Dockers back up.

2. Enable WR mode for teamd. Stop teamd docker alone  
      >  swss, syncd dockers not removed.
      >  The LAG reference count error messages not seen
      >  Repeated stop teamd docker test - same result, no effect on swss/syncd.

3. Stop swss docker. 
      >  swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

4. Enable WR mode for swss . Stop swss docker 
      >  swss goes off not affecting syncd/teamd dockers.

5. Config reload 
      > no reference counter error seen, dockers comes back correctly, with interfaces up

6. Warm reboot, observations below
	 > swss docker goes off first 
	 > teamd + syncd goes off to the end of WR process.
 	 > dockers comes back up fine.
	 > ping traffic between VM's was NOT HIT

7. Fast reboot, observations below
	 > teamd goes off first ( **confirmed swss don't exit here** )
	 > swss goes off next 
	 > syncd goes away at the end of the FR process
	 > dockers comes back up fine.
	 > there is a traffic HIT as per fast-reboot

8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
2020-10-23 00:41:16 -07:00
yozhao101
af97e23686
[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689)
**- Why I did it**
If we run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, the SNMP container is stopped and started. This behavior is not expected, since we updated the `auto_restart` field, not the `state` field, in the `FEATURE` table. The reason is that whenever either the `state` field or the `auto_restart` field is updated, the function `update_feature_state(...)` is invoked, which then starts the snmp.timer service.
The snmp.timer service first stops snmp.service and later starts snmp.service.

To solve this issue, the function `update_feature_state(...)` is invoked only if the `state` field in the `FEATURE` table was
updated.

**- How I did it**
When the daemon `hostcfgd` is activated, the value of the `state` field in the `FEATURE` table is cached for each
container. Each time the function `feature_state_handler(...)` is invoked, it determines whether the `state` field of a
container has changed. If it has, the function `update_feature_state(...)` is invoked and the cached
value is updated as well. Otherwise, nothing is done.

**- How to verify it**
We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands  `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted.
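
The caching logic described above can be sketched as follows. This is a simplified illustration, not the hostcfgd code: the class `FeatureStateCache`, its constructor arguments, and the sample table are hypothetical, while `feature_state_handler` and `update_feature_state` reuse the function names from the description.

```python
class FeatureStateCache:
    """Invoke the update callback only when a feature's `state` field changes."""

    def __init__(self, feature_table, update_feature_state):
        # Cache the `state` of every feature once at start-up.
        self._cached = {
            name: entry.get("state") for name, entry in feature_table.items()
        }
        self._update_feature_state = update_feature_state

    def feature_state_handler(self, name, entry):
        """Handle a FEATURE table change; return True if the update was invoked."""
        state = entry.get("state")
        if self._cached.get(name) == state:
            return False  # e.g. only auto_restart changed: nothing to do
        self._update_feature_state(name, state)
        self._cached[name] = state
        return True


calls = []
table = {"snmp": {"state": "enabled", "auto_restart": "enabled"}}
cache = FeatureStateCache(table, lambda name, state: calls.append((name, state)))

# Changing only auto_restart must not restart the container.
cache.feature_state_handler("snmp", {"state": "enabled", "auto_restart": "disabled"})
# Changing state triggers the update exactly once.
cache.feature_state_handler("snmp", {"state": "disabled", "auto_restart": "disabled"})
```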

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-10-22 20:01:07 -07:00