- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running
or had crashed for some reason, Monit would periodically write an alert message into syslog. However, whenever a new process
was added to a container, the corresponding Monit configuration file also had to be updated, which made maintenance hard.
We now use Supervisord's event listener for this monitoring instead. Since the processes in each container are already managed by
Supervisord, we only need to focus on the monitoring logic.
- How I did it
We use Supervisord's event listener to monitor critical processes in containers. When the event listener is notified that one
of the critical processes exited unexpectedly, it takes the following steps:
The event listener first checks whether the auto-restart mechanism is enabled for this container. If it is, the event listener kills the Supervisord process, which should cause the container to exit and subsequently be restarted.
If the auto-restart mechanism is not enabled for this container, the event listener enters a loop that first sleeps for 1 minute and then checks whether the process is running. If it is, the event listener exits the loop. If not, an alert message is written into syslog.
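The two branches above can be sketched as a minimal Supervisord event listener. This is a sketch under stated assumptions, not the actual SONiC script: it assumes the listener is registered for PROCESS_STATE_EXITED events and is a child of supervisord, and CRITICAL_PROCESSES, decide_action(), alert_until_running() and run_listener() are hypothetical names. How the auto-restart status is looked up and how "is running" is checked are left to the caller.

```python
import os
import signal
import sys
import syslog
import time

# Hypothetical critical-process list; a real container would load its own.
CRITICAL_PROCESSES = {"orchagent", "portsyncd"}


def parse_tokens(line):
    """Parse a supervisord 'key:value key:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def decide_action(headers, payload, autorestart_enabled):
    """Return 'restart', 'alert' or 'ignore' for a single event."""
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return "ignore"
    # expected:1 means the process exited with one of its expected exit codes.
    if payload.get("expected") != "0":
        return "ignore"
    if payload.get("processname") not in CRITICAL_PROCESSES:
        return "ignore"
    return "restart" if autorestart_enabled else "alert"


def alert_until_running(name, is_running, interval=60, sleep=time.sleep, log=None):
    """Sleep `interval` seconds, then write an alert until is_running() is true."""
    if log is None:
        def log(msg):
            syslog.syslog(syslog.LOG_ERR, msg)
    while True:
        sleep(interval)
        if is_running():
            return
        log("Process '{}' is not running.".format(name))


def run_listener(autorestart_enabled, is_running):
    """Main loop speaking the supervisord event listener protocol on stdin/stdout."""
    while True:
        # Tell supervisord we are ready for the next event.
        sys.stdout.write("READY\n")
        sys.stdout.flush()
        line = sys.stdin.readline()
        if not line:
            break  # supervisord closed our stdin
        headers = parse_tokens(line)
        payload = parse_tokens(sys.stdin.read(int(headers["len"])))
        action = decide_action(headers, payload, autorestart_enabled)
        if action == "restart":
            # Assumes supervisord is our parent; killing it makes the container exit.
            os.kill(os.getppid(), signal.SIGTERM)
        elif action == "alert":
            name = payload["processname"]
            alert_until_running(name, lambda: is_running(name))
        # Acknowledge the event so supervisord sends the next one.
        sys.stdout.write("RESULT 2\nOK")
        sys.stdout.flush()
```

While alert_until_running() loops, this sketch does not acknowledge further events, so supervisord keeps them in its event buffer; a production listener would more likely record pending alerts and check them without blocking.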
- How to verify it
First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If it is enabled, select one critical process, kill it manually, and then check whether the container is restarted.
Second, if the auto-restart mechanism was enabled in step 1, disable it by running the command sudo config feature autorestart <container_name> disabled. Then select and kill one critical process. After that, the alert message should appear in syslog every minute.
- Which release branch to backport (provide reason below if selected)
[ ] 201811
[ ] 201911
[x] 202006
**- Why I did it**
As part of migrating the SONiC codebase from Python 2 to Python 3.
**- How I did it**
- No longer install Python 2 in docker-base-buster or docker-config-engine-buster.
- Install Python 2 and pip2 in the following containers until we can completely eliminate it there:
  - docker-platform-monitor
  - docker-sonic-mgmt-framework
  - docker-sonic-vs
- Pin pip2 to a version <21 where it is still temporarily needed, as pip 21 will drop support for Python 2
- Also perform some other cleanup: ensure that the pip3, setuptools and wheel packages are installed in docker-base-buster, and remove any attempts to reinstall them in derived containers
* Restore each database instance with all the data from before warmboot, then flush the unused data in each instance, following the multiDB warmboot design at https://github.com/Azure/SONiC/blob/master/doc/database/multi_database_instances.md
* The restore needs to be done in the database docker, since we need the database_config.json of the new version
* Copy the rdb file containing all data into each instance's restore location, and then flush the unused databases
* The rest of the logic is the same as before
* The database backup part is in a separate PR in sonic-utilities, https://github.com/Azure/sonic-utilities/pull/1205; the two PRs depend on each other
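The restore-then-flush step can be sketched as follows. This is a minimal sketch, not the actual restore script: it assumes database_config.json has an INSTANCES section and a DATABASES section in which each DB name maps to an "id" and an "instance" (this layout is an assumption here), and flush_plan() and load_flush_plan() are hypothetical helpers. Because every instance starts from the same full rdb file, each one must drop the logical databases it does not own.

```python
import json


def flush_plan(config):
    """Return {instance: sorted DB ids to flush} after loading a full rdb dump."""
    databases = config["DATABASES"]
    all_ids = {db["id"] for db in databases.values()}
    plan = {}
    for instance in config["INSTANCES"]:
        owned = {db["id"] for db in databases.values() if db["instance"] == instance}
        # Every instance starts with a copy of all data, so drop what it does not own.
        plan[instance] = sorted(all_ids - owned)
    return plan


def load_flush_plan(path):
    """Read a database_config.json-style file (assumed layout) and build the plan."""
    with open(path) as f:
        return flush_plan(json.load(f))
```

Each instance could then drop its unused data with the Redis commands SELECT <id> followed by FLUSHDB for every id in its plan.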
Bring up the chassisdb service on SONiC switches according to the design in the
Distributed Forwarding in VoQ Arch HLD.
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up the new ChassisDB service in SONiC as designed in the [Distributed forwarding in a VOQ architecture HLD](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement section 2.3.1, Global DB Organization, of the VOQ architecture HLD.
**- How to verify it**
On existing platforms, the ChassisDB service does not start when there is no chassisdb.conf file.
In the distributed architecture, the ChassisDB service is accessible with the global.conf file.
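The start condition in the first check can be sketched as below. This is a minimal sketch, assuming chassisdb.conf is a shell-style KEY=VALUE file; the start_chassis_db key, parse_conf() and should_start_chassisdb() are hypothetical and not taken from the actual service files.

```python
import os


def parse_conf(text):
    """Parse shell-style KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def should_start_chassisdb(conf_path):
    """Start ChassisDB only when the platform provides a chassisdb.conf file."""
    if not os.path.isfile(conf_path):
        # Existing platforms ship no chassisdb.conf, so the service stays down.
        return False
    with open(conf_path) as f:
        conf = parse_conf(f.read())
    # Hypothetical key: an explicit opt-in flag inside the file.
    return conf.get("start_chassis_db") == "1"
```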
Signed-off-by: Honggang Xu <hxu@arista.com>
* Added support for connecting to the DB in a namespace via IP:port (using the docker bridge network) for applications on multi-ASIC platforms.
* Added 127.0.0.1 as the default IP if deriving the IP address from the interface fails.
* Moved the localhost loopback IP binding logic into the supervisor.j2 file.
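The 127.0.0.1 fallback can be sketched as below. This is a minimal sketch: get_interface_ip is a hypothetical, caller-supplied lookup (for example, one that reads the address of the docker bridge interface inside the namespace), and eth0 is only an assumed default interface name.

```python
import ipaddress

DEFAULT_DB_HOST = "127.0.0.1"


def resolve_db_host(get_interface_ip, interface="eth0"):
    """Return the interface's IPv4 address, or 127.0.0.1 if it cannot be derived."""
    try:
        # Validate the result; a malformed address raises ValueError.
        return str(ipaddress.IPv4Address(get_interface_ip(interface)))
    except (OSError, ValueError):
        # The lookup failed (e.g. no such interface) or returned garbage.
        return DEFAULT_DB_HOST
```

An application in the namespace would then connect to the returned host together with the DB instance's port.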
* [database] Implement the auto-restart feature for database container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] Remove the duplicate dependency in service files. Since we
already have updategraph ---> config_setup ---> database, we do not need to
explicitly add database.service to all the other container service files.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Reorganize line 73 in the event listener script.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] Update the file sflow.service.j2 to remove the duplicate
dependency.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add comments to the event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Update the comments on line 56.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add parentheses to the if statement on line 76 of the event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
This is the first step toward moving different database tables into different database instances.
This PR only handles the creation of multiple database instances based on the user configuration in /etc/sonic/database_config.json.
We keep the current method of creating a single database instance if no extra DATABASE configuration exists in the database_config.json file.
If the user configures more DB instances in database_config.json, we create those new DB instances alongside the original DB instance that exists today.
The configuration is shown below; more DB-related information can be added later if needed:
{
    ...
    "DATABASE": {
        "redis-db-01": {
            "port": "6380",
            "database": ["APPL_DB", "STATE_DB"]
        },
        "redis-db-02": {
            "port": "6381",
            "database": ["ASIC_DB"]
        }
    }
    ...
}
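A minimal sketch of how a DATABASE section shaped like the one above could be validated and turned into one start command per instance. It assumes exactly that shape (port as a string, database as a list of DB names); instance_commands() is a hypothetical helper, and the only redis-server option used is --port.

```python
def instance_commands(database_section):
    """Validate a DATABASE section and build one redis-server command per instance."""
    seen_ports = set()
    owner = {}
    commands = {}
    for name, spec in sorted(database_section.items()):
        port = int(spec["port"])
        if port in seen_ports:
            raise ValueError("port {} is used by more than one instance".format(port))
        seen_ports.add(port)
        # Each logical DB must live in exactly one instance.
        for db in spec["database"]:
            if db in owner:
                raise ValueError("{} is assigned to both {} and {}".format(db, owner[db], name))
            owner[db] = name
        commands[name] = ["redis-server", "--port", str(port)]
    return commands
```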
The detailed description is in the design doc at Azure/SONiC#271.
The main idea is: when database.sh starts, we check the configuration and generate the corresponding scripts.
The rc.local service handles copying old_config when a new image is loaded. Today there is no dependency between rc.local and the database service, so to be safe, and to make sure the copy is done before the database service tries to read the configuration, we make the database service run after rc.local.
Then, when the database docker starts, we check the configuration and generate the corresponding scripts/.conf files inside the database docker as well.
Based on those .conf files, we create the database instances as required.
Finally, we do a ping/pong check to confirm that the databases are up, and then continue.
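The final ping/pong check can be sketched without a Redis client library, since Redis answers an inline PING command with +PONG. This is a minimal sketch: redis_ping() and wait_for_instances() are hypothetical helpers, and the host, retry count and delay are assumed values.

```python
import socket
import time


def redis_ping(host, port, timeout=1.0):
    """Send an inline PING and return True if the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timed out, or reset: treat as not up yet.
        return False


def wait_for_instances(ports, host="127.0.0.1", retries=30, delay=1.0):
    """Poll every instance until all answer PING, or give up after `retries` rounds."""
    pending = set(ports)
    for _ in range(retries):
        pending = {p for p in pending if not redis_ping(host, p)}
        if not pending:
            return True
        time.sleep(delay)
    return False
```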
Signed-off-by: Dong Zhang <d.zhang@alibaba-inc.com>