Why I did it
redis_chassis process fails to start in the database-chassis on the supervisor card.
How I did it
The redis and redis_chassis processes in supervisor.conf has been modified to start with 'redis' as user. The modification only added redis and redis group for database containers except database-chassis container. This commit adds code to the docker-database-init.sh to add redis and redis group for the database-chassis to address this issue.
How to verify it
Booting the image on Supervisor. And verify the database-chassis container should be in UP state, and the database-chassis.service is in active(running) state
admin@ixre-cpm-chassis15:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9691c6111e0f docker-fpm-frr:latest "/usr/bin/docker_ini?" 16 hours ago Up 16 hours bgp5
...
c7c7159d4e0c docker-database:latest "/usr/local/bin/dock?" 31 hours ago Up 16 hours database-chassis
admin@ixre-cpm-chassis15:~$ sudo systemctl status database-chassis
? database-chassis.service - database-chassis container
Loaded: loaded (/lib/systemd/system/database-chassis.service; enabled-runtime; preset: enabled)
Active: active (running) since Sun 2024-03-03 04:43:12 UTC; 16h ago
Process: 1640 ExecStartPre=/usr/bin/database.sh start chassisdb (code=exited, status=0/SUCCESS)
Main PID: 1938 (database.sh)
Tasks: 10 (limit: 38307)
Memory: 49.7M
CGroup: /system.slice/database-chassis.service
??1938 /bin/bash /usr/bin/database.sh wait chassisdb
??1940 docker wait database-chassis
Notice: journal has been rotated since unit was started, output may be incomplete.
admin@ixre-cpm-chassis15:~$
Signed-off-by: mlok <marty.lok@nokia.com>
#### Why I did it
* Improved switch init time
### How I did it
* Replaced: `sonic-cfggen` -> `sonic-db-cli`
* Aggregated template list for `sonic-cfggen`
#### How to verify it
1. Run `warm-reboot`
Fix bug: #17161 (comment)
multi-asic platforms it will never go to the else part as DATABASE_TYPE is always ""
Microsoft ADO (number only): 25072889
Move the checker NAMESPACE_ID == "" back
Signed-off-by: Ze Gan <ganze718@gmail.com>
Sub PRs:
sonic-net/sonic-host-services#84
#17191
Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.
Microsoft ADO (number only): 25072889
How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.
Signed-off-by: Ze Gan <ganze718@gmail.com>
Why I did it
Running the Redis server as the "root" user is not recommended. It is suggested that the server should be operated by a non-privileged user.
Work item tracking
Microsoft ADO (number only): 15895240
How I did it
Ensure the Redis process is operating under the 'redis' user in supervisord and make redis user own REDIS_DIR inside db container.
How to verify it
Built new image, verify redis process is running as 'redis' user and all containers are up.
Signed-off-by: Mai Bui <maibui@microsoft.com>
#### Why I did it
To fix the timezone sync issue between the containers and the host. If a certain timezone has been configured on the host (SONIC) then the expectation is to reflect the same across all the containers.
This will fix [Issue:13046](https://github.com/sonic-net/sonic-buildimage/issues/13046).
For instance, a PST timezone has been set on the host and if the user checks the link flap logs (inside the FRR), it shows the UTC timestamp. Ideally, it should be PST.
*Critical process for database-chassis is redis-chassis but critical_process contains hard-coded
to `redis` program always. Instead using jinja2 template to render critical process list based on database docker type. redis-chassis for database-chassis docker and redis for regular database docker.
#### Why I did it
Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597
When change flush_unused_database from use swsssdk to use swsscommon, get_instancelist() and get_dblist() name changed but not update.
#### How I did it
Change flush_unused_database code to use swsscommon API:
Change get_instancelist to getInstanceList.
Change get_dblist to getDbList.
#### How to verify it
Pass all E2E test.
Manually check syslog make sure error log not exist and swss, syncd, bgp service started.
Search code in Azure make sure there all similer case are fixed in this PR.
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597
When change flush_unused_database from use swsssdk to use swsscommon, get_instancelist() and get_dblist() name changed but not update.
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
**- Why I did it**
As part of migrating SONiC codebase from Python 2 to Python 3
**- How I did it**
- No longer install Python 2 in docker-base-buster or docker-config-engine-buster.
- Install Python 2 and pip2 in the following containers until we can completely eliminate it there:
- docker-platform-monitor
- docker-sonic-mgmt-framework
- docker-sonic-vs
- Pin pip2 version <21 where it is still temporarily needed, as pip version 21 will drop support for Python 2
- Also preform some other cleanup, ensuring that pip3, setuptools and wheel packages are installed in docker-base-buster, and then removing any attempts to re-install them in derived containers
* restoring each database with all data before warmboot and then flush unused data in each instance, following the multiDB warmboot design at https://github.com/Azure/SONiC/blob/master/doc/database/multi_database_instances.md
* restore needs to be done in database docker since we need to know the database_config.json in new version
* copy all data rdb file into each instance restoration location andthen flush unused database
* other logic is the same as before
* backing up database part is in another PR at sonic-utilities https://github.com/Azure/sonic-utilities/pull/1205, they depend on each other
The service crash when the platform boots due to missing waits.
/usr/bin/database.sh tries to operate on a missing socket and fails.
We now wait for the chassis database to be ready the same way we do database.
**- Why I did it**
We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.
**- How I did it**
- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
- Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
use correct chassisdb.conf path while bringing up chassis_db service on VoQ modular switch.chassis_db service on VoQ modular switch.
resolves#5631
Signed-off-by: Honggang Xu <hxu@arista.com>
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.
**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.
Signed-off-by: Honggang Xu <hxu@arista.com>
* Support for connecting to DB in namespace via IP:port ( using docker bridge network ) for applications in multi-asic platform.
* Added the default IP as 127.0.0.1 if the IPaddress derivation from interface fails.
Moved the localhost loopback IP binding logic into the supervisor.j2 file.
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host
* Updates based on comments
* Adding the j2 templates for database_config and database_global files.
* Updating to retrieve the redis DIR's to be mounted from database_global.json file.
* Additional check to see if asic.conf file exists before sourcing it.
* Updates based on PR comments discussion.
* Review comments update
* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.
* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.
* Removing the database_config.json file to avioid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.
* Update the comments for sudo usage in docker_image_ctrl.j2
* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.
* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli
* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
this is the first step to moving different databases tables into different database instances
in this PR, only handle multiple database instances creation based on user configuration at /etc/sonic/database_config.json
we keep current method to create single database instance if no extra/new DATABASE configuration exist in database_config.json file.
if user try to configure more db instances at database_config.json , we create those new db instances along with the original db instance existing today.
The configuration is as below, later we can add more db related information if needed:
{
...
"DATABASE": {
"redis-db-01" : {
"port" : "6380",
"database": ["APPL_DB", "STATE_DB"]
},
"redis-db-02" : {
"port" : "6381",
"database":["ASIC_DB"]
},
}
...
}
The detail description is at design doc at Azure/SONiC#271
The main idea is : when database.sh started, we check the configuration and generate corresponding scripts.
rc.local service handle old_config copy when loading new images, there is no dependency between rc.local and database service today, for safety and make sure the copy operation are done before database try to read it, we make database service run after rc.local
Then database docker started, we check the configuration and generate corresponding scripts/.conf in database docker as well.
based on those conf, we create databases instances as required.
at last, we ping_pong check database are up and continue
Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com