sonic-buildimage

Archived

Author	SHA1	Message	Date
Ze Gan	9f08f88a0d	[dpu]: Add DPU database service (#17161 ) Sub PRs: sonic-net/sonic-host-services#84 #17191 Why I did it According to the design, the database instances of DPU will be kept in the NPU host. Microsoft ADO (number only): 25072889 How I did it To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database. Signed-off-by: Ze Gan <ganze718@gmail.com>	2023-11-17 09:10:03 -08:00
Mai Bui	ff5f46955c	[database] make Redis process runs as non-root user (#16326 ) Why I did it Running the Redis server as the "root" user is not recommended. It is suggested that the server should be operated by a non-privileged user. Work item tracking Microsoft ADO (number only): 15895240 How I did it Ensure the Redis process is operating under the 'redis' user in supervisord and make redis user own REDIS_DIR inside db container. How to verify it Built new image, verify redis process is running as 'redis' user and all containers are up. Signed-off-by: Mai Bui <maibui@microsoft.com>	2023-09-01 23:03:15 -07:00
Hua Liu	c91707ff31	Migrate flush_unused_database from py-redis to sonic-swss-common (#15511 ) #### Why I did it flush_unused_database uses py-redis, but sonic-swss-common already supports flushdb, so it should be migrated to sonic-swss-common. ##### Work item tracking - Microsoft ADO (number only): 24292565 #### How I did it Migrated flush_unused_database from py-redis to sonic-swss-common. #### How to verify it Pass all UT and E2E tests. #### Description for the changelog Migrate flush_unused_database from py-redis to sonic-swss-common	2023-06-29 15:08:54 -07:00
nmoray	f978b2bb53	Timezone sync issue between the host and containers (#14000 ) #### Why I did it To fix the timezone sync issue between the containers and the host. If a certain timezone has been configured on the host (SONIC) then the expectation is to reflect the same across all the containers. This will fix [Issue:13046](https://github.com/sonic-net/sonic-buildimage/issues/13046). For instance, a PST timezone has been set on the host and if the user checks the link flap logs (inside the FRR), it shows the UTC timestamp. Ideally, it should be PST.	2023-06-25 16:36:09 -07:00
abdosi	439d4eab98	[chassis] Fixed critical process not correct for database-chassis docker (#13445 ) The critical process for database-chassis is redis-chassis, but critical_process was always hard-coded to the `redis` program. Instead, use a jinja2 template to render the critical process list based on the database docker type: redis-chassis for the database-chassis docker and redis for the regular database docker.	2023-01-20 10:21:48 -08:00
Junchao-Mellanox	2126def04e	[infra] Support syslog rate limit configuration (#12490 ) - Why I did it Support syslog rate limit configuration feature - How I did it Remove unused rsyslog.conf from containers Modify docker startup script to generate rsyslog.conf from template files Add metadata/init data for syslog rate limit configuration - How to verify it Manual test New sonic-mgmt regression cases	2022-12-20 10:53:58 +02:00
Junchao-Mellanox	3b3837a636	[containercfgd] Add containercfgd and syslog rate limit configuration support (#12489 ) * [containercfgd] Add containercfgd and syslog rate limit configuration support * Fix build issue * Fix checker issue * Fix review comment * Fix review comment * Update containercfgd.py	2022-12-08 08:58:35 -08:00
arlakshm	a85b34fd36	update notify-keyspace-events in redis.conf (#12540 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com Why I did it closes #12343 Today in SONiC, notify-keyspace-events is set from the DbInterface class when an application performs a configdb set. In a chassis, the chassis_db may not receive any configdb set operations, so there is a chance this configuration will never be set. In that case, chassis_db updates from one line card will not be propagated to the other line cards, which do a psubscribe to get these events. How I did it Update redis.conf to set notify-keyspace-events AKE so that notify-keyspace-events is set when the redis instance starts. How to verify it Test on chassis	2022-10-28 18:28:57 -07:00
Hua Liu	45ded68d8d	Fix docker database flush_unused_database failed issue (#11600 ) #### Why I did it Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597 When flush_unused_database was changed from swsssdk to swsscommon, the names get_instancelist() and get_dblist() changed but the calls were not updated. #### How I did it Change the flush_unused_database code to use the swsscommon API: change get_instancelist to getInstanceList and get_dblist to getDbList. #### How to verify it Pass all E2E tests. Manually check syslog to make sure the error log no longer appears and that the swss, syncd, and bgp services started. Search the code in Azure to make sure all similar cases are fixed in this PR. #### Description for the changelog Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597 Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>	2022-08-03 10:18:00 +08:00
Hua Liu	a9b7a1facd	Replace swsssdk with swsscommon (#11215 ) #### Why I did it Update scripts in sonic-buildimage from py-swsssdk to swsscommon #### How I did it Change code to use swsscommon. #### How to verify it Pass all E2E test cases #### Description for the changelog Update scripts in sonic-buildimage from py-swsssdk to swsscommon	2022-07-11 10:01:10 +08:00
abdosi	0285bfe42e	[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500 ) What/Why I did: Issue 1: By setting up the ipvlan interface in interface-config.sh we are not tolerant to failures, because interface-config.service is one-shot and has no restart capability. Scenario: if the database service goes into a failed state, the interface-config service also fails because of the dependency check; when the database service is later restarted, the interface service remains stuck and the ipvlan interface never gets created. Solution: Moved all the logic from the interface-config service into the database service, which is also a better logical fit since the namespace is created there and all the network settings (sysctl) happen there. With this, whenever database starts we recreate the interface. Issue 2: Use of IPVLAN vs MACVLAN. Currently we are using ipvlan mode. However, the above failure scenario is not handled correctly by ipvlan mode. Once the ipvlan interface is created and an IP address is assigned to it, restarting the interface-config or database (new PR) service makes the Linux kernel return the error "Error: Address already assigned to an ipvlan device." based on this: https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978 The reason is that if we do not clean up the IP address assignment (which needs to be unique for IPVLAN), it remains in the kernel database and never returns to the free pool even though the namespace is deleted. Solution: Considering this hard dependency on a unique IP, macvlan mode is better for us, since everything is managed by the Linux kernel and there is no dependency on the user-configured IP address. Issue 3: The namespace database service does not check reachability to the Supervisor Redis Chassis Server. Currently there is no explicit check, as we never do a Redis PING from the namespace to the Supervisor Redis Chassis Server. 
Without this check it is possible that we start database and all other dockers even though there is no connectivity, and hit the error/failure late in the cycle. Solution: Added an explicit PING from the namespace that checks this reachability. Issue 4: flushdb gives an exception when trying to access the Chassis Server DB over the Unix socket. Solution: Handle it gracefully via try..except and log the message.	2022-05-24 16:54:12 -07:00
Kalimuthu-Velappan	bc30528341	Parallel building of sonic dockers using native dockerd(dood). (#10352 ) Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are created with a fixed docker name and common to all the users. docker-database:latest docker-swss:latest When multiple builds are triggered on the same build server that creates parallel building issue because all the build jobs are trying to create the same docker with latest tag. This happens only when sonic dockers are built using native host dockerd for sonic docker image creation. This patch creates all sonic dockers as user sonic dockers and then, while saving and loading the user sonic dockers, it rename the user sonic dockers into correct sonic dockers with tag as latest. docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for loading and saving the docker image.	2022-04-28 08:39:37 +08:00
Jason Lyu	b023c29a1e	[redis] Upgrade redis version (#9757 ) #### Why I did it The current redis version of SONiC is `6.0.6`, which contains many high-risky security issues like CVEs that are fixed in the latest version. The Redis release notes also highly recommend to upgrade with SECURITY urgency. ``` ================================================================================ Redis 6.0.16 Released Mon Oct 4 12:00:00 IDT 2021 ================================================================================ Upgrade urgency: SECURITY, contains fixes to security issues. Security Fixes: * (CVE-2021-41099) Integer to heap buffer overflow handling certain string commands and network payloads, when proto-max-bulk-len is manually configured to a non-default, very large value [reported by yiyuaner]. * (CVE-2021-32762) Integer to heap buffer overflow issue in redis-cli and redis-sentinel parsing large multi-bulk replies on some older and less common platforms [reported by Microsoft Vulnerability Research]. * (CVE-2021-32687) Integer to heap buffer overflow with intsets, when set-max-intset-entries is manually configured to a non-default, very large value [reported by Pawel Wieczorkiewicz, AWS]. * (CVE-2021-32675) Denial Of Service when processing RESP request payloads with a large number of elements on many connections. * (CVE-2021-32672) Random heap reading issue with Lua Debugger [reported by Meir Shpilraien]. * (CVE-2021-32628) Integer to heap buffer overflow handling ziplist-encoded data types, when configuring a large, non-default value for hash-max-ziplist-entries, hash-max-ziplist-value, zset-max-ziplist-entries or zset-max-ziplist-value [reported by sundb]. * (CVE-2021-32627) Integer to heap buffer overflow issue with streams, when configuring a non-default, large value for proto-max-bulk-len and client-query-buffer-limit [reported by sundb]. * (CVE-2021-32626) Specially crafted Lua scripts may result with Heap buffer overflow [reported by Meir Shpilraien]. 
Other bug fixes: * Fix appendfsync to always guarantee fsync before reply, on MacOS and FreeBSD (kqueue) (#9416) * Fix the wrong mis-detection of sync_file_range system call, affecting performance (#9371) * Fix replication issues when repl-diskless-load is used (#9280) ``` #### How I did it Edit `Dockerfile.j2` file #### How to verify it Check redis version #### Description for the changelog This PR will upgrade redis-server version to `6.0.16`.	2022-02-15 16:43:01 -08:00
Brian O'Connor	002827f08e	[PINS] Add APPL_STATE_DB and response path log (#9082 ) - Add APPL_STATE_DB to database_config.json - Clear APPL_STATE_DB during SwSS container restarts - Add response path log file to logrotate config: responsepublisher.rec Co-authored-by: PINS Working Group <sonic-pins-subgroup@googlegroups.com>	2021-11-24 10:31:06 -08:00
Junhua Zhai	7de673cb5b	[gearbox] Use separator ':' for GB_ASIC_DB, GB_COUNTERS_DB and GB_FLEX_COUNTER_DB (#9100 ) Keep GB_ASIC_DB, etc consistent with the ones in sonic-swss-common/common/database_config.json	2021-10-28 10:27:52 -07:00
yozhao101	1a3cab43ac	[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-04 10:16:53 -07:00
Myron Sosyak	3bf60b3db2	[docker-database] Fix Python3 issue (#7700 ) #### Why I did it To avoid the following error ``` Traceback (most recent call last): File "/usr/local/bin/flush_unused_database", line 10, in <module> if 'PONG' in output: TypeError: a bytes-like object is required, not 'str' ``` The `communicate` method returns strings if the streams were opened in text mode; otherwise it returns bytes. In our case the `text` argument of Popen is not set to true, which means that `communicate` returns bytes. #### How I did it Set `text=True` to get strings instead of bytes #### How to verify it Run `/usr/local/bin/flush_unused_database` inside the database container	2021-05-31 05:36:24 -07:00
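The bytes-versus-text behaviour described in the commit above can be reproduced with a small sketch. This is not the actual `flush_unused_database` script: the child process below merely prints `PONG` as a stand-in for the real Redis PING command, and `contains_pong` is a hypothetical helper added for illustration.

```python
import subprocess
import sys

# Stand-in child process that prints PONG (assumption: the real script
# runs a Redis PING command instead; this keeps the sketch self-contained).
cmd = [sys.executable, "-c", "print('PONG')"]

# Default (binary) mode: communicate() returns bytes.
raw_out, _ = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()

# text=True: communicate() returns str, so a str membership test works.
text_out, _ = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, text=True
).communicate()


def contains_pong(output):
    """Hypothetical helper mirroring the check `'PONG' in output`.

    Raises TypeError when `output` is bytes, as in the traceback above.
    """
    return 'PONG' in output
```

With `raw_out` the membership test raises the `TypeError` from the traceback, while with `text_out` it evaluates normally. Note that the `text` keyword requires Python 3.7 or later.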
Myron Sosyak	5ab300b626	Fix python version (#7658 ) #### Why I did it To avoid the following logs ``` Mar 15 15:52:04.599302 igk-dut-04 INFO database#/supervisord: flushdb /bin/bash: /usr/local/bin/flush_unused_database: /usr/bin/python: bad interpreter: No such file or directory Mar 15 15:52:04.599947 igk-dut-04 INFO database#supervisord 2021-03-15 15:52:04,599 INFO exited: flushdb (exit status 126; not expected) ``` #### How I did it Fix shebang #### How to verify it Check the logs	2021-05-20 15:47:46 -07:00
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit wrote an alerting message into syslog periodically. If we added a new process to a container, the corresponding Monit configuration file also needed to be updated, which made maintenance harder. We now employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we can focus only on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, select one critical process and kill it manually, then check whether the container is restarted. Second, disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then select one critical process and kill it. 
After that, the alerting message will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x] 202006	2021-01-21 12:57:49 -08:00
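The decision flow described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual event listener script: the function name, its parameters, and the returned strings are all hypothetical, and the side effects (killing Supervisord, probing the process, writing to syslog) are passed in as callables so the logic can be exercised in isolation.

```python
import time
from typing import Callable


def handle_critical_exit(
    autorestart_enabled: bool,
    kill_supervisord: Callable[[], None],
    is_running: Callable[[], bool],
    alert: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 60.0,
) -> str:
    """Hypothetical sketch of the flow after a critical process exits."""
    if autorestart_enabled:
        # Killing Supervisord should make the container exit so that it
        # is subsequently restarted.
        kill_supervisord()
        return "restart"
    while True:
        # Auto-restart disabled: sleep first, then check the process.
        sleep(interval_s)
        if is_running():
            return "recovered"
        # Still not running: emit an alert and keep looping.
        alert("critical process is not running")
```

Injecting `sleep` keeps the one-minute wait out of tests; in a real listener the default `time.sleep` would be used.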
Joe LeVeque	d40c9a1e8d	[docker-base-buster][docker-config-engine-buster] No longer install Python 2 (#6162 ) - Why I did it As part of migrating SONiC codebase from Python 2 to Python 3 - How I did it - No longer install Python 2 in docker-base-buster or docker-config-engine-buster. - Install Python 2 and pip2 in the following containers until we can completely eliminate it there: - docker-platform-monitor - docker-sonic-mgmt-framework - docker-sonic-vs - Pin pip2 version <21 where it is still temporarily needed, as pip version 21 will drop support for Python 2 - Also perform some other cleanup, ensuring that pip3, setuptools and wheel packages are installed in docker-base-buster, and then removing any attempts to re-install them in derived containers	2020-12-25 21:29:25 -08:00
mprabhu-nokia	41012f791e	In modular chassis, add CHASSIS_STATE_DB on control card (#5624 ) HLD: Azure/SONiC#646 In modular chassis, add CHASSIS_STATE_DB on control card Why I did it Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Control-Card CHASSIS_STATE_DB will be the central DB to maintain any state information of cards that is accessible to control-card/ How I did it Adding another DB on an existing REDIS instance running on port 6380.	2020-12-15 17:15:00 -08:00
Dong Zhang	b2a3de5f4f	[MultiDB] add multidb warmboot support - restoring database (#5773 ) * restore each database with all data from before warmboot and then flush unused data in each instance, following the multiDB warmboot design at https://github.com/Azure/SONiC/blob/master/doc/database/multi_database_instances.md * restore needs to be done in the database docker since we need to know the database_config.json of the new version * copy the full-data rdb file into each instance's restoration location and then flush the unused databases * other logic is the same as before * the database backup part is in another PR at sonic-utilities https://github.com/Azure/sonic-utilities/pull/1205; the two PRs depend on each other	2020-12-10 11:06:19 -08:00
Samuel Angebault	8576911a57	[database-chassis]: Fix the way database-chassis start (#6099 ) The service crash when the platform boots due to missing waits. /usr/bin/database.sh tries to operate on a missing socket and fails. We now wait for the chassis database to be ready the same way we do database.	2020-12-04 10:09:35 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
dflynn-Nokia	ac3a605c75	[build]: ARM build: Download redis-tools and redis-server from sonicstorage (#5797 ) Prevent intermittent build failures when building Sonic for the ARM platform architecture due to version upgrades of the redis-tools and redis-server packages. Modify select Dockerfile templates to download the redis-tools and redis-server packages from sonicstorage rather than from debian.org. This PR has been made possible by the inclusion of ARM versions of redis-tools and redis-server into sonicstorage as described in Issue# 5701	2020-11-04 09:31:06 -08:00
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
BrynXu	29928c93a1	[chassis]: Use correct path for chassisdb.conf file (#5632 ) Use the correct chassisdb.conf path while bringing up the chassis_db service on a VoQ modular switch. resolves #5631 Signed-off-by: Honggang Xu <hxu@arista.com>	2020-10-21 01:40:04 -07:00
BrynXu	a2e3d2fcea	[ChassisDB]: bring up ChassisDB service (#5283 ) bring up chassisdb service on sonic switch according to the design in Distributed Forwarding in VoQ Arch HLD Signed-off-by: Honggang Xu <hxu@arista.com> - Why I did it To bring up the new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD'](`90c1289eaf/doc/chassis/architecture.md`). - How I did it Implement section 2.3.1 Global DB Organization of the VOQ architecture HLD. - How to verify it ChassisDB service won't start without the chassisdb.conf file on existing platforms. ChassisDB service is accessible with the global.conf file in the distributed architecture.	2020-10-14 15:15:24 -07:00
Syd Logan	0311a4a037	Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851 ) * buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature * scripts and configuration needed to support a second syncd docker (physyncd) * physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device * support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY). HLD is located at `b817a12fd8/doc/gearbox/gearbox_mgr_design.md` - Why I did it This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based on multi-switch support in sonic-sairedis. - How I did it Overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point): https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged) https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged) https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchargent to create gearbox phy on supported systems https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib - How to verify it In a vslib build: root@sonic:/home/admin# show gearbox interfaces status PHY Id Interface MAC Lanes MAC Lane Speed PHY Lanes PHY Lane Speed Line Lanes Line Lane Speed Oper Admin -------- ----------- --------------- ---------------- --------------- ---------------- ------------ ----------------- ------ ------- 1 Ethernet48 121,122,123,124 25G 200,201,202,203 25G 204,205 50G down down 1 Ethernet49 125,126,127,128 25G 206,207,208,209 25G 210,211 50G down down 1 Ethernet50 69,70,71,72 25G 212,213,214,215 25G 216 100G down down In addition, docker ps \| grep phy should show a physyncd docker running. Signed-off-by: syd.logan@broadcom.com	2020-09-25 08:32:44 -07:00
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Prince Sunny	ae9a86fac4	Add new DB for Restapi to database config (#5350 )	2020-09-16 19:02:47 -07:00
Qi Luo	d4fc8e5b22	[redis] Use redis-server and redis-tools in blob storage to prevent upstream link broken (#5340 ) * [redis] Use redis-server and redis-tools in blob storage to prevent upstream link broken * Use curl instead of wget * Explicitly install dependencies	2020-09-08 19:30:14 -07:00
Qi Luo	48b5792b07	[redis] Upgrade redis version (#5060 ) buster-backports updated and the old version disappeared	2020-07-28 20:50:31 -07:00
abdosi	fc6bcff52b	[sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838 ) * [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (in multi-npu). This change was triggered by an issue found on multi-npu platforms where net.ipv6.conf.all.forwarding was 0 (should be 1) in the docker namespace, because of which RS/RA messages were triggered and link-local routers were learnt. Besides this, there were some other sysctl.net.ipv6* params whose values in the docker namespace were not the same as in the host namespace. To keep the host and docker namespaces always in sync, created a common file that lists all sysctl.net.* params and is used by both the host and docker namespaces. Any change will get applied to both namespaces. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments and made sure to invoke augtool only once and do string concatenation of all set commands * Address Review Comments.	2020-07-12 18:08:51 +00:00
judyjoseph	1af68b3aa6	Support for connecting to DB in namespace via TCP port in multi-asic platform. (#4779 ) * Support for connecting to DB in namespace via IP:port ( using docker bridge network ) for applications in multi-asic platform. * Added the default IP as 127.0.0.1 if the IPaddress derivation from interface fails. Moved the localhost loopback IP binding logic into the supervisor.j2 file.	2020-07-12 18:08:51 +00:00
Qi Luo	6849a0351c	[redis] Install vanilla redis packages for Buster and Stretch; upgrade Buster to 6.0.5 (#4732 ) upgrade redis server to 5:6.0.5-1~bpo10+1	2020-06-27 01:17:20 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container is restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax of this file separates individual processes from groups and also makes the distinction clear to the user. The critical_processes file now contains two kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes managed by supervisord under the name "xxx". I also updated the logic that parses the critical_processes file in the supervisor-proc-event-listener script. - How to verify it First enable the autorestart feature of a specified container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then select a critical process from the output of `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. The final step is to check whether the container is restarted correctly.	2020-06-25 21:18:21 -07:00
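The "program:xxx" / "group:xxx" syntax described in the commit above could be parsed along the following lines. This is a minimal sketch, not the parsing code from the supervisor-proc-event-listener script: the function name and the choice to raise `ValueError` on malformed lines are assumptions made for illustration.

```python
def parse_critical_processes(text):
    """Split critical_processes content into (programs, groups).

    Hypothetical helper: each non-empty line is expected to look like
    "program:<name>" or "group:<name>"; anything else is rejected.
    """
    programs, groups = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"malformed entry: {line!r}")
        if kind == "program":
            programs.append(name)
        elif kind == "group":
            groups.append(name)
        else:
            raise ValueError(f"unknown entry type: {line!r}")
    return programs, groups
```

For example, `parse_critical_processes("program:redis\ngroup:isc-dhcp-relay\n")` yields one program name and one group name, which keeps individual processes and supervisord groups distinguishable.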
joyas-joseph	cae67728f5	[docker-database]: Upgrade docker-database to buster (#4665 ) Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>	2020-05-29 03:29:49 -07:00
Guohan Lu	ddd6368e64	[docker-database]: do not generate pidfile for rsyslogd Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-05-22 11:01:28 -07:00
judyjoseph	acf465b43b	Multi DB with namespace support, Introducing the database_global.json… (#4477 ) * Multi DB with namespace support, Introducing the database_global.json file for supporting access to DBs in other namespaces for services running in the linux host * Updates based on comments * Adding the j2 templates for database_config and database_global files. * Updating to retrieve the redis DIRs to be mounted from database_global.json file. * Additional check to see if asic.conf file exists before sourcing it. * Updates based on PR comments discussion. * Review comments update * Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access. * Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier. * Removing the database_config.json file to avoid confusion in future. We use the database_config.json.j2 file to generate database_config.json files dynamically. * Update the comments for sudo usage in docker_image_ctrl.j2 * Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the PONG response is received when redis server is up. * Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli * Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.	2020-05-08 21:24:05 -07:00
Dong Zhang	340cf826a6	[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541 )	2020-05-06 15:41:28 -07:00
Dong Zhang	de5a04ad18	[MultiDB] : add persistence field for each redis instance (#4254) - add a "persistence" field for each redis instance in database_config.json - we will use this information to decide whether to save a redis instance's data during warm/fast reboot - before the multiDB changes, SONiC used "redis-cli save" to save all the data into an rdb file on the default instance on port 6379 - with the multiDB changes, we plan to implement "sonic-db-cli save" to save all data to the corresponding rdb files on all listed redis instances that have the "persistence" field set to "yes"	2020-04-07 21:01:39 -07:00
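As a rough illustration of how a "persistence" field could drive the save decision described in #4254, here is a small Python sketch. The top-level key `INSTANCES`, the instance names, and the function `instances_to_save` are assumptions made for this example; the commit message does not show the actual layout of database_config.json, and this is not the sonic-db-cli implementation.

```python
def instances_to_save(config):
    """Return the names of redis instances whose data should be saved.

    Assumes (hypothetically) that instances live under an "INSTANCES" key
    and that each one may carry a "persistence" field set to "yes" or "no".
    """
    instances = config.get("INSTANCES", {})
    return sorted(
        name
        for name, settings in instances.items()
        if settings.get("persistence") == "yes"
    )


if __name__ == "__main__":
    # Made-up configuration for illustration only.
    example = {
        "INSTANCES": {
            "redis": {"port": 6379, "persistence": "yes"},
            "redis2": {"port": 6380, "persistence": "no"},
            "redis3": {"port": 6381},
        }
    }
    print(instances_to_save(example))
```

In this sketch an instance without the field is treated as non-persistent, which is an assumption rather than documented behavior.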
yozhao101	23ff55a709	[Services] Restart BGP service upon unexpected critical process exit. (#4207 )	2020-03-03 16:50:32 -08:00
yozhao101	729f343f77	[Services] Restart database service upon unexpected critical process exit. (#4138 ) * [database] Implement the auto-restart feature for database container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] Remove the duplicate dependency in service files. Since we already have updategraph ---> config_setup ---> database, we do not need explicitly add database.service in all other container service files. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Reorganize the line 73 in event listener script. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] update the file sflow.service.j2 to remove the duplicate dependency. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add comments in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Update the comments in line 56. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add parentheses for if statement in line 76 in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-02-11 14:03:02 -08:00
yozhao101	b7e48b422f	[Services] Allow monit system tool to monitor the critical processes status running in various SONiC containers. (#3940 ) * Add a monit config file for teamd container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file in teamd container into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for snmp container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of snmp container into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for dhcp_relay container in the dir base_image_files. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of dhcp_relay container into base image under /etc/monit/conf.d. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for router advertiser container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of router advertiser contianer into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Pmon] Add a monit config file for pmon container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Pmon] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Add a monit config file for lldp container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Add a monit config file for BGP container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Add a copy mechanism to put monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Add a monit config file for the swss container. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Add a copy mechanism to put monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on barefoot platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-centec] Add a monit config file for syncd container on centen platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on centen platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit conifg file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell-arm64. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on marvell-arm64. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell-armhf. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on nephos. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Add a monit config file for sflow container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Add a copy mechanism to put the monit conifg file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Add a monit config file for telemetry container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Add a monit config file for database container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Dhcprelay] Change a typo. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Dhcprelay] Change the process name in monit config file to dhcrelay. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no desserve process in syncd container on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process desserve in syncd container on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process named desserve in syncd on centec. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process named desserve in syncd on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Should not delete the process desserve in syncd container on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on marvell-arm64. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on marvell-armhf. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Radv] Change the process name to radvd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Correct a typo in monit_telemetry. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-teamd] Delete the monit config file for teamd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-teamd] Delete the mechanism to copy the monit config file into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-dhcprelay] Delete the monit config file for dhcp_relay container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-dhcprelay] Delete the mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-radv] Delete the monit config file foe radv container. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-radv] Delete the mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] change the monit config file for BGP container such that monit only generates alert if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-snmp] Change the monit config file for snmp container such that monit only generates alret if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Change the monit config file for pmon container such that monit only generates alert if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Change the monit config file for lldp container such that monit only generates alerts if some processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Delete the monit config file for pmon container since some of processes are not running depended on the type of box. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Delete the copy mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Change the matching name for the process lldpd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Change the monit config file for swss container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on barefoot such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Correct a typo in monit config file. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on broadcom such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on cavium such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell-arm64 such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell-armhf such that monit will generate alert if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on mellanox such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sycnd] Change the monit config file for syncd container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Change the monit config file for sflow container such that monit only generates alerts if the process is not running for 5 minutes. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Change the monit config file for telemetry container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Change the monit config file for database container such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Use 4 spcess to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Use 4 spaces to replace 2 space in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-snmp] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on centec. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on marvell. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on nephos. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Remove the trailing extra spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-01-10 16:21:02 -08:00
Dong Zhang	1e7c458454	Add separator field to database_config.json (#3581 )	2019-10-08 18:10:38 -07:00
Dong Zhang	dd4a50dd61	[database-docker]: update multiDB config file (#3487) add 'hostname' field and rename 'socket'	2019-09-20 09:34:32 -07:00
Dong Zhang	768beb79e1	create multiple Redis DB instances based on CONFIG at /etc/sonic/database_config.json (#2182) This is the first step toward moving different database tables into different database instances. This PR only handles the creation of multiple database instances based on user configuration at /etc/sonic/database_config.json. We keep the current method of creating a single database instance if no extra/new DATABASE configuration exists in the database_config.json file. If the user configures more db instances in database_config.json, we create those new db instances along with the original db instance that exists today. The configuration is as below; later we can add more db related information if needed: { ... "DATABASE": { "redis-db-01" : { "port" : "6380", "database": ["APPL_DB", "STATE_DB"] }, "redis-db-02" : { "port" : "6381", "database": ["ASIC_DB"] } } ... } The detailed description is in the design doc at Azure/SONiC#271. The main idea is: when database.sh starts, we check the configuration and generate the corresponding scripts. The rc.local service handles the old_config copy when loading new images. There is no dependency between rc.local and the database service today, so for safety, and to make sure the copy operations are done before database tries to read the file, we make the database service run after rc.local. When the database docker starts, we check the configuration and generate the corresponding scripts/.conf files in the database docker as well. Based on those conf files, we create the database instances as required. Finally, we run a ping_pong check that the databases are up and continue. Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com	2019-08-28 11:15:10 -07:00
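The `DATABASE` section quoted in #2182 can be read with a few lines of Python. This is only a sketch under stated assumptions: the function names `extra_instances` and `db_to_port` are hypothetical, the `...` placeholders from the commit message are dropped so the sample is valid JSON, and the real database.sh logic is not reproduced here.

```python
import json

# Sample taken from the commit message, with the "..." placeholders removed.
SAMPLE_CONFIG = """
{
    "DATABASE": {
        "redis-db-01": {"port": "6380", "database": ["APPL_DB", "STATE_DB"]},
        "redis-db-02": {"port": "6381", "database": ["ASIC_DB"]}
    }
}
"""


def extra_instances(config_text):
    """Map each configured instance name to (port, [database names]).

    Returns an empty dict when no DATABASE section exists, which matches
    the commit's fallback of keeping only the original single instance.
    """
    config = json.loads(config_text)
    return {
        name: (int(settings["port"]), list(settings["database"]))
        for name, settings in config.get("DATABASE", {}).items()
    }


def db_to_port(instances):
    """Invert the mapping: database name -> port of the instance hosting it."""
    return {
        db_name: port
        for port, db_names in instances.values()
        for db_name in db_names
    }


if __name__ == "__main__":
    instances = extra_instances(SAMPLE_CONFIG)
    print(instances)
    print(db_to_port(instances))
```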

79 Commits