Commit Graph

24 Commits

Author SHA1 Message Date
shlomibitton
c71c91e2b0
[202012] [Fastboot] Delay PMON service for better fastboot performance (#10745)
#### Why I did it

Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

#### How I did it

Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.

#### How to verify it

Run fast-reboot on MLNX platform and observe faster create_switch execution time.
2022-05-15 23:31:32 -07:00
shlomibitton
bca8a244c6
[202012] [Fastboot] Delay LLDP service for better fastboot performance (#10568) (#10744)
This PR is to backport a fix #10568
This PR is dependent on PR: #10745

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
2022-05-15 15:05:29 +03:00
Tamer Ahmed
b42aef68f3 Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform
[mux] Start Mux on Only Dual-ToR Platform

mux docker depends on the presence of mux cable hardware and is
supposed to run only Gemini ToRs. This PR change the mux feature
config in order to enable mux docker based on device configuration.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b8f70f8986 Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.

Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y

Edit: linkmgrd PR will follow.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Related work items: #2315, #3146150
2021-11-10 18:54:33 -08:00
lguohan
162f0fdfe1
[init_cfg]: allow enable/disable swss/teamd/syncd services (#6291)
swss/teamd/syncd services were changed to always enabled
in commit fad481edc1 as a workaround
for not letting hostcfgd start service during the bootup process.

commit 317a4b3410 introduce
wait till full system bootup before updating feature states in hostcfgd.

Thus, workaround introduced in commit fad481ed can be removed

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-12-28 10:33:46 -08:00
Prabhu Sreenivasan
df13245b9f
[CRM] Add support for snat, dnat and ipmc crm resources (#6012)
Signed-off-by: Prabhu Sreenivasan prabhu.sreenivasan@broadcom

What I did
Added support for snat, dnat and ipmc resources under CRM module.

How I did it
New feature NAT adds new resources snat_enty and dnat_entry that needs to be monitored. ipmc_entry tracks IP multicast resources used by switch.

How to verify it
sonic-utilities tests and crm spytest
2020-12-23 06:15:53 -08:00
Renuka Manavalan
ba02209141
First cut image update for kubernetes support. (#5421)
* First cut image update for kubernetes support.
With this,
    1)  dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled
        for kube management
        init_cfg.json configure set_owner as kube for these

    2)  Each docker's start.sh updated to call container_startup.py to register going up
          As part of this call, it registers the current owner as local/kube and its version
          The images are built with its version ingrained into image during build

    3)  Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'.
         For all locally managed containers, it calls docker commands, hence no change for locally managed.
        
    4)  Introduced a new ctrmgrd service, that helps with transition between owners as  kube & local and carry over any labels update from STATE-DB to API server

    5)  hostcfgd updated to handle owner change

    6) Reboot scripts are updatd to tag kube running images as local, so upon reboot they run the same image.

   7) Added kube_commands.py to handle all updates with Kubernetes API serrver -- dedicated for k8s interaction only.
2020-12-22 08:01:33 -08:00
Stephen Sun
e010d83fc3
[Dynamic buffer calc] Support dynamic buffer calculation (#6194)
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338)
- [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361)
- [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973)

**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
    - `dynamic` for the dynamic buffer calculation model
    - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
   - ASIC_TABLE in platform/\<vendor\>/asic_table.j2
   - PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
   - PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
   - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
   - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and rendering them when the system starts
4. Update the CLI options for buffermgrd so that it can start with dynamic mode
5. Fetches the ASIC vendor name in orchagent:
   - fetch the vendor name when creates the docker and pass it as a docker environment variable
   - `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
   Update the buffer settings for dynamic buffer calculation
2020-12-13 11:35:39 -08:00
abdosi
fad481edc1
Enhanced Feature table to support 'always_enabled' value for state and auto-restart fields. (#6000)
Added new flag value 'always_enabled' for the state and auto-restart field of feature table

init_cfg.json is updated to initialize state field of database/swss/syncd/teamd feature and auto-restart field of database feature
as always_enabled

Once the state/auto-restart value is initialized as "always_enabled" it is immutable and cannot be change via feature config commands. (config feature..) PR#Azure/sonic-utilities#1271

hostcfgd will not take any action if state field value is 'always_enabled'

Since we have always_enabled field for auto-restart updated supervisor-proc-exit-listener
not to have special check for database and always rely on value from Feature table.
2020-11-25 08:41:11 -08:00
lguohan
e6796da141
[init_cfg.json.j2]: only enable gbsyncd feature for vs platform (#5815)
currently only vs platform has gdbsyncd feature built

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-07 00:46:18 -08:00
lguohan
07748a939f
[gbsyncd]: add gbsyncd to FEATURE table (#5683)
remove syncd from critical process list because
gbsyncd process will exit for platform without
gearbox.

closes #5623

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-10-27 11:40:23 -07:00
abdosi
75e4258508
Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358)
* Enhanced Feature Table state enable/disbale for multi-asic platforms.
In Multi-asic for some features we can service per asic so we need to
get list of all services.

Also updated logic to return if any one of systemctl command return failure
and make sure syslog of feature getting enable/disable only come when
all commads are sucessful.

Moved the service list get api from sonic-util to sonic-py-common

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Make sure to retun None for both service list in case of error.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Return empty list as fail condition

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Address Review Comments.

Made init_cfg.json.j2 knowledegable of Feature
service is global scope or per asic scope

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Fix merge conflict

* Address Review Comment.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-22 08:34:02 -07:00
shi-su
339cfbf9af
Remove the configuration of synchronous mode from init_cfg.json (#5308)
Remove the configuration of synchronous mode from init_cfg.json
2020-09-10 01:26:10 -07:00
Stepan Blyshchak
b31050d60e
[services][mgmt-framework] delay mgmt-framework service on boot (#5226)
management framework provides management plane services like rest and
CLI which is not needed right after boot, instead by delaying this
service we give some more CPU for data plane and control plane services
on fast/warm boot.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2020-08-27 21:53:58 +03:00
shi-su
f3feb56c8a
Add switch for synchronous mode (#5237)
Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1.  Configure mode while building an image
    `make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running 
    Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
    Restart swss with `systemctl restart swss`
2020-08-24 14:04:10 -07:00
Tamer Ahmed
e484ae9dda
[services] Fix Delay Start of SNMP And Telemetry (#5211)
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-19 19:27:59 -07:00
Prince Sunny
3fee094760
Enable restapi, update sonic-restapi (#5169)
* Enable restapi if included in image
* [Submodule update] sonic-restapi
2020-08-13 11:29:49 -07:00
lguohan
082c26a27d
[build]: combine feature and container feature table (#5081)
1. remove container feature table
2. do not generate feature entry if the feature is not included
   in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities

* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-05 13:23:12 -07:00
yozhao101
83738fca2f
[dockers] Default container autorestart feature to "enabled" for all except database (#4853)
Set the default auto_restart state to "enabled" in init_cfg.json for all containers except database

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-07-15 11:49:14 -07:00
Joe LeVeque
4e482c16ba
[build] Enable telemetry service by default (#4760)
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image

**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
2020-06-12 16:20:31 -07:00
Andriy Kokhan
540cc78038
[Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247. (#4250)
* [Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247.

Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>
2020-03-12 16:11:15 -07:00
Prince Sunny
31fb631cd3
Fix service and container name to be same (#4151) 2020-02-14 11:08:57 -08:00
yozhao101
41958aad52
[init_cfg.json] Add new FEATURE and CONTAINER_FEATURE tables (#4137)
* [init_cfg.json] Add a new table CONTAINER_FEATURE.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [init_cfg.json] Update the content of table CONTAINER_FEATURE.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [init_cfg.json] Use the template to generate the table
CONTAINER_FEATURE.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [init_cfg.json] Add a new table FEATURE.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [init_cfg.json] Change the order of container names according to
alphabetical order.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [init_cfg.json] Change the dhcp_relay container name and add rest-api.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-02-11 11:05:21 -08:00
yozhao101
3bb61ab10c
[init_cfg.json] Maintain a separate init_cfg.json.j2 template file (#4092) 2020-02-07 12:35:35 -08:00