sonic-buildimage/files/scripts/asic_status.py
mprabhu-nokia 3fd6e8d500
[systemd] ASIC status based service bringup on VOQ chassis (#7477)
Changes to allow starting per-ASIC services such as swss and syncd only after the platform vendor code has detected the ASIC and published a notification. The systemd service ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@
There is also a requirement that the management, telemetry, and snmp dockers can start even if the ASIC services are not all up.

Why I did it
On a VOQ chassis, each fabric card has 1 to N ASICs, and there can be multiple removable fabric cards. On the supervisor, the swss and syncd containers must be started only if the fabric card is in the Online state and its ASICs are detected by the kernel. With systemd, the dependent services can stay in the inactive state until then.

How I did it
Introduce a mechanism where every ASIC-dependent service waits for its ASIC's state to be published to Redis by PMON. Once the subscription notification is received, the service proceeds to create its dockers.
For fixed platforms, the systemd behavior is unchanged, i.e. service bring-up and docker creation happen in the start()/ExecStartPre routine of the .sh scripts.
For the VOQ chassis platform on the supervisor, service bring-up skips docker creation in the start() routine and does it in the wait()/ExecStart routine of the .sh scripts instead.
Management dockers are decoupled from ASIC docker creation.
2021-07-27 23:02:49 -07:00
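
The bring-up rule described in the commit message can be summarized with a small sketch. This is only an illustration of the decision, not the logic in the actual .sh scripts: the `Platform` class and `docker_creation_phase` function are hypothetical names, and treating every platform other than a VOQ supervisor like a fixed platform is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of the platform a service runs on."""
    is_voq_chassis: bool
    is_supervisor: bool


def docker_creation_phase(platform: Platform) -> str:
    """Return the service-script routine in which the ASIC docker is created.

    "start" stands for start()/ExecStartPre and "wait" for wait()/ExecStart.
    """
    if platform.is_voq_chassis and platform.is_supervisor:
        # Creation is deferred until the ASIC has been reported online.
        return "wait"
    # Assumption: all other platforms keep the unchanged behavior.
    return "start"


fixed = Platform(is_voq_chassis=False, is_supervisor=False)
supervisor = Platform(is_voq_chassis=True, is_supervisor=True)
print(docker_creation_phase(fixed), docker_creation_phase(supervisor))
```

Deferring creation to the wait phase is what lets systemd keep the per-ASIC units inactive while management dockers start independently.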

79 lines
2.4 KiB
Python
Executable File

#!/usr/bin/env python3
"""
bootstrap-asic
"""
try:
    import re
    import sys
    from sonic_py_common import daemon_base
    from swsscommon import swsscommon
    from sonic_py_common import multi_asic
    from sonic_py_common.logger import Logger
except ImportError as e:
    raise ImportError(str(e) + " - required module not found")

#
# Constants ====================================================================
#
SYSLOG_IDENTIFIER = 'asic_status.py'
CHASSIS_ASIC_INFO_TABLE = 'CHASSIS_ASIC_TABLE'
SELECT_TIMEOUT_MSECS = 5000
def main():
    logger = Logger(SYSLOG_IDENTIFIER)
    logger.set_min_log_priority_info()

    if len(sys.argv) != 3:
        raise Exception('Pass service and valid asic-id as arguments')

    service = sys.argv[1]
    args_asic_id = sys.argv[2]

    # Get num asics
    num_asics = multi_asic.get_num_asics()
    if num_asics == 0:
        logger.log_error('Detected no asics on this platform for service {}'.format(service))
        sys.exit(1)

    # Connect to CHASSIS_STATE_DB and subscribe to CHASSIS_ASIC_TABLE notifications
    state_db = daemon_base.db_connect("CHASSIS_STATE_DB")

    sel = swsscommon.Select()
    sst = swsscommon.SubscriberStateTable(state_db, CHASSIS_ASIC_INFO_TABLE)
    sel.addSelectable(sst)

    while True:
        (state, c) = sel.select(SELECT_TIMEOUT_MSECS)
        # Keep waiting on timeouts and on anything that is not a table update
        if state != swsscommon.Select.OBJECT:
            continue

        (asic_key, asic_op, asic_fvp) = sst.pop()
        # The ASIC id is the trailing number of the key; skip keys without one
        asic_id = re.search(r'\d+$', asic_key)
        if asic_id is None:
            continue
        global_asic_id = asic_id.group(0)

        if asic_op == 'SET':
            asic_fvs = dict(asic_fvp)
            asic_name = asic_fvs.get('name')
            if asic_name is None:
                logger.log_info('Unable to get asic_name for asic{}'.format(global_asic_id))
                continue

            if not asic_name.startswith('FABRIC-CARD'):
                logger.log_info('Skipping module with asic_name {} for asic{}'.format(asic_name, global_asic_id))
                continue

            if global_asic_id == args_asic_id:
                logger.log_info('Detected asic{} is online'.format(global_asic_id))
                sys.exit(0)
        elif asic_op == 'DEL':
            logger.log_info('Detected asic{} is offline'.format(global_asic_id))
            sys.exit(1)
        else:
            continue


if __name__ == "__main__":
    main()
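
The per-notification decision made in the loop above can be isolated into a pure function, which makes it possible to exercise the logic without swsscommon or a running Redis. The following is a minimal sketch under stated assumptions: `classify_event` is a hypothetical helper that is not part of this file, and the sample keys and field values in the usage lines are made up for illustration.

```python
import re
from typing import Mapping, Optional


def classify_event(asic_key: str, asic_op: str,
                   asic_fvs: Mapping[str, str],
                   wanted_asic_id: str) -> Optional[int]:
    """Return an exit code (0 or 1), or None to keep waiting.

    Mirrors the SET/DEL handling of the script for a single notification.
    """
    match = re.search(r'\d+$', asic_key)
    if match is None:
        return None  # key has no trailing ASIC id
    asic_id = match.group(0)

    if asic_op == 'SET':
        name = asic_fvs.get('name')
        if name is None or not name.startswith('FABRIC-CARD'):
            return None  # not a fabric-card ASIC entry
        # Only the ASIC this service instance was started for counts.
        return 0 if asic_id == wanted_asic_id else None
    if asic_op == 'DEL':
        return 1  # as in the script, any deletion is reported as offline
    return None


# Hypothetical notifications, for illustration only.
print(classify_event('asic3', 'SET', {'name': 'FABRIC-CARD1'}, '3'))
print(classify_event('asic3', 'SET', {'name': 'FABRIC-CARD1'}, '4'))
print(classify_event('asic3', 'DEL', {}, '3'))
```

Note that the comparison is between strings, as in the script, so the ASIC id passed on the command line must match the key suffix exactly.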