2bc65aa7ba
### Why I did it

A Lua script running in the background can keep redis-server busy for a long time when the batch size is 8192. If the script runs longer than the default 5 s limit, redis-server stops responding to other processes, which causes syncd to crash.

```
Aug 9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug 9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug 9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug 9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug 9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

The script with SHA1 88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua.

##### Work item tracking

- Microsoft ADO **24741990**:

#### How I did it

Change the orchagent pop batch size from 8192 to 1024.

#### How to verify it

Run all test cases in sonic-mgmt to verify system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
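To find out how often the slow-script condition described above occurs on a device, the syslog can be scanned for the Redis warning. The following is a minimal sketch rather than part of the change itself: the function name `find_slow_lua_scripts` and the default path `/var/log/syslog` are assumptions made for illustration, and the pattern is derived only from the log line quoted above.

```bash
#!/usr/bin/env bash
# Sketch: list Redis "Lua slow script" warnings found in a syslog file.
# find_slow_lua_scripts and the default log path are hypothetical names
# chosen for this example; they are not part of SONiC.

find_slow_lua_scripts() {
    local logfile=${1:-/var/log/syslog}
    # For each matching line, print "<elapsed milliseconds> <script SHA1>".
    # Lines that do not contain the warning are skipped (-n plus the p flag).
    sed -n -E \
        's/.*Lua slow script detected: still in execution after ([0-9]+) milliseconds\..*Script SHA1 is: ([0-9a-f]{40}).*/\1 \2/p' \
        "$logfile"
}

# Example call (the path is an assumption):
#   find_slow_lua_scripts /var/log/syslog
```

Each output line pairs the elapsed time with the script hash, so repeated offenders are easy to spot. If the scripts were loaded verbatim from disk, comparing the hash against `sha1sum /usr/share/swss/*.lua` should identify the file, which is how the hash above maps to consumer_state_table_pops.lua.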
73 lines
2.3 KiB
Bash
Executable File
#!/usr/bin/env bash

SWSS_VARS_FILE=/usr/share/sonic/templates/swss_vars.j2

# Retrieve SWSS vars from sonic-cfggen
SWSS_VARS=$(sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t "$SWSS_VARS_FILE") || exit 1
export platform=$(echo "$SWSS_VARS" | jq -r '.asic_type')
export sub_platform=$(echo "$SWSS_VARS" | jq -r '.asic_subtype')

MAC_ADDRESS=$(echo "$SWSS_VARS" | jq -r '.mac')
if [ "$MAC_ADDRESS" == "None" ] || [ -z "$MAC_ADDRESS" ]; then
    MAC_ADDRESS=$(ip link show eth0 | grep ether | awk '{print $2}')
    logger "Mac address not found in Device Metadata, Falling back to eth0"
fi

# Create a folder for SwSS record files
mkdir -p /var/log/swss
ORCHAGENT_ARGS="-d /var/log/swss "

# Set orchagent pop batch size to 1024
ORCHAGENT_ARGS+="-b 1024 "

# Set synchronous mode if it is enabled in CONFIG_DB
SYNC_MODE=$(echo "$SWSS_VARS" | jq -r '.synchronous_mode')
if [ "$SYNC_MODE" == "enable" ]; then
    ORCHAGENT_ARGS+="-s "
fi

# Check if there is an "asic_id" field in the DEVICE_METADATA in configDB.
#"DEVICE_METADATA": {
#    "localhost": {
#        ....
#        "asic_id": "0",
#    }
#},
# ID field could be integers just to denote the asic instance like 0,1,2...
# OR could be PCI device IDs which will be strings like "03:00.0"
# depending on what the SAI/SDK expects.
asic_id=$(echo "$SWSS_VARS" | jq -r '.asic_id')
if [ -n "$asic_id" ]
then
    ORCHAGENT_ARGS+="-i $asic_id "
fi

# for multi asic platforms add the asic name to the record file names
if [[ "$NAMESPACE_ID" ]]; then
    ORCHAGENT_ARGS+="-f swss.asic$NAMESPACE_ID.rec -j sairedis.asic$NAMESPACE_ID.rec "
fi

# Add platform specific arguments if necessary
if [ "$platform" == "broadcom" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "cavium" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "nephos" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "centec" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "barefoot" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "vs" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "mellanox" ]; then
    ORCHAGENT_ARGS+=""
elif [ "$platform" == "innovium" ]; then
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
else
    # Should we use the fallback MAC in case it is not found in Device.Metadata
    ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
fi

# ORCHAGENT_ARGS is left unquoted so that it splits into separate arguments
exec /usr/bin/orchagent ${ORCHAGENT_ARGS}
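The argument assembly in the script above could also be written as a function that builds a Bash array, which makes the batch size easy to override and the result easy to check without starting orchagent. This is only a sketch under assumptions: `build_orchagent_args` and the `ORCHAGENT_BATCH_SIZE` variable are hypothetical names that do not exist in the original script, and the `-f`/`-j` record-file options for multi-ASIC platforms are left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original script.
# Arguments: <platform> <mac> [sync_mode] [asic_id]
# Prints one orchagent argument per line.

build_orchagent_args() {
    local platform=$1 mac=$2 sync_mode=${3:-} asic_id=${4:-}
    # ORCHAGENT_BATCH_SIZE is an assumed override; the default matches the script.
    local batch_size=${ORCHAGENT_BATCH_SIZE:-1024}
    local -a args=(-d /var/log/swss -b "$batch_size")

    # Synchronous mode, only when CONFIG_DB reports "enable".
    [[ "$sync_mode" == "enable" ]] && args+=(-s)
    # ASIC instance ID, only when one is configured.
    [[ -n "$asic_id" ]] && args+=(-i "$asic_id")
    # In the script above, mellanox is the only platform that gets no -m argument.
    [[ "$platform" != "mellanox" ]] && args+=(-m "$mac")

    printf '%s\n' "${args[@]}"
}
```

A caller could read the output back into an array with `mapfile -t args < <(build_orchagent_args "$platform" "$MAC_ADDRESS" "$SYNC_MODE" "$asic_id")` and then run `exec /usr/bin/orchagent "${args[@]}"`, which avoids relying on word splitting of a single string.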