sonic-buildimage/dockers/docker-orchagent/orchagent.sh
Zhaohui Sun 2bc65aa7ba
[202305]Change orchagent pop batch size from 8192 to 1024 (#16127)
### Why I did it
Background running lua script may cause redis-server quite busy if batch size is 8192.
If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash.

```
Aug  9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug  9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug  9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug  9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug  9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua
##### Work item tracking
- Microsoft ADO **24741990**:

#### How I did it
Change batch size from 8192 to 1024.

#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
2023-08-14 17:53:08 -07:00

73 lines
2.3 KiB
Bash
Executable File

#!/usr/bin/env bash
SWSS_VARS_FILE=/usr/share/sonic/templates/swss_vars.j2
# Retrieve SWSS vars from sonic-cfggen
SWSS_VARS=$(sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t $SWSS_VARS_FILE) || exit 1
export platform=$(echo $SWSS_VARS | jq -r '.asic_type')
export sub_platform=$(echo $SWSS_VARS | jq -r '.asic_subtype')
MAC_ADDRESS=$(echo $SWSS_VARS | jq -r '.mac')
if [ "$MAC_ADDRESS" == "None" ] || [ -z "$MAC_ADDRESS" ]; then
MAC_ADDRESS=$(ip link show eth0 | grep ether | awk '{print $2}')
logger "Mac address not found in Device Metadata, Falling back to eth0"
fi
# Create a folder for SwSS record files
mkdir -p /var/log/swss
ORCHAGENT_ARGS="-d /var/log/swss "
# Set orchagent pop batch size to 1024
ORCHAGENT_ARGS+="-b 1024 "
# Set synchronous mode if it is enabled in CONFIG_DB
SYNC_MODE=$(echo $SWSS_VARS | jq -r '.synchronous_mode')
if [ "$SYNC_MODE" == "enable" ]; then
ORCHAGENT_ARGS+="-s "
fi
# Check if there is an "asic_id field" in the DEVICE_METADATA in configDB.
#"DEVICE_METADATA": {
# "localhost": {
# ....
# "asic_id": "0",
# }
#},
# ID field could be integers just to denote the asic instance like 0,1,2...
# OR could be PCI device ID's which will be strings like "03:00.0"
# depending on what the SAI/SDK expects.
asic_id=$(echo $SWSS_VARS | jq -r '.asic_id')
if [ -n "$asic_id" ]
then
ORCHAGENT_ARGS+="-i $asic_id "
fi
# for multi asic platforms add the asic name to the record file names
if [[ "$NAMESPACE_ID" ]]; then
ORCHAGENT_ARGS+="-f swss.asic$NAMESPACE_ID.rec -j sairedis.asic$NAMESPACE_ID.rec "
fi
# Add platform specific arguments if necessary
if [ "$platform" == "broadcom" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "cavium" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "nephos" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "centec" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "barefoot" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "vs" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
elif [ "$platform" == "mellanox" ]; then
ORCHAGENT_ARGS+=""
elif [ "$platform" == "innovium" ]; then
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
else
# Should we use the fallback MAC in case it is not found in Device.Metadata
ORCHAGENT_ARGS+="-m $MAC_ADDRESS"
fi
exec /usr/bin/orchagent ${ORCHAGENT_ARGS}