2017-01-29 13:33:33 -06:00
|
|
|
#!/bin/bash
|
|
|
|
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
function wait_networking_service_done() {
|
|
|
|
local -i _WDOG_CNT="1"
|
|
|
|
local -ir _WDOG_MAX="30"
|
|
|
|
|
|
|
|
local -r _TIMEOUT="1s"
|
|
|
|
|
|
|
|
while [[ "${_WDOG_CNT}" -le "${_WDOG_MAX}" ]]; do
|
|
|
|
networking_status="$(systemctl is-active networking 2>&1)"
|
|
|
|
|
|
|
|
if [[ "${networking_status}" == active || "${networking_status}" == inactive || "${networking_status}" == failed ]] ; then
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
echo "interfaces-config: networking service is running, wait for it done"
|
|
|
|
|
|
|
|
let "_WDOG_CNT++"
|
|
|
|
sleep "${_TIMEOUT}"
|
|
|
|
done
|
|
|
|
|
|
|
|
echo "interfaces-config: networking service is still running after 30 seconds, killing it"
|
|
|
|
systemctl kill networking 2>&1
|
|
|
|
}
|
|
|
|
|
2022-04-06 08:59:47 -05:00
|
|
|
if [[ $(ifquery --running eth0) ]]; then
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
wait_networking_service_done
|
2022-04-06 08:59:47 -05:00
|
|
|
ifdown --force eth0
|
|
|
|
fi
|
2018-04-01 23:36:43 -05:00
|
|
|
|
2019-12-10 10:16:56 -06:00
|
|
|
# Check if ZTP DHCP policy has been installed
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
if [[ -e /etc/network/ifupdown2/policy.d/ztp_dhcp.json ]]; then
|
2019-12-10 10:16:56 -06:00
|
|
|
# Obtain port operational state information
|
|
|
|
redis-dump -d 0 -k "PORT_TABLE:Ethernet*" -y > /tmp/ztp_port_data.json
|
|
|
|
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
if [[ $? -ne 0 || ! -e /tmp/ztp_port_data.json || "$(cat /tmp/ztp_port_data.json)" = "" ]]; then
|
2019-12-10 10:16:56 -06:00
|
|
|
echo "{}" > /tmp/ztp_port_data.json
|
|
|
|
fi
|
|
|
|
|
|
|
|
# Create an input file with ztp input information
|
|
|
|
echo "{ \"PORT_DATA\" : $(cat /tmp/ztp_port_data.json) }" > \
|
|
|
|
/tmp/ztp_input.json
|
|
|
|
else
|
|
|
|
echo "{ \"ZTP_DHCP_DISABLED\" : \"true\" }" > /tmp/ztp_input.json
|
|
|
|
fi
|
|
|
|
|
2020-08-17 17:46:52 -05:00
|
|
|
# Create /e/n/i file for existing and active interfaces, dhcp6 sytcl.conf and dhclient.conf
|
|
|
|
CFGGEN_PARAMS=" \
|
|
|
|
-d -j /tmp/ztp_input.json \
|
|
|
|
-t /usr/share/sonic/templates/interfaces.j2,/etc/network/interfaces \
|
|
|
|
-t /usr/share/sonic/templates/90-dhcp6-systcl.conf.j2,/etc/sysctl.d/90-dhcp6-systcl.conf \
|
|
|
|
-t /usr/share/sonic/templates/dhclient.conf.j2,/etc/dhcp/dhclient.conf \
|
|
|
|
"
|
|
|
|
sonic-cfggen $CFGGEN_PARAMS
|
2017-11-05 01:31:29 -06:00
|
|
|
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
[[ -f /var/run/dhclient.eth0.pid ]] && kill `cat /var/run/dhclient.eth0.pid` && rm -f /var/run/dhclient.eth0.pid
|
|
|
|
[[ -f /var/run/dhclient6.eth0.pid ]] && kill `cat /var/run/dhclient6.eth0.pid` && rm -f /var/run/dhclient6.eth0.pid
|
2017-10-16 19:36:21 -05:00
|
|
|
|
2019-12-10 10:16:56 -06:00
|
|
|
for intf_pid in $(ls -1 /var/run/dhclient*.Ethernet*.pid 2> /dev/null); do
|
Fix race condition between networking service and interface-config service (#10573)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
2022-05-05 17:21:44 -05:00
|
|
|
[[ -f ${intf_pid} ]] && kill `cat ${intf_pid}` && rm -f ${intf_pid}
|
2019-12-10 10:16:56 -06:00
|
|
|
done
|
|
|
|
|
|
|
|
# Read sysctl conf files again
|
|
|
|
sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf
|
|
|
|
|
2017-10-16 19:36:21 -05:00
|
|
|
systemctl restart networking
|
|
|
|
|
2019-12-10 10:16:56 -06:00
|
|
|
# Clean-up created files
|
|
|
|
rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json
|