sonic-buildimage/check_install.py
#!/usr/bin/env python3

import argparse
import pexpect
import sys
import time


def main():
    parser = argparse.ArgumentParser(description='test_login cmdline parser')
    parser.add_argument('-u', default="admin", help='login user name')
    parser.add_argument('-P', default="YourPaSsWoRd", help='login password')
    parser.add_argument('-p', type=int, default=9000, help='local port')
    args = parser.parse_args()

    login_prompt = 'sonic login:'
    passwd_prompt = 'Password:'
    # raw string: "\$" is a literal dollar sign, the final "$" anchors the prompt at the end
    cmd_prompt = r"{}@sonic:~\$ $".format(args.u)
    grub_selection = "The highlighted entry will be executed"

    # Fix vs check install login timeout issue (#11727), 2022-08-29.
    # Why: unstable builds (#11620). The VS VM started successfully, but the
    # script timed out waiting for "sonic login:". In the 30 days before the
    # fix, 55 builds failed this way, by branch: master 37, 202012 9,
    # 202106 4, 202111 2, 202205 1, 201911 1 (Azure Pipelines logs: failed
    # "Build sonic image" tasks containing "Timeout exceeded" and
    # "re.compile('sonic login:')").
    # Cause: the login prompt was mixed with the output of /etc/rc.local, e.g.
    #   Debian GNU/Linux 10 sonic ttyS0
    #   rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login:
    # so check_install.py kept waiting for "sonic login:" while the console
    # was already waiting for a user name. Example build log:
    # https://dev.azure.com/mssonic/build/_build/results?buildId=123294&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=359769c4-8b5e-5976-a793-85da132e0a6f
    # How: send a newline once /etc/rc.local has finished; the console then
    # prints "sonic login:" again.
    firsttime_prompt = 'firsttime_exit'

    i = 0
    while True:
        try:
            p = pexpect.spawn("telnet 127.0.0.1 {}".format(args.p), timeout=600, logfile=sys.stdout, encoding='utf-8')
            break
        except Exception as e:
            print(str(e))
            i += 1
            if i == 10:
                raise
            time.sleep(1)

    # Remove the rw folder from the image after installing in KVM (#8746),
    # Saikrishna Arcot <sarcot@microsoft.com>, 2021-12-10.
    # Booting the freshly installed image in KVM created files (such as
    # systemd timer stamp files) that ended up in the final image. For example,
    # a stamp for the persistent fstrim.timer in /var/lib/systemd/timers made
    # systemd start fstrim.service immediately on a later QEMU boot, once the
    # weekly trigger had been missed, instead of 10 minutes after bootup.
    # Simply removing the rw directory stopped the first-boot tasks from
    # running, and recreating it was not enough because some files are moved
    # from /host into the base filesystem layer. Instead, installation and the
    # test bootup were split into two scripts and two KVM instances; the
    # second instance runs with the `-snapshot` flag, so changes to the disk
    # image are discarded.

    # select default SONiC Image
    p.expect(grub_selection)
    p.sendline()

    # bootup sonic image
    while True:
        i = p.expect([login_prompt, passwd_prompt, firsttime_prompt, cmd_prompt])
        if i == 0:
            # send user name
            p.sendline(args.u)
        elif i == 1:
            # send password
            p.sendline(args.P)
        elif i == 2:
            # work around a login timeout: the login prompt can be mixed with the
            # output of /etc/rc.local, so send a newline to make it print again
            time.sleep(1)
            p.sendline()
        else:
            break

    # check version and basic state, then flush writes to disk with sync
    time.sleep(5)
    p.sendline('uptime')
    p.expect([cmd_prompt])
    p.sendline('show version')
    p.expect([cmd_prompt])
    p.sendline('show ip bgp sum')
    p.expect([cmd_prompt])
    p.sendline('sync')
    p.expect([cmd_prompt])


if __name__ == '__main__':
    main()
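The need for the #11727 workaround can be checked offline. pexpect treats the `login_prompt` string as a regular expression (the timeout message quoted in the commit shows `re.compile('sonic login:')`). A regex search for it fails when the rc.local shell trace lands in the middle of the banner. The minimal sketch below runs Python's `re` directly on the interleaved line quoted in the commit message. The `reprinted` text is an assumption about what the console shows after the extra newline; it is not captured output.

```python
import re

# Same pattern the script waits for (login_prompt).
LOGIN_PROMPT = re.compile("sonic login:")

# Console line quoted in the #11727 commit message: rc.local trace output
# ("+ onie_disco_subnet=...") split the "sonic login:" banner in two.
interleaved = "rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login: "

# Assumed: after an extra newline the console prints a clean prompt again.
reprinted = "\nsonic login: "


def sees_login_prompt(console_text: str) -> bool:
    """Return True if the login prompt appears anywhere in console_text."""
    return LOGIN_PROMPT.search(console_text) is not None


print(sees_login_prompt(interleaved))              # False: prompt was split
print(sees_login_prompt(interleaved + reprinted))  # True: newline re-prints it
```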
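The login loop in `main()` is a small state machine. It is keyed on the index that `p.expect()` returns for the list `[login_prompt, passwd_prompt, firsttime_prompt, cmd_prompt]`. The sketch below restates that dispatch as a pure function, so it can be exercised without a VM. `next_input`, `replay` and the constant names are hypothetical. The sketch also omits the one-second sleep the script takes before sending the extra newline.

```python
from typing import Iterable, List, Optional

# Positions in the list passed to p.expect() in main().
LOGIN, PASSWORD, FIRSTTIME_EXIT, SHELL = range(4)


def next_input(index: int, user: str, password: str) -> Optional[str]:
    """Line to send for a matched prompt; None means we reached the shell."""
    if index == LOGIN:
        return user
    if index == PASSWORD:
        return password
    if index == FIRSTTIME_EXIT:
        return ""  # bare newline so the login prompt is printed again
    return None


def replay(events: Iterable[int], user: str = "admin",
           password: str = "YourPaSsWoRd") -> List[str]:
    """Feed matched-prompt indices through the loop; return the lines sent."""
    sent = []
    for index in events:
        line = next_input(index, user, password)
        if line is None:  # cmd_prompt matched: leave the loop
            break
        sent.append(line)
    return sent


print(replay([FIRSTTIME_EXIT, LOGIN, PASSWORD, SHELL]))
# ['', 'admin', 'YourPaSsWoRd']
```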
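The `try`/`except` loop around `pexpect.spawn` implements a bounded retry. On each failure it prints the error. After the tenth failure it re-raises; otherwise it sleeps a second and tries again. The sketch below is a generic version of the same pattern. `retry` and `flaky_connect` are hypothetical names, and the failing function only stands in for whatever the spawn call might raise.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 10, delay: float = 1.0) -> T:
    """Call action() until it succeeds; after `attempts` failures re-raise
    the last error, sleeping `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            print(exc)
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Hypothetical stand-in that fails twice before succeeding.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("not ready yet")
    return "session"


print(retry(flaky_connect, attempts=10, delay=0.0))
# prints "not ready yet" twice, then "session"
```

Keeping the bare `raise` inside the `except` block preserves the original exception and traceback, which is what the script does after its tenth attempt.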
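`cmd_prompt` is also a regular expression. `\$` matches the literal dollar sign of the shell prompt, and the trailing `$` anchors the pattern to the end of the text. The prompt therefore only matches when nothing has been printed after it. Writing the pattern as a raw string avoids Python's invalid-escape warning for `"\$"`. The check below uses `re` directly. pexpect applies patterns to a buffer that fills incrementally, so end-of-text anchoring there may behave differently. `shell_prompt_pattern` is a hypothetical helper.

```python
import re


def shell_prompt_pattern(user: str) -> str:
    # "\$" is a literal dollar sign; the final "$" anchors at the end of text.
    return r"{}@sonic:~\$ $".format(user)


pattern = re.compile(shell_prompt_pattern("admin"))

# Prompt is the last thing on the console: matches.
print(pattern.search("Last login: ...\nadmin@sonic:~$ ") is not None)  # True
# A command was typed after the prompt: no match.
print(pattern.search("admin@sonic:~$ show version\n") is not None)    # False
```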