* Corefile uploader service
1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.
* Remove rw permission for .rc file for group & others.
* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.
* Azure storage upload requires python module futures, hence added it to install list.
* Removed trailing spaces.
* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
if yes the software reboot cause file will be treated as the reboot cause.
finish
2. check whether platform api returns a reboot cause.
if yes it is treated as the reboot cause.
finish.
3. check whether /hosts/reboot-cause contains a cause.
if yes it is treated as the cause otherwise return unknown.
* [process-reboot-cause]Fix review comments
* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG
* [process-reboot-cause]address comments
* refactor the code flow
* Remove escape
* Remove extra ':'
In place editing (sed -i) seems having some issues with filesystem
interaction. It could leave 0 size file or corrupted file behind.
It would be safer to sed the file contents into a new file and switch
new file with the old file.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* ZTP infrastructure changes to support DHCP discovery provisioning data
- Dynamically generate DHCP client configuration based on current ZTP state
- Added support to request and process hostname when using DHCPv6
- Do not process graphservice url dhcp option if ZTP is enabled, ZTP service
will process it
- Generate /e/n/i file with all active interfaces seeking address assignment
via DHCP. Only interfaces that are created in Linux will be added to /e/n/i.
Also DHCP is started only on linked up in-band interfaces.
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.
* Create a SONiC configuration management service
* Perform config db migration after loading config_db.json to redis DB
* Migrate config-setup post migration hooks on image upgrade
config-setup post migration hooks help user to migrate configurations from
old image to new image. If the installed hooks are user defined they will not
be part of the newly installed image. So these hooks have to be migrated to
new image and only then they can be executing when the new image is booting.
The changes in this fix migrate config-setup post-migration hooks and ensure
that any hooks with the same filename in newly installed image are not
overwritten.
It is expected that users install new hooks as per their requirement and
not edit existing hooks. Any changes to existing hooks need to be done as
part of new image and not post bootup.
* Build sonic-ztp package
- Add changes in make rules to conditionally include sonic-ztp package
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
The sflow service should not start unless the swss service is started. However, if this service is not started, the sflow service should not attempt to start them, instead it should simply fail to start. Using Requisite=, we will achieve this behavior, whereas using Requires= will cause the required service to be started.
This PR is to handle the issue 3527.
When device boots up, NTP throws a traceback as explained in the issue 3527.
- Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”.
- Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback.
- This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.
Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
* In the event of a kernel crash, we need to gather as much information
as possible to understand and identify the root cause of the crash.
Currently, the kernel does not provide much information, which make
kernel crash investigation difficult and time consuming.
Fortunately, there is a way in the kernel to provide more information
in the case of a kernel crash. kdump is a feature of the Linux kernel
that creates crash dumps in the event of a kernel crash. This PR
will add kermel kdump support.
An extension to the CLI utilities config and show is provided to
configure and manage kdump:
- enable / disable kdump functionality
- configure kdump (how many kernel crash logs can be saved, memory
allocated for capture kernel)
- view kernel crash logs
* Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml
* Fix bgp templates
* Add community for loopback when bgpd is isolated
* Use correct community value
We noticed in tests/production that there is a low probability failure
where /etc/hosts could have some garbage characters before the entry for
local host name. The consequence is that all sudo command would be very
slow. In extreme cases it would prevent some services from starting
properly.
I suspect that the /etc/hosts file might be opened by some process causing
the issue. Editing contents with new file level and replace the whole file
should be safer.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
While doing CLI changes for SNMP configuration, few changes are made in backend to handle the modified CLI.
** Changes**
- "community" for "snmp trap" is also made as "configurable". snmpd_conf.j2 is modified to handle the same.
- Changed the snmp.yml file generation from postStartAction to preStartAction in docker_image_ctl.j2 specific to SNMP docker, to ensure that the snmp.yml is generated before sonic-cfggen generates the snmpd.conf.
- Changed to make the code common for management vrf and default vrf. Users can configure snmp trap and snmp listening IP for both management vrf and default vrf.
- after reloading minigraph, write latest version string in the DB.
- if old config_db.json file exists, use it and migrate to latest version.
- only reload minigraph when config_db.json doesn't exist and minigraph
exists.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Issue Overview
shutdown flow
For any shutdown flow, which means all dockers are stopped in order, pmon docker stops after syncd docker has stopped, causing pmon docker fail to release sx_core resources and leaving sx_core in a bad state. The related logs are like the following:
INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use.
INFO syncd.sh[23597]: Unloading sx_core[FAILED]
INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use
config reload & service swss.restart
In the flows like "config reload" and "service swss restart", the failure cause further consequences:
sx_core initialization error with error message like "sx_core: create EMAD sdq 0 failed. err: -16"
syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE"
swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This will introduce an extra 1 or 2 minutes for the system to be available, failing related test cases.
reboot, warm-reboot & fast-reboot
In the reboot flows including "reboot", "fast-reboot" and "warm-reboot" this failure doesn't have further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shutdown as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also requires to meet the GR time restriction of BGP which is longer than LACP. In this sense, any unnecessary steps should be avoided. It's better to keep those flows untouched.
summary
To summarize, we have to come up with a way to ensure:
shutdown pmon docker ahead of syncd for "config reload" or "service swss restart" flow;
don't shutdown pmon docker ahead of syncd for "fast-reboot" or "warm-reboot" flow in order to save time.
for "reboot" flow, either order is acceptable.
Solution
To solve the issue, pmon shoud be stopped ahead of syncd stopped for all flows except for the warm-reboot.
- How I did it
To stop pmon ahead of syncd stopped. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequence.
Now pmon stops ahead of syncd so there must be a way in which pmon can start after syncd started. Another point that should be taken consideration is that pmon starting should be deferred so that services which have the logic of graceful restart in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadline.
This is done by add "syncd.service" as "After" to pmon.service and startin /usr/local/bin/syncd.sh::wait()
To start pmon automatically after syncd started.
slave.mk: add SONIC_PLATFORM_API_PY2 as dependency of host
sonic_debian_extension.j2: install sonic_daemon_base and Mellanox-specific sonic_platform on host
mlnx-platform-api.mk: export mlnx_platform_api_py2_wheel_path for sonic_debian_extension.j2
sonic-daemon-base.mk: export daemon_base_py2_wheel_path for sonic_debian_extension.j2
daemon_base.py: hind unnecessary dependency of swss_common on host
* [SNMP] management VRF SNMP support
This commit adds SNMP support for Management VRF using l3mdev.
The patch included provides VRF support, there is no single
"listendevice" configuration, rather multiple agentaddress
config options can each have their own "interface" to bind to
using "ip%interface". The snmpd.conf file is accordingly
generated using the snmp.yml file and redis database info.
Adding below the comments of SNMP patch 1376
--------------------------------------------
Since the Linux kernel added support for Virtual Routing
and Forwarding (VRF) in version 4.3
(Note: these won't compile on non-linux platforms)
https://www.kernel.org/doc/Documentation/networking/vrf.txt
Linux users could not use snmpd in its current form to
bind specific listening IP addresses to specific VRF
devices. A simplified description of a VRF inteface
is an interface that is a master (a container of sorts)
that collects a set of physicalinterfaces to form a
routing table.
This set of two patches (one for V5-7-patches and one
for V5-8-patches branches) is almost identical to patch
single "listendevice" configuration. Rather, multiple
agentAddress config options can each have their own
"interface" to bind to using the <ip>%<interface>
syntax.</interface></ip>
-------------------------------------------
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
This commit adds NTP support for management VRF using L3mdev. Config vrf add
mgmt will enable management VRF, enslave the eth0 device to the master device
mgmt, stop ntp service in default, restart interfaces-configs and restart ntp
service in mgmt-vrf context. Requirement and design are covered in mgmt vrf
design document.
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
Introduce a new "sflow" container (if ENABLE_SFLOW is set). The new docker will include:
hsflowd : host-sflow based daemon is the sFlow agent
psample : Built from libpsample repository. Useful in debugging sampled packets/groups.
sflowtool : Locally dump sflow samples (e.g. with a in-unit collector)
In case of SONiC-VS, enable psample & act_sample kernel modules.
VS' syncd needs iproute2=4.20.0-2~bpo9+1 & libcap2-bin=1:2.25-1 to support tc-sample
tc-syncd is provided as a convenience tool for debugging (e.g. tc-syncd filter show ...)
* Use dot1p to tc mapping for backend switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Do not write DSCP to TC mapping into CONFIG_DB or config_db.json for
storage switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script
Signed-off-by: Danny Allen <daall@microsoft.com>
* Update interval for running cron job
* Respond to feedback
* Change syslog id
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
* Use dot1p to tc mapping for backend switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Do not write DSCP to TC mapping into CONFIG_DB or config_db.json for
storage switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
[build_debian] Generate checksum of ASIC config files
* Adds script to generate checksums for ASIC config files
* Adds step to build_debian that copies ASIC config checksum into SONiC filesystem
Signed-off-by: Danny Allen daall@microsoft.com
this is the first step to moving different databases tables into different database instances
in this PR, only handle multiple database instances creation based on user configuration at /etc/sonic/database_config.json
we keep current method to create single database instance if no extra/new DATABASE configuration exist in database_config.json file.
if user try to configure more db instances at database_config.json , we create those new db instances along with the original db instance existing today.
The configuration is as below, later we can add more db related information if needed:
{
...
"DATABASE": {
"redis-db-01" : {
"port" : "6380",
"database": ["APPL_DB", "STATE_DB"]
},
"redis-db-02" : {
"port" : "6381",
"database":["ASIC_DB"]
},
}
...
}
The detail description is at design doc at Azure/SONiC#271
The main idea is : when database.sh started, we check the configuration and generate corresponding scripts.
rc.local service handle old_config copy when loading new images, there is no dependency between rc.local and database service today, for safety and make sure the copy operation are done before database try to read it, we make database service run after rc.local
Then database docker started, we check the configuration and generate corresponding scripts/.conf in database docker as well.
based on those conf, we create databases instances as required.
at last, we ping_pong check database are up and continue
Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
As well convert priority to integer for use by sort.
* [service dependent] describe non-warm-reboot dependency outside systemctl
When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
when device disk is small, do not unzip dockerfs.tar.gz on disk.
keep the tar file on the disk, unzip to tmpfs in the initrd phase.
enabled this for 7050-qx32
Signed-off-by: Guohan Lu <gulv@microsoft.com>
- What I did
Move the enabling of Systemd services from sonic_debian_extension to a new systemd generator
- How I did it
Create a new systemd generator to manually create symlinks to enable systemd services
Add rules/Makefile to build generator
Add services to be enabled to /etc/sonic/generated_services.conf to be read by the generator at boot time
Signed-off-by: Lawrence Lee <t-lale@microsoft.com>
ARM Architecture support in SONIC
make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH]
SONIC_ARCH: default amd64
armhf - arm32bit
arm64 - arm64bit
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
This commit adds support for New feature management VRF using L3mdev. Added
commands to enable/disable management VRF. Config vrf add mgmt will enable
management VRF, enslave the eth0 device to the master device mgmt and restart
interfaces-configs in mgmt-vrf context.
management interface (eth0) can be configured using config interface eth0 ip
add command and removed using config interface eth0 ip remove command.
Requirement and design are covered in mgmt vrf design document. Currently show
command displays linux command output; will update show command display in next
PR after concluding what would be the output for the show commands. Added
metric for default routes in dhcp and static, any changes for metric will be
addressed subsequently after discussing.
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
* [warm reboot] save configuration after warm reboot
After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Update finalize-warmboot.sh
* Upgrade ifupdown2 to version 1.2.8
Required by ZTP to support ZTP over IPv6 transport
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
- Make sure that migrated DB contents persisted for next boot
- Make sure that db saved after warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Added debug symbols to many debug dockers.
* For debug images *only*:
1) Archive source files into debug image
2) Archived source is copied into /src
3) Created an empty dir /debug
4) Mount both /src as ro & /debug as rw into every docker
5) Login banner will give some details on /src & /debug
6) Devs can copy core file into /debug and view it from inside a container.
7) Dev may create all gdb logs and other data directly into /debug.
* Dropped redundant REDIS_TOOLS per review comments.
* Added debug symbols to frr package and hence FRR based BGP docker.
* 1) Moved dbg_files.sh to scripts/
2) Src directories to archive are now collected from individual Makefiles.
3) Added few more debug symbols
4) Added few more debug dockers.
Here after no more changes except per review comments.
To debug:
Install required version of debug image in Switch or VM.
Copy core file into /debug of host
Get into Docker
gdb /usr/bin/<daemon> -c /debug/<your core file>
set directory /src/... <-- inside gdb to get the source
For non-in-depth debugging:
Download corresponding debug Docker image (docker-...-dbg.gz) to your VM
Load the image
Run image with entrypoint as 'bash' with dir containing core mapped in.
Run gdb on the core.
* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* fix fast reboot compatibility
We should handle both cases for backward-compatible with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
* [submodule] update sonic-linux-kernel
* update linux kernel version
* Fix many version strings
* update mellanox components (built with new kernel)
* [mlnx] add make files for SDK WJH libs
* Update arista driver submodule (#8)
Make the debian packaging point to a newer kernel version.
* Updated Makefile infrastructure to build debug images.
As a sample, platform/broadcom/docker-orchagent-brcm.mk is updated to add a docker-orchagent-brcm-dbg.gz target.
Now "BLDENV=stretch make target/docker-orchagent-brcm-dbg.gz" will build the debug image.
NOTE: If you don't specify NOSTRETcH=1, it implicitly calls "make stretch", which builds all stretch targets and that would include debug dockers too.
This debug image can be used in any linux box to inspect core file. If your module's external dependency can be suitably mocked, you my even manually run it inside.
"docker run -it --entrypoint=/bin/bash e47a8fb8ed38"
You may map the core file path to this docker run.
* Dropped the regular binary using DBG_PACKAGES and a small name change to help readability.
* Tweaked the changes to retain the existing behavior w.r.t INSTALL_DEBUG_TOOLS=y.
When this change ('building debug docker image transparently') is extended to all dockers, this flag would become redundant. Yet, there can be some test based use cases that rely on this flag.
Until after all the dockers gets their debug images by default and we switch all use cases of this flag to use the newly built debug images, we need to maintain the existing behavior.
* 1) slave.mk - Dropped unused Docker build args
2) Debug template builder: renamed build_dbg_j2.sh to build_debug_docker_j2.sh
3) Dropped insignifcant statement CMD from debug Docker file, as base docker has Entrypoint.
* Reverted some changes, per review comments.
"User, uid, guid, frr-uid & frr-guid" are required for all docker images, with exception of debug images.
* Get in sync with the new update that filters out dockers to be built (SONIC_STRETCH_DOCKERS_FOR_INSTALLERS) and build debug-dockers only for those to be built and debug target is available.
* Mkae a template for each target that can be shared by all platforms.
Where needed a platform entry can override the template.
This avoids duplication, hence easier to maintain.
* A small change, that can fit better with other targets too.
Just take the platform code and do the rest in template.
* Extended debug to all stretch based docker images
* 1) Combined all orchagent makefiles into one platform independent make under rules/docker-orchagent.mk
2) Extened debug image to all stretch dockers
* Changes per review comments:
1) Dropped LIBSAIREDIS_DBG from database, teamd, router-advertiser, telemetry, and platform-monitor docker*.mk files from _DBG_DEPENDS list
2) W.r.t docker make for syncd, moved DEPENDS from template to specific makefile and let the template has stuff that is applicable to all.
* 1) Corrected a copy/paste mistake
* Fixed a copy/paste bug
* The base syncd dockers follow a template, which defines the base docker as DOCKER_SYNCD_BASE instead of DOCKER_SYNCD_<platform code>. Fix the docker-syncd-<mlnx, bfn>.mk to use the new one.
[Yet to be tested locally]
* Fixed spelling mistake
* Enable build of dbg-sonic-broadcom.bin, which uses dbg-dockers in place of regular dockers, for dockers that build debug version. For dockers that do not build debug version, it uses the regular docker.
This debug bin is installable and usable in a DUT, just like a regular bin.
* Per review comments:
1) Share a single rule for final image for normal & debug flavors (e.g. sonic-broadcom.bin & sonic-broadcom-dbg.bin)
2) Put dbg as suffix in final image name.
3) Compared target/sonic-broadcom.bin.logs with & w/o fix to verify integrity of sonic-broadcom.bin
4) Compared target/sonic-broadcom.bin.logs with sonic-broadcom-dbg.bin.log for verification
This fix takes care of ONIE image only. The next PR will cover the rest.
The next PR, will also make debug image conditional with flag.
* Updated per comments.
Now that debug dockers are available, do not need a way to install debug symbols in regular dockers.
With this commit, when INSTALL_DEBUG_TOOLS=y is set, it builds debug dockers (for dockers that enable debug build) and the final image uses debug dockers. For dockers that do not enable debug build, regular dockers get used in the final image.
Note:
The debug dockers are explicitly named as <docker name>-dbg.gz. But there is no "-dbg" suffix for image.
Hence if you make two runs with and w/o INSTALL_DEBUG_TOOLS=y, you have complete set of regular dockers + debug dockers. But the image gets overwritten.
Hence if both regular & debug images are needed, make two runs, as one with INSTALL_DEBUG_TOOLS=y and one w/o. Make sure to copy/rename the final image, before making the second run.
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes
* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround
However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
* Add boot0 support for the 7280CR3
* Add platform and plugins for 7280CR3
* Add port config for 7280CR3
* Add platform_reboot for 7280CR3
* Add support for 7280CR3-32D4 based on the 7280CR3-32P4
* Update arista driver submodules
- Introduce new 7280CR3-32P4
- Improve to the led plugin for OSFP
* Fix showing systemd shutdown sequence when verbose is set
* Fix creation of kernel-cmdline file
Sometimes boot0 prints error
"mv: can't preserve ownership of '/mnt/flash/image-arsonic.xxxx/kernel-cmdline': Operation not permitted"
* Improve flash space usage during installation
Some older systems only have 2GB of flash available. Installing a second
image on these can prove to be challenging.
The new installation process moves the installer swi to memory in order
to avoid free up space from the flash before uncompressing it there.
It removes all the flash space usage spike and also improves the IO
since the installation is no more reading and writting to the flash at
the same time.
* Add support of 7060CX-32S-SSD
* 7260CX3: use inventory powerCycle procedures
* 7050QX-32S: use inventory powerCycle procedures
* 7050QX-32: use inventory powerCycle procedures
* platform: arista: add common platform_reboot
Replace platform_reboot by a link to new common for devices already
using a similar script.
* 7060CX-32S: use inventory powerCycle procedures
* Install python smbus in pmon
Some platform plugin need the python smbus library to perform some actions.
This installs the dependency.
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.
* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
- Add ebtables package, and install some filter rules:
1. ebtables -A FORWARD -d BGA -j DROP
2. ebtables -A FORWARD -p ARP -j DROP
Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- use superviord to manage process in frr docker
- intro separated configuration mode for frr
- bring quagga configuration template to frr.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
SONiC is a heavy writer to /var/log partition, we noticed that this
behavior causes certain flash drive to become read-only over time.
To avoid this issue, we mount /var/log parition on these devices as
tmpfs.
- Mount /var/log as tmpfs
- /var/log default size is 128M
- Adjust size according to existing var-log.ext4 file size.
- Adjust size to between 5% to 10% of total memory size.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Install ipaddress python package that has deprecated current ipaddr. ipaddress has backport to python2.7
* Install python ipaddress module as required by route_check.py sonic utility. BTW, ipaddress deprecates ipaddr and ipaddress has python2 backport
* Revert the old chaneg per review comments.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).
Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.
- What I did
removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it
build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog
Allow configuration of platform-specific interfaces
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
We are going to use initramfs hook for firmware upgrades
To install Arista hook:
- create folder /mnt/flash/<image dir>/platform/hooks/boot1/ from Aboot or
/host/<image dir>/platform/hooks/boot1/ from Sonic
- add executable script to created folder
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Add a log message for each notification of add/del TACACS server.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* Moved another syslog message from DEBUG to INFO to be able to see those notifications.
All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* [build]: put stretch debian packages under target/debs/stretch/
* in stretch build phase, all debian packages built in that stage are placed under target/debs/stretch directory.
* for python-based debian packages, since they are really the same for jessie and stretch, they are placed under target/python-debs directory.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Removed deployment_id_asn_map.yml from copy list
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* QoS config change: 1) DSCP mapping; 2) link pg/queue 6 to lossy buffer;
3) redistribute scheduler
Signed-off-by: Wenda <wenni@microsoft.com>
* Add scheduling weight to queue 2
Signed-off-by: Wenda <wenni@microsoft.com>
* Link pg/queue 2 to lossy buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the pg headroom for a7060-D48C8 50G
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for qos
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom size, and update egress lossy pool size accordingly
Signed-off-by: Wenda <wenni@microsoft.com>
* Update headroom pool size; Update ingress service pool and egress lossy
pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* a7260: update headroom pool size; Update ingress service pool and egress lossy pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades
* [sonic-utilities] Update submodule to include related changes
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.
Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.
- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.
- How to verify it
Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.