* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* fix fast reboot compatibility
We should handle both cases for backward-compatible with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
* [submodule] update sonic-linux-kernel
* update linux kernel version
* Fix many version strings
* update mellanox components (built with new kernel)
* [mlnx] add make files for SDK WJH libs
* Update arista driver submodule (#8)
Make the debian packaging point to a newer kernel version.
* Updated Makefile infrastructure to build debug images.
As a sample, platform/broadcom/docker-orchagent-brcm.mk is updated to add a docker-orchagent-brcm-dbg.gz target.
Now "BLDENV=stretch make target/docker-orchagent-brcm-dbg.gz" will build the debug image.
NOTE: If you don't specify NOSTRETcH=1, it implicitly calls "make stretch", which builds all stretch targets and that would include debug dockers too.
This debug image can be used in any linux box to inspect core file. If your module's external dependency can be suitably mocked, you my even manually run it inside.
"docker run -it --entrypoint=/bin/bash e47a8fb8ed38"
You may map the core file path to this docker run.
* Dropped the regular binary using DBG_PACKAGES and a small name change to help readability.
* Tweaked the changes to retain the existing behavior w.r.t INSTALL_DEBUG_TOOLS=y.
When this change ('building debug docker image transparently') is extended to all dockers, this flag would become redundant. Yet, there can be some test based use cases that rely on this flag.
Until after all the dockers gets their debug images by default and we switch all use cases of this flag to use the newly built debug images, we need to maintain the existing behavior.
* 1) slave.mk - Dropped unused Docker build args
2) Debug template builder: renamed build_dbg_j2.sh to build_debug_docker_j2.sh
3) Dropped insignifcant statement CMD from debug Docker file, as base docker has Entrypoint.
* Reverted some changes, per review comments.
"User, uid, guid, frr-uid & frr-guid" are required for all docker images, with exception of debug images.
* Get in sync with the new update that filters out dockers to be built (SONIC_STRETCH_DOCKERS_FOR_INSTALLERS) and build debug-dockers only for those to be built and debug target is available.
* Mkae a template for each target that can be shared by all platforms.
Where needed a platform entry can override the template.
This avoids duplication, hence easier to maintain.
* A small change, that can fit better with other targets too.
Just take the platform code and do the rest in template.
* Extended debug to all stretch based docker images
* 1) Combined all orchagent makefiles into one platform independent make under rules/docker-orchagent.mk
2) Extened debug image to all stretch dockers
* Changes per review comments:
1) Dropped LIBSAIREDIS_DBG from database, teamd, router-advertiser, telemetry, and platform-monitor docker*.mk files from _DBG_DEPENDS list
2) W.r.t docker make for syncd, moved DEPENDS from template to specific makefile and let the template has stuff that is applicable to all.
* 1) Corrected a copy/paste mistake
* Fixed a copy/paste bug
* The base syncd dockers follow a template, which defines the base docker as DOCKER_SYNCD_BASE instead of DOCKER_SYNCD_<platform code>. Fix the docker-syncd-<mlnx, bfn>.mk to use the new one.
[Yet to be tested locally]
* Fixed spelling mistake
* Enable build of dbg-sonic-broadcom.bin, which uses dbg-dockers in place of regular dockers, for dockers that build debug version. For dockers that do not build debug version, it uses the regular docker.
This debug bin is installable and usable in a DUT, just like a regular bin.
* Per review comments:
1) Share a single rule for final image for normal & debug flavors (e.g. sonic-broadcom.bin & sonic-broadcom-dbg.bin)
2) Put dbg as suffix in final image name.
3) Compared target/sonic-broadcom.bin.logs with & w/o fix to verify integrity of sonic-broadcom.bin
4) Compared target/sonic-broadcom.bin.logs with sonic-broadcom-dbg.bin.log for verification
This fix takes care of ONIE image only. The next PR will cover the rest.
The next PR, will also make debug image conditional with flag.
* Updated per comments.
Now that debug dockers are available, do not need a way to install debug symbols in regular dockers.
With this commit, when INSTALL_DEBUG_TOOLS=y is set, it builds debug dockers (for dockers that enable debug build) and the final image uses debug dockers. For dockers that do not enable debug build, regular dockers get used in the final image.
Note:
The debug dockers are explicitly named as <docker name>-dbg.gz. But there is no "-dbg" suffix for image.
Hence if you make two runs with and w/o INSTALL_DEBUG_TOOLS=y, you have complete set of regular dockers + debug dockers. But the image gets overwritten.
Hence if both regular & debug images are needed, make two runs, as one with INSTALL_DEBUG_TOOLS=y and one w/o. Make sure to copy/rename the final image, before making the second run.
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes
* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround
However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
* Add boot0 support for the 7280CR3
* Add platform and plugins for 7280CR3
* Add port config for 7280CR3
* Add platform_reboot for 7280CR3
* Add support for 7280CR3-32D4 based on the 7280CR3-32P4
* Update arista driver submodules
- Introduce new 7280CR3-32P4
- Improve to the led plugin for OSFP
* Fix showing systemd shutdown sequence when verbose is set
* Fix creation of kernel-cmdline file
Sometimes boot0 prints error
"mv: can't preserve ownership of '/mnt/flash/image-arsonic.xxxx/kernel-cmdline': Operation not permitted"
* Improve flash space usage during installation
Some older systems only have 2GB of flash available. Installing a second
image on these can prove to be challenging.
The new installation process moves the installer swi to memory in order
to avoid free up space from the flash before uncompressing it there.
It removes all the flash space usage spike and also improves the IO
since the installation is no more reading and writting to the flash at
the same time.
* Add support of 7060CX-32S-SSD
* 7260CX3: use inventory powerCycle procedures
* 7050QX-32S: use inventory powerCycle procedures
* 7050QX-32: use inventory powerCycle procedures
* platform: arista: add common platform_reboot
Replace platform_reboot by a link to new common for devices already
using a similar script.
* 7060CX-32S: use inventory powerCycle procedures
* Install python smbus in pmon
Some platform plugin need the python smbus library to perform some actions.
This installs the dependency.
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.
* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
- Add ebtables package, and install some filter rules:
1. ebtables -A FORWARD -d BGA -j DROP
2. ebtables -A FORWARD -p ARP -j DROP
Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- use superviord to manage process in frr docker
- intro separated configuration mode for frr
- bring quagga configuration template to frr.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
SONiC is a heavy writer to /var/log partition, we noticed that this
behavior causes certain flash drive to become read-only over time.
To avoid this issue, we mount /var/log parition on these devices as
tmpfs.
- Mount /var/log as tmpfs
- /var/log default size is 128M
- Adjust size according to existing var-log.ext4 file size.
- Adjust size to between 5% to 10% of total memory size.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Install ipaddress python package that has deprecated current ipaddr. ipaddress has backport to python2.7
* Install python ipaddress module as required by route_check.py sonic utility. BTW, ipaddress deprecates ipaddr and ipaddress has python2 backport
* Revert the old chaneg per review comments.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).
Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.
- What I did
removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it
build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog
Allow configuration of platform-specific interfaces
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
We are going to use initramfs hook for firmware upgrades
To install Arista hook:
- create folder /mnt/flash/<image dir>/platform/hooks/boot1/ from Aboot or
/host/<image dir>/platform/hooks/boot1/ from Sonic
- add executable script to created folder
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Add a log message for each notification of add/del TACACS server.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* Moved another syslog message from DEBUG to INFO to be able to see those notifications.
All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* [build]: put stretch debian packages under target/debs/stretch/
* in stretch build phase, all debian packages built in that stage are placed under target/debs/stretch directory.
* for python-based debian packages, since they are really the same for jessie and stretch, they are placed under target/python-debs directory.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Removed deployment_id_asn_map.yml from copy list
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* QoS config change: 1) DSCP mapping; 2) link pg/queue 6 to lossy buffer;
3) redistribute scheduler
Signed-off-by: Wenda <wenni@microsoft.com>
* Add scheduling weight to queue 2
Signed-off-by: Wenda <wenni@microsoft.com>
* Link pg/queue 2 to lossy buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the pg headroom for a7060-D48C8 50G
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for qos
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom size, and update egress lossy pool size accordingly
Signed-off-by: Wenda <wenni@microsoft.com>
* Update headroom pool size; Update ingress service pool and egress lossy
pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* a7260: update headroom pool size; Update ingress service pool and egress lossy pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades
* [sonic-utilities] Update submodule to include related changes
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.
Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.
- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.
- How to verify it
Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.
* Perform stop/start of Mellanox driver tools for all types of reboot
* Don't set Mellanox FAST_BOOT option for "cold" reboot
* Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
* [security kernel] Upgrade kernel from 4.9.110-3+deb9u2 to 4.9.110-3+deb9u6
short version: 4.9.0-7 to 4.9.0-8
See changelogs for security fixes:
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.110-3deb9u6
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Update sonic-linux-kernel submodule after it was merged
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* [update graph] adapt to warm reboot scenario
When migrating configuration, always copy config files from old_config
to /etc/sonic. But if warm reboot is detected, then skip configuration
operations.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* log file copies and misses
* 1) DSCP 46 to 5; 2) ecn config for lossless traffic; 3) ecn on by default; 4) DWRR equal weight;
Signed-off-by: Wenda <wenni@microsoft.com>
* 1) link pg & queue 5 to lossy buffer profile; 2) ingress lossless alpha 1/8
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the test case for qos & buffer json template
Signed-off-by: Wenda <wenni@microsoft.com>
* Migrate a7050-qx32 and s6000 to use pg_profile lookup architecture
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom egress service pool for a7050-qx-32s, a7050-qx32, and s6000
Signed-off-by: Wenda <wenni@microsoft.com>
* Link queue 5 to lossy profile
Signed-off-by: Wenda <wenni@microsoft.com>
* Update arista drivers submodule
* Ignore the possible timestamp warning in tar extraction
* Add verbosity toggle to boot0
Console logging is slow because of the 9600 baud rate.
Some time can be saved by decreasing the console verbosity.
* Add hook mechanism in boot0.
Support additional features in boot0 via hooks.
Hooks are unpacked and executed at post-install or pre-exec time.
* Fix 7170 sensors.conf file
Fix critical temperature settings for MAX6658 sensors
* Fix the random swap of storage devices
For arista 7050 switches running with linux 4.9, it is likely the device
name of flash drive (/dev/sda) and usb (/dev/sdb) randomly swap in kernel
booting, depending on which one is ready first. It breaks the expectation
that flash will be mounted as root by setting root=/dev/sda1. This patch
will correct ROOT to flash device refering to the path under block_flash.
* Fix 7170 fancontrol
* Do not remove aquota.user file in boot0
This file is a filesystem protected file used by EOS.
It can be simply removed and will make the SONiC installation failed if
not skipped.
init_interfaces meant to be sonic init interfaces configuration file.
However, it needs to be copied to the right file name to take effect.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
There was a fix to speed up initialization when networking used init.d
but it did not carry over to systemd networking.service. This fix will
apply the same change on the systemd service.
The result is much less time spent being blocked in networking.service.
* [warmboot] Load database from `redis-cli save`
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Add trivial statement to make bash function valid
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Update submodule sonic-utilities: Use 'redis-cli save' to dump database to file
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Move configdb-load.sh outside docker, and only run in cold
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Fix for more strict warm check
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
This driver should be loaded by sonic service. If kernel tries to load
it, the driver would be loaded with default parameters, which is not
right for sonic.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix redis-py version
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Update submodule sonic-py-swsssdk: Fix redis-py version to 2.10.6
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Clear WARM_RESTART table could cause component level warm restart to
fail due to missing WARM_RESTART state.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- cold shutdown is used by regular service stop and/or fast reboot
- warm shutdown is used by warm restart and/or warm reboot
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This script shall not flush all the entries in the state database
when it starts up, since there are entries maintained and written
by other processes outside this docker.
The issue we noticed was that the portchannel states are cleaned
up after teamsyncd writes the entries into the database, which
causes the IPs failed to be configured because intfmgrd considers
the portchannels are not ready yet.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>