We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also allow us to more easily transition to Python 3.
Python files are now packaged in "sonic-utilities" Pyhton wheel. Data files are now packaged in "sonic-utilities-data" Debian package.
**- How I did it**
- Build and install sonic-utilities as a Python package
- Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel
- Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications
- Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable
Submodule updates:
* src/sonic-utilities aa27dd9...2244d7b (5):
> Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122)
> [consutil] Display remote device name in show command (#1120)
> [vrf] fix check state_db error when vrf moving (#1119)
> [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117)
> Update to make config load/reload backward compatible. (#1115)
* src/sonic-ztp dd025bc...911d622 (1):
> Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)
This PR limited the number of calls to sonic-cfggen to one call
per iteration instead of current 3 calls per iteration.
The PR also installs jq on host for future scripts if needed.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
communication on docker eth0 ip . Without this TCP Connection to Redis
does not happen in namespace.
Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
Introduced a new build parameter 'SONIC_IMAGE_VERSION' that allows build
system users to build SONiC image with a specific version string. If
'SONIC_IMAGE_VERSION' was not passed by the user, SONIC_IMAGE_VERSION will be
set to the output of functions.sh:sonic_get_version function.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
As per the VOQ HLDs, internal networking between the linecards and supervisor is required within a chassis.
Allocating 127.X/16 subnets for private communication within a chassis is a good candidate.
It doesn't require any external IP allocation as well as ensure that the traffic will not leave the chassis.
References:
https://github.com/Azure/SONiC/pull/622https://github.com/Azure/SONiC/pull/639
**- How I did it**
Changed the `interfaces.j2` file to add `127.0.0.1/16` as the `lo` ip address.
Then once the interface is up, the post-up command removes the `127.0.0.1/8` ip address.
The order in which the netmask change is made matters for `127.0.0.1` to be reachable at all times.
**- How to verify it**
```
root@sonic:~# ip address show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/16 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
```
Co-authored-by: Baptiste Covolato <baptiste@arista.com>
- Merge chassis codebase upstream
- Add support for Otterlake supervisor
- Add support for NorthFace and Camp chassis
- Add support for Eldridge, Dragonfly and Brooks fabrics
- Add support for Clearwater2 and Clearwater2Ms linecards
- Add new arista Cli to power on/off cards
- Add new arista show Cli to inspect supervisor, chassis, fabrics and linecards
Issue: Binary ebtables config file is CPU arch dependent
Fix: Load the text config during firsttime boot and
Generate the binary persistent atomic file
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
sonic-cfggen is now using Unix Domain Socket for Redis DB. The socket
is created using root account. Subsequently, services that are started
as admin fails to start. This PR creates redis group and add admin
user to redis group. It also grants read/write access on redis.sock
for redis group members.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
[schema] Make schema header support C project (#373)
Removed DB specific get api's from Selectable class (#378)
With the change as part of #378 caclmgrd need to be updated
to use new client side Get API to access namespace.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The pcie-check.sh script was added in https://github.com/Azure/sonic-buildimage/pull/4771, but was not given executable permission. Therefore, we would see messages like:
```
Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied
Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied
Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'.
```
management framework provides management plane services like rest and
CLI which is not needed right after boot, instead by delaying this
service we give some more CPU for data plane and control plane services
on fast/warm boot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
New attribute 'has_timer' introduced to init_cfg.json does not evaluate
as Bool, rather it evaluates as string. This PR fixes this issue. Also,
this PR fixes an issue when there is system config unit (snmp, telemetry) that
has no installation config (WantedBy=, RequiredBy=, Also=, Alias=) settings
in the [Install] section. In the latter case, the .service should not be enabled.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1. Configure mode while building an image
`make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running
Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
Restart swss with `systemctl restart swss`
The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97f1e. On systems that are misaligned before conversion
(partition start is the first sector), the relica partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.
Zeroing this old relica makes sure that there is nothing left of the old
partition lying around. There won't be any risk of having Aboot corrupt
the new filesystem because of the old relica.
Signed-off-by: Baptiste Covolato <baptiste@arista.com>
- Why I did it
When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality:
Image management
Configuration save and load
ZTP enable/disable and status
Show tech support
- How I did it
The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor.
This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality.
- How to verify it
- Description for the changelog
Add SONiC Host Service for container to execute select commands in host
Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>
Commit e484ae9dd introduced systemd .timer unit to hostcfgd.
However, when stopping service that has timer, there is possibility that
timer is not running and the service would not be stopped. This PR
address this situation by handling both .timer and .service units.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
1) Moved from using blocking listen() on Config DB to the select() model
via python-swsscommon since we have to wait on event from multiple
config db's
2) Since python-swsscommon is not available on host added libswsscommon and python-swsscommon
and dependent packages in the base image (host enviroment)
3) Made iptables programmed in all namespace using ip netns exec
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fix Review Comments
* Fix Comments
* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fix Review Comments
* Added more comments on logic.
* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.
* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Verified with swsscommon package. Fix issue for single asic platforms.
* Moved to new python package
* Address Review Comments.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments.
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when running interfaces-
config.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Removes installation of kube-proxy (117 MB) and flannel (53 MB) images from Kubernetes-enabled devices. These images are tested to be unnecessary for our use case, as we do not rely on ClusterIPs for Kubernetes Services or a CNI for pod networking.
1. remove container feature table
2. do not generate feature entry if the feature is not included
in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities
* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]
Signed-off-by: Guohan Lu <lguohan@gmail.com>
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module
This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
* [platform] Add Support For Environment Variable
This PR adds the ability to read environment file from /etc/sonic.
the file contains immutable SONiC config attributes such as platform,
hwsku, version, device_type. The aim is to minimize calls being made
into sonic-cfggen during boot time.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Changes to add template support for copp.json.
This is needed so that we can install differnt type of
Traps based on Device Role (Tor/Leaf/Mgmt/etc...).
Initial use case is to install DHCP/DHCPv6 tarp only
for tor router.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fixed based on review comments.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fixed based on review comment.
virtual-chassis test uses multiple vs instances to simulate a
modular switch and a redis-chassis service is required to run on
the vs instance that represents a supervisor card.
This change allows vs docker start redis-chassis service according
to external config file.
**- Why I did it**
To support virtual-chassis setup, so that we can test distributed forwarding feature in virtual sonic environment, see `Distributed forwarding in a VOQ architecture HLD` pull request at https://github.com/Azure/SONiC/pull/622
**- How I did it**
The sonic-vs start.sh is enhanced to start new redis_chassis service if external chassis config file found. The config file doesn't exist in current vs environment, start.sh will behave like before.
**- How to verify it**
The swss/test still pass. The chassis_db service is verified in virtual-chassis topology and tests which are in following PRs.
Signed-off-by: Honggang Xu <hxu@arista.com>
(cherry picked from commit c1d45cf81ce3238be2dcbccae98c0780944981ce)
Co-authored-by: Honggang Xu <hxu@arista.com>
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.
The package currently includes three modules:
- daemon_base
- device_info
- logger
Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket
Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.
`sonic-installer list` is a read-only command. Specify it as such in the sudoers file.
This will also ensure the new `show boot` command, which calls `sudo sonic-installer list` under the hood doesn't fail due to permissions.
Otherwise, it may cause issues for warm restarts, warm reboot.
Warm restart of swss will start nat which is not expected for warm
restart. Also it is observed that during warm-reboot script execution
nat container gets started after it was killed. This causes removal of
nat dump generated by nat previously:
A check [ -f /host/warmboot/nat/nat_entries.dump ] || echo "NAT dump
does not exists" was added right before kexec:
```
Fri Jul 17 10:47:16 UTC 2020 Prepare MLNX ASIC to fastfast-reboot:
install new FW if required
Fri Jul 17 10:47:18 UTC 2020 Pausing orchagent ...
Fri Jul 17 10:47:18 UTC 2020 Stopping nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopped nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopping radv ...
Fri Jul 17 10:47:19 UTC 2020 Stopping bgp ...
Fri Jul 17 10:47:19 UTC 2020 Stopped bgp ...
Fri Jul 17 10:47:21 UTC 2020 Initialize pre-shutdown ...
Fri Jul 17 10:47:21 UTC 2020 Requesting pre-shutdown ...
Fri Jul 17 10:47:22 UTC 2020 Waiting for pre-shutdown ...
Fri Jul 17 10:47:24 UTC 2020 Pre-shutdown succeeded ...
Fri Jul 17 10:47:24 UTC 2020 Backing up database ...
Fri Jul 17 10:47:25 UTC 2020 Stopping teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopped teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopping syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopped syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopping all remaining containers ...
Warning: Stopping telemetry.service, but it can still be activated by:
telemetry.timer
Fri Jul 17 10:47:37 UTC 2020 Stopped all remaining containers ...
NAT dump does not exists
Fri Jul 17 10:47:39 UTC 2020 Rebooting with /sbin/kexec -e to
SONiC-OS-201911.140-08245093 ...
```
With this change, executed warm-reboot 10 times without hitting this
issue, while without this change the issue is easily reproducible almost
every warm-reboot run.
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [sonic-buildimage] Move BGP warm reboot scripts into BGP service /usr/local/bin
* Revert "[sonic-buildimage] Move BGP warm reboot scripts into BGP service /usr/local/bin"
This reverts commit d16d163fc4.
* [sonic-buildimage] Move BGP warm reboot script to BGP service
* [sonic-buildimage] Move BGP warm reboot script to BGP service
* [sonic-buildimage] Move BGP warm reboot script to BGP service
- access DB correctly
* Address code review comments, also change file mode of bgp.sh (+x)
* Address code review comments, also change the file mode of bgp.sh (+x)
* BGP warm reboot script to service, also handle fast boot as indicated by flag saved in StateDB
* BGP warm reboot script to service, code review comments on space alignment
* BGP warm reboot script to service: remove uncesseary space
* BGP warm reboot script to service: replace tab with space
* Code review comments: -) use new multi-db api -) add ignore error from zebra in case it's not configured
* Integrate with multi-ASIC changes committed recently
Co-authored-by: heidi.ou@alibaba-inc.com <heidi.ou@alibaba-inc.com>
* Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table
* Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled
* Addressed code review comment to replace duplicate code with already existing functions
* removed raising Exception when rule conflict in Control plane ACLs are found
* added code to remove the rule_props if it is conflicting ACL table versioning rule
* addressed review comment to add ignoring rule in the error statement
Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
* PCIe Monitor service
* Add rescan to pcie-mon.service when it fails to get all pcie devices
* space
* Clean up
* review comments
* update the pcie status in state db
* update the failed pcie status once at the end
* Update the pcie_status in STATE_DB and rename the service
* Add log to exit the service if the configuration file doesn't exist.
* fix the build failure
* Redo the pcie rescan for pcie-check failed case.
* review comments
* review comments
* review comments
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes done
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms
This pull request was cherry picked from "#1238" to resolve the conflicts.
- Why I did it
Add support to specify source address for TACACS+
- How I did it
Add patches for libpam-tacplus and libnss-tacplus. The patches parse the new option 'src_ip' and store the converted addrinfo. Then the addrinfo is used for TACACS+ connection.
Add a attribute 'src_ip' for table "TACPLUS|global" in configDB
Add some code to adapt to the attribute 'src_ip'.
- How to verify it
Config command for source address PR in sonic-utilities
config tacacs src_ip <ip_address>
- Description for the changelog
Add patches to specify source address for the TACACS+ outgoing packets.
- A picture of a cute animal (not mandatory but encouraged)
**UT logs: **
UT_tacacs_source_intf.txt
* [sonic-buildimage] Changes to make network specific sysctl
common for both host and docker namespace (in multi-npu).
This change is triggered with issue found in multi-npu platforms
where in docker namespace
net.ipv6.conf.all.forwarding was 0 (should be 1) because of
which RS/RA message were triggered and link-local router were learnt.
Beside this there were some other sysctl.net.ipv6* params whose value
in docker namespace is not same as host namespace.
So to make we are always in sync in host and docker namespace
created common file that list all sysctl.net.* params and used
both by host and docker namespace. Any change will get applied
to both namespace.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments and made sure to invoke augtool
only one and do string concatenation of all set commands
* Address Review Comments.
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms
Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
* Changes to make default route programming
correct in multi-asic platform where frr is not running
in host namespace. Change is to set correct administrative distance.
Also make NAMESPACE* enviroment variable available for all dockers
so that it can be used when needed.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fix review comments
* Review comment to check to add default route
only if default route exist and delete is successful.
* [systemd-generator]: Fix the code to make sure that dependencies
of host services are generated correctly for multi-asic platforms.
Add code to make sure that systemd timer files are also modified
to add the correct service dependency for multi-asic platforms.
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
* [systemd-generator]: Minor fix, remove debug code and
remove unused variable.
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.
Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.
**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
When building the SONiC image, used systemd to mask all services which are set to "disabled" in init_cfg.json.
This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph will fail when trying to restart disabled services.
* Add secureboot support in boot0
* Initramfs changes for secureboot on Aboot devices
* Do not compress squashfs and gz in fs.zip
It doesn't make much sense to do so since these files are already
compressed.
Also not compressing the squashfs has the advantage of making it
mountable via a loop device.
* Add loopoffset parameter to initramfs-tools
- Ensure all features (services) are in the configured state when hostcfgd starts
- Better functionalization of code
- Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword.
This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
While migrating to SONiC 20181130, identified a couple of issues:
1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127.
2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.
Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.
* Fix the Build on 201911 (Stretch) where the directory
/usr/lib/systemd/system/ does not exist so creating
manually. Change should not harm Master (buster) where
the directory is created by Linux
* Fix as per review comments
* Support rw files allowlist for Sonic Secure Boot
* Improve the performance
* fix bug
* Move the config description into a md file
* Change to use a simple way to remove the blank line
* Support chmod a-x in rw folder
* Change function name
* Change some unnecessary words
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image
**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
**- Why I did it**
When I tested auto-restart feature of swss container by manually killing one of critical processes in it, swss will be stopped. Then syncd container as the peer container should also be
stopped as expected. However, I found sometimes syncd container can be stopped, sometimes
it can not be stopped. The reason why syncd container can not be stopped is the process
(/usr/local/bin/syncd.sh stop) to execute the stop() function will be stuck between the lines 164 –167. Systemd will wait for 90 seconds and then kill this process.
164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done
The first thing I did is to profile how long this while loop will spin if syncd container can be
normally stopped after swss container is stopped. The result is 5 seconds or 6 seconds. If syncd
container can be normally stopped, two messages will be written into syslog:
str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134
The second thing I did was to add a timer in the condition of while loop to ensure this while loop will be forced to exit after 20 seconds:
After that, the testing result is that syncd container can be normally stopped if swss is stopped
first. One more thing I want to mention is that if syncd container is stopped during 5 seconds or 6 seconds, then the two log messages can be still seen in syslog. However, if the execution
time of while loop is longer than 20 seconds and is forced to exit, although syncd container can be stopped, I did not see these two messages in syslog. Further, although I observed the auto-restart feature of swss container can work correctly right now, I can not make sure the issue which syncd container can not stopped will occur in future.
**- How I did it**
I added a timer around the while loop in stop() function. This while loop will exit after spinning
20 seconds.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.
This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here:
1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop
2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots
3. Substitute returns with continues so that even if one service fails, we still try to handle the others
Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.
**- What I did**
#### wheel package Makefiles
- wheel package Makefiles for sonic-yang-mgmt package.
#### libyang Python APIs:
- python APIs based on libyang
- functions to load/merge yang models and Yang data files
- function to validate data trees based on Yang models
- functions to merge yang data files/trees
- add/set/delete node in schema and data trees
- find data/schema nodes from xpath from the Yang data/schema tree in memory
- find dependencies
- dump the data tree in json/xml
#### Extension of libyang Python APIs:
-- Cropping input config based on Yang Model.
-- Translate input config based on Yang Model.
-- rev Translate input config based on Yang Model.
-- Find xpath of port, portleaf and a yang list.
-- Find if node is key of a list while deletion if yes, then delete the parent.
Signed-off-by: Praveen Chaudhary pchaudhary@linkedin.com
Signed-off-by: Ping Mao pmao@linkedin.com
- bug fix : Fixed an issue which the nps ko file was not loaded due to the wrong service file name
- Optimize the code to reduce changes due to the kernel upgrade
- Remove nephos ko file loaded in swss.service.j2 because it has loaded at syncd.service.j2
* Changes to support config-setup service for multi-npu
platforms. For Multi-npu we are not supporting as of
now config initializtion and ZTP. It will support creating
config db from minigraph or using config db from previous
file system
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments.
* Address Review comments
* Address Review Comments of using pyhton based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also while running from shell we will skip stop/start services
done by those commands.
* Updated to use python command so no code duplication.
* [ntp] enable/disable NTP long jump according to reboot type
- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warrm/fast reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* fix typo
* further refactoring
* use sonic-db-cli instead
Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.
and then we load image and reboot even if there was existing
config_db.json we will look for DHCP Service. we should disbale
update_graph in such cases. This behaviour is silimar to what we have in
201811 image.
Modified caclmgrd behavior to enhance control plane security as follows:
Upon starting or receiving notification of ACL table/rule changes in Config DB:
1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions
2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute
3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute
4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages
5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets
6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets
7. Add iptables/ip6tables commands to allow all incoming BGP traffic
8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP)
9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured)
10. Add iptables rules to drop all packets destined for loopback interface IP addresses
11. Add iptables rules to drop all packets destined for management interface IP addresses
12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses
13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses
14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute)
15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets
* Changes for LLDP for Multi NPU Platoforms:-
a) Enable LLDP for Host namespace for Management Port
b) Make sure Management IP is avaliable in per asic namespace
needed for LLDP Chassis configuration
c) Make sure chassis mac-address is correct in per asic namespace
d) Do not run lldp on eth0 of per asic namespace and avoid chassis
configuration for same
e) Use Linux hostname instead from Device Metadata for lldp chassis
configuration since in multi-npu platforms device metadata hostname
will be differnt
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comment with following changes:
a) Use Device Metadata hostname even in per namespace conatiner.
updated minigraph parsing for same to have hostname as system
hostname and add new key for asic name
b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace
config also as needed for LLDP for setting chassis management IP.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices
Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present
Signed-off-by: Neetha John <nejo@microsoft.com>
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host
* Updates based on comments
* Adding the j2 templates for database_config and database_global files.
* Updating to retrieve the redis DIR's to be mounted from database_global.json file.
* Additional check to see if asic.conf file exists before sourcing it.
* Updates based on PR comments discussion.
* Review comments update
* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.
* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.
* Removing the database_config.json file to avioid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.
* Update the comments for sudo usage in docker_image_ctrl.j2
* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.
* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli
* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
1. ebtables -t filter -A FORWARD -p 802_1Q --vlan-encap 0806 -j DROP
The ARP packet with vlan tag can't match the default rule.
Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>
currently, vs docker always create 32 front panel ports.
when vs docker starts, it first detects the peer links
in the namespace and then setup equal number of front panel
interfaces as the peer links.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
* Run fsck filesystem check support prior mounting filesystem
If the filesystem become non clean ("dirty"), SONiC does not run fsck to
repair and mark it as clean again.
This patch adds the functionality to run fsck on each boot, prior to the
filesystem being mounted. This allows the filesystem to be repaired if
needed.
Note that if the filesystem is maked as clean, fsck does nothing and simply
return so this is perfectly fine to call fsck every time prior to mount the
filesystem.
How to verify this patch (using bash):
Using an image without this patch:
Make the filesystem "dirty" (not clean)
[we are making the assumption that filesystem is stored in /dev/sda3 - Please adjust depending of the platform]
[do this only on a test platform!]
dd if=/dev/sda3 of=superblock bs=1 count=2048
printf "$(printf '\\x%02X' 2)" | dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null
dd of=/dev/sda3 if=superblock bs=1 count=2048
Verify that filesystem is not clean
tune2fs -l /dev/sda3 | grep "Filesystem state:"
reboot and verify that the filesystem is still not clean
Redo the same test with an image with this patch, and verify that at next reboot the filesystem is repaired and becomes clean.
fsck log is stored on syslog, using the string FSCK as markup.
The one big bgp configuration template was splitted into chunks.
Currently we have three types of bgp neighbor peers:
general bgp peers. They are represented by CONFIG_DB::BGP_NEIGHBOR table entries
dynamic bgp peers. They are represented by CONFIG_DB::BGP_PEER_RANGE table entries
monitors bgp peers. They are represented by CONFIG_DB::BGP_MONITORS table entries
This PR introduces three templates for each peer type:
bgp policies: represent policieas that will be applied to the bgp peer-group (ip prefix-lists, route-maps, etc)
bgp peer-group: represent bgp peer group which has common configuration for the bgp peer type and uses bgp routing policy from the previous item
bgp peer-group instance: represent bgp configuration, which will be used to instatiate a bgp peer-group for the bgp peer-type. Usually this one is simple, consist of the referral to the bgp peer-group, bgp peer description and bgp peer ip address.
This PR redefined constant.yml file. Now this file has a setting for to use or don't use bgp_neighbor metadata. This file has more parameters for now, which are not used. They will be used in the next iteration of bgpcfgd.
Currently all tests have been disabled. I'm going to create next PR with the tests right after this PR is merged.
I'm going to introduce better bgpcfgd in a short time. It will include support of dynamic changes for the templates.
FIX:: #4231
- create a file in files/image_config/ntp/ntp-systemd-wrapper to add mgmt vrf related start cmd for ntp service. So that the default /usr/lib/ntp/ntp-systemd-wrapper can be overriden during build time.
- modify build_debian.sh to cp files/image_config/ntp/ntp-systemd-wrapper to /usr/lib/ntp/ntp-systemd-wrapper during build time.
Co-authored-by: Bing Sun <Bing_Sun@dell.com>
do mount/umount in the chroot environment
install cron explicitly
install rasdaemon as a replacement for mcelog
switch python package docker-py to docker
- build SONIC_STRETCH_DOCKERS in sonic-slave-stretch docker
- build image related module in sonic-slave-buster docker.
This includes all kernels modules and some packages
Signed-off-by: Guohan Lu <lguohan@gmail.com>
[sonic-yang-models]: First version of yang models for Port, VLan, Interface, PortChannel, loopback and ACL.
YANG models as per Guidelines.
Guideline doc: https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md
[sonic-yang-models/tests]: YANG model test code and JSON input for testing.
[sonic-yang-models/setup.py]: Build infra for yang models.
**- What I did**
Created Yang model for Sonic.
Tables: PORT, VLAN, VLAN_INTERFACE, VLAN_MEMBER, ACL_RULE, ACL_TABLE, INTERFACE.
Created build infra files using which a new package (sonic-yang-models) can be build and can be deployed on sonic switches. Yang models will be part of this new package.
**- How I did it**
Wrote yang models based on Guideline doc:
https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md
and
https://github.com/Azure/SONiC/wiki/Configuration.
Wrote python wheel Package infra which runs test for these Yang models using a json files which consists configuration as per yang models. These configs are for negative tests, which means we want to test that most must condition, pattern and when condition works as expected.
**- How to verify it**
Build Logs and testing:
———————————————————————————————————
```
/sonic/src/sonic-yang-models /sonic
running test
running egg_info
writing top-level names to sonic_yang_models.egg-info/top_level.txt
writing dependency_links to sonic_yang_models.egg-info/dependency_links.txt
writing sonic_yang_models.egg-info/PKG-INFO
reading manifest file 'sonic_yang_models.egg-info/SOURCES.txt'
writing manifest file 'sonic_yang_models.egg-info/SOURCES.txt'
running build_ext
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
running bdist_wheel
running build
running build_py
(Reading database ... 155852 files and directories currently installed.)
Preparing to unpack .../libyang_1.0.73_amd64.deb ...
Unpacking libyang (1.0.73) over (1.0.73) ...
Setting up libyang (1.0.73) ...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
Processing triggers for man-db (2.7.6.1-2) ...
(Reading database ... 155852 files and directories currently installed.)
Preparing to unpack .../libyang-cpp_1.0.73_amd64.deb ...
Unpacking libyang-cpp (1.0.73) over (1.0.73) ...
Setting up libyang-cpp (1.0.73) ...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
(Reading database ... 155852 files and directories currently installed.)
Preparing to unpack .../python3-yang_1.0.73_amd64.deb ...
Unpacking python3-yang (1.0.73) over (1.0.73) ...
Setting up python3-yang (1.0.73) ...
INFO:YANG-TEST:module: sonic-vlan is loaded successfully
ERROR:YANG-TEST:Could not get module: sonic-head
INFO:YANG-TEST:module: sonic-portchannel is loaded successfully
INFO:YANG-TEST:module: sonic-acl is loaded successfully
INFO:YANG-TEST:module: sonic-loopback-interface is loaded successfully
ERROR:YANG-TEST:Could not get module: sonic-port
INFO:YANG-TEST:module: sonic-interface is loaded successfully
INFO:YANG-TEST:
------------------- Test 1: Configure a member port in VLAN_MEMBER table which does not exist.---------------------
libyang[0]: Leafref "/sonic-port:sonic-port/sonic-port:PORT/sonic-port:PORT_LIST/sonic-port:port_name" of value "Ethernet156" points to a non
-existing leaf. (path: /sonic-vlan:sonic-vlan/VLAN_MEMBER/VLAN_MEMBER_LIST[vlan_name='Vlan100'][port='Ethernet156']/port)
INFO:YANG-TEST:Configure a member port in VLAN_MEMBER table which does not exist. Passed
INFO:YANG-TEST:
------------------- Test 2: Configure non-existing ACL_TABLE in ACL_RULE.---------------------
libyang[0]: Leafref "/sonic-acl:sonic-acl/sonic-acl:ACL_TABLE/sonic-acl:ACL_TABLE_LIST/sonic-acl:ACL_TABLE_NAME" of value "NOT-EXIST" points
to a non-existing leaf. (path: /sonic-acl:sonic-acl/ACL_RULE/ACL_RULE_LIST[ACL_TABLE_NAME='NOT-EXIST'][RULE_NAME='Rule_20']/ACL_TABLE_NAME)
INFO:YANG-TEST:Configure non-existing ACL_TABLE in ACL_RULE. Passed
INFO:YANG-TEST:
------------------- Test 3: Configure IP_TYPE as ARP and ICMPV6_CODE in ACL_RULE.---------------------
libyang[0]: When condition "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV6' or .='IPv6ANY'])" not satisfied. (path: /sonic-acl:sonic-acl/ACL_RU
LE/ACL_RULE_LIST[ACL_TABLE_NAME='NO-NSW-PACL-V4'][RULE_NAME='Rule_40']/ICMPV6_CODE)
INFO:YANG-TEST:Configure IP_TYPE as ARP and ICMPV6_CODE in ACL_RULE. Passed
INFO:YANG-TEST:
INFO:YANG-TEST:
------------------- Test 4: Configure IP_TYPE as ipv4any and SRC_IPV6 in ACL_RULE.---------------------
libyang[0]: When condition "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV6' or .='IPv6ANY'])" not satisfied. (path: /sonic-acl:sonic-acl/ACL_RU
LE/ACL_RULE_LIST[ACL_TABLE_NAME='NO-NSW-PACL-V4'][RULE_NAME='Rule_20']/SRC_IPV6)
INFO:YANG-TEST:Configure IP_TYPE as ipv4any and SRC_IPV6 in ACL_RULE. Passed
------------------- Test 5: Configure l4_src_port_range as 99999-99999 in ACL_RULE---------------------
libyang[0]: Value "99999-99999" does not satisfy the constraint "([0-9]{1,4}|[0-5][0-9]{4}|[6][0-4][0-9]{3}|[6][5][0-2][0-9]{2}|[6][5][3][0-5]{2}|[6][5][3][6][0-5])-([0-9]{1,4}|[0-5][0-9]{4}|[6][0-4][0-9]{3}|[6][5][0-2][0-9]{2}|[6][5][3][0-5]{2}|[6][5][3][6][0-5])" (range, length, or pattern). (path: /sonic-acl:sonic-acl/ACL_RULE/ACL_RULE_LIST[ACL_TABLE_NAME='NO-NSW-PACL-V6'][RULE_NAME='Rule_20']/L4_SRC_PORT_RANGE)
INFO:YANG-TEST:Configure l4_src_port_range as 99999-99999 in ACL_RULE Passed
INFO:YANG-TEST:
------------------- Test 6: Configure empty string as ip-prefix in INTERFACE table.---------------------
libyang[0]: Invalid value "" in "ip-prefix" element. (path: /sonic-interface:sonic-interface/INTERFACE/INTERFACE_LIST[interface='Ethernet8'][ip-prefix='']/ip-prefix)
INFO:YANG-TEST:Configure empty string as ip-prefix in INTERFACE table. Passed
INFO:YANG-TEST:
------------------- Test 7: Configure Wrong family with ip-prefix for VLAN_Interface Table---------------------
libyang[0]: Must condition "(contains(../ip-prefix, ':') and current()='IPv6') or (contains(../ip-prefix, '.') and current()='IPv4')" not satisfied. (path: /sonic-vlan:sonic-vlan/VLAN_INTERFACE/VLAN_INTERFACE_LIST[vlanid='100'][ip-prefix='2a04:5555:66:7777::1/64']/family)
INFO:YANG-TEST:Configure Wrong family with ip-prefix for VLAN_Interface Table Passed
INFO:YANG-TEST:
------------------- Test 8: Configure IP_TYPE as ARP and DST_IPV6 in ACL_RULE.---------------------
libyang[0]: When condition "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV6' or .='IPV6ANY'])" not satisfied. (path: /sonic-acl:sonic-acl/ACL_RULE/ACL_RULE_LIST[ACL_TABLE_NAME='NO-NS
W-PACL-V6'][RULE_NAME='Rule_20']/DST_IPV6)
INFO:YANG-TEST:Configure IP_TYPE as ARP and DST_IPV6 in ACL_RULE. Passed
INFO:YANG-TEST:
------------------- Test 9: Configure INNER_ETHER_TYPE as 0x080C in ACL_RULE.---------------------
libyang[0]: Value "0x080C" does not satisfy the constraint "(0x88CC|0x8100|0x8915|0x0806|0x0800|0x86DD|0x8847)" (range, length, or pattern). (path: /sonic-acl:sonic-acl/ACL_RULE/ACL_RULE_LIST[ACL_TABLE_NAME='NO-NSW-PACL-V4'][RULE_NAME='Rule_40']/INNER_ETHER_TYPE)
INFO:YANG-TEST:Configure INNER_ETHER_TYPE as 0x080C in ACL_RULE. Passed
INFO:YANG-TEST:
------------------- Test 10: Add dhcp_server which is not in correct ip-prefix format.---------------------
libyang[0]: Invalid value "10.186.72.566" in "dhcp_servers" element. (path: /sonic-vlan:sonic-vlan/VLAN/VLAN_LIST/dhcp_servers[.='10.186.72.566'])
INFO:YANG-TEST:Add dhcp_server which is not in correct ip-prefix format. Passed
INFO:YANG-TEST:
------------------- Test 11: Configure undefined acl_table_type in ACL_TABLE table.---------------------
libyang[0]: Invalid value "LAYER3V4" in "type" element. (path: /sonic-acl:sonic-acl/ACL_TABLE/ACL_TABLE_LIST[ACL_TABLE_NAME='NO-NSW-PACL-V6']/type)
INFO:YANG-TEST:Configure undefined acl_table_type in ACL_TABLE table. Passed
INFO:YANG-TEST:
------------------- Test 12: Configure undefined packet_action in ACL_RULE table.---------------------
libyang[0]: Invalid value "SEND" in "PACKET_ACTION" element. (path: /sonic-acl:sonic-acl/ACL_RULE/ACL_RULE_LIST/PACKET_ACTION)
INFO:YANG-TEST:Configure undefined packet_action in ACL_RULE table. Passed
INFO:YANG-TEST:
------------------- Test 13: Configure wrong value for tagging_mode.---------------------
libyang[0]: Invalid value "non-tagged" in "tagging_mode" element. (path: /sonic-vlan:sonic-vlan/VLAN_MEMBER/VLAN_MEMBER_LIST/tagging_mode)
INFO:YANG-TEST:Configure wrong value for tagging_mode. Passed
INFO:YANG-TEST:
------------------- Test 14: Configure vlan-id in VLAN_MEMBER table which does not exist in VLAN table.---------------------
libyang[0]: Leafref "../../../VLAN/VLAN_LIST/vlanid" of value "200" points to a non-existing leaf. (path: /sonic-vlan:sonic-vlan/VLAN_MEMBER/VLAN_MEMBER_LIST[vlanid='200'][port='Ethernet0']/vlanid)
libyang[0]: Leafref "../../../VLAN/VLAN_LIST/vlanid" of value "200" points to a non-existing leaf. (path: /sonic-vlan:sonic-vlan/VLAN_MEMBER/VLAN_MEMBER_LIST[vlanid='200'][port='Ethernet0']/vlanid)
INFO:YANG-TEST:Configure vlan-id in VLAN_MEMBER table which does not exist in VLAN table. Passed
INFO:YANG-TEST:All Test Passed
../../target/debs/stretch/libyang0.16_0.16.105-1_amd64.deb installtion failed
../../target/debs/stretch/libyang-cpp0.16_0.16.105-1_amd64.deb installtion failed
../../target/debs/stretch/python2-yang_0.16.105-1_amd64.deb installtion failed
YANG Tests passed
Passed: pyang -f tree ./yang-models/*.yang > ./yang-models/sonic_yang_tree
copying tests/yangModelTesting.py -> build/lib/tests
copying tests/test_sonic_yang_models.py -> build/lib/tests
copying tests/__init__.py -> build/lib/tests
running egg_info
writing top-level names to sonic_yang_models.egg-info/top_level.txt
writing dependency_links to sonic_yang_models.egg-info/dependency_links.txt
writing sonic_yang_models.egg-info/PKG-INFO
reading manifest file 'sonic_yang_models.egg-info/SOURCES.txt'
writing manifest file 'sonic_yang_models.egg-info/SOURCES.txt'
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/tests
copying build/lib/tests/yangModelTesting.py -> build/bdist.linux-x86_64/wheel/tests
copying build/lib/tests/test_sonic_yang_models.py -> build/bdist.linux-x86_64/wheel/tests
copying build/lib/tests/__init__.py -> build/bdist.linux-x86_64/wheel/tests
running install_data
creating build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data
creating build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data
creating build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-head.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-acl.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-interface.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-loopback-interface.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-port.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-portchannel.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
copying ./yang-models/sonic-vlan.yang -> build/bdist.linux-x86_64/wheel/sonic_yang_models-1.0.data/data/yang-models
```
* Install kubernetes worker node packages, if enabled.
* Minor updates
* Added some comments
* Updates per review comments.
Built a private image to test to work fine.
* Remove the removed file.
* Update per comments
Make a fix, as kubeadm no demands a higher version of kubelet & kubectl.
As kubeadm auto install kubectl & kubelet, removing explicit install is an easier/robust fix.
* Changes per review comments.
* Updates per comments.
1) Dropped helper & pod scripts
2) Made install verbose
* Drop creation of pods subdir, as this PR does not use them.
* From comments to 'n' per review comments.
* 1) kubeadm.conf is created as part of kubeadm package install. Hence dropped explicit copy.
sonic-netns-exec fails to execute below command in swss.sh:
sonic-netns-exec "$NET_NS" sonic-db-cli $1 EVAL "
local tables = {$2}
for i = 1, table.getn(tables) do
local matches = redis.call('KEYS', tables[i])
for j,name in ipairs(matches) do
redis.call('DEL', name)
end
end" 0
This command fails with error " redis.exceptions.ResponseError: value is not an integer or out of range" .
Root cause:
When sonic-netns-exec executes the above function, argument passed to sonic-db-cli is NOT executed as a single script.
The argument is passed as separate keywords to sonic-db-cli, as below:
['EVAL', 'local', 'tables', '=', "{'PORT_TABLE*'}", 'for', 'i', '=', '1,', 'table.getn(tables)', 'do', 'local', 'matches', '=', "redis.call('KEYS',", 'tables[i])', 'for', 'j,name', 'in', 'ipairs(matches)', 'do', "redis.call('DEL',", 'name)', 'end', 'end', '0']
- How I did it
To make sure that the parameters are passed as they were set initially, fix sonic-netns-exec to use double quoted "$@", where "$@" is "$1" "$2" "$3" ... "${N}"
After fix, the argument passed to sonic-db-cli is as below:
Argument passed to sonic-db-cli:
['EVAL', "\n local tables = {'PORT_TABLE*'}\n for i = 1, table.getn(tables) do\n local matches = redis.call('KEYS', tables[i])\n for j,name in ipairs(matches) do\n redis.call('DEL', name)\n end\n end", '0']
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
* Fix the CMD for the PROCESSSTATS entries so that
there is a space between the command name and the
arguments.
Signed-off-by: Garrick He <garrick_he@dell.com>
- What I did
Add configuration to avoid ntpd from panic and exit if the drift between new time and current system time is large.
- How I did it
Added "tinker panic 0" in ntp.conf file.
- How to verify it
[this assumes that there is a valid NTP server IP in config_db/ntp.conf]
Change the current system time to a bad time with a large drift from time in ntp server; drift should be greater than 1000s.
Reboot the device.
Before the fix:
3. upon reboot, ntp-config service comes up fine, ntp service goes to active(exited) state without any error message. This is because the offset between new time (from ntp server) and the current system time is very large, ntpd goes to panic mode and exits. The system continues to show the bad time.
After the fix:
3. Upon reboot, ntp-config comes up fine, ntp services comes up from and stays in active (running) state. The system clock gets synced with the ntp server time.
Take advantage of an SDK environment variable to customize the location where sdk_socket exists.
In the latest SDK sdk_socket has been moved from /tmp to /var/run which is a better place to contain this kind of file.
However, this prevents the subdirs under /var/run from being mapped to different volumes. To resolve this, we take advantage of an SDK variable to designate the location of sdk_socket.
This requires every process that requires to access sdk_socket have this environment variable defined. However, to define environment variable for each process is less scalable. We take advantage of the docker scope environment variable to avoid that.
It depends on PR 4227
Instead of updating hostname manualy on Config DB hostname change,
simply share containers UTS namespace with host OS.
Ideally, instead of setting `--uts=host` for every container in SONiC,
this setting can be set per container if feature requires.
One behaviour change is introduced in this commit, when `--privileged`
or `--cap-add=CAP_SYS_ADMIN` and `--uts=host` are combined, container
has privilege to change host OS and every other container hostname.
Such privilege should be fixed by limiting containers capabilities.
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
This patch upgrade the kernel from version
4.9.0-9-2 (4.9.168-1+deb9u3) to 4.9.0-11-2 (4.9.189-3+deb9u2)
Co-authored-by: rajendra-dendukuri <47423477+rajendra-dendukuri@users.noreply.github.com>
Fix the issue of incorrectly skipping the convertfs hook when fast-reboot from EOS, by adding an extra kernel cmdline param "prev_os" to differentiate fast-reboot from EOS and from SONiC.
This is because we still do disk conversion for fast reboot from eos to sonic, like format the disk.
* [database] Implement the auto-restart feature for database container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] Remove the duplicate dependency in service files. Since we
already have updategraph ---> config_setup ---> database, we do not need
explicitly add database.service in all other container service files.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Reorganize the line 73 in event listener script.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] update the file sflow.service.j2 to remove the duplicate
dependency.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add comments in event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Update the comments in line 56.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add parentheses for if statement in line 76 in event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Add a new table CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Update the content of table CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Use the template to generate the table
CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Add a new table FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Change the order of container names according to
alphabetical order.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Change the dhcp_relay container name and add rest-api.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Changes in sonic-buildimage for the NAT feature
- Docker for NAT
- installing the required tools iptables and conntrack for nat
Signed-off-by: kiran.kella@broadcom.com
* Add redis-tools dependencies in the docker nat compilation
* Addressed review comments
* add natsyncd to warm-boot finalizer list
* addressed review comments
* using swsscommon.DBConnector instead of swsssdk.SonicV2Connector
* Enable NAT application in docker-sonic-vs
as the current kdump installation is searching for grub path, and ARM arch (marvell-armhf) are dependent on uboot, these changes has to be addressed. For now skipping kdump installation on ARM
Co-authored-by: lguohan <lguohan@gmail.com>
- move single instance services into their own folder
- generate Systemd templates for any multi-instance service files in slave.mk
- detect single or multi-instance platform in systemd-sonic-generator based on asic.conf platform specific file.
- update container hostname after creation instead of during creation (docker_image_ctl)
- run Docker containers in a network namespace if specified
- add a service to create a simulated multi-ASIC topology on the virtual switch platform
Signed-off-by: Lawrence Lee <t-lale@microsoft.com>
Signed-off-by: Suvarna Meenakshi <Suvarna.Meenaksh@microsoft.com>
* [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector
* update comment for a potential bug
* update comment
* add TODO maker as review reqirement
* Update arista driver submodule
* Add support for 7260CX3-64E in boot0
* Refactor boot0 platform specific definition
Make it easier to manage new sku
* Add support for 7050CX3-32S in boot0
Just contains the required boot0 information
* Add basic plugin support for DCS-7050CX3-32S
* Add port config for Arista-7050CX3-32S-C32
Co-authored-by: yurypm <yurypm@arista.com>
Co-authored-by: byu343 <byu@arista.com>
* [initramfs] Updated reuired tools for initramfs
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [initramfs] Updated required tools for initramfs
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [Platform] [Marvell] Platform specific debian package for et6448m device
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* Removed auto-generated files
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [initramfs] Added mtd and uboot firmware tools package required for arm arch
Its been enabled to all arch including amd64
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [initramfs] Added mtd and uboot firmware tools package required for arm arch
Its been enabled to all arch including amd64
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [initramfs] Marvell arm modules update and platform config update
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* [iniramfs] add initramfs uboot-utils hook script only for ARM
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
When database service is down, psud daemon throws an error because of DB connection reset, this because pmon service has no dependency with database service.
To resolve this issue, added database service dependency to the pmon service.
Also, increased the net.core.somaxconn value to 512 to solve the connection failure on the scaled setup.
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
This is an addendum to #3958, which also instructs apt to ignore the "Valid Until" date in Release files inside the slave containers, making a complete solution, much like the previously abandoned PR #2609. This patch also unifies file names and contents.
When the Debian team archives a repo, it stops updating the "Valid Until" date, thus apt-get will not apply updates for that repo unless we explicitly tell it to ignore the "Valid Until" date. Also, this has become an issue with active (i.e., non-archived) repos twice in the past year because the Debian folks seem to occasionally let the expiration lapse before updating the date. This will cause SONiC builds to fail with a message like E: Release file for http://debian-archive.trafficmanager.net/debian-security/dists/jessie/updates/InRelease is expired (invalid since 3d 3h 11min 20s). Updates for this repository will not be applied. until the dates have been updated and propagated to all mirrors. With this patch, SONiC should no longer be affected by lapsed "Valid Until" dates, whether they be accidental or purposeful.
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.
* Avoid reloading already uploaded file, by marking the names with a prefix.
* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.
* Fix few bugs
* The binary update_json.py will come from sonic-utilities.
* Added sonic-mgmt-framework as submodule / docker
* fix build issues
* update sonic-mgmt-framework submodule branch to master
* Merged changes 70007e6d2ba3a4c0b371cd693ccc63e0a8906e77..00d4fcfed6a759e40d7b92120ea0ee1f08300fc6
00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 Modified environemnt variables
* Changes to build sonic-mgmt-framework docker
* bumped up sonic-mgmt-framework commit-id
* version bump for sonic-mgmt-framework commit-it
* bumped up sonic-mgmt-framework commit-id
* Add python packages to docker
* Build fix for docker with python packages
* added libyang as dependent package
* Allow building images on NFS-mounted clones
Prior to this change, `build_debian.sh` would generate a Debian
filesystem in `./fsroot`. This needs root permissions, and one of the
tests that is performed is whether the user can create a character
special file in the filesystem (using mknod).
On most NFS deployments, `root` is the least privileged user, and cannot
run mknod. Also, attempting to run commands like rm or mv as root would
fail due to permission errors, since the root user gets mapped to an
unprivileged user like `nobody`.
This commit changes the location of the Debian filesystem to `/fsroot`,
which is a tmpfs mount within the slave Docker. The default squashfs,
docker tarball and zip files are also created within /tmp, before being
copied back to /sonic as the regular user.
The side effect of this change is that the contents of `/fsroot` are no
longer available once the slave container exits, however they are
available within the squashfs image.
Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>
* bumped up sonc-mgmt-framework commit to include PR #18
* REST Server startup script is enahnced to read the settings from
ConfigDB. Below table provides mapping of db field to command line
argument name.
============================================================
ConfigDB entry key Field name REST Server argument
============================================================
REST_SERVER|default port -port
REST_SERVER|default client_auth -client_auth
REST_SERVER|default log_level -v
DEVICE_METADATA|x509 server_crt -cert
DEVICE_METADATA|x509 server_key -key
DEVICE_METADATA|x509 ca_crt -cacert
============================================================
* Replace src/telemetry as submodule to sonic-telemetry
* Update telemetry commit HEAD
* Update sonic-telemetry commit HEAD
* libyang env path update
* Add libyang dependency to telemetry
* Add scripts to create JSON files for CLI backend
Scripts to create /var/platform/syseeprom and /var/platform/system, which are back-end
files for CLI, for system EEPROM and system information.
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* In startup script, create directory where CLI back-end files live
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* build dependency pkgs added to docker for build failure fix
* Changes to fix build issue for mgmt framework
* Fix exec path issue with telemetry
* s5232[device] PSU detecttion and default led state support
* Processing of first boot in rc.local should not have premature exit
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* docker mount options added for platform, system features
* bumped up sonic-mgmt-framework commit id to pick 23rd July 2019 changes
* Added mount options for telemetry docker to get access for system and platform info.
* Update commit for sonic-utilities
* [dell]: Corrected dport map and renamed config files for S5232F
* Fix telemetry submodule commit
* added support for sonic-cli console
* [Dell S5232F, Z9264F] Harden FPGA driver kernel module
For Dell S5232F and Z9264F platforms, be more strict when checking state
in ISR of FPGA driver, to harden against spurious interrupts.
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* update mgmt-framework submodule to 27th Aug commit.
* remove changes not related to mgmt-framework and sonic-telemetry
* Revert "Replace src/telemetry as submodule to sonic-telemetry"
This reverts commit 11c3192975.
* Revert "Replace src/telemetry as submodule to sonic-telemetry"
This reverts commit 11c3192975.
* make submodule changes and remove a change not related to PR
* more changes
* Update .gitmodules
* Update Dockerfile.j2
* Update .gitmodules
* Update .gitmodules
* Update .gitmodules
reverting experimental change
* Removed syspoll for release_1.0
Signed-off-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com>
* Update docker-sonic-mgmt-framework.mk
* Update sonic-mgmt-framework.mk
* Update sonic-mgmt-framework.mk
* Update docker-sonic-mgmt-framework.mk
* Update docker-sonic-mgmt-framework.mk
* Revert "Processing of first boot in rc.local should not have premature exit"
This reverts commit e99a91ffc2.
* Remove old telemetry directory
* Update docker-sonic-mgmt-framework.mk
* Resolving merge conflict with Azure
* Reverting the wrong merge
* Use CVL_SCHEMA_PATH instead of changing directory for telemetry startup
* Add missing export
* Add python mmh3 to slave dockerfile
* Remove sonic-mgmt-framework build dep for telemetry, fix dialout startup issues
* Provided flag to disable compiling mgmt-framework
* Update sonic-utilites point latest commit id
* Point sonic-utilities to Azure accepted SHA
* Updating mgmt framework to right sha
* Add sonic-telemetry submodule
* Update the mgmt-framework commit id
Co-authored-by: jghalam <joe.ghalam@gmail.com>
Co-authored-by: Partha Dutta <51353699+dutta-partha@users.noreply.github.com>
Co-authored-by: srideepDell <srideep_devireddy@dell.com>
Co-authored-by: nirenjan <nirenjan@users.noreply.github.com>
Co-authored-by: Sachin Holla <51310506+sachinholla@users.noreply.github.com>
Co-authored-by: Eric Seifert <seiferteric@gmail.com>
Co-authored-by: Howard Persh <hpersh@yahoo.com>
Co-authored-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com>
Co-authored-by: Arunsundar Kannan <31632515+arunsundark@users.noreply.github.com>
Co-authored-by: rvasanthm <51932293+rvasanthm@users.noreply.github.com>
Co-authored-by: Ashok Daparthi-Dell <Ashok_Daparthi@Dell.com>
Co-authored-by: anand-kumar-subramanian <51383315+anand-kumar-subramanian@users.noreply.github.com>
ASIC reset events are captured by hw-mgmt and hw-mgmt calls chipup/chipdown internally without OS iteraction
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Delay CPU intensive services at boot
- How I did it
Made snmp.timer work and add telemetry.timer.
But this is not enough because it breaks the existing snmp dependency on swss.
So, in this solution snmp timer is a wanted by swss service, but since OnBootSec timer expires only once it will not trigger snmp service, so I added line "OnUnitActiveSec=0 sec" which will start snmp service based on the last time it was active. On boot only OnBootSec will expire, on swss start/restarts only second timer will expire immediately and trigger snmp service.
However, snmp service will not stop after "systemctl stop snmp" because of the second timer which will always expire when snmp service because unavailable.
So there is a conflict which will be handled by systemd if we add "Conflicts=" line to both snmp.service and snmp.timer.
So during boot:
snmp does not start by default
swss starts and starts snmp timer
OnUnitActiveSec=0 does not expire since there is no snmp active
OnBootSec expires and starts snmp service and snmp timer gets stopped
During "systemctl restart swss"
snmp stops because of Requisite on swss
snmp unblocks snmp timer from running
swss starts and starts snmp timer
OnUnitActiveSec=0 expires imidiately and start snmp which stops snmp timer
During "systemctl stop snmp"
stop of snmp service unblocks snmp timer but no one starts the timer so it is not started by "OnUnitActiveSec=0"
If we need to stop swss during fast-reboot procedure on the boot up path,
it means that something went wrong, like syncd/orchagent crashed already,
we are stopping and restarting swss/syncd to re-initialize. In this case,
we should proceed as if it is a cold reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Corefile uploader service
1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.
* Remove rw permission for .rc file for group & others.
* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.
* Azure storage upload requires python module futures, hence added it to install list.
* Removed trailing spaces.
* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
if yes the software reboot cause file will be treated as the reboot cause.
finish
2. check whether platform api returns a reboot cause.
if yes it is treated as the reboot cause.
finish.
3. check whether /hosts/reboot-cause contains a cause.
if yes it is treated as the cause otherwise return unknown.
* [process-reboot-cause]Fix review comments
* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG
* [process-reboot-cause]address comments
* refactor the code flow
* Remove escape
* Remove extra ':'
In place editing (sed -i) seems having some issues with filesystem
interaction. It could leave 0 size file or corrupted file behind.
It would be safer to sed the file contents into a new file and switch
new file with the old file.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* ZTP infrastructure changes to support DHCP discovery provisioning data
- Dynamically generate DHCP client configuration based on current ZTP state
- Added support to request and process hostname when using DHCPv6
- Do not process graphservice url dhcp option if ZTP is enabled, ZTP service
will process it
- Generate /e/n/i file with all active interfaces seeking address assignment
via DHCP. Only interfaces that are created in Linux will be added to /e/n/i.
Also DHCP is started only on linked up in-band interfaces.
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.
* Create a SONiC configuration management service
* Perform config db migration after loading config_db.json to redis DB
* Migrate config-setup post migration hooks on image upgrade
config-setup post migration hooks help user to migrate configurations from
old image to new image. If the installed hooks are user defined they will not
be part of the newly installed image. So these hooks have to be migrated to
new image and only then they can be executing when the new image is booting.
The changes in this fix migrate config-setup post-migration hooks and ensure
that any hooks with the same filename in newly installed image are not
overwritten.
It is expected that users install new hooks as per their requirement and
not edit existing hooks. Any changes to existing hooks need to be done as
part of new image and not post bootup.
* Build sonic-ztp package
- Add changes in make rules to conditionally include sonic-ztp package
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
The sflow service should not start unless the swss service is started. However, if this service is not started, the sflow service should not attempt to start them, instead it should simply fail to start. Using Requisite=, we will achieve this behavior, whereas using Requires= will cause the required service to be started.
This PR is to handle the issue 3527.
When device boots up, NTP throws a traceback as explained in the issue 3527.
- Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”.
- Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback.
- This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.
Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
* In the event of a kernel crash, we need to gather as much information
as possible to understand and identify the root cause of the crash.
Currently, the kernel does not provide much information, which make
kernel crash investigation difficult and time consuming.
Fortunately, there is a way in the kernel to provide more information
in the case of a kernel crash. kdump is a feature of the Linux kernel
that creates crash dumps in the event of a kernel crash. This PR
will add kermel kdump support.
An extension to the CLI utilities config and show is provided to
configure and manage kdump:
- enable / disable kdump functionality
- configure kdump (how many kernel crash logs can be saved, memory
allocated for capture kernel)
- view kernel crash logs
* Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml
* Fix bgp templates
* Add community for loopback when bgpd is isolated
* Use correct community value
We noticed in tests/production that there is a low probability failure
where /etc/hosts could have some garbage characters before the entry for
local host name. The consequence is that all sudo command would be very
slow. In extreme cases it would prevent some services from starting
properly.
I suspect that the /etc/hosts file might be opened by some process causing
the issue. Editing contents with new file level and replace the whole file
should be safer.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
While doing CLI changes for SNMP configuration, few changes are made in backend to handle the modified CLI.
** Changes**
- "community" for "snmp trap" is also made as "configurable". snmpd_conf.j2 is modified to handle the same.
- Changed the snmp.yml file generation from postStartAction to preStartAction in docker_image_ctl.j2 specific to SNMP docker, to ensure that the snmp.yml is generated before sonic-cfggen generates the snmpd.conf.
- Changed to make the code common for management vrf and default vrf. Users can configure snmp trap and snmp listening IP for both management vrf and default vrf.
- after reloading minigraph, write latest version string in the DB.
- if old config_db.json file exists, use it and migrate to latest version.
- only reload minigraph when config_db.json doesn't exist and minigraph
exists.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Issue Overview
shutdown flow
For any shutdown flow, which means all dockers are stopped in order, pmon docker stops after syncd docker has stopped, causing pmon docker fail to release sx_core resources and leaving sx_core in a bad state. The related logs are like the following:
INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use.
INFO syncd.sh[23597]: Unloading sx_core[FAILED]
INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use
config reload & service swss.restart
In the flows like "config reload" and "service swss restart", the failure cause further consequences:
sx_core initialization error with error message like "sx_core: create EMAD sdq 0 failed. err: -16"
syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE"
swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This will introduce an extra 1 or 2 minutes for the system to be available, failing related test cases.
reboot, warm-reboot & fast-reboot
In the reboot flows including "reboot", "fast-reboot" and "warm-reboot" this failure doesn't have further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shutdown as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also requires to meet the GR time restriction of BGP which is longer than LACP. In this sense, any unnecessary steps should be avoided. It's better to keep those flows untouched.
summary
To summarize, we have to come up with a way to ensure:
shutdown pmon docker ahead of syncd for "config reload" or "service swss restart" flow;
don't shutdown pmon docker ahead of syncd for "fast-reboot" or "warm-reboot" flow in order to save time.
for "reboot" flow, either order is acceptable.
Solution
To solve the issue, pmon shoud be stopped ahead of syncd stopped for all flows except for the warm-reboot.
- How I did it
To stop pmon ahead of syncd stopped. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequence.
Now pmon stops ahead of syncd so there must be a way in which pmon can start after syncd started. Another point that should be taken consideration is that pmon starting should be deferred so that services which have the logic of graceful restart in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadline.
This is done by add "syncd.service" as "After" to pmon.service and startin /usr/local/bin/syncd.sh::wait()
To start pmon automatically after syncd started.
slave.mk: add SONIC_PLATFORM_API_PY2 as dependency of host
sonic_debian_extension.j2: install sonic_daemon_base and Mellanox-specific sonic_platform on host
mlnx-platform-api.mk: export mlnx_platform_api_py2_wheel_path for sonic_debian_extension.j2
sonic-daemon-base.mk: export daemon_base_py2_wheel_path for sonic_debian_extension.j2
daemon_base.py: hind unnecessary dependency of swss_common on host
* [SNMP] management VRF SNMP support
This commit adds SNMP support for Management VRF using l3mdev.
The patch included provides VRF support, there is no single
"listendevice" configuration, rather multiple agentaddress
config options can each have their own "interface" to bind to
using "ip%interface". The snmpd.conf file is accordingly
generated using the snmp.yml file and redis database info.
Adding below the comments of SNMP patch 1376
--------------------------------------------
Since the Linux kernel added support for Virtual Routing
and Forwarding (VRF) in version 4.3
(Note: these won't compile on non-linux platforms)
https://www.kernel.org/doc/Documentation/networking/vrf.txt
Linux users could not use snmpd in its current form to
bind specific listening IP addresses to specific VRF
devices. A simplified description of a VRF inteface
is an interface that is a master (a container of sorts)
that collects a set of physicalinterfaces to form a
routing table.
This set of two patches (one for V5-7-patches and one
for V5-8-patches branches) is almost identical to patch
single "listendevice" configuration. Rather, multiple
agentAddress config options can each have their own
"interface" to bind to using the <ip>%<interface>
syntax.</interface></ip>
-------------------------------------------
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
This commit adds NTP support for management VRF using L3mdev. Config vrf add
mgmt will enable management VRF, enslave the eth0 device to the master device
mgmt, stop ntp service in default, restart interfaces-configs and restart ntp
service in mgmt-vrf context. Requirement and design are covered in mgmt vrf
design document.
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
Introduce a new "sflow" container (if ENABLE_SFLOW is set). The new docker will include:
hsflowd : host-sflow based daemon is the sFlow agent
psample : Built from libpsample repository. Useful in debugging sampled packets/groups.
sflowtool : Locally dump sflow samples (e.g. with a in-unit collector)
In case of SONiC-VS, enable psample & act_sample kernel modules.
VS' syncd needs iproute2=4.20.0-2~bpo9+1 & libcap2-bin=1:2.25-1 to support tc-sample
tc-syncd is provided as a convenience tool for debugging (e.g. tc-syncd filter show ...)
* Use dot1p to tc mapping for backend switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Do not write DSCP to TC mapping into CONFIG_DB or config_db.json for
storage switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script
Signed-off-by: Danny Allen <daall@microsoft.com>
* Update interval for running cron job
* Respond to feedback
* Change syslog id
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
* Use dot1p to tc mapping for backend switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Do not write DSCP to TC mapping into CONFIG_DB or config_db.json for
storage switches
Signed-off-by: Wenda Ni <wenni@microsoft.com>
[build_debian] Generate checksum of ASIC config files
* Adds script to generate checksums for ASIC config files
* Adds step to build_debian that copies ASIC config checksum into SONiC filesystem
Signed-off-by: Danny Allen daall@microsoft.com
this is the first step to moving different databases tables into different database instances
in this PR, only handle multiple database instances creation based on user configuration at /etc/sonic/database_config.json
we keep current method to create single database instance if no extra/new DATABASE configuration exist in database_config.json file.
if user try to configure more db instances at database_config.json , we create those new db instances along with the original db instance existing today.
The configuration is as below, later we can add more db related information if needed:
{
...
"DATABASE": {
"redis-db-01" : {
"port" : "6380",
"database": ["APPL_DB", "STATE_DB"]
},
"redis-db-02" : {
"port" : "6381",
"database":["ASIC_DB"]
},
}
...
}
The detail description is at design doc at Azure/SONiC#271
The main idea is : when database.sh started, we check the configuration and generate corresponding scripts.
rc.local service handle old_config copy when loading new images, there is no dependency between rc.local and database service today, for safety and make sure the copy operation are done before database try to read it, we make database service run after rc.local
Then database docker started, we check the configuration and generate corresponding scripts/.conf in database docker as well.
based on those conf, we create databases instances as required.
at last, we ping_pong check database are up and continue
Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
As well convert priority to integer for use by sort.
* [service dependent] describe non-warm-reboot dependency outside systemctl
When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
when device disk is small, do not unzip dockerfs.tar.gz on disk.
keep the tar file on the disk, unzip to tmpfs in the initrd phase.
enabled this for 7050-qx32
Signed-off-by: Guohan Lu <gulv@microsoft.com>
- What I did
Move the enabling of Systemd services from sonic_debian_extension to a new systemd generator
- How I did it
Create a new systemd generator to manually create symlinks to enable systemd services
Add rules/Makefile to build generator
Add services to be enabled to /etc/sonic/generated_services.conf to be read by the generator at boot time
Signed-off-by: Lawrence Lee <t-lale@microsoft.com>
ARM Architecture support in SONIC
make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH]
SONIC_ARCH: default amd64
armhf - arm32bit
arm64 - arm64bit
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
This commit adds support for New feature management VRF using L3mdev. Added
commands to enable/disable management VRF. Config vrf add mgmt will enable
management VRF, enslave the eth0 device to the master device mgmt and restart
interfaces-configs in mgmt-vrf context.
management interface (eth0) can be configured using config interface eth0 ip
add command and removed using config interface eth0 ip remove command.
Requirement and design are covered in mgmt vrf design document. Currently show
command displays linux command output; will update show command display in next
PR after concluding what would be the output for the show commands. Added
metric for default routes in dhcp and static, any changes for metric will be
addressed subsequently after discussing.
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
* [warm reboot] save configuration after warm reboot
After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Update finalize-warmboot.sh
* Upgrade ifupdown2 to version 1.2.8
Required by ZTP to support ZTP over IPv6 transport
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
- Make sure that migrated DB contents persisted for next boot
- Make sure that db saved after warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Added debug symbols to many debug dockers.
* For debug images *only*:
1) Archive source files into debug image
2) Archived source is copied into /src
3) Created an empty dir /debug
4) Mount both /src as ro & /debug as rw into every docker
5) Login banner will give some details on /src & /debug
6) Devs can copy core file into /debug and view it from inside a container.
7) Dev may create all gdb logs and other data directly into /debug.
* Dropped redundant REDIS_TOOLS per review comments.
* Added debug symbols to frr package and hence FRR based BGP docker.
* 1) Moved dbg_files.sh to scripts/
2) Src directories to archive are now collected from individual Makefiles.
3) Added few more debug symbols
4) Added few more debug dockers.
Here after no more changes except per review comments.
To debug:
Install required version of debug image in Switch or VM.
Copy core file into /debug of host
Get into Docker
gdb /usr/bin/<daemon> -c /debug/<your core file>
set directory /src/... <-- inside gdb to get the source
For non-in-depth debugging:
Download corresponding debug Docker image (docker-...-dbg.gz) to your VM
Load the image
Run image with entrypoint as 'bash' with dir containing core mapped in.
Run gdb on the core.
* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* [ARISTA] adding 7060_cs32s to eMMC exclusions
Following PR 2774 we added the 7060-cx32s according to the guidelines of
PR 2780
This adds the 7060-cx32s to the list f devices that mount /var/log as a
tmpfs to mitigate eMMC wearout
Signed-off-by: Michel Moriniaux <m.moriniaux@criteo.com>
* fix fast reboot compatibility
We should handle both cases for backward-compatible with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
* [submodule] update sonic-linux-kernel
* update linux kernel version
* Fix many version strings
* update mellanox components (built with new kernel)
* [mlnx] add make files for SDK WJH libs
* Update arista driver submodule (#8)
Make the debian packaging point to a newer kernel version.
* Updated Makefile infrastructure to build debug images.
As a sample, platform/broadcom/docker-orchagent-brcm.mk is updated to add a docker-orchagent-brcm-dbg.gz target.
Now "BLDENV=stretch make target/docker-orchagent-brcm-dbg.gz" will build the debug image.
NOTE: If you don't specify NOSTRETcH=1, it implicitly calls "make stretch", which builds all stretch targets and that would include debug dockers too.
This debug image can be used in any linux box to inspect core file. If your module's external dependency can be suitably mocked, you my even manually run it inside.
"docker run -it --entrypoint=/bin/bash e47a8fb8ed38"
You may map the core file path to this docker run.
* Dropped the regular binary using DBG_PACKAGES and a small name change to help readability.
* Tweaked the changes to retain the existing behavior w.r.t INSTALL_DEBUG_TOOLS=y.
When this change ('building debug docker image transparently') is extended to all dockers, this flag would become redundant. Yet, there can be some test based use cases that rely on this flag.
Until after all the dockers gets their debug images by default and we switch all use cases of this flag to use the newly built debug images, we need to maintain the existing behavior.
* 1) slave.mk - Dropped unused Docker build args
2) Debug template builder: renamed build_dbg_j2.sh to build_debug_docker_j2.sh
3) Dropped insignifcant statement CMD from debug Docker file, as base docker has Entrypoint.
* Reverted some changes, per review comments.
"User, uid, guid, frr-uid & frr-guid" are required for all docker images, with exception of debug images.
* Get in sync with the new update that filters out dockers to be built (SONIC_STRETCH_DOCKERS_FOR_INSTALLERS) and build debug-dockers only for those to be built and debug target is available.
* Mkae a template for each target that can be shared by all platforms.
Where needed a platform entry can override the template.
This avoids duplication, hence easier to maintain.
* A small change, that can fit better with other targets too.
Just take the platform code and do the rest in template.
* Extended debug to all stretch based docker images
* 1) Combined all orchagent makefiles into one platform independent make under rules/docker-orchagent.mk
2) Extened debug image to all stretch dockers
* Changes per review comments:
1) Dropped LIBSAIREDIS_DBG from database, teamd, router-advertiser, telemetry, and platform-monitor docker*.mk files from _DBG_DEPENDS list
2) W.r.t docker make for syncd, moved DEPENDS from template to specific makefile and let the template has stuff that is applicable to all.
* 1) Corrected a copy/paste mistake
* Fixed a copy/paste bug
* The base syncd dockers follow a template, which defines the base docker as DOCKER_SYNCD_BASE instead of DOCKER_SYNCD_<platform code>. Fix the docker-syncd-<mlnx, bfn>.mk to use the new one.
[Yet to be tested locally]
* Fixed spelling mistake
* Enable build of dbg-sonic-broadcom.bin, which uses dbg-dockers in place of regular dockers, for dockers that build debug version. For dockers that do not build debug version, it uses the regular docker.
This debug bin is installable and usable in a DUT, just like a regular bin.
* Per review comments:
1) Share a single rule for final image for normal & debug flavors (e.g. sonic-broadcom.bin & sonic-broadcom-dbg.bin)
2) Put dbg as suffix in final image name.
3) Compared target/sonic-broadcom.bin.logs with & w/o fix to verify integrity of sonic-broadcom.bin
4) Compared target/sonic-broadcom.bin.logs with sonic-broadcom-dbg.bin.log for verification
This fix takes care of ONIE image only. The next PR will cover the rest.
The next PR, will also make debug image conditional with flag.
* Updated per comments.
Now that debug dockers are available, do not need a way to install debug symbols in regular dockers.
With this commit, when INSTALL_DEBUG_TOOLS=y is set, it builds debug dockers (for dockers that enable debug build) and the final image uses debug dockers. For dockers that do not enable debug build, regular dockers get used in the final image.
Note:
The debug dockers are explicitly named as <docker name>-dbg.gz. But there is no "-dbg" suffix for image.
Hence if you make two runs with and w/o INSTALL_DEBUG_TOOLS=y, you have complete set of regular dockers + debug dockers. But the image gets overwritten.
Hence if both regular & debug images are needed, make two runs, as one with INSTALL_DEBUG_TOOLS=y and one w/o. Make sure to copy/rename the final image, before making the second run.
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes
* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround
However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
* Add boot0 support for the 7280CR3
* Add platform and plugins for 7280CR3
* Add port config for 7280CR3
* Add platform_reboot for 7280CR3
* Add support for 7280CR3-32D4 based on the 7280CR3-32P4
* Update arista driver submodules
- Introduce new 7280CR3-32P4
- Improve to the led plugin for OSFP
* Fix showing systemd shutdown sequence when verbose is set
* Fix creation of kernel-cmdline file
Sometimes boot0 prints error
"mv: can't preserve ownership of '/mnt/flash/image-arsonic.xxxx/kernel-cmdline': Operation not permitted"
* Improve flash space usage during installation
Some older systems only have 2GB of flash available. Installing a second
image on these can prove to be challenging.
The new installation process moves the installer swi to memory in order
to avoid free up space from the flash before uncompressing it there.
It removes all the flash space usage spike and also improves the IO
since the installation is no more reading and writting to the flash at
the same time.
* Add support of 7060CX-32S-SSD
* 7260CX3: use inventory powerCycle procedures
* 7050QX-32S: use inventory powerCycle procedures
* 7050QX-32: use inventory powerCycle procedures
* platform: arista: add common platform_reboot
Replace platform_reboot by a link to new common for devices already
using a similar script.
* 7060CX-32S: use inventory powerCycle procedures
* Install python smbus in pmon
Some platform plugin need the python smbus library to perform some actions.
This installs the dependency.
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.
* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
- Add ebtables package, and install some filter rules:
1. ebtables -A FORWARD -d BGA -j DROP
2. ebtables -A FORWARD -p ARP -j DROP
Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- use superviord to manage process in frr docker
- intro separated configuration mode for frr
- bring quagga configuration template to frr.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
SONiC is a heavy writer to /var/log partition, we noticed that this
behavior causes certain flash drive to become read-only over time.
To avoid this issue, we mount /var/log parition on these devices as
tmpfs.
- Mount /var/log as tmpfs
- /var/log default size is 128M
- Adjust size according to existing var-log.ext4 file size.
- Adjust size to between 5% to 10% of total memory size.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Install ipaddress python package that has deprecated current ipaddr. ipaddress has backport to python2.7
* Install python ipaddress module as required by route_check.py sonic utility. BTW, ipaddress deprecates ipaddr and ipaddress has python2 backport
* Revert the old chaneg per review comments.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).
Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.
- What I did
removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it
build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog
Allow configuration of platform-specific interfaces
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
We are going to use initramfs hook for firmware upgrades
To install Arista hook:
- create folder /mnt/flash/<image dir>/platform/hooks/boot1/ from Aboot or
/host/<image dir>/platform/hooks/boot1/ from Sonic
- add executable script to created folder
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Add a log message for each notification of add/del TACACS server.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* Moved another syslog message from DEBUG to INFO to be able to see those notifications.
All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* [build]: put stretch debian packages under target/debs/stretch/
* in stretch build phase, all debian packages built in that stage are placed under target/debs/stretch directory.
* for python-based debian packages, since they are really the same for jessie and stretch, they are placed under target/python-debs directory.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Removed deployment_id_asn_map.yml from copy list
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* QoS config change: 1) DSCP mapping; 2) link pg/queue 6 to lossy buffer;
3) redistribute scheduler
Signed-off-by: Wenda <wenni@microsoft.com>
* Add scheduling weight to queue 2
Signed-off-by: Wenda <wenni@microsoft.com>
* Link pg/queue 2 to lossy buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the pg headroom for a7060-D48C8 50G
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for qos
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom size, and update egress lossy pool size accordingly
Signed-off-by: Wenda <wenni@microsoft.com>
* Update headroom pool size; Update ingress service pool and egress lossy
pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* a7260: update headroom pool size; Update ingress service pool and egress lossy pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades
* [sonic-utilities] Update submodule to include related changes
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.
Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.
- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.
- How to verify it
Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.
* Perform stop/start of Mellanox driver tools for all types of reboot
* Don't set Mellanox FAST_BOOT option for "cold" reboot
* Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
* [security kernel] Upgrade kernel from 4.9.110-3+deb9u2 to 4.9.110-3+deb9u6
short version: 4.9.0-7 to 4.9.0-8
See changelogs for security fixes:
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.110-3deb9u6
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Update sonic-linux-kernel submodule after it was merged
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* [update graph] adapt to warm reboot scenario
When migrating configuration, always copy config files from old_config
to /etc/sonic. But if warm reboot is detected, then skip configuration
operations.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* log file copies and misses
* 1) DSCP 46 to 5; 2) ecn config for lossless traffic; 3) ecn on by default; 4) DWRR equal weight;
Signed-off-by: Wenda <wenni@microsoft.com>
* 1) link pg & queue 5 to lossy buffer profile; 2) ingress lossless alpha 1/8
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the test case for qos & buffer json template
Signed-off-by: Wenda <wenni@microsoft.com>
* Migrate a7050-qx32 and s6000 to use pg_profile lookup architecture
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom egress service pool for a7050-qx-32s, a7050-qx32, and s6000
Signed-off-by: Wenda <wenni@microsoft.com>
* Link queue 5 to lossy profile
Signed-off-by: Wenda <wenni@microsoft.com>
* Update arista drivers submodule
* Ignore the possible timestamp warning in tar extraction
* Add verbosity toggle to boot0
Console logging is slow because of the 9600 baud rate.
Some time can be saved by decreasing the console verbosity.
* Add hook mechanism in boot0.
Support additional features in boot0 via hooks.
Hooks are unpacked and executed at post-install or pre-exec time.
* Fix 7170 sensors.conf file
Fix critical temperature settings for MAX6658 sensors
* Fix the random swap of storage devices
For arista 7050 switches running with linux 4.9, it is likely the device
name of flash drive (/dev/sda) and usb (/dev/sdb) randomly swap in kernel
booting, depending on which one is ready first. It breaks the expectation
that flash will be mounted as root by setting root=/dev/sda1. This patch
will correct ROOT to flash device refering to the path under block_flash.
* Fix 7170 fancontrol
* Do not remove aquota.user file in boot0
This file is a filesystem protected file used by EOS.
It can be simply removed and will make the SONiC installation failed if
not skipped.
init_interfaces meant to be sonic init interfaces configuration file.
However, it needs to be copied to the right file name to take effect.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
There was a fix to speed up initialization when networking used init.d
but it did not carry over to systemd networking.service. This fix will
apply the same change on the systemd service.
The result is much less time spent being blocked in networking.service.
* [warmboot] Load database from `redis-cli save`
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Add trivial statement to make bash function valid
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Update submodule sonic-utilities: Use 'redis-cli save' to dump database to file
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Move configdb-load.sh outside docker, and only run in cold
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Fix for more strict warm check
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
This driver should be loaded by sonic service. If kernel tries to load
it, the driver would be loaded with default parameters, which is not
right for sonic.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix redis-py version
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Update submodule sonic-py-swsssdk: Fix redis-py version to 2.10.6
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Clear WARM_RESTART table could cause component level warm restart to
fail due to missing WARM_RESTART state.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- cold shutdown is used by regular service stop and/or fast reboot
- warm shutdown is used by warm restart and/or warm reboot
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This script shall not flush all the entries in the state database
when it starts up, since there are entries maintained and written
by other processes outside this docker.
The issue we noticed was that the portchannel states are cleaned
up after teamsyncd writes the entries into the database, which
causes the IPs failed to be configured because intfmgrd considers
the portchannels are not ready yet.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Auto negotiating console speed could cause sonic to lock on a wrong
speed under rare conditions. The only way to come out of the wrong
speed is to issue line break or restart console service with forced
speed, or reboot sonic.
Lock down the console speed to avoid these situations.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Remove the teamd.j2 templates used for starting the teamd. Add
teammgrd instead to manage all port channel related configuration
changes. Remove front panel port related configurations in
interfaces.j2 templates as well.
Remove teamd.sh script and use teammgrd to start all the teamd
processes. Remove all the logics in the start.sh script as well.
Update the sonic-swss submodule.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* Unify qos config with qos_config.j2 template
Signed-off-by: Wenda <wenni@microsoft.com>
* Change 7050 to use qos config template
Signed-off-by: Wenda <wenni@microsoft.com>
modified: device/arista/x86_64-arista_7050_qx32/Arista-7050-QX32/qos.json.j2
modified: device/arista/x86_64-arista_7050_qx32s/Arista-7050-QX-32S/qos.json.j2
* Change a7060, a7260, s6000, s6100, z9100 to use qos config template
Signed-off-by: Wenda <wenni@microsoft.com>
* Change mlnx devices to use qos config template
Signed-off-by: Wenda <wenni@microsoft.com>
modified: ../../../mellanox/x86_64-mlnx_msn2100-r0/ACS-MSN2100/qos.json.j2
modified: ../../../mellanox/x86_64-mlnx_msn2410-r0/ACS-MSN2410/qos.json.j2
modified: ../../../mellanox/x86_64-mlnx_msn2700-r0/ACS-MSN2700/qos.json.j2
modified: ../../../mellanox/x86_64-mlnx_msn2700-r0/Mellanox-SN2700-D48C8/qos.json.j2
* Change barefoot devices to use qos config template
Signed-off-by: Wenda <wenni@microsoft.com>
modified: barefoot/x86_64-accton_wedge100bf_32x-r0/montara/qos.json.j2
modified: barefoot/x86_64-accton_wedge100bf_65x-r0/mavericks/qos.json.j2
* Change accton as7212 to use qos config template
Signed-off-by: Wenda <wenni@microsoft.com>
modified: accton/x86_64-accton_as7212_54x-r0/AS7212-54x/qos.json.j2
* Apply PORT_QOS_MAP to active ports only
Signed-off-by: Wenda <wenni@microsoft.com>
* Update qos config test with qos_config.j2 template
Signed-off-by: Wenda <wenni@microsoft.com>
* Update sample output of qos-dell6100.json
Signed-off-by: Wenda <wenni@microsoft.com>
* Remove generating the default port name and index list, i.e., remove the generate_port_lists macro, because PORT is always defined
Signed-off-by: Wenda <wenni@microsoft.com>
* Include pfc_to_pg_map according to platform asic type obtained from
/etc/sonic/sonic_version.yml rather than specifying per hwsku
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Customize TC_TO_PRIORITY_GROUP_MAP and
PFC_PRIORITY_TO_PRIORITY_GROUP_MAP for barefoot
Signed-off-by: Wenda <wenni@microsoft.com>
* Unify PFC_PRIORITY_TO_PRIORITY_GROUP_MAP: remove "0":"0", "1":"1" as
these two pgs do not generate PFC frames.
Signed-off-by: Wenda <wenni@microsoft.com>
Update the hw-mgmt to latest release V.2.0.0060.
Update the related files according to the latest hw-mgmt.
Signed-off-by: Kevin Wang <kevinw@mellanox.com>
Flashes used for the 7050QX-32 and 7050QX-32S have a fw issue.
The best option to solve the problem is to upgrade to a newer firmware.
However this can only be done while in memory and take 10 seconds.
Adding an upgrade mechanism is possible but would need more
consideration as flashing the firmware and reformating the flash will
exceed the fast-reboot requirements.
A quick mitigation is to align the ext4 partition that we create on
these vfat based system on a 4k boundary.
Here we chose 1M instead but it's the same.
Newer version of sfdisk do this automatically but the one in SONiC
today doesn't have this behavior.
This workaround will only reduce the pace of the flash health
degradation. The only long term fix is to flash the firmware.
* Adapt to the new WARM_RESTART_TABLE table schema: change from restart_count to restore_count
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update variable and function name to match restore_count name change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update swss submodule for warm restart schema change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* [syncd] warn shutdown syncd process when warm boot is enabled
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [warmboot] mount folder to hold warmboot temporary files
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix a typo
* [swss.sh] refactor ssh service script code
- Move checks and waits to helper functions.
- Remove early returns from code stream
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [swss.sh] Add debug log for service state changes
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [syncd] Separate out syncd service from swss service
Still make them start/stop/restart synchronously so existing scripts
continue working.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Remove extra 'After' in swss service and remove syncd docker warm boot code
Syncd warm boot needs more thinking, we can put it back once the work
flow has been defined and ready for coding/testing.
* [syncd] syncd start/stop/restart shouldn't affect swss state
Semi-detach syncd service state change from swss:
- swss state change still chase syncd service to follow except warm boot
- syncd state change will only affect itself.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* add missing '{'
* [updategraph] add support to use preset config instead of default minigraph
* Fix variable case
* Remove default minigraph case
* Remove default minigraphs and add default_sku files
Currently setting the next boot image is the same as setting a default
image.
With this change SWI_DEFAULT= will be considered the default image and
SWI= the next image.
When executing the boot0 SWI= will be overriden by SWI_DEFAULT= if it
exists and create in with the value of SWI= otherwise.
On overlay filesystem the name of the mountpoint will also match in the
mount command for overlayfs as upperdir=
To prevent detecting the wrong partition we now look for space before.
This ensure that we match mountpoint and not devices in df and mount
outputs.
- Move front panel ports and port channels MTU and IP configurations out of
the current /etc/network/interfaces file and store them in the configuration
database.
- The default MTU value for both front panel ports and the port channels is
9100. They are set via the minigraph or 9100 by default.
- Introduce portmgrd which will pick up the MTU configurations from the
configuration database.
- The updated intfmgrd will pick up IP address changes from the configuration
database.
- Update sonic-swss submodule
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Previously use / to separate container name and program name.
However, in rsyslogd:
Precisely, the programname is terminated by either (whichever occurs first):
end of tag
nonprintable character
‘:’
‘[‘
‘/’
The above definition has been taken from the FreeBSD syslogd sources.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Fix potential blackholing/looping traffic and refresh ipv6 neighbor to avoid CPU hit
In case ipv6 global addresses were configured on L3 interfaces and used for peering,
and routing protocol was using link-local addresses on the same interfaces as prefered nexthops,
the link-local addresses could be aged out after a while due to no activities towards the link-local
addresses themselves. And when we receive new routes with the link-local nexthops, SONiC won't insert
them to the HW, and thus cause looping or blackholing traffic.
Global ipv6 addresses on L3 interfaces between switches are refreshed by BGP keeplive and other messages.
On server facing side, traffic may hit fowarding plane only, and no refresh for the ipv6 neighbor entries regularly.
This could age-out the linux kernel ipv6 neighbor entries, and HW neighbor table entries could be removed,
and thus traffic going to those neighbors would hit CPU, and cause traffic drop and temperary CPU high load.
Also, if link-local addresses were not learned, we may not get them at all later.
It is intended to fix all above issues.
Changes:
Add ndisc6 package in swss docker and use it for ipv6 ndp ping to update the neighbors' state on Vlan interfaces
Change the default ipv6 neighbor reachable timer to 30mins
Add periodical ipv6 multicast ping to ff02::11 to get/refresh link-local neighbor info.
* Fix review comments:
Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping
format issue
* Combine regular L3 interface and portchannel interface for looping
* Add ndisc6 package to vs docker