Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms
Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
While testing reboot case for 201911 facing error:
supervisor-proc-exit-listener FATAL command at '/usr/bin/supervisor-proc-exit-listener' is not executable
Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
When building the SONiC image, used systemd to mask all services which are set to "disabled" in init_cfg.json.
This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph will fail when trying to restart disabled services.
- Ensure all features (services) are in the configured state when hostcfgd starts
- Better functionalization of code
- Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword.
This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
While migrating to SONiC 20181130, identified a couple of issues:
1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127.
2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.
Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.
* Fix the Build on 201911 (Stretch) where the directory
/usr/lib/systemd/system/ does not exist so creating
manually. Change should not harm Master (buster) where
the directory is created by Linux
* Fix as per review comments
**- Why I did it**
When I tested auto-restart feature of swss container by manually killing one of critical processes in it, swss will be stopped. Then syncd container as the peer container should also be
stopped as expected. However, I found sometimes syncd container can be stopped, sometimes
it can not be stopped. The reason why syncd container can not be stopped is the process
(/usr/local/bin/syncd.sh stop) to execute the stop() function will be stuck between the lines 164 –167. Systemd will wait for 90 seconds and then kill this process.
164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done
The first thing I did is to profile how long this while loop will spin if syncd container can be
normally stopped after swss container is stopped. The result is 5 seconds or 6 seconds. If syncd
container can be normally stopped, two messages will be written into syslog:
str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134
The second thing I did was to add a timer in the condition of while loop to ensure this while loop will be forced to exit after 20 seconds:
After that, the testing result is that syncd container can be normally stopped if swss is stopped
first. One more thing I want to mention is that if syncd container is stopped during 5 seconds or 6 seconds, then the two log messages can be still seen in syslog. However, if the execution
time of while loop is longer than 20 seconds and is forced to exit, although syncd container can be stopped, I did not see these two messages in syslog. Further, although I observed the auto-restart feature of swss container can work correctly right now, I can not make sure the issue which syncd container can not stopped will occur in future.
**- How I did it**
I added a timer around the while loop in stop() function. This while loop will exit after spinning
20 seconds.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image
**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here:
1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop
2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots
3. Substitute returns with continues so that even if one service fails, we still try to handle the others
Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.
* Run fsck filesystem check support prior mounting filesystem
If the filesystem become non clean ("dirty"), SONiC does not run fsck to
repair and mark it as clean again.
This patch adds the functionality to run fsck on each boot, prior to the
filesystem being mounted. This allows the filesystem to be repaired if
needed.
Note that if the filesystem is maked as clean, fsck does nothing and simply
return so this is perfectly fine to call fsck every time prior to mount the
filesystem.
How to verify this patch (using bash):
Using an image without this patch:
Make the filesystem "dirty" (not clean)
[we are making the assumption that filesystem is stored in /dev/sda3 - Please adjust depending of the platform]
[do this only on a test platform!]
dd if=/dev/sda3 of=superblock bs=1 count=2048
printf "$(printf '\\x%02X' 2)" | dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null
dd of=/dev/sda3 if=superblock bs=1 count=2048
Verify that filesystem is not clean
tune2fs -l /dev/sda3 | grep "Filesystem state:"
reboot and verify that the filesystem is still not clean
Redo the same test with an image with this patch, and verify that at next reboot the filesystem is repaired and becomes clean.
fsck log is stored on syslog, using the string FSCK as markup.
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.
This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.
Modified caclmgrd behavior to enhance control plane security as follows:
Upon starting or receiving notification of ACL table/rule changes in Config DB:
1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions
2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute
3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute
4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages
5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets
6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets
7. Add iptables/ip6tables commands to allow all incoming BGP traffic
8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP)
9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured)
10. Add iptables rules to drop all packets destined for loopback interface IP addresses
11. Add iptables rules to drop all packets destined for management interface IP addresses
12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses
13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses
14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute)
15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets
* [ntp] enable/disable NTP long jump according to reboot type
- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warrm/fast reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* fix typo
* further refactoring
* use sonic-db-cli instead
* Changes to support config-setup service for multi-npu
platforms. For Multi-npu we are not supporting as of
now config initializtion and ZTP. It will support creating
config db from minigraph or using config db from previous
file system
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments.
* Address Review comments
* Address Review Comments of using pyhton based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also while running from shell we will skip stop/start services
done by those commands.
* Updated to use python command so no code duplication.
and then we load image and reboot even if there was existing
config_db.json we will look for DHCP Service. we should disbale
update_graph in such cases. This behaviour is silimar to what we have in
201811 image.
* Changes for LLDP for Multi NPU Platoforms:-
a) Enable LLDP for Host namespace for Management Port
b) Make sure Management IP is avaliable in per asic namespace
needed for LLDP Chassis configuration
c) Make sure chassis mac-address is correct in per asic namespace
d) Do not run lldp on eth0 of per asic namespace and avoid chassis
configuration for same
e) Use Linux hostname instead from Device Metadata for lldp chassis
configuration since in multi-npu platforms device metadata hostname
will be differnt
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comment with following changes:
a) Use Device Metadata hostname even in per namespace conatiner.
updated minigraph parsing for same to have hostname as system
hostname and add new key for asic name
b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace
config also as needed for LLDP for setting chassis management IP.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address Review Comments
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host
* Updates based on comments
* Adding the j2 templates for database_config and database_global files.
* Updating to retrieve the redis DIR's to be mounted from database_global.json file.
* Additional check to see if asic.conf file exists before sourcing it.
* Updates based on PR comments discussion.
* Review comments update
* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.
* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.
* Removing the database_config.json file to avioid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.
* Update the comments for sudo usage in docker_image_ctrl.j2
* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.
* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli
* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
1. ebtables -t filter -A FORWARD -p 802_1Q --vlan-encap 0806 -j DROP
The ARP packet with vlan tag can't match the default rule.
Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices
Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present
Signed-off-by: Neetha John <nejo@microsoft.com>
The one big bgp configuration template was splitted into chunks.
Currently we have three types of bgp neighbor peers:
general bgp peers. They are represented by CONFIG_DB::BGP_NEIGHBOR table entries
dynamic bgp peers. They are represented by CONFIG_DB::BGP_PEER_RANGE table entries
monitors bgp peers. They are represented by CONFIG_DB::BGP_MONITORS table entries
This PR introduces three templates for each peer type:
bgp policies: represent policieas that will be applied to the bgp peer-group (ip prefix-lists, route-maps, etc)
bgp peer-group: represent bgp peer group which has common configuration for the bgp peer type and uses bgp routing policy from the previous item
bgp peer-group instance: represent bgp configuration, which will be used to instatiate a bgp peer-group for the bgp peer-type. Usually this one is simple, consist of the referral to the bgp peer-group, bgp peer description and bgp peer ip address.
This PR redefined constant.yml file. Now this file has a setting for to use or don't use bgp_neighbor metadata. This file has more parameters for now, which are not used. They will be used in the next iteration of bgpcfgd.
Currently all tests have been disabled. I'm going to create next PR with the tests right after this PR is merged.
I'm going to introduce better bgpcfgd in a short time. It will include support of dynamic changes for the templates.
FIX:: #4231
Install kubeadm, which transparently installs kubelet & kubectl
As well download required Kubernetes images required to run as kubernetes node.
The kubelet service is intentionally kept in disabled state, as it would otherwise
continuously restart wasting resources, until join to master.
sonic-netns-exec fails to execute below command in swss.sh:
sonic-netns-exec "$NET_NS" sonic-db-cli $1 EVAL "
local tables = {$2}
for i = 1, table.getn(tables) do
local matches = redis.call('KEYS', tables[i])
for j,name in ipairs(matches) do
redis.call('DEL', name)
end
end" 0
This command fails with error " redis.exceptions.ResponseError: value is not an integer or out of range" .
Root cause:
When sonic-netns-exec executes the above function, argument passed to sonic-db-cli is NOT executed as a single script.
The argument is passed as separate keywords to sonic-db-cli, as below:
['EVAL', 'local', 'tables', '=', "{'PORT_TABLE*'}", 'for', 'i', '=', '1,', 'table.getn(tables)', 'do', 'local', 'matches', '=', "redis.call('KEYS',", 'tables[i])', 'for', 'j,name', 'in', 'ipairs(matches)', 'do', "redis.call('DEL',", 'name)', 'end', 'end', '0']
- How I did it
To make sure that the parameters are passed as they were set initially, fix sonic-netns-exec to use double quoted "$@", where "$@" is "$1" "$2" "$3" ... "${N}"
After fix, the argument passed to sonic-db-cli is as below:
Argument passed to sonic-db-cli:
['EVAL', "\n local tables = {'PORT_TABLE*'}\n for i = 1, table.getn(tables) do\n local matches = redis.call('KEYS', tables[i])\n for j,name in ipairs(matches) do\n redis.call('DEL', name)\n end\n end", '0']
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
* Fix the CMD for the PROCESSSTATS entries so that
there is a space between the command name and the
arguments.
Signed-off-by: Garrick He <garrick_he@dell.com>
Instead of updating hostname manualy on Config DB hostname change,
simply share containers UTS namespace with host OS.
Ideally, instead of setting `--uts=host` for every container in SONiC,
this setting can be set per container if feature requires.
One behaviour change is introduced in this commit, when `--privileged`
or `--cap-add=CAP_SYS_ADMIN` and `--uts=host` are combined, container
has privilege to change host OS and every other container hostname.
Such privilege should be fixed by limiting containers capabilities.
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
- What I did
Add configuration to avoid ntpd from panic and exit if the drift between new time and current system time is large.
- How I did it
Added "tinker panic 0" in ntp.conf file.
- How to verify it
[this assumes that there is a valid NTP server IP in config_db/ntp.conf]
Change the current system time to a bad time with a large drift from time in ntp server; drift should be greater than 1000s.
Reboot the device.
Before the fix:
3. upon reboot, ntp-config service comes up fine, ntp service goes to active(exited) state without any error message. This is because the offset between new time (from ntp server) and the current system time is very large, ntpd goes to panic mode and exits. The system continues to show the bad time.
After the fix:
3. Upon reboot, ntp-config comes up fine, ntp services comes up from and stays in active (running) state. The system clock gets synced with the ntp server time.
This patch upgrade the kernel from version
4.9.0-9-2 (4.9.168-1+deb9u3) to 4.9.0-11-2 (4.9.189-3+deb9u2)
Co-authored-by: rajendra-dendukuri <47423477+rajendra-dendukuri@users.noreply.github.com>
Take advantage of an SDK environment variable to customize the location where sdk_socket exists.
In the latest SDK sdk_socket has been moved from /tmp to /var/run which is a better place to contain this kind of file.
However, this prevents the subdirs under /var/run from being mapped to different volumes. To resolve this, we take advantage of an SDK variable to designate the location of sdk_socket.
This requires every process that requires to access sdk_socket have this environment variable defined. However, to define environment variable for each process is less scalable. We take advantage of the docker scope environment variable to avoid that.
It depends on PR 4227
* ZTP infrastructure changes to support DHCP discovery provisioning data
- Dynamically generate DHCP client configuration based on current ZTP state
- Added support to request and process hostname when using DHCPv6
- Do not process graphservice url dhcp option if ZTP is enabled, ZTP service
will process it
- Generate /e/n/i file with all active interfaces seeking address assignment
via DHCP. Only interfaces that are created in Linux will be added to /e/n/i.
Also DHCP is started only on linked up in-band interfaces.
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
* Build sonic-ztp package
- Add changes in make rules to conditionally include sonic-ztp package
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Fix the issue of incorrectly skipping the convertfs hook when fast-reboot from EOS, by adding an extra kernel cmdline param "prev_os" to differentiate fast-reboot from EOS and from SONiC.
This is because we still do disk conversion for fast reboot from eos to sonic, like format the disk.
* [database] Implement the auto-restart feature for database container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] Remove the duplicate dependency in service files. Since we
already have updategraph ---> config_setup ---> database, we do not need
explicitly add database.service in all other container service files.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Reorganize the line 73 in event listener script.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [database] update the file sflow.service.j2 to remove the duplicate
dependency.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add comments in event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Update the comments in line 56.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [event listener] Add parentheses for if statement in line 76 in event listener.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Add a new table CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Update the content of table CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Use the template to generate the table
CONTAINER_FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Add a new table FEATURE.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Change the order of container names according to
alphabetical order.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [init_cfg.json] Change the dhcp_relay container name and add rest-api.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector
* update comment for a potential bug
* update comment
* add TODO maker as review reqirement
- move single instance services into their own folder
- generate Systemd templates for any multi-instance service files in slave.mk
- detect single or multi-instance platform in systemd-sonic-generator based on asic.conf platform specific file.
- update container hostname after creation instead of during creation (docker_image_ctl)
- run Docker containers in a network namespace if specified
- add a service to create a simulated multi-ASIC topology on the virtual switch platform
Signed-off-by: Lawrence Lee <t-lale@microsoft.com>
Signed-off-by: Suvarna Meenakshi <Suvarna.Meenaksh@microsoft.com>
* Changes in sonic-buildimage for the NAT feature
- Docker for NAT
- installing the required tools iptables and conntrack for nat
Signed-off-by: kiran.kella@broadcom.com
* Add redis-tools dependencies in the docker nat compilation
* Addressed review comments
* add natsyncd to warm-boot finalizer list
* addressed review comments
* using swsscommon.DBConnector instead of swsssdk.SonicV2Connector
* Enable NAT application in docker-sonic-vs
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
This is an addendum to #3958, which also instructs apt to ignore the "Valid Until" date in Release files inside the slave containers, making a complete solution, much like the previously abandoned PR #2609. This patch also unifies file names and contents.
When the Debian team archives a repo, it stops updating the "Valid Until" date, thus apt-get will not apply updates for that repo unless we explicitly tell it to ignore the "Valid Until" date. Also, this has become an issue with active (i.e., non-archived) repos twice in the past year because the Debian folks seem to occasionally let the expiration lapse before updating the date. This will cause SONiC builds to fail with a message like E: Release file for http://debian-archive.trafficmanager.net/debian-security/dists/jessie/updates/InRelease is expired (invalid since 3d 3h 11min 20s). Updates for this repository will not be applied. until the dates have been updated and propagated to all mirrors. With this patch, SONiC should no longer be affected by lapsed "Valid Until" dates, whether they be accidental or purposeful.
* Create a SONiC configuration management service
* Perform config db migration after loading config_db.json to redis DB
* Migrate config-setup post migration hooks on image upgrade
config-setup post migration hooks help user to migrate configurations from
old image to new image. If the installed hooks are user defined they will not
be part of the newly installed image. So these hooks have to be migrated to
new image and only then they can be executing when the new image is booting.
The changes in this fix migrate config-setup post-migration hooks and ensure
that any hooks with the same filename in newly installed image are not
overwritten.
It is expected that users install new hooks as per their requirement and
not edit existing hooks. Any changes to existing hooks need to be done as
part of new image and not post bootup.
* Added sonic-mgmt-framework as submodule / docker
* fix build issues
* update sonic-mgmt-framework submodule branch to master
* Merged changes 70007e6d2ba3a4c0b371cd693ccc63e0a8906e77..00d4fcfed6a759e40d7b92120ea0ee1f08300fc6
00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 Modified environemnt variables
* Changes to build sonic-mgmt-framework docker
* bumped up sonic-mgmt-framework commit-id
* version bump for sonic-mgmt-framework commit-it
* bumped up sonic-mgmt-framework commit-id
* Add python packages to docker
* Build fix for docker with python packages
* added libyang as dependent package
* Allow building images on NFS-mounted clones
Prior to this change, `build_debian.sh` would generate a Debian
filesystem in `./fsroot`. This needs root permissions, and one of the
tests that is performed is whether the user can create a character
special file in the filesystem (using mknod).
On most NFS deployments, `root` is the least privileged user, and cannot
run mknod. Also, attempting to run commands like rm or mv as root would
fail due to permission errors, since the root user gets mapped to an
unprivileged user like `nobody`.
This commit changes the location of the Debian filesystem to `/fsroot`,
which is a tmpfs mount within the slave Docker. The default squashfs,
docker tarball and zip files are also created within /tmp, before being
copied back to /sonic as the regular user.
The side effect of this change is that the contents of `/fsroot` are no
longer available once the slave container exits, however they are
available within the squashfs image.
Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>
* bumped up sonc-mgmt-framework commit to include PR #18
* REST Server startup script is enahnced to read the settings from
ConfigDB. Below table provides mapping of db field to command line
argument name.
============================================================
ConfigDB entry key Field name REST Server argument
============================================================
REST_SERVER|default port -port
REST_SERVER|default client_auth -client_auth
REST_SERVER|default log_level -v
DEVICE_METADATA|x509 server_crt -cert
DEVICE_METADATA|x509 server_key -key
DEVICE_METADATA|x509 ca_crt -cacert
============================================================
* Replace src/telemetry as submodule to sonic-telemetry
* Update telemetry commit HEAD
* Update sonic-telemetry commit HEAD
* libyang env path update
* Add libyang dependency to telemetry
* Add scripts to create JSON files for CLI backend
Scripts to create /var/platform/syseeprom and /var/platform/system, which are back-end
files for CLI, for system EEPROM and system information.
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* In startup script, create directory where CLI back-end files live
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* build dependency pkgs added to docker for build failure fix
* Changes to fix build issue for mgmt framework
* Fix exec path issue with telemetry
* s5232[device] PSU detecttion and default led state support
* Processing of first boot in rc.local should not have premature exit
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* docker mount options added for platform, system features
* bumped up sonic-mgmt-framework commit id to pick 23rd July 2019 changes
* Added mount options for telemetry docker to get access for system and platform info.
* Update commit for sonic-utilities
* [dell]: Corrected dport map and renamed config files for S5232F
* Fix telemetry submodule commit
* added support for sonic-cli console
* [Dell S5232F, Z9264F] Harden FPGA driver kernel module
For Dell S5232F and Z9264F platforms, be more strict when checking state
in ISR of FPGA driver, to harden against spurious interrupts.
Signed-off-by: Howard Persh <Howard_Persh@dell.com>
* update mgmt-framework submodule to 27th Aug commit.
* remove changes not related to mgmt-framework and sonic-telemetry
* Revert "Replace src/telemetry as submodule to sonic-telemetry"
This reverts commit 11c3192975.
* Revert "Replace src/telemetry as submodule to sonic-telemetry"
This reverts commit 11c3192975.
* make submodule changes and remove a change not related to PR
* more changes
* Update .gitmodules
* Update Dockerfile.j2
* Update .gitmodules
* Update .gitmodules
* Update .gitmodules
reverting experimental change
* Removed syspoll for release_1.0
Signed-off-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com>
* Update docker-sonic-mgmt-framework.mk
* Update sonic-mgmt-framework.mk
* Update sonic-mgmt-framework.mk
* Update docker-sonic-mgmt-framework.mk
* Update docker-sonic-mgmt-framework.mk
* Revert "Processing of first boot in rc.local should not have premature exit"
This reverts commit e99a91ffc2.
* Remove old telemetry directory
* Update docker-sonic-mgmt-framework.mk
* Resolving merge conflict with Azure
* Reverting the wrong merge
* Use CVL_SCHEMA_PATH instead of changing directory for telemetry startup
* Add missing export
* Add python mmh3 to slave dockerfile
* Remove sonic-mgmt-framework build dep for telemetry, fix dialout startup issues
* Provided flag to disable compiling mgmt-framework
* Update sonic-utilites point latest commit id
* Point sonic-utilities to Azure accepted SHA
* Updating mgmt framework to right sha
* Add sonic-telemetry submodule
* Update the mgmt-framework commit id
Co-authored-by: jghalam <joe.ghalam@gmail.com>
Co-authored-by: Partha Dutta <51353699+dutta-partha@users.noreply.github.com>
Co-authored-by: srideepDell <srideep_devireddy@dell.com>
Co-authored-by: nirenjan <nirenjan@users.noreply.github.com>
Co-authored-by: Sachin Holla <51310506+sachinholla@users.noreply.github.com>
Co-authored-by: Eric Seifert <seiferteric@gmail.com>
Co-authored-by: Howard Persh <hpersh@yahoo.com>
Co-authored-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com>
Co-authored-by: Arunsundar Kannan <31632515+arunsundark@users.noreply.github.com>
Co-authored-by: rvasanthm <51932293+rvasanthm@users.noreply.github.com>
Co-authored-by: Ashok Daparthi-Dell <Ashok_Daparthi@Dell.com>
Co-authored-by: anand-kumar-subramanian <51383315+anand-kumar-subramanian@users.noreply.github.com>
Delay CPU intensive services at boot
- How I did it
Made snmp.timer work and add telemetry.timer.
But this is not enough because it breaks the existing snmp dependency on swss.
So, in this solution snmp timer is a wanted by swss service, but since OnBootSec timer expires only once it will not trigger snmp service, so I added line "OnUnitActiveSec=0 sec" which will start snmp service based on the last time it was active. On boot only OnBootSec will expire, on swss start/restarts only second timer will expire immediately and trigger snmp service.
However, snmp service will not stop after "systemctl stop snmp" because of the second timer which will always expire when snmp service because unavailable.
So there is a conflict which will be handled by systemd if we add "Conflicts=" line to both snmp.service and snmp.timer.
So during boot:
snmp does not start by default
swss starts and starts snmp timer
OnUnitActiveSec=0 does not expire since there is no snmp active
OnBootSec expires and starts snmp service and snmp timer gets stopped
During "systemctl restart swss"
snmp stops because of Requisite on swss
snmp unblocks snmp timer from running
swss starts and starts snmp timer
OnUnitActiveSec=0 expires imidiately and start snmp which stops snmp timer
During "systemctl stop snmp"
stop of snmp service unblocks snmp timer but no one starts the timer so it is not started by "OnUnitActiveSec=0"
Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.
The sflow service should not start unless the swss service is started. However, if this service is not started, the sflow service should not attempt to start them, instead it should simply fail to start. Using Requisite=, we will achieve this behavior, whereas using Requires= will cause the required service to be started.
ASIC reset events are captured by hw-mgmt and hw-mgmt calls chipup/chipdown internally without OS iteraction
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.
* Avoid reloading already uploaded file, by marking the names with a prefix.
* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.
* Fix few bugs
* The binary update_json.py will come from sonic-utilities.
If we need to stop swss during fast-reboot procedure on the boot up path,
it means that something went wrong, like syncd/orchagent crashed already,
we are stopping and restarting swss/syncd to re-initialize. In this case,
we should proceed as if it is a cold reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
In place editing (sed -i) seems having some issues with filesystem
interaction. It could leave 0 size file or corrupted file behind.
It would be safer to sed the file contents into a new file and switch
new file with the old file.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Corefile uploader service
1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.
* Remove rw permission for .rc file for group & others.
* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.
* Azure storage upload requires python module futures, hence added it to install list.
* Removed trailing spaces.
* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
if yes the software reboot cause file will be treated as the reboot cause.
finish
2. check whether platform api returns a reboot cause.
if yes it is treated as the reboot cause.
finish.
3. check whether /hosts/reboot-cause contains a cause.
if yes it is treated as the cause otherwise return unknown.
* [process-reboot-cause]Fix review comments
* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG
* [process-reboot-cause]address comments
* refactor the code flow
* Remove escape
* Remove extra ':'