marios log

blog

/etc

about

My summary of the OpenStack Stein Infrastructure Summit and Train PTG aka Denver III

This was the first re-combined event with both summit and project teams gathering happening in the same week and the third consecutive year that OpenStack has descended on Denver. This is also the first Open Infrastructure summit - the foundation is expanding to allow other non openstack projects to use the Open Infrastructure foundation for housing their projects.

This is a brief summary with pointers of the sessions or rooms I attended in the order they happened. The full summit schedule is here and the PTG schedule is here.

There is a list of some of the etherpads used in various summit sessions in this wiki page thanks to T. Carrez who let me take a photo of his screen for the URL :).

Photos
Open Infra Summit Day 1
Open Infra Summit Day 2
Open Infra Summit Day 3
Project Teams Gathering Day 1
Project Teams Gathering Day 2

Photos

The Marketplace - [warn] panoramic large-ish file ~15MB
big-blue-bear
SnowpenStack
Downtown - [warn] panoramic large-ish ~ 16MB

Summit Day One

General impression is a slightly reduced attendance - though I should note the last summit I attended was Austin unless I’m mistaken, attending PTG but not summit. There were about ~2000 summit attendees according to one of the keynote speakers. Having said that however J. Bryce gave some interesting numbers in his keynote, hilighting that Stein is the 19th on time release for OpenStack, that OS is still the 3rd largest open source project in the world with 105,000 members across 180 countries and with 65000 merged changes in the last year.

It was interesting to hear from Deutche Telekom - especially that they are using and contributing to zuul upstream and that they rely on CI for their ever growing deployments. One of the numbers given is they are adding capacity at 400 servers per week.

Some other interesting points from the keynotes are

the increasing use of Ironic as a standalone service outside of OpenStack deployments, for managing the baremetal infrastructure (further hilighting the OpenInfra vs OpenStack only theme),
the increasing adoption of zuul for CI and that it is being adopted as a foundation project
ericsson brought a 5g network to summit, apparently the first 5G network (?) in the United States that was available at their booth and which uses OpenStack for their infrastructure. There was also a demonstration of the latency differences between 3/4/5G networks involving VR headsets.

Besides the keynotes I attended the OpenStack Ansible project update - there was a shout out for the TripleO team by Mohammed Nasser who higlighted the excellent cross team collaboration story by the TripleO tempest team and the Ansible project. Finally I attended a talk called “multicloud ci/cd with openstack and kubernetes” where the presented setup a simple ‘hello world’ application across a number of different geographic locations and showed how CI/CD meant he could make a simple change to the app and have it be tested then deployed across the different clouds that run that application.

Summit Day Two

I attended the Zuul project BOF (‘birds of a feather’) where it was interesting to hear about various folks that are running Zuul internally - some on older versions and wanting to upgrade.

I also caught the “Deployment Tools: defined common capabilities” where folks that work on or are knowledgable about the various OpenStack deployment tools including TripleO got together and used this etherpad to try and compile a list of ‘tags’ which the various tools can claim to implement. Examples include containerized (i.e. support for containerized deployments), version support, day 2 operations etc. The first step will be to socialize further distill and then socialize these ‘capabilities’ via the openstack-discuss mailing list.

The Airship project update was the next session I went to and was quite well attended. In general it was interesting to hear about the similarities in the concepts and approach taken in Airship compared to TripleO. Especially the concept of an ‘undercloud’ and that deployment is driven by yaml files which define the deployment and service configuration values. In Airship these yaml files are known as charts. The equivalence in TripleO is the tripleo heat templates repo which holds the deployment and service configuration for TripleO deployments.

Finally an interesting session on running zuul ontop of Kubernetes and using Helm Charts. The presenters said they would make the charts used in their deployment would be made available upstream “soon”. This then spawned a side conversation with weshay and sshnaidm about using kubernetes for the TripleO CI squad’s zuul based reproducer. Prompted by weshay we micro-hackfest explored the use of k3s - 5 less than k8s. Taking the docker-compose file we tried to convert it using the kompose tool. We got far enough running the k3s service but stumbled on the lack of support for dependencies in kompose. We could investigate writing some Helm charts to do this but it is still TBD if k3s is a direction we will adopt for the reproducer this cycle or if we will keep podman which replaced docker (sshnaidm++ was working on this).

Summit Day Three

On Wednesday the first session I attended was a comparison of TripleO, Kolla and Airship as a deployment tool. The common requirement was support for container based deployments. You can see event details here - apparently there should be a recording though this isn’t available at time of writing. Again it was interesting to hear about the similarities between The Airship and TripleO project approach to config management including the management node ‘undercloud’.

I then went to the very well attended and well lead (by slagle and emilienm) TripleO project update. Again there should be a recording available at some point via that link but it isn’t there at present time. Besides a general stein update, slagle introduced the concepts of scaling (thousand not hundred) and edge as one of the main use cases for these ‘thousand node deployments’. These concepts were then further discussed in subsequent TripleO sessions noted in following paragraphs.

The first of these TripleO sessions was the forum that was devoted to scale and lead by slagle - etherpad is here. There is a good list of the identified and discussed “bottleneck services” on the undercloud - including Heat, Ironic, Mistral&Zaqar, Neutron, Keyston and Ansible and the technical challenges around possibly removing these. This was further explored during the PTG.

Finally I was at the Open Infrastructure project update given by C. Boylan which hilighted the move to opendev.org and then the zuul project update by J. Blair.

Project Teams Gathering Day 1

I spent the PTG in the TripleO room Room etherpad and picture

The etherpad contains notes from the various discussions but I hilight some of the main themes here. As usual there was a brief retrospective on the stein cycle and some of that was captured in this etherpad. This was followed by an operator feedback session - one of the main issues raised was ‘needs more scale’.

Slagle lead the discussion on Edge which introduced and discussed the requirements for The Distributed Compute Node architecture, where we will have a central deployment for our controllers and compute nodes spread across a number of edge locations. There was participation here from both the Edge working group as well as the Ironic project.

Then fultonj and gfidente lead the storage squad update (notes on the main tripleo room etherpad. Among other things, there was discussion around ceph deployments ‘at the edge’ and the challenges, as well as the trigerring of tripleo jobs in ceph-ansible pull requests.

Finally emilien lead the Deployment squad topics (notes on tripleo room etherpad). In particular there was further discussion around making the undercloud ‘lighter’ by considering which services we might remove. For this cycle it is likely that we keep Mistral albeit changing the way we use it so that is only executes ansible, keeping Neutron and os-net-config as is, but making the network configuration be applied more directly by ansible. There was also discussion around the use of Nova and whether we can just use Ironic directly. There will be exploration around the use of metalsmith to provide the information about the nodes in our deployment that we lose by removing Nova.

Project Teams Gathering Day 2

Room etherpad and day two picture

Slagle lead the first session which revisited the “thousand node scale” topic introduced in the tripleo operator forum and captured in the tripleo-forum-scale etherpad.

The HA session was introduced by bandini and dciabrin (see main room etherpad for notes). Some of the topics raised here were the need for a new workflow for minor deployment configuration changes such as changing a service password, how we can improve the issue posed by a partial/temporary disconection of one of the cluster/controlplane nodes and whether pacemaker should be the default in upstream deployments (this is a topic revisited most summits…) and there was no strong push back on this however this is still to be proposed as a gerrit change so is still TBD.

The upgrades squad was represented by chem, jfrancoa and ccamacho. There are notes in this upgrades session etherpad. Amongst other topics there was discussion around ‘FFWD II’ which is Queens to Train (and which includes the upgrade from Centos7 to Centos8) as well as a discussion around a completely fresh approach to the upgrades workflow that uses a separate set of nodes for the controlplane. The idea is to replicate the existing controlplane onto 3 new nodes but deploying the target upgrade version. This could mean more than 3 nodes if you have distributed the controlplane services across a number of dedicated nodes like Networker for example. Once the ‘new’ controlplane is ready you would migrate the data from your old controloplane and at that point there would be controlplane outage. However since the target controlplane is ready to go, the hope is that the switch over from old to new controlplane will be a relatively painless process once the details are worked out in this cycle. For the rest of the nodes Compute etc the existing workflow would be used with the tripleoclient running the relevant ansible playbooks to deliver upgrades on per node basis.

The TripleO CI squad was represented by weshay, quiquell, sshnaidm and myself. The session was introduced by weshay and we had a good discussion lasting well over an hour about numerous topics (captured in the main triplo room etherpad) including the performance gains from moving to standalone jobs, plans around the standalone-upgrade in particular that for stable/stein this should be green and voting now taiga story in progress, the work around rhel7/8 on baremetal and the software factory jobs, using browbeat to monitor changes to the deployment time and possibly alert of even block if this is significant.

Finally weshay showed off the shiny new zuul-based reproducer (kudos quiquell and sshnaidm). In short you can find the reproducer-quickstart in any TripleO ci job and follow the related reproducer README to have your own zuul and gerrit running the given job using either libvirt or ovb (i.e. on rdocloud). This is the first time the new reproducer was introduced to the wider team and whilst we (TripleO squad) would probably still call this a beta, we think its ready enough for any early adopters that might find this interesting and useful enough to try it out and the CI squad would certainly appreciate any feedback.

blog comments powered by Disqus

site.xml

tripleo.xml

marios@ ~]$ /var/log