<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>marios log - tripleo</title>
<description>Posts categorized as 'tripleo'</description>
<link>http://mariosandreou.com</link>

<item>
<title>My summary of the OpenStack Stein Infrastructure Summit and Train PTG aka Denver III</title>
<description>&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;blog_title&quot;&gt;

My summary of the OpenStack Stein Infrastructure Summit and Train PTG aka Denver III

&lt;/div&gt;

&lt;p&gt;This was the first re-combined event, with both the summit and the project teams
gathering happening in the same week, and the third consecutive year that
OpenStack has descended on Denver. This was also the first Open Infrastructure
summit - the foundation is expanding to allow non-OpenStack projects to
use the Open Infrastructure foundation for housing their projects.&lt;/p&gt;

&lt;p&gt;This is a brief summary with pointers of the sessions or rooms I attended in
the order they happened. The full &lt;a href=&quot;https://www.openstack.org/summit/denver-2019/summit-schedule/#day=2019-04-29&quot;&gt;summit schedule is here&lt;/a&gt; and the &lt;a href=&quot;https://www.openstack.org/ptg/#tab_schedule&quot;&gt;PTG schedule is here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There is a list of some of the etherpads used in various summit sessions in
&lt;a href=&quot;https://wiki.openstack.org/wiki/Forum/Denver2019&quot;&gt;this wiki page&lt;/a&gt; thanks
to T. Carrez who let me take a photo of his screen for the URL :).&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#photos&quot;&gt;Photos&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#summit-day-one&quot;&gt;Open Infra Summit Day 1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#summit-day-two&quot;&gt;Open Infra Summit Day 2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#summit-day-three&quot;&gt;Open Infra Summit Day 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#ptg-day-one&quot;&gt;Project Teams Gathering Day 1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#ptg-day-two&quot;&gt;Project Teams Gathering Day 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;photos&quot;&gt;Photos&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;photos&quot;&gt; &lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/images/DenverSummit/market-place.jpg&quot;&gt;The Marketplace&lt;/a&gt; - [warn] panoramic
large-ish file ~15MB&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/images/DenverSummit/big-blue-bear.jpg&quot;&gt;big-blue-bear&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/images/DenverSummit/snow.jpg&quot;&gt;SnowpenStack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/images/DenverSummit/downtown.jpg&quot;&gt;Downtown&lt;/a&gt; - [warn] panoramic large-ish
~ 16MB&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;summit-day-one&quot;&gt;Summit Day One&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;summit-day-one&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My general impression was of slightly reduced attendance - though I should note
the last summit I attended was Austin, unless I’m mistaken, having attended the
PTGs but not the summits since. There were roughly 2000 summit attendees according
to one of the keynote speakers. Having said that, J. Bryce gave some interesting
numbers in his keynote, highlighting that Stein is the &lt;strong&gt;19th&lt;/strong&gt; on-time release
for OpenStack, and that OpenStack is still the 3rd largest open source project in the
world, with 105,000 members across 180 countries and 65,000 merged changes
in the last year.&lt;/p&gt;

&lt;p&gt;It was interesting to hear from Deutsche Telekom - especially that they are
using and contributing to Zuul upstream and that they rely on CI for their
ever-growing deployments. One of the numbers given is that they are adding capacity
at 400 servers &lt;strong&gt;per week&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Some other interesting points from the keynotes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the increasing use of Ironic as a standalone service outside of OpenStack
deployments for managing baremetal infrastructure (further highlighting
the OpenInfra vs OpenStack-only theme),&lt;/li&gt;
  &lt;li&gt;the increasing adoption of Zuul for CI, and that it is being adopted as a
foundation project,&lt;/li&gt;
  &lt;li&gt;Ericsson brought a 5G network to summit - apparently the first 5G network
in the United States - which was available at their booth and which uses
OpenStack for its infrastructure. There was also a demonstration of the
latency differences between 3/4/5G networks involving VR headsets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Besides the keynotes I attended the OpenStack Ansible project update - there
was a shout-out for the TripleO team by Mohammed Naser, who highlighted the
excellent cross-team collaboration story between the TripleO tempest team and
the Ansible project. Finally I attended a talk called “multicloud ci/cd with
openstack and kubernetes” where the presenter set up a simple ‘hello world’
application across a number of different geographic locations and showed how
CI/CD meant he could make a simple change to the app and have it tested and then
deployed across the different clouds running that application.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;summit-day-two&quot;&gt;Summit Day Two&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;summit-day-two&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I attended the Zuul project BOF (‘birds of a feather’) where it was interesting
to hear about various folks that are running Zuul internally - some on older
versions and wanting to upgrade.&lt;/p&gt;

&lt;p&gt;I also caught the “Deployment Tools: defined common capabilities” session, where folks
who work on or are knowledgeable about the various OpenStack deployment tools,
including TripleO, got together and used &lt;a href=&quot;https://etherpad.openstack.org/p/DEN-deployment-tools-capabilities&quot;&gt;this etherpad&lt;/a&gt;
to try to compile a list of ‘tags’ which the various tools can claim to
implement. Examples include &lt;em&gt;containerized&lt;/em&gt; (i.e. support for containerized
deployments), version support, day-2 operations etc. The first step will be
to further distill and then socialize these ‘capabilities’ via the
&lt;a href=&quot;http://lists.openstack.org/pipermail/openstack-discuss/&quot;&gt;openstack-discuss mailing list&lt;/a&gt;.&lt;/p&gt;
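&lt;p&gt;To make the ‘tags’ idea concrete, here is a purely illustrative sketch - the tool
names are real deployment projects but the tag assignments are hypothetical
placeholders, not the list the etherpad settled on:&lt;/p&gt;

```python
# Illustrative only: the 'capability tags' idea from the session as a tiny
# data structure. Tag assignments here are hypothetical placeholders, not
# the list compiled in the etherpad.
CAPABILITIES = {
    'tripleo': {'containerized', 'day2-operations', 'baremetal-provisioning'},
    'kolla-ansible': {'containerized', 'day2-operations'},
    'openstack-ansible': {'day2-operations'},
}

def tools_claiming(tag):
    """Return the deployment tools that claim a given capability tag."""
    return sorted(t for t, tags in CAPABILITIES.items() if tag in tags)

print(tools_claiming('containerized'))  # ['kolla-ansible', 'tripleo']
```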

&lt;p&gt;The Airship project update was the next session I went to and it was quite well
attended. In general it was interesting to hear about the similarities in
the concepts and approach taken in Airship compared to TripleO - especially
the concept of an ‘undercloud’, and that deployment is driven by yaml files
which define the deployment and service configuration values. In Airship these
yaml files are known as charts. The equivalent in TripleO is the &lt;a href=&quot;https://opendev.org/openstack/tripleo-heat-templates&quot;&gt;tripleo
  heat templates&lt;/a&gt; repo,
which holds the deployment and service configuration for TripleO deployments.&lt;/p&gt;

&lt;p&gt;Finally there was an interesting session on running Zuul on top of Kubernetes using
&lt;a href=&quot;https://opendev.org/openstack/openstack-helm&quot;&gt;Helm Charts&lt;/a&gt;. The presenters
said the charts used in their deployment would be made available upstream
“soon”. This then spawned a side conversation with weshay and sshnaidm about
using Kubernetes for the TripleO CI squad’s zuul-based reproducer. Prompted by
weshay, we held a micro-hackfest exploring the use of
&lt;a href=&quot;https://github.com/rancher/k3s&quot;&gt;k3s - 5 less than k8s&lt;/a&gt;. Taking the
docker-compose file we tried to convert it using the &lt;a href=&quot;https://github.com/kubernetes/kompose&quot;&gt;kompose tool&lt;/a&gt;. We got far enough to run the k3s service but stumbled on the lack of
support for dependencies in kompose. We could investigate writing some Helm
charts to do this, but it is still TBD whether k3s is a direction we will adopt for
the reproducer this cycle or whether we will keep podman, which replaced docker
(sshnaidm++ was working on this).&lt;/p&gt;
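&lt;p&gt;For reference, the rough shape of what we tried - a sketch only: the compose
file name and output directory are illustrative, and the install line is the
k3s project’s upstream bootstrap script:&lt;/p&gt;

```shell
# Install k3s (lightweight single-binary Kubernetes) via its bootstrap script
curl -sfL https://get.k3s.io | sh -

# Attempt to convert the reproducer's docker-compose file into Kubernetes
# manifests; file name and output directory here are illustrative
kompose convert -f docker-compose.yml -o k8s-manifests/

# Apply whatever kompose generated against the local k3s cluster
k3s kubectl apply -f k8s-manifests/
```

&lt;p&gt;This is roughly where we stumbled: kompose does not translate the compose
file’s service dependencies (the depends_on ordering), so the converted
manifests start everything at once.&lt;/p&gt;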

&lt;hr /&gt;

&lt;h3 id=&quot;summit-day-three&quot;&gt;Summit Day Three&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;summit-day-three&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On Wednesday the first session I attended was a comparison of TripleO, Kolla
and Airship as deployment tools. The common requirement was support for
container-based deployments. You can see &lt;a href=&quot;https://www.openstack.org/summit/denver-2019/summit-schedule/events/23180/choosing-the-containerized-cloud-provisioning-tool-that-best-suits-your-need&quot;&gt;event details here&lt;/a&gt; - apparently there should be a recording, though this isn’t available
at the time of writing. Again it was interesting to hear about the similarities
between the Airship and TripleO approaches to config management, including
the ‘undercloud’ management node.&lt;/p&gt;

&lt;p&gt;I then went to the very well attended and well led (by slagle and
emilienm) &lt;a href=&quot;https://www.openstack.org/summit/denver-2019/summit-schedule/events/23738/tripleo-project-update&quot;&gt;TripleO project update&lt;/a&gt;. Again there should be a recording available at
some point via that link, but it isn’t there at present. Besides a
general Stein update, slagle introduced scale (thousands of nodes, not
hundreds) and edge as main use cases for these ‘thousand node
deployments’. These concepts were then further discussed in subsequent TripleO
sessions noted in the following paragraphs.&lt;/p&gt;

&lt;p&gt;The first of these TripleO sessions was the forum devoted to scale,
led by slagle - the &lt;a href=&quot;https://etherpad.openstack.org/p/DEN-tripleo-forum-scale&quot;&gt;etherpad is here&lt;/a&gt;.
There is a good list of the identified and discussed “bottleneck services” on
the undercloud - including Heat, Ironic, Mistral&amp;amp;Zaqar, Neutron, Keystone and
Ansible - and the technical challenges around possibly removing them. This was
further explored during the PTG.&lt;/p&gt;

&lt;p&gt;Finally I was at the &lt;a href=&quot;https://www.openstack.org/summit/denver-2019/summit-schedule/events/23727/openstack-infrastructure-project-project-update&quot;&gt;Open Infrastructure project update&lt;/a&gt;
given by C. Boylan, which highlighted the move to opendev.org, and then the &lt;a href=&quot;https://www.openstack.org/summit/denver-2019/summit-schedule/events/23726/zuul-project-update&quot;&gt;zuul
project update&lt;/a&gt; by J. Blair.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;project-teams-gathering-day-1&quot;&gt;Project Teams Gathering Day 1&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;ptg-day-one&quot;&gt; &lt;/a&gt;
I spent the PTG in the TripleO room &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-train&quot;&gt;Room etherpad&lt;/a&gt;
and &lt;a href=&quot;/images/DenverSummit/tripleo-ptg-1.jpg&quot;&gt;picture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The etherpad contains notes from the various discussions but I highlight some of
the main themes here. As usual there was a brief retrospective on the Stein
cycle, some of which was captured in &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-stein-retrospective&quot;&gt;this etherpad&lt;/a&gt;. This was followed by an operator feedback session - one of the
main issues raised was ‘needs more scale’.&lt;/p&gt;

&lt;p&gt;Slagle led the discussion on Edge, which introduced and discussed the
requirements for the Distributed Compute Node architecture, where we will have
a central deployment for our controllers and compute nodes spread across a
number of edge locations. There was participation here from both the Edge
working group and the Ironic project.&lt;/p&gt;

&lt;p&gt;Then fultonj and gfidente led the storage squad update (notes on the main
&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-train&quot;&gt;tripleo room etherpad&lt;/a&gt;).
Among other things, there was discussion around ceph deployments ‘at the edge’
and the challenges there, as well as the triggering of tripleo jobs from ceph-ansible
pull requests.&lt;/p&gt;

&lt;p&gt;Finally emilien led the Deployment squad topics (notes on the &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-train&quot;&gt;tripleo room etherpad&lt;/a&gt;). In particular there was further discussion
around making the undercloud ‘lighter’ by considering which services we might
remove. For this cycle it is likely that we keep Mistral, albeit changing the
way we use it so that it &lt;em&gt;only&lt;/em&gt; executes ansible, and keep Neutron and
os-net-config as is, but make the network configuration be applied more
directly by ansible. There was also discussion around the use of Nova and
whether we can just use Ironic directly. There will be exploration around the
use of &lt;a href=&quot;https://github.com/openstack/metalsmith&quot;&gt;metalsmith&lt;/a&gt; to provide the
information about the nodes in our deployment that we would lose by removing Nova.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;project-teams-gathering-day-2&quot;&gt;Project Teams Gathering Day 2&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;ptg-day-two&quot;&gt; &lt;/a&gt;
&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-train&quot;&gt;Room etherpad&lt;/a&gt;
and &lt;a href=&quot;/images/DenverSummit/tripleo-ptg-2.jpg&quot;&gt;day two picture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Slagle led the first session, which revisited the “thousand node scale” topic
introduced in the tripleo operator forum and captured in the
&lt;a href=&quot;https://etherpad.openstack.org/p/DEN-tripleo-forum-scale&quot;&gt;tripleo-forum-scale etherpad&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The HA session was introduced by bandini and dciabrin (see the main room etherpad
for notes). Some of the topics raised here were: the need for a new workflow
for minor deployment configuration changes, such as changing a service password;
how we can improve the issue posed by a partial/temporary disconnection of one
of the cluster/controlplane nodes; and whether pacemaker should be the default
in upstream deployments (a topic revisited at most summits…). There was no strong
push-back on the latter, however it is still to be proposed as a
gerrit change so remains TBD.&lt;/p&gt;

&lt;p&gt;The upgrades squad was represented by chem, jfrancoa and ccamacho. There are
notes in &lt;a href=&quot;https://etherpad.openstack.org/p/upgrades_denver_ptg&quot;&gt;this upgrades session etherpad&lt;/a&gt;.
Amongst other topics there was discussion around ‘FFWD II’, which is Queens to
Train (and which includes the upgrade from Centos7 to Centos8), as well as a
discussion around a completely fresh approach to the upgrades workflow that
uses a separate set of nodes for the controlplane. The idea is to replicate
the existing controlplane onto 3 new nodes but deploying the target upgrade
version. This could mean more than 3 nodes if you have distributed
the controlplane services across a number of dedicated nodes, like Networker
for example. Once the ‘new’ controlplane is ready you would migrate the data
from your old controlplane, and at that point there would be a controlplane
outage. However, since the target controlplane is ready to go, the hope is that
the switch-over from old to new controlplane will be a relatively painless
process once the details are worked out this cycle. For the rest of the
nodes (Compute etc.) the existing workflow would be used, with the tripleoclient
running the relevant ansible playbooks to deliver upgrades on a per-node basis.&lt;/p&gt;

&lt;p&gt;The TripleO CI squad was represented by weshay, quiquell, sshnaidm and myself.
The session was introduced by weshay and we had a good discussion lasting
well over an hour about numerous topics (captured in the &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-train&quot;&gt;main tripleo room etherpad&lt;/a&gt;), including: the performance gains from moving
to standalone jobs; plans around the standalone-upgrade job, in particular that
for stable/stein it should be green and voting now (&lt;a href=&quot;https://tree.taiga.io/project/tripleo-ci-board/us/959&quot;&gt;taiga story in progress&lt;/a&gt;); the work around rhel7/8 on baremetal and the
software factory jobs; and using browbeat to monitor changes to the deployment time
and possibly alert or even block if a change is significant.&lt;/p&gt;

&lt;p&gt;Finally weshay showed off the shiny new zuul-based reproducer (kudos quiquell
and sshnaidm). In short, you can find the &lt;a href=&quot;http://logs.openstack.org/98/656398/5/check/tripleo-ci-fedora-28-standalone/8e43e29/logs/reproducer-quickstart/&quot;&gt;reproducer-quickstart&lt;/a&gt;
in any TripleO ci job and follow the related &lt;a href=&quot;http://logs.openstack.org/98/656398/5/check/tripleo-ci-fedora-28-standalone/8e43e29/logs/README-reproducer.html&quot;&gt;reproducer README&lt;/a&gt;
to have your own zuul and gerrit running the given job using either
libvirt or ovb (i.e. on rdocloud). This is the first time the new reproducer
was introduced to the wider team, and whilst we (the TripleO CI squad) would probably
still call this a beta, we think it’s ready enough for any early adopters who
might find it interesting and useful to try it out - the CI squad
would certainly appreciate any feedback.&lt;/p&gt;
</description>
<published>2019-05-06 00:00:00 +0300</published>
<link>http://mariosandreou.com/tripleo/2019/05/06/open-infrastructure-summit-denver-3.html</link>
</item>

<item>
<title>My summary of the OpenStack Stein PTG in Denver</title>
<description>&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;blog_title&quot;&gt;

My summary of the OpenStack Stein PTG in Denver

&lt;/div&gt;

&lt;p&gt;After only 3 take-offs/landings I was very happy to participate in the Stein
PTG in Denver.
This is a brief summary, with pointers, of the sessions or rooms I attended in
the order they happened (&lt;a href=&quot;https://web14.openstack.org/assets/ptg/Denver-map.pdf&quot;&gt;Stein PTG Schedule&lt;/a&gt;).&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrades-ci-standalone&quot;&gt;Upgrades ci with the stand-alone deployment&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrades-sig&quot;&gt;Upgrades SIG&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#edge-room&quot;&gt;Edge room&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#tripleo&quot;&gt;TripleO room&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;upgrades-ci-with-the-stand-alone-deployment&quot;&gt;Upgrades CI with the stand-alone deployment&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrades-ci-standalone&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We had a productive impromptu round table (weshay++) in one of the empty rooms
with the tripleo ci folks present (weshay, panda, sshnaidm, arxcruz, marios),
the tripleo upgrades folks present (chem and holser), as well as emeritus PTL mwahaha,
around the stand-alone deployment and how we can use it for upgrades ci. We introduced
the proposed &lt;a href=&quot;https://review.openstack.org/#/c/579854/&quot;&gt;spec&lt;/a&gt; and one of
the main topics discussed was whether, ultimately, it is worth solving all of these
subproblems only to end up with some approximation of the upgrade.&lt;/p&gt;

&lt;p&gt;The consensus was yes, since we can have two types of upgrades job: one using the
stand-alone to ci the actual tasks, i.e. the upgrade_tasks and deployment_tasks
for each service in the tripleo-heat-templates, and another job (the current
job, which will be adapted) to ci the upgrades workflow - tripleoclient, mistral
workflows etc. There was general agreement on this approach between the upgrades
and ci representatives, so that we could try to sell it to the wider team in
the tripleo room on Wednesday together.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;upgrades-special-interest-group&quot;&gt;Upgrades Special Interest Group&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrades-sig&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/upgrade-sig-ptg-stein&quot;&gt;Room etherpad&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Monday afternoon was spent in the upgrades SIG room. There was first discussion
of the placement api extraction and how this would have to be dealt with during
the upgrade, with a solution sketched out around the db migrations required.&lt;/p&gt;

&lt;p&gt;This led into discussion around pre-upgrade checks that could deal with things
like db migrations (or just check whether something is missing and fail accordingly
before the upgrade). As I was reminded during the lunchtime presentations, pre-upgrade
checks are one of the Stein community goals (together with python-3).
The idea is that each service would own a set of checks that should be performed
before an upgrade is run, and that they would be invoked via the openstack client
(something along the lines of ‘openstack pre-upgrade-check nova’). I believe there
is already some implementation (from the nova team) but I don’t readily have
the details.&lt;/p&gt;
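&lt;p&gt;As far as I know, the nova implementation referred to is the ‘nova-status
upgrade check’ command rather than an openstack subcommand. A minimal sketch of
how such a check could gate an upgrade script - the echo messages are mine:&lt;/p&gt;

```shell
# Run nova's pre-upgrade checks on a node with nova installed; the command
# exits zero only when all checks pass, so it can gate an upgrade script.
if nova-status upgrade check; then
    echo "pre-upgrade checks passed - safe to proceed with the upgrade"
else
    echo "pre-upgrade checks reported issues - aborting"
    exit 1
fi
```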

&lt;p&gt;There was then a productive discussion about the purpose and direction of the
upgrades SIG. One of the points raised was that the SIG should not be just
about the fast forward upgrade, even though that has been a main focus until
now. The pre-upgrade checks are a good example of this, and the SIG will try
to continue to promote these, with adoption by all the OpenStack services.
On that note I proposed that whilst the services themselves will own the
service-specific pre-upgrade checks, it is the deployment projects which will own
the pre-upgrade infrastructure checks, such as a healthy cluster/database or
responding service endpoints.&lt;/p&gt;

&lt;p&gt;There was of course discussion around the fast forward upgrade with
status updates from the deployment projects present (kolla-ansible, TripleO,
charms, OSA). TripleO is the only project with an implemented workflow at present.
Finally there was a discussion about whether we’re doing better in terms of
operator experience for upgrades in general and how we can continue to improve
(e.g. rolling upgrades was one of the discussed points here).&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;edge-room&quot;&gt;Edge room&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;edge-room&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/EdgeComputingGroupPTG4&quot;&gt;Room etherpad&lt;/a&gt;
 &lt;a href=&quot;https://etherpad.openstack.org/p/edge-requirements-stein-ptg&quot;&gt;Room etherpad2&lt;/a&gt;
 &lt;a href=&quot;https://wiki.openstack.org/wiki/Edge_Computing_Group/Use_Cases&quot;&gt;Use cases&lt;/a&gt;
 &lt;a href=&quot;https://www.openstack.org/edge-computing/cloud-edge-computing-beyond-the-data-center?lang=en_US&quot;&gt;Edge primer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was only in attendance for the first part of this session, which was about
understanding the requirements (and hopefully continuing to find the common
ground). The room started with a review of the various use cases proposed in
Dublin and of any work on them since then. One of the main points raised by shardy is
that whilst in TripleO we have a number of exploratory efforts ongoing (like
split controlplane for example), it would be good to have a specific architecture
to aim for, and that is missing currently. It was agreed that the existing use
cases will be extended to include the proposed architecture and that these
can serve as a starting point for anyone looking to deploy with edge locations.&lt;/p&gt;

&lt;p&gt;There are pointers to the rest of the edge sessions in the etherpad above.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;tripleo-room&quot;&gt;TripleO room&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;tripleo&quot;&gt; &lt;/a&gt;
&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-stein&quot;&gt;Room etherpad&lt;/a&gt;
&lt;a href=&quot;https://www.dropbox.com/sh/2pmvfkstudih2wf/AADlnSNAHoJcNToiJET6buvPa/TripleO?dl=0&amp;amp;preview=DSC_4440.JPG&amp;amp;subfolder_nav_tracking=1&quot;&gt;Team picture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The order of sessions was slightly revised from that listed in the etherpad
above because the East coast storms forced folks to change travel plans. The
following order is to the best of my recollection ;)&lt;/p&gt;

&lt;h4 id=&quot;tripleo-and-edge-cloud-deployments&quot;&gt;TripleO and Edge cloud deployments&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-stein-edge&quot;&gt;Session etherpad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There was first a summary from the Edge room by shardy and then tripleo-specific
discussion around the current work (split controlplane). There was
some discussion around possibly using/repurposing “the multinode job” for
multiple stacks to simulate the Edge locations in ci. There was also discussion
around the networking aspects (though this will depend on the architecture, which we
don’t yet have fully targeted) with respect to the tripleo deployment networks
(controlplane/internalapi etc.) in an edge deployment. Finally there was
consideration of the work needed in tripleo-common and the mistral workflows
needed for the split controlplane deployment.&lt;/p&gt;

&lt;h4 id=&quot;os--platform&quot;&gt;OS / Platform&lt;/h4&gt;
&lt;p&gt;(tracked on main tripleo etherpad linked above)&lt;/p&gt;

&lt;p&gt;The main items discussed here were Python 3 support, removing instack-undercloud
and “that upgrade” to Centos8 on Stein.&lt;/p&gt;

&lt;p&gt;For Python3 the discussion included the fact that in TripleO we are bound by whatever python the deployed services support (as well as by what the upstream distribution will be, i.e. Centos 7/8, and which python ships where).&lt;/p&gt;

&lt;p&gt;For the Centos8/Stein upgrade the upgrades folks chem and holser led the
discussion, outlining how we will need a completely new workflow, which may be
dictated in large part by how Centos8 is delivered. One of the approaches
discussed here was to use a completely external/distinct upgrade workflow for
the OS, versus the TripleO-driven OpenStack upgrade itself.
We got into more detail about this during the Baremetal session (see below).&lt;/p&gt;

&lt;h4 id=&quot;tripleo-ci&quot;&gt;TripleO CI&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/ptg_denver_2018_tripleo_ci&quot;&gt;Session etherpad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the first items raised was the stand-alone deployment and its use in ci.
The general proposal is that we should use a lot more of it! In particular to
replace existing jobs (like scenarios 1/2) with a standalone deployment.&lt;/p&gt;

&lt;p&gt;There was also discussion around the stand-alone for the upgrades ci as we
agreed with the upgrades folks on Monday (&lt;a href=&quot;https://review.openstack.org/#/c/579854/&quot;&gt;spec&lt;/a&gt;). The idea of service vs workflow upgrades was presented/solidified here
and I have just updated v8 of the spec accordingly to emphasise this point.&lt;/p&gt;

&lt;p&gt;Other points discussed in the CI session were testing ovb in infra and how we
could make jobs voting. The first move will be towards removing te-broker.&lt;/p&gt;

&lt;p&gt;There was also some consideration of the involvement of the ci team with other
squads and vice versa. There is a new column in our &lt;a href=&quot;https://trello.com/b/U1ITy0cu/tripleo-and-rdo-ci&quot;&gt;trello board&lt;/a&gt; called “requests from other DFG”.&lt;/p&gt;

&lt;p&gt;A further point raised was the reproducer scripts and future directions, including
running, and not only generating, these in ci. As a related side note, it sounds like
folks are using the reproducer and having some success.&lt;/p&gt;

&lt;h4 id=&quot;ansible--framework&quot;&gt;Ansible / Framework&lt;/h4&gt;
&lt;p&gt;(tracked on main tripleo etherpad linked above)&lt;/p&gt;

&lt;p&gt;In this session an overview of the work towards splitting out the ansible tasks
from the tripleo-heat-templates into re-usable roles was given by jillr and
slagle. More info and pointers are in the main tripleo etherpad above.&lt;/p&gt;

&lt;h4 id=&quot;security&quot;&gt;Security&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-security-ptg-stein&quot;&gt;Session etherpad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There was discussion around the workflow to change overcloud/service passwords (this is
currently borked!). In particular there are problems around trying to CI this, since the
deploy takes too long to fit deploy + stack update for the passwords and
validation within the timeout. This could possibly be a 3rd-party (but then
non-voting) job for now. There was also an overview of work towards using Castellan
with TripleO, as well as discussion around selinux and locking down ssh.&lt;/p&gt;

&lt;h4 id=&quot;ux--ui&quot;&gt;UX / UI&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ui-ptg-stein&quot;&gt;Session etherpad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CLI/UI feature parity is a main goal for this cycle (and probably beyond - it
seems there is a &lt;em&gt;lot&lt;/em&gt; to do), together with the plan management operations around this.
There was also good discussion around validations, with Tengu joining remotely via Bluejeans
to champion the effort of providing a nice way to run these via the tripleoclient.&lt;/p&gt;

&lt;h4 id=&quot;baremetal&quot;&gt;Baremetal&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-baremetal-ptg-stein&quot;&gt;Session etherpad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This session started with discussion around metalsmith vs nova on the undercloud
and the upgrade path required to make this so. Also considered were
overcloud image customization and network automation
(ansible with the python-networking-ansible ml2 driver).&lt;/p&gt;

&lt;p&gt;Unexpectedly, the most interesting part of this session for me personally
was an impromptu design session started by ipilcher (prompted by a question
from phuongh, who I believe was new to the room). The session was about the
upgrade to Centos8, and three main approaches were explored: the “big bang”
(everything off, upgrade, everything back on), “some kind of rolling upgrade”, and
finally supporting either Centos8/Rocky or Centos7/Stein. The first and third
were deemed unworkable, but there was a very lively and well-engaged group
design session trying to navigate to a workable process for the ‘rolling upgrade’
aka split personality. Thanks to ipilcher (via bandini) the &lt;a href=&quot;https://drive.google.com/file/d/1IbcS7xcltxdsST1zJpOk7JTnurlC-8hL/view&quot;&gt;whiteboards looked like this&lt;/a&gt;.&lt;/p&gt;
</description>
<published>2018-09-18 00:00:00 +0300</published>
<link>http://mariosandreou.com/tripleo/2018/09/18/openstack-stein-ptg-denver.html</link>
</item>

<item>
<title>My summary of the OpenStack Rocky PTG in Dublin</title>
<description>&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;blog_title&quot;&gt;

My summary of the OpenStack Rocky PTG in Dublin

&lt;/div&gt;

&lt;p&gt;I was fortunate to be part of the OpenStack PTG in Dublin this February. Here
is a summary of the sessions I was able to attend. In the end the second day
of the TripleO meetup (&lt;a href=&quot;https://calendar.google.com/calendar/embed?src=tgpb5tv12mlu7kge5oqertje78%40group.calendar.google.com&amp;amp;ctz=Europe%2FDublin&quot;&gt;Thursday&lt;/a&gt;) was disrupted
as we had to leave the PTG venue. However we still managed to cover a wide
range of topics, some of which are summarized here.&lt;/p&gt;

&lt;p&gt;In short, and in the order attended:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#ffu&quot;&gt;FFU&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release_cycles&quot;&gt;Release cycles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#tripleo&quot;&gt;TripleO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;ffu&quot;&gt;FFU&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;ffu&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/ffu-ptg-rocky&quot;&gt;session etherpad&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;There are at least 5 different ways of doing FFU! There was a deployment projects
update (tripleo, openstack-ansible, kolla, charms)&lt;/li&gt;
  &lt;li&gt;Some folks are trying to do it manually (via operator feedback)&lt;/li&gt;
  &lt;li&gt;We will form a SIG (freenode #openstack-upgrades?). The first order of business
is documenting something - agreeing on best practices when doing FFU - with
meetings every 2 weeks?&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;release-cycles&quot;&gt;Release Cycles&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;release_cycles&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/release-cycles-ptg-rocky&quot;&gt;session etherpad&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Release cadence to stay at 6 months for now. Wide discussion about the potential impacts of a longer release cycle including maintenance of stable branches, deployment project/integration testing and d/stream product release cycles, marketing, documentation and others. In the end the merits of a frequent upstream release cycle won, or at least, there was no consensus about getting a longer cycle.&lt;/li&gt;
  &lt;li&gt;On the other hand operators still think upgrades suck and don’t want to do them every six months. FFU is being relied on as the least painful way to do upgrades at a longer cadence than the upstream 6 month development cycle, which for now will stay as is.&lt;/li&gt;
  &lt;li&gt;An ‘extended maintenance’ tag or policy will be introduced for projects that want to support stable branches longer term (LTS)&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;tripleo&quot;&gt;TripleO&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;tripleo&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-rocky&quot;&gt;main tracking etherpad&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;retro session (emilienm) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-rocky-retro&quot;&gt;session etherpad&lt;/a&gt; some main points here are: ‘do more and better ci’, communicate more and review at least a bit outside your squad, improve bug triage, bring back deep dives.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;ci session (weshay) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-ci&quot;&gt;session etherpad&lt;/a&gt; some main points here are: ‘we need more attention on promotion’, upcoming features like new jobs (containerized undercloud, upgrades jobs), more communication with squads (ongoing with upgrades for example, continuing to integrate the tripleo-upgrade role), and python3 testing.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;config download (slagle) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-config-download&quot;&gt;session etherpad&lt;/a&gt; some main points are: Rocky will bring the config-download and ansible-playbook workflow for deployment of the environment, not just upgrades.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;all in one (dprince) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-rocky-all-in-one&quot;&gt;session etherpad&lt;/a&gt; some main points: using the containerized undercloud, have an ‘all-in-one’ role with only those services you need for your development at a given time. Some discussion around the potential CLI and pointers to more info at &lt;a href=&quot;https://review.openstack.org/#/c/547038/&quot;&gt;https://review.openstack.org/#/c/547038/&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;tripleo for generic provisioning (shadower) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-generic-provisioning&quot;&gt;session etherpad&lt;/a&gt; some main points are re-using the config download with external_deploy_tasks (idea is kubernetes or openshift deployed in a tripleo overcloud), some work still needed on the interfaces and discussion around ironic nodes and ansible.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;upgrades (marios o/, chem, jistr, lbezdick) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-rocky-upgrades&quot;&gt;session etherpad&lt;/a&gt;, some main points are: improvements in the ci and paying down tech debt (moving to using the tripleo-upgrade role now), containerized undercloud upgrade is coming in Rocky (emilien investigating), and Rocky will be a stabilization cycle with focus on improvements to the operator experience including validations, backup/restore, documentation and cli/ui. Integration with the UI might be considered during Rocky, to be revisited with the UI squad.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;containerized undercloud (dprince, emilienm) &lt;a href=&quot;https://trello.com/b/nmGSNPoQ/containerized-undercloud&quot;&gt;trello board&lt;/a&gt; dprince gave a demonstration of a running containerized undercloud environment and reviewed the current work from the trello board. It is running well today and we can consider switching to a containerized undercloud by default in Rocky.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;multiple ceph clusters (gfidente, johfulto), &lt;a href=&quot;https://blueprints.launchpad.net/tripleo/+spec/deploy-multiple-ceph-clusters&quot;&gt;linked blueprint&lt;/a&gt;, discussion around possible approaches including having multiple heat stacks. gfidente or jfulton are better sources of info if you are interested in this feature.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;workflows api (thrash) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-rocky-workflows-api&quot;&gt;session etherpad&lt;/a&gt;, some main points are: fixing inconsistencies in workflows (all should have an output value, and not try to get that from a zaqar message) and fixing usability, making a v2 tripleo mistral workflows api (tripleo-common) and re-organising the directories, moving existing things under v1, and looking into optimizing the calls to swift to avoid the large number of individual object GETs that currently happen.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;UI (jtomasek) &lt;a href=&quot;https://etherpad.openstack.org/p/tripleo-ptg-rocky-ui&quot;&gt;session etherpad&lt;/a&gt; some main points here are: adding UI support for the new composable networks configuration, integration with the coming config-download deployment, continuing to increase UI/CLI feature parity, allowing deployment of multiple plans, prototyping workflows to derive parameters for the operator based on input for specific scenarios (like HCI), and investigating root device hints support and setting physical_network on particular nodes. Florian led a side session in the Hotel on Thursday morning after we were kicked out of Croke Park stadium because &lt;a href=&quot;https://twitter.com/jistr/status/968976088486547457&quot;&gt;nodublin&lt;/a&gt;, where we discussed allowing operators to upload custom validations and prototyping the use of swift for storing validations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;You might note that there are errors in the html validator for this post, but its late here and I’m in no mood to fight that right now. Yes, I know. cool story bro&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
</description>
<published>2018-03-07 00:00:00 +0200</published>
<link>http://mariosandreou.com/tripleo/2018/03/07/openstack-rocky-ptg-dublin.html</link>
</item>

<item>
<title>Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).</title>
<description>&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;blog_title&quot;&gt;

Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).

&lt;/div&gt;

&lt;p&gt;This post is about how I was able to mostly successfully follow the tripleo-docs,
to deploy a stable/mitaka 3-control 1-compute development (virt) setup so I can
ultimately test upgrading this to Newton.&lt;/p&gt;

&lt;p&gt;I wasn’t sure there was something worth writing here, but then the same
tools I used to address the two issues I hit &lt;em&gt;deploying&lt;/em&gt; mitaka kept coming
up during the week when trying to &lt;em&gt;upgrade&lt;/em&gt; that environment. I’ve had to use
a lot of grep and git blame/log to get to the bottom of &lt;a href=&quot;https://bugs.launchpad.net/tripleo/+bug/1593182&quot;&gt;issues&lt;/a&gt;
I’m &lt;a href=&quot;https://bugs.launchpad.net/tripleo/+bug/1593736&quot;&gt;seeing&lt;/a&gt;
trying to upgrade the undercloud from stable/mitaka to latest/newton.&lt;/p&gt;

&lt;p&gt;The Newton upgrade work is ongoing and possibly worthy of a future post.&lt;/p&gt;

&lt;p&gt;I guess this post is mostly about git blame, and about munging the Change-Id
into a URL to get to the actual gerrit code review from an error/issue you are seeing.&lt;/p&gt;

&lt;p&gt;For the record I deployed stable/mitaka following the instructions at
&lt;a href=&quot;http://docs.openstack.org/developer/tripleo-docs/&quot;&gt;tripleo-docs&lt;/a&gt; and setting
stable/mitaka repos in appropriate places. For example, during the &lt;a href=&quot;http://docs.openstack.org/developer/tripleo-docs/environments/environments.html#virtual-environment&quot;&gt;virt-setup&lt;/a&gt; and the
&lt;a href=&quot;http://docs.openstack.org/developer/tripleo-docs/installation/installation.html&quot;&gt;undercloud installation&lt;/a&gt;
I followed the ‘Stable Branch’ admonition and enabled mitaka repos like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo curl -o /etc/yum.repos.d/delorean-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/current/delorean.repo
sudo curl -o /etc/yum.repos.d/delorean-deps-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/delorean-deps.repo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then when &lt;a href=&quot;http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html#get-images&quot;&gt;building images&lt;/a&gt;
I enabled the mitaka repo like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO=&quot;http://trunk.rdoproject.org/centos7-mitaka/current/&quot;
export DELOREAN_REPO_FILE=&quot;delorean.repo&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The two issues I hit:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#pebcak&quot;&gt;The pebcak issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#overcloud_memory&quot;&gt;The overcloud needs moar memory bug&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;the-pebcak-issue&quot;&gt;The pebcak issue.&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;pebcak&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I call this the pebcak issue because, whilst there is indeed a &lt;a href=&quot;https://bugs.launchpad.net/tripleo/+bug/1584792&quot;&gt;bona-fide
bug&lt;/a&gt; here, I only
hit it because of a nit in my deployment command.&lt;/p&gt;

&lt;p&gt;My deployment command looked like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --libvirt-type qemu \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server &quot;pool.ntp.org&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Deploying like that ^^^ got me this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;The files ('overcloud-without-mergepy.yaml', 'overcloud.yaml') not found
in the /usr/share/openstack-tripleo-heat-templates/ directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Err.. no I’m pretty sure those files are there (!)&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# [stack@instack ~]$ ls -l /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  lrwxrwxrwx. 1 root root 14 Jun 17 08:55 /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml -&amp;gt; overcloud.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I know that message very likely comes from the tripleoclient, so I traced it.
The code has actually already been fixed on master, so grep gave me nothing there.
However, trying the same grep against stable/mitaka:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m python-tripleoclient]$ git checkout stable/mitaka
Switched to branch 'stable/mitaka'
[m@m python-tripleoclient]$ grep -rni &quot;not found in the&quot; ./*
./tripleoclient/v1/overcloud_deploy.py:414:  message = &quot;The files {0} not
found in the {1} directory&quot;.format(
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we can use git blame to get to the code review that fixed it. We know
which file the error message comes from, so we can blame that file on the
master branch. Since it is fixed there, some commit must have fixed it:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m python-tripleoclient]$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
[m@m python-tripleoclient]$ git blame tripleoclient/v1/overcloud_deploy.py

1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  382)     def _try_overcloud_deploy_with_compat_yaml(self, tht_root, stack,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  383)                                                stack_name, parameters,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  384)                                                environments, timeout):
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  385)         messages = ['The following errors occurred:']
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  386)         for overcloud_yaml_name in constants.OVERCLOUD_YAML_NAMES:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  387)             overcloud_yaml = os.path.join(tht_root, overcloud_yaml_name)
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  388)             try:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  389)                 self._heat_deploy(stack, stack_name, overcloud_yaml,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  390)                                   parameters, environments, timeout)
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  391)             except six.moves.urllib.error.URLError as e:
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  393)             else:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  394)                 return
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  395)         raise ValueError('\n'.join(messages))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The git blame output may not display well above, but I see the following line as
particularly interesting since it is different from stable/mitaka:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So now we can use git log to see the actual commit and check it is the one we
are looking for:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m python-tripleoclient]$ git log 7a05679e
commit 7a05679ebc944e3bec6f20c194c40fae1cf39d8d
Author: James Slagle &amp;lt;jslagle@redhat.com&amp;gt;
Date:   Fri Apr 1 08:57:41 2016 -0400

Show correct missing files when an error occurs

This function was swallowing all missing file exceptions, and then
printing a message saying overcloud.yaml or
overcloud-without-mergepy.yaml were not found.

The problem is that the URLError could occur for any missing file, such
as a missing environment file, typo in a relative patch or filename,
etc. And in those cases, the error message is actually quite misleading,
especially if the overcloud.yaml does exist at the exact shown path.

This change makes it such that the actual missing file paths are shown
in the output.

Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that sounds promising! So not only do we have the actual bug number, but
we have the Change-Id. We can use &lt;em&gt;that&lt;/em&gt; to get to the gerrit code review:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m ~]$ gimmeGerrit Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Where &lt;strong&gt;gimmeGerrit&lt;/strong&gt; is a bash alias in my .profile:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gimme_gerrit() {
    gerrit_url=&quot;http://review.openstack.org/#q,$1,n,z&quot;
    firefox $gerrit_url
}

alias gimmeGerrit=gimme_gerrit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
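The alias is just text munging on the Change-Id. Here is a self-contained sketch of the same trick, with the commit message from the git log output above inlined so it runs anywhere:

```shell
# Build the Gerrit search URL from a Change-Id footer, as gimmeGerrit does.
# The commit message is inlined here so the example is self-contained.
commit_msg='Show correct missing files when an error occurs

Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff'

# Pull out the Change-Id footer, then splice it into the search URL:
change_id=$(printf '%s\n' "$commit_msg" | sed -n 's/^Change-Id: //p')
gerrit_url="http://review.openstack.org/#q,${change_id},n,z"
echo "$gerrit_url"
```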

&lt;p&gt;So from the review to &lt;a href=&quot;https://review.openstack.org/#/c/300462/&quot;&gt;master&lt;/a&gt; I just
made a cherry-pick to &lt;a href=&quot;https://review.openstack.org/#/c/329438/&quot;&gt;stable/mitaka&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, the reason I was seeing this issue in the first place was that my deploy
command was indeed wrong (it’s just that the error message was eaten by this
particular bug). I was using ‘network_env.yaml’ but I had actually created
network-env.yaml. Yes, much palmface, but if I hadn’t I wouldn’t have backported
the fix, so meh.&lt;/p&gt;
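The whole trail (grep for the message, git blame the line, git log the commit, read the Change-Id) can be replayed end to end in a throwaway repo. Everything below is fabricated for illustration: the repo, the file and both commits stand in for python-tripleoclient.

```shell
# Replay the grep, git blame, git log trail from this post in a throwaway
# repo. The repo, file and commits are all made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo 'message = "files not found"' > deploy.py
git add deploy.py
git commit -q -m 'initial commit'
# A second commit "fixes" the line, carrying a Change-Id footer:
echo 'message = "The files {0} not found in the {1} directory"' > deploy.py
git commit -q -a -m 'Show correct missing files when an error occurs

Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff'
# grep for the error string, blame the line, then read the Change-Id:
file=$(grep -rl --exclude-dir=.git 'not found in the' . | head -1)
sha=$(git blame -l -- "$file" | awk '{print $1; exit}')
change_id=$(git log -1 --format=%B "$sha" | sed -n 's/^Change-Id: //p')
echo "$change_id"
```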

&lt;hr /&gt;

&lt;h3 id=&quot;the-overcloud-needs-moar-memory-bug&quot;&gt;The overcloud needs moar memory bug.&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;overcloud_memory&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is more or less well known in the tripleo community that 4GB overcloud nodes
will no longer cut it even in a virt environment, which is why we default to
5GB on current &lt;a href=&quot;https://github.com/openstack/instack-undercloud/blob/2dec7d7521799c0323d076cd66ba71ebb444c706/scripts/instack-virt-setup#L89&quot;&gt;master&lt;/a&gt;
instack-undercloud.&lt;/p&gt;
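Since instack-virt-setup only sets NODE_MEM as a default (NODE_MEM=${NODE_MEM:-5120}), exporting it before running the script overrides the overcloud node memory. A minimal sketch, where the 6144 value is purely illustrative:

```shell
# NODE_MEM is only a default in instack-virt-setup, so exporting it
# beforehand overrides the per-node memory (value is in MiB).
# 6144 here is just an illustrative value, not a recommendation.
export NODE_MEM=6144
echo "overcloud nodes will get ${NODE_MEM} MiB each"
```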

&lt;p&gt;I was seeing OOM issues on the overcloud nodes with current stable/mitaka like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;16021:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]: u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Constraint::Base[storage_mgmt_vip-then-haproxy]/Exec[Creating order constraint storage_mgmt_vip-then-haproxy]: Could not evaluate: Cannot allocate memory - fork(2)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Resource::Service[openstack-nova-novncproxy]/Pacemaker::Resource::Systemd[openstack-nova-novncproxy]/Pcmk_resource[openstack-nova-novncproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs resource show openstack-nova-novncproxy &amp;gt; /dev/null 2&amp;gt;&amp;amp;1 2&amp;gt;&amp;amp;1\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-vncproxy-then-nova-api-constraint]/Exec[Creating order constraint nova-vncproxy-then-nova-api-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-api-with-nova-vncproxy-colocation]/Pcmk_constraint[colo-openstack-nova-api-clone-openstack-nova-novncproxy-clone]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: 
/Stage[main]/Main/Pacemaker::Constraint::Base[nova-consoleauth-then-nova-vncproxy-constraint]/Exec[Creating order constraint nova-consoleauth-then-nova-vncproxy-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-vncproxy-with-nova-consoleauth-colocation]/Pcmk_constraint[

16313:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Sahara::Service::Api/Service[sahara-api]: Could not
evaluate: Cannot allocate memory - fork(2)
16314:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]:
Could not evaluate: Cannot allocate memory - fork(2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Suspecting from previous experience this would be defaulted in instack-undercloud:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m instack-undercloud]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
[m@m instack-undercloud]$ grep -rni 'NODE_MEM' ./*
./scripts/instack-virt-setup:89:export NODE_MEM=${NODE_MEM:-5120}

[m@m instack-undercloud]$ git blame scripts/instack-virt-setup | grep  NODE_MEM
2dec7d75 (Carlos Camacho  2016-03-30 09:17:44 +0000  89) export NODE_MEM=${NODE_MEM:-5120}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So using git log to see more about 2dec7d75:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[m@m instack-undercloud]$ git log 2dec7d75
commit 2dec7d7521799c0323d076cd66ba71ebb444c706
Author: Carlos Camacho &amp;lt;ccamacho@redhat.com&amp;gt;
Date:   Wed Mar 30 09:17:44 2016 +0000

    Overcloud is not able to deploy with the default 4GB of RAM using instack-undercloud

    When deploying the overcloud with the default value of 4GB of RAM the overcloud fails throwing &quot;Cannot allocate memory&quot; errors.
    By increasing the default memory to 5GB the error is solved in instack-undercloud

    Change-Id: I29036edeebefc1959643a04c5396e72863fdca5f
    Closes-Bug: #1563750
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So as in the case of the pebcak issue, gimmeGerrit yields the &lt;a href=&quot;https://review.openstack.org/#/c/299232/&quot;&gt;review&lt;/a&gt;
so I then just cherry-picked that to &lt;a href=&quot;https://review.openstack.org/#/c/329874/&quot;&gt;stable/mitaka&lt;/a&gt;
too.&lt;/p&gt;

</description>
<published>2016-06-17 00:00:00 +0300</published>
<link>http://mariosandreou.com/tripleo/2016/06/17/deploy-tripleo-stable-mitaka.html</link>
</item>

<item>
<title>Monitoring a tripleo Overcloud upgrade</title>
<description>&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;blog_title&quot;&gt;

Monitoring a tripleo Overcloud upgrade

&lt;/div&gt;

&lt;p&gt;The tripleo overcloud upgrades workflow (&lt;a href=&quot;https://review.openstack.org/#/c/308985/&quot;&gt;WIP Docs&lt;/a&gt;)
has been well tested for upgrades to stable/liberty. There is &lt;a href=&quot;https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades&quot;&gt;ongoing work&lt;/a&gt;
to adapt this workflow for upgrades to stable/mitaka/newton (current master),
as well as to change the process altogether and make it &lt;a href=&quot;https://review.openstack.org/#/c/319264/&quot;&gt;more composable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post is a description of the kinds of things I look for when monitoring a
stable/liberty upgrade: verification points after a given step and some
explanation at various points that may or may not be helpful. I recently had to
share a lot of this information as part of a customer POC upgrade and thought
it would be useful to have it written down somewhere.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#undercloud&quot;&gt;Upgrade the undercloud&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrade_init&quot;&gt;Upgrade init step&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrade_controllers&quot;&gt;Upgrade controllers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrade_compute_and_ceph&quot;&gt;Upgrade compute and ceph nodes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upgrade_converge&quot;&gt;Upgrade converge - apply config deployment wide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For reference, the overcloud being upgraded in the examples below was deployed
like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server '0.fedora.pool.ntp.org'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;upgrade-your-undercloud&quot;&gt;Upgrade your undercloud.&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;undercloud&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first thing to check for, and very likely re-instate, is any post-install
customization you made to your undercloud, such as the creation of a new ovs
interface for talking to your overcloud nodes, or any custom IP routes. The
undercloud upgrade will revert those and you’ll have to re-add/create them.&lt;/p&gt;
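One low-tech precaution is to snapshot that state before upgrading so anything the upgrade reverts is easy to re-create afterwards. A rough sketch, assuming routes and ovs bridges are the customizations you care about; the paths are illustrative and each command is hedged with || true since not every tool is present everywhere:

```shell
# Snapshot undercloud network customizations before the upgrade so anything
# the upgrade reverts is easy to re-create. Paths are illustrative.
backup=$(mktemp -d)
ip route show > "$backup/routes.txt" 2>/dev/null || true
ovs-vsctl show > "$backup/ovs.txt" 2>/dev/null || true
ls "$backup"
```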

&lt;p&gt;The upgrade to liberty delivers a new &lt;em&gt;upgrade-non-controller.sh&lt;/em&gt; script for
the undercloud, so you can check this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[stack@instack ~]$ which upgrade-non-controller.sh
/bin/upgrade-non-controller.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Other than that I always just sanity check that services are running OK post
upgrade:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[stack@instack ~]$ openstack-service status
MainPID=2107 Id=neutron-dhcp-agent.service ActiveState=active
MainPID=2106 Id=neutron-openvswitch-agent.service ActiveState=active
MainPID=1191 Id=neutron-server.service ActiveState=active
MainPID=1232 Id=openstack-glance-api.service ActiveState=active
MainPID=1172 Id=openstack-glance-registry.service ActiveState=active
MainPID=1201 Id=openstack-heat-api-cfn.service ActiveState=active
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;execute-the-upgrade-initialization-step&quot;&gt;Execute the upgrade initialization step&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrade_init&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is called the initialization step since it sets up the repos on the
overcloud nodes (for the upgrade we are going to) and delivers the upgrade
script to the non-controller nodes. This step is instigated through the
inclusion of the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker-init.yaml&quot;&gt;major-upgrade-pacemaker-init.yaml&lt;/a&gt;
in the deployment command. For example:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
  -e network_env.yaml --ntp-server '0.fedora.pool.ntp.org'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the heat stack has gone to &lt;em&gt;UPDATE_COMPLETE&lt;/em&gt; you can check all non controller
nodes for the presence of the newly delivered upgrade script &lt;em&gt;tripleo_upgrade_node.sh&lt;/em&gt;:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-novacompute-0 ~]# ls -l /root
-rwxr-xr-x. 1 root root 348 Jun  3 11:26 tripleo_upgrade_node.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One point to note is that the rpc version used for pinning nova rpc during
the upgrade is set in the compute upgrade script:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-novacompute-0 ~]# cat tripleo_upgrade_node.sh
### DO NOT MODIFY THIS FILE
### This file is automatically delivered to the compute nodes as part of the
### tripleo upgrades workflow

# pin nova to kilo (messaging +-1) for the nova-compute service

crudini  --set /etc/nova/nova.conf upgrade_levels compute mitaka

yum -y install python-zaqarclient  # needed for os-collect-config
yum -y update
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The upgrade_levels compute line above is actually written using the
parameter we passed in the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker-init.yaml#L2&quot;&gt;major-upgrade-pacemaker-init.yaml&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You should also see the updated /etc/yum.repos.d/* on all overcloud nodes after
this step, so you can confirm that all is in order for the upgrade to proceed.&lt;/p&gt;
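A quick sanity check is to grep the delorean repo file for the release you expect. A self-contained sketch: the repo file contents are inlined here (and the release name is illustrative), where on a real node you would grep /etc/yum.repos.d/delorean.repo directly:

```shell
# Sanity-check that a delorean repo file points at the expected release.
# The repo file is written to a temp path so the sketch is runnable;
# on an overcloud node you would check /etc/yum.repos.d/ directly.
repo_file=$(mktemp)
printf '[delorean]\nname=delorean\nbaseurl=http://trunk.rdoproject.org/centos7-mitaka/current/\nenabled=1\n' > "$repo_file"

expected=mitaka   # illustrative; use the release you are upgrading to
if grep -q "centos7-${expected}" "$repo_file"; then
  result="repo points at ${expected}"
else
  result="unexpected release in ${repo_file}"
fi
echo "$result"
```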

&lt;hr /&gt;

&lt;h3 id=&quot;upgrade-controller-nodes-and-your-entire-pacemaker-cluster&quot;&gt;Upgrade controller nodes (and your entire pacemaker cluster)&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrade_controllers&quot;&gt; &lt;/a&gt;
&lt;em&gt;(I skipped upgrading the swift nodes, as there isn’t much of interest to say about it; see
the &lt;a href=&quot;https://review.openstack.org/#/c/308985/&quot;&gt;WIP Docs&lt;/a&gt; for more or ping me).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This step upgrades your controller nodes, and during this process the entire
pacemaker cluster will be taken offline - this is normal. The step
is instigated by including the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker.yaml&quot;&gt;major-upgrade-pacemaker.yaml&lt;/a&gt;
environment file. For example:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack overcloud deploy --templates /home/stack/tripleo-heat-templates
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml
  --control-scale 3 --compute-scale 1 --libvirt-type qemu
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml
  -e network_env.yaml --ntp-server '0.fedora.pool.ntp.org'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I typically observe the pacemaker cluster during the upgrade process. For
example on controller-1 I have &lt;strong&gt;watch -d pcs status&lt;/strong&gt; and on controller-2 I
have &lt;strong&gt;watch -d 'pcs status | grep -ni stop -C 2'&lt;/strong&gt; (note the quotes, so
the grep runs inside watch rather than on its output). During the upgrade the
pacemaker cluster goes down completely at some point, before the yum packages are
updated, and then the cluster is brought back up.&lt;/p&gt;

&lt;p&gt;Once you start to see pacemaker services go down it means that the code in
&lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh&quot;&gt;major_upgrade_controller_pacemaker_1.sh&lt;/a&gt;
 is running and eventually the cluster is stopped completely.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Every 2.0s: pcs status | grep -ni stop -C2 -B1                                                               Fri Jun  3 11:52:07 2016

Error: cluster is not currently running on this node
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At this point you can start to monitor /var/log/yum.log to see packages being
upgraded.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-controller-0 ~]# tail -f /var/log/yum.log
Jun 03 11:51:52 Updated: erlang-otp_mibs-18.3.3-1.el7.x86_64
Jun 03 11:51:52 Installed: python2-rjsmin-1.0.12-2.el7.x86_64
Jun 03 11:51:52 Updated: python-django-compressor-2.0-1.el7.noarch
Jun 03 11:51:53 Updated: ntp-4.2.6p5-22.el7.centos.2.x86_64
Jun 03 11:51:53 Updated: rabbitmq-server-3.6.2-3.el7.noarch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the cluster starts to come back online and services start then
you know that &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh&quot;&gt;major_upgrade_controller_pacemaker_2.sh&lt;/a&gt;
is being executed.&lt;/p&gt;
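
&lt;p&gt;One rough way to watch the services coming back is to count the resources
that pcs reports as started (a sketch only - the expected total depends on your
deployment):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-controller-0 ~]# watch -d 'pcs status | grep -c Started'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;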

&lt;p&gt;After the stack is UPDATE_COMPLETE, you can check that the rpc pin is set in
nova.conf on all controllers:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-controller-0 ~]# grep -rni upgrade -A 1 /etc/nova/*
/etc/nova/nova.conf:106:[upgrade_levels]
/etc/nova/nova.conf-107-compute = mitaka
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;upgrade-compute-and-ceph-nodes&quot;&gt;Upgrade compute and ceph nodes&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrade_compute_and_ceph&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This uses the &lt;a href=&quot;https://github.com/openstack/tripleo-common/blob/463cf7f922291dc47593caabe5ef4e8b728c2f55/scripts/upgrade-non-controller.sh&quot;&gt;upgrade-non-controller.sh&lt;/a&gt; script to execute
the &lt;em&gt;tripleo_upgrade_node.sh&lt;/em&gt; on each non-controller node, for example:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[stack@instack ~]$ upgrade-non-controller.sh --upgrade overcloud-novacompute-0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On both node types you can check that the yum update has been executed
successfully. Note that the &lt;em&gt;tripleo_upgrade_node.sh&lt;/em&gt; script is customized for
each node type, so it &lt;em&gt;will&lt;/em&gt; differ between compute and ceph nodes for
example. However, in all cases there will at some point be a
&lt;strong&gt;yum -y update&lt;/strong&gt;. See &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/major_upgrade_compute.sh&quot;&gt;major_upgrade_compute.sh&lt;/a&gt;
and &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/major_upgrade_ceph_storage.sh&quot;&gt;major_upgrade_ceph_storage.sh&lt;/a&gt; for
more info on how else they might differ.&lt;/p&gt;
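
&lt;p&gt;As on the controllers, /var/log/yum.log is a quick way to confirm that the
&lt;strong&gt;yum -y update&lt;/strong&gt; really did run on a given node, for example:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-novacompute-0 ~]# grep 'Updated:' /var/log/yum.log | tail -n 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;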

&lt;p&gt;For compute nodes you can check that upgrade_levels is set for the nova
rpc pinning in /etc/nova/nova.conf (which in the case of computes is used by
nova-compute itself; the api/scheduler/conductor services live on the controllers).&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-novacompute-0 ~]# grep -rni upgrade -A 1 /etc/nova/*
/etc/nova/nova.conf:106:[upgrade_levels]
/etc/nova/nova.conf-107-compute = mitaka
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;upgrade-converge---apply-config-deployment-wide-and-restart-things&quot;&gt;Upgrade converge - apply config deployment wide and restart things.&lt;/h3&gt;
&lt;p&gt;&lt;a id=&quot;upgrade_converge&quot;&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The last step in the upgrade workflow is where we re-apply the deployment-wide
config as specified by the tripleo-heat-templates used in the deploy/upgrade
commands. It is instigated by including the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker-converge.yaml&quot;&gt;major-upgrade-pacemaker-converge.yaml&lt;/a&gt; environment file, for example:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack overcloud deploy --templates /home/stack/tripleo-heat-templates
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml
  --control-scale 3 --compute-scale 1 --libvirt-type qemu
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml
  -e network_env.yaml --ntp-server '0.fedora.pool.ntp.org'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For both &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker-init.yaml&quot;&gt;major-upgrade-pacemaker-init.yaml&lt;/a&gt;
(upgrade initialisation) as well as &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker.yaml&quot;&gt;major-upgrade-pacemaker.yaml&lt;/a&gt;
(controller upgrade) we specify for the resource registry:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;OS::TripleO::ControllerPostDeployment: OS::Heat::None
OS::TripleO::ComputePostDeployment: OS::Heat::None
OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
OS::TripleO::CephStoragePostDeployment: OS::Heat::None
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which means that things like &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/puppet/controller-config-pacemaker.yaml&quot;&gt;controller-config-pacemaker.yaml&lt;/a&gt;
&lt;em&gt;do not&lt;/em&gt; run for controllers during those steps. That is, application of
the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/tree/bcd726f1242d78169e6a5687e998473c1043c622/puppet/manifests&quot;&gt;overcloud_**.pp manifests&lt;/a&gt;
does not happen during upgrade initialisation or the controller upgrade.&lt;/p&gt;

&lt;p&gt;However for converge we simply do not override this in the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/environments/major-upgrade-pacemaker-converge.yaml&quot;&gt;major-upgrade-pacemaker-converge.yaml&lt;/a&gt;
environment file, so that the normal puppet manifests get applied for each node,
delivering any config changes (e.g. updates to liberty had to deal with a
rabbitmq password change causing issues such &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1321132&quot;&gt;as this&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Since we are applying new config, we need to make sure everything is restarted
to pick it up, so we run
&lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/pacemaker_resource_restart.sh&quot;&gt;pacemaker_resource_restart.sh&lt;/a&gt; after the normal puppet manifests are applied.&lt;/p&gt;

&lt;p&gt;So during this step the pacemaker cluster will first go into an “unmanaged”
state - this is to be expected and not a cause for alarm. It happens because,
as a matter of practice, before applying the controller puppet manifest we set
the cluster to maintenance mode (as we are going to write to the pacemaker
resource definitions/constraints in the cib) &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/pre_puppet_pacemaker.yaml&quot;&gt;like this&lt;/a&gt;,
which uses the &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/pacemaker_maintenance_mode.sh&quot;&gt;script here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After the manifest is applied we unset maintenance mode &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/post_puppet_pacemaker.yaml&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
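
&lt;p&gt;If you want to see this happening you can watch the maintenance-mode cluster
property on one of the controllers - it should flip to true and then back to
false around the application of the manifest (assuming your pcs version
supports querying a single property like this):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-controller-0 ~]# watch -d 'pcs property show maintenance-mode'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;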

&lt;p&gt;You should then see services restarting as &lt;a href=&quot;https://github.com/openstack/tripleo-heat-templates/blob/bcd726f1242d78169e6a5687e998473c1043c622/extraconfig/tasks/pacemaker_resource_restart.sh&quot;&gt;pacemaker_resource_restart.sh&lt;/a&gt; is executed. Seeing all the services running again at this
point is a good indication that the converge step is coming to a successful end.&lt;/p&gt;
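
&lt;p&gt;Once converge has gone to UPDATE_COMPLETE, a couple of simple sanity checks
are to confirm that pcs reports nothing stopped and that the openstack services
are reporting in - for example for nova (obviously check whichever services
matter to your deployment):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@overcloud-controller-0 ~]# pcs status | grep -ni stop -C 2
[stack@instack ~]$ source overcloudrc
[stack@instack ~]$ nova service-list
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;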

</description>
<published>2016-06-03 00:00:00 +0300</published>
<link>http://mariosandreou.com/tripleo/2016/06/03/monitor-tripleo-upgrade.html</link>
</item>

</channel>
</rss>
