<h1>Running Jekyll on Amazon Linux 2</h1>
<p>Switching to AWS Workspaces using the Amazon Linux 2 image as my main development machine, I was struggling to get Jekyll running. There are a few differences from Ubuntu and the <a href="https://jekyllrb.com/docs/installation/">Jekyll installation instructions</a>.</p>
<p>First, install Ruby using the amazon-linux-extras command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>amazon-linux-extras list
0 ansible2 available <span class="o">[</span> <span class="o">=</span>2.4.2 <span class="o">=</span>2.4.6 <span class="o">]</span>
2 httpd_modules available <span class="o">[</span> <span class="o">=</span>1.0 <span class="o">]</span>
3 memcached1.5 available <span class="o">[</span> <span class="o">=</span>1.5.1 <span class="o">]</span>
4 nginx1.12 available <span class="o">[</span> <span class="o">=</span>1.12.2 <span class="o">]</span>
5 postgresql9.6 available <span class="o">[</span> <span class="o">=</span>9.6.6 <span class="o">=</span>9.6.8 <span class="o">]</span>
6 postgresql10 available <span class="o">[</span> <span class="o">=</span>10 <span class="o">]</span>
8 redis4.0 available <span class="o">[</span> <span class="o">=</span>4.0.5 <span class="o">=</span>4.0.10 <span class="o">]</span>
9 R3.4 available <span class="o">[</span> <span class="o">=</span>3.4.3 <span class="o">]</span>
10 rust1 available <span class="se">\</span>
<span class="o">[</span> <span class="o">=</span>1.22.1 <span class="o">=</span>1.26.0 <span class="o">=</span>1.26.1 <span class="o">=</span>1.27.2 <span class="o">=</span>1.31.0 <span class="o">]</span>
11 vim available <span class="o">[</span> <span class="o">=</span>8.0 <span class="o">]</span>
13 ruby2.4<span class="o">=</span>latest enabled <span class="o">[</span> <span class="o">=</span>2.4.2 <span class="o">=</span>2.4.4 <span class="o">]</span>
15 php7.2 available <span class="se">\</span>
<span class="o">[</span> <span class="o">=</span>7.2.0 <span class="o">=</span>7.2.4 <span class="o">=</span>7.2.5 <span class="o">=</span>7.2.8 <span class="o">=</span>7.2.11 <span class="o">=</span>7.2.13 <span class="o">=</span>7.2.14 <span class="o">]</span>
16 php7.1 available <span class="o">[</span> <span class="o">=</span>7.1.22 <span class="o">=</span>7.1.25 <span class="o">]</span>
17 lamp-mariadb10.2-php7.2 available <span class="se">\</span>
<span class="o">[</span> <span class="o">=</span>10.2.10_7.2.0 <span class="o">=</span>10.2.10_7.2.4 <span class="o">=</span>10.2.10_7.2.5
<span class="o">=</span>10.2.10_7.2.8 <span class="o">=</span>10.2.10_7.2.11 <span class="o">=</span>10.2.10_7.2.13
<span class="o">=</span>10.2.10_7.2.14 <span class="o">]</span>
18 <span class="nv">libreoffice</span><span class="o">=</span>latest enabled <span class="o">[</span> <span class="o">=</span>5.0.6.2_15 <span class="o">=</span>5.3.6.1 <span class="o">]</span>
19 <span class="nv">gimp</span><span class="o">=</span>latest enabled <span class="o">[</span> <span class="o">=</span>2.8.22 <span class="o">]</span>
20 <span class="nv">docker</span><span class="o">=</span>latest enabled <span class="se">\</span>
<span class="o">[</span> <span class="o">=</span>17.12.1 <span class="o">=</span>18.03.1 <span class="o">=</span>18.06.1 <span class="o">]</span>
21 mate-desktop1.x<span class="o">=</span>latest enabled <span class="o">[</span> <span class="o">=</span>1.19.0 <span class="o">=</span>1.20.0 <span class="o">]</span>
22 GraphicsMagick1.3<span class="o">=</span>latest enabled <span class="o">[</span> <span class="o">=</span>1.3.29 <span class="o">]</span>
23 tomcat8.5 available <span class="o">[</span> <span class="o">=</span>8.5.31 <span class="o">=</span>8.5.32 <span class="o">]</span>
24 epel available <span class="o">[</span> <span class="o">=</span>7.11 <span class="o">]</span>
25 testing available <span class="o">[</span> <span class="o">=</span>1.0 <span class="o">]</span>
26 ecs available <span class="o">[</span> <span class="o">=</span>stable <span class="o">]</span>
27 corretto8 available <span class="se">\</span>
<span class="o">[</span> <span class="o">=</span>1.8.0_192 <span class="o">=</span>1.8.0_202 <span class="o">]</span>
28 firecracker available <span class="o">[</span> <span class="o">=</span>0.11 <span class="o">]</span>
29 golang1.11 available <span class="o">[</span> <span class="o">=</span>1.11.3 <span class="o">]</span>
30 squid4 available <span class="o">[</span> <span class="o">=</span>4 <span class="o">]</span>
<span class="nv">$ </span>amazon-linux-extras <span class="nb">install </span>ruby2.4</code></pre></figure>
<p>Then, install the additional packages, Jekyll, and Bundler:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>yum <span class="nb">install </span>ruby-rdoc ruby-devel <span class="nt">-y</span>
<span class="nb">sudo </span>gem <span class="nb">install </span>jekyll bundler</code></pre></figure>
<p>To run Jekyll, I had to invoke it via <code class="highlighter-rouge">bundle exec</code> rather than calling <code class="highlighter-rouge">jekyll s</code> directly:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">bundle <span class="nb">exec </span>jekyll serve</code></pre></figure>Aaron DoddSwitching to AWS Workspaces using the Amazon Linux 2 image as my main development machine, I was struggling to get Jekyll running. There are a few differences from Ubuntu and the Jekyll installation instructions.Effective report visualization: an example2018-10-26T00:00:00+00:002018-10-26T00:00:00+00:00http://www.aarondodd.com/reporting/visualizations/2018/10/26/intro_to_effective_visualization<p>A few years ago we had inherited the support of several dozen applications from another vendor, partially because that vendor was not meeting their Service Level Agreement (SLA) obligations. After we started, we realized one of the issues was a lack of resources. Other efficiencies were needed, but people were as well. We effectively convinced the client to gradually expand the team to what we projected was needed, with the caveat that we would pay a severe penalty if the added people (and cost) did not bring SLAs up within a year and into compliance after two.</p>
<p>The delivery manager on that project had fifteen minutes to present our improvements to the client’s management, so the report needed to be brief and easy to understand, ideally fitting on one PowerPoint slide. I had recently read Edward R. Tufte’s amazing book <em>The Visual Display of Quantitative Information</em>, so when he reached out for help I was excited to apply the concepts to something concrete.</p>
<h2 id="determining-the-message">Determining the message</h2>
<p>The first step was to understand exactly what story we wanted to tell. Aside from our potential non-compliance fee, we certainly would not be renewed if we did not meet our goals, so showing our improvement was critical to keeping the client. Highlighting how well we improved support would also enhance our reputation, leading to more work.</p>
<p>We had also identified a potential area the client could improve, so finding a way to highlight this was important for our sales team.</p>
<p>After reviewing the amended contract and our activities, we developed the following objectives:</p>
<ul>
<li>Track the addition of people over the past two years</li>
<li>Show our SLA compliance and mean-time-to-resolve (MTTR) for requests as these were the key performance indicators in our contract</li>
<li>Graph the request volumes by severity, since higher severity tickets require more attention, and total volume is a measure of our throughput</li>
<li>Add the client’s deployment counts since we knew these affected ticket volumes: whenever the client deployed new code, we were inundated with high severity tickets. This would also give our account manager a segue into proposing an optimization of their deployment process (a potential upsell)</li>
</ul>
<h2 id="gathering-the-data">Gathering the data</h2>
<p>One of my guiding principles is loosely based on Peter Drucker: “You can’t manage what you don’t measure.” Fortunately, we had the data to meet our story goals, although it required some effort to extract:</p>
<ul>
<li>Request volumes had to be queried from the ticketing system’s database since the built-in reports didn’t work</li>
<li>MTTR required matching ticket IDs with the activity in JIRA, since tickets were mirrored there and end users rarely marked tickets resolved in the ticketing system</li>
<li>Deployments could be gathered from NewRelic, since each deployment set a flag so application performance could be tracked by releases</li>
</ul>
<p>The end result was the following table in Excel:</p>
<p><img src="/assets/images/visualization/table.png" alt="source table" class="img-align-center" /></p>
<h2 id="the-first-attempt">The first attempt</h2>
<p>At this point, the delivery manager drafted an initial report for our account team to review and I went back to my regularly scheduled programming. A few days later, he sent me the following chart, along with a long textual explanation, which the account team rejected:</p>
<p><img src="/assets/images/visualization/initial_chart.png" alt="source table" class="img-align-center" /></p>
<p>I admit, I was scratching my head looking at the chart.</p>
<p>I ignored the explanation and asked him to step me through it. The points we wanted to address are indeed in there; they are just lost, drowning in a combination of misaligned scales and jumbled elements. When he split the chart into three separate graphs, the story started coming together, but it was still lacking and would not fit on a single slide.</p>
<p>We went through a few iterations to clearly show the message, but before I get to that, let’s pick this apart with Tufte’s help.</p>
<p>The single chart of all metrics is a “visual puzzle” which requires the reader to “interpret through a verbal rather than a visual process” (Tufte, 153)<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. Executives have limited attention spans and time, so the more concisely we can present our data, the more likely our point will be made.</p>
<p>Tufte says that “graphics can be designed to have at least three viewing depths: (1) what is seen from a distance … (2) what is seen up close and in detail … and (3) what is seen implicitly, underlying the graphic” (Tufte, 155)<sup id="fnref:1:1"><a href="#fn:1" class="footnote">1</a></sup>. From a distance, nothing in this chart is significant. Total Tickets and Regular Tickets are the highest two lines. But, Total Tickets is not necessary information since it just sums the other metrics. Trying to drill down, with Regular Tickets so high and still tied to the same vertical axis as the other metrics, the lower lines are lost in a tangle of visually similar values, hiding any information they should show. Implicitly, no trends or correlations are obvious.</p>
<p>The black bar in the center was intended to show the break between 2016 and 2017, also delineating the change between starting to add more people and finally having a full team. Instead, it stands out as the first thing the eye sees and conveys no meaningful information.</p>
<p>The gridlines are heavy, with both horizontal and vertical lines. The left axis is dense, with too many labeled points. The legend’s border and background are unnecessary, as is its placement to the right, cutting into the graph area. The axes are labelled poorly, with text on the vertical oriented sideways and generically called Values. The horizontal label is redundant since the axis points clearly convey their meaning. Since “every bit of ink on a graphic requires reason” (Tufte, 96)<sup id="fnref:1:2"><a href="#fn:1" class="footnote">1</a></sup>, these elements are “chartjunk”, distracting or extraneous flourishes that add no value (Tufte, 113)<sup id="fnref:1:3"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<p>Another issue I saw was in placing metrics of various values on the same scale. SLA Compliance is a value from 0 to 100, meaning it will never be above the lower lines of the graph. Full Time Employees started at 6 and ended at 10, making it nearly invisible. Even among the ticket counts, Regular Tickets reach 498 while the high severity (and more important) tickets never exceed 40.</p>
<p>Glancing back to the original data table, I can divine more information reading that than I can from this chart.</p>
<h2 id="revising">Revising</h2>
<p>The first item I wanted to address was the wide disparity between the lines shown. Since Total Tickets is redundant information, I removed that metric completely. We had only included it because we thought it would be too much effort for someone to understand the scale of requests otherwise, but “it is a frequent mistake in thinking about statistical graphics to underestimate the audience … why not assume that if you understand it, most readers will, too?” (Tufte, 136)<sup id="fnref:1:4"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<p>The Regular Tickets scale was far larger than the severity tickets, so I added a second vertical axis just for that measure. This allows us to show a comparative change in each type of ticket at a similar view, while still giving the reader the underlying numbers. In this way, trends of each type of ticket and their relations to each other are more accessible.</p>
<p>This left me considering the other metrics: SLA Compliance, Av. Ticket TTR, and Full Time Employees. These do not match the scale of the ticket counts, so they should be moved to separate graphs. But I still wanted to compare the relationship between these three (fewer employees correlate with higher response times and lower compliance). To keep these together I changed the scale. For the left vertical axis, I kept the SLA Compliance range, but set the minimum and maximum values to the actual data end points (0.2 and 1.0) instead of arbitrarily starting at 0.</p>
<p>The Full Time Employees ranged from 6 to 10 and the Av. Ticket TTR (min) ranged from 17 minutes to 92 minutes. On consideration, the TTR times below 20 minutes weren’t really relevant (20 could be a good cut-off point since it would be unreasonable to resolve a ticket in 0 minutes). I decided to alter the scale of that metric to min/10, reducing a time of, for example, 92 minutes to a value of 9.2. This would then show the TTR metric on the same 0 to 10 range as Full Time Employees. Since the SLA axis was set from 0.2 to 1, I created a right axis for Employees and TTR (min/10) ranging from 2 to 10.</p>
<p>The original graphic attempted to combine two time periods on one graph: 2016 versus 2017. I still wanted to keep this idea, as the time progression naturally lends itself to left-to-right viewing and breaking the charts up by year would duplicate various non-data-ink elements. Instead, I decided to create a quadrant layout: the left column would show 2016 data and the right would show 2017; the top would show ticket values and the bottom would show compliance, time, and team members. The horizontal axis remains the same, so I kept only one label for that.</p>
<p>I took inspiration for this stacked chart form from New York City’s Weather for 1980 below (Tufte, 30)<sup id="fnref:1:5"><a href="#fn:1" class="footnote">1</a></sup> as it kept the horizontal axis the same while combining related sets of data. This lets one vertically compare various metrics at different scales along the same timeline while still being able to compare the period as a whole.</p>
<p><img src="/assets/images/visualization/tufte_weather.jpg" alt="New York City's Weather for 1980" class="img-align-center" /></p>
<p>Next, the awful black line in the original graphic was erased completely, including the underlying data-ink, to create a white line running down both charts to subtly separate the years. I used the horizontal axis as the division between the top and bottom graphs. Combined with the white line, this creatively uses text and whitespace to create the quadrants.</p>
<p>The vertical grid lines and axis lines were removed. For the data lines, if a graphic suggests a horizon then it “also suggests that a shaded, high contrast display might … be better than the floating snake” (Tufte, 187)<sup id="fnref:1:6"><a href="#fn:1" class="footnote">1</a></sup>, so I filled the areas. Not satisfied with multiple metrics overlapping and hiding each other, I set a transparency to allow the reader to still view each area and line.</p>
<p>Since there is no intuitive hierarchy between colors, but variations of shading imply a direct order (Tufte, 154)<sup id="fnref:1:7"><a href="#fn:1" class="footnote">1</a></sup>, I decided that Outage Tickets, being the highest severity, should be red, with Sev 1 an orange and Sev 2 a yellow, using an intuitive temperature order. I assigned the Regular Tickets a neutral blue as they are far less important, but still worth highlighting as a contrasting measure.</p>
<p>I still wanted to show the impact of Deployments on the workload, but there was no particular importance order to that measure, so a colored area felt arbitrary and distracting, muddying the order given to the other tickets. Instead, I left that as a line but with the weight reduced and the points themselves increased. I decided similarly for the Full Time Employees measure as it was essentially just an increasing line and didn’t need to cover the SLA or TTR values.</p>
<p>Lastly, my point about the left versus right portions needed some highlighting. Since Tufte says “use words, numbers, and drawing together” (Tufte, 177)<sup id="fnref:1:8"><a href="#fn:1" class="footnote">1</a></sup>, I decided to write directly on the chart, as well as to clearly describe the vertical axis.</p>
<p>The result of these alterations is below:</p>
<p><img src="/assets/images/visualization/revised_chart_1.png" alt="revised graph" class="img-align-center" /></p>
<p>I can quickly see the request volumes and how deployments align to increased high-severity work. The inverse valleys and peaks of deployments against regular requests are shown. The impact of the number of employees on the key performance indicators for our group is quickly discerned at the bottom. Between the two graphs, it is even possible to see that, while the number of tickets and deployments remained about the same on average, the response times and compliance are much improved in 2017.</p>
<h2 id="another-revision">Another revision</h2>
<blockquote>
<p>“Just as a good editor of prose ruthlessly prunes out unnecessary words, so a designer of statistical graphics should prune out ink that fails to present fresh data-information” (Tufte, 100)<sup id="fnref:1:9"><a href="#fn:1" class="footnote">1</a></sup>.</p>
</blockquote>
<p>We had fun coming up with the revised chart, but why stop there? We wondered if the breakdown of severity tickets was even required. If I combine Outage, Sev 1, and Sev 2 I can further simplify the graph while still showing the correlations with deployments and regular work. As a front-line manager, I would care about the severity details, but as a summary for executives, it could be irrelevant.</p>
<p>I could reduce the non-data-ink even further by removing my axis labels and combining the legend with the related axis, which might be a little off-putting at first glance but feels more intuitive.</p>
<p>Our final revision:</p>
<p><img src="/assets/images/visualization/revised_chart_2.png" alt="final graph" class="img-align-center" /></p>
<p>Merging the higher severity tickets together added another dimension to the story: you clearly see regular requests increase when deployments are low, and severity requests increase with deployments. This allowed us to dig further and discover that many on the development side were simply sitting on new work while preparing for a deployment, and submitting their backlog when they had time. This was another point our sales team could add to the deployment optimization proposal.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Far from being a theoretical book, Tufte’s <em>The Visual Display of Quantitative Information</em> is a practical guide for designing compelling statistical graphics. It does not give a how-to approach so much as best practices supported with real-world examples.</p>
<p>The key points used here were:</p>
<ul>
<li>Start with the story–what does (or should) the graphic portray–then think how to visualize it.</li>
<li>Non-data “ink” should be kept to a minimum. Extraneous marks detract from the data, which should be the focus (gridlines can be subdued or often removed).</li>
<li>Don’t be afraid to use white space, even erasing parts of the graphic.</li>
<li>Multivariate data needs to be related and properly scaled.</li>
<li>Colors have no inherent order, but using related colors (shades or “temperature”) can imply priority. Avoid assigning random colors.</li>
<li>Lines that suggest a horizon generally work better with shaded areas. Opacity can help highlight overlaps without hiding information.</li>
<li>If multiple sets of data tell a coherent story together, don’t be afraid to get creative in combining them.</li>
<li>Avoid duplication (of data and other elements) and arbitrary axis scales.</li>
</ul>
<p>This is just a small use of the concepts. I highly recommend the book to anyone who has to visualize information.</p>
<p class="footnotes-title">References and further reading:</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Tufte, Edward R. <a href="https://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142/ref=sr_1_1?ie=UTF8&qid=1540584572&sr=8-1&keywords=the+visual+display+of+quantitative+information&dpID=41tNVlRHZNL&preST=_SX218_BO1,204,203,200_QL40_&dpSrc=srch"><em>The Visual Display of Quantitative Information</em></a>. Graphics Press, 2001. <a href="#fnref:1" class="reversefootnote">↩</a> <a href="#fnref:1:1" class="reversefootnote">↩<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote">↩<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote">↩<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote">↩<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote">↩<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote">↩<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote">↩<sup>8</sup></a> <a href="#fnref:1:8" class="reversefootnote">↩<sup>9</sup></a> <a href="#fnref:1:9" class="reversefootnote">↩<sup>10</sup></a></p>
</li>
</ol>
</div>
<h1>Continuous DevOps: The difference between Continuous Delivery and Deployment</h1>
<p>Continuous Integration, Delivery, and Deployment sound like similar concepts and there is often confusion around the last two. Many in DevOps have a good understanding of CI, but what is the difference between Continuous Delivery and Deployment?</p>
<p>We conducted a Yammer poll in our DevOps group at work, and only about 50% answered correctly. If we deliver our code, isn’t that the same as deployed? Well, no, but you may have laid the groundwork for a Continuous Deployment process, and maybe even partially implemented one, without realizing it.</p>
<p>Stefana Muller, Founder of LI Women In Tech and VP at Opsani, breaks down these three concepts succinctly (emphasis added by me):</p>
<blockquote>
<p>“Continuous Integration (CI) is a software engineering practice in which developers integrate code into a shared repository several times a day in order to obtain rapid feedback of that code. CI enables automated build and testing so that teams can rapidly work on a single project together.</p>
</blockquote>
<blockquote>
<p>“Continuous Delivery (CD) is a software engineering practice in which teams develop, build, test, and <strong>release software</strong> in short cycles. It depends on automation at every stage so that cycles can be both quick and reliable.</p>
</blockquote>
<blockquote>
<p>“Continuous Deployment is the process by which <strong>qualified changes</strong> in software code or architecture are deployed to production as soon as they are ready and <strong>without human intervention</strong>.”<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
</blockquote>
<p>Another clarification comes from Carl Caum, Product Manager for Puppet:</p>
<p><img src="/assets/images/caum_tweet_delivery.png" alt="Carl Caum on Delivery" class="img-align-center" /></p>
<blockquote>
<p>“Continuous deployment should be the goal of most companies that are not constrained by regulatory or other requirements … There are business cases in which IT must wait for a feature to go live, making continuous deployment impractical. While application feature toggles solve many of those cases, they don’t work in every case. The point is to decide whether continuous deployment is right for your company based on business needs.”<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></p>
</blockquote>
<p>Delivery is about keeping the code in a deployable state. The end goal of the Integration/Delivery pipeline is the creation of artefacts on a repository (S3 bucket, download server, etc.) where all included features have passed all unit tests and quality controls and are ready for deployment to production, even if they aren’t deployed at that moment.</p>
<p>For Delivery, automation around infrastructure provisioning/teardown and code deployment to nodes is in support of this goal. Delivery is where many organizations stop and DevOps still has the silo of “developers code, operations deploys.” The last push to production is manual. Even if the actual deployment execution is automated, the approval and initiation is not.</p>
<p>Continuous Deployment goes a step further. Production changes require a special level of trust and responsibility (hopefully!). Changes in code may also require changes in architecture. Key performance objectives cannot be negatively impacted. Therefore, the process and tooling needs to align to and automate the enterprise’s workflows for approvals, audits, and controls around this application. Qualified (approved) releases are then automatically deployed to production.</p>
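<p>To make the distinction concrete, here is a deliberately simplified sketch of a pipeline’s final stage (the bucket, script names, and <code class="highlighter-rouge">BUILD_ID</code> are all hypothetical, not taken from any specific tool):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/bash
# Continuous Delivery: every green build publishes a production-ready artifact
aws s3 cp "app-${BUILD_ID}.tar.gz" "s3://my-release-bucket/app-${BUILD_ID}.tar.gz"

# Continuous Deployment: the same pipeline also pushes qualified builds live,
# with no human approval between qualification and production
if ./run-qualification-checks.sh "app-${BUILD_ID}.tar.gz"; then
  ./deploy-to-production.sh "app-${BUILD_ID}.tar.gz"
fi</code></pre></figure>
<p>Stopping after the upload is Continuous Delivery; letting the last step run unattended is Continuous Deployment.</p>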
<p>Foundationally, think of Integration, Delivery, and Deployment as a pyramid sitting atop development and building on each other to production (inspired by Anatoliy Okhotnikov, Head of Engineering at Softjourn Inc.<sup id="fnref:1:1"><a href="#fn:1" class="footnote">1</a></sup>):</p>
<p><img src="/assets/images/ci_cd_cd_pyramid.png" alt="CI, CD, CD Pyramid" class="img-align-center" /></p>
<p>Functionally, a simplified process view would look like (borrowed from Atlassian and others<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>):</p>
<p><img src="/assets/images/ci_cd_cd_process.png" alt="CI, CD, CD Process" class="img-align-center" /></p>
<p>If Agile is supposed to accelerate time to market and improve product quality, then CI/CD/CD are the end-to-end framework extending this through to production (when feasible).</p>
<p class="footnotes-title">References and further reading:</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Stackify. <a href="https://stackify.com/continuous-delivery-vs-continuous-deployment-vs-continuous-integration/">Dev Leaders Compare Continuous Delivery vs. Continuous Deployment vs. Continuous Integration</a> <a href="#fnref:1" class="reversefootnote">↩</a> <a href="#fnref:1:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:2">
<p>Caum, Carl. <a href="https://puppet.com/blog/continuous-delivery-vs-continuous-deployment-what-s-diff">Continuous Delivery Vs. Continuous Deployment: What’s the Diff?</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Atlassian. <a href="https://www.atlassian.com/continuous-delivery/ci-vs-ci-vs-cd">Continuous integration vs. continuous delivery vs. continuous deployment</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<h1>DevOps Lessons Learned: Don’t become the problem (again)</h1>
<p><em>This was published in the company newsletter for the topic of describing a DevOps-related failure and what was learned.</em></p>
<p>The following is a true story. The names have been changed to protect the guilty.</p>
<p>I glanced at my phone as I pushed the “brew” button on my coffee maker. My notifications were the typical overnight ones: a few emails that could wait, a bunch to delete, one that I should probably read but didn’t want to deal with right after waking up, and a slew of escalations in our client’s group chat. That last one was odd as we were over a year into our DevOps engagement and had ironed out most of the troublesome processes and issues months ago. I scrolled through the chat log and frowned. One of my engineers and one of the client’s developers had been arguing back and forth for the past few hours over a recurring high-CPU alert on one web server in a farm. Both were obviously frustrated. Seeing my status change from “away” to “online”, my overnight engineer pinged me.</p>
<p>“If Chris’ team has a code issue, why won’t they debug it?” he asked. Chris was the manager for the customer’s PHP development team whereas I ran our operations group that supported the infrastructure. Despite some good progress getting our guys to collaborate better, the stress of an abnormally high number of new environment builds in the past few weeks had caused old tensions to flare up.</p>
<p>I finished scanning the chat log before replying. It was the same issue we’d highlighted for over a week. One server in a farm of thirty became unresponsive due to high CPU once every few days at varying times, but always overnight. We could find no infrastructure or configuration issue with that node, and each time we confirmed the node had the identical software stack and code base as the others in the farm. The only clue was that one PHP thread would consume the CPU, but there were no cron jobs or access log entries to indicate a scheduled task either on the server or executed remotely.</p>
<p>“You escalated the ticket to development with your validation checks?” I asked.</p>
<p>“And again,” he said, “it was closed as ‘not an issue’ because it isn’t happening by the time they look into it.”</p>
<p>I wrote to the group chat: “If the server is healthy now and the application isn’t impacted, let’s write up our findings and discuss on the morning stand up. I’ll join in person.” To my engineer, I asked for all the tickets we had opened for the development team.</p>
<p><img src="/assets/images/cup_500x300.png" alt="coffee" class="img-align-center" /></p>
<p>I was on my third cup of coffee, this time from Starbucks on the way to the office, as I sat down for the morning sync between operations and development. The agenda was straightforward: discuss what happened yesterday, what’s planned for today, and what blockers exist. When we reached blockers, I brought up the seventeen tickets I could find for the development team about the CPU issue and asked what we could do to stop this from happening.</p>
<p>Chris sighed. “Look,” he said, “it doesn’t happen in dev, QA, or the load environments. It’s only one server and the application is still up. It’s not a code issue, it’s an operations one. Let’s just kill the instance when it happens and let it be respun. That’s the beauty of the cloud, right?”</p>
<p>“What is different about the application usage in production as opposed to lower environments?” I asked. “Aside from significantly higher load and more servers, are there any application configurations that don’t match?”</p>
<p>Chris glanced at his watch. “I have a hard stop, but if it’ll make you feel better, I’ll have someone dump the config tables and compare. I guarantee code is the same. If there’s no differences in configs, just restart PHP or the node and let’s move on. I’m not wasting any more development time on non-issues.”</p>
<p>I received Chris’ comparison of the configurations a few days later via email. He had also cc’d Rich, the VP of Technology to whom we both reported. As I expected he’d say, he found no differences to explain the CPU spikes and suggested to Rich that my team “use the cloud as it’s meant and put in self-healing.” I scheduled a follow-up meeting and included Rich, but I was tired of fighting what felt like a small battle and was more interested in just keeping Chris from painting us in a negative light. Both Chris and I eventually agreed to several action items: my team would automate restarting PHP if the CPU suddenly spiked or respin the node if it became unavailable, and Chris would implement “watchdog” logic in code to throttle the process.</p>
<p>The alerts stopped occurring. Since we logged all performance metrics and alerts for analysis, we could still see that, even though the CPU alerts were no longer firing, there was still a random node in the farm that would get respun every few nights, sometimes several times a night. Since we could find no availability issues and no one complained, the “issue” was soon forgotten. Both Chris and I mentioned the additional changes we implemented as “continuous improvements” on our monthly reports to Rich; a “win-win” for both our teams.</p>
<hr />
<p>Later that month, Rich invited everyone for drinks after work as congratulations on the past year’s successful application launches. I found myself having a beer with Darren, introduced as the somewhat redundant-sounding “Content Management System Manager” for one of the brands Rich–and, therefore, all of us–supported. “So, you’re the guy that pushes pretty buttons on websites,” I joked. Darren laughed. “Nah, my team develops the custom Drupal modules and front-end design.”</p>
<p>I was surprised as I was only aware of one development group. “Do you work with Chris’ team?” I asked.</p>
<p>He hesitated a moment, trying to place the name, then said, “Sort of. We commit to Git. They wave a wand to make it live. Sometimes we collaborate on Drupal core issues. They’re okay, I guess. I mean, we’re live, right? But, we’re pretty sure there’s something wrong with the setup and they keep saying it has to be our module.”</p>
<p>I smiled, remembering similar interactions, and asked what the issue was.</p>
<p>“A few months ago, we implemented an ingest hook to get assets from our vendor and process them for display on the site. In dev, we can do a full ingest of live data in just under an hour, but in production it seems to die randomly. It happens so often we implemented batching so we can re-process only parts of the feed after the ingest dies, just so it can eventually complete. Even so, it takes upwards of six hours in prod and it’s reaching about eight hours this month. We keep spending time refactoring but I’m sure it’s not the module.”</p>
<p>I had a sinking feeling. “Does this run overnight?” I asked.</p>
<p>“Yea, every two or three days.”</p>
<p>I’m an idiot, I thought. I fished a business card out of my bag and asked Darren to call me, saying that I might know what’s going on. And worse, I kept to myself, we might be the cause.</p>
<hr />
<p>After I ended the conference bridge with Darren and his lead developer the next day, I looked over my notes on the ingest logic he described. There was no doubt. This change was the cause of the random production issues we’d been experiencing. Darren’s team had the foresight to realize ingestion couldn’t occur from every node in the farm at the same time, nor could they rely on any single node always being available, and they knew the additional processing might impact production traffic, so their logic would randomly choose one on which to execute an ingest event after business hours. Given the typical load in production as the node served content to end users, the overhead of this added process was enough to impair or kill a single server, something they wouldn’t notice in the non-production environments.</p>
<p><img src="/assets/images/work_600x400.png" alt="work" class="img-align-center" /></p>
<p>Worse still, our remediation steps treated the issue as just a CPU alert and, in our attempt to “fix” that, we caused bigger problems. I realized we had broken several key concepts of “DevOps” that we needed to address right away, in addition to actually helping Darren with getting their ingestion logic working.</p>
<p>We called ourselves a “DevOps” team and, while our KPIs on the monthly reports testified to the improvements we had brought to the project, they weren’t enough if we weren’t properly aligned to the business. The RACI matrix we agreed to with the client listed Chris’ development team as responsible for anything code and application configuration related, while my team was responsible for infrastructure and operating systems. Although each team was designated to the other as “to be consulted,” in practice we’d regressed to the old siloed “operations versus development” mindset since we could simply point to the RACI. More importantly, we had completely missed including the critical development teams into our processes. This meant that we were all working on tiny pieces of the larger puzzle without understanding what was actually needed or what effects we were having, which actually decreased efficiency and reliability across the board.</p>
<p>These were key lessons and they seemed obvious in retrospect, but they led me to a more important one. I had become part of the problem. I had grown complacent with the quality of my team’s work. Instead of keeping focused on continually improving, I had grown weary of fighting political battles that shouldn’t have existed in the first place had I been properly highlighting project risks and working to improve the processes. I let myself, my team, and the customer down by not remaining vigilant, by acting like an old-school operations manager and not a DevOps lead.</p>
<h1>Opening links from WSL Ubuntu in Windows’ Chrome</h1>
<p>Aside from Chrome, MS Office, and some work chat apps, most of my time is spent in WSL (using VcXsrv for X11 with WGL enabled).</p>
<p>One thing I was missing was how to open links in the Windows version of Chrome (I’ve yet to get sound working in WSL…).</p>
<p>For most cases, exporting <code class="highlighter-rouge">BROWSER</code> to the <code class="highlighter-rouge">/mnt/c</code> path of Google Chrome is sufficient, but I also use Markdown heavily in Vim and have a macro to generate HTML from it.</p>
<p>Below is a script to map the WSL paths to Windows, and to adjust some of my known DrvFS mounts to proper Windows mounts. If I’m browsing a path local to WSL, it instead opens my WSL version of Chrome.</p>
<p>I find it best to open a command prompt and type <code class="highlighter-rouge">dir /x c:\</code> to get the 8.3 short-name of “C:\Program Files (x86)” since WSL seems to choke on spaces in paths.</p>
<p>I then renamed <code class="highlighter-rouge">/usr/local/google-chrome</code> to <code class="highlighter-rouge">/usr/local/google-chrome_main</code> and symlinked <code class="highlighter-rouge">/usr/local/google-chrome</code> to this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#!/bin/bash</span>
<span class="c"># Purpose:</span>
<span class="c"># 1. to open local file paths from /mnt/DRV in Windows' Google Chrome</span>
<span class="c"># 2. to translate known mount points (in this example, /home/aaron/addc_g is a DrvFS mount to my ExpanDrive H: for Google Drive)</span>
<span class="c"># Assumes:</span>
<span class="c"># 1. Google chrome is installed in windows</span>
<span class="c"># 2. Google chrome is installed in WSL</span>
<span class="c"># 3. the output of "dir /x c:\" shows the "Program Files (x86)" as "PROGRA~2"</span>
<span class="nv">WIN_GOOGLE</span><span class="o">=</span><span class="s2">"/mnt/c/PROGRA~2/Google/Chrome/Application/chrome.exe"</span>
<span class="nv">LIN_GOOGLE</span><span class="o">=</span><span class="s2">"google-chrome_main"</span>
<span class="k">if</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">==</span> /mnt/<span class="k">*</span> <span class="o">]]</span>
<span class="k">then</span>
<span class="c">#echo "windows"</span>
<span class="nv">url</span><span class="o">=</span><span class="k">${</span><span class="nv">1</span>:5<span class="k">}</span>
<span class="nv">url</span><span class="o">=</span><span class="k">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">url</span><span class="k">}</span><span class="s2">"</span> | <span class="nb">sed</span> <span class="s1">'s/^\///'</span> | <span class="nb">sed</span> <span class="s1">'s/\//\\\\/g'</span> | <span class="nb">sed</span> <span class="s1">'s/^./\0:/'</span><span class="k">)</span>
<span class="nb">eval</span> <span class="k">${</span><span class="nv">WIN_GOOGLE</span><span class="k">}</span> <span class="s2">"</span><span class="k">${</span><span class="nv">url</span><span class="k">}</span><span class="s2">"</span>
<span class="k">elif</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">==</span> /home/aaron/addc_g/<span class="k">*</span> <span class="o">]]</span>
<span class="k">then</span>
<span class="c">#echo "windows g-drive"</span>
<span class="nv">url</span><span class="o">=</span><span class="k">${</span><span class="nv">1</span>:19<span class="k">}</span>
<span class="nv">url</span><span class="o">=</span><span class="s2">"/h/</span><span class="k">${</span><span class="nv">url</span><span class="k">}</span><span class="s2">"</span>
<span class="nv">url</span><span class="o">=</span><span class="k">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">url</span><span class="k">}</span><span class="s2">"</span> | <span class="nb">sed</span> <span class="s1">'s/^\///'</span> | <span class="nb">sed</span> <span class="s1">'s/\//\\\\/g'</span> | <span class="nb">sed</span> <span class="s1">'s/^./\0:/'</span><span class="k">)</span>
<span class="nb">eval</span> <span class="k">${</span><span class="nv">WIN_GOOGLE</span><span class="k">}</span> <span class="s2">"</span><span class="k">${</span><span class="nv">url</span><span class="k">}</span><span class="s2">"</span>
<span class="k">elif</span> <span class="o">[[</span> <span class="nv">$1</span> <span class="o">==</span> http<span class="k">*</span> <span class="o">]]</span>
<span class="k">then</span>
<span class="c">#echo "windows url"</span>
<span class="nb">eval</span> <span class="k">${</span><span class="nv">WIN_GOOGLE</span><span class="k">}</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
<span class="k">else</span>
<span class="c">#echo "linux"</span>
<span class="nv">url</span><span class="o">=</span><span class="nv">$1</span>
<span class="nb">eval</span> <span class="k">${</span><span class="nv">LIN_GOOGLE</span><span class="k">}</span> <span class="k">${</span><span class="nv">url</span><span class="k">}</span>
<span class="k">fi</span>
<span class="c">#echo $1</span>
<span class="c">#echo ${url}</span></code></pre></figure>
<p><a href="https://gist.github.com/aarondodd/913c6316351b11f6c3ea271ee8ce7ab7">Gist</a></p>
<p>For general Bash usage, my .bashrc has: <code class="highlighter-rouge">export BROWSER="/mnt/c/PROGRA~2/Google/Chrome/Application/chrome.exe"</code></p>
<h1>Scale an AWS Aurora cluster’s writer node</h1>
<p>While there is no “autoscaling” for RDS, the AWS CLI can be used to adjust an instance on a schedule. For an Aurora cluster, when you resize the primary node (the writer), AWS fails writes over to one of the readers but never switches that role back (see the AWS documentation for details on the failover settings and logic). This would leave the cluster in a state where the up-sized node becomes a “reader” and one of the smaller nodes remains the “writer”.</p>
<p>In this use-case, I needed to increase the capacity of the writer for a few hours a day for a known ingestion event (the fleet of readers could remain the same size and number), but then decrease the writer afterwards. The standard <code class="highlighter-rouge">aws rds modify-db-instance</code> call works as expected, but after scaling the instance, Aurora still leaves a smaller reader in the cluster as the primary (writer).</p>
<p>Below is a script that does the resizing, then waits until the change has taken effect, then switches the “writer” back to the newly resized node.</p>
<p>Example usage:</p>
<p><code class="highlighter-rouge">scriptname.sh db.m4.xlarge</code></p>
<p>Where the parameter is the new shape to apply.</p>
<p>The cluster ID and node names used can be found in the RDS / Cluster page in the AWS console.</p>
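<p>They can also be pulled from the CLI if you prefer (a quick sketch, reusing the placeholder cluster name from below):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># list each cluster member and whether it currently holds the writer role
aws rds describe-db-clusters --db-cluster-identifier mycoolcluster-xxxx \
  --query 'DBClusters[0].DBClusterMembers[].{Node:DBInstanceIdentifier,Writer:IsClusterWriter}' \
  --output table</code></pre></figure>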
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#!/bin/bash</span>
<span class="nv">instance_size</span><span class="o">=</span><span class="nv">$1</span>
<span class="nv">cluster_id</span><span class="o">=</span><span class="s2">"mycoolcluster-xxxx"</span>
<span class="nv">primary_node</span><span class="o">=</span><span class="s2">"mycoolcluster-node"</span>
<span class="nv">region</span><span class="o">=</span><span class="s2">"us-east-1"</span>
<span class="nv">pending_status</span><span class="o">=</span><span class="s2">""</span>
<span class="nv">check_for</span><span class="o">=</span><span class="s1">'"PendingModifiedValues": {},'</span>
aws rds modify-db-instance <span class="nt">--db-instance-identifier</span> <span class="s2">"</span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--db-instance-class</span> <span class="s2">"</span><span class="k">${</span><span class="nv">instance_size</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--apply-immediately</span> <span class="nt">--region</span> <span class="s2">"</span><span class="k">${</span><span class="nv">region</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">"Checking status for </span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2">..."</span>
<span class="k">until</span> <span class="o">[</span> <span class="s2">"</span><span class="k">${</span><span class="nv">pending_status</span><span class="k">}</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">""</span> <span class="o">]</span><span class="p">;</span> <span class="k">do
</span><span class="nv">pending_status</span><span class="o">=</span><span class="k">$(</span>aws rds describe-db-instances <span class="nt">--db-instance-identifier</span> <span class="s2">"</span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--region</span> <span class="s2">"</span><span class="k">${</span><span class="nv">region</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--output</span> json | <span class="nb">grep</span> <span class="s2">"</span><span class="k">${</span><span class="nv">check_for</span><span class="k">}</span><span class="s2">"</span><span class="k">)</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2"> still pending changes, waiting."</span>
<span class="nb">sleep </span>10
<span class="k">done
</span><span class="nb">echo</span> <span class="s2">"Failing back to </span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2">"</span>
<span class="c"># sometimes it seems pending status is removed but node is not yet ready for failback (likely pending-reboot but not shown in CLI response)</span>
<span class="c"># so if an error occurs, keep trying (there's probably a better way to do this)</span>
<span class="k">while</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-ne</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">do
</span>aws rds failover-db-cluster <span class="nt">--db-cluster-identifier</span> <span class="s2">"</span><span class="k">${</span><span class="nv">cluster_id</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--target-db-instance-identifier</span> <span class="s2">"</span><span class="k">${</span><span class="nv">primary_node</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--region</span> <span class="s2">"</span><span class="k">${</span><span class="nv">region</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">sleep </span>5
<span class="k">done</span></code></pre></figure>
<p><a href="https://gist.github.com/aarondodd/76968a1a745717669bf88269191711fd">Gist</a></p>
<p>This can run as either a cron or part of the ingestion process.</p>
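<p>For the cron route, a crontab pairing might look like the following (the times, shapes, and script path are illustrative):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># scale the writer up an hour before a 02:00 ingest, then back down afterwards
0 1 * * * /opt/scripts/resize_aurora_writer.sh db.m4.xlarge
0 6 * * * /opt/scripts/resize_aurora_writer.sh db.m4.large</code></pre></figure>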
<p>There are likely better ways to check for the status.</p>
<h1>Query AWS EC2 nodes launched older than a certain date</h1>
<p>The AWS CLI allows for querying and filtering results, but I was having issues writing a script to give me a list of running nodes launched more than 10 minutes ago.</p>
<p>Below is an example of how to do this.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">#!/bin/bash</span>
<span class="c"># Example how to query AWS for nodes that have been online older than a certain date</span>
<span class="c"># Example below returns just the "Name" tag value (intent is for looping through for other actions)</span>
<span class="c"># Example below also filters by "state=running" to excluded stopped or pending instances</span>
<span class="c"># To get newer than a certain date, just alter ?LaunchTime<=${ec2_older_than_date} for the <= to be >=</span>
<span class="nv">query_older_than_minutes</span><span class="o">=</span>10
<span class="c"># sed line conforms date output to AWS's datetime format</span>
<span class="nv">ec2_older_than_date</span><span class="o">=</span><span class="k">$(</span><span class="nb">date</span> <span class="nt">--date</span><span class="o">=</span><span class="s1">'-10 minutes'</span> <span class="nt">--utc</span> <span class="s2">"+%FT%T.%N"</span> | <span class="nb">sed</span> <span class="nt">-r</span> <span class="s1">'s/[[:digit:]]{6}$/Z/'</span><span class="k">)</span>
<span class="c"># add backticks to variable for inclusion in AWS call</span>
<span class="nv">ec2_older_than_date</span><span class="o">=</span><span class="s2">"</span><span class="se">\`</span><span class="k">${</span><span class="nv">ec2_older_than_date</span><span class="k">}</span><span class="se">\`</span><span class="s2">"</span>
<span class="nv">aws_servers</span><span class="o">=</span><span class="k">$(</span>aws ec2 describe-instances <span class="nt">--filters</span> <span class="s2">"Name=tag:APPGROUP,Values=myfunapp"</span> <span class="s2">"Name=instance-state-name,Values=running"</span> <span class="nt">--query</span> <span class="s2">"Reservations[].Instances[?LaunchTime<=</span><span class="k">${</span><span class="nv">ec2_older_than_date</span><span class="k">}</span><span class="s2">].[Tags[?Key==</span><span class="se">\`</span><span class="s2">Name</span><span class="se">\`</span><span class="s2">].Value]"</span> <span class="nt">--output</span> text<span class="k">)</span></code></pre></figure>
<p><a href="https://gist.github.com/aarondodd/9ef20793cc6f397199d90e924b3c98c2">Gist</a></p>
<p>The <code class="highlighter-rouge">aws_servers</code> value can then be looped through as sketched above (<code class="highlighter-rouge">for server in ${aws_servers}; do …</code>).</p>
<h1>Using the AWS PHP SDK to get a current EC2 node from a group of nodes</h1>
<p>When porting a Drupal application from on-site to cloud hosting, one of the issues was the use of drush aliases in one environment for drush commands to be run against the cloud environment. Since EC2 nodes in an autoscaling group can be replaced at any time, the developers needed an alternative to hard-coding IPs.</p>
<p>Below is a snippet of a drushrc file. Assuming the aws.phar is in the same folder as the drushrc, and the AWS CLI is properly configured with credentials (or, if this is on an EC2 instance, an IAM role is applied), this will query for nodes matching a tag “Group” and return the list. The drush aliases are then set to reference only the first response for the query.</p>
<p>Multiple filters can be applied in the query; just be sure to create a second array under Filters.</p>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><span class="cp"><?php</span>
<span class="k">require</span> <span class="s1">'aws.phar'</span><span class="p">;</span>
<span class="c1">//===================================================================</span>
<span class="c1">// Readme:</span>
<span class="c1">//===================================================================</span>
<span class="c1">// To populate hostnames based on current live values, look up from AWS directly.</span>
<span class="c1">// Requires:</span>
<span class="c1">// - aws.phar in same folder as this script, or full path specified in the above require</span>
<span class="c1">// - IAM role assigned to node that allows Get* for ec2</span>
<span class="c1">//</span>
<span class="c1">// See https://docs.aws.amazon.com/aws-sdk-php/v3/guide/getting-started/installation.html#installing-via-phar for how to get the aws.phar file</span>
<span class="c1">//</span>
<span class="c1">// In the "Set Nodes" block, always specify index [0] to ensure only one name comes back (prod farms have multiple nodes)</span>
<span class="c1">//===================================================================</span>
<span class="c1">//===================================================================</span>
<span class="c1">// Set up connection:</span>
<span class="c1">//===================================================================</span>
<span class="nv">$ec2</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Aws\Ec2\Ec2Client</span><span class="p">([</span>
<span class="s1">'version'</span> <span class="o">=></span> <span class="s1">'latest'</span><span class="p">,</span>
<span class="s1">'region'</span> <span class="o">=></span> <span class="s1">'us-east-1'</span>
<span class="p">]);</span>
<span class="c1">//===================================================================</span>
<span class="c1">// Get Nodes - retrieves all nodes matching said filters</span>
<span class="c1">//===================================================================</span>
<span class="nv">$dev_nodes</span> <span class="o">=</span> <span class="nv">$ec2</span><span class="o">-></span><span class="na">describeInstances</span><span class="p">([</span>
<span class="s1">'Filters'</span> <span class="o">=></span> <span class="p">[</span>
<span class="p">[</span>
<span class="s1">'Name'</span> <span class="o">=></span> <span class="s1">'tag:Group'</span><span class="p">,</span>
<span class="s1">'Values'</span> <span class="o">=></span> <span class="p">[</span><span class="s1">'myfancyappdev'</span><span class="p">]</span>
<span class="p">]</span>
<span class="p">]</span>
<span class="p">]);</span>
<span class="nv">$qa_nodes</span> <span class="o">=</span> <span class="nv">$ec2</span><span class="o">-></span><span class="na">describeInstances</span><span class="p">([</span>
<span class="s1">'Filters'</span> <span class="o">=></span> <span class="p">[</span>
<span class="p">[</span>
<span class="s1">'Name'</span> <span class="o">=></span> <span class="s1">'tag:Group'</span><span class="p">,</span>
<span class="s1">'Values'</span> <span class="o">=></span> <span class="p">[</span><span class="s1">'myfancyappqa'</span><span class="p">]</span>
<span class="p">]</span>
<span class="p">]</span>
<span class="p">]);</span>
<span class="nv">$prod_nodes</span> <span class="o">=</span> <span class="nv">$ec2</span><span class="o">-></span><span class="na">describeInstances</span><span class="p">([</span>
<span class="s1">'Filters'</span> <span class="o">=></span> <span class="p">[</span>
<span class="p">[</span>
<span class="s1">'Name'</span> <span class="o">=></span> <span class="s1">'tag:Group'</span><span class="p">,</span>
<span class="s1">'Values'</span> <span class="o">=></span> <span class="p">[</span><span class="s1">'myfancyappprod'</span><span class="p">]</span>
<span class="p">]</span>
<span class="p">]</span>
<span class="p">]);</span>
<span class="c1">//===================================================================</span>
<span class="c1">// Set Nodes - assign public DNS of first node to var to use later</span>
<span class="c1">//===================================================================</span>
<span class="nv">$dev</span> <span class="o">=</span> <span class="nv">$dev_nodes</span><span class="p">[</span><span class="s1">'Reservations'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'Instances'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'PublicDnsName'</span><span class="p">];</span>
<span class="nv">$qa</span> <span class="o">=</span> <span class="nv">$qa_nodes</span><span class="p">[</span><span class="s1">'Reservations'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'Instances'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'PublicDnsName'</span><span class="p">];</span>
<span class="nv">$prod</span> <span class="o">=</span> <span class="nv">$prod_nodes</span><span class="p">[</span><span class="s1">'Reservations'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'Instances'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'PublicDnsName'</span><span class="p">];</span>
<span class="c1">//===================================================================</span>
<span class="c1">// environment dev</span>
<span class="nv">$aliases</span><span class="p">[</span><span class="s1">'dev'</span><span class="p">]</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span>
<span class="s1">'remote-host'</span> <span class="o">=></span> <span class="nv">$dev</span><span class="p">,</span>
<span class="p">);</span>
<span class="c1">// environment qa</span>
<span class="nv">$aliases</span><span class="p">[</span><span class="s1">'qa'</span><span class="p">]</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span>
<span class="s1">'remote-host'</span> <span class="o">=></span> <span class="nv">$qa</span><span class="p">,</span>
<span class="p">);</span>
<span class="c1">// prod</span>
<span class="nv">$aliases</span><span class="p">[</span><span class="s1">'prod'</span><span class="p">]</span> <span class="o">=</span> <span class="k">array</span><span class="p">(</span>
<span class="s1">'remote-host'</span> <span class="o">=></span> <span class="nv">$prod</span><span class="p">,</span>
<span class="p">);</span></code></pre></figure>
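<p>For instance, a second filter array can narrow results to running instances; a minimal sketch (the “instance-state-name” filter and the variable name are illustrative additions, not part of the drushrc above):</p>

<figure class="highlight"><pre><code class="language-php" data-lang="php"><?php
// combine a tag filter with a built-in EC2 filter by adding a second
// array under 'Filters'
$prod_running_nodes = $ec2->describeInstances([
    'Filters' => [
        [
            'Name' => 'tag:Group',
            'Values' => ['myfancyappprod']
        ],
        [
            'Name' => 'instance-state-name',
            'Values' => ['running']
        ]
    ]
]);</code></pre></figure>

<p>With the aliases defined, developers can then run commands such as “drush @dev status” against whichever node is currently live.</p>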
<p><a href="https://gist.github.com/aarondodd/c78a68142b402fa98e8dba9dbf5cc8fb">Gist</a></p>Aaron DoddWhen porting a Drupal application from on-site to cloud hosting, one of the issues was the use of drush aliases in one environment for drush commands to be run against the cloud environment. Since EC2 nodes in an autoscaling group can be replaced at any time, the developers needed an alternative to hard-coding IPs.Updating AWS Autoscaling Launch Configs with a new AMI using Lambda2017-12-08T00:00:00+00:002017-12-08T00:00:00+00:00http://www.aarondodd.com/aws/ec2/script/query/2017/12/08/updating_aws_launchconfig_with_new_ami<p>Amazon provides a great article<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> on using Lambda to automate updating the AMI of an auto scaling group’s launch configuration. The only problem with their provided code is that the existing launch configuration’s storage settings (EBS volumes) are not kept, so the new launch config has no disks specified, resulting in new launches using the AMI’s default settings.</p>
<p>Since an AMI may be generic, the storage settings are often specific to a given use case.</p>
<p>I took their example code and altered it slightly. In the version below, there is an additional lookup for the existing root volume information, which is then applied to the newly generated launch config. My change only looks for the root volume since that fits my use case, but someone smart can adjust it to loop through and keep all storage assignments, if needed (see the sketch after the Gist below).</p>
<p>Lambda code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from __future__ import print_function
import json
import datetime
import time
import boto3

print('Loading function')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    # get autoscaling and ec2 clients
    client = boto3.client('autoscaling')
    clientEc2 = boto3.client('ec2')
    # get object for the ASG we're going to update, filter by name of target ASG
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[event['targetASG']])
    if not response['AutoScalingGroups']:
        return 'No such ASG'
    # get name of InstanceId in current ASG that we'll use to model new Launch Configuration after
    sourceInstanceId = response.get('AutoScalingGroups')[0]['Instances'][0]['InstanceId']
    # get the snapshot ID of the source AMI's root volume
    responseAmi = clientEc2.describe_images(ImageIds=[event['newAmiID']])
    sourceAmiSnapshot = responseAmi.get('Images')[0]['BlockDeviceMappings'][0]['Ebs']['SnapshotId']
    print('New source AMI: ' + event['newAmiID'] + " has snapshot ID: " + sourceAmiSnapshot)
    # get block device mapping from the current launch config (by default boto doesn't copy this)
    sourceLaunchConfig = response.get('AutoScalingGroups')[0]['LaunchConfigurationName']
    print('current launch config name:' + sourceLaunchConfig)
    responseLC = client.describe_launch_configurations(LaunchConfigurationNames=[sourceLaunchConfig])
    sourceBlockDevices = responseLC.get('LaunchConfigurations')[0]['BlockDeviceMappings']
    print('Current LC block devices:')
    print(sourceBlockDevices[0]['Ebs'])
    # point the root volume at the new AMI's snapshot
    sourceBlockDevices[0]['Ebs']['SnapshotId'] = sourceAmiSnapshot
    print('New LC block devices (snapshotID changed):')
    print(sourceBlockDevices[0]['Ebs'])
    # create LC using instance from target ASG as a template; only diffs are the name of the new LC and the new AMI
    timeStamp = time.time()
    timeStampString = datetime.datetime.fromtimestamp(timeStamp).strftime('%Y-%m-%d-%H-%M-%S')
    newLaunchConfigName = event['targetASG'] + '_' + event['newAmiID'] + '_' + timeStampString
    print('new launch config name: ' + newLaunchConfigName)
    client.create_launch_configuration(
        InstanceId=sourceInstanceId,
        LaunchConfigurationName=newLaunchConfigName,
        ImageId=event['newAmiID'],
        BlockDeviceMappings=sourceBlockDevices)
    # update ASG to use new LC
    response = client.update_auto_scaling_group(AutoScalingGroupName=event['targetASG'], LaunchConfigurationName=newLaunchConfigName)
    return 'Updated ASG `%s` with new launch configuration `%s` which includes AMI `%s`.' % (event['targetASG'], newLaunchConfigName, event['newAmiID'])</code></pre></figure>
<p><a href="https://gist.github.com/aarondodd/3f0f3f81c82dc5cce1a828ecc904a6b9">Gist</a></p>
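<p>For reference, the root-only assignment above could be swapped for a loop that keeps all storage assignments; a minimal sketch, assuming the AMI’s and launch configuration’s mappings align by device name:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># map each EBS-backed device on the new AMI to its snapshot ID
amiSnapshots = {m['DeviceName']: m['Ebs']['SnapshotId']
                for m in responseAmi.get('Images')[0]['BlockDeviceMappings']
                if 'Ebs' in m}
# update every EBS mapping in the launch config, not just the root volume
for device in sourceBlockDevices:
    if 'Ebs' in device and device.get('DeviceName') in amiSnapshots:
        device['Ebs']['SnapshotId'] = amiSnapshots[device['DeviceName']]</code></pre></figure>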
<p>Example call:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>aws lambda invoke <span class="nt">--invocation-type</span> RequestResponse <span class="nt">--function-name</span> autoscaling_update_ami <span class="nt">--log-type</span> Tail <span class="nt">--region</span> us-west-2 <span class="nt">--payload</span> <span class="s1">'{"newAmiID": "ami-123456", "targetASG": "my-fun-asg"}'</span> response.json</code></pre></figure>
<p class="footnotes-title">References:</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-asgroup.html">AWS: Patch an AMI and Update an Auto Scaling Group</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Aaron DoddAmazon provides a great article1 on using Lambda to automate updating the AMI of an auto scaling group’s launch configuration. The only problem with their provided code is that the existing launch configuration’s storage settings (EBS volumes) are not kept, so the new launch config has no disks specified, resulting in new launches using the AMI’s default settings. AWS: Patch an AMI and Update an Auto Scaling Group ↩Calling AWS for current nodes in a group instead of hardcoding public IPs2017-03-17T00:00:00+00:002017-03-17T00:00:00+00:00http://www.aarondodd.com/aws/ec2/script/query/2017/03/17/query_ec2_by_tag_instead_of_hardcoding_ips<p>When integrating CI/CD with cloud instances, the old-school method of specifying a server IP is problematic since a well-architected cloud solution allows for instances to be replaced as needed. Instead, Jenkins or other processes should verify the current running nodes before issuing a connection attempt.</p>
<p>Below is a sample query that returns the names and public IPs of servers tagged with a certain value (Group=fancyapp1).</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws ec2 describe-instances <span class="nt">--region</span> us-east-1 <span class="nt">--filters</span> <span class="s2">"Name=tag:Group,Values=fancyapp1"</span> <span class="nt">--output</span> json <span class="nt">--query</span> <span class="s1">'Reservations[*].Instances[*].{Name:Tags[?Key==`Name`].Value,PublicIP:PublicIpAddress}'</span></code></pre></figure>
<p>The response would look like:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="o">[</span>
<span class="o">[</span>
<span class="o">{</span>
<span class="s2">"Name"</span>: <span class="o">[</span>
<span class="s2">"myfancyappserver-1516203598"</span>
<span class="o">]</span>,
<span class="s2">"PublicIP"</span>: <span class="s2">"52.187.211.151"</span>
<span class="o">}</span>
<span class="o">]</span>,
<span class="o">[</span>
<span class="o">{</span>
<span class="s2">"Name"</span>: <span class="o">[</span>
<span class="s2">"myfancyappserver2-1516200980"</span>
<span class="o">]</span>,
<span class="s2">"PublicIP"</span>: <span class="s2">"52.211.223.141"</span>
<span class="o">}</span>
<span class="o">]</span>
<span class="o">]</span></code></pre></figure>
<p>Or, if you just want the first node, change Reservations[*] to Reservations[0]. And if you only want the public IP, remove the Name: part of the query and change the output format to text:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws ec2 describe-instances <span class="nt">--region</span> us-east-1 <span class="nt">--filters</span> <span class="s2">"Name=tag:Group,Values=myfancyapp1"</span> <span class="nt">--output</span> text <span class="nt">--query</span> <span class="s1">'Reservations[0].Instances[*].{PublicIP:PublicIpAddress}'</span></code></pre></figure>
<p>In this case the output would be:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">52.187.211.151</code></pre></figure>
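<p>A calling script could capture that result directly; a minimal sketch (the variable name, user, and deployment command are illustrative assumptions):</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># capture the first matching node's public IP
target_ip=$(aws ec2 describe-instances --region us-east-1 --filters "Name=tag:Group,Values=myfancyapp1" --output text --query 'Reservations[0].Instances[*].{PublicIP:PublicIpAddress}')
# hypothetical deployment step against the live node:
ssh deploy@"${target_ip}" 'run-deploy-steps'</code></pre></figure>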
<p>From the calling script, you could then simply set the result of the above to a variable for the server to connect to, as sketched.</p>Aaron DoddWhen integrating CI/CD with cloud instances, the old-school method of specifying a server IP is problematic since a well-architected cloud solution allows for instances to be replaced as needed. Instead, Jenkins or other processes should verify the current running nodes before issuing a connection attempt.