The Westworld Software Team is bad at XP and DevOps

While watching Season 1 of Westworld, I spent the entire time annoying my wife by pointing out the anti-patterns I saw the software teams at Westworld using. Of course, the anti-patterns are there for dramatic effect; it wouldn’t be interesting if everything went right. But because the show is rife with bad practices, it is a good way to point out when teams are doing things wrong.

What follows will be a load of spoilers. We will learn from the Westworld team’s mistakes and prevent them in our own teams.

Again, there will be many spoilers, so if you haven’t finished Season 1 and don’t want it to be spoiled, stop reading.

Without further ado.


Sabermetrics for Software Teams

“He’s got the look of a ballplayer.” That phrase was once enough to justify a purchasing decision in the millions. That changed with Sabermetrics, a set of statistics designed by Bill James in the 70s. It has transformed baseball but was ignored for 40 years. James figured out that the statistics used to rank baseball players were inaccurate. For example, he found that batting average didn’t capture a player’s impact: it doesn’t include walks or differentiate between singles and home runs. James instead leveraged On-Base Percentage and Slugging Percentage to value hitters, which led to a more accurate read on them.
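To make the difference concrete, here is a small Python sketch using the standard definitions of batting average, on-base percentage and slugging percentage. The two stat lines are made up for illustration (they are not real players), and `BattingLine` is just a name picked for this example.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A hypothetical season stat line (illustrative numbers only)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    def batting_average(self) -> float:
        # AVG = H / AB: walks are ignored and every hit counts the same
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks
                   + self.hit_by_pitch + self.sacrifice_flies)
        return reached / chances

    def slugging_percentage(self) -> float:
        # SLG = total bases / AB: a home run is worth four singles
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats


# Two made-up hitters with identical batting averages
slap_hitter = BattingLine(at_bats=500, singles=150, doubles=0,
                          triples=0, home_runs=0, walks=20)
slugger = BattingLine(at_bats=500, singles=90, doubles=30,
                      triples=0, home_runs=30, walks=80)

for name, line in [("slap hitter", slap_hitter), ("slugger", slugger)]:
    print(f"{name}: AVG {line.batting_average():.3f} "
          f"OBP {line.on_base_percentage():.3f} "
          f"SLG {line.slugging_percentage():.3f}")
```

Both hitters have the same batting average, but the second one reaches base more often and collects far more total bases, which is the gap that batting average hides.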

The cost of misjudging a player is high. Baseball teams are enterprises that compete for talent, and the richest teams have budgets several times larger than their competitors. Picking up a dud can cost a team millions, and getting a superstar is out of reach for small-market teams. Scouts, the people charged with finding and ranking potential players, relied on intuition. The top tier of players was easy to recognize and thus expensive. In response to mounting costs, teams began to leverage Sabermetrics, and soon they were discovering great players whom traditional scouts had ignored. Traditional scouts ignored David Ortiz, but he had high on-base and slugging percentages early in his career. For the Red Sox, these were better indicators of his potential than the fact that he was chubby. Decisions like signing Ortiz are how the Red Sox won the World Series in 2004, 2007 and 2013.

Software teams are just as guilty of misjudging talent and missing the diamonds in the rough. Delivering working software that people will want to use is the goal of most SaaS software teams. Yet what kind of traits are software teams looking for in candidates? If you look at lists of interview questions like Awesome-interview-questions, you’ll find only technology questions. Technical questions have value and you should be doing a technical screening. But we shouldn’t miss the other elements of working on a software team. How did they approach a problem they had never solved before? How would they approach one now? How do they ensure quality in their codebase? How do they interact with nontechnical team members? How do they leverage open source technologies? Do they mentor and pair program? How do they react to conflict? I’ve had candidates who were almost seething when I mentioned tabs and spaces. Do you think that person will keep their cool when something goes wrong?

In a competitive market, smaller teams need to find the undervalued players. But they also need to avoid talented professionals who are difficult to work with. Do both and you can build a team of all-stars without competing for the kind of talent Google is fighting for.

Molasses! A feature toggle library for Elixir

Early this year, when I read about Erlang in Seven Languages In Seven Weeks, I mentioned that Elixir was a language I intended to explore this year. At Maxwell Health, we also started to investigate Elixir for new projects because we wanted a functional language in our arsenal and we liked its concurrency story. So I thought it would be in my best interest to dive into Elixir. I read through Programming Elixir and Programming Phoenix and recently started work on several Elixir projects.

Feature toggling is a way to release code to production without executing it by default. There are several ways to leverage feature toggles: releasing a feature to production that may not be ready for customers or may be broken, releasing a feature to a subset of users, or simply A/B testing. I wanted to create a library that captured all of these use cases for Elixir. Having contributed to other feature toggle libraries, I had a basic sense of the features I wanted. I wanted to be able to activate and deactivate features, show features to a percentage of users, and restrict features to specific groups of users. In languages like PHP, this can mean several classes managing these pieces of functionality. In Elixir, however, I could leverage pattern matching, which shrank my code down.

 

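    # Returns true when the feature stored under `key` is enabled for `id`.
    def is_active(client, key, id) do
        case get_feature(client, key) do
            {:error, _} -> false
            %{active: false} -> false
            # fully rolled out with no user list: on for everyone
            %{active: true, percentage: 100, users: []} -> true
            # fully rolled out but restricted to a group of users
            %{active: true, percentage: 100, users: users} -> Enum.member?(users, id)
            # percentage rollout: hash the id into a stable bucket from 0 to 99
            %{active: true, percentage: percentage} when is_integer(id) ->
                bucket = id |> Integer.to_string() |> :erlang.crc32() |> rem(100)
                # strict comparison so that a percentage of 0 enables nobody
                bucket < percentage
            %{active: true, percentage: percentage} when is_binary(id) ->
                bucket = id |> :erlang.crc32() |> rem(100)
                bucket < percentage
        end
    end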

The code above handles all of those features in just a few lines, and it handles several kinds of IDs (strings and integers). Isn’t that cool? Besides pattern matching, another language feature I’ve enjoyed is piping functions into one another. It has been a great way to visualize how data is being manipulated in the pipeline. I can’t wait to explore more of Elixir in the new year.
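For readers who don’t know Elixir, here is a rough Python sketch of the same decision logic. It is not part of Molasses: the `bucket` and `is_active` functions and the dictionary shape are assumptions made for illustration. It hashes the id with CRC32 into a bucket from 0 to 99 and uses a strict comparison, so a percentage of 0 enables nobody.

```python
import zlib


def bucket(user_id) -> int:
    """Map an integer or string id to a stable bucket from 0 to 99."""
    return zlib.crc32(str(user_id).encode("utf-8")) % 100


def is_active(feature, user_id) -> bool:
    """Decide whether a feature is on for one user.

    `feature` is a hypothetical dict such as
    {"active": True, "percentage": 25, "users": []}, or None if the
    feature could not be found.
    """
    if feature is None or not feature.get("active", False):
        return False
    percentage = feature.get("percentage", 100)
    users = feature.get("users", [])
    if percentage == 100:
        # Fully rolled out: everyone, or only the listed users if any are set
        return not users or user_id in users
    # Percentage rollout: the same id always lands in the same bucket
    return bucket(user_id) < percentage


beta = {"active": True, "percentage": 100, "users": [1, 2]}
print(is_active(beta, 1), is_active(beta, 3))
```

Because the bucket depends only on the id, a given user keeps seeing (or not seeing) the feature as long as the percentage stays the same, and raising the percentage only ever adds users.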

You can check out the whole repo on GitHub.

 

Antidote


As of data released yesterday, the state of Massachusetts loses 5 people a day to opioid overdose. The whole nation is facing an epidemic: young people are dying at an alarming rate and more people are becoming addicted to opioids every day. Naloxone is a drug that reverses the effects of an opioid overdose and saves lives. Civilians can be trained to administer the drug and reverse the fatal effects of an overdose. However, in order to administer the drug they must be in the right place at the right time.

To help connect responders with those in need, the FDA held a month-long hackathon. Its goal was to produce a mobile application that gets Naloxone to those who need it within 5 minutes of the start of an overdose. In many parts of the country, the number of overdoses that happen in one evening far outpaces the number of EMTs and police officers who are on duty to respond.

Cory, Carrie and I were inspired to participate after several years of watching our towns getting gutted by heroin addiction. We’ve lost friends and neighbors who had so much more to experience, so when the FDA opportunity came about we knew that our skill sets could help make a difference.

The application is called Antidote. It is designed to connect those who need Naloxone with those in their area who have it. It has two roles: the responder and the requester. The responder is a user who is trained to administer Naloxone and the requester is someone who needs it. We wanted the workflow to be minimal and familiar. The guiding principle was that if you have used Uber, you already know how this works. The application asks for minimal information, just a phone number and your location, to respect the privacy of those involved.


From a technology perspective, we knew we wanted to support both iOS and Android. While we had never used React Native before, it made sense to adopt it, as it is designed for those who have used React on the web and it integrates well with Redux. There are plenty of plugins for React Native, so the implementation of location, maps, notifications, styling and persistence was relatively straightforward. We didn’t want to build authentication or authorization ourselves, and we wanted to be able to verify accounts with SMS. Auth0 provides this SMS/passwordless functionality out of the box, which made it an easy choice.

You can watch a demo of our application on YouTube and read all of the code on GitHub: here and here. We’re really proud of the work we’ve done and hope to expand on it in the future.

 

TV Pilots are a treasure trove of data

A few months ago, I did a little weekend project looking at TV comedy pilot scripts. For those unfamiliar with the concept: when a television show is being developed, a network will order a pilot episode as a test to see whether it will pick the show up for a full season. As a result, the idea may be reworked and elements changed to “make it work” for that network.

Part 1: Fetching and Normalizing the data

To start, I scraped about 450 television pilots from https://sites.google.com/site/tvwriting/us-drama/pilot-scripts. Here was my first challenge: some of these were just text files (awesome) but others were PDFs. To extract the text from the PDFs, I turned to pdftotext and, for the scanned ones, Tesseract. Below is the script I used to extract the text:

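#!/bin/bash
# Extract text from every PDF: try pdftotext first, fall back to OCR.
mkdir -p textfiles parsed images
find . -name '*.pdf' -print0 | while IFS= read -r -d '' f; do
  # drop the directory part so the output name is just the file name
  parsedfilename=$(basename "$f")
  pages=$(pdfinfo "$f" | awk '/^Pages:/ {print $2}')
  # some text was parsable just using pdftotext
  pdftotext -layout "$f" "textfiles/$parsedfilename.txt"
  # scanned PDFs come out with no letters or digits, so OCR those instead
  if ! grep -q '[[:alnum:]]' "textfiles/$parsedfilename.txt" 2>/dev/null; then
    echo "No text layer in $parsedfilename, running OCR"
    rm -f "textfiles/$parsedfilename.txt"
    for i in $(seq 1 "$pages"); do
      # converts the page to an image (ImageMagick pages are zero-indexed)
      convert -density 500 -depth 8 "${f}[$((i - 1))]" "images/page$i.png"
      # tesseract parses the image for text and appends it to a file
      tesseract "images/page$i.png" stdout >> "parsed/$parsedfilename.txt"
    done
  fi
done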

This got most of the scripts into a format that could be queried. Here’s a sample from the very funny 30 Rock pilot (note the different character names):

ACT ONE
INT. NBC STUDIOS, NEW YORK e DAY
The studio's homebase set. Workman are polishing a big
sign that reads, "Friday Night Bits with Jenna DeCarlo.
"Pull back through the picture window to where KENNETH a
bright and chirpy (Clay Aiken type) NBC page is giving a
tour. He stands next to-a life-size standee of impish
comedian Jenna DeCarlo. '

Part 2: Apache Spark analysis

Now that the data was machine readable, the best first step was to query the text files for data that I thought might be interesting. Apache Spark is a great tool for loading up datasets like this, so I went into the Spark shell and ran some experiments. Here is some of the code I used to get to these numbers:


//loads both folders of the 450 comedy scripts into the RDD
val parsedFiles = sc.textFile("./tvscript/parsed,./tvscript/textfiles")
//outputs the count of the phrase "20s"
parsedFiles.filter(line => line.contains("20s")).count()

Exterior vs interior scenes

Screenplays are unique because of the way they are formatted: the beginning of each scene announces whether it is interior or exterior with either INT. or EXT., so I started there.
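As a rough illustration of that count, here is a plain Python sketch (not the Spark code I ran). The `count_scene_types` function and the sample lines are made up for this example, and it assumes a scene heading starts its line with INT. or EXT., so a combined heading like INT./EXT. would be counted as interior.

```python
import re
from collections import Counter

# A scene heading is assumed to start the line with INT. or EXT.
SCENE_HEADING = re.compile(r"^\s*(INT|EXT)\.")


def count_scene_types(lines):
    """Count interior and exterior scene headings in script lines."""
    counts = Counter()
    for line in lines:
        match = SCENE_HEADING.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in screenplay format
sample_lines = [
    "INT. TV STUDIO - DAY",
    "The crew polishes a big sign.",
    "EXT. CITY STREET - DAY",
    "  INT. OFFICE - NIGHT",
    "He points at the EXT. sign on the door.",
]

scene_counts = count_scene_types(sample_lines)
print("interior:", scene_counts["INT"], "exterior:", scene_counts["EXT"])
```

Anchoring the pattern to the start of the line keeps dialogue or action lines that merely mention EXT. from inflating the totals.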

 

[Chart: interior (INT.) vs. exterior (EXT.) scene counts]

My take: It is significantly cheaper to shoot indoors than outdoors, so this might be self-selection by writers to make sure their show gets picked up.

 

Age of characters

When introducing a character in a screenplay, you usually give a short description that includes their age, usually by decade. For example, from The Grinder script: “STEWART SANDERSON (30’s) drives with his family”.
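One way to pull those decade descriptions out of the text is a regular expression. This Python sketch is an illustration rather than the query I ran: the `count_age_decades` function, the pattern and all sample lines except the one from The Grinder are assumptions.

```python
import re
from collections import Counter

# Matches a parenthesised decade such as (30s), (30's), (30’s) or (early 20s).
# Four-digit years like (1970s) are deliberately not matched.
AGE_DECADE = re.compile(r"\((?:[A-Za-z\- ]*?)(\d0)['’]?s\)")


def count_age_decades(lines):
    """Count character introductions by age decade, e.g. {'30s': 1}."""
    counts = Counter()
    for line in lines:
        for decade in AGE_DECADE.findall(line):
            counts[f"{decade}s"] += 1
    return counts


sample_intros = [
    "STEWART SANDERSON (30’s) drives with his family",  # from the post
    "CHARACTER A (20s) walks in",                         # made up
    "CHARACTER B (early 20s) gives a tour",               # made up
    "A poster on the wall reads (1970s)",                 # made up, no match
]

age_counts = count_age_decades(sample_intros)
print(dict(age_counts))
```

OCR output is noisy, so a pattern like this will miss some introductions; it is good enough to see the overall shape of the distribution.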

[Chart: character ages by decade]

 

My take: No surprises here. Television is geared towards ages 24-54, and networks want to show a good distribution of those people on TV.

Part 3: Sentiment Analysis

That was a fun experiment, but it was time to go further. In looking at the data, I realized I could do a sentiment analysis of each block of text in an episode and see if any patterns appeared. I created a new Scala project that uses Stanford’s natural language processing library and is based on work done here. Each block of text was analyzed and then put into a MongoDB store with a structure that looks like this:

{
  "sentiment" : 1,
  "textFile" : "Black-ish 1x01 - Pilot",
  "line" : " DIANE\n She’s weird, so feel free to say no.",
  "weight" : 263
}

Here “sentiment” is a scale from 1-5 with 1 being most negative, “line” is the actual block of text, and “weight” is the order in which it occurs in the episode, so this was the 263rd thing said in the episode. With the data in place, I built a small Node server that could display a chart for the scripts I parsed. Here are some screenshots of the results:
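To give a feel for how records like this can become a chart, here is a hedged Python sketch that orders blocks by “weight” and smooths the scores with a trailing average. The `sentiment_curve` function, the window size and the sample records are assumptions for illustration; the real charts came from the Node server.

```python
def sentiment_curve(records, window=2):
    """Order blocks by their position and smooth the sentiment scores.

    Each record is a dict with the "sentiment" and "weight" fields shown
    above. Returns one trailing-average value per block.
    """
    ordered = sorted(records, key=lambda record: record["weight"])
    scores = [record["sentiment"] for record in ordered]
    curve = []
    for i in range(len(scores)):
        # Average the current block with up to window - 1 blocks before it
        recent = scores[max(0, i - window + 1): i + 1]
        curve.append(sum(recent) / len(recent))
    return curve


# Made-up records, deliberately out of order
sample_records = [
    {"sentiment": 5, "weight": 3},
    {"sentiment": 1, "weight": 1},
    {"sentiment": 3, "weight": 2},
    {"sentiment": 1, "weight": 5},
    {"sentiment": 3, "weight": 4},
]

curve = sentiment_curve(sample_records, window=2)
print(curve)
```

Smoothing matters here because single lines swing wildly between scores; a wider window trades detail for a clearer arc across the episode.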

 

[Screenshots: sentiment charts for two of the parsed scripts]

Pretty neat, right? The fully interactive version is located at https://script-sentiment.herokuapp.com/ where you can look at the 100+ scripts I ran sentiment analysis on.

Seven Languages in Seven Weeks By Bruce Tate

Full disclosure: I didn’t fully embrace this book. I didn’t do the exercises at the end of each chapter, and I don’t even have most of the languages installed on my computer. OK, it feels good to admit that. With that disclosed, I will say Seven Languages In Seven Weeks was a treat. Of the 7 languages in this book (Ruby, Io, Prolog, Scala, Clojure, Erlang, and Haskell), I had only used 2 in the past, Ruby and Scala. My background is mostly in object-oriented, procedural and prototype-based languages. This book, however, shifts its focus towards languages that are more functional and are built with pattern matching and concurrency in mind. Other than Scala, the languages I’ve worked with in the past do not focus on those concepts.

Seeing the evolution of languages was insightful to me: how closely tied to Lisp Clojure is, or how Scala and Erlang’s pattern matching are inspired by Prolog. While first investigating Scala, I could see the implementation of pattern matching, but it wasn’t clear how powerful it could be until I saw how Erlang and Prolog leverage it in this book. While I’ve appreciated limiting state in the systems I build, the book made it much clearer how functional programming can be leveraged to supercharge concurrency. In languages like Java and Go, concurrency can be a proceed-with-caution situation because the developers writing the code are worried about race conditions and side effects. When we can significantly limit race conditions and mutable state, concurrency is less scary.

The interviews with the creators of each language spoke volumes about the trade-offs and intentions behind each one. Ruby is the language where the trade-offs are most obvious: a great syntax that leads to productivity, traded for speed. Having revisited Ruby recently for both personal and work projects, I can say you see the productivity increase right away, but it might take some time to feel that trade-off in speed. The challenge, of course, is knowing when to migrate.

What I discovered at the end of reading this book was the language I wanted to explore next. Oddly enough, it wasn’t in this book, and even more strangely, after doing some more reading I learned it was partially inspired by this book. That language is Elixir. Elixir combines some of the syntactic sugar we love about Ruby, the metaprogramming of Clojure, and the concurrency and power of Erlang.

Seven Languages in Seven Weeks: A Pragmatic Guide to Learning Programming Languages (Pragmatic Programmers)

Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation By Jez Humble

Hopefully, more books I read this year will have an impact on my day-to-day life, but at the very least this one definitely will. Continuous Delivery by Jez Humble was suggested to me by our VP of engineering, along with Release It! by Michael Nygard, which I read last year. Both cover the software development pipeline by focusing on resiliency and regularly delivering working code. While many a blog will espouse the ideas of continuous delivery, this book gives you the patterns to use to significantly shorten your cycle time, that is, the time between ideation and code live in production.

 

If you talk to people who are responsible for releasing software, most will tell you releasing is painful. The reasons are many: they do it rarely, so it is a big event; it’s manual; there is no automated testing; they have no idea what they are releasing because someone else wrote it; and, the kicker of them all, production is different from their testing environments. This book calls out all of these complaints and more as anti-patterns and models many of its solutions around one idea: if something is painful right now, move it earlier in your process and do it more often, which forces you to automate it. So if your only testing is manual, slow and at the end of your process, do test-driven development, i.e. write the tests first. Humble extends this TDD pattern to server deployments: you build the monitoring and health checks before you release the server, and that “test” passes once the server is deployed. For many teams, especially ones with little automation in place, this means doing a lot of hard things first: automated testing, configuration management, better version control strategies, and making staging look more like production. Humble suggests incremental improvements, but it is critical to have testing in place: you can release all the time, but your customers won’t be happy if your applications break regularly.
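The “health check first” idea can be sketched as a small polling helper that a deploy script runs right after a release. This is a minimal sketch under my own assumptions, not code from the book: `wait_until_healthy` and the stand-in check are hypothetical names, and a real check would call your service’s own health endpoint.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool],
                       attempts: int = 5,
                       delay_seconds: float = 1.0) -> bool:
    """Poll a health check until it passes or the attempts run out.

    `check` is any zero-argument callable that returns True when the
    newly deployed server is healthy. Exceptions count as "not yet".
    """
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except Exception:
            # e.g. connection refused while the server is still starting
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


# Stand-in check for illustration: fails twice, then reports healthy
calls = {"count": 0}


def fake_check() -> bool:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server still starting")
    return True


deployed_ok = wait_until_healthy(fake_check, attempts=5, delay_seconds=0)
print("deploy healthy:", deployed_ok)
```

Written before the server exists, a check like this fails, just like a test written first in TDD; the deployment is done when it passes, and a pipeline can roll back when it returns False.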

 

This book will change the way you deliver software and will likely make your life a lot easier. I also suggest reading Nygard’s Release It! either right before or right after, as it gives you concrete architecture patterns to implement some of Humble’s ideas.

Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler))