Haskell on AWS

Publish date: Sun, Dec 1, 2019

Haskell on AWS - Part 1

I taught AP Computer Science once. The language the students had to know was Java, and the style taught was imperative. I saw the same errors over and over again, and I quickly moved to a functional programming style for my own code. Ever since, I have been a huge fan of the benefits of functional programming, including immutable variables, lazy evaluation and improved concurrency primitives. Generally, my code is shorter and easier to reason about, and I am continually impressed with the power of map and filter. I think of programs as data transformation pipelines, and I am a better programmer because of it.

For the past several years, I have been using Lisp/Clojure or Erlang/Elixir. However, I have always felt that their virtual machines exact a certain start-up time penalty, which in this age of microservices may not always be worth it. So I have been investigating Haskell, which has the advantages of being compiled and strongly typed as well. I thought I would do a small series of articles on Haskell, and in particular on how one might use it for AWS development.

Some quick history regarding Haskell. Haskell was originally an academic language, created to investigate new ideas in functional programming. It is nearly 30 years old, and the compiler is mature, fast and solid. GHC compiles Haskell to native code, and the language's foreign function interface (FFI) makes calling C very easy. About 10 years ago, many of its ideas started to seep into other mainstream languages (Python, Ruby and JavaScript). In 2015, Facebook started using Haskell in a big way to fight spam, with the help of one of the luminaries of the language, Simon Marlow, whom it had hired. Simon Marlow is a British computer programmer, author, and co-developer of the Glasgow Haskell Compiler (GHC), which is the primary compiler used by the Haskell community.

While the learning curve is steeper than that of other languages, Haskell will definitely expand your programming horizons while allowing you to write more robust code. While I am not 100% convinced that the steeper learning curve is worth it, I definitely recognize that it eliminates a large number of run-time bugs and errors. Also, I find it encouraging that languages like Elm or PureScript, which are based upon Haskell, are bringing increased stability to the JS ecosystem.

We will write a small Haskell program that queries all AWS regions and prints out the EC2 instances it finds. Although short in length, it demonstrates concurrency and immutability, and leverages some amazing libraries.

Let’s get started

We will use the stack tool for managing builds, package dependencies, and so on. It is built upon cabal, an older build tool, but improves on it in many ways.

Download stack

$ curl -sSL https://get.haskellstack.org/ | sh

Although Haskell is compiled, stack does allow you to create shell scripts. For example, to quickly run a Haskell script, copy the following content into a file called HelloWorld.hs:

#!/usr/bin/env stack
-- stack --resolver lts-14.16 script

main :: IO ()
main = putStrLn "Hello World"

Open up a terminal and run stack HelloWorld.hs. Done!

Create a new Haskell project

stack will generate the scaffolding for a typical project

$ stack new ec2-demo
$ cd ec2-demo

Update the project dependencies

Packages or libraries are listed in two locations. All packages which are used by your code must be enumerated in package.yaml.

dependencies:
- base >= 4.7 && < 5
- bytestring
- lens
- conduit
- resourcet
- text
- unordered-containers
- amazonka
- amazonka-core
- amazonka-ec2
- async

Also, if a package is not included in the Stackage Long Term Support (LTS) snapshot you are using, it must be listed as an extra dependency in the stack.yaml file. For us, the only library that we must add is Amazonka, which is actually made up of two main libraries and one service-specific library. Do NOT alter the ec2-demo.cabal file, which is automatically generated by the stack tool. Editing the .cabal file used to be the way dependencies were handled with the older cabal tool, but it is no longer necessary.

extra-deps:
- amazonka-1.6.1
- amazonka-core-1.6.1
- amazonka-ec2-1.6.1

Show me the code!

We will put all code in the app/Main.hs file. The first few lines are compiler directives and import statements.

{-# LANGUAGE OverloadedStrings #-}

-- |
-- Author  : Nick Brandaleone <nbrand@mac.com>
-- Based upon example by Brendan Hay <https://github.com/brendanhay>
-- December 2019
--
module Main where

import           Control.Lens
import           Control.Monad.IO.Class
import           Control.Monad.Trans.AWS
import           Data.ByteString.Builder (hPutBuilder)
import           Data.Conduit
import qualified Data.Conduit.List       as CL
import           Data.Monoid
import           Network.AWS.Data
import           Network.AWS.EC2
import           System.IO
import           Control.Concurrent.Async

Helper functions

The next two definitions are minor helpers that filter out regions I am not interested in polling. Otherwise, we would poll all regions, including China and GovCloud. This should speed things up, and prevent weird run-time errors which might occur from polling these unusual regions.

-- True for the regions we want to poll
myRegions :: Region -> Bool
myRegions x = case x of
  Beijing         -> False
  GovCloud        -> False
  GovCloudFIPS    -> False
  NorthCalifornia -> False
  _               -> True

-- Remove some AWS regions from inspection
regions :: [Region]
regions = filter myRegions [NorthVirginia .. Beijing]
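
The range syntax above relies on the Region type being enumerable. As a rough, library-free illustration of the same pattern, here is a sketch using a hypothetical DemoRegion type (the type, its constructors, and the names wanted and demoRegions are made up for this example and are not part of amazonka):

```haskell
-- Hypothetical stand-in for amazonka's Region type; the constructor
-- names are illustrative only.
data DemoRegion
  = Virginia
  | Ohio
  | California
  | Oregon
  | GovWest
  | China
  deriving (Show, Eq, Enum, Bounded)

-- Predicate: True for the regions we want to poll.
wanted :: DemoRegion -> Bool
wanted r = case r of
  California -> False
  GovWest    -> False
  China      -> False
  _          -> True

-- Enumerate every constructor, then keep only the wanted ones.
demoRegions :: [DemoRegion]
demoRegions = filter wanted [minBound .. maxBound]
```

Here demoRegions evaluates to [Virginia,Ohio,Oregon]. Using minBound and maxBound instead of explicit end points means any constructor added later is picked up automatically, whereas a range with a fixed end point skips any constructor declared after it.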

Primary function and main

Here is the meat of the program, including main. Notice the mapConcurrently function call in main. It has each region polled on a separate thread. AWS credentials are automatically discovered using the standard AWS search path.

-- Print out EC2 information for a given Region
instanceOverview :: Region -> IO ()
instanceOverview r = do
    lgr <- newLogger Info stdout
    env <- newEnv Discover <&> set envLogger lgr

    let pp x = mconcat
          [ "[instance:" <> build (x ^. insInstanceId) <> "] {"
          , "\n  public-dns = " <> build (x ^. insPublicDNSName)
          , "\n  tags       = " <> build (x ^. insTags . to show)
          , "\n  state      = " <> build (x ^. insState . isName . to toBS)
          , "\n}\n"
          ]

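    -- Send the paginated DescribeInstances request in region r and stream the results
    runResourceT . runAWST env . within r $
      runConduit $
        paginate describeInstances
             .| CL.concatMap (view dirsReservations)  -- each page -> its reservations
             .| CL.concatMap (view rInstances)        -- each reservation -> its instances
             .| CL.mapM_ (liftIO . hPutBuilder stdout . pp)  -- print each instance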

main :: IO ()
main = do
  mapConcurrently instanceOverview regions
  return ()
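
To get a feel for what a mapConcurrently-style call does, here is a simplified approximation that uses only forkIO and MVar from base rather than the async library. It is a sketch, not how async is implemented: the names mapConcurrentlySketch and fakeOverview are hypothetical, and exception handling and cancellation are deliberately left out.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Simplified, base-only approximation of a mapConcurrently-style helper:
-- one thread per input, results collected in input order.
-- Unlike the async library, this sketch does not propagate exceptions
-- or cancel sibling threads; if an action throws, its result never arrives.
mapConcurrentlySketch :: (a -> IO b) -> [a] -> IO [b]
mapConcurrentlySketch f xs = do
  -- Start one thread per element; each writes its result into its own MVar.
  boxes <- mapM spawn xs
  -- Wait for every result, in the original order.
  mapM takeMVar boxes
  where
    spawn x = do
      box <- newEmptyMVar
      _ <- forkIO (f x >>= putMVar box)
      return box

-- Hypothetical stand-in for instanceOverview: returns a line instead of
-- calling AWS and printing.
fakeOverview :: String -> IO String
fakeOverview region = return ("polled " ++ region)
```

Calling mapConcurrentlySketch fakeOverview ["us-east-1", "us-west-2"] returns ["polled us-east-1","polled us-west-2"]. The async library's version adds the error propagation and cancellation that this sketch omits, which is a good reason to use it rather than rolling your own.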

Build the program

The stack build command compiles your code.

$ stack build

Run the program

The stack exec command will execute your program.

$ stack exec ec2-demo-exe
[instance:i-097a5645b6a80c96b] {
  public-dns = Just ec2-18-223-149-3.us-east-2.compute.amazonaws.com
  tags       = [Tag' {_tagKey = "Name", _tagValue = "Development"}]
  state      = running
}
[instance:i-03acd66f63e0cbf1f] {
  public-dns = Just ec2-3-18-213-74.us-east-2.compute.amazonaws.com
  tags       = [Tag' {_tagKey = "Name", _tagValue = "Testing"}]
  state      = running
}
[instance:i-051adc983881eb369] {
  public-dns = Just ec2-54-211-140-170.compute-1.amazonaws.com
  tags       = [Tag' {_tagKey = "PrincipalId", _tagValue = "AROAITFNF4PPONWVMMSDI:nbrand-Isengard"},Tag' {_tagKey = "Owner", _tagValue = "nbrand-Isengard"}]
  state      = running
}
[instance:i-045cb25ca7a48018c] {
  public-dns = Just ec2-35-171-160-24.compute-1.amazonaws.com
  tags       = [Tag' {_tagKey = "PrincipalId", _tagValue = "AROAITFNF4PPONWVMMSDI:nbrand-Isengard"},Tag' {_tagKey = "Owner", _tagValue = "nbrand-Isengard"}]
  state      = running
}
[instance:i-08fc9ff3d13be8e6f] {
  public-dns = Just ec2-3-94-90-233.compute-1.amazonaws.com
  tags       = [Tag' {_tagKey = "Owner", _tagValue = "nbrand-Isengard"},Tag' {_tagKey = "PrincipalId", _tagValue = "AROAITFNF4PPONWVMMSDI:nbrand-Isengard"}]
  state      = running
}
[instance:i-0a8b10030afcd78a0] {
  public-dns = Just ec2-35-164-44-32.us-west-2.compute.amazonaws.com
  tags       = []
  state      = running
}
[instance:i-03e3eb79844ef7365] {
  public-dns = Just ec2-34-216-142-23.us-west-2.compute.amazonaws.com
  tags       = []
  state      = running
}

Issues

The program worked well, and executed in less than 2 seconds. This is pretty fast considering there are now over 20 AWS regions. Also, did you notice that we did not need to worry about mutex locks, or any other asynchronous magic?! Pretty cool!

However, since amazonka is not an official AWS SDK, it tends to run behind AWS releases. The author created an amazing library, but only updates it about once or twice a year. This can cause issues if you use some of the newer features or services. I came upon two issues myself, and there are probably more.

  1. The list of regions is not up to date in the latest 1.6.1 package.
  2. If one uses newer instance types (t3 or i3 in my case), there may be a run-time error.

There are ways around this. One can clone the library and make improvements to your own copy (feel free to send pull requests upstream). Or, you can create your own client. Neither solution is ideal, but neither is too difficult either.
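
A lighter-weight mitigation for the run-time errors is to catch exceptions per region, so that one failing region does not abort the whole run. The following is a minimal sketch, assuming it is acceptable to report and skip a failed region. The name safely is hypothetical, and only Control.Exception from base is used:

```haskell
import Control.Exception (SomeException, try)

-- Run an action and turn any exception into a labelled error message.
-- Note: catching SomeException also catches asynchronous exceptions,
-- which is usually too broad for production code.
safely :: String -> IO a -> IO (Either String a)
safely label action = do
  result <- try action
  return $ case result of
    Left e  -> Left (label ++ ": " ++ show (e :: SomeException))
    Right a -> Right a
```

With the program above, one could map something like \r -> safely (show r) (instanceOverview r) over regions and then print the Left values, so that a parsing failure in one region is reported instead of stopping the other threads.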

Summary

I was able to demonstrate how to use Haskell to interact with the AWS platform. We did this in a functional way, using highly concurrent code, in about 20-30 LOC (excluding some boilerplate), which is not bad for a language which is not natively supported by AWS.

The code is available on GitHub, in case you had difficulty following along. If you are unfamiliar with Haskell, I am sure that many of the operators and function calls may seem strange or bizarre. Please see the references below to start your Haskell journey. I can promise it will be rewarding!

In the next blog in the series, I will show you how to use Haskell in a Lambda function.


References