Looking to improve technical reporting?
Increase site speeds?
Streamline a data/dev pipeline?

Yeah, I do that!

All using freely available tools, in my spare time.

About Me

I'm an inherently curious person. This drives me to learn how things work, and how to solve problems. I'm entirely self-taught: from compiling Linux hardware drivers to Adobe Illustrator to non-linear statistical regressions.

My strengths lie in coding and the deployment process. I favor flexibility over overly precise, over-engineered setups, and when it comes to data, it's all about being test-driven.

Recently, I took the StrengthsFinder test, and a majority of my strengths are strategic. Here are some scary-accurate descriptions:

  • "you work seriously at something when you have defined the specific objective"
  • "you give credit to certain individuals who make key points that advance everyone's understanding of a particular theory, concept, or idea.
  • "able to find connections between seemingly disparate phenomena"
  • "takes psychological ownership of what they say they will do"

I've always worked with designers and often directly with clients, which keeps my aesthetic sense in shape and my explanations free of technical jargon.

For the past 6 years, I've worked remotely for a pretty great ad/design firm. I've had the freedom and responsibility to test out ideas and learn from a wide variety of over 80 client sites. What you see below are focused examples of that sheer curiosity at work.

With over a decade of web development experience, I'd like to keep supporting developer and analytics teams, supplying them with the data and resources to prioritize, automate, and motivate their work. After all, everyone likes to see their work having an effect.

So, what problems does your team have? I'd love to be a part of the solution!

Thank you for your interest,
Mark Wallace

Did Someone Say Case Studies?

Click below to find out more

Buzzword soup

I'm a developer and I know what each buzzword means.

Clients often come to us wanting to integrate their business processes into their online presence. Sometimes this is possible, but often there are hurdles at multiple levels.

The leasing agency for The Duke Nashville was able to achieve their dream by wisely choosing a property leasing platform with an API. I was able to provide not only an up-to-date listing of available properties, but also a way to organize and search all the available information.

  • CDN/caching-compatible architecture
  • REST-based
  • CORS-compatible
  • Custom design
  • Data sourced through RentCAFE's API
  • Fully instrumented for conversion tracking
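
Under the hood, the availability search boils down to a cache-friendly REST call from the browser. Here's a minimal sketch; the endpoint and field names are hypothetical stand-ins, not RentCAFE's actual API:

```js
// Minimal sketch of the availability search.
// The endpoint and field names are hypothetical, not RentCAFE's actual API.
// Every visitor issues the same cacheable GET, so the CDN can absorb most hits.
async function searchAvailability(beds, maxRent) {
  const res = await fetch(
    `https://api.example.com/v1/availability?beds=${beds}&max_rent=${maxRent}`,
    { headers: { Accept: 'application/json' } }
  );
  if (!res.ok) throw new Error(`Availability request failed: ${res.status}`);
  const { units } = await res.json();
  // Cheapest first — price drew the most interactions (see below).
  return units.sort((a, b) => a.rent - b.rent);
}
```
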
Results:
  • 70% of sessions involved an availability search.
  • 8% of sessions continued to the rental application.
  • No downtime.
  • Happy client.

Curiosity got the best of me, and I had to look at which apartments drew the most interactions:

Dean & Sinatra: Classic.
Looks like the lowest price with the same number of beds & baths matter more than anything.

What can a CDN do for you?

How I'm more than just a developer

By now, just about every developer knows that a Content Delivery Network is an essential part of the production stack. In 2014, however, I was the only one in the office who had heard of them. I knew there would be significant improvements across the board for our clients, so I started collecting as many measurements as I could, culminating in my first "State of the Web" annual review for my bosses.

First up, I moved all our sites onto Cloudflare. That alone dropped our DNS lookup times by hundreds of milliseconds. Even without much customized page caching, Cloudflare cut server load from 20+ page-builds per hour to around 2 per hour.

Second, I offloaded all our media libraries to Amazon's S3 & CloudFront. That now handles over 3 terabytes per year across all our clients.

All of our sites' bandwidth, by source.
Each month, I migrated more sites over.

An early test to see how effective caching would be.
This is showing the number of pages one of our servers was building per hour.
April 16th was a test; April 20th we went live with the first wave, and by May 1 we had a second layer of caching.

It's always fun to see the immediate improvements!

Developer Pipeline

Humans are terribly forgetful, and projects tend to get messy over time.

Project managers and developers alike need something reliable. What if there were a way to verify that changes were made? What if that happened automatically? What a developer does (or is supposed to do) manually, a pipeline automates.

What's in a Developer Pipeline?

  1. Build: All code committed is packaged up with all dependencies.
  2. Test: Basic checks make sure requirements are met.
  3. Local Deploy: It is then deployed to an internal dev site for all to review.
  4. Compare: The dev site is visually compared against the live site (see the sketch after this list).
  5. Hold for Deploy: Nothing goes live without an actual manual approval, as a sanity check.
  6. Approval: The go-live process is automated, and no one is allowed in to muck with it.
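
The visual-compare step is worth a closer look. Below is a simplified sketch of a BackstopJS configuration for it; the project ID, URLs, and threshold are illustrative, not our actual setup:

```js
// backstop.config.js — a simplified, illustrative sketch of the Compare step.
// Each dev page is screenshotted and diffed against its live counterpart.
module.exports = {
  id: 'client_site', // illustrative project id
  viewports: [{ label: 'desktop', width: 1280, height: 800 }],
  scenarios: [
    {
      label: 'Homepage',
      referenceUrl: 'https://www.example.com/', // what is presently live
      url: 'http://dev.internal.test/',         // what we are about to take live
      misMatchThreshold: 0.1,                   // % difference that raises an alert
    },
  ],
  paths: {
    bitmaps_reference: 'backstop_data/reference',
    bitmaps_test: 'backstop_data/test',
  },
  engine: 'puppeteer',
  report: ['CI'], // emit a CI-friendly report for the pipeline to consume
};
```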

An early simplified edition of the pipeline
Alerts were issued on the visual comparisons.

How were goals achieved?

This was a multi-year process. It all started with finding the common ground of software requirements on the staging server. Even though updating projects on that server was manual for months, each time that manual process ran, another step was scripted, slowly converging on a consistent set of scripts that could be automated.

Once the staging server was consistently handling a variety of projects, we started to replicate the setup locally. Vagrant was actually the worst part of the process. Docker is rightly a popular successor.

Finally, the last step was to choose between Jenkins, CircleCI, and other competitors. CircleCI won out for its flexibility as an independently managed service. The actual spin-up of CircleCI was relatively easy; all the hard work was in overhauling our workflow.

Goals & Applications

Flexibility
  • Goal: Provide a base platform for every developer to build on, locally or remotely.
    Application: A Vagrant image is provided for local and dev servers; the next round will use Docker.
  • Goal: Allow each developer to use their own coding tools, while agreeing on a few core essentials.
    Application: If Vagrant is too bulky, developers can run their own platform, so long as their code still passes tests. When using Sublime instead of Atom, be sure to install an equivalent linter.

Consistency
  • Goal: Every automated step can be manually replicated for debugging.
    Application: All the tools used in automation are also available to developers on the local virtual machine for manual, step-by-step checking.
  • Goal: Start from known sources, with a version or release number specified for each external resource.
    Application: Use a package manager instead of adding everything to git (a sketch follows this table).
  • Goal: Keep content & resources available independently from code.
    Application: All uploads were sent to Amazon S3, and the local virtual machine has a MySQL replica for fast local reads.

Confidence
  • Goal: Basic tests for easy-to-miss things.
    Application: Added to CircleCI.
  • Goal: Visual tests comparing what we are about to take live against what is presently live.
    Application: BackstopJS was added to CircleCI.
  • Goal: Independently stored site versions (code and data) for easy rollback.
    Application: No live-site deployment is overwritten, just moved aside.
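
For "known sources," the idea is to pin every external resource to an exact release in a manifest instead of committing vendor code to git. A minimal sketch (package names and versions are illustrative, not our actual manifest):

```js
// A sketch of "known sources": pin exact versions in package.json instead of
// committing vendor code to git. Names and versions here are illustrative;
// `npm install --save-exact jquery@3.3.1` records an entry like this.
const manifestExcerpt = {
  dependencies: {
    jquery: '3.3.1',          // exact release, no version ranges
    'normalize.css': '8.0.0',
  },
};
console.log(JSON.stringify(manifestExcerpt, null, 2));
```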

Takeaway:

Having automated visual tests readily available has been the biggest confidence builder for our design and management leads. We have a record showing exactly which changes were applied.

Sample visual comparison from BackstopJS.
A bad database import resulted in added glyphs.

Importantly, we also reduced a culture of immediacy. No longer was everyone nagging the go-live manager and impatiently playing the "refresh game," as it was nicknamed. Project changes went live when the machines deemed it so.

User Session Trends

What do you do when Google Analytics doesn’t give you enough?

What makes a quality user session? Number of pages? Amount of time spent on site?

Google Analytics' reductive tendency.

Site owners will always hope for the end-user to convert – be it purchasing a product, or getting in contact. However, there are other more subtle actions users take which can be telling. Even if a user has no immediate need for a site, we still want to know that they found something about it useful or interesting.

Google Analytics has a nasty habit of reducing metrics that are in fact quite dynamic to a single number: an average. Session Duration is an average. Pages per Session is as well. These averages are at the mercy of bounce rate and the fateful long tail. Any goals or reports built on them can only hope the number goes up.

By exporting the number of pages viewed for each user-session, as well as the amount of time spent, I was able to find the full distribution of session durations.
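
The mechanics are simple once the per-session rows are exported. A sketch of the 10-second binning, assuming each exported row carries a session's duration in seconds:

```js
// Bin exported session durations into 10-second buckets — a sketch.
// `sessions` stands in for rows parsed from the Google Analytics export.
const sessions = [
  { duration: 141, pages: 3 },
  { duration: 8, pages: 1 },
  // ...one row per user session
];

const binSize = 10; // seconds
const bins = {};
for (const { duration } of sessions) {
  const bin = Math.floor(duration / binSize) * binSize;
  bins[bin] = (bins[bin] || 0) + 1;
}

// Report the share of sessions per bin — a full curve instead of one average.
for (const [start, count] of Object.entries(bins)) {
  const pct = ((100 * count) / sessions.length).toFixed(1);
  console.log(`${start}-${Number(start) + binSize}s: ${pct}%`);
}
```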

Session Duration: East Lake, Q1.
This graph (to 120 seconds) makes up 66% of user sessions. Expanding to the right up to 240 seconds would contain 95% of the data.
Google Analytics: the session duration average is 2:21 (141 seconds).

Pages per Session: East Lake, Q1
Google Analytics: the pages per session average is 3.02.
A larger volume of traffic helps to 'smooth' out the curve.

The East Lake site traffic shows a smooth curve, something which we might assume is normal.

Session Duration: Wier/Stewart, Q1.
Google Analytics says 2:49 (169 sec) is the average Session Duration.

Pages per Session: Wier/Stewart, Q1
Google Analytics calls this 2.43 pages / session. Only data within the first 30 seconds is valid, since there's less traffic after that.

However, it seems more common for sites to have a sharper curve. Most sessions are rather short, and as such create the initial 10-second bump for Wier/Stewart.

Session Duration: VeryVera, Q1.
Google Analytics: 2:54 (174 sec) Average Session Duration

Pages per Session: VeryVera, Q1
Google Analytics: 3.6 Average Pages per Session

Indeed, Wier/Stewart and VeryVera load faster, allowing users to move on with their decisions sooner.

Variability in the session durations suggests two groups of users for Wier/Stewart: those under and those over 20 seconds. Wier/Stewart also has fewer users after the 30-second mark, so statistical significance is weaker there, shown by the wildly diverging lines.

Take-away:

While this model has not been fully applied yet, it certainly surfaces user-experience features which can be tracked against site and content changes.

I've learned there are at least three steps to data exploration, each with its own set of parameters:

  1. That something happened: Wier/Stewart has a 'bump' that other sites’ data don't have.
  2. What or How it happened: The quality of the Wier/Stewart site data indicates a higher number of users with shorter sessions.
  3. Why it happened: Why are there proportionally more short sessions? Repeat visitors, thirsting for more? Terribly irrelevant content? For now, this is an ongoing, open question.

Month to month, the average may move by only 0.08 percentage points, which is no real help or encouragement for teams looking for results. Being able to see the full data curve at 10-second intervals allows a much richer story to be told, or at least a more complete baseline to be set.

User Experience Transparency

Ever wonder how fast websites load for real users?

I didn't need a way to tell if our sites were up; UptimeRobot had me covered. What I needed to know was whether the sites were loading slowly, and how slowly.

Running webpagetest.org, GTmetrix, or Pingdom tests would happen at scheduled intervals at best, and not from real browsers with 34 other tabs open.

I needed a way to get raw load-time data, and see the trends. Setting cold hard alerting thresholds would just cause panic.

Of course, Real User Monitoring isn't a new idea: New Relic has a great Browser product for $50/month, but where's the fun in that? I had spare time, and a small budget, so I made my own.

Turns out, browsers have built-in load-time data. After a few months of consistent collection, I needed something better than Google Charts to pull usable information out of the now gigabytes' worth of raw browser timings. Domo provided a pre-built, all-in-one data ETL/visualization platform … for free!
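
The instrumentation itself is only a few lines per site. A minimal sketch, reading the browser's built-in timings and beaconing them home (the "/collect" endpoint is a hypothetical stand-in):

```js
// Minimal sketch of the tracking snippet. The "/collect" endpoint is a
// hypothetical stand-in for the real collection URL.
window.addEventListener('load', function () {
  // Wait one tick so loadEventEnd is populated.
  setTimeout(function () {
    var t = performance.timing; // built-in Navigation Timing data
    var payload = {
      url: location.pathname,
      dns: t.domainLookupEnd - t.domainLookupStart,
      ttfb: t.responseStart - t.navigationStart, // time to first byte
      load: t.loadEventEnd - t.navigationStart,  // full page load
    };
    navigator.sendBeacon('/collect', JSON.stringify(payload));
  }, 0);
});
```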

The Data Pipeline:

  • My Custom Code:
    • Instrumentation: Javascript tracking is added to each site (sketched above).
    • Collection: Data is sent to a simple endpoint which saves the raw JSON data.
    • Extract: Those JSON files are batched hourly, compiled into a CSV, and uploaded to Amazon S3 (a sketch follows this list).
  • Domo:
    • Transform & Load: Domo pulls in the S3 source and pushes changes through an easy to configure data-cleanup and tabling process.
    • Visualization: Being able to create multiple versions of the same chart is helpful: often I need a clear chart for easy communication, a simple chart to drive alerts, and another with "the works" for deeper comparisons.
    • Dashboards: Grouping charts into a single place, and exporting them all out to Powerpoint or other formats allows for easy offline sharing.
    • Alerting: Domo keeps alerting simple: for any chart, you can set up an alert if anything changes by a given percentage or deviation.
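
The hourly Extract step is a plain batch job. A sketch, assuming raw JSON files land in an ./incoming directory and the AWS SDK is configured with credentials (directory and bucket names are illustrative):

```js
// Hourly batch: compile raw JSON timing files into one CSV and upload to S3.
// A sketch — the directory, bucket, and field names are illustrative.
const fs = require('fs');
const path = require('path');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: 'us-east-1' });

async function batchToS3() {
  const dir = './incoming';
  const rows = ['url,dns,ttfb,load']; // CSV header
  for (const file of fs.readdirSync(dir)) {
    const { url, dns, ttfb, load } = JSON.parse(
      fs.readFileSync(path.join(dir, file), 'utf8')
    );
    rows.push([url, dns, ttfb, load].join(','));
  }
  await s3.send(
    new PutObjectCommand({
      Bucket: 'timings-example',          // illustrative bucket
      Key: `timings/${Date.now()}.csv`,
      Body: rows.join('\n'),
    })
  );
}

batchToS3().catch(console.error);
```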

Here's an example of three charts, from high level to low, that keep me aware of site performance. We'll start by looking at our sites with enough traffic to be significant, for the past 10 days:

Latest 10-day Page Load Times
The percentage of users who experience page loads under a given number of seconds.
Our goals are that 60% of users will experience page loads under 1 second, and 80% under 2 seconds.
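
Those goal lines fall straight out of the raw timings. A sketch, assuming `loads` holds the collected full-page load times in milliseconds:

```js
// Share of page loads under each goal threshold — a sketch.
// `loads` stands in for the collected load times, in milliseconds.
const loads = [850, 1200, 640, 2300, 980]; // illustrative sample

const shareUnder = (ms) => loads.filter((t) => t <= ms).length / loads.length;

console.log(`under 1s: ${(100 * shareUnder(1000)).toFixed(0)}% (goal: 60%)`);
console.log(`under 2s: ${(100 * shareUnder(2000)).toFixed(0)}% (goal: 80%)`);
```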

Singling out eastlakegolfclub.com, we can see how the latest week compares historically over the last year. Looking further, I'm also able to provide performance information on a per-page level.

Weekly Historical View for East Lake:
Average page load-times are around the 60% / 80% goals.
The first few months were their prior hosting.

Per-Page Page Load Data for East Lake.
The black line shows the number of views each page received, most-visited is on the left.

Even if a larger scale eventually calls for custom Python or R scripting, Domo provides an excellent platform on which to quickly iterate, explore, and log which transformations & visualizations might be needed.

Right now, I get twice-weekly email alerts if any site has degraded performance.

Take-away:

At-a-glance insights from live data about the technical aspects of our sites can provide more valuable direction than other testing methods.

Data-based decisions have started a self-reinforcing landslide in how I think about website production.

I get really excited to see real data about how our client sites & our code perform.

So what do you think?

How can I help your organization?