Shifting Culture

By Anand Kumar, 21 December, 2020

When I joined the Technology Solutions (TS) group at the American Institutes for Research (AIR) in 2015, I found the team experiencing several pain points. Projects were unwieldy due to constantly changing requirements, and project managers trained in Waterfall models had not embraced agile methodologies. Defects were commonplace and increasing as more features and bug fixes were pushed into production environments without proper testing. The overall posture was reactive, and recurring feedback from the team was that they were constantly swimming upstream and struggling to stay afloat. I set out to resolve the issues by focusing on training staff, creating efficient processes, automating repetitive work, and incorporating industry best practices into daily tasks and workflows.

I established a team of senior staff to first identify and document all the issues and problems. At every step, the entire team was informed of activities and progress, as I believe transparency is paramount to achieving culture shift. Feedback from all stakeholders (including clients, senior leadership, project managers, and developers) was elicited through periodic surveys using SurveyMonkey. Focus groups were established and debates encouraged on how to address deficiencies. Deliverables from these first tasks included a comprehensive list of issues and improvements. Success would not have been possible without the continued participation and feedback of all impacted parties. To measure the effectiveness of the process improvement efforts, I established metrics that included customer satisfaction, defect density, ROI, and cost and schedule variance.
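For readers unfamiliar with these measures, here is a minimal sketch of how three of them could be computed. The exact formulas the team used are not stated above, so this assumes the common textbook definitions: defect density as defects per thousand lines of code, and cost and schedule variance in the earned-value sense (EV - AC and EV - PV). The `ProjectSnapshot` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProjectSnapshot:
    """Hypothetical inputs for one reporting period (names are assumptions)."""
    defects_found: int     # confirmed defects logged in the period
    size_kloc: float       # delivered code size, in thousands of lines
    planned_value: float   # PV: budgeted cost of work scheduled
    earned_value: float    # EV: budgeted cost of work actually completed
    actual_cost: float     # AC: actual cost of the completed work


def defect_density(s: ProjectSnapshot) -> float:
    """Defects per thousand lines of code (assumed definition)."""
    if s.size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    return s.defects_found / s.size_kloc


def cost_variance(s: ProjectSnapshot) -> float:
    """EV - AC; a negative value means the work cost more than budgeted."""
    return s.earned_value - s.actual_cost


def schedule_variance(s: ProjectSnapshot) -> float:
    """EV - PV; a negative value means less work was done than scheduled."""
    return s.earned_value - s.planned_value
```

Tracked per sprint or per release, values like these give a trend line rather than a single verdict, which is usually what a process improvement effort needs.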

With a primary objective to improve client outcomes, I led a process improvement team to break down processes into discrete steps. The requirements gathering process was considered the highest priority, and work on it commenced with a detailed comparison of leading project management methodologies and frameworks: Waterfall, PMBOK, Scrum, Kanban, and Extreme Programming. To handle the influx of projects requiring rapid iteration and regular stakeholder feedback, the team settled on the Scrum framework – implementing two-week sprints and daily stand-up meetings. Eliminating the generally inflexible project plans (especially scope) immediately boosted client satisfaction, while technical staff embraced the simpler development cycle. After training staff on Scrum roles, workflow, and artifacts, I deployed Atlassian Jira software to help with the planning, tracking, releasing, and reporting of agile projects. Jira presented all users of the system (including clients) with an intuitive interface to define workflows (typically To Do -> In Progress -> Done), create epics, stories, and tasks, plan sprints, and distribute tasks across the software team. Prioritizing tasks within Jira was a simple drag and drop, while Scrum boards with backlogs ensured all stakeholders had the ability to discuss work in full context with complete visibility.

All client requests made their way into Jira (staff were requested to enter relevant information after any communications) for prioritization, scheduling, and assignment. Several rules were put into place – the most important mandating that once a two-week sprint was underway, reprioritization or swapping out of tasks within the current sprint would not be allowed. This constraint both enabled the development team to focus on the current sprint goal and encouraged clients to clearly articulate expectations for a working, high-quality, and usable deliverable at the end of the two-week sprint.

The feedback loop surfaced a useful insight about documentation. The majority of staff were not inclined to work with documents within SharePoint (which was the document warehouse software of choice at the time) and instead preferred a wiki-style, easily searchable interface to retrieve, create, and edit project documentation. I set up and configured Atlassian’s Confluence wiki and created easy-to-use templates for communications, initiation, planning, design, development, testing, implementation, maintenance, operations, transition, and closing. These included blueprints for meeting minutes, status reports, correspondence logging, work breakdown structures, estimation worksheets, schedules, project management plans, project communications plans, roles & responsibilities, standing meetings, design diagrams, contract artifacts, organization charts, work product tables, baselines, baseline audits, checklists, customer acceptance forms, and definitions of done. A “Read Me First” page, similar to a FAQ, was created to guide staff on how and where to capture various items of import (e.g., fleshing out requirements using Confluence templates vs. finalizing user stories within Jira).

With the process and tools in place to document requirements, I added several layers to incorporate quality. Every user story or task within Jira was required to specify Acceptance Criteria (conditions that must be met to avoid unexpected results and be accepted by the client) and every product increment was required to specify a Definition of Done (checklist that asserts quality). The TS Definition of Done includes:

  • The code is well-written. That is, the team does not feel they need to immediately refactor or rewrite it.
  • The code is checked in.
  • The code comes with automated tests at all appropriate levels.
  • The code has been either pair programmed or has been code inspected.
  • The feature the code implements has been documented as needed in any end-user documentation.

(Note: Some of the elements in the DoD were adapted from Mike Cohn of Mountain Goat Software.)

In addition to ensuring that acceptance criteria and a definition of done were specified, the Jira workflow was customized so that user stories or tasks could not progress to the next stage without estimates. Level-of-complexity estimates were required for user stories, whereas tasks required estimates in hours. This crucial step, along with prioritization, enabled the Scrum team to determine which product backlog items would make it into the two-week sprint for completion - a best-guess effort based on empirical evidence.
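To make the selection step concrete, here is a minimal sketch of one way a sprint could be filled from a prioritized, estimated backlog. It is an illustration only, not the team's actual procedure: it assumes a single numeric estimate per item and a capacity figure (`velocity`) taken from past sprints, and the `BacklogItem` type and ticket keys are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BacklogItem:
    key: str                  # hypothetical ticket key, e.g. "TS-1"
    priority: int             # lower number = higher priority (assumption)
    estimate: Optional[int]   # complexity points; None = not yet estimated


def plan_sprint(backlog: List[BacklogItem], velocity: int) -> List[BacklogItem]:
    """Greedily pick items in priority order until capacity is used up.

    Items without an estimate are skipped, mirroring the workflow rule
    that unestimated work cannot move forward.
    """
    selected: List[BacklogItem] = []
    remaining = velocity
    for item in sorted(backlog, key=lambda i: i.priority):
        if item.estimate is None:
            continue
        if item.estimate <= remaining:
            selected.append(item)
            remaining -= item.estimate
    return selected
```

A greedy pass like this will sometimes skip a large high-priority item in favor of a smaller one further down; in practice that trade-off is a conversation for sprint planning, not something to leave to a script.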

Security and data governance were also built in at both the product increment and overall project level. Technology Solutions staff were trained to ferret out any special requirements (PII, PHI, FISMA, HIPAA) and work in collaboration with the AIR Information Security and Information Technology teams to review technology stack and architecture, provision servers for hosting, and periodically scan for vulnerabilities. Developers were encouraged at all stages to raise gaps and non-compliances.

Configuration management was the next target for improvement – especially as gaps were found in source control, or in its absence altogether. All development staff were trained in Git, a distributed version-control system for tracking changes in source code, and Bitbucket, Atlassian’s web-based front-end to Git. TS solutions architects produced a comprehensive Git branching strategy in anticipation of a future continuous integration and continuous deployment (CI/CD) implementation. At a high level, all work would happen in isolated branches, traceable via Jira ticket numbers, with frequent commits carrying detailed commit messages, followed by iterative merges into integration and live branches.
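Traceability of this kind is usually enforced by a naming convention that tooling can check. The sketch below assumes a hypothetical convention of `<type>/<JIRA-KEY>-<short-description>` (for example `feature/TS-123-login-form`) and commit messages that begin with the ticket key; the actual TS convention is not described here.

```python
import re
from typing import Optional

# Assumed convention: <type>/<JIRA-KEY>-<short-description>
# e.g. "feature/TS-123-login-form". The branch types are illustrative.
BRANCH_PATTERN = re.compile(
    r"^(feature|bugfix|hotfix)/([A-Z][A-Z0-9]+-\d+)(?:-[a-z0-9]+)+$"
)


def ticket_for_branch(branch: str) -> Optional[str]:
    """Return the Jira ticket key embedded in a branch name, or None."""
    match = BRANCH_PATTERN.match(branch)
    return match.group(2) if match else None


def commit_references_ticket(message: str, ticket: str) -> bool:
    """True when the commit message starts with the ticket key."""
    return message.startswith(ticket + " ") or message.startswith(ticket + ":")
```

A check like this could run in a Git hook or as an early build step, rejecting branches and commits that cannot be tied back to a ticket.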

Developers were encouraged to create work logs within the Jira ticket as well as post comments to provide a running commentary. Checklists for security (OWASP Top 10), 508 compliance, and mobile / responsiveness were distributed and periodically referred to. Upon completion of code development, the developer would then test locally and merge their code into the integration environment. Initial developer sentiment indicated much angst with the various environments being out of sync – causing code that would work locally or in one environment to break in others (in 2015, the TS team had not yet embraced Docker and containerization). Developer access to the non-production environments allowed for undocumented configuration changes. After cutting off direct developer access to all environments, the environments were reset to match production, and shell scripts were created to merge in, build, and test the respective branches. Atlassian’s Bamboo software – a continuous integration server used to automate release management - was deployed and Bamboo plans that integrated with the shell scripts were created so that developers would be able to run builds on-demand.

Automating tedious build tasks paid off: defects were found much earlier in the life cycle, quality increased, and the development team was happier. Building on these successes, the TS DevOps team was formed to focus primarily on reducing risk, delivering business value faster by identifying and implementing automation opportunities, and establishing open lines of communication and greater partnership with IT, InfoSec, and client teams.

Embracing a culture of continuous improvement and continuous learning, the TS Innovation team was born to rapidly prototype behavioral and social science research and evaluation products to enhance everyday life. This elite team, consisting of product managers, solutions architects, developers, and designers, was tasked with developing an innovation portfolio under an exceptional constraint – to spend no more than forty hours of prototyping per product or service. The team rose to the challenge and continues to release noteworthy products including:

  • two-way SMS nudge bots for use in kinesiology, substance misuse, and student and classroom outcomes
  • a serverless bulk mail tool for research surveys
  • secure video hosting services for classroom-based initiatives
  • audio- and video-derived analytics via automated transcoding, transcribing, and natural language processing for classroom insights and relationships
  • Xamarin and Flutter mobile starter templates for social science research initiatives, and
  • custom Alexa skills using AWS, Lambda, and Python for users with disabilities.

A repeatable formula was devised to jumpstart creativity – consisting of three main phases: imagining and modeling; prototyping; and pitching. Brainstorming sessions were first conducted to dream up various ideas to address challenges. After prioritizing the ideas, modeling sessions were initiated to plan out builds. Sandboxing environments were created using containerization and provisioning tools (specifically Docker, Ansible, AWS CloudFormation, and Azure Resource Manager), and prototypes spun up utilizing classic and novel technologies and services.

The development team currently boasts an array of skills consisting of front-end, back-end, and DevOps capabilities. Technologies used on a daily basis in developing and maintaining over 130 projects include: HTML, CSS, JavaScript, Bootstrap, SASS, Webpack, React.js, Python, Ruby, PHP, Node.js, .NET, MySQL, MariaDB, PostgreSQL, MSSQL, MongoDB, Memcached, Redis, RabbitMQ, Solr, Apache, Nginx, MS IIS, Docker, Ansible, Kubernetes, AWS, Azure, and Heroku. The team's efforts are concentrated on providing end-to-end technical expertise and solutions to and for researchers who are conducting and applying, evaluating, or disseminating research. Requests span: developing surveys and configuring devices (mobile, tablets) to be used in underserved areas without stable wireless or internet connections; data collection, processing (ETL), analysis, and visualization requiring auto-scaling of cloud-based instances; and producing high-impact portals and content management systems to collaborate and share findings with interested communities and the public at large.

Several stacks are utilized for portal development, mainly LAMP (Linux, Apache, MySQL, PHP) for Drupal (7 and 8) development, and WISA/C (Windows, IIS, SQL, ASP.NET / C#) for DNN (DotNetNuke) and custom C# development.

Massive developer efficiencies were recently achieved by implementing Docker container technology. Vetted and hardened images provided by the IT and InfoSec teams are downloaded from the secure AIR Docker Trusted Registry (DTR) to local hardware to precisely mirror production environments. Local Docker stack deployment for each application is aided by utility shell scripts, with Bamboo automating image builds, container configuration, and execution in subsequent environments.

TS’s agile portal development lifecycle process incorporates audience identification, user scenarios, interaction flows, UI / UX design, theming and styling, rules and workflows, content integration, testing, and security to create a comprehensive solution that can almost entirely be managed by end-users upon release (one major exception being application vulnerability patching).

The TS team comprises staff in various geographic locations, with most working remotely. To support this diverse and mobile workforce, the team has implemented VPNs (Virtual Private Networks), screen-sharing technologies (GoToMeeting, WebEx, and Skype), and chat-based applications (Teams and Slack). Remote desktop interfaces are also available for handling sensitive data. Chat is fully integrated into the development lifecycle, with Jira, Confluence, Bitbucket, and Bamboo applications routing relevant status messages to respective project channels – along with the typical developer communications and banter. A separate channel is created for clients to provide instant feedback to the project team, allowing for a much richer and more nuanced relationship. Incident management is also enriched via chat-based project channel integration with monitoring tools (Site 24x7, Uptime Robot, and Pingdom). The DevOps team’s most recent success was routing Amazon GuardDuty (intelligent threat detection) findings for cloud-based applications to the proper chat channels (via Amazon CloudWatch alerts to an Amazon SNS topic to AWS Lambda).
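The last hop in that GuardDuty pipeline, an AWS Lambda function subscribed to the SNS topic, might look roughly like the following. This is a sketch under assumptions rather than the team's implementation: it assumes the SNS message body is the JSON event for a GuardDuty finding (with `detail.severity`, `detail.title`, `detail.type`, and `detail.region`), that the chat tool accepts an incoming-webhook POST of the form `{"text": ...}`, and that the webhook address is supplied in a hypothetical `CHAT_WEBHOOK_URL` environment variable.

```python
import json
import os
import urllib.request
from typing import List


def extract_findings(event: dict) -> List[dict]:
    """Decode the JSON message carried by each SNS record in the event."""
    return [json.loads(r["Sns"]["Message"]) for r in event.get("Records", [])]


def format_finding(finding: dict) -> str:
    """Build a one-line chat message from a GuardDuty finding event."""
    detail = finding.get("detail", {})
    return "[GuardDuty] severity {}: {} ({}) in {}".format(
        detail.get("severity", "unknown"),
        detail.get("title", "untitled finding"),
        detail.get("type", "unknown type"),
        detail.get("region", "unknown region"),
    )


def post_to_chat(text: str, webhook_url: str) -> None:
    """POST the message to an incoming webhook (payload shape is assumed)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: forward every finding in the SNS event to chat."""
    webhook_url = os.environ["CHAT_WEBHOOK_URL"]  # hypothetical variable name
    messages = [format_finding(f) for f in extract_findings(event)]
    for message in messages:
        post_to_chat(message, webhook_url)
    return {"posted": len(messages)}
```

Keeping parsing and formatting separate from the network call makes the message logic easy to test without touching AWS or the chat service.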

With mobile devices accounting for almost half the traffic to online government resources, the TS team develops all websites with mobile responsive designs that respond or adapt based on the size of the screen the content is presented on - whether a mobile phone, tablet, laptop, or desktop. The team generally works with either the Bootstrap or Foundation frameworks to utilize responsive grid layouts, but debates amongst the CSS developers often spring up on whether to use lesser-known but equally powerful options (e.g., Skeleton or Pure). Lunch and learn sessions are then scheduled to weigh the pros and cons of various frameworks prior to finalizing a selection for each project.

In late 2017, the team's solutions architects developed a blueprint for in-house development and issued a mantra: "No more monolithic applications." Much thought is given to creating reusable components to support both legacy and microservices architectures. The team has embraced polyglot implementations, with a healthy caution to keep complexity tolerable. Services are exposed via REST APIs, with API gateways used to route requests to the appropriate microservice(s). Solutions architects are involved at every stage of the project and ensure due consideration is given to requirements for connectivity, security, scalability, high availability and disaster recovery, logging, monitoring and alerting, reporting and analytics, and cost.
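At its simplest, the routing role of an API gateway is a lookup from request path to backing service. The sketch below illustrates the idea with longest-prefix matching; the route table, service names, and URLs are all hypothetical, and a real gateway would add authentication, rate limiting, and logging on top.

```python
from typing import Dict, Optional

# Hypothetical route table: path prefix -> internal service base URL.
ROUTES: Dict[str, str] = {
    "/surveys": "http://survey-service.internal",
    "/surveys/exports": "http://export-service.internal",
    "/mail": "http://mail-service.internal",
}


def resolve(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Map a request path to an upstream URL using the longest matching prefix.

    A prefix matches only on a whole path segment, so "/surveysx" does not
    match "/surveys". Returns None when no route applies.
    """
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    best = max(matches, key=len)
    return routes[best] + path[len(best):]
```

Longest-prefix matching lets a more specific service take over part of a path (here, exports) without the broader service needing to know about it.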

An early success for the team was replacing MailChimp and ExactTarget with an in-house microservices C# application built with AWS Lambda (serverless computing), Amazon Simple Email Service (SES), and Amazon Simple Storage Service (S3). The application requires end-users to upload two completed templates: a text file with tokens for data replacement, and a CSV file populated with recipient e-mail addresses and columns of data for token replacement. The CSV and mail template files are uploaded to S3, which triggers a Lambda function written in C# that processes each row in the CSV and sends an e-mail using SES.
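The core of that mail-merge flow, filling a tokenized template once per CSV row, can be sketched in a few lines. The production function is written in C#; the version below is in Python purely for illustration. The token syntax (`$name`, via the standard library's `string.Template`), the `email` column name, and the injected `send` callable are all assumptions; in a Lambda deployment `send` would wrap a call to SES.

```python
import csv
import io
from string import Template
from typing import Callable, List, Tuple


def render_messages(
    template_text: str, csv_text: str, email_column: str = "email"
) -> List[Tuple[str, str]]:
    """Return (recipient, body) pairs, one per CSV row.

    Each column header doubles as a token name, so a "first_name" column
    fills every "$first_name" token in the template. A token with no
    matching column raises KeyError rather than sending a broken e-mail.
    """
    template = Template(template_text)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[email_column], template.substitute(row)) for row in reader]


def send_all(
    messages: List[Tuple[str, str]], send: Callable[[str, str], None]
) -> int:
    """Deliver every message through the supplied sender; return the count."""
    for recipient, body in messages:
        send(recipient, body)
    return len(messages)
```

Passing the sender in as a parameter keeps the rendering logic testable locally, with the real e-mail service swapped in only at deployment.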

The lessons learned and processes implemented form the core of the services that Technology Solutions now provides to all its clients – internal and external.