Best Practices for Fixing Software Problems

16.02.2022 23:16

eWeek

There’s no such thing as a perfect software product. No matter how stable your application is, there are bound to be occasions where things go wrong in production. To make the most of each incident and learn from it, it’s crucial that engineering teams commit to conducting regular post-mortem investigations.

This is especially important as companies grow and teams increasingly transition to a remote working environment. Even an incident that seems small can be analyzed and learned from in order to prevent future, and potentially more serious, vulnerabilities.

Technology providers cannot afford to overlook having best practices in place for conducting a post-mortem investigation of an incident.

Fixing Software Problems: Key Steps

While there’s no one-size-fits-all solution for every team, there are several fundamental steps that should be taken to make the post-mortem an effective process and ensure that incidents remain rare.

Collect data during the incident. It’s important to collect as much data as you can in a single location, as the incident goes on. This includes server graphs, snippets from logs, and screenshots showing what was going on at each point in the incident. It doesn’t all end up being useful, but it’s good to have everything collected when you start going through the investigation in detail.
Start the investigation right away. Ask one of the developers or managers involved to take on the role of lead investigator, which means they’re in charge of making sure the investigation gets done, the post-mortem document gets filled in, and the debrief gets held. Starting right away makes sure nothing gets lost.
Review the results within a week. While things are still fresh, hold a debrief to review the post-mortem document as a group, discuss the action items, and make any edits needed. This can be a 30-60 minute video session with the team involved in the incident, as well as representatives from other departments (primarily the customer support team, but any impacted department should attend).
Share the results. As soon as the debrief is done, everyone should get a chance to learn from it. Post it where the whole company has access to it for transparency – incidents shouldn’t be hidden away.
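
The first step above, collecting evidence in a single location while the incident is still going on, can be as simple as an append-only log that is sorted into a timeline afterwards. The following is a minimal sketch in Python, not a description of any particular tool: the names `Evidence` and `IncidentLog`, the `kind` labels, and the sample entries are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence captured during an incident (hypothetical shape)."""
    observed_at: datetime  # when the event happened; must be timezone-aware
    kind: str              # e.g. "graph", "log", "screenshot"
    note: str              # short description or a link to the artifact


@dataclass
class IncidentLog:
    """A single place to drop evidence while the incident is in progress."""
    title: str
    entries: list[Evidence] = field(default_factory=list)

    def add(self, kind: str, note: str, observed_at: Optional[datetime] = None) -> None:
        # Default to the current time in UTC so that people in different
        # time zones record events on the same clock.
        when = observed_at or datetime.now(timezone.utc)
        self.entries.append(Evidence(when, kind, note))

    def timeline(self) -> list[str]:
        # Evidence is often added out of order; sort by time for the write-up.
        ordered = sorted(self.entries, key=lambda e: e.observed_at)
        return [
            f"{e.observed_at.astimezone(timezone.utc):%H:%M} UTC [{e.kind}] {e.note}"
            for e in ordered
        ]


# Illustrative entries only; the incident and timestamps are invented.
log = IncidentLog("Checkout errors")
log.add("log", "First 500 responses in API logs",
        datetime(2022, 2, 1, 14, 5, tzinfo=timezone.utc))
log.add("graph", "CPU spike on database server",
        datetime(2022, 2, 1, 14, 2, tzinfo=timezone.utc))
for line in log.timeline():
    print(line)
```

Keeping every entry timestamped in UTC means the sorted output can be pasted straight into the timeline section of the post-mortem document, even when contributors are spread across time zones.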

Additional Measures for Efficient Software Fixes

These best practices will set up teams for success, but as the future of work evolves, there are new challenges in following them.

For instance, companies are now facing employees working in all sorts of time zones, and a mix of remote and hybrid teams makes scheduling and coordination much more complicated. There are several additional measures that can help ensure that a post-mortem investigation remains effective, regardless of the environment:

Assume async. Scheduling the debrief more quickly means that it’s harder to find a spot in everyone’s calendars. Rather than pushing the meeting further and further out, do more of the work asynchronously. Make sure the document can stand on its own, and use the quickest communications channels to ask people for their contributions. Also consider recording the debrief (easy with Zoom) so that anyone who couldn’t attend can watch it later and nobody has to worry about missing out.
Complete the investigation quickly. It’s important to shorten the timeline expectations on the investigation. Collecting the data early avoids having multiple ongoing investigations, and allows everyone involved to get back to their sprint work sooner.
Simplify the incident document template. Consider simplifying the template so that there are fewer sections to worry about, and make each section as easy as possible to fill in. In order to still be complete, this document should include sections for:
Impact and Scope
Trigger (what started the incident)
Resolution (what ended up fixing it)
Timeline of events
Root Cause
What went well
What didn’t go well
Action items
Data & Analysis (all the charts)
Ask for input from customer-facing teams right away. A customer success team always has great input and is able to help fill in gaps in the timeline. Reach out to them early so there’s time for their input to be added into the post-mortem document before the debrief. Waiting for the debrief is too late!
Track action items in backlogs. Why track action item progress in an incident document when there is already a standard tool for tracking work? As soon as you can, move all action items from the post-mortem into the relevant backlogs so they are assigned and don’t get lost. It’s also beneficial to have automated reports set up to view the list of outstanding post-mortem actions, driven by a post-mortem label on the items.
Have a section for “things we should do if we have time.” Realistically, not all action items are actually actionable—some are more aspirational or something everyone should keep in mind. In order to keep the action items clearer, include this section as a spot to put the things you think are important but you couldn’t turn into assignable/trackable work. It’s better to have a smaller set of action items that you actually do than a giant list of things you would like to do given infinite time.
Keep it blameless. This one isn’t actually new, but it’s well worth repeating! Be interested in what happened and what you’re going to do to fix it going forward, not in pointing fingers.
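
The label-driven report of outstanding post-mortem actions mentioned in the list above can be sketched in a few lines. This is an illustration under stated assumptions rather than a real tracker integration: `BacklogItem`, its fields, the item keys, and the label name `post-mortem` are hypothetical, and an actual tracker would be queried through its own export or API instead of an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """A work item as it might be exported from a tracker (hypothetical shape)."""
    key: str
    title: str
    team: str
    labels: frozenset[str]
    done: bool = False


def outstanding_postmortem_actions(items, label="post-mortem"):
    """Return open items that carry the label, grouped by owning team."""
    report: dict[str, list[BacklogItem]] = {}
    for item in items:
        if label in item.labels and not item.done:
            report.setdefault(item.team, []).append(item)
    return report


# Invented sample backlog, for illustration only.
backlog = [
    BacklogItem("OPS-1", "Add alert on queue depth", "platform",
                frozenset({"post-mortem"})),
    BacklogItem("WEB-7", "Retry failed uploads", "web",
                frozenset({"post-mortem"}), done=True),
    BacklogItem("WEB-9", "Refresh onboarding copy", "web",
                frozenset({"ux"})),
]
for team, open_items in outstanding_postmortem_actions(backlog).items():
    print(team, [item.key for item in open_items])
# prints: platform ['OPS-1']
```

Because the report depends only on the label and the done flag, the action items live in each team's normal backlog and the incident document does not need to track their progress.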

Remote work and fast-paced development don’t have to make incidents complicated. By following these best practices, software engineers and team managers can make the most of an incident post-mortem and focus on what matters most: learning from it and making things better for the future.

About the Author:

Jesse van Herk, Senior Manager of Product Engineering, Jobber

The post Best Practices for Fixing Software Problems appeared first on eWEEK.
