Debugging Hard Things: Safari Edition

Some problems can be hard for us to debug because we think they’re too difficult to understand. Signs that this may be happening include making wild guesses, trying the same thing over and over again, or copying and pasting a solution we don’t understand in the hopes that the problem will go away.

Hint: Adding more zeros usually won’t help…

I’ve seen this in practice from many smart and capable folks (including myself!) with concepts like z-index (like did you know there are multiple stacking contexts, not a single global one?), CSS specificity (it really is just counting!), spotting memory leaks, puzzling through concurrency issues, or trying to work around browser bugs.

So what can we do instead, when we notice something that’s hard for us to debug? Plenty!

  • Take a pause with the beginner’s mindset. It’s okay to not know how something works. We can take some time to learn some of the basics to help us move forward.
  • Remember that computers aren’t magic! 🪄 There’s a reason why this is happening. If we’re having trouble figuring it out, the cause might sit on a layer that we’re not familiar with (browser, compiler, DB, OS, API, network, a bad physical cable, and so on).
  • Know that we can break big problems into smaller ones and work systematically to rule out or narrow down theories.

So that all sounds great, but how do we apply that?

Let’s take a closer look at what this process can look like in practice. The following is a real example of a browser bug I hunted down and some techniques I have found helpful.

A Real Example: Flickering Elements in Safari

One issue that piqued my interest while working full time on the WordPress Block Editor was some puzzling behavior in Safari when scrolling post content.

The text was flickering on image captions and there was a black flashing when scrolling quickly.

A cursory search showed that most people simply promoted more elements to their own compositing layer (more on this below) via some CSS like transform: translate3D(0,0,0); and called it a day.

While it’s pretty tempting to copy paste a CSS rule like that, what made me pause on accepting a PR that did just that is:

  1. We don’t understand why this was happening.
  2. We don’t understand why this maybe fixed it.
  3. We don’t understand what the consequences of doing so would be.

With that in mind, I dug in to try and provide an alternative solution. Here’s where I started:

Reproduce the Issue

A great first step to start when looking at a bug is testing to see if we can reproduce it.

Reproducing a bug lets us iterate on our theories and potential fixes in an ideally speedy test-a-fix and see-if-the-bug-is-still-there loop. If we can’t reproduce an issue, it doesn’t mean the problem doesn’t exist. It’ll just be much more difficult to iterate on. When things are not reproducible, sometimes we end up needing to test ideas by chatting with those who can reproduce the issue, or by adding defensive fixes and checking with production instrumentation whether the error goes away 😭.

Observe and Ask

In this Safari example, the bug was straightforward to reproduce. The report already had a few videos attached, so I could verify which editors were being used by 🔍 looking at the UX elements, and I could spot the types of block content being used in the post as a starting point (paragraphs, cover block, gallery block and more). Creating a quick test post and scrolling in Safari confirmed that this was an issue.

If there’s not enough information in a bug report, we can ask the reporter for more information. Screenshots or full videos can help a lot too when folks aren’t sure how to describe a behavior or don’t know the technical terms for what’s actually happening.

Reduce What’s Needed to Reproduce

A bug report might have hidden assumptions or ideas on why something is happening. These assumptions aren’t always correct, so it helps to confirm or rule these out ourselves. One of the first things I did was verify that the same content didn’t show any visual issues in Firefox or Chrome.

If we can reduce what’s needed to reproduce an issue, this can also help speed up our testing loop. With the flickering elements, I narrowed down post content to contain a single cover block and gallery.

Great, so we can reproduce the issue! What’s next?

Assemble Our Known Clues

Like reading a murder mystery, it can help to keep our known facts or hints assembled together. It makes it easy to revisit when thinking of new ideas to investigate or if we need to go back and challenge our assumptions.

Initial Clues List

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint?
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • From quick internet searches: possibly related to layer compositing issues?

Pick a Clue to Investigate or Challenge

Sidenote: sometimes we can use past experience (intuition) to skip through some of these discovery steps. I personally have some experience working on some reasonably complex canvas apps, so compositing was something that came to top of mind when looking at the symptoms of this bug. For the purposes of this section let’s assume that we have zero experience to work with.

Okay so searching on the internet for “Safari flickering scroll” gives us an answer of something like add this magic CSS rule that does nothing: transform: translate3D(0,0,0); (move this element nowhere in 3D space) or something like backface-visibility: hidden (toggling this value is invisible in 2D space) with zero explanation.

Very suspicious, right?

At this point, it might be easy to give up and think “This is weird! Just paste that answer and move on with our lives!” but let’s remain curious and dig deeper. It’s okay if we’ve never encountered a browser bug like this before or never had to dig more deeply into understanding browser rendering.

Be persistent and keep searching (or asking folks) and refine our questions!

  • Adding a follow up search like “transform: translate3D(0,0,0);” we can see that we’re trying to get the browser to do something around hardware acceleration.
  • Another search on hardware acceleration hints at a rendering process called compositing.

Bingo! There’s something new to learn or refresh our memories about! Let’s look at browser compositing.

A great way of gaining a basic understanding is to find multiple (hopefully reputable) resources, read them, synthesize them, and try to explain the topic to someone else.

Here are the posts I used to try and summarize the next section:

Let’s give that explanation part a try!

Understand a System More Deeply: Compositing

Compositing is one of the last steps that a browser takes when turning a web page into pixels on your screen. It’s also an optimization over a more naive implementation.

Very broadly a modern browser renderer process handles:

  • Parsing: turning an HTML string into a Document Object Model (DOM), loading external resources (images, styles, scripts), and parsing and executing any JavaScript.
  • Style: computing the style for each DOM node. (Which CSS rule won?)
  • Layout: calculating where to draw nodes and how big they should be.
  • Paint Order: what order should we paint elements? Think of how we might paint a real oil painting where we have some background mountains, a person, and a dog as our main subject. One method would be to paint back-to-front, drawing background elements first. Mountains, then the person, then the dog.
  • Paint and Compositing: Determine how to group elements into layers, paint each layer to fill with pixels, and then draw or put together each layer in the right order for a final image. Let’s go over this in more detail below.
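To make the paint order step a little more concrete, here’s a minimal sketch in JavaScript. It’s a toy model, not how any browser actually implements it: the node shape (name, zIndex, children) and the paintOrder function are made up for illustration, and real paint order has more phases (negative z-index, floats, inline content and so on). It does show why a huge z-index can’t escape its parent stacking context.

```javascript
// Toy model: every node is a stacking context with a z-index and child contexts.
// Hypothetical shape: { name, zIndex, children }.
function paintOrder(context) {
  // Paint the context itself first, then its children back-to-front.
  const order = [context.name];
  // Array.prototype.sort is stable, so equal z-index values keep document order.
  const children = [...(context.children ?? [])].sort((a, b) => a.zIndex - b.zIndex);
  for (const child of children) {
    order.push(...paintOrder(child));
  }
  return order;
}

const page = {
  name: "root",
  zIndex: 0,
  children: [
    { name: "sidebar", zIndex: 2, children: [] },
    {
      name: "content",
      zIndex: 1,
      children: [{ name: "tooltip", zIndex: 99999, children: [] }],
    },
  ],
};

// Later entries are painted on top of earlier ones.
console.log(paintOrder(page).join(" > ")); // root > content > tooltip > sidebar
```

Even with a z-index of 99999, the tooltip is painted as part of content, so sidebar still ends up on top of it.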

What Is Compositing?

Animation of compositing process from https://developers.google.com/web/updates/2018/09/inside-browser-part3#what_is_compositing

While we might naively fill in pixels on our screen by painting each element in our viewport in paint order, this is slow. What if we break up parts of the page into their own layers that don’t change as much? Using our painting analogy, we might make a layer for the mountains, one for the person, and a last one for the dog. Like in cel animation, after we paint each layer, we can reposition each layer independently without needing to repaint the entire scene. For browsers this is very useful in scenarios like smooth animation or scrolling.

Determining what should be in a layer is non-trivial. These are internal implementation details and may vary by browser and change over time, but roughly a browser will create a new layer for an element when it:

  • Has 3D or perspective transform CSS properties
  • Is an <iframe>, <canvas> or <video> element
  • Uses CSS animations or accelerated CSS filters
  • Has a descendant that is a compositing layer
  • Has a sibling with a lower z-index which has a compositing layer (in other words the layer overlaps a composited layer and should be rendered on top of it)
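The rough rules above can be sketched as a small function. This is only an illustration under stated assumptions: the element descriptor (tag, has3DTransform, hasAnimation, hasAcceleratedFilter, overlapsCompositedSibling, children) is a hypothetical shape invented here, and real browsers use many more internal heuristics.

```javascript
// Returns the reasons (if any) why an element might get its own compositing layer.
// The descriptor fields are hypothetical flags, not a real browser API.
function compositingReasons(el) {
  const reasons = [];
  if (el.has3DTransform) reasons.push("3D or perspective transform");
  if (["iframe", "canvas", "video"].includes(el.tag)) reasons.push(`<${el.tag}> element`);
  if (el.hasAnimation || el.hasAcceleratedFilter) reasons.push("CSS animation or accelerated filter");
  if (el.overlapsCompositedSibling) reasons.push("overlaps a composited sibling with a lower z-index");
  // Checking children recursively covers deeper descendants too.
  const children = el.children ?? [];
  if (children.some((child) => compositingReasons(child).length > 0)) {
    reasons.push("has a composited descendant");
  }
  return reasons;
}

const caption = { tag: "figcaption", has3DTransform: true };
const figure = { tag: "figure", children: [caption] };
const wrapper = { tag: "div", children: [figure] };

console.log(compositingReasons(caption)); // 3D or perspective transform
console.log(compositingReasons(wrapper)); // has a composited descendant
```

Notice how promoting one small element can ripple upward in this model: the wrapper picks up a reason of its own purely because of the caption.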

Layers can also get pretty large too, which can waste resources if the browser viewport only intersects a small part of it. We can optimize for this by subdividing a layer in a process called tiling. Going back to our painting analogy, think of how we might portion out squares of a large wall mural, in some prioritized order, for multiple artists to draw.
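Here’s a small sketch of the tiling idea: given a large layer and a viewport, which tiles actually need attention? The 512px tile size, the function names, and the layer and viewport shapes are assumptions for illustration only. Real browsers pick tile sizes and priorities with their own heuristics.

```javascript
// Lists the tiles of a layer that intersect the viewport.
// layer: { width, height }, viewport: { left, top, width, height } in layer coordinates.
function visibleTiles(layer, viewport, tileSize = 512) {
  const firstCol = Math.floor(Math.max(viewport.left, 0) / tileSize);
  const lastCol = Math.floor((Math.min(viewport.left + viewport.width, layer.width) - 1) / tileSize);
  const firstRow = Math.floor(Math.max(viewport.top, 0) / tileSize);
  const lastRow = Math.floor((Math.min(viewport.top + viewport.height, layer.height) - 1) / tileSize);

  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ row, col });
    }
  }
  return tiles;
}

// Total number of tiles needed to cover the whole layer.
function totalTiles(layer, tileSize = 512) {
  return Math.ceil(layer.width / tileSize) * Math.ceil(layer.height / tileSize);
}

// A tall, hypothetical layer with a viewport scrolled 1000px down.
const layer = { width: 1584, height: 8000 };
const viewport = { left: 0, top: 1000, width: 1584, height: 588 };

console.log(visibleTiles(layer, viewport).length); // 12
console.log(totalTiles(layer)); // 64
```

In this example only 12 of the 64 tiles intersect the viewport, which is the kind of saving tiling is after.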

In the browser, determining what layers to create and how to put them back together again is usually split out to a compositor thread, which may in turn also create child threads to give small pieces of work to the Graphics Processing Unit (GPU). The GPU is great at small tasks like painting pixels for polygons, hence the term hardware acceleration that gets thrown around.

On Performance

Another way of thinking about this is that each compositing layer acts as a pixel cache. This is of interest to us as web developers because some types of updates to a web page can skip parts of the expensive rendering process.

From most to least expensive:

  • Layout Change: If we make an element bigger, smaller, or change its position on the page we can’t skip any steps of the rendering process. (Using cel animation as an analogy, think of needing to throw out all of our existing cels and needing to ink, color and reposition them.)
  • Paint Change: If we update a paint property like background, or color, we can skip layout. (Using cel animation as an analogy, we can repaint the existing cels, then reposition them).
  • Compositor Change: If we only update a compositor-supported property (transform or opacity), we can skip layout and paint. When done correctly we can see very smooth animations and scrolls. (Using cel animation as an analogy, no need to ink or repaint cels, we can simply reposition the layers for a different final image).
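The three cases above can be summarized in a tiny lookup. This is a simplified sketch: the property sets only contain the examples named in the list, the function name is made up, and treating every other property as a layout change is a conservative guess rather than a rule.

```javascript
// Properties taken from the list above. Anything unknown falls back to the
// most expensive path, which is an assumption made for this sketch.
const COMPOSITOR_PROPERTIES = new Set(["transform", "opacity"]);
const PAINT_PROPERTIES = new Set(["background", "color"]);

// Returns the rendering steps that have to run again when a property changes.
function renderingSteps(property) {
  if (COMPOSITOR_PROPERTIES.has(property)) return ["composite"];
  if (PAINT_PROPERTIES.has(property)) return ["paint", "composite"];
  return ["layout", "paint", "composite"];
}

console.log(renderingSteps("width").join(", ")); // layout, paint, composite
console.log(renderingSteps("color").join(", ")); // paint, composite
console.log(renderingSteps("transform").join(", ")); // composite
```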

Why can’t we make everything a layer?

Have you tried promoting another layer? Maybe all of them? Is this perhaps a bad idea?

We can’t make everything a layer since the tradeoff is memory use and overhead for managing each of those layers! Done haphazardly, we can make our webpage much slower, or even crash! Like with most things we should profile to make sure layers make sense and are kept in check.

Revisit Our Browser Tools

After we understand a system more deeply, let’s check to see if browsers have any tools to help track this down! It’s always great to double check what debugging tools are available to us since it makes investigation work go by much more quickly.

Thankfully, Chrome and Safari do have a layers tab in devtools which lists these layers, their memory use, and why each layer was created.

Using the Layers Tool

So right out of the box, we can see a few suspicious things:

  1. We have a number of layers, some of them are very big!
  2. Some layers are caused by some position value, like “position: fixed”
  3. Others are caused by “-webkit-overflow-scrolling: touch”

Let’s update the clues list

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint?
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”

Refine the Search Using New Information

Using the information taken from inspecting the layers, one thing that stood out to me was -webkit-overflow-scrolling, since this issue was only triggered on scroll (that we know about).

What can we do with that? Well, maybe let’s try toggling the value!

And so here we try to override this in dev tools, but with an unhappy surprise! It’s an unsupported property! But somehow a compositing reason? How rude! 

Well how about we change overflow on .interface-interface-skeleton__content?

https://github.com/WordPress/gutenberg/pull/32637

This works to get rid of the glitches, but it breaks the sidebars. We can’t go with that approach of course, but we now have more information!

Let’s update the clues list

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint?
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”
    • -webkit-overflow-scrolling: touch can no longer be set or unset. This is added automatically when overflow is set to scroll or auto. In other words, any scroller now uses compositing with hardware acceleration and we can’t turn it off.

Hidden Assumption

Sometimes we have assumptions on our clues list that are incorrect and can even limit our thinking sometimes. Let’s take a closer look at this one:

The black flashing is from a bad browser paint?

If true, this would almost 💯 be a browser bug to isolate and ideally be filed as a bug.

This wasn’t the case! While debugging in the elements pane, I stumbled upon the fact that it was coming from a .edit-post-layout .interface-interface-skeleton__content parent element, and we could make it any color we pleased, like pink! The gray overlay was intended to be used to frame the tablet/mobile and template previews.

This wasn’t a graphics glitch, but possibly an incorrect ordering or compositing problem on scroll.

Let’s update the clues list

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint? No: the background color is “bleeding” from .edit-post-layout .interface-interface-skeleton__content
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”
    • -webkit-overflow-scrolling: touch can no longer be set or unset. This is added automatically when overflow is set to scroll or auto. In other words, any scroller now uses compositing with hardware acceleration and we can’t turn it off.

Reduce the Problem

There’s a lot going on in the WordPress Block Editor. To help isolate the noise, two approaches can work in reducing a problem. We can start turning off larger pieces of logic in the app in exploratory PRs OR create a new base case mimicking conditions from scratch.

I opted to try and create a simple HTML/CSS test case, since I suspected it was still part of a browser bug. It’d be much easier to test our guesses in simpler markup and we’d need a simple test case to use for browser bug reporting in WebKit anyway.

The first attempt I came up with was this. I tried to pick out what I thought were the most important parts of the skeleton interface, along with what I hoped was enough test content to trigger the scrolling glitch.

I had partial success. I could trigger this on my large resolution monitor (and see it stop displaying the behavior when I set font-size back down to something more reasonable). Others however still couldn’t consistently reproduce.

Spot the Difference

Another game I like to play when debugging is spot the difference, where we go piece by piece and make sure we note where a difference appears in one environment or another. This work can be a bit tedious and sometimes requires turning off your brain, similar to going through git bisect to test which commit caused a failure in production. Once we spot a difference, we can then dig in and question why that is.

So, as I was refining the test case to try and make it more reproducible for more folks, I noticed something. Do you see it?

One of the compositing layers was much bigger in size in WordPress Block Editor than in the test case! In the Block Editor one layer was as tall as all content in that pane! Meanwhile my test case shows a layer that is the size of the current browser viewport.
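Spotting this kind of difference by eye works, but it can also be written down as a mechanical comparison. The sketch below assumes we have copied layer sizes out of the Layers tab by hand into plain objects. The selector comes from this post, but the sizes, the object shape, and the function name are hypothetical.

```javascript
// Compares two hand-collected layer lists and reports layers whose size differs.
// Each entry is a hypothetical { selector, width, height } record.
function layerSizeDifferences(layersA, layersB, tolerancePx = 1) {
  const bySelector = new Map(layersB.map((layer) => [layer.selector, layer]));
  const differences = [];
  for (const a of layersA) {
    const b = bySelector.get(a.selector);
    if (!b) {
      differences.push({ selector: a.selector, note: "missing in second environment" });
      continue;
    }
    const widthDiffers = Math.abs(a.width - b.width) > tolerancePx;
    const heightDiffers = Math.abs(a.height - b.height) > tolerancePx;
    if (widthDiffers || heightDiffers) {
      differences.push({
        selector: a.selector,
        note: `${a.width}x${a.height} vs ${b.width}x${b.height}`,
      });
    }
  }
  return differences;
}

// Illustrative numbers only, not measurements from the real editor.
const blockEditor = [{ selector: ".interface-interface-skeleton__content", width: 1584, height: 8000 }];
const testCase = [{ selector: ".interface-interface-skeleton__content", width: 1584, height: 588 }];

console.log(layerSizeDifferences(blockEditor, testCase));
// one entry, with note "1584x8000 vs 1584x588"
```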

Let’s update the clues list

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint? No: the background color is “bleeding” from .edit-post-layout .interface-interface-skeleton__content
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big. Our initial test case does not have such large layers. What could be causing it?
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”
    • -webkit-overflow-scrolling: touch can no longer be set or unset. This is added automatically when overflow is set to scroll or auto. In other words, any scroller now uses compositing with hardware acceleration and we can’t turn it off.

Detailed Work

With that new clue in hand, I tried to make a much more accurate base case. I attempted to fully mirror the skeleton interface. It was a lot of divs 😭.

My second base case attempt was this, and from the methodical work a surprising item popped out. A simple div was the cause of the huge layer:

<div tabindex="0" style="position: fixed;"></div>

See how removing it gives us a much more reasonable layer size? With the scrollable browser pane at ~1584px x 588px with the test case, there was also around a 50MB difference in memory usage.
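To get a feel for why layer size matters for memory, here’s a back-of-the-envelope estimate. It assumes 4 bytes per pixel (RGBA) and that the whole layer is backed by pixels, which real browsers may avoid through tiling, so treat it as a rough upper bound. The function names and the 8000px content height are hypothetical. Only the ~1584px x 588px pane size comes from the test case above.

```javascript
// Rough estimate: width x height x 4 bytes (RGBA), scaled by device pixel ratio.
// This is an assumption for illustration, not how a browser reports memory.
function estimateLayerBytes(widthPx, heightPx, devicePixelRatio = 1) {
  const bytesPerPixel = 4;
  const deviceWidth = Math.round(widthPx * devicePixelRatio);
  const deviceHeight = Math.round(heightPx * devicePixelRatio);
  return deviceWidth * deviceHeight * bytesPerPixel;
}

const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);

// A viewport-sized layer versus a layer as tall as some long (hypothetical) content.
console.log(toMiB(estimateLayerBytes(1584, 588))); // 3.6
console.log(toMiB(estimateLayerBytes(1584, 8000))); // 48.3
```

On a high density display the numbers grow with the square of the device pixel ratio, so a tall layer gets expensive quickly.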

Neat! I also opened a debug PR to see what would happen. In the Block Editor we insert this div in the content to aid in focus related issues while scrolling. When we don’t allow it to be inserted, we can see that this also gives us a more reasonable layer size.

We have mixed results: I wasn’t able to recreate the text flickering issue anymore, but I could still see the background bleed on scroll in some cases.

And for no apparent reason, doing so also triggered a new fun glitch where the background color from .edit-post-layout .interface-interface-skeleton__content also “bleeds” into the scrollbar element when selecting an image.

Let’s update the clues list

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint? No: the background color is “bleeding” from .edit-post-layout .interface-interface-skeleton__content
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big. Our initial test case does not have such large layers. What could be causing it? Adding a div with position: fixed in the scrollable content will create a layer whose height equals the scrollable content height.
      • Removing the fixed div causes a different scrollbar glitch to appear. Removing the fixed div did not fix the background scroll flashing. Do we have more than one problem?
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”
    • -webkit-overflow-scrolling: touch can no longer be set or unset. This is added automatically when overflow is set to scroll or auto. In other words, any scroller now uses compositing with hardware acceleration and we can’t turn it off.

Surfacing for Air

It is easy and normal for one to get frustrated or even stuck on debugging hard things. When that happens, pause, and surface for air. One great thing to do is recap what we’ve done so far. Reminding ourselves of what we’ve uncovered so far in our clues list, summarizing the pieces, and getting input on what to try next and test assumptions works in wonderful ways to unstick us!

So after writing an internal post at Automattic (which contained content like this post, just stopping at this part) I circled back and tried to decompose the problem. I was pretty sure we were seeing multiple issues instead of just one.

Workarounds

At this point, I was pretty sure that we were looking at at least one browser bug, perhaps several. The symptoms here were interesting enough to distill and report back to WebKit, but knowing if I could isolate a bug and see the fix upstream in a reasonable amount of time is typically out of my control.

As web developers, if functionality is important, we often need to pragmatically find a workaround where we typically re-implement the same functionality but avoid the browser bug, even if we’ve isolated the bug or have found a fix.

Background Bleeding on Scroll

The first workaround I focused on was the background “bleeding” on scroll. While I had a good lead on the flickering text issue, at the time I didn’t know what functionality I broke by deleting the position:fixed divs or what they were intended to be used for. I asked for help in tracking down that why and moved back with fresh eyes on the background-flashing issue.

I tried several things, like changing which components were toggling the overlay for tablet and mobile previews. Nothing I tried was quite right, and I still saw that color bleed. So I took a break.

Going back to my clues list:

  • Glitches only display while scrolling in Safari?
  • The black flashing is from a bad browser paint? No: the background color is “bleeding” from .edit-post-layout .interface-interface-skeleton__content
  • Maybe related to images (since it’s easier to reproduce with that content)?
  • Possibly related to layer compositing issues?
    • We have many layers. Some of them are big. Our initial test case does not have such large layers. What could be causing it? Adding a div with position: fixed in the scrollable content will create a layer whose height equals the scrollable content height.
      • Removing the fixed div causes a different scrollbar glitch to appear. Removing the fixed div did not fix the background scroll flashing. Do we have more than one problem?
    • Some layers are caused by some position value, like “position: fixed” or “-webkit-overflow-scrolling: touch”
    • -webkit-overflow-scrolling: touch can no longer be set or unset. This is added automatically when overflow is set to scroll or auto. In other words, any scroller now uses compositing with hardware acceleration and we can’t turn it off.

Well it does look like the common thread here is the compositing layer. In this particular case, the background-color was also being set on a div that was a compositing layer.

I had been focusing so much on business logic. What would happen if we changed how the CSS rules were applied instead? If there was a browser bug, what if we moved the background-color rule to a div that wasn’t a compositing layer?

https://github.com/WordPress/gutenberg/pull/32747

Moving the background-color to a div that was not a compositing layer did the trick! The workaround PR was super simple in retrospect but took quite a bit of 🔍️ investigation to get there.

Back to Text Flickering

With the background bleeding fix in hand I went ahead and rebased my debug PR with the background fixes. As a bonus, that scrollbar glitch was also fixed!

We already knew that removing the fixed divs would fix the text-flickering and doing so avoided creating a very large compositing layer on the block list wrapper.

Now it was a straightforward matter of understanding why the fixed divs were added and providing an alternative implementation to retain functionality. Thanks to others we learned that the divs were used to help prevent scrolling on tab. The workaround needed here was to remove these fixed divs and re-implement the scrolling on tab behavior in a different way:

https://github.com/WordPress/gutenberg/pull/32824

Isolate and Report

At this point all issues were resolved in Gutenberg with the workarounds that landed.

With limited time, it’d be pretty common to call it quits once we find a workaround for a browser bug. I was really curious about the root causes here too, so I wanted to isolate good test cases for WebKit bug reporting and hopefully get this fixed for others.

Doing this work took about as much time for me as it took to fix the problems in the WordPress Block Editor. I used all the techniques I noted before: reduce the problem, spot the difference, and lots of tedious detailed work.

One of the goals of isolation is to find the smallest test case possible that reproduces the behavior. If you have a good instinct on where the problem might lie, it might be faster to start from scratch and reproduce the conditions agnostic to your app. If not, it can make sense to start with where you can reproduce an issue, then try to chisel away as much as you can.

I also asked others to verify if they could reproduce a test case before reporting. Sometimes different operating system settings, or different hardware can be needed to trigger an issue. Aiming for a simple and easy to reproduce test case will usually lead to faster fixes in any project.

The Solutions

Background Flashing

We fixed this in the Block Editor by moving a background color to a div that was not a compositing layer.

https://github.com/WordPress/gutenberg/pull/32747

https://bugs.webkit.org/show_bug.cgi?id=227532

To isolate the problem, one missing ingredient was that it required very quick scrolling (usually from a mouse wheel). WebKit maintainers also noted that the artifacts here are from tiled layer flashing (tiling is when we split up a large layer into smaller pieces to paint).

It should look like this:

Text Flickering

This was fixed in the Block Editor by removing two position:fixed divs, so we avoided creating a compositing layer that was very large.

https://github.com/WordPress/gutenberg/pull/32824

To isolate a test case, I had to brute force this one by turning off as much as I could in the editor, then performing a binary search on the styles to see which ones were needed to keep the glitch. The overall test case is a weird combination of needing flex styles, two fixed divs, an iframe and some extra z-index stacking contexts.

https://bugs.webkit.org/show_bug.cgi?id=227705
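When a bug needs several ingredients at once, a plain binary search for a single culprit isn’t enough. Here’s a minimal sketch of the chisel-away idea instead: repeatedly try removing chunks and keep each removal whenever the bug still reproduces. The minimize function is a simplified reducer written for this post, not a tool I actually used, and the reproduces callback stands in for manually checking the glitch in Safari. The ingredient names beyond those listed above are made up.

```javascript
// Shrinks a list of ingredients (styles, elements, ...) to a smaller list
// that still reproduces the bug. `reproduces` is a callback we supply.
function minimize(items, reproduces) {
  let current = [...items];
  // Start with big chunks, then halve the chunk size down to single items.
  for (let chunk = Math.ceil(current.length / 2); chunk >= 1; chunk = Math.floor(chunk / 2)) {
    let start = 0;
    while (start < current.length) {
      const candidate = current.filter((_, i) => i < start || i >= start + chunk);
      if (reproduces(candidate)) {
        current = candidate; // the removed chunk was not needed
      } else {
        start += chunk; // at least part of this chunk is needed, so keep it
      }
    }
  }
  return current;
}

// Pretend the glitch needs all of these at the same time.
const required = ["flex", "fixed-div-1", "fixed-div-2", "iframe", "z-index"];
const stillGlitches = (styles) => required.every((name) => styles.includes(name));

const everything = ["flex", "fixed-div-1", "fixed-div-2", "iframe", "z-index", "border", "font-size", "margin"];
console.log(minimize(everything, stillGlitches));
// flex, fixed-div-1, fixed-div-2, iframe, z-index
```

In real life each call to reproduces is a manual reload-and-scroll, so starting with large chunks can save a lot of clicks when most of the page turns out to be irrelevant.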

After submitting the issue, WebKit maintainers quickly used the same technique to reduce what was needed to reproduce (https://bug-227705-attachments.webkit.org/attachment.cgi?id=432961 ✨). It turns out this was a duplicate of a regression that already had a patch, and there was an amazing turnaround of about a day to get that committed.

See also this comment from Simon Fraser on what was happening:

Compositing backing sharing logic exists to reduce the count of composited layers,
allowing layers that would otherwise get composited to paint into the backing store
of some containing block ancestor (usually a scroller). A "backing sharing sequence"
is a backing-provider layer, and a set of layers contiguous in z-order that can
paint into that shared backing. If a layer becomes composited, it must interrupt
the sequence (because layers later in z-order must render on top).

The bug occurred when a layer became composited between the calls to 
BackingSharingState::updateBeforeDescendantTraversal() and BackingSharingState::updateAfterDescendantTraversal(),
for example because of an indirect reason like overflow positioning...
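To check my own understanding of that comment, here’s a toy model of a backing sharing sequence. To be very clear, this is not WebKit’s code: the layer shape, the function name, and the grouping logic are my own simplification of the quoted description, where a composited layer interrupts a run of layers that are contiguous in z-order. What WebKit does with the layers after the interruption is not modeled here.

```javascript
// Groups non-composited layers (given in z-order) into runs that could share
// one backing store. A composited layer ends the current run.
// Hypothetical layer shape: { name, composited }.
function backingSharingSequences(layersInZOrder) {
  const sequences = [];
  let current = [];
  for (const layer of layersInZOrder) {
    if (layer.composited) {
      if (current.length > 0) sequences.push(current);
      current = []; // later layers must render on top, so the run stops here
    } else {
      current.push(layer.name);
    }
  }
  if (current.length > 0) sequences.push(current);
  return sequences;
}

const before = [
  { name: "a", composited: false },
  { name: "b", composited: false },
  { name: "c", composited: false },
];
// Same layers, but "b" became composited for some indirect reason.
const after = before.map((layer) => (layer.name === "b" ? { ...layer, composited: true } : layer));

console.log(backingSharingSequences(before)); // one run: a, b, c
console.log(backingSharingSequences(after)); // two runs: a and c
```

As I read it, the bug was about this interruption happening at an awkward moment, part way through the traversal, so the bookkeeping for the sequence ended up out of date.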

Bonus: Safari Scrollbar Wrong Color

I happened to stumble on this one while debugging a black scrollbar on a test commit, but thought it would be interesting to isolate. This was fast for me to find through luck (half-a-day), since I’m inherently suspicious of extra z-index contexts.

https://bugs.webkit.org/show_bug.cgi?id=227545

Somehow the overflow controls container is getting behind the scroller layer. Triggered by the negative z-index child

Simon Fraser https://bugs.webkit.org/show_bug.cgi?id=227545#c3

There’s already a committed patch in WebKit for this one! 🎉

Summary

So when debugging hard problems, don’t despair! Remember that it’s okay to not know how things work. If we notice a gap in our knowledge, we can take some time to learn how things work.

If we work systematically and keep track of what we know and don’t know so far, we can work toward finding a fix, or help narrow down what the problem might be. Techniques like picking a clue to challenge or investigate, reducing the problem, spotting the difference, doing detailed work and surfacing for air can help move an issue forward.

When time allows, isolating an issue and reporting upstream can both help others and deepen our own understanding of what’s actually happening in a layer we are unfamiliar with.

I’m happily employed by Automattic. If working on these kinds of problems excites you, check out our open jobs page!

#code

How to Unstick a Pull Request

Sometimes pull requests can get stuck during code review. In many cases, it’s not because the changes were unneeded, but because the conversation just appears to… well, stop.

I’ll walk through five common problems pull requests get trapped in and what you can do as a reviewer to help move things along.

The Forgotten Pull Request


Symptoms: There might have been some conversation on the pull request but now it’s just dead silence. Reading back you see that the last comment was from six months ago without clear next steps. The pull request is still open, waiting for someone, anyone, to help nudge it.

Try: Ask the author if the pull request is still needed and if they’re interested in working on the problem. What happens next is either the pull request is swiftly closed OR the author tidies it up for a fresh set of eyes and a new review. ✨

What is the problem? What does this do?

Symptoms: The pull request summary is a terse sentence. In other cases, the pull request doesn’t summarize and has too much context. For whatever reason, as a PR reviewer you’re not quite sure why the PR exists or how to test it.

Try: There’s no need to be a detective 🕵️‍♀️. If something is confusing, go ahead and ask the author! Being clear on why a PR is needed and what it does is not only good for the reviewer, but it also helps provide context to future maintainers on why certain decisions were made.

Ideally we want to know:

  • What problem(s) is the PR addressing?
  • What does the PR change?
  • How can we manually test it?
  • What type of feedback is the PR author looking for?
  • Are there any trade-offs to the solution we should be aware of?

PR Checks are failing

Symptoms: Newer contributors to a project might not realize that their pull request has failing checks. Maybe the linter is not happy with some whitespace, or a test is broken and the author isn’t sure on how to run the suite.

Try: Gently remind folks in a comment that checks have failed. If you have the time, point to documentation on how to run the tests and other contribution guidelines. It can also be helpful to troubleshoot environment issues with them or give them additional hints on how to clear up the issues. Using our best judgment, we can sometimes accept the PR in a less than perfect state and help fix the issues by asking and making changes on the branch directly, or in a follow up PR.

No Clear Next Steps


Symptoms: We see a PR where other participants have left some previous reviews. It isn’t clear what needs to happen next. The conversation might even still be active, but it’s going on and on and on and on… to the point where GitHub thinks it’s a good idea to start hiding part of the timeline.

Try: Ask the active participants what needs to happen for the PR to either merge or be closed. It might be as simple as someone taking the time to summarize what was discussed and what needs to happen next, for example: fix tests, address design feedback, and update documentation. At other times, a PR working through multiple concerns might need to be split out: like starting a new PR exploring expanding an API, or proposing a framework change in a different medium like chat or long-form writing.

Additional Reviews Needed

Symptoms: You’re not confident about approving the PR on your own. The changes affect multiple areas. Maybe the PR needs other expertise like design or security feedback that you’re unable to provide. You’ve done a great job already by knowing what you know and what you don’t know.

Try: Leave a review clearly stating what you tested, what feedback you have, and what type of feedback you think the PR needs to land. Manual testing and other partial reviews are still a great help to other reviewers. If you know who’d be great at unblocking the review, ping them in a comment with context. If you’re not sure, leave a note on what type of reviewer expertise is needed and, if you have time, help the author by asking other contributors who’d be a good fit to review.

Give it a try!

If you notice a stuck pull request that fits one of the patterns here, try unsticking it! You don’t need to be the project expert to help move things along. Simple actions like asking good questions, manually testing, or asking for additional review help can make all the difference.


Code Reviews 📚

The following is a collection of my thoughts about what makes a good code review. This is a repost from the internal Calypso blog with a few modifications made from feedback. I have also included a few tips to structure PRs in a reviewer friendly way. It’s my hope that this post will help encourage folks to get excited about code reviews.

Frame of Mind

At the start of my career, I didn’t understand why good code reviews were helpful. This was partly because I hadn’t seen a good code review yet. At best someone might have rubber stamped my change, and at worst code gatekeepers nitpicked irrelevant details in the patch leaving both parties in a foul mood.

My thoughts on reviewing changed drastically once I realized I was approaching this with the wrong frame of mind. We get out of a review what we put into it, and to avoid an unproductive review, participants must share some common understandings.

At Automattic, I think we already have a very strong reviewing culture. The following are a few points I personally remind myself of before starting a review.

We all share ownership of the code. It is not yours, or mine, it’s ours. Always welcome improvements and share knowledge freely.

Many programming decisions are opinions. There is often no one right answer. Discuss tradeoffs in a productive way and move on quickly. Stick with project convention for stylistic things like tabs vs spaces, even if you don’t agree with it. Changing convention can be done outside of the PR as a larger discussion with the group.

We can always learn new things. No matter how much one knows, we can always learn more. Folks can expose you to new ideas. Explaining concepts you’re familiar with can help improve your understanding of them.

Communicating

Another huge part of a successful code review is good communication. We’re all nice folks at Automattic, but text communication is tricky. It is very easy to misinterpret feedback about code as something more personal. I think we’ve done a very good job at avoiding this, but here are a few techniques to lessen confusion.

  • Avoid separating code ownership. Do not assign ownership of the code with words like “my code” or “your code”. Doing so makes code reviews feel more like a personal judgement. We all share ownership of the code. Remember that the code is also a product of many constraints (time, familiarity with the codebase, etc.) and is not a personal reflection about the author. Even the best developers will produce code from time to time that has some issues to work through.
  • Assume best intent, stay positive. Avoid sarcasm and negative descriptors like “terrible” or “dumb” that may be misread.
  • Avoid demands, offer suggestions instead. “What about moving this into its own file?” It’s also helpful to phrase these as questions. Oftentimes the reviewer may be missing context on why a particular suggestion will not work.
  • Authors should respond to suggestions. “Great catch! Updated in 565acae.” “We went ahead with the original approach because of timing concerns.”
  • Be explicit. “Let’s do change X because of reason Y”
  • Say if something is a blocker or optional. “Due to security concerns we should update this method before shipping.” “This is optional but I think this reads better if we move this into its own method.”
  • If something is confusing, ask. “What is the reasoning behind these changes?” “I don’t understand what’s happening on this line, could you please explain?”
  • Let the author know when you appreciate a change. “Thanks for taking on this task!” “I really ❤️ how this new workflow feels, I left a few notes on some things we can improve.” “This PR drops our build size by 500kb! Great work!”
  • Explain next steps, or complete the review. “I noted a few blockers I’d like to see resolved before we 🚢” “Changes here look great. 🚢 when you’re ready.”
  • Keep up momentum. If a PR looks stalled, ask if anything needs to be done. This is especially important for OSS contributors. It is usually better to accept a PR that has a few issues left to work through, and fix it up later, than to have the OSS contributor abandon the PR.

Preparing your PR to be Reviewed

If folks are always waiting for a code review, it helps to have some empathy for the reviewer too!

  • Explain why. Assume reviewers have little or no context when reading the PR. Explain why we need this PR and what it does. (This is also very useful when looking at past decisions). Screenshots and gifs are appreciated when behavior is complex.
  • Add Step-By-Step Test Instructions. Can someone unfamiliar with the changes test your PR by reading the summary?
  • Keep changes small. Large changes are difficult to review and understand. Try to separate janitorial changes from PRs that change behavior.
  • Note weird things. This includes explaining any odd code workarounds, or buggy behavior. This can save some back and forth between the reviewer and author, and may also expose existing bugs.

Code Review Benefits

When done well, code reviews can help on many different levels.

  • Spreads code ownership.
  • Communicates changes across teams.
  • Serves as a sanity check to verify that requirements make sense.
  • Allows folks to find bugs through both manual testing and reading the code.
  • Lets all folks involved learn new things!
  • Can also serve as documentation for how something worked, and why certain decisions were made. Perhaps even for a future you!

Anyone can be a Reviewer!

The fact that code reviews work on many levels also means that reviewers don’t need to know all things about a project in order to make a meaningful contribution. Sharpening copy, manually testing, polishing design, and asking questions about confusing things are all a great help.

Code Review Challenge

If this isn’t a habit for you yet, I’d like to challenge you to try reviewing a few PRs from a different team or one that you may have felt intimidated to contribute to.

Here are a few strategies I use to pick PRs to review:

  1. Take a look at one of the oldest PRs on the needs review list. Ping the author with questions if it looks inactive.
  2. If you don’t have a lot of time, choose a tiny PR to look at. These are the fastest to review and test, and usually have the least risk of causing a regression.
  3. Choose something you’re unfamiliar with. Reviewing PRs is a great way to learn, and to keep a pulse of what’s happening on other teams. Don’t be afraid to dive into a section you’ve never looked at before. If you don’t follow, or something is unclear, ask questions! The PR author is usually happy to explain.

Have fun reviewing!

