In Defense of Silly Little Blinking LED Lights

Engineering for in situ debugging & fault finding

It’s a common issue, right? Your PCB has been fabricated, the system has been assembled, and it is now sitting in location doing its job. And then one day, it isn’t.
Or, possibly even more commonly, your system could not be fully tested on the bench, and it’s only now, as all the pieces are brought together in one location, that you realise it isn’t working as intended. We have all done it at some point: gone down a development rabbit hole on the theory that “well, it should work in theory,” only to find that reality doesn’t match theory as closely as we had bet on.

So, how can we design for these cases and this inevitable in-field debugging process? I hope to outline some approaches I take to build robust and flexible embedded systems projects that don’t leave you guessing once you reach the stage where development meets deployment, theory meets reality, or users with little technical knowledge meet a system they don’t understand for the first time while you are working remotely.

Part 1: Fault Finding

Indication LEDs

It’s an obvious point, but indication LEDs really can be underutilised by engineers and technicians new to developing embedded systems. There really isn’t any reason not to chuck a coloured LED onto a voltage bus. It lights up when there is power. If it doesn’t light, you can tell within seconds of examining your PCB exactly which parts of your circuit are unpowered. If it’s lighting up, but dimly, then perhaps the supply voltage is too low, or the supply is not able to deliver enough current. This is 2 or 3 more pieces of information you now have about a malfunctioning system while it’s in situ, before ever having to reach for your multimeter and start probing.
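If you have not sized an indicator LED before, the only sum involved is Ohm’s law across the series resistor. Here is a minimal sketch in C, assuming you know the rail voltage, the LED’s forward voltage and the current you want through it. The function name and the example figures below are mine, purely for illustration, and your LED’s datasheet has the real numbers.

```c
/* Series resistor, in ohms, for an indicator LED hung off a voltage rail.
 *   rail_v : rail voltage in volts
 *   led_vf : LED forward voltage in volts (assumed known from the datasheet)
 *   led_ma : target LED current in milliamps
 * Returns -1.0 if the rail cannot forward-bias the LED or the current is invalid. */
double led_series_resistor_ohms(double rail_v, double led_vf, double led_ma)
{
    if (led_ma <= 0.0 || rail_v <= led_vf) {
        return -1.0;
    }
    /* R = (Vrail - Vf) / I, with I converted from mA to A */
    return (rail_v - led_vf) * 1000.0 / led_ma;
}
```

For an assumed 5 V rail, a 2 V forward voltage and 2 mA, led_series_resistor_ohms(5.0, 2.0, 2.0) works out to 1500 ohms, so you would pick the nearest standard value you have in the drawer.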

In one of my previous roles, we had some high-end video radio receivers. Their power supplies were pretty bad, the units would operate in all kinds of conditions, and they were fairly power hungry. Over time, the connectors between the power supply and the motherboard tended to oxidise. Once this oxidation had started, the added resistance meant the receiver unit would need to draw more power, which would lead to more heating of the connector contacts, which would lead to faster oxidation, until the unit stopped functioning.

You could generally tell when a unit was approaching the point where it would stop responding properly to user inputs on the front panel, because the LED backlight for the screen would grow much dimmer than normal. The current limiting caused by the oxidation would dim the LED in the screen before it affected the operation of the receiver itself! What a fantastically specific symptom of imminent failure. Once I was used to this symptom, the first thing I would do if I had a unit with a slightly dimmed screen was swap this cable out. Problem mitigated before it even started.

Modern screens are likely to be more energy efficient, and less likely to show this dimming effect before other things start failing, but a nice bright status LED will still do the job of indicating a voltage bus that is underperforming.

Start Up Conditions

Okay, this is still about LEDs but bear with me.

You power the equipment on. It’s not doing its job. What visual tools do we possess that would allow us to start making informed judgements about what the problem is?

Well, that entirely depends on what you have implemented. Along with Power Indication LEDs, as outlined above, we can also include Status LEDs.

These can be as simple as a single light that shows us the equipment has reached a steady state after power up. Different colours can be used to indicate different states, or even error states. We can use many LEDs and specific codes to indicate specific states or faults, and we can make them flash at different speeds to mean different things. These are all very, very good tools, but they can definitely be overused. In a similar vein, I have used equipment which beeps a series of high & low tones to indicate different states. This is a great idea, but it can be hard to decode, even with a manual that spells it out quite clearly.

“Here’s a short summary of all of the modes and the beeping (or flashing [sic] that accompanies each mode. In the description of the beeping pattern, “dit” means a short beep while “dah” means a long beep (three times as long). “Brap” means a long dissonant tone.”
Altus Metrum 2022

In this case, it is for a device which is designed to be embedded deep within the payload bay of a model rocket, so the use of tones is appropriate, but I question whether this approach is suitable outside of this specific use case. So maybe trying to code too much information into LEDs or beeps is overkill. There is at least some value in a flashing LED of any kind though, especially as a startup condition.
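If you do go down the coded flash route, keeping the code to a small count of flashes followed by a long pause is one way to keep it decodable by someone standing in front of the box. Below is a minimal sketch in C of that idea. The fault list, the timings and the led_write and delay_ms helpers are all assumptions made up for illustration; the two helpers are host stand-ins so the sketch compiles off target, and on real hardware they would wrap a GPIO write and a blocking delay.

```c
#include <stdbool.h>

/* Host stand-ins (hypothetical names). Replace with a real GPIO write and a
 * real blocking delay on the target. They only record what was asked of them. */
static unsigned led_on_count = 0;     /* number of times the LED was switched on */
static unsigned long elapsed_ms = 0;  /* total simulated delay in milliseconds */

static void led_write(bool on) { if (on) { led_on_count++; } }
static void delay_ms(unsigned ms) { elapsed_ms += ms; }

/* Assumed fault list, for illustration only. Keep it short enough to count. */
typedef enum {
    FAULT_NONE = 0,
    FAULT_RAIL_LOW = 1,
    FAULT_SENSOR_MISSING = 2,
    FAULT_COMMS_TIMEOUT = 3
} fault_t;

/* Assumed timings: slow enough to count by eye, with a clear gap between repeats. */
enum { FLASH_MS = 200, GAP_MS = 200, PAUSE_MS = 1500 };

/* Flash the LED once per fault number, then pause so the count can be read off. */
void show_fault(fault_t fault)
{
    for (int i = 0; i < (int)fault; i++) {
        led_write(true);
        delay_ms(FLASH_MS);
        led_write(false);
        delay_ms(GAP_MS);
    }
    delay_ms(PAUSE_MS);
}
```

Calling show_fault(FAULT_COMMS_TIMEOUT) in a loop would give three flashes, a pause, three flashes, and so on. Three or four codes is probably about the limit before you are back to needing the manual.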

Remote Labs Power Supply Project

In a recent project, we used ATmega328P microcontrollers to turn on the outputs of a power supply that provides power to 4 different remote labs experiments housed in one box in a public area. Eventually, the plan is to use this microcontroller to report current use and act as a watchdog for the experimental setups. At the moment, however, although the hardware exists on the PCB, we have not implemented any of these higher functions in software.

During testing, we blew some firmware in that flashed an onboard LED 6 times quickly, then set the outputs high to turn on the power to the experiments. I only put the flashing LED in the firmware so that during testing I could quickly make sure my program was actually running [debugging value added +1]. However, nearly 3 months after programming these devices, we were installing the next round of brand new experiments and were seeing major power issues. Luckily, because of this throwaway loop to flash the LED 6 times, we had a major clue. The LED would not stop flashing.

This meant that the controller was only ever getting part way through its setup function before the power was reset and the program started again from the beginning. This told me the problem was downstream of the microcontroller. It cut out what could have been days of fault finding, without even plugging the system into my laptop and opening serial communication, or getting out a multimeter. [debugging value added +5]
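For anyone who wants to steal the idea, here is a rough reconstruction in C of what that startup sequence looks like. It is not the firmware from the project. The helper names, the flash timing and the host stand-ins are assumptions so that the sketch compiles off target, and on the real board they would be GPIO writes and a delay.

```c
#include <stdbool.h>

/* Host stand-ins (hypothetical names). On the target these would drive the
 * LED pin, wait for real time, and drive the power supply enable outputs. */
static unsigned flashes_seen = 0;     /* number of times the LED was switched on */
static bool outputs_enabled = false;  /* state of the power supply outputs */

static void led_write(bool on) { if (on) { flashes_seen++; } }
static void delay_ms(unsigned ms) { (void)ms; }
static void outputs_write(bool on) { outputs_enabled = on; }

/* Six quick flashes as described in the text; the 100 ms timing is an assumption. */
enum { STARTUP_FLASHES = 6, FLASH_MS = 100 };

/* Runs once after every reset: flash the LED, then switch the outputs on. */
void startup(void)
{
    outputs_write(false);  /* keep the experiments off while we announce ourselves */
    for (int i = 0; i < STARTUP_FLASHES; i++) {
        led_write(true);
        delay_ms(FLASH_MS);
        led_write(false);
        delay_ms(FLASH_MS);
    }
    outputs_write(true);   /* only reached if power held for the whole sequence */
}
```

On a healthy board you see six flashes once and then nothing. If the supply collapses anywhere before startup finishes, the controller resets, startup runs again from the top, and the LED never stops flashing, which is exactly the symptom we saw.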

For now, the broken board is sat on my desk for further investigation, but this quick bit of visual fault finding gave us the confidence to quickly swap out a known broken piece of hardware for a working one, and get the rest of the system deployed in time to meet our deadline.

Part 1 Conclusion

These approaches are by no means the only way to include visual or audible fault indicators, but hopefully they have given you some ways of thinking about the problem of diagnosing a malfunctioning system in situ, and given you the motivation to add silly little features like power LEDs, indication LEDs and flashing LEDs to your projects to help you, or your end users, understand what is going on inside the little black boxes of magic you have provided to them.

Part 2: Flexibility as a Design Goal to Aid Debugging

I like flexibility. Options and places to go make technicians’ lives easier, and they will thank you for it. In my previous role as a broadcast engineer, we always used to talk about “getting out the shit” equipment: the spare parts we would have in the van that likely would not be needed… BUT… might be if *Situation Happened*.

There is no reason this approach to engineering cannot be applied to embedded systems, especially in the modern approach of rapid prototyping and deployment. Often we are ordering 25 PCBs, with the aim that we build 2 to find the flaws and debug, we deploy 20 systems, then finally do a PCB revision for any systems after that.

Of course, for this to work, or at least to make it easier and cleaner to accomplish, it’s best to design potential revisions or options in at the ground level. There are many approaches I take when designing for flexibility; here are the ones I can think of as I am writing.

Spare Power Bus & GND Pins

Please, please, please find some space on your PCBs for a few additional ground pins, and at least one pin for each voltage rail or bus your circuit contains. These serve two purposes. The first is as a test point. This is especially important for grounds. Why not use a test point footprint? Well, for two reasons. One, it will then only serve this first purpose, and two, it’s much easier to jam a multimeter probe into a plated through hole and hold it there with one hand than to keep the tip balanced on a tiny, smooth piece of copper.

The second reason for including spare pins on voltage rails and ground is to enable easy modification later. This is especially important when prototyping. Having the ability to quickly run a wire cleanly to your voltage rail or ground, insert decoupling caps, or even power some external device or daughterboard that gets implemented later can be the difference between your first round of PCBs being worthless junk, or something that can be adapted to at least develop a working prototype, or maybe even make it into deployed systems.

Route spare IC package pins somewhere

If you have the space, it’s always worth routing unused (but potentially usable) spare IC I/O pins to something, possibly some more pin headers as above. This is mostly for the same reasons as above: flexibility, and the ability to add and easily connect additional boards, push additional features, and do development on PCBs you already have to test concepts out for “mk II” projects. This also has the added benefit of providing additional copper to the soldered pads, which makes losing any far less likely if you need to remove the IC. Unconnected pads are very easy to accidentally remove when heating and reheating a part to remove it.

Use solder jumpers and pin headers for reconfigurable circuit layouts

Often for my work I am translating application notes directly to PCB without testing beforehand. This is a risky way of working, but we can mitigate some of this risk by providing different wiring options at the board level. Whether this is space for additional parts, or using breaks in the circuit and solderable jumpers, we can often make a PCB easily configurable in several different ways. This is good not only for prototyping, but sometimes it means we can just get more bang for our buck out of single PCBs, by making it possible to do several different things with basically the same circuit.

If supply chains are struggling, include footprints for multiple packages.

“Due to the Global Pandemic…” has become a line so common it is already a cliché; however, the pandemic has totally changed risk profiles for any kind of electronic development work. My design work changed from a focus on the circuit and application level during the design phase and component sourcing, to spending the majority of my time just on component sourcing, with this leading the design work first and foremost. I.e. projects went from “What parts do we need to do X?” to “We have access to these components, and these components are going to be impossible to get; what would we be able to do with what is available?”

We knew we needed a certain number of products to be available to our users by a specific date, so we had to make sure our supply could meet our demand before we even looked at circuit specifics. We changed from doing design, then purchasing parts, to purchasing parts, then doing detailed design.

There are other ways to mitigate risk in this area. On one round of PCBs we made, we placed footprints for both a single microcontroller IC and a development board version, as we had no idea which type we would be able to maintain a steady supply of.

spare ground pins

unused IOs

solder jumpers and pin headers with shorts for easy reconfiguration

Implemented alternative versions of the schematic when application notes are unclear or conflicting

Alternative footprints to cope with supply chain issues
