Do you find that things that tested fine just stop working once you put them on a shelf for a year or two because of other project delays?
We rarely put anything back into storage after building it. Our market is high-value, low-volume, so we usually build on demand.
That said, when we do new development, there are multiple points of failure that we need to look into:
1. Design issues
2. Component issues
3. Build quality issues
4. Requirement/specification issues
1. Sometimes, no matter how much time is spent on simulations, reviews and prototyping, design issues will happen and only show themselves during qualification and testing.
2. Most of my time is spent trying to solve various component issues. Production spread in component parameters can cause problems, and obsolescence is a big problem, especially on new designs. At the moment, though, one of the biggest issues we face is when one component manufacturer buys out another and moves production of the components to a new fab. Even if the fab process is the same, those components are never the same.
3. When you have to solder, epoxy, wirebond and weld anything, there is always room for something to go wrong. We spent the better part of the last year trying to solve a laser welding and plating issue that stopped our products from being hermetically sealed.
4. Sometimes the client specifications don't make sense. Trying to convince them of this is difficult.
When doing a new design, you build a prototype to the same standard as a production unit and then put it through HALT (highly accelerated life testing) and HASS (highly accelerated stress screening). These processes simulate the lifetime of a product and show where failures occur. You then have to correct those issues and try again. HALT typically continues until the prototype breaks.
Production units are NOT put through HALT and HASS, as those processes are too harsh and would damage the units. Instead, you put each production unit through an ESS (environmental stress screening) process. ESS is similar to HASS but less harsh, so it doesn't degrade component performance or lifetime.
Qualification is done either on a lower-level component (like a specific PCB or module) or on a higher-level system. Various factors dictate exactly how this happens.
All of the processes above are followed to minimize the risk of failure, but it is difficult, if not impossible, to stop failures from happening completely. You can do all of this and still find a failure at the last minute (as seems to have happened in this case with the SLS). I've seen systems work on the test bench, get carried to the next stage of assembly, and then fail there. It happens.
Hopefully Boeing has replacement controllers available on hand to swop out the faulty one, test everything and be good to go.