Tyler Hoffman
With nearly a decade of embedded engineering experience, Tyler Hoffman is Co-Founder and Head of Developer Experience at Memfault, a provider of firmware delivery, monitoring, and diagnostics solutions for embedded device companies. Prior to founding Memfault, Tyler led the Firmware Developer Productivity team at Fitbit and was an Embedded Software Engineer at Pebble Tech, where he helped ship and maintain millions of wearable devices running RTOS-level firmware. You can find some of the articles Tyler has written at: https://interrupt.memfault.com/blog/authors/tyler/.
Debugging Embedded Devices at Scale: Effective Techniques for Diagnosis and Resolution
Status: Available Now
In this presentation, I’ll walk through effective approaches for detecting, diagnosing, debugging, and resolving issues in embedded firmware and devices deployed at a large scale, such as in populations of hundreds of thousands or millions. While much has been written on monitoring smaller fleets, typical strategies like onsite debugging, debugger-based diagnosis, and manual log analysis can fail when dealing with massive populations of devices.
The presentation focuses on low-level debugging techniques like fault handling and exception parsing on the device, as well as a novel approach to capturing core dumps on Cortex-M MCUs. The presentation provides guidance on collecting core dumps and diagnosing crashes and faults without a debugger. I’ll also explore the changes in developer behavior that can be implemented when core dump functionality is integrated into firmware, and dumps are received centrally.
This presentation includes content that is often overlooked by online resources, which assume that firmware works flawlessly and bugs are not introduced by developers. We know this is not the case and have developed these strategies to offer real-world solutions for new and experienced firmware engineers who are eager to tackle the challenges of debugging at scale.
Live Q&A - Debugging Embedded Devices at Scale: Effective Techniques for Diagnosis and Resolution
Status: Available Now
Live Q&A with Tyler Hoffman for the theatre talk titled Debugging Embedded Devices at Scale: Effective Techniques for Diagnosis and Resolution
Memfault Demo
Status: Available Now
The video showcases Memfault, a powerful device reliability platform that empowers hardware companies and firmware engineers to monitor their devices' behavior in the field, from just a few devices to millions. The platform features an intuitive crash analysis tool that mimics the experience of a debugger connected over JTAG to a device. With Memfault, developers can efficiently diagnose issues such as crashes, asserts, performance problems, and connectivity issues.
Objectively Measuring the Reliability of IoT Devices
Status: Available Now
In the realm of IoT devices, the metric that reigns supreme is device reliability. When managing thousands or even millions of devices, tracking this metric becomes paramount.
While Mean Time Between Failure (MTBF) has been a historical stalwart for assessing product stability, its utility diminishes when dealing with extensive device fleets and complex operating patterns.
Join us in this talk as we introduce a groundbreaking metric for evaluating the reliability of expansive device fleets: Failure Free Hours. Discover how this metric reveals the frequency of firmware faults, unexpected device reboots, and core function failures. Through a systematic approach to calculating this metric, device operators and engineers gain the power to actively monitor and enhance IoT fleet reliability, thereby ensuring seamless operations and exceptional user experiences.
Essential Device and Firmware Metrics
Status: Available Now
As embedded engineers, we love data. Our desk is littered with tools that help capture tons of data, such as oscilloscopes, logic analyzers, debuggers, tracers, and power meters. However, once a device (or thousands) leave our desk and are shipped to customers, all of these tools are paperweights. It's now up to the devices to report issues back to the developers.
This is where metrics come in. Throughout my career, metrics have been the most powerful and simplest way to monitor thousands to millions of devices. This talk covers what metrics are and how to capture them, then explores the seemingly infinite metrics you can capture and creative ways to use them to solve real-world, elusive device issues involving power consumption, performance, and battery life.
Live Q&A - Essential Device and Firmware Metrics
Status: Available Now
Live Q&A with Tyler Hoffman for the talk titled Essential Device and Firmware Metrics
How to Employ Scalable and Reliable IoT Management Systems
Status: Available Now
IoT management systems are complex to build and maintain, especially for large-scale deployments. Firmware updates, debugging, monitoring, and security are all critical components of an IoT system, and they must be managed carefully to ensure smooth operations.
Building IoT management systems from scratch can be a daunting task. However, by understanding the key challenges involved and taking steps to address them, it is possible to build systems that are scalable, reliable, and easy to maintain. Watch this presentation to learn how to build IoT management systems that will ensure the smooth operation of an IoT deployment and flexibly adapt to your needs as your devices grow.
The Best Defense is Offensive Programming
Status: Available Now
Let's face it. The firmware we write has bugs, and we need to defend against and accept failures when our system experiences bugs. This is when we typically employ defensive programming practices, but if not used correctly, they may cause more problems than they solve.
There is an alternative and complementary set of programming practices, often called "offensive programming", which takes defensive programming and flips it on its head. Instead of defending against errors in firmware, this technique will surface them immediately using liberal asserting and proper fault handling.
This talk will cover offensive programming techniques, prerequisites to implementing the suggestions in your firmware, and how you can use them to quickly and easily track down and root cause 1-in-1,000-hour bugs and keep your sanity at the same time.
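As a rough illustration of the "liberal asserting" idea described above, here is a minimal C sketch. It is not taken from the talk: the names FW_ASSERT, fw_assert_failed, queue_t, and queue_push are hypothetical, and the handler simply prints and aborts so the sketch can run on a host machine. On a real device the handler would more likely record the failure and reset.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define QUEUE_CAPACITY 8u

/* Compile-time check: catches a bad configuration before the code ever runs. */
static_assert(QUEUE_CAPACITY > 0, "queue must hold at least one item");

/* Hypothetical failure hook (an assumption for this sketch).
 * Here it reports the location and aborts instead of continuing. */
static void fw_assert_failed(const char *file, int line)
{
    fprintf(stderr, "ASSERT failed at %s:%d\n", file, line);
    abort();
}

/* Hypothetical assert macro: halt at the first sign of a broken assumption. */
#define FW_ASSERT(expr)                              \
    do {                                             \
        if (!(expr)) {                               \
            fw_assert_failed(__FILE__, __LINE__);    \
        }                                            \
    } while (0)

typedef struct {
    uint32_t items[QUEUE_CAPACITY];
    size_t count;
} queue_t;

/* A defensive version would silently return an error code on bad input.
 * The offensive version treats bad input as a bug and stops immediately. */
void queue_push(queue_t *q, uint32_t value)
{
    FW_ASSERT(q != NULL);                  /* caller bug: no queue given */
    FW_ASSERT(q->count < QUEUE_CAPACITY);  /* caller bug: queue overflow */
    q->items[q->count++] = value;
}
```

The design choice is that a NULL pointer or an overflow is never ignored or papered over; the failure is reported with a file and line at the moment the assumption breaks, which is what makes rare bugs traceable.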
Live Q&A - The Best Defense is Offensive Programming
Status: Available Now
Live Q&A with Tyler Hoffman for the talk titled The Best Defense is Offensive Programming
Monitoring IoT Devices At Scale (2020)
Status: Available Now
I'd like to talk about how companies should think about and build out their IoT monitoring solutions using metrics. The differences between logs, metrics, and traces have been talked about at length in the software engineering space, but not for firmware. Using metrics to monitor a fleet of devices allows for assessing the health of thousands to millions of devices, even across groups of devices or firmware versions, all while keeping complexity, bandwidth, and power consumption to a minimum.
Takeaways:
- Know how to think about and build a metrics library for gathering compressed and aggregated metrics on devices.
- Understand the differences between logs, metrics, and traces, and why using metrics is the best way to monitor fleets of devices post-deployment.
- Know the next steps for ingesting the data into a server under your control for monitoring and analysis.
- Learn some formulas for calculating fleet health, such as expected battery life, crash free hours, and average connectivity per hour.
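To give a feel for what a fleet health formula such as crash free hours might look like, here is a small C sketch. The definition used is an assumption, not necessarily the one presented in the talk: the percentage of reported device-hours in which no crash occurred. The hourly_report_t type and crash_free_hours_percent function are hypothetical names, and C is used only to match the firmware context; this kind of aggregation would more likely run on a server.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hourly report from one device (an assumption for this sketch). */
typedef struct {
    uint32_t device_id;
    uint8_t crash_count;  /* crashes the device observed during this hour */
} hourly_report_t;

/* Assumed definition:
 *   crash free hours (%) = 100 * (hours with zero crashes) / (total hours)
 * Returns -1.0 when there is nothing to aggregate. */
double crash_free_hours_percent(const hourly_report_t *reports, size_t n)
{
    if (reports == NULL || n == 0) {
        return -1.0;
    }

    size_t crash_free = 0;
    for (size_t i = 0; i < n; i++) {
        if (reports[i].crash_count == 0) {
            crash_free++;
        }
    }
    return 100.0 * (double)crash_free / (double)n;
}
```

Counting device-hours instead of devices keeps the number comparable across groups of devices or firmware versions with different fleet sizes and uptimes.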