Software engineering notes

Mobile growth: experimentation

Once we have a way to quantify usage, we can compare usage between variants of an app.

To do this, we need to log an event when someone enters an experiment variant. I’ve heard this referred to as an “impression” or “activation” event. Firebase ABT simplifies things a bit by enabling developers to identify an existing event for this purpose. The basic idea is to serialize events by time, identify a common start point (the activation event), and then compare the events that follow for things like increased signups, increased time using the app, or increased purchases.
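
A minimal sketch of that comparison, not a description of how Firebase ABT works internally. The event shape `{ user, name, time, variant }` and the event names `experiment_activated` and `signup` are assumptions made up for illustration.

```javascript
// Hypothetical event shape: { user, name, time, variant? }.
function conversionByVariant(events, activationName, goalName) {
  // Serialize events by time so "after activation" is well defined.
  const sorted = [...events].sort((a, b) => a.time - b.time);

  const activated = new Map(); // user -> variant
  const converted = new Set(); // users with a goal event after activation

  for (const e of sorted) {
    if (e.name === activationName) {
      // Only the first activation per user counts as the start point.
      if (!activated.has(e.user)) activated.set(e.user, e.variant);
    } else if (e.name === goalName && activated.has(e.user)) {
      converted.add(e.user);
    }
  }

  const result = {};
  for (const [user, variant] of activated) {
    if (!result[variant]) result[variant] = { activated: 0, converted: 0, rate: 0 };
    result[variant].activated += 1;
    if (converted.has(user)) result[variant].converted += 1;
  }
  for (const row of Object.values(result)) {
    row.rate = row.converted / row.activated;
  }
  return result;
}

const events = [
  { user: "u1", name: "signup", time: 1 }, // before activation, so ignored
  { user: "u1", name: "experiment_activated", time: 2, variant: "A" },
  { user: "u2", name: "experiment_activated", time: 3, variant: "B" },
  { user: "u2", name: "signup", time: 4 },
  { user: "u3", name: "experiment_activated", time: 5, variant: "B" },
];
const report = conversionByVariant(events, "experiment_activated", "signup");
console.log(report);
```

Goal events that precede the activation event are ignored, which is the point of having a common start point.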

It’s critical this event is logged equivalently for all variants so we can compare apples to apples. This is an example of where QA features in analytics SDKs and services are helpful.

Testing identical variants (“A/A testing”) is helpful for identifying issues in analysis infrastructure.
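
One way to read an A/A result is a two-proportion z statistic. This is a common statistical check rather than anything specific to a provider, and the counts below are invented for illustration.

```javascript
// How many standard errors apart are two conversion rates?
function twoProportionZ(convA, totalA, convB, totalB) {
  const pA = convA / totalA;
  const pB = convB / totalB;
  const pooled = (convA + convB) / (totalA + totalB);
  const stdErr = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pA - pB) / stdErr;
}

// Illustrative numbers: both groups received the identical variant.
const z = twoProportionZ(100, 1000, 104, 1000);
const looksDifferent = Math.abs(z) > 1.96; // roughly a 5% two-sided threshold
console.log({ z, looksDifferent });
```

Identical variants should exceed the threshold only about as often as the threshold implies. If they do so regularly, assignment, logging or aggregation is a more likely culprit than the product.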

As with analytics, building experimentation infrastructure is non-trivial and the cost of errors is high, so using an existing provider is advisable.

Written by Erik

September 28, 2019 at 7:22 pm

Posted in mobile-growth

Mobile growth: analytics

Local features of an installation, like locale or device type, provide a limited opportunity for personalization. Defining a mechanism for communicating feedback from an app to the service supporting the app expands the range of opportunity.

Analytics infra generally provides a few things:

  • a process for defining events
  • an SDK for logging events and communicating them to a service
  • service infra to persist a high volume of events
  • storage for a large volume of events
  • stream, batch, and/or ad hoc aggregation
  • visualization of aggregate data

Given all this is non-trivial, and the cost of errors is high, using one of the many existing analytics providers is advisable.


Logging garbage is costly. A simple example would be defining events as simple strings, misspelling an event string, failing to include the misspelling in aggregation logic resulting in an erroneous report, and basing a business decision on the report. The latency involved in collecting, aggregating and analyzing event data can make such errors hard to detect.

A process and tooling for explicitly defining event types can reduce the risk of logging garbage. For example, we can use protobuf to define events and source control to oversee protobuf maintenance, and then use the protobuf consistently at all layers, from event generation to aggregation.
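
As a rough illustration of explicit event definitions, here is a plain JavaScript stand-in for a schema. It is not protobuf, and the event and field names are hypothetical. The idea is only that a misspelled event name fails loudly at the call site instead of silently reaching a report.

```javascript
// Single source of truth for event types (stand-in for a protobuf definition).
const EVENT_SCHEMAS = Object.freeze({
  signup: { method: "string" },
  purchase: { sku: "string", cents: "number" },
});

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function makeEvent(name, fields = {}) {
  if (!has(EVENT_SCHEMAS, name)) {
    throw new Error(`Unknown event type: ${name}`);
  }
  const schema = EVENT_SCHEMAS[name];
  // Every declared field must be present with the declared type.
  for (const [field, type] of Object.entries(schema)) {
    if (typeof fields[field] !== type) {
      throw new Error(`${name}.${field} must be a ${type}`);
    }
  }
  // Undeclared fields are rejected too.
  for (const field of Object.keys(fields)) {
    if (!has(schema, field)) throw new Error(`${name}.${field} is not defined`);
  }
  return { name, fields: { ...fields }, time: Date.now() };
}

const ok = makeEvent("signup", { method: "email" });

let misspellingRejected = false;
try {
  makeEvent("singup", { method: "email" }); // misspelled on purpose
} catch (err) {
  misspellingRejected = true;
}
console.log(ok.name, misspellingRejected);
```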


A simple SDK can just have a method to log events, a buffer of events, and network logic to flush the buffer periodically to the analytics service.

One nuance concerns the priority of events. For example, we might want to report errors immediately, or monitor events more closely during a release.
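
A minimal sketch of such an SDK, assuming a hypothetical `send` function in place of real network logic and an `urgent` flag as one possible way to express priority.

```javascript
// Minimal analytics client: log(), a buffer, and a flush to a transport.
class AnalyticsClient {
  constructor(send, { maxBuffer = 20 } = {}) {
    this.send = send; // hypothetical stand-in for the network call
    this.maxBuffer = maxBuffer;
    this.buffer = [];
  }

  log(event, { urgent = false } = {}) {
    this.buffer.push(event);
    // Urgent events (errors, say) skip the wait; a full buffer also flushes.
    if (urgent || this.buffer.length >= this.maxBuffer) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}

const sent = [];
const client = new AnalyticsClient((batch) => sent.push(batch));
client.log({ name: "screen_view" }); // buffered
client.log({ name: "crash" }, { urgent: true }); // flushes both events now
console.log(sent.length, sent[0].length);
```

The periodic flush is left out to keep the sketch short; a timer that calls `flush()` would cover it. A real client would also need retries and persistence across restarts.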

Because the events logged by the SDK are critical for growth functionality, providing a way to mock the SDK in tests is helpful for QA.

I’m sure there are a million other nuances folks on analytics teams can speak to, but from the perspective of an SDK user, I just need a way to log events (and assert they were logged correctly).


My only experience with analytics services concerns asserting events were logged correctly.

Enabling developers to point an SDK at a mock endpoint and listen to the event stream is helpful for development. Enabling test infra to access the resulting logs enables integration testing.
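
A sketch of what a mock endpoint might look like from a test's point of view. `RecordingEndpoint`, `post`, `onEvent` and `countOf` are hypothetical names, not part of any particular analytics SDK.

```javascript
// A recording stand-in for the analytics endpoint.
class RecordingEndpoint {
  constructor() {
    this.received = [];
    this.listeners = [];
  }
  // What an SDK would call instead of making a network request.
  post(batch) {
    for (const event of batch) {
      this.received.push(event);
      this.listeners.forEach((listener) => listener(event));
    }
  }
  // Lets a developer watch the event stream during development.
  onEvent(listener) {
    this.listeners.push(listener);
  }
  // Lets test infra assert on the resulting log.
  countOf(name) {
    return this.received.filter((e) => e.name === name).length;
  }
}

// Code under test takes the endpoint as a dependency so tests can swap it.
function completeSignup(endpoint) {
  endpoint.post([{ name: "signup", method: "email" }]);
}

const endpoint = new RecordingEndpoint();
const seen = [];
endpoint.onEvent((e) => seen.push(e.name));
completeSignup(endpoint);
console.log(endpoint.countOf("signup"), seen);
```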


Providing intermediate columnar storage, like Dremel or Vertica, is helpful for ad hoc analysis.

Providing access control at the storage layer ensures data is only visible to those who need it.


We typically need to aggregate analytics data for it to be useful. For example, signups per day vs a single signup event. To this end, tools supporting aggregation, like Flume, are helpful.
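
To make the signups-per-day example concrete, here is a small in-memory rollup. It is not how Flume or any particular pipeline does it, and it assumes epoch-millisecond timestamps and UTC day buckets.

```javascript
// Roll single events up into per-day counts.
function countPerDay(events, name) {
  const counts = new Map();
  for (const e of events) {
    if (e.name !== name) continue;
    const day = new Date(e.time).toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  // Sort by day so the output reads as a time series.
  return Object.fromEntries([...counts].sort(([a], [b]) => a.localeCompare(b)));
}

const signupEvents = [
  { name: "signup", time: Date.UTC(2019, 8, 27, 9) }, // month 8 is September
  { name: "signup", time: Date.UTC(2019, 8, 27, 22) },
  { name: "purchase", time: Date.UTC(2019, 8, 27, 23) },
  { name: "signup", time: Date.UTC(2019, 8, 28, 1) },
];
const perDay = countPerDay(signupEvents, "signup");
console.log(perDay);
```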


Analytics data is often presented as a time-series. Storage and client-side tools for displaying time-series data are helpful.

Written by Erik

September 28, 2019 at 7:21 pm

Posted in mobile-growth

Mobile growth: authentication

I define “authentication” broadly to cover assertion of app and user (including anonymous) identity.

The principle of least privilege can help us determine what type of authentication a given feature requires.

In general, I bias toward standards, namely OAuth 2, to avoid reinventing the wheel (and fixing the same bugs), especially with respect to security, where bugs can be very expensive.


A caller’s IP address is usually the baseline server-side identifier. We can use an IP address to derive a reasonable default location, for example.


Asserting the identity of an app is a hard problem. Malicious users can easily scrape identifiers out of an app instance, but we need to start somewhere.

Google’s “API key restrictions” are the closest I’ve seen to app authentication.


Now that we have an idea of which app is calling, we can identify the caller further by defining an “instance”. A simple approach is to just generate a random number or uuid, persist it in the client, and tolerate some collisions.

A slightly more complicated approach is to also generate and persist a secret, register it with the service supporting the app on installation, and then use a token derived from that secret ever after to identify the app. I like this approach because it is still relatively cheap and makes an incremental step toward authenticating the caller.
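
A sketch of the secret-plus-token idea using Node’s built-in `crypto` module. The token format (instance id, timestamp and an HMAC-SHA256 over both) is an assumption chosen for illustration, not a standard, and it leaves out expiry and replay protection.

```javascript
const nodeCrypto = require("crypto");

// Client side, on installation: generate and persist an id and a secret.
function createInstance() {
  return {
    instanceId: nodeCrypto.randomUUID(),
    secret: nodeCrypto.randomBytes(32).toString("hex"),
  };
}

// Derive a token from the secret (hypothetical format: id.issuedAt.mac).
function deriveToken(instance, issuedAt) {
  const mac = nodeCrypto
    .createHmac("sha256", instance.secret)
    .update(`${instance.instanceId}.${issuedAt}`)
    .digest("hex");
  return `${instance.instanceId}.${issuedAt}.${mac}`;
}

// Service side: look up the registered secret and recompute the MAC.
function verifyToken(token, registeredSecrets) {
  const [instanceId, issuedAt, mac] = token.split(".");
  const secret = registeredSecrets.get(instanceId);
  if (!secret || !mac) return false;
  const expected = nodeCrypto
    .createHmac("sha256", secret)
    .update(`${instanceId}.${issuedAt}`)
    .digest("hex");
  const a = Buffer.from(mac, "hex");
  const b = Buffer.from(expected, "hex");
  // Constant-time comparison; lengths must match before comparing.
  return a.length === b.length && nodeCrypto.timingSafeEqual(a, b);
}

const instance = createInstance();
const registered = new Map([[instance.instanceId, instance.secret]]);
const token = deriveToken(instance, Date.now());
console.log(verifyToken(token, registered));
```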

Anything stored server-side and associated with an instance should require an instance token.


The next layer of authentication is the person using the app instance.

Many apps do not need a person to authenticate, but would benefit from growth features. A weather app that wants to A/B test new features would be an example.

Another subset of apps provide some functionality before a person authenticates and would like to ensure a continuous experience before and after a person authenticates. An example would be a comment widget that enables composition while logged out, but requires authentication before publication.

Anonymous state is generally device-specific as it’s much easier to transfer state between devices with a common user identifier.


Identifying a user can be as simple as asking for a username and password. Basing user authentication on email or phone can reduce the friction of inventing usernames and passwords, and provides a communication channel for things like account recovery. Federated authentication improves security through consolidation of account management, and can further reduce friction, so long as the user wants account consolidation.

We can pass an instance token in a user authentication request to provide a personalized experience incorporating what we know about the installation, for example.

Written by Erik

September 28, 2019 at 7:20 pm

Posted in mobile-growth

Mobile growth

From my perspective, “growth” in the context of mobile app development refers to the use of authentication, analytics, experimentation and personalization tools to ensure an app meets customers’ needs. The interaction of these tools forms a natural feedback loop (aka “action-insight”, OODA, etc).

These tools have other uses too, such as authentication for security, experimentation for product validation, or analytics for stability, which can blur the line between tuning app behavior and ensuring proper functionality.

A focus on growth, or even an ability to differentiate development and specialize in an area, often comes after an app has achieved some success, again blurring the definition of “growth”.

Whole books have been written on the topic, so my goal is just to document features I’d recommend based on my experience in app and growth SDK development:

Written by Erik

September 28, 2019 at 7:19 pm

Posted in mobile-growth


A colleague once relayed to me someone else’s observation that every syntax variation allowed by a language will eventually appear in a code base. Resisting the process of breaking down into what’s possible requires energy. The idea that “naming things is hard” seems a variation of this. If I could remember the originator, I’d call it ___’s Law. In the meantime, I think “entropy” is the general form.

With its Greek prefix en-, meaning “within”, and the trop- root here meaning “change”, entropy basically means “change within (a closed system)”.

In this context, static analysis tools like linters help limit what’s possible.

An organizational approach I’ve seen a couple times is to embrace the range of possibility. For example, given a camp in favor of Java and another in favor of Scala, a former team avoided endless debate by supporting both until there was an obvious reason not to. Another example is Google Cloud’s reconciliation of REST and gRPC:

“All our Cloud APIs expose a simple JSON REST interface that you can call directly or via our client libraries. Some of our latest generation of APIs also provide an RPC interface that lets clients make calls to the API using gRPC: many of our client libraries use this to provide even better performance when you use these APIs”

Another organizational strategy, which David Poll brilliantly described: products will express the org structure that created them (Conway’s Law); we can expend energy resisting this, eg with review processes, and/or we can create orgs in the shape of the products we intend.

Written by Erik

September 25, 2019 at 10:35 pm

Posted in org, pattern

Praise for a task tracker 📉

I just wrapped up a three-month project in a new tech stack. From the beginning to the end, I was regularly asked: “Will this be done in time?” I was new to the tech, so it was difficult to estimate. The best I could do was document tasks as they revealed themselves, and then point at the percentage of tasks complete as a measure of progress.

For the first couple months, I added more tasks than I closed. The scope of the project seemed like it would grow unbounded, but then it leveled off. Today, I closed the last task. Amazing, and it wasn’t that hard, because we at least had an agreed-upon task tracker. I just needed to feed it.

Some features I like:

  • Using the same system used for tracking bugs. This frees folks to use whatever tracker they prefer. One colleague prefers a simple spreadsheet of tasks in descending priority, for example. It also enabled folks to use familiar bug tracking tools, like subscribing to updates, assigning ownership, commenting, etc, to communicate about tasks
  • At least trying to present a burn-down chart, though it doesn’t yet know how to work with nested milestones. A burn-down chart is the most accurate tool I’ve found for estimation; even if we can’t estimate the completion time, we can say the job will be done when the number of tasks drops to zero, and easily follow along on the chart.
  • Organizing tasks in a variety of ways, notably, differentiating organization by project from organization by week
  • Organizing tasks by blocking relationship. In other words, enabling me to specify this task depends on that task. Identifying branch nodes in this tree as milestones is helpful for maintaining a sense of momentum

Some usage patterns I like:

  • Presenting the tracker in a periodic (agile, 4DX) cadence meeting and using it to structure discussion. Avoiding tracker maintenance was helpful for maintaining meeting pace
  • Agreeing on one approach to tracking helped focus attention and minimize maintenance cost, though an alternative popped in and out mid-project, and the project launch used a different tracker

Written by Erik

September 25, 2019 at 10:01 pm

Posted in org, tool

Better together SDK pattern

I’m a fan of an SDK product pattern I’ve heard people call “better together”. The idea is for SDKs to be decoupled, but complementary.

An example is an SDK that needs telemetry. One approach would be to add telemetry to the SDK, but this has a few problems: bloat, opacity, redundancy and coupling. An app may already have a telemetry SDK installed, so bundling another with an unrelated SDK bloats the app. Data logged inside the SDK is opaque to the app, which also complicates any SDK billing story. If the SDK does want to export telemetry data, it will need to build telemetry-specific logic redundant to the app’s telemetry provider. Any telemetry logic built by the SDK is coupled to the SDK.

The better-together pattern provides an alternative. To continue with the example above, an SDK requiring telemetry could detect if a telemetry provider is installed and publish events to it. A simplistic example would be to provide a method on the SDK to set a telemetry provider, eg:

class SDK {
  constructor(telemetry = null) {
    if (telemetry) {
      this.telemetry = telemetry; // publish SDK events here when provided
    }
  }
}

const telemetry = new Telemetry();
const sdk = new SDK(telemetry);

With this approach telemetry is only included in the app if the app owner wants it, minimizing bloat. Telemetry from the SDK is visible alongside the app’s other telemetry. The SDK can focus on whatever it does best. Telemetry is reusable elsewhere in the app.

One potential downside with this pattern concerns differentiating “internal” use-cases. Continuing with the telemetry example, the SDK may want to log events that are unrelated to the app’s functionality. I’ve seen three approaches: don’t differentiate, differentiate throughout, or don’t use the better-together pattern. The first approach treated all data as belonging to the app and namespaced all events published by the SDK, which worked well. The second approach was expensive due to technical complexity and eventually discontinued. The third approach was expensive due to redundant staffing, infra, UX, etc, but necessary so long as some parties don’t buy into the better-together pattern. I guess this stresses the “together” part of better-together 🙂
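
The namespacing in the first approach could look something like the sketch below. The `logEvent` method and the `payments_sdk` namespace are hypothetical and stand in for whatever the app’s telemetry provider actually exposes.

```javascript
// Wrap the app's telemetry provider so every event the SDK publishes
// carries a namespace prefix.
function namespaced(telemetry, namespace) {
  return {
    logEvent(name, params = {}) {
      telemetry.logEvent(`${namespace}.${name}`, params);
    },
  };
}

// Stand-in for the app's telemetry provider.
const logged = [];
const appTelemetry = {
  logEvent: (name, params) => logged.push({ name, params }),
};

const sdkTelemetry = namespaced(appTelemetry, "payments_sdk");
sdkTelemetry.logEvent("retry", { attempt: 2 }); // published by the SDK
appTelemetry.logEvent("checkout_opened", {}); // published by the app
console.log(logged.map((e) => e.name));
```

All events still land in the app’s own telemetry, and the prefix keeps SDK events easy to filter.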

Written by Erik

September 25, 2019 at 8:20 am

Posted in org, pattern