Test Mode at Stripe

Interview with Kimberly Hou

Who are you?

I’m a software engineer on Stripe Connect, the product that platforms and marketplaces like Lyft and Shopify use to process payments and share funds with Lyft drivers and Shopify sellers. Over the last couple of years, I’ve worked on building and maintaining the product/API layer of money movement pipelines as well as parts of the Stripe dashboard for Connect platforms. Outside of Stripe, I enjoy playing piano, composing music for film/media, and music production.

How does Stripe's test mode work?

Stripe provides every user with a pair of API keys upon sign-up: one for live
mode and one for test mode. Depending on which API key is used in a given
request, Stripe creates and stores internal data models with the corresponding
livemode: true or livemode: false flag. This is the same boolean flag that
appears in the API responses of Payments, Transfers, Payouts, and many more of
today’s Stripe objects[1]. The Stripe dashboard currently separates live mode
and test mode data by allowing a user to see only one type of data at a time,
toggleable with a switch[2].
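
As a minimal sketch of that flow (using the stripe-ruby gem with placeholder
keys), the same call made under a test key produces an object whose livemode
flag is false; objects created under the live key carry livemode: true instead:

  require "stripe"

  # Placeholder keys: each Stripe account gets one of each pair on sign-up.
  TEST_KEY = "sk_test_..."   # objects created with this key carry livemode: false
  LIVE_KEY = "sk_live_..."   # objects created with this key carry livemode: true

  Stripe.api_key = TEST_KEY

  # tok_visa is one of Stripe's documented test card tokens.
  charge = Stripe::Charge.create(amount: 2_000, currency: "usd", source: "tok_visa")
  puts charge.livemode   # => false; this charge shows up in the dashboard's test mode view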

We strive to keep test mode and live mode behavior as consistent as possible[3]
so as not to defeat the purpose of a de facto testing environment for users.
While this statement might sound obvious, it also hints at a philosophical
choice. Users accepted into a beta feature should have access to it in live mode
and not just test mode, and vice versa. This way, the users have full control
over rolling out their particular Stripe integration, rather than Stripe
deciding when and how to divide behavior access between a user’s test and
non-test environment.

To keep behavior expectations consistent, test mode data models need to go
through the same state machines that live mode ones do. For example, the
transaction availability timing a first-time Stripe user sees in test mode
matches the timing a subsequent transaction would have in live mode.
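
For instance, continuing with the test mode charge from the sketch above, its
balance transaction carries the same kind of available_on timestamp a live mode
charge would (the exact schedule depends on the account):

  # Inspect when the funds from the test charge would become available.
  txn = Stripe::BalanceTransaction.retrieve(charge.balance_transaction)
  puts Time.at(txn.available_on)   # availability date, mirroring live mode timing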

How do we make this possible when the user is not actually collecting a payment on Stripe from a customer’s debit card nor transferring out real money to their bank account?

Answer: the bank account, the debit card, and even the customer aren’t real, either!

They’re created as test data models that, if used in a test money movement
operation, may trigger a mocked response or subsequent reaction flow.

Enter test mode input for test mode data. Whether through the API or the
dashboard, users can test their Stripe integration without changing their code
by inputting specific test tokens and numbers such as tok_chargeDeclinedInsufficientFunds or the credit card number
4242 4242 4242 4242.

Each test token listed in Stripe’s docs[4] maps to a characteristic (e.g., a
valid Visa credit card from Australia) or a response (e.g., a bank account
transfer that fails with an account_closed code), along with the resulting test
mode-only object.
These test inputs allow users to build robust integrations that handle various
financial network failures and responses before they launch in production.
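
For example, here is a small sketch (again with the stripe-ruby gem; exact error
fields can vary by API version) of charging the documented insufficient-funds
token and handling the resulting decline the way a live decline would be handled:

  begin
    Stripe::Charge.create(
      amount:   2_000,
      currency: "usd",
      source:   "tok_chargeDeclinedInsufficientFunds"   # documented test token
    )
  rescue Stripe::CardError => e
    # The integration can branch on the decline exactly as it would for a real card.
    puts e.code      # e.g. "card_declined"
    puts e.message   # human-readable reason for the decline
  end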

What does it look like to manage both code paths?

Always a work in progress! And quite variable depending on the situation. In many cases, code paths can execute agnostic of the livemode flag until reaching a function that does need to check the livemode value. That being said, there is an important distinction between running as much of the same code as possible for live/test mode requests and routing entire live mode requests through infrastructure separate from that of test mode requests.
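
To make that concrete, here is a purely hypothetical sketch; none of these
method or ID names are Stripe’s actual internals, but it illustrates one shared
path with a single point that consults the livemode value:

  # Hypothetical illustration: one shared code path, one livemode branch.
  Transfer = Struct.new(:id, :amount, :livemode)

  def submit_to_bank_network(transfer)
    puts "live: submitting #{transfer.id} to the banking network"   # real funds would move here
  end

  def simulate_settlement(transfer)
    puts "test: simulating settlement for #{transfer.id}"           # mocked response / reaction flow
  end

  def process_transfer(transfer)
    # Validation and ledger bookkeeping run identically regardless of mode.
    raise ArgumentError, "amount must be positive" unless transfer.amount.positive?

    transfer.livemode ? submit_to_bank_network(transfer) : simulate_settlement(transfer)
  end

  process_transfer(Transfer.new("tr_test_1", 1_000, false))   # => test: simulating settlement for tr_test_1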

As an example of the latter, we keep live and test mode API requests on physically separate host sets so that we can apply changes to the test mode hosts first, thereby de-risking feature rollouts. With this design, extremely high test mode traffic can’t cause resource exhaustion for live mode requests, either.
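
A hypothetical sketch of that kind of routing (not Stripe’s actual
infrastructure; only the public sk_test_/sk_live_ key prefixes are real) might
choose a host pool per mode at the edge:

  # Hypothetical edge routing: pick a physically separate host pool per mode.
  LIVE_POOL = %w[live-api-1.internal live-api-2.internal].freeze
  TEST_POOL = %w[test-api-1.internal test-api-2.internal].freeze

  def host_pool_for(api_key)
    api_key.start_with?("sk_live_") ? LIVE_POOL : TEST_POOL
  end

  puts host_pool_for("sk_test_abc").inspect   # => the test pool: changes roll out here first,
                                              #    and test traffic can't starve live hosts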

Meanwhile, the considerations might look different for the product backend code itself, going back to the former challenge of keeping live/test mode behavior consistent. In one instance, we started out with a dedicated test mode queue that processed test mode streaming events separately from a live mode queue, along with if livemode branches upstream. Over time, this led the team to prioritize failures in the live mode queue over the test mode queue; the two queues also diverged in size and behavior, making subsequent test mode failures hard to debug. We ended up combining the queues for test mode and live mode, which simplified the whole system while still maintaining performance for processing live mode events. Another advantage was that seeing a bug in test mode now indicated the bug’s likely presence in live mode as well.
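
A simplified, hypothetical sketch of the combined end state might look like the
following, with a single stream of events that each carry the livemode flag so
one consumer exercises the same code for both modes:

  # Hypothetical unified stream: one queue, one consumer, livemode on each event.
  events = [
    { type: "transfer.created", livemode: true,  id: "tr_live_1" },
    { type: "transfer.created", livemode: false, id: "tr_test_1" }
  ]

  events.each do |event|
    # Identical processing for both modes, so a test mode failure exercises
    # the same code a live mode failure would.
    mode = event[:livemode] ? "live" : "test"
    puts "processing #{event[:type]} (#{mode}) #{event[:id]}"
  end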

What are the challenges around building a test mode?

Test mode is useful for a few reasons, including:

  1. Knowing what to expect within the life cycle of a money movement operation
    • Confirming how long a pending payment takes before moving to the user’s available balance or what happens to a
      payment after a dispute has been won
  2. Making sure one's integration with Stripe works smoothly
    • Ensuring the user has no badly formatted arguments in the
      API requests such as mistakenly inputting the wrong number of decimals
  3. Handling potential responses or errors from financial networks
    • Checking the user’s retry mechanism logic upon receiving a transfer.failed event
      notification/webhook (when relevant, ideally using an idempotency key![5]);
      a sketch follows this list
    • Testing questions such as: How would the user’s integration handle a successful
      payment that was later disputed with the reason ‘product not received’?
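
As a sketch of point 3 (using the stripe-ruby gem; the handler name, signing
secret, and retry policy here are placeholders, not a prescribed integration), a
user’s webhook handler might verify the event and retry a failed transfer under
an idempotency key so a duplicated retry can’t move funds twice:

  require "stripe"

  Stripe.api_key  = "sk_test_..."   # test mode key while building the integration
  ENDPOINT_SECRET = "whsec_..."     # placeholder webhook signing secret

  # `payload` is the raw request body; `sig_header` is the Stripe-Signature header.
  def handle_stripe_event(payload, sig_header)
    event = Stripe::Webhook.construct_event(payload, sig_header, ENDPOINT_SECRET)
    return unless event.type == "transfer.failed"

    failed = event.data.object
    # Retry under an idempotency key so a duplicated retry can't create two transfers.
    Stripe::Transfer.create(
      { amount: failed.amount, currency: failed.currency, destination: failed.destination },
      { idempotency_key: "retry-#{failed.id}" }
    )
  end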


From Stripe’s perspective, many challenges of building a robust test mode lie in
an implementation that lets users form accurate expectations around points 1 and
3 in particular. Whether building new money movement pipelines or revamping
existing ones, designing and then supporting a parallel test mode story is key
before the initial rollout. In the implementation, test and live mode data
models should often go through the exact same code paths until they ‘physically’
cannot, when they reach the point of funds entering or leaving Stripe. At that
point, the test mode designs can vary widely based on the kind of institutional
network communications Stripe would need to send or receive to mock a given
response.

Mocking the timing of asynchronous responses from financial institutions or
account verifications is one area requiring more thought. In synchronous flows,
we could technically check, as one of the first few validations at the API
layer, whether testmode && input.is_a?(TestToken) and immediately return the
corresponding test mode response if so. With asynchronous flows, the ideal
approach might be less straightforward from an engineering perspective.
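
A hypothetical version of that synchronous short-circuit, with illustrative
names and response values only, might look like this:

  # Hypothetical early validation at the API layer for a synchronous flow.
  TEST_TOKEN_RESPONSES = {
    "tok_chargeDeclinedInsufficientFunds" => { status: "failed", failure_code: "card_declined" }
  }.freeze

  def handle_charge_request(livemode:, source:)
    if !livemode && TEST_TOKEN_RESPONSES.key?(source)
      return TEST_TOKEN_RESPONSES[source]   # short-circuit with the mocked test mode response
    end

    # ... otherwise continue down the normal synchronous processing path ...
  end

  handle_charge_request(livemode: false, source: "tok_chargeDeclinedInsufficientFunds")
  # => { status: "failed", failure_code: "card_declined" }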

Let’s imagine collecting a verification requirement, like the Employer Identification Number (EIN) of a business signing up with Stripe. Let’s also imagine that Stripe verifies the inputted EIN asynchronously, after which the result is streamed to a Kafka-esque consumer. The consumer processes each result and later updates the business’s verification status based on whether the EIN was successfully verified. In cases like this, the API layer wouldn’t return such responses immediately for live mode EIN inputs.

That being said, should the test token mappings themselves be threaded all the way down to these consumers, forcing all the code in between to contemplate the concepts of test mode and test tokens? To what extent should the processing of test tokens be more centralized, or limited to the first x layers of the code? Depending on the situation, sometimes the answer is straightforward, but other times there are trade-offs to consider, ranging from the feasibility of live/test mode logic branching to deciding how closely to mimic the timing of live mode responses for the optimal user experience.
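
One possible shape of the more centralized option, sketched with hypothetical
names and illustrative test values: the API layer resolves the test input up
front and enqueues an already-synthesized result, so the downstream consumer
never needs to know that test tokens exist:

  # Hypothetical sketch of the "centralized" option for an async verification flow.
  verification_results = []   # stand-in for the Kafka-esque stream of verification results

  # Illustrative test inputs mapped to predetermined outcomes.
  TEST_EIN_OUTCOMES = { "000000000" => "verified", "111111111" => "unverified" }.freeze

  def submit_ein(results_stream, account_id, ein, livemode:)
    if !livemode && TEST_EIN_OUTCOMES.key?(ein)
      # Resolve the test input up front and enqueue a synthesized result, so the
      # downstream consumer stays test-token-agnostic.
      results_stream << { account: account_id, status: TEST_EIN_OUTCOMES[ein] }
    else
      # Live mode (or a non-token test input): kick off the real asynchronous
      # verification, whose result would land on the same stream later.
    end
  end

  # The consumer processes every result identically, live or test.
  submit_ein(verification_results, "acct_123", "000000000", livemode: false)
  verification_results.each { |r| puts "#{r[:account]} EIN verification: #{r[:status]}" }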

Aside from test mode, any "fun" bug stories in general?

While I didn’t experience this firsthand, I heard that back around 2013, an
engineer accidentally affected the live mode API instead of the staging
environment, during which our users (a single-digit number at the time) received
a hardcoded ‘Hi there’ as the API response to every incoming request for some
minutes. We (obviously) made myriad changes and have come a long way since then!


  1. https://stripe.com/docs/api ↩︎

  2. Fun fact: According to one screenshot of the Stripe dashboard back around
    2010, test and live mode data were all on the same page, with live mode
    transactions labeled as ‘type: real’ :) https://twitter.com/shl/status/1277232942717128706 ↩︎

  3. The main exceptions exist to prevent the processing and storage of real user
    data. Stripe’s API would error if a user passed real credit card or bank
    account information into a test mode request, for instance. ↩︎

  4. https://stripe.com/docs/testing ↩︎

  5. https://stripe.com/docs/api/idempotent_requests ↩︎
