Securing a Bank UI

How I Didn't Lose Any Customer's Money And Retained My Sanity Throughout

Before I joined Clerk, I was a fintech CTO for nearly 6 years, with 4 years at Letter and 2 years with WorkMade. Aside from the truckload of valuable lessons those experiences taught me, I also had the fun* challenge of securing people's money on a daily (hourly? MINUTELY??) basis. I pretty much poured my 15 years of overall experience into the following post: Chaos Engineering: Jurassic Park & Distributed Systems, which spun into the following talk: Letting go of perfectionism in distributed systems, given at fintech_devcon in Austin, Texas in 2024. However, I wanted to do a deep dive on some of the more interesting ways I implemented security in a banking system, from the UI (React) through to the API (GraphQL) and beyond.

*induced crippling anxiety

Diving Deep in 3...2...1

Securing a user interface (UI) for a banking platform is a multifaceted challenge that demands careful attention to data integrity, system architecture, and user safety. A bank's UI is not just a visual layer - it is the point where customers interact with complex backend systems, access sensitive data, and initiate actions that require airtight security. Every touchpoint in your system exposes surface area for weakness, attack, and general nefariousness from ill-intentioned people. Designing a secure bank UI involves far more than aesthetics (although cool animations are nice to look at); it is about creating an experience that instills trust while seamlessly safeguarding user data and system functionality.

The Foundation: Event-Driven Architecture and Data Flow

At the heart of the secure banking system I built at Letter lay an event-driven architecture. This approach allows actions initiated by users (GraphQL requests) to flow through the system in a controlled and traceable manner. For instance, when a customer performs a transaction, such as transferring funds, the action is first routed through a GraphQL layer that validates the request. The use of GraphQL provides a structured data contract, ensuring that only properly formatted and authorized requests make their way into the system.

Once validated, the action is handed off to a dedicated business-logic microservice. This microservice is responsible for processing the request while adhering to stringent rules. For example, it might check the customer's account balance before approving a transfer request. If the transaction passes these checks, metadata describing the action is recorded as a schema-driven JSON payload. This payload was stored in a distributed database (Etcd), and broadcast across the system in a pubsub fashion.
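To make that balance check concrete, here's a minimal sketch of the kind of rule the business-logic service enforces before an event is ever written. The shapes and names below are illustrative, not Letter's actual code:

```typescript
// Illustrative only: a business-logic check that must pass before a
// transfer event is allowed into the event store.
type TransferRequest = { accountId: string; amountCents: number };
type Verdict = { ok: true } | { ok: false; reason: string };

// Stand-in for the service's view of account balances.
const balances = new Map<string, number>([["acct_1", 10_000]]);

function validateTransfer(req: TransferRequest): Verdict {
  const balance = balances.get(req.accountId);
  if (balance === undefined) return { ok: false, reason: "unknown account" };
  if (req.amountCents <= 0) return { ok: false, reason: "non-positive amount" };
  if (req.amountCents > balance) return { ok: false, reason: "insufficient funds" };
  // Only now would the service record the event payload for storage.
  return { ok: true };
}
```

The important property is that the write path has exactly one gatekeeper: nothing reaches the event store without passing through checks like this.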

This event-driven pattern not only ensures data consistency but also enables ancillary microservices to process events in parallel. These services might update their own data stores, such as PostgreSQL, with normalized data for efficient querying. This division of responsibilities creates a modular, scalable system capable of handling the high demand of banking operations. We had the GraphQL layer act as a traffic router of sorts: if you wanted to change something in the system, your request was routed to the business logic and validated before the Etcd insertion; if you simply wanted to read information from the system, your request was routed to a read-only cache service which queried a PostgreSQL database.
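That read/write split can be sketched as a pair of resolvers: mutations go through the business-logic service, queries go straight to the read-only cache. The service interfaces and names here are assumptions for illustration, not Letter's real API:

```typescript
// Illustrative read/write split at the GraphQL layer.
type TransferInput = { fromAccountId: string; toAccountId: string; amountCents: number };

interface BusinessLogicService {
  submitTransfer(input: TransferInput): Promise<{ accepted: boolean }>;
}

interface ReadCacheService {
  getAccount(id: string): Promise<{ id: string; balanceCents: number } | null>;
}

const makeResolvers = (businessLogic: BusinessLogicService, readCache: ReadCacheService) => ({
  Mutation: {
    // Writes are validated, then handed to the business-logic service,
    // which is the only thing allowed to append to the event store.
    transferFunds: async (_: unknown, { input }: { input: TransferInput }) => {
      if (input.amountCents <= 0) throw new Error("Amount must be positive");
      return businessLogic.submitTransfer(input);
    },
  },
  Query: {
    // Reads never touch the write path; they hit the Postgres-backed cache.
    account: (_: unknown, { id }: { id: string }) => readCache.getAccount(id),
  },
});
```

Keeping reads entirely off the write path means a flood of queries can never contend with, or corrupt, the validated event stream.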

The business-logic microservice was the ruler of system lore; if it allowed the action, the action was written into Etcd and never altered (e.g. UserCreated). The ancillary microservices took this new event as gospel and updated their own stores diligently, ready to be read by the UI.

Here's that very same ancillary microservice listening for a new user:

events.addEventWatcher(async (event) => {
  const { revision } = event.header();

  // Ensure the event contains a revision
  if (!revision) {
    throw new DataError(DataErrorKind.RevisionNotDefined);
  }

  // Check if the event type is UserCreated
  if (UserCreated.is(event)) {
    const { timestamp } = event.header();
    const payload = event.payload();

    // Destructure relevant data from the event payload
    const { id, firstName, lastName, email, phone } = payload;

    // Prepare user creation parameters
    const createParams = {
      firstName,
      lastName,
      email,
      phone,
      externalId: id,
      notifications: {
        receivePush: false,
        receiveSms: false,
      },
      createdAt: timestamp,
      revision,
    };

    try {
      // Attempt to create the user in the database
      await this.create(createParams);

      // Publish a UserCreateCommitted event upon successful creation
      const evt = UserCreateCommitted.create({ id, email }, event.header());
      await events.inTransaction((state) => state.publishEvent(evt));
    } catch (error) {
      if (error?.code === SYSTEM.PostgresErrorCodes.DUPLICATE) {
        // Log duplicate record attempts at trace level
        ctx.logger.trace({ event: "user insert duplicate", error });
      } else {
        // Log other errors at error level
        ctx.logger.error({ event: "user insert", error });
      }
    }
  }
});

We used the event revision as a sequencing marker to ensure events were stored in order. How do you ensure order amid the chaos of a distributed system? The answer: with great difficulty.

Immutable Event Storage for Auditability

A key principle in securing a bank UI is maintaining a comprehensive audit trail of user actions. To achieve this, every customer action is stored as an immutable JSON payload. Each event is uniquely keyed using ULIDs, which chain the events together in sequence. This immutability ensures that the historical record of actions cannot be altered, providing transparency and trustworthiness in the system's operation.
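ULIDs sort lexicographically by creation time because their first 10 characters encode a millisecond timestamp in Crockford base32, which is what makes them work as sequence-friendly keys. Here's a hand-rolled sketch of the encoding (in practice you'd reach for a proper ULID library rather than this):

```typescript
// Minimal ULID sketch: 48-bit timestamp + 80 bits of randomness,
// both encoded in Crockford base32.
const B32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

function ulid(now: number = Date.now()): string {
  // 48-bit millisecond timestamp -> 10 base32 chars (most significant first)
  let t = now;
  let time = "";
  for (let i = 0; i < 10; i++) {
    time = B32[t % 32] + time;
    t = Math.floor(t / 32);
  }
  // 80 bits of randomness -> 16 base32 chars
  let rand = "";
  for (let i = 0; i < 16; i++) {
    rand += B32[Math.floor(Math.random() * 32)];
  }
  return time + rand;
}
```

Because the timestamp occupies the leading characters, a key minted later always sorts after a key minted earlier, so an ordered keyspace naturally chains the events in sequence.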

If a customer wishes to undo an action, the system writes a new event to denote the subsequent change rather than modifying the original record. This approach preserves the integrity of the event log while accommodating user-driven changes. Whether for internal auditing or regulatory compliance, this design ensures that every action is traceable and verifiable.
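In code terms, an "undo" is just another append. The event names below (CardFrozen/CardUnfrozen) are made up for illustration:

```typescript
// Sketch of an append-only log: undoing an action appends a new
// compensating event; the original record is never touched.
type Event = { key: string; type: string; payload: Record<string, unknown> };

const log: Event[] = [];
let seq = 0;

const append = (type: string, payload: Record<string, unknown>): Event => {
  const evt = { key: `evt-${++seq}`, type, payload };
  log.push(evt); // append-only: nothing in `log` is ever edited or deleted
  return evt;
};

append("CardFrozen", { cardId: "card_1" });
// The customer changes their mind: we append a compensating event
// instead of deleting or rewriting the original.
append("CardUnfrozen", { cardId: "card_1" });
```

Auditors (and regulators) can then replay the full history and see both the action and its reversal, exactly as they happened.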

Leveraging GraphQL and TypeScript for Data Integrity

Data integrity is non-negotiable in a banking system, and the combination of GraphQL and TypeScript plays a critical role in ensuring it. GraphQL's strict data-contract capabilities enforce consistent schemas, preventing malformed or unauthorized requests from being processed. TypeScript complements this by adding static typing and compile-time checks, reducing the risk of runtime errors.

This combination of technologies also streamlines development. Using TypeScript across both client-side and backend implementations ensures a unified codebase, making it easier to maintain and extend the system. Additionally, the widespread adoption of TypeScript improves hiring flexibility, as skilled developers familiar with the language are readily available.

Masking Client-Facing IDs for Enhanced Security

One of the most significant security risks in any system is the exposure of sensitive data in client-facing environments. To mitigate this, I implemented a mechanism to mask and secure database IDs before they were ever sent to the client. This design prevents unauthorized access and shields backend details from potential attackers.

This means there were zero database IDs exposed in any UI or API response - which generally felt a lot safer.

Here's how the process works:

  • When an object is retrieved from the database, all sensitive IDs are replaced with randomly generated, URL-friendly keys.
  • These keys are stored in a Redis database, alongside the ID of the requesting user and the real object ID. Each key is assigned a time-to-live (TTL) of 60 minutes, ensuring that client-facing IDs are ephemeral and short-lived.

When a user interacts with an object, the system performs a reverse lookup to retrieve the real ID. If the key has expired or the requesting user's ID does not match the stored record, an authentication error is returned. This ensures that only authorized users can access the data, even if a URL is shared or intercepted.

This means you could share the following URL with someone else: https://bankservice.com/accounts/hfyubu1, but because the receiving user's User ID wasn't used to create that unique key, the system would reject the whole request (after the reverse lookup failed) - thus tying URLs to the logged-in user, and only for the 60-minute TTL.

As you might've already guessed, this does have real-world implications, and some are negative (though they can be mitigated and alleviated): if a user had an issue with a particular account, it was very difficult for them to talk to support about it, as the ID in the URL was ephemeral and tied to the user's session. For this, I came up with a way for Letter's CX team to securely log in to an internal portal (protected by VPN and Google OAuth) and paste the unique ID into a decrypter of sorts, which would only allow a reverse lookup to the real account ID using the staff member's own session JWT.
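The internal decrypter can be sketched as a privileged variant of the same reverse lookup: it skips the owning-user check, but only after verifying the caller's staff claims. The JWT claim shape and role name below are assumptions for illustration:

```typescript
// Illustrative staff-only reverse lookup. In the real system the claims
// would come from a verified Google OAuth session JWT behind the VPN.
type StaffClaims = { sub: string; role: string };
type MaskRecord = { userId: string; id: string };

// Stand-in for the Redis namespace used by the masking library.
const maskStore = new Map<string, MaskRecord>();

function staffRetrieve(maskedKey: string, claims: StaffClaims): string {
  if (claims.role !== "cx-staff") {
    throw new Error("Access denied: staff role required.");
  }
  const record = maskStore.get(maskedKey);
  if (!record) throw new Error("Masked key not found or has expired.");
  // Unlike the customer path, we don't require claims.sub === record.userId;
  // an audit log would record claims.sub as the staff member who looked it up.
  return record.id;
}
```

The key property: the customer-facing path binds a masked key to one user, while the staff path binds the lookup to an authenticated, auditable staff identity instead.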

To better illustrate the above, here's (pretty much) the full implementation of the little library:

const config = applyDefaults(ctx, {
  secureMasking: {
    ttl: 60, // 60 minutes by default
    namespace: "secure-mask.",
    prefix: "sec-",
  },
});

const secureMask = {
  /**
   * Masks a sensitive ID by generating a random key and storing it in Redis.
   */
  mask: async (id: string, userId: string): Promise<string> => {
    if (!id) {
      throw new AppError("ID must be a valid plaintext value.");
    }

    // If already masked, return the key as is
    if (id.startsWith(config.secureMasking.prefix)) return id;

    const maskedKey = `${config.secureMasking.prefix}${await generateRandomKey(
      10
    )}`;
    await ctx.redis.set(
      `${config.secureMasking.namespace}${maskedKey}`,
      JSON.stringify({ userId, id }),
      "ex",
      config.secureMasking.ttl * 60 // Convert minutes to seconds
    );

    return maskedKey;
  },

  /**
   * Retrieves the original ID from a masked key.
   */
  retrieve: async (maskedKey: string, userId: string): Promise<string> => {
    if (!maskedKey.startsWith(config.secureMasking.prefix)) {
      return maskedKey; // Return plaintext ID if not masked
    }

    const data = await ctx.redis.get(
      `${config.secureMasking.namespace}${maskedKey}`
    );
    if (!data) {
      throw new AppError("Masked key not found or has expired.");
    }

    const { userId: storedUserId, id } = JSON.parse(data);
    if (storedUserId !== userId) {
      throw new AppError("Access denied: User ID mismatch.");
    }

    return id;
  },
};

And here's how we were using it within a GraphQL Resolver, in this example, masking an Account ID:

id: authenticatedResolver(async ({ id }, _, { ctx, userCtx }) => {
  const mask = await ctx.masque.mask(id, userCtx.userId);
  return mask;
});

And to retrieve the same ID after the user requested to perform an action:

let id: string;
try {
  id = await ctx.masque.retrieve(accountId, userCtx.userId);
} catch (error) {
  ctx.logger.error({ error, text: "Unmasking account ID failed", userCtx });
  throw new UserInputError(AccountErrors.AccountFetchFail);
}

Building Trust Through Secure Design

A secure bank UI is not just about protecting data; it's about building trust with users. Every design decision—from the event-driven architecture to immutable event storage and masked IDs—works toward creating a system that customers can rely on. By prioritizing data integrity, transparency, and user safety, we created a banking experience that balances security with usability.

Securing a bank UI is a continuous process that evolves alongside emerging threats and technological advancements. However, by focusing on foundational principles and leveraging modern technologies like GraphQL, TypeScript, and Redis, we can build systems that are not only secure but also scalable and user-friendly.

As I'm now out of those roles, I can safely say that we didn't face any damaging attacks or loss of user funds during my tenure as CTO for either company, and it only cost me a few years off the end of my life...