
Building OnePay’s Data Platform: Unlocking Real-Time Intelligence at Scale

At OnePay, we wake up every day thinking about how to do more for our customers, faster. As we evolved into a multi-product platform, the same need showed up everywhere: teams needed fast, trustworthy data to make decisions and to build and iterate with confidence.
That’s where the OnePay Data Platform comes in.
Our objective is simple to state and hard to achieve: unlock business insights as quickly as possible to enable real-time business decision making. In a multi-product ecosystem with large numbers of backend services, vendors, partners, and millions of customers, two challenges arise. First, how do we stream, centralize, and cleanse large amounts of data from all these systems into a cohesive data warehouse? Second, and even more importantly, how do we make this data readily usable by the systems, individuals, and AI agents that need it, as quickly as possible?
The scale behind the data platform
Our platform manages ~1.2 PB of data, with hundreds of gigabytes and nearly a billion rows flowing through it daily. We balance real-time streaming pipelines with structured batch processing to match the needs of each use case. Freshness and completeness trend above 99.9%, because reliability isn't optional at this scale.
When systems operate at this volume, small cracks widen quickly. Definitions must be precise, governance must be intentional, and onboarding must be fast or velocity suffers.
But scale isn't just about volume; it's about unification. We've built a system where all data across our product lines and supporting vendor and partner systems lives in one place, rather than being fragmented across tools or organizations. This shared foundation is key to eliminating fragmented views of the data, incomplete business reports, and inconsistent access patterns.
Making Data Findable and Trustworthy
When shipping new products at our velocity, iterating weekly and expanding into new verticals, data has a tendency to sprawl. New services spin up, schemas evolve, and the data landscape fragments, not because anyone planned it that way, but because the pace of product development outruns the pace of data organization. Teams end up asking the same questions: Where does this data live? Who owns it? What does this field actually mean? Which version should I trust?
We built the Data Platform to make sure those questions always have immediate answers. We structure data assets around product-aligned catalogs with standardized naming conventions, clear ownership, and documented definitions. When an analyst, engineer, or product manager needs to understand what data exists for a given product, who owns it, and which version is the canonical one, those answers are built into the platform, not buried in Slack threads or tribal knowledge. This reduces reinvention and makes cross-product work dramatically faster.
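To make that concrete, here is a minimal sketch of what one catalog entry might capture, assuming a simple in-process Python registry; the field names and lookup helper are illustrative, not our actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogAsset:
    """One entry in a product-aligned catalog (illustrative schema)."""
    name: str               # standardized name, e.g. "payments.curated.daily_txn"
    product: str            # product line the asset belongs to
    owner: str              # owning team, so "who owns it?" is never a mystery
    description: str        # documented definition of what the asset means
    canonical: bool = True  # marks the version downstream users should trust

def find_canonical(assets: list[CatalogAsset], product: str) -> list[CatalogAsset]:
    """Answers 'which version should I trust?' for one product in one call."""
    return [a for a in assets if a.product == product and a.canonical]
```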
We also structure data in a tiered pattern, often described as a medallion architecture. Raw data is replayable and auditable. Standardized data is cleaned and conformed. Curated data is KPI-ready for dashboards, reports, and downstream applications. This creates a practical gradient: teams can explore freely at the raw level while trusting that the curated layers are consistent, governed, and ready for production use.
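As a hedged illustration of that gradient, the sketch below walks a toy record through the tiers, assuming simple dict-shaped events; real pipelines run on warehouse and streaming infrastructure, not in-process Python:

```python
def standardize(raw_event: dict) -> dict:
    """Raw -> standardized: clean and conform field names and types."""
    return {
        "user_id": str(raw_event["uid"]).strip(),
        "amount_cents": int(round(float(raw_event["amt"]) * 100)),
        "occurred_at": raw_event["ts"],  # assumed already ISO-8601 here
    }

def curate(standardized_events: list[dict]) -> dict:
    """Standardized -> curated: a KPI-ready aggregate for dashboards."""
    return {
        "txn_count": len(standardized_events),
        "txn_volume_cents": sum(e["amount_cents"] for e in standardized_events),
    }

raw = [{"uid": 42, "amt": "19.99", "ts": "2025-01-01T00:00:00Z"}]
print(curate([standardize(e) for e in raw]))
# {'txn_count': 1, 'txn_volume_cents': 1999}
```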
Broad Visibility, Tight Control
Making data accessible only works if it's safe by default. We designed access around a two-level paradigm. General access gives teams visibility into product data along with access to non-sensitive standardized assets and analytics marts. Sensitive fields are masked by default, so most users can run meaningful analysis without ever seeing raw sensitive values. Elevated access requires separate, explicit permissions tied to sensitivity classifications, ensuring that only people who truly need sensitive data can see it.
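A minimal sketch of that paradigm, assuming illustrative field names and sensitivity classes, might look like this: sensitive values are masked unless the caller holds an explicit grant for that class.

```python
# Illustrative sensitivity classification; None means non-sensitive.
SENSITIVITY = {"ssn": "pii.high", "email": "pii.low", "txn_amount": None}

def read_row(row: dict, grants: set[str]) -> dict:
    """Return a row view: sensitive fields masked unless explicitly granted."""
    out = {}
    for field, value in row.items():
        cls = SENSITIVITY.get(field)
        # Non-sensitive fields pass through; sensitive ones need a grant.
        out[field] = value if cls is None or cls in grants else "***"
    return out

row = {"ssn": "123-45-6789", "email": "a@b.com", "txn_amount": 1999}
print(read_row(row, grants=set()))                    # general access: PII masked
print(read_row(row, grants={"pii.high", "pii.low"}))  # elevated access
```

Most analysis runs happily on the masked view, which is what keeps general access broad without widening the sensitive surface.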
This model keeps data broadly usable without compromising privacy or control. It's the same principle we apply everywhere at OnePay: move fast, but never at the expense of customer trust.
Day-Zero KPI Visibility
The best day to see product KPIs is the day the product launches. Shipping features without instrumentation is like launching blind, so we aim for "eyes on glass" the moment new features come online, without making data integration the bottleneck.
Instead of bespoke pipelines for every new product, we invest in reusable ingestion archetypes so new integrations can be assembled quickly and consistently. In practice, a new data source can be onboarded in hours, data platform support can be integrated in parallel with product development, and products can launch with immediate KPI visibility. The result is predictable launches, fewer one-off surprises, and faster time to first dashboard.
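One way to picture an ingestion archetype, purely as an illustration (the registry and archetype names here are assumptions, not our internal API), is a config-driven registry where onboarding a source means supplying parameters rather than writing a new pipeline:

```python
from typing import Callable

# Registry of reusable ingestion archetypes (names are illustrative).
ARCHETYPES: dict[str, Callable[[dict], None]] = {}

def archetype(name: str):
    """Register an ingestion pattern so new sources can reuse it."""
    def register(fn: Callable[[dict], None]):
        ARCHETYPES[name] = fn
        return fn
    return register

@archetype("kafka_stream")
def ingest_kafka(config: dict) -> None:
    print(f"streaming {config['topic']} into {config['target_table']}")

@archetype("vendor_batch")
def ingest_vendor_batch(config: dict) -> None:
    print(f"loading {config['feed']} nightly into {config['target_table']}")

# Onboarding a new source becomes a config entry, not a bespoke pipeline:
ARCHETYPES["kafka_stream"]({"topic": "payments.events", "target_table": "raw.payments"})
```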
Standardization amplifies this further. We define foundational KPI families (daily active users, onboarding funnels, transaction volumes) in a way that travels across product lines. When definitions are consistent, teams spend less time debating what a metric means and more time improving the outcomes it measures.
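For example, a shared DAU definition might look like the sketch below, assuming standardized event records carrying user_id and occurred_at fields; the point is that every product line runs the same definition, so the number means the same thing everywhere.

```python
from datetime import date

def daily_active_users(events: list[dict], day: date) -> int:
    """One DAU definition, applied identically to every product's events."""
    return len({
        e["user_id"]
        for e in events
        if date.fromisoformat(e["occurred_at"][:10]) == day
    })

events = [
    {"user_id": "u1", "occurred_at": "2025-01-01T09:30:00Z"},
    {"user_id": "u1", "occurred_at": "2025-01-01T10:00:00Z"},
]
print(daily_active_users(events, date(2025, 1, 1)))  # 1, deduped by user
```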
Observability as a Data Problem
System uptime is table stakes; data trust is the real goal. We treat data quality as a first-class reliability problem, monitoring freshness (did data arrive on time?), completeness (did we receive what we expected?), and anomaly detection (are volumes or distributions shifting unexpectedly?). This keeps trust scalable as volume, velocity, and product surface area grow.
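As a rough sketch, the three checks reduce to small predicates like these; the SLAs and tolerances shown are illustrative, not our production thresholds.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_arrival: datetime, sla: timedelta) -> bool:
    """Freshness: did data arrive on time?"""
    return datetime.now(timezone.utc) - last_arrival <= sla

def is_complete(received_rows: int, expected_rows: int, tol: float = 0.001) -> bool:
    """Completeness: did we receive what we expected, within tolerance?"""
    return received_rows >= expected_rows * (1 - tol)

def is_anomalous(today: float, trailing: list[float], max_shift: float = 0.3) -> bool:
    """Anomaly: is today's volume shifting unexpectedly vs. the trailing mean?"""
    mean = sum(trailing) / len(trailing)
    return abs(today - mean) / mean > max_shift
```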
One Platform for the Full Data Lifecycle
A key principle of our approach is that the Data Platform isn't just for analytics; it's where data work happens end-to-end. Teams use the same platform to run analyses and answer business questions, build and train models, debug product behavior and investigate issues, and power reverse ETL into backend systems and marketing tools. This eliminates the fragmentation between "analytics systems" and "production systems." The same trusted data that informs decisions can also power product experiences and operational workflows, creating tighter feedback loops: insights don't just sit in dashboards but flow directly into the systems that act on them.
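To illustrate the reverse ETL half of that loop, here is a hedged sketch in which a curated audience segment is pushed into a downstream tool; the table name and sync targets are hypothetical.

```python
from typing import Callable, Iterable

def sync_segment(
    read_curated: Callable[[str], Iterable[dict]],
    push_to_tool: Callable[[dict], None],
) -> int:
    """Push a curated audience segment into a downstream operational tool."""
    count = 0
    for row in read_curated("curated.high_value_customers"):  # governed source
        push_to_tool({"user_id": row["user_id"], "segment": "high_value"})
        count += 1
    return count

# Wired up with stand-ins so the sketch is runnable end to end:
rows = [{"user_id": "u1"}, {"user_id": "u2"}]
print(sync_segment(lambda table: rows, print))  # prints each payload, then 2
```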
AI as a Force Multiplier
In Scaling Engineering: The Agents Supporting OnePay Engineers, we shared how agents are becoming a practical way to scale productivity, not as novelty, but as leverage. We apply the same philosophy to the data platform: reduce friction, shorten feedback loops, and make speed available to everyone who interacts with data, without compromising standards or governance.
We think about this through the lens of three stakeholder personas that interact with our platform:
Builders
For teams building pipelines and data products, the Data Help Agent accelerates the repeatable work: standing up integrations from proven patterns, iterating on transformations, and enforcing platform standards like naming, tagging, ownership metadata, and operational guardrails. The intent is simple: ship faster, and land it correctly the first time.
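As an illustration of the kind of standards check such an agent can enforce, the sketch below lints an asset's name, tags, and ownership; the naming convention and required tags are assumptions for the example.

```python
import re

REQUIRED_TAGS = {"product", "sensitivity"}
NAME_PATTERN = re.compile(r"^[a-z]+\.(raw|standardized|curated)\.[a-z_]+$")

def lint_asset(name: str, tags: dict, owner: str | None) -> list[str]:
    """Return the list of standards violations for a proposed data asset."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"name {name!r} violates <product>.<tier>.<asset> convention")
    if missing := REQUIRED_TAGS - tags.keys():
        problems.append(f"missing tags: {sorted(missing)}")
    if not owner:
        problems.append("no owning team set")
    return problems

print(lint_asset("payments.curated.daily_txn", {"product": "payments"}, owner=None))
# ["missing tags: ['sensitivity']", 'no owning team set']
```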
Explorers
For teams that need answers quickly, the Data Analytics Agent reduces the time from question to trusted dataset to insight. It helps users navigate toward curated assets, apply consistent metric definitions, and reach reliable conclusions without needing to be deep in SQL or notebooks. This is how we make self-serve data a reality across the company, not just for specialists.
Integrators
As more internal systems become agentic, the Data MCP Server provides a governed way for those systems to interact with the Data Platform. Instead of each tool reinventing access patterns, our MCP server enables policy-aware discovery and querying so AI agents can support workflows end-to-end without bypassing controls.
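For a flavor of what a governed tool might look like, here is a hedged sketch built on the MCP Python SDK's FastMCP helper; the allow-list policy and table names are illustrative, not our actual server.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("data-platform")

# Illustrative policy: agents may only query approved curated assets.
ALLOWED_TABLES = {"curated.daily_txn", "curated.onboarding_funnel"}

@mcp.tool()
def query_curated(table: str, limit: int = 100) -> str:
    """Policy-aware query tool: discovery and access stay inside governance."""
    if table not in ALLOWED_TABLES:
        return f"denied: {table} is not an approved curated asset"
    # Placeholder: a real server would execute this against the warehouse.
    return f"SELECT * FROM {table} LIMIT {limit}"

if __name__ == "__main__":
    mcp.run()
```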
Together, these capabilities act as a force multiplier: engineers spend less time on scaffolding and rework, business teams spend less time hunting for answers, and internal tools integrate faster while staying inside the same governance model that protects customer data and controls cost.
A Foundation of Trust and Security
Our commitment to serving customers faster is inextricably linked to our commitment to protecting their information. We treat customer data with the extreme sensitivity it deserves. This is why our Data Platform is engineered from the ground up on the principle of least privilege, ensuring that access is granted strictly on an as-needed basis.
Every design choice, from our tiered catalog structure to our two-level data access paradigm, reinforces this foundation. As we scale, trust remains our most important metric.
Conclusion
The biggest bottleneck in most organizations isn't a lack of data; it's the effort required to find, understand, and trust it.
A truly effective data platform gives engineering, product, and business teams speed and agility. It enables faster iteration and launches, grounded in reliable, high-quality data, and it ensures every strategic decision is based on a single source of truth, turning guesswork into confident, data-driven decisions.
Ultimately, our customers experience this through faster innovation as we quickly validate product hypotheses and understand usage patterns, more personalized experiences powered by rich real-time data, and improved service through enhanced operational analytics. Our data platform is the quiet, critical catalyst behind it all.