Practical AI: Data Platforms for Product Teams

Almost every product manager I know complains about data. Either you don’t have enough of it, or you’re drowning in it but can’t find what you need. Your analysts take three days to answer a simple question. Your engineers are reinventing analytics for every new feature. Your stakeholders want dashboards for everything, yesterday.

Then someone mentions “data platforms” and suddenly you’re in meetings about data lakes, warehouses, lakehouses (yup), streaming pipelines, and event-driven architectures. None of it makes sense, all of it sounds expensive, and you still can’t answer “how many users completed the onboarding flow this week?”

I’ve been there. I’ve also worked with teams where data actually enables product decisions instead of creating bottlenecks. The difference isn’t more sophisticated technology. It’s understanding what problems a data platform actually solves and building exactly what you need, not what’s trendy.

Let me share what works when you’re trying to make data useful for product teams without building infrastructure that requires a PhD to maintain.

Technology Overview

Current state

Right, let’s demystify the data platform landscape because it’s genuinely confusing and vendors aren’t incentivised to make it clearer.

The fundamental problem a data platform solves: your product generates data across multiple systems (application database, analytics events, logs, third-party tools), and you need to query across all of it to understand what’s happening and make decisions. Simple version: getting data from where it’s created to where it’s useful.

In 2026, there are three broad approaches:

Traditional data warehouses (Snowflake, BigQuery, Redshift): You extract data from source systems, load it into a centralised warehouse, and transform it into a useful shape there for analysis. This is called ELT: Extract, Load, Transform. (The older pattern, ETL, transforms the data before loading it.) It works, it scales, it’s well-understood. The downside? Latency. You’re always looking at data from hours or days ago because it takes time to move and transform it.

I worked with a Fortune 500 company using this approach. Their dashboards showed yesterday’s performance. For high-level trends and strategic decisions (which was the use case there), that’s fine. For responding to emerging issues or running experiments with quick feedback, sadly, it isn’t the right tool.

Streaming data platforms (Kafka, Flink, Kinesis): Events flow in real-time from your product to your analytics systems. You can query current state, respond to patterns immediately, and build features that react to user behaviour as it happens. The downside? Complexity. Stream processing is harder than batch processing, and debugging is substantially more difficult.

An insurtech I worked with went all-in on streaming because “real-time data” sounded essential. We spent six months building infrastructure before we could answer basic product questions.

Hybrid approaches (what most teams actually need): Streaming for things that genuinely need real-time responses, batch processing for everything else. Store events cheaply in object storage (S3, GCS), process them incrementally, query through a warehouse or query engine.

At the SaaS company, I built this setup. Critical user actions streamed to systems that needed an immediate response (fraud detection, usage-based billing). Everything else landed in S3 (R2, really), was processed nightly, and was available to query the next morning. Simple, reliable, cheap.
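The split between the two paths can be sketched as a tiny router. This is a minimal illustration, not the actual setup: the event names, the `REALTIME_EVENTS` set, and the in-memory lists standing in for a stream and for files in object storage are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical event names that need an immediate response.
REALTIME_EVENTS = {"payment_attempted", "usage_recorded"}


@dataclass
class HybridRouter:
    """Routes each event to a real-time path or a nightly batch path."""
    stream: list = field(default_factory=list)  # stand-in for a stream topic
    batch: list = field(default_factory=list)   # stand-in for object storage

    def route(self, event: dict) -> str:
        if event["name"] in REALTIME_EVENTS:
            self.stream.append(event)
            return "stream"
        self.batch.append(event)
        return "batch"


router = HybridRouter()
destinations = [
    router.route({"name": "payment_attempted", "user_id": "u1"}),
    router.route({"name": "page_viewed", "user_id": "u1"}),
]
```

The useful part is that the list of real-time events is short and explicit. Anything not on it defaults to the cheap path.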

Here’s the thing nobody tells you: your choice of data platform matters far less than having clear data contracts between your product and your analytics systems. I’ve seen teams with sophisticated Kafka pipelines that couldn’t answer basic questions because nobody specified what data to capture and how. I’ve also seen teams with basic batch pipelines that enabled fantastic product decisions because they’d thoughtfully designed what to measure.

Technology is an implementation detail. Data contracts are strategic.

Key capabilities

Let’s talk about what a well-designed data platform actually enables for product teams, not theoretical possibilities.

Unified view across systems: Your product isn’t one database. It’s your application, analytics tools like Amplitude, customer data in Salesforce, support tickets in Zendesk, payments in Stripe. A data platform pulls these together so you can see the full picture.

At the e-commerce company, we couldn’t understand why checkout conversion varied so dramatically between users until we connected analytics events with payment processor data and fraud detection logs. Turned out users from certain regions were getting flagged for additional verification, adding friction that wasn’t visible in our application metrics alone.

This only worked because we’d invested in bringing different data sources into one place where they could be joined and queried together. It’s unglamorous infrastructure work, but it unlocks insights that aren’t visible from any single system.
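To make the "joined and queried together" part concrete, here’s a rough sketch of that kind of cross-system join. The rows, field names, and the `conversion_by_verification` helper are invented for illustration. In practice this would be a SQL join in the warehouse rather than Python.

```python
from collections import defaultdict

# Hypothetical rows from two separate systems, keyed by the same session id.
checkout_events = [
    {"session_id": "s1", "region": "EU", "completed": True},
    {"session_id": "s2", "region": "EU", "completed": True},
    {"session_id": "s3", "region": "APAC", "completed": False},
    {"session_id": "s4", "region": "APAC", "completed": True},
]
fraud_checks = [
    {"session_id": "s3", "extra_verification": True},
    {"session_id": "s4", "extra_verification": False},
]


def conversion_by_verification(events, checks):
    """Join the two sources on session_id, then compute conversion per group."""
    flagged = {c["session_id"] for c in checks if c["extra_verification"]}
    totals = defaultdict(lambda: [0, 0])  # group -> [completed, total]
    for e in events:
        group = "extra_verification" if e["session_id"] in flagged else "normal"
        totals[group][0] += int(e["completed"])
        totals[group][1] += 1
    return {g: done / total for g, (done, total) in totals.items()}


rates = conversion_by_verification(checkout_events, fraud_checks)
```

Neither source on its own shows the gap. It only appears once both are keyed on the same identifier.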

Self-service analytics for product teams: The goal is enabling PMs and designers to answer their own questions instead of queueing requests to analysts. This requires both technology (accessible query tools, clear data models) and process (documentation, training, governance).

The SaaS company built a semantic layer: a documented, tested set of tables and views that represented key product concepts. Instead of raw events, product people could query tables like user_signups, feature_usage, and subscription_changes. These abstracted away the complexity of how data was collected and stored.

Result? PMs can answer 80% of their questions themselves in minutes instead of waiting days for analyst availability. Analysts focus on complex investigations and building better data models instead of repeatedly answering “how many users did X?”
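A semantic-layer table can be as simple as a view over raw events. The sketch below uses an in-memory SQLite database so it runs anywhere. The `raw_events` schema and the `signup_completed` event name are assumptions, and a real warehouse would use its own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (user_id TEXT, event_name TEXT, occurred_at TEXT);
INSERT INTO raw_events VALUES
  ('u1', 'signup_completed', '2026-01-05'),
  ('u1', 'report_exported',  '2026-01-06'),
  ('u2', 'signup_completed', '2026-01-06');

-- Semantic-layer view: one row per signup, with product-friendly names.
CREATE VIEW user_signups AS
SELECT user_id, occurred_at AS signed_up_at
FROM raw_events
WHERE event_name = 'signup_completed';
""")

# A PM only needs to know about user_signups, not the raw event format.
signup_count = conn.execute("SELECT COUNT(*) FROM user_signups").fetchone()[0]
```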

Experiment analysis infrastructure: Running experiments means randomising users into variants, tracking their behaviour, and computing statistical significance. You can build this per experiment, but that’s slow and error-prone. Better: build the infrastructure once, then experiments become cheap to run and analyse.

At the adtech, we created an experimentation framework on top of our data platform. Engineers instrumented experiments using a standard library, events flowed into the data warehouse, and analysis happened automatically in a shared dashboard. Running an experiment went from “two-week project requiring analyst support” to “afternoon of implementation, results available next morning.”

This changed our product culture. We experimented constantly because it was easy. Teams that can’t run experiments efficiently don’t run many experiments. Simple as that.
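The two reusable pieces of such a framework are variant assignment and the significance calculation. Here’s a minimal sketch of both using only the standard library. The function names are mine, and the pooled two-proportion z-test is one common choice, not necessarily what any given framework uses.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Standard normal CDF expressed through the error function.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))


# Hypothetical results: 10% vs 13% conversion with 1,000 users per variant.
p_value = two_proportion_p_value(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

Hashing the experiment name together with the user id means a user’s bucket in one experiment doesn’t depend on their bucket in another.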

Anomaly detection and monitoring: Your data platform should tell you when something unexpected happens, not just when you think to look for it. Sudden drop in conversion? Spike in errors for a specific user segment? Feature usage pattern that deviates from normal?

We set up automated monitoring on key metrics with reasonable thresholds. Not everything. Alert fatigue is real. Just the metrics that truly indicated problems. When checkout conversion dropped below expected levels, we were alerted within an hour instead of discovering it three days later in a weekly review.

This only works if your data platform supports reasonably fresh data and flexible alerting. Batch processing that runs nightly can’t alert you to problems happening right now.
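A "reasonable threshold" can start as a simple deviation from the recent baseline. This is a sketch under assumptions: the z-score rule, the threshold of 3, and the sample numbers are illustrative, and real metrics with seasonality need something smarter.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest value if it sits far outside the recent baseline."""
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > z_threshold


# Hypothetical hourly checkout conversion rates.
hourly_conversion = [0.31, 0.30, 0.32, 0.29, 0.31, 0.30]
alert_on_drop = is_anomalous(hourly_conversion, latest=0.18)
alert_on_normal = is_anomalous(hourly_conversion, latest=0.30)
```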

Historical analysis and time-travel: Product decisions need context. Is this metric normal for this time of year? How did that feature perform when we first launched it? What happened the last time we changed the pricing page?

Data warehouses excel at this. You can query data from months or years ago as easily as yesterday. Compare cohorts across time periods. Identify seasonal patterns. This is harder with streaming systems that don’t typically retain history.

Product Applications

Use cases

Let me get specific about where data platforms deliver tangible value for product work, based on teams I’ve actually worked with.

Feature adoption tracking: You shipped a feature. Who’s using it? How often? Are they successful? Is adoption growing or plateauing? These seem like basic questions, but without proper instrumentation and a data platform to analyse it, you’re guessing.

The SaaS company built a standard “feature adoption dashboard” template. For every significant feature, we’d instrument key actions, and automatically generate a dashboard showing adoption over time, broken down by user segment, with cohort retention curves. This made it obvious which features were succeeding and which were languishing.

When we launched a collaboration feature that seemed successful based on overall metrics, the segmented view revealed that only existing power users adopted it. New users rarely discovered it. That insight led to onboarding changes that tripled new user adoption.

Funnel analysis and optimization: Every product has funnels like signup, onboarding, conversion, engagement. Data platforms let you analyse where users drop off, which changes improve conversion, and how different segments behave differently.

At the e-commerce company, we instrumented every step of checkout meticulously. The data platform let us analyse which steps had the highest drop-off, how long users spent on each step, and where users who abandoned checkout typically exited. This guided optimization efforts far more effectively than intuition.

We discovered that users who spent more than 90 seconds on the shipping address form were 3x more likely to abandon. Turned out our address autocomplete was broken for international addresses. Fixing that single issue improved completion rate by 8%. We’d never have identified that without granular funnel data.
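The core funnel calculation is small. Here’s a sketch with made-up step names and counts. `funnel_dropoff` is a hypothetical helper, not part of any tool mentioned here.

```python
def funnel_dropoff(step_counts: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Return the share of users lost between each step and the next."""
    results = []
    for (step, entered), (_, continued) in zip(step_counts, step_counts[1:]):
        results.append((step, round(1 - continued / entered, 3)))
    return results


# Hypothetical counts of users reaching each checkout step.
checkout = [
    ("cart", 1000),
    ("shipping_address", 800),
    ("payment", 480),
    ("confirmation", 432),
]
dropoff = funnel_dropoff(checkout)
worst_step = max(dropoff, key=lambda item: item[1])[0]
```

Run the same calculation per segment (domestic versus international addresses, say) and a problem like the broken autocomplete stands out quickly.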

Cohort analysis and retention: Different user cohorts behave differently. Users who signed up in January might have different retention than those from March. Users from different acquisition channels might engage differently. Users who complete onboarding are likelier to stick around than those who don’t.

Data platforms make cohort analysis straightforward. Define cohorts based on signup date, user attributes, or behaviour. Track how they perform over time. This reveals patterns invisible in aggregate metrics.

The fintech startup discovered that users who connected a bank account within their first three days had 60% better retention than those who didn’t. This made “connect bank account early” a priority for onboarding redesign. That insight only emerged from cohort analysis—aggregate retention metrics looked fine but hid this critical pattern.
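A behaviour-based cohort comparison like that one looks roughly like this. The user records, the 30-day retention flag, and the three-day cutoff are illustrative assumptions, and the numbers are invented rather than taken from the fintech example.

```python
def retention_rate(users: list[dict]) -> float:
    """Share of users in the cohort still active at day 30."""
    return sum(u["active_day_30"] for u in users) / len(users)


def linked_early(user: dict, within_days: int = 3) -> bool:
    """True if the user connected a bank account within the cutoff."""
    days = user["days_to_bank_link"]
    return days is not None and days <= within_days


# Hypothetical users; real data would come from the warehouse.
users = [
    {"id": "u1", "days_to_bank_link": 1, "active_day_30": True},
    {"id": "u2", "days_to_bank_link": 2, "active_day_30": True},
    {"id": "u3", "days_to_bank_link": 10, "active_day_30": False},
    {"id": "u4", "days_to_bank_link": None, "active_day_30": True},
    {"id": "u5", "days_to_bank_link": None, "active_day_30": False},
]

early_retention = retention_rate([u for u in users if linked_early(u)])
other_retention = retention_rate([u for u in users if not linked_early(u)])
```

A gap like this is a correlation, not proof that linking early causes retention. It tells you where an onboarding experiment is worth running.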

Segmentation for personalization: Understanding how different user segments behave enables better product decisions and personalised experiences. But this requires analysing behaviour at segment level, which requires a data platform that can efficiently query across large datasets.

The SaaS company used data platform analysis to identify distinct user personas based on actual behaviour rather than assumed characteristics. We found that “infrequent power users” (logged in rarely but used advanced features intensively when they did) had completely different needs than “daily casual users” (logged in regularly but used basic features only).

This segmentation guided feature prioritisation, UI design decisions, and communication strategies. We stopped treating “users” as homogeneous and started designing for specific segments with different needs.
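Once you’ve found segments like these, the rule for assigning users to them can be very plain. The thresholds below are placeholders I’ve made up. The real cut-offs should come from looking at your own distributions.

```python
def classify_user(logins_per_month: int, advanced_actions_per_session: float) -> str:
    """Assign a behavioural segment; thresholds are illustrative only."""
    frequent = logins_per_month >= 15
    advanced = advanced_actions_per_session >= 3
    if not frequent and advanced:
        return "infrequent_power_user"
    if frequent and not advanced:
        return "daily_casual_user"
    return "other"


segment_a = classify_user(logins_per_month=2, advanced_actions_per_session=6.0)
segment_b = classify_user(logins_per_month=25, advanced_actions_per_session=0.5)
```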

Performance and reliability monitoring: Product quality isn’t just features, it’s whether the product works reliably and performs well. Data platforms can track error rates, load times, and user impact of technical issues across different browsers, devices, and regions.

At the e-commerce company, we discovered that checkout failures were 10x higher on older iOS versions. This wasn’t visible in error tracking tools because the errors were inconsistent and sporadic. Only by analysing error rates by device type in our data warehouse did the pattern become obvious.

We prioritised fixing compatibility issues for those versions, dramatically improving experience for a non-trivial user segment. That’s the power of being able to slice data by any dimension—you discover problems you didn’t know to look for.

Integration approaches

Right, let’s talk about how you actually build and integrate a data platform into your product workflow without creating a technical nightmare.

Start with event tracking, not infrastructure: Your first step isn’t choosing a warehouse, it’s instrumenting your product to emit meaningful events. What user actions matter? What context is important? What questions do you need to answer?

At the fintech startup, we defined a “tracking plan”, a document specifying every event we’d capture, what properties each event included, and why we were tracking it. This prevented the common problem of collecting tons of data but not the right data.
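A tracking plan is most useful when it’s also machine-checkable. Here’s a minimal sketch of the idea. The event name, properties, and `validate_event` helper are hypothetical, and dedicated schema tools do this more thoroughly.

```python
# Hypothetical tracking plan: each event lists its required properties and why it exists.
TRACKING_PLAN = {
    "bank_account_connected": {
        "required": {"user_id": str, "provider": str},
        "why": "Measures early activation during onboarding.",
    },
}


def validate_event(name: str, properties: dict) -> list[str]:
    """Return a list of problems; an empty list means the event matches the plan."""
    spec = TRACKING_PLAN.get(name)
    if spec is None:
        return [f"unknown event: {name}"]
    problems = []
    for prop, expected_type in spec["required"].items():
        if prop not in properties:
            problems.append(f"missing property: {prop}")
        elif not isinstance(properties[prop], expected_type):
            problems.append(f"wrong type for {prop}")
    return problems


ok = validate_event("bank_account_connected", {"user_id": "u1", "provider": "acme"})
bad = validate_event("bank_account_connected", {"user_id": 42})
```

Running a check like this in CI or at ingestion catches instrumentation drift before it reaches a dashboard.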

Use a tracking library (Segment, RudderStack, or similar) that decouples event collection from destinations. This lets you send events to multiple tools and change your backend without re-instrumenting your product. Worth the small overhead.
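The decoupling idea itself is simple, shown here with a hypothetical `Tracker` class. This is not the API of Segment or RudderStack, just a sketch of the pattern they implement: instrument once, fan out to many destinations.

```python
from typing import Callable

Destination = Callable[[dict], None]


class Tracker:
    """Collects events once and forwards them to every registered destination."""

    def __init__(self) -> None:
        self._destinations: list[Destination] = []

    def add_destination(self, destination: Destination) -> None:
        self._destinations.append(destination)

    def track(self, name: str, **properties) -> None:
        event = {"name": name, "properties": properties}
        for send in self._destinations:
            send(event)


# Lists stand in for a warehouse loader and a product analytics tool.
warehouse_rows, analytics_rows = [], []
tracker = Tracker()
tracker.add_destination(warehouse_rows.append)
tracker.add_destination(analytics_rows.append)
tracker.track("feature_used", feature="export")
```

Swapping a backend then means changing one `add_destination` call, not touching every call site in the product.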

Choose simple, managed services over complex self-hosted: Unless you have specific requirements that necessitate self-hosting (data residency, cost at scale, unique capabilities), use managed services. BigQuery, Snowflake, Redshift, they all work fine. Your competitive advantage isn’t running a data warehouse.

The SaaS company started with BigQuery because we were already on GCP. It worked perfectly. Total setup time was an afternoon. If we’d tried to self-host a data warehouse, we’d still be configuring it.

Build a semantic layer for product people: Raw event data is hard to query. Create views or models that represent product concepts—users, sessions, feature usage, conversions. This abstracts complexity and makes self-service analytics possible.

We used dbt (data build tool) to create these models. dbt let us write SQL transformations, version control them, test them, and document them. Product people queried clean tables, analysts maintained the transformations behind them.

This separation of concerns worked brilliantly. Product people got accessible data, and the data team maintained quality and consistency.

Integrate with your product workflow: Data platform insights need to reach decision-makers when decisions happen. This means dashboards accessible during product reviews, experiment results automatically calculated, alerts routed to appropriate channels.

At the Fortune 500 company, every product team had a dashboard linked from their Slack channel and their weekly meeting notes. No separate “go find the data” step, it was just there. This sounds trivial but dramatically increased how often teams actually looked at data.

We also automated experiment analysis. When an experiment reached statistical significance, it posted results to Slack automatically. Teams could make decisions based on data without manually running analysis each time.

Plan for data quality and governance: Bad data is worse than no data because it leads to wrong decisions confidently made. Build in data quality checks, validation, and governance from the start.

We implemented automatic tests: expected ranges for key metrics, consistency checks between related metrics, anomaly detection for sudden changes. When data looked wrong, we got alerted before anyone made decisions based on it.
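Range and consistency checks don’t need much machinery to start with. A sketch, assuming a daily metrics row with hypothetical field names and a 0.01 tolerance I picked arbitrarily:

```python
def check_daily_metrics(row: dict) -> list[str]:
    """Run range and consistency checks on one day's metrics."""
    failures = []
    # Range check: a conversion rate must be a proportion.
    if not 0.0 <= row["checkout_conversion"] <= 1.0:
        failures.append("checkout_conversion outside [0, 1]")
    # Consistency check: you can't complete more checkouts than were started.
    if row["orders"] > row["checkout_starts"]:
        failures.append("more orders than checkout starts")
    # Consistency check: the reported rate should match its inputs.
    expected = row["orders"] / row["checkout_starts"] if row["checkout_starts"] else 0.0
    if abs(expected - row["checkout_conversion"]) > 0.01:
        failures.append("conversion disagrees with orders / checkout_starts")
    return failures


good_day = check_daily_metrics(
    {"checkout_starts": 200, "orders": 50, "checkout_conversion": 0.25}
)
bad_day = check_daily_metrics(
    {"checkout_starts": 100, "orders": 120, "checkout_conversion": 1.2}
)
```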

We also established simple governance: who owns each dataset, what’s the refresh schedule, what’s the SLA for data freshness. This prevented confusion about whether data was trustworthy.
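Those three governance facts fit in a small record per dataset, which also makes freshness checkable. The `DatasetContract` class, the owner name, and the 26-hour SLA below are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DatasetContract:
    """Who owns a dataset, how often it refreshes, and how stale it may get."""
    name: str
    owner: str
    refresh_schedule: str
    freshness_sla: timedelta

    def is_fresh(self, last_loaded_at: datetime, now: datetime) -> bool:
        return now - last_loaded_at <= self.freshness_sla


# Hypothetical contract: nightly refresh, with a couple of hours of slack.
feature_usage = DatasetContract(
    name="feature_usage",
    owner="data-team",
    refresh_schedule="nightly",
    freshness_sla=timedelta(hours=26),
)
now = datetime(2026, 3, 2, 9, 0)
fresh = feature_usage.is_fresh(datetime(2026, 3, 2, 2, 0), now)
stale = feature_usage.is_fresh(datetime(2026, 2, 28, 2, 0), now)
```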

Future Implications

Trends to watch

The data platform landscape evolves constantly. Here are trends that will actually affect product teams in the next 12-18 months, not just hype cycles.

Reverse ETL and operational analytics: Data isn’t just for analysis, it’s also useful in operational systems. Reverse ETL takes insights from your data warehouse and pushes them back to operational tools. User segments computed from historical data flowing into your marketing automation platform. Churn predictions feeding into your customer success tool.

I advised a B2B company that used reverse ETL to identify users showing early churn signals and automatically create tasks for their account management team. The prediction model ran in the data warehouse, results flowed back to Salesforce. Retention improved measurably because the team could intervene proactively.

This blurs the line between analytics and product. Your data platform becomes infrastructure for product features, not just reporting.
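The "reverse" step is mostly a mapping from warehouse rows to whatever the operational tool expects. Here’s a sketch with an invented payload shape. The field names, the 0.7 and 0.9 thresholds, and `build_crm_tasks` are all hypothetical, and the actual CRM API call is deliberately left out.

```python
def build_crm_tasks(churn_scores: list[dict], threshold: float = 0.7) -> list[dict]:
    """Turn warehouse churn scores into task payloads for an operational tool."""
    return [
        {
            "account_id": row["account_id"],
            "subject": "Check in: early churn signals",
            "priority": "high" if row["churn_score"] >= 0.9 else "normal",
        }
        for row in churn_scores
        if row["churn_score"] >= threshold
    ]


# Hypothetical output of a churn model that ran in the warehouse.
scores = [
    {"account_id": "a1", "churn_score": 0.95},
    {"account_id": "a2", "churn_score": 0.75},
    {"account_id": "a3", "churn_score": 0.20},
]
tasks = build_crm_tasks(scores)
```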

AI-powered query and analysis: LLMs are making data analysis more accessible. Instead of writing SQL, ask natural language questions and get answers. Instead of building dashboards manually, describe what you want to see and get generated visualisations.

This is genuinely useful, not just hype. Tools like Mode’s AI assistant, ThoughtSpot’s search, and various AI-powered BI tools are making data accessible to non-technical product people.

Caveat: verify the results. LLMs can confidently generate wrong queries. Use them to speed up analysis, not as autonomous decision-makers.

Real-time features becoming economical: Streaming infrastructure is getting simpler and cheaper. What used to require dedicated data engineering teams can now be done with managed services like Confluent Cloud, AWS Kinesis, or even just serverless functions processing events.

This makes real-time product features more viable for more teams. Personalisation that updates based on current session behaviour. Recommendations that reflect what the user just did. Collaboration features that need instant synchronisation.

Imagine this: you can build real-time recommendations using serverless functions and DynamoDB. When users upvote content, recommendations update immediately. This used to require complex infrastructure, now it’s serverless functions responding to events.
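The shape of that event handler is roughly the following. To keep it runnable anywhere, an in-memory dictionary stands in for the key-value store and a plain method call stands in for the serverless trigger. The class, the topic-based scoring, and the catalogue are all assumptions.

```python
from collections import Counter, defaultdict


class LiveRecommendations:
    """Updates per-user topic scores on each upvote and ranks items on demand."""

    def __init__(self) -> None:
        # user_id -> topic -> upvote count (stand-in for a key-value table).
        self.topic_scores = defaultdict(Counter)

    def handle_upvote(self, user_id: str, topic: str) -> None:
        self.topic_scores[user_id][topic] += 1

    def recommend(self, user_id: str, catalogue: dict[str, str], limit: int = 2) -> list[str]:
        """Rank items by how often the user has upvoted each item's topic."""
        scores = self.topic_scores[user_id]
        ranked = sorted(catalogue, key=lambda item: -scores[catalogue[item]])
        return ranked[:limit]


catalogue = {"intro-to-python": "python", "rust-ownership": "rust", "go-channels": "go"}
recs = LiveRecommendations()
before = recs.recommend("u1", catalogue, limit=1)
recs.handle_upvote("u1", "rust")
after = recs.recommend("u1", catalogue, limit=1)
```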

Data quality tools maturing: Data observability and testing tools (Monte Carlo, Datafold, Great Expectations) are making data quality management more systematic. They catch data issues before they affect decisions, track lineage so you understand where data comes from, and automate testing.

This matters because data quality is usually the weakest part of data platforms. Teams invest in infrastructure but not validation, then make bad decisions based on incorrect data. These tools make quality manageable at scale.

Preparing your team

Technology alone won’t make your team data-driven. Here’s how to build the culture and capabilities to actually use a data platform effectively.

Invest in data literacy: Product people need to understand basic data concepts, like what an event is, what makes a good metric, how to interpret statistical significance, what causes common analytical mistakes.

The best way to do this is workshops: hands-on practice writing queries, interpreting results, identifying misleading analyses. Not to make everyone analysts, but to build enough literacy to use data tools independently and spot problems.

This investment pays off immediately. Teams that understand data ask better questions, interpret results correctly, and don’t waste analyst time on trivial queries. I can help with workshops - contact me for details.

Create shared definitions and metrics: If different teams define “active user” differently, you’ll have endless confusion. If feature teams each create their own conversion funnel definitions, you can’t compare results.

Establish canonical definitions for key concepts. Document them. Enforce them through your semantic layer so everyone queries the same underlying logic. This seems bureaucratic but prevents tremendous waste.

At the SaaS company, we spent two weeks getting alignment on core metric definitions. Painful at the time, saved us countless hours of “wait, how did you calculate that?” conversations later.
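A canonical definition is easiest to enforce when it lives in exactly one place. Here’s a sketch of what that might look like for "active user". The 28-day window and the notion of a "key action" are assumptions, and in a warehouse this would normally be a shared SQL model rather than a Python function.

```python
from datetime import date, timedelta
from typing import Optional

ACTIVE_WINDOW_DAYS = 28  # assumption: use whatever window your team agrees on


def is_active_user(last_key_action_on: Optional[date], as_of: date) -> bool:
    """Canonical 'active user' check that every report should reuse."""
    if last_key_action_on is None:
        return False
    age = as_of - last_key_action_on
    return timedelta(0) <= age < timedelta(days=ACTIVE_WINDOW_DAYS)


recent = is_active_user(date(2026, 3, 1), as_of=date(2026, 3, 20))
lapsed = is_active_user(date(2026, 1, 1), as_of=date(2026, 3, 20))
```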

Build analyst-product partnerships: Embed analysts in product teams so they understand product context and can proactively surface insights. Analysts who work across teams lose context and become order-takers for queries.

Our embedded analysts attended product meetings, understood roadmaps, and could say “hey, this data suggests you might want to reconsider that prioritisation.” That proactive insight was far more valuable than reactive query answering.

Establish rhythms for data review: Make looking at data a habit, not something that happens when someone remembers. Weekly metric reviews. Post-launch analysis for every significant feature. Monthly cohort retention reviews.

These rhythms meant data insights fed into decisions naturally. Teams didn’t ship features and forget them, they checked back to see if the feature succeeded and iterated based on findings.

Balance self-service with support: Enable product people to answer simple questions themselves, but provide analyst support for complex investigations. Finding that balance requires iteration.

We established “office hours” where product people could get analyst help on trickier analyses. This let them self-serve most of the time while having expert support available when needed.

Key Takeaways

Let me distil the essentials you can act on:

Data contracts matter more than infrastructure: Thoughtfully design what events to capture and why before worrying about warehouses and pipelines. Bad data in sophisticated infrastructure is still useless.
Start simple with managed services: Use BigQuery, Snowflake, or similar managed platforms. Spend your engineering time on product features, not maintaining data infrastructure. Self-hosting only makes sense at significant scale or with specific requirements.
Build a semantic layer for product teams: Create queryable tables that represent product concepts, not raw event streams. This enables self-service analytics. Use tools like dbt to manage transformations systematically.
Integrate data into product workflow: Make insights accessible where decisions happen. Slack channels, meeting dashboards, automated reports. Data that requires a separate “go look at analytics” step doesn’t get used consistently.
Invest in data literacy across the team: Everyone working with data needs basic understanding of events, metrics, and statistical concepts. This prevents misinterpretation and enables self-service effectively.
Establish clear metric definitions: Agree on canonical definitions for key concepts and metrics. Document them. Enforce them through your data models. This prevents endless confusion about “how did you calculate that?”
Plan for data quality from the start: Implement validation, testing, and monitoring for data quality. Bad data leads to wrong decisions. Build quality checks into your pipeline, not as an afterthought.

Final Thoughts

Data platforms are enabling infrastructure, not ends in themselves. The goal isn’t having sophisticated technology, it’s making better product decisions faster.

I’ve seen teams with basic batch pipelines and queries make brilliant data-driven decisions because they’d invested in proper instrumentation, clear definitions, and data literacy. I’ve also seen teams with real-time streaming infrastructure and expensive tooling make poor decisions because they collected the wrong data or didn’t understand how to interpret it.

Start with the decisions you need to make, work backwards to the data required, then build just enough infrastructure to make that data accessible and trustworthy. Don’t start with the infrastructure and hope useful insights emerge.

The SaaS company I mentioned started with a simple question: “How many users are actually getting value from our product?” Answering that required defining “value” (which features indicated success), instrumenting those features, collecting the data, and building a simple dashboard. That foundation enabled increasingly sophisticated analysis over time.

Start with one question you need answered. Build the minimum infrastructure to answer it reliably. Learn from that. Expand thoughtfully. That’s how you build a data platform that actually serves product teams instead of becoming a distraction from product work.

Have questions or thoughts? Get in touch - I’d love to hear from you!