AI & Data Engineering

Engineering intelligence into production systems

Sunny Data designs and builds data platforms, model-backed workflows, optimisation systems, and scientific AI infrastructure for organisations where technical decisions have operational consequences.

20+
Years in production
Multiple
Technical domains
3
Cloud platforms

Founder experience across Nokia · Mars · BCC

Challenges

Serious systems, real constraints

The work sits where data, AI, infrastructure, and software decisions are tightly coupled. Each challenge is framed around the operating reality first: users, ownership, failure modes, cost, governance, and maintainability.

Data platforms

Trusted data under real operational constraints

When ingestion, modelling, and business rules drift apart, every downstream decision becomes fragile. The platform has to make ownership, lineage, quality, and cost visible by design.

Large-scale lakehouses · regulated environments · enterprise supply chains

AI systems

Model-backed workflows that must survive real users

The model is rarely the hard part. Retrieval quality, permissions, latency, evaluation, retries, cost, and operational handoffs decide whether the system is usable.

Operational LLM workflows · retrieval and evaluation · enterprise systems

Machine learning

Predictive systems that need an owner after launch

A useful model is only one part of the system. Feature logic, training, deployment, monitoring, drift response, and business ownership have to be designed together.

Risk modelling · scientific environments · production monitoring

Optimisation

Planning and allocation where heuristics stop scaling

Routing, scheduling, allocation, and planning work only when the constraints are modelled honestly. More code does not fix the wrong formulation.

Routing · resource allocation · constraint-heavy decisions

Research engineering

Scientific software that must be reproducible

Research environments need speed without losing traceability. The system has to respect experimental workflows, reproducibility, and the realities of scientific teams.

Genomics platforms · oncology data · published biomedical work

Internal platforms

Internal platforms where the data layer is the product

Dashboards and portals fail when they are treated as UI around weak data. The application has to be built from the operational data model outward.

Operational portals · public-sector workflows · backend and data ownership

Evidence

Grounded in production

Selected founder experience from enterprise platforms, regulated banking, global consumer systems, genomics infrastructure, and oncology research. Different domains, same requirement: systems that survive real constraints.

Selected cases

Mars

Governed cloud foundation for enterprise delivery

Cloud Platform Engineering

Held technical ownership of a governed Azure platform foundation for Mars Snacking, aligning architecture decisions across internal teams, vendor architects, and implementation partners. The work established a common operating model for Azure resources, Terraform-based infrastructure, and coordinated delivery across enterprise constraints.

Azure governance · Terraform infrastructure · Vendor alignment · Operating standards

BCC · IBM/Viewnext

First enterprise data platform for banking operations

Data Engineering

Built BCC's first large-scale data foundation: data lake architecture, real-time Kafka flows, and an operational data journal used by technical and business teams to understand system state and information flows in real time.

Data lake from zero · Kafka real-time architecture · Cross-department adoption · Technical training

Nokia · Microsoft MixRadio

Recommendation systems for a global music service

Recommendation Systems

Implemented the production data layer behind MixRadio recommendations on AWS, turning data science algorithms into scalable personalisation systems for millions of users. The work also covered APIs, microservices, and internal catalogue tooling for music experts.

Recommendation pipelines · AWS production systems · Millions of users · Internal operations tooling

Coral Genomics

ML-ready genomics at petabyte scale

Genomics AI

Designed and released DNARecords, an open-source sparse genomics format and SDK for transforming large VCF/BGEN datasets into machine-learning-ready representations for deep learning workflows.

bioRxiv preprint · Open-source SDK · VCF/BGEN conversion · Genomics AI infrastructure

Oncko

Research AI systems for oncology under uncertainty

Oncology AI

Built research software and data systems for drug-combination discovery under high scientific uncertainty: large-scale matrix clustering, bioinformatics pipeline orchestration, LLM-assisted extraction, heterogeneous data harmonisation, and internal hypothesis tracking.

Drug-combination discovery · Bioinformatics orchestration · LLM extraction · Hypothesis tracking

Process

How we work

Engagements are scoped around technical outcomes: a platform, a production workflow, an architecture decision, a migration, or a recovery plan. The work stays close to the people who own the system and the constraints that shape it.

01

Clarify the operating reality

Start from the current system: data, users, constraints, ownership, failure modes, and the decision that needs to be made.

02

Make the architecture explicit

Before critical implementation starts, the chosen path is documented with its alternatives, trade-offs, risks, and the reason it fits the client context.

03

Build the critical path

Implementation focuses on the parts that determine whether the system can reach production: data contracts, infrastructure, workflows, interfaces, and operational behaviour.

04

Transfer ownership

Repositories, infrastructure, runbooks, and decision records are left in a state that the client team can reason about, operate, and evolve.

Principles

How judgement shows up

The difference is rarely the tool. It is how early the hard constraints are found, how clearly trade-offs are made, and whether the system can be owned after launch.

Failure modes first

Unreliable data, latency, permissions, retries, drift, and handoffs are design inputs from the beginning.

Trade-offs are explicit

Architecture decisions include constraints, alternatives, consequences, and the reason a path fits the client context.

Ownership is transferable

Repositories, infrastructure, runbooks, and decision records are treated as part of the system, not project residue.

Clarity beats cleverness

Readable systems, direct interfaces, and predictable operations usually matter more than impressive abstractions.

Founder

Founder-level technical ownership

Sunny Data is an intentionally small engineering boutique based in Spain. Engagements are led directly by the founder, with a short path between technical diagnosis, architecture, implementation, and ownership.

The work sits at the intersection of enterprise data engineering, applied AI, optimisation, custom software, and scientific infrastructure: environments where technical ambiguity has real operational cost.

The operating model is direct: understand the constraints, design the system, build the critical path, document the reasoning, and leave the client with something they can maintain.

Background

Mathematician · Senior Technology Expert

20+ years building and operating production systems across telecommunications, entertainment, finance, and biomedical research.

IBM · Nokia · Microsoft · Mars · BCC

Technical range

The work crosses sectors, but the pattern is consistent: data, models, infrastructure, and decisions that have to survive operational reality.

Governed enterprise platforms
Cloud foundations, data platforms, and operating models that multiple teams can use without losing control.
Production data systems
Real-time flows, data lakes, operational journals, and pipelines designed around reliability and ownership.
Model-backed workflows
Recommendation systems, ML operationalisation, optimisation loops, and internal tools that reach production.
Scientific AI infrastructure
Genomics, oncology, bioinformatics pipelines, heterogeneous data, and research systems under uncertainty.
20+ years
In production systems
Mathematics + AI
Technical foundation
Selective availability

When the next step is not obvious

Some systems need technical judgement before another implementation push. Sunny Data helps clarify what matters, what is risky, and what is worth building.