Data Pipelines

Platform

We are the creators of datapipelines.com, the no-code data pipeline builder built on Apache Spark with inbuilt automation, SQL tools & connectors for AWS, Google Cloud, Azure, and more.

To learn more about the platform, go to https://datapipelines.com

Custom Solutions

We implement custom data pipelines of any complexity.

We can connect, transform and automate the delivery of any type of data so you can focus on the insights. We work with all types of databases, data lakes, file stores, a wide range of APIs and virtually any other type of data store you may be using.

During the development of your pipeline, you don't have to be involved in making any technical decisions at all, unless you want to be. We will suggest the best and most cost-efficient solution based on your specific use case.

Most of our solutions are powered by Apache Spark, which distributes processing across a cluster, so your pipelines can scale with the amount of data that needs processing.

Your pipelines can be deployed in a variety of environments, including Amazon EMR, Databricks or your own data center.

Apache Spark Consulting

Apache Spark is an open-source analytics engine for large-scale data processing. We have a wealth of experience planning, developing and deploying Apache Spark applications.

  • On-premises or cloud
  • Spark ETL / ELT
  • Spark Streaming
  • Integration with other technologies
  • Bespoke Spark application implementation
  • Cluster provisioning and maintenance

Use cases

  • A data-visualisation company was using cloud software to integrate and transform data that fed financial dashboards.
    The cloud system did not support some of the data sources their clients stored data in. Their solution was also very rigid: it consisted of stored procedures, which made testing and further development very difficult.
    We recommended switching to a custom data pipeline implementation comprising some custom scripts and an Apache Spark application. The new solution was easily testable and scalable. We recommended running their workloads via Databricks, inside their own existing AWS infrastructure.
  • A digital marketing company needed to run daily custom reports and move their clients' impression data from AWS S3 to Google BigQuery at regular intervals.
    We recommended our flexible off-the-shelf cloud solution, Data Pipelines. As part of the onboarding process, their analysts received free training on Data Pipelines.
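
As a rough illustration of the testability point in the first use case, a pipeline step can be written as a plain function that takes rows in and returns rows out, so it can be checked against a small in-memory sample without a database. The following is a minimal Python sketch under that assumption. It is not the client's code and it does not use Spark; the function names, field names and sample data are all hypothetical.

```python
# Hypothetical sketch: pipeline steps as small, independently testable functions.
# Field names ("account", "amount") and the sample rows are invented for illustration.

def drop_incomplete(rows):
    """Keep only rows that have a non-empty account and a non-null amount."""
    return [r for r in rows if r.get("account") and r.get("amount") is not None]


def total_by_account(rows):
    """Sum amounts per account and return one row per account, sorted by account."""
    totals = {}
    for r in rows:
        totals[r["account"]] = totals.get(r["account"], 0) + r["amount"]
    return [{"account": a, "total": t} for a, t in sorted(totals.items())]


def run_pipeline(rows, steps):
    """Apply each step in order, passing the output of one step to the next."""
    for step in steps:
        rows = step(rows)
    return rows


sample = [
    {"account": "A", "amount": 10},
    {"account": "B", "amount": 5},
    {"account": "A", "amount": 7},
    {"account": None, "amount": 3},  # incomplete row, dropped by the first step
]

result = run_pipeline(sample, [drop_incomplete, total_by_account])
print(result)
```

Because each step depends only on its input, it can be tested on its own with a handful of rows, which is much harder to do with logic held in stored procedures.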