Data management platform Bigeye unveils rapid dataset validation tool

Data platform Bigeye has released Deltas, a new feature that enables data teams to automatically compare and validate datasets. Deltas replaces hand-written SQL queries, manual spreadsheet matching, and one-off Python scripts with automated comparisons and instant validation. This adds speed and reliability to key data management tasks, whether migrating data to the cloud (or between clouds), replicating data, or promoting data from staging to production.

The founders of Bigeye, Kyle Kirwan and Egor Gryaznov, managed Uber’s first data warehouse for reporting and data analysis. Kirwan and Gryaznov moved on to Bigeye in 2019 with the intention of solving what they observed to be an industry-wide problem: data reliability.

When moving data, all sorts of issues can occur, including delayed ingestion, dropped or duplicated records, and mutated values. Comparing datasets is a crucial step for many data engineering projects, but it’s often difficult and time-consuming due to the need for custom SQL queries, complex and overburdened spreadsheets, or bespoke Python scripts.
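To make those failure modes concrete, here is a minimal Python sketch of the kind of bespoke comparison script teams often write by hand. It is not Bigeye's code: the function name `diff_records`, the `id` key, and the sample rows are all hypothetical, and it assumes both datasets fit in memory as lists of dictionaries.

```python
from collections import Counter


def diff_records(source, target, key):
    """Classify differences between two lists of dict rows, matched on `key`.

    Illustrative only: assumes `key` is present in every row and is unique
    in the source dataset.
    """
    src_counts = Counter(row[key] for row in source)
    tgt_counts = Counter(row[key] for row in target)
    src_by_key = {row[key]: row for row in source}
    # If a key is duplicated in the target, the last row seen wins here.
    tgt_by_key = {row[key]: row for row in target}

    # Keys that exist in the source but never arrived in the target.
    dropped = sorted(k for k in src_counts if k not in tgt_counts)
    # Keys that appear more often in the target than in the source.
    duplicated = sorted(
        k for k in src_counts if tgt_counts.get(k, 0) > src_counts[k]
    )
    # Keys present on both sides whose row contents differ.
    mutated = sorted(
        k for k in src_by_key
        if k in tgt_by_key and src_by_key[k] != tgt_by_key[k]
    )
    return {"dropped": dropped, "duplicated": duplicated, "mutated": mutated}


# Hypothetical sample data: record 2 is dropped, record 1 is duplicated,
# and record 3 has a changed value.
source = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
target = [
    {"id": 1, "amount": 10},
    {"id": 1, "amount": 10},
    {"id": 3, "amount": 35},
]

result = diff_records(source, target, "id")
print(result)
```

Even this small script has to make choices about keys, duplicates, and memory, which hints at why such one-off checks become hard to maintain across many datasets.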

“We architected Bigeye to be an extensible framework, which allows us to apply data observability to all kinds of exciting use cases. We started by enabling data teams to automatically detect data quality and data pipeline issues. Now with Deltas, customers can easily compare and validate datasets,” said Gryaznov.

Accurate data comparison means accurate data migration

Udacity, an American for-profit provider of online courses, uses Bigeye to automate monitoring and anomaly detection and to create SLAs that ensure data quality and reliable data pipelines. “Udacity has a strong data culture, and we have hundreds of datasets with new additions and enhancements released weekly. The ability to automatically compare datasets before promoting them to production allows our team to apply software engineering best practices, have greater confidence in our data, catch issues we would otherwise miss, and speed up our development process,” said Simon Dong, head of data engineering at Udacity.

Bigeye users can now identify discrepancies between even complex datasets in seconds. Deltas uses Bigeye’s at-runtime query generation to apply the same observability configuration to both datasets, regardless of the SQL dialects of their sources, and detects differences between them. Bigeye promises that Deltas will alert customers to any issues that occur when moving data from A to B.
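The general idea of generating the same metric queries for two datasets and comparing the results can be sketched as follows. This is an illustration under stated assumptions, not Bigeye's implementation: the `METRICS` templates, the function names, and the `staging` and `production` tables are hypothetical, and both tables live in a single in-memory SQLite database rather than in sources with different SQL dialects.

```python
import sqlite3

# Hypothetical metric templates; not Bigeye's actual configuration format.
METRICS = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    "null_count": "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    "distinct_count": "SELECT COUNT(DISTINCT {column}) FROM {table}",
}


def compute_metrics(conn, table, column):
    """Render each template for the given table and column, then run it.

    Identifiers are interpolated directly into the SQL, so this is only
    safe with trusted table and column names.
    """
    results = {}
    for name, template in METRICS.items():
        query = template.format(table=table, column=column)
        results[name] = conn.execute(query).fetchone()[0]
    return results


def compare_tables(conn, source, target, column):
    """Return the metrics whose values differ, as (source, target) pairs."""
    src = compute_metrics(conn, source, column)
    tgt = compute_metrics(conn, target, column)
    return {
        name: (src[name], tgt[name])
        for name in METRICS
        if src[name] != tgt[name]
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, email TEXT)")
conn.execute("CREATE TABLE production (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, None)],
)
conn.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [(1, "a@x.com"), (2, None), (3, None)],
)

drift = compare_tables(conn, "staging", "production", "email")
print(drift)
```

In this sample the row counts match, so a row-count check alone would miss that one email value was lost in transit; comparing several metrics side by side is what surfaces the discrepancy.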

The marketplace’s demand for secure data management

After announcing on September 23 that it had closed a $45 million Series B round led by Coatue, Bigeye wasted no time proving itself. The company now features instant dataset validation in addition to its other complementary products: auto metrics, auto thresholds, and integrations. Will reliability combined with speed give Bigeye an edge over other data observability platforms? Monte Carlo is offering operational analytics, and WhyLabs seems to be positioning itself to lead the way with AI innovation in data observability. However, companies like Instacart, Clubhouse, and Udacity are choosing Bigeye to automate monitoring and anomaly detection and to create SLAs that ensure data quality and reliable data pipelines.

Deltas extends Bigeye’s data observability platform, making it easy to map a source and target, intelligently apply data quality metrics, and detect drift and discrepancies fast. Gryaznov added, “We look forward to enabling more groundbreaking user workflows through data observability in the near future.”

VentureBeat
