Almost all business decisions here are backed by data. Data Platform lies at the heart of the Platform Organization of Netflix, serving all the data needs of the company. We manage state-of-the-art infrastructure, services, and products, and we are constantly innovating to support all our business needs at scale. The following teams in Data Platform support the data needs for the entire lifecycle of applications, from data stores to data movement to data persistence.

The Big Data Platform team is responsible for software that enables our business partners to make business decisions efficiently and with ease. The platform provides an abstraction layer to orchestrate hundreds of thousands of big data workflows and jobs every day, executing on compute engines like Presto, Spark, or Druid. The team aspires to build an intelligent data warehouse that can auto-analyze and auto-optimize, while spearheading Iceberg, an industry-standard analytics data storage format on cloud object stores (AWS S3, in our case).

The Big Data Orchestration (BDO) team provides a platform of choice for scheduling, orchestrating, and executing big data jobs and workflows in an easy-to-use manner. It owns the existing workflow orchestration platform, called "Meson", which is being replaced by a brand new product developed from the ground up by the team, enabling high throughput, horizontal scalability, and advanced parametrized, event-based scheduling. For job abstraction across various engines like Spark and Trino, the team has created an open source service called Genie. The team is also beginning work on the next-generation orchestration architecture, including automatic and efficient ETL triggering and management, among other services.

BDO Team Services Overview

Latest blog from the team on Workflow Orchestration

Jun He and Harrington Joseph at QCon Plus - Robust Foundation for Data Pipelines at Scale - Lessons From Netflix

Please check our recent talk on Workflow Orchestration and Building a Scalable Workflow Scheduler with Netflix Conductor.

Big Data Compute and Warehouse

The Big Data Compute (BDC) team is responsible for providing the cloud-native platform for distributed data processing at Netflix, working with Spark, Presto/Trino, Druid, and Iceberg, as well as other supporting technologies. This team of 8 people (and growing) is central to batch data processing in Data Platform at Netflix. It supports Spark for ETL of data into the petabyte-scale data warehouse, and access to that data through Spark and Presto/TrinoDB. It also provides sub-second latency for a certain class of queries using Druid. The Iceberg table format project was started in the BDC team and is now a thriving open source project.

This is a dream team of highly passionate and intelligent engineers who work really well together. The team works on challenging problems at scale that have a huge impact on the Data Platform. The team has PMC members and committers who shape and contribute to open source projects. Support rotations allow team members to grow operational skills and learn about technologies that may not be their primary focus. Check out some of our talks on Iceberg, Presto, and Druid.

Another team in Data Platform is responsible for the broader data strategy at Netflix. We aim to provide visibility into the taxonomy and business context of all Netflix datasets (including video assets, unstructured logs, etc.). We have built a Netflix-wide data catalog to capture and infer business metadata across all datasets at Netflix and to track the lineage of data as it moves between different datastores. We are also building a Netflix-wide schema registry to help datasets interoperate across systems and to manage the lifecycle of schemas in different datastores. We are designing a centralized yet customizable data detection framework that can sample, detect, and report violations across all datasets, which would give us holistic insight into the risk profiles and quality of our data. We are also building a centralized and extensible policy engine that would allow our stakeholders to customize data policy rules for all datasets. We also own Metacat, our foundational and critical operational metadata store that enables big data processing and compute.

Metacat: Making Big Data Discoverable and Meaningful at Netflix

The Real Time Data Platform team builds software that enables our business partners to transport, process, and sink all data at Netflix.