
Orchestrating data for machine learning pipelines

March 22, 2022

Via: InfoWorld

Machine learning (ML) workloads require efficient infrastructure to yield rapid results. Model training relies heavily on large data sets, and funneling this data from storage to the training cluster is the first step of any ML workflow, one that significantly impacts the efficiency of model training.

Data and AI platform engineers have long managed data with these questions in mind:

  • Data accessibility: How to make training data accessible when it spans multiple sources and is stored remotely?
  • Data pipelining: How to manage data as a pipeline that continuously feeds data into the training workflow without waiting?
  • Performance and GPU utilization: How to achieve both low metadata latency and high data throughput to keep the GPUs busy?
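The pipelining idea in the list above can be illustrated with a minimal producer-consumer sketch: a background thread loads batches into a bounded queue while the training loop consumes them, so I/O overlaps with computation and the accelerator is not left waiting. The names `load_batch` and `prefetch` and the simulated remote read are illustrative, not from any particular framework.

```python
import queue
import threading
import time

def load_batch(i):
    # Stand-in for a remote read (e.g., fetching a shard from object storage).
    time.sleep(0.01)
    return [i] * 4

def prefetch(num_batches, depth=2):
    """Yield batches loaded by a background thread.

    The bounded queue (maxsize=depth) lets loading run ahead of the
    consumer without buffering the whole data set in memory.
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(sentinel)  # signal end of stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

if __name__ == "__main__":
    for batch in prefetch(5):
        pass  # training step would go here, overlapping with the next load
```

In a real pipeline the queue depth trades memory for slack: deeper queues absorb bursty storage latency, shallower ones keep the memory footprint small.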
