
Orchestrating data for machine learning pipelines

March 22, 2022

Via: InfoWorld

Machine learning (ML) workloads require efficient infrastructure to yield rapid results. Model training relies heavily on large data sets, and funneling this data from storage to the training cluster is the first step of any ML workflow, one that significantly impacts the efficiency of model training.

Data and AI platform engineers have long managed data with these questions in mind:

  • Data accessibility: How to make training data accessible when it spans multiple sources and is stored remotely?
  • Data pipelining: How to manage data as a pipeline that continuously feeds data into the training workflow without waiting?
  • Performance and GPU utilization: How to achieve both low metadata latency and high data throughput to keep the GPUs busy?
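The pipelining idea in the list above can be illustrated with a minimal producer-consumer sketch: a background thread loads batches into a bounded queue while the training loop consumes them, so I/O overlaps with computation and the accelerator is not left waiting. The names `load_batch` and `prefetch` and the simulated remote read are illustrative, not from any particular framework.

```python
import queue
import threading
import time

def load_batch(i):
    # Stand-in for a remote read (e.g., fetching a shard from object storage).
    time.sleep(0.01)
    return [i] * 4

def prefetch(num_batches, depth=2):
    """Yield batches loaded by a background thread.

    The bounded queue (maxsize=depth) lets loading run ahead of the
    consumer without buffering the whole data set in memory.
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(sentinel)  # signal end of stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

if __name__ == "__main__":
    for batch in prefetch(5):
        pass  # training step would go here, overlapping with the next load
```

In a real pipeline the queue depth trades memory for slack: deeper queues absorb bursty storage latency, shallower ones keep the memory footprint small.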
