About
Self Introduction
- Yuki Iwai (he/him)
- Tokyo, Japan (UTC+9)
- Software Engineer and Part-time OSS developer at CyberAgent, Inc.
Technical Interest
- AutoML
- Distributed Training
- Batch Workload
- Kubernetes
OSS Activity
I’m focused on developing Kubernetes-based distributed systems for AutoML, distributed training, and batch workloads.
- Member of Kubernetes Organization
  - kubernetes-sigs/kueue (SIG Scheduling / WG Batch) maintainer
  - kubernetes/kubernetes kube-controller-manager (mainly the batch Job controller) contributor
- Member of Kubeflow Organization
  - Technical Lead for WG AutoML and WG Training
  - kubeflow/katib (WG AutoML) maintainer
  - kubeflow/training-operator (WG Training) maintainer
  - kubeflow/mpi-operator (WG Training) maintainer
- Member of KServe Organization
  - kserve/kserve contributor
Experience
- 2022/04 - current: Software Engineer (Private Cloud) and Part-time OSS developer at CyberAgent, Inc.
  - Development of the on-prem Kubernetes as a Service (KaaS).
    - Security Policies for many Kubernetes Clusters using open-policy-agent/gatekeeper.
  - Development of the on-prem Kubernetes-based Machine Learning Platform.
    - Hyperparameter Tuning System
    - Job Systems for training ML models and predicting target values (inference)
    - Distributed Training System (RDMA/GPU)
    - Serving system for ML models
    - Fully managed interactive development environment (JupyterLab or Jupyter Notebook)
- 2021/07 - 2021/11: Part-time Infra/Software Engineer (Private Cloud) at CyberAgent, Inc.
  - Development of the on-prem Kubernetes as a Service (KaaS).
    - Security Policies for many Kubernetes Clusters using open-policy-agent/gatekeeper.
  - Development of the on-prem Kubernetes-based Machine Learning Platform.
    - Serving system for ML models
    - Job Systems for training ML models
Internship
- 2020/10/01 - 2020/10/31: CyberAgent, Inc.
  - Survey of Kubeflow features
  - Survey of NVIDIA DGX A100 performance and features
  - Blog: https://developers.cyberagent.co.jp/blog/archives/27764/ (Japanese)
- 2020/09/02 - 2020/09/15: Yahoo! Japan
  - Development of monitoring infrastructure for on-prem Kubernetes as a Service (KaaS)
- 2020/08/03 - 2020/08/14: Cybozu
  - Upstream development of Rook
- 2020/07/20 - 2020/07/28: F@N Communications
  - Survey of Vitess features
  - Blog: https://n.fancs.tech/blog/beginnerofvitess/ (Japanese)
Education
- 2020/04 - 2022/03: Electronic Engineering Major, Graduate School of Science and Engineering, Kindai University
- Master of Engineering (Computer Science)
- 2016/04 - 2020/03: Department of Informatics, Faculty of Science and Engineering, Kindai University
- Bachelor of Engineering (Computer Science)
Talks
2020
- ML環境でのRook/Ceph at Japan Rook Meetup #3 (Japanese; translated title: "Rook/Ceph in ML Environments")
2023
- Batch Systems in Production with Kueue: Multi-Tenancy and Fungibility - Yuki Iwai, CyberAgent, Inc. & Aldo Culquicondor, Google at Kubernetes AI+HPC Day North America (English)
2024
- Advanced Resource Management for Running AI/ML Workloads with Kueue - Michał Woźniak, Google & Yuki Iwai, CyberAgent, Inc. at KubeCon EU 2024 Paris (English)
- WG-Batch Updates: What’s New and What Is Next? - Michał Woźniak, Google & Yuki Iwai, CyberAgent, Inc. at KubeCon EU 2024 Paris (English)
- Panel: AutoML and Training Working Group Updates - Andrey Velichkevich, Apple; Yuki Iwai, CyberAgent; Johnu George, Nutanix; Amber Graner, Open Source Evangelist at Kubeflow Summit Europe (English)
Publications
2022
- 入門Kueue〜KubernetesのBatchワークロード最前線〜 (Japanese; translated title: "Introduction to Kueue: The Forefront of Batch Workloads on Kubernetes")
2023
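- Kubernetesの知識地図 —— 現場での基礎から本番運用まで (Japanese; translated title: "A Knowledge Map of Kubernetes: From Fundamentals in the Field to Production Operations")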