Merge pull request #121 from jgaglione/master
updated jgaglion postdoc page
Showing 1 changed file with 18 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,8 +7,8 @@ postdoc-name: Jethro Gaglione | |
title: Post-doctoral researcher | ||
active: True | ||
dates: | ||
start: 2023-10-01 | ||
end: 2024-09-30 | ||
start: 2024-01-01 | ||
end: 2024-12-31 | ||
photo: /assets/images/team/Jethro-Gaglione.jpg | ||
institution: Vanderbilt University | ||
e-mail: [email protected] | ||
|
@@ -26,7 +26,23 @@ presentations: | |
meeting: <Production Group Meeting> | ||
meetingurl: <https://indico.cern.ch/event/1400420/> | ||
|
||
- title: "Machine Learning Training Facility at Vanderbilt - A Prototype for Efficient and Reproducible ML Training" | ||
date: "July 19, 2024" | ||
url: <https://indico.cern.ch/event/1438068/> | ||
meeting: <Fast ML Co-processor Meeting> | ||
meetingurl: <https://indico.cern.ch/event/1438068/> | ||
|
||
current_status: > | ||
<br> | ||
<b>2024 Q3</b> | ||
<br> | ||
This quarter, we made significant progress integrating b-hive, the btag POG ML training framework, into an MLflow project that can be submitted to the Machine Learning Training Facility (MLTF). This work is very close to being merged, at which point it will be the first production CMS ML workflow integrated with the MLTF. | |
Work on hardware capabilities continues to hit delays due to issues with the firmware provided by the manufacturer. Engineers were unable to diagnose the issue remotely, so Vanderbilt shipped the hardware back for hands-on inspection. The inspection was successful: the engineers found a subtle bug at the PCIe layer and flashed updated firmware to fix it. As of this writing, the hardware is being shipped back to Vanderbilt with the manufacturer's assurance that it is fixed. This will, of course, push back hardware-related milestones. | |
|
||
Vanderbilt developers have completed a first draft of the MLflow “gateway” server, which provides REST-based job submission infrastructure (similar to CMS’ CRAB functionality). This will allow automated submission of training tasks (e.g. for CI/CD) via REST, as well as CLI-based job submission through an MLflow plugin that users can install into their environments. The functionality is currently basic, with the API only stubbed out, but token-based authentication is enabled so that the service can be accessed securely. The next step is to implement the missing functions so that this service can be opened to alpha users in Q4. | |
|
||
|
||
|
||
<br> | ||
<b>2024 Q2</b> | ||
|