Skip to content

Commit

Permalink
Merge pull request #1141 from run-ai/JamieWeider72-patch-2
Browse files Browse the repository at this point in the history
Create hotfixes-2-18.md
  • Loading branch information
JamieWeider72 authored Oct 1, 2024
2 parents 193a576 + 30de201 commit f4404bb
Showing 1 changed file with 210 additions and 0 deletions.
210 changes: 210 additions & 0 deletions docs/home/changelog/hotfixes-2-18.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
---
title: Changelog Version 2.18
summary: This article lists the fixed and known issues in the patch versions as well as additional new features that were added in each patch version.
author:
- Jamie Weider
date: 2024-Sep-29
---

The following is a list of the known and fixed issues for Run:ai V2.18.

## Version 2.18.46

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-22054 | Fixed an issue where users could not attach to jobs. |
| RUN-22377 | Removed uncached client from accessrule-controller. |
| RUN-21697 | Fixed an issue where client may deadlock on suspension during allocation request. |

## Version 2.18.45

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20073 | Fixed an issue where it wasn't possible to authenticate with user credentials in the CLI. |
| RUN-21957 | Fixed an issue where there was a missing username-loader container in inference workloads. |

## Version 2.18.39

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-22276 | Fixed an issue where Knative external URL was missing from the Connections modal. |
| RUN-22280 | Fixed an issue when setting scale to zero - there was no pod counter in the Workload grid. |
| RUN-19811 | Added an option to set k8s tolerations to run:ai daemonsets (container-toolkit, runai-device-plugin, mig-parted, node-exporter, etc..) . |
| RUN-22128 | Added GID, UID, Supplemental groups to the V1 CLI. |

## Version 2.18.37

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-21800 | Fixed an issue with old workloads residing in the cluster. |

## Version 2.18.34

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-21907 | Fixed an issue where the SSO user credentials contain supplementary groups as string instead of int. |

## Version 2.18.31

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-21272 | Fixed an issue with multi-cluster credinatils creation, specifically with the same name in different clusters. |

## Version 2.18.29

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20680 | Fixed an issue where workloads page do not present requested GPU. |
| RUN-21200 | Fixed issues with upgrades and connections from v2.13. |

## Version 2.18.27

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20970 | Fixed an issue with PUT APIs. |

## Version 2.18.26

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20927 | Fixed an issue where node affinity was not updated correctly in projects edit. |
| RUN-20084 | Fixed an issue where default department were deleted instead of a message being displayed. |
| RUN-21062 | Fixed issues with the API documentation. |

## Version 2.18.25

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20434 | Fixed an issue when creating a Project/Department with memory resources requires 'units'. |
| RUN-20923 | Fixed an issue with projects/departments page loading slowly. |

## Version 2.18.23

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-19872 | Fixed an issue where the Toolkit crashes and fails to create and replace the publishing binaries. |

## Version 2.18.22

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20861 | Fixed an issue where a pod is stuck on pending due to a missing resource reservation pod. |
| RUN-20842 | Fixed an issue of illegal model name with "." in hugging face integration. |
| RUN-20791 | Fix an issue where notifications froze after startup. |
| RUN-20865 | Fixed an issue where default departments are not deleted when a cluster is deleted. |

## Version 2.18.21

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20698 | Fixed an issue where 2 processes requests a device at the same time received the same GPU, causing failures. |

## Version 2.18.18

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20760 | Fixed an issue where workload protection UI shows wrong status. |

## Version 2.18.15

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20612 | Fixed an issue where it was impossible with the use-table-data to hide node pool columns when there is only one default node pool. |
| RUN-20735 | Fixed an issue where nodePool.name is undefined|

## Version 2.18.12

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20721 | Added error handling to nodes pages. |

## Version 2.18.10

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20578 | Fixed an issue regarding policy enforcement. |
| RUN-20188 | Fixed issue with defining SSO in OpenShift identity provider. |

## Version 2.18.9

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20673 | Fixed an issue where a researcher uses a distributed elastic job, it is possible that in a specific flow it is scheduled on more than one node-pools. |

## Version 2.18.7

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20360 | Fixed an issue where the workload network status was misleading. |
| RUN-22107 | Fixed an issue where passwords containing $ were removed from the configuration. |

## Version 2.18.5

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20510 | Fixed an issue with external workloads - argocd workflow failed to be updated. |

## Version 2.18.4

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20516 | Fixed an issue when after deploying to prod, the cluster-service and authorization-service got multiple OOMKilled every ~1 hour. |


## Version 2.18.2

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20485 | Changed policy flags to Beta. |

## Version 2.18.1

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20005 | Fixed an issue where a sidecar container failure failed the workload. |
| RUN-20169 | Fixed an issue allowing the addition of annotations and labels to workload resources. |
| RUN-20108 | Fixed an issue exposing service node ports to workload status. |
| RUN-20160 | Fixed an issue with version display when installing a new cluster in an airgapped environment. |
| RUN-19874 | Fixed an issue when copying and editing a workload with group access to a tool and the group wasn't removed when selecting users option. |
| RUN-19893 | Fixed an issue when using a float number in the scale to zero inactivity value - custom which sometimes caused the submission to fail. |
| RUN-20087 | Fixed an issue where inference graphs should be displayed only for minimum cluster versions. |
| RUN-10733 | Fixed an issue where we needed to minify and obfuscate our code in production. |
| RUN-19962 | Fixed an issue to fix sentry domains regex and map them to relevant projects. |
| RUN-20104 | Fixed an issue where frontend Infinite loop on keycloak causes an error. |
| RUN-19906 | Fixed an issue where inference workload name validation fails with 2.16 cluster. |
| RUN-19605 | Fixed an issue where authorized users should support multiple users (workload-controller) . |
| RUN-19903 | Fixed an issue where inference chatbot creation fails with 2.16 cluster. |
| RUN-20409 | Fixed an issue where clicking on create new compute during the runai model flow did nothing. |
| RUN-11224 | Fixed an issue where ruani-adm collect all logs was not collecting all logs. |
| RUN-20478 | Improved workloads error status in overview panel. |
| RUN-19850 | Fixed an issue where an application administrator could not submit a job with CLI. |
| RUN-19863 | Fixed an issue where department admin received 403 on get tenants and cannot login to UI. |
| RUN-19904 | Fixed an issue when filtering by allocatedGPU in get workloads with operator returns incorrect result. |
| RUN-19925 | Fixed an issue when upgrade from v2.16 to v2.18 failed on worklaods migrations. |
| RUN-19887 | Fixed an issue in the UI when there is a scheduling rule of timeout, the form opened with the rules collapsed and written "none". |
| RUN-19941 | Fixed an issue where completed and failed jobs were shown in view pods in nodes screen. |
| RUN-19940 | Fixed an issue where setting gpu quota failed because the department quota was taken from wrong department. |
| RUN-19890 | Fixed an issue where editing a project by removing its node-affinity stuck updating. |
| RUN-20120 | Fixed an issue where project update fails when there is no cluster version. |
| RUN-20113 | Fixed an issue in the Workloads table where a researcher does not see other workloads once they clear their filters. |
| RUN-19915 | Fixed an issue when turning departments toggles on on cluster v2.11+ the gpu limit is -1 and there is ui error. |
| RUN-20178 | Fixed an issue where dashboard CPU tabs appeared in new overview. |
| RUN-20247 | Fixed an issue where you couldn't create a workload with namespace of a deleted project. |
| RUN-20138 | Fixed an issue where the system failed to create node-type on override-backend env. |
| RUN-18994 | Fixed an issue where some limitations for department administrator are not working as expected. |
| RUN-19830 | Fixed an issue where resources (GPU, CPU, Memory) units were added to k8s events that are published by run:ai scheduler making our messages more readable. |

## Version 2.18.0

| Internal ID | Description |
| ---------------------------- | ---- |
| RUN-20734 | Fixed an issue where the enable/disable toggle for the feature was presenting wrong info. |
| RUN-19895 | Fixed an issue of empty state for deleted workloads which is incorrect. |
| RUN-19507 | Fixed an issue in V1 where get APIs are missing required field in swagger leading to omit empty. |
| RUN-20246 | Fixed an issue in Departments v1 org unit where if unrecognizable params are sent, an error is returned. |
| RUN-19947 | Fixed an issue where pending multi-nodepool podgroups got stuck after cluster upgrade. |
| RUN-20047 | Fixed an issue where Workload status shows as "deleting" rather than "deleted" in side panel. |
| RUN-20163 | Fixed an issue when a DV is shared with a department and a new project is added to this dep - no pvc/pv is created. |
| RUN-20484 | Fixed an issue where Create Projects Requests Returned 500 - services is not a valid ResourceType. |
| RUN-20354 | Fixed an issue when deleting a department with projects resulted in projects remaining in environment with the status NotReady. |

0 comments on commit f4404bb

Please sign in to comment.