scaleway · ldecarvalho-doc · Sep 4, 2024 · Aug 27, 2024 · Aug 27, 2024 · Aug 27, 2024
@@ -0,0 +1,8 @@
+---
+meta:
+  title: Generative APIs - API/CLI
+  description: Generative APIs API/CLI
+content:
+  h1: Generative APIs - API/CLI
+  paragraph: Generative APIs API/CLI
+---
@@ -0,0 +1,37 @@
+---
+meta:
+  title: Understanding errors
+  description: This page explains how to understand errors with Generative APIs
+content:
+  h1: Understanding errors
+  paragraph: This page explains how to understand errors with Generative APIs
+tags: generative-apis ai-data understanding-data
+dates:
+  validation: 2024-09-02
+  posted: 2024-09-02
+---
+
+Scaleway uses conventional HTTP response codes to indicate the success or failure of an API request.
+In general, codes in the 2xx range indicate success, codes in the 4xx range indicate an error caused by the information provided in the request, and codes in the 5xx range indicate an error on Scaleway's servers.
+
+If the response code is not within the 2xx range, the response will contain an error object structured as follows:
+
+```
+{
+  "error": string,
+  "status": number,
+  "message": string
+}
+```
+
+The following are common HTTP error codes:
+
+- 400 - **Bad Request**: The format or content of your payload is incorrect. The body may be too large or fail to parse, or the content type may be mismatched.
+- 401 - **Unauthorized**: The `authorization` header is missing. Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).
+- 403 - **Forbidden**: Your API key does not exist or does not have the necessary permissions to access the requested resource. Find the required permission sets on [this page](/generative-apis/api-cli/using-generative-apis/).
+- 404 - **Route Not Found**: The requested resource could not be found. Check that your request is being made to the correct endpoint.
+- 422 - **Model Not Found**: The `model` key is present in the request payload, but the corresponding model is not found.
+- 422 - **Missing Model**: The `model` key is missing from the request payload.
+- 500 - **API error**: An unexpected internal error has occurred within Scaleway's systems. If the issue persists, please open a support ticket.
+
+For streaming responses via SSE, errors may occur after a 200 response has been returned.
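
The error object described above can be turned into a readable message on the client side. The following is a minimal sketch, not an official client: the function names `describe_error` and `is_server_side` are hypothetical, and the sample body uses made-up values that only follow the documented `error` / `status` / `message` structure.

```python
import json


def describe_error(status_code: int, body: str) -> str:
    """Summarize a non-2xx response using the documented error object."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        # The body was not the documented JSON object; fall back to the status code.
        return f"HTTP {status_code}: no parsable error object"
    error = payload.get("error", "unknown error")
    message = payload.get("message", "")
    return f"HTTP {status_code} ({error}): {message}"


def is_server_side(status_code: int) -> bool:
    """5xx codes point to an error on the server rather than in the request."""
    return 500 <= status_code <= 599


# Hypothetical sample body, for illustration only.
sample = '{"error": "Bad Request", "status": 400, "message": "example message"}'
print(describe_error(400, sample))
```

Separating 4xx from 5xx in this way lets an application decide whether to fix the request or to retry later.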
@@ -0,0 +1,88 @@
+---
+meta:
+  title: Using Chat API
+  description: This page explains how to use the Chat API to query models
+content:
+  h1: Using Chat API
+  paragraph: This page explains how to use the Chat API to query models
+tags: generative-apis ai-data chat-api
+dates:
+  validation: 2024-09-03
+  posted: 2024-09-03
+---
+
+Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have an LLM-driven application that uses one of OpenAI's client libraries, you can easily configure it to point to Scaleway Chat API, and get your existing applications running using open-weight instruct models hosted at Scaleway.
+
+## Create chat completion
+
+Creates a model response for the given chat conversation.
+
+```
+curl --request POST \
+     --url https://api.scaleway.ai/v1/chat/completions \
+     --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
+     --header 'Content-Type: application/json' \
+     --data '{
+     "model": "llama-3.1-8b-instruct",
+     "messages": [
+      {
+        "role": "system",
+        "content": "<string>"
+      },
+      {
+        "role": "user",
+        "content": "<string>"
+      }
+     ],
+     "max_tokens": integer,
+     "temperature": float,
+     "top_p": float,
+     "presence_penalty": float,
+     "stop": "<string>",
+     "stream": boolean
+     }'
+```
+
+
+## Headers
+
+Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).
+
+## Body
+
+### Required parameters
+
+| Param  | Type | Description |
+| ------------- |-------------|-------------|
+| **messages***     | array of objects     | A list of messages comprising the conversation so far.     |
+| **model***      | string     | The name of the model to query.     |
+
+Our Chat API is OpenAI-compatible. See OpenAI’s [API reference](https://platform.openai.com/docs/api-reference/chat/create) for more detailed usage information.
+
+### Supported parameters
+
+- temperature
+- top_p
+- max_tokens
+- stream
+- presence_penalty
+- logprobs
+- stop
+- seed
+
+### Unsupported parameters
+
+- response_format
+- frequency_penalty
+- n
+- top_logprobs
+- tools
+- tool_choice
+- logit_bias
+- user
+
+If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/).
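
As an illustration of the required and supported parameters listed above, the following sketch builds a chat completion request with Python's standard library and rejects any parameter that is not in the supported list before anything is sent. The helper name `build_chat_request` and the placeholder key are assumptions made for this example; the request is only constructed, not executed.

```python
import json
import urllib.request

CHAT_URL = "https://api.scaleway.ai/v1/chat/completions"
# Optional parameters listed as supported on this page.
SUPPORTED = {"temperature", "top_p", "max_tokens", "stream",
             "presence_penalty", "logprobs", "stop", "seed"}


def build_chat_request(secret_key, model, messages, **params):
    """Build (but do not send) a chat completion request."""
    unsupported = sorted(set(params) - SUPPORTED)
    if unsupported:
        raise ValueError(f"unsupported parameters: {', '.join(unsupported)}")
    payload = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {secret_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "my-secret-key",  # placeholder, not a real key
    "llama-3.1-8b-instruct",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=100,
)
# Passing `request` to urllib.request.urlopen would perform the actual call.
```

Checking parameters locally is a design choice for this sketch; how the API itself responds to an unsupported parameter is not specified on this page.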
+
+<Message type="note">
+  To go further, see these [Python code examples](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api) for querying text models using Scaleway's Chat API.
+</Message>
@@ -0,0 +1,54 @@
+---
+meta:
+  title: Using Embeddings API
+  description: This page explains how to use the Embeddings API
+content:
+  h1: Using Embeddings API
+  paragraph: This page explains how to use the Embeddings API
+tags: generative-apis ai-data embeddings-api
+dates:
+  validation: 2024-09-03
+  posted: 2024-09-03
+---
+
+Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have clustering or classification tasks already using one of OpenAI's client libraries, you can easily configure it to point to Scaleway Embeddings API, and get your existing applications running with open-weight embedding models hosted at Scaleway.
+
+## Create embeddings
+
+Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.
+
+```
+curl --request POST \
+     --url https://api.scaleway.ai/v1/embeddings \
+     --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
+     --header 'Content-Type: application/json' \
+     --data '{
+     "model": "sentence-t5-xxl",
+     "input": "<string>"
+     }'
+```
+
+## Headers
+
+Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).
+
+## Body
+
+### Required parameters
+
+| Param  | Type | Description |
+| ------------- |-------------|-------------|
+| **input***     | string or array     | Input text to embed, encoded as a string or array of strings. Cannot be an empty string.  |
+| **model***      | string     | The name of the model to query.     |
+
+Our Embeddings API is OpenAI-compatible. See OpenAI’s [API reference](https://platform.openai.com/docs/api-reference/embeddings) for more detailed usage information.
+
+### Unsupported parameters
+
+- encoding_format (default float)
+- dimensions
+
+If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/).
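
The constraints on `input` and the unsupported parameters above can be checked before a request is sent. This is a minimal sketch under the rules stated on this page; the helper name `build_embeddings_payload` is hypothetical, and whether empty strings inside an array are accepted is not stated here, so the sketch does not check for them.

```python
import json

# Parameters listed as unsupported on this page.
UNSUPPORTED = {"encoding_format", "dimensions"}


def build_embeddings_payload(model, input_text, **params):
    """Return the JSON body for an embeddings request, validating `input`."""
    if isinstance(input_text, str):
        if input_text == "":
            raise ValueError("input cannot be an empty string")
    elif isinstance(input_text, list):
        if not all(isinstance(item, str) for item in input_text):
            raise ValueError("input array must contain only strings")
    else:
        raise TypeError("input must be a string or an array of strings")
    rejected = sorted(UNSUPPORTED & set(params))
    if rejected:
        raise ValueError(f"unsupported parameters: {', '.join(rejected)}")
    return json.dumps({"model": model, "input": input_text, **params})


body = build_embeddings_payload("sentence-t5-xxl", ["first text", "second text"])
print(body)
```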
+
+<Message type="note">
+  See these [Python code examples](/ai-data/generative-apis/how-to/query-embedding-models/#querying-embedding-models-via-api) for querying embedding models using Scaleway's Embeddings API.
+</Message>
@@ -0,0 +1,65 @@
+---
+meta:
+  title: Using Generative APIs
+  description: This page explains how to use Generative APIs
+content:
+  h1: Using Generative APIs
+  paragraph: This page explains how to use Generative APIs
+tags: generative-apis ai-data embeddings-api
+dates:
+  validation: 2024-08-28
+  posted: 2024-08-28
+---
+
+## Access
+
+- Access to this service is restricted while in beta. You can request access to the product via [a form here](https://www.scaleway.com/en/betas/#generative-api).
+- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) is needed.
+
+## Authentication
+
+All requests to the Scaleway Generative APIs must include an `Authorization` HTTP header with your API key prefixed by `Bearer`.
+
+We recommend exporting your secret key as an environment variable, which you can then pass directly in your curl request as follows. Remember to replace the example value with your own API secret key.
+
+```
+export SCW_SECRET_KEY=720438f9-fcb9-4ebb-80a7-808ebf15314b
+```
+
+Curl request:
+
+```
+curl -X GET \
+    -H "Authorization: Bearer ${SCW_SECRET_KEY}" \
+    -H "Content-Type: application/json" \
+    "https://api.scaleway.ai/v1/models"
+```
+
+When using the OpenAI Python SDK, the API key is set once during client initialization, and the SDK automatically manages the inclusion of the Authorization header in all API requests. 
+In contrast, when directly integrating with the Scaleway Generative APIs, you are responsible for manually setting the Authorization header with the API key for each request to ensure proper authentication.
+
+## Content types
+
+Scaleway Generative APIs accept JSON in request bodies and return JSON in response bodies.
+Send the `Content-Type: application/json` HTTP header with your requests.
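
When integrating without an SDK, the two headers described in the Authentication and Content types sections can be assembled once and reused for every request. The sketch below reads the key from the `SCW_SECRET_KEY` environment variable used in the curl example; the helper name `scaleway_headers` and its `env` argument are assumptions made for illustration.

```python
import os


def scaleway_headers(env=os.environ):
    """Build the Authorization and Content-Type headers from the environment."""
    key = env.get("SCW_SECRET_KEY")
    if not key:
        # Fail early rather than sending a request without credentials.
        raise RuntimeError("SCW_SECRET_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# A plain dict stands in for the real environment in this example.
print(scaleway_headers({"SCW_SECRET_KEY": "example-key"}))
```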
+
+## Permissions
+
+Permissions define the actions a user or an application can perform on Scaleway Generative APIs. They are managed using Scaleway’s [Identity and Access Management](/identity-and-access-management/iam/quickstart/) interface.
+
+[Owner](/identity-and-access-management/iam/concepts/#owner) status or certain [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allow you to perform actions in the intended Organization.
+
+Querying AI models hosted by Scaleway Generative APIs requires one of the following [permission sets](/identity-and-access-management/iam/concepts/#permission-set):
+
+- **GenerativeApisModelAccess**
+- **GenerativeApisFullAccess**
+- **AllProductsFullAccess**
+
+## Projects
+
+
+
+
+
+
+
@@ -0,0 +1,25 @@
+---
+meta:
+  title: Using Models API
+  description: This page explains how to use the Models API
+content:
+  h1: Using Models API
+  paragraph: This page explains how to use the Models API
+tags: generative-apis ai-data embeddings-api
+dates:
+  validation: 2024-09-02
+  posted: 2024-09-02
+---
+
+Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs.
+Using the Models API, it is easy to list the various AI models available at Scaleway.
+
+## List models
+
+Lists the available models, and provides basic information about each one.
+
+```
+curl -s \
+     --url "https://api.scaleway.ai/v1/models" \
+     --header "Authorization: Bearer ${SCW_SECRET_KEY}"
+```
@@ -0,0 +1,65 @@
+---
+meta:
+  title: Generative APIs - Concepts
+  description: This page explains all the concepts related to Generative APIs
+content:
+  h1: Generative APIs - Concepts
+  paragraph: This page explains all the concepts related to Generative APIs
+tags:
+dates:
+  validation: 2024-08-27
+categories:
+  - ai-data
+---
+
+## API rate limits
+
+API rate limits define the maximum number of requests a user can make to the Generative APIs within a specific time frame. Rate limiting helps to manage resource allocation, prevent abuse, and ensure fair access for all users. Understanding and adhering to these limits is essential for maintaining optimal application performance using these APIs.
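
A common client-side way to stay within rate limits is to wait progressively longer between retries. The sketch below computes a capped exponential backoff schedule; it is a generic pattern rather than a documented Scaleway recommendation, and the actual limits and the response returned when a limit is exceeded are not specified on this page. The function name and default values are assumptions.

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Return the wait time in seconds before each retry, doubling up to `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


print(backoff_delays(5, base=1.0, cap=10.0))
```

In practice, a small random jitter is often added to each delay so that many clients do not retry at the same instant.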
+
+## Context window
+
+The context window is the maximum amount of prompt data that the model considers when generating a response. It is measured in tokens. With models that have a larger context window, you can provide more information to generate relevant responses.
+
+## Embeddings
+
+Embeddings are numerical representations of text data that capture semantic information in a dense vector format. In Generative APIs, embeddings are essential for tasks such as similarity matching, clustering, and serving as inputs for downstream models. These vectors enable the model to understand and generate text based on the underlying meaning rather than just the surface-level words.
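
Similarity matching between two embeddings is commonly done with cosine similarity, which compares the direction of two vectors. The following sketch uses short made-up vectors; real embeddings returned by a model have many more dimensions, and the function name is chosen for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

A value close to 1 suggests similar meaning, while a value close to 0 suggests unrelated content.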
+
+## Error handling
+
+Error handling refers to the strategies and mechanisms in place to manage and respond to errors during API requests. This includes handling network issues, invalid inputs, or server-side errors. Proper error handling ensures that applications using Generative APIs can gracefully recover from failures and provide meaningful feedback to users.
+
+## Parameters
+
+Parameters are settings that control the behavior and performance of generative models. These include temperature, max tokens, and top-p sampling, among others. Adjusting parameters allows users to tweak the model's output, balancing factors like creativity, accuracy, and response length to suit specific use cases.
+
+## Inter-token Latency (ITL)
+
+The inter-token latency corresponds to the average time elapsed between two generated tokens. It is usually expressed in milliseconds.
+
+## Prompt Engineering
+
+Prompt engineering involves crafting specific and well-structured inputs (prompts) to guide the model towards generating the desired output. Effective prompt design is crucial for generating relevant responses, particularly in complex or creative tasks. It often requires experimentation to find the right balance between specificity and flexibility.
+
+## Retrieval Augmented Generation (RAG)
+
+Retrieval Augmented Generation (RAG) is a technique that enhances generative models by integrating information retrieval methods. By fetching relevant data from external sources before generating a response, RAG ensures that the output is more accurate and contextually relevant, especially in scenarios requiring up-to-date or specific information.
+
+## Stop words
+
+Stop words are a parameter set to tell the model to stop generating further tokens after one or more chosen tokens have been generated. This is useful for controlling the end of the model output, as it will cut off at the first occurrence of any of these strings.
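
To make the cut-off behavior concrete, the following sketch reproduces it locally on a plain string: the text is truncated at the first occurrence of any stop string. It only illustrates the idea and does not call the API; the function name is hypothetical, and excluding the stop string itself from the result is an assumption borrowed from the OpenAI convention.

```python
def truncate_at_stop(text: str, stop: list) -> str:
    """Cut `text` at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for marker in stop:
        if not marker:
            continue  # an empty marker would match everywhere, so skip it
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(repr(truncate_at_stop("Step 1\nStep 2\nEND more text", ["END", "\n\n"])))
```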
+
+## Streaming
+
+Streaming is a parameter that allows responses to be delivered in real time, showing parts of the output as they are generated rather than waiting for the full response. Scaleway follows the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard. This behavior usually enhances user experience by providing immediate feedback and a more interactive conversation.
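
A streamed response arrives as a sequence of `data:` lines. The sketch below parses such lines into JSON chunks; the `[DONE]` terminator and the `choices[0].delta.content` chunk shape are assumptions taken from the OpenAI streaming convention and are not confirmed on this page. The sample lines are made up, and the function name is hypothetical.

```python
import json


def iter_sse_data(lines):
    """Yield the parsed JSON payload of each `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        yield json.loads(data)


# Hypothetical stream, for illustration only.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(chunk["choices"][0]["delta"].get("content", "") for chunk in iter_sse_data(stream))
print(text)
```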
+
+## Temperature
+
+Temperature is a parameter that controls the randomness of the model's output during text generation. A higher temperature produces more creative and diverse outputs, while a lower temperature makes the model's responses more deterministic and focused. Adjusting the temperature allows users to balance creativity with coherence in the generated text.
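
The effect of temperature can be illustrated with the standard softmax formulation, in which the model's scores (logits) are divided by the temperature before being converted to probabilities. This is a generic sketch with made-up logits; how each hosted model applies temperature internally is not described on this page.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 2.0)
print(low)
print(high)
```

With the lower temperature, most of the probability concentrates on the top-scoring token; with the higher one, the distribution flattens and less likely tokens are sampled more often.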
+
+## Time to First Token (TTFT)
+
+Time to First Token (TTFT) measures the time elapsed from the moment a request is made to the point when the first token of the generated text is returned. TTFT is a crucial performance metric for evaluating the responsiveness of generative models, especially in interactive applications where users expect immediate feedback.
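
Both TTFT and the inter-token latency defined earlier can be derived from timestamps recorded on the client while a streamed response is received. The following is a minimal sketch with made-up timestamps in seconds; the function name is hypothetical, and client-side measurements also include network time.

```python
def latency_metrics(request_time, token_times):
    """Return (TTFT, average inter-token latency) from arrival timestamps."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times[0] - request_time
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    # With a single token there are no gaps, so report 0.0 for the average.
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl


# Request sent at t=0.0 s; tokens received at 0.5 s, 0.6 s, and 0.8 s.
ttft, itl = latency_metrics(0.0, [0.5, 0.6, 0.8])
print(f"TTFT: {ttft * 1000:.0f} ms, ITL: {itl * 1000:.0f} ms")
```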
+
+## Tokens
+
+Tokens are the basic units of text that a generative model processes. Depending on the tokenization strategy, these can be words, subwords, or even characters. The number of tokens directly affects the context window size and the computational cost of using the model. Understanding token usage is essential for optimizing API requests and managing costs effectively.
@@ -0,0 +1,8 @@
+---
+meta:
+  title: Generative APIs - How Tos
+  description: Generative APIs How Tos
+content:
+  h1: Generative APIs - How Tos
+  paragraph: Generative APIs How Tos
+---