SwiftOpenAI
An open-source Swift package designed for effortless interaction with OpenAI's public API.
Description
SwiftOpenAI is an open-source Swift package that streamlines interactions with all of OpenAI's API endpoints, now with added support for Azure, AIProxy, and the Assistant stream APIs.
Getting an API Key
⚠️ Important
To interact with OpenAI services, you'll need an API key. Follow these steps to obtain one:
- Visit OpenAI.
- Sign up for an account or log in if you already have one.
- Navigate to the API key page and follow the instructions to generate a new API key.
For more information, consult OpenAI's official documentation.
⚠️ Please take precautions to keep your API key secure per OpenAI's guidance:
Remember that your API key is a secret! Do not share it with others or expose it in any client-side code (browsers, apps). Production requests must be routed through your backend server where your API key can be securely loaded from an environment variable or key management service.
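For local development, one way to keep the key out of your source is to read it from the process environment. This is only a minimal sketch; the variable name OPENAI_API_KEY is a convention used here, not something this package requires:

import Foundation

// Resolve the API key from an environment dictionary so it never lives in source control.
// "OPENAI_API_KEY" is a conventional name used here, not a requirement of the package.
func loadAPIKey(from environment: [String: String] = ProcessInfo.processInfo.environment) -> String? {
    environment["OPENAI_API_KEY"]
}

In Xcode you can set the variable under Scheme -> Run -> Arguments -> Environment Variables. For production, route requests through your backend instead, as noted above.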
SwiftOpenAI has built-in support for AIProxy, which is a backend for AI apps, to satisfy this requirement. To configure AIProxy, see the instructions here.
Installation
Swift Package Manager
- Open your Swift project in Xcode.
- Go to File -> Add Package Dependency.
- In the search bar, enter this URL.
- Choose the version you'd like to install (see the note below).
- Click Add Package.
Note: Xcode has a quirk where it defaults an SPM package's upper limit to 2.0.0. This package is beyond that limit, so you should not accept the defaults that Xcode proposes. Instead, enter the lower bound of the release version that you'd like to support, and then tab out of the input box for Xcode to adjust the upper bound. Alternatively, you may select branch -> main to stay on the bleeding edge.
Usage
To use SwiftOpenAI in your project, first import the package:
import SwiftOpenAI
Then, initialize the service using your OpenAI API key:
let apiKey = "your_openai_api_key_here"
let service = OpenAIServiceFactory.service(apiKey: apiKey)
You can optionally specify an organization ID if needed:
let apiKey = "your_openai_api_key_here"
let organizationID = "your_organization_id"
let service = OpenAIServiceFactory.service(apiKey: apiKey, organizationID: organizationID)
That's all you need to begin accessing the full range of OpenAI endpoints.
How to get the status code of network errors
You may want to build UI around the type of error that the API returns.
For example, a 429 means that your requests are being rate limited. The APIError type has a case responseUnsuccessful with two associated values: a description and a statusCode.
Here is a usage example using the chat completion API:
let service = OpenAIServiceFactory.service(apiKey: apiKey)
let parameters = ChatCompletionParameters(messages: [.init(role: .user, content: .text("hello world"))],
model: .gpt4o)
do {
let choices = try await service.startChat(parameters: parameters).choices
// Work with choices
} catch APIError.responseUnsuccessful(let description, let statusCode) {
print("Network error with status code: \(statusCode) and description: \(description)")
} catch {
print(error.localizedDescription)
}
Audio
Audio Transcriptions
Parameters
public struct AudioTranscriptionParameters: Encodable {
/// The name of the file asset is not documented in OpenAI's official documentation; however, it is essential for constructing the multipart request.
let fileName: String
/// The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
let file: Data
/// ID of the model to use. Only whisper-1 is currently available.
let model: String
/// The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) format will improve accuracy and latency.
let language: String?
/// An optional text to guide the model's style or continue a previous audio segment. The [prompt](https://platform.openai.com/docs/guides/speech-to-text/prompting) should match the audio language.
let prompt: String?
/// The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt. Defaults to json
let responseFormat: String?
/// The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use [log probability](https://en.wikipedia.org/wiki/Log_probability) to automatically increase the temperature until certain thresholds are hit. Defaults to 0
let temperature: Double?
public enum Model: String {
case whisperOne = "whisper-1"
}
public init(
fileName: String,
file: Data,
model: Model = .whisperOne,
prompt: String? = nil,
responseFormat: String? = nil,
temperature: Double? = nil,
language: String? = nil)
{
self.fileName = fileName
self.file = file
self.model = model.rawValue
self.prompt = prompt
self.responseFormat = responseFormat
self.temperature = temperature
self.language = language
}
}
Response
public struct AudioObject: Decodable {
/// The transcribed text if the request uses the `transcriptions` API, or the translated text if the request uses the `translations` endpoint.
public let text: String
}
Usage
let fileName = "narcos.m4a"
let data = try Data(contentsOf: url) // Data for the file named "narcos.m4a"; `url` is the file's location on disk or in your bundle.
let parameters = AudioTranscriptionParameters(fileName: fileName, file: data) // **Important**: in the file name always provide the file extension.
let audioObject = try await service.createTranscription(parameters: parameters)
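All of the optional fields from the initializer above can be supplied as well. A sketch using the same service (the prompt and language values here are purely illustrative):

let parameters = AudioTranscriptionParameters(
    fileName: "narcos.m4a",
    file: data,
    prompt: "Narcos, Pablo Escobar, Medellín", // Vocabulary hints in the audio's language.
    responseFormat: "verbose_json",            // Includes timestamps and segment metadata.
    language: "es")                            // ISO-639-1 code of the input audio.
let audioObject = try await service.createTranscription(parameters: parameters)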
Audio Translations
Parameters
public struct AudioTranslationParameters: Encodable {
/// The name of the file asset is not documented in OpenAI's official documentation; however, it is essential for constructing the multipart request.
let fileName: String
/// The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
let file: Data
/// ID of the model to use. Only whisper-1 is currently available.
let model: String
/// An optional text to guide the model's style or continue a previous audio segment. The [prompt](https://platform.openai.com/docs/guides/speech-to-text/prompting) should match the audio language.
let prompt: String?
/// The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt. Defaults to json
let responseFormat: String?
/// The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use [log probability](https://en.wikipedia.org/wiki/Log_probability) to automatically increase the temperature until certain thresholds are hit. Defaults to 0
let temperature: Double?
public enum Model: String {
case whisperOne = "whisper-1"
}
public init(
fileName: String,
file: Data,
model: Model = .whisperOne,
prompt: String? = nil,
responseFormat: String? = nil,
temperature: Double? = nil)
{
self.fileName = fileName
self.file = file
self.model = model.rawValue
self.prompt = prompt
self.responseFormat = responseFormat
self.temperature = temperature
}
}
Response
public struct AudioObject: Decodable {
/// The transcribed text if the request uses the `transcriptions` API, or the translated text if the request uses the `translations` endpoint.
public let text: String
}
Usage
let fileName = "german.m4a"
let data = try Data(contentsOf: url) // Data for the file named "german.m4a"; `url` is the file's location on disk or in your bundle.
let parameters = AudioTranslationParameters(fileName: fileName, file: data) // **Important**: in the file name always provide the file extension.
let audioObject = try await service.createTranslation(parameters: parameters)
Audio Speech
Parameters
/// [Generates audio from the input text.](https://platform.openai.com/docs/api-reference/audio/createSpeech)
public struct AudioSpeechParameters: Encodable {
/// One of the available [TTS models](https://platform.openai.com/docs/models/tts): tts-1 or tts-1-hd
let model: String
/// The text to generate audio for. The maximum length is 4096 characters.
let input: String
/// The voice to use when generating the audio. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Previews of the voices are available in the [Text to speech guide.](https://platform.openai.com/docs/guides/text-to-speech/voice-options)
let voice: String
/// Defaults to mp3. The format to return the audio in. Supported formats are mp3, opus, aac, and flac.
let responseFormat: String?
/// Defaults to 1.0. The speed of the generated audio. Select a value from 0.25 to 4.0.
let speed: Double?
public enum TTSModel: String {
case tts1 = "tts-1"
case tts1HD = "tts-1-hd"
}
public enum Voice: String {
case alloy
case echo
case fable
case onyx
case nova
case shimmer
}
public enum ResponseFormat: String {
case mp3
case opus
case aac
case flac
}
public init(
model: TTSModel,
input: String,
voice: Voice,
responseFormat: ResponseFormat? = nil,
speed: Double? = nil)
{
self.model = model.rawValue
self.input = input
self.voice = voice.rawValue
self.responseFormat = responseFormat?.rawValue
self.speed = speed
}
}
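A usage sketch mirroring the transcription and translation examples above. The method name createSpeech and the output property are assumptions based on that pattern, not confirmed by this section:

let parameters = AudioSpeechParameters(model: .tts1, input: "Hello, world!", voice: .alloy)
let audioObject = try await service.createSpeech(parameters: parameters)
// audioObject.output holds the generated audio as Data, ready to play or write to disk.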
Response
/// The [audio