Temporal Cluster deployment guide
This guide provides a comprehensive overview of how to deploy and operate a Temporal Cluster in a live environment.
This guide is a work in progress. Some sections may be incomplete. Information may change at any time.
Legacy production deployment information is available here
Elasticsearch
- Elasticsearch v8 is supported beginning with Temporal Server version 1.18.0
- Elasticsearch v7.10 is supported beginning with Temporal Server version 1.17.0
- Elasticsearch v6.8 is supported through Temporal Server version 1.17.x
- Elasticsearch v6.8 and v7.10 are explicitly supported with AWS Elasticsearch
Advanced Visibility, within the Temporal Platform, is the subsystem and set of APIs that enable the listing, filtering, and sorting of Workflow Executions through an SQL-like query syntax.
Advanced Visibility features depend on an integration with Elasticsearch.
To integrate Elasticsearch with your Temporal Cluster, edit the persistence section of your development.yaml configuration file and run the index schema setup commands.
These steps are needed only if you have a "plain" Temporal Server Docker image.
If you operate a Temporal Cluster using our Helm charts or Docker Compose, the Elasticsearch index schema and index are created automatically using the auto-setup Docker image.
Edit persistence
- Add the advancedVisibilityStore: es-visibility key-value pair to the persistence section. The development_es.yaml file in the temporalio/temporal repo is a working example. The configuration instructs the Temporal Cluster how and where to connect to Elasticsearch storage.
persistence:
  ...
  advancedVisibilityStore: es-visibility
- Define the Elasticsearch datastore connection information under the es-visibility key:
persistence:
  ...
  advancedVisibilityStore: es-visibility
  datastores:
    ...
    es-visibility:
      elasticsearch:
        version: "v7"
        url:
          scheme: "http"
          host: "127.0.0.1:9200"
        indices:
          visibility: temporal_visibility_v1_dev
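To confirm that the scheme, host, and version values match your Elasticsearch deployment, you can query the server's root endpoint, which reports the version number. The address below assumes the local example configuration above:

curl -s "http://127.0.0.1:9200"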
Create index schema and index
Run the following commands to create the index schema and index:
# ES_SERVER is the URL of Elasticsearch server; for example, "http://localhost:9200".
SETTINGS_URL="${ES_SERVER}/_cluster/settings"
SETTINGS_FILE=${TEMPORAL_HOME}/schema/elasticsearch/visibility/cluster_settings_${ES_VERSION}.json
TEMPLATE_URL="${ES_SERVER}/_template/temporal_visibility_v1_template"
SCHEMA_FILE=${TEMPORAL_HOME}/schema/elasticsearch/visibility/index_template_${ES_VERSION}.json
INDEX_URL="${ES_SERVER}/${ES_VIS_INDEX}"
curl --fail --user "${ES_USER}":"${ES_PWD}" -X PUT "${SETTINGS_URL}" -H "Content-Type: application/json" --data-binary "@${SETTINGS_FILE}" --write-out "\n"
curl --fail --user "${ES_USER}":"${ES_PWD}" -X PUT "${TEMPLATE_URL}" -H 'Content-Type: application/json' --data-binary "@${SCHEMA_FILE}" --write-out "\n"
curl --user "${ES_USER}":"${ES_PWD}" -X PUT "${INDEX_URL}" --write-out "\n"
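For reference, the variables used above might be set as follows for a local Elasticsearch v7 installation. These values are examples only; adjust the credentials, index name, and TEMPORAL_HOME path to your environment:

# Example values only -- adjust to your environment.
ES_SERVER="http://localhost:9200"
ES_USER="elastic"                          # omit --user in the curl commands if security is disabled
ES_PWD="changeme"
ES_VERSION="v7"
ES_VIS_INDEX="temporal_visibility_v1_dev"  # must match indices.visibility in the persistence config
TEMPORAL_HOME="/path/to/temporal"          # directory that contains the schema/ folder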
Set Elasticsearch privileges
Ensure that the following privileges are granted for the Elasticsearch Temporal index:
- Read
  - index privileges: create, index, delete, read
- Write
  - index privileges: write
- Custom Search Attributes
  - index privileges: manage
  - cluster privileges: monitor or manage
Add custom Search Attributes (optional)
This step is optional. It adds custom Search Attributes to your Cluster.
Run the following command to create Search Attributes: tctl search-attribute create
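For example, a typical invocation looks like the following. The attribute name and type shown are placeholders, and flag support can vary by tctl version, so check tctl search-attribute create --help for your installation:

tctl search-attribute create --name CustomKeywordField --type Keyword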
Archival
Archival is a feature that automatically backs up Workflow Execution Event Histories and Visibility data from Temporal Cluster persistence to a custom blob store after the Closed Workflow Execution retention period is reached.
Set up Archival
Archival consists of the following elements:
- Configuration: Archival is controlled by the server configuration (that is, the config/development.yaml file).
- Provider: The location where the data should be archived. Supported providers are S3, GCloud, and the local file system.
- URI: Specifies which provider should be used. The system uses the URI schema and path to make the determination.
Take the following steps to set up Archival:
- Set up the provider of your choice.
- Configure Archival.
- Create a Namespace that uses a valid URI and has Archival enabled.
Providers
Temporal directly supports several providers:
- Local file system: The filestore archiver is used to archive data in the file system of whatever host the Temporal server is running on. This provider is used mainly for local installations and testing and should not be relied on for production environments.
- Google Cloud: The gcloud archiver is used to connect and archive data with Google Cloud.
- S3: The s3store archiver is used to connect and archive data with S3.
- Custom: If you want to use a provider that is not currently supported, you can create your own Archiver to support it (see the Custom Archiver section below).
Make sure that you save the provider's storage location URI in a place where you can reference it later, because it is passed as a parameter when you create a Namespace.
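For illustration, the URI scheme identifies the provider; the bucket and path segments below are placeholders:

file:///tmp/temporal_archival/development        # local filestore
gs://my-archival-bucket/temporal_archival/dev    # Google Cloud (gstorage)
s3://my-archival-bucket/temporal_archival/dev    # S3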
Configuration
Archival configuration is defined in the config/development.yaml file.
Let's look at an example configuration:
# Cluster-level Archival config
archival:
  # Event History configuration
  history:
    # Archival is enabled at the cluster level
    state: "enabled"
    enableRead: true
    # Namespaces can use either the local filestore provider or the Google Cloud provider
    provider:
      filestore:
        fileMode: "0666"
        dirMode: "0766"
      gstorage:
        credentialsPath: "/tmp/gcloud/keyfile.json"

# Default values for a Namespace if none are provided at creation
namespaceDefaults:
  # Archival defaults
  archival:
    # Event History defaults
    history:
      state: "enabled"
      # New Namespaces will default to the local provider
      URI: "file:///tmp/temporal_archival/development"
You can disable Archival by setting archival.history.state and namespaceDefaults.archival.history.state to "disabled".
Example:
archival:
  history:
    state: "disabled"
namespaceDefaults:
  archival:
    history:
      state: "disabled"
The following table lists the acceptable values for each configuration key and the purpose each serves.
Config | Acceptable values | Description |
---|---|---|
archival.history.state | enabled, disabled | Must be enabled to use the Archival feature with any Namespace in the cluster. |
archival.history.enableRead | true, false | Must be true to read from the archived Event History. |
archival.history.provider | Sub-provider configs are filestore, gstorage, s3, or your_custom_provider. | The default config specifies filestore. |
archival.history.provider.filestore.fileMode | File permission string | File permissions of the archived files. We recommend using the default value of "0666" to avoid read/write issues. |
archival.history.provider.filestore.dirMode | File permission string | Directory permissions of the archive directory. We recommend using the default value of "0766" to avoid read/write issues. |
namespaceDefaults.archival.history.state | enabled, disabled | The default state of the Archival feature when a new Namespace is created without specifying an Archival state. |
namespaceDefaults.archival.history.URI | Valid URI | Must be a URI of the file store location and match a schema that correlates to a provider. |
Namespace creation
Although Archival is configured at the cluster level, it operates independently within each Namespace.
If an Archival URI is not specified when a Namespace is created, the Namespace uses the value of namespaceDefaults.archival.history.URI from the config/development.yaml file.
The Archival URI cannot be changed after the Namespace is created.
Each Namespace supports only a single Archival URI, but each Namespace can use a different URI.
A Namespace can safely switch Archival between enabled and disabled states as long as Archival is enabled at the cluster level.
Archival is supported in Global Namespaces (Namespaces that exist across Clusters when Multi-Cluster Replication is set up).
When Archival is running in a Global Namespace, it first runs on the active cluster; later it runs on the standby cluster. Before archiving, a history check is done to see what has been previously archived.
Test setup
To test Archival locally, start by running a Temporal server:
./temporal-server start
Then register a new Namespace with Archival enabled.
./tctl --ns samples-namespace namespace register --gd false --history_archival_state enabled --retention 3
If the retention period isn't set, it defaults to two days. The minimum retention period is one day. The maximum retention period is 30 days.
Setting the retention period to 0 results in the error "A valid retention period is not set on request."
Next, run a sample Workflow such as the helloworld temporal sample.
When execution is finished, Archival occurs.
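If you kept the local filestore provider and the default URI from the example configuration, you can confirm that Archival ran by listing the target directory (path taken from the namespaceDefaults URI shown earlier):

ls /tmp/temporal_archival/development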
Retrieve archives
You can retrieve archived Event Histories by copying the workflowId and runId of the completed Workflow from the log output and running the following command:
./tctl --ns samples-namespace wf show --wid <workflowId> --rid <runId>
Custom Archiver
To archive data with a given provider using the Archival feature, Temporal must have a corresponding Archiver component installed.
The platform does not limit you to the existing providers.
To use a provider that is not currently supported, you can create your own Archiver.
Create a new package
The first step is to create a new package for your implementation in /common/archiver. Create a directory in the archiver folder and arrange the structure to look like the following:
temporal/common/archiver
  - filestore/                   -- Filestore implementation
  - provider/
    - provider.go                -- Provider of archiver instances
  - yourImplementation/
    - historyArchiver.go         -- HistoryArchiver implementation
    - historyArchiver_test.go    -- Unit tests for HistoryArchiver
    - visibilityArchiver.go      -- VisibilityArchiver implementation
    - visibilityArchiver_test.go -- Unit tests for VisibilityArchiver
Archiver interfaces
Next, define objects that implement the HistoryArchiver and the VisibilityArchiver interfaces.
The objects should live in historyArchiver.go and visibilityArchiver.go, respectively.
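As a starting point, a minimal skeleton might look like the following. The package, struct, and constructor names are illustrative only; the authoritative interface definitions to satisfy are in /common/archiver/interface.go:

// Illustrative skeleton only; implement the methods declared by the
// HistoryArchiver and VisibilityArchiver interfaces in /common/archiver/interface.go.
package yourimplementation

// historyArchiver will implement the HistoryArchiver interface.
type historyArchiver struct {
  // client and configuration for your blob store go here
}

// visibilityArchiver will implement the VisibilityArchiver interface.
type visibilityArchiver struct {
  // client and configuration for your blob store go here
}

// NewHistoryArchiver and NewVisibilityArchiver are hypothetical constructors
// that the provider (see the next section) can call to create instances.
func NewHistoryArchiver() *historyArchiver { return &historyArchiver{} }

func NewVisibilityArchiver() *visibilityArchiver { return &visibilityArchiver{} }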
Update provider
Update the GetHistoryArchiver and GetVisibilityArchiver methods of the archiverProvider object in the /common/archiver/provider/provider.go file so that it knows how to create an instance of your archiver.
Add configs
Add configs for your archiver to the config/development.yaml file and then modify the HistoryArchiverProvider and VisibilityArchiverProvider structs in /common/common/config.go accordingly.
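As a rough sketch, the provider entry for a custom archiver might look like the following. The yourCustomProvider key and its settings are hypothetical and must match the fields you add to the HistoryArchiverProvider struct:

archival:
  history:
    state: "enabled"
    enableRead: true
    provider:
      # Hypothetical custom provider entry.
      yourCustomProvider:
        endpoint: "https://blobstore.example.com"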
Custom archiver FAQ
If my custom Archive method can automatically be retried by the caller, how can I record and access progress between retries?
Handle this situation by using ArchiverOptions.
Here is an example:
func (a *Archiver) Archive(ctx context.Context, URI string, request *ArchiveRequest, opts ...ArchiveOption) error {
  featureCatalog := GetFeatureCatalog(opts...) // this function is defined in options.go

  var progress progress

  // Check whether the feature for recording progress is enabled.
  if featureCatalog.ProgressManager != nil {
    if err := featureCatalog.ProgressManager.LoadProgress(ctx, &progress); err != nil {
      // Log an error message and return the error if needed.
    }
  }

  // Your archiver implementation...

  // Record the current progress.
  if featureCatalog.ProgressManager != nil {
    if err := featureCatalog.ProgressManager.RecordProgress(ctx, progress); err != nil {
      // Log an error message and return the error if needed.
    }
  }

  return nil
}
If my Archive method encounters an error that is non-retryable, how do I indicate to the caller that it should not retry?
func (a *Archiver) Archive(ctx context.Context, URI string, request *ArchiveRequest, opts ...ArchiveOption) error {
  featureCatalog := GetFeatureCatalog(opts...) // this function is defined in options.go

  err := yourArchiverImpl()

  if nonRetryableErr(err) {
    if featureCatalog.NonRetryableError != nil {
      return featureCatalog.NonRetryableError() // When the caller gets this error type back, it will not retry.
    }
  }

  return err
}
How does my history archiver implementation read history?
The archiver package provides a utility called HistoryIterator, which is a wrapper around ExecutionManager. HistoryIterator is simpler to use than the HistoryManager, which is available in the BootstrapContainer, so archiver implementations can choose to use it when reading Workflow histories.
See the historyIterator.go file for more details.
Use the filestore historyArchiver implementation as an example.
Should my archiver define its own error types?
Each archiver is free to define and return its own errors. However, many common errors that exist between archivers are already defined in common/archiver/constants.go.
Is there a generic query syntax for the visibility archiver?
Currently, no. But this is something we plan to do in the future. As for now, try to make your syntax similar to the one used by our advanced list Workflow API.
Upgrade Server
If a newer version of the Temporal Server is available, a notification appears in the Temporal Web UI.
If you are using a version that is older than 1.0.0, reach out to us at community.temporal.io to ask how to upgrade.
First, check whether the version you want to upgrade to requires a database schema upgrade; if it does, the requirement is called out directly in the release notes. Some releases require changes to the schema and some do not. We ensure that consecutive versions are compatible in terms of database schema upgrades, features, and system behavior; however, there is no compatibility guarantee between any two non-consecutive versions.
When upgrading your Temporal Server version, ensure that you upgrade sequentially. For example, when upgrading from v1.n.x, always upgrade to v1.n+1.x (or the next available version) and so on until you get to the required version.
The Temporal Server upgrade updates or rewrites the old version data with the format introduced in the newer version. Because Temporal Server guarantees backward compatibility between two consecutive minor versions, and because older versions of the code are eventually removed from the code base, skipping versions when upgrading might cause older formats to become unrecognizable. If the old format of the data can't be read to be rewritten to the new format, the upgrades fail.
Check the Temporal Server releases and follow these releases in order. You can skip patch versions; use the latest patch of a minor version when upgrading.
Also be aware that each upgrade requires the History Service to load all Shards and update the Shard metadata, so allow approximately 10 minutes on each version for these processes to complete before upgrading to the next version.
Use one of the upgrade tools to upgrade your database schema to be compatible with the Temporal Server version being upgraded to.
If you are using a schema tools version prior to 1.8.0, we strongly recommend never using the "dryrun" (-y, or --dryrun) option in any of your schema update commands, because using it might lead to loss of data: it creates a new database and drops your existing one.
This flag was removed in the 1.8.0 release.
Upgrade Cassandra schema
If you are using Cassandra for your Cluster's persistence, use the temporal-cassandra-tool
to upgrade both the default and visibility schemas.
Example default schema upgrade:
temporal_v1.2.1 $ temporal-cassandra-tool \
--tls \
--tls-ca-file <...> \
--user <cassandra-user> \
--password <cassandra-password> \
--endpoint <cassandra.example.com> \
--keyspace temporal \
--timeout 120 \
update \
--schema-dir ./schema/cassandra/temporal/versioned
Example visibility schema upgrade:
temporal_v1.2.1 $ temporal-cassandra-tool \
--tls \
--tls-ca-file <...> \
--user <cassandra-user> \
--password <cassandra-password> \
--endpoint <cassandra.example.com> \
--keyspace temporal_visibility \
--timeout 120 \
update \
--schema-dir ./schema/cassandra/visibility/versioned
Upgrade MySQL / PostgreSQL schema
If you are using MySQL or PostgreSQL, use the temporal-sql-tool, which works similarly to the temporal-cassandra-tool.
Refer to this Makefile for context.
PostgreSQL
Example default schema upgrade:
./temporal-sql-tool \
--tls \
--tls-enable-host-verification \
--tls-cert-file <path to your client cert> \
--tls-key-file <path to your client key> \
--tls-ca-file <path to your CA> \
--ep localhost -p 5432 -u temporal -pw temporal --pl postgres --db temporal update-schema -d ./schema/postgresql/v96/temporal/versioned
Example visibility schema upgrade:
./temporal-sql-tool \
--tls \
--tls-enable-host-verification \
--tls-cert-file <path to your client cert> \
--tls-key-file <path to your client key> \
--tls-ca-file <path to your CA> \
--ep localhost -p 5432 -u temporal -pw temporal --pl postgres --db temporal_visibility update-schema -d ./schema/postgresql/v96/visibility/versioned
MySQL
Example default schema upgrade:
./temporal-sql-tool \
--tls \
--tls-enable-host-verification \
--tls-cert-file <path to your client cert> \
--tls-key-file <path to your client key> \
--tls-ca-file <path to your CA> \
--ep localhost -p 3306 -u root -pw root --pl mysql --db temporal update-schema -d ./schema/mysql/v57/temporal/versioned/
Example visibility schema upgrade:
./temporal-sql-tool \
--tls \
--tls-enable-host-verification \
--tls-cert-file <path to your client cert> \
--tls-key-file <path to your client key> \
--tls-ca-file <path to your CA> \
--ep localhost -p 3306 -u root -pw root --pl mysql --db temporal_visibility update-schema -d ./schema/mysql/v57/visibility/versioned/
Roll-out technique
We recommend preparing a staging Cluster and then doing the following to verify that the upgrade is successful:
- Create some simulation load on the staging cluster.
- Upgrade the database schema in the staging cluster.
- Wait and observe for a few minutes to verify that there is no unstable behavior from either the server or the simulation load logic.
- Upgrade the server.
- Repeat the same steps on the live environment cluster.
Health checks
The Frontend Service supports TCP or gRPC health checks on port 7233.
If you use Nomad to manage your containers, the check stanza would look like this for TCP:
service {
  check {
    type     = "tcp"
    port     = 7233
    interval = "10s"
    timeout  = "2s"
  }
}
or like this for gRPC (requires Consul ≥ 1.0.5):
service {
  check {
    type     = "grpc"
    port     = 7233
    interval = "10s"
    timeout  = "2s"
  }
}
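You can also probe the gRPC health endpoint manually with a tool such as grpc-health-probe. The service name below is the one the Frontend Service registers for health checks; adjust the address to your deployment:

grpc-health-probe -addr=localhost:7233 -service=temporal.api.workflowservice.v1.WorkflowService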
Set up Multi-Cluster Replication
The Multi-Cluster Replication feature asynchronously replicates Workflow Execution Event Histories from active Clusters to other passive Clusters, and can be enabled by setting the appropriate values in the clusterMetadata section of your configuration file.
- enableGlobalNamespace must be set to true.
- failoverVersionIncrement must be equal across connected Clusters.
- initialFailoverVersion must be assigned a different value in each Cluster; no value may be shared by connected Clusters.
After the above conditions are satisfied, you can start to configure a multi-cluster setup.
Set up Multi-Cluster Replication prior to v1.14
You can set this up with the clusterMetadata configuration; however, this is meant to be only a conceptual guide rather than a detailed tutorial.
Please reach out to us if you need to set this up.
For example:
# cluster A
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "clusterA"
  currentClusterName: "clusterA"
  clusterInformation:
    clusterA:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: "127.0.0.1:7233"
    clusterB:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: "127.0.0.1:8233"
# cluster B
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "clusterA"
  currentClusterName: "clusterB"
  clusterInformation:
    clusterA:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: "127.0.0.1:7233"
    clusterB:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: "127.0.0.1:8233"
Set up Multi-Cluster Replication in v1.14 and later
You still need to set up the local cluster's clusterMetadata configuration.
For example:
# cluster A
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "clusterA"
  currentClusterName: "clusterA"
  clusterInformation:
    clusterA:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: "127.0.0.1:7233"
# cluster B
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "clusterB"
  currentClusterName: "clusterB"
  clusterInformation:
    clusterB:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: "127.0.0.1:8233"
Then you can use the tctl admin tool to add cluster connections. All operations should be executed in both Clusters.
# Add cluster B connection into cluster A
tctl -address 127.0.0.1:7233 admin cluster upsert-remote-cluster --frontend_address "localhost:8233"
# Add cluster A connection into cluster B
tctl -address 127.0.0.1:8233 admin cluster upsert-remote-cluster --frontend_address "localhost:7233"
# Disable connections
tctl -address 127.0.0.1:7233 admin cluster upsert-remote-cluster --frontend_address "localhost:8233" --enable_connection false
tctl -address 127.0.0.1:8233 admin cluster upsert-remote-cluster --frontend_address "localhost:7233" --enable_connection false
# Delete connections
tctl -address 127.0.0.1:7233 admin cluster remove-remote-cluster --cluster "clusterB"
tctl -address 127.0.0.1:8233 admin cluster remove-remote-cluster --cluster "clusterA"