---
title: Replacing a cluster node
intro: 'If a node fails in a {% data variables.product.prodname_ghe_server %} cluster, or if you want to add a new node with more resources, mark any nodes to replace as offline, then add the new node.'
product: '{% data reusables.gated-features.cluster %}'
type: how_to
---
## About replacement of {% data variables.product.prodname_ghe_server %} cluster nodes
You can replace a functional node in a {% data variables.product.prodname_ghe_server %} cluster, or you can replace a node that has failed unexpectedly.
After you replace a node, {% data variables.location.product_location %} does not automatically distribute jobs to the new node. You can force your instance to balance jobs across nodes. For more information, see AUTOTITLE.
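As an illustration only, rebalancing is typically driven from the administrative shell with the `ghe-cluster-balance` utility. The subcommands and flags below are assumptions that may differ between {% data variables.product.prodname_ghe_server %} versions, so verify them against the linked article before running anything.

```shell
# Assumed usage of the cluster rebalancing utility; confirm the exact
# subcommands and flags in the linked documentation for your version.

# Review how allocations for balanceable jobs are currently spread across nodes.
ghe-cluster-balance status

# Rebalance allocations for a specific job across the cluster (JOB-NAME is a placeholder).
ghe-cluster-balance rebalance --job JOB-NAME
```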
> [!WARNING]
> To avoid conflicts, do not reuse a hostname that was previously assigned to a node in the cluster.
## Replacing a functional node
You can replace an existing, functional node in your cluster. For example, you may want to provide a virtual machine (VM) with additional CPU, memory, or storage resources.
To replace a functional node, install the {% data variables.product.prodname_ghe_server %} appliance on a new VM, configure an IP address, add the new node to the cluster configuration file, initialize the cluster and apply the configuration, then take the node you replaced offline.
> [!NOTE]
> If you're replacing the primary database node, see "Replacing the primary database node (MySQL or MySQL and MSSQL)" later in this article.
{% data reusables.enterprise_clustering.replacing-a-cluster-node-provision %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-admin-configure-ip %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-modify-cluster-conf %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-initialize-new-node %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-config-node %}
1. To take the node you're replacing offline, from the primary MySQL node of your cluster, run the following command.

   ```shell
   ghe-remove-node NODE-HOSTNAME
   ```

   This command will evacuate data from any data services running on the node, mark the node as offline in your configuration, and stop traffic being routed to the node. For more information, see AUTOTITLE.
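Optionally, before decommissioning the old VM, you can confirm that the replaced node is reported as offline. This check is not part of the documented procedure; it simply filters the output of `ghe-cluster-status`, which is used later in this article, and the output format may vary between versions.

```shell
# Optional, illustrative check run from the primary MySQL node: look for the
# replaced node in the cluster status output and confirm it is marked offline.
ghe-cluster-status | grep NODE-HOSTNAME
```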
## Replacing a node in an emergency
You can replace a failed node in your cluster. For example, a software or hardware issue may affect a node's availability.
> [!NOTE]
> If you're replacing the primary database node, see "Replacing the primary database node (MySQL or MySQL and MSSQL)" later in this article.
To replace a node in an emergency, you'll take the failed node offline, add your replacement node to the cluster, then run commands to remove references to data services on the removed node.
1. To remove the node that is experiencing issues from the cluster, from the primary MySQL node of your cluster, run the following command. Replace `NODE-HOSTNAME` with the hostname of the node you're taking offline.

   ```shell
   ghe-remove-node --no-evacuate NODE-HOSTNAME
   ```

   This command will mark the node as offline in your configuration and stop traffic being routed to the node. You can run this command in `no-evacuate` mode now because, later in this procedure, you'll run commands that instruct data services on the node to copy any replicas onto the other available nodes in the cluster. For more information, see AUTOTITLE.
1. Add your replacement node to the cluster.
{% data reusables.enterprise_clustering.replacing-a-cluster-node-provision %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-admin-configure-ip %}
1. To add the newly provisioned replacement node, on any node, modify the `cluster.conf` file to add the replacement node. For example, this modified `cluster.conf` file adds the newly provisioned node `ghe-replacement-data-node-3`:

   ```
   [cluster "ghe-replacement-data-node-3"]
     hostname = ghe-replacement-data-node-3
     ipv4 = 192.168.0.7
     # ipv6 = fd12:3456:789a:1::7
     git-server = true
     pages-server = true
     mysql-server = true
     elasticsearch-server = true
     redis-server = true
     memcache-server = true
     metrics-server = true
     storage-server = true
   ```

{% data reusables.enterprise_clustering.replacing-a-cluster-node-initialize-new-node %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-config-node %}
1. Remove references to data services on the node you removed.

   1. Find the UUID of the node you removed. To find the UUID, run the following command, replacing `HOSTNAME` with the hostname of the node. You will use this UUID in the next step.

      ```shell
      ghe-config cluster.HOSTNAME.uuid
      ```

   1. To remove references to data services, run the following commands. Replace `UUID` with the UUID of the node. A consolidated sketch that runs all three commands appears after this procedure.

      These commands indicate to each service that the node is permanently removed. The services will recreate any replicas contained within the node on the available nodes within the cluster.

      > [!NOTE]
      > These commands may cause increased load on the server while data is rebalanced across replicas.

      - For the `git-server` service (used for repository data):

        ```shell
        ghe-spokesctl server destroy git-server-UUID
        ```

      - For the `pages-server` service (used for {% data variables.product.prodname_pages %} site builds):

        ```shell
        ghe-dpages remove pages-server-UUID
        ```

      - For the `storage-server` service (used for Git LFS data, avatar images, file attachments, and release archives):

        ```shell
        ghe-storage destroy-host storage-server-UUID --force
        ```
1. Optionally, delete the entry for the removed node in your `cluster.conf` file. Doing so will keep your `cluster.conf` file organized and save time during future `config-apply` runs.

   - To remove the entry from the file, run the following command, replacing `HOSTNAME` with the hostname of the removed node.

     ```shell
     ghe-config --remove-section "cluster.HOSTNAME"
     ```

   - To copy the configuration to other nodes in the cluster, from the administrative shell of the node where you modified `cluster.conf`, run `ghe-cluster-config-apply`.
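The shell sketch below consolidates the data-service cleanup above for convenience. It is illustrative only, not part of the documented procedure: it assumes you run it from the primary MySQL node, and `REMOVED_HOSTNAME` is a hypothetical variable holding the hostname of the node you removed. The individual commands are exactly the ones listed in the steps above.

```shell
#!/bin/bash
# Illustrative sketch: remove references to data services for a removed node,
# using the commands documented in the steps above. Run from the primary MySQL node.
set -euo pipefail

REMOVED_HOSTNAME="ghe-data-node-3"   # hypothetical hostname of the removed node

# Look up the UUID recorded in the cluster configuration for the removed node.
UUID="$(ghe-config "cluster.${REMOVED_HOSTNAME}.uuid")"

# Tell each data service that the node is permanently removed, so replicas are
# recreated on the remaining nodes. These commands may increase load on the server.
ghe-spokesctl server destroy "git-server-${UUID}"
ghe-dpages remove "pages-server-${UUID}"
ghe-storage destroy-host "storage-server-${UUID}" --force
```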
-
## Replacing the primary database node (MySQL or MySQL and MSSQL)
To provide database services, your cluster requires a primary MySQL node and at least one replica MySQL node. For more information, see AUTOTITLE.
If your cluster has {% data variables.product.prodname_actions %} enabled, you will also need to account for MSSQL in the following steps.
If you need to allocate more resources to your primary MySQL (or MySQL and MSSQL) node or replace a failed node, you can add a new node to your cluster. To minimize downtime, add the new node, replicate the MySQL (or MySQL and MSSQL) data, and then promote it to the primary node. Some downtime is required during the promotion process.
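For example, if {% data variables.product.prodname_actions %} is enabled, the entry you add for the replacement node in the steps below carries both database flags. The section name and values here are hypothetical placeholders, shown only to make the shape of the change concrete; your node's configuration may differ.

```
[cluster "ghe-replacement-mysql-node"]
  hostname = ghe-replacement-mysql-node
  ipv4 = IPV4-ADDRESS
  # ipv6 = IPV6-ADDRESS
  consul-datacenter = PRIMARY-DATACENTER
  datacenter = DATACENTER
  mysql-server = true
  mssql-server = true
  redis-server = true
```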
{% data reusables.enterprise_clustering.replacing-a-cluster-node-provision %}
{% data reusables.enterprise_clustering.replacing-a-cluster-node-admin-configure-ip %}
{% data reusables.enterprise_installation.ssh-into-cluster-node %}
{% data reusables.enterprise_clustering.open-configuration-file %}
1. {% data reusables.enterprise_clustering.configuration-file-heading %} Add a new heading for the node and enter the key-value pairs for configuration, replacing the placeholders with actual values.

   - Ensure that you include the `mysql-server = true` key-value pair.
   - If {% data variables.product.prodname_actions %} is enabled in the cluster, you will have to include the `mssql-server = true` key-value pair as well.
   - The following section is an example, and your node's configuration may differ.

   ```
   ...
   [cluster "HOSTNAME"]
     hostname = HOSTNAME
     ipv4 = IPV4-ADDRESS
     # ipv6 = IPV6-ADDRESS
     consul-datacenter = PRIMARY-DATACENTER
     datacenter = DATACENTER
     mysql-server = true
     redis-server = true
     ...
   ...
   ```
{% data reusables.enterprise_clustering.replacing-a-cluster-node-initialize-new-node %}
1. From the administrative shell of the node where you modified `cluster.conf`, run `ghe-cluster-config-apply`. The newly added node will become a replica MySQL node and any other configured services will run there.

   > [!NOTE]
   > The previous snippet does not assume {% data variables.product.prodname_actions %} is enabled in the cluster.
1. Wait for MySQL replication to finish. To monitor MySQL replication from any node in the cluster, run `ghe-cluster-status -v`.

   If {% data variables.product.prodname_actions %} is enabled in the cluster, you will have to wait for MSSQL replication to complete.

   Shortly after adding the node to the cluster, you may see an error for replication status while replication catches up. Replication can take hours depending on the instance's load, the amount of database data, and the last time the instance generated a database seed.
1. During your scheduled maintenance window, enable maintenance mode. For more information, see AUTOTITLE.
1. Ensure that MySQL (or MySQL and MSSQL) replication is finished from any node in the cluster by running `ghe-cluster-status -v`.

   > [!WARNING]
   > If you do not wait for MySQL (or MySQL and MSSQL) replication to finish, you risk data loss on your instance.
1. To set the current MySQL primary node to read-only mode, run the following command from the MySQL primary node.

   ```shell
   echo "SET GLOBAL super_read_only = 1;" | sudo mysql
   ```

1. Wait until Global Transaction Identifiers (GTIDs) set on the primary and replica MySQL nodes are identical. To check the GTIDs, run the following command from any cluster node. A simple polling sketch appears after this procedure.

   ```shell
   ghe-cluster-each -r mysql -- 'echo "SELECT @@global.gtid_executed;" | sudo mysql'
   ```

   - To check that the global MySQL variable was set successfully, run the following command.

     ```shell
     echo "SHOW GLOBAL VARIABLES LIKE 'super_read_only';" | sudo mysql
     ```
1. If {% data variables.product.prodname_actions %} is enabled in the cluster, SSH into the node that will become the new primary MSSQL node.

   ```shell
   ssh -p 122 admin@NEW_MSSQL_NODE_HOSTNAME
   ```

   - From within a `screen` session, run the following command to promote MSSQL to the new node.

     ```shell
     /usr/local/share/enterprise/ghe-mssql-repl-promote
     ```

     This will attempt to access the current primary MSSQL node and perform a graceful failover.
1. After the GTIDs on the primary and replica MySQL nodes match, update the cluster configuration by opening the cluster configuration file at `/data/user/common/cluster.conf` in a text editor.

   - Create a backup of the `cluster.conf` file before you edit the file.
   - In the top-level `[cluster]` section, remove the hostname for the node you replaced from the `mysql-master` key-value pair, then assign the new node instead. If the new node is also a primary Redis node, adjust the `redis-master` key-value pair.
   - If {% data variables.product.prodname_actions %} is enabled in the cluster, you will have to include the `mssql-server = true` key-value pair as well.

   ```
   [cluster]
     mysql-master = NEW-NODE-HOSTNAME
     redis-master = NEW-NODE-HOSTNAME
     primary-datacenter = primary
   ...
   ```
1. In the administrative shell of the node where you modified `cluster.conf`, start a `screen` session and run `ghe-cluster-config-apply`. This command reconfigures the cluster, promoting the newly added node to the primary MySQL node and converting the original primary MySQL node into a replica.

   > [!NOTE]
   > The previous snippet does not assume {% data variables.product.prodname_actions %} is enabled in the cluster.
1. Check the status of the MySQL (or MySQL and MSSQL) replication from any node in the cluster by running `ghe-cluster-status -v`.

1. If {% data variables.product.prodname_actions %} is enabled in the cluster, run the following command from the new MySQL and MSSQL node.

   ```shell
   /usr/local/share/enterprise/ghe-repl-post-failover-mssql
   ```

1. When the MySQL (or MySQL and MSSQL) replication is finished, disable maintenance mode from any node in the cluster. For more information, see AUTOTITLE.
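If you want to watch the GTIDs converge during the wait in this procedure, the loop below simply re-runs the documented check on an interval. It is an illustrative convenience, not an official script; run it from any cluster node and stop it with Ctrl-C once the reported GTID sets are identical.

```shell
# Illustrative sketch: repeat the GTID check from the steps above every 30
# seconds so you can watch the primary and replica MySQL nodes converge.
while true; do
  ghe-cluster-each -r mysql -- 'echo "SELECT @@global.gtid_executed;" | sudo mysql'
  sleep 30
done
```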