Updated: 8/30/18

Hopefully you have an established routine of local or cloud backups, and you are sleeping a little better (when you get a chance to sleep). And, as with fire drills or other emergency procedures, you are probably practicing a restore of your backups on another server. You have probably discovered that, depending on the size of your data and the speed of the system, this restore may take hours. Moving from ontape to onbar helped by enabling parallel backup and restore of your dbspaces, but while backups can happen with the lights on, during a restore the server won’t be accessible until the restore is complete. You may not be able to afford for your applications to be down that long.

You don’t have to just react to a failure; you can proactively reduce downtime by using one or more of the Informix replication technologies discussed in the “Informix Replication Technology” blog by Nagaraju Inturi. With High-availability Data Replication (HDR), your database operations can switch to a secondary server in the event of a hardware or disk failure on your primary system. Each of the “for-purchase” licenses authorizes the use of at least one High-Availability server. If you are not using any of the “for-purchase” editions, or want to follow along without touching your actual system, the HDR and RSS technology is available between two instances of the Developer Edition.

This blog shows how to implement an HDR secondary server using the “ifxclone” command, after making a few simple changes to your current server’s configuration, without stopping it (this is, of course, about availability).

Overview of HDR Components

One server plays the role of “Primary”, and all updates made to tables in logging databases are recorded in the log. The Primary sends those log updates to each of the “secondary” servers, where each secondary applies those changes to its physical disk. This continually keeps the data on the secondary consistent with the data on the primary.
If the connection between the Primary and the Secondary is too slow or unstable, it can impact performance on the primary. For these situations, creating a Remote Standalone Secondary (RSS) server allows the Primary to use asynchronous communication to the secondary, with the trade-off being a risk that a primary failure could happen before some committed transactions are recorded on the RSS machine. Therefore, HDR is your High-Availability option, and RSS is better for fast disaster recovery. Some “for-purchase” licenses entitle the use of HDR and RSS simultaneously.

Limitations/Restrictions

HDR and RSS replication technology uses the logs to keep the secondary system up to date. Therefore, for data to be replicated, it must be stored in logging databases and in spaces with logging. It cannot be in BLOB spaces (no logging is done for BLOB data), non-logging smartblob spaces, or external storage. Other restrictions on the configuration of the servers are described in the Administrator’s Guide.

Prerequisites

1. HARDWARE: A second machine with:
Common Setup (from Source Server)

Some changes and files will be the same on both servers. We will make the changes to the files on “host1” (our “server1” machine), and then “ifxclone” will copy them to the new target host (host2). We are avoiding naming our systems “primary” and “secondary” because these are roles that can switch between the servers over time, and perhaps for long periods of time.

Trusted Server Connection

The primary server must trust the connection from the new database server, and we need mutual trust for when the roles of the two servers reverse. While you can do this at the OS level, we will assume you want this trust limited to the database service. This can be done by creating a file in the $INFORMIXDIR/etc directory and naming it in the ONCONFIG file’s REMOTE_SERVER_CFG parameter. For this example, we will create a file named "trusted.hosts" in the "$INFORMIXDIR/etc" directory. This also allows HCL Informix tools to update the trust information later. Create an empty “trusted.hosts” file, and set the file permissions to limit write access to the instance owner (informix) and group (informix):
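The file-creation step can be sketched as a small shell helper. This is only an illustration: the function name create_trusted_hosts is hypothetical, and mode 660 is an assumption about what "limit write access to owner and group" should mean on your system.

```shell
# Hypothetical helper: create an empty trust file in the given etc directory.
# Mode 660 is an assumption: read/write for owner and group, nothing for others.
create_trusted_hosts() {
    etc_dir="$1"
    trust_file="$etc_dir/trusted.hosts"

    # Create the file only if it is missing, so existing entries are kept.
    [ -e "$trust_file" ] || : > "$trust_file" || return 1
    chmod 660 "$trust_file" || return 1

    # Changing ownership normally needs root and an existing informix account.
    if [ "$(id -u)" -eq 0 ] && id informix >/dev/null 2>&1; then
        chown informix:informix "$trust_file" || return 1
    fi

    # Report the path that was prepared.
    printf '%s\n' "$trust_file"
}
```

On the source host it would be called as: create_trusted_hosts "$INFORMIXDIR/etc"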
Now configure the server to use the new file:
Enable “ifxclone” and the “sysadmin:admin” function to set up the connectivity information on all servers in the cluster (for now it is just one).
You can manually edit the file the first time, but we will execute the “admin” SQL function in the “sysadmin” database using dbaccess to update the REMOTE_SERVER_CFG file on all servers in the cluster (I am also adding our current server as trusted so either server can initiate a connection to the other, and this trust information will eventually be copied to our new server too):
HDR ONCONFIG Changes

Next, disable temp table logging, and enable a snapshot copy to be made by the “ifxclone” command:
NOTE:
INFORMIXSQLHOSTS

The “ifxclone” command will attempt to add the new server to the INFORMIXSQLHOSTS file; if an entry for it already exists, “ifxclone” will not properly modify the file to define a server group the way we want. If you already have an entry for the new server, delete it or comment it out for now:
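One possible way to comment out such an entry is sketched below. The helper name comment_out_sqlhosts_entry is hypothetical, and the sketch assumes the server name is the first whitespace-separated field of its sqlhosts line. It keeps a .bak copy so the change is easy to undo.

```shell
# Hypothetical helper: comment out the sqlhosts line whose first field is the
# given server name, keeping a .bak copy of the original file.
comment_out_sqlhosts_entry() {
    sqlhosts_file="$1"
    server_name="$2"

    cp "$sqlhosts_file" "$sqlhosts_file.bak" || return 1

    # Prefix matching lines with "#"; every other line is printed unchanged.
    awk -v name="$server_name" '$1 == name { print "#" $0; next } { print }' \
        "$sqlhosts_file.bak" > "$sqlhosts_file"
}
```

For this walkthrough the call would be: comment_out_sqlhosts_entry "$INFORMIXSQLHOSTS" server2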
Database Logging Mode

Only databases using logging will be replicated. Databases using buffered logging can lose transactions if the log has not been flushed to disk and the server fails. This same window of vulnerability extends to the replica as well. If all databases are logging, or you don’t need the non-logging databases to be replicated, you can move on to the next step. You can list the non-logging databases, and those using buffered logging, by running the following query against the “sysdatabases” table in the “sysmaster” database using “dbaccess”:
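A query along these lines could do the check. This is a sketch, not necessarily the query the original post used: it assumes the sysdatabases flag columns is_logging and is_buff_log, and the wrapper function name is hypothetical.

```shell
# Print a query listing databases that are non-logging or use buffered logging.
# Assumes the sysmaster:sysdatabases flag columns is_logging and is_buff_log.
nonlogging_databases_sql() {
    cat <<'EOF'
SELECT name, is_logging, is_buff_log
  FROM sysdatabases
 WHERE is_logging = 0
    OR is_buff_log = 1;
EOF
}
```

The statement can then be piped to dbaccess, which reads SQL from standard input when "-" is given as the script name: nonlogging_databases_sql | dbaccess sysmaster -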
If no rows are produced, then all databases are using unbuffered logging. For each database you want replicated that is currently non-logging (or is using buffered logging), you can convert it to unbuffered logging mode during an ontape system backup by using the “ontape” option “-U” with the list of database names to change to unbuffered logging. For example, changing “db1” and “db2” to unbuffered logging during the Level 0 system backup requires the following ontape command:
Notes:
With Data Replication (DR) you should create all new databases with some form of logging (ANSI, Unbuffered, or Buffered).

Chunk Path/Device Information

While we are still on server1, you should collect the path information for the chunks, since these paths need to exist on the new server. You can collect this with the “onstat -d” command. We will discuss the output during setup of server2.

Target Setup

Most, if not all, of the new server configuration can be handled by “ifxclone”:
Create Needed Directories and Symlinks to Raw Devices

All dbspace chunk paths used on the primary must be the same on our new server. You will need to create the matching symlinks to the corresponding physical devices. For cooked files, “ifxclone” can create the missing chunk files automatically, even for the root dbspace, but it will not create the parent directories. The parent directories for those paths must already exist for “ifxclone” to successfully create the chunk files. You can find the paths in the “onstat -d” output from “server1”. For this example, “onstat -d” produced the following output, showing that the chunk paths are all under the “/chunkdir” directory:

Then, for each of those paths, make sure the parent directories exist with adequate permissions. The above example only needs the “/chunkdir” directory to exist. It must also have the correct ownership and permissions, so you will need to do these operations as root/administrator:
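As a rough sketch of that step, the helper below creates any missing parent directory for a list of chunk paths. The function name create_chunk_parents is hypothetical, and mode 770 with informix:informix ownership is an assumption; adjust both to match the directories on server1. Existing directories are left untouched.

```shell
# Hypothetical helper: create the missing parent directory of each chunk path
# passed as an argument. Mode 770 and informix:informix ownership are assumptions.
create_chunk_parents() {
    for chunk_path in "$@"; do
        parent=$(dirname "$chunk_path")

        # Leave directories that already exist exactly as they are.
        [ -d "$parent" ] && continue

        mkdir -p "$parent" || return 1
        chmod 770 "$parent" || return 1

        # Changing ownership normally needs root and an existing informix account.
        if [ "$(id -u)" -eq 0 ] && id informix >/dev/null 2>&1; then
            chown informix:informix "$parent" || return 1
        fi
    done
}
```

For the example layout, a call such as create_chunk_parents /chunkdir/rootdbs would create only “/chunkdir” (the chunk file name rootdbs is illustrative).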
Under those directories you can manually create the needed raw-device symlinks (they need permissions of 660, the same as any cooked files you choose to create manually). If the list of chunks is long, you can collect the paths via the following command on “server1”, and use the output to create a script to handle the above steps for creating the required parent directories:
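One way to collect the directories is to filter the “onstat -d” output, as sketched here. The filter name chunk_parent_dirs is hypothetical, and it assumes that the chunk pathname is the last field of each chunk line, that no other line ends in a field starting with "/", and that the paths contain no spaces.

```shell
# Hypothetical filter: read "onstat -d" output on stdin and print each unique
# parent directory. Assumes the chunk pathname is the last field of a chunk
# line and is the only last field that starts with "/".
chunk_parent_dirs() {
    awk '$NF ~ /^\// { print $NF }' |
    while IFS= read -r chunk_path; do
        dirname "$chunk_path"
    done | sort -u
}
```

On “server1” it would be used as: onstat -d | chunk_parent_dirs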
Run “ifxclone”

“ifxclone” will make “server1” a primary, perform a “fake” backup of “server1”, and restore it to “server2” as a new HDR secondary. All changes to logged spaces on “server1” will be continually applied to “server2”. “ifxclone” will add the secondary to the INFORMIXSQLHOSTS file on the “source” server, and copy the “source” server’s ONCONFIG information to the target:
See “The ifxclone utility” section of the Administrator’s Reference for complete information on ifxclone.

Check the Progress

To monitor the Data Replication information and message log, run the following (Ctrl-C to break out):
The server state at the top of each output burst should change from “Initialization” to “Fast Recovery”, then eventually to “Read-Only (Sec)”, and the output should show that the server is paired with server1 and that the Data Replication state is “on”:

Server Failed to Initialize?

If the server failed to initialize, check the message log for errors. Correct any missing paths or filesystem permissions, and make any other needed changes to the ONCONFIG file. Then rerun the “ifxclone” command, but add “--useLocal” so you don’t overwrite the changes you made in the local ONCONFIG.

INFORMIXSQLHOSTS was Updated

The “autoconf” option added a “group” to the server INFORMIXSQLHOSTS file, and assigned each server to the group using the “g=” option. The group name is the name of the source server with “g_” prepended. The INFORMIXSQLHOSTS file from our example contains three lines now:
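To double-check the group assignment, a helper like the following can list the servers that carry a given “g=” option. This is only a sketch: the function name sqlhosts_group_members is hypothetical, and it assumes the usual sqlhosts column order (name, protocol, host, service, options) with options separated by spaces or commas.

```shell
# Hypothetical helper: print the servers in an sqlhosts file whose options
# contain g=<group>. Assumes columns: name, protocol, host, service, options.
sqlhosts_group_members() {
    sqlhosts_file="$1"
    group_name="$2"

    awk -v grp="$group_name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            # Options start at field 5 and may be comma-separated.
            for (i = 5; i <= NF; i++) {
                n = split($i, opts, ",")
                for (j = 1; j <= n; j++)
                    if (opts[j] == "g=" grp) { print $1; next }
            }
        }
    ' "$sqlhosts_file"
}
```

Running sqlhosts_group_members "$INFORMIXSQLHOSTS" g_server1 after the clone should print both server names, one per line.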
Verify Backup Device Configuration

Verify that the ONCONFIG LTAPEDEV and TAPEDEV values (copied from server1 to server2) exist on the new server and have the correct permissions (RWX for owner and group). This will ensure that any automatic and manual backups run as intended. If you need to change them on the secondary later, you can use the “onmode -wf” command while the server is running to update the value. Copying the alarmprogram.sh ensures that the same automatic log backup configuration is used on server2. In the event server2 becomes the primary, the log backups are already configured for operation.

Manual Failover

In the event that server1 is unavailable, or you need to shut it down, the secondary can be made the primary. On the secondary server, you can run the following command, which will both shut down the current primary and make the secondary the new primary (“force” is needed if the primary is already “unavailable”):

onmode -d make primary server2 force

Until you run the command to make the secondary the primary, server2 is read-only (even when configured as an updatable secondary, which forwards updates to the primary).

Making a new Secondary from old Primary

Once you have “server1” working again, and provided it didn’t lose any data, it can become the “secondary” for the current primary (server2). Just restart server1 with “oninit -PHY” (which makes it behave as if it had been physically restored and is waiting for logical recovery), make it the secondary for our current primary so it will recover logical logs from the primary, and monitor the message log until the server is running as a secondary:
If the server wasn’t down long, your cluster is complete again, with “server1” as your secondary. However, if any of the needed logical logs are no longer available directly from the primary, the server will remain in the “Fast Recovery (sec)” state, and the message log will indicate that you need to do a recovery from tape:
In this event, you need to restore the missing logs from the log backups of server2. For example, if you are using “ontape”, you could copy the server2 logical log backups to host1, and place them in the server1 configured LTAPEDEV location. Since the backup names all start with a prefix indicating “host2” and server number “0”, we provide this information via the IFX_ONTAPE_FILE_PREFIX environment variable, and then run the “ontape” command to restore the logical logs:
If you lost a data disk, you can re-clone from “server2” by following the “Target Setup” steps again, but this time the target is “server1”, and the source is “server2”.

Client Failover to Secondary

When planning for server failover in the event of a primary failure, clients connecting to this cluster should be prepared to connect to whichever server is currently the primary and accepting transactions. The traditional way to enable this, without using the Connection Manager, is to connect using sqlhosts groups instead of server names.

Connect to Server Group

Client applications configured to connect to the “g_server1” INFORMIXSERVER group will automatically connect to the first server in the group, and if it is unavailable, will try the next server in the group, until they connect to the current primary. If the application automatically attempts to reconnect following a connection failure, the switch to a different server is transparent to users. Otherwise, restarting the application to force a new connection will find the current primary.

Client applications that use a JDBC URL or JDBC DataSource to connect to the server group need to include options that specify the SQLHOSTS file location, either as a local file path or as a URL from which to retrieve the file, along with the type of path; alternatively, the information can be retrieved via LDAP. Here is an example JDBC URL specifying the INFORMIXSERVER as the server group, SQLH_TYPE, and SQLH_FILE (SQLHOSTS file is “/local/sqlhosts”):
You can see this and additional details and options for connecting to HA servers via JDBC in the IBM Informix JDBC Driver Programmer's Guide.
Further Reading: Connection Manager

The Connection Manager can monitor the primary server’s availability, and automatically force a secondary to become the primary. Clients connecting to the Connection Manager will be directed to the primary server. The Connection Manager can also implement rules to determine the best connection for clients. See the “Connection management through the Connection Manager” section of the Administrator’s Guide.

Kevin Mayfield
Senior Solutions Architect at HCL

Informix is a trademark of IBM Corporation in at least one jurisdiction and is used under license.
1 Comment
Michal Lukaszewicz
1/25/2018 09:27:38 am
This is a very good and comprehensive reading. I was preparing myself to use ifxclone to speed up build process of dev HA boxes and after going through your article I must admit it looks very promising.