Transzap Moves to System z for Three 9′s Uptime

By Mary E. Shacklett (published at Mainframe Executive – http://www.mainframe-exec.com)

The story of Transzap and its adoption of a System z9 Business Class processing solution isn’t about re-architecting systems for the purpose of leveraging business applications and preserving the business intelligence of legacy code. Rather, Transzap made a strategic decision to move from an open and distributed computing environment to an open computing environment constructed around System z. In so doing, Transzap believed it had found a robust computing solution that would differentiate it in the highly competitive Software as a Service (SaaS) marketplace.

Transzap’s Business Environment

Located in Denver, Transzap got its start nine years ago as a SaaS company that provided e-payable solutions, data exchange, and analytical reporting to companies in the oil and gas industry. Through its Oildex services, Transzap provides its more than 4,200 customers with information about how and where they spend their money. The Transzap Oildex service is a mission-critical system that oil and gas industry companies depend on for their business workflow and decision-making.

“Offering software as a service is a highly competitive business and an entirely different world from that of traditional software in corporate IT shops,” says Peter Flanagan, Transzap’s CEO. “We have a relatively compact IT staff, but the number of companies and users we serve is the equivalent of that of a very large company. In addition, when you’re a SaaS provider, you’re held to high standards of service and reliability. Our business was growing rapidly, and when we began to look at our projections and consider the impact of a tenfold increase in our business, we started to think about the ramifications of this on everything in our IT infrastructure, starting with CPUs and storage.”

Meeting the Challenges of Three 9′s Uptime

As a SaaS provider, Transzap competes against the internal IT resources of its customers, and also against other industry SaaS providers. Consistently, Transzap must prove it offers a more economical, reliable and powerful solution than any other market or internal company alternative, and is held to strict Service Level Agreements (SLAs).

“What we offer is a mission-critical service to our customers, and to the oil and gas industry,” says Flanagan. “One of the things we must consistently do is to meet all of our SLA commitments.”

A major challenge for many SaaS companies, according to Flanagan, is guaranteeing uptime and availability. “The normal SaaS uptime standard for those providers such as Transzap that are willing to commit to an SLA has been 99.5 percent uptime, but the industry is changing,” says Flanagan. “The new standard that customers are asking for, and which we set for ourselves, is three 9′s uptime of 99.9 percent.”

Three 9′s uptime, or less than 45 minutes per month of downtime, is the standard that Transzap targets for its software services.
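The arithmetic behind that figure can be checked with a short sketch. It assumes a 30-day month by default, and the function name is purely illustrative; the article does not say how Transzap or its SLAs actually measure downtime.

```python
# Illustrative only: converts an uptime percentage into a downtime budget.
# The 30-day month is an assumption, not a figure taken from Transzap's SLAs.
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime allowed over `days` at the given uptime."""
    return (1 - uptime_percent / 100) * days * MINUTES_PER_DAY


# Compare the older 99.5 percent standard with the three 9s target.
for target in (99.5, 99.9):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% uptime -> {minutes:.1f} minutes per 30-day month")
```

Under these assumptions, 99.9 percent works out to roughly 43 minutes of downtime in a 30-day month (just under 45 minutes in a 31-day month), compared with about 216 minutes, or more than three and a half hours, at 99.5 percent.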

“We’ve chosen to include downtime penalties in our SLAs, and the new three 9′s uptime standard that the SaaS industry will eventually be moving toward is something that we’re already moving toward as an internal metric,” says Flanagan. “As a software services supplier, we also have to meet Sarbanes-Oxley [SOX] control requirements in our customers’ business processes.”

As part of its SOX compliance work, Transzap regularly undergoes SAS 70 audits. “We maintain SOX compliance and we set three 9′s uptime as our internal standard, but the third prong of this is the continuous and rapid development cycles for our product to ensure we’re delivering exactly what our customers require in an industry that rapidly changes,” says Flanagan. “The software applications we deliver have to work right the first time, and the Quality of Service [QoS] requirements are acute.”

Flanagan puts himself in the position of a CIO of an energy company, which is his typical customer. “You’re the CIO, and your line of business guy comes to you, wanting you to entrust a mission-critical business process to an outside SaaS provider,” says Flanagan. “If you’re a CIO, you immediately think of the risk. You take a hard look at your own department and the business process, and you wonder if you can’t do the job yourself. There’s no question that you’re going to be skeptical of the SaaS provider, because having email down for a day is one thing, but impacting the functions of a mission-critical system is quite another. For example, what if the SaaS vendor can’t meet your SLAs on uptime and reliability? It’s common knowledge that some SaaS vendors are having trouble meeting their SLAs. We wanted to provide these CIOs with assurances in the reliability of the Transzap system, and we’re willing to stand behind these assurances by including downtime penalties in our SLAs.”

Assessing the Data Center

Before Transzap moved to a new solution for its data center, its processing environment was a distributed architecture of Dell and Linux servers.

“Transzap needed to avoid server sprawl, and it also wanted to control costs,” says David Kreuter, president and senior consultant for VM Resources, which Transzap retained for its data center planning and makeover. “Transzap was foreseeing enormous growth for the commercial applications it was providing to customers over its network,” Kreuter says. “Transzap was a large Oracle user, and also had additional colocation issues related to hardware.”

In 2007, Transzap and VM Resources sat down to start planning a course of action. “Like so many SaaS providers, we had a scaled-out infrastructure and a highly distributed system,” says Flanagan. “We had lots of boxes and racks. We were finding that as we continued to grow, we were running more instances of systems and it was becoming harder to scale out our database. As we started to scale, our payloads were growing and we also were adding complexity. The total scenario made it much more difficult to deliver the quality of service our customers were expecting, and our costs were going up.”

Flanagan and his staff faced a quality of service issue in storage array failures for the Intel-based systems in the data center; these were beginning to affect availability.

“The question for us was, how do you manage the hardware layer of the data center when your staff core competencies are in software and operating systems?” asks Flanagan. “We could encounter a RAID issue for a critical subsystem, and it was hard to pin down the problem and specify it to a particular vendor’s SLA or service contract with us. This was because we had the Linux operating system vendor, the hardware vendor, the database vendor, and they were all telling us it was the other guy’s fault. Meanwhile, we were orchestrating a failover with clustered databases. We concluded that for all our Dell and Intel boxes, invariably our suppliers were putting us in a position where they were relying on us to make the diagnosis.”

Moving to System z

Transzap wanted to consider virtualization in its data center; its first thought was to look at an Intel-VMware solution.

“Instead of buying eight to 16 new boxes from a commercial supplier, we felt we could reduce server sprawl and virtualize by using Red Hat and SUSE Linux on commodity Intel hardware with a VMware Hypervisor,” says Flanagan. “But then we asked ourselves, what kind of service would we get? This service strategy hadn’t worked out for us in the past, so why should it work now?”

It was then that Transzap talked with IBM. “We didn’t have any mainframe experience, and also had been a distributed, open systems shop, but there was this idea that possibly a System z that could run a series of virtual Linux machines could be a solution for us,” says Flanagan. “We started to investigate this. The ultimate drivers for us were that the solution was less complex than anything else we had looked at, and it offered greater availability for our applications.”

In its re-architected data center, Transzap decided to move critical Linux systems to the System z, while maintaining a mix of Intel Linux and Windows servers to run applications with lower availability requirements.

“You might call this a ‘repurposed’ System z because we didn’t have a legacy history with the mainframe and were using it strictly to virtualize our data center, and to improve reliability and quality of service,” says Flanagan. “I also can tell you that since June 2008, when the System z first went into production, the box has never been rebooted.”

Implementation

Transzap had to carefully plan and orchestrate an implementation of System z that wouldn’t disrupt services to its customer base.

“Some of our equipment is at our facility, but all of our production systems were actually hosted in community racks in another secure data center facility here in Colorado,” says Flanagan. “The first thing we had to do was decide where we were going to put our new System z. The original facility wasn’t big enough to add the System z to our own data center, so we might have been one of the first companies to place a mainframe in a caged environment at a new hosting facility where we have all our racks!”

In moving to System z, Transzap had two migrations to do: It had to migrate data and applications to the new system and migrate its internal IT staff skillsets so staff could work on the System z.

“We discussed which system we wanted to move first, and decided that it would be our main transactions database,” says Flanagan. “To do this safely, we first orchestrated several proof of concept migrations with smaller databases.”

Staff training for a new environment was less quantifiable. “For this type of environment, you need z/VM know-how, knowledge of System z, knowledge of virtual networks using virtual servers, skill in knowing how to run Oracle on virtual machines, and system administration skills in Linux,” says VM Resources’ Kreuter.

“We were unsure of how long it would take to acquire System z skillsets at the beginning,” says Flanagan. “We were a small company, and we knew we couldn’t rush out and hire people. This is where working with an outside consultant, in this case VM Resources, was very important to our success. They did a great job of showing us how to allocate resources in the new z/VM environment. Very quickly, we learned how to replicate a physical machine as a virtual machine in z/VM. As we move forward, the relationship with a knowledgeable outside consulting firm is exactly what we need.”

When Transzap went into production on System z in June, it started with two Integrated Facility for Linux (IFL) processors and 16GB of memory. Transzap also was using an IBM DS6800 storage system and TS3400 tape library.

“Transzap didn’t have enough capacity to meet the demands of its Oracle applications,” says Kreuter. “Initially, they encountered response time issues when the system first went live, but the system was never down for their clients.”

The issue was fine-tuning the database, which according to Kreuter, began as a critical situation incident, or CritSit. “IBM was all over this,” Kreuter says. “Their service and support were instrumental in speeding resolution since Transzap was concurrently upgrading Oracle while it was moving the database to System z.”

Ultimately, the solution was to add more capacity to handle the database load. By the second day of production, IBM had helped Transzap expand its System z environment from two IFLs to six. Today, with SQL tuning for Oracle, the environment runs on three IFLs.

For the project, Transzap was using z/VM version 5 release 3, along with SUSE SLES Linux as the virtual guest operating system and Oracle 10g Enterprise Edition as the database.

“Transzap wasn’t just doing a virtualization,” says Kreuter. “It also was bringing on board a rich set of tools for system management, system monitoring, and virtual server automation and deployment. These tools included IBM DIRMAINT for server deployment and management and for storage management, and the Performance Toolkit for real-time performance management of Linux virtual servers with the z/VM Hypervisor.”

The new tools were critical to Transzap’s success. “We had no issues with provisioning and setting up LPARs and z/VM,” says Flanagan. “We encountered the tuning issues that we had to deal with when we went live, but we were able to tune the I/O and memory parameters correctly, and we’re now at a point where we’re quite adept at cloning virtual environments for test suites and customer-specific applications within a two-hour time span. In our old environment of distributed physical servers, this provisioning could take as long as two months.”

Future Directions

Introducing System z to its data center gives Transzap the ability to add capacity easily and as needed, as well as a means of performing system upgrades without worries about server obsolescence. For Flanagan and his staff, this was critical because one of Transzap’s goals is to make what has been an internal uptime standard of three 9′s (99.9 percent) a hard commitment to its customers.

“We have three more database subsystems that are now resident on System z, as well as several different applications and message queue servers based on IBM WebSphere,” says Flanagan. “There also is the potential for future growth to a System z10 Business Class machine, and we’re additionally considering a deployment of DB2 and Alphablox.”

There were many operations and decision points along the way to transforming a highly distributed data center into a more tightly integrated architecture constructed around a System z platform, but Flanagan recalls one specific moment.

“The defining moment for me came one morning at 2:30,” says Flanagan. “We had a technician on duty 24/7 and he got a call from someone at IBM’s monitoring center. The caller from IBM said, ‘Your System z storage array just experienced a drive failure. There’s no need to worry because failover was done online and there is no loss of data. I just wanted to know if you wanted us to come out now. The engineer can be there in 15 minutes to plug in a new drive.’ I knew that we had moved to the System z because we wanted the ability to go to a three 9′s environment and to commit this to our customers, but the level of service was a pleasant surprise, and something we never could have achieved in our old distributed server operation. Competition is very intense in the SaaS space, and with the re-architecture to System z, I’m sleeping easier at night.”