LibGo Travel, one of the largest privately held travel companies in the U.S., provides vacation packages through its retail stores and wholesale distribution channels to consumers, partners, travel agents, and stores. The company wanted to expand its offerings by adding dynamic, branded, and personalized packages. To execute this idea, we had to bring together our travel partners, including airlines, hotels, and travel aggregators, as well as LibGo Travel's existing heterogeneous systems environment. As a result, LibGo's Next-Generation Travel System (NGTS) is among the most sophisticated booking systems currently being implemented. Instead of building one-off interfaces for each partner - a time-consuming, expensive, and brittle solution - LibGo adopted a modern SOA with shared business services and Web services: data interchange would be XML-based, and WSDL would be the single interface definition standard.
This report from the trenches focuses on our experience of architecting NGTS as a large-scale composite application that is able to deliver more than a million transactions per day. In the process we mastered the performance, transactional, and security challenges that are unique to an SOA environment; those challenges are the focus of this article. The software architecture is shown in Figure 1.
Tracing a Customer Order
The heart of LibGo's booking system, which sits behind the portal, is NGTS, a J2EE-based composite application that ties together a plethora of local back-end systems and remote partner systems, all based on different technologies but accessed through service interfaces. A single sign-on framework integrated into the booking portal secures agent access and enables the logging of the agents' activities. Each time an agent generates a customer quote, NGTS creates a lead in the Oracle E-Business Suite sales online application. To generate a personalized vacation package for a customer, the agent works through each component of the vacation package, such as airline, hotel, and car. A J2EE-based packaging and promotion engine (based on a rules engine) within NGTS gathers the appropriate schedule and pricing information from partners and internal systems, applies appropriate discounts, and delivers the information to the agent through the portal. When the sale is made, the booking is published to the ESB infrastructure, where it's propagated to the CRM, Financials, and Project Accounting modules. The process is shown in Figure 2.
Airlines, hotels, and other partners may change customers' travel schedules. To handle this scenario, LibGo created a callback service within NGTS that the partner systems call via Web services. NGTS processes this information and publishes it to the ESB so it can be matched against a customer booking, thus triggering notification - either by e-mail or by agents calling customers.
Now, the reality check: not all vendors use HTTP-based communication to send information to LibGo. Pricing and availability information can arrive in multiple formats and over multiple channels, including FTP, flat files, message queues, e-mail, and fax. Also, each of our internal and external systems has a slightly different data representation, even for the same entities. In keeping with LibGo's design principles, a service adapter layer encapsulates capabilities that are implemented in both internal and external partner systems, and abstracts the business logic for access through an XML-based interface defined in WSDL. The enterprise service bus (ESB) infrastructure facilitates communications and provides services such as data transformation and enrichment. NGTS also provides key foundational services on its own, such as common security services, utility services such as fax and e-mail, a shopping cart-like service, and an elaborate rules engine. All front-end systems (including the .NET-based Web presence) use the common services and business logic provided through NGTS and the ESB infrastructure.
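The adapter idea can be sketched in a few lines. This is only an illustration, not LibGo's implementation: the two partner feed formats, the field names, and the PriceQuote type are hypothetical, and Python stands in here for the J2EE adapter layer.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceQuote:
    """Canonical representation shared by all callers (hypothetical fields)."""
    partner: str
    product_code: str
    price_usd: float


def from_flat_file(partner: str, text: str) -> list:
    """Adapter for a partner that delivers comma-separated files, e.g. over FTP."""
    rows = csv.DictReader(io.StringIO(text))
    return [PriceQuote(partner, row["code"], float(row["price"])) for row in rows]


def from_xml(partner: str, text: str) -> list:
    """Adapter for a partner that sends XML with amounts expressed in cents."""
    root = ET.fromstring(text)
    return [
        PriceQuote(partner, item.get("sku"), int(item.get("cents")) / 100)
        for item in root.iter("item")
    ]


# Two differently shaped feeds end up in the same canonical form.
flat_feed = "code,price\nHNL-AIR,412.50\n"
xml_feed = '<rates><item sku="LAS-HTL" cents="8900"/></rates>'
quotes = from_flat_file("airline-a", flat_feed) + from_xml("hotel-b", xml_feed)
```

Callers only ever see PriceQuote objects, so adding a partner with yet another format means adding one adapter function rather than touching the booking logic.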
The rules engine is a truly foundational service used to do all manner of things ranging from implementing the decision logic for alerting (call a customer immediately in case of a same-day flight change), to implementing the result caching policies (refreshes price information for air travel on some routes more often than others), to enabling dynamic pricing and packaging of travel services (mid-week travel and minimum night stays).
Now let's discuss LibGo's experiences with caching, transactions, and security.
Caching Is a Necessity for Real-Time Distributed SOAs
Four key factors drove us to consider implementing caching within the SOA:
Web-only travel vendors must cache heavily to avoid partner fees that are levied each time a search is carried out. Often enough, such "over-caching" results in bad data being served, and customers end up disappointed when they press the Buy button and the offer turns out to be unavailable, or at least not at the promised price. For LibGo, the bottom line was that we needed the ability to define sophisticated and configurable caching rules to deal with price volatility. For example, prices on flights to Hawaii are less volatile than those on Las Vegas flights. Therefore, it might make sense not to cache prices for a particular Las Vegas flight at all. Cache invalidation rules based on destination and heuristics helped LibGo optimize the caching strategy. Figure 3 shows the type of data that is cached.
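As a rough sketch of what a destination-based caching rule might look like, consider the following. The rule table, the time-to-live values, and the function names are assumptions made for illustration; in NGTS such policies live in the rules engine, and the article does not give the actual thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRule:
    destination: str   # airport code the rule applies to
    ttl_seconds: int   # 0 means "never cache, always ask the partner"


# Hypothetical rule table: less volatile routes are cached longer.
RULES = [
    CacheRule("HNL", ttl_seconds=3600),  # Hawaii: relatively stable prices
    CacheRule("LAS", ttl_seconds=0),     # Las Vegas: too volatile to cache
]
DEFAULT_TTL_SECONDS = 600  # assumed fallback for routes without a rule


def ttl_for(destination: str) -> int:
    """Look up how long a price for this destination may be cached."""
    for rule in RULES:
        if rule.destination == destination:
            return rule.ttl_seconds
    return DEFAULT_TTL_SECONDS


def is_fresh(destination: str, cached_at: float, now: float) -> bool:
    """True if a price cached at `cached_at` may still be served at `now`."""
    return (now - cached_at) < ttl_for(destination)
```

Keeping the rules as data rather than code is what makes them configurable: changing the volatility assumption for a route is an edit to the table, not a redeployment.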
Caching at the UI level. This applies to HTML pages and fragments. Using Oracle Application Server's WebCache, information items such as destination, airline, and hotel information can be easily cached. Since the WebCache capability is closely integrated with Oracle Portal, it can cache output from both .NET-based Web systems and the J2EE system that supports live agents. We used events to manage this content cache; for example, triggers on changes within the content management system invalidate the cache in WebCache and force updates. This avoids serving up bad data.
Caching of reference data within the composite application (NGTS). This applies to reference data such as state, country, and destination names, along with other attributes, which together make up 50MB of data that is sourced from remote systems and held in different data stores. Using the Java Object Cache in Oracle Application Server's J2EE container, LibGo cached this data at the mid-tier and avoided the overhead of remote system calls.
Caching of transactional data within the composite application (NGTS). This applies to the most perishable information, such as pricing information for airline tickets and hotels. Each time a search is performed for a travel itinerary, the composite application draws upon the data cached in the Java Object Cache. If pricing information for part of the itinerary isn't available directly from the cache, NGTS pulls the information in from the partner system. The result set is then stored in the cache for subsequent queries, both for the same customer (in case the customer decides to change the hotel but keep the air ticket) and for other customers.
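The lookup pattern described for transactional data can be sketched as follows. This is a simplified illustration in Python, not the Java Object Cache API: the PriceCache class, the component tuples, and the fake partner function are all hypothetical, and expiry is reduced to an explicit invalidate call.

```python
from typing import Callable, Dict, List, Tuple

Component = Tuple[str, str]  # e.g. ("air", "SEA-HNL") or ("hotel", "HNL-123")


class PriceCache:
    def __init__(self, fetch_from_partner: Callable[[Component], float]):
        self._fetch = fetch_from_partner
        self._prices: Dict[Component, float] = {}

    def price_itinerary(self, components: List[Component]) -> Dict[Component, float]:
        """Serve each component from the cache, calling the partner only on a miss."""
        result = {}
        for component in components:
            if component not in self._prices:
                # Cache miss: call the partner once and keep the answer.
                self._prices[component] = self._fetch(component)
            result[component] = self._prices[component]
        return result

    def invalidate(self, component: Component) -> None:
        """Drop an entry, e.g. when a rule or an event marks it stale."""
        self._prices.pop(component, None)


partner_calls = []


def fake_partner(component: Component) -> float:
    """Stand-in for a remote partner call; records each invocation."""
    partner_calls.append(component)
    return 100.0 * len(partner_calls)


cache = PriceCache(fake_partner)
first = cache.price_itinerary([("air", "SEA-HNL"), ("hotel", "HNL-123")])
# The customer keeps the flight but switches hotels: only the hotel is fetched.
second = cache.price_itinerary([("air", "SEA-HNL"), ("hotel", "HNL-456")])
```

Caching per component rather than per itinerary is what lets the second search reuse the air price, which is the case the text calls out.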
For LibGo, caching is a business necessity to meet performance requirements and avoid transaction costs. If you're developing an SOA, we strongly recommend that you consider multiple caching strategies to implement a better SOA. Always remember to cache your system codes. As long as you have management rules to avoid overcaching, caching will produce great returns.
Compensating Transactions Are Here to Stay
Remember the ACID properties of transactions (atomic, consistent, isolated, durable) and rollbacks in the database context? How about XA-based transaction managers that enable distributed transaction management across a number of platforms?
LibGo's business transactions are inherently distributed: travel bookings are usually packages, with each component provided by a different vendor. Early on we realized that most of our partners didn't offer sophisticated transactional interfaces to LibGo. Each time a customer request for a complete travel package (airline, hotel, car, and tickets to attractions) was processed, the NGTS system had to ensure that all constituent parts were available before committing the transaction. While standards bodies are addressing the general need to relax ACID properties and manage more complex business transactions within the context of Web services and SOAs, we needed a pragmatic approach to managing our business transactions. The idea of implementing an all-purpose Web services transaction manager was quickly dismissed. Instead, LibGo decided to build a custom transaction manager, leverage the existing transaction capabilities within our packaged applications, and push the partner transactions that might require compensation to the end of a serial transaction flow.
Avoid Compensating Transactions Where Possible
Our scheme was simple: easy-to-roll-back transactions would be executed up front, while the partner transactions would be delayed until the very end. Since the ERP system had sophisticated transaction capabilities, including rollbacks, LibGo decided that the ERP-specific steps in the overall process would be carried out first. Only if the ERP transaction succeeded would the next gating steps be executed: bookings across partner systems for airlines and hotels. This model of sequentially ordered transactions minimizes the number of compensating transactions.
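The ordering principle can be illustrated with a small sketch. The Step type, the run_booking function, and the log format are hypothetical; the article does not describe the custom transaction manager at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    execute: Callable[[], None]  # raises an exception on failure
    undo: Callable[[], None]     # rollback (local) or compensation (partner)
    is_partner: bool             # True if the step has no native rollback


def run_booking(steps: List[Step]) -> List[str]:
    """Run local steps first and partner steps last; return a log of actions."""
    # sorted() is stable and False sorts before True, so local steps lead.
    ordered = sorted(steps, key=lambda step: step.is_partner)
    log, done = [], []
    for step in ordered:
        try:
            step.execute()
        except Exception:
            log.append(f"failed:{step.name}")
            # Undo completed steps in reverse order.
            for prior in reversed(done):
                prior.undo()
                kind = "compensate" if prior.is_partner else "rollback"
                log.append(f"{kind}:{prior.name}")
            return log
        done.append(step)
        log.append(f"ok:{step.name}")
    return log


def no_seats() -> None:
    raise RuntimeError("no seats left")


def noop() -> None:
    pass


log = run_booking([
    Step("airline", no_seats, noop, is_partner=True),
    Step("erp", noop, noop, is_partner=False),
])
# log is ["ok:erp", "failed:airline", "rollback:erp"]
```

Because the ERP step ran first, the airline failure is handled with an ordinary rollback and no compensating transaction against a partner is needed. Compensation only comes into play when one partner step succeeds and a later partner step fails.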
Work with Partners to Implement Compensating Transaction Logic Where Necessary
Some business transactions will need to be compensated. For this purpose, LibGo worked with partners to implement a scheme using unique transaction IDs that were passed in the XML request or SOAP header of each communication. The transaction ID would be set by the booking engine in NGTS each time a booking was to be made. Then, each partner system would append its own information to this unique transaction ID, denoting the transaction path. This way, if part of a transaction failed, the compensation handler in the NGTS system would have an audit trail that could be used to perform multistep compensating actions. None of this is needed for internal systems, because they typically implement rollbacks themselves.
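A minimal sketch of the transaction-ID scheme follows. The path encoding (slash-separated hops of the form partner:reference), the function names, and the handler registry are assumptions made for illustration; the article does not specify the actual format agreed with partners.

```python
import uuid
from typing import Callable, Dict, List


def new_transaction_id() -> str:
    """Set once per booking by the booking engine."""
    return uuid.uuid4().hex


def append_hop(transaction_path: str, partner: str, reference: str) -> str:
    """What each partner does: append its own marker to the ID it received."""
    return f"{transaction_path}/{partner}:{reference}"


def compensate(transaction_path: str,
               handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Walk the audit trail backwards and undo each partner's step."""
    _transaction_id, *hops = transaction_path.split("/")
    undone = []
    for hop in reversed(hops):
        partner, reference = hop.split(":", 1)
        handlers[partner](reference)  # e.g. call the partner's cancel service
        undone.append(partner)
    return undone


cancelled = []
handlers = {
    "airline": lambda ref: cancelled.append(("airline", ref)),
    "hotel": lambda ref: cancelled.append(("hotel", ref)),
}

path = new_transaction_id()
path = append_hop(path, "airline", "PNR123")
path = append_hop(path, "hotel", "CONF9")
# Suppose the next partner step fails: undo the hotel first, then the airline.
order = compensate(path, handlers)
```

The path doubles as the audit trail: the compensation handler needs no other state to know which partners were touched, in what order, and under which partner-side reference.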
Here are LibGo's ground rules for distributed transactions in SOA:
LibGo also decided on one additional small upgrade: to place timestamps in the SOAP header or XML payload, depending on the service, and to require partners to implement a clock synchronization mechanism along with LibGo. This has greatly helped us troubleshoot complex transactions in the distributed transaction environment. In a distributed environment with multiple Web services, it's always hard to locate performance bottlenecks. However, with the timestamp regime (you'll need a common time server), issues are resolved more quickly because it's possible to pinpoint the culprit right away.
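To show how such timestamps help, here is a small sketch that finds the slowest leg of a traced request. The trace format and the slowest_hop function are hypothetical, and the calculation is only meaningful if all participants stamp messages from synchronized clocks.

```python
from typing import List, Tuple

# (service name, timestamp in milliseconds) recorded as the message reaches each hop.
Hop = Tuple[str, int]


def slowest_hop(hops: List[Hop]) -> Tuple[str, int]:
    """Return the hop with the largest gap since the previous timestamp."""
    if len(hops) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [
        (name, stamp - previous_stamp)
        for (_, previous_stamp), (name, stamp) in zip(hops, hops[1:])
    ]
    return max(gaps, key=lambda gap: gap[1])


# Illustrative trace of one booking request.
trace = [("ngts", 0), ("esb", 50), ("airline", 2450), ("hotel", 2750)]
culprit, elapsed_ms = slowest_hop(trace)
# culprit is "airline": 2400 ms passed between the ESB stamp and the airline stamp.
```

Each gap covers everything between two consecutive stamps, network time included, so it narrows the search to one leg of the flow rather than proving which system is at fault.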
Don't be intimidated by the new challenges of Web services transactions. Yes, you will have to worry about compensating transactions if partners don't participate in a transaction model, but often, a sequential approach to complex business transactions will minimize the chances of compensation. Don't try to solve world hunger; rather, find smart ways to leverage the context of your specific use case to delay compensating partner transactions to the very end. Make a unique transaction ID and a timestamp part of your transaction's SOAP header. Then, you'll be well prepared to execute compensation logic step-by-step to perform transaction playback.
Of course, standards will evolve for handling transaction semantics: WS-BusinessActivity, which formalizes undo semantics for long-running transactions, such as our booking transaction; WS-Coordination, which formalizes transaction IDs used across two or more transaction monitors; and WS-AtomicTransaction, which has stricter transaction semantics than WS-BusinessActivity and ensures strict, guaranteed atomic actions (such as writing a record to both a database and a message queue). At LibGo, we continue to keep an eye on these standards as we implement our SOA.
Pragmatism Pays with Security
LibGo has agents in stores, consumers online, and call center service representatives. The agents are responsible for bookings; managers must be able to obtain information on bookings and override policies. Given that the composite application incorporates business logic in NGTS, many ERP modules, and partner systems, the main challenge was to install a common access, authentication, and authorization framework across the applications that would enforce security and also enable auditing and logging (for compliance reasons).
To achieve this, LibGo used the HR model in Oracle E-Business Suite HRMS, along with Oracle Application Server Single Sign-On (SSO) and Oracle Internet Directory (OID) as the LDAP store. Users, resources, and entitlements from Oracle HR are populated into the OID store, which has application-specific objects. Every application and role has a set of entitlements; for example, agents may be allowed to accept partial payment for over-the-phone bookings, but customers who use the Web interface cannot do the same. All applications are then registered with SSO, which provides single sign-on and role-based access control for all applications via JAAS (the Java Authentication and Authorization Service, which lets applications authenticate users and enforce access controls on them). LibGo uses Oracle Application Server Portal and SSO to consolidate services and bind them into a user interface, and to provide a common security and personalization framework for enabling access to packaged applications, business intelligence and reporting applications, and composite applications in NGTS.
To secure communications between LibGo and external partners, we took the pragmatic approach of using secure frame relay lines, with VPN as a backup solution. Such transport-level security approaches are a little heavy-handed because they secure the entire wire protocol rather than just the SOAP message that is sent over the protocol. Further, in many message-based integration projects, messages pass through several intermediary steps before they arrive at their target endpoint, and transport-level security leaves the messages unsecured at each intermediary checkpoint.
To achieve a finer level of control and to avoid the intermediary security issues, LibGo is moving from today's existing transport-level security to message-level security. WS-Security defines a mechanism for adding three levels of message-level security to SOAP messages:
Conclusion
Building an enterprise-wide SOA is challenging. As more capabilities move into standards and into the middleware stacks of the vendors, however, the task should become easier. For example, when LibGo embarked on this project, Web services orchestration solutions were in their infancy. Now, it is possible to get high performance, manageability, auditability, exception management, and a framework for building compensating transactions from BPEL Process Manager. In building out our SOA, we had a clear view of the evolution of standards and how capabilities around security and transaction management would work their way into products. When building your SOA, make sure you have this view - so you don't end up producing tomorrow's legacy systems.