Business Case: Ellis Island Immigration Records Open to the Internet's Huddled Masses


March 4, 2002 02:00 PM

Network Magazine

Industry: Nonprofit organization
Headquarters: New York, NY
Technology in focus: Load-balancing
Project leader: Sam Daniel, director of information technology

Business challenge: On April 17, 2001, the Statue of Liberty-Ellis Island Foundation launched its Web site, containing extensive information on immigrants processed through the Port of New York and Ellis Island between 1892 and 1924. The site enables visitors to search through a massive database to find information about ancestors processed there during that time. But traffic to the site was much heavier than expected, and many users couldn't gain access. To complicate matters, the foundation added e-commerce capabilities to the site in June 2001, enabling users to purchase items such as ship manifest reproductions, as well as images of the ships their ancestors traveled on. Adding e-commerce meant the site had to maintain high availability levels.

Solution: Several hours after the site launched, the foundation swapped out its two Radware Web Server Director (WSD) load-balancing systems for updated models. The devices were housed at its Web-hosting facility, Hostcentric. The new devices, used in an active-active configuration, have helped to eliminate bottlenecks by distributing the traffic more evenly among the foundation's servers. The devices provide redundancy, higher availability levels, and enhanced bandwidth-management capabilities. Fault-tolerance mechanisms help to ensure that if a server goes down, the device reroutes traffic automatically without disruption.

The system's ability to maintain session state with AOL proxy servers has helped to ensure a high-quality experience for the site's many AOL users. The load-balancing systems have also eliminated the need for a large, expensive dedicated server for these users.

On April 17, 2001, Web surfers from around the globe embarked on a virtual journey that led them to fascinating discoveries about their ancestors. But some would-be passengers were left waiting on the shores of cyberspace, unable to access the Web site housing a massive catalog of U.S. immigration records. Hordes of visitors had already flocked there, rendering the site unavailable to many seekers of genealogical treasures.

The travelers' intended destination, www.ellisislandrecords.org, contains a database of information on 22 million immigrants and other passengers who came to the United States through the Port of New York and Ellis Island between 1892 and 1924. Established by a nonprofit organization called the Statue of Liberty-Ellis Island Foundation, the site provides a wealth of information on these arrivals. Visitors to the site can look up immigrant names, gender, ethnicity, marital status, arrival dates, age at the time of arrival, ports of departure, last residence, and the names of the ships they traveled on. The site, called the American Family Immigration History Center (AFIHC), hosts more than three million ship manifest pages and images of more than 800 passenger ships.

Since the site's launch date, information on Ellis Island immigrants has been made much more accessible. But the initial site-availability problems showed the foundation needed to fine-tune traffic management to keep history from repeating itself.

FOR THE RECORDS
The foundation was established in 1982, long before the AFIHC and its Web site were a gleam in board members' eyes. Its primary mission was to restore and preserve the Statue of Liberty and Ellis Island. In 1990, the restoration's centerpiece, the Ellis Island Immigration Museum, opened to the public. But many visitors mistakenly assumed that the immigration records of those processed through the island were housed at the museum, says Peg Zitko, director of public affairs for the foundation. At the time, these records actually resided on microfilm at the National Archives. With so many museum visitors requesting access to these records, the foundation decided to make them available in digital form, hence the AFIHC.

The AFIHC was established through a partnership between the foundation, the National Park Service, and the Church of Jesus Christ of Latter-day Saints. The undertaking involved about 12,000 volunteers from the church who helped to compile data from the ships' manifests. In April 2001, this information was digitized and made available to visitors to the island, as well as virtual visitors on the Web.

The project evolved to encompass partners, including Compaq and Oracle, which signed on as sponsors, and Radware (www.radware.com), which joined as a supplier. Today, the foundation has about 30 permanent employees, but ongoing operation and site maintenance involves staff at Hostcentric (www.hostcentric.com), the foundation's Web-hosting facility, as well as many contractors and vendor support staff.

OPENING THE FLOODGATES
It was no surprise the site generated such intense interest. After all, it holds the ancestry records of 100 million Americans, according to the foundation.

But nobody was ready for the deluge of traffic that ensued when the site went live. The site processed roughly eight million hits within its first six hours of operation, but requests were actually arriving at a staggering 27,000 hits per second, says Zitko. On Lycos, ellisislandrecords immediately became the number-one search term and remained in the top 50 for six weeks, she says.

Some disgruntled users who couldn't access the site expressed their frustration via e-mail. Others stayed up late at night, and into the early hours of the morning, trying to catch a lull in the traffic.

The site still gets plenty of hits around the clock, says Sam Daniel, the foundation's director of information technology. "According to our WebTrends reports, the site has very little downtime," says Daniel. WebTrends provides detailed data on Web site traffic patterns and visitors, which is then converted into reports.

The number and combination of variables that users can use to search the database undoubtedly drives much of this traffic. For example, visitors can search multiple spellings of an ancestor's name, search by last name only, or by name and gender. Each of these options translates to more potential hits against the database.

To complicate the traffic-management issue, the foundation launched the site's e-commerce portion in June 2001. Here, visitors can order reproductions of manifests that bear their ancestors' names and purchase images of the ships they traveled on. The foundation also added an online gift shop offering related genealogical items to visitors. Introducing commercial capabilities pushed the stakes even higher when it came to site-availability levels.

Later in 2001, a tragic, unanticipated event drove another wave of traffic to the site. After September 11, 2001, the site experienced an uptick in the volume of hits, says Daniel. People seemed compelled to dig for their family roots, and they flocked to the site in growing numbers, he says. This trend persisted into the holiday buying season, as visitors purchased manifest reproductions, images of ships, and other items from the gift shop.

During the site's brief history, the traffic flow hasn't diminished. Since the site went live, it's received more than 1.6 billion hits, and averages 120 million hits per month, says Daniel.

THE GREAT DATA MIGRATION
The network supporting ellisislandrecords.org was designed to provide flexible, reliable connectivity to the foundation's many resources. The network is distributed among three primary sites: the museum, located on the island; the foundation's corporate headquarters in New York City; and Hostcentric's data center, also in the city. To ensure data integrity, the information in the headquarters database, the passenger records database, and the database of user information at Hostcentric must be synchronized.

Although most of AFIHC's visitors are Web surfers from afar, museum visitors also generate a lot of traffic. The museum sports nearly a dozen kiosks, providing a preview of the center's research activities and enabling users to do a preliminary search of the immigration records database. More than 40 workstations provide access to the database and related programs. In addition to serving as access points, these systems track visitors' use of the center's resources, helping the foundation optimize system-availability levels.

The network's current configuration is shown in the figure. The foundation has a Cisco Catalyst 3600 router providing a point-to-point interface for its systems at the foundation's headquarters and on Ellis Island. The network has a primary and a secondary firewall. An HP Procurve 8000 switch serves as the backbone, and two Cobalt RaQ servers handle e-mail. Two Radware Web Server Director (WSD)-Pro+ load-balancing systems handle user requests to the server farm. The foundation has more than 30 Compaq servers at the facility. The center's immigration records are housed in an Oracle8i database backed up on a RAID 5 storage array.

Initial implementation of the AFIHC and the Web site took nearly a year, although tasks such as collecting the information for the passenger database and project planning took nearly five years. The foundation worked with multiple contractors. The roster includes Edwin Schlossberg, which designed the center and Web site; R/GA Interactive, which wrote the application enabling users to search the database and view images; and systems integrator Square One.

Others performed the less glamorous, but equally important task of wiring the facilities at the museum. Once technicians configured the servers at Hostcentric and ran the final test scripts, the site was off and running. Traffic immediately reached gargantuan proportions. After a few hours of operation, it was clear the network would need more load-balancing horsepower. The network's original two WSD-Pro systems were swapped out for more advanced WSD-Pro+ units.

Technicians added 10 servers to the server farm at Hostcentric. To help soothe the ruffled feathers of surfers unable to access the site, the systems issued a message informing them of the situation and inviting them to try the site again later. Although not an ideal scenario, it was a more appealing alternative than the impersonal standard error message.

According to Christine Pascarella, a vice president at Hostcentric who was integrally involved in the AFIHC project, a big challenge in dealing with the onslaught of traffic was making the necessary infrastructure changes in such a short time. The eagerly awaited site had gotten a lot of publicity before the launch, and any problems accessing the site would be noticed quickly.

But the team knew that additional capacity wouldn't be enough. The project needed a long-range plan to ensure that performance and availability would remain high. The team analyzed the site's traffic and observed its impact once the traffic hit the servers in the server farm. After scrutinizing the nature of the traffic and examining scripts and executables run during user searches, the team developed a long-term strategy for smoothing out the traffic flow and maximizing availability.

CREATING A DIGITAL DEMOCRACY
Deciding to replace a major component of your Web site's infrastructure three hours after its launch is not an endeavor for the risk-averse. In this case, however, the team determined the payoff would far outweigh any possible hurdles.

The team opted to swap out the WSD-Pros because the systems in the server farm would have to reach and sustain optimum availability and performance levels to support the foundation's long-term strategy. Otherwise, more bottlenecks would develop once the e-commerce portion of the site went live. The project required a redundant, fault-tolerant infrastructure to effectively and evenly distribute the traffic among the systems in the server farm. In the event of a system failure, a secondary load balancer would have to kick in with the least possible disruption to ongoing user sessions.

To help ensure a seamless connection for the user, the system had to verify availability at the application level, as opposed to just performing standard ping checks. The WSD-Pro+, with eight Fast Ethernet ports and two Gigabit Ethernet ports, is a layer-4-7-based system that can perform application-level health checks based on the IP, TCP, and UDP protocols, as well as on content. A user-configurable option allows for some customization of monitoring capabilities.
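The difference between a transport-level check and an application-level check can be sketched in a few lines of Python. This is an illustration only, not Radware's implementation: the function names, the use of an HTTP GET as the probe, and the `expected` marker text are all assumptions made for the example.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Transport-level check: passes if the port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_content_check(host: str, port: int, path: str, expected: str,
                       timeout: float = 2.0) -> bool:
    """Application-level check (hypothetical): the page must load with
    status 200 AND contain the expected marker text."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status == 200 and expected in body
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A Web server whose database connection has failed may still accept connections and return a page, so `tcp_check` would pass while `http_content_check` fails. That gap is the reason for checking content as well as reachability.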

Full-path health checks can be performed from the load balancer out to the Internet, helping prevent service interruption due to a downed router, switch, or other device. Performing availability checks at these levels can help stave off the common scenario of forwarding traffic down a path that appears to be intact, only to discover a breakdown at the router or application level. In the event of a potential problem or outage, the WSD-Pro+ can send alerts via SNMP traps or e-mail. Performance statistics and reporting enable Daniel and his team to keep close tabs on the overall status of the servers at Hostcentric.

In addition to these monitoring capabilities, the foundation also needed more brawny systems to handle the site's hefty traffic load. The WSD-Pro+ has a 9.6Gbit/sec backplane and, according to Radware, can support 128,000 routing table entries and 500,000 simultaneous sessions.

The foundation's systems are deployed in active-active mode to help ensure high availability. Stateful failover helps to optimize user experience. Sessions are mirrored so if the primary system fails, the secondary system can take over without disruption.

The WSD-Pro+ has a shutdown mode that enables users to complete existing sessions, but prevents initiation of new sessions. Throttling capabilities let Daniel and his team bring a system down for maintenance and gradually reintroduce traffic to a server that's being brought back online, avoiding potential bottlenecks.
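The two behaviors described above, draining a server and easing it back into service, might be modeled as follows. This is a simplified sketch under stated assumptions: the `Server` class, the `route` function, and the linear 60-second ramp are hypothetical, and the article does not say how the WSD-Pro+ actually paces reintroduced traffic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    name: str
    active_sessions: int = 0
    draining: bool = False  # True: finish existing sessions, accept no new ones


def route(session_id: str, sessions: Dict[str, str], servers: List[Server]) -> str:
    """Return the server name for a request, honoring drain mode."""
    if session_id in sessions:
        # An existing session stays where it started, even on a draining server.
        return sessions[session_id]
    candidates = [s for s in servers if not s.draining]
    if not candidates:
        raise RuntimeError("no server is accepting new sessions")
    chosen = min(candidates, key=lambda s: s.active_sessions)
    chosen.active_sessions += 1
    sessions[session_id] = chosen.name
    return chosen.name


def ramp_up_weight(seconds_online: float, ramp_seconds: float = 60.0,
                   full_weight: int = 10) -> int:
    """Grow a returning server's weight linearly from 1 to full_weight
    (an assumed policy, chosen only for illustration)."""
    if seconds_online >= ramp_seconds:
        return full_weight
    fraction = max(seconds_online, 0.0) / ramp_seconds
    return max(1, int(full_weight * fraction))
```

Marking a server as draining lets its session count fall to zero before maintenance begins, and a low starting weight keeps a freshly restarted server from being hit with a full share of traffic at once.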

CLEARING NEW PATHS
Optimal traffic redirection is a major determinant of any load-balancing system's usefulness. The WSD-Pro+ enables these decisions to be made on the basis of a wide array of variables. When traffic hits the site, it's directed to a virtual IP address; the system then determines which server(s) is best equipped to handle specific types of requests.

In addition to DNS and HTTP redirection, the system can forward requests based on URL (including path, file name, and port information). It can also forward requests based on content, file type, Secure Sockets Layer (SSL) session IDs, and cookies. The ability to maintain session persistence on the basis of SSL IDs and cookies is important for sites with e-commerce applications.
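To see why the persistence key matters, consider a visitor whose requests reach the site through several proxy servers, as is the case for AOL users. The sketch below is a toy model, not the WSD-Pro+ algorithm: the `StickyTable` class, the server names, the documentation-range IP addresses, and the cookie value are all invented for illustration.

```python
from typing import Dict, List


class StickyTable:
    """Remember which server each key was first assigned to."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.assignments: Dict[str, str] = {}
        self._next = 0

    def route(self, key: str) -> str:
        if key not in self.assignments:
            # New key: hand out servers in simple rotation.
            self.assignments[key] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.assignments[key]


# One shopper, two requests, each arriving via a different proxy address.
requests = [("198.51.100.1", "session=abc"), ("198.51.100.2", "session=abc")]

by_ip = StickyTable(["web1", "web2", "web3"])
by_cookie = StickyTable(["web1", "web2", "web3"])

ip_routes = [by_ip.route(ip) for ip, _ in requests]              # ['web1', 'web2']
cookie_routes = [by_cookie.route(cookie) for _, cookie in requests]  # ['web1', 'web1']
```

Keyed on source address, the two requests land on different servers and the shopping session is split. Keyed on the cookie, both requests reach the same server regardless of which proxy forwarded them.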

The WSD-Pro+ enables traffic to be redirected according to application. In one scheme, a collection of application-specific servers within a server farm shares a virtual IP address. The system forwards requests for an application to the most appropriate server(s) for that application.

Redirection can be based on various load-balancing algorithms. These include algorithms that allow traffic to be sent to the server with the fewest users, the smallest amount of traffic (in terms of packets per second), the fewest bytes, the lowest CPU use, or the quickest response time. The system can also use customized SNMP- and Windows NT-based algorithms.
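Most of these algorithms share one shape: measure something about each server, then pick the server with the smallest value. A minimal sketch, assuming a hypothetical `ServerStats` record and policy names invented for the example (fewest bytes would follow the same pattern with one more field):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServerStats:
    name: str
    users: int
    packets_per_sec: float
    cpu_percent: float
    response_ms: float


# Each policy maps a server to the number the balancer tries to minimize.
POLICIES: Dict[str, Callable[[ServerStats], float]] = {
    "fewest_users": lambda s: s.users,
    "least_traffic": lambda s: s.packets_per_sec,
    "lowest_cpu": lambda s: s.cpu_percent,
    "fastest_response": lambda s: s.response_ms,
}


def choose(servers: List[ServerStats], policy: str) -> ServerStats:
    """Return the server that scores lowest under the named policy."""
    if not servers:
        raise ValueError("server list is empty")
    return min(servers, key=POLICIES[policy])


farm = [
    ServerStats("web1", users=120, packets_per_sec=900.0, cpu_percent=35.0, response_ms=80.0),
    ServerStats("web2", users=90, packets_per_sec=1500.0, cpu_percent=70.0, response_ms=40.0),
    ServerStats("web3", users=150, packets_per_sec=600.0, cpu_percent=20.0, response_ms=120.0),
]
```

With these made-up numbers, `choose(farm, "fewest_users")` returns web2 while `choose(farm, "lowest_cpu")` returns web3, which shows how the choice of metric changes where the next request goes.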

The WSD-Pro+ can classify traffic by source and destination IP address, application, TCP port, content/URL, and cookies. For example, additional bandwidth can be allocated to sessions involving cookies, or traffic associated with specific applications. Scheduling algorithms include weighted round robin, class-based queuing, random early drop (RED), and weighted RED (WRED).
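Weighted round robin is the simplest of the scheduling schemes named above. The sketch below shows the basic idea only: the traffic class names and weights are invented, and production schedulers typically interleave classes more smoothly than this plain version does.

```python
from itertools import islice
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield traffic-class names forever; each class gets `weight`
    consecutive turns per cycle."""
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one class needs a positive weight")
    while True:
        for name, weight in weights.items():
            for _ in range(weight):
                yield name


# Hypothetical classes: e-commerce sessions get three turns for every search turn.
schedule = list(islice(weighted_round_robin({"ecommerce": 3, "search": 1}), 8))
# schedule == ['ecommerce', 'ecommerce', 'ecommerce', 'search',
#              'ecommerce', 'ecommerce', 'ecommerce', 'search']
```

Over any full cycle, each class receives service in proportion to its weight, which is how more bandwidth can be steered toward a favored class of traffic.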

The system has multiple traffic-flow optimization mechanisms. For example, replies to requests directed through the WSD-Pro+ can be sent via the router directly to the user. This ability to bypass the switch helps to reduce the volume of traffic and prevent bottlenecks.

Regardless of how sophisticated a load-balancing system is, its functionality depends on optimal security. The WSD-Pro+ has packet-filtering and Network Address Translation (NAT) capabilities. The system can filter packets based on MAC address, IP address, and application. Other filtering capabilities include content screening, URL blocking, and URL filtering. According to Radware, the system can also monitor traffic against 450 common attack signatures.

By tweaking the system, Daniel and his team can establish a maximum connection threshold for each server. This helps prevent Denial of Service (DoS) attacks, such as SYN flood attacks, by reducing the chance of request overflows. Access control can be limited on the basis of IP address, protocol, and application.
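A per-server connection ceiling can be pictured as a simple counter that refuses new connections once the limit is reached. The `ConnectionLimiter` class below is a hypothetical sketch of that idea, not the device's actual mechanism, and the limit of two is chosen only to keep the example short.

```python
from typing import Dict


class ConnectionLimiter:
    """Track open connections per server and refuse any beyond a fixed cap."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.active: Dict[str, int] = {}

    def try_open(self, server: str) -> bool:
        """Admit a new connection if the server is under its threshold."""
        count = self.active.get(server, 0)
        if count >= self.max_connections:
            return False  # over the cap: reject or send elsewhere
        self.active[server] = count + 1
        return True

    def close(self, server: str) -> None:
        """Record that one connection to the server has ended."""
        if self.active.get(server, 0) > 0:
            self.active[server] -= 1


limiter = ConnectionLimiter(max_connections=2)
results = [limiter.try_open("web1") for _ in range(3)]  # [True, True, False]
```

Because the third request is turned away at the load balancer, a flood of connection attempts cannot pile up on a single server past the configured threshold.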

TRAVELING LIGHT
According to Daniel, the Radware systems have performed well under the site's demanding load. In his previous job at a dotcom, he witnessed the impact of ineffective traffic management. In particular, he found his old company's load-balancing system lacking in crucial areas. One shortcoming was its inability to handle traffic from AOL users very well. Because of AOL's proxy servers, the system couldn't easily maintain session state, which is not a viable option for today's Internet-based business.

As a result of witnessing such problems, Daniel focuses on providing a seamless connection and optimal user experience. The WSD-Pro+'s ability to sustain persistent connections has enhanced users' quality of experience at the site, he says. In addition, it's eliminated the need to send all AOL user traffic to one large, costly, dedicated server. Instead, the system distributes traffic efficiently among existing servers, and the load balancers reduce the chance of a user being dropped if a server goes down.

If an outage occurs, users are transparently moved to a functioning server while session persistence remains intact, says Daniel. According to Daniel, these capabilities helped to maintain existing performance levels during the hectic holiday season, when traffic to the site, as well as revenue generated through the sale of manifests, images, and items from the gift shop, increased significantly.

Daniel also says that the foundation's systems are easier and less time-consuming to maintain and upgrade than the load balancer his previous employer used. Low maintenance means fewer employees for troubleshooting, which saves money, and the system's ease of use has eliminated the need to hire an additional staff member to handle the systems at Hostcentric. Finally, not having to buy a large, pricey dedicated server for AOL traffic has helped the organization cut costs without sacrificing performance.

Due to the complex nature of the rollout and upgrade of the foundation's site, it's difficult to put a price tag on implementation. But, according to Daniel, the foundation allocated about $22.5 million for the entire AFIHC project. This includes the buildout of the center in the museum on Ellis Island, developing and deploying the Web site, creating and integrating the databases, and digitizing the manifest images and pictures of the ships. The cost also includes hosting services and ongoing operation and maintenance.

Behind the numbers lie some of the project's less tangible benefits. Being part of this effort has proved rewarding on a personal level, says Daniel. The degree of interest people have in finding out about their ancestry, and the ability to assist them in this quest, has made participating in the endeavor fulfilling, he notes.

UNCHARTED WATERS
Although the AFIHC's rollout proved successful, there's always room for improvement in any network implementation. "I think some people still have some fears about April 17, since they had a very difficult time getting onto the site," says Daniel. His goal is to continue to enhance the network's load-balancing capabilities to address these availability issues. Improving the user experience will also be a priority.

Toward that end, Daniel would like to incorporate content acceleration into the network infrastructure. Equalizing the experience of users with slower connections and those with higher-octane connections is a major item on his wish list.

Access to more detailed reporting on visitor movement through the site is also on that list. More granular reporting would let Daniel and his team fine-tune performance and availability levels. In addition, incorporating more detailed reporting capabilities into the network's load-balancing systems would reduce dependence on third-party products such as WebTrends, he notes. Of course, additional security is always a plus. "Coming from a paranoid perspective, everything is a potential hack," says Daniel. A vigilant load-balancing system frees up resources such as firewalls and Intrusion Detection Systems (IDSs) to do their job more effectively, he says.

But solid performance is still his first priority, and he would be reluctant to make a trade-off that would jeopardize it. "Once you start moving away from the initial core of a product and what it does, the harder it is for that system to function optimally in its existing environment," says Daniel. "It's kind of a toss-up. I'd like to have all the bells and whistles, but sometimes you just want to have that workhorse out there to take care of the primary functions."

Daniel says many site visitors and other organizations have approached the foundation with ideas about capabilities and features that could be incorporated into the site in the future. As the site is enhanced to keep pace with users' interests and requests, load-balancing requirements will become more stringent.

But in these tumultuous times in the Internet world, Daniel says the foundation is on the right track. "Coming from a dotcom environment, I'm very happy to see a site this successful," he says.

 
