Dataworks Enterprise, The Distribution System and The Web

Traditionally, institutions employ a range of information distribution systems that can be loosely described as publishing market data content to users. Institutions choose particular technology solutions on the basis of the characteristics of the content and the requirements of the user, in conjunction with the cost/benefit analysis of the solution and a good understanding of the features and limitations of the technologies in hand.

Common distribution system architectures include traditional database access (via centralised database servers or through separate application servers), video switches, terminal solutions, market data distribution systems (MDDS or platforms) and a range of other techniques.

In recent years, Web technology has begun to deliver systems fully capable of distributing market data content to large numbers of users. Web Servers are increasingly becoming a central part of our distribution system strategy. Dataworks Enterprise is a tool that can help to integrate these technologies using a common set of paradigms.

In addition, distributed object technology is developing rapidly. Initially, it is being used in traditional RPC environments and as a basis for middle-tier business functionality. However, this is a relatively new technology and, consequently, its deployment is likely to remain limited to very specialised application domains.

In the near future, MDDSs will continue to form the core real-time distribution architecture with Web technology increasingly being employed for static or near-static content. To see why this is the case, we need to review the major features and limitations of the underlying technologies.

Web Technology vs. MDDS: A Background

Web systems are identified by the use of specific web-related protocols (i.e. HTTP 1.0/1.1) used to communicate between clients and servers. Many people use the term "web protocols" when they mean Internet protocols. To describe standard Internet protocols as Web protocols is a bit like describing all fish as trout. This confusion arises partly because Web services employ a range of protocols taken from existing Internet standards, such as MIME (the multimedia email extension), and partly because web clients (or browsers) are able to communicate with a range of non-Web Internet services such as news, file transfer, mail and terminal sessions in addition to Web servers themselves.

Using HTTP, Web publishing systems deliver data on a point-to-point basis in response to a specific request for that content from an individual client. Each request from client to server causes a new connection to be made to the server, the request to be passed across that connection, and the connection to be closed. In HTTP 1.1 the connection may be kept alive and reused to reduce TCP virtual-circuit set-up/tear-down times, but this is an optimisation and does not affect the basic interaction between the client and the server. As far as the user is concerned, each element of content is retrieved in a separate conversation with the server. Typically, a browser composes a single page of information from content retrieved via a number of individual requests.
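
This request/response pattern can be made concrete with a short sketch in Python (standard library only). The server and paths are hypothetical; the point is that each element of the page is fetched in its own complete conversation with the server:

    import urllib.request

    BASE = "http://example.com"                       # hypothetical server
    PARTS = ["/page.html", "/style.css", "/logo.png"] # elements of one page

    for path in PARTS:
        # Each iteration is a complete conversation: connect, send the
        # request, receive the response, and close the connection.
        with urllib.request.urlopen(BASE + path) as response:
            body = response.read()
            print(path, response.status, len(body), "bytes")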

The content of an HTTP response is, in principle, a "blob" of data, typically text-based. Binary data is often encoded into a textual representation for transmission, although there is no strict requirement that this be the case. Content is typically HTML or XML, although HTTP can equally be used to distribute images, documents and other types of content.
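
The encoding point can be illustrated in a few lines. The bytes below are an illustrative stand-in for binary content; base64 is one common textual encoding, but the same bytes could equally travel raw under a MIME type such as image/png:

    import base64

    png_bytes = b"\x89PNG\r\n\x1a\n"   # illustrative binary content
    # Text-encoded form, e.g. for embedding inside an HTML/XML document:
    print(base64.b64encode(png_bytes).decode())
    # The raw form could instead be sent directly with a suitable MIME
    # type; HTTP itself does not require the textual encoding.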

Unlike web-based solutions, market data distribution systems are optimised to deliver small quantities of dynamic content to a large number of users. The protocols used tend to be proprietary multicast protocols, usually built on low-level Internet protocols such as IP or UDP. The system is characterised by the server receiving requests from the platform to open a stream on an item of data on behalf of one or more clients. Typically, the server is unaware of the number and identity of those clients; there is only an indirect relationship between an individual client request and the request the platform makes of the server. Because the connection between the client and the server is maintained, the server can push new content to the client as required.
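
The shape of this interaction can be sketched as follows. This is a self-contained simulation, not any particular platform's API: the item name, the callback and the publishing thread are all illustrative. The client expresses interest once; state and subsequent updates then arrive through the open stream:

    import threading, time, random

    class Stream:
        """Sketch of an MDDS-style subscription: the client registers
        interest once, then receives an initial image followed by
        pushed updates until it closes the stream."""

        def __init__(self, item, on_update):
            self.item, self.on_update, self.open = item, on_update, True
            threading.Thread(target=self._publish, daemon=True).start()

        def _publish(self):
            price = 100.0
            self.on_update(self.item, {"LAST": price})   # initial image
            while self.open:
                time.sleep(0.5)                          # an update occurs
                price += random.uniform(-0.5, 0.5)
                self.on_update(self.item, {"LAST": round(price, 2)})

        def close(self):
            self.open = False

    # The client never re-requests: it subscribes once and is pushed to.
    stream = Stream("VOD.L", lambda item, fields: print(item, fields))
    time.sleep(3)
    stream.close()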

The distinction between opening a stream and retrieving an item is an important one. Most web servers claim to support dynamic content through re-request of static data, rather euphemistically called "server push". An example of such an approach is "Web Channel" technology. Web "server push" is no more than a scheduled "client pull": there are no actual "channels" in "Web Channel" technology. Real-time updates are only possible where a stream of content flowing from server to client can be maintained, since it is only through such a stream that servers can "push" content (in the form of updates) to clients.

In other words, web servers implement client-pull publishing (the client "pulls" the data on demand from the server). Market data distribution systems implement server-push publishing (the client expresses an interest in an open-ended stream of data, and the server replies with state and changes as they happen).
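
The "scheduled client pull" behind web-style "server push" is easy to see in code. In this sketch (URL and interval are illustrative) the page only appears to update because the client fetches it again on a timer; no stream is ever opened, and the server can send nothing between these requests:

    import time
    import urllib.request

    URL = "http://example.com/prices.html"   # hypothetical resource

    for _ in range(5):
        # Another complete pull every five seconds, whether or not
        # anything has actually changed on the server.
        with urllib.request.urlopen(URL) as response:
            print(time.strftime("%H:%M:%S"), len(response.read()), "bytes")
        time.sleep(5)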

To ease this limitation, some web servers attempt to multiplex streams of data over public point-to-point protocols such as FTP; such approaches will never achieve the performance of a multicast distribution system. Others use proprietary or public multicast protocols to distribute data. In either case, they cease to be web servers at all.

The relationship between the server and the client is also important. Web servers are aware of the client making an individual request and are responsible for applying security mechanisms, such as access permissions, to individual items of content. A platform server is not aware of the particular client making the request and must delegate to the platform the authorisation of users to access particular data.
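
The two permissioning models can be contrasted in a short sketch. Every name here (the entitlement table, the items, the helper functions) is illustrative, not any real product's API:

    # Illustrative entitlement data: which users may view which items.
    ENTITLEMENTS = {"alice": {"VOD.L", "BARC.L"}, "bob": {"VOD.L"}}

    def fetch_content(item):
        return f"<html>{item} snapshot</html>"    # stand-in for real content

    def open_stream(item):
        return f"<stream handle for {item}>"      # stand-in for a real stream

    def web_server_fetch(user, item):
        # Web model: the server knows which client is asking, so it can
        # apply access permissions to each individual item of content.
        if item not in ENTITLEMENTS.get(user, set()):
            raise PermissionError(f"{user} may not view {item}")
        return fetch_content(item)

    def platform_server_open(item):
        # Platform model: the server sees only a request to open a stream,
        # not the end users behind it; authorising those users is delegated
        # to the platform's own permissioning layer.
        return open_stream(item)

    print(web_server_fetch("alice", "BARC.L"))    # checked per user, per item
    print(platform_server_open("BARC.L"))         # no user identity here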

Performance is also a key issue. As a result of the request/response paradigm, the load on a Web system grows roughly linearly with use: each new request places additional load on the server. As more individuals make requests of the system, more CPU, memory and disk resources are required on the server to support the request rate, and more network bandwidth is consumed. Compounding the difficulty of deploying these systems, user requests can cause unpredictable traffic congestion, resulting in delays and a lowering of the overall quality of service of the network. The large number of caching solutions being devised, including local browser caches, Web proxies and Web distribution managers, illustrates the problem. Market data systems, by contrast, scale in a less than linear fashion (i.e. scale well): because content is multicast, the incremental cost of an additional user viewing an already-open item is small.
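
A back-of-the-envelope comparison makes the scaling difference visible. The figures below (1,000 users, a 5-second refresh, a 2 KB page, 100-byte updates) are illustrative assumptions, not measurements:

    users = 1_000
    page_bytes = 2_048          # full page returned on every poll
    update_bytes = 100          # incremental update on the multicast
    polls_per_sec = users / 5   # each user re-requests every 5 seconds

    # Web model: requests and bandwidth grow linearly with the user count.
    web_bps = polls_per_sec * page_bytes
    print(f"web polling: {polls_per_sec:.0f} req/s, {web_bps / 1024:.0f} KB/s")

    # Multicast model: one update is sent however many users are watching.
    mdds_bps = 2 * update_bytes   # say two updates per second on the item
    print(f"multicast:   {mdds_bps / 1024:.2f} KB/s, independent of users")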

Web solutions are, however, capable of supporting simultaneous access to a content set limited only by the power of the server. Since each request is satisfied in a single round trip, only the storage space and CPU resources available to the server limit the size of the content set. Web solutions are also capable of creating new content "on the fly". For instance, a web server may present the results of a database query as content unique to the user who issued the query. This makes the web server well suited to three-tier application architectures.
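
A minimal sketch of this "content on the fly" role, using Python's standard http.server and an in-memory stand-in for the database (all names and data here are illustrative):

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    FAKE_DB = {"VOD.L": "Vodafone Group", "BARC.L": "Barclays"}  # stand-in

    class QueryHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The response is composed per request: the same URL with a
            # different query yields content unique to that query.
            query = parse_qs(urlparse(self.path).query)
            item = query.get("item", [""])[0]
            body = f"<html>{item}: {FAKE_DB.get(item, 'unknown')}</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

    # HTTPServer(("", 8080), QueryHandler).serve_forever()  # /?item=VOD.L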

MDDSs rely on the notion that at any given time the content set being viewed is a small subset of what is available and, more particularly, that within any given group of users a large proportion will be viewing the same limited set of items. This makes them well suited to distributing commonly used data such as market prices, but less well suited to satisfying an arbitrary database query.

The direction of distribution is also a key factor. Platforms have largely been in the control of large data vendors, who depend on the redistribution of contributed data. As a result, platforms have traditionally provided a means of transferring contributions from the client machine to central servers. Central contribution servers then provide fan-out and normalisation of the data, presenting the contributions in one or more vendor formats as required and sending the data on to the relevant vendor or vendors.

However, both Web services and distribution systems are optimised to deliver data from server to client; neither technology is well suited to scenarios where data is transferred "upstream". Better technical solutions may be found in traditional database technology. However, where a distribution mechanism that can carry limited upstream communication is already in place, using it may be cheaper than installing additional technologies (the KISS principle).

Also increasingly important is the notion of total cost of ownership (TCO) in general, and the cost of deployment in particular. Traditionally, the MDDS was the core of the distribution architecture, but such systems are notoriously expensive to install and provide a quality of service far beyond what many end users require. Web solutions appear much cheaper to deploy and can support large numbers of users with limited requirements. As a result, many institutions now assess the cost of deploying a new system against the requirements of the end user; for instance, do the users actually need real-time updates to the data? Many institutions run both systems concurrently to suit the wide range of tasks being performed, and then find that they need a means of integrating the two approaches.

Many of the arguments associated with Web technology have concentrated on client technology issues, diverting attention from distribution system infrastructure. Traditional Web clients (or browsers) are general-purpose pieces of software that allow the end user to view displays of (and, to some extent, interact with) web content. Browsers have a lower rollout cost for large numbers of end users because they are standard pieces of software, sometimes even delivered as part of the operating system. Clients to market data distribution systems are, by contrast, typically custom-built applications providing specific sets of functionality known to be useful to end users. However, this split is not dictated by the web technology itself: there is no reason why specialised applications could not employ web technology, or why browsers could not be given access to market data directly (you can do this with Dataworks Enterprise).

Browsers are normally associated with thin client technology. Thin clients are intended to reduce TCO by cutting management and administration costs. However, the thin/thick client argument is somewhat misleading in relation to browsers. A thin client that is populated with large quantities of downloaded application code ceases to be a thin client at all; indeed, from a TCO point of view, it is probably an administrative nightmare, defeating the original purpose of the technology. True thin client technology is concerned with restricting functionality to appropriate levels for cost reasons, such as the provision of view-only terminal positions. In the case of the browser, this would probably mean extremely tight control over all scripts, Java code and any other "downloadable" executable unit on the client machine apart from the displayed page; the easiest way to achieve this is to disallow such content completely. It would also imply that upstream connections, particularly those that employ additional virtual circuits, would have to be tightly controlled to minimise their impact on the network.

As shown above, the nature of web protocols means that Web servers are best suited to delivering static or near-static content, while the market data distribution system is best suited to the delivery of real-time content. These strengths are implicit in the very definitions of "web service" and "real-time distribution system". This is not to say that it is never appropriate to use one in place of the other: the implementation technology chosen will often depend as much on the availability of a technology as on its appropriateness. For instance, when deploying a new service on the web, we have to consider whether the users have the browser installed, what the implications and potential bottlenecks in the network are, and so on. It may be that a new service which appears to suit the web server environment will actually be deployed more cost-effectively using a central database or the market data system.