Dataworks Enterprise Publishing System

Dataworks Enterprise Publishing System is a full-featured, platform-independent publishing system.

Dataworks Enterprise Publishing System allows users to publish information via a central server to a distribution system.

Clients can use Dataworks Enterprise to expose data directly onto the distribution system using the distribution system plug-in. However, where a large number of users wish to publish data, or where the published data has to be combined into a small number of sources, publication is handled by the central publishing server provided. Routing publication through the central server gives system managers better control over published data and over the use of the distribution system as a whole.

The publishing system consists of two standard Dataworks Enterprise components, The Publishing Server and The Publishing Client. The Publishing Server accepts connections, using TCP virtual circuit technology, from one or more Publishing Clients. Publishing Clients monitor their local Cache for new sources. When a source application creates a source, it can mark the source to be published (see the SourceType property). The Publishing Client passes the information associated with these sources to the Publishing Server. The Publishing Server mounts the source on its local Cache, marking it to be exposed to the distribution system. In this way, the source becomes available to existing subscribers. In addition, The Publishing Server can perform a range of processes on the incoming data such as combining the data from multiple clients into a single source and applying permissions.

The Components of Dataworks Enterprise Publishing System

The Publishing Client interacts with the underlying Cache message streams through standard COM interfaces. It takes data from the system prior to the application of local caching algorithms. As a result, The Publishing Client does not maintain separate Cache copies of data passed to the server. This considerably reduces the processing and memory overhead of this component.

Publishing User Interface

In most installations The Publishing Client and Server are launched by the launcher application and are hidden by default. When shown, via the launcher task dialog, the components display a message log. Right-clicking on the message log offers the option to clear the log. In addition, The Publishing Server displays a pair of tree views of its internal content, one organised by its currently connected clients and the other by the sources which it has mounted. For each client, there is a list of sources to which the client is connected. For each source, there is a list of clients connected to that source and a list of permanent items supported by The Publishing Server on that source. From this interface, permanent items can be deleted by selecting the item in question, right-clicking and selecting the ‘Delete…’ option.

Modes of Operation

Dataworks Enterprise Publishing System supports both server push and client pull idioms. In server push, the source application places data in the Cache when the data becomes available to it (without waiting for a request from the client). This is useful where the content supplied by the source is fairly small and predictable, or where there is no mechanism to request the data on the fly, as with a broadcast feed. In the client pull idiom, the source application waits for requests for given content and satisfies those requests on the fly. This is well suited to cases where the content set is large and there is a mechanism for retrieving the data. Most real-time services operate the latter mechanism. Sources can use both the push and pull models for their items, but an individual item is either push or pull. This is helpful where a source has a certain number of preloaded items, which can be pushed into the Cache, while the rest are requested using client pull.

Permissions

The Publishing Server operates a permissions system to allow the administrator to control who can create or modify published items. The permissions system is based around the notions of authorisations and permissionable entities.

The Publishing Server configuration files allow the administrator to define Groups, which are collections of users, hosts or a combination. The files also define permissionable entities such as the Server, a particular source or a particular item. Each entity has an Access permission and a Create permission (except items which only have an Access permission). The Create permission indicates whether a particular group, user or host is allowed to create sub-entities from the entity. For example, the Create permission of the Server indicates who is authorised to create Sources on the server.

Permissions consist either of an Allow list or a Deny list allowing positive or negative permissions to be set up. Access to a permissionable resource is granted on the basis of either being in an ‘Allow’ list or not being in a ‘Deny’ list.

The security mechanism is based on trust. Each client provides the server with the name of the host and the user operating the connection.

All permissions in the system are controlled using a standard permissions structure which is ‘matched’ against the appropriate elements of the client structure. Each permission element consists of a permission flag and a user-host list. The flag has the values ‘Allow’ or ‘Deny’ and determines the operation of the list. Thus, the list can be seen as either an ‘Allow’ list or a ‘Deny’ list. Each list element contains a host, a user or both.
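
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the Allow/Deny logic, not the server's implementation: the names `ListElement` and `is_permitted` are hypothetical, and exact, case-sensitive string comparison of user and host names is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ListElement:
    """One user-host list element (hypothetical type).

    A field left as None is 'not specified' and matches any value.
    The text says each element carries a host, a user or both.
    """
    user: Optional[str] = None
    host: Optional[str] = None

    def matches(self, user: str, host: str) -> bool:
        # Assumption: names are compared exactly and case-sensitively.
        user_ok = self.user is None or self.user == user
        host_ok = self.host is None or self.host == host
        return user_ok and host_ok


def is_permitted(flag: str, elements: Sequence[ListElement],
                 user: str, host: str) -> bool:
    """Apply an 'Allow' or 'Deny' list to the user and host a client reports."""
    on_list = any(element.matches(user, host) for element in elements)
    if flag == "Allow":
        return on_list          # only those on the list get through
    if flag == "Deny":
        return not on_list      # everyone except those on the list
    raise ValueError(f"unknown permission flag: {flag!r}")
```

Note that an empty ‘Deny’ list permits everyone, which is consistent with access being granted to anyone who is not on a ‘Deny’ list.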

Permanent Items

The Publishing Server has the facility to maintain permanent items, i.e. items whose state is always good even if the client disconnects. This can be very useful where the item is a report or other data that can be considered ‘good’ over a long period of time, since it allows the contributing client to disconnect without ‘greying’ the data.

The Publishing Server, by default, backs up the content of permanent items into file storage so that it can reload the content on restart. In addition, the Server also provides a facility to Journal changes to the permanent items. In the event that the server crashes, the journal is used to bring the backup image up to date.
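
The recovery idea (reload the last backup image, then replay the journal over it) can be illustrated as follows. This is a sketch under stated assumptions: the function name, the in-memory dictionary image and the `(operation, item, content)` journal record layout are all hypothetical, since the on-disk formats are not described here.

```python
def recover_permanent_items(backup_image: dict, journal: list) -> dict:
    """Rebuild item state from the last backup plus the journal.

    backup_image: item name -> content as of the last backup (hypothetical layout).
    journal: ordered (operation, item_name, content) records written since
             that backup (hypothetical layout).
    """
    items = dict(backup_image)  # copy so the backup image itself is not mutated
    for operation, item_name, content in journal:
        if operation == "update":
            items[item_name] = content      # later records overwrite earlier ones
        elif operation == "delete":
            items.pop(item_name, None)      # deleting an unknown item is a no-op
        else:
            raise ValueError(f"unknown journal operation: {operation!r}")
    return items
```

Replaying in order matters: if the same item is updated twice between backups, only the last recorded content survives.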

The Server component provides an interface for deleting permanent items.

Name Spaces and Load Balancing

The Publishing Server determines how to combine sources on the basis of the source name. If two clients connect and create a source with the same name, the Server will load balance requests between those sources. If one of the clients disconnects, its requests revert to the other. When the client comes back online, the load is rebalanced as new requests arrive. In common with the distribution system, the Publishing Server assumes that two sources with the same name represent the same content set.

The Publishing Server also supports the notion of name spaces. Name spaces are collections of content that need to be merged by the server into a single source. Name spaces are indicated using the colon (:) character. For instance, suppose that we need to construct a source MY_SOURCE which is a combination of information from two clients, one supplying calculations from Excel and the other supplying real-time data. We would configure the source applications to produce published sources called MY_SOURCE:EXCEL and MY_SOURCE:RT respectively. The Publishing Server recognises the colon character and treats the two sources as being name spaces within the MY_SOURCE source. When requests arrive for this source, the Server sends a request to one client within each name space for the source (load balancing between the clients of a given name space). The results of these requests are merged by The Publishing Server.

The name space and load balancing facilities are only supported for items using the client pull model.
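
The routing behaviour described in this section can be sketched as below. The class and function names are hypothetical, and the round-robin policy is an assumption made for illustration; the text does not say which load balancing policy the Server actually uses.

```python
from collections import defaultdict


def split_published_name(name: str):
    """Split 'MY_SOURCE:EXCEL' into ('MY_SOURCE', 'EXCEL').

    A name without a colon has no name space, reported here as None.
    """
    source, separator, namespace = name.partition(":")
    return source, (namespace if separator else None)


class SourceRouter:
    """Tracks which clients publish each name space of each source (sketch)."""

    def __init__(self):
        # source -> name space -> list of client ids
        self._clients = defaultdict(lambda: defaultdict(list))
        # round-robin counter per (source, name space); assumed policy
        self._next = defaultdict(int)

    def register(self, client_id, published_name):
        source, namespace = split_published_name(published_name)
        self._clients[source][namespace].append(client_id)

    def disconnect(self, client_id):
        for namespaces in self._clients.values():
            for clients in namespaces.values():
                if client_id in clients:
                    clients.remove(client_id)

    def route(self, source):
        """Pick one client per name space of `source`, rotating between clients."""
        chosen = {}
        for namespace, clients in self._clients.get(source, {}).items():
            if not clients:
                continue  # every client of this name space has disconnected
            key = (source, namespace)
            chosen[namespace] = clients[self._next[key] % len(clients)]
            self._next[key] += 1
        return chosen
```

With one client publishing MY_SOURCE:EXCEL and two publishing MY_SOURCE:RT, each call to `route("MY_SOURCE")` returns the Excel client plus one of the two real-time clients in turn; merging the results is left out of the sketch.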

Resiliency

As described above, multiple clients can connect to a single Publishing Server. This allows multiple applications to provide data for a given source. However, there is still a single point of failure in the Publishing Server itself.

To resolve this problem, a single client host may run more than one Publishing Client, each of which connects to a different Publishing Server. Each Client uses its executable name as a configuration name, allowing each instance to be configured for a different Publishing Server. In the description provided below, we only use the standard client name ‘PressClnt’. For instance, the TCP port configuration is described as:

Tosca.PressClnt.Port: 6510

If a second instance of the client is required, copy the binary into the user binary area and rename it (for example, ‘PressBkUp.exe’). The new binary will then read its port from the configuration:

Tosca.PressBkUp.Port: 6520
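
The naming rule used in the two examples above (executable name becomes the configuration name) can be expressed as a small helper. The function name `config_key` and the sample path are hypothetical; only the resulting key format is taken from the examples.

```python
from pathlib import PureWindowsPath


def config_key(executable: str, setting: str) -> str:
    """Build a 'Tosca.<name>.<setting>' key from the binary's file name."""
    # The stem drops both the directory part and the '.exe' extension.
    name = PureWindowsPath(executable).stem
    return f"Tosca.{name}.{setting}"


# A renamed copy of the client picks up its own set of keys.
print(config_key(r"C:\bin\PressBkUp.exe", "Port"))
```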

The Publishing Clients will replicate source application data to each of those servers for resiliency. Each Publishing Server connects to the distribution system, which provides mechanisms for load balancing between multiple servers providing the same source data.

The Publishing Server is a single instance component and will not allow itself to be run more than once as there is no benefit to doing so.

History

This component formed part of the first release of Dataworks Enterprise.

Documentation

None

Dependencies

None

Configuration

Dataworks Enterprise Publishing System Installation

Dataworks Enterprise Publishing System can be installed directly from the distribution CD. During installation, the appropriate binaries and configuration files are copied to the disk and the launch scripts updated appropriately. Typically, The Publishing Server is installed on a central host and one or more clients configured to connect to it.

Prior to execution, some additional changes may need to be made to the hosts running the client software to allow the client to identify the host that is running the server software. If the client and server are run on the same IP sub network, Dataworks Enterprise Components automatically deduce these configurations making further configuration unnecessary. If not, the configuration may be achieved in a number of ways depending on the site configurations. This might involve:

  • a) setting The Publishing Client configuration to specify the hostname of the server, or
  • b) updating the HOSTS file to add ‘publisher’ as an alias for an existing server, or
  • c) updating a local DNS system to add ‘publisher’ as an alias for an existing server.

    The client and server also have configurations for port numbers and TCP optional parameters which, depending on the site, may also need to be changed.

    The system-supplied configuration files are installed into the standard configuration directory. Users should not edit these files. Instead they should create their own user configuration files (PRESSCLNT.USR and PRESSSVR.USR) and place these files in the configuration search path. These files will be included automatically by the standard configuration file. (See Configuration Files and Databases).

    All entries in The Publishing Client configuration file begin with the prefix ‘Tosca.PressClnt’ indicating that the client is a standard Thomson Financial component called PressClient. Similarly, all entries in The Publishing Server configuration file are prefixed with ‘Tosca.PressSrv’.

    In many cases, The Publishing Client and Server have the same set of configurations. In the following situations, <component> should be replaced by ‘PressClnt’ or ‘PressSrv’ as necessary.

    General Configurations

    The Publishing Client and Server support application-level heartbeats with an associated timeout in seconds. These are configured using ‘Tosca.<component>.Heartbeats’, which has a value of 0 or 1, and ‘Tosca.<component>.RetryTime’. Each component sends a heartbeat message every timeout seconds. If the peer receives neither data nor a heartbeat within the timeout, it attempts reconnection. While disconnected, reconnection is attempted once every timeout period.
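
The heartbeat rule can be sketched as a small bookkeeping class. This is an illustrative model only: the class and method names are hypothetical, timestamps are passed in explicitly so the logic can be followed without a real clock, and the exact boundary behaviour (timing out strictly after the timeout) is an assumption.

```python
class HeartbeatMonitor:
    """Tracks when to send a heartbeat and when to treat the peer as lost."""

    def __init__(self, timeout_seconds: float, now: float) -> None:
        self.timeout = timeout_seconds
        self.last_sent = now
        self.last_received = now

    def on_receive(self, now: float) -> None:
        """Record any inbound traffic: data and heartbeats both count."""
        self.last_received = now

    def on_send(self, now: float) -> None:
        """Record that a heartbeat was sent."""
        self.last_sent = now

    def should_send_heartbeat(self, now: float) -> bool:
        """True once a full timeout period has passed since the last heartbeat."""
        return now - self.last_sent >= self.timeout

    def peer_timed_out(self, now: float) -> bool:
        """True when nothing has arrived within the timeout (assumed strict)."""
        return now - self.last_received > self.timeout
```

A caller would poll `peer_timed_out` and, when it returns true, drop the connection and retry once per timeout period until the peer answers.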

    TCP Specific Configurations

    Many of the TCP configurations are shared between The Publishing Client and The Publishing Server. These include the service, port and many optional TCP parameters.

    ‘Tosca.<component>.Service’ specifies the name of the service used to connect to the well-known socket of The Publishing Server. If the service name is not defined, the components attempt to retrieve a port number to use as specified by ‘Tosca.<component>.Port’. If this is not defined either, the Publishing System uses port number 6510.

    In the case of the client, the configurations also define the name or names of the servers to which to connect. These are held in the ‘Tosca.PressClnt.Server’ configuration, which is a semicolon-delimited list of server names. Each name may be a DNS name, an IP address in dotted notation or an alias as provided in the HOSTS file.
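
The lookup order for the port (service name, then explicit port, then 6510) and the parsing of the server list can be sketched as follows. The helper names and the dictionary used to hold configuration values are hypothetical. Falling through to the port setting when a configured service name cannot be resolved is an assumption; the text only covers the case where the service name is not defined.

```python
import socket

DEFAULT_PORT = 6510  # default port stated in the text


def resolve_port(config: dict, component: str) -> int:
    """Resolve the TCP port: service name first, then port, then the default."""
    service = config.get(f"Tosca.{component}.Service")
    if service:
        try:
            # Looks the name up in the system services database.
            return socket.getservbyname(service, "tcp")
        except OSError:
            pass  # assumption: an unresolvable service falls through to Port
    port = config.get(f"Tosca.{component}.Port")
    if port is not None:
        return int(port)
    return DEFAULT_PORT


def server_names(config: dict) -> list:
    """Split the semicolon-delimited 'Tosca.PressClnt.Server' value."""
    raw = config.get("Tosca.PressClnt.Server", "")
    return [name.strip() for name in raw.split(";") if name.strip()]
```

Each returned server name can then be handed to the normal resolver, which already accepts DNS names, dotted IP addresses and HOSTS file aliases alike.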

    Publishing Client Configurations

    In addition to the heartbeat timeout, The Publishing Client also has a login timeout, which can be specified using the ‘Tosca.PressClnt.LoginTimeout’ configuration. If the server fails to accept the login within this period, The Publishing Client will disconnect and re-enter the reconnection loop.

    Publishing Server Configurations

    Backup Files and Journals

    The Publishing Server will automatically back up permanent items to non-volatile file storage. ‘Tosca.PressSrv.BackupTimer’ sets the period of backup. Every backup period and during shutdown, the server checks all sources and backs up an image for each permanent item. On server reload, permanent items are pushed into the Cache using the stored image. Setting the backup period to 0 disables the backup except during shutdown. The default backup period is every 30 minutes. The location of the backup files is set using the ‘Tosca.PressSrv.BackUpPath’ setting, the sources index file being specified in ‘Tosca.PressSrv.BackUpFile’.

    If the ‘Tosca.PressSrv.DoJournal’ configuration is true (1), The Publishing Server will construct a Journal of updates that occur between backups of permanent items. If the server crashes between backups, the journal is replayed when the server is next loaded. The journal flushes any changes to permanent items to the hard disk every minute unless the ‘Tosca.PressSrv.FlushImmed’ flag is set to 1, indicating that all updates to permanent items should be flushed to the hard disk immediately. Deciding which approach to use is a matter of judging the number of items, the frequency of updates and the overhead of flushing changes to the hard disk. The journal file can grow very large if there are many permanent items updating frequently, so you need to balance the backup period against the maximum journal size you are prepared to accept. You can turn journal recording off but, in the event of a crash, you will lose any updates since the last backup. If you are pushing a lot of items that have frequent updates, it is probably best to keep them as non-permanent. If the ‘Tosca.PressSrv.DoJournal’ configuration is false (0), the journal facility is disabled. Note also that if the permanent item backup is disabled, the journal is also disabled, since it serves no useful function.

    Publishing Server Authorisations

    The Publishing Server operates a permissions system to allow the administrator to control who can create or modify published items. The permissions system is based around the notions of authorisations and permissionable entities.

    The following discussion assumes that we have a source called FRED with items INST1 and INST2, groups called Forex and FixedIncome, users called keith, sarah and joe and machines called mars and mercury.

    The Publishing Server is represented by the ‘Tosca.PressSrv.Permission.’ configuration. There are three entities of interest, the permissionable resources, the permission data and the authorisation information (or values).

    Permissionable resources are hierarchical from Server to Source to Item.

    Servers have an Access permission which determines who can log in and a Create permission which determines who can create sources. Sources have an Access permission (who can edit) and a Create permission (who can create items). Items only have access permissions.

    Resources ending with .Access are access permissions and those ending with .Create are create permissions. Thus, in our example, the Server Access/Create names are:

    Tosca.PressSrv.Permission.Access ...
    Tosca.PressSrv.Permission.Create ...
    

    The Source Access/Create Base Names are:

    	Tosca.PressSrv.Permission.FRED.Access ...
    	Tosca.PressSrv.Permission.FRED.Create ...
    

    And the Item Access Base Names are:

    	Tosca.PressSrv.Permission.FRED.INST1.Access ...
    	Tosca.PressSrv.Permission.FRED.INST2.Access ...
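
The naming pattern shown in the three listings above can be generated mechanically. The helper below is a sketch with a hypothetical name; it follows only the rules stated here: the hierarchy runs from Server to Source to Item, and items have an Access permission but no Create permission.

```python
from typing import Optional

BASE = "Tosca.PressSrv.Permission"


def permission_key(kind: str, source: Optional[str] = None,
                   item: Optional[str] = None) -> str:
    """Build the resource name for a server, source or item permission."""
    if kind not in ("Access", "Create"):
        raise ValueError(f"unknown permission kind: {kind!r}")
    if item is not None and source is None:
        raise ValueError("an item must belong to a source")
    if item is not None and kind == "Create":
        raise ValueError("items only have an Access permission")
    # Omitted levels are simply left out of the dotted name.
    parts = [BASE] + [part for part in (source, item) if part is not None] + [kind]
    return ".".join(parts)


print(permission_key("Access", "FRED", "INST1"))
```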
    

    Permission Data is organised as a list associated with each resource. Each list has a type, which is either ‘Allowed’ (i.e. everyone is denied unless they are on the list) or ‘Denied’ (everyone is allowed except those on the list). Deny lists are the default; thus, with no permissions configured, everyone is allowed access to everything.

    The Server Access Key might be:

    	Tosca.PressSrv.Permission.Access: Allowed=...
    

    The key value is a comma-separated list of either user@hostname or groupname strings known as Authorisation Data. Either the host or the user can be omitted. If the '@' is omitted, the string is taken to be a group name.

    Example Server Access:

    For instance, the following allows access to the server to joe (from anywhere), keith from mars, anyone from mercury and any member of the group Forex:

    	Tosca.PressSrv.Permission.Access:Allowed=joe@,keith@mars,@mercury,Forex
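
To make the Authorisation Data format concrete, the following sketch parses the list part of the example above into structured entries. The type and function names are hypothetical, and how the server treats surrounding whitespace or empty entries is not documented here, so stripping and skipping them is an assumption.

```python
from typing import List, NamedTuple, Optional


class AuthEntry(NamedTuple):
    """One parsed Authorisation Data entry (hypothetical type)."""
    kind: str                 # "user_host" or "group"
    user: Optional[str]       # None when the user part is omitted
    host: Optional[str]       # None when the host part is omitted
    group: Optional[str]      # set only for group entries


def parse_authorisation_data(value: str) -> List[AuthEntry]:
    """Parse a comma-separated list of user@hostname or groupname strings."""
    entries = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # assumption: empty entries are ignored
        if "@" in token:
            user, _, host = token.partition("@")
            entries.append(AuthEntry("user_host", user or None, host or None, None))
        else:
            # No '@' means the string is taken to be a group name.
            entries.append(AuthEntry("group", None, None, token))
    return entries


for entry in parse_authorisation_data("joe@,keith@mars,@mercury,Forex"):
    print(entry)
```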
    

    Groups are defined using the Group key and may contain groups, users, machines or any combination of these. For instance, suppose the Editors group consists of the user ‘Fred’ on ‘apollo’, members of the Admin group, the user ‘daphne’ on any machine and anybody on the machine mars. This could be defined as:

    	Tosca.PressSrv.Group.Editors:Fred@apollo,Admin,daphne@,@mars
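
Because a group may name other groups, resolving membership means following those references. The sketch below flattens a group into its user@host patterns. The function name and the dictionary of group definitions are hypothetical, and both the guard against groups that refer to each other and the treatment of unknown groups as empty are assumptions.

```python
from typing import Dict, Optional, Set


def expand_group(name: str, groups: Dict[str, str],
                 _seen: Optional[Set[str]] = None) -> Set[str]:
    """Flatten a group into user@host patterns, following nested group names.

    groups maps a group name to its comma-separated definition string.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        return set()  # assumption: a group already being expanded adds nothing
    seen.add(name)
    members = set()
    for token in groups.get(name, "").split(","):  # unknown group -> empty
        token = token.strip()
        if not token:
            continue
        if "@" in token:
            members.add(token)                      # a user, a host or both
        else:
            members |= expand_group(token, groups, seen)  # nested group
    return members


groups = {
    "Editors": "Fred@apollo,Admin,daphne@,@mars",
    "Admin": "Administrator@console",
}
print(sorted(expand_group("Editors", groups)))
```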
    

    Example Server Configuration:

    By default, deny all access except that which is allowed:

    	Tosca.PressSrv*Access:Allowed=Admin
    	Tosca.PressSrv*Create:Allowed=Admin
    

    Declare the user ‘Administrator’ from a given machine ‘console’ to belong to an Admin group that is referenced in all groups:

    	Tosca.PressSrv.Group.Admin:Administrator@console
    

    Now, declare Forex as a group that contains Admin, the user sarah and the user joe on the machine mars

    	Tosca.PressSrv.Group.Forex:Admin,sarah@,joe@mars
    

    And the FixedIncome group containing Admin, the user keith on the machine mars and anyone on machine mercury.

    	Tosca.PressSrv.Group.FixedIncome:Admin,keith@mars,@mercury
    

    Now create the server Access and Create keys. Access is allowed only to the FixedIncome and Forex groups (and by implication the Admin group). Create permission, which governs new sources, is allowed only to Admin:

    	Tosca.PressSrv.Permission.Access:Allowed=Forex,FixedIncome
    	Tosca.PressSrv.Permission.Create:Allowed=Admin
    

    Now restrict access to each RTSource by group

    	Tosca.PressSrv.Permission.FIAnalytics.Access:Allowed=FixedIncome
    	Tosca.PressSrv.Permission.FXAnalytics.Access:Allowed=Forex
    	Tosca.PressSrv.Permission.FIAnalytics.Create:Allowed=Admin
    	Tosca.PressSrv.Permission.FXAnalytics.Create:Allowed=Admin
    

    Now restrict access to each Instrument by user

    	Tosca.PressSrv.Permission.FIAnalytics.INST1.Access:Allowed=FixedIncome
    	Tosca.PressSrv.Permission.FIAnalytics.INST2.Access:Allowed=Admin,keith@mars
    	Tosca.PressSrv.Permission.FXAnalytics.INST1.Access:Allowed=Forex
    	Tosca.PressSrv.Permission.FXAnalytics.INST2.Access:Allowed=sarah@
    

    The permissions system offers the administrator tight control over who can access given items in The Publishing Server. However, the security mechanism supplied by the server is based on trust between the server and its clients, since the clients themselves identify the user and host for the connection.

    Sources

    This component may optionally have a management source that provides statistics on current usage. For the client, the name of the source is:

    	PubClnt.[config]@[machine_name]
    

    For the server it is:

    	PubSvr.[config]@[machine_name]
    

    To use these sources, request the instrument "CATALOGUE". This will return a record containing a list of other records which can be requested and a list of commands that can be executed by the component. The actual content of this record depends on the module in question and is subject to change from time to time.

    Known Problems

    None