Increasingly, we have a requirement to support some sort of symbol mapping or
cross-reference service. One purpose of Dataworks Enterprise was to provide "a
single source of data within Thomson (Thomson Financial)". This means that
there should be a single API through which data can be retrieved and a single
set of data which can be retrieved through that API.

Consequently, symbology mapping is a basic requirement of our system. Currently,
we use XREF to provide naming information, but XREF only handles those symbols
that are available within the GT system. It has a kludge to deal with
Piranha-based sources. However, it has no knowledge of TF symbologies such as
those in use in First Call, ILX, etc. Equally, our customers have their own
symbologies and there are those symbologies that are used by other third
parties, including our competitors. It would be useful if we could support
these too. We need a mechanism to integrate all this.

A mapping service is most like a "super XREF" with ingest from multiple modules
and would simply be a means of mapping a symbol in one system to a symbol in
another.

A navigational service would be code that used "rules" to map one entity to
another. Systems use
either database services or, even, simple name mangling to provide this sort of
service.

A screening service provides the ability to analyse large collections of data
for matches against user-specific sets of criteria. Within Dataworks Enterprise this
may be best achieved by providing a number of very flexible database population
tools and the ADO handler.

This document is specifically concerned with the mapping service, not with
navigation or screening.

The name mapping system at its most basic level is a service that accepts a
request for a source symbol, a source symbology and a destination symbology and
returns the appropriate symbol in that symbology. However, as far as we
can, this whole process should be hidden from the user.

This whole process should be implemented at the client side of the system. This
allows us to:

There are two key points about cross-referencing in general. Firstly, there can
never be a complete solution to this problem. Any service that could be
provided must be "sold" on the basis that it is a best effort, rather than
being 100% accurate. The data issues associated with cross-reference data are
complex and largely out of our control.

Secondly, we have to take into account that there is every likelihood that we
will need to enhance and extend the service we provide, possibly on a
"per-customer basis". As stated above, most institutions have their own internal
naming systems that they use to map names for entities. These are typically
stored in databases and are usually part of some larger systems that are VERY
private. To be able to integrate with these systems, we cannot think in terms
of a single centralized naming service.

It would be useful if some limited access to the service were provided to
downstream modules. For instance, the service should be able to accept a
request using a source symbology and symbol and return an entire record of
potential mappings available.

The service should be extensible so that we can bring new symbols online
whenever we need to. This should not require bringing down the system to add
new symbologies. The service should be able to ingest from any source or be
able to be extended to do so.

The service should be able to advertise the various symbologies it supports.

The implementation of the service should be backward compatible with existing
usage of Dataworks Enterprise.

We also need to take into account the technologies that we have available on a
customer site. For instance, the requirement to support broadcast-only access
to customer sites also imposes some constraints on the design of the system.
Any system that we create for name mapping must be distributable within the
Dataworks Enterprise architecture.

There are also business and contractual issues that would need to be resolved.
If we are going to provide this service, it needs to be separated from the
current contractual requirements to purchase the underlying source of the data.
For instance, to use XREF, customers have to purchase PGE data. If we have a
symbology mapping service, we would need our customers to be licensed for all
the sources that are providing mapping information, which is clearly nonsense.

However, it should be possible to permission mappings so that they can be sold
as separate data entities within the existing Dataworks Enterprise
permissioning system. If mapping is not available there should be an acceptable
degradation in functionality.

There are a number of existing systems within Thomson Financial that attempt to
support some features of symbology mapping services, from basic to very
sophisticated.

Henry is supplied by DataStream and is the basis of the symbology
information provided by PGE. It is a database that provides details of the
relationships between entities such as quotes, stocks, companies, exchanges,
etc.

Navigator is a set of databases providing name lookup information for DataStream
users. Its source of information is an Access Database uploaded once per week,
plus a daily run of a mainframe program (4321B). Navigator is accessed via COM
or Corba using DAF. The data ultimately resides on a SQL database (Sybase, I
believe).

PIO/TKO is a concordance database offering similar services to Henry and is
based in the States. PIO is intended to be the repository for this type of
information in Thomson Financial. PIO tends to have Thomson Financial data and
third-party data only. Like Henry and Navigator, PIO is also accessed using
COM-Corba layers on DAF.

XREF is a Dataworks Enterprise module that ingests PGE symbology information
(actually from Henry). It was originally written to cross-reference ISINs and
PGE symbols although it is now used for a lot more than this. It is not,
contrary to popular opinion, extensible. XREF provides queries through exposing
a Dataworks Enterprise source. These queries typically result in a response
that is either a set of names of matching records or a single record containing
columns for the various names it supports. We are not proposing in this
document to extend or replace XREF but rather to use the information it
currently can provide in a new way.

Note that we are discussing symbology cross-reference. There are a number of
systems
in place whose role is to provide concordance services (e.g. PIO, Henry, etc.).
These systems attempt to provide sophisticated models of the financial
marketplace and expose the symbology mapping as part of that model. These
systems also provide some form of navigational system. For instance, these
systems can satisfy queries like "Give me all the symbols for UK Chemical
Stocks". We do not need anything so complex.

It should be pointed out that the database and ingestion mechanisms for the
various symbology systems vary. Many such systems reside on large databases.
The database may be populated manually or batch updated from other sources. The
databases are typically not accessible directly; instead, some form of "API" is
provided to gain access to the information they hold. This API may be a
function interface in a high-level language, COM or Corba. Alternatively, it
might actually be a file download. Some third parties distribute the data as a
regular CD, others "over the wire". Some vendors have terminal services
for doing lookup (screen scraping?). It would seem to make sense therefore that
we use our normalisation services within Dataworks Enterprise to gain access to
these systems and expose their functionality as a source. This makes our code
more distributable.

The name mapping service will be implemented externally to the cache in order to
simplify development and deployment of the system and to enhance the
extensibility of the system. However, the cache will provide some mechanisms
that employ these services. These mechanisms will be implemented in the
client side of the cache, downstream from the permissioning elements, but
largely hidden from the user.

A new type of source will be introduced into Dataworks Enterprise, which is a
"mapping source". These sources will identify themselves using the source
DataType "Mapping" (instead of "Record", "Page" and "System" currently in use).
These sources will not be visible to the user directly.

The purpose of these sources is to provide the basic name mapping services used
by Dataworks Enterprise. Each mapping source is actually a "front-end" to an
existing system, be it Hawk, PIO, Henry, XREF or some customer system. Source
level permissions would enable us to permission the data.

The mapping source would have the task of:

The mapping response is a record containing a series of fields, one for each
supported symbology that has a non-empty result. The name of the field is the
name of a
symbology and the value is the mapping. Where a value is empty, the mapping
record does not contain the field.

For instance, for the symbol GB0004594973 using symbology ISIN, a mapping source
might return:

The mapping source would be responsible for applying some rules to its local
query. For instance, where the request is for a source symbology that is
granular to company and the destination is a symbology that is quote based, the
mapping source would look for the domicile quote of the primary security of the
company. In the example above, the ISIN number refers to a security, but the
source operates at a quote level and so returns the domicile quote for the
security given.

In the first instance, we would add an additional Dataworks Enterprise Source to
the existing XREF handler to provide a basic mapping function ("XREFMapping").
This would allow us to support some basic symbology queries over a broadcast
link for PGE customers. In the future, I would imagine us providing other
mapping sources for our internal databases, creating mapping sources for file
downloads and possibly providing a prototypical mapping source as a basis of
mapping in customer sites.

Within the cache, symbologies would be converted to Local Names and stored in
this form. Local names are per-process reference counted names. They are passed
across the cache within EndPointData structures as strings.

The main change to sources would be adding functionality to:

Advertise the symbologies they support. By default, a source only supports a
single symbology (effectively, its namespace is a private symbology). Existing
sources would continue to work by virtue of the fact that they ignore the
Symbology property. However, mapping will not be applied to sources that do not
advertise the symbologies they support.

On receipt of a request, they would query the Symbology property of the incoming
request and use this to update their query to the back end system. For
instance, Datastream handlers would prefix DSMnemonic symbology with "U:".
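As an illustration of that request-time step, the following Python sketch shows a handler turning an incoming symbol into a back end query. Only the "U:" prefix for DSMnemonic comes from this document; the function name, the rule table and the sample symbol are hypothetical, and passing unknown symbologies through unchanged is an assumption about how a handler might "ignore" them.

```python
# Hypothetical rewrite rules for a Datastream-style handler. Only the "U:"
# prefix for DSMnemonic is taken from the text; the table itself is assumed.
NATIVE_REWRITES = {
    "DSMnemonic": lambda symbol: "U:" + symbol,
}


def to_backend_query(symbol, symbology=""):
    """Return the string a handler would send to its back end system."""
    if not symbology:
        # Empty Symbology property: legacy request, already a native name.
        return symbol
    rewrite = NATIVE_REWRITES.get(symbology)
    if rewrite is None:
        # Assumption: a symbology the handler does not know is ignored.
        return symbol
    return rewrite(symbol)


# "ICI" is used purely as an illustrative symbol value.
print(to_backend_query("ICI", "DSMnemonic"))  # U:ICI
print(to_backend_query("ICI"))                # ICI
```

Keeping the rules in a table rather than in branching code would let a handler grow its list of supported symbologies without touching the request path, which fits the extensibility requirement stated earlier.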
Existing sources should never receive anything other than their native symbol
set. If they do get anything else, they will ignore it anyway.

For instance, PGE might advertise a single symbology as it only supports a
single mechanism of request. However, a source such as Datastream would be
modified to accept the variety of symbology requests it supports. This would be
achieved by using a new form of createSource() described below. This would
allow the handler to advertise the symbologies that it supports.

During a request, handlers would check the value of the RTItem's Symbology
property. If this property were empty, the handler would continue to operate as
before. This provides backward compatibility. When the symbology property is
filled in, the DS handler would extract the text of the symbology, convert it
to a data reference and then query that reference for the symbologies it
supports in order of preference. The handler then converts the symbology to its
namespace (e.g. for an incoming Swift currency code, the PGE handler might add
the suffix "/") and passes the query on to its existing request processor. Note
that the symbology of a request would not form part of the item discriminator
in the cache.

Some handlers, like those based on Piranha, may optionally choose to extract
multiple symbologies from the data reference and construct a query based on
those. For instance, Piranha sources work best if they are provided with ISIN,
SEDOL and CUSIP numbers.

The extent of the changes required of a particular feed handler will depend on
the number of alternative symbologies it supports.

Each source in the client source list will have a new property (collection?)
that represents the symbologies that the source supports. This will be set by
the source at mount time. This implies a new version of the CreateSource
methods in both the public and private interfaces of the cache. It also implies
that these properties can be shared between the server and client side through
a change to the EndpointData structure. Resolver uses this information to
resolve a query for data. NOTE: Changes to these structures will have an impact
on Archival (see below). We may choose to expose this through an additional
method in the RTSource interface or possibly by exposing an additional
interface on the RTSource object (this will affect both client and server sides
of the system). The data is in the form of a collection of simple strings
providing "standard" names for symbologies, e.g. ["ISIN", "CUSIP", "PGESymbol",
"ILXSymbol"].

The key new module in the cache is the Resolver. In the first instance, this
will be completely hidden from the user. Resolver is located downstream of the
permissioning module (so that the Resolution system can be permissioned) and
will form part of the request chain. By default, Resolver will have no impact
on the existing request/response mechanisms in the cache.

Records (and Items) will have a new Symbology property (a simple string that
might be a single symbology or a collection of symbology mappings, depending on
the state of the request). The following processing is performed:

1) When a bind is issued for any record, the Symbology property may be populated
with the name of the symbology used for the request. By default, this value is
empty. This allows legacy code to continue to operate as before. If the
Symbology property is empty, the requester is implying that they are using the
native symbology of the source, i.e. a name that is valid in the namespace of
the source. In this case, Resolver is unused and the request is propagated as
usual, proceeding to step 6.

2) If the symbology property has a non-zero value (non-empty string), Resolver
will take control of the request. It looks up the destination source in the
RTSources collection. If the source is not present, then the request operates
as before, proceeding to step 6.

3) If the source is present, Resolver checks the symbologies supported by the
source. If the symbology is supported, Resolver packs up the symbology with the
request and sends it to the source (implies a change to the RequestInfo
archive). The symbology is passed with the request as a single RTDataRef
element and we proceed to step 6.

4) If the symbology is not supported, Resolver checks the "Mapping" sources
available. As stated above, "Mapping" sources advertise the mappings they can
perform. Resolver looks for a single-hop mapping. If a single-hop mapping is
found, Resolver issues a request to the mapping source for the mapping data
described above. On receipt of the data or status, Resolver checks the
appropriate field (which may not be present). If the field is missing, Resolver
looks for a different mapping and issues a new request.

Once a mapping has completed, the request is updated by:

In this event we proceed to step 6.

5) If Resolver cannot find a single-hop mapping, it resorts to a two-hop
mapping. Resolver never attempts more than two hops due to the potential for
circular references. A two-hop mapping involves looking for a pair of mapping
sources that support the source symbology, the destination symbology and a
common third symbology. Resolver makes a unique collection of these mapping
pairs. For each pair in the collection, it issues a request for the mapping from
the mapping source that has the source symbology and the common symbology. It
checks the common symbology and, if it is empty, moves to the next in its unique
list. If the common value is populated, it issues a request using this against
the second mapping source. If destination symbology values are returned, the
Resolver constructs an RTDataRef containing the union of all mappings provided.
In this manner the resolver searches for mappings between quite disparate
Dataworks Enterprise sources.

6) The request is issued to the source.

As part of this development there will be a change to the archive structures
within the cache. This will be done in such a way that the archive will be
backward compatible for downstream clients. The new Archive structure will
introduce a prefixed length for the structures in the cache (Record, Status,
RequestInfo, EndpointData, Field) to allow us to extend these structures in
future versions of the system without additional changes to the basic structure
of the Archive itself. Length prefixes would have variable length to reduce the
storage overhead, which could be large in the case of fields.

When packing an Archive, the module will prefix its data with its length. This
might involve caching the lengths of the structures for performance reasons.

When unpacking an Archive, the major structures will test the Archive version. If
the version is older than that which provides prefix lengths it will read the
Archive as it does at present. Otherwise, it will note the current archive
position and read the length. It will then read its structure from the Archive.
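The pack and unpack behaviour for one length-prefixed structure might look like the following Python sketch. Everything concrete here is an assumption made for illustration: the toy Status layout, the version number at which prefixes start, and the 7-bits-per-byte variable-length encoding (the document only says the prefix has variable length).

```python
import io

PREFIX_VERSION = 2  # assumed: first Archive version that carries length prefixes


def write_varint(out, value):
    """Write a non-negative integer, 7 bits per byte (assumed encoding)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp):
    """Read an integer written by write_varint."""
    shift = 0
    result = 0
    while True:
        byte = inp.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def pack_status(out, version, code, text, extra=b""):
    """Pack a toy Status; `extra` stands in for fields added by a later version."""
    body = bytes([code, len(text)]) + text.encode("ascii") + extra
    if version >= PREFIX_VERSION:
        write_varint(out, len(body))  # prefix the data with its length
    out.write(body)


def unpack_status(inp, version):
    """Read the fields this reader knows about, then skip anything newer."""
    length = read_varint(inp) if version >= PREFIX_VERSION else None
    body_start = inp.tell()
    code = inp.read(1)[0]
    text_len = inp.read(1)[0]
    text = inp.read(text_len).decode("ascii")
    if length is not None:
        # Reposition by the stored length so unknown trailing fields are skipped.
        inp.seek(body_start + length)
    return code, text


archive = io.BytesIO()
pack_status(archive, 2, 7, "stale", extra=b"\x01\x02\x03")  # newer writer
pack_status(archive, 2, 0, "ok")
archive.seek(0)
print(unpack_status(archive, 2))  # (7, 'stale')
print(unpack_status(archive, 2))  # (0, 'ok')
```

The second read succeeds only because the first one honoured the length prefix; without it, the three extra bytes would be misread as the start of the next structure.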
Once the read is complete, the code will reposition the Archive according to
the length. However, this means that clients prior to the current revision of
the Archive will not be able to read Archives from the first version that
supports these prefixes.

Many of the mapping sources have incomplete data.

Many of the mapping sources have incorrect data.

For a particular symbology, values are dependent on parties. Consequently, what
DS thinks is the ISIN may not be the same as what First Call thinks.

The databases update at different times, generating inconsistencies.

There is no mechanism in this model for generating updates to symbology.

What is Resolution?
Mapping Service
Navigational Service
Screening Service
Basic Requirements
Current Solutions
Henry
Navigator
PIO/TKO
XREF
Glossary
Resolution and Dataworks Enterprise
Mapping Sources
Field Name    Field Value
DSCODE        900455
ISIN          GB0004594973
TIDM          ICI
PGESYMBOL     ICI.L
SEDOL         0459497
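
The field names and values above are the example mapping response for the symbol GB0004594973 in symbology ISIN. A minimal Python sketch of such a record, and of the basic lookup that takes a source symbol, a source symbology and a destination symbology, could look like this. The class and function names are hypothetical and are not part of Dataworks Enterprise; a real mapping source would front an existing system rather than an in-memory list.

```python
# Example mapping record taken from the table above. A field whose value
# would be empty is simply absent from the record.
ICI_RECORD = {
    "DSCODE": "900455",
    "ISIN": "GB0004594973",
    "TIDM": "ICI",
    "PGESYMBOL": "ICI.L",
    "SEDOL": "0459497",
}


class ToyMappingSource:
    """Hypothetical in-memory stand-in for a mapping source."""

    def __init__(self, records):
        self._records = list(records)

    def symbologies(self):
        """Advertise every symbology that appears in at least one record."""
        return sorted({name for record in self._records for name in record})

    def lookup(self, symbol, symbology):
        """Return the entire mapping record for a symbol, or None."""
        for record in self._records:
            if record.get(symbology) == symbol:
                return dict(record)
        return None


def map_symbol(source, symbol, from_symbology, to_symbology):
    """Best-effort mapping: returns None when no mapping is available."""
    record = source.lookup(symbol, from_symbology)
    if record is None:
        return None
    return record.get(to_symbology)


source = ToyMappingSource([ICI_RECORD])
print(map_symbol(source, "GB0004594973", "ISIN", "PGESYMBOL"))  # ICI.L
print(map_symbol(source, "GB0004594973", "ISIN", "CUSIP"))      # None
```

Returning None for a missing field mirrors the rule that an empty mapping is represented by the absence of the field, which leaves the caller free to try another mapping source.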
Changes to Existing Sources
Changes to the Cache
Changes to the RTSource Object
Resolver
Changes to the Archive Structure
Unresolved Issues
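
As a compact summary of the Resolver processing (steps 1 to 6 in the main text), the following Python sketch walks the same decision order: native request, directly supported symbology, single-hop mapping, then two-hop mapping through a common third symbology. The names, the dictionary standing in for an RTDataRef, the "HOUSE" customer symbology and the choice to return None when nothing maps are all assumptions made for illustration; "union of all mappings" is read here as a union across every successful pair.

```python
class MappingSource:
    """Hypothetical mapping source holding records keyed by symbology name."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def symbologies(self):
        return {key for record in self.records for key in record}

    def lookup(self, symbol, symbology):
        for record in self.records:
            if record.get(symbology) == symbol:
                return record
        return None


def resolve(symbol, symbology, dest_symbologies, mapping_sources):
    """Return {symbology: symbol} to send with the request, or None."""
    # Steps 1 and 2: native request, or destination source not known.
    if not symbology or dest_symbologies is None:
        return {symbology: symbol}
    # Step 3: the destination supports the requested symbology itself.
    if symbology in dest_symbologies:
        return {symbology: symbol}
    wanted = set(dest_symbologies)
    # Step 4: single-hop mapping; a missing field means try the next source.
    for source in mapping_sources:
        advertised = source.symbologies()
        if symbology in advertised and advertised & wanted:
            record = source.lookup(symbol, symbology) or {}
            found = {k: v for k, v in record.items() if k in wanted}
            if found:
                return found
    # Step 5: two-hop mapping, never more than two hops.
    result = {}
    for first in mapping_sources:
        for second in mapping_sources:
            if first is second or symbology not in first.symbologies():
                continue
            if not second.symbologies() & wanted:
                continue
            shared = first.symbologies() & second.symbologies()
            for common in sorted(shared - {symbology} - wanted):
                link = (first.lookup(symbol, symbology) or {}).get(common)
                if not link:
                    continue  # common value empty: move to the next candidate
                record = second.lookup(link, common) or {}
                for key, value in record.items():
                    if key in wanted:
                        result.setdefault(key, value)
    return result or None  # step 6 would issue the request with this value


xref = MappingSource("XREFMapping", [
    {"ISIN": "GB0004594973", "PGESYMBOL": "ICI.L", "SEDOL": "0459497"},
])
customer = MappingSource("CustomerMap", [
    {"SEDOL": "0459497", "HOUSE": "ICI-001"},  # invented customer symbology
])
print(resolve("GB0004594973", "ISIN", {"PGESYMBOL"}, [xref, customer]))
print(resolve("GB0004594973", "ISIN", {"HOUSE"}, [xref, customer]))
```

The first call is satisfied in a single hop by the XREF-style source; the second needs the SEDOL value from that source as the link into the customer source.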