Sensor Web Services and Real Time Data
The SensorGrid project aims to develop a flexible computing environment for coupling real-time data sources to High Performance Geographic Information Systems (GIS) applications. The system will be developed around Service Oriented Architecture (SOA) principles and High Performance Web Services ideas: real-time sources publish their data products via standard interfaces in common formats, and clients have access to both streaming and archived data.
Traditionally, GIS applications such as simulation, visualization and data mining tools have been developed to access and manipulate archived geospatial data. Advances in real-time data acquisition from various types of sensors have introduced a new group of applications that are designed to run on real-time data and provide real-time or near-real-time results. Such applications are gaining ground in areas like Crisis Management and Early Warning Systems because they allow authorities to take action in time. Earthquake data assimilation tools are good examples of this group since they use data from seismic or GPS sensors. However, most of these tools currently consume data from repositories and do not have access to real-time data.
The SensorGrid architecture will use open GIS standards and Web Services methodologies to couple data assimilation tools with real-time data. The system will use NaradaBrokering as the messaging substrate, which will allow high performance data transfer between data sources and client applications. Standard GIS interfaces and encodings like GML and SensorML will make the data products available to the larger GIS community.
Figure 1. Major SensorGrid Components
Figure 1 depicts a client's interaction with SensorGrid to gain access to streaming sensor data. The client discovers the relevant Sensor Collection Service (SCS) by using the search interfaces provided by the Information Service (IS). The IS returns a handler that contains the WSDL address of the SCS with access to the particular sensor the client requests. The client then sends a getData query to the SCS. Depending on the nature of the query, the SCS takes one of two actions. If the query is for archived sensor data, it requests the data from the Observation Archives and returns it to the client; if the result is relatively small it is returned in a SOAP message, otherwise NaradaBrokering is used for the transfer. If the client wants real-time data, the SCS returns a data handler that contains the broker information and the topic name for the sensor. After receiving the broker address and the topic name, the client may subscribe to the NaradaBrokering server to receive real-time data. The SCS also keeps information about the sensors themselves, encoded in SensorML.
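The decision the SCS makes for a getData query can be sketched as follows. This is only an illustration of the flow described above: the function name, the size threshold, the broker address and the topic naming are all hypothetical and are not taken from the SensorGrid code.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the text only says "relatively small".
SOAP_MAX_BYTES = 64 * 1024


@dataclass
class DataHandler:
    """What a client needs in order to subscribe: broker address and topic name."""
    broker: str
    topic: str


def get_data(sensor_id, realtime, archive, broker="broker.example.org:3045"):
    """Return (kind, payload) for a getData query.

    kind is "realtime" or "broker" when a DataHandler is returned,
    and "soap" when the archived data is returned inline.
    """
    topic = f"/sensors/{sensor_id}"          # hypothetical topic naming scheme
    if realtime:
        return "realtime", DataHandler(broker, topic)
    data = archive[sensor_id]                # stand-in for the Observation Archives
    if len(data) <= SOAP_MAX_BYTES:
        return "soap", data                  # small result: return in the SOAP response
    return "broker", DataHandler(broker, topic)  # large result: transfer via the broker
```

A client would inspect the returned kind to decide whether to read the payload directly or to subscribe to the named topic.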
GIS Standards for Data Services
Advances in Internet and distributed systems helped academia, governments and businesses to provide access to a substantial amount of geospatial data. The GIS community must face the following challenges:
- Adoption of universal standards: Over the years organizations have produced geospatial data in proprietary formats and developed services by adhering to differing methodologies.
- Distributed nature of geospatial data: Because the data sources are owned and operated by individual groups or organizations, geospatial data resides in widely distributed repositories.
- Service interoperability: Computational resources used to analyze geospatial data are also distributed and must be able to be integrated when necessary.
The Open Geospatial Consortium, Inc (OGC) represents a major effort to address some of these problems. The OGC is an international industry consortium of more than 270 companies, government agencies and universities participating in a consensus process to develop publicly available interface specifications. OGC Specifications support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT. OGC has produced many specifications for web based GIS applications such as Web Feature Service (WFS) and the Web Map Service (WMS). Geography Markup Language (GML) is widely accepted as the universal encoding for geo-referenced data. In addition to the more traditional HTTP request/response style services, the OGC is also defining the SensorML family of services.
The GIS community quite obviously represents a major sub-domain in the “Grid of Grids” [Building a Grid of Grids- Messaging Substrates and Information Management] picture. By architecting GIS services using Web Services, and by placing these services within a SOA messaging substrate, we may integrate GIS Grid Services with other applications. Our work on GIS services as Web Services is described in more detail in these papers: "Complexity Computational Environment: Data Assimilation SERVOGrid" and "Numerical Simulations for Active Tectonic Processes".
GIS applications developed by various vendors and academic institutions have become more complex as they are required to process larger data sets, use more computing power and, increasingly, collect data from distributed sources. Traditionally GIS applications are data centric: they deal with archived data. However, with sensor-based applications gaining momentum, the need to couple real-time data sources such as sensors, radars, or satellites to high end computing platforms such as simulation, visualization or data mining applications introduces several important distributed computing challenges to the GIS community.
Although commercial GIS applications provide various solutions to these problems, most of the solutions are based on more traditional distributed computing paradigms such as static server-client approaches. Traditional point-to-point communication tends to result in more centralized, tightly coupled and synchronous applications, which makes large scale systems harder to manage. Modern large scale systems, on the other hand, require more flexible asynchronous communication models to cope with the high number of participants and the transfer of larger data sets between them.
Defining a Common Data Format
The first step in building such services is to decide on appropriate encodings for describing the data. The data format matters because it becomes the basic building block of the system, which in turn determines the level of interoperability. Use of a universal standard like XML greatly increases the number of users from different backgrounds and platforms who can easily incorporate our data products into their systems. Furthermore, services and applications are built to parse, understand and use this format to support various operations on data. So in a sense the type and variety of the tools used in the development and data assimilation processes depend on the format initially agreed upon.
For these reasons we use GML, a commonly accepted XML based encoding for geospatial data, as our data format in GIS-related applications. One important fact about GML is that, although it offers particular complex types for various geospatial phenomena, users can employ a variety of XML Schema development techniques to describe their data using GML types. This provides a certain degree of flexibility both in the development process and in the resulting data products. For instance, depending on the capability of the environment schema developers may exclusively use certain XML Schema types and choose not to incorporate more obscure ones because of incompatibility issues. As a result a particular geospatial phenomenon can be described by different valid GML schemas.
By incorporating GML in our systems as the de facto data format we gain several advantages:
- It allows us to unify different data formats. For instance, various organizations offer different formats for position information collected from GPS stations. GML provides suitable geospatial and temporal types for this information, and by using these types a common GML schema can be produced. (See http://www.crisisgrid.org/html/servo.html for sample GML schemas for GPS and Seismic data)
- As more GIS vendors release compatible products and more academic institutions use OGC standards in their research and implementations, OGC specifications are becoming universal standards in the GIS community and GML is rapidly emerging as the standard XML encoding for geographic information. By using GML we open the door of interoperability to this growing community.
- GML and related technologies allow us to build a general set of tools to access and manipulate data. Since GML is an XML dialect, any XML related technology can be used for application development. In most cases the technologies for collecting data, and consequently the nature of the collected data products, stay the same for a long period of time, so the interfaces we create for sharing data will not change either. This ensures stable interfaces and libraries.
Establishing XML or some flavor of it as the default message/data format for the global system requires consideration of a Data Binding Framework (DBF) for generating, parsing, marshalling and un-marshalling XML messages. Marshalling and un-marshalling operations convert between XML-encoded formats and (typically Java) binding classes that can be used to simplify data manipulation.
The ability to generate and parse XML instances in a tolerable amount of time is one criterion for choosing such a framework, because message processing time affects overall system performance as well as the performance of the individual XML processing component.
Another criterion is the ability of the binding framework to generate valid instances according to the Schema definitions. This is a major problem for DBFs since not all XML Schema types can be mapped directly to Object Oriented Programming constructs. Some XML Schema constructs (such as Substitution Groups, which are heavily used in GML Schemas) do not correspond to types in the Object Oriented world, and this causes difficulties while processing the XML documents. Various Data Binding Frameworks offer different solutions, some more elaborate than others, and a suitable framework must be chosen depending on the nature of the data.
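To make the marshalling and un-marshalling terminology concrete, here is a minimal hand-written round trip between a binding class and its XML form. It is a sketch only: the element and attribute names are invented for illustration and are not the SOPAC GML schema, and a real Data Binding Framework would generate such classes from the Schema rather than have them written by hand. Python's standard xml.etree.ElementTree module stands in for the framework.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class StationPosition:
    """Hypothetical binding class for one station position."""
    station: str
    lat: float
    lon: float
    height: float


def marshal(pos):
    """Binding object -> XML text."""
    root = ET.Element("StationPosition", {"station": pos.station})
    for name in ("lat", "lon", "height"):
        # repr() of a float round-trips exactly through float()
        ET.SubElement(root, name).text = repr(getattr(pos, name))
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """XML text -> binding object."""
    root = ET.fromstring(xml_text)
    return StationPosition(
        station=root.attrib["station"],
        lat=float(root.findtext("lat")),
        lon=float(root.findtext("lon")),
        height=float(root.findtext("height")),
    )
```

This flat record maps cleanly onto a class; the difficulty described above arises with Schema constructs such as substitution groups, which have no equally direct counterpart.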
GIS systems are supposed to provide data access tools to users as well as manipulation tools to administrators. In principle, serving data in a particular format is simple when it is made accessible as files on an HTTP or FTP server. But additional features like query capabilities on data or real-time access in a streaming fashion require more complicated services. As the complexity of the services grows, the client's chance of easily accessing data products decreases, because every proprietary application developed for some type of data requires its own specialized clients. Web Services help us overcome this difficulty by providing standard interfaces to the tools or applications we develop.
No matter how complex the application itself, its WSDL interface will have standard elements and attributes, and the clients using this interface can easily generate methods for invoking the service and receiving the results. This method allows providers to make their applications available to others in a standard way.
The usefulness of Web Services is constrained by several factors. They are best suited to cases where:
- The volume of data transferred between the server and the client is not high. The actual amount of data that can be transferred depends on a number of factors such as the protocol being used to communicate or the maximum size allowed by HTTP.
- Time is not a determining factor. Despite the obvious advantages, current HTTP-based implementations do not provide desirable results for systems that require fast response and high performance. This is due to the delays caused by data transfer over the network, network constraints, and HTTP request-response overhead.
Most scientific applications that couple high performance computing, simulation or visualization codes with databases or real-time data sources require more than mere remote procedure call message patterns. These applications are sometimes composite systems in which some components require output from others, and they are asynchronous: they may take hours or days to complete. Such properties require additional layers of control and capabilities from Web Services, which introduces the need for a messaging substrate that can provide these extra features.
Streaming with NaradaBrokering
The Community Grids Lab has been developing NaradaBrokering, a distributed messaging infrastructure that goes beyond the remote procedure call methodology on which the pure Web Services approach is based. It provides two related capabilities. First, it provides a message oriented middleware (MoM) which facilitates communications between entities (including clients, resources, services and proxies) through the exchange of messages. Second, it provides a notification framework by efficiently routing messages from the originators to only the registered consumers of the message in question.
NaradaBrokering facilitates the idea of loosely coupled systems by supporting asynchronous communication and it can be used to support different interactions by encapsulating them in specialized messages called events. Events can encapsulate information pertaining to transactions, data interchange, method invocations, system conditions and finally the search, discovery and subsequent sharing of resources.
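The idea of routing events only to registered consumers can be illustrated with a toy, in-process topic broker. This is not the NaradaBrokering API; the class and method names are made up for the example, and delivery here is a synchronous function call rather than asynchronous network messaging.

```python
from collections import defaultdict


class ToyBroker:
    """Minimal topic-based publish/subscribe, for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to consumers registered on this topic; return the count."""
        consumers = list(self._subscribers.get(topic, ()))
        for callback in consumers:
            callback(event)
        return len(consumers)


broker = ToyBroker()
received = []
broker.subscribe("gps/positions", received.append)

broker.publish("gps/positions", {"station": "ABCD"})   # delivered to the subscriber
broker.publish("seismic/events", {"magnitude": 4.1})   # nobody registered, not delivered
```

The publisher never refers to its consumers directly, which is the loose coupling the text describes.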
Some of the important features of NaradaBrokering can be summarized as follows:
- Ensures reliable delivery of events in the case of broker or client failures and prolonged entity disconnects.
- Provides compressing and decompressing services to deal with events with large payloads. Additionally there is also a fragmentation service which fragments large file-based payloads into smaller ones. A coalescing service then merges these fragments into the large file at the receiver side.
- Provides support for multiple transport protocols such as TCP (blocking and non-blocking), UDP, SSL, HTTP, RTP, HHMS (optimized for PDA and cell-phone access) and GridFTP with protocol chosen independently at each link
- Implements high-performance protocols (message transit time of 1 to 2 ms per hop)
- Order-preserving optimized message delivery
- Quality of Service (QoS) and security profiles for sent and received messages
- Interface with reliable storage for persistent events, reliable delivery via WS-Reliable Messaging.
- Discovery Service to find nearest brokers/resources
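The compression, fragmentation and coalescing services listed above can be pictured with the following sketch. It uses Python's standard zlib module and an invented (index, total, chunk) fragment layout; it shows the general technique only and makes no claim about how NaradaBrokering implements these services.

```python
import zlib


def fragment(payload, size):
    """Compress a payload and split it into (index, total, chunk) fragments."""
    packed = zlib.compress(payload)
    chunks = [packed[i:i + size] for i in range(0, len(packed), size)]
    return [(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def coalesce(fragments):
    """Reassemble fragments (in any order) and decompress the original payload."""
    ordered = sorted(fragments, key=lambda f: f[0])
    total = ordered[0][1]
    if [f[0] for f in ordered] != list(range(total)):
        raise ValueError("missing or duplicate fragments")
    return zlib.decompress(b"".join(chunk for _, _, chunk in ordered))
```

Because each fragment carries its index and the total count, the receiver can detect an incomplete set before attempting to decompress.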
SOPAC GPS Services: Real Time Streaming Support for Position Messages
To demonstrate the use of technologies discussed earlier we describe GPS Services developed for the Scripps Orbit and Permanent Array Center (SOPAC) GPS station networks. Three of SOPAC’s GPS networks are distributed in San Diego Counties, Riverside/Imperial Counties and Orange County, and provide publicly available data. Raw data from the GPS stations are continuously collected by a Common Link proxy (RTD server) and archived in RINEX files.
The data collected from the GPS stations are served in 3 formats:
- RAW: For archiving and record purposes, not interesting for scientific applications, not available in real-time.
- RTCM: Published real-time and no records are kept. This is useful for RTCM capable GPS receivers as reference.
- Positions: Positions of the stations, updated and presented every second. (The Orange County stations were recently upgraded to 2 Hz.) GPS time series can be produced from these positions at different epochs such as hourly, daily, etc.
Position information is used by applications such as RDAHMM (Regularized Deterministic Annealing EM for Hidden Markov Models), a time series data analysis program useful for mode change detection [Robert Granat, Regularized Deterministic Annealing EM for Hidden Markov Models, Doctoral Dissertation, University of California Los Angeles, 2004.]. The RTD server however outputs the position messages in a binary format called RYO. This introduces another level of complexity on the client side because the messages have to be converted from binary RYO format.
To receive station positions, clients are expected to open a socket connection to the RTD server. An obvious downside of this approach is the extensive load this might introduce to the server when multiple clients are connected.
After the RTD server receives raw data from the stations it applies filters and generates a message for each network. This message contains position information for every individual station from which position data has been collected at that particular instant. In addition to the position information, a message contains other measurements such as the quality of the measurement, variances etc. For each GPS network, the RTD server broadcasts one position message per second through a port in RYO format. This is depicted on the left-hand side of Figure 3.
As we discuss below, we use the NaradaBrokering messaging system to make the position information available to clients in a real-time streaming fashion. Additionally, we developed applications that serve position messages in ASCII and GML formats. This allows applications to choose the format they want, and it will additionally allow us to implement more finely grained network subscriptions: users and applications do not have to process an entire network's stream to receive the subset of GPS stations that they want. RDAHMM provides a specific example: we need to apply RDAHMM change detection to individual GPS station signals.
Decoding RYO Messages
As shown in Figures 5 and 6, the incoming data streams must be converted into various formats. This is done by specialized services that we developed; they subscribe to specific topics and republish the decoded data to topics associated with the new format.
For example, the RYO Message Type 1 starts with a 5-byte Header which is followed by a 47-byte GPS Position message. Three types of optional blocks may follow the Position Message and a 2-byte checksum is located at the end of the message.
Figure 2. RYO Binary Message Format
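Using only the sizes given above (5-byte header, 47-byte position message, 2-byte trailing checksum), a buffer holding one complete RYO Type 1 message can be split into its parts as sketched below. The function name is hypothetical, the fields inside each part are deliberately left uninterpreted because their layout is not given here, and the checksum is extracted but not verified because the algorithm is not stated.

```python
HEADER_LEN = 5       # bytes, from the description above
POSITION_LEN = 47    # bytes, from the description above
CHECKSUM_LEN = 2     # bytes, from the description above


def split_ryo_message(message):
    """Split one complete RYO Type 1 message into its raw parts.

    Assumes `message` contains exactly one message; anything between the
    position message and the checksum is returned as the optional blocks.
    """
    minimum = HEADER_LEN + POSITION_LEN + CHECKSUM_LEN
    if len(message) < minimum:
        raise ValueError(f"message too short: {len(message)} < {minimum} bytes")
    position_end = HEADER_LEN + POSITION_LEN
    checksum_start = len(message) - CHECKSUM_LEN
    return {
        "header": message[:HEADER_LEN],
        "position": message[HEADER_LEN:position_end],
        "optional_blocks": message[position_end:checksum_start],
        "checksum": message[checksum_start:],
    }
```

A message without optional blocks is exactly 54 bytes long, in which case the "optional_blocks" entry is empty.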
A non-blocking Java socket connection is made to the RTD server to collect RYO messages; we use threads for this purpose. A Decoder application, which uses binary conversion tools, converts RYO messages into text messages. Furthermore, since we do not expect clients to know about the GPS time format, we convert GPSWeek and GPSmsOfWeek values to Gregorian calendar format (e.g. 2005-19-07/04:19:44PM-EST). Additionally, since we anticipate that some clients will expect position information in terms of latitude and longitude, we calculate Latitude, Longitude and Height values from the XYZT position.
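Both conversions mentioned above can be sketched in a few lines. The sketch makes several assumptions that are not stated in the text: the GPS week is a full (non-rolled-over) week count, the X, Y, Z values are Earth-centered Earth-fixed coordinates in metres, and the target ellipsoid is WGS84. The leap-second offset between GPS time and UTC is left as a parameter for the caller to supply. The function names are hypothetical and this is not the Decoder's actual code.

```python
import math
from datetime import datetime, timedelta, timezone

# Start of GPS time: 1980-01-06 00:00:00 UTC.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# WGS84 ellipsoid constants (assumed reference ellipsoid).
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def gps_to_utc(gps_week, ms_of_week, leap_seconds=0):
    """Convert GPS week and milliseconds-of-week to a calendar datetime.

    leap_seconds is the GPS-minus-UTC offset at that time; with the default
    of 0 the result stays on the GPS time scale.
    """
    gps_time = GPS_EPOCH + timedelta(weeks=gps_week, milliseconds=ms_of_week)
    return gps_time - timedelta(seconds=leap_seconds)


def ecef_to_geodetic(x, y, z, iterations=10):
    """Convert ECEF X, Y, Z (metres) to latitude, longitude (degrees), height (metres).

    Uses a simple fixed-point iteration; it is not valid exactly at the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                        # distance from the rotation axis
    lat = math.atan2(z, p * (1.0 - WGS84_E2))   # initial guess, assuming height 0
    height = 0.0
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
        height = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - WGS84_E2 * n / (n + height)))
    return math.degrees(lat), math.degrees(lon), height
```

Formatting the resulting datetime for display, including any conversion to a local time zone, would be a separate step.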
GML Schema for Position Messages and Data Binding
We have developed a GML conformant Schema to describe Position Messages. The Schema is based on RichObservation type which is an extended version of GML 3 Observation model. This model supports Observation Array and Observation Collection types which are useful in describing SOPAC Position messages since they are collections of multiple individual station positions. We follow strong naming conventions for naming the elements to make the Schema more understandable to the clients.
We used Apache XMLBeans for data binding, which lets us convert the ASCII data streams into XML. The SOPAC GML Schema and sample instances are available here: http://www.crisisgrid.org/schemas
Integrating NaradaBrokering with Streaming GPS Measurements
Once we have services for decoding position information into three different formats, we may integrate these services with NaradaBrokering to provide real-time access to data. The following figures depict the use of NaradaBrokering topics in the system. Figure 3 depicts the flow of data to interested subscribers: applications like RDAHMM, databases for permanent storage, and portal systems (such as QuakeSim) for human interaction. To support these various consumers, we must provide different versions of the data stream.
The basic routing techniques are illustrated in the following figure. The GPS network data streams are collectively made available by Scripps through ports 7010, 7011 and 7012. These three ports serve all the data from three distinct networks. The data is published in RYO format. We intercept this data through Java proxies that act as publishers on the RYO topics to a NaradaBrokering node. Subscribers to this topic may be any number of applications capable of handling these binary formats, including translation programs. As shown in the figure, these streams are translated into ASCII text formats by RYO Decoders. These decoders then publish the data back to the broker network on new topics, ASCII topics in the figure. Any number of listening applications may receive this data, including (as shown in the figure), GML Converters that transform the ASCII streams into GML (suitable for GIS applications) and publish back to new GML topics.
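The chain of topics described above (binary in, ASCII out, GML out) can be mimicked with a few in-process functions. Everything in this sketch is a stand-in: the topic names are invented, the binary record layout is a made-up four-character station id plus three doubles rather than the real RYO format, the XML is a placeholder rather than the SOPAC GML schema, and the broker is a dictionary of callbacks rather than NaradaBrokering.

```python
import struct
from collections import defaultdict

_subscribers = defaultdict(list)   # topic -> handlers


def subscribe(topic, handler):
    _subscribers[topic].append(handler)


def publish(topic, message):
    for handler in list(_subscribers[topic]):
        handler(message)


# Hypothetical topic names.
RYO_TOPIC = "SOPAC/GPS/RYO"
ASCII_TOPIC = "SOPAC/GPS/ASCII"
GML_TOPIC = "SOPAC/GPS/GML"

# Stand-in binary layout (NOT the real RYO format): station id + X, Y, Z.
RECORD = struct.Struct("<4s3d")


def ryo_decoder(raw):
    """Subscribe to binary records, republish them as ASCII text."""
    station, x, y, z = RECORD.unpack(raw)
    publish(ASCII_TOPIC, f"{station.decode('ascii')} {x!r} {y!r} {z!r}")


def gml_converter(line):
    """Subscribe to ASCII text, republish it as (placeholder) XML."""
    station, x, y, z = line.split()
    publish(GML_TOPIC, f'<Position station="{station}"><pos>{x} {y} {z}</pos></Position>')


subscribe(RYO_TOPIC, ryo_decoder)
subscribe(ASCII_TOPIC, gml_converter)

# A consumer interested only in the final format.
gml_messages = []
subscribe(GML_TOPIC, gml_messages.append)
publish(RYO_TOPIC, RECORD.pack(b"ABCD", 1.5, 2.5, 3.5))
```

Each stage knows only the topic it reads from and the topic it writes to, so further converters or filters can be added without touching the existing ones.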
Figure 3. GPS network integrated with NaradaBrokering.
Currently the system is being tested for the following networks: San Diego Counties, Riverside/Imperial Counties and Orange County GPS networks. The following table shows the station names and RTD server port numbers for accessing RYO binary messages collected from these stations:
|RTD Port No|7010|7011|7012|
|GPS Network|Orange County|Riverside/Imperial Counties|San Diego County|
The following tables show the current information for NaradaBrokering Server and topic names:
NaradaBrokering Server: xsopac.ucsd.edu:3045
We may add more filters to the data and develop more finely grained topics. For example, after decoding the binary stream, we may publish the individual GPS station data streams to individual topics.
SensorGrid Downloadable Software
Software is available from http://www.crisisgrid.org/files/sensorgrid/. We are in the process of moving this to a SourceForge repository.
OGC Sensor Web Enablement
OGC SWE is intended to be a revolutionary approach for exploiting Web-connected sensors such as flood gauges, air pollution monitors, satellite-borne earth imaging devices etc.
The goal of SWE is the creation of Web-based sensor networks, that is, to make all sensors and repositories of sensor data discoverable, accessible and, where applicable, controllable via the WWW.
OGC SWE Specifications
OGC defines a set of specifications and services for this goal. Below are short descriptions of these services and the names and dates of the OGC documents I have covered.
- Sensor Web Enablement White Paper; Mike Botts, Mark Reichardt. A summary of the OGC’s SWE vision and the Sensor Web Services.
- SensorML: XML encoding language for sensors. Used to discover, query and control Web-resident sensors.
ref# : 04-019 date : 2004-09-24
- Observations & Measurements: The general models and an XML encoding for what a sensor observes or measures (The value returned by or derived from a sensor observation -e.g. quantity, count, boolean, category, ordered category, position-).
ref# : 03-022r3 date : 2003-02-04
- Sensor Collection Service: A service to fetch observations from a sensor or constellation of sensors. Provides real time or archived observed values. Clients can also obtain information that describes the associated sensors and platforms. This is the intermediary between a client and a sensor collection management environment.
ref# : 03-023r1 date : 2003-01-21
- Sensor Planning Service: A service by which a client can determine collection feasibility for a desired set of collection requests for one or more mobile sensors/platforms, or the client may submit collection requests directly to these sensors/platforms. SPS enables sensor tasking, acquisition requests, processing and simulation requests, and registration for alert notification.
ref# : 03-011r1 date : 2003-01-14
- Web Notification Service: A service by which a client may conduct a dialog with one or more other services. This service is useful when many collaborating services are required to satisfy a client request, and/or when significant delays are involved in satisfying the request. Provides a means for Sensor Planning Services to alert people, software, or other sensor systems of SPS results or alerts regarding phenomena of interest.
ref# : 03-008r2 date : 2003-01-21
Figure 1. Possible Configuration of Sensor Web Components for In-Situ Sensors 
The SWE process and its documents have undergone significant changes. Some of these changes, together with my comments on the services, are given here; each description is taken from the corresponding specification:
- The latest SensorML Recommendation Paper is expected to be released on 11-02-04 as document # 04-019. Other SWE specifications refer to older versions of SensorML; we should try to use the latest version as long as the XML Schemas of the other services validate.
- The Observations & Measurements draft document brings some enhancements to the GML3 Observations schema. The goal of these additions is to make the Observation concept more compatible with sensor data, so that data retrieved from sensors can be described easily. New complex types like ObservationArray and ObservationCollection are useful for defining time-series data fetched from GPS stations.
- Sensor Collection Service is perhaps the center piece of the SWE architecture. It is supposed to provide information about the sensors or sensor platforms, return the capabilities of the sensors or constellations, and fetch observed values either directly from the sensors or from the data archives.
This service is built on the OGC OWS1.2 specification and is not defined in detail. The specification is not up to date; for instance, the Query element in the schemas is based on Web Registry Service (WRS) 0.7.1, which is not available online and no longer has a chance of becoming a recommendation. I used OGC Filter Encoding to provide the Filter capabilities for Query. The capabilities document is based on the OWS-1.2 specification, which provides schemas to generate an OGC_Capabilities document.
- Web Notification Service. This is an asynchronous service description which is intended to be a generic implementation for all OGC services that might require notification services. A Notification service is especially useful in cases where the service trading (publish-find-bind) paradigm is not sufficient. Observations that require preceding control activities or intermediate and/or subsequent user notifications favor asynchronous operations. For instance, in the case of an Earthquake Observation process the client of the sensors must be notified of any extraordinary activity. Thus a notification mechanism becomes necessary.
To make use of the notification capabilities, users have to be registered to WNS beforehand. The registration procedure will be performed by a user or by an OGC Service that can act as a proxy for the user. For the registration users must provide identification information and the notification target (e.g. user’s e-mail address, phone number etc.). WNS does not provide a user data verification mechanism.
The registration procedure is not explained in detail. We can extend FTHPIS to provide a user/service registration framework for any particular WNS. Although the role of the registries appears to be very central in SWE, it is discussed only briefly in a white paper and there are no implementation-specific discussions available.
Basically we need registries (or databases) for keeping information about Sensors/Platforms, Sensor/Platform Types, Observable Types, Observable instances etc. A basic implementation is to have tables for each registry in a MySQL database.
Although the OGC provides a WNS, WS-Notification can be considered here as an alternative, which would bring in the other benefits that WS-Notification provides.
We can also include NaradaBrokering in this framework to pass asynchronous notifications from the sensors/constellations to the registered parties. This would provide us with features like reliable delivery, archiving of messages etc.
- Sensor Planning Service. SPS is developed to support the information asset manager role. Collection feasibility plans and collection requests for a sensor or a sensor constellation are managed by SPS.
It looks like this is not one of the crucial parts of SWE that we need to implement immediately. We need to find a new approach for some of the tasks SPS is supposed to do. A probable approach is to use the Information System to find an appropriate SCS for the user's request.
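The registry tables suggested in the Web Notification Service comments above could start out as simply as the sketch below. The table and column names are invented for illustration, and Python's built-in sqlite3 module with an in-memory database is used as a stand-in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one table per registry.
SCHEMA = """
CREATE TABLE sensor_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE observable_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sensor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    sensor_type_id INTEGER NOT NULL REFERENCES sensor_type(id),
    observable_type_id INTEGER NOT NULL REFERENCES observable_type(id)
);
"""


def open_registry():
    """Create an empty in-memory registry database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def sensors_observing(conn, observable):
    """Return the names of sensors registered for the given observable type."""
    rows = conn.execute(
        "SELECT s.name FROM sensor s "
        "JOIN observable_type o ON o.id = s.observable_type_id "
        "WHERE o.name = ? ORDER BY s.name",
        (observable,),
    )
    return [row[0] for row in rows]


conn = open_registry()
conn.execute("INSERT INTO sensor_type (id, name) VALUES (1, 'GPS')")
conn.execute("INSERT INTO observable_type (id, name) VALUES (1, 'position')")
conn.executemany(
    "INSERT INTO sensor (name, sensor_type_id, observable_type_id) VALUES (?, 1, 1)",
    [("STATION_B",), ("STATION_A",)],
)
```

A lookup such as sensors_observing(conn, "position") is the kind of query an Information Service could run when matching a user's request to an SCS.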
The request-response dialog between SCS and the Sensors/SensorPlatforms is handled with XML messages. The Sensors are described using SensorML but the actual observations or measurements obtained from sensors are encoded in GML3.
As in the other OGC services, an SCS provides information about the sensors it can communicate with or collect data from. The client issues a GetCapabilities request to the SCS and in return receives a Service Instance (i.e. Capabilities) document. The capabilities document provides a list of sensor platforms supporting multiple sensors or a list of sensors associated with this particular SCS.
After receiving the capabilities document the client may ask the SCS for a description of the Sensor or the Sensor Platform by issuing a DescribeSensor or DescribePlatform request. The response model of this operation is described by SensorML elements DescribePlatform and DescribeSensor.
One important difference between the Sensor Services and other OGC Web Services is that asynchronous communication might be required between Sensors and Clients. For instance, we might want to know about any earthquake activity in a certain area, and thus the system should be able to notify us about the observations picked up from the sensors.
The OGC Web Notification Service is defined for this purpose. The user registers with the WNS by providing an identification and the type of notification he/she wants to receive, e.g. e-mail, http-call, sms etc. The WNS then sends a notification to the registered users whenever an event occurs.
In the SWE context the WNS communicates with the Sensor Planning Service. Whenever a previously requested job is completed the SPS sends a ‘job ready’ message that includes the job id to the WNS. WNS notifies the registered user of that job about the situation.
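The register-then-notify interaction described above can be sketched as follows. The class and method names are hypothetical and are not the operation names of the OGC WNS specification; the actual delivery (e-mail, http-call, sms) is abstracted behind a caller-supplied function.

```python
class ToyNotificationService:
    """Illustrative stand-in for a WNS: keeps registrations and forwards 'job ready' messages."""

    def __init__(self, send):
        self._send = send    # callable(protocol, address, text); transport is out of scope
        self._users = {}     # user id -> (protocol, address)
        self._jobs = {}      # job id -> user id

    def register(self, user_id, protocol, address):
        """Register a user together with the notification target."""
        self._users[user_id] = (protocol, address)

    def track_job(self, job_id, user_id):
        """Remember which registered user requested a job."""
        if user_id not in self._users:
            raise KeyError(f"user {user_id!r} is not registered")
        self._jobs[job_id] = user_id

    def job_ready(self, job_id):
        """Called by the planning service when a job completes."""
        user_id = self._jobs.pop(job_id)
        protocol, address = self._users[user_id]
        self._send(protocol, address, f"job {job_id} ready")


sent = []
wns = ToyNotificationService(lambda protocol, address, text: sent.append((protocol, address, text)))
wns.register("alice", "e-mail", "alice@example.org")
wns.track_job("job-42", "alice")
wns.job_ready("job-42")
```

As the text notes for the real WNS, nothing here verifies the user's data; the service simply trusts the registered notification target.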