Compilation of the GHCN Data Base

The compilation of the GHCN data base took place in several stages, beginning with data set acquisition.  The GHCN data base was assembled from the various national-, continental-, and global-scale data bases listed in Table 1 of the documentation that accompanies these files.  Most of the global data sets listed in Table 1 are derived from the WMSSC, and therefore contain many of the same stations (i.e., duplicates).  However, each also includes previously undigitized data that either extends the records of WMSSC stations or consists of observations from additional stations.  Similarly, most of the national- and continental-scale data sets in Table 1 contain numerous stations that have never been incorporated into a global data base.  In addition, several of the national-scale data sets, notably those from the USSR and China, were only recently made available through bilateral data exchanges and thus have rarely, if ever, been used by anyone outside their respective countries.

The second step in the compilation of the GHCN data base entailed scrutinizing and revising all station inventory parameters (i.e., country codes, station numbers, station names, latitudes, longitudes, and elevations).  Whenever possible, these parameters were updated with the most recent information available from the World Meteorological Organization (WMO).  The 3-digit country codes assigned to all countries in the GHCN are listed in the files cocodes.f1 (sorted by country name) and cocodes.f2 (sorted by country code), which are contained in the CDIAC online directory /pub/ndp041.  These files can serve as the starting point for a user who wishes to work with data from selected countries.

In the third compilation step, all data sets were merged and subjected to a process that removed the numerous "duplicate" stations.  On average, there were two duplicates for each unique temperature or precipitation station, and one duplicate for each unique sea level pressure or station pressure station.

In the final compilation step, all stations in the data base were subjected to a two-part quality control analysis.  In the first part, all observations exceeding certain thresholds (obtained from world record values) were set to missing.  In the second part, each time series was plotted and inspected for "gross" errors (i.e., errors visible to the naked eye).  Some erroneous values were readily corrected (e.g., observations with missing negative signs), while others were uncorrectable and had to be set to missing.
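The first part of the quality control analysis can be illustrated with a minimal Python sketch.  It is not the code used to build the data base.  The documentation states only that the thresholds were obtained from world record values, so the bounds LOWER_C and UPPER_C, the missing-value sentinel MISSING, and the function name screen_thresholds are all assumptions made for illustration.

```python
# Hypothetical bounds for a temperature value in degrees Celsius.
# The actual thresholds are not listed in the documentation.
LOWER_C = -90.0
UPPER_C = 60.0

# Assumed sentinel for a missing observation (not taken from the source).
MISSING = -9999


def screen_thresholds(values, lower=LOWER_C, upper=UPPER_C, missing=MISSING):
    """Return a copy of values with out-of-range observations set to missing."""
    screened = []
    for v in values:
        if v == missing or v < lower or v > upper:
            # Already missing, or outside the plausible range.
            screened.append(missing)
        else:
            screened.append(v)
    return screened


# Made-up series: 612.0 and -95.5 fall outside the assumed bounds.
series = [12.4, -3.1, 612.0, MISSING, -95.5, 27.8]
screened = screen_thresholds(series)
```

A screen of this kind catches only physically impossible values.  Errors that fall inside the bounds, such as a missing negative sign, are the kind the second (visual) part of the analysis was meant to find.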

Data collection (as opposed to analysis) was emphasized during the first year of the project.  As a result, the GHCN data base is considerably larger than most of its predecessors.  Specifically, the GHCN data base contains 80% more temperature stations and 100% more precipitation stations than the WMSSC (that is, about 1.8 and 2 times as many, respectively); the number of sea level pressure and station pressure stations is roughly the same for both data bases.  Furthermore, across all variables, many of the stations in the GHCN data base have longer periods of record than their counterparts in the WMSSC.

Only one restriction was applied to limit the size of the data base.  To be included, a station was required to have a minimum of 10 years of data for at least one of the four variables.  Consequently, the distribution of stations across the globe is uneven.  For example, industrialized countries such as the United States have a large network of stations with periods of record in excess of 10 years, while developing countries such as Brazil have only a small number of stations with long periods of record.  A detailed inventory of all stations in the GHCN data base is presented in Appendix C.  A future goal is to develop a data set consisting only of long-term records from a network of stations that is more uniformly distributed across the globe.
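The inclusion rule described above can be sketched in Python as follows.  This is an illustration under stated assumptions, not the procedure actually used.  The documentation does not say how "10 years of data" was counted, so the sketch assumes that a year counts if it has at least one non-missing monthly value.  The record layout, the sentinel MISSING, and the function names are hypothetical.

```python
MIN_YEARS = 10   # minimum record length stated in the documentation
MISSING = -9999  # assumed missing-value sentinel (not taken from the source)


def years_with_data(record, missing=MISSING):
    """Count years that have at least one non-missing monthly value.

    record maps a year to a list of 12 monthly values (assumed layout).
    """
    return sum(
        1 for months in record.values()
        if any(m != missing for m in months)
    )


def qualifies(station, min_years=MIN_YEARS):
    """Return True if any one variable has at least min_years of data.

    station maps a variable name to its record (assumed layout).
    """
    return any(years_with_data(rec) >= min_years for rec in station.values())


# Made-up stations: A has 12 years of temperature, B has only 9.
station_a = {
    "temperature": {1950 + i: [10.0] * 12 for i in range(12)},
    "precipitation": {1950 + i: [MISSING] * 12 for i in range(3)},
}
station_b = {
    "temperature": {1980 + i: [5.0] * 12 for i in range(9)},
}

stations = {"A": station_a, "B": station_b}
kept = [name for name, st in stations.items() if qualifies(st)]
```

Because the test is applied per variable, a station qualifies on the strength of a single long record even if its other variables are short or absent.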
