Extending and combining the HSN database with the LINKS database, demonstrator: Zeeland (HLZ)
Historical life courses provide a unique insight into how societies change over time. They make it possible to analyze the nature, timing and relative importance of the constitutive elements of social change. Over the last forty years the micro data revolution has profoundly reshaped scientific research in history, sociology and demography. The Historical Sample of the Netherlands (HSN), coordinated by the International Institute of Social History (IISH), collects this kind of data for the Netherlands.
The HSN is a dynamic dataset that is continuously improved and extended. The HSN database is used for historical research in domains such as fertility, mortality, migration, social mobility, illiteracy and family history.
For the period from 1850 onwards, the main source of information of the HSN is the municipal population registers. Although they allow us to follow the life course of individuals from cradle to grave and from place to place, they have shortcomings that make them less suitable for some research topics, for example the study of the historical fertility decline. The HSN offers life courses of parents with which one can study their fertility history, but these life courses sometimes contain gaps (in time and/or space), which make it plausible that information on their fertility history is incomplete. This applies, for example, to children who died soon after birth and to children registered as stillborn: the former are often, and the latter as a rule, not included in the population registers. They are, however, included in the vital registration system. In addition, population registers use the household as the registration unit, which implies that information on the wider kin network is not included in the HSN. Again, this information can be derived by combining information from the vital registration.
The LINKS dataset, which contains linked data from the civil certificates of births, deaths and marriages from 1812 onwards into the late 20th century, is ideally suited to improving and completing the HSN dataset. Extending the HSN with data on relatives from LINKS will also make it possible for researchers to include the wider kin network in the life course of the HSN research person (HSN RP) and his/her household. For example, the database will be extended with information from the marriage certificates of the (grand)parents and parents-in-law of the HSN RP, of brothers and sisters, and of his/her own children (see figure 1).
With an integrated HSN/LINKS database we are better able to map and explain the profound changes in family structures during the 19th century and their effects on the lives of children and young adults. The extended database will lead to improved and more complete information on the life courses of the sampled individuals. It will also show how the structure of the family network changed over a large part of the 19th century, and it will allow researchers to answer questions about how the effects of specific family constellations on demographic behavior and life chances changed over time. For instance: what are the effects of the presence of parents on child survival, and to what extent did the size of sibships change over the 19th century? What were the effects of living with grandparents as an infant or child on one's survival chances, and did these kin influences change over time as child mortality fell because of better economic and hygienic conditions?
The Historical Sample of the Netherlands (HSN) reconstructs the life histories of a representative portion of the nineteenth and twentieth century population of the Netherlands. The data are available for research by way of the HSN Data Release Life Course 2010.01. Measured by the growing number of articles, often in the most cited international journals, and by the number of extensions of the database financed by individual researchers, the HSN database has proved to be a success. The database has grown into one of the most important in its field and plays a major role in the development of new methodology for making databases with longitudinal data internationally comparable. In 2010 the HSN was the first Dutch database to be awarded the DANS data prize.
The HSN database covers the entire country and includes about 78,000 individuals born between 1811 and 1922. For about 44,000 of these persons the database contains not only basic data from the birth, marriage and death certificates but also data from the population registers. It is by way of these registers that the life courses are reconstructed, including data on each successive family situation in which the individuals lived, all the addresses where they lived, and the religion and occupational title of each subject and of every person with whom they co-resided (Mandemakers, 2006). By way of the Personal Card System and its successors the database even reaches into the present. Most of the life courses were realized with the earlier NWO_Groot project, which concentrated on the birth period 1863-1922. Life course data for the period 1843-1862 have been collected for some provinces, among them Zeeland (starting as early as 1843). The data are entered, processed and released according to a scheme of best practices for large historical databases (Mandemakers and Dillon, 2004).
LINKS is one of the projects run at the HSN and stands for 'Linking system for historical family reconstruction'. The project is financed by the NWO/CATCH program and aims at the reconstruction of all nineteenth and early twentieth century families in the Netherlands. The project is based on the data of the WieWasWie project (WWW, formerly known as GENLIAS), which is a database with the basic information of all civil certificates from this period. The system of vital registration in the Netherlands is a legacy of the Napoleonic period. It prescribes and regulates registration of births, marriages and deaths, and has resulted in a huge and invaluable source of information for scholars and the general public. For over fifteen years numerous volunteers have been working to build this database, which contains not only the names of born, deceased and married persons, but also the names of their parents, places of birth, ages and partly their occupational titles. The database of WWW currently holds key information on certificates of marriage, death and birth from 1811 until the (shifting) year that these data become publicly accessible: as of 2012 that is 1912 (birth certificates), 1937 (marriage certificates) and 1962 (death certificates).
The availability of this dataset offers an enormous potential for scientific research, provided that individuals are linked into families. Because of the high degree of variation in the spelling of both first and last names and the enormous amount of data (and possible links), linking is a complicated task. The LINKS database currently contains about 20 million certificates with 70 million entries of individuals. The central focus is on marriage certificates, the data entry of which is expected to be more or less complete for the whole of the Netherlands in 2012. By linking these certificates LINKS creates a family network of pedigrees and families. The Zeeland part of this dataset contains about 190,000 marriage certificates for the period 1812-1937, about 680,000 death certificates for the period 1811-1962 and about 670,000 birth certificates for the period 1812-1912.
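The effect of spelling variation on linking can be illustrated with a minimal sketch. It is not the LINKS procedure, which is not described here. It only assumes that names are first normalized with a few illustrative spelling rules and then compared with a generic string similarity measure from the Python standard library. The field names first_name and last_name, the normalization rules and the threshold of 0.85 are all assumptions made for the illustration.

```python
from difflib import SequenceMatcher

# Illustrative spelling rules only (assumption); a real rule set would be far larger.
VARIANT_RULES = (("ij", "y"), ("ck", "k"), ("ph", "f"), ("sz", "s"))


def normalize(name):
    """Lower-case a name and collapse a few common spelling variants."""
    name = name.lower().strip()
    for old, new in VARIANT_RULES:
        name = name.replace(old, new)
    return name


def name_similarity(a, b):
    """Return a similarity score between 0 and 1 for two name spellings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_candidate_link(cert_a, cert_b, threshold=0.85):
    """Flag two certificate entries as a candidate link when both names are close.

    cert_a and cert_b are dicts with hypothetical keys first_name and last_name.
    """
    return (
        name_similarity(cert_a["first_name"], cert_b["first_name"]) >= threshold
        and name_similarity(cert_a["last_name"], cert_b["last_name"]) >= threshold
    )
```

In practice a candidate link of this kind would still have to be confirmed with other information on the certificates, such as dates, places and the names of parents.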
For the province of Zeeland the LINKS dataset has thus been completed as far as legally possible (births until 1912, marriages until 1937 and deaths until 1962), and it is also the province for which the HSN has realized the most complete set of life courses, starting from 1843. For these reasons the province is ideal for testing the impact of integrating the HSN database with the LINKS data.
HSN data are owned by the HSN foundation. Although the HSN database is a historical database in which most of the included individuals are no longer alive, some still are. This implies that the HSN is bound by the regulations of the Dutch Personal Data Protection Act (Wet Bescherming Persoonsgegevens). Secondly, although most of the data come from records that are open to the public, some of the data have been made available to the HSN database by the archives only for scientific or statistical research and under the condition that the data are used anonymously.
The handling of the data in the HSN database is laid down in the HSN privacy regulations. As a consequence of the foregoing, all data in the HSN are accessible only for scientific or statistical research and all data are made anonymous. In the case of persons who may still be alive, the HSN data will only be made available to researchers after they have signed a license agreement in which they agree to respect the HSN privacy regulations and the anonymous nature of the data, and declare that they will use the data only for the stated goal and will not redistribute them. For researchers who specifically need the non-anonymous data for research purposes, individual arrangements can be made in which the data can be handled in a safe environment.
By way of these regulations the HSN follows the Code of Professional Conduct and Ethics in the Use of Personal Data in Scientific Research as proposed by the Social Science Council of the Royal Netherlands Academy of Arts and Sciences (Gedragscode voor gebruik van persoonsgegevens in wetenschappelijk onderzoek, Koninklijke Nederlandse Akademie van Wetenschappen, Amsterdam, April 2003).
The data from the LINKS project are transcribed from records open to the public by numerous volunteers and are owned by the regional archives. They are made available to the HSN only for scientific research. This implies that they may only be delivered without the names of the persons. This is in accordance with HSN policy not to supply names in data releases. The data of the Zeeland demonstrator will be released under the rules and practices of the HSN.
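The rule that releases are delivered without names can be sketched as a simple filtering step. This is only an illustration under stated assumptions: the column names in NAME_FIELDS and the record layout are hypothetical and do not describe the actual HSN release procedure.

```python
# Hypothetical name columns (assumption); the real release layout may differ.
NAME_FIELDS = {"first_name", "prefix", "last_name"}


def strip_names(records):
    """Return copies of the records with all name fields removed."""
    return [
        {key: value for key, value in record.items() if key not in NAME_FIELDS}
        for record in records
    ]
```

Persons remain distinguishable in such a release through their numeric identifiers, so that family relations can still be analyzed without names.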
After signing a license a researcher is allowed to download the relevant data releases from the website of the HSN at the International Institute of Social History.
The HSN is leading in the development of new data structures for large historical databases and Kees Mandemakers, head of the HSN, is chairing the European Historical Population Samples Network (EHPS-Net). This recently launched network is financed by the European Science Foundation (ESF) and brings together scholars from all over Europe to create a common format for databases containing information on persons, families and households. The network creates a portal that provides access to the European databases, as well as to important non-European ones which have joined the network.
The Intermediate Data Structure (IDS) is designed to form an integrated and joint interface between many European databases (Alter, Mandemakers & Gutmann 2009). The IDS is already being implemented in databases such as the Demographic Data Base in Umeå, the Scania database in Lund, the HSN and several databases managed by the Inter-university Consortium for Political and Social Research (ICPSR) in Ann Arbor, Michigan. The network is the European branch of a wider international movement towards comparable data structures, which was initiated at a workshop at the IISH at the beginning of 2006. Figure 2 provides the basic scheme of the IDS.
The HSN and LINKS databases already contain sufficient identifiers to link the individual level with contextual data from two sources: a) the databases on Dutch municipalities available through the Hub for Aggregated Social History (HASH) and b) the HISCO codes for occupational titles.
HSN and LINKS software is built in Java (1.5 and higher) and runs on a combination of MySQL databases (version 5.5) hosted on IISH servers (256 GB of internal memory, 64 cores).
The data are distributed in csv and/or dbf files. Both formats are easy to import into the statistical packages commonly used in our research community (SPSS, Stata, SAS, R). All files can be downloaded from the HSN download server; after signing the license, researchers receive the codes to log in and download the data and documentation.
Figure 2. Basic Scheme Intermediate Data Structure (IDS)
The information from LINKS will improve the HSN dataset in several ways:
1) The linked data will be used to fill gaps in life courses caused by missing registers (Zeeland lacks relatively many population registers, mainly due to war and water damage).
2) The data will be used to check for still unknown marriages and deaths, so that people who were lost during research can be traced.
3) The HSN will be extended with information on the wider kin network by using information from the marriage certificates of parents, siblings and children (including occupational titles).
4) The dataset of the HSN will be completed with a) information on stillbirths, important in itself but also for more complete fertility information, b) information on missing births (especially for the period 1850-1870 there are omissions in the population registers) and c) information on the relational status of household members, which was not included in the registers from the period 1850-1862.
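The completion step for stillbirths and missing births can be sketched as a comparison between the children known from the population register and the children found in the linked certificates. This is a minimal sketch under assumptions, not the project's actual procedure: it presumes that parents have already been linked, and the keys mother_id, birth_date and sex are hypothetical.

```python
def child_key(child):
    """Build a comparison key from hypothetical fields (assumption)."""
    return (child["mother_id"], child["birth_date"], child["sex"])


def missing_children(register_children, certificate_children):
    """Return certificate children that have no counterpart in the register.

    Both arguments are lists of dicts. Children returned here are candidates
    for addition to the life course data, for example stillbirths or infants
    who died before they were entered in the population register.
    """
    known = {child_key(child) for child in register_children}
    return [child for child in certificate_children if child_key(child) not in known]
```

An exact comparison on birth date is a deliberate simplification. With real data the dates in the two sources may differ slightly, so a tolerance or a manual check of the candidates would be needed.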
The main component of the Zeeland demonstrator will thus be a new and much improved dataset for the province of Zeeland. Apart from the necessary documentation inherent in the project itself, special attention will be given to the possibilities of rolling out the Zeeland procedures to the rest of the HSN database. Most of the birth certificates are still lacking in the WWW database (only the provinces of Zeeland, Groningen and Drenthe have complete data for 1812-1912). The case of Zeeland will be used to test whether death certificates can be used for goals that are ideally reached with birth certificates (death certificates also include data on the moment of birth), such as including children who died at a very young age and were not completely included in the early population registers.
The developed procedures and results will be shown in an automatically running PowerPoint presentation, to be placed in the data section of the HSN website. All data will be converted into the Intermediate Data Structure (IDS) described above.
Describing all research possibilities would go too far here; see section 5.1 for some examples. For a good impression of the research carried out with the HSN, see the HSN yearly report, especially its list of publications.
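How a death certificate could stand in for a missing birth certificate can be sketched as follows. The sketch assumes that the certificate gives a date of death together with an age at death, either in days (for infants) or in completed years. Both function names and this layout are assumptions for the illustration and not a description of the actual certificate fields.

```python
from datetime import date, timedelta


def birth_date_from_age_in_days(death_date, age_days):
    """Derive the birth date when the age at death is given in days."""
    return death_date - timedelta(days=age_days)


def birth_year_bounds(death_date, age_years):
    """Return the earliest and latest possible birth year.

    With an age in completed years the birth date is only known to within a
    year: the person had either already had a birthday in the year of death
    (latest year) or not yet (earliest year).
    """
    latest = death_date.year - age_years
    return latest - 1, latest
```

For a child who died on 10 March 1855 at the age of 9 days, the first function yields 1 March 1855. For a child who died on 10 March 1875 at the age of 2 years, the second function gives 1872 and 1873 as bounds on the birth year.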
LINKS offers a family network by way of pedigrees and families. Because the HSN is based on birth certificates, the HSN sample fits perfectly into this LINKS family network. The combination with LINKS will extend the HSN dataset in several ways (the numbers in square brackets refer to the list of milestones and persons involved in section 6).
- HSN Zeeland is checked for completeness of the information on relatives present in the household, especially children who died soon after birth. This means that the HSN will be enlarged with relatives who were never registered in the population registers, especially in the early years of the population register until 1870. [1,3,10]
- The information from LINKS will also be used to establish the relationships between household members, which are not recorded in the population registers from the period 1850-1862. [1,3,10]
- More generally, the LINKS data will help to find and resolve all kinds of inconsistencies that are still present in the Zeeland dataset. [2, 5, 15] This is especially relevant for those municipalities that lost their population registers as a consequence of water or fire (among others the registers of Middelburg, 1850-1900, 1850-1940). A report on the completeness of the Zeeland dataset will be an important target and will be welcomed by the research community.
- The Life Courses dataset of the HSN is extended with stillbirths. These births are not included in the population registers or in the birth certificates, but in the death certificates. Stillbirths are important in the study of fertility, and the inclusion of these data will make the HSN database even more unique than it already is. [3,10]
- The HSN is extended with (excerpts of) the marriage certificates of all close relatives (parents, siblings and children) who married between 1812 and 1932. For these persons we will have the place of birth, the date and place of marriage, the birth year, the civil status and, for about 90% of the persons in question, the occupational title, within the time frame 1812-1932 and also for areas outside Zeeland.
- The structure of the new datasets will be demonstrated by way of an automatically running PowerPoint presentation, to be started from the website of the HSN. [6,7,8]
- The LINKS dataset still lacks many birth certificates (only three provinces, among them Zeeland, are complete). Since death certificates are much better represented, it is worthwhile to test the alternative use of death certificates for the goals formulated in 1).
- Accompanying internal documentation [1,5,9]
- User documentation [4,11] describing the datasets [3, 10, 15]