Sources of Comparative Advantage in Data-Related Services

Does digital infrastructure increase trade benefits in services, and how large are these trade benefits? This paper uses the prevalence of countries’ data centres and secure internet servers to investigate the impact of data-related infrastructures on services trade. Data centres and internet servers facilitate the production and trade of many service sectors reliant on software technologies. Therefore, digital infrastructures are complementary sources of comparative advantage in the more data-reliant sectors. Instrumental variable regressions underline this trade impact by using as instruments a country’s exposure to natural hazards since 1900, such as earthquakes, floods and droughts. Overall, data infrastructures are an important and exogenous determinant of specialization patterns in services dependent on software.


Introduction
Cross-border trade in data-related services has grown steadily since the early 2000s. The share of data-related services exports has more than doubled since 2004 and now exceeds 0.6 percent of global GDP. Moreover, non-digital services such as finance and retail have become more data-intensive and are increasingly traded over the internet, shifting away from their traditional supply through foreign affiliates (Andrenelli, 2018).
What determines countries' success in producing and trading data-related services? Invariably, some countries are more successful in exporting these services than others and therefore enjoy a comparative advantage in data-related sectors. Technological innovations such as cloud computing and advanced information and telecommunication technology (ICT) have certainly contributed to the expansion of data-related services trade. However, one potentially important factor in explaining the success of countries in exporting data-related services is the presence of data centers, internet exchange points (IXP) and internet servers. This paper therefore examines whether these data-related infrastructures form an exogenous and independent source of comparative advantage for countries exporting data-related services.
Recent work has mainly concentrated on the regulatory environment in explaining the ability of countries to export data-related services, and has therefore not attributed any importance to the role of specific digital infrastructures. Ferracane and van der Marel (2018) show that countries with lower levels of restrictiveness with respect to data, such as those refraining from data localization requirements, exhibit higher shares of trade in data-related services. Similarly, Goldfarb and Trefler (2018) explore key features of data in the form of Artificial Intelligence with respect to international trade and discuss policy implications such as privacy, data localization and standards. Both works, and other existing research, therefore omit the likely impact of data-related infrastructures on data-related services trade, which is the gap this paper addresses.
Disentangling the importance of digital infrastructures for data-related services trade is important because descriptive data do not provide a straightforward answer. Figure 1 shows, for instance, that various lower middle-income countries have been extremely effective in exporting ICT services. This is in spite of the skewed distribution of data-related infrastructures around the globe, as shown in Figure 2. It is therefore unclear what role policy makers should attribute to the development of data centers, internet exchange points and secure internet servers. In great part these infrastructures provide the backbone of many digital services exports, in addition to an open policy regime in data. This paper therefore examines whether these types of digital infrastructures have any bearing on trade in data-intensive services.
In doing so, this paper adds value to the previous literature in three specific ways. First, this paper collects data on the number of data servers that countries have built over the years, something that no other paper has done so far, to assess the importance of this specific type of infrastructure for services trade. Other types of data-related infrastructures that are also a focal point of this paper are internet exchange points and secure internet servers. This approach differs markedly from previous literature that uses a proxy of data flows to examine their contribution to trade (Manyika et al., 2016). Knowing that this source of data is bilateral-specific between countries, and because all data flows within and between countries need to pass through data-related infrastructures, this paper takes the latter approach to assess their relevance for data-related trade.
Second, and most importantly, we use an instrumental variable strategy to find out how data-related infrastructures form a source of countries' accomplishment in trading data-related services. Even though data flows and trade in services are hugely intertwined (WTO, 2019), instrumenting data flows is difficult. Instrumenting digital infrastructures instead has the advantage that a valid instrument is available: firms that construct data-related infrastructures such as data centres strongly take into account a country's exposure to the risk of natural disasters in their decision making. This information provides exogenous variation with which to assess the true impact of data infrastructure on data-related services trade. Our empirical results show that this instrument is a valid one.
Robert Schuman Centre for Advanced Studies Working Papers
Third, an older set of works has demonstrated that the internet is a strong predictor for cross-border trade in services 2004). Although this literature reflects the line of research this paper takes, it is nonetheless markedly different. The key difference is that over the years the digital infrastructure for trading services online has changed dramatically. Cloud computing, data storage technologies and other computer system resources were not yet available in the early 2000s or were otherwise deployed on a minimal scale. While the internet is necessary to transmit data, which are embedded in many digital services, it hardly defines the entire set of organizational facilities needed for a country to trade services digitally. Using accompanying data-related infrastructures such as data centers to explain trade in services is currently a novelty in the empirical academic literature.
The remainder of the paper is organized as follows. The next section discusses the previous literature with respect to data-related trade in services or, more broadly, digital services trade. The third section sets out the simple empirical strategy in which data-intensive services trade is regressed on the three types of data-related infrastructures. The fourth section develops the instrumental variable strategy that is used in the empirical model and presents the results of the instrumental variable (IV) regressions. Finally, the last section concludes and puts the results in a wider context.

Previous Literature
The previous literature on data, trade in services, and data centers is scant. From a macro-perspective, the importance of data in international trade globally has been estimated by Manyika et al. (2016). The study states that cross-border data flows account for $2.8 trillion of the total increase in world GDP over the last decade, thereby exerting a larger impact on growth than traditional goods trade. Interestingly, this work does not dedicate special attention to the inter-linkages that exist between data flows and trade in services. It treats data as a separate type of flow that affects the economy independently of services. That flow, however, has increased dramatically over the years. Besides the sheer flow of data, which has increased exponentially since 2005, associated internet protocol (IP) traffic was estimated to have grown 64-fold since 2005, whereas global internet bandwidth quadrupled between 2010 and 2014 (Pepper et al., 2014).
Earlier work points to the internet as a facilitating factor for trade in services. The authors of that work show that an increase in internet penetration by 10 percent boosts growth of services imports by 1.1 percentage points and of exports by 1.7 percentage points. These numbers illustrate that the internet effectively decreases transport costs for services as a subset of trade costs. Overall, the US ITC estimates that the internet reduces trade costs by about 26 percent on average (USITC, 2014), overcoming the historical hurdle that borders pose to trade. Lendle et al. (2016) find a similar reduction in the border effect, of 64 percent, for a comparable basket of goods traded over the internet through eBay among a similar set of countries. In this case, the authors claim that the reduction in trade costs comes from the way the internet cuts search costs, given that the effect rises with product differentiation and for countries speaking different languages.
However, positive developments in trade costs for data and associated services can also be reversed. Goldfarb and Trefler (2018) make clear that as Artificial Intelligence (AI) expands, data-related policy implications for services are likely to rise. For example, regulatory "behind-the-border" policies such as data localization and strict data privacy rules have the potential to limit the ability of foreign firms to access data and to scale up their business models. Such policy restrictions may therefore inhibit countries' trade patterns and distort sources of comparative advantage. Recent work by Ferracane and van der Marel (2018) indeed shows that countries with higher levels of policy restrictions in data, such as data localization, exhibit lower levels of services trade. The authors show that this is particularly true for services that are data-intensive by means of software usage.
Electronic copy available at: https://ssrn.com/abstract=3657216
European University Institute
The role of data centres as critical digital infrastructure has been highlighted in various non-academic country-specific and region-specific case studies. Clipp et al. (2014) assess the impact of Facebook's establishment of a data centre in Sweden. The report claims that it has brought with it an entirely new digital ecosystem worth more than the direct investment costs of the infrastructure itself. García Zaballos and Iglesias (2017) discuss how data centres serve the purpose of economic development for Latin American and Caribbean countries. They too argue that data centres are essential for the development of an ICT ecosystem in the region. The authors also consider a framework that portrays the optimal environment in which data centres are developed. One important factor that supports the decision to locate data centres in a country is its resilience to natural disaster risk, something this paper exploits as an exogenous instrument.
On a larger scale, this paper closely relates to a long line of research that describes more generally the impact of infrastructure on trade costs and comparative advantage. Donaldson (2018) estimates that physical infrastructure (in the form of railroad networks in India) shapes comparative advantage in goods trade. Analogously, in this paper, because countries differ in digital infrastructures, they also differ in productivity levels across services, which induces countries to trade in order to exploit comparative advantage. Stretching the concept of infrastructure as an endowment factor, or rather a country characteristic, this paper also relates to the works of Nunn (2007), Levchenko (2007), Costinot (2009) and eventually Chor (2011). These works develop an empirical approach also used in this paper. That is, they exploit differences in country factors such as the quality of domestic institutions and human capital per worker as a source of comparative advantage in goods that are intensive in that factor. In our case, this "factor" is data.

Empirical Strategy
This section sets out the empirical strategy. The empirical strategy uses an interaction term in which country-specific characteristics are multiplied by industry-level intensities. This interaction defines the comparative advantage from which trade benefits arise, following the above-mentioned set of seminal works such as Nunn (2007), Levchenko (2007), Costinot (2009) and Chor (2011). In this paper, the interaction term is composed of an indicator measuring a country's prevalence of data centres and other related data infrastructures, multiplied by the extent to which service sectors are intensive in the use of software. The latter term is referred to as sectoral data-intensity. This is how this paper's source of comparative advantage in data-intensive (or software-intensive) services is set up: by a country's level of digital infrastructure, drawing on data centres and related types of data infrastructures, which are discussed below.
Equation (1) formally measures how data infrastructure such as data centres forms a country's source of comparative advantage associated with data-intensive trade. In particular, we regress the logarithm of cross-border services exports (SX) of country c in service sector j on an interaction term composed of a term called DC for country c and a term denoted by D/L for sector j. Hence, the empirical baseline model takes the following form:

$$\ln(SX)_{cj} = \theta \, \big[ (DC)_c \times (D/L)_j \big] + \delta_c + \gamma_j + \varepsilon_{cj} \qquad (1)$$

In equation (1) the terms (DC) and (D/L) form the product of country characteristics and sector intensities, denoted respectively by DC (data centres) and D/L (data-intensity). The term DC represents a vector: we use not only the number of data centres per country, but also equivalent metrics for internet exchange points (IXP) and secure internet servers (SIS) (see below). The terms $\delta_c$ and $\gamma_j$ refer to the fixed effects by exporter and sector, respectively. Sector fixed effects are applied at the 2-digit BPM6 level, which includes a total of 18 service sectors. Finally, $\varepsilon_{cj}$ is the residual term. Regressions are estimated with robust standard errors clustered by country-sector and are performed over the period 2016-2017 in a cross-sectional setting throughout. Hence, our identification strategy has no time dimension.
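As a minimal sketch of how the baseline specification could be estimated, the Python snippet below recovers the coefficient on the interaction term by removing exporter and sector means from both sides (the within transformation), which in a balanced country-by-sector panel is equivalent to including both sets of fixed effects. The function names, the toy numbers and the balanced-panel assumption are illustrative and are not taken from this paper's data or code; the clustered standard errors used in the paper are not computed here.

```python
# Sketch only: two-way fixed-effects estimate of the interaction coefficient
# in a BALANCED country-by-sector panel. All names and numbers are hypothetical.

def two_way_demean(m):
    """Subtract row (country) and column (sector) means, add back the grand mean."""
    n_rows, n_cols = len(m), len(m[0])
    row_mean = [sum(r) / n_cols for r in m]
    col_mean = [sum(m[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand = sum(row_mean) / n_rows
    return [[m[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n_cols)]
            for i in range(n_rows)]

def interaction_coefficient(ln_sx, infra, intensity):
    """ln_sx[c][j]: log exports of country c in sector j;
    infra[c]: country infrastructure measure; intensity[j]: sector data-intensity.
    Returns the OLS slope on infra[c] * intensity[j] after sweeping out
    country and sector fixed effects."""
    x = [[a * b for b in intensity] for a in infra]
    x_t, y_t = two_way_demean(x), two_way_demean(ln_sx)
    num = sum(a * b for xr, yr in zip(x_t, y_t) for a, b in zip(xr, yr))
    den = sum(a * a for xr in x_t for a in xr)
    return num / den

# Toy data generated with a known coefficient of 0.5 and no noise.
infra = [0.2, 1.0, 1.7, 2.5]             # four hypothetical countries
intensity = [-1.0, 0.0, 0.8]             # three hypothetical sectors
country_fe = [1.0, -0.5, 0.3, 2.0]
sector_fe = [0.1, 0.7, -0.4]
ln_sx = [[0.5 * a * b + country_fe[c] + sector_fe[j]
          for j, b in enumerate(intensity)] for c, a in enumerate(infra)]

theta_hat = interaction_coefficient(ln_sx, infra, intensity)
print(round(theta_hat, 6))
```

Because the toy data are built with a coefficient of 0.5 and no noise, the estimator recovers that value up to floating-point error. With unbalanced data, a dummy-variable regression or an iterative demeaning routine would be needed instead.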
For services exports (SX), this work uses the WTO-UNCTAD-ITC annual trade in services dataset, which covers exports and imports of total commercial services. This is our preferred source, which we also use for our descriptive analysis below. The dataset covers 222 entities, which include countries and regional aggregations/economic groupings, from 2005 to 2017 at the 2-digit level. The data are in line with the sixth edition of the IMF Balance of Payments and International Investment Position Manual (BPM6) as well as the 2010 edition of the Manual on Statistics of International Trade in Services (MSITS 2010). This entails that, compared to the BPM5 classification, major changes to the Balance of Payments (BOP) classification for services have been introduced with regard to financial intermediation services, insurance services, intellectual property, and manufacturing and maintenance services.

Data Centres
In the simplest terms, data centres are physical facilities that house data and enable digital network applications for external organizations. Often established in a large building, or a set of buildings, data centres are dedicated spaces where computing capabilities and storage resources enable the delivery of shared applications for external organizations. Data centres also store, manage and disseminate data for these applications. In effect, the data centre owns computer systems and associated components, such as telecommunication and storage systems, to organize an external organization's IT operations and equipment. Often, data centres accommodate the data and network applications that are critical for the external organization to continue its daily digital operations. Data centres are also commonly known as "the cloud", which basically consists of servers located somewhere and connected via ICT networks.
As a result, reliability, network support and security assurance are important aspects in guaranteeing the operations of a data centre. To meet these requirements, the business literature identifies several criteria that firms consider when selecting a site for constructing a data centre. Three factors are the most important in this decision-making process, namely environmental conditions, power supply and communications infrastructure. The first factor covers a site's climate and history of natural hazards, an item that we utilize for our instrumental variable strategy. Secondary factors that play a role in firms' site selection are socio-economic conditions such as a skilled workforce, the availability of construction services and an appropriate governance framework in terms of existing regulations.
Figure 3 shows that the global distribution of data centres is unequal, naturally giving rise to cross-country differences in this data-related infrastructure. The prevalence of data centres in each country is measured per 1 million population, which is also the measure used in our regressions. The map shows that countries such as Iceland, Latvia, Switzerland, Mauritius and Hong Kong have high densities of data centres. Most countries well-endowed with data centres are relatively small, open economies. Two interesting countries with some of the highest data centre concentrations are Iceland and New Zealand, both of which are geographically remote compared to many others. On the other side of the spectrum, countries showing the lowest levels of data centres are Tanzania, Bangladesh, Peru and Afghanistan, but also China and Mexico. The last two countries have somewhat higher income levels than the others at the bottom of the ranking.
In addition to data centres, we also enter the number of internet servers (SIS) and internet exchange points (IXP) into the regressions. Internet servers are computer programs or devices that provide functionality for other programs or devices, called "clients". Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, and application servers. Oftentimes, but not always, these servers are placed at data centres. Internet exchange points are physical infrastructure through which internet service providers (ISPs) and content delivery networks (CDNs) exchange internet traffic between their networks. See the variable list for further information.
In the regressions, all three data infrastructures are divided by population in millions to obtain a normalized measure. The reason for doing so is that firms take the size of the market into account when investing in each of the three types of physical data infrastructure. Economies of scale play an important role but, as outlined above, so do other factors. Also, given that we work within a framework of specialization (i.e. comparative advantage), we need to normalize this country measure. At a later stage we also perform robustness checks by dividing the number of each of the three data infrastructures by employment and by land area (sq km). We take the log of each measure after dividing, so that for data centres we use ln(DC/P), for secure internet servers ln(SIS/P), and for internet exchange points ln(IXP/P).
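The normalization described above can be sketched in a few lines of Python. The helper name and the figures are hypothetical, and how countries with a zero count are treated is not stated in the text, so the sketch simply refuses to take the log in that case.

```python
import math

# Sketch only: per-million normalisation followed by a natural log.
# Function name and figures are hypothetical.

def log_per_million(count, population):
    """Return ln(count / (population / 1e6)); undefined for zero counts."""
    if count <= 0 or population <= 0:
        raise ValueError("count and population must be positive to take logs")
    return math.log(count / (population / 1_000_000))

# A hypothetical country with 40 data centres and 8 million inhabitants
# has 5 data centres per million people.
value = log_per_million(40, 8_000_000)
print(value)
```

The same helper could serve the robustness checks by passing employment or land area as the denominator, with the scaling constant adjusted accordingly.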

Data-intensities
Data-intensities are measured using information on software usage. Specifically, this paper takes the 2011 Census ICT Survey from the US, which reports data at the detailed 4-digit NAICS sector level. These data are survey-based and record how much each industry and service sector spends, in millions of USD, on ICT in terms of hardware equipment and computer software.
The survey records two types of software expenditure: capitalised and non-capitalised. We select capitalised expenditure given that this proxy is closer to the concept of intensities with respect to labour and capital as factors of production used in the academic literature (e.g. Romalis, 2004; Chor, 2011). Non-capitalised expenditure instead relates more to the input support of firms, which enters the production function as intermediate inputs. Capitalised expenditure is composed of longer-term investments made in computer software. It excludes purchases and payroll for developing software as well as software licensing, service, and maintenance agreements for software. The year 2010 is selected so that this information does not run the risk of being endogenous to the trade data. Capitalised software expenditure is divided by labour, for which we also use data for the year 2010.
This proxy for "data-intensity" is not ideal. However, there are currently no good data on how much data is used by each sector. Several sources such as Cisco and Telegeography provide rough estimates of the extent to which data exist within countries, but only for a handful of observations and not by sector. Having said that, what is clear, and what is also intuitively appealing, is that the use of data within and across borders is performed using software technologies. Firms need software to use the internet in its simplest form and employ advanced software to transmit large sets of data that connect to data centres. In addition, more technologically advanced transmissions of data over the internet are done with the help of cloud computing technologies that data centres provide and which in themselves are extremely software-intensive. As such, software capital is in our view the best available proxy to date.
Intensities are computed at the 4-digit NAICS level and then concorded to the 2-digit BPM6 level, given that the trade data are recorded in this classification system. Because no concordance table exists between NAICS and BPM6, a self-constructed matrix is used. Numbers are aggregated at the 2-digit BPM6 level by taking the simple average. Note that one sector forms a mismatch between the two classification tables, namely Intellectual property / Royalties and license fees. This category is reported neither in the US Census nor in the BLS database. Nonetheless, this sector is important as it covers, among other items, patents, trademarks and copyrights, activities which are data-intensive and for which the trade data record high levels of services exports. Therefore, we have developed our own concordance table to include this sector. Details of this procedure can be found in Annex 1.¹
Table 1 ranks the service sectors by software intensity. Unsurprisingly, telecommunications, computer services and information services exhibit the highest software usage. These sectors employ a large amount of data by means of software usage relative to labour. Information services cover activities that are closely related to the work of a data centre, such as data processing services, but also include web search. Financial and insurance services are highly data-intensive sectors too. The two sectors are more broadly considered very digitally intensive, given that over the years internet technologies have massively changed the financial services industry.² On the other side of the spectrum, the least software-intensive sectors are construction and travel services. The middle range is made up of a mix of modern and traditional sectors such as R&D and transport services.
Figure 4 illustrates the thrust of this paper's hypothesis. It shows that the prevalence of data centres in countries is strongly associated with trade in data-intensive services. The figure takes the information services sector as an example, but equally strong correlations hold for other data-intensive sectors. Countries with a higher number of data centres per 1 million people, such as Switzerland, the UK or Singapore, show greater shares of information services exports in GDP. Conversely, Guatemala, Kazakhstan and Peru are countries with low shares of information services exports in their economies and also low densities of data centres.
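The construction of the sectoral data-intensities described in this subsection (capitalised software expenditure divided by labour at the NAICS level, followed by a simple average within each BPM6 sector) can be sketched as follows. This is only an illustration: the NAICS codes, the mapping and all figures are made up, the unit of the labour measure is an assumption, and the snippet does not reproduce the paper's self-constructed concordance.

```python
# Sketch only: software intensity by NAICS sector, then a simple average
# within each BPM6 sector. Codes, labels and figures are hypothetical.

def software_intensity(capitalised_software, labour):
    """Capitalised software expenditure divided by labour, per NAICS code."""
    return {code: capitalised_software[code] / labour[code]
            for code in capitalised_software}

def aggregate_simple_average(intensity, concordance):
    """concordance maps each NAICS code to one BPM6 sector label."""
    groups = {}
    for naics, bpm6 in concordance.items():
        groups.setdefault(bpm6, []).append(intensity[naics])
    return {bpm6: sum(v) / len(v) for bpm6, v in groups.items()}

software = {"5171": 900.0, "5172": 600.0, "2361": 20.0}   # Mln USD, made up
labour = {"5171": 300.0, "5172": 400.0, "2361": 200.0}    # made-up labour units
concordance = {"5171": "Telecommunications",
               "5172": "Telecommunications",
               "2361": "Construction"}

by_naics = software_intensity(software, labour)
by_bpm6 = aggregate_simple_average(by_naics, concordance)
print(by_bpm6)
```

A simple average weights every NAICS sub-sector equally regardless of size; an expenditure-weighted or employment-weighted average would be an alternative, although the text specifies the simple average.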

Results Baseline Regressions
Tables 2 and 3 report the empirical results of the baseline regressions. Besides the number of data centres (DC), we also provide results from the two additional variables in the DC vector as given in equation (2), namely for internet exchange points (IXP) and secure internet servers (SIS).
In Table 2, the first column shows the coefficient estimate using data centres as the variable of interest. Columns 2 and 3 report the results using IXPs and secure internet servers. In all three cases the coefficients are positive and strongly significant, indicating that countries with a higher prevalence of data centres do indeed exhibit higher volumes of exports in more data-intensive service sectors. Similarly, countries that are better supplied with IXPs and secure internet servers tend to export more in data-intensive services. These findings remain robust for data centres and secure internet servers when including GDP per capita (in PPP), which measures a country's level of development, as reported in Table 2. The empirical literature makes clear that the inclusion of this control variable reinforces the case that the variable of interest identifies an independent channel through which trade patterns are determined.
Regarding the relative importance of the two remaining significant variables, computing standardized beta coefficients provides a gauge of their relative impacts. The beta coefficient of secure internet servers (β = 0.399) is more than three times as large as the beta coefficient found for data centres (β = 0.129).³ However, one must keep in mind that on average countries have more than 240,000 secure internet servers on their territory against 40 data centres. Therefore, even though a one standard deviation increase in secure internet servers is more than three times as effective, in terms of actual units the outcome differs: a 10 percent increase in data centres, equivalent to 4 units, results in an increase of about 0.5 percent in exports of data-related services. The corresponding 10 percent increase in secure internet servers, equivalent to 24,000 units, results in an increase of 0.42 percent in data-intensive services exports.
¹ The concordance table between 4-digit NAICS and 2-digit BPM6 can be obtained upon request. Admittedly, the inclusion of intellectual property / royalties and license fees as a service is a BOP decision and some debate exists as to whether this is truly a service. In addition, for some countries, this may also reflect tax and transfer pricing as drivers of observable trade in this sector. However, since this sector is included in all publicly available data sources recording trade in services, we prefer to include it. Nonetheless, as an additional (unreported) robustness check we have also dropped this sector entirely from our regressions. Results do not change apart from slight differences in coefficient size, and can be obtained upon request.
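The comparison between standardized effects and effects in actual units can be made concrete with a short Python sketch. It uses only the averages and beta coefficients quoted in the text; the function names are hypothetical, and the textbook formula for a standardized coefficient (raw coefficient times the ratio of standard deviations) is assumed to be the one applied here.

```python
# Sketch only: relate a raw coefficient to a standardised (beta) coefficient
# and convert a percentage change into units. Inputs below are the averages
# quoted in the text; the function names are hypothetical.

def standardised_beta(coef, sd_x, sd_y):
    """Beta coefficient: effect of a one-s.d. change in x, in s.d. units of y."""
    return coef * sd_x / sd_y

def units_for_pct_change(mean_count, pct):
    """Number of units that a pct percent increase represents at the mean."""
    return mean_count * pct / 100.0

dc_units = units_for_pct_change(40, 10)          # data centres
sis_units = units_for_pct_change(240_000, 10)    # secure internet servers
ratio_of_betas = 0.399 / 0.129
print(dc_units, sis_units, round(ratio_of_betas, 2))
```

The sketch shows why the two rankings can differ: the ratio of the beta coefficients is slightly above three, whereas the same 10 percent change corresponds to 4 data centres but 24,000 secure internet servers.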

Endogeneity
One major concern with our empirical specification is that a higher prevalence of digital infrastructures may be observed in countries precisely because they trade more data-related services. In other words, reverse causality is an obvious possibility. In order to ensure that the significance of data centres and secure internet servers is not a function of services trade, this paper uses the frequency of natural hazards in a country as an instrument. A country's natural hazard rate is driven by geophysical, meteorological, hydrological and climatological onset events. These are strong external forces that are insulated from any potential influence of existing trade patterns.⁴
The use of natural onsets is an intuitively appealing instrument, in particular for data centres. Firms deciding on a site to construct a data centre generally consider many factors. One of the top criteria for site selection is the environmental conditions. Long-term weather conditions in particular, such as wind, snow and ice storms, weigh heavily in the decision to construct a data centre, as do other natural hazards composed of sudden onsets such as seismic events, floods, tornadoes, hurricanes and volcanic eruptions. These conditions leave little room for any control and are unlikely to be influenced by trade. Therefore, the number of onset occurrences, or more broadly natural hazards, used as an instrument for data centres is a strongly exogenous factor with respect to a country's export patterns in data-related services.
Long-term meteorological conditions in particular are an important determinant in the decision-making process. Firms avoid places with excessive wind patterns as well as snow and ice storms. More generally, data centres are often sited by considering a location's optimum temperature and relative humidity. Taking these onset indicators into account is essential, because digital communication networks and hardware IT equipment are designed to operate only within a certain range of temperatures and humidity. Also, the cooling techniques that data centres make extensive use of depend on environmental conditions, as these affect the energy demand for air conditioning. Precipitation, on the other hand, appears to be less of a problem, as firms can deal with such weather conditions by protecting the data centre.
In addition, other natural hazards also play a large role in deciding on a location. These are composed of more sudden onsets, namely hydrological natural hazards, which are caused by the occurrence, movement and distribution of surface and subsurface freshwater and saltwater. Floods, for example, are of great concern to data centre developers. Interestingly, firms also prefer to stay away from locations near oceans because of high levels of sulphates and natural corrosives (salt) that can damage the data centre. Finally, geophysical natural hazards matter too for data centres; these can manifest themselves in the form of earthquakes and volcanic activity. Volcanoes are of notable concern for data centre constructors. Volcanic eruptions create large amounts of ash that, in combination with strong wind patterns, can cause long-term damage to data centres.
Of note, a country's natural hazard rate caused by geophysical, meteorological, hydrological and climatological onset events also matters for internet servers, as they are often, but not always, placed on the premises of a data centre. As such, our instrument also deals with the risk of endogeneity with respect to internet servers.
⁴ There are also biological and extra-terrestrial natural hazards. For instance, epidemics or insect infestations are examples of the former, whereas space weather is an example of the latter. Both categories are omitted here.
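To illustrate the logic of the instrumental variable approach, the following Python sketch simulates a stylized setting in which an unobserved factor raises both infrastructure and exports, so that ordinary least squares overstates the effect, while an instrument that shifts infrastructure but has no direct effect on exports recovers it. This is a deliberately simplified, just-identified example with a single regressor and no fixed effects; the common unobserved factor stands in for the simultaneity concern, and every parameter value is an arbitrary assumption rather than an estimate from this paper.

```python
import random

# Sketch only: simulated data in which an unobserved factor u raises both
# infrastructure x and exports y, so OLS is biased, while an instrument z
# (standing in for hazard exposure) shifts x but not y directly.
# The sign and size of every parameter here are arbitrary assumptions.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols_slope(x, y):
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

random.seed(0)
true_effect = 0.5
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols_estimate = ols_slope(x, y)     # contaminated by u
iv_estimate = iv_slope(z, x, y)    # uses only variation in x driven by z
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

In this simulation the OLS slope lies well above the true effect, whereas the IV slope is close to it. The exercise only demonstrates the mechanism; whether the exclusion restriction holds for natural hazards is an empirical matter that the simulation cannot settle.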

Instrument
We take data on natural hazards from the EM-DAT database. This data source is also known as the International Disaster Database of the Centre for Research on the Epidemiology of Disasters (CRED). The database records two types of hazards, namely natural hazards and technological hazards. This paper takes natural hazards for creating the instrument.⁵ Natural hazards consist of geophysical, meteorological, hydrological and climatological, as well as biological and extra-terrestrial, onset events. This paper takes the first four types of natural hazards as a focal point for creating the instrument. It therefore omits the last two types of natural hazards.⁶ The database only reports disasters related to these hazards, which means that an onset event only enters the database if one of the following three criteria is met: (1) 10 or more deaths, (2) 100 or more people affected, injured or homeless, or (3) a declaration by the country of a state of emergency and/or an appeal for international assistance.⁷ The database records information by country, region and precise location, and also documents temporal information.
Geophysical hazards are earthquakes (including tsunamis), mass movements and volcanic activity, which are all onset events that originate from solid earth. Meteorological hazards are storms, extreme temperatures and fog; they are generally onsets caused by short-lived extreme weather and atmospheric conditions that can last for days. Hydrological hazards are onsets caused by the occurrence, movement, and distribution of surface and subsurface freshwater and saltwater, which appear as floods, landslides and wave actions. Finally, climatological hazards are onsets caused by atmospheric processes stemming from seasons or climate variability, such as drought and wildfire. This latter category also covers slow-moving onsets.
To create the instrument, the four categories of hazards are taken together. Specifically, the occurrence rate is computed by summing the recorded onset events across the four categories, as well as for each category separately, over the period 1900-2015. This number for each country is then divided by its 2015 population to obtain a rate per million inhabitants. The reason for taking a long historical record is that a lengthier time horizon makes any estimation using this instrument even less sensitive to current services trade influences, even though reverse causality is unlikely in the first place.⁸ The reason for dividing by population is that the goal of this paper is to identify specialization patterns, in line with the aforementioned comparative advantage literature. Besides, firms naturally take market size into account when deciding on an investment.⁹ Figure 5 shows the correlation between the instrument of onset occurrences and data centres per 1Mln population in logs. A relatively tight relationship appears. The figure indicates that countries with higher per unit levels of data centres also exhibit greater per unit levels of onset occurrences. Note that the correlation is positive, which at first may look counter-intuitive. Yet expressing both variables per unit of population neutralizes the size of each country, which would otherwise be picked up if nominal values of onset occurrences were used: bigger countries also experience more hazards. By construction, then, smaller economies such as Iceland, Hong Kong, Bulgaria and Mauritius naturally show greater levels of data centre endowments together with hazard frequencies.
⁵ Technological hazards comprise industrial accidents, such as gas leaks and oil spills, as well as transport accidents and other miscellaneous accidents such as collapses and explosions.
⁶ Biological natural hazards are epidemics, insect infestations and animal accidents. Extra-terrestrial natural hazards are airbursts and more generally space weather such as geomagnetic storms or shock waves.
⁷ The database reports that in some cases secondary criteria are also taken into account when figures are missing, such as the significance of the disaster over a time span or whether significant damage takes place.
⁸ One could argue, for instance, that past services trade performance is correlated with overall higher levels of economic activity, which brings along onset events in society. Given that this paper takes an extended time horizon of natural hazards since 1900, it is very unlikely that services trade today impacts this variable.
⁹ Robustness checks have also been performed using total employment and land area as the denominator. Results are largely consistent and can be obtained upon request.
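As a rough illustration of how such an occurrence rate could be computed, the sketch below sums event counts over the four hazard categories and divides by population in millions. The counts and the population figure are made up, the function and variable names are hypothetical, and taking logs of the rate is an assumption based on the variables being plotted in logs.

```python
import math

HAZARD_TYPES = ("geophysical", "meteorological", "hydrological", "climatological")


def occurrence_rate(counts, population, types=HAZARD_TYPES):
    """Recorded onset events per million inhabitants for the given hazard types."""
    total = sum(counts.get(t, 0) for t in types)
    return total / (population / 1_000_000)


# Made-up counts of recorded events over 1900-2015 for one country
counts = {
    "geophysical": 5,
    "meteorological": 20,
    "hydrological": 15,
    "climatological": 10,
}
population_2015 = 25_000_000  # made-up 2015 population

rate_total = occurrence_rate(counts, population_2015)                      # all four categories
rate_meteo = occurrence_rate(counts, population_2015, ("meteorological",))  # one category only

# Log transformation (assumption); undefined for a country with zero recorded events
log_or_total = math.log(rate_total)
log_or_meteo = math.log(rate_meteo)
```

Swapping the population argument for total employment or land area would give the alternative denominators mentioned in the robustness checks.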

Results IV Regressions
The results of the IV regressions are reported in Tables 4 and 5 for data centres and secure internet servers respectively. These were the two significant variables in the baseline regressions. In column 1 of each table the instrumental variable of occurrence rate (OR) is computed using all four types of natural hazard occurrences together, as explained above, and is titled "Total". In the second column, only the category of meteorological hazards is used to compute the occurrence rate, which is denoted "Meteo" in the table.
In Table 4, the results in the first two columns have a positive coefficient sign, in line with the initial correlation of Figure 5. The coefficient size for both regressors more or less doubles in each column compared to the results in Table 2. Both columns also report an F-statistic that is much higher than 10, indicating that the instrument is strong. In addition, the p-value of the Kleibergen-Paap rk LM-statistic shows that the instrument is also relevant, as we can reject the null hypothesis of under-identification. Note that the coefficients for the first stage regressions are also positive and significant. However, the endogeneity test rejects the OR Total instrument in column 1, whereas the OR Meteo instrument only passes the endogeneity test at a marginal level (p ≤ .10).
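The logic of instrumenting and of the first-stage F-statistic can be illustrated with a minimal two-stage least squares sketch on simulated data. This is not the specification estimated in the tables: it has one regressor, one instrument, no interaction term and no fixed effects, and the F-statistic uses the classical homoskedastic formula rather than a robust one. All names and parameter values are hypothetical.

```python
import random
from statistics import fmean


def ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """Just-identified 2SLS with one endogenous regressor x and one instrument z.

    Returns the IV slope and the first-stage F-statistic (homoskedastic formula).
    """
    n = len(y)
    # First stage: project the endogenous regressor on the instrument
    a, b = ols(z, x)
    x_hat = [a + b * zi for zi in z]
    rss = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    mx = fmean(x)
    tss = sum((xi - mx) ** 2 for xi in x)
    first_stage_f = (tss - rss) / (rss / (n - 2))
    # Second stage: regress the outcome on the fitted values
    _, beta_iv = ols(x_hat, y)
    return beta_iv, first_stage_f


# Simulated data: u is an unobserved confounder that biases OLS upwards
random.seed(0)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # confounder
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

beta_iv, first_stage_f = two_stage_least_squares(y, x, z)
_, beta_ols = ols(x, y)
```

In this simulation the true effect of x on y is 0.5. The OLS slope should be pulled well above that value by the confounder, while the IV estimate should lie close to it, with a first-stage F-statistic far above the rule-of-thumb threshold of 10.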
Further, in columns 3 and 4 of Table 4 the regressions are repeated but now with GDP per capita. Including this control variable is problematic by default: a higher level of development is responsive to a country's higher levels of services trade (Francois and Hoekman, 2010; Hoekman and Mattoo, 2008). Therefore, in the IV regressions, we treat this variable as endogenous too. Doing so gives in both columns a positive and significant coefficient for the variable of interest, although with roughly half the coefficient size, consistent with the results found in the baseline regressions. In both cases the p-values of the Sargan-Hansen J-statistic tell us that the joint null hypothesis is not rejected. The two instruments are therefore both valid and uncorrelated with the error term.
The results reported in Table 5 are largely similar to those in Table 4. In all columns the coefficient results for secure internet servers are highly significant, show a reduced coefficient size when the GDP term is entered, and are measured with a strong instrument. This stays true when instrumenting secure internet servers with both the total and the meteorological onset occurrence rate.
However, the treatment of the per capita GDP variable remains somewhat unsatisfactory. Even though adding this control variable strengthens the case that our coefficient results are not simply picking up levels of development, ideally this covariate should be treated as exogenous. Finding a way to make GDP per capita uncorrelated with services trade, and the instrument uncorrelated with the error term, is therefore essential. This would also make the IV estimator consistent. One solution is to take past economic performance, which is unlikely to be affected by current levels of services trade. Moreover, given that we take an extended time horizon for onset occurrences, historical data on per capita GDP levels are also unlikely to influence this instrument. As a result, the year 2007 is chosen, which lies well before our cross-section. Table 6 reports the results when treating GDP per capita levels for the year 2007 as exogenous. The first two columns show that when instrumenting for data centres, the regressions generate significant outcomes. Similarly, when instrumenting for secure internet servers the last two columns also show significant outcomes, although the coefficient result for OR Meteo is statistically weak. Note that in all cases the GDP variable is negative yet insignificant. One potential explanation is that on average less developed countries have grown faster since 2007, which has resulted in greater levels of services trade, as opposed to richer countries, which have seen a slower growth rate in GDP. In all cases, the instrument is strong (F-statistic > 10), although the null hypothesis of the endogeneity test is only rejected in columns 2 and 4, signifying that instrumentation is necessary.
On the basis of these results in columns 2 and 4, a 10 percent increase in data centres now results in an expansion of exports in data-related services of about 1.6 percent, which is equivalent to Singapore's share in world exports in services. Likewise, an equal percentage increase in secure internet servers raises data-related exports by 2.1 percent, comparable to the relative size of total services exports of a country like Canada.

Other Robustness Checks
This section performs various robustness checks. First, we take employment and land area as the denominator instead of population. Second, we check for alternative years for the per capita GDP variable. Third, we check whether our results are driven by the sheer levels of trade. Fourth, we use an index of cross-border data flow restrictiveness as an additional control variable.
The first robustness check is to see whether results hold when replacing total population with total employment or land area. So far population is used to normalize our measure of data centres (DC) and secure internet servers (SIS) prevalence because firms take into account market size when investing in the construction of a data centre. Using employment instead would be in line with our (D/L) variable of software-intensity and be somewhat closer to an interpretation based on factor proportions following Romalis (2004). One should note that, strictly speaking, digital infrastructure such as data centres does not form an endowment factor in the production function. It therefore cannot be seen as a Heckscher-Ohlin determinant of comparative advantage (Chor, 2011). Still, using employment is informative to see whether results are sensitive to different denominators.
Tables 7 and 8 report the results for data centres and secure internet servers respectively when using total employment. Employment is sourced from the Penn World Tables for the year 2017. In Table 7, three out of four columns give a significant result of the regressions for data centres. The result is particularly strong when using OR Meteo as instrument. Column 4 shows that the results remain significant when entering the GDP variable. The results for secure internet servers are also significant albeit weaker. Here, no significant results are obtained when using the GDP term as a control variable. Note that if we take total land area (sq. km) instead of employment, results are largely similar but statistically weaker as shown in Tables 9 and 10.
The second robustness check is to make sure that the year for the per capita GDP data is not arbitrarily chosen. It may be that using the year 2007 for the control variable gives a significant result by chance. It may also be that 2007 is too close to our cross-section year of 2017 and so the GDP term might still be sensitive to reverse causality. This might in particular be true when the effect of economic development involves a substantial time lag for countries to experience higher levels of services trade. Therefore, we take the years 2001 and 2004, in addition to 2010 as an extra year, to see whether results hold. Tables 11 and 12 show that for all years, and for both data centres and secure internet servers, results are consistently significant, although for earlier years the coefficients are statistically weaker.
The third robustness check is to see whether there is truly a shift away from existing exports in favour of data-related services. It may be that sheer trade as a response variable does not entirely capture the relative specialization pattern of a country. Countries may have higher trade levels in data-related services, but that would not necessarily imply a relative change in their export basket vis-à-vis already exported services. Even though the previous literature, e.g. Chor (2011), precisely tests comparative advantage by the interaction term as defined in the baseline specification, we nonetheless test this issue. Specifically, we compute the Balassa index of Revealed Comparative Advantage (RCA) based on our services trade data and substitute this indicator in equation (1).

The idea behind this indicator is that it summarizes the concept of Ricardian comparative advantage in one measure. However, this measure has been criticized for being asymmetric, i.e. unbounded for sectors with an RCA higher than 1 but with zero as a lower bound for sectors with a comparative disadvantage. We therefore normalize the index following standard practice as proposed by Laursen (2015) with (RCA - 1) / (RCA + 1). The results are reported in Table 13 for data centres and Table 14 for secure internet servers. The outcomes for data centres are significant throughout all specifications except in column 3. Importantly, the coefficient result is significant when using OR Meteo as instrument in addition to the GDP variable for the year 2007 as a control. The results for secure internet servers remain insignificant when entering the GDP term.
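The Balassa index and its normalization can be sketched as follows. The standard Balassa definition is assumed here, namely a sector's share in a country's exports divided by that sector's share in world exports. The two-country, two-sector export values are made up and the function names are hypothetical.

```python
def balassa_rca(exports):
    """Balassa RCA for each (country, sector) pair.

    `exports` maps (country, sector) to an export value. RCA is the sector's
    share in the country's exports relative to its share in world exports.
    """
    country_total, sector_total, world_total = {}, {}, 0.0
    for (country, sector), value in exports.items():
        country_total[country] = country_total.get(country, 0.0) + value
        sector_total[sector] = sector_total.get(sector, 0.0) + value
        world_total += value
    return {
        (country, sector): (value / country_total[country])
        / (sector_total[sector] / world_total)
        for (country, sector), value in exports.items()
    }


def symmetric_rca(rca):
    """Normalized index (RCA - 1) / (RCA + 1), bounded between -1 and 1."""
    return (rca - 1.0) / (rca + 1.0)


# Made-up export values for two countries and two sectors
exports = {
    ("A", "information services"): 30.0,
    ("A", "travel"): 70.0,
    ("B", "information services"): 10.0,
    ("B", "travel"): 90.0,
}
rca = balassa_rca(exports)
srca = {key: symmetric_rca(value) for key, value in rca.items()}
```

In this toy example country A has an RCA of 1.5 in information services (normalized value 0.2) and country B an RCA of 0.5 (normalized value of about -0.33), so a comparative advantage and a comparative disadvantage of similar relative size are placed symmetrically around zero.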
The fourth and final robustness check is to control for policy restrictions in digital services. Cross-border services trade as picked up in this paper appears to be susceptible to restrictions that prohibit or inhibit the movement and domestic use of data. In particular, Ferracane and van der Marel (2018) show that restrictions on the free flow of data across borders are negatively associated with trade in data-intensive services. Given that this paper takes data centres and secure internet servers as a focal point, of particular interest is their sub-index of cross-border data restrictions. Data centres as well as internet servers, which are mostly integrated in data centres, operate through the existence of cross-border data flows and therefore have an impact on data-related services trade.
The restrictiveness indicator is sourced from ECIPE, where it forms part of the larger Digital Trade Restrictiveness Index (DTRI). Only restrictions on the cross-border movement of data are selected, and so restrictions regarding the domestic use of data are excluded. Ferracane and van der Marel (2018) show that the latter category of restrictions has little impact on services trade. Cross-border data flow restrictions include data localization rules, local storage requirements and restrictive conditional flow regimes. To ensure that the restrictiveness index is not endogenous to services trade, the same approach as with the GDP variable above is applied: a year well before the cross-section period is taken, namely the year 2010. We do this for both the restrictiveness index and the GDP variable.
Results for data centres are reported in Table 15, whereas results for secure internet servers are reported in Table 16. In both tables the coefficient results in columns 1 and 2 are significant. This means that the two instruments stay robust when including policy restrictions on data. Interestingly, the coefficient results on the data policy restrictiveness index also come out as statistically significant, and appear to be stronger when using OR Meteo as instrument. This implies that lower levels of cross-border data flow restrictions form a complementary factor for enhancing trade in data-related services alongside developing digital infrastructures. Note that this corresponding result is stronger for secure internet servers than for data centres.
A final interesting result in columns 3 and 4 of both tables is that the coefficient on data restrictions remains negative and significant when entering the per capita GDP variable.

Conclusion and Policy Implication
This paper shows that data-developed countries are more successful exporters in service sectors more reliant on software. The paper illustrates that countries with higher rates of data centres and secure internet servers are in a better position to specialize in services which are software intense. This paper uses historical occurrence rates of natural hazards such as meteorological onsets as an instrument to show that this impact is robust and exogenous. Data infrastructures are sensitive to extreme onset occurrences which firms like to avoid. In sum, countries with a higher prevalence of data centres and internet servers, which are likely to experience more favourable weather conditions, are in a better position to export in digital services.
Differing productivity levels across service sectors create incentives for countries to trade in order to stimulate comparative advantage. One way of doing so is to reduce restrictive policies related to the cross-border movement and domestic usage of data. Many emerging and developing countries would benefit from such a reform approach, as generally these countries have shown the greatest increase in data-related restrictions. However, that cannot be the only factor for policy makers to consider. Indeed, the empirical analysis this paper develops makes a strong case that digital infrastructure is a robust enabling factor that complements data policies.
Yet, increasing digital infrastructure capacities has its limits. Particularly for developing countries building data centres is an expensive undertaking. Also, small open economies may not always want to prioritize the construction of data centres given their limited land area. In that context, the regional and global network can play a supportive role in developing data-related activities to reinforce comparative advantage. Countries that do not have enough capacity to construct data centres will rely on other countries for their digital infrastructures on which data can be stored. Yet, for that to happen there is a need to have a policy regime in which data can flow freely across borders. Regional or global frameworks to discuss data policies can therefore help to achieve the benefits of data-related services trade for these types of countries.
Obviously, then, digital policies and digital infrastructure go hand-in-hand to create the greatest productivity benefits. In that regard, it is encouraging to see that the robustness checks in this paper confirm that both aspects can contribute significantly to reaping gains arising from services trade: one through policy reforms and a second one through spending on digital infrastructures. However, this outcome does not provide policy makers with the complete story, as it leaves enough room for further research. For instance, it is far from clear how the two factors interact. What is the optimal sequencing model for governments to undertake? What can poorer countries do to increase disaster risk resilience given that many of them suffer from continued high rates of natural hazards?


Source: Data Center Map
Electronic copy available at: https://ssrn.com/abstract=3657216 Robert Schuman Centre for Advanced Studies Working Papers


Figure 4: Correlation between Information Services Exports in GDP and Data Centres per 1Mln Population

Note: * p<0.10; ** p<0.05; *** p<0.01. The dependent variable is the log of services exports ln(SX) using data from the WTO-UNCTAD-ITC BPM6 database. Robust standard errors are clustered at the country-industry level. Fixed effects for sector are applied at 2-digit BPM level. The term ln(D/L) denotes the log of data intensity over labour using data from US Census on capitalized software and US BLS. P denotes Population. DC denotes Data Centres. IXP denotes Internet Exchange Points. SIS denotes Secure Internet Servers.

Correlation between Onset Occurrence and Data Centres
Note: * p<0.10; ** p<0.05; *** p<0.01. The dependent variable is the log of services exports ln(SX) using data from the WTO-UNCTAD-ITC BPM6 database. Robust standard errors are clustered at the country-industry level. Fixed effects for sector are applied at 2-digit BPM level. The term ln(D/L) denotes the log of data intensity over labour using data from US Census on capitalized software and US BLS. P denotes Population. DC denotes Data Centres. OR denotes Occurrence Rate of natural hazards (Total or Meteo) using the EM-DAT database.
Note: * p<0.10; ** p<0.05; *** p<0.01. The dependent variable is the log of services exports ln(SX) using data from the WTO-UNCTAD-ITC BPM6 database. Robust standard errors are clustered at the country-industry level. Fixed effects for sector are applied at 2-digit BPM level. The term ln(D/L) denotes the log of data intensity over labour using data from US Census on capitalized software and US BLS. P denotes Population. SIS denotes Secure Internet Servers. OR denotes Occurrence Rate of natural hazards (Total or Meteo) using the EM-DAT database.

Note: * p<0.10; ** p<0.05; *** p<0.01. The dependent variable is the log of services exports ln(SX) using data from the WTO-UNCTAD-ITC BPM6 database. Robust standard errors are clustered at the country-industry level. Fixed effects for sector are applied at 2-digit BPM level. The term ln(D/L) denotes the log of data intensity over labour using data from US Census on capitalized software and US BLS. P denotes Population. DC denotes Data Centres. OR denotes Occurrence Rate of natural hazards (Total or Meteo) using the EM-DAT database. CB denotes Cross-Border and covers cross-border data flow restrictions using ECIPE's DTRI.
Note: * p<0.10; ** p<0.05; *** p<0.01. The dependent variable is the log of services exports ln(SX) using data from the WTO-UNCTAD-ITC BPM6 database. Robust standard errors are clustered at the country-industry level. Fixed effects for sector are applied at 2-digit BPM level. The term ln(D/L) denotes the log of data intensity over labour using data from US Census on capitalized software and US BLS. P denotes Population. SIS denotes Secure Internet Servers. OR denotes Occurrence Rate of natural hazards (Total or Meteo) using the EM-DAT database. CB denotes Cross-Border and covers cross-border data flow restrictions using ECIPE's DTRI.

Annex 1: The category of Royalties and licenses & Intellectual property
Royalties and licenses and Intellectual property are two different names that refer to the same category, found in the WTO-UNCTAD-ITC and OECD-WTO (BaTIS) trade in services databases. In the WTO-UNCTAD-ITC database, to which we refer as BPM6, this category is called Intellectual property, whereas in the BaTIS database this category is denoted as Royalties and licenses.
Unfortunately, no direct connection can be made between the NAICS 2007 classification, from which we have computed our data intensities, i.e. (D/L), and the sectors Royalties and licenses or Intellectual property. Equally unfortunate is that no concordance table exists between NAICS and BPM6, or between NAICS and EBOPS more generally. Therefore, we have constructed our own concordance tables and built them up from an extremely detailed 6-digit level. This is not too difficult when mapping each 6-digit NAICS code into a 2-digit BPM6 or EBOPS code. However, since no clear 6-digit NAICS code can be directly linked to the services category of Royalties and licenses or Intellectual property, we have extended our concordance scheme to include this sector. We have done so in an indirect way through other concordance systems. The result of this concordance process can be seen in Table A1.1 below.
The way to do so is not clear-cut and some assumptions need to be made. For starters, the WTO-UNCTAD-ITC trade in services database designates Intellectual Property as chapter "SH" following the 6th edition of the Balance of Payments Manual (BPM6), while the OECD-WTO BaTIS denotes this category as S266 following EBOPS 2002. As said, both overlap and are therefore indicated as "SH / S266" in Table A1.1. To eventually arrive at the NAICS 2007 code, two sequential sources are needed. First, Annex III of the MSITS 2002 EBOPS classification provides a concordance table between EBOPS and CPC 1.0, which is used as a first step. Four sectors are classified under 266 Royalties and license fees, namely Patents, Trademarks, Copyrights and Other non-financial intangible assets. Second, with the help of the United Nations correspondence tables website (https://unstats.un.org/unsd/classifications), a concordance can be made between CPC 1.0 and finally NAICS 2007 through five successive steps as outlined in Table A1.1.
Many different NAICS 2007 codes fall into one of the four original CPC 1.0 codes and therefore not all of them are equally relevant for Royalties and licenses or Intellectual property services. For that reason, we do not take all 6-digit NAICS 2007 codes which eventually trace back to the two BPM6 and EBOPS 2002 sectors as given in Table A1.1. The reason is that not all NAICS 2007 sectors are fully covered by the two intangible sectors. We only retain those that are fully, rather than partially, covered. These sectors are given in bold in column "NAICS 2002 / 07" of Table A1.1 and are not given an * under the column "P" (which stands for partial). The information on whether an item is covered partially or not also comes from the United Nations correspondence tables. To arrive at 2-digit BPM6 and EBOPS 2002 sector intensities, we take the unweighted average of the data intensities of these designated non-partial NAICS 2007 sectors, which should eventually give us a good approximation of the level of data used in the two sectors of Royalties and licenses and Intellectual property.
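The averaging step can be sketched as follows. The codes, partial flags and intensity values are placeholders rather than entries of Table A1.1, and the function name is hypothetical; the sketch only shows that partially covered codes are dropped before an unweighted mean is taken.

```python
from statistics import mean

# Placeholder concordance rows: (NAICS code, partially covered?, data intensity D/L)
concordance_rows = [
    ("NAICS_A", False, 0.40),
    ("NAICS_B", True, 0.90),   # partial coverage (marked * under "P"): excluded
    ("NAICS_C", False, 0.20),
]


def sector_intensity(rows):
    """Unweighted average of D/L over NAICS codes that are fully covered."""
    fully_covered = [intensity for _, partial, intensity in rows if not partial]
    return mean(fully_covered)


intensity_sh_s266 = sector_intensity(concordance_rows)
```

With these placeholder values the partially covered code is ignored and the sector intensity is the simple mean of 0.40 and 0.20.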
As one can see, a mix of service sectors falls under the two sectors, namely R&D services, some financial services, as well as cultural services such as motion pictures and sound recording. Trust funds are also fully covered under this category of Royalties and licenses / Intellectual property. Of note, the NAICS sector 515120 is not included under EBOPS, but is covered under BPM6 following their respective manuals.
The European Commission supports the EUI through the European Union budget. This publication reflects the views only of the author(s), and the Commission cannot be held responsible for any use which may be made of the information contained therein.