Frequently Asked Questions
(on PECO and DOSME panel projects)
What do PECO and DOSME actually mean?
PECO is an acronym of the French „Pays d'Europe Centrale et Orientale“, that is, Central and Eastern European Countries (CEEC). The first “PECO Panel” project was designed to look at the development of businesses in these countries. The name for this grouping of countries was changed to “Central European Countries” or CECs during the project. Thus, within the project, CEEC and CEC mean the same thing. DOSME stands for „Demography of Small and Medium Enterprises in Central European Countries“.
The main objective of the “PECO Panel project of newly created enterprises“ was to establish a panel of the enterprises existing in the participating countries and to observe it, in order to obtain data on the size and characteristics of the active enterprise population and its development. As the panel had to be established on the basis of the business registers of the national statistical institutes, analysis of the quality of the registers was included among the main aims of the project.
Besides that, the project was designed to teach the participating countries the methodology of sample surveys by letting them experience the whole life-cycle of a panel survey within a large multi-national project. The transfer of knowledge and the development of skills were organised in such a way that the CECs would be able to continue the panel of newly created enterprises independently in the future and apply the methods to other areas of enterprise statistics. The knowledge and skills that have been developed include panel methodology and statistical methods such as sampling, imputation, estimation and grossing-up.
The idea of the project emerged from the seminar on "Business registers in statistical practice", which was organised by the statistical office of the Czech and Slovak Federal Republic in Bratislava in November 1992 with support from the French INSEE and Eurostat. To help work out and implement a technically feasible project, Eurostat also agreed to provide substantial financial aid under the multi-country PHARE programme.
After a long period of preparation, the first PECO survey started in September 1995. The first survey on the units existing at 1 January 1995 was followed by two annual surveys on newly created enterprises and by a follow-up panel survey after two years. During these three years all participants have gained experience in designing panels, sample surveys and imputation.
The PECO project was carried out in the years 1995-1997. From a methodological point of view, the DOSME project is a continuation of the PECO Panel project. The previously created databases have also been used in analysing the survey results.
One of the products of the PECO project has been the calculation of preliminary estimates of the creation of enterprises since the beginning of transition and of their characteristics. The results are very valuable because, for the first time since the fall of the Berlin Wall, the impact of the transition on the emergence of new production structures has been quantified and the data for the different countries are fully comparable. The importance of the data has been widely recognised: the demand is large and the press coverage of the first publication of results was remarkable. Insight into the quality of the national business registers was another output of the project.
The success of the panel project made it possible to take a next step in creating the conditions for good enterprise statistics, which is particularly important for those countries that need to present reliable and harmonised information on their economic structure prior to entry to the European Union (EU).
Thus the DOSME project has been introduced, wherein the main issues to address in the coming years were set as follows:
The following countries participated in the project from the beginning: Albania, Bulgaria, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia and Slovenia. In 1998, the Former Yugoslav Republic of Macedonia (FYROM) joined the project.
The project is co-ordinated by Eurostat, and experts from EU countries (France, UK, Netherlands and, on occasion, Finland and Denmark) also contribute to it. The international data processing is carried out centrally at Infostat, where a standard set of documentation is also kept and maintained. The working group of the participating countries has met twice a year, and some training sessions and other special working sessions have also been held.
The main responsibilities of participating partners can be briefly outlined in key words.
To characterise the data scope of the project, we can look at the structure and contents of the basic questionnaire for the yearly surveys B (which differs only slightly from the questionnaire used in the first survey A and in the follow-up survey C).
It has 3 chapters, each with modules of one or more questions. It begins with a mailing window and ends with a return window.
First chapter: Identification of the enterprise
Second chapter: Current position of the enterprise
Third chapter: Starting and development conditions
Each survey has a code:
The characteristics of the surveys are:
The calculated and allocated sample sizes of survey B4 characterise the average sample sizes in the standard yearly B surveys. In the initial survey A, the sizes were 4-5 times higher. In the follow up survey C1, the sizes were dependent on the number of units still active from previous surveys A and B1.
The real selected sample sizes (as the sum of selected units in small strata) were different by plus or minus 2-5 units due to the random character of selection.
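A deviation of this kind is what one would expect if every unit in a stratum is selected independently with the stratum's sampling fraction. The following Python sketch illustrates that mechanism only; the source does not describe the exact selection algorithm of the sample package, and the stratum names, fractions and function names are hypothetical.

```python
import random

def select_sample(strata, fractions, seed=None):
    """Select each unit independently with its stratum's sampling fraction.

    strata:    dict mapping stratum id -> list of unit ids (hypothetical layout)
    fractions: dict mapping stratum id -> sampling fraction in [0, 1]
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, units in strata.items():
        f = fractions[stratum]
        # Every unit gets its own random draw, so the realised size is random.
        selected[stratum] = [u for u in units if rng.random() < f]
    return selected

def expected_size(strata, fractions):
    """Allocated (expected) sample size: stratum size times fraction, summed."""
    return sum(len(units) * fractions[s] for s, units in strata.items())

# Illustrative data only: two strata with different sampling fractions.
strata = {
    "trade_small": list(range(400)),
    "industry_small": list(range(400, 500)),
}
fractions = {"trade_small": 0.05, "industry_small": 0.20}

sample = select_sample(strata, fractions, seed=1)
realised = sum(len(units) for units in sample.values())
print("allocated:", expected_size(strata, fractions), "realised:", realised)
```

Running the selection with different seeds gives realised sizes that scatter around the allocated size, which is the effect described above.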
The logic of surveys can be characterised by a review of tasks at central and national level.
CENTRAL LEVEL NATIONAL LEVEL
Questionnaire design (adaptation)
Sample package preparation
Creating Sampling base
Generating statistical tables
Checking Statistical tables
Finalising sample sizes
Allocation package preparation
Sample allocation, fraction generation Sample selection
Generating statistical tables
Establishing working groups
National version of questionnaire
Adapted national documentation
Matching Sample with Register
Generating tables and transfer files
Addresses of survey units
Data entry package preparation
Starting the Survey
Conducting the Survey
Training the staff
Entering data from questionnaires
Manual and automatic checking
Reporting on survey progress
National central processing
Result file creation and transfer
Result checking package
Result file checking and correcting
Creating the Central data base (CDB)
Imputation package distributed
Imputation of non responses
Analysing result data
National analysis and publication
Register quality measurements
The sequence of treatments for survey data processing has been defined as follows:
A. Checking the result file
B. Creating the CDB and the process of imputation
For checking and analysing the statistical (counting) files and for analysing the CDB, mainly MS Excel and SAS are used at both levels of data processing.
The data processing packages have been created in Infostat. Here is a short review of them.
PACKAGE          USED SOFTWARE           NOTES
Sample           Clipper                 Integrated package, menu driven
Allocation       Pascal                  Tailor-made programs and utilities
Data entry       Blaise, Pascal, Delphi  Entry program written in Blaise, checking programs in Pascal, menu shell for Windows in Delphi
Result checking  Manipula, Pascal        Manipula programs and Pascal utilities
Creating CDB     Pascal                  Own special programs and utilities
Imputation       Pascal, Delphi          DOS version written in Pascal, Windows version written in Delphi (including an SQL module)
After the character of the data on the questionnaire had been assessed from a methodological point of view, the hierarchical hot deck method was recommended and chosen. A specific application then had to be designed and created for it.
Let us look at the principles of imputation by the hierarchical hot deck method as they were defined and used.
We distinguished independent and dependent or hierarchical variables. The defined types of hierarchical variables were primary, secondary and tertiary. We say that a variable is secondary if it comes after a switch question (or variable). Tertiary variables are then related to the secondary ones.
The stratification variables were defined in this way:
These variables were mandatory while other characteristics could be added for imputation of specific variables:
We imputed both item (partial) non-responses and unit (global) non-responses of active units.
Limits in definition of sub populations at the first approximation were taken from French experiences:
To obtain the information needed for choosing the optimal criteria that define the sub-population of donors for each variable, analytical output tables were generated. Tables were also produced to show the distribution of donors and acceptors for the imputed variables. Such tables can be generated in any good table generator, spreadsheet or statistical analytical package. In our case, these tables are generated by tailor-made programs.
We can consider the basic data types according to the following groupings: space, time, content. The variety of contents of the data can then be outlined by some related notions such as subject, observation and calculation.
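The hot deck principles described above can be sketched in Python as follows. This is a minimal illustration and not the project's Pascal or Delphi application: the record layout, the variable names (`has_problems`, `problem_type`), the stratification keys and the rule of drawing a random donor from the same sub-population are all assumptions made for the example. The function also returns the donor and acceptor counts that the analytical tables would report.

```python
import random
from collections import defaultdict

def hot_deck(records, var, strat_keys, parent=None, parent_value=None, seed=None):
    """Fill missing `var` values (None) from a random donor in the same sub-population.

    For a dependent (secondary or tertiary) variable, pass the switch variable
    as `parent`; only records whose parent equals `parent_value` take part.
    Returns (number of donors, number of acceptors, number of imputed values).
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    acceptors = []
    for rec in records:
        if parent is not None and rec.get(parent) != parent_value:
            continue  # the question does not apply to this record
        key = tuple(rec[k] for k in strat_keys)
        if rec.get(var) is None:
            acceptors.append((key, rec))
        else:
            donors[key].append(rec[var])
    imputed = 0
    for key, rec in acceptors:
        if donors[key]:  # leave the gap if the sub-population has no donor
            rec[var] = rng.choice(donors[key])
            rec.setdefault("imputed", set()).add(var)
            imputed += 1
    n_donors = sum(len(values) for values in donors.values())
    return n_donors, len(acceptors), imputed

# Illustrative records only.
records = [
    {"activity": "trade", "size": "0-9", "has_problems": "yes", "problem_type": "finance"},
    {"activity": "trade", "size": "0-9", "has_problems": None, "problem_type": None},
    {"activity": "trade", "size": "0-9", "has_problems": "yes", "problem_type": None},
    {"activity": "industry", "size": "0-9", "has_problems": "no", "problem_type": None},
]
strat_keys = ["activity", "size"]

# Impute the switch question first, then the question that depends on it.
print(hot_deck(records, "has_problems", strat_keys, seed=0))
print(hot_deck(records, "problem_type", strat_keys,
               parent="has_problems", parent_value="yes", seed=0))
```

Imputing in the order of the hierarchy matters: a dependent variable is filled only for records where the switch question, observed or already imputed, makes it applicable.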
The basic file types can be outlined according to the level of observation indicated above and processing in the following way:
We have marked with italics those files that are present only at the given level. For example, the sampling base is always present only at the national level and the counting and calculating files are always present only at central level. We can see that the majority of files are present at both levels. However, at central level they are multiplied not only by years but also by countries.
This simple list well illustrates that, even at the national level and within one survey, we have to deal with many files. After five finished surveys the number of files will be much greater and we need a logical grid to view them properly.
Some files are primarily devoted to analytical purposes while some others can also be used for analysis. Some new files can be prepared from existing files.
Here we outline the character of the central database as the final survey result and its possible analytical exploitation.
Let us start with outlining the basic or standard contents of our central data base file (CDB). We have the following groups of data in the file:
Besides the identifiers, management and auxiliary data the central database contains three major groups of data:
Considering the differences between these groups of data and the basic structure of the questionnaire we can provide the following types of analyses:
Nearly all of these possible analyses are prepared regularly for our international publications and countries are also providing some of them for their national publications.
Here we briefly characterise each of the possible analytical approaches.
1. Structural analyses
Considering that in the national imputed central database each record contains the weight of the enterprise, we can generate a lot of estimates for the whole active population. Of course, we can use both weighted numbers and percentages. The main groups of variables that can be utilised for structural analyses:
2. Quantitative data about enterprises
First of all we can analyse the number of workers (ISumWork) and the number of local units (ILocNumb). From these data we can calculate categories and averages. However, data about workers can be analysed also according to the types that are present on the questionnaire.
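As a minimal sketch of such a calculation, the Python example below derives size categories from ISumWork and computes a weighted average and weighted percentages. The field name `Weight`, the class limits and the sample values are assumptions made for illustration; only ISumWork is taken from the text above.

```python
def size_category(workers):
    """Map a number of workers to a size class (class limits are illustrative)."""
    if workers == 0:
        return "0"
    if workers <= 9:
        return "1-9"
    if workers <= 49:
        return "10-49"
    return "50+"

def weighted_summary(records, weight="Weight", var="ISumWork"):
    """Return (estimated population, weighted mean of var, percentage by size class)."""
    total_w = sum(r[weight] for r in records)
    mean = sum(r[weight] * r[var] for r in records) / total_w
    by_class = {}
    for r in records:
        c = size_category(r[var])
        by_class[c] = by_class.get(c, 0.0) + r[weight]
    shares = {c: 100.0 * w / total_w for c, w in by_class.items()}
    return total_w, mean, shares

# Illustrative records only: each sampled enterprise carries its weight.
records = [
    {"Weight": 20.0, "ISumWork": 1},
    {"Weight": 20.0, "ISumWork": 4},
    {"Weight": 5.0, "ISumWork": 30},
    {"Weight": 5.0, "ISumWork": 0},
]
population, mean_workers, shares = weighted_summary(records)
print(population, mean_workers, shares)
```

The same function can be applied to ILocNumb by passing `var="ILocNumb"`, provided suitable class limits are chosen for local units.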
We can also treat the data on the capital conditions of joint stock companies and limited liability companies as quantitative, because the question asks for the shares in percentages.
3. Profile of entrepreneurs
From characteristics of the sole proprietor or main manager of the enterprise we can analyse the profile of entrepreneurs. The related questions are:
The question on the type of enterprise creation can be included also in this block.
From the year of birth we can calculate age categories that can be used in the process of further analyses.
We can also cross-tabulate sex, education, profession and age with the condensed or super-condensed activity code, with categories of the number of local units, etc.
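The derivation of age categories and a cross-tabulation with an activity code can be sketched as below. The field names (`YearOfBirth`, `Activity`, `Weight`), the class limits and the reference year are hypothetical and chosen only for the example.

```python
from collections import defaultdict

def age_category(year_of_birth, survey_year):
    """Derive an age class from the year of birth (class limits are illustrative)."""
    age = survey_year - year_of_birth
    if age < 30:
        return "under 30"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    return "50 and over"

def cross_table(records, row_key, survey_year):
    """Weighted counts of enterprises by (row_key value, age class of the entrepreneur)."""
    table = defaultdict(float)
    for r in records:
        cell = (r[row_key], age_category(r["YearOfBirth"], survey_year))
        table[cell] += r["Weight"]
    return dict(table)

# Illustrative records only.
records = [
    {"Activity": "trade", "YearOfBirth": 1970, "Weight": 12.0},
    {"Activity": "trade", "YearOfBirth": 1950, "Weight": 8.0},
    {"Activity": "industry", "YearOfBirth": 1972, "Weight": 5.0},
    {"Activity": "trade", "YearOfBirth": 1968, "Weight": 10.0},
]
table = cross_table(records, "Activity", survey_year=1997)
for cell in sorted(table):
    print(cell, table[cell])
```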
4. Problems in development
Two blocks of questions have been devoted to this problem:
Also the question on type of enterprise creation can be included into this analytical block.
In analysing the related questions it is important to realise that they are constructed as hierarchical ones, in other words as secondary and tertiary questions (the primary question is always the question on the activity of the enterprise). That means that we first get figures on whether enterprises have problems or not, and then the percentage distribution of the existing problem types among those that do. This comment on analysing hierarchical data is valid for all other dependent (secondary and tertiary) questions, such as the reason for cessation and the control of investment.
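The two-step reading of hierarchical questions can be sketched as follows. The example assumes that the records are already restricted to active enterprises (the primary question) and uses hypothetical field names and answer codes.

```python
def hierarchical_shares(records, switch, detail, weight="Weight"):
    """Two-step analysis of a switch question and its dependent question.

    Step 1: weighted percentage of units answering "yes" to the switch question.
    Step 2: weighted percentage distribution of `detail` among those units only.
    """
    total_w = sum(r[weight] for r in records)
    yes = [r for r in records if r[switch] == "yes"]
    yes_w = sum(r[weight] for r in yes)
    share_yes = 100.0 * yes_w / total_w
    dist = {}
    for r in yes:
        dist[r[detail]] = dist.get(r[detail], 0.0) + r[weight]
    if yes_w > 0:
        dist = {k: 100.0 * w / yes_w for k, w in dist.items()}
    return share_yes, dist

# Illustrative records of active enterprises only.
records = [
    {"Weight": 10.0, "has_problems": "yes", "problem_type": "finance"},
    {"Weight": 30.0, "has_problems": "yes", "problem_type": "premises"},
    {"Weight": 20.0, "has_problems": "no", "problem_type": None},
    {"Weight": 40.0, "has_problems": "no", "problem_type": None},
]
share_yes, distribution = hierarchical_shares(records, "has_problems", "problem_type")
print(share_yes, distribution)
```

The percentages of the second step are taken over the units that have problems, not over the whole population, which is why they should not be compared directly with the first-step figure.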
5. Quality of the register
First of all it must be stressed that, while the previous types of analyses can be carried out at both national and central level, analysing the quality of the register is by definition a task for the national level. We do not have enough information on how this kind of analysis has been carried out in the countries.
The basic factors of the quality of registers can be outlined as follows:
In the process of quality analyses some considerations and rules have to be taken into account:
According to these considerations we can provide the following basic analysis:
Here is a short comparison of the contents of our first longitudinal database and the standard central database in their data groups.
Longitudinal data base Central database
I. Surveys A and B
II. Survey C
Besides the difference in Auxiliary record and sub-file identifiers, there are two important differences. The first is that the longitudinal database does not contain the data groups Direct results of survey and Management data of Survey. The second difference cannot be seen in the comparison listed above. While the CDB files for survey A and for the standard B surveys contain both direct survey data and imputed data, the primary CDB file for survey C contains only direct survey result data, with no corrected or imputed data. The reason is that the survey C results have been imputed within the frame of the longitudinal database, where in some cases the data from A and B1 have also been re-imputed. That is why the longitudinal database cannot simply replace the original or source databases. It is not a copy or a simple combination of the survey result databases.
Two basic types of analytical publications have been prepared:
The basic publications from the project are:
The publications are available for public at these sources:
References to Eurostat publications:
The main aims of the project, i.e. a) teaching the CECs about sample surveys and panel surveys and b) establishing an international longitudinal database on the transition process, have been fully achieved.
Besides that we can outline some experiences of countries gained from the project in several fields.
A. Methodology of sample surveys and panel surveys
B. Organising sample and panel surveys
C. Managing the field-work of panel surveys
D. Editing and imputing sample survey data
The working group meetings are devoted to the following main problem areas:
The provisional agenda of each meeting is always prepared in advance and distributed to all participants. Besides that, nearly all working papers for the meeting are prepared in advance and distributed before the meeting via e-mail or published on the project web site.
As an example, here are the main topics discussed at the 3rd Working Group Meeting in Brasov (April 1999):
The progress report on central processing in Survey B3 prepared and presented by Infostat can also be shortly characterised by its contents:
Sequence of tasks in central processing:
Modification of sampler package
Other issues (corrections, communication, working meetings).
The first four meetings, devoted to preparing the project and starting the first survey, were held in Bratislava. After that, the standard spring and autumn meetings were organised by the countries. Here is a list of meetings, dates and places.
Meeting Date Place
Until now only one seminar has been organised, on the use of Business Registers and the CEC pilot project on demography of small and medium enterprises. It was held in Luxembourg in February 1999.
The seminar can be briefly characterised by an overview of the submitted and presented documents:
Communication within the project, whether PANEL PECO or DOSME, is necessarily multidirectional because of the multinational character of the project. The basic communication links in the project are as follows:
Subjects of the communication in the project in general are:
From the above lists we can see that the communication in the project is rather complex and in many cases, depending on the subject, must be fast, timely and reliable. The new information and communication technology that was at our disposal at each communication point when both projects came into operation helped us to a large extent to fulfil these tasks.
The basic documentation of the project was designed and developed in the software package Fore Front Help Author in the form of a WINHELP file.
The general structure of the help system can be illustrated by its contents and the scope of topic types in the index of topics.
The main screen or the contents of the help file for Survey B looks like this:
Structure of the Documentation
Index of topics
General instruction S-B1 Questionnaire F-B1
Sampling base D-P01 Sampling program T-P1
Sequence of treatment T-SBSP Sampling manual S-P1
Statistical tables D-ST Transfer files D-TF
Entry manual S-EP
Version March 21, 1997
Author: InfoStat Bratislava
The general documentation of the project is a set of forms, i.e. the forms are basic instrument to document the project. Each of the forms receives an identifier and each form belongs to one series or type. The following ten types of forms have been defined and used:
Topics are connected with jumps that can be defined individually or in groups as sequences. From this electronic form of the documentation various printout selections can be produced.
While this documentation concentrates on methodological issues, many working papers and much of the software documentation are now available on the project web site. The software packages also contain their own documentation.
In order to promote communication with all parties involved in the DOSME project (the project manager, INFOSTAT, participating countries and consultants), we have designed and developed the DOSME home page, accessible via the Internet.
The basic aim is to provide a universal and flexible tool for communication, for the distribution of important documents and software, and for the exchange of project documents, mainly those presented to working sessions. At present the DOSME home page holds all documents produced so far in the project and presented to the working sessions, as well as some other useful information related to the project activities.
The basic contents of the site are the following:
The site is regularly updated and some new improvements have been also introduced.
(From project documents compiled by Ladislav Meszaros, Infostat Bratislava, March 2000)