A Brief History of Data Management at Vadu HDSS  

Research data management involves collecting, storing, shaping, protecting, verifying, validating and processing essential data so that it is available to researchers for efficient analysis and accurate results.

We at Vadu HDSS have been involved in demographic surveillance since 2002. We have focused on the following aspects to ensure a quality data management lifecycle:

data security

data sharing

document and record storage

data governance

data architecture

data reference and master data management

data quality management

database management  

data destruction

metadata management 

In November 2002 we started paper assisted personal interviewing (PAPI). Demographic surveillance data collection began with Social Scientists collecting the data themselves. The Social Scientists involved were Pallavi Marathe (Lele), Pallavi Kale, Yasmin Shaikh, Dr Anagha Savdekar and Suman Tikku. Data were collected using hard copy questionnaires and then entered at the back office using an application customized in FoxPro. Trupti Yadav customized the software application and managed data entry, cleaning and storage. To support the Social Scientists and Field Research Assistants (FRAs), a system was put in place in which about 194 Village Level Informers (VLIs) were identified across all villages, at the rate of one VLI per 1,000 population. These VLIs were villagers who kept track of events in their vicinity (about 1,000 people, or roughly 200 households) and informed the FRAs. The FRAs then visited the households and filled in the respective event forms. By February 2003, the Social Scientists were joined by a team of six Field Research Supervisors (FRS), namely Vijay Gaikwad, Sayaji Pingale, Prashant Gaikwad, Bharat Choudhary, Dipak Mandekar and Shilpa Fulaware.






Moving ahead, in 2007 the paper assisted personal interviewing (PAPI) system was upgraded using the technologies available and accessible at that time. The software application was changed to PHP-based web forms with MySQL as the backend database by Padma Khua, a Masters student of Tathagata Bhattacharjee at I2IT, Pune. The first Systems Requirement Specifications, created by Pavan K, were revised by Padma to suit the needs of Vadu HDSS. Data collection using hard copies continued. The application went through a number of iterations by several developers, including Prasad Joshi and Neeraj Kashyap. Sometime in 2009, another cadre of personnel, namely the Field Research Assistants (FRAs), joined the data collection team. Hence, the VLI system was shut down in 2009.

A major leap was taken during the monsoon of 2013, when electronic data capture started in the field from surveillance round 21. A laptop-based, distributed offline system (LAPI) was launched, which synced with the server at regular intervals. This system was designed, developed and implemented by Tathagata Bhattacharjee using the phpMyEdit tool with MySQL as the backend database. Sagar Patil played an important supervisory role and guided the field team through this new foray into electronic data collection. Dell laptops were used as the field equipment.





In 2015, tablet computers were adopted and a tablet-based distributed offline system (TAPI) was deployed, which synced with the server at regular intervals. Nidhi Patharia developed the first Android application for this purpose using Android, SQLite, JSON and PHP. The final version was stored in MySQL. Raju Narale joined the team later and continued its development, supported by Sandeep Bhujbal. For the first time, the location of the interview (GPS coordinates) was captured automatically at the time of the interview. Data collection started using as many as 30 Samsung Android tablets. The data were collected by the FRAs, and the FRS supervised the FRAs using the same application. The field activities related to data collection and supervision were carried out by Bharat Chaudhari.


In 2019, a cloud-based offline/online personal interviewing system (C/TAPI) was introduced under the guidance of Tathagata Bhattacharjee, and Sandeep Bhujbal created a new application for data collection. It was built using the SurveySolutions tool promoted by the World Bank, and the deployment was named VaduHDSS2019. The application had two parts: the Designer and the cloud server. The questionnaire was designed using the Designer and then uploaded to the cloud server, hosted on the World Bank cloud, for data collection. After completing one round successfully on the World Bank's cloud, Vadu HDSS moved the application to its own virtual machine on Amazon Web Services (AWS). This application is currently in use for HDSS data collection, and the field activities related to data collection and supervision are carried out by Bharat Chaudhari. Also in 2019, a major shift in data management was undertaken by moving all datasets into an enterprise-class PostgreSQL database.



In 2020, the data collection methods changed. The round that began in January 2020 was atypical. The FRAs were initially visiting households in the field for the surveillance round. When the first national lockdown to contain the COVID-19 pandemic started on 22nd March 2020, we quickly shifted to telephonic interviews, but data entry continued using the VaduHDSS2019 application.

As data collection was carried out over the phone, one of the most important features of this application, namely the recording of geo-position (GPS coordinates), became redundant. It is always better to collect data than to miss it completely, and hence we opted for telephonic data collection.

When we started telephonic interviews for HDSS data collection, we faced quite a few challenges. These included the person not being available on the phone, incorrect phone numbers, and FRAs being unsure who they were speaking to. Dinesh Shinde, a senior FRS, carried out an exercise to collate the FRAs' suggestions for updating phone numbers across the entire database (Annexure 1).

The overall experience in the recent past is that HDSS data quality is at stake, and Vadu HDSS is unsure of the completeness and correctness of its data. Hence, this exercise proposes to identify better ways of documentation and better ways of ensuring the quality of Vadu HDSS data collection.

Our endeavor continues, and we are committed to a strong and reliable research data management ecosystem.

This article was written by Prof. Sanjay K Juvekar, with support from Tathagata Bhattacharjee.

Please send data-related queries to data_manager@kemhrcvadu.org