The Time for a Government Data Revolution Is Now


Anyone who has worked inside or alongside the government knows how difficult it is to work with government data. Citizens, researchers and journalists who access government data are frustrated most of the time as well. This problem is not new. Even government officials are often confused and clueless. And yet data is the basis for policy-making, public service delivery, monitoring, evaluation, situational awareness and resource allocation!

The government is on a spree of building portals, dashboards and websites for collecting, managing and disseminating information. However, we are still applying a band-aid to a fractured leg.

To address these issues, we must understand the bigger picture. We must look through a systems-thinking lens and understand the people, processes and technology involved.

But let us first understand the various types of data that exist in government:

Though the frequency, methods, storage and analysis differ for each data type, we can apply this framework broadly to any government at any level.

What are the issues here? We shall examine them through the lens of people, process and technology across the entire data pipeline.

1. Collection

It starts with the people who collect, manage, analyze and disseminate data. Most of this data comes from the ground and is collected by school teachers, contractual staff or local government at the district level. Little effort goes into filling vacancies at these levels, building the capacity and data literacy of these people, and providing proper incentives so that we get reliable data.

Many a time we see that data is entered just for the sake of entering it, or because the administrator wants to show the area as "well performing". Crores are being spent, but none of this data can be used for policy-making or any analysis.

Although we now have offline smartphone apps for data collection and are slowly moving away from paper-based systems, a lot needs to be done to improve the collection process.

2. Data Management

Every piece of data collected is in a different format, lying in silos with different departments, and is very difficult to analyze. Some departments still provide scanned PDFs; some provide Excel files with merged rows and columns or hundreds of tabs; some use Google Sheets; and some data sits on NIC servers.

In short, all six types of data mentioned above are in different places, with different people, in different formats.
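
To make the merged-cell problem concrete, here is a minimal Python sketch. It assumes a sheet with a merged year header has been exported to CSV; the data, district names and layout are hypothetical. After export, a merged cell keeps its label only in its first column, so the sketch forward-fills the header and reshapes the table into one tidy row per district, year and group.

```python
import csv
import io

# Hypothetical CSV export of a departmental Excel sheet. "2021-22" and
# "2022-23" were merged cells spanning two columns each, so after export
# only the first column of each pair carries the label.
RAW = """District,2021-22,,2022-23,
,Boys,Girls,Boys,Girls
Alpha,120,115,130,128
Beta,90,95,,101
"""


def forward_fill(cells):
    """Repeat the last non-blank label into blank cells (undoes merged headers)."""
    filled, last = [], ""
    for cell in cells:
        if cell.strip():
            last = cell.strip()
        filled.append(last)
    return filled


def to_long_rows(text):
    """Reshape a two-row-header sheet into tidy (district, year, group, value) rows."""
    rows = list(csv.reader(io.StringIO(text)))
    years = forward_fill(rows[0][1:])
    groups = [g.strip() for g in rows[1][1:]]
    tidy = []
    for row in rows[2:]:
        district, values = row[0].strip(), row[1:]
        for year, group, value in zip(years, groups, values):
            if value.strip():  # skip blanks instead of inventing zeros
                tidy.append({"district": district, "year": year,
                             "group": group, "value": int(value)})
    return tidy


if __name__ == "__main__":
    for record in to_long_rows(RAW):
        print(record)
```

Blank cells are skipped rather than replaced with zeros, since a missing report is not the same as a reported zero.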

3. Data Analysis

We have a lot of BI tools in the market, such as Tableau, Metabase, Data Studio and ArcGIS, which can help us make sense of large spatio-temporal datasets as well as simple ones. However, we need to be careful with these, since more dashboards will only end up complicating the problem rather than solving it. But if we have solved 1 and 2, using these tools across the pipeline will really pay off, ultimately converting data into insights and actions. It will also help build accountability in the system.

4. Data Dissemination

Although we have NDSAP (National Data Sharing and Accessibility Policy), which provides an overall framework for metadata, standards, accessibility, sharing, etc., its implementation has been really shoddy, mostly because it consists of guidelines that are not enforceable. A lot of government data is scattered across multiple portals, some of it sits with individual departments, and a lot of it is chargeable.

In many instances, data is only available at the state level. This does not give civil society organizations or citizens a true picture, since there is a lot of variation below the state level. Even below the district level, there is a lot of variation.

What should be done?

  1. Create common identifiers for administrative units: LGD (Local Government Directory) codes and names should be standardized across datasets and ministries. Every district, sub-district, block, village and local body should be uniquely identifiable, and its name should come from a single source. These identifiers can then be linked to schemes, programs and their output/outcome indicators.

  2. Enable data availability down to the block/local body level: Precision mapping (collecting data from a smaller geographical unit) and collecting fine-grained data are important for identifying target areas that need priority. This would also make local bodies accountable for developmental indicators. In fact, even districts have, on average, a rural population of about 1.3 million, making district-specific findings difficult to interpret in light of the substantial variation within districts. The utility of such fine-grained data can be fully leveraged if it is linked to the local governance units that are accountable for implementing programs and interventions.
  3. Build an open source platform for internal data management across NICNET: Instead of using MS Excel, Google Sheets and VPNs to access and share data, the government can easily build an internal data management platform. Imagine anyone being able to create or pull a database, link it to administrative units and programs, and share it with different departments and ministries. E.g.: the Health Ministry being able to access Poshan Tracker data linked with its internal data systems, or the Fertilizer Ministry being able to access crop sowing data from the Agriculture Ministry so that it can better plan fertilizer demand. The possibilities are endless.
  4. Integrate household/people-level data through common identifiers: Data about any citizen is typically dispersed across ministries and departments. For a citizen, it is a really cumbersome process to submit the same data hundreds of times to different departments. With consent, we can leverage the once-only principle, i.e. certain static information should be collected only once. It can then be used and shared internally so that the citizen doesn't have to fill it in every time. It can also be used for efficient delivery and better targeting of schemes; for example, NREGA and NFSA databases can be used to target many other sectoral schemes such as health insurance. An illustration is below:
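
The value of common identifiers can be sketched in a few lines of Python. The block codes, names and figures below are made up for illustration and are not real LGD codes or data. The point is only that a join on a shared code survives the spelling differences that break a join on names, and that once each block carries its parent district code, the same rows can be rolled up to show how a district average hides block-level variation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracts from two departments. The codes "B01".."B03" and
# "D1"/"D2" are made up for illustration; they are NOT real LGD codes.
HEALTH = [
    {"block_code": "B01", "block_name": "Rampur", "anaemia_pct": 62.0},
    {"block_code": "B02", "block_name": "Sitapur", "anaemia_pct": 38.0},
    {"block_code": "B03", "block_name": "Kalyanpur", "anaemia_pct": 50.0},
]
EDUCATION = [
    {"block_code": "B01", "block_name": "Rampur Block", "dropout_pct": 14.0},
    {"block_code": "B02", "block_name": "Seetapur", "dropout_pct": 6.0},
    {"block_code": "B03", "block_name": "Kalyanpur", "dropout_pct": 9.0},
]
# Hypothetical single source of names: block code -> (canonical name, district code).
MASTER = {"B01": ("Rampur", "D1"), "B02": ("Sitapur", "D1"), "B03": ("Kalyanpur", "D2")}


def count_name_matches(left, right):
    """How many rows would survive a join on free-text block names."""
    right_names = {row["block_name"] for row in right}
    return sum(row["block_name"] in right_names for row in left)


def join_on_code(left, right, master):
    """Inner-join two department extracts on the shared block code."""
    right_by_code = {row["block_code"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_code.get(row["block_code"])
        if match is None:
            continue
        name, district = master[row["block_code"]]
        joined.append({
            "block_code": row["block_code"],
            "block_name": name,  # canonical name from the master list
            "district": district,
            "anaemia_pct": row["anaemia_pct"],
            "dropout_pct": match["dropout_pct"],
        })
    return joined


def district_spread(joined, field):
    """Per district: (average, lowest block, highest block) for one indicator."""
    by_district = defaultdict(list)
    for row in joined:
        by_district[row["district"]].append(row[field])
    return {d: (mean(v), min(v), max(v)) for d, v in by_district.items()}


if __name__ == "__main__":
    print("matched by name:", count_name_matches(HEALTH, EDUCATION))
    joined = join_on_code(HEALTH, EDUCATION, MASTER)
    print("matched by code:", len(joined))
    print(district_spread(joined, "anaemia_pct"))
```

In this toy data only one of three blocks matches by name while all three match by code, and both districts average 50% even though the blocks in D1 range from 38% to 62%.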

5. Build a National Data Lake: India needs to build a central storage repository that holds big data from multiple sources in raw format, uploaded by CDOs (Chief Data Officers) and usable by officials, citizens and researchers. We also need to classify data by time series and administrative units and structure all other variables. The OGD (Open Government Data) platform needs to be revamped and integrated with the data management and sharing platform.
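
One way to picture classification by time series and administrative units is a long-format record in which every value is tagged with an indicator, an administrative level, a unit code and a period. The sketch below is a hypothetical layout in Python, not a standard or official schema; the field names, indicator name and codes are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Administrative levels named in this article; a real catalogue may differ.
ADMIN_LEVELS = ("state", "district", "sub_district", "block", "village", "local_body")


@dataclass(frozen=True)
class Observation:
    """One value in a hypothetical data-lake layout: indicator x place x period."""
    indicator: str          # e.g. "crop_sown_area_ha" (hypothetical name)
    admin_level: str        # one of ADMIN_LEVELS
    unit_code: str          # code of the administrative unit (made up here)
    period: str             # sortable period label, e.g. "2023-07" for monthly data
    value: Optional[float]  # None when the source reported nothing
    source: str             # uploading department

    def __post_init__(self):
        if self.admin_level not in ADMIN_LEVELS:
            raise ValueError(f"unknown admin level: {self.admin_level}")


def series(observations, indicator, unit_code):
    """Time series of (period, value) for one indicator and one unit, oldest first."""
    points = [(o.period, o.value) for o in observations
              if o.indicator == indicator and o.unit_code == unit_code]
    return sorted(points, key=lambda point: point[0])


if __name__ == "__main__":
    lake = [
        Observation("crop_sown_area_ha", "block", "B01", "2023-08", 950.0, "Agriculture"),
        Observation("crop_sown_area_ha", "block", "B01", "2023-07", 800.0, "Agriculture"),
        Observation("crop_sown_area_ha", "block", "B02", "2023-07", None, "Agriculture"),
    ]
    print(series(lake, "crop_sown_area_ha", "B01"))
```

Because every record carries the same keys, datasets uploaded by different departments can be stacked into one store and filtered the same way, whatever their original format.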

6. Combine Public + Private Sector Data: We are only seeing part of the jigsaw puzzle, with each dataset or sector being a small piece. We are never going to understand the true extent of any situation unless we combine government data with private sector data. Private sector data on hospitals, schools, companies, etc. needs to be integrated into our data lake through open digital ecosystems.

7. Build Data and Digital Capacity amongst Public Servants: Efforts should be made to build data leaders and stewards within departments who can provide data oversight as well as policy and technical frameworks. Moreover, we need to hire analysts and data scientists who can make sense of complex data and provide insights and forecasts to policy-makers, along with data warehousing specialists who also understand statistics and have some domain knowledge.

8. Consolidate Portals: Every ministry and state should have a single-window portal for accessing all public data. All six data types should converge at these portals in a single source so that anyone can navigate them easily. There is no point in building endless portals to host and disseminate data.

9. Build One Stop Public Information Portals:

While it is heartening to see citizens rising to the occasion and working to close the information gap, this also increases the information load on citizens and creates an asymmetry between citizens who have access to social capital and those who don't. Add to this the problem of verifying leads, scammers, etc. The nature of the problem is such that all non-governmental entities combined (with absolutely good intent) may end up complicating it further. Is there a better way to address this information crisis?

The single source of truth should always be a public good, provided by the state itself through centralized single-window aggregators, so that every citizen knows: if I have to find out about this, I will find it here; if I have to contact this person, I will find the details here.

It seems this problem is much deeper than building dashboards, chatbots and portals. We must advocate for public information portals! Rajasthan and Karnataka have built very good public information portals which provide details on schemes, services, representatives, etc.

10. National Geospatial Data Portal connected with OGD:

Different civil society organizations, academia and private sector organizations have built portals in silos, each hosting different spatial datasets. However, efforts need to be made to build a National Geospatial Data Portal with key stakeholders such as NRSC (National Remote Sensing Centre), SOI (Survey of India), etc., where we can layer different types of spatial datasets and use them across ministries for policy planning, monitoring and evaluation.

India has a really strong statistical system. If we are able to implement these measures, we can unlock the full power of public data, which will lead to transparency, accountability, and better planning, monitoring and delivery of public services.

This article was originally published on LinkedIn Pulse and written by Mitul Jhaveri.