How reliable is UID?
R. RAMACHANDRAN
The advocates of the project believe that the UID (Aadhaar) will eliminate the multiple bureaucratic layers that the people of the country, particularly the rural poor, are confronted with, the multiplicity of documents they have to present in order to access their legitimate entitlements, and the channels of corruption that these have bred over the years. But it has been clearly stated that “Aadhaar will only guarantee identity, not rights, benefits or entitlements”. It is envisaged only as a “robust” mechanism to eliminate duplicate and fake identities by uniquely verifying and authenticating genuine beneficiaries and legitimate claimants.
After authentication by a centralised database of biometric and demographic information to which service providers will be linked, this unique identification number alone will enable every individual to access services and entitlements anywhere in the country and at any time. The centralised database, Central ID Repository (CIDR), will be maintained and regulated by the UID Authority of India (UIDAI), which has been set up with the technocrat Nandan Nilekani, former co-chairman of the IT enterprise Infosys, as its chairman.
So will the system do what it claims it will? Socio-political issues, and those of ethics and breach of privacy, have been raised in this regard in different quarters. But purely at a technical level, the question is whether the technology deployed for identification will return answers that are unambiguous. Can the system be so definitive that the authentication and verification made by matching the presented data against the stored data for a given individual in the CIDR will be unique and refer only to that individual? Are there no errors in such biometric systems?
What is biometrics? Biometrics, as defined by the report of the Whither Biometrics Committee (2010) of the National Research Council (NRC) of the United States, “is the automated recognition of individuals based on their behavioural and biological characteristics. It is a tool for establishing confidence that one is dealing with individuals who are already known (or not known) and consequently that they belong to a group with certain rights (or to a group to be denied certain privileges). It relies on the presumption that individuals are physically and behaviourally distinct in a number of ways.” The UID biometric system is a “multi-modal” one and uses data on the ten (single) fingerprints, palm print or slap fingerprint (which combines the features of fingerprints and hand geometry), iris characteristics and facial images of every person.
The NRC study concludes thus: “Human recognition systems are inherently probabilistic and hence inherently fallible. The chance of error can be made small but not eliminated…. The scientific basis of biometrics – from understanding the distribution of biometric traits within given populations to how humans interact with biometric systems – needs strengthening particularly as biometric technologies and systems are deployed in systems of national importance.” A biometric identification system basically involves the matching of measured biometric data against previously collected data, the reference database, for a given individual. Since the sources of uncertainty in a biometric system are many, this can only be approximate. So biometric systems can only provide probabilistic results.
Sources of uncertainty
The sources of uncertainty include variations in biological attributes both within and between persons, sensor characteristics, feature extraction and matching algorithms. Traits captured by biometric systems may change with age, environment, disease, stress, occupational factors, socio-cultural aspects of the situation in which data submission takes place, changes in the human interface with the system and, significantly, even intentional alterations. This would be particularly so for the poor engaged in labour-intensive occupations such as farming, where hands are put to rough use, causing weathering of finger and hand prints. Recently, it has also been shown that the three “accepted truths” about iris biometrics – involving pupil dilation, contact lenses and template aging – are not valid. Kevin Bowyer and others from the University of Notre Dame, U.S., have demonstrated that iris biometric performance can be degraded by varying pupil dilation, by wearing non-cosmetic prescription contact lenses, by time lapse between enrolment and verification and by cross-sensor operation, and that all these factors significantly alter the matching done to identify an individual uniquely.
According to the NRC report, there are many gaps in our understanding of the nature, distinctiveness and stability of biometric characteristics across individuals and groups. “No biometric characteristic,” it says, “is known to be entirely stable and distinctive across all groups. Biometric traits have fundamental statistical properties, distinctiveness, and differing degrees of stability under natural physiological conditions and environmental challenges, many aspects of which are not well understood, especially at large scales.” (Emphasis added, given its particular relevance to the UID, which has to deal with 1.21 billion registrations in the database.)
Calibration changes and aging of sensors, and the sensitivity of sensor performance to variations in the ambient environment (such as light levels), can affect the measurements. Biometric characteristics cannot be directly compared; instead, their stable and distinctive features are extracted from sensor outputs. Differences in feature extraction algorithms – chiefly pattern recognition algorithms – can affect performance, particularly when they are designed to achieve interoperability among different proprietary systems. In the case of the UID, however, customised enrolment and extraction software is supposed to have been used in all systems deployed by enrolment (registration) agencies across the country. The same will have to be done for systems at the service-provider level, where a beneficiary's data will be captured for authentication. A similar issue arises with matching algorithms. However, since matching is generally expected to be done at the centralised database at the CIDR, only the algorithm's performance, or its sensitivity in handling variations in the biometric data presented, will matter; but this needs to be known and quantified.
Biometric match
A fundamental characteristic of a biometric system is that a biometric match represents “not certain recognition but probability of a correct recognition, while a non-match represents a probability rather than a definitive conclusion that an individual is not known to the system”. Thus, even the best designed biometric systems will be incorrect or indeterminate in a fraction of cases, and both false matches and false non-matches will occur. Recognition errors of biometric systems are stated in terms of false match rate (FMR) – the probability that the matcher recognises an individual as a different enrolled subject – and the false non-match rate (FNMR) – the probability that the matcher does not recognise a previously enrolled subject. (Correspondingly, 1–FNMR means the probability that a trait is correctly recognised and 1–FMR that an incorrect trait is not recognised.)
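This trade-off between the two error rates can be pictured with a minimal sketch. It is not UIDAI's matcher: the comparison scores and the 0.80 threshold below are hypothetical numbers chosen only to illustrate how a threshold decision produces both an FMR and an FNMR.

```python
# Illustrative sketch only: how a threshold on a comparison score yields
# FMR and FNMR. All scores and the 0.80 threshold are hypothetical.

def is_match(score, threshold=0.80):
    """Declare a match when the comparison score clears the threshold."""
    return score >= threshold

# Hypothetical scores: genuine pairs (same person twice) and impostor pairs.
genuine_scores = [0.95, 0.91, 0.78, 0.88, 0.93]    # one genuine pair falls below 0.80
impostor_scores = [0.12, 0.35, 0.83, 0.22, 0.40]   # one impostor pair clears 0.80

fnmr = sum(not is_match(s) for s in genuine_scores) / len(genuine_scores)
fmr = sum(is_match(s) for s in impostor_scores) / len(impostor_scores)
print(f"FNMR = {fnmr:.0%}, FMR = {fmr:.0%}")       # FNMR = 20%, FMR = 20%

# Raising the threshold lowers the FMR but raises the FNMR, and vice versa;
# neither error rate can be driven to zero.
```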
“Assessing the validity of the match results, even given this inherent uncertainty,” the NRC report points out, “requires knowledge of the population of users who are presenting to the system — specifically, what proportions of those users should and should not match. Even very small probabilities of misrecognitions — the failure to recognise an enrolled individual or the recognition of one individual as another — can become operationally significant when an application is scaled to handle millions of recognition attempts. Thus, well-articulated processes for verification, mitigation of undesired outcomes, and remediation (for misrecognitions) are needed, and presumptions and burdens of proof should be designed conservatively, with due attention to the system's inevitable uncertainties.”
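The NRC's point about scale can be seen with a back-of-envelope calculation. The daily volume and the error rate below are assumptions made purely for illustration; they are not UIDAI figures.

```python
# Back-of-envelope illustration of how tiny error rates become operationally
# significant at scale. Both numbers are assumed for illustration only.

daily_attempts = 1_000_000          # assumed recognition attempts per day
misrecognition_rate = 0.001         # assumed 0.1% chance of a misrecognition

print(daily_attempts * misrecognition_rate)   # 1,000 misrecognitions every single day
```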
India's current population is 1.21 billion and the UID scheme aims to cover all residents. No country has attempted an identification and verification system on this scale. Though enrolment for the proposed system is stated to be voluntary, it will be on an unprecedented scale because a potential beneficiary can be denied access to a particular scheme or service if the individual does not enrol and obtain the Aadhaar number. Indeed, many countries that had launched biometric identification systems have scrapped the idea because there are many unanswered questions about the reliability of a biometric system for the purposes for which they had considered it. It should be remembered that the objective of the Indian system is developmental, rather than the security and related concerns that have preoccupied countries of the West, and that it is aimed at delivering specific benefits and services to the underprivileged and the poor of the country. The envisaged system is also correspondingly different from those proposed elsewhere. To see if the system envisaged by the UIDAI meets these criteria and can deliver unique identification for all, it is important to understand the way the system is supposed to work.
The process
The process of enrolment that is currently under way – already about 70 million have enrolled – involves presenting oneself to one of the agencies, termed registrars, identified by the UIDAI for enrolment purposes across the country. The registrar records the individual's properly verified basic demographic information – name, address, gender, date of birth and relationship – and captures biometric information – palm print (slap fingerprint), ten single fingerprints, iris images and a facial image. This is encrypted and transmitted to the UIDAI electronically or, for locations that lack any data connectivity, physically on pen drives. In principle, unknown errors or data corruption could occur at the transmission stage.
Even assuming that the transmission is perfect, data presented during enrolment need to be compared and checked to avoid duplication – “de-duplication” – and thus prevent any fraud. Otherwise one individual may end up with two Aadhaar numbers. So any new set of biometric data – fingerprints and iris prints – needs to be compared with those of already enrolled individuals and shown to be different from every other set. This comparison was trivial when the first person, Ranjana Sonawne of Tembhli village in Maharashtra, enrolled because there was no one before her to be compared against. But it is clear that when the nth person goes to enrol, the data will have to be compared against the n–1 sets of data already enrolled. So registrars will send the applicant's data to the CIDR for de-duplication. The CIDR will perform a search on key demographic fields and on the biometrics for each new enrolment so as to minimise duplication in the database.
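The one-to-many search involved can be pictured with a toy sketch. The compare() function, the set-based templates and the 0.80 threshold are hypothetical stand-ins; the actual CIDR matchers are proprietary and far more sophisticated than this.

```python
# Toy sketch of one-to-many de-duplication; all names and values are hypothetical.

def compare(template_a, template_b):
    """Toy similarity: overlap between two sets of extracted feature codes."""
    union = template_a | template_b
    return len(template_a & template_b) / len(union) if union else 0.0

def deduplicate(new_template, enrolled_gallery, threshold=0.80):
    """Return already-enrolled records that appear to match the new applicant."""
    return [record for record in enrolled_gallery
            if compare(new_template, record["template"]) >= threshold]

gallery = [{"aadhaar": "XXXX-0001", "template": {"f1", "f2", "f3", "f4"}},
           {"aadhaar": "XXXX-0002", "template": {"f5", "f6", "f7", "f8"}}]
hits = deduplicate({"f1", "f2", "f3", "f9"}, gallery)
print(hits)   # no record clears 0.80, so enrolment proceeds; any hit would
              # need manual resolution, since it may itself be a false match
```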
Can one totally eliminate duplication? As noted earlier, this will depend on the FNMR, which, in a probabilistic system, will be a finite number, however small. So there will be a small but finite probability of duplication occurring. It is easy to see that this matching exercise will involve n(n–1)/2 comparisons in all, which, as n becomes large, is obviously a highly computationally intensive exercise requiring large computing power. The number of comparisons will be several orders of magnitude more than the number enrolled. So for a population of 1.21 billion, by the time the last resident enrols, the CIDR server will have had to perform, in all, about 700 million billion (7×10¹⁷) comparisons. This may seem mind-boggling, but a modern-day high-performance computer can do this pretty fast. And since such a de-duplication exercise will be done offline before the Aadhaar numbers are issued, the time involved in doing the comparisons is not the issue. The key issue is the magnitude of probabilistic error in these comparisons. In the case of a false match, for example, the system will reject a genuine applicant. A computer cannot resolve FMR and FNMR cases; this has to be done physically, by tracking down individuals and carrying out the re-enrolment-cum-matching exercise.
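The figure quoted above follows directly from the pairwise-comparison count; a two-line calculation reproduces it.

```python
# Reproducing the comparison count quoted above: pairwise de-duplication
# over n enrolments requires n(n-1)/2 comparisons in all.

n = 1_210_000_000                    # 1.21 billion residents
print(f"{n * (n - 1) // 2:.1e}")     # about 7.3e+17, i.e. roughly 700 million billion
```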
(Spoofing a single fingerprint has been demonstrated to be possible and such an impostor fingerprint can be used to fool a biometric reader. But this seems nearly impossible to do for all the 10 fingerprints and the palm print without being caught. And, combined with multimodal comparison, chances of such impersonation become extremely low.)
Error rate
The crucial issue, therefore, is the error rate: how many false positive and false negative identifications can potentially arise? A Proof of Concept (POC) exercise was carried out by the Authority with 40,000 subjects, divided into two sets of 20,000, in rural Andhra Pradesh, Karnataka and Bihar. This was done to analyse data from rural groups, where the quality of fingerprints is likely to be uneven.
For the POC analyses, only the ten-fingerprint data and the two-iris data were used; the face biometric was not used. According to the report, the study – which clearly was a multimodal one – observed an FNMR – that is, the rate at which an already enrolled person is identified as a different individual and re-enrolled, resulting in duplication – of 0.0025 per cent.
Similarly, the study observed an FMR – where a new applicant is rejected because of a false match – of 0.01 per cent using irises alone and 0.25 per cent with fingerprints alone. But the concluding claim of the report that “by doing analysis as shown in the examples above on real data captured under typical Indian conditions in rural India, we can be confident that biometric matching can be used on a wider scale to realise the goal of creating unique identities” is clearly misleading, because the absolute number of such misrecognitions in the real situation, involving much larger numbers (say hundreds of millions), will be very large. The corresponding exercise of resolving these cases would be huge. If they are not resolved, large numbers of people would be denied the benefits due to them, or large numbers of impostors would get benefits that are not legitimately theirs, because of errors inherent in the technology.
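To see why, one can apply the quoted POC rates to a large enrolment base. The 600-million figure below is an assumed illustration, not an official projection, and the quoted rates are treated as per-applicant rates purely to indicate orders of magnitude.

```python
# Applying the POC rates quoted above to an assumed enrolment base of
# 600 million, purely to indicate orders of magnitude.

enrolled = 600_000_000
fnmr = 0.0025 / 100        # 0.0025% false non-matches -> duplicate identities slip through
fmr_iris = 0.01 / 100      # 0.01% false matches (irises alone) -> genuine applicants rejected

print(int(enrolled * fnmr))       # ~15,000 potential duplicates to track down and resolve
print(int(enrolled * fmr_iris))   # ~60,000 genuine applicants wrongly flagged as duplicates
```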
Also, as the NRC report emphasises, “Although laboratory evaluations of biometric systems are highly useful for development and comparison, their results often do not reliably predict field performance. Operational testing and blind challenges of operational systems tend to give more accurate and usable results than developmental performance evaluations and operational testing in circumscribed and controlled environments.”

As against these one-to-many comparisons at the stage of identification of an individual during the enrolment process, the authentication or verification that takes place when a claimant presents his/her UID number is a one-to-one match. The process of Aadhaar authentication, as outlined by the UIDAI, is as follows:
The Aadhaar number, along with other attributes (including biometrics), is submitted to the UIDAI's CIDR for verification. The CIDR verifies whether the data (demographic and/or biometric) submitted match the data available in the CIDR and responds with a “yes/no” answer. No personal identity information is returned as part of the response. This process can be done online by the service provider linked to the UIDAI. But the authentication is based entirely on the Aadhaar number submitted, so that the operation is reduced to a 1:1 match (emphasis added).
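The yes/no exchange can be sketched schematically. The field names, number format and matching logic below are hypothetical illustrations, not the actual UIDAI authentication API.

```python
# Schematic sketch of the "yes/no" authentication described above; the field
# names and matching logic are hypothetical, not the UIDAI API.

def authenticate(request, cidr_records):
    """Return only True or False; no personal information leaves the CIDR."""
    stored = cidr_records.get(request["aadhaar_number"])
    if stored is None:
        return False
    # 1:1 comparison of the submitted attributes against the stored record.
    return all(stored.get(field) == value
               for field, value in request["attributes"].items())

cidr_records = {"9999-1234-5678": {"name": "R. Kumar", "year_of_birth": "1975"}}
request = {"aadhaar_number": "9999-1234-5678",
           "attributes": {"year_of_birth": "1975"}}
print(authenticate(request, cidr_records))   # True, but only a "yes/no" is returned
```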
This means that the Authority has only to match the presented data with the copy of the individual's biometrics captured earlier and stored in the CIDR against that UID number. The CIDR will, in turn, say ‘yes' or ‘no' to a particular query on, say, the demographic information of the individual, which can then be verified by the service provider against documents such as Proof of Address (PoA) or Proof of Identity (PoI). This is quite different from the verification required in biometric systems for security purposes, say entry through airports, where every verification may be a one-to-many matching exercise. But authentication, despite being a 1:1 match, could have its own error rates, largely arising from inevitable human errors in large-scale implementation – for example, a wrong Aadhaar number being transmitted or a query being wrongly keyed in. And since the system is designed to answer only “yes/no”, the service provider, say the NREGA, may not be in a position to know that the error originated at the agency's own end. While, in principle, the UID number holder should be able to cross-check what is being transmitted, in the rural Indian context, given the level of illiteracy, this may not always happen.
More pertinently, the verification process could itself become the channel for new forms of corruption. Suppose the service provider deliberately transmits a wrong Aadhaar number during the authentication process and, in return, obviously gets a ‘no' in answer to any query pertaining to the claimant about the service or benefits he/she is entitled to. This could become the basis of corruption: the service provider could say that the service or benefit – to which the claimant is legitimately entitled – can be provided on payment of ‘x' amount of money.
Corruption, as a socio-cultural trait, will always find new channels, especially when such a project is sought to be implemented on a countrywide scale involving hundreds of millions of transactions. It is not clear how this manual error – deliberate or otherwise – at the man-machine interface in the UID system can be avoided on a real-time basis during the interaction between a potential beneficiary and the service provider. In addition to the probabilistic errors in the biometric identification scheme, such issues could also become a cause of real concern.