Data mining pilot – evaluation report

July 2013

Translations and other formats
For information on obtaining this publication in another language or in a
large-print or Braille version, please contact the Electoral Commission:
Tel: 020 7271 0500
Email: publications@electoralcommission.org.uk

© The Electoral Commission 2013

Contents

Executive summary
1  Introduction
2  Set up – target groups and databases
3  Pilot processes – data transfer, matching, follow up
4  Costs
5  National data mining – judging success
6  County data mining – judging success
7  Recommendations

Appendices
Appendix A  Department for Education
Appendix B  Welsh Government, Department for Education and Skills
Appendix C  Department for Work and Pensions
Appendix D  Royal Mail
Appendix E  Student Loans Company
Appendix F  County databases
Appendix G  National data mining, follow up work
Appendix H  Pilot area profiles



Executive summary
Background
• As part of the shift to individual electoral registration (IER) the UK
government has been exploring the extent to which access to information
held on national public databases can assist Electoral Registration
Officers (EROs) in maintaining their electoral registers. This pilot, run by
Cabinet Office, involved comparing registers with five national public
databases as well as data held by four county councils in order to identify
unregistered electors. This process is known as ‘data mining’.
• The Electoral Commission has a statutory responsibility to report on the
effectiveness of the pilot.
• Data mining was the subject of a previous pilot in 2011. We published our
evaluation of that scheme in March 2012.[1]

Set up – target groups and databases
• The pilot involved targeting three groups of electors known to have low
registration rates – attainers (17 year olds and some 16 year olds), home
movers and students.[2]

• Eighteen EROs were involved in the national pilot. They were provided
with data held by the Department for Education (DfE), Welsh Department
for Education and Skills (DfES), Department for Work and Pensions
(DWP), Royal Mail and Student Loans Company (SLC) for the period of
the pilot.
• Each of the EROs accessed data from at least two of the national
databases.
• In addition, four district council EROs were given access to data held by
their respective county council, with a specific focus on trying to identify
unregistered attainers.[3]



[1] The Electoral Commission, Data matching schemes to improve accuracy and
completeness of the electoral registers – evaluation report (March 2012):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0010/146836/Data-matching-pilot-evaluation.pdf

[2] An attainer is an individual who will turn 18 before the end of a 12-month
period starting from the next 1 December after the application is made, i.e.
if an application is made in spring 2013, the applicant will be eligible as an
attainer if they turn 18 any time before December 2014.

[3] Overall, 20 EROs participated in the pilot, with two participating in both
the national and county data mining.

Pilot processes – data transfer, matching, follow up
• There were considerable delays to the original timetable for establishing
this pilot. A significant cause of the delays was the lack of capacity and
resources within Cabinet Office (and the Government Digital Service
(GDS), which is part of Cabinet Office) due to their workload related to the
transition to IER.
• The delays affected the amount of work the EROs could do with the data.
• Three different organisations were involved in matching the national
databases against the electoral registers in order to produce a list of
potential unregistered electors. Each organisation used similar but not
identical matching criteria. We have not been able to quantify any
variation in the matching caused by differences in the matching criteria.
• For the national data mining, Cabinet Office’s original intention was that
pilot areas should adopt a fairly standardised approach to checking the
data received and contacting the individuals identified, to ensure that
results were comparable. In practice, however, the nature and extent of
follow up work varied widely.
• Much of this variation was caused by practical difficulties, for example the
need to spend more time than expected in ensuring the accuracy of the
data received. However, some of the variation could have been avoided if
there had been fewer delays and a greater level of support provided by
Cabinet Office to pilot areas. In particular, a few areas told us they felt
unsupported and were unclear about what to do.
• Although the wide variation in follow up activity has limited our evaluation
in certain ways, we do have more complete and comparable data than in
the 2011 pilot and so are able to reach more definite conclusions.
Costs
• It is not possible to produce an overall figure for the cost of this pilot. This
is because we do not have final costs for all pilot areas or any costs for
Cabinet Office (including GDS), who conducted much of the work.
• We are also therefore unable to estimate the cost per new elector
registered or the likely cost of any national rollout. Any estimates of these
would need to include the cost of coordinating and managing the pilot (the
role taken by Cabinet Office in this pilot), as any future work with data
mining would require some form of central coordination.
National data mining – judging success
• This evaluation sought to answer two key questions:
- How effective is the data at identifying unregistered electors?
- Is data mining a cost effective way of registering new electors?
How effective is the data at identifying unregistered electors?
• The evidence from this pilot suggests that data mining, as it was tested, is
not a practical way of identifying unregistered electors.
• This is because, although the data returned to EROs did contain details of
unregistered electors, it also contained significant numbers of existing
electors, ineligible individuals and out of date information (where the
individual was no longer resident at the given address).
• We have not made, and could not make, a consistent assessment of what
proportion of records returned to EROs were in fact potential new
electors. However, we have reported on what pilot areas found when they
either checked the data against other data sources (for example their
electoral registers) or when they received responses to their follow up
work, as well as using their feedback on the processes and volume of
work involved.
• The feedback from pilot areas was clear: the amount of time and
resources they spent on reducing the data provided to a list of likely
new electors with usable address information was unsustainable
and could not be incorporated into their ‘business as usual’
processes.
• The reasons that so many existing electors and ineligible individuals were
returned on the data include poor data specifications from Cabinet Office,
currency restrictions not being tight enough and incomplete or poor quality
addresses on some of the national databases.
• Inconsistent address formatting and incomplete addresses are likely to
have contributed to the significant numbers of existing electors returned in
the data (Cabinet Office could not provide the data which would have
allowed for a definitive assessment). These problems also made it more
difficult for EROs to use the information to write to the individuals
identified.

• Some pilot areas were uncomfortable with the risk of writing to existing
electors, individuals below the age of registering to vote or deceased
people. Many areas spent a long time checking and cleansing the data
before sending out registration forms.
• Some of these issues are likely to be relatively straightforward to resolve,
for example creating better data specifications or improving data currency.
• However, some of the issues are more complex, particularly the issue of
addresses being held in different formats between the national databases
and electoral registers and, on some of the databases, being incomplete.
• The variable quality and formatting of addresses reflects the differing roles
of addresses on the different databases. In many cases the address
information is less important than it is on the electoral registers.
• Issues with addressing are likely to be resolved in only one of two ways:
comprehensive addition of UPRNs to the national databases used for mining, or
extensive address standardisation at the initial stage of the mining
process.[4] Both of these solutions have resource implications.
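To give a feel for what even basic address standardisation involves, here is a
minimal sketch of our own (it is not part of the pilot's actual processing;
the abbreviation table and the idea of comparing canonical strings are
assumptions for illustration only):

```python
import re

# Hand-built abbreviation table for illustration; a real solution would
# rely on a full addressing product, not a short lookup list.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "ln": "lane"}

def normalise(address: str) -> str:
    """Crude canonical form: lower-case, strip punctuation,
    expand common abbreviations, collapse whitespace."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

# '12, High St.' and '12 High Street' now compare equal, so a register
# entry and a database record at the same property can be matched.
assert normalise("12, High St.") == normalise("12 High Street")
```

Even this toy version hints at the difficulty: "st" can mean "street" or
"saint", and an incomplete address cannot be repaired by reformatting alone,
which is why comprehensive UPRN coverage is the more robust of the two options.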
Is data mining a cost effective way of registering new electors?
• The aim of this pilot was to see if data mining was an effective way of
improving the completeness of the electoral registers, by identifying
potential new electors who could subsequently be registered.
• In order to answer this question, we would need to assess the cost benefit
of data mining by, for example, calculating the cost per new elector
registered. However, we are unable to do this as Cabinet Office could not
provide details of their expenditure on the pilot. As they managed the
process and conducted much of the matching and data processing, their
costs could be significant and are crucial in reaching any realistic
assessment of cost effectiveness.
• In addition, we would need to assess the added value of providing EROs
with access to national data compared to the data which they already
have access to locally.
• We do know the numbers of new registrations achieved by pilot areas.
These vary widely, from 2% to 31% of the sample the pilot area worked
with, with an average of around 9%. In many cases, the actual numbers of
new electors were low – on average around 300 individuals.



• However, the level of new registrations does not provide a clear
assessment of the potential of data mining. The registrations achieved
in each area were partly the result of the approach taken to data checking
and follow up work and this varied across the pilot areas. Also, in nearly
all areas, the follow up work was limited to a single letter with no reminder
or canvassing. The new registrations are therefore likely to be on the
lower end of what could be achieved.
• In addition, the delays in the pilot timetable mean that not all of the
registrations achieved could be reported here (as responses will have
continued to come in up to and following publication).
• Part of the variation is also likely to be due to demographics. Some pilot
areas had a higher response rate to their letters than others and in areas
with higher levels of population mobility, it is likely that the data becomes
out of date more quickly. From the data available, however, we cannot
make any clear assessment of the impact of demographics on response
rates.
• Taken on their own, some of the numbers of new registrations appear
reasonable (given the general lack of personal canvassing in the pilot, we
have taken a 10% registration rate as reasonable). However, overall the
numbers of new registrations are low in light of the time and
resources spent to achieve them. Feedback from the pilot areas
supports this assessment.
• In order for data mining to be of practical use to EROs, the data returned
would need to contain many fewer names of registered or ineligible
individuals and have significantly improved address information. Our
recommendations consider how that could be achieved.
• Finally, data mining would require a central organisation to take on the
role of coordinating data transfer and processing the data – as Cabinet Office
and GDS did for this pilot. Who that organisation would be (Cabinet Office,
GDS or a different body), and what level of resource it would require, is a
key question for any future use of data mining.

[4] UPRNs (unique property reference numbers) are unique 12-digit codes
assigned to each property at a local level. These local lists are then
combined to form the National Land and Property Gazetteer in England and Wales
and the One Scotland Gazetteer in Scotland.
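As a rough illustration of the UPRN approach described in footnote 4 (the
records, names and field layout below are invented for this sketch), matching
becomes a simple key lookup when both sides carry the same property reference:

```python
# Invented example records. With a shared UPRN on both sides, the
# register and a national database can be joined on (property, person)
# keys with no address-string comparison at all.
register = {
    ("100023336956", "SMITH", "JOHN"),  # (UPRN, surname, forename)
}
national_database = [
    {"uprn": "100023336956", "surname": "SMITH", "forename": "JOHN"},
    {"uprn": "100023336957", "surname": "JONES", "forename": "MARY"},
]

potential_new_electors = [
    rec for rec in national_database
    if (rec["uprn"], rec["surname"], rec["forename"]) not in register
]
# Only the JONES record survives as a potential unregistered elector.
```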
Individual databases – key findings
Department for Education (DfE)
• In terms of the usefulness of this data in identifying potential
unregistered electors:
- The currency of this data appears to be good.
- The addresses appeared to be more complete than those held
in other national databases but a poor data specification from
Cabinet Office meant that the format was inconsistent.
- In the three areas that could separate their results, new
registrations were an average of 9% of the sample of records
they worked with.

• Overall, limited conclusions can be drawn about the DfE data as several
pilot areas were unable to report their results separately for this database.
Welsh Department for Education and Skills (DfES)
• The DfES data does not include full addresses, only postcodes and as a
result no pilot areas were able to do any follow up work with the data.
• This database is not suitable for the purposes of identifying unregistered
electors because of the lack of address information.
Department for Work and Pensions (DWP)
• In terms of the usefulness of this data in identifying potential
unregistered electors:
- There were numerous address issues including crucial address
information being missing and extensive and confusing
abbreviations.
- A substantial number of the records returned related to existing
electors.
- The currency of the data was an issue. Evidence provided by
some pilot areas indicates the data should be restricted to
records which have been updated in the past three months.
- Overall the number of new registrations achieved was 8%.
• We do not believe that any of the findings in this pilot call into question the
use of DWP data for the confirmation process.[5]

[5] Confirmation is the process by which existing electors will be matched
against data held by DWP in order to retain them on the registers during the
transition to IER. The findings from this pilot are not directly applicable to
confirmation because data mining involves identifying and using non-matches
between the registers and the DWP database and as such is inherently more
likely to encounter issues with records held by DWP.

Royal Mail
• In terms of the usefulness of this data in identifying potential
unregistered electors:
- Some of the Royal Mail datasets requested by Cabinet Office
included data relating to individuals below the age for registering
to vote.
- The addresses appeared to be more complete and consistent
than those held in most of the other national databases.
- A substantial number of the records returned related to existing
electors.
- The currency of the data was an issue, with a substantial
number of the records relating to individuals who were found to
be no longer resident. This is likely to be partly a result of the
two year update restriction placed on the data, which most pilot
areas felt was too long.



- Overall the number of new registrations achieved was 8%.
• Overall, Royal Mail data does not seem to be more effective than DWP
data in identifying potential new electors, and there is a higher cost
attached to using this data.
Student Loans Company (SLC)
• In terms of the usefulness of this data in identifying potential
unregistered electors:
- There seemed to be issues with the addresses on this data
being incomplete. Only one pilot area reported usable results for
this database and they found that nearly a third of the addresses
were quite clearly incomplete. SLC informed us that the
addresses they provided to GDS were complete, so it seems
that these issues may have arisen in the matching process,
although we are unable to say for certain.
- This pilot area reported a low number of new registrations.

County data mining – judging success
• This pilot did not try to assess the usefulness of access to county council
data. Rather it was set up to provide qualitative feedback on the barriers
and issues that would be faced if EROs in lower tier authorities tried to
access the data held by an upper tier.
• There is therefore nothing in the findings to challenge the assumption that
it would be sensible to equalise unitary authority and lower-tier authority
EROs’ access to data.
• However, it took a long time and a great deal of effort to establish the data
sharing arrangements between the pilot areas and the county councils. It
is clear that, if legal access was granted to lower tier EROs, they would
still need to invest time in securing access to, and learning how to best
utilise, the county data.
• Importantly, EROs would need to assess the cost effectiveness of
accessing the data that became available to them – exactly as they
should do with the local data they currently have access to.





Recommendations
National data mining
The findings from this pilot do not justify the national roll out of data
mining. The concept of using national data to assist EROs may still have
potential, but data mining should not be implemented without further
testing of the databases and processes.
Data mining would require a central organisation to be responsible for
managing the connection between national data holding organisations
and undertaking data processing work. Cabinet Office undertook this role
for this pilot.
The need for a central coordinating body is key as some data holding
organisations, such as DfE, do not conduct the matching process themselves.
However, the requirement is wider than this and includes the management of
relationships between national data holding organisations and local EROs.
The alternative is requiring, for example, DWP to deal directly with individual
data requests from 380 EROs.
This central organisation could be the eventual system owner for the IER
Digital Service, which will manage the ‘verification process’ – the checking of
electors’ personal identifiers between EROs and DWP. However, in contrast
to verification, data mining is unlikely to be as automated a process.
There should only be further data mining testing on the understanding
and acceptance of the need for an ongoing central presence (and any
related costs) in order to receive the data from EROs and the national
organisation, match it and return the results to each local area.
Any further testing should also be considered in relation to the priority
of the overall transition to IER. Plans should therefore take into account the
capacity of all the organisations and individuals required to test data mining,
specifically to ensure that any testing would not adversely affect their existing
commitments to delivering IER.
In addition, there were numerous issues in this pilot with the communication
and support provided by Cabinet Office. It is important that Cabinet Office
considers what lessons can be drawn from this pilot, particularly in
terms of engagement with EROs, for the wider implementation of IER.
If further testing is undertaken then, in relation to specific databases:
• There would be merit in re-testing the Department for Education
database as there were fewer issues than for the other databases and
limited results were returned. However:
- it would be sensible to explore the approach to addressing on
this database ahead of any full pilot

- 16 year olds who are under attainer age should be excluded
from the data returned to EROs
• The Welsh Department for Education and Skills database is not
suitable for the purposes of data mining and should not be tested
again.
• The Department for Work and Pensions database should only be
included in further testing if:
- full integration of UPRNs is completed (i.e. all records have
UPRNs)
- the record currency can be restricted to those with address
changes within the past three months
• The Royal Mail database should only be included in further testing
if:
- the names of individuals below the age for registering to
vote are excluded from the data, which Royal Mail has
confirmed it can do
- the record currency can be restricted further and the data
shared includes the start date of the redirection
• The Student Loans Company database should only be included in
further testing if:
- the addressing issues experienced in this pilot can be
resolved
- testing takes place during October - November or January -
February rather than at the end of the academic year
In relation to any further testing in general:
• There needs to be a clear understanding of the databases being
accessed and a clear data specification provided to the data
holding organisations (based on the requirements of the pilot).
• For any new database proposed for data mining testing, its approach to
addressing should be assessed in advance of the pilot.
There would be limited value at this point in testing a database which
lacks UPRNs and has poor addressing information.
• For any database tested, the potential for returning records to
EROs with the original register address attached should be
explored (rather than the address held on the national database). This
could, where available, be achieved using UPRNs.
• Any combination of databases needs to be less complex. For
example, in this pilot data from two databases was combined into one
file, and pilot areas could receive more than one file. Many areas could
not clearly report on the results and this made evaluating the pilot more
difficult. For future testing, only one file with data from one database
should be provided to each ERO.
• There should be mandatory checking of the national data provided
against data held locally. This would allow for an assessment of the
added value to EROs of access to national data, as compared to local
data which they already have access to. For example, if DfE data is
included in the re-testing, it should be compared with locally held
education data to assess whether the unregistered individuals identified
on DfE could be identified using local data instead.
• Cabinet Office need to ensure that they maintain good
communication between themselves, the data holding
organisations and EROs throughout the process, including after data
from the national databases has been returned to EROs.

County data mining
The results from this pilot do not show how useful it would be for EROs in
lower tier authorities to have access to data held by an upper tier. However,
EROs in these authorities should be given the legal right of access to data
held by upper tier authorities, to put them in a position analogous to EROs in
unitary authorities.
EROs are responsible for deciding which local data they are prepared to use
in maintaining their register. These decisions should be based on an
assessment of the quality of the specific database to be used.

1 Introduction

1.1 This report sets out the findings of the Electoral Commission’s evaluation
of the 2013 data mining pilot. The pilot, run by Cabinet Office, involved
comparing electoral registers with five national public databases and data
held by four county councils in order to identify unregistered electors.

1.2 The aim of this pilot was to test whether providing Electoral Registration
Officers (EROs) with information from these databases could help improve the
completeness of their registers.[6] The pilot targeted groups of electors
known to have lower than average levels of registration.
The electoral registers

1.3 Electoral registers underpin elections by providing the list of those who
are eligible to vote. Those not included on the registers cannot take part in
elections. Registers are also used for other important civic purposes,
including selecting people to undertake jury service and calculating
electorates to inform Parliamentary and local government boundary reviews,
which are the basis for ensuring representative democracy.

1.4 In addition, credit reference agencies may purchase complete copies of
electoral registers, which they use to confirm addresses supplied by
applicants for bank accounts, credit cards, personal loans and mortgages.

1.5 Great Britain does not have one single electoral register. Rather, each
local authority appoints an ERO[7] who has responsibility for compiling an
accurate[8] and complete electoral register for their local area.




[6] By completeness, we mean that ‘every person who is entitled to have an
entry in an electoral register is registered’. The completeness of the
electoral registers refers to the percentage of eligible people who are
registered at their current address. The proportion of eligible people who are
not included on the register at their current address constitutes the rate of
under-registration.

[7] In Scotland, in some cases the Assessor of the Valuation Joint Board has
been appointed to act as ERO for several neighbouring local authorities.

[8] By accuracy, we mean that ‘there are no false entries on the register’.
The accuracy of the electoral registers is a measure of the percentage of
entries on the registers which relate to verified and eligible voters who are
resident at that address. Inaccurate register entries may relate to entries
which have become redundant (for example, due to people moving home), which
are for people who are ineligible and have been included unintentionally, or
which are fraudulent.

Current system of updating the electoral registers

1.6 At present, EROs use an annual household canvass and rolling registration
to update their registers. Electors can register to vote throughout the year
(including up to 11 working days before each election) by completing a rolling
registration form and submitting it to their ERO. However, most updates to the
registers take place during the annual canvass, which is undertaken each
autumn. At its simplest, the canvass involves delivering a registration form
to each household and following up, via postal reminders and personal visits,
those households who do not respond. Revised registers are published by
1 December each year.

1.7 The majority of EROs also use locally held data, such as council tax and
housing records, to improve the effectiveness of their registration activity.
The extent and sophistication of this use varies widely.
Accuracy and completeness of the electoral registers

1.8 Previous Electoral Commission research has provided estimates of the
accuracy and completeness of the electoral registers. As at April 2011, we
estimated the local government registers were 85% accurate and 82%
complete.[9] This equates to approximately 8.5 million unregistered people in
Great Britain. However, this does not mean that these registers should have
had 8.5 million more entries, because many, but not all, of those not
registered correctly may still have been represented on the registers by an
inaccurate entry (for example, at a previous address).

1.9 Data mining is intended to help improve the completeness of the electoral
registers by identifying unregistered electors. The principle behind data
mining is that data held by national or local bodies may include people who
are eligible to vote but who are not registered. By comparing these databases
with the electoral registers, it should be possible to produce a list of
potential new electors, and the ERO can then invite them to register.

1.10 Data mining may help to improve the accuracy of the registers as well,
for example by enabling EROs to identify and remove out of date entries.
However, this pilot was not designed to test the effects on accuracy and pilot
areas were not asked to identify, or report on, out of date entries.

[9] The Electoral Commission, Great Britain’s electoral registers 2011
(December 2011):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0007/145366/Great-Britains-electoral-registers-2011.pdf
The rates for the April 2011 Parliamentary registers were similar at 86%
accurate and 82% complete.

Individual electoral registration

1.11 The electoral registration system in Great Britain is changing from
household registration to individual electoral registration (IER).[10] At
present, one person in every household is responsible for registering everyone
else who lives at that address (although individuals can apply to register as
well). Under IER, the responsibility for registering to vote will rest with
each individual. In addition, in order to be registered each individual will
have to provide personal identifiers: their date of birth and National
Insurance number. These changes are intended to improve the accuracy of the
registers and to modernise the security of the registration and voting
systems. The Commission has been calling for IER since 2003.

1.12 The previous UK Government, during the passage of the Political Parties
and Elections Act 2009 (PPEA), introduced legislation providing for the phased
introduction of IER. PPEA included provisions to allow data matching pilots to
be carried out, with a view to establishing which national public databases
might be useful to EROs in helping to maintain electoral registers during and
after the transition to IER.

1.13 The Coalition Agreement reached by the Conservative and Liberal Democrat
parties set out the Government’s plans to speed up the implementation of IER.
The Electoral Registration and Administration Act 2013 (ERA) provides the
legal framework for IER to be introduced on that basis.

Transition to IER

1.14 The final household canvass will take place in spring 2014.[11] In summer
2014 all names and addresses on the electoral registers will be compared with
records held by the Department for Work and Pensions (DWP) in order to verify
the identity of the people on the registers (‘confirmation’).[12] All electors
whose details are matched will be confirmed directly onto the first IER
registers and will not have to provide personal identifiers.

1.15 All electors whose entries are not matched will be asked to re-register
by providing personal identifiers. In addition, from the start of the
transition, any new elector will need to make an individual application and
provide personal identifiers, as will any electors who change address (even if
they have been confirmed at their previous address).

[10] Northern Ireland has used a system of individual electoral registration
since 2002.

[11] There will still be annual canvasses under IER, but these will take a
different form.

[12] The timing for the introduction of IER in Scotland is different as a
result of the referendum on independence in September 2014. The confirmation
process will take place in late September with a subsequent write out to
unconfirmed electors beginning in October 2014.

1.16 Under current plans, electors on the 2014 registers who are not confirmed
will have until December 2016 to provide personal identifiers, before they are
removed.[13] However, ERA enables Ministers to lay an Order before the UK
Parliament to provide for the transition to be completed by December 2015
instead. The Government has made it clear that its intent is to complete the
transition in 2015. Therefore, while there is uncertainty as to whether the
point of removal of electors that have not provided personal identifiers will
be in 2015 or 2016, it is our view that EROs should plan on the basis that
they will have to be ready for the point of removal to be 2015.

Previous pilot schemes

2011 data matching pilot[14]

1.17 The first pilot held under PPEA was conducted in 2011. It involved
matching electoral registers from 22 pilot areas against ten national
databases in order to test whether giving EROs access to the data would help
them improve the accuracy and completeness of their registers.

1.18 Under PPEA, the Commission has a statutory responsibility to report on
the effectiveness of data matching pilots. Our report into the 2011 pilot was
critical of the methodology of the pilot. Issues included the complexity of
the data returned to EROs; the lack of a consistent definition of what
constituted a match between the registers and the databases; and the overlap
between the timing of the pilot and the annual canvass.[15]

1.19 The methodological flaws meant that it was not possible for us to
conclude whether access to national databases could assist EROs in maintaining
their electoral registers. However, we said that there was merit in re-testing
nearly all of the databases used in the pilot but in a way which allowed for
the collation of meaningful results.

[13] Any elector with an absent vote (postal or proxy voters) will need to be
confirmed or provide their personal identifiers before the revised electoral
registers are published by 1 December 2014 in order to retain their absent
vote. However, they will still be able to vote in a polling station until
either December 2015 or December 2016, depending on the end date of the
transition.

[14] The terminology used for these pilots has changed over time. Initially,
the 2011 pilot, which aimed to identify unregistered electors, was referred to
as a data matching pilot. Subsequently, data matching was adopted as an
umbrella term covering two different processes: 1. Confirmation – matching
register entries to DWP data to confirm identity and retain electors on the
registers. 2. Data mining – using data to find unregistered electors.

[15] The Electoral Commission, Data matching schemes to improve accuracy and
completeness of the electoral registers – evaluation report (March 2012):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0010/146836/Data-matching-pilot-evaluation.pdf

1.20 The recommendations from our evaluation of the 2011 pilot are referred to
throughout this report where relevant.

2012 confirmation pilot

1.21 The results from the 2011 pilot did indicate that a majority of existing
electors could be matched against the DWP database. This finding was the basis
for the principle of confirmation, which was the subject of a separate pilot
in 2012.

1.22 The confirmation pilot involved matching the electoral registers from 14
pilot areas against data held by DWP. On average, over 70% of electors could
be matched. Our evaluation concluded that confirmation was an effective and
reliable way of verifying the identities of electors and should be used during
the transition to IER. However, the results also showed that confirmation was
less effective for certain groups of electors, particularly students and those
who move home frequently.[16]

2013 pilot: aims and objectives

1.23 The role of data mining during or after the transition to IER is not yet
clear. The rationale behind the pilot schemes is that it potentially offers a
way for EROs to target registration activities and so make more effective use
of resources.

1.24 The current pilot was designed to test whether providing EROs with access
to this data would enable them to identify, and register, currently
unregistered individuals. The pilot focused on three groups of electors known
to have lower than average rates of registration: attainers, home movers and
students (see Chapter 2 for further information on why these groups were
selected).

1.25 The pilot was also intended to help assess the cost-effectiveness of data
mining.

[16] The Electoral Commission, Data matching pilot – confirmation process
(April 2013):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0009/154971/Data-matching-schemes-confirmation-process-evaluation-report.pdf

The Electoral Registration Data Schemes (No. 2) Order 2012

1.26 The Electoral Registration Data Schemes (No. 2) Order 2012 (‘the 2012
Order’), made on 19 December 2012, allowed for the sharing of specified data
between the named pilot areas and data holding organisations. The data holding
organisations named in the Order are:

• the Department for Work and Pensions
• the Department for Education
• the Welsh Department for Education and Skills
• the Student Loans Company Ltd
• the Higher Education Funding Company for England[17]

• Royal Mail Ltd
• Cumbria, Hampshire, Nottinghamshire and Lancashire county
councils
[17] Although the Higher Education Funding Company for England was named in
the Order, they did not participate in the pilot.

1.27 The Order also specified which pilot areas each data holding organisation
could share data with.

1.28 Under the 2012 Order, an agreement between the relevant data holding
organisation and pilot area needed to be in place before personal data could
be shared between the two parties. The purpose of the agreement was to set
out: governance arrangements for data transfer and matching; the expected
inputs and outputs; information security standards; and timescales.
This evaluation

1.29 The Commission has a statutory responsibility to report on the
effectiveness of the data matching schemes. Sections 35 and 36 of PPEA outline
the aspects that should be covered in our evaluation, including whether the
scheme could assist EROs in meeting their registration objectives, the
administration and cost of the scheme and any public objections. Based on
this, we have developed specific criteria with which to evaluate the data
mining pilot.

1.30 Our evaluation aims to assess:
• whether the national and/or county council databases could provide EROs with
current information on eligible and unregistered electors within each of the
target groups
• the number of new registrations achieved from the individuals identified by
these databases
• whether there were any public objections to the scheme
• the cost of using data mining to identify eligible but unregistered electors
Sources for our evaluation

1.31 Our evaluation draws on:
• interviews with all of the pilot areas, all of the national data holding
organisations and one county council
• results submitted by the pilot areas
• information provided by the pilot team in the Cabinet Office

1.32 The Commission would like to thank everyone who has participated in the
pilot and helped with this evaluation, particularly the staff at the local
authorities and Valuation Joint Boards.

This report

1.33 The rest of this report sets out further details of how the pilot worked
and presents the results, analysis and associated recommendations. The
chapters are arranged as follows:
• Chapter 2 looks at the set up of the pilot including the databases accessed
• Chapter 3 sets out the processes used in the pilot including data transfer,
matching and follow up
• Chapter 4 provides information on costs associated with the pilot
• Chapter 5 sets out our analysis of the national data mining activities
• Chapter 6 sets out our analysis of the county council data mining activities
• Chapter 7 sets out our recommendations
• Appendices A to F set out our detailed findings in relation to the databases
tested in this pilot

2 Set up – target groups and databases

2.1 This chapter explains the set up and matching process for the national and
county data mining.

Key points
• The pilot involved targeting three groups of electors known to have low
registration rates – attainers (17 year olds and some 16 year olds), home
movers and students.
• Eighteen EROs were involved in the national pilot. They were provided with
data held on five national public databases. Each of the EROs accessed data
from at least two of these databases.
• In addition, four district council EROs were given access to data held by
their respective county council, with a specific focus on trying to identify
unregistered attainers.

National data mining

Target groups of electors

2.2 Three groups of electors were targeted in this pilot: attainers, home
movers and students. These are all groups which are known to have lower than
average rates of registration.

2.3 There are several likely reasons for these lower registration rates, such
as high population mobility or disaffection with traditional politics. It is
possible that data mining could help to register those people who are not on
the register because they move house frequently, but it is likely to be less
effective at registering those who feel disengaged from traditional politics.

Attainers

2.4 An elector who is not yet 18 years of age must be shown on the register
with the date on which they will attain the age of 18. Those electors are
called attainers as they are about to attain voting age. Attainers are
predominantly 17 year olds, with some 16 year olds.[18]
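The attainer rule in footnote 18 is precise enough to express in code. The
following sketch is our own illustration of that rule, not software used in
the pilot:

```python
from datetime import date

def is_attainer(dob: date, application: date) -> bool:
    """Footnote 18's rule: eligible if the applicant turns 18 before the
    end of the 12-month period starting on the next 1 December after the
    application is made. (Leap-day births would need special handling.)"""
    # The next 1 December strictly after the application date.
    next_dec1 = date(application.year, 12, 1)
    if application >= next_dec1:
        next_dec1 = date(application.year + 1, 12, 1)
    period_end = date(next_dec1.year + 1, 11, 30)  # 12 months later
    eighteenth = date(dob.year + 18, dob.month, dob.day)
    # Not yet 18 when applying, but turning 18 within the period.
    return application < eighteenth <= period_end

# Footnote 18's example: applying in spring 2013, eligible if the
# applicant turns 18 any time before December 2014.
assert is_attainer(date(1996, 10, 5), date(2013, 4, 15))     # 18 in Oct 2014
assert not is_attainer(date(1997, 6, 1), date(2013, 4, 15))  # 18 in Jun 2015
```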

2.5 Our research has found that registration levels varied by age, with 55% of
17-18 year olds registered at their current address, compared to 86% for
electors aged 35-54 and above 90% for older age groups.[19]

Home movers

2.6 Our research found that the registration rate for electors who had been at
their current address for up to one year was 26%. This is substantially lower
than for electors who had been at their address for between one and two years,
where the registration rate was 76%.

Students

2.7 It is difficult to produce robust estimates of student registration rates,
since many students are entitled to be registered twice (at both their home
and term-time addresses). The Commission’s 2005 research into electoral
registration, using 2001 census data, estimated the student registration rate
to be around 78%.[20]

2.8 However, registration rates were lower at term-time addresses and
substantially lower for students who had moved in the previous six months
(around 55%).

The databases

2.9 The databases used in this pilot were selected in response to the target
groups identified, e.g. education data was considered a useful source of data
on attainers. However, the final list of databases involved in the pilot was
also determined by whether data holding organisations wanted to participate.
For example, the Driver and Vehicle Licensing Agency was involved in the 2011
pilot but declined to participate this time around. In addition, there was an
initial ambition to include some NHS data but no agreement was reached on
access.

[18] An attainer is an individual who will turn 18 before the end of a
12-month period starting from the next 1 December after the application is
made, i.e. if an application is made in spring 2013, the applicant will be
eligible as an attainer if they turn 18 any time before December 2014.

[19] The Electoral Commission, Great Britain’s electoral registers 2011
(December 2011):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0007/145366/Great-Britains-electoral-registers-2011.pdf

[20] The Electoral Commission, Understanding Electoral Registration
(September 2005):
http://www.electoralcommission.org.uk/__data/assets/pdf_file/0020/47252/Undreg-FINAL_18366-13545__E__N__S__W__.pdf

2.10 Five national public databases were used in this pilot and these are
shown in the table below against the target groups they were intended to
identify.[21]

Table 1: National databases and target groups

Target group    Databases
Attainers       Department for Work and Pensions (DWP)
                Department for Education (DfE)
                Welsh Government, Department for Education and Skills (DfES)[22]
Students        Student Loans Company (SLC)
                DfE / DfES
Home movers     DWP
                Royal Mail

2.11 Table 2 below summarises the coverage and sources of the national
databases used in the pilot. It also indicates what, if any, currency
restrictions were placed on the data used in this pilot. One of the issues
identified in the 2011 pilot was that much of the information provided from
the national databases was not current enough for the purposes of electoral
registration. The currency restrictions employed in this pilot were an attempt
to reduce this problem.

[21] Data held by the Higher Education Funding Company for England (HEFCE) was
also intended to be used in this pilot. However, Cabinet Office and HEFCE
could not reach agreement on the terms and conditions for sharing data, as
required by the 2012 Order (see paragraph 1.28), within the timescales of the
pilot.

[22] Cabinet Office had planned to use two datasets held by the Welsh
Department for Education and Skills (DfES): the School Census described in
Table 2 below and also the Lifelong Learning dataset. DfES provided an extract
of this second dataset to Cabinet Office, however there were a number of
formatting issues which could not be resolved in time for it to be used in
this pilot.

Table 2: National databases used in the 2013 data mining pilot

Department for Education
- Database: National Pupil Database
- Coverage of data used in pilot: schools, academies and some non-mainstream
educational provision in England. Does not cover independent schools or
further education colleges.
- Currency restrictions: records from the October 2012 census only.
- Sources and updates: mostly submitted directly by schools, although in some
areas it comes through the local authority. The census is conducted three
times a year.

Welsh Department for Education and Skills
- Database: Pupil Level Annual School Census
- Coverage of data used in pilot: all local authority maintained schools in
Wales (covering approx. 97% of pupils).
- Currency restrictions: records from the January 2013 census only.
- Sources and updates: submitted by schools via local authorities. The census
is conducted once a year.

Department for Work and Pensions
- Database: Customer Information Systems (social security, tax credits and
child benefit records only)
- Coverage of data used in pilot: all individuals with a National Insurance
number or a child reference number.
- Currency restrictions: address updated within last 12 months.
- Sources and updates: various databases in DWP and HM Revenue and Customs.
Updated through customer or employer contact.

Student Loans Company Ltd
- Database: Student Finance Customer Account System (Higher Education)
- Coverage of data used in pilot: students undertaking undergraduate higher
education qualifications, who have applied for a student loan.[23]
- Currency restrictions: current students only.
- Sources and updates: student initiated. Students make a fresh finance
application each academic year. SLC conducts identity checks before approving
the initial application. Attendance at the specified institution is checked
three times a year.

Royal Mail Ltd
- Database: Mail redirection service (Change of Address Update and Suppress
databases; Home Mover Mailing Service)
- Coverage of data used in pilot: national but optional – individuals apply
for mail to be redirected when they move home.
- Currency restrictions: redirections set up within the last 2 years.
- Sources and updates: individuals submit a request to have mail redirected
for a set period of time, providing previous and new addresses. Royal Mail
conducts identity checks before approving the application.

[23] There will also be a small number of postgraduate students who are
eligible to apply for student loans, e.g. on courses such as teacher training.
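The currency restrictions in Table 2 amount to a date filter applied to each
extract before matching. A minimal sketch of our own (the "updated" field name
is an assumption, not a field from any of the pilot databases):

```python
from datetime import date, timedelta

def currency_filter(records, max_age_days, today):
    """Keep only records updated recently enough, in the spirit of
    Table 2's restrictions (DWP: addresses updated within the last
    12 months; Royal Mail: redirections set up within the last 2 years)."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["updated"] >= cutoff]

# e.g. the three-month restriction recommended for DWP data in Chapter 7:
sample = [{"updated": date(2013, 3, 1)}, {"updated": date(2012, 6, 1)}]
recent = currency_filter(sample, max_age_days=91, today=date(2013, 4, 17))
assert len(recent) == 1  # only the March 2013 record passes
```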

The pilot areas

2.12 A total of 20 areas participated in the pilot: 13 local authorities in
England, five local authorities in Wales and two Valuation Joint Boards in
Scotland. Of these, 18 were involved in the national data mining; two of the
authorities in England looked only at county council data.

2.13 All areas who participated in the 2011 or 2012 pilots were invited to
participate in this pilot. In total, 13 of this year’s pilot areas were
involved with one or both of the previous pilots. The remaining seven areas
had previously expressed interest in participating in a data matching pilot.

2.14 The pilot areas were not selected to be representative of Great Britain
but do offer a reasonable spread of different types and sizes of authority
across England, Scotland and Wales.

2.15 For the national data mining, pilot areas looked at different groups of
electors. As can be seen from Table 3 below, there was not an even
distribution between the different groups. Nine pilot areas tested the
databases for mining attainers, five tested the student databases and 14 the
databases for home movers.

2.16 This means that some pilot areas tested the data for more than one group;
in fact, some areas received as many as three different sets of data.

Table 3: National data mining – pilot areas and target groups[24]

England: Coventry, Greenwich, Harrow, Richmond-upon-Thames, Rushmoor, South
Ribble, Southwark, Sunderland, Tower Hamlets, Wigan, Wolverhampton

Wales: Ceredigion, Conwy, Pembrokeshire, Powys, Wrexham

Scotland: East Renfrewshire / Renfrewshire VJB,[25] Lothian VJB

[24] In addition, Barrow-in-Furness and Mansfield took part in the pilot
looking solely at county data.

[25] Renfrewshire Valuation Joint Board participated in the pilot using only
the register for East Renfrewshire (the VJB covers East Renfrewshire,
Inverclyde and Renfrewshire).

County data mining

2.17 This part of the pilot involved four district councils receiving data
held by their respective county council.

2.18 In many parts of England, local government is organised as a multi-tiered
structure. Each county will have a single county council and several district
councils within the county area.

2.19 Local government functions are shared between the county and district
tiers. For some functions, both tiers will have responsibility whereas for
other functions responsibility will lie with only one tier. For example,
district councils are responsible for electoral registration and county
councils are responsible for education.

2.20 Unlike in a unitary council (as in all of Wales and many areas in
England, where a single authority has responsibility for all local government
functions), an ERO for a lower tier authority has no legal right to access the
data held by the county council.

2.21 The main aim of this part of the pilot was to see if it was possible to
establish data sharing arrangements between a county and district council for
the purposes of electoral registration.

Areas, target groups and databases

2.22 Four pilot areas participated in the county data mining and all focused
on identifying attainers. Three of the county councils supplied their
education database, with one providing their broader children’s services
database.

2.23 The coverage of each of the county council datasets is summarised in
Table 4 below.
Table 4: County data mining – pilot areas and databases

Pilot area          County council      Dataset
Barrow-in-Furness   Cumbria             Education data
Mansfield           Nottinghamshire     Children’s services data
Rushmoor            Hampshire           Education data
South Ribble        Lancashire          Education data – records from the
                                        October 2012 school census

Conclusion

2.24 Although less complex than the 2011 pilot, this pilot was still ambitious
in terms of the number of different databases it was trying to assess. For the
national data mining, there were two databases for each target group of
electors. Half of the pilot areas were looking at more than one target group,
with two looking at all three of attainers, home movers and students.

2.25 This made some of the practical arrangements more complex, for example by
increasing the number of legal agreements required (a separate agreement was
needed for each pairing of data holding organisation and pilot area).

3 Pilot processes – data transfer, matching, follow up

3.1 This chapter explains the overall processes followed in the pilot: data
transfer, data matching and the follow up work conducted by pilot areas. We
also set out some of the associated issues for our evaluation.

Key points
• There were considerable delays to the original timetable for establishing
this pilot. A significant cause of the delays was the lack of capacity within
Cabinet Office due to their workload related to the transition to individual
electoral registration (IER).
• The delays affected the amount of work the pilot areas could do with the
data.
• Three different organisations were involved in matching the national
databases against the electoral registers in order to produce a list of
potential unregistered electors. Each organisation used similar but not
identical matching criteria. We have not been able to quantify any variation
in the matching caused by differences in the matching criteria.
• For the national data mining, Cabinet Office’s original intention was that
pilot areas should adopt a fairly standardised approach to checking the data
received and contacting the individuals identified, to ensure that results
were comparable. In practice, however, the nature and extent of follow up work
varied widely.
• Much of this variation was caused by practical difficulties, for example the
need to spend more time than expected in ensuring the accuracy of the data
received. However, some of the variation could have been avoided if there had
been fewer delays and a greater level of support provided by Cabinet Office to
pilot areas. In particular, a few areas told us they felt unsupported and were
unclear about what to do.
• Although the wide variation in follow up activity has limited our evaluation
in certain ways, we do have more complete and comparable data than in the 2011
pilot and so are able to reach more definite conclusions.

Introduction

3.2 The Cabinet Office was responsible for the management of the pilot. They
coordinated the legal agreements and transfer of data and provided support to
the pilot areas.

Timetable and delays

3.3 The original timetable for the pilot involved pilot areas sending their
registers to the Government Digital Service (GDS) by early February 2013. The
matching would be conducted in February, allowing GDS to compile and return
the results in early March, well in advance of 17 April, the statutory
deadline for data transfer.[26] Pilot areas would then submit results from
their follow up work by the end of April, several months before our statutory
evaluation deadline of 17 July.
3.4 In practice, the registers were sent in early March, the matching was
conducted in late March and the results returned only a day or two before the
17 April cut-off point.

3.5 The delays were in part caused by difficulties in finalising legal
agreements between the data holding organisations and pilot areas (see
paragraph 1.28). However, it seems that these difficulties were largely
restricted to the agreements with Royal Mail and the county councils. Most of
the pilot areas and data holding organisations we spoke to indicated that the
process was fairly straightforward, with many simply updating the agreements
used in previous pilots.

3.6 However, the pilot also suffered from the workload created by Cabinet
Office’s wider Electoral Registration Transformation Programme, which is
responsible for introducing IER, in particular the demands on GDS around the
development of the IER Digital Service.[27]

3.7 If there is any further work on data mining, we strongly recommend that
Cabinet Office creates realistic project plans and timescales, making sure in
advance that they have sufficient resources and staff capacity to meet the
stated timescales. Any planning should consider the requirements imposed by
the overall work required as part of the implementation of IER, as that will
always take priority over pilots.

[26] The SI which enabled the pilot set 17 April as the deadline for data
transfer.

[27] The IER Digital Service is the central IT system being developed as part
of the transition to IER. It will be responsible for managing the confirmation
process as well as the transfer of data between Electoral Registration
Officers (EROs) and DWP as part of the ongoing verification of electors’
personal identifiers.

Impact of the delays
The impacts of the delays include: 3.8
• GDS having little time to investigate errors or clean returned data.
• GDS not having time to restrict results to the postcodes or wards
specified by some of the pilot areas (so contributing to the higher
volume of records returned).
• Results being returned during the election period, when five pilot
areas were running elections.
28
These areas were therefore unable
to do any work with the data until the elections were complete.
29

• Generally compressing the time available for follow up work with
potential electors.
• Pilot areas having little time at the end of the process to provide
results or to fix errors in the data reported to us.
The rush to transfer data before the deadline of 17 April meant that one 3.9
pilot area (Pembrokeshire) did not receive the data they expected for
attainers. Another area (South Ribble) did not do any follow up work
during the timescales of the pilot so we have no results to report for
them.
Data transfer
Data was transferred between the various organisations by a 3.10
combination of secure courier and secure email.
Both of the previous data matching pilots used secure courier to transfer 3.11
data. For any national roll out of data mining this would not be a
sustainable or cost effective way of transferring the data due to the
volumes of data involved.
It is a positive step that secure email was used for some of the data 3.12
transfer in this pilot, although it was limited to government bodies
(including local authorities) with a pre-existing secure government email
account. Some of the pilot areas who tried to set up a secure account as
part of this pilot reported that it was not a particularly quick process, and
not all completed it in time for use in the pilot. It is possible that the work
being carried out to establish connectivity between all local authorities
and the IER Digital Service means this would be less of an issue in the
future.

28 Barrow-in-Furness, Mansfield, Rushmoor and South Ribble had county-wide elections. Wolverhampton had a local by-election.
29 The priority in those areas at that time was running the elections. They also could not register people from 11 working days before polling day and did not want to write out to people and potentially have them believe they would be registered for the elections when they would not have been.
3.13 The non-governmental data holding organisations (Student Loans
Company and Royal Mail) had to transfer their data by secure courier,
which they found cumbersome. Both would have preferred to use secure
electronic transfer, which is typically used in other data sharing
arrangements they are part of.
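The pilot itself did not prescribe a mechanism for secure electronic transfer, but as a purely illustrative sketch of one common approach, a key-authenticated SFTP transfer might look like the following Python snippet (the hostname, username, key path and file names are all hypothetical):

```python
import paramiko

# Illustrative only: the pilot did not prescribe a transfer mechanism,
# and the hostname, username, key path and file names are hypothetical.
HOST = "sftp.example.gov.uk"
USER = "pilot-area"
KEY_FILE = "/etc/pilot/id_ed25519"

client = paramiko.SSHClient()
# A production setup would pin the server's host key rather than
# accepting unknown keys automatically.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, key_filename=KEY_FILE)

sftp = client.open_sftp()
# Upload the register extract and collect the matching results.
sftp.put("register_extract.csv", "inbound/register_extract.csv")
sftp.get("outbound/results.csv", "results.csv")
sftp.close()
client.close()
```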
National data mining
Matching process
3.14 Each database was matched against the relevant electoral registers in
order to identify the individuals who were already registered.30 This left a
list of individuals who were on the database but could not be found on
the electoral registers – at least, not at that address. This process took
place centrally, although with several organisations involved.
Responsibilities
3.15 The pilot areas first sent an extract of their electoral registers to
GDS, who carried out some basic standardisation.31
3.16 The matching was conducted by GDS, the Department for Work and
Pensions (DWP) and Transactis (on behalf of Royal Mail):32
• GDS matched the electoral registers against data held by the
Department for Education (DfE), the Welsh Government Department
for Education and Skills (DfES) and the Student Loans Company
(SLC).
• DWP matched the electoral registers against their Customer
Information Systems database.
• Transactis matched the electoral registers against the Royal Mail
databases.
3.17 The results from all databases were then compiled by GDS. All of the
results for each group of electors were compiled into one file per pilot
area. For example, for home movers, GDS compiled the results from
DWP and Transactis into a single file for each of the 14 pilot areas. This
stage also involved identifying individuals who had been returned from
more than one database and removing duplicates. The data returned to
pilot areas indicated where an individual had been identified on more
than one database.

30 An extract of each database was used covering the relevant geographical area.
31 Ensuring that, for example, field headings are identical and key bits of data are in the same field for each register.
32 CDMS Ltd t/a Transactis are a pre-selected supplier under HM Government’s Data Access, Processing and Analytics Framework. Transactis are Royal Mail’s preferred data management partner and are contracted to do other matching work as well as this pilot.
3.18 The different groups (attainers, home movers and students) were treated
separately at this stage. Pilot areas who selected more than one group
therefore received more than one results file. There was no checking for
duplicates between different target groups for the same pilot area. For
example, there was no mechanism to identify and remove duplicate
names between the records returned for home movers and students.
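To illustrate the compilation step described in paragraphs 3.17 and 3.18, the following Python sketch merges red-match records from several databases into a single per-option results file, removing duplicates within the target group and flagging individuals returned from more than one database. The field names and the identity key are assumptions, not GDS’s actual process:

```python
from collections import defaultdict

def compile_results(per_database_records):
    """Merge red-match records from several databases into one results
    file for a target group, de-duplicating within the group and
    flagging which database(s) each individual was returned from.
    Field names are hypothetical."""
    merged = {}
    sources = defaultdict(set)
    for database, records in per_database_records.items():
        for rec in records:
            # A naive identity key; a real process would need the same
            # fuzzy rules used in the matching itself.
            key = (rec["first_name"].lower(), rec["surname"].lower(),
                   rec["address"].lower())
            merged.setdefault(key, rec)
            sources[key].add(database)
    return [dict(rec, found_on=sorted(sources[key]))
            for key, rec in merged.items()]

# e.g. compile_results({"DWP": [...], "Royal Mail": [...]})
```

Because this runs once per option file, nothing links, for example, the home movers file to the students file, which is why duplicates between target groups survived.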
Figure 1: National data mining – matching process
1. Pilot areas send an extract of their electoral register to GDS, where it is cleaned and re-formatted and passed to the other agencies for matching.
2. GDS match the electoral registers against DfE, DfES and SLC data; DWP matches the electoral registers against DWP data; Transactis matches the electoral registers against Royal Mail data.
3. A list of individuals found on the databases but not on the electoral registers is produced by, or sent to, GDS.
4. GDS compile the results within each option (attainers, students, home movers), producing one results file per option for each pilot area.
5. The results are sent to the pilot areas.
Matching algorithm – DWP and GDS
3.19 DWP and GDS used automated matching processes to conduct the
matching although each separately developed a matching algorithm (the
rules that determine who matches).
3.20 DWP used the algorithm that they developed as part of the 2012
confirmation pilot while GDS developed their own algorithm based on the
same principles.
3.21 This algorithm involves a two-stage match process: address then name.
The algorithm first attempts to match the address, using either the UPRN
or address lines (e.g. 1 Acacia Avenue).33 If the address can be
matched, the algorithm then attempts to match the names of the
individuals living at that property, using different combinations of first
name, middle name and surname.
3.22 The results are classified into red, amber or green matches34 (an
illustrative sketch of this classification follows the list below):
• Red: no address match, or an address match but no name match.
• Amber: address match and partial (or ‘fuzzy’) name match, for
example exact first name, fuzzy last name (for example, where there
was a slight difference in spelling between the two data sets).
• Green: address match and exact name match, for example full first
name and full surname.
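The detailed rules of the DWP and GDS algorithms were not published as part of this pilot. The following Python sketch only illustrates the two-stage logic and red/amber/green classification described above; the fuzzy comparison, threshold and field names are placeholder assumptions:

```python
from difflib import SequenceMatcher

def fuzzy(a, b, threshold=0.85):
    # Placeholder comparison; the real algorithms' fuzzy rules were
    # not published.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def classify(db_rec, register):
    """Return 'green', 'amber' or 'red' for one database record."""
    # Stage 1: match the address, by UPRN where both sides hold one
    # (only the DWP data carried UPRNs in this pilot), otherwise by
    # comparing address lines.
    at_address = [e for e in register
                  if (db_rec.get("uprn") and e.get("uprn") == db_rec["uprn"])
                  or e["address"].lower() == db_rec["address"].lower()]
    if not at_address:
        return "red"        # no address match
    # Stage 2: match the names of individuals at that property.
    for e in at_address:
        if (e["first_name"].lower() == db_rec["first_name"].lower()
                and e["surname"].lower() == db_rec["surname"].lower()):
            return "green"  # address match and exact name match
    for e in at_address:
        if fuzzy(e["first_name"], db_rec["first_name"]) and \
                fuzzy(e["surname"], db_rec["surname"]):
            return "amber"  # address match and partial ('fuzzy') name match
    return "red"            # address matched but no name match
```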
3.23 The purpose of data mining is to identify individuals who are not on the
registers, i.e. the individuals who did not match during this process. The
results sent back to the pilot areas were therefore a list of the red
matches from the relevant databases – the records on that database that
could not be matched against an entry on the registers. This is a
significant improvement on the 2011 pilot, where pilot areas were
provided with all the names on the national database and left to
determine their own definitions of what constituted a match, leading to
inconsistency between areas.
33 UPRNs are unique 12-digit codes assigned to each property at a local level. These local lists are then combined to form the National Land and Property Gazetteer in England and Wales and the One Scotland Gazetteer in Scotland. Out of the databases used in this pilot, DWP is the only one which holds UPRNs, with approximately 88% of the records having UPRNs. The matching for the four other databases therefore compared the address fields held on that database with those held on the relevant electoral register.
34 Our report on the 2012 confirmation pilot contains further details of the matching algorithm and classification. The Electoral Commission, Data matching pilot – confirmation process (April 2013) http://www.electoralcommission.org.uk/__data/assets/pdf_file/0009/154971/Data-matching-schemes-confirmation-process-evaluation-report.pdf
3.24 Amber matches were not included in the results sent back to pilot areas
as evidence from the confirmation pilot suggested that a large proportion
of these records were existing electors but had failed to match due to
minor differences of address or name.
Matching algorithm – Transactis
3.25 Transactis also used an automated matching process to match the
electoral registers against the Royal Mail data, using the same algorithm
that they use for all of their Royal Mail data matching work.
3.26 This algorithm is a single stage process which involves matching data
held in all fields at once (i.e. matching name and address at the same
time). The algorithm has provision for fuzzy matching as well as rules to
assist matching more complex addresses, such as flats and tenements.
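Transactis’s algorithm is proprietary. Purely as a contrast with the two-stage sketch above, a single-stage approach might score all fields together in one pass, as in this Python sketch (the weights and threshold are illustrative assumptions):

```python
from difflib import SequenceMatcher

def field_score(a, b):
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def single_stage_match(db_rec, reg_entry, weights=None, threshold=0.9):
    """Score name and address together in a single pass instead of
    gating the name comparison on a prior address match. The weights
    and threshold are illustrative assumptions, not Transactis's
    actual parameters."""
    weights = weights or {"first_name": 0.2, "surname": 0.3, "address": 0.5}
    score = sum(w * field_score(db_rec[f], reg_entry[f])
                for f, w in weights.items())
    return score >= threshold
```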
3.27 Transactis re-formatted the electoral registers that they received to
assist the matching process, quite extensively in some cases.
Pilot areas’ follow up work
3.28 Once each pilot area received their results for the target groups they
were focusing on they could then assess the data and carry out follow up
work using the names provided. Broadly, this meant either sending
letters with a registration application form or door to door canvassing.
3.29 One of the main problems with the 2011 pilot was the wide variation in
follow up work between the different pilot areas, coupled with the overlap
between the pilot and the annual canvass. This meant that results were
not comparable and we could not draw any conclusions about the
usefulness of the databases.
3.30 This year’s pilot took place outside of the canvass period which makes
interpreting the results more straightforward. In addition, the need for
standardised follow up was accepted by Cabinet Office and included in
their plans, at least for the national data mining. Cabinet Office intended
that each pilot area adopt a similar approach to the follow up work and
provided guidance outlining each stage.
3.31 In practice, however, there was a considerable amount of variation in the
follow up work, although not to the same extent as in the 2011 pilot.
Below, we set out the implications of some of that variation for this
evaluation. Appendix G contains further details of each pilot area’s
approach to follow up work.
Sampling
3.32 Cabinet Office thought that pilot areas would either follow up all of the
records received or work with a random sample if the volume was too
high. For the home movers option, pilot areas had been able to select
particular wards or postcodes, so they would only get records back for
individuals who lived in those locations. In fact, Royal Mail were unable
to limit home movers data to the wards or postcodes selected by many
of the pilot areas. The volume of data provided was therefore much
greater than the pilot areas expected.
3.33 In practice, many areas took different approaches to identifying which
records to follow up. As a result, where we see variation in the numbers
of new registrations achieved by different pilot areas, we cannot be sure
how much of the variation is due to different approaches to sampling.
These differences include:
• Using a random sample or selecting particular records.
• Taking different approaches to identifying and excluding out of
date records or records relating to existing electors. One pilot
area did no checks before sending letters, some did very extensive
checks, while others conducted limited checks or checked certain
records only, for example only those returned from one database.
• Selecting attainer samples based on varying date of birth
ranges.35 For example, some areas excluded all 18 year olds while
others wrote to them anyway, and different areas applied different
lower date of birth thresholds.36 (See the sketch following this list.)
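A consistent interpretation (as footnote 36 recommends) could be imposed by having the central coordinator fix an explicit date of birth window and every area filter against it. A minimal Python sketch, with purely illustrative dates:

```python
from datetime import date

def select_attainers(records, earliest_dob, latest_dob):
    """Keep only records whose date of birth falls within a window
    agreed centrally for all pilot areas."""
    return [r for r in records if earliest_dob <= r["dob"] <= latest_dob]

# The window bounds below are purely illustrative; agreeing them
# consistently was exactly what this pilot lacked.
candidates = [{"name": "A", "dob": date(1996, 3, 14)},
              {"name": "B", "dob": date(1994, 7, 2)}]
attainers = select_attainers(candidates, date(1995, 12, 2), date(1997, 6, 30))
```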
3.34 In general, we have more complete results for the areas which
conducted more thorough checks before sending letters. This is because
they were able to identify more of the individuals who were existing
electors, ineligible individuals or no longer resident, than the areas who
relied on these individuals (or the new resident at that address) to
respond to the letter. Also, response rates to follow up were higher in
some of the areas who completed more checks, perhaps because they
were writing out to a more refined list.
Method and timing of follow up work
3.35 Pilot areas were able to contact potential new electors by letter or by
door to door canvassing. Mostly, the pilot areas wrote to the individuals
identified. The Cabinet Office provided a template cover letter (which the
Commission had an opportunity to comment on), briefly explaining the
purpose of the pilot. This was sent out along with a registration form.
3.36 However, the times at which letters were sent out varied. Some were
sent out close to the reporting date, so there may well be additional
responses which are not included in the figures in this report. In addition,
a few areas issued reminders while others did not.

35 Cabinet Office specified an age range rather than a more precise date of birth range for the attainer files, meaning that individuals who were already 18, and individuals too young to qualify as attainers, were included in this data.
36 It has become apparent that different areas use different cut-off points for the lower age bracket for inclusion on the register, including December 1995, February 1996 and January 1997. This issue applies to the pilot areas looking at the county council data as well. It would be helpful for any future pilot to impose a consistent interpretation.
3.37 This means that the numbers of new registrations achieved across the
pilot areas vary, partly due to the different practices used in follow up,
and, again, this makes it more difficult to compare the results in a
meaningful way.
3.38 Pilot areas were also encouraged to canvass at least some of the
non-responders following the write-out stage, as we know that door to door
canvassing can have a positive impact on response rates.37
3.39 However, only four out of the 18 areas did any sort of canvassing, and in
all of these it was limited in some way, e.g. to a particular postcode or to
individuals identified on one database. This lack of canvassing was
primarily a result of the delays in the pilot set up (see paragraphs 3.3 to
3.9).
County data mining
Matching process
3.40 This element of the pilot differed from the national data mining in a
number of key ways, namely that each pilot area received data from a
different database and they conducted the matching process
themselves.
3.41 It had been envisaged that the pilot areas might develop their own
algorithms, similar to those used in the national data mining, either
themselves or through their Electoral Management Software (EMS)
provider. In the end, only one pilot area worked with their EMS provider
to conduct some automated matching. Other areas manually checked
the data against the register, in one case using functionality within
Microsoft Excel to assist them.
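As an indication of what a first-pass automated check might involve for a lower tier council, the following Python sketch performs an exact-match lookup of a county extract against the electoral register using pandas, roughly the programmatic equivalent of the Excel-based approach one area took. The column and file names are assumptions:

```python
import pandas as pd

# Column and file names are assumptions; real register and county
# extracts would need standardising first.
register = pd.read_csv("electoral_register.csv")  # columns: name, address
county = pd.read_csv("county_extract.csv")        # columns: name, address, dob

for df in (register, county):
    df["key"] = (df["name"].str.lower().str.strip() + "|" +
                 df["address"].str.lower().str.strip())

# Left-join the county extract against the register; rows with no hit
# are candidates for follow up. Exact matching only: no fuzzy rules.
merged = county.merge(register[["key"]].drop_duplicates(),
                      on="key", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
unmatched.to_csv("potential_new_electors.csv", index=False)
```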
3.42 These differences are not a significant problem for these four pilot areas
as the aim of this element of the pilot was primarily to explore what some
of the potential challenges and benefits would be for lower tier councils
in accessing upper tier data.
Follow up
3.43 Cabinet Office did not originally specify that these pilot areas should do
any follow up work beyond the matching (because the aim of the county
pilot was different, focusing on whether it was feasible to share this data
and whether pilot areas would be able to successfully conduct the
matching). In the end, Cabinet Office did encourage pilot areas to
contact the individuals they could not find on their register, but this was
not required.

37 The Electoral Commission, Great Britain’s electoral registers 2011 (December 2011) http://www.electoralcommission.org.uk/__data/assets/pdf_file/0007/145366/Great-Britains-electoral-registers-2011.pdf
3.44 In practice there was therefore significant variation in the approach to
follow up and reporting.
3.45 Three of the pilot areas wrote out to unregistered individuals but, due to
the elections, letters were only sent in the middle of May (see paragraph
3.8). One pilot area (South Ribble) did not send any letters out during the
pilot timescales, although they are still planning to contact the
unregistered individuals. This area has therefore not reported any
results.
Support and guidance provided by Cabinet Office
3.46 Pilot areas had mixed views on the support and guidance provided by
Cabinet Office. While some were positive about the support provided, a
few areas said they felt unsupported, were unclear what to do and were
left feeling disillusioned with the pilot process.
3.47 Several areas said that guidance and communication from Cabinet
Office could be improved. Explanations of how to interpret the data and
updates on progress (during the delays) were considered to be
particularly lacking. For example, some areas told us that they received
two versions of the same data file with no clear explanation as to why
(one was an updated version with some errors corrected).
3.48 Others said there was inadequate explanation of some detailed
elements, such as the meaning of column headings in spreadsheets. As a
result, pilot areas had to spend longer than anticipated making sense of
the data before they could start follow up work on it.
3.49 Misunderstandings about which database each record had been returned
from meant that some areas were unable to report results by database,
and one pilot area worked with records returned from only one database
due to incorrect data labels and poor communication from Cabinet Office.
3.50 This was a pilot and some mistakes could be expected. It is, however,
important that Cabinet Office considers what lessons can be drawn from
this pilot, in terms of engagement with EROs, for the wider
implementation of IER.
3.51 In addition, it would have been beneficial if Cabinet Office had provided
feedback to the data holding organisations during the course of the pilot,
particularly about any issues that had been identified with their data (see
Chapter 5 and Appendices A-F for details of issues with individual
datasets).
Conclusions
3.52 This pilot suffered from fewer process issues than the 2011 pilot.
However, there were still a variety of issues which either directly
impacted on the running of the pilot (and therefore its likelihood of
success) or on our ability to evaluate it.
3.53 Most significantly, the delays at the start of the pilot process meant that
there were tight deadlines for data transfer with no contingency period in
which to deal with problems. The delays also had a direct impact on the
evaluation as pilot areas had less time to conduct and report on their
follow up work with potential new electors.
4 Costs
4.1 This chapter sets out our summary of the costs incurred in the pilot.
Pilot areas
4.2 The costs incurred by the areas participating in the pilot are set out in
Table 5. The costs for Harrow, Pembrokeshire, Powys and Renfrewshire
VJB are estimates as these areas had not submitted final costs in time
for inclusion in this report.
4.3 As can be seen from the table, there is a wide range in the pilot areas’
costs, from under £3,000 up to just over £33,000. This variation
may be due to the demographics of the area, how many records the pilot
area received or the approach to follow up they adopted.
Key points
• It is not possible to produce an overall figure for the cost of this pilot.
This is because we do not have final costs for all pilot areas or any
costs for Cabinet Office (including the Government Digital Service),
who conducted a lot of the work on the pilot.
• Since we are unable to produce a cost for this pilot, we are also
unable to estimate the cost per new elector registered or the likely
cost of any national rollout. Any estimates of these would need to
include the cost of coordinating and managing the pilot (the role
taken by Cabinet Office in this pilot), as any future work with data
mining would require some form of central coordination.
Table 5: Pilot areas’ costs
Pilot area Total costs
Barrow-in-Furness £16,609
Ceredigion £33,007
Conwy £10,965
Coventry £16,130
Greenwich £6,680
Harrow £15,000
Lothian VJB £7,140
Mansfield £2,654
Pembrokeshire38 £9,000
Powys £27,000
Renfrewshire VJB £8,300
Richmond-upon-Thames £10,971
Rushmoor £8,419
South Ribble No estimate or final costs
Southwark £7,036
Sunderland No estimate or final costs
Tower Hamlets £27,157
Wigan £11,394
Wolverhampton £10,990
Wrexham £11,513
Total £239,965
38 Pembrokeshire also submitted an estimate of £6,500 for work on the attainers option, but they did not receive this file and so we have not included this figure in their total costs.
Data holding organisations and GDS
4.4 Only Royal Mail charged Cabinet Office for accessing their data (plus
charges for data processing and matching), with the Department for
Work and Pensions charging for data processing and project
management. The other three national data holding organisations and
the four county councils did not charge. However there was a cost
associated with the unsuccessful negotiations to access the Higher
Education Funding Council for England’s data (see paragraph 2.10) as
well as the cost of using secure couriers to transfer data between some
of the data holding organisations and pilot areas.
Table 6: Data holding organisations’ costs
Data holding organisation / item Cost
Department for Education -
Welsh Department for Education and Skills -
Department for Work and Pensions £28,992
Higher Education Funding Council for England £1,204
Royal Mail £40,869
Student Loans Company -
Secure couriers £12,887
Total £83,952
4.5 We asked Cabinet Office for a summary of their costs – to cover the time
spent managing the pilot and also the work conducted by GDS (who are
part of Cabinet Office), who did a substantial amount of the matching
and data-processing (see paragraphs 3.14-3.18). However, Cabinet
Office were unable to provide an estimate of their costs.
Conclusion
4.6 As in the 2011 data matching pilot, we are unable to provide a definitive
overall cost for the pilot or comment on the likely costs should data
mining be rolled out further. This is because we do not have final costs
for all pilot areas or costs for any of the central government work. GDS
spent a lot of time on this project and, if that were included in the total
spend, the cost of the pilot would increase significantly.
5 National data mining – judging success
5.1 This chapter summarises our findings in relation to the national
databases used in the pilot. More details of the results submitted in
relation to each database can be found in Appendices A-E.
Key points
• The evidence from this pilot suggests that data mining, as it was
tested, is not a practical way of identifying unregistered electors.
• This is because, although the data returned to Electoral Registration
Officers (EROs) did contain details of unregistered electors, it also
contained significant numbers of existing electors, ineligible
individuals and out of date information (where the individual was no
longer resident at the given address).
• The reasons that so many existing electors and ineligible individuals
were returned on the data include poor data specifications from
Cabinet Office, currency restrictions not being tight enough and
incomplete or poor quality addresses on some of the national
databases.
• As a result of these issues, data mining is a resource intensive
process. The volume of work involved in the pilot was far higher than
expected for many pilot areas. The new registrations need to be
considered in light of the time and resources required to achieve
them.
• We are unable to draw clear conclusions on the cost benefit of data
mining from this pilot, due to incomplete costs information.
• The pilot also does not provide for an assessment of the use of local
data in comparison to national data, i.e. to what extent EROs could
have found some of the individuals identified from the national data
on their existing local data sources.
• In order for data mining to be of practical use to EROs, the data
returned would need to contain many fewer names of registered or
ineligible individuals and have significantly improved address
information. Our recommendations consider how that could be
achieved.
• Data mining, if rolled out nationally, would require a central
organisation to take on the role of coordinating data transfer and
processing the data (as Cabinet Office did for this pilot). Who that
organisation would be is a key question for any data mining roll out.
Introduction
5.2 In judging the success or potential of national data mining there are two
key questions:39
• How effective is this data at identifying unregistered electors?
• Is data mining a cost effective way of registering new electors?
5.3 This chapter uses the results reported by the pilot areas and the issues
raised by them in our interviews in order to reach a conclusion based on
these two questions.
Identifying unregistered electors
5.4 The evidence from this pilot suggests that data mining, as it was tested,
is not a practical way of identifying unregistered electors.
5.5 This is because, although the data returned to EROs did contain details
of unregistered electors, it also contained significant numbers of existing
electors, ineligible individuals and out of date information (where the
individual was no longer resident at the given address).
5.6 We have not made, and could not make, a consistent assessment of
what proportion of records returned to EROs represented unregistered
people, resident at the given address. However, we have reported on
what pilot areas found when they either checked the data against other
data sources (e.g. their electoral registers) or when they received
responses to their follow up work, as well as using their feedback on the
processes and volume of work involved.
Addressing information
5.7 Pilot areas had serious concerns about the quality and consistency of
address information on some of the databases used. The addresses
held on each database are important as they form part of the matching
process (an algorithm would ideally be able to recognise the same
address on two different databases) and are also used to contact the
potential new electors.
39 Feedback from members of the public is also one of our evaluation criteria and, where