Thread: UK Research Evaluation Framework: Validate Metrics Against Panel Rankings




United States
2007-11-25 22:11:33
     ** Cross-Posted ** Fully hyperlinked version of this posting:
     http://openaccess.eprints.org/index.php?/archives/333-guid.html

SUMMARY: Three things need to be remedied in the UK's proposed
HEFCE/RAE Research Evaluation Framework:
http://www.hefce.ac.uk/pubs/hefce/2007/07_34/

     (1) Ensure as broad, rich, diverse and forward-looking a
battery of candidate metrics as possible -- especially online
metrics -- in all disciplines.

     (2) Make sure to cross-validate them against the panel
rankings in the last parallel panel/metric RAE in 2008. The
initialized weights can then be fine-tuned and optimized by peer
panels in ensuing years.

     (3) Stress that it is important -- indeed imperative -- that
all University Institutional Repositories (IRs) now get serious
about systematically archiving all their research output assets
(especially publications) so they can be counted and assessed (as
well as accessed!), along with their IR metrics (downloads, links,
growth/decay rates, harvested citation counts, etc.).

     If these three things are systematically done -- (1)
comprehensive metrics, (2) cross-validation and calibration of
weightings, and (3) a systematic distributed IR database from
which to harvest them -- continuous scientometric assessment of
research will be well on its way worldwide, making research
progress and impact more measurable and creditable, while at the
same time accelerating and enhancing it.

Once one sees the whole report, it turns out that the HEFCE/RAE
Research Evaluation Framework is far better, far more flexible,
and far more comprehensive than is reflected in either the press
release or the Executive Summary.

It appears that there is indeed the intention to use many more
metrics than the three named in the executive summary (citations,
funding, students), that the metrics will be weighted field by
field, and that there is considerable open-mindedness about
further metrics and about corrections and fine-tuning with time.
Even for the humanities and social sciences, where "light touch"
panel review will be retained for the time being, metrics too
will be tried and tested.

This is all very good, and an excellent example for other
nations, such as Australia (also considering national research
assessment with its Research Quality Framework), the US (not very
advanced yet, but no doubt listening) and the rest of Europe
(also listening, and planning measures of its own, such as
EurOpenScholar).

There is still one prominent omission, however, and it is a
crucial one:

The UK is conducting one last parallel metrics/panel RAE in 2008.
That is the last and best chance to test and validate the
candidate metrics -- as rich and diverse a battery of them as
possible -- against the panel rankings. In all other fields of
metrics -- biometrics, psychometrics, even weather forecasting
metrics -- before deployment the metric predictors first need to
be tested and shown to be valid, which means showing that they do
indeed predict what they were intended to predict. That means
they must correlate with a "criterion" metric that has already
been validated, or that has "face-validity" of some kind.
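
As a minimal sketch of that validation step, the Python below
computes a rank correlation between one candidate metric and the
panel outcome. Using Spearman's rho is an assumption made here
because the criterion is a ranking; the report does not name a
specific coefficient. The department figures are invented purely
for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n (smallest first), giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(metric, criterion):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric), average_ranks(criterion))


# Hypothetical data: one value per department (not real RAE figures).
citations_per_paper = [4.1, 9.8, 6.3, 2.0, 12.5, 5.7]
panel_score = [2.0, 4.0, 3.5, 1.5, 5.0, 3.0]

print("rho =", round(spearman(citations_per_paper, panel_score), 3))
```

A metric whose rho with the panel outcome is near zero would have
little claim to validity against this criterion; one with a high
rho is a candidate for the joint battery.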

The RAE has been using the panel rankings for two decades now (at
a great cost in wasted time and effort to the entire UK research
community -- time and effort that could instead have been used to
conduct the research that the RAE was evaluating: this is what
the metric RAE is primarily intended to remedy).

But if the panel rankings have been unquestioningly relied upon
for two decades already, then they are a natural criterion against
which the new battery of metrics can be validated, initializing
the weights of each metric within a joint battery, as a function
of what percentage of the variation in the panel rankings each
metric can predict.

This is called "multiple regression" analysis: N "predictors" are
jointly correlated with one (or more) "criteria" (in this case
the panel rankings, but other validated or face-valid criteria
could also be added, if there were any). The result is a set of
"beta" weights on each of the metrics, reflecting their
individual predictive power in predicting the criterion (panel
rankings). The weights will of course differ from discipline to
discipline.
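
A rough sketch of such a regression, using only the Python
standard library, is given below. It computes standardized beta
weights by solving the normal equations on the correlation matrix,
and reports R squared (the share of criterion variance predicted
jointly). The function names and all department figures are
hypothetical, chosen for illustration; a real analysis would use a
statistics package and far more data per discipline.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is non-singular, i.e. no metric is an exact
    linear combination of the others.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def beta_weights(metrics, criterion):
    """Return ({metric name: standardized beta}, R squared)."""
    names = list(metrics)
    z = [standardize(metrics[name]) for name in names]
    zy = standardize(criterion)
    n = len(zy)
    # Correlations among the predictors, and of each predictor with the criterion.
    rxx = [[sum(p * q for p, q in zip(zi, zj)) / n for zj in z] for zi in z]
    rxy = [sum(p * q for p, q in zip(zi, zy)) / n for zi in z]
    betas = solve(rxx, rxy)
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return dict(zip(names, betas)), r_squared


# Hypothetical data for one discipline: one value per department.
metrics = {
    "citations": [12.0, 30.0, 22.0, 8.0, 40.0, 18.0],
    "downloads": [200.0, 340.0, 410.0, 150.0, 500.0, 260.0],
    "funding": [1.0, 2.5, 1.5, 0.8, 3.9, 2.2],
}
panel_score = [2.0, 4.0, 3.5, 1.5, 5.0, 3.0]

betas, r_squared = beta_weights(metrics, panel_score)
for name, beta in betas.items():
    print(name, round(beta, 3))
print("R squared:", round(r_squared, 3))
```

Running the same procedure separately per discipline would yield
the discipline-specific initial weights described above.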

Now these beta weights can be taken as an initialization of the
metric battery. With time, "super-light" panel oversight can be
used to fine-tune and optimize those weightings (and new metrics
can always be added too), to correct errors and anomalies and
make them reflect the values of each discipline.

(The weights can also be systematically varied to use the metrics
to re-rank in terms of different blends of criteria that might be
relevant for different decisions: RAE top-sliced funding is one
sort of decision, but one might sometimes want to rank in terms
of contributions to education, to industry, to internationality,
to interdisciplinarity. Metrics can be calibrated continuously
and can generate different "views" depending on what is being
evaluated. But, unlike the much abused "university league table,"
which ranks on one metric at a time (and often a subjective,
opinion-based one rather than an objective one), the RAE metrics
could generate different views simply by changing the weights on
some selected metrics, while retaining the other metrics as the
baseline context and frame of reference.)
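
To illustrate the idea of weight-dependent "views", here is a
small sketch that ranks the same departments under two different
weightings of the same standardized metrics. The department
names, metric names, values and weights are all invented for the
example; nothing here reflects actual REF weightings.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize so that metrics on different scales can be combined."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_scores(units, metrics, weights):
    """Weighted sum of standardized metrics for each unit.

    Metrics missing from `weights` get weight 0, so a view can
    ignore a metric without removing it from the battery.
    """
    z = {name: zscores(vals) for name, vals in metrics.items()}
    return {
        unit: sum(weights.get(name, 0.0) * z[name][i] for name in metrics)
        for i, unit in enumerate(units)
    }


def ranking(units, metrics, weights):
    """Units ordered from highest to lowest composite score."""
    scores = composite_scores(units, metrics, weights)
    return sorted(units, key=lambda unit: scores[unit], reverse=True)


# Hypothetical departments and metrics.
units = ["Dept A", "Dept B", "Dept C", "Dept D"]
metrics = {
    "citations": [40.0, 25.0, 10.0, 30.0],
    "industry_income": [0.5, 3.0, 4.5, 1.0],
    "downloads": [900.0, 700.0, 650.0, 800.0],
}

# Two hypothetical "views": the baseline metrics stay in the battery,
# only the emphasis changes.
research_view = {"citations": 0.6, "downloads": 0.3, "industry_income": 0.1}
industry_view = {"citations": 0.2, "downloads": 0.1, "industry_income": 0.7}

print("research view:", ranking(units, metrics, research_view))
print("industry view:", ranking(units, metrics, industry_view))
```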

To accomplish all that, however, the metric battery needs to be
rich and diverse, and the weight of each metric in the battery
has to be initialised in a joint multiple regression on the panel
rankings. It is very much to be hoped that HEFCE will commission
this all-important validation exercise on the invaluable and
unprecedented database they will have with the unique, one-time
parallel panel/metric RAE in 2008.

That is the main point. There are also some less central points:

The report says -- a priori -- that REF will not consider journal
impact factors (average citations per journal), nor author impact
(average citations per author): only average citations per paper,
per department. This is a mistake. In a metric battery, these
other metrics can be included, to test whether they make any
independent contribution to the predictivity of the battery. The
same applies to author publication counts, number of publishing
years, number of co-authors -- even to impact before the
evaluation period. (Possibly included vs. non-included staff
research output could be treated in a similar way, with number
and proportion of staff included also being metrics.)
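
One way to test for an "independent contribution" is to ask how
much R squared rises when a metric is added to those already in
the battery. The sketch below does this by orthogonalizing each
metric against the ones entered before it (Gram-Schmidt); this is
one possible technique chosen for the example, not something the
report prescribes, and the figures are hypothetical. Note that the
increments depend on the order of entry.

```python
from statistics import mean


def center(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def incremental_r2(predictors, criterion):
    """Gain in R squared from each predictor, in order of entry.

    `predictors` is a list of (name, values) pairs. Each predictor is
    orthogonalized against those entered earlier, so its gain is the
    criterion variance it predicts over and above them.
    """
    y = center(criterion)
    syy = dot(y, y)
    basis, gains = [], []
    for name, values in predictors:
        q = center(values)
        original = dot(q, q)
        for b in basis:
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        qq = dot(q, q)
        if qq <= 1e-12 * original:
            # Constant or fully redundant metric: nothing new to add.
            gains.append((name, 0.0))
            continue
        gains.append((name, dot(q, y) ** 2 / (qq * syy)))
        basis.append(q)
    return gains


# Hypothetical data: one value per department.
predictors = [
    ("citations_per_paper", [4.1, 9.8, 6.3, 2.0, 12.5, 5.7]),
    ("journal_impact_factor", [1.9, 3.5, 3.1, 1.2, 4.4, 2.0]),
    ("author_citations", [30.0, 85.0, 40.0, 22.0, 90.0, 61.0]),
]
panel_score = [2.0, 4.0, 3.5, 1.5, 5.0, 3.0]

for name, gain in incremental_r2(predictors, panel_score):
    print(name, round(gain, 3))
```

A metric whose gain is close to zero after the others have been
entered adds little independent predictive power; that is an
empirical finding, which is the argument for including it in the
battery rather than excluding it a priori.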

The large battery of jointly validated and weighted metrics will
make it possible to correct the potential bias from relying too
heavily on prior funding, even if it is highly correlated with
the panel rankings, in order to avoid a self-fulfilling prophecy
which would simply collapse the Dual RAE/RCUK funding system into
just a multiplier on prior RCUK funding.

Self-citations should not be simply excluded: they should be
included independently in the metric battery, for validation. So
should measures of the size of the citation circle (endogamy) and
degree of interdisciplinarity.

Nor should the metric battery omit the newest and some of the
most important metrics of all, the online, web-based ones:
downloads of papers, links, growth rates, decay rates,
hub/authority scores. All of these will be provided by the UK's
growing network of Institutional Repositories. These will be
the record-keepers -- for both the papers and their usage metrics
-- and the access-providers, thereby maximizing their usage
metrics.

REF should put much, much more emphasis on ensuring that the UK
network of Institutional Repositories systematically and
comprehensively records its research output and its metric
performance indicators.

But overall, thumbs up for a promising initiative that is likely
to serve as a useful model for the rest of the research world in
the online era.

****

