THE A-Z OF TESTING
G
GUIDANCE
Some tests are used to inform a person’s own decisions – what sort of career to go for, for instance – rather than to make decisions about them – as in recruitment and selection. This is known as guidance.
Tests can be powerful bases for rich discussions about many different life changes and, as counselling and coaching become popular means of developing staff, particularly senior ones, tests are being used more widely in this area than they used to be. In particular, “interest inventories” (assessments which clarify a person’s real interests, as opposed to their ability in an area) are useful here.
H
HISTOGRAM
Simply a way of presenting a frequency distribution (see above) graphically, as a bar chart.
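As a rough illustration, here is a minimal Python sketch that groups a handful of scores into bands and prints one bar per band. The scores are made up, and the function name `text_histogram` and the band width of 5 are assumptions chosen purely for the example.

```python
from collections import Counter

def text_histogram(scores, bin_width=5):
    """Group scores into equal-width bands and draw one bar of '#' per band."""
    # Map each score to the lower edge of its band, then count per band.
    counts = Counter((s // bin_width) * bin_width for s in scores)
    lines = []
    for low in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{low:>3}-{low + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * counts.get(low, 0)}")
    return lines

# Made-up raw scores for twelve hypothetical candidates.
scores = [12, 14, 15, 17, 18, 18, 19, 21, 22, 23, 27, 31]
for line in text_histogram(scores):
    print(line)
```

The longest bar shows the band where most candidates scored, which is exactly what a bar chart of a frequency distribution is for.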
I
INFORMED CONSENT
Ensuring that the test taker understands and agrees to the terms and conditions under which the testing takes place: how the results will be used, why the testing is taking place, etc. This can be done by the administrator or via an introductory screen on a computer-based testing system.
Recent developments have stressed the need for a two-way commitment at the beginning of the testing session:
INSTRUMENT
Well, you’ll see this word used quite a lot in manuals and advertising copy about tests and assessments, but it’s difficult to say it has a technical definition. Let’s say it’s another term for a formal method of making a decision about something, and it’s usually a “thing” – a book, a test, a form, a piece of software – rather than a process.
INTEGRITY
Integrity tests are sometimes called “honesty tests”.
Assessing “integrity” is a huge growth area in the USA (as many as 5 million people are assessed for integrity each year). In Europe, employers tend to try to assess this via interviews, but this can be a minefield: unstructured interviews are extremely poor predictors of anything, and blunt questions about honesty are fairly easily answered by candidates to the satisfaction of the interviewer.
The area of testing integrity has been controversial. Does it impinge too much on private life? Isn’t it fairly obvious how you have to answer the questions in an integrity test? And, given that any measurement contains error, you’re going to suggest some people are dishonest when they aren’t – which is a fairly serious accusation.
Professor Adrian Furnham summarises the approach taken by integrity tests as follows:
The impetus for the growth in integrity testing has been research into the huge cost of white collar crime, high profile cases like Enron, and the research of psychologists into management malpractice and its causes.
Whatever the criticisms, integrity testing seems to predict later job success surprisingly well. The reason for this is that integrity tests often actually measure one of the Big Five factors – conscientiousness – which is a particularly robust predictor of work performance.
INTELLIGENCE
Intelligence is important in work performance so here is a very short introduction to some issues.
Mike and Pam Smith in their book TESTING PEOPLE AT WORK give a particularly useful definition of intelligence:
“the relative speed and accuracy with which the brain processes complex information.”
As is mentioned elsewhere in our A-Z, people will have different abilities with different sorts of information: words, numbers, diagrams etc. We also define two different sorts of intelligence: crystallised (what you’ve learnt and your access to that knowledge) and fluid (your ability to make sense of a situation).
The concept of general intelligence (which you’ll often find referred to as ‘g’) is that if you’re good at one type of mental problem you’ll TEND to be good at all of them. As we’ve said, individuals will have different strengths and weaknesses (good with verbal information, less good with abstract reasoning) but this tendency (good at one, tendency to be good at all) has been shown repeatedly. In other words there is such a thing as “general intelligence” or “smarts” or being “quick on the uptake” and it can be measured. Many of you will know your own Intelligence Quotient or IQ, which measures general intelligence.
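The “good at one, tendency to be good at all” pattern shows up as positive correlations between scores on different kinds of test. The sketch below illustrates the idea with made-up scores for six hypothetical people on three hypothetical tests; it is not real data and not how ‘g’ is formally estimated, just a way of seeing the tendency.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up scores for six hypothetical people on three ability tests.
scores = {
    "verbal":    [12, 15, 9, 18, 14, 11],
    "numerical": [10, 16, 8, 17, 12, 13],
    "abstract":  [11, 13, 10, 16, 15, 9],
}

# Print the correlation for every pair of tests.
names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(scores[a], scores[b]):.2f}")
```

With these illustrative numbers every pair of tests correlates positively, even though individuals still differ in which test they do best on.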
For some time IQ fell into disrepute as some people doubted its existence or saw it as being culturally biased. At this stage intelligence was often defined as “the ability to perform well on intelligence tests”, which is not that useful to a hard-pressed manager! Interest in multiple intelligences (see Emotional intelligence, above) was a reaction to this.
But research has increasingly established the existence of general intelligence, and has shown that it’s correlated with job success and related to the physical functions of the brain.
All this said, classic tests which produce an IQ tend to be used more in educational and clinical testing. Business testers tend to use tests of specific ability (verbal, numerical, etc.) because it’s easier to see how an individual’s strengths and weaknesses in different abilities relate to the content of a particular job, whereas knowing someone has an IQ of x provides a label, but a less obvious basis for actions and decisions.
In addition there’s still a lot of misunderstanding of intelligence and organisations are worried about being challenged over the use of classic IQ tests, either because of bias or because IQ items are not obviously related to specific jobs.
Fundamentally though, anyone recruiting, managing or developing people should be aware of the importance of intelligence and how intelligence affects performance. Just to give two examples:
INTERESTS
Attitudes towards things you do, whether work, hobbies, duties or day-to-day activities. Interest Inventories are types of assessment which help people understand their real interests and are useful in career guidance.
INTER-RATER RELIABILITY
If two different people assess the same thing, do they come up with the same score?
Let’s use sports as an example. For certain events you need very little human judgement: if someone jumps higher than everyone else they win the high jump. The only judgement involved is when the referees measure how high the bar is. They might misread the instrument used for measuring the height.
In other sports, ‘judgement’ is much more important: ice dancing, synchronised swimming and acrobatics are examples. Here, judges decide on scores and you sometimes see different judges giving the same performance very different scores. There’s controversy when one performer is given much higher scores than another but other experts (and sometimes media without much expertise) claim the scores are unfair.
So, when developing a test, researchers measure whether different people in different situations, measuring the same person, will record either the same score, or scores whose difference simply reflects the inaccuracies present in every act of measurement (see CLASSIC TEST THEORY).
Inter-Rater Reliability is also important in techniques where there’s a subjective element involved: for instance where you’re running an assessment centre and someone has to rate someone’s performance on a group exercise. Usually a standard rating scheme is created and the “raters” observing the exercise are trained carefully so they score the same things in the same way.
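One way of putting a number on how well two raters agree is Cohen’s kappa, which compares the agreement actually observed with the agreement you’d expect by chance. It is used here purely as an illustration; the text above doesn’t prescribe a particular statistic, and for ordered rating scales a weighted version or a correlation-based index is often preferred. The ratings and the function name `cohens_kappa` are made up for the sketch.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of candidates given the same rating by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater handed out ratings independently
    # at their own observed rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Note: undefined (division by zero) if expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)

# Made-up ratings (1 = weak, 3 = strong) for eight hypothetical candidates.
rater_a = [1, 2, 2, 3, 3, 1, 2, 3]
rater_b = [1, 2, 3, 3, 3, 1, 2, 2]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means the raters agree no more often than chance would produce, which is the situation rater training is designed to avoid.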
To really see the importance of Inter-Rater Reliability, watch programmes like The X Factor. Initially you have 3 judges with very private agendas (and no objective scoring system) having spats over an individual singer. There’s no inter-rater reliability here, let alone when the public votes on who stays in the programme and who gets thrown off it.
IN-TRAY EXERCISE
As the name implies, an exercise which tries to mirror real office work by providing an in-basket of memos, letters and other documents which a candidate has to prioritise and deal with. An In-Tray exercise is often a component of an ASSESSMENT CENTRE (see our entry in the A-Z).
Originally In-Tray Exercises were carried out physically: a candidate was given a tray of real documents. This made it difficult to mark their performance and to tailor the documents to a specific organisation.
As work has moved on-screen so have these sorts of exercises, which are now easier to customise and score objectively. On-screen exercises now mirror the real flow of office life with e-mails, messages, requests for meetings and urgent decisions appearing throughout the process.
IPSATIVE
Ipsative tests measure something within a person rather than comparing that person with other people (a test which compares the person with other people is called a NORM-REFERENCED TEST). An ipsative test compares a person’s score on one scale or factor with their own score on another.
For instance, an ipsative test might measure which of the following areas a person was most interested in, which was their next strongest interest, and so on down to their weakest interest:
SPORT
ACADEMIC WORK
CHARITY WORK
IDLENESS
BUSINESS
The results would show the relative strength of that person’s interest in these areas. A norm-referenced test, by contrast, might show that a person was more interested in sport than 70% of the adult population in the UK but less interested in business than 90% of the population in the UK.
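The difference between the two approaches can be sketched in a few lines of Python. The interest scores, the norm-group scores and the function names `ipsative_ranking` and `percentile` are all made up for illustration; real instruments score and norm their scales in more sophisticated ways.

```python
from bisect import bisect_left

def ipsative_ranking(person_scores):
    """Order one person's areas from strongest to weakest interest."""
    return sorted(person_scores, key=person_scores.get, reverse=True)

def percentile(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    return 100 * bisect_left(ordered, score) / len(ordered)

# Made-up interest scores for one hypothetical person.
person = {"SPORT": 8, "ACADEMIC WORK": 5, "CHARITY WORK": 6,
          "IDLENESS": 2, "BUSINESS": 4}
# Made-up norm-group scores for the SPORT scale only.
sport_norms = [3, 4, 5, 5, 6, 6, 7, 7, 9, 10]

# Ipsative view: this person's interests compared with each other.
print(ipsative_ranking(person))
# Norm-referenced view: this person's SPORT score compared with other people.
print(percentile(person["SPORT"], sport_norms))
```

The ranking needs no information about anyone else, whereas the percentile is meaningless without a comparison group.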
Norm-referenced and ipsative measures have different strengths and weaknesses and can be used for different purposes. There is a huge literature on the technical merits of each approach. In recent years more companies have seen the benefits of ipsativity and often use norm-referenced and ipsative measures next to each other to get a more rounded view of an individual. Ipsative measures:
There are a number of technical implications of this approach to testing which we’re more than happy to talk through with users and prospective users.
IQ
See INTELLIGENCE
ITC
The International Test Commission. An international body which, among other things, lays down guidelines for testing which seek to bring together international views in an increasingly multinational activity.
ITEM
A question or a statement in a test that a candidate has to (respectively) answer or react to.
ITEM BANK
See ITEM RESPONSE THEORY
ITEM RESPONSE THEORY
A real growth area in test theory and applications at the moment as this theory underlies a lot of work on ability item banks and adaptive testing systems.
Item Response Theory is an area of academic work and theory which deals with the individual characteristics of an ITEM (see above) and how likely it is that a person will make a correct response to an item – in other words, how difficult it is.
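One common way of expressing this is a logistic curve that turns the gap between a person’s ability and an item’s difficulty into a probability of a correct answer. The entry doesn’t name a specific model, so the sketch below assumes the widely used one- or two-parameter logistic form; the names `p_correct`, `ability`, `difficulty` and `discrimination` are illustrative.

```python
from math import exp

def p_correct(ability, difficulty, discrimination=1.0):
    """Probability of a correct response under a logistic IRT model.

    With discrimination fixed at 1.0 this is the one-parameter form;
    letting it vary per item gives the two-parameter form.
    """
    return 1 / (1 + exp(-discrimination * (ability - difficulty)))

# The same person (ability 0.0) facing an easy, a medium and a hard item.
for difficulty in (-1.0, 0.0, 1.0):
    print(difficulty, round(p_correct(0.0, difficulty), 2))
```

When ability equals difficulty the chance of a correct answer is exactly 50%, and it falls as the item gets harder relative to the person.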
Once you know this about items you can form a huge database of them (an ITEM BANK) and use algorithms to create tests which are made up of different items but which are equivalent in difficulty. This is central to ADAPTIVE testing and CBT.
To give an example: let’s say you want to assess a pool of candidates who live a long way away on a verbal reasoning test. You’re going to assess them over the internet but you want to reduce the possibility of them swapping answers. Using an item-bank-generated test, each candidate would be presented with a test of equivalent difficulty (so you could compare their scores), but the individual items would be different and have different answers.
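A very simplified sketch of the idea: sort the bank into difficulty bands and give every candidate the same number of randomly chosen items from each band. Real systems use much finer-grained item statistics; the bank contents, the three bands and the function name `build_test` are all hypothetical.

```python
import random

# Hypothetical item bank: item id -> difficulty band (1 = easy, 3 = hard).
ITEM_BANK = {
    "V01": 1, "V02": 1, "V03": 1,
    "V04": 2, "V05": 2, "V06": 2,
    "V07": 3, "V08": 3, "V09": 3,
}

def build_test(bank, items_per_band, rng):
    """Draw the same number of items at random from every difficulty band."""
    bands = {}
    for item_id, band in bank.items():
        bands.setdefault(band, []).append(item_id)
    test = []
    for band in sorted(bands):
        # sample() picks distinct items, so no item repeats within a test.
        test.extend(rng.sample(bands[band], items_per_band))
    return test

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
for candidate in ("candidate A", "candidate B"):
    print(candidate, build_test(ITEM_BANK, 2, rng))
```

Each candidate’s test has the same mix of easy, medium and hard items, so scores stay comparable, but the actual items will usually differ from one candidate to the next.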