Multifactor Authentication update: WCAG 2.2

laptop with login and password

On January 13th, 2022, a W3C Editor’s draft of the Web Content Accessibility Guidelines (WCAG) 2.2 was published on GitHub. Among several updates and new items, it includes processes for making multifactor authentication more accessible and easier to use. Systems for auto-filling are allowed, as well as copy and paste, so that one does not always have to depend on remembering passwords. Email links and text messages are included for those happy to use other applications and devices.
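As a purely illustrative example (not taken from the WCAG draft itself), the sketch below shows the kind of login form that fits this approach: the autocomplete attributes let a password manager fill in the fields, and paste is simply left enabled rather than blocked by script. The Flask wrapper, route and field names are all assumptions made for the demo.

```python
# A minimal sketch, assuming Flask is installed: a login form that supports
# password-manager auto-fill and does not block copy and paste.
from flask import Flask

app = Flask(__name__)

LOGIN_FORM = """
<form method="post" action="/login">
  <label for="email">Email</label>
  <input id="email" name="email" type="email" autocomplete="username">

  <label for="password">Password</label>
  <!-- autocomplete="current-password" lets a password manager fill this in;
       no onpaste handler is attached, so pasting from a password manager works too -->
  <input id="password" name="password" type="password" autocomplete="current-password">

  <button type="submit">Log in</button>
</form>
"""

@app.route("/login")
def login():
    return LOGIN_FORM
```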

This is welcome help for aspects of multifactor authentication that were described in a previous blog, even though the requirement is not at the hoped-for top level. However, it has been set at Level AA, so hopefully this new Success Criterion will still be offered by web services later this year. As was mentioned in August 2021, passing the check is based on overcoming what is called the cognitive function test:

“A task that requires the user to remember, manipulate, or transcribe information. Examples include, but are not limited to:

  • memorization, such as remembering a username, password, set of characters, images, or patterns. The common identifiers name, e-mail, and phone number are not considered cognitive function tests as they are personal to the user and consistent across websites;
  • transcription, such as typing in characters;
  • use of correct spelling;
  • performance of calculations;
  • solving of puzzles.” (WCAG 2.2)

It should be pointed out that this draft has yet to be approved, but June 2022 has been set as the date for publication.

As an aside, the WCAG document makes no mention of the impact of biometrics (such as facial or fingerprint recognition), which can also be used to support access to web services but are not available on all devices. These systems do not suit all users, and if passwords are not used as part of a login process, biometrics could present another type of barrier.

Time-based one-time passwords (TOTPs) can also cause problems when they have a very short period of use (30 seconds), and a person may fail to complete the action several times and then have to take a break. A January 2022 review by PC Mag UK highlighted the fact that authenticator apps can offer better security when compared to text messages (SMS). Some have desktop options that may also be more accessible.
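For anyone curious about why the 30-second limit is so unforgiving, the sketch below (a rough Python illustration of RFC 6238, not any particular authenticator app’s code) shows how a TOTP is derived from a shared secret and the current 30-second time window; as soon as the window rolls over, the code changes and the user has to start again. The secret shown is a made-up example.

```python
# A minimal sketch of time-based one-time password (TOTP) generation, RFC 6238.
import base64, hashlib, hmac, struct, time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // period           # same value for the whole 30-second window
    msg = struct.pack(">Q", counter)               # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                     # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # a 6-digit code that is only valid until the window ends
```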

Moving on with Transcripts

Laptop and notepad on the laps of students in a lecture

Over the years researchers have shown how it is possible to have live interactive highlighted transcripts without the character or line restrictions that captions require. This is only possible when using technology, but with many more students using tablets and mobile phones during lectures it is surprising to find how few lecture capture systems offer this option.

It has been shown that physically writing notes by hand can aid retention, and using laptops and the like in lectures means access to all the other distractions such as social media and email! However, having an interactive transcript available allows key points to be selected, and annotation improves retention for those students who find it hard to take notes, whether by hand or using technology (Wald, 2018).

Systems that also offer transcript annotation linked to the presentation slides, integrated with the ability to make personal notes alongside the synchronised text, are hard to find. Ways to correct words as you hear or see them can also be hard to come by, especially where the subject matter is complex.

As was described in our last blog, the corrections needed tend to be measured by different forms of accuracy level, whether that is the number of incorrect words, omissions or substitutions. Further work on the NLive transcript has also shown that, where English is not a first language, those manually making corrections may falter when contractions and the conditional tense are used, and if the speaker is not a fluent English speaker, corrections can take up to five times longer (according to a recent discussion held by the Disabled Students’ Commission on 6th December).

Difficulties with subject-related words have been addressed by CaptionEd with related glossaries, as is the case with many specialist course captioning offerings where companies have been employed to provide accurate outputs. Other services, such as Otter.ai and Microsoft Teams, automatically offer named-speaker options, which is also helpful.

Professor Mike Wald has produced a series of interesting figures as a possible sample of what can happen when students just see an uncorrected transcript, rather than actually listen to the lecture. This is important as not all students can hear the lecture or even attend in person or virtually. It is also often the case that the transcript of a lecture is used long after the event. The group of students he was working with six years ago found that:

  • Word Error Rate counts all errors (deletions, substitutions and insertions, in the classical scientific way used by speech scientists): WER was 22% for a 2,715-word transcript (see the sketch after this list).
  • Concept Error Rate counts errors of meaning: this was 15% assuming previous knowledge of content (i.e. ignoring errors that would be obvious if the student knew the topic) but 30% assuming no previous knowledge of content.
  • Guessed Error Rate counts errors after the student has tried to correct the transcript by ‘guessing’ whether words are wrong: there was little change in Word Error Rate, as words guessed correctly were balanced by words guessed incorrectly (i.e. correct words that the student thought were incorrect and changed).
  • Perceived Error Rate asks the student to estimate the percentage of errors: student readers’ perception of Word Error Rate varied from 30–50% overall and 11–70% for important/key words; readers thought there were more errors than there really were and so found the transcript difficult and frustrating.
  • Key Errors (i.e. errors that change meaning/understanding) were 16% of the total errors and would therefore require only 5 corrections per minute to improve Concept Error Rate from 15% to 0% (the speaking rate was 142 wpm and there were approximately 31 errors per minute), but it is important to note that this only improves the scientifically calculated Word Error Rate from 22% to 18%.
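
As a concrete illustration of the first figure, the sketch below (our own rough Python example, not the tooling used in the study) computes the classical Word Error Rate: substitutions, deletions and insertions found by edit-distance alignment, divided by the number of words in the reference transcript.

```python
# A minimal Word Error Rate sketch: word-level edit distance divided by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # 0 for a match, 1 for a substitution
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # 2 errors / 6 words ≈ 0.33
```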

This is such an important challenge for many universities and colleges at the moment, so to follow on from this blog you may be interested to catch up with the transcript of the Disabled Students’ Commission roundtable debate held on 6th December. One of the summary comments highlighted the importance of getting the technology right as well as providing manual support, but overriding all of this was the importance of listening to the student voice.

Finally, if you ever wonder why speech recognition for automated captioning and transcription still fails to work for us all, have a look at a presentation by Speechmatics about AI bias, inclusion and diversity in speech recognition. It is an interesting talk about word error rates, AI, and building models from many hours of audio with different phonetic structures, to develop language models that are more representative of the voices heard across society.

Guidance for captioning rich media from Advance HE (26/02/2021)

Transcripts from Captions?

young person looking at computer for online learning

The subject of automatic captioning continues to be debated, but Gerald Ford Williams has produced a really helpful “guide to the visual language of closed captions and subtitles” on UX Collective as a “user-centric guide to the editorial conventions of an accessible caption or subtitle experience.” It has a series of tips with examples and several very useful links at the bottom of the page for those adding captions to videos. There is also a standard for the presentation of different types of captions across multimedia, ISO/IEC 20071-23:2018(en).

However, in this article transcripts also need further discussion, as they are often used as notes gathered from a presentation, whether as a result of lecture capture or of an online conference with automatic captioning. They may be copied from the side of the presentation, downloaded after the event, or presented to the user as a file in PDF, HTML or text format depending on the system used. Some automated outputs provide notification of speaker changes and timings, but there are no hints as to content accuracy prior to download.

The problem is that there also seem to be many different ways to measure the accuracy of automated captioning processes, which in many cases become transcriptions. When discussing caption quality, 3PlayMedia suggests that there is a standard, saying: “The industry standard for closed caption accuracy is 99% accuracy rate. Accuracy measures punctuation, spelling, and grammar. A 99% accuracy rate means that there is a 1% chance of error or a leniency of 15 errors total per 1,500 words”.
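To make that arithmetic concrete, here is a small worked example (ours, not 3PlayMedia’s) showing how many errors a given accuracy rate allows in a transcript of a given length, alongside the 60–80% range quoted later in this piece for raw automated captioning.

```python
# How many errors a given accuracy rate permits in a transcript of a given length.
def errors_allowed(word_count: int, accuracy: float) -> int:
    return round(word_count * (1 - accuracy))

print(errors_allowed(1500, 0.99))  # 15 errors: the "industry standard" leniency
print(errors_allowed(1500, 0.80))  # 300 errors at the top of the automated range
print(errors_allowed(1500, 0.60))  # 600 errors at the bottom of the automated range
```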

The author of the 3PlayMedia article goes on to illustrate many other aspects of ‘quality’ that need to be addressed, but the lack of detailed standards for the range of quality checks means that comparisons between the various offerings are hard to achieve. Users are often left with several other types of error besides punctuation, spelling and grammar. The NLive project team have been looking into these challenges when considering transcriptions rather than captions, and have begun to collect a set of additional issues likely to affect understanding. So far, the list includes:

  • Number of extra words added that were not spoken
  • Number of words changed affecting meaning – more than just grammar.
  • Number of words omitted
  • Contractions, e.g. ‘he is’ becomes ‘he’s’ and ‘do not’ becomes ‘don’t’, while ‘I’d’ could have three different meanings: ‘I had’, ‘I would’ or ‘I should’!

The question is whether these checks could be carried out automatically to support collaborative manual checking when correcting transcriptions.
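
As a possible starting point, the sketch below (our own illustration, not the NLive project’s code) uses Python’s standard difflib module to align an automated transcript against a corrected one and count extra, omitted and changed words, mirroring the categories in the list above.

```python
# Count extra, omitted and changed words by aligning an automated transcript
# against a corrected reference. difflib is in the Python standard library.
import difflib

def error_counts(corrected: str, automated: str) -> dict:
    ref, hyp = corrected.split(), automated.split()
    counts = {"extra": 0, "omitted": 0, "changed": 0}
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if op == "insert":        # words in the automated output that were never spoken
            counts["extra"] += j2 - j1
        elif op == "delete":      # spoken words the automated output missed
            counts["omitted"] += i2 - i1
        elif op == "replace":     # words changed, e.g. a wrongly expanded contraction
            counts["changed"] += max(i2 - i1, j2 - j1)
    return counts

print(error_counts("do not forget the lab report", "don't forget the lab report please"))
# {'extra': 1, 'omitted': 0, 'changed': 2}
```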

Below is a sample of the text we are working on, taken from an interview, to demonstrate the differences between three commonly used automatic captioning systems for videos.

Sample 1

So stuck. In my own research, and my own teaching. I’ve been looking at how we can do the poetry’s more effectively is one of the things so that’s more for structuring the trees, not so much technology, although technology is possible

Sample 2

so starting after my own research uh my own teaching i’ve been looking at how we can do laboratories more effectively is one of the things so that’s more for structuring laboratories not so much technology although technology is part of the laboratory

Sample 3

so stop. In my own research on my own teaching, I’ve been looking at how we can do the ball trees more effectively. Is one thing, so that’s more for structuring the voluntary is not so much technology, although technology is part little bar tree

Having looked at the sentences presented in transcript form, Professor Mike Wald pointed out that Rev.com (who provide automated and human transcription services) state that we should not “try to make captions verbatim, word-for-word versions of the video audio. Video transcriptions should be exact replications, but not captions.” The author of the article “YouTube Automatic Captions vs. Video Captioning Services” highlights several issues with automatic closed captioning and the reasons humans offer better outcomes. Just in case you want to learn more about the difference between a transcript and closed captions, 3PlayMedia wrote about the topic in August 2021 in “Transcription vs. Captioning – What’s the Difference?”.

Collaboration and Captioning

headphones

Over the years captioning has become easier for the non-professional, with guidance on many platforms including YouTube and a blog from Amara about “Free Tools & Subtitle Software to Make Your Video Captioning Process Easier”. This does not mean that we are good at it, nor does it mean that it does not take time! However, artificial intelligence (AI) and the use of speech recognition can help with the process.

Nevertheless, as Professor Mike Wald said only 11 months ago in an article titled “Are universities finally complying with EU directive on accessibility?”: “Finding an affordable way to provide high quality captions and transcripts for recordings is proving very difficult for universities and using automatic speech recognition with students correcting any errors would appear to be a possible solution.”

The idea that there is the possibility of collaboration to improve the automated output at no cost is appealing, and we saw it happening with Synote over ten years ago! AI alongside the use of speech recognition has improved accuracy, and Verbit advertises its offering as “99%+ Accuracy”, but sadly does not provide prices on its website!

Meanwhile Blackboard Collaborate, as part of its ‘Ultra experience’, offers attendees the chance to collaborate on captioning when managed by a moderator, although at present Blackboard Collaborate does not include automated live captioning. There are many services that can be added to online video meeting platforms to support captioning, such as Otter.ai. Developers can also make use of options from Google, Microsoft, IBM, and Amazon. TechRepublic describes five speech recognition apps that auto-caption videos on mobiles. Table 1 shows the options available in three platforms often used in higher education.

| Caption options | Zoom | Microsoft Teams | Blackboard Collaborate |
| --- | --- | --- | --- |
| Captions – automated | Yes | Yes | Has to be added |
| Captions – live manual correction | When set up | When set up | When set up |
| Captions – live collaborative corrections | No | No | No |
| Captions – text colour adaptations | Size only | Some options | Set sizes |
| Caption window resizing | No | Suggested, not implemented | Set sizes |
| Compliance – WCAG 2.1 AA | Yes | Yes | Yes |

Table 1. Please forgive any errors made with the entries – not all versions offer the same options.
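
For developers who want to build transcription into their own tools, the cloud services mentioned above need only a few lines of code. The sketch below is a hedged illustration using Google’s Cloud Speech-to-Text Python client (the google-cloud-speech package); the bucket path, sample rate and language are made-up example settings, and authentication credentials have to be configured separately.

```python
# A minimal sketch of transcribing a short recording with Google Cloud Speech-to-Text.
# Assumes `pip install google-cloud-speech` and that application credentials are set up.
from google.cloud import speech

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri="gs://example-bucket/lecture-clip.wav")  # example path
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-GB",
    enable_automatic_punctuation=True,
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)  # best guess for each recognised segment
```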

It is important to ask, if automated live captioning is used with collaborative manual intervention, who is checking the errors? Automated captioning is only around 60–80% accurate, depending on content complexity, audio quality and speaker enunciation. Even 3PlayMedia, in an article on “The Current State of Automatic Speech Recognition”, admits that human intervention is paramount when total accuracy is required.

Recent ‘Guidance for captioning rich media’ for Advance HE highlights the fact that the Web Content Accessibility Guidelines 2.1 (AA) require “100% accurate captioning as well as audio description.” The authors acknowledge the cost entailed, but perhaps this can be reduced with the increasing accuracy of automated processes in English, with error correction completed by expert checks. It also seems to make sense to require those who have knowledge of a subject to take more care when the initial video is created! This is suggested alongside the Advance HE good practice bullet points, such as:

“…ensure the narrator describes important visual content in rich media. The information will then feature in the captions and reduces the need for additional audio description services, benefiting everyone.”

Let’s see how far we can go with these ideas – expert correctors, proficient narrators and willing student support!

Authentication Types: what do they mean?

iris biometric scanning

You might have wondered what all those authentication types mentioned in our last blog actually mean. Some are well known, but a few are new, so it seemed to make sense to try to give each one a definition or explanation drawn from the many sites that have this information! The result is a somewhat random collection of links. They may not be the best available, and are certainly not academically based or tried and tested, but here goes:

Knowledge: Something a person knows

  • Password – a string of characters that allows access to a computer system or service.
  • PIN – A personal identification number (PIN), or sometimes redundantly a PIN number, is a numeric (sometimes alpha-numeric) passcode used in the process of authenticating a user accessing a system.
  • Knowledge-based challenge questions – Knowledge-based authentication (KBA) is an authentication scheme in which the user is asked to answer at least one “secret” question.
  • Passphrase – A passphrase is a longer string of text that makes up a phrase or sentence.
  • Memorised swiping path – laying your finger on a screen and moving in any direction that covers the memorised characters.

Possession: Something a person has

  • Possession of a device evidenced by a one-time password (OTP) generated by, or received on, a device – “The password or numbers sent to for instance a phone expire quickly and can’t be reused.”
  • Possession of a device evidenced by a signature generated by a device – “hardware or software tokens generate a single-use code to use when accessing a platform.”
  • Card or device evidenced by QR code scanned from an external device – “Quick Response (QR) code used to authenticate online accounts and verify login details via mobile scan or special device.”
  • App or browser with possession evidenced by device binding – “a security chip embedded into a device or private key linking an app to a device, or the registration of the web browser linking a browser to a device”
  • Card evidenced by a card reader – “physical security systems to read a credential that allows access through access control points.”
  • Card with possession evidenced by a dynamic card security code – “Instead of having a static three- or four-digit code on the back or front of the card, dynamic CVV technology creates a new code periodically.”

Inherence: Something about the person, e.g. biometrics

  • Fingerprint scanning – “When your finger rests on a surface, the ridges in your fingerprints touch the surface while the hollows between the ridges stand slightly clear of it. In other words, there are varying distances between each part of your finger and the surface below. A capacitive scanner builds up a picture of your fingerprint by measuring these distances.”
  • Voice recognition – “Voice and speech recognition are two separate biometric modalities…By measuring the sounds a user makes while speaking, voice recognition software can measure the unique biological factors that, combined, produce [the] voice.”
  • Hand & face geometry – A biometric that identifies users from the shape of their hands and in the case of Google’s Media Pipe face identification it is complex network of 3D facial keypoints using artificial intellingence etc to analyse the results.
  • Retina & iris scanning – “both ocular-based biometric identification technologies… no person has the same iris or retina pattern”
  • Keystroke dynamics – …”keystroke dynamics don’t require an active input. Instead, keystroke dynamics analyzes the typing patterns of users; this can include typing rhythms, frequent mistakes, which shift keys they use for capitalization and pace.”
  • Angle at which device is held – “the exact angle a user holds the phone as a means of making replay attacks a lot more difficult.”

There has been a debate about which of the above should be considered under the various headings and acceptable as part of a secure multifactor authentication system. If you are interested in these processes and want more information, it may be worth reading the Opinion of the European Banking Authority on the elements of strong customer authentication under PSD2. By the way, PSD2 means ‘Payment Services Directive 2’, and the UK will be following the directive, but there is an extension for UK e-commerce transactions.

However, in the meantime many organisations other than banks, shopping sites and those that hold personal data have asked users to consider multifactor authentication, including the NLive project lead and the University of Southampton, which has some helpful instructions.