Persona – Eum, a would-be computer scientist who needs magnification

Eum

Full-face view of Eum, who is of Asian origin, with dark hair and eyes

Age: 21 years

Course: Computer Science 1st Year

Hobbies: coding, cooking and travelling

Background:

Eum developed glaucoma, which means that as his peripheral vision has deteriorated and his central vision has become foggy, he spends more time zooming in to read text rather than zooming out. He has had to depend more and more on assistive technologies to help with his computer science studies, having given up his original plan to become a chef. Eum loves cooking, but the stress of coping with lots of unexpected obstacles in a restaurant kitchen became too challenging. As time has passed he has also found that he does not always recognise faces or signs in the distance. He tends to trip on kerbs or steps, especially when it gets dark or the lighting is low; however, this has not stopped his love of exploring new places when on holiday with family.

Reading can be difficult without careful, targeted magnification, so he sets up his two computer screens with good contrast levels for black-on-white text or other high colour contrast options, whilst trying to reduce reflections from windows and lights. Because of his reduced field of vision and the need to focus closely on individual sections of text, he sometimes finds it easier to move the text in front of his eyes rather than scanning across it. Time is often an issue, as assignments tend to take longer to complete when he can only read a few letters in one glance. Eum uses a combined magnification and screen-reading program on a Windows machine, so that he can access various arrow-head, crosshair and line cursor sizes with focus tracking, which means that items are magnified as he moves across and down the screen. A high-visibility keyboard helps, as does his portable magnification stand for reading paper-based materials.

Although Eum uses audiobooks on his tablet for listening to novels, he prefers to read academic papers and notes on his computer, where he can add annotations. Messaging and emails on his smartphone are quicker with speech recognition and text-to-speech output, although Eum does not depend totally on screen reader technology for navigation. He has learnt over time where items are to be found and is very disciplined about how he personalises his desktop and filing system on all his devices. As a computer scientist in the making, Eum has become adept at changing browser settings with ad blockers and other extensions, but this does not compensate for the clutter he finds on many websites. He becomes frustrated when developers fail to realise the importance of avoiding overlaps and disappearing content when sites are magnified. He feels that scrolling horizontally should not be necessary on websites, especially when filling in forms, as this is a particular challenge with text fields or modal windows that go missing or where there is no logical order to the layout.

Main Strategies to overcome Barriers to Access

Controls are labelled
Keyboard Access for navigation
Multifactor authentication
Accessible Error or Status Messages
Image alt text descriptions
Consistent Heading order
Audio Descriptions
Document accessibility

Multifactor authentication for password verification. Eum usually manages the initial password and the use of mobile authenticator apps, SMS or knowledge-based challenge questions better than the grid-style image or CAPTCHA options. Time can be a real problem for Eum if he has to cope with different types of authentication where items disappear before he has had a chance to memorise or note down the code. He cannot use retina-scan technology, but fingerprint and speech recognition work well. (Web Content Accessibility Guidelines (WCAG) 2.2 Success Criterion 3.3.7 Accessible Authentication)

Keyboard Access for navigation helps Eum, as he has memorised many shortcuts when surfing the web and finds it easier than using a mouse, which requires him to track the arrow head or cursor. He depends on web pages appearing and operating in predictable ways (W3C WCAG Predictable: Understanding Guideline 3.2).

Allowing content to reflow on zoom. Eum reads magnified content using large fonts, which means a page has to reflow vertically (as a single column) and remain fully legible and logical without the need for horizontal scrolling (W3C WCAG 1.4.4 Resize text (Level AA)). User agents that satisfy UAAG 1.0 Checkpoint 4.1 allow users to configure text scale.
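For anyone wanting a quick way to spot reflow problems, the sketch below is a rough check, assuming it is pasted into the browser console of the page under test (TypeScript with DOM types, or plain JavaScript once the type annotations are removed): it lists elements that spill past the viewport edge and so force horizontal scrolling when content is magnified or the window is narrowed.

```typescript
// Rough reflow check: list elements that extend past the viewport edge and
// would force horizontal scrolling once content is magnified or narrowed.
function findHorizontalOverflow(): HTMLElement[] {
  const viewportWidth = document.documentElement.clientWidth;
  const offenders: HTMLElement[] = [];
  document.querySelectorAll<HTMLElement>("body *").forEach((el) => {
    const rect = el.getBoundingClientRect();
    if (rect.right > viewportWidth + 1 || rect.left < -1) {
      offenders.push(el); // element reaches beyond the visible column
    }
  });
  return offenders;
}

console.log(findHorizontalOverflow().map((el) => el.tagName));
```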

Maintain logical order of Form Controls. Eum benefits from a logical order of form controls, with labels close to the fields to which they relate, so that the flow is vertical on magnification rather than horizontal. Dividing long or complex forms into small sections with clear headings really helps. However, careful checks need to be made on the borders of buttons and other controls so that they do not overlap other controls or disappear from the screen at high levels of magnification.
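As a rough illustration of this layout pattern, the TypeScript sketch below (with made-up field names) associates each label explicitly with its control, places the label directly above the field and groups related fields under a clear legend, so the reading and tab order stays vertical under magnification.

```typescript
// Build a short, clearly headed form section: label first, control directly
// below it, so magnified reading flows vertically rather than horizontally.
function addLabelledField(section: HTMLElement, id: string, labelText: string): void {
  const label = document.createElement("label");
  label.htmlFor = id; // explicit label/control association
  label.textContent = labelText;

  const input = document.createElement("input");
  input.id = id;
  input.type = "text";

  const wrapper = document.createElement("div");
  wrapper.append(label, input);
  section.appendChild(wrapper);
}

const contactDetails = document.createElement("fieldset");
const legend = document.createElement("legend");
legend.textContent = "Contact details"; // clear heading for a small section
contactDetails.appendChild(legend);

addLabelledField(contactDetails, "full-name", "Full name");
addLabelledField(contactDetails, "email", "Email address");
document.querySelector("form")?.appendChild(contactDetails);
```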

Controls need to have good contrast, shape and colour. Text needs to be distinct. The importance of contrast levels for all aspects of a service cannot be stressed too highly, and yet it is often one of the items that fails accessibility checks. WebAIM provides clear guidance in Understanding WCAG 2 Contrast and Color Requirements. It is also vital that text and images of text are distinct from their surroundings (WCAG SC 1.4.3 Contrast – Minimum).
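For developers who want to check this programmatically, here is a minimal sketch of the WCAG contrast-ratio calculation (the colours are illustrative 0–255 sRGB triples):

```typescript
// Relative luminance and contrast ratio as defined by WCAG 2.
function relativeLuminance([r, g, b]: [number, number, number]): number {
  const linear = [r, g, b].map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2];
}

function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Black text on a white background gives 21:1, well above the 4.5:1 minimum of SC 1.4.3.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1));
```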

Elements should have more than one visual cue. Eum finds it really helps, when he is selecting features on a web page, to have icons and colour as well as text, such as when there is an alert. Underlined text should only be used for hyperlinks. Links and other actionable elements must be clearly distinguishable.

Changing Content without warning. Because Eum uses assistive technology, whether it is his magnification software or speech recognition, he finds that any visual changes to the interface, or actions that happen without the page refreshing, can cause problems. He needs to be notified about the changes (W3C WCAG 2.0 Name, Role, Value: Understanding SC 4.1.2), and there is also a need to understand Success Criterion 4.1.3: Status Messages.
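As a rough sketch of how such a notification can be exposed to assistive technology (the element and message here are only illustrative), a status region can be updated when content changes without a page refresh:

```typescript
// A polite status region: assistive technology announces new text placed in it
// without moving the user's focus or refreshing the page.
const status = document.createElement("div");
status.setAttribute("role", "status"); // equivalent to aria-live="polite"
document.body.appendChild(status);

function announce(message: string): void {
  status.textContent = message; // the changed text is what gets announced
}

// e.g. after content has been added to the page without a refresh:
announce("5 new results have loaded below the search box.");
```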

Responsive Design for tablet use. Eum likes to use his tablet, but has to depend on its built-in access technologies, on the responsive design for accessibility provided by the developers of web services, and on their customisable options. He relies on a consistent layout throughout a web service, with sufficient space between interactive items such as buttons. The A11Y Project's mobile and touch checklist for accessibility is a useful reference.

Captions, Audio Descriptions and use of transcripts. Videos can be very tiring to watch, so Eum uses captions with high contrast colours to make them stand out against the background of the video images. He uses highlighter tools to mark key points in the transcripts and depends on audio descriptions if scenes are not well described in the video's commentary.

Document accessibility. Whether a document is online, a PDF download or in another format, Eum needs to make changes to suit his narrow field of vision. WCAG's Understanding Success Criterion 1.4.8: Visual Presentation has some useful guidance. WebAIM also has an easy-to-follow set of instructions for making ‘Accessible Documents: Word, PowerPoint, & Acrobat’.

Key points from Eum

“Allow for magnification with zooming in to read text in small chunks, taking care to make visual presentation as logical as possible for vertical scrolling preferably in one column layout”

“Designing and Coding for Low Vision”, featuring Mallory van Achterberg, is a really helpful YouTube video in which Mallory, a coder who uses magnification, says “If I cannot see it, don’t let me tab to it”, along with lots of other helpful comments.

There is a useful document called “Accessibility Requirements for People with Low Vision”, created as a W3C Editor’s Draft (04 November 2021).

Multifactor Authentication – Cognitive overload?

Smartphone face recognition, code, fingerprint and numeric password screens.

Recently the FBI warned that cyber thieves are finding ways around multifactor authentication protections. We may not own crypto accounts, but universities across the UK have implemented forms of two- or multi-factor authentication to protect our accounts, and we are grappling with strategies to ensure accessible forms of verification are available. The W3C WCAG Cognitive and Learning Disabilities Task Force has published an Accessible Authentication (User Story) to illustrate the need for the cognitive function test mentioned in the previous blog. It also highlights how memory impairments and difficulties with executive function can make MFA a challenge, as can time constraints.

There was a Twitter thread in February 2021 that highlighted more reasons why some of the MFA choices offered may not be helpful to individuals with autism and Attention Deficit Hyperactivity Disorder (ADHD). Devon Price’s insight into the frustrating nature of these extra levels of security illustrates the cognitive overload that can occur when several tasks have to be completed across multiple devices or systems.

key fob

If you do not have an external code device, the choices tend to centre around the use of a personal mobile phone. Having checked the websites of 50 universities, it appeared that 42 (84%) advised students to use an authenticator app – usually Microsoft's, which also offers a Self Service Password Reset. Other options were the Authy, Google, Sophos or Duo apps. All the university systems still require the user to remember passwords alongside the extra verification, and then encourage a backup option via SMS or a phone call, with some mentioning a landline or email. Only 4 universities offered a choice of two authentication apps, and just 7 mentioned the use of a fob or external device, although one said this would not work with Virtual Private Networks. A preference for Authy as an alternative was mentioned in a question about the Microsoft Authenticator, as Authy can be used on a desktop computer. At the time of writing, the Microsoft Authenticator instructions do not mention a desktop verification option for their MFA.

Some universities in this small study used the Microsoft instructions, but when searching for support it took well over three clicks to find out about the authentication options offered by 14 of the 50 universities (28%), and 9 of these websites either had no information or required a login. This meant that a new student might have no way of preparing for this aspect of registration, although all of the websites had good connections to their student support or IT services.

Only one university appeared to depend on a memorable word for verification. The use of authenticator apps usually meant that the code could be generated without an internet connection or a connection to a mobile network provider, although this does not mean that copying a code results in successful verification on every occasion.

Where concentration or attention is an issue, as may happen with ADHD, the problem of copying codes from one device to another can become worse as more attempts are made. Too many tries lead to lockout, desperation and yet another feeling of failure, not to mention wasted time and other more severe consequences if the actions are related to banking. Actions that involve recognition and copying qualify under the W3C WCAG cognitive function test as requiring an alternative method.

Using biometrics is often considered a good alternative, but it is not always easy to get facial recognition to work on phones if you are blind or have dexterity difficulties, and some individuals really do not like sharing their facial image. The UK National Cyber Security Centre admits that fingerprints are not always recognised if people have worked in certain industries or are elderly, and even a skin condition can cause problems. Also, not all devices offer fingerprint recognition.

In summary, Making Content Usable for People with Cognitive and Learning Disabilities suggests four options:

  1. Web Authentication: An API for accessing Public Key Credentials [webauthn-2] (a minimal sketch follows this list).
  2. Single Sign-on (SSO), which allows users to access many sites with a single login (federated login).
  3. Two-step authentication with Bluetooth links (no copying).
  4. Quick Response Codes (QR codes).
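As a rough illustration of the first option, the sketch below calls the browser's Web Authentication API; the relying party and user details are placeholders and, in a real service, the challenge and user id would come from the server.

```typescript
// Registering a credential with the Web Authentication API: the browser and
// authenticator handle verification, so there is nothing to memorise or copy.
async function registerCredential(): Promise<Credential | null> {
  const publicKey: PublicKeyCredentialCreationOptions = {
    challenge: crypto.getRandomValues(new Uint8Array(32)), // placeholder; server-supplied in practice
    rp: { name: "Example University" },
    user: {
      id: crypto.getRandomValues(new Uint8Array(16)),      // placeholder user handle
      name: "student@example.ac.uk",
      displayName: "Example Student",
    },
    pubKeyCredParams: [{ type: "public-key", alg: -7 }],   // ES256
    authenticatorSelection: { userVerification: "preferred" },
  };
  return navigator.credentials.create({ publicKey });
}
```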

The use of authentication apps with a set-up offering five additional choices was provided by 6 of the 50 universities reviewed, so they went beyond the above options, as well as offering a helpline. When checking the helplines, it became clear that there were often rather a lot of questions surrounding the Microsoft Authenticator, as evidenced by the University of Hertfordshire’s comprehensive set of answers. So it appears there is still much to do to make the process more inclusive.

Multifactor Authentication update WCAG 2.2

laptop with login and password

On January 13th, 2022, a W3C Editor’s Draft of the Web Content Accessibility Guidelines (WCAG) 2.2 was published on GitHub. Among several updates and new items, it includes processes for making multifactor authentication more accessible and easier to use. Systems for auto-filling are allowed, as well as copy and paste, so that one does not always have to depend on remembering passwords. Email links and text messages are included for those happy with using other applications and devices.
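As a small sketch of what that means in practice (the field below is hypothetical), a verification-code input can be left open to auto-fill and pasting rather than blocking them:

```typescript
// A one-time-code field that cooperates with auto-fill and paste.
const codeInput = document.createElement("input");
codeInput.type = "text";
codeInput.inputMode = "numeric";                            // numeric keypad on mobiles
codeInput.setAttribute("autocomplete", "one-time-code");    // lets the browser/OS offer the received code
codeInput.setAttribute("aria-label", "Verification code");
// Deliberately no paste-blocking handler: copy and paste must keep working.
document.querySelector("form")?.appendChild(codeInput);
```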

This is welcome help for aspects of multifactor authentication that were described in a previous blog, even though the requirement is not at the hoped-for top level. However, it has been set at Level AA, so hopefully this new Success Criterion will still be offered by web services later this year. As was mentioned in August 2021, passing the check is based on overcoming what is called the cognitive function test:

“A task that requires the user to remember, manipulate, or transcribe information. Examples include, but are not limited to:

  • memorization, such as remembering a username, password, set of characters, images, or patterns. The common identifiers name, e-mail, and phone number are not considered cognitive function tests as they are personal to the user and consistent across websites;
  • transcription, such as typing in characters;
  • use of correct spelling;
  • performance of calculations;
  • solving of puzzles. ” (WCAG 2.2)

It should be pointed out that this draft has yet to be approved, but the working group has set June 2022 as the target date for publication.

As an aside, the WCAG document makes no mention of the impact of biometrics (such as facial or fingerprint recognition), which can also be used to support access to web services but are not available on all devices. These systems do not suit all users, and if they replace passwords as part of a login process they could present another type of barrier.

Time-based one-time passwords (TOTPs) can also cause problems, as they have a very short period of use (30 seconds), and a person may fail to complete the action several times and then have to take a break. A January 2022 review by PC Mag UK highlighted the fact that authenticator apps can offer better security when compared to text messages (SMS). Some have desktop options that may also be more accessible.
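To show where that 30-second pressure comes from, here is a minimal sketch of how a time-based one-time password is generated (Node.js TypeScript; the shared secret is a placeholder, where a real authenticator app would use the secret from the QR-code set-up step):

```typescript
import { createHmac } from "node:crypto";

// RFC 6238 TOTP: HMAC over a 30-second time counter, then dynamic truncation.
function totp(secret: Buffer, digits = 6, stepSeconds = 30, now = Date.now()): string {
  const counter = Math.floor(now / 1000 / stepSeconds);
  const counterBuf = Buffer.alloc(8);
  counterBuf.writeBigUInt64BE(BigInt(counter)); // 8-byte big-endian counter

  const hash = createHmac("sha1", secret).update(counterBuf).digest();
  const offset = hash[hash.length - 1] & 0x0f;  // dynamic truncation (RFC 4226)
  const binary =
    ((hash[offset] & 0x7f) << 24) |
    (hash[offset + 1] << 16) |
    (hash[offset + 2] << 8) |
    hash[offset + 3];

  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// The same code is only valid within one 30-second window, which is exactly
// where the time pressure described above comes from.
console.log(totp(Buffer.from("placeholder-shared-secret")));
```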

Moving on with Transcripts

Laptop and notepad on the laps of students in a lecture

Over the years researchers have shown that it is possible to have live, interactive, highlighted transcripts without the character or line restrictions that captions require. This is only possible when using technology, but with many more students using tablets and mobile phones during lectures it is surprising to find how few lecture capture systems offer this option.

It has been shown that physically writing notes by hand can aid retention, and using laptops etc. in lectures means there is access to all the other distractions such as social media and email! However, having an interactive transcript available allows key points to be selected, and annotation improves retention for those students who find it hard to take notes, whether by hand or using technology (Wald, 2018).

Systems that offer transcript annotation linked to the presentation slides, integrated with the ability to make personal notes alongside the synchronised text, are hard to find. Ways to correct words as you hear or see them can also be hard to come by, especially where the subject matter is complex.

As was described in our last blog, the corrections needed tend to be measured by different forms of accuracy level, whether that is the number of incorrect words, omissions or substitutions. Further work on the NLive transcript has also shown that, where English is not a first language, those manually making corrections may falter when contractions and conditional tenses are used, and if the speaker is not a fluent English speaker corrections can take up to five times longer (according to a recent discussion held by the Disabled Students’ Commission on 6th December).

Difficulties with subject-related words have been addressed by CaptionEd with subject glossaries, as is the case with many specialist course captioning offerings where companies have been employed to provide accurate outputs. Other services, such as Otter.ai and Microsoft Teams, automatically offer named-speaker options, which is also helpful.

Professor Mike Wald has produced a series of interesting figures as a possible sample of what can happen when students just see an uncorrected transcript, rather than actually listen to the lecture. This is important as not all students can hear the lecture or even attend in person or virtually. It is also often the case that the transcript of a lecture is used long after the event. The group of students he was working with six years ago found that:

  • Word Error Rate (WER) counts all errors (deletions, substitutions and insertions, in the classical way used by speech scientists; a short sketch of the calculation follows this list): WER was 22% for a 2,715-word transcript.
  • Concept Error Rate counts errors of meaning: this was 15% assuming previous knowledge of the content (i.e. ignoring errors that would be obvious if the student knew the topic) but 30% assuming no previous knowledge of the content.
  • Guessed error rate counts errors AFTER the student has tried to correct the transcript by ‘guessing’ whether words have errors or not: there was little change in Word Error Rate, as words guessed correctly were balanced by words guessed incorrectly (i.e. correct words that the student thought were incorrect and changed).
  • Perceived error rate asks the student to estimate the percentage of errors: student readers’ perception of Word Error Rate varied from 30%–50% overall and 11%–70% for important/key words; readers thought there were more errors than there really were and so found the transcript difficult and frustrating to use.
  • Key Errors (i.e. errors that change meaning/understanding) were 16% of the total errors and would therefore only require 5 corrections per minute to improve the Concept Error Rate from 15% to 0% (the speaking rate was 142 wpm and there were approximately 31 errors per minute), but it is important to note that this only improves the scientifically calculated Word Error Rate from 22% to 18%.
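As promised above, here is a minimal sketch of how a Word Error Rate of this kind can be calculated, using a standard edit-distance alignment between the reference (what was actually said) and the automated transcript:

```typescript
// WER = (substitutions + deletions + insertions) / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);

  // dp[i][j] = minimum edits to turn the first i reference words
  // into the first j transcript words.
  const dp = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const substitutionCost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                    // deletion
        dp[i][j - 1] + 1,                    // insertion
        dp[i - 1][j - 1] + substitutionCost  // substitution or match
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}

console.log(wordErrorRate(
  "we can do laboratories more effectively",
  "we can do the poetry's more effectively"
));
```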

This is such an important challenge for many universities and colleges at the moment, so to follow on from this blog you may be interested to catch up with the transcript of the Disabled Students’ Commission roundtable debate held on 6th December. One of the summary comments highlighted the importance of getting the technology right as well as providing manual support, but overriding it all was the importance of listening to the student voice.

Finally, if you ever wonder why speech recognition for automated captioning and transcription still fails to work for us all, have a look at a presentation by Speechmatics about AI bias, inclusion and diversity in speech recognition. It is an interesting talk about using word error rates, AI and building models from many hours of audio with different phonetic structures, to develop language models that are more representative of the voices heard across society.

Guidance for captioning rich media from Advance HE (26/02/2021)

Transcripts from Captions?

young person looking at computer for online learning

The subject of automatic captioning continues to be debated but Gerald Ford Williams has produced a really helpful “guide to the visual language of closed captions and subtitles” on UX Collective as a “user-centric guide to the editorial conventions of an accessible caption or subtitle experience.” It has a series of tips with examples and several very useful links at the bottom of the page for those adding captions to videos. There is also a standard for the presentation of different types of captions across multimedia ISO/IEC 20071-23:2018(en).

However, in this article transcripts also need further discussion, as they are often used as notes gathered from a presentation, whether as a result of lecture capture or an online conference with automatic captioning. They may be copied from the side of the presentation, downloaded after the event or presented to the user as a file in PDF, HTML or text format, depending on the system used. Some automated outputs provide notification of speaker changes and timings, but there are no hints as to content accuracy prior to download.

The problem is that there seem to be many different ways to measure the accuracy of automated captioning processes, which in many cases become transcriptions. When discussing caption quality, 3PlayMedia suggests that there is a standard, saying “The industry standard for closed caption accuracy is 99% accuracy rate. Accuracy measures punctuation, spelling, and grammar. A 99% accuracy rate means that there is a 1% chance of error or a leniency of 15 errors total per 1,500 words”.

The author of the 3PlayMedia article goes on to illustrate many other aspects of ‘quality’ that need to be addressed, but the lack of detailed standards for the range of quality checks means that comparisons between the various offerings are hard to achieve. Users are often left with several other types of errors besides punctuation, spelling and grammar. The Nlive project team have been looking into these challenges when considering transcriptions rather than captions and have begun to collect a set of additional issues likely to affect understanding. So far, the list includes:

  • Number of extra words added that were not spoken
  • Number of words changed affecting meaning – more than just grammar.
  • Number of words omitted
  • Contractions, e.g. “he is” becomes “he’s”, “do not” becomes “don’t”, and “I’d” could have three different meanings: “I had”, “I would” or “I should”!

The question is whether these checks could be carried out automatically to support collaborative manual checks when correcting transcriptions.
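As a very rough first pass at such a check (a sketch rather than a finished tool), the automated output could be compared with a corrected transcript to flag candidate omitted and extra words for collaborative review; changed words show up as one omission plus one extra word, and pairing them up properly needs an alignment like the WER sketch earlier.

```typescript
// Compare word counts in a corrected transcript and the automated output to
// flag candidate omitted and extra words for reviewers to check.
function countWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

function flagCandidates(corrected: string, automated: string) {
  const ref = countWords(corrected);
  const hyp = countWords(automated);
  const omitted: string[] = [];
  const extra: string[] = [];
  for (const [word, n] of ref) {
    if ((hyp.get(word) ?? 0) < n) omitted.push(word);
  }
  for (const [word, n] of hyp) {
    if ((ref.get(word) ?? 0) < n) extra.push(word);
  }
  return { omitted, extra };
}

console.log(flagCandidates(
  "how we can do laboratories more effectively",
  "how we can do the poetry's more effectively"
));
```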

Below is a sample of the text we are working on, taken from an interview, to demonstrate the differences between three commonly used automatically generated captioning systems for videos.

Sample 1

“So stuck. In my own research, and my own teaching. I’ve been looking at how we can do the poetry’s more effectively is one of the things so that’s more for structuring the trees, not so much technology, although technology is possible”

Sample 2

“so starting after my own research uh my own teaching i’ve been looking at how we can do laboratories more effectively is one of the things so that’s more for structuring laboratories not so much technology although technology is part of the laboratory”

Sample 3

“so stop. In my own research on my own teaching, I’ve been looking at how we can do the ball trees more effectively. Is one thing, so that’s more for structuring the voluntary is not so much technology, although technology is part little bar tree”

Having looked at the sentences presented in transcript form, Professor Mike Wald pointed out that Rev.com (who provide automated and human transcription services) state that we should not “try to make captions verbatim, word-for-word versions of the video audio. Video transcriptions should be exact replications, but not captions.” The author of the article “YouTube Automatic Captions vs. Video Captioning Services” highlights several issues with automatic closed captioning and the reasons why humans offer better outcomes. Just in case you want to learn more about the difference between a transcript and closed captions, 3PlayMedia wrote about the topic in August 2021 in “Transcription vs. Captioning – What’s the Difference?”.

Collaboration and Captioning

headphones

Over the years captioning has become easier for the non-professional with guidance on many platforms including YouTube and a blog about “Free Tools & Subtitle Software to Make Your Video Captioning Process Easier” from Amara. This does not mean that we are good at it, nor does it mean that it does not take time! However, artificial intelligence (AI) and the use of speech recognition can help with the process.

Nevertheless, as Professor Mike Wald said only 11 months ago in an article titled Are universities finally complying with EU directive on accessibility? “Finding an affordable way to provide high quality captions and transcripts for recordings is proving very difficult for universities and using automatic speech recognition with students correcting any errors would appear to be a possible solution.”

The idea that there is the possibility of collaboration to improve the automated output at no cost is appealing, and we saw it happening with Synote over ten years ago! AI alongside the use of speech recognition has improved accuracy, and Verbit advertise their offerings as being “99%+ Accuracy”, but sadly do not provide prices on their website!

Meanwhile Blackboard Collaborate, as part of its ‘Ultra experience’, offers attendees the chance to collaborate on captioning when managed by a moderator, although at present Blackboard Collaborate does not include automated live captioning. There are many services that can be added to online video meeting platforms in order to support captioning, such as Otter.ai. Developers can also make use of options from Google, Microsoft, IBM and Amazon. TechRepublic describes 5 speech recognition apps that auto-caption videos on mobiles. Table 1 shows the options available in three platforms often used in higher education.

Caption Options | Zoom | Microsoft Teams | Blackboard Collaborate
Captions – automated | Yes | Yes | Has to be added
Captions – live manual correction | When set up | When set up | When set up
Captions – live collaborative corrections | No | No | No
Captions – text colour adaptations | Size only | Some options | Set sizes
Caption window resizing | No | Suggested, not implemented | Set sizes
Compliance – WCAG 2.1 AA | Yes | Yes | Yes

Table 1. Please forgive any errors made with the entries – not all versions offer the same options.

It is important to ask, if automated live captioning is used with collaborative manual intervention, who is checking the errors? Automated captioning is only around 60–80% accurate, depending on content complexity, audio quality and speaker enunciation. Even 3PlayMedia, in an article on “The Current State of Automatic Speech Recognition”, admits that human intervention is paramount when total accuracy is required.

The recent ‘Guidance for captioning rich media’ for Advance HE highlights the fact that the Web Content Accessibility Guidelines 2.1 (AA) require “100% accurate captioning as well as audio description.” They acknowledge the cost entailed, but perhaps this can be reduced by the increasing accuracy of automated processes in English, with error correction completed through expert checks. It also seems to make sense to ask those who have knowledge of a subject to take more care when the initial video is created! This is suggested alongside the Advance HE good practice bullet points, such as:

“…ensure the narrator describes important visual content in rich media. The information will then feature in the captions and reduces the need for additional audio description services, benefiting everyone.”

Let’s see how far we can go with these ideas – expert correctors, proficient narrators and willing student support!