Collaboration and Captioning


Over the years, captioning has become easier for non-professionals, with guidance available on many platforms including YouTube, and a blog from Amara about “Free Tools & Subtitle Software to Make Your Video Captioning Process Easier”. This does not mean that we are good at it, nor does it mean that it does not take time! However, artificial intelligence (AI) and the use of speech recognition can help with the process.

Nevertheless, as Professor Mike Wald said only 11 months ago in an article titled “Are universities finally complying with EU directive on accessibility?”: “Finding an affordable way to provide high quality captions and transcripts for recordings is proving very difficult for universities and using automatic speech recognition with students correcting any errors would appear to be a possible solution.”

The idea of collaborating to improve the automated output at no cost is appealing, and we saw it happening with Synote over ten years ago! AI alongside the use of speech recognition has improved accuracy, and Verbit advertise their offerings as having “99%+ Accuracy” but sadly do not provide prices on their website!

Meanwhile Blackboard Collaborate, as part of their ‘Ultra experience’, offers attendees the chance to collaborate on captioning when managed by a moderator, although at present Blackboard Collaborate does not include automated live captioning. There are many services that can be added to online video meeting platforms in order to support captioning. Developers can also make use of options from Google, Microsoft, IBM, and Amazon. TechRepublic describes 5 speech recognition apps that auto-caption videos on mobiles. Table 1 shows the options available in three platforms often used in higher education.

| Caption Options | Zoom | Microsoft Teams | Blackboard Collaborate |
| --- | --- | --- | --- |
| Captions – automated | Yes | Yes | Has to be added |
| Captions – live manual correction | When set up | When set up | When set up |
| Captions – live collaborative corrections | No | No | No |
| Captions Text Colour adaptations | Size only | Some options | Set sizes |
| Caption Window resizing | No | Suggested, not implemented | Set sizes |
| Compliance – WCAG 2.1 AA | Yes | Yes | Yes |
Table 1. Please forgive any errors made with the entries – not all versions offer the same options

It is important to ask: if automated live captioning is used with collaborative manual intervention, who is checking the errors? Automated captioning is only around 60 – 80% accurate depending on content complexity, quality of the audio and speaker enunciation. Even 3Play Media, in an article on “The Current State of Automatic Speech Recognition”, admits that human intervention is paramount when total accuracy is required.
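
Accuracy figures like these are usually worked out by comparing the automated captions with a corrected transcript word by word. The sketch below is only an illustration of that arithmetic, assuming accuracy is taken as one minus the word error rate; the function names and the sample sentences are made up for this post and do not come from any of the services mentioned.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from captions
                curr[j - 1] + 1,     # extra word inserted in captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero (an assumed convention)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: a corrected transcript versus automated captions.
corrected = "the lecture covers automatic speech recognition and captioning for students"
automated = "the lecture covers automatic peach recognition and caption for students"
print(f"{word_accuracy(corrected, automated):.0%}")  # prints 80%
```

Two wrong words out of ten already bring this sentence down to 80%, which shows how quickly errors add up in a lecture full of subject-specific terms.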

Recent ‘Guidance for captioning rich media’ for Advance HE highlights the fact that the Web Content Accessibility Guidelines 2.1 (AA) require “100% accurate captioning as well as audio description.” They acknowledge the cost entailed, but perhaps this can be reduced as the accuracy of automated processes in English increases and error correction is completed with expert checks. It also seems to make sense to require those who have the knowledge of a subject to take more care when the initial video is created! This is suggested alongside the Advance HE good practice bullet points such as

“…ensure the narrator describes important visual content in rich media. The information will then feature in the captions and reduces the need for additional audio description services, benefiting everyone.”

Let’s see how far we can go with these ideas – expert correctors, proficient narrators and willing student support!

Authentication Types: what do they mean?

You might have wondered what all those authentication types mentioned in our last blog actually meant. Some are well known, but a few are new, so it seemed to make sense to try to give each one a definition or explanation from the many sites that have this information! The result is a random collection of links. They may not be the best available and are certainly not academically based or tried and tested, but here goes:

Knowledge: Something a person knows

  • Password – a string of characters that allows access to a computer system or service.
  • PIN – A personal identification number (PIN), or sometimes redundantly a PIN number, is a numeric (sometimes alpha-numeric) passcode used in the process of authenticating a user accessing a system.
  • Knowledge-based challenge questions – Knowledge-based authentication (KBA) is an authentication scheme in which the user is asked to answer at least one “secret” question.
  • Passphrase – A passphrase is a longer string of text that makes up a phrase or sentence.
  • Memorised swiping path – laying your finger on a screen and moving in any direction that covers the memorised characters.

Possession: Something a person has

  • Possession of a device evidenced by one time password (OTP) generated by, or received on a device – “The password or numbers sent to for instance a phone expire quickly and can’t be reused.”
  • Possession of a device evidenced by a signature generated by a device – “hardware or software tokens generate a single-use code to use when accessing a platform.”
  • Card or device evidenced by QR code scanned from an external device – “Quick Response (QR) code used to authenticate online accounts and verify login details via mobile scan or special device.”
  • App or browser with possession evidenced by device binding – “a security chip embedded into a device or private key linking an app to a device, or the registration of the web browser linking a browser to a device”
  • Card evidenced by a card reader – “physical security systems to read a credential that allows access through access control points.”
  • Card with possession evidenced by a dynamic card security code – “Instead of having a static three- or four-digit code on the back or front of the card, dynamic CVV technology creates a new code periodically.”
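
To give a feel for how a one time password generated on a device can expire quickly, here is a minimal sketch of the time-based scheme described in RFC 6238 (built on RFC 4226), using only the Python standard library. It is an illustration rather than a description of any particular bank or app: the shared secret is the RFC test value, and the `verify` helper is a simplified assumption of our own that accepts only the current time window.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if at_time is None else at_time
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str,
           at_time: Optional[float] = None, step: int = 30) -> bool:
    """Simplified check (our assumption): accept the current window only."""
    return hmac.compare_digest(totp(secret, at_time, step), submitted)


# The shared secret below is the published RFC test value, not a real key.
secret = b"12345678901234567890"
code = totp(secret, at_time=59)              # code shown on the device
print(verify(secret, code, at_time=59))      # same 30-second window: True
print(verify(secret, code, at_time=60))      # next window, code expired: False
```

Both the device and the service derive the code from the same secret and the current time, so nothing reusable has to be sent or remembered.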

Inherence: Something about the person e.g. biometrics

  • Fingerprint scanning – “When your finger rests on a surface, the ridges in your fingerprints touch the surface while the hollows between the ridges stand slightly clear of it. In other words, there are varying distances between each part of your finger and the surface below. A capacitive scanner builds up a picture of your fingerprint by measuring these distances.”
  • Voice recognition – “Voice and speech recognition are two separate biometric modalities…By measuring the sounds a user makes while speaking, voice recognition software can measure the unique biological factors that, combined, produce [the] voice.”
  • Hand & face geometry – A biometric that identifies users from the shape of their hands or face. In the case of Google’s MediaPipe face identification, a complex network of 3D facial keypoints is used, with artificial intelligence analysing the results.
  • Retina & iris scanning – “both ocular-based biometric identification technologies… no person has the same iris or retina pattern”
  • Keystroke dynamics – “…keystroke dynamics don’t require an active input. Instead, keystroke dynamics analyzes the typing patterns of users; this can include typing rhythms, frequent mistakes, which shift keys they use for capitalization and pace.”
  • Angle at which device is held – “the exact angle a user holds the phone as a means of making replay attacks a lot more difficult.”
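
As a rough illustration of the keystroke dynamics idea in the list above, the sketch below compares the rhythm of a typed phrase with an enrolled sample. It is a toy example under our own assumptions: the function names, the timings and the 0.05 second threshold are invented for illustration, and real systems draw on many more features than the gaps between key presses.

```python
from statistics import mean


def intervals(key_times):
    """Gaps in seconds between consecutive key presses."""
    return [later - earlier for earlier, later in zip(key_times, key_times[1:])]


def rhythm_distance(enrolled_times, attempt_times):
    """Mean absolute difference between two typing rhythms of the same phrase."""
    enrolled = intervals(enrolled_times)
    attempt = intervals(attempt_times)
    if not enrolled or len(enrolled) != len(attempt):
        raise ValueError("samples need the same number of key presses (at least two)")
    return mean(abs(a - b) for a, b in zip(enrolled, attempt))


def matches_profile(enrolled_times, attempt_times, threshold=0.05):
    """Accept the attempt if its rhythm is close to the enrolled one.

    The threshold is an arbitrary value chosen for this example.
    """
    return rhythm_distance(enrolled_times, attempt_times) <= threshold


# Hypothetical key press timestamps (seconds) for the same four-key phrase.
enrolled = [0.00, 0.20, 0.35, 0.60]
same_person = [0.00, 0.22, 0.36, 0.62]
someone_else = [0.00, 0.40, 0.50, 1.00]
print(matches_profile(enrolled, same_person))   # True
print(matches_profile(enrolled, someone_else))  # False
```

Because nothing extra is asked of the user, this kind of check can run quietly alongside a password rather than replacing it.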

There has been a debate about which of the above should be considered under the various headings and acceptable as part of a secure multifactor authentication system. If you are interested in these processes and want more information it may be worth reading the Opinion of the European Banking Authority on the elements of strong customer authentication under PSD2. By the way PSD2 means ‘Payment Services Directive 2’ and the UK will be following the directive, but there is an extension for UK e-commerce transactions.

However, in the meantime many organisations other than banks, shopping sites and those that hold personal data have asked users to consider multifactor authentication, including the NLive project lead and the University of Southampton, which has some helpful instructions.

LexDis introduces its News Blog

LexDis was set up 14 years ago as a JISC project, with learner experiences becoming a series of strategies that demonstrated ways of overcoming accessibility barriers and finding innovations that support digital learning. COVID-19 has meant these types of strategies have become even more important, and technology companies have had to provide improved built-in options in settings to enhance access to their online offerings.

We have just secured an Innovate UK funded NLive project that is all about evaluating the outcomes of an “automated quality controlled human collaboratively edited closed-caption and live transmission system”. It is a mouthful as well as a challenge! There are added goals to sort out, including digital rights management issues, improving recording quality for streamed audio and videos, making use of AI and noise cancellation algorithms as well as personalising accessibility options. Lots to achieve in a year!

So at the moment we are planning a series of news blogs that will track the outcomes of our endeavours and we will be asking for help along the way!

When evaluating online services for their usability and accessibility it is important to think about how a system will be used. So when we started to think about the elements that might cause barriers we turned to experts in the field last year, then studied the guidelines and articles to build on the knowledge we had gained in the past.

Just last week (August 16th, 2021) a really interesting article by Gareth Ford Williams came to our notice, thanks to Steve Lee. It was all about “UX = Accessibility & Accessibility = UX”, where Gareth talked about evaluations seeming to ‘focus on guidelines rather than user outcomes’. I think a focus on user outcomes is what we tried to achieve with LexDis, so once again we are on that journey!

Gareth poses the following thought that we are going to hold onto as we explore ways of making it easier for students to access their online learning systems.

“If we step away from the compliance model and think of accessibility being first and foremost about people and the rich diversity we find within any audience, it starts to raise a lot of questions about what ‘good’ actually is.”

Gareth goes on to mention “10 Human Intersectional UX Obstacles within any Product or Service’s Design” and presents a series of built-in settings and strategies to support user preferences. During the coming year we will explore the challenges for an internet multimedia system and present ideas for overcoming them. Wish us luck!