The major challenge with the use of artificial intelligence (AI) is that it is often difficult to explain how AI or machine learning (ML) solutions and recommendations come to be. Previously this may not have mattered as much because AI’s use was limited and its recommendations were confined to relatively trivial decisions. In the past few decades, however, AI use has become more pervasive, and some AI solutions now influence high-stakes decisions, so the problem has become increasingly important. Fueling the urgency are findings that AI solutions can be unintentionally biased, depending on the data used to train the algorithms. For instance, the algorithm used by Amazon to hire staff was found to be biased against women because it was trained on historical data that largely comprised resumes from male applicants (Shin, 2020). Algorithms used by COMPAS (i.e., Correctional Offender Management Profiling for Alternative Sanctions) to predict the likelihood of recidivism were also found to predict that black offenders were twice as likely to reoffend as white offenders (Shin, 2020). These are only two of many instances of algorithmic bias.
These algorithms tend to come from “black box” models built on powerful neural networks that perform deep learning. Researchers usually have to rely on such networks for computer vision and image processing, which are less amenable to other AI/ML techniques. Researchers in the field of explainable AI (or XAI) have tried to shine a light into these “black boxes” in different ways. For instance, one popular way to help explain deep learning models that process images is to use heat maps or saliency maps that show the regions or pixels of a picture that most strongly influence the algorithm’s prediction (i.e., what the network is “paying attention to”). If these regions aren’t relevant to the task at hand, then the researcher may be looking at a biased algorithm. In her recent talk at a Deep Learning Summit, Rohrbach (2021) showed an example of wrong captioning by a black box model, which mislabeled a woman sitting at a desk in front of a computer monitor as “a man” (Figure 1a).
The saliency map showed that the network was attending more to the computer monitor than the person in the picture (Figure 1b), when it should have been focusing more on the person (Figure 1c). Presumably the bias arose because the training data contained more images of men sitting in front of computers than women doing the same.
Saliency maps also provide a way to evaluate the “logic” of the algorithm even when the captioning seems appropriate (see Figures 2a & 2b).
A major criticism of such saliency maps is that they show only which inputs mattered most to the algorithm’s prediction. They do not reveal how those inputs are used. For instance, if two models produced different captions or predictions but had very similar saliency maps, the maps would not explain how the models reached their different predictions (Wan, 2020).
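To make that criticism concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by Rohrbach or Wan: the 3x3 "image", the weights, the two toy linear "models," and the function names are all hypothetical, and the saliency map is computed by simple occlusion (blanking one pixel at a time and measuring how much the score changes) rather than by any particular deep learning technique.

```python
import numpy as np


def occlusion_saliency(score_fn, image, baseline=0.0):
    """Absolute change in the model score when each pixel is set to `baseline`."""
    original = score_fn(image)
    saliency = np.zeros_like(image, dtype=float)
    for idx in np.ndindex(image.shape):
        occluded = image.copy()
        occluded[idx] = baseline
        saliency[idx] = abs(original - score_fn(occluded))
    return saliency


# Hypothetical 3x3 "image" with a bright vertical stripe in the middle column.
image = np.array([[0.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Hypothetical weights shared by both toy models.
weights = np.array([[0.0, 0.5, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.5, 0.0]])


def model_a(x):
    """Toy model A: a higher weighted sum gives a higher score."""
    return float(np.sum(weights * x))


def model_b(x):
    """Toy model B: same weights, opposite sign, so it predicts the opposite."""
    return float(-np.sum(weights * x))


map_a = occlusion_saliency(model_a, image)
map_b = occlusion_saliency(model_b, image)

print(model_a(image), model_b(image))  # the two models disagree: 3.0 -3.0
print(np.allclose(map_a, map_b))       # yet their saliency maps match: True
```

Both toy models "attend" to exactly the same pixels, yet they reach opposite conclusions. The maps tell us where each model looked, not what it did with what it saw.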
Because of such limitations, other researchers, such as Dr. Cynthia Rudin, propose developing interpretable models instead of trying to make black box AI explainable. The difference is that interpretable models are inherently understandable: the researcher can see how the model derives its solutions, instead of merely trying to coax explanations from a model after the fact by reviewing which inputs were more influential than others (Rudin, 2021). Dr. Rudin’s work to make neural networks interpretable has received wide acclaim, and she is a strong proponent of using interpretable models, especially for high-stakes decisions (Rudin, 2019). However, at the recent Deep Learning Summit, Dr. Rudin acknowledged that developing such interpretable models takes more time and effort. This is because, unlike black box models, where researchers would not know whether the model is working properly, interpretable models force researchers to troubleshoot the data when they see that the model is not working as it should, even if the solutions look OK (Rudin, 2021).
Nevertheless, some still argue for the use of black box models that are low on explainability. Black box models are harder to copy and can give the company that developed them a competitive advantage. They are also easier to develop. Proponents of this view believe that the goal is not to explain every black box model but to identify when to use black box models (Harris, 2019). If a black box algorithm is able to perform its task to a high degree of accuracy, we may not need to know exactly how it did it. Besides, what is a valid explanation to one person may not be a valid explanation to another. These researchers argue that even physicians use things they do not fully understand all the time, including common drugs that have been shown to be consistently effective even though no one totally comprehends how they work in every patient. What is important is that enough testing is done to ensure that the algorithm is dependable and suitable for its intended use (Harris, 2019).
When I first read about AI and the problem of inexplicability some years back, I remember taking the position that the ends justified the means. That is, so long as the algorithm was accurate, I’d be willing to sacrifice explainability. However, as I learned more about how people are using AI solutions for all sorts of important medical, hiring, and criminal justice decisions, I started to reconsider my position. After attending the Deep Learning Summit on Explainable AI (XAI) earlier this year, I am more inclined to think that it really depends on the application and the type of decision involved. Just because a black box model has a sterling track record of highly accurate predictions does not mean that its next prediction cannot be poor. The higher the stakes of the decision, the better our understanding of the workings of the AI should be.
What do you think? For what applications or decisions would you accept a model that is unexplainable but has been consistently accurate?
Harris, R. (2019). How can doctors be sure a self-taught computer is making the right diagnosis? Retrieved from https://www.npr.org/sections/health-shots/2019/04/01/708085617/how-can-doctors-be-sure-a-self-taught-computer-is-making-the-right-diagnosis
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215.
Rudin, C. (2021). Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Deep Learning Summit 2021.
Rohrbach, A. (2021). Explainable AI for addressing bias and improving user trust. Presented at the Deep Learning Summit 2021.
Shin, T. (2020). Real-life Examples of Discriminating Artificial Intelligence: Real-life examples of AI algorithms demonstrating bias and prejudice. Retrieved from https://towardsdatascience.com/real-life-examples-of-discriminating-artificial-intelligence-cae395a90070
Wan, A. (2020). What explainable AI fails to explain (and how we fix that). Retrieved from https://towardsdatascience.com/what-explainable-ai-fails-to-explain-and-how-we-fix-that-1e35e37bee07
Preprints first caught my eye in May 2020. As a human factors researcher who has studied wearables for several years and owned a few (remember the Jawbone UP?), I had my interest piqued by a Washington Post headline, “Wearable tech can spot coronavirus symptoms before you even realize you’re sick” (Fowler, 2020). The wearable mentioned in the Post article, the Oura ring, must have been developed before COVID-19 reached the U.S., with a novel application then developed for the existing hardware. Physical and digital prototypes go through iterative design and development processes, and while the process can be rushed, speed comes at the expense of quality (e.g., Cyberpunk 2077). Quality is paramount when developing an application for monitoring individual and public health during a pandemic.
The initial findings Fowler referenced were reported in preprint form. Preprints are scholarly works posted by researchers before the manuscripts have undergone peer review. Scientific research must be developed over time, much like hardware and software. It takes months or even years to design, develop, and pilot a study, collect and analyze data, interpret findings, and report the results. Peer review, the process by which the scientific community evaluates research for its scientific soundness and practical and applied merit, takes additional months.
Preprints are a way for scientists and researchers to put their (unvalidated) findings into the world, thus making sure nobody else can claim credit for the work. No industry standard exists for preprints. Do “initial findings” contain the results of two study participants or 200? Work-in-progress papers have been presented at academic conferences for years. Technical reports are another avenue for quickly reporting research. Why, then, are preprints, which are posted online for the world to consume before peer review has vetted the work, necessary for researchers to make sure no one else receives credit for their work?
People may rely on news headlines and information gained through word of mouth for health advice. There’s a lot of money to be made. Newspapers need headlines to drive subscriptions, tech startups need investment capital, and researchers need funding. These financial incentives reinforce one another, creating an echo chamber that drives sales. The preprint problem is magnified when journalists, who are also looking to avoid being scooped, take the initial findings and generate headlines that may contradict the final, validated results. What happens if a study that’s initially reported as a preprint is found to be invalid? Is the preprint pulled from the internet? Does the news media issue a retraction? Will future researchers who are looking to develop and test hypotheses be able to distinguish between a preprint and a peer-reviewed study?
Some article repositories, such as medRxiv (https://www.medrxiv.org/), are taking steps to address this issue by clearly labeling which articles are preprints. The following cautionary statement is prominently displayed on medRxiv’s homepage (emphasis medRxiv’s): “Caution: Preprints are preliminary reports of work that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.” This is a good first step; however, it is incumbent on journalists and news outlets to report information responsibly. For example, a reporter could state that research into using wearables to aid in early COVID-19 diagnoses is ongoing while refraining from mentioning any features or capabilities of devices that have not been validated.
The results of the first Oura studies are promising. Maybe the Oura ring and similar devices can detect the symptoms of COVID-19 before most people would otherwise spot them, but we don’t know yet. Validation takes time. In the meantime, wash your hands, wear a mask, and physically distance as much as you can. Relying on an unvalidated, non-FDA-approved device for disease prevention and detection may give some people a false sense of security when their wearable fails to indicate that they may be ill. This false sense of security could then lead to infecting others.
The scientific community can more clearly indicate that preprints are not to be used for individual or organization-level health guidance, as medRxiv has done. Scientists and researchers can choose not to cite preprints in their work, since it is still unclear what happens when a preprint is invalidated and retracted. Companies and institutions can choose not to use preprints as a basis for hiring, promotion, and tenure. The question of whether researchers should use information reported in preprints as a foundation upon which to scaffold scientific theories needs to be answered. We stand on the shoulders of giants, but without a firm footing we risk regressing down a slippery slope.
What are your thoughts on preprints? Let us know below!
Fowler, G.A. (2020, May 28). Wearable tech can spot coronavirus symptoms before you even realize you’re sick. The Washington Post. https://www.washingtonpost.com/technology/2020/05/28/wearable-coronavirus-detect/
This short post was inspired by an article I recently read about a group of dentists that decided to open a practice in NYC (D'Ambrosio, 2020). What's different about it is that these dentists offer a very limited number of services (in fact, only three: cleaning, whitening, and straightening). No, it's not because they are underqualified to perform other services or because they're just lazy. It's because their research showed that most people in the area (i.e., young professionals) don't need the full gamut of services normally offered at the dentist. They also realized that cost keeps most people from going to the dentist even for these basic services. Now, implementing a "user-centered" business model, these dentists offer clientele the basic services they most often want at a much more affordable price. This is because the overhead costs associated with these three services (e.g., equipment, office space, insurance) are much lower than those of a traditional dental practice, and those savings are passed on to the clientele. They also completely redesigned their office space, giving it a contemporary facelift. Their goal is to make going to the dentist similar to going to get a haircut or your nails done: pop in, get a cleaning, and pop out with some money still in your pocket.
It's obvious that COVID has been a catalyst forcing businesses to re-evaluate their models and plans over the past year. People have less money to spend but still have needs to be met. I am by no means a business analyst, nor do I have a business degree, but I do understand the importance of focusing on the end users when designing products, building training programs, or determining a list of services to offer. Many of these topics are partially covered under market research, but by borrowing concepts and approaches from usability, user experience, neuroscience, and other related fields, businesses can gather different types of research data to better inform their decisions. For example, restaurants, wineries, and supermarkets have used behavioral science (e.g., eye-tracking data) to inform menu redesigns or product options, resulting in better customer experiences and bottom lines (e.g., Cobe, 2020; Huseynov, Kassas, Segovia & Palma, 2019; Wästlund, Shams, & Otterbring, 2018). These examples show that a bit of research to understand customer needs, combined with creative thinking based on the data, can go a long way.
1) Understand user needs regardless of your business focus and design to meet those needs.
2) Look to sources of data not traditionally used to inform business decisions as a way to better understand your users and clientele. This can lead to better user experiences and improve overall customer satisfaction.
How are you seeking to understand your customers' needs, and how is data informing your "user-centered" business decisions? Let us know what challenges you're facing and we'll let you know how we can help.
Cobe, P. (2020, September 25). Texas restaurants turn to neuroscience for menu makeovers. Restaurant Business. https://www.restaurantbusinessonline.com/technology/texas-restaurants-turn-neurosciencemenu-makeovers
D'Ambrosio, D. (2020, January 18). Take a trip to Beam Street for a new kind of dentist. Forbes. https://www.forbes.com/sites/danieldambrosio/2020/01/18/take-a-trip-to-beam-street-for-a-new-kind-of-dentist/?sh=792aaf52bf77
Huseynov, S., Kassas, B., Segovia, M.S., & Palma, M.A. (2019). Incorporating biometric data in models of consumer choice. Applied Economics, 51(14), 1514-1531.
Wästlund, Shams, & Otterbring (2018). Unsold is unseen … or is it? Examining the role of peripheral vision in the consumer choice process using eye-tracking methodology. Appetite, 120, 49-56.
Last month, QIC welcomed Jessica James to the team! Jessica is a Human Factors Intern at QIC. She earned a B.S. in Human Factors Psychology from Embry-Riddle Aeronautical University, where she is currently pursuing an M.S. in Human Factors. During her academic career, she has assisted with various projects related to usability and user experience (UX) research.
These posts are written or shared by QIC team members. We find this stuff interesting, exciting, and totally awesome! We hope you do too!