In the fight against COVID-19, medical professionals are turning to unlikely allies: computer scientists who are using a process known as text mining to gain an edge against the virus.
Many are already enlisting the help of AI to track the spread of COVID-19, identify high-risk individuals, and develop drugs. Now, a group of citizen scientists is using artificial intelligence to sift through thousands of coronavirus research papers and glean vital insights.
Analyzing this much data is a daunting task, especially when time is of the essence. With data sets this large, it's very difficult for humans alone to analyze and categorize all of the information.
What is Text Mining?
Text mining is the process of analyzing a large body of text to uncover themes. According to Techopedia, "This requires sophisticated analytical tools that process text in order to glean specific keywords or key data points." Other names for text mining are text analytics and text data mining.
Text mining is used by data scientists and others to filter through large groups of text that would take far too long for humans to organize. The process is similar to data mining but focuses solely on text material.
Text analytics utilizes artificial intelligence and natural language processing to discover commonalities between a number of sources. It runs algorithms to categorize data and uses analytical models to identify patterns.
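To make the idea of gleaning "specific keywords" concrete, here is a minimal sketch of one common weighting scheme, TF-IDF, which scores a word highly when it is frequent in one document but rare across the rest. This is an illustration only, not the method used by any project described in this article; the function names, the tiny stop-word list, and the sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use far larger ones).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "by", "with", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times inverse document frequency.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical sample sentences standing in for paper abstracts.
sample_docs = [
    "Masks reduce airborne transmission of the virus",
    "Vaccine trials target the virus spike protein",
    "Airborne virus particles spread in indoor air",
]

for doc, keywords in zip(sample_docs, top_keywords(sample_docs)):
    print(doc, "->", keywords)
```

Because "virus" appears in every sample sentence, its score is zero and it never surfaces as a keyword, which is the point of the weighting: words common to the whole collection say little about any one paper.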
Citizen Scientists Support Those on the Frontlines
In the race against time that scientists and medical professionals are currently facing, text analytics could play a pivotal role in helping find the information they need to fight and cure the novel coronavirus. To help sift through the vast body of available research, a coalition of government, academic, and industry groups is calling upon anyone with the ability to build machine learning tools.
The White House, along with the Allen Institute for AI, the Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft, and the National Library of Medicine, has prepared a data set of over 57,000 scholarly articles related to COVID-19.
"CORD-19," as it is called, has been made available to the global research community and some citizen scientists are jumping at the opportunity to provide their support. These groups are using the competition platform Kaggle, an online community for data scientists, as a call to action for individuals willing to help.
They're asking for anyone with text mining skills to help sort through these articles, in hopes of discovering any sort of information that might answer key questions to help fight COVID-19.
Subbarao Kambhampati, Chief AI Officer of the AI Foundation, explains the need for enlisting these individuals. "Scientists are essentially dealing with a deluge of information. There are too many papers coming out and just keeping up with the research is going to be very hard. We don't want to redo what somebody else has already done."
The Secret Weapon in the Fight Against COVID-19
Maksim Eren is one of the citizen scientists who jumped at the opportunity to lend his skills to this fight. He built a tool that clusters similar articles so that scientists and researchers can look at smaller, more specific groups of research.
Eren explains, "They may be able to see patterns that they wouldn't be able to see as humans because we are applying the machine learning algorithms."
The clusters group research on topics such as the science of airborne viruses and the use of tools like face masks. This literature clustering is helping ensure that scientists aren't simply redoing previous research but are instead building upon the previous work of others.
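The article does not say which algorithm Eren's tool uses, so the following is only a rough sketch of the general idea of literature clustering. It swaps in a deliberately simple technique: each title becomes a bag of words, pairs of titles are compared by cosine similarity, and any two titles above a threshold end up in the same group. The function names, the threshold of 0.3, and the sample titles are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_titles(titles, threshold=0.3):
    """Group titles that are linked, directly or through a chain of other
    titles, by a similarity at or above the threshold."""
    vectors = [vectorize(t) for t in titles]
    parent = list(range(len(titles)))  # union-find forest, one node per title

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, title in enumerate(titles):
        groups.setdefault(find(i), []).append(title)
    return list(groups.values())


# Hypothetical paper titles, invented for this example.
sample_titles = [
    "airborne transmission of respiratory viruses",
    "face masks reduce airborne transmission",
    "efficacy of face masks",
    "vaccine trial results",
    "early vaccine trial safety data",
]

for group in cluster_titles(sample_titles):
    print(group)
```

On these sample titles the sketch separates the transmission and mask papers from the vaccine papers. A real tool working on tens of thousands of full-text articles would need weighted features and a scalable clustering algorithm, but the principle of grouping by textual similarity is the same.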
To incentivize individuals like Eren, Kaggle is sponsoring a $1,000-per-task award for the submission that best meets the evaluation criteria. The winner can choose to receive the award as a monetary payment or have it donated to COVID-19 relief efforts.
Kaggle has previously offered similar competitions, such as one designed to discover better ways to screen for cervical cancer. However, those have all been for longer-term, less urgent timeframes. Due to the nature of the COVID-19 outbreak, this competition is the biggest and most pressing challenge its competitors have faced.
Working Together to Combat the Virus
Scientists and medical professionals are the ones on the frontlines of this global pandemic, but they can't do it alone. The support that citizen scientists, like Eren, are providing has the potential to play a crucial role in gaining leverage against the virus's spread.
The issue isn't a lack of information on the virus. Quite the contrary, it's an overabundance of information for those trying to navigate the data, so computer scientists like Eren are providing an invaluable service.
Their use of artificial intelligence and text mining is helping make sense of tens of thousands of research papers in a fraction of the time that it would take humans alone. The work of these individuals is nothing short of heroic and it could become a strong force in the fight to understand and end COVID-19.