Using machine learning to find mutations in similar genome sequences of cancer samples
A team of researchers working at the Francis Crick Institute has developed a way to find mutations in similar genome regions of cancer samples. In their paper published in the journal Nature Biotechnology, the group describes using a machine-learning algorithm to spot cancerous mutations in non-unique parts of the genome.
As part of human evolutionary history, sections of the genome have undergone rearrangement and, in some cases, duplication. Such duplications make it difficult to find mutations. Current scanning methods discard short sequences that are identified as ambiguous, so segments of the genome that are very similar to one another are left out of the resulting reports, and any mutations in them are missed. In this new effort, the researchers have developed a means for finding mutations in these non-unique parts of the genome.
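To make the problem concrete, here is a minimal toy sketch in Python of how a filter that keeps only unambiguous sequences loses a mutation inside a duplicated region. It is not the researchers' pipeline or any real aligner: the reference string, the read names, the one-mismatch tolerance, and the function names are all invented for illustration.

```python
# Toy illustration only: a made-up reference with one duplicated segment.
LEFT = "GATTACAGATCC"
DUPLICATED = "ACGTTGCAAGGCTA"   # appears twice in the reference
MIDDLE = "TTCAGGGTCATC"
RIGHT = "CCGGAATTCGAT"
REFERENCE = LEFT + DUPLICATED + MIDDLE + DUPLICATED + RIGHT

# Two hypothetical reads, each carrying a single-base change.
READS = {
    # MIDDLE[1:11] with one base changed (G -> C)
    "read_unique_region": "TCAGCGTCAT",
    # DUPLICATED[2:12] with one base changed (A -> T)
    "read_duplicated_region": "GTTGCTAGGC",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(read: str, reference: str, max_mismatches: int = 1):
    """Every start position where the read matches within the tolerance."""
    k = len(read)
    return [
        start
        for start in range(len(reference) - k + 1)
        if hamming(read, reference[start:start + k]) <= max_mismatches
    ]


def split_by_mappability(reads, reference, max_mismatches: int = 1):
    """Separate reads with exactly one placement from those with several."""
    unique, ambiguous = {}, {}
    for name, seq in reads.items():
        hits = candidate_positions(seq, reference, max_mismatches)
        if len(hits) == 1:
            unique[name] = hits
        elif len(hits) > 1:
            ambiguous[name] = hits
    return unique, ambiguous


if __name__ == "__main__":
    unique, ambiguous = split_by_mappability(READS, REFERENCE)
    print("kept (unique placement):", unique)
    print("discarded (ambiguous placement):", ambiguous)
```

In this toy setup, the read from the duplicated segment fits both copies equally well, so a unique-only filter drops it along with the base change it carries, while the read from the unique segment is kept.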
The approach involved first developing a list of genome regions known to be similar to other regions and then using that list to teach a machine-learning algorithm how to recognize them. The researchers then used the algorithm to spot mutations in 2,658 samples from different tissues in the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset. They uncovered mutations in 1,744 coding sequences along with thousands of other mutations in non-coding sequences. They also found that their algorithm had a false discovery rate of approximately 7% and a validation rate of more than 80%.
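For readers unfamiliar with the metric, the false discovery rate is conventionally the share of reported calls that turn out to be wrong. The short Python sketch below shows that arithmetic. The counts are hypothetical placeholders chosen to give a round number, not figures from the paper, and the function name is an assumption. The article does not say how the validation rate was defined, so it is not modelled here.

```python
def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of reported calls that are false: FP / (FP + TP)."""
    total_calls = true_positives + false_positives
    if total_calls == 0:
        raise ValueError("no calls were reported, so the rate is undefined")
    return false_positives / total_calls


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    rate = false_discovery_rate(true_positives=93, false_positives=7)
    print(f"false discovery rate: {rate:.0%}")
```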
The researchers noted that mutations in coding sequences affect protein sequences, some of which have been linked to cancer types. They also found mutations leading to protein changes that have themselves been linked to specific kinds of cancer. As one example, they found recurrent mutations in the KMT2C and PIK3CA genes. They also found mutations that have been linked to breast cancer, as well as mutations in regulatory regions, including some in the immunoglobulin family.
The researchers suggest that other teams can use their technique to avoid overlooking mutations in near-duplicate genomic regions.
This article originally appeared on phys.org.
Nastel Technologies is the global leader in Integration Infrastructure Management (i2M). It helps companies achieve flawless delivery of digital services powered by integration infrastructure. Its tools for Middleware Management, Monitoring, Tracking, and Analytics detect anomalies, accelerate decisions, and enable customers to constantly innovate, answer business-centric questions, and provide actionable guidance for decision-makers. It is particularly focused on IBM MQ, Apache Kafka, Solace, TIBCO EMS, and ACE/IIB, and also supports RabbitMQ, ActiveMQ, Blockchain, IoT, DataPower, MFT, IBM Cloud Pak for Integration, and many more.
The Nastel i2M Platform provides:
- Secure self-service configuration management with auditing for governance & compliance
- Message management for Application Development, Test, & Support
- Real-time performance monitoring, alerting, and remediation
- Business transaction tracking and IT message tracing
- AIOps and APM
- Automation for CI/CD DevOps
- Analytics for root cause analysis & Management Information (MI)
- Integration with ITSM/SIEM solutions including ServiceNow, Splunk, & AppDynamics