TOX.AI

My Journey Towards Tackling Toxicity Prediction

Growing up, I always had a deep love and respect for animals. They brought joy into my life and taught me so much about empathy and connection. However, seeing animals suffer, especially when the suffering was preventable, always broke my heart. The pain felt even sharper knowing I wasn't doing anything to help. Although I was just a kid, I believed that every small effort could make a difference, and this belief stayed with me as I got older.

By the time I turned 16, I started dreaming bigger. This is when I came up with the idea for Tox.AI. The mission was ambitious but clear: disrupt the drug development pipeline with computational methods that predict drug toxicity. I envisioned a system that could shorten the time and cut the cost of drug discovery. The ultimate goal? To render animal testing obsolete in preclinical trials.


Technicalities Of A Naive Idea

At the core of this vision was the belief that machine learning could do what animal models currently do, only faster, cheaper, and more ethically. At the time, I proposed a system to predict the toxic properties of chemical substances from their molecular characteristics. Using resources like PubChem, my plan was to gather data on molecular structures, store it in a vectorized database, and group chemicals by similarity using measures like the Tanimoto coefficient. These groupings would feed a binary classification model, allowing us to predict whether a substance would be hazardous at specific doses.

  1. Data Collection: Scrape the web for chemical property data and create an expanding database.
  2. Clustering: Group similar chemicals using k-nearest neighbors and the Tanimoto coefficient.
  3. Prediction Model: Train a model to classify chemicals as hazardous or non-hazardous based on their feature vectors (steps 2 and 3 are sketched below).
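
To make steps 2 and 3 concrete, here is a minimal sketch of what I had in mind, assuming RDKit for the chemistry. The molecules, hazard labels, and k = 3 below are illustrative placeholders, not real toxicity data.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles):
    # Encode a molecule as a 2048-bit Morgan fingerprint of radius 2.
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048
    )

# Hypothetical labelled examples: SMILES -> hazardous (1) or safe (0).
# The labels are placeholders, not measured toxicity data.
database = [
    ("CC(=O)Oc1ccccc1C(=O)O", 0),       # aspirin
    ("CC(C)Cc1ccc(cc1)C(C)C(=O)O", 0),  # ibuprofen
    ("c1ccc2ccccc2c1", 1),              # naphthalene (placeholder hazard)
]
fps = [fingerprint(smiles) for smiles, _ in database]
labels = [label for _, label in database]

def predict_hazard(query_smiles, k=3):
    # Tanimoto similarity of the query against every stored fingerprint,
    # then a majority vote over the k most similar neighbours.
    sims = DataStructs.BulkTanimotoSimilarity(fingerprint(query_smiles), fps)
    nearest = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

print(predict_hazard("CC(=O)Nc1ccc(O)cc1"))  # paracetamol -> 0 with these toys
```

One natural refinement would be to weight each neighbour's vote by its similarity, but the skeleton of the idea is the same.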

I was a bold, stubborn teenager, filled with enthusiasm and belief in the power of technology. But looking back, I see how naïve I was. I failed to consider critical factors:

  1. Data Complexity: I underestimated how challenging it would be to gather and standardize high-quality toxicity data.
  2. Regulatory Hurdles: I didn't fully grasp the stringent requirements of regulatory bodies like the FDA.
  3. Model Limitations: I overlooked the complexity of biological systems and how even advanced computational models might struggle to replicate certain human responses.
  4. Ethical Nuance: While reducing animal testing is a noble goal, ensuring the safety of patients in clinical trials is a moral imperative.

Despite these oversights, the experience was invaluable. It planted the seed for my passion and shaped the direction of my career. Tox.AI wasn't just a youthful project—it was a mission, one that I'm still committed to pursuing through science and innovation as a data science and applied math student at Berkeley. While the path forward may be complex, my goal remains clear:

"I want to build a future where animals are no longer the cost of medical progress."


Predicting Drug-Target Affinities with AI

The future of drug discovery lies in AI-driven precision. My research focuses on leveraging Graph Neural Networks (GNNs) and molecular descriptors to predict drug-target affinities, a critical step in accelerating early-stage drug development.

Graph Neural Network Architecture (Chemprop)

Using GNNs, I model molecules as graphs, with atoms as nodes and bonds as edges, capturing atomic interactions to predict binding strength.
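
To show what modelling a molecule as a graph means in practice, here is a minimal sketch, assuming RDKit. Atomic numbers stand in for node features; Chemprop itself builds much richer atom and bond features, and its message-passing updates are learned rather than the fixed neighbour sum shown here.

```python
from rdkit import Chem

def smiles_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Nodes: one feature per atom (here, just its atomic number).
    nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    # Edges: each bond contributes a directed edge in both directions.
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    return nodes, edges

def message_pass(nodes, edges):
    # One aggregation step: each atom sums its neighbours' states.
    # A trained GNN interleaves learned transformations with this step.
    out = [0] * len(nodes)
    for src, dst in edges:
        out[dst] += nodes[src]
    return out

nodes, edges = smiles_to_graph("CCO")   # ethanol
print(nodes)                        # [6, 6, 8]
print(message_pass(nodes, edges))   # [6, 14, 6]
```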

Epidermal Growth Factor Receptor (4i23)

Molecular descriptors, such as chemical fingerprints and electrostatic properties, complement the learned graph representation and sharpen binding predictions.
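
As an illustration, here is a minimal sketch of descriptor computation with RDKit. The specific descriptors shown (molecular weight, logP, TPSA, Gasteiger partial charges, a Morgan fingerprint) are common choices, not necessarily the exact feature set behind the results above.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Global physicochemical descriptors.
print("MolWt:", Descriptors.MolWt(mol))
print("LogP: ", Descriptors.MolLogP(mol))
print("TPSA: ", Descriptors.TPSA(mol))

# Per-atom electrostatics: Gasteiger partial charges, which RDKit
# stores as string properties on each atom.
AllChem.ComputeGasteigerCharges(mol)
charges = [float(a.GetProp("_GasteigerCharge")) for a in mol.GetAtoms()]
print("Charges:", [round(c, 3) for c in charges])

# A 2048-bit Morgan fingerprint that can be concatenated with the
# learned graph embedding before the final prediction layer.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
print("Bits set:", fp.GetNumOnBits())
```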

1C9 Interactions With 4i23

This approach helps prioritize promising drug candidates, reducing laboratory screening efforts and shortening the drug development timeline.

By integrating AI into drug discovery pipelines, I aim to make development faster, more efficient, and less dependent on traditional screening methods. The work is ongoing, but here is a research poster we developed summarizing our current findings and future directions.