Our group aims to integrate all known and hypothesized regulatory mechanisms of T cells into a single predictive model. This model will then be used to understand T cells under normal and pathological conditions.
This is no easy task! But we think we have an approach that might work. First, we will study T cells under a large number of stimuli, in a way that lends itself to modelling (bioinformaticians hate batch effects). Second, we only use high-throughput methods (“omics”) to maximize the amount of data produced per unit of work. This includes developing new, smarter assays, such as new single-cell and low-input methods. Third, we will use advances in machine learning and Bayesian statistics. With the right formulation the computer can reason somewhat like a human, but account for many more interactions in one go. That is unfortunately not enough, so fourth, we will include known pathways and “human bias” to help the relatively stupid computer. Non-bioinformaticians might be surprised, but this is rarely done, as the known interactions are not kept in a computer-friendly format. To get this sorted we rely heavily on immunologists working closely with bioinformaticians to make the connection!
From a wet-lab perspective, we have chosen the following protocols as our “core” to enable us to somewhat pin down the central dogma:
- RNA-seq of large and small RNAs
- ChIP-seq, ATAC-seq and Hi-C to analyze the chromatin state
- Mass spectrometry of proteins and metabolites
- Light microscopy and cryo-EM to capture the morphology
- Single-cell scale protocols to multiplex the experiments and measure response heterogeneity
We have access to a range of pathogen models, in cell culture and in vivo (mouse and human). To obtain causality in our model we use CRISPR and/or drugs/cytokines in multiwell plates.
If you open a review on your favourite cell type, you will find that it lists maybe 50 genes at best. Where did the other 20,000 genes go? Truth is, many genes have been found to be important, but researchers tend to have their favourites. This choice is not in any way objective; aspects which are easy to study, or which simply end up in high-impact journals, have been given priority.
This is why we want to revisit less-known pathways. Many of these pathways can now be studied together, as high-throughput datasets can be analyzed from many angles. One phenomenon commonly ignored is that all immune cells actually move around in your body, depending on the time of day (and differently in mice and humans!). This is thought to help T cells find pathogens, as well as communicate with other cells in the immune system. Not taking such a major behaviour into account risks confounding other analyses. Is a gene highly expressed because the cell moved into the blood, or is the cell in the blood because the gene is expressed?
Another variable, more well known but usually overlooked unless your research group focuses on it, is that immune cells behave differently depending on your sex. Women generally have a more active immune response, and because of this also a higher incidence of autoimmune diseases. Unfortunately it is not even as simple as males vs females, due to hormonal cycles, pregnancy and menopause. We take a highly reductionist approach to decompose the sex differences, first looking at the effect of different stimuli and then connecting the stimuli to sex differences in silico.
Our lab has a background in analysing RNA and chromatin structure. It’s easy to get plenty of data. Unfortunately it is also easy to start using your method of choice as a golden hammer. Many mechanisms won’t be seen with just one single method, so we are trying to avoid this trap. As an example, we have found a number of metabolism-controlling genes that are differentially expressed between T cell states. We also know from past work that metabolites can affect T cell state, such as vitamin A being able to induce regulatory T cells (these guys are important for modulating the immune system, avoiding autoimmunity). Likewise, calcium signalling is a key part of T cell activation after TCR (T cell receptor) engagement. Recently it has been shown that tumours induce steroid biosynthesis in T cells to evade immunity. All that said, there has so far been little effort to link unbiased transcriptomics with metabolism, largely due to poor overlap between these two communities. Given the clear importance of metabolism, this is a gap we are trying to close, along with protein modifications, gene isoforms, calcium concentration, cell morphology, and everything else we can feed to the computer.
Understanding T cells in different tissue contexts
Those who look carefully will see that T cells express different genes in different tissues. Again, this is confounded by the circadian rhythm, but it is clearly not quite that simple: several organs, like the testicles and the brain, are “immune privileged”. That is, they have a different representation of immune cells, typically fewer, possibly because their action would create too much damage if not kept under sufficient control. This is controlled at least in part by the surrounding tissue emitting signals telling the T cells to stay away. When this system fails, you risk ending up with diseases such as multiple sclerosis.
It is easy to forget that T cells are just one component of the immune system. But if you see a change in T cells during a pathogenic condition, was the change directly due to the pathogen, or did the signal first get modulated through another cell type? Modelling the full immune system is currently beyond our scope. However, as with the interactions with different tissues, we will have a particular focus on signalling cues from other immune cells. We will do this by screening the impact of all signalling molecules we can get our hands on, to build a catalogue of possible interactions.
The ultimate goal is to understand T cells at work: what are they doing when they detect a parasite, a virus, or cancer, and why? By looking at how the state is perturbed, and cross-checking it with our catalogue of possible interactions, we will find the critical paths of influence (there is likely more than one!). Using this model we will be able to develop better drugs to modulate the response: strengthening it, weakening it, or making it more specific. This approach is already widely successful in cancer immunotherapy, but the drugs available are best described as sledgehammers. How do you make sure the car doesn’t stop? Take out the brakes. How do you make sure a car doesn’t go too fast? Remove the wheels. Obviously such an approach has strong side effects, with immunotherapy sometimes leading to autoimmune responses, and autoimmunity drugs making patients susceptible to viruses. Our long-term goal is to take this toolbox and sharpen the blades even further.
For an engineer or a physicist, biology can be made simple: a cell is a state vector, updated given the current state as well as input from the environment. Each state variable is something we can easily measure, such as the amount of RNA or protein for a certain gene. The challenge for a bioinformatician is to fit such a model, making use of existing knowledge and filling in the gaps with whatever data we can get. There is little room for idealism in biology; we have to make do with what we have!
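In symbols, this is simply x(t+1) = f(x(t), u(t)). A minimal sketch of the idea, where the linear-plus-nonlinearity form, sizes and names are illustrative assumptions rather than our actual model:

```python
import numpy as np

def step(x, u, W, B):
    """One update of the cell state.

    x : current state vector (e.g. RNA/protein level per gene)
    u : input from the environment (e.g. cytokine concentrations)
    W : gene-gene interaction weights (the parameters to fit)
    B : how each input feeds into each gene
    """
    # Linear dynamics squashed through a nonlinearity; real
    # regulation is of course far more complicated.
    return np.tanh(W @ x + B @ u)

# Toy example: 3 "genes", 1 environmental input.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 3))
B = rng.normal(scale=0.5, size=(3, 1))
x = np.zeros(3)
u = np.ones(1)
for _ in range(10):  # iterate the update rule
    x = step(x, u, W, B)
```

Fitting the model then amounts to finding the W and B that best reproduce the measured trajectories.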
Our basic regulatory model is focused on the central dogma, where much information can be captured with high-throughput methods, or predicted. For example, we can predict where transcription factors might or might not bind, which drastically reduces the number of parameters needed for the model (for the biologist: every parameter is a proposed mechanism!). Onto this network we then add perturbed states (cells given a stimulus). A key idea behind the model is that we try to solve the full model in one go. Because transcriptional regulation is so intertwined, the only way to beat the complexity is by brute-force data collection. We deploy a hybrid explanatory machine learning / kinetics model for this purpose.
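To illustrate how predicted binding cuts down the parameter count, one can zero out the weights of any TF-gene pair without a predicted binding site, so only plausible edges carry parameters. The matrix below is invented for the sake of the example:

```python
import numpy as np

n_genes = 5
# Hypothetical prediction: can_bind[i, j] is True if TF j has a
# predicted binding site near the promoter of gene i.
can_bind = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=bool)

rng = np.random.default_rng(1)
W = rng.normal(size=(n_genes, n_genes))
W[~can_bind] = 0.0  # forbidden edges carry no parameter

free_params = int(can_bind.sum())
print(free_params, "parameters instead of", n_genes * n_genes)
```

With real genome-scale data the saving is far larger, since most of the roughly 20,000 × 20,000 possible edges have no binding support at all.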
A large amount of knowledge is already present. This data is however not commonly used in current gene regulatory network reconstruction models, partly due to the challenge of integrating very heterogeneous data. However, adding this knowledge constrains the model such that more sensible solutions can be found despite noisy data (picture on the right). We are investigating various Bayesian models in which putative interactions can be plugged in as prior distributions, and unlikely links removed entirely. This way we hope to include, among other things, alternative splicing, post-translational modifications, and protein-protein interactions. The resulting equations require new computational approaches to solve efficiently.
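In the simplest Gaussian case, an edge-specific prior boils down to a per-coefficient penalty in a regularised regression. A tiny sketch, with all priors and data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Expression of 3 candidate regulators across 50 samples, and one
# target gene truly driven by regulator 0 only (toy ground truth).
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)

# Prior precision per edge: small = "known interaction, let the
# data speak"; large = "unlikely link, shrink hard towards zero".
prior_precision = np.array([0.01, 100.0, 100.0])

# The MAP estimate under a Gaussian prior is ridge regression with
# a per-edge penalty: w = (X^T X + diag(lambda))^{-1} X^T y
w = np.linalg.solve(X.T @ X + np.diag(prior_precision), X.T @ y)
print(np.round(w, 2))
```

The known edge is recovered close to its true strength while the unlikely edges stay near zero; in the full model the same principle applies, just with far more edges and less convenient likelihoods.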
The same cell type comes in many flavours. As many molecules are present at low concentrations, cells are at the mercy of stochasticity. This has long been observed, simply because not all T cells divide at the same rate. It makes well-controlled bulk analyses really challenging. We are resolving the heterogeneity using single-cell technology, enabling RNA-seq and targeted proteomics readouts. Microscopy is of course also a single-cell method. While this is not a complete toolbox yet, we are continuously working on new approaches to increase the resolution and the fidelity of our model.