Brain tissues have single-voxel signatures in multi-spectral MRI

Submitted to the Faculty of Medicine of the Friedrich-Alexander-Universität Erlangen-Nürnberg for the degree of Dr. med. by Alexander Simon Maria German from Berlin-Steglitz
Approved as a dissertation by the Faculty of Medicine of the Friedrich-Alexander-Universität Erlangen-Nürnberg
Date of the oral examination: 21 February 2023
Chair of the doctoral committee: Prof. Dr. Markus Friedrich Neurath
Reviewers: Prof. Dr. Frederik Bernd Laun, Prof. Dr. Jürgen Winkler, Prof. Dr. Arnd Dörfler, Prof. Dr. Dimitrios Karampinos
To my family.
Contents

1 Summary (originally in German)¹
2 Introduction
3 Magnetic Resonance
   3.1 Nuclear Magnetic Resonance
   3.2 Magnetic Resonance Imaging
   3.3 q-Space Trajectory Imaging
   3.4 Chemical Exchange Saturation Transfer
4 Machine Learning
   4.1 Theory of Learning²
   4.2 Artificial Neural Networks
   4.3 The generalization puzzle
5 Brain Classification
6 Methods, Results, and Discussion of the Original Paper
   6.1 Strengths
   6.2 Limitations
7 Original Paper
8 List of Abbreviations
9 List of Publications
   9.1 Papers
   9.2 Conference abstracts
   9.3 Talks
   9.4 Interviews
10 Contributions
11 Acknowledgment

¹ Based on my DS-ISMRM abstract [104].
² Based on my introduction for my proof "Growth function uniform convergence" in a mathematical data science seminar.
1 Summary (originally in German)

Background and aims of the original paper "Brain tissues have single-voxel signatures in multi-spectral MRI" [103]. Since the pioneering work of Brodmann [4] and of Vogt and Vogt [5], it has been known that different brain regions have unique cyto- and myeloarchitectonic features. Classifying brain tissues, and other tissues, on the basis of their intrinsic features is a long-standing ambition in magnetic resonance (MR). The idea of classifying tissues by their T1 and T2 relaxation times dates back to before the advent of magnetic resonance imaging (MRI) [21]. Lauterbur in fact used it to motivate MRI [24], and rightly so: the high soft-tissue contrast arising from the T1 and T2 times in the human body is a cornerstone of present-day radiology, which frequently uses relaxation-time-weighted MR images. The first automated approaches to classifying tissues on the basis of intrinsic MR features were presented in the 1980s [28]. Although successful classifications have been reported, for example for up to ten tissue classes [41], the limited number of input features restricted the ability to classify a larger number of tissue classes. For this reason, atlas-based approaches were introduced with great success; they use spatial information to reduce the number of candidate tissue classes at a given position [52]. In the present study, I investigated, in cooperation with an interdisciplinary team, the feasibility of a global brain classification based on intrinsic MR features. To this end, I made use of several technological advances. First, I used a latest-generation 7-tesla scanner, which offers an increased contrast-to-noise ratio for many MR contrasts [90]. Second, I used a novel diffusion MR technique, q-space trajectory imaging [85]. It measures not only the voxel-averaged diffusion metrics but also the variance of the diffusion tensors within a voxel, which is relevant in many regions of cortical gray matter with more than one dominant fiber orientation. Third, I used a chemical exchange saturation transfer (CEST) sequence. The resulting magnetization transfer (MT) contrast appears to be a suitable marker of myelination that can be used to distinguish and segment different cortical regions [83]. In addition, ultra-high-field CEST imaging is rich in information on various chemically relevant tissue components and properties, such as pH, glutamate, phosphocreatine, protein content, and lipids. In this work, I show that a global classification of the brain thereby becomes possible.

Methods. I recruited 38 volunteers. Two high-resolution datasets for the gold-standard segmentation were acquired with the 7-tesla MRI scanner: a 3D pTx T1-weighted MPRAGE dataset (0.65 × 0.65 × 0.65 mm³) and a QSM dataset (quantitative susceptibility mapping, 0.6 × 0.6 × 0.6 mm³) [85, 90]. CEST MRI was performed with a snapshot sequence (1.8 × 1.8 × 3 mm³) [91] at two saturation B1 levels of 0.7 and 1.0 µT, each presaturated at 56 different offsets. The multiple interleaved mode saturation technique (MIMOSA) was used for the presaturation pulse train [93]. After corrections for motion and for B1 and B0 inhomogeneities, the CEST peaks were fitted voxel-wise with a 5-pool Lorentzian model. Diffusion-weighted images were acquired with an echo-planar spin-echo sequence [98] using linear, planar, and spherical b-tensors and b-values of 0, 100, 500, 1000, 1500, and 2000 s/mm² (1.5 × 1.5 × 3 mm³). I corrected for motion, eddy-current effects, and image distortions. Diffusion and covariance tensors were fitted voxel-wise, and the diffusion metrics described in [85] were determined. The data were coregistered to the MPRAGE target dataset using the FSL registration tool FLIRT. The classification approach is visualized in Fig. 2. A dataset was created from all volunteers except one young man (test dataset). For each voxel, the local 15 QTI parameters, 210 diffusion-weighted signals, four Lorentzian CEST amplitude parameters, and 112 Z-spectrum values were extracted from the images, coregistered and interpolated to MPRAGE space, and stored in a 2D array with 6 × 10⁶ rows and 341 columns. The corresponding anatomical region was stored in a one-hot-encoded 2D array with 6 × 10⁶ rows and 102 columns. The two datasets were then shuffled, split, and normalized to zero mean and unit variance, and no longer contained any spatial information. I defined a fully connected neural network in TensorFlow Keras [81]. After training, the network was used to make a voxel-wise prediction for the test participant. Accuracy, defined as the number of correctly classified voxels divided by the total number of voxels, was computed, and a cross-validation was performed. To investigate the classification principle, I computed, among other things, the saliency vectors [77] averaged over the respective regions of the test participant.

Results and observations. Extending earlier work on global brain classification, I included novel high-dimensional contrasts in the input data space. In the single-voxel classification approach, the spatial coherence of the predictions in neighboring voxels serves as an inherent metric of reliability. Current atlas-based classification approaches exceed the observed accuracy of 60% [60]. It is tempting to regard the MR-based patterns as analogous to histological tissue fingerprints [4, 5]. Possible confounders are B0 or B1 inhomogeneities, which increase at 7 tesla compared with lower field strengths and which the network might exploit for classification (see [103] for an extended discussion). For the CEST data, B0 and B1 inhomogeneity were addressed with the MIMOSA approach and with acquired field maps.

Conclusions. Single-voxel classification of brain tissue based on high-field diffusion and CEST features achieves high accuracy. This suggests that unique features of brain regions are discernible not only by histology but also by single-voxel MR signatures.
2 Introduction

An image of an object may be defined as a graphical representation of the spatial distribution of one or more of its properties.
Paul Lauterbur (1929-2007), [24]

In medicine, magnetic resonance imaging (MRI) is a versatile tool used to explore the body noninvasively. Among the main sectional imaging modalities, the underlying nuclear magnetic resonance (NMR) effect is considered to involve the least biological interference compared to diagnostic ultrasonography, which relies on sound waves, computed tomography, which relies on X-rays, and single-photon and positron emission tomography, which rely on radiotracers. Since the inception of all these methods in the second half of the 20th century, there has been a steadily increasing variety of available image acquisition techniques and contrast agents, each yielding 3D images depending on different physical, chemical, and biological tissue properties. In current clinical practice, these individual monochromatic images are reviewed by radiologists or by automated image processing. It is common practice to display multiple images simultaneously (e.g., as a 2 × 2 grid on a monitor screen) or to feed them to image processing algorithms (e.g., with up to four channels) [96].

In contrast, this thesis discards spatial image information and pursues a single-voxel paradigm instead. In general, this paradigm means inferring biological tissue properties of interest from the imaged tissue properties at hand. In particular, the feasibility of whole-brain segmentation using only local MR tissue properties is shown:

[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach, Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and Frederik Bernd Laun. "Brain tissues have single-voxel signatures in multi-spectral MRI". In: NeuroImage 234 (2021), 117986. ISSN: 1053-8119. DOI: https://doi.org/10.1016/j.neuroimage.2021.117986.

This frame text aims to introduce the MRI sequences that were deployed for this study in Section 3 and the relevant machine learning (ML) concepts in Section 4. The history of the single-voxel paradigm and brain segmentation will be elaborated upon in Section 5. Finally, the limitations and implications of the present work will be discussed in Section 6. In 2021, [103] was awarded the Gorter prize of the German Chapter of the International Society for Magnetic Resonance in Medicine.
3 Magnetic Resonance

MRI was introduced by [24] in 1973. It relies on NMR, a phenomenon of absorption and emission of electromagnetic fields that occurs in all nuclei with an odd number of protons or neutrons exposed to a static magnetic field. NMR was discovered by [7, 9, 10] in 1938 and 1946. This section is based on [75] and deals only with the isotope ¹H, the proton, which is abundant in organic material.

3.1 Nuclear Magnetic Resonance

An NMR experiment requires a strong magnet, generating the static field $\vec{B}_0$, and coils that transmit and receive radiofrequency (RF) fields. The magnetic moment vector $\vec{\mu}$ of ¹H exposed to $\vec{B}_0$ performs a spinning process depicted in Figure 1, referred to as precession, with angular frequency $\omega_0$ given by Larmor's equation

$$\omega_0 = \gamma_0 B_0 \tag{1}$$

where $\gamma_0$ denotes the gyromagnetic ratio of $2.68 \times 10^8$ rad/s/tesla. The MR probe will resonate (i.e., absorb and emit RF fields) at this frequency. For the 7-tesla scanner used for this thesis, this resonance results in a radio signal at approximately 298 megahertz with a wavelength of roughly one meter in air and 13 cm in human tissue.

The detectable MR signal strength depends on the equilibrium longitudinal magnetization $\vec{M}_0$ of the probe. $\vec{M}_0$ is parallel to $\vec{B}_0$ and depends on the thermal energy compared to the quantum energy difference between parallel and anti-parallel alignment of $\vec{\mu}$ along $\vec{B}_0$. At the human body temperature $T$ of 310 kelvin, $M_0$ can be approximated by

$$M_0 \approx \frac{\rho_0 \gamma_0^2 \hbar^2}{4kT} B_0 \tag{2}$$

where $\rho_0$ denotes the proton density of the probe, $\hbar$ the reduced Planck constant, and $k$ the Boltzmann constant. It can be shown that the voltage induced in the receive coil that forms the MR signal is proportional to $\omega_0 M_0$. Substituting Equations (1) and (2) for $\omega_0$ and $M_0$ shows the benefit of using high field strengths:

$$\text{MR signal} \propto \rho_0 B_0^2. \tag{3}$$

Let $z$ be the direction of $\vec{B}_0$.
Excitation by an RF pulse is equivalent to rotating the current magnetization vector $\vec{M} := \vec{M}_z + \vec{M}_\perp$ into the transversal plane. Subsequently, $\vec{M}$ is subject to two independent and simultaneous processes termed relaxation:

• Longitudinal relaxation: the longitudinal magnetization $\vec{M}_z$ grows back into $\vec{M}_0$ by energy transfer to the surroundings. After 90° rotation, the growth of $M_z$ is given by

$$M_z(t) = M_0 \left(1 - e^{-t/T_1}\right) \tag{4}$$

where $T_1$ denotes the spin-lattice relaxation time.

• Transversal relaxation: the transversal magnetization $\vec{M}_\perp$ is subject to precession, causing the MR signal. $\vec{M}_\perp$ decays by dephasing of the individual magnetic moments it is composed of. On the one hand, this dispersion occurs due to spin-spin interaction, which is irreversible and quantified using the spin-spin relaxation time $T_2$. On the other hand, dispersion happens due to constant local field inhomogeneities (e.g., iron deposits), which is quantified by $T_{\text{inhom}}$. The latter is reversible by applying a 180° refocusing pulse. After 90° rotation, the decay of $M_\perp$ and, thus, the MR signal is approximately given by

$$M_\perp(t) = M_0 \, e^{-t/T_2} \, e^{-t/T_{\text{inhom}}}. \tag{5}$$

Therefore, the free induction decay (FID) of the MR signal after RF excitation has a time constant $T_2^*$ approximately given by

$$\frac{1}{T_2^*} = \frac{1}{T_2} + \frac{1}{T_{\text{inhom}}}. \tag{6}$$

Figure 1: Illustration of Larmor precession, showing $\vec{B}_0$, $\omega_0$, and $\vec{\mu}$.
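As a quick numerical check of Equations (1) and (4) to (6), the following Python sketch computes the Larmor frequency at 7 tesla and the effective decay constant. The function names and the relaxation times in the example are hypothetical illustration values, not measurements from this thesis.

```python
import math

GAMMA_0 = 2.68e8  # gyromagnetic ratio of 1H in rad/s/T, value quoted in the text


def larmor_frequency_hz(b0_tesla):
    """Equation (1), converted from angular frequency (rad/s) to Hz."""
    return GAMMA_0 * b0_tesla / (2 * math.pi)


def longitudinal_recovery(t, m0, t1):
    """Equation (4): M_z(t) after a 90 degree excitation."""
    return m0 * (1 - math.exp(-t / t1))


def transversal_decay(t, m0, t2, t_inhom):
    """Equation (5): transversal magnetization after a 90 degree excitation."""
    return m0 * math.exp(-t / t2) * math.exp(-t / t_inhom)


def t2_star(t2, t_inhom):
    """Equation (6): 1/T2* = 1/T2 + 1/T_inhom."""
    return 1 / (1 / t2 + 1 / t_inhom)


# Hypothetical relaxation times in seconds, chosen only for illustration.
T1, T2, T_INHOM = 1.2, 0.050, 0.075

frequency_mhz = larmor_frequency_hz(7.0) / 1e6   # close to 298 MHz
effective_t2 = t2_star(T2, T_INHOM)              # about 0.030 s for these values
print(f"Larmor frequency at 7 T: {frequency_mhz:.1f} MHz")
print(f"T2* = {effective_t2 * 1e3:.1f} ms")
```

The product of the two exponentials in Equation (5) equals a single exponential with time constant $T_2^*$, which is why `transversal_decay` and `t2_star` describe the same decay.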
Assume the probe is excited at regular intervals given by a repetition time $TR$, the MR signal is sampled after an echo time $TE < TR$, and we are using a spin-echo sequence (i.e., a 180° refocusing pulse is emitted a time $TE/2$ after a 90° excitation pulse). By Equations (4) and (5), the strength of our MR signal in the steady state will be proportional to

$$M_\perp(TE) = M_0 \left(1 - e^{-TR/T_1}\right) e^{-TE/T_2}. \tag{7}$$

Simply by varying the parameters $TR$ and $TE$, we can determine the four properties $M_0$, $T_1$, $T_2$, and $T_2^*$ of the probe.

3.2 Magnetic Resonance Imaging

The NMR phenomenon described in Section 3.1 can be used to form an image. According to the Larmor Equation (1), spatial variation of the magnetic field $\vec{B}_0$ with a gradient $\vec{G}$ will result in local differences in radio-field absorption and emission of the probe. This spatial encoding can be achieved, for example, by generating three orthogonal gradients of the z-component of $\vec{B}_0$, pointing along the x, y, or z direction. In the seminal paper [24], the gradient was used to produce one-dimensional projections of multiple probe rotations, which were then fed to an image reconstruction algorithm. A common current method for image formation is Fourier imaging: first, we use an xy-slice selection gradient $G_z$ during the RF pulse and a transient phase encoding gradient $G_y$ afterward. During the readout, a frequency encoding gradient $G_x$ converts the MR signal into a spectrum around $\omega_0$, resolved along the x-axis. Repeating this procedure for different $G_y$ is equivalent to acquiring lines of the two-dimensional Fourier transform of the selected xy-slice's signal, referred to as the k-space signal. After completion, k-space data can be converted to an image using algorithms such as [17, 2]. Typically, only the magnitude image is used, whereas the phase image is discarded. Three-dimensional Fourier-based approaches replace the slice selection gradient by another phase encoding gradient.
There are receiver coil sensitivity encoding techniques, which are complementary to Fourier encoding [48]. In quantitative susceptibility mapping (QSM), the phase image is used to infer the magnetic field from the frequency shift, which is then converted to the spatial distribution of magnetic susceptibility $\chi$ by field-to-source inversion [51]. In this thesis, independent operation of multiple transmit coils was used to alleviate RF excitation inhomogeneity at 7 tesla [106, 93]. As already noted by Lauterbur [24], in contrast to other imaging techniques, the resolution of MRI is not limited by the MR signal wavelength: at 7-tesla field strength, the human brain has been resolved to 100 micrometers [92], which is one ten-thousandth of the wavelength in air. Isotropic resolution of 2.8 micrometers has been achieved at low temperatures using a microcoil [88]. Medical applications of MRI use its fine resolution and, even more importantly, its rich soft tissue contrast. Different tissues behave individually regarding $\chi$, $\rho_0$, $T_1$, $T_2$, and $T_2^*$, and many pathological processes delineate well from their surroundings.
Resolvable tissue properties extend far beyond these constants using techniques including

• In-phase/out-of-phase imaging: in organic material, ¹H mainly occurs in water (H₂O) and fat (CH₂ and CH₃). The precession frequency of fat is 3.35 ppm lower than that of water, and this difference can be harnessed for water-fat separation by slightly adjusting $TE$.
• Inversion recovery: due to their different $T_1$ times, tissues can be selectively suppressed by transmitting 180° RF pulses at an inversion time $TI$ before regular excitation. This approach is commonly used to null cerebrospinal fluid (fluid-attenuated inversion recovery, FLAIR) and fat (short TI inversion recovery, STIR).
• Diffusion gradients: by sequentially applying strong, opposite gradients, additional spin dephasing is generated by Brownian motion. Restriction of free Brownian motion serves as a rich source to detect tissue properties. The approach used for this thesis is described in Section 3.3.
• Off-resonant RF pulses: presaturation or excitation with RF fields alternating with a slightly lower or higher frequency than the Larmor frequency of ¹H in water can be used to detect ¹H occurring in other organic compounds. The strategy used for this thesis is described in Section 3.4.
• Perfusion: tissue perfusion can be assessed by arterial spin labeling (ASL) or by administering contrast agents. Besides, there are different angiographic methods. The fact that deoxyhemoglobin is paramagnetic ($\chi > 0$), whereas oxyhemoglobin is diamagnetic ($\chi < 0$), leads to a difference in $T_2^*$, which can be harnessed for functional brain studies.
• X-nuclei: using MRI setups suitable for low signal-to-noise ratios and short $TE$, other nuclei with magnetic moments can be assessed, such as ¹⁷O, ²³Na, ³¹P, ³⁵Cl, ³⁹K, and hyperpolarized gases, including ³He and ¹²⁹Xe.

On the one hand, different MR properties are mostly independent and difficult to predict for individual tissues a priori.
On the other hand, different MR images of the body depict the identical configuration of tissues provided they were acquired at the same point in time and corrected for motion. This identical configuration results in a high degree of mutual information in images of different MR properties. Flexible acquisition techniques to capture these redundant properties simultaneously are being investigated [73, 99]. Recently, deep reinforcement learning (Deep RL) has been shown to be a powerful tool for the operation of complex machines (e.g., robots [94] and tokamaks [116]). Deep RL might pave the way to even more flexible MRI setups, acquisition, and image reconstruction techniques [111, 113].
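The Fourier imaging scheme of Section 3.2 can be illustrated with a toy example. The Python sketch below fills k-space line by line, one line per phase-encoding step, and then reconstructs the magnitude image. It is a minimal sketch under simplifying assumptions: the object is a hypothetical noise-free 8 × 8 slice, relaxation is ignored, and a naive discrete Fourier transform stands in for the fast algorithms used in practice. All function names are illustrative.

```python
import cmath


def acquire_kspace_line(obj, ky):
    """One phase-encoding step: returns the k-space line for index ky."""
    ny, nx = len(obj), len(obj[0])
    return [
        sum(
            obj[y][x] * cmath.exp(-2j * cmath.pi * (ky * y / ny + kx * x / nx))
            for y in range(ny)
            for x in range(nx)
        )
        for kx in range(nx)
    ]


def reconstruct_magnitude(kspace):
    """Inverse 2D DFT of the k-space data, keeping only the magnitude image."""
    ny, nx = len(kspace), len(kspace[0])
    return [
        [
            abs(
                sum(
                    kspace[ky][kx] * cmath.exp(2j * cmath.pi * (ky * y / ny + kx * x / nx))
                    for ky in range(ny)
                    for kx in range(nx)
                )
            ) / (ny * nx)
            for x in range(nx)
        ]
        for y in range(ny)
    ]


# Hypothetical 8 x 8 object: a bright 4 x 4 square on a dark background.
N = 8
phantom = [[1.0 if 2 <= y < 6 and 2 <= x < 6 else 0.0 for x in range(N)] for y in range(N)]

# Repeating the readout for every phase-encoding step fills k-space line by line.
kspace = [acquire_kspace_line(phantom, ky) for ky in range(N)]
image = reconstruct_magnitude(kspace)

max_error = max(abs(image[y][x] - phantom[y][x]) for y in range(N) for x in range(N))
print(f"largest reconstruction error: {max_error:.2e}")
```

Because the toy object is real and non-negative, discarding the phase loses nothing here; in measured data the phase carries information, as used in QSM.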
3.3 q-Space Trajectory Imaging

Since antiquity, random motion of particles in fluids and gases has been observed. This process is referred to as Brownian motion or diffusion. The probability density function $f(x, t)$ for the location $x$ at time $t$ of a particle starting from the origin, in a medium with diffusion constant $D$ along any given axis, was linked to a Gaussian distribution with width $\sqrt{2Dt}$ by Einstein in 1905 [3]:

$$f(x, t) = \frac{1}{\sqrt{4\pi D t}} \, e^{-\frac{x^2}{4Dt}} \tag{8}$$

Diffusion of ¹H, typically bound to H₂O, can be observed by NMR, as demonstrated in 1965 by Stejskal and Tanner [18, 19]. Diffusion of ¹H was first spatially resolved by Le Bihan and Breton in 1985 [27, 29]. Assuming isotropic Gaussian diffusion, the signal strength $S$ after applying bipolar diffusion gradients $\vec{G}_d$ can be expressed as

$$S(b) = S_0 e^{-bD} \iff D = \frac{\ln S_0 - \ln S(b)}{b} \tag{9}$$

where $S_0$ denotes the baseline signal with zero $\vec{G}_d$, $D$ the diffusion coefficient, and $b$ is a measure of the spin dephasing effect of $\vec{G}_d$. $b$ is obtained by defining an integral quantity $\vec{q}$ of $\pm\vec{G}_d$,

$$\vec{q}(t) := \gamma \int_0^t \vec{G}_d(\tau) \, d\tau \tag{10}$$

and integrating the square norm of $\vec{q}$ over the echo time

$$b := \int_0^{TE} \vec{q}(t)^\top \vec{q}(t) \, dt. \tag{11}$$

For example, a rectangular gradient pulse of strength $G$ with duration $\delta$ and activation delay $\Delta$ for the opposite gradient can be defined by

$$\vec{G}_d(t) := \begin{cases} (0, 0, G)^\top, & \text{if } 0 \le t < \delta \\ (0, 0, 0)^\top, & \text{if } \delta \le t < \Delta \\ (0, 0, -G)^\top, & \text{if } \Delta \le t < \delta + \Delta \\ (0, 0, 0)^\top, & \text{if } \delta + \Delta \le t \le TE \end{cases} \tag{12}$$

The b-value for this particular gradient is

$$b = \gamma^2 G^2 \delta^2 \left(\Delta - \frac{\delta}{3}\right). \tag{13}$$

That is to say, gradients that are long, strong, and far apart yield high diffusion weighting. The diffusion coefficient for water at 35 °C is approximately 2.9 µm²/ms [25]. Due to Equation (8), after a typical gradient duration of 50 ms, around one third of H₂O particles will have moved further than $\sqrt{2Dt} \approx 17$ µm. Diffusion in tissue is neither isotropic nor Gaussian because it is compartmentalized by lipid bilayers and densely packed with macromolecules at smaller scales than this displacement (e.g., see https://doi.org/10.7554/eLife.25916.007). Acknowledging this fact prompts the use of multiple gradient directions and strengths. Subjecting a probe to different $\vec{G}$ can be conceptualized as assigning the measured $D$ from identity (9) to the coordinates of $\vec{q}(\delta)$. This three-dimensional data space is referred to as q-space [65]. A further step involves deploying different non-rectangular gradient profiles. Because different gradient profiles correspond to reaching the same point $\vec{q}(\delta)$ by different trajectories, it is termed q-space trajectory imaging (QTI, [85, 98]).

QTI can be formally shown to disentangle tissue microstructure by the diffusion tensor distribution model: approximate the diffusion environment of each voxel by a three-dimensional close-packing of small microenvironments, the diffusion properties of which are described by individual second-order symmetric diffusion tensors $\mathbf{D}$. Assign to each voxel the probability distribution of a random variable $\mathbf{d}$ when sampling from these diffusion tensors. For the sake of simplicity, we resort to the Mandel notation:

$$\mathbf{d} := \left(D_{11}, D_{22}, D_{33}, \sqrt{2} D_{23}, \sqrt{2} D_{13}, \sqrt{2} D_{12}\right) \tag{14}$$

Different distributions of $\mathbf{d}$ might have the same expectation $\mathrm{E}[\mathbf{d}]$ but still different auto-covariance matrices defined by

$$\operatorname{cov}[\mathbf{d}, \mathbf{d}] := \mathrm{E}[\mathbf{d} \otimes \mathbf{d}] - \mathrm{E}[\mathbf{d}] \otimes \mathrm{E}[\mathbf{d}]. \tag{15}$$

In terms of their sensitivity to diffusion microenvironment shapes, q-space trajectories can be grouped by a generalization of the b-value termed b-tensor $\mathbf{B}$:

$$\mathbf{B} := \int_0^{TE} \vec{q}(t) \, \vec{q}(t)^\top \, dt. \tag{16}$$

Let $\mathbf{b}$ denote the Mandel notation of $\mathbf{B}$ as in (14).
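The relation between the gradient waveform, the b-value, and the b-tensor can be checked numerically. The following Python sketch integrates Equations (10) and (16) for the rectangular gradient of Equation (12) and compares the trace of $\mathbf{B}$ with the closed form of Equation (13). The sequence parameters (40 mT/m, $\delta$ = 20 ms, $\Delta$ = 50 ms, $TE$ = 100 ms) are hypothetical values chosen for illustration and are not the settings of the study; with them, the b-value comes out at roughly 2000 s/mm².

```python
import math

GAMMA = 2.68e8  # rad/s/T, value quoted in Section 3.1


def gradient(t, g, delta, big_delta):
    """Equation (12): bipolar rectangular gradient along z, in T/m."""
    if 0 <= t < delta:
        return (0.0, 0.0, g)
    if big_delta <= t < delta + big_delta:
        return (0.0, 0.0, -g)
    return (0.0, 0.0, 0.0)


def b_tensor(g, delta, big_delta, te, steps=10_000):
    """Equations (10) and (16), integrated with the midpoint rule."""
    dt = te / steps
    q = [0.0, 0.0, 0.0]                      # q at the start of the current step
    b = [[0.0] * 3 for _ in range(3)]
    for k in range(steps):
        grad = gradient((k + 0.5) * dt, g, delta, big_delta)
        q_mid = [q[i] + GAMMA * grad[i] * dt / 2 for i in range(3)]
        for i in range(3):
            for j in range(3):
                b[i][j] += q_mid[i] * q_mid[j] * dt
        q = [q[i] + GAMMA * grad[i] * dt for i in range(3)]
    return b


def mandel(m):
    """Equation (14) applied to a symmetric 3 x 3 tensor."""
    r2 = math.sqrt(2)
    return (m[0][0], m[1][1], m[2][2], r2 * m[1][2], r2 * m[0][2], r2 * m[0][1])


# Hypothetical sequence parameters in SI units (T/m and seconds).
G, DELTA, BIG_DELTA, TE = 0.04, 0.02, 0.05, 0.1

B = b_tensor(G, DELTA, BIG_DELTA, TE)
b_numeric = B[0][0] + B[1][1] + B[2][2]   # the trace of B equals Equation (11)
b_analytic = GAMMA**2 * G**2 * DELTA**2 * (BIG_DELTA - DELTA / 3)  # Equation (13)
print(f"b (numeric)  = {b_numeric / 1e6:.0f} s/mm^2")
print(f"b (analytic) = {b_analytic / 1e6:.0f} s/mm^2")
print("Mandel notation of B:", mandel(B))
```

Since the gradient points along z only, the b-tensor has a single non-zero entry, which corresponds to a linear b-tensor in the terminology of the summary.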
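$\mathbf{B}$ naturally extends $b$ with respect to Equation (9):

$$S(\mathbf{B}) = \mathrm{E}\left[S_0 \, e^{-\mathbf{b}^\top \mathbf{d}}\right] \tag{17}$$

This expectation can be approximated by $\mathrm{E}[\mathbf{d}]$ and $\operatorname{cov}[\mathbf{d}, \mathbf{d}]$:

$$\frac{S(\mathbf{B})}{S_0} \approx e^{-\mathbf{b}^\top \mathrm{E}[\mathbf{d}] + \frac{1}{2} \operatorname{Tr}\left((\mathbf{b} \otimes \mathbf{b}) (\operatorname{cov}[\mathbf{d}, \mathbf{d}])^\top\right)} \tag{18}$$

Hence, $\mathrm{E}[\mathbf{d}]$ and $\operatorname{cov}[\mathbf{d}, \mathbf{d}]$ can in turn be approximated from a sufficient number of b-tensors of different shapes, sizes, and orientations. Rotationally invariant parameters of the microstructure can be calculated from $\operatorname{cov}[\mathbf{d}, \mathbf{d}]$ using the matrices $\mathbf{E}_{\text{iso}}$ and $\mathbf{E}_{\text{shear}}$: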
$$\mathbf{E}_{\text{iso}} := \frac{1}{3} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}; \quad \mathbf{E}_{\text{shear}} := \frac{1}{9} \begin{pmatrix} 2 & -1 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ -1 & -1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 3 \end{pmatrix} \tag{19}$$

With the Frobenius inner product, $\mathbf{E}_{\text{iso}}$ and $\mathbf{E}_{\text{shear}}$ can, for example, be used to calculate the macroscopic fractional anisotropy $FA$ of $\mathrm{E}[\mathbf{d}]$ and the microscopic fractional anisotropy $\mu FA$ of the distribution of $\mathbf{d}$:

$$FA^2 := \frac{3}{2} \, \frac{\operatorname{Tr}\left((\mathrm{E}[\mathbf{d}] \otimes \mathrm{E}[\mathbf{d}]) \, \mathbf{E}_{\text{shear}}^\top\right)}{\operatorname{Tr}\left((\mathrm{E}[\mathbf{d}] \otimes \mathrm{E}[\mathbf{d}]) \, \mathbf{E}_{\text{iso}}^\top\right)} \tag{20}$$

$$\mu FA^2 := \frac{3}{2} \, \frac{\operatorname{Tr}\left(\mathrm{E}[\mathbf{d} \otimes \mathbf{d}] \, \mathbf{E}_{\text{shear}}^\top\right)}{\operatorname{Tr}\left(\mathrm{E}[\mathbf{d} \otimes \mathbf{d}] \, \mathbf{E}_{\text{iso}}^\top\right)} \tag{21}$$

$\mu FA$ is more sensitive than $FA$ to the microscopic anisotropy occurring, for example, in crossing fibers, as shown in Figure 2. There are concepts that go even further than QTI but are not yet clinically available: by retrieving phase information from q-space after a combination of ultra-high long and short diffusion gradients, even microscopic shapes of diffusion restricting boundaries can be inferred. This ability was demonstrated by Laun in 2011 [67] and is termed diffusion pore imaging (DPI).

Figure 2: Approximated macroscopic (a, $FA$) and microscopic (b, $\mu FA$) fractional anisotropy maps of an axial brain slice of a healthy volunteer.

3.4 Chemical Exchange Saturation Transfer

Depending on its chemical species, ¹H experiences slightly varied strengths of the static magnetic field $\vec{B}_0$. According to Equation (1), this corresponds to variations in the angular frequency $\omega_p$ of the pool of ¹H occurring in a specific chemical compound. This phenomenon is referred to as chemical shift and was discovered soon after NMR itself [11]. The RF separation of different $\omega_p$ increases linearly with $B_0$. $\omega_p$ is commonly normalized using the water pool:

$$\Delta\omega_p := \frac{\omega_p - \omega_0}{\omega_0} \tag{22}$$

By broadband RF excitation and observation of the MR signal during FID, a spectrum generated by pools of chemical compounds can be acquired. This technique is termed nuclear magnetic resonance spectroscopy (MRS), which was pioneered by Ernst in the 1970s and 80s [20, 26]. MRS has become a principal analytical tool for the identification of molecules and structural elucidation in chemistry and biology [66]. In routine medical imaging, MRS plays a role in confirming brain tumors, which show decreased levels of N-acetyl-aspartate and increased choline [30]. Due to the low concentration of the ¹H pools corresponding to molecular species of interest in tissue, MRS requires on-resonant presaturation of the water pool before off-resonant excitation and has a low signal-to-noise ratio (SNR).

Chemical exchange saturation transfer (CEST [46, 49, 56]) takes a different approach: the rare ¹H pools are addressed by off-resonant presaturation, whereas only the water pool is observed directly. For CEST, off-resonant presaturation is followed by on-resonant excitation repeatedly. This strategy leads to signal amplification from rare pools by their rate $k_p$ of saturation transfer to the water pool. The measured water pool attenuation for different $\Delta\omega$ is referred to as the Z-spectrum $Z(\Delta\omega)$. Saturation transfer occurs due to cross-relaxation and ¹H chemical exchange. In systems of two spins, cross-relaxation gives rise to the nuclear Overhauser effect (NOE), which influences longitudinal ($T_1$) relaxation and depends on interaction distance and mobility, as first described by Solomon in 1955 [12]. In brain tissue, the main contributing effects to the Z-spectrum are:

• Direct presaturation of the water pool. $\Delta\omega_{\text{water}} \in [-1\,\text{ppm}, +1\,\text{ppm}]$.
• Semi-solid macromolecular magnetization transfer (MT, [33]): surfaces of macromolecular structures such as proteins and cell membranes have a very broad resonance frequency $\Delta\omega$. Transfer occurs due to chemical exchange and cross-relaxation with the local water pool with restricted mobility. $\Delta\omega_{\text{MT}} \in [-100\,\text{ppm}, +100\,\text{ppm}]$.
• Relayed-NOE [72]: saturation due to intramolecular NOE in mobile macromolecules such as soluble proteins and lipids relays to the unrestricted water pool at $\Delta\omega_{\text{NOE}} \in [-5\,\text{ppm}, -2\,\text{ppm}]$.
• Amide proton transfer (CEST-APT [54]): proteins are polyamides of amino acids, and the CO–NH bond chemically exchanges its proton with the water pool at an approximate rate of 28 s⁻¹. $\Delta\omega_{\text{amide}} \approx 3.5$ ppm.
• Amine proton transfer (CEST-APEX [70]): amino groups (–NH₂) (e.g., of lysine and arginine) perform rapid chemical exchange of protons with the water pool at a rate of 700–10000 s⁻¹ [42]. $\Delta\omega_{\text{amine}} \approx 2$ ppm.

The increased SNR of CEST comes at the cost of a number of confounding factors that have to be accounted for:

• Spillover dilution: there is unintended direct water presaturation.
• $T_1$-scaling: water pools with short $T_1$ can accumulate less transferred presaturation compared to water pools with long $T_1$.
• $B_0$: throughout the imaging volume, inhomogeneity of the static field leads to location-dependent scaling of $\omega_p$.
• $B_1$: inhomogeneity of the RF pulse for presaturation results in location-dependent variation of Z-spectrum attenuation.
• Temperature and pH: the rate of chemical exchange $k_{\text{ex}}$ of ¹H is dependent on temperature and pH [23]. This dependence has been harnessed for a pH-sensitive contrast by Zhou et al. [55].
• Overlap: the main pools contributing to the Z-spectra have significant overlap with respect to $\omega_p$.

In 2012, Zaiss demonstrated that the effects add up inversely to form the Z-spectrum [71, 79]. This inverse metric can be used to construct a spillover-robust parameter for each pool, referred to as $MTR_{\text{Rex}}$ [80], by decomposing the Z-spectrum into Lorentzian functions $L_p$ for each pool of the form

$$L_p(\Delta\omega) := A_p \, \frac{\Gamma_p^2}{\Gamma_p^2 + 4(\Delta\omega - \delta_p)^2}, \tag{23}$$

where the amplitude $A_p$, width $\Gamma_p$, and offset $\delta_p$ are pool-specific parameters, initialized with empirical starting values and boundaries for a fit algorithm such as [8] optimizing

$$Z(\Delta\omega) \overset{!}{=} c - \sum_p L_p(\Delta\omega), \tag{24}$$

where $c$ is a constant signal reduction. For all five pools, a reference Z-spectrum $Z_{\text{ref},p} := Z_{\text{lab}} + L_p$ can be calculated from the fitted Z-spectrum $Z_{\text{lab}}$ and the respective Lorentzian $L_p$, yielding

$$MTR_{\text{Rex}}(\Delta\omega_p) = \frac{1}{Z_{\text{lab}}(\Delta\omega_p)} - \frac{1}{Z_{\text{ref},p}(\Delta\omega_p)}. \tag{25}$$

In this thesis, Z-spectra have further been corrected for $B_0$ inhomogeneity, using the water peak, and for $B_1$ inhomogeneity, using a $B_1$ map, two $B_1$ values, and a homogeneous presaturation pulse [93]. In Figure 3, the marked effect of different chemical composition on Z-spectra is demonstrated in a phantom measurement. I acquired these data prior to the actual study, using the same scanner and CEST sequence as in [103], with a reduced offset list.
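The Lorentzian decomposition of Equations (23) to (25) can be sketched in a few lines of Python. The example below builds a synthetic five-pool Z-spectrum and evaluates $MTR_{\text{Rex}}$ at each pool center. All pool amplitudes, widths, and offsets are hypothetical values for illustration, and the sketch skips the actual fitting step: it assumes the Lorentzian parameters are already known.

```python
def lorentzian(offset, amplitude, width, center):
    """Equation (23): peak amplitude, full width at half maximum, and center offset."""
    return amplitude * width**2 / (width**2 + 4 * (offset - center) ** 2)


# Hypothetical pool parameters: (amplitude, width in ppm, center in ppm).
POOLS = {
    "water": (0.80, 1.4, 0.0),
    "MT": (0.10, 25.0, -2.0),
    "NOE": (0.08, 3.0, -3.5),
    "amide": (0.03, 1.5, 3.5),
    "amine": (0.02, 2.0, 2.0),
}


def z_spectrum(offset, pools=POOLS, c=1.0):
    """Equation (24): Z equals c minus the sum of all pool Lorentzians."""
    return c - sum(lorentzian(offset, *params) for params in pools.values())


def mtr_rex(pool, pools=POOLS, c=1.0):
    """Equation (25), evaluated at the center offset of the given pool."""
    amplitude, width, center = pools[pool]
    z_lab = z_spectrum(center, pools, c)
    z_ref = z_lab + lorentzian(center, amplitude, width, center)
    return 1 / z_lab - 1 / z_ref


for name in POOLS:
    print(f"{name:>5}: MTR_Rex = {mtr_rex(name):.4f}")
```

The reference spectrum adds the pool's own Lorentzian back to the label spectrum, so the difference of the inverse values isolates that pool's contribution.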
Figure 3: Single-voxel Z-spectra in test tubes with different chemical compounds measured simultaneously in a phantom at 7 tesla. Top: 0.9% sodium chloride, middle: 125 mM creatine, bottom: egg white. Interpolation method: linear. Axes: Z from 0 to 1 (vertical) versus frequency offset from 10 to −10 ppm (horizontal).
4 Machine Learning

The term machine learning was popularized in 1959 by Arthur Samuel in his paper [16], which describes his seminal work on computer checkers when working for IBM [37]. The modern definition of machine learning is provided by [45]: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." In the common case of supervised learning, E is a dataset of features, annotated with a label or target. Typical examples for T are classification and regression, along with accuracy and mean-squared error as P, respectively [84]. Central problems in the field of artificial intelligence have been solved by a machine learning technique termed deep learning [84]. The aim of Section 4.1 is to provide a formal definition of ML, whereas the deep learning algorithm used for this thesis is introduced in Section 4.2. Finally, the mystery of deep learning and recent progress in understanding it is briefly outlined in Section 4.3.

4.1 Theory of Learning

A core task of machine learning (ML) is to infer a binary classification rule from labeled data. This process is formalized in terms of statistical learning theory given in the definitions below.

Definition 4.1. (ML setting) The training dataset $S = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ and new data are drawn independently and at random from a probability distribution $\mathcal{D}$ over the instance space $X$. The data are labeled by a target concept $c^* \subseteq X$, corresponding to the positive class of the binary classification.

Example 4.2. One medical application of ML could be to decide whether a new patient has a particular disease. Possible instance spaces $X$ include $\{0, 1\}^d$ (e.g., for $d$ Boolean-valued sign-and-symptom features of the patient) and $\mathbb{R}^d$ (e.g., for $d$ blood test results).
The target concept $c^*$ would then correspond to the proper diagnosis, and the training dataset $S$ would be a representative collection of old patient records consisting of the clinical or laboratory findings, respectively.

Definition 4.3. (ML task) Using $S$, we choose a hypothesis $h$ from a hypothesis class $H \subseteq \mathcal{P}(X)$, where $\mathcal{P}(X)$ is the power set of $X$. Our goal is to minimize the true error of $h$

$$\mathrm{err}_D(h) = \mathrm{Prob}(h \triangle c^*) = \begin{cases} \int_{h \triangle c^*} f_D(x)\,dx & \text{if } D \text{ is continuous} \\ \sum_{\omega \in h \triangle c^*} f_D(\omega) & \text{if } D \text{ is discrete} \end{cases}$$

where $\triangle$ denotes the symmetric difference $h \triangle c^* = (h \cup c^*) \setminus (h \cap c^*)$, and $f_D$ denotes the probability density or mass function of $D$. We can only measure the training error of $h$

$$\mathrm{err}_S(h) = \frac{|S \cap (h \triangle c^*)|}{|S|}.$$
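The two error notions of Definition 4.3 can be made concrete for a small discrete distribution. The following is a minimal sketch, assuming a toy instance space with six elements and a uniform mass function; the function names, the sets, and the sample are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def symmetric_difference(h, c_star):
    """Return the elements on which hypothesis h and target concept c* disagree."""
    return (h | c_star) - (h & c_star)


def true_error(h, c_star, mass):
    """err_D(h): probability mass of the symmetric difference under a discrete D."""
    return sum(mass[omega] for omega in symmetric_difference(h, c_star))


def training_error(h, c_star, sample):
    """err_S(h): fraction of training points on which h and c* disagree."""
    disagreement = symmetric_difference(h, c_star)
    return Fraction(sum(1 for x in sample if x in disagreement), len(sample))


# Hypothetical toy instance space X = {0, ..., 5} with a uniform distribution D.
X = range(6)
mass = {omega: Fraction(1, 6) for omega in X}
c_star = {0, 1, 2}        # target concept (positive class)
h = {1, 2, 3}             # a candidate hypothesis
sample = [1, 2, 2, 4, 5]  # a training sample that happens to miss 0 and 3

print(true_error(h, c_star, mass))        # mass of the disagreement set {0, 3}
print(training_error(h, c_star, sample))  # no sampled point lies in {0, 3}
```

In this toy case the training error is zero although the true error is one third, which shows why a low training error alone says little about the true error.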
For any given $h$, we refer to the situation in which $\mathrm{err}_S(h)$ and $\mathrm{err}_D(h)$ are both low as generalization of $h$. The opposite, when $\mathrm{err}_S(h)$ is low but $\mathrm{err}_D(h)$ is high, is termed overfitting of $h$ and is a major issue in ML. For a given confidence level $1 - \delta$ and generalization measure $\epsilon$, we are, therefore, seeking a bound for the sample size $n = |S|$. When we do not achieve a perfect fit of $S$ (i.e., $\mathrm{err}_S(h) > 0$), we refer to this as the non-realizable case and formalize the corresponding generalization guarantee as follows:

Definition 4.4. (ML uniform convergence) Consider a fixed $H$ and $\epsilon, \delta > 0$, and let $S$ be drawn from $D$ with size $|S|$. Then, the smallest possible sample size $m := m(H, \epsilon, \delta)$ is termed uniform convergence sample complexity if, with a probability greater than or equal to $1 - \delta$, for every $h$ in $H$, $\mathrm{err}_S(h)$ and $\mathrm{err}_D(h)$ differ by no more than $\epsilon$ for any $|S| \ge m$ regardless of $D$:

$$\mathrm{Prob}\left(\forall h \in H : |\mathrm{err}_S(h) - \mathrm{err}_D(h)| \le \epsilon\right) \ge 1 - \delta.$$

If we assume $|H|$ to be finite, it is possible to derive a uniform convergence sample complexity of the form

$$m = \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{2}{\delta}\right).$$

A tighter measure of the complexity of $H$ that also extends to some cases when $|H|$ is infinite is the Vapnik–Chervonenkis dimension $\mathrm{VCdim}(H)$, introduced by [22].

Definition 4.5. (Shattering) $H$ shatters a set $A$ if each subset of $A$ can be expressed as $A \cap h$, $h \in H$: $\mathcal{P}(A) = \{A \cap h \mid h \in H\}$.

Definition 4.6. (VC-dimension) The VC dimension $\mathrm{VCdim}(H)$ is the size of the largest set shattered by $H$.

4.2 Artificial Neural Networks

Consider the problem of finding a linear separator for a labeled training dataset $S = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ from the instance space $X = \mathbb{R}^d$. Let $l \in \{-1, 1\}^n$ be our labeling vector. We have to find a vector and threshold pair $w \in \mathbb{R}^d$, $t \in \mathbb{R}$ satisfying

$$(w^\top x^{(i)} - t)\, l_i > 0 \quad \text{for all } 1 \le i \le n. \tag{26}$$

This linear separator is referred to as the perceptron. It was the first and simplest artificial neural network, introduced by Frank Rosenblatt [13, 15], and was celebrated as a “digital brain” [14]. It comprises just one neuron, with the sum as the input function and the signum function as the activation function, as visualized in Figure 4. A neural network (NN) can be viewed as an acyclic directed graph composed of perceptrons, and a general architecture is the fully connected feedforward NN (FC-NN) [100].
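Condition (26) can be searched for with the classic perceptron update rule. The following is a minimal sketch under stated assumptions: the function names, the epoch limit, and the toy data (logical AND) are hypothetical, and the loop only terminates successfully if the data are linearly separable.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Search for (w, t) with (w.x - t) * l > 0 for every training point.

    Uses the classic perceptron update rule. The max_epochs safeguard is a
    hypothetical default for data that are not linearly separable.
    """
    d = len(points[0])
    w = [0.0] * d
    t = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, l in zip(points, labels):
            margin = (sum(wj * xj for wj, xj in zip(w, x)) - t) * l
            if margin <= 0:  # condition (26) is violated for this point
                w = [wj + l * xj for wj, xj in zip(w, x)]
                t -= l
                mistakes += 1
        if mistakes == 0:
            return w, t
    raise RuntimeError("no separator found within max_epochs")


def predict(w, t, x):
    """Signum activation applied to the summed, thresholded input."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - t > 0 else -1


# Hypothetical toy data: logical AND on {0, 1}^2 with labels in {-1, 1}.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w, t = train_perceptron(points, labels)
print(all(predict(w, t, x) == l for x, l in zip(points, labels)))
```

The threshold $t$ is treated like a weight on a constant input of $-1$, which is why a misclassified point with label $l$ changes $t$ by $-l$.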
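Figure 4: The Perceptron. [Diagram omitted: the inputs $I_1, I_2, \ldots, I_d$ of the input layer carry the data $x^{(i)}$ and are weighted by $w_1, w_2, \ldots, w_d$; a constant input 1 is weighted by the bias $t$; the input function (sum) feeds the activation function, whose output is the label $l_i$.]

Definition 4.7. (FC-NN) An FC-NN $\Phi$ is given by its activation function $\varrho : \mathbb{R} \to \mathbb{R}$, its number of layers $L \in \mathbb{N}$ and neurons $N \in \mathbb{N}^{L+1}$, which denotes the number of neurons in the input layer $N_0 \in \mathbb{N}$, each hidden layer $N_l \in \mathbb{N}$, $1 \le l \le L - 1$, and the output layer $N_L \in \mathbb{N}$. The number of parameters is given by

$$P(N) := \sum_{l=1}^{L} \left(N_l N_{l-1} + N_l\right) \tag{27}$$

and the parameters are denoted by

$$\theta = \left((W^{(l)}, b^{(l)})\right)_{l=1}^{L} \in \prod_{l=1}^{L} \left(\mathbb{R}^{N_l \times N_{l-1}} \times \mathbb{R}^{N_l}\right) = \mathbb{R}^{P(N)}. \tag{28}$$

For $1 \le l \le L$ we define a recursive sequence, starting with the input vector $\Phi^{(0)}(x, \theta) := x \in \mathbb{R}^{N_0}$:

$$\Phi^{(l)}(x, \theta) := \varrho\left(W^{(l)} \Phi^{(l-1)}(x, \theta) + b^{(l)}\right) \tag{29}$$

The FC-NN is given by the function

$$\Phi : \mathbb{R}^{N_0} \times \mathbb{R}^{P(N)} \to \mathbb{R}^{N_L}; \quad (x, \theta) \mapsto \Phi^{(L)}(x, \theta). \tag{30}$$

$W^{(l)}$, $b^{(l)}$, and $\Phi^{(l)}$ are referred to as the weights, biases, and activations of layer $l$. A common choice for the one-variable activation function is $\varrho(x) := \max\{0, x\}$, where $\varrho$ is applied componentwise in (29). For classification tasks, it is common practice to encode class $i$ by the canonical basis vector $e_i \in \mathbb{R}^{N_L}$, which is referred to as one-hot encoding. In the last recursion step (29), $\varrho$ is replaced by the softmax activation function

$$\sigma : \mathbb{R}^{N_L} \to \mathbb{R}^{N_L}; \quad \sigma_i(z) := \frac{e^{z_i}}{\sum_{j=1}^{N_L} e^{z_j}} \quad \text{for } 1 \le i \le N_L. \tag{31}$$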
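Definition 4.7 translates almost line by line into code. The following is a minimal sketch, assuming a hypothetical network with $N = (2, 3, 2)$ and arbitrary illustrative weights; it is not the architecture used in the thesis. It evaluates the parameter count (27), the recursion (29) with the rectifier $\max\{0, x\}$ in the hidden layer, and the softmax (31) in the output layer.

```python
import math


def num_parameters(N):
    """P(N) from (27): weights plus biases of every layer."""
    return sum(N[l] * N[l - 1] + N[l] for l in range(1, len(N)))


def relu(z):
    """Componentwise max{0, x}."""
    return [max(0.0, zi) for zi in z]


def softmax(z):
    """Softmax (31); subtracting max(z) leaves the result unchanged but avoids overflow."""
    shift = max(z)
    exps = [math.exp(zi - shift) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, a):
    """Compute W a + b for a weight matrix given as a list of rows."""
    return [sum(wij * aj for wij, aj in zip(row, a)) + bi for row, bi in zip(W, b)]


def forward(x, theta):
    """Recursion (29) with the rectifier in hidden layers and softmax in the last layer."""
    a = list(x)
    for W, b in theta[:-1]:
        a = relu(affine(W, b, a))
    W, b = theta[-1]
    return softmax(affine(W, b, a))


# Hypothetical tiny network with N = (2, 3, 2); the weights are arbitrary values.
N = (2, 3, 2)
theta = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.1, -0.2]),
    ([[1.0, 0.0, -1.0], [-1.0, 1.0, 0.5]], [0.0, 0.0]),
]
probs = forward([1.0, 2.0], theta)
print(num_parameters(N))  # 3*2 + 3 + 2*3 + 2
print(probs, sum(probs))
```

Representing $\theta$ as a list of $(W^{(l)}, b^{(l)})$ pairs mirrors (28) directly; a practical implementation would use an array library instead of nested lists.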
The softmax activation function ensures $\sigma(z) \in [0, 1]^{N_L}$ and $\sum_{i=1}^{N_L} \sigma_i(z) = 1$, normalizing $\Phi^{(L)}$ to a probability mass function.

The parameters $\theta$ of $\Phi$ are typically optimized by minimizing a loss function $\mathcal{L}(\Phi(x, \theta), y)$ for each input $x$ and labeling $y$ with a gradient-based method. This method is an iterative algorithm with an update rule using the pointwise derivative and a step size or learning rate $\eta$:

$$\theta^{(i+1)} := \theta^{(i)} - \eta \nabla_\theta \mathcal{L}(\Phi(x^{(i)}, \theta^{(i)}), y^{(i)}) \tag{32}$$

In practice, the adaptive moments (Adam [76]) modification of stochastic gradient descent is commonly employed. In this thesis, the categorical cross-entropy loss function was used, featuring a small $\alpha$ to prevent blow-up and an additional L2-regularization term penalizing large parameters scaled by a small $\beta$:

$$\mathcal{L}(\Phi(x, \theta), y) := -\sum_{i=1}^{N_L} \Phi_i(x, \theta) \log_2(y_i + \alpha) + \beta \|\theta\|_2^2 \tag{33}$$

Dropout [78] is a further empirical regularization technique that was employed. It can be expressed as multiplying each component of each hidden layer $\Phi^{(l)}(x, \theta)$ in the recursive rule (29) with Bernoulli distributed random variables $B_i \sim \mathrm{Bern}(p)$, $1 \le i \le N_l$, where $p$ denotes the dropout rate. When using one-hot encoding, the importance of each dimension of input data $x^{(i)}$ to a classification output $\Phi_j(x^{(i)}, \theta)$ of class $j$ can be assessed using saliency [74], which is determined by evaluating the local derivative

$$\left.\frac{\partial \Phi_j(x, \theta)}{\partial x}\right|_{x^{(i)}}. \tag{34}$$

4.3 The generalization puzzle

The growth function uniform convergence theorem [22] using VCdim from statistical learning theory and the bound $\mathrm{VCdim}(\Phi) \in O(P(N) L \log P(N))$ ([86], using Bachmann-Landau notation) can be employed to prove a bound for the generalization error of an FC-NN with fixed $L$ and a constant $c$ [100]:

$$|\mathrm{err}_D(h) - \mathrm{err}_S(h)| \le c \sqrt{\frac{P(N) \log P(N)}{|S|}} \tag{35}$$

Thus, one would at least demand that $|S|$ be larger than $P(N)$. By contrast, impressive generalization performance is achieved with NNs containing orders of magnitude more parameters than training data.
Not only is bound (35) rendered vacuous by practical experience, but there are even experimental results demonstrating convergence to zero training error of the same architecture on datasets with random labels with barely increased optimization effort [114]. Even convergence on random noise is achieved with some more effort. The brain is another example of heavy overparametrization compared to its amount of experience [57]. The apparent contradiction between these empirical results and statistical learning theory is referred to as the generalization puzzle and has not been resolved despite significant theoretical work. A new, powerful explanation is offered by the lottery ticket hypothesis, stating that “a neural network contains a subnetwork that matches the performance of the trained network already at initialization”. The lottery ticket hypothesis was an empirical serendipity (footnote 3) of Frankle and Carbin in 2018 [89], corroborated by [95].

Apart from surprising generalization capabilities, NNs show outstanding performance with respect to approximation of $S$ and optimization of $\theta$, despite the non-convexity of the gradient $\nabla_\theta \mathcal{L}(\Phi(x, \theta), y)$, and they perform exceptionally well on high-dimensional data [100]. Special properties of the data can be hard-coded into more specialized network architectures, notably the convolutional NN introduced by LeCun in 1989 [32], which harnesses the relatedness of neighboring pixels by using convolutions with multiple small kernels instead of a full-matrix multiplication in the recursion rule (29). By refining the FC-NN with attention mechanisms (i.e., adaptive mechanisms that enhance and diminish parts of the input data), even more powerful general-purpose architectures have been achieved, such as the transformer [87] and the perceiver [109, 108]. These architectures allow large-scale, multi-modal data processing without domain-specific assumptions and are considered a step towards artificial general intelligence [119].

5 Brain Classification

The foundation of macroscopic brain anatomy was laid by the seventh book of Andreas Vesalius's De Humani Corporis Fabrica in 1543 [1]. Around 1900, a breakthrough in microscopic brain anatomy was achieved through advances in light microscopy, new histological stains, and the discovery of the nerve cell. At this time, Korbinian Brodmann published his epoch-making work Comparative Localization Theory of the Cerebral Cortex, in which he proposed a parcellation of the cerebral cortex according to histological aspects [4].
The Brodmann areas correspond to the functional division of the cerebral cortex, as later established by the neurosurgeon Wilder Penfield through electrical stimulation during awake craniotomy [6]. Since then, there has been an explosion of brain research due to new experimental methods, including MRI, which has recently been used to refine the Brodmann parcellation [83].

Footnote 3: The Princes of Serendip found a treasure when looking for something else.

Automated segmentation of anatomical and pathological structures in MRI, particularly in the brain, has continued to be an important challenge. The original techniques for segmentation date back to a time when powerful image processing was not yet available and are, therefore, based on the intensity of the tissue contained in the voxel at different MR sequences (i.e., the single-voxel paradigm). The earliest work on this topic from 1985 used three different MR weightings and an unsupervised clustering program of NASA for multi-spectral satellite images. This program was used to separate healthy brain parenchyma, hemorrhage, and cerebrospinal fluid [28]. Eight years later, the distinction of gray and white matter succeeded on the basis of $T_1$-, $T_2$-, and $\rho_0$-weighted images [38]. For this purpose, the authors manually defined cuboids in the three-dimensional data space (reciprocal of $T_1$, reciprocal of $T_2$, and $\rho_0$) that can be seen in Figure 5. In 2000, with five MRI contrasts ($T_1$, $T_2$, $\rho_0$, gadolinium-$T_1$, and perfusion imaging), even the subdivision into 15 classes, including the thalamus, putamen, caudate, and pallidum, succeeded [50].

Figure 5: Six manually defined cuboids in the single-voxel data space (left) corresponding to anatomical structures in the image space (right). Reprinted with permissions from [38].

At the same time, progress in image processing led to the 2002 article Whole Brain Segmentation by the developers of the Freesurfer program [52], introducing the gold standard of spherical registration, which is still valid today: using $T_1$ images, individual cerebral cortex surfaces are mapped onto a spherical surface, which in turn is mapped onto a uniform atlas. The inverse of the mapping then yields individual atlases of the Brodmann areas. Convolutional NNs have been trained on many Freesurfer segmentations to perform the same task substantially faster [97]. Due to the success of image processing, the multi-parametric and single-voxel paradigms mentioned above could be replaced, and the main research on brain segmentation using this technique preceded the year 2000. Other notable papers include [31, 34, 36, 35, 39, 40, 41, 43, 44, 47]. Since then, Chai et al. [63] and West et al. [68] have used the single-voxel paradigm for volumetric brain analyses, and Bastiani et al. [82] for ex-vivo, high-resolution MR studies of the brain. In the prostate, the single-voxel paradigm has been employed more recently for cancer classification [53, 61, 62, 59, 64, 69, 88].
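The cuboid-based approach described above amounts to testing whether a voxel's feature vector falls inside an axis-aligned box in the data space. The following is a minimal sketch of that idea; the function name, the class labels, and all numeric bounds are hypothetical illustrations and are not taken from [38].

```python
def classify_voxel(features, cuboids, default="unclassified"):
    """Return the label of the first axis-aligned cuboid that contains the feature vector."""
    for label, (lower, upper) in cuboids.items():
        if all(lo <= f <= hi for f, lo, hi in zip(features, lower, upper)):
            return label
    return default


# Hypothetical cuboids in a (1/T1, 1/T2, rho_0) data space, given as
# (lower corner, upper corner). The bounds are made-up illustrative numbers.
cuboids = {
    "white matter": ((1.2, 12.0, 0.60), (1.8, 16.0, 0.75)),
    "gray matter": ((0.7, 9.0, 0.75), (1.2, 12.0, 0.90)),
}

print(classify_voxel((1.5, 14.0, 0.70), cuboids))
print(classify_voxel((0.1, 1.0, 1.00), cuboids))
```

Because the decision uses only the voxel's own feature vector, no spatial neighborhood enters the classification, which is the defining property of the single-voxel paradigm.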
6 Methods, Results, and Discussion of the Original Paper

In the paper “Brain tissues have single-voxel signatures in multi-spectral MRI” [103], the QTI and CEST imaging techniques as introduced in Section 3 were used to acquire 341 raw and computed three-dimensional MR images of the brain of 38 participants. Using the multi-parametric single-voxel paradigm, an FC-NN as described in Section 4 was trained on each individual voxel to predict the correct brain tissue, defined by the gold standard Freesurfer segmentation from Section 5. The FC-NN achieved roughly 60% classification accuracy for 97 tissue classes in the test case. Conceptually, the present work draws on the aforementioned classification work [50], increasing the number of MR features and tissue classes. In contrast to [50], the present work does not rely on the spatial neighborhood.

6.1 Strengths

A natural approach would be to train a 3D-CNN with 341 channels to do the classification task. Typically, a clinical sample size, such as 38, is unlikely to sufficiently cover the distribution $D$ of all possible brain shapes. One possibility is to divide the images into patches of, for example, 7x7x7 voxels, considerably increasing the sample size. This approach is possible and yields a very good classification accuracy (>90% just for QTI data). With so many channels, however, one does not know whether this accuracy comes from brain shape or from tissue-specific multi-parametric MR signatures. Because the latter are a product of the biological tissue properties, they are the favored option. Note that the assessment of high-dimensional single-voxel signatures is baked neither into the architecture of a CNN nor into a radiologist reviewing multiple imaging studies: both rather look out for macroscopic shapes in each channel individually. Consider the following analogy: a customer helpline employs an ML algorithm rating caller rage.
A complex algorithm using the entire conversation might depend either on indicators of rage in the caller's voice or on the content of the spoken words. A simple algorithm (e.g., an FC-NN trained on spectrograms of short time frame windows) will base its predictions only on the caller's voice (and could provide constant updates). Similarly, the single-voxel paradigm could be used to highlight additional biomedical information on multi-parametric MRI and other modalities in clinical radiology. For classification targets with dominant intra-subject variation, just one participant has been shown to suffice as a source of training data.

6.2 Limitations

Every voxel is associated with a participant. Therefore, realistic testing should be performed on a participant who is not contributing voxels to the training dataset $S$. This requirement causes the distribution $D$ for new data to be different, violating Definition 4.1. Furthermore, realistic testing should be performed on more participants. This issue was resolved using 38-fold cross-validation.

Figure 6: Biomedical targets with dominant inter-subject variance. Left: Mean test prediction of age for each participant after 38-fold cross-validation. Right: Confusion matrix of identity prediction, row-normalized. [Plots omitted. Left panel: mean predicted age (years) against true age (years), Pearson's r = 0.87571. Right panel: true identity against predicted identity, color scale 0 to 1.]

Confounding factors, such as inhomogeneity of $B_0$, $B_1$, and coil sensitivity will imprint spatial information into the single-voxel signatures to some extent, facilitating classification. Several strategies were used to account for this effect.

• Regarding CEST, $B_1$-inhomogeneity was mitigated with multiple interleaved pulses [93], and $B_0$- and $B_1$-correction of the Z-spectra was performed using field maps.
• QTI and CEST parameters of tissue classes were averaged and appeared highly individual. Additionally, each 341-dimensional vector was projected into the two-dimensional plane using t-distributed stochastic neighbor embedding [58]. This unsupervised approach showed some grouping into tissue classes.
• Averaged saliency vectors for each tissue class were computed, showing biologically plausible linear b-tensor saliency for anisotropic white matter structures.
• Analogous to the 341-input FC-NN, a three-input FC-NN was trained on $B_0$ and $B_1$ field maps and on a $\rho_0$ intensity map. This countercheck yielded a test accuracy of approximately 20%.
• Tissue class is a biological target with dominant intra-subject variance. By comparison, the subjects' age is a biological target with dominant inter-subject variance and does not depend on spatial information. An analogous 341-input, scalar-output FC-NN was trained to predict age for each voxel. After 38-fold cross-validation, a Pearson's correlation coefficient of 0.88 was achieved. This coefficient was computed using the averaged prediction in each test case after 38-fold cross-validation and the true age. See Figure 6.
• An additional bottleneck layer with 16 neurons between the second and third hidden layer was introduced to the architecture of the FC-NN. Transforming the bottleneck neuron activations back into the brain image domain revealed no obvious spatial encoding inside the network.

Predicting biomedical targets with dominant inter-subject variance is key to using the single-voxel paradigm in a general context. Clinical application of the single-voxel paradigm requires ML algorithms that generalize after training on voxels from few participants. In contrast to the case of age, a 341-input FC-NN trained on the binary classification of Parkinson's disease did not show any discriminative ability on the test cases. The MR methods might simply be too imprecise, the stage of disease too early, or the underlying pathomechanisms too heterogeneous. However, the FC-NN always showed promising performance during training time. A plausible explanation for this discrepancy is the pronounced identity information imprinted into single-voxel signatures: a QTI-only, 225-input FC-NN trained on 80% of voxels to predict the one-hot-encoded identity of the participants showed 40% accuracy in the 20% test voxels (no cross-validation required, confusion matrix shown in Figure 6). This identity-information imprint could be reproduced using CEST only. Further research into the nature and control of the identity information could be beneficial to fully unlock the potential of the single-voxel paradigm.
7 Original Paper

[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach, Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and Frederik Bernd Laun. “Brain tissues have single-voxel signatures in multi-spectral MRI”. In: NeuroImage 234 (2021), p. 117986. ISSN: 1053-8119. DOI: https://doi.org/10.1016/j.neuroimage.2021.117986.
8 List of Abbreviations

CEST    Chemical Exchange Saturation Transfer
FC-NN   Fully Connected Artificial Neural Network
FID     Free Induction Decay
NMR     Nuclear Magnetic Resonance
NN      Neural Network
NOE     Nuclear Overhauser Effect
ML      Machine Learning
MR      Magnetic Resonance
MRI     Magnetic Resonance Imaging
QTI     q-Space Trajectory Imaging
SNR     Signal-to-Noise Ratio
TE      Time to Echo
TR      Repetition Time
9 List of Publications

9.1 Papers

[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach, Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and Frederik Bernd Laun. “Brain tissues have single-voxel signatures in multi-spectral MRI”. In: NeuroImage 234 (2021), p. 117986. ISSN: 1053-8119. DOI: https://doi.org/10.1016/j.neuroimage.2021.117986.

[110] Andrzej Liebert, Katharina Tkotz, Jürgen Herrler, Peter Linz, Angelika Mennecke, Alexander German, Patrick Liebig, Rene Gumbrecht, Manuel Schmidt, Arnd Doerfler, Michael Uder, Moritz Zaiss, and Armin M Nagel. “Whole-brain quantitative CEST MRI at 7T using parallel transmission methods and B1+ correction”. In: Magn. Reson. Med. 86.1 (July 2021), pp. 346–362.

9.2 Conference abstracts

[102] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach, Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and Frederik Laun. “Brain tissues have single-voxel signatures in multi-spectral MRI”. In: Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021.

[107] Leonie E. Hunger, Alexander German, Felix Glang, Katrin M. Khakzar, Nam Dang, Angelika Mennecke, Andreas Maier, Frederik Laun, and Moritz Zaiss. “DeepCEST: 7T Chemical exchange saturation transfer MRI contrast inferred from 3T data via deep learning with uncertainty quantification”. In: Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021.

[101] Moritz Simon Fabian, Felix Glang, Katrin Michaela Khakzar, Angelika Barbara Mennecke, Alexander German, Manuel Schmidt, Burkhard Kasper, Arnd Doerfler, Frederik B. Laun, and Moritz Zaiss. “Reduction of 7T CEST scan time and evaluation by L1-regularised linear projections”. In: Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021.

[105] Felix Glang, Moritz Fabian, Alexander German, Katrin Khakzar, Angelika Mennecke, Frederik Laun, Burkhard Kasper, Manuel Schmidt, Arnd Doerfler, Klaus Scheffler, and Moritz Zaiss. “Linear projection-based CEST reconstruction – the simplest explainable AI”. In: Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021.

[112] Angelika Mennecke, Katrin Khakzar, Kai Herz, Moritz Fabian, Alexander