Brain tissues have single-voxel signatures in
            multi-spectral MRI
        Submitted to the Faculty of Medicine
  of Friedrich-Alexander-Universität Erlangen-Nürnberg
           for the degree of Dr. med.
                        by
          Alexander Simon Maria German
              from Berlin-Steglitz


Approved as a dissertation
by the Faculty of Medicine
of Friedrich-Alexander-Universität Erlangen-Nürnberg
Date of the oral examination: 21 February 2023

Chair of the doctoral committee: Prof. Dr. Markus Friedrich Neurath

Reviewer:   Prof. Dr. Frederik Bernd Laun
Reviewer:   Prof. Dr. Jürgen Winkler
Reviewer:   Prof. Dr. Arnd Dörfler
Reviewer:   Prof. Dr. Dimitrios Karampinos


To my family.


Contents
1 Summary (Zusammenfassung auf Deutsch)¹

2 Introduction

3 Magnetic Resonance
  3.1 Nuclear Magnetic Resonance
  3.2 Magnetic Resonance Imaging
  3.3 q-Space Trajectory Imaging
  3.4 Chemical Exchange Saturation Transfer

4 Machine Learning
  4.1 Theory of Learning²
  4.2 Artificial Neural Networks
  4.3 The generalization puzzle

5 Brain Classification

6 Methods, Results, and Discussion of the Original Paper
  6.1 Strengths
  6.2 Limitations

7 Original Paper

8 List of Abbreviations

9 List of Publications
  9.1 Papers
  9.2 Conference abstracts
  9.3 Talks
  9.4 Interviews

10 Contributions

11 Acknowledgment

  ¹ Based on my DS-ISMRM Abstract [104].
  ² Based on my introduction for my proof "Growth function uniform convergence" in a
mathematical data science seminar.


1    Summary (Zusammenfassung auf Deutsch)
Background and aims of the original paper "Brain tissues have single-voxel
signatures in multi-spectral MRI" [103]. Since the pioneering work of Brodmann
[4] and Vogt and Vogt [5], it has been known that different brain regions have
unique cyto- and myeloarchitectonic features. Classifying brain tissue, and
other tissues, on the basis of their intrinsic features is a long-standing
endeavor in the field of magnetic resonance (MR). The idea of classifying
tissues by their T1 and T2 relaxation times can be traced back to the time
before the advent of magnetic resonance imaging (MRI) [21]. Indeed, Lauterbur
used it to motivate MRI [24], and rightly so; the high soft-tissue contrast
that results from the T1 and T2 times in the human body is a cornerstone of
today's radiology, which frequently uses relaxation-time-weighted MR images.
The first automated approaches to classifying tissues on the basis of intrinsic
MR features were presented in the 1980s [28]. Although successful
classifications were reported, for example, for up to ten tissue classes [41],
the limited number of input features restricted the ability to classify a
larger number of tissue classes. For this reason, atlas-based approaches were
introduced with great success; they use spatial information to reduce the
number of potential tissue classes at a given position [52]. In the present
study, I investigated, in cooperation with an interdisciplinary team, the
feasibility of a global brain classification based on intrinsic MR features.
To this end, I used several technological advances. First, I used a
latest-generation 7-tesla scanner, which offers an increased contrast-to-noise
ratio for many MR contrasts [90]. Second, I used a novel diffusion MR
technique, q-space trajectory imaging [85]. It measures not only the
voxel-averaged diffusion metrics but also the variance of the diffusion tensors
within a voxel, which is relevant in many regions of cortical gray matter with
more than one dominant fiber orientation. Third, I used a chemical exchange
saturation transfer (CEST) sequence. The resulting magnetization transfer (MT)
contrast appears to be a suitable marker of myelination that can be used to
distinguish and segment different cortical regions [83]. In addition,
ultra-high-field CEST imaging is rich in information about various chemically
relevant tissue components and properties, such as pH, glutamate,
phosphocreatine, protein content, and lipids. In this thesis, I show that a
global classification of the brain thus becomes possible.
Methods. I recruited 38 participants. Two high-resolution data sets for the
gold-standard segmentation were acquired on the 7-tesla MRI scanner: a 3D pTx
T1-weighted MPRAGE data set (0.65 × 0.65 × 0.65 mm³) and a QSM data set
(quantitative susceptibility mapping, 0.6 × 0.6 × 0.6 mm³) [85, 90]. CEST MRI
was performed with a snapshot sequence (1.8 × 1.8 × 3 mm³) [91] at two
saturation B1 levels of 0.7 and 1.0 µT, each presaturated at 56 different
offsets. The multiple interleaved mode saturation technique (MIMOSA) was used
for the presaturation pulse train [93]. After corrections for motion and for
B1 and B0 inhomogeneities, the CEST peaks were fitted voxel-wise with a 5-pool
Lorentzian model. Diffusion-weighted images were acquired with an echo-planar
spin-echo sequence [98] with linear, planar, and spherical b-tensors and
b-values of 0, 100, 500, 1000, 1500, and 2000 s/mm² (1.5 × 1.5 × 3 mm³). I
corrected for motion, eddy-current effects, and image distortions. Diffusion
and covariance tensors were fitted voxel-wise, and the diffusion metrics
described in [85] were determined. The data were co-registered to the MPRAGE
target data set using the FSL registration tool FLIRT. The classification
approach is visualized in Fig. 2. A data set was built from all participants
except one young man (test data set). For each voxel, the 15 local QTI
parameters, 210 diffusion-weighted signals, four Lorentzian CEST amplitude
parameters, and 112 z-spectrum values were extracted from the images,
co-registered and interpolated to MPRAGE space, and stored in a 2D array with
6 · 10⁶ rows and 341 columns. The corresponding anatomical region was stored in
a one-hot-encoded 2D array with 6 · 10⁶ rows and 102 columns. The two data sets
were then shuffled, split, and normalized to zero mean and unit variance, and
no longer contained any spatial information. I defined a fully connected
neural network in TensorFlow Keras [81]. After training, the network was used
to make a voxel-wise prediction for the test participant. The accuracy,
defined as the number of correctly classified voxels divided by the total
number of voxels, was calculated, and a cross-validation was performed. To
investigate the classification principle, I computed, among other things, the
saliency vectors [77] averaged over the respective regions of the test
participant.
Results and observations. Extending earlier work on global brain
classification, I included novel high-dimensional contrasts in the input data
space. In the single-voxel classification approach, the spatial coherence of
the predictions in neighboring voxels serves as an inherent metric of
reliability. Current atlas-based classification approaches exceed the observed
accuracy of 60% [60]. It is tempting to regard the MR-based patterns as
analogous to histological tissue fingerprints [4, 5]. Possible confounders are
B0 or B1 inhomogeneities, which increase at 7 tesla compared with lower field
strengths and which the network might also use for classification (see [103]
for an extended discussion). For the CEST data, B0 and B1 inhomogeneity were
addressed with the MIMOSA approach and with acquired field maps.
Conclusions. Single-voxel classification of brain tissue based on high-field
diffusion and CEST features achieves high accuracy. This suggests that unique
features of brain regions are recognizable not only by histology but also by
single-voxel MR signatures.


2    Introduction
                An image of an object may be defined as a graphical representation
                of the spatial distribution of one or more of its properties.

                                                   Paul Lauterbur (1929-2007), [24]

In medicine, magnetic resonance imaging (MRI) is a versatile tool used to
explore the body noninvasively. Among the main cross-sectional imaging
modalities, its underlying nuclear magnetic resonance (NMR) effect is
considered to involve the least biological interference, compared with
diagnostic ultrasonography, which relies on sound waves; computed tomography,
which relies on X-rays; and single-photon and positron emission tomography,
which rely on radiotracers.

Since the inception of all these methods in the second half of the 20th
century, there has been a steadily increasing variety of available image
acquisition techniques and contrast agents, each yielding 3D images depending
on different physical, chemical, and biological tissue properties. In current
clinical practice, these individual monochromatic images are reviewed by
radiologists or by automated image processing. It is common practice to
display multiple images simultaneously (e.g., as a 2 × 2 grid on a monitor
screen) or to feed them to image processing algorithms (e.g., with up to four
channels) [96].

In contrast, image information is discarded in this thesis, and a single-voxel
paradigm is pursued instead. In general, this paradigm means inference of
biological tissue properties of interest from the imaged tissue properties at hand.
In particular, the feasibility of whole-brain segmentation using only local MR
tissue properties is shown:

[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach,
Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin
Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and
Frederik Bernd Laun. “Brain tissues have single-voxel signatures in
multi-spectral MRI”. In: NeuroImage 234 (2021), p. 117986. issn: 1053-8119.
doi: https://doi.org/10.1016/j.neuroimage.2021.117986.

This frame text aims to introduce the MRI sequences that were deployed for
this study in Section 3 and the relevant machine learning (ML) concepts in
Section 4. The history of the single-voxel paradigm and brain segmentation will
be elaborated upon in Section 5. Finally, the limitations and implications of
the present work will be discussed in Section 6. In 2021, [103] was awarded the
Gorter prize of the German Chapter of the International Society for Magnetic
Resonance in Medicine.


3     Magnetic Resonance
MRI was introduced by [24] in 1973. It relies on NMR, which is a phenomenon
of absorption and emission of electromagnetic fields that occurs in all nuclei
with an odd number of protons or neutrons exposed to a static magnetic field.
NMR was discovered by [7, 9, 10] in 1938 and 1946. This section is based on
[75] and will only deal with the isotope 1H, the proton, which is abundant in
organic material.

3.1   Nuclear Magnetic Resonance
An NMR experiment requires a strong magnet, generating the static field
$\vec{B}_0$, and coils that transmit and receive radiofrequency (RF) fields.
The magnetic moment vector $\vec{\mu}$ of 1H exposed to $\vec{B}_0$ performs a
spinning process depicted in Figure 1 referred to as precession, with angular
frequency $\omega_0$, given by Larmor's Equation

    $\omega_0 = \gamma_0 B_0$                                              (1)
where $\gamma_0$ denotes the gyromagnetic ratio of 2.68 × 10⁸ rad/s/T. The MR
probe will resonate (i.e., absorb and emit RF fields) at this frequency. For
the 7-tesla scanner used for this thesis, this resonance results in a radio
signal at approximately 298 megahertz with a wavelength of roughly one meter
in air and 13 cm in human tissue.
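
The numbers quoted above can be reproduced from Equation (1). The following minimal sketch assumes the unrounded gyromagnetic ratio of 2.675 × 10⁸ rad/s/T and, purely for illustration, a relative permittivity of 60 for tissue; that value and the function names are assumptions and are not taken from this thesis.

```python
import math

GAMMA_1H = 2.675e8   # gyromagnetic ratio of 1H in rad/s/T (2.68e8 after rounding)
C_VACUUM = 2.998e8   # speed of light in m/s


def larmor_frequency_hz(b0_tesla, gamma=GAMMA_1H):
    """Equation (1), converted from angular frequency to Hz."""
    return gamma * b0_tesla / (2.0 * math.pi)


def wavelength_m(freq_hz, rel_permittivity=1.0):
    """Wavelength in a lossless, non-magnetic medium."""
    return C_VACUUM / (freq_hz * math.sqrt(rel_permittivity))


f0 = larmor_frequency_hz(7.0)
lambda_air = wavelength_m(f0)
# Assumed relative permittivity of 60 for brain-like tissue (illustrative only).
lambda_tissue = wavelength_m(f0, rel_permittivity=60.0)

print(f"{f0 / 1e6:.0f} MHz")
print(f"{lambda_air:.2f} m in air")
print(f"{lambda_tissue * 100:.0f} cm in tissue")
```

Under these assumptions the script gives about 298 MHz, about 1.0 m in air, and about 13 cm in tissue, in line with the values stated in the text.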

The detectable MR signal strength depends on the equilibrium longitudinal
magnetization $\vec{M}_0$ of the probe. $\vec{M}_0$ is parallel to $\vec{B}_0$
and depends on the thermal energy compared to the quantum energy difference
between parallel and anti-parallel alignment of $\vec{\mu}$ along $\vec{B}_0$.
At the human body temperature T of 310 Kelvin, $M_0$ can be approximated by

    $M_0 \approx \frac{\rho_0 \gamma_0^2 \hbar^2}{4kT} B_0$                (2)
where $\rho_0$ denotes the proton density of the probe, $\hbar$ the reduced
Planck constant, and k the Boltzmann constant. It can be shown that the
voltage induced in the receive coil that forms the MR signal is proportional
to $\omega_0 M_0$. Substituting Equations (1) and (2) for $\omega_0$ and $M_0$
shows the benefit of using high field strengths:

    $\text{MR signal} \propto \rho_0 B_0^2$.                               (3)
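
Written out, the substitution behind Equation (3) reads as follows. This is a short derivation that uses only Equations (1) and (2) and drops all constant factors at fixed temperature:

```latex
\text{MR signal} \;\propto\; \omega_0 M_0
  \;\approx\; (\gamma_0 B_0)\,\frac{\rho_0 \gamma_0^2 \hbar^2}{4kT}\,B_0
  \;=\; \frac{\gamma_0^3 \hbar^2}{4kT}\,\rho_0 B_0^2
  \;\propto\; \rho_0 B_0^2 .
```

As an idealized example that ignores noise and field-dependent relaxation changes, going from 3 T to 7 T would raise this signal by a factor of (7/3)² ≈ 5.4.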
Figure 1: Illustration of Larmor precession (labels in the figure:
$\vec{B}_0$, $\omega_0$, $\vec{\mu}$).

Let z be the direction of $\vec{B}_0$. Excitation by an RF pulse is equivalent
to rotating the current magnetization vector
$\vec{M} := \vec{M}_z + \vec{M}_\perp$ into the transversal plane.
Subsequently, $\vec{M}$ is subject to two independent and simultaneous
processes termed relaxation:

   • Longitudinal relaxation: the longitudinal magnetization $\vec{M}_z$ grows
     back into $\vec{M}_0$ by energy transfer to the surroundings. After 90°
     rotation, the growth of $M_z$ is given by

         $M_z(t) = M_0 (1 - e^{-t/T_1})$                                   (4)

     where $T_1$ denotes the spin-lattice relaxation time.
   • Transversal relaxation: the transversal magnetization $\vec{M}_\perp$ is
     subject to precession, causing the MR signal. $\vec{M}_\perp$ decays by
     dephasing of the individual magnetic moments it is composed of. On the
     one hand, this dispersion occurs due to spin-spin interaction, which is
     irreversible and quantified using the $T_2$ spin-spin relaxation time. On
     the other hand, dispersion happens due to constant local field
     inhomogeneities (e.g., iron deposits), which is quantified by
     $T_{inhom}$. The latter is reversible by applying a 180° refocusing
     pulse. After 90° rotation, the decay of $M_\perp$ and, thus, the MR
     signal is approximately given by

         $M_\perp(t) = M_0 e^{-t/T_2} e^{-t/T_{inhom}}$.                   (5)

     Therefore, the free induction decay (FID) of the MR signal after RF
     excitation has a time constant $T_2^*$ approximately given by

         $\frac{1}{T_2^*} = \frac{1}{T_2} + \frac{1}{T_{inhom}}$.          (6)


Assume the probe is excited at regular intervals given by a repetition time
TR, the MR signal is sampled after an echo time TE < TR, and we are using a
spin-echo sequence (i.e., a 180° refocusing pulse is emitted a time TE/2 after
a 90° excitation pulse). Because the refocusing pulse cancels the $T_{inhom}$
term of Equation (5) at the echo, Equations (4) and (5) imply that the strength
of our MR signal in the steady state will be proportional to

    $M_\perp(TE) = M_0 (1 - e^{-TR/T_1}) e^{-TE/T_2}$.                     (7)

Simply by varying the parameters TR and TE, we can determine the four
properties $M_0$, $T_1$, $T_2$, and $T_2^*$ of the probe.
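
To illustrate how varying TE isolates one of these properties, the sketch below evaluates Equation (7) for two echo times at fixed TR and recovers $T_2$ from the signal ratio. The tissue values are hypothetical and chosen only for illustration, and the function names are not from the original work.

```python
import math


def spin_echo_signal(m0, t1, t2, tr, te):
    """Steady-state spin-echo signal of Equation (7); all times in the same unit."""
    return m0 * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


def t2_from_two_echoes(s1, te1, s2, te2):
    """At fixed TR the T1 term cancels in the ratio s1/s2, leaving only T2 decay."""
    return (te2 - te1) / math.log(s1 / s2)


# Hypothetical tissue parameters in ms (illustrative, not measured values).
M0, T1, T2 = 1.0, 1200.0, 80.0
TR = 3000.0

s_short = spin_echo_signal(M0, T1, T2, TR, te=20.0)
s_long = spin_echo_signal(M0, T1, T2, TR, te=100.0)
t2_est = t2_from_two_echoes(s_short, 20.0, s_long, 100.0)

print(round(t2_est, 3))
```

The same idea applies to $T_1$ by varying TR at fixed TE, although in practice more than two samples and a least-squares fit would be used to cope with noise.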

3.2    Magnetic Resonance Imaging
The NMR phenomenon described in Section 3.1 can be used to form an image.
According to the Larmor Equation (1), spatial variation of the magnetic field
$\vec{B}_0$ with a gradient $\vec{G}$ will result in local differences in
radio-field absorption and emission of the probe. This spatial encoding can be
achieved, for example, by generating three orthogonal gradients of the
z-component of $\vec{B}_0$, pointing along the x, y, or z direction. In the
seminal paper [24], the gradient was used to produce one-dimensional
projections of multiple probe rotations, which were then fed to an image
reconstruction algorithm. A common current method for image formation is
Fourier imaging: first, we use an xy-slice selection gradient $G_z$ during the
RF pulse and a transient phase encoding gradient $G_y$ afterward. During the
readout, a frequency encoding gradient $G_x$ converts the MR signal into a
spectrum around $\omega_0$, resolved along the x-axis. Repeating this
procedure for different $G_y$ is equivalent to acquiring lines of the
two-dimensional Fourier transform of the selected xy-slice's signal, referred
to as the k-space signal. After completion, k-space data can be converted to
an image using algorithms such as [17, 2]. Typically, only the magnitude image
is used, whereas the phase image is discarded. Three-dimensional Fourier-based
approaches replace the slice selection gradient with another phase encoding
gradient. Receiver coil sensitivity encoding techniques are complementary to
Fourier encoding [48]. In quantitative susceptibility mapping (QSM), the phase
image is used to infer the magnetic field from the frequency shift, which is
then converted to the spatial distribution of magnetic susceptibility χ by
field-to-source inversion [51]. In this thesis, independent operation of
multiple transmit coils was used to alleviate RF excitation inhomogeneity at
7 tesla [106, 93].
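
The relation between k-space and image described above can be sketched with a discrete Fourier transform. The example assumes a fully sampled Cartesian k-space, a noise-free synthetic phantom, and NumPy's FFT routines; it is an illustration of the principle and not the reconstruction pipeline used in this thesis.

```python
import numpy as np


def kspace_from_image(image):
    """Simulate fully sampled Cartesian k-space as the centered 2D Fourier transform."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))


def image_from_kspace(kspace):
    """Inverse transform; returns the magnitude and phase images separately."""
    img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return np.abs(img), np.angle(img)


# Hypothetical 64 x 64 phantom: a bright rectangle on a dark background.
phantom = np.zeros((64, 64))
phantom[20:44, 24:40] = 1.0

# By convention assumed here, axis 0 is the phase-encoding direction, so each
# row of `kspace` corresponds to one value of the phase-encoding gradient.
kspace = kspace_from_image(phantom)
magnitude, phase = image_from_kspace(kspace)

print(np.allclose(magnitude, phantom))
```

Only `magnitude` would typically be displayed; `phase` is the part that QSM exploits.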

As already noted by Lauterbur [24], in contrast to other imaging techniques,
the resolution of MRI is not limited by the MR signal wavelength: at 7-tesla
field strength, the human brain has been resolved to 100 micrometers [92],
which is one 10,000th of the wavelength in air. Isotropic resolution of 2.8
micrometers has been achieved at low temperatures using a microcoil [88].
Medical applications of MRI use its fine resolution and, even more
importantly, its rich soft tissue contrast. Different tissues behave
individually regarding χ, ρ0, T1, T2, and T2*, and many pathological processes
delineate well from their surroundings. Resolvable tissue properties extend
far beyond these constants using techniques including:
  • In-phase/Out-of-phase imaging: in organic material, 1H mainly occurs in
    water (H2O) and fat (CH2 and CH3). The precession frequency of fat is
    3.35 ppm lower than that of water, and this difference can be harnessed
    for water-fat separation by slightly adjusting TE.
  • Inversion recovery: due to their different T1 times, tissues can be
    selectively suppressed by transmitting 180° RF pulses at an inversion time
    TI before regular excitation. This approach is commonly used to null
    cerebrospinal fluid (fluid-attenuated inversion recovery, FLAIR) and fat
    (short TI inversion recovery, STIR).
  • Diffusion gradients: by sequentially applying strong, opposite gradients,
    additional spin dephasing is generated by Brownian motion. Restriction
    of free Brownian motion serves as a rich source to detect tissue
    properties. The approach used for this thesis is described in Section 3.3.
  • Off-resonant RF pulses: presaturation or excitation with RF fields
    alternating with a slightly lower or higher frequency than the Larmor
    frequency of 1H in water can be used to detect 1H occurring in other
    organic compounds. The strategy used for this thesis is described in
    Section 3.4.
  • Perfusion: tissue perfusion can be assessed by arterial spin labeling
    (ASL) or administering contrast agents. Besides, there are different
    angiographic methods. The fact that deoxyhemoglobin is paramagnetic
    (χ > 0), whereas oxyhemoglobin is diamagnetic (χ < 0), leads to a
    difference in T2*, which can be harnessed for functional brain studies.
  • X-nuclei: using MRI setups suitable for low signal-to-noise ratios and
    short TE, other nuclei with magnetic moments can be assessed, such as
    17O, 23Na, 31P, 35Cl, 39K, and hyperpolarized gases, including 3He and
    129Xe.
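
As a worked example for the in-phase/out-of-phase item in the list above, the echo times at which water and fat signals are in phase or out of phase follow from the 3.35 ppm shift and Equation (1). The sketch assumes a gradient-echo readout, where the chemical-shift phase is not refocused, and a gyromagnetic ratio of 2.675 × 10⁸ rad/s/T; the function names are illustrative.

```python
import math

GAMMA_1H = 2.675e8     # rad/s/T, gyromagnetic ratio of 1H
FAT_SHIFT_PPM = 3.35   # water-fat chemical shift quoted in the text


def fat_water_offset_hz(b0_tesla):
    """Frequency difference between water and fat at the given field strength."""
    larmor_hz = GAMMA_1H * b0_tesla / (2.0 * math.pi)
    return larmor_hz * FAT_SHIFT_PPM * 1e-6


def phase_echo_times_ms(b0_tesla, n=1):
    """n-th in-phase and n-th out-of-phase echo time after excitation, in ms."""
    period_ms = 1e3 / fat_water_offset_hz(b0_tesla)
    return n * period_ms, (n - 0.5) * period_ms


in_phase_7t, out_of_phase_7t = phase_echo_times_ms(7.0)
print(f"offset at 7 T: {fat_water_offset_hz(7.0):.0f} Hz")
print(f"first in-phase TE: {in_phase_7t:.2f} ms, out-of-phase TE: {out_of_phase_7t:.2f} ms")
```

At 7 T the offset is close to 1 kHz, so shifting TE by roughly half a millisecond is enough to switch between the two conditions.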
On the one hand, different MR properties are mostly independent and difficult
to predict for individual tissues a priori. On the other hand, different MR
images of the body depict the identical configuration of tissues, provided
they were acquired at the same point in time and corrected for motion. This
identical configuration results in a high degree of mutual information in
images of different MR properties. Flexible acquisition techniques to capture
these redundant properties simultaneously are being investigated [73, 99].
Recently, deep reinforcement learning (Deep RL) has been shown to be a
powerful tool for the operation of complex machines (e.g., robots [94] and
tokamaks [116]). Deep RL might pave the way to even more flexible MRI setups,
acquisition, and image reconstruction techniques [111, 113].


3.3    q-Space Trajectory Imaging
Since antiquity, random motion of particles in fluids and gases has been
observed. This process is referred to as Brownian motion or diffusion, and the
probability density function $f(x, t)$ for the location of a particle starting
from the origin at time point t in a medium with diffusion constant D along
any given axis x was linked to a Gaussian distribution with width
$\sqrt{2Dt}$ by Einstein in 1905 [3]:

    $f(x, t) = \frac{1}{\sqrt{4\pi D t}} e^{-\frac{x^2}{4Dt}}$             (8)
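
Equation (8) can be checked with a simple one-dimensional random walk. The sketch below assumes free, unrestricted diffusion and uses the water values quoted later in this section (D = 2.9 µm²/ms, t = 50 ms); the particle count, step count, and function names are arbitrary illustrative choices.

```python
import numpy as np


def simulate_displacements(d, t, n_particles=50_000, n_steps=20, seed=0):
    """1D random walk in which every step is Gaussian with variance 2*D*dt."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    steps = rng.normal(0.0, np.sqrt(2.0 * d * dt), size=(n_particles, n_steps))
    return steps.sum(axis=1)


def gaussian_pdf(x, d, t):
    """Equation (8)."""
    return np.exp(-x**2 / (4.0 * d * t)) / np.sqrt(4.0 * np.pi * d * t)


D, T = 2.9, 50.0                 # µm²/ms and ms
width = np.sqrt(2.0 * D * T)     # expected standard deviation in µm

x = simulate_displacements(D, T)
frac_beyond = float(np.mean(np.abs(x) > width))

# Riemann sum over a wide interval to confirm that Equation (8) is normalized.
grid = np.linspace(-10.0 * width, 10.0 * width, 20001)
total = float(gaussian_pdf(grid, D, T).sum() * (grid[1] - grid[0]))

print(f"sqrt(2Dt) = {width:.1f} µm, simulated std = {x.std():.1f} µm")
print(f"fraction beyond sqrt(2Dt): {frac_beyond:.2f}, integral of f: {total:.4f}")
```

The simulated standard deviation should agree with $\sqrt{2Dt}$ up to sampling noise, and roughly one third of the particles end up further away than this width.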
Diffusion of 1H, typically bound to H2O, can be observed by NMR, as
demonstrated in 1965 by Stejskal and Tanner [18, 19]. Diffusion of 1H was
first spatially resolved by Le Bihan and Breton in 1985 [27, 29]. Assuming
isotropic Gaussian diffusion, the signal strength S after applying bipolar
diffusion gradients $\vec{G}_d$ can be expressed as

    $S(b) = S_0 e^{-bD} \iff D = \frac{\ln S_0 - \ln S(b)}{b}$             (9)

where $S_0$ denotes the baseline signal with zero $\vec{G}_d$, D the diffusion
coefficient, whereas b is a measure of the spin dephasing effect of
$\vec{G}_d$. b is obtained by defining an integral quantity $\vec{q}$ of
$\pm\vec{G}_d$,

    $\vec{q}(t) := \gamma \int_0^t \vec{G}_d(\tau)\, d\tau$                (10)

and integrating the square norm of $\vec{q}$ over the echo time

    $b := \int_0^{TE} \vec{q}(t)^\top \vec{q}(t)\, dt$.                    (11)

For example, a rectangular gradient pulse of strength G with duration δ and
activation delay ∆ for the opposite gradient can be defined by

    $\vec{G}_d(t) := \begin{cases}
        (0, 0, G)^\top,  & \text{if } 0 \le t < \delta \\
        (0, 0, 0)^\top,  & \text{if } \delta \le t < \Delta \\
        (0, 0, -G)^\top, & \text{if } \Delta \le t < \delta + \Delta \\
        (0, 0, 0)^\top,  & \text{if } \delta + \Delta \le t \le TE
    \end{cases}$                                                           (12)

The b-value for this particular gradient is

    $b = \gamma^2 G^2 \delta^2 \left(\Delta - \frac{\delta}{3}\right)$.    (13)

That is to say, gradients that are long, strong, and far apart yield high
diffusion weighting. The diffusion coefficient for water at 35 °C is
approximately 2.9 µm²/ms [25]. Due to Equation (8), after a typical gradient
duration of 50 ms, around one third of H2O particles will have moved further
than $\sqrt{2Dt} \approx 17$ µm. Diffusion in tissue is neither isotropic nor
Gaussian because it is compartmentalized by lipid bilayers and densely packed
with macromolecules at smaller scales than this displacement (e.g., see
https://doi.org/10.7554/eLife.25916.007). Acknowledging this fact prompts the
use of multiple gradient directions and strengths. Subjecting a probe to
different $\vec{G}$ can be conceptualized as assigning the measured D from
identity (9) to the coordinates of $\vec{q}(\delta)$. This three-dimensional
data space is referred to as q-space [65]. A further step involves deploying
different non-rectangular gradient profiles. Because different gradient
profiles correspond to reaching the same point $\vec{q}(\delta)$ by different
trajectories, this approach is termed q-space trajectory imaging (QTI,
[85, 98]).
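
The closed-form b-value of Equation (13) can be cross-checked by integrating Equations (10) and (11) numerically for the rectangular gradient of Equation (12). The gradient strength and timings below are hypothetical example values, γ is taken as 2.675 × 10⁸ rad/s/T, and the function names are illustrative.

```python
GAMMA_1H = 2.675e8   # rad/s/T


def gradient_z(t, g, delta, big_delta):
    """z-component of the bipolar rectangular gradient of Equation (12), in T/m."""
    if 0.0 <= t < delta:
        return g
    if big_delta <= t < big_delta + delta:
        return -g
    return 0.0


def b_value_numeric(g, delta, big_delta, te, n=8000):
    """Integrate Equations (10) and (11) step by step; returns b in s/m²."""
    dt = te / n
    q = 0.0
    b = 0.0
    for i in range(n):
        # The gradient is constant within a step, so q is linear there and
        # the integral of q² over the step is dt * (q0² + q0*q1 + q1²) / 3.
        q_next = q + GAMMA_1H * gradient_z((i + 0.5) * dt, g, delta, big_delta) * dt
        b += dt * (q * q + q * q_next + q_next * q_next) / 3.0
        q = q_next
    return b


def b_value_closed_form(g, delta, big_delta):
    """Equation (13)."""
    return GAMMA_1H**2 * g**2 * delta**2 * (big_delta - delta / 3.0)


# Hypothetical settings: G = 40 mT/m, δ = 20 ms, ∆ = 40 ms, TE = 80 ms.
b_num = b_value_numeric(0.04, 0.02, 0.04, 0.08)
b_ref = b_value_closed_form(0.04, 0.02, 0.04)

print(f"numeric: {b_num / 1e6:.1f} s/mm², closed form: {b_ref / 1e6:.1f} s/mm²")
```

The same numerical routine also works for non-rectangular gradient profiles, for which no simple closed form may be available; only `gradient_z` would need to be replaced.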

QTI can be formally shown to disentangle tissue microstructure by the diffusion
tensor distribution model: approximate the diffusion environment of each voxel
by a three-dimensional close-packing of small microenvironments, the diffusion
properties of which are described by individual second-order symmetric diffusion
tensors D. Assign to each voxel the probability distribution of a random variable
d when sampling from these diffusion tensors. For the sake of simplicity, we
resort to the Mandel notation:
    $d := (D_{11}, D_{22}, D_{33}, \sqrt{2} D_{23}, \sqrt{2} D_{13}, \sqrt{2} D_{12})$   (14)

Different distributions of d might have the same expectation $\mathrm{E}[d]$
but still different auto-covariance matrices defined by

    $\mathrm{cov}[d, d] := \mathrm{E}[d \otimes d] - \mathrm{E}[d] \otimes \mathrm{E}[d]$.   (15)
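
The following sketch illustrates Equations (14) and (15) for two hypothetical voxels that share the same mean tensor but differ in covariance. It assumes equally weighted microenvironments and arbitrary example diffusivities in µm²/ms; the helper names are illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def mandel(tensor):
    """Equation (14): map a symmetric 3x3 tensor to a 6-vector."""
    t = np.asarray(tensor, dtype=float)
    return np.array([t[0, 0], t[1, 1], t[2, 2],
                     SQRT2 * t[1, 2], SQRT2 * t[0, 2], SQRT2 * t[0, 1]])


def mean_and_covariance(tensors):
    """E[d] and Equation (15) for equally weighted microenvironment tensors."""
    d = np.array([mandel(t) for t in tensors])               # shape (n, 6)
    mean = d.mean(axis=0)
    second_moment = np.einsum("ni,nj->ij", d, d) / len(d)    # E[d ⊗ d]
    return mean, second_moment - np.outer(mean, mean)


# Voxel A: three orthogonal "sticks"; voxel B: three identical isotropic "balls".
sticks = [np.diag([3.0, 0.0, 0.0]), np.diag([0.0, 3.0, 0.0]), np.diag([0.0, 0.0, 3.0])]
balls = [np.eye(3), np.eye(3), np.eye(3)]

mean_s, cov_s = mean_and_covariance(sticks)
mean_b, cov_b = mean_and_covariance(balls)

print("same mean:", np.allclose(mean_s, mean_b))
print("covariance vanishes for balls:", np.allclose(cov_b, 0.0))
print("covariance vanishes for sticks:", np.allclose(cov_s, 0.0))
```

The factor $\sqrt{2}$ on the off-diagonal entries makes the dot product of two such 6-vectors equal to the Frobenius inner product of the original tensors.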

In terms of their sensitivity to diffusion microenvironment shapes, q-space
trajectories can be grouped by a generalization of the b-value termed
b-tensor B:

    $B := \int_0^{TE} \vec{q}(t)\, \vec{q}(t)^\top dt$.                    (16)

Let b denote the Mandel notation of B as in (14). B naturally extends the
scalar b-value of Equation (9):

    $S(B) = \mathrm{E}[S_0 e^{-b^\top d}]$                                 (17)
This expectation can be approximated by $\mathrm{E}[d]$ and
$\mathrm{cov}[d, d]$:

    $\frac{S(B)}{S_0} \approx e^{-b^\top \mathrm{E}[d] + \frac{1}{2} \mathrm{Tr}((b \otimes b)\, \mathrm{cov}[d, d]^\top)}$   (18)

Hence, $E[d]$ and $\mathrm{cov}[d, d]$ can in turn be approximated from a sufficient
number of b-tensors of different shapes, sizes, and orientations. Rotationally
invariant parameters of the microstructure can be calculated from
$\mathrm{cov}[d, d]$ using the matrices $E_{iso}$ and $E_{shear}$:


Figure 2: Approximated macroscopic (a, FA) and microscopic (b, µFA) fractional
anisotropy maps of an axial brain slice of a healthy volunteer.

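$$E_{iso} := \frac{1}{3}\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix};\quad
E_{shear} := \frac{1}{9}\begin{pmatrix}
2 & -1 & -1 & 0 & 0 & 0\\
-1 & 2 & -1 & 0 & 0 & 0\\
-1 & -1 & 2 & 0 & 0 & 0\\
0 & 0 & 0 & 3 & 0 & 0\\
0 & 0 & 0 & 0 & 3 & 0\\
0 & 0 & 0 & 0 & 0 & 3
\end{pmatrix} \tag{19}$$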

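With the Frobenius inner product, $E_{iso}$ and $E_{shear}$ can, for example, be used to
calculate the macroscopic fractional anisotropy $FA$ of $E[d]$ and the microscopic
fractional anisotropy $\mu FA$ of the distribution of $d$.

$$FA^2 := \frac{3}{2}\,\frac{\mathrm{Tr}\left((E[d] \otimes E[d])\,E_{shear}^\top\right)}{\mathrm{Tr}\left((E[d] \otimes E[d])\,E_{iso}^\top\right)} \tag{20}$$

$$\mu FA^2 := \frac{3}{2}\,\frac{\mathrm{Tr}\left(E[d \otimes d]\,E_{shear}^\top\right)}{\mathrm{Tr}\left(E[d \otimes d]\,E_{iso}^\top\right)} \tag{21}$$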
$\mu FA$ is more sensitive than $FA$ to the microscopic anisotropy occurring, for
example, in crossing fibers, as shown in Figure 2. Some concepts go even
further than QTI but are not yet clinically available: by retrieving phase
information from q-space after a combination of long and short ultra-high
diffusion gradients, even the microscopic shapes of diffusion-restricting
boundaries can be inferred. This ability was demonstrated by Laun in 2011 [67]
and is termed diffusion pore imaging (DPI).
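
As a minimal sketch of Equations (14) and (19) to (21), the following Python code evaluates $FA$ and $\mu FA$ for a toy voxel. The voxel is assumed to contain two equally weighted, perpendicular stick-like tensors, a crude stand-in for crossing fibers. The diffusivities (1.7 and 0.2 µm²/ms) and all function names are illustrative assumptions, not values or code from the study.

```python
import math

SQRT2 = math.sqrt(2.0)

def mandel(D):
    """Mandel notation (14) of a symmetric 3x3 tensor given as nested lists."""
    return [D[0][0], D[1][1], D[2][2],
            SQRT2 * D[1][2], SQRT2 * D[0][2], SQRT2 * D[0][1]]

def frobenius(A, B):
    """Frobenius inner product Tr(A B^T) = sum_ij A_ij * B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

# E_iso and E_shear as defined in (19).
E_ISO = [[1.0 / 3.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
E_SHEAR = [[0.0] * 6 for _ in range(6)]
for i in range(6):
    for j in range(6):
        if i < 3 and j < 3:
            E_SHEAR[i][j] = (2.0 if i == j else -1.0) / 9.0
        elif i == j:
            E_SHEAR[i][j] = 3.0 / 9.0

def fa_and_mufa(tensors, weights):
    """Return (FA, muFA) of a discrete tensor distribution via (20) and (21)."""
    ds = [mandel(D) for D in tensors]
    # First moment E[d] and second moment E[d (x) d] of the distribution.
    mean_d = [sum(w * d[k] for w, d in zip(weights, ds)) for k in range(6)]
    mean_dd = [[sum(w * d[i] * d[j] for w, d in zip(weights, ds))
                for j in range(6)] for i in range(6)]
    macro = [[a * b for b in mean_d] for a in mean_d]  # E[d] (x) E[d]
    fa2 = 1.5 * frobenius(macro, E_SHEAR) / frobenius(macro, E_ISO)
    mufa2 = 1.5 * frobenius(mean_dd, E_SHEAR) / frobenius(mean_dd, E_ISO)
    return math.sqrt(fa2), math.sqrt(mufa2)

# Assumed toy microenvironments: one stick along x, one along y (um^2/ms).
stick_x = [[1.7, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.2]]
stick_y = [[0.2, 0.0, 0.0], [0.0, 1.7, 0.0], [0.0, 0.0, 0.2]]
fa, mufa = fa_and_mufa([stick_x, stick_y], [0.5, 0.5])
print(f"FA = {fa:.3f}, muFA = {mufa:.3f}")
```

In this toy voxel, averaging the two sticks yields a planar mean tensor, so $FA$ drops, whereas $\mu FA$ retains the anisotropy of the individual sticks.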

3.4     Chemical Exchange Saturation Transfer
Depending on its chemical species, ¹H experiences slightly varied strengths of
the static magnetic field $\vec{B}_0$. According to Equation (1), this corresponds to
variations in the angular frequency $\omega_p$ of the pool of ¹H occurring in a specific
chemical compound. This phenomenon is referred to as chemical shift and
was discovered soon after NMR itself [11]. The RF separation of different $\omega_p$
increases linearly with $B_0$. $\omega_p$ is commonly normalized using the water pool:

$$\Delta\omega_p := \frac{\omega_p - \omega_0}{\omega_0} \tag{22}$$
By broadband RF excitation and observation of the MR signal during FID,
a spectrum generated by pools of chemical compounds can be acquired. This
technique is termed nuclear magnetic resonance spectroscopy (MRS) and was
pioneered by Ernst in the 1970s and 80s [20, 26]. MRS has become a principal
analytical tool for the identification of molecules and structural elucidation in
chemistry and biology [66]. In routine medical imaging, MRS plays a role in
the confirmation of brain tumors, which show decreased levels of
N-acetyl-aspartate and increased choline [30]. Due to the low concentration of
the ¹H pools corresponding to molecular species of interest in tissue, MRS
requires on-resonant presaturation of the water pool before off-resonant
excitation and has a low signal-to-noise ratio (SNR).

Chemical exchange saturation transfer (CEST [46, 49, 56]) takes a different
approach: the rare ¹H pools are addressed by off-resonant presaturation,
whereas only the water pool is observed directly. For CEST, off-resonant
presaturation is repeatedly followed by on-resonant excitation. This strategy
leads to signal amplification from rare pools by their rate $k_p$ of saturation
transfer to the water pool. The measured water pool attenuation for different
$\Delta\omega$ is referred to as the Z-spectrum $Z(\Delta\omega)$. Saturation transfer occurs due to
cross-relaxation and ¹H chemical exchange. In systems of two spins,
cross-relaxation gives rise to the nuclear Overhauser effect (NOE), which
influences longitudinal ($T_1$) relaxation and depends on interaction distance and
mobility, as first described by Solomon in 1955 [12]. In brain tissue, the main
contributing effects to the Z-spectrum are:

  • Direct presaturation of the water pool. $\Delta\omega_{water} \in$ [−1 ppm, +1 ppm].
  • Semi-solid macromolecular magnetization transfer (MT, [33]): surfaces
    of macromolecular structures such as proteins and cell membranes have
    a very broad resonance frequency $\Delta\omega$. Transfer occurs due to chemical
    exchange and cross-relaxation with the local water pool with restricted
    mobility. $\Delta\omega_{MT} \in$ [−100 ppm, +100 ppm].
  • Relayed-NOE [72]: saturation due to intramolecular NOE in mobile
    macromolecules such as soluble proteins and lipids relays to the
    unrestricted water pool at $\Delta\omega_{NOE} \in$ [−5 ppm, −2 ppm].
  • Amide proton transfer (CEST-APT [54]): proteins are polyamides of
    amino acids, and the CO–NH bond chemically exchanges its proton with
    the water pool at an approximate rate of 28 s⁻¹. $\Delta\omega_{amide} \approx$ 3.5 ppm.
  • Amine proton transfer (CEST-APEX [70]): amino groups (–NH₂) (e.g.,
    of lysine and arginine) perform rapid chemical exchange of protons with
    the water pool at a rate of 700–10000 s⁻¹ [42]. $\Delta\omega_{amine} \approx$ 2 ppm.
The increased SNR of CEST comes at the cost of a number of confounding
factors that have to be accounted for:
  • Spillover dilution: there is unintended direct water presaturation.
  • $T_1$-scaling: water pools with short $T_1$ can accumulate less transferred
    presaturation compared to water pools with long $T_1$.
  • $B_0$: throughout the imaging volume, inhomogeneity of the static field
    leads to location-dependent scaling of $\omega_p$.
  • $B_1$: inhomogeneity of the RF pulse for presaturation results in
    location-dependent variation of Z-spectrum attenuation.
  • Temperature and pH: the rate of chemical exchange $k_{ex}$ of ¹H is dependent
    on temperature and pH [23]. This dependence has been harnessed for a
    pH-sensitive contrast by Zhou et al. [55].
  • Overlap: the main pools contributing to the Z-spectra have significant
    overlap with respect to $\omega_p$.
In 2012, Zaiss demonstrated that the effects add up inversely to form the
Z-spectrum [71, 79]. This inverse metric can be used to construct a
spillover-robust parameter for each pool referred to as $MTR_{Rex}$ ([80]) by
decomposing the Z-spectrum into Lorentzian functions $L_p$ for each pool of the form:

$$L_p(\Delta\omega) := A_p\,\frac{\Gamma_p^2}{\Gamma_p^2 + 4(\Delta\omega - \delta_p)^2}, \tag{23}$$

where $A_p$, $\Gamma_p$, $\delta_p$ are pool-specific empirical starting parameters with boundaries
for a fit algorithm such as [8] optimizing

$$Z(\Delta\omega) \overset{!}{=} c - \sum_p L_p(\Delta\omega), \tag{24}$$

where $c$ is a constant signal reduction. For all five pools, a reference Z-spectrum
$Z_{ref,p} := Z_{lab} + L_p$ can be calculated from the fitted Z-spectrum $Z_{lab}$ and the
respective Lorentzian $L_p$, yielding

$$MTR_{Rex}(\Delta\omega_p) = \frac{1}{Z_{lab}(\Delta\omega_p)} - \frac{1}{Z_{ref,p}(\Delta\omega_p)}. \tag{25}$$
In this thesis, Z-spectra have further been corrected for $B_0$ inhomogeneity, using
the water peak, and for $B_1$ inhomogeneity, using a $B_1$ map, two $B_1$ values,
and a homogeneous presaturation pulse [93]. In Figure 3, the marked effect
of different chemical composition on Z-spectra is demonstrated in a phantom
measurement. I acquired these data prior to the actual study, using the same
scanner and CEST sequence as in [103], with a reduced offset list.
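
The construction of $MTR_{Rex}$ from Equations (23) to (25) can be illustrated with a short Python sketch. It skips the fitting step and starts from already known Lorentzian parameters. The amplitudes, widths, and positions in `POOLS` are made-up illustrative numbers, not fitted values from this thesis, and the function names are hypothetical.

```python
def lorentzian(dw, amplitude, width, center):
    """Lorentzian (23); `width` is the full width at half maximum in ppm."""
    return amplitude * width ** 2 / (width ** 2 + 4.0 * (dw - center) ** 2)

# Assumed pool parameters: (A_p, Gamma_p in ppm, delta_p in ppm).
POOLS = {
    "water": (0.80, 1.5, 0.0),
    "MT": (0.10, 40.0, -1.0),
    "NOE": (0.05, 3.0, -3.5),
    "amide": (0.03, 1.5, 3.5),
    "amine": (0.02, 2.0, 2.0),
}

def z_lab(dw, pools=POOLS, c=1.0):
    """Model Z-spectrum (24): constant c minus the sum of all Lorentzians."""
    return c - sum(lorentzian(dw, *params) for params in pools.values())

def z_ref(dw, pool, pools=POOLS, c=1.0):
    """Reference spectrum Z_ref,p := Z_lab + L_p, i.e. without pool p."""
    return z_lab(dw, pools, c) + lorentzian(dw, *pools[pool])

def mtr_rex(pool, pools=POOLS, c=1.0):
    """MTR_Rex (25), evaluated at the resonance offset of the given pool."""
    center = pools[pool][2]
    return 1.0 / z_lab(center, pools, c) - 1.0 / z_ref(center, pool, pools, c)

for name in ("NOE", "amide", "amine"):
    print(f"MTR_Rex({name}) = {mtr_rex(name):.4f}")
```

In practice, the parameters would come from a bounded nonlinear fit of (24) to the measured, $B_0$- and $B_1$-corrected Z-spectrum of each voxel.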

Figure 3: Single-voxel Z-spectra in test tubes with different chemical compounds
measured simultaneously in a phantom at 7 tesla. Top: 0.9% sodium chloride,
middle: 125 mM creatine, bottom: egg white. Interpolation method: linear.
Each panel plots Z (vertical axis, 0 to 1) against the offset in ppm
(horizontal axis, 10 to −10).


4     Machine Learning
The term machine learning was popularized in 1959 by Arthur Samuel in his
paper [16], which describes his seminal work on computer checkers while working
for IBM [37]. The modern definition of machine learning is provided by [45]:
"A computer program is said to learn from experience $E$ with respect to some
class of tasks $T$ and performance measure $P$, if its performance at tasks in
$T$, as measured by $P$, improves with experience $E$." In the common case of
supervised learning, $E$ is a dataset of features, annotated with a label or target.
Typical examples for $T$ are classification and regression, along with accuracy
and mean-squared error as $P$, respectively [84]. Central problems in the field of
artificial intelligence have been solved by a machine learning technique termed
deep learning [84]. The aim of Section 4.1 is to provide a formal definition of ML,
whereas the deep learning algorithm used for this thesis is introduced in Section
4.2. Finally, the mystery of deep learning and recent progress in understanding
it are briefly outlined in Section 4.3.

4.1    Theory of Learning
A core task of machine learning (ML) is to infer a binary classification rule from
labeled data. This process is formalized in terms of statistical learning theory
given in the definitions below.
Definition 4.1. (ML setting) The training dataset $S = \{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$
and new data are drawn independently and at random from a probability
distribution $D$ over the instance space $X$. The data are labeled by a target
concept $c^* \subseteq X$, corresponding to the positive class of the binary classification.
Example 4.2. One medical application of ML could be to decide whether a new
patient has a particular disease. Possible instance spaces $X$ include $\{0, 1\}^d$ (e.g.,
for $d$ Boolean-valued sign-and-symptom features of the patient) and $\mathbb{R}^d$ (e.g., for
$d$ blood test results). The target concept $c^*$ would then correspond to the proper
diagnosis, and the training dataset $S$ would be a representative collection of old
patient records consisting of the clinical or laboratory findings, respectively.
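Definition 4.3. (ML task) Using $S$, we choose a hypothesis $h$ from a hypothesis
class $H \subseteq \mathcal{P}(X)$, where $\mathcal{P}(X)$ is the power set of $X$. Our goal is to minimize
the true error of $h$

$$\mathrm{err}_D(h) = \mathrm{Prob}(h \triangle c^*) = \begin{cases}
\int_{h \triangle c^*} f_D(x)\,dx & \text{if } D \text{ is continuous}\\
\sum_{\omega \in h \triangle c^*} f_D(\omega) & \text{if } D \text{ is discrete}
\end{cases}$$

where $\triangle$ denotes the symmetric difference $h \triangle c^* = (h \cup c^*) \setminus (h \cap c^*)$, and $f_D$
denotes the probability density or mass function of $D$. We can only measure the
training error of $h$

$$\mathrm{err}_S(h) = \frac{|S \cap (h \triangle c^*)|}{|S|}.$$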
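
To make the two error notions of Definition 4.3 concrete, the following sketch uses an assumed toy setting that is not part of the thesis: $X = [0, 1]$, $D$ uniform, and both the target concept and the hypothesis are threshold sets $\{x \geq t\}$. In this setting, the symmetric difference is the interval between the two thresholds, so the true error is its length.

```python
import random

def in_threshold_set(x, threshold):
    """Membership in the set {x >= threshold}."""
    return x >= threshold

def training_error(sample, t_hypothesis, t_concept):
    """err_S(h): fraction of the sample that lies in the symmetric difference."""
    wrong = sum(
        1 for x in sample
        if in_threshold_set(x, t_hypothesis) != in_threshold_set(x, t_concept)
    )
    return wrong / len(sample)

def true_error(t_hypothesis, t_concept):
    """err_D(h) for the uniform density on [0, 1]: length of the interval
    between the two thresholds."""
    return abs(t_hypothesis - t_concept)

rng = random.Random(0)                         # fixed seed for reproducibility
sample = [rng.random() for _ in range(1000)]   # S drawn i.i.d. from D
err_s = training_error(sample, t_hypothesis=0.6, t_concept=0.5)
err_d = true_error(t_hypothesis=0.6, t_concept=0.5)
print(f"err_S = {err_s:.3f}, err_D = {err_d:.3f}")
```

Outside such toy settings, $f_D$ and $c^*$ are unknown, which is why only $\mathrm{err}_S(h)$ is accessible.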


For any given $h$, we refer to the situation in which $\mathrm{err}_S(h)$ and $\mathrm{err}_D(h)$ are
both low as generalization of $h$. The opposite, when $\mathrm{err}_S(h)$ is low but
$\mathrm{err}_D(h)$ is high, is termed overfitting of $h$ and is a major issue in ML. For a
given confidence level $1 - \delta$ and generalization measure $\epsilon$, we are, therefore,
seeking a bound for the sample size $n = |S|$. When we do not achieve a perfect
fit of $S$ (i.e., $\mathrm{err}_S(h) > 0$), we refer to this as the non-realizable case and
formalize the corresponding generalization guarantee as follows:
Definition 4.4. (ML uniform convergence) Consider a fixed $H$ and $\epsilon, \delta > 0$,
and let $S$ be drawn from $D$ with size $|S|$. Then, the smallest possible sample
size $m := m(H, \epsilon, \delta)$ is termed uniform convergence sample complexity if, with a
probability greater than or equal to $1 - \delta$, $\mathrm{err}_S(h)$ and $\mathrm{err}_D(h)$ differ by no
more than $\epsilon$ for every $h$ in $H$ and any $|S| \geq m$, regardless of $D$:

$$\mathrm{Prob}\left(\forall h \in H : |\mathrm{err}_S(h) - \mathrm{err}_D(h)| \leq \epsilon\right) \geq 1 - \delta.$$

If we assume $|H|$ to be finite, it is possible to derive a uniform convergence
sample complexity of the form

$$m = \frac{1}{2\epsilon^2}\left(\ln |H| + \ln \frac{2}{\delta}\right).$$
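
As a quick numerical illustration of the finite-class formula above, the following sketch evaluates $m$ and rounds up to the next integer. The function name and the example values for $|H|$, $\epsilon$, and $\delta$ are assumptions chosen for illustration.

```python
import math

def sample_complexity(hypothesis_count, epsilon, delta):
    """Sample size m = (ln|H| + ln(2/delta)) / (2 epsilon^2), rounded up."""
    if hypothesis_count < 1 or not (0.0 < delta < 1.0) or epsilon <= 0.0:
        raise ValueError("need |H| >= 1, 0 < delta < 1 and epsilon > 0")
    m = (math.log(hypothesis_count) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2)
    return math.ceil(m)

# Example: 2^10 hypotheses, errors within 0.05 with 95% confidence.
m = sample_complexity(2 ** 10, epsilon=0.05, delta=0.05)
print(m)
```

Note that $m$ grows only logarithmically in $|H|$ and $1/\delta$ but quadratically in $1/\epsilon$: halving $\epsilon$ quadruples the required sample size.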
A tighter measure of the complexity of $H$ that also extends to some cases when
$|H|$ is infinite is the Vapnik–Chervonenkis dimension $\mathrm{VCdim}(H)$, introduced
by [22].
Definition 4.5. (Shattering) $H$ shatters a set $A$ if each subset of $A$ can be
expressed as $A \cap h$, $h \in H$:

$$\mathcal{P}(A) = \{A \cap h \mid h \in H\}.$$

Definition 4.6. (VC-dimension) The VC dimension $\mathrm{VCdim}(H)$ is the size
of the largest set shattered by $H$.
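
Definitions 4.5 and 4.6 can be checked by brute force for simple hypothesis classes on the real line. The sketch below uses two textbook classes chosen for illustration, neither of which appears in the thesis: threshold sets $\{x \geq t\}$ and closed intervals $[a, b]$. Because it only searches subsets of a finite candidate set, the result is in general a lower bound on the VC dimension. For these two classes, it coincides with the known values 1 and 2.

```python
from itertools import combinations

def traces_thresholds(points):
    """All subsets of `points` of the form {x in points : x >= t}."""
    pts = sorted(points)
    cuts = pts + [pts[-1] + 1.0]  # one cut per point, plus one above all points
    return {frozenset(x for x in pts if x >= t) for t in cuts}

def traces_intervals(points):
    """All subsets of `points` of the form {x in points : a <= x <= b}."""
    pts = sorted(points)
    result = {frozenset()}  # an interval that misses every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            result.add(frozenset(pts[i:j + 1]))
    return result

def shatters(traces, points):
    """Definition 4.5: every subset of `points` must occur as a trace."""
    return len(traces(points)) == 2 ** len(points)

def vc_dimension(traces, candidates, max_size):
    """Largest k <= max_size such that some k-subset of `candidates` is shattered."""
    best = 0
    for k in range(1, max_size + 1):
        if any(shatters(traces, list(a)) for a in combinations(candidates, k)):
            best = k
    return best

candidates = [0.0, 1.0, 2.0, 3.0]
print(vc_dimension(traces_thresholds, candidates, 3))
print(vc_dimension(traces_intervals, candidates, 3))
```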

4.2    Artificial Neural Networks
Consider the problem of finding a linear separator for a labeled training dataset
$S = \{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$ from the instance space $X = \mathbb{R}^d$. Let $l \in \{-1, 1\}^n$ be
our labeling vector. We have to find a vector and threshold pair $w \in \mathbb{R}^d$, $t \in \mathbb{R}$
satisfying

$$(w^\top x^{(i)} - t)\,l_i > 0 \quad \text{for all } 1 \leq i \leq n. \tag{26}$$

This linear separator is referred to as perceptron. It was the first and simplest
artificial neural network, introduced by Frank Rosenblatt [13, 15], and
was celebrated as a "digital brain" [14]. It comprises just one neuron, the sum
as an input function, and the signum function as the activation function, as
visualized in Figure 4. A neural network (NN) can be viewed as an acyclic
directed graph composed of perceptrons, and a general architecture is the fully
connected feedforward NN (FC-NN) [100].
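
One way to find a pair $(w, t)$ satisfying (26) is the classical perceptron update rule, which the text above does not spell out and which is used here as an assumption: whenever a sample violates (26), add $l_i x^{(i)}$ to $w$ and subtract $l_i$ from $t$. The following sketch applies this rule to a small, linearly separable toy dataset (a logical AND). The function name and the data are illustrative.

```python
def train_perceptron(xs, labels, max_epochs=100):
    """Search for (w, t) with (w.x - t) * l > 0 for all samples, see (26)."""
    w = [0.0] * len(xs[0])
    t = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, l in zip(xs, labels):
            margin = (sum(wi * xi for wi, xi in zip(w, x)) - t) * l
            if margin <= 0:
                # Classical perceptron update on the violated sample.
                w = [wi + l * xi for wi, xi in zip(w, x)]
                t -= l
                mistakes += 1
        if mistakes == 0:
            return w, t  # every sample satisfies (26)
    raise RuntimeError("no separator found; data may not be linearly separable")

# Toy data: label +1 only if both inputs are 1.
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, 1]
w, t = train_perceptron(xs, labels)
print(w, t)
```

For data that are not linearly separable, the loop never reaches a mistake-free epoch, which is why the sketch stops after `max_epochs` passes.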

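Figure 4: The Perceptron. The data $x^{(i)}$ enter the input layer $I_1, I_2, \dots, I_d$
and are multiplied by the weights $w_1, w_2, \dots, w_d$. Together with the bias $t$
(attached to a constant input of 1), they pass through the input function
(sum) and the activation function to yield the label $l_i$.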

Definition 4.7. (FC-NN) An FC-NN $\Phi$ is given by its activation function
$\varrho : \mathbb{R} \to \mathbb{R}$, its number of layers $L \in \mathbb{N}$ and neurons $N \in \mathbb{N}^{L+1}$, which denotes
the number of neurons in the input layer $N_0 \in \mathbb{N}$, each hidden layer $N_l \in \mathbb{N}$,
$1 \leq l \leq L - 1$, and the output layer $N_L \in \mathbb{N}$. The number of parameters is given by

$$P(N) := \sum_{l=1}^{L} \left(N_l N_{l-1} + N_l\right) \tag{27}$$

and the parameters are denoted by

$$\theta = \left((W^{(l)}, b^{(l)})\right)_{l=1}^{L} \in \bigtimes_{l=1}^{L} \left(\mathbb{R}^{N_l \times N_{l-1}} \times \mathbb{R}^{N_l}\right) = \mathbb{R}^{P(N)}. \tag{28}$$

For $1 \leq l \leq L$ we define a recursive sequence, starting with the input vector
$\Phi^{(0)}(x, \theta) := x \in \mathbb{R}^{N_0}$:

$$\Phi^{(l)}(x, \theta) := \varrho\left(W^{(l)} \Phi^{(l-1)}(x, \theta) + b^{(l)}\right) \tag{29}$$

The FC-NN is given by the function

$$\Phi : \mathbb{R}^{N_0} \times \mathbb{R}^{P(N)} \to \mathbb{R}^{N_L};\quad (x, \theta) \mapsto \Phi^{(L)}(x, \theta). \tag{30}$$

$W^{(l)}$, $b^{(l)}$, and $\Phi^{(l)}$ are referred to as the weights, biases, and activations of layer
$l$.
A common choice for the one-variable activation function is $\varrho(x) := \max\{0, x\}$,
where $\varrho$ is applied component-wise in (29). For classification tasks, it is common
practice to encode class $i$ by the canonical basis vector $e_i \in \mathbb{R}^{N_L}$, which is
referred to as one-hot encoding. In the last recursion step (29), $\varrho$ is replaced by
the softmax activation function

$$\sigma : \mathbb{R}^{N_L} \to \mathbb{R}^{N_L};\quad \sigma_i(z) := \frac{e^{z_i}}{\sum_{j=1}^{N_L} e^{z_j}} \quad \text{for } 1 \leq i \leq N_L. \tag{31}$$
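
Definition 4.7 translates almost line by line into code. The following standard-library sketch implements the parameter count (27), the recursion (29) with $\varrho = \max\{0, x\}$, and the softmax (31) in the last layer. The layer sizes `[4, 8, 3]`, the random initialization, and all function names are illustrative assumptions. This is not the network used in the study.

```python
import math
import random

def softmax(z):
    """Softmax (31); subtracting max(z) avoids overflow without changing the result."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def num_parameters(sizes):
    """P(N) from (27) for sizes = [N_0, ..., N_L]."""
    return sum(sizes[l] * sizes[l - 1] + sizes[l] for l in range(1, len(sizes)))

def init_parameters(sizes, rng):
    """theta as a list of (W, b) per layer; the Gaussian scale 0.1 is arbitrary."""
    params = []
    for l in range(1, len(sizes)):
        W = [[rng.gauss(0.0, 0.1) for _ in range(sizes[l - 1])]
             for _ in range(sizes[l])]
        b = [0.0] * sizes[l]
        params.append((W, b))
    return params

def forward(x, params):
    """Recursion (29): ReLU in hidden layers, softmax in the output layer."""
    activation = list(x)
    for l, (W, b) in enumerate(params, start=1):
        z = [sum(w * a for w, a in zip(row, activation)) + bias
             for row, bias in zip(W, b)]
        if l == len(params):
            activation = softmax(z)
        else:
            activation = [max(0.0, v) for v in z]
    return activation

sizes = [4, 8, 3]  # N_0 = 4 inputs, one hidden layer with 8 neurons, 3 classes
params = init_parameters(sizes, random.Random(0))
probs = forward([0.5, -1.0, 2.0, 0.1], params)
print(num_parameters(sizes), probs)
```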

The softmax activation function assures $\sigma(z) \in [0, 1]^{N_L}$ and $\sum_{i=1}^{N_L} \sigma_i(z) = 1$,
normalizing $\Phi^{(L)}$ to a probability mass function. The parameters $\theta$ of $\Phi$ are
typically optimized by minimizing a loss function $\mathcal{L}(\Phi(x, \theta), y)$ for each input $x$
and labeling $y$ with a gradient-based method. This method is an iterative
algorithm with an update rule using the pointwise derivative and a step size or
learning rate $\eta$:

$$\theta^{(i+1)} := \theta^{(i)} - \eta \nabla_\theta \mathcal{L}\left(\Phi(x^{(i)}, \theta^{(i)}), y^{(i)}\right) \tag{32}$$
In practice, the adaptive moments (Adam [76]) modification of stochastic
gradient descent is commonly employed. In this thesis, the categorical
cross-entropy loss function was used, featuring a small $\alpha$ to prevent blow-up and an
additional L2-regularization term penalizing large parameters scaled by a small
$\beta$:

$$\mathcal{L}(\Phi(x, \theta), y) := -\sum_{i=1}^{N_L} \Phi_i(x, \theta) \log_2(y_i + \alpha) + \beta \lVert\theta\rVert_2^2 \tag{33}$$

Dropout [78] is a further empirical regularization technique that was employed
and that can be expressed as multiplying each component of each hidden layer
$\Phi^{(l)}(x, \theta)$ in the recursive rule (29) with Bernoulli distributed random variables
$B_i \sim \mathrm{Bern}(p)$, $1 \leq i \leq N_l$, where $p$ denotes the dropout rate. When using
one-hot encoding, the importance of each dimension of input data $x^{(i)}$ to a
classification output $\Phi_j(x_0, \theta)$ of class $j$ can be assessed using saliency [74],
which is determined by evaluating the local derivative

$$\left.\frac{\partial \Phi_j(x, \theta)}{\partial x}\right|_{x^{(i)}}. \tag{34}$$
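
The local derivative (34) can be approximated numerically without any automatic differentiation. The sketch below uses central finite differences on a deliberately tiny stand-in model, a single softmax layer with hand-picked weights. The model, its weights, the step size `h`, and the function names are assumptions for illustration only. In practice, the gradient would be obtained by backpropagation through the trained network.

```python
import math

def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def model(x):
    """Hypothetical two-class classifier softmax(W x + b) standing in for Phi(x, theta)."""
    W = [[2.0, 0.0, -1.0],
         [0.0, 0.5, 1.0]]
    b = [0.0, 0.1]
    z = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(z)

def saliency(f, x, j, h=1e-5):
    """Central-difference estimate of d f_j / d x_k at x for every input dimension k."""
    grads = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grads.append((f(x_plus)[j] - f(x_minus)[j]) / (2.0 * h))
    return grads

x0 = [0.3, -0.2, 0.5]
s = saliency(model, x0, j=0)
print(s)
```

Input dimensions with a large absolute saliency are those to which the class-$j$ output reacts most strongly at this particular input. The sign indicates the direction of the reaction.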

4.3    The generalization puzzle
The growth function uniform convergence theorem [22] using VCdim from
statistical learning theory and the bound $\mathrm{VCdim}(\Phi) \in O(P(N)\,L \log P(N))$ ([86]
using Bachmann-Landau notation) can be employed to prove a bound for the
generalization error of an FC-NN with fixed $L$ and a constant $c$ [100]:

$$|\mathrm{err}_D(h) - \mathrm{err}_S(h)| \leq c\,\sqrt{\frac{P(N) \log P(N)}{|S|}} \tag{35}$$
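
A few numbers show how quickly bound (35) loses its meaning. The sketch below evaluates the right-hand side for two assumed regimes. The constant $c$ is unknown and set to 1 purely for illustration, and the parameter and sample counts are made up. Since errors are probabilities, any value of 1 or more carries no information.

```python
import math

def generalization_bound(num_params, num_samples, c=1.0):
    """Right-hand side of (35); c = 1 is an arbitrary placeholder."""
    return c * math.sqrt(num_params * math.log(num_params) / num_samples)

# Classical regime: far more samples than parameters.
classical = generalization_bound(num_params=100, num_samples=1_000_000)
# Overparametrized regime: far more parameters than samples.
overparametrized = generalization_bound(num_params=1_000_000, num_samples=10_000)
print(f"{classical:.4f} {overparametrized:.1f}")
```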

Thus, one would at least demand that $|S|$ be larger than $P(N)$. By contrast,
impressive generalization performance is achieved with NNs containing orders
of magnitude more parameters than training data. Not only is bound (35)
rendered vacuous by practical experience, but there are even experimental
results demonstrating convergence to zero training error of the same architecture
on datasets with random labels with barely increased optimization effort [114].
Even convergence on random noise is achieved with some more effort. The brain
is another example of heavy overparametrization compared to its amount of
experience [57]. The apparent contradiction between these empirical results and
statistical learning theory is referred to as the generalization puzzle and has not
been resolved despite significant theoretical work. A new, powerful explanation
is offered by the lottery ticket hypothesis, stating that "a neural network contains
a subnetwork that matches the performance of the trained network already at
initialization". The lottery ticket hypothesis was an empirical serendipity³ of
Frankle and Carbin in 2018 [89], corroborated by [95].

Apart from surprising generalization capabilities, NNs show outstanding
performance with respect to approximation of $S$ and optimization of $\theta$, despite the
non-convexity of the loss $\mathcal{L}(\Phi(x, \theta), y)$ in $\theta$, and they perform exceptionally
well on high-dimensional data [100]. Special properties of the data can be
hard-coded into more specialized network architectures, notably the
convolutional NN introduced by LeCun in 1989 [32], which harnesses the relatedness
of neighboring pixels by using convolutions with multiple small kernels instead
of a full-matrix multiplication in the recursion rule (29). By refining the
FC-NN with attention mechanisms (i.e., adaptive mechanisms that enhance and
diminish parts of the input data), even more powerful general-purpose
architectures have been achieved, such as the transformer [87] and the perceiver [109,
108]. These architectures allow large-scale, multi-modal data processing without
domain-specific assumptions and are considered a step towards artificial general
intelligence [119].

5      Brain Classification
The foundation of macroscopic brain anatomy was laid by the seventh book
of Andreas Vesalius's De humani corporis fabrica in 1543 [1]. Around 1900, a
breakthrough in microscopic brain anatomy was achieved through advances in
light microscopy, new histological stains, and the discovery of the nerve cell. At
this time, Korbinian Brodmann published his epoch-making work Comparative
Localization Theory of the Cerebral Cortex, in which he proposed a parcellation
of the cerebral cortex according to histological aspects [4]. The Brodmann areas
correspond to the functional division of the cerebral cortex, as later established
by the neurosurgeon Wilder Penfield through electrical stimulation during awake
craniotomy [6]. Since then, there has been an explosion of brain research due
to new experimental methods, including MRI, which has recently been used to
refine the Brodmann parcellation [83].

Automated segmentation of anatomical and pathological structures in MRI,
particularly in the brain, has continued to be an important challenge. The
original techniques for segmentation date back to a time when powerful image
processing was not yet available and are, therefore, based on the intensity of
the tissue contained in the voxel at different MR sequences (i.e., the
single-voxel paradigm). The earliest work on this topic from 1985 used three different
MR weightings and an unsupervised clustering program of NASA for
multi-spectral satellite images. This program was used to separate healthy brain
parenchyma, hemorrhage, and cerebrospinal fluid [28]. Eight years later, the
distinction of gray and white matter succeeded on the basis of $T_1$-, $T_2$-, and
$\rho_0$-weighted images [38]. For this purpose, the authors manually defined cuboids
in the three-dimensional data space (reciprocal of $T_1$, reciprocal of $T_2$, and $\rho_0$)
that can be seen in Figure 5. In 2000, with five MRI contrasts, even the
subdivision into 15 classes including the thalamus, putamen, caudate nucleus, and
pallidum succeeded ($T_1$, $T_2$, $\rho_0$, Gadolinium-$T_1$, and perfusion imaging [50]). At
the same time, progress in image processing led to the 2002 article Whole Brain
Segmentation by the developers of the Freesurfer program [52], introducing the
gold standard of spherical registration, which is still valid today: using $T_1$
images, individual cerebral cortex surfaces are mapped onto a spherical surface,
which in turn is mapped onto a uniform atlas. The inverse of the mapping then
yields individual atlases of the Brodmann areas. Convolutional NNs have been
trained on many Freesurfer segmentations to perform the same task
substantially faster [97].

Figure 5: Six manually defined cuboids in the single-voxel data space (left)
corresponding to anatomical structures in the image space (right). Reprinted
with permissions from [38].

³ The Princes of Serendip found a treasure when looking for something else.
Due to the success of image processing, the multi-parametric and single-voxel
paradigms mentioned above were largely replaced, and most research on brain
segmentation using this technique predates the year 2000. Other notable
papers include [31, 34, 36, 35, 39, 40, 41, 43, 44, 47]. Since then, Chai et al.
[63] and West et al. [68] have used the single-voxel paradigm for volumetric
brain analyses, and Bastiani et al. [82] for ex-vivo, high-resolution MR studies
of the brain. In the prostate, the single-voxel paradigm has been employed more
recently for cancer classification ([53, 61, 62, 59, 64, 69, 88]).

6     Methods, Results, and Discussion of the Original Paper
In the paper "Brain tissues have single-voxel signatures in multi-spectral MRI"
[103], the QTI and CEST imaging techniques as introduced in Section 3 were
used to acquire 341 raw and computed three-dimensional MR images of the
brain of 38 participants. Using the multi-parametric single-voxel paradigm, an
FC-NN as described in Section 4 was trained on each individual voxel to predict
the correct brain tissue, defined by the gold standard Freesurfer segmentation
from Section 5. The FC-NN achieved roughly 60% classification accuracy for
97 tissue classes in the test case. Conceptually, the present work draws on the
aforementioned classification work [50], increasing the number of MR features
and tissue classes. In contrast to [50], the present work does not rely on the
spatial neighborhood.

6.1   Strengths
A natural approach would be to train a 3D-CNN with 341 channels to do the
classification task. Typically, a clinical sample size, such as 38, is unlikely to
sufficiently cover the distribution $D$ of all possible brain shapes. One possibility
is to divide the images into patches of, for example, 7×7×7 voxels, considerably
increasing the sample size. This approach is possible and yields a very good
classification accuracy (>90% just for QTI data). Using so many channels,
however, one does not know whether this accuracy comes from brain shape or from
tissue-specific multi-parametric MR signatures. Because the latter are a product
of the biological tissue properties, they are the favored option. Note that the
assessment of high-dimensional single-voxel signatures is baked neither into the
architecture of a CNN nor into a radiologist reviewing multiple imaging studies:
both rather look out for macroscopic shapes in each channel individually. Consider
the following analogy: a customer helpline employs an ML algorithm rating
caller rage. A complex algorithm using the entire conversation might
depend on indicators of rage either in the caller's voice or in the content of the
spoken words. A simple algorithm (e.g., an FC-NN trained on spectrograms of short
time frame windows) will base its predictions only on the caller's voice (and
could provide constant updates). Similarly, the single-voxel paradigm could be
used to highlight additional biomedical information on multi-parametric MRI
and other modalities in clinical radiology. For classification targets with
dominant intra-subject variation, just one participant has been shown to suffice as
a source of training data.

6.2   Limitations
Every voxel is associated with a participant. Therefore, realistic testing should
be performed on a participant who is not contributing voxels to the training
dataset $S$. This requirement causes the distribution $D$ for new data to be
different, violating Definition 4.1. Furthermore, realistic testing should be performed
on more participants. This issue was resolved using 38-fold cross-validation.
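
A participant-wise split of voxels can be sketched as follows. The sketch assumes that each of the 38 folds holds out all voxels of exactly one of the 38 participants. This reading is suggested by the matching numbers but is an assumption, as are the function name and the toy participant labels.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_indices, test_indices), one fold per participant.

    participant_ids[i] is the participant that voxel i belongs to, so no
    participant contributes voxels to both the training and the test set.
    """
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test

# Toy example: 9 voxels from 3 participants (the study would yield 38 folds).
voxel_participants = [0, 0, 0, 0, 1, 1, 2, 2, 2]
folds = list(leave_one_participant_out(voxel_participants))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Splitting voxels at random instead would place voxels of the same participant on both sides of the split and could overestimate the accuracy on new participants.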
Figure 6: Biomedical targets with dominant inter-subject variance. Left: Mean
test prediction of age for each participant after 38-fold cross-validation
(mean predicted age versus true age in years, Pearson's r = 0.87571). Right:
Confusion matrix of identity prediction, row-normalized (true identity versus
predicted identity).

Confounding factors, such as inhomogeneity of $B_0$, $B_1$, and coil sensitivity, will
imprint spatial information into the single-voxel signatures to some extent,
facilitating classification. Several strategies were used to account for this effect.

  • Regarding CEST, B1 inhomogeneity was mitigated with multiple
    interleaved pulses [93], and B0 and B1 correction of the Z-spectra was
    performed using field maps.
  • QTI and CEST parameters of tissue classes were averaged and appeared
    highly individual. Additionally, each 341-dimensional vector was pro-
    jected into the two-dimensional plane using t-distributed stochastic neigh-
    bor embedding [58]. This unsupervised approach showed some grouping
    into tissue classes.
  • Averaged saliency vectors for each tissue class were computed, showing
    biologically plausible linear b-tensor saliency for anisotropic white matter
    structures.
  • Analogous to the 341-input FC-NN, a three-input FC-NN was trained
    on B0 and B1 field maps and on a ρ0 intensity map. This countercheck
    yielded a test accuracy of approximately 20%.

  • Tissue class is a biological target with dominant intra-subject variance.
    By comparison, the subjects’ age is a biological target with dominant
    inter-subject variance and does not depend on spatial information.
    An analogous 341-input, scalar-output FC-NN was trained to predict age
    for each voxel. After 38-fold cross-validation, the voxel-wise test
    predictions were averaged for each participant; the Pearson correlation
    coefficient between these mean predictions and the true age was 0.88.
    See Figure 6.
  • An additional bottleneck layer with 16 neurons between the second and
    third hidden layer was introduced to the architecture of the FC-NN. Trans-
    forming the bottleneck neuron activations back into the brain image do-
    main revealed no obvious spatial encoding inside the network.
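
The evaluation step of the age countercheck in the list above can be illustrated with a minimal sketch: voxel-wise test predictions are averaged per participant, and the Pearson correlation coefficient is computed between these means and the true ages. The function names and the toy data are hypothetical and are not taken from the original analysis; the sketch covers only the averaging and correlation step, not the network or the cross-validation split.

```python
from collections import defaultdict
from math import sqrt


def mean_prediction_per_participant(participant_ids, voxel_predictions):
    """Average voxel-wise predictions within each participant."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, pred in zip(participant_ids, voxel_predictions):
        sums[pid] += pred
        counts[pid] += 1
    return {pid: sums[pid] / counts[pid] for pid in sums}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: three participants with two test voxels each.
participant_ids = ["p1", "p1", "p2", "p2", "p3", "p3"]
voxel_predictions = [28.0, 32.0, 47.0, 53.0, 58.0, 62.0]
true_age = {"p1": 30.0, "p2": 45.0, "p3": 65.0}

mean_pred = mean_prediction_per_participant(participant_ids, voxel_predictions)
order = sorted(true_age)
r = pearson_r([true_age[p] for p in order], [mean_pred[p] for p in order])
print(f"Pearson's r on participant means: {r:.3f}")
```

Averaging before correlating means that each participant contributes exactly one point, so the coefficient reflects inter-subject agreement and is not weighted by the number of voxels per participant.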
Predicting biomedical targets with dominant inter-subject variance is key to
using the single-voxel paradigm in a general context. Clinical application of
the single-voxel paradigm requires ML algorithms that generalize after training
on voxels from few participants. In contrast to the case of age, a 341-input
FC-NN trained for binary classification of Parkinson’s disease did not show
any discriminative ability on the test cases. The MR methods might simply
be too imprecise, the stage of disease too early, or the underlying
pathomechanisms too heterogeneous. However, the FC-NN always showed promising
performance during training. A plausible explanation for this discrepancy
is the pronounced identity information imprinted into single-voxel signatures,
which presumably allows the network to fit the training labels by recognizing
participants rather than disease-related features. A QTI-only, 225-input FC-NN
trained on 80% of voxels to predict the one-hot-encoded identity of the
participants achieved 40% accuracy on the remaining 20% of voxels held out for
testing (no cross-validation required; confusion matrix shown in Figure 6).
This identity-information imprint could be reproduced using CEST only. Further
research into the nature and control of the identity information could be
beneficial to fully unlock the potential of the single-voxel paradigm.
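
To make the identity-prediction result easier to interpret, the following minimal sketch shows how a row-normalized confusion matrix and the overall accuracy can be computed from true and predicted identities, and how the accuracy compares with the chance level 1/N for N participants with balanced voxel counts. All function names and the toy labels are hypothetical and are not taken from the original analysis.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count matrix with true identity in rows, predicted identity in columns."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def accuracy(counts):
    """Fraction of voxels on the diagonal of the count matrix."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


# Hypothetical toy data: ten test voxels from three participants (0, 1, 2).
true_identity = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
predicted_identity = [0, 0, 1, 2, 1, 1, 1, 0, 2, 0]

counts = confusion_matrix(true_identity, predicted_identity, n_classes=3)
normalized = row_normalize(counts)
acc = accuracy(counts)
chance = 1 / 3  # chance level for three equally likely identities

print(normalized)
print(f"accuracy = {acc:.2f}, chance = {chance:.2f}")
```

Row normalization makes each row read as the distribution of predictions for one true participant, so the diagonal entries are per-participant recall values regardless of how many voxels each participant contributes.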

7    Original Paper
[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach,
Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin
Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and
Frederik Bernd Laun. “Brain tissues have single-voxel signatures in
multi-spectral MRI”. In: NeuroImage 234 (2021), p. 117986. issn: 1053-8119.
doi: https://doi.org/10.1016/j.neuroimage.2021.117986.

8   List of Abbreviations
CEST Chemical Exchange Saturation Transfer
FC-NN Fully Connected Artificial Neural Network
FID Free Induction Decay
NMR Nuclear Magnetic Resonance
NN Neural Network
NOE Nuclear Overhauser Effect
ML Machine Learning
MR Magnetic Resonance
MRI Magnetic Resonance Imaging
QTI q-Space Trajectory Imaging
SNR Signal-to-Noise Ratio
TE Time to Echo
TR Repetition Time

9     List of Publications
9.1   Papers
[103] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach,
Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin
Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and
Frederik Bernd Laun. “Brain tissues have single-voxel signatures in
multi-spectral MRI”. In: NeuroImage 234 (2021), p. 117986. issn: 1053-8119.
doi: https://doi.org/10.1016/j.neuroimage.2021.117986.

[110] Andrzej Liebert, Katharina Tkotz, Jürgen Herrler, Peter Linz,
Angelika Mennecke, Alexander German, Patrick Liebig, Rene Gumbrecht, Manuel
Schmidt, Arnd Doerfler, Michael Uder, Moritz Zaiss, and Armin M Nagel.
“Whole-brain quantitative CEST MRI at 7T using parallel transmission
methods and B1+ correction”. In: Magn. Reson. Med. 86.1 (July 2021),
pp. 346–362.

9.2   Conference abstracts
[102] Alexander German, Angelika Mennecke, Jan Martin, Jannis Hanspach,
Andrzej Liebert, Jürgen Herrler, Tristan Anselm Kuder, Manuel Schmidt, Armin
Nagel, Michael Uder, Arnd Doerfler, Jürgen Winkler, Moritz Zaiss, and
Frederik Laun. “Brain tissues have single-voxel signatures in multi-spectral
MRI”. In: Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021

[107] Leonie E. Hunger, Alexander German, Felix Glang, Katrin M. Khakzar,
Nam Dang, Angelika Mennecke, Andreas Maier, Frederik Laun, and Moritz
Zaiss. “DeepCEST: 7T Chemical exchange saturation transfer MRI contrast
inferred from 3T data via deep learning with uncertainty quantification”. In:
Poster presented at ISMRM & SMRT Annual Meeting & Exhibition. 2021

[101] Moritz Simon Fabian, Felix Glang, Katrin Michaela Khakzar, Angelika
Barbara Mennecke, Alexander German, Manuel Schmidt, Burkhard Kasper,
Arnd Doerfler, Frederik B. Laun, and Moritz Zaiss. “Reduction of 7T CEST
scan time and evaluation by L1-regularised linear projections”. In: Poster pre-
sented at ISMRM & SMRT Annual Meeting & Exhibition. 2021

[105] Felix Glang, Moritz Fabian, Alexander German, Katrin Khakzar, Angelika
Mennecke, Frederik Laun, Burkhard Kasper, Manuel Schmidt, Arnd Doerfler,
Klaus Scheffler, and Moritz Zaiss. “Linear projection-based CEST
reconstruction – the simplest explainable AI”. In: Poster presented at
ISMRM & SMRT Annual Meeting & Exhibition. 2021

[112] Angelika Mennecke, Katrin Khakzar, Kai Herz, Moritz Fabian, Alexander
