Skip to content

Commit 7197dba

Browse files
I24 manycrystal (#17)
* Start of many crystal tutorial * Actual tutorial
1 parent 82d6807 commit 7197dba

1 file changed

Lines changed: 336 additions & 0 deletions

File tree

Lines changed: 336 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,336 @@
1+
# Many Crystal Data Processing
2+
3+
In the earlier tutorials where we process one data set at a time. DIALS will however let you carry through a collection of data sets as an _ensemble_. This could be multiple data sets from a single sample at different orientations (for example with a multi-axis goniometer) or multiple nominally similar samples which you may wish to merge.
4+
5+
To DIALS these are _the same_ as processing a single data set: there are only very minor deviations in the usual workflow, and a slightly greater requirement on the operator to pay attention to what is going on.
6+
7+
## Multi-sweep vs. multi-crystal
8+
9+
The only real distinction between multi-sweep and multi-crystal data sets from the point of view of DIALS is whether or not there is a common orientation matrix between the sweeps: the question of isomorphism, which data sets should be included and which should not etc. are down to the user (i.e. you) though there are dials tools to give you some useful hints along the way.
10+
11+
## Workflow
12+
13+
The basic workflow is the same for multiple data sets as it is for single ones, with two nuances. The first is that importing all the data at once and indexing each run separately may give inconsistent results, so you as a user may want to make some choices after indexing. Once the data have been scaled you may want to remove some of the input data sets, which is obviously impossible for a single scan data set.
14+
15+
## Data
16+
17+
The data for this tutorial are [on zenodo](https://zenodo.org/record/7085897) - as a convenience there is a tool for fetching data from zenodo:
18+
19+
```bash
20+
Ethics-Gradient i24-ins :) $ pip install zenodo-get
21+
Ethics-Gradient i24-ins :) $ zenodo_get -r 7085897
22+
Title: Multi-crystal cubic insulin example data set recorded on i24
23+
Keywords: x-ray diffraction, diamond light source, cubic insulin, multi-crystal data set
24+
Publication date: 2022-09-16
25+
DOI: 10.5281/zenodo.7085897
26+
Total size: 6178.9 MB
27+
```
28+
29+
... time passes ... the data appear in the working directory. As an alternative you can run this script:
30+
31+
```bash
32+
curl -o fetch-7085897.sh https://gist.githubusercontent.com/graeme-winter/858487c87eed54e12f962870d3643f5a/raw/92e0ba6771a1a4e4fdf4915be64003054c573e06/fetch-7085897.sh
33+
bash -x fetch-7085897.sh
34+
```
35+
36+
This will download the data required to run the tutorial. The data themselves consist of 24 x 5° cubic insulin data sets recorded on i24 at Diamond Light Source with a CdTe Eiger 2X 9M: there are "metadata" files which contain the mask etc., then data files and finally NeXus files which contain the actual experiment metadata which will be the input for this tutorial.
37+
38+
This download could well take an hour or so, and requires abourt 6GB of space.
39+
40+
## Processing: Spot Finding
41+
42+
Ths works the same as we used for the `tl;dr` pipeline - import data and find spots, but this time _import many data sets_:
43+
44+
```
45+
dials.import ../downloads/*nxs
46+
```
47+
48+
This will take a couple of seconds then print out:
49+
50+
```
51+
Ethics-Gradient demo :( $ dials.import ../*nxs
52+
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
53+
DIALS 3.dev.947-gd83b87046
54+
The following parameters have been modified:
55+
56+
input {
57+
experiments = <image files>
58+
}
59+
60+
--------------------------------------------------------------------------------
61+
format: <class 'dxtbx.format.FormatNXmxDLS.FormatNXmxDLS'>
62+
num images: 1200
63+
sequences:
64+
still: 0
65+
sweep: 24
66+
num stills: 0
67+
--------------------------------------------------------------------------------
68+
Writing experiments to imported.expt
69+
```
70+
71+
(or something very similar). You can see we have 24 data sets in here each with 50 images, or 5°, so we have a total of 120° of data - but we have no idea how complete this is right now. Find some spots:
72+
73+
```
74+
dials.find_spots imported.expt
75+
```
76+
77+
In general this is pretty quick since every data set is tiny... you should not need to go back and redo this step, but susequent steps we may revisit.
78+
79+
## Processing: Indexing
80+
81+
Here is the first time we may consider running a step twice, but we can use the flexibility of the dials tool chain to make this relatively painless. Indexing first as before:
82+
83+
```
84+
dials.index imported.expt strong.refl joint=false
85+
```
86+
87+
Here we set `joint=false` to tell `dials.index` that the data sets don't share an orientation matrix - by default it will assume all the data come from one crystal with different goniometer settings (say). Usual indexing output follows about 24 times, which I won't copy here because it is very long. Because we are indexing in P1 you may not recognise the lattice shape, but if you watch carefully while processing you may see that one cell appears a lot. You don't need to pay attention however, because we can inspect the output very easily:
88+
89+
```
90+
dials.show indexed.expt | grep "Unit cell"
91+
```
92+
93+
Gives
94+
95+
```
96+
Ethics-Gradient demo :) $ dials.show indexed.expt | grep "Unit cell"
97+
Unit cell: 67.79(3), 67.71(3), 67.759(19), 109.429(8), 109.532(9), 109.416(15)
98+
Unit cell: 67.692(13), 67.723(8), 67.760(7), 109.438(3), 109.511(6), 109.456(6)
99+
Unit cell: 67.692(4), 67.808(7), 67.734(3), 109.631(7), 109.400(6), 109.394(3)
100+
Unit cell: 67.717(8), 67.688(5), 67.700(12), 109.528(6), 109.371(9), 109.459(7)
101+
Unit cell: 67.66(6), 67.80(3), 67.70(3), 109.545(5), 109.373(15), 109.500(14)
102+
Unit cell: 67.70(3), 67.801(19), 67.769(18), 109.4989(19), 109.457(6), 109.476(10)
103+
Unit cell: 67.62(3), 67.64(3), 67.736(18), 109.482(7), 109.542(9), 109.354(13)
104+
Unit cell: 67.688(11), 67.667(15), 67.772(7), 109.476(11), 109.519(17), 109.366(11)
105+
Unit cell: 67.68(3), 67.75(2), 67.69(4), 109.539(9), 109.326(14), 109.500(8)
106+
Unit cell: 67.65(3), 67.736(17), 67.749(17), 109.4611(18), 109.470(7), 109.443(8)
107+
Unit cell: 67.666(4), 67.706(3), 67.751(3), 109.5027(12), 109.505(5), 109.454(3)
108+
Unit cell: 67.68(3), 67.71(3), 67.708(19), 109.421(11), 109.435(7), 109.485(13)
109+
Unit cell: 67.69(2), 67.727(18), 67.753(17), 109.449(4), 109.456(6), 109.524(11)
110+
Unit cell: 67.74(3), 67.75(2), 67.72(2), 109.455(7), 109.418(10), 109.529(13)
111+
Unit cell: 67.848(18), 67.80(3), 67.787(17), 109.460(7), 109.4822(19), 109.474(8)
112+
Unit cell: 67.859(12), 67.884(9), 67.845(9), 109.455(3), 109.424(16), 109.437(8)
113+
Unit cell: 67.658(6), 67.649(5), 67.672(5), 109.4692(15), 109.404(6), 109.580(4)
114+
Unit cell: 67.631(3), 67.655(3), 67.599(6), 109.578(2), 109.3383(16), 109.4352(14)
115+
Unit cell: 67.747(16), 67.78(3), 67.72(3), 109.645(13), 109.339(6), 109.449(9)
116+
Unit cell: 67.776(4), 67.770(4), 67.839(5), 109.4885(18), 109.527(4), 109.466(2)
117+
Unit cell: 67.80(3), 67.80(3), 67.844(18), 109.455(6), 109.442(9), 109.473(14)
118+
Unit cell: 67.76(4), 67.83(2), 110.69(3), 89.961(3), 89.943(9), 70.518(14)
119+
Unit cell: 67.61(2), 67.67(2), 67.722(14), 109.501(6), 109.467(6), 109.385(12)
120+
```
121+
122+
Almost all of these are about 67,67,67,109,109,109 but some of them did not work quite right. Also, you may notice that there are only 23 rows, because one failed to index. However we now have some expectation omn what is the "right" answer, so let's use that information:
123+
124+
```
125+
dials.index imported.expt strong.refl joint=false unit_cell=67,67,67,109,109,109
126+
```
127+
128+
This is used as a starting point for the processing and will be subsequently refined. If you watch the output you will see that some of the sets are ~ 90% indexed, others about 60% and so on. Let's not worry too much here but there are probably multiple lattices in some of these sets. An exercise for the reader could be investigating this. Check output again with `dials.show` -
129+
130+
```
131+
Ethics-Gradient demo :) $ dials.show indexed.expt | grep "Unit cell"
132+
Unit cell: 67.824(6), 67.782(4), 67.817(4), 109.4474(11), 109.486(5), 109.529(4)
133+
Unit cell: 67.692(13), 67.723(8), 67.760(7), 109.438(3), 109.511(6), 109.456(6)
134+
Unit cell: 67.692(4), 67.808(7), 67.734(3), 109.631(7), 109.400(6), 109.394(3)
135+
Unit cell: 67.717(8), 67.688(5), 67.700(12), 109.528(6), 109.371(9), 109.459(7)
136+
Unit cell: 67.66(6), 67.80(3), 67.70(3), 109.545(5), 109.373(15), 109.500(14)
137+
Unit cell: 67.70(3), 67.801(19), 67.769(18), 109.4989(19), 109.457(6), 109.476(10)
138+
Unit cell: 67.62(3), 67.64(3), 67.736(18), 109.482(7), 109.542(9), 109.354(13)
139+
Unit cell: 67.686(11), 67.771(6), 67.767(7), 109.478(4), 109.486(14), 109.516(10)
140+
Unit cell: 67.67(3), 67.74(2), 67.75(2), 109.413(4), 109.538(9), 109.504(8)
141+
Unit cell: 67.62(3), 67.73(2), 67.718(16), 109.522(6), 109.452(8), 109.404(13)
142+
Unit cell: 67.666(4), 67.706(3), 67.751(3), 109.5027(12), 109.505(5), 109.454(3)
143+
Unit cell: 67.68(3), 67.71(3), 67.708(19), 109.421(11), 109.435(7), 109.485(13)
144+
Unit cell: 67.64(2), 67.686(18), 67.714(17), 109.446(4), 109.450(6), 109.545(11)
145+
Unit cell: 67.72(2), 67.76(2), 67.74(3), 109.528(14), 109.419(10), 109.456(7)
146+
Unit cell: 67.824(7), 67.864(6), 67.803(6), 109.4814(14), 109.457(5), 109.467(3)
147+
Unit cell: 67.859(12), 67.884(9), 67.845(9), 109.455(3), 109.424(16), 109.437(8)
148+
Unit cell: 67.644(5), 67.606(7), 67.654(9), 109.438(4), 109.588(3), 109.3586(17)
149+
Unit cell: 67.62(3), 67.59(5), 67.65(3), 109.589(17), 109.433(6), 109.333(13)
150+
Unit cell: 67.82(3), 67.754(16), 67.773(16), 109.6307(17), 109.434(8), 109.362(6)
151+
Unit cell: 67.772(4), 67.780(5), 67.842(6), 109.527(4), 109.4895(19), 109.4657(19)
152+
Unit cell: 67.80(3), 67.80(3), 67.844(18), 109.455(6), 109.442(9), 109.473(14)
153+
Unit cell: 67.68(3), 67.64(3), 67.73(2), 109.511(12), 109.473(7), 109.359(13)
154+
Unit cell: 67.83(4), 67.84(3), 67.90(2), 109.464(7), 109.514(9), 109.439(13)
155+
Unit cell: 67.62(2), 67.735(14), 67.723(14), 109.4923(8), 109.467(6), 109.477(6)
156+
```
157+
158+
We now have 24 rows, all with the right cells, so we can now pick up the usual workflow (refine, integrate) though having a look at the reciprocal lattice viewer is fun. Select "crystal frame".
159+
160+
## Processing: Refine and Integrate
161+
162+
Precisely the same as the usual flow:
163+
164+
```
165+
dials.refine indexed.refl indexed.expt
166+
dials.integrate refined.refl refined.expt
167+
```
168+
169+
With a _lot_ of text written.
170+
171+
I note at the end of refinement output:
172+
173+
```
174+
RMSDs by experiment:
175+
+-------+--------+----------+----------+------------+
176+
| Exp | Nref | RMSD_X | RMSD_Y | RMSD_Z |
177+
| id | | (px) | (px) | (images) |
178+
|-------+--------+----------+----------+------------|
179+
| 0 | 1097 | 0.32693 | 0.21546 | 0.11621 |
180+
| 1 | 1745 | 0.28606 | 0.23142 | 0.16296 |
181+
| 2 | 852 | 0.33914 | 0.22277 | 0.095315 |
182+
| 3 | 414 | 0.2984 | 0.24746 | 0.19368 |
183+
| 4 | 616 | 0.38176 | 0.21949 | 0.087365 |
184+
| 5 | 1075 | 0.2712 | 0.23097 | 0.11563 |
185+
| 6 | 1173 | 0.25273 | 0.22517 | 0.11882 |
186+
| 7 | 394 | 0.29569 | 0.2544 | 0.21929 |
187+
| 8 | 1311 | 0.36547 | 0.24697 | 0.16955 |
188+
| 9 | 1396 | 0.28428 | 0.20668 | 0.089166 |
189+
| 10 | 848 | 0.26028 | 0.25227 | 0.1787 |
190+
| 11 | 1326 | 0.2702 | 0.23254 | 0.13247 |
191+
| 12 | 1457 | 0.25631 | 0.21543 | 0.098179 |
192+
| 13 | 1546 | 0.27731 | 0.23287 | 0.15222 |
193+
| 14 | 1195 | 0.27129 | 0.22806 | 0.097633 |
194+
| 15 | 189 | 0.21021 | 0.26237 | 0.21419 | <- this may not be so good
195+
| 16 | 1141 | 0.29631 | 0.22824 | 0.11738 |
196+
| 17 | 1220 | 0.42937 | 0.25279 | 0.17958 |
197+
| 18 | 1415 | 0.28069 | 0.22886 | 0.09891 |
198+
| 19 | 559 | 0.30719 | 0.28369 | 0.1695 |
199+
| 20 | 1087 | 0.25108 | 0.23645 | 0.13212 |
200+
| 21 | 1541 | 0.29062 | 0.24738 | 0.16129 |
201+
| 22 | 1213 | 0.2702 | 0.23749 | 0.15287 |
202+
| 23 | 1703 | 0.26258 | 0.21416 | 0.11408 |
203+
+-------+--------+----------+----------+------------+
204+
```
205+
206+
There are a couple with not-so-many reflections, these may not be as good as the neighbours. Remember we are looking for isomorphism, so we want them to all be similar.
207+
208+
## Procesing: Symmetry Determination
209+
210+
Each of these sets is 5° of data: if we look at each in isolation we won't have many symmetry related observations to combine, which makes it hard to assess symmetry. Instead we have a tool named [`dials.cosym`](http://scripts.iucr.org/cgi-bin/paper?S2059798318002978) which does something similar to the symmetry program but works on an ensemble of data sets. It also resolves any indexing ambiguity between various sets, and the paper linked gives details on how it does this.
211+
212+
```
213+
dials.cosym integrated.refl integrated.expt
214+
```
215+
216+
This is similar to the logic behind `dials.symmetry` but accounts for the fact that the crystals may be inconsistently indexed:
217+
218+
```
219+
+--------------+--------+------+-----+-----------------+
220+
| likelihood | Z-CC | CC | | Operator |
221+
|--------------+--------+------+-----+-----------------|
222+
| 0.083 | 1.71 | 0.17 | | 4 |(1, 1, 0) |
223+
| 0.083 | 1.71 | 0.17 | | 4^-1 |(1, 1, 0) |
224+
| 0.083 | 1.71 | 0.17 | | 4 |(1, 0, 1) |
225+
| 0.083 | 1.71 | 0.17 | | 4^-1 |(1, 0, 1) |
226+
| 0.083 | 1.71 | 0.17 | | 4 |(0, 1, 1) |
227+
| 0.083 | 1.71 | 0.17 | | 4^-1 |(0, 1, 1) |
228+
| 0.949 | 10 | 1 | *** | 3 |(1, 0, 0) |
229+
| 0.949 | 10 | 1 | *** | 3^-1 |(1, 0, 0) |
230+
| 0.949 | 10 | 1 | *** | 3 |(0, 1, 0) |
231+
| 0.949 | 10 | 1 | *** | 3^-1 |(0, 1, 0) |
232+
| 0.949 | 10 | 1 | *** | 3 |(0, 0, 1) |
233+
| 0.949 | 10 | 1 | *** | 3^-1 |(0, 0, 1) |
234+
| 0.949 | 10 | 1 | *** | 3 |(1, 1, 1) |
235+
| 0.949 | 10 | 1 | *** | 3^-1 |(1, 1, 1) |
236+
| 0.949 | 10 | 1 | *** | 2 |(1, 1, 0) |
237+
| 0.083 | 1.71 | 0.17 | | 2 |(-1, 1, 0) |
238+
| 0.949 | 10 | 1 | *** | 2 |(1, 0, 1) |
239+
| 0.083 | 1.71 | 0.17 | | 2 |(-1, 0, 1) |
240+
| 0.949 | 10 | 1 | *** | 2 |(0, 1, 1) |
241+
| 0.083 | 1.71 | 0.17 | | 2 |(0, -1, 1) |
242+
| 0.083 | 1.71 | 0.17 | | 2 |(1, 1, 2) |
243+
| 0.083 | 1.71 | 0.17 | | 2 |(1, 2, 1) |
244+
| 0.083 | 1.71 | 0.17 | | 2 |(2, 1, 1) |
245+
+--------------+--------+------+-----+-----------------+
246+
```
247+
248+
```
249+
Best solution: I m -3
250+
Unit cell: (78.1965, 78.1965, 78.1965, 90, 90, 90)
251+
Reindex operator: b+c,a+c,a+b
252+
Laue group probability: 1.000
253+
Laue group confidence: 1.000
254+
Reindexing operators:
255+
x,x-y,x-z: [2, 3, 4, 5, 7, 11, 12, 18, 20, 23]
256+
x,y,z: [0, 1, 6, 8, 9, 10, 13, 14, 15, 16, 17, 19, 21, 22]
257+
```
258+
259+
We now have data we can scale, and assess isomorphism for. Also take a look at the html report in `dials.cosym.html`
260+
261+
## Processing: Scaling
262+
263+
As before, simple scaling recipe (default) is probably fine:\
264+
265+
```
266+
dials.scale symmetrized.refl symmetrized.expt
267+
```
268+
269+
This will try and scale everything together, and report on the outcomes. First we will filter, then worry about resolution limits. Filter by computing the ∂-CC-half statistic - i.e. those which really are not contributing much to the end result:
270+
271+
```
272+
dials.compute_delta_cchalf scaled.refl scaled.expt
273+
```
274+
275+
```
276+
Dataset: 15, ΔCC½: -8.617
277+
Dataset: 7, ΔCC½: -1.426
278+
Dataset: 3, ΔCC½: -0.995
279+
Dataset: 19, ΔCC½: -0.214
280+
Dataset: 5, ΔCC½: -0.107
281+
Dataset: 9, ΔCC½: -0.028
282+
Dataset: 16, ΔCC½: 0.058
283+
Dataset: 4, ΔCC½: 0.115
284+
Dataset: 20, ΔCC½: 0.135
285+
Dataset: 14, ΔCC½: 0.138
286+
Dataset: 6, ΔCC½: 0.140
287+
Dataset: 13, ΔCC½: 0.147
288+
Dataset: 17, ΔCC½: 0.188
289+
Dataset: 22, ΔCC½: 0.189
290+
Dataset: 18, ΔCC½: 0.217
291+
Dataset: 12, ΔCC½: 0.299
292+
Dataset: 23, ΔCC½: 0.316
293+
Dataset: 21, ΔCC½: 0.359
294+
Dataset: 8, ΔCC½: 0.361
295+
Dataset: 11, ΔCC½: 0.368
296+
Dataset: 2, ΔCC½: 0.436
297+
Dataset: 1, ΔCC½: 0.486
298+
Dataset: 0, ΔCC½: 0.504
299+
Dataset: 10, ΔCC½: 0.858
300+
301+
mean delta_cc_half: -0.253
302+
stddev delta_cc_half: 1.803
303+
cutoff value: -7.465
304+
305+
Removing dataset 15
306+
Writing table to delta_cchalf.dat
307+
Saving 200688 reflections to filtered.refl
308+
Saving the experiments to filtered.expt
309+
Writing html report to: compute_delta_cchalf.html
310+
```
311+
312+
We see one is really bad, and it has automatically been excluded: we can re-use this with
313+
314+
```
315+
dials.scale filtered.refl filtered.expt
316+
```
317+
318+
This will over-write the old scaling results, beware, but shows that there is some flexibility in how the workflow can be used. Now the program will also recommend a resolution limit, so let's apply that:
319+
320+
```
321+
dials.scale filtered.refl filtered.expt d_min=1.54
322+
```
323+
324+
This gives us a completed data set, but does not include _everything_ we could have done...
325+
326+
## Processing: Automate with Multiplex
327+
328+
There is a tool included in `dials` to automate this: [`xia2.multiplex`](https://onlinelibrary.wiley.com/iucr/doi/10.1107/S2059798322004399).
329+
330+
```
331+
mkdir multiplex
332+
cd multiplex
333+
xia2.multiplex ../integrated.refl integrated.expt min_completeness=0.9
334+
```
335+
336+
This will do everything we did above, but more, in assessing which data sets are isomorphous in terms of intensities, group data sets, merge all clusters which look good and generally make your life a lot easier.

0 commit comments

Comments
 (0)