Commit d0c341e

Merge pull request #236 from prisms-center/0.2a2
0.2a2
2 parents ebd0e00 + 98e283c commit d0c341e

27 files changed

Lines changed: 859 additions & 299 deletions

INSTALL.md

Lines changed: 97 additions & 64 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 4 additions & 9 deletions
@@ -9,6 +9,7 @@ This version of CASM supports:
99
- Occupational degrees of freedom.
1010
- High-throughput calculations using:
1111
- VASP: [https://www.vasp.at](https://www.vasp.at)
12+
- Semi-Grand canonical Monte Carlo calculations
1213

1314
CASM is updated frequently with support for new effective Hamiltonians, new interfaces for first-principles electronic structure codes, and new Monte Carlo methods. Collaboration is welcome and new features can be incorporated by forking the repository on GitHub, creating a new feature, and submitting pull requests. If you are interested in developing features that involve a significant time investment we encourage you to first contact the CASM development team at <casm-developers@lists.engr.ucsb.edu>.
1415

@@ -57,7 +58,7 @@ CASM is developed by the Van der Ven group, originally at the University of Mich
5758

5859
**Developers**: John Goiri and Anirudh Natarajan.
5960

60-
**Other contributors**: Min-Hua Chen, Jonathon Bechtel, Max Radin, Elizabeth Decolvenaere and Anna Belak
61+
**Other contributors**: Min-Hua Chen, Jonathon Bechtel, Max Radin, Elizabeth Decolvenaere, Anna Belak, Liang Tian, and Naga Sri Harsha Gunda
6162

6263
#### Acknowledgements ####
6364

@@ -89,7 +90,7 @@ See INSTALL.md
8990

9091
The ``casm`` executable includes extensive help documentation describing the various commands and options. Simply executing ``casm`` will display a list of possible commands, and executing ``casm <cmd> -h`` will display help documentation particular to the chosen command.
9192

92-
For a beginner, the best place to start is to follow the suggestions printed when calling ``casm status -n``. This provides step-by-step instructions for creating a CASM project, generating symmetry information, setting composition axes, enumerating configurations, calculating energies with VASP, setting reference states, and fitting an effective Hamiltonian. The subcommand ``casm format`` provides information on the directory structure of the CASM project and the format of all the CASM files.
93+
For a beginner, the best place to start is to follow the suggestions printed when calling ``casm status -n``. This provides step-by-step instructions for creating a CASM project, generating symmetry information, setting composition axes, enumerating configurations, calculating energies with VASP, setting reference states, and fitting an effective Hamiltonian using the program ``casm-learn``. The subcommand ``casm format`` provides information on the directory structure of the CASM project and the format of all the CASM files.
9394

9495
All that is needed to start a new project is a ``prim.json`` file describing the crystal structure of the material being studied. See ``casm format --prim`` for a description and examples. Typically one will create a new project directory containing the ``prim.json`` file and then initialize the casm project. For example:
9596

@@ -108,15 +109,9 @@ All that is needed to start a new project is a ``prim.json`` file describing the
108109

109110
After initializing a casm project:
110111

111-
- ``casm`` generates code that is compiled and linked at runtime in order to evaluate effective Hamiltonians in a highly optimized manner. If you installed the CASM header files in a location that is not in your default search path you must specify in your CASM project settings where to find the header files. You can inspect the current settings via ``casm settings -l``, and then add the correct include path via ``casm settings --set-compile-options``. For example:
112-
113-
casm settings --set-compile-options 'g++ -O3 -Wall -fPIC --std=c++11 -I/path/to/include/casm'
114-
115-
- Shared object compilation options may be set via ``casm settings --set-so-options``. For example (using the default settings):
112+
- ``casm`` generates code that is compiled and linked at runtime in order to evaluate effective Hamiltonians in a highly optimized manner. If you installed the CASM header files and libraries in a location that is not in your default search path, you must specify where to find them. The default compilation options often work well, but in some cases the C++ compiler, compiler flags, or shared object construction flags may need to be customized. You can inspect the current settings via ``casm settings -l`` and list the options for changing them via ``casm settings --desc``.
116113

117-
casm settings --set-so-options 'g++ -shared -lboost_system'
118114

119115

120-
An html tutorial describing the creation of an example CASM project and typical steps is coming soon.
121116

122117

SConstruct

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,8 @@ import sys, os, glob, copy, shutil, subprocess, imp, re
66
from os.path import join
77

88
Help("""
9-
Type: 'scons' to build all binaries,
9+
Type: 'scons configure' to run configuration checks,
10+
'scons' to build all binaries,
1011
'scons install' to install all libraries, binaries, scripts and python packages,
1112
'scons test' to run all tests,
1213
'scons unit' to run all unit tests,
@@ -43,10 +44,9 @@ Help("""
4344
Sets to compile with debugging symbols. In this case, the optimization level gets
4445
set to -O0, and NDEBUG does not get set.
4546
46-
$LD_LIBRARY_PATH:
47+
$LD_LIBRARY_PATH (Linux) or $DYLD_FALLBACK_LIBRARY_PATH (Mac):
4748
Search path for dynamic libraries, may need $CASM_BOOST_PREFIX/lib
4849
and $CASM_PREFIX/lib added to it.
49-
On Mac OS X, this variable is $DYLD_FALLBACK_LIBRARY_PATH.
5050
This should be added to your ~/.bash_profile (Linux) or ~/.profile (Mac).
5151
5252
$CASM_BOOST_NO_CXX11_SCOPED_ENUMS:

casmenv.sh

Lines changed: 19 additions & 19 deletions
@@ -15,6 +15,8 @@
1515
#
1616
#export CASM_BOOST_PREFIX=""
1717

18+
#
19+
1820
# Recognized by install scripts. Use this if linking to boost libraries compiled without c++11. If defined, (i.e. CASM_BOOST_NO_CXX11_SCOPED_ENUMS=1) will compile with -DBOOST_NO_CXX11_SCOPED_ENUMS option.
1921
# Order of precedence:
2022
# 1) if $CASM_BOOST_NO_CXX11_SCOPED_ENUMS defined
@@ -105,6 +107,17 @@ if [ ! -z ${CASM_PREFIX} ]; then
105107

106108
fi
107109

110+
# If CASM_BOOST_PREFIX is set, update library search path
111+
if [ ! -z ${CASM_BOOST_PREFIX} ]; then
112+
113+
# For Linux, set LD_LIBRARY_PATH
114+
export LD_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$LD_LIBRARY_PATH
115+
116+
# For Mac, set DYLD_FALLBACK_LIBRARY_PATH
117+
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$DYLD_FALLBACK_LIBRARY_PATH
118+
119+
fi
120+
108121
# If testing:
109122
if [ ! -z ${CASM_REPO} ]; then
110123

@@ -114,25 +127,12 @@ if [ ! -z ${CASM_REPO} ]; then
114127
export PATH=$CASM_REPO/bin:$CASM_REPO/python/casm/scripts:$PATH
115128
export PYTHONPATH=$CASM_REPO/python/casm:$PYTHONPATH
116129

117-
if [ ! -z ${DYLD_FALLBACK_LIBRARY_PATH} ]; then
118-
# For testing on Mac, use DYLD_FALLBACK_LIBRARY_PATH:
119-
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_REPO/lib:$DYLD_FALLBACK_LIBRARY_PATH
120-
121-
# If CASM_BOOST_PREFIX is set, update library search path
122-
if [ ! -z ${CASM_BOOST_PREFIX} ]; then
123-
# For testing on Mac, set DYLD_LIBRARY_FALLBACK_PATH
124-
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$DYLD_FALLBACK_LIBRARY_PATH
125-
fi
126-
else
127-
# For testing on Linux, use LD_LIBRARY_PATH:
128-
export LD_LIBRARY_PATH=$CASM_REPO/lib:$LD_LIBRARY_PATH
129-
130-
# If CASM_BOOST_PREFIX is set, update library search path
131-
if [ ! -z ${CASM_BOOST_PREFIX} ]; then
132-
# For testing on Mac, set DYLD_LIBRARY_FALLBACK_PATH
133-
export LD_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$LD_LIBRARY_PATH
134-
fi
135-
fi
130+
# For testing on Linux, use LD_LIBRARY_PATH:
131+
export LD_LIBRARY_PATH=$CASM_REPO/lib:$LD_LIBRARY_PATH
132+
133+
# For testing on Mac, use DYLD_FALLBACK_LIBRARY_PATH:
134+
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_REPO/lib:$DYLD_FALLBACK_LIBRARY_PATH
135+
136136
fi
137137

138138

python/casm/casm/learn/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -70,7 +70,7 @@ def create_halloffame(maxsize, rel_tol=1e-6):
7070
from fit import example_input_Lasso, example_input_LassoCV, example_input_RFE, \
7171
example_input_GeneticAlgorithm, example_input_IndividualBestFirst, \
7272
example_input_PopulationBestFirst, example_input_DirectSelection, \
73-
set_input_defaults, \
73+
open_input, set_input_defaults, \
7474
FittingData, TrainingData, \
7575
print_input_help, print_individual, print_population, print_halloffame, print_eci, \
7676
to_json, open_halloffame, save_halloffame, \
@@ -90,6 +90,7 @@ def create_halloffame(maxsize, rel_tol=1e-6):
9090
'example_input_IndividualBestFirst',
9191
'example_input_PopulationBestFirst',
9292
'example_input_DirectSelection',
93+
'open_input',
9394
'set_input_defaults',
9495
'FittingData',
9596
'TrainingData',

python/casm/casm/learn/fit.py

Lines changed: 25 additions & 0 deletions
@@ -1101,6 +1101,31 @@ def set_input_defaults(input, input_filename=None):
11011101
return input
11021102

11031103

1104+
def open_input(input_filename):
1105+
"""
1106+
Read casm-learn input file into a dict
1107+
1108+
Arguments
1109+
---------
1110+
1111+
input_filename: str
1112+
The path to the input file
1113+
1114+
Returns
1115+
-------
1116+
input: dict
1117+
The result of reading the input file and running it through
1118+
casm.learn.set_input_defaults
1119+
"""
1120+
# open input and always set input defaults before doing anything else
1121+
with open(input_filename, 'r') as f:
1122+
try:
1123+
input = set_input_defaults(json.load(f), input_filename)
1124+
except Exception as e:
1125+
print "Error parsing JSON in", input_filename
1126+
raise e
1127+
return input
1128+
11041129
class FittingData(object):
11051130
"""
11061131
FittingData holds feature values, target values, sample weights, etc. used
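
The `open_input` helper in the hunk above reads a JSON settings file, applies defaults, and names the offending file when parsing fails. A minimal Python 3 sketch of that pattern follows (the diff itself is Python 2). `apply_defaults` is a hypothetical stand-in for `casm.learn.set_input_defaults`, and the default it fills in is an assumption made only for illustration, not CASM's actual behavior.

```python
import json


def apply_defaults(settings):
    """Hypothetical stand-in for casm.learn.set_input_defaults.

    Assumption: ensure a "problem_specs" entry exists. The real function
    fills in many more defaults.
    """
    settings.setdefault("problem_specs", {})
    return settings


def open_input(input_filename):
    """Read a JSON settings file into a dict and apply defaults."""
    with open(input_filename, "r") as f:
        try:
            return apply_defaults(json.load(f))
        except Exception:
            # Name the file that failed, using the function's own argument
            # rather than any state from the calling script.
            print("Error parsing JSON in", input_filename)
            raise
```

Reporting the error through the function's own `input_filename` argument keeps the helper usable from library code, where no command-line `args` object exists.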

python/casm/scripts/casm-learn

Lines changed: 168 additions & 20 deletions
@@ -9,6 +9,7 @@ import deap.tools
99
if __name__ == "__main__":
1010

1111
parser = argparse.ArgumentParser(description = 'Fit cluster expansion coefficients (ECI)')
12+
parser.add_argument('--desc', help='Print extended usage description', action="store_true")
1213
parser.add_argument('-s', '--settings', nargs=1, help='Settings input filename', type=str)
1314
parser.add_argument('--format', help='Hall of fame print format. Options are "details", "json", or "csv".', type=str, default=None)
1415
#parser.add_argument('--path', help='Path to CASM project. Default assumes the current directory is in the CASM project.', type=str, default=os.getcwd())
@@ -61,13 +62,7 @@ if __name__ == "__main__":
6162
if args.verbose:
6263
print "Loading", args.settings[0]
6364

64-
# open input and always set input defaults before doing anything else
65-
with open(args.settings[0], 'r') as f:
66-
try:
67-
input = casm.learn.set_input_defaults(json.load(f), args.settings[0])
68-
except Exception as e:
69-
print "Error parsing JSON in", args.settings[0]
70-
raise e
65+
input = casm.learn.open_input(args.settings[0])
7166

7267
if args.hall:
7368

@@ -132,28 +127,181 @@ if __name__ == "__main__":
132127
# pickle hall of fame
133128
casm.learn.save_halloffame(hall, halloffame_filename, args.verbose)
134129

135-
else:
130+
elif args.desc:
136131

137132
print \
138133
"""
139-
Learning is performed in four steps:
140134
141-
1) Select training data.
142-
Create a selection of configurations to include in the regression problem.
135+
1) Specify the problem:
136+
137+
'casm-learn' helps solve the problem:
138+
139+
X*b = y,
140+
141+
where:
142+
143+
X: 2d matrix of shape (n_samples, n_features)
144+
The correlation matrix, holding the evaluated basis functions. The
145+
entry X[config, bfunc] holds the average value of the 'bfunc' cluster
146+
basis function for configuration 'config'. The number of configurations
147+
is 'n_samples' and the number of cluster basis functions is 'n_features'.
148+
149+
y: 1d matrix of shape (n_samples, 1)
150+
The calculated properties being fit to. The most common case is that
151+
y[config] holds the formation energy calculated for configuration
152+
'config'.
153+
154+
b: 1d matrix of shape (n_features, 1)
155+
The effective cluster interactions (ECI) being solved for.
156+
157+
To specify this problem, the 'casm-learn' input file specifies which
158+
configurations to fit to (the training data), how to weight the
159+
configurations, and how to compare solutions via cross-validation.
160+
161+
162+
Training data may be input via a 'casm select' output file. The default
163+
name expected is 'train'. So to use all calculated configurations, you
164+
could create a directory in your CASM project where you will perform
165+
fitting and generate a 'train' file:
166+
167+
cd /my/casm/project
168+
mkdir fit_1 && cd fit_1
169+
casm select --set is_calculated -o train
170+
171+
172+
Example 'casm-learn' JSON input files can be output by the
173+
'casm-learn --exMethodName' options:
174+
175+
casm-learn --exGeneticAlgorithm > fit_1_ga.json
176+
casm-learn --exRFE > fit_1_rfe.json
177+
...etc.
178+
179+
By default, these settings files are prepared for fitting formation_energy,
180+
using the 'train' configuration selection. Edit the file as needed, and
181+
see 'casm-learn --settings-format' for help.
182+
183+
184+
When weighting configurations, the problem is transformed:
185+
186+
X*b = y -> L*X*b = L*y,
187+
188+
where W = L*L.transpose():
189+
190+
W: 2d matrix of shape (n_samples, n_samples)
191+
The weight matrix is specified in the casm-learn input file. If the
192+
weighting method provides 1-dimensional input (this is typical, i.e.
193+
a weight for each configuration), in an array called 'w', then:
194+
195+
W = diag(w)*n_samples/sum(w),
196+
197+
diag(w) being the diagonal matrix with 'w' along the diagonal.
198+
199+
200+
A cross-validation score is used for comparing generated ECI. The cv score
201+
reported is:
202+
203+
cv = sqrt(mean(scores)) + N_nonzero_eci*penalty,
204+
205+
where:
206+
207+
scores: 1d array of shape (number of train/test sets)
208+
The mean squared error calculated for each training/testing set
209+
210+
N_nonzero_eci: int
211+
The number of basis functions with non-zero ECI
212+
213+
penalty: number, optional, default=0.0
214+
Is the user-input penalty per basis function that can be used to
215+
favor solutions with a small number of non-zero ECI
216+
217+
See 'casm-learn --settings-format' for help specifying the cross-validation
218+
training and test sets using options from scikit-learn. It is usually
219+
important to use the 'shuffle'=true option so that configurations are
220+
randomly added to train/test sets and not ordered by supercell size.
221+
222+
223+
When you run 'casm-learn' with a new problem specification the first time,
224+
it generates a "problem specs" file that stores the training data, weights,
225+
and cross-validation train/test sets. Then, when running subsequent times,
226+
the data can be loaded more quickly, and the cross-validation can be
227+
performed using the same train/test sets. 'casm-learn' will attempt to
228+
prevent you from re-running with a different problem specification so that
229+
solutions can be compared via their cv score in an "apples-to-apples"
230+
manner. The default name for the "specs" file is determined from the input
231+
filename. For example, 'my_input_specs.pkl' is used if the input file is
232+
named 'my_input.json'. See 'casm-learn --settings-format' for more help.
233+
234+
235+
The '--checkspecs' option can be used to write output files with the
236+
generated problem specs data. Among other things, this can be used to
237+
adjust weights manually or save and re-use train/test sets. See
238+
'casm-learn --settings-format' for more help.
239+
240+
241+
2) Select estimator and feature selection methods
242+
243+
The "estimator" option specifies a linear model estimator that determines
244+
how to solve the linear problem L*X*b = L*y, for b.
245+
246+
The "feature_selection" option specifies a feature selection method that
247+
determines which features (ECI) should be considered for the solution. The
248+
remaining are effectively set to 0.0 when calculating the cluster
249+
expansion. Generally there is a tradeoff: By limiting the number of
250+
features included in the cluster expansion, Monte Carlo calculations can be
251+
more efficient, but at a possible loss of accuracy. Be careful to avoid
252+
overfitting, however. If your cross validation scheme does not provide
253+
enough testing data, you may fit your training data very well, but not
254+
have an accurate extrapolation to other configurations.
255+
256+
See 'casm-learn --settings-format' for help specifying the estimator and
257+
feature selection methods. Assuming you are using the GeneticAlgorithm and
258+
have named your input file 'fit_1_ga.json', run:
259+
260+
casm-learn -s fit_1_ga.json
261+
262+
'casm-learn' will run and eventually store its results. For a single
263+
problem specification (step 1, the settings in "problem_specs"
264+
in the 'casm-learn' input file), you may try many different estimation
265+
and feature selection methods and use the cv score to compare results. All
266+
the results for a single problem specification can be stored in a 'Hall Of
267+
Fame' that collects the N individual solutions with the best cv scores. To
268+
view these results use:
269+
270+
casm-learn -s fit_1_ga.json --hall
271+
272+
For more details, or to output the results for further analysis in JSON or
273+
CSV format, there is a '--format' option. To view only particular
274+
individuals in the hall of fame, there is a '--indiv' option.
275+
143276
144-
2) Select scoring metric.
145-
Add sample weights to configurations if desired and select a cross validation
146-
method.
277+
3) Analyze results
278+
279+
The above steps (1) and (2) may be repeated many times as you attempt to
280+
optimize your ECI. Solutions for different problems (i.e. different
281+
weighting schemes, re-calculating with more training data) may be compared
282+
based on scientific knowledge, for instance, which predicts the 0K ground
283+
state configurations correctly, or from analysis of Monte Carlo results.
284+
285+
The '--checkhull' option provides a simple way to check the 0K ground
286+
states and can create 'casm select' style output files with enumerated but
287+
uncalculated configurations that are predicted to be low energy. These can
288+
then be used to generate more training data and re-fit the ECI.
289+
290+
When you have generated ECI that you wish to use in Monte Carlo
291+
calculations, use the '--select' option to write an 'eci.json' file into
292+
your CASM project for the currently selected cluster expansion (as listed
293+
by 'casm settings -l').
147294
148-
3) Select estimator.
149-
Choose how to solve for ECI from calculated property and correlations. For
150-
instance: LinearRegression, Lasso, or Ridge regression.
151295
152-
4) Select features.
153-
Select which basis functions to include in the cluster expansion. For instance,
154-
SelectFromModel along with a l-1 norm minimizing estimator. Or GeneticAlgorithm.
296+
4) Use results
297+
298+
Once an 'eci.json' file has been written, you can run Monte Carlo
299+
calculations. See 'casm monte -h' and 'casm format --monte' for help.
300+
155301
"""
156302

303+
else:
304+
157305
parser.print_help()
158306

159307
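
The weighting and cross-validation formulas in the `--desc` text above can be illustrated with a short pure-Python sketch. It covers only the case the text calls typical, a 1-dimensional weight array `w`, so `W` is diagonal and `L` is its element-wise square root. The function names are hypothetical and this is not the `casm-learn` implementation.

```python
import math


def normalized_weights(w):
    """Diagonal of W = diag(w) * n_samples / sum(w)."""
    n_samples = len(w)
    total = sum(w)
    return [w_i * n_samples / total for w_i in w]


def apply_weights(X, y, w):
    """Transform X*b = y into L*X*b = L*y, where W = L*L.transpose().

    For a diagonal W, L is diagonal with entries sqrt(W[i][i]), so each
    row of X and each entry of y is scaled by the same factor.
    """
    l_diag = [math.sqrt(w_ii) for w_ii in normalized_weights(w)]
    LX = [[l_i * x_ij for x_ij in row] for l_i, row in zip(l_diag, X)]
    Ly = [l_i * y_i for l_i, y_i in zip(l_diag, y)]
    return LX, Ly


def cv_score(scores, n_nonzero_eci, penalty=0.0):
    """cv = sqrt(mean(scores)) + N_nonzero_eci * penalty.

    `scores` holds the mean squared error of each train/test set.
    """
    return math.sqrt(sum(scores) / len(scores)) + n_nonzero_eci * penalty
```

The normalization makes the weights sum to `n_samples`, so uniform weights leave the problem unchanged. With the default `penalty=0.0` the score reduces to the root of the mean of the per-set mean squared errors.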
