I was trying to run GenoTools (1.3.2) for ancestry prediction on clinical exomes, so I built my own reference panel from the variants that overlap between the clinical exomes and the WGS data available from gnomAD HGDP + 1KG, and ran genotools with the following command:
genotools --pfile pf_exome_all_chrs_merged_par \
--out /net/beegfs-hpc/work/fangz/GP2/pf_exomes/glnexus_joint_calling/genotools/ancestry/pf_exome \
--full_output True \
--ancestry \
--ref_panel /net/beegfshpc/work/fangz/GP2/pf_exomes/glnexus_joint_calling/genotools/ref_panel/genotools_inputs/hgdp_tgp_ref_panel_with_var_id \
--ref_labels /net/beegfshpc/work/fangz/GP2/pf_exomes/glnexus_joint_calling/genotools/ref_panel/genotools_inputs/hgdp_tgp_ref_panel_labels.txt
Labeled Reference Ancestry Counts:
label
AFR 753
EAS 727
SAS 683
EUR 573
AMR 395
MID 137
FIN 92
OCE 27
MDE 13
Name: count, dtype: int64
Getting Common SNPs
/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/utils.py:414: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
bim1 = pd.read_csv(f'{geno_path1}.bim', sep='\t', header=None)
Training Balanced Accuracy: 0.8438636213357675
Training Balanced Accuracy; 95% CI: (0.8170917613141708, 0.8706354813573642)
Best Parameters: {'umap__a': 1.0, 'umap__b': 0.75, 'umap__n_components': 15, 'umap__n_neighbors': 5, 'xgb__lambda': 0.001}
Balanced Accuracy on Test Set: 0.9691176470588235
Balanced Accuracy on Test Set, 95% Confidence Interval: (0.956114602315253, 0.982120691802394)
Traceback (most recent call last):
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/bin/genotools", line 8, in <module>
sys.exit(handle_main())
^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/__main__.py", line 157, in handle_main
out_dict['ancestry'] = execute_ancestry_predictions(args_dict['geno_path'], args_dict['out'], args_dict, ancestry, tmp_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/pipeline.py", line 106, in execute_ancestry_predictions
ancestry_dict = ancestry.run_ancestry()
^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/ancestry.py", line 1129, in run_ancestry
pred = self.predict_ancestry_from_pcs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/ancestry.py", line 620, in predict_ancestry_from_pcs
projected = self.predict_admixed_samples(projected, train_pca)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/genotools/ancestry.py", line 887, in predict_admixed_samples
birch.fit(cas_train_cluster[['PC1','PC2','PC3']])
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/sklearn/base.py", line 1473, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/sklearn/cluster/_birch.py", line 524, in fit
return self._fit(X, partial=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/sklearn/cluster/_birch.py", line 530, in _fit
X = self._validate_data(
^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/sklearn/base.py", line 633, in _validate_data
out = check_array(X, input_name="X", **check_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/beegfs-hpc/home/fangz/miniforge3/envs/GenoTools/lib/python3.12/site-packages/sklearn/utils/validation.py", line 1082, in check_array
raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required by Birch.
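The `shape=(0, 3)` in the ValueError means that `cas_train_cluster` had no rows when it was passed to `birch.fit`, which would be consistent with the reference label counts above containing no CAS samples. Below is a minimal sketch for checking a labels file before a run. It assumes (the post does not confirm this) a tab-separated, header-less file with the ancestry label in the last column. `REQUIRED` lists only the two labels this post names as expected, so the full set GenoTools needs may be larger. `RENAME`, `read_label_rows`, `label_counts`, and `missing_labels` are hypothetical helper names and not part of GenoTools.

```python
import csv
from collections import Counter

# Assumption: only the labels named in this post; GenoTools may expect more.
REQUIRED = {"CAS", "MDE"}
# Relabelling mentioned in this post (MID should have been MDE).
RENAME = {"MID": "MDE"}


def read_label_rows(path):
    """Yield rows from a tab-separated labels file (assumed to have no header)."""
    with open(path, newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


def label_counts(rows, rename=RENAME):
    """Count samples per label, reading the label from the last column."""
    counts = Counter()
    for row in rows:
        if not row:
            continue  # skip blank lines
        label = row[-1].strip()
        counts[rename.get(label, label)] += 1
    return counts


def missing_labels(counts, required=REQUIRED):
    """Return the required labels that have zero reference samples."""
    return sorted(label for label in required if counts.get(label, 0) == 0)


# In-memory demo using the label counts printed in the log above;
# the FID/IID values are placeholders.
logged = {"AFR": 753, "EAS": 727, "SAS": 683, "EUR": 573, "AMR": 395,
          "MID": 137, "FIN": 92, "OCE": 27, "MDE": 13}
demo_rows = [["fid", "iid", label] for label, n in logged.items() for _ in range(n)]
demo_counts = label_counts(demo_rows)
print(missing_labels(demo_counts))  # ['CAS']
```

For a real file, `label_counts(read_label_rows(path))` replaces the in-memory demo. Any label reported as missing would leave the matching training subset empty, which is one plausible route to the Birch error.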
I ran into the error above.
I pushed the pfiles of clinical exomes here and the variant list here.
Note that I made the mistake of not renaming MID to MDE.
However, according to Dan, I am still missing CAS, even if I leave out OCE.
Let me know if there's any file I need to upload to help with troubleshooting!
Thanks!
Zih-Hua