PRS.jl Documentation
Implementation of ldpred, EB, and lassosum in Julia.
PlinkReader
PRS.PlinkReader — Method
PlinkReader(path; markerIndex = false, sampleIndex = false)
Structure to access PLINK .bed, .bim, and .fam files at path.
Arguments
- markerIndex::Bool: should an index marker -> idx be created?
- sampleIndex::Bool: should an index sample -> idx be created?
PRS.Sample — Type
Sample(fid, iid, father, mother, sex)
PLINK sample info.
PRS.nsamples — Method
nsamples(p::PlinkReader)
Number of samples in PLINK file.
PRS.samples — Method
samples(p::PlinkReader)
Return array of Sample structs for samples in PLINK file.
PRS.sample_index — Method
sample_index(p::PlinkReader, iid::String)
Return index of sample iid in PlinkReader. Uses index if available, otherwise linear search.
PRS.Marker — Type
Marker(chrom, id, cm, pos, a1, a2)
PLINK marker info.
PRS.nmarkers — Method
nmarkers(p::PlinkReader)
Number of markers in PLINK file.
PRS.markers — Method
markers(p::PlinkReader)
Return array of Marker structs for markers in PLINK file.
PRS.marker_index — Method
marker_index(p::PlinkReader, id::String)
Return index of marker id in PlinkReader. Uses index if available, otherwise linear search.
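The lookup strategy described for sample_index and marker_index (use an index when one was built, otherwise scan linearly) can be sketched in a few lines. This is an illustration only, not PRS.jl's code: build_index and lookup are hypothetical names, and returning nothing for an unknown id is an assumption.

```julia
# Hypothetical helpers illustrating "index if available, otherwise linear search".
build_index(ids) = Dict(id => i for (i, id) in enumerate(ids))

function lookup(ids, index, id)
    if index === nothing
        # No index was built: linear scan, O(n) per lookup.
        return findfirst(==(id), ids)
    else
        # Index available: hash lookup, O(1) on average.
        return get(index, id, nothing)
    end
end

ids = ["rs1", "rs2", "rs3"]
index = build_index(ids)
with_index = lookup(ids, index, "rs2")
without_index = lookup(ids, nothing, "rs3")
```

Building the index costs memory and one pass over the ids up front, which is presumably why markerIndex and sampleIndex default to false.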
PRS.markersDF — Method
markersDF(p::PlinkReader)
Get data frame with marker info.
Returns a data frame with columns
- Chrom
- Name
- cM
- Pos
- A1
- A2
- Idx
Base.getindex — Method
getindex(p::PlinkReader, s::Int, m::Int)
Retrieve genotype of sample s and marker m.
PLINK convention for representation of genotypes is as follows:
- 0b00: Hom1
- 0b01: Het
- 0b10: missing
- 0b11: Hom2
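As a small illustration of working with these codes, the sketch below maps them to descriptive labels and counts missing calls. It takes the table above at face value. GENOTYPE_LABELS, genotype_label and ismissing_genotype are hypothetical names, not part of PRS.jl, and the sketch says nothing about how the codes are packed in the .bed file.

```julia
# Codes copied from the table above; the labels are descriptive only.
const GENOTYPE_LABELS = Dict{UInt8,Symbol}(
    0b00 => :hom1,
    0b01 => :het,
    0b10 => :missing,
    0b11 => :hom2,
)

genotype_label(code::Integer) = GENOTYPE_LABELS[UInt8(code)]
ismissing_genotype(code::Integer) = genotype_label(code) === :missing

# Example: five genotype codes as they might be returned for one marker.
codes = UInt8[0b00, 0b01, 0b10, 0b11, 0b01]
labels = genotype_label.(codes)
n_missing = count(ismissing_genotype, codes)
```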
PRS.dosageMatrix — Function
dosageMatrix(p::PlinkReader, markerIdx, sampleIdx = nothing;
normalize = true)
Create dosage matrix (samples x markers) for markerIdx and sampleIdx from PlinkReader. Here dosage is expressed as the count of the alternative allele.
Missing genotypes are replaced with the mean dosage. If normalize = true, markers are normalized to mean μ = 0 and standard deviation σ = 1.
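The imputation and normalization steps can be sketched on a plain matrix with missing entries. This is a minimal sketch under assumptions, not the package's implementation: normalize_dosages is a hypothetical name, and using the uncorrected (divide by n) standard deviation is a guess. The text above does not say which variant dosageMatrix uses.

```julia
using Statistics

# Hypothetical helper: mean-impute missing dosages per marker (column),
# then scale each column to mean 0 and standard deviation 1.
function normalize_dosages(G::AbstractMatrix)
    X = Matrix{Float64}(undef, size(G))
    for j in axes(G, 2)
        col = view(G, :, j)
        μ = mean(skipmissing(col))        # mean over observed dosages only
        X[:, j] = coalesce.(col, μ)       # replace missing with that mean
    end
    for j in axes(X, 2)
        col = view(X, :, j)
        μ = mean(col)
        σ = std(col; corrected = false)   # assumption: population std
        col .= (col .- μ) ./ σ            # a monomorphic marker (σ = 0) gives NaN
    end
    return X
end

G = Union{Missing,Float64}[0 1; 2 missing; 1 2; 1 0]
X = normalize_dosages(G)
```

Because a missing genotype is replaced by the column mean, it ends up at 0 after normalization, so it contributes nothing to products with that column.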
LDMatrix
PRS.LDMatrix — Method
LDMatrix(p::PlinkReader, mIdx0, mIdx1; alpha = 0.9, window = posWindow(1_000_000))
Compute LDMatrix from genotypes in PLINK file.
PRS.LDMatrix — Method
LDMatrix(path::AbstractString)
Load LDMatrix from .bim and .lds files at path.
PRS.markers — Method
markers(ld::LDMatrix)
Get array of Marker structs for all markers in LDMatrix ld.
PRS.markersDF — Method
markersDF(ld::LDMatrix)
Get data frame with marker info.
Returns a data frame with columns
- Chrom
- Name
- cM
- Pos
- A1
- A2
- Idx
PRS.save — Method
save(ld::LDMatrix, path::AbstractString)
Save LDMatrix to path.
PRS.ldscore — Method
ldscore(ld::LDMatrix)
Compute LD scores for markers in LDMatrix ld.
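For orientation, the usual definition of the LD score of marker j is the sum of squared correlations between j and the markers around it. The sketch below applies that definition to a small dense correlation matrix. ldscore_sketch is a hypothetical name. Whether PRS.ldscore includes the diagonal term, applies a bias correction, or restricts the sum to the stored window is not stated here, so treat those details as assumptions.

```julia
# Hypothetical helper: LD score of marker j as the sum over k of R[j, k]^2,
# including the marker itself (R[j, j]^2 = 1).
ldscore_sketch(R::AbstractMatrix) = vec(sum(abs2, R; dims = 2))

# Toy 3-marker correlation matrix.
R = [1.0 0.5 0.0;
     0.5 1.0 0.2;
     0.0 0.2 1.0]
scores = ldscore_sketch(R)
```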
ldpred
PRS.ldpred_gibbs — Method
ldpred_gibbs(z, D, p, σ2, μ0; n_burnin = 100, n_iter = 500, verbose = false)
Run LDpred Gibbs sampler.
Arguments
- z::Vector: z-scores
- D::LDMatrix: LD matrix
- p::Real: proportion of variants deemed to be causal
- σ2::Real: $Nh^2/M$ as estimated from LD score regression (estimate_h2, or mean(z.^2)/mean(lds)). Note: prior variance of non-null component of $μ$, $σ2_μ = σ2/p$
- μ0::Vector: starting estimate (e.g. from infinite model)
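The relation between σ2, p and the prior variance of the non-null component can be made concrete with a few numbers. The sketch below uses the ratio-of-means shortcut mentioned for σ2 and then divides by p. The function name prior_variances and the input values are made up for illustration, and nothing here calls PRS.jl.

```julia
using Statistics

# Hypothetical helper: σ2 from the ratio of mean z^2 to mean LD score,
# and the prior variance of the non-null component, σ2 / p.
function prior_variances(z, lds, p)
    σ2 = mean(z .^ 2) / mean(lds)
    σ2_μ = σ2 / p
    return σ2, σ2_μ
end

z = [1.2, -0.4, 2.5, 0.3]      # toy z-scores
lds = [1.5, 1.1, 3.0, 1.2]     # toy LD scores
σ2, σ2_μ = prior_variances(z, lds, 0.1)
```

With p = 0.1 the non-null variance is ten times σ2: the same total signal is concentrated on a tenth of the variants.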
PRS.estimate_h2 — Method
estimate_h2(z, lds, n)
Estimate total heritability $h^2$ for markers using LD score regression with intercept forced to 1.
Arguments
- lds::Vector: LD scores as computed by ldscore(LDMatrix)
- n::Integer: effective number of samples ($4 n_0 n_1/(n_0+n_1)$ for case-control)
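Forcing the intercept to 1 means fitting $z^2 - 1 = (n h^2 / M)\,\ell$ through the origin, where $\ell$ is the LD score and $M$ the number of markers. A minimal unweighted sketch of that fit follows. h2_sketch is a hypothetical name, and taking M = length(z) and using no regression weights are assumptions. The package may weight markers differently.

```julia
# Hypothetical, unweighted version: fit z^2 - 1 = slope * lds through the origin,
# where slope estimates n * h^2 / M, then rescale to h^2.
function h2_sketch(z, lds, n)
    m = length(z)                                        # assumption: M = number of z-scores
    slope = sum(lds .* (z .^ 2 .- 1)) / sum(abs2, lds)   # least squares through the origin
    return slope * m / n
end

lds = [1.0, 2.0, 4.0]
z = sqrt.(1 .+ 0.5 .* lds)     # synthetic z-scores with z^2 = 1 + 0.5 * lds exactly
h2 = h2_sketch(z, lds, 1000)
```

On this synthetic input the slope is recovered as 0.5, so the estimate is 0.5 * 3 / 1000.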
PRS.estimate_neff — Method
estimate_neff(beta, sebeta, freq)
Estimate effective number of samples in a case-control study.
The effective number of samples is the total number of samples in a cohort with equal numbers of cases and controls that would result in the observed sebeta.
freq can be the minor or major allele frequency, as only freq*(1-freq) is used.
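To make the idea concrete, the sketch below uses a common large-sample approximation for a log-odds-ratio estimate in a balanced cohort under Hardy-Weinberg equilibrium, Var(beta) ≈ 2 / (N f (1 - f)), and solves it for N. This is an assumption about the kind of formula involved, not a statement of what estimate_neff computes. The function names are hypothetical, and the sketch ignores beta and any aggregation across markers.

```julia
# Effective sample size from case and control counts (formula from the
# estimate_h2 arguments: 4 * n0 * n1 / (n0 + n1)).
neff_from_counts(n0, n1) = 4 * n0 * n1 / (n0 + n1)

# Hypothetical per-marker estimate from a standard error and allele frequency,
# assuming Var(beta) ≈ 2 / (N * f * (1 - f)) for a balanced cohort.
neff_from_se(sebeta, freq) = 2 / (freq * (1 - freq) * sebeta^2)

balanced = neff_from_counts(500, 500)       # balanced cohort: equals the total, 1000
unbalanced = neff_from_counts(100, 900)     # same total, much less information

se = sqrt(2 / (0.3 * 0.7 * 1000))           # standard error implied by N = 1000, f = 0.3
n_minor = neff_from_se(se, 0.3)
n_major = neff_from_se(se, 0.7)             # same result: only f * (1 - f) enters
```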
lassosum
PRS.z2cor — Method
z2cor(z, n)
Convert GWAS z-value to phenotype-genotype correlation coefficient.
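One common way to do this conversion treats z as a t-statistic with n - 2 degrees of freedom, which gives $r = z / \sqrt{n - 2 + z^2}$. The sketch below implements that form under a hypothetical name. Whether PRS.z2cor uses exactly this expression is an assumption.

```julia
# Hypothetical conversion: z treated as a t-statistic with n - 2 degrees of freedom.
z2cor_sketch(z, n) = z / sqrt(n - 2 + z^2)

r_pos = z2cor_sketch(2.0, 102)     # 2 / sqrt(104)
r_neg = z2cor_sketch(-2.0, 102)    # sign of z is preserved
r_big = z2cor_sketch(50.0, 102)    # stays strictly inside (-1, 1)
```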
Missing docstring for elnetg!(X, r, λ, s, β = zeros(Float64, length(r)); maxiter = 10_000, thresh = 1e-4). Check Documenter's build log for details.
PRS.elnetg_path — Method
elnetg_path(X, r, λs, s;
maxiter = 10_000, thresh = 1e-4)
Solve elastic net for path along λs using warm starts.
Arguments
- X::Matrix: nsubj x nmarkers genotype matrix, column normalized (μ = 0, σ = 1)
- r::Vector: nmarkers x 1 vector of correlation coefficients between phenotype and genotypes
- λs::Vector: shrinkage parameter path for 1-norm, in decreasing order
- s::Real: shrinkage parameter for 2-norm (LD)
- β::Vector: warm start (in) and result (out) for solution vector
- maxiter::Int: maximum number of iterations
- thresh::Real: maximum change in β to be called converged
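A path solver with warm starts can be sketched with plain coordinate descent. The sketch assumes a lassosum-style objective, $(1-s)\,β'Rβ - 2β'r + s\,β'β + 2λ\|β\|_1$ with $R = X'X/n$. That objective, the scaling by n, and the names elnet_sketch! and elnet_path_sketch are assumptions for illustration, not the package's code.

```julia
using LinearAlgebra

soft(x, λ) = sign(x) * max(abs(x) - λ, 0.0)   # soft-thresholding operator

# Hypothetical coordinate descent for one λ; β is the warm start and the result.
function elnet_sketch!(β, X, r, λ, s; maxiter = 10_000, thresh = 1e-4)
    n = size(X, 1)
    yhat = X * β                               # current fitted values X * β
    for _ in 1:maxiter
        maxdelta = 0.0
        for j in eachindex(β)
            xj = view(X, :, j)
            xx = (1 - s) * dot(xj, xj) / n     # (1 - s) * R[j, j]
            # r[j] minus the LD-weighted contribution of all other markers
            ρ = r[j] - (1 - s) * dot(xj, yhat) / n + xx * β[j]
            βnew = soft(ρ, λ) / (xx + s)
            δ = βnew - β[j]
            if δ != 0
                yhat .+= δ .* xj               # keep fitted values in sync
                β[j] = βnew
                maxdelta = max(maxdelta, abs(δ))
            end
        end
        maxdelta < thresh && break             # converged: largest change below thresh
    end
    return β
end

# Solve along λs, reusing the previous solution as the warm start.
function elnet_path_sketch(X, r, λs, s; kwargs...)
    β = zeros(length(r))
    B = zeros(length(r), length(λs))
    for (k, λ) in enumerate(λs)
        elnet_sketch!(β, X, r, λ, s; kwargs...)
        B[:, k] = β
    end
    return B
end

X = [1.0 -1.0 0.5; -1.0 1.0 0.5; 0.0 0.0 -1.0]
r = [0.5, -0.2, 0.05]
B = elnet_path_sketch(X, r, [0.3, 0.1], 1.0)   # s = 1 ignores LD entirely
```

With s = 1 the LD term vanishes and each column of B is just the soft-thresholded r, which makes the example easy to check by hand. Running λs in decreasing order lets each solve start from a sparser neighbouring solution.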
utils
PRS.snp_join — Method
snp_join(df1, df2; on = [:Chrom, :Pos],
alleles1 = [:A1_1, :A2_1], alleles2 = [:A1_2, :A2_2],
matchcol = :sign)
Join data frames df1 and df2 and match variants.
Data frames df1 and df2 are first matched by the columns specified in on. Then column matchcol is set to
- matchcol == 1 if values for alleles1 and alleles2 match
- matchcol == -1 if values for alleles1 and alleles2 are swapped
- matchcol == 0 otherwise
Note: Swapped means simply swapping of alleles, not applying reverse complement as done in other implementations.
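The per-variant matching rule can be written as a small standalone function. allele_match is a hypothetical name used only for this sketch. It operates on four allele values rather than on data frames, and it assumes alleles are compared as given (for example, case-sensitively).

```julia
# Hypothetical helper implementing the rule above for a single variant:
#   1 if both allele pairs agree, -1 if they are swapped, 0 otherwise.
function allele_match(a1_1, a2_1, a1_2, a2_2)
    if a1_1 == a1_2 && a2_1 == a2_2
        return 1
    elseif a1_1 == a2_2 && a2_1 == a1_2
        return -1
    else
        return 0
    end
end

same = allele_match("A", "G", "A", "G")
swapped = allele_match("A", "G", "G", "A")
complement = allele_match("A", "G", "T", "C")   # reverse complement is not matched
```

A value of -1 is typically used to flip the sign of an effect estimate from the second data set so that both refer to the same allele.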