|CI| |Package| |PyPI| |Coverage| |Depsy| |Downloads|
Description
-----------
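Samtools provides a function "faidx" (FAsta InDeX), which creates a small flat index file ".fai" allowing for fast random access to any subsequence in the indexed FASTA file, while loading a minimal amount of the file into memory. This Python module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools-compatible index. The pyfaidx module is API-compatible with the `pygr`_ seqdb module. A command-line script "`faidx`_" is installed alongside the pyfaidx module, and facilitates complex manipulation of FASTA files without any programming knowledge.
.. _`pygr`: https://github.com/cjlee112/pygr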
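To make the idea of a flat index concrete, here is a minimal sketch of how ".fai"-style rows could be derived from a FASTA file. It is an illustration only, not the code pyfaidx or samtools use; the helper name ``build_fai`` and the toy records are assumptions. Each row holds the sequence name, its length, the byte offset of its first base, the bases per line, and the bytes per line.

```python
import io


def build_fai(handle):
    """Return .fai-style rows (name, rlen, offset, lenc, lenb) for a binary FASTA handle."""
    rows = []
    name = None
    rlen = offset = lenc = lenb = 0
    pos = 0  # byte position of the start of the current line
    for raw in handle:
        if raw.startswith(b">"):
            if name is not None:
                rows.append((name, rlen, offset, lenc, lenb))
            # The indexed name is the header text up to the first whitespace.
            name = raw[1:].split()[0].decode()
            rlen = lenc = lenb = 0
            offset = pos + len(raw)  # the first base follows the header line
        else:
            bases = len(raw.rstrip(b"\r\n"))
            if lenc == 0:
                # The first sequence line defines the wrapping of the record.
                lenc, lenb = bases, len(raw)
            rlen += bases
        pos += len(raw)
    if name is not None:
        rows.append((name, rlen, offset, lenc, lenb))
    return rows


# Hypothetical two-record FASTA, wrapped at 10 bases per line.
fasta = b">seq1 first record\nACGTACGTAC\nGTACGT\n>seq2\nTTTTGGGG\n"
index = build_fai(io.BytesIO(fasta))
for row in index:
    print("\t".join(str(field) for field in row))
```

Because every row stores the line geometry, a reader can later jump straight to any base with simple arithmetic instead of scanning the file.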
If you use pyfaidx in your publication, please cite:
`Shirley MD`_, `Ma Z`_, `Pedersen B`_, `Wheelan S`_. `Efficient "pythonic" access to FASTA files using pyfaidx <https://dx.doi.org/10.7287/peerj.preprints.970v1>`_. PeerJ PrePrints 3:e1196. 2015.
.. _`Shirley MD`: http://github.com/mdshw5
.. _`Ma Z`: http://github.com/azalea
.. _`Pedersen B`: http://github.com/brentp
.. _`Wheelan S`: http://github.com/swheelan
Installation
------------
This package is tested under Linux and macOS using Python 3.7+, and is available from PyPI:
::
pip install pyfaidx # add --user if you don't have root
or download a `release <https://github.com/mdshw5/pyfaidx/releases>`_ and:
::
pip install .
If using ``pip install --user``, make sure to add ``/home/$USER/.local/bin`` (on Linux) or ``/Users/$USER/Library/Python/{python version}/bin`` (on macOS) to your ``$PATH`` if you want to run the ``faidx`` script.
Python 2.6 and 2.7 users may choose to use a package version from `v0.7.2 <https://github.com/mdshw5/pyfaidx/releases/tag/v0.7.2.2>`_ or earlier.
Usage
-----
.. code:: python
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta')
>>> genes
Fasta("tests/data/genes.fasta") # set strict_bounds=True for bounds checking
Acts like a dictionary.
.. code:: python
>>> genes.keys()
('AB821309.1', 'KF435150.1', 'KF435149.1', 'NR_104216.1', 'NR_104215.1', 'NR_104212.1', 'NM_001282545.1', 'NM_001282543.1', 'NM_000465.3', 'NM_001282549.1', 'NM_001282548.1', 'XM_005249645.1', 'XM_005249644.1', 'XM_005249643.1', 'XM_005249642.1', 'XM_005265508.1', 'XM_005265507.1', 'XR_241081.1', 'XR_241080.1', 'XR_241079.1')
>>> genes['NM_001282543.1'][200:230]
>NM_001282543.1:201-230
CTCGTTCCGCGCCCGCCATGGAACCGGATG
>>> genes['NM_001282543.1'][200:230].seq
'CTCGTTCCGCGCCCGCCATGGAACCGGATG'
>>> genes['NM_001282543.1'][200:230].name
'NM_001282543.1'
# Start attributes are 1-based
>>> genes['NM_001282543.1'][200:230].start
201
# End attributes are 0-based
>>> genes['NM_001282543.1'][200:230].end
230
>>> genes['NM_001282543.1'][200:230].fancy_name
'NM_001282543.1:201-230'
>>> len(genes['NM_001282543.1'])
5466
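Note that start and end coordinates of Sequence objects are [1, 0]: ``.start`` is 1-based and ``.end`` is 0-based, so together they read as a 1-based inclusive interval. This can be changed to [0, 0] by passing ``one_based_attributes=False`` to ``Fasta`` or ``Faidx``. This argument only affects the ``Sequence`` ``.start``/``.end`` attributes, and has no effect on slicing coordinates.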
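The relationship between a Python slice and the reported attributes can be sketched in a few lines. The helper ``slice_to_attributes`` below is hypothetical and only mirrors the convention described above; it does not call pyfaidx, and the [0, 0] branch reflects the description of ``one_based_attributes=False`` rather than tested library output.

```python
def slice_to_attributes(start, stop, one_based_attributes=True):
    """Map a 0-based, half-open slice [start:stop] to (start, end) attribute values."""
    # Only the start shifts by one; the half-open stop already equals
    # the 1-based inclusive end position.
    first = start + 1 if one_based_attributes else start
    return first, stop


print(slice_to_attributes(200, 230))         # (201, 230)
print(slice_to_attributes(200, 230, False))  # (200, 230)
```

The first call reproduces the ``201`` and ``230`` values shown for ``genes['NM_001282543.1'][200:230]``.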
Indexes like a list:
.. code:: python
>>> genes[0][:50]
>AB821309.1:1-50
ATGGTCAGCTGGGGTCGTTTCATCTGCCTGGTCGTGGTCACCATGGCAAC
Slices just like a string:
.. code:: python
>>> genes['NM_001282543.1'][200:230][:10]
>NM_001282543.1:201-210
CTCGTTCCGC
>>> genes['NM_001282543.1'][200:230][::-1]
>NM_001282543.1:230-201
GTAGGCCAAGGTACCGCCCGCGCCTTGCTC
>>> genes['NM_001282543.1'][200:230][::3]
>NM_001282543.1:201-230
CGCCCCTACA
>>> genes['NM_001282543.1'][:]
>NM_001282543.1:1-5466
CCCCGCCCCT........
- Slicing start and end coordinates are 0-based, just like Python sequences.
Complements and reverse complements just like DNA
.. code:: python
>>> genes['NM_001282543.1'][200:230].complement
>NM_001282543.1 (complement):201-230
GAGCAAGGCGCGGGCGGTACCTTGGCCTAC
>>> genes['NM_001282543.1'][200:230].reverse
>NM_001282543.1:230-201
GTAGGCCAAGGTACCGCCCGCGCCTTGCTC
>>> -genes['NM_001282543.1'][200:230]
>NM_001282543.1 (complement):230-201
CATCCGGTTCCATGGCGGGCGCGGAACGAG
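For readers who want to check these outputs by hand, the same operations can be sketched on a plain string. This is a standalone illustration using ``str.translate``, not the pyfaidx implementation; the function names and the restriction to A, C, G, T, and N are assumptions.

```python
# Translation table for the four bases plus N, in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Swap each base for its Watson-Crick partner, keeping the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence and read it in the opposite direction."""
    return complement(seq)[::-1]


seq = "CTCGTTCCGCGCCCGCCATGGAACCGGATG"
print(complement(seq))          # GAGCAAGGCGCGGGCGGTACCTTGGCCTAC
print(seq[::-1])                # GTAGGCCAAGGTACCGCCCGCGCCTTGCTC
print(reverse_complement(seq))  # CATCCGGTTCCATGGCGGGCGCGGAACGAG
```

The three printed strings match the ``.complement``, ``.reverse``, and unary minus results shown above.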
``Fasta`` objects can also be accessed using method calls:
.. code:: python
>>> genes.get_seq('NM_001282543.1', 201, 210)
>NM_001282543.1:201-210
CTCGTTCCGC
>>> genes.get_seq('NM_001282543.1', 201, 210, rc=True)
>NM_001282543.1 (complement):210-201
GCGGAACGAG
Spliced sequences can be retrieved from a list of [start, end] coordinates (TODO: update this section):
.. code:: python
# new in v0.5.1
>>> segments = [[1, 10], [50, 70]]
>>> genes.get_spliced_seq('NM_001282543.1', segments)
>gi|543583786|ref|NM_001282543.1|:1-70
CCCCGCCCCTGGTTTCGAGTCGCTGGCCTGC
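The splicing result can be reproduced on a plain string, which makes the coordinate convention explicit: each segment is a 1-based, inclusive [start, end] pair. The ``splice`` helper is a hypothetical stand-in written for this illustration, and the input is the first 70 bases of NM_001282543.1 as printed elsewhere in this document.

```python
def splice(sequence, segments):
    """Concatenate 1-based, inclusive [start, end] segments of a string."""
    return "".join(sequence[start - 1:end] for start, end in segments)


# First 70 bases of NM_001282543.1 (one line of the FASTA file).
first_line = "CCCCGCCCCTCTGGCGGCCCGCCGTCCCAGACGCGGGAAGAGCTTGGCCGGTTTCGAGTCGCTGGCCTGC"
spliced = splice(first_line, [[1, 10], [50, 70]])
print(spliced)  # CCCCGCCCCTGGTTTCGAGTCGCTGGCCTGC
```

The output is 10 + 21 = 31 bases long and matches the spliced sequence shown above.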
.. _keyfn:
Custom key functions provide cleaner access:
.. code:: python
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta', key_function = lambda x: x.split('.')[0])
>>> genes.keys()
dict_keys(['NR_104212', 'NM_001282543', 'XM_005249644', 'XM_005249645', 'NR_104216', 'XM_005249643', 'NR_104215', 'KF435150', 'AB821309', 'NM_001282549', 'XR_241081', 'KF435149', 'XR_241079', 'NM_000465', 'XM_005265508', 'XR_241080', 'XM_005249642', 'NM_001282545', 'XM_005265507', 'NM_001282548'])
>>> genes['NR_104212'][:10]
>NR_104212:1-10
CCCCGCCCCT
You can specify a character to split names on, which will generate additional entries:
.. code:: python
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta', split_char='.', duplicate_action="first") # default duplicate_action="stop"
>>> genes.keys()
dict_keys(['.1', 'NR_104212', 'NM_001282543', 'XM_005249644', 'XM_005249645', 'NR_104216', 'XM_005249643', 'NR_104215', 'KF435150', 'AB821309', 'NM_001282549', 'XR_241081', 'KF435149', 'XR_241079', 'NM_000465', 'XM_005265508', 'XR_241080', 'XM_005249642', 'NM_001282545', 'XM_005265507', 'NM_001282548'])
If your ``key_function`` or ``split_char`` generates duplicate entries, you can choose what action to take:
.. code:: python
# new in v0.4.9
>>> genes = Fasta('tests/data/genes.fasta', split_char="|", duplicate_action="longest")
>>> genes.keys()
dict_keys(['gi', '563317589', 'dbj', 'AB821309.1', '', '557361099', 'gb', 'KF435150.1', '557361097', 'KF435149.1', '543583796', 'ref', 'NR_104216.1', '543583795', 'NR_104215.1', '543583794', 'NR_104212.1', '543583788', 'NM_001282545.1', '543583786', 'NM_001282543.1', '543583785', 'NM_000465.3', '543583740', 'NM_001282549.1', '543583738', 'NM_001282548.1', '530384540', 'XM_005249645.1', '530384538', 'XM_005249644.1', '530384536', 'XM_005249643.1', '530384534', 'XM_005249642.1', '530373237', 'XM_005265508.1', '530373235', 'XM_005265507.1', '530364726', 'XR_241081.1', '530364725', 'XR_241080.1', '530364724', 'XR_241079.1'])
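The effect of a duplicate policy can be pictured with a small standalone sketch. The action names (stop, first, last, longest, shortest) are the choices listed for ``--duplicates-action`` in the command-line help; the semantics coded here are an assumption inferred from those names, and ``resolve_duplicates`` is a hypothetical helper, not part of pyfaidx.

```python
ACTIONS = ("stop", "first", "last", "longest", "shortest")


def resolve_duplicates(records, action="stop"):
    """Collapse (key, length) pairs into {key: length} under a duplicate policy."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    chosen = {}
    for key, length in records:
        if key not in chosen:
            chosen[key] = length
        elif action == "stop":
            raise ValueError(f"duplicate key: {key!r}")
        elif action == "last":
            chosen[key] = length
        elif action == "longest":
            chosen[key] = max(chosen[key], length)
        elif action == "shortest":
            chosen[key] = min(chosen[key], length)
        # "first" keeps the entry that is already stored.
    return chosen


# Two headers from genes.fasta with their sequence lengths; splitting on "|"
# makes "gi" (and the trailing empty string) appear once per record.
headers = {"gi|563317589|dbj|AB821309.1|": 3510, "gi|557361099|gb|KF435150.1|": 481}
records = [(key, length) for name, length in headers.items() for key in name.split("|")]

print(resolve_duplicates(records, "longest")["gi"])   # 3510
print(resolve_duplicates(records, "shortest")["gi"])  # 481
```

With the default "stop" policy the same input raises an error, which is why the example above passes an explicit ``duplicate_action``.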
Filter functions (returning True) limit the index:
.. code:: python
# new in v0.3.8
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta', filt_function = lambda x: x[0] == 'N')
>>> genes.keys()
dict_keys(['NR_104212', 'NM_001282543', 'NR_104216', 'NR_104215', 'NM_001282549', 'NM_000465', 'NM_001282545', 'NM_001282548'])
>>> genes['XM_005249644']
KeyError: XM_005249644 not in tests/data/genes.fasta.
Or just get a Python string:
.. code:: python
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta', as_raw=True)
>>> genes
Fasta("tests/data/genes.fasta", as_raw=True)
>>> genes['NM_001282543.1'][200:230]
CTCGTTCCGCGCCCGCCATGGAACCGGATG
You can make sure that you always receive an uppercase sequence, even if your FASTA file is lowercase:
.. code:: python
>>> from pyfaidx import Fasta
>>> reference = Fasta('tests/data/genes.fasta.lower', sequence_always_upper=True)
>>> reference['gi|557361099|gb|KF435150.1|'][1:70]
>gi|557361099|gb|KF435150.1|:2-70
TGACATCATTTTCCACCTCTGCTCAGTGTTCAACATCTGACAGTGCTTGCAGGATCTCTCCTGGACAAA
You can also perform line-based iteration, receiving the sequence lines as they appear in the FASTA file:
.. code:: python
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta')
>>> for line in genes['NM_001282543.1']:
... print(line)
CCCCGCCCCTCTGGCGGCCCGCCGTCCCAGACGCGGGAAGAGCTTGGCCGGTTTCGAGTCGCTGGCCTGC
AGCTTCCCTGTGGTTTCCCGAGGCTTCCTTGCTTCCCGCTCTGCGAGGAGCCTTTCATCCGAAGGCGGGA
CGATGCCGGATAATCGGCAGCCGAGGAACCGGCAGCCGAGGATCCGCTCCGGGAACGAGCCTCGTTCCGC
...
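Sequence names are truncated on any whitespace. This is a limitation of the indexing strategy. However, full names can be recovered:
.. code:: python
# new in v0.3.7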
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta')
>>> for record in genes:
... print(record.name)
... print(record.long_name)
...
gi|563317589|dbj|AB821309.1|
gi|563317589|dbj|AB821309.1| Homo sapiens FGFR2-AHCYL1 mRNA for FGFR2-AHCYL1 fusion kinase protein, complete cds
gi|557361099|gb|KF435150.1|
gi|557361099|gb|KF435150.1| Homo sapiens MDM4 protein variant Y (MDM4) mRNA, complete cds, alternatively spliced
gi|557361097|gb|KF435149.1|
gi|557361097|gb|KF435149.1| Homo sapiens MDM4 protein variant G (MDM4) mRNA, complete cds
...
# new in v0.4.9
>>> from pyfaidx import Fasta
>>> genes = Fasta('tests/data/genes.fasta', read_long_names=True)
>>> for record in genes:
... print(record.name)
...
gi|563317589|dbj|AB821309.1| Homo sapiens FGFR2-AHCYL1 mRNA for FGFR2-AHCYL1 fusion kinase protein, complete cds
gi|557361099|gb|KF435150.1| Homo sapiens MDM4 protein variant Y (MDM4) mRNA, complete cds, alternatively spliced
gi|557361097|gb|KF435149.1| Homo sapiens MDM4 protein variant G (MDM4) mRNA, complete cds
Records can be accessed efficiently as numpy arrays:
.. code:: python
# new in v0.5.4
>>> from pyfaidx import Fasta
>>> import numpy as np
>>> genes = Fasta('tests/data/genes.fasta')
>>> np.asarray(genes['NM_001282543.1'])
array(['C', 'C', 'C', ..., 'A', 'A', 'A'], dtype='|S1')
Sequence can be buffered in memory using a read-ahead buffer for fast sequential access:
.. code:: python
>>> from timeit import timeit
>>> fetch = "genes['NM_001282543.1'][200:230]"
>>> read_ahead = "import pyfaidx; genes = pyfaidx.Fasta('tests/data/genes.fasta', read_ahead=10000)"
>>> no_read_ahead = "import pyfaidx; genes = pyfaidx.Fasta('tests/data/genes.fasta')"
>>> string_slicing = "genes = {}; genes['NM_001282543.1'] = 'N'*10000"
>>> timeit(fetch, no_read_ahead, number=10000)
0.2204863309962093
>>> timeit(fetch, read_ahead, number=10000)
0.1121859749982832
>>> timeit(fetch, string_slicing, number=10000)
0.0033553699977346696
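Read-ahead buffering can reduce runtime by roughly half for sequential accesses to buffered regions (about 0.11 s versus 0.22 s for 10,000 fetches in the timing above).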
.. role:: red
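If you want to modify the contents of your FASTA file in-place, you can use the ``mutable`` argument.
Any portion of the FastaRecord can be replaced with an equivalent-length string.
:red:`Warning`: *This will change the contents of your file immediately and permanently:*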
.. code:: python
>>> genes = Fasta('tests/data/genes.fasta', mutable=True)
>>> type(genes['NM_001282543.1'])
<class 'pyfaidx.MutableFastaRecord'>
>>> genes['NM_001282543.1'][:10]
>NM_001282543.1:1-10
CCCCGCCCCT
>>> genes['NM_001282543.1'][:10] = 'NNNNNNNNNN'
>>> genes['NM_001282543.1'][:15]
>NM_001282543.1:1-15
NNNNNNNNNNCTGGC
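The FastaVariant class provides a way to integrate single nucleotide variant calls to generate a consensus sequence.
.. code:: python
# new in v0.4.0
>>> from pyfaidx import Fasta, FastaVariant
>>> consensus = FastaVariant('tests/data/chr22.fasta', 'tests/data/chr22.vcf.gz', het=True, hom=True)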
RuntimeWarning: Using sample NA06984 genotypes.
>>> consensus['22'].variant_sites
(16042793, 21833121, 29153196, 29187373, 29187448, 29194610, 29821295, 29821332, 29993842, 32330460, 32352284)
>>> consensus['22'][16042790:16042800]
>22:16042791-16042800
TCGTAGGACA
>>> Fasta('tests/data/chr22.fasta')['22'][16042790:16042800]
>22:16042791-16042800
TCATAGGACA
>>> consensus = FastaVariant('tests/data/chr22.fasta', 'tests/data/chr22.vcf.gz', sample='NA06984', het=True, hom=True, call_filter='GT == "0/1"')
>>> consensus['22'].variant_sites
(16042793, 29187373, 29187448, 29194610, 29821332)
You can also specify paths using ``pathlib.Path`` objects.
.. code:: python
# new in v0.7.1
>>> from pyfaidx import Fasta
>>> from pathlib import Path
>>> genes = Fasta(Path('tests/data/genes.fasta'))
>>> genes
Fasta("tests/data/genes.fasta")
Accessing FASTA files from `filesystem_spec <https://filesystem-spec.readthedocs.io>`_ filesystems:
.. code:: python
# new in v0.7.0
# pip install fsspec s3fs
>>> import fsspec
>>> from pyfaidx import Fasta
>>> of = fsspec.open("s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta", anon=True)
>>> genes = Fasta(of)
.. _faidx:
It also provides a command-line script:
cli script: faidx
.. code:: bash
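Fetch sequences from FASTA. If no regions are specified, all entries in the input file are returned. Input FASTA file must be consistently line-wrapped,
and line wrapping of output is based on input line lengths.
positional arguments:
fasta  FASTA file
regions  space separated regions of sequence to fetch e.g. chr1:1-1000
optional arguments:
-h, --help  show this help message and exit
-b BED, --bed BED  bed file of regions (zero-based start coordinate)
-o OUT, --out OUT  output file name (default: stdout)
-i {bed,chromsizes,nucleotide,transposed}, --transform {bed,chromsizes,nucleotide,transposed}
transform the requested regions into another format. default: None
-c, --complement  complement the sequence. default: False
-r, --reverse  reverse the sequence. default: False
-a SIZE_RANGE, --size-range SIZE_RANGE
selected sequences are in the size range [low, high]. example: 1,1000 default: None
-n, --no-names  omit sequence names from output. default: False
-f, --full-names  output full names including description. default: False
-x, --split-files  write each region to a separate file (names are derived from regions)
-l, --lazy  fill in --default-seq for missing ranges. default: False
-s DEFAULT_SEQ, --default-seq DEFAULT_SEQ
default base for missing positions and masking. default: None
-d DELIMITER, --delimiter DELIMITER
delimiter for splitting names to multiple values (duplicate names will be discarded). default: None
-e HEADER_FUNCTION, --header-function HEADER_FUNCTION
python function to modify header lines e.g: "lambda x: x.split("|")[0]". default: lambda x: x.split()[0]
-u {stop,first,last,longest,shortest}, --duplicates-action {stop,first,last,longest,shortest}
entry to take when duplicate sequence names are encountered. default: stop
-g REGEX, --regex REGEX
selected sequences are those matching regular expression. default: .*
-v, --invert-match  selected sequences are those not matching 'regions' argument. default: False
-m, --mask-with-default-seq
mask the FASTA file using --default-seq. default: False
-M, --mask-by-case  mask the FASTA file by changing to lowercase. default: False
-e HEADER_FUNCTION, --header-function HEADER_FUNCTION
python function to modify header lines e.g: "lambda x: x.split("|")[0]". default: None
--no-rebuild  do not rebuild the .fai index even if it is out of date. default: False
--version  print pyfaidx version number
Examples: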
$ faidx -v tests/data/genes.fasta
### creates an .fai index, but suppresses sequence output using --invert-match ###
$ faidx tests/data/genes.fasta NM_001282543.1:201-210 NM_001282543.1:300-320
>NM_001282543.1:201-210
CTCGTTCCGC
>NM_001282543.1:300-320
GTAATTGTGTAAGTGACTGCA
$ faidx --full-names tests/data/genes.fasta NM_001282543.1:201-210
>NM_001282543.1| Homo sapiens BRCA1 associated RING domain 1 (BARD1), transcript variant 2, mRNA
CTCGTTCCGC
$ faidx --no-names tests/data/genes.fasta NM_001282543.1:201-210 NM_001282543.1:300-320
CTCGTTCCGC
GTAATTGTGTAAGTGACTGCA
$ faidx --complement tests/data/genes.fasta NM_001282543.1:201-210
>NM_001282543.1:201-210 (complement)
GAGCAAGGCG
$ faidx --reverse tests/data/genes.fasta NM_001282543.1:201-210
>NM_001282543.1:210-201
CGCCTTGCTC
$ faidx --reverse --complement tests/data/genes.fasta NM_001282543.1:201-210
>NM_001282543.1:210-201 (complement)
GCGGAACGAG
$ faidx tests/data/genes.fasta NM_001282543.1
>NM_001282543.1:1-5466
CCCCGCCCCT........
..................
..................
..................
$ faidx --regex "^NM_00128254[35]" genes.fasta
>NM_001282543.1
..................
..................
..................
>NM_001282545.1
..................
..................
..................
$ faidx --lazy tests/data/genes.fasta NM_001282543.1:5460-5480
>NM_001282543.1:5460-5480
AAAAAAANNNNNNNNNNNNNN
$ faidx --lazy --default-seq='Q' tests/data/genes.fasta NM_001282543.1:5460-5480
>NM_001282543.1:5460-5480
AAAAAAAQQQQQQQQQQQQQQ
$ faidx tests/data/genes.fasta --bed regions.bed
...
$ faidx --transform chromsizes tests/data/genes.fasta
AB821309.1 3510
KF435150.1 481
KF435149.1 642
NR_104216.1 4573
NR_104215.1 5317
NR_104212.1 5374
...
$ faidx --transform bed tests/data/genes.fasta
AB821309.1 1 3510
KF435150.1 1 481
KF435149.1 1 642
NR_104216.1 1 4573
NR_104215.1 1 5317
NR_104212.1 1 5374
...
$ faidx --transform nucleotide tests/data/genes.fasta
name start end A T C G N
AB821309.1 1 3510 955 774 837 944 0
KF435150.1 1 481 149 120 103 109 0
KF435149.1 1 642 201 163 129 149 0
NR_104216.1 1 4573 1294 1552 828 899 0
NR_104215.1 1 5317 1567 1738 968 1044 0
NR_104212.1 1 5374 1581 1756 977 1060 0
...
$ faidx --transform transposed tests/data/genes.fasta
AB821309.1 1 3510 ATGGTCAGCTGGGGTCGTTTCATC...
KF435150.1 1 481 ATGACATCATTTTCCACCTCTGCT...
KF435149.1 1 642 ATGACATCATTTTCCACCTCTGCT...
NR_104216.1 1 4573 CCCCGCCCCTCTGGCGGCCCGCCG...
NR_104215.1 1 5317 CCCCGCCCCTCTGGCGGCCCGCCG...
NR_104212.1 1 5374 CCCCGCCCCTCTGGCGGCCCGCCG...
...
$ faidx --split-files tests/data/genes.fasta
$ ls
AB821309.1.fasta NM_001282549.1.fasta XM_005249645.1.fasta
KF435149.1.fasta NR_104212.1.fasta XM_005265507.1.fasta
KF435150.1.fasta NR_104215.1.fasta XM_005265508.1.fasta
NM_000465.3.fasta NR_104216.1.fasta XR_241079.1.fasta
NM_001282543.1.fasta XM_005249642.1.fasta XR_241080.1.fasta
NM_001282545.1.fasta XM_005249643.1.fasta XR_241081.1.fasta
NM_001282548.1.fasta XM_005249644.1.fasta
$ faidx --delimiter='_' tests/data/genes.fasta 000465.3
>000465.3
CCCCGCCCCTCTGGCGGCCCGCCGTCCCAGACGCGGGAAGAGCTTGGCCGGTTTCGAGTCGCTGGCCTGC
AGCTTCCCTGTGGTTTCCCGAGGCTTCCTTGCTTCCCGCTCTGCGAGGAGCCTTTCATCCGAAGGCGGGA
.......
$ faidx --size-range 5500,6000 -i chromsizes tests/data/genes.fasta
NM_000465.3 5523
$ faidx -m --bed regions.bed tests/data/genes.fasta
### modifies tests/data/genes.fasta by masking regions using --default-seq character ###
$ faidx -M --bed regions.bed tests/data/genes.fasta
### modifies tests/data/genes.fasta by masking regions using lowercase characters ###
$ faidx -e "lambda x: x.split('.')[0]" tests/data/genes.fasta -i bed
AB821309 1 3510
KF435150 1 481
KF435149 1 642
NR_104216 1 4573
NR_104215 1 5317
.......
Similar syntax as ``samtools faidx``
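Region strings such as ``chr1:1-1000`` use 1-based, inclusive coordinates, while Python slices are 0-based and half-open. The sketch below shows one way to translate between the two; ``parse_region`` and its regular expression are assumptions made for illustration and are not the parser used by the ``faidx`` script.

```python
import re

# name, optionally followed by ":start-end"
REGION = re.compile(r"^(?P<name>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_region(region):
    """Split 'name:start-end' into (name, slice); the slice is None for a bare name."""
    match = REGION.match(region)
    if match is None:
        raise ValueError(f"cannot parse region: {region!r}")
    name = match.group("name")
    if match.group("start") is None:
        return name, None
    start, end = int(match.group("start")), int(match.group("end"))
    # 1-based inclusive [start, end] becomes 0-based half-open [start - 1, end).
    return name, slice(start - 1, end)


print(parse_region("NM_001282543.1:201-210"))  # ('NM_001282543.1', slice(200, 210, None))
print(parse_region("NM_001282543.1"))          # ('NM_001282543.1', None)
```

Under this reading, the region ``NM_001282543.1:201-210`` covers the same bases as the Python slice ``[200:210]``, which is consistent with both forms returning ``CTCGTTCCGC`` in the examples above.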
A lower-level Faidx class is also available:
>>> from pyfaidx import Faidx
>>> fa = Faidx('genes.fa') # can return str with as_raw=True
>>> fa.index
OrderedDict([('AB821309.1', IndexRecord(rlen=3510, offset=12, lenc=70, lenb=71)), ('KF435150.1', IndexRecord(rlen=481, offset=3585, lenc=70, lenb=71)),... ])
>>> fa.index['AB821309.1'].rlen
3510
>>> fa.fetch('AB821309.1', 1, 10) # these are 1-based genomic coordinates
>AB821309.1:1-10
ATGGTCAGCT
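- If the FASTA file is not indexed, when `Faidx` is initialized the `build_index` method will automatically run, and the index will be written to "filename.fa.fai" where "filename.fa" is the original FASTA file.
- Start and end coordinates are 1-based.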
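The four fields of an index record (`rlen`, `offset`, `lenc`, `lenb`) are enough to locate any base without reading the whole file. The following is a minimal sketch of that arithmetic on a toy in-memory FASTA. The local `IndexRecord` namedtuple is only a stand-in that mirrors the field names shown above, and `byte_offset` and `fetch` are hypothetical helpers, not the pyfaidx implementation.

```python
import io
from collections import namedtuple

# Stand-in for the record type printed above: sequence length, byte offset of
# the first base, bases per line, bytes per line (including the newline).
IndexRecord = namedtuple("IndexRecord", "rlen offset lenc lenb")


def byte_offset(record, position):
    """Byte offset in the file of the base at 0-based `position`."""
    full_lines, column = divmod(position, record.lenc)
    return record.offset + full_lines * record.lenb + column


def fetch(handle, record, start, end):
    """Read the 1-based, inclusive range [start, end] of one sequence."""
    first = byte_offset(record, start - 1)
    last = byte_offset(record, end - 1)
    handle.seek(first)
    raw = handle.read(last - first + 1)
    # Drop the line breaks that fall inside the requested range.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


# Hypothetical two-record FASTA, wrapped at 10 bases per line.
fasta = io.BytesIO(b">seq1 first record\nACGTACGTAC\nGTACGT\n>seq2\nTTTTGGGG\n")
seq1 = IndexRecord(rlen=16, offset=19, lenc=10, lenb=11)
seq2 = IndexRecord(rlen=8, offset=43, lenc=8, lenb=9)

print(fetch(fasta, seq1, 9, 12))  # ACGT (spans the line break)
print(fetch(fasta, seq2, 5, 8))   # GGGG
```

This arithmetic only works when every line of a record except the last has the same length, which is why the input FASTA must be consistently line-wrapped.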
Support for compressed FASTA
----------------------------
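`pyfaidx` can create and read `.fai` indices for FASTA files that have been compressed using the `bgzip` tool from `samtools`. `bgzip` writes compressed data in a `BGZF` format. `BGZF` is `gzip` compatible, consisting of multiple concatenated `gzip` blocks, each with an additional `gzip` header making it possible to build an index for rapid random access. That is, files compressed with `bgzip` are valid `gzip` files and can be read by `gunzip`. For more details on `bgzip`, see its format description.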
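The property that makes this work, namely that concatenated `gzip` members still form a valid `gzip` stream, can be checked with the standard library alone. The sketch below does not produce real `BGZF` (it omits the extra header field that `bgzip` writes into each block), and the record contents are made up for illustration.

```python
import gzip

# Compress two chunks independently, as separate gzip members.
block_a = gzip.compress(b">seq1\nACGTACGT\n")
block_b = gzip.compress(b">seq2\nTTTTGGGG\n")

# Concatenating the members yields a stream any gzip reader accepts.
combined = block_a + block_b
print(gzip.decompress(combined).decode(), end="")

# Knowing where a member starts lets a reader decompress it on its own,
# without touching the data that precedes it.
second = gzip.decompress(combined[len(block_a):])
print(second.decode(), end="")
```

Real `BGZF` relies on the same idea, with block boundaries recorded so that a reader can seek to the block containing a requested position.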
Changelog
---------
Please see the releases page for a complete list of changes in each version.
Known issues
------------
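I try to fix as many bugs as possible, but most of this work is supported by a single developer. Please check the known issues for bugs relevant to your work. Pull requests are welcome.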
Contributing
------------
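Create a new Pull Request containing a single feature. If you add a new feature, please also add the relevant tests.
To run the tests on your machine:
- Create a new virtualenv and install `dev-requirements.txt`.
pip install -r dev-requirements.txt
- Download the test data by running:
python tests/data/download_gene_fasta.py
- Run the tests:
pytest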
Acknowledgements
----------------
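This project is freely licensed by the author, Matthew Shirley, and was completed under the mentorship and financial support of Drs. Sarah Wheelan and Vasan Yegnasubramanian at the Sidney Kimmel Comprehensive Cancer Center in the Department of Oncology.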