pymmdb Tutorial

[1]:

from pymmdb import MMDB

1. Create an MMDB object that provides access to the datasets in the scMMDB database.

[2]:

mmdb = MMDB('./') # creates a new MMDB object that uses the current directory as its storage path
print(mmdb) # prints the configuration of the MMDB object

MMDB(storage_path=./, server_address=https://mmdb.piaqia.com/)
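The printed configuration shows that the first argument becomes `storage_path`. The following is a minimal sketch, assuming you want a dedicated storage directory and want to be sure it exists before constructing the object. `prepare_storage` is a hypothetical helper, not part of pymmdb, and whether `MMDB` creates a missing directory on its own is not stated in this tutorial.

```python
from pathlib import Path
import tempfile


def prepare_storage(path):
    """Create the directory if needed and return its absolute path (hypothetical helper)."""
    root = Path(path).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    return root


# Demonstrated on a temporary directory so the snippet leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    storage = prepare_storage(Path(tmp) / "scmmdb_data")
    storage_ready = storage.is_dir()
    # mmdb = MMDB(str(storage))  # then pass the path to MMDB as in the cell above
```

In practice you would call `prepare_storage` on a persistent location instead of a temporary one.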

2. Inspect the contents of the scMMDB database.

[3]:

mmdb.list_mmdb_info()

MMDB Information:
Species:  Homo sapiens, Mus musculus, Macaca mulatta, Sus scrofa
Tissue:  human cell line , human cell line, human blood, human bone marrow, human kidney, human primary motor cortex, human intra-abdominal lymph node tumor, human brain, human achilles tendon, mouse retina, mouse colon, mouse brain cortex, mouse cell line, mouse forebrain, mouse kidney, mouse primary motor cortex, mouse brain, human brain cortex, human jejunum, mouse thymic epithelium, human lung, bone marrow, human liver, human glioblastoma, human blood/skin, human tumor, macaca vaginal, mouse submandibular gland, mouse aorta, mouse tumor, mouse bone marrow, mouse mesenteric lymph nodes, mouse glioblastoma, mouse spleen/lymph nodes, mouse spleen, mouse epididymal adipose, mouse liver, pig liver
Disease:  none, cancer, diffuse small lymphocytic lymphoma of the lymph node, tendinopathy, pearson syndrome, alzheimer's disease, non-small-cell lung cancer, acute myeloid leukemia, atherosclerosis, acute lymphoblastic leukemia, glioblastoma, COVID-19, B cell acute lymphoblastic leukemia, HIV, obese, epilepsy, multisystem inflammatory syndrome; COVID-19, multiple sclerosis, peruvian tuberculosis disease, cutaneous T cell lymphoma, melanoma, SHIV infection, salivary gland squamous cell carcinoma, aortic aneurysm, breast cancer, nonalcoholic fatty liver disease
Technology:  SNARE-seq, Paired-seq, Novaseq, DOGMA-seq, SHARE-seq, NEAT-seq, sci-CAR-seq, HiSeq, ASAP-seq, CITE-seq, ECCITE-seq, Perturb-CITE-seq, REAP-seq, TEA-seq
Technology Type:  ATAC_RNA, ATAC_PROTEIN, RNA_PROTEIN, ATAC_RNA_PROTEIN

[4]:

mmdb.list_species() # list all species in the database

Species:  Homo sapiens, Mus musculus, Macaca mulatta, Sus scrofa

[5]:

mmdb.list_disease() # list all diseases in the database

Disease:  none, cancer, diffuse small lymphocytic lymphoma of the lymph node, tendinopathy, pearson syndrome, alzheimer's disease, non-small-cell lung cancer, acute myeloid leukemia, atherosclerosis, acute lymphoblastic leukemia, glioblastoma, COVID-19, B cell acute lymphoblastic leukemia, HIV, obese, epilepsy, multisystem inflammatory syndrome; COVID-19, multiple sclerosis, peruvian tuberculosis disease, cutaneous T cell lymphoma, melanoma, SHIV infection, salivary gland squamous cell carcinoma, aortic aneurysm, breast cancer, nonalcoholic fatty liver disease

[6]:

mmdb.list_tissue() # list all tissues in the database

Tissue:  human cell line , human cell line, human blood, human bone marrow, human kidney, human primary motor cortex, human intra-abdominal lymph node tumor, human brain, human achilles tendon, mouse retina, mouse colon, mouse brain cortex, mouse cell line, mouse forebrain, mouse kidney, mouse primary motor cortex, mouse brain, human brain cortex, human jejunum, mouse thymic epithelium, human lung, bone marrow, human liver, human glioblastoma, human blood/skin, human tumor, macaca vaginal, mouse submandibular gland, mouse aorta, mouse tumor, mouse bone marrow, mouse mesenteric lymph nodes, mouse glioblastoma, mouse spleen/lymph nodes, mouse spleen, mouse epididymal adipose, mouse liver, pig liver

[7]:

mmdb.list_technology() # list all technologies in the database

Technology:  SNARE-seq, Paired-seq, Novaseq, DOGMA-seq, SHARE-seq, NEAT-seq, sci-CAR-seq, HiSeq, ASAP-seq, CITE-seq, ECCITE-seq, Perturb-CITE-seq, REAP-seq, TEA-seq

[8]:

mmdb.list_technology_type() # list all technology types in the database

Technology Type:  ATAC_RNA, ATAC_PROTEIN, RNA_PROTEIN, ATAC_RNA_PROTEIN
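The technology type labels appear to join modality names with underscores. Assuming that naming convention holds, the sketch below finds every label that includes a given modality, which can help when choosing a `technology_type` value. `types_with_modality` is a hypothetical helper, not a pymmdb function.

```python
# Technology type labels copied from the output above.
TECHNOLOGY_TYPES = ["ATAC_RNA", "ATAC_PROTEIN", "RNA_PROTEIN", "ATAC_RNA_PROTEIN"]


def types_with_modality(modality, technology_types=TECHNOLOGY_TYPES):
    """Return the labels whose underscore-separated parts include `modality`."""
    return [t for t in technology_types if modality in t.split("_")]


protein_types = types_with_modality("PROTEIN")
print(protein_types)  # ['ATAC_PROTEIN', 'RNA_PROTEIN', 'ATAC_RNA_PROTEIN']
```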

3. Load a dataset from scMMDB.

[9]:

# List the datasets that match the given filters and show the first five rows.
# Judging by the output, disease=None leaves the disease unfiltered.
mmdb.list_dataset(species='Homo sapiens', tissue='human blood', disease=None, technology_type='RNA_PROTEIN').head(5)

[9]:

    ID             Species       Tissue       Disease                       Technology_type  Technology  Cell_num  Title
82  Dataset_C_004  Homo sapiens  human blood  atherosclerosis               RNA_PROTEIN      CITE-seq    5232      Single-cell immune landscape of human atherosc...
83  Dataset_C_005  Homo sapiens  human blood  acute lymphoblastic leukemia  RNA_PROTEIN      CITE-seq    16450     Single-cell antigen-specific landscape of CAR ...
84  Dataset_C_006  Homo sapiens  human blood  acute lymphoblastic leukemia  RNA_PROTEIN      CITE-seq    23287     Single-cell antigen-specific landscape of CAR ...
87  Dataset_C_009  Homo sapiens  human blood  acute lymphoblastic leukemia  RNA_PROTEIN      CITE-seq    30484     Single-cell antigen-specific landscape of CAR ...
88  Dataset_C_010  Homo sapiens  human blood  acute lymphoblastic leukemia  RNA_PROTEIN      CITE-seq    31105     Single-cell antigen-specific landscape of CAR ...
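The listing is the place to pick a dataset ID before loading. As a rough sketch, the snippet below selects IDs by cell count and by disease using plain Python over the five rows shown above (only the `ID`, `Disease`, and `Cell_num` columns are copied). It deliberately avoids assuming anything about the object returned by `list_dataset` beyond what the table displays.

```python
# Rows copied from the table above (ID, Disease, Cell_num columns only).
rows = [
    {"ID": "Dataset_C_004", "Disease": "atherosclerosis", "Cell_num": 5232},
    {"ID": "Dataset_C_005", "Disease": "acute lymphoblastic leukemia", "Cell_num": 16450},
    {"ID": "Dataset_C_006", "Disease": "acute lymphoblastic leukemia", "Cell_num": 23287},
    {"ID": "Dataset_C_009", "Disease": "acute lymphoblastic leukemia", "Cell_num": 30484},
    {"ID": "Dataset_C_010", "Disease": "acute lymphoblastic leukemia", "Cell_num": 31105},
]

# The smallest dataset is the quickest one to download and explore first.
smallest = min(rows, key=lambda row: row["Cell_num"])

# All IDs for one disease, in the order they appear in the listing.
leukemia_ids = [row["ID"] for row in rows if row["Disease"] == "acute lymphoblastic leukemia"]

print(smallest["ID"])  # Dataset_C_004
print(leukemia_ids)
```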

[10]:

Dataset_A_000 = mmdb.load_dataset('Dataset_C_004') # load the dataset by its ID; the variable name is arbitrary and here holds Dataset_C_004

Load dataset: [Dataset_C_004]

[11]:

Dataset_A_000 # inspect the loaded dataset: a dict with one AnnData object per modality

[11]:

{'RNA': AnnData object with n_obs × n_vars = 5232 × 100
     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'sample', 'tissue', 'celltype', 'ident', 'RNA.weight', 'ADT.weight', 'wsnn_res.1', 'seurat_clusters'
     var: 'features'
     uns: 'neighbors'
     obsm: 'X_pca_rna', 'X_umap_adt', 'X_umap_rna', 'X_wnnUMAP'
     varm: 'PCA_RNA'
     obsp: 'distances',
 'PROTEIN': AnnData object with n_obs × n_vars = 5232 × 21
     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'sample', 'tissue', 'celltype', 'ident', 'RNA.weight', 'ADT.weight', 'wsnn_res.1', 'seurat_clusters'
     var: 'features'
     obsm: 'X_pca_adt', 'X_umap_adt', 'X_umap_rna', 'X_wnnUMAP'
     varm: 'PCA_ADT'}

[12]:

Dataset_A_000['RNA']

[12]:

AnnData object with n_obs × n_vars = 5232 × 100
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'sample', 'tissue', 'celltype', 'ident', 'RNA.weight', 'ADT.weight', 'wsnn_res.1', 'seurat_clusters'
    var: 'features'
    uns: 'neighbors'
    obsm: 'X_pca_rna', 'X_umap_adt', 'X_umap_rna', 'X_wnnUMAP'
    varm: 'PCA_RNA'
    obsp: 'distances'

[13]:

Dataset_A_000['PROTEIN']

[13]:

AnnData object with n_obs × n_vars = 5232 × 21
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'sample', 'tissue', 'celltype', 'ident', 'RNA.weight', 'ADT.weight', 'wsnn_res.1', 'seurat_clusters'
    var: 'features'
    obsm: 'X_pca_adt', 'X_umap_adt', 'X_umap_rna', 'X_wnnUMAP'
    varm: 'PCA_ADT'
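Both modalities report 5232 observations, which suggests they describe the same cells. The sketch below is one way to summarize the shapes and confirm that the cell counts agree, assuming the loaded dataset is a dict of AnnData-like objects exposing `n_obs` and `n_vars` (both standard AnnData attributes). `summarize_modalities` is a hypothetical helper, and the stand-in objects only mimic the shapes printed above so that the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def summarize_modalities(dataset):
    """Return {modality: (n_obs, n_vars)}; raise if the cell counts disagree."""
    shapes = {name: (adata.n_obs, adata.n_vars) for name, adata in dataset.items()}
    cell_counts = {n_obs for n_obs, _ in shapes.values()}
    if len(cell_counts) > 1:
        raise ValueError(f"modalities differ in cell count: {shapes}")
    return shapes


# Stand-ins with the shapes printed above; pass the real loaded dict instead.
demo = {
    "RNA": SimpleNamespace(n_obs=5232, n_vars=100),
    "PROTEIN": SimpleNamespace(n_obs=5232, n_vars=21),
}
shapes = summarize_modalities(demo)
print(shapes)  # {'RNA': (5232, 100), 'PROTEIN': (5232, 21)}
```

With the real data, the call would be `summarize_modalities(Dataset_A_000)`. Equal counts alone do not prove that the cells are in the same order; comparing `obs_names` between the two objects would be a stricter check.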
