Aller au contenu

RefSeq

A propos

RefSeq est la base de données NCBI de séquences de référence ; un ensemble organisé et non redondant comprenant des contigs d'ADN génomique, des ARNm et des protéines pour des gènes connus, ainsi que des chromosomes entiers.

Accession numbers et types de molécules

Les entrées RefSeq sont similaires au format d'entrée de GenBank. Attribuer une nouvelle entrée RefSeq inclus un unique préfixe d'accession suivi d'un underscore. Le préfixe d'accession RefSeq a une signification implicite en termes de type de molécule qu'il représente. Voici tous les préfixes d'accession RefSeq et le type de molécule associé provenant de la Table 1 du
Chapitre 18 The Reference Sequence (RefSeq) Database retrouvé dans The NCBI Handbook

Accession prefix Molecule type Comment
AC_ Genomic Complete genomic molecule, usually alternate assembly
NC_ Genomic Complete genomic molecule, usually reference assembly
NG_ Genomic Incomplete genomic region
NT_ Genomic Contig or scaffold, clone-based or WGSa
NW_ Genomic Contig or scaffold, primarily WGSa
NZ_ Genomic Complete genomes and unfinished WGS data
NM_ mRNA Protein-coding transcripts (usually curated)
NR_ RNA Non-protein-coding transcripts
XM_ mRNA Predicted model protein-coding transcript
XR_ RNA Predicted model non-protein-coding transcript
AP_ Protein Annotated on AC_ alternate assembly
NP_ Protein Associated with an NM_ or NC_ accession
YP_ Protein Annotated on genomic molecules without an instantiated transcript record
XP_ Protein Predicted model, associated with an XM_ accession
WP_ Protein Non-redundant across multiple strains and species

Code status

Chaque enregistrement a un COMMENT, indiquant le niveau de conservation qu'il a reçu et l'attribution du groupe collaborateur. Ainsi, un enregistrement RefSeq peut être une copie essentiellement inchangée et validée de la soumission originale de l'INSDC, ou inclure des informations mises à jour ou supplémentaires fournies par des collaborateurs ou le personnel du NCBI. Voici tous les codes status RefSeq provenant de la Table 2 du
Chapitre 18 The Reference Sequence (RefSeq) Database retrouvé dans The NCBI Handbook

Code Description
MODEL The RefSeq record is provided by the NCBI Genome Annotation pipeline and is not subject to individual review or revision between annotation runs.
INFERRED The RefSeq record has been predicted by genome sequence analysis, but it is not yet supported by experimental evidence. The record may be partially supported by homology data.
PREDICTED The RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.
PROVISIONAL The RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff.
REVIEWED The RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
VALIDATED The RefSeq record has undergone an initial review to provide the preferred sequence standard. The record has not yet been subject to final review at which time additional functional information may be provided.
WGS The RefSeq record is provided to represent a collection of whole genome shotgun sequences. These records are not subject to individual review or revisions between genome updates.