Steven Salzberg

Johns Hopkins University

H-index: 161

North America-United States

Professor Information

University

Johns Hopkins University

Position

Bloomberg Distinguished Professor

Citations(all)

333890

Citations(since 2020)

154044

Cited By

242757

hIndex(all)

161

hIndex(since 2020)

105

i10Index(all)

339

i10Index(since 2020)

259

Research & Interests List

Computational Biology

Genomics

Bioinformatics

Metagenomics

Biomedical Data Science

Top articles of Steven Salzberg

Upstream open reading frames may contain hundreds of novel human exons

Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which would comprise an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found a large majority (87.8%) to be conserved in all four as expected. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein-coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. We determined that 582 out of 2,199 upstream ORFs have strong evidence that they can form protein-coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform. Author Summary: We analyzed over 2000 human sequences that have been proposed to represent novel protein-coding …

Authors

Hyun Joo Ji,Steven L Salzberg

Journal

bioRxiv

Published Date

2024/3/23

Implementing governmental oversight of enhanced potential pandemic pathogen research

We write in response to the commentary “Virology—the path forward” (1). The authors argue against the recommendations of the US National Science Advisory Board for Biosecurity (NSABB) to strengthen the oversight of enhanced potential pandemic pathogen (ePPP) research (2). The authors assert that adopting the NSABB recommendations would have a sweeping negative impact on US research and harm US competitiveness, and the authors cite the development of vaccines against measles and cytomegalovirus as examples of research that would be harmed. The claim of sweeping negative impact is false. ePPP research as defined by the NSABB represents <0.01% of biomedical research and <1% of virology research. At most, a dozen current US-funded virology research projects, of more than 2,000, would be affected.

Authors

Richard H Ebright,Raina MacIntyre,Joseph P Dudley,Colin D Butler,Andre Goffinet,Edward Hammond,Elisa D Harris,Hideki Kakeya,Yanna Lambrinidou,Milton Leitenberg,Stuart A Newman,Bryce E Nickels,Monali C Rahalkar,Matt W Ridley,Steven L Salzberg,Harish Seshadri,Günter Theißen,Antonius M VanDongen,Alex Washburne

Journal

Journal of virology

Published Date

2024/3/13

Novel metagenomics analysis of stony coral tissue loss disease

Stony coral tissue loss disease (SCTLD) has devastated coral reefs off the coast of Florida and continues to spread throughout the Caribbean. Although a number of bacterial taxa have consistently been associated with SCTLD, no pathogen has been definitively implicated in the etiology of SCTLD. Previous studies have predominantly focused on the prokaryotic community through 16S rRNA sequencing of healthy and affected tissues. Here, we provide a different analytical approach by applying a bioinformatics pipeline to publicly available metagenomic sequencing samples of SCTLD lesions and healthy tissues from four stony coral species. To compensate for the lack of coral reference genomes, we used data from apparently healthy coral samples to approximate a host genome and healthy microbiome reference. These reads were then used as a reference to which we matched and removed reads from …

Authors

Jakob M Heinz,Jennifer Lu,Lindsay K Huebner,Steven L Salzberg,Markus Sommer,Stephanie M Rosales

Journal

bioRxiv

Published Date

2024/1/3

A genome sequence for the threatened whitebark pine

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly …

Authors

David B Neale,Aleksey V Zimin,Amy Meltzer,Akriti Bhattarai,Maurice Amee,Laura Figueroa Corona,Brian J Allen,Daniela Puiu,Jessica Wright,Amanda R De La Torre,Patrick E McGuire,Winston Timp,Steven L Salzberg,Jill L Wegrzyn

Journal

G3: Genes, Genomes, Genetics

Published Date

2024/3/25

Detecting differential transcript usage in complex diseases with SPIT

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes. Although the majority of common diseases in humans are categorized as complex, state-of-the-art DTU analysis methods often overlook this heterogeneity in their models. We therefore developed SPIT, a statistical tool that identifies predominant subgroups in transcript usage within a population along with their distinctive sets of DTU events. This study provides comprehensive …

Authors

Beril Erdogdu,Ales Varabyou,Stephanie C Hicks,Steven L Salzberg,Mihaela Pertea

Journal

Cell Reports Methods

Published Date

2024/3/25

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

Authors

Ales Varabyou,Markus J Sommer,Beril Erdogdu,Ida Shinder,Ilia Minkin,Kuan-Hao Chao,Sukhwan Park,Jakob Heinz,Christopher Pockrandt,Alaina Shumate,Natalia Rincon,Daniela Puiu,Martin Steinegger,Steven L Salzberg,Mihaela Pertea

Journal

Genome biology

Published Date

2023/10/30

The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.

Authors

Kuan-Hao Chao,Aleksey V Zimin,Mihaela Pertea,Steven L Salzberg

Journal

G3: Genes, Genomes, Genetics

Published Date

2023/3/1

The status of the human gene catalogue

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we …

Authors

Paulo Amaral,Silvia Carbonell-Sala,Francisco M De La Vega,Tiago Faial,Adam Frankish,Thomas Gingeras,Roderic Guigo,Jennifer L Harrow,Artemis G Hatzigeorgiou,Rory Johnson,Terence D Murphy,Mihaela Pertea,Kim D Pruitt,Shashikant Pujar,Hazuki Takahashi,Igor Ulitsky,Ales Varabyou,Christine A Wells,Mark Yandell,Piero Carninci,Steven L Salzberg

Published Date

2023/10/5

Professor FAQs

What is Steven Salzberg's h-index at Johns Hopkins University?

Steven Salzberg's h-index is 161 overall and 105 since 2020.

What are Steven Salzberg's research interests?

The research interests of Steven Salzberg are Computational Biology, Genomics, Bioinformatics, Metagenomics, and Biomedical Data Science.

What is Steven Salzberg's total number of citations?

Steven Salzberg has 333,890 citations in total.

Who are the co-authors of Steven Salzberg?

The co-authors of Steven Salzberg are Jonathan A. Eisen, Owen White, Michael C. Schatz, Brian Haas, Charles H. Langley.

Co-Authors

H-index: 115
Jonathan A. Eisen

University of California, Davis

H-index: 108
Owen White

University of Maryland, Baltimore

H-index: 87
Michael C. Schatz

Johns Hopkins University

H-index: 83
Brian Haas

Harvard University

H-index: 70
Charles H. Langley

University of California, Davis
