SPARCL - Data Set Notes

Notes on Ingested Data Sets

For information on how to acknowledge usage of data sets in SPARCL, see the acknowledgments page.


This page documents the provenance of the data sets and the expected number of records in each data set before and after applying selection criteria (where applicable). It further includes descriptions of possible problems encountered when ingesting the records into SPARCL.

Notes on original files and ingest audit


SPARCL audit on all data sets

Upon ingest, SPARCL checks every data set and records a failure when any of the following occurs:

  • A NaN or Inf value appears in data field(s)
  • The redshift value is outside the expected range of -1 to +20
  • The SPECOBJID (BOSS-DR16/SDSS-DR16) or TARGETID (DESI-EDR), stored as specid in SPARCL, falls outside the signed 64-bit integer range (less than -9223372036854775808 or greater than 9223372036854775807)

Any failures are tracked as part of the audit process and stored in an audit database.
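
The three audit checks listed above can be sketched in a few lines of Python. This is only an illustration, not SPARCL's actual ingest code: the record layout (a dict with specid, redshift and array fields named flux, ivar and wavelength) and the function name audit_record are assumptions made for the example.

```python
import math

# Signed 64-bit integer limits, matching the bounds quoted in the check list.
INT64_MIN = -(2**63)      # -9223372036854775808
INT64_MAX = 2**63 - 1     #  9223372036854775807
REDSHIFT_RANGE = (-1.0, 20.0)


def audit_record(record):
    """Return a list of failure messages for one record (hypothetical layout)."""
    failures = []

    # Check 1: NaN or Inf in any of the array-valued data fields.
    for name in ("flux", "ivar", "wavelength"):
        values = record.get(name, [])
        if any(not math.isfinite(v) for v in values):
            failures.append(f"NaN/Inf in {name}")

    # Check 2: redshift within the expected range (a NaN redshift also fails).
    z = record["redshift"]
    if not (REDSHIFT_RANGE[0] <= z <= REDSHIFT_RANGE[1]):
        failures.append("redshift out of range")

    # Check 3: specid must be representable as a signed 64-bit integer.
    if not (INT64_MIN <= record["specid"] <= INT64_MAX):
        failures.append("specid outside int64 range")

    return failures
```

A record that passes all checks yields an empty list; each failed check adds one message that could then be written to an audit log.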

We also document statistics on problematic cases at the end of the How-to use SPARCL Notebook.


BOSS-DR16 and SDSS-DR16

These two data sets both come from SDSS Data Release 16 (SDSS-DR16) of the SDSS Collaboration, but SPARCL ingests them separately because their catalog and spectra files follow different data models (SDSS vs. BOSS spectrograph). They can be queried jointly using datasetgroup='SDSS_BOSS'.
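
As a rough sketch of what a joint query might look like with the sparclclient Python package: the example below assumes that datasetgroup can be passed as a key in the constraints argument of SparclClient.find, and that the listed outfields exist in the installed client version. The helper names sdss_boss_constraints and find_sdss_boss are hypothetical.

```python
def sdss_boss_constraints(redshift_range=None):
    """Build a constraints dict covering both SDSS-DR16 and BOSS-DR16.

    Assumption: 'datasetgroup' is accepted as a constraint field name.
    """
    constraints = {"datasetgroup": ["SDSS_BOSS"]}
    if redshift_range is not None:
        lo, hi = redshift_range
        constraints["redshift"] = [lo, hi]
    return constraints


def find_sdss_boss(limit=10):
    """Run the query (requires network access and the sparclclient package)."""
    # Imported lazily so the helper above can be used without the package.
    from sparcl.client import SparclClient

    client = SparclClient()
    return client.find(
        outfields=["sparcl_id", "specid", "data_release", "redshift"],
        constraints=sdss_boss_constraints(redshift_range=(0.5, 0.9)),
        limit=limit,
    )
```

Checking the data_release value of each returned record shows which of the two data sets it belongs to.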

For BOSS-DR16 and SDSS-DR16, the input files are the following:

  • specObj-dr16.fits (reference spectroscopic catalog used to create the list of selected records and to store information in the AUX fields)
  • spPlate-{plate}-{mjd}.fits (spectra files used to ingest the SPECTRA fields; data models for the SDSS and BOSS instruments)
  • spZbest-{plate}-{mjd}.fits (files with the best-fit model and some CORE fields; data models for the SDSS and BOSS instruments)

There are two main reasons why SDSS-DR16/BOSS-DR16 records are not ingested:

  1. Spectrum {plate},{fiberid},{mjd} is not listed in the specObj-dr16 file
  2. The spPlate file does not have a corresponding spZbest file
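
The two rejection reasons above amount to a pair of membership tests. The sketch below illustrates that logic under simplified assumptions: spectra are identified by (plate, fiberid, mjd) tuples, the specObj-dr16 catalog is represented as a set of such tuples, and the available spZbest files as a set of (plate, mjd) pairs. The function name classify_rejections is hypothetical and this is not the SPARCL ingest code.

```python
def classify_rejections(spplate_spectra, specobj_keys, spzbest_keys):
    """Split spectra into accepted and rejected, by rejection reason.

    spplate_spectra: iterable of (plate, fiberid, mjd) found in spPlate files
    specobj_keys:    set of (plate, fiberid, mjd) listed in specObj-dr16
    spzbest_keys:    set of (plate, mjd) for which an spZbest file exists
    """
    accepted = []
    rejected = {"not_in_specobj": [], "no_spzbest": []}

    for plate, fiberid, mjd in spplate_spectra:
        key = (plate, fiberid, mjd)
        if key not in specobj_keys:
            # Reason 1: spectrum is not listed in the reference catalog.
            rejected["not_in_specobj"].append(key)
        elif (plate, mjd) not in spzbest_keys:
            # Reason 2: the spPlate file has no matching spZbest file.
            rejected["no_spzbest"].append(key)
        else:
            accepted.append(key)

    return accepted, rejected
```

The lengths of the two rejected lists correspond to the first two rows of the per-data-set rejection counts.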

BOSS-DR16 records

Reason rejected                   # of records
Not found in specObj-dr16 file    0
No corresponding spZbest file     0
NaN/Inf values in field(s)        499

SDSS-DR16 records

Reason rejected                   # of records
Not found in specObj-dr16 file    0
No corresponding spZbest file     0
NaN/Inf values in field(s)        1345



DESI-EDR

The original Dark Energy Spectroscopic Instrument (DESI) redshift catalog for healpix spectra (zall-pix-fuji.fits; data model) contains 2,847,435 rows. Selecting only rows for DESI targets (objtype='TGT'), we obtain 2,044,588 expected records. The rejected entries are a combination of sky fibers and/or problematic fibers (objtype='SKY', 'BAD' or blank). We opt to exclude them from SPARCL.
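
The objtype selection can be illustrated with a minimal sketch. It assumes the catalog rows have already been read into a list of dicts with an objtype key (in practice the FITS file would be read with a dedicated library); select_targets is a hypothetical helper, not part of SPARCL.

```python
from collections import Counter


def select_targets(rows):
    """Keep rows with objtype == 'TGT' and count the rejected objtype values."""
    kept = [row for row in rows if row["objtype"] == "TGT"]
    # Tally rejected rows by objtype ('SKY', 'BAD', or blank).
    rejected = Counter(row["objtype"] for row in rows if row["objtype"] != "TGT")
    return kept, rejected
```

The Counter makes it easy to see how many rejected rows come from sky fibers, bad fibers, or blank entries.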

We do not apply any further quality cuts, but we recommend that users familiarize themselves with quality flags of interest, such as coadd_fiberstatus (non-zero if there is a warning or error with the fiber) and zwarn (non-zero when there are possible issues with the redshift determination or the spectral fit, or when a problem is carried over from the fiber status).
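
Users who want a conservative sample could, for example, keep only records where both flags are zero. The sketch below assumes each record is a dict holding integer coadd_fiberstatus and zwarn values; whether such a strict cut is appropriate depends on the science case, and the function names are hypothetical.

```python
def is_clean(record):
    """True when neither the fiber status nor the redshift warning flag is set."""
    return record["coadd_fiberstatus"] == 0 and record["zwarn"] == 0


def clean_subset(records):
    """Return only the records with no fiber or redshift warnings."""
    return [rec for rec in records if is_clean(rec)]
```

Because both flags are bitmasks, a less strict selection could test individual bits instead of requiring the whole value to be zero.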

SPARCL includes only healpix-coadded spectra that have been joined across cameras, so each spectrum contains a single flux array covering the full wavelength range (rather than separate arrays for the B, R, and Z spectrograph arms). Similarly, the inverse variance (ivar) and best-fit Redrock template (model) were joined across cameras into single arrays.
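
To give an idea of what joining across cameras involves, here is a simplified sketch that merges per-arm arrays into single arrays using inverse-variance weighting where arms overlap. It is not the DESI pipeline code; it assumes that overlapping arms share identical wavelength values in the overlap region, and the function name join_cameras is hypothetical.

```python
def join_cameras(arms):
    """Merge per-camera spectra into single wavelength, flux and ivar lists.

    arms: list of (wavelength, flux, ivar) triples, one per camera.
    Pixels present in more than one arm are combined with
    inverse-variance weights; their ivar values add.
    """
    sum_ivar = {}       # wavelength -> sum of ivar
    sum_ivar_flux = {}  # wavelength -> sum of ivar * flux

    for wave, flux, ivar in arms:
        for w, f, iv in zip(wave, flux, ivar):
            sum_ivar[w] = sum_ivar.get(w, 0.0) + iv
            sum_ivar_flux[w] = sum_ivar_flux.get(w, 0.0) + iv * f

    wavelength = sorted(sum_ivar)
    ivar_out = [sum_ivar[w] for w in wavelength]
    # Pixels with zero total ivar carry no information; set their flux to 0.
    flux_out = [
        sum_ivar_flux[w] / sum_ivar[w] if sum_ivar[w] > 0 else 0.0
        for w in wavelength
    ]
    return wavelength, flux_out, ivar_out
```

With this weighting, a pixel measured in two arms is dominated by the arm with the higher inverse variance, and the combined inverse variance is the sum of the two.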

Information on data access through the Astro Data Lab and example notebooks showing how to use Astro Data Lab databases jointly with SPARCL are available here.

DESI-EDR records

Reason rejected               # of records
Invalid objtype               406737
NaN/Inf values in field(s)    0