# RCSB_DB HISTORY

11-Mar-2018  - Py2->Py3 and refactored for python packaging tools
 4-Jul-2018  - V0.12 reformulate schema definitions directly from dictionary
               metadata and targeted helper functions.
20-Jul-2018    V0.13 overhaul static schema management
28-Jul-2018    V0.14 corrections for selection and type filters and object size pruning
20-Aug-2018.   V0.15 added sliced collections (e.g. entity or cannonical identifier),
               JSON schema description of collections, sfx/xfel examples, dictionary
               methods implemented in helper classes, and incremental repository metadata
               updates.
25-Aug-2018    V0.16 split out rcsb.utils.config, rcsb.utils.io, rcsb.utils.multiproc and
               mock-data as a shared submodule for test data. Merged branch ' namespace'
               back into master.
28-Aug-2018    V0.17 rename console scripts directory to avoid conflict with reserved keyword in Py2.
 9-Sep-2018    V0.18 add support for multi-level JSON schema generation and validation,
               integrate linking CIF methods to generate content at load time, add core
               assembly collection, add cardinal identifiers in new categories to each
               core collection, and enum normalization as a data transformation filter operation.
11-Sep-2018    V0.19 add dictionary method for citation author aggregation
                     adjust cardinality of entity and assembly identifier categories
14-Sep-2018    V0.20 Require at least one record in any array type, adjust constraints on iterables.
18-Sep-2018    V0.21 Require homogeneous categories/classes in JSON schema production.
22-Sep-2018    V0.22 Add method to generate _pdbx_struct_assembly.rcsb_candidate_assembly
10-Oct-2018    V0.23 Add date format to schema definitions; generate schemas and add validation tests
                     for repository_holdings, entity_sequence_clusters, and data_exchange schema types;
                     extend derived content for core_entry and core_entity collections;
                     add subcategory aggregation feature; and refactor api for helper methods.
12-Oct-2018    V0.24 Add rcsb_repository_holdings_transferred and  rcsb_repository_holdings_insilico_models,
                     make datetime type mappings the same as date, check for empty required properties for
                     subcategory aggregates
28-Oct-2018    V0.25 Move local helper method configuration to common YAML configuration, restore some missing
                     modules for cockroach and crate server types,  add category rcsb_accession_info in core_entry,
                     make audit_authors an iterable type in categories rcsb_repository_holdings_transferred,
                     rcsb_repository_holdings_insilico_models, and rcsb_repository_holdings_unreleased,
                     verify data types in sequence cluster collections.
13-Nov-2018    V0.26 Add chemical component and bird chemical compoment core collections with convenience
                   categories rcsb_chem_comp_info, rcsb_chem_comp_synonyms, rcsb_chem_comp_descriptor,
                   and rcsb_chem_comp_target. Add convenience category rcsb_entry_info in pdbx_core core_entry
                   collection.  Include correspondence details for DrugBank and CCDC/CSD.
                   Add dictionary methods to filter core entry objects by experiment type to
                   remove largely vacuous categories.  Add new CLI entry point for schema generation.
27-Nov-2018    V0.27 resolve inconsistent handling of multiple sources for antibody molecule types. Add
                   dictionary method rcsb_file_block_by_method add item pdbx_chem_comp_audit.ordinal to replace
                   the primary key for this category.    Add new mechanism to inject private document key
                   attributes to support Solr access and indexing.  (Feature branch  multisourcefix). Extend
                   repository holdings collection with prerelease sequences and extended content types for
                   the current repository state.
28-Nov-2018    V0.28 minor update to FASTA sequence content type to repository holdings current inventory and
               adjustments in constraints for rcsb_entry_info category production
30-Nov-2018    V0.29 branch birdconsolidate includes mashup of all BIRD definition data, additional content
               types in current repository holdings with adjustments to filter obsolete entries, various
               additional categories added to core entry collection.
 1-Dec-2018    V0.30  Adjustments for roundtrip CI/CD
 2-Dec-2018    V0.31 Fixes for expansion of semi-colon separated value types, add methods to
               include NCBI scientific and common names.
 3-Dec-2018    V0.32 interim update for troubleshooting entity collection load issues, adding search indices
               for core collections, adding optional private identifier chemical component identifiers for non-polymer
               entities, adjustments to bird core citation schema.
 9-Dec-2018    V0.33 Add _pdbx_reference_molecule.class, _pdbx_reference_molecule.type, Add categories drugbank_info
               and drugbank_target, add method rcsb_add_bird_entity_identifiers, add  _rcsb_entry_info.solvent_entity_count
               and adjust counts to include solvent.  Add consolidated loader for BIRD data, add loader for DrugBank
               corresponding data from rcsb.utils.chemref.    Add CLI option for loading integrated chemical reference data.
               Move time consuming schema validation tests to seperate subdirectory.  Add mandatory option for injected
               private keys.  Resolve schema loading issues with current chemical component and BIRD definitions.
 12-Dec-2018   V0.34 Add rcsb_entity_id identifiers in categories struct_ref_seq and struct_ref_seq_dif.Add category rcsb_entity_poly_info
               as the basis for a new core collection core_entity_monomer. Adjust logic for reporting assembly format availibility.
 13-Dec-2018   V0.35 Add ihm_dev collection and support for I/HM repository model files
 18-Dec-2018   V0.36 Add _entity_poly.rcsb_prd_id and configuration to include private key for BIRD identifier in the core_entity collection.'
  7-Jan-2019   V0.37 Simplify and consolidate site specific configuration options, streamline cli scripts, and consolidate common
               schema access and building methods in SchemDefUtil() (feature branch configpath)
  9-Jan-2019   V0.38 Introduce a new data transformation filter for XML character references and miscellaneous related changes.
 10-Jan-2019   V0.39 Adjustments to improve diagnostics relative remaining data issues observed loading latest data files, and
               tuning of classification for RCSB candidate assemblies.
 17-Jan-2019   V0.40 adjust handling of document replacement in sliced collections
 25-Jan-2019   V0.41 schema extension for DrugBank core collection
 16-Feb-2019   V0.42 Add entity_instance_core support, add content type merging feature to allow consolidation of data artifacts
               prior to data load processing, and overhaul the slice processing to improve performance.
 18-Feb-2019   V0.43 provide a more graceful handling of empty slices,
 19-Feb-2019   V0.44 add slice support for pdbx_validate_* categories, and adjust category exclusions for the entity_core collection.
 20-Feb-2019   V0.44 adding EntityInstanceExtractor.py preliminary version, adjust logging and cli script options.
#
 22-Feb-2019   V0.45 preliminary version of tools for validation load analysis.
  3-Mar-2019   V0.46 move validation analysis tools, EntityInstanceExtractor.py and related tests to package rcsb.utils.anal.
               Change helper criteria for identifying chemical component objects and warn for missing chem_comp_atom category cases.
               Change release filter for chem_comp_* collection to allow only status codes 'REL' and 'REF_ONLY' -
 14-Mar-2019   V0.47 Additonal dictionary content and helper method support for -
                _entity.rcsb_macromolecular_names_combined,   _rcsb_entry_info.software_programs_combined,
                _rcsb_entity_container_identifiers.chem_comp_ligand, _rcsb_entity_container_identifiers.chem_comp_monomers,

                Added new data types int-csv and int-scsv

                Added automated subcategory aggregation in the general data processing pipeline.

                New sub-category taxonomy_lineage with items: _rcsb_entity_source_organism.taxonomy_lineage_id, _rcsb_entity_source_organism.taxonomy_lineage_name, _rcsb_entity_source_organism.taxonomy_lineage_depth
                and _rcsb_entity_host_organism.taxonomy_lineage_id, _rcsb_entity_host_organism.taxonomy_lineage_name,
                _rcsb_entity_host_organism.taxonomy_lineage_depth

                New sub-category rcsb_ec_lineage with items _entity.rcsb_ec_lineage_name _entity.rcsb_ec_lineage_id
                and _entity.rcsb_ec_lineage_depth

                Add_rcsb_schema_container_identifiers.schema_name, _rcsb_schema_container_identifiers.collection_name
                _rcsb_schema_container_identifiers.collection_schema_version

               Configuration changes to simplify the handling of versioning removing these from database and collection names
               and schema file names, version configuration now managed in separate configuration options,
               NCBI_TAXONOMY_LOCATOR removed, added configuration options NCBI_TAXONOMY_CACHE_PATH and ENZYME_CLASSIFICATION_CACHE_PATH,
               provide separate configuration options schema version assignment within collections,
               up-version api and json schema, add config option VRPT_REPO_PATH_ENV.

               chem_comp category attributes normalized between chem_comp_core and bird_chem_comp_core collections.
 17-Mar-2019   V0.48 adding subcategory aggregate rcsb_macromolecular_names_combined
 24-Mar-2019   V0.49 address missing category issues in chemical reference data collections, various changes to
               address schema validation exceptions introduced by additionalProperties: False.  Add consolidated
               EC item.   Extend taxonomy lineaage to include leaf node.
 24-Mar-2019   V0.50 temporary adjustment for error handing for obsolete taxonony ids.
 25-Mar-2019   V0.51 remap merged taxons and adjust exception handling for taxonomy lineage generation
 31-Mar-2019   V0.52 block_attributes: REF_PARENT_CATEGORY_NAME: REF_PARENT_ATTRIBUTE_NAME: to provide parent details for synthetic key
               add 'addParentRefs' in enforceOpts to write relative $ref properties to describe parent relationships.
                Address GitHub issues:
                        Failed int cast for None in DataTransformFactory #20 and
                        MySQL SchemaDefLoader skip zero values #19
  3-Apr-2019   V0.53 added private schema property _primary_key to mark attributes that are part of an object primary key.
               This feature is controlled by 'addPrimaryKey' enforceOpts property.
  7-Apr-2019   V0.54 adding support for integrating CATH and SCOP structural classifiers and sequence difference type counts.
 11-Apr-2019   V0.55 add tree loaders to etl cli tool, add entity_instance_validation, add a variety of counts, add atc codes.
 13-Apr-2019   V0.56 add prototypical json schema properties '_attribute_groups' mapped to subcategory id/labels.
 17-Apr-2019   V0.57/58 schema and cache adjustments for node tree loading
 25-Apr-2019   V0.59 adds parent source/host organism and polymer monomer counts
  3-May-2019   V0.60 adds helper support for attributes:
                   _rcsb_entry_info.deposited_polymer_monomer_count,
                   _rcsb_entry_info.polymer_entity_count_protein,
                   _rcsb_entry_info.polymer_entity_count_nucleic_acid,
                   _rcsb_entry_info.polymer_entity_count_nucleic_acid_hybrid,
                   _rcsb_entry_info.polymer_entity_count_DNA,
                   _rcsb_entry_info.polymer_entity_count_RNA,
                   _rcsb_entry_info.nonpolymer_ligand_entity_count
                   _rcsb_entry_info.selected_polymer_entity_types
                   _rcsb_entry_info.polymer_entity_taxonomy_count
                   _rcsb_entry_info.assembly_count
                and categories
                    rcsb_entity_instance_domain_scop
                    rcsb_entity_instance_domain_cath
  4-May-2019  V0.61 extend content in and categories rcsb_entity_instance_domain_scop
                    rcsb_entity_instance_domain_cath to support slicing
 18-May-2019  V0.62 add rcsb_assembly_info category to the core_assembly
                    adjust classificatoin of ternary complexes extending enumeration for _rcsb_entry_info.selected_polymer_entity_types
                    add new classifier _rcsb_entry_info.na_polymer_entity_types
                    and removed _rcsb_entry_info.nonpolymer_ligand_entity_count
 20-May-2019  V0.63 Category rcsb_prot_sec_struct_info added to core_entity_instance and prune primary secondary structure
                    categories from this collection.
 21-May-2019  V0.64 Handle odd ordering of records in struct_ref_seq_dif
 29-Jun-2019  V0.65 Update development workflow, method code packaging,  and general housekeeping
  8-Aug-2019  V0.66 Refactoring dictionary method helpers and external resource provisioning to
                    reduce redundant processing and facilitate greater caching. Remove testing
                    dependencies on mock schemas and add explicit schema differencing to better
                    schema evolution. Add new helper methods for sites and connections and revise helper
                    methods for all entity and instance features.
 10-Sep-2019  V0.67 Adding extended metadata types, AtcProvider() and SiftsSummaryProvider().
 19-Sep-2019  V0.68 Eliminate private keys, revise/extend interenal schema metadata, purge schema,
                    add polymer alignment methods.
 24-Sep-2019  V0.69 Add additional polymer_instance features and new polymer_instance_validations methods.
 25-Sep-2019  V0.70 Incorporate RCSB extensions in 'min' schemas add single collection specific identifiers core collections
                    Add missing required entry identifiers.
 25-Sep-2019  V0.71 Adjustments for edge cases and enums after full-load tests
 29-Sep-2019  V0.72 Further adjustments for edge cases and enums after full-load tests
 29-Sep-2019  V0.73 Parallelize merging content types in RepositoryProvider()
 29-Sep-2019  V0.74 Cleaning up more edge cases with missing or misorganized data
  6-Oct-2019  V0.75 Reorganize feature value within feature range and position subcategories
  7-Oct-2019  V0.76 Tweak sequence reference processing to better cope with inconsistencies and missing data.
  8-Oct-2019  V0.77 Upversion schema after successful build
 14-Oct-2019  V0.78 Add support for sequence isoforms, entry molecular weight, turn off SIFTS for
                    references and alignments,  and various schema adjusts.
 14-Oct-2019  V0.79 Add test cases and fix exceptions for missing sequence difference details.
 14-Oct-2019  V0.80 Fix linting issue and update documentation in configuration example file -
 16-Oct-2019  V0.81 Filter reference sequence assignments marked as self-reference - added tests for GO Id consistency
 17-Oct-2019  V0.82 Preserve annotations on self-referenced sequences for now/shift ATC depth
 19-Oct-2019  V0.83 Strip missing/empty values from subcategory aggregates/handle missing accession codes on struct_ref_seq records
 19-Oct-2019  V0.84 Abandon reliance on ref_id/align_id consistency in struct_ref_* categories.
 20-Oct-2019  V0.85 Update type, coverage and configuration for latest IHM dictioanry update
 23-Oct-2019  V0.86 Add nesting property to objects with unit cardinality,
                    don't propagate exact-match/suggest search contexts to full-text
 25-Oct-2019  V0.87 Address a couple of anomalous sequence accession code issues and sync up mock repos
 26-Oct-2019  V0.88 Address anomalous non-polymer reference sequence records.
  2-Nov-2019  V0.89 Add bird specific citation handling methods and rerun
 23-Nov-2019  V0.90 Move pipeline to py38
  4-Dec-2019  V0.91 Add type specific entity collections (branch entity-type-division)
  5-Dec-2019  V0.92 Update search contexts for repository holdings.
  8-Dec-2019  V0.93 Partially handle odd cases of missing entity_poly category and adjust enumeration for nonpolymer instances features
                    Adjust error reporting in getProtSecStructFeatures(), tweak mandatory items for fiber diffraction, handle boundary
                    counts for cases with no macromolecules
 10-Dec-2019  V0.94  Adding aggregation of gene names, related annotation lineage, fixed miscellaneous counting issues
 14-Dec-2019  V0.941 Reorganize repository holdings collections
 15-Dec-2019  V0.942 Suppress SIFTS with primary data loading
 16-Dec-2019  V0.943 Extend normalizeCsvToList() to handle more inconsisent cases.
 16-Dec-2019  V0.944 Add workflow module and tests to facilitate Luigi integration.
 23-Dec-2019  V0.945 Add consolidated EC processing and disable auto search context filters
 04-Jan-2020  V0.946 Add support for rcsb_polymer_entity.rcsb_enzyme_class_combined_depth, set single process cache management on loading
 08-Jan-2020  V0.947 Add checks for missing entity gene names
 10-Jan-2020  V0.949 Update dependencides for rcsb.utils.struct
 20-Jan-2020  V0.950 pre-beta-rev-v1 - reorganizing assembly and feature categories, standardize on provenance_source, update enums
 25-Jan-2020  V0.951 nested schema extension adjustments
 27-Jan-2020  V0.952 configuration update updating ExDb collection names
 28-Jan-2020  V0.953 update configuration documentation and cleanup unused files
 29-Jan-2020  V0.954 finalize feature schema
 30-Jan-2020  V0.955 update dependencies
 30-Jan-2020  V0.956 Change PDBx MOCK repository organization from SANDBOX to FTP repository layout, handle missing data for
                     chromophores, and update configuration example file.
 30-Jan-2020  V0.957 Update dependency version for rcsb.utils.config
  2-Feb-2020  V0.958 Corrections keyword arguments in RepoLoadWorkflow()
  3-Feb-2020  V0.959 Split configuration file into site and schema portions
  5-Feb-2020  V0.960 Fix some inconsistencies among repository holdings collections and add consistency checks for
                     entity instance validation features
  6-Feb-2020  V0.961 Add additional chimeric and multi-reference test cases and suppress mongodb tracebacks on replace failures
  9-Feb-2020  V0.962 Cleanup combined software program list, separate tests to detect schema differences, add support for
                     schema updates to existing collections, update schema update CLI with compare methods.
 11-Feb-2020  V0.963 Expose upsertFlag on mongo update method wrapper.
 12-Feb-2020  V0.964 Extend entity and entity instance annotations with convalent linkage details
 12-Feb-2020  V0.965 Add model_ids to entry_container_identifiers and update dependencies
 14-Feb-2020  V0.966 Add MongoDb maintenance operations, point example config to development branch assets, update enums with
                     BIRD_MOLECULE_NAME.
 19-Feb-2020  V0.967 Initialize zero values for feature counts and coverage
 26-Feb-2020  V0.968 Add missing provenance source for rcsb_chem_comp_synonyms, filter case duplicates for taxonomy common
                     names, entity descriptions, and gene names.
  5-Mar-2020  V0.969 Allow category nested contexts and remote tests for uniprot-core
 17-Mar-2020  V0.970 Add separate primary citation category and offset for zero based sequence database references
 24-Mar-2020  V0.971 Add support for rcsb_chem_comp_annotation and various fixes for schema builder.
 27-Mar-2020  V0.972 Add support for rcsb_repository_holdings_removed.repository_content_types
  1-Apr-2020  V0.973 Add test cases for pdbx_audit_revision_details.description
  1-Apr-2020  V0.974 Add test cases for RESID annotations
  5-Apr-2020  V0.975 Adjustments to logging details for sequence processing
  7-Apr-2020  V0.976 Add min/max for consolidated diffraction wavelength details
  8-Apr-2020  V0.977 Adjust handling of nested contexts for subcategories.
 17-Apr-2020  V0.978 Add rcsb_accession_info.has_released_experimental_data
 27-Apr-2020  V0.979 Adjustments for remediated carbohydrate data files, leave BIRD_* annotations solely in definition collections.
 30-Apr-2020  V0.980 New NMR repo holdings content types, specific configuration option added for ed maps list
  2-May-2020  V0.981 Restore out-of-sync repository holdings data processing module and update related tests
  8-May-2020  V0.982 Adding support for context_attribute_values and search path schema extensions.
 13-May-2020  V0.983 Adjustments to suppress materialization of an assembly of deposited coordinates
 15-May-2020  V0.984 Adjustments to suppress logging of missing assembly data
 16-May-2020  V0.985 Adding test case for branched entity replace test
 18-May-2020  V0.986 More complete object purge for loadType replace, explicitly filter heavy atoms in counting stats.
 19-May-2020  V0.987 Reduce verbose logging of memory usage and use alternative $regex filter for mongo purging
 21-May-2020  V0.988 Added additional methods to export search group and context statistics
 22-May-2020  V0.989 Adding auth to entity polymer sequence residue mapping item.
  3-Jun-2020  V0.990 Updating dependencies for rcsb.utils.validation and validation test conditions.
  4-Jun-2020  V0.991 Adding new flag for prerelease sequence availability
  5-Jun-2020  V0.992 Adding support for new enum DDL attributes.
 12-Jun-2020  V0.993 Adjustments to the labeling of site, occupancy and oberserved features
 17-Jun-2020  V0.994 Adjustments rcsb_repository_holdings_unreleased_entry collection
 18-Jun-2020  V0.995 Adding entity taxonomy count.
 25-Jun-2020  V0.996 Adding schema validation to automatically follow on from any load failures in mongo/PdbxLoader()
 25-Jun-2020  V0.997 Update dependencies
 30-Jun-2020  V0.998 Add partial reload of failed object salvage path
 10-Jul-2020  V0.999 Adding support for derivative branched entity methods
 15-Jul-2020  V1.100 Adding methods to export water counting statistics.
 28-Jul-2020  V1.200 Add stash server config options to example configuration file
 28-Jul-2020  V1.300 Tweak stash configuration
 30-Jul-2020  V1.400 Add PubChem and Pharos mapping
  5-Aug-2020  V1.500 Add carbohydrate content to rcsb_chem_comp_synonyms and rcsb_chem_comp_annotation
  8-Aug-2020  V1.610 Limit SIFTS summary data cached
 15-Aug-2020  V1.620 Update dependency version for CATH backup
 21-Aug-2020  V1.630 Turn on the carbohydrate type options.
 26-Aug-2020  V1.631 Adjust polymer type counting -
 29-Aug-2020  V1.632 Add fix for missing data collection resolution
 30-Aug-2020  V1.633 Fix issue with subcategory aggregates with unit cardinality
 10-Sep-2020  V1.634 Trap missing data type exception in schema construction
 24-Sep-2020  V1.635 Use alternative mapping from pdbx_poly_seq_scheme
  1-Oct-2020  V1.636 Change handling of chemical component synonyms
  7-Oct-2020  V1.637 Update sequence cluster data and config path
 12-Oct-2020  V1.638 Update modeled residue counts and add option to use selective internal enumerations.
 13-Oct-2020  V1.639 Update example data and configuration
 13-Oct-2020  V1.640 Cast examples in schema, adjust handling of internal enumeration fallbacks
 21-Oct-2020  V1.641 Add DrugBank brand names to rcsb_chem_comp_synonyms
 23-Oct-2020  V1.642 Add rcsb_repository_holdings_combined_entry collection
 24-Oct-2020  V1.643 Incorporate obsolete/supersede item in rcsb_repository_holdings_combined_entry
 28-Oct-2020  V1.644 Make reference alignment options accessible from the configuration file
 31-Oct-2030  V1.645 Add taxonomy identifier salvage method
  1-Nov-2030  V1.646 Extend taxonomy identifier salvage method