What is a schema as opposed to a database as opposed to an instance? And how do schemas differ between Oracle and MSSQL? Or between Oracle and PostgreSQL? Or between PostgreSQL and MongoDB?
These terms can be confusing, but they are very important when planning a database architecture. So let’s define these terms and discuss conceptually how they are similar and how they differ between database software implementations. This post will focus mostly on schemas, with some references to the other terms for context.
What is a database ?
A database is the collection of database files that contain the data being stored. These files hold both the user data and the metadata (data dictionary) that the database needs to make sense of the user data. The metadata includes the schema definitions (where applicable) as described below.
What is a database instance?
A database instance is the collection of all of the database software processes plus any memory structures required by those processes, plus the database files where the database data is stored. (See diagram)
The different software vendors treat the relationship between databases and instances in different ways.
Oracle supports one database per instance unless you are working with 12c and above and using Oracle Multitenant.
PostgreSQL supports multiple databases per instance. Some system catalogs are shared across all databases in an instance.
MSSQL supports multiple databases per instance. Each instance has a set of system databases that are shared across all databases served by that instance.
MongoDB supports multiple databases per instance.
What is a schema?
The concept of a schema can be a little confusing because there are three different relevant uses of the word “schema” in the context of an IT project.
-
- Mirriam Webster defines a schema as “a structured framework or plan, an outline.”
- In the realm of database technology, a schema means a structural definition of the data that you are storing. This essentially defines the datatypes of the data you are storing, and the organization of that data (into tables, documents, indexes and constraints, etc). This can be expressed in the form of a diagram such as an entity relationship diagram (ERD), or in a set of data dictionary language (DDL) statements, or in a JSON object.
- Some database vendors have extended the concept of a schema to include not just a definition of the structure of a set of data, but also a particular collection of objects that contain the data (tables, etc), and even the data itself. This is sometimes a named collection and is typically based on one of these factors:
- Who owns the objects (a database user)
- Who should have access to the objects (e.g., a database role that may be assigned to users)
- What the objects are used for (e.g., all objects for a given application or function within an application)
When implemented in this fashion, a schema can also be thought of as a namespace. An object can have the same name in two different schemas and the two objects will be distinct from each other.
It is interesting to note that MongoDB, which is a document database as opposed to a relational database, is sometimes called a schema-less database. MongoDB also has the concept of a schema, but it is purely a description of the structure of the data, more like definition 2 above than 3. A MongoDB schema does not represent the actual instance of the data, as it does with the relational databases mentioned.
To summarize, within the context of database management software, a schema is either a set of objects that contain data that is related in some logical way (user, access, application), or simply a definition of the structure of data.
Examples
Here are some example of schemas (see the diagram below):
-
- Schema JSMITH: A schema that contains all of the tables that belong to user Jsmith. This schema would typically simply be named the same as the user, and is often created automatically when that database user account is created. When the user connects to the database, this will typically be his default schema. So any objects that he creates will automatically be part of that schema. When he issues a query, unless he specifies a schema name as part of the name of the object he is querying, or changes his schema search path (this is done differently by each database vendor), the result set will come from the object by that name that exists in his default schema.
- Schema PAYROLL: A schema that contains all of the tables for the payroll application. This schema would typically be named for the application or a functional area within the application. When accessing data from the Payroll schema, users will need to either set their schema search path to the Payroll schema, or prefix all object names in the query with PAYROLL.
- Schema DBO: This built-in schema in MSSQL is the default for all users unless otherwise specified. In many SQL Server databases, almost all objects end up here. This is similar to the public schema in PosgtreSQL.
How does a schema differ between database vendors?
MSSQL and PostgreSQL have an actual object in the database called a schema. You can create and drop a schema, and you can assign access rights and ownership to a schema as a whole. In these environments, there is a loose connection between a schema and a database user. A schema may be owned by a database user. But a database user does not have to own any schemas. A schema may also be owned by a role instead of an individual user. If you want to drop a user that owns a schema, they can and must first transfer ownership of the schema to another user.
Oracle has the concept of a schema but it does not really have an object in the database called a schema. It is more conceptual. In Oracle, each database user may be an owner of objects, and the collection of objects owned by a given user is considered a schema. If a database user is dropped, all objects owned by that user (in that user’s schema) must be dropped first. A schema in Oracle does not exist independently of a database user. There is a command in Oracle called Create Schema that is essentially a wrapper that lets you create a database user and a set of objects to be owned by that user all at once. The Drop Schema command is really the same thing as Drop User. One cannot transfer ownership of an object from one user to another. The new user would need to recreate the object. (A CTAS query may be helpful here).
MongoDB, as mentioned earlier, uses the concept of a schema in database design and in the validation of the structure of incoming data. (Nice blog on this here.) But there is no object in the database known as a schema.
I hope this post helps pull together the concept of a schema and the way the different vendors have implemented schemas.
Please comment with any questions or examples that you think might be helpful, including for database vendors that are not listed here. Also, if you disagree with the way I defined schema, please let me know how you see it.
To talk over any questions you may have around schemas or database architecture in general, contact Buda Consulting.