by Robert Buda | Apr 27, 2021 | NoSQL
Since I first blogged about NoSQL back in 2013, the use of “Not Only SQL” databases for mission-critical applications has proliferated across businesses of all sizes and sectors. With trends like big data analytics, cloud, mobile and IoT driving the modern digital enterprise, requirements for greater application performance and scalability combined with falling storage costs increasingly bias the playing field toward non-relational databases.
Is relational database (RDBMS) technology still the “reigning data champion”? Yes, but the gap has closed significantly. NoSQL’s more flexible data storage formats are helping to solve a range of problems that traditional relational databases struggle to address.
What are the top NoSQL use cases these days? Here are five of the most prevalent:
1. Real-time/Near Real-time Big Data Processing
The faster a company can process and act on fresh data, the greater its operational efficiency and business agility, and the greater the bottom-line value of its data. A typical approach to real-time big data processing uses stream processing to ingest new data combined with Apache Hadoop for analyzing historical data, plus a NoSQL database that integrates with both.
Payments processing leader PayPal is a prime example of a leading digital enterprise that processes big data on-the-fly and leverages it in multiple ways. PayPal captures vast quantities of raw clickstream data—more than 20 TB per day!—in multiple formats by processing it through Hadoop and Apache HBase NoSQL databases, and storing it all in the cloud for worldwide access by business analysts and data scientists. Fraud detection, data mining, customer segmentation and delivering personalized ads to customers are just some of the differentiating capabilities that PayPal has built on NoSQL.
2. Internet of Things
As of 2021, it’s estimated that about 46 billion IoT devices, from smartphones to home appliances to healthcare systems to factory sensors to smart vehicles, are now connected to the Internet. The amount of semi-structured data these devices continuously generate adds up to something like 847 zettabytes.
NoSQL databases are better suited than their relational counterparts to scale out to ingest this endless fire hose of diverse data. Freshub, a smart kitchen web platform for food shopping, is one example among many of an application that successfully processes data from huge numbers of IP-connected appliances. The Freshub solution maintains a MongoDB NoSQL cloud database of over 1 million grocery products gleaned from online catalogs in real-time. In this use case, NoSQL is well suited to integrate diverse and unpredictable data schemas from all these sources. NoSQL also scales out horizontally across an arbitrary number of cloud database nodes as Freshub grows its customer and data footprint.
3. Content Management
Online shopping now surpasses brick-and-mortar sales, and “content is king” across thousands of online marketplaces and web storefronts. Online sales leaders curate a selection of multimedia content (including user-generated and social media content like reviews, photos and videos) and deliver it to shoppers “at the moment of interaction.”
NoSQL document databases offer a flexible, open-ended data model that is ideal for storing a mix of structured, semi-structured and/or unstructured content. NoSQL also makes it possible to aggregate data that serves multiple business applications within a single catalog database. Whereas RDBMS with its fixed data models tend to result in the proliferation of multiple, overlapping catalogs for different purposes.
Forbes.com, which lives on viewership and ad revenue, exemplifies the use of NoSQL technology for content management. Forbes quickly built a custom content management system based on MongoDB in just a few months, giving them greater agility—including the ability to incorporate contributor content and analyze social sharing within clickstream data—at a lower cost. The same data store also feeds their mobile site, which now gets 50% of their total traffic.
4. Mobile Apps With Huge Numbers Of Users
Mobile phone and tablet use recently surpassed desktops as the top online platform for searching, shopping and otherwise viewing web content. Interestingly, as much of 90% of mobile data is served via apps and only 10% through browsers, an overwhelming shift in recent years.
Rapidly scaling mobile apps globally to serve tens of millions of users with acceptable performance (think mobile gaming or popular social media apps) often calls for distributed databases, which in turn calls for NoSQL. Flexible NoSQL data models also support rapid app update cycles better than relational data models in many cases.
For these reasons, more and more businesses looking to monetize web content are using NoSQL data stores for their apps. A popular case in point is The Weather Channel, whose MongoDB database instance handles millions of requests per minute while also processing user data and juggling weather updates from tens of thousands of worldwide locations.
5. Enriching The Digital Customer Experience
An engaging differentiating digital customer experience is built on data-intensive, time-critical capabilities like personalization, user profile management and a unified view of the customer across all your touch points. A lot of this demographic, behavioral and logistical data comes from the online clickstream, creating a write-intensive, multi-schema workload that taxes “scale-up” RDBMS infrastructure. A distributed NoSQL database can scale more cost-effectively, manage an ever-growing number of attributes with less administrative hassle, and often delivers lower latency—the Holy Grail of online interactions where you’re trying to get ads, recommendations, coupons, etc. in front of users in real-time.
Multimedia service provider Comcast uses a Couchbase NoSQL platform to deliver a positive customer support experience across multiple lines of business. A core goal of the platform is to capture data from huge numbers of omnichannel interactions (phone calls, online help, chatbots, etc.) and relate it all back to individual customers’ accounts and service status. Scalability and resilience are also critical concerns, as in any customer experience scenario. Especially because the better your solution works, the more customers will use it.
Other NoSQL Use Cases & Conclusion
- Real-time updates and queries
- Discussion thread hierarchy
- Data caching and archiving
- Simple data collection and analysis functions associated with voting and surveys
- Cross over data analysis that cannot be conducted in relational environments
- Online gaming where numerous simple queries need to run in fractions of a second
As these current NoSQL use cases illustrate, the strengths that I highlighted back in 2013, like flexible data models, low latency, ease of delivery/maintenance and the ability to integrate structured, semi-structured and unstructured elements, continue to make NoSQL a preferred choice for “digital transformation” across industries. The ongoing success of NoSQL innovators and early adopters like Netflix, Amazon, Twitter, Facebook and AOL continues to pave the way for new solutions.
Wondering if NoSQL technology is right for your application, how to architect a new NoSQL solution or how to move your current RDBMS to a NoSQL alternative? Contact Buda Consulting for a 15-minute “database discussion” to explore whether we can help.
by Robert Buda | May 12, 2020 | Database Architecture, Oracle, SQL Server
What is a schema as opposed to a database as opposed to an instance? And how do schemas differ between Oracle and MSSQL? Or between Oracle and PostgreSQL? Or between PostgreSQL and MongoDB?
These terms can be confusing, but they are very important when planning a database architecture. So let’s define these terms and discuss conceptually how they are similar and how they differ between database software implementations. This post will focus mostly on schemas, with some references to the other terms for context.
What is a database ?
A database is the collection of database files that contain the data being stored. These files hold both the user data and the metadata (data dictionary) that the database needs to make sense of the user data. The metadata includes the schema definitions (where applicable) as described below.
What is a database instance?
A database instance is the collection of all of the database software processes plus any memory structures required by those processes, plus the database files where the database data is stored. (See diagram)
The different software vendors treat the relationship between databases and instances in different ways.
Oracle supports one database per instance unless you are working with 12c and above and using Oracle Multitenant.
PostgreSQL supports multiple databases per instance. Some system catalogs are shared across all databases in an instance.
MSSQL supports multiple databases per instance. Each instance has a set of system databases that are shared across all databases served by that instance.
MongoDB supports multiple databases per instance.
What is a schema?
The concept of a schema can be a little confusing because there are three different relevant uses of the word “schema” in the context of an IT project.
-
- Mirriam Webster defines a schema as “a structured framework or plan, an outline.”
- In the realm of database technology, a schema means a structural definition of the data that you are storing. This essentially defines the datatypes of the data you are storing, and the organization of that data (into tables, documents, indexes and constraints, etc). This can be expressed in the form of a diagram such as an entity relationship diagram (ERD), or in a set of data dictionary language (DDL) statements, or in a JSON object.
- Some database vendors have extended the concept of a schema to include not just a definition of the structure of a set of data, but also a particular collection of objects that contain the data (tables, etc), and even the data itself. This is sometimes a named collection and is typically based on one of these factors:
- Who owns the objects (a database user)
- Who should have access to the objects (e.g., a database role that may be assigned to users)
- What the objects are used for (e.g., all objects for a given application or function within an application)
When implemented in this fashion, a schema can also be thought of as a namespace. An object can have the same name in two different schemas and the two objects will be distinct from each other.
It is interesting to note that MongoDB, which is a document database as opposed to a relational database, is sometimes called a schema-less database. MongoDB also has the concept of a schema, but it is purely a description of the structure of the data, more like definition 2 above than 3. A MongoDB schema does not represent the actual instance of the data, as it does with the relational databases mentioned.
To summarize, within the context of database management software, a schema is either a set of objects that contain data that is related in some logical way (user, access, application), or simply a definition of the structure of data.
Examples
Here are some example of schemas (see the diagram below):
-
- Schema JSMITH: A schema that contains all of the tables that belong to user Jsmith. This schema would typically simply be named the same as the user, and is often created automatically when that database user account is created. When the user connects to the database, this will typically be his default schema. So any objects that he creates will automatically be part of that schema. When he issues a query, unless he specifies a schema name as part of the name of the object he is querying, or changes his schema search path (this is done differently by each database vendor), the result set will come from the object by that name that exists in his default schema.
- Schema PAYROLL: A schema that contains all of the tables for the payroll application. This schema would typically be named for the application or a functional area within the application. When accessing data from the Payroll schema, users will need to either set their schema search path to the Payroll schema, or prefix all object names in the query with PAYROLL.
- Schema DBO: This built-in schema in MSSQL is the default for all users unless otherwise specified. In many SQL Server databases, almost all objects end up here. This is similar to the public schema in PosgtreSQL.
How does a schema differ between database vendors?
MSSQL and PostgreSQL have an actual object in the database called a schema. You can create and drop a schema, and you can assign access rights and ownership to a schema as a whole. In these environments, there is a loose connection between a schema and a database user. A schema may be owned by a database user. But a database user does not have to own any schemas. A schema may also be owned by a role instead of an individual user. If you want to drop a user that owns a schema, they can and must first transfer ownership of the schema to another user.
Oracle has the concept of a schema but it does not really have an object in the database called a schema. It is more conceptual. In Oracle, each database user may be an owner of objects, and the collection of objects owned by a given user is considered a schema. If a database user is dropped, all objects owned by that user (in that user’s schema) must be dropped first. A schema in Oracle does not exist independently of a database user. There is a command in Oracle called Create Schema that is essentially a wrapper that lets you create a database user and a set of objects to be owned by that user all at once. The Drop Schema command is really the same thing as Drop User. One cannot transfer ownership of an object from one user to another. The new user would need to recreate the object. (A CTAS query may be helpful here).
MongoDB, as mentioned earlier, uses the concept of a schema in database design and in the validation of the structure of incoming data. (Nice blog on this here.) But there is no object in the database known as a schema.
I hope this post helps pull together the concept of a schema and the way the different vendors have implemented schemas.
Please comment with any questions or examples that you think might be helpful, including for database vendors that are not listed here. Also, if you disagree with the way I defined schema, please let me know how you see it.
To talk over any questions you may have around schemas or database architecture in general, contact Buda Consulting.
by Robert Buda | Jul 9, 2010 | Database, MySQL
We are currently working on an application that uses a mysql database. The application produces a large number of csv files that must be kept around for a few weeks, and then removed after being used in various ways by the users of the system.
We need to keep track of all of these files and present a list of the files to the user so they can download them to their local machine, or send them to other systems as needed. We need to track when the file was created, who created it, and other information.
We considered a few alternatives for managing these files; We could keep then on the file system and have scripts that keep the database synchoronized with the file system. Or we could load each file into a blob type and keep it in the mysql database. Either of these options will work fine if we are careful. But we are seeking something more.
Our research led us to MongoDB. An open source document oriented database.
MongoDB claims to bridge the gap between key-value stores which are typically used by high volume web applications that have simple data storage requirements, and traditional RDBMS database systems.
In addition to simple queries based on key-value pairs, you can use boolean expressions, regular expressions, set operations, and sorting and limiting. MongoDB also has the ability to grab documents directly from the file system using its gridFS functionality.
We have not implemented MongoDB for this project yet but it looks very promising.
I will send updates as we get closer.