Sitting down for an interview is always intimidating, especially if you’re applying for a job in a highly technical field like data modeling. Fortunately, you won’t be the first person to go through this process. We’ve come up with this list of 20 interview questions that you might be asked when you apply for a data modeling job.
While the exact questions you’ll be asked are dependent on the specific role and hiring manager, studying the answers to the questions should help you be prepared.
What is data modeling?
First, how do you define a data modeling job? Essentially, it’s a component of software engineering in which specific methods are applied to process and analyze data.
Data modeling is used in a variety of industries to understand data related to business operations, supply chains, vendor and partner relationships, and more.
Typically, a data modeler will perform some business analysis to understand how the different components of a business are related, with the goal of developing a software representation of this real-world business logic.
This may require several iterations to iron out discrepancies or inefficiencies, or may subsequently require additional updates to reflect newly created business logic.
Here are 20 data modeling interview questions you should know:
1) What are the three levels of data modeling?
A: Data modeling includes three main levels: conceptual, physical, and logical. Physical models are those that describe the physical structure of a data set. For example, it may display the data tables, data columns, and the relationships between them.
Conceptual models are the “high-level” models that are used to communicate with the stakeholder about the concepts that the completed model will display. They can be used as a stepping stone to the development of a physical model.
Finally, logical models are representations of the specific requirements of a business, either in whole or in part.
2) What is metadata in a data model?
A: Reports can be based on the complete data model or on parts of it. Data that relates to an object in the model is referred to as metadata.
3) How does reverse engineering work in data modeling?
A: Reverse engineering takes place when you build a model from an existing script or database.
If a company has an existing but outdated software solution that was built to represent their business model, it might be more efficient for a data modeler to attempt to reconstruct business logic from the existing solution, workshop this with the company, and make tweaks that can be applied to the current solution, rather than to develop a new solution from scratch.
Many data modeling programs have tools you can use to reverse engineer a database into a model in cases like this.
4) What are the two types of schemas and do they differ?
A: The two schemas are star schema and snowflake schema. A star schema looks like a star, with a fact table in the middle and dimension tables radiating out of it. The snowflake schema looks similar, but some of the dimensions are normalized, which means it can use space more efficiently than a star schema.
For example, an entity that contains fields for postal addresses (street address, city, region, state, country), might choose to switch from a star schema to a snowflake by breaking out the state and country into separate entities, so that each country is only explicitly declared once, and is then indirectly referenced in all appropriate places.
5) What are the three main tenets of a data model?
A: The three tenants of a data model are: entity, attribution, and relationship.
- Entity refers to a physical entity (such as a customer or product)
- Attribute refers to the qualities or characteristics of an entity (such as the name and price of a product)
- Relationship refers to the connection between two entities (such as a sale)
6) What are the five dimensions in data modeling?
A: The five dimensions in data modeling are: conformed dimension, junk dimension, role playing dimension, slowly changing dimension, and degenerated dimension.
Conformed dimensions are those that are used across multiple areas, for example, with several tables in a database.
Junk dimensions are those that aren’t otherwise found in a fact table or dimension table, such as Y or N flags or other indicators.
Role playing dimensions are those that are used in a single database, but that have more than one role, such as a date that is linked to more than one key in the table.
Slowly changing dimensions are those in which the attributes change over time, and include dimensions where data is replaced, preserved, or preserved separately. These include Type-0 through Type-4.
Degenerated dimensions are those that don’t have their own dimension tables, but are depicted as a primary key in the fact table.
7) Can you give an example of different types of slowly changing dimension?
A: A Type-0 dimension is one in which an attribute doesn’t change over time, such as a customer’s date of birth.
A Type-1 dimension is one in which the attribute’s value is completely replaced, such as updating a customer’s mailing address without saving the previous one.
A Type-2 dimension is one in which the prior value is saved, such as when a company keeps both a customer’s current and former addresses in a table.
A Type-3 dimension is one in which a limited history is saved, such as a company that keeps only one prior address, instead of storing all of them.
A Type-4 dimension is one in which the data is preserved, but stored separately, and is not displayed in the main table.
8) What is the purpose of third normal form and how is it met?
A: Third normal form (3NF) is used to, among other things, prevent data duplication and abnormalities. Third normal form is met when all of the attributes in a database table are dependent only on the primary key. For example, a database in which a patient’s entry in a table contains their doctor’s phone number does not meet 3NF standards. This is because the doctor’s number should be contained in a table for doctors, rather than patients, to prevent duplication if more than one patient appears in the database.
9) What is the risk of using a recursive relationship?
Recursive relationships are those in which an entity has a relationship with itself. As an example, an employee at a company may act as both a subordinate and a supervisor in relation to other employees at the same company. Therefore, a table of employees may include a recursive relationship, since it includes both subordinates and supervisors.
Recursive relationships are more complicated than other relationships and may need to be handled differently when displaying them in a schema.
10) What does OLTP data modeling refer to?
A: OLTP refers to online transaction processing and is used to create data models that include bank transactions, online purchases, trading transactions, and more.
Hiring qualified data modelers can be difficult. Learn how Hunt Club can help.
11) What is normalization and denormalization?
A: Normalization refers to putting data into first, second, or third normal form, which are used to reduce redundancy in a database. Sometimes, however, redundancy is needed in order to access certain information more quickly. Denormalization is the practice by which you add redundant data back into the dataset for speedier access.
12) What is the purpose of a primary key constraint?
A primary key is the smallest and most discrete piece of information that’s required to uniquely identify a row within an entity. This sort of constraint is placed on a particular attribute of an entity to mandate that this attribute be populated, and that it be unique within the entity. Some examples of a primary key might be an employee number, a bank account number, or a drivers’ license number.
13) What is a foreign key?
When two entities are related to each other, the primary key of one is usually referenced in the attribute of another to enforce this relationship. As an example, you might have a table of employees that contains their contact details, and with their employee number as the primary key. If you had also had a table that describes when employees were going on leave, these employees would be referred to by their employee number, but that wouldn’t be the primary key for that table. It would be the foreign key used to get additional information from the employee table if necessary.
14) What is a composite primary or foreign key constraint?
Like a single-attribute primary key, a composite primary key is the smallest and most discrete information required to uniquely identify a row in an entity. A composite key is different in that it consists of multiple attributes within that entity.
For example, an entity that contained people’s first names and last names might require a composite primary key consisting of both first name and last name, as each of these attributes alone may be insufficient to disambiguate two rows.
Like a regular foreign key, a composite foreign key is used to refer from one entity to another. In the case of a composite foreign key, however, it consists of multiple columns or attributes, rather than just one.
15) What are subtype and supertype entities, including examples?
It might be possible to break certain entities into different types or categories. An entity that contains vehicles might be considered a super-type entity that can be broken into subtype entities such as car, bike, boat, etc. Alternatively, an employee entity might have subtypes divided into full-time or casual employees.
16) What is a surrogate key?
Sometimes, if a data model doesn’t have a suitable candidate for a primary key, then a surrogate key is chosen as a substitute. A surrogate key is an artificial key created by the database modelers to ensure that rows within an entity can be uniquely queried efficiently and effectively within SQL queries.
If a business needed an entity that listed employees, but didn’t already have employee numbers, a database modeler might create an automatically generated “employee ID” column within the entity to use as the primary key. This column is a surrogate key.
17) What is cardinality?
Relational databases can have several different types of relationships between entities. These relationships can be one to one, one to many, or many to many. A database that has high cardinality has a large percentage of unique values within its entities, while a low cardinality indicates there is more duplicated information within the database.
18) What is a unique constraint?
A unique constraint is a limitation imposed upon the attribute of an entity to ensure there are no duplicates values within that attribute across all rows within the entity. However, an attribute with a unique constraint is not necessarily prohibited from being NULL.
19) What is a sequence?
A sequence is a database object used to uniquely generate a number. Sequences are often used to generate values for surrogate keys within an entity.
20) What are identifying and non-identifying relationships?
Most data models include parent tables and child tables that are linked by relationship lines. A row or column in the parent table might be part of the child table’s primary key, and so it would be an identifying relationship and shown by a thick connecting line.
If the row or column in the parent table is not found in the child table’s primary key, it is considered a non-identifying relationship and shown by a dotted line instead.
Find the data modeler you’ve been hoping for.
Experienced data modelers can be hard to come by, especially since the most talented ones may already have a job and aren’t actively looking for a new one.
That’s where Hunt Club comes in. With a network of over five million passive candidates vouched for by over 10,000 experts, we can help you find the perfect data modeler for your company. We use proprietary technology to narrow down the candidate pool to a shortlist of five to seven top candidates for you to interview.
Whether you’re looking to hire a single candidate, or want to fill your candidate pipeline for the long-term, reach out to Hunt Club today to start your search!
Recommended Hiring Resources