By Vítor Betmann Calaça
Video overview: https://youtu.be/93zT3HXG7qA
Purpose:
- The INTERNSHIP DATABASE is inspired by an internship course from the Cinema & Media Arts department that I took while at university. The course aimed at helping students find internships in Canada (which may be in person, remote, or hybrid) and grades them based on their performance during the internship.
- The database manages and stores data on which students obtained internships, at which companies, who supervised them, how the students performed, and each student’s feedback regarding the internship experience.
The scope of this database includes:
- Students, Companies, and Supervisors, each containing basic identifying information.
- Internships, containing the student, supervisor, student position, and the start and end dates of the internship.
- Student Feedback and Supervisor Feedback, containing scores on how the student performed (for course grading) and the student’s opinion of the internship, respectively.
Beyond the scope of this database are:
- The actual grading of students.
- Any non-essential student information for this course, such as tuition and what other courses they have enrolled in.
- Any legal information or documentation related to legal processes regarding the university–company relationship.
Supported by the database:
- CRUD operations for students, supervisors, and companies.
- Tracking a student’s internship history with data on supervisor, company, position, and start and end dates.
- Providing a way for supervisors to grade students’ performances during the internship, as well as a way for the student to grade their own internship experience.
This database is intended to be used as a historical record of internship experiences. Using it to infer “present time” information is unreliable. Beyond its scope are:
- Using it to search for current open internships.
- Inferring any present day information (such as if a company is still operating or if a supervisor still works there).
Entities are captured in SQLite tables with the following schema.
The database includes the following entities:
The students table includes:
id, which specifies the unique ID for the student as anINTEGER. This column thus has thePRIMARY KEYconstraint applied.student_number, which specifies the student’s unique identification number for the purposes of anything related to the university asTEXT, to avoid issues with student numbers that start with0(as all student numbers must be9digits long, and aCHECKconstraint ensures that). Due to it being associated with a single student, it has theUNIQUEconstraint.first_name, which specifies the student's first name asTEXT, as it is appropriate for string fields (like a name).last_name, which specifies the student's last name asTEXT, for the same reason asfirst_name.email, which specifies the student's university email asTEXT, for the same reason asfirst_name. As with thestudent_number, each student has their own email, therefore, this field has theUNIQUEconstraint applied.
All fields are required and thus NOT NULL.
The companies table includes:
id, which specifies the unique ID for the company as anINTEGER. This column thus has thePRIMARY KEYconstraint applied.name, which specifies the official name of the company asTEXT, as it is appropriate for string fields (like a name). To avoid duplicates, theUNIQUEconstraint is applied.industry, which specifies the industry in which the company operates asTEXTdue to the same reason asname(although a MySQLSETwould work better for companies like Sony, which act in film and animation, but in this database it is assumed each company only spans a single industry). It has aCHECKconstraint (as it's meant for Cinema & Media Arts students) to ensure the company is under thefilm and tv,radio,animation, orjournalismindustries. Although this list is not comprehensive, it should limit students' options to companies related to their degree.city, which specifies the city at which the company is located asTEXTdue to the same reason asname.province, which specifies the province at which the company is located asTEXTdue to the same reason asname(although a MySQLENUMwould work better). This field has aCHECKconstraint for all the two-letter codes of Canada's provinces.blacklisted, which specifies whether a company has been blacklisted or not (i.e. students should avoid it) asNUMERIC. This field works as a soft delete, as when a company is blacklisted (i.e. when a statement tries to delete it from the database),CURRENT_DATEfills the column. From this, it can be inferred that if this column is notNULL, then this company has been blacklisted, and the date in the field states when that happened.
Except for blacklisted, all fields are required and thus NOT NULL.
The supervisors table includes:
id, which specifies the unique ID for the supervisor as anINTEGER. This column thus has thePRIMARY KEYconstraint applied.first_name, which specifies the supervisor's first name asTEXT, as it is appropriate for string fields (like a name).last_name, which specifies the supervisor's last name asTEXTdue to the same reason asfirst_name.company_id, which is the ID of the company at which the supervisor works as anINTEGER. This column thus has theFOREIGN KEYconstraint applied, referencing theidcolumn in thecompaniestable to ensure data integrity.position, which specifies the job position the supervisor holds at the company they work at asTEXT, due to the same reason asfirst_name.phone_number, which specifies the personal or work phone number for quickly reaching the supervisor asTEXTas the priority here is readability, although havingphone_numberstored as anINTEGERwould also work under other circumstances. To avoid duplicates, this field has theUNIQUEconstraint applied.email, which specifies the work email of the supervisor asTEXT, for the same reason asfirst_name. As with thephone_number, each company employee has their own email, and therefore theUNIQUEconstraint is applied.
All fields are required and thus NOT NULL.
The internships table includes:
id, which specifies the unique ID for the internship as anINTEGER. This column thus has thePRIMARY KEYconstraint applied.student_id, which is the ID of the student interning at a certain company as anINTEGER. This column thus has theFOREIGN KEYconstraint applied, referencing theidcolumn in thestudentstable to ensure data integrity.supervisor_id, which is the ID of the supervisor supervising the student as anINTEGER. This column thus has theFOREIGN KEYconstraint applied, referencing theidcolumn in thesupervisorstable to ensure data integrity.student_position, which specifies the job position the student holds at the company they intern at asTEXT, as it is appropriate for string fields (like a job position).work_type, which specifies with aCHECKconstraint whether the internship isin person,remote, orhybridasTEXTdue to the same reason asstudent_position.start_date, which specifies when the internship started asNUMERIC, as that is a good type affinity to represent dates. It defaults toCURRENT_DATE.end_date, which specifies when the internship ended asNUMERICdue to the same reason asstart_date. ACHECKconstraint ensuresend_dateis eitherNULLor greater thanstart_date.
Except for end_date, all fields are required and thus NOT NULL.
The supervisor_feedback table includes:
internship_id, which specifies the unique ID for the supervisor feedback as anINTEGER. This column thus has thePRIMARY KEYconstraint applied, and also theFOREIGN KEYconstraint applied, referencing theidcolumn in theinternshipstable to ensure data integrity and save on memory.initiative_score, which is the initiative score the supervisor would give the student.quality_of_work_score, which is the quality of work score the supervisor would give the student.productivity_score, which is the productivity score the supervisor would give the student.reliability_score, which is the reliability score the supervisor would give the student.comment, which contains the comments of the supervisor regarding the student's performance asTEXT, as it is appropriate for long strings or paragraphs. It is not required, and therefore it can beNULL.
All [something]_score columns are required, therefore NOT NULL, and have a CHECK constraint to be between 0 and 5.
The student_feedback table includes:
internship_id, which specifies the unique ID for the student feedback as anINTEGER. This column thus has thePRIMARY KEYconstraint applied, and also has theFOREIGN KEYconstraint applied, referencing theidcolumn in theinternshipstable to ensure data integrity and save on memory.internship_score, which is the score the student would give the internship experience. It is required, thereforeNOT NULL, and has aCHECKto beBETWEEN 0 and 5.comment, which contains the comments of the student regarding the internship experience asTEXT, as it is appropriate for long strings or paragraphs. It is not required, therefore it can beNULL.
Below, find the ER diagram alongside the explanation of each relationship.
- Each student can acquire
0 or moreinternships (although, if they get no internship, it is fair to assume they will fail the course). - Each internship should be associated with exactly
1student and1supervisor. - Each supervisor can supervise
1 or moreinternships simultaneously. - A supervisor can work for
1company, but each company can have1 or moresupervisors. This database does not take into account the possibility of a supervisor quitting and leaving a company with0supervisors, and it also assumes that, in order for a company to be inserted, the company must have at least1supervisor. - Each student and supervisor should write
1feedback entry for each internship they are related to. This design decision prevents bloating the database with multiple comments for the same internship.
INDEXes
- While
student_numberwas automatically indexed by SQLite as aCOVERING INDEX,idx_supervisor_emailis aCOVERING INDEXcreated to speed up suprevisor searches. This database expects most searches will be made with the former two identifiers in mind over people's names, as people with the same name would not produce accurate query results. idx_blacklistis an index of all blacklisted companies. It has been created instead of a list of whitelisted companies because it is expected that there will be significantly less blacklisted companies over whitelisted ones (e.g., 5% of all companies are blacklisted). When querying for all whitelisted companies, SQLite will still have to scan all of thecompaniestable, but for quick glances at which companies are blacklisted, this index speeds up that search and does not require as much memory as an index for whitelisted companies would.- Many common queries rely on the student number (or on finding the student ID through the student number), which relates students to a given supervisor through the
internshipstable. Examples include finding the supervisor feedback for a given student or finding the contact information of a student’s supervisor.supervisor_idandstudent_id(from theinternshipstable) are also used in manyJOINclauses, such as joins to find the top five companies most likely to hire remotely or to find all students working in person. This is enough to justify the creation of theidx_student_id,idx_supervisor_id,idx_work_type, andidx_start_dateindexes (from theinternshipstable) to speed up queries. Checks withEXPLAIN QUERY PLANsupport this decision. - The
idx_internship_id_supervisor_feedbackindex (from thesupervisor_feedbacktable) was also created to speed up queries related to a supervisor's evaluation of a student's internship. As a result, these will also speed up the professor's work when grading.
VIEWs
companies_whitelistandcompanies_blacklistprovide quick ways to filter whitelisted and blacklisted companies when using SELECT statements.companies_by_hire_amountandcompanies_by_scoreallow for quick overviews of “companies to maintain a good relationship with,” which, for example, could indicate the top companies to contact for a partnership in the following year.
TRIGGERs
blacklist_companyperforms a soft delete on a company. Instead of deleting a company’s entry from the database, it fills theblacklistedcolumn with theCURRENT_DATEand prevents the company from appearing in thecompanies_whitelistVIEW.
- This database assumes all companies are located in Canada, although adding columns to support other countries or more detailed address information (such as a ZIP code) should be easy.
- This database does not support any type of documentation as proof of what a student did (such as a PDF document proving the student was hired by a certain company). That's not in the database's scope, though.
- As the database is targeted toward Cinema & Media Arts students, it is easy to keep track of the industry each company belongs to (
film and tv,radio,journalism, andanimationso far). However, if this list grows larger, creating a separate table to hold all industry types would be ideal. As is, the design is not very scalable without harming clarity.
