~ Databases


Databases are boring?

if you think databases are boring and irrelevant, think again! Just about any institution you can think of probably uses a database. In fact your school./college/university has a database which contains information about you! There are all sorts of databases (massive stores of data) from DNA databases to databases that store information about plants! As you can imagine, prior to the widespread use of computers, data was held in paper form, which was a terrible nightmare in terms of storage and as you can imagine, quite an effort to copy and search! When it comes to keeping data accessible and searchable, databases are awesome!

It goes without saying that modern computers are synonymous with data. A big buzzword at the moment is "Big Data". As a computer science student you really should be clued up on what it is. Basically, there is a mind-boggling amount of data floating around our society. Physicists at CERN have been pondering how to store and share their ever more massive data for decades - stimulating globalization of the internet along the way, whilst 'solving' their big data problem. Tim Smith plots CERN's involvement with big data from fifty years ago to today.

Big Data - watch this first! It's important for your 'real world' learning!


• Big Data is a buzzword for (usually internet-based) incredibly high-volume databases such
as those run by the leaders of online industries, for example, eBay, Facebook, Google, etc.
• Big Data aims to understand the needs of consumers or other trends and target them with
advertising, which is controversial due to invasion of privacy and vulnerability to data theft
by hackers and governments overstepping boundaries.

What you will learn: Databases

(a) Relational database, flat file, primary key, foreign key, secondary key, entity relationship modelling, normalisation and indexing. (b) Methods of capturing, selecting, managing and exchanging data. (c) Normalisation to 3NF. (d) SQL – Interpret and modify. (e) Referential integrity. (f) Transaction processing, ACID (Atomicity, Consistency, Isolation, Durability), record locking and redundancy.

An interesting video on Google's data center


Topic 1

PowerPoint loading ...

Some theory essentials

Did you know:

that many apps and websites are just interfaces to databases.


• The flat file database is a simple type of a database where all data is held in a single table. It
is not suitable for large projects due to data redundancy.


• Data redundancy is a problem that not only wastes space but also results in inconsistent
data and failed searches (when databases are updated, we might miss some copies of
the same data, for example the customer’s name, and entries will look like they belong to
different customers).


• Relational databases overcome the problem of redundancy through the use of multiple
tables, creating another dimension in addition to the two dimensions of a flat file database.
Primary and foreign keys are used to link the tables, so that a query will pull the data from
multiple tables into one flat file report.


• Primary keys must be unique and every table has one. A primary key of one table might be
linked to another table, where it appears as a foreign key.


• Secondary keys are used for faster searching, and are similar to tags/hashtags used in
social networks and blogs (e.g. ‘#LOL’). They don’t have to be unique.


• Indexing saves searching by copying out the most searched-for database attributes, for
example surnames, into a separate file/table. An index table is like a table of contents in a
book, except instead of topics and page numbers it would have records of surnames and
their location on a hard disk.


• Normalisation is the process of splitting up a busy redundant flat file database into
multiple tables that form a relational database. There are various degrees of normalisation,
with the ‘un-normalised’ being the lowest and third normal form being the highest, where
all the data that can be sensibly isolated in their own table have been separated out.


• The first normal form is where no cells in a table contain multiple entries/bits of data and all
the columns can be connected to a primary key.


• The second normal form has no partial dependencies. This means that if a table contains
columns that depend on each other more than on the primary key, we can remove them to
a separate table and just leave one column as a foreign key. The rule of thumb is, if multiple
instances of a foreign key relate to many instances of a primary key (known as a many-tomany
relationship),
then the table
is not
in a second
normal form.


• The third normal form is about removing ‘transitive dependencies’. These are indirect
links, like a ‘friend of a friend’, or a ‘student’s form tutor’s department’ where students and
the form tutor’s department are linked only indirectly or ‘transitively’. We would create a
separate table for our ‘friend’ and their friends will be stored there, but not in our main
table. Transitive dependency occurs when two non-keys attributes of the same entity are
related (e.g. student’s form and form tutor’s name).


• Third normal form databases usually feature transaction tables; for example, rather than
students and courses, we will have students, enrolments (which is a transaction table as it is
likely to contain more records than other tables) and courses.


• Structured Query Language (SQL) is used to manipulate most common databases. Any
manipulation of data (including just retrieving it) is known as a ‘query’. We know select
queries (retrieve data), insert queries (add data to a database), update queries (change
values in a database) and delete queries (delete unneeded records). Additionally, we can
use SQL to create, modify and delete entire tables and their relationships.


• Queries can combine data from multiple tables, perform calculations and use criteria to
work either with all records or with just the ones fitting the criteria.


• SQL is written in plain text, is very close to English and was created for non-computer
specialists to understand. It is understood by many otherwise incompatible databases
and can be combined with other programming languages such as Python, Visual Basic or
Hypertext Preprocessor (PHP).


• Referential integrity is important for relational databases. If a related table is updated
it might not match the related data in another table, especially when foreign keys are
involved. This results in ‘orphan records’. This is similar to ending with broken links on the
internet if a site linked to has changed its URL.


• Databases hold large volumes of data while most queries work with a small portion of a
database, in small bursts of activity called transactions.


• Successful transactions have four properties, known together as ACID:
1. atomicity (every single objective has been achieved, every part worked)
2. consistency (data is not corrupted, database rules followed)
3. isolation (if multiple transactions are executed in parallel the result is identical to that
obtained if they were executed one after the other – so, they can’t affect each other’s
operation, which might involve ‘locking’ the data for one query that a parallel query is
working on)
4. durability (changes are committed and saved to disk).

Tasks

Your teacher may ask you to do one or more of the following tasks

Worksheets for this section - see the link to the left for 04-07

Learning Poster

1. Complete a learning poster (a single slide) in which all of the learning objectives for this topic are explained

*Refer to the theory PowerPoint and the "What you will learn objectives" slide.

Research Power Point

1. Complete a research powerpoint on the topic selected by your teacher using the following template.

End of topic assessment

Well done on completing this topic!

Your teacher will direct you as to which task needs to be handed in

What to submit

-learning poster

-research powerpoint

-worksheet