Technology Blog : March 2021

Database system normalization bits
https://www.youtube.com/watch?v=9YiPgZU7jQI&list=PLG8GE9-M0lR_xZ3Y42yO4lRSRR4qC5AR8
https://www.youtube.com/watch?v=eTiP-H9GQ30&list=PLmXKhU9FNesR1rSES7oLdJaNFgmuj0SYV

functional dependency
FD are used for searching
if A -> B then there is FD between A and B or B is functionally dependent on A....F(A) WILL GIVE VALUE OF B..SO FOR ONE VALUE OF A YOU CAN GET ONLY ONE VALUE OF B

A is called determinant and B is called dependent
if i tell u the value of A then we can search for corresponding value of B and you will get only one value of B
ex: a student can only register in only one stream (BTech (CSE))
for different value of A, we may get same value of B
ex : different students can register for same stream (BTech (CSE))

for one value of A if we get 2 different values of B then there is no FD between A and B

A is distinct then any keys that have A ie, AB, ABC, ABCD as determinant then all those FDs are valid
if dependent is same for all rows then FDs is valid

2 types of FD
trivial FD: AB-> A
A is subset of AB
trivial FD is used for verification
Let A= mobile number
B = name
select mobile, name from library where mob='999' and B='sarvani'
this query is used to verify whether record exists in library or not

https://www.youtube.com/watch?v=wez3fXrjBAE&list=PLmXKhU9FNesR1rSES7oLdJaNFgmuj0SYV&index=12

https://www.youtube.com/watch?v=9YiPgZU7jQI&list=PLG8GE9-M0lR_xZ3Y42yO4lRSRR4qC5AR8

closure of A is the attributes that can be search using value of A
armstrong axioms in DBMS:
1. reflexivity : Y is subset of X then X -> Y (employee marries another employee)
2. Augmentation: if X -> Y then XZ -> YZ
3. Transtivity: if X-> Y and Y-> Z then X->Z
4. Union: X-> Y and X -> Z then X->YZ (if x is id then y is dob and z is mob)
5. Decomposition : if X -> YZ then X-> Y and X->Z
6. pseudo transitivity : if X -> Y and WY -> Z then WX -> Z
7. composition : if X-> Y and Z -> W then XZ -> YW

AB-> C then we need A and B to search for C
C-> AB then using C we can search for A and B i.e, C -> A and C-> B
https://www.youtube.com/watch?v=vs65S6Nku5g&list=PLmXKhU9FNesR1rSES7oLdJaNFgmuj0SYV&index=15

Equivalence of FDs

let F:
P -> Q
Q->R
R->S

and

G:
P->QR
R->S

Take the determinants of F and compute closures in G
closure of P in G = {P, Q,R,S}
closure of Q in G = { Q}
closure of R in G = { R, S }
Check whether these dependencies exists in F...Q-> R in F is not above so F is not subset of G

Take the determinants of G and compute closures in F
closure of P in F = {P, Q,R,S}
closure of R in F = { R, S}
Check whether these dependencies exists in G. All exists in G and hence G is subset of F

Irreducible set of FD (cannonical form)
remove redundant element --
How to find Cannonical cover to remove redundancy or minimize FD?
Ans: mininze FDs so that removal of FDs will not effect the system
If presence or removal of a FD does not effect the system then the FD is redundant.
Example:
Given:
X->W
WZ->XY
Y->WXZ
Decomposing the FDs we get
X->W
WZ->X
WZ->Y
Y->W
Y->X
Y->Z

compute Closure of X ={X, W} now remove the FD X->W and again compute
closure of X ={x} and hence removing FD X ->W is effecting the system and so we cannot remove this FD

Similarly do for remaining FDs ..if you get a FD redundant remove it immediately and dont include it while checking other FDs .. we get
X->W
WZ->Y
Y->X
Y->Z

since there is only one element in RHS but WZ is there in LHS so
find closure of WZ ={W,Z,Y,X}
find closure of W ={W}
find closure of Z ={Z}

if closure of W is same as closure of WZ then Z is redundant and we can remove Z and we get W->Y
but here it is not ..hence minimal FDs are
X->W
WZ->Y
Y->XZ

for same set of FDs we may get different set of minimal sets and all are correct

Keys
any attribute or combination of attributes whose closure gives all other attributes in a table then that attribute/combination of attributes is called a super key
ex: R(A,B,C,D)
and A->BC then A is not super key as D cannot be derived from A
ex2: R(A,B,C,D)
ABC -> D
AB->CD
A->BCD
then ABC,AB and A are super keys. Minimal super key are called candidate key and here A is candidate key.
ex3: A->BCD
BCD->A
then A, BCD are both candidate keys as there is no subset of BCD that is a super key.

short cut: FIND ATTRIBUTE THAT HAS NO INCOMING EDGE and that attribute will definitely be in candidate key. if there is no such attribute, find all combinations of 1 attribute, 2 attribute and 3 attribute and so on.

ex3: R(A,B,C,D,E,F)
AB-> C
C->D
B->AE
here B and F do not have incoming edges
closure of BF={A,B,C,D,E,F} hence BF is candidate key. B,F are prime attribute and A,C,D,E are non prime attributes

ex4: R(A,B,C,D)
A->B
B->C
C->A
then AD, CD, BD are candidate keys and A,B,C,D are prime attributes

ex5: AB->CD
D->A
AB and BD are candidate keys. A,B,D are prime attribute and C is non prime attribute.

First Normal Form:
each cell must have only one value
the order or rows and columns are not important
the domain of a particular column must be same

Second Normal Form:
R(A,B,C,D)
AB->D
B->C
FIRST FIND CANDIDATE KEYS.
here AB is candidate key. A and B are prime attributes
but C is non prime attribute and is dependent on part of candidate key i.e, B->C this is called partial dependency . When in AB if B is NULL and B->C means NULL ->C value which is not valid
so decompose
R1(A,B,D) and R2(B,C)

here AB is primary key in R1 and B is primary key in R2.
A table must be in 1NF and must not have partial dependency then it is said to be in 2NF.

ex2: R(A,B,C,D,E)
AB->C
D->E
1. first find candidate keys ..here A,B,D have no incoming edges and so ABD is candidate key
2. decompose into 3 tables to avoid partial dependency
R1(A,B,D)
R2(A,B,C)
R3(D,E)

ex3:
A->B
B->E
C->D
1. first find candidate keys ..here A,C have no incoming edges and so AC is candidate key
In C->D we have partial dependency (part of candidate key -> non prime)
2. decompose into 3 tables to avoid partial dependency
R1(A,C)
R2(A,B,E)
R3(C,D)

EX3:
AB->C
AD->GH
BD->EF
A->I
H->J
1. first find candidate keys ..here A,B,D have no incoming edges and so ABD is candidate key
2. decompose into tables to avoid partial dependency
R1(A,B,D)
R2(A,B,C)
R3(A,I)
R4(A,D,G,H,J)

THIRD NORMAL FORM
partial dependency --part of candidate key i.e, prime->non prime
transitive dependency -- non prime -> non prime
or part of super key which is not key ->non-key

table will be in 3nf if it is in 2nf and no transitive dependency
(or)
shortcut: any FD A->B ..A must be super key or B must be prime..this condition must be false. if all FDs are following this then table is in 3NF.
ex: R(A,B,C,D,E)
A->B
B->E
C->D
1. first find candidate keys ..here A,C have no incoming edges and so AC is candidate key
in C->D we have partial dependency (part of candidate key -> non prime) . the table is not in 2NF
Shortcut to decompose into tables so that the resultant tables are in 3NF is
A->B ..A is not super key or B is not prime. So R1(A,B)
B->E ...B is not super key or E is not prime. So R2(B,E)
C>D ...C is not super key or D is not prime. So R3(C,D)
and R4(A,C)

Boyce Code Normal form(BCNF):
1. The table must be in 3NF and no overlapping candidate key or
any FD A->B ..A must be super key. if all FDs are following this then table is in BCNF.

Fully functionally dependent means determinant of all FDs must be candidate key or super key
When a table is given, to check whether the table is in BCNF
1. find the candidate keys
2. if you find any FD whose LHS or determinant is not superkey then it is not in BCNF

No overlapping candidate key: here prime/non prime must not determine prime in FD
ex: R(A,B,C)
AB->C
C->B
here AB and AC are candidate keys
in AB -> C AB is prime/superkey and C is prime
in C->B and C is prime and B is prime
so the table is in 3NF but not in BCNF as C is not super key
Decomposition would be: Write the problematic FD first
R1(C,B)
then we cannot have ABC here so only thing we must have is AC so that there will be lossless connection between C and so R2(A,C) and not AB

Check in which normal form the following relation is in?
R(ABCDE)
AB->CD
D->A
BC->DE

here candidate keys are AB,BD, BC are candidate keys
in AB->CD here AB is candidate key
in D-> A here D is not candidate key
hence table is not in BCNF
But A is a prime attribute so the 3NF holds
in BC->DE BC is candidate key
Hence table is in 3NF

in real time most of the databases end up with 2NF tables

LOSSLESS DECOMPOSITION or non additive decomposition:

1. R1UR2=R

2. R1 INTERSECTION R2 IS NOT EMPTY

3. CLOSURE OF (R1 INTERSECTION R2) MUST BE EITHER R1 OR R2
Normalization reduces redundancy by decomposing the tables. If in future if we combine the decomposed tables if we are able to get the original table back, then it is called lossless decomposition or non additive decomposition.
If R is divided into R1 and R2 and that decomposition is lossless iff
1. attribute(R1) UNION attribute(R2) =attributes in R
2. attribute(R1) INTERSECTION attribute(R2) IS NOT NULL that means intersection gives common attribute
3. attribute(R1) INTERSECTION attribute(R2) -> attribute (R1)
(or)
attribute(R1) INTERSECTION attribute(R2) -> attribute (R2)
that means intersection gives common attribute and common attribute must be key in R1 or R2

DEPENDENCY PRESERVING

If R is divided into R1 and R2 and that decomposition is dependency preserving iff

if F is FDs of R and F1 and F2 are FDs of R1 and R2

then F1 is subset of closure of F and F2 is subset of closure of F and

closure of F1 UNION F2 is subset of closure of F

ex: R(A,B,C,D)

AB->CD

D->A

R is decomposed into R1(A,D) and R2(B,C,D)

in R1(A,D) we get D->A

in R2(B,C,D)

find

closure of B in F = B

closure of C in F = C

closure of D in F = D,A

closure of BC in F = B,C

closure of BD in F = B,D,A,C

closure of CD in F = C,D,A

hence we get BD -> C

using D->A and BD -> C closure of AB = {A,B} and we cannot get all the FDs in R ..Hence the decomposition is not dependency preserving

spanned and unspanned mapping

the secondary memory is divided into block

and if an entire row cannot fit in a block then half of the row will be in one block and remaining will be in next block--spanned mapping--memory efficient but not time efficient --we always bother about time so not efficient

we waste the space that does not fit an entire row and cause internal fragmentation and start storing the entire row in next block

Indexing

for each row we do not have entry in index table then it is called sparse index

for each value if there is index then it is dense index

https://www.youtube.com/watch?v=O-Mbn6VI1zc&list=PLmXKhU9FNesR1rSES7oLdJaNFgmuj0SYV&index=45

primary index: main file is sorted and index is made on key attribute

no. of entries = no. of blocks in file say n

it is sparse index

size of index file is n and index file is divided into blocks and log n base 2 accesses are required to access the required block address in index file.

clustered index: main file is sorted and index is made on non-key attribute

secondary index: main file is not sorted and index is made on key/non-key

index is always sorted

Transaction management

Database has a buffer an main memory also has buffer.
the data is brought into main memory buffer and the transaction reads that data (Database copy) from main memory buffer.
the transactions share the data in main memory buffer. any subsequent writes and reads are done on main memory buffer.

when the transaction commits, then the data in main memory will be stored in main memory.

conflict serializabillity: the transactions are consistent then there is conflict serializability
2 instructions are conflicting if
1. they belong to different transactions
2. they are operating on same data
3. one of the instruction is write instruction

draw an edge for every conflict operation and if the graph has a cycle then schedule is not conflict serializable
the order in which transactions in the schedule to be executed by using topological sorting

if the schedule is not conflict serializable but there is blind write (without reading transaction is writing data) then the schedule may be view serializable

if schedule is view equivalent with serial schedule then the schedule is view serializable.
view equivalent
1. the transaction that reads a data item initially in a schedule and serial schedule must be same
2. the transaction that writes a data item finally in a schedule and serial schedule must be same
3. also intermediate reads must be same then both the schedule are view serializable.
as the other schedule is a serial schedule then the schedule in view serializable.

schedule must ensure that the schedule will take the database into consistent state

check whether a schedule is recoverable or not?

if transaction reads a value of a transaction that is not yet committed then it is called dirty read.

If there is no dirty read then the schedule is recoverable.

the transaction that reads first commits first then also schedule is recoverable

if there is dirty read irrespective of commit state, the schedule is not cascadeless recovery

and also has cascading rollback.

strict schedule: when transaction1 written a data item then no other transactions can read or write that data item until transaction1 commits .

procedural language written in relational algebra

non procedural language written in relational calculus

both relational algebra and relational calculus are written on paper and the implementation is done using SQL

Relational algebra

relational algebra query returns a table with no name

relational algebra does not give duplicate values as they use sets

no english keywords are not used

operators are two types 1. basic or functionally complete or fundamental

and 2. derived operators need not be used but used for implement query quickly

basic fundamental operators are

select

project

union

set difference

cartesian product

rename

derived operators built on basic operators?

natural join

intersection (A intersection B = (A-(A-B))

division

Technology Blog

Search This Blog

Pages

Wednesday 24 March 2021

1D and 2D arrays in java

1D arrays in java

2D arrays in java

DBMS degree college lecturers

Sunday 14 March 2021

GUI Programming Using Windows Builder in Eclipse For Absolute Beginners

AdSense

AdSense