Object-Oriented Reengineering Patterns and Techniques
Wahyu Andhyka Kusuma, S.Kom
081233148591
Lecture 5: Problem Detection
Topics
• Metrics
  – Software quality
  – Analyzing trends
• Object-Oriented Metrics in Practice
• Duplicated code
Why use metrics in OO reengineering?
• Assessing software quality
  – Which components have poor quality? (and can therefore be reengineered)
  – Which components have good quality? (and can therefore be reverse engineered)
  → Metrics as a reengineering tool
• Controlling the reengineering process
  – Trend analysis: which components can be changed?
  – Which refactorings can be applied?
  → Metrics as a reverse engineering tool
ISO 9126 Quantitative Quality Model
• Software quality is decomposed into factors, characteristics, and metrics
• Factors (ISO 9126): Functionality, Reliability, Efficiency, Usability, Maintainability, Portability
• Characteristics: Error tolerance, Accuracy, Simplicity, Modularity, Consistency
• Metrics:
  – defect density = #defects / size
  – correction impact = #components changed
  – correction time
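The two ratio metrics above can be computed as in the sketch below. The unit of size (thousand lines of code) and the averaging of the correction impact over several fixes are assumptions made for illustration; the input numbers are hypothetical.

```python
def defect_density(defects: int, size_kloc: float) -> float:
    """defect density = #defects / size (size assumed to be in KLOC)."""
    if size_kloc <= 0:
        raise ValueError("size must be positive")
    return defects / size_kloc


def correction_impact(components_changed_per_fix: list[int]) -> float:
    """Average #components changed per correction (averaging is an assumption)."""
    return sum(components_changed_per_fix) / len(components_changed_per_fix)


# Hypothetical project data.
density = defect_density(defects=30, size_kloc=12.0)
impact = correction_impact([1, 3, 2, 2])
print(density, impact)
```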
Product & Process Attributes
Product attribute
• Definition: measures an aspect of the artifacts delivered to the customer
• Example: number of system defects, learnability of the system
Process attribute
• Definition: measures an aspect of the process that produces the product
• Example: time to fix a defect, number of components changed per fix
External & Internal Attributes
External attribute
• Definition: measures how the product/process behaves in its environment
• Example: mean time between failures, #components changed
Internal attribute
• Definition: measured purely in terms of the product, separate from its behaviour in context
• Example: class coupling and cohesion, method size
External vs. Internal Product Attributes
External attributes
• Advantage:
  – Close relationship with quality factors
• Disadvantages:
  – Can be measured only after the product is in use
  – Data collection is difficult and often involves user intervention
  – Relating an external effect to its internal cause is very difficult
Internal attributes
• Disadvantage:
  – Relationship with quality factors is not empirically validated
• Advantages:
  – Can be measured at any time
  – Data collection is easy and can be automated
  – Direct relationship between the measurement and its cause
Metrics and Measurement
• Weyuker [1988] defines nine properties that a software metric should satisfy
• For OO, only 6 of these properties are really important [Chidamber 94, Fenton & Pfleeger]
  – Non-coarseness
    • Given a class P and a metric m, another class Q can always be found such that m(P) ≠ m(Q)
    • Not every class has the same value for the metric
  – Non-uniqueness
    • There can be distinct classes P and Q such that m(P) = m(Q)
    • Two classes can have the same metric value
  – Monotonicity
    • m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the “combination” of classes P and Q
Metrics and Measurement (continued)
  – Design details are important
    • The specifics of a class must influence the metric value. Even when classes perform the same actions, their details should affect the metric value.
  – Nonequivalence of interaction
    • m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R is a class that interacts with them
  – Interaction increases complexity
    • m(P) + m(Q) < m(P+Q)
    • When two classes are combined, the interaction between them also increases the metric value
• Conclusion: not every measurement is a metric
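Some of these properties can be checked mechanically for a candidate metric. The sketch below tests "number of methods" (NOM) on a toy model in which a class is a set of method names and P+Q is the union of the two sets. Both the model and the sample classes are assumptions made for illustration, not part of Weyuker's definitions.

```python
from itertools import combinations

# Toy model (assumption): a class is a set of method names, P + Q is the union.
classes = {
    "P": frozenset({"open", "close"}),
    "Q": frozenset({"read", "write", "close"}),
    "R": frozenset({"seek", "tell"}),
}


def nom(cls: frozenset) -> int:
    """Candidate metric: number of methods."""
    return len(cls)


def combine(p: frozenset, q: frozenset) -> frozenset:
    return p | q


pairs = list(combinations(classes.values(), 2))

# Non-coarseness: not every class gets the same value.
non_coarse = len({nom(c) for c in classes.values()}) > 1
# Non-uniqueness: two different classes may share a value.
non_unique = any(nom(p) == nom(q) for p, q in pairs)
# Monotonicity: m(P) <= m(P+Q) and m(Q) <= m(P+Q).
monotonic = all(
    nom(p) <= nom(combine(p, q)) and nom(q) <= nom(combine(p, q)) for p, q in pairs
)
# Interaction increases complexity: m(P) + m(Q) < m(P+Q).
interaction = all(nom(p) + nom(q) < nom(combine(p, q)) for p, q in pairs)

print(non_coarse, non_unique, monotonic, interaction)
```

In this model NOM passes the first three checks but fails the last one, since a union never has more elements than the two sets together. A plain count is therefore a measurement that does not satisfy every property listed above.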
Selecting Metrics
• Fast
  – Scalable: we cannot afford quadratic cost, O(n²), when n ≥ 1 million LOC (lines of code)
• Precise
  – (e.g., #methods: do you count all methods, only public ones, also inherited ones?)
• Code-based
  – Scalable: we want to collect the metrics at the same time
• Simple
  – Complex metrics are hard to interpret
Assessing Maintainability
• Size of the system and of its entities
  – Class size, method size, inheritance
  – The size of an entity influences maintainability
• Cohesion of the entities
  – Class internals
  – Changes should be local to the class
• Coupling between entities
  – Within inheritance: coupling between class and subclass
  – Outside of inheritance
  – Strong coupling affects how changes propagate to and from the class
Sample Size and Inheritance Metrics
Figure: metamodel with the entities Class, Attribute, and Method and the relations Inherit, BelongTo, Access, and Invoke
Inheritance metrics
• hierarchy nesting level (HNL)
• # immediate children (NOC)
• # inherited methods, unmodified (NMI)
• # overridden methods (NMO)
Class size metrics
• # methods (NOM)
• # instance attributes (NIA, NCA)
• sum of method size (WMC)
Method size metrics
• # invocations (NOI)
• # statements (NOS)
• # lines of code (LOC)
Sample Class Size Metrics
• (NIV) [Lore94] Number of Instance Variables
• (NCV) [Lore94] Number of Class Variables (static)
• (NOM) [Lore94] Number of Methods (public, private, protected)
• (LOC) Lines of Code
• (NSC) Number of Semicolons, [Li93] number of statements
• (WMC) [Chid94] Weighted Method Count
  – WMC = ∑ c_i
  – where c_i is the complexity of method i (number of exits or McCabe Cyclomatic Complexity Metric)
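A minimal sketch of WMC, assuming c_i is an approximation of McCabe's cyclomatic complexity (1 plus the number of decision points) and that the code under analysis is Python, parsed with the standard `ast` module. The `Account` class is a hypothetical example, and the set of node types counted as decision points is a simplification.

```python
import ast

SOURCE = '''
class Account:
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def describe(self):
        return "account"

    def settle(self, items):
        for item in items:
            if item.paid or item.waived:
                continue
            self.balance -= item.cost
'''

# Node types treated as one decision point each (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(func: ast.FunctionDef) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def wmc(class_node: ast.ClassDef) -> int:
    """WMC = sum of c_i over the methods defined in the class."""
    return sum(
        cyclomatic(n) for n in class_node.body if isinstance(n, ast.FunctionDef)
    )


tree = ast.parse(SOURCE)
account = next(n for n in tree.body if isinstance(n, ast.ClassDef))
per_method = {
    n.name: cyclomatic(n) for n in account.body if isinstance(n, ast.FunctionDef)
}
total = wmc(account)
print(per_method, total)
```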
Hierarchy Layout
• (HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree
  – HNL, DIT = max hierarchy level
• (NOC) [Chid94] Number of Children
• (WNOC) Total number of Children
• (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
• (SIX) [Lore94] Specialization Index
  – SIX(C) = NMO * HNL / NOM
  – Weighted percentage of overridden methods
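The hierarchy metrics can be sketched with Python's own introspection. The sketch assumes single inheritance, a root class at level 0, and NOM counted as the non-dunder methods defined directly in the class. The `Shape` hierarchy and the helper names are hypothetical.

```python
class Shape:
    def area(self): return 0.0
    def name(self): return "shape"
    def describe(self): return f"{self.name()} with area {self.area()}"


class Polygon(Shape):
    def name(self): return "polygon"
    def sides(self): return 0


class Square(Polygon):
    def __init__(self, edge): self.edge = edge
    def area(self): return self.edge ** 2
    def name(self): return "square"
    def sides(self): return 4


def own_methods(cls) -> set:
    """Methods defined directly in cls, ignoring dunder methods (assumption)."""
    return {n for n, v in vars(cls).items() if callable(v) and not n.startswith("__")}


def hnl(cls) -> int:
    """Hierarchy nesting level: levels between cls and the root (root = 0)."""
    return len(cls.__mro__) - 2  # drop cls itself and `object`


def noc(cls) -> int:
    """Number of immediate children."""
    return len(cls.__subclasses__())


def nmo(cls) -> int:
    """Number of methods that override a method of some ancestor."""
    inherited = set()
    for base in cls.__mro__[1:]:
        inherited |= own_methods(base)
    return len(own_methods(cls) & inherited)


def six(cls) -> float:
    """SIX(C) = NMO * HNL / NOM."""
    nom = len(own_methods(cls))
    return nmo(cls) * hnl(cls) / nom if nom else 0.0


for c in (Shape, Polygon, Square):
    print(c.__name__, hnl(c), noc(c), nmo(c), six(c))
```

A deep class that overrides most of what it inherits, such as `Square` here, gets a high SIX value, which is the pattern this index is meant to expose.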
Method Size
• (MSG) Number of Message Sends
• (LOC) Lines of Code
• (MCX) Method Complexity
  – Total complexity / total number of methods
  – API calls = 5, assignment = 0.5, arithmetic ops = 2, messages with params = 3, ...
Sample Metrics: Class Cohesion
• (LCOM) Lack of Cohesion in Methods
  – [Chidamber 94] for definition
  – [Hitz 95] for critique
  – I_i = set of instance variables used by method M_i
  – let P = { (I_i, I_j) | I_i ∩ I_j = ∅ }
  – let Q = { (I_i, I_j) | I_i ∩ I_j ≠ ∅ }
  – if all the sets are empty, P is empty
  – LCOM = |P| − |Q| if |P| > |Q|, 0 otherwise
• Tight Class Cohesion (TCC)
• Loose Class Cohesion (LCC)
  – [Bieman 95] for definition
  – Measure method cohesion across invocations
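The LCOM definition translates almost directly into code. In the sketch below the input is a mapping from method name to the set of instance variables the method uses. How that mapping is extracted from source code is left out, and the two sample classes are hypothetical.

```python
from itertools import combinations


def lcom(uses: dict[str, set[str]]) -> int:
    """LCOM = |P| - |Q| if |P| > |Q|, else 0.

    P: method pairs sharing no instance variable.
    Q: method pairs sharing at least one instance variable.
    """
    if not any(uses.values()):
        return 0  # all sets empty: P is defined to be empty
    p = q = 0
    for a, b in combinations(uses.values(), 2):
        if a & b:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0


# Every method touches the same field: cohesive.
stack_like = {"push": {"items"}, "pop": {"items"}, "size": {"items"}}

# Three unrelated groups of methods: low cohesion.
mixed = {
    "load": {"path"},
    "save": {"path"},
    "render": {"canvas"},
    "resize": {"canvas", "width"},
    "log": {"logger"},
}

print(lcom(stack_like), lcom(mixed))
```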
Sample Metrics: Class Coupling (i)
• Coupling Between Objects (CBO)
  – [Chidamber 94a] for definition
  – [Hitz 95a] for a discussion
  – Number of other classes to which a class is coupled
• Data Abstraction Coupling (DAC)
  – [Li 93] for definition
  – Number of ADTs defined in a class
• Change Dependency Between Classes (CDBC)
  – [Hitz 96a] for definition
  – Impact of changes from a server class (SC) to a client class (CC)
Sample Metrics: Class Coupling (ii)
• Locality of Data (LD)
  – [Hitz 96] for definition
  – LD = ∑ |L_i| / ∑ |T_i|
  – L_i = non-public instance variables + inherited protected variables of the superclass + static variables of the class
  – T_i = all variables used in M_i, except non-static local variables
  – M_i = methods without accessors
The Trouble with Coupling and Cohesion
• Coupling and cohesion are intuitive notions
  – Cf. “computability”
  – E.g., is a library of mathematical functions “cohesive”?
  – E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?
Conclusion: Metrics for Quality Assessment
• Can internal product metrics reveal which components have good/poor quality?
• Yes, but...
  – Not reliable
    • false positives: “bad” measurements, yet good quality
    • false negatives: “good” measurements, yet poor quality
  – Heavyweight approach
    • Requires the team to develop (customize?) a quantitative quality model
    • Requires definition of thresholds (trial and error)
  – Difficult to interpret
    • Requires complex combinations of simple metrics
• However...
  – Cheap once you have the quality model and the thresholds
  – Good focus (± 20% of components are selected for further inspection)
• Note: focus on the most complex components first!
Topics
• Metrics
• Object-Oriented Metrics in Practice
  – Detection strategies, filters and composition
  – Sample detection strategies: God Class …
• Duplicated code
Detection strategy
• A detection strategy is a metrics-based predicate to identify candidate software artifacts that conform to (or violate) a particular design rule
Filters and composition
• A data filter is a predicate used to focus attention on a subset of interest of a larger data set
  – Statistical filters
    • E.g., the top and bottom 25% are considered outliers
  – Other relative thresholds
    • I.e., other percentages to identify outliers (e.g., top 10%)
  – Absolute thresholds
    • I.e., fixed criteria, independent of the data set
• A useful detection strategy can often be expressed as a composition of data filters
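One way to picture a composition of filters: each filter returns the set of class names it selects, and "and" becomes set intersection. The sketch below combines a relative filter (top 25% by WMC) with an absolute one (TCC below one third). The metric values, class names, and the choice of these two filters are hypothetical and do not reproduce any published strategy.

```python
def top_fraction(values: dict[str, float], fraction: float) -> set[str]:
    """Relative filter: names whose value lies in the top `fraction` of the data."""
    count = max(1, round(len(values) * fraction))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:count])


def lower_than(values: dict[str, float], threshold: float) -> set[str]:
    """Absolute filter: names whose value is below a fixed threshold."""
    return {name for name, value in values.items() if value < threshold}


# Hypothetical measurements for eight classes.
wmc = {"Parser": 95, "Token": 8, "Lexer": 40, "Session": 120,
       "Config": 12, "Report": 30, "Cache": 22, "Main": 15}
tcc = {"Parser": 0.20, "Token": 0.90, "Lexer": 0.30, "Session": 0.60,
       "Config": 0.80, "Report": 0.50, "Cache": 0.70, "Main": 0.40}

# Composition: very complex AND poorly cohesive.
suspects = top_fraction(wmc, 0.25) & lower_than(tcc, 1 / 3)
print(suspects)
```

Keeping each filter as a plain set-returning function makes it cheap to try other thresholds, which matters because thresholds are usually tuned by trial and error.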
God Class
• A God Class centralizes intelligence in the system
  – Impacts understandability
  – Increases system fragility
Feature Envy
• Methods that are more interested in data of other classes than their own [Fowler et al. 99]
Data Class
• A Data Class provides data to other classes but little or no functionality of its own
Shotgun Surgery
• A change in an operation implies many (small) changes to a lot of different operations and classes
Topics
• Metrics
• Object-Oriented Metrics in Practice
• Duplicated code
  – Detection techniques
  – Visualizing duplicated code
Copied Code
Example from the Mozilla distribution (Milestone 9)
Extracted from /dom/src/base/nsLocation.cpp

    NS_IMETHODIMP
    LocationImpl::GetPathname(nsString
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
    #ifdef NECKO
          char* file;
          result = url->GetPath(&file);
    #else
          const char* file;
          result = url->GetFile(&file);
    #endif
          if (result == NS_OK) {
            aPathname.SetString(file);
    #ifdef NECKO
            nsCRT::free(file);
    #endif
          }
          NS_IF_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::SetPathname(const nsString
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          char *buf = aPathname.ToNewCString();
    #ifdef NECKO
          url->SetPath(buf);
    #else
          url->SetFile(buf);
    #endif
          SetURL(url);
          delete[] buf;
          NS_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::GetPort(nsString& aPort)
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          aPort.SetLength(0);
    #ifdef NECKO
          PRInt32 port;
          (void)url->GetPort(&port);
    #else
          PRUint32 port;
          (void)url->GetHostPort(&port);
    #endif
          if (-1 != port) {
            aPort.Append(port, 10);
          }
          NS_RELEASE(url);
        }
      }

      return result;
    }
How much code is duplicated?
Usual estimates: 8 to 12% of the code

    Case study        LOC       Duplication without comments   Duplication with comments
    gcc               460,000   8.7%                           5.6%
    Database Server   245,000   36.4%                          23.3%
    Payroll           40,000    59.3%                          25.4%
    Message Board     6,500     29.4%                          17.4%
What is duplicated code?
• Duplicated code = a segment of program code that is also found elsewhere in the same system
  – In different files
  – In the same file but in different methods
  – In the same method
• The segments must share some logic or structure that can be abstracted:
  – ...computeIt(a,b,c,d);... and ...computeIt(w,x,y,z);... is not considered duplicated code.
  – ...getIt(hash(tail(z)));... and ...getIt(hash(tail(a)));... could be abstracted to a new function.
Problems with Duplication
• Generally has a negative effect
  – Code bloat
• Negative effects on the maintenance of the system or software
• Copying becomes an additional source of defects in the code
  – Software aging, “hardening of the arteries”
  – “Software entropy” increases: even small design changes become very difficult to effect
Detecting Duplicated Code
Nontrivial problem:
• No a priori knowledge about which code has been copied
• How to find all clone pairs among all possible pairs of segments?
Figure labels: Lexical Equivalence, Syntactical Equivalence, Semantic Equivalence
General Schema of Detection Process
Source Code → (Transformation) → Transformed Code → (Comparison) → Duplication Data

    Author            Level         Transformed Code        Comparison Technique
    Johnson 94        Lexical       Substrings              String-Matching
    Ducasse 99        Lexical       Normalized Strings      String-Matching
    Baker 95          Syntactical   Parameterized Strings   String-Matching
    Mayrand 96        Syntactical   Metric Tuples           Discrete comparison
    Kontogiannis 97   Syntactical   Metric Tuples           Euclidean distance
    Baxter 98         Syntactical   AST                     Tree-Matching
Recall and Precision
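Under the usual definitions, precision is the share of reported clones that are real, and recall is the share of real clones that were reported. The sketch below computes both for a clone detector. The tuple format for a clone pair and the sample data are hypothetical.

```python
def precision_recall(reported: set, actual: set) -> tuple[float, float]:
    """precision = |reported ∩ actual| / |reported|
    recall    = |reported ∩ actual| / |actual|
    """
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# A clone pair is written as (file1, line1, file2, line2); data is made up.
actual_clones = {
    ("a.c", 10, "b.c", 40),
    ("a.c", 90, "a.c", 130),
    ("c.c", 5, "d.c", 5),
    ("d.c", 60, "e.c", 12),
}
reported_clones = {
    ("a.c", 10, "b.c", 40),
    ("c.c", 5, "d.c", 5),
    ("e.c", 1, "e.c", 30),  # false positive
}

precision, recall = precision_recall(reported_clones, actual_clones)
print(precision, recall)
```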
Simple Detection Approach (i)
• Assumption:
  – Code segments are just copied and changed at a few places
• Noise elimination transformation
  – remove white space, comments
  – remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)
Before transformation:
    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL) {
      int l = strlen(fidptr);
      fastid = new char[ l + 1 ];
After transformation:
    …
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr)
    fastid=newchar[l+1]
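The noise elimination step can be sketched in a few lines. This version assumes C-style comments, uses regular expressions, and ignores the case of comment markers inside string literals. The set of "uninteresting" lines is an assumption and would be tuned per project.

```python
import re

# Lines that carry no information once white space is removed (assumption).
UNINTERESTING = {"", "else", "{", "}", ";"}


def normalize(source: str) -> list[tuple[int, str]]:
    """Return (original line number, normalized text) for each interesting line."""
    # Replace /* ... */ comments by the newlines they contain, so that
    # the original line numbers stay valid.
    source = re.sub(
        r"/\*.*?\*/", lambda m: "\n" * m.group().count("\n"), source, flags=re.S
    )
    result = []
    for number, line in enumerate(source.split("\n"), start=1):
        line = re.sub(r"//.*", "", line)  # drop // comments
        line = re.sub(r"\s+", "", line)   # drop all white space
        if line not in UNINTERESTING:
            result.append((number, line))
    return result


sample = """// assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if (fidptr != NULL) {
    int l = strlen(fidptr);
    fastid = new char[l + 1];
}
"""

for number, text in normalize(sample):
    print(number, text)
```

Keeping the original line number next to each normalized line lets later steps report locations in the untransformed file.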
Simple Detection Approach (ii)
• Code comparison step
  – Line-based comparison (assumption: layout did not change during copying)
  – Compare each line with each other line
  – Reduce the search space by hashing:
    • Preprocessing: compute the hash value for each line
    • Actual comparison: compare all lines in the same hash bucket
• Evaluation of the approach
  – Advantages: simple, language independent
  – Disadvantages: difficult interpretation
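The hashing step can be illustrated with a dictionary keyed by the normalized line text: Python's `dict` hashes each key and compares only keys that fall into the same bucket, which is the search-space reduction described above. The input is assumed to be already normalized, and the sample lines are hypothetical.

```python
from collections import defaultdict


def duplicated_lines(lines: list[str]) -> dict[str, list[int]]:
    """Map each line that occurs more than once to its 1-based line numbers."""
    occurrences = defaultdict(list)
    for number, text in enumerate(lines, start=1):
        # The dict hashes `text`; equal lines end up in the same entry.
        occurrences[text].append(number)
    return {text: nums for text, nums in occurrences.items() if len(nums) > 1}


normalized = ["a=b;", "x=get();", "a=b;", "y=1;", "x=get();", "a=b;"]
print(duplicated_lines(normalized))
```

The result is a flat list of matching lines. Reading it as "these five lines were copied together" takes a further grouping step, which is the interpretation difficulty listed above.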
A Perl script for C++ (i)

    $equivalenceClassMinimalSize = 1;
    $slidingWindowSize = 5;
    $removeKeywords = 0;
    @keywords = qw(if then else);
    $keywordsRegExp = join '|', @keywords;

    @unwantedLines = qw( else return return; { } ; );
    push @unwantedLines, @keywords;

    while (<>) {
        chomp;
        $totalLines++;

        # remove comments of type /* */
        my $codeOnly = '';
        while(($inComment && m|\*/|) || (!$inComment && m|/\*|)) {
            unless($inComment) { $codeOnly .= $` }
            $inComment = !$inComment;
            $_ = $';
        }
        $codeOnly .= $_ unless $inComment;
        $_ = $codeOnly;

        s|//.*$||;                                   # remove comments of type //
        s/\s+//g;                                    # remove white space
        s/$keywordsRegExp//og if $removeKeywords;    # remove keywords

A Perl script for C++ (ii)

        $codeLines++;
        push @currentLines, $_;
        push @currentLineNos, $.;
        if($slidingWindowSize < @currentLines) {
            shift @currentLines;
            shift @currentLineNos;
        }
        #print STDERR "Line $totalLines >$_<\n";
        my $lineToBeCompared = join '', @currentLines;
        my $lineNumbersCompared = "<$ARGV>";         # append the name of the file
        $lineNumbersCompared .= join '/', @currentLineNos;
        #print STDERR "$lineNumbersCompared\n";
        if($bucketRef = $eqLines{$lineToBeCompared}) {
            push @$bucketRef, $lineNumbersCompared;
        } else {
            $eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];
        }
        if(eof) { close ARGV }                       # Reset linenumber-count for next file

• Handles multiple files
• Removes comments and white space
• Controls noise (if, {, ...)
• Granularity (number of lines)
• Possible to remove keywords
Output Sample

    Lines:
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pnMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    create_property(pd,pnOwnership,stBool,true,*iOwnership);
    Locations:
    </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
    </face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
    Lines:
    create_property(pd,pnSupertype,stReference,true,*iSupertype);
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    Locations:
    </face/typesystem/SCTypesystem.C>6177/6178
    </face/typesystem/SCTypesystem.C>6229/6230

Lines = duplicated lines
Locations = file names and line numbers
Enhanced Simple Detection Approach
• Code comparison step
  – As before, but now
    • Collect consecutive matching lines into match sequences
    • Allow holes in the match sequence
• Evaluation of the approach
  – Advantages
    • Identifies more real duplication, language independent
  – Disadvantages
    • Less simple
    • Misses copies with (small) changes on every line
Abstraction
– Abstracting selected syntactic elements can increase recall, at the possible cost of precision
Metrics-based detection strategy
• Duplication is significant if:
  – It is the largest possible duplication chain uniting all exact clones that are close enough to each other.
  – The duplication is large enough.
Automated detection in practice
• Wettel [MSc thesis, 2004] uses three thresholds:
  – Minimum clone length: the minimum number of lines present in a clone (e.g., 7)
  – Maximum line bias: the maximum number of lines in between two exact chunks (e.g., 2)
  – Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
Mihai Balint, Tudor Gîrba and Radu Marinescu, “How Developers Copy,” ICPC 2006
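The three thresholds can be sketched as a two-step procedure: find exact chunks, then chain chunks that lie close together. The version below is a simplification, not the cited implementation. It compares two files along one fixed alignment, so holes have the same size in both files, and it counts the hole lines toward the clone length. The function names and sample data are hypothetical.

```python
def exact_chunks(a: list[str], b: list[str], offset: int, min_chunk: int):
    """Runs of equal lines a[i] == b[i + offset], as (start_in_a, length)."""
    chunks, start = [], None
    for i in range(len(a) + 1):  # one extra step closes a run at the end
        j = i + offset
        same = i < len(a) and 0 <= j < len(b) and a[i] == b[j]
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_chunk:
                chunks.append((start, i - start))
            start = None
    return chunks


def chain(chunks, max_bias: int, min_clone: int):
    """Join chunks separated by at most max_bias lines; keep long chains."""
    clones, current = [], None  # current = (start, end), end exclusive
    for start, length in chunks:
        if current is not None and start - current[1] <= max_bias:
            current = (current[0], start + length)
        else:
            if current is not None and current[1] - current[0] >= min_clone:
                clones.append(current)
            current = (start, start + length)
    if current is not None and current[1] - current[0] >= min_clone:
        clones.append(current)
    return clones


a = ["open(f);", "read(f);", "check(f);", "log(1);",
     "parse(f);", "store(f);", "flush(f);", "close(f);"]
b = list(a)
b[3] = "log(2);"  # one edited line in the copy

chunks = exact_chunks(a, b, offset=0, min_chunk=3)
clones = chain(chunks, max_bias=2, min_clone=7)
print(chunks, clones)
```

With a maximum line bias of 0 the same input yields no clone, because neither exact chunk alone reaches the minimum clone length of 7.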
Visualization of Duplicated Code
• Visualization provides insights into the duplication situation
  – A simple version can be implemented in three days
  – Scalability issue
• Dotplots: a technique from DNA analysis
  – Code is put on the vertical as well as the horizontal axis
  – A match between two elements is a dot in the matrix
Figure: dotplot patterns for Exact Copies, Copies with Variations, Inserts/Deletes, and Repetitive Code Elements
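A text-only dotplot takes only a few lines. The sketch below compares a sequence of (already normalized) lines with itself and prints `#` for a match. The single-letter lines stand in for real code and are hypothetical.

```python
def dotplot(rows: list[str], cols: list[str]) -> list[list[bool]]:
    """matrix[i][j] is True when rows[i] equals cols[j]."""
    return [[r == c for c in cols] for r in rows]


def render(matrix: list[list[bool]]) -> str:
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# The block "a b c" appears twice, separated by one unrelated line.
lines = ["a", "b", "c", "x", "a", "b", "c"]
picture = render(dotplot(lines, lines))
print(picture)
```

The main diagonal is the trivial match of every line with itself. The copied block shows up as the two shorter diagonals parallel to it. The matrix grows quadratically with the number of lines, which is the scalability issue noted above.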
Visualization of Copied Code Sequences
Figure: dotplot with File A and File B on both axes
Detected problem
• File A contains two copies of a piece of code
• File B contains another copy of this code
Possible solution
• Extract Method
All examples are made using Duploc from an industrial case study (1 Mio LOC C++ system)
Visualization of Repetitive Structures
Detected problem
• 4 object factory clones: a switch statement over a type variable is used to call individual construction code
Possible solution
• Strategy Method
Visualization of Cloned Classes
Figure: dotplot with Class A and Class B on both axes
Detected problem
• Class A is an edited copy of class B (editing and insertion)
Possible solution
• Subclassing …
Visualization of Clone Families
• 20 classes implementing lists for different data types
Figure panels: Overview, Detail
Conclusion
• Duplicated code is a real problem
  – It makes a system progressively harder to change
• Detecting duplicated code is a hard problem
  – Some simple techniques can help
  – Tool support is also needed
• Visualization of code duplicates is very useful
• Curing duplicated code is a topic for research