Why Store Procedures Don’t Scale

Disclaimer: When I say “Scale” I mean a couple of ideas:

  • Accommodating a large increases in load, with less then half a couple of hours worth of work.
  • Growing your technology team up to hundreds or thousands of application developers.
  • The cost of your hardware infrastructure, does on correlate linearly with how much your product gets used. Rather the relationship should be Log(n).

Also in this post deals with propriety database management systems as well. Specifically I mention SQL Server, however the same can be said about Oracle, and etc.

Now with that out of the way lets get going.

Store Procedures don’t scale (in terms of people, cost, and infrastructure)

And here’s why:

Points “1” and “2” won’t really matter, if you work for an industry that has more money than it knows what to do with, and or has no competition, like government agencies, hospitals, big telecom, and etc.

1) The supply of people who know TSQL, and know it well isn’t as large as, the number of people who know modern object-oriented programming languages like Java, Python, etc. If you start dumping business logic into TSQL that sits in a propriety SQL Server Database, you are going to have to start hiring DBAs who specialize in SQL Server which is expensive

2) If you don’t hire people who specialize in TSQL and SQL Server tuning, you’re going to have to increase size of the hardware that the SQL Server runs on. However, since SQL server is proprietary software, its licensing is tied to the amount of hardware it runs on. So, if you want to run a bunch of poorly optimized TSQL on bigger infrastructure, you must pay more for additional hardware, and licensing (since its per core).

holly moly that’s expensive [1]

3) Let’s say you’re okay with both points “1” and “2”. There is going to be a hard limit of 128 cores and a terabyte of ram that you can run a SQL server instance on. At that point if the TSQL is not optimized, and your data set is big enough you’re still not going to have lighting fast query times. Which really means that dumping core business logic into TSQL for performance reasons, was just covering up an underlying problem of bad schema design.

4) When you use TSQL store procedures, everything is inside SQL Server. Now there are some problems with that:

4 – A) Your business logic does not live in GIT, its changes are not tracked, and there is no autonomous log of who changed what, when. You may say “My store procs do live in GIT, inside a DB project”, that may be true. However that does not guarantee, what’s in source control is any one of the SQL server instances you may have. Since someone could always login to SQL Server and change a store proc, without anyone knowing. Now to stop this from happening, you could have an experienced DBA that acts as the guardian of SQL Servers, however that again is expensive.

4 – B) Now if you have multiple “Staging”, “Prod”, and “Dev” SQL server instances. You can never be 100% sure that all the Store Procs are the same across the environments, or reliably determine how out of date a store proc is.

4 – C) Using TSQL a lot leads to abusing TSQL (unless you have a good DBAs), with no checks or balances in your data tier. Like cross DB actions, synonyms, sending out emails, doing large ETL jobs, creating functions, etc. At this point the database stops being becoming you Data Tier, and is pretty much you application tier. Except its a lot more expensive to maintain and run (SQL server licensing and DBAs), then a web app that does the same sort of stuff.

[1] – https://redmondmag.com/articles/2019/11/08/sql-server-2019-licensing.aspx

Key Performance Counters for Windows Server using Performance Monitor

Every windows server ships with a lot of free tools, to help monitor and better understand what’s happening on the system. One of the most use full tools is called “Performance Monitor”, don’t let the outdated UI fool you. Performance Monitor is an extremely useful tool that allows you to log, collect, and visualize all your performance related data.

Before you go increase your VM size on your cloud provider, or switch to SSDs. Be sure to take a look at Performance Monitor, they could surprise you.

The data you collect can be broken down into four main groups:

  • CPU
    • Spawning too many threads
    • Doing too much work at one time
    • Background job that could be eating up all the CPU cycles
    • Too many Garbage collector calls
  • Memory
    • Creating too much garbage
    • Memory leaks
    • Trying to load too much data into memory at one time
    • Not streaming in data
  • Disk
    • Writing to disk all the time
    • Using too much ram at one time, so the OS starts to use Disk
    • Serving too much static content for your site (use a CDN)
  • Network
    • More traffic then your one machine has bandwidth for
    • Other applications on machine saturating connection

Any hiccup in any one of this areas of the machine could have a cascading effect on performance degradation, since each one could act as a bottle neck to the others.

On to the actual metrics you want to watch for, grouped by type:

CPU

% Processor Time:

  • What is it ?
    • Total time the processor was busy processing.
  • Why should I care ?
    • Gives a very general measure of how busy the processor can get.
  • How does it help me ?
    • If this counter is always very high, then you need to use some of the counters down below

% Privileged Time:

  • What is it ?
    • Total time the processor spent executing in kernel mode.
  • Why should I care ?
    • This measure takes into account only the kernel related operations that processor does, like memory management
  • How does it help me ?
    • If this counter is above 20% you have a driver, or hardware issue.

% User Time:

  • What is it ?
    • Total time the processor spent executing in any user application code.
  • Why should I care ?
    • Give you an idea of how much work your application code forces the processor to do.
  • How does it help me ?
    • If this percentage is too high it may be conflicting with privileged processor time. You always want to have some buffer, between “user time” and “privileged time” so the system can run smoothly.

Queue Length:

  • What is it ?
    • Number of threads waiting for a core to become available.
  • Why should I care ?
    • Give you an idea of how much work your machine is trying to do, at any given time.
  • How does it help me ?
    • Divide this number by the core count of machine. If the value is greater than 3, there is too much CPU pressure on the machine, and created a back long.

Process (*) \ Thread Count:

  • What is it ?
    • Amount of threads currently active in this process.
  • Why should I care ?
    • More threads means more CPU utilization.
  • How does it help me ?
    • Could be an indication that your application is spinning up too many threads at once.

Disk

Average Disk Queue Length

  • What is it ?
    • A simplified definition is how many disk operations (read & writes) were queued.
  • Why should I care ?
    • Gives you an idea of how saturated your disk is.
  • How does it help me ?
    • If the queue length is over 2 for a prolonged periods of time, then it could be an indication of a disk bottleneck.

% Disk Idle Time

  • What is it ?
    • How much time your disk spends doing nothing.
  • Why should I care ?
    • Gives you a picture of when the disk is free.
  • How does it help me ?
    • If the “Disk Idle Time” is low then it is doing a lot of work, which is okay. But only if the “Average Disk Queue Length” is below 2. Could give you an indication that the disk isn’t busy when it should be busy.

Avg Disk sec/Read & Avg Disk sec/Write

  • What is it ?
    • Latency of your disks.
  • Why should I care ?
    • Gives you an indication of how much lag there is to do anything with the disk.
  • How does it help me ?
    • Could indicate a hardware issue if the latency is too high.

Memory

Available Mega bytes

  • What is it ?
    • Amount of physical Memory available to processes.
  • Why should I care ?
    • Lets you know if your running out of memory.
  • How does it help me ?
    • Could indicate a memory leak if it continues to decrease

Pages / Sec

  • What is it ?
    • Rate at which pages are read directly from disk (slows down the whole system)
  • Why should I care ?
    • Occurs when there are page faults which cause system wide delays.
  • How does it help me ?
    • The higher this number gets, the more the system is running out of memory. The more it runs out of memory the more page faults will occur.

Pool Nonpaged Bytes

  • What is it ?
    • Area of memory for objects that cannot be written to disk
  • Why should I care ?
    • Eats up your available memory (ram)
  • How does it help me ?
    • If it becomes greater then 80% could lead to system halting, Nonpaged Pool Depletion Issue (Event Id 2019)

Pool Paged Bytes

  • What is it ?
    • Area of memory for objects that can be written to disk, when not being used.
  • Why should I care ?
    • The more object you have in this area the more expensive it will be to retrieve them.
  • How does it help me ?
    • The bigger this value gets the longer it takes to retrieve objects from memory.

Process (*) \ Private Bytes

  • What is it ?
    • The current size in bytes of the memory this process has allocated, that cannot be shared with other running processes.
  • Why should I care ?
    • Tells you how much memory your process takes up.
  • How does it help me ?
    • If this value gets consistently bigger over time, it could indicate the your application has a memory leak.

Network

Output Queue Length

  • What is it ?
    • The length of the output packet queue.
  • Why should I care ?
    • The higher this number the large the backlog of queue packets. The longer it takes to send out data
  • How does it help me ?
    • Helps indicate a network bottle neck. If greater then one, the systems network is nearing capacity.

Resources used:

Should you switch to .Net Core 5 ?

.Net Core 3.1 is in LTS mode (ends support in Dec 2022) where as .Net Core 5 is coming to general availability in November. However, there will never be a LTS version of .Net Core 5. They have a new release life. Every odd release just gets a general availability release, rather than an LTS release.

LTSLevelCycleNetCore

Pros of .Net Core 5

  • Unification of runtimes (mono, …)
  • Performance gains
  • Language interoperability (swift, java, …)
  • Not a lot of breaking changes

Cons of .Net Core 5

  • No LTS release
  • Fundamental packages like Xunit, Swashbuckler, Serilog, and etc, have not transitioned over yet.
  • Still a lot of un knowns in terms of bugs, and stability.
  • No Default Base Image provided my Microsoft yet, everything remains experimental

Recommendation: I think everyone that can should switch to .Net 6 some time after it comes out. Which is in Nov 2021, this would gave you a whole year to migrate to 6, while 3.1 is still being supported.

References

https://dotnet.microsoft.com/platform/support/policy/dotnet-core

https://www.stevejgordon.co.uk/upgrading-from-asp-net-core-3-1-to-5-0-preview-1

How to make your program faster, regardless of programming language or hardware

Person A: Does this sound like a impossible task ?

Person B: No not really

Person A: Does it have limits ?

Person B: Yes it does, but you see dramatic difference in speed regardless.

Person A: This sounds like a scam. Is it a scam ? How much will this cost ?

Person B: No its not a scam, and its free.

Person A: So what is it ?

Person B: You just have to master runtime complexity 😊

Person A: What !? …. 😒

Person B: Yeah I know, I thought that too. But it works 😁

Person A: Yeahhhh well I never really understood that sort of stuff. I just implement businesses logic for living. I don’t know much about computer science. And I am not gonna waste my time on this. If I need speed, I’ll just deploy it on a bigger server 😒

Person A: Come on … Its easy. You don’t have to understand the computer science. You just need to understand graphs, and recognize patterns.

Person B: What … really ?

Person A: Yeah you just got to use the graph below. Or just Google something like “Runtime Complexity Graph” . All you got to know that things in Red are the danger zone, things in Orange are the “meh” zone, and things in the Yellowish Green zone are okay.

Person B: Wait what does this have to do with programming, and what about those function things ?

Person A: Oh yeah right. So those function things represent how your program can run. The “n” represents the input to your function, like an array objects, or numbers. In the danger zone, if you add just one more element, your time to completion more then doubles. Whereas in the Yellowish Green zone adding another element doesn’t do much of anything.

Person B: 😡 This still doesn’t help me.

Person A: Okay okay okay, how about I make cheat sheet for you ? Your little guide to spotting when your in the danger zone ?

Person B: Show me.

Person A:

Type FunctionDescription
Constant Time1No matter how many elements/ inputs you give your function. Its runtime will always stay the same.
Logarithmic TimeLog(n)When doubling the number of input/elements into your function does not double the runtime. Also this is the runtime of most search algorithms.
Linear TimenWhen doubling the number of inputs/elements doubles your runtime. This is a for loop spanning from zero to the end of the input.
n + mTwo for loops one after the other, going over two different collections.
Quasi-linear Timen *Log(n)A worse version of Log(n). This is the runtime of most sorting algorithms.
Quadratic Timen2Every element in an array is compared with every other element in the input. This is the classic double “for loop” over a single array. For every nested “for loop” you add one more to the exponent. So if you had 5 nested for loops, you would have n5 .
n * mTwo nested for loops, but going over two different collections.
Exponential Time 2nA single extra input doubles runtime. You never want this, ever.

WTF is S.O.L.I.D – WTF is S ?

In this series I am going to be going through each of the principles. Go about explaining them in as simple of a manner as possible.

S = Single Responsibility Principle

Anything inside your code that is parts (class, modules, etc) should only ever have one reason to change. For example if you had a Person class, then everything in that class should only do things related to person. A person class should have methods like “eat”, “sleep”, “play”, and etc. However a person should never need to have a “log” method, cause it has nothing to do with a person.

 

4 Reasons why you should choose React for your next project !

1) Don’t Touch the DOM – Imperative Vs Declarative

Hey don’t touch the DOM ! React will do that for you. You might be wondering what I mean by that. You see React shields you from manually interacting (Imperative)  with the DOM using Java Script. Instead React uses something called the “Virtual DOM”. The Virtual DOM is a representation of the actual DOM that React uses to figure out what to actually change on the screen. React will always be like “Oh you made that change ? Let me handle the best way that update“. So when you use React you are using a Declarative way of programming.

2) Hair Balls Vs Lego – Component Architecture

Front end Apps these days are pretty complex, for example take Netflix, Facebook, or AirBnB. They all have complex user interactions and require a large number of cross interacting entities. If you tried to build any one of these applications using vanilla Java-script, and then tried to add more and more features to them, you would end up with a large hair ball of code. That would feature large chunks of CSS, JS, and HTML all over the place. And when anyone ever tried to add a new feature, they would have to copy and past code (the horror !). Components to the rescue ! With components you can encapsulated each and every aspect of your app into little manageable chunks, that can be imported into other parts of your application. Making building new features as easy as building with Lego.

3) One Way Data Flow

Every application has “State“. This can be anything from how many times you clicked a button, to your current permissions inside the app. Now in traditional front end apps this state is spread around different chunks of the app. And are not shareable across different areas of the app. For example if you had a news editor app, it would have to know who has permissions to actually publish the content, and give the user some categories to publish it under.

React deals with this problem by enforcing a centralized state. Once the “State” of the app changes, React automatically reacts and makes the necessary DOM changes to reflect the change. With this centralization you minimize the places where potential bugs show up. And you have a better idea where the bugs are in your application. Since data only flows one through your application, from your “State” to the components.

4) Just the UI

React is a library for building UIs, plain and simple. It does not try to be a large massive framework that has every single feature you could want. Therefor you can just add the extra features you want as third party libraries. This way your app becomes highly customized towards your use, rather than you needing to bend a framework to work the way you want it too.

 

Why you should get started on that side project now !

All programmers ( code carpenters, developers, software engineers, or what ever you like to go by) should constantly be making things, on the side, for free.

Why would anyone do something for free, when they could get paid for it ? Well what if I said that it could lead to more money down the line ? Would you do it ? And if you did do it, would you only do it for the money ?

Now if you became something for purely the purposes of making money, then …. well …. you are probably gonna have a pretty unfulfilling life, full of big houses, and fancy cars … maybe. However I think people should do things because it satisfies them. Doing something that satisfies you feeds you “soul” .

As a code carpenter I make things for a living, that help people out. And in general I like making things in any medium. So I frequently go out of my way to make alot of things, that kind of sort of have a purpose. By doing this I get a better understanding of different things, and develop experiences that will help me build things in the future.

 

WTF is Bubble Sort ?

Bubble sort is a type of sorting algorithm. Its not the best for everything but it does have its uses. It gets its name since it makes the largest values “bubble” to the top. Bubble sort does its job by using pair comparisons. So it takes the current value and compares it to it’s neighbor directly to the right.

Below you can see what I mean. We first take 5 and compare it 3, then ask the question “is 5 greater then 3 ? ” Then if it is, 5 switches places with 3 and if it isn’t then they stay in the same place, and move on to the next value.

BubbleSortEx1

Here you can see a non optimized implementation of it:

// Note this is without any optimizations
const bubblesort = (arr) => {
// We have a loop here to slowly shrink how much of the array we cover
// So that we don't constantly keep looping over the whole array
for(let i = arr.length; i > 0; i){
// This loop, covers the length of the array given to us by the
// loop above.
for(let j = 0; j < i 1; i ++){
// Here we do the comparison that asks the question
// "does the value I am currently at greater then its neighbor ?"
// if it is, then we switch the values around, if it isn't we do nothing
if(arr[j] > arr[j+1]){
let tempVal = arr[j];
arr[j] = arr[j+1];
arr[j+1] = tempVal;
}
}
}
}

view raw
BubbleSort.js
hosted with ❤ by GitHub

 

Questions People have Asked Me – Part 1

Below are some questions I was recently asked, with my answers. 

Please let me know if any of them are wrong, its a learning opportunity for me 🙂

What is the binary sort algorithm and how does it work? 

The binary sort algorithm (BSA) is used to effectively sort data. It works on the principle of continuously cutting the data set in half, until it finds what it is searching for. However, this algorithm only works on data sets that are already sorted. 

First the BSA checks the middle of the data set and compares the value it is searching for. If this value is the value it is searching for then it stops. However, if it is not equal, it checks if the value it found is bigger or smaller then the value it is trying to find. If the value is smaller, the BSA repeats the process on the left, whereas if the value is bigger it repeats the process on the right. This process is repeated multiple times until the value that the BSA is looking for is found. 

What is recursion and how is it used? 

Recursion in programming is when the program starts calling its self, from inside its self. This programming technique is usually used when a single large program can be solved in smaller parts and has a valid base case. Such as computing the Fibonacci Sequence or traversing a binary search tree. 

What is polymorphism and what is its purpose? 

Polymorphism is an aspect of Object-Oriented Programming where a “object” can take on many different forms, if all the forms are its children.  

Explain when you should use interfaces and when you should use abstract base classes. 

Both interfaces and abstract classes are a type of contract, in the class structures of a software application. Interfaces are a form of contract between two different entities, where you want to separate the functions from the implementations. This is done such that you existing application does not need to change much if a certain part of it is changed. This can be seen in the commonly used repository pattern, which is used to separate data access logic from the business logic. Where as abstract classes are a less extreme version of interfaces, where certain methods defined in it can have real implementations. This allows any child class that inherits from the abstract class to get those method implementations. The difference between the two can be further seen in how their child methods are derived. Since interfaces are “implemented”, where as abstract classes are “extended”. 

When should you use static methods and static variables? And when shouldn’t you use them. 

Static methods and variables can be used from a class without having to instantiate it. This is usually used when, you want to group a set of functionality or utility function together. An example of this is the “Math” class in JAVA, which gives the user all the math related function they need. You wouldn’t want to use them when you would be creating your inheritance-based class structures, most of the time. 

 

Write a SQL statement to create a table called “author” with the columns “id”, “name”, “age” (for MySQL or SQL Server). 

CREATE TABLE author ( 

id int NOT NULL AUTO_INCREMENT, 

name varchar(255) NOT NULL, 

age int, 

PRIMARY KEY (id) 

); 

Write a SQL statement to create a table called “book” with the columns “id”, “title”, “genre”, “author_id” (for MySQL or SQL Server). 

CREATE TABLE book ( 

id int NOT NULL AUTO_INCREMENT, 

title varchar(255) NOT NULL, 

genre varchar(255), 

author_id int, 

PRIMARY KEY (id), 

FOREIGN KEY (`author_id`) REFERENCES `author` (`id`) ON DELETE CASCADE); 

 

For the “author” and “book” tables created above, write a SQL statement to tell you the number of books each author has written, but only for authors who have written 2 or more books. The output should not show authors that have written only 1 book. The output should have the author’s name and the number of books they have written. 

SELECT author.name, COUNT(*) AS ‘# books’ FROM author, book WHERE author.id = book.author_id GROUP BY author.name HAVING COUNT(*) > 1; 

 

In databases, what are indexes used for and how to you decided how to use them effectively. 

Indexes in databases are used to speed up data retrieval. However, they come at the additional cost of space, and added complexity to database maintenance. They should only ever be used when the same type, or group of data is constantly being accessed. If the number of reads get even larger, there should also be some sort of caching layer the application queries, such that it doesn’t need to query the SQL database directly. 

 

What is the value of unit testing and what are some of your strategies for writing good unit tests? 

Unit testing is used to test the functionality of the different parts of an application. Its value lies in the fact that they make the programmer, test their code in a systematic way. And feeds into a workflow where tests are run before anything gets committed to the master branch. I think the best way to write a test case, is to write the test before writing the application logic, since it gets you thinking about what edge/special cases to consider. This is also known as the test-driven development approach. 

WTF is are Stacks, Queues, and Deques ?

Stack

A stack is a stack of books, or a stack of sandbags or a stack of elephants or even a stack of unicorns. Basically a stack is anything that follows LIFO (Last In First Out), which means that if the last thing you put onto the stack is the first thing you have to take out, then its a stack.

Lets say you build a tower of blocks, that tower is a stack. Why ? Cause to get to the block at the base you need to take off all the other blocks. Below you will see a visual example of what I am talking about.

stackexample

 

Queue

A Queue in programming is the same thing as the Queue while in line to buy food, or go to a movie, or get into a night club, its a first come first serve basis. Meaning the first thing to get out of a Queue was the first thing to go into the Queue.

You can also think of Queues as pipes that transport things. In the case of plumbing, the steel pipe and the water moving through it the Queue. Since the water that first enters the pipe is the water that first leaves the pipe.

Below is a diagram to show what I am talking about:

queuesexample

 

Deques

Deques are like the cooler older brother of the Queue, since it lets things flow in more then one direction. Now the user of the data structure can choose how they take elements out of it. If you understood how Queues worked, then you should be able to understand the diagram below.

dequesexample