Akash's Blog

Monday, January 20, 2025

Clone all GitHub Repositories of the User

Cloning all of a user's repositories is not something you need very often, but it comes in handy for personal repositories because I don't have to clone each one individually. Similarly, when working with an organization, it saves the time spent cloning repositories one by one. You will need the required access and permissions on the account or organization, since we are going to use a personal access token to clone the repositories.

Generate Access Token

The first step is to create a personal access token that allows us to access our repositories through the API. We can generate the token from Settings > Developer settings > Personal access tokens. Here we are going to use a classic token for simplicity. If private repositories should be included, the classic token needs the repo scope.















Run script to Clone

The following script can be used to clone all the repositories. It uses the jq command to process the response of the GitHub API, so install jq first if you don't have it already. The basic idea is to call the GitHub API, fetch the list of repositories page by page, and clone each one into the given directory. Update the GitHub username and token in the script and execute it.

#!/bin/bash
GITHUB_USERNAME=""
GITHUB_TOKEN=""
# GITHUB_ORG=""

API_URL="https://api.github.com/user/repos"
# For organization use following
# API_URL="https://api.github.com/orgs/$GITHUB_ORG/repos" 
PAGE=1
PER_PAGE=100

mkdir -p allrepos
cd allrepos || exit

while :; do
  REPOS_TO_CLONE=$(curl -s -u "$GITHUB_USERNAME:$GITHUB_TOKEN" "$API_URL?per_page=$PER_PAGE&page=$PAGE" | jq -r '.[].clone_url')
  
  # Stop once a page comes back empty
  if [ -z "$REPOS_TO_CLONE" ]; then
    break
  fi

  echo "$REPOS_TO_CLONE" | while read -r repo; do
    git clone "$repo"
  done

  PAGE=$((PAGE + 1))
done

echo "All repositories are cloned successfully!"

This script may take a while depending on the number of repositories the user or organization has. You will see the success message once all repositories are cloned.

Monday, January 13, 2025

How Stack Overflow Helped Me

Introduction

Stack Overflow has been part of every programmer's day-to-day work. Whether it is finding the solution to an error, looking for the best approach to a problem, or just helping fellow programmers on the internet, Stack Overflow is an excellent website.

In an era where internet search has two poles, ChatGPT and Google Search, it may seem that Stack Overflow is losing its charm. I have been a user for a decade and I really enjoy reading questions and answers. I don't contribute as actively as I used to, but I still find it truly valuable for learning and understanding concepts.

Having said that, the Stack Overflow community has been great company to keep, and it helps you keep learning continuously.

How did it help me?

Stack Overflow is the reason I managed to grab my first job. I knew nothing and I know nothing. When I signed up, I made all the mistakes that any new member would make: answering questions without clear explanations, voting and flagging impulsively, focusing on gaining reputation rather than helping people, and so on.

However, I stuck with it and slowly kept improving the way I answered questions, focusing primarily on helping instead of answering just for the sake of gaining points. In the long term it helped me in the following ways, and it will most probably do the same for anyone who actively contributes for a year or so:

  • I started paying much more attention to the problem rather than solution.
  • I understood the value of community and open source.
  • I learned to explain things better.
  • I gained confidence to build things on my own.

Initially you may feel bullish when you focus on reputation and badges, but gradually you realise that quality answers and genuine help take you further than reputation and badges do. That doesn't mean reputation and badges are useless; it simply means they should not be your driving force for contributing to Stack Overflow.

Where am I currently?

I no longer actively contribute to Stack Overflow, but I regularly read interesting questions, drop in comments, help the community in any way I can, and even answer a question when I feel like it.

Following is my Stack Overflow flair,


profile for Akash Thakare at Stack Overflow, Q&A for professional and enthusiast programmers

Is it still worth it?

Anyone who is into programming should contribute to it. You can learn directly from the experts and get into discussions with them through chat, comments, questions and even answers. It surely enhances your knowledge. You don't need to invest a whole day; a couple of hours of contribution a week can make a huge difference to your overall programming skill, which I can vouch for. You just need to stay focused on your area of interest over a long period rather than reading anything and everything.

Summary

Stack Overflow is an exceptional place to be if you are in programming and software development in general, to stay connected and updated about the tools and technologies that are making a difference to the world. Do not rush to gain reputation and badges; stay aligned on the path of help and support. ♥️

Thursday, December 26, 2024

AWS Cloud Cost Optimization

Introduction

AWS is one of the biggest cloud players and has captured more than 31% of the cloud services market. The cost of its services is mostly usage based: the more you use, the more you pay. As individuals or organizations, we need to understand how far we can optimize cost on AWS, because there can be multiple ways to achieve the desired result and we are specifically looking for the most cost-effective one.

In many cases we can save cost by taking the necessary actions. However, it’s important to understand that saving cost should not be the first focus. It can come along the way while building the system, or once it’s ready and we have some dedicated time to attend to the cost-related aspects.

Pricing in AWS

This is what AWS says about pricing,

You pay for the services you need, for as long as you use them, without complex licensing.

We should understand the “actual” need and the “probable” usage before deciding on the components and their capacity. However, many times these are two unknowns that we cannot confidently state from the start. We can start with minimal infrastructure and make it flexible enough to scale, which is usually doable in a cloud environment.

High cost is not a big challenge if the usage also brings high value and returns for the business, which is usually the case: the more the usage, the more business value it should deliver at the end of the day.

We can also calculate the price beforehand using AWS Pricing Calculator.

Ways to save money

There are several ways to cut down the overall cost. Not all of the options will work for you, but based on your current structure you can take up one or two that can save you some bucks.

When it comes to cost, we should also consider what we pay the person who takes care of the optimisation. If someone on the team takes that responsibility and spends 20 hours to cut $1000 per month, and that person is paid $30/hour, the one-time cost is $600 against a recurring saving of $1000 every month, which is still a good deal.
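The arithmetic behind that trade-off can be sketched in a few lines of Java. The figures are the ones from the example above; the class and method names are made up for illustration.

```java
// Hypothetical helper: names are illustrative, figures come from the example above.
class OptimisationPayback {

    // One-time cost of the engineer's time, in dollars.
    static double effortCost(double hours, double hourlyRate) {
        return hours * hourlyRate;
    }

    // Months of savings needed to recover the one-time effort cost.
    static double paybackMonths(double hours, double hourlyRate, double monthlySaving) {
        return effortCost(hours, hourlyRate) / monthlySaving;
    }

    public static void main(String[] args) {
        double cost = effortCost(20, 30);             // 20 hours at $30/hour
        double months = paybackMonths(20, 30, 1000);  // saving $1000 per month
        System.out.printf("Effort cost: $%.0f, recovered in %.1f months%n", cost, months);
    }
}
```

With these inputs the effort pays for itself in well under a month, and every month after that is pure saving.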

Service

Choosing the right service for the job is very important when it comes to cost. AWS often offers several services for the same sort of job. For example, both S3 and EFS can be used for storage and both may fulfil the requirement. However, the cost difference is significant: 1 GB of storage can cost around $0.03/month on S3 and around $0.36/month on EFS (in US East - N. Virginia).

Of course, this doesn’t mean you must choose S3 over EFS; both have their own use cases. We need to understand which one fits and at least review the cost before deciding on the service to use.
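To put the per-GB storage prices in perspective, here is a small Java sketch that scales them to a larger volume. The rates are the approximate figures quoted in this post, not an official price list, and the 500 GB volume and all names are assumptions for illustration.

```java
// Illustrative only: rates are the approximate per-GB figures quoted in this post,
// and the 500 GB volume is a made-up example.
class StorageCostSketch {

    static final double S3_PER_GB_MONTH = 0.03;
    static final double EFS_PER_GB_MONTH = 0.36;

    // Monthly storage cost in dollars for the given volume and per-GB rate.
    static double monthlyCost(double gigabytes, double ratePerGbMonth) {
        return gigabytes * ratePerGbMonth;
    }

    public static void main(String[] args) {
        double gb = 500;
        double s3 = monthlyCost(gb, S3_PER_GB_MONTH);
        double efs = monthlyCost(gb, EFS_PER_GB_MONTH);
        System.out.printf("S3: $%.2f, EFS: $%.2f, difference: $%.2f per month%n",
                s3, efs, efs - s3);
    }
}
```

At 500 GB the gap is roughly $15 versus $180 a month, which is why the choice deserves at least a quick review. Request and transfer charges are ignored here.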

Capacity

The capacity of the instances we use in our infrastructure makes a huge difference to the bill. We can typically gauge the machine requirements early, based on the processes we plan to run on these machines. We usually know beforehand whether the workload is computation heavy or memory heavy, and approximately how much primary and secondary storage is required.

There are multiple instance types available; we need to review our needs and find the most suitable one with the required configuration.

Details of all instance types can be found here.

Even though I specifically mentioned EC2, the same applies to RDS as well. A similar strategy can be applied when deciding the S3 storage tier, and even the lifecycle rules that move objects between tiers to save cost.

We can also review this in AWS Cost Explorer which can help in right sizing the instances.

Reserve

Choose reserved instances over on-demand instances if we are sure we will need them for at least a year or even longer. Reserved instance pricing is a commitment for a fixed term rather than purely usage based, while on-demand instances are billed on usage. Ultimately, you can end up saving up to 70% on a 3-year term.

If we choose a reserved t3.large instance for three years instead of an on-demand instance, the monthly expense reduces by about 50%. For high-end instances this difference can increase up to 75%.

Alerts

We can enable billing alerts based on billing metrics so that we know in advance about the likely cost for a given month. With an early estimate we can prepare, and even take action to reduce the cost if possible.

Auto Transition

AWS offers different options to save money by shifting data to a tier that matches its usage. S3 Intelligent-Tiering is a good example: it monitors access patterns and automatically moves each object to the most cost-effective access tier. This will not necessarily fit every use case, but we should explore such options for each service so that we don’t miss an opportunity to save money.

Review & Clean Up

Regularly reviewing and cleaning up unused resources may look unproductive or boring, but a 30-minute catch-up every month or quarter can definitely help save a lot of money. Work pressure or a busy schedule may leave behind unused EC2 machines, volumes, buckets, etc., which we should regularly keep in check.

Alternative Cloud

This is not always an option, but nowadays the services of most major cloud providers are durable and robust. If cost is a major factor, don’t stick to one cloud provider; explore alternatives that can be a huge saving in the long run, provided quality and performance are not compromised.

Conclusion

In the current situation of booming cloud services and a competitive environment between cloud providers, we should be vigilant about the different factors in the services we consume and their long-term repercussions when it comes to cost. There are several ways to cut down cost as much as possible without compromising on the quality and performance of the system. Being ignorant about cost and pricing, however, may end up wasting money that could have been utilised better.

Saturday, October 19, 2024

Observer Pattern

The observer pattern is widely used in many real-time applications. It is one of the behavioural design patterns. In this post we will try to understand it and answer some questions around it.

Let’s understand it with the example of a stock market.

The basic idea is to observe something and get notified when some condition is met, so that the observer can take the necessary action.

The following Java program makes it a bit clearer how this design pattern fundamentally works.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Stock {
    String name;

    double price;

    public Stock(String name, double price) {
        this.name = name;
        this.price = price;
    }
}

class StockMarket {

    private List<Stock> stocks;

    private List<StockObserver> observers;

    StockMarket() {
        observers = new ArrayList<>();

        stocks = new ArrayList<>();
        stocks.add(new Stock("Apple", 1000.23));
        stocks.add(new Stock("Google", 1000.23));
        stocks.add(new Stock("Microsoft", 1000.23));
        stocks.add(new Stock("Facebook", 1000.23));
        stocks.add(new Stock("Tesla", 1000.23));

    }

    public void start() throws InterruptedException {
        Random random = new Random();

        while(true) {
            Thread.sleep(1000);
            Stock stock = stocks.get(random.nextInt(stocks.size()));
            stock.price = random.nextDouble();
            observers.forEach(o -> o.movement(stock));
        }
    }

    public void register(StockObserver observer) {
        this.observers.add(observer);
    }

}

interface StockObserver {

    void movement(Stock stock);

}


class Trader implements StockObserver {

    @Override
    public void movement(Stock stock) {
        System.out.printf("%s stock price %f\n", stock.name, stock.price);
    }
}

Note that the StockMarket is completely unaware of the concrete observer. Any class can be an observer; it does not have to be a Trader. However, every Trader is a StockObserver because it implements the interface.

Consider a scenario where a Trader is interested in a specific stock and not all stocks. We can think of two options here,

  1. StockMarket can register Trader for specific Stock.
  2. Trader can act on specific Stock and ignore the other movements.

Which option is more suitable here and why?

The first option may be preferable, as it stops the flow of unnecessary information to observers that are not interested in it and saves function calls.
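A minimal sketch of the first option, assuming observers subscribe by stock name. The class, interface and method names here are illustrative and are not part of the program above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumption: observers register for one stock name at a time.
class SelectiveStockMarket {

    // Stock name -> observers interested in that stock only.
    private final Map<String, List<PriceObserver>> observersByStock = new HashMap<>();

    public void register(String stockName, PriceObserver observer) {
        observersByStock.computeIfAbsent(stockName, k -> new ArrayList<>()).add(observer);
    }

    // Called whenever a price changes; only matching observers are invoked.
    public void priceChanged(String stockName, double price) {
        for (PriceObserver o : observersByStock.getOrDefault(stockName, List.of())) {
            o.movement(stockName, price);
        }
    }

    public static void main(String[] args) {
        SelectiveStockMarket market = new SelectiveStockMarket();
        market.register("Apple", (name, price) -> System.out.println(name + " moved to " + price));
        market.priceChanged("Apple", 1001.5); // the Apple observer is notified
        market.priceChanged("Tesla", 250.0);  // nobody registered, no calls made
    }
}

// Illustrative observer type, standing in for StockObserver.
interface PriceObserver {
    void movement(String stockName, double price);
}
```

The filtering cost moves into a map lookup inside the market, so uninterested observers are never called at all.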

What if StockMarket is overloaded with lot of observers?

Considering the example, there can be a lot of Traders observing the stock market who want real-time information. Notifying every observer synchronously for each and every stock movement is not likely to scale. It may work, but it may end up with higher latency.

What if one of the Observer call fails!?

All the subsequent observers should still get their notifications. In the above example, if any observer throws an exception, the remaining observers are not notified. We need to handle the failure per observer, or check whether an observer is still active before notifying it.
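One possible way to keep a failing observer from affecting the others is to catch exceptions per observer. The following is only a sketch; SafeNotifier and its Listener interface are hypothetical names that stand in for StockMarket and StockObserver.

```java
import java.util.ArrayList;
import java.util.List;

class SafeNotifier {

    // Illustrative observer type, standing in for StockObserver.
    interface Listener {
        void movement(String stockName, double price);
    }

    private final List<Listener> listeners = new ArrayList<>();

    public void register(Listener listener) {
        listeners.add(listener);
    }

    // Notifies every listener and returns how many of them threw.
    public int notifyAllListeners(String stockName, double price) {
        int failures = 0;
        for (Listener l : listeners) {
            try {
                l.movement(stockName, price);
            } catch (RuntimeException e) {
                // One failing observer must not stop the rest.
                failures++;
                System.err.println("Observer failed: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        SafeNotifier notifier = new SafeNotifier();
        notifier.register((s, p) -> { throw new IllegalStateException("broken observer"); });
        notifier.register((s, p) -> System.out.println(s + " update received"));
        int failures = notifier.notifyAllListeners("Apple", 1000.23);
        System.out.println("Failures: " + failures);
    }
}
```

Whether a failing observer should be logged, retried or unregistered depends on the system; the sketch only counts failures.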

Should StockMarket be responsible for notifying observers?

That’s a valid question. The responsibility can be moved to a separate class that filters events and notifies the respective observers.

This is a pretty simple implementation of the observer pattern. We may need to look into other aspects of the system to deal with the issues arising in the current design. For example, rather than notifying immediately, we can build a queue to fan out the updates. But that’s not the solution for all use cases. We need to take a holistic view of the business and system requirements to decide on the approach.
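The queue idea could look roughly like the sketch below, which hands notifications to a single-threaded executor so the market itself never waits for observers. This is one assumption about how such a queue might be built, with made-up names; a real system may need back-pressure, multiple workers or a message broker instead.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class QueuedStockMarket {

    // Illustrative observer type, standing in for StockObserver.
    interface Listener {
        void movement(String stockName, double price);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    // A single worker thread drains the executor's internal queue in submission order.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    public void register(Listener listener) {
        listeners.add(listener);
    }

    // Returns immediately; the fan-out happens later on the dispatcher thread.
    public void priceChanged(String stockName, double price) {
        dispatcher.execute(() -> listeners.forEach(l -> l.movement(stockName, price)));
    }

    // Stops accepting events and waits for queued notifications to be delivered.
    public boolean shutdown(long timeoutSeconds) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        QueuedStockMarket market = new QueuedStockMarket();
        market.register((s, p) -> System.out.println(s + " -> " + p));
        market.priceChanged("Apple", 1000.23);
        market.priceChanged("Google", 999.10);
        market.shutdown(5);
    }
}
```

The trade-off is that observers now see updates slightly later than they happen, which may or may not be acceptable for real-time use.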

Summary

The observer pattern is a behavioural pattern which can be used to notify one or more observers of a subject whenever its state changes. It’s crucial to understand the scalability and performance concerns, and the heavy dependency on the observer interface, before adopting this pattern for a large-scale use case.

Saturday, September 28, 2024

Platform vs. Virtual Threads

This blog post is a raw comparison of platform and virtual threads. We will use both types of threads and see how they behave and perform.

It is important to note that the execution time and other metrics may vary from system to system; however, they can definitely give a rough idea of the overall picture. We will try to keep this comparison as fair as possible. If I fail to do so, feel free to post a comment below.

My System

  • MacBook Pro
    • RAM 16 GB
    • Processor 2.6 GHz 6-Core Intel Core i7
  • Java 21

Time to start

Platform Threads : 21 ms

private static void platform() {
    long start = System.currentTimeMillis();
    for(int i = 0; i < 100; i++) {
        Thread t = new Thread(() -> System.out.println(Thread.currentThread().getThreadGroup().getName()));
        t.start();
    }
    long end = System.currentTimeMillis();
    System.out.println("Platform : " + (end - start) + " ms");
}

Virtual Threads : 18 ms

private static void virtual() {
    long start = System.currentTimeMillis();
    for(int i = 0; i < 100; i++) {
        Thread.startVirtualThread(() -> System.out.println(Thread.currentThread().getThreadGroup().getName()));
    }
    long end = System.currentTimeMillis();
    System.out.println("Virtual : " + (end - start) + " ms");
}

For 100 threads the difference doesn’t look significant. However, if I increase the count of threads from 100 to 1000, the difference increases drastically.

  • Platform Threads : 235 ms
  • Virtual Threads : 29 ms

Reason

Even though the time taken to start the threads is different, the underlying operation would have taken the same amount of time. Virtual threads enable better utilisation of resources compared to platform threads (a virtual thread actually runs on a platform thread internally).

The major difference is that each platform thread is backed by an OS thread, which is relatively expensive to create, and a blocking call holds that OS thread for the whole wait. Virtual threads are managed by the JVM: they are cheap to create, and when one blocks, the JVM unmounts it from its carrier platform thread so the carrier can run other virtual threads. So a virtual thread may wait for a platform thread to be available, but it won’t block the execution of the program.
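To see that behaviour with a blocking workload, here is a minimal sketch (assuming Java 21) that runs many sleeping tasks, one virtual thread per task. The class and method names and the task counts are made up for illustration, and the timing will vary by machine.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingTasksSketch {

    // Runs `tasks` tasks that each sleep for `sleepMillis`, one virtual thread per task.
    // Returns the number of tasks that completed.
    static int runOnVirtualThreads(int tasks, long sleepMillis) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of try-with-resources waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(sleepMillis); // blocks this virtual thread only
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runOnVirtualThreads(10_000, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

Because sleeping virtual threads release their carrier threads, the total time should stay far below the 10,000 x 100 ms it would take to run the sleeps one after another.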

Speed

Platform : 24 ms

private static void platform() {
    Thread t = new Thread(() -> {
        long start = System.currentTimeMillis();
        for(int i = 0; i < 1000; i++) {
            System.out.println(Thread.currentThread().getThreadGroup().getName());
        }
        long end = System.currentTimeMillis();
        System.out.println("Platform : " + (end - start) + " ms");
    });
    t.start();
}

In the case of the virtual thread, there is no output in the console. It looks like nothing was printed because the thread didn’t get to run.

private static void virtual() {
    Thread.startVirtualThread(() -> {
        long start = System.currentTimeMillis();
        for(int i = 0; i < 1000; i++) {
            System.out.println(Thread.currentThread().getThreadGroup().getName());
        }
        long end = System.currentTimeMillis();
        System.out.println("Virtual : " + (end - start) + " ms");
    });
}

It seems we need to use join here.

private static void virtual() throws InterruptedException {
    Thread.startVirtualThread(() -> {
        long start = System.currentTimeMillis();
        for(int i = 0; i < 1000; i++) {
            System.out.println(Thread.currentThread().getThreadGroup().getName());
        }
        long end = System.currentTimeMillis();
        System.out.println("Virtual : " + (end - start) + " ms");
    }).join();
}

And it worked, but took comparatively more time than platform thread, almost double.

Virtual : 49 ms

Let’s try to be fair here,

private static void platform() throws InterruptedException {
    Thread t = new Thread(() -> {
        long start = System.currentTimeMillis();
        for(int i = 0; i < 1000; i++) {
            System.out.println(Thread.currentThread().getThreadGroup().getName());
        }
        long end = System.currentTimeMillis();
        System.out.println("Platform : " + (end - start) + " ms");
    });
    t.start();
    t.join();
}

Platform : 26 ms

For some reason, platform thread is winning here.

I thought of replacing System.out.println(Thread.currentThread().getThreadGroup().getName()); with System.out.println("."); to keep it simpler.

If we run both methods together from the main method,

public class Test {

    public static void main(String[] args) throws InterruptedException {
        platform();
        virtual();
    }

    private static void virtual() throws InterruptedException {
        Thread.startVirtualThread(() -> {
            long start = System.currentTimeMillis();
            for(int i = 0; i < 1000; i++) {
                System.out.println(".");
            }
            long end = System.currentTimeMillis();
            System.out.println("Virtual : " + (end - start) + " ms");
        }).join();
    }

    private static void platform() throws InterruptedException {
        Thread t = new Thread(() -> {
            long start = System.currentTimeMillis();
            for(int i = 0; i < 1000; i++) {
                System.out.println(".");
            }
            long end = System.currentTimeMillis();
            System.out.println("Platform : " + (end - start) + " ms");
        });
        t.start();
        t.join();
    }
}

Multiple executions,

  • Platform : 19 ms and Virtual : 15 ms
  • Platform : 21 ms and Virtual : 24 ms
  • Platform : 21 ms and Virtual : 16 ms
  • Platform : 22 ms and Virtual : 14 ms
  • Platform : 22 ms and Virtual : 29 ms

After changing the sequence of methods the result changes dramatically,

...
public static void main(String[] args) throws InterruptedException {
    virtual();
    platform();
}
...

Multiple executions,

  • Platform : 17 ms and Virtual : 60 ms
  • Platform : 7 ms and Virtual : 35 ms
  • Platform : 10 ms and Virtual : 47 ms
  • Platform : 9 ms and Virtual : 35 ms
  • Platform : 6 ms and Virtual : 30 ms

Okay, a lot of confusion here. Let’s find the answers one by one.

Reason

Why couldn’t the virtual thread execute?

In the first case the problem is that the main thread exits before the virtual thread completes its execution. Virtual threads are always daemon threads, so the JVM does not wait for them before shutting down. We can join the virtual thread from the main thread, which is what we did. Keeping the main thread alive long enough also lets the virtual thread finish.

public static void main(String[] args) throws InterruptedException {
    virtual();
    Thread.sleep(50000);
}

This gives Virtual : 33 ms
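The reason the JVM does not wait for the virtual thread can be checked directly: virtual threads are always daemon threads. A small sketch, assuming Java 21, with an illustrative class name:

```java
class DaemonCheck {

    // Virtual threads always report themselves as daemon threads.
    static boolean virtualThreadIsDaemon() {
        Thread virtual = Thread.ofVirtual().unstarted(() -> {});
        return virtual.isDaemon();
    }

    public static void main(String[] args) {
        Thread platform = new Thread(() -> {});
        System.out.println("virtual daemon?  " + virtualThreadIsDaemon());
        // A platform thread inherits the daemon flag of the thread that creates it;
        // the main thread is a non-daemon thread.
        System.out.println("platform daemon? " + platform.isDaemon());
    }
}
```

A non-daemon platform thread keeps the JVM alive until it finishes, which is why the first platform example printed its output without a join.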

Why did the platform thread execute faster with join?

In the case of virtual threads, the JVM has to do additional work underneath, which leads to a slight delay in execution. Even though it’s a single thread that we join from main, it took more time to finish than the platform thread. This overhead may include things like parking and unparking the task and scheduling it onto a carrier thread. With only one thread and a short task there is little for a virtual thread to gain, so mostly the overhead shows up.

How does the sequence of methods impact the execution time?

Whichever method runs first likely pays one-time start-up costs as well: classes are loaded, System.out is used for the first time and, for virtual threads, the scheduler is set up, all before the JIT compiler has warmed up the code. That would explain why the virtual thread looks much slower when it runs first and why the platform thread gets faster when it runs second. Since the task is pretty simple the absolute numbers are small, but such differences can puzzle a developer if they are not understood properly.
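One way to reduce the effect of execution order is to run a few warm-up rounds and discard them before measuring. The sketch below assumes Java 21 and uses made-up names; unlike the methods above it measures around start and join, so it includes thread start-up time. For serious measurements a benchmark harness such as JMH is the usual choice.

```java
class WarmedUpComparison {

    // Prints `lines` dots on a virtual or platform thread and
    // returns the elapsed time in microseconds, including start and join.
    static long timeTask(boolean useVirtual, int lines) {
        Runnable work = () -> {
            for (int i = 0; i < lines; i++) {
                System.out.println(".");
            }
        };
        long start = System.nanoTime();
        Thread t = useVirtual ? Thread.ofVirtual().start(work) : Thread.ofPlatform().start(work);
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        // Warm-up rounds: results are discarded so one-time costs are paid up front.
        for (int i = 0; i < 5; i++) {
            timeTask(true, 1000);
            timeTask(false, 1000);
        }
        long virtual = timeTask(true, 1000);
        long platform = timeTask(false, 1000);
        System.out.println("Virtual : " + virtual + " us, Platform : " + platform + " us");
    }
}
```

After warm-up, swapping the order of the two measured calls should change the numbers much less than it did in the runs above.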
