Which Cloud Storage To Use?

Part 2: Amazon Web Services

Vasko Viktorov is part of ROITI’s Cloud Software Engineering Team and, in a series of three articles, he is taking a peek into each of the three most popular cloud providers – Azure, AWS and Google. In the previous article, he looked into Azure’s common storage options. If you missed it and want a full picture of the storage options available on the market, check it out while it’s hot. If you have already seen it, or simply aren’t interested in Azure, read on – we are moving to the next provider on the list: Amazon and its product, Amazon Web Services (AWS).

Amazon Web Services

Amazon Web Services was founded in 2004 as part of Amazon.com, but it has been functioning as its own company since November 2012, and only four years were needed for it to become the market leader. It has held that position since 2016, with a 34% market share as of Q3 2022.

As one can guess, being the market leader for that long means setting the market standards. As such, AWS offers an extensive line of storage options that should meet most, if not all, requirements. Each of these storage options can be accessed easily via the AWS Management Console website or the Command Line Interface (CLI).

As expected, there are also ready-to-use developer packages (SDKs) for different programming languages such as .NET, C++, Go, Java, JavaScript, Kotlin, PHP, Python, Ruby, and Swift. This makes communication, configuration, and usage a simple task, because there’s no need to invest in new personnel with special expertise or to learn a new language.
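To illustrate how little ceremony the SDKs require, here is a minimal sketch using the Python SDK (boto3). It assumes AWS credentials and a default region are already configured (for example via ‘aws configure’); only standard boto3 client calls are used.

```python
# Minimal boto3 sketch: one line per service client; the same pattern
# works for S3, EC2, EFS and the other services discussed below.
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# List the S3 buckets and EC2 reservations visible to the configured account.
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
print(ec2.describe_instances()["Reservations"])
```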

As we can see, there’s a lot of flexibility and plenty of options, so let’s check them out.

Elastic Compute Cloud (EC2) was one of the first services provided by AWS, back in early 2006. It is a virtual machine service maintained by AWS that provides virtual computing environments (instances), which can be launched from preconfigured templates (ready-to-use software images) and with various preset hardware configurations (depending on the instance type).

Although EC2 is not a storage service itself, it has a storage feature that allows the instance to be set as:

  • Temporary storage volume – persists data as long as the instance is not stopped, hibernated, or terminated. If the storage is mounted on the same instance as the application that uses it, it can offer extremely high I/O speed, because they share the same physical host. This storage is perfect for application caches or buffers.
  • Persistent storage volume – persists data on Elastic Block Store, which we will check out next.

Elastic Block Store (EBS) volumes behave like raw, unformatted block devices (virtual hard drives) that can be mounted on EC2 instances. The size limitations of EBS depend on the number of volumes (SSD/HDD) mounted on the EC2 instance and on the instance type – different instance types have different volume limits – but in most use cases you shouldn’t be concerned about hitting them. EBS delivers high I/O performance, availability, and durability. It is designed as fast and reliable block storage for a single instance, although there is a Multi-Attach exception that applies only in very specific scenarios.

It is best used for demanding workloads where long-term data persistence is required and there is a high frequency of random read/write operations. Also, keep in mind that you pay per provisioned GB per month, plus for input/output operations beyond the set baseline.
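As a rough illustration, here is a hedged boto3 sketch of provisioning a volume and attaching it to an instance; the Availability Zone, size, and instance ID are placeholders, and in practice you would also handle errors and tagging.

```python
# Sketch: provision a 100 GiB gp3 volume and attach it to an instance.
import boto3

ec2 = boto3.client("ec2")

volume = ec2.create_volume(
    AvailabilityZone="eu-central-1a",  # must match the target instance's AZ
    Size=100,                          # GiB - this is what you pay for monthly
    VolumeType="gp3",                  # general-purpose SSD with a baseline IOPS budget
)

# Wait until the volume is ready, then attach it to a (hypothetical) instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```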

Elastic File System (EFS) is a managed Network File System (NFS) that provides serverless, fully elastic file storage where you can share files without provisioning or managing storage capacity and performance. It is designed for use across different instances and Availability Zones. EFS storage is practically unlimited; the only notable limitation is the maximum single-file size of 52.7 TB, which is quite big.

It is best used for demanding workloads – the same as EBS – but in cases where the storage needs to be shared between more than one instance and used across different Availability Zones.

Simple Storage Service (S3), whose containers are known as buckets, is cloud object storage that provides high accessibility, scalability, reliability, and flexibility. The storage size is practically unlimited; the only limitation is the object size, which is capped at 5 TB.
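Working with S3 from code is equally simple. Below is a minimal, hedged boto3 sketch of uploading and downloading an object; the bucket name is hypothetical and must already exist in your account.

```python
# Sketch: put an object into S3 and fetch it back.
import boto3

s3 = boto3.client("s3")
bucket = "roiti-example-bucket"  # placeholder name

# Upload a local file; the key is the object's name within the bucket.
s3.upload_file("report.pdf", bucket, "reports/2022/report.pdf")

# Download it again to a new local path.
s3.download_file(bucket, "reports/2022/report.pdf", "report_copy.pdf")
```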

There are seven commonly used S3 storage classes, each optimized for a different scenario. They are listed below roughly in order of expected access frequency, from most frequent to most infrequent:

S3 Standard is the default storage class. It is best used for frequently accessed data. It has the highest storage cost of the classes listed here, but no retrieval fees.

S3 Standard-Infrequent Access (Standard-IA) is designed for long-lived, infrequently accessed data that is stored across multiple Availability Zones, which makes it resilient to the loss of a zone. It is best used for backups and infrequently accessed data that still requires millisecond access. Its storage cost is lower than S3 Standard’s, but retrievals incur a per-GB fee.

S3 One Zone-Infrequent Access (One Zone-IA) has the same use case as S3 Standard-IA but stores data in a single Availability Zone. This makes it vulnerable to the loss of that zone, but it is the cheaper option.

S3 Glacier Instant Retrieval is used for rarely accessed objects that still require millisecond retrieval. It has a lower storage cost but a higher retrieval cost.

S3 Glacier Flexible Retrieval is used for rarely accessed, long-term data that does not require immediate access. It has a minimum storage duration of 90 days. Free retrieval takes 5 to 12 hours, but an expedited retrieval, which takes 1-5 minutes, can be requested.

S3 Glacier Deep Archive is for long-term archiving and digital preservation, with retrieval within hours at the lowest storage cost in the cloud. It has a minimum storage duration of 180 days. The default retrieval time is 12 hours.

S3 Intelligent-Tiering is your option if none of the above hit the spot or you can’t make up your mind. It analyzes how frequently data in S3 is accessed and optimizes costs for unknown or changing access patterns (for a small additional fee, of course).
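The storage class is chosen per object, typically at upload time. Here is a hedged boto3 sketch; the bucket and key are placeholders, and the StorageClass identifiers in the comment are the API names for the classes above.

```python
# Sketch: upload an object directly into an infrequent-access class.
# Other valid StorageClass values include "STANDARD", "ONEZONE_IA",
# "INTELLIGENT_TIERING", "GLACIER_IR", "GLACIER" and "DEEP_ARCHIVE".
import boto3

s3 = boto3.client("s3")

with open("2022-12-31-backup.tar.gz", "rb") as data:
    s3.put_object(
        Bucket="roiti-example-bucket",
        Key="backups/2022-12-31.tar.gz",
        Body=data,
        StorageClass="STANDARD_IA",  # infrequently accessed, millisecond retrieval
    )
```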

Amazon File Cache is a fully managed, high-speed cache for file data processing, built on SSD storage, which makes it suited for low-latency, IOPS-intensive workloads that typically involve small, random file operations. It can be linked to on-premises file systems, AWS object storage, or AWS file systems, provided they support the NFSv3 protocol.

Amazon FSx is a fully managed native file server service, where fully managed means that hardware and software setup, maintenance, and backups are taken care of by AWS. Another plus for FSx is that it integrates easily with the other AWS services. Overall, if you are looking for a fast and easy solution that requires minimal to no code changes when migrating from on-premises to the cloud, this could be it.

There are currently four different types of FSx services:

FSx for Lustre is based on one of the popular open-source file systems – Lustre. It supports many requirements of leadership class High-Performance Computing (HPC) simulation environments. FSx for Lustre is best suited for compute-intensive workloads like machine learning and video rendering.

FSx for Windows File Server uses a native Microsoft Windows file system, built on a Windows server. It is an optimized and perfect place for Windows-based applications.

FSx for OpenZFS is based on OpenZFS, one of the popular open-source file systems used on Linux. It is an optimized and perfect place for Linux-based applications.

FSx for NetApp ONTAP is built on NetApp’s popular ONTAP file system. It is accessible from Linux, Windows, and macOS compute instances running in AWS or on-premises. It is optimized for applications based on NetApp ONTAP on-premises.

It is worth mentioning that AWS also provides a handy service called AWS Storage Gateway, which makes accessing and moving data from on-premises systems to its cloud storage services an easy task. So, if AWS is your choice, you should get familiar with it.

This concludes the common storage options that are provided by AWS. If none of them made you excited and you’re looking for something different, you can also check the database options that AWS offers for:

Relational databases: SQL, Aurora (MySQL/PostgreSQL), Oracle, MariaDB, Redshift (PostgreSQL)

Non-relational databases: Cassandra, MongoDB, Amazon Neptune, DynamoDB, Timestream

In the next part, we will check out the common storage options provided by Google.

 

Resources:

https://wire19.com/amazon-microsoft-and-google-cloud-infrastructure-market/

https://www.educba.com/aws-storage-services/

https://acloudguru.com/blog/engineering/s3-glacier-instant-retrieval-deep-dive-which-s3-storage-class-is-right-for-me

https://www.geeksforgeeks.org/what-is-aws-ec2-instance-storage/

https://www.geeksforgeeks.org/amazon-web-services-introduction-to-amazon-fsx/

https://renovacloud.com/ebs-vs-efs-which-storage-system-is-right-for-you/?lang=en

https://docs.aws.amazon.com/whitepapers/latest/aws-overview/storage-services.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

https://docs.aws.amazon.com/fsx/latest/FileCacheGuide/what-is.html

https://docs.aws.amazon.com/managedservices/latest/userguide/amz-fsx-open-zfs.html

https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/what-is-fsx-ontap.html

Can We Forecast the Air Quality in Sofia?

One of our hobbies in recent years has been trying to make something useful. Aleks Lyubenov is one of ROITI’s Data Scientists and, together with the rest of the team, has been working on an internal project on forecasting the air quality in Sofia – Bulgaria’s capital. Below, Aleks reveals some of the processes and analyses behind it, as well as what the predictions are.

Disclaimer: Buckle up for some Maths. 🙂

The issue

In recent years, air pollution has become an incredibly important topic in the realm of sustainable growth and development. Aside from the obvious environmental impact, rising emissions of air pollutants inevitably lead to higher concentrations of particulate matter in the atmosphere. This can – in turn – have a substantial effect on both local and global economies through reduced labour productivity, poor agricultural yields and an increase in healthcare expenditure. Thus, it is essential that we are able to model and accurately forecast such pollution in order to mitigate the aforementioned negative consequences.

Weather and other atmospheric phenomena are often modelled using numerical weather prediction, a method which attempts to explain the temporal evolution of atmospheric processes via the assimilation of observed variables such as cloud formation, air pressure, temperature, precipitation, and dozens, if not hundreds of other meteorological indicators. However, many of the dynamic equations that govern these processes cannot be explicitly calculated due to various numerical instabilities and are therefore approximated using parametrization schemes and computationally expensive numerical methods. Due to the extensive computational requirements of three-dimensional spatial interpolation in these frameworks, we believe it is essential to develop more lightweight models, which leverage the power of modern machine learning to model atmospheric phenomena.

Thus, we propose a data-driven approach to air quality forecasting. With over 2000 monitoring stations all over Europe, the European Air Quality Index provides a short-term indication of continental atmospheric quality using time series data containing five key pollutants:

  • PM2.5 – atmospheric particulate matter (< 2.5 𝜇m)
  • PM10 – atmospheric particulate matter (< 10 𝜇m)
  • O3 – ground-level ozone
  • NO2 – nitrogen dioxide
  • SO2 – sulphur dioxide

Of these sensors, 32 are located in Bulgaria and 5 are within Sofia:

  • 42.708, 23.31 165, Kozloduy, Banishora, zh.k. Banishora, Sofia
  • 42.726, 23.342 Rusalka, kv. Orlandovtsi, Serdika
  • 42.646, 23.381 389, zh.k. Mladost 3, Sofia
  • 42.727, 23.34 54, Zhelezopatna, kv. Orlandovtsi, Sofia
  • 42.622, 23.328 Monah Spiridon, Simeonovo – Dragalevtsi, Vitosha

In addition to particulate matter, we have considered meteorological data for the greater Sofia area. This data included indicators such as pressure, temperature, humidity, precipitation, wind speed, cloud cover and solar radiation levels, to name a few. We also decided to include temporal features such as, but not limited to, the day of the year and whether a particular day is a weekday or not. This was done to give our model an opportunity to capture any existing seasonal trends.
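A small sketch of the temporal features mentioned above, assuming the merged sensor and weather data sit in a pandas DataFrame with an hourly DatetimeIndex (the toy index below only stands in for that dataset):

```python
import pandas as pd

# A toy hourly index standing in for the merged sensor/weather dataset.
df = pd.DataFrame(index=pd.date_range("2022-01-01", periods=48, freq="H"))

df["day_of_year"] = df.index.dayofyear
df["hour_of_day"] = df.index.hour
df["is_weekday"] = (df.index.dayofweek < 5).astype(int)  # Mon-Fri -> 1, Sat/Sun -> 0
```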

The approach

During the data collection and cleaning process, one of the main challenges we encountered was varying measurement frequencies in the gathered sensor data. We observed delays ranging from just 30 seconds to a few hours in extreme cases. Consistent timesteps are obviously an essential requirement in the predictive modelling of time series data. As such, we utilized a windowed resampling technique to fill in missing values. Our approach was to use a weighted average of data points located around the interpolation target. The weight, naturally, depended on the time difference between the sampled point and the target.
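One plausible implementation of such windowed, time-weighted resampling (a sketch, not the team’s exact code) is to average, for every hourly target timestamp, the raw observations that fall within a fixed window around it, weighting each observation by the inverse of its time distance to the target:

```python
import numpy as np
import pandas as pd

def resample_weighted(series: pd.Series, freq: str = "1H",
                      window: pd.Timedelta = pd.Timedelta("2H")) -> pd.Series:
    """Resample an irregular series to fixed timesteps via inverse-time-distance weighting."""
    targets = pd.date_range(series.index.min().ceil(freq),
                            series.index.max().floor(freq), freq=freq)
    values = []
    for t in targets:
        nearby = series[(series.index >= t - window) & (series.index <= t + window)]
        if nearby.empty:
            values.append(np.nan)                       # gap too large to fill
            continue
        seconds_away = np.abs((nearby.index - t).total_seconds())
        weights = 1.0 / (seconds_away + 1.0)            # closer points weigh more
        values.append(np.average(nearby.values, weights=weights))
    return pd.Series(values, index=targets)
```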

Thereafter, we constructed a 24-hour sliding window in the time series and defined the desired output to be the first consecutive hour outside the window.
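In code, the windowing step can be sketched as follows (assuming the resampled feature matrix and the pollutant target are already aligned, hour by hour):

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray, window: int = 24):
    """features: (n_hours, n_features); target: (n_hours,). Returns X, y."""
    X, y = [], []
    for i in range(window, len(features)):
        X.append(features[i - window:i])   # hours t-24 .. t-1
        y.append(target[i])                # hour t, the first hour outside the window
    return np.stack(X), np.array(y)
```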

Although traditional time series forecasting has utilized various approaches ranging from Autoregressive Integrated Moving Average (ARIMA) models to Support Vector and Random Forest Regressors, such models fail to take long-term dependencies into account and their performance ultimately suffers due to the volatile nature of air quality data.

In order to address this issue, we turn to Recurrent Neural Networks (RNNs). This is a class of neural networks in which nodes are connected in a directed graph along the temporal dimension. Such networks make use of loops and internal states to process sequences of variable length, allowing for information persistence and the modelling of temporally dynamic behaviours, such as those we observe in air pollution datasets. At time step t, the output ht of a recurrent unit depends on the input x(t), as well as on the computation at time step t−1. Such an internal loop allows information to pass from one processing step to the next, taking previously learned information into account when identifying new patterns. Unfortunately, vanilla RNNs can’t model long-term dependencies in data due to vanishing and exploding gradients during backpropagation. This, fortunately, is a problem that isn’t shared by a very special kind of RNN called a Long Short-Term Memory network (LSTM).

In the usual LSTM diagram, the line running along the top of the cell represents the cell state and can be thought of as the main highway of information. The model can modify the cell state by adding or removing information as it trains. This process is regulated by gates, which are composed of a sigmoid activation and an elementwise vector product. In our case, these gates allow our model to keep track of the pieces of information that have a true impact on the levels of particulate matter in the atmosphere while ignoring more inconsequential variables.

There are three gates which control the flow of information in the LSTM. The first is the forget gate layer, which regulates what information needs to be thrown away from the cell state. The next is called the input gate layer, which – as its name would suggest – is responsible for deciding what new information should be included in the cell state. After the model has decided what new information to include, a hyperbolic tangent layer is responsible for creating a vector of candidate values that should be added to the cell state.

Now, the old state Ct−1 is updated into a new state Ct by multiplying the old state with the forget gate’s output (the sigmoid activation values, which decide what to discard) and adding the new candidate values (scaled by the input gate, i.e. the intensity of the update). Of course, after this update is complete, we need to output a final value. This output is a filtered version of the current cell state: a sigmoid activation decides which part of the cell state to push forward, and a hyperbolic tangent activation then squashes the cell state values between -1 and 1. This output is known as the hidden state ht and is fed back into the network at the next time step as the previous hidden state ht−1, allowing the LSTM to extract new information with the help of this clever recall mechanism.
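For reference, the standard LSTM update described above (in the common notation used in Olah’s write-up) can be written as:

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f)            && \text{forget gate} \\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i)            && \text{input gate} \\
\tilde{C}_t &= \tanh(W_C\,[h_{t-1}, x_t] + b_C)     && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    && \text{cell state update} \\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o)            && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t)                         && \text{hidden state}
\end{aligned}
```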

Based on a paper by Wang et al. from May 2022, we have modified the LSTM architecture into an ILSTM model, which consists solely of an input gate and a forget gate. The mainline forgetting mechanism remains the same; however, the new architecture incorporates the prior cell state Ct−1 into the input gate computation. In addition, the ILSTM introduces a Conversion Information Module (CIM) to prevent saturation of the sigmoid during training: CIM = tanh(It). The information which is ultimately kept in the cell state is the sum of the cell state (after the forget gate computation) and the CIM. These modifications reduce the number of weight matrices from 8 to 4 and the number of bias terms from 4 to 2.
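Read literally, the description above corresponds to something like the following sketch (our reading, not the paper’s exact formulation – in particular, how Ct−1 enters the input gate and how the hidden state ht is produced are defined in the paper and only assumed here):

```latex
\begin{aligned}
f_t &= \sigma(W_f\,x_t + U_f\,h_{t-1} + b_f)             && \text{forget gate} \\
I_t &= \sigma(W_i\,x_t + U_i\,h_{t-1} + C_{t-1} + b_i)   && \text{input gate, with the prior cell state} \\
\mathrm{CIM}_t &= \tanh(I_t)                             && \text{Conversion Information Module} \\
C_t &= f_t \odot C_{t-1} + \mathrm{CIM}_t                && \text{cell state update}
\end{aligned}
```

Written this way, only four weight matrices (Wf, Uf, Wi, Ui) and two bias vectors remain, which matches the parameter reduction mentioned above.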

Recurrent Neural Networks are not the only networks that utilize many identical neurons to model complex relationships (in this case, temporal relationships) while keeping the number of parameters in the model minimal. This weight-linking principle is also used in Convolutional Neural Networks (CNNs) to achieve a similar result.

The power of CNNs is based on the notion of a mathematical convolution, which is a way of extracting features from a signal. Formally, a convolution is an operation defined on two functions that produces a third function, expressing the effect of one function on the other.

(f ∗ g)(t) = ∫_{−∞}^{+∞} f(𝜏) g(t − 𝜏) d𝜏

The convolution can be thought of as the area under the curve f(𝜏) at each step t, weighted by the function g(t−𝜏). This formulation tells us that as the value of t changes, the weighting changes and is able to emphasize different “parts” of the input function f(𝜏). In machine learning terminology, the function g(t−𝜏) is called the kernel. The idea is to replace the default matrix multiplication of neural networks with the convolution operator defined above.
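A tiny numerical illustration of the discrete form of this operation – sliding a small kernel over a signal and taking weighted sums – using NumPy (the signal and kernel values are arbitrary):

```python
import numpy as np

f = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # the input signal f
g = np.array([0.25, 0.5, 0.25])                    # the kernel g (a small smoother)

print(np.convolve(f, g, mode="same"))              # the resulting "feature map" (f * g)
```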

Similar to the way in which a neuron in the visual cortex responds to a specific stimulus, a convolutional layer convolves the input and passes this output onto the next layer of the network. After passing through a convolutional layer, the input turns into an abstraction called a feature map. In our case, a one-dimensional convolution over time series data will produce a one-dimensional feature map, having only a single channel. A beneficial property of convolutional layers is that they are composable, meaning that each subsequent layer is able to extract more abstract features from the given data.

We interlace convolutional layers with what are known as pooling layers, which essentially aggregate the output of convolutional layers, telling us whether a particular feature was detected in that layer or not. Because these pooling layers create aggregations (commonly, this is the maximum of each local neuron cluster in the extracted feature map), they allow later convolutional layers to have a greater receptive field over the original input data. Moreover, pooling layers provide the added benefit of small input transformation invariance.

Therefore, CNNs are very good at exploiting locality in the data. This robustness has allowed CNNs to be very successful in the realm of computer vision and time series data analysis. As we are interested in the levels of particulate matter within a particular time window, our task obviously falls into the second category. It is important to note that – unlike our recurrent layer – the convolutional layer is more concerned with modelling local spatial correlation than long-term dependencies.
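Putting the two ideas together, here is a simplified PyTorch sketch of this kind of convolutional-recurrent hybrid for the 24-hour windows described earlier. A plain nn.LSTM stands in for the ILSTM, and all layer sizes and feature counts are illustrative only:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),  # local patterns over time
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),                          # aggregate, widen receptive field
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)                     # next-hour prediction

    def forward(self, x):                       # x: (batch, 24, n_features)
        z = self.conv(x.transpose(1, 2))        # -> (batch, 32, 12)
        out, _ = self.lstm(z.transpose(1, 2))   # -> (batch, 12, hidden_size)
        return self.head(out[:, -1])            # -> (batch, 1)

model = CNNLSTM(n_features=12)
dummy = torch.randn(8, 24, 12)                  # 8 windows of 24 hours x 12 features
print(model(dummy).shape)                       # torch.Size([8, 1])
```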

Our predictions

Below are two sets of predictions for a single week of particulate matter values, presented at hourly granularity. These preliminary results seem to indicate that combining these two deep learning models can be an effective way of capturing both long- and short-term relationships in air quality data, with minimal architectural complexity. Moreover, the adaptation of the LSTM recurrent layer into an ILSTM module seems to give our model a bit more of the flexibility required to predict some of the more volatile spikes in the dataset.

In the future, we aim to extend the predictive window of the model and experiment with attention mechanisms, which effectively accentuate certain parts of the input data while diminishing others, allowing the model to focus on smaller, but far more meaningful patterns. This, and further results, will be discussed in a subsequent article.

As a part of the product roadmap, and in line with our corporate social responsibility, we also aim to integrate our model with an existing air quality/weather forecasting platform, making our predictions available to the public via an open API.

 

References

Wang, J., et al. (2022). An air quality index prediction model based on CNN-ILSTM. Scientific Reports.

Olah, C. (2014, July 8). Conv Nets: A Modular Perspective. Retrieved from Colah’s Blog

Which Cloud storage to use?

Part 1: Microsoft Azure

Cloud is the new go-to place for everyone willing to improve their business – from small coffee shops to huge corporations. It takes a lot of risk and maintenance off users’ hands and gives scalability and security in return, which would otherwise have cost a small fortune to set up and maintain privately. But if you are reading this, I’m sure the decision to go for a cloud solution has already been made, and another question is hanging in the air: ‘Which is the best place to store my data and files?’ There are quite a lot of choices to be made – from choosing a provider and a payment plan to choosing the correct storage – and the process can get complicated and confusing. Unfortunately, this article will not help you with the provider and payment plan choices, but it will reveal the different storage options that the most popular providers have and hopefully help you make the correct storage choice.

Vasko Viktorov is part of ROITI’s Cloud Software Engineering Team and, in a series of three articles, he will peek into each of the three most popular cloud providers – Azure, AWS and Google – and check what storage options they offer, what use cases they are best suited for, and look into some of the whys and hows. There is a lot of ground to cover, so let’s roll up our sleeves and start with the first one – Microsoft Azure.

Microsoft Azure

Microsoft has been operating in the cloud market since 2010 and is well known as one of the biggest players. As per the latest data from Q3 2022, Microsoft Azure holds the second biggest market share (21%) for cloud services. A company with that much experience, as expected, offers storage options that are well thought out and optimized for all common scenarios. Each of those storage options can be accessed easily via the Azure website or the desktop application ‘Azure Storage Explorer’.

Of course, there are also ready-to-use developer packages for different programming languages such as .NET, Java, Python, Node.js, C++, Ruby, PHP, Go, Android, and iOS, which makes the storage implementation, configuration and use in a project quite an easy task, because there’s no need to learn a new language or find someone special for it.

But enough free advertising – let’s check what storage options Azure has to offer:

Azure Files provides a fully managed cloud-based file share, that is accessible via SMB and NFS protocols or REST API. It can be used concurrently on cloud and on-premises machines. It’s perfect for the replacement of on-premises file servers and for storing application-shared settings and configurations. It is also a good place for dumping logs, metrics, and diagnostics.

Azure Blob is a cloud storage for unstructured data. It functions as a key/value database where a key is a combination of the folder name and file name and the value is the content of the file (blob). The size limit of each blob can be specified, and metadata (key/value pairs) can be added for easier identification.

It is good to know that precise identifiers should be included in the keys (file names) or the metadata, because the content of the files cannot be searched directly; otherwise, each file would have to be downloaded and read separately.

There are three different types of blobs – block, append and page blobs (a short example of working with them via the Python SDK follows the list below). The type is specified upon creation of the file and cannot be changed afterwards.

  • Block blobs are optimized for uploading/downloading big amounts of data. Each block is limited to 4.2 GB of data and a block blob can hold up to 50 000 blocks, for a total of about 210 TB. The main use case for block blobs is files that need to be read from beginning to end, such as images, binary media files, and documents.
  • Append blobs are optimized for append operations. Each append block is limited to 4.2 MB, up to 50 000 append blocks – a total of about 210 GB. The main use case for append blobs is logging data from services, applications, virtual machines, etc.
  • Page blobs are optimized for frequent random read/write operations. Each write to a page blob is limited to 4.2 MB; there is no limit on the number of pages, but the total size cannot surpass 8.8 TB. Their main use case is storing index-based and sparse data structures like OS and data disks for virtual machines and databases.
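Here is a minimal, hedged sketch of working with block and append blobs from Python (the azure-storage-blob package); the connection string, container and blob names are placeholders, and the container is assumed to exist already.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
container = service.get_container_client("demo-container")

# Block blob: upload a whole file (this is the default blob type).
with open("report.pdf", "rb") as data:
    container.get_blob_client("report.pdf").upload_blob(data, overwrite=True)

# Append blob: create it once, then keep appending log lines to the end.
log_blob = container.get_blob_client("app.log")
log_blob.create_append_blob()
log_blob.append_block(b"2023-01-01 10:00:00 service started\n")
```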

Azure Queue is message storage that can hold millions of messages, up to the total capacity limit of the storage account. A single queue message can be up to 64 KB. It is great for communication between decoupled components of an application, or for storing the backlog of a company’s work items.
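A short, hedged sketch of sending and reading queue messages with the Python SDK (the azure-storage-queue package); the connection string is a placeholder and the queue is assumed to exist:

```python
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<your-connection-string>", "work-items")

queue.send_message("process-invoice-42")   # producer side: enqueue a work item

for message in queue.receive_messages():   # consumer side: read and handle messages
    print(message.content)
    queue.delete_message(message)          # remove it once it has been handled
```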

Azure Table is a key/value database that looks much like a classic database table. Each row, called an entity, consists of a key and one or many columns, called properties, which hold the values. Keep in mind that not all entities in a table have to share the same columns.

There are also limitations to Azure Table usage – different Azure tables cannot be linked to each other, searching by value is slow, and the tables don’t allow complex joins or queries. Their big advantage is that they are cheaper and more scalable than relational databases.

Azure tables are best used for storing user data, configurations, settings, or other types of information which are not big or complex enough to justify a dedicated database.
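A hedged sketch of inserting and reading an entity from Python (the azure-data-tables package); note that in the service itself the key of each entity is a PartitionKey/RowKey pair, and all names and values below are placeholders:

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<your-connection-string>",
                                            table_name="UserSettings")

table.create_entity(entity={
    "PartitionKey": "user-42",
    "RowKey": "ui-preferences",
    "theme": "dark",            # entities do not have to share the same columns
    "items_per_page": 50,
})

entity = table.get_entity(partition_key="user-42", row_key="ui-preferences")
print(entity["theme"])
```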

Azure Managed Disks are solid-state drives or hard disks that are managed by Azure and attached to Virtual Machines. Azure managed disks are the perfect solution for migrating an application from on-premises to the cloud. They could be used as virtual storage for files, but overall this is not optimal cost-wise or use-wise.

It is also important to mention that Azure Blob storage can be set with a specific access tier – hot, cool, or archive. Hot is optimized for frequent usage, cool for infrequent usage, and archive for rare usage. Setting the correct tier can reduce your costs and improve your processes.
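Changing the tier of an existing blob is a single call in the Python SDK; a hedged sketch (all names are placeholders):

```python
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    "<your-connection-string>",
    container_name="demo-container",
    blob_name="2019-archive.zip",
)

blob.set_standard_blob_tier("Cool")   # or "Hot" / "Archive"
```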

This concludes the common storage options that are available in Azure. If you didn’t find what you were looking for and you need more specialized storage, you can also check the database options that Azure offers for:

  • Relational databases: SQL, PostgreSQL, MySQL, Oracle, SQLite
  • Non-relational databases: CosmosDB, CassandraDB, MongoDB, RavenDB, CouchDB, HBase

In the second article of the sequel, we will look into the common storage options provided by AWS.

 

Resources:

https://wire19.com/amazon-microsoft-and-google-cloud-infrastructure-market/

https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs–append-blobs–and-page-blobs

https://learn.microsoft.com/en-us/azure/storage/files/storage-files-introduction

https://learn.microsoft.com/en-us/aspnet/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/data-storage-options

Three Topics That Will Dominate IT Landscape Discussions in Energy Trading Companies in 2023 and Their Implications

The past two years have been pretty volatile and emotional for energy traders (in good and bad ways). ROITI’s CEO Ventsislav Topuzov shares some insights in the article below. As things settle down into a kind of “new normal”, there are several effects of the volatility that drive business demand for IT solutions and will bring new strains on companies’ teams. The topics are not new, but the increased pressure for time to market brings a much tighter focus.

The main market developments with significant second-order impacts on IT landscapes in my view are:

  • Investment in RES and the rise in PPAs has been growing steadily, but the level of pressure in this area, now that big oil companies have very actively entered the market, is on a whole new level. A total of $226 billion was invested in renewable energy globally in the first half of 2022, according to BloombergNEF. A few examples from the past year include CWP Global signing a landmark agreement to launch a major green hydrogen project in Djibouti, as well as bp taking a big stake and operatorship of Australia’s largest RES project.
  • The rise of LNG will mean not only new portfolios to manage for players entering the market but also new complexity to handle in IT landscapes. The LNG trade volume has grown continuously over the past decades, reaching over 500 billion cubic meters as of 2021, according to Statista.
  • Highly volatile markets mean a far smaller willingness among traders to take on risk, which in turn means far fewer fixed-price contracts and risk being passed on to end customers or producers. U.S. natural gas price volatility hit a record in 2022, according to the Energy Information Administration (EIA): “The 30-day historical volatility of U.S. natural gas prices, based on the Henry Hub front-month futures prices, averaged 179% in February 2022”.

 

The Three Key IT Landscape Topics

The above three aspects of energy market developments lead to the following impacts on the internal teams:

  1. Far more complexity for Back Office functions. LNG and RES PPAs are highly custom and complex contracts and the stage of market development means standardization is not on the immediate agenda. This means a lot of manual work in settlements and contract management with a high chance that Excel is the most suitable software for a large part of the activities at this stage. Additionally, RES PPAs come with a certificate position to manage – and their behaviour (as well as their value) can range wildly across registry jurisdictions.
  2. More RES in the portfolios means more need for algorithmization of decision-making and trading. This trend has been around for a while but the push for faster time to market without compromising quality is greater than ever.
  3. Finally, as risk is being pushed increasingly up- and downstream, the risk management capabilities of producers and consumers will need to increase out of necessity rather than desire. The role of trading companies as risk aggregators and managers will not diminish (on the contrary), but as there is considerably more risk in the market as a whole, they will be reluctant to take on an ever-increasing amount.

 

How does this reflect on the IT landscape?

What can be observed across the board is increasing investment in digitalization – of more specific varieties.

The new complexity of the traded products can be addressed in three ways:

  • Automating further the existing processes to free up time for existing teams to settle new products. Pros: the last bits of established processes can be automated, and the time of experienced people can be refocused on new priorities without losing oversight of the existing business.
  • Buy or build new solutions aimed specifically at PPAs, certificates, and/or LNG where this is a topic. Pros are clear – new solutions to new problems should alleviate the strain on the teams and bring efficiency into processes which may turn out to be sources of revenue and competitive advantage. Over time, further investment will be needed as the markets evolve, but the evolution is likely to be in the direction of standardization in the mid to long term, and this will make things easier.
  • Increasing Back Office headcount. Companies are traditionally unwilling to spend more in this area, as over time, as processes get well defined, there is room for automation. However, experienced back office people can be crucial in a reality with the growing importance of origination and non-standardized contracts with new counterparties. KYCs, long-form confirmations, and complex settlements all require good knowledge of internal processes and market realities. The flexibility to address them promptly can come from onboarding new hires on the more established processes, which gives a faster time to market for them.

Two of the options above require pretty much immediate investment in IT capabilities. The third one depends on a bet: how fast and how much of a role the more non-standardized business lines will play, and how long it will take for them to mature into well-defined, automatable processes.

More algorithmic trading is a fairly straightforward direction, although there are key design components which need to be carefully considered, like infrastructure, data correctness and security, and algo limits (or, basically, how much space you allow for human intervention should things go wrong). There are three types of competencies which are key in this area:

  • Data engineering – basically ensuring that data is where it needs to be, in a format which is analyzable and comparable across data sources, at the time when it is needed – and it is needed a lot and reliably;
  • Data science – typically, there has been an Analysis department at every large energy company. Now that much more data is relevant for decision-making and more random variables enter the picture (wind and sun, among others), the analysts’ role becomes much more complex, involving an increasing focus on data engineering but adding “come up with a model to predict this variable” tasks on top. Because of the added complexity and the increased push for more variables to be taken into account faster, data science is in short supply – and demand is likely to increase in the foreseeable future;
  • DevOps – data, model, and system reliability is a key component in the algorithmization of trading, and (to use my favourite Nassim Taleb term) an antifragile infrastructure is very important to ensure the analytics are correct and that they are based on the correct data. DevOps engineers (and variations like MLOps) are the key people with the technical competency to ensure this.

 

Last but not least, more risk passed from traders to producers and consumers means they need to build up risk management and trading capabilities and do it fast, as the markets will become tougher for them. IT landscape-wise, the key considerations will be along the lines of:

  • Selecting ETRM systems. While options on the market are abundant, potential buyers such as large power producers are considered critical infrastructure, and regulations regarding the origin of the company providing the software may apply. This adds a twist to the selection process and limits the choice. On the other hand, portfolios will be heavily tilted towards a specific type of complexity and potentially the problems to solve will be fewer than traders’ problems. This could generally help the market to develop competition across systems covering different asset classes;
  • Control over data. More active trading companies have spent the last years (perhaps decades) progressively ensuring control over data, developing governance policies, and ensuring different views on data use the same source (which is ideally also the correct one 😊). This will be a challenge in the context of creating trading and risk management capabilities for companies that are not used to complex data governance. Specifically, it will lead to a higher total cost of ownership (TCO) of data solutions until the right policies and practices come into place;
  • Putting an operational model in place. While asset operators will have experience with monitoring and supporting critical processes and likely have something to build on, consumers may not necessarily be prepared. Putting the right operational processes around supporting a more trading-oriented landscape will require identifying critical roles, processes, and systems, identifying SLAs that need to be covered around them, and finding the right competencies to fill in the gaps.

 

Overall, there is a mixture of sources of complexity for trading IT landscapes. High volatility, more RES, and the increased importance of LNG will pose different challenges to the markets as a whole and to individual players. 2023 will be marked by increased pressure on market participants to catch up with the trends in energy trading IT and to skip steps on the way to a modern, business-supporting landscape.

 

We live in interesting times. 😊

 

Resources:

https://assets.bbhub.io/professional/sites/24/BNEF-2H-2022-Renewable-Energy-Investment-Tracker_Final-ABRIDGED.pdf

https://www.cwp.global/wp-content/uploads/2022/12/221206-CWP-Djibouti-MOU-signing-press-release-FINAL.pdf

https://www.bp.com/en_au/australia/home/media/press-releases/bp-backs-australias-energy-future.html

https://www.statista.com/statistics/264000/global-lng-trade-volume-since-1970/

https://www.naturalgasintel.com/u-s-natural-gas-price-volatility-at-all-time-high-in-2022/

What (the hell) is ETRM?

ETRM stands for Energy Trading and Risk Management and refers, simply put, to software services supporting energy trading and risk management. These services can be used at any part of the energy chain: from producers to suppliers, traders, and large consumers.

Eroslav Andreev is an Expert ETRM Developer and the lead of the ETRM Developers Team at ROITI. In this article, he helps clarify what (the hell) ETRM is about.

What business problems do you solve?

Like everything in the IT world, problems evolve with time. When I started working at ROITI, the main problem our clients faced was how to get all their raw data into a single ETRM application. Many used multiple systems, and not all of them had working and stable interfaces between one another, nor with trading exchanges. Six years later, this has somewhat been achieved, and now the problem is what to do with that data and how to use and analyse it profitably. We try to solve specific client problems depending on their maturity level, including integrations with other core systems, automation of tasks, reporting, cost-cutting, etc. – in general, trying to make their lives a bit easier. And of course, in between all that, I try to teach the others in the team “a thing or two”.

What is the tech stack you use and are there any peculiarities?

This depends quite a lot on the client, although we focus on .NET for applications outside the ETRM system and Java for the embedded tasks. We use a specific framework, only available for Endur, called OpenJVS. At first glance, it may seem a bit legacy, but it has its benefits over others such as Spring. Of course, SQL, DevOps, PowerShell, CI/CD, and unit testing are not new to us either.

The main difference here is that you cannot google your way around these ETRM systems, so learning happens only through try-fail-succeed experience or the call-a-friend option.

Who is the “right” kind of person for such a role?

Me. 🙂 Joking aside, as long as one has analytical and detail-oriented thinking, the willingness to learn and develop, and no fear of making mistakes and solving undescribed problems, there might be a chance.

What is the most fulfilling part of the job?

Sparing someone even just 15 minutes of dull work by automating it is relieving and reassuring. At the end of the day, going home confident in the changes and implementations you have made during working hours gives you the willingness and motivation to come to the office.

How Is Europe’s Gas Infrastructure Evolving?

Boyana Achovski, Secretary General at Gas Infrastructure Europe

ROITI’s interview with Boyana Achovski, Secretary General at GIE

European LNG projects have the potential to replace half of the total imports from Russia in 2023, said Secretary General at GIE, Boyana Achovski in an interview with ROITI.

How fast is the regasification capacity being developed?

Europe currently has 22 operational large-scale LNG terminals, including onshore terminals and floating storage and regasification units (FSRUs), which are located around the EU coastline. Including Turkey and the UK, European LNG terminals have an import capacity of 260 bcm/year.

Much more LNG could be imported as additional infrastructure is being developed. LNG terminal operators (i.e. in the Netherlands, Germany, France, Poland, Greece, and Italy) are already working on increasing the import capacity via FSRUs for the coming winter at places where they serve the wider EU market and help Europe secure gas in stocks. More than 20 European LNG projects have been announced or accelerated since March 2022, with the potential to replace about 50% of total imports from Russia in 2023. By 2023, more than 60 bcm/year of additional regasification capacity is expected from the capacity expansion of the existing terminals and new FSRUs. By 2030, the plan is to increase the LNG import capacity by 75%.

How do you see Europe’s gas infrastructure landscape developing?

Today, the energy provided by the gas infrastructure accounts for 25% of the total primary energy consumption in Europe.  On top of these fast-developing regasification capacities, with more than 1,148 TWh of working gas volume in underground gas storage, the European Union today is well placed to face system imbalances. But additional efforts are currently being considered: some Member States such as Romania, Bulgaria and Poland, for instance, are looking at expanding their storage capacities to further increase their system resiliency. Others like Latvia are modernising their facilities to become more effective. EU funds such as the Connecting Europe Facility – Energy will play a key role in turning these projects into reality.

What are the main challenges for infrastructure companies when gas is too short?

In the current crisis, the target of 80% of storage filling level in the EU Member States has been an absolute priority to save gas for the coming winter. In fact, implementing storage obligations was the very first measure that the Commission put forward after the Russian invasion of Ukraine.

While the EU has championed its efforts to secure these storage injections, it has also been faced with the challenge of substituting 155 bcm of Russian gas before 2030. This means that in the short to medium term, Europe needs to secure new LNG supplies, because after 2025 new future-ready LNG terminals are coming online. And so, Europe needs to strike a fine balance between long-term LNG import contracts and short-term, spot procurements. The European Union and European companies have recently intensified their outreach for new LNG supply. This includes the Joint Statement between the European Commission and the United States on European Energy Security, the recent contracts of the EU with US producers, and the EU inking a Memorandum of Understanding on LNG with Israel and Egypt.

Are there any “winners”?

Well, in such times of crisis, I don’t think we can talk about any winners. On the other hand, the infrastructure operators completely proved their resilience as a massive tool to provide stability for European citizens. We see that in the crisis, LNG becomes a central player within the possible options for diversification. The role of the pipelines and underground gas storage is also key!

Europe currently accounts for one-third of global LNG imports. For that, additional production is needed, and Europe can only secure it with long-term contracts with key producing countries. There is no way around it. According to GIIGNL, the global LNG regasification capacity is currently more than twice the liquefaction capacity. The global market is becoming ever tighter due to the lack of sufficient supply. In 2021, 44 markets imported LNG, while 19 countries exported it. Asia continued to be the leading importing region with a 73% share of global LNG imports, drawing LNG ships away from Europe for most of the year and continuing to increase its import capacities further.

Concerning storage, after the substantial efforts undertaken over the past months, storage system operators (SSOs) are on the right track to deliver on their filling targets. At the same time, they are leaving no stone unturned to realise their objectives to transition toward a carbon-neutral economy. Over the past few years, more than 40 projects have already been publicly announced in the field of underground hydrogen storage.

How can Europe reconcile the short-, mid-, and long-term objectives?

Careful planning is the way. Indeed, the balance between short-, mid- and long-term perspectives needs to be adequately designed to ensure that the infrastructure will be on time. Decarbonisation is not new for LNG terminal operators. Three years ago, they already presented decarbonisation pathways that can deliver the EU Green Deal – at different times and serving different demands. All LNG terminals can integrate the value chains that will emerge, depending on their geographical location, level of energy demand, and upstream and downstream developments. SSOs are also anticipating the ambitious targets of the REPowerEU plan, in particular the development of infrastructure, facilities, and ports for 20 Mt of hydrogen by 2030, because some of this demand will have to be stored. Given the time necessary to retrofit or repurpose underground gas storages, and taking into account the latest Gas Storage Regulation, this underlines the need for a solid storage strategy, as emphasised by the European Parliament in its report on a comprehensive European approach to energy storage in 2022.