Software engineering for an Artificial Intelligence Machine Learning Application
Here we discuss software engineering for an Artificial Intelligence Machine Learning application, with a focus on the build process under continuous integration and continuous deployment.
First, let us highlight the pivotal role of continuous integration and continuous deployment, often denoted CI/CD, in building and deploying machine learning models.
CI involves:
- version control (for example, on GitHub),
- automated builds, run under script control and triggered by GitHub Actions,
- testing, and
- code review.
CD involves model training, validation, versioning, deployment, monitoring, logging, and updating [1].
This automated process is designed to ensure high software quality and improve productivity [2].
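The automated testing step in such a pipeline can be sketched as a simple quality gate that fails the build when a candidate model underperforms. The function names and the threshold below are illustrative choices for this sketch, not part of any CI product's API:

```python
# Minimal sketch of an automated CI quality gate for a model.
# evaluate_model and ACCURACY_THRESHOLD are illustrative names, not a real API.

def evaluate_model(predictions, labels):
    """Return the accuracy of predictions against ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

ACCURACY_THRESHOLD = 0.90  # gate value, chosen per project requirements

def ci_quality_gate(predictions, labels):
    """Raise if the candidate model fails the gate, mimicking a failed CI job."""
    accuracy = evaluate_model(predictions, labels)
    if accuracy < ACCURACY_THRESHOLD:
        raise SystemExit(f"CI gate failed: accuracy {accuracy:.2f} < {ACCURACY_THRESHOLD}")
    return accuracy

if __name__ == "__main__":
    preds = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    labels = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
    print(ci_quality_gate(preds, labels))  # 9/10 correct passes the 0.90 gate
```

In a real pipeline, a script like this would run on every merge request, so a regression in model quality blocks the merge the same way a failing unit test would.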
In the context of AI Machine Learning Applications, the key focus is on the Machine Learning model, which lies at the heart of the application.
The process commences with building and training the model (Project Step 1).
The developers branch out from the main branch of the Git repository, make changes, and run experiments. The model is trained on data, validated, and then submitted to the CI/CD servers for approval [2].
A major challenge in this process is the impact of both code and data on the model's performance.
To tackle this, Data Version Control (DVC) is employed for data versioning.
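The core idea behind data versioning is to identify each dataset snapshot by a content hash, so code can pin the exact data it was trained on. The toy sketch below illustrates that idea only; it is not DVC's implementation, which also handles remote storage and Git integration:

```python
# Toy illustration of content-addressed data versioning (the idea DVC builds on).
import hashlib

def dataset_version(raw_bytes):
    """Return a short content hash identifying this exact data snapshot."""
    return hashlib.md5(raw_bytes).hexdigest()[:8]

v1 = dataset_version(b"user_id,label\n1,0\n2,1\n")
v2 = dataset_version(b"user_id,label\n1,0\n2,1\n3,0\n")
print(v1 != v2)  # any change to the data yields a new version id
```

Because the version id is derived from the data itself, a model's performance report can record which data snapshot produced it, making experiments reproducible.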
Quality control is maintained by using evaluation metrics such as accuracy, loss, and the R² score, which may vary with the project requirements: again we see the tight integration between the project plan and software engineering practices.
The development team is charged with the selection of the right metrics and ensuring performance reports follow a predefined convention [2].
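A performance report that "follows a predefined convention" can be as simple as a fixed JSON schema that every training run emits. The field names and rounding below are one hypothetical team convention, not a standard:

```python
# Sketch: emitting a performance report under a fixed, team-defined schema.
import json

def performance_report(y_true, y_pred, losses):
    """Compute accuracy and mean loss, then package them in the agreed schema."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    mean_loss = sum(losses) / len(losses)
    # The schema (field names, version tag, rounding) is a convention the
    # team agrees on so CI can parse and compare reports across runs.
    return {
        "metric_version": "1.0",
        "accuracy": round(accuracy, 4),
        "loss": round(mean_loss, 4),
    }

report = performance_report([1, 0, 1, 1], [1, 0, 0, 1], [0.2, 0.1, 0.9, 0.3])
print(json.dumps(report))  # machine-readable, diffable across model versions
```

Keeping the schema stable lets CI jobs diff reports between the main branch and an experiment branch automatically.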
Finally, when it comes to continuously deploying the updated model, adopting modern techniques is crucial. Several tools have been developed to assist in this phase.
These include CML, an open-source tool designed for the needs of MLOps, along with other tools like GitHub Actions, GitLab CI/CD, Jenkins, TeamCity, Circle CI, and Travis CI.
These tools automate manual work, eliminate wasted time, and help reduce the cost of downtime, thereby enabling a seamless transition from new code to its production release [3].
In conclusion, building and deploying a Machine Learning model for an AI application relies heavily on the principles of CI/CD.
By employing these practices, we can ensure better efficiency and productivity in our projects. It is indeed the future of software engineering for AI Machine Learning Applications.
Chapter: Integrating Software Project Management in AI and Machine Learning Applications Development
In the world of software development, project management plays an integral role, ensuring that timelines, resources, and quality standards are met. This becomes even more critical when we move into the realm of AI and Machine Learning (ML) applications, where the build process involves not just code deployment but also data handling, model training, and continuous model updates.
The Role of Software Project Management in AI/ML Applications Development
As part of the software project management process, the project manager must oversee and coordinate all stages of the application development, from initial planning and requirements gathering, to design and implementation, through to testing and deployment.
In AI/ML projects, this includes additional tasks like data acquisition and cleaning, model selection and training, and the integration of the model into the application.
The project manager also has to ensure that the team adheres to best practices such as Continuous Integration/Continuous Deployment (CI/CD), which involves regularly merging all developers' working copies of code and data in the GitHub repository into a shared mainline, and automating the process of software delivery.
Case Study: Software Project Management in a Commercial IT Development Function
A prime example of software project management in AI/ML application development can be seen in the operations of Google. The multinational technology company employs comprehensive project management strategies in its AI/ML projects.
Google's AI/ML projects follow a tightly integrated process where project managers work closely with data scientists, ML engineers, and software developers.
They employ CI/CD practices and use tools like TensorFlow Extended (TFX), an end-to-end platform for deploying production ML pipelines.
The project managers at Google ensure that they follow a systematic process, which includes defining project goals, assembling the project team, determining the project timeline, and setting quality standards. They also make sure to maintain continuous communication among team members to facilitate knowledge sharing and problem-solving.
One of Google's AI/ML projects, the Google Assistant, demonstrates the effectiveness of their project management strategies.
The development of Google Assistant involved complex AI and ML algorithms, extensive data training, and continuous updates. Through effective project management, Google was able to successfully develop and deploy the Assistant, which is now a leading virtual assistant in the market.
Conclusion
Software project management is pivotal in AI/ML application development. It ensures that the project proceeds systematically, resources are used optimally, and the final product meets the desired quality standards.
The integration of project management in the CI/CD process of AI and ML applications can significantly improve efficiency and productivity, as demonstrated by leading tech companies like Google.
Chapter: Integration of Software Project Management and Software Engineering through GitLab
GitHub is a version control and code hosting platform. It also provides some project management tools, such as GitHub Issues and GitHub Actions.
GitLab, a leading DevOps CI/CD platform, plays an integral role in the convergence of software project management and software engineering [1].
GitLab facilitates collaboration among team members and different teams, making it an invaluable tool for project managers tracking a project or workload [2].
Through GitLab, the Unified Process Methodology can be integrated into the build tool chain.
The Unified Process, an iterative and incremental development process, aligns well with GitLab's single-application platform approach.
Each iteration in the Unified Process can be mapped to GitLab's planning, building, securing, and deploying stages, providing a seamless transition from software development to operations [1].
A case study illustrating this integration in a commercial IT development function is the adoption of GitLab by organizations transitioning from monolithic to microservices architectures.
Watch my video detailing the differences between Model View Controller and Microservices Design Patterns:
The complexity of DevOps has increased due to the use of more tools per project. To address this, organizations have started to adopt GitLab's DevOps platform. It resolves inefficiencies and vulnerabilities by replacing the DIY DevOps approach with a unified system facilitating end-to-end collaboration [1].
The rising popularity of Python and the increasing use of AI in software development further highlight the importance of GitLab.
As AI becomes a critical skill for developers' future careers, GitLab offers a platform to harness AI's potential while ensuring the quality of data fed into it [3].
This ensures that the AI and machine learning models are as effective as possible, further integrating software project management and software engineering.
The way to deploy is dictated by business requirements. You should not start any ML development before you know how you are going to deploy the resulting model.
There are 4 main ways to deploy ML models:
- Batch deployment - The predictions are computed at defined frequencies (for example on a daily basis), and the resulting predictions are stored in a database and can easily be retrieved when needed. However, we cannot use more recent data and the predictions can very quickly be outdated. Look at this article on how AirBnB progressively moved from batch to real-time deployments:
- Real-time deployment - the "real-time" label describes a synchronous process: a user requests a prediction, the request is pushed through HTTP API calls to a backend service, which in turn pushes it to an ML service. This is great if you need personalized predictions that use recent contextual information, such as the time of day or the user's recent searches. The problem is that until the user receives their prediction, the backend and ML services are stuck waiting for it to come back. To handle additional parallel requests from other users, you need to rely on multi-threaded processes and horizontal scaling by adding servers. Here are simple tutorials on real-time deployments in Flask and Django:
- Streaming deployment - This allows for a more asynchronous process: an event can trigger the start of the inference process. For example, as soon as you open the Facebook page, the ads-ranking process can be triggered, and by the time you scroll, the ad is ready to be presented. Requests are queued in a message broker such as Kafka, and the ML model handles each request when it is ready to do so. This frees up the backend service and saves a lot of computation through efficient queueing. The resulting predictions can be queued as well and consumed by backend services when needed. Here is a tutorial in Kafka:
- Edge deployment - The model is deployed directly on the client, such as a web browser, a mobile phone, or an IoT device. This yields the fastest inference and allows offline prediction (disconnected from the internet), but the models usually need to be quite small to fit on smaller hardware. For example, here is a tutorial on deploying YOLO on iOS:
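The batch deployment pattern above can be sketched with the standard library: a scheduled job scores every record and stores the predictions in a database, and the serving path is just a fast lookup. The `score()` function and table layout here are toy stand-ins, not any product's design:

```python
# Sketch of batch deployment: a scheduled job scores all records up front,
# so serving a prediction is only a database read. score() stands in for a
# trained model's inference step.
import sqlite3

def score(features):
    """Stand-in for model inference: a trivial threshold rule."""
    return 1 if sum(features) > 1.0 else 0

def run_batch_job(conn, batch):
    """The scheduled (e.g. daily) job: score every record and store results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (user_id TEXT PRIMARY KEY, prediction INTEGER)"
    )
    for user_id, features in batch:
        conn.execute(
            "INSERT OR REPLACE INTO predictions VALUES (?, ?)", (user_id, score(features))
        )
    conn.commit()

def lookup(conn, user_id):
    """Serving path: a cheap key-value read, no model call at request time."""
    row = conn.execute(
        "SELECT prediction FROM predictions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
run_batch_job(conn, [("alice", [0.7, 0.6]), ("bob", [0.1, 0.2])])
print(lookup(conn, "alice"))  # 1
```

The trade-off described above is visible in the sketch: lookups are fast and cheap, but a user whose features changed since the last batch run still gets the stale prediction until the job runs again.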