Categorization of software bugs is an important task in software repository mining. Most of the information about software bugs is in textual form, and it is difficult to categorize these bugs into a particular category because some of the terms present in bug reports can be common to multiple categories. A fuzzy similarity technique can be utilized to identify the degree to which these bugs belong to different categories. In this paper, a binary software bug categorization technique using a fuzzy similarity measure is proposed to classify reports as bugs or non-bugs. The fuzzy similarity of a software bug is computed and, based on a user-defined threshold value, the report is assigned either to the bug or to the non-bug category. Experiments are performed on publicly available software bug data sets, and the performance of the proposed fuzzy-similarity-based classifier is evaluated using accuracy, F-measure, precision, and recall. The proposed algorithm is also compared with existing standard machine learning algorithms.
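As an illustration of the thresholding step, the following minimal sketch assigns a report to the bug category when a simple token-overlap fuzzy similarity exceeds a user-defined threshold; the similarity definition, the vocabulary, and the threshold value are illustrative assumptions, not the paper's exact measure.

```python
# Sketch: threshold-based binary categorization using a simple fuzzy
# similarity score (token overlap). The similarity definition and the
# threshold are illustrative assumptions, not the paper's exact measure.

def fuzzy_similarity(report_tokens, category_tokens):
    """Degree of membership of a report in a category, in [0, 1]."""
    report, category = set(report_tokens), set(category_tokens)
    if not report:
        return 0.0
    return len(report & category) / len(report | category)

def categorize(report_tokens, bug_vocabulary, threshold=0.3):
    """Assign 'bug' when the similarity exceeds the user-defined threshold."""
    score = fuzzy_similarity(report_tokens, bug_vocabulary)
    return "bug" if score >= threshold else "non-bug"

# Example usage with a toy bug vocabulary
bug_vocab = {"crash", "nullpointerexception", "fails", "error", "exception"}
print(categorize(["app", "crash", "error", "on", "startup"], bug_vocab))
```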
Estimating the bug count of software modules, so that limited testing resources can be utilized where they help testers most, is a challenging task. Regression methods are mostly used for estimating the bug count. AdaBoost is a machine learning technique that has been used for both classification and regression problems. However, it cannot perform well if the weak learner cannot achieve at least 50% accuracy on a skewed dataset. Extra Tree regression is computationally efficient and delivers excellent prediction performance. In this paper, we present a new approach, the AdaBoost.R-ET algorithm, to predict the bug count in a software module. We use Extra Tree regression as the weak learner of AdaBoost regression, which yields a significant improvement in the model. An experimental study is performed on five projects from the PROMISE repository consisting of 15 different versions. To estimate the performance of the AdaBoost.R-ET algorithm, we compare it with different variants of AdaBoost. The outcomes show that the proposed model is better than the other AdaBoost variants.
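A minimal sketch of the core idea, assuming scikit-learn and synthetic data: an AdaBoost regressor whose weak learner is an Extra Tree. The hyperparameters are illustrative and the paper's exact configuration may differ.

```python
# Sketch of the idea behind AdaBoost.R-ET: AdaBoost regression with an
# Extra Tree as the weak learner. Data and hyperparameters are illustrative.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import ExtraTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=20, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Extra Tree as the weak learner of AdaBoost regression
# (use base_estimator= on scikit-learn versions older than 1.2)
model = AdaBoostRegressor(estimator=ExtraTreeRegressor(max_depth=5),
                          n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```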
The gravitational search algorithm (GSA) is a physics-based optimization algorithm inspired by Newton's law of gravitation and laws of motion. Clustering and classification are two important steps in machine learning, and mastering them is essential in today's artificial intelligence era. New methods that improve clustering and classification while reducing data complexity are always welcome. This paper presents a review of applications of the gravitational search algorithm and its variants to clustering and classification problems. In clustering, the GSA is used with various traditional clustering algorithms to find interesting patterns in data and to divide a data set into different clusters. For solving classification problems, the GSA is hybridized with other swarm optimization algorithms to increase classification accuracy and to find optimal classification rules.
This paper presents handwritten digit recognition on the well-known MNIST dataset using the Local Binary Pattern (LBP) and Scale-Invariant Feature Transform (SIFT) feature extraction methods. Features have been extracted from the dataset using these methods. To reduce the number of features, i.e., the dimensionality, Principal Component Analysis (PCA) has been applied so that the proposed classifier performs better. The resulting features have then been used to train various classifiers, and the accuracies and errors of those classifiers have been compared and reported. Some statistical analysis has also been performed for a better understanding of the dataset. These comparisons show that the proposed model (SIFT+PCA+CNN) achieves higher accuracy and lower error than the other classifiers. The results compare favourably with previous work and provide a baseline for the evaluation of future work.
The threat landscape is growing exponentially, and the situation worsens as a greater number of emerging devices such as Internet of Things (IoT) devices, embedded systems, and cyber-physical devices are connected to the Internet. To limit the damage of cyber threats, cyber criminals must be monitored continuously to understand the tools and techniques used by attackers, so that cyber defense mechanisms can be developed to protect cyber ecosystems. In this research, a multi-honeypot platform is presented as a tool for cyber threat intelligence generation, implementing multiple classes of honeypots such as Windows, IoT, and embedded honeypots. Honeypots are widely used by security researchers and security companies to understand attackers' tools and tactics, but they are quite complex to deploy and maintain, especially because of the diverse set of IT systems involved and the intensive resource requirements of high-interaction honeypots. This complexity is reduced in this research through a para-virtualization-based approach that enables multiple classes of honeypot sensors for different platforms on lightweight hardware. It is argued that the time window for collecting data and concluding that it constitutes cyber threat intelligence, with supporting evidence, should be determined probabilistically. After applying behavior analysis and deep learning methods to identify unknown threat patterns, the attack data sets are correlated into different cyber threat events and converted into actionable cyber threat intelligence so that the information can be disseminated in an automated manner. Finally, threat intelligence is generated and the experiments are documented. A deep-learning-based analysis inspired by neural networks is integrated into the design to identify unknown threat events.
In the age of digital information, images are used in fields such as military and medical applications, but the security of these transmitted images remains an open question. To overcome this challenge, an efficient encryption system has to be developed that ensures confidentiality, integrity, and security, and that prevents access to images by unauthorized users. The proposed image encryption system provides such enhanced security. It uses two important techniques based on chaos theory, namely confusion and diffusion. The confusion part uses block scrambling and a modified zigzag transformation, while the diffusion part uses a 3D logistic map and key generation followed by an additive cipher. The system also protects against statistical and differential attacks. Experimental results on entropy, histogram analysis, mean absolute error, Number of Pixels Change Rate (NPCR), and Unified Average Changing Intensity (UACI) show that the security of the images is preserved at a high level and that unauthorized access to sensitive information is prevented.
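For illustration, the sketch below shows only the diffusion idea with a chaotic keystream and an XOR-style additive cipher; it simplifies the paper's 3D logistic map to the classic 1D logistic map and omits the confusion stage (block scrambling and modified zigzag transform) entirely.

```python
# Illustrative sketch of the diffusion step only: a logistic-map keystream
# combined with pixel values. For simplicity this uses the classic 1-D
# logistic map rather than the paper's 3-D variant and omits confusion.
import numpy as np

def logistic_keystream(length, x0=0.3141, r=3.99):
    """Generate `length` chaotic bytes from the logistic map x -> r*x*(1-x)."""
    x, stream = x0, np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1.0 - x)
        stream[i] = int(x * 255) & 0xFF
    return stream

def diffuse(image, key=(0.3141, 3.99)):
    """XOR every pixel with the chaotic keystream; applying it twice decrypts."""
    flat = image.ravel()
    ks = logistic_keystream(flat.size, *key)
    return (flat ^ ks).reshape(image.shape)

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)   # toy "image"
cipher = diffuse(img)
assert np.array_equal(diffuse(cipher), img)                # XOR is its own inverse
```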
Rooftop solar energy potential has traditionally been estimated by surveying the number of large buildings in a given area. In this work, we propose a fast and low-cost method to estimate the rooftop photovoltaic solar energy generated in a particular area by utilizing satellite imagery, even though it may be of low resolution. We employ a deep learning based approach to carry out image segmentation on low resolution satellite images of Bangalore, India. Three different model architectures (U-Net, SegNet, FCN) were trained on hand-labelled data and tested for the task of semantic segmentation of satellite images. U-Net was found to yield the best results. By using a custom modified U-Net to segment the images, we arrive at the building rooftop area available for solar panels. To calculate the annual solar energy generated in gigawatt-hours (GWh), we use the incident solar insolation values from the U.S. National Renewable Energy Laboratory (NREL) based on observed weather patterns in Bangalore, together with standard values from photovoltaic manufacturers' datasheets.
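A back-of-the-envelope sketch of the final energy estimate, assuming typical datasheet values for panel efficiency and performance ratio; these constants and the example inputs are illustrative, not the paper's exact figures.

```python
# Sketch of the final energy estimate: usable rooftop area x insolation x
# panel efficiency x performance ratio. Efficiency and performance ratio are
# typical assumed values, not the paper's exact constants.
def annual_energy_gwh(rooftop_area_m2, insolation_kwh_per_m2_year,
                      panel_efficiency=0.18, performance_ratio=0.75):
    """Annual generation in GWh from segmented rooftop area and insolation."""
    kwh = (rooftop_area_m2 * insolation_kwh_per_m2_year
           * panel_efficiency * performance_ratio)
    return kwh / 1e6  # kWh -> GWh

# e.g. 2 km^2 of segmented rooftop area at an assumed ~1900 kWh/m^2/year
print(annual_energy_gwh(2_000_000, 1900))
```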
The migration of customers to other companies is known as "churn". Customers who abandon the services of a company or leave the organization are known as churners. Nowadays, churn prediction is one of the biggest challenges for organizations, since a company's reputation, finances, and growth plans are directly affected by churn. This makes research on churn all the more valuable. In this paper, the Python platform is used to implement two supervised classification models; since the data used is labelled, the problem falls under the supervised category of machine learning. The confusion matrices of the models are examined to conclude that KNN is a better approach than Logistic Regression for predicting customer churn: KNN is 2.0% more accurate than Logistic Regression. In terms of accuracy, the logistic algorithm is the less effective of the two at predicting customer churn compared to K-Nearest Neighbors.
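A minimal sketch of the comparison, assuming scikit-learn and a hypothetical labelled churn table; the file name, column names, and hyperparameters are placeholders.

```python
# Sketch: compare Logistic Regression and KNN on a labelled churn dataset.
# "churn.csv" and its columns are hypothetical; numeric features are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("churn.csv")                    # hypothetical labelled dataset
X, y = df.drop(columns=["churn"]), df["churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```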
Millimeter-wave (mmWave) communication combined with massive MIMO and the NOMA multiple access technology is an emerging technology for 5G and next-generation wireless communication. For mmWave-based communication, the fundamental challenge of massive MIMO is capacity improvement while minimizing cost and energy. Various solutions have been presented so far, but energy and spectrum efficiency still need to be optimized alongside capacity improvement and scalability. Beamspace MIMO-NOMA is a novel concept that integrates two technologies: non-orthogonal multiple access and beamspace MIMO. In beamspace MIMO communication, only a small number of RF chains is needed to serve users, so energy consumption is decreased, while with the help of NOMA more than one user can be served in each beam. This system can significantly reduce energy consumption, but such methods suffer from limited scalability. Designing a scalable, spectrum- and energy-efficient solution for mmWave communications is therefore the main research problem. In this paper, we propose a modified beamspace MIMO-NOMA scheme that uses user clustering. We use a precoding technique to mitigate interference, and a modified iterative power allocation to improve the achievable sum rate. Simulation results indicate that the proposed method is more efficient than baseline beamspace MIMO-NOMA.
The evolution of cluster computers based on multicore, many-core, and GPGPU accelerators is encouraging application developers to write hybrid parallel programs. Hybrid parallel programming is quite complex as it requires the use of multiple programming paradigms such as MPI, OpenMP, and CUDA/OpenCL to exploit the varied computational power available in a system. The paper brings out the challenges faced by application developers who wish to use heterogeneous HPC clusters. It describes a unified development environment which eases the complete development lifecycle of hybrid parallel programs on an HPC cluster. Owing to its modular design and web-based approach, the software is capable of providing access to multiple clusters of different architectures. The paper also serves as a good resource for researchers interested in gaining insight into hybrid parallel programming.
Manual inspection of satellite images for detection of objects of interest is very cumbersome and time-consuming. In this paper we report the use of the YOLO deep Convolutional Neural Network (CNN) architecture for automatic detection of air bases in satellite images. We trained the YOLO network with a custom dataset of about 360 air bases. The trained network demonstrated a recall of 0.7716 on the test images.
With recent developments in hardware and software technology, email is a preferred means of communication. However, unsolicited (spam) emails hinder this communication, so there is a need for the detection and classification of spam email. In the present research, email spam detection and classification models are built. We have used different machine learning classifiers such as Naive Bayes, SVM, KNN, Bagging and Boosting (AdaBoost), and ensemble classifiers with a voting mechanism. Evaluation and testing of the classifiers are performed on email spam datasets from the UCI Machine Learning repository and the Kaggle website. Different measures such as accuracy score, F-measure, recall, precision, support, and ROC are used. The preliminary results show that the ensemble classifier with a voting mechanism performs best, giving the minimum false positive rate and high accuracy.
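A minimal sketch of the voting ensemble, assuming scikit-learn and a hypothetical CSV of email text with a spam label; the base classifiers and hyperparameters are illustrative.

```python
# Sketch: TF-IDF features plus a soft-voting ensemble of three classifiers.
# "spam.csv" and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("spam.csv")                              # hypothetical dataset
X = TfidfVectorizer(stop_words="english").fit_transform(df["text"])
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("nb", MultinomialNB()),
                ("svm", SVC(kernel="linear", probability=True)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft")                                        # average predicted probabilities
ensemble.fit(X_train, y_train)
print(classification_report(y_test, ensemble.predict(X_test)))
```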
Recent years have seen a phenomenal change in healthcare paradigms, and data analytics combined with computational intelligence has been a key player in this field. One of the main objectives of incorporating computational intelligence in healthcare analytics is to obtain better insights about patients and offer more efficient treatment. This work is based on liver transplant patients under the National Liver Transplant Program of Uruguay, considering the health parameters of the patients in detail. Applying computational intelligence helped to separate the cohort into clusters, thereby facilitating an efficient risk-group analysis of the patients assessed under the liver transplantation program with respect to their corresponding health parameters, from a predictive pre-transplant perspective. This work also lays the foundation of Clinical Decision Support Systems in liver transplantation, which act as an assistive tool for medical personnel in gaining deeper insight into patient health data and, thanks to the holistic visualization of the healthcare scenario, also help in choosing a more efficient and personalized treatment strategy.
This paper presents the prediction of the level of damage to buildings caused by the Gorkha Earthquake in Nepal using machine learning techniques. The predictions have been made based on eight mathematically calculated tectonic indicators and past vibrational activity records. The objective of this research is to predict earthquake damage on an existing dataset of seismic activity using machine learning techniques. Two well-known machine learning approaches, Neural Networks (NN) and Random Forest (RF), have been implemented, and optimal parameters for accurate prediction have been investigated. The analysis reveals that the Random Forest method outperforms the neural network approach for building damage prediction. The F1 score obtained using Random Forest classification is 74.32%.
In Healthcare 4.0, remote patient monitoring (RPM) enables more powerful and flexible patient observation through wearable sensors at any time and from anywhere. RPM allows doctors to obtain real-time information about their patients remotely with the help of a wireless communication system. Thus, RPM reduces time and cost for the patient while providing quality care. To enhance the security and privacy of patient data, in this paper we present a permissioned blockchain-based healthcare architecture. We also discuss the associated challenges and their solutions, describe the applications of blockchain, and outline how machine learning can be used together with blockchain technology to impact the healthcare industry.
Nowadays, millions of users use social networking services such as Twitter to post reviews of products and services based on their opinions. Sentiment analysis has emerged as a way to analyze Twitter data automatically. Sentiment classification techniques are used to classify tweets about six different US airlines as positive, negative, or neutral according to the sentiment polarity expressed toward their flight services. To detect sentiment polarity in tweets, we explored word embedding models (Word2Vec, GloVe) combined with deep learning methods. We investigated sentiment analysis using a Recurrent Neural Network (RNN) model with Long Short-Term Memory (LSTM) units, which can deal with long-term dependencies by introducing memory into the network model for prediction and visualization. Using 80% of the data for training and 20% for testing, the results show significantly better classification accuracy, indicating that our models are reliable for future prediction. To further improve this performance, a Bidirectional LSTM model (Bi-LSTM) is used for additional investigation.
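A minimal sketch of the LSTM classifier, assuming Keras and integer-encoded tweets; the vocabulary size, layer sizes, and random stand-in data are illustrative, and the embedding layer could be initialized with Word2Vec/GloVe vectors instead of being learned from scratch.

```python
# Sketch: Bi-LSTM sentiment classifier over integer-encoded tweets.
# Vocabulary size, sequence length, and the toy data are illustrative.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Bidirectional, Dense

vocab_size, max_len, n_classes = 10_000, 40, 3           # positive / negative / neutral

model = Sequential([
    Embedding(vocab_size, 100),                           # word-embedding layer
    Bidirectional(LSTM(64)),                              # Bi-LSTM over the tweet
    Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data standing in for integer-encoded tweets (80/20 split as in the paper)
X = np.random.randint(1, vocab_size, size=(1000, max_len))
y = np.random.randint(0, n_classes, size=(1000,))
model.fit(X, y, validation_split=0.2, epochs=2, batch_size=32)
```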
Social media is a growing source of data, and social media activities such as WhatsApp group communications have opened up a myriad of data and information exchange, especially amongst the modern student community. On average, students tend to express their feelings and opinions far more freely amongst their net cronies than anywhere else. In this paper, the target is to evaluate the emotional state of students within such a group by extracting hidden nuances from their word usage as revealed by text analytic measures. The study is further augmented with records of personal statistics. Some of these, such as age, educational background, and social status, help to categorize the participants for ease of comparative analysis at a later stage. The rest, gleaned by collecting signatures and handwritten scripts of participants, support the discipline of graphology for delving into the depths of the writer's character traits. Overall, the primary goal is to propose a Case Based Reasoner (CBR) that holds the details of each student as an individual case with an attached sentiment polarity score reflecting personal positivity or negativity. The Case Base needs to undergo incremental learning until the model is judged to have gained sufficient experience to start predicting polarities on its own.
Over the past few years, fake news and its influence have become a growing cause of concern and a topic of debate and public discussion. Due to the availability of the Internet, a large amount of user-generated content is produced across the globe every day on various social media platforms. Nowadays, it has become very easy to create fake news and propagate it worldwide within a short period of time. Despite receiving significant attention in the research community, fake news detection has not improved significantly, owing to insufficient context-specific news data. Most researchers have treated fake news detection as a binary classification problem, but more prediction classes exist. In this research work, experiments have been conducted using a tree-based ensemble machine learning framework (Gradient Boosting) with optimized parameters, combining content-level and context-level features for fake news detection. Recently, adaptive boosting methods for classification problems have been derived as gradient descent algorithms; this formulation justifies key elements and parameters in the methods, which are chosen to optimize a single common objective function. Experiments are conducted on a multi-class dataset (FNC) and various machine learning models are used for classification. Experimental results demonstrate the effectiveness of the ensemble framework compared to existing benchmark results. Using the Gradient Boosting algorithm (an ensemble machine learning framework), we achieved an accuracy of 86% for multi-class classification of fake news with four classes.
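A minimal sketch of the multi-class setup, assuming scikit-learn, FNC-style stance labels, and a placeholder feature matrix in place of the paper's content and context features.

```python
# Sketch: multi-class Gradient Boosting classification with four stance-style
# classes. The random feature matrix stands in for content/context features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

classes = ["agree", "disagree", "discuss", "unrelated"]   # FNC-style labels (assumed)
X = np.random.rand(2000, 50)                       # placeholder feature matrix
y = np.random.randint(0, len(classes), size=2000)  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, gbm.predict(X_test)))
```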
We present an experimental study of applying Latent Dirichlet Allocation (LDA) and comparative visual analytics to trace socio-political issues highlighted within large corpora of political speech transcripts. In this experiment, over 500 speech transcripts were collected using custom-built scrapers in order to analyze this large body of transcripts and derive insights from it. Based on the LDA topic modelling algorithm, latent "topics", referred to as issues in this paper, were discovered in the speech transcripts and visualized using 'pyLDAvis', an interactive visualization tool for LDA model results. Along with LDA, graphical visualizations such as lexical dispersion plots and topic bar plots were generated using Python's Matplotlib library. For the comparative analytics, visual graphs were generated for speeches by two different candidates and juxtaposed to compare and interpret their discourse. Linguists have performed Political Discourse Analysis (PDA) using manual approaches, but analyzing such a large volume of speeches manually is time-consuming and extremely complex. Our experiment, which focuses on identifying socio-political issues within speech transcripts using NLP-based text analytics, proves to be a beneficial technique for supporting Political Discourse Analysis (PDA).
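A minimal sketch of the LDA step, assuming gensim and a deliberately naive tokenization; the toy transcripts and the number of topics are illustrative.

```python
# Sketch: fit an LDA topic model with gensim on a few toy "transcripts".
from gensim.corpora import Dictionary
from gensim.models import LdaModel

transcripts = [
    "healthcare jobs economy taxes jobs",
    "border security immigration policy jobs",
    "education healthcare taxes schools",
]                                            # stand-ins for scraped speech transcripts

docs = [t.lower().split() for t in transcripts]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)
for topic_id, terms in lda.print_topics(num_words=4):
    print(topic_id, terms)

# pyLDAvis can then visualise the fitted model, e.g. (recent pyLDAvis versions):
#   import pyLDAvis.gensim_models as gensimvis
#   vis = gensimvis.prepare(lda, corpus, dictionary)
```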
Optimizing the search for the best possible action, depending on various states such as the state of the environment and the system goal, has been a major area of study in computer systems. In any search algorithm, searching for the best possible solution among every known possibility can require constructing the whole state search space, as in the well-known minimax algorithm. This can lead to impractical time complexities that are unsuitable for real-time search operations. One practical way to reduce the computational time is Alpha-Beta pruning: instead of searching the whole state space, we prune the unnecessary branches, which reduces the time by a significant amount. This paper focuses on the various possible implementations of the Alpha-Beta pruning algorithm and gives insight into which variants can be parallelized. Various studies have been conducted on how to make Alpha-Beta pruning faster. Parallelizing Alpha-Beta pruning for GPU-specific architectures such as mesh (CUDA) or for the shared-memory model (OpenMP) helps reduce the computational time. This paper studies the comparison between sequential and different parallel forms of Alpha-Beta pruning and their respective efficiency, using the game of chess as an application.
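For reference, a sequential Alpha-Beta pruning sketch over a plain nested-list game tree; the toy tree and leaf scores are illustrative, and the parallel CUDA/OpenMP variants discussed in the paper are not shown.

```python
# Sequential Alpha-Beta pruning on a nested-list game tree (baseline version).
import math

def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimax with alpha-beta cut-offs; `node` is a score or a list of children."""
    if depth == 0 or not isinstance(node, list):   # leaf node
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                      # beta cut-off: prune remaining children
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:                          # alpha cut-off
            break
    return value

tree = [[3, 5, [2, 9]], [[0, 7], 1], [8, [4, 6]]]  # toy game tree
print(alphabeta(tree, depth=4, alpha=-math.inf, beta=math.inf, maximizing=True))
```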
Presently, share price prediction is a challenging problem for the research community, as the share market is a chaotic place. This is due to several factors such as government policies, international markets, weather, and company performance. In this article, a model has been developed using long short-term memory (LSTM) to predict the share price of the DLF group. For experimental purposes, DLF group data from 2008 to 2018 has been taken from Yahoo financial services, and the recurrent neural network (RNN) model has been trained on data from 2008 to 2017. This RNN-based model has then been tested on the data for 2018. For performance comparison, other regression algorithms, i.e., k-NN regression, lasso regression, and XGBoost, have been executed, and the proposed algorithm outperforms them with a root mean square error of 2.6%.
In this paper a novel approach is proposed for dynamic slicing of the UML Interaction Overview Diagram. The proposed technique efficiently deals with the changing, or dynamic, behavior of the system as illustrated by the UML Interaction Overview Diagram. First, the Interaction Overview Diagram is converted to an Interaction Overview Dependency Graph (IODG) by considering all the included sequence diagrams. Next, a dynamic slicing algorithm is proposed that traverses the graph for a given slicing criterion and creates the slice. The basic working principle of the algorithm is to mark and unmark the edges of the Interaction Overview Dependency Graph as dependencies between messages emerge and end during execution. The Interaction Overview Diagram is beneficial over other UML diagrams as it focuses on the overall interactions carried out in a system and helps represent complex scenarios in a single diagram. Moreover, the use of sequence diagrams as nodes simplifies the process and focuses on the sequence of execution of different messages.
Entity Resolution (ER) is a prerequisite for several Web applications, including enhancing semantic search and information extraction from the Web, strengthening the Web of Data by interlinking entity descriptions from autonomous sources, and supporting reasoning using related ontologies. When designing an ER system, it is assumed that each entity profile consists of a uniquely identified set of attribute-value pairs, that each entity profile corresponds to a single real-world object, and that two similar profiles are identified when they co-occur in at least one block. ER is an inherently quadratic problem (i.e., O(n²)), given that every entity must be compared with every other. Moreover, existing ER techniques fail to scale to large entity collections such as Web data. The best-known solution in the literature for addressing large-scale ER is blocking, an approximate solution in which similar entities are grouped into blocks and comparisons are limited to entities within the same block. This paper discusses the process of entity resolution and the types of entity resolution for relational and Web data. Further, the paper reviews the literature on approaches to entity resolution introduced by previous researchers. The data integration, block building, and block processing phases, and the challenges involved in designing an efficient ER system, are discussed. The paper concludes with the measures required to evaluate entity resolution approaches.
Executable files such as .exe, .bat, and .msi are used to install software on Windows-based machines. However, downloading these files from untrusted sources carries the risk of malicious content. Moreover, such executables are intelligently modified by malicious users to bypass antivirus definitions. In this paper, we propose a method to detect malicious executables by analyzing the Portable Executable (PE) structure extracted from executable files. We trained a supervised binary classifier using features extracted from the PE files of normal and malicious executables. We evaluated our method on a large, publicly available dataset and report more than 95% classification accuracy.
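A minimal sketch of the pipeline, assuming the pefile library and a hypothetical set of labelled sample paths; the chosen header features and the Random Forest stand-in are illustrative, not the paper's exact feature set or classifier.

```python
# Sketch: extract a few PE-header features with `pefile` and train a binary
# classifier. Feature choice and the Random Forest are illustrative only.
import pefile
from sklearn.ensemble import RandomForestClassifier

def pe_features(path):
    """A small, illustrative feature vector from a PE file's headers/sections."""
    pe = pefile.PE(path)
    return [
        pe.FILE_HEADER.NumberOfSections,
        pe.OPTIONAL_HEADER.SizeOfImage,
        pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        pe.OPTIONAL_HEADER.DllCharacteristics,
        max((s.get_entropy() for s in pe.sections), default=0.0),
    ]

# `benign_paths` / `malicious_paths` are hypothetical lists of labelled samples:
# X = [pe_features(p) for p in benign_paths + malicious_paths]
# y = [0] * len(benign_paths) + [1] * len(malicious_paths)
# clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```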
A Software-Defined Network (SDN) is a network defined by software; this is one of the important features that make legacy networks flexible enough for dynamic configuration, so that they can cater to today's dynamic application requirements. An SDN is a programmable network, but it is prone to different types of attacks due to its centralized architecture. This paper provides a solution to detect and prevent Distributed Denial of Service (DDoS) attacks. Mininet [5], a popular emulator for Software-Defined Networks, is used. We follow an approach in which traffic statistics are collected from the various switches. From these statistics we calculate the packet rate and bandwidth, which shoot up to high values when an attack takes place. This abrupt increase is used to detect the attack, which is then prevented by changing the forwarding logic of the host nodes to drop the packets instead of forwarding them. After this, no more packets are forwarded, and the corresponding forwarding rule is deleted from the flow table. Hence, we detect the attack from the change in packet rate and bandwidth, and we prevent it by modifying the forwarding logic of the switch flow table to drop packets coming from the malicious host instead of forwarding them.
Recent research results show that ontology can be used to improve the accuracy of document clustering. Previous studies mainly focused on the preprocessing of text documents using ontology. In this paper, we propose a hybrid approach, concentrating on both the preprocessing task and the clustering algorithm, with the objective of reducing the number of features and the execution time, eliminating synonymy problems, and enhancing clustering accuracy. Cosine similarity is used as the similarity measure. The preprocessing part uses a WordNet-ontology-based feature extraction method. In clustering, the initial centroids are found by applying a Red-Black-Tree-based sorting method. The data points are allocated to suitable clusters using a novel approach that maintains the path of similarity between data points and the nearest cluster centroids. Experimental results of some existing clustering algorithms with cosine similarity are compared with our novel clustering technique. The results show that the proposed hybrid approach performs better on the Newsgroup dataset, with considerable improvements in dimensionality reduction, running time, and accuracy.
In this paper, a modified Sierpinski-based planar crossover is presented. A square patch crossover is taken as the base, from which square-slot and circular-slot Sierpinski methods are analysed. The proposed method yields an 8 dB enhancement in isolation for the circular-slot Sierpinski design on the square patch base crossover compared with the plain square patch crossover. The simulations are performed using ANSYS HFSS. The proposed crossover designs are potential candidates for antenna beamforming and GPS applications.
The huge volume, variety, and velocity of data resulting from the digitalization of various sectors have led to an information explosion. Storing and retrieving huge volumes of data requires appropriate data structures that contribute to the performance optimization of computing systems. The trie and its variants are popular in applications ranging from substring search to auto-completion, where strings are used as keys. The trie data structure is characterised by large memory requirements, so it is infeasible to store it in primary memory, especially for embedded devices where memory is constrained; implementing a trie in secondary storage is therefore a feasible solution for embedded devices. The B tree, B+ tree, B-trie, and Burst trie are reported data structures designed to minimize read and write latency in secondary memory. We propose a file-based trie that performs insert, search, and delete operations without explicitly loading the trie data into primary memory. We compare the performance of the file-based linked-list trie with the B-trie, B trees, and FlatBuffers on strings from a standard dictionary. Our results demonstrate that the linked-list trie takes 40 percent less lookup time compared to the B-trie and B tree, and 29 percent less lookup time compared to FlatBuffers. The linked-list trie consumes 7 KB of main memory and provides support for characters of multiple languages. The implementation can be further investigated for applications involving query completion, prefix matching, etc.
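For illustration, a simplified in-memory trie with insert, search, and (lazy) delete; the paper's file-based variant additionally keeps nodes as records in secondary storage so the trie never has to be loaded into primary memory, which is not shown here.

```python
# Simplified in-memory sketch of trie insert/search/delete.
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def delete(self, word):
        """Lazy delete: clear the end-of-word flag if the word exists."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        node.is_word = False
        return True

t = Trie()
t.insert("tree"); t.insert("trie")
print(t.search("trie"), t.search("tri"))   # True False
```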
In computer vision, object recognition is a very important and very challenging component. The intention of this paper is to develop a high-confidence object detection framework that boosts classification performance with a lower computational burden and cost. Features are extracted from images using the Histogram of Oriented Gradients (HOG) technique, and Principal Component Analysis (PCA) is then applied to the extracted features to generate principal components and reduce dimensionality. For classifying the objects, Support Vector Machine (SVM), Random Forest, Input Mapped classifier, M5P classifier, and Gaussian Process classifier have been employed, and a comparative study of the performance of these approaches has been conducted. Moreover, for better characterization of the dataset, statistical and automated analyses have been considered. The overall findings demonstrate that the Principal Component Analysis (PCA)-based Support Vector Machine outperforms the other approaches, with an accuracy of 94.02% and the highest F-score.
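A minimal sketch of the HOG, PCA, and SVM pipeline, assuming scikit-image and scikit-learn and using the small scikit-learn digits set as a stand-in for the paper's image data; the HOG and PCA parameters are illustrative.

```python
# Sketch: HOG features -> PCA -> SVM, demonstrated on the digits toy dataset.
from skimage.feature import hog
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

digits = load_digits()
features = [hog(img, orientations=9, pixels_per_cell=(4, 4),
                cells_per_block=(2, 2)) for img in digits.images]

X_train, X_test, y_train, y_test = train_test_split(features, digits.target,
                                                    random_state=0)
model = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```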
Agricultural production is affected by climatic changes, i.e., humidity, rain, temperature extremes, etc. Additionally, abiotic stresses are causative elements in the etiology of crop diseases and pests. Crop production can be improved by accurately diagnosing and detecting diseases on time, at an early stage. Moreover, accurate detection and treatment are very difficult with the techniques currently used for diagnosing diseases and insect pests. A few researchers have made efforts to predict crop diseases and pests using machine learning algorithms. Therefore, this paper presents disease identification in corn crops by analyzing the leaves at a very early stage. We have used the PlantVillage dataset for the experiments and analysis. The validity of the results has been checked against various performance metrics such as precision, accuracy, recall, storage space, running time of the model, and AUC-ROC. The obtained results show that the proposed technique outperforms traditional machine learning algorithms. The developed model is able to achieve an accuracy of 94%.
Dengue is a disease caused by four types of related viruses transmitted by mosquitoes, most commonly Aedes aegypti. The disease is considered an alarming threat to the health of populations spanning millions of people living in tropical and subtropical areas of the globe where the mosquito thrives. A large number of studies have confirmed that the rise of dengue is positively correlated with climate and climatic conditions, specifically humidity, temperature, and precipitation levels. Many of these studies include quantitative models correlating climate variables with the incidence of dengue cases. The quantitative models invite the question: how well would we be able to predict future cases of the disease based on climate variables that are included in weather forecasts? To answer this question, we conducted a study on dengue forecasting using machine learning, utilizing climate and dengue data (made available to data scientists by the US government) to forecast future dengue epidemics. In this research we propose a novel twofold linear regression model that outperforms all previous models: we achieve a mean absolute error of 19.81, which is lower than that of traditional machine learning techniques. Moreover, we have analyzed various neural network, support vector machine, random forest, boosted tree, and XGBoost based predictive models and evaluated their performance against the proposed method.
Nowadays, face detection is common and is used in many areas. With the help of face detection benchmark datasets, much progress has been made. However, the face detection methods used today do not match real-world requirements, and with new advancing technologies and services we need to upgrade our existing systems. By using the WIDER FACE dataset, which is much larger than existing datasets, we can improve performance. The WIDER FACE dataset contains many faces that may be challenging, as it includes faces under different conditions. For the face detection task, the WIDER FACE dataset is well suited for training and for testing the accuracy of a model, but existing face detection algorithms and models are not up to the mark and have major limitations. We therefore created a WIDER FACE based detection system that helps us overcome these issues.
In this competition, we are provided a strictly canine subset of ImageNet in order to practice fine-grained image categorization. The dataset contains images of dogs of different breeds. Deep learning is a technique by which a computer program learns statistical patterns within the data that enable it to recognize, or help us distinguish between, the different breeds of dogs. The model trains itself on different features based on the images present and represents the data numerically, organizing the data in space. Initially, the image is divided into numerous lattices and a training batch size is set accordingly; then an algorithm is used to split and combine the descriptors, and the channel information of the image is extracted as the input of the convolutional neural network. Finally, we design a convolutional neural network-based model to identify the dog breed.
A large volume of data increases the performance of machine learning algorithms and helps avoid overfitting. Collecting a large amount of training data in the agricultural field for designing a plant leaf disease detection and diagnosis model is a highly challenging task which takes considerable time and resources. Data augmentation increases the diversity of training data for machine learning algorithms without collecting new data. In this article, augmented plant leaf disease datasets were developed using basic image manipulation and deep-learning-based image augmentation techniques, namely image flipping, cropping, rotation, color transformation, PCA color augmentation, noise injection, Generative Adversarial Networks (GANs), and Neural Style Transfer (NST). The performance of the data augmentation techniques was studied using state-of-the-art transfer learning techniques, for instance VGG16, ResNet, and InceptionV3. Extensive simulation shows that the datasets augmented using the GAN and NST techniques achieve better accuracy than the original dataset and the dataset augmented with basic image manipulation. Furthermore, a combination of the deep learning, color, and position augmentation datasets gives the best classification performance of all the datasets.
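A minimal sketch of the basic image-manipulation augmentations (flipping, rotation, cropping, noise injection), assuming Pillow and NumPy; the GAN and Neural Style Transfer augmentations are far more involved and are not shown.

```python
# Sketch: simple image-manipulation augmentations for a leaf image.
import numpy as np
from PIL import Image, ImageOps

def basic_augmentations(img):
    """Yield a few simple augmented variants of a leaf image."""
    yield ImageOps.mirror(img)                                   # horizontal flip
    yield ImageOps.flip(img)                                     # vertical flip
    yield img.rotate(30, expand=True)                            # rotation
    yield img.crop((10, 10, img.width - 10, img.height - 10))    # crop

    arr = np.asarray(img, dtype=np.float32)
    noisy = np.clip(arr + np.random.normal(0, 15, arr.shape), 0, 255)
    yield Image.fromarray(noisy.astype(np.uint8))                # Gaussian noise injection

# Hypothetical usage with a local file:
# for i, aug in enumerate(basic_augmentations(Image.open("leaf.jpg"))):
#     aug.save(f"leaf_aug_{i}.png")
```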
Disease prevention and health management, rather than treatment after infliction, is becoming the key focus of diagnostic medicine these days. Research advancements in using artificial intelligence for healthcare have enabled personalized diagnostic and treatment applications for disease identification, management, and prevention. What was previously limited by the vast amount of data involved has now become possible because of deep neural networks and their ability to model complex diagnostic decisions. Mobile healthcare applications powered by deep neural networks have enabled people to triage their conditions and make preemptive treatment decisions. In this paper, we review recent advancements in mobile healthcare applications using deep learning over the past 2-3 years and classify these implementations into three major categories: cloud-computing-assisted mobile healthcare systems, edge-computing-assisted mobile healthcare systems, and offline mobile healthcare systems. Furthermore, based on the recent literature, we identify an initial framework that most mobile healthcare applications using deep learning employ.
In rural India, about 22.9% of deaths are due to heart disease. Lack of access to efficient healthcare services in rural areas is one of the leading causes of this loss of life. Even though wearable devices are considered one of the most efficient ways to provide better healthcare services, many doctors discourage their use because of the noise and motion artifacts present in the signals acquired by these devices. This work focuses on the performance enhancement of 'AmritaSpandanm', a wearable wireless ECG device that provides real-time ECG signals even when the patient is engaged in routine activities. For this, a context-aware system is designed and developed to continuously collect physical activity data, classify the real-time signals using an innovative classifier algorithm, and tag the ECG signal based on the classifier results. Using the results of the classifier algorithm, the motion artifacts in the ECG data are removed using two methods, namely adaptive filtering and the wavelet transform. The complete system has been implemented and tested on 35 individuals. The results obtained using the wavelet transform show 99 percent classification accuracy, better than the adaptive filtering method; the wavelet transform is therefore the better method for removing motion artifacts. Hence the proposed system is capable of capturing both the physical activity and the ECG data of individuals and of providing an ECG signal free from noise and motion artifacts.
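A minimal sketch of the wavelet-based denoising step, assuming the PyWavelets library; the wavelet, decomposition level, and universal threshold are illustrative choices, not the paper's exact settings.

```python
# Sketch: denoise an ECG-like signal by soft-thresholding wavelet detail
# coefficients and reconstructing. Wavelet/level/threshold are illustrative.
import numpy as np
import pywt

def wavelet_denoise(ecg, wavelet="db4", level=4):
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    # Noise estimate from the finest detail band (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(ecg)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(ecg)]

fs = 250                                           # Hz, assumed sampling rate
t = np.arange(0, 10, 1 / fs)
noisy_ecg = np.sin(2 * np.pi * 1.2 * t) + 0.4 * np.random.randn(len(t))  # toy signal
clean = wavelet_denoise(noisy_ecg)
```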
Physicians often seek scientific evidence regarding how best to care for their patients while making clinical decisions. This scientific evidence is available in the form of published medical articles, reports, and clinical trials. Considering the volume of existing medical literature and the pace at which medical research is growing, getting to the most relevant information is a tedious task. In this paper, we describe an empirical approach to fetch relevant medical articles from the PubMed collection (about 733,328 articles, 45.2 gigabytes) for a given query. Our IR system comprises three parts: inverted indexing using Lucene, lexical query expansion with MetaMap to increase recall, and reranking aimed at optimizing the system. Word senses of ambiguous terms are introduced to limit the negative effects that synonymy-based query expansion may have on precision. The resulting ranked list is then re-ranked with learning-to-rank algorithms. We evaluated our system using 30 medical queries, and the results show that it can handle various medical queries effectively and efficiently. The final results also demonstrate that the ensemble approach performs better than the Lucene baseline by boosting the ranking of articles that appear near the top of several individual ranked lists.
This work focuses on using digital signal processing techniques to analyze and extract audio features and on using them to predict the type of event that might have taken place in an audio signal with supervised machine learning approaches. The performance of five classification approaches using different feature subsets was analysed. The feature subsets include frequencies of the segmental features, frequencies of the supra-segmental features, and a combination of both; this gives insight into the relative importance of the feature subsets and into the need to extract new features from existing ones. Features were extracted after the audio signal was filtered using a low-pass Butterworth filter with a cutoff frequency of 1500 Hz; it was found that including features of the difference signals improved the performance of the learning algorithms. The work also includes tuning the parameters of the classification approaches to improve performance. The observations and inferences drawn from the experimental results can potentially be used for designing robust surveillance systems for rare event detection.
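A minimal sketch of the pre-processing step, assuming SciPy; the filter order and the sampling rate are assumed values, and only the 1500 Hz cutoff comes from the text.

```python
# Sketch: zero-phase low-pass Butterworth filtering of an audio signal before
# feature extraction; the difference signal is formed from the residual.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, fs, cutoff=1500.0, order=4):
    """Zero-phase low-pass Butterworth filtering of a 1-D audio signal."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, signal)

fs = 16_000                                     # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)
filtered = lowpass(audio, fs)                   # the 5 kHz component is attenuated
difference = audio - filtered                   # one way to form a difference signal
```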
In the recent past, data mining, artificial intelligence, and machine learning have gained enormous attention as means to improve hospital performance. In some hospitals, medical personnel want to improve their statistics by decreasing the number of patients dying in the hospital. This research is focused on the prediction of mortality and of measurable outcomes, including the risk of complications and the length of hospital stay. The duration a patient spends in the hospital plays an important role for both patients and healthcare providers and is influenced by numerous factors. Length of stay (LOS) in critical care is of great importance, both to the patient experience and to the cost of care, and is influenced by the complex environmental factors of hospitals. LOS is a parameter used to characterize the severity of illness and health-related resource utilization. This paper provides an improved prediction rate for whether a patient survives or dies within a given range of length of stay in the hospital. It also anchors the analytical methods for length-of-stay and mortality prediction.
Automatic detection and counting of moving objects in a video is a challenging task and has become a key application area of deep learning algorithms, given the overwhelming number of vehicles on roads and problems such as crime detection. In this paper, we discuss the details of an intelligent agent developed for the detection and counting of moving vehicles in a video stream captured by a surveillance camera. The agent works in two phases: object detection, followed by counting of moving vehicles. For object detection, we applied the state-of-the-art deep learning object detection algorithm Single Shot Detector (SSD). For counting vehicles, we devised two different algorithms which take their input from the object detection phase. The first counting algorithm adds robustness to a naive approach, while the second removes the dependency on capture-environment parameters such as imaginary lines and the direction of moving vehicles. Input to the agent is fed from a surveillance camera in the form of a video stream. We used transfer learning by adopting standard Convolutional Neural Network models trained on the COCO dataset. The algorithms we developed for counting moving vehicles are discussed, and empirical results demonstrating their efficacy are presented. We also discuss the challenges faced in counting moving vehicles in the video stream.