Methodology for Simulation of IoT-Based WSN Flood Data Collection Infrastructure

The Internet of Things (IoT) is a promising technology for data collection tasks in real-world environments. However, because a real IoT implementation is difficult to deploy and test, many researchers have to use software simulation to evaluate their proposed techniques. This paper focuses on the use of IoT-based WSN for collecting flood-related data, which would then be used by flood-related applications such as flood prediction applications and flood early warning systems. This paper proposes a methodology for simulating the IoT system used for flood data collection. The proposed methodology consists of four main steps: identifying the flood environment, defining the architecture for flood data collection, simulating the IoT-based WSN flood data collection infrastructure, and analyzing the results. The activities for each step are described in detail to guide other researchers in the same area in adapting the methodology to their research work.


I. INTRODUCTION
The Internet of Things (IoT) is increasingly being deployed in many areas such as health, transportation, smart cities, environmental monitoring and disaster management. These IoT deployments are used to collect data from the real world using things, where a 'thing' can be a device or a human. The data collected by the things will be processed and/or transferred to a central location, where it will be used by an information system to help in making decisions on a specific problem or situation. Of particular interest to this paper is the use of IoT to assist in making decisions related to flood disasters.
Wireless Sensor Network (WSN) is one of the driver technologies for IoT [1]. In flood disasters, WSN is employed to collect flood data that is used for monitoring and recording environmental conditions. The flood data collected will be transferred to a central location before it can be used by a flood application. A WSN consists of a group of spatially dispersed, self-powered sensing devices called nodes. The nodes collect information such as water level, rainfall, wind speed and flow rate. All these nodes are low-cost, low-power, resource-constrained, multi-functional sensor nodes with limited computational and sensing capabilities, often operating in an unattended, hostile environment. Because of these factors, a WSN used for flood monitoring is susceptible to a large number of security attacks, including attacks that disrupt the network, attacks that access or modify information in transit, and attacks on a node that alter its behaviour.
Research on Wireless Sensor Networks (WSN) and the Internet of Things (IoT) often resorts to a software simulation platform for evaluation. Simulation is needed for research purposes because most of the studies cannot be implemented physically. Based on the literature review conducted, a considerable number of simulation works involve the simulation of data collection in an IoT-based WSN environment. However, none of the studies has provided a detailed description of the methodology used to simulate the IoT-based WSN flood data collection setting.
The aim of this paper is to propose a methodology for simulating the IoT-based WSN flood data collection environment.

II. IOT-BASED WSN IN FLOOD
The use of IoT-based WSN in collecting flood data provides significant benefits over traditional approaches. IoT-based WSN is the integration of WSNs and the Internet Protocol (IP), which connects the nodes to the Internet. This configuration allows flood data collected by WSN nodes to be transferred to centralized storage or cloud storage. These data become an input to various flood application systems [2]-[9]. Flood prediction is one of the most critical applications: it helps to predict flood occurrences, and once a flood is predicted to occur, an alert can be sent to the people living in the affected area [12]. Accurate and early flood detection is crucial and has the potential to save lives. The accuracy and timeliness of the warning depend heavily on the flood-related data collected by the IoT-based WSN system.
A WSN is a collection of nodes that function cooperatively to form a decentralized network system. In a flood data collection environment, a sensing area is identified and a group of nodes is installed in the area. Each sensor in this group has its own function within the WSN. For example, in an IoT system deployed to collect river data, the nodes installed are used to collect data such as water level, sediment level and water velocity. The sensing areas lie along the river, and the WSN nodes form a network for each sensing area, as shown in Figure 1 below.
Figure 1. WSN installed at the river
Each sensing area consists of sensor nodes and a sink node. Data collected by the sensor nodes is transmitted to the sink node, and from the sink node the data is further transmitted through multi-hop techniques to the base station. The base station sends the data to the central server or cloud-based storage to be stored. The communication technology used for each data transmission may differ according to the distance between the nodes and whether the link is wired or wireless. Zigbee wireless technology, which has a limited transmission range of less than 100 meters [10], can be used for the short-range transmission between the sensor nodes and the sink node. For wireless transmission over a longer distance, which may involve transmission from one sink node to another sink node and from a sink node to a base station, a technology such as Low Power Wide Area Network (LPWAN) can be used. Data from the base station can be transferred to the central server or cloud-based storage using satellite or cellular technology such as a Global System for Mobile Communications (GSM), 4G, or 5G communication link. The data stored in the storage server would later be retrieved by a flood management application.
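The tiered choice of communication technology described above can be sketched as a small lookup. This is only an illustration: the 100 m Zigbee limit is taken from the text [10], while the `Hop` labels, the `pick_link_technology` helper and the rule that every longer wireless hop falls back to LPWAN are assumptions made for the example.

```python
from enum import Enum


class Hop(Enum):
    """Stages of the data path described in this section (labels are illustrative)."""
    SENSOR_TO_SINK = "sensor->sink"
    SINK_TO_SINK = "sink->sink"
    SINK_TO_BASE = "sink->base station"
    BASE_TO_SERVER = "base station->server"


# From the text: Zigbee has a transmission range of less than 100 m [10].
ZIGBEE_MAX_RANGE_M = 100.0


def pick_link_technology(hop: Hop, distance_m: float) -> str:
    """Return a candidate technology for one hop (hypothetical helper)."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if hop is Hop.BASE_TO_SERVER:
        # The text names satellite or cellular links (GSM, 4G, 5G) for this stage.
        return "cellular/satellite"
    if hop is Hop.SENSOR_TO_SINK and distance_m < ZIGBEE_MAX_RANGE_M:
        return "Zigbee"
    # Assumption: any longer wireless hop uses an LPWAN technology.
    return "LPWAN"


if __name__ == "__main__":
    for hop, dist in [(Hop.SENSOR_TO_SINK, 40), (Hop.SINK_TO_BASE, 2500)]:
        print(hop.value, dist, "m:", pick_link_technology(hop, dist))
```

A real deployment would also weigh data rate, power budget and terrain, which this lookup ignores.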
Simulating the IoT implementation for flood data collection is valuable because it helps in planning the required infrastructure before it is actually deployed. Furthermore, simulating the flood data collection helps in evaluating the performance and effectiveness of the WSN in collecting the flood data.

III. FLOOD SIMULATION METHODOLOGIES
WSN has been the main technology used for collecting flood data for forecasting and monitoring flood events. This section discusses existing works by researchers on simulating the WSN data collection process.
Abdullah [11] developed a flood model and simulated it using the OMNET++ simulator. The flood model is evaluated using parameters such as power consumption, end-to-end delay and throughput to assess the availability and scalability of the network. From the simulation, the author identified the best model for a flood monitoring system for a specific location in Saudi Arabia. The author summarizes the simulation in two steps. The first step is to set up the model, which consists of the monitored area, the number of base stations, the number of sensors and the sensor installation. The second step is to run the simulation on the model and measure its performance based on the performance metrics, availability and scalability of the model.
The Flood/Drought Forecasting System (FDFS) [12] was developed to help the authorities generate early warnings for flood or drought. WSN is used to collect data as input for the system. The author explained the FDFS work, starting from defining the WSN architecture for the data collection and then using the NS2 simulator to simulate and evaluate the WSN architecture based on energy consumption, memory and processing overhead for a set of sensor nodes in the sensing area.
An embedded system based on IoT and machine learning has been proposed to predict the probability of floods in a river basin [13]. The author used the NS2 simulator to simulate a simplified version of the WSN network topology and to implement the machine learning model for flood prediction, but did not describe the simulation procedure in detail. The author identified the simulation parameters to be configured before the simulation takes place. These parameters include the size of the coverage area, the number of sensor nodes involved, packet size, transmission protocol, simulation time, routing protocol and initial energy of each node. Once the simulation is done, MATLAB is used to visualize the results.
The following research works also discuss the use of WSN in collecting flood data, but they do not describe in detail how the simulation was conducted. Lisangan et al. [14] designed a prototype for detecting floods in the city of Makassar using WSN. The design of the WSN in terms of system infrastructure, the communication path between sensor nodes, and the technique used to visualize the data are explained. The authors concluded that the data collected by the WSN can help in monitoring the network congestion on the wireless public network and floods in Makassar. The work of Kei Hiroi et al. [15] focuses on early prediction of flash floods using water level sensors for urban complex water flow (CWF). The result of the flash flood monitoring is compared to the actual height of the water level. Thekkil et al. [16] developed an early flood detection and control system that captures real-time images using WSN. The image is compared with existing data before a warning is generated. The details of the system implementation are presented and the result of the actual implementation is discussed. Nuhu et al. [17] simulated an energy-efficient and accurate flood monitoring system that used 6LoWPAN as a communication protocol. The actual small-scale system is implemented and tested using the emulator. Alfarra et al. [18] proposed a system for detecting the occurrence of floods by measuring the water level and streamflow in rivers. The simulation focuses on the size of the data being transferred to the base station. Lukic et al. [19] proposed a system for river monitoring based on WSN. WSN is used for collecting environmental data such as flow rate, water level, rainfall and pollution level. All these environmental parameters are collected and measured for a month. Simulation of the river monitoring is done using MATLAB. The results show that the environmental parameters were accurately measured during the observation period using the proposed simulation model.
From the works described above, we can conclude that WSN is one of the main technologies used in collecting flood data. Simulation is done to analyze the performance of the WSN before it is deployed in a real environment. The performance of the WSN is measured using performance parameters, which are identified before the simulation begins. Simulation is used to test the functionality of an IoT system and to evaluate the developed flood models. The functionality of an IoT system is simulated to evaluate whether the system will function well in the intended sensing environment. The result of the simulation helps in identifying the best flood model to implement in a real environment. Although the studies above have shown that simulation is the main practical approach to analyzing the deployment of WSN in a real environment, none of them has provided a clear description of the methodology that should be used to develop and run the simulation. For this reason, this paper proposes a methodology for simulating the IoT-based WSN flood data collection.

IV. PROPOSED IOT-BASED FLOOD DATA COLLECTION SIMULATION METHODOLOGY
This section describes the proposed simulation methodology, which consists of four main steps. Each step is described as generically as possible so that it can be applied in many applications. Figure 2 shows the four main steps of the proposed flood data collection simulation.

A. Identify the Flood Environment
Flood environment refers to the situation or environment that needs to be simulated. The environment may include a river flood, a snowmelt flood, a flash flood that occurs in an underground parking area, or a flood that happens in a city area due to blocked drainage. Identifying the type of flood environment is essential because it affects the next step in this simulation methodology.

B. Define the Architecture for Flood Data Collection
The architecture for flood data collection depends on the type of flood environment identified in Step A. In this step, the details of the environment are defined. These may include the size of the area affected by the flood, the types of data that need to be collected, the method by which the data is to be transmitted, and the data storage method. The area size determines the size of the sensing areas to be included in the architecture. A sensing area is the area where the data will be collected. If the architecture consists of more than one sensing area, it needs to consider the method by which the collected data is transferred to the central server. This can be done either by transmitting the data through an infrastructure-based network such as the cellular network, or by sending the data from one node to another until it arrives at a sink node that then transmits the data to the central server. Furthermore, a sensor or a group of sensors is configured in the sensing area according to the types of data that need to be collected.
Referring to Figure 1 in Section II, the configuration may include three types of devices: sensor node, sink node and base station. Table 1 below shows the three types of nodes and their definitions. In each sensing area, each sensor node transfers data to a sink node. The sink node then transfers the data to another sink node if it is far away from the base station. The last sink node transfers the data to the base station, and the base station transfers it to the central or cloud storage. For each stage of the data transfer, the communication link needs to be configured with a specific wired or wireless communication technology.
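The three device types and the hop-by-hop data path can be captured in a few lines before any simulator is involved. The sketch below is a minimal illustration under assumed names (`Node`, `path_to_storage`, and the node labels are hypothetical); it follows each node's next hop until the base station is reached and then appends the central storage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One device in the architecture; role is 'sensor', 'sink' or 'base_station'."""
    name: str
    role: str
    next_hop: Optional["Node"] = None  # where this node forwards its data


def path_to_storage(source: Node) -> List[str]:
    """Follow next_hop links from a node to the base station (illustrative helper)."""
    path = [source.name]
    visited = {id(source)}
    node = source
    while node.next_hop is not None:
        node = node.next_hop
        if id(node) in visited:
            raise ValueError("routing loop detected")
        visited.add(id(node))
        path.append(node.name)
    if node.role != "base_station":
        raise ValueError(f"{node.name} has no route to a base station")
    # The base station forwards the data to the central or cloud storage.
    path.append("central_storage")
    return path


if __name__ == "__main__":
    base = Node("bs", "base_station")
    sink_b = Node("sink_b", "sink", next_hop=base)
    sink_a = Node("sink_a", "sink", next_hop=sink_b)  # far from the base station
    sensor = Node("water_level_1", "sensor", next_hop=sink_a)
    print(" -> ".join(path_to_storage(sensor)))
```

Checking the paths this way before configuring the simulator makes it easy to spot a sensing area whose sink node has no route to the base station.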

C. Simulate the IoT-Based Flood Data Collection Infrastructure
This step is the primary process in simulating the flood data collection. It is further broken down into several sub-processes, as depicted in Figure 3.

1) Select the Simulation Platform
Many types of simulators can be used to simulate an IoT-based environment. These include Cooja, OMNET++, NS3, TOSSIM and Avrora. The choice of platform depends on the objectives of the simulation, the number of WSN nodes to be deployed, the programming languages supported by the platform, the support for GUI-based simulation, and whether the simulator is open-source or commercial software. Table 3 shows the comparison between the simulators.

2) Configure the Simulation Environment
After the simulation platform has been selected, this step focuses on configuring the simulation environment. It starts by configuring the nodes involved in the stated flood environment, using the output from Step B above as a guide. The nodes need to be configured according to their purpose. A node can be a sensor node, whose task is just to collect data. A node can also be a sink node, which is responsible for receiving data from the sensor nodes and transmitting it to the base station. Next, the base station gathers the data and transfers it to the central server. The communication links between the nodes also need to be configured. This may include the communication technology used and the frequency of data transmission. Table 4 shows examples of simulation parameters that are used in configuring the simulation environment.

3) Identify and Configure the Performance Metrics
Performance metrics are identified according to the objective of the simulation. The performance metrics chosen may vary according to the type of information needed. Performance metrics that can be used in flood simulation are data loss, power consumption, end-to-end delay and throughput [22].
Power consumption concerns the battery life of the sensing node. It is related to how frequently the data needs to be collected and transferred. The more frequently the sensor node collects and transfers data, the faster its power is drained. If the battery is exhausted, data loss may result [21].
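The relationship between reporting frequency and battery life can be illustrated with a deliberately simple energy model. The model is an assumption made for this sketch, not one taken from the cited works: each sense-and-transmit cycle costs a fixed amount of energy, the node draws a constant idle power in between, and the function name and all parameter values are hypothetical.

```python
def estimated_lifetime_hours(
    battery_j: float,
    energy_per_report_j: float,
    reports_per_hour: float,
    idle_power_w: float = 0.0,
) -> float:
    """Estimate node lifetime under a fixed-cost-per-report model (assumed model)."""
    if battery_j < 0 or energy_per_report_j < 0 or reports_per_hour < 0 or idle_power_w < 0:
        raise ValueError("all inputs must be non-negative")
    # Average power = idle draw + energy spent on reports spread over one hour.
    average_power_w = idle_power_w + energy_per_report_j * reports_per_hour / 3600.0
    if average_power_w == 0:
        return float("inf")  # a node that never spends energy never drains
    lifetime_s = battery_j / average_power_w
    return lifetime_s / 3600.0


if __name__ == "__main__":
    # Hypothetical numbers: 3600 J battery, 1 J per report.
    for rate in (60, 120):
        hours = estimated_lifetime_hours(3600.0, 1.0, rate)
        print(f"{rate} reports/hour: about {hours:.1f} h")
```

Under this model, doubling the reporting rate halves the lifetime when idle power is negligible, which matches the qualitative statement above. Simulators typically provide their own, more detailed energy models.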
End-to-end delay refers to the time taken for a single packet to be transmitted across a network from source to destination. For a flood WSN, end-to-end delay can be defined as the time taken to transfer sensor data from the sensor node to the sink node [23]. Minimizing the end-to-end delay is important for flood prediction applications.
Throughput refers to the rate at which data is received by the receiver, measured in bits per second (bps). Since WSN uses wireless transmission, throughput can be affected by multiple factors such as signal strength, the number of nodes in the area, or even the weather conditions. The number of nodes affects the throughput because having more nodes in the same area produces overlapping signals, which may cause interference and collisions. This in turn results in lower throughput.
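The metrics defined above can be computed from a per-packet log once a run has finished. The following sketch assumes a hypothetical log format (`PacketRecord`, with a send time, an optional receive time and a size in bits); real simulators export traces in their own formats, which would need to be parsed into such records first.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class PacketRecord:
    """One logged packet (hypothetical format); received_at_s is None if lost."""
    sent_at_s: float
    received_at_s: Optional[float]
    size_bits: int


def summarize(records: Sequence[PacketRecord], duration_s: float) -> Dict[str, Optional[float]]:
    """Compute data loss ratio, mean end-to-end delay and throughput for one run."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    delivered = [r for r in records if r.received_at_s is not None]
    loss_ratio = 1.0 - len(delivered) / len(records) if records else 0.0
    mean_delay_s = (
        sum(r.received_at_s - r.sent_at_s for r in delivered) / len(delivered)
        if delivered
        else None  # no delivered packet, so delay is undefined
    )
    # Throughput: bits actually received per second of simulated time.
    throughput_bps = sum(r.size_bits for r in delivered) / duration_s
    return {
        "loss_ratio": loss_ratio,
        "mean_delay_s": mean_delay_s,
        "throughput_bps": throughput_bps,
    }


if __name__ == "__main__":
    log = [
        PacketRecord(0.0, 0.5, 1000),
        PacketRecord(1.0, 1.5, 1000),
        PacketRecord(2.0, None, 1000),  # lost packet
        PacketRecord(3.0, 3.5, 1000),
    ]
    print(summarize(log, duration_s=10.0))
```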
Data availability and the size of the sensing area are also important for flood data collection. Where the flood data is used for predicting flood occurrence, the data must be available and must be received on time so that the warning can be issued earlier. The bigger the sensing area, the more flood data can be collected and analysed, which in turn may enable more accurate flood prediction.

4) Run the Simulation
Once the simulation environment is ready, a list of experiments needs to be constructed to make sure that useful information is collected from the simulation. The experiments must provide a solid foundation for answering the questions addressed by the research being conducted. Hence, experiments can be conducted using the performance parameters and multiple different scenarios. Each experiment may consist of objectives, a scenario of events, simulation parameters and performance metrics, as explained in the previous steps.
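One way to keep the list of experiments explicit is to record each one as a small structured entry and generate the scenarios from the parameters being varied. The sketch below is illustrative only: the `Experiment` fields, the `build_experiments` helper and the two varied parameters (number of sensor nodes and reporting interval) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Experiment:
    """One planned experiment: objective, scenario, parameters and metrics."""
    objective: str
    scenario: str
    num_sensor_nodes: int
    report_interval_s: int
    metrics: Tuple[str, ...]


def build_experiments(
    node_counts: Sequence[int],
    report_intervals_s: Sequence[int],
    metrics: Sequence[str],
    objective: str = "evaluate flood data collection performance",
) -> List[Experiment]:
    """Create one experiment per combination of the varied parameters."""
    return [
        Experiment(
            objective=objective,
            scenario=f"{n} sensor nodes, report every {interval} s",
            num_sensor_nodes=n,
            report_interval_s=interval,
            metrics=tuple(metrics),
        )
        for n, interval in product(node_counts, report_intervals_s)
    ]


if __name__ == "__main__":
    plan = build_experiments([10, 20], [60, 300], ["end-to-end delay", "throughput"])
    for exp in plan:
        print(exp.scenario, "->", ", ".join(exp.metrics))
```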
In the simulation software, some events or parameters are randomly generated during the simulation, such as when a node transmits data and how much data is transmitted. This is done using a Random Number Generator (RNG). The RNG uses a pseudo-random number algorithm to produce a list of integers that are seemingly random. The pseudo-random number algorithm is initialized using an integer value called a seed. Different seed values give different lists of random numbers. To get a reliable simulation result, an experiment must be conducted multiple times, with a different seed value used for the RNG in each run so that a unique sequence of simulation events is generated. The final result should be the average of the results of all runs. To decide whether the number of simulation runs is sufficient for a given performance parameter, the confidence interval needs to be calculated. The confidence interval can be used to determine whether or not the result is significant based on the configuration of each experiment.
The authors in [24] define a confidence interval as an estimated range of values that is likely to include an unknown population parameter. The authors in [25], [26] show methods for calculating the confidence interval.
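The seeding and averaging procedure can be sketched as follows. Everything here is illustrative: `toy_simulation` is a stand-in that merely draws random delays and does not call any real simulator, and the interval uses the common Student's t formula (mean ± t × s / √n) as an assumed method rather than the specific ones in [25], [26]. The table holds standard two-sided 95% t critical values for up to 10 degrees of freedom; beyond that the sketch falls back to the normal quantile, which slightly understates the interval.

```python
import random
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def toy_simulation(seed: int) -> float:
    """Stand-in for one simulation run: returns a mean packet delay in seconds."""
    rng = random.Random(seed)  # a private RNG, so each seed gives a repeatable run
    return statistics.fmean(rng.uniform(0.1, 0.9) for _ in range(100))


def mean_with_ci95(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width) of an approximate 95% confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two runs are needed")
    mean = statistics.fmean(samples)
    standard_error = statistics.stdev(samples) / n ** 0.5
    # Fallback for more than 11 runs: normal approximation (an assumption).
    t_crit = T_CRIT_95.get(n - 1, statistics.NormalDist().inv_cdf(0.975))
    return mean, t_crit * standard_error


if __name__ == "__main__":
    seeds = range(1, 11)  # ten runs, each with a different seed
    results = [toy_simulation(seed) for seed in seeds]
    mean, half_width = mean_with_ci95(results)
    print(f"mean delay = {mean:.3f} s, 95% CI half-width = {half_width:.3f} s")
```

If the half-width is too large relative to the mean for the purposes of the study, more runs with additional seeds can be added and the interval recomputed.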

D. Analyze the Result
The results of each experiment can be presented according to the performance parameters identified in Step C above. The results can be visualized using data visualization techniques. Common techniques for presenting the results include tables, bar charts, scatter plots, correlation matrices and histograms.
Analyzing the results is one of the critical steps that need to be documented. The analysis can cover the effect of the data transfer rate on the power consumption of every node, the end-to-end delay experienced while transferring data from source to destination through many intermediate nodes, and the percentage of data loss that occurs under the settings of each experiment.

V. CONCLUSION
In this paper, we have proposed a four-step methodology for simulating the IoT infrastructure used for collecting flood-related data. The methodology can be applied to other flood-related research work that uses an IoT technology such as WSN. It is hoped that the methodology can become a reference for simulating and evaluating IoT-related techniques proposed by researchers in the area of flood disasters.