Gene transcriptions/A1BG/Programming

From Wikiversity

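Computer programs that search a nucleotide sequence along a DNA strand can be written in many languages. The example below is written for Chipmunk Basic. For each gene isoform it scans the nucleotide sequence for the AGC box (3'-AGCCGCC-5') and records the matches in a results file: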

# load "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/SuccessablesAGC--.bas"
# This program tests the discovery of AGC boxes (AGCs).
1 dim indicator$(100),molecule$(100)
# This version of Successables.bas starts with the default genome option.
2 input "Start Successables execution from beginning (b), last stop (L), water exchange (w), stop (s)? >> ";decision1$
# For the negative strand (ZSCAN22 to A1BG, in the negative direction), use "--nt.bas".
8 if decision1$ = "b" then goto 10
9 if decision1$ = "s" then goto 100000
10 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/SuccessableIsoforms.bas" for input as #1
11 dim sgeneisoformid$(40000),bb$(40000)
12 input #1,successables
13 for i = 1 to successables
14 input #1,sgeneisoformid$(i)
15 next i
16 close #1
17 print "Successables array has been inputted."
# This subroutine goes one by one through each successable isoform.
# Do not use the index i for anything else.
20 for i = 1 to successables
21 file$ = sgeneisoformid$(i)+"--nt.bas"
22 file2$ = "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Nucleotide Sequences/"+file$
23 open file2$ for input as #1
24 dim numbertypes,nts$(40000)
25 input #1,numbertypes
26 for j = 1 to numbertypes
27 input #1,nts$(j)
28 next j
29 close #1 : poll = 0 : goto 30
# This is the 26 subroutine.
# AGC boxes (AGC)s 3'-AGCCGCC-5'
30 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1 : print "Working on the AGC box!"
31 input #1,agcboxnumber
32 close #1
33 if agcboxnumber = 0 then poll = 1 : goto 280
# This program tests the discovery of AGC boxes.
# The 2 subroutine.
# Check to see if AGC already in file.
# This 2 subroutine loads in gene isoforms and tests AGC--.bas.
40 dim geneisoform$(40000), agcbox$(40000)
41 dim indexjagcbox(40000)
42 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1
43 input #1,agcboxnumber
44 for m=1 to agcboxnumber
45 input #1,geneisoform$(m)
46 input #1,agcbox$(m)
47 input #1,indexjagcbox(m)
50 next m
51 close #1 : if poll = 1 then goto 54
54 for m=1 to agcboxnumber
55 if geneisoform$(m) = sgeneisoformid$(i) then poll = 2 : goto 100000
56 next m
# If the geneisoform is not in AGC--.bas then send program to find AGCs.
57 poll = 1 : goto 280
# This is the 280 subroutine.
# Find all possible AGC: 3'-0A-1G-2C-3C-4G-5C-6C-5'.
# Once found repeat entry must be prevented!
# Any time n > 0, restartagcboxj should be set to a value so that j = restartagcboxj + 1 is the correct restarting value.
280 n=0 : box$="3'-" : indexagcboxj=1 : j=0 : restartagcboxj = 0
# Send computer to see if AGC already found. This is 281 to 306.
281 goto 307
# Recover indexagcboxj. Limit on n is 7.
282 for j = indexagcboxj to numbertypes
283 if n = 1 then goto 291
284 if n = 2 OR n = 3 OR n = 5 OR n = 6 then goto 294
285 if n = 4 then goto 297
288 if n > 0 then goto 295
289 if nts$(j) = "A" then goto 301
290 goto 296
291 if nts$(j) = "G" then goto 301
292 if n = 1 then j = j - 1
293 goto 296
294 if nts$(j) = "C" then goto 301
295 j = restartagcboxj - 1
296 n=0 : box$="3'-" : goto 304
297 if nts$(j) = "G" then goto 301
298 goto 295
301 n=n+1 : box$=box$ + nts$(j)
302 if n = 2 then restartagcboxj = j
# When an AGC has been found, first store the isoform and the AGC. 
# Then send the computer to 100000.
303 if n = 7 then goto 306
# For ZSCAN22 to A1BG use limit of 4560, but for ZNF497 to A1BG use 958.
304 if j >= 4560 then goto 100000
305 next j
# Store isoform and its AGC.
306 box$=box$ + "-5'" : indexagcboxj = j
307 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1
308 input #1,agcboxnumber
309 close #1
# Check to see if AGC element already in file.
310 if agcboxnumber = 0 AND n = 0 then goto 282
311 goto 332
312 agcboxnumber = agcboxnumber + 1
313 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for output as #2
314 print #2,agcboxnumber
315 print #2,sgeneisoformid$(i)
316 print #2,box$
317 print #2,indexagcboxj
318 close #2
319 goto 295
# Direct computer to DCE.
320 goto 100000
# Check to see if AGC already in file.
327 if indexj < 4560 then goto 332
328 for m=1 to agcboxnumber
329 if geneisoform$(m) = sgeneisoformid$(i) then goto 100000
331 next m
332 agcboxnumber = agcboxnumber + 1
333 dim geneisoform$(40000), agcbox$(40000)
334 dim indexjagcbox(40000)
335 geneisoform$(agcboxnumber) = sgeneisoformid$(i)
336 agcbox$(agcboxnumber) = box$
337 indexjagcbox(agcboxnumber) = indexagcboxj
338 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for output as #2
339 print #2,agcboxnumber
340 for m=1 to agcboxnumber
341 print #2,geneisoform$(m)
342 print #2,agcbox$(m)
343 print #2,indexjagcbox(m)
344 next m
345 close #2
346 goto 295
347 next i
100000 end
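
The scanning step in lines 280 to 306 is hard to follow through its goto jumps. As a rough cross-check, here is a minimal sketch of the same idea in Python, which is a choice made for this sketch since the listing itself is BASIC. The function name find_agc_boxes, its parameters, and the example sequence are hypothetical. The motif AGCCGCC and the 4560 cutoff come from the listing's comments. This is not a translation of the full program: it reports the start and end position of each match, whereas the listing stores only the index j at which the seventh nucleotide was matched.

```python
MOTIF = "AGCCGCC"  # AGC box pattern taken from the listing's comments


def find_agc_boxes(nts, motif=MOTIF, limit=4560):
    """Return (start, end) positions of every occurrence of motif.

    nts   -- sequence of single-letter nucleotides, like the nts$ array
    limit -- last 1-based position examined (the listing uses 4560)
    Positions are 1-based and inclusive, as in the BASIC arrays.
    """
    seq = "".join(nts[:limit])
    hits = []
    start = seq.find(motif)
    while start != -1:
        # Convert the 0-based Python index to 1-based positions.
        hits.append((start + 1, start + len(motif)))
        # Resume one position past the previous start so that
        # overlapping matches would not be skipped.
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical example sequence containing two AGC boxes.
print(find_agc_boxes(list("TTAGCCGCCAGCCGCCT")))  # prints [(3, 9), (10, 16)]
```

Using the built-in str.find in place of a hand-written state counter (the variable n in the listing) keeps the sketch short. The end position in each pair corresponds to what the listing saves as indexagcboxj.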

In the file that is loaded into the interpreter, lines beginning with "#" (without the quotes) are non-executed comments for the programmer.

To load the program, copy only the load "..." portion of the first line, omitting the leading "#", and enter it at the interpreter prompt, which appears as > without the parentheses.

Then type run and answer the prompts.
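
The listing keeps its results in AGC--.bas as a count followed by three values per record (isoform identifier, box string, index), as lines 339 to 344 show. Below is a minimal Python sketch of reading and writing that layout, assuming one value per line. The function names read_boxes and write_boxes are hypothetical, and how Chipmunk Basic actually formats numbers on output is not verified here.

```python
from pathlib import Path


def read_boxes(path):
    """Read the results file into a list of (isoform_id, box, index) tuples.

    Assumed layout: the first line holds the record count, then each
    record occupies three consecutive lines.
    """
    lines = Path(path).read_text().splitlines()
    count = int(lines[0])
    records = []
    for k in range(count):
        isoform_id, box, index = lines[1 + 3 * k : 4 + 3 * k]
        # int() tolerates surrounding blanks, in case the writer padded numbers.
        records.append((isoform_id.strip(), box.strip(), int(index)))
    return records


def write_boxes(path, records):
    """Write records back in the same count-then-triples layout."""
    out = [str(len(records))]
    for isoform_id, box, index in records:
        out += [isoform_id, box, str(index)]
    Path(path).write_text("\n".join(out) + "\n")
```

For instance, calling write_boxes on a scratch file with a single made-up record such as ("ISOFORM-X", "3'-AGCCGCC-5'", 9) and then read_boxes on the same file returns that record unchanged.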